Happy Bear Software

Byte manipulation in Ruby

Recently I've been working through the cryptography exercises by the guys at Matasano Security. Many of these involve feeding crafted ciphertext to an oracle function with the purpose of garnering some information about the encryption process.

Crafting the ciphertext means manipulating data at the byte level, and Ruby provides a few helpful tools for this.

Integers with different bases

Literals

It can be useful to write integer literals in different bases in your code:

97         #=> 97 in decimal (base 10)
0x61       #=> 97 in hex     (base 16)
0141       #=> 97 in octal   (base 8)
0b01100001 #=> 97 in binary  (base 2)

# These are all just different ways of typing the same thing:
[0141, 0x61, 0b01100001, 97] #=> [97, 97, 97, 97]
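Ruby also accepts an explicit 0o prefix for octal and 0d for decimal, and lets you put underscores between digits to group them, which helps readability with long binary literals. A quick check that these are still the same integer:

```ruby
# All of these literals are still just the integer 97
octal   = 0o141        # explicit octal prefix, same as 0141
decimal = 0d97         # explicit decimal prefix, same as 97
binary  = 0b0110_0001  # underscores group the bits into nibbles

all = [octal, decimal, binary] # => [97, 97, 97]
```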

Representation at different bases

Integer#to_s takes an optional argument for the base:

97.to_s(10) #=> '97'      (base 10)
97.to_s(16) #=> '61'      (base 16)
97.to_s(8)  #=> '141'     (base 8)
97.to_s(2)  #=> '1100001' (base 2)
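Going the other way, String#to_i also takes a base. When displaying bytes it is usually worth zero-padding to a fixed width, since to_s drops leading zeros (97.to_s(2) above has only seven digits). A short sketch:

```ruby
# String#to_i takes a base too, for parsing in the other direction
from_hex = '61'.to_i(16)      # => 97
from_bin = '1100001'.to_i(2)  # => 97

# to_s(base) drops leading zeros, so pad when you want fixed-width bytes
bin_byte = 97.to_s(2).rjust(8, '0')  # => "01100001"
hex_byte = 10.to_s(16).rjust(2, '0') # => "0a"

# format can do the conversion and padding in one step
fmt_byte = format('%08b', 97)        # => "01100001"
```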

Strings/bytes

For a single character, you can switch back and forth between characters and their integer codes with chr and ord:

97.chr  #=> 'a'
'a'.ord #=> 97

# For strings of length > 1, ord returns the integer representation
# of the first char:
'abc'.ord #=> 97
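To apply ord and chr to a whole string rather than a single character, map over its characters and join the results back together. One caveat: for integers from 128 to 255, chr with no argument returns a single raw byte in binary (ASCII-8BIT) encoding, so pass an encoding if you mean a Unicode codepoint:

```ruby
# Round-trip a whole string through its character codes
codes = 'abc'.chars.map(&:ord) # => [97, 98, 99]
back  = codes.map(&:chr).join  # => "abc"

# Above 127, chr with no argument returns a single raw byte (ASCII-8BIT)
raw  = 233.chr                  # => "\xE9"
# Passing an encoding gives the character with that codepoint instead
char = 233.chr(Encoding::UTF_8) # => "é"
```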

You can get the bytes out of a string with .bytes:

"Hello, world".bytes
# => [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100]

Note that since Ruby strings are encoding-aware, this is not the same as .chars.map(&:ord) once a string contains multi-byte characters.
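The difference shows up with non-ASCII text. In UTF-8, 'é' (codepoint 233) is encoded as the two bytes 195 and 169, so bytes and chars.map(&:ord) disagree on both the values and the length. A small example, assuming the source file uses Ruby's default UTF-8 encoding:

```ruby
word = 'héllo'

# bytes: the raw UTF-8 encoding, where 'é' takes two bytes (195, 169)
utf8_bytes = word.bytes            # => [104, 195, 169, 108, 108, 111]

# chars.map(&:ord): one codepoint per character, and 'é' is 233
codepoints = word.chars.map(&:ord) # => [104, 233, 108, 108, 111]

# The lengths differ too: 6 bytes but 5 characters
byte_count = word.bytesize # => 6
char_count = word.length   # => 5
```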

Array#pack and String#unpack

Array#pack is a method for 'packing' an array into a string of bytes according to a certain format. Some examples:

hw_bytes = "Hello, world".bytes
# => [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100]

# c* => Pack every element as an 8 bit signed integer
# As a string, this is exactly the same as "Hello, world"
hw_bytes.pack("c*") # => "Hello, world"

# s5 => Pack the first five elements as 16-bit signed integers in the
# machine's native byte order. On a little-endian machine (e.g. x86) the
# low byte comes first, so every 8-bit char is followed by a null byte
hw_bytes.pack("s5") # => "H\0e\0l\0l\0o\0"

# There's also an 'm' format for base64 encoding ('m0' omits line breaks):
["Hello, world"].pack("m0") #=> 'SGVsbG8sIHdvcmxk'
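Because s uses the native byte order, the output of the s5 example depends on the machine it runs on. If you need a particular byte order regardless of platform, pack accepts the < (little-endian) and > (big-endian) modifiers, and v and n are shorthands for 16-bit unsigned little- and big-endian values. A small sketch:

```ruby
pair = [72, 101] # the bytes of 'H' and 'e'

# '<' forces little-endian (low byte first), '>' forces big-endian
little = pair.pack('s<*') # => "H\x00e\x00"
big    = pair.pack('s>*') # => "\x00H\x00e"

# 'v' = 16-bit unsigned little-endian, 'n' = 16-bit unsigned big-endian ("network order")
v_matches = pair.pack('v*') == little # => true
n_matches = pair.pack('n*') == big    # => true
```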

String#unpack takes you in the other direction: it lets you take a string in a certain binary format and 'unpack' it into an array:

# This says: read the string as a sequence of 8-bit integers and
# put each of those integers into an array
"Hello, world".unpack('c*')
# => [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100]

# Read as a sequence of 16-bit integers (native byte order, little-endian
# here) and put each of those integers into an array (gets us hw_bytes)
"H\0e\0l\0l\0o\0,\0 \0w\0o\0r\0l\0d\0".unpack('s*')
# => [72, 101, 108, 108, 111, 44, 32, 119, 111, 114, 108, 100]

# As with Array#pack, we can use the 'm0' format to base64-decode
"SGVsbG8sIHdvcmxk".unpack('m0') # => ["Hello, world"]

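Hex-encoded input is common in exercises like these, and pack/unpack handle it with the H format (hex string, high nibble first). A sketch, assuming Ruby 2.4 or later for String#unpack1, which returns the first element instead of a one-element array:

```ruby
# 'H*' converts between raw bytes and a hex string
hex = 'Hello, world'.unpack1('H*') # => "48656c6c6f2c20776f726c64"
raw = [hex].pack('H*')             # => "Hello, world"

# Chaining the formats converts hex straight to base64 and back
b64      = [[hex].pack('H*')].pack('m0')   # => "SGVsbG8sIHdvcmxk"
hex_back = b64.unpack1('m0').unpack1('H*') # => "48656c6c6f2c20776f726c64"
```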
XORing bits

XOR is a reversible way to 'mix' data together, and it is a bread-and-butter operation of modern cryptography. Logically it means 'one or the other, but not both', so for single bits the truth table looks like this:

^ | 1 0
--+----
1 | 0 1
0 | 1 0
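The table can be reproduced with Ruby's bitwise ^ operator. The snippet below computes XOR for every pair of bits and checks it against the 'one or the other, but not both' definition spelled out with boolean logic:

```ruby
pairs = [[1, 1], [1, 0], [0, 1], [0, 0]]

# Bitwise XOR of each pair of bits
xor_results = pairs.map { |a, b| a ^ b } # => [0, 1, 1, 0]

# "One or the other, but not both", written out explicitly
by_definition = pairs.map { |a, b| (a == 1 || b == 1) && !(a == 1 && b == 1) ? 1 : 0 }
```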

As in many languages, Ruby's ^ operator is bitwise XOR. An example with base 2 literals:

# The key insight with XORing bits is that in the result, all bits that
# are *different* are 1, and all bits that are the *same* are 0
0b1110 ^ 0b1101
# => 3 (0b0011 in binary)
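This is also where the 'reversible' part comes from: since k ^ k is 0 and x ^ 0 is x, XORing with the same key twice gives back the original value. A quick check using the same two literals:

```ruby
x   = 0b1110
key = 0b1101

mixed    = x ^ key     # => 3
restored = mixed ^ key # => 14, the original x

# Zero-padded binary makes the bit patterns easy to compare
bits = format('%04b', mixed) # => "0011"
```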

Combining some of the tools above, we can write a short method that XORs the bytes of two strings together and returns the resulting byte string:

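class String
  # XOR this string's bytes pairwise with other_string's bytes.
  # Assumes both strings have the same byte length.
  def xor_with(other_string)
    # 'C*' packs each result as an unsigned 8-bit byte (0..255)
    bytes.zip(other_string.bytes).map { |a, b| a ^ b }.pack('C*')
  end
end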
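Because XOR is its own inverse, the same operation works in both directions: XORing a message with an equal-length key scrambles it, and XORing the result with the key again recovers the message. The snippet below uses a hypothetical standalone helper, xor_bytes, equivalent to xor_with, so that it runs on its own; the message and key are arbitrary toy values:

```ruby
# Hypothetical standalone equivalent of String#xor_with
def xor_bytes(a, b)
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack('C*')
end

message = 'attack'
key     = 'secret' # toy key with the same length as the message

ciphertext = xor_bytes(message, key)    # scrambled bytes
recovered  = xor_bytes(ciphertext, key) # => "attack"

# Knowing both the message and the ciphertext also reveals the key
leaked_key = xor_bytes(message, ciphertext) # => "secret"
```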

# XORing two different strings gives mostly unprintable junk
"Hello".xor_with('world')
# => "?\n\x1E\0\v"

# A string's bytes XOR'd with themselves give a string of null bytes:
"Hello".xor_with("Hello")
# => "\0\0\0\0\0"

# This is useful when you want to compare two strings for equality
# in constant time, i.e. without bailing out at the first differing byte
# (to mitigate timing attacks)
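A minimal sketch of that idea, using a hypothetical secure_equal? helper: XOR every pair of bytes, OR the results together, and only inspect the total at the end, so the loop does the same amount of work wherever the strings differ. In an interpreted language like Ruby this is only approximately constant-time, so treat it as an illustration of the technique rather than a vetted implementation:

```ruby
# Hypothetical helper: compare two strings without stopping at the first mismatch
def secure_equal?(a, b)
  # Unequal lengths can't match; this early return only reveals the length
  return false unless a.bytesize == b.bytesize

  diff = 0
  # OR together the XOR of every byte pair: any differing bit leaves diff non-zero
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

same      = secure_equal?('Hello', 'Hello') # => true
different = secure_equal?('Hello', 'Hellp') # => false
```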