A guide to Array#pack and String#unpack

By Hannah Dwan

When manipulating strings in Ruby, or in pretty much any programming language for that matter, the average programmer might dive into regular expressions - or, more accurately, their regex cheat sheet of choice. Ruby, though, comes prepackaged with a powerful little pair of methods for arrays and strings: Array#pack and String#unpack.

They’ve existed for, well, almost all of Ruby’s life so far, and running through the commit history for the methods reveals a few lines that haven’t been touched in almost 20 years. The principle of Array#pack and String#unpack is that they convert data according to a template that you supply to the method. Or, in more confusing terms, Array#pack is described like this in the documentation:

“Packs the contents of arr into a binary sequence according to the directives in aTemplateString (see the table below). Directives “A,” “a,” and “Z” may be followed by a count, which gives the width of the resulting field. The remaining directives also may take a count, indicating the number of array elements to convert. If the count is an asterisk ("*"), all remaining array elements will be converted. Any of the directives "sSiIlL" may be followed by an underscore ("_") to use the underlying platform’s native size for the specified type; otherwise, they use a platform-independent size. Spaces are ignored in the template string.”

On the flipside, String#unpack is described as:

“Decodes str (which may contain binary data) according to the format string, returning an array of each value extracted. The format string consists of a sequence of single-character directives, summarized in the table at the end of this entry. Each directive may be followed by a number, indicating the number of times to repeat with this directive. An asterisk (“*”) will use up all remaining elements. The directives sSiIlL may each be followed by an underscore (“_”) or exclamation mark (“!”) to use the underlying platform’s native size for the specified type; otherwise, it uses a platform-independent consistent size. Spaces are ignored in the format string.”

The table these two descriptions refer to is:

Integer Directive | Array Element | Meaning
C | Integer | 8-bit unsigned (unsigned char)
S | Integer | 16-bit unsigned, native endian (uint16_t)
L | Integer | 32-bit unsigned, native endian (uint32_t)
Q | Integer | 64-bit unsigned, native endian (uint64_t)
c | Integer | 8-bit signed (signed char)
s | Integer | 16-bit signed, native endian (int16_t)
l | Integer | 32-bit signed, native endian (int32_t)
q | Integer | 64-bit signed, native endian (int64_t)
S_, S! | Integer | unsigned short, native endian
I, I_, I! | Integer | unsigned int, native endian
L_, L! | Integer | unsigned long, native endian
s_, s! | Integer | signed short, native endian
i, i_, i! | Integer | signed int, native endian
l_, l! | Integer | signed long, native endian
S> L> Q> s> l> q> S!> I!> s!> i!> l!> | Integer | same as the directives without ">" except big endian (available since Ruby 1.9.3), "S>" is same as "n", "L>" is same as "N"
S< L< Q< s< l< q< S!< I!< L!< s!< i!< l!< | Integer | same as the directives without "<" except little endian (available since Ruby 1.9.3), "S<" is same as "v", "L<" is same as "V"
n | Integer | 16-bit unsigned, network (big-endian) byte order
N | Integer | 32-bit unsigned, network (big-endian) byte order
v | Integer | 16-bit unsigned, VAX (little-endian) byte order
V | Integer | 32-bit unsigned, VAX (little-endian) byte order
U | Integer | UTF-8 character
w | Integer | BER-compressed integer

Float Directive | Array Element | Meaning
D, d | Float | double-precision, native format
F, f | Float | single-precision, native format
E | Float | double-precision, little-endian byte order
e | Float | single-precision, little-endian byte order
G | Float | double-precision, network (big-endian) byte order
g | Float | single-precision, network (big-endian) byte order

String Directive | Array Element | Meaning
A | String | arbitrary binary string (space padded, count is width)
a | String | arbitrary binary string (null padded, count is width)
Z | String | same as "a", except that null is added with *
B | String | bit string (MSB first)
b | String | bit string (LSB first)
H | String | hex string (high nibble first)
h | String | hex string (low nibble first)
u | String | UU-encoded string
M | String | quoted printable, MIME encoding (see RFC 2045)
m | String | base64 encoded string (see RFC 2045, count is width) (if count is 0, no line feed are added, see RFC 4648)
P | String | pointer to a structure (fixed-length string)
p | String | pointer to a null-terminated string

Misc. Directive | Array Element | Meaning
@ | --- | moves to absolute position
X | --- | back up a byte
x | --- | null byte

In more helpful terms, these two methods allow you to convert data to and from its underlying binary representation. The left column lists a character you can use in the template, the central column gives the type of the array element it works with (the input to pack, or the output of unpack), and the right column describes how that element is laid out as bytes. The table above isn’t the most user-friendly experience, so here are a few examples of exactly what you could do with these methods.
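To make the three columns a little more concrete, here is a minimal sketch using a handful of directives from the table. The input values are arbitrary illustrations rather than anything taken from the documentation.

```ruby
# 'C' treats each array element as one unsigned 8-bit integer (one byte).
[72, 105].pack('C*')      # => "Hi"

# The same directive in reverse turns each byte back into an Integer.
"Hi".unpack('C*')         # => [72, 105]

# For 'A' and 'a' the count is a field width, not a repeat count.
['abc'].pack('A5')        # => "abc  "        (space padded)
['abc'].pack('a5')        # => "abc\x00\x00"  (null padded)

# For integer directives the count is the number of elements to convert.
[1, 2, 3].pack('C2')      # => "\x01\x02"
```

Note that the count means different things depending on the directive, which is the distinction the documentation draws for “A,” “a,” and “Z”.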

Since Array#pack and String#unpack are rooted in computer number format, converting data to and from hexadecimal is perhaps the most obvious use of the methods. For example, the string ‘Happy Bear Software’ can be easily converted to hexadecimal.

'Happy Bear Software'.unpack('H*') # => ["4861707079204265617220536f667477617265"]

The asterisk (*) here is a count: it tells the preceding directive, ‘H’, to keep going until the end of the string rather than stopping after a fixed number of hex digits. You can then do the reverse to convert back to the original string.

["4861707079204265617220536f667477617265"].pack('H*') # => "Happy Bear Software"

When converting multiple hex strings to readable text, it’s important to consider a specific part of the documentation above. The full description of the letter ‘H’ in templates is “hex string (high nibble first)” - this means that each pair of hex digits in an element becomes one byte, with the first digit as the high nibble. Each directive in the template also consumes exactly one element of the array, so if you were to convert multiple hex strings, you’d want one ‘H*’ per element:

array = ['486170707920', '4265617220', '536f667477617265']
array.pack('H*' * array.size) # => "Happy Bear Software"
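As a rough illustration of why the directive needs repeating, the sketch below compares a single ‘H*’ with one ‘H*’ per element, and then splits a string back into separate hex fields. The variable name parts and the split point of 12 hex digits are only chosen for this example.

```ruby
parts = ['486170707920', '4265617220', '536f667477617265']

# A single 'H*' consumes only the first element; the rest are left unused.
parts.pack('H*')                # => "Happy "

# Repeating the directive once per element converts all of them.
parts.pack('H*' * parts.size)   # => "Happy Bear Software"

# Going the other way, a count splits one string into several hex fields.
# 'H12' reads 12 hex digits (6 bytes), then 'H*' takes whatever remains.
'Happy Bear Software'.unpack('H12H*')
# => ["486170707920", "4265617220536f667477617265"]
```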

Binary, too, is made simple with these methods.

'Happy Bear Software'.unpack('B*') # => ["01001000011000010111000001110000011110010010000001000010011001010110000101110010001000000101001101101111011001100111010001110111011000010111001001100101"]
["01001000011000010111000001110000011110010010000001000010011001010110000101110010001000000101001101101111011001100111010001110111011000010111001001100101"].pack('B*') # => "Happy Bear Software"

You can do something similar to encode to and decode from base64. It works slightly differently, as the plaintext has to be supplied within an array - pack is a method on Array, so even a single string must be wrapped in one, and unpack always returns an array. It's a little unintuitive.

['Happy Bear Software'].pack('m*') # => "SGFwcHkgQmVhciBTb2Z0d2FyZQ==\n"
"SGFwcHkgQmVhciBTb2Z0d2FyZQ==\n".unpack('m*') # => ["Happy Bear Software"]
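The trailing "\n" comes from the way the ‘m’ directive wraps its output into lines. The table notes that a count of 0 suppresses line feeds; the following is a small sketch of that difference, with text as an illustrative variable name.

```ruby
text = 'Happy Bear Software'

# Plain 'm' appends a line feed, and wraps longer output
# after every 60 encoded characters.
[text].pack('m')    # => "SGFwcHkgQmVhciBTb2Z0d2FyZQ==\n"

# A count of 0 produces the encoded text with no line feeds at all.
[text].pack('m0')   # => "SGFwcHkgQmVhciBTb2Z0d2FyZQ=="

# The version without line feeds decodes back to the original text.
"SGFwcHkgQmVhciBTb2Z0d2FyZQ==".unpack('m0')   # => ["Happy Bear Software"]
```

The ‘m0’ form is usually the more convenient one when the encoded text is going into a URL, a header, or a single-line field.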

Further useful directives include 'Z', which in the case of String#unpack reads only until a null byte or until the template ends, and 'U', which converts UTF-8 text to Unicode code points. Introduced in Ruby 2.4, there's also String#unpack1, which returns the first value of what would otherwise be returned as an array. In the case of the final base64 decoding above, it would change to:

"SGFwcHkgQmVhciBTb2Z0d2FyZQ==\n".unpack1('m*') # => "Happy Bear Software"
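The 'Z' and 'U' directives mentioned above can be sketched in the same way. The sample strings are made up for illustration, and the accented character is written as an escape ("\u00e9" is an e with an acute accent) so the snippet does not depend on the file's encoding.

```ruby
# 'Z*' reads up to, and not including, the first null byte.
"Happy\0Bear".unpack('Z*')     # => ["Happy"]

# 'a*' by contrast keeps everything, null bytes included.
"Happy\0Bear".unpack('a*')     # => ["Happy\x00Bear"]

# 'U*' decodes UTF-8 text into Unicode code points...
"caf\u00e9".unpack('U*')       # => [99, 97, 102, 233]

# ...whereas 'C*' shows the raw bytes, where the accented letter takes two.
"caf\u00e9".unpack('C*')       # => [99, 97, 102, 195, 169]

# Packing code points with 'U*' builds the UTF-8 string again.
[99, 97, 102, 233].pack('U*') == "caf\u00e9"   # => true

# unpack1 returns the first value rather than a one-element array.
"Happy\0Bear".unpack1('Z*')    # => "Happy"
```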

Of course, there are gems and other methods around this sort of thing. For single characters, the template of just ‘C’ functions almost identically to the method String#ord. Many developers would use the base64 module included in Ruby’s standard library for encoding/decoding with base64 (although base64 itself uses Array#pack and String#unpack under the hood). Converting a number to a binary string is often done with Integer#to_s, by passing 2 as the argument to indicate you’re looking for a base 2 output. What’s important about knowing these methods is that it helps massively in wrapping your head around low-level string manipulation.
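A brief sketch of how two of these alternatives line up with the templates, and where they differ. The sample characters are illustrative only.

```ruby
# For an ASCII character, 'C' and String#ord agree.
'H'.unpack1('C')          # => 72
'H'.ord                   # => 72

# They differ for multi-byte characters: 'C' reads the first byte,
# while ord returns the code point of the whole character.
"\u00e9".unpack1('C')     # => 195
"\u00e9".ord              # => 233

# Integer#to_s(2) renders a number in base 2 but drops leading zeros,
# whereas 'B*' always shows all eight bits of each byte.
72.to_s(2)                # => "1001000"
'H'.unpack1('B*')         # => "01001000"
```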

Behind the scenes, Ruby does a lot of the legwork in interpreting the data we put in without us defining everything. It's helpful to get an idea of what Ruby is doing, and these methods help with that: you can see how strings are split into bytes, and how text can be altered to an unreadable state but still retain its meaning. The way we use and transmit data isn't limited to something a human reads. Whenever something else becomes necessary, Array#pack and String#unpack are useful ways to think about what exactly needs to be done to the data.

Some templates are unlikely ever to be used. 'v', for example, is 16-bit VAX (little-endian) byte order. Endianness refers to the order in which the bytes of a value are stored: big-endian stores the ‘big end’ first, at index 0, and little-endian stores the ‘small end’ at index 0. Because the endianness of the output can be specified directly within the template using the "<" and ">" modifiers ("S<" is the same as 'v'), this template is rarely required, and explicit endianness makes for a more readable template.
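As a small sketch of what endianness looks like in practice, the example below packs the same 16-bit number both ways. The value 258 is chosen only because its two bytes (0x01 and 0x02) are easy to tell apart.

```ruby
# 258 is 0x0102, so its two bytes are 0x01 (big end) and 0x02 (small end).
[258].pack('S>')   # => "\x01\x02"  big-endian: big end at index 0
[258].pack('S<')   # => "\x02\x01"  little-endian: small end at index 0

# The older single-letter directives produce the same bytes.
[258].pack('n') == [258].pack('S>')   # => true
[258].pack('v') == [258].pack('S<')   # => true

# Reading the same two bytes with each byte order gives different numbers.
"\x01\x02".unpack1('S>')   # => 258
"\x01\x02".unpack1('S<')   # => 513
```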

Array#pack and String#unpack aren’t going to revolutionise the websites you build, but they’ll help in ensuring you’re looking at the text you use in the right way. The relationships between strings, character code points, and hexadecimal or base64 text are often obscured and abstracted by friendlier names, but that abstraction creates a level of distance that makes it harder to understand why something might work.