In C, handling binary data such as network packets feels almost like a core part of the language. In Python, on the other hand, the same work relies on a number of supporting library functions.
As I only occasionally use Python for this purpose, I’ve written up the notes below as a reference for myself. All of these examples target Python 3.
Python has three built-in functions for base conversions: int(), hex() and bin(). Note that hex() and bin() both return strings.
Considering the example where x = 42
:
int(x)
gives 42
hex(x)
gives '0x2a'
bin(x)
gives '0b101010'
Alternatively, we can get slightly more control over the output by using the str.format() method and its format syntax.
For example, the following outputs zero-padded binary numbers to a width of 8:
"{0:08b}".format(x)
produces '00101010'
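A few more variations on the same format syntax, as a small sketch. The variable names are only illustrative; the format codes (b, x, X, o, and the # flag for a prefix) are part of Python's standard format specification.

```python
x = 42

padded_bin = "{0:08b}".format(x)      # zero-padded binary, width 8
prefixed_bin = "{0:#010b}".format(x)  # '#' adds the 0b prefix, which counts towards the width
padded_hex = "{0:04x}".format(x)      # zero-padded lowercase hex
upper_hex = "{0:04X}".format(x)       # zero-padded uppercase hex
octal = format(x, "o")                # the format() built-in takes the same codes
fstring_bin = f"{x:08b}"              # f-strings use the same syntax

print(padded_bin)    # 00101010
print(prefixed_bin)  # 0b00101010
print(padded_hex)    # 002a
print(upper_hex)     # 002A
print(octal)         # 52
print(fstring_bin)   # 00101010
```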
If the initial value you wish to convert is a string, the int() function can first convert it to an integer. This requires providing both the string and its base as arguments to int().
In the case where x = "0x2a"
:
int(x,16)
gives 42
bin(int(x,16))
gives '0b101010'
"{0:08b}".format(int(x,16))
gives '00101010'
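As a rough sketch of how int() treats different bases: the matching prefix (0x, 0b, 0o) is accepted when the base is given explicitly, and a base of 0 asks int() to infer the base from the prefix. The variable names here are just for illustration.

```python
x = "0x2a"

from_hex = int(x, 16)              # prefix is allowed when it matches the base
from_bare_hex = int("2a", 16)      # prefix is optional
from_bin = int("0b101010", 2)
from_oct = int("0o52", 8)
inferred = int(x, 0)               # base 0: infer the base from the prefix

print(from_hex, from_bare_hex, from_bin, from_oct, inferred)  # 42 42 42 42 42

# A string that is not valid in the given base raises ValueError.
try:
    int("2a", 10)
    parsed_as_decimal = True
except ValueError:
    parsed_as_decimal = False

print(parsed_as_decimal)  # False
```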
The ord() built-in function returns the integer value (code point) of a specified character. For example, examining the “straight” ASCII apostrophe and the “curly” opening version:
>>> ord("'")
39
>>> ord("‘")
8216
The chr() function performs the inverse of ord(). It returns the string representation of an integer argument. If you wanted the rocket symbol you could issue:
>>> chr(0x1F680)
'🚀'
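Since chr() and ord() are inverses, a string can be taken apart into code points and rebuilt again. A minimal sketch, with illustrative variable names:

```python
text = "Hi!"

# Break the string into integer code points.
code_points = [ord(ch) for ch in text]
print(code_points)  # [72, 105, 33]

# Rebuild the original string from the code points.
restored = "".join(chr(cp) for cp in code_points)
print(restored)  # Hi!
```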
Binary values can be stored within the bytes object. This object is immutable and holds a sequence of integers, each in the range 0 to 255. Its constructor is the aptly named bytes(). There are several different ways to initialise a bytes object:
>>> bytes((1,2,3))
b'\x01\x02\x03'
>>> bytes("hello", "ascii")
b'hello'
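A few further ways to build and inspect a bytes object, as a sketch. The variable names are illustrative. Note that indexing a bytes object yields an int, while slicing yields bytes.

```python
zeros = bytes(3)                      # three zero bytes
from_hex = bytes.fromhex("01 02 03")  # parse a hex string (spaces are ignored)
data = bytes([104, 105])              # from any iterable of ints in 0..255

print(zeros)      # b'\x00\x00\x00'
print(from_hex)   # b'\x01\x02\x03'
print(data)       # b'hi'
print(data[0])    # 104
print(data[0:1])  # b'h'

# Values outside 0..255 are rejected.
try:
    bytes([256])
    out_of_range_ok = True
except ValueError:
    out_of_range_ok = False

# bytes is immutable, so item assignment fails.
try:
    data[0] = 0
    assignment_ok = True
except TypeError:
    assignment_ok = False

print(out_of_range_ok, assignment_ok)  # False False
```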
The bytearray
object serves the same purpose as bytes
but is mutable, allowing elements in the array to be modified. It has the constructor bytearray()
.
>>> x = bytearray("hello.", "ascii")
>>> x
bytearray(b'hello.')
>>> x[5] = ord("!")
>>> x
bytearray(b'hello!')
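Beyond single-element assignment, a bytearray supports list-like operations such as append(), extend() and slice assignment. A short sketch, using an illustrative buffer name:

```python
buf = bytearray(b"hello")

buf.append(0x21)        # add a single byte ('!')
buf.extend(b" world")   # add several bytes at once
buf[0:5] = b"HELLO"     # replace a slice in place

print(buf)  # bytearray(b'HELLO! world')

# Convert to an immutable bytes object once editing is finished.
frozen = bytes(buf)
print(frozen)  # b'HELLO! world'
```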
A bytes
literal can be specified using the b
or B
prefix, e.g. b"bytes literal"
.
Comparing this with a standard string:
type("string literal")
gives <class 'str'>
type(b"bytes literal")
gives <class 'bytes'>
Non-ASCII bytes can be inserted using the "\xHH" escape sequence. This places the byte with the hexadecimal value 0xHH into the literal, e.g. b"The NULL terminator is \x00".
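To see what the escape sequence actually stores, the literal can be converted to a list of integers. A small sketch with an illustrative variable name:

```python
msg = b"AB\x00\xff"

print(len(msg))   # 4 (each \xHH escape is a single byte)
print(list(msg))  # [65, 66, 0, 255]
print(msg)        # b'AB\x00\xff'

# Printable ASCII characters and their escapes are the same byte.
same = b"\x41" == b"A"
print(same)  # True
```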
The str
object has an encode()
method to return the bytes
representation of the string. Similarly, the bytes
object has a decode()
method to return the str
representation of the data:
"string to bytes".encode("ascii")
gives b'string to bytes'
b"bytes to string".decode("ascii")
gives 'bytes to string'
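The choice of encoding matters once the text goes beyond ASCII. The sketch below uses UTF-8 and an illustrative sample string; the errors="replace" argument is a standard option of decode() that substitutes U+FFFD for bytes that cannot be decoded.

```python
text = "café"

utf8 = text.encode("utf-8")
print(utf8)       # b'caf\xc3\xa9'
print(len(utf8))  # 5 (the accented character takes two bytes in UTF-8)

# The ASCII codec cannot represent the accented character.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)  # False

# Decoding with the matching codec restores the original string.
round_trip = utf8.decode("utf-8")
print(round_trip == text)  # True

# Decoding with the wrong codec can be made lossy instead of raising.
lossy = utf8.decode("ascii", errors="replace")
print(len(lossy))  # 5
```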
The hexadecimal string representation of a single byte requires two characters, so a hex representation of a bytes
string will be twice the length.
To convert from bytes to a hex representation use binascii.hexlify()
and from hex to bytes binascii.unhexlify()
.
For example, where x = b"hello"
binascii.hexlify(x)
gives b'68656c6c6f'
binascii.hexlify(x).decode()
gives '68656c6c6f'
The reverse process, if y = "68656c6c6f"
binascii.unhexlify(y.encode())
gives b'hello'
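As an alternative sketch, Python 3.5 and later also provide bytes.hex() and bytes.fromhex(), which work with str directly and so avoid the extra encode()/decode() step:

```python
import binascii

x = b"hello"

hex_text = x.hex()                  # bytes -> str of hex digits
print(hex_text)                     # 68656c6c6f

restored = bytes.fromhex(hex_text)  # str of hex digits -> bytes
print(restored)                     # b'hello'

# Both approaches produce the same result.
matches_binascii = hex_text == binascii.hexlify(x).decode()
print(matches_binascii)             # True
```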
The struct
module provides a way to convert data to/from C structs (or network data).
The key functions in this module are struct.pack()
and struct.unpack()
. In addition to the data, these functions require a format string to be provided to specify the byte order and the intended binary layout of the data.
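A minimal sketch of the two functions with a deliberately tiny format string (the values 258 and 7 are arbitrary). The leading character selects the byte order: ! for network (big-endian) and < for little-endian.

```python
import struct

# One unsigned 16-bit value followed by one unsigned byte, network byte order.
packed = struct.pack("!HB", 258, 7)
print(packed)  # b'\x01\x02\x07'

# unpack() always returns a tuple, even for a single field.
unpacked = struct.unpack("!HB", packed)
print(unpacked)  # (258, 7)

# The same 16-bit value in little-endian order has its bytes swapped.
little = struct.pack("<H", 258)
print(little)  # b'\x02\x01'
```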
Consider an IPv4 header. This structure contains some fields that are shorter than a byte (octet), e.g. the version field is 4 bits wide (aka a nibble). The smallest data unit struct can handle is a byte, so these fields must be packed together into larger data units and then extracted separately via bit shifting.
IPv4 Field | Format Character |
---|---|
Version and IHL | B |
Type of Service | B |
Total Length | H |
Identification | H |
Flags and Fragmentation Offset | H |
Time to Live | B |
Protocol | B |
Header Checksum | H |
Source Address | L |
Destination Address | L |
As this data should be in network byte order, we need to specify this with an exclamation mark, !
. The format string which represents an IPv4 header is therefore: !BBHHHBBHLL
.
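A quick sanity check on a format string is struct.calcsize(), which reports how many bytes it describes. With the ! prefix, struct uses standard sizes and no padding, so the result should match the 20 bytes of an IPv4 header without options. The struct.Struct object shown here is an optional convenience for reusing a format.

```python
import struct

fmt_string = "!BBHHHBBHLL"

size = struct.calcsize(fmt_string)
print(size)  # 20

# A compiled Struct can be reused and exposes the same size.
header = struct.Struct(fmt_string)
print(header.size)  # 20
```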
Below is an example of packing IPv4 fields into a bytes
object and hex stream:
import struct
import binascii
fmt_string = "!BBHHHBBHLL"
version_ihl = (4 << 4) | 4  # version in the high nibble, IHL in the low nibble
tos = 0
total_length = 100
identification = 42
flags = 0
ttl = 32
protocol = 6
checksum = 0xabcd
s_addr = 0x0a0b0c0d
d_addr = 0x01010101
ip_header = struct.pack(fmt_string,
                        version_ihl,
                        tos,
                        total_length,
                        identification,
                        flags,
                        ttl,
                        protocol,
                        checksum,
                        s_addr,
                        d_addr)
print(ip_header)
print(binascii.hexlify(ip_header).decode())
The output of this is:
b'D\x00\x00d\x00*\x00\x00 \x06\xab\xcd\n\x0b\x0c\r\x01\x01\x01\x01'
44000064002a00002006abcd0a0b0c0d01010101
The struct.unpack() function reverses this process:
ip_header_fields = struct.unpack(fmt_string, ip_header)
print(ip_header_fields)
The unpacked data is a tuple of the individual fields:
(68, 0, 100, 42, 0, 32, 6, 43981, 168496141, 16843009)
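The sub-byte fields can then be pulled out of the tuple with shifts and masks. The sketch below copies the tuple from the output above; the variable names are illustrative, and the use of the standard ipaddress module to print the addresses is an addition of mine. The IHL value of 4 comes from the example data; a real IPv4 header has an IHL of at least 5.

```python
import ipaddress

# Tuple copied from the unpacked output above.
fields = (68, 0, 100, 42, 0, 32, 6, 43981, 168496141, 16843009)

# First byte: version in the upper nibble, IHL in the lower nibble.
version_ihl = fields[0]
version = version_ihl >> 4
ihl = version_ihl & 0x0F

# Fifth field: 3 bits of flags followed by a 13-bit fragment offset.
flags_frag = fields[4]
flags = flags_frag >> 13
frag_offset = flags_frag & 0x1FFF

# The 32-bit addresses convert directly to dotted-quad notation.
src = str(ipaddress.IPv4Address(fields[8]))
dst = str(ipaddress.IPv4Address(fields[9]))

print(version, ihl)        # 4 4
print(flags, frag_offset)  # 0 0
print(src, dst)            # 10.11.12.13 1.1.1.1
```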