Binary blobs

This is a very short article on a neat feature of the GNU compile tools, that I don’t think many people are aware of. It is often required to include large sets of binary data with an executable. One option is to just include it as a separate data file. But there are cases where you might not have access to a storage medium. Another option is to convert it to a byte array and declare it as a constant array variable, but a large data set will quickly make your source files cumbersome to work with.

A neat feature of the GNU tools, is the ability to link a separate binary file right into the executable. This is ideal for our kernel project, where I included a font file as well a logo file, right into the kernel image file. The way this is done, is to use the objcopy tool, to convert the file into an object file as follows:

objcopy -I binary -O elf-littlearm32 -B arm ./bin/binfile.dat binfile.o

The resulting object file is then linked along with the other object files. The -I tells objcopy that our input file is in binary format. The -O indicates that our output format is elf-littlearm32. The -B indicates that our architecture is arm and the remaining two parameters are the input and out filenames.

Having a binary file linked into the executable is only half of the magic, though. We are not using it yet. The way we reference the data from our code, is a bit awkward at first. The linker resolves the memory occupied by our binary data to standard variables that we can reference from our code. To be exact, three variables are available, a start and end memory address pointer and a length variable. Using the command in previous example, results in the following variables:

unsigned char _binary_bin_binfile_dat_start
unsigned char _binary_bin_binfile_dat_end
unsigned int _binary_bin_binfile_dat_size

A little closer inspection should make it obvious how the linker comes up with variable names, based on the parameters we supplied. To see the variables that the linker provided from our previous example, run the following command:

objdump -x binfile.o

The variables _binary_bin_binfile_dat_start and _binary_bin_binfile_dat_end are pointers and should be used as follows:

unsigned char *ptr_start=(unsigned char*)&_binary_bin_binfile_dat_start;
unsigned char *ptr_end=(unsigned char*)&_binary_bin_binfile_dat_end;

And there you have it! A neat way to handle messy data sets without making your head explode.

Comments are closed.