Global, static string->data associative array for a C program

I wondered whether there is a simple way to have a static (as in static lifetime), read-only associative array in C, with strings as keys mapping to any kind of data, ideally without having to manage its lifetime manually. Basically, something similar to what one gets with standard static const arrays, like the following:

// in some header
extern const int month_days[];

// in some translation unit
const int month_days[] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

Turns out it's not too hard, actually, at least on systems with dynamic linkers. We need a dynamic symbol for each entry, which allows lookups via dlsym(3) that return a pointer to our data.

Staying with our example above, let's say we want to look up the same values by name:

const int month_day_jan = 31;
const int month_day_feb = 28;
const int month_day_mar = 31;
const int month_day_apr = 30;
const int month_day_may = 31;
const int month_day_jun = 30;
const int month_day_jul = 31;
const int month_day_aug = 31;
const int month_day_sep = 30;
const int month_day_oct = 31;
const int month_day_nov = 30;
const int month_day_dec = 31;

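Now, when linking, we need to declare those symbols as dynamic. This can be done, for example, with the --dynamic-list linker flag and a file like the following:

{
  month_day_jan;
  month_day_feb;
  month_day_mar;
  month_day_apr;
  month_day_may;
  month_day_jun;
  month_day_jul;
  month_day_aug;
  month_day_sep;
  month_day_oct;
  month_day_nov;
  month_day_dec;
};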

Now we can do lookups like the following, for example:

int x = *(int*)dlsym(NULL, "month_day_oct");

This is simple and straightforward, and doesn't come as a surprise - after all this is how object files are structured and how linkers work. However, I never considered (ab)using the dynamic linker as some sort of dynamic, associative array lookup equivalent.
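
The one-line lookup above dereferences whatever dlsym(3) returns, which crashes if the symbol was not exported. A slightly more defensive sketch could look like the following. The helper name lookup_int is hypothetical, and obtaining the handle via dlopen(NULL, ...) is a variation I am assuming here instead of passing NULL to dlsym(3) directly:

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Example entry. dlsym() can only find it if the linker put it into the
 * dynamic symbol table, e.g. via --dynamic-list or -rdynamic. */
const int month_day_oct = 31;

/* Hypothetical helper: look up a read-only int by its symbol name.
 * Returns NULL if the symbol cannot be found. */
const int *lookup_int(const char *name)
{
    assert(name != NULL);

    /* dlopen(NULL, ...) returns a handle for the main program. */
    void *self = dlopen(NULL, RTLD_LAZY);
    if (self == NULL)
        return NULL;

    /* dlsym() returns NULL when the name is unknown. */
    const int *value = dlsym(self, name);

    /* Only drops our reference; the main program is never unloaded,
     * so the returned pointer stays valid. */
    dlclose(self);
    return value;
}
```

A caller would then check the result before using it, for example: const int *d = lookup_int("month_day_oct"); if (d != NULL) { /* use *d */ }. Building it might look like cc -o demo demo.c -Wl,--dynamic-list=month_days.symlst, where both file names are assumptions; older glibc versions additionally need -ldl.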

The upsides are that the data is embedded, that its lifetime is static, that you can use strings as keys, that you don't have to do any memory management (like fill a map at startup and free it at the end), that dlsym(3) lookups are efficiently implemented (well, most likely at least), etc.

There are downsides, however:

  • the keys, being symbol names, are limited to alphanumeric characters and underscores and cannot start with a digit (some platforms might allow other characters)
  • they are potentially name-mangled by the compiler (use objdump -t to check)
  • the names must be globally unique and are subject to name clashes with other unrelated symbols
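
The first and last limitations can be softened a bit by mapping arbitrary keys onto valid, prefixed symbol names before the lookup. The following is only a sketch under my own assumptions: the helper key_to_symbol and the convention of replacing every other character with an underscore are hypothetical, not something the linker requires.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: write "<prefix><key>" into out, replacing every
 * character of key that is not alphanumeric or '_' with '_'.
 * The prefix is copied as-is and is assumed to be a valid start of a
 * symbol name; it also serves as a namespace against name clashes.
 * Returns 0 on success, -1 if out is too small. */
int key_to_symbol(const char *prefix, const char *key,
                  char *out, size_t out_size)
{
    assert(prefix != NULL && key != NULL && out != NULL);

    size_t prefix_len = strlen(prefix);
    size_t key_len = strlen(key);

    /* Need room for prefix + key + terminating NUL. */
    if (prefix_len >= out_size || key_len >= out_size - prefix_len)
        return -1;

    memcpy(out, prefix, prefix_len);
    for (size_t i = 0; i < key_len; i++) {
        unsigned char c = (unsigned char)key[i];
        /* isalnum() is locale-dependent; this assumes the "C" locale. */
        out[prefix_len + i] = (isalnum(c) || c == '_') ? (char)c : '_';
    }
    out[prefix_len + key_len] = '\0';
    return 0;
}
```

The resulting string can be passed to dlsym(3) as before. Note that the mapping is lossy: keys such as "a-b" and "a_b" end up as the same symbol name, so the keys still have to be chosen with some care.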

Let's look at another example: embedding binary data directly, without any C code, while making sure that the data ends up in the .rodata section, and generating the dynamic symbol list from our object file:

# with LLVM's ld you might need to pass -m explicitly, e.g. -m elf_amd64
ld -r -b binary -o bins.o file1.png folder_x/otherfile.txt
objcopy --rename-section .data=.rodata,contents,alloc,load,readonly bins.o
# generate dynamic symbol list
nm bins.o | awk 'BEGIN{print "{"}{print $3";"}END{print "};"}' > bins.symlst

This actually creates three symbols per file, each with a filename-based name plus a prefix and a suffix, all pretty self-explanatory: