<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title><![CDATA[0xpebbles.org]]></title>
    <link><![CDATA[http://blog.0xpebbles.org]]></link>
    <description><![CDATA[0xpebbles blog]]></description>
    <lastBuildDate>Fri, 09 Apr 2021 08:37:23 +0200</lastBuildDate>
    <pubDate>Fri, 09 Apr 2021 08:37:23 +0200</pubDate>
    <language>en</language>

<!-- 20200726 -->    <item>
      <title>Global, static string->data associative array for a C program</title>
      <link>http://blog.0xpebbles.org/Static-associative-array-for-a-C-program</link>
      <pubDate>26 Jul 2020 00:00:00 +0000</pubDate>
      <content:encoded><![CDATA[

  

<p>
I wondered whether there is a <em>simple</em> way to have a static
(as in static lifetime) read-only associative array in C, with <b>strings as keys</b>,
mapping to any kind of data, ideally without having to manually manage its
lifetime. So, basically something similar to what one gets with ordinary static,
const arrays, like the following:
</p>

<style type="text/css">
<!--
pre.c, pre.sh { font-family: monospace; color: #b2b2b2; background-color: #000000; }
.Type { color: #d7d787; }
.String { color: #8787d7; }
.Comment { color: #626262; }
.Constant { color: #87afff; font-weight: bold; }
.Special { color: #00d700; font-weight: bold; }
.Identifier { color: #ffd700; font-weight: bold; }
.Statement { color: #ff8700; font-weight: bold; }
-->
</style>

<pre class='c'>
<span class="Comment">// in some header</span>
<span class="Type">extern</span> <span class="Type">const</span> <span class="Type">int</span> month_days[];

<span class="Comment">// in some translation unit</span>
<span class="Type">const</span> <span class="Type">int</span> month_days[] = { <span class="Constant">31</span>, <span class="Constant">28</span>, <span class="Constant">31</span>, <span class="Constant">30</span>, <span class="Constant">31</span>, <span class="Constant">30</span>, <span class="Constant">31</span>, <span class="Constant">31</span>, <span class="Constant">30</span>, <span class="Constant">31</span>, <span class="Constant">30</span>, <span class="Constant">31</span> };
</pre>

<p>
Turns out it's not too hard, actually, at least on systems with dynamic linkers. We need a
dynamic symbol for each entry, which allows lookups via dlsym(3) that return a pointer to our data.
</p>

<p>
Staying with our example above, let's say we want to look up the same values by name:
</p>

<pre class='c'>
<span class="Type">const</span> <span class="Type">int</span> month_day_jan = <span class="Constant">31</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_feb = <span class="Constant">28</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_mar = <span class="Constant">31</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_apr = <span class="Constant">30</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_may = <span class="Constant">31</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_jun = <span class="Constant">30</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_jul = <span class="Constant">31</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_aug = <span class="Constant">31</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_sep = <span class="Constant">30</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_oct = <span class="Constant">31</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_nov = <span class="Constant">30</span>;
<span class="Type">const</span> <span class="Type">int</span> month_day_dec = <span class="Constant">31</span>;
</pre>

<p>
Now when linking, we need to declare those symbols to be dynamic. This can be done, for example, with
the --dynamic-list linker flag (with GCC or Clang as the driver, typically passed as -Wl,--dynamic-list=FILE) and a file like the following:
</p>

<pre>
{
	month_day_jan;
	month_day_feb;
	month_day_mar;
	month_day_apr;
	month_day_may;
	month_day_jun;
	month_day_jul;
	month_day_aug;
	month_day_sep;
	month_day_oct;
	month_day_nov;
	month_day_dec;
};
</pre>

<p>
Now we can do lookups like the following, for example:
</p>

<pre class='c'>
<span class="Type">int</span> x = *(<span class="Type">int</span>*)dlsym(<span class="Constant">NULL</span>, <span class="String">&quot;month_day_oct&quot;</span>);
</pre>

<p>
This is simple and straightforward, and doesn't come as a surprise - after all
this is how object files are structured and how linkers work. However, I never
considered (ab)using the dynamic linker as some sort of dynamic, associative
array lookup equivalent.
</p>
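
<p>
The one-line lookup shown earlier dereferences the result of dlsym(3) directly, which
crashes if the symbol cannot be found (for example because it was left out of the
dynamic list, or the key is misspelled). A minimal sketch of a checked variant follows;
lookup_int is a hypothetical helper name, and passing NULL as the handle simply mirrors
the lookup above (other platforms may want RTLD_DEFAULT instead):
</p>

<pre class='c'>
#include &lt;dlfcn.h&gt;
#include &lt;stddef.h&gt;

/* Hypothetical helper: returns the value of the const int symbol named
 * `symbol`, or `fallback` if the dynamic linker cannot resolve it. */
int lookup_int(const char *symbol, int fallback)
{
    /* NULL handle, as in the one-liner: search the global symbol scope. */
    const void *addr = dlsym(NULL, symbol);

    if (addr == NULL)
        return fallback; /* symbol not exported, or unknown key */

    return *(const int *)addr;
}
</pre>

<p>
With the month symbols exported as described, lookup_int(&quot;month_day_oct&quot;, -1) should
yield 31, while an unknown key yields the fallback instead of a crash.
</p>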

<p>
The upsides are that the data is embedded, that its lifetime
is static, that you can use strings as keys, that you don't have to do any
memory management (like fill a map at startup and free it at the end), that
dlsym(3) lookups are efficiently implemented (well, most likely at least), etc.
</p>

<p>
There are downsides, however:
</p>

<ul>
<li>the keys being symbol names are limited to alphanumeric characters and underscores, and cannot start with a number (some platforms might allow for other characters)</li>
<li>they are potentially name-mangled by the compiler (use objdump -t to check)</li>
<li>the names must be globally unique and are subject to name clashes with other unrelated symbols</li>
</ul>
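
<p>
The first and last points can be softened a bit by validating keys at run time and
putting a fixed prefix in front of them, as the month_day_ names already do. The
following is only a sketch under those assumptions; make_symbol_name and
key_is_symbol_safe are hypothetical names. Since the prefix supplies the first
character of the symbol, the key itself may then start with a digit:
</p>

<pre class='c'>
#include &lt;ctype.h&gt;
#include &lt;stdio.h&gt;

/* Returns 1 if `key` is non-empty and made only of letters, digits and
 * underscores (as classified by isalnum in the current locale), else 0. */
static int key_is_symbol_safe(const char *key)
{
    if (*key == '\0')
        return 0;
    for (const char *p = key; *p != '\0'; p++) {
        if (*p == '_' || isalnum((unsigned char)*p))
            continue;
        return 0;
    }
    return 1;
}

/* Writes prefix + key into `out`. Returns 0 on success, -1 if the key
 * contains unsupported characters or `out` is too small. */
int make_symbol_name(const char *prefix, const char *key,
                     char *out, size_t out_size)
{
    if (!key_is_symbol_safe(key))
        return -1;

    int n = snprintf(out, out_size, &quot;%s%s&quot;, prefix, key);
    if (n &lt; 0 || (size_t)n &gt;= out_size)
        return -1;
    return 0;
}
</pre>

<p>
The resulting string can then be handed to dlsym(3). A sufficiently specific prefix also
lowers the odds of clashing with unrelated symbols, although it cannot rule them out.
</p>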

<p>
Let's look at another example: embedding binary data directly, without using any C code,
while also making sure that the data ends up in the .rodata section, and creating the
dynamic symbol list from our object file:
</p>

<pre class='sh'>
<span class="Comment"># with LLVM's ld you might need to pass -m explicitly, e.g. -m elf_amd64</span>
ld <span class="Special">-r</span> <span class="Special">-b</span> binary <span class="Special">-o</span> bins.o file1.png folder_x/otherfile.txt
objcopy <span class="Special">--rename-section</span> .<span class="Identifier">data</span>=.rodata,contents,alloc,load,readonly bins.o
<span class="Comment"># generate dynamic symbol list</span>
nm bins.o | awk <span class="Statement">'</span><span class="String">BEGIN{print &quot;{&quot;}{print $3&quot;;&quot;}END{print &quot;};&quot;}</span><span class="Statement">'</span> <span class="Statement">&gt;</span> bins.symlst
</pre>

<p>
This actually creates 3 symbols per file, each with a filename-based symbol name
plus a prefix and a suffix, all pretty self-explanatory:
</p>

<pre>
_binary_file1_png_end
_binary_file1_png_size
_binary_file1_png_start
_binary_folder_x_otherfile_txt_end
_binary_folder_x_otherfile_txt_size
_binary_folder_x_otherfile_txt_start
</pre>
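
<p>
To get from an original file path back to its data at run time, the symbol name has to be
rebuilt the same way. The sketch below assumes the naming seen in the listing above (a
_binary_ prefix, every non-alphanumeric character of the path replaced by an underscore,
then _start or _end) and that the generated symbol list was passed to the linker;
binary_symbol_name and embedded_file are hypothetical helper names. The size is computed
as end minus start, which avoids relying on how the _size symbol gets resolved:
</p>

<pre class='c'>
#include &lt;ctype.h&gt;
#include &lt;dlfcn.h&gt;
#include &lt;stdio.h&gt;

/* Builds &quot;_binary_&lt;mangled path&gt;_&lt;suffix&gt;&quot; in `out`.
 * Returns 0 on success, -1 if `out` is too small. */
int binary_symbol_name(const char *path, const char *suffix,
                       char *out, size_t out_size)
{
    int n = snprintf(out, out_size, &quot;_binary_%s_%s&quot;, path, suffix);
    if (n &lt; 0 || (size_t)n &gt;= out_size)
        return -1;

    /* Mangle only the path part, which starts after &quot;_binary_&quot; (8 chars). */
    for (char *p = out + 8; *path != '\0'; p++, path++) {
        if (!isalnum((unsigned char)*p))
            *p = '_';
    }
    return 0;
}

/* Looks up an embedded file by the path it was given to ld. Returns a
 * pointer to its first byte and stores its length in *size, or returns
 * NULL if the symbols cannot be resolved. */
const unsigned char *embedded_file(const char *path, size_t *size)
{
    char name[256];

    if (binary_symbol_name(path, &quot;start&quot;, name, sizeof name) != 0)
        return NULL;
    const unsigned char *start = dlsym(NULL, name);

    if (binary_symbol_name(path, &quot;end&quot;, name, sizeof name) != 0)
        return NULL;
    const unsigned char *end = dlsym(NULL, name);

    if (start == NULL || end == NULL)
        return NULL;

    *size = (size_t)(end - start);
    return start;
}
</pre>

<p>
A call such as embedded_file(&quot;folder_x/otherfile.txt&quot;, &amp;len) would then look up
_binary_folder_x_otherfile_txt_start and _binary_folder_x_otherfile_txt_end.
</p>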

]]></content:encoded>
    </item>




  </channel>
</rss>

