\\\\                                                                     .
====--------------- ELF Files: Symbolic Troubles ~ DeLuks ---------------| >
////                                                                     '

INTRODUCTION:

Greetings young adventurer! So you have decided to learn the dark arts of
the ELF and are having troubles with the mysterious symbols? Well fear not,
for I am here to help you out! But first, let me clarify what those symbols
are.

PART I:_WHAT_EVEN_IS_A_SYMBOL?______________________________________________

A symbol is essentially a name that represents a function, variable or some
other entity in our code. These are used by our linker to resolve addresses
when combining different object files into one executable or shared library.
Symbols are stored in a so-called "symbol table", which the linker consults
to match every reference with its definition.

The simplest example of a symbol would be a global integer variable in C.
Say we have the following C code:

----------------------------------------------------------------------------
int ABC = 404;
----------------------------------------------------------------------------

Now, if we only compile the code with "gcc -c code.c", there will be an
entry in the symbol table with the name "ABC". Let's check it out with
readelf:

----------------------------------------------------------------------------
$ readelf -s code.o

Symbol table '.symtab' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS code.c
     2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 ABC <-----------.
-------------------------------------------------------------------------- |
                                                                           |
Do not be afraid of the complex output, everything will make sense later,  |
but as mentioned above, our beautiful new variable name will show up here.-'

Now let us go over the many types of symbols:

----------------------------------------------------------------------------
Names           Values
----------------------
STT_NOTYPE        00
STT_OBJECT        01
STT_FUNC          02
STT_SECTION       03
STT_FILE          04
STT_LOPROC        13
STT_HIPROC        15
----------------------------------------------------------------------------

Again, we have many funky words, but it will all make sense in a bit!

Let's start with the simplest one, the "NOTYPE". Its value is 0, and it just
means that the symbol's type is not specified. Simple, right?

Anyways on to the second type - the "OBJECT" type. It simply means that the
symbol is associated with a data object, like an array or simply a variable
like in our case. Do note that local (automatic) variables do not have
symbols, only global and static variables do (static variables declared
inside a function get a number appended to their name to tell apart two
variables with the same name).

Next, we got "FUNC", which just means that the symbol is associated with a
function or executable code, shrimple. This type has special significance in
shared object files (.so): when another object file references a function
from our shared object, the linker automatically does its magic by creating
a so-called "procedure linkage table entry" for the referenced symbol.

After that, we have the "SECTION" type. This one is self-explanatory, the
symbol is related to a section, like for example the .text section that
holds all the executable code. The "SECTION" symbol is essentially just here
to help with the relocations.

Right after, there is the "FILE" type. This one references the name of the
source file associated with our object file. In my case this is the "code.c"
file. As to why we need the source file, it's just for debugging purposes.
More on this one in a bit.

Finally we have the "LOPROC" and "HIPROC" types - these are values reserved
for processor-specific semantics.

PART II:_THE_TALE_OF_THE_SYMBOL_TABLE_______________________________________

Sooo now that we are familiar with symbols let's try to learn a little about
the "Symbol Table". Let's first clarify what it is and where it's located.

The symbol table is, as the name implies, a collection of symbol table
entries. The linker uses those entries to locate and relocate a program's
definitions and references. There are two sections that can hold a symbol
table: ".symtab", the full table used for static linking and debugging, and
".dynsym", the smaller table with only the symbols needed for dynamic
linking.

.-INFO:-What-was-the-difference-again?-------------------------------------.
|                                                                          |
| So imagine you want to work on a project of yours with a library. With   |
| static linking, the library code gets copied *into* your executable at   |
| link time, resulting in a larger binary. With dynamic linking, the       |
| library lives in a separate shared object (.so), and at run-time, when   |
| your program gets loaded, the shared object gets mapped into memory      |
| along with it.                                                           |
'--------------------------------------------------------------------------'

Now, each entry is defined like seen below:

----------------------------------------------------------------------------
typedef struct {
    Elf32_Word    st_name;   // index of the name in the string table
    Elf32_Addr    st_value;  // value or address of the symbol
    Elf32_Word    st_size;   // size of the symbol in bytes
    unsigned char st_info;   // binding (high 4 bits) and type (low 4 bits)
    unsigned char st_other;  // symbol visibility
    Elf32_Half    st_shndx;  // section header table index
} Elf32_Sym;
----------------------------------------------------------------------------

As you can see, we have a lot of members here. Let's look at them one by
one, starting with the "st_name" member.
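
One aside before we do: the readelf output from Part I prints 16-digit
values, so code.o is a 64-bit object and its entries are really "Elf64_Sym"
structs. Those carry the same six members, just in a different order. Here
is a minimal sketch that prints both layouts. It assumes a Linux system
whose <elf.h> declares both structs, and print_sym_layout is a made-up
helper name, not something from a library:

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Print one row: member name, offset in Elf32_Sym, offset in Elf64_Sym. */
#define ROW(member)                                      \
    printf("%-9s %5zu  %5zu\n", #member,                 \
           offsetof(Elf32_Sym, member), offsetof(Elf64_Sym, member))

/* Hypothetical helper: show where each member sits in both variants. */
void print_sym_layout(void)
{
    /* entry sizes fixed by the ELF specification */
    assert(sizeof(Elf32_Sym) == 16);
    assert(sizeof(Elf64_Sym) == 24);

    printf("%-9s %5s  %5s\n", "member", "Elf32", "Elf64");
    ROW(st_name);
    ROW(st_value);
    ROW(st_size);
    ROW(st_info);
    ROW(st_other);
    ROW(st_shndx);
}
```

Call print_sym_layout() from your own main(). It should show that the 64-bit
entry moves "st_info", "st_other" and "st_shndx" in front of "st_value", so
a parser has to pick the struct that matches the class of the file.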

The "st_name" member, as the spec says, holds an index into the object's
symbol string table, which holds the character representations of the
symbol names.

The next member in the struct is "st_value", which holds the value or the
address of the symbol: in a relocatable file like our code.o that is an
offset from the start of the symbol's section, while in executables and
shared objects it is a virtual address.

Following that we got "st_size", which, as the name already says, holds the
size of the symbol in bytes, that is *IF* it has one. It is 0 when the
symbol has no size or the size is unknown.

After that, we have a little more complex member called "st_info". It packs
two values into one byte: the binding of the symbol in the upper four bits
and the type of the symbol in the lower four bits. Since we talked about the
types we can skip those and talk about the bindings. The specification says
that "A symbol's binding determines the linkage visibility and behavior", or
in simpler terms, the binding tells the linker how the symbol should be
treated. To sum up, there are 5 possible values for the bindings:

----------------------------------------------------------------------------
Name          Value
------------------
STB_LOCAL       00
STB_GLOBAL      01
STB_WEAK        02
STB_LOPROC      13
STB_HIPROC      15
----------------------------------------------------------------------------

Okayyy, we are gonna keep this short: "LOCAL" means that the symbol is only
visible inside the object file that defines it, "GLOBAL" means that other
files can access the symbol, and "WEAK" is similar to "GLOBAL" but has a
lower precedence, so a "GLOBAL" definition with the same name wins over it.
The other two are reserved for processor-specific semantics. Also note that
in the symbol table all "LOCAL" symbols precede the "WEAK" and "GLOBAL"
ones.
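
That ordering rule is easy to check in code. Below is a small sketch, not a
full parser: it assumes the entries were already read into an array of
Elf32_Sym (taken from <elf.h> on Linux), and first_nonlocal is a made-up
name used only for illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/*
 * Sketch: return the index of the first non-LOCAL symbol in a table of
 * n entries, or n if every entry is LOCAL.
 */
size_t first_nonlocal(const Elf32_Sym *syms, size_t n)
{
    size_t i = 0;

    /* the binding lives in the upper four bits of st_info */
    while (i < n && (syms[i].st_info >> 4) == STB_LOCAL)
        i++;

    /* a well-formed table has no LOCAL symbol past this point */
    for (size_t j = i; j < n; j++)
        assert((syms[j].st_info >> 4) != STB_LOCAL);

    return i;
}
```

For a table shaped like the one from Part I (two "LOCAL" entries followed by
ABC) this returns 2. The ELF specification stores the same index in the
"sh_info" member of the symbol table's section header, so a real parser can
read it from there instead of scanning.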

Anyways, going back to the struct, we can unpack and repack the "st_info"
member using the following macros:

----------------------------------------------------------------------------
#define ELF32_ST_BIND(i)    ((i)>>4)
#define ELF32_ST_TYPE(i)    ((i)&0xf)
#define ELF32_ST_INFO(b, t) (((b)<<4)+((t)&0xf))
----------------------------------------------------------------------------

This can be useful when changing a symbol's binding, analyzing/developing
malware, patching binaries or just for modifying the symbol table for fun...

Neeeeeext we have the "st_other" member, which was originally undefined,
however as of the 2001 update of the ABI it defines the symbol visibility.
This field describes how the symbol is accessed and linked across different
components and whether it can be overwritten or preempted. There are four
possible values:

STV_DEFAULT   - 0 - default behaviour - global and weak symbols are
                    available to other modules; definitions in other
                    modules may preempt the symbol.
STV_INTERNAL  - 1 - processor-specific class.
STV_HIDDEN    - 2 - not visible to other modules; references from within
                    the module always resolve to it. <--------------------.
STV_PROTECTED - 3 - available to other modules, but references from       |
                    within the module always resolve to it.               |
.-------------------------------------------------------------------------'
NOTE: a hidden symbol must be removed or converted to STB_LOCAL by the
      linker when its relocatable object is included in an executable file
      or shared object.

And finally we have "st_shndx", which stores the index of the section
header for the section the symbol is defined in.
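
To see the macros and the visibility field in action, here is a short
sketch. It assumes a Linux system, where <elf.h> already defines
ELF32_ST_BIND, ELF32_ST_TYPE, ELF32_ST_INFO and ELF32_ST_VISIBILITY;
bind_name, set_binding and describe_symbol are hypothetical helpers, not
part of any library.

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>

/* Hypothetical helper: turn a binding value into the name readelf prints. */
const char *bind_name(unsigned char bind)
{
    switch (bind) {
    case STB_LOCAL:  return "LOCAL";
    case STB_GLOBAL: return "GLOBAL";
    case STB_WEAK:   return "WEAK";
    default:         return "OTHER";
    }
}

/* Hypothetical helper: repack st_info with a new binding, same type. */
unsigned char set_binding(unsigned char st_info, unsigned char bind)
{
    assert(bind <= 0xf);  /* the binding has to fit in four bits */
    return (unsigned char)ELF32_ST_INFO(bind, ELF32_ST_TYPE(st_info));
}

/* Hypothetical helper: print binding, type and visibility of one symbol. */
void describe_symbol(unsigned char st_info, unsigned char st_other)
{
    printf("bind=%s type=%d vis=%d\n",
           bind_name(ELF32_ST_BIND(st_info)),
           ELF32_ST_TYPE(st_info),
           ELF32_ST_VISIBILITY(st_other));
}
```

For ABC from Part I (OBJECT, GLOBAL, DEFAULT) "st_info" works out to
(1 << 4) + 1 = 0x11 and "st_other" to 0, so describe_symbol(0x11, 0) reports
a GLOBAL binding with type 1 and visibility 0, and
set_binding(0x11, STB_LOCAL) gives 0x01.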

Cool, now we can read this:

----------------------------------------------------------------------------
$ readelf -s code.o

Symbol table '.symtab' contains 3 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS code.c
     2: 0000000000000000     4 OBJECT  GLOBAL DEFAULT    2 ABC
----------------------------------------------------------------------------

But what's up with the first index being empty? Well, index 0 is reserved
for the undefined symbol (STN_UNDEF), and all of its members are zero. It
also works as a sanity check: if the first entry is empty we know we're
doing something right, and if it holds something else, we know we messed
something up.

By now you may have been wondering, "Great, but how do I find the symbol
table?". Well, do not worry, it's relatively simple, allow me to illustrate
the process with a graph:

----------------------------------------------------------------------------
 find section headers            find entry
.-------------------.       .-------------------.  ( be sure to check
| ElfN_Ehdr.e_shoff | ----> | ElfN_Shdr.sh_type |    if it's SYMTAB or
'-------------------'       '-------------------'    DYNSYM )
                                      |
                                      |          .---------------------.
                                      '--------> | ElfN_Shdr.sh_offset |
                                                 '---------------------'
                                             offset to symbol table in file
----------------------------------------------------------------------------

So as we can see, we get the offset of the section header table (e_shoff)
from the ELF header. From there we walk the section headers looking for one
whose sh_type is SHT_SYMTAB or SHT_DYNSYM, and once we find it we read its
sh_offset to get the file offset of the symbol table!

CONCLUSION:_________________________________________________________________

In this article we have checked out the symbol table, what symbols are, and
how they are used by the linker. I hope this article has helped you in some
way to understand symbols in ELF files. For more information be sure to
check out the references and if you still have questions, I'd be glad to
help in tmp.0ut's Discord server! )))

ASSIGNMENT:_________________________________________________________________

If you feel brave enough, you can pick your favorite language and write a
small program that will parse the symbol tables and display them in some
creative way.

REFERENCES:_________________________________________________________________

- https://www.man7.org/linux/man-pages/man5/elf.5.html
- http://flint.cs.yale.edu/cs422/doc/ELF_Format.pdf
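
P.S. If you want a head start on the assignment, the lookup from the graph
in Part II can be sketched roughly like this. It is only a starting point
under a few loud assumptions: the whole file is already in memory, it is a
64-bit ELF with the same byte order as the host, <elf.h> comes from a Linux
system, and files with more than 65279 sections (which use an extended
section count) are ignored. find_symtab_offset is a made-up name.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Sketch: return the file offset of the first section whose type is
 * `wanted` (SHT_SYMTAB or SHT_DYNSYM), or 0 if there is none or the
 * image looks malformed.
 */
uint64_t find_symtab_offset(const unsigned char *img, size_t len,
                            uint32_t wanted)
{
    Elf64_Ehdr eh;
    Elf64_Shdr sh;

    assert(wanted == SHT_SYMTAB || wanted == SHT_DYNSYM);

    if (len < sizeof eh)
        return 0;
    memcpy(&eh, img, sizeof eh);      /* memcpy sidesteps alignment issues */

    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof sh ||
        eh.e_shoff > len)
        return 0;

    /* step 1: ElfN_Ehdr.e_shoff says where the section headers start */
    for (uint16_t i = 0; i < eh.e_shnum; i++) {
        uint64_t pos = eh.e_shoff + (uint64_t)i * eh.e_shentsize;

        if (pos > len || len - pos < sizeof sh)
            return 0;                 /* header lies outside the image */
        memcpy(&sh, img + pos, sizeof sh);

        /* step 2: check ElfN_Shdr.sh_type */
        if (sh.sh_type == wanted)
            return sh.sh_offset;      /* step 3: ElfN_Shdr.sh_offset */
    }
    return 0;
}
```

From the returned offset, the entries are laid out back to back; the same
section header also gives the total size of the table (sh_size) and the
size of one entry (sh_entsize), which is all you need to loop over them.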