::                                                                           ::
::           Weird ELFs, or a tale of breaking parsers once again            ::
::                                                                           ::
::                                                                   g1inko  ::
::                              [ 0x00: intro ]                              ::

Some time ago i enjoyed a speedrun of LiveOverflow videos, and ideas from
one of them [1] haunted me until my curiosity took over and i finally
started my own research. The point was simple: feed a tool an ELF corrupted
in such a way that its parser breaks, while the kernel still runs the
executable successfully. I decided to dig a bit deeper and not just get
broken-but-runnable ELF files (that has been done many times before me
already), but also understand the parser problem more thoroughly and
broadly. And here i want to share my story.

::                                                                           ::
::----------------------------[ C O N T E N T S ]----------------------------::
::                                                                           ::

    0x01: langsec
          langsec+0x10: a note on polyglots
          langsec+0x20: parse tree differentials
    0x02: ELF parsers
          ELF parsers+0x10: defining targets
    0x03: finding differentials
          finding differentials+0x10: Linux
          finding differentials+0x20: gdb
          finding differentials+0x30: edb and CVE-2023-27734
          finding differentials+0x40: r2hang
    0x04: .atexit
    0xfe: thanks
    0xff: references

::                                                                           ::
::-----------------------------[ 0x01: langsec ]-----------------------------::
::                                                                           ::

Research dedicated specifically to parser problems started with the birth
of langsec [2], and many parser bugs have been discovered and discussed
ever since. Langsec studies parser problems of different kinds, namely the
places where the theory (i.e. the specs) and the implementations differ.

Be liberal in what you accept, they say [3]. That is, some deviations from
the standard are ok as long as the context is enough to recover the meaning
of the input data. In other words, a parser can fill the gaps and assume
the missing data as it sees fit, sometimes being much more liberal than the
spec allows. And, sometimes, different parsers fill the gaps in different
ways.

This leads to interesting outcomes and makes funny things possible. Here
we'll take a look at two of them: polyglot files and parse tree
differentials.
::
`------------------> langsec+0x10: a note on polyglots

Polyglot files happen when a standard or its implementation is loose enough
to embed another file into it, so that two parsers of two different formats
can both successfully process such a file.

As corkami mentions [4], PDF allows a lot of liberty in this sense, and he
uses PDF a lot in his own polyglot PoCs. One of my favourites is a PDF
that, when run, acts as a python web-server with a JS compiler and much
more [5]. It's much fun to reverse that :D

Another example i'd like to mention is an αcτµαlly pδrταblε εxεcµταblε [6].
Its author managed to mix PE, ELF, Mach-O, sh and a bare bootsector into
one, so that a single file is able to run in all those environments.
Incredible research, may i say.

::
`------------------> langsec+0x20: parse tree differentials

This type of parser issue becomes possible when two implementations of the
same standard behave differently in some cases. During my research, i
gathered several examples i find the most demonstrative.

The first one i ought to mention is the bugs in X.509 certificate parsers
described in 2010 by langsec pioneers Len Sassaman and Meredith Patterson,
who wondered: what if we put a NULL byte in the CN field [7]? This turned
out to be a phisher's dream, as the browsers under test showed the user
only the part of the domain before the NULL byte, while the domain might be
something like:

    "www.bank.com\x00badguy.com"

with a completely valid certificate.

Another example i like is about XML. XML is an amazing format to find bugs
around: it looks so simple to humans yet is so hard to parse correctly. A
security researcher once asked: `Find me two different XML parsers that
always, for every input, result in the same output' [8]. What could
possibly go wrong if it was used to hold entitlements info in PLIST files?
Even more, what could ever go wrong if we had four XML parsers in the
system, one of which operated in kernel space?
This bug could be used for a sandbox escape in iOS. And at that time, the
incredible solution was to introduce a fifth PLIST parser, making the
researcher who found the bug even happier [9, section 5]. Later, however,
they decided to move to a binary DER format [10].

Yet another amusing parser issue is named Sophail, after Sophos AV, and
relates to the fact that the AV skipped further checks on an ELF if it was
of one specific architecture [11].

As for breaking analysis in practice, a crash in x64dbg caused by
insufficient checks on some PE header data was said to be used by malware
in the wild [12].

::                                                                           ::
::---------------------------[ 0x02: ELF parsers ]---------------------------::
::                                                                           ::

Now that we've come closer to ELF parsers, let's see what has been done
here already.

Back in 2012 nitr0us performed similar research [13] on how to make an ELF
debug-resistant, though not through runtime checks: ways of doing the
latter were already known and thus not that interesting. nitr0us managed to
break gdb 7.5.1 and IDA Pro 6.3 with an ELF that ran perfectly; a bug in
the OpenBSD ELF loader was also found [14].

::
`------------------> ELF parsers+0x10: defining targets

Now it's time to clarify the task. At first i wanted to get an ELF that
would run but would fail to open in gdb, edb-debugger and radare2. Later,
however, it turned out to be about debugging the debuggers (much fun, i
must admit!) and fixing the bugs i found, since all the tools are open
source.

::                                                                           ::
::----------------------[ 0x03: finding differentials ]----------------------::
::                                                                           ::

The easiest way to find a differential is by fuzzing. There are several
approaches that could be used:

  1. A 30-line python script.
     This is what LiveOverflow used in his video, and just the same logic
     helped in finding several r2 bugs published in tmp.0ut issue 1 [15].

  2. AFL++ or LibFuzzer.
     They are good when targeting one particular piece of software, but
     that's not what i wanted (which was to get a bunch of corrupted ELF
     files to analyze with all the tools in question).

  3. Some ELF-aware fuzzer that generates ELFs corrupted in a special way.
     The OS is known to ignore section header info, but debuggers and
     disassemblers parse it to name functions and sections, or to apply
     DWARF data if any. Section headers are exactly what could be targeted
     to get a runnable ELF that would not analyze.

I found a fuzzer authored by nitr0us, already familiar to us. The fuzzer,
proudly named Melkor [16], can corrupt specific parts of ELF files, which
is just what i needed. So, i generated a hundred ELF files with a broken
Section Header Table (SHT) out of a template ELF with the following
command:

    ./melkor -Sn100 templates/foo

I decided to generate only 100 ELF files with a broken SHT so that the set
of inputs stays small enough to go through by hand if some cases need
manual analysis.

::
`------------------> finding differentials+0x10: Linux

10 of the generated files segfaulted when run. In most cases this was
because the requested interpreter was the binary itself:

    $ readelf -l orcs_foo/orc_0013

    Elf file type is DYN (Position-Independent Executable file)
    Entry point 0x1100
    There are 13 program headers, starting at offset 64

    Program Headers:
      Type           Offset             VirtAddr           PhysAddr
                     FileSiz            MemSiz              Flags  Align
      PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                     0x00000000000002d8 0x00000000000002d8  R      0x8
      INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                     0x000000000000001c 0x000000000000001c  R      0x1
          [Requesting program interpreter: orcs_foo/orc_0013]
      LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                     0x00000000000008b0 0x00000000000008b0  R      0x1000
      . . .

However, this was not the only cause.
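
Before moving on to the other causes: the self-referencing interpreter can
also be spotted without readelf. Below is a minimal sketch in python,
assuming a 64-bit little-endian ELF and assuming that a relative PT_INTERP
path is resolved against the current directory. read_interp(),
requests_itself() and make_elf() are hypothetical names made up for this
illustration; they are not part of Melkor or any other tool mentioned here.

```python
import os
import struct

PT_INTERP = 3  # program header type that names the requested interpreter


def read_interp(data):
    """Return the PT_INTERP path of a 64-bit little-endian ELF, or None."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a 64-bit little-endian ELF")
    # Elf64_Ehdr: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # Elf64_Phdr: p_type at 0x00, p_offset at 0x08, p_filesz at 0x20
        (p_type,) = struct.unpack_from("<I", data, base)
        (p_offset,) = struct.unpack_from("<Q", data, base + 0x08)
        (p_filesz,) = struct.unpack_from("<Q", data, base + 0x20)
        if p_type == PT_INTERP:
            raw = data[p_offset:p_offset + p_filesz]
            # the path is NUL-terminated; ignore anything after the first NUL
            return raw.split(b"\x00", 1)[0].decode("latin-1")
    return None


def requests_itself(path, data):
    """True if the ELF at `path` names itself as its own interpreter."""
    interp = read_interp(data)
    if interp is None:
        return False
    # assumption: relative interpreter paths are relative to the cwd
    return os.path.abspath(interp) == os.path.abspath(path)


def make_elf(interp):
    """Build a tiny synthetic ELF64 image holding one PT_INTERP header."""
    path = interp + b"\x00"
    ehdr = bytearray(64)
    ehdr[:6] = b"\x7fELF\x02\x01"                  # magic, ELFCLASS64, LSB
    struct.pack_into("<Q", ehdr, 0x20, 64)         # e_phoff: right after ehdr
    struct.pack_into("<HH", ehdr, 0x36, 56, 1)     # e_phentsize, e_phnum
    phdr = bytearray(56)
    struct.pack_into("<I", phdr, 0x00, PT_INTERP)  # p_type
    struct.pack_into("<Q", phdr, 0x08, 64 + 56)    # p_offset: after the phdr
    struct.pack_into("<Q", phdr, 0x20, len(path))  # p_filesz
    return bytes(ehdr + phdr + path)
```

On a real sample this would be used as
requests_itself(path, open(path, "rb").read()); make_elf() only exists so
the sketch can be tried without a Melkor-generated file. Note that the
sketch trusts e_phoff, e_phnum and p_offset almost blindly (a truncated
file just raises struct.error), so it is exactly the kind of liberal parser
this article is about.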
In a couple of ELFs Melkor broke the PLT (procedure linkage table), so that
any call to an external function from such an ELF would segfault: instead
of jumping to .got.plt, the execution would jump to an ASCII smiley in the
.rodata section.

6 more files wouldn't run because their dynamic dependencies were messed
up:

    Testing binary: orcs_foo/orc_0050
    orcs_foo/orc_0050: error while loading shared libraries: lib %x .6: cannot \
    open shared object file: No such file or directory

So, i was left with 84 files that run. Do they break debuggers?

::
`------------------> finding differentials+0x20: gdb

gdb is your friend #1 if you want to prevent analysis. It is a mature
debugger that performs many sanity checks and refuses to analyze a
malformed file. In 98 cases it said:

    not in executable format: file format not recognized

And in one:

    Can't read symbols from .../orcs_foo/orc_0063: bad value

In the end, gdb agreed to open and run only one of the hundred broken
files.

::
`------------> finding differentials+0x30: edb and CVE-2023-27734

edb segfaulted 55 times and was killed with SIGFPE (Floating point
exception) 9 times. Both bugs originated in the code responsible for symbol
parsing, namely the BinaryInfo edb plugin.

The segfault happened while looking for specific sections: the section name
string pointer (more precisely, an offset into the section names section,
.shstrtab) was not checked to lie within the file.

In the case of the FPE, edb tried to iterate through the symbol tables.
Before walking a table, it needed to calculate the number of entries in it.
This was done by dividing the section size by its entry size, with no check
whether the entry size was zero. SIGFPE, despite its name, is generated for
an integer division by zero as well.

Both bugs were fixed [17], and for some reason only one of them got
assigned CVE-2023-27734.

::
`--------------> finding differentials+0x40: r2hang

r2 did not crash on any file.
When debugging, it reported segfaults for the binaries that also segfaulted
when executed on Linux. But this section wouldn't exist if there were no
problems with r2, would it? :)

About a quarter of the executables made r2 hang, with 100% usage of a CPU
core, when just opening them for analysis. Further investigation included
debugging r2 and some binary diffing, and revealed that r2 got stuck in a
loop when trying to locate plt_addr within .plt.got in
get_import_addr_x86_manual(), and that Melkor had created this section with
a reported size of 0x4242424258aebf0b. In the file tested, r2 iterated from
the start of .plt.got (0x10f0 in this case) about this many times:

    (0x4242424258aebf0b - 0x10f0)/8 = 596806425961158083

Notably, after about an eternity or two r2 would happily proceed to the
next step of binary analysis. Still, the bug is enough to prevent the
analysis.

Although the iteration goes far beyond the file size, there were no buffer
overruns that could eventually crash r2 by running past the mapping, as
happened with the 'out-of-file' reads in edb. This was because the reading
was done via r2 wrappers that are aware of the file size, so r2 just kept
reading 0 bytes from the file buffer.

This was also fixed, and the fix was included in the 5.8.4 release [18].

::                                                                           ::
::-----------------------------[ 0x04: .atexit ]-----------------------------::
::                                                                           ::

Finding bugs in parsers is fun. Finding bugs in ELF parsers is even more
fun, as it's a way to learn more about ELF itself and about its loading,
and also to contribute to open source and make things better!

::                                                                           ::
::------------------------------[ 0xfe: thanks ]-----------------------------::
::                                                                           ::

my respects to liveoverflow and corkami (and the whole researchers
community!) for sharing the knowledge, langsec guys for opening my eyes to
see parser bugs from a different angle, and tmp.0ut crew for this amazing
zine! :^)

::                                                                           ::
::----------------------------[ 0xff: references ]---------------------------::
::                                                                           ::

  1. Uncrackable Program?
     Finding a Parser Differential in loading ELF
     (https://www.youtube.com/watch?v=OZvc-c1OLnM)
  2. https://langsec.org
  3. https://datatracker.ietf.org/doc/html/rfc761#section-2.10
  4. https://github.com/corkami/docs/blob/master/AbusingFileFormats/README.md#specific-examples
  5. PoC‖GTFO Issue 0x16
     (https://raw.githubusercontent.com/angea/pocorgtfo/master/contents/articles/16-12.pdf)
  6. https://justine.lol/ape.html
  7. Towards a formal theory of computer insecurity: a language-theoretic
     approach
     (https://www.youtube.com/watch?v=AqZNebWoqnc)
  8. XMPP Stanza Smuggling or How I Hacked Zoom
     (https://youtu.be/ERaRNsvCBrw?t=467)
  9. "Psychic Paper"
     (https://blog.siguza.net/psychicpaper/)
 10. DER Entitlements: The (Brief) Return of the Psychic Paper
     (https://googleprojectzero.blogspot.com/2023/01/der-entitlements-brief-return-of.html)
 11. Sophail: A Critical Analysis of Sophos Antivirus, Tavis Ormandy
     (https://www.youtube.com/watch?v=EnotiUfBaW4&t=3078s)
 12. Malware Samples Crashing x64dbg Fixed!
     (https://www.youtube.com/watch?v=FNuFlhnfZQU)
 13. Striking Back GDB and IDA debuggers through malformed ELF executables
     (https://ioactive.com/striking-back-gdb-and-ida-debuggers-through-malformed-elf-executables/)
 14. https://ioactive.com/pdfs/IOActive_Advisory_OpenBSD_5_5_Local_Kernel_Panic.pdf
 15. Fuzzing Radare2 For 0days In About 30 Lines Of Code
     (https://tmpout.sh/1/5.html)
 16. https://github.com/IOActive/Melkor_ELF_Fuzzer
 17. https://github.com/eteran/edb-debugger/pull/834
 18. https://github.com/radareorg/radare2/pull/21423