::                                                        ::
::  Weird ELFs, or a tale of breaking parsers once again  ::
::                                                        ::


                            :: [ 0x00: intro ] ::

Some time ago i enjoyed a speedrun  of  LiveOverflow videos, and ideas from one
of them [1] haunted  me until my  curiosity took over and  i finally started my
own research.  The  point was simple: feed a program an ELF corrupted so that a
parser breaks, but the kernel nevertheless successfully runs the executable.

I decided to dig a bit deeper and not just get  broken-but-runnable ELF files--
it has been done many times before me already--but also understand the problem
of parsers more fully and broadly. And here i want to share my story.

::                                                                           ::
::----------------------------[ C O N T E N T S ]----------------------------::
::                                                                           ::

    0x01: langsec
          langsec+0x10: a note on polyglots
          langsec+0x20: parse tree differentials
    0x02: ELF parsers
          ELF parsers+0x10: defining targets
    0x03: finding differentials
          finding differentials+0x10: Linux
          finding differentials+0x20: gdb
          finding differentials+0x30: edb and CVE-2023-27734
          finding differentials+0x40: r2hang
    0x04: .atexit
    0xfe: thanks
    0xff: references

::                                                                           ::
::-----------------------------[ 0x01: langsec ]-----------------------------::
::                                                                           ::

Research dedicated  to parser  problems specifically  started with the birth of
langsec [2], and many parser bugs were discovered and discussed ever since.

Langsec studies different parser problems, namely where the theory (i.e. specs)
and implementations differ.  Be liberal in what you receive, they say [3]. That
is, some deviations from the standards are ok  as long as the context is enough
to recover the meaning of the input data. In other words, a parser can fill the
gaps and presume the missing data as it wants, sometimes being much more
liberal than the spec tells it to be. And, sometimes, different parsers fill
the gaps in different ways.

This leads to interesting outcomes and makes funny things possible. Here we'll
take a look at two of them: polyglot files and parse tree differentials.

 `------------------> langsec+0x10: a note on polyglots

Polyglot files happen when a standard or its implementation is loose enough to
embed another file into it,  so that two parsers  of two different  formats can
successfully operate on such a file. As corkami mentions [4], PDF allows much
liberty in this sense,  and he uses PDF a lot in his own PoCs of polyglots. One
of my favourites is a PDF that, when run, acts as a python web-server with a JS
compiler and many more [5]. It's much fun to reverse that :D

Another example i'd like to mention is an αcτµαlly pδrταblε εxεcµταblε [6]. Its
author managed to mix PE, ELF, Mach-O, sh and bare bootsector into one, so that
a single file is able to run in those environments. Incredible research, may i
say.

 `------------------> langsec+0x20: parse tree differentials

This type of parser issue becomes possible when two implementations of the
same standard behave differently in some cases. During my research, i
gathered several examples i find the most demonstrative.

The first one i ought to mention is the bugs in X.509 certificate parsers
described in 2010 by langsec pioneers Len Sassaman and Meredith Patterson, who
wondered: what if one puts a NULL-byte in a CN field [7]? This turned out to
be a phisher's dream, as the browsers under test showed the user only the part
of the domain prior to the NULL-byte, while the domain might be something like:


with a completely valid certificate.
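
To make the differential concrete, here is a minimal Python sketch. The CN
value is hypothetical (both .example domains are placeholders, not taken from
the talk), and the two functions only model two ways of reading the same
field: as a length-delimited string, and as a NUL-terminated C string.

```python
# Hypothetical CN bytes; the domains are placeholders for illustration only.
cn_bytes = b"trusted.example\x00.attacker.example"


def cn_length_delimited(raw: bytes) -> str:
    """Read the whole field, as a parser that trusts the encoded length."""
    return raw.decode("ascii")


def cn_c_string(raw: bytes) -> str:
    """Read the field as a C string: everything after the first NUL is lost."""
    return raw.split(b"\x00", 1)[0].decode("ascii")


full_view = cn_length_delimited(cn_bytes)
short_view = cn_c_string(cn_bytes)
print(repr(full_view))
print(repr(short_view))
```

A parser of the first kind sees the whole name, while one of the second kind
sees only the part before the NULL-byte. Same bytes, two different parse
results.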

Another example i like  is  about XML.  XML is  an  amazing format to find bugs
around: it's generally so simple for humans yet so hard to be parsed correctly.
A security  researcher once asked:  `Find me two  different  XML  parsers  that
always, for every input, result in the same output' [8]. What could possibly go
wrong if it was used to hold entitlements info in PLIST files?  Even more, what
could ever go wrong  if  we had four XML parsers  in  the system,  one of which 
operated in kernel space?

This bug could be used for  a  sandbox  escape  in  iOS.  And  at that time, an
incredible solution  was  to  introduce  the  fifth  PLIST  parser,  making the
researcher who found the bug even happier [9, section 5]. Later, however,
they decided to move to a binary DER format [10].

Yet another  amusing  parser  issue  is  named  Sophail,  after  Sophos AV, and
relates to the fact that the AV  skipped  further checks on an ELF if it was of 
one specific architecture [11].

As for breaking analysis in  practice,  a  crash due to insufficient checks on 
some PE header data in x64dbg was said to be used by malware in the wild [12].

::                                                                           ::
::---------------------------[ 0x02: ELF parsers ]---------------------------::
::                                                                           ::

Now that we've come closer to ELF parsers, let's see what has been done there
already. Back in 2012 nitr0us performed similar research [13] on how to make an
ELF debug-resistant, though not by runtime checks, as ways to do the latter
were known already and thus not that interesting.  nitr0us managed to break gdb
7.5.1 and IDA Pro 6.3 with an ELF that ran perfectly;  and a bug in OpenBSD ELF
loader was also found [14].

 `------------------> ELF parsers+0x10: defining targets

Now it's time to clarify the task.  First i wanted to get an ELF that would run
but would fail to open in gdb, edb-debugger, and radare2. Later, however, it
turned out to be about debugging the debugger (much fun, i must admit!) and
fixing the bugs i found, since all the tools are open source.

::                                                                           ::
::----------------------[ 0x03: finding differentials ]----------------------::
::                                                                           ::

The easiest way  to  find  a  differential  is  by  fuzzing.  There are several
approaches that could be used:

1. A 30-line Python script. This is what LiveOverflow used in his video, and
   the very same logic helped in finding several r2 bugs published in tmp.0ut
   issue 1 [15].

2. AFL++ or libFuzzer. They are good when targeting a particular program, but
   that's not what i wanted (which was to get a bunch of corrupted ELF files to
   analyze with all the tools in question).

3. Some ELF-aware fuzzer that generates ELFs corrupted in a special way. The OS
   is known to ignore section headers info, but debuggers and disassemblers
   parse it to name functions, sections, or apply DWARF data if any. Section
   headers are exactly what could be targeted to get a runnable ELF that would
   not
I found a fuzzer authored by nitr0us, already familiar to us. The fuzzer,
proudly named Melkor [16], can corrupt specific parts of ELF files, which is
just what i needed. So, i generated a hundred ELF files with a broken Section
Header Table (SHT) out of a template ELF with the following command:

    ./melkor -Sn100 templates/foo

I decided to generate only 100 ELF files with broken SHT so that i have a set
of inputs small enough to handle if i need to analyze some cases manually.
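
What "corrupting only the SHT" means can be sketched in a few lines of Python.
This is not Melkor's logic (Melkor has its own fuzzing rules); it is a naive
illustration that assumes a little-endian ELF64 file and overwrites random
bytes inside the section header table. The names sht_range and corrupt_sht
are made up for this sketch.

```python
import random
import struct


def sht_range(elf: bytes) -> tuple[int, int]:
    """Return (offset, size) of the section header table of an ELF64 LE file."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    return e_shoff, e_shentsize * e_shnum


def corrupt_sht(elf: bytes, writes: int, seed: int) -> bytes:
    """Overwrite `writes` random bytes, touching nothing outside the SHT."""
    off, size = sht_range(elf)
    end = min(off + size, len(elf))
    if off >= end:
        raise ValueError("no section header table inside the file")
    rng = random.Random(seed)  # seeded, so a given orc can be regenerated
    out = bytearray(elf)
    for _ in range(writes):
        out[rng.randrange(off, end)] = rng.randrange(256)
    return bytes(out)
```

Since the program headers and the code are left alone, the kernel's view of
such a file does not change; only the tools that read section headers see
something different.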

 `------------------> finding differentials+0x10: Linux

10 of the generated files segfaulted when run. In most cases this was because
the requested interpreter was the binary itself:

    $ readelf -l orcs_foo/orc_0013

    Elf file type is DYN (Position-Independent Executable file)
    Entry point 0x1100
    There are 13 program headers, starting at offset 64

    Program Headers:
    Type           Offset             VirtAddr           PhysAddr
                    FileSiz            MemSiz              Flags  Align
    PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                    0x00000000000002d8 0x00000000000002d8  R      0x8
    INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                    0x000000000000001c 0x000000000000001c  R      0x1
        [Requesting program interpreter: orcs_foo/orc_0013]
    LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                    0x00000000000008b0 0x00000000000008b0  R      0x1000
    . . .
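
The same check can be scripted instead of eyeballing readelf output. Below is
a minimal sketch, assuming a little-endian ELF64 file; interp_path and
interp_is_self are hypothetical helper names, and the offsets follow the
ELF64 header and program header layout.

```python
import os
import struct

PT_INTERP = 3  # program header type of the interpreter request


def interp_path(elf: bytes):
    """Return the PT_INTERP string of an ELF64 LE image, or None if absent."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", elf, base)
        if p_type == PT_INTERP:
            (p_filesz,) = struct.unpack_from("<Q", elf, base + 0x20)
            raw = elf[p_offset:p_offset + p_filesz]
            # The path is NUL-terminated inside the segment.
            return raw.split(b"\x00", 1)[0].decode("utf-8", "replace")
    return None


def interp_is_self(path: str) -> bool:
    """True if the file asks for itself as its own interpreter."""
    with open(path, "rb") as f:
        interp = interp_path(f.read())
    if interp is None:
        return False
    return os.path.realpath(interp) == os.path.realpath(path)
```

Relative interpreter paths are resolved against the current directory here,
so the script should be run from the same place as the binaries were.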

However, this was not the only cause. In a couple of ELFs Melkor broke the PLT
(procedure linkage table), so that any call to an external function from such
an ELF would segfault: instead of jumping to .got.plt, the execution would
jump to an ASCII smiley in the .rodata section.

6 more files wouldn't run because their dynamic dependencies were messed up:

    Testing binary: orcs_foo/orc_0050
    orcs_foo/orc_0050: error while loading shared libraries: lib %x .6: cannot \
                             open shared object file: No such file or directory

So, i was left with 84 files that run. Do they break debuggers?
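
Sorting a hundred files by hand is tedious, so this triage can be scripted. A
minimal sketch, assuming that a healthy binary exits with status 0 (the real
exit status of the template is not stated here); classify_run is a made-up
name:

```python
import signal
import subprocess


def classify_run(argv, timeout=5.0):
    """Run a command and describe how it ended."""
    try:
        proc = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    except OSError:
        return "exec failed"  # e.g. the kernel refused to load the file
    rc = proc.returncode
    if rc < 0:
        # On POSIX a negative return code means "killed by signal -rc".
        try:
            return "killed by " + signal.Signals(-rc).name
        except ValueError:
            return "killed by signal %d" % -rc
    return "ok" if rc == 0 else "exit %d" % rc
```

Segfaults show up as "killed by SIGSEGV", while loader errors like the one
above show up as a non-zero exit status. The timeout also makes the same
function usable against tools that hang instead of crashing.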

 `------------------> finding differentials+0x20: gdb

gdb is your friend #1 if you want to prevent analysis.  It is a mature debugger
that performs many sanity checks and refuses to analyze a malformed file. In 98
cases it said:

    not in executable format: file format not recognized

And in one:

    Can't read symbols from .../orcs_foo/orc_0063: bad value

Eventually, gdb agreed to open and run only one file of the broken hundred.

 `------------> finding differentials+0x30: edb and CVE-2023-27734

edb segfaulted 55 times and was killed with SIGFPE (Floating point exception) 9
times. Both bugs originated in the code responsible for symbol parsing, namely
the BinaryInfo edb plugin.

In the first case,  the  section name string pointer (more precisely, an offset
into the section names section, .shstrtab)  was  not  checked to lie within the
file when looking for specific sections. In the case of the FPE, edb tried
iterating through symbol tables. Prior to walking a table, it needed to
calculate the number of entries in it. This was done by dividing the section
size by its entry size with no check whether the entry size was zero. SIGFPE,
despite its name, is also raised on an integer division by zero.
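
Both missing checks are tiny. The sketch below is not the actual edb patch
(edb is written in C++); it shows the same two ideas in Python, with
hypothetical function names:

```python
def entry_count(sh_size: int, sh_entsize: int) -> int:
    """Number of entries in a table section; 0 if the entry size is bogus."""
    if sh_entsize == 0:
        return 0  # dividing here is what ends in SIGFPE in native code
    return sh_size // sh_entsize


def section_name(shstrtab: bytes, sh_name: int):
    """Return the name at offset sh_name, or None if the offset is invalid."""
    if not 0 <= sh_name < len(shstrtab):
        return None  # the offset points outside of .shstrtab
    end = shstrtab.find(b"\x00", sh_name)
    if end == -1:
        return None  # unterminated string at the end of the table
    return shstrtab[sh_name:end].decode("ascii", "replace")
```

A parser that treats both values as attacker-controlled simply skips such
sections and goes on with the rest of the file.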

Both bugs were fixed [17], and for some reason only one of them got assigned
a CVE.

 `--------------> finding differentials+0x40: r2hang

r2 did not crash on any file.  When  trying to debug,  it reported segfaults of
the binaries that segfaulted when executing in Linux. But this section wouldn't
exist if there were no problems with r2, would it? :)

About a quarter of executables made r2 hang when just opening them for analysis
with 100% CPU core usage. Further investigation included debugging r2 and some
binary diffing, and revealed that r2 got stuck in a loop when trying to locate
plt_addr within .plt.got in get_import_addr_x86_manual(), and that Melkor had
created this section with a reported size of 0x4242424258aebf0b. In the file
tested, r2 iterated from the start of .plt.got (0x10f0 in this case) about that
many times:

    (0x4242424258aebf0b - 0x10f0)/8 = 596806425961158083.

Notably, after about an eternity or  two  r2  would happily proceed to the next
step of binary analysis. Still, the bug is enough to prevent the analysis.

Although the iteration goes beyond the file size, there were no buffer overruns
that could eventually crash the tool by running past the mapping, as happened
with the edb 'out-of-file' reads. This was because the reading was done via r2
wrappers that are aware of the file size, so r2 just kept reading 0 bytes from
the file buffer. This was also fixed, and the fix is included in the 5.8.4
release [18].
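
To get a feeling for the numbers, the formula above can be replayed in Python.
The clamped variant only illustrates the idea of bounding the walk by the file
size; it is not the actual r2 patch, and the 0x4000 file size is made up.

```python
SLOT = 8  # step used in the formula above


def naive_iterations(start: int, reported: int) -> int:
    """Iteration count when the reported value is trusted as is."""
    return (reported - start) // SLOT


def clamped_iterations(start: int, reported: int, file_size: int) -> int:
    """Iteration count when the walk is bounded by the file size."""
    end = min(reported, file_size)
    return max(0, (end - start) // SLOT)


start = 0x10F0
reported = 0x4242424258AEBF0B
print(naive_iterations(start, reported))
print(clamped_iterations(start, reported, 0x4000))  # hypothetical file size
```

With any realistic file size the clamped loop is done after a few thousand
steps at most, instead of an eternity or two.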

::                                                                           ::
::-----------------------------[ 0x04: .atexit ]-----------------------------::
::                                                                           ::

Finding bugs in parsers is fun.  Finding bugs in  ELF parsers is even more fun,
as it's the way to learn more about the ELF itself, about its loading, and also
contribute to open source and make things better!

::                                                                           ::
::------------------------------[ 0xfe: thanks ]-----------------------------::
::                                                                           ::

my respects to liveoverflow and corkami  (and the whole researchers community!)
for sharing the knowledge, langsec guys for opening  my eyes to see parser bugs
from a different angle, and tmp.0ut crew for this amazing zine! :^)

::                                                                           ::
::----------------------------[ 0xff: references ]---------------------------::
::                                                                           ::

1. Uncrackable Program? Finding a Parser Differential in loading ELF (https://www.youtube.com/watch?v=OZvc-c1OLnM)
2. https://langsec.org
3. https://datatracker.ietf.org/doc/html/rfc761#section-2.10
4. https://github.com/corkami/docs/blob/master/AbusingFileFormats/README.md#specific-examples
5. PoC‖GTFO Issue 0x16 (https://raw.githubusercontent.com/angea/pocorgtfo/master/contents/articles/16-12.pdf)
6. https://justine.lol/ape.html
7. Towards a formal theory of computer insecurity: a language-theoretic approach (https://www.youtube.com/watch?v=AqZNebWoqnc)
8. XMPP Stanza Smuggling or How I Hacked Zoom (https://youtu.be/ERaRNsvCBrw?t=467)
9. "Psychic Paper" (https://blog.siguza.net/psychicpaper/)
10. DER Entitlements: The (Brief) Return of the Psychic Paper (https://googleprojectzero.blogspot.com/2023/01/der-entitlements-brief-return-of.html)
11. Sophail: A Critical Analysis of Sophos Antivirus, Tavis Ormandy (https://www.youtube.com/watch?v=EnotiUfBaW4&t=3078s)
12. Malware Samples Crashing x64dbg Fixed! (https://www.youtube.com/watch?v=FNuFlhnfZQU)
13. Striking Back GDB and IDA debuggers through malformed ELF executables (https://ioactive.com/striking-back-gdb-and-ida-debuggers-through-malformed-elf-executables/)
14. https://ioactive.com/pdfs/IOActive_Advisory_OpenBSD_5_5_Local_Kernel_Panic.pdf
15. Fuzzing Radare2 For 0days In About 30 Lines Of Code (https://tmpout.sh/1/5.html)
16. https://github.com/IOActive/Melkor_ELF_Fuzzer
17. https://github.com/eteran/edb-debugger/pull/834
18. https://github.com/radareorg/radare2/pull/21423