Polyglottar, a structured ISO+TAR+ELF polyglot

|=-----------------------------------------------------------------------=|
|=-----=[   Polyglottar, a structured ISO+TAR+ELF polyglot    ]=---------=|
|=-----------------------------------------------------------------------=|
|=-------------------=[ by Enrique Soriano-Salvador ]=-------------------=|
|=--------------------=[ enrique.soriano@gmail.com ]=--------------------=|
|=-----------------------------------------------------------------------=|

1 - Introduction
2 - Background
3 - Formats
 3.1 - TAR
 3.2 - ELF
4 - Polyglottar
 4.1 - Plan
5 - Give me more: ISO
 5.1 - ELF + TAR + ISO
6 - Conclusions
7 - References


--[ 1 - Introduction

Polyglots are files that can play the role of two or more file formats
simultaneously. They are used to bypass protection mechanisms (e.g. IDS
or AV). There are a lot of different polyglot types, you can find a
good compilation of references in [1].

Most polyglots break the metadata structure of one of their types because
most file formats put the magic number (or file signature) at offset 0,
so you have to pick only one magic number for the file.

Nevertheless, there are some weird file types that don't put the magic
number at offset 0. One of them is very popular among UNIX geeks (like
me): TAR. It's an old file archiver used in UNIX (a .tgz file is a
compressed tar). Another example is ISO, a popular format used to store
disk images.

This article describes a ISO+TAR+ELF polyglot PoC: polyglottar. This file
is identified and can be used as a TAR and as an ISO image, but it's also
an ELF binary that can be executed on a Linux system.

--[ 2 - Background

I am not the first one focusing on the TAR format to make a polyglot. For
example, PoC or GTFO #6 describes a TAR + PDF polyglot.

In addition, a paper titled "Abusing File Processing in Malware Detectors
for Fun and Profit" [2] also
describes TAR as a technique to hide exploits. CVE 2012-1429 references
this paper.

In this polyglot_database [3] there is not a ISO+TAR+ELF polygot... So
let's do it!

--[ 3 - Formats

----[ 3.1 - TAR

The format is described in the GNU manual [4].

The TAR magic number is placed at offset 0x101. We focus on the POSIX TAR
format. This is the file header (as a C struct):

c struct posix_header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char magic[6];
    char version[2];
    char uname[32];
    char gname[32];
    char devmajor[8];
    char devminor[8];
    char prefix[155];
};

The first field is the name of the directory. What if the directory has a funny
name, for example, the magic number of another file format?

Let's try it. We will use the ELF executable format magic number: 0x7f454c46,
that is, the byte 0x7f followed by the string ELF (from now on, we will
use $> to represent the prompt of a Linux shell):

$> mkdir test
$> cp file1 file2 file3 test
$> mv test  $'\x7FELF'
$> ls -d *ELF*
''$'\177''ELF'
$> tar cvf file.tar  $'\x7FELF'
$> file file.tar
file.tar: POSIX tar archive (GNU)
$> xxd file.tar | head -1
00000000: 7f45 4c46 2f00 0000 0000 0000 0000 0000  .ELF/..........
$>

This way, we can put both magic numbers (TAR and ELF) in the same
file. Note that the file is recognized as a TAR file by the file command.

----[ 3.2 - ELF

This is the 64-bit ELF header:

typedef struct
{
  unsigned char e_ident[EI_NIDENT];     /* Magic number and other info */
  Elf64_Half    e_type;                 /* Object file type */
  Elf64_Half    e_machine;              /* Architecture */
  Elf64_Word    e_version;              /* Object file version */
  Elf64_Addr    e_entry;                /* Entry point virtual address */
  Elf64_Off     e_phoff;                /* Program header table file offset */
  Elf64_Off     e_shoff;                /* Section header table file offset */
  Elf64_Word    e_flags;                /* Processor-specific flags */
  Elf64_Half    e_ehsize;               /* ELF header size in bytes */
  Elf64_Half    e_phentsize;            /* Program header table entry size */
  Elf64_Half    e_phnum;                /* Program header table entry count */
  Elf64_Half    e_shentsize;            /* Section header table entry size */
  Elf64_Half    e_shnum;                /* Section header table entry count */
  Elf64_Half    e_shstrndx;             /* Section header string table index */
} Elf64_Ehdr;

After the header, there are the sections (data for the linker) and segments
(data to load the program: TEXT (instructions), DATA (global variables),
and so on).

--[ 4 - Polyglottar

We will use this little program (we will use GNU AS, so it's AT&T assembly
syntax) for the PoC (example code taken from [5]):

.global _start
.text
_start:
    mov   $1, %al     # RAX holds syscall 1 (write), I chose to use
                      # %al, which is the lower 8 bits of the %rax
                      # register. From a binary standpoint, there
                      # is less space used to represent this than
                      # mov $1, %rax
    mov   %rax, %rdi  # RDI holds File Handle 1, STDOUT. This means
                      # that we are writing to the screen. Again,
                      # moving RAX to RDI is shorter than
                      # using mov $1, %rdi
    mov   $msg, %rsi  # RSI holds the address of our string buffer.
    mov   $6, %dl     # RDX holds the size our of string buffer.
                      # Moving into %dl to save space.
    syscall           # Invoke a syscall with these arguments.
    mov   $60, %al    # Now we are invoking syscall 60.
    xor   %rdi, %rdi  # Zero out RDI, which holds the return value.
    syscall           # Call the system again to exit.
msg:
    .ascii "hey!!\n"

The program just writes hey!! in its standard output. First, it performs
a write system call. Then it finishes with exit the system call. Let's
see:

$> as hey.s -o hey.o
$> ld hey.o -o hey
$> ./hey
hey!!
$>

Let's dump the ELF file generated by ld:

$> readelf -a hey
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          2097360 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           64 (bytes)
  Number of section headers:         5
  Section header string table index: 4

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00200000
       000000000000001d  0000000000000000  AX       0     0     1
  [ 2] .symtab           SYMTAB           0000000000000000  00200020
       0000000000000078  0000000000000018           3     4     8
  [ 3] .strtab           STRTAB           0000000000000000  00200098
       0000000000000016  0000000000000000           0     0     1
  [ 4] .shstrtab         STRTAB           0000000000000000  002000ae
       0000000000000021  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0x0000000000000000 0x0000000000000000
                 0x000000000000001d 0x000000000000001d  R E    0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .text

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices
X86-64 is not currently supported.

Symbol table '.symtab' contains 5 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000000000     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS asm_hey.o
     3: 0000000000000017     0 NOTYPE  LOCAL  DEFAULT    1 msg
     4: 0000000000000000     0 NOTYPE  GLOBAL DEFAULT    1 _start

No version information found in this file.
$>

Basically, it has the TEXT segment, the symbol table and the string table.

We have a problem: the TAR header is 512 bytes (in fact, the header is
500 bytes, but TAR works with 512 byte sectors). If we want a valid TAR,
we also need to preserve the TAR data, but it will destroy the TEXT
segment of the ELF. What can we do?

----[ 4.1 - Plan

This is the plan:

    1) Create a new segment in the ELF, named CUSTOM. This segment will
    contain part of the TAR header and the TAR data.

    2) Ignore the CUSTOM segment in the program.

This is the scheme:

       FILE

       offset
       0 ...                                               ... n
       +-------------------------------------------------------+
       |                                                       |
       +------------+-------------+----------------------------+
       |            |             |                            |
       |            |             |                            |
       |  header    |   tar data  |         trash              |  TAR VIEW
       |            |             |                            |
       |            |             |                            |
       +------------+-------------+----------------------------+
       |                                                       |
       |                                                       |
       +----------+---------------+-----------+----------------+
       |          |               |           |                |
       |          |               |           |                |
       |  header  |    .custom    |   .text   | .symtab, etc.  |  ELF VIEW
       |          |               |           |                |
       |          |               |           |                |
       +----------+---------------+-----------+----------------+
       |                                                       |
       +-------------------------------------------------------+

Now, we are going to change the program in order to create the new segment:

$> cat proto1.s
.global _start
.text
_start:
    mov   $1, %al     # RAX holds syscall 1 (write), I chose to use
                      # %al, which is the lower 8 bits of the %rax
                      # register. From a binary standpoint, there
                      # is less space used to represent this than
                      # mov $1, %rax
    mov   %rax, %rdi  # RDI holds File Handle 1, STDOUT. This means
                      # that we are writing to the screen. Again,
                      # moving RAX to RDI is shorter than
                      # using mov $1, %rdi
    mov   $msg, %rsi  # RSI holds the address of our string buffer.
    mov   $6, %dl     # RDX holds the size our of string buffer.
                      # Moving into %dl to save space.
    syscall           # Invoke a syscall with these arguments.
    mov   $60, %al    # Now we are invoking syscall 60.
    xor   %rdi, %rdi  # Zero out RDI, which holds the return value.
    syscall           # Call the system again to exit.
msg:
    .ascii "hey!!\n"

.section .custom , "aw" , @progbits
    .ascii "This lives in the CUSTOM section"
$>

If we dump the ELF:

$> readelf -a proto1
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x4000b0
  Start of program headers:          64 (bytes into file)
  Start of section headers:          536 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         2
  Size of section headers:           64 (bytes)
  Number of section headers:         6
  Section header string table index: 5

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         00000000004000b0  000000b0
       000000000000001d  0000000000000000  AX       0     0     1
  [ 2] .custom           PROGBITS         00000000006000cd  000000cd
       0000000000000020  0000000000000000  WA       0     0     1
  [ 3] .symtab           SYMTAB           0000000000000000  000000f0
       00000000000000d8  0000000000000018           4     5     8
  [ 4] .strtab           STRTAB           0000000000000000  000001c8
       0000000000000026  0000000000000000           0     0     1
  [ 5] .shstrtab         STRTAB           0000000000000000  000001ee
       0000000000000029  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000cd 0x00000000000000cd  R E    0x200000
  LOAD           0x00000000000000cd 0x00000000006000cd 0x00000000006000cd
                 0x0000000000000020 0x0000000000000020  RW     0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .text
   01     .custom

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices
X86-64 is not currently supported.

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 00000000004000b0     0 SECTION LOCAL  DEFAULT    1
     2: 00000000006000cd     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS proto1.o
     4: 00000000004000c7     0 NOTYPE  LOCAL  DEFAULT    1 msg
     5: 00000000004000b0     0 NOTYPE  GLOBAL DEFAULT    1 _start
     6: 00000000006000ed     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
     7: 00000000006000ed     0 NOTYPE  GLOBAL DEFAULT    2 _edata
     8: 00000000006000f0     0 NOTYPE  GLOBAL DEFAULT    2 _end

No version information found in this file.
$>

We have a problem. We need to put the CUSTOM segment before the TEXT
segment, it has to be the first segment of the ELF. How can we do that? We
need some ld hacking.

The option --verbose provides the ld script that is used to link the
program:

$> ld --verbose proto1.o -o proto1
GNU ld (GNU Binutils for Ubuntu) 2.30
  Supported emulations:
   elf_x86_64
   elf32_x86_64
   elf_i386
   elf_iamcu
   i386linux
   elf_l1om
   elf_k1om
   i386pep
   i386pe
using internal linker script:
==================================================
/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2018 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
              "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu");
SEARCH_DIR("=/lib/x86_64-linux-gnu");
SEARCH_DIR("=/usr/lib/x86_64-linux-gnu");
SEARCH_DIR("=/usr/lib/x86_64-linux-gnu64");
SEARCH_DIR("=/usr/local/lib64");
SEARCH_DIR("=/lib64");
SEARCH_DIR("=/usr/lib64");
SEARCH_DIR("=/usr/local/lib");
SEARCH_DIR("=/lib");
SEARCH_DIR("=/usr/lib");
SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64");
SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
  . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }
  .rela.dyn       :
    {
      *(.rela.init)
      *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
      *(.rela.fini)
      *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*)
      *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*)
      *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*)
      *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*)
      *(.rela.ctors)
      *(.rela.dtors)
      *(.rela.got)
      *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*)
      *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*)
      *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*)
      *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*)
      *(.rela.ifunc)
    }
  .rela.plt       :
    {
      *(.rela.plt)
      PROVIDE_HIDDEN (__rela_iplt_start = .);
      *(.rela.iplt)
      PROVIDE_HIDDEN (__rela_iplt_end = .);
    }
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
  }
  .plt            : { *(.plt) *(.iplt) }
.plt.got        : { *(.plt.got) }
.plt.sec        : { *(.plt.sec) }
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  .rodata1        : { *(.rodata1) }
  .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) }
  .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gcc_except_table   : ONLY_IF_RO { *(.gcc_except_table
  .gcc_except_table.*) }
  .gnu_extab   : ONLY_IF_RO { *(.gnu_extab*) }
  /* These sections are generated by the Sun/Oracle C++ compiler.  */
  .exception_ranges   : ONLY_IF_RO { *(.exception_ranges
  .exception_ranges*) }
  /* Adjust the address for the data segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
  /* Exception handling  */
  .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gnu_extab      : ONLY_IF_RW { *(.gnu_extab) }
  .gcc_except_table   : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
  .exception_ranges   : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) }
  /* Thread Local Storage sections  */
  .tdata          : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
  .tbss           : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
  .preinit_array  :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  }
  .init_array     :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*)
        SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o
        *crtend.o *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array     :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*)
        SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o
        *crtbegin?.o *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }
  .jcr            : { KEEP (*(.jcr)) }
  .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*)
        *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) }
  .dynamic        : { *(.dynamic) }
  .got            : { *(.got) *(.igot) }
  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
  .got.plt        : { *(.got.plt)  *(.igot.plt) }
  .data           :
  {
    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)
  }
  .data1          : { *(.data1) }
  _edata = .; PROVIDE (edata = .);
  . = .;
  __bss_start = .;
  .bss            :
  {
   *(.dynbss)
   *(.bss .bss.* .gnu.linkonce.b.*)
   *(COMMON)
   /* Align here to ensure that the .bss section occupies space up to
      _end.  Align after .bss to ensure correct alignment even if the
      .bss section disappears because there are no input sections.
      FIXME: Why do we need it? When there is no .bss section, we don't
      pad the .data section.  */
   . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  .lbss   :
  {
    *(.dynlbss)
    *(.lbss .lbss.* .gnu.linkonce.lb.*)
    *(LARGE_COMMON)
  }
  . = ALIGN(64 / 8);
  . = SEGMENT_START("ldata-segment", .);
  .lrodata   ALIGN(CONSTANT (MAXPAGESIZE)) +
        (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.lrodata .lrodata.* .gnu.linkonce.lr.*)
  }
  .ldata   ALIGN(CONSTANT (MAXPAGESIZE)) +
        (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.ldata .ldata.* .gnu.linkonce.l.*)
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
  /* Stabs debugging sections.  */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0.  */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
  /* GNU DWARF 1 extensions */
  .debug_srcinfo  0 : { *(.debug_srcinfo) }
  .debug_sfnames  0 : { *(.debug_sfnames) }
  /* DWARF 1.1 and DWARF 2 */
  .debug_aranges  0 : { *(.debug_aranges) }
  .debug_pubnames 0 : { *(.debug_pubnames) }
  /* DWARF 2 */
  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
  .debug_abbrev   0 : { *(.debug_abbrev) }
  .debug_line     0 : { *(.debug_line .debug_line.* .debug_line_end ) }
  .debug_frame    0 : { *(.debug_frame) }
  .debug_str      0 : { *(.debug_str) }
  .debug_loc      0 : { *(.debug_loc) }
  .debug_macinfo  0 : { *(.debug_macinfo) }
  /* SGI/MIPS DWARF 2 extensions */
  .debug_weaknames 0 : { *(.debug_weaknames) }
  .debug_funcnames 0 : { *(.debug_funcnames) }
  .debug_typenames 0 : { *(.debug_typenames) }
  .debug_varnames  0 : { *(.debug_varnames) }
  /* DWARF 3 */
  .debug_pubtypes 0 : { *(.debug_pubtypes) }
  .debug_ranges   0 : { *(.debug_ranges) }
  /* DWARF Extension.  */
  .debug_macro    0 : { *(.debug_macro) }
  .debug_addr     0 : { *(.debug_addr) }
  .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
}

==================================================
attempt to open proto1.o succeeded
proto1.o

We can modify this script to put the CUSTOM segment before the TEXT
segment. We only need a new line before the TEXT segment part:

$> cat ld.script
/* Script for -z combreloc: combine and sort reloc sections */
/* Copyright (C) 2014-2018 Free Software Foundation, Inc.
   Copying and distribution of this script, with or without modification,
   are permitted in any medium without royalty provided the copyright
   notice and this notice are preserved.  */
OUTPUT_FORMAT("elf64-x86-64", "elf64-x86-64",
              "elf64-x86-64")
OUTPUT_ARCH(i386:x86-64)
ENTRY(_start)
SEARCH_DIR("=/usr/local/lib/x86_64-linux-gnu");
SEARCH_DIR("=/lib/x86_64-linux-gnu");
SEARCH_DIR("=/usr/lib/x86_64-linux-gnu");
SEARCH_DIR("=/usr/lib/x86_64-linux-gnu64");
SEARCH_DIR("=/usr/local/lib64");
SEARCH_DIR("=/lib64");
SEARCH_DIR("=/usr/lib64");
SEARCH_DIR("=/usr/local/lib");
SEARCH_DIR("=/lib");
SEARCH_DIR("=/usr/lib");
SEARCH_DIR("=/usr/x86_64-linux-gnu/lib64");
SEARCH_DIR("=/usr/x86_64-linux-gnu/lib");
SECTIONS
{
  /* Read-only sections, merged into text segment: */
  PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x400000));
      . = SEGMENT_START("text-segment", 0x400000) + SIZEOF_HEADERS;
  .interp         : { *(.interp) }
  .note.gnu.build-id : { *(.note.gnu.build-id) }
  .hash           : { *(.hash) }
  .gnu.hash       : { *(.gnu.hash) }
  .dynsym         : { *(.dynsym) }
  .dynstr         : { *(.dynstr) }
  .gnu.version    : { *(.gnu.version) }
  .gnu.version_d  : { *(.gnu.version_d) }
  .gnu.version_r  : { *(.gnu.version_r) }
  .rela.dyn       :
    {
      *(.rela.init)
      *(.rela.text .rela.text.* .rela.gnu.linkonce.t.*)
      *(.rela.fini)
      *(.rela.rodata .rela.rodata.* .rela.gnu.linkonce.r.*)
      *(.rela.data .rela.data.* .rela.gnu.linkonce.d.*)
      *(.rela.tdata .rela.tdata.* .rela.gnu.linkonce.td.*)
      *(.rela.tbss .rela.tbss.* .rela.gnu.linkonce.tb.*)
      *(.rela.ctors)
      *(.rela.dtors)
      *(.rela.got)
      *(.rela.bss .rela.bss.* .rela.gnu.linkonce.b.*)
      *(.rela.ldata .rela.ldata.* .rela.gnu.linkonce.l.*)
      *(.rela.lbss .rela.lbss.* .rela.gnu.linkonce.lb.*)
      *(.rela.lrodata .rela.lrodata.* .rela.gnu.linkonce.lr.*)
      *(.rela.ifunc)
    }
  .rela.plt       :
    {
      *(.rela.plt)
      PROVIDE_HIDDEN (__rela_iplt_start = .);
      *(.rela.iplt)
      PROVIDE_HIDDEN (__rela_iplt_end = .);
    }
  .init           :
  {
    KEEP (*(SORT_NONE(.init)))
  }
  .plt            : { *(.plt) *(.iplt) }
.plt.got        : { *(.plt.got) }
.plt.sec        : { *(.plt.sec) }
  .custom : { *(.custom) }
  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf32.em.  */
    *(.gnu.warning)
  }
  .fini           :
  {
    KEEP (*(SORT_NONE(.fini)))
  }
  PROVIDE (__etext = .);
  PROVIDE (_etext = .);
  PROVIDE (etext = .);
  .rodata         : { *(.rodata .rodata.* .gnu.linkonce.r.*) }
  .rodata1        : { *(.rodata1) }
  .eh_frame_hdr : { *(.eh_frame_hdr) *(.eh_frame_entry .eh_frame_entry.*) }
  .eh_frame       : ONLY_IF_RO { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gcc_except_table   : ONLY_IF_RO { *(.gcc_except_table
  .gcc_except_table.*) }
  .gnu_extab   : ONLY_IF_RO { *(.gnu_extab*) }
  /* These sections are generated by the Sun/Oracle C++ compiler.  */
  .exception_ranges   : ONLY_IF_RO { *(.exception_ranges
  .exception_ranges*) }
  /* Adjust the address for the data segment.  We want to adjust up to
     the same address within the page on the next page up.  */
  . = DATA_SEGMENT_ALIGN (CONSTANT (MAXPAGESIZE), CONSTANT (COMMONPAGESIZE));
  /* Exception handling  */
  .eh_frame       : ONLY_IF_RW { KEEP (*(.eh_frame)) *(.eh_frame.*) }
  .gnu_extab      : ONLY_IF_RW { *(.gnu_extab) }
  .gcc_except_table   : ONLY_IF_RW { *(.gcc_except_table .gcc_except_table.*) }
  .exception_ranges   : ONLY_IF_RW { *(.exception_ranges .exception_ranges*) }
  /* Thread Local Storage sections  */
  .tdata              : { *(.tdata .tdata.* .gnu.linkonce.td.*) }
  .tbss               : { *(.tbss .tbss.* .gnu.linkonce.tb.*) *(.tcommon) }
  .preinit_array      :
  {
    PROVIDE_HIDDEN (__preinit_array_start = .);
    KEEP (*(.preinit_array))
    PROVIDE_HIDDEN (__preinit_array_end = .);
  }
  .init_array     :
  {
    PROVIDE_HIDDEN (__init_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.init_array.*)
        SORT_BY_INIT_PRIORITY(.ctors.*)))
    KEEP (*(.init_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o *crtend.o
        *crtend?.o ) .ctors))
    PROVIDE_HIDDEN (__init_array_end = .);
  }
  .fini_array     :
  {
    PROVIDE_HIDDEN (__fini_array_start = .);
    KEEP (*(SORT_BY_INIT_PRIORITY(.fini_array.*)
        SORT_BY_INIT_PRIORITY(.dtors.*)))
    KEEP (*(.fini_array EXCLUDE_FILE (*crtbegin.o *crtbegin?.o
        *crtend.o *crtend?.o ) .dtors))
    PROVIDE_HIDDEN (__fini_array_end = .);
  }
  .ctors          :
  {
    /* gcc uses crtbegin.o to find the start of
       the constructors, so we make sure it is
       first.  Because this is a wildcard, it
       doesn't matter if the user does not
       actually link against crtbegin.o; the
       linker won't look for a file to match a
       wildcard.  The wildcard also means that it
       doesn't matter which directory crtbegin.o
       is in.  */
    KEEP (*crtbegin.o(.ctors))
    KEEP (*crtbegin?.o(.ctors))
    /* We don't want to include the .ctor section from
       the crtend.o file until after the sorted ctors.
       The .ctor section from the crtend file contains the
       end of ctors marker and it must be last */
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .ctors))
    KEEP (*(SORT(.ctors.*)))
    KEEP (*(.ctors))
  }
  .dtors          :
  {
    KEEP (*crtbegin.o(.dtors))
    KEEP (*crtbegin?.o(.dtors))
    KEEP (*(EXCLUDE_FILE (*crtend.o *crtend?.o ) .dtors))
    KEEP (*(SORT(.dtors.*)))
    KEEP (*(.dtors))
  }
  .jcr            : { KEEP (*(.jcr)) }
  .data.rel.ro : { *(.data.rel.ro.local* .gnu.linkonce.d.rel.ro.local.*)
        *(.data.rel.ro .data.rel.ro.* .gnu.linkonce.d.rel.ro.*) }
  .dynamic        : { *(.dynamic) }
  .got            : { *(.got) *(.igot) }
  . = DATA_SEGMENT_RELRO_END (SIZEOF (.got.plt) >= 24 ? 24 : 0, .);
  .got.plt        : { *(.got.plt)  *(.igot.plt) }
  .data           :
  {
    *(.data .data.* .gnu.linkonce.d.*)
    SORT(CONSTRUCTORS)
  }
  .data1          : { *(.data1) }
  _edata = .; PROVIDE (edata = .);
  . = .;
  __bss_start = .;
  .bss            :
  {
   *(.dynbss)
   *(.bss .bss.* .gnu.linkonce.b.*)
   *(COMMON)
   /* Align here to ensure that the .bss section occupies space up to
      _end.  Align after .bss to ensure correct alignment even if the
      .bss section disappears because there are no input sections.
      FIXME: Why do we need it? When there is no .bss section, we don't
      pad the .data section.  */
   . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  .lbss   :
  {
    *(.dynlbss)
    *(.lbss .lbss.* .gnu.linkonce.lb.*)
    *(LARGE_COMMON)
  }
  . = ALIGN(64 / 8);
  . = SEGMENT_START("ldata-segment", .);
  .lrodata   ALIGN(CONSTANT (MAXPAGESIZE)) +
        (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.lrodata .lrodata.* .gnu.linkonce.lr.*)
  }
  .ldata   ALIGN(CONSTANT (MAXPAGESIZE)) + (. & (CONSTANT (MAXPAGESIZE) - 1)) :
  {
    *(.ldata .ldata.* .gnu.linkonce.l.*)
    . = ALIGN(. != 0 ? 64 / 8 : 1);
  }
  . = ALIGN(64 / 8);
  _end = .; PROVIDE (end = .);
  . = DATA_SEGMENT_END (.);
  /* Stabs debugging sections.  */
  .stab          0 : { *(.stab) }
  .stabstr       0 : { *(.stabstr) }
  .stab.excl     0 : { *(.stab.excl) }
  .stab.exclstr  0 : { *(.stab.exclstr) }
  .stab.index    0 : { *(.stab.index) }
  .stab.indexstr 0 : { *(.stab.indexstr) }
  .comment       0 : { *(.comment) }
  /* DWARF debug sections.
     Symbols in the DWARF debugging sections are relative to the beginning
     of the section so we begin them at 0.  */
  /* DWARF 1 */
  .debug          0 : { *(.debug) }
  .line           0 : { *(.line) }
  /* GNU DWARF 1 extensions */
  .debug_srcinfo  0 : { *(.debug_srcinfo) }
  .debug_sfnames  0 : { *(.debug_sfnames) }
  /* DWARF 1.1 and DWARF 2 */
  .debug_aranges  0 : { *(.debug_aranges) }
  .debug_pubnames 0 : { *(.debug_pubnames) }
  /* DWARF 2 */
  .debug_info     0 : { *(.debug_info .gnu.linkonce.wi.*) }
  .debug_abbrev   0 : { *(.debug_abbrev) }
  .debug_line     0 : { *(.debug_line .debug_line.* .debug_line_end ) }
  .debug_frame    0 : { *(.debug_frame) }
  .debug_str      0 : { *(.debug_str) }
  .debug_loc      0 : { *(.debug_loc) }
  .debug_macinfo  0 : { *(.debug_macinfo) }
  /* SGI/MIPS DWARF 2 extensions */
  .debug_weaknames 0 : { *(.debug_weaknames) }
  .debug_funcnames 0 : { *(.debug_funcnames) }
  .debug_typenames 0 : { *(.debug_typenames) }
  .debug_varnames  0 : { *(.debug_varnames) }
  /* DWARF 3 */
  .debug_pubtypes 0 : { *(.debug_pubtypes) }
  .debug_ranges   0 : { *(.debug_ranges) }
  /* DWARF Extension.  */
  .debug_macro    0 : { *(.debug_macro) }
  .debug_addr     0 : { *(.debug_addr) }
  .gnu.attributes 0 : { KEEP (*(.gnu.attributes)) }
  /DISCARD/ : { *(.note.GNU-stack) *(.gnu_debuglink) *(.gnu.lto_*) }
}

Now we link the program with the new script:

$> ld  proto1.o -T ld.script -o proto1
$> readelf -a proto1
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400098
  Start of program headers:          64 (bytes into file)
  Start of section headers:          480 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         1
  Size of section headers:           64 (bytes)
  Number of section headers:         6
  Section header string table index: 5

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .custom           PROGBITS         0000000000400078  00000078
       0000000000000020  0000000000000000  WA       0     0     1
  [ 2] .text             PROGBITS         0000000000400098  00000098
       000000000000001d  0000000000000000  AX       0     0     1
  [ 3] .symtab           SYMTAB           0000000000000000  000000b8
       00000000000000d8  0000000000000018           4     5     8
  [ 4] .strtab           STRTAB           0000000000000000  00000190
       0000000000000026  0000000000000000           0     0     1
  [ 5] .shstrtab         STRTAB           0000000000000000  000001b6
       0000000000000029  0000000000000000           0     0     1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings), I (info),
  L (link order), O (extra OS processing required), G (group), T (TLS),
  C (compressed), x (unknown), o (OS specific), E (exclude),
  l (large), p (processor specific)

There are no section groups in this file.

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x00000000000000b5 0x00000000000000b5  RWE    0x200000

 Section to Segment mapping:
  Segment Sections...
   00     .custom .text

There is no dynamic section in this file.

There are no relocations in this file.

The decoding of unwind sections for machine type Advanced Micro Devices
X86-64 is not currently supported.

Symbol table '.symtab' contains 9 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: 0000000000400078     0 SECTION LOCAL  DEFAULT    1
     2: 0000000000400098     0 SECTION LOCAL  DEFAULT    2
     3: 0000000000000000     0 FILE    LOCAL  DEFAULT  ABS proto1.o
     4: 00000000004000af     0 NOTYPE  LOCAL  DEFAULT    2 msg
     5: 0000000000400098     0 NOTYPE  GLOBAL DEFAULT    2 _start
     6: 00000000006000b5     0 NOTYPE  GLOBAL DEFAULT    2 __bss_start
     7: 00000000006000b5     0 NOTYPE  GLOBAL DEFAULT    2 _edata
     8: 00000000006000b8     0 NOTYPE  GLOBAL DEFAULT    2 _end

No version information found in this file.
$>

Now we can see that the CUSTOM segment is where we needed, it's the
first segment. The binary works fine:

$> xxd proto1 | head -15
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 9800 4000 0000 0000  ..>.......@.....
00000020: 4000 0000 0000 0000 e001 0000 0000 0000  @...............
00000030: 0000 0000 4000 3800 0100 4000 0600 0500  ....@.8...@.....
00000040: 0100 0000 0700 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: b500 0000 0000 0000 b500 0000 0000 0000  ................
00000070: 0000 2000 0000 0000 5468 6973 206c 6976  .. .....This liv
00000080: 6573 2069 6e20 7468 6520 4355 5354 4f4d  es in the CUSTOM
00000090: 2073 6563 7469 6f6e b001 4889 c748 c7c6   section..H..H..
000000a0: af00 4000 b206 0f05 b03c 4831 ff0f 0568  ..@......<H1...h
000000b0: 6579 2121 0a00 0000 0000 0000 0000 0000  ey!!............
000000c0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
000000d0: 0000 0000 0300 0100 7800 4000 0000 0000  ........x.@.....
000000e0: 0000 0000 0000 0000 0000 0000 0300 0200  ................
000000f0: 9800 4000 0000 0000 0000 0000 0000 0000  ..@.............
$> ./proto1
hey!!
$>

So far so good! Let's make the CUSTOM segment big enough to hold the
TAR data (the TAR file is 10 KB). We create a 10240 byte array, with
all the bytes set to 0x66:

$> cat proto2.s
.global _start
.text
_start:
    mov   $1, %al     # RAX holds syscall 1 (write), I chose to use
                      # %al, which is the lower 8 bits of the %rax
                      # register. From a binary standpoint, there
                      # is less space used to represent this than
                      # mov $1, %rax
    mov   %rax, %rdi  # RDI holds File Handle 1, STDOUT. This means
                      # that we are writing to the screen. Again,
                      # moving RAX to RDI is shorter than
                      # using mov $1, %rdi
    mov   $msg, %rsi  # RSI holds the address of our string buffer.
    mov   $6, %dl     # RDX holds the size our of string buffer.
                      # Moving into %dl to save space.
    syscall           # Invoke a syscall with these arguments.
    mov   $60, %al    # Now we are invoking syscall 60.
    xor   %rdi, %rdi  # Zero out RDI, which holds the return value.
    syscall           # Call the system again to exit.
msg:
    .ascii "hey!!\n"

.section .custom , "aw" , @progbits
    array: .fill 10240 , 1 , 0x66

$> as proto2.s -o proto2.o
$> ld  proto2.o -T ld.script -o proto2
$> ./proto2
hey!!
$> xxd proto2 | head -20
00000000: 7f45 4c46 0201 0100 0000 0000 0000 0000  .ELF............
00000010: 0200 3e00 0100 0000 7828 4000 0000 0000  ..>.....x(@.....
00000020: 4000 0000 0000 0000 e029 0000 0000 0000  @........)......
00000030: 0000 0000 4000 3800 0100 4000 0600 0500  ....@.8...@.....
00000040: 0100 0000 0700 0000 0000 0000 0000 0000  ................
00000050: 0000 4000 0000 0000 0000 4000 0000 0000  ..@.......@.....
00000060: 9528 0000 0000 0000 9528 0000 0000 0000  .(.......(......
00000070: 0000 2000 0000 0000 6666 6666 6666 6666  .. .....ffffffff
00000080: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
00000090: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
000000a0: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
000000b0: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
000000c0: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
000000d0: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
000000e0: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
000000f0: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
00000100: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
00000110: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
00000120: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
00000130: 6666 6666 6666 6666 6666 6666 6666 6666  ffffffffffffffff
$>

It looks good.

We need to do some surgery. We are going to create a Frankenstein file with
the ELF header, part of the TAR header, the TAR data (overwriting the
CUSTOM segment) and the rest of the ELF file. In order to get a correct
ELF (header and tables before the CUSTOM segment), we need to sacrifice
these fields of the TAR header:

char name[100];
char mode[8];
char uid[8];
char gid[8];

We use the dd command to overwrite the ELF file with the TAR file (from
offset 120 to EOF):


$> cp proto2 frankie.tar
$> dd if=/tmp/file.tar of=frankie.tar skip=120 seek=120 bs=1 count=10240
           conv=notrunc
10120+0 records in
10120+0 records out
10120 bytes (10 kB, 9.9 KiB) copied, 0.0209625 s, 483 kB/s

If we execute it:

$> ./frankie.tar
hey!!
$>

Is it a TAR file?

$> file ./frankie.tar
./frankie.tar: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
       statically linked, not stripped
$> tar tvf frankie.tar
tar: This does not look like a tar archive
tar: Skipping to next header
-rw-rw-r-- esoriano/esoriano 22 2020-04-07 14:28 \177ELF/README
-rw-rw-r-- esoriano/esoriano  5 2020-04-07 14:28 \177ELF/install.exe
tar: Exiting with failure status due to previous errors
$>

Ooops. We have a problem: the TAR header stores the checksum of the
header block (the first 512 bytes). If the checksum is wrong, there
is an error. Moreover, in this case, file says that the file is an
ELF executable!

After reading the GNU manual, the GNU tar command code (horror!) and
the Plan 9 tar command code (very clean, as usual), I found that:

    a) The checksum is the sum of all bytes of the header block.

    b) The chksum field must be set to blanks (space) to compute the
    checksum.

    c) The chksum field is text.

    e) The number is octal.

I wrote this little C program to print and update the checksum of a TAR:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <err.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

enum{
    Blocksz = 512,
    Chkstart = 148,
    Chksz = 8,
};

struct posix_header{                              /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
};

union block{
    struct posix_header h;
    char b[Blocksz];
};

int
main(int argc, char *argv[])
{
    FILE *f;
    int i;
    unsigned char *p;
    union block u;
    long sum = 0;

    if(argc != 2)
        errx(1, "I need a file");
    f = fopen(argv[1], "r+");
    if(f == NULL)
        errx(1, "can't open file");
    if(fread(&u, sizeof(u), 1, f) != 1)
        errx(1, "can't read header");
    p = (unsigned char *)&u;
    for(i=0; i<Blocksz; i++){
        if(i >= Chkstart && i < Chkstart+Chksz){
            sum += ' ';
        }else{
            sum += p[i];
        }
    }
    fprintf(stderr, "old checksum: %s\n", u.h.chksum);
    snprintf(u.h.chksum, 8, "%lo",sum);
    fprintf(stderr, "current checksum: %s\n", u.h.chksum);
    fseek(f, 0, SEEK_SET);
    if(fwrite(&u, sizeof(u), 1, f) != 1)
        errx(1, "can't write header");
    fclose(f);
    exit(0);
}

Let's try again:

$> mkdir m
$> echo ROGUE TAR > m/README
$> echo NULL > m/install.exe
$> mv m  $'\x7FELF'
$> tar cvf file.tar $'\x7FELF'
\177ELF/
\177ELF/README
\177ELF/install.exe
$> ls -l file.tar
-rw-rw-r-- 1 esoriano esoriano 10240 abr  8 13:49 file.tar
$> cp proto2 frankie.tar
$> dd if=/tmp/file.tar of=frankie.tar skip=120 seek=120 bs=1 count=10240
       conv=notrunc
$> ./tar-header frankie.tar
$> file frankie.tar
frankie.tar: POSIX tar archive (GNU)
$> $ exiftool frankie.tar
ExifTool Version Number         : 10.80
File Name                       : frankie.tar
Directory                       : .
File Size                       : 11 kB
File Modification Date/Time     : 2020:04:08 13:55:56+02:00
File Access Date/Time           : 2020:04:08 13:56:36+02:00
File Inode Change Date/Time     : 2020:04:08 13:55:56+02:00
File Permissions                : rwxrwxr-x
File Type                       : TAR
File Type Extension             : tar
MIME Type                       : application/x-tar
Warning                         : Unsupported file type
$> ./frankie.tar
hey!!
$> tar tvf frankie.tar
d--------- esoriano/esoriano 0 2020-04-08 13:48 \177ELF\002\001\001
-rw-rw-r-- esoriano/esoriano 10 2020-04-08 13:47 \177ELF/README
-rw-rw-r-- esoriano/esoriano  5 2020-04-08 13:48 \177ELF/install.exe
$>

It works!

Both file and exiftool say that it's a TAR file. We can list and
extract it. Also, the binary works fine.

If we upload the file to virustotal, only one AV engine (McAfee) detects
that something is wrong: It says it's an exploit and provides the CVE
we mentioned before (CVE 2012-1429) [6].

The file is identified as an ELF:

MD5 f9ed8754eb8d18a44f0c7ccb6390b1a5
SHA-1 bbbd1271a0fc74d95d8912a7cd82778b68243b7d
SHA-256 3015d4ad1d002032ce1f9bc07507562916e190c6130568f886f66d36667438d6
Vhash cc331f244d599354f6c8960918e7d4cf
SSDEEP 12:BvwPqX18Q7omfEoyImJXZlye+rdZthMI7l4QUKXQ06ED6e666kQ:Gy2S0IRVh1Ee/
File type ELF
Magic POSIX tar archive (GNU)
File size 10.84 KB (11104 bytes)

Note that the file has the ELF magic number. The next two bytes are used
to define the arch (32 or 64 bit) and the endianess of the ELF. We need
to put the '/' and the NULL char '\0' to make it a correct TAR (that is,
0x2f00 after \x7FELF). If we polish the header with an hexadecimal editor:

$> xxd frankie.tar | head -1
00000000: 7f45 4c46 2f00 0100 0000 0000 0000 0000  .ELF/...........
$>

The ELF executes OK. Now, virustotal says that the file is a TAR [7]:

MD5 eab51c63e4750a450d2c5dbfd45d72d3
SHA-1 df119c10cc1865d1dc85aaeebacc7cc4728d173a
SHA-256 2ae17d426ccae26ceb994625524978f6e68721bcda18e8885ff2091031c8900e
Vhash 90b42c1a137695cb28ff7fd3d5c19596
SSDEEP 12:BRIPqX18QumfEoyImJXZlye+rdZthMI7l4QUKXQ06ED6e666kQ:YyAS0IRVh1Ee/
File type TAR
Magic POSIX tar archive (GNU)
File size 10.84 KB (11104 bytes)

McAfee still detects it. The CVE-2012-1429 description says:

    The ELF file parser in Bitdefender 7.2, Comodo Antivirus 7424,
    Emsisoft Anti-Malware 5.1.0.1, eSafe 7.0.17.0, F-Secure Anti-Virus
    9.0.16160.0, Ikarus Virus Utilities T3 Command Line Scanner
    1.1.97.0, McAfee Anti-Virus Scanning Engine 5.400.0.1158, McAfee
    Gateway (formerly Webwasher) 2010.1C, and nProtect Anti-Virus
    2011-01-17.01 allows remote attackers to bypass malware detection
    via an ELF file with a ustar character sequence at a certain
    location. NOTE: this may later be SPLIT into multiple CVEs
    if additional information is published showing that the error
    occurred independently in different ELF parser implementations.

Well, we have gone way further. This polyglot is not just an ELF with
that string at offset 0x101. We have made a correct TAR file which is
also a correct ELF executable.

If we try to execute polyglottar in Windows 10 WSL, it works, as expected
(since WSL 2 is a virtualized Linux kernel).

--[ 5 - Give me more: ISO

There is another file format that does not have the magic number at offset
0, ISO. The magic, CD001, is located at offsets 0x8001,0x8801 and 0x9001:

$> xxd dsl-4.11.rc1.iso  | egrep '^0+8000:'
00008000: 0143 4430 3031 0100 4c49 4e55 5820 2020  .CD001..LINUX
$>

What if we put a little ELF at the beginning of an ISO file?

$> dd if=proto1 of=dsl-4.11.rc1.iso  bs=1 count=464  conv=notrunc
464+0 records in
464+0 records out
464 bytes copied, 0.00196944 s, 236 kB/s
$> chmod +x dsl-4.11.rc1.iso
$> ./dsl-4.11.rc1.iso
hey!!
$> file dsl-4.11.rc1.iso
dsl-4.11.rc1.iso: ELF 64-bit LSB executable, x86-64, version 1 (SYSV),
    statically linked, stripped
$> exiftool dsl-4.11.rc1.iso
ExifTool Version Number         : 10.80
File Name                       : dsl-4.11.rc1.iso
Directory                       : .
File Size                       : 51 MB
File Modification Date/Time     : 2020:04:08 16:21:53+02:00
File Access Date/Time           : 2020:04:08 16:22:18+02:00
File Inode Change Date/Time     : 2020:04:08 16:22:15+02:00
File Permissions                : rwxrwxr-x
File Type                       : ISO
File Type Extension             : iso
MIME Type                       : application/x-iso9660-image
System                          : LINUX
Volume Name                     : KNOPPIX
Volume Block Count              : 25932
Volume Block Size               : 2048
Root Directory Create Date      : 2012:08:01 07:02:27-04:00
Software                        : MKISOFS ISO 9660/HFS FILESYSTEM BUILDER &
CDRECORD CD-R/DVD CREATOR (C) 1993 E.YOUNGDALE (C) 1997 J.PEARSON/J.SCHILLING
Volume Create Date              : 2012:08:03 18:12:29.00-04:00
Volume Modify Date              : 2012:08:03 18:12:29.00-04:00
Volume Effective Date           : 2012:08:03 18:12:29.00-04:00
Boot System                     : EL TORITO SPECIFICATION
Volume Size                     : 51 MB
$>

The file command identifies it as an ELF file. Nevertheless, exiftool
identifies it as an ISO file. Virustotal does not detect anything wrong
but identifies it as an ELF file [8]:

MD5 02f4d023109e06ded5e1e438c8e7cf54
SHA-1 c1c31ed29ebad8bbb44221e21a23bcbcfa749648
SHA-256 1065e4ea505969b9a94470d645fb28205afe16b2e422073717877c2cc80adb40
Vhash d87386f358c10d5e3e92962ee598c8dc
SSDEEP 786432:DegLQvyiyhMuyZNalZrpJeKpmpOueJ46G3IhABa9LIYloO3aTuzREoPqlJKpn:Dem
MuyibzuOuvNA9sYmOKSzGcF
File type ELF
Magic ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically
    linked, stripped
File size 50.65 MB (53108736 bytes)

It would be a good way to bypass some AV and IDS protections. For example,
the Ubuntu Linux ISO is a very common download:

$> ./ubuntu-19.10-desktop-amd64.iso
hey!!
$> file ubuntu-19.10-desktop-amd64.iso
ubuntu-19.10-desktop-amd64.iso: ELF 64-bit LSB executable, x86-64,
    version 1 (SYSV), statically linked, not stripped
$> exiftool ubuntu-19.10-desktop-amd64.iso
ExifTool Version Number         : 10.80
File Name                       : ubuntu-19.10-desktop-amd64.iso
Directory                       : .
File Size                       : 2.3 GB
File Modification Date/Time     : 2020:04:08 16:09:53+02:00
File Access Date/Time           : 2020:04:08 16:10:06+02:00
File Inode Change Date/Time     : 2020:04:08 16:10:05+02:00
File Permissions                : rwxrwxr-x
File Type                       : ISO
File Type Extension             : iso
MIME Type                       : application/x-iso9660-image
Volume Name                     : Ubuntu 19.10 amd64
Volume Block Count              : 1203048
Volume Block Size               : 2048
Root Directory Create Date      : 2019:10:17 12:52:27+00:00
Data Preparer                   : XORRISO-1.2.4 2012.07.20.130001,
LIBISOBURN-1.2.4, LIBISOFS-1.2.4, LIBBURN-1.2.4
Volume Create Date              : 2019:10:17 12:53:34.00+00:00
Volume Modify Date              : 2019:10:17 12:53:34.00+00:00
Boot System                     : EL TORITO SPECIFICATION
Volume Size                     : 2.3 GB
$>

----[5.1 - ELF + TAR + ISO

Creating an ISO+TAR+ELF polyglot, with the three valid magic numbers, is
trivial:

$> dd if=polyglottar.tar of=dsl-4.11.rc1.iso  bs=1 count=11104 conv=notrunc

The polyglot is identified as an ISO file or a TAR file, it depends on the tool:

$> file dsl-4.11.rc1.iso
dsl-4.11.rc1.iso: POSIX tar archive (GNU)
$> exiftool dsl-4.11.rc1.iso
ExifTool Version Number         : 11.88
File Name                       : dsl-4.11.rc1.iso
Directory                       : .
File Size                       : 51 MB
File Modification Date/Time     : 2022:02:27 19:48:34+01:00
File Access Date/Time           : 2022:02:27 19:49:12+01:00
File Inode Change Date/Time     : 2022:02:27 19:49:11+01:00
File Permissions                : rwxrwxr-x
File Type                       : ISO
File Type Extension             : iso
MIME Type                       : application/x-iso9660-image
System                          : LINUX
Volume Name                     : KNOPPIX
Volume Block Count              : 25932
Volume Block Size               : 2048
Root Directory Create Date      : 2012:08:01 07:02:27-04:00
Software                        : MKISOFS ISO 9660/HFS FILESYSTEM
BUILDER & CDRECORD CD-R/DVD CREATOR (C) 1993 E.YOUNGDALE (C) 1997
J.PEARSON/J.SCHILLING
Volume Create Date              : 2012:08:03 18:12:29.00-04:00
Volume Modify Date              : 2012:08:03 18:12:29.00-04:00
Volume Effective Date           : 2012:08:03 18:12:29.00-04:00
Boot System                     : EL TORITO SPECIFICATION
Volume Size                     : 51 MB
$>

We can execute the ELF:

$> ./dsl-4.11.rc1.iso
hey!!
$>

Also, we can mount the ISO image:

$> mount | grep KN
/tmp/dsl-4.11.rc1.iso on /media/xxx/KNOPPIX type iso9660
(ro,nosuid,nodev,relatime,nojoliet,check=s,map=n,blocksize=2048,uid=1000,
              gid=1000,dmode=500,fmode=400,iocharset=utf8,uhelper=udisks2)
$> ls /media/xxx/KNOPPIX
boot  index.html  KNOPPIX
$>

--[ 6 - Conclusions

Although polyglottar uses a known mechanism to hide exploits, most
virustotal AV engines don't detect it. They should.

There can be other file formats with the magic number located at exotic
offsets. Those formats can be also exploited to create "structured"
polyglots like polyglottar (e.g. CD-i CD image file, Apple Disk image,
MPEG Transport Stream, etc. [9]).

Polyglots are fun.

--[ 7 - References

[1] Polyglot links. https://github.com/mindcrypt/polyglot

[2] "Abusing File Processing in Malware Detectors for Fun and Profit". 2012
IEEE Symposium on Security and Privacy. DOI: 10.1109/SP.2012.15

[3] Polyglots database. https://github.com/Polydet/polyglot-database

[4] Basic Tar Format. GNU tar: an archiver tool.
https://www.gnu.org/software/tar/manual/html_node/Standard.html

[5] ELF Binary Mangling Part 1 - Concepts
https://netspooky.medium.com/elf-binary-mangling-part-1-concepts-e00cb1352301

[6] https://www.virustotal.com/gui/file/
3015d4ad1d002032ce1f9bc07507562916e190c6130568f886f66d36667438d6/detection

[7] https://www.virustotal.com/gui/file/
2ae17d426ccae26ceb994625524978f6e68721bcda18e8885ff2091031c8900e/detection

[8] https://www.virustotal.com/gui/file/
1065e4ea505969b9a94470d645fb28205afe16b2e422073717877c2cc80adb40/detection

[9] List of file signatures. Wikipedia.
https://en.wikipedia.org/wiki/List_of_file_signatures

------
(cc) Enrique Soriano-Salvador Creative Commons (by-nc-nd).
Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.
------

--[ PREV | HOME | NEXT ]--