Polymorphic shellcodes for samples taken from Shell-Storm

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert Certification. The task for 6/7 assignment is to take up to 3 shellcodes from shell-storm and create polymorphic versions of those samples to beat pattern matching. The requirement is that the polymorphic versions cannot be larger than the 150% of the existing shellcode.

Student ID: SLAE64 - 1594

Shellcode 1 - execve()

The first shellcode I chose from the Shell-Storm website was a generic 64-bit execve() shellcode which is 30 bytes long. So, I created a polymorphic version with the length of 43 bytes (less than 150% of the original shellcode). On the picture below I have shown which exact instructions got replaced with something else and then continue to explain each desicion one step at a time.

1 - Instead of the push rax marked in yellow, I decided to xor rdx, rdx first and use that instead, as I would be needing to set rdx zero anyway to prepare it for the execve() syscall.

2 - I also decided to replace the push rbx (in green) from the original shellcode in a manner where I could get the same result but via another register. For that I simply chose to move the value in rbx to another register which I then push’ed instead.

3 - To prepare the 2nd argument rsi for the syscall, I took another way of first allocating space for it right after I had pushed 8 zero-bytes to the stack. Then moved the value from rdi to an address appointed to by the stack pointer (so right to the spot which I just allocated) and moved that address to rsi.

4 - since I did not clean the rax register somewhere earlier, I made sure it was zeroed out before loading the syscall number to it and executed the syscall.

Shellcode 2 - execveat()

Going through shellcodes at Shell-Storm, I came by a shellcode which used the execveat() syscall. I had never tried it myself before so seemed an interesting way to go with that one. I ended up with a 31 byte long polymorphic version of that shellcode, so only 2 bytes longer than the original one.

1 - the author of the original shellcode wanted to prepare the syscall number 0x142 for execveat() as the very first thing. Without loading the value directly to its intended register rax, he loaded a value 0x42 and then incremented the ah by 1 ending up with the value 0x142. I decided to go the other way around and substract 1 from 0x143 instead, ending up with the same result.

2 - cqo is an intresting instruction, which doubles the size of the operand in register ax/eax/rax and stores the result as dx:ax/edx:eax/rdx:rax. so basically what happens here is that rdx register will simply be zerod out, because of sign-extending. We can get the same result by xor’ing rdx with itself.

3 - the instructions marked in blue, where the program name “/bin/sh” is placed eventually to the top of the stack with push reg instruction, I simply used another register (rdi –> rcx).

4 - I removed the lines marked in red, as there was no need for those instructions. This ended up shortening my shellcode a little bit as well.

Shellcode 3 - execve()

Shellcode number 3 is 27 bytes in length and quite much similar to the first one, but gets the program name for rdi by negating the value in rbx. I decided to go with the imul instruction instead as you can see on the image below.

1 - I chose to substract the value in rax from itself, ending up with a zero.

2 - Next I moved the value 0xEEBE234583299 to rbx. Multiplying it with 0x7 EEBE234583299 × 7 = 68732F6E69622F and converting this hex value to ascii, will give us the string /bin/sh.

3 - cdq replacement with mov rdx, rax allowed me to have the same effect of zeroing out rdx as I assured earlier that the value in rax would be zero.

4 - Finally the pop rsi instruction. What pop essentially does is that it takes the value stored at an address pointed to by rsp and places that value into rsi. As a side-effect, rsp is incremented by 8. So, by using the mov rsi, [rsp] I am going for the value stored at an address rsp and moving it to rsi. Then with add rsp, 0x8 I increment the rsp as this is what would happen automatically if I used the pop instruction.

So what we learnt?

• cqo is good to use when you try to zero out rdx by sign-extending the value in rax
• We can switch up the registers we use in our shellcode super easily
• We can use various combinations of instructions to end up with the same result value