hackme: Deconstructing an ELF File

A friend recently asked me to find the password for a little hard-to-hack program he had written and I agreed to it. The short journey of a few hours that led me to its password were extremely interesting and this article describes the process as well as some of the new techniques learnt along the way.

Few minutes after accepting his challenge, I received a binary called "hackme" in an E-mail and I got started! Those interested in giving it a shot can download the binary file and get back to this article later. Do let me know if you find anything interesting along the way that I did not think of or missed! Comments can be sent to manohar dot vanga at gmail dot com with the title [hackme]. UPDATE: You can post comments to the Hacker News discussion.

Test Run

I ran the binary and tried a couple of random passwords. As expected, they failed, providing me with an extremely helpful message:

$ ./hackme
Password, please? password
Oops..

Interestingly, running the binary under GDB gave a specially crafted welcome message:

$ $ gdb ./hackme 
Reading symbols from /tmp/hack/hackme...(no
debugging symbols found)...done.
(gdb) r
Starting program: ./hackme 
Fuck off! no debuggers!

Program exited with code 0364.
(gdb) 

Same with ptrace:

$ strace ./hackme 
execve("./hackme", ["./hackme"], [/* 41 vars */]) = 0
brk(0)                                  = 0x9016000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such
file or directory)
... snip ...
ptrace(PTRACE_TRACEME, 0, 0, 0)         = -1 EPERM (Operation
not permitted)
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...})
= 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb783e000
write(1, "Fuck off! no debuggers!\n", 24Fuck off! no debuggers!
) = 24
_exit(2543604)                          = ?

The Obvious Stuff

Even though the probability of a password being visible in plain sight was near zero, I decided to give it a shot anyway.

First, I checked if the binary was stripped or not:

$ file hackme
hackme: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.27, stripped

It was stripped. Dead end there. GDB would not be much help with a stripped binary in terms of deconstructing the logic. I decided to try looking for a literal string comparison with the password within the binary just in case:

$ strings hackme
/lib/ld-linux.so.2
libdl.so.2
__gmon_start__
_Jv_RegisterClasses
dlopen
dlsym
libc.so.6
_IO_stdin_used
__libc_start_main
random
GLIBC_2.1
GLIBC_2.0
PTRh 
QVhE
[^Ph
[^_]
8%':!06!
%!'460
&64;3
%'<;!3
UWVS
[^_]
Fuck off! no debuggers!
Password, please? 
Congratulations!
Oops..

Just in case, I try all the strings as passwords but they all fail. Not too surprising. The output however gives me the message that gets printed on success; "Congratulations!". It also seems to be containing the string "libc.so.6". Fishy. A quick ltrace quickly explains what the binary is doing:

$ ltrace ./hackme 
__libc_start_main(0x8048645, 1, 0xbfb48a04, 0x80486b0, 0x8048720
<unfinished ...>
dlopen("/lib/libc.so.6", 2)
= 0xb7757ae0
dlsym(0xb7757ae0, "ptrace")
= 0x00eddf40
dlsym(0xb7757ae0, "scanf")
= 0x00e621a0
dlsym(0xb7757ae0, "printf")
= 0x00e5baa0
Fuck off! no debuggers!
+++ exited (status 244) +++

While this gives us the same welcoming message, we can see what is happening here. The libc shared library is being opened dynamically and the addresses of ptrace, scanf and printf are being retreived with dlsym! Sneaky trick!

More irritatingly however, the strings output shows that the binary is using the random() function. However, since it is a reproducible program, that is the password works every time, the call to random is probably not being seeded. We will worry about this later.

The strings output also explains how the binary is detecting the debugging environment. Calling ptrace while running inside a ptraced environment (eg. strace, ltrace or gdb) returns -1.

Getting over this debugger hurdle however is quite easy using the LD_PRELOAD environment variable. The LD_PRELOAD environment variable can contain a list of custom shared objects which are then loaded prior to all others when running an executable. It is an easy route to blocking a process from calling unwanted functions. I quickly wrote a quick dummy function in a new file:

/* fake ptrace() */
#include <stdio.h>

long ptrace(int x, int y, int z)
{
	printf("B-)\n");
	return 0;
}

...and compiled it:

gcc -shared -fPIC -o fake.so fake.c

Now running under strace with the LD_PRELOAD variable set to our fake shared object gives us the full scoop!

$ strace -E LD_PRELOAD=./fake.so ./hackme
execve("./hackme", ["./hackme"], [/* 24 vars */]) = 0
brk(0)                                  = 0x9727000
access("/etc/ld.so.nohwcap", F_OK)      = -1 ENOENT (No such
file or directory)
mmap2(NULL, 8192, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78a6000
open("./fake", O_RDONLY)                = 3
read(3,
"\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\240\3\0\0004\0\0\0"...,
512) = 512
... snip ...
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb78a1000
write(1, "Password, please? ", 18Password, please? )      = 18
read(0, password
"password\n", 1024)             = 9
write(1, "Oops..\n", 7Oops..
)                 = 7
exit_group(7)                           = ?

Looks like the password buffer is 1024 bytes long. I could try overflowing that but coupled with stack randomization (which can be turned off if I remember correctly), it would be hard to do on a lazy Friday. More importantly, my goal is not the break the program but to get the password!

It is starting to seem like my only option left is to sit and reverse engineer the code for this binary; something I did not want to get in to on a Friday afternoon. A geek challenge however trumps laziness any day of the week so I started with the task of disassembling the binary.

Disassembly

I start with an objdump output (go ahead and open it in a new tab if you want to follow along):

$ objdump -D ./hackme > out.asm

The assembly is a mess as expected of a stripped binary. I quickly need to find the part that pertains to the password encryption logic. From the test run, I know that the logic is somewhere between where it prints the "Password, please?" message and the "Oops.." message. I will start by locating these strings in the assembly output and backtracing to where they are being used. The "Pa" of the "Password, please?" string is 50 followed by 61 in hexadecimal. A quick search locates the string in the assembly output:

$ grep "50 61" objdumpout.txt 
 8048798:	00 50 61             	add    %dl,0x61(%eax)

The address of the string therefore is 0x8048799 (since the first byte needs to be skipped). Searching the file for this address leads me to this code:

 804859d:       68 99 87 04 08          push   $0x8048799
 80485a2:       ff 15 94 99 04 08       call   *0x8049994

Great! It is pushing the address of the string (as a literal) on to the stack and calling an address. I assume that this is the variable storing the address returned by dlsym for printf.

Now I need to find the "Oops.." message. I repeat the same procedure to get to the string and find the code:

 8048633:       68 c1 87 04 08          push   $0x80487c1
 8048638:       ff d0                   call   *%eax

While I am at it, I pick out the "Congratulations!" message as well and find the location. Finally the code looks much more comprehensible:

 # The "Password, please?" message is being printed here
 804859d:	68 99 87 04 08       	push   $0x8048799
 80485a2:	ff 15 94 99 04 08    	call   *0x8049994
 80485a8:	8d 45 84             	lea    -0x7c(%ebp),%eax
 ... snip ...
 8048626:	83 ec 0c             	sub    $0xc,%esp
 # The "Congratulations!" message is being printed here
 8048629:	68 af 87 04 08       	push   $0x80487af
 804862e:	eb 08                	jmp    8048638 <dlopen@plt+0x268>
 8048630:	83 ec 0c             	sub    $0xc,%esp
 # The "Oops.." message is being printed here
 8048633:	68 c1 87 04 08       	push   $0x80487c1
 8048638:	ff d0                	call   *%eax

I quickly annotate the assembly (bottom posting of comments) so I remember what I figured out:

 804859d:	68 99 87 04 08       	push   $0x8048799
 80485a2:	ff 15 94 99 04 08    	call   *0x8049994
 # The "Password, please?" message is being printed here

 80485a8:	8d 45 84             	lea    -0x7c(%ebp),%eax
 # This is probably the address of the password buffer.

 80485ab:	5b                   	pop    %ebx
 80485ac:	5e                   	pop    %esi

 80485ad:	50                   	push   %eax
 80485ae:	68 ac 87 04 08       	push   $0x80487ac
 80485b3:	ff 15 90 99 04 08    	call   *0x8049990
 80485b9:	83 c4 10             	add    $0x10,%esp
 # Push the password buffer and the string "%s" onto the stack and call scanf

 80485bc:	31 c0                	xor    %eax,%eax
 # Clear EAX.

 80485be:	eb 01                	jmp    80485c1 <dlopen@plt+0x1f1>
 80485c0:	40                   	inc    %eax
 80485c1:	80 7c 05 84 00       	cmpb   $0x0,-0x7c(%ebp,%eax,1)
 80485c6:	75 f8                	jne    80485c0 <dlopen@plt+0x1f0>
 # Find the string length of the password we entered. Return value in EAX.

 80485c8:	31 db                	xor    %ebx,%ebx

 80485ca:	83 f8 13             	cmp    $0x13,%eax
 80485cd:	0f 94 c3             	sete   %bl
 # Hmm! If the strlen(buf) != 0x13) BL is set to 1! We have our first hint!

 80485d0:	be 0a 00 00 00       	mov    $0xa,%esi
 # Move integer 10 into ESI. This is the start of a loop that runs 10 times.

 80485d5:	e8 b6 fd ff ff       	call   8048390 <random@plt>
 # Call random(). Return value in EAX

 80485da:	b9 13 00 00 00       	mov    $0x13,%ecx
 80485df:	99                   	cltd   
 80485e0:	f7 f9                	idiv   %ecx
 # Divide the random number in EAX with 19. EAX is quotient, EDX is remainder.

 80485e2:	31 c0                	xor    %eax,%eax
 # Throw away quotient.

 80485e4:	8a 8a 9c 86 04 08    	mov    0x804869c(%edx),%cl
 # Hmm. That address looks like a lookup table of some sort.
 # The operation is basically doing "CL = table[remainder]".
 # Since remainder can't be more that 19, I dump the first 19 bytes of this
 # address:
 #     0xfb, 0x4c, 0x8d, 0x58, 0x0f, 0xd4, 0xe8, 0x94, 0x98, 0xee,
 #     0x6b, 0x18, 0x30, 0xe0, 0x55, 0xc5, 0x28, 0x0e

 80485ea:	0f b6 7c 15 84       	movzbl -0x7c(%ebp,%edx,1),%edi
 # This basically does EDI = password[remainder]

 80485ef:	42                   	inc    %edx
 80485f0:	89 95 74 ff ff ff    	mov    %edx,-0x8c(%ebp)
 # Increment the remainder and store it in another variable

 80485f6:	31 d2                	xor    %edx,%edx
 80485f8:	eb 0c                	jmp    8048606 <dlopen@plt+0x236>
 80485fa:	69 c0 8d 78 01 6d    	imul   $0x6d01788d,%eax,%eax
 8048600:	42                   	inc    %edx
 8048601:	05 39 30 00 00       	add    $0x3039,%eax
 8048606:	3b 95 74 ff ff ff    	cmp    -0x8c(%ebp),%edx
 804860c:	7c ec                	jl     80485fa <dlopen@plt+0x22a>
 # This is a weird loop. It seems to be a pseudorandom generator.
 # The loop runs while a counter is less than the incremented remainder above.
 # Inside, it's doing the following (remember eax was cleared above to 0):
 #     eax = eax * 0x6d01788d //This is a prime number according to Wolfram Alpha
 #     eax += 0x3039 // 12345 in decimal
 # That is an unseeded (or seeded to 0) pseudorandom generator! Nice but
 # pointless as it is unseeded.

 804860e:	31 f8                	xor    %edi,%eax
 # XOR the pseudorandom value above with password[remainder] as stored above

 8048610:	38 c1                	cmp    %al,%cl
 # Compare the lower byte of the XOR'ed result with the lookup table entry stored in CL

 8048612:	b8 00 00 00 00       	mov    $0x0,%eax
 8048617:	0f 45 d8             	cmovne %eax,%ebx
 # If the lower byte of the XOR is not equal to the lookup table entry set EBX=0

 804861a:	4e                   	dec    %esi
 804861b:	75 b8                	jne    80485d5 <dlopen@plt+0x205>
 # Decrement the main loop counter (the one that runs 10 times) and jump
 # if more iterations are left

 804861d:	85 db                	test   %ebx,%ebx
 804861f:	a1 94 99 04 08       	mov    0x8049994,%eax
 8048624:	74 0a                	je     8048630 <dlopen@plt+0x260>
 # At last! Jump to the failure message (past the congratulations) if EBX is 0!
 # EBX should be non-zero in order to print the congratulations message!

 8048626:	83 ec 0c             	sub    $0xc,%esp

 # The "Congratulations!" message is being printed here
 8048629:	68 af 87 04 08       	push   $0x80487af
 804862e:	eb 08                	jmp    8048638 <dlopen@plt+0x268>
 8048630:	83 ec 0c             	sub    $0xc,%esp

 # The "Oops.." message is being printed here
 8048633:	68 c1 87 04 08       	push   $0x80487c1
 8048638:	ff d0                	call   *%eax

Phew! That was not as bad as I expected! Converting that logic to C code that dumps out the password took me a while with testing and the final result looked like the following:

#include <stdio.h>
#include <string.h>

int main()
{
	int i, j, edi;
	char buf[50], ch;
	char out[50];
	unsigned char check;
	int ret = 0, val, len, rem;
	int magic;
	int k;
	unsigned char arr[] = {0x6a, 0xfb, 0x4c, 0x8d, 0x58, 0x0f, 0xd4, 0xe8,
		0x94, 0x98, 0xee, 0x6b, 0x18, 0x30, 0xe0, 0x55, 0xc5, 0x28,
		0x0e};
	
	for (i = 0; i < 19; i++)
		out[i] = 'x';
	out[i] = '\0';

	for (i = 10; i > 0; i--) {
		int m2;

		val = random();
		rem = val%19;
		check = arr[rem] & 0xff;
		ch = buf[rem++];

		j = 0;
		magic = 0;
		printf("rem = %d\n", rem);
		while (j < rem) {
			magic *= 1828812941;
			magic += 12345;
			j++;
		}
		m2 = magic;

		magic ^= ch;
		out[rem - 1] = (m2 & 0xff) ^ (check & 0xff));
	}
	printf("Password: %s\n", out);
}

Now for the fun; the output:

$ ./decompiled
rem = 3
rem = 16
rem = 4
rem = 4
rem = 11
rem = 9
rem = 11
rem = 12
rem = 3
rem = 8
Password: xxsaxxxpexYoxxxexxx

The loop in the binary runs only 10 times and it has repeating checks for some offsets of the password. The only characters of the password that matter are the ones NOT marked with an 'x' in the output (I make the program set these as it runs).

Now the sweet part! I run the password against the original program:

$ ./hackme 
Password, please? xxsaxxxpexYoxxxexxx
Congratulations!

HA! That was fun!

Conclusions

So what did I learn?

Know Thy Tools

I knew how to get around many of the hurdles I faced from past knowledge and experiences with the various tools I used. The better you know your tools, the more you can think about the real issues at hand (in this case, reversing the program logic).

Test the Waters

I knew I would not find an easy way to reverse this program but I tried all the easy routes anyway. Even though it did not provide me with much information, I gained confidence after having eliminated some of the options. It cleared the way forward.

Know Thy Assembly

The machine instructions were a little hard to decompile and I found myself referencing the Intel manuals from time to time to figure out what was going on. More than the assembly itself though, I really recommend learning the GNU Assembler syntax. I was familiar with the Intel syntax (eg. NASM) but not completely proficient with GAS syntax (AT&T syntax). I found this article and this article very useful in quickly brushing this up.

Some thoughts on the program itself follow

All in all, a good Friday afternoon! Again, comments can be mailed to manohar dot vanga at gmail dot com with a title of [hackme].

Download: all files