Context
I’ve been looking into symbolic execution lately and, more specifically, angr, a binary analysis framework which is also capable of symbolically executing binaries.
So what’s all this “symbolic” execution about?
It’s simpler to explain visually. Suppose we have the following simple C program:
#include <stdio.h>
int main(int argc, char* argv[])
{
int x = 0;
scanf("%d", &x);
if (x % 2 == 0)
printf ("x is even\n");
else
printf ("x is odd\n");
return 0;
}
When executing this program, after the user supplies the value of x
(i.e. 5), it is set in stone from that point on (or until the program ends or changes the value of x
somehow). We say that x
has a concrete value. Since x
is fixed, we can safely say that the program will only run through one of the two possible paths.
Symbolic execution allows for variables, registers, memory regions and even file descriptors to be symbolic. This means that in a symbolic context, x
is no longer the (concrete) value supplied by the user, but rather a symbolic value. This new type of value will reach the if
statement which will trigger the symbolic execution engine to fork on two different paths: one in which x % 2 == 0
holds true (x
still being symbolic, and the entire expression as well) and one in which the negation holds true. This way, the engine explores all possible paths of the program.
As the paths are explored, symbolic expressions are built and constraints are added to the forked states. These constraints can then be passed on to an SMT solver like Z3 which will determine a concrete value which satisfies said constraints.
How does this help me in a practical way?
Think of a program which requires a valid input to grant you access to some file or network. A program can be seen as a complex graph of basic blocks of code. This graph most likely has a “start” node, a “success” node and a “failure” node, among many other intermediate nodes. You want to find the answer to the question: how do I get from the “start” node to the “success” node while avoiding the “failure” node? The answer to your question can be given through symbolic execution.
There are practical limitations to symbolic execution, such as path explosion, which won’t be discussed in this post.
Example
We’ll use the binary from tutorial 2 from Radare2 Explorations.
$ ./xor
Enter the password: 1234
Wrong!
Let’s have a look at the internals of this binary using radare2.
╒ (fcn) sym.main 256
│ ---------------CUT--------------------
│ 0x08048440 6880860408 push str.Enter_the_password: ; str.Enter_the_password: ; "Enter the password: " @ 0x8048680
│ 0x0804845d e8eefeffff call sym.imp.printf
│ 0x08048462 58 pop eax
│ 0x08048463 5a pop edx
│ 0x08048464 57 push edi
│ 0x08048465 6895860408 push str._32s ; str._32s ; "%32s" @ 0x8048695
│ 0x0804846a e821ffffff call sym.imp.__isoc99_scanf
│ 0x0804846f 59 pop ecx
│ 0x08048470 58 pop eax
│ 0x08048471 8d45d7 lea eax, [ebp - local_29h]
│ 0x08048474 50 push eax
│ 0x08048475 57 push edi
│ 0x08048476 e835010000 call sym.check
│ 0x0804847b 83c410 add esp, 0x10
│ 0x0804847e 85c0 test eax, eax
│ ┌─< 0x08048480 741c je 0x804849e
│ │ 0x08048482 83ec0c sub esp, 0xc
│ │ 0x08048485 68a1860408 push str.Good_job__:_ ; str.Good_job__:_ ; "Good job! :)" @ 0x80486a1
│ │ 0x0804848a e8d1feffff call sym.imp.puts
│ │ 0x0804848f 83c410 add esp, 0x10
│ │ ; JMP XREF from 0x080484ae (sym.main)
│ ┌──> 0x08048492 8d65f8 lea esp, [ebp - local_8h]
│ ││ 0x08048495 31c0 xor eax, eax
│ ││ 0x08048497 59 pop ecx
│ ││ 0x08048498 5f pop edi
│ ││ 0x08048499 5d pop ebp
│ ││ 0x0804849a 8d61fc lea esp, [ecx - 4]
│ ││ 0x0804849d c3 ret
│ ││ ; JMP XREF from 0x08048480 (sym.main)
│ │└─> 0x0804849e 83ec0c sub esp, 0xc
│ │ 0x080484a1 689a860408 push str.Wrong_ ; str.Wrong_ ; "Wrong!" @ 0x804869a
│ │ 0x080484a6 e8b5feffff call sym.imp.puts
│ │ 0x080484ab 83c410 add esp, 0x10
╘ └──< 0x080484ae ebe2 jmp 0x8048492
The program seems fairly simple. It reads a password from stdin, calls the sym.check
function, and then prints “Good job! :)” or “Wrong!” depending on the result of the verification.
We’re going to solve this blindly. angr
is going to do all of the work for us. We don’t even care what the check function does. We just need to tell angr
that we want an input which gets us at 0x08048485
, which is our “success” state, and avoid 0x080484a1
, which is our “failure” state.
import angr
def main():
p = angr.Project("./xor", load_options={'auto_load_libs': False})
ex = p.surveyors.Explorer(find=(0x08048485,), avoid=(0x080484a1,))
ex.run()
return ex.found[0].state.posix.dumps(0).strip('\0\n')
if __name__=='__main__':
print main()
Let’s see how well it does.
$ time python solve.py
MONO[th4t_wa5~pr3tty=ea5y_r1gh7]
real 0m2.210s
user 0m2.084s
sys 0m0.104s
$ ./xor
Enter the password: MONO[th4t_wa5~pr3tty=ea5y_r1gh7]
Good job! :)
Almost feels like cheating, doesn’t it?
Have fun with angr in your future endeavors!