Context
I’ve been looking into symbolic execution lately and, more specifically, angr, a binary analysis framework which is also capable of symbolically executing binaries.
So what’s all this “symbolic” execution about?
It’s simpler to explain visually. Suppose we have the following simple C program:
When executing this program, after the user supplies the value of x
(i.e. 5), it is set in stone from that point on (or until the program ends or changes the value of x
somehow). We say that x
has a concrete value. Since x
is fixed, we can safely say that the program will only run through one of the two possible paths.
Symbolic execution allows for variables, registers, memory regions and even file descriptors to be symbolic. This means that in a symbolic context, x
is no longer the (concrete) value supplied by the user, but rather a symbolic value. This new type of value will reach the if
statement which will trigger the symbolic execution engine to fork on two different paths: one in which x % 2 == 0
holds true (x
still being symbolic, and the entire expression as well) and one in which the negation holds true. This way, the engine explores all possible paths of the program.
As the paths are explored, symbolic expressions are built and constraints are added to the forked states. These constraints can then be passed on to an SMT solver like Z3 which will determine a concrete value which satisfies said constraints.
How does this help me in a practical way?
Think of a program which requires a valid input to grant you access to some file or network. A program can be seen as a complex graph of basic blocks of code. This graph most likely has a “start” node, a “success” node and a “failure” node, among many other intermediate nodes. You want to find the answer to the question: how do I get from the “start” node to the “success” node while avoiding the “failure” node? The answer to your question can be given through symbolic execution.
There are practical limitations to symbolic execution, such as path explosion, which won’t be discussed in this post.
Example
We’ll use the binary from tutorial 2 from Radare2 Explorations.
Let’s have a look at the internals of this binary using radare2.
The program seems fairly simple. It reads a password from stdin, calls the sym.check
function, and then prints “Good job! :)” or “Wrong!” depending on the result of the verification.
We’re going to solve this blindly. angr
is going to do all of the work for us. We don’t even care what the check function does. We just need to tell angr
that we want an input which gets us at 0x08048485
, which is our “success” state, and avoid 0x080484a1
, which is our “failure” state.
Let’s see how well it does.
Almost feels like cheating, doesn’t it?
Have fun with angr in your future endeavors!