With the recent release of Frida version 9, I got motivated to dive into it some more and figure things out by myself, since the Linux section is disappointingly dry at the moment.
Dynamic Binary Instrumentation
DBI is a runtime analysis technique for code, be it source or binary. You usually come across it in relation to code profiling done in order to optimize performance or find memory leaks.
The principle behind instrumentation is that of injecting your own code to run inside a given process. In layman’s terms, the main difference in principle between instrumenting and debugging is that with a debugger you attach to a process; with instrumentation, you are the process (in some sense).
Illustrative example
Consider the following code:
We wish to instrument the malloc and free calls by inserting our own code. We obviously cannot do this in the .text segment. Instead, the instrumentation engine maps its own region of memory, one that it can both write to and execute from. It then makes a copy of the original code and adds our own, as follows:
In this way, whenever malloc is called, it will in turn call our instrumentation routine, which can print the argument (or change it!) or the return value, increment a counter, print the register values at that point, and so on. Much more complex things can be achieved, such as passing a custom sockaddr_in struct to a connect call.
This technique is known as interception. Instruction level instrumentation is the fine-grained version in which each instruction is instrumented, rather than each function.
Use cases
As mentioned previously, the typical use cases are profiling and tracking down leaks (the Valgrind suite is a good example of this). But there are other interesting use cases as well, such as fault injection, reversing/discovering APIs, building code tracers, side-channel attacks on badly implemented crypto binaries (e.g. by counting instructions), fuzzing and taint analysis.
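To make the instruction-counting side channel a bit more concrete, here is a toy analogue in pure Python. It counts executed source lines with sys.settrace rather than machine instructions, and naive_compare and count_lines are hypothetical names invented for this sketch. A real attack would count instructions with a DBI framework, but the principle is the same: a comparison that bails out early leaks how much of the guess was correct.

```python
import sys

def naive_compare(secret, guess):
    """Early-exit comparison: the work done depends on the matching prefix."""
    for a, b in zip(secret, guess):
        if a != b:
            return False
    return len(secret) == len(guess)

def count_lines(func, *args):
    """Run func(*args) and count how many source lines were executed."""
    count = 0

    def tracer(frame, event, arg):
        nonlocal count
        if event == "line":
            count += 1
        return tracer                 # keep tracing inside this frame

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)        # always restore the previous tracer
    return result, count

if __name__ == "__main__":
    for guess in ("xass", "pxss", "paxs", "pasx"):
        _, lines = count_lines(naive_compare, "pass", guess)
        print(guess, lines)
```

The further into the string the first wrong character sits, the more lines get executed, so an attacker can recover the secret one character at a time.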
Frameworks
Two very efficient and feature-rich instrumentation frameworks are Intel’s Pin and DynamoRIO. Both of them provide a C/C++ API in which you can write your instrumentation code. You then have to compile your code into a dynamic library which will be injected in the desired binary.
Frida
The third option is the relatively recent but fast-growing Frida framework. There are a couple of advantages (or disadvantages, depending on how you look at it). Frida injects a JavaScript interpreter (Duktape by default as of version 9; it’s capable of also injecting the bulkier Google V8 engine) inside the binary, which is capable of running JS code. Now, instead of writing C code, you’re writing JS to instrument your binary. This also means that you don’t have to compile anything. Frida always injects the same interpreter; what gets changed is the instrumentation code written in JavaScript. In effect, you are manipulating low-level elements (basic blocks, instructions) using a high-level language.
Frida is a good excuse for a reverse engineer to learn a bit of JavaScript, or for a web developer to learn a bit of reversing. Being in the former case, I stumbled across the wonderful world of JS, where every Number is a Float, where the triple equals operator exists (and is needed; and I heard of a quad equals operator being requested) and where very interesting (in the most frustrating sense imaginable) things happen.
The comprehensive JS API features some very high-level entities, such as ObjC and Java, which give access to native Objective-C and Java methods and objects and are brilliant to use when working with mobile platforms like Android and iOS.
While the instrumentation code has to be written in JavaScript, the resulting tools can be written in either Python or JS. The injected interpreter can communicate with your application via primitive send and recv methods. The data exchanged has to be serializable to JSON.
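On the Python side, messages from the injected script arrive in a callback as plain dicts. The sketch below shows one way to handle them. As far as I can tell, the bindings deliver a dict with a "type" of "send" (carrying a "payload") or "error" (carrying a "description"); describe_message is a helper name I made up, and the messages in the demo are simulated so that it runs without Frida.

```python
import json

def describe_message(message, data=None):
    """Turn a message dict from the injected script into a printable line."""
    if message.get("type") == "send":
        # The payload is whatever JSON-serializable value the JS code passed to send().
        return "payload: " + json.dumps(message.get("payload"), sort_keys=True)
    if message.get("type") == "error":
        return "script error: " + str(message.get("description"))
    return "unhandled message: " + json.dumps(message, sort_keys=True)

def on_message(message, data):
    # Two-argument shape of the callback registered with script.on("message", ...).
    print(describe_message(message, data))

if __name__ == "__main__":
    # Simulated messages; in a real tool they come from the JS side.
    on_message({"type": "send", "payload": {"func": "malloc", "size": 32}}, None)
    on_message({"type": "error", "description": "ReferenceError: x is not defined"}, None)
```

Keeping the formatting in its own function makes the handler easy to exercise without a live process.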
The REPL
After installing the framework and Python bindings (which is a breeze via pip), you get a collection of tools which have been built using Frida, such as the REPL, frida-discover, frida-ls-devices, frida-ps, frida-trace.
Just like with a debugger, you can use the Frida CLI app to attach to a process or spawn a new one.
We’re given a fully fledged, beautifully colored JS REPL, much like IPython, inside the binary. What’s lacking as of now is interactive help, but that’s what the JS API docs are for.
We can explore the binary a little by enumerating function names from imports, getting addresses from debug symbols (which won’t work on stripped binaries, obviously) and disassembling an instruction at an address.
Let’s build on that last example to disassemble the main function.
Moving on to scripting
That last example was a little extreme for CLI use. We can use it as a building block for a simple disassembly tool. Please note that I like to keep my instrumentation JS code and my Python management code in separate script files.
First, the code which performs the actual disassembly.
This script will receive an address in hex from the Python script, which will in turn be given as a command-line argument.
Next, the management code, pretty easy to read and understand.
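Put together, the two pieces might look roughly like the following sketch. For brevity it embeds the JavaScript in the Python file as a string instead of keeping it in a separate file. The message type "disasm", the payload fields, the instruction count of 20 and the helper parse_address are my own choices for illustration, and the Frida calls are named as I understand the API at the time of writing, so check them against your installed version (older Python bindings, for instance, may spell script.post differently).

```python
import sys

# Instrumentation code: waits for an address, then walks instructions from it.
JS_SOURCE = """
recv('disasm', function (message) {
    var cursor = ptr(message.payload.address);
    for (var i = 0; i < message.payload.count; i++) {
        var insn = Instruction.parse(cursor);
        send(insn.address + '\\t' + insn.toString());
        cursor = insn.next;
    }
});
"""

def parse_address(text):
    """Accept '0x400526' or '400526' and return a normalised hex string."""
    return hex(int(text, 16))

def main(argv):
    import frida  # imported lazily so the helper above works without Frida

    program, address = argv[1], parse_address(argv[2])
    pid = frida.spawn([program])          # the process starts suspended
    session = frida.attach(pid)
    script = session.create_script(JS_SOURCE)
    script.on("message",
              lambda message, data: print(message.get("payload", message)))
    script.load()
    script.post({"type": "disasm",
                 "payload": {"address": address, "count": 20}})
    sys.stdin.read()                      # messages are asynchronous; wait for Ctrl-D
    session.detach()

if __name__ == "__main__":
    main(sys.argv)
```

The process is never resumed here, which is fine for this purpose: we only want to read its code, not run it.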
Now we can test it.
Brilliant!
Building our own ltrace
Let’s use Frida’s Interceptor to trace all malloc and free calls performed by a binary, similar to ltrace. We want to know how much memory each malloc call requests, the pointer values returned and the argument passed to free.
We’ll be using pretty much the same Python script, but do note the frida.resume(pid) call, which gets the process to resume execution.
Frida’s Interceptor can auto-detect some of the common calling conventions. If this weren’t the case, we could simply use the global context to read registers, or navigate through memory to retrieve the arguments.
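A minimal sketch of such a tracer is shown below, again with the JavaScript embedded as a string for brevity. The payload field names (func, size, ret, ptr) and the helper format_call are hypothetical, and the Frida calls are named as I understand the API at the time of writing; newer releases may have renamed some of them (the export lookup in particular), so treat this as an outline rather than a drop-in tool.

```python
import sys

JS_SOURCE = """
var mallocPtr = Module.findExportByName(null, 'malloc');
var freePtr = Module.findExportByName(null, 'free');

Interceptor.attach(mallocPtr, {
    onEnter: function (args) {
        this.size = args[0].toInt32();   // remember the size until onLeave
    },
    onLeave: function (retval) {
        send({func: 'malloc', size: this.size, ret: retval.toString()});
    }
});

Interceptor.attach(freePtr, {
    onEnter: function (args) {
        send({func: 'free', ptr: args[0].toString()});
    }
});
"""

def format_call(payload):
    """Render one traced call in an ltrace-like style."""
    if payload["func"] == "malloc":
        return "malloc(%d) = %s" % (payload["size"], payload["ret"])
    return "free(%s)" % payload["ptr"]

def on_message(message, data):
    if message.get("type") == "send":
        print(format_call(message["payload"]))
    else:
        print(message)                   # e.g. errors raised by the JS code

def main(argv):
    import frida  # imported lazily so format_call works without Frida

    pid = frida.spawn([argv[1]])
    session = frida.attach(pid)
    script = session.create_script(JS_SOURCE)
    script.on("message", on_message)
    script.load()
    frida.resume(pid)                    # hooks are in place; let it run
    sys.stdin.read()                     # keep tracing until Ctrl-D
    session.detach()

if __name__ == "__main__":
    main(sys.argv)
```

Note the order: the script is loaded while the process is still suspended, and only then is it resumed, so no early allocations are missed.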
And that’s about it. Let’s see if this works.
frida-trace
Let’s redo the last example using frida-trace, a nifty tracer built using Frida.
Notice that, again, Frida has no inner understanding of malloc, free and their respective arguments. The frida-trace tool has generated handler stubs for us in the local directory, which we can modify to our liking.
Both stubs look something like this (discarding helpful comments):
We can change these to supply us with useful information. In the case of free, we need only print the argument.
While for malloc, we’re also interested in the return value.
If we run frida-trace again, it will use the handlers we just modified.
That’s about it for this session. Stay tuned for more in the (hopefully) near future, when I’ll dive into the Stalker API and provide a fun use-case.