Early on in my career, I was a Solaris Systems Administrator.
Solaris, we’d scoff, is far superior an Operating System to Linux. When asked why, we’d point to three things: Solaris Zones, ZFS, and DTrace.
Then along came Oracle. They acquired Sun Microsystems, closed-sourced Solaris, ratcheted up licensing fees and laid off most of Sun’s talent. Within a few short years, they had killed Solaris, leaving Linux to pick through the bones.
Linux got Zones, in a big way. They eventually called it Docker and used it to revolutionise the way we deliver software.
Linux also got ZFS, through OpenZFS (although, for political reasons involving those cheeky scamps at Oracle again, it is unlikely to ever be part of the mainline kernel.)
But Linux never did get DTrace…Until now, that is.
Well, sort of. bpftrace
is the spiritual successor to DTrace. Like DTrace, it enables the tracing of pretty much any activity that happens within the OS. Like DTrace, it has a powerful scripting language and suite of tools. But where the two differ is that bpftrace is built atop a revolutionary technology called eBPF.
eBPF allows the running of sandboxed code in response to events occurring (or, more properly, probes firing). This facilitates the tracing (and even modification) of any program running in both user-space and kernel-space. There are many resources out there for eBPF; it has its own homepage, and even a movie!.
However, writing and running eBPF programs is a non-trivial exercise. It requires knowledge of C, and a learning curve steep enough to deter all but the most adventurous of engineers. But this is where bpftrace
comes in: it acts as a front-end for eBPF. It allows one to specify probes and actions through a scripting language, and transparently handles the attaching of those probes into the kernel.
The visibility and debugging powers that bpftrace
provides are truly game-changing, and what better way to illustrate them than a pointless example: let’s build a key-logger!
The Key Logger
Our example will do one simple thing: log every bash command which is run by any user across our OS.
But where to start? There must be a function which is responsible for reading bash commands from the terminal.
We could use bpftrace
to figure out which function is called when enter commands. But in the interests of keeping this article simple, let’s just say there is a function, and it’s called readline
.
Bash is of course OSS, and we can see the actual function code here.
The function signature of readline()
looks like this:
/* Read a line of input. Prompt with PROMPT. An empty PROMPT means
none. A return value of NULL means that EOF was encountered. */
char * readline (const char *prompt)
readline
is a function that takes a string prompt - i.e. some message to prompt the user into entering some text - and returns a string. The string is the text that was duly entered by the prompted user, delimited by a carriage return. But don’t take my word for that, let’s observe it in action, using bpftrace
.
In order to do this, we’ll use a certain flavour of eBPF probe, a uprobe
.
First, what is a probe? A probe is an eBPF program that can be attached to a given location in any code. For ‘location’ think function (or, more properly, symbol). Whenever that function is invoked, the probe fires, and the eBPF program is executed.
Uprobes are the user-space variant of a probe (the kernel-space equivalents are called kprobes
, there are also tracepoints
and many other probe types).
Our First Bpftrace Script
OK, We need to attach a uprobe
to the readline
symbol in the bash
binary.
As a brief aside, if we didn’t know the symbol we needed to inspect, we can list all the symbols of a compiled binary using the objtool
command:
$ objtool -T /bin/bash | grep readline
A bpftrace
probe which matches the readline
symbol is constructed like this:
uprobe:/bin/bash:readline uprobe - binary - symbol
And we can use that probe in a bpftrace program like:
bpftrace -e 'uprobe:/bin/bash:readline { printf("%s\n", str(arg0)) }
Let’s unpack this a little bit.
The bpftrace -e
bit tells bpftrace to execute the following instruction, it makes this command a one-liner.
This is followed by our probe uprobe:/bin/bash:readline
.
Whenever this probe fires, the code within the curly braces will be executed: { printf("%s\n", str(arg0)) }
.
The code itself should be immediately recognisable as a printf
statement, but the str(arg0)
argument bears some explaining…
With uprobes
we get access to the actual arguments that the function was called with, conveniently populated as arg0...argN
. These arguments are always uint64
s and so need to be cast to their concrete types. In our case, the str()
function de-references the argument.
Now, if we run that bpftrace command in one terminal session, and then hit return a couple of times in another, we should see some output like this:
This is, of course, our prompt (the ever pliable PS1), which is nice…but not quite what we are looking for. We’re building a key-logger, remember? We need to see the text that the user entered in response to the prompt. We need the return value of readline
, and for that we need a slightly different breed of uprobe
, a uretprobe
.
As you’ve maybe intuited, uretprobes
work exactly the same way as uprobes
except they fire when a function returns, not when it is called.
bpftrace -e 'uretprobe:/bin/bash:readline { printf("%s\n", str(retval)) }'
Notice too that there is a slight change to the printf
statement, we are now printing the value of retval
. This is a bpftrace builtin which is set to the return value of the function being traced (in C, functions can only return a single value).
If we run this command in one terminal, and enter some commands in another we should see this:
There - we are logging every bash command that any user across our system executes!
Summary
Now, clearly our key-logger isn’t going to change anybody’s life. BUT, let’s just step back and reflect: we attached a probe that will fire whenever a particular user-space function is invoked. We were able to log both the function arguments and its return value, and we did all this with a single one-liner. And we’ve barely scratched the surface of what bpftrace can do, the possibilities are endless.
When I discovered DTrace as a fresh-faced engineer, it opened a portal into a hitherto uncharted world. I could suddenly witness the inner workings of the OS, and its interactions with CPU, memory and hardware; it led me into a 20-year dalliance with systems programming and OS internals. My hope is that bpftrace
can do the same for a new generation of engineers. Thanks for reading!