A very simple eBPF program
Introduction
Last year I saw a Computerphile video on eBPF which got me interested in exploring the topic a little more. eBPF is sometimes described as "JavaScript for the kernel" in the sense that it provides a safe virtual machine running inside the kernel, similar to how browsers come with a JavaScript engine. Programs targeting this virtual machine are compiled to a special instruction set and statically verified before execution to ensure that they come to a halt. Originally the "BPF" in eBPF was short for Berkeley Packet Filter and the "e" was added for "extended", but nowadays it can do a lot more than just filter packets.
I wanted to write a very simple eBPF program that makes use of its capability to attach a probe to a kernel function in order to trace it. My first attempt was with BCC. It comes with plenty of examples and includes a Python frontend that takes care of compiling and loading the BPF program. This is very convenient, but some of the magic it performs (in particular the LLVM bindings) made it hard for me to understand how it works.
I then tried libbpf instead, which I found a lot easier to understand. The two main ingredients of a BPF application are a userspace program and a BPF program that can exchange some data. With libbpf both programs are written in C (although it should also be possible to write the userspace program in Rust with libbpf-rs). Many examples can be found in the libbpf-bootstrap repository, and most of my program is based on what I saw there.
The very simple program I ended up writing attaches a probe to the kernel function tcp_set_state.
The full code can be found in the repository tcpeek (github) or tcpeek (sourcehut); here I will only post the most important excerpts.
The BPF program
We begin with the BPF program, i.e. with the code that should be run when the kernel enters tcp_set_state.
We have to include some libbpf header files that provide helper functions, macros to define the probes and shared data structures, as well as macros to read data from kernel memory.
We also include a special header, vmlinux.h:
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>
    #include <bpf/bpf_core_read.h>
The vmlinux.h header file contains the BTF (BPF Type Format) information known to the kernel, such as kernel structs or enums.
For example, we can find a copy of the definition of the sock struct which we will need below:
    struct sock {
        struct sock_common __sk_common;
        // ...
    };

    struct sock_common {
        union {
            __addrpair skc_addrpair;
            struct {
                __be32 skc_daddr;
                __be32 skc_rcv_saddr;
            };
        };
        // ...
    };
The kernel exposes the BPF type information in the file /sys/kernel/btf/vmlinux and the corresponding vmlinux.h header file can be generated from it with bpftool as follows:
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
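There are a few different data structures (called maps) that are used to communicate between the BPF program running in the kernel and the userspace program. Here we set up a ringbuffer. Note that for this map type max_entries is the size of the buffer in bytes rather than a number of entries. The kernel expects a power-of-two multiple of the page size; as far as I know, recent libbpf versions round smaller values such as the 1024 used here up accordingly: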
    struct {
        __uint(type, BPF_MAP_TYPE_RINGBUF);
        __uint(max_entries, 1024);
    } events SEC(".maps");
To push an item into the ringbuffer, we must first reserve some space on it with bpf_ringbuf_reserve(void *ringbuf, __u64 size, __u64 flags).
This returns a pointer to memory that we can directly write data to, or NULL if no space is available.
The new item then has to be submitted with bpf_ringbuf_submit.
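To make the reserve/submit protocol more concrete, here is a toy userspace model in plain C. It is not how the kernel implements BPF_MAP_TYPE_RINGBUF, and all names in it (toy_ringbuf, toy_reserve, toy_submit, toy_consume) are made up for this sketch. It only illustrates the idea that a reserved record is written in place and stays invisible to the consumer until it is submitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SLOTS 8      /* hypothetical: number of fixed-size records */
#define TOY_SLOT_SIZE 16 /* hypothetical: payload bytes per record */

struct toy_slot {
    _Alignas(8) unsigned char payload[TOY_SLOT_SIZE];
    bool committed; /* set by toy_submit */
};

struct toy_ringbuf {
    struct toy_slot slots[TOY_SLOTS];
    size_t head; /* number of records reserved so far */
    size_t tail; /* number of records consumed so far */
};

/* Reserve space for one record; NULL if it does not fit or the buffer is full. */
static void *toy_reserve(struct toy_ringbuf *rb, size_t size)
{
    if (size > TOY_SLOT_SIZE || rb->head - rb->tail == TOY_SLOTS)
        return NULL;
    struct toy_slot *slot = &rb->slots[rb->head % TOY_SLOTS];
    slot->committed = false;
    rb->head++;
    return slot->payload;
}

/* Publish a record previously returned by toy_reserve. */
static void toy_submit(void *payload)
{
    assert(payload != NULL);
    /* payload is the first member, so it shares its address with the slot */
    struct toy_slot *slot = payload;
    slot->committed = true;
}

/* Return the oldest record if it has been submitted, otherwise NULL. */
static const void *toy_consume(struct toy_ringbuf *rb)
{
    if (rb->tail == rb->head)
        return NULL; /* nothing has been reserved */
    struct toy_slot *slot = &rb->slots[rb->tail % TOY_SLOTS];
    if (!slot->committed)
        return NULL; /* reserved but not yet submitted */
    rb->tail++;
    return slot->payload;
}
```

In this model a record written through the pointer from toy_reserve only shows up in toy_consume after toy_submit has been called, which is the behaviour the two BPF helpers are meant to provide.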
There are different types of BPF programs with different sets of capabilities.
The program type BPF_PROG_TYPE_KPROBE allows our BPF program to attach a probe to almost any kernel function.
The probe is a callback defined in the BPF program which is executed whenever the kernel enters the function that the probe is attached to (or, for a return probe, whenever that function returns).
To attach two probes that trigger when tcp_set_state is called and when it has finished executing, respectively, we can do the following in our BPF program:
    SEC("kprobe/tcp_set_state")
    int BPF_KPROBE(tcp_set_state, struct sock *skp, int state)
    {
        // Some code to run when the kernel enters tcp_set_state
        return 0;
    }

    SEC("kretprobe/tcp_set_state")
    int BPF_KRETPROBE(tcp_set_state_ret, int ret)
    {
        // Some code to run when the kernel has finished running tcp_set_state
        return 0;
    }
The SEC(...) macro tells the compiler in which ELF section the code should be put.
This is needed for libbpf to determine the BPF program type and to automatically attach the probes when the BPF program is loaded from the userspace program.
We can now compile the program into BPF bytecode:
    clang -g -O2 \
        -target bpf \
        -D__TARGET_ARCH_x86 \
        -I. \
        -c tcpeek.bpf.c \
        -o tcpeek.bpf.o
The flag -D__TARGET_ARCH_x86 sets the target architecture and has to be adapted if the program is compiled for a different architecture.
The -g flag for debug information also seems to be necessary, otherwise some of the macros provided by libbpf to read kernel data do not work. This is most likely because clang only emits the BTF type information that these macros rely on when debug information is enabled.
We now have an ELF file with BPF bytecode in it that can be loaded by our userspace program.[1]
The userspace program
To work with the BPF program from userspace, we generate a so-called skeleton header file that will be included in the userspace program.
bpftool gen skeleton tcpeek.bpf.o > tcpeek.skel.h
The skeleton header contains a struct tcpeek_bpf in which libbpf stores references to the shared data structures (in our case the ring buffer) and a copy of the bytecode of our BPF program.
Moreover, it provides some helper functions to initialize the struct and load the BPF program, as we will see below.
The basic structure of the userspace program is the following:
    #include <bpf/libbpf.h>
    #include "tcpeek.skel.h"

    static int event_handler(void *ctx, void *data, size_t size)
    {
        // Code to run when a new item is available in the ring buffer
        return 0;
    }

    int main(int argc, char **argv)
    {
        struct tcpeek_bpf *skel;
        struct ring_buffer *ringbuffer;
        int err;

        skel = tcpeek_bpf__open_and_load();
        // Some error handling in case this fails

        err = tcpeek_bpf__attach(skel);
        // Again, some error handling in case this fails

        ringbuffer = ring_buffer__new(bpf_map__fd(skel->maps.events), event_handler, NULL, NULL);
        // Again, some error handling in case this fails

        while (ring_buffer__poll(ringbuffer, -1) >= 0) {
        }

        return 0;
    }
Inside main we have to open the BPF program, load it and attach the kernel probes.
The first two steps are achieved by tcpeek_bpf__open_and_load, which is one of the functions generated by bpftool in the skeleton header file.
This also runs the BPF program through the static verifier which (among other things) ensures that the BPF program will come to a halt.
Attaching the probes also happens with a helper function from the skeleton header, tcpeek_bpf__attach.
To know which kernel functions to attach to, libbpf uses the ELF section names which we set up with the SEC(...) macro in the BPF program.
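The following plain C sketch shows the kind of information that can be recovered from such a section name. It is a simplified illustration rather than libbpf's actual parsing code, and the names probe_target and parse_probe_section are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type: what a section name like "kprobe/foo" tells us. */
struct probe_target {
    bool is_return; /* true for kretprobe, false for kprobe */
    char func[64];  /* kernel function name taken from the section name */
};

/* Returns 0 on success, -1 if sec is not a usable kprobe/kretprobe section name. */
static int parse_probe_section(const char *sec, struct probe_target *out)
{
    static const char kprobe[] = "kprobe/";
    static const char kretprobe[] = "kretprobe/";
    const char *func;

    assert(sec != NULL && out != NULL);

    if (strncmp(sec, kprobe, sizeof(kprobe) - 1) == 0) {
        out->is_return = false;
        func = sec + sizeof(kprobe) - 1;
    } else if (strncmp(sec, kretprobe, sizeof(kretprobe) - 1) == 0) {
        out->is_return = true;
        func = sec + sizeof(kretprobe) - 1;
    } else {
        return -1; /* some other program type */
    }

    /* Reject an empty function name or one that does not fit the buffer. */
    if (*func == '\0' || strlen(func) >= sizeof(out->func))
        return -1;
    strcpy(out->func, func);
    return 0;
}
```

Applied to the two section names used in the BPF program, this yields the function name tcp_set_state once as an entry probe and once as a return probe.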
The ringbuffer for communication with the BPF program is initialized with ring_buffer__new.
The first argument to this function is a file descriptor (obtained with bpf_map__fd) that refers to the events map defined in the BPF program.
The second argument is a callback function int event_handler(void *ctx, void *data, size_t size) to be run when new items are pushed into the ringbuffer.
In our case this function simply prints some information about the socket that tcp_set_state was called on.
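As a sketch of what such a handler could look like, assume the BPF program submits records with the hypothetical layout struct event shown below. The struct, its field names and the helper format_event are assumptions made for this example, not taken from the repository. The addresses are kept in network byte order, as in skc_rcv_saddr and skc_daddr, so inet_ntop can format them directly.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record layout; the BPF side would have to write the same struct. */
struct event {
    uint32_t saddr; /* local IPv4 address, network byte order */
    uint32_t daddr; /* remote IPv4 address, network byte order */
    int32_t state;  /* new TCP state passed to tcp_set_state */
};

/* Format one event as text; returns the snprintf result or -1 on failure. */
static int format_event(const struct event *e, char *buf, size_t len)
{
    char src[INET_ADDRSTRLEN];
    char dst[INET_ADDRSTRLEN];

    if (!inet_ntop(AF_INET, &e->saddr, src, sizeof(src)) ||
        !inet_ntop(AF_INET, &e->daddr, dst, sizeof(dst)))
        return -1;
    return snprintf(buf, len, "%s -> %s state=%d", src, dst, (int)e->state);
}

/* Callback with the signature expected by ring_buffer__new. */
static int event_handler(void *ctx, void *data, size_t size)
{
    struct event e;
    char line[64];

    (void)ctx;
    if (size < sizeof(e))
        return 0; /* ignore records that are too short */
    memcpy(&e, data, sizeof(e)); /* copy the record out of the ring buffer */
    if (format_event(&e, line, sizeof(line)) > 0)
        puts(line);
    return 0; /* a negative value would make libbpf stop consuming */
}
```

Keeping the formatting in a separate function makes it possible to check the output without a running BPF program.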
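Finally, the userspace program runs a main loop in which ring_buffer__poll waits for new items in the ringbuffer and calls the handler for each of them; the timeout of -1 means that each call blocks until data is available.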
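The main loop in the listing above never exits on its own. A common pattern is to stop on SIGINT or SIGTERM so that the program gets a chance to clean up. The sketch below shows that pattern in plain C with a stand-in poll function, since the real call needs a loaded BPF program; run_until_signalled and fake_poll are hypothetical names for this example.

```c
#include <signal.h>

static volatile sig_atomic_t exiting = 0;

static void handle_signal(int sig)
{
    (void)sig;
    exiting = 1; /* only set a flag; the main loop does the rest */
}

/* Call poll_fn repeatedly until it reports an error or a signal arrives. */
static int run_until_signalled(int (*poll_fn)(void *), void *arg)
{
    int err = 0;

    signal(SIGINT, handle_signal);
    signal(SIGTERM, handle_signal);
    while (!exiting) {
        err = poll_fn(arg); /* stand-in for ring_buffer__poll(ringbuffer, timeout) */
        if (err < 0)
            break;
    }
    return err < 0 ? err : 0;
}

/* Stand-in for the real poll call: counts invocations, sends SIGINT on the third. */
static int fake_poll(void *arg)
{
    int *calls = arg;

    if (++*calls == 3)
        raise(SIGINT);
    return 0;
}
```

In the real program the loop body would call ring_buffer__poll, and after the loop ring_buffer__free and tcpeek_bpf__destroy would release the resources. With a blocking timeout, the real call is expected to return an error such as -EINTR when a signal interrupts it, which should then be treated as a normal shutdown rather than a failure.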
We can compile and link our program as follows:
    clang -g -Wall -I. -c tcpeek.c -o tcpeek.o
    clang -g -Wall tcpeek.o /usr/lib/libbpf.so -o tcpeek
and run it as root with
sudo ./tcpeek
It should print some output whenever the state of a TCP connection changes.
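The state argument of tcp_set_state is a small integer, so the output is easier to read if it is mapped to a name. The table below follows the TCP state enum in the kernel headers (the same values appear in vmlinux.h); tcp_state_name is a hypothetical helper for this example, not part of the repository.

```c
#include <stddef.h>

/* Map a kernel TCP state number to a short name (hypothetical helper). */
static const char *tcp_state_name(int state)
{
    static const char *const names[] = {
        [1] = "ESTABLISHED",
        [2] = "SYN_SENT",
        [3] = "SYN_RECV",
        [4] = "FIN_WAIT1",
        [5] = "FIN_WAIT2",
        [6] = "TIME_WAIT",
        [7] = "CLOSE",
        [8] = "CLOSE_WAIT",
        [9] = "LAST_ACK",
        [10] = "LISTEN",
        [11] = "CLOSING",
        [12] = "NEW_SYN_RECV",
    };
    size_t count = sizeof(names) / sizeof(names[0]);

    /* Index 0 is unused by the kernel enum, anything above the table is unknown. */
    if (state < 1 || (size_t)state >= count)
        return "UNKNOWN";
    return names[state];
}
```

A handler could then print tcp_state_name(state) instead of the raw number.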
Obviously this is an extremely simple program and the demo applications in the BCC and the libbpf-bootstrap repositories provide the same functionality and a lot more. I nevertheless found it very useful to go through the entire process of writing it by hand once to understand each ingredient and how they all fit together a little bit better.
Footnotes
[1] If there is more than one BPF program, all ELF files can be combined into a single BPF ELF object file with the command bpftool gen object.
This also deduplicates BTF information that is present in all input files.