README.md - Android社区 - https://www.androidos.net.cn/

# Compiling P4 to EBPF

Mihai Budiu - mbudiu@barefootnetworks.com

September 22, 2015

## Abstract

This document describes a prototype compiler that translates programs
written in the P4 programming languages to eBPF programs. The
translation is performed by generating programs written in a subset of
the C programming language, that are converted to EBPF using the BPF
Compiler Collection tools.

The compiler code is licensed under an [Apache v2.0 license]
(http://www.apache.org/licenses/LICENSE-2.0.html).

## Preliminaries

In this section we give a brief overview of P4 and EBPF. A detailed
treatment of these topics is outside the scope of this text.

### P4

P4 (http://p4.org) is a domain-specific programming language for
specifying the behavior of the dataplanes of network-forwarding
elements. The name of the programming language comes from the title
of a paper published in the proceedings of SIGCOMM Computer
Communications Review in 2014:
http://www.sigcomm.org/ccr/papers/2014/July/0000000.0000004:
"Programming Protocol-Independent Packet Processors".

P4 itself is protocol-independent but allows programmers to express a
rich set of data plane behaviors and protocols. The core P4
abstractions are:

* Header definitions describe the format (the set of fields and their
sizes) of each header within a packet.

* Parse graphs (finite-state machines) describe the permitted header
sequences within received packets.

* Tables associate keys to actions. P4 tables generalize traditional
forwarding tables; they can be used to implement routing tables,
flow lookup tables, access-control lists, etc.

* Actions describe how packet header fields and metadata are manipulated.

* Match-action units stitch together tables and actions, and perform
the following sequence of operations:

* Construct lookup keys from packet fields or computed metadata,

* Use the constructed lookup key to index into tables, choosing an
action to execute,

* Finally, execute the selected action.

* Control flow is expressed as an imperative program describing the
data-dependent packet processing within a pipeline, including the
data-dependent sequence of match-action unit invocations.

P4 programs describe the behavior of network-processing dataplanes. A
P4 program is designed to operate in concert with a separate *control
plane* program. The control plane is responsible for managing at
runtime the contents of the P4 tables. P4 cannot be used to specify
control-planes; however, a P4 program implicitly specifies the
interface between the data-plane and the control-plane.

The P4 language is under active development; the current stable
version is 1.0.2 (see http://p4.org/spec); a reference implementation
of a compiler and associated tools is freely available using a Apache
2 open-source license (see http://p4.org/code).

### EBPF

#### Safe code

EBPF is a acronym that stands for Extended Berkeley Packet Filters.
In essence EBPF is a low-level programming language (similar to
machine code); EBPF programs are traditionally executed by a virtual
machine that resides in the Linux kernel. EBPF programs can be
inserted and removed from a live kernel using dynamic code
instrumentation. The main feature of EBPF programs is their *static
safety*: prior to execution all EBPF programs have to be validated as
being safe, and unsafe programs cannot be executed. A safe program
provably cannot compromise the machine it is running on:

* it can only access a restricted memory region (on the local stack)

* it can run only for a limited amount of time; during execution it
cannot block, sleep or take any locks

* it cannot use any kernel resources with the exception of a limited
set of kernel services which have been specifically whitelisted,
including operations to manipulate tables (described below)

#### Kernel hooks

EBPF programs are inserted into the kernel using *hooks*. There are
several types of hooks available:

* any function entry point in the kernel can act as a hook; attaching
an EBPF program to a function `foo()` will cause the EBPF program to
execute every time some kernel thread executes `foo()`.

* EBPF programs can also be attached using the Linux Traffic Control
(TC) subsystem, in the network packet processing datapath. Such
programs can be used as TC classifiers and actions.

* EBPF programs can also be attached to sockets or network interfaces.
In this case they can be used for processing packets that flow
through the socket/interface.

EBPF programs can be used for many purposes; the main use cases are
dynamic tracing and monitoring, and packet processing. We are mostly
interested in the latter use case in this document.

#### EBPF Tables

The EBPF runtime exposes a bi-directional kernel-userspace data
communication channel, called *tables* (also called maps in some EBPF
documents and code samples). EBPF tables are essentially key-value
stores, where keys and values are arbitrary fixed-size bitstrings.
The key width, value width and table size (maximum number of entries
that can be stored) are declared statically, at table creation time.

In user-space tables handles are exposed as file descriptors. Both
user- and kernel-space programs can manipulate tables, by inserting,
deleting, looking up, modifying, and enumerating entries in a table.

In kernel space the keys and values are exposed as pointers to the raw
underlying data stored in the table, whereas in user-space the
pointers point to copies of the data.

#### Concurrency

An important aspect to understand related to EBPF is the execution
model. An EBPF program is triggered by a kernel hook; multiple
instances of the same kernel hook can be running simultaneously on
different cores.

Each table however has a single instances across all the cores. A
single table may be accessed simultaneously by multiple instances of
the same EBPF program running as separate kernel threads on different
cores. EBPF tables are native kernel objects, and access to the table
contents is protected using the kernel RCU mechanism. This makes
access to table entries safe under concurrent execution; for example,
the memory associated to a value cannot be accidentally freed while an
EBPF program holds a pointer to the respective value. However,
accessing tables is prone to data races; since EBPF programs cannot
use locks, some of these races often cannot be avoided.

EBPF and the associated tools are also under active development, and
new capabilities are added frequently. The P4 compiler generates code
that can be compiled using the BPF Compiler Collection (BCC)
(https://github.com/iovisor/bcc)

## Compiling P4 to EBPF

From the above description it is apparent that the P4 and EBPF
programming languages have different expressive powers. However,
there is a significant overlap in their capabilities, in particular,
in the domain of network packet processing. The following image
illustrates the situation:

![P4 and EBPF overlap in capabilities](scope.png)

We expect that the overlapping region will grow in size as both P4 and
EBPF continue to mature.

The current version of the P4 to EBPF compiler translates programs
written in the version 1.1 of the P4 programming language to programs
written in a restricted subset of C. The subset of C is chosen such
that it should be compilable to EBPF using BCC.

```
-------------- -------
P4 ---> | P4-to-EBPF | ---> C ----> | BCC | --> EBPF
-------------- -------
```

The P4 program only describes the packet processing *data plane*, that
runs in the Linux kernel. The *control plane* must be separately
implemented by the user. The BCC tools simplify this task
considerably, by generating C and/or Python APIs that expose the
dataplane/control-plane APIs.

### Dependencies

EBPF programs require a Linux kernel with version 4.2 or newer.

In order to use the P4 to EBPF compiler the following software must be installed:

* The compiler itself is written in the Python (v2.x) programming
language.

* the P4 compiler front-end: (https://github.com/p4lang/p4-hlir).
This is required for parsing the P4 programs.

* the BCC compiler collection tools: (https://github.com/iovisor/bcc).
This is required for compiling the generated code. Also, BCC comes
with a set of Python utilities which can be used to implement
control-plane programs that operate in concert with the kernel EBPF
datapath.

The P4 to EBPF compiler generates code that is designed for being used
as a classifier using the Linux TC subsystem.

Furthermore, the test code provided is written using the Python (v3.x)
programming language and requires several Python packages to be
installed.

### Supported capabilities

The current version of the P4 to EBPF compiler supports a relatively
narrow subset of the P4 language, but still powerful enough to write
very complex packet filters and simple packet forwarding engines. In
the spirit of open-source "release early, release often", we expect
that the compiler's capabilities will improve gradually.

* Packet filtering is performed using the `drop()` action. Packets
that are not dropped will be forwarded.

* Packet forwarding is performed by setting the
`standard_metadata.egress_port` to the index of the destination
network interface

Here are some limitations imposed on the P4 programs:

* Currently both the ingress and the egress P4 pipelines are executed
at the same hook (wherever the user chooses to insert the generated
EBPF program). In the future the compiler should probably generate
two separate EBPF programs.

* arbitrary parsers can be compiled, but the BCC compiler will reject
parsers that contain cycles

* arithmetic on data wider than 32 bits is not supported

* checksum computations are not implemented. In consequence, programs
that IP/TCP/UDP headers will produce incorrect packet headers.

* EBPF does not offer support for ternary or LPM tables

* P4 cloning and recirculation and not supported

* meters and registers are not supported; only direct counters are
currently supported. EBPF can potentially support registers and
arbitrary counters, so these may appear in the future.

* learning (i.e. `generate_digest`) is not implemented

### Translating P4 to C

To simplify the translation, the P4 programmer should refrain using
identifiers whose name starts with `ebpf_`.

The following table provides a brief summary of how each P4 construct
is mapped to a corresponding C construct:

#### Translating parsers

P4 Construct | C Translation
----------|------------
`header_type` | `struct` type
`header` | `struct` instance with an additional `valid` bit
`metadata` | `struct` instance
parser state | code block
state transition | `goto` statement
`extract` | load/shift/mask data from packet buffer

#### Translating match-action pipelines

P4 Construct | C Translation
----------|------------
table | 2 EBPF tables: second one used just for the default action
table key | `struct` type
table `actions` block | tagged `union` with all possible actions
`action` arguments | `struct`
table `reads` | EBPF table access
`action` body | code block
table `apply` | `switch` statement
counters | additional EBPF table

### Code organization

The compiler code is organized in two folders:

* `compiler`: the complete compiler source code, in Python v2.x
The compiler entry point is `p4toEbpf.py`.

* `test`: testing code and data. There are two testing programs:
* `testP4toEbpf.py`: which compiles all P4 files in the testprograms folder

* `endToEndTest.py`: which compiles and executes the simple.p4
program, and includes a simple control plane

Currently the compiler contains no installation capabilities.

### Invoking the compiler

Invoking the compiler is just a matter of invoking the python program
with a suitable input P4 file:

```
p4toEbpf.py file.p4 -o file.c
```

#### Compiler options

The P4 compiler first runs the C preprocessor on the input P4 file.
Some of the command-line options are passed directly to the
preprocessor.

The following compiler options are available:

Option | Meaning
-------|--------
`-D macro` | Option passed to C preprocessor
`-I path` | Option passed to C preprocessor
`-U macro` | Option passed to C preprocessor
`-g [router|filter]` | Controls whether the generated code behaves like a router or a filter.
`-o outoutFile` | writes the generated C code to the specified output file.

The `-g` option controls the nature of the generated code:

* `-g filter` generates a filter; the only P4 action that has an
effect is the `drop()` action. Setting metadata in P4 (e.g.,
`egress_port`) has no effect.

* `-g router` generates a simple router; both `drop()` and
`egress_port` impact packet processing.

#### Using the generated code

The resulting file contains the complete data structures, tables, and
a C function named `ebpf_filter` that implements the P4-specified
data-plane. This C file can be manipulated using the BCC tools;
please refer to the BCC project documentation and sample test files of
the P4 to EBPF source code for an in-depth understanding. A minimal
Python program that compiles and loads into the kernel the generated
file into EBPF is:

```
#!/usr/bin/env python3
from bcc import BPF

b = BPF(src_file="file.c", debug=0)
fn = b.load_func("ebpf_filter", BPF.SCHED_CLS)
```

##### Connecting the generated program with the TC

The EBPF code that is generated is intended to be used as a classifier
attached to the ingress packet path using the Linux TC subsystem. The
same EBPF code should be attached to all interfaces. Note however
that all EBPF code instances share a single set of tables, which are
used to control the program behavior.

The following code fragment illustrates how the EBPF code can be
hooked up to the `eth0` interface using a Python program. (The `fn`
variable is the one produced by the previous code fragment).

```
from pyroute2 import IPRoute

ipr = IPRoute()
interface_name="eth0"
if_index = ipr.link_lookup(ifname=interface_name)[0]
ipr.tc("add", "ingress", if_index, "ffff:")
ipr.tc("add-filter", "bpf", if_index, ":1", fd=fn.fd,
name=fn.name, parent="ffff:", action="ok", classid=1)
```