Demonstrations of nfsdist, the Linux eBPF/bcc version.

nfsdist traces NFS reads, writes, opens, and getattr, and summarizes their
latency as a power-of-2 histogram. For example:

./nfsdist.py
Tracing NFS operation latency... Hit Ctrl-C to end.

operation = read
     usecs               : count     distribution
         0 -> 1          : 4        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 7107     |**************                          |
        16 -> 31         : 19864    |****************************************|
        32 -> 63         : 1494     |***                                     |
        64 -> 127        : 491      |                                        |
       128 -> 255        : 1810     |***                                     |
       256 -> 511        : 6356     |************                            |
       512 -> 1023       : 4860     |*********                               |
      1024 -> 2047       : 3070     |******                                  |
      2048 -> 4095       : 1853     |***                                     |
      4096 -> 8191       : 921      |*                                       |
      8192 -> 16383      : 122      |                                        |
     16384 -> 32767      : 15       |                                        |
     32768 -> 65535      : 5        |                                        |
     65536 -> 131071     : 2        |                                        |
    131072 -> 262143     : 1        |                                        |

operation = write
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 1        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 9        |                                        |
        64 -> 127        : 19491    |****************************************|
       128 -> 255        : 3064     |******                                  |
       256 -> 511        : 940      |*                                       |
       512 -> 1023       : 365      |                                        |
      1024 -> 2047       : 312      |                                        |
      2048 -> 4095       : 119      |                                        |
      4096 -> 8191       : 31       |                                        |
      8192 -> 16383      : 84       |                                        |
     16384 -> 32767      : 31       |                                        |
     32768 -> 65535      : 5        |                                        |
     65536 -> 131071     : 3        |                                        |
    131072 -> 262143     : 0        |                                        |
    262144 -> 524287     : 1        |                                        |

operation = getattr
     usecs               : count     distribution
         0 -> 1          : 27       |****************************************|
         2 -> 3          : 2        |**                                      |
         4 -> 7          : 3        |****                                    |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 2        |**                                      |
       512 -> 1023       : 2        |**                                      |

operation = open
     usecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 0        |                                        |
        16 -> 31         : 0        |                                        |
        32 -> 63         : 0        |                                        |
        64 -> 127        : 0        |                                        |
       128 -> 255        : 0        |                                        |
       256 -> 511        : 2        |****************************************|

In this example you can see that the read traffic is rather bi-modal, with about
27K reads falling within 8 - 31 usecs and about 19K reads spread between 128 -
8191 usecs. Write traffic is largely clustered in the 64 - 127 usecs bracket.
The faster read traffic is probably coming from a filesystem cache and the slower
traffic from disk. The writes are so consistently fast because this example
test was run on a couple of VMs, and I believe the hypervisor was caching all
the write traffic to memory.
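
To make the bi-modal reading concrete, the bucket counts from the read
histogram above can be summed directly. This is only a minimal Python sketch
over the numbers printed in this example; read_hist and count_between are names
made up for the illustration and are not part of nfsdist.

```python
# Read-latency histogram copied from the example output:
# (low, high) bucket bounds in usecs -> count.
read_hist = {
    (0, 1): 4, (2, 3): 0, (4, 7): 0,
    (8, 15): 7107, (16, 31): 19864,
    (32, 63): 1494, (64, 127): 491,
    (128, 255): 1810, (256, 511): 6356, (512, 1023): 4860,
    (1024, 2047): 3070, (2048, 4095): 1853, (4096, 8191): 921,
    (8192, 16383): 122, (16384, 32767): 15, (32768, 65535): 5,
    (65536, 131071): 2, (131072, 262143): 1,
}


def count_between(hist, low, high):
    """Sum the counts of all buckets lying fully inside [low, high] usecs."""
    return sum(n for (lo, hi), n in hist.items() if lo >= low and hi <= high)


fast = count_between(read_hist, 8, 31)       # first cluster (likely cache hits)
slow = count_between(read_hist, 128, 8191)   # second cluster (likely disk)
total = sum(read_hist.values())

print(f"fast cluster (8 - 31 usecs):     {fast} of {total} reads")
print(f"slow cluster (128 - 8191 usecs): {slow} of {total} reads")
```

The two clusters together account for most of the reads, with only a thin tail
above 8191 usecs.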

This "latency" is measured from when the operation was issued from the VFS
interface to the file system, to when it completed. This spans everything:
RPC latency, network latency, file system CPU cycles, file system locks, run
queue latency, etc. This is a better measure of the latency suffered by
applications reading from an NFS share and can better expose problems
experienced by NFS clients.
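
The measurement described above boils down to: note a timestamp when the
operation is issued, take the difference when it completes, and add one to the
matching power-of-2 bucket. The following is a plain-Python model of that idea,
not the tool's eBPF code. on_entry, on_return, log2_bucket and bucket_range are
hypothetical names, and the timestamps are passed in by hand so the example is
deterministic.

```python
from collections import defaultdict

# Hypothetical tracing state for this sketch (not taken from nfsdist itself).
start_ns = {}                                  # (thread id, operation) -> entry time
hist = defaultdict(lambda: defaultdict(int))   # operation -> bucket index -> count


def log2_bucket(value):
    """Index of the power-of-2 bucket: 0-1 -> 0, 2-3 -> 1, 4-7 -> 2, ..."""
    return max(int(value).bit_length() - 1, 0)


def bucket_range(index):
    """Inclusive (low, high) bounds of a bucket, as in the histogram rows."""
    return (0, 1) if index == 0 else (1 << index, (1 << (index + 1)) - 1)


def on_entry(tid, op, now_ns):
    """Called when an operation is issued: remember when it started."""
    start_ns[(tid, op)] = now_ns


def on_return(tid, op, now_ns):
    """Called when the operation completes: bucket the elapsed time in usecs."""
    began = start_ns.pop((tid, op), None)
    if began is None:          # the entry was never seen, so there is no delta
        return
    usecs = (now_ns - began) // 1000
    hist[op][log2_bucket(usecs)] += 1


# One simulated read on thread 1 that takes 20,000 ns (20 usecs).
on_entry(1, "read", 1_000_000)
on_return(1, "read", 1_020_000)

for index, count in sorted(hist["read"].items()):
    low, high = bucket_range(index)
    print(f"{low} -> {high} : {count}")
```

Because the delta covers the whole call, anything that delays completion (RPC,
network, locks, scheduling) ends up inside the same number.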

Note that this only traces the common NFS operations (read, write, open and
getattr). I chose to include getattr because a significant percentage of NFS
traffic ends up being getattr calls, which are a good indicator of problems
with an NFS server.

An optional interval and a count can be provided, as well as -m to show the
distributions in milliseconds. For example:

./nfsdist -m 1 5
Tracing NFS operation latency... Hit Ctrl-C to end.

11:02:39:

operation = write
     msecs               : count     distribution
         0 -> 1          : 1        |                                        |
         2 -> 3          : 24       |********                                |
         4 -> 7          : 114      |****************************************|
         8 -> 15         : 9        |***                                     |
        16 -> 31         : 1        |                                        |
        32 -> 63         : 1        |                                        |

11:02:40:

operation = write
     msecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 11       |***                                     |
         4 -> 7          : 111      |****************************************|
         8 -> 15         : 13       |****                                    |
        16 -> 31         : 1        |                                        |

11:02:41:

operation = write
     msecs               : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 21       |******                                  |
         4 -> 7          : 137      |****************************************|
         8 -> 15         : 3        |                                        |

This shows a write workload, with writes hovering primarily in the 4-7ms range.
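
As a rough sketch of what the interval, count and -m arguments imply: a summary
is printed once per interval, count times, and latencies are scaled to
milliseconds instead of microseconds before bucketing. The helper names below
(scale_latency, run, emit, pause) are assumptions for illustration, as is the
use of integer division; the timestamp format is the one seen in the output
above.

```python
from time import sleep, strftime


def scale_latency(delta_ns, milliseconds=False):
    """Convert a nanosecond delta to the reporting unit (integer division assumed)."""
    return delta_ns // (1_000_000 if milliseconds else 1_000)


def run(interval, count, emit, timestamps=True, pause=sleep):
    """Call emit() once per interval, count times, with a timestamp line first."""
    for _ in range(count):
        pause(interval)
        if timestamps:
            print(strftime("%H:%M:%S:"))
        emit()


# Mirrors "./nfsdist -m 1 5": five one-second summaries.
# pause is stubbed out here so the example finishes immediately.
calls = []
run(interval=1, count=5, emit=lambda: calls.append("summary"),
    pause=lambda seconds: None)
print(len(calls), "summaries emitted")
```

With this scaling, a 5,000,000 ns write is reported as 5000 in the default unit
and as 5 with -m, which is why such writes land in the 4 -> 7 msecs row.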

USAGE message:

./nfsdist -h
usage: nfsdist.py [-h] [-T] [-m] [-p PID] [interval] [count]

Summarize NFS operation latency

positional arguments:
  interval            output interval, in seconds
  count               number of outputs

optional arguments:
  -h, --help          show this help message and exit
  -T, --notimestamp   don't include timestamp on interval output
  -m, --milliseconds  output in milliseconds
  -p PID, --pid PID   trace this PID only

examples:
    ./nfsdist            # show operation latency as a histogram
    ./nfsdist -p 181     # trace PID 181 only
    ./nfsdist 1 10       # print 1 second summaries, 10 times
    ./nfsdist -m 5       # 5s summaries, milliseconds
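
For readers curious how a command line like this is typically declared in
Python, here is a minimal argparse sketch that accepts the same options as the
usage message above. It is a reconstruction under assumptions rather than the
tool's actual source: build_parser is a hypothetical name, and parsing PID,
interval and count as integers is an assumption.

```python
import argparse


def build_parser():
    """Build a parser matching the options listed in the usage message."""
    parser = argparse.ArgumentParser(
        prog="nfsdist.py",
        description="Summarize NFS operation latency",
    )
    parser.add_argument("-T", "--notimestamp", action="store_true",
                        help="don't include timestamp on interval output")
    parser.add_argument("-m", "--milliseconds", action="store_true",
                        help="output in milliseconds")
    # type=int is an assumption; the usage text does not state the types.
    parser.add_argument("-p", "--pid", type=int,
                        help="trace this PID only")
    parser.add_argument("interval", nargs="?", type=int,
                        help="output interval, in seconds")
    parser.add_argument("count", nargs="?", type=int,
                        help="number of outputs")
    return parser


# Equivalent of "./nfsdist -m 1 5".
args = build_parser().parse_args(["-m", "1", "5"])
print(args.milliseconds, args.interval, args.count)
```

Both positionals are optional (nargs="?"), so "./nfsdist -m 5" sets only the
interval and leaves the count unset.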