Delayed capability release
The Ceph MDS server issues “capabilities” to clients to grant them permission to read or write objects for a particular file. I’ve added a delayed release of capabilities after a file is closed, as many workloads will quickly reopen the same file. In that case, we can re-use our existing capabilities (and be assured by the metadata leases that we’re correctly resolved the pathname to the correct inode) without any additional MDS interaction. This is a big win for, surprisingly, find, which has a pretty peculiar access pattern when descending out of directories:
open("..", O_RDONLY|O_NOFOLLOW) = 4 fchdir(4) = 0 close(4) = 0 open("..", O_RDONLY|O_NOFOLLOW) = 4 fchdir(4) = 0 close(4) = 0 fchdir(3) = 0 open(".", O_RDONLY|O_NOFOLLOW) = 4 fchdir(4) = 0 close(4) = 0 close(1) = 0
The explicit open was almost doubling the number of MDS ops. After adding delayed cap release, the workload (as seen by the MDS), now looks pretty tidy:
mds0 <== client0 ==== client_request(client0.22 open /a) ==== mds0 <== client0 ==== client_request(client0.23 readdir #10000000000) ==== mds0 <== client0 ==== client_request(client0.24 open /a/b) ==== mds0 <== client0 ==== client_request(client0.25 readdir #10000000001) ==== mds0 <== client0 ==== client_request(client0.26 open /a/b/c) ==== mds0 <== client0 ==== client_request(client0.27 readdir #10000000002) ==== mds0 <== client0 ==== client_request(client0.28 open /a/b/d) ==== mds0 <== client0 ==== client_request(client0.29 readdir #10000000003) ==== mds0 <== client0 ==== client_request(client0.30 open /a/e) ==== mds0 <== client0 ==== client_request(client0.31 readdir #10000000004) ==== mds0 <== client0 ==== client_file_caps(ack ino 10000000000 seq 2 caps [ ] wanted[ ] size 0/0) ==== mds0 <== client0 ==== client_file_caps(ack ino 10000000001 seq 2 caps [ ] wanted[ ] size 0/0) ==== mds0 <== client0 ==== client_file_caps(ack ino 10000000002 seq 2 caps [ ] wanted[ ] size 0/0) ==== mds0 <== client0 ==== client_file_caps(ack ino 10000000003 seq 2 caps [ ] wanted[ ] size 0/0) ==== mds0 <== client0 ==== client_file_caps(ack ino 10000000004 seq 2 caps [ ] wanted[ ] size 0/0) ====
..where the last 5 message release the capabilities a few seconds later.