On Tue, Sep 6, 2022 at 10:04 AM Roberto Sassu roberto.sassu@huaweicloud.com wrote:
From: Roberto Sassu roberto.sassu@huawei.com
Commit 6e71b04a82248 ("bpf: Add file mode configuration into bpf maps") added the BPF_F_RDONLY and BPF_F_WRONLY flags, to let user space specify whether it will just read or modify a map.
Map access control is done in two steps. First, when user space wants to obtain a map fd, it provides to the kernel the eBPF-defined flags, which are converted into open flags and passed to the security_bpf_map() security hook for evaluation by LSMs.
Second, if user space successfully obtained an fd, it passes that fd to the kernel when it requests a map operation (e.g. lookup or update). The kernel first checks if the fd has the modes required to perform the requested operation and, if yes, continues the execution and returns the result to user space.
While the fd modes check was added for map_*_elem() functions, it is currently missing for map iterators, added more recently with commit a5cbe05a6673 ("bpf: Implement bpf iterator for map elements"). A map iterator executes a chosen eBPF program for each key/value pair of a map and allows that program to read and/or modify them.
Whether a map iterator allows only read or also write depends on whether the MEM_RDONLY flag in the ctx_arg_info member of the bpf_iter_reg structure is set. Also, write needs to be supported at verifier level (for example, it is currently not supported for sock maps).
Since map iterators obtain a map from a user space fd with bpf_map_get_with_uref(), add the new req_modes parameter to that function, so that map iterators can provide the required fd modes to access a map. If the user space fd doesn't include the required modes, bpf_map_get_with_uref() returns with an error, and the map iterator will not be created.
If a map iterator marks both the key and value as read-only, it calls bpf_map_get_with_uref() with FMODE_CAN_READ as value for req_modes. If it also allows write access to either the key or the value, it calls that function with FMODE_CAN_READ | FMODE_CAN_WRITE as value for req_modes, regardless of whether or not the write is supported by the verifier (the write is intentionally allowed).
bpf_fd_probe_obj() does not require any fd mode, as the fd is only used for the purpose of finding the eBPF object type, for pinning the object to the bpffs filesystem.
Finally, it is worth to mention that the fd modes check was not added for the cgroup iterator, although it registers an attach_target method like the other iterators. The reason is that the fd is not the only way for user space to reference a cgroup object (also by ID and by path). For the protection to be effective, all reference methods need to be evaluated consistently. This work is deferred to a separate patch.
I think the current behavior is fine. File permissions don't apply at iterator level or prog level. fmode_can_read/write are for syscall commands only. To be fair we've added them to lookup/delete commands and it was more of a pain to maintain and no confirmed good use.