On Tue, Jul 16, 2019 at 01:54:57PM -0700, Alexei Starovoitov wrote:
On Wed, Jul 10, 2019 at 10:15:44AM -0400, Joel Fernandes (Google) wrote:
Hi,
why are you cc-ing the whole world for this patch set?
Well, the whole world happens to be interested in BPF on Android.
I'll reply to all as well, but I suspect a bunch of folks consider it spam. Please read Documentation/bpf/bpf_devel_QA.rst
Ok, I'll read it.
Also, I think, netdev@vger rejects emails with 80+ characters in cc as spam, so I'm not sure this set reached public mailing lists.
The CC list here certainly does not include folks who consider it spam. All the folks added have been interested in BPF on Android at various points in time. Is this CC list really that large? It has around 24 email addresses or so. I can trim it a bit if needed. Also, you sound as if people are screaming at me to stop emailing them. That is certainly not the case, and no one has told me it is spam.
And, it did reach the public archive btw: https://lore.kernel.org/netdev/20190716205455.iimn3pqpvsc3k4ry@ast-mbp.dhcp....
These patches make it possible to attach BPF programs directly to tracepoints using ftrace (/sys/kernel/debug/tracing) without needing the process doing the attach to be alive. This has the following benefits:
- Simplified Security: In Android, we have finer-grained security controls over
specific ftrace trace events using SELinux labels. We already control precisely who is allowed to enable an ftrace event. By adding a node to ftrace for attaching BPF programs, we can use the same mechanism to further control who is allowed to attach to a trace event.
- Process lifetime: In Android we are adding usecases where a tracing program
needs to be attached to a tracepoint all the time, for the full lifetime of the system, such as gathering statistics where there is no need for a detach for the full system lifetime. With perf or bpf(2)'s BPF_RAW_TRACEPOINT_OPEN, this means keeping a process alive all the time. However, in Android our BPF loader currently (for hardened security) involves just starting a process at boot time, doing the BPF program loading, and then pinning the programs to /sys/fs/bpf. We don't keep this process alive all the time. It is more suitable to do a one-shot attach of the program using ftrace, without needing a process alive all the time for this. Such a process also needs elevated privileges, since tracepoint program loading currently requires CAP_SYS_ADMIN anyway, so by design Android's bpfloader runs once at init and exits.
This series adds a new bpf file at /sys/kernel/debug/tracing/events/X/Y/bpf. The following commands can be written into it:
  attach:<fd>   Attaches BPF prog fd to tracepoint
  detach:<fd>   Detaches BPF prog fd from tracepoint
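For illustration only (this is not code from the series), writing these commands from user space might look roughly like the sketch below. The helper names and the `root` parameter are made up for the example; only the path layout and the `attach:<fd>` / `detach:<fd>` strings come from the description above. Whether the kernel side expects a trailing newline is not stated, so none is written.

```python
import os

# Path layout taken from the description above; the root is a parameter so
# the sketch can be exercised against a scratch directory.
DEFAULT_TRACING_ROOT = "/sys/kernel/debug/tracing"


def bpf_node_path(subsystem, event, root=DEFAULT_TRACING_ROOT):
    """Return the path of the proposed per-event 'bpf' file."""
    return os.path.join(root, "events", subsystem, event, "bpf")


def format_command(action, prog_fd):
    """Build 'attach:<fd>' or 'detach:<fd>' (hypothetical helper)."""
    if action not in ("attach", "detach"):
        raise ValueError("action must be 'attach' or 'detach'")
    if prog_fd < 0:
        raise ValueError("prog_fd must be a non-negative file descriptor")
    return "%s:%d" % (action, prog_fd)


def write_command(subsystem, event, action, prog_fd, root=DEFAULT_TRACING_ROOT):
    """Write one command to the event's 'bpf' file (hypothetical helper)."""
    with open(bpf_node_path(subsystem, event, root), "w") as node:
        node.write(format_command(action, prog_fd))
```

A loader that already holds a program fd would call something like `write_command("sched", "sched_switch", "attach", prog_fd)` once and then exit; the event name here is just an example.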
Looks like, to detach a program, the user needs to read a text file and parse the bpf prog id from text into binary. Then call the fd_from_id bpf syscall, get a binary FD, convert it back to text, and write it as text back into this file. I think this is just one example of why text based apis are not accepted in bpf anymore.
This can also be considered a tracefs API.
And we can certainly change the detach to accept program ids as well if that's easier. 'detach:prog:<prog_id>' and 'detach:fd:<fd>'.
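To make the two proposed forms concrete, here is a minimal sketch of how such commands could be built and validated. The function names are hypothetical and this only mirrors the syntax suggested above; it says nothing about how the kernel side would actually parse the strings.

```python
def format_detach(kind, value):
    """Build 'detach:prog:<prog_id>' or 'detach:fd:<fd>' (hypothetical helper)."""
    if kind not in ("prog", "fd"):
        raise ValueError("kind must be 'prog' or 'fd'")
    if value < 0:
        raise ValueError("value must be non-negative")
    return "detach:%s:%d" % (kind, value)


def parse_detach(command):
    """Split a proposed detach command into (kind, value).

    Raises ValueError if the command does not match either proposed form.
    """
    parts = command.strip().split(":")
    if len(parts) != 3 or parts[0] != "detach" or parts[1] not in ("prog", "fd"):
        raise ValueError("malformed detach command: %r" % command)
    # int() itself raises ValueError for a non-numeric id or fd.
    return parts[1], int(parts[2])
```

With the `prog` form, a program id read from a listing could be written straight back without first converting it to an fd.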
By the way, I can also list the set of cumbersome steps needed to attach a BPF program using perf and I bet it will be longer ;-)
Throughout the patch set you call it ftrace. As far as I can see, this set has zero overlap with ftrace. There is no ftrace-bpf connection here at all of the kind that we discussed in the past, Steven. It's all quite confusing.
It depends on what you mean by ftrace; maybe I can call it 'trace events' or something if that is less ambiguous. All of this has been collectively called ftrace before.
I am not sure you are making sense, actually. The trace_events mechanism is a part of ftrace. See the documentation: Documentation/trace/ftrace.rst. Even the documentation file name has the word ftrace in it.
I have also spoken to Steven before about this, I don't think he ever told me there is no connection so again I am a bit lost at your comments.
I suggest Android solve the sticky raw_tracepoint problem with a user space daemon. The reasons you point out for why a user daemon cannot be used sound weak to me.
I don't think it is weak. It seems overkill to have a daemon for a trace event that is, say, supposed to stay attached for the lifetime of the system. Why should there be a daemon consuming resources if the attachment is active all the time?
In Android, we are very careful about spawning useless processes and leaving them alive for the lifetime of the system - for no good reason. Our security teams also don't like this, and they can comment more.
Another acceptable solution would be to introduce pinning of raw_tp objects. bpf progs and maps can be pinned in bpffs already. Pinning raw_tp would be a natural extension.
I don't think pinning solves the security problem; it just solves the process lifetime problem. Currently, attaching trace events through perf requires CAP_SYS_ADMIN. However, with ftrace events, we already control security of events by labeling the nodes in tracefs and granting access to the labeled context through SELinux policies. Having a 'bpf' node in tracefs for events, and granting access to the labels, is a natural extension.
I also thought about the pinning idea before, but we also want to add support for not just raw tracepoints, but also regular tracepoints (events if you will). I am hesitant to add a new BPF API just for creating regular tracepoints and then pinning those as well.
I don't see why a new bpf node for a trace event is a bad idea, really. tracefs is how we deal with trace events on Android. We do it in production systems. This is a natural extension to that and fits with the security model well.
thanks,
- Joel