Re: [PATCH bpf-next v3 1/2] net: bpf: Always call BPF cgroup filters for egress.

22 Jun 2023

On 6/22/23 10:15 AM, Kui-Feng Lee wrote:
...
On 6/21/23 20:37, Yonghong Song wrote:
...
On 6/20/23 10:14 AM, Kui-Feng Lee wrote:
...
Always call BPF filters if CGROUP BPF is enabled for EGRESS without
checking skb->sk against sk.
The filters were called only if the skb is owned by the sock that the
skb is sent out through.  In other words, skb->sk should point to
the sock that it is sending through its egress.  However, the filters
would miss SYNACK skbs that are owned by a request_sock but sent through
the listening sock, that is, the socket listening for incoming connections.
This is an unnecessary restriction.
The original patch which introduced 'sk == skb->sk' is
   3007098494be  cgroup: add support for eBPF programs
There is no mention in the commit message of why 'sk == skb->sk'
is needed. So it is possible that this was just a restriction
for the use cases at that moment. Now there are use cases
where 'sk != skb->sk', so removing this check can enable
the new use case. Maybe you can add this to your commit
message so people can understand the history of 'sk == skb->sk'.
After checking the code and Alexei's comment[1] again, this check
may be different from what I thought. In another post[2],
Daniel Borkmann mentioned:
Wouldn't that mean however, when you go through stacked devices that
     you'd run the same eBPF cgroup program for skb->sk multiple times?
I read this paragraph several times.
This check ensures the filters are only called for the device on
the top of a stack.  So, I probably should change the check to
sk == skb_to_full_sk(skb)
I think this should work. It exactly covers your use case:
   they are owned by a request_sock but sent through
   the listening sock, that is the socket listening incoming connections
and sk == skb->sk for non request_sock/listening_sock case.
I originally wondered whether you could do
   sk == skb->sk || skb->sk->sk_state == TCP_NEW_SYN_RECV
but obviously your approach is better.
...
instead of removing it.  If we remove the check, egress filters
could be called multiple times for a skb, just like what Daniel said.
Does that make sense?
[1] https://lore.kernel.org/all/CAADnVQKi0c=Mf3b=z43=b6n2xBVhwPw4QoV_au5+pFE29iL...
[2] https://lore.kernel.org/all/58193E9D.7040201@iogearbox.net/
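
For comparing the three conditions discussed above (the original
'sk == skb->sk', the v3 patch that drops the comparison, and the
proposed 'sk == skb_to_full_sk(skb)'), here is a minimal userspace
sketch. The mock_* types and run_* helpers are hypothetical stand-ins,
not kernel code, and mock_skb_to_full_sk() models only the
request_sock-to-listener mapping, which is assumed to be the part of
skb_to_full_sk() that matters for this case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for socket states; NOT the kernel's TCP_* values. */
enum mock_state { MOCK_ESTABLISHED, MOCK_LISTEN, MOCK_NEW_SYN_RECV };

struct mock_sock {
	enum mock_state state;
	struct mock_sock *listener;	/* set only for a request sock */
};

struct mock_skb {
	struct mock_sock *sk;		/* owner of the skb */
};

/* Models the relevant part of skb_to_full_sk(): a request sock
 * resolves to the listening sock it belongs to. */
static struct mock_sock *mock_skb_to_full_sk(const struct mock_skb *skb)
{
	struct mock_sock *sk;

	assert(skb != NULL);
	sk = skb->sk;
	if (sk && sk->state == MOCK_NEW_SYN_RECV)
		sk = sk->listener;
	return sk;
}

/* Original condition: sk && sk == skb->sk */
static bool run_original(struct mock_sock *sk, const struct mock_skb *skb)
{
	return sk && sk == skb->sk;
}

/* v3 patch: the comparison is dropped, only sk is checked */
static bool run_v3(struct mock_sock *sk, const struct mock_skb *skb)
{
	(void)skb;
	return sk != NULL;
}

/* Proposed condition: sk && sk == skb_to_full_sk(skb) */
static bool run_proposed(struct mock_sock *sk, const struct mock_skb *skb)
{
	return sk && sk == mock_skb_to_full_sk(skb);
}
```

With a SYNACK (skb owned by a request sock, sent through its listener),
run_original() rejects the skb while run_v3() and run_proposed() accept
it. With a stacked device (skb owned by the application sock, sent
through some other sock further down), run_v3() accepts the skb a second
time while run_original() and run_proposed() both reject it.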
...
...
Signed-off-by: Kui-Feng Lee kuifeng@meta.com
include/linux/bpf-cgroup.h | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
index 57e9e109257e..e656da531f9f 100644
--- a/include/linux/bpf-cgroup.h
+++ b/include/linux/bpf-cgroup.h
@@ -199,7 +199,7 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk,
  #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk, skb)                   \
  ({                                           \
      int __ret = 0;                                   \
-    if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk && sk == skb->sk) { \
+    if (cgroup_bpf_enabled(CGROUP_INET_EGRESS) && sk) {               \
          typeof(sk) __sk = sk_to_full_sk(sk);                   \
          if (sk_fullsock(__sk) &&                       \
              cgroup_bpf_sock_enabled(__sk, CGROUP_INET_EGRESS))           \

    
