This is the second attempt at achieving the same goal. This time, the submission avoids forking the current code base, ensuring it remains easier to maintain over time.
The set has been tested using the SCM_RIGHTS test suite [1] using QEMU and has been seen to successfully mitigate a UAF on on a top tier handset.
RESULTS:
TAP version 13 1..20 # Starting 20 tests from 5 test cases. # RUN scm_rights.dgram.self_ref ... # OK scm_rights.dgram.self_ref ok 1 scm_rights.dgram.self_ref # RUN scm_rights.dgram.triangle ... # OK scm_rights.dgram.triangle ok 2 scm_rights.dgram.triangle # RUN scm_rights.dgram.cross_edge ... # OK scm_rights.dgram.cross_edge ok 3 scm_rights.dgram.cross_edge # RUN scm_rights.dgram.backtrack_from_scc ... # OK scm_rights.dgram.backtrack_from_scc ok 4 scm_rights.dgram.backtrack_from_scc # RUN scm_rights.stream.self_ref ... # OK scm_rights.stream.self_ref ok 5 scm_rights.stream.self_ref # RUN scm_rights.stream.triangle ... # OK scm_rights.stream.triangle ok 6 scm_rights.stream.triangle # RUN scm_rights.stream.cross_edge ... # OK scm_rights.stream.cross_edge ok 7 scm_rights.stream.cross_edge # RUN scm_rights.stream.backtrack_from_scc ... # OK scm_rights.stream.backtrack_from_scc ok 8 scm_rights.stream.backtrack_from_scc # RUN scm_rights.stream_oob.self_ref ... # OK scm_rights.stream_oob.self_ref ok 9 scm_rights.stream_oob.self_ref # RUN scm_rights.stream_oob.triangle ... # OK scm_rights.stream_oob.triangle ok 10 scm_rights.stream_oob.triangle # RUN scm_rights.stream_oob.cross_edge ... # OK scm_rights.stream_oob.cross_edge ok 11 scm_rights.stream_oob.cross_edge # RUN scm_rights.stream_oob.backtrack_from_scc ... # OK scm_rights.stream_oob.backtrack_from_scc ok 12 scm_rights.stream_oob.backtrack_from_scc # RUN scm_rights.stream_listener.self_ref ... # OK scm_rights.stream_listener.self_ref ok 13 scm_rights.stream_listener.self_ref # RUN scm_rights.stream_listener.triangle ... # OK scm_rights.stream_listener.triangle ok 14 scm_rights.stream_listener.triangle # RUN scm_rights.stream_listener.cross_edge ... # OK scm_rights.stream_listener.cross_edge ok 15 scm_rights.stream_listener.cross_edge # RUN scm_rights.stream_listener.backtrack_from_scc ... # OK scm_rights.stream_listener.backtrack_from_scc ok 16 scm_rights.stream_listener.backtrack_from_scc # RUN scm_rights.stream_listener_oob.self_ref ... # OK scm_rights.stream_listener_oob.self_ref ok 17 scm_rights.stream_listener_oob.self_ref # RUN scm_rights.stream_listener_oob.triangle ... # OK scm_rights.stream_listener_oob.triangle ok 18 scm_rights.stream_listener_oob.triangle # RUN scm_rights.stream_listener_oob.cross_edge ... # OK scm_rights.stream_listener_oob.cross_edge ok 19 scm_rights.stream_listener_oob.cross_edge # RUN scm_rights.stream_listener_oob.backtrack_from_scc ... # OK scm_rights.stream_listener_oob.backtrack_from_scc ok 20 scm_rights.stream_listener_oob.backtrack_from_scc # PASSED: 20 / 20 tests passed. # Totals: pass:20 fail:0 xfail:0 xpass:0 skip:0 error:0
[0] https://lore.kernel.org/all/20250304030149.82265-1-kuniyu@amazon.com/ [1] https://lore.kernel.org/all/20240325202425.60930-16-kuniyu@amazon.com/
Alexander Mikhalitsyn (1): af_unix: Kconfig: make CONFIG_UNIX bool
Kuniyuki Iwashima (24): af_unix: Return struct unix_sock from unix_get_socket(). af_unix: Run GC on only one CPU. af_unix: Try to run GC async. af_unix: Replace BUG_ON() with WARN_ON_ONCE(). af_unix: Remove io_uring code for GC. af_unix: Remove CONFIG_UNIX_SCM. af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd. af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd. af_unix: Link struct unix_edge when queuing skb. af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb. af_unix: Iterate all vertices by DFS. af_unix: Detect Strongly Connected Components. af_unix: Save listener for embryo socket. af_unix: Fix up unix_edge.successor for embryo socket. af_unix: Save O(n) setup of Tarjan's algo. af_unix: Skip GC if no cycle exists. af_unix: Avoid Tarjan's algorithm if unnecessary. af_unix: Assign a unique index to SCC. af_unix: Detect dead SCC. af_unix: Replace garbage collection algorithm. af_unix: Remove lock dance in unix_peek_fds(). af_unix: Try not to hold unix_gc_lock during accept(). af_unix: Don't access successor in unix_del_edges() during GC. af_unix: Add dead flag to struct scm_fp_list.
Michal Luczaj (1): af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
Shigeru Yoshida (1): af_unix: Fix uninit-value in __unix_walk_scc()
include/net/af_unix.h | 48 ++- include/net/scm.h | 11 + net/Makefile | 2 +- net/core/scm.c | 17 ++ net/unix/Kconfig | 11 +- net/unix/Makefile | 2 - net/unix/af_unix.c | 120 +++++--- net/unix/garbage.c | 691 +++++++++++++++++++++++++++++------------- net/unix/scm.c | 154 ---------- net/unix/scm.h | 10 - 10 files changed, 618 insertions(+), 448 deletions(-) delete mode 100644 net/unix/scm.c delete mode 100644 net/unix/scm.h
From: Alexander Mikhalitsyn aleksandr.mikhalitsyn@canonical.com
[ Upstream commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b ]
Let's make CONFIG_UNIX a bool instead of a tristate. We've decided to do that during discussion about SCM_PIDFD patchset [1].
[1] https://lore.kernel.org/lkml/20230524081933.44dc8bea@kernel.org/
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: Leon Romanovsky leon@kernel.org Cc: David Ahern dsahern@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Kees Cook keescook@chromium.org Cc: Christian Brauner brauner@kernel.org Cc: Kuniyuki Iwashima kuniyu@amazon.com Cc: Lennart Poettering mzxreary@0pointer.de Cc: Luca Boccassi bluca@debian.org Cc: linux-kernel@vger.kernel.org Cc: netdev@vger.kernel.org Cc: linux-arch@vger.kernel.org Suggested-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Alexander Mikhalitsyn aleksandr.mikhalitsyn@canonical.com Acked-by: Christian Brauner brauner@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net (cherry picked from commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/Kconfig | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/net/unix/Kconfig b/net/unix/Kconfig index b7f811216820..28b232f281ab 100644 --- a/net/unix/Kconfig +++ b/net/unix/Kconfig @@ -4,7 +4,7 @@ #
config UNIX - tristate "Unix domain sockets" + bool "Unix domain sockets" help If you say Y here, you will include support for Unix domain sockets; sockets are the standard Unix mechanism for establishing and @@ -14,10 +14,6 @@ config UNIX an embedded system or something similar, you therefore definitely want to say Y here.
- To compile this driver as a module, choose M here: the module will be - called unix. Note that several important services won't work - correctly if you say M here and then neglect to load the module. - Say Y unless you know what you are doing.
config UNIX_SCM
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Alexander Mikhalitsynaleksandr.mikhalitsyn@canonical.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Present (exact SHA1)
Note: The patch differs from the upstream commit: --- 1: 97154bcf4d1b7 ! 1: dc0456caaf050 af_unix: Kconfig: make CONFIG_UNIX bool @@ Metadata ## Commit message ## af_unix: Kconfig: make CONFIG_UNIX bool
+ [ Upstream commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b ] + Let's make CONFIG_UNIX a bool instead of a tristate. We've decided to do that during discussion about SCM_PIDFD patchset [1].
@@ Commit message Acked-by: Christian Brauner brauner@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net + (cherry picked from commit 97154bcf4d1b7cabefec8a72cff5fbb91d5afb7b) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/Kconfig ## @@ ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.1.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 5b17307bd0789edea0675d524a2b277b93bbde62 ]
Currently, unix_get_socket() returns struct sock, but after calling it, we always cast it to unix_sk().
Let's return struct unix_sock from unix_get_socket().
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Pavel Begunkov asml.silence@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 5b17307bd0789edea0675d524a2b277b93bbde62) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 2 +- net/unix/garbage.c | 19 +++++++------------ net/unix/scm.c | 19 +++++++------------ 3 files changed, 15 insertions(+), 25 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index e7d71a516bd4..52ae023c3e93 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -13,7 +13,7 @@ void unix_notinflight(struct user_struct *user, struct file *fp); void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); void wait_for_unix_gc(void); -struct sock *unix_get_socket(struct file *filp); +struct unix_sock *unix_get_socket(struct file *filp); struct sock *unix_peer_get(struct sock *sk);
#define UNIX_HASH_MOD (256 - 1) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index d2fc795394a5..438f5d9b9173 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -105,20 +105,15 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
while (nfd--) { /* Get the socket the fd matches if it indeed does so */ - struct sock *sk = unix_get_socket(*fp++); + struct unix_sock *u = unix_get_socket(*fp++);
- if (sk) { - struct unix_sock *u = unix_sk(sk); + /* Ignore non-candidates, they could have been added + * to the queues after starting the garbage collection + */ + if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) { + hit = true;
- /* Ignore non-candidates, they could - * have been added to the queues after - * starting the garbage collection - */ - if (test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) { - hit = true; - - func(u); - } + func(u); } } if (hit && hitlist != NULL) { diff --git a/net/unix/scm.c b/net/unix/scm.c index 4eff7da9f6f9..693817a31ad8 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -21,9 +21,8 @@ EXPORT_SYMBOL(gc_inflight_list); DEFINE_SPINLOCK(unix_gc_lock); EXPORT_SYMBOL(unix_gc_lock);
-struct sock *unix_get_socket(struct file *filp) +struct unix_sock *unix_get_socket(struct file *filp) { - struct sock *u_sock = NULL; struct inode *inode = file_inode(filp);
/* Socket ? */ @@ -33,10 +32,10 @@ struct sock *unix_get_socket(struct file *filp)
/* PF_UNIX ? */ if (s && sock->ops && sock->ops->family == PF_UNIX) - u_sock = s; + return unix_sk(s); }
- return u_sock; + return NULL; } EXPORT_SYMBOL(unix_get_socket);
@@ -45,13 +44,11 @@ EXPORT_SYMBOL(unix_get_socket); */ void unix_inflight(struct user_struct *user, struct file *fp) { - struct sock *s = unix_get_socket(fp); + struct unix_sock *u = unix_get_socket(fp);
spin_lock(&unix_gc_lock);
- if (s) { - struct unix_sock *u = unix_sk(s); - + if (u) { if (!u->inflight) { BUG_ON(!list_empty(&u->link)); list_add_tail(&u->link, &gc_inflight_list); @@ -68,13 +65,11 @@ void unix_inflight(struct user_struct *user, struct file *fp)
void unix_notinflight(struct user_struct *user, struct file *fp) { - struct sock *s = unix_get_socket(fp); + struct unix_sock *u = unix_get_socket(fp);
spin_lock(&unix_gc_lock);
- if (s) { - struct unix_sock *u = unix_sk(s); - + if (u) { BUG_ON(!u->inflight); BUG_ON(list_empty(&u->link));
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 5b17307bd0789edea0675d524a2b277b93bbde62
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 5b17307bd0789 ! 1: fed4cc80f032b af_unix: Return struct unix_sock from unix_get_socket(). @@ Metadata ## Commit message ## af_unix: Return struct unix_sock from unix_get_socket().
+ [ Upstream commit 5b17307bd0789edea0675d524a2b277b93bbde62 ] + Currently, unix_get_socket() returns struct sock, but after calling it, we always cast it to unix_sk().
@@ Commit message Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240123170856.41348-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 5b17307bd0789edea0675d524a2b277b93bbde62) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## -@@ include/net/af_unix.h: void unix_destruct_scm(struct sk_buff *skb); - void io_uring_destruct_scm(struct sk_buff *skb); +@@ include/net/af_unix.h: void unix_notinflight(struct user_struct *user, struct file *fp); + void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); void wait_for_unix_gc(void); -struct sock *unix_get_socket(struct file *filp); @@ net/unix/scm.c: EXPORT_SYMBOL(gc_inflight_list); @@ net/unix/scm.c: struct sock *unix_get_socket(struct file *filp)
/* PF_UNIX ? */ - if (s && ops && ops->family == PF_UNIX) + if (s && sock->ops && sock->ops->family == PF_UNIX) - u_sock = s; + return unix_sk(s); } ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec ]
If more than 16000 inflight AF_UNIX sockets exist and the garbage collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc(). Also, they wait for unix_gc() to complete.
In unix_gc(), all inflight AF_UNIX sockets are traversed at least once, and more if they are the GC candidate. Thus, sendmsg() significantly slows down with too many inflight AF_UNIX sockets.
There is a small window to invoke multiple unix_gc() instances, which will then be blocked by the same spinlock except for one.
Let's convert unix_gc() to use struct work so that it will not consume CPUs unnecessarily.
Note WRITE_ONCE(gc_in_progress, true) is moved before running GC. If we leave the WRITE_ONCE() as is and use the following test to call flush_work(), a process might not call it.
CPU 0 CPU 1 --- --- start work and call __unix_gc() if (work_pending(&unix_gc_work) || <-- false READ_ONCE(gc_in_progress)) <-- false flush_work(); <-- missed! WRITE_ONCE(gc_in_progress, true)
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 54 +++++++++++++++++++++++----------------------- 1 file changed, 27 insertions(+), 27 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 438f5d9b9173..bf628bfb6d35 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -86,7 +86,6 @@ /* Internal data structures and random procedures: */
static LIST_HEAD(gc_candidates); -static DECLARE_WAIT_QUEUE_HEAD(unix_gc_wait);
static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), struct sk_buff_head *hitlist) @@ -182,23 +181,8 @@ static void inc_inflight_move_tail(struct unix_sock *u) }
static bool gc_in_progress; -#define UNIX_INFLIGHT_TRIGGER_GC 16000 - -void wait_for_unix_gc(void) -{ - /* If number of inflight sockets is insane, - * force a garbage collect right now. - * Paired with the WRITE_ONCE() in unix_inflight(), - * unix_notinflight() and gc_in_progress(). - */ - if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC && - !READ_ONCE(gc_in_progress)) - unix_gc(); - wait_event(unix_gc_wait, !READ_ONCE(gc_in_progress)); -}
-/* The external entry point: unix_gc() */ -void unix_gc(void) +static void __unix_gc(struct work_struct *work) { struct sk_buff *next_skb, *skb; struct unix_sock *u; @@ -209,13 +193,6 @@ void unix_gc(void)
spin_lock(&unix_gc_lock);
- /* Avoid a recursive GC. */ - if (gc_in_progress) - goto out; - - /* Paired with READ_ONCE() in wait_for_unix_gc(). */ - WRITE_ONCE(gc_in_progress, true); - /* First, select candidates for garbage collection. Only * in-flight sockets are considered, and from those only ones * which don't have any external reference. @@ -346,8 +323,31 @@ void unix_gc(void) /* Paired with READ_ONCE() in wait_for_unix_gc(). */ WRITE_ONCE(gc_in_progress, false);
- wake_up(&unix_gc_wait); - - out: spin_unlock(&unix_gc_lock); } + +static DECLARE_WORK(unix_gc_work, __unix_gc); + +void unix_gc(void) +{ + WRITE_ONCE(gc_in_progress, true); + queue_work(system_unbound_wq, &unix_gc_work); +} + +#define UNIX_INFLIGHT_TRIGGER_GC 16000 + +void wait_for_unix_gc(void) +{ + /* If number of inflight sockets is insane, + * force a garbage collect right now. + * + * Paired with the WRITE_ONCE() in unix_inflight(), + * unix_notinflight(), and __unix_gc(). + */ + if (READ_ONCE(unix_tot_inflight) > UNIX_INFLIGHT_TRIGGER_GC && + !READ_ONCE(gc_in_progress)) + unix_gc(); + + if (READ_ONCE(gc_in_progress)) + flush_work(&unix_gc_work); +}
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 8b90a9f819dc2a06baae4ec1a64d875e53b824ec
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 8b90a9f819dc2 ! 1: 0a69126a52167 af_unix: Run GC on only one CPU. @@ Metadata ## Commit message ## af_unix: Run GC on only one CPU.
+ [ Upstream commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec ] + If more than 16000 inflight AF_UNIX sockets exist and the garbage collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc(). Also, they wait for unix_gc() to complete. @@ Commit message Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240123170856.41348-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 8b90a9f819dc2a06baae4ec1a64d875e53b824ec) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit d9f21b3613337b55cc9d4a6ead484dca68475143 ]
If more than 16000 inflight AF_UNIX sockets exist and the garbage collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc(). Also, they wait for unix_gc() to complete.
In unix_gc(), all inflight AF_UNIX sockets are traversed at least once, and more if they are the GC candidate. Thus, sendmsg() significantly slows down with too many inflight AF_UNIX sockets.
However, if a process sends data with no AF_UNIX FD, the sendmsg() call does not need to wait for GC. After this change, only the process that meets the condition below will be blocked under such a situation.
1) cmsg contains AF_UNIX socket 2) more than 32 AF_UNIX sent by the same user are still inflight
Note that even a sendmsg() call that does not meet the condition but has AF_UNIX FD will be blocked later in unix_scm_to_skb() by the spinlock, but we allow that as a bonus for sane users.
The results below are the time spent in unix_dgram_sendmsg() sending 1 byte of data with no FD 4096 times on a host where 32K inflight AF_UNIX sockets exist.
Without series: the sane sendmsg() needs to wait gc unreasonably.
$ sudo /usr/share/bcc/tools/funclatency -p 11165 unix_dgram_sendmsg Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end. ^C nsecs : count distribution [...] 524288 -> 1048575 : 0 | | 1048576 -> 2097151 : 3881 |****************************************| 2097152 -> 4194303 : 214 |** | 4194304 -> 8388607 : 1 | |
avg = 1825567 nsecs, total: 7477526027 nsecs, count: 4096
With series: the sane sendmsg() can finish much faster.
$ sudo /usr/share/bcc/tools/funclatency -p 8702 unix_dgram_sendmsg Tracing 1 functions for "unix_dgram_sendmsg"... Hit Ctrl-C to end. ^C nsecs : count distribution [...] 128 -> 255 : 0 | | 256 -> 511 : 4092 |****************************************| 512 -> 1023 : 2 | | 1024 -> 2047 : 0 | | 2048 -> 4095 : 0 | | 4096 -> 8191 : 1 | | 8192 -> 16383 : 1 | |
avg = 410 nsecs, total: 1680510 nsecs, count: 4096
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit d9f21b3613337b55cc9d4a6ead484dca68475143) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 12 ++++++++++-- include/net/scm.h | 1 + net/core/scm.c | 5 +++++ net/unix/af_unix.c | 6 ++++-- net/unix/garbage.c | 10 +++++++++- 5 files changed, 29 insertions(+), 5 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 52ae023c3e93..be488d627531 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -8,12 +8,20 @@ #include <linux/refcount.h> #include <net/sock.h>
+#if IS_ENABLED(CONFIG_UNIX) +struct unix_sock *unix_get_socket(struct file *filp); +#else +static inline struct unix_sock *unix_get_socket(struct file *filp) +{ + return NULL; +} +#endif + void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); -void wait_for_unix_gc(void); -struct unix_sock *unix_get_socket(struct file *filp); +void wait_for_unix_gc(struct scm_fp_list *fpl); struct sock *unix_peer_get(struct sock *sk);
#define UNIX_HASH_MOD (256 - 1) diff --git a/include/net/scm.h b/include/net/scm.h index 585adc1346bd..a5c26008fcec 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -23,6 +23,7 @@ struct scm_creds {
struct scm_fp_list { short count; + short count_unix; short max; struct user_struct *user; struct file *fp[SCM_MAX_FD]; diff --git a/net/core/scm.c b/net/core/scm.c index a877c4ef4c25..bb25052624ee 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -36,6 +36,7 @@ #include <net/compat.h> #include <net/scm.h> #include <net/cls_cgroup.h> +#include <net/af_unix.h>
/* @@ -85,6 +86,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) return -ENOMEM; *fplp = fpl; fpl->count = 0; + fpl->count_unix = 0; fpl->max = SCM_MAX_FD; fpl->user = NULL; } @@ -109,6 +111,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) fput(file); return -EINVAL; } + if (unix_get_socket(file)) + fpl->count_unix++; + *fpp++ = file; fpl->count++; } diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 5ce60087086c..f74f7878b3fe 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1875,11 +1875,12 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, long timeo; int err;
- wait_for_unix_gc(); err = scm_send(sock, msg, &scm, false); if (err < 0) return err;
+ wait_for_unix_gc(scm.fp); + err = -EOPNOTSUPP; if (msg->msg_flags&MSG_OOB) goto out; @@ -2145,11 +2146,12 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, bool fds_sent = false; int data_len;
- wait_for_unix_gc(); err = scm_send(sock, msg, &scm, false); if (err < 0) return err;
+ wait_for_unix_gc(scm.fp); + err = -EOPNOTSUPP; if (msg->msg_flags & MSG_OOB) { #if IS_ENABLED(CONFIG_AF_UNIX_OOB) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index bf628bfb6d35..2934d7b68036 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -335,8 +335,9 @@ void unix_gc(void) }
#define UNIX_INFLIGHT_TRIGGER_GC 16000 +#define UNIX_INFLIGHT_SANE_USER (SCM_MAX_FD * 8)
-void wait_for_unix_gc(void) +void wait_for_unix_gc(struct scm_fp_list *fpl) { /* If number of inflight sockets is insane, * force a garbage collect right now. @@ -348,6 +349,13 @@ void wait_for_unix_gc(void) !READ_ONCE(gc_in_progress)) unix_gc();
+ /* Penalise users who want to send AF_UNIX sockets + * but whose sockets have not been received yet. + */ + if (!fpl || !fpl->count_unix || + READ_ONCE(fpl->user->unix_inflight) < UNIX_INFLIGHT_SANE_USER) + return; + if (READ_ONCE(gc_in_progress)) flush_work(&unix_gc_work); }
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: d9f21b3613337b55cc9d4a6ead484dca68475143
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: d9f21b3613337 ! 1: 61310f376dac5 af_unix: Try to run GC async. @@ Metadata ## Commit message ## af_unix: Try to run GC async.
+ [ Upstream commit d9f21b3613337b55cc9d4a6ead484dca68475143 ] + If more than 16000 inflight AF_UNIX sockets exist and the garbage collector is not running, unix_(dgram|stream)_sendmsg() call unix_gc(). Also, they wait for unix_gc() to complete. @@ Commit message Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240123170856.41348-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit d9f21b3613337b55cc9d4a6ead484dca68475143) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ @@ include/net/af_unix.h void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); void unix_destruct_scm(struct sk_buff *skb); - void io_uring_destruct_scm(struct sk_buff *skb); void unix_gc(void); -void wait_for_unix_gc(void); -struct unix_sock *unix_get_socket(struct file *filp); ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit d0f6dc26346863e1f4a23117f5468614e54df064 ]
This is a prep patch for the last patch in this series so that checkpatch will not warn about BUG_ON().
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit d0f6dc26346863e1f4a23117f5468614e54df064) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 8 ++++---- net/unix/scm.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2934d7b68036..7eeaac165e85 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -145,7 +145,7 @@ static void scan_children(struct sock *x, void (*func)(struct unix_sock *), /* An embryo cannot be in-flight, so it's safe * to use the list link. */ - BUG_ON(!list_empty(&u->link)); + WARN_ON_ONCE(!list_empty(&u->link)); list_add_tail(&u->link, &embryos); } spin_unlock(&x->sk_receive_queue.lock); @@ -224,8 +224,8 @@ static void __unix_gc(struct work_struct *work)
total_refs = file_count(sk->sk_socket->file);
- BUG_ON(!u->inflight); - BUG_ON(total_refs < u->inflight); + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(total_refs < u->inflight); if (total_refs == u->inflight) { list_move_tail(&u->link, &gc_candidates); __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); @@ -318,7 +318,7 @@ static void __unix_gc(struct work_struct *work) list_move_tail(&u->link, &gc_inflight_list);
/* All candidates should have been detached by now. */ - BUG_ON(!list_empty(&gc_candidates)); + WARN_ON_ONCE(!list_empty(&gc_candidates));
/* Paired with READ_ONCE() in wait_for_unix_gc(). */ WRITE_ONCE(gc_in_progress, false); diff --git a/net/unix/scm.c b/net/unix/scm.c index 693817a31ad8..6f446dd2deed 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -50,10 +50,10 @@ void unix_inflight(struct user_struct *user, struct file *fp)
if (u) { if (!u->inflight) { - BUG_ON(!list_empty(&u->link)); + WARN_ON_ONCE(!list_empty(&u->link)); list_add_tail(&u->link, &gc_inflight_list); } else { - BUG_ON(list_empty(&u->link)); + WARN_ON_ONCE(list_empty(&u->link)); } u->inflight++; /* Paired with READ_ONCE() in wait_for_unix_gc() */ @@ -70,8 +70,8 @@ void unix_notinflight(struct user_struct *user, struct file *fp) spin_lock(&unix_gc_lock);
if (u) { - BUG_ON(!u->inflight); - BUG_ON(list_empty(&u->link)); + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(list_empty(&u->link));
u->inflight--; if (!u->inflight)
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: d0f6dc26346863e1f4a23117f5468614e54df064
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: d0f6dc2634686 ! 1: 8d734ba6e6dbe af_unix: Replace BUG_ON() with WARN_ON_ONCE(). @@ Metadata ## Commit message ## af_unix: Replace BUG_ON() with WARN_ON_ONCE().
+ [ Upstream commit d0f6dc26346863e1f4a23117f5468614e54df064 ] + This is a prep patch for the last patch in this series so that checkpatch will not warn about BUG_ON().
@@ Commit message Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit d0f6dc26346863e1f4a23117f5468614e54df064) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static void scan_children(struct sock *x, void (*func)(struct unix_sock *), @@ net/unix/garbage.c: static void scan_children(struct sock *x, void (*func)(struc spin_unlock(&x->sk_receive_queue.lock); @@ net/unix/garbage.c: static void __unix_gc(struct work_struct *work)
- total_refs = file_count(u->sk.sk_socket->file); + total_refs = file_count(sk->sk_socket->file);
- BUG_ON(!u->inflight); - BUG_ON(total_refs < u->inflight); ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
On Wed, 21 May 2025 16:27:04 +0100 Lee Jones lee@kernel.org wrote:
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit d0f6dc26346863e1f4a23117f5468614e54df064 ]
This is a prep patch for the last patch in this series so that checkpatch will not warn about BUG_ON().
Does any of this actually make any sense? Either the BUG_ON() should be just deleted because it can't happen (or doesn't matter) or there should be an error path. Blindly replacing with WARN_ON_ONCE() can't be right.
The last change (repeated here)
if (u) {
BUG_ON(!u->inflight);
BUG_ON(list_empty(&u->link));
WARN_ON_ONCE(!u->inflight);
WARN_ON_ONCE(list_empty(&u->link));
u->inflight--; if (!u->inflight)
is clearly just plain wrong. If 'inflight' is zero then 'decrementing' it to ~0 is just going to 'crash and burn' very badly not much later on.
David
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit d0f6dc26346863e1f4a23117f5468614e54df064) Signed-off-by: Lee Jones lee@kernel.org
net/unix/garbage.c | 8 ++++---- net/unix/scm.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2934d7b68036..7eeaac165e85 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -145,7 +145,7 @@ static void scan_children(struct sock *x, void (*func)(struct unix_sock *), /* An embryo cannot be in-flight, so it's safe * to use the list link. */
BUG_ON(!list_empty(&u->link));
} spin_unlock(&x->sk_receive_queue.lock);WARN_ON_ONCE(!list_empty(&u->link)); list_add_tail(&u->link, &embryos);
@@ -224,8 +224,8 @@ static void __unix_gc(struct work_struct *work) total_refs = file_count(sk->sk_socket->file);
BUG_ON(!u->inflight);
BUG_ON(total_refs < u->inflight);
WARN_ON_ONCE(!u->inflight);
if (total_refs == u->inflight) { list_move_tail(&u->link, &gc_candidates); __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags);WARN_ON_ONCE(total_refs < u->inflight);
@@ -318,7 +318,7 @@ static void __unix_gc(struct work_struct *work) list_move_tail(&u->link, &gc_inflight_list); /* All candidates should have been detached by now. */
- BUG_ON(!list_empty(&gc_candidates));
- WARN_ON_ONCE(!list_empty(&gc_candidates));
/* Paired with READ_ONCE() in wait_for_unix_gc(). */ WRITE_ONCE(gc_in_progress, false); diff --git a/net/unix/scm.c b/net/unix/scm.c index 693817a31ad8..6f446dd2deed 100644 --- a/net/unix/scm.c +++ b/net/unix/scm.c @@ -50,10 +50,10 @@ void unix_inflight(struct user_struct *user, struct file *fp) if (u) { if (!u->inflight) {
BUG_ON(!list_empty(&u->link));
} else {WARN_ON_ONCE(!list_empty(&u->link)); list_add_tail(&u->link, &gc_inflight_list);
BUG_ON(list_empty(&u->link));
} u->inflight++; /* Paired with READ_ONCE() in wait_for_unix_gc() */WARN_ON_ONCE(list_empty(&u->link));
@@ -70,8 +70,8 @@ void unix_notinflight(struct user_struct *user, struct file *fp) spin_lock(&unix_gc_lock);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 11498715f266a3fb4caabba9dd575636cbcaa8f1 ]
Since commit 705318a99a13 ("io_uring/af_unix: disable sending io_uring over sockets"), io_uring's unix socket cannot be passed via SCM_RIGHTS, so it does not contribute to cyclic reference and no longer be candidate for garbage collection.
Also, commit 6e5e6d274956 ("io_uring: drop any code related to SCM_RIGHTS") cleaned up SCM_RIGHTS code in io_uring.
Let's do it in AF_UNIX as well by reverting commit 0091bfc81741 ("io_uring/af_unix: defer registered files gc to io_uring release") and commit 10369080454d ("net: reclaim skb->scm_io_uring bit").
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 11498715f266a3fb4caabba9dd575636cbcaa8f1) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 25 ++----------------------- 1 file changed, 2 insertions(+), 23 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 7eeaac165e85..c04f82489abb 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -184,12 +184,10 @@ static bool gc_in_progress;
static void __unix_gc(struct work_struct *work) { - struct sk_buff *next_skb, *skb; - struct unix_sock *u; - struct unix_sock *next; struct sk_buff_head hitlist; - struct list_head cursor; + struct unix_sock *u, *next; LIST_HEAD(not_cycle_list); + struct list_head cursor;
spin_lock(&unix_gc_lock);
@@ -293,30 +291,11 @@ static void __unix_gc(struct work_struct *work)
spin_unlock(&unix_gc_lock);
- /* We need io_uring to clean its registered files, ignore all io_uring - * originated skbs. It's fine as io_uring doesn't keep references to - * other io_uring instances and so killing all other files in the cycle - * will put all io_uring references forcing it to go through normal - * release.path eventually putting registered files. - */ - skb_queue_walk_safe(&hitlist, skb, next_skb) { - if (skb->scm_io_uring) { - __skb_unlink(skb, &hitlist); - skb_queue_tail(&skb->sk->sk_receive_queue, skb); - } - } - /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist);
spin_lock(&unix_gc_lock);
- /* There could be io_uring registered files, just push them back to - * the inflight list - */ - list_for_each_entry_safe(u, next, &gc_candidates, link) - list_move_tail(&u->link, &gc_inflight_list); - /* All candidates should have been detached by now. */ WARN_ON_ONCE(!list_empty(&gc_candidates));
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 11498715f266a3fb4caabba9dd575636cbcaa8f1
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 11498715f266a ! 1: 70f88da83e058 af_unix: Remove io_uring code for GC. @@ Metadata ## Commit message ## af_unix: Remove io_uring code for GC.
+ [ Upstream commit 11498715f266a3fb4caabba9dd575636cbcaa8f1 ] + Since commit 705318a99a13 ("io_uring/af_unix: disable sending io_uring over sockets"), io_uring's unix socket cannot be passed via SCM_RIGHTS, so it does not contribute to cyclic reference and @@ Commit message Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org - - ## include/net/af_unix.h ## -@@ include/net/af_unix.h: static inline struct unix_sock *unix_get_socket(struct file *filp) - void unix_inflight(struct user_struct *user, struct file *fp); - void unix_notinflight(struct user_struct *user, struct file *fp); - void unix_destruct_scm(struct sk_buff *skb); --void io_uring_destruct_scm(struct sk_buff *skb); - void unix_gc(void); - void wait_for_unix_gc(struct scm_fp_list *fpl); - struct sock *unix_peer_get(struct sock *sk); + (cherry picked from commit 11498715f266a3fb4caabba9dd575636cbcaa8f1) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static bool gc_in_progress; @@ net/unix/garbage.c: static void __unix_gc(struct work_struct *work) - * release.path eventually putting registered files. - */ - skb_queue_walk_safe(&hitlist, skb, next_skb) { -- if (skb->destructor == io_uring_destruct_scm) { +- if (skb->scm_io_uring) { - __skb_unlink(skb, &hitlist); - skb_queue_tail(&skb->sk->sk_receive_queue, skb); - } @@ net/unix/garbage.c: static void __unix_gc(struct work_struct *work) /* All candidates should have been detached by now. */ WARN_ON_ONCE(!list_empty(&gc_candidates));
- - ## net/unix/scm.c ## -@@ net/unix/scm.c: void unix_destruct_scm(struct sk_buff *skb) - sock_wfree(skb); - } - EXPORT_SYMBOL(unix_destruct_scm); -- --void io_uring_destruct_scm(struct sk_buff *skb) --{ -- unix_destruct_scm(skb); --} --EXPORT_SYMBOL(io_uring_destruct_scm); ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83 ]
Originally, the code related to garbage collection was all in garbage.c.
Commit f4e65870e5ce ("net: split out functions related to registering inflight socket files") moved some functions to scm.c for io_uring and added CONFIG_UNIX_SCM just in case AF_UNIX was built as module.
However, since commit 97154bcf4d1b ("af_unix: Kconfig: make CONFIG_UNIX bool"), AF_UNIX is no longer built separately. Also, io_uring does not support SCM_RIGHTS now.
Let's move the functions back to garbage.c
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 7 +- net/Makefile | 2 +- net/unix/Kconfig | 5 -- net/unix/Makefile | 2 - net/unix/af_unix.c | 63 +++++++++++++++++- net/unix/garbage.c | 73 ++++++++++++++++++++- net/unix/scm.c | 149 ------------------------------------------ net/unix/scm.h | 10 --- 8 files changed, 137 insertions(+), 174 deletions(-) delete mode 100644 net/unix/scm.c delete mode 100644 net/unix/scm.h
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index be488d627531..91d2036fc182 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -17,19 +17,20 @@ static inline struct unix_sock *unix_get_socket(struct file *filp) } #endif
+extern spinlock_t unix_gc_lock; +extern unsigned int unix_tot_inflight; + void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); -void unix_destruct_scm(struct sk_buff *skb); void unix_gc(void); void wait_for_unix_gc(struct scm_fp_list *fpl); + struct sock *unix_peer_get(struct sock *sk);
#define UNIX_HASH_MOD (256 - 1) #define UNIX_HASH_SIZE (256 * 2) #define UNIX_HASH_BITS 8
-extern unsigned int unix_tot_inflight; - struct unix_address { refcount_t refcnt; int len; diff --git a/net/Makefile b/net/Makefile index 0914bea9c335..103cd8d61f68 100644 --- a/net/Makefile +++ b/net/Makefile @@ -17,7 +17,7 @@ obj-$(CONFIG_NETFILTER) += netfilter/ obj-$(CONFIG_INET) += ipv4/ obj-$(CONFIG_TLS) += tls/ obj-$(CONFIG_XFRM) += xfrm/ -obj-$(CONFIG_UNIX_SCM) += unix/ +obj-$(CONFIG_UNIX) += unix/ obj-y += ipv6/ obj-$(CONFIG_BPFILTER) += bpfilter/ obj-$(CONFIG_PACKET) += packet/ diff --git a/net/unix/Kconfig b/net/unix/Kconfig index 28b232f281ab..8b5d04210d7c 100644 --- a/net/unix/Kconfig +++ b/net/unix/Kconfig @@ -16,11 +16,6 @@ config UNIX
Say Y unless you know what you are doing.
-config UNIX_SCM - bool - depends on UNIX - default y - config AF_UNIX_OOB bool depends on UNIX diff --git a/net/unix/Makefile b/net/unix/Makefile index 20491825b4d0..4ddd125c4642 100644 --- a/net/unix/Makefile +++ b/net/unix/Makefile @@ -11,5 +11,3 @@ unix-$(CONFIG_BPF_SYSCALL) += unix_bpf.o
obj-$(CONFIG_UNIX_DIAG) += unix_diag.o unix_diag-y := diag.o - -obj-$(CONFIG_UNIX_SCM) += scm.o diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f74f7878b3fe..7bcc4c526274 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -116,8 +116,6 @@ #include <linux/file.h> #include <linux/btf_ids.h>
-#include "scm.h" - static atomic_long_t unix_nr_socks; static struct hlist_head bsd_socket_buckets[UNIX_HASH_SIZE / 2]; static spinlock_t bsd_socket_locks[UNIX_HASH_SIZE / 2]; @@ -1726,6 +1724,52 @@ static int unix_getname(struct socket *sock, struct sockaddr *uaddr, int peer) return err; }
+/* The "user->unix_inflight" variable is protected by the garbage + * collection lock, and we just read it locklessly here. If you go + * over the limit, there might be a tiny race in actually noticing + * it across threads. Tough. + */ +static inline bool too_many_unix_fds(struct task_struct *p) +{ + struct user_struct *user = current_user(); + + if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFILE))) + return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); + return false; +} + +static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + int i; + + if (too_many_unix_fds(current)) + return -ETOOMANYREFS; + + /* Need to duplicate file references for the sake of garbage + * collection. Otherwise a socket in the fps might become a + * candidate for GC while the skb is not yet queued. + */ + UNIXCB(skb).fp = scm_fp_dup(scm->fp); + if (!UNIXCB(skb).fp) + return -ENOMEM; + + for (i = scm->fp->count - 1; i >= 0; i--) + unix_inflight(scm->fp->user, scm->fp->fp[i]); + + return 0; +} + +static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) +{ + int i; + + scm->fp = UNIXCB(skb).fp; + UNIXCB(skb).fp = NULL; + + for (i = scm->fp->count - 1; i >= 0; i--) + unix_notinflight(scm->fp->user, scm->fp->fp[i]); +} + static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) { scm->fp = scm_fp_dup(UNIXCB(skb).fp); @@ -1773,6 +1817,21 @@ static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) spin_unlock(&unix_gc_lock); }
+static void unix_destruct_scm(struct sk_buff *skb) +{ + struct scm_cookie scm; + + memset(&scm, 0, sizeof(scm)); + scm.pid = UNIXCB(skb).pid; + if (UNIXCB(skb).fp) + unix_detach_fds(&scm, skb); + + /* Alas, it calls VFS */ + /* So fscking what? fput() had been SMP-safe since the last Summer */ + scm_destroy(&scm); + sock_wfree(skb); +} + static int unix_scm_to_skb(struct scm_cookie *scm, struct sk_buff *skb, bool send_fds) { int err = 0; diff --git a/net/unix/garbage.c b/net/unix/garbage.c index c04f82489abb..0104be9d4704 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -81,11 +81,80 @@ #include <net/scm.h> #include <net/tcp_states.h>
-#include "scm.h" +struct unix_sock *unix_get_socket(struct file *filp) +{ + struct inode *inode = file_inode(filp); + + /* Socket ? */ + if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { + struct socket *sock = SOCKET_I(inode); + const struct proto_ops *ops; + struct sock *sk = sock->sk;
-/* Internal data structures and random procedures: */ + ops = READ_ONCE(sock->ops);
+ /* PF_UNIX ? */ + if (sk && ops && ops->family == PF_UNIX) + return unix_sk(sk); + } + + return NULL; +} + +DEFINE_SPINLOCK(unix_gc_lock); +unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); +static LIST_HEAD(gc_inflight_list); + +/* Keep the number of times in flight count for the file + * descriptor if it is for an AF_UNIX socket. + */ +void unix_inflight(struct user_struct *user, struct file *filp) +{ + struct unix_sock *u = unix_get_socket(filp); + + spin_lock(&unix_gc_lock); + + if (u) { + if (!u->inflight) { + WARN_ON_ONCE(!list_empty(&u->link)); + list_add_tail(&u->link, &gc_inflight_list); + } else { + WARN_ON_ONCE(list_empty(&u->link)); + } + u->inflight++; + + /* Paired with READ_ONCE() in wait_for_unix_gc() */ + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); + } + + WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); + + spin_unlock(&unix_gc_lock); +} + +void unix_notinflight(struct user_struct *user, struct file *filp) +{ + struct unix_sock *u = unix_get_socket(filp); + + spin_lock(&unix_gc_lock); + + if (u) { + WARN_ON_ONCE(!u->inflight); + WARN_ON_ONCE(list_empty(&u->link)); + + u->inflight--; + if (!u->inflight) + list_del_init(&u->link); + + /* Paired with READ_ONCE() in wait_for_unix_gc() */ + WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); + } + + WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); + + spin_unlock(&unix_gc_lock); +}
static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), struct sk_buff_head *hitlist) diff --git a/net/unix/scm.c b/net/unix/scm.c deleted file mode 100644 index 6f446dd2deed..000000000000 --- a/net/unix/scm.c +++ /dev/null @@ -1,149 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include <linux/module.h> -#include <linux/kernel.h> -#include <linux/string.h> -#include <linux/socket.h> -#include <linux/net.h> -#include <linux/fs.h> -#include <net/af_unix.h> -#include <net/scm.h> -#include <linux/init.h> -#include <linux/io_uring.h> - -#include "scm.h" - -unsigned int unix_tot_inflight; -EXPORT_SYMBOL(unix_tot_inflight); - -LIST_HEAD(gc_inflight_list); -EXPORT_SYMBOL(gc_inflight_list); - -DEFINE_SPINLOCK(unix_gc_lock); -EXPORT_SYMBOL(unix_gc_lock); - -struct unix_sock *unix_get_socket(struct file *filp) -{ - struct inode *inode = file_inode(filp); - - /* Socket ? */ - if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { - struct socket *sock = SOCKET_I(inode); - struct sock *s = sock->sk; - - /* PF_UNIX ? */ - if (s && sock->ops && sock->ops->family == PF_UNIX) - return unix_sk(s); - } - - return NULL; -} -EXPORT_SYMBOL(unix_get_socket); - -/* Keep the number of times in flight count for the file - * descriptor if it is for an AF_UNIX socket. - */ -void unix_inflight(struct user_struct *user, struct file *fp) -{ - struct unix_sock *u = unix_get_socket(fp); - - spin_lock(&unix_gc_lock); - - if (u) { - if (!u->inflight) { - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &gc_inflight_list); - } else { - WARN_ON_ONCE(list_empty(&u->link)); - } - u->inflight++; - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); - } - WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); - spin_unlock(&unix_gc_lock); -} - -void unix_notinflight(struct user_struct *user, struct file *fp) -{ - struct unix_sock *u = unix_get_socket(fp); - - spin_lock(&unix_gc_lock); - - if (u) { - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(list_empty(&u->link)); - - u->inflight--; - if (!u->inflight) - list_del_init(&u->link); - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); - } - WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); - spin_unlock(&unix_gc_lock); -} - -/* - * The "user->unix_inflight" variable is protected by the garbage - * collection lock, and we just read it locklessly here. If you go - * over the limit, there might be a tiny race in actually noticing - * it across threads. Tough. - */ -static inline bool too_many_unix_fds(struct task_struct *p) -{ - struct user_struct *user = current_user(); - - if (unlikely(READ_ONCE(user->unix_inflight) > task_rlimit(p, RLIMIT_NOFILE))) - return !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN); - return false; -} - -int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) -{ - int i; - - if (too_many_unix_fds(current)) - return -ETOOMANYREFS; - - /* - * Need to duplicate file references for the sake of garbage - * collection. Otherwise a socket in the fps might become a - * candidate for GC while the skb is not yet queued. - */ - UNIXCB(skb).fp = scm_fp_dup(scm->fp); - if (!UNIXCB(skb).fp) - return -ENOMEM; - - for (i = scm->fp->count - 1; i >= 0; i--) - unix_inflight(scm->fp->user, scm->fp->fp[i]); - return 0; -} -EXPORT_SYMBOL(unix_attach_fds); - -void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) -{ - int i; - - scm->fp = UNIXCB(skb).fp; - UNIXCB(skb).fp = NULL; - - for (i = scm->fp->count-1; i >= 0; i--) - unix_notinflight(scm->fp->user, scm->fp->fp[i]); -} -EXPORT_SYMBOL(unix_detach_fds); - -void unix_destruct_scm(struct sk_buff *skb) -{ - struct scm_cookie scm; - - memset(&scm, 0, sizeof(scm)); - scm.pid = UNIXCB(skb).pid; - if (UNIXCB(skb).fp) - unix_detach_fds(&scm, skb); - - /* Alas, it calls VFS */ - /* So fscking what? fput() had been SMP-safe since the last Summer */ - scm_destroy(&scm); - sock_wfree(skb); -} -EXPORT_SYMBOL(unix_destruct_scm); diff --git a/net/unix/scm.h b/net/unix/scm.h deleted file mode 100644 index 5a255a477f16..000000000000 --- a/net/unix/scm.h +++ /dev/null @@ -1,10 +0,0 @@ -#ifndef NET_UNIX_SCM_H -#define NET_UNIX_SCM_H - -extern struct list_head gc_inflight_list; -extern spinlock_t unix_gc_lock; - -int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb); -void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb); - -#endif
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 99a7a5b9943ea ! 1: ec2aeb54a3930 af_unix: Remove CONFIG_UNIX_SCM. @@ Metadata ## Commit message ## af_unix: Remove CONFIG_UNIX_SCM.
+ [ Upstream commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83 ] + Originally, the code related to garbage collection was all in garbage.c.
Commit f4e65870e5ce ("net: split out functions related to registering @@ Commit message Acked-by: Jens Axboe axboe@kernel.dk Link: https://lore.kernel.org/r/20240129190435.57228-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 99a7a5b9943ea2d05fb0dee38e4ae2290477ed83) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: static inline struct unix_sock *unix_get_socket(struct file *filp) @@ net/Makefile: obj-$(CONFIG_NETFILTER) += netfilter/ -obj-$(CONFIG_UNIX_SCM) += unix/ +obj-$(CONFIG_UNIX) += unix/ obj-y += ipv6/ + obj-$(CONFIG_BPFILTER) += bpfilter/ obj-$(CONFIG_PACKET) += packet/ - obj-$(CONFIG_NET_KEY) += key/
## net/unix/Kconfig ## @@ net/unix/Kconfig: config UNIX @@ net/unix/Makefile: unix-$(CONFIG_BPF_SYSCALL) += unix_bpf.o
## net/unix/af_unix.c ## @@ + #include <linux/file.h> #include <linux/btf_ids.h> - #include <linux/bpf-cgroup.h>
-#include "scm.h" - @@ net/unix/scm.c (deleted) - /* Socket ? */ - if (S_ISSOCK(inode->i_mode) && !(filp->f_mode & FMODE_PATH)) { - struct socket *sock = SOCKET_I(inode); -- const struct proto_ops *ops = READ_ONCE(sock->ops); - struct sock *s = sock->sk; - - /* PF_UNIX ? */ -- if (s && ops && ops->family == PF_UNIX) +- if (s && sock->ops && sock->ops->family == PF_UNIX) - return unix_sk(s); - } - ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1fbfdfaa590248c1d86407f578e40e5c65136330 ]
We will replace the garbage collection algorithm for AF_UNIX, where we will consider each inflight AF_UNIX socket as a vertex and its file descriptor as an edge in a directed graph.
This patch introduces a new struct unix_vertex representing a vertex in the graph and adds its pointer to struct unix_sock.
When we send a fd using the SCM_RIGHTS message, we allocate struct scm_fp_list to struct scm_cookie in scm_fp_copy(). Then, we bump each refcount of the inflight fds' struct file and save them in scm_fp_list.fp.
After that, unix_attach_fds() inexplicably clones scm_fp_list of scm_cookie and sets it to skb. (We will remove this part after replacing GC.)
Here, we add a new function call in unix_attach_fds() to preallocate struct unix_vertex per inflight AF_UNIX fd and link each vertex to skb's scm_fp_list.vertices.
When sendmsg() succeeds later, if the socket of the inflight fd is still not inflight yet, we will set the preallocated vertex to struct unix_sock.vertex and link it to a global list unix_unvisited_vertices under spin_lock(&unix_gc_lock).
If the socket is already inflight, we free the preallocated vertex. This is to avoid taking the lock unnecessarily when sendmsg() could fail later.
In the following patch, we will similarly allocate another struct per edge, which will finally be linked to the inflight socket's unix_vertex.edges.
And then, we will count the number of edges as unix_vertex.out_degree.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 1fbfdfaa590248c1d86407f578e40e5c65136330) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 9 +++++++++ include/net/scm.h | 3 +++ net/core/scm.c | 7 +++++++ net/unix/af_unix.c | 6 ++++++ net/unix/garbage.c | 38 ++++++++++++++++++++++++++++++++++++++ 5 files changed, 63 insertions(+)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 91d2036fc182..b41aff1ac688 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -22,9 +22,17 @@ extern unsigned int unix_tot_inflight;
void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); +int unix_prepare_fpl(struct scm_fp_list *fpl); +void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); void wait_for_unix_gc(struct scm_fp_list *fpl);
+struct unix_vertex { + struct list_head edges; + struct list_head entry; + unsigned long out_degree; +}; + struct sock *unix_peer_get(struct sock *sk);
#define UNIX_HASH_MOD (256 - 1) @@ -62,6 +70,7 @@ struct unix_sock { struct path path; struct mutex iolock, bindlock; struct sock *peer; + struct unix_vertex *vertex; struct list_head link; unsigned long inflight; spinlock_t lock; diff --git a/include/net/scm.h b/include/net/scm.h index a5c26008fcec..4183495d1981 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -25,6 +25,9 @@ struct scm_fp_list { short count; short count_unix; short max; +#ifdef CONFIG_UNIX + struct list_head vertices; +#endif struct user_struct *user; struct file *fp[SCM_MAX_FD]; }; diff --git a/net/core/scm.c b/net/core/scm.c index bb25052624ee..09bacb3d36f2 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -89,6 +89,9 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) fpl->count_unix = 0; fpl->max = SCM_MAX_FD; fpl->user = NULL; +#if IS_ENABLED(CONFIG_UNIX) + INIT_LIST_HEAD(&fpl->vertices); +#endif } fpp = &fpl->fp[fpl->count];
@@ -372,8 +375,12 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) if (new_fpl) { for (i = 0; i < fpl->count; i++) get_file(fpl->fp[i]); + new_fpl->max = new_fpl->count; new_fpl->user = get_uid(fpl->user); +#if IS_ENABLED(CONFIG_UNIX) + INIT_LIST_HEAD(&new_fpl->vertices); +#endif } return new_fpl; } diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 7bcc4c526274..0d3ba0d210c0 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -955,6 +955,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, sk->sk_destruct = unix_sock_destructor; u = unix_sk(sk); u->inflight = 0; + u->vertex = NULL; u->path.dentry = NULL; u->path.mnt = NULL; spin_lock_init(&u->lock); @@ -1756,6 +1757,9 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) for (i = scm->fp->count - 1; i >= 0; i--) unix_inflight(scm->fp->user, scm->fp->fp[i]);
+ if (unix_prepare_fpl(UNIXCB(skb).fp)) + return -ENOMEM; + return 0; }
@@ -1766,6 +1770,8 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) scm->fp = UNIXCB(skb).fp; UNIXCB(skb).fp = NULL;
+ unix_destroy_fpl(scm->fp); + for (i = scm->fp->count - 1; i >= 0; i--) unix_notinflight(scm->fp->user, scm->fp->fp[i]); } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 0104be9d4704..8ea7640e032e 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,44 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; }
+static void unix_free_vertices(struct scm_fp_list *fpl) +{ + struct unix_vertex *vertex, *next_vertex; + + list_for_each_entry_safe(vertex, next_vertex, &fpl->vertices, entry) { + list_del(&vertex->entry); + kfree(vertex); + } +} + +int unix_prepare_fpl(struct scm_fp_list *fpl) +{ + struct unix_vertex *vertex; + int i; + + if (!fpl->count_unix) + return 0; + + for (i = 0; i < fpl->count_unix; i++) { + vertex = kmalloc(sizeof(*vertex), GFP_KERNEL); + if (!vertex) + goto err; + + list_add(&vertex->entry, &fpl->vertices); + } + + return 0; + +err: + unix_free_vertices(fpl); + return -ENOMEM; +} + +void unix_destroy_fpl(struct scm_fp_list *fpl) +{ + unix_free_vertices(fpl); +} + DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 1fbfdfaa590248c1d86407f578e40e5c65136330
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 1fbfdfaa59024 < -: ------------- af_unix: Allocate struct unix_vertex for each inflight AF_UNIX fd. -: ------------- > 1: 325285d9fc869 Linux 6.1.139 ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 29b64e354029cfcf1eea4d91b146c7b769305930 ]
As with the previous patch, we preallocate to skb's scm_fp_list an array of struct unix_edge in the number of inflight AF_UNIX fds.
There we just preallocate memory and do not use immediately because sendmsg() could fail after this point. The actual use will be in the next patch.
When we queue skb with inflight edges, we will set the inflight socket's unix_sock as unix_edge->predecessor and the receiver's unix_sock as successor, and then we will link the edge to the inflight socket's unix_vertex.edges.
Note that we set NULL to cloned scm_fp_list.edges in scm_fp_dup() so that MSG_PEEK does not change the shape of the directed graph.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 29b64e354029cfcf1eea4d91b146c7b769305930) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 6 ++++++ include/net/scm.h | 5 +++++ net/core/scm.c | 2 ++ net/unix/garbage.c | 6 ++++++ 4 files changed, 19 insertions(+)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index b41aff1ac688..279087595966 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -33,6 +33,12 @@ struct unix_vertex { unsigned long out_degree; };
+struct unix_edge { + struct unix_sock *predecessor; + struct unix_sock *successor; + struct list_head vertex_entry; +}; + struct sock *unix_peer_get(struct sock *sk);
#define UNIX_HASH_MOD (256 - 1) diff --git a/include/net/scm.h b/include/net/scm.h index 4183495d1981..19d7d802ed6c 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -21,12 +21,17 @@ struct scm_creds { kgid_t gid; };
+#ifdef CONFIG_UNIX +struct unix_edge; +#endif + struct scm_fp_list { short count; short count_unix; short max; #ifdef CONFIG_UNIX struct list_head vertices; + struct unix_edge *edges; #endif struct user_struct *user; struct file *fp[SCM_MAX_FD]; diff --git a/net/core/scm.c b/net/core/scm.c index 09bacb3d36f2..4c343729f960 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -90,6 +90,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) fpl->max = SCM_MAX_FD; fpl->user = NULL; #if IS_ENABLED(CONFIG_UNIX) + fpl->edges = NULL; INIT_LIST_HEAD(&fpl->vertices); #endif } @@ -379,6 +380,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) new_fpl->max = new_fpl->count; new_fpl->user = get_uid(fpl->user); #if IS_ENABLED(CONFIG_UNIX) + new_fpl->edges = NULL; INIT_LIST_HEAD(&new_fpl->vertices); #endif } diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 8ea7640e032e..912b7945692c 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -127,6 +127,11 @@ int unix_prepare_fpl(struct scm_fp_list *fpl) list_add(&vertex->entry, &fpl->vertices); }
+ fpl->edges = kvmalloc_array(fpl->count_unix, sizeof(*fpl->edges), + GFP_KERNEL_ACCOUNT); + if (!fpl->edges) + goto err; + return 0;
err: @@ -136,6 +141,7 @@ int unix_prepare_fpl(struct scm_fp_list *fpl)
void unix_destroy_fpl(struct scm_fp_list *fpl) { + kvfree(fpl->edges); unix_free_vertices(fpl); }
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 29b64e354029cfcf1eea4d91b146c7b769305930
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 29b64e354029c < -: ------------- af_unix: Allocate struct unix_edge for each inflight AF_UNIX fd. -: ------------- > 1: 325285d9fc869 Linux 6.1.139 ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 42f298c06b30bfe0a8cbee5d38644e618699e26e ]
Just before queuing skb with inflight fds, we call scm_stat_add(), which is a good place to set up the preallocated struct unix_vertex and struct unix_edge in UNIXCB(skb).fp.
Then, we call unix_add_edges() and construct the directed graph as follows:
1. Set the inflight socket's unix_sock to unix_edge.predecessor. 2. Set the receiver's unix_sock to unix_edge.successor. 3. Set the preallocated vertex to inflight socket's unix_sock.vertex. 4. Link inflight socket's unix_vertex.entry to unix_unvisited_vertices. 5. Link unix_edge.vertex_entry to the inflight socket's unix_vertex.edges.
Let's say we pass the fd of AF_UNIX socket A to B and the fd of B to C. The graph looks like this:
+-------------------------+ | unix_unvisited_vertices | <-------------------------. +-------------------------+ | + | | +--------------+ +--------------+ | +--------------+ | | unix_sock A | <---. .---> | unix_sock B | <-|-. .---> | unix_sock C | | +--------------+ | | +--------------+ | | | +--------------+ | .-+ | vertex | | | .-+ | vertex | | | | | vertex | | | +--------------+ | | | +--------------+ | | | +--------------+ | | | | | | | | | | +--------------+ | | | +--------------+ | | | | '-> | unix_vertex | | | '-> | unix_vertex | | | | | +--------------+ | | +--------------+ | | | `---> | entry | +---------> | entry | +-' | | |--------------| | | |--------------| | | | edges | <-. | | | edges | <-. | | +--------------+ | | | +--------------+ | | | | | | | | | .----------------------' | | .----------------------' | | | | | | | | | +--------------+ | | | +--------------+ | | | | unix_edge | | | | | unix_edge | | | | +--------------+ | | | +--------------+ | | `-> | vertex_entry | | | `-> | vertex_entry | | | |--------------| | | |--------------| | | | predecessor | +---' | | predecessor | +---' | |--------------| | |--------------| | | successor | +-----' | successor | +-----' +--------------+ +--------------+
Henceforth, we denote such a graph as A -> B (-> C).
Now, we can express all inflight fd graphs that do not contain embryo sockets. We will support the particular case later.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 42f298c06b30bfe0a8cbee5d38644e618699e26e) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 2 + include/net/scm.h | 1 + net/core/scm.c | 2 + net/unix/af_unix.c | 8 +++- net/unix/garbage.c | 90 ++++++++++++++++++++++++++++++++++++++++++- 5 files changed, 100 insertions(+), 3 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 279087595966..08cc90348043 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -22,6 +22,8 @@ extern unsigned int unix_tot_inflight;
void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); +void unix_del_edges(struct scm_fp_list *fpl); int unix_prepare_fpl(struct scm_fp_list *fpl); void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); diff --git a/include/net/scm.h b/include/net/scm.h index 19d7d802ed6c..19789096424d 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -30,6 +30,7 @@ struct scm_fp_list { short count_unix; short max; #ifdef CONFIG_UNIX + bool inflight; struct list_head vertices; struct unix_edge *edges; #endif diff --git a/net/core/scm.c b/net/core/scm.c index 4c343729f960..1ff78bd4ee83 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -90,6 +90,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) fpl->max = SCM_MAX_FD; fpl->user = NULL; #if IS_ENABLED(CONFIG_UNIX) + fpl->inflight = false; fpl->edges = NULL; INIT_LIST_HEAD(&fpl->vertices); #endif @@ -380,6 +381,7 @@ struct scm_fp_list *scm_fp_dup(struct scm_fp_list *fpl) new_fpl->max = new_fpl->count; new_fpl->user = get_uid(fpl->user); #if IS_ENABLED(CONFIG_UNIX) + new_fpl->inflight = false; new_fpl->edges = NULL; INIT_LIST_HEAD(&new_fpl->vertices); #endif diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0d3ba0d210c0..658a1680a92e 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1910,8 +1910,10 @@ static void scm_stat_add(struct sock *sk, struct sk_buff *skb) struct scm_fp_list *fp = UNIXCB(skb).fp; struct unix_sock *u = unix_sk(sk);
- if (unlikely(fp && fp->count)) + if (unlikely(fp && fp->count)) { atomic_add(fp->count, &u->scm_stat.nr_fds); + unix_add_edges(fp, u); + } }
static void scm_stat_del(struct sock *sk, struct sk_buff *skb) @@ -1919,8 +1921,10 @@ static void scm_stat_del(struct sock *sk, struct sk_buff *skb) struct scm_fp_list *fp = UNIXCB(skb).fp; struct unix_sock *u = unix_sk(sk);
- if (unlikely(fp && fp->count)) + if (unlikely(fp && fp->count)) { atomic_sub(fp->count, &u->scm_stat.nr_fds); + unix_del_edges(fp); + } }
/* diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 912b7945692c..b5b4a200dbf3 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,38 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; }
+static LIST_HEAD(unix_unvisited_vertices); + +static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) +{ + struct unix_vertex *vertex = edge->predecessor->vertex; + + if (!vertex) { + vertex = list_first_entry(&fpl->vertices, typeof(*vertex), entry); + vertex->out_degree = 0; + INIT_LIST_HEAD(&vertex->edges); + + list_move_tail(&vertex->entry, &unix_unvisited_vertices); + edge->predecessor->vertex = vertex; + } + + vertex->out_degree++; + list_add_tail(&edge->vertex_entry, &vertex->edges); +} + +static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) +{ + struct unix_vertex *vertex = edge->predecessor->vertex; + + list_del(&edge->vertex_entry); + vertex->out_degree--; + + if (!vertex->out_degree) { + edge->predecessor->vertex = NULL; + list_move_tail(&vertex->entry, &fpl->vertices); + } +} + static void unix_free_vertices(struct scm_fp_list *fpl) { struct unix_vertex *vertex, *next_vertex; @@ -111,6 +143,60 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } }
+DEFINE_SPINLOCK(unix_gc_lock); + +void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) +{ + int i = 0, j = 0; + + spin_lock(&unix_gc_lock); + + if (!fpl->count_unix) + goto out; + + do { + struct unix_sock *inflight = unix_get_socket(fpl->fp[j++]); + struct unix_edge *edge; + + if (!inflight) + continue; + + edge = fpl->edges + i++; + edge->predecessor = inflight; + edge->successor = receiver; + + unix_add_edge(fpl, edge); + } while (i < fpl->count_unix); + +out: + spin_unlock(&unix_gc_lock); + + fpl->inflight = true; + + unix_free_vertices(fpl); +} + +void unix_del_edges(struct scm_fp_list *fpl) +{ + int i = 0; + + spin_lock(&unix_gc_lock); + + if (!fpl->count_unix) + goto out; + + do { + struct unix_edge *edge = fpl->edges + i++; + + unix_del_edge(fpl, edge); + } while (i < fpl->count_unix); + +out: + spin_unlock(&unix_gc_lock); + + fpl->inflight = false; +} + int unix_prepare_fpl(struct scm_fp_list *fpl) { struct unix_vertex *vertex; @@ -141,11 +227,13 @@ int unix_prepare_fpl(struct scm_fp_list *fpl)
void unix_destroy_fpl(struct scm_fp_list *fpl) { + if (fpl->inflight) + unix_del_edges(fpl); + kvfree(fpl->edges); unix_free_vertices(fpl); }
-DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 42f298c06b30bfe0a8cbee5d38644e618699e26e
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 42f298c06b30b ! 1: 2a936f6956d7c af_unix: Link struct unix_edge when queuing skb. @@ Metadata ## Commit message ## af_unix: Link struct unix_edge when queuing skb.
+ [ Upstream commit 42f298c06b30bfe0a8cbee5d38644e618699e26e ] + Just before queuing skb with inflight fds, we call scm_stat_add(), which is a good place to set up the preallocated struct unix_vertex and struct unix_edge in UNIXCB(skb).fp. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 42f298c06b30bfe0a8cbee5d38644e618699e26e) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: extern unsigned int unix_tot_inflight; ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315 ]
Currently, we track the number of inflight sockets in two variables. unix_tot_inflight is the total number of inflight AF_UNIX sockets on the host, and user->unix_inflight is the number of inflight fds per user.
We update them one by one in unix_inflight(), which can be done once in batch. Also, sendmsg() could fail even after unix_inflight(), then we need to acquire unix_gc_lock only to decrement the counters.
Let's bulk update the counters in unix_add_edges() and unix_del_edges(), which is called only for successfully passed fds.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index b5b4a200dbf3..f7041fc23000 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -144,6 +144,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl) }
DEFINE_SPINLOCK(unix_gc_lock); +unsigned int unix_tot_inflight;
void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) { @@ -168,7 +169,10 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) unix_add_edge(fpl, edge); } while (i < fpl->count_unix);
+ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix); out: + WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->count); + spin_unlock(&unix_gc_lock);
fpl->inflight = true; @@ -191,7 +195,10 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix);
+ WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: + WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count); + spin_unlock(&unix_gc_lock);
fpl->inflight = false; @@ -234,7 +241,6 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); }
-unsigned int unix_tot_inflight; static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list);
@@ -255,13 +261,8 @@ void unix_inflight(struct user_struct *user, struct file *filp) WARN_ON_ONCE(list_empty(&u->link)); } u->inflight++; - - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + 1); }
- WRITE_ONCE(user->unix_inflight, user->unix_inflight + 1); - spin_unlock(&unix_gc_lock); }
@@ -278,13 +279,8 @@ void unix_notinflight(struct user_struct *user, struct file *filp) u->inflight--; if (!u->inflight) list_del_init(&u->link); - - /* Paired with READ_ONCE() in wait_for_unix_gc() */ - WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - 1); }
- WRITE_ONCE(user->unix_inflight, user->unix_inflight - 1); - spin_unlock(&unix_gc_lock); }
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 22c3c0c52d32f41cc38cd936ea0c93f22ced3315
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 22c3c0c52d32f ! 1: 98babb0b1635c af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb. @@ Metadata ## Commit message ## af_unix: Bulk update unix_tot_inflight/unix_inflight when queuing skb.
+ [ Upstream commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315 ] + Currently, we track the number of inflight sockets in two variables. unix_tot_inflight is the total number of inflight AF_UNIX sockets on the host, and user->unix_inflight is the number of inflight fds per @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-5-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 22c3c0c52d32f41cc38cd936ea0c93f22ced3315) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static void unix_free_vertices(struct scm_fp_list *fpl) ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 6ba76fd2848e107594ea4f03b737230f74bc23ea ]
The new GC will use a depth first search graph algorithm to find cyclic references. The algorithm visits every vertex exactly once.
Here, we implement the DFS part without recursion so that no one can abuse it.
unix_walk_scc() marks every vertex unvisited by initialising index as UNIX_VERTEX_INDEX_UNVISITED and iterates inflight vertices in unix_unvisited_vertices and call __unix_walk_scc() to start DFS from an arbitrary vertex.
__unix_walk_scc() iterates all edges starting from the vertex and explores the neighbour vertices with DFS using edge_stack.
After visiting all neighbours, __unix_walk_scc() moves the visited vertex to unix_visited_vertices so that unix_walk_scc() will not restart DFS from the visited vertex.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 6ba76fd2848e107594ea4f03b737230f74bc23ea) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 2 ++ net/unix/garbage.c | 74 +++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 76 insertions(+)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 08cc90348043..9d51d675cc9f 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -33,12 +33,14 @@ struct unix_vertex { struct list_head edges; struct list_head entry; unsigned long out_degree; + unsigned long index; };
struct unix_edge { struct unix_sock *predecessor; struct unix_sock *successor; struct list_head vertex_entry; + struct list_head stack_entry; };
struct sock *unix_peer_get(struct sock *sk); diff --git a/net/unix/garbage.c b/net/unix/garbage.c index f7041fc23000..295dd1a7b8e0 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -103,6 +103,11 @@ struct unix_sock *unix_get_socket(struct file *filp)
static LIST_HEAD(unix_unvisited_vertices);
+enum unix_vertex_index { + UNIX_VERTEX_INDEX_UNVISITED, + UNIX_VERTEX_INDEX_START, +}; + static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex = edge->predecessor->vertex; @@ -241,6 +246,73 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); }
+static LIST_HEAD(unix_visited_vertices); + +static void __unix_walk_scc(struct unix_vertex *vertex) +{ + unsigned long index = UNIX_VERTEX_INDEX_START; + struct unix_edge *edge; + LIST_HEAD(edge_stack); + +next_vertex: + vertex->index = index; + index++; + + /* Explore neighbour vertices (receivers of the current vertex's fd). */ + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + struct unix_vertex *next_vertex = edge->successor->vertex; + + if (!next_vertex) + continue; + + if (next_vertex->index == UNIX_VERTEX_INDEX_UNVISITED) { + /* Iterative deepening depth first search + * + * 1. Push a forward edge to edge_stack and set + * the successor to vertex for the next iteration. + */ + list_add(&edge->stack_entry, &edge_stack); + + vertex = next_vertex; + goto next_vertex; + + /* 2. Pop the edge directed to the current vertex + * and restore the ancestor for backtracking. + */ +prev_vertex: + edge = list_first_entry(&edge_stack, typeof(*edge), stack_entry); + list_del_init(&edge->stack_entry); + + vertex = edge->predecessor->vertex; + } + } + + /* Don't restart DFS from this vertex in unix_walk_scc(). */ + list_move_tail(&vertex->entry, &unix_visited_vertices); + + /* Need backtracking ? */ + if (!list_empty(&edge_stack)) + goto prev_vertex; +} + +static void unix_walk_scc(void) +{ + struct unix_vertex *vertex; + + list_for_each_entry(vertex, &unix_unvisited_vertices, entry) + vertex->index = UNIX_VERTEX_INDEX_UNVISITED; + + /* Visit every vertex exactly once. + * __unix_walk_scc() moves visited vertices to unix_visited_vertices. + */ + while (!list_empty(&unix_unvisited_vertices)) { + vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); + __unix_walk_scc(vertex); + } + + list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); +} + static LIST_HEAD(gc_candidates); static LIST_HEAD(gc_inflight_list);
@@ -388,6 +460,8 @@ static void __unix_gc(struct work_struct *work)
spin_lock(&unix_gc_lock);
+ unix_walk_scc(); + /* First, select candidates for garbage collection. Only * in-flight sockets are considered, and from those only ones * which don't have any external reference.
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 6ba76fd2848e107594ea4f03b737230f74bc23ea
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 6ba76fd2848e1 ! 1: 81c21430d5b6f af_unix: Iterate all vertices by DFS. @@ Metadata ## Commit message ## af_unix: Iterate all vertices by DFS.
+ [ Upstream commit 6ba76fd2848e107594ea4f03b737230f74bc23ea ] + The new GC will use a depth first search graph algorithm to find cyclic references. The algorithm visits every vertex exactly once.
@@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 6ba76fd2848e107594ea4f03b737230f74bc23ea) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: struct unix_vertex { ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 3484f063172dd88776b062046d721d7c2ae1af7c ]
In the new GC, we use a simple graph algorithm, Tarjan's Strongly Connected Components (SCC) algorithm, to find cyclic references.
The algorithm visits every vertex exactly once using depth-first search (DFS).
DFS starts by pushing an input vertex to a stack and assigning it a unique number. Two fields, index and lowlink, are initialised with the number, but lowlink could be updated later during DFS.
If a vertex has an edge to an unvisited inflight vertex, we visit it and do the same processing. So, we will have vertices in the stack in the order they appear and number them consecutively in the same order.
If a vertex has a back-edge to a visited vertex in the stack, we update the predecessor's lowlink with the successor's index.
After iterating edges from the vertex, we check if its index equals its lowlink.
If the lowlink is different from the index, it shows there was a back-edge. Then, we go backtracking and propagate the lowlink to its predecessor and resume the previous edge iteration from the next edge.
If the lowlink is the same as the index, we pop vertices before and including the vertex from the stack. Then, the set of vertices is SCC, possibly forming a cycle. At the same time, we move the vertices to unix_visited_vertices.
When we finish the algorithm, all vertices in each SCC will be linked via unix_vertex.scc_entry.
Let's take an example. We have a graph including five inflight vertices (F is not inflight):
A -> B -> C -> D -> E (-> F) ^ | `---------'
Suppose that we start DFS from C. We will visit C, D, and B first and initialise their index and lowlink. Then, the stack looks like this:
B = (3, 3) (index, lowlink)
D = (2, 2) C = (1, 1)
When checking B's edge to C, we update B's lowlink with C's index and propagate it to D.
B = (3, 1) (index, lowlink)
D = (2, 1)
C = (1, 1)
Next, we visit E, which has no edge to an inflight vertex.
E = (4, 4) (index, lowlink)
B = (3, 1) D = (2, 1) C = (1, 1)
When we leave from E, its index and lowlink are the same, so we pop E from the stack as single-vertex SCC. Next, we leave from B and D but do nothing because their lowlink are different from their index.
B = (3, 1) (index, lowlink) D = (2, 1)
C = (1, 1)
Then, we leave from C, whose index and lowlink are the same, so we pop B, D and C as SCC.
Last, we do DFS for the rest of vertices, A, which is also a single-vertex SCC.
Finally, each unix_vertex.scc_entry is linked as follows:
A -. B -> C -> D E -. ^ | ^ | ^ | `--' `---------' `--'
We use SCC later to decide whether we can garbage-collect the sockets.
Note that we still cannot detect SCC properly if an edge points to an embryo socket. The following two patches will sort it out.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 3484f063172dd88776b062046d721d7c2ae1af7c) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 3 +++ net/unix/garbage.c | 46 +++++++++++++++++++++++++++++++++++++++++-- 2 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 9d51d675cc9f..e6f7bba19152 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -32,8 +32,11 @@ void wait_for_unix_gc(struct scm_fp_list *fpl); struct unix_vertex { struct list_head edges; struct list_head entry; + struct list_head scc_entry; unsigned long out_degree; unsigned long index; + unsigned long lowlink; + bool on_stack; };
struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 295dd1a7b8e0..cdeff548e130 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -251,11 +251,19 @@ static LIST_HEAD(unix_visited_vertices); static void __unix_walk_scc(struct unix_vertex *vertex) { unsigned long index = UNIX_VERTEX_INDEX_START; + LIST_HEAD(vertex_stack); struct unix_edge *edge; LIST_HEAD(edge_stack);
next_vertex: + /* Push vertex to vertex_stack. + * The vertex will be popped when finalising SCC later. + */ + vertex->on_stack = true; + list_add(&vertex->scc_entry, &vertex_stack); + vertex->index = index; + vertex->lowlink = index; index++;
/* Explore neighbour vertices (receivers of the current vertex's fd). */ @@ -283,12 +291,46 @@ static void __unix_walk_scc(struct unix_vertex *vertex) edge = list_first_entry(&edge_stack, typeof(*edge), stack_entry); list_del_init(&edge->stack_entry);
+ next_vertex = vertex; vertex = edge->predecessor->vertex; + + /* If the successor has a smaller lowlink, two vertices + * are in the same SCC, so propagate the smaller lowlink + * to skip SCC finalisation. + */ + vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink); + } else if (next_vertex->on_stack) { + /* Loop detected by a back/cross edge. + * + * The successor is on vertex_stack, so two vertices are + * in the same SCC. If the successor has a smaller index, + * propagate it to skip SCC finalisation. + */ + vertex->lowlink = min(vertex->lowlink, next_vertex->index); + } else { + /* The successor was already grouped as another SCC */ } }
- /* Don't restart DFS from this vertex in unix_walk_scc(). */ - list_move_tail(&vertex->entry, &unix_visited_vertices); + if (vertex->index == vertex->lowlink) { + struct list_head scc; + + /* SCC finalised. + * + * If the lowlink was not updated, all the vertices above on + * vertex_stack are in the same SCC. Group them using scc_entry. + */ + __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry); + + list_for_each_entry_reverse(vertex, &scc, scc_entry) { + /* Don't restart DFS from this vertex in unix_walk_scc(). */ + list_move_tail(&vertex->entry, &unix_visited_vertices); + + vertex->on_stack = false; + } + + list_del(&scc); + }
/* Need backtracking ? */ if (!list_empty(&edge_stack))
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 13/27 of a series ⚠️ Found follow-up fixes in mainline
The upstream commit SHA1 provided is correct: 3484f063172dd88776b062046d721d7c2ae1af7c
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Found fixes commits: 927fa5b3e4f5 af_unix: Fix uninit-value in __unix_walk_scc()
Note: The patch differs from the upstream commit: --- 1: 3484f063172dd ! 1: 3c8c9bc58de2c af_unix: Detect Strongly Connected Components. @@ Metadata ## Commit message ## af_unix: Detect Strongly Connected Components.
+ [ Upstream commit 3484f063172dd88776b062046d721d7c2ae1af7c ] + In the new GC, we use a simple graph algorithm, Tarjan's Strongly Connected Components (SCC) algorithm, to find cyclic references.
@@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-7-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 3484f063172dd88776b062046d721d7c2ae1af7c) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: void wait_for_unix_gc(struct scm_fp_list *fpl); ---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit aed6ecef55d70de3762ce41c561b7f547dbaf107 ]
This is a prep patch for the following change, where we need to fetch the listening socket from the successor embryo socket during GC.
We add a new field to struct unix_sock to save a pointer to a listening socket.
We set it when connect() creates a new socket, and clear it when accept() is called.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit aed6ecef55d70de3762ce41c561b7f547dbaf107) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 5 ++++- 2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index e6f7bba19152..624fea657518 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -83,6 +83,7 @@ struct unix_sock { struct path path; struct mutex iolock, bindlock; struct sock *peer; + struct sock *listener; struct unix_vertex *vertex; struct list_head link; unsigned long inflight; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 658a1680a92e..6075ecbe40b2 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -954,6 +954,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, sk->sk_max_ack_backlog = READ_ONCE(net->unx.sysctl_max_dgram_qlen); sk->sk_destruct = unix_sock_destructor; u = unix_sk(sk); + u->listener = NULL; u->inflight = 0; u->vertex = NULL; u->path.dentry = NULL; @@ -1558,6 +1559,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, newsk->sk_type = sk->sk_type; init_peercred(newsk); newu = unix_sk(newsk); + newu->listener = other; RCU_INIT_POINTER(newsk->sk_wq, &newu->peer_wq); otheru = unix_sk(other);
@@ -1651,8 +1653,8 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags, bool kern) { struct sock *sk = sock->sk; - struct sock *tsk; struct sk_buff *skb; + struct sock *tsk; int err;
err = -EOPNOTSUPP; @@ -1677,6 +1679,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags, }
tsk = skb->sk; + unix_sk(tsk)->listener = NULL; skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: aed6ecef55d70de3762ce41c561b7f547dbaf107
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: aed6ecef55d70 ! 1: bb250adb0457d af_unix: Save listener for embryo socket. @@ Metadata ## Commit message ## af_unix: Save listener for embryo socket.
+ [ Upstream commit aed6ecef55d70de3762ce41c561b7f547dbaf107 ] + This is a prep patch for the following change, where we need to fetch the listening socket from the successor embryo socket during GC. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-8-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit aed6ecef55d70de3762ce41c561b7f547dbaf107) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: struct unix_sock { @@ include/net/af_unix.h: struct unix_sock {
## net/unix/af_unix.c ## @@ net/unix/af_unix.c: static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, - sk->sk_max_ack_backlog = net->unx.sysctl_max_dgram_qlen; + sk->sk_max_ack_backlog = READ_ONCE(net->unx.sysctl_max_dgram_qlen); sk->sk_destruct = unix_sock_destructor; u = unix_sk(sk); + u->listener = NULL; ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit dcf70df2048d27c5d186f013f101a4aefd63aa41 ]
To garbage collect inflight AF_UNIX sockets, we must define the cyclic reference appropriately. This is a bit tricky if the loop consists of embryo sockets.
Suppose that the fd of AF_UNIX socket A is passed to D and the fd B to C and that C and D are embryo sockets of A and B, respectively. It may appear that there are two separate graphs, A (-> D) and B (-> C), but this is not correct.
A --. .-- B X C <-' `-> D
Now, D holds A's refcount, and C has B's refcount, so unix_release() will never be called for A and B when we close() them. However, no one can call close() for D and C to free skbs holding refcounts of A and B because C/D is in A/B's receive queue, which should have been purged by unix_release() for A and B.
So, here's another type of cyclic reference. When a fd of an AF_UNIX socket is passed to an embryo socket, the reference is indirectly held by its parent listening socket.
.-> A .-> B | `- sk_receive_queue | `- sk_receive_queue | `- skb | `- skb | `- sk == C | `- sk == D | `- sk_receive_queue | `- sk_receive_queue | `- skb +---------' `- skb +-. | | `---------------------------------------------------------'
Technically, the graph must be denoted as A <-> B instead of A (-> D) and B (-> C) to find such a cyclic reference without touching each socket's receive queue.
.-> A --. .-- B <-. | X | == A <-> B `-- C <-' `-> D --'
We apply this fixup during GC by fetching the real successor by unix_edge_successor().
When we call accept(), we clear unix_sock.listener under unix_gc_lock not to confuse GC.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit dcf70df2048d27c5d186f013f101a4aefd63aa41) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 2 +- net/unix/garbage.c | 20 +++++++++++++++++++- 3 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 624fea657518..d7f589e14467 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -24,6 +24,7 @@ void unix_inflight(struct user_struct *user, struct file *fp); void unix_notinflight(struct user_struct *user, struct file *fp); void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); +void unix_update_edges(struct unix_sock *receiver); int unix_prepare_fpl(struct scm_fp_list *fpl); void unix_destroy_fpl(struct scm_fp_list *fpl); void unix_gc(void); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 6075ecbe40b2..4d8b2b2b9a70 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1679,7 +1679,7 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags, }
tsk = skb->sk; - unix_sk(tsk)->listener = NULL; + unix_update_edges(unix_sk(tsk)); skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait);
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index cdeff548e130..6ff7e0b5c544 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -101,6 +101,17 @@ struct unix_sock *unix_get_socket(struct file *filp) return NULL; }
+static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) +{ + /* If an embryo socket has a fd, + * the listener indirectly holds the fd's refcnt. + */ + if (edge->successor->listener) + return unix_sk(edge->successor->listener)->vertex; + + return edge->successor->vertex; +} + static LIST_HEAD(unix_unvisited_vertices);
enum unix_vertex_index { @@ -209,6 +220,13 @@ void unix_del_edges(struct scm_fp_list *fpl) fpl->inflight = false; }
+void unix_update_edges(struct unix_sock *receiver) +{ + spin_lock(&unix_gc_lock); + receiver->listener = NULL; + spin_unlock(&unix_gc_lock); +} + int unix_prepare_fpl(struct scm_fp_list *fpl) { struct unix_vertex *vertex; @@ -268,7 +286,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
/* Explore neighbour vertices (receivers of the current vertex's fd). */ list_for_each_entry(edge, &vertex->edges, vertex_entry) { - struct unix_vertex *next_vertex = edge->successor->vertex; + struct unix_vertex *next_vertex = unix_edge_successor(edge);
if (!next_vertex) continue;
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: dcf70df2048d27c5d186f013f101a4aefd63aa41
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: dcf70df2048d2 ! 1: c26a81ec8cd5e af_unix: Fix up unix_edge.successor for embryo socket. @@ Metadata ## Commit message ## af_unix: Fix up unix_edge.successor for embryo socket.
+ [ Upstream commit dcf70df2048d27c5d186f013f101a4aefd63aa41 ] + To garbage collect inflight AF_UNIX sockets, we must define the cyclic reference appropriately. This is a bit tricky if the loop consists of embryo sockets. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-9-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit dcf70df2048d27c5d186f013f101a4aefd63aa41) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: void unix_inflight(struct user_struct *user, struct file *fp); ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a ]
Before starting Tarjan's algorithm, we need to mark all vertices as unvisited. We can save this O(n) setup by reserving two special indices (0, 1) and using two variables.
The first time we link a vertex to unix_unvisited_vertices, we set unix_vertex_unvisited_index to index.
During DFS, we can see that the index of unvisited vertices is the same as unix_vertex_unvisited_index.
When we finalise SCC later, we set unix_vertex_grouped_index to each vertex's index.
Then, we can know (i) that the vertex is on the stack if the index of a visited vertex is >= 2 and (ii) that it is not on the stack and belongs to a different SCC if the index is unix_vertex_grouped_index.
After the whole algorithm, all indices of vertices are set as unix_vertex_grouped_index.
Next time we start DFS, we know that all unvisited vertices have unix_vertex_grouped_index, and we can use unix_vertex_unvisited_index as the not-on-stack marker.
To use the same variable in __unix_walk_scc(), we can swap unix_vertex_(grouped|unvisited)_index at the end of Tarjan's algorithm.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 1 - net/unix/garbage.c | 26 +++++++++++++++----------- 2 files changed, 15 insertions(+), 12 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index d7f589e14467..ffbc7322e41b 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -37,7 +37,6 @@ struct unix_vertex { unsigned long out_degree; unsigned long index; unsigned long lowlink; - bool on_stack; };
struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 6ff7e0b5c544..feae6c17b291 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -115,16 +115,20 @@ static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) static LIST_HEAD(unix_unvisited_vertices);
enum unix_vertex_index { - UNIX_VERTEX_INDEX_UNVISITED, + UNIX_VERTEX_INDEX_MARK1, + UNIX_VERTEX_INDEX_MARK2, UNIX_VERTEX_INDEX_START, };
+static unsigned long unix_vertex_unvisited_index = UNIX_VERTEX_INDEX_MARK1; + static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex = edge->predecessor->vertex;
if (!vertex) { vertex = list_first_entry(&fpl->vertices, typeof(*vertex), entry); + vertex->index = unix_vertex_unvisited_index; vertex->out_degree = 0; INIT_LIST_HEAD(&vertex->edges);
@@ -265,6 +269,7 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) }
static LIST_HEAD(unix_visited_vertices); +static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
static void __unix_walk_scc(struct unix_vertex *vertex) { @@ -274,10 +279,10 @@ static void __unix_walk_scc(struct unix_vertex *vertex) LIST_HEAD(edge_stack);
next_vertex: - /* Push vertex to vertex_stack. + /* Push vertex to vertex_stack and mark it as on-stack + * (index >= UNIX_VERTEX_INDEX_START). * The vertex will be popped when finalising SCC later. */ - vertex->on_stack = true; list_add(&vertex->scc_entry, &vertex_stack);
vertex->index = index; @@ -291,7 +296,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) if (!next_vertex) continue;
- if (next_vertex->index == UNIX_VERTEX_INDEX_UNVISITED) { + if (next_vertex->index == unix_vertex_unvisited_index) { /* Iterative deepening depth first search * * 1. Push a forward edge to edge_stack and set @@ -317,7 +322,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex) * to skip SCC finalisation. */ vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink); - } else if (next_vertex->on_stack) { + } else if (next_vertex->index != unix_vertex_grouped_index) { /* Loop detected by a back/cross edge. * * The successor is on vertex_stack, so two vertices are @@ -344,7 +349,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex) /* Don't restart DFS from this vertex in unix_walk_scc(). */ list_move_tail(&vertex->entry, &unix_visited_vertices);
- vertex->on_stack = false; + /* Mark vertex as off-stack. */ + vertex->index = unix_vertex_grouped_index; }
list_del(&scc); @@ -357,20 +363,18 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
static void unix_walk_scc(void) { - struct unix_vertex *vertex; - - list_for_each_entry(vertex, &unix_unvisited_vertices, entry) - vertex->index = UNIX_VERTEX_INDEX_UNVISITED; - /* Visit every vertex exactly once. * __unix_walk_scc() moves visited vertices to unix_visited_vertices. */ while (!list_empty(&unix_unvisited_vertices)) { + struct unix_vertex *vertex; + vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); __unix_walk_scc(vertex); }
list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); + swap(unix_vertex_unvisited_index, unix_vertex_grouped_index); }
static LIST_HEAD(gc_candidates);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: ba31b4a4e1018f5844c6eb31734976e2184f2f9a
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: ba31b4a4e1018 ! 1: e40450e8778d8 af_unix: Save O(n) setup of Tarjan's algo. @@ Metadata ## Commit message ## af_unix: Save O(n) setup of Tarjan's algo.
+ [ Upstream commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a ] + Before starting Tarjan's algorithm, we need to mark all vertices as unvisited. We can save this O(n) setup by reserving two special indices (0, 1) and using two variables. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-10-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit ba31b4a4e1018f5844c6eb31734976e2184f2f9a) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: struct unix_vertex { ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8 ]
We do not need to run GC if there is no possible cyclic reference. We use unix_graph_maybe_cyclic to decide if we should run GC.
If a fd of an AF_UNIX socket is passed to an already inflight AF_UNIX socket, they could form a cyclic reference. Then, we set true to unix_graph_maybe_cyclic and later run Tarjan's algorithm to group them into SCC.
Once we run Tarjan's algorithm, we are 100% sure whether cyclic references exist or not. If there is no cycle, we set false to unix_graph_maybe_cyclic and can skip the entire garbage collection next time.
When finalising SCC, we set true to unix_graph_maybe_cyclic if SCC consists of multiple vertices.
Even if SCC is a single vertex, a cycle might exist as self-fd passing. Given the corner case is rare, we detect it by checking all edges of the vertex and set true to unix_graph_maybe_cyclic.
With this change, __unix_gc() is just a spin_lock() dance in the normal usage.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 48 +++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 47 insertions(+), 1 deletion(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index feae6c17b291..8f0dc39bb72f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -112,6 +112,19 @@ static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) return edge->successor->vertex; }
+static bool unix_graph_maybe_cyclic; + +static void unix_update_graph(struct unix_vertex *vertex) +{ + /* If the receiver socket is not inflight, no cyclic + * reference could be formed. + */ + if (!vertex) + return; + + unix_graph_maybe_cyclic = true; +} + static LIST_HEAD(unix_unvisited_vertices);
enum unix_vertex_index { @@ -138,12 +151,16 @@ static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge)
vertex->out_degree++; list_add_tail(&edge->vertex_entry, &vertex->edges); + + unix_update_graph(unix_edge_successor(edge)); }
static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex = edge->predecessor->vertex;
+ unix_update_graph(unix_edge_successor(edge)); + list_del(&edge->vertex_entry); vertex->out_degree--;
@@ -227,6 +244,7 @@ void unix_del_edges(struct scm_fp_list *fpl) void unix_update_edges(struct unix_sock *receiver) { spin_lock(&unix_gc_lock); + unix_update_graph(unix_sk(receiver->listener)->vertex); receiver->listener = NULL; spin_unlock(&unix_gc_lock); } @@ -268,6 +286,26 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); }
+static bool unix_scc_cyclic(struct list_head *scc) +{ + struct unix_vertex *vertex; + struct unix_edge *edge; + + /* SCC containing multiple vertices ? */ + if (!list_is_singular(scc)) + return true; + + vertex = list_first_entry(scc, typeof(*vertex), scc_entry); + + /* Self-reference or a embryo-listener circle ? */ + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + if (unix_edge_successor(edge) == vertex) + return true; + } + + return false; +} + static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
@@ -353,6 +391,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex) vertex->index = unix_vertex_grouped_index; }
+ if (!unix_graph_maybe_cyclic) + unix_graph_maybe_cyclic = unix_scc_cyclic(&scc); + list_del(&scc); }
@@ -363,6 +404,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
static void unix_walk_scc(void) { + unix_graph_maybe_cyclic = false; + /* Visit every vertex exactly once. * __unix_walk_scc() moves visited vertices to unix_visited_vertices. */ @@ -524,6 +567,9 @@ static void __unix_gc(struct work_struct *work)
spin_lock(&unix_gc_lock);
+ if (!unix_graph_maybe_cyclic) + goto skip_gc; + unix_walk_scc();
/* First, select candidates for garbage collection. Only @@ -633,7 +679,7 @@ static void __unix_gc(struct work_struct *work)
/* All candidates should have been detached by now. */ WARN_ON_ONCE(!list_empty(&gc_candidates)); - +skip_gc: /* Paired with READ_ONCE() in wait_for_unix_gc(). */ WRITE_ONCE(gc_in_progress, false);
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 17/27 of a series ⚠️ Found follow-up fixes in mainline
The upstream commit SHA1 provided is correct: 77e5593aebba823bcbcf2c4b58b07efcd63933b8
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Found fixes commits: 1af2dface5d2 af_unix: Don't access successor in unix_del_edges() during GC.
Note: The patch differs from the upstream commit: --- 1: 77e5593aebba8 ! 1: 15f6ca2c3df2b af_unix: Skip GC if no cycle exists. @@ Metadata ## Commit message ## af_unix: Skip GC if no cycle exists.
+ [ Upstream commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8 ] + We do not need to run GC if there is no possible cyclic reference. We use unix_graph_maybe_cyclic to decide if we should run GC.
@@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 77e5593aebba823bcbcf2c4b58b07efcd63933b8) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) ---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit ad081928a8b0f57f269df999a28087fce6f2b6ce ]
Once a cyclic reference is formed, we need to run GC to check if there is dead SCC.
However, we do not need to run Tarjan's algorithm if we know that the shape of the inflight graph has not been changed.
If an edge is added/updated/deleted and the edge's successor is inflight, we set false to unix_graph_grouped, which means we need to re-classify SCC.
Once we finalise SCC, we set true to unix_graph_grouped.
While unix_graph_grouped is true, we can iterate the grouped SCC using vertex->scc_entry in unix_walk_scc_fast().
list_add() and list_for_each_entry_reverse() uses seem weird, but they are to keep the vertex order consistent and make writing test easier.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit ad081928a8b0f57f269df999a28087fce6f2b6ce) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 28 +++++++++++++++++++++++++++- 1 file changed, 27 insertions(+), 1 deletion(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 8f0dc39bb72f..d25841ab2de4 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -113,6 +113,7 @@ static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) }
static bool unix_graph_maybe_cyclic; +static bool unix_graph_grouped;
static void unix_update_graph(struct unix_vertex *vertex) { @@ -123,6 +124,7 @@ static void unix_update_graph(struct unix_vertex *vertex) return;
unix_graph_maybe_cyclic = true; + unix_graph_grouped = false; }
static LIST_HEAD(unix_unvisited_vertices); @@ -144,6 +146,7 @@ static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) vertex->index = unix_vertex_unvisited_index; vertex->out_degree = 0; INIT_LIST_HEAD(&vertex->edges); + INIT_LIST_HEAD(&vertex->scc_entry);
list_move_tail(&vertex->entry, &unix_unvisited_vertices); edge->predecessor->vertex = vertex; @@ -418,6 +421,26 @@ static void unix_walk_scc(void)
list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); swap(unix_vertex_unvisited_index, unix_vertex_grouped_index); + + unix_graph_grouped = true; +} + +static void unix_walk_scc_fast(void) +{ + while (!list_empty(&unix_unvisited_vertices)) { + struct unix_vertex *vertex; + struct list_head scc; + + vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); + list_add(&scc, &vertex->scc_entry); + + list_for_each_entry_reverse(vertex, &scc, scc_entry) + list_move_tail(&vertex->entry, &unix_visited_vertices); + + list_del(&scc); + } + + list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); }
static LIST_HEAD(gc_candidates); @@ -570,7 +593,10 @@ static void __unix_gc(struct work_struct *work) if (!unix_graph_maybe_cyclic) goto skip_gc;
- unix_walk_scc(); + if (unix_graph_grouped) + unix_walk_scc_fast(); + else + unix_walk_scc();
/* First, select candidates for garbage collection. Only * in-flight sockets are considered, and from those only ones
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: ad081928a8b0f57f269df999a28087fce6f2b6ce
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: ad081928a8b0f ! 1: ecaebb7bb5591 af_unix: Avoid Tarjan's algorithm if unnecessary. @@ Metadata ## Commit message ## af_unix: Avoid Tarjan's algorithm if unnecessary.
+ [ Upstream commit ad081928a8b0f57f269df999a28087fce6f2b6ce ] + Once a cyclic reference is formed, we need to run GC to check if there is dead SCC.
@@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-12-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit ad081928a8b0f57f269df999a28087fce6f2b6ce) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static struct unix_vertex *unix_edge_successor(struct unix_edge *edge) ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2 ]
The definition of the lowlink in Tarjan's algorithm is the smallest index of a vertex that is reachable with at most one back-edge in SCC. This is not useful for a cross-edge.
If we start traversing from A in the following graph, the final lowlink of D is 3. The cross-edge here is one between D and C.
A -> B -> D D = (4, 3) (index, lowlink) ^ | | C = (3, 1) | V | B = (2, 1) `--- C <--' A = (1, 1)
This is because the lowlink of D is updated with the index of C.
In the following patch, we detect a dead SCC by checking two conditions for each vertex.
1) vertex has no edge directed to another SCC (no bridge) 2) vertex's out_degree is the same as the refcount of its file
If 1) is false, there is a receiver of all fds of the SCC and its ancestor SCC.
To evaluate 1), we need to assign a unique index to each SCC and assign it to all vertices in the SCC.
This patch changes the lowlink update logic for cross-edge so that in the example above, the lowlink of D is updated with the lowlink of C.
A -> B -> D D = (4, 1) (index, lowlink) ^ | | C = (3, 1) | V | B = (2, 1) `--- C <--' A = (1, 1)
Then, all vertices in the same SCC have the same lowlink, and we can quickly find the bridge connecting to different SCC if exists.
However, it is no longer called lowlink, so we rename it to scc_index. (It's sometimes called lowpoint.)
Also, we add a global variable to hold the last index used in DFS so that we do not reset the initial index in each DFS.
This patch can be squashed to the SCC detection patch but is split deliberately for anyone wondering why lowlink is not used as used in the original Tarjan's algorithm and many reference implementations.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 2 +- net/unix/garbage.c | 29 +++++++++++++++-------------- 2 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index ffbc7322e41b..14d56b07a54d 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -36,7 +36,7 @@ struct unix_vertex { struct list_head scc_entry; unsigned long out_degree; unsigned long index; - unsigned long lowlink; + unsigned long scc_index; };
struct unix_edge { diff --git a/net/unix/garbage.c b/net/unix/garbage.c index d25841ab2de4..2e66b57f3f0f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -312,9 +312,8 @@ static bool unix_scc_cyclic(struct list_head *scc) static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
-static void __unix_walk_scc(struct unix_vertex *vertex) +static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index) { - unsigned long index = UNIX_VERTEX_INDEX_START; LIST_HEAD(vertex_stack); struct unix_edge *edge; LIST_HEAD(edge_stack); @@ -326,9 +325,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex) */ list_add(&vertex->scc_entry, &vertex_stack);
- vertex->index = index; - vertex->lowlink = index; - index++; + vertex->index = *last_index; + vertex->scc_index = *last_index; + (*last_index)++;
/* Explore neighbour vertices (receivers of the current vertex's fd). */ list_for_each_entry(edge, &vertex->edges, vertex_entry) { @@ -358,30 +357,30 @@ static void __unix_walk_scc(struct unix_vertex *vertex) next_vertex = vertex; vertex = edge->predecessor->vertex;
- /* If the successor has a smaller lowlink, two vertices - * are in the same SCC, so propagate the smaller lowlink + /* If the successor has a smaller scc_index, two vertices + * are in the same SCC, so propagate the smaller scc_index * to skip SCC finalisation. */ - vertex->lowlink = min(vertex->lowlink, next_vertex->lowlink); + vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index); } else if (next_vertex->index != unix_vertex_grouped_index) { /* Loop detected by a back/cross edge. * - * The successor is on vertex_stack, so two vertices are - * in the same SCC. If the successor has a smaller index, + * The successor is on vertex_stack, so two vertices are in + * the same SCC. If the successor has a smaller *scc_index*, * propagate it to skip SCC finalisation. */ - vertex->lowlink = min(vertex->lowlink, next_vertex->index); + vertex->scc_index = min(vertex->scc_index, next_vertex->scc_index); } else { /* The successor was already grouped as another SCC */ } }
- if (vertex->index == vertex->lowlink) { + if (vertex->index == vertex->scc_index) { struct list_head scc;
/* SCC finalised. * - * If the lowlink was not updated, all the vertices above on + * If the scc_index was not updated, all the vertices above on * vertex_stack are in the same SCC. Group them using scc_entry. */ __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry); @@ -407,6 +406,8 @@ static void __unix_walk_scc(struct unix_vertex *vertex)
static void unix_walk_scc(void) { + unsigned long last_index = UNIX_VERTEX_INDEX_START; + unix_graph_maybe_cyclic = false;
/* Visit every vertex exactly once. @@ -416,7 +417,7 @@ static void unix_walk_scc(void) struct unix_vertex *vertex;
vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); - __unix_walk_scc(vertex); + __unix_walk_scc(vertex, &last_index); }
list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: bfdb01283ee8f2f3089656c3ff8f62bb072dabb2
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: bfdb01283ee8f ! 1: a0878968a63fd af_unix: Assign a unique index to SCC. @@ Metadata ## Commit message ## af_unix: Assign a unique index to SCC.
+ [ Upstream commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2 ] + The definition of the lowlink in Tarjan's algorithm is the smallest index of a vertex that is reachable with at most one back-edge in SCC. This is not useful for a cross-edge. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-13-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit bfdb01283ee8f2f3089656c3ff8f62bb072dabb2) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: struct unix_vertex { ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83 ]
When iterating SCC, we call unix_vertex_dead() for each vertex to check if the vertex is close()d and has no bridge to another SCC.
If both conditions are true for every vertex in SCC, we can execute garbage collection for all skb in the SCC.
The actual garbage collection is done in the following patch, replacing the old implementation.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 44 +++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 43 insertions(+), 1 deletion(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 2e66b57f3f0f..1f53c25fc71b 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -289,6 +289,39 @@ void unix_destroy_fpl(struct scm_fp_list *fpl) unix_free_vertices(fpl); }
+static bool unix_vertex_dead(struct unix_vertex *vertex) +{ + struct unix_edge *edge; + struct unix_sock *u; + long total_ref; + + list_for_each_entry(edge, &vertex->edges, vertex_entry) { + struct unix_vertex *next_vertex = unix_edge_successor(edge); + + /* The vertex's fd can be received by a non-inflight socket. */ + if (!next_vertex) + return false; + + /* The vertex's fd can be received by an inflight socket in + * another SCC. + */ + if (next_vertex->scc_index != vertex->scc_index) + return false; + } + + /* No receiver exists out of the same SCC. */ + + edge = list_first_entry(&vertex->edges, typeof(*edge), vertex_entry); + u = edge->predecessor; + total_ref = file_count(u->sk.sk_socket->file); + + /* If not close()d, total_ref > out_degree. */ + if (total_ref != vertex->out_degree) + return false; + + return true; +} + static bool unix_scc_cyclic(struct list_head *scc) { struct unix_vertex *vertex; @@ -377,6 +410,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde
if (vertex->index == vertex->scc_index) { struct list_head scc; + bool scc_dead = true;
/* SCC finalised. * @@ -391,6 +425,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde
/* Mark vertex as off-stack. */ vertex->index = unix_vertex_grouped_index; + + if (scc_dead) + scc_dead = unix_vertex_dead(vertex); }
if (!unix_graph_maybe_cyclic) @@ -431,13 +468,18 @@ static void unix_walk_scc_fast(void) while (!list_empty(&unix_unvisited_vertices)) { struct unix_vertex *vertex; struct list_head scc; + bool scc_dead = true;
vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); list_add(&scc, &vertex->scc_entry);
- list_for_each_entry_reverse(vertex, &scc, scc_entry) + list_for_each_entry_reverse(vertex, &scc, scc_entry) { list_move_tail(&vertex->entry, &unix_visited_vertices);
+ if (scc_dead) + scc_dead = unix_vertex_dead(vertex); + } + list_del(&scc); }
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: a15702d8b3aad8ce5268c565bd29f0e02fd2db83
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: a15702d8b3aad ! 1: cf7e2580a8d3f af_unix: Detect dead SCC. @@ Metadata ## Commit message ## af_unix: Detect dead SCC.
+ [ Upstream commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83 ] + When iterating SCC, we call unix_vertex_dead() for each vertex to check if the vertex is close()d and has no bridge to another SCC. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-14-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit a15702d8b3aad8ce5268c565bd29f0e02fd2db83) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: void unix_destroy_fpl(struct scm_fp_list *fpl) ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 4090fa373f0e763c43610853d2774b5979915959 ]
If we find a dead SCC during iteration, we call unix_collect_skb() to splice all skb in the SCC to the global sk_buff_head, hitlist.
After iterating all SCC, we unlock unix_gc_lock and purge the queue.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-15-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 4090fa373f0e763c43610853d2774b5979915959) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 8 -- net/unix/af_unix.c | 12 -- net/unix/garbage.c | 318 +++++++++--------------------------------- 3 files changed, 64 insertions(+), 274 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 14d56b07a54d..47808e366731 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -19,9 +19,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp)
extern spinlock_t unix_gc_lock; extern unsigned int unix_tot_inflight; - -void unix_inflight(struct user_struct *user, struct file *fp); -void unix_notinflight(struct user_struct *user, struct file *fp); void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); void unix_update_edges(struct unix_sock *receiver); @@ -85,12 +82,7 @@ struct unix_sock { struct sock *peer; struct sock *listener; struct unix_vertex *vertex; - struct list_head link; - unsigned long inflight; spinlock_t lock; - unsigned long gc_flags; -#define UNIX_GC_CANDIDATE 0 -#define UNIX_GC_MAYBE_CYCLE 1 struct socket_wq peer_wq; wait_queue_entry_t peer_wake; struct scm_stat scm_stat; diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 4d8b2b2b9a70..25f66adf47d1 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -955,12 +955,10 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, sk->sk_destruct = unix_sock_destructor; u = unix_sk(sk); u->listener = NULL; - u->inflight = 0; u->vertex = NULL; u->path.dentry = NULL; u->path.mnt = NULL; spin_lock_init(&u->lock); - INIT_LIST_HEAD(&u->link); mutex_init(&u->iolock); /* single task reading lock */ mutex_init(&u->bindlock); /* single task binding lock */ init_waitqueue_head(&u->peer_wait); @@ -1744,8 +1742,6 @@ static inline bool too_many_unix_fds(struct task_struct *p)
static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) { - int i; - if (too_many_unix_fds(current)) return -ETOOMANYREFS;
@@ -1757,9 +1753,6 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb) if (!UNIXCB(skb).fp) return -ENOMEM;
- for (i = scm->fp->count - 1; i >= 0; i--) - unix_inflight(scm->fp->user, scm->fp->fp[i]); - if (unix_prepare_fpl(UNIXCB(skb).fp)) return -ENOMEM;
@@ -1768,15 +1761,10 @@ static int unix_attach_fds(struct scm_cookie *scm, struct sk_buff *skb)
static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) { - int i; - scm->fp = UNIXCB(skb).fp; UNIXCB(skb).fp = NULL;
unix_destroy_fpl(scm->fp); - - for (i = scm->fp->count - 1; i >= 0; i--) - unix_notinflight(scm->fp->user, scm->fp->fp[i]); }
static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 1f53c25fc71b..89ea71d9297b 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -322,6 +322,52 @@ static bool unix_vertex_dead(struct unix_vertex *vertex) return true; }
+enum unix_recv_queue_lock_class { + U_RECVQ_LOCK_NORMAL, + U_RECVQ_LOCK_EMBRYO, +}; + +static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist) +{ + struct unix_vertex *vertex; + + list_for_each_entry_reverse(vertex, scc, scc_entry) { + struct sk_buff_head *queue; + struct unix_edge *edge; + struct unix_sock *u; + + edge = list_first_entry(&vertex->edges, typeof(*edge), vertex_entry); + u = edge->predecessor; + queue = &u->sk.sk_receive_queue; + + spin_lock(&queue->lock); + + if (u->sk.sk_state == TCP_LISTEN) { + struct sk_buff *skb; + + skb_queue_walk(queue, skb) { + struct sk_buff_head *embryo_queue = &skb->sk->sk_receive_queue; + + /* listener -> embryo order, the inversion never happens. */ + spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); + skb_queue_splice_init(embryo_queue, hitlist); + spin_unlock(&embryo_queue->lock); + } + } else { + skb_queue_splice_init(queue, hitlist); + +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + kfree_skb(u->oob_skb); + u->oob_skb = NULL; + } +#endif + } + + spin_unlock(&queue->lock); + } +} + static bool unix_scc_cyclic(struct list_head *scc) { struct unix_vertex *vertex; @@ -345,7 +391,8 @@ static bool unix_scc_cyclic(struct list_head *scc) static LIST_HEAD(unix_visited_vertices); static unsigned long unix_vertex_grouped_index = UNIX_VERTEX_INDEX_MARK2;
-static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index) +static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_index, + struct sk_buff_head *hitlist) { LIST_HEAD(vertex_stack); struct unix_edge *edge; @@ -430,7 +477,9 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde scc_dead = unix_vertex_dead(vertex); }
- if (!unix_graph_maybe_cyclic) + if (scc_dead) + unix_collect_skb(&scc, hitlist); + else if (!unix_graph_maybe_cyclic) unix_graph_maybe_cyclic = unix_scc_cyclic(&scc);
list_del(&scc); @@ -441,7 +490,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde goto prev_vertex; }
-static void unix_walk_scc(void) +static void unix_walk_scc(struct sk_buff_head *hitlist) { unsigned long last_index = UNIX_VERTEX_INDEX_START;
@@ -454,7 +503,7 @@ static void unix_walk_scc(void) struct unix_vertex *vertex;
vertex = list_first_entry(&unix_unvisited_vertices, typeof(*vertex), entry); - __unix_walk_scc(vertex, &last_index); + __unix_walk_scc(vertex, &last_index, hitlist); }
list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); @@ -463,7 +512,7 @@ static void unix_walk_scc(void) unix_graph_grouped = true; }
-static void unix_walk_scc_fast(void) +static void unix_walk_scc_fast(struct sk_buff_head *hitlist) { while (!list_empty(&unix_unvisited_vertices)) { struct unix_vertex *vertex; @@ -480,279 +529,40 @@ static void unix_walk_scc_fast(void) scc_dead = unix_vertex_dead(vertex); }
+ if (scc_dead) + unix_collect_skb(&scc, hitlist); + list_del(&scc); }
list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); }
-static LIST_HEAD(gc_candidates); -static LIST_HEAD(gc_inflight_list); - -/* Keep the number of times in flight count for the file - * descriptor if it is for an AF_UNIX socket. - */ -void unix_inflight(struct user_struct *user, struct file *filp) -{ - struct unix_sock *u = unix_get_socket(filp); - - spin_lock(&unix_gc_lock); - - if (u) { - if (!u->inflight) { - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &gc_inflight_list); - } else { - WARN_ON_ONCE(list_empty(&u->link)); - } - u->inflight++; - } - - spin_unlock(&unix_gc_lock); -} - -void unix_notinflight(struct user_struct *user, struct file *filp) -{ - struct unix_sock *u = unix_get_socket(filp); - - spin_lock(&unix_gc_lock); - - if (u) { - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(list_empty(&u->link)); - - u->inflight--; - if (!u->inflight) - list_del_init(&u->link); - } - - spin_unlock(&unix_gc_lock); -} - -static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *), - struct sk_buff_head *hitlist) -{ - struct sk_buff *skb; - struct sk_buff *next; - - spin_lock(&x->sk_receive_queue.lock); - skb_queue_walk_safe(&x->sk_receive_queue, skb, next) { - /* Do we have file descriptors ? */ - if (UNIXCB(skb).fp) { - bool hit = false; - /* Process the descriptors of this socket */ - int nfd = UNIXCB(skb).fp->count; - struct file **fp = UNIXCB(skb).fp->fp; - - while (nfd--) { - /* Get the socket the fd matches if it indeed does so */ - struct unix_sock *u = unix_get_socket(*fp++); - - /* Ignore non-candidates, they could have been added - * to the queues after starting the garbage collection - */ - if (u && test_bit(UNIX_GC_CANDIDATE, &u->gc_flags)) { - hit = true; - - func(u); - } - } - if (hit && hitlist != NULL) { - __skb_unlink(skb, &x->sk_receive_queue); - __skb_queue_tail(hitlist, skb); - } - } - } - spin_unlock(&x->sk_receive_queue.lock); -} - -static void scan_children(struct sock *x, void (*func)(struct unix_sock *), - struct sk_buff_head *hitlist) -{ - if (x->sk_state != TCP_LISTEN) { - scan_inflight(x, func, hitlist); - } else { - struct sk_buff *skb; - struct sk_buff *next; - struct unix_sock *u; - LIST_HEAD(embryos); - - /* For a listening socket collect the queued embryos - * and perform a scan on them as well. - */ - spin_lock(&x->sk_receive_queue.lock); - skb_queue_walk_safe(&x->sk_receive_queue, skb, next) { - u = unix_sk(skb->sk); - - /* An embryo cannot be in-flight, so it's safe - * to use the list link. - */ - WARN_ON_ONCE(!list_empty(&u->link)); - list_add_tail(&u->link, &embryos); - } - spin_unlock(&x->sk_receive_queue.lock); - - while (!list_empty(&embryos)) { - u = list_entry(embryos.next, struct unix_sock, link); - scan_inflight(&u->sk, func, hitlist); - list_del_init(&u->link); - } - } -} - -static void dec_inflight(struct unix_sock *usk) -{ - usk->inflight--; -} - -static void inc_inflight(struct unix_sock *usk) -{ - usk->inflight++; -} - -static void inc_inflight_move_tail(struct unix_sock *u) -{ - u->inflight++; - - /* If this still might be part of a cycle, move it to the end - * of the list, so that it's checked even if it was already - * passed over - */ - if (test_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags)) - list_move_tail(&u->link, &gc_candidates); -} - static bool gc_in_progress;
static void __unix_gc(struct work_struct *work) { struct sk_buff_head hitlist; - struct unix_sock *u, *next; - LIST_HEAD(not_cycle_list); - struct list_head cursor;
spin_lock(&unix_gc_lock);
- if (!unix_graph_maybe_cyclic) + if (!unix_graph_maybe_cyclic) { + spin_unlock(&unix_gc_lock); goto skip_gc; - - if (unix_graph_grouped) - unix_walk_scc_fast(); - else - unix_walk_scc(); - - /* First, select candidates for garbage collection. Only - * in-flight sockets are considered, and from those only ones - * which don't have any external reference. - * - * Holding unix_gc_lock will protect these candidates from - * being detached, and hence from gaining an external - * reference. Since there are no possible receivers, all - * buffers currently on the candidates' queues stay there - * during the garbage collection. - * - * We also know that no new candidate can be added onto the - * receive queues. Other, non candidate sockets _can_ be - * added to queue, so we must make sure only to touch - * candidates. - * - * Embryos, though never candidates themselves, affect which - * candidates are reachable by the garbage collector. Before - * being added to a listener's queue, an embryo may already - * receive data carrying SCM_RIGHTS, potentially making the - * passed socket a candidate that is not yet reachable by the - * collector. It becomes reachable once the embryo is - * enqueued. Therefore, we must ensure that no SCM-laden - * embryo appears in a (candidate) listener's queue between - * consecutive scan_children() calls. - */ - list_for_each_entry_safe(u, next, &gc_inflight_list, link) { - struct sock *sk = &u->sk; - long total_refs; - - total_refs = file_count(sk->sk_socket->file); - - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(total_refs < u->inflight); - if (total_refs == u->inflight) { - list_move_tail(&u->link, &gc_candidates); - __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); - __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); - - if (sk->sk_state == TCP_LISTEN) { - unix_state_lock_nested(sk, U_LOCK_GC_LISTENER); - unix_state_unlock(sk); - } - } - } - - /* Now remove all internal in-flight reference to children of - * the candidates. - */ - list_for_each_entry(u, &gc_candidates, link) - scan_children(&u->sk, dec_inflight, NULL); - - /* Restore the references for children of all candidates, - * which have remaining references. Do this recursively, so - * only those remain, which form cyclic references. - * - * Use a "cursor" link, to make the list traversal safe, even - * though elements might be moved about. - */ - list_add(&cursor, &gc_candidates); - while (cursor.next != &gc_candidates) { - u = list_entry(cursor.next, struct unix_sock, link); - - /* Move cursor to after the current position. */ - list_move(&cursor, &u->link); - - if (u->inflight) { - list_move_tail(&u->link, ¬_cycle_list); - __clear_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); - scan_children(&u->sk, inc_inflight_move_tail, NULL); - } } - list_del(&cursor);
- /* Now gc_candidates contains only garbage. Restore original - * inflight counters for these as well, and remove the skbuffs - * which are creating the cycle(s). - */ - skb_queue_head_init(&hitlist); - list_for_each_entry(u, &gc_candidates, link) { - scan_children(&u->sk, inc_inflight, &hitlist); + __skb_queue_head_init(&hitlist);
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif - } - - /* not_cycle_list contains those sockets which do not make up a - * cycle. Restore these to the inflight list. - */ - while (!list_empty(¬_cycle_list)) { - u = list_entry(not_cycle_list.next, struct unix_sock, link); - __clear_bit(UNIX_GC_CANDIDATE, &u->gc_flags); - list_move_tail(&u->link, &gc_inflight_list); - } + if (unix_graph_grouped) + unix_walk_scc_fast(&hitlist); + else + unix_walk_scc(&hitlist);
spin_unlock(&unix_gc_lock);
- /* Here we are. Hitlist is filled. Die. */ __skb_queue_purge(&hitlist); - - spin_lock(&unix_gc_lock); - - /* All candidates should have been detached by now. */ - WARN_ON_ONCE(!list_empty(&gc_candidates)); skip_gc: - /* Paired with READ_ONCE() in wait_for_unix_gc(). */ WRITE_ONCE(gc_in_progress, false); - - spin_unlock(&unix_gc_lock); }
static DECLARE_WORK(unix_gc_work, __unix_gc);
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 21/27 of a series ⚠️ Found follow-up fixes in mainline
The upstream commit SHA1 provided is correct: 4090fa373f0e763c43610853d2774b5979915959
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Found fixes commits: 041933a1ec7b af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
Note: The patch differs from the upstream commit: --- 1: 4090fa373f0e7 ! 1: 5bd268b2b0ecc af_unix: Replace garbage collection algorithm. @@ Metadata ## Commit message ## af_unix: Replace garbage collection algorithm.
+ [ Upstream commit 4090fa373f0e763c43610853d2774b5979915959 ] + If we find a dead SCC during iteration, we call unix_collect_skb() to splice all skb in the SCC to the global sk_buff_head, hitlist.
@@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240325202425.60930-15-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 4090fa373f0e763c43610853d2774b5979915959) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: static inline struct unix_sock *unix_get_socket(struct file *filp) @@ net/unix/garbage.c: static void unix_walk_scc_fast(void) - * receive queues. Other, non candidate sockets _can_ be - * added to queue, so we must make sure only to touch - * candidates. +- * +- * Embryos, though never candidates themselves, affect which +- * candidates are reachable by the garbage collector. Before +- * being added to a listener's queue, an embryo may already +- * receive data carrying SCM_RIGHTS, potentially making the +- * passed socket a candidate that is not yet reachable by the +- * collector. It becomes reachable once the embryo is +- * enqueued. Therefore, we must ensure that no SCM-laden +- * embryo appears in a (candidate) listener's queue between +- * consecutive scan_children() calls. - */ - list_for_each_entry_safe(u, next, &gc_inflight_list, link) { +- struct sock *sk = &u->sk; - long total_refs; - -- total_refs = file_count(u->sk.sk_socket->file); +- total_refs = file_count(sk->sk_socket->file); - - WARN_ON_ONCE(!u->inflight); - WARN_ON_ONCE(total_refs < u->inflight); @@ net/unix/garbage.c: static void unix_walk_scc_fast(void) - list_move_tail(&u->link, &gc_candidates); - __set_bit(UNIX_GC_CANDIDATE, &u->gc_flags); - __set_bit(UNIX_GC_MAYBE_CYCLE, &u->gc_flags); +- +- if (sk->sk_state == TCP_LISTEN) { +- unix_state_lock_nested(sk, U_LOCK_GC_LISTENER); +- unix_state_unlock(sk); +- } - } - } - ---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e ]
In the previous GC implementation, the shape of the inflight socket graph was not expected to change while GC was in progress.
MSG_PEEK was tricky because it could install inflight fd silently and transform the graph.
Let's say we peeked a fd, which was a listening socket, and accept()ed some embryo sockets from it. The garbage collection algorithm would have been confused because the set of sockets visited in scan_inflight() would change within the same GC invocation.
That's why we placed spin_lock(&unix_gc_lock) and spin_unlock() in unix_peek_fds() with a fat comment.
In the new GC implementation, we no longer garbage-collect the socket if it exists in another queue, that is, if it has a bridge to another SCC. Also, accept() will require the lock if it has edges.
Thus, we need not do the complicated lock dance.
Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 1 - net/unix/af_unix.c | 42 ------------------------------------------ net/unix/garbage.c | 2 +- 3 files changed, 1 insertion(+), 44 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 47808e366731..4c726df56c0b 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -17,7 +17,6 @@ static inline struct unix_sock *unix_get_socket(struct file *filp) } #endif
-extern spinlock_t unix_gc_lock; extern unsigned int unix_tot_inflight; void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver); void unix_del_edges(struct scm_fp_list *fpl); diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 25f66adf47d1..ce5b74dfd8ae 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1770,48 +1770,6 @@ static void unix_detach_fds(struct scm_cookie *scm, struct sk_buff *skb) static void unix_peek_fds(struct scm_cookie *scm, struct sk_buff *skb) { scm->fp = scm_fp_dup(UNIXCB(skb).fp); - - /* - * Garbage collection of unix sockets starts by selecting a set of - * candidate sockets which have reference only from being in flight - * (total_refs == inflight_refs). This condition is checked once during - * the candidate collection phase, and candidates are marked as such, so - * that non-candidates can later be ignored. While inflight_refs is - * protected by unix_gc_lock, total_refs (file count) is not, hence this - * is an instantaneous decision. - * - * Once a candidate, however, the socket must not be reinstalled into a - * file descriptor while the garbage collection is in progress. - * - * If the above conditions are met, then the directed graph of - * candidates (*) does not change while unix_gc_lock is held. - * - * Any operations that changes the file count through file descriptors - * (dup, close, sendmsg) does not change the graph since candidates are - * not installed in fds. - * - * Dequeing a candidate via recvmsg would install it into an fd, but - * that takes unix_gc_lock to decrement the inflight count, so it's - * serialized with garbage collection. - * - * MSG_PEEK is special in that it does not change the inflight count, - * yet does install the socket into an fd. The following lock/unlock - * pair is to ensure serialization with garbage collection. It must be - * done between incrementing the file count and installing the file into - * an fd. - * - * If garbage collection starts after the barrier provided by the - * lock/unlock, then it will see the elevated refcount and not mark this - * as a candidate. If a garbage collection is already in progress - * before the file count was incremented, then the lock/unlock pair will - * ensure that garbage collection is finished before progressing to - * installing the fd. - * - * (*) A -> B where B is on the queue of A or B is on the queue of C - * which is on the queue of listening socket A. - */ - spin_lock(&unix_gc_lock); - spin_unlock(&unix_gc_lock); }
static void unix_destruct_scm(struct sk_buff *skb) diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 89ea71d9297b..12a4ec27e0d4 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -183,7 +183,7 @@ static void unix_free_vertices(struct scm_fp_list *fpl) } }
-DEFINE_SPINLOCK(unix_gc_lock); +static DEFINE_SPINLOCK(unix_gc_lock); unsigned int unix_tot_inflight;
void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 118f457da9ed58a79e24b73c2ef0aa1987241f0e
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 118f457da9ed5 ! 1: 6abd6af9bc94d af_unix: Remove lock dance in unix_peek_fds(). @@ Metadata ## Commit message ## af_unix: Remove lock dance in unix_peek_fds().
+ [ Upstream commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e ] + In the previous GC implementation, the shape of the inflight socket graph was not expected to change while GC was in progress.
@@ Commit message Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240401173125.92184-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 118f457da9ed58a79e24b73c2ef0aa1987241f0e) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: static inline struct unix_sock *unix_get_socket(struct file *filp) ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit fd86344823b521149bb31d91eba900ba3525efa6 ]
Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo socket.") added spin_lock(&unix_gc_lock) in accept() path, and it caused regression in a stress test as reported by kernel test robot.
If the embryo socket is not part of the inflight graph, we need not hold the lock.
To decide that in O(1) time and avoid the regression in the normal use case,
1. add a new stat unix_sk(sk)->scm_stat.nr_unix_fds
2. count the number of inflight AF_UNIX sockets in the receive queue under unix_state_lock()
3. move unix_update_edges() call under unix_state_lock()
4. avoid locking if nr_unix_fds is 0 in unix_update_edges()
Reported-by: kernel test robot oliver.sang@intel.com Closes: https://lore.kernel.org/oe-lkp/202404101427.92a08551-oliver.sang@intel.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240413021928.20946-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com (cherry picked from commit fd86344823b521149bb31d91eba900ba3525efa6) Signed-off-by: Lee Jones lee@kernel.org --- include/net/af_unix.h | 1 + net/unix/af_unix.c | 2 +- net/unix/garbage.c | 20 ++++++++++++++++---- 3 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/include/net/af_unix.h b/include/net/af_unix.h index 4c726df56c0b..b1f82d74339e 100644 --- a/include/net/af_unix.h +++ b/include/net/af_unix.h @@ -67,6 +67,7 @@ struct unix_skb_parms {
struct scm_stat { atomic_t nr_fds; + unsigned long nr_unix_fds; };
#define UNIXCB(skb) (*(struct unix_skb_parms *)&((skb)->cb)) diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index ce5b74dfd8ae..79b783a70c87 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1677,12 +1677,12 @@ static int unix_accept(struct socket *sock, struct socket *newsock, int flags, }
tsk = skb->sk; - unix_update_edges(unix_sk(tsk)); skb_free_datagram(sk, skb); wake_up_interruptible(&unix_sk(sk)->peer_wait);
/* attach accepted sock to socket */ unix_state_lock(tsk); + unix_update_edges(unix_sk(tsk)); newsock->state = SS_CONNECTED; unix_sock_inherit_flags(sock, newsock); sock_graft(tsk, newsock); diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 12a4ec27e0d4..95240a59808f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -209,6 +209,7 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver) unix_add_edge(fpl, edge); } while (i < fpl->count_unix);
+ receiver->scm_stat.nr_unix_fds += fpl->count_unix; WRITE_ONCE(unix_tot_inflight, unix_tot_inflight + fpl->count_unix); out: WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight + fpl->count); @@ -222,6 +223,7 @@ void unix_add_edges(struct scm_fp_list *fpl, struct unix_sock *receiver)
void unix_del_edges(struct scm_fp_list *fpl) { + struct unix_sock *receiver; int i = 0;
spin_lock(&unix_gc_lock); @@ -235,6 +237,8 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix);
+ receiver = fpl->edges[0].successor; + receiver->scm_stat.nr_unix_fds -= fpl->count_unix; WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count); @@ -246,10 +250,18 @@ void unix_del_edges(struct scm_fp_list *fpl)
void unix_update_edges(struct unix_sock *receiver) { - spin_lock(&unix_gc_lock); - unix_update_graph(unix_sk(receiver->listener)->vertex); - receiver->listener = NULL; - spin_unlock(&unix_gc_lock); + /* nr_unix_fds is only updated under unix_state_lock(). + * If it's 0 here, the embryo socket is not part of the + * inflight graph, and GC will not see it, so no lock needed. + */ + if (!receiver->scm_stat.nr_unix_fds) { + receiver->listener = NULL; + } else { + spin_lock(&unix_gc_lock); + unix_update_graph(unix_sk(receiver->listener)->vertex); + receiver->listener = NULL; + spin_unlock(&unix_gc_lock); + } }
int unix_prepare_fpl(struct scm_fp_list *fpl)
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 23/27 of a series ⚠️ Found follow-up fixes in mainline
The upstream commit SHA1 provided is correct: fd86344823b521149bb31d91eba900ba3525efa6
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Found fixes commits: 1af2dface5d2 af_unix: Don't access successor in unix_del_edges() during GC.
Note: The patch differs from the upstream commit: --- 1: fd86344823b52 ! 1: 24f48aefd8acb af_unix: Try not to hold unix_gc_lock during accept(). @@ Metadata ## Commit message ## af_unix: Try not to hold unix_gc_lock during accept().
+ [ Upstream commit fd86344823b521149bb31d91eba900ba3525efa6 ] + Commit dcf70df2048d ("af_unix: Fix up unix_edge.successor for embryo socket.") added spin_lock(&unix_gc_lock) in accept() path, and it caused regression in a stress test as reported by kernel test robot. @@ Commit message Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240413021928.20946-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com + (cherry picked from commit fd86344823b521149bb31d91eba900ba3525efa6) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/af_unix.h ## @@ include/net/af_unix.h: struct unix_skb_parms { ---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998 ]
syzbot reported use-after-free in unix_del_edges(). [0]
What the repro does is basically repeat the following quickly.
1. pass a fd of an AF_UNIX socket to itself
socketpair(AF_UNIX, SOCK_DGRAM, 0, [3, 4]) = 0 sendmsg(3, {..., msg_control=[{cmsg_len=20, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[4]}], ...}, 0) = 0
2. pass other fds of AF_UNIX sockets to the socket above
socketpair(AF_UNIX, SOCK_SEQPACKET, 0, [5, 6]) = 0 sendmsg(3, {..., msg_control=[{cmsg_len=48, cmsg_level=SOL_SOCKET, cmsg_type=SCM_RIGHTS, cmsg_data=[5, 6]}], ...}, 0) = 0
3. close all sockets
Here, two skb are created, and every unix_edge->successor is the first socket. Then, __unix_gc() will garbage-collect the two skb:
(a) free skb with self-referencing fd (b) free skb holding other sockets
After (a), the self-referencing socket will be scheduled to be freed later by the delayed_fput() task.
syzbot repeated the sequences above (1. ~ 3.) quickly and triggered the task concurrently while GC was running.
So, at (b), the socket was already freed, and accessing it was illegal.
unix_del_edges() accesses the receiver socket as edge->successor to optimise GC. However, we should not do it during GC.
Garbage-collecting sockets does not change the shape of the rest of the graph, so we need not call unix_update_graph() to update unix_graph_grouped when we purge skb.
However, if we clean up all loops in the unix_walk_scc_fast() path, unix_graph_maybe_cyclic remains unchanged (true), and __unix_gc() will call unix_walk_scc_fast() continuously even though there is no socket to garbage-collect.
To keep that optimisation while fixing UAF, let's add the same updating logic of unix_graph_maybe_cyclic in unix_walk_scc_fast() as done in unix_walk_scc() and __unix_walk_scc().
Note that when unix_del_edges() is called from other places, the receiver socket is always alive:
- sendmsg: the successor's sk_refcnt is bumped by sock_hold() unix_find_other() for SOCK_DGRAM, connect() for SOCK_STREAM
- recvmsg: the successor is the receiver, and its fd is alive
[0]: BUG: KASAN: slab-use-after-free in unix_edge_successor net/unix/garbage.c:109 [inline] BUG: KASAN: slab-use-after-free in unix_del_edge net/unix/garbage.c:165 [inline] BUG: KASAN: slab-use-after-free in unix_del_edges+0x148/0x630 net/unix/garbage.c:237 Read of size 8 at addr ffff888079c6e640 by task kworker/u8:6/1099
CPU: 0 PID: 1099 Comm: kworker/u8:6 Not tainted 6.9.0-rc4-next-20240418-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Workqueue: events_unbound __unix_gc Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:114 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 unix_edge_successor net/unix/garbage.c:109 [inline] unix_del_edge net/unix/garbage.c:165 [inline] unix_del_edges+0x148/0x630 net/unix/garbage.c:237 unix_destroy_fpl+0x59/0x210 net/unix/garbage.c:298 unix_detach_fds net/unix/af_unix.c:1811 [inline] unix_destruct_scm+0x13e/0x210 net/unix/af_unix.c:1826 skb_release_head_state+0x100/0x250 net/core/skbuff.c:1127 skb_release_all net/core/skbuff.c:1138 [inline] __kfree_skb net/core/skbuff.c:1154 [inline] kfree_skb_reason+0x16d/0x3b0 net/core/skbuff.c:1190 __skb_queue_purge_reason include/linux/skbuff.h:3251 [inline] __skb_queue_purge include/linux/skbuff.h:3256 [inline] __unix_gc+0x1732/0x1830 net/unix/garbage.c:575 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK>
Allocated by task 14427: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3897 [inline] slab_alloc_node mm/slub.c:3957 [inline] kmem_cache_alloc_noprof+0x135/0x290 mm/slub.c:3964 sk_prot_alloc+0x58/0x210 net/core/sock.c:2074 sk_alloc+0x38/0x370 net/core/sock.c:2133 unix_create1+0xb4/0x770 unix_create+0x14e/0x200 net/unix/af_unix.c:1034 __sock_create+0x490/0x920 net/socket.c:1571 sock_create net/socket.c:1622 [inline] __sys_socketpair+0x33e/0x720 net/socket.c:1773 __do_sys_socketpair net/socket.c:1822 [inline] __se_sys_socketpair net/socket.c:1819 [inline] __x64_sys_socketpair+0x9b/0xb0 net/socket.c:1819 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 1805: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2190 [inline] slab_free mm/slub.c:4393 [inline] kmem_cache_free+0x145/0x340 mm/slub.c:4468 sk_prot_free net/core/sock.c:2114 [inline] __sk_destruct+0x467/0x5f0 net/core/sock.c:2208 sock_put include/net/sock.h:1948 [inline] unix_release_sock+0xa8b/0xd20 net/unix/af_unix.c:665 unix_release+0x91/0xc0 net/unix/af_unix.c:1049 __sock_release net/socket.c:659 [inline] sock_close+0xbc/0x240 net/socket.c:1421 __fput+0x406/0x8b0 fs/file_table.c:422 delayed_fput+0x59/0x80 fs/file_table.c:445 process_one_work kernel/workqueue.c:3218 [inline] process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3299 worker_thread+0x86d/0xd70 kernel/workqueue.c:3380 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
The buggy address belongs to the object at ffff888079c6e000 which belongs to the cache UNIX of size 1920 The buggy address is located 1600 bytes inside of freed 1920-byte region [ffff888079c6e000, ffff888079c6e780)
Reported-by: syzbot+f3f3eef1d2100200e593@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f3f3eef1d2100200e593 Fixes: 77e5593aebba ("af_unix: Skip GC if no cycle exists.") Fixes: fd86344823b5 ("af_unix: Try not to hold unix_gc_lock during accept().") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com (cherry picked from commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 95240a59808f..d76450133e4f 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -158,11 +158,14 @@ static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) unix_update_graph(unix_edge_successor(edge)); }
+static bool gc_in_progress; + static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex = edge->predecessor->vertex;
- unix_update_graph(unix_edge_successor(edge)); + if (!gc_in_progress) + unix_update_graph(unix_edge_successor(edge));
list_del(&edge->vertex_entry); vertex->out_degree--; @@ -237,8 +240,10 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix);
- receiver = fpl->edges[0].successor; - receiver->scm_stat.nr_unix_fds -= fpl->count_unix; + if (!gc_in_progress) { + receiver = fpl->edges[0].successor; + receiver->scm_stat.nr_unix_fds -= fpl->count_unix; + } WRITE_ONCE(unix_tot_inflight, unix_tot_inflight - fpl->count_unix); out: WRITE_ONCE(fpl->user->unix_inflight, fpl->user->unix_inflight - fpl->count); @@ -526,6 +531,8 @@ static void unix_walk_scc(struct sk_buff_head *hitlist)
static void unix_walk_scc_fast(struct sk_buff_head *hitlist) { + unix_graph_maybe_cyclic = false; + while (!list_empty(&unix_unvisited_vertices)) { struct unix_vertex *vertex; struct list_head scc; @@ -543,6 +550,8 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist)
if (scc_dead) unix_collect_skb(&scc, hitlist); + else if (!unix_graph_maybe_cyclic) + unix_graph_maybe_cyclic = unix_scc_cyclic(&scc);
list_del(&scc); } @@ -550,8 +559,6 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist) list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); }
-static bool gc_in_progress; - static void __unix_gc(struct work_struct *work) { struct sk_buff_head hitlist;
[ Sasha's backport helper bot ]
Hi,
Summary of potential issues: ℹ️ This is part 24/27 of a series ⚠️ Found follow-up fixes in mainline
The upstream commit SHA1 provided is correct: 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Found fixes commits: 7172dc93d621 af_unix: Add dead flag to struct scm_fp_list.
Note: The patch differs from the upstream commit: --- 1: 1af2dface5d28 ! 1: b9290c1b47598 af_unix: Don't access successor in unix_del_edges() during GC. @@ Metadata ## Commit message ## af_unix: Don't access successor in unix_del_edges() during GC.
+ [ Upstream commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998 ] + syzbot reported use-after-free in unix_del_edges(). [0]
What the repro does is basically repeat the following quickly. @@ Commit message Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240419235102.31707-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com + (cherry picked from commit 1af2dface5d286dd1f2f3405a0d6fa9f2c8fb998) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) ---
NOTE: These results are for this patch alone. Full series testing will be performed when all parts are received.
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 7172dc93d621d5dc302d007e95ddd1311ec64283 ]
Commit 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.") fixed use-after-free by avoid accessing edge->successor while GC is in progress.
However, there could be a small race window where another process could call unix_del_edges() while gc_in_progress is true and __skb_queue_purge() is on the way.
So, we need another marker for struct scm_fp_list which indicates if the skb is garbage-collected.
This patch adds dead flag in struct scm_fp_list and set it true before calling __skb_queue_purge().
Fixes: 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 7172dc93d621d5dc302d007e95ddd1311ec64283) Signed-off-by: Lee Jones lee@kernel.org --- include/net/scm.h | 1 + net/core/scm.c | 1 + net/unix/garbage.c | 14 ++++++++++---- 3 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/include/net/scm.h b/include/net/scm.h index 19789096424d..0be0dc3eb1dc 100644 --- a/include/net/scm.h +++ b/include/net/scm.h @@ -31,6 +31,7 @@ struct scm_fp_list { short max; #ifdef CONFIG_UNIX bool inflight; + bool dead; struct list_head vertices; struct unix_edge *edges; #endif diff --git a/net/core/scm.c b/net/core/scm.c index 1ff78bd4ee83..cdd4e5befb14 100644 --- a/net/core/scm.c +++ b/net/core/scm.c @@ -91,6 +91,7 @@ static int scm_fp_copy(struct cmsghdr *cmsg, struct scm_fp_list **fplp) fpl->user = NULL; #if IS_ENABLED(CONFIG_UNIX) fpl->inflight = false; + fpl->dead = false; fpl->edges = NULL; INIT_LIST_HEAD(&fpl->vertices); #endif diff --git a/net/unix/garbage.c b/net/unix/garbage.c index d76450133e4f..1f8b8cdfcdc8 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -158,13 +158,11 @@ static void unix_add_edge(struct scm_fp_list *fpl, struct unix_edge *edge) unix_update_graph(unix_edge_successor(edge)); }
-static bool gc_in_progress; - static void unix_del_edge(struct scm_fp_list *fpl, struct unix_edge *edge) { struct unix_vertex *vertex = edge->predecessor->vertex;
- if (!gc_in_progress) + if (!fpl->dead) unix_update_graph(unix_edge_successor(edge));
list_del(&edge->vertex_entry); @@ -240,7 +238,7 @@ void unix_del_edges(struct scm_fp_list *fpl) unix_del_edge(fpl, edge); } while (i < fpl->count_unix);
- if (!gc_in_progress) { + if (!fpl->dead) { receiver = fpl->edges[0].successor; receiver->scm_stat.nr_unix_fds -= fpl->count_unix; } @@ -559,9 +557,12 @@ static void unix_walk_scc_fast(struct sk_buff_head *hitlist) list_replace_init(&unix_visited_vertices, &unix_unvisited_vertices); }
+static bool gc_in_progress; + static void __unix_gc(struct work_struct *work) { struct sk_buff_head hitlist; + struct sk_buff *skb;
spin_lock(&unix_gc_lock);
@@ -579,6 +580,11 @@ static void __unix_gc(struct work_struct *work)
spin_unlock(&unix_gc_lock);
+ skb_queue_walk(&hitlist, skb) { + if (UNIXCB(skb).fp) + UNIXCB(skb).fp->dead = true; + } + __skb_queue_purge(&hitlist); skip_gc: WRITE_ONCE(gc_in_progress, false);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 7172dc93d621d5dc302d007e95ddd1311ec64283
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Kuniyuki Iwashimakuniyu@amazon.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 7172dc93d621d ! 1: f8f884570679e af_unix: Add dead flag to struct scm_fp_list. @@ Metadata ## Commit message ## af_unix: Add dead flag to struct scm_fp_list.
+ [ Upstream commit 7172dc93d621d5dc302d007e95ddd1311ec64283 ] + Commit 1af2dface5d2 ("af_unix: Don't access successor in unix_del_edges() during GC.") fixed use-after-free by avoid accessing edge->successor while GC is in progress. @@ Commit message Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240508171150.50601-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 7172dc93d621d5dc302d007e95ddd1311ec64283) + Signed-off-by: Lee Jones lee@kernel.org
## include/net/scm.h ## @@ include/net/scm.h: struct scm_fp_list { ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Michal Luczaj mhal@rbox.co
[ Upstream commit 041933a1ec7b4173a8e638cae4f8e394331d7e54 ]
GC attempts to explicitly drop oob_skb's reference before purging the hit list.
The problem is with embryos: kfree_skb(u->oob_skb) is never called on an embryo socket.
The python script below [0] sends a listener's fd to its embryo as OOB data. While GC does collect the embryo's queue, it fails to drop the OOB skb's refcount. The skb which was in embryo's receive queue stays as unix_sk(sk)->oob_skb and keeps the listener's refcount [1].
Tell GC to dispose embryo's oob_skb.
[0]: from array import array from socket import *
addr = '\x00unix-oob' lis = socket(AF_UNIX, SOCK_STREAM) lis.bind(addr) lis.listen(1)
s = socket(AF_UNIX, SOCK_STREAM) s.connect(addr) scm = (SOL_SOCKET, SCM_RIGHTS, array('i', [lis.fileno()])) s.sendmsg([b'x'], [scm], MSG_OOB) lis.close()
[1] $ grep unix-oob /proc/net/unix $ ./unix-oob.py $ grep unix-oob /proc/net/unix 0000000000000000: 00000002 00000000 00000000 0001 02 0 @unix-oob 0000000000000000: 00000002 00000000 00010000 0001 01 6072 @unix-oob
Fixes: 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") Signed-off-by: Michal Luczaj mhal@rbox.co Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com (cherry picked from commit 041933a1ec7b4173a8e638cae4f8e394331d7e54) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index 1f8b8cdfcdc8..dfe94a90ece4 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -342,6 +342,18 @@ enum unix_recv_queue_lock_class { U_RECVQ_LOCK_EMBRYO, };
+static void unix_collect_queue(struct unix_sock *u, struct sk_buff_head *hitlist) +{ + skb_queue_splice_init(&u->sk.sk_receive_queue, hitlist); + +#if IS_ENABLED(CONFIG_AF_UNIX_OOB) + if (u->oob_skb) { + WARN_ON_ONCE(skb_unref(u->oob_skb)); + u->oob_skb = NULL; + } +#endif +} + static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist) { struct unix_vertex *vertex; @@ -365,18 +377,11 @@ static void unix_collect_skb(struct list_head *scc, struct sk_buff_head *hitlist
/* listener -> embryo order, the inversion never happens. */ spin_lock_nested(&embryo_queue->lock, U_RECVQ_LOCK_EMBRYO); - skb_queue_splice_init(embryo_queue, hitlist); + unix_collect_queue(unix_sk(skb->sk), hitlist); spin_unlock(&embryo_queue->lock); } } else { - skb_queue_splice_init(queue, hitlist); - -#if IS_ENABLED(CONFIG_AF_UNIX_OOB) - if (u->oob_skb) { - kfree_skb(u->oob_skb); - u->oob_skb = NULL; - } -#endif + unix_collect_queue(u, hitlist); }
spin_unlock(&queue->lock);
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 041933a1ec7b4173a8e638cae4f8e394331d7e54
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Michal Luczajmhal@rbox.co
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 041933a1ec7b4 ! 1: c5354d822aa1a af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS @@ Metadata ## Commit message ## af_unix: Fix garbage collection of embryos carrying OOB with SCM_RIGHTS
+ [ Upstream commit 041933a1ec7b4173a8e638cae4f8e394331d7e54 ] + GC attempts to explicitly drop oob_skb's reference before purging the hit list.
@@ Commit message Signed-off-by: Michal Luczaj mhal@rbox.co Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com + (cherry picked from commit 041933a1ec7b4173a8e638cae4f8e394331d7e54) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: enum unix_recv_queue_lock_class { ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5 ]
KMSAN reported uninit-value access in __unix_walk_scc() [1].
In the list_for_each_entry_reverse() loop, when the vertex's index equals it's scc_index, the loop uses the variable vertex as a temporary variable that points to a vertex in scc. And when the loop is finished, the variable vertex points to the list head, in this case scc, which is a local variable on the stack (more precisely, it's not even scc and might underflow the call stack of __unix_walk_scc(): container_of(&scc, struct unix_vertex, scc_entry)).
However, the variable vertex is used under the label prev_vertex. So if the edge_stack is not empty and the function jumps to the prev_vertex label, the function will access invalid data on the stack. This causes the uninit-value access issue.
Fix this by introducing a new temporary variable for the loop.
[1] BUG: KMSAN: uninit-value in __unix_walk_scc net/unix/garbage.c:478 [inline] BUG: KMSAN: uninit-value in unix_walk_scc net/unix/garbage.c:526 [inline] BUG: KMSAN: uninit-value in __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 __unix_walk_scc net/unix/garbage.c:478 [inline] unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2589/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Uninit was stored to memory at: unix_walk_scc net/unix/garbage.c:526 [inline] __unix_gc+0x2adf/0x3c20 net/unix/garbage.c:584 process_one_work kernel/workqueue.c:3231 [inline] process_scheduled_works+0xade/0x1bf0 kernel/workqueue.c:3312 worker_thread+0xeb6/0x15b0 kernel/workqueue.c:3393 kthread+0x3c4/0x530 kernel/kthread.c:389 ret_from_fork+0x6e/0x90 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Local variable entries created at: ref_tracker_free+0x48/0xf30 lib/ref_tracker.c:222 netdev_tracker_free include/linux/netdevice.h:4058 [inline] netdev_put include/linux/netdevice.h:4075 [inline] dev_put include/linux/netdevice.h:4101 [inline] update_gid_event_work_handler+0xaa/0x1b0 drivers/infiniband/core/roce_gid_mgmt.c:813
CPU: 1 PID: 12763 Comm: kworker/u8:31 Not tainted 6.10.0-rc4-00217-g35bb670d65fc #32 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014 Workqueue: events_unbound __unix_gc
Fixes: 3484f063172d ("af_unix: Detect Strongly Connected Components.") Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Shigeru Yoshida syoshida@redhat.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org (cherry picked from commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5) Signed-off-by: Lee Jones lee@kernel.org --- net/unix/garbage.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c index dfe94a90ece4..23efb78fe9ef 100644 --- a/net/unix/garbage.c +++ b/net/unix/garbage.c @@ -476,6 +476,7 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde }
if (vertex->index == vertex->scc_index) { + struct unix_vertex *v; struct list_head scc; bool scc_dead = true;
@@ -486,15 +487,15 @@ static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde */ __list_cut_position(&scc, &vertex_stack, &vertex->scc_entry);
- list_for_each_entry_reverse(vertex, &scc, scc_entry) { + list_for_each_entry_reverse(v, &scc, scc_entry) { /* Don't restart DFS from this vertex in unix_walk_scc(). */ - list_move_tail(&vertex->entry, &unix_visited_vertices); + list_move_tail(&v->entry, &unix_visited_vertices);
/* Mark vertex as off-stack. */ - vertex->index = unix_vertex_grouped_index; + v->index = unix_vertex_grouped_index;
if (scc_dead) - scc_dead = unix_vertex_dead(vertex); + scc_dead = unix_vertex_dead(v); }
if (scc_dead)
[ Sasha's backport helper bot ]
Hi,
✅ All tests passed successfully. No issues detected. No action required from the submitter.
The upstream commit SHA1 provided is correct: 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5
WARNING: Author mismatch between patch and upstream commit: Backport author: Lee Joneslee@kernel.org Commit author: Shigeru Yoshidasyoshida@redhat.com
Status in newer kernel trees: 6.14.y | Present (exact SHA1) 6.12.y | Present (exact SHA1) 6.6.y | Not found
Note: The patch differs from the upstream commit: --- 1: 927fa5b3e4f52 ! 1: 3996bfbf88f0e af_unix: Fix uninit-value in __unix_walk_scc() @@ Metadata ## Commit message ## af_unix: Fix uninit-value in __unix_walk_scc()
+ [ Upstream commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5 ] + KMSAN reported uninit-value access in __unix_walk_scc() [1].
In the list_for_each_entry_reverse() loop, when the vertex's index @@ Commit message Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20240702160428.10153-1-syoshida@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org + (cherry picked from commit 927fa5b3e4f52e0967bfc859afc98ad1c523d2d5) + Signed-off-by: Lee Jones lee@kernel.org
## net/unix/garbage.c ## @@ net/unix/garbage.c: static void __unix_walk_scc(struct unix_vertex *vertex, unsigned long *last_inde ---
Results of testing on various branches:
| Branch | Patch Apply | Build Test | |---------------------------|-------------|------------| | stable/linux-6.6.y | Success | Success |
linux-stable-mirror@lists.linaro.org