The patch titled
Subject: shm: skip shm_destroy if task IPC namespace was changed
has been removed from the -mm tree. Its filename was
shm-skip-shm_destroy-if-task-ipc-namespace-was-changed.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Subject: shm: skip shm_destroy if task IPC namespace was changed
Patch series "shm: omit forced shm destroy if task IPC namespace was changed".
The shm_rmid_forced feature is per IPC namespace and is controlled by the
kernel.shm_rmid_forced sysctl. When the feature is turned on, the task's
sysvshm segments are destroyed by the exit_shm(struct task_struct *task)
function during task exit (and unshare(CLONE_NEWIPC)).
But there is a problem if the task has changed its IPC namespace since the
shmget() call. In that situation exit_shm() will try to call
shm_destroy(<new_ipc_namespace_ptr>, <sysvshmem_from_old_ipc_namespace>),
which leaves the sysvshm object freed but still attached to the old IPC
namespace; later, during cleanup of the old IPC namespace, we will try to
free the same sysvshm object a second time.
The first patch solves this problem by postponing shm_destroy() until IPC
namespace cleanup is called. The second patch helps to prevent (or easily
catch) such bugs in the future by adding corresponding WARNings.
This patch (of 2):
A task may change its IPC namespace by calling setns(), but its sysvshm
objects remain in the original IPC namespace (the IPC namespace the task
was in when shmget() was called). Let's skip the forced shm destroy in that
case, because we can't determine the IPC namespace from the shm object
alone. These problematic sysvshm objects will be destroyed on IPC
namespace cleanup.
Link: https://lkml.kernel.org/r/20210706132259.71740-1-alexander.mikhalitsyn@virt…
Link: https://lkml.kernel.org/r/20210706132259.71740-2-alexander.mikhalitsyn@virt…
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: Milton Miller <miltonm(a)bga.com>
Cc: Jack Miller <millerjo(a)us.ibm.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander(a)mihalicyn.com>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/ipc/shm.c~shm-skip-shm_destroy-if-task-ipc-namespace-was-changed
+++ a/ipc/shm.c
@@ -173,6 +173,14 @@ static inline struct shmid_kernel *shm_o
 	return container_of(ipcp, struct shmid_kernel, shm_perm);
 }
 
+static inline bool is_shm_in_ns(struct ipc_namespace *ns, struct shmid_kernel *shp)
+{
+	int idx = ipcid_to_idx(shp->shm_perm.id);
+	struct shmid_kernel *tshp = shm_obtain_object(ns, idx);
+
+	return !IS_ERR(tshp) && tshp == shp;
+}
+
 /*
  * shm_lock_(check_) routines are called in the paths where the rwsem
  * is not necessarily held.
@@ -415,7 +423,7 @@ void exit_shm(struct task_struct *task)
 	list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
 		shp->shm_creator = NULL;
 
-		if (shm_may_destroy(ns, shp)) {
+		if (is_shm_in_ns(ns, shp) && shm_may_destroy(ns, shp)) {
 			shm_lock_by_ptr(shp);
 			shm_destroy(ns, shp);
 		}
_
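For readers who want to see why the check in is_shm_in_ns() compares
pointers rather than only looking up the id, here is a minimal userspace
sketch of the same idea. It is not kernel code: struct toy_ns, struct
toy_shm, NS_SLOTS and the toy_* helpers are hypothetical stand-ins for
struct ipc_namespace, struct shmid_kernel and shm_obtain_object(), and a
NULL lookup result plays the role of an ERR_PTR() value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NS_SLOTS 8	/* hypothetical table size, for the model only */

/* Toy stand-in for struct shmid_kernel: only the id matters here. */
struct toy_shm {
	int id;
};

/* Toy stand-in for struct ipc_namespace: a fixed id -> object table. */
struct toy_ns {
	struct toy_shm *slots[NS_SLOTS];
};

/* Register a segment in a namespace under the segment's own id. */
static void toy_ns_add(struct toy_ns *ns, struct toy_shm *shp)
{
	assert(shp->id >= 0 && shp->id < NS_SLOTS);
	ns->slots[shp->id] = shp;
}

/* Look up a segment by id; NULL means "no such object in this namespace". */
static struct toy_shm *toy_ns_lookup(const struct toy_ns *ns, int id)
{
	if (id < 0 || id >= NS_SLOTS)
		return NULL;
	return ns->slots[id];
}

/*
 * Same shape as the check in the patch: the segment belongs to ns only if
 * looking up its id in ns yields this very object. An id match alone is
 * not enough, because another namespace may reuse the same id for a
 * different segment.
 */
static bool toy_is_shm_in_ns(const struct toy_ns *ns, const struct toy_shm *shp)
{
	return toy_ns_lookup(ns, shp->id) == shp;
}
```

In this model, a segment created in one toy_ns and then checked against a
different toy_ns fails the test even when the second namespace holds a
segment with the same id, which is the case where exit_shm() now skips the
forced destroy.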
Patches currently in -mm which might be from alexander.mikhalitsyn(a)virtuozzo.com are
ipc-warn-if-trying-to-remove-ipc-object-which-is-absent.patch
Commit 3226b158e67c ("net: avoid 32 x truesize under-estimation for tiny skbs")
introduces a ~10% performance drop when using virtio-net drivers.
This commit has been backported to v4.14 in commit 40b95b92f1db and this
performance drop is also visible there.
Here at Tessares, we can also notice this drop with the MPTCP fork [1]
on top of the v4.14 kernel.
Eric Dumazet already fixed this issue a few months ago, see
commit 0f6925b3e8da ("virtio_net: Do not pull payload in skb->head").
Unfortunately, this patch has not been backported to < v5.4 because it
caused issues [2]. Indeed, after having backported it, the kernel fails
to compile. Please refer to patch 1/2 for more details.
A new version of this patch is proposed here fixing the compilation
issue. It has been validated: it fixes the original issue on v4.14 as
well.
Please note that there is also a fix for the fix, see
commit 38ec4944b593 ("gro: ensure frag0 meets IP header alignment").
This second fix has not been backported either because it also caused
issues [3]: first a conflict, then a compilation error once the conflict
had been resolved. Please refer to patch 2/2 for more details.
One last note: It looks like it could be interesting to backport these
two patches to v4.9 and v4.4 as well but unfortunately, the backport of
these two patches fails with conflicts and I don't have any setup to
validate the performance drop and fix with v4.9 and v4.4 kernels.
[1] https://github.com/multipath-tcp/mptcp
[2] https://lore.kernel.org/stable/161806389310822@kroah.com/
[3] https://lore.kernel.org/stable/16187490172453@kroah.com/
Eric Dumazet (2):
virtio_net: Do not pull payload in skb->head
gro: ensure frag0 meets IP header alignment
drivers/net/virtio_net.c | 10 +++++++---
include/linux/skbuff.h | 9 +++++++++
include/linux/virtio_net.h | 14 +++++++++-----
net/core/dev.c | 3 ++-
4 files changed, 27 insertions(+), 9 deletions(-)
--
2.31.1