Backports the following three patches to fix the issue of IMA mishandling
an LSM-based rule during an LSM policy update, causing a file to match an
unexpected rule.
v6:
Removed the redundant i in ima_free_rule().
v5:
Go back to ima_lsm_free_rule() instead to avoid freeing
rule->fsname.
v4:
Make use of the existing ima_free_rule() instead of the backported
ima_lsm_free_rule(), which resolves additional memory leak issues.
v3:
Backport "LSM: switch to blocking policy update notifiers" as well, as
the prerequisite of "ima: use the lsm policy update notifier".
v2:
Re-adjust the backported logic.
GUO Zihua (1):
ima: Handle -ESTALE returned by ima_filter_rule_match()
Janne Karhunen (2):
LSM: switch to blocking policy update notifiers
ima: use the lsm policy update notifier
drivers/infiniband/core/device.c | 4 +-
include/linux/security.h | 12 +--
security/integrity/ima/ima.h | 2 +
security/integrity/ima/ima_main.c | 8 ++
security/integrity/ima/ima_policy.c | 151 ++++++++++++++++++++++------
security/security.c | 23 +++--
security/selinux/hooks.c | 2 +-
security/selinux/selinuxfs.c | 2 +-
8 files changed, 155 insertions(+), 49 deletions(-)
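As background for the two Janne Karhunen patches in this series: "LSM: switch to
blocking policy update notifiers" moves the LSM policy-update notifications onto
the kernel's blocking notifier chain infrastructure, so that a listener such as
IMA can sleep in its callback, which is what "ima: use the lsm policy update
notifier" relies on when it rebuilds its LSM-based rules. The sketch below only
illustrates that generic blocking-notifier pattern; the chain, callback, and
module names are invented for the example and are not the symbols touched by the
backport.

#include <linux/module.h>
#include <linux/notifier.h>

/* Hypothetical chain standing in for the LSM policy-update chain. */
static BLOCKING_NOTIFIER_HEAD(example_policy_chain);

/*
 * Listener callback: on a blocking chain it runs in process context and
 * may sleep (take mutexes, allocate memory), which an atomic notifier
 * chain would not allow.
 */
static int example_policy_change(struct notifier_block *nb,
				 unsigned long event, void *data)
{
	pr_info("policy update event %lu\n", event);
	return NOTIFY_OK;
}

static struct notifier_block example_nb = {
	.notifier_call = example_policy_change,
};

static int __init example_init(void)
{
	return blocking_notifier_chain_register(&example_policy_chain,
						&example_nb);
}

static void __exit example_exit(void)
{
	blocking_notifier_chain_unregister(&example_policy_chain,
					   &example_nb);
}

module_init(example_init);
module_exit(example_exit);
MODULE_LICENSE("GPL");

A publisher would then call blocking_notifier_call_chain(&example_policy_chain,
event, NULL) from sleepable context whenever its policy changes.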
--
2.17.1
Backports the following three patches to fix the issue of IMA mishandling
an LSM-based rule during an LSM policy update, causing a file to match an
unexpected rule.
v7:
Fixed which object gets freed in ima_lsm_copy_rule().
v6:
Removed the redundant i in ima_free_rule().
v5:
Go back to ima_lsm_free_rule() instead to avoid freeing
rule->fsname.
v4:
Make use of the existing ima_free_rule() instead of the backported
ima_lsm_free_rule(), which resolves additional memory leak issues.
v3:
Backport "LSM: switch to blocking policy update notifiers" as well, as
the prerequisite of "ima: use the lsm policy update notifier".
v2:
Re-adjust the backported logic.
GUO Zihua (1):
ima: Handle -ESTALE returned by ima_filter_rule_match()
Janne Karhunen (2):
LSM: switch to blocking policy update notifiers
ima: use the lsm policy update notifier
drivers/infiniband/core/device.c | 4 +-
include/linux/security.h | 12 +--
security/integrity/ima/ima.h | 2 +
security/integrity/ima/ima_main.c | 8 ++
security/integrity/ima/ima_policy.c | 151 ++++++++++++++++++++++------
security/security.c | 23 +++--
security/selinux/hooks.c | 2 +-
security/selinux/selinuxfs.c | 2 +-
8 files changed, 155 insertions(+), 49 deletions(-)
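To make the shape of GUO Zihua's patch easier to follow, here is a rough sketch
of the retry-on-ESTALE idea from "ima: Handle -ESTALE returned by
ima_filter_rule_match()". It is a simplification, not the backported code: the
helper signatures, the single-field match, and the return convention are
assumptions made only for this illustration.

/*
 * Sketch only: when the cached LSM reference inside a rule has gone stale
 * after an LSM policy reload, build a temporary updated copy of the rule,
 * retry the match once, then free the copy.
 */
static bool example_match_lsm_rule(struct ima_rule_entry *rule, u32 osid)
{
	struct ima_rule_entry *lsm_rule = rule;
	bool copied = false;
	int rc;

retry:
	rc = ima_filter_rule_match(osid, lsm_rule->lsm[0].type,
				   Audit_equal, lsm_rule->lsm[0].rule);
	if (rc == -ESTALE && !copied) {
		/* Re-resolve the rule against the current LSM policy. */
		struct ima_rule_entry *copy = ima_lsm_copy_rule(rule);

		if (copy) {
			lsm_rule = copy;
			copied = true;
			goto retry;
		}
	}

	if (copied) {
		ima_lsm_free_rule(lsm_rule);	/* frees only the LSM references */
		kfree(lsm_rule);		/* frees the temporary entry itself */
	}

	return rc > 0;	/* assumption: > 0 means the rule matched */
}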
--
2.17.1
Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be
restricted on asymmetric systems"), the setting and clearing of
user_cpus_ptr are done under pi_lock for the arm64 architecture. However,
dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
protection. Since sched_setaffinity() can be invoked from another
process, the process being modified may be undergoing fork() at
the same time. When racing with the clearing of user_cpus_ptr in
__set_cpus_allowed_ptr_locked(), it can lead to a use-after-free and
possibly a double-free in the arm64 kernel.
Commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask") fixes this problem as user_cpus_ptr, once set, will never
be cleared in a task's lifetime. However, this bug was re-introduced
in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()") which allows the clearing of user_cpus_ptr in
do_set_cpus_allowed(). This time, it will affect all arches.
Fix this bug by always clearing the user_cpus_ptr of the newly
cloned/forked task before the copying process starts and by checking the
user_cpus_ptr state of the source task under pi_lock.
Note to stable: this patch won't be applicable to stable releases as-is.
Just copy the new dup_user_cpus_ptr() function over.
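For readers who want the locking pattern in isolation, the sketch below is a
plain userspace analogue of what the new dup_user_cpus_ptr() does: clear the
destination first, allocate outside the lock, re-check and copy under the lock
that serializes against concurrent clearing, and free the buffer if it ended up
unused. The struct and function names are invented for the illustration; this
is not kernel code.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Stand-ins for task_struct::pi_lock and task_struct::user_cpus_ptr. */
struct task_like {
	pthread_mutex_t lock;
	unsigned long *mask;
	size_t words;
};

int dup_mask(struct task_like *dst, struct task_like *src)
{
	unsigned long *buf;

	dst->mask = NULL;		/* never inherit a stale pointer */

	/* Racy fast path; losing the race is fine, we re-check below. */
	if (!src->mask)
		return 0;

	buf = calloc(src->words, sizeof(*buf));	/* allocate outside the lock */
	if (!buf)
		return -1;

	pthread_mutex_lock(&src->lock);
	if (src->mask) {		/* re-check under the lock */
		memcpy(buf, src->mask, src->words * sizeof(*buf));
		dst->words = src->words;
		dst->mask = buf;
		buf = NULL;		/* ownership transferred to dst */
	}
	pthread_mutex_unlock(&src->lock);

	free(buf);			/* no-op if the copy was published */
	return 0;
}

The kernel version in the diff below follows the same shape, with pi_lock in
place of the mutex and swap() publishing the pre-allocated mask.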
Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
CC: stable(a)vger.kernel.org
Reported-by: David Wang 王标 <wangbiao3(a)xiaomi.com>
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/sched/core.c | 34 +++++++++++++++++++++++++++++-----
1 file changed, 29 insertions(+), 5 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 25b582b6ee5f..b93d030b9fd5 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2612,19 +2612,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
int node)
{
+ cpumask_t *user_mask;
unsigned long flags;
- if (!src->user_cpus_ptr)
+ /*
+ * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
+ * may differ by now due to racing.
+ */
+ dst->user_cpus_ptr = NULL;
+
+ /*
+ * This check is racy and losing the race is a valid situation.
+ * It is not worth the extra overhead of taking the pi_lock on
+ * every fork/clone.
+ */
+ if (data_race(!src->user_cpus_ptr))
return 0;
- dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
- if (!dst->user_cpus_ptr)
+ user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+ if (!user_mask)
return -ENOMEM;
- /* Use pi_lock to protect content of user_cpus_ptr */
+ /*
+ * Use pi_lock to protect content of user_cpus_ptr
+ *
+ * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
+ * do_set_cpus_allowed().
+ */
raw_spin_lock_irqsave(&src->pi_lock, flags);
- cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ if (src->user_cpus_ptr) {
+ swap(dst->user_cpus_ptr, user_mask);
+ cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ }
raw_spin_unlock_irqrestore(&src->pi_lock, flags);
+
+ if (unlikely(user_mask))
+ kfree(user_mask);
+
return 0;
}
--
2.31.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
12521a5d5cb7 ("io_uring: fix CQ waiting timeout handling")
35d90f95cfa7 ("io_uring: include task_work run after scheduling in wait for events")
3a08576b96e3 ("io_uring: remove check_cq checking from hot paths")
ed29b0b4fd83 ("io_uring: move to separate directory")
155bc9505dbd ("io_uring: return an error when cqe is dropped")
10988a0a67ba ("io_uring: use constants for cq_overflow bitfield")
3e813c902672 ("io_uring: rework io_uring_enter to simplify return value")
cef216fc32d7 ("io_uring: explicitly keep a CQE in io_kiocb")
b4f20bb4e6d5 ("io_uring: move finish_wait() outside of loop in cqring_wait()")
d487b43cd327 ("io_uring: optimise mutex locking for submit+iopoll")
773697b610bf ("io_uring: pre-calculate syscall iopolling decision")
f81440d33cc6 ("io_uring: split off IOPOLL argument verifiction")
b605a7fabb60 ("io_uring: move poll recycling later in compl flushing")
a538be5be328 ("io_uring: optimise io_free_batch_list")
c0713540f6d5 ("io_uring: fix leaks on IOPOLL and CQE_SKIP")
323b190ba2de ("io_uring: free iovec if file assignment fails")
7179c3ce3dbf ("io_uring: fix poll error reporting")
cce64ef01308 ("io_uring: fix poll file assign deadlock")
82733d168cbd ("io_uring: stop using io_wq_work as an fd placeholder")
2804ecd8d3e3 ("io_uring: move apoll->events cache")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 12521a5d5cb7ff0ad43eadfc9c135d86e1131fa8 Mon Sep 17 00:00:00 2001
From: Pavel Begunkov <asml.silence(a)gmail.com>
Date: Thu, 5 Jan 2023 10:49:15 +0000
Subject: [PATCH] io_uring: fix CQ waiting timeout handling
The jiffies-to-ktime CQ waiting conversion broke how we treat timeouts; in
particular, we rearm the timeout anew every time we get into
io_cqring_wait_schedule() without adjusting it. Waiting for 2 CQEs and
getting a task_work in the middle may double the timeout value, or even
worse, in some cases the task may wait indefinitely.
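For readers outside io_uring, the userspace sketch below shows the general rule
the fix restores: when a wait can be interrupted for auxiliary work and then
resumed, compute the deadline once and reuse it on every pass instead of
re-arming a fresh timeout, which silently extends the total wait. The names are
invented for the illustration; this is not the io_uring code.

#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
static bool done;

/*
 * Wait for 'done' with an overall budget of timeout_ms, surviving spurious
 * or early wakeups without restarting the timeout.
 */
int wait_with_budget(long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;

	/* Absolute deadline computed once, before the wait loop. */
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&lock);
	while (!done && rc != ETIMEDOUT) {
		/* Every retry reuses the same deadline, so the total wait
		 * can never exceed the caller's budget. */
		rc = pthread_cond_timedwait(&cond, &lock, &deadline);
	}
	pthread_mutex_unlock(&lock);

	return done ? 0 : -ETIMEDOUT;
}

The fix below achieves this by passing the timeout into
io_cqring_wait_schedule() by reference rather than by value, so every retry
operates on the same state instead of a fresh copy.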
Cc: stable(a)vger.kernel.org
Fixes: 228339662b398 ("io_uring: don't convert to jiffies for waiting on timeouts")
Signed-off-by: Pavel Begunkov <asml.silence(a)gmail.com>
Link: https://lore.kernel.org/r/f7bffddd71b08f28a877d44d37ac953ddb01590d.16729156…
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 472574192dd6..2ac1cd8d23ea 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2470,7 +2470,7 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
/* when returns >0, the caller should retry */
static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
struct io_wait_queue *iowq,
- ktime_t timeout)
+ ktime_t *timeout)
{
int ret;
unsigned long check_cq;
@@ -2488,7 +2488,7 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
if (check_cq & BIT(IO_CHECK_CQ_DROPPED_BIT))
return -EBADR;
}
- if (!schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS))
+ if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS))
return -ETIME;
/*
@@ -2564,7 +2564,7 @@ static int io_cqring_wait(struct io_ring_ctx *ctx, int min_events,
}
prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq,
TASK_INTERRUPTIBLE);
- ret = io_cqring_wait_schedule(ctx, &iowq, timeout);
+ ret = io_cqring_wait_schedule(ctx, &iowq, &timeout);
if (__io_cqring_events_user(ctx) >= min_events)
break;
cond_resched();