Sorry, I may not have sent the email correctly.
I will resend it.
On Thu, 27 Oct 2022 20:26:04 +0000 Andrew Morton <akpm(a)linux-foundation.org> wrote:
> On Wed, 26 Oct 2022 20:24:38 +0900 NARIBAYASHI Akira <a.naribayashi(a)fujitsu.com> wrote:
>
> > Depending on the memory configuration, isolate_freepages_block() may
> > scan pages outside the target range and cause a panic.
> >
> > The problem is that the pfn passed to fast_isolate_around() can be
> > outside the target range. Therefore we should consider both the case
> > where pfn < start_pfn and the case where end_pfn < pfn.
> >
> > This problem should have been addressed by commit 6e2b7044c199
> > ("mm, compaction: make fast_isolate_freepages() stay within zone"),
> > but there was an oversight.
> >
> > Case1: pfn < start_pfn
> >
> > <at memory compaction for node Y>
> > | node X's zone | node Y's zone
> > +-----------------+------------------------------...
> > pageblock ^ ^ ^
> > +-----------+-----------+-----------+-----------+...
> > ^ ^ ^
> > ^ ^ end_pfn
> > ^ start_pfn = cc->zone->zone_start_pfn
> > pfn
> > <---------> scanned range by "Scan After"
> >
> > Case2: end_pfn < pfn
> >
> > <at memory compaction for node X>
> > | node X's zone | node Y's zone
> > +-----------------+------------------------------...
> > pageblock ^ ^ ^
> > +-----------+-----------+-----------+-----------+...
> > ^ ^ ^
> > ^ ^ pfn
> > ^ end_pfn
> > start_pfn
> > <---------> scanned range by "Scan Before"
> >
> > It seems that there is no good reason to skip nr_isolated pages
> > just after the given pfn. So let's perform a simple scan from start
> > to end instead of dividing the scan into "Before" and "After".
>
> Under what circumstances will this panic occur? I assume those
> circumstances are pretty rare, given that 6e2b7044c1992 was nearly two
> years ago.
>
> Did you consider the desirability of backporting this fix into earlier
> kernels?
A panic can occur on systems where a single pageblock spans multiple zones.
It is rare because it only happens with such special memory configurations.
Depending on how many similar systems are out there, it may be worth backporting this fix to older kernels as well.
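Roughly, the simplified scan described above could look like the following
(an illustrative sketch only, reusing helpers that already exist in
mm/compaction.c; the actual patch may differ):

static void fast_isolate_around(struct compact_control *cc, unsigned long pfn)
{
	unsigned long start_pfn, end_pfn;
	struct page *page;

	/* Scan the whole pageblock, clamped to the zone boundaries. */
	start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
	end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));

	page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
	if (!page)
		return;

	/* One simple scan from start to end, no "Before"/"After" split. */
	isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);

	/* Mark the pageblock for skipping if it is now fully scanned. */
	if (start_pfn == end_pfn)
		set_pageblock_skip(page);
}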
Since commit 07ec77a1d4e8 ("sched: Allow task CPU affinity to be
restricted on asymmetric systems"), the setting and clearing of
user_cpus_ptr are done under pi_lock for arm64 architecture. However,
dup_user_cpus_ptr() accesses user_cpus_ptr without any lock
protection. When racing with the clearing of user_cpus_ptr in
__set_cpus_allowed_ptr_locked(), it can lead to a use-after-free and a
double-free in the arm64 kernel.
Commit 8f9ea86fdf99 ("sched: Always preserve the user requested
cpumask") fixed this problem because user_cpus_ptr, once set, is never
cleared during a task's lifetime. However, the bug was re-introduced
in commit 851a723e45d1 ("sched: Always clear user_cpus_ptr in
do_set_cpus_allowed()"), which allows the clearing of user_cpus_ptr in
do_set_cpus_allowed(). This time, it affects all arches.
Fix this bug by always clearing the user_cpus_ptr of the newly
cloned/forked task before the copying process starts and by checking
the user_cpus_ptr state of the source task under pi_lock.
Note to stable: this patch won't apply cleanly to stable releases.
Just copy the new dup_user_cpus_ptr() function over.
Fixes: 07ec77a1d4e8 ("sched: Allow task CPU affinity to be restricted on asymmetric systems")
Fixes: 851a723e45d1 ("sched: Always clear user_cpus_ptr in do_set_cpus_allowed()")
CC: stable(a)vger.kernel.org
Reported-by: David Wang 王标 <wangbiao3(a)xiaomi.com>
Signed-off-by: Waiman Long <longman(a)redhat.com>
---
kernel/sched/core.c | 32 ++++++++++++++++++++++++++++----
1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 8df51b08bb38..f2b75faaf71a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2624,19 +2624,43 @@ void do_set_cpus_allowed(struct task_struct *p, const struct cpumask *new_mask)
int dup_user_cpus_ptr(struct task_struct *dst, struct task_struct *src,
int node)
{
+ cpumask_t *user_mask;
unsigned long flags;
+ /*
+ * Always clear dst->user_cpus_ptr first as their user_cpus_ptr's
+ * may differ by now due to racing.
+ */
+ dst->user_cpus_ptr = NULL;
+
+ /*
+ * This check is racy and losing the race is a valid situation.
+ * It is not worth the extra overhead of taking the pi_lock on
+ * every fork/clone.
+ */
if (!src->user_cpus_ptr)
return 0;
- dst->user_cpus_ptr = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
- if (!dst->user_cpus_ptr)
+ user_mask = kmalloc_node(cpumask_size(), GFP_KERNEL, node);
+ if (!user_mask)
return -ENOMEM;
- /* Use pi_lock to protect content of user_cpus_ptr */
+ /*
+ * Use pi_lock to protect content of user_cpus_ptr
+ *
+ * Though unlikely, user_cpus_ptr can be reset to NULL by a concurrent
+ * do_set_cpus_allowed().
+ */
raw_spin_lock_irqsave(&src->pi_lock, flags);
- cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ if (src->user_cpus_ptr) {
+ swap(dst->user_cpus_ptr, user_mask);
+ cpumask_copy(dst->user_cpus_ptr, src->user_cpus_ptr);
+ }
raw_spin_unlock_irqrestore(&src->pi_lock, flags);
+
+ if (unlikely(user_mask))
+ kfree(user_mask);
+
return 0;
}
--
2.31.1
From: Xiubo Li <xiubli(a)redhat.com>
When ceph releases a file_lock it tries to get the inode pointer from
fl->fl_file, but that memory could already have been released by
another thread in filp_close(), because the VFS layer does not take a
reference on the file for fl->fl_file.
Switch to ceph's dedicated lock info to track the inode instead.
In ceph_fl_release_lock(), skip all operations if
fl->fl_u.ceph_fl.fl_inode is not set, which means the file_lock is a
request lock. fl->fl_u.ceph_fl.fl_inode is set when the lock is
inserted into the inode's lock list, i.e. when the lock is copied.
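For context, the copy/release pairing relied on here is the one already
wired up through ceph's file_lock_operations in fs/ceph/locks.c, roughly
(shown only for illustration, not part of this patch):

static const struct file_lock_operations ceph_fl_lock_ops = {
	.fl_copy_lock = ceph_fl_copy_lock,
	.fl_release_private = ceph_fl_release_lock,
};

So every lock copied onto the inode's lock list grabs an inode reference
in ceph_fl_copy_lock(), and ceph_fl_release_lock() drops it without ever
touching fl->fl_file.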
Cc: stable(a)vger.kernel.org
Cc: Jeff Layton <jlayton(a)kernel.org>
URL: https://tracker.ceph.com/issues/57986
Signed-off-by: Xiubo Li <xiubli(a)redhat.com>
---
fs/ceph/locks.c | 20 ++++++++++++++++++--
include/linux/ceph/ceph_fs_fl.h | 17 +++++++++++++++++
include/linux/fs.h | 2 ++
3 files changed, 37 insertions(+), 2 deletions(-)
create mode 100644 include/linux/ceph/ceph_fs_fl.h
diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c
index b191426bf880..621f38f10a88 100644
--- a/fs/ceph/locks.c
+++ b/fs/ceph/locks.c
@@ -34,18 +34,34 @@ static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src)
{
struct inode *inode = file_inode(dst->fl_file);
atomic_inc(&ceph_inode(inode)->i_filelock_ref);
+ dst->fl_u.ceph_fl.fl_inode = igrab(inode);
}
+/*
+ * Do not use the 'fl->fl_file' in release function, which
+ * is possibly already released by another thread.
+ */
static void ceph_fl_release_lock(struct file_lock *fl)
{
- struct inode *inode = file_inode(fl->fl_file);
- struct ceph_inode_info *ci = ceph_inode(inode);
+ struct inode *inode = fl->fl_u.ceph_fl.fl_inode;
+ struct ceph_inode_info *ci;
+
+ /*
+ * If inode is NULL it should be a request file_lock,
+ * nothing we can do.
+ */
+ if (!inode)
+ return;
+
+ ci = ceph_inode(inode);
if (atomic_dec_and_test(&ci->i_filelock_ref)) {
/* clear error when all locks are released */
spin_lock(&ci->i_ceph_lock);
ci->i_ceph_flags &= ~CEPH_I_ERROR_FILELOCK;
spin_unlock(&ci->i_ceph_lock);
}
+ fl->fl_u.ceph_fl.fl_inode = NULL;
+ iput(inode);
}
static const struct file_lock_operations ceph_fl_lock_ops = {
diff --git a/include/linux/ceph/ceph_fs_fl.h b/include/linux/ceph/ceph_fs_fl.h
new file mode 100644
index 000000000000..ad1cf96329f9
--- /dev/null
+++ b/include/linux/ceph/ceph_fs_fl.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ceph_fs_fl.h - Ceph lock info
+ *
+ * LGPL2
+ */
+
+#ifndef CEPH_FS_FL_H
+#define CEPH_FS_FL_H
+
+#include <linux/fs.h>
+
+struct ceph_lock_info {
+ struct inode *fl_inode;
+};
+
+#endif
diff --git a/include/linux/fs.h b/include/linux/fs.h
index d6cb42b7e91c..2b03d5e375d7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1066,6 +1066,7 @@ bool opens_in_grace(struct net *);
/* that will die - we need it for nfs_lock_info */
#include <linux/nfs_fs_i.h>
+#include <linux/ceph/ceph_fs_fl.h>
/*
* struct file_lock represents a generic "file lock". It's used to represent
@@ -1119,6 +1120,7 @@ struct file_lock {
int state; /* state of grant or error if -ve */
unsigned int debug_id;
} afs;
+ struct ceph_lock_info ceph_fl;
} fl_u;
} __randomize_layout;
--
2.31.1
From: Jan Dabros <jsd(a)semihalf.com>
Currently tpm transactions are executed unconditionally in the
tpm_pm_suspend() function, which may lead to races with other tpm
accessors in the system. Specifically, the hw_random tpm driver makes
use of tpm_get_random(), and this function is called in a loop from a
kthread, which means it's not frozen alongside userspace, and so can
race with the work done during system suspend:
[ 3.277834] tpm tpm0: tpm_transmit: tpm_recv: error -52
[ 3.278437] tpm tpm0: invalid TPM_STS.x 0xff, dumping stack for forensics
[ 3.278445] CPU: 0 PID: 1 Comm: init Not tainted 6.1.0-rc5+ #135
[ 3.278450] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 3.278453] Call Trace:
[ 3.278458] <TASK>
[ 3.278460] dump_stack_lvl+0x34/0x44
[ 3.278471] tpm_tis_status.cold+0x19/0x20
[ 3.278479] tpm_transmit+0x13b/0x390
[ 3.278489] tpm_transmit_cmd+0x20/0x80
[ 3.278496] tpm1_pm_suspend+0xa6/0x110
[ 3.278503] tpm_pm_suspend+0x53/0x80
[ 3.278510] __pnp_bus_suspend+0x35/0xe0
[ 3.278515] ? pnp_bus_freeze+0x10/0x10
[ 3.278519] __device_suspend+0x10f/0x350
Fix this by calling tpm_try_get_ops(), which itself is a wrapper around
tpm_chip_start(), but takes the appropriate mutex.
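For context, tpm_try_get_ops() looks roughly like the following
(reconstructed from memory of drivers/char/tpm/tpm-chip.c, so details may
differ); the key point is that it takes chip->tpm_mutex before starting
the chip, so the suspend path can no longer race with the hwrng kthread's
transactions:

int tpm_try_get_ops(struct tpm_chip *chip)
{
	int rc = -EIO;

	get_device(&chip->dev);

	down_read(&chip->ops_sem);
	if (!chip->ops)
		goto out_ops;

	mutex_lock(&chip->tpm_mutex);
	rc = tpm_chip_start(chip);
	if (rc)
		goto out_lock;

	return 0;
out_lock:
	mutex_unlock(&chip->tpm_mutex);
out_ops:
	up_read(&chip->ops_sem);
	put_device(&chip->dev);
	return rc;
}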
Signed-off-by: Jan Dabros <jsd(a)semihalf.com>
Reported-by: Vlastimil Babka <vbabka(a)suse.cz>
Tested-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
Tested-by: Vlastimil Babka <vbabka(a)suse.cz>
Link: https://lore.kernel.org/all/c5ba47ef-393f-1fba-30bd-1230d1b4b592@suse.cz/
Cc: stable(a)vger.kernel.org
Fixes: e891db1a18bf ("tpm: turn on TPM on suspend for TPM 1.x")
[Jason: reworked commit message, added metadata]
Signed-off-by: Jason A. Donenfeld <Jason(a)zx2c4.com>
---
drivers/char/tpm/tpm-interface.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
index 1621ce818705..d69905233aff 100644
--- a/drivers/char/tpm/tpm-interface.c
+++ b/drivers/char/tpm/tpm-interface.c
@@ -401,13 +401,14 @@ int tpm_pm_suspend(struct device *dev)
!pm_suspend_via_firmware())
goto suspended;
- if (!tpm_chip_start(chip)) {
+ rc = tpm_try_get_ops(chip);
+ if (!rc) {
if (chip->flags & TPM_CHIP_FLAG_TPM2)
tpm2_shutdown(chip, TPM2_SU_STATE);
else
rc = tpm1_pm_suspend(chip, tpm_suspend_pcr);
- tpm_chip_stop(chip);
+ tpm_put_ops(chip);
}
suspended:
--
2.38.1
When bfqq is shared by multiple processes it can happen that one of the
processes gets moved to a different cgroup (or just starts submitting IO
for a different cgroup). If that happens, we need to split the merged
bfqq, as otherwise we will have IO for multiple cgroups in one bfqq and
will account IO time to the wrong entities, etc.
Similarly, if the bfqq is scheduled to merge with another bfqq but the
merge hasn't happened yet, cancel the merge as it may no longer be
valid.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
block/bfq-cgroup.c | 36 +++++++++++++++++++++++++++++++++---
block/bfq-iosched.c | 2 +-
block/bfq-iosched.h | 1 +
3 files changed, 35 insertions(+), 4 deletions(-)
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 420eda2589c0..9352f3cc2377 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -743,9 +743,39 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
}
if (sync_bfqq) {
- entity = &sync_bfqq->entity;
- if (entity->sched_data != &bfqg->sched_data)
- bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+ if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
+ /* We are the only user of this bfqq, just move it */
+ if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
+ bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+ } else {
+ struct bfq_queue *bfqq;
+
+ /*
+ * The queue was merged to a different queue. Check
+ * that the merge chain still belongs to the same
+ * cgroup.
+ */
+ for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
+ if (bfqq->entity.sched_data !=
+ &bfqg->sched_data)
+ break;
+ if (bfqq) {
+ /*
+ * Some queue changed cgroup so the merge is
+ * not valid anymore. We cannot easily just
+ * cancel the merge (by clearing new_bfqq) as
+ * there may be other processes using this
+ * queue and holding refs to all queues below
+ * sync_bfqq->new_bfqq. Similarly if the merge
+ * already happened, we need to detach from
+ * bfqq now so that we cannot merge bio to a
+ * request from the old cgroup.
+ */
+ bfq_put_cooperator(sync_bfqq);
+ bfq_release_process_ref(bfqd, sync_bfqq);
+ bic_set_bfqq(bic, NULL, 1);
+ }
+ }
}
return bfqg;
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7d00b21ebe5d..89fe3f85eb3c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5315,7 +5315,7 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq)
bfq_put_queue(bfqq);
}
-static void bfq_put_cooperator(struct bfq_queue *bfqq)
+void bfq_put_cooperator(struct bfq_queue *bfqq)
{
struct bfq_queue *__bfqq, *next;
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3b83e3d1c2e5..a56763045d19 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -979,6 +979,7 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool compensate, enum bfqq_expiration reason);
void bfq_put_queue(struct bfq_queue *bfqq);
+void bfq_put_cooperator(struct bfq_queue *bfqq);
void bfq_end_wr_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
void bfq_release_process_ref(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_schedule_dispatch(struct bfq_data *bfqd);
--
2.34.1
test_bpf tail call tests end up as:
test_bpf: #0 Tail call leaf jited:1 85 PASS
test_bpf: #1 Tail call 2 jited:1 111 PASS
test_bpf: #2 Tail call 3 jited:1 145 PASS
test_bpf: #3 Tail call 4 jited:1 170 PASS
test_bpf: #4 Tail call load/store leaf jited:1 190 PASS
test_bpf: #5 Tail call load/store jited:1
BUG: Unable to handle kernel data access on write at 0xf1b4e000
Faulting instruction address: 0xbe86b710
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=4K MMU=Hash PowerMac
Modules linked in: test_bpf(+)
CPU: 0 PID: 97 Comm: insmod Not tainted 6.1.0-rc4+ #195
Hardware name: PowerMac3,1 750CL 0x87210 PowerMac
NIP: be86b710 LR: be857e88 CTR: be86b704
REGS: f1b4df20 TRAP: 0300 Not tainted (6.1.0-rc4+)
MSR: 00009032 <EE,ME,IR,DR,RI> CR: 28008242 XER: 00000000
DAR: f1b4e000 DSISR: 42000000
GPR00: 00000001 f1b4dfe0 c11d2280 00000000 00000000 00000000 00000002 00000000
GPR08: f1b4e000 be86b704 f1b4e000 00000000 00000000 100d816a f2440000 fe73baa8
GPR16: f2458000 00000000 c1941ae4 f1fe2248 00000045 c0de0000 f2458030 00000000
GPR24: 000003e8 0000000f f2458000 f1b4dc90 3e584b46 00000000 f24466a0 c1941a00
NIP [be86b710] 0xbe86b710
LR [be857e88] __run_one+0xec/0x264 [test_bpf]
Call Trace:
[f1b4dfe0] [00000002] 0x2 (unreliable)
Instruction dump:
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
---[ end trace 0000000000000000 ]---
This is an attempt to write above the stack. The problem is encountered
with tests added by commit 38608ee7b690 ("bpf, tests: Add load store
test case for tail call").
This happens because the tail call is done to a BPF prog with a
different stack_depth. At the moment, the stack is kept as-is when the
caller tail calls its callee. But at exit, the callee restores the stack
on its own properties. Therefore here, at each run, r1 is erroneously
increased by 32 - 16 = 16 bytes.
This was done that way in order to pass the tail call count from caller
to callee through the stack. As powerpc32 doesn't have a red zone in
the stack, it was necessary to maintain the stack as-is for the tail
call. But it was not anticipated that the BPF frame size could be
different.
Let's take a new approach. Use register r4 to carry the tail call count
during the tail call, and save it into the stack at function entry if
required. This means the input parameter must be in r3, which is more
correct as it is a 32-bit parameter; a tail call then better matches a
normal BPF function entry, the downside being that we move that input
parameter back and forth between r3 and r4. That can be optimised later.
Doing that also has the advantage of maximising the common parts between
tail calls and a normal function exit.
With the fix, tail call tests are now successful:
test_bpf: #0 Tail call leaf jited:1 53 PASS
test_bpf: #1 Tail call 2 jited:1 115 PASS
test_bpf: #2 Tail call 3 jited:1 154 PASS
test_bpf: #3 Tail call 4 jited:1 165 PASS
test_bpf: #4 Tail call load/store leaf jited:1 101 PASS
test_bpf: #5 Tail call load/store jited:1 141 PASS
test_bpf: #6 Tail call error path, max count reached jited:1 994 PASS
test_bpf: #7 Tail call count preserved across function calls jited:1 140975 PASS
test_bpf: #8 Tail call error path, NULL target jited:1 110 PASS
test_bpf: #9 Tail call error path, index out of range jited:1 69 PASS
test_bpf: test_tail_calls: Summary: 10 PASSED, 0 FAILED, [10/10 JIT'ed]
Suggested-by: Naveen N. Rao <naveen.n.rao(a)linux.vnet.ibm.com>
Fixes: 51c66ad849a7 ("powerpc/bpf: Implement extended BPF on PPC32")
Cc: stable(a)vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy(a)csgroup.eu>
---
v2: Using r4 for tcc as suggested by Naveen.
---
arch/powerpc/net/bpf_jit_comp32.c | 52 +++++++++++++------------------
1 file changed, 21 insertions(+), 31 deletions(-)
diff --git a/arch/powerpc/net/bpf_jit_comp32.c b/arch/powerpc/net/bpf_jit_comp32.c
index 43f1c76d48ce..a379b0ce19ff 100644
--- a/arch/powerpc/net/bpf_jit_comp32.c
+++ b/arch/powerpc/net/bpf_jit_comp32.c
@@ -113,23 +113,19 @@ void bpf_jit_build_prologue(u32 *image, struct codegen_context *ctx)
{
int i;
- /* First arg comes in as a 32 bits pointer. */
- EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
- EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
+ /* Initialize tail_call_cnt, to be skipped if we do tail calls. */
+ EMIT(PPC_RAW_LI(_R4, 0));
+
+#define BPF_TAILCALL_PROLOGUE_SIZE 4
+
EMIT(PPC_RAW_STWU(_R1, _R1, -BPF_PPC_STACKFRAME(ctx)));
- /*
- * Initialize tail_call_cnt in stack frame if we do tail calls.
- * Otherwise, put in NOPs so that it can be skipped when we are
- * invoked through a tail call.
- */
if (ctx->seen & SEEN_TAILCALL)
- EMIT(PPC_RAW_STW(bpf_to_ppc(BPF_REG_1) - 1, _R1,
- bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
- else
- EMIT(PPC_RAW_NOP());
+ EMIT(PPC_RAW_STW(_R4, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
-#define BPF_TAILCALL_PROLOGUE_SIZE 16
+ /* First arg comes in as a 32 bits pointer. */
+ EMIT(PPC_RAW_MR(bpf_to_ppc(BPF_REG_1), _R3));
+ EMIT(PPC_RAW_LI(bpf_to_ppc(BPF_REG_1) - 1, 0));
/*
* We need a stack frame, but we don't necessarily need to
@@ -170,24 +166,24 @@ static void bpf_jit_emit_common_epilogue(u32 *image, struct codegen_context *ctx
for (i = BPF_PPC_NVR_MIN; i <= 31; i++)
if (bpf_is_seen_register(ctx, i))
EMIT(PPC_RAW_LWZ(i, _R1, bpf_jit_stack_offsetof(ctx, i)));
-}
-
-void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
-{
- EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
-
- bpf_jit_emit_common_epilogue(image, ctx);
-
- /* Tear down our stack frame */
if (ctx->seen & SEEN_FUNC)
EMIT(PPC_RAW_LWZ(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
+ /* Tear down our stack frame */
EMIT(PPC_RAW_ADDI(_R1, _R1, BPF_PPC_STACKFRAME(ctx)));
if (ctx->seen & SEEN_FUNC)
EMIT(PPC_RAW_MTLR(_R0));
+}
+
+void bpf_jit_build_epilogue(u32 *image, struct codegen_context *ctx)
+{
+ EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_0)));
+
+ bpf_jit_emit_common_epilogue(image, ctx);
+
EMIT(PPC_RAW_BLR());
}
@@ -244,7 +240,6 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
EMIT(PPC_RAW_RLWINM(_R3, b2p_index, 2, 0, 29));
EMIT(PPC_RAW_ADD(_R3, _R3, b2p_bpf_array));
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_array, ptrs)));
- EMIT(PPC_RAW_STW(_R0, _R1, bpf_jit_stack_offsetof(ctx, BPF_PPC_TC)));
/*
* if (prog == NULL)
@@ -255,19 +250,14 @@ static int bpf_jit_emit_tail_call(u32 *image, struct codegen_context *ctx, u32 o
/* goto *(prog->bpf_func + prologue_size); */
EMIT(PPC_RAW_LWZ(_R3, _R3, offsetof(struct bpf_prog, bpf_func)));
-
- if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_LWZ(_R0, _R1, BPF_PPC_STACKFRAME(ctx) + PPC_LR_STKOFF));
-
EMIT(PPC_RAW_ADDIC(_R3, _R3, BPF_TAILCALL_PROLOGUE_SIZE));
-
- if (ctx->seen & SEEN_FUNC)
- EMIT(PPC_RAW_MTLR(_R0));
-
EMIT(PPC_RAW_MTCTR(_R3));
EMIT(PPC_RAW_MR(_R3, bpf_to_ppc(BPF_REG_1)));
+ /* Put tail_call_cnt in r4 */
+ EMIT(PPC_RAW_MR(_R4, _R0));
+
/* tear restore NVRs, ... */
bpf_jit_emit_common_epilogue(image, ctx);
--
2.38.1
From: Eric Biggers <ebiggers(a)google.com>
Mounting a filesystem whose journal inode has the encrypt flag causes a
NULL dereference in fscrypt_limit_io_blocks() when the 'inlinecrypt'
mount option is used.
The problem is that when jbd2_journal_init_inode() calls bmap(), it
eventually finds its way into ext4_iomap_begin(), which calls
fscrypt_limit_io_blocks(). fscrypt_limit_io_blocks() requires that if
the inode is encrypted, then its encryption key must already be set up.
That's not the case here, since the journal inode is never "opened" like
a normal file would be. Hence the crash.
A reproducer is:
mkfs.ext4 -F /dev/vdb
debugfs -w /dev/vdb -R "set_inode_field <8> flags 0x80808"
mount /dev/vdb /mnt -o inlinecrypt
To fix this, make ext4 consider journal inodes with the encrypt flag to
be invalid. (Note, maybe other flags should be rejected on the journal
inode too. For now, this is just the minimal fix for the above issue.)
I've marked this as fixing the commit that introduced the call to
fscrypt_limit_io_blocks(), since that's what made an actual crash start
being possible. But this fix could be applied to any version of ext4
that supports the encrypt feature.
Reported-by: syzbot+ba9dac45bc76c490b7c3(a)syzkaller.appspotmail.com
Fixes: 38ea50daa7a4 ("ext4: support direct I/O with fscrypt using blk-crypto")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
---
fs/ext4/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 7950904fbf04f..2274f730b87e5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -5723,7 +5723,7 @@ static struct inode *ext4_get_journal_inode(struct super_block *sb,
ext4_debug("Journal inode found at %p: %lld bytes\n",
journal_inode, journal_inode->i_size);
- if (!S_ISREG(journal_inode->i_mode)) {
+ if (!S_ISREG(journal_inode->i_mode) || IS_ENCRYPTED(journal_inode)) {
ext4_msg(sb, KERN_ERR, "invalid journal inode");
iput(journal_inode);
return NULL;
base-commit: 8f71a2b3f435f29b787537d1abedaa7d8ebe6647
--
2.38.1
The ICC_BWMON driver uses REGMAP_MMIO for accessing the hardware registers,
so select that dependency in Kconfig. Without it, there will be errors
while building the driver with COMPILE_TEST only:
ERROR: modpost: "__devm_regmap_init_mmio_clk" [drivers/soc/qcom/icc-bwmon.ko] undefined!
make[1]: *** [scripts/Makefile.modpost:126: Module.symvers] Error 1
make: *** [Makefile:1944: modpost] Error 2
Cc: <stable(a)vger.kernel.org> # 6.0
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Fixes: b9c2ae6cac40 ("soc: qcom: icc-bwmon: Add bandwidth monitoring driver")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/soc/qcom/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig
index 024e420f1bb7..75bfdb6f9705 100644
--- a/drivers/soc/qcom/Kconfig
+++ b/drivers/soc/qcom/Kconfig
@@ -236,6 +236,7 @@ config QCOM_ICC_BWMON
tristate "QCOM Interconnect Bandwidth Monitor driver"
depends on ARCH_QCOM || COMPILE_TEST
select PM_OPP
+ select REGMAP_MMIO
help
Sets up driver monitoring bandwidth on various interconnects and
based on that voting for interconnect bandwidth, adjusting their
--
2.25.1