From: Petr Pavlu <petr.pavlu(a)suse.com>
The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent, page->prev->next or
page->next->prev might not be equal back to page for some page in the
ring buffer.
The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:
[ 190.271762] ------------[ cut here ]------------
[ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[ 190.271789] Modules linked in: [...]
[ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[ 190.272023] Code: [...]
[ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 190.272077] Call Trace:
[ 190.272098] <TASK>
[ 190.272189] ring_buffer_resize+0x2ab/0x460
[ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0
[ 190.272206] tracing_resize_ring_buffer+0x65/0x90
[ 190.272216] tracing_entries_write+0x74/0xc0
[ 190.272225] vfs_write+0xf5/0x420
[ 190.272248] ksys_write+0x67/0xe0
[ 190.272256] do_syscall_64+0x82/0x170
[ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 190.272373] RIP: 0033:0x7f1bd657d263
[ 190.272381] Code: [...]
[ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[ 190.272412] </TASK>
[ 190.272414] ---[ end trace 0000000000000000 ]---
Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") causes that it is
now always the case which makes it more likely to experience this issue.
The window to hit this race is nonetheless very small. To help
reproducing it, one can add a delay loop in rb_get_reader_page():
ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
if (!ret)
goto spin;
for (unsigned i = 0; i < 1U << 26; i++) /* inserted delay loop */
__asm__ __volatile__ ("" : : : "memory");
rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;
.. and then run the following commands on the target system:
echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable
while true; do
echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
done &
while true; do
for i in /sys/kernel/tracing/per_cpu/*; do
timeout 0.1 cat $i/trace_pipe; sleep 0.2
done
done
To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavl…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 42227727a49d..28853966aa9a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1460,6 +1460,11 @@ static void rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer,
*
* As a safety measure we check to make sure the data pages have not
* been corrupted.
+ *
+ * Callers of this function need to guarantee that the list of pages doesn't get
+ * modified during the check. In particular, if it's possible that the function
+ * is invoked with concurrent readers which can swap in a new reader page then
+ * the caller should take cpu_buffer->reader_lock.
*/
static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
@@ -2210,8 +2215,12 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
*/
synchronize_rcu();
for_each_buffer_cpu(buffer, cpu) {
+ unsigned long flags;
+
cpu_buffer = buffer->buffers[cpu];
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
rb_check_pages(cpu_buffer);
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
}
atomic_dec(&buffer->record_disabled);
}
--
2.43.0
Hello,
Upstream commit 11fbb1bfb5bc8c98b2d7db9da332b5e568f4aaab ("ice: use
relative VSI index for VFs VSIs") was applied to stable 6.1, 6.6 and 6.8:
6.1: 5693dd6d3d01f0eea24401f815c98b64cb315b67
6.6: c926393dc3442c38fdcab17d040837cf4acad1c3
6.8: d3da0d4d9fb472ad7dccb784f3d9de40d0c2f6a9
However, it was a part of a series submitted to net-next [1]. Applying
this one patch on its own broke the VF devices created with the ice as a PF:
# [ 307.688237] iavf: Intel(R) Ethernet Adaptive Virtual Function
Network Driver
# [ 307.688241] Copyright (c) 2013 - 2018 Intel Corporation.
# [ 307.688424] iavf 0000:af:01.0: enabling device (0000 -> 0002)
# [ 307.758860] iavf 0000:af:01.0: Invalid MAC address
00:00:00:00:00:00, using random
# [ 307.759628] iavf 0000:af:01.0: Multiqueue Enabled: Queue pair
count = 16
# [ 307.759683] iavf 0000:af:01.0: MAC address: 6a:46:83:88:c2:26
# [ 307.759688] iavf 0000:af:01.0: GRO is enabled
# [ 307.790937] iavf 0000:af:01.0 ens802f0v0: renamed from eth0
# [ 307.896041] iavf 0000:af:01.0: PF returned error -5
(IAVF_ERR_PARAM) to our request 6
# [ 307.916967] iavf 0000:af:01.0: PF returned error -5
(IAVF_ERR_PARAM) to our request 8
The VF initialization fails and the VF device is completely unusable.
This can be fixed either by:
1 - Reverting the above mentioned commit (upstream
11fbb1bfb5bc8c98b2d7db9da332b5e568f4aaab)
Or,
2 - applying the following upstream commits (part of the series):
a) a21605993dd5dfd15edfa7f06705ede17b519026 ("ice: pass VSI pointer
into ice_vc_isvalid_q_id")
b) 363f689600dd010703ce6391bcfc729a97d21840 ("ice: remove unnecessary
duplicate checks for VF VSI ID")
Thanks,
Ahmed
[1]: https://www.spinics.net/lists/netdev/msg979289.html
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When a remount happens, if a gid or uid is specified update the inodes to
have the same gid and uid. This will allow the simplification of the
permissions logic for the dynamically created files and directories.
Cc: stable(a)vger.kernel.org
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 17 +++++++++++++----
fs/tracefs/inode.c | 15 ++++++++++++---
2 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 55a40a730b10..5dfb1ccd56ea 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -317,20 +317,29 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
if (!ei)
return;
- if (update_uid)
+ if (update_uid) {
ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = ti->vfs_inode.i_uid;
+ }
+
- if (update_gid)
+ if (update_gid) {
ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = ti->vfs_inode.i_gid;
+ }
if (!ei->entry_attrs)
return;
for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid)
+ if (update_uid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- if (update_gid)
+ ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
+ }
+ if (update_gid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
+ }
}
}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index a827f6a716c4..9252e0d78ea2 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -373,12 +373,21 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
rcu_read_lock();
list_for_each_entry_rcu(ti, &tracefs_inodes, list) {
- if (update_uid)
+ if (update_uid) {
ti->flags &= ~TRACEFS_UID_PERM_SET;
+ ti->vfs_inode.i_uid = fsi->uid;
+ }
- if (update_gid)
+ if (update_gid) {
ti->flags &= ~TRACEFS_GID_PERM_SET;
-
+ ti->vfs_inode.i_gid = fsi->gid;
+ }
+
+ /*
+ * Note, the above ti->vfs_inode updates are
+ * used in eventfs_remount() so they must come
+ * before calling it.
+ */
if (ti->flags & TRACEFS_EVENT_INODE)
eventfs_remount(ti, update_uid, update_gid);
}
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The directories require unique inode numbers but all the eventfs files
have the same inode number. Prevent the directories from having the same
inode numbers as the files as that can confuse some tooling.
Cc: stable(a)vger.kernel.org
Fixes: 834bf76add3e6 ("eventfs: Save directory inodes in the eventfs_inode structure")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0256afdd4acf..55a40a730b10 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -50,8 +50,12 @@ static struct eventfs_root_inode *get_root_inode(struct eventfs_inode *ei)
/* Just try to make something consistent and unique */
static int eventfs_dir_ino(struct eventfs_inode *ei)
{
- if (!ei->ino)
+ if (!ei->ino) {
ei->ino = get_next_ino();
+ /* Must not have the file inode number */
+ if (ei->ino == EVENTFS_FILE_INODE_INO)
+ ei->ino = get_next_ino();
+ }
return ei->ino;
}
--
2.43.0
From: Baokun Li <libaokun1(a)huawei.com>
commit d36f6ed761b53933b0b4126486c10d3da7751e7f upstream.
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/extents_status.c:199!
[...]
RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
[...]
Call Trace:
ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
ext4_quota_enable fs/ext4/super.c:6137 [inline]
ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
mount_bdev+0x2e9/0x3b0 fs/super.c:1158
mount_fs+0x4b/0x1e4 fs/super.c:1261
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
ext4_enable_quotas
ext4_quota_enable
ext4_iget
__ext4_iget
ext4_ext_check_inode
ext4_ext_check
__ext4_ext_check
ext4_valid_extent_entries
Check for overlapping extents does't take effect
dquot_enable
vfs_load_quota_inode
v2_check_quota_file
v2_read_header
ext4_quota_read
ext4_bread
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_find_extent
ext4_cache_extents
ext4_es_cache_extent
ext4_es_cache_extent
__es_tree_search
ext4_es_end
BUG_ON(es->es_lblk + es->es_len < es->es_lblk)
The error ext4 extents is as follows:
0af3 0300 0400 0000 00000000 extent_header
00000000 0100 0000 12000000 extent1
00000000 0100 0000 18000000 extent2
02000000 0400 0000 14000000 extent3
In the ext4_valid_extent_entries function,
if prev is 0, no error is returned even if lblock<=prev.
This was intended to skip the check on the first extent, but
in the error image above, prev=0+1-1=0 when checking the second extent,
so even though lblock<=prev, the function does not return an error.
As a result, bug_ON occurs in __es_tree_search and the system panics.
To solve this problem, we only need to check that:
1. The lblock of the first extent is not less than 0.
2. The lblock of the next extent is not less than
the next block of the previous extent.
The same applies to extent_idx.
Cc: stable(a)kernel.org
Fixes: 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Reported-by: syzbot+2a58d88f0fb315c85363(a)syzkaller.appspotmail.com
[gpiccoli: Manual backport due to unrelated missing patches.]
Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
---
Hey folks, this one should have been backported but due to merge
issues [0], it ended-up not being on 5.4.y . So here is a working version!
Cheers,
Guilherme
[0] https://lore.kernel.org/stable/165451751147179@kroah.com/
fs/ext4/extents.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 98e1b1ddb4ec..90b12c7c0f20 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -409,7 +409,7 @@ static int ext4_valid_extent_entries(struct inode *inode,
{
unsigned short entries;
ext4_lblk_t lblock = 0;
- ext4_lblk_t prev = 0;
+ ext4_lblk_t cur = 0;
if (eh->eh_entries == 0)
return 1;
@@ -435,12 +435,12 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping extents */
lblock = le32_to_cpu(ext->ee_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
pblock = ext4_ext_pblock(ext);
es->s_last_error_block = cpu_to_le64(pblock);
return 0;
}
- prev = lblock + ext4_ext_get_actual_len(ext) - 1;
+ cur = lblock + ext4_ext_get_actual_len(ext);
ext++;
entries--;
}
@@ -460,13 +460,13 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping index extents */
lblock = le32_to_cpu(ext_idx->ei_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_idx_pblock(ext_idx);
return 0;
}
ext_idx++;
entries--;
- prev = lblock;
+ cur = lblock + 1;
}
}
return 1;
--
2.43.2
From: NeilBrown <neilb(a)suse.de>
[ Upstream commit 3903902401451b1cd9d797a8c79769eb26ac7fe5 ]
The original implementation of nfsd used signals to stop threads during
shutdown.
In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
internally it if was asked to run "0" threads. After this user-space
transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
threads was no longer an important part of the API.
In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename
svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
use of signals for stopping threads, using kthread_stop() instead.
This patch makes the "obvious" next step and removes the ability to
signal nfsd threads - or any svc threads. nfsd stops allowing signals
and we don't check for their delivery any more.
This will allow for some simplification in later patches.
A change worth noting is in nfsd4_ssc_setup_dul(). There was previously
a signal_pending() check which would only succeed when the thread was
being shut down. It should really have tested kthread_should_stop() as
well. Now it just does the latter, not the former.
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/nfs/callback.c | 9 +--------
fs/nfsd/nfs4proc.c | 5 ++---
fs/nfsd/nfssvc.c | 12 ------------
net/sunrpc/svc_xprt.c | 16 ++++++----------
4 files changed, 9 insertions(+), 33 deletions(-)
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 456af7d230cf..46a0a2d6962e 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -80,9 +80,6 @@ nfs4_callback_svc(void *vrqstp)
set_freezable();
while (!kthread_freezable_should_stop(NULL)) {
-
- if (signal_pending(current))
- flush_signals(current);
/*
* Listen for a request on the socket
*/
@@ -112,11 +109,7 @@ nfs41_callback_svc(void *vrqstp)
set_freezable();
while (!kthread_freezable_should_stop(NULL)) {
-
- if (signal_pending(current))
- flush_signals(current);
-
- prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_IDLE);
spin_lock_bh(&serv->sv_cb_lock);
if (!list_empty(&serv->sv_cb_list)) {
req = list_first_entry(&serv->sv_cb_list,
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index c14f5ac1484c..6779291efca9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1317,12 +1317,11 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
/* found a match */
if (ni->nsui_busy) {
/* wait - and try again */
- prepare_to_wait(&nn->nfsd_ssc_waitq, &wait,
- TASK_INTERRUPTIBLE);
+ prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, TASK_IDLE);
spin_unlock(&nn->nfsd_ssc_lock);
/* allow 20secs for mount/unmount for now - revisit */
- if (signal_pending(current) ||
+ if (kthread_should_stop() ||
(schedule_timeout(20*HZ) == 0)) {
finish_wait(&nn->nfsd_ssc_waitq, &wait);
kfree(work);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 4c1a0a1623e5..3d4fd40c987b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -938,15 +938,6 @@ nfsd(void *vrqstp)
current->fs->umask = 0;
- /*
- * thread is spawned with all signals set to SIG_IGN, re-enable
- * the ones that will bring down the thread
- */
- allow_signal(SIGKILL);
- allow_signal(SIGHUP);
- allow_signal(SIGINT);
- allow_signal(SIGQUIT);
-
atomic_inc(&nfsdstats.th_cnt);
set_freezable();
@@ -971,9 +962,6 @@ nfsd(void *vrqstp)
validate_process_creds();
}
- /* Clear signals before calling svc_exit_thread() */
- flush_signals(current);
-
atomic_dec(&nfsdstats.th_cnt);
out:
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 67ccf1a6459a..b19592673eef 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -700,8 +700,8 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
/* Made progress, don't sleep yet */
continue;
- set_current_state(TASK_INTERRUPTIBLE);
- if (signalled() || kthread_should_stop()) {
+ set_current_state(TASK_IDLE);
+ if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
return -EINTR;
}
@@ -736,7 +736,7 @@ rqst_should_sleep(struct svc_rqst *rqstp)
return false;
/* are we shutting down? */
- if (signalled() || kthread_should_stop())
+ if (kthread_should_stop())
return false;
/* are we freezing? */
@@ -758,11 +758,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
if (rqstp->rq_xprt)
goto out_found;
- /*
- * We have to be able to interrupt this wait
- * to bring down the daemons ...
- */
- set_current_state(TASK_INTERRUPTIBLE);
+ set_current_state(TASK_IDLE);
smp_mb__before_atomic();
clear_bit(SP_CONGESTED, &pool->sp_flags);
clear_bit(RQ_BUSY, &rqstp->rq_flags);
@@ -784,7 +780,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
if (!time_left)
atomic_long_inc(&pool->sp_stats.threads_timedout);
- if (signalled() || kthread_should_stop())
+ if (kthread_should_stop())
return ERR_PTR(-EINTR);
return ERR_PTR(-EAGAIN);
out_found:
@@ -882,7 +878,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
try_to_freeze();
cond_resched();
err = -EINTR;
- if (signalled() || kthread_should_stop())
+ if (kthread_should_stop())
goto out;
xprt = svc_get_next_xprt(rqstp, timeout);
base-commit: b925f60c6ee7ec871d2d48575d0fde3872129c20
--
2.44.0
Hello,
Please pickup commit c79e387389d5add7cb967d2f7622c3bf5550927b ("mfd: stpmic1: Fix swapped mask/unmask in irq chip")
for inclusion in stable kernel 6.1.y.
This fixes this warning at boot:
stpmic1 [...]: mask_base and unmask_base are inverted, please fix it
It also avoid to invert masks later in IRQ framework so regression risks should be minimal.
Thanks!
--
Yoann Congal
Smile ECS - Tech Expert
Hi,
Please backport commit:
ecfe9a015d3e ("pinctrl: core: handle radix_tree_insert() errors in pinctrl_register_one_pin()")
to stable trees 5.4.y, 5.10.y, 5.15.y, 6.1.y. This commit fixes error handling of radix_tree_insert().
This bug was discovered and resolved using Coverity Static Analysis
Security Testing (SAST) by Synopsys, Inc.
Thanks,
Hagar Hemdan
Amazon Web Services Development Center Germany GmbH
Krausenstr. 38
10117 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 317a215d4932
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052216-jaws-pester-a65d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 317a215d493230da361028ea8a4675de334bfa1a Mon Sep 17 00:00:00 2001
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Date: Mon, 13 May 2024 16:39:22 +0200
Subject: [PATCH] net: ks8851: Fix another TX stall caused by wrong ISR flag
handling
Under some circumstances it may happen that the ks8851 Ethernet driver
stops sending data.
Currently the interrupt handler resets the interrupt status flags in the
hardware after handling TX. With this approach we may lose interrupts in
the time window between handling the TX interrupt and resetting the TX
interrupt status bit.
When all of the three following conditions are true then transmitting
data stops:
- TX queue is stopped to wait for room in the hardware TX buffer
- no queued SKBs in the driver (txq) that wait for being written to hw
- hardware TX buffer is empty and the last TX interrupt was lost
This is because reenabling the TX queue happens when handling the TX
interrupt status but if the TX status bit has already been cleared then
this interrupt will never come.
With this commit the interrupt status flags will be cleared before they
are handled. That way we stop losing interrupts.
The wrong handling of the ISR flags was there from the beginning but
with commit 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX
buffer overrun") the issue becomes apparent.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 502518cdb461..6453c92f0fa7 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -328,7 +328,6 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
{
struct ks8851_net *ks = _ks;
struct sk_buff_head rxq;
- unsigned handled = 0;
unsigned long flags;
unsigned int status;
struct sk_buff *skb;
@@ -336,24 +335,17 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
ks8851_lock(ks, &flags);
status = ks8851_rdreg16(ks, KS_ISR);
+ ks8851_wrreg16(ks, KS_ISR, status);
netif_dbg(ks, intr, ks->netdev,
"%s: status 0x%04x\n", __func__, status);
- if (status & IRQ_LCI)
- handled |= IRQ_LCI;
-
if (status & IRQ_LDI) {
u16 pmecr = ks8851_rdreg16(ks, KS_PMECR);
pmecr &= ~PMECR_WKEVT_MASK;
ks8851_wrreg16(ks, KS_PMECR, pmecr | PMECR_WKEVT_LINK);
-
- handled |= IRQ_LDI;
}
- if (status & IRQ_RXPSI)
- handled |= IRQ_RXPSI;
-
if (status & IRQ_TXI) {
unsigned short tx_space = ks8851_rdreg16(ks, KS_TXMIR);
@@ -365,20 +357,12 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
if (netif_queue_stopped(ks->netdev))
netif_wake_queue(ks->netdev);
spin_unlock(&ks->statelock);
-
- handled |= IRQ_TXI;
}
- if (status & IRQ_RXI)
- handled |= IRQ_RXI;
-
if (status & IRQ_SPIBEI) {
netdev_err(ks->netdev, "%s: spi bus error\n", __func__);
- handled |= IRQ_SPIBEI;
}
- ks8851_wrreg16(ks, KS_ISR, handled);
-
if (status & IRQ_RXI) {
/* the datasheet says to disable the rx interrupt during
* packet read-out, however we're masking the interrupt
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 317a215d4932
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024052215-epidemic-outpour-6c88@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 317a215d493230da361028ea8a4675de334bfa1a Mon Sep 17 00:00:00 2001
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Date: Mon, 13 May 2024 16:39:22 +0200
Subject: [PATCH] net: ks8851: Fix another TX stall caused by wrong ISR flag
handling
Under some circumstances it may happen that the ks8851 Ethernet driver
stops sending data.
Currently the interrupt handler resets the interrupt status flags in the
hardware after handling TX. With this approach we may lose interrupts in
the time window between handling the TX interrupt and resetting the TX
interrupt status bit.
When all of the three following conditions are true then transmitting
data stops:
- TX queue is stopped to wait for room in the hardware TX buffer
- no queued SKBs in the driver (txq) that wait for being written to hw
- hardware TX buffer is empty and the last TX interrupt was lost
This is because reenabling the TX queue happens when handling the TX
interrupt status but if the TX status bit has already been cleared then
this interrupt will never come.
With this commit the interrupt status flags will be cleared before they
are handled. That way we stop losing interrupts.
The wrong handling of the ISR flags was there from the beginning but
with commit 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX
buffer overrun") the issue becomes apparent.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 502518cdb461..6453c92f0fa7 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -328,7 +328,6 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
{
struct ks8851_net *ks = _ks;
struct sk_buff_head rxq;
- unsigned handled = 0;
unsigned long flags;
unsigned int status;
struct sk_buff *skb;
@@ -336,24 +335,17 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
ks8851_lock(ks, &flags);
status = ks8851_rdreg16(ks, KS_ISR);
+ ks8851_wrreg16(ks, KS_ISR, status);
netif_dbg(ks, intr, ks->netdev,
"%s: status 0x%04x\n", __func__, status);
- if (status & IRQ_LCI)
- handled |= IRQ_LCI;
-
if (status & IRQ_LDI) {
u16 pmecr = ks8851_rdreg16(ks, KS_PMECR);
pmecr &= ~PMECR_WKEVT_MASK;
ks8851_wrreg16(ks, KS_PMECR, pmecr | PMECR_WKEVT_LINK);
-
- handled |= IRQ_LDI;
}
- if (status & IRQ_RXPSI)
- handled |= IRQ_RXPSI;
-
if (status & IRQ_TXI) {
unsigned short tx_space = ks8851_rdreg16(ks, KS_TXMIR);
@@ -365,20 +357,12 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
if (netif_queue_stopped(ks->netdev))
netif_wake_queue(ks->netdev);
spin_unlock(&ks->statelock);
-
- handled |= IRQ_TXI;
}
- if (status & IRQ_RXI)
- handled |= IRQ_RXI;
-
if (status & IRQ_SPIBEI) {
netdev_err(ks->netdev, "%s: spi bus error\n", __func__);
- handled |= IRQ_SPIBEI;
}
- ks8851_wrreg16(ks, KS_ISR, handled);
-
if (status & IRQ_RXI) {
/* the datasheet says to disable the rx interrupt during
* packet read-out, however we're masking the interrupt
Hi,
In some monitors with DSC a divide by zero has been reported and fixed
in 6.10 by this commit.
130afc8a8861 ("drm/amd/display: Fix division by zero in setup_dsc_config")
Can you please bring it back to 5.15.y+?
Thanks,
__kernel_map_pages() is a debug function which clears the valid bit in page
table entry for deallocated pages to detect illegal memory accesses to
freed pages.
This function set/clear the valid bit using __set_memory(). __set_memory()
acquires init_mm's semaphore, and this operation may sleep. This is
problematic, because __kernel_map_pages() can be called in atomic context,
and thus is illegal to sleep. An example warning that this causes:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd
preempt_count: 2, expected: 0
CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff800060dc>] dump_backtrace+0x1c/0x24
[<ffffffff8091ef6e>] show_stack+0x2c/0x38
[<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72
[<ffffffff8092bb24>] dump_stack+0x14/0x1c
[<ffffffff8003b7ac>] __might_resched+0x104/0x10e
[<ffffffff8003b7f4>] __might_sleep+0x3e/0x62
[<ffffffff8093276a>] down_write+0x20/0x72
[<ffffffff8000cf00>] __set_memory+0x82/0x2fa
[<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4
[<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a
[<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba
[<ffffffff80011904>] copy_process+0x72c/0x17ec
[<ffffffff80012ab4>] kernel_clone+0x60/0x2fe
[<ffffffff80012f62>] kernel_thread+0x82/0xa0
[<ffffffff8003552c>] kthreadd+0x14a/0x1be
[<ffffffff809357de>] ret_from_fork+0xe/0x1c
Rewrite this function with apply_to_existing_page_range(). It is fine to
not have any locking, because __kernel_map_pages() works with pages being
allocated/deallocated and those pages are not changed by anyone else in the
meantime.
Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
arch/riscv/mm/pageattr.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c
index 410056a50aa9..271d01a5ba4d 100644
--- a/arch/riscv/mm/pageattr.c
+++ b/arch/riscv/mm/pageattr.c
@@ -387,17 +387,33 @@ int set_direct_map_default_noflush(struct page *page)
}
#ifdef CONFIG_DEBUG_PAGEALLOC
+static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data)
+{
+ int enable = *(int *)data;
+
+ unsigned long val = pte_val(ptep_get(pte));
+
+ if (enable)
+ val |= _PAGE_PRESENT;
+ else
+ val &= ~_PAGE_PRESENT;
+
+ set_pte(pte, __pte(val));
+
+ return 0;
+}
+
void __kernel_map_pages(struct page *page, int numpages, int enable)
{
if (!debug_pagealloc_enabled())
return;
- if (enable)
- __set_memory((unsigned long)page_address(page), numpages,
- __pgprot(_PAGE_PRESENT), __pgprot(0));
- else
- __set_memory((unsigned long)page_address(page), numpages,
- __pgprot(0), __pgprot(_PAGE_PRESENT));
+ unsigned long start = (unsigned long)page_address(page);
+ unsigned long size = PAGE_SIZE * numpages;
+
+ apply_to_existing_page_range(&init_mm, start, size, debug_pagealloc_set_page, &enable);
+
+ flush_tlb_kernel_range(start, start + size);
}
#endif
--
2.39.2
debug_pagealloc is a debug feature which clears the valid bit in page table
entry for freed pages to detect illegal accesses to freed memory.
For this feature to work, virtual mapping must have PAGE_SIZE resolution.
(No, we cannot map with huge pages and split them only when needed; because
pages can be allocated/freed in atomic context and page splitting cannot be
done in atomic context)
Force linear mapping to use small pages if debug_pagealloc is enabled.
Note that it is not necessary to force the entire linear mapping, but only
those that are given to memory allocator. Some parts of memory can keep
using huge page mapping (for example, kernel's executable code). But these
parts are minority, so keep it simple. This is just a debug feature, some
extra overhead should be acceptable.
Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support")
Signed-off-by: Nam Cao <namcao(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
Interestingly this feature somehow still worked when first introduced.
My guess is that back then only 2MB page size is used. When a 4KB page is
freed, the entire 2MB will be (incorrectly) invalidated by this feature.
But 2MB is quite small, so no one else happen to use other 4KB pages in
this 2MB area. In other words, it used to work by luck.
Now larger page sizes are used, so this feature invalidate large chunk of
memory, and the probability that someone else access this chunk and
trigger a page fault is much higher.
arch/riscv/mm/init.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 2574f6a3b0e7..73914afa3aba 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -682,6 +682,9 @@ void __init create_pgd_mapping(pgd_t *pgdp,
static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va,
phys_addr_t size)
{
+ if (debug_pagealloc_enabled())
+ return PAGE_SIZE;
+
if (pgtable_l5_enabled &&
!(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE)
return P4D_SIZE;
--
2.39.2
Hi all,
This series fixed some issues on bootloader - kernel
interface.
The first two fixed booting with devicetree, the last two
enhanced kernel's tolerance on different bootloader implementation.
Please review.
Thanks
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
Changes in v2:
- Enhance PATCH 3-4 based on off list discussions with Huacai & co.
- Link to v1: https://lore.kernel.org/r/20240521-loongarch-booting-fixes-v1-0-659c201c037…
---
Jiaxun Yang (4):
LoongArch: Fix built-in DTB detection
LoongArch: smp: Add all CPUs enabled by fdt to NUMA node 0
LoongArch: Fix entry point in image header
LoongArch: Override higher address bits in JUMP_VIRT_ADDR
arch/loongarch/include/asm/stackframe.h | 2 +-
arch/loongarch/kernel/head.S | 2 +-
arch/loongarch/kernel/setup.c | 6 ++++--
arch/loongarch/kernel/smp.c | 5 ++++-
drivers/firmware/efi/libstub/loongarch.c | 2 +-
5 files changed, 11 insertions(+), 6 deletions(-)
---
base-commit: 124cfbcd6d185d4f50be02d5f5afe61578916773
change-id: 20240521-loongarch-booting-fixes-366e13e7ca55
Best regards,
--
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
The arm64 crypto drivers duplicate driver names when adding simd
variants, which after backported commit 27016f75f5ed ("crypto: api -
Disallow identical driver names"), causes an error that leads to the
aes algs not being installed. On weaker processors this results in hangs
due to falling back to SW crypto.
Use simd_skcipher_create() as it will properly namespace the new algs.
This issue does not exist in mainline/latest (and stable v6.1+) as the
driver has been refactored to remove the simd algs from this code path.
Fixes: 27016f75f5ed ("crypto: api - Disallow identical driver names")
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: stable(a)vger.kernel.org
Signed-off-by: Liam Kearney <liam.kearney(a)canonical.com>
---
v2:
- No changes
---
arch/arm64/crypto/aes-glue.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/arch/arm64/crypto/aes-glue.c b/arch/arm64/crypto/aes-glue.c
index aa13344a3a5e..af862e52a36b 100644
--- a/arch/arm64/crypto/aes-glue.c
+++ b/arch/arm64/crypto/aes-glue.c
@@ -1028,7 +1028,6 @@ static int __init aes_init(void)
struct simd_skcipher_alg *simd;
const char *basename;
const char *algname;
- const char *drvname;
int err;
int i;
@@ -1045,9 +1044,8 @@ static int __init aes_init(void)
continue;
algname = aes_algs[i].base.cra_name + 2;
- drvname = aes_algs[i].base.cra_driver_name + 2;
basename = aes_algs[i].base.cra_driver_name;
- simd = simd_skcipher_create_compat(algname, drvname, basename);
+ simd = simd_skcipher_create(algname, basename);
err = PTR_ERR(simd);
if (IS_ERR(simd))
goto unregister_simds;
--
2.40.1
The patch titled
Subject: nilfs2: fix potential hang in nilfs_detach_log_writer()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-potential-hang-in-nilfs_detach_log_writer.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix potential hang in nilfs_detach_log_writer()
Date: Mon, 20 May 2024 22:26:21 +0900
Syzbot has reported a potential hang in nilfs_detach_log_writer() called
during nilfs2 unmount.
Analysis revealed that this is because nilfs_segctor_sync(), which
synchronizes with the log writer thread, can be called after
nilfs_segctor_destroy() terminates that thread, as shown in the call trace
below:
nilfs_detach_log_writer
nilfs_segctor_destroy
nilfs_segctor_kill_thread --> Shut down log writer thread
flush_work
nilfs_iput_work_func
nilfs_dispose_list
iput
nilfs_evict_inode
nilfs_transaction_commit
nilfs_construct_segment (if inode needs sync)
nilfs_segctor_sync --> Attempt to synchronize with
log writer thread
*** DEADLOCK ***
Fix this issue by changing nilfs_segctor_sync() so that the log writer
thread returns normally without synchronizing after it terminates, and by
forcing tasks that are already waiting to complete once after the thread
terminates.
The skipped inode metadata flushout will then be processed together in the
subsequent cleanup work in nilfs_segctor_destroy().
Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+e3973c409251e136fdd0(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183(a)psu.edu>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 21 ++++++++++++++++++---
1 file changed, 18 insertions(+), 3 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-potential-hang-in-nilfs_detach_log_writer
+++ a/fs/nilfs2/segment.c
@@ -2190,6 +2190,14 @@ static int nilfs_segctor_sync(struct nil
for (;;) {
set_current_state(TASK_INTERRUPTIBLE);
+ /*
+ * Synchronize only while the log writer thread is alive.
+ * Leave flushing out after the log writer thread exits to
+ * the cleanup work in nilfs_segctor_destroy().
+ */
+ if (!sci->sc_task)
+ break;
+
if (atomic_read(&wait_req.done)) {
err = wait_req.err;
break;
@@ -2205,7 +2213,7 @@ static int nilfs_segctor_sync(struct nil
return err;
}
-static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err)
+static void nilfs_segctor_wakeup(struct nilfs_sc_info *sci, int err, bool force)
{
struct nilfs_segctor_wait_request *wrq, *n;
unsigned long flags;
@@ -2213,7 +2221,7 @@ static void nilfs_segctor_wakeup(struct
spin_lock_irqsave(&sci->sc_wait_request.lock, flags);
list_for_each_entry_safe(wrq, n, &sci->sc_wait_request.head, wq.entry) {
if (!atomic_read(&wrq->done) &&
- nilfs_cnt32_ge(sci->sc_seq_done, wrq->seq)) {
+ (force || nilfs_cnt32_ge(sci->sc_seq_done, wrq->seq))) {
wrq->err = err;
atomic_set(&wrq->done, 1);
}
@@ -2362,7 +2370,7 @@ static void nilfs_segctor_notify(struct
if (mode == SC_LSEG_SR) {
sci->sc_state &= ~NILFS_SEGCTOR_COMMIT;
sci->sc_seq_done = sci->sc_seq_accepted;
- nilfs_segctor_wakeup(sci, err);
+ nilfs_segctor_wakeup(sci, err, false);
sci->sc_flush_request = 0;
} else {
if (mode == SC_FLUSH_FILE)
@@ -2746,6 +2754,13 @@ static void nilfs_segctor_destroy(struct
|| sci->sc_seq_request != sci->sc_seq_done);
spin_unlock(&sci->sc_state_lock);
+ /*
+ * Forcibly wake up tasks waiting in nilfs_segctor_sync(), which can
+ * be called from delayed iput() via nilfs_evict_inode() and can race
+ * with the above log writer thread termination.
+ */
+ nilfs_segctor_wakeup(sci, 0, true);
+
if (flush_work(&sci->sc_iput_work))
flag = true;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-of-timer-for-log-writer-thread.patch
nilfs2-fix-unexpected-freezing-of-nilfs_segctor_sync.patch
nilfs2-fix-potential-hang-in-nilfs_detach_log_writer.patch
The patch titled
Subject: nilfs2: fix unexpected freezing of nilfs_segctor_sync()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-unexpected-freezing-of-nilfs_segctor_sync.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix unexpected freezing of nilfs_segctor_sync()
Date: Mon, 20 May 2024 22:26:20 +0900
A potential and reproducible race issue has been identified where
nilfs_segctor_sync() would block even after the log writer thread writes a
checkpoint, unless there is an interrupt or other trigger to resume log
writing.
This turned out to be because, depending on the execution timing of the
log writer thread running in parallel, the log writer thread may skip
responding to nilfs_segctor_sync(), which causes a call to schedule()
waiting for completion within nilfs_segctor_sync() to lose the opportunity
to wake up.
The reason why waking up the task waiting in nilfs_segctor_sync() may be
skipped is that updating the request generation issued using a shared
sequence counter and adding an wait queue entry to the request wait queue
to the log writer, are not done atomically. There is a possibility that
log writing and request completion notification by nilfs_segctor_wakeup()
may occur between the two operations, and in that case, the wait queue
entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of
nilfs_segctor_sync() will be carried over until the next request occurs.
Fix this issue by performing these two operations simultaneously within
the lock section of sc_state_lock. Also, following the memory barrier
guidelines for event waiting loops, move the call to set_current_state()
in the same location into the event waiting loop to ensure that a memory
barrier is inserted just before the event condition determination.
Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com
Fixes: 9ff05123e3bf ("nilfs2: segment constructor")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Cc: "Bai, Shuangpeng" <sjb7183(a)psu.edu>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 17 +++++++++++++----
1 file changed, 13 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-unexpected-freezing-of-nilfs_segctor_sync
+++ a/fs/nilfs2/segment.c
@@ -2168,19 +2168,28 @@ static int nilfs_segctor_sync(struct nil
struct nilfs_segctor_wait_request wait_req;
int err = 0;
- spin_lock(&sci->sc_state_lock);
init_wait(&wait_req.wq);
wait_req.err = 0;
atomic_set(&wait_req.done, 0);
+ init_waitqueue_entry(&wait_req.wq, current);
+
+ /*
+ * To prevent a race issue where completion notifications from the
+ * log writer thread are missed, increment the request sequence count
+ * "sc_seq_request" and insert a wait queue entry using the current
+ * sequence number into the "sc_wait_request" queue at the same time
+ * within the lock section of "sc_state_lock".
+ */
+ spin_lock(&sci->sc_state_lock);
wait_req.seq = ++sci->sc_seq_request;
+ add_wait_queue(&sci->sc_wait_request, &wait_req.wq);
spin_unlock(&sci->sc_state_lock);
- init_waitqueue_entry(&wait_req.wq, current);
- add_wait_queue(&sci->sc_wait_request, &wait_req.wq);
- set_current_state(TASK_INTERRUPTIBLE);
wake_up(&sci->sc_wait_daemon);
for (;;) {
+ set_current_state(TASK_INTERRUPTIBLE);
+
if (atomic_read(&wait_req.done)) {
err = wait_req.err;
break;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-of-timer-for-log-writer-thread.patch
nilfs2-fix-unexpected-freezing-of-nilfs_segctor_sync.patch
nilfs2-fix-potential-hang-in-nilfs_detach_log_writer.patch
The patch titled
Subject: nilfs2: fix use-after-free of timer for log writer thread
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-use-after-free-of-timer-for-log-writer-thread.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix use-after-free of timer for log writer thread
Date: Mon, 20 May 2024 22:26:19 +0900
Patch series "nilfs2: fix log writer related issues".
This bug fix series covers three nilfs2 log writer-related issues,
including a timer use-after-free issue and potential deadlock issue on
unmount, and a potential freeze issue in event synchronization found
during their analysis. Details are described in each commit log.
This patch (of 3):
A use-after-free issue has been reported regarding the timer sc_timer on
the nilfs_sc_info structure.
The problem is that even though it is used to wake up a sleeping log
writer thread, sc_timer is not shut down until the nilfs_sc_info structure
is about to be freed, and is used regardless of the thread's lifetime.
Fix this issue by limiting the use of sc_timer only while the log writer
thread is alive.
Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com
Fixes: fdce895ea5dd ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: "Bai, Shuangpeng" <sjb7183(a)psu.edu>
Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/segment.c | 25 +++++++++++++++++++------
1 file changed, 19 insertions(+), 6 deletions(-)
--- a/fs/nilfs2/segment.c~nilfs2-fix-use-after-free-of-timer-for-log-writer-thread
+++ a/fs/nilfs2/segment.c
@@ -2118,8 +2118,10 @@ static void nilfs_segctor_start_timer(st
{
spin_lock(&sci->sc_state_lock);
if (!(sci->sc_state & NILFS_SEGCTOR_COMMIT)) {
- sci->sc_timer.expires = jiffies + sci->sc_interval;
- add_timer(&sci->sc_timer);
+ if (sci->sc_task) {
+ sci->sc_timer.expires = jiffies + sci->sc_interval;
+ add_timer(&sci->sc_timer);
+ }
sci->sc_state |= NILFS_SEGCTOR_COMMIT;
}
spin_unlock(&sci->sc_state_lock);
@@ -2320,10 +2322,21 @@ int nilfs_construct_dsync_segment(struct
*/
static void nilfs_segctor_accept(struct nilfs_sc_info *sci)
{
+ bool thread_is_alive;
+
spin_lock(&sci->sc_state_lock);
sci->sc_seq_accepted = sci->sc_seq_request;
+ thread_is_alive = (bool)sci->sc_task;
spin_unlock(&sci->sc_state_lock);
- del_timer_sync(&sci->sc_timer);
+
+ /*
+ * This function does not race with the log writer thread's
+ * termination. Therefore, deleting sc_timer, which should not be
+ * done after the log writer thread exits, can be done safely outside
+ * the area protected by sc_state_lock.
+ */
+ if (thread_is_alive)
+ del_timer_sync(&sci->sc_timer);
}
/**
@@ -2349,7 +2362,7 @@ static void nilfs_segctor_notify(struct
sci->sc_flush_request &= ~FLUSH_DAT_BIT;
/* re-enable timer if checkpoint creation was not done */
- if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) &&
+ if ((sci->sc_state & NILFS_SEGCTOR_COMMIT) && sci->sc_task &&
time_before(jiffies, sci->sc_timer.expires))
add_timer(&sci->sc_timer);
}
@@ -2539,6 +2552,7 @@ static int nilfs_segctor_thread(void *ar
int timeout = 0;
sci->sc_timer_task = current;
+ timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0);
/* start sync. */
sci->sc_task = current;
@@ -2606,6 +2620,7 @@ static int nilfs_segctor_thread(void *ar
end_thread:
/* end sync. */
sci->sc_task = NULL;
+ timer_shutdown_sync(&sci->sc_timer);
wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */
spin_unlock(&sci->sc_state_lock);
return 0;
@@ -2669,7 +2684,6 @@ static struct nilfs_sc_info *nilfs_segct
INIT_LIST_HEAD(&sci->sc_gc_inodes);
INIT_LIST_HEAD(&sci->sc_iput_queue);
INIT_WORK(&sci->sc_iput_work, nilfs_iput_work_func);
- timer_setup(&sci->sc_timer, nilfs_construction_timeout, 0);
sci->sc_interval = HZ * NILFS_SC_DEFAULT_TIMEOUT;
sci->sc_mjcp_freq = HZ * NILFS_SC_DEFAULT_SR_FREQ;
@@ -2748,7 +2762,6 @@ static void nilfs_segctor_destroy(struct
down_write(&nilfs->ns_segctor_sem);
- timer_shutdown_sync(&sci->sc_timer);
kfree(sci);
}
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-use-after-free-of-timer-for-log-writer-thread.patch
nilfs2-fix-unexpected-freezing-of-nilfs_segctor_sync.patch
nilfs2-fix-potential-hang-in-nilfs_detach_log_writer.patch
The patch titled
Subject: selftests/mm: fix build warnings on ppc64
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-fix-build-warnings-on-ppc64.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Michael Ellerman <mpe(a)ellerman.id.au>
Subject: selftests/mm: fix build warnings on ppc64
Date: Tue, 21 May 2024 13:02:19 +1000
Fix warnings like:
In file included from uffd-unit-tests.c:8:
uffd-unit-tests.c: In function `uffd_poison_handle_fault':
uffd-common.h:45:33: warning: format `%llu' expects argument of type
`long long unsigned int', but argument 3 has type `__u64' {aka `long
unsigned int'} [-Wformat=]
By switching to unsigned long long for u64 for ppc64 builds.
Link: https://lkml.kernel.org/r/20240521030219.57439-1-mpe@ellerman.id.au
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Shuah Khan <skhan(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/gup_test.c | 1 +
tools/testing/selftests/mm/uffd-common.h | 1 +
2 files changed, 2 insertions(+)
--- a/tools/testing/selftests/mm/gup_test.c~selftests-mm-fix-build-warnings-on-ppc64
+++ a/tools/testing/selftests/mm/gup_test.c
@@ -1,3 +1,4 @@
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <fcntl.h>
#include <errno.h>
#include <stdio.h>
--- a/tools/testing/selftests/mm/uffd-common.h~selftests-mm-fix-build-warnings-on-ppc64
+++ a/tools/testing/selftests/mm/uffd-common.h
@@ -8,6 +8,7 @@
#define __UFFD_COMMON_H__
#define _GNU_SOURCE
+#define __SANE_USERSPACE_TYPES__ // Use ll64
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
_
Patches currently in -mm which might be from mpe(a)ellerman.id.au are
selftests-mm-fix-build-warnings-on-ppc64.patch
The patch titled
Subject: selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-compaction_test-fix-bogus-test-success-and-reduce-probability-of-oom-killer-invocation.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dev Jain <dev.jain(a)arm.com>
Subject: selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
Date: Tue, 21 May 2024 13:13:58 +0530
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof:
The test wants to run on 80% of available memory to prevent OOM-killing
(see original code comments). Let the value of mem_free at the start
of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the
former case, the test operates on 0.8 * x of memory. In the latter,
the test operates on 0.8 * (x - y) of memory, with y already filled,
hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 *
x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the
memory consumed by hugepages be greater than 25% of x, with x and y
defined as above. The definition of compaction_index is c_index = (x -
y)/z where z is the memory consumed by hugepages after trying to
increase them again. In check_compaction(), we set the number of
hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/compaction_test.c | 71 +++++++++++------
1 file changed, 49 insertions(+), 22 deletions(-)
--- a/tools/testing/selftests/mm/compaction_test.c~selftests-mm-compaction_test-fix-bogus-test-success-and-reduce-probability-of-oom-killer-invocation
+++ a/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_f
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_f
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_f
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
_
Patches currently in -mm which might be from dev.jain(a)arm.com are
selftests-mm-compaction_test-fix-bogus-test-success-on-aarch64.patch
selftests-mm-compaction_test-fix-incorrect-write-of-zero-to-nr_hugepages.patch
selftests-mm-compaction_test-fix-bogus-test-success-and-reduce-probability-of-oom-killer-invocation.patch
The patch titled
Subject: selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-compaction_test-fix-incorrect-write-of-zero-to-nr_hugepages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dev Jain <dev.jain(a)arm.com>
Subject: selftests/mm: compaction_test: fix incorrect write of zero to nr_hugepages
Date: Tue, 21 May 2024 13:13:57 +0530
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Link: https://lkml.kernel.org/r/20240521074358.675031-3-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/compaction_test.c | 2 ++
1 file changed, 2 insertions(+)
--- a/tools/testing/selftests/mm/compaction_test.c~selftests-mm-compaction_test-fix-incorrect-write-of-zero-to-nr_hugepages
+++ a/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_f
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
_
Patches currently in -mm which might be from dev.jain(a)arm.com are
selftests-mm-compaction_test-fix-bogus-test-success-on-aarch64.patch
selftests-mm-compaction_test-fix-incorrect-write-of-zero-to-nr_hugepages.patch
selftests-mm-compaction_test-fix-bogus-test-success-and-reduce-probability-of-oom-killer-invocation.patch
The patch titled
Subject: selftests/mm: compaction_test: fix bogus test success on Aarch64
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-compaction_test-fix-bogus-test-success-on-aarch64.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dev Jain <dev.jain(a)arm.com>
Subject: selftests/mm: compaction_test: fix bogus test success on Aarch64
Date: Tue, 21 May 2024 13:13:56 +0530
Patch series "Fixes for compaction_test", v2.
The compaction_test memory selftest introduces fragmentation in memory
and then tries to allocate as many hugepages as possible. This series
addresses some problems.
On Aarch64, if nr_hugepages == 0, then the test trivially succeeds since
compaction_index becomes 0, which is less than 3, due to no division by
zero exception being raised. We fix that by checking for division by
zero.
Secondly, correctly set the number of hugepages to zero before trying
to set a large number of them.
Now, consider a situation in which, at the start of the test, a non-zero
number of hugepages have been already set (while running the entire
selftests/mm suite, or manually by the admin). The test operates on 80%
of memory to avoid OOM-killer invocation, and because some memory is
already blocked by hugepages, it would increase the chance of OOM-killing.
Also, since mem_free used in check_compaction() is the value before we
set nr_hugepages to zero, the chance that the compaction_index will
be small is very high if the preset nr_hugepages was high, leading to a
bogus test success.
This patch (of 3):
Currently, if at runtime we are not able to allocate a huge page, the test
will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Link: https://lkml.kernel.org/r/20240521074358.675031-1-dev.jain@arm.com
Link: https://lkml.kernel.org/r/20240521074358.675031-2-dev.jain@arm.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/compaction_test.c | 20 +++++++++++------
1 file changed, 13 insertions(+), 7 deletions(-)
--- a/tools/testing/selftests/mm/compaction_test.c~selftests-mm-compaction_test-fix-bogus-test-success-on-aarch64
+++ a/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_f
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_f
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
_
Patches currently in -mm which might be from dev.jain(a)arm.com are
selftests-mm-compaction_test-fix-bogus-test-success-on-aarch64.patch
selftests-mm-compaction_test-fix-incorrect-write-of-zero-to-nr_hugepages.patch
selftests-mm-compaction_test-fix-bogus-test-success-and-reduce-probability-of-oom-killer-invocation.patch
This bit is necessary to enable packets on the interface. Without this
bit set, ethernet behaves as if it is working but no activity occurs.
The vendor SDK sets this bit along with the PHY_ID bits. u-boot will set
this bit as well but if u-boot is not compiled with networking, the
interface will not work.
Fixes: 9a24e1ff4326 ("net: mdio: add amlogic gxl mdio mux support");
Signed-off-by: Da Xue <da(a)libre.computer>
---
drivers/net/mdio/mdio-mux-meson-gxl.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mdio/mdio-mux-meson-gxl.c
b/drivers/net/mdio/mdio-mux-meson-gxl.c
index 89554021b5cc..b2bd57f54034 100644
--- a/drivers/net/mdio/mdio-mux-meson-gxl.c
+++ b/drivers/net/mdio/mdio-mux-meson-gxl.c
@@ -17,6 +17,7 @@
#define REG2_LEDACT GENMASK(23, 22)
#define REG2_LEDLINK GENMASK(25, 24)
#define REG2_DIV4SEL BIT(27)
+#define REG2_RESERVED_28 BIT(28)
#define REG2_ADCBYPASS BIT(30)
#define REG2_CLKINSEL BIT(31)
#define ETH_REG3 0x4
@@ -65,7 +66,7 @@ static void gxl_enable_internal_mdio(struct
gxl_mdio_mux *priv)
* The only constraint is that it must match the one in
* drivers/net/phy/meson-gxl.c to properly match the PHY.
*/
- writel(FIELD_PREP(REG2_PHYID, EPHY_GXL_ID),
+ writel(REG2_RESERVED_28 | FIELD_PREP(REG2_PHYID, EPHY_GXL_ID),
priv->regs + ETH_REG2);
/* Enable the internal phy */
--
2.39.2
Current implementation of PID filtering logic for multi-uprobes in
uprobe_prog_run() is filtering down to exact *thread*, while the intent
for PID filtering it to filter by *process* instead. The check in
uprobe_prog_run() also differs from the analogous one in
uprobe_multi_link_filter() for some reason. The latter is correct,
checking task->mm, not the task itself.
Fix the check in uprobe_prog_run() to perform the same task->mm check.
While doing this, we also update get_pid_task() use to use PIDTYPE_TGID
type of lookup, given the intent is to get a representative task of an
entire process. This doesn't change behavior, but seems more logical. It
would hold task group leader task now, not any random thread task.
Last but not least, given multi-uprobe support is half-broken due to
this PID filtering logic (depending on whether PID filtering is
important or not), we need to make it easy for user space consumers
(including libbpf) to easily detect whether PID filtering logic was
already fixed.
We do it here by adding an early check on passed pid parameter. If it's
negative (and so has no chance of being a valid PID), we return -EINVAL.
Previous behavior would eventually return -ESRCH ("No process found"),
given there can't be any process with negative PID. This subtle change
won't make any practical change in behavior, but will allow applications
to detect PID filtering fixes easily. Libbpf fixes take advantage of
this in the next patch.
Cc: stable(a)vger.kernel.org
Acked-by: Jiri Olsa <jolsa(a)kernel.org>
Fixes: b733eeade420 ("bpf: Add pid filter support for uprobe_multi link")
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
---
kernel/trace/bpf_trace.c | 8 ++++----
.../testing/selftests/bpf/prog_tests/uprobe_multi_test.c | 2 +-
2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index f5154c051d2c..1baaeb9ca205 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -3295,7 +3295,7 @@ static int uprobe_prog_run(struct bpf_uprobe *uprobe,
struct bpf_run_ctx *old_run_ctx;
int err = 0;
- if (link->task && current != link->task)
+ if (link->task && current->mm != link->task->mm)
return 0;
if (sleepable)
@@ -3396,8 +3396,9 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
upath = u64_to_user_ptr(attr->link_create.uprobe_multi.path);
uoffsets = u64_to_user_ptr(attr->link_create.uprobe_multi.offsets);
cnt = attr->link_create.uprobe_multi.cnt;
+ pid = attr->link_create.uprobe_multi.pid;
- if (!upath || !uoffsets || !cnt)
+ if (!upath || !uoffsets || !cnt || pid < 0)
return -EINVAL;
if (cnt > MAX_UPROBE_MULTI_CNT)
return -E2BIG;
@@ -3421,10 +3422,9 @@ int bpf_uprobe_multi_link_attach(const union bpf_attr *attr, struct bpf_prog *pr
goto error_path_put;
}
- pid = attr->link_create.uprobe_multi.pid;
if (pid) {
rcu_read_lock();
- task = get_pid_task(find_vpid(pid), PIDTYPE_PID);
+ task = get_pid_task(find_vpid(pid), PIDTYPE_TGID);
rcu_read_unlock();
if (!task) {
err = -ESRCH;
diff --git a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
index 8269cdee33ae..38fda42fd70f 100644
--- a/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
+++ b/tools/testing/selftests/bpf/prog_tests/uprobe_multi_test.c
@@ -397,7 +397,7 @@ static void test_attach_api_fails(void)
link_fd = bpf_link_create(prog_fd, 0, BPF_TRACE_UPROBE_MULTI, &opts);
if (!ASSERT_ERR(link_fd, "link_fd"))
goto cleanup;
- ASSERT_EQ(link_fd, -ESRCH, "pid_is_wrong");
+ ASSERT_EQ(link_fd, -EINVAL, "pid_is_wrong");
cleanup:
if (link_fd >= 0)
--
2.43.0
Hi There,
Would you be interested in acquiring an attendees list of Railway Interchange Expo?
No. of Attendees: 8,500
Date : May 20th to 22nd
Location: Indianapolis, USA
List Contains - Attendee name, Job Title, Business Name, Web Address, Physical Address, Phone Number, Official Email address, etc.,
Interested? I will send you more details and cost.
Regards,
Deanne Walton |Demand Generation Executive
#regzbot introduced: v6.8..v6.9-rc1
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
Since the above commit present in 6.9+, Node running a Yarn installation
that executes a subprocess always shows the following:
/test # yarn --offline install
yarn install v1.22.22
warning package.json: "test" is also the name of a node core module
warning test(a)1.0.0: "test" is also the name of a node core module
[1/4] Resolving packages...
[2/4] Fetching packages...
[3/4] Linking dependencies...
[4/4] Building fresh packages...
error /test/node_modules/snyk: Command failed.
Exit code: 126
Command: node wrapper_dist/bootstrap.js exec
Arguments:
Directory: /test/node_modules/snyk
Output:
/bin/sh: node: Text file busy
The commit was found by bisection with a simple initramfs that just runs
'yarn --offline install' with a test project and cached Yarn packages.
To reproduce:
npm install -g yarn
mkdir test
cd test
cat > package.json <<EOF
{
"name": "test",
"version": "1.0.0",
"main": "index.js",
"license": "MIT",
"dependencies": {
"snyk": "^1.1291.0"
}
}
EOF
yarn install
Modern Yarn will give the same result but with slightly different output.
This also appears to affect node-gyp:
https://github.com/nodejs/node/issues/53051
See also: https://bugs.gentoo.org/931942
--
Andrew
From: Jakub Kicinski <kuba(a)kernel.org>
commit 8590541473188741055d27b955db0777569438e3 upstream
Since we're setting the CRYPTO_TFM_REQ_MAY_BACKLOG flag on our
requests to the crypto API, crypto_aead_{encrypt,decrypt} can return
-EBUSY instead of -EINPROGRESS in valid situations. For example, when
the cryptd queue for AESNI is full (easy to trigger with an
artificially low cryptd.cryptd_max_cpu_qlen), requests will be enqueued
to the backlog but still processed. In that case, the async callback
will also be called twice: first with err == -EINPROGRESS, which it
seems we can just ignore, then with err == 0.
Compared to Sabrina's original patch this version uses the new
tls_*crypt_async_wait() helpers and converts the EBUSY to
EINPROGRESS to avoid having to modify all the error handling
paths. The handling is identical.
Fixes: a54667f6728c ("tls: Add support for encryption using async offload accelerator")
Fixes: 94524d8fc965 ("net/tls: Add support for async decryption of tls records")
Co-developed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Signed-off-by: Sabrina Dubroca <sd(a)queasysnail.net>
Link: https://lore.kernel.org/netdev/9681d1febfec295449a62300938ed2ae66983f28.169…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Srish: fixed merge-conflict in stable branch linux-6.1.y,
needs to go on top of https://lore.kernel.org/stable/20240307155930.913525-1-lee@kernel.org/]
Signed-off-by: Srish Srinivasan <srish.srinivasan(a)broadcom.com>
---
net/tls/tls_sw.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 2bd27b777..61b01dfc6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -195,6 +195,17 @@ static void tls_decrypt_done(crypto_completion_data_t *data, int err)
struct sock *sk;
int aead_size;
+ /* If requests get too backlogged crypto API returns -EBUSY and calls
+ * ->complete(-EINPROGRESS) immediately followed by ->complete(0)
+ * to make waiting for backlog to flush with crypto_wait_req() easier.
+ * First wait converts -EBUSY -> -EINPROGRESS, and the second one
+ * -EINPROGRESS -> 0.
+ * We have a single struct crypto_async_request per direction, this
+ * scheme doesn't help us, so just ignore the first ->complete().
+ */
+ if (err == -EINPROGRESS)
+ return;
+
aead_size = sizeof(*aead_req) + crypto_aead_reqsize(aead);
aead_size = ALIGN(aead_size, __alignof__(*dctx));
dctx = (void *)((u8 *)aead_req + aead_size);
@@ -268,6 +279,10 @@ static int tls_do_decryption(struct sock *sk,
}
ret = crypto_aead_decrypt(aead_req);
+ if (ret == -EBUSY) {
+ ret = tls_decrypt_async_wait(ctx);
+ ret = ret ?: -EINPROGRESS;
+ }
if (ret == -EINPROGRESS) {
if (darg->async)
return 0;
@@ -452,6 +467,9 @@ static void tls_encrypt_done(crypto_completion_data_t *data, int err)
bool ready = false;
struct sock *sk;
+ if (err == -EINPROGRESS) /* see the comment in tls_decrypt_done() */
+ return;
+
rec = container_of(aead_req, struct tls_rec, aead_req);
msg_en = &rec->msg_encrypted;
@@ -560,6 +578,10 @@ static int tls_do_encryption(struct sock *sk,
atomic_inc(&ctx->encrypt_pending);
rc = crypto_aead_encrypt(aead_req);
+ if (rc == -EBUSY) {
+ rc = tls_encrypt_async_wait(ctx);
+ rc = rc ?: -EINPROGRESS;
+ }
if (!rc || rc != -EINPROGRESS) {
atomic_dec(&ctx->encrypt_pending);
sge->offset -= prot->prepend_size;
--
2.34.1
There could be a potential use-after-free case in
tcpm_register_source_caps(). This could happen when:
* new (say invalid) source caps are advertised
* the existing source caps are unregistered
* tcpm_register_source_caps() returns with an error as
usb_power_delivery_register_capabilities() fails
This causes port->partner_source_caps to hold on to the now freed source
caps.
Reset port->partner_source_caps value to NULL after unregistering
existing source caps.
Fixes: 230ecdf71a64 ("usb: typec: tcpm: unregister existing source caps before re-registration")
Cc: stable(a)vger.kernel.org
Signed-off-by: Amit Sunil Dhamne <amitsd(a)google.com>
---
drivers/usb/typec/tcpm/tcpm.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 8a1af08f71b6..be4127ef84e9 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -3014,8 +3014,10 @@ static int tcpm_register_source_caps(struct tcpm_port *port)
memcpy(caps.pdo, port->source_caps, sizeof(u32) * port->nr_source_caps);
caps.role = TYPEC_SOURCE;
- if (cap)
+ if (cap) {
usb_power_delivery_unregister_capabilities(cap);
+ port->partner_source_caps = NULL;
+ }
cap = usb_power_delivery_register_capabilities(port->partner_pd, &caps);
if (IS_ERR(cap))
base-commit: 51474ab44abf907023a8a875e799b07de461e466
--
2.45.0.rc1.225.g2a3ae87e7f-goog
Although 'norecovery' mount option is marked deprecated for a long time
and a warning message is introduced during the deprecation window, it's
still actively utilized by several projects that need a safely way to
mount a btrfs without any writes.
Furthermore this 'norecovery' mount option is supported by most major
filesystems, which makes it harder to validate our motivation.
This patch would re-introduce the 'norecovery' mount option, and output
a message to recommend 'rescue=nologreplay' option.
Link: https://lore.kernel.org/linux-btrfs/ZkxZT0J-z0GYvfy8@gardel-login/#t
Link: https://github.com/systemd/systemd/pull/32892
Link: https://bugzilla.suse.com/show_bug.cgi?id=1222429
Reported-by: Lennart Poettering <lennart(a)poettering.net>
Reported-by: Jiri Slaby <jslaby(a)suse.com>
Fixes: a1912f712188 ("btrfs: remove code for inode_cache and recovery mount options")
Cc: stable(a)vger.kernel.org # 6.8+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/super.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 2dbc930a20f7..f05cce7c8b8d 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -119,6 +119,7 @@ enum {
Opt_thread_pool,
Opt_treelog,
Opt_user_subvol_rm_allowed,
+ Opt_norecovery,
/* Rescue options */
Opt_rescue,
@@ -245,6 +246,8 @@ static const struct fs_parameter_spec btrfs_fs_parameters[] = {
__fsparam(NULL, "nologreplay", Opt_nologreplay, fs_param_deprecated, NULL),
/* Deprecated, with alias rescue=usebackuproot */
__fsparam(NULL, "usebackuproot", Opt_usebackuproot, fs_param_deprecated, NULL),
+ /* For compatibility only, alias for "rescue=nologreplay". */
+ fsparam_flag("norecovery", Opt_norecovery),
/* Debugging options. */
fsparam_flag_no("enospc_debug", Opt_enospc_debug),
@@ -438,6 +441,11 @@ static int btrfs_parse_param(struct fs_context *fc, struct fs_parameter *param)
"'nologreplay' is deprecated, use 'rescue=nologreplay' instead");
btrfs_set_opt(ctx->mount_opt, NOLOGREPLAY);
break;
+ case Opt_norecovery:
+ btrfs_info(NULL,
+"'norecovery' is for compatibility only, recommended to use 'rescue=nologreplay'");
+ btrfs_set_opt(ctx->mount_opt, NOLOGREPLAY);
+ break;
case Opt_flushoncommit:
if (result.negated)
btrfs_clear_opt(ctx->mount_opt, FLUSHONCOMMIT);
--
2.45.1
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 9d22c96316ac59ed38e80920c698fed38717b91b
Gitweb: https://git.kernel.org/tip/9d22c96316ac59ed38e80920c698fed38717b91b
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Fri, 17 May 2024 16:40:36 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 21 May 2024 14:52:35 +02:00
x86/topology: Handle bogus ACPI tables correctly
The ACPI specification clearly states how the processors should be
enumerated in the MADT:
"To ensure that the boot processor is supported post initialization,
two guidelines should be followed. The first is that OSPM should
initialize processors in the order that they appear in the MADT. The
second is that platform firmware should list the boot processor as the
first processor entry in the MADT.
...
Failure of OSPM implementations and platform firmware to abide by
these guidelines can result in both unpredictable and non optimal
platform operation."
The kernel relies on that ordering to detect the real BSP on crash kernels
which is important to avoid sending a INIT IPI to it as that would cause a
full machine reset.
On a Dell XPS 16 9640 the BIOS ignores this rule and enumerates the CPUs in
the wrong order. As a consequence the kernel falsely detects a crash kernel
and disables the corresponding CPU.
Prevent this by checking the IA32_APICBASE MSR for the BSP bit on the boot
CPU. If that bit is set, then the MADT based BSP detection can be safely
ignored. If the kernel detects a mismatch between the BSP bit and the first
enumerated MADT entry then emit a firmware bug message.
This obviously also has to be taken into account when the boot APIC ID and
the first enumerated APIC ID match. If the boot CPU does not have the BSP
bit set in the APICBASE MSR then there is no way for the boot CPU to
determine which of the CPUs is the real BSP. Sending an INIT to the real
BSP would reset the machine so the only sane way to deal with that is to
limit the number of CPUs to one and emit a corresponding warning message.
Fixes: 5c5682b9f87a ("x86/cpu: Detect real BSP on crash kernels")
Reported-by: Carsten Tolkmit <ctolkmit(a)ennit.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Carsten Tolkmit <ctolkmit(a)ennit.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87le48jycb.ffs@tglx
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218837
---
arch/x86/kernel/cpu/topology.c | 53 +++++++++++++++++++++++++++++++--
1 file changed, 50 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/cpu/topology.c b/arch/x86/kernel/cpu/topology.c
index d17c9b7..621a151 100644
--- a/arch/x86/kernel/cpu/topology.c
+++ b/arch/x86/kernel/cpu/topology.c
@@ -128,6 +128,9 @@ static void topo_set_cpuids(unsigned int cpu, u32 apic_id, u32 acpi_id)
static __init bool check_for_real_bsp(u32 apic_id)
{
+ bool is_bsp = false, has_apic_base = boot_cpu_data.x86 >= 6;
+ u64 msr;
+
/*
* There is no real good way to detect whether this a kdump()
* kernel, but except on the Voyager SMP monstrosity which is not
@@ -144,17 +147,61 @@ static __init bool check_for_real_bsp(u32 apic_id)
if (topo_info.real_bsp_apic_id != BAD_APICID)
return false;
+ /*
+ * Check whether the enumeration order is broken by evaluating the
+ * BSP bit in the APICBASE MSR. If the CPU does not have the
+ * APICBASE MSR then the BSP detection is not possible and the
+ * kernel must rely on the firmware enumeration order.
+ */
+ if (has_apic_base) {
+ rdmsrl(MSR_IA32_APICBASE, msr);
+ is_bsp = !!(msr & MSR_IA32_APICBASE_BSP);
+ }
+
if (apic_id == topo_info.boot_cpu_apic_id) {
- topo_info.real_bsp_apic_id = apic_id;
- return false;
+ /*
+ * If the boot CPU has the APIC BSP bit set then the
+ * firmware enumeration is agreeing. If the CPU does not
+ * have the APICBASE MSR then the only choice is to trust
+ * the enumeration order.
+ */
+ if (is_bsp || !has_apic_base) {
+ topo_info.real_bsp_apic_id = apic_id;
+ return false;
+ }
+ /*
+ * If the boot APIC is enumerated first, but the APICBASE
+ * MSR does not have the BSP bit set, then there is no way
+ * to discover the real BSP here. Assume a crash kernel and
+ * limit the number of CPUs to 1 as an INIT to the real BSP
+ * would reset the machine.
+ */
+ pr_warn("Enumerated BSP APIC %x is not marked in APICBASE MSR\n", apic_id);
+ pr_warn("Assuming crash kernel. Limiting to one CPU to prevent machine INIT\n");
+ set_nr_cpu_ids(1);
+ goto fwbug;
}
- pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x > %x\n",
+ pr_warn("Boot CPU APIC ID not the first enumerated APIC ID: %x != %x\n",
topo_info.boot_cpu_apic_id, apic_id);
+
+ if (is_bsp) {
+ /*
+ * The boot CPU has the APIC BSP bit set. Use it and complain
+ * about the broken firmware enumeration.
+ */
+ topo_info.real_bsp_apic_id = topo_info.boot_cpu_apic_id;
+ goto fwbug;
+ }
+
pr_warn("Crash kernel detected. Disabling real BSP to prevent machine INIT\n");
topo_info.real_bsp_apic_id = apic_id;
return true;
+
+fwbug:
+ pr_warn(FW_BUG "APIC enumeration order not specification compliant\n");
+ return false;
}
static unsigned int topo_unit_count(u32 lvlid, enum x86_topology_domains at_level,
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases.
Proof: The test wants to run on 80% of available memory to prevent
OOM-killing (see original code comments). Let the value of mem_free at the
start of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the former
case, the test operates on 0.8 * x of memory. In the latter, the test
operates on 0.8 * (x - y) of memory, with y already filled, hence, memory
consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 * x. Q.E.D
- The probability of a bogus test success increases.
Proof: Let the memory consumed by hugepages be greater than 25% of x,
with x and y defined as above. The definition of compaction_index is
c_index = (x - y)/z where z is the memory consumed by hugepages after
trying to increase them again. In check_compaction(), we set the number
of hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore,
c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
Changes in v2:
- Handle unsigned long number of hugepages
v1:
- https://lore.kernel.org/all/20240515093633.54814-3-dev.jain@arm.com/
NOTE: This patch depends on the previous one.
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/compaction_test.c | 71 ++++++++++++++------
1 file changed, 49 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 5e9bd1da9370..e140558e6f53 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,13 +82,16 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size,
+ unsigned long initial_nr_hugepages)
{
unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[20] = {0};
char nr_hugepages[20] = {0};
+ char init_nr_hugepages[20] = {0};
+
+ sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
return ret;
}
+int set_zero_hugepages(unsigned long *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[20] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -181,6 +202,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -232,7 +258,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
ksft_exit_pass();
ksft_exit_fail();
--
2.34.1
Currently, the test tries to set nr_hugepages to zero, but that is not
actually done because the file offset is not reset after read(). Fix that
using lseek().
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/compaction_test.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 0b249a06a60b..5e9bd1da9370 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -108,6 +108,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
--
2.34.1
Currently, if at runtime we are not able to allocate a huge page, the
test will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo, and handle the
case where the number of hugepages may overflow an integer.
Changes in v2:
- Combine with this series, handle unsigned long number of hugepages
v1:
- https://lore.kernel.org/all/20240513082842.4117782-1-dev.jain@arm.com/
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/compaction_test.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
--
2.34.1
Currently, if at runtime we are not able to allocate a huge page, the
test will trivially pass on Aarch64 due to no exception being raised on
division by zero while computing compaction_index. Fix that by checking
for nr_hugepages == 0. Anyways, in general, avoid a division by zero by
exiting the program beforehand. While at it, fix a typo.
Changes in v2:
- Combine with this series, handle unsigned long number of hugepages
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/compaction_test.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 4f42eb7d7636..0b249a06a60b 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,13 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
{
+ unsigned long nr_hugepages_ul;
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
- char nr_hugepages[10] = {0};
+ char initial_nr_hugepages[20] = {0};
+ char nr_hugepages[20] = {0};
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -134,7 +135,12 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
/* We should have been able to request at least 1/3 rd of the memory in
huge pages */
- compaction_index = mem_free/(atoi(nr_hugepages) * hugepage_size);
+ nr_hugepages_ul = strtoul(nr_hugepages, NULL, 10);
+ if (!nr_hugepages_ul) {
+ ksft_print_msg("ERROR: No memory is available as huge pages\n");
+ goto close_fd;
+ }
+ compaction_index = mem_free/(nr_hugepages_ul * hugepage_size);
lseek(fd, 0, SEEK_SET);
@@ -145,11 +151,11 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
- ksft_print_msg("Number of huge pages allocated = %d\n",
- atoi(nr_hugepages));
+ ksft_print_msg("Number of huge pages allocated = %lu\n",
+ nr_hugepages_ul);
if (compaction_index > 3) {
- ksft_print_msg("ERROR: Less that 1/%d of memory is available\n"
+ ksft_print_msg("ERROR: Less than 1/%d of memory is available\n"
"as huge pages\n", compaction_index);
goto close_fd;
}
--
2.34.1
In some scenarios, the DPT object gets shrunk but
the actual framebuffer did not and thus its still
there on the DPT's vm->bound_list. Then it tries to
rewrite the PTEs via a stale CPU mapping. This causes panic.
Credits-to: Ville Syrjala <ville.syrjala(a)linux.intel.com>
Shawn Lee <shawn.c.lee(a)intel.com>
Cc: stable(a)vger.kernel.org
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Signed-off-by: Vidya Srinivas <vidya.srinivas(a)intel.com>
---
drivers/gpu/drm/i915/gem/i915_gem_object.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 3560a062d287..e6b485fc54d4 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -284,7 +284,8 @@ bool i915_gem_object_has_iomem(const struct drm_i915_gem_object *obj);
static inline bool
i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj)
{
- return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE);
+ return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) &&
+ !obj->is_dpt;
}
static inline bool
--
2.34.1
When asn1_encode_sequence() fails, WARN is not the correct solution.
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information.
Cc: stable(a)vger.kernel.org # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
security/keys/trusted-keys/trusted_tpm2.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index c6882f5d094f..8b7dd73d94c1 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -84,8 +84,9 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
work1 = payload->blob;
work1 = asn1_encode_sequence(work1, work1 + sizeof(payload->blob),
scratch, work - scratch);
- if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed")) {
+ if (IS_ERR(work1)) {
ret = PTR_ERR(work1);
+ pr_err("BUG: ASN.1 encoder failed with %d\n", ret);
goto err;
}
--
2.45.1
The Elan eKTH5015M touch controller found on the Lenovo ThinkPad X13s
shares the VCC33 supply with other peripherals that may remain powered
during suspend (e.g. when enabled as wakeup sources).
The reset line is also wired so that it can be left deasserted when the
supply is off.
This is important as it avoids holding the controller in reset for
extended periods of time when it remains powered, which can lead to
increased power consumption, and also avoids leaking current through the
X13s reset circuitry during suspend (and after driver unbind).
Use the new 'no-reset-on-power-off' devicetree property to determine
when reset needs to be asserted on power down.
Notably this also avoids wasting power on machine variants without a
touchscreen for which the driver would otherwise exit probe with reset
asserted.
Fixes: bd3cba00dcc6 ("HID: i2c-hid: elan: Add support for Elan eKTH6915 i2c-hid touchscreens")
Cc: stable(a)vger.kernel.org # 6.0
Cc: Douglas Anderson <dianders(a)chromium.org>
Tested-by: Steev Klimaszewski <steev(a)kali.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/hid/i2c-hid/i2c-hid-of-elan.c | 59 +++++++++++++++++++++------
1 file changed, 47 insertions(+), 12 deletions(-)
diff --git a/drivers/hid/i2c-hid/i2c-hid-of-elan.c b/drivers/hid/i2c-hid/i2c-hid-of-elan.c
index 5b91fb106cfc..091e37933225 100644
--- a/drivers/hid/i2c-hid/i2c-hid-of-elan.c
+++ b/drivers/hid/i2c-hid/i2c-hid-of-elan.c
@@ -31,6 +31,7 @@ struct i2c_hid_of_elan {
struct regulator *vcc33;
struct regulator *vccio;
struct gpio_desc *reset_gpio;
+ bool no_reset_on_power_off;
const struct elan_i2c_hid_chip_data *chip_data;
};
@@ -40,17 +41,17 @@ static int elan_i2c_hid_power_up(struct i2chid_ops *ops)
container_of(ops, struct i2c_hid_of_elan, ops);
int ret;
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+
if (ihid_elan->vcc33) {
ret = regulator_enable(ihid_elan->vcc33);
if (ret)
- return ret;
+ goto err_deassert_reset;
}
ret = regulator_enable(ihid_elan->vccio);
- if (ret) {
- regulator_disable(ihid_elan->vcc33);
- return ret;
- }
+ if (ret)
+ goto err_disable_vcc33;
if (ihid_elan->chip_data->post_power_delay_ms)
msleep(ihid_elan->chip_data->post_power_delay_ms);
@@ -60,6 +61,15 @@ static int elan_i2c_hid_power_up(struct i2chid_ops *ops)
msleep(ihid_elan->chip_data->post_gpio_reset_on_delay_ms);
return 0;
+
+err_disable_vcc33:
+ if (ihid_elan->vcc33)
+ regulator_disable(ihid_elan->vcc33);
+err_deassert_reset:
+ if (ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 0);
+
+ return ret;
}
static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
@@ -67,7 +77,14 @@ static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
struct i2c_hid_of_elan *ihid_elan =
container_of(ops, struct i2c_hid_of_elan, ops);
- gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+ /*
+ * Do not assert reset when the hardware allows for it to remain
+ * deasserted regardless of the state of the (shared) power supply to
+ * avoid wasting power when the supply is left on.
+ */
+ if (!ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 1);
+
if (ihid_elan->chip_data->post_gpio_reset_off_delay_ms)
msleep(ihid_elan->chip_data->post_gpio_reset_off_delay_ms);
@@ -79,6 +96,7 @@ static void elan_i2c_hid_power_down(struct i2chid_ops *ops)
static int i2c_hid_of_elan_probe(struct i2c_client *client)
{
struct i2c_hid_of_elan *ihid_elan;
+ int ret;
ihid_elan = devm_kzalloc(&client->dev, sizeof(*ihid_elan), GFP_KERNEL);
if (!ihid_elan)
@@ -93,21 +111,38 @@ static int i2c_hid_of_elan_probe(struct i2c_client *client)
if (IS_ERR(ihid_elan->reset_gpio))
return PTR_ERR(ihid_elan->reset_gpio);
+ ihid_elan->no_reset_on_power_off = of_property_read_bool(client->dev.of_node,
+ "no-reset-on-power-off");
+
ihid_elan->vccio = devm_regulator_get(&client->dev, "vccio");
- if (IS_ERR(ihid_elan->vccio))
- return PTR_ERR(ihid_elan->vccio);
+ if (IS_ERR(ihid_elan->vccio)) {
+ ret = PTR_ERR(ihid_elan->vccio);
+ goto err_deassert_reset;
+ }
ihid_elan->chip_data = device_get_match_data(&client->dev);
if (ihid_elan->chip_data->main_supply_name) {
ihid_elan->vcc33 = devm_regulator_get(&client->dev,
ihid_elan->chip_data->main_supply_name);
- if (IS_ERR(ihid_elan->vcc33))
- return PTR_ERR(ihid_elan->vcc33);
+ if (IS_ERR(ihid_elan->vcc33)) {
+ ret = PTR_ERR(ihid_elan->vcc33);
+ goto err_deassert_reset;
+ }
}
- return i2c_hid_core_probe(client, &ihid_elan->ops,
- ihid_elan->chip_data->hid_descriptor_address, 0);
+ ret = i2c_hid_core_probe(client, &ihid_elan->ops,
+ ihid_elan->chip_data->hid_descriptor_address, 0);
+ if (ret)
+ goto err_deassert_reset;
+
+ return 0;
+
+err_deassert_reset:
+ if (ihid_elan->no_reset_on_power_off)
+ gpiod_set_value_cansleep(ihid_elan->reset_gpio, 0);
+
+ return ret;
}
static const struct elan_i2c_hid_chip_data elan_ekth6915_chip_data = {
--
2.43.2
If ath10k_snoc is built-in, while Qualcomm remoteprocs are built as
modules, compilation fails with:
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_init':
drivers/net/wireless/ath/ath10k/snoc.c:1534: undefined reference to `qcom_register_ssr_notifier'
/usr/bin/aarch64-linux-gnu-ld: drivers/net/wireless/ath/ath10k/snoc.o: in function `ath10k_modem_deinit':
drivers/net/wireless/ath/ath10k/snoc.c:1551: undefined reference to `qcom_unregister_ssr_notifier'
Add corresponding dependency to ATH10K_SNOC Kconfig entry so that it's
built as module if QCOM_RPROC_COMMON is built as module too.
Fixes: 747ff7d3d742 ("ath10k: Don't always treat modem stop events as crashes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
---
drivers/net/wireless/ath/ath10k/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/wireless/ath/ath10k/Kconfig b/drivers/net/wireless/ath/ath10k/Kconfig
index e6ea884cafc1..4f385f4a8cef 100644
--- a/drivers/net/wireless/ath/ath10k/Kconfig
+++ b/drivers/net/wireless/ath/ath10k/Kconfig
@@ -45,6 +45,7 @@ config ATH10K_SNOC
depends on ATH10K
depends on ARCH_QCOM || COMPILE_TEST
depends on QCOM_SMEM
+ depends on QCOM_RPROC_COMMON || QCOM_RPROC_COMMON=n
select QCOM_SCM
select QCOM_QMI_HELPERS
help
---
base-commit: 75fa778d74b786a1608d55d655d42b480a6fa8bd
change-id: 20240511-ath10k-snoc-dep-862a9da2e6bb
Best regards,
--
Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the
test, the following problems arise:
- The probability of the test getting OOM-killed increases.
Proof: The test wants to run on 80% of available memory to prevent
OOM-killing (see original code comments). Let the value of mem_free at the
start of the test, when nr_hugepages = 0, be x. In the other case, when
nr_hugepages > 0, let the memory consumed by hugepages be y. In the former
case, the test operates on 0.8 * x of memory. In the latter, the test
operates on 0.8 * (x - y) of memory, with y already filled, hence, memory
consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 * x. Q.E.D
- The probability of a bogus test success increases.
Proof: Let the memory consumed by hugepages be greater than 25% of x,
with x and y defined as above. The definition of compaction_index is
c_index = (x - y)/z where z is the memory consumed by hugepages after
trying to increase them again. In check_compaction(), we set the number
of hugepages to zero, and then increase them back; the probability that
they will be set back to consume at least y amount of memory again is
very high (since there is not much delay between the two attempts of
changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption).
Therefore,
c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3
hence, c_index can always be forced to be less than 3, thereby the test
succeeding always. Q.E.D
NOTE: This patch depends on the previous one.
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
tools/testing/selftests/mm/compaction_test.c | 72 ++++++++++++++------
1 file changed, 50 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index c5be395f8363..2ae059989771 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -82,12 +82,15 @@ int prereq(void)
return -1;
}
-int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
+int check_compaction(unsigned long mem_free, unsigned int hugepage_size,
+ int initial_nr_hugepages)
{
int fd, ret = -1;
int compaction_index = 0;
- char initial_nr_hugepages[10] = {0};
char nr_hugepages[10] = {0};
+ char init_nr_hugepages[10] = {0};
+
+ sprintf(init_nr_hugepages, "%d", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes
in to play */
@@ -101,23 +104,6 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto out;
}
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) {
- ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
- /* Start with the initial condition of 0 huge pages*/
- if (write(fd, "0", sizeof(char)) != sizeof(char)) {
- ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
- goto close_fd;
- }
-
- lseek(fd, 0, SEEK_SET);
-
/* Request a large number of huge pages. The Kernel will allocate
as much as it can */
if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
@@ -140,8 +126,8 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages))
- != strlen(initial_nr_hugepages)) {
+ if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages))
+ != strlen(init_nr_hugepages)) {
ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n",
strerror(errno));
goto close_fd;
@@ -165,6 +151,42 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
return ret;
}
+int set_zero_hugepages(int *initial_nr_hugepages)
+{
+ int fd, ret = -1;
+ char nr_hugepages[10] = {0};
+
+ fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK);
+ if (fd < 0) {
+ ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto out;
+ }
+
+ if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) {
+ ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ lseek(fd, 0, SEEK_SET);
+
+ /* Start with the initial condition of 0 huge pages */
+ if (write(fd, "0", sizeof(char)) != sizeof(char)) {
+ ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
+ strerror(errno));
+ goto close_fd;
+ }
+
+ *initial_nr_hugepages = atoi(nr_hugepages);
+ ret = 0;
+
+ close_fd:
+ close(fd);
+
+ out:
+ return ret;
+}
int main(int argc, char **argv)
{
@@ -175,6 +197,7 @@ int main(int argc, char **argv)
unsigned long mem_free = 0;
unsigned long hugepage_size = 0;
long mem_fragmentable_MB = 0;
+ int initial_nr_hugepages;
ksft_print_header();
@@ -183,6 +206,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* start the test without hugepages reducing mem_free */
+ if (set_zero_hugepages(&initial_nr_hugepages))
+ return ksft_exit_fail();
+
lim.rlim_cur = RLIM_INFINITY;
lim.rlim_max = RLIM_INFINITY;
if (setrlimit(RLIMIT_MEMLOCK, &lim))
@@ -226,7 +253,8 @@ int main(int argc, char **argv)
entry = entry->next;
}
- if (check_compaction(mem_free, hugepage_size) == 0)
+ if (check_compaction(mem_free, hugepage_size,
+ initial_nr_hugepages) == 0)
return ksft_exit_pass();
return ksft_exit_fail();
--
2.30.2
nr_hugepages is not set to zero because the file offset has not been reset
after read(). Fix that using lseek().
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dev Jain <dev.jain(a)arm.com>
---
Merge dependency: https://lore.kernel.org/all/20240513082842.4117782-1-dev.jain@arm.com/
Andrew, does it sound reasonable to have the fixes tag in the above
patch too, along with this series?
tools/testing/selftests/mm/compaction_test.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c
index 533999b6c284..c5be395f8363 100644
--- a/tools/testing/selftests/mm/compaction_test.c
+++ b/tools/testing/selftests/mm/compaction_test.c
@@ -107,6 +107,8 @@ int check_compaction(unsigned long mem_free, unsigned int hugepage_size)
goto close_fd;
}
+ lseek(fd, 0, SEEK_SET);
+
/* Start with the initial condition of 0 huge pages*/
if (write(fd, "0", sizeof(char)) != sizeof(char)) {
ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n",
--
2.30.2
When asn1_encode_sequence() fails, WARN is not the correct solution.
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information.
Cc: stable(a)vger.kernel.org # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
security/keys/trusted-keys/trusted_tpm2.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index c6882f5d094f..8b7dd73d94c1 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -84,8 +84,9 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
work1 = payload->blob;
work1 = asn1_encode_sequence(work1, work1 + sizeof(payload->blob),
scratch, work - scratch);
- if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed")) {
+ if (IS_ERR(work1)) {
ret = PTR_ERR(work1);
+ pr_err("BUG: ASN.1 encoder failed with %d\n", ret);
goto err;
}
--
2.45.1
The patch titled
Subject: mm/huge_memory: don't unpoison huge_zero_folio
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-dont-unpoison-huge_zero_folio.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/huge_memory: don't unpoison huge_zero_folio
Date: Thu, 16 May 2024 20:26:08 +0800
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_folio
without increasing the folio refcnt. But then unpoison_memory() will
decrease the folio refcnt unexpectedly as it appears like a successfully
hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when
releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue.
We're not prepared to unpoison huge_zero_folio yet.
Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Yang Shi <shy828301(a)gmail.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/mm/memory-failure.c~mm-huge_memory-dont-unpoison-huge_zero_folio
+++ a/mm/memory-failure.c
@@ -2546,6 +2546,13 @@ int unpoison_memory(unsigned long pfn)
goto unlock_mutex;
}
+ if (is_huge_zero_folio(folio)) {
+ unpoison_pr_info("Unpoison: huge zero page is not supported %#lx\n",
+ pfn, &unpoison_rs);
+ ret = -EOPNOTSUPP;
+ goto unlock_mutex;
+ }
+
if (!PageHWPoison(p)) {
unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n",
pfn, &unpoison_rs);
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-huge_memory-dont-unpoison-huge_zero_folio.patch
If the circular buffer is empty, it just means we fit all characters to
send into the HW fifo, but not that the hardware finished transmitting
them.
So if we immediately call stop_tx() after that, this may abort any
pending characters in the HW fifo, and cause dropped characters on the
console.
Fix this by only stopping tx when the tx HW fifo is actually empty.
Fixes: 8275b48b2780 ("tty: serial: introduce transmit helpers")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jonas Gorski <jonas.gorski(a)gmail.com>
---
(this is v2 of the bcm63xx-uart fix attempt)
v1 -> v2
* replace workaround with fix for core issue
* add Cc: for stable
I'm somewhat confident this is the core issue causing the broken output
with bcm63xx-uart, and there is no actual need for the UART_TX_NOSTOP.
I wouldn't be surprised if this also fixes mxs-uart for which
UART_TX_NOSTOP was introduced.
If it does, there is no need for the flag anymore.
include/linux/serial_core.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h
index 55b1f3ba48ac..bb0f2d4ac62f 100644
--- a/include/linux/serial_core.h
+++ b/include/linux/serial_core.h
@@ -786,7 +786,8 @@ enum UART_TX_FLAGS {
if (pending < WAKEUP_CHARS) { \
uart_write_wakeup(__port); \
\
- if (!((flags) & UART_TX_NOSTOP) && pending == 0) \
+ if (!((flags) & UART_TX_NOSTOP) && pending == 0 && \
+ __port->ops->tx_empty(__port)) \
__port->ops->stop_tx(__port); \
} \
\
--
2.34.1
Hi stable team,
Please backport the mainline commit
bf202e654bfa ("cpufreq: amd-pstate: fix the highest frequency issue which limits performance")
to the 6.9 stable series.
It fixes a performance regression on AMD Phoenix platforms.
It was meant to get into the 6.9 release or the stable branch shortly
after, but apparently that didn't happen.
On 2024-05-08 13:47:03+0000, Perry Yuan wrote:
> To address the performance drop issue, an optimization has been
> implemented. The incorrect highest performance value previously set by the
> low-level power firmware for AMD CPUs with Family ID 0x19 and Model ID
> ranging from 0x70 to 0x7F series has been identified as the cause.
>
> To resolve this, a check has been implemented to accurately determine the
> CPU family and model ID. The correct highest performance value is now set
> and the performance drop caused by the incorrect highest performance value
> are eliminated.
>
> Before the fix, the highest frequency was set to 4200MHz, now it is set
> to 4971MHz which is correct.
>
> CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
> 0 0 0 0 0:0:0:0 yes 4971.0000 400.0000 400.0000
> 1 0 0 0 0:0:0:0 yes 4971.0000 400.0000 400.0000
> 2 0 0 1 1:1:1:0 yes 4971.0000 400.0000 4865.8140
> 3 0 0 1 1:1:1:0 yes 4971.0000 400.0000 400.0000
>
> v1->v2:
> * add test by flag from Gaha Bana
>
> Fixes: f3a052391822 ("cpufreq: amd-pstate: Enable amd-pstate preferred core support")
> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218759
> Signed-off-by: Perry Yuan <perry.yuan(a)amd.com>
> Co-developed-by: Mario Limonciello <mario.limonciello(a)amd.com>
> Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
> Tested-by: Gaha Bana <gahabana(a)gmail.com>
> ---
> drivers/cpufreq/amd-pstate.c | 22 +++++++++++++++++++---
> 1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c
> index 2db095867d03..6a342b0c0140 100644
> --- a/drivers/cpufreq/amd-pstate.c
> +++ b/drivers/cpufreq/amd-pstate.c
> @@ -50,7 +50,8 @@
>
> #define AMD_PSTATE_TRANSITION_LATENCY 20000
> #define AMD_PSTATE_TRANSITION_DELAY 1000
> -#define AMD_PSTATE_PREFCORE_THRESHOLD 166
> +#define CPPC_HIGHEST_PERF_PERFORMANCE 196
> +#define CPPC_HIGHEST_PERF_DEFAULT 166
>
> /*
> * TODO: We need more time to fine tune processors with shared memory solution
> @@ -326,6 +327,21 @@ static inline int amd_pstate_enable(bool enable)
> return static_call(amd_pstate_enable)(enable);
> }
>
> +static u32 amd_pstate_highest_perf_set(struct amd_cpudata *cpudata)
> +{
> + struct cpuinfo_x86 *c = &cpu_data(0);
> +
> + /*
> + * For AMD CPUs with Family ID 19H and Model ID range 0x70 to 0x7f,
> + * the highest performance level is set to 196.
> + * https://bugzilla.kernel.org/show_bug.cgi?id=218759
> + */
> + if (c->x86 == 0x19 && (c->x86_model >= 0x70 && c->x86_model <= 0x7f))
> + return CPPC_HIGHEST_PERF_PERFORMANCE;
> +
> + return CPPC_HIGHEST_PERF_DEFAULT;
> +}
> +
> static int pstate_init_perf(struct amd_cpudata *cpudata)
> {
> u64 cap1;
> @@ -342,7 +358,7 @@ static int pstate_init_perf(struct amd_cpudata *cpudata)
> * the default max perf.
> */
> if (cpudata->hw_prefcore)
> - highest_perf = AMD_PSTATE_PREFCORE_THRESHOLD;
> + highest_perf = amd_pstate_highest_perf_set(cpudata);
> else
> highest_perf = AMD_CPPC_HIGHEST_PERF(cap1);
>
> @@ -366,7 +382,7 @@ static int cppc_init_perf(struct amd_cpudata *cpudata)
> return ret;
>
> if (cpudata->hw_prefcore)
> - highest_perf = AMD_PSTATE_PREFCORE_THRESHOLD;
> + highest_perf = amd_pstate_highest_perf_set(cpudata);
> else
> highest_perf = cppc_perf.highest_perf;
>
> --
> 2.34.1
>
When asn1_encode_sequence() fails, WARN is not the correct solution.
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information.
Cc: stable(a)vger.kernel.org # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
security/keys/trusted-keys/trusted_tpm2.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index dfeec06301ce..dbdd6a318b8b 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -38,6 +38,7 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
u8 *end_work = scratch + SCRATCH_SIZE;
u8 *priv, *pub;
u16 priv_len, pub_len;
+ int ret;
priv_len = get_unaligned_be16(src) + 2;
priv = src;
@@ -79,8 +80,11 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
work1 = payload->blob;
work1 = asn1_encode_sequence(work1, work1 + sizeof(payload->blob),
scratch, work - scratch);
- if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed"))
- return PTR_ERR(work1);
+ if (IS_ERR(work1)) {
+ ret = PTR_ERR(work1);
+ pr_err("ASN.1 encode error %d\n", ret);
+ return ret;
+ }
return work1 - payload->blob;
}
--
2.45.1
When asn1_encode_sequence() fails, WARN is not the correct solution.
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information.
Cc: stable(a)vger.kernel.org # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
security/keys/trusted-keys/trusted_tpm2.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index dfeec06301ce..dbdd6a318b8b 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -38,6 +38,7 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
u8 *end_work = scratch + SCRATCH_SIZE;
u8 *priv, *pub;
u16 priv_len, pub_len;
+ int ret;
priv_len = get_unaligned_be16(src) + 2;
priv = src;
@@ -79,8 +80,11 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
work1 = payload->blob;
work1 = asn1_encode_sequence(work1, work1 + sizeof(payload->blob),
scratch, work - scratch);
- if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed"))
- return PTR_ERR(work1);
+ if (IS_ERR(work1)) {
+ ret = PTR_ERR(work1);
+ pr_err("ASN.1 encode error %d\n", ret);
+ return ret;
+ }
return work1 - payload->blob;
}
--
2.45.1
Notice...
Your fund that was stopped from completion has been released and ready to be transferred indicate if this email id is active for more details
Regards
Mr. Rowland Cole
From: NeilBrown <neilb(a)suse.de>
[ Upstream commit 3903902401451b1cd9d797a8c79769eb26ac7fe5 ]
The original implementation of nfsd used signals to stop threads during
shutdown.
In Linux 2.3.46pre5 nfsd gained the ability to shutdown threads
internally it if was asked to run "0" threads. After this user-space
transitioned to using "rpc.nfsd 0" to stop nfsd and sending signals to
threads was no longer an important part of the API.
In commit 3ebdbe5203a8 ("SUNRPC: discard svo_setup and rename
svc_set_num_threads_sync()") (v5.17-rc1~75^2~41) we finally removed the
use of signals for stopping threads, using kthread_stop() instead.
This patch makes the "obvious" next step and removes the ability to
signal nfsd threads - or any svc threads. nfsd stops allowing signals
and we don't check for their delivery any more.
This will allow for some simplification in later patches.
A change worth noting is in nfsd4_ssc_setup_dul(). There was previously
a signal_pending() check which would only succeed when the thread was
being shut down. It should really have tested kthread_should_stop() as
well. Now it just does the latter, not the former.
Signed-off-by: NeilBrown <neilb(a)suse.de>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/nfs/callback.c | 9 +--------
fs/nfsd/nfs4proc.c | 5 ++---
fs/nfsd/nfssvc.c | 12 ------------
net/sunrpc/svc_xprt.c | 16 ++++++----------
4 files changed, 9 insertions(+), 33 deletions(-)
Greg, Sasha - This is the third resend for this fix. Why isn't it
applied to origin/linux-5.15.y yet?
diff --git a/fs/nfs/callback.c b/fs/nfs/callback.c
index 456af7d230cf..46a0a2d6962e 100644
--- a/fs/nfs/callback.c
+++ b/fs/nfs/callback.c
@@ -80,9 +80,6 @@ nfs4_callback_svc(void *vrqstp)
set_freezable();
while (!kthread_freezable_should_stop(NULL)) {
-
- if (signal_pending(current))
- flush_signals(current);
/*
* Listen for a request on the socket
*/
@@ -112,11 +109,7 @@ nfs41_callback_svc(void *vrqstp)
set_freezable();
while (!kthread_freezable_should_stop(NULL)) {
-
- if (signal_pending(current))
- flush_signals(current);
-
- prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&serv->sv_cb_waitq, &wq, TASK_IDLE);
spin_lock_bh(&serv->sv_cb_lock);
if (!list_empty(&serv->sv_cb_list)) {
req = list_first_entry(&serv->sv_cb_list,
diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index c14f5ac1484c..6779291efca9 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1317,12 +1317,11 @@ static __be32 nfsd4_ssc_setup_dul(struct nfsd_net *nn, char *ipaddr,
/* found a match */
if (ni->nsui_busy) {
/* wait - and try again */
- prepare_to_wait(&nn->nfsd_ssc_waitq, &wait,
- TASK_INTERRUPTIBLE);
+ prepare_to_wait(&nn->nfsd_ssc_waitq, &wait, TASK_IDLE);
spin_unlock(&nn->nfsd_ssc_lock);
/* allow 20secs for mount/unmount for now - revisit */
- if (signal_pending(current) ||
+ if (kthread_should_stop() ||
(schedule_timeout(20*HZ) == 0)) {
finish_wait(&nn->nfsd_ssc_waitq, &wait);
kfree(work);
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c
index 4c1a0a1623e5..3d4fd40c987b 100644
--- a/fs/nfsd/nfssvc.c
+++ b/fs/nfsd/nfssvc.c
@@ -938,15 +938,6 @@ nfsd(void *vrqstp)
current->fs->umask = 0;
- /*
- * thread is spawned with all signals set to SIG_IGN, re-enable
- * the ones that will bring down the thread
- */
- allow_signal(SIGKILL);
- allow_signal(SIGHUP);
- allow_signal(SIGINT);
- allow_signal(SIGQUIT);
-
atomic_inc(&nfsdstats.th_cnt);
set_freezable();
@@ -971,9 +962,6 @@ nfsd(void *vrqstp)
validate_process_creds();
}
- /* Clear signals before calling svc_exit_thread() */
- flush_signals(current);
-
atomic_dec(&nfsdstats.th_cnt);
out:
diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 67ccf1a6459a..b19592673eef 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -700,8 +700,8 @@ static int svc_alloc_arg(struct svc_rqst *rqstp)
/* Made progress, don't sleep yet */
continue;
- set_current_state(TASK_INTERRUPTIBLE);
- if (signalled() || kthread_should_stop()) {
+ set_current_state(TASK_IDLE);
+ if (kthread_should_stop()) {
set_current_state(TASK_RUNNING);
return -EINTR;
}
@@ -736,7 +736,7 @@ rqst_should_sleep(struct svc_rqst *rqstp)
return false;
/* are we shutting down? */
- if (signalled() || kthread_should_stop())
+ if (kthread_should_stop())
return false;
/* are we freezing? */
@@ -758,11 +758,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
if (rqstp->rq_xprt)
goto out_found;
- /*
- * We have to be able to interrupt this wait
- * to bring down the daemons ...
- */
- set_current_state(TASK_INTERRUPTIBLE);
+ set_current_state(TASK_IDLE);
smp_mb__before_atomic();
clear_bit(SP_CONGESTED, &pool->sp_flags);
clear_bit(RQ_BUSY, &rqstp->rq_flags);
@@ -784,7 +780,7 @@ static struct svc_xprt *svc_get_next_xprt(struct svc_rqst *rqstp, long timeout)
if (!time_left)
atomic_long_inc(&pool->sp_stats.threads_timedout);
- if (signalled() || kthread_should_stop())
+ if (kthread_should_stop())
return ERR_PTR(-EINTR);
return ERR_PTR(-EAGAIN);
out_found:
@@ -882,7 +878,7 @@ int svc_recv(struct svc_rqst *rqstp, long timeout)
try_to_freeze();
cond_resched();
err = -EINTR;
- if (signalled() || kthread_should_stop())
+ if (kthread_should_stop())
goto out;
xprt = svc_get_next_xprt(rqstp, timeout);
base-commit: 284087d4f7d57502b5a41b423b771f16cc6d157a
--
2.44.0
Zoned devices request sequential writing on the same zone. That means
if 2 requests on the saem zone, the lower pos request need to dispatch
to device first.
While different priority has it's own tree & list, request with high
priority will be disptch first.
So if requestA & requestB are on the same zone. RequestA is BE and pos
is X+0. ReqeustB is RT and pos is X+1. RequestB will be disptched before
requestA, which got an ERROR from zoned device.
This is found in a practice scenario when using F2FS on zoned device.
And it is very easy to reproduce:
1. Use fsstress to run 8 test processes
2. Use ionice to change 4/8 processes to RT priority
Fixes: c807ab520fc3 ("block/mq-deadline: Add I/O priority support")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Wu Bo <bo.wu(a)vivo.com>
---
block/mq-deadline.c | 31 +++++++++++++++++++++++++++++++
include/linux/blk-mq.h | 15 +++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/block/mq-deadline.c b/block/mq-deadline.c
index 02a916ba62ee..6a05dd86e8ca 100644
--- a/block/mq-deadline.c
+++ b/block/mq-deadline.c
@@ -539,6 +539,37 @@ static struct request *__dd_dispatch_request(struct deadline_data *dd,
if (started_after(dd, rq, latest_start))
return NULL;
+ if (!blk_rq_is_seq_zoned_write(rq))
+ goto skip_check;
+ /*
+ * To ensure sequential writing, check the lower priority class to see
+ * if there is a request on the same zone and need to be dispatched
+ * first
+ */
+ ioprio_class = dd_rq_ioclass(rq);
+ prio = ioprio_class_to_prio[ioprio_class];
+ prio++;
+ for (; prio <= DD_PRIO_MAX; prio++) {
+ struct request *temp_rq;
+ unsigned long flags;
+ bool can_dispatch;
+
+ if (!dd_queued(dd, prio))
+ continue;
+
+ temp_rq = deadline_from_pos(&dd->per_prio[prio], data_dir, blk_rq_pos(rq));
+ if (temp_rq && blk_req_zone_in_one(temp_rq, rq) &&
+ blk_rq_pos(temp_rq) < blk_rq_pos(rq)) {
+ spin_lock_irqsave(&dd->zone_lock, flags);
+ can_dispatch = blk_req_can_dispatch_to_zone(temp_rq);
+ spin_unlock_irqrestore(&dd->zone_lock, flags);
+ if (!can_dispatch)
+ return NULL;
+ rq = temp_rq;
+ per_prio = &dd->per_prio[prio];
+ }
+ }
+skip_check:
/*
* rq is the selected appropriate request.
*/
diff --git a/include/linux/blk-mq.h b/include/linux/blk-mq.h
index d3d8fd8e229b..bca1e639e0f3 100644
--- a/include/linux/blk-mq.h
+++ b/include/linux/blk-mq.h
@@ -1202,6 +1202,15 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
return true;
return !blk_req_zone_is_write_locked(rq);
}
+
+static inline bool blk_req_zone_in_one(struct request *rq_a,
+ struct request *rq_b)
+{
+ unsigned int zone_sectors = rq_a->q->limits.chunk_sectors;
+
+ return round_down(blk_rq_pos(rq_a), zone_sectors) ==
+ round_down(blk_rq_pos(rq_b), zone_sectors);
+}
#else /* CONFIG_BLK_DEV_ZONED */
static inline bool blk_rq_is_seq_zoned_write(struct request *rq)
{
@@ -1229,6 +1238,12 @@ static inline bool blk_req_can_dispatch_to_zone(struct request *rq)
{
return true;
}
+
+static inline bool blk_req_zone_in_one(struct request *rq_a,
+ struct request *rq_b)
+{
+ return false;
+}
#endif /* CONFIG_BLK_DEV_ZONED */
#endif /* BLK_MQ_H */
--
2.35.3
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Under some circumstances it may happen that the ks8851 Ethernet driver
stops sending data.
Currently the interrupt handler resets the interrupt status flags in the
hardware after handling TX. With this approach we may lose interrupts in
the time window between handling the TX interrupt and resetting the TX
interrupt status bit.
When all of the three following conditions are true then transmitting
data stops:
- TX queue is stopped to wait for room in the hardware TX buffer
- no queued SKBs in the driver (txq) that wait for being written to hw
- hardware TX buffer is empty and the last TX interrupt was lost
This is because reenabling the TX queue happens when handling the TX
interrupt status but if the TX status bit has already been cleared then
this interrupt will never come.
With this commit the interrupt status flags will be cleared before they
are handled. That way we stop losing interrupts.
The wrong handling of the ISR flags was there from the beginning but
with commit 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX
buffer overrun") the issue becomes apparent.
Fixes: 3dc5d4454545 ("net: ks8851: Fix TX stall caused by TX buffer overrun")
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Simon Horman <horms(a)kernel.org>
Cc: netdev(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
---
drivers/net/ethernet/micrel/ks8851_common.c | 18 +-----------------
1 file changed, 1 insertion(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/micrel/ks8851_common.c b/drivers/net/ethernet/micrel/ks8851_common.c
index 502518cdb461..6453c92f0fa7 100644
--- a/drivers/net/ethernet/micrel/ks8851_common.c
+++ b/drivers/net/ethernet/micrel/ks8851_common.c
@@ -328,7 +328,6 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
{
struct ks8851_net *ks = _ks;
struct sk_buff_head rxq;
- unsigned handled = 0;
unsigned long flags;
unsigned int status;
struct sk_buff *skb;
@@ -336,24 +335,17 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
ks8851_lock(ks, &flags);
status = ks8851_rdreg16(ks, KS_ISR);
+ ks8851_wrreg16(ks, KS_ISR, status);
netif_dbg(ks, intr, ks->netdev,
"%s: status 0x%04x\n", __func__, status);
- if (status & IRQ_LCI)
- handled |= IRQ_LCI;
-
if (status & IRQ_LDI) {
u16 pmecr = ks8851_rdreg16(ks, KS_PMECR);
pmecr &= ~PMECR_WKEVT_MASK;
ks8851_wrreg16(ks, KS_PMECR, pmecr | PMECR_WKEVT_LINK);
-
- handled |= IRQ_LDI;
}
- if (status & IRQ_RXPSI)
- handled |= IRQ_RXPSI;
-
if (status & IRQ_TXI) {
unsigned short tx_space = ks8851_rdreg16(ks, KS_TXMIR);
@@ -365,20 +357,12 @@ static irqreturn_t ks8851_irq(int irq, void *_ks)
if (netif_queue_stopped(ks->netdev))
netif_wake_queue(ks->netdev);
spin_unlock(&ks->statelock);
-
- handled |= IRQ_TXI;
}
- if (status & IRQ_RXI)
- handled |= IRQ_RXI;
-
if (status & IRQ_SPIBEI) {
netdev_err(ks->netdev, "%s: spi bus error\n", __func__);
- handled |= IRQ_SPIBEI;
}
- ks8851_wrreg16(ks, KS_ISR, handled);
-
if (status & IRQ_RXI) {
/* the datasheet says to disable the rx interrupt during
* packet read-out, however we're masking the interrupt
--
2.45.0
The initial change to set x86_virt_bits to the correct value straight
away broke boot on Intel Quark X1000 CPUs (which are family 5, model 9,
stepping 0)
With deeper investigation it appears that the Quark doesn't have
the bit 19 set in 0x01 CPUID leaf, which means it doesn't provide
any clflush instructions and hence the cache alignment is set to 0.
The actual cache line size is 16 bytes, hence we may set the alignment
accordingly. At the same time the physical and virtual address bits
are retrieved via 0x80000008 CPUID leaf.
Note, we don't really care about the value of x86_clflush_size as it
is either used with a proper check for the instruction to be present,
or, like in PCI case, it assumes 32 bytes for all supported 32-bit CPUs
that have actually smaller cache line sizes and don't advertise it.
The commit fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct
value straight away, instead of a two-phase approach") basically
revealed the issue that has been present from day 1 of introducing
the Quark support.
Fixes: aece118e487a ("x86: Add cpu_detect_cache_sizes to init_intel() add Quark legacy_cache()")
Cc: stable(a)vger.kernel.org
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
---
arch/x86/kernel/cpu/intel.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index be30d7fa2e66..2bffae158dd5 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -321,6 +321,15 @@ static void early_init_intel(struct cpuinfo_x86 *c)
#ifdef CONFIG_X86_64
set_cpu_cap(c, X86_FEATURE_SYSENTER32);
#else
+ /*
+ * The Quark doesn't have bit 19 set in 0x01 CPUID leaf, which means
+ * it doesn't provide any clflush instructions and hence the cache
+ * alignment is set to 0. The actual cache line size is 16 bytes,
+ * hence set the alignment accordingly. At the same time the physical
+ * and virtual address bits are retrieved via 0x80000008 CPUID leaf.
+ */
+ if (c->x86 == 5 && c->x86_model == 9)
+ c->x86_cache_alignment = 16;
/* Netburst reports 64 bytes clflush size, but does IO in 128 bytes */
if (c->x86 == 15 && c->x86_cache_alignment == 64)
c->x86_cache_alignment = 128;
--
2.43.0.rc1.1336.g36b5255a03ac
I'm announcing the release of the 6.9.1 kernel.
All users of the 6.9 kernel series must upgrade.
The updated 6.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/dma/idxd/cdev.c | 77 +++++++++++++++++++++++
drivers/dma/idxd/idxd.h | 3
drivers/dma/idxd/init.c | 4 +
drivers/dma/idxd/registers.h | 3
drivers/dma/idxd/sysfs.c | 27 +++++++-
drivers/net/wireless/mediatek/mt76/mt7915/main.c | 4 +
drivers/vfio/pci/vfio_pci.c | 2
include/linux/pci_ids.h | 2
security/keys/key.c | 3
10 files changed, 120 insertions(+), 7 deletions(-)
Arjan van de Ven (2):
VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
dmaengine: idxd: add a new security check to deal with a hardware erratum
Ben Greear (1):
wifi: mt76: mt7915: add missing chanctx ops
Greg Kroah-Hartman (1):
Linux 6.9.1
Nikhil Rao (1):
dmaengine: idxd: add a write() method for applications to submit work
Silvio Gissi (1):
keys: Fix overwrite of key expiration on instantiation
PERST# is active low according to the PCIe specification.
However, the existing pcie-dw-rockchip.c driver does:
gpiod_set_value(..., 0); msleep(100); gpiod_set_value(..., 1);
When asserting + deasserting PERST#.
This is of course wrong, but because all the device trees for this
compatible string have also incorrectly marked this GPIO as ACTIVE_HIGH:
$ git grep -B 10 reset-gpios arch/arm64/boot/dts/rockchip/rk3568*
$ git grep -B 10 reset-gpios arch/arm64/boot/dts/rockchip/rk3588*
The actual toggling of PERST# is correct.
(And we cannot change it anyway, since that would break device tree
compatibility.)
However, this driver does request the GPIO to be initialized as
GPIOD_OUT_HIGH, which does cause a silly sequence where PERST# gets
toggled back and forth for no good reason.
Fix this by requesting the GPIO to be initialized as GPIOD_OUT_LOW
(which for this driver means PERST# asserted).
This will avoid an unnecessary signal change where PERST# gets deasserted
(by devm_gpiod_get_optional()) and then gets asserted
(by rockchip_pcie_start_link()) just a few instructions later.
Before patch, debug prints on EP side, when booting RC:
[ 845.606810] pci: PERST# asserted by host!
[ 852.483985] pci: PERST# de-asserted by host!
[ 852.503041] pci: PERST# asserted by host!
[ 852.610318] pci: PERST# de-asserted by host!
After patch, debug prints on EP side, when booting RC:
[ 125.107921] pci: PERST# asserted by host!
[ 132.111429] pci: PERST# de-asserted by host!
This extra, very short, PERST# assertion + deassertion has been reported
to cause issues with certain WLAN controllers, e.g. RTL8822CE.
Fixes: 0e898eb8df4e ("PCI: rockchip-dwc: Add Rockchip RK356X host controller driver")
Tested-by: Jianfeng Liu <liujianfeng1994(a)gmail.com>
Tested-by: Heiko Stuebner <heiko(a)sntech.de>
Signed-off-by: Niklas Cassel <cassel(a)kernel.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Cc: stable(a)vger.kernel.org # 5.15+
---
Changes since v2:
-Picked up tag from Heiko.
-Change subject (Bjorn).
-s/PERST/PERST#/ (Bjorn).
drivers/pci/controller/dwc/pcie-dw-rockchip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/controller/dwc/pcie-dw-rockchip.c b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
index d6842141d384..a909e42b4273 100644
--- a/drivers/pci/controller/dwc/pcie-dw-rockchip.c
+++ b/drivers/pci/controller/dwc/pcie-dw-rockchip.c
@@ -240,7 +240,7 @@ static int rockchip_pcie_resource_get(struct platform_device *pdev,
return PTR_ERR(rockchip->apb_base);
rockchip->rst_gpio = devm_gpiod_get_optional(&pdev->dev, "reset",
- GPIOD_OUT_HIGH);
+ GPIOD_OUT_LOW);
if (IS_ERR(rockchip->rst_gpio))
return PTR_ERR(rockchip->rst_gpio);
--
2.44.0
Rockchip platforms use 'GPIO_ACTIVE_HIGH' flag in the devicetree definition
for ep_gpio. This means, whatever the logical value set by the driver for
the ep_gpio, physical line will output the same logic level.
For instance,
gpiod_set_value_cansleep(rockchip->ep_gpio, 0); --> Level low
gpiod_set_value_cansleep(rockchip->ep_gpio, 1); --> Level high
But while requesting the ep_gpio, GPIOD_OUT_HIGH flag is currently used.
Now, this also causes the physical line to output 'high' creating trouble
for endpoint devices during host reboot.
When host reboot happens, the ep_gpio will initially output 'low' due to
the GPIO getting reset to its POR value. Then during host controller probe,
it will output 'high' due to GPIOD_OUT_HIGH flag. Then during
rockchip_pcie_host_init_port(), it will first output 'low' and then 'high'
indicating the completion of controller initialization.
On the endpoint side, each output 'low' of ep_gpio is accounted for PERST#
assert and 'high' for PERST# deassert. With the above mentioned flow during
host reboot, endpoint will witness below state changes for PERST#:
(1) PERST# assert - GPIO POR state
(2) PERST# deassert - GPIOD_OUT_HIGH while requesting GPIO
(3) PERST# assert - rockchip_pcie_host_init_port()
(4) PERST# deassert - rockchip_pcie_host_init_port()
Now the time interval between (2) and (3) is very short as both happen
during the driver probe(), and this results in a race in the endpoint.
Because, before completing the PERST# deassertion in (2), endpoint got
another PERST# assert in (3).
A proper way to fix this issue is to change the GPIOD_OUT_HIGH flag in (2)
to GPIOD_OUT_LOW. Because the usual convention is to request the GPIO with
a state corresponding to its 'initial/default' value and let the driver
change the state of the GPIO when required.
As per that, the ep_gpio should be requested with GPIOD_OUT_LOW as it
corresponds to the POR value of '0' (PERST# assert in the endpoint). Then
the driver can change the state of the ep_gpio later in
rockchip_pcie_host_init_port() as per the initialization sequence.
This fixes the firmware crash issue in Qcom based modems connected to
Rockpro64 based board.
Cc: <stable(a)vger.kernel.org> # 4.9
Reported-by: Slark Xiao <slark_xiao(a)163.com>
Closes: https://lore.kernel.org/mhi/20240402045647.GG2933@thinkpad/
Fixes: e77f847df54c ("PCI: rockchip: Add Rockchip PCIe controller support")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
drivers/pci/controller/pcie-rockchip.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pci/controller/pcie-rockchip.c b/drivers/pci/controller/pcie-rockchip.c
index 0ef2e622d36e..c07d7129f1c7 100644
--- a/drivers/pci/controller/pcie-rockchip.c
+++ b/drivers/pci/controller/pcie-rockchip.c
@@ -121,7 +121,7 @@ int rockchip_pcie_parse_dt(struct rockchip_pcie *rockchip)
if (rockchip->is_rc) {
rockchip->ep_gpio = devm_gpiod_get_optional(dev, "ep",
- GPIOD_OUT_HIGH);
+ GPIOD_OUT_LOW);
if (IS_ERR(rockchip->ep_gpio))
return dev_err_probe(dev, PTR_ERR(rockchip->ep_gpio),
"failed to get ep GPIO\n");
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240416-pci-rockchip-perst-fix-88c922621d9a
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Remove wrong mask on subsys_vendor_id. Both the Vendor ID and Subsystem
Vendor ID are u16 variables and are written to a u32 register of the
controller. The Subsystem Vendor ID was always 0 because the u16 value
was masked incorrectly with GENMASK(31,16) resulting in all lower 16
bits being set to 0 prior to the shift.
Remove both masks as they are unnecessary and set the register correctly
i.e., the lower 16-bits are the Vendor ID and the upper 16-bits are the
Subsystem Vendor ID.
This is documented in the RK3399 TRM section 17.6.7.1.17
Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller")
Signed-off-by: Rick Wertenbroek <rick.wertenbroek(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/pci/controller/pcie-rockchip-ep.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/pci/controller/pcie-rockchip-ep.c b/drivers/pci/controller/pcie-rockchip-ep.c
index c9046e97a1d2..37d4bcb8bd5b 100644
--- a/drivers/pci/controller/pcie-rockchip-ep.c
+++ b/drivers/pci/controller/pcie-rockchip-ep.c
@@ -98,10 +98,9 @@ static int rockchip_pcie_ep_write_header(struct pci_epc *epc, u8 fn, u8 vfn,
/* All functions share the same vendor ID with function 0 */
if (fn == 0) {
- u32 vid_regs = (hdr->vendorid & GENMASK(15, 0)) |
- (hdr->subsys_vendor_id & GENMASK(31, 16)) << 16;
-
- rockchip_pcie_write(rockchip, vid_regs,
+ rockchip_pcie_write(rockchip,
+ hdr->vendorid |
+ hdr->subsys_vendor_id << 16,
PCIE_CORE_CONFIG_VENDOR);
}
--
2.25.1
Imagine an mmap()'d file. Two threads touch the same address at the same
time and fault. Both allocate a physical page and race to install a PTE
for that page. Only one will win the race. The loser frees its page, but
still continues handling the fault as a success and returns
VM_FAULT_NOPAGE from the fault handler.
The same race can happen with SGX. But there's a bug: the loser in the
SGX steers into a failure path. The loser EREMOVE's the winner's EPC
page, then returns SIGBUS, likely killing the app.
Fix the SGX loser's behavior. Change the return code to VM_FAULT_NOPAGE
to avoid SIGBUS and call sgx_free_epc_page() which avoids EREMOVE'ing
the winner's page and only frees the page that the loser allocated.
The race can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
Reviewed-by: Haitao Huang <haitao.huang(a)linux.intel.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.34.1
Hi Greg. Here are three reports for regressions introduced during the
6.9 cycle that were not fixed for 6.9 for one reason or another, but are
fixed in mainline now. So they might be good candidates to pick up early
for 6.9.y -- or maybe not, not sure. You are the better judge here. I
just thought you might wanted to know about them.
* net: Bluetooth: firmware loading problems with older firmware:
https://lore.kernel.org/lkml/20240401144424.1714-1-mike@fireburn.co.uk/
Fixed by 958cd6beab693f ("Bluetooth: btusb: Fix the patch for MT7920 the
affected to MT7921") – which likely should have gone into 6.9, but did
not due to lack of fixes: an stable tags:
https://lore.kernel.org/all/CABBYNZK1QWNHpmXUyne1Vmqqvy7csmivL7q7N2Mu=2fmrU…
* leds/iwlwifi: hangs on boot:
https://lore.kernel.org/lkml/30f757e3-73c5-5473-c1f8-328bab98fd7d@candelate…
Fixed by 3d913719df14c2 ("wifi: iwlwifi: Use request_module_nowait") –
not sure if that one is worth it, the regression might be an exotic
corner case.
* Ryzen 7840HS CPU single core never boosts to max frequency:
https://bugzilla.kernel.org/show_bug.cgi?id=218759
Fixed by bf202e654bfa57 ("cpufreq: amd-pstate: fix the highest frequency
issue which limits performance") – which was broken out of a patch-set
by the developers to send it in for 6.9, but then was only merged for
6.10 by the maintainer.
Ciao, Thorsten
From: Rafael Beims <rafael.beims(a)toradex.com>
When changing the interface type we also need to update the bss_num, the
driver private data is searched based on a unique (bss_type, bss_num)
tuple, therefore every time bss_type changes, bss_num must also change.
This fixes for example an issue in which, after the mode changed, a
wireless scan on the changed interface would not finish, leading to
repeated -EBUSY messages to userspace when other scan requests were
sent.
Fixes: c606008b7062 ("mwifiex: Properly initialize private structure on interface type changes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Rafael Beims <rafael.beims(a)toradex.com>
Reviewed-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
Signed-off-by: Francesco Dolcini <francesco.dolcini(a)toradex.com>
---
drivers/net/wireless/marvell/mwifiex/cfg80211.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/wireless/marvell/mwifiex/cfg80211.c b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
index b909a7665e9c..155eb0fab12a 100644
--- a/drivers/net/wireless/marvell/mwifiex/cfg80211.c
+++ b/drivers/net/wireless/marvell/mwifiex/cfg80211.c
@@ -926,6 +926,8 @@ mwifiex_init_new_priv_params(struct mwifiex_private *priv,
return -EOPNOTSUPP;
}
+ priv->bss_num = mwifiex_get_unused_bss_num(adapter, priv->bss_type);
+
spin_lock_irqsave(&adapter->main_proc_lock, flags);
adapter->main_locked = false;
spin_unlock_irqrestore(&adapter->main_proc_lock, flags);
--
2.39.2
This is the start of the stable review cycle for the 6.9.1 release.
There are 5 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 17 May 2024 08:23:27 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.1-rc1.…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.9.1-rc1
Ben Greear <greearb(a)candelatech.com>
wifi: mt76: mt7915: add missing chanctx ops
Silvio Gissi <sifonsec(a)amazon.com>
keys: Fix overwrite of key expiration on instantiation
Nikhil Rao <nikhil.rao(a)intel.com>
dmaengine: idxd: add a write() method for applications to submit work
Arjan van de Ven <arjan(a)linux.intel.com>
dmaengine: idxd: add a new security check to deal with a hardware erratum
Arjan van de Ven <arjan(a)linux.intel.com>
VFIO: Add the SPR_DSA and SPR_IAX devices to the denylist
-------------
Diffstat:
Makefile | 4 +-
drivers/dma/idxd/cdev.c | 77 ++++++++++++++++++++++++
drivers/dma/idxd/idxd.h | 3 +
drivers/dma/idxd/init.c | 4 ++
drivers/dma/idxd/registers.h | 3 -
drivers/dma/idxd/sysfs.c | 27 ++++++++-
drivers/net/wireless/mediatek/mt76/mt7915/main.c | 4 ++
drivers/vfio/pci/vfio_pci.c | 2 +
include/linux/pci_ids.h | 2 +
security/keys/key.c | 3 +-
10 files changed, 121 insertions(+), 8 deletions(-)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d85cf67a339685beae1d0aee27b7f61da95455be
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051347-buddhism-hunting-5c42@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d85cf67a3396 ("net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access")
696450c05181 ("net: bcmgenet: Clear RGMII_LINK upon link down")
fc13d8c03773 ("net: bcmgenet: pull mac_config from adjust_link")
fcb5dfe7dc40 ("net: bcmgenet: remove old link state values")
50e356686fa9 ("net: bcmgenet: remove netif_carrier_off from adjust_link")
b972b54a68b2 ("net: bcmgenet: Patch PHY interface for dedicated PHY driver")
28e303da55b3 ("net: broadcom: share header defining UniMAC registers")
12cf8e75727a ("bgmac: add bgmac_umac_*() helpers for accessing UniMAC registers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d85cf67a339685beae1d0aee27b7f61da95455be Mon Sep 17 00:00:00 2001
From: Doug Berger <opendmb(a)gmail.com>
Date: Thu, 25 Apr 2024 15:27:19 -0700
Subject: [PATCH] net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access
The EXT_RGMII_OOB_CTRL register can be written from different
contexts. It is predominantly written from the adjust_link
handler which is synchronized by the phydev->lock, but can
also be written from a different context when configuring the
mii in bcmgenet_mii_config().
The chances of contention are quite low, but it is conceivable
that adjust_link could occur during resume when WoL is enabled
so use the phydev->lock synchronizer in bcmgenet_mii_config()
to be sure.
Fixes: afe3f907d20f ("net: bcmgenet: power on MII block for all MII modes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Acked-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 9ada89355747..86a4aa72b3d4 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -2,7 +2,7 @@
/*
* Broadcom GENET MDIO routines
*
- * Copyright (c) 2014-2017 Broadcom
+ * Copyright (c) 2014-2024 Broadcom
*/
#include <linux/acpi.h>
@@ -275,6 +275,7 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
* block for the interface to work, unconditionally clear the
* Out-of-band disable since we do not need it.
*/
+ mutex_lock(&phydev->lock);
reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
reg &= ~OOB_DISABLE;
if (priv->ext_phy) {
@@ -286,6 +287,7 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
reg |= RGMII_MODE_EN;
}
bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+ mutex_unlock(&phydev->lock);
if (init)
dev_info(kdev, "configuring instance for %s\n", phy_name);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d85cf67a339685beae1d0aee27b7f61da95455be
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051343-casket-astride-c192@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d85cf67a3396 ("net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access")
696450c05181 ("net: bcmgenet: Clear RGMII_LINK upon link down")
fc13d8c03773 ("net: bcmgenet: pull mac_config from adjust_link")
fcb5dfe7dc40 ("net: bcmgenet: remove old link state values")
50e356686fa9 ("net: bcmgenet: remove netif_carrier_off from adjust_link")
b972b54a68b2 ("net: bcmgenet: Patch PHY interface for dedicated PHY driver")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d85cf67a339685beae1d0aee27b7f61da95455be Mon Sep 17 00:00:00 2001
From: Doug Berger <opendmb(a)gmail.com>
Date: Thu, 25 Apr 2024 15:27:19 -0700
Subject: [PATCH] net: bcmgenet: synchronize EXT_RGMII_OOB_CTRL access
The EXT_RGMII_OOB_CTRL register can be written from different
contexts. It is predominantly written from the adjust_link
handler which is synchronized by the phydev->lock, but can
also be written from a different context when configuring the
mii in bcmgenet_mii_config().
The chances of contention are quite low, but it is conceivable
that adjust_link could occur during resume when WoL is enabled
so use the phydev->lock synchronizer in bcmgenet_mii_config()
to be sure.
Fixes: afe3f907d20f ("net: bcmgenet: power on MII block for all MII modes")
Cc: stable(a)vger.kernel.org
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Acked-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/broadcom/genet/bcmmii.c b/drivers/net/ethernet/broadcom/genet/bcmmii.c
index 9ada89355747..86a4aa72b3d4 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmmii.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmmii.c
@@ -2,7 +2,7 @@
/*
* Broadcom GENET MDIO routines
*
- * Copyright (c) 2014-2017 Broadcom
+ * Copyright (c) 2014-2024 Broadcom
*/
#include <linux/acpi.h>
@@ -275,6 +275,7 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
* block for the interface to work, unconditionally clear the
* Out-of-band disable since we do not need it.
*/
+ mutex_lock(&phydev->lock);
reg = bcmgenet_ext_readl(priv, EXT_RGMII_OOB_CTRL);
reg &= ~OOB_DISABLE;
if (priv->ext_phy) {
@@ -286,6 +287,7 @@ int bcmgenet_mii_config(struct net_device *dev, bool init)
reg |= RGMII_MODE_EN;
}
bcmgenet_ext_writel(priv, reg, EXT_RGMII_OOB_CTRL);
+ mutex_unlock(&phydev->lock);
if (init)
dev_info(kdev, "configuring instance for %s\n", phy_name);
From: Ard Biesheuvel <ardb(a)kernel.org>
The legacy decompressor has elaborate logic to ensure that the
randomized physical placement of the decompressed kernel image does not
conflict with any memory reservations, including ones specified on the
command line using mem=, memmap=, efi_fake_mem= or hugepages=, which are
taken into account by the kernel proper at a later stage.
When booting in EFI mode, it is the firmware's job to ensure that the
chosen range does not conflict with any memory reservations that it
knows about, and this is trivially achieved by using the firmware's
memory allocation APIs.
That leaves reservations specified on the command line, though, which
the firmware knows nothing about, as these regions have no other special
significance to the platform. Since commit
a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
these reservations are not taken into account when randomizing the
physical placement, which may result in conflicts where the memory
cannot be reserved by the kernel proper because its own executable image
resides there.
To avoid having to duplicate or reuse the existing complicated logic,
disable physical KASLR entirely when such overrides are specified. These
are mostly diagnostic tools or niche features, and physical KASLR (as
opposed to virtual KASLR, which is much more important as it affects the
memory addresses observed by code executing in the kernel) is something
we can live without.
Closes: https://lkml.kernel.org/r/FA5F6719-8824-4B04-803E-82990E65E627%40akamai.com
Reported-by: Ben Chaney <bchaney(a)akamai.com>
Fixes: a1b87d54f4e4 ("x86/efistub: Avoid legacy decompressor when doing EFI boot")
Cc: <stable(a)vger.kernel.org> # v6.1+
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/libstub/x86-stub.c | 28 +++++++++++++++++++++++--
1 file changed, 26 insertions(+), 2 deletions(-)
diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c
index d5a8182cf2e1..1983fd3bf392 100644
--- a/drivers/firmware/efi/libstub/x86-stub.c
+++ b/drivers/firmware/efi/libstub/x86-stub.c
@@ -776,6 +776,26 @@ static void error(char *str)
efi_warn("Decompression failed: %s\n", str);
}
+static const char *cmdline_memmap_override;
+
+static efi_status_t parse_options(const char *cmdline)
+{
+ static const char opts[][14] = {
+ "mem=", "memmap=", "efi_fake_mem=", "hugepages="
+ };
+
+ for (int i = 0; i < ARRAY_SIZE(opts); i++) {
+ const char *p = strstr(cmdline, opts[i]);
+
+ if (p == cmdline || (p > cmdline && isspace(p[-1]))) {
+ cmdline_memmap_override = opts[i];
+ break;
+ }
+ }
+
+ return efi_parse_options(cmdline);
+}
+
static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
{
unsigned long virt_addr = LOAD_PHYSICAL_ADDR;
@@ -807,6 +827,10 @@ static efi_status_t efi_decompress_kernel(unsigned long *kernel_entry)
!memcmp(efistub_fw_vendor(), ami, sizeof(ami))) {
efi_debug("AMI firmware v2.0 or older detected - disabling physical KASLR\n");
seed[0] = 0;
+ } else if (cmdline_memmap_override) {
+ efi_info("%s detected on the kernel command line - disabling physical KASLR\n",
+ cmdline_memmap_override);
+ seed[0] = 0;
}
boot_params_ptr->hdr.loadflags |= KASLR_FLAG;
@@ -883,7 +907,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
}
#ifdef CONFIG_CMDLINE_BOOL
- status = efi_parse_options(CONFIG_CMDLINE);
+ status = parse_options(CONFIG_CMDLINE);
if (status != EFI_SUCCESS) {
efi_err("Failed to parse options\n");
goto fail;
@@ -892,7 +916,7 @@ void __noreturn efi_stub_entry(efi_handle_t handle,
if (!IS_ENABLED(CONFIG_CMDLINE_OVERRIDE)) {
unsigned long cmdline_paddr = ((u64)hdr->cmd_line_ptr |
((u64)boot_params->ext_cmd_line_ptr << 32));
- status = efi_parse_options((char *)cmdline_paddr);
+ status = parse_options((char *)cmdline_paddr);
if (status != EFI_SUCCESS) {
efi_err("Failed to parse options\n");
goto fail;
--
2.45.0.rc1.225.g2a3ae87e7f-goog
The canary in collect_domain_accesses() can be triggered when trying to
link a root mount point. This cannot work in practice because this
directory is mounted, but the VFS check is done after the call to
security_path_link().
Do not use source directory's d_parent when the source directory is the
mount point.
Add tests to check error codes when renaming or linking a mount root
directory. This previously triggered a kernel warning. The
linux/mount.h file is not sorted with other headers to ease backport to
Linux 6.1 .
Cc: Günther Noack <gnoack(a)google.com>
Cc: Paul Moore <paul(a)paul-moore.com>
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+bf4903dc7e12b18ebc87(a)syzkaller.appspotmail.com
Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER")
Closes: https://lore.kernel.org/r/000000000000553d3f0618198200@google.com
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
Link: https://lore.kernel.org/r/20240516181935.1645983-2-mic@digikod.net
---
security/landlock/fs.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 22d8b7c28074..7877a64cc6b8 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -1110,6 +1110,7 @@ static int current_check_refer_path(struct dentry *const old_dentry,
bool allow_parent1, allow_parent2;
access_mask_t access_request_parent1, access_request_parent2;
struct path mnt_dir;
+ struct dentry *old_parent;
layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {},
layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {};
@@ -1157,9 +1158,17 @@ static int current_check_refer_path(struct dentry *const old_dentry,
mnt_dir.mnt = new_dir->mnt;
mnt_dir.dentry = new_dir->mnt->mnt_root;
+ /*
+ * old_dentry may be the root of the common mount point and
+ * !IS_ROOT(old_dentry) at the same time (e.g. with open_tree() and
+ * OPEN_TREE_CLONE). We do not need to call dget(old_parent) because
+ * we keep a reference to old_dentry.
+ */
+ old_parent = (old_dentry == mnt_dir.dentry) ? old_dentry :
+ old_dentry->d_parent;
+
/* new_dir->dentry is equal to new_dentry->d_parent */
- allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry,
- old_dentry->d_parent,
+ allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry, old_parent,
&layer_masks_parent1);
allow_parent2 = collect_domain_accesses(
dom, mnt_dir.dentry, new_dir->dentry, &layer_masks_parent2);
--
2.45.0
Hello,
I encountered an issue when upgrading to 6.1.89 from 6.1.77. This upgrade caused a breakage in emulated persistent memory. Significant amounts of memory are missing from a pmem device:
fdisk -l /dev/pmem*
Disk /dev/pmem0: 355.9 GiB, 382117871616 bytes, 746323968 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disk /dev/pmem1: 25.38 GiB, 27246198784 bytes, 53215232 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
The memmap parameter that created these pmem devices is “memmap=364416M!28672M,367488M!419840M”, which should cause a much larger amount of memory to be allocated to /dev/pmem1. The amount of missing memory and the device it is missing from is randomized on each reboot. There is some amount of memory missing in almost all cases, but not 100% of the time. Notably, the memory that is missing from these devices is not reclaimed by the system for general use. This system in question has 768GB of memory split evenly across two NUMA nodes.
When the error occurs, there are also the following error messages showing up in dmesg:
[ 5.318317] nd_pmem namespace1.0: [mem 0x5c2042c000-0x5ff7ffffff flags 0x200] misaligned, unable to map
[ 5.335073] nd_pmem: probe of namespace1.0 failed with error -95
Bisection implicates 2dfaeac3f38e4e550d215204eedd97a061fdc118 as the patch that first caused the issue. I believe the cause of the issue is that the EFI stub is randomizing the location of the decompressed kernel without accounting for the memory map, and it is clobbering some of the memory that has been reserved for pmem.
Thank you,
Ben Chaney
On Thu, May 16, 2024 at 04:50:34AM -0700, Alexander Sergeev wrote:
> Good afternoon!
>
> Thank you for your advice! Unfortunately, I couldn't find any way or
> form for reporting issues in netlink, so I decided to use a bug
> reporting guide for kernel in general. Could you please let me know
> where could I forward this information? How could I contact netlink
> development team?
The NETWORKING section of the MAINTAINERS file lists:
netdev(a)vger.kernel.org
as the proper place to go.
greg k-h
Good morning!
Recently, I have found an issue with `rtnetlink` library that seems to be not intended and/or documented. I have asked about it several times, including here and it was also reported here. Neither in related RFC document nor in rtnetlink manuals the issue is described or mentioned.
In a few words, the issue is: for some reason, rtnetlink GETADDR request works only with NLM_F_ROOT or NLM_F_DUMP flags. Consequently, for instance, filtering by ifa_index field, that would *theoretically* according to docs, allow us to receive address info for 1 specific interface only (by ID) , is not possible. For comparison, similar functionality to GETLINK request (getting information about exactly 1 link by index) is indeed very possible. Instead of the expected output (information about 1 interface only), what is returned is an error message with errno -95, which, as I suppose, can be read as "operation not permitted".
To this email, I attach a small C file illustrating my issue. Feel free to change IFACE_ID define for ID of any network interface present in your system. It illustrates the error.
Please, let me know what you think about this issue. Am I missing something or does it really needs fixing or at least documenting?
Best regards,
Aleksandr Sergeev.
P.S. I have been told that this issue probably won't be addressed, because it "can easily be fixed in user space", however I believe in Linux kernel maintainers team dedication for clear, well-documented and bug-free code; so if this issue is not hard to fix, I would suggest addressing it at least in docs. If it can be useful, I would be more than grateful to provide my help with it.
A debugfs directory entry is create early during probe(). This entry is
not removed on error path leading to some "already present" issues in
case of EPROBE_DEFER.
Create this entry later in the probe() code to avoid the need to change
many 'return' in 'goto' and add the removal in the already present error
path.
Fixes: 942814840127 ("net: lan966x: Add VCAP debugFS support")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>
---
drivers/net/ethernet/microchip/lan966x/lan966x_main.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
index 2635ef8958c8..61d88207eed4 100644
--- a/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
+++ b/drivers/net/ethernet/microchip/lan966x/lan966x_main.c
@@ -1087,8 +1087,6 @@ static int lan966x_probe(struct platform_device *pdev)
platform_set_drvdata(pdev, lan966x);
lan966x->dev = &pdev->dev;
- lan966x->debugfs_root = debugfs_create_dir("lan966x", NULL);
-
if (!device_get_mac_address(&pdev->dev, mac_addr)) {
ether_addr_copy(lan966x->base_mac, mac_addr);
} else {
@@ -1179,6 +1177,8 @@ static int lan966x_probe(struct platform_device *pdev)
return dev_err_probe(&pdev->dev, -ENODEV,
"no ethernet-ports child found\n");
+ lan966x->debugfs_root = debugfs_create_dir("lan966x", NULL);
+
/* init switch */
lan966x_init(lan966x);
lan966x_stats_init(lan966x);
@@ -1257,6 +1257,8 @@ static int lan966x_probe(struct platform_device *pdev)
destroy_workqueue(lan966x->stats_queue);
mutex_destroy(&lan966x->stats_lock);
+ debugfs_remove_recursive(lan966x->debugfs_root);
+
return err;
}
--
This patch was previously sent as part of a bigger series:
https://lore.kernel.org/lkml/20240430083730.134918-9-herve.codina@bootlin.c…
As it is a simple fix, this v2 is the patch extracted from the series
and sent alone to net.
Changes v1 -> v2
Add 'Reviewed-by: Andrew Lunn <andrew(a)lunn.ch>'
2.44.0
Early during NAND identification, mtd_info fields have not yet been
initialized (namely, writesize and oobsize) and thus cannot be used for
sanity checks yet. Of course if there is a misuse of
nand_change_read_column_op() so early we won't be warned, but there is
anyway no actual check to perform at this stage as we do not yet know
the NAND geometry.
So, if the fields are empty, especially mtd->writesize which is *always*
set quite rapidly after identification, let's skip the sanity checks.
nand_change_read_column_op() is subject to be used early for ONFI/JEDEC
identification in the very unlikely case of:
- bitflips appearing in the parameter page,
- the controller driver not supporting simple DATA_IN cycles.
Fixes: c27842e7e11f ("mtd: rawnand: onfi: Adapt the parameter page read to constraint controllers")
Fixes: daca31765e8b ("mtd: rawnand: jedec: Adapt the parameter page read to constraint controllers")
Cc: stable(a)vger.kernel.org
Reported-by: Alexander Dahl <ada(a)thorsis.com>
Closes: https://lore.kernel.org/linux-mtd/20240306-shaky-bunion-d28b65ea97d7@thorsi…
Reported-by: Steven Seeger <steven.seeger(a)flightsystems.net>
Closes: https://lore.kernel.org/linux-mtd/DM6PR05MB4506554457CF95191A670BDEF7062@DM…
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/nand_base.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 248e654ecefd..a66e73cd68cb 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -1440,12 +1440,14 @@ int nand_change_read_column_op(struct nand_chip *chip,
if (len && !buf)
return -EINVAL;
- if (offset_in_page + len > mtd->writesize + mtd->oobsize)
- return -EINVAL;
+ if (mtd->writesize) {
+ if ((offset_in_page + len > mtd->writesize + mtd->oobsize))
+ return -EINVAL;
- /* Small page NANDs do not support column change. */
- if (mtd->writesize <= 512)
- return -ENOTSUPP;
+ /* Small page NANDs do not support column change. */
+ if (mtd->writesize <= 512)
+ return -ENOTSUPP;
+ }
if (nand_has_exec_op(chip)) {
const struct nand_interface_config *conf =
--
2.40.1
Two enclave threads may try to access the same non-present enclave page
simultaneously (e.g., if the SGX runtime supports lazy allocation). The
threads will end up in sgx_encl_eaug_page(), racing to acquire the
enclave lock. The winning thread will perform EAUG, set up the page
table entry, and insert the page into encl->page_array. The losing
thread will then get -EBUSY on xa_insert(&encl->page_array) and proceed
to error handling path.
This race condition can be illustrated as follows:
/* /*
* Fault on CPU1 * Fault on CPU2
* on enclave page X * on enclave page X
*/ */
sgx_vma_fault() { sgx_vma_fault() {
xa_load(&encl->page_array) xa_load(&encl->page_array)
== NULL --> == NULL -->
sgx_encl_eaug_page() { sgx_encl_eaug_page() {
... ...
/* /*
* alloc encl_page * alloc encl_page
*/ */
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray
*/
xa_insert(&encl->page_array, ...);
/*
* add page to enclave via EAUG
* (page is in pending state)
*/
/*
* add PTE entry
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
}
/*
* All good up to here: enclave page
* successfully added to enclave,
* ready for EACCEPT from user space
*/
mutex_lock(&encl->lock);
/*
* alloc EPC page
*/
epc_page = sgx_alloc_epc_page(...);
/*
* add page to enclave's xarray,
* this fails with -EBUSY as this
* page was already added by CPU2
*/
xa_insert(&encl->page_array, ...);
err_out_shrink:
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*
* *BUG*: page added by CPU2 is
* yanked from enclave while it
* remains accessible from OS
* perspective (PTE installed)
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
mutex_unlock(&encl->lock);
/*
* *BUG*: SIGBUS is returned
* for a valid enclave page
*/
return VM_FAULT_SIGBUS;
}
}
The err_out_shrink error handling path contains two bugs: (1) function
sgx_encl_free_epc_page() is called that performs EREMOVE even though the
enclave page was never intended to be removed, and (2) SIGBUS is sent to
userspace even though the enclave page is correctly installed by another
thread.
The first bug renders the enclave page perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page. The second bug is less severe: a spurious SIGBUS signal is
unnecessarily sent to user space.
Fix these two bugs (1) by returning VM_FAULT_NOPAGE to the generic Linux
fault handler so that no signal is sent to userspace, and (2) by
replacing sgx_encl_free_epc_page() with sgx_free_epc_page() so that no
EREMOVE is performed.
Note that sgx_encl_free_epc_page() performs an additional WARN_ON_ONCE
check in comparison to sgx_free_epc_page(): whether the EPC page is
being reclaimer tracked. However, the EPC page is allocated in
sgx_encl_eaug_page() and has zeroed-out flags in all error handling
paths. In other words, the page is marked as reclaimable only in the
happy path of sgx_encl_eaug_page(). Therefore, in the particular code
path affected in this commit, the "page reclaimer tracked" condition is
always false and the warning is never printed. Thus, it is safe to
replace sgx_encl_free_epc_page() with sgx_free_epc_page().
Fixes: 5a90d2c3f5ef ("x86/sgx: Support adding of pages to an initialized enclave")
Cc: stable(a)vger.kernel.org
Reported-by: Marcelina Kościelnicka <mwk(a)invisiblethingslab.com>
Suggested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 279148e72459..41f14b1a3025 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -382,8 +382,11 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
* If ret == -EBUSY then page was created in another flow while
* running without encl->lock
*/
- if (ret)
+ if (ret) {
+ if (ret == -EBUSY)
+ vmret = VM_FAULT_NOPAGE;
goto err_out_shrink;
+ }
pginfo.secs = (unsigned long)sgx_get_epc_virt_addr(encl->secs.epc_page);
pginfo.addr = encl_page->desc & PAGE_MASK;
@@ -419,7 +422,7 @@ static vm_fault_t sgx_encl_eaug_page(struct vm_area_struct *vma,
err_out_shrink:
sgx_encl_shrink(encl, va_page);
err_out_epc:
- sgx_encl_free_epc_page(epc_page);
+ sgx_free_epc_page(epc_page);
err_out_unlock:
mutex_unlock(&encl->lock);
kfree(encl_page);
--
2.34.1
Two enclave threads may try to add and remove the same enclave page
simultaneously (e.g., if the SGX runtime supports both lazy allocation
and MADV_DONTNEED semantics). Consider some enclave page added to the
enclave. User space decides to temporarily remove this page (e.g.,
emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
space performs a memory access on the same page on CPU2, which results
in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
follows:
/*
* CPU1: User space performs
* ioctl(SGX_IOC_ENCLAVE_REMOVE_PAGES)
* on enclave page X
*/
sgx_encl_remove_pages() {
mutex_lock(&encl->lock);
entry = sgx_encl_load_page(encl);
/*
* verify that page is
* trimmed and accepted
*/
mutex_unlock(&encl->lock);
/*
* remove PTE entry; cannot
* be performed under lock
*/
sgx_zap_enclave_ptes(encl);
/*
* Fault on CPU2 on same page X
*/
sgx_vma_fault() {
/*
* PTE entry was removed, but the
* page is still in enclave's xarray
*/
xa_load(&encl->page_array) != NULL ->
/*
* SGX driver thinks that this page
* was swapped out and loads it
*/
mutex_lock(&encl->lock);
/*
* this is effectively a no-op
*/
entry = sgx_encl_load_page_in_vma();
/*
* add PTE entry
*
* *BUG*: a PTE is installed for a
* page in process of being removed
*/
vmf_insert_pfn(...);
mutex_unlock(&encl->lock);
return VM_FAULT_NOPAGE;
}
/*
* continue with page removal
*/
mutex_lock(&encl->lock);
sgx_encl_free_epc_page(epc_page) {
/*
* remove page via EREMOVE
*/
/*
* free EPC page
*/
sgx_free_epc_page(epc_page);
}
xa_erase(&encl->page_array);
mutex_unlock(&encl->lock);
}
Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
same page. This enclave page becomes perpetually inaccessible (until
another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
marked accessible in the PTE entry but is not EAUGed, and any subsequent
access to this page raises a fault: with the kernel believing there to
be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
path do_user_addr_fault() -> access_error() causes the SGX driver's
sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
The userspace SIGSEGV handler cannot perform EACCEPT because the page
was not EAUGed. Thus, the user space is stuck with the inaccessible
page.
Fix this race by forcing the fault handler on CPU2 to back off if the
page is currently being removed (on CPU1). This is achieved by
introducing a new flag SGX_ENCL_PAGE_BEING_REMOVED, which is unset by
default and set only right-before the first mutex_unlock() in
sgx_encl_remove_pages(). Upon loading the page, CPU2 checks whether this
page is being removed, and if yes then CPU2 backs off and waits until
the page is completely removed. After that, any memory access to this
page results in a normal "allocate and EAUG a page on #PF" flow.
Fixes: 9849bb27152c ("x86/sgx: Support complete page removal")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dmitrii Kuvaiskii <dmitrii.kuvaiskii(a)intel.com>
---
arch/x86/kernel/cpu/sgx/encl.c | 3 ++-
arch/x86/kernel/cpu/sgx/encl.h | 3 +++
arch/x86/kernel/cpu/sgx/ioctl.c | 1 +
3 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/sgx/encl.c b/arch/x86/kernel/cpu/sgx/encl.c
index 41f14b1a3025..7ccd8b2fce5f 100644
--- a/arch/x86/kernel/cpu/sgx/encl.c
+++ b/arch/x86/kernel/cpu/sgx/encl.c
@@ -257,7 +257,8 @@ static struct sgx_encl_page *__sgx_encl_load_page(struct sgx_encl *encl,
/* Entry successfully located. */
if (entry->epc_page) {
- if (entry->desc & SGX_ENCL_PAGE_BEING_RECLAIMED)
+ if (entry->desc & (SGX_ENCL_PAGE_BEING_RECLAIMED |
+ SGX_ENCL_PAGE_BEING_REMOVED))
return ERR_PTR(-EBUSY);
return entry;
diff --git a/arch/x86/kernel/cpu/sgx/encl.h b/arch/x86/kernel/cpu/sgx/encl.h
index f94ff14c9486..fff5f2293ae7 100644
--- a/arch/x86/kernel/cpu/sgx/encl.h
+++ b/arch/x86/kernel/cpu/sgx/encl.h
@@ -25,6 +25,9 @@
/* 'desc' bit marking that the page is being reclaimed. */
#define SGX_ENCL_PAGE_BEING_RECLAIMED BIT(3)
+/* 'desc' bit marking that the page is being removed. */
+#define SGX_ENCL_PAGE_BEING_REMOVED BIT(2)
+
struct sgx_encl_page {
unsigned long desc;
unsigned long vm_max_prot_bits:8;
diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
index b65ab214bdf5..c542d4dd3e64 100644
--- a/arch/x86/kernel/cpu/sgx/ioctl.c
+++ b/arch/x86/kernel/cpu/sgx/ioctl.c
@@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
* Do not keep encl->lock because of dependency on
* mmap_lock acquired in sgx_zap_enclave_ptes().
*/
+ entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
mutex_unlock(&encl->lock);
sgx_zap_enclave_ptes(encl, addr);
--
2.34.1
Hi,
There are just two minor fixes from Hector that we've been carrying downstream
for a while now. One increases the timeout while waiting for the firmware to
boot which is optional for the controller already supported upstream but
required for a newer 4388 board for which we'll also submit support soon.
It also fixes the units for the timeouts which is why I've already included it
here. The other one fixes a call to bitmap_release_region where we only wanted
to release a single bit but are actually releasing much more.
Best,
Sven
To: Hector Martin <marcan(a)marcan.st>
To: Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
To: Marcel Holtmann <marcel(a)holtmann.org>
To: Luiz Augusto von Dentz <luiz.dentz(a)gmail.com>
Cc: asahi(a)lists.linux.dev
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-bluetooth(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Sven Peter <sven(a)svenpeter.dev>
Changes in v2:
- split the timeout commit into two
- collect Neal's tag
- Link to v1: https://lore.kernel.org/r/20240512-btfix-msgid-v1-0-ab1bd938a7f4@svenpeter.…
---
Hector Martin (2):
Bluetooth: hci_bcm4377: Increase boot timeout
Bluetooth: hci_bcm4377: Fix msgid release
Sven Peter (1):
Bluetooth: hci_bcm4377: Use correct unit for timeouts
drivers/bluetooth/hci_bcm4377.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
---
base-commit: cf87f46fd34d6c19283d9625a7822f20d90b64a4
change-id: 20240512-btfix-msgid-d76029a7d917
Best regards,
--
Sven Peter <sven(a)svenpeter.dev>
From: Sven Peter <sven(a)svenpeter.dev>
Some Broadcom controllers found on Apple Silicon machines abuse the
reserved bits inside the PHY fields of LE Extended Advertising Report
events for additional flags. Add a quirk to drop these and correctly
extract the Primary/Secondary_PHY field.
The following excerpt from a btmon trace shows a report received with
"Reserved" for "Primary PHY" on a 4388 controller:
> HCI Event: LE Meta Event (0x3e) plen 26
LE Extended Advertising Report (0x0d)
Num reports: 1
Entry 0
Event type: 0x2515
Props: 0x0015
Connectable
Directed
Use legacy advertising PDUs
Data status: Complete
Reserved (0x2500)
Legacy PDU Type: Reserved (0x2515)
Address type: Random (0x01)
Address: 00:00:00:00:00:00 (Static)
Primary PHY: Reserved
Secondary PHY: No packets
SID: no ADI field (0xff)
TX power: 127 dBm
RSSI: -60 dBm (0xc4)
Periodic advertising interval: 0.00 msec (0x0000)
Direct address type: Public (0x00)
Direct address: 00:00:00:00:00:00 (Apple, Inc.)
Data length: 0x00
Cc: stable(a)vger.kernel.org
Fixes: 2e7ed5f5e69b ("Bluetooth: hci_sync: Use advertised PHYs on hci_le_ext_create_conn_sync")
Reported-by: Janne Grunau <j(a)jannau.net>
Closes: https://lore.kernel.org/all/Zjz0atzRhFykROM9@robin
Tested-by: Janne Grunau <j(a)jannau.net>
Signed-off-by: Sven Peter <sven(a)svenpeter.dev>
---
drivers/bluetooth/hci_bcm4377.c | 8 ++++++++
include/net/bluetooth/hci.h | 11 +++++++++++
net/bluetooth/hci_event.c | 7 +++++++
3 files changed, 26 insertions(+)
diff --git a/drivers/bluetooth/hci_bcm4377.c b/drivers/bluetooth/hci_bcm4377.c
index 9a7243d5db71..55109ad45115 100644
--- a/drivers/bluetooth/hci_bcm4377.c
+++ b/drivers/bluetooth/hci_bcm4377.c
@@ -495,6 +495,10 @@ struct bcm4377_data;
* extended scanning
* broken_mws_transport_config: Set to true if the chip erroneously claims to
* support MWS Transport Configuration
+ * broken_le_ext_adv_report_phy: Set to true if this chip stuffs flags inside
+ * reserved bits of Primary/Secondary_PHY inside
+ * LE Extended Advertising Report events which
+ * have to be ignored
* send_calibration: Optional callback to send calibration data
* send_ptb: Callback to send "PTB" regulatory/calibration data
*/
@@ -513,6 +517,7 @@ struct bcm4377_hw {
unsigned long broken_ext_scan : 1;
unsigned long broken_mws_transport_config : 1;
unsigned long broken_le_coded : 1;
+ unsigned long broken_le_ext_adv_report_phy : 1;
int (*send_calibration)(struct bcm4377_data *bcm4377);
int (*send_ptb)(struct bcm4377_data *bcm4377,
@@ -2374,6 +2379,8 @@ static int bcm4377_probe(struct pci_dev *pdev, const struct pci_device_id *id)
set_bit(HCI_QUIRK_BROKEN_EXT_SCAN, &hdev->quirks);
if (bcm4377->hw->broken_le_coded)
set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
+ if (bcm4377->hw->broken_le_ext_adv_report_phy)
+ set_bit(HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY, &hdev->quirks);
pci_set_drvdata(pdev, bcm4377);
hci_set_drvdata(hdev, bcm4377);
@@ -2478,6 +2485,7 @@ static const struct bcm4377_hw bcm4377_hw_variants[] = {
.clear_pciecfg_subsystem_ctrl_bit19 = true,
.broken_mws_transport_config = true,
.broken_le_coded = true,
+ .broken_le_ext_adv_report_phy = true,
.send_calibration = bcm4387_send_calibration,
.send_ptb = bcm4378_send_ptb,
},
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index 5c12761cbc0e..342aab86dad6 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -347,6 +347,17 @@ enum {
* claim to support it.
*/
HCI_QUIRK_BROKEN_READ_ENC_KEY_SIZE,
+
+ /*
+ * When this quirk is set, the reserved bits of Primary/Secondary_PHY
+ * inside the LE Extended Advertising Report events are discarded.
+ * This is required for some Apple/Broadcom controllers which
+ * abuse these reserved bits for unrelated flags.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback.
+ */
+ HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY,
};
/* HCI device flags */
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index d72d238c1656..01d3a8d343f1 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -6446,6 +6446,13 @@ static void hci_le_ext_adv_report_evt(struct hci_dev *hdev, void *data,
evt_type = __le16_to_cpu(info->type) & LE_EXT_ADV_EVT_TYPE_MASK;
legacy_evt_type = ext_evt_type_to_legacy(hdev, evt_type);
+
+ if (test_bit(HCI_QUIRK_FIXUP_LE_EXT_ADV_REPORT_PHY,
+ &hdev->quirks)) {
+ info->primary_phy &= 0x1f;
+ info->secondary_phy &= 0x1f;
+ }
+
if (legacy_evt_type != LE_ADV_INVALID) {
process_adv_report(hdev, legacy_evt_type, &info->bdaddr,
info->bdaddr_type, NULL, 0,
---
base-commit: a38297e3fb012ddfa7ce0321a7e5a8daeb1872b6
change-id: 20240512-btfix-95c10c456f17
Best regards,
--
Sven Peter <sven(a)svenpeter.dev>
Commit 35039eb6b199 ("x86: Show symbol name if insn decoder test failed")
included symbol lines in the post-processed objdump output consumed by
the insn decoder test. This broke the `instuction lines == total lines`
property that `insn_decoder_test.c` relied upon to print the offending
line's number in error messages. This has the consequence that the line
number reported on a test failure is unreated to, and much smaller than,
the line that actually caused the problem.
Add a new variable that counts the combined (insn+symbol) line count and
report this in the error message.
Fixes: 35039eb6b199 ("x86: Show symbol name if insn decoder test failed")
Cc: stable(a)vger.kernel.org
Reviewed-by: Miguel Ojeda <ojeda(a)kernel.org>
Tested-by: Miguel Ojeda <ojeda(a)kernel.org>
Reported-by: John Baublitz <john.m.baublitz(a)gmail.com>
Debugged-by: John Baublitz <john.m.baublitz(a)gmail.com>
Signed-off-by: Valentin Obst <kernel(a)valentinobst.de>
---
See v2's commit message and [1] for context why this bug made debugging a
test failure harder than necessary.
[1]: https://rust-for-linux.zulipchat.com/#narrow/stream/291565-Help/topic/insn_…
Changes in v3:
- Add Cc stable tag in sign-off area.
- Make commit message less verbose.
- Link to v2: https://lore.kernel.org/r/20240223-x86-insn-decoder-line-fix-v2-1-cde49c69f…
Changes in v2:
- Added tags 'Reviewed-by', 'Tested-by', 'Reported-by', 'Debugged-by',
'Link', and 'Fixes'.
- Explain why this patch fixes the commit mentioned in the 'Fixes' tag.
- CCed the stable list and sent to all x86 maintainers.
- Link to v1: https://lore.kernel.org/r/20240221-x86-insn-decoder-line-fix-v1-1-47cd5a171…
---
arch/x86/tools/insn_decoder_test.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/tools/insn_decoder_test.c b/arch/x86/tools/insn_decoder_test.c
index 472540aeabc2..727017a3c3c7 100644
--- a/arch/x86/tools/insn_decoder_test.c
+++ b/arch/x86/tools/insn_decoder_test.c
@@ -114,6 +114,7 @@ int main(int argc, char **argv)
unsigned char insn_buff[16];
struct insn insn;
int insns = 0;
+ int lines = 0;
int warnings = 0;
parse_args(argc, argv);
@@ -123,6 +124,8 @@ int main(int argc, char **argv)
int nb = 0, ret;
unsigned int b;
+ lines++;
+
if (line[0] == '<') {
/* Symbol line */
strcpy(sym, line);
@@ -134,12 +137,12 @@ int main(int argc, char **argv)
strcpy(copy, line);
tab1 = strchr(copy, '\t');
if (!tab1)
- malformed_line(line, insns);
+ malformed_line(line, lines);
s = tab1 + 1;
s += strspn(s, " ");
tab2 = strchr(s, '\t');
if (!tab2)
- malformed_line(line, insns);
+ malformed_line(line, lines);
*tab2 = '\0'; /* Characters beyond tab2 aren't examined */
while (s < tab2) {
if (sscanf(s, "%x", &b) == 1) {
---
base-commit: 4cece764965020c22cff7665b18a012006359095
change-id: 20240221-x86-insn-decoder-line-fix-7b1f2e1732ff
Best regards,
--
Valentin Obst <kernel(a)valentinobst.de>
Memory access #VE's are hard for Linux to handle in contexts like the
entry code or NMIs. But other OSes need them for functionality.
There's a static (pre-guest-boot) way for a VMM to choose one or the
other. But VMMs don't always know which OS they are booting, so they
choose to deliver those #VE's so the "other" OSes will work. That,
unfortunately has left us in the lurch and exposed to these
hard-to-handle #VEs.
The TDX module has introduced a new feature. Even if the static
configuration is "send nasty #VE's", the kernel can dynamically request
that they be disabled.
Check if the feature is available and disable SEPT #VE if possible.
If the TD allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
attribute is no longer reliable. It reflects the initial state of the
control for the TD, but it will not be updated if someone (e.g. bootloader)
changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
determine if SEPT #VEs are enabled or disabled.
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access")
Cc: stable(a)vger.kernel.org
---
arch/x86/coco/tdx/tdx.c | 88 +++++++++++++++++++++++++------
arch/x86/include/asm/shared/tdx.h | 11 +++-
2 files changed, 83 insertions(+), 16 deletions(-)
diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c
index 1ff571cb9177..ba37f4306f4e 100644
--- a/arch/x86/coco/tdx/tdx.c
+++ b/arch/x86/coco/tdx/tdx.c
@@ -77,6 +77,20 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args)
panic("TDCALL %lld failed (Buggy TDX module!)\n", fn);
}
+/* Read TD-scoped metadata */
+static inline u64 tdg_vm_rd(u64 field, u64 *value)
+{
+ struct tdx_module_args args = {
+ .rdx = field,
+ };
+ u64 ret;
+
+ ret = __tdcall_ret(TDG_VM_RD, &args);
+ *value = args.r8;
+
+ return ret;
+}
+
/* Write TD-scoped metadata */
static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask)
{
@@ -179,6 +193,62 @@ static void __noreturn tdx_panic(const char *msg)
__tdx_hypercall(&args);
}
+/*
+ * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure
+ * that no #VE will be delivered for accesses to TD-private memory.
+ *
+ * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM
+ * controls if the guest will receive such #VE with TD attribute
+ * ATTR_SEPT_VE_DISABLE.
+ *
+ * Newer TDX module allows the guest to control if it wants to receive SEPT
+ * violation #VEs.
+ *
+ * Check if the feature is available and disable SEPT #VE if possible.
+ *
+ * If the TD allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE
+ * attribute is no longer reliable. It reflects the initial state of the
+ * control for the TD, but it will not be updated if someone (e.g. bootloader)
+ * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to
+ * determine if SEPT #VEs are enabled or disabled.
+ */
+static void disable_sept_ve(u64 td_attr)
+{
+ const char *msg = "TD misconfiguration: SEPT #VE has to be disabled";
+ bool debug = td_attr & ATTR_DEBUG;
+ u64 config, controls;
+
+ /* Is this TD allowed to disable SEPT #VE */
+ tdg_vm_rd(TDCS_CONFIG_FLAGS, &config);
+ if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) {
+ /* No SEPT #VE controls for the guest: check the attribute */
+ if (td_attr & ATTR_SEPT_VE_DISABLE)
+ return;
+
+ /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */
+ if (debug)
+ pr_warn("%s\n", msg);
+ else
+ tdx_panic(msg);
+ return;
+ }
+
+ /* Check if SEPT #VE has been disabled before us */
+ tdg_vm_rd(TDCS_TD_CTLS, &controls);
+ if (controls & TD_CTLS_PENDING_VE_DISABLE)
+ return;
+
+ /* Keep #VEs enabled for splats in debugging environments */
+ if (debug)
+ return;
+
+ /* Disable SEPT #VEs */
+ tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE,
+ TD_CTLS_PENDING_VE_DISABLE);
+
+ return;
+}
+
static void tdx_setup(u64 *cc_mask)
{
struct tdx_module_args args = {};
@@ -204,24 +274,12 @@ static void tdx_setup(u64 *cc_mask)
gpa_width = args.rcx & GENMASK(5, 0);
*cc_mask = BIT_ULL(gpa_width - 1);
+ td_attr = args.rdx;
+
/* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */
tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL);
- /*
- * The kernel can not handle #VE's when accessing normal kernel
- * memory. Ensure that no #VE will be delivered for accesses to
- * TD-private memory. Only VMM-shared memory (MMIO) will #VE.
- */
- td_attr = args.rdx;
- if (!(td_attr & ATTR_SEPT_VE_DISABLE)) {
- const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set.";
-
- /* Relax SEPT_VE_DISABLE check for debug TD. */
- if (td_attr & ATTR_DEBUG)
- pr_warn("%s\n", msg);
- else
- tdx_panic(msg);
- }
+ disable_sept_ve(td_attr);
}
/*
diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h
index fdfd41511b02..fecb2a6e864b 100644
--- a/arch/x86/include/asm/shared/tdx.h
+++ b/arch/x86/include/asm/shared/tdx.h
@@ -16,11 +16,20 @@
#define TDG_VP_VEINFO_GET 3
#define TDG_MR_REPORT 4
#define TDG_MEM_PAGE_ACCEPT 6
+#define TDG_VM_RD 7
#define TDG_VM_WR 8
-/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */
+/* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */
+#define TDCS_CONFIG_FLAGS 0x1110000300000016
+#define TDCS_TD_CTLS 0x1110000300000017
#define TDCS_NOTIFY_ENABLES 0x9100000000000010
+/* TDCS_CONFIG_FLAGS bits */
+#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1)
+
+/* TDCS_TD_CTLS bits */
+#define TD_CTLS_PENDING_VE_DISABLE BIT_ULL(0)
+
/* TDX hypercall Leaf IDs */
#define TDVMCALL_MAP_GPA 0x10001
#define TDVMCALL_GET_QUOTE 0x10002
--
2.43.0
Hi,
Please can you add this regression fix to kernel 6.9.1.
Feel free to add a:
Tested-by: James Courtier-Dutton <james.dutton(a)gmail.com>
It has been tested by many others also, as listed in the commit, so
don't feel the need to add my name if it delays adding it to kernel 6.9.1.
Without this fix, the network card in my HP laptop does not work.
Here is the summary with links:
- [net] e1000e: change usleep_range to udelay in PHY mdic access
https://git.kernel.org/netdev/net/c/387f295cb215
Kind Regards
James
The attached two patches fix an issue with running BPF sockmap on TCP
sockets where applications error out due to sender doing a send for
each scatter gather element in the msg. The other 6.x stable and
longterm kernels have this fix so we don't see it there and the issue
was introduced in 6.1 by converting to the read_skb() interface. The
5.15 stable kernels use the read_sock() interface which doesn't have
this issue.
We missed adding the fixes tag to the original series because the work
was a code improvement and wasn't originally identified as a bugfix.
The first patch applies cleanly, the second patch does not because
it touches smc, espintcp, and siw which do not apply because that
code does not use sendmsg() yet on 6.1. I remove those chunks of the
patch because they don't apply here.
I added a Fixes tag to the patches to point at the patch introducing
the issue. Originally I sent something similar to this as a single
patch where I incorrectly merged the two patches. Greg asked me to
do this as two patches. Thanks.
David Howells (2):
tcp_bpf: Inline do_tcp_sendpages as it's now a wrapper around
tcp_sendmsg
tcp_bpf, smc, tls, espintcp, siw: Reduce MSG_SENDPAGE_NOTLAST usage
net/ipv4/tcp_bpf.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
--
2.33.0
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 07a57a338adb6ec9e766d6a6790f76527f45ceb5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024042309-rural-overlying-190b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
07a57a338adb ("mm,swapops: update check in is_pfn_swap_entry for hwpoison entries")
d027122d8363 ("mm/hwpoison: move definitions of num_poisoned_pages_* to memory-failure.c")
e591ef7d96d6 ("mm,hwpoison,hugetlb,memory_hotplug: hotremove memory section with hwpoisoned hugepage")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 07a57a338adb6ec9e766d6a6790f76527f45ceb5 Mon Sep 17 00:00:00 2001
From: Oscar Salvador <osalvador(a)suse.de>
Date: Sun, 7 Apr 2024 15:05:37 +0200
Subject: [PATCH] mm,swapops: update check in is_pfn_swap_entry for hwpoison
entries
Tony reported that the Machine check recovery was broken in v6.9-rc1, as
he was hitting a VM_BUG_ON when injecting uncorrectable memory errors to
DRAM.
After some more digging and debugging on his side, he realized that this
went back to v6.1, with the introduction of 'commit 0d206b5d2e0d
("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")'. That
commit, among other things, introduced swp_offset_pfn(), replacing
hwpoison_entry_to_pfn() in its favour.
The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(), but
is_pfn_swap_entry() never got updated to cover hwpoison entries, which
means that we would hit the VM_BUG_ON whenever we would call
swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM
set. Fix this by updating the check to cover hwpoison entries as well,
and update the comment while we are it.
Link: https://lkml.kernel.org/r/20240407130537.16977-1-osalvador@suse.de
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-by: Tony Luck <tony.luck(a)intel.com>
Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
Tested-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org> [6.1.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 48b700ba1d18..a5c560a2f8c2 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -390,6 +390,35 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry)
}
#endif /* CONFIG_MIGRATION */
+#ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Support for hardware poisoned pages
+ */
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+ BUG_ON(!PageLocked(page));
+ return swp_entry(SWP_HWPOISON, page_to_pfn(page));
+}
+
+static inline int is_hwpoison_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_HWPOISON;
+}
+
+#else
+
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+ return swp_entry(0, 0);
+}
+
+static inline int is_hwpoison_entry(swp_entry_t swp)
+{
+ return 0;
+}
+#endif
+
typedef unsigned long pte_marker;
#define PTE_MARKER_UFFD_WP BIT(0)
@@ -483,8 +512,9 @@ static inline struct folio *pfn_swap_entry_folio(swp_entry_t entry)
/*
* A pfn swap entry is a special type of swap entry that always has a pfn stored
- * in the swap offset. They are used to represent unaddressable device memory
- * and to restrict access to a page undergoing migration.
+ * in the swap offset. They can either be used to represent unaddressable device
+ * memory, to restrict access to a page undergoing migration or to represent a
+ * pfn which has been hwpoisoned and unmapped.
*/
static inline bool is_pfn_swap_entry(swp_entry_t entry)
{
@@ -492,7 +522,7 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry)
BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);
return is_migration_entry(entry) || is_device_private_entry(entry) ||
- is_device_exclusive_entry(entry);
+ is_device_exclusive_entry(entry) || is_hwpoison_entry(entry);
}
struct page_vma_mapped_walk;
@@ -561,35 +591,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
}
#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
-#ifdef CONFIG_MEMORY_FAILURE
-
-/*
- * Support for hardware poisoned pages
- */
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
- BUG_ON(!PageLocked(page));
- return swp_entry(SWP_HWPOISON, page_to_pfn(page));
-}
-
-static inline int is_hwpoison_entry(swp_entry_t entry)
-{
- return swp_type(entry) == SWP_HWPOISON;
-}
-
-#else
-
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
- return swp_entry(0, 0);
-}
-
-static inline int is_hwpoison_entry(swp_entry_t swp)
-{
- return 0;
-}
-#endif
-
static inline int non_swap_entry(swp_entry_t entry)
{
return swp_type(entry) >= MAX_SWAPFILES;
From: Mark Brown <broonie(a)kernel.org>
[ Upstream commit 907f33028871fa7c9a3db1efd467b78ef82cce20 ]
The standard library perror() function provides a convenient way to print
an error message based on the current errno but this doesn't play nicely
with KTAP output. Provide a helper which does an equivalent thing in a KTAP
compatible format.
nolibc doesn't have a strerror() and adding the table of strings required
doesn't seem like a good fit for what it's trying to do so when we're using
that only print the errno.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Stable-dep-of: 071af0c9e582 ("selftests: timers: Convert posix_timers test to generate KTAP output")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
---
tools/testing/selftests/kselftest.h | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index e8eecbc83a60..ad7b97e16f37 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -48,6 +48,7 @@
#include <stdlib.h>
#include <unistd.h>
#include <stdarg.h>
+#include <string.h>
#include <stdio.h>
#include <sys/utsname.h>
#endif
@@ -156,6 +157,19 @@ static inline void ksft_print_msg(const char *msg, ...)
va_end(args);
}
+static inline void ksft_perror(const char *msg)
+{
+#ifndef NOLIBC
+ ksft_print_msg("%s: %s (%d)\n", msg, strerror(errno), errno);
+#else
+ /*
+ * nolibc doesn't provide strerror() and it seems
+ * inappropriate to add one, just print the errno.
+ */
+ ksft_print_msg("%s: %d)\n", msg, errno);
+#endif
+}
+
static inline void ksft_test_result_pass(const char *msg, ...)
{
int saved_errno = errno;
--
2.44.0.769.g3c40516874-goog
From: Qu Wenruo <wqu(a)suse.com>
commit 1db7959aacd905e6487d0478ac01d89f86eb1e51 upstream.
[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array() for compressed read/write.
[CAUSE]
For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
even if the bulk allocation failed (fell back to single page
allocation) we still retry but with extra memalloc_retry_wait().
If the bulk alloc only returned one page a time, we would spend a lot of
time on the retry wait.
The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
incomplete batch memory allocations").
[FIX]
Although the commit mentioned that other filesystems do the wait, it's
not the case at least nowadays.
All the mainlined filesystems only call memalloc_retry_wait() if they
failed to allocate any page (not only for bulk allocation).
If there is any progress, they won't call memalloc_retry_wait() at all.
For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
if there is no allocation progress at all, and the call is not for
metadata readahead.
So I don't believe we should call memalloc_retry_wait() unconditionally
for short allocation.
Call memalloc_retry_wait() if it fails to allocate any page for tree
block allocation (which goes with __GFP_NOFAIL and may not need the
special handling anyway), and reduce the latency for
btrfs_alloc_page_array().
Reported-by: Julian Taylor <julian.taylor(a)1und1.de>
Tested-by: Julian Taylor <julian.taylor(a)1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
CC: stable(a)vger.kernel.org # 6.1+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel(a)dorminy.me>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/extent_io.c | 14 ++------------
1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 5acb2cb79d4b..9fbffd84b16c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -686,24 +686,14 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array)
unsigned int last = allocated;
allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages, page_array);
-
- if (allocated == nr_pages)
- return 0;
-
- /*
- * During this iteration, no page could be allocated, even
- * though alloc_pages_bulk_array() falls back to alloc_page()
- * if it could not bulk-allocate. So we must be out of memory.
- */
- if (allocated == last) {
+ if (unlikely(allocated == last)) {
+ /* No progress, fail and do cleanup. */
for (int i = 0; i < allocated; i++) {
__free_page(page_array[i]);
page_array[i] = NULL;
}
return -ENOMEM;
}
-
- memalloc_retry_wait(GFP_NOFS);
}
return 0;
}
--
2.45.0
The quilt patch titled
Subject: mm/huge_memory: mark huge_zero_page reserved
has been removed from the -mm tree. Its filename was
mm-huge_memory-mark-huge_zero_page-reserved.patch
This patch was dropped because an alternative patch was or shall be merged
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/huge_memory: mark huge_zero_page reserved
Date: Sat, 11 May 2024 11:54:35 +0800
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_page
without increasing the page refcnt. But then unpoison_memory() will
decrease the page refcnt unexpectly as it appears like a successfully
hwpoisoned page leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
when releasing huge_zero_page.
Fix this issue by marking huge_zero_page reserved. So unpoison_memory()
will skip this page. This will make it consistent with ZERO_PAGE case too.
Link: https://lkml.kernel.org/r/20240511035435.1477004-1-linmiaohe@huawei.com
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved
+++ a/mm/huge_memory.c
@@ -212,6 +212,7 @@ retry:
folio_put(zero_folio);
goto retry;
}
+ __SetPageReserved(zero_page);
WRITE_ONCE(huge_zero_pfn, folio_pfn(zero_folio));
/* We take additional reference here. It will be put back by shrinker */
@@ -264,6 +265,7 @@ static unsigned long shrink_huge_zero_pa
struct folio *zero_folio = xchg(&huge_zero_folio, NULL);
BUG_ON(zero_folio == NULL);
WRITE_ONCE(huge_zero_pfn, ~0UL);
+ __ClearPageReserved(zero_page);
folio_put(zero_folio);
return HPAGE_PMD_NR;
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
The patch titled
Subject: mm/huge_memory: mark huge_zero_page reserved
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-mark-huge_zero_page-reserved.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/huge_memory: mark huge_zero_page reserved
Date: Sat, 11 May 2024 11:54:35 +0800
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_page
without increasing the page refcnt. But then unpoison_memory() will
decrease the page refcnt unexpectly as it appears like a successfully
hwpoisoned page leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
when releasing huge_zero_page.
Fix this issue by marking huge_zero_page reserved. So unpoison_memory()
will skip this page. This will make it consistent with ZERO_PAGE case too.
Link: https://lkml.kernel.org/r/20240511035435.1477004-1-linmiaohe@huawei.com
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved
+++ a/mm/huge_memory.c
@@ -208,6 +208,7 @@ retry:
__free_pages(zero_page, compound_order(zero_page));
goto retry;
}
+ __SetPageReserved(zero_page);
WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
/* We take additional reference here. It will be put back by shrinker */
@@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_pa
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
WRITE_ONCE(huge_zero_pfn, ~0UL);
+ __ClearPageReserved(zero_page);
__free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-huge_memory-mark-huge_zero_page-reserved.patch
The patch titled
Subject: mm/huge_memory: mark huge_zero_page reserved
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-huge_memory-mark-huge_zero_page-reserved.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/huge_memory: mark huge_zero_page reserved
Date: Sat, 11 May 2024 11:54:35 +0800
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135!
invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
Call Trace:
<TASK>
do_shrink_slab+0x14f/0x6a0
shrink_slab+0xca/0x8c0
shrink_node+0x2d0/0x7d0
balance_pgdat+0x33a/0x720
kswapd+0x1f3/0x410
kthread+0xd5/0x100
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
</TASK>
Modules linked in: mce_inject hwpoison_inject
---[ end trace 0000000000000000 ]---
RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0
RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246
RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8
RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0
RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492
R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00
FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_page
without increasing the page refcnt. But then unpoison_memory() will
decrease the page refcnt unexpectly as it appears like a successfully
hwpoisoned page leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0)
when releasing huge_zero_page.
Fix this issue by marking huge_zero_page reserved. So unpoison_memory()
will skip this page. This will make it consistent with ZERO_PAGE case too.
Link: https://lkml.kernel.org/r/20240511035435.1477004-1-linmiaohe@huawei.com
Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: Xu Yu <xuyu(a)linux.alibaba.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/huge_memory.c~mm-huge_memory-mark-huge_zero_page-reserved
+++ a/mm/huge_memory.c
@@ -208,6 +208,7 @@ retry:
__free_pages(zero_page, compound_order(zero_page));
goto retry;
}
+ __SetPageReserved(zero_page);
WRITE_ONCE(huge_zero_pfn, page_to_pfn(zero_page));
/* We take additional reference here. It will be put back by shrinker */
@@ -260,6 +261,7 @@ static unsigned long shrink_huge_zero_pa
struct page *zero_page = xchg(&huge_zero_page, NULL);
BUG_ON(zero_page == NULL);
WRITE_ONCE(huge_zero_pfn, ~0UL);
+ __ClearPageReserved(zero_page);
__free_pages(zero_page, compound_order(zero_page));
return HPAGE_PMD_NR;
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-huge_memory-mark-huge_zero_page-reserved.patch
From: Qu Wenruo <wqu(a)suse.com>
commit 1db7959aacd905e6487d0478ac01d89f86eb1e51 upstream.
[BUG]
There is a recent report that when memory pressure is high (including
cached pages), btrfs can spend most of its time on memory allocation in
btrfs_alloc_page_array() for compressed read/write.
[CAUSE]
For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and
even if the bulk allocation failed (fell back to single page
allocation) we still retry but with extra memalloc_retry_wait().
If the bulk alloc only returned one page a time, we would spend a lot of
time on the retry wait.
The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between
incomplete batch memory allocations").
[FIX]
Although the commit mentioned that other filesystems do the wait, it's
not the case at least nowadays.
All the mainlined filesystems only call memalloc_retry_wait() if they
failed to allocate any page (not only for bulk allocation).
If there is any progress, they won't call memalloc_retry_wait() at all.
For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait()
if there is no allocation progress at all, and the call is not for
metadata readahead.
So I don't believe we should call memalloc_retry_wait() unconditionally
for short allocation.
Call memalloc_retry_wait() if it fails to allocate any page for tree
block allocation (which goes with __GFP_NOFAIL and may not need the
special handling anyway), and reduce the latency for
btrfs_alloc_page_array().
Reported-by: Julian Taylor <julian.taylor(a)1und1.de>
Tested-by: Julian Taylor <julian.taylor(a)1und1.de>
Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@1und1.de/
Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations")
CC: stable(a)vger.kernel.org # 6.1+
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel(a)dorminy.me>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
---
fs/btrfs/extent_io.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 539bc9bdcb93..5f923c9b773e 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1324,19 +1324,14 @@ int btrfs_alloc_page_array(unsigned int nr_pages, struct page **page_array)
unsigned int last = allocated;
allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages, page_array);
-
- if (allocated == nr_pages)
- return 0;
-
- /*
- * During this iteration, no page could be allocated, even
- * though alloc_pages_bulk_array() falls back to alloc_page()
- * if it could not bulk-allocate. So we must be out of memory.
- */
- if (allocated == last)
+ if (unlikely(allocated == last)) {
+ /* No progress, fail and do cleanup. */
+ for (int i = 0; i < allocated; i++) {
+ __free_page(page_array[i]);
+ page_array[i] = NULL;
+ }
return -ENOMEM;
-
- memalloc_retry_wait(GFP_NOFS);
+ }
}
return 0;
}
--
2.45.0
From: Hao Ge <gehao(a)kylinos.cn>
In function eventfs_find_events,there is a potential null pointer
that may be caused by calling update_events_attr which will perform
some operations on the members of the ei struct when ei is NULL.
Hence,When ei->is_freed is set,return NULL directly.
Link: https://lore.kernel.org/linux-trace-kernel/20240513053338.63017-1-hao.ge@li…
Cc: stable(a)vger.kernel.org
Fixes: 8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Hao Ge <gehao(a)kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index a878cea70f4c..0256afdd4acf 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -345,10 +345,9 @@ static struct eventfs_inode *eventfs_find_events(struct dentry *dentry)
* If the ei is being freed, the ownership of the children
* doesn't matter.
*/
- if (ei->is_freed) {
- ei = NULL;
- break;
- }
+ if (ei->is_freed)
+ return NULL;
+
// Walk upwards until you find the events inode
} while (!ei->is_events);
--
2.43.0
Please find attached a report generated by keyword matching commits
from upstream that may be suitable for stable and probably as CVEs as
well.
I exclude commits that are already tagged with CC stable in upstream
and also commits already in
stable/linux-rolling-stable.
I can send these type of reports weekly if you want.
I plan to add more keywords to check for. But let's start small.
A question about commits that have a Fixes: tag. There are ~2000 of
them since v6.8.
As these are definitely bugfixes, do you want me to add commits that
include a Fixes: tag in future reports/scans?
Also let me know if/how I can change the format of the scan so that it
is easier for you to parse in your tooling.
regards
ronnie sahlberg
--
Ronnie Sahlberg [Principal Software Engineer, Linux]
P 775 384 8203 | E [email] | W ciq.com
Wire up LMH on QCM2290 and fix a bad bug while at it.
P1-2 for thermal, P3 for qcom
Signed-off-by: Konrad Dybcio <konrad.dybcio(a)linaro.org>
---
Changes in v2:
- Pick up tags
- Fix a couple typos in commit messages
- Drop stray msm8998 binding addition
- Link to v1: https://lore.kernel.org/r/20240308-topic-rb1_lmh-v1-0-50c60ffe1130@linaro.o…
---
Konrad Dybcio (2):
dt-bindings: thermal: lmh: Add QCM2290 compatible
thermal: qcom: lmh: Check for SCM availability at probe
Loic Poulain (1):
arm64: dts: qcom: qcm2290: Add LMH node
Documentation/devicetree/bindings/thermal/qcom-lmh.yaml | 12 ++++++++----
arch/arm64/boot/dts/qcom/qcm2290.dtsi | 14 +++++++++++++-
drivers/thermal/qcom/lmh.c | 3 +++
3 files changed, 24 insertions(+), 5 deletions(-)
---
base-commit: 8ffc8b1bbd505e27e2c8439d326b6059c906c9dd
change-id: 20240308-topic-rb1_lmh-1e0f440c392a
Best regards,
--
Konrad Dybcio <konrad.dybcio(a)linaro.org>
When reading EDID fails and driver reports no modes available, the DRM
core adds an artificial 1024x786 mode to the connector. Unfortunately
some variants of the Exynos HDMI (like the one in Exynos4 SoCs) are not
able to drive such mode, so report a safe 640x480 mode instead of nothing
in case of the EDID reading failure.
This fixes the following issue observed on Trats2 board since commit
13d5b040363c ("drm/exynos: do not return negative values from .get_modes()"):
[drm] Exynos DRM: using 11c00000.fimd device for DMA mapping operations
exynos-drm exynos-drm: bound 11c00000.fimd (ops fimd_component_ops)
exynos-drm exynos-drm: bound 12c10000.mixer (ops mixer_component_ops)
exynos-dsi 11c80000.dsi: [drm:samsung_dsim_host_attach] Attached s6e8aa0 device (lanes:4 bpp:24 mode-flags:0x10b)
exynos-drm exynos-drm: bound 11c80000.dsi (ops exynos_dsi_component_ops)
exynos-drm exynos-drm: bound 12d00000.hdmi (ops hdmi_component_ops)
[drm] Initialized exynos 1.1.0 20180330 for exynos-drm on minor 1
exynos-hdmi 12d00000.hdmi: [drm:hdmiphy_enable.part.0] *ERROR* PLL could not reach steady state
panel-samsung-s6e8aa0 11c80000.dsi.0: ID: 0xa2, 0x20, 0x8c
exynos-mixer 12c10000.mixer: timeout waiting for VSYNC
------------[ cut here ]------------
WARNING: CPU: 1 PID: 11 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8
[CRTC:70:crtc-1] vblank wait timed out
Modules linked in:
CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.9.0-rc5-next-20240424 #14913
Hardware name: Samsung Exynos (Flattened Device Tree)
Workqueue: events_unbound deferred_probe_work_func
Call trace:
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x68/0x88
dump_stack_lvl from __warn+0x7c/0x1c4
__warn from warn_slowpath_fmt+0x11c/0x1a8
warn_slowpath_fmt from drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8
drm_atomic_helper_wait_for_vblanks.part.0 from drm_atomic_helper_commit_tail_rpm+0x7c/0x8c
drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x184
commit_tail from drm_atomic_helper_commit+0x168/0x190
drm_atomic_helper_commit from drm_atomic_commit+0xb4/0xe0
drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x27c
drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1cc
drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40
drm_client_modeset_commit from __drm_fb_helper_restore_fbdev_mode_unlocked+0x9c/0xc4
__drm_fb_helper_restore_fbdev_mode_unlocked from drm_fb_helper_set_par+0x2c/0x3c
drm_fb_helper_set_par from fbcon_init+0x3d8/0x550
fbcon_init from visual_init+0xc0/0x108
visual_init from do_bind_con_driver+0x1b8/0x3a4
do_bind_con_driver from do_take_over_console+0x140/0x1ec
do_take_over_console from do_fbcon_takeover+0x70/0xd0
do_fbcon_takeover from fbcon_fb_registered+0x19c/0x1ac
fbcon_fb_registered from register_framebuffer+0x190/0x21c
register_framebuffer from __drm_fb_helper_initial_config_and_unlock+0x350/0x574
__drm_fb_helper_initial_config_and_unlock from exynos_drm_fbdev_client_hotplug+0x6c/0xb0
exynos_drm_fbdev_client_hotplug from drm_client_register+0x58/0x94
drm_client_register from exynos_drm_bind+0x160/0x190
exynos_drm_bind from try_to_bring_up_aggregate_device+0x200/0x2d8
try_to_bring_up_aggregate_device from __component_add+0xb0/0x170
__component_add from mixer_probe+0x74/0xcc
mixer_probe from platform_probe+0x5c/0xb8
platform_probe from really_probe+0xe0/0x3d8
really_probe from __driver_probe_device+0x9c/0x1e4
__driver_probe_device from driver_probe_device+0x30/0xc0
driver_probe_device from __device_attach_driver+0xa8/0x120
__device_attach_driver from bus_for_each_drv+0x80/0xcc
bus_for_each_drv from __device_attach+0xac/0x1fc
__device_attach from bus_probe_device+0x8c/0x90
bus_probe_device from deferred_probe_work_func+0x98/0xe0
deferred_probe_work_func from process_one_work+0x240/0x6d0
process_one_work from worker_thread+0x1a0/0x3f4
worker_thread from kthread+0x104/0x138
kthread from ret_from_fork+0x14/0x28
Exception stack(0xf0895fb0 to 0xf0895ff8)
...
irq event stamp: 82357
hardirqs last enabled at (82363): [<c01a96e8>] vprintk_emit+0x308/0x33c
hardirqs last disabled at (82368): [<c01a969c>] vprintk_emit+0x2bc/0x33c
softirqs last enabled at (81614): [<c0101644>] __do_softirq+0x320/0x500
softirqs last disabled at (81609): [<c012dfe0>] __irq_exit_rcu+0x130/0x184
---[ end trace 0000000000000000 ]---
exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out
exynos-drm exynos-drm: [drm] *ERROR* [CRTC:70:crtc-1] commit wait timed out
exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out
exynos-drm exynos-drm: [drm] *ERROR* [CONNECTOR:74:HDMI-A-1] commit wait timed out
exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out
exynos-drm exynos-drm: [drm] *ERROR* [PLANE:56:plane-5] commit wait timed out
exynos-mixer 12c10000.mixer: timeout waiting for VSYNC
Cc: stable(a)vger.kernel.org
Fixes: 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()")
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
drivers/gpu/drm/exynos/exynos_hdmi.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos_hdmi.c b/drivers/gpu/drm/exynos/exynos_hdmi.c
index 5fdeec8a3875..9d246db6ef2b 100644
--- a/drivers/gpu/drm/exynos/exynos_hdmi.c
+++ b/drivers/gpu/drm/exynos/exynos_hdmi.c
@@ -887,11 +887,11 @@ static int hdmi_get_modes(struct drm_connector *connector)
int ret;
if (!hdata->ddc_adpt)
- return 0;
+ goto no_edid;
edid = drm_get_edid(connector, hdata->ddc_adpt);
if (!edid)
- return 0;
+ goto no_edid;
hdata->dvi_mode = !connector->display_info.is_hdmi;
DRM_DEV_DEBUG_KMS(hdata->dev, "%s : width[%d] x height[%d]\n",
@@ -906,6 +906,9 @@ static int hdmi_get_modes(struct drm_connector *connector)
kfree(edid);
return ret;
+
+no_edid:
+ return drm_add_modes_noedid(connector, 640, 480);
}
static int hdmi_find_phy_conf(struct hdmi_context *hdata, u32 pixel_clock)
--
2.34.1
Hi,
There are just two minor fixes from Hector that we've been carrying downstream
for a while now. One increases the timeout while waiting for the firmware to
boot which is optional for the controller already supported upstream but
required for a newer 4388 board for which we'll also submit support soon.
It also fixes the units for the timeouts which is why I've already included it
here. The other one fixes a call to bitmap_release_region where we only wanted
to release a single bit but are actually releasing much more.
Best,
Sven
Signed-off-by: Sven Peter <sven(a)svenpeter.dev>
---
Hector Martin (2):
Bluetooth: hci_bcm4377: Increase boot timeout
Bluetooth: hci_bcm4377: Fix msgid release
drivers/bluetooth/hci_bcm4377.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
---
base-commit: cf87f46fd34d6c19283d9625a7822f20d90b64a4
change-id: 20240512-btfix-msgid-d76029a7d917
Best regards,
--
Sven Peter <sven(a)svenpeter.dev>
When asn1_encode_sequence() fails, WARN is not the correct solution.
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location is known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is a carefully considered rationale to
use it.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information.
Cc: stable(a)vger.kernel.org # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2: Fix typos in the commit message
---
security/keys/trusted-keys/trusted_tpm2.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index dfeec06301ce..dbdd6a318b8b 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -38,6 +38,7 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
u8 *end_work = scratch + SCRATCH_SIZE;
u8 *priv, *pub;
u16 priv_len, pub_len;
+ int ret;
priv_len = get_unaligned_be16(src) + 2;
priv = src;
@@ -79,8 +80,11 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
work1 = payload->blob;
work1 = asn1_encode_sequence(work1, work1 + sizeof(payload->blob),
scratch, work - scratch);
- if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed"))
- return PTR_ERR(work1);
+ if (IS_ERR(work1)) {
+ ret = PTR_ERR(work1);
+ pr_err("ASN.1 encode error %d\n", ret);
+ return ret;
+ }
return work1 - payload->blob;
}
--
2.45.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d36f6ed761b53933b0b4126486c10d3da7751e7f Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Wed, 18 May 2022 20:08:16 +0800
Subject: [PATCH] ext4: fix bug_on in __es_tree_search
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/extents_status.c:199!
[...]
RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
[...]
Call Trace:
ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
ext4_quota_enable fs/ext4/super.c:6137 [inline]
ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
mount_bdev+0x2e9/0x3b0 fs/super.c:1158
mount_fs+0x4b/0x1e4 fs/super.c:1261
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
ext4_enable_quotas
ext4_quota_enable
ext4_iget
__ext4_iget
ext4_ext_check_inode
ext4_ext_check
__ext4_ext_check
ext4_valid_extent_entries
Check for overlapping extents does't take effect
dquot_enable
vfs_load_quota_inode
v2_check_quota_file
v2_read_header
ext4_quota_read
ext4_bread
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_find_extent
ext4_cache_extents
ext4_es_cache_extent
ext4_es_cache_extent
__es_tree_search
ext4_es_end
BUG_ON(es->es_lblk + es->es_len < es->es_lblk)
The error ext4 extents is as follows:
0af3 0300 0400 0000 00000000 extent_header
00000000 0100 0000 12000000 extent1
00000000 0100 0000 18000000 extent2
02000000 0400 0000 14000000 extent3
In the ext4_valid_extent_entries function,
if prev is 0, no error is returned even if lblock<=prev.
This was intended to skip the check on the first extent, but
in the error image above, prev=0+1-1=0 when checking the second extent,
so even though lblock<=prev, the function does not return an error.
As a result, bug_ON occurs in __es_tree_search and the system panics.
To solve this problem, we only need to check that:
1. The lblock of the first extent is not less than 0.
2. The lblock of the next extent is not less than
the next block of the previous extent.
The same applies to extent_idx.
Cc: stable(a)kernel.org
Fixes: 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 474479ce76e0..c148bb97b527 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -372,7 +372,7 @@ static int ext4_valid_extent_entries(struct inode *inode,
{
unsigned short entries;
ext4_lblk_t lblock = 0;
- ext4_lblk_t prev = 0;
+ ext4_lblk_t cur = 0;
if (eh->eh_entries == 0)
return 1;
@@ -396,11 +396,11 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping extents */
lblock = le32_to_cpu(ext->ee_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_ext_pblock(ext);
return 0;
}
- prev = lblock + ext4_ext_get_actual_len(ext) - 1;
+ cur = lblock + ext4_ext_get_actual_len(ext);
ext++;
entries--;
}
@@ -420,13 +420,13 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping index extents */
lblock = le32_to_cpu(ext_idx->ei_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_idx_pblock(ext_idx);
return 0;
}
ext_idx++;
entries--;
- prev = lblock;
+ cur = lblock + 1;
}
}
return 1;
Error on asn1_encode_sequence() is handled with a WARN incorrectly
because:
1. asn1_encode_sequence() is not an internal function (located
in lib/asn1_encode.c).
2. Location on known, which makes the stack trace useless.
3. Results a crash if panic_on_warn is set.
It is also noteworthy that the use of WARN is undocumented, and it
should be avoided unless there is carefully considered rationale to use
it, which is now non-existent.
Replace WARN with pr_err, and print the return value instead, which is
only useful piece of information (and was not printed).
Cc: stable(a)vger.kernel.org # v5.13+
Fixes: f2219745250f ("security: keys: trusted: use ASN.1 TPM2 key format for the blobs")
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
security/keys/trusted-keys/trusted_tpm2.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/security/keys/trusted-keys/trusted_tpm2.c b/security/keys/trusted-keys/trusted_tpm2.c
index c8d8fdefbd8d..e31fe53822a1 100644
--- a/security/keys/trusted-keys/trusted_tpm2.c
+++ b/security/keys/trusted-keys/trusted_tpm2.c
@@ -39,6 +39,7 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
u8 *end_work = scratch + SCRATCH_SIZE;
u8 *priv, *pub;
u16 priv_len, pub_len;
+ int ret;
priv_len = get_unaligned_be16(src) + 2;
priv = src;
@@ -80,8 +81,11 @@ static int tpm2_key_encode(struct trusted_key_payload *payload,
work1 = payload->blob;
work1 = asn1_encode_sequence(work1, work1 + sizeof(payload->blob),
scratch, work - scratch);
- if (WARN(IS_ERR(work1), "BUG: ASN.1 encoder failed"))
- return PTR_ERR(work1);
+ if (IS_ERR(work1)) {
+ ret = PTR_ERR(work1);
+ pr_err("ASN.1 encode error %d\n", ret);
+ return ret;
+ }
return work1 - payload->blob;
}
--
2.45.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 40d442f969fb1e871da6fca73d3f8aef1f888558
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051321-refutable-qualified-fe5c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
40d442f969fb ("Bluetooth: qca: fix firmware check error path")
2e4edfa1e2bd ("Bluetooth: qca: add missing firmware sanity checks")
ecf6b2d95666 ("Bluetooth: btqca: Add support for firmware image with mbn type for WCN6750")
d8f97da1b92d ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6750")
b43ca511178e ("Bluetooth: btqca: Don't modify firmware contents in-place")
c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
eaf19b0c47d1 ("Bluetooth: btqca: Enable MSFT extension for Qualcomm WCN399x")
c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC")
99719449a4a6 ("Bluetooth: hci_qca: resolve various warnings")
054ec5e94a46 ("Bluetooth: hci_qca: Remove duplicate power off in proto close")
590deccf4c06 ("Bluetooth: hci_qca: Disable SoC debug logging for WCN3991")
37aee136f8c4 ("Bluetooth: hci_qca: allow max-speed to be set for QCA9377 devices")
e5d6468fe9d8 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth SoC QCA6390")
77131dfec6af ("Bluetooth: hci_qca: Replace devm_gpiod_get() with devm_gpiod_get_optional()")
8a208b24d770 ("Bluetooth: hci_qca: Make bt_en and susclk not mandatory for QCA Rome")
b63882549b2b ("Bluetooth: btqca: Fix the NVM baudrate tag offcet for wcn3991")
4f9ed5bd63dc ("Bluetooth: hci_qca: Not send vendor pre-shutdown command for QCA Rome")
66cb70513564 ("Bluetooth: hci_qca: Enable clocks required for BT SOC")
ae563183b647 ("Bluetooth: hci_qca: Enable power off/on support during hci down/up for QCA Rome")
5559904ccc08 ("Bluetooth: hci_qca: Add QCA Rome power off support to the qca_power_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40d442f969fb1e871da6fca73d3f8aef1f888558 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Wed, 1 May 2024 08:37:40 +0200
Subject: [PATCH] Bluetooth: qca: fix firmware check error path
A recent commit fixed the code that parses the firmware files before
downloading them to the controller but introduced a memory leak in case
the sanity checks ever fail.
Make sure to free the firmware buffer before returning on errors.
Fixes: f905ae0be4b7 ("Bluetooth: qca: add missing firmware sanity checks")
Cc: stable(a)vger.kernel.org # 4.19
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 8d8a664620a3..638074992c82 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -605,7 +605,7 @@ static int qca_download_firmware(struct hci_dev *hdev,
ret = qca_tlv_check_data(hdev, config, data, size, soc_type);
if (ret)
- return ret;
+ goto out;
segment = data;
remain = size;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 40d442f969fb1e871da6fca73d3f8aef1f888558
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051321-festival-sprint-9288@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
40d442f969fb ("Bluetooth: qca: fix firmware check error path")
2e4edfa1e2bd ("Bluetooth: qca: add missing firmware sanity checks")
ecf6b2d95666 ("Bluetooth: btqca: Add support for firmware image with mbn type for WCN6750")
d8f97da1b92d ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6750")
b43ca511178e ("Bluetooth: btqca: Don't modify firmware contents in-place")
c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
eaf19b0c47d1 ("Bluetooth: btqca: Enable MSFT extension for Qualcomm WCN399x")
c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC")
99719449a4a6 ("Bluetooth: hci_qca: resolve various warnings")
054ec5e94a46 ("Bluetooth: hci_qca: Remove duplicate power off in proto close")
590deccf4c06 ("Bluetooth: hci_qca: Disable SoC debug logging for WCN3991")
37aee136f8c4 ("Bluetooth: hci_qca: allow max-speed to be set for QCA9377 devices")
e5d6468fe9d8 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth SoC QCA6390")
77131dfec6af ("Bluetooth: hci_qca: Replace devm_gpiod_get() with devm_gpiod_get_optional()")
8a208b24d770 ("Bluetooth: hci_qca: Make bt_en and susclk not mandatory for QCA Rome")
b63882549b2b ("Bluetooth: btqca: Fix the NVM baudrate tag offcet for wcn3991")
4f9ed5bd63dc ("Bluetooth: hci_qca: Not send vendor pre-shutdown command for QCA Rome")
66cb70513564 ("Bluetooth: hci_qca: Enable clocks required for BT SOC")
ae563183b647 ("Bluetooth: hci_qca: Enable power off/on support during hci down/up for QCA Rome")
5559904ccc08 ("Bluetooth: hci_qca: Add QCA Rome power off support to the qca_power_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40d442f969fb1e871da6fca73d3f8aef1f888558 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Wed, 1 May 2024 08:37:40 +0200
Subject: [PATCH] Bluetooth: qca: fix firmware check error path
A recent commit fixed the code that parses the firmware files before
downloading them to the controller but introduced a memory leak in case
the sanity checks ever fail.
Make sure to free the firmware buffer before returning on errors.
Fixes: f905ae0be4b7 ("Bluetooth: qca: add missing firmware sanity checks")
Cc: stable(a)vger.kernel.org # 4.19
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 8d8a664620a3..638074992c82 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -605,7 +605,7 @@ static int qca_download_firmware(struct hci_dev *hdev,
ret = qca_tlv_check_data(hdev, config, data, size, soc_type);
if (ret)
- return ret;
+ goto out;
segment = data;
remain = size;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 40d442f969fb1e871da6fca73d3f8aef1f888558
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051320-majority-visiting-2bf1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
40d442f969fb ("Bluetooth: qca: fix firmware check error path")
2e4edfa1e2bd ("Bluetooth: qca: add missing firmware sanity checks")
ecf6b2d95666 ("Bluetooth: btqca: Add support for firmware image with mbn type for WCN6750")
d8f97da1b92d ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6750")
b43ca511178e ("Bluetooth: btqca: Don't modify firmware contents in-place")
c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
eaf19b0c47d1 ("Bluetooth: btqca: Enable MSFT extension for Qualcomm WCN399x")
c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC")
99719449a4a6 ("Bluetooth: hci_qca: resolve various warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40d442f969fb1e871da6fca73d3f8aef1f888558 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Wed, 1 May 2024 08:37:40 +0200
Subject: [PATCH] Bluetooth: qca: fix firmware check error path
A recent commit fixed the code that parses the firmware files before
downloading them to the controller but introduced a memory leak in case
the sanity checks ever fail.
Make sure to free the firmware buffer before returning on errors.
Fixes: f905ae0be4b7 ("Bluetooth: qca: add missing firmware sanity checks")
Cc: stable(a)vger.kernel.org # 4.19
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 8d8a664620a3..638074992c82 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -605,7 +605,7 @@ static int qca_download_firmware(struct hci_dev *hdev,
ret = qca_tlv_check_data(hdev, config, data, size, soc_type);
if (ret)
- return ret;
+ goto out;
segment = data;
remain = size;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x a112d3c72a227f2edbb6d8094472cc6e503e52af
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051301-untwist-vertical-ba65@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
a112d3c72a22 ("Bluetooth: qca: fix NVM configuration parsing")
2e4edfa1e2bd ("Bluetooth: qca: add missing firmware sanity checks")
ecf6b2d95666 ("Bluetooth: btqca: Add support for firmware image with mbn type for WCN6750")
d8f97da1b92d ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6750")
b43ca511178e ("Bluetooth: btqca: Don't modify firmware contents in-place")
c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
eaf19b0c47d1 ("Bluetooth: btqca: Enable MSFT extension for Qualcomm WCN399x")
c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC")
99719449a4a6 ("Bluetooth: hci_qca: resolve various warnings")
054ec5e94a46 ("Bluetooth: hci_qca: Remove duplicate power off in proto close")
590deccf4c06 ("Bluetooth: hci_qca: Disable SoC debug logging for WCN3991")
37aee136f8c4 ("Bluetooth: hci_qca: allow max-speed to be set for QCA9377 devices")
e5d6468fe9d8 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth SoC QCA6390")
77131dfec6af ("Bluetooth: hci_qca: Replace devm_gpiod_get() with devm_gpiod_get_optional()")
8a208b24d770 ("Bluetooth: hci_qca: Make bt_en and susclk not mandatory for QCA Rome")
b63882549b2b ("Bluetooth: btqca: Fix the NVM baudrate tag offcet for wcn3991")
4f9ed5bd63dc ("Bluetooth: hci_qca: Not send vendor pre-shutdown command for QCA Rome")
66cb70513564 ("Bluetooth: hci_qca: Enable clocks required for BT SOC")
ae563183b647 ("Bluetooth: hci_qca: Enable power off/on support during hci down/up for QCA Rome")
5559904ccc08 ("Bluetooth: hci_qca: Add QCA Rome power off support to the qca_power_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a112d3c72a227f2edbb6d8094472cc6e503e52af Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Tue, 30 Apr 2024 19:07:40 +0200
Subject: [PATCH] Bluetooth: qca: fix NVM configuration parsing
The NVM configuration files used by WCN3988 and WCN3990/1/8 have two
sets of configuration tags that are enclosed by a type-length header of
type four which the current parser fails to account for.
Instead the driver happily parses random data as if it were valid tags,
something which can lead to the configuration data being corrupted if it
ever encounters the words 0x0011 or 0x001b.
As is clear from commit b63882549b2b ("Bluetooth: btqca: Fix the NVM
baudrate tag offcet for wcn3991") the intention has always been to
process the configuration data also for WCN3991 and WCN3998 which
encodes the baud rate at a different offset.
Fix the parser so that it can handle the WCN3xxx configuration files,
which has an enclosing type-length header of type four and two sets of
TLV tags enclosed by a type-length header of type two and three,
respectively.
Note that only the first set, which contains the tags the driver is
currently looking for, will be parsed for now.
With the parser fixed, the software in-band sleep bit will now be set
for WCN3991 and WCN3998 (as it is for later controllers) and the default
baud rate 3200000 may be updated by the driver also for WCN3xxx
controllers.
Notably the deep-sleep feature bit is already set by default in all
configuration files in linux-firmware.
Fixes: 4219d4686875 ("Bluetooth: btqca: Add wcn3990 firmware download support.")
Cc: stable(a)vger.kernel.org # 4.19
Cc: Matthias Kaehlcke <mka(a)chromium.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 6743b0a79d7a..f6c9f89a6311 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -281,6 +281,7 @@ static int qca_tlv_check_data(struct hci_dev *hdev,
struct tlv_type_patch *tlv_patch;
struct tlv_type_nvm *tlv_nvm;
uint8_t nvm_baud_rate = config->user_baud_rate;
+ u8 type;
config->dnld_mode = QCA_SKIP_EVT_NONE;
config->dnld_type = QCA_SKIP_EVT_NONE;
@@ -346,11 +347,30 @@ static int qca_tlv_check_data(struct hci_dev *hdev,
tlv = (struct tlv_type_hdr *)fw_data;
type_len = le32_to_cpu(tlv->type_len);
- length = (type_len >> 8) & 0x00ffffff;
+ length = type_len >> 8;
+ type = type_len & 0xff;
- BT_DBG("TLV Type\t\t : 0x%x", type_len & 0x000000ff);
+ /* Some NVM files have more than one set of tags, only parse
+ * the first set when it has type 2 for now. When there is
+ * more than one set there is an enclosing header of type 4.
+ */
+ if (type == 4) {
+ if (fw_size < 2 * sizeof(struct tlv_type_hdr))
+ return -EINVAL;
+
+ tlv++;
+
+ type_len = le32_to_cpu(tlv->type_len);
+ length = type_len >> 8;
+ type = type_len & 0xff;
+ }
+
+ BT_DBG("TLV Type\t\t : 0x%x", type);
BT_DBG("Length\t\t : %d bytes", length);
+ if (type != 2)
+ break;
+
if (fw_size < length + (tlv->data - fw_data))
return -EINVAL;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a112d3c72a227f2edbb6d8094472cc6e503e52af
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051359-refueling-alienable-65c1@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a112d3c72a22 ("Bluetooth: qca: fix NVM configuration parsing")
2e4edfa1e2bd ("Bluetooth: qca: add missing firmware sanity checks")
ecf6b2d95666 ("Bluetooth: btqca: Add support for firmware image with mbn type for WCN6750")
d8f97da1b92d ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6750")
b43ca511178e ("Bluetooth: btqca: Don't modify firmware contents in-place")
c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
eaf19b0c47d1 ("Bluetooth: btqca: Enable MSFT extension for Qualcomm WCN399x")
c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC")
99719449a4a6 ("Bluetooth: hci_qca: resolve various warnings")
054ec5e94a46 ("Bluetooth: hci_qca: Remove duplicate power off in proto close")
590deccf4c06 ("Bluetooth: hci_qca: Disable SoC debug logging for WCN3991")
37aee136f8c4 ("Bluetooth: hci_qca: allow max-speed to be set for QCA9377 devices")
e5d6468fe9d8 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth SoC QCA6390")
77131dfec6af ("Bluetooth: hci_qca: Replace devm_gpiod_get() with devm_gpiod_get_optional()")
8a208b24d770 ("Bluetooth: hci_qca: Make bt_en and susclk not mandatory for QCA Rome")
b63882549b2b ("Bluetooth: btqca: Fix the NVM baudrate tag offcet for wcn3991")
4f9ed5bd63dc ("Bluetooth: hci_qca: Not send vendor pre-shutdown command for QCA Rome")
66cb70513564 ("Bluetooth: hci_qca: Enable clocks required for BT SOC")
ae563183b647 ("Bluetooth: hci_qca: Enable power off/on support during hci down/up for QCA Rome")
5559904ccc08 ("Bluetooth: hci_qca: Add QCA Rome power off support to the qca_power_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a112d3c72a227f2edbb6d8094472cc6e503e52af Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Tue, 30 Apr 2024 19:07:40 +0200
Subject: [PATCH] Bluetooth: qca: fix NVM configuration parsing
The NVM configuration files used by WCN3988 and WCN3990/1/8 have two
sets of configuration tags that are enclosed by a type-length header of
type four which the current parser fails to account for.
Instead the driver happily parses random data as if it were valid tags,
something which can lead to the configuration data being corrupted if it
ever encounters the words 0x0011 or 0x001b.
As is clear from commit b63882549b2b ("Bluetooth: btqca: Fix the NVM
baudrate tag offcet for wcn3991") the intention has always been to
process the configuration data also for WCN3991 and WCN3998 which
encodes the baud rate at a different offset.
Fix the parser so that it can handle the WCN3xxx configuration files,
which has an enclosing type-length header of type four and two sets of
TLV tags enclosed by a type-length header of type two and three,
respectively.
Note that only the first set, which contains the tags the driver is
currently looking for, will be parsed for now.
With the parser fixed, the software in-band sleep bit will now be set
for WCN3991 and WCN3998 (as it is for later controllers) and the default
baud rate 3200000 may be updated by the driver also for WCN3xxx
controllers.
Notably the deep-sleep feature bit is already set by default in all
configuration files in linux-firmware.
Fixes: 4219d4686875 ("Bluetooth: btqca: Add wcn3990 firmware download support.")
Cc: stable(a)vger.kernel.org # 4.19
Cc: Matthias Kaehlcke <mka(a)chromium.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 6743b0a79d7a..f6c9f89a6311 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -281,6 +281,7 @@ static int qca_tlv_check_data(struct hci_dev *hdev,
struct tlv_type_patch *tlv_patch;
struct tlv_type_nvm *tlv_nvm;
uint8_t nvm_baud_rate = config->user_baud_rate;
+ u8 type;
config->dnld_mode = QCA_SKIP_EVT_NONE;
config->dnld_type = QCA_SKIP_EVT_NONE;
@@ -346,11 +347,30 @@ static int qca_tlv_check_data(struct hci_dev *hdev,
tlv = (struct tlv_type_hdr *)fw_data;
type_len = le32_to_cpu(tlv->type_len);
- length = (type_len >> 8) & 0x00ffffff;
+ length = type_len >> 8;
+ type = type_len & 0xff;
- BT_DBG("TLV Type\t\t : 0x%x", type_len & 0x000000ff);
+ /* Some NVM files have more than one set of tags, only parse
+ * the first set when it has type 2 for now. When there is
+ * more than one set there is an enclosing header of type 4.
+ */
+ if (type == 4) {
+ if (fw_size < 2 * sizeof(struct tlv_type_hdr))
+ return -EINVAL;
+
+ tlv++;
+
+ type_len = le32_to_cpu(tlv->type_len);
+ length = type_len >> 8;
+ type = type_len & 0xff;
+ }
+
+ BT_DBG("TLV Type\t\t : 0x%x", type);
BT_DBG("Length\t\t : %d bytes", length);
+ if (type != 2)
+ break;
+
if (fw_size < length + (tlv->data - fw_data))
return -EINVAL;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a112d3c72a227f2edbb6d8094472cc6e503e52af
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051358-overplant-spent-f520@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a112d3c72a22 ("Bluetooth: qca: fix NVM configuration parsing")
2e4edfa1e2bd ("Bluetooth: qca: add missing firmware sanity checks")
ecf6b2d95666 ("Bluetooth: btqca: Add support for firmware image with mbn type for WCN6750")
d8f97da1b92d ("Bluetooth: hci_qca: Add support for QTI Bluetooth chip wcn6750")
b43ca511178e ("Bluetooth: btqca: Don't modify firmware contents in-place")
c1a74160eaf1 ("Bluetooth: hci_qca: Add device_may_wakeup support")
eaf19b0c47d1 ("Bluetooth: btqca: Enable MSFT extension for Qualcomm WCN399x")
c0187b0bd3e9 ("Bluetooth: btqca: Add support to read FW build version for WCN3991 BTSoC")
99719449a4a6 ("Bluetooth: hci_qca: resolve various warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a112d3c72a227f2edbb6d8094472cc6e503e52af Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Tue, 30 Apr 2024 19:07:40 +0200
Subject: [PATCH] Bluetooth: qca: fix NVM configuration parsing
The NVM configuration files used by WCN3988 and WCN3990/1/8 have two
sets of configuration tags that are enclosed by a type-length header of
type four which the current parser fails to account for.
Instead the driver happily parses random data as if it were valid tags,
something which can lead to the configuration data being corrupted if it
ever encounters the words 0x0011 or 0x001b.
As is clear from commit b63882549b2b ("Bluetooth: btqca: Fix the NVM
baudrate tag offcet for wcn3991") the intention has always been to
process the configuration data also for WCN3991 and WCN3998 which
encodes the baud rate at a different offset.
Fix the parser so that it can handle the WCN3xxx configuration files,
which has an enclosing type-length header of type four and two sets of
TLV tags enclosed by a type-length header of type two and three,
respectively.
Note that only the first set, which contains the tags the driver is
currently looking for, will be parsed for now.
With the parser fixed, the software in-band sleep bit will now be set
for WCN3991 and WCN3998 (as it is for later controllers) and the default
baud rate 3200000 may be updated by the driver also for WCN3xxx
controllers.
Notably the deep-sleep feature bit is already set by default in all
configuration files in linux-firmware.
Fixes: 4219d4686875 ("Bluetooth: btqca: Add wcn3990 firmware download support.")
Cc: stable(a)vger.kernel.org # 4.19
Cc: Matthias Kaehlcke <mka(a)chromium.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index 6743b0a79d7a..f6c9f89a6311 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -281,6 +281,7 @@ static int qca_tlv_check_data(struct hci_dev *hdev,
struct tlv_type_patch *tlv_patch;
struct tlv_type_nvm *tlv_nvm;
uint8_t nvm_baud_rate = config->user_baud_rate;
+ u8 type;
config->dnld_mode = QCA_SKIP_EVT_NONE;
config->dnld_type = QCA_SKIP_EVT_NONE;
@@ -346,11 +347,30 @@ static int qca_tlv_check_data(struct hci_dev *hdev,
tlv = (struct tlv_type_hdr *)fw_data;
type_len = le32_to_cpu(tlv->type_len);
- length = (type_len >> 8) & 0x00ffffff;
+ length = type_len >> 8;
+ type = type_len & 0xff;
- BT_DBG("TLV Type\t\t : 0x%x", type_len & 0x000000ff);
+ /* Some NVM files have more than one set of tags, only parse
+ * the first set when it has type 2 for now. When there is
+ * more than one set there is an enclosing header of type 4.
+ */
+ if (type == 4) {
+ if (fw_size < 2 * sizeof(struct tlv_type_hdr))
+ return -EINVAL;
+
+ tlv++;
+
+ type_len = le32_to_cpu(tlv->type_len);
+ length = type_len >> 8;
+ type = type_len & 0xff;
+ }
+
+ BT_DBG("TLV Type\t\t : 0x%x", type);
BT_DBG("Length\t\t : %d bytes", length);
+ if (type != 2)
+ break;
+
if (fw_size < length + (tlv->data - fw_data))
return -EINVAL;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c88033efe9a391e72ba6b5df4b01d6e628f4e734
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051304-turkey-underfed-40fe@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
c88033efe9a3 ("mm/userfaultfd: reset ptes when close() for wr-protected ones")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c88033efe9a391e72ba6b5df4b01d6e628f4e734 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx(a)redhat.com>
Date: Mon, 22 Apr 2024 09:33:11 -0400
Subject: [PATCH] mm/userfaultfd: reset ptes when close() for wr-protected ones
Userfaultfd unregister includes a step to remove wr-protect bits from all
the relevant pgtable entries, but that only covered an explicit
UFFDIO_UNREGISTER ioctl, not a close() on the userfaultfd itself. Cover
that too. This fixes a WARN trace.
The only user visible side effect is the user can observe leftover
wr-protect bits even if the user close()ed on an userfaultfd when
releasing the last reference of it. However hopefully that should be
harmless, and nothing bad should happen even if so.
This change is now more important after the recent page-table-check
patch we merged in mm-unstable (446dd9ad37d0 ("mm/page_table_check:
support userfault wr-protect entries")), as we'll do sanity check on
uffd-wp bits without vma context. So it's better if we can 100%
guarantee no uffd-wp bit leftovers, to make sure each report will be
valid.
Link: https://lore.kernel.org/all/000000000000ca4df20616a0fe16@google.com/
Fixes: f369b07c8614 ("mm/uffd: reset write protection when unregister with wp-mode")
Analyzed-by: David Hildenbrand <david(a)redhat.com>
Link: https://lkml.kernel.org/r/20240422133311.2987675-1-peterx@redhat.com
Reported-by: syzbot+d8426b591c36b21c750e(a)syzkaller.appspotmail.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Nadav Amit <nadav.amit(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 60dcfafdc11a..292f5fd50104 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -895,6 +895,10 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
prev = vma;
continue;
}
+ /* Reset ptes for the whole vma range if wr-protected */
+ if (userfaultfd_wp(vma))
+ uffd_wp_range(vma, vma->vm_start,
+ vma->vm_end - vma->vm_start, false);
new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS;
vma = vma_modify_flags_uffd(&vmi, prev, vma, vma->vm_start,
vma->vm_end, new_flags,
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x d57cf30c4c07837799edec949102b0adf58bae79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051331-sediment-viewable-33bc@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
d57cf30c4c07 ("eventfs: Have "events" directory get permissions from its parent")
d53891d348ac ("eventfs: Do not differentiate the toplevel events directory")
c3137ab6318d ("eventfs: Create eventfs_root_inode to store dentry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d57cf30c4c07837799edec949102b0adf58bae79 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 2 May 2024 16:08:27 -0400
Subject: [PATCH] eventfs: Have "events" directory get permissions from its
parent
The events directory gets its permissions from the root inode. But this
can cause an inconsistency if the instances directory changes its
permissions, as the permissions of the created directories under it should
inherit the permissions of the instances directory when directories under
it are created.
Currently the behavior is:
# cd /sys/kernel/tracing
# chgrp 1002 instances
# mkdir instances/foo
# ls -l instances/foo
[..]
-r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 18:55 current_tracer
-rw-r----- 1 root lkp 0 May 1 18:55 error_log
drwxr-xr-x 1 root root 0 May 1 18:55 events
--w------- 1 root lkp 0 May 1 18:55 free_buffer
drwxr-x--- 2 root lkp 0 May 1 18:55 options
drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu
-rw-r----- 1 root lkp 0 May 1 18:55 set_event
All the files and directories under "foo" has the "lkp" group except the
"events" directory. That's because its getting its default value from the
mount point instead of its parent.
Have the "events" directory make its default value based on its parent's
permissions. That now gives:
# ls -l instances/foo
[..]
-rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb
-r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:16 current_tracer
-rw-r----- 1 root lkp 0 May 1 21:16 error_log
drwxr-xr-x 1 root lkp 0 May 1 21:16 events
--w------- 1 root lkp 0 May 1 21:16 free_buffer
drwxr-x--- 2 root lkp 0 May 1 21:16 options
drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu
-rw-r----- 1 root lkp 0 May 1 21:16 set_event
Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.161887248@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 6e08405892ae..a878cea70f4c 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -37,6 +37,7 @@ static DEFINE_MUTEX(eventfs_mutex);
struct eventfs_root_inode {
struct eventfs_inode ei;
+ struct inode *parent_inode;
struct dentry *events_dir;
};
@@ -226,12 +227,23 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
static void update_events_attr(struct eventfs_inode *ei, struct super_block *sb)
{
- struct inode *root;
+ struct eventfs_root_inode *rei;
+ struct inode *parent;
- /* Get the tracefs root inode. */
- root = d_inode(sb->s_root);
- ei->attr.uid = root->i_uid;
- ei->attr.gid = root->i_gid;
+ rei = get_root_inode(ei);
+
+ /* Use the parent inode permissions unless root set its permissions */
+ parent = rei->parent_inode;
+
+ if (rei->ei.attr.mode & EVENTFS_SAVE_UID)
+ ei->attr.uid = rei->ei.attr.uid;
+ else
+ ei->attr.uid = parent->i_uid;
+
+ if (rei->ei.attr.mode & EVENTFS_SAVE_GID)
+ ei->attr.gid = rei->ei.attr.gid;
+ else
+ ei->attr.gid = parent->i_gid;
}
static void set_top_events_ownership(struct inode *inode)
@@ -817,6 +829,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
// Note: we have a ref to the dentry from tracefs_start_creating()
rei = get_root_inode(ei);
rei->events_dir = dentry;
+ rei->parent_inode = d_inode(dentry->d_sb->s_root);
ei->entries = entries;
ei->nr_entries = size;
@@ -826,10 +839,15 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
uid = d_inode(dentry->d_parent)->i_uid;
gid = d_inode(dentry->d_parent)->i_gid;
- /* This is used as the default ownership of the files and directories */
ei->attr.uid = uid;
ei->attr.gid = gid;
+ /*
+ * When the "events" directory is created, it takes on the
+ * permissions of its parent. But can be reset on remount.
+ */
+ ei->attr.mode |= EVENTFS_SAVE_UID | EVENTFS_SAVE_GID;
+
INIT_LIST_HEAD(&ei->children);
INIT_LIST_HEAD(&ei->list);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x d57cf30c4c07837799edec949102b0adf58bae79
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051329-cringing-recast-3368@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
d57cf30c4c07 ("eventfs: Have "events" directory get permissions from its parent")
d53891d348ac ("eventfs: Do not differentiate the toplevel events directory")
c3137ab6318d ("eventfs: Create eventfs_root_inode to store dentry")
264424dfdd5c ("eventfs: Restructure eventfs_inode structure to be more condensed")
5a49f996046b ("eventfs: Warn if an eventfs_inode is freed without is_freed being set")
43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
408600be78cd ("eventfs: Remove unused d_parent pointer field")
49304c2b93e4 ("tracefs: dentry lookup crapectomy")
99c001cb617d ("tracefs: Avoid using the ei->dentry pointer unnecessarily")
4fa4b010b83f ("eventfs: Initialize the tracefs inode properly")
29142dc92c37 ("tracefs: remove stale 'update_gid' code")
834bf76add3e ("eventfs: Save directory inodes in the eventfs_inode structure")
1057066009c4 ("eventfs: Use kcalloc() instead of kzalloc()")
852e46e239ee ("eventfs: Do not create dentries nor inodes in iterate_shared")
53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same")
1de94b52d5e8 ("eventfs: Shortcut eventfs_iterate() by skipping entries already read")
704f960dbee2 ("eventfs: Read ei->entries before ei->children in eventfs_iterate()")
1e4624eb5a0e ("eventfs: Do ctx->pos update for all iterations in eventfs_iterate()")
e109deadb733 ("eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set")
8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d57cf30c4c07837799edec949102b0adf58bae79 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 2 May 2024 16:08:27 -0400
Subject: [PATCH] eventfs: Have "events" directory get permissions from its
parent
The events directory gets its permissions from the root inode. But this
can cause an inconsistency if the instances directory changes its
permissions, as the permissions of the created directories under it should
inherit the permissions of the instances directory when directories under
it are created.
Currently the behavior is:
# cd /sys/kernel/tracing
# chgrp 1002 instances
# mkdir instances/foo
# ls -l instances/foo
[..]
-r--r----- 1 root lkp 0 May 1 18:55 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 18:55 current_tracer
-rw-r----- 1 root lkp 0 May 1 18:55 error_log
drwxr-xr-x 1 root root 0 May 1 18:55 events
--w------- 1 root lkp 0 May 1 18:55 free_buffer
drwxr-x--- 2 root lkp 0 May 1 18:55 options
drwxr-x--- 10 root lkp 0 May 1 18:55 per_cpu
-rw-r----- 1 root lkp 0 May 1 18:55 set_event
All the files and directories under "foo" has the "lkp" group except the
"events" directory. That's because its getting its default value from the
mount point instead of its parent.
Have the "events" directory make its default value based on its parent's
permissions. That now gives:
# ls -l instances/foo
[..]
-rw-r----- 1 root lkp 0 May 1 21:16 buffer_subbuf_size_kb
-r--r----- 1 root lkp 0 May 1 21:16 buffer_total_size_kb
-rw-r----- 1 root lkp 0 May 1 21:16 current_tracer
-rw-r----- 1 root lkp 0 May 1 21:16 error_log
drwxr-xr-x 1 root lkp 0 May 1 21:16 events
--w------- 1 root lkp 0 May 1 21:16 free_buffer
drwxr-x--- 2 root lkp 0 May 1 21:16 options
drwxr-x--- 10 root lkp 0 May 1 21:16 per_cpu
-rw-r----- 1 root lkp 0 May 1 21:16 set_event
Link: https://lore.kernel.org/linux-trace-kernel/20240502200906.161887248@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 8186fff7ab649 ("tracefs/eventfs: Use root and instance inodes as default ownership")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 6e08405892ae..a878cea70f4c 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -37,6 +37,7 @@ static DEFINE_MUTEX(eventfs_mutex);
struct eventfs_root_inode {
struct eventfs_inode ei;
+ struct inode *parent_inode;
struct dentry *events_dir;
};
@@ -226,12 +227,23 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
static void update_events_attr(struct eventfs_inode *ei, struct super_block *sb)
{
- struct inode *root;
+ struct eventfs_root_inode *rei;
+ struct inode *parent;
- /* Get the tracefs root inode. */
- root = d_inode(sb->s_root);
- ei->attr.uid = root->i_uid;
- ei->attr.gid = root->i_gid;
+ rei = get_root_inode(ei);
+
+ /* Use the parent inode permissions unless root set its permissions */
+ parent = rei->parent_inode;
+
+ if (rei->ei.attr.mode & EVENTFS_SAVE_UID)
+ ei->attr.uid = rei->ei.attr.uid;
+ else
+ ei->attr.uid = parent->i_uid;
+
+ if (rei->ei.attr.mode & EVENTFS_SAVE_GID)
+ ei->attr.gid = rei->ei.attr.gid;
+ else
+ ei->attr.gid = parent->i_gid;
}
static void set_top_events_ownership(struct inode *inode)
@@ -817,6 +829,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
// Note: we have a ref to the dentry from tracefs_start_creating()
rei = get_root_inode(ei);
rei->events_dir = dentry;
+ rei->parent_inode = d_inode(dentry->d_sb->s_root);
ei->entries = entries;
ei->nr_entries = size;
@@ -826,10 +839,15 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
uid = d_inode(dentry->d_parent)->i_uid;
gid = d_inode(dentry->d_parent)->i_gid;
- /* This is used as the default ownership of the files and directories */
ei->attr.uid = uid;
ei->attr.gid = gid;
+ /*
+ * When the "events" directory is created, it takes on the
+ * permissions of its parent. But can be reset on remount.
+ */
+ ei->attr.mode |= EVENTFS_SAVE_UID | EVENTFS_SAVE_GID;
+
INIT_LIST_HEAD(&ei->children);
INIT_LIST_HEAD(&ei->list);
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x b63db58e2fa5d6963db9c45df88e60060f0ff35f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051325-ranting-skydiver-968e@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
b63db58e2fa5 ("eventfs/tracing: Add callback for release of an eventfs_inode")
c3137ab6318d ("eventfs: Create eventfs_root_inode to store dentry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b63db58e2fa5d6963db9c45df88e60060f0ff35f Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 2 May 2024 09:03:15 -0400
Subject: [PATCH] eventfs/tracing: Add callback for release of an eventfs_inode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Synthetic events create and destroy tracefs files when they are created
and removed. The tracing subsystem has its own file descriptor
representing the state of the events attached to the tracefs files.
There's a race between the eventfs files and this file descriptor of the
tracing system where the following can cause an issue:
With two scripts 'A' and 'B' doing:
Script 'A':
echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
while :
do
echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
done
Script 'B':
echo > /sys/kernel/tracing/synthetic_events
Script 'A' creates a synthetic event "hello" and then just writes zero
into its enable file.
Script 'B' removes all synthetic events (including the newly created
"hello" event).
What happens is that the opening of the "enable" file has:
{
struct trace_event_file *file = inode->i_private;
int ret;
ret = tracing_check_open_get_tr(file->tr);
[..]
But deleting the events frees the "file" descriptor, and a "use after
free" happens with the dereference at "file->tr".
The file descriptor does have a reference counter, but there needs to be a
way to decrement it from the eventfs when the eventfs_inode is removed
that represents this file descriptor.
Add an optional "release" callback to the eventfs_entry array structure,
that gets called when the eventfs file is about to be removed. This allows
for the creating on the eventfs file to increment the tracing file
descriptor ref counter. When the eventfs file is deleted, it can call the
release function that will call the put function for the tracing file
descriptor.
This will protect the tracing file from being freed while a eventfs file
that references it is being opened.
Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.W…
Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Tze-nan wu <Tze-nan.Wu(a)mediatek.com>
Tested-by: Tze-nan Wu (吳澤南) <Tze-nan.Wu(a)mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 894c6ca1e500..f5510e26f0f6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -84,10 +84,17 @@ enum {
static void release_ei(struct kref *ref)
{
struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+ const struct eventfs_entry *entry;
struct eventfs_root_inode *rei;
WARN_ON_ONCE(!ei->is_freed);
+ for (int i = 0; i < ei->nr_entries; i++) {
+ entry = &ei->entries[i];
+ if (entry->release)
+ entry->release(entry->name, ei->data);
+ }
+
kfree(ei->entry_attrs);
kfree_const(ei->name);
if (ei->is_events) {
@@ -112,6 +119,18 @@ static inline void free_ei(struct eventfs_inode *ei)
}
}
+/*
+ * Called when creation of an ei fails, do not call release() functions.
+ */
+static inline void cleanup_ei(struct eventfs_inode *ei)
+{
+ if (ei) {
+ /* Set nr_entries to 0 to prevent release() function being called */
+ ei->nr_entries = 0;
+ free_ei(ei);
+ }
+}
+
static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
{
if (ei)
@@ -734,7 +753,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
/* Was the parent freed? */
if (list_empty(&ei->list)) {
- free_ei(ei);
+ cleanup_ei(ei);
ei = NULL;
}
return ei;
@@ -835,7 +854,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
return ei;
fail:
- free_ei(ei);
+ cleanup_ei(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 7a5fe17b6bf9..d03f74658716 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -62,6 +62,8 @@ struct eventfs_file;
typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
const struct file_operations **fops);
+typedef void (*eventfs_release)(const char *name, void *data);
+
/**
* struct eventfs_entry - dynamically created eventfs file call back handler
* @name: Then name of the dynamic file in an eventfs directory
@@ -72,6 +74,7 @@ typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
struct eventfs_entry {
const char *name;
eventfs_callback callback;
+ eventfs_release release;
};
struct eventfs_inode;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 52f75c36bbca..6ef29eba90ce 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2552,6 +2552,14 @@ static int event_callback(const char *name, umode_t *mode, void **data,
return 0;
}
+/* The file is incremented on creation and freeing the enable file decrements it */
+static void event_release(const char *name, void *data)
+{
+ struct trace_event_file *file = data;
+
+ event_file_put(file);
+}
+
static int
event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
{
@@ -2566,6 +2574,7 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
{
.name = "enable",
.callback = event_callback,
+ .release = event_release,
},
{
.name = "filter",
@@ -2634,6 +2643,9 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
return ret;
}
+ /* Gets decremented on freeing of the "enable" file */
+ event_file_get(file);
+
return 0;
}
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x b63db58e2fa5d6963db9c45df88e60060f0ff35f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051323-relight-stoop-214c@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
b63db58e2fa5 ("eventfs/tracing: Add callback for release of an eventfs_inode")
c3137ab6318d ("eventfs: Create eventfs_root_inode to store dentry")
264424dfdd5c ("eventfs: Restructure eventfs_inode structure to be more condensed")
5a49f996046b ("eventfs: Warn if an eventfs_inode is freed without is_freed being set")
43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
408600be78cd ("eventfs: Remove unused d_parent pointer field")
49304c2b93e4 ("tracefs: dentry lookup crapectomy")
4fa4b010b83f ("eventfs: Initialize the tracefs inode properly")
29142dc92c37 ("tracefs: remove stale 'update_gid' code")
834bf76add3e ("eventfs: Save directory inodes in the eventfs_inode structure")
1057066009c4 ("eventfs: Use kcalloc() instead of kzalloc()")
852e46e239ee ("eventfs: Do not create dentries nor inodes in iterate_shared")
53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same")
1de94b52d5e8 ("eventfs: Shortcut eventfs_iterate() by skipping entries already read")
704f960dbee2 ("eventfs: Read ei->entries before ei->children in eventfs_iterate()")
1e4624eb5a0e ("eventfs: Do ctx->pos update for all iterations in eventfs_iterate()")
e109deadb733 ("eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set")
8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()")
b0f7e2d739b4 ("eventfs: Remove "lookup" parameter from create_dir/file_dentry()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b63db58e2fa5d6963db9c45df88e60060f0ff35f Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 2 May 2024 09:03:15 -0400
Subject: [PATCH] eventfs/tracing: Add callback for release of an eventfs_inode
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Synthetic events create and destroy tracefs files when they are created
and removed. The tracing subsystem has its own file descriptor
representing the state of the events attached to the tracefs files.
There's a race between the eventfs files and this file descriptor of the
tracing system where the following can cause an issue:
With two scripts 'A' and 'B' doing:
Script 'A':
echo "hello int aaa" > /sys/kernel/tracing/synthetic_events
while :
do
echo 0 > /sys/kernel/tracing/events/synthetic/hello/enable
done
Script 'B':
echo > /sys/kernel/tracing/synthetic_events
Script 'A' creates a synthetic event "hello" and then just writes zero
into its enable file.
Script 'B' removes all synthetic events (including the newly created
"hello" event).
What happens is that the opening of the "enable" file has:
{
struct trace_event_file *file = inode->i_private;
int ret;
ret = tracing_check_open_get_tr(file->tr);
[..]
But deleting the events frees the "file" descriptor, and a "use after
free" happens with the dereference at "file->tr".
The file descriptor does have a reference counter, but there needs to be a
way to decrement it from the eventfs when the eventfs_inode is removed
that represents this file descriptor.
Add an optional "release" callback to the eventfs_entry array structure,
that gets called when the eventfs file is about to be removed. This allows
for the creating on the eventfs file to increment the tracing file
descriptor ref counter. When the eventfs file is deleted, it can call the
release function that will call the put function for the tracing file
descriptor.
This will protect the tracing file from being freed while a eventfs file
that references it is being opened.
Link: https://lore.kernel.org/linux-trace-kernel/20240426073410.17154-1-Tze-nan.W…
Link: https://lore.kernel.org/linux-trace-kernel/20240502090315.448cba46@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reported-by: Tze-nan wu <Tze-nan.Wu(a)mediatek.com>
Tested-by: Tze-nan Wu (吳澤南) <Tze-nan.Wu(a)mediatek.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 894c6ca1e500..f5510e26f0f6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -84,10 +84,17 @@ enum {
static void release_ei(struct kref *ref)
{
struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
+ const struct eventfs_entry *entry;
struct eventfs_root_inode *rei;
WARN_ON_ONCE(!ei->is_freed);
+ for (int i = 0; i < ei->nr_entries; i++) {
+ entry = &ei->entries[i];
+ if (entry->release)
+ entry->release(entry->name, ei->data);
+ }
+
kfree(ei->entry_attrs);
kfree_const(ei->name);
if (ei->is_events) {
@@ -112,6 +119,18 @@ static inline void free_ei(struct eventfs_inode *ei)
}
}
+/*
+ * Called when creation of an ei fails, do not call release() functions.
+ */
+static inline void cleanup_ei(struct eventfs_inode *ei)
+{
+ if (ei) {
+ /* Set nr_entries to 0 to prevent release() function being called */
+ ei->nr_entries = 0;
+ free_ei(ei);
+ }
+}
+
static inline struct eventfs_inode *get_ei(struct eventfs_inode *ei)
{
if (ei)
@@ -734,7 +753,7 @@ struct eventfs_inode *eventfs_create_dir(const char *name, struct eventfs_inode
/* Was the parent freed? */
if (list_empty(&ei->list)) {
- free_ei(ei);
+ cleanup_ei(ei);
ei = NULL;
}
return ei;
@@ -835,7 +854,7 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
return ei;
fail:
- free_ei(ei);
+ cleanup_ei(ei);
tracefs_failed_creating(dentry);
return ERR_PTR(-ENOMEM);
}
diff --git a/include/linux/tracefs.h b/include/linux/tracefs.h
index 7a5fe17b6bf9..d03f74658716 100644
--- a/include/linux/tracefs.h
+++ b/include/linux/tracefs.h
@@ -62,6 +62,8 @@ struct eventfs_file;
typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
const struct file_operations **fops);
+typedef void (*eventfs_release)(const char *name, void *data);
+
/**
* struct eventfs_entry - dynamically created eventfs file call back handler
* @name: Then name of the dynamic file in an eventfs directory
@@ -72,6 +74,7 @@ typedef int (*eventfs_callback)(const char *name, umode_t *mode, void **data,
struct eventfs_entry {
const char *name;
eventfs_callback callback;
+ eventfs_release release;
};
struct eventfs_inode;
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 52f75c36bbca..6ef29eba90ce 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -2552,6 +2552,14 @@ static int event_callback(const char *name, umode_t *mode, void **data,
return 0;
}
+/* The file is incremented on creation and freeing the enable file decrements it */
+static void event_release(const char *name, void *data)
+{
+ struct trace_event_file *file = data;
+
+ event_file_put(file);
+}
+
static int
event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
{
@@ -2566,6 +2574,7 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
{
.name = "enable",
.callback = event_callback,
+ .release = event_release,
},
{
.name = "filter",
@@ -2634,6 +2643,9 @@ event_create_dir(struct eventfs_inode *parent, struct trace_event_file *file)
return ret;
}
+ /* Gets decremented on freeing of the "enable" file */
+ event_file_get(file);
+
return 0;
}
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x ee4e0379475e4fe723986ae96293e465014fa8d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051314-unwilling-outmatch-c624@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
ee4e0379475e ("eventfs: Free all of the eventfs_inode after RCU")
b63db58e2fa5 ("eventfs/tracing: Add callback for release of an eventfs_inode")
c3137ab6318d ("eventfs: Create eventfs_root_inode to store dentry")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee4e0379475e4fe723986ae96293e465014fa8d9 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 2 May 2024 16:08:22 -0400
Subject: [PATCH] eventfs: Free all of the eventfs_inode after RCU
The freeing of eventfs_inode via a kfree_rcu() callback. But the content
of the eventfs_inode was being freed after the last kref. This is
dangerous, as changes are being made that can access the content of an
eventfs_inode from an RCU loop.
Instead of using kfree_rcu() use call_rcu() that calls a function to do
all the freeing of the eventfs_inode after a RCU grace period has expired.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.370261163@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index f5510e26f0f6..cc8b838bbe62 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -73,6 +73,21 @@ enum {
#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1)
+static void free_ei_rcu(struct rcu_head *rcu)
+{
+ struct eventfs_inode *ei = container_of(rcu, struct eventfs_inode, rcu);
+ struct eventfs_root_inode *rei;
+
+ kfree(ei->entry_attrs);
+ kfree_const(ei->name);
+ if (ei->is_events) {
+ rei = get_root_inode(ei);
+ kfree(rei);
+ } else {
+ kfree(ei);
+ }
+}
+
/*
* eventfs_inode reference count management.
*
@@ -85,7 +100,6 @@ static void release_ei(struct kref *ref)
{
struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
const struct eventfs_entry *entry;
- struct eventfs_root_inode *rei;
WARN_ON_ONCE(!ei->is_freed);
@@ -95,14 +109,7 @@ static void release_ei(struct kref *ref)
entry->release(entry->name, ei->data);
}
- kfree(ei->entry_attrs);
- kfree_const(ei->name);
- if (ei->is_events) {
- rei = get_root_inode(ei);
- kfree_rcu(rei, ei.rcu);
- } else {
- kfree_rcu(ei, rcu);
- }
+ call_rcu(&ei->rcu, free_ei_rcu);
}
static inline void put_ei(struct eventfs_inode *ei)
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ee4e0379475e4fe723986ae96293e465014fa8d9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051313-twilight-disposal-0184@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
ee4e0379475e ("eventfs: Free all of the eventfs_inode after RCU")
b63db58e2fa5 ("eventfs/tracing: Add callback for release of an eventfs_inode")
c3137ab6318d ("eventfs: Create eventfs_root_inode to store dentry")
264424dfdd5c ("eventfs: Restructure eventfs_inode structure to be more condensed")
5a49f996046b ("eventfs: Warn if an eventfs_inode is freed without is_freed being set")
43aa6f97c2d0 ("eventfs: Get rid of dentry pointers without refcounts")
408600be78cd ("eventfs: Remove unused d_parent pointer field")
49304c2b93e4 ("tracefs: dentry lookup crapectomy")
4fa4b010b83f ("eventfs: Initialize the tracefs inode properly")
29142dc92c37 ("tracefs: remove stale 'update_gid' code")
834bf76add3e ("eventfs: Save directory inodes in the eventfs_inode structure")
1057066009c4 ("eventfs: Use kcalloc() instead of kzalloc()")
852e46e239ee ("eventfs: Do not create dentries nor inodes in iterate_shared")
53c41052ba31 ("eventfs: Have the inodes all for files and directories all be the same")
1de94b52d5e8 ("eventfs: Shortcut eventfs_iterate() by skipping entries already read")
704f960dbee2 ("eventfs: Read ei->entries before ei->children in eventfs_iterate()")
1e4624eb5a0e ("eventfs: Do ctx->pos update for all iterations in eventfs_iterate()")
e109deadb733 ("eventfs: Have eventfs_iterate() stop immediately if ei->is_freed is set")
8186fff7ab64 ("tracefs/eventfs: Use root and instance inodes as default ownership")
493ec81a8fb8 ("eventfs: Stop using dcache_readdir() for getdents()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee4e0379475e4fe723986ae96293e465014fa8d9 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 2 May 2024 16:08:22 -0400
Subject: [PATCH] eventfs: Free all of the eventfs_inode after RCU
The freeing of eventfs_inode via a kfree_rcu() callback. But the content
of the eventfs_inode was being freed after the last kref. This is
dangerous, as changes are being made that can access the content of an
eventfs_inode from an RCU loop.
Instead of using kfree_rcu() use call_rcu() that calls a function to do
all the freeing of the eventfs_inode after a RCU grace period has expired.
Link: https://lore.kernel.org/linux-trace-kernel/20240502200905.370261163@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 43aa6f97c2d03 ("eventfs: Get rid of dentry pointers without refcounts")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index f5510e26f0f6..cc8b838bbe62 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -73,6 +73,21 @@ enum {
#define EVENTFS_MODE_MASK (EVENTFS_SAVE_MODE - 1)
+static void free_ei_rcu(struct rcu_head *rcu)
+{
+ struct eventfs_inode *ei = container_of(rcu, struct eventfs_inode, rcu);
+ struct eventfs_root_inode *rei;
+
+ kfree(ei->entry_attrs);
+ kfree_const(ei->name);
+ if (ei->is_events) {
+ rei = get_root_inode(ei);
+ kfree(rei);
+ } else {
+ kfree(ei);
+ }
+}
+
/*
* eventfs_inode reference count management.
*
@@ -85,7 +100,6 @@ static void release_ei(struct kref *ref)
{
struct eventfs_inode *ei = container_of(ref, struct eventfs_inode, kref);
const struct eventfs_entry *entry;
- struct eventfs_root_inode *rei;
WARN_ON_ONCE(!ei->is_freed);
@@ -95,14 +109,7 @@ static void release_ei(struct kref *ref)
entry->release(entry->name, ei->data);
}
- kfree(ei->entry_attrs);
- kfree_const(ei->name);
- if (ei->is_events) {
- rei = get_root_inode(ei);
- kfree_rcu(rei, ei.rcu);
- } else {
- kfree_rcu(ei, rcu);
- }
+ call_rcu(&ei->rcu, free_ei_rcu);
}
static inline void put_ei(struct eventfs_inode *ei)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ee59be35d7a8be7fcaa2d61fb89734ab5c25e4ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051340-destitute-overlaid-3681@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
ee59be35d7a8 ("misc/pvpanic-pci: register attributes via pci_driver")
c1426d392aeb ("misc/pvpanic: deduplicate common code")
84b0f12a953c ("pvpanic: Indentation fixes here and there")
cc5b392d0f94 ("pvpanic: Fix typos in the comments")
33a430419456 ("pvpanic: Keep single style across modules")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee59be35d7a8be7fcaa2d61fb89734ab5c25e4ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= <linux(a)weissschuh.net>
Date: Thu, 11 Apr 2024 23:33:51 +0200
Subject: [PATCH] misc/pvpanic-pci: register attributes via pci_driver
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In __pci_register_driver(), the pci core overwrites the dev_groups field of
the embedded struct device_driver with the dev_groups from the outer
struct pci_driver unconditionally.
Set dev_groups in the pci_driver to make sure it is used.
This was broken since the introduction of pvpanic-pci.
Fixes: db3a4f0abefd ("misc/pvpanic: add PCI driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Fixes: ded13b9cfd59 ("PCI: Add support for dev_groups to struct pci_driver")
Link: https://lore.kernel.org/r/20240411-pvpanic-pci-dev-groups-v1-1-db8cb69f1b09…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/pvpanic/pvpanic-pci.c b/drivers/misc/pvpanic/pvpanic-pci.c
index 9ad20e82785b..b21598a18f6d 100644
--- a/drivers/misc/pvpanic/pvpanic-pci.c
+++ b/drivers/misc/pvpanic/pvpanic-pci.c
@@ -44,8 +44,6 @@ static struct pci_driver pvpanic_pci_driver = {
.name = "pvpanic-pci",
.id_table = pvpanic_pci_id_tbl,
.probe = pvpanic_pci_probe,
- .driver = {
- .dev_groups = pvpanic_dev_groups,
- },
+ .dev_groups = pvpanic_dev_groups,
};
module_pci_driver(pvpanic_pci_driver);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ee59be35d7a8be7fcaa2d61fb89734ab5c25e4ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051340-puma-unshaven-405c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
ee59be35d7a8 ("misc/pvpanic-pci: register attributes via pci_driver")
c1426d392aeb ("misc/pvpanic: deduplicate common code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee59be35d7a8be7fcaa2d61fb89734ab5c25e4ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= <linux(a)weissschuh.net>
Date: Thu, 11 Apr 2024 23:33:51 +0200
Subject: [PATCH] misc/pvpanic-pci: register attributes via pci_driver
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In __pci_register_driver(), the pci core overwrites the dev_groups field of
the embedded struct device_driver with the dev_groups from the outer
struct pci_driver unconditionally.
Set dev_groups in the pci_driver to make sure it is used.
This was broken since the introduction of pvpanic-pci.
Fixes: db3a4f0abefd ("misc/pvpanic: add PCI driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Fixes: ded13b9cfd59 ("PCI: Add support for dev_groups to struct pci_driver")
Link: https://lore.kernel.org/r/20240411-pvpanic-pci-dev-groups-v1-1-db8cb69f1b09…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/pvpanic/pvpanic-pci.c b/drivers/misc/pvpanic/pvpanic-pci.c
index 9ad20e82785b..b21598a18f6d 100644
--- a/drivers/misc/pvpanic/pvpanic-pci.c
+++ b/drivers/misc/pvpanic/pvpanic-pci.c
@@ -44,8 +44,6 @@ static struct pci_driver pvpanic_pci_driver = {
.name = "pvpanic-pci",
.id_table = pvpanic_pci_id_tbl,
.probe = pvpanic_pci_probe,
- .driver = {
- .dev_groups = pvpanic_dev_groups,
- },
+ .dev_groups = pvpanic_dev_groups,
};
module_pci_driver(pvpanic_pci_driver);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x ee59be35d7a8be7fcaa2d61fb89734ab5c25e4ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051339-strut-alienable-25c7@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
ee59be35d7a8 ("misc/pvpanic-pci: register attributes via pci_driver")
c1426d392aeb ("misc/pvpanic: deduplicate common code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ee59be35d7a8be7fcaa2d61fb89734ab5c25e4ee Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Thomas=20Wei=C3=9Fschuh?= <linux(a)weissschuh.net>
Date: Thu, 11 Apr 2024 23:33:51 +0200
Subject: [PATCH] misc/pvpanic-pci: register attributes via pci_driver
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In __pci_register_driver(), the pci core overwrites the dev_groups field of
the embedded struct device_driver with the dev_groups from the outer
struct pci_driver unconditionally.
Set dev_groups in the pci_driver to make sure it is used.
This was broken since the introduction of pvpanic-pci.
Fixes: db3a4f0abefd ("misc/pvpanic: add PCI driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Weißschuh <linux(a)weissschuh.net>
Fixes: ded13b9cfd59 ("PCI: Add support for dev_groups to struct pci_driver")
Link: https://lore.kernel.org/r/20240411-pvpanic-pci-dev-groups-v1-1-db8cb69f1b09…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/misc/pvpanic/pvpanic-pci.c b/drivers/misc/pvpanic/pvpanic-pci.c
index 9ad20e82785b..b21598a18f6d 100644
--- a/drivers/misc/pvpanic/pvpanic-pci.c
+++ b/drivers/misc/pvpanic/pvpanic-pci.c
@@ -44,8 +44,6 @@ static struct pci_driver pvpanic_pci_driver = {
.name = "pvpanic-pci",
.id_table = pvpanic_pci_id_tbl,
.probe = pvpanic_pci_probe,
- .driver = {
- .dev_groups = pvpanic_dev_groups,
- },
+ .dev_groups = pvpanic_dev_groups,
};
module_pci_driver(pvpanic_pci_driver);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 819fe8c96a5172dfd960e5945e8f00f8fed32953
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051326-filter-stoplight-c8cc@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
819fe8c96a51 ("arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration")
028fe09cda0a ("arm64: dts: qcom: sm8150: align TLMM pin configuration with DT schema")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 819fe8c96a5172dfd960e5945e8f00f8fed32953 Mon Sep 17 00:00:00 2001
From: Volodymyr Babchuk <Volodymyr_Babchuk(a)epam.com>
Date: Fri, 12 Apr 2024 19:03:25 +0000
Subject: [PATCH] arm64: dts: qcom: sa8155p-adp: fix SDHC2 CD pin configuration
There are two issues with SDHC2 configuration for SA8155P-ADP,
which prevent use of SDHC2 and causes issues with ethernet:
- Card Detect pin for SHDC2 on SA8155P-ADP is connected to gpio4 of
PMM8155AU_1, not to SoC itself. SoC's gpio4 is used for DWMAC
TX. If sdhc driver probes after dwmac driver, it reconfigures
gpio4 and this breaks Ethernet MAC.
- pinctrl configuration mentions gpio96 as CD pin. It seems it was
copied from some SM8150 example, because as mentioned above,
correct CD pin is gpio4 on PMM8155AU_1.
This patch fixes both mentioned issues by providing correct pin handle
and pinctrl configuration.
Fixes: 0deb2624e2d0 ("arm64: dts: qcom: sa8155p-adp: Add support for uSD card")
Cc: stable(a)vger.kernel.org
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk(a)epam.com>
Reviewed-by: Stephan Gerhold <stephan(a)gerhold.net>
Link: https://lore.kernel.org/r/20240412190310.1647893-1-volodymyr_babchuk@epam.c…
Signed-off-by: Bjorn Andersson <andersson(a)kernel.org>
diff --git a/arch/arm64/boot/dts/qcom/sa8155p-adp.dts b/arch/arm64/boot/dts/qcom/sa8155p-adp.dts
index 5e4287f8c8cd..b2cf2c988336 100644
--- a/arch/arm64/boot/dts/qcom/sa8155p-adp.dts
+++ b/arch/arm64/boot/dts/qcom/sa8155p-adp.dts
@@ -367,6 +367,16 @@ queue0 {
};
};
+&pmm8155au_1_gpios {
+ pmm8155au_1_sdc2_cd: sdc2-cd-default-state {
+ pins = "gpio4";
+ function = "normal";
+ input-enable;
+ bias-pull-up;
+ power-source = <0>;
+ };
+};
+
&qupv3_id_1 {
status = "okay";
};
@@ -384,10 +394,10 @@ &remoteproc_cdsp {
&sdhc_2 {
status = "okay";
- cd-gpios = <&tlmm 4 GPIO_ACTIVE_LOW>;
+ cd-gpios = <&pmm8155au_1_gpios 4 GPIO_ACTIVE_LOW>;
pinctrl-names = "default", "sleep";
- pinctrl-0 = <&sdc2_on>;
- pinctrl-1 = <&sdc2_off>;
+ pinctrl-0 = <&sdc2_on &pmm8155au_1_sdc2_cd>;
+ pinctrl-1 = <&sdc2_off &pmm8155au_1_sdc2_cd>;
vqmmc-supply = <&vreg_l13c_2p96>; /* IO line power */
vmmc-supply = <&vreg_l17a_2p96>; /* Card power line */
bus-width = <4>;
@@ -505,13 +515,6 @@ data-pins {
bias-pull-up; /* pull up */
drive-strength = <16>; /* 16 MA */
};
-
- sd-cd-pins {
- pins = "gpio96";
- function = "gpio";
- bias-pull-up; /* pull up */
- drive-strength = <2>; /* 2 MA */
- };
};
sdc2_off: sdc2-off-state {
@@ -532,13 +535,6 @@ data-pins {
bias-pull-up; /* pull up */
drive-strength = <2>; /* 2 MA */
};
-
- sd-cd-pins {
- pins = "gpio96";
- function = "gpio";
- bias-pull-up; /* pull up */
- drive-strength = <2>; /* 2 MA */
- };
};
usb2phy_ac_en1_default: usb2phy-ac-en1-default-state {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 719564737a9ac3d0b49c314450b56cf6f7d71358
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051350-decent-trout-e701@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
719564737a9a ("drm/amd/display: Handle Y carry-over in VCP X.Y calculation")
3bc8d9214679 ("drm/amd/display: Add DP 2.0 HPO Link Encoder")
83228ebb82e4 ("drm/amd/display: Add DP 2.0 HPO Stream Encoder")
61452908a79e ("drm/amd/display: Add DP 2.0 Audio Package Generator")
926d6972efb6 ("drm/amd/display: Add DCN3.1 blocks to the DC Makefile")
2083640f0d5b ("drm/amd/display: Add DCN3.1 Resource")
cbaf919f3313 ("drm/amd/display: Add DCN3.1 DIO")
cd6d421e3d1a ("drm/amd/display: Initial DC support for Beige Goby")
2ff3cf823882 ("drm/amd/display: Fix hangs with psr enabled on dcn3.xx")
8cf9575d7079 ("drm/amd/display: Fix DSC enable sequence")
66611a721b59 ("drm/amd/display: Add debug flag to enable eDP ILR by default")
42b599732ee1 ("drm/amdgpu/display: fix memory leak for dimgrey cavefish")
e1f4328f22c0 ("drm/amd/display: Update link encoder object creation")
4f8e37dbaf58 ("drm/amd/display: Support for DMUB AUX")
1e3489136968 ("drm/amd/display: 3.2.124")
77a2b7265f20 ("drm/amd/display: Synchronize displays with different timings")
97628eb5ac20 ("drm/amd/display: 3.2.123")
ef4dd6b2757e ("drm/amd/display: 3.2.122")
6fce5bcee582 ("drm/amd/display: move edp sink present detection to hw init")
f1e17351984c ("drm/amd/display: 3.2.121")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 719564737a9ac3d0b49c314450b56cf6f7d71358 Mon Sep 17 00:00:00 2001
From: George Shen <george.shen(a)amd.com>
Date: Thu, 16 Sep 2021 19:55:39 -0400
Subject: [PATCH] drm/amd/display: Handle Y carry-over in VCP X.Y calculation
Theoretically rare corner case where ceil(Y) results in rounding up to
an integer. If this happens, the 1 should be carried over to the X
value.
CC: stable(a)vger.kernel.org
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: George Shen <george.shen(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
index 5b7ad38f85e0..65e45a0b4ff3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
@@ -395,6 +395,12 @@ void dcn31_hpo_dp_link_enc_set_throttled_vcp_size(
x),
25));
+ // If y rounds up to integer, carry it over to x.
+ if (y >> 25) {
+ x += 1;
+ y = 0;
+ }
+
switch (stream_encoder_inst) {
case 0:
REG_SET_2(DP_DPHY_SYM32_VC_RATE_CNTL0, 0,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 719564737a9ac3d0b49c314450b56cf6f7d71358
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051349-suitcase-strive-29c2@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
719564737a9a ("drm/amd/display: Handle Y carry-over in VCP X.Y calculation")
3bc8d9214679 ("drm/amd/display: Add DP 2.0 HPO Link Encoder")
83228ebb82e4 ("drm/amd/display: Add DP 2.0 HPO Stream Encoder")
61452908a79e ("drm/amd/display: Add DP 2.0 Audio Package Generator")
926d6972efb6 ("drm/amd/display: Add DCN3.1 blocks to the DC Makefile")
2083640f0d5b ("drm/amd/display: Add DCN3.1 Resource")
cbaf919f3313 ("drm/amd/display: Add DCN3.1 DIO")
cd6d421e3d1a ("drm/amd/display: Initial DC support for Beige Goby")
2ff3cf823882 ("drm/amd/display: Fix hangs with psr enabled on dcn3.xx")
8cf9575d7079 ("drm/amd/display: Fix DSC enable sequence")
66611a721b59 ("drm/amd/display: Add debug flag to enable eDP ILR by default")
42b599732ee1 ("drm/amdgpu/display: fix memory leak for dimgrey cavefish")
e1f4328f22c0 ("drm/amd/display: Update link encoder object creation")
4f8e37dbaf58 ("drm/amd/display: Support for DMUB AUX")
1e3489136968 ("drm/amd/display: 3.2.124")
77a2b7265f20 ("drm/amd/display: Synchronize displays with different timings")
97628eb5ac20 ("drm/amd/display: 3.2.123")
ef4dd6b2757e ("drm/amd/display: 3.2.122")
6fce5bcee582 ("drm/amd/display: move edp sink present detection to hw init")
f1e17351984c ("drm/amd/display: 3.2.121")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 719564737a9ac3d0b49c314450b56cf6f7d71358 Mon Sep 17 00:00:00 2001
From: George Shen <george.shen(a)amd.com>
Date: Thu, 16 Sep 2021 19:55:39 -0400
Subject: [PATCH] drm/amd/display: Handle Y carry-over in VCP X.Y calculation
Theoretically rare corner case where ceil(Y) results in rounding up to
an integer. If this happens, the 1 should be carried over to the X
value.
CC: stable(a)vger.kernel.org
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: George Shen <george.shen(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
index 5b7ad38f85e0..65e45a0b4ff3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
@@ -395,6 +395,12 @@ void dcn31_hpo_dp_link_enc_set_throttled_vcp_size(
x),
25));
+ // If y rounds up to integer, carry it over to x.
+ if (y >> 25) {
+ x += 1;
+ y = 0;
+ }
+
switch (stream_encoder_inst) {
case 0:
REG_SET_2(DP_DPHY_SYM32_VC_RATE_CNTL0, 0,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 719564737a9ac3d0b49c314450b56cf6f7d71358
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051348-unclad-penpal-8f46@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
719564737a9a ("drm/amd/display: Handle Y carry-over in VCP X.Y calculation")
3bc8d9214679 ("drm/amd/display: Add DP 2.0 HPO Link Encoder")
83228ebb82e4 ("drm/amd/display: Add DP 2.0 HPO Stream Encoder")
61452908a79e ("drm/amd/display: Add DP 2.0 Audio Package Generator")
926d6972efb6 ("drm/amd/display: Add DCN3.1 blocks to the DC Makefile")
2083640f0d5b ("drm/amd/display: Add DCN3.1 Resource")
cbaf919f3313 ("drm/amd/display: Add DCN3.1 DIO")
cd6d421e3d1a ("drm/amd/display: Initial DC support for Beige Goby")
2ff3cf823882 ("drm/amd/display: Fix hangs with psr enabled on dcn3.xx")
8cf9575d7079 ("drm/amd/display: Fix DSC enable sequence")
66611a721b59 ("drm/amd/display: Add debug flag to enable eDP ILR by default")
42b599732ee1 ("drm/amdgpu/display: fix memory leak for dimgrey cavefish")
e1f4328f22c0 ("drm/amd/display: Update link encoder object creation")
4f8e37dbaf58 ("drm/amd/display: Support for DMUB AUX")
1e3489136968 ("drm/amd/display: 3.2.124")
77a2b7265f20 ("drm/amd/display: Synchronize displays with different timings")
97628eb5ac20 ("drm/amd/display: 3.2.123")
ef4dd6b2757e ("drm/amd/display: 3.2.122")
6fce5bcee582 ("drm/amd/display: move edp sink present detection to hw init")
f1e17351984c ("drm/amd/display: 3.2.121")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 719564737a9ac3d0b49c314450b56cf6f7d71358 Mon Sep 17 00:00:00 2001
From: George Shen <george.shen(a)amd.com>
Date: Thu, 16 Sep 2021 19:55:39 -0400
Subject: [PATCH] drm/amd/display: Handle Y carry-over in VCP X.Y calculation
Theoretically rare corner case where ceil(Y) results in rounding up to
an integer. If this happens, the 1 should be carried over to the X
value.
CC: stable(a)vger.kernel.org
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: George Shen <george.shen(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
index 5b7ad38f85e0..65e45a0b4ff3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
@@ -395,6 +395,12 @@ void dcn31_hpo_dp_link_enc_set_throttled_vcp_size(
x),
25));
+ // If y rounds up to integer, carry it over to x.
+ if (y >> 25) {
+ x += 1;
+ y = 0;
+ }
+
switch (stream_encoder_inst) {
case 0:
REG_SET_2(DP_DPHY_SYM32_VC_RATE_CNTL0, 0,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 719564737a9ac3d0b49c314450b56cf6f7d71358
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051348-consensus-polygon-1044@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
719564737a9a ("drm/amd/display: Handle Y carry-over in VCP X.Y calculation")
3bc8d9214679 ("drm/amd/display: Add DP 2.0 HPO Link Encoder")
83228ebb82e4 ("drm/amd/display: Add DP 2.0 HPO Stream Encoder")
61452908a79e ("drm/amd/display: Add DP 2.0 Audio Package Generator")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 719564737a9ac3d0b49c314450b56cf6f7d71358 Mon Sep 17 00:00:00 2001
From: George Shen <george.shen(a)amd.com>
Date: Thu, 16 Sep 2021 19:55:39 -0400
Subject: [PATCH] drm/amd/display: Handle Y carry-over in VCP X.Y calculation
Theoretically rare corner case where ceil(Y) results in rounding up to
an integer. If this happens, the 1 should be carried over to the X
value.
CC: stable(a)vger.kernel.org
Reviewed-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Signed-off-by: George Shen <george.shen(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
index 5b7ad38f85e0..65e45a0b4ff3 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
@@ -395,6 +395,12 @@ void dcn31_hpo_dp_link_enc_set_throttled_vcp_size(
x),
25));
+ // If y rounds up to integer, carry it over to x.
+ if (y >> 25) {
+ x += 1;
+ y = 0;
+ }
+
switch (stream_encoder_inst) {
case 0:
REG_SET_2(DP_DPHY_SYM32_VC_RATE_CNTL0, 0,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051310-outcast-feisty-83a7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
be4a2a81b6b9 ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 55aa74cbc532..1e6cc0bfc432 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1139,7 +1139,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2307,7 +2307,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3349,6 +3349,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051310-stoppable-backyard-6cff@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
be4a2a81b6b9 ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 55aa74cbc532..1e6cc0bfc432 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1139,7 +1139,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2307,7 +2307,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3349,6 +3349,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051309-blanching-list-a7ab@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
be4a2a81b6b9 ("drm/amdkfd: don't allow mapping the MMIO HDP page with large pages")
b38c074b2b07 ("drm/amdkfd: CRIU Refactor restore BO function")
67a359d85ec2 ("drm/amdkfd: CRIU remove sync and TLB flush on restore")
22804e03f7a5 ("drm/amdkfd: Fix criu_restore_bo error handling")
d8a25e485857 ("drm/amdkfd: fix loop error handling")
e5af61ffaaef ("drm/amdkfd: CRIU fix a NULL vs IS_ERR() check")
be072b06c739 ("drm/amdkfd: CRIU export BOs as prime dmabuf objects")
bef153b70c6e ("drm/amdkfd: CRIU implement gpu_id remapping")
40e8a766a761 ("drm/amdkfd: CRIU checkpoint and restore events")
42c6c48214b7 ("drm/amdkfd: CRIU checkpoint and restore queue mqds")
2485c12c980a ("drm/amdkfd: CRIU restore sdma id for queues")
8668dfc30d3e ("drm/amdkfd: CRIU restore queue ids")
626f7b3190b4 ("drm/amdkfd: CRIU add queues support")
cd9f79103003 ("drm/amdkfd: CRIU Implement KFD unpause operation")
011bbb03024f ("drm/amdkfd: CRIU Implement KFD resume ioctl")
73fa13b6a511 ("drm/amdkfd: CRIU Implement KFD restore ioctl")
5ccbb057c0a1 ("drm/amdkfd: CRIU Implement KFD checkpoint ioctl")
f185381b6481 ("drm/amdkfd: CRIU Implement KFD process_info ioctl")
3698807094ec ("drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs")
f61c40c0757a ("drm/amdkfd: enable heavy-weight TLB flush on Arcturus")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be4a2a81b6b90d1a47eaeaace4cc8e2cb57b96c7 Mon Sep 17 00:00:00 2001
From: Alex Deucher <alexander.deucher(a)amd.com>
Date: Sun, 14 Apr 2024 13:06:39 -0400
Subject: [PATCH] drm/amdkfd: don't allow mapping the MMIO HDP page with large
pages
We don't get the right offset in that case. The GPU has
an unused 4K area of the register BAR space into which you can
remap registers. We remap the HDP flush registers into this
space to allow userspace (CPU or GPU) to flush the HDP when it
updates VRAM. However, on systems with >4K pages, we end up
exposing PAGE_SIZE of MMIO space.
Fixes: d8e408a82704 ("drm/amdkfd: Expose HDP registers to user space")
Reviewed-by: Felix Kuehling <felix.kuehling(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 55aa74cbc532..1e6cc0bfc432 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -1139,7 +1139,7 @@ static int kfd_ioctl_alloc_memory_of_gpu(struct file *filep,
goto err_unlock;
}
offset = dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
err = -ENOMEM;
goto err_unlock;
}
@@ -2307,7 +2307,7 @@ static int criu_restore_memory_of_gpu(struct kfd_process_device *pdd,
return -EINVAL;
}
offset = pdd->dev->adev->rmmio_remap.bus_addr;
- if (!offset) {
+ if (!offset || (PAGE_SIZE > 4096)) {
pr_err("amdgpu_amdkfd_get_mmio_remap_phys_addr failed\n");
return -ENOMEM;
}
@@ -3349,6 +3349,9 @@ static int kfd_mmio_mmap(struct kfd_node *dev, struct kfd_process *process,
if (vma->vm_end - vma->vm_start != PAGE_SIZE)
return -EINVAL;
+ if (PAGE_SIZE > 4096)
+ return -EINVAL;
+
address = dev->adev->rmmio_remap.bus_addr;
vm_flags_set(vma, VM_IO | VM_DONTCOPY | VM_DONTEXPAND | VM_NORESERVE |
On 5/6/24 2:30 PM, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> spi: axi-spi-engine: fix version format string
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> spi-axi-spi-engine-fix-version-format-string.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
Does not meet the criteria for stable.
(only fixes theoretical problem, not actual documented problem)
On 5/6/24 2:30 PM, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> spi: axi-spi-engine: Convert to platform remove callback returning void
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> spi-axi-spi-engine-convert-to-platform-remove-callba.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
Does not meet the criteria for stable.
This commit landed in Linux v5.16, and should have been back-ported
to stable branches:
0ac10b291bee8 arm64: dts: qcom: Fix 'interrupt-map' parent address cells
It has three hunks, and the commit can be cherry-picked directly
into the 5.15.y and 5.10.y stable branches. The first hunk fixes
a problem first introduced in Linux v5.2, so that hunk (only) should
be back-ported to v5.4.y.
Signed-off-by: Alex Elder <elder(a)linaro.org>
Is there anything further I need to do? If you'd prefer I send
patches, I can do that also (just ask). Thanks.
-Alex
Rob's original fix in Linus' tree:
0ac10b291bee8 arm64: dts: qcom: Fix 'interrupt-map' parent address cells
This first landed in v5.16.
The commit that introduced the problem in the first hunk:
b84dfd175c098 arm64: dts: qcom: msm8998: Add PCIe PHY and RC nodes
This first landed in v5.2, so back-port to v5.4.y, v5.10.y, v5.15.y.
The commit that introduced the problem in the second hunk:
5c538e09cb19b arm64: dts: qcom: sdm845: Add first PCIe controller and PHY
This first landed in v5.7, so back-port to v5.10.y, v5.15.y
The commit that introduced the problem in the third hunk:
42ad231338c14 arm64: dts: qcom: sdm845: Add second PCIe PHY and
controller
This first landed in v5.7 also
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x d18ca8635db2f88c17acbdf6412f26d4f6aff414
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051308-sullen-plug-a6fb@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
d18ca8635db2 ("ASoC: ti: davinci-mcasp: Fix race condition during probe")
1b4fb70e5b28 ("ASoC: ti: davinci-mcasp: Handle missing required DT properties")
1125d925990b ("ASoC: ti: davinci-mcasp: Simplify the configuration parameter handling")
db8793a39b29 ("ASoC: ti: davinci-mcasp: Remove legacy dma_request parsing")
372c4bd11de1 ("ASoC: ti: davinci-mcasp: Use platform_get_irq_byname_optional")
19f6e424d615 ("ASoC: ti: davinci-mcasp: remove always zero of davinci_mcasp_get_dt_params")
f4d95de415b2 ("ASoC: ti: davinci-mcasp: remove redundant assignment to variable ret")
764958f2b523 ("ASoC: ti: davinci-mcasp: Support for auxclk-fs-ratio")
540f1ba7b3a5 ("ASoC: ti: davinci-mcasp: Add support for GPIO mode of the pins")
617547175507 ("ASoC: ti: davinci-mcasp: Move context save/restore to runtime_pm callbacks")
f2055e145f29 ("ASoC: ti: Merge davinci and omap directories")
bc1845498531 ("ASoC: davinci-mcasp: Implement configurable dismod handling")
ca3d9433349e ("ASoC: davinci-mcasp: Update PDIR (pin direction) register handling")
9c34d023dc35 ("ASoC: omap-mcbsp: Re-arrange files for core McBSP and Sidetone function split")
be51c576e849 ("ASoC: omap-mcbsp: Move out the FIFO check from set_threshold and get_delay")
59d177f65f50 ("ASoC: omap-mcbsp: Simplify the mcbsp_start/_stop function parameters")
d63a7625a6df ("ASoC: omap-mcbsp: Clean up the interrupt handlers")
c9ece9c29e26 ("ASoC: omap-mcbsp: Skip dma_data.maxburst initialization")
dd443a7c0b00 ("ASoC: omap-mcbsp: Clean up dma_data addr initialization code")
2c2596f3ab25 ("ASoC: omap: Remove unused machine driver for AM3517-evm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d18ca8635db2f88c17acbdf6412f26d4f6aff414 Mon Sep 17 00:00:00 2001
From: Joao Paulo Goncalves <joao.goncalves(a)toradex.com>
Date: Wed, 17 Apr 2024 15:41:38 -0300
Subject: [PATCH] ASoC: ti: davinci-mcasp: Fix race condition during probe
When using davinci-mcasp as CPU DAI with simple-card, there are some
conditions that cause simple-card to finish registering a sound card before
davinci-mcasp finishes registering all sound components. This creates a
non-working sound card from userspace with no problem indication apart
from not being able to play/record audio on a PCM stream. The issue
arises during simultaneous probe execution of both drivers. Specifically,
the simple-card driver, awaiting a CPU DAI, proceeds as soon as
davinci-mcasp registers its DAI. However, this process can lead to the
client mutex lock (client_mutex in soc-core.c) being held or davinci-mcasp
being preempted before PCM DMA registration on davinci-mcasp finishes.
This situation occurs when the probes of both drivers run concurrently.
Below is the code path for this condition. To solve the issue, defer
davinci-mcasp CPU DAI registration to the last step in the audio part of
it. This way, simple-card CPU DAI parsing will be deferred until all
audio components are registered.
Fail Code Path:
simple-card.c: probe starts
simple-card.c: simple_dai_link_of: simple_parse_node(..,cpu,..) returns EPROBE_DEFER, no CPU DAI yet
davinci-mcasp.c: probe starts
davinci-mcasp.c: devm_snd_soc_register_component() register CPU DAI
simple-card.c: probes again, finish CPU DAI parsing and call devm_snd_soc_register_card()
simple-card.c: finish probe
davinci-mcasp.c: *dma_pcm_platform_register() register PCM DMA
davinci-mcasp.c: probe finish
Cc: stable(a)vger.kernel.org
Fixes: 9fbd58cf4ab0 ("ASoC: davinci-mcasp: Choose PCM driver based on configured DMA controller")
Signed-off-by: Joao Paulo Goncalves <joao.goncalves(a)toradex.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi(a)gmail.com>
Reviewed-by: Jai Luthra <j-luthra(a)ti.com>
Link: https://lore.kernel.org/r/20240417184138.1104774-1-jpaulo.silvagoncalves@gm…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/ti/davinci-mcasp.c b/sound/soc/ti/davinci-mcasp.c
index b892d66f7847..1e760c315521 100644
--- a/sound/soc/ti/davinci-mcasp.c
+++ b/sound/soc/ti/davinci-mcasp.c
@@ -2417,12 +2417,6 @@ static int davinci_mcasp_probe(struct platform_device *pdev)
mcasp_reparent_fck(pdev);
- ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component,
- &davinci_mcasp_dai[mcasp->op_mode], 1);
-
- if (ret != 0)
- goto err;
-
ret = davinci_mcasp_get_dma_type(mcasp);
switch (ret) {
case PCM_EDMA:
@@ -2449,6 +2443,12 @@ static int davinci_mcasp_probe(struct platform_device *pdev)
goto err;
}
+ ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component,
+ &davinci_mcasp_dai[mcasp->op_mode], 1);
+
+ if (ret != 0)
+ goto err;
+
no_audio:
ret = davinci_mcasp_init_gpiochip(mcasp);
if (ret) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d18ca8635db2f88c17acbdf6412f26d4f6aff414
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051302-deluge-renovate-d904@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d18ca8635db2 ("ASoC: ti: davinci-mcasp: Fix race condition during probe")
1b4fb70e5b28 ("ASoC: ti: davinci-mcasp: Handle missing required DT properties")
1125d925990b ("ASoC: ti: davinci-mcasp: Simplify the configuration parameter handling")
db8793a39b29 ("ASoC: ti: davinci-mcasp: Remove legacy dma_request parsing")
372c4bd11de1 ("ASoC: ti: davinci-mcasp: Use platform_get_irq_byname_optional")
19f6e424d615 ("ASoC: ti: davinci-mcasp: remove always zero of davinci_mcasp_get_dt_params")
f4d95de415b2 ("ASoC: ti: davinci-mcasp: remove redundant assignment to variable ret")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d18ca8635db2f88c17acbdf6412f26d4f6aff414 Mon Sep 17 00:00:00 2001
From: Joao Paulo Goncalves <joao.goncalves(a)toradex.com>
Date: Wed, 17 Apr 2024 15:41:38 -0300
Subject: [PATCH] ASoC: ti: davinci-mcasp: Fix race condition during probe
When using davinci-mcasp as CPU DAI with simple-card, there are some
conditions that cause simple-card to finish registering a sound card before
davinci-mcasp finishes registering all sound components. This creates a
non-working sound card from userspace with no problem indication apart
from not being able to play/record audio on a PCM stream. The issue
arises during simultaneous probe execution of both drivers. Specifically,
the simple-card driver, awaiting a CPU DAI, proceeds as soon as
davinci-mcasp registers its DAI. However, this process can lead to the
client mutex lock (client_mutex in soc-core.c) being held or davinci-mcasp
being preempted before PCM DMA registration on davinci-mcasp finishes.
This situation occurs when the probes of both drivers run concurrently.
Below is the code path for this condition. To solve the issue, defer
davinci-mcasp CPU DAI registration to the last step in the audio part of
it. This way, simple-card CPU DAI parsing will be deferred until all
audio components are registered.
Fail Code Path:
simple-card.c: probe starts
simple-card.c: simple_dai_link_of: simple_parse_node(..,cpu,..) returns EPROBE_DEFER, no CPU DAI yet
davinci-mcasp.c: probe starts
davinci-mcasp.c: devm_snd_soc_register_component() register CPU DAI
simple-card.c: probes again, finish CPU DAI parsing and call devm_snd_soc_register_card()
simple-card.c: finish probe
davinci-mcasp.c: *dma_pcm_platform_register() register PCM DMA
davinci-mcasp.c: probe finish
Cc: stable(a)vger.kernel.org
Fixes: 9fbd58cf4ab0 ("ASoC: davinci-mcasp: Choose PCM driver based on configured DMA controller")
Signed-off-by: Joao Paulo Goncalves <joao.goncalves(a)toradex.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi(a)gmail.com>
Reviewed-by: Jai Luthra <j-luthra(a)ti.com>
Link: https://lore.kernel.org/r/20240417184138.1104774-1-jpaulo.silvagoncalves@gm…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/ti/davinci-mcasp.c b/sound/soc/ti/davinci-mcasp.c
index b892d66f7847..1e760c315521 100644
--- a/sound/soc/ti/davinci-mcasp.c
+++ b/sound/soc/ti/davinci-mcasp.c
@@ -2417,12 +2417,6 @@ static int davinci_mcasp_probe(struct platform_device *pdev)
mcasp_reparent_fck(pdev);
- ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component,
- &davinci_mcasp_dai[mcasp->op_mode], 1);
-
- if (ret != 0)
- goto err;
-
ret = davinci_mcasp_get_dma_type(mcasp);
switch (ret) {
case PCM_EDMA:
@@ -2449,6 +2443,12 @@ static int davinci_mcasp_probe(struct platform_device *pdev)
goto err;
}
+ ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component,
+ &davinci_mcasp_dai[mcasp->op_mode], 1);
+
+ if (ret != 0)
+ goto err;
+
no_audio:
ret = davinci_mcasp_init_gpiochip(mcasp);
if (ret) {
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d18ca8635db2f88c17acbdf6412f26d4f6aff414
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051357-errand-chariot-abc6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d18ca8635db2 ("ASoC: ti: davinci-mcasp: Fix race condition during probe")
1b4fb70e5b28 ("ASoC: ti: davinci-mcasp: Handle missing required DT properties")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d18ca8635db2f88c17acbdf6412f26d4f6aff414 Mon Sep 17 00:00:00 2001
From: Joao Paulo Goncalves <joao.goncalves(a)toradex.com>
Date: Wed, 17 Apr 2024 15:41:38 -0300
Subject: [PATCH] ASoC: ti: davinci-mcasp: Fix race condition during probe
When using davinci-mcasp as CPU DAI with simple-card, there are some
conditions that cause simple-card to finish registering a sound card before
davinci-mcasp finishes registering all sound components. This creates a
non-working sound card from userspace with no problem indication apart
from not being able to play/record audio on a PCM stream. The issue
arises during simultaneous probe execution of both drivers. Specifically,
the simple-card driver, awaiting a CPU DAI, proceeds as soon as
davinci-mcasp registers its DAI. However, this process can lead to the
client mutex lock (client_mutex in soc-core.c) being held or davinci-mcasp
being preempted before PCM DMA registration on davinci-mcasp finishes.
This situation occurs when the probes of both drivers run concurrently.
Below is the code path for this condition. To solve the issue, defer
davinci-mcasp CPU DAI registration to the last step in the audio part of
it. This way, simple-card CPU DAI parsing will be deferred until all
audio components are registered.
Fail Code Path:
simple-card.c: probe starts
simple-card.c: simple_dai_link_of: simple_parse_node(..,cpu,..) returns EPROBE_DEFER, no CPU DAI yet
davinci-mcasp.c: probe starts
davinci-mcasp.c: devm_snd_soc_register_component() register CPU DAI
simple-card.c: probes again, finish CPU DAI parsing and call devm_snd_soc_register_card()
simple-card.c: finish probe
davinci-mcasp.c: *dma_pcm_platform_register() register PCM DMA
davinci-mcasp.c: probe finish
Cc: stable(a)vger.kernel.org
Fixes: 9fbd58cf4ab0 ("ASoC: davinci-mcasp: Choose PCM driver based on configured DMA controller")
Signed-off-by: Joao Paulo Goncalves <joao.goncalves(a)toradex.com>
Acked-by: Peter Ujfalusi <peter.ujfalusi(a)gmail.com>
Reviewed-by: Jai Luthra <j-luthra(a)ti.com>
Link: https://lore.kernel.org/r/20240417184138.1104774-1-jpaulo.silvagoncalves@gm…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/ti/davinci-mcasp.c b/sound/soc/ti/davinci-mcasp.c
index b892d66f7847..1e760c315521 100644
--- a/sound/soc/ti/davinci-mcasp.c
+++ b/sound/soc/ti/davinci-mcasp.c
@@ -2417,12 +2417,6 @@ static int davinci_mcasp_probe(struct platform_device *pdev)
mcasp_reparent_fck(pdev);
- ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component,
- &davinci_mcasp_dai[mcasp->op_mode], 1);
-
- if (ret != 0)
- goto err;
-
ret = davinci_mcasp_get_dma_type(mcasp);
switch (ret) {
case PCM_EDMA:
@@ -2449,6 +2443,12 @@ static int davinci_mcasp_probe(struct platform_device *pdev)
goto err;
}
+ ret = devm_snd_soc_register_component(&pdev->dev, &davinci_mcasp_component,
+ &davinci_mcasp_dai[mcasp->op_mode], 1);
+
+ if (ret != 0)
+ goto err;
+
no_audio:
ret = davinci_mcasp_init_gpiochip(mcasp);
if (ret) {
Hello,
We discovered that the CONFIG_INFINIBAND_IRDMA configuration option in
the linux kernel is causing excessive memory usage on idle mode on
specific servers like the DELL VEP4600
(https://www.dell.com/en-us/shop/ipovw/virtual-edge-platform-4600.
By default we were using Debian's linux-image-6.1.0-13-amd64 which is
the stable 6.1.55-1 amd64, we then compiled the kernel again with the
same config file from the stable 6.1.55 tag and had the same problem. We
were able to resolve the memory problem by removing the
`CONFIG_INFINIBAND_IRDMA` option from the kernel config.
The tag used to reproduce the problem is v6.1.55.
adding the following config `CONFIG_INFINIBAND_IRDMA=m` causes the
excessive memory usage to go from 1.4Gb to 7Gb.
Here are the 2 config files used and the outputs showing the Memory
usage in both cases.
Do you have any ideas regarding this bug, we only had this bug on this
specific hardware model mentioned above.
Debian version of the linux 6.1.55:
#uname -a
Linux wibox 6.1.0-13-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1
(2023-09-29) x86_64 GNU/Linux
#free -m
total used free shared buff/cache
available
Mem: 15724 6757 8840 3 410 8967
Swap: 0 0 0
Compiled version without INFINIBAND:
#uname -a
Linux wibox 6.1.55-without-infiniband #75 SMP PREEMPT_DYNAMIC Mon Apr 29
19:06:23 CEST 2024 x86_64 GNU/Linux
#free -m
total used free shared buff/cache
available
Mem: 15724 1480 13339 3 1205 14244
Swap: 0 0 0
Thank you,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 2dbe5f19368caae63b1f59f5bc2af78c7d522b3a
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024051323-tinfoil-decoy-9476@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
2dbe5f19368c ("net: bcmgenet: synchronize use of bcmgenet_set_rx_mode()")
1a1d5106c1e3 ("net: bcmgenet: move clk_wol management to bcmgenet_wol")
6f7689057a0f ("net: bcmgenet: Fix WoL with password after deep sleep")
72f96347628e ("net: bcmgenet: set Rx mode before starting netif")
99d55638d4b0 ("net: bcmgenet: enable NETIF_F_HIGHDMA flag")
d7ee287827ef ("Merge tag 'blk-dim-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2dbe5f19368caae63b1f59f5bc2af78c7d522b3a Mon Sep 17 00:00:00 2001
From: Doug Berger <opendmb(a)gmail.com>
Date: Thu, 25 Apr 2024 15:27:20 -0700
Subject: [PATCH] net: bcmgenet: synchronize use of bcmgenet_set_rx_mode()
The ndo_set_rx_mode function is synchronized with the
netif_addr_lock spinlock and BHs disabled. Since this
function is also invoked directly from the driver the
same synchronization should be applied.
Fixes: 72f96347628e ("net: bcmgenet: set Rx mode before starting netif")
Cc: stable(a)vger.kernel.org
Signed-off-by: Doug Berger <opendmb(a)gmail.com>
Acked-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/broadcom/genet/bcmgenet.c b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
index b1f84b37032a..5452b7dc6e6a 100644
--- a/drivers/net/ethernet/broadcom/genet/bcmgenet.c
+++ b/drivers/net/ethernet/broadcom/genet/bcmgenet.c
@@ -2,7 +2,7 @@
/*
* Broadcom GENET (Gigabit Ethernet) controller driver
*
- * Copyright (c) 2014-2020 Broadcom
+ * Copyright (c) 2014-2024 Broadcom
*/
#define pr_fmt(fmt) "bcmgenet: " fmt
@@ -3334,7 +3334,9 @@ static void bcmgenet_netif_start(struct net_device *dev)
struct bcmgenet_priv *priv = netdev_priv(dev);
/* Start the network engine */
+ netif_addr_lock_bh(dev);
bcmgenet_set_rx_mode(dev);
+ netif_addr_unlock_bh(dev);
bcmgenet_enable_rx_napi(priv);
umac_enable_set(priv, CMD_TX_EN | CMD_RX_EN, true);