From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When a remount happens and a gid or uid is specified, update the inodes
to have the same gid and uid. This allows the permissions logic for the
dynamically created files and directories to be simplified.
Cc: stable(a)vger.kernel.org
Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 17 +++++++++++++----
fs/tracefs/inode.c | 15 ++++++++++++---
2 files changed, 25 insertions(+), 7 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 55a40a730b10..5dfb1ccd56ea 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -317,20 +317,29 @@ void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid)
if (!ei)
return;
- if (update_uid)
+ if (update_uid) {
ei->attr.mode &= ~EVENTFS_SAVE_UID;
+ ei->attr.uid = ti->vfs_inode.i_uid;
+ }
+
- if (update_gid)
+ if (update_gid) {
ei->attr.mode &= ~EVENTFS_SAVE_GID;
+ ei->attr.gid = ti->vfs_inode.i_gid;
+ }
if (!ei->entry_attrs)
return;
for (int i = 0; i < ei->nr_entries; i++) {
- if (update_uid)
+ if (update_uid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID;
- if (update_gid)
+ ei->entry_attrs[i].uid = ti->vfs_inode.i_uid;
+ }
+ if (update_gid) {
ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID;
+ ei->entry_attrs[i].gid = ti->vfs_inode.i_gid;
+ }
}
}
diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c
index a827f6a716c4..9252e0d78ea2 100644
--- a/fs/tracefs/inode.c
+++ b/fs/tracefs/inode.c
@@ -373,12 +373,21 @@ static int tracefs_apply_options(struct super_block *sb, bool remount)
rcu_read_lock();
list_for_each_entry_rcu(ti, &tracefs_inodes, list) {
- if (update_uid)
+ if (update_uid) {
ti->flags &= ~TRACEFS_UID_PERM_SET;
+ ti->vfs_inode.i_uid = fsi->uid;
+ }
- if (update_gid)
+ if (update_gid) {
ti->flags &= ~TRACEFS_GID_PERM_SET;
-
+ ti->vfs_inode.i_gid = fsi->gid;
+ }
+
+ /*
+ * Note, the above ti->vfs_inode updates are
+ * used in eventfs_remount() so they must come
+ * before calling it.
+ */
if (ti->flags & TRACEFS_EVENT_INODE)
eventfs_remount(ti, update_uid, update_gid);
}
--
2.43.0
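The ordering constraint noted in the comment of the patch above (update the inode first, then let eventfs_remount() copy from it) can be illustrated with a small userspace model. This is only a sketch: struct model_inode, struct model_event and both functions are hypothetical stand-ins, not kernel code, and locking and the EVENTFS_SAVE_* flag handling are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel structures. */
struct model_inode {
	unsigned int uid;
	unsigned int gid;
};

struct model_event {
	struct model_inode *inode;	/* backing inode */
	unsigned int attr_uid;		/* cached owner for dynamic entries */
	unsigned int attr_gid;
};

/* Copy ownership from the backing inode into the cached attributes. */
static void model_event_remount(struct model_event *ev,
				bool update_uid, bool update_gid)
{
	assert(ev && ev->inode);

	if (update_uid)
		ev->attr_uid = ev->inode->uid;
	if (update_gid)
		ev->attr_gid = ev->inode->gid;
}

/* Apply mount options: update the inode first, then propagate. */
static void model_apply_options(struct model_event *ev,
				unsigned int uid, unsigned int gid,
				bool update_uid, bool update_gid)
{
	if (update_uid)
		ev->inode->uid = uid;
	if (update_gid)
		ev->inode->gid = gid;

	/* Reads the inode values set above, so it must come last. */
	model_event_remount(ev, update_uid, update_gid);
}
```

If the two steps in model_apply_options() were swapped, the cached attributes would pick up the stale owner from before the remount.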
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The directories require unique inode numbers but all the eventfs files
have the same inode number. Prevent the directories from having the same
inode numbers as the files as that can confuse some tooling.
Cc: stable(a)vger.kernel.org
Fixes: 834bf76add3e6 ("eventfs: Save directory inodes in the eventfs_inode structure")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0256afdd4acf..55a40a730b10 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -50,8 +50,12 @@ static struct eventfs_root_inode *get_root_inode(struct eventfs_inode *ei)
/* Just try to make something consistent and unique */
static int eventfs_dir_ino(struct eventfs_inode *ei)
{
- if (!ei->ino)
+ if (!ei->ino) {
ei->ino = get_next_ino();
+ /* Must not have the file inode number */
+ if (ei->ino == EVENTFS_FILE_INODE_INO)
+ ei->ino = get_next_ino();
+ }
return ei->ino;
}
--
2.43.0
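The lazy allocation with a single retry in the patch above can be tried outside the kernel with a small model. This is a sketch under stated assumptions: FILE_INODE_INO is a made-up value, and fake_get_next_ino() is a hypothetical counter standing in for the kernel's get_next_ino().

```c
#include <assert.h>

/* Made-up value standing in for the inode number shared by all files. */
#define FILE_INODE_INO 5u

/* Hypothetical counter standing in for the kernel's get_next_ino(). */
static unsigned int next_ino;

static unsigned int fake_get_next_ino(void)
{
	return ++next_ino;
}

struct dir_entry {
	unsigned int ino;	/* 0 means "not assigned yet" */
};

/* Assign an inode number on first use, skipping the one used by files. */
static unsigned int dir_ino(struct dir_entry *d)
{
	assert(d);

	if (!d->ino) {
		d->ino = fake_get_next_ino();
		/* One retry is enough: the next value is a different number. */
		if (d->ino == FILE_INODE_INO)
			d->ino = fake_get_next_ino();
	}
	return d->ino;
}
```

In this model a directory never reports the reserved number, and repeated calls on the same entry return the number assigned the first time.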
[CCing Mario, who asked for the two suspected commits to be backported]
On 06.05.24 14:24, Gia wrote:
> Hello, from 6.8.7=>6.8.8 I run into a similar problem with my Caldigit
> TS3 Plus Thunderbolt 3 dock.
>
> After the update I see this message on boot "xHCI host controller not
> responding, assume dead" and the dock is not working anymore. Kernel
> 6.8.7 works great.
Thx for the report. Could you make the kernel log (journalctl -k/dmesg)
accessible somewhere?
And have you looked into the other stuff that Mario suggested in the
other thread? See the following mail and the reply to it for details:
https://lore.kernel.org/all/1eb96465-0a81-4187-b8e7-607d85617d5f@gmail.com/…
Ciao, Thorsten
P.S.: To be sure the issue doesn't fall through the cracks unnoticed,
I'm adding it to regzbot, the Linux kernel regression tracking bot:
#regzbot ^introduced v6.8.7..v6.8.8
#regzbot title thunderbolt: TB3 dock problems, xHCI host controller not
responding, assume dead
From: Petr Pavlu <petr.pavlu(a)suse.com>
The reader code in rb_get_reader_page() swaps a new reader page into the
ring buffer by doing cmpxchg on old->list.prev->next to point it to the
new page. Following that, if the operation is successful,
old->list.next->prev gets updated too. This means the underlying
doubly-linked list is temporarily inconsistent: page->prev->next or
page->next->prev might not point back to page for some page in the
ring buffer.
The resize operation in ring_buffer_resize() can be invoked in parallel.
It calls rb_check_pages() which can detect the described inconsistency
and stop further tracing:
[ 190.271762] ------------[ cut here ]------------
[ 190.271771] WARNING: CPU: 1 PID: 6186 at kernel/trace/ring_buffer.c:1467 rb_check_pages.isra.0+0x6a/0xa0
[ 190.271789] Modules linked in: [...]
[ 190.271991] Unloaded tainted modules: intel_uncore_frequency(E):1 skx_edac(E):1
[ 190.272002] CPU: 1 PID: 6186 Comm: cmd.sh Kdump: loaded Tainted: G E 6.9.0-rc6-default #5 158d3e1e6d0b091c34c3b96bfd99a1c58306d79f
[ 190.272011] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552c-rebuilt.opensuse.org 04/01/2014
[ 190.272015] RIP: 0010:rb_check_pages.isra.0+0x6a/0xa0
[ 190.272023] Code: [...]
[ 190.272028] RSP: 0018:ffff9c37463abb70 EFLAGS: 00010206
[ 190.272034] RAX: ffff8eba04b6cb80 RBX: 0000000000000007 RCX: ffff8eba01f13d80
[ 190.272038] RDX: ffff8eba01f130c0 RSI: ffff8eba04b6cd00 RDI: ffff8eba0004c700
[ 190.272042] RBP: ffff8eba0004c700 R08: 0000000000010002 R09: 0000000000000000
[ 190.272045] R10: 00000000ffff7f52 R11: ffff8eba7f600000 R12: ffff8eba0004c720
[ 190.272049] R13: ffff8eba00223a00 R14: 0000000000000008 R15: ffff8eba067a8000
[ 190.272053] FS: 00007f1bd64752c0(0000) GS:ffff8eba7f680000(0000) knlGS:0000000000000000
[ 190.272057] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 190.272061] CR2: 00007f1bd6662590 CR3: 000000010291e001 CR4: 0000000000370ef0
[ 190.272070] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 190.272073] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 190.272077] Call Trace:
[ 190.272098] <TASK>
[ 190.272189] ring_buffer_resize+0x2ab/0x460
[ 190.272199] __tracing_resize_ring_buffer.part.0+0x23/0xa0
[ 190.272206] tracing_resize_ring_buffer+0x65/0x90
[ 190.272216] tracing_entries_write+0x74/0xc0
[ 190.272225] vfs_write+0xf5/0x420
[ 190.272248] ksys_write+0x67/0xe0
[ 190.272256] do_syscall_64+0x82/0x170
[ 190.272363] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 190.272373] RIP: 0033:0x7f1bd657d263
[ 190.272381] Code: [...]
[ 190.272385] RSP: 002b:00007ffe72b643f8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 190.272391] RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f1bd657d263
[ 190.272395] RDX: 0000000000000002 RSI: 0000555a6eb538e0 RDI: 0000000000000001
[ 190.272398] RBP: 0000555a6eb538e0 R08: 000000000000000a R09: 0000000000000000
[ 190.272401] R10: 0000555a6eb55190 R11: 0000000000000246 R12: 00007f1bd6662500
[ 190.272404] R13: 0000000000000002 R14: 00007f1bd6667c00 R15: 0000000000000002
[ 190.272412] </TASK>
[ 190.272414] ---[ end trace 0000000000000000 ]---
Note that ring_buffer_resize() calls rb_check_pages() only if the parent
trace_buffer has recording disabled. Recent commit d78ab792705c
("tracing: Stop current tracer when resizing buffer") makes this always
the case, which makes the issue more likely to occur.
The window to hit this race is nonetheless very small. To help
reproduce it, one can add a delay loop in rb_get_reader_page():

 ret = rb_head_page_replace(reader, cpu_buffer->reader_page);
 if (!ret)
 	goto spin;
 for (unsigned i = 0; i < 1U << 26; i++)  /* inserted delay loop */
 	__asm__ __volatile__ ("" : : : "memory");
 rb_list_head(reader->list.next)->prev = &cpu_buffer->reader_page->list;

... and then run the following commands on the target system:

 echo 1 > /sys/kernel/tracing/events/sched/sched_switch/enable

 while true; do
 	echo 16 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 	echo 8 > /sys/kernel/tracing/buffer_size_kb; sleep 0.1
 done &

 while true; do
 	for i in /sys/kernel/tracing/per_cpu/*; do
 		timeout 0.1 cat $i/trace_pipe; sleep 0.2
 	done
 done

To fix the problem, make sure ring_buffer_resize() doesn't invoke
rb_check_pages() concurrently with a reader operating on the same
ring_buffer_per_cpu by taking its cpu_buffer->reader_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20240517134008.24529-3-petr.pavl…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 659f451ff213 ("ring-buffer: Add integrity check at end of iter read")
Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com>
[ Fixed whitespace ]
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 42227727a49d..28853966aa9a 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -1460,6 +1460,11 @@ static void rb_check_bpage(struct ring_buffer_per_cpu *cpu_buffer,
*
* As a safety measure we check to make sure the data pages have not
* been corrupted.
+ *
+ * Callers of this function need to guarantee that the list of pages doesn't get
+ * modified during the check. In particular, if it's possible that the function
+ * is invoked with concurrent readers which can swap in a new reader page then
+ * the caller should take cpu_buffer->reader_lock.
*/
static void rb_check_pages(struct ring_buffer_per_cpu *cpu_buffer)
{
@@ -2210,8 +2215,12 @@ int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
*/
synchronize_rcu();
for_each_buffer_cpu(buffer, cpu) {
+ unsigned long flags;
+
cpu_buffer = buffer->buffers[cpu];
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
rb_check_pages(cpu_buffer);
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
}
atomic_dec(&buffer->record_disabled);
}
--
2.43.0
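The temporary inconsistency described in the ring-buffer patch above can be reproduced in a simplified userspace model. This is only a sketch: struct node and the helper functions are hypothetical, the two-step swap uses plain stores instead of cmpxchg, and the flag bits that the kernel keeps in the list pointers are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct node {
	struct node *next;
	struct node *prev;
};

/* Link n nodes into a circular doubly-linked list. */
static void ring_init(struct node *nodes, size_t n)
{
	assert(n > 0);

	for (size_t i = 0; i < n; i++) {
		nodes[i].next = &nodes[(i + 1) % n];
		nodes[i].prev = &nodes[(i + n - 1) % n];
	}
}

/* Walk the ring once; true if every node's neighbours point back at it. */
static bool ring_consistent(const struct node *head)
{
	const struct node *p = head;

	do {
		if (p->next->prev != p || p->prev->next != p)
			return false;
		p = p->next;
	} while (p != head);

	return true;
}

/* Step 1 of replacing 'old' with 'repl': redirect the predecessor's next. */
static void swap_step1(struct node *old, struct node *repl)
{
	repl->next = old->next;
	repl->prev = old->prev;
	old->prev->next = repl;
}

/* Step 2: redirect the successor's prev, completing the swap. */
static void swap_step2(struct node *old, struct node *repl)
{
	old->next->prev = repl;
}
```

A check that runs between swap_step1() and swap_step2() sees a ring where the successor's prev still points at the old node, which is the state rb_check_pages() can observe without the reader_lock.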
Hello,
Upstream commit 11fbb1bfb5bc8c98b2d7db9da332b5e568f4aaab ("ice: use
relative VSI index for VFs VSIs") was applied to stable 6.1, 6.6 and 6.8:
6.1: 5693dd6d3d01f0eea24401f815c98b64cb315b67
6.6: c926393dc3442c38fdcab17d040837cf4acad1c3
6.8: d3da0d4d9fb472ad7dccb784f3d9de40d0c2f6a9
However, it was part of a series submitted to net-next [1]. Applying
this one patch on its own broke VF devices created with ice as the PF:
# [ 307.688237] iavf: Intel(R) Ethernet Adaptive Virtual Function Network Driver
# [ 307.688241] Copyright (c) 2013 - 2018 Intel Corporation.
# [ 307.688424] iavf 0000:af:01.0: enabling device (0000 -> 0002)
# [ 307.758860] iavf 0000:af:01.0: Invalid MAC address 00:00:00:00:00:00, using random
# [ 307.759628] iavf 0000:af:01.0: Multiqueue Enabled: Queue pair count = 16
# [ 307.759683] iavf 0000:af:01.0: MAC address: 6a:46:83:88:c2:26
# [ 307.759688] iavf 0000:af:01.0: GRO is enabled
# [ 307.790937] iavf 0000:af:01.0 ens802f0v0: renamed from eth0
# [ 307.896041] iavf 0000:af:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 6
# [ 307.916967] iavf 0000:af:01.0: PF returned error -5 (IAVF_ERR_PARAM) to our request 8
The VF initialization fails and the VF device is completely unusable.
This can be fixed either by:
1 - Reverting the above mentioned commit (upstream
11fbb1bfb5bc8c98b2d7db9da332b5e568f4aaab)
Or,
2 - applying the following upstream commits (part of the series):
a) a21605993dd5dfd15edfa7f06705ede17b519026 ("ice: pass VSI pointer
into ice_vc_isvalid_q_id")
b) 363f689600dd010703ce6391bcfc729a97d21840 ("ice: remove unnecessary
duplicate checks for VF VSI ID")
Thanks,
Ahmed
[1]: https://www.spinics.net/lists/netdev/msg979289.html