The series is intended for stable(a)vger.kernel.org # 5.4+
Syzkaller reported the following bug on linux-5.{4, 10, 15}.y:
https://syzkaller.appspot.com/bug?id=ce5575575f074c33ff80d104f5baee26f22e95…
The upstream commit that introduces this bug is:
1ed1d5921139 ("net: skip virtio_net_hdr_set_proto if protocol already set")
Upstream fixes the bug with the following commits, one of which introduces
new support:
e9d3f80935b6 ("net/af_packet: make sure to pull mac header")
dfed913e8b55 ("net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO")
The additional logic and risk backported seems manageable.
The blammed commit introduces a kernel BUG in __skb_gso_segment for
AF_PACKET SOCK_RAW GSO VLAN tagged packets. What happens is that
virtio_net_hdr_set_proto() exists early as skb->protocol is already set to
ETH_P_ALL. Then in packet_parse_headers() skb->protocol is set to
ETH_P_8021AD, but neither the network header position is adjusted, nor the
mac header is pulled. Thus when we get to validate the xmit skb and enter
skb_mac_gso_segment(), skb->mac_len has value 14, but vlan_depth gets
updated to 18 after skb_network_protocol() is called. This causes the
BUG_ON from __skb_pull(skb, vlan_depth) to be hit, as the mac header has
not been pulled yet.
The fixes from upstream backported cleanly without conflicts. I updated
the commit message of the first patch to describe the problem encountered,
and added Cc, Fixes, Reported-by and Tested-by tags. For the second patch
I just added Cc to stable indicating the versions to be fixed, and added
my Tested and Signed-off-by tags.
I tested the patches on linux-5.{4, 10, 15}.y.
Eric Dumazet (1):
net/af_packet: make sure to pull mac header
Hangbin Liu (1):
net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO
net/packet/af_packet.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)
--
2.34.1
SVACE reports always true condition issue at
tl92d_phy_reload_iqk_setting() in 5.10 stable releases. The problem has
been fixed by the following patches which can be cleanly applied to the
5.10 branch.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Stable team,
Please backport these upstream commits to stable kernels:
- c7423dbdbc9e ("ima: Handle -ESTALE returned by
ima_filter_rule_match()"
Dependency on:
- d57378d3aa4d ("ima: Simplify ima_lsm_copy_rule")
Known minor merge conflicts:
- Commit: 65603435599f ("ima: Fix trivial typos in the comments") fixed
"refrences" spelling, causes a merge conflict.
- Commit 28073eb09c5a ("ima: Fix fall-through warnings for Clang") adds
a "break;" before "default:", causes a merge conflict.
Simplifies backporting to linux-5.4.y:
- 465aee77aae8 ("ima: Free the entire rule when deleting a list of
rules")
except for the line "kfree(entry->keyrings);" - introduced in 5.6.y.
- 39e5993d0d45 ("ima: Shallow copy the args_p member of
ima_rule_entry.lsm elements")
- b8867eedcf76 ("ima: Rename internal filter rule functions")
- f60c826d0318 ("ima: Use kmemdup rather than kmalloc+memcpy")
A patch for kernels prior to commit b16942455193 ("ima: use the lsm
policy
update notifier") will be posted separately.
thanks,
Mimi
This bug is marked as fixed by commit:
ext4: block range must be validated before use in ext4_mb_clear_bb()
But I can't find it in the tested trees[1] for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and new crashes with
the same signature are ignored.
Kernel: Android 5.10
Dashboard link: https://syzkaller.appspot.com/bug?extid=15cd994e273307bf5cfa
---
[1] I expect the commit to be present in:
1. android12-5.10-lts branch of
https://android.googlesource.com/kernel/common
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
1c0908d8e441 ("rtmutex: Add acquire semantics for rtmutex lock acquisition slow path")
ee042be16cb4 ("locking: Apply contention tracepoints in the slow path")
d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")
7cdacc5f52d6 ("locking/rwsem: Disable preemption for spinning region")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1c0908d8e441631f5b8ba433523cf39339ee2ba0 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mgorman(a)techsingularity.net>
Date: Fri, 2 Dec 2022 10:02:23 +0000
Subject: [PATCH] rtmutex: Add acquire semantics for rtmutex lock acquisition
slow path
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench
on XFS on arm64.
kernel BUG at fs/inode.c:625!
Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP
CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1
pc : clear_inode+0xa0/0xc0
lr : clear_inode+0x38/0xc0
Call trace:
clear_inode+0xa0/0xc0
evict+0x160/0x180
iput+0x154/0x240
do_unlinkat+0x184/0x300
__arm64_sys_unlinkat+0x48/0xc0
el0_svc_common.constprop.4+0xe4/0x2c0
do_el0_svc+0xac/0x100
el0_svc+0x78/0x200
el0t_64_sync_handler+0x9c/0xc0
el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this
is likely a bug that existed forever and only became visible when ARM
support was added to preempt-rt. The same problem does not occur on x86-64
and he also reported that converting sb->s_inode_wblist_lock to
raw_spinlock_t makes the problem disappear indicating that the RT spinlock
variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered
architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to
make sure that we have acquire semantics on all paths where the lock can
be taken. Looking at the rtmutex code, this really isn't obvious to me
-- for example, try_to_take_rt_mutex() appears to be able to return via
the 'takeit' label without acquire semantics and it looks like we might
be relying on the caller's subsequent _unlock_ of the wait_lock for
ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that
comment but it was a little bit overkill and added some fences that should
not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the
spin_unlock does not provide acquire semantics which are needed when
acquiring a mutex.
Adds the necessary acquire semantics for lock owner updates in the slow path
acquisition and the waiter bit logic.
It successfully completed 10 iterations of the dbench workload while the
vanilla kernel fails on the first iteration.
[ bigeasy(a)linutronix.de: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics")
Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
Reported-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.n…
diff --git a/kernel/locking/rtmutex.c b/kernel/locking/rtmutex.c
index 7779ee8abc2a..010cf4e6d0b8 100644
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -89,15 +89,31 @@ static inline int __ww_mutex_check_kill(struct rt_mutex *lock,
* set this bit before looking at the lock.
*/
-static __always_inline void
-rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+static __always_inline struct task_struct *
+rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner)
{
unsigned long val = (unsigned long)owner;
if (rt_mutex_has_waiters(lock))
val |= RT_MUTEX_HAS_WAITERS;
- WRITE_ONCE(lock->owner, (struct task_struct *)val);
+ return (struct task_struct *)val;
+}
+
+static __always_inline void
+rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner)
+{
+ /*
+ * lock->wait_lock is held but explicit acquire semantics are needed
+ * for a new lock owner so WRITE_ONCE is insufficient.
+ */
+ xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner));
+}
+
+static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock)
+{
+ /* lock->wait_lock is held so the unlock provides release semantics. */
+ WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL));
}
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
@@ -106,7 +122,8 @@ static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock)
((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS);
}
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
+static __always_inline void
+fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock)
{
unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -172,8 +189,21 @@ static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock)
* still set.
*/
owner = READ_ONCE(*p);
- if (owner & RT_MUTEX_HAS_WAITERS)
- WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ if (owner & RT_MUTEX_HAS_WAITERS) {
+ /*
+ * See rt_mutex_set_owner() and rt_mutex_clear_owner() on
+ * why xchg_acquire() is used for updating owner for
+ * locking and WRITE_ONCE() for unlocking.
+ *
+ * WRITE_ONCE() would work for the acquire case too, but
+ * in case that the lock acquisition failed it might
+ * force other lockers into the slow path unnecessarily.
+ */
+ if (acquire_lock)
+ xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS);
+ else
+ WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS);
+ }
}
/*
@@ -208,6 +238,13 @@ static __always_inline void mark_rt_mutex_waiters(struct rt_mutex_base *lock)
owner = *p;
} while (cmpxchg_relaxed(p, owner,
owner | RT_MUTEX_HAS_WAITERS) != owner);
+
+ /*
+ * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE
+ * operations in the event of contention. Ensure the successful
+ * cmpxchg is visible.
+ */
+ smp_mb__after_atomic();
}
/*
@@ -1243,7 +1280,7 @@ static int __sched __rt_mutex_slowtrylock(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the lock waiters bit
* unconditionally. Clean this up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
return ret;
}
@@ -1604,7 +1641,7 @@ static int __sched __rt_mutex_slowlock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit
* unconditionally. We might have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
trace_contention_end(lock, ret);
@@ -1719,7 +1756,7 @@ static void __sched rtlock_slowlock_locked(struct rt_mutex_base *lock)
* try_to_take_rt_mutex() sets the waiter bit unconditionally.
* We might have to fix that up:
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
debug_rt_mutex_free_waiter(&waiter);
trace_contention_end(lock, 0);
diff --git a/kernel/locking/rtmutex_api.c b/kernel/locking/rtmutex_api.c
index 900220941caa..cb9fdff76a8a 100644
--- a/kernel/locking/rtmutex_api.c
+++ b/kernel/locking/rtmutex_api.c
@@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked(struct rt_mutex_base *lock,
void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock)
{
debug_rt_mutex_proxy_unlock(lock);
- rt_mutex_set_owner(lock, NULL);
+ rt_mutex_clear_owner(lock);
}
/**
@@ -382,7 +382,7 @@ int __sched rt_mutex_wait_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, true);
raw_spin_unlock_irq(&lock->wait_lock);
return ret;
@@ -438,7 +438,7 @@ bool __sched rt_mutex_cleanup_proxy_lock(struct rt_mutex_base *lock,
* try_to_take_rt_mutex() sets the waiter bit unconditionally. We might
* have to fix that up.
*/
- fixup_rt_mutex_waiters(lock);
+ fixup_rt_mutex_waiters(lock, false);
raw_spin_unlock_irq(&lock->wait_lock);
The patch below does not apply to the 6.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Possible dependencies:
8660495a9c5b ("drm/amdgpu: skip mes self test after s0i3 resume for MES IP v11.0")
bbce8cdb8390 ("drm/amdgpu: skip mes self test for gc 11.0.3")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8660495a9c5b9afeec4cc006b3b75178f0fb2f10 Mon Sep 17 00:00:00 2001
From: Tim Huang <tim.huang(a)amd.com>
Date: Mon, 19 Dec 2022 18:32:32 +0800
Subject: [PATCH] drm/amdgpu: skip mes self test after s0i3 resume for MES IP
v11.0
MES is part of gfxoff and MES suspend and resume are skipped for S0i3.
But the mes_self_test call path is still in the amdgpu_device_ip_late_init.
it's should also be skipped for s0ix as no hardware re-initialization
happened.
Besides, mes_self_test will free the BO that triggers a lot of warning
messages while in the suspend state.
[ 81.656085] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:425 amdgpu_bo_free_kernel+0xfc/0x110 [amdgpu]
[ 81.679435] Call Trace:
[ 81.679726] <TASK>
[ 81.679981] amdgpu_mes_remove_hw_queue+0x17a/0x230 [amdgpu]
[ 81.680857] amdgpu_mes_self_test+0x390/0x430 [amdgpu]
[ 81.681665] mes_v11_0_late_init+0x37/0x50 [amdgpu]
[ 81.682423] amdgpu_device_ip_late_init+0x53/0x280 [amdgpu]
[ 81.683257] amdgpu_device_resume+0xae/0x2a0 [amdgpu]
[ 81.684043] amdgpu_pmops_resume+0x37/0x70 [amdgpu]
[ 81.684818] pci_pm_resume+0x5c/0xa0
[ 81.685247] ? pci_pm_thaw+0x90/0x90
[ 81.685658] dpm_run_callback+0x4e/0x160
[ 81.686110] device_resume+0xad/0x210
[ 81.686529] async_resume+0x1e/0x40
[ 81.686931] async_run_entry_fn+0x33/0x120
[ 81.687405] process_one_work+0x21d/0x3f0
[ 81.687869] worker_thread+0x4a/0x3c0
[ 81.688293] ? process_one_work+0x3f0/0x3f0
[ 81.688777] kthread+0xff/0x130
[ 81.689157] ? kthread_complete_and_exit+0x20/0x20
[ 81.689707] ret_from_fork+0x22/0x30
[ 81.690118] </TASK>
[ 81.690380] ---[ end trace 0000000000000000 ]---
v2: make the comment clean and use adev->in_s0ix instead of
adev->suspend
Signed-off-by: Tim Huang <tim.huang(a)amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.0, 6.1
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 5459366f49ff..970b066b37bb 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1342,7 +1342,8 @@ static int mes_v11_0_late_init(void *handle)
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
- if (!amdgpu_in_reset(adev) &&
+ /* it's only intended for use in mes_self_test case, not for s0ix and reset */
+ if (!amdgpu_in_reset(adev) && !adev->in_s0ix &&
(adev->ip_versions[GC_HWIP][0] != IP_VERSION(11, 0, 3)))
amdgpu_mes_self_test(adev);
On Tue, 03 Jan 2023 15:48:41 +0100,
Takashi Iwai wrote:
>
> On Tue, 03 Jan 2023 14:04:50 +0100,
> PÁLFFY Dániel wrote:
> >
> > And confirming, 5.10.161 with e8444560b4d9302a511f0996f4cfdf85b628f4ca
> > and 636110411ca726f19ef8e87b0be51bb9a4cdef06 cherry-picked works for
> > me.
>
> That's a good news. Then we can ask stable people to pick up those
> commits for 5.10.y and 5.15.y.
I confirmed that the latest 5.15.y requires those fixes, too.
Greg, could you cherry-pick the following two commits to both 5.10.y
and 5.15.y stable trees? This fixes the recent regression caused by
the backport of 39bd801d6908.
e8444560b4d9302a511f0996f4cfdf85b628f4ca
ASoC/SoundWire: dai: expand 'stream' concept beyond SoundWire
636110411ca726f19ef8e87b0be51bb9a4cdef06
ASoC: Intel/SOF: use set_stream() instead of set_tdm_slots() for HDAudio
Thanks!
Takashi
>
>
> Takashi
>
> >
> > On Tue, Jan 3, 2023 at 1:05 PM PÁLFFY Dániel <dpalffy(a)gmail.com> wrote:
> > >
> > > Another report: https://bugs.archlinux.org/task/76795
> > > Apparently, folks at alsa-devel traced down the dependencies of that patch, see the mail thread at https://lore.kernel.org/all/dc65501c-c2fd-5608-c3d9-7cea184c3989%40opensour…
> > >
> > > On Mon, Jan 2, 2023 at 1:42 PM Takashi Iwai <tiwai(a)suse.de> wrote:
> > >>
> > >> On Mon, 02 Jan 2023 11:43:36 +0100,
> > >> Salvatore Bonaccorso wrote:
> > >> >
> > >> > Hi,
> > >> >
> > >> > [Adding as well Richard Fitzgerald and PÁLFFY Dániel to recipients]
> > >> >
> > >> > On Fri, Dec 30, 2022 at 09:08:57AM +0100, Thorsten Leemhuis wrote:
> > >> > > Hi, this is your Linux kernel regression tracker speaking.
> > >> > >
> > >> > > I noticed a regression report in bugzilla.kernel.org. As many (most?)
> > >> > > kernel developer don't keep an eye on it, I decided to forward it by
> > >> > > mail. Quoting from https://bugzilla.kernel.org/show_bug.cgi?id=216861 :
> > >> > >
> > >> > > > Sergey 2022-12-29 10:07:51 UTC
> > >> > > >
> > >> > > > Created attachment 303497 [details]
> > >> > > > pulseaudio.log
> > >> > > >
> > >> > > > Sudden sound disappearance was reported for some laptops, e.g.
> > >> > > >
> > >> > > > Acer Swift 3 SF314-59-78UR 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
> > >> > > >
> > >> > > > # lspci
> > >> > > > 0000:00:1f.3 Multimedia audio controller: Intel Corporation Tiger Lake-LP Smart Sound Technology Audio Controller (rev 20)
> > >> > > > Subsystem: Acer Incorporated [ALI] Device 148c
> > >> > > > Flags: bus master, fast devsel, latency 32, IRQ 197, IOMMU group 12
> > >> > > > Memory at 601f270000 (64-bit, non-prefetchable) [size=16K]
> > >> > > > Memory at 601f000000 (64-bit, non-prefetchable) [size=1M]
> > >> > > > Capabilities: [50] Power Management version 3
> > >> > > > Capabilities: [80] Vendor Specific Information: Len=14 <?>
> > >> > > > Capabilities: [60] MSI: Enable+ Count=1/1 Maskable- 64bit+
> > >> > > > Kernel driver in use: sof-audio-pci
> > >> > > >
> > >> > > > I am attaching the pulseaudio and dmesg logs
> > >> > > >
> > >> > > > This bug started reproducing after updating the kernel from 5.10.156 to 5.10.157
> > >> > > >
> > >> > > > Bisection revealed the commit being reverted:
> > >> > > >
> > >> > > > c34db0d6b88b1da95e7ab3353e674f4f574cccee is the first bad commit
> > >> > > > commit c34db0d6b88b1da95e7ab3353e674f4f574cccee
> > >> > > > Author: Richard Fitzgerald <rf(a)opensource.cirrus.com>
> > >> > > > Date: Fri Nov 4 13:22:13 2022 +0000
> > >> > > >
> > >> > > > ASoC: soc-pcm: Don't zero TDM masks in __soc_pcm_open()
> > >> > > >
> > >> > > > [ Upstream commit 39bd801d6908900e9ab0cdc2655150f95ddd4f1a ]
> > >> > > >
> > >> > > > The DAI tx_mask and rx_mask are set by snd_soc_dai_set_tdm_slot()
> > >> > > > and used by later code that depends on the TDM settings. So
> > >> > > > __soc_pcm_open() should not be obliterating those mask values.
> > >> > > >
> > >> > > > [...]
> > >> > > > Original bug report: https://bugzilla.altlinux.org/44690
> > >> > >
> > >> > > See the ticket for more details.
> > >> > >
> > >> > > BTW, let me use this mail to also add the report to the list of tracked
> > >> > > regressions to ensure it's doesn't fall through the cracks:
> > >> > >
> > >> > > #regzbot introduced: c34db0d6b88b1d
> > >> > > https://bugzilla.kernel.org/show_bug.cgi?id=216861
> > >> > > #regzbot title: sound: asoc: sudden sound disappearance
> > >> > > #regzbot ignore-activity
> > >> >
> > >> > FWIW, we had as well reports in Debian after having updated the kernel
> > >> > from 5.10.149 based one to 5.10.158 based one in the last point
> > >> > releases, they are at least:
> > >> >
> > >> > https://bugs.debian.org/1027483
> > >> > https://bugs.debian.org/1027430
> > >>
> > >> I got another report while the commit was backported to 5.14-based
> > >> openSUSE Leap kernel, and I ended up with dropping it.
> > >>
> > >> So, IMO, it's safer to drop this patch from the older stable trees.
> > >> As far as I see, 5.15.y and 5.10.y got this.
> > >>
> > >> Unless anyone gives a better fix, I'm going to submit a revert patch
> > >> for those trees.
> > >>
> > >>
> > >> thanks,
> > >>
> > >> Takashi
> >
>