This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 04 Sep 2022 12:13:47 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.257-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.19.257-rc1
Yang Yingliang yangyingliang@huawei.com net: neigh: don't call kfree_skb() under spin_lock_irqsave()
Kuniyuki Iwashima kuniyu@amazon.com kprobes: don't call disarm_kprobe() for disabled kprobes
Geert Uytterhoeven geert@linux-m68k.org netfilter: conntrack: NF_CONNTRACK_PROCFS should no longer default to y
Juergen Gross jgross@suse.com s390/hypfs: avoid error message under KVM
Denis V. Lunev den@openvz.org neigh: fix possible DoS due to net iface start/stop loop
Fudong Wang Fudong.Wang@amd.com drm/amd/display: clear optc underflow before turn off odm clock
Jann Horn jannh@google.com mm/rmap: Fix anon_vma->degree ambiguity leading to double-reuse
Yang Jihong yangjihong1@huawei.com ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
Letu Ren fantasquex@gmail.com fbdev: fb_pm2fb: Avoid potential divide by zero error
Karthik Alapati mail@karthek.com HID: hidraw: fix memory leak in hidraw_release()
Dongliang Mu mudongliangabcd@gmail.com media: pvrusb2: fix memory leak in pvr_probe
Lee Jones lee.jones@linaro.org HID: steam: Prevent NULL pointer dereference in steam_{recv,send}_report
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix build errors in some archs
Jing Leng jleng@ambarella.com kbuild: Fix include path in scripts/Makefile.modpost
Pawan Gupta pawan.kumar.gupta@linux.intel.com x86/bugs: Add "unknown" reporting for MMIO Stale Data
Gerald Schaefer gerald.schaefer@linux.ibm.com s390/mm: do not trigger write fault when vma does not allow VM_WRITE
Stanislav Fomichev sdf@google.com selftests/bpf: Fix test_align verifier log patterns
Maxim Mikityanskiy maximmi@nvidia.com bpf: Fix the off-by-two error in range markings
Hsin-Yi Wang hsinyi@chromium.org arm64: map FDT as RW for early_init_dt_scan()
Jann Horn jannh@google.com mm: Force TLB flush for PFNMAP mappings before unlink_file_vma()
Saurabh Sengar ssengar@linux.microsoft.com scsi: storvsc: Remove WQ_MEM_RECLAIM from storvsc_error_wq
Guoqing Jiang guoqing.jiang@linux.dev md: call __md_stop_writes in md_stop
David Hildenbrand david@redhat.com mm/hugetlb: fix hugetlb not supporting softdirty tracking
Brian Foster bfoster@redhat.com s390: fix double free of GS and RI CBs on fork() failure
Quanyang Wang quanyang.wang@windriver.com asm-generic: sections: refactor memory_intersects
Siddh Raman Pant code@siddh.me loop: Check for overflow while configuring loop
Chen Zhongjin chenzhongjin@huawei.com x86/unwind/orc: Unwind ftrace trampolines with correct ORC entry
Goldwyn Rodrigues rgoldwyn@suse.de btrfs: check if root is readonly while setting security xattr
Jacob Keller jacob.e.keller@intel.com ixgbe: stop resetting SYSTIME in ixgbe_ptp_start_cyclecounter
Kuniyuki Iwashima kuniyu@amazon.com net: Fix a data-race around sysctl_somaxconn.
Kuniyuki Iwashima kuniyu@amazon.com net: Fix a data-race around netdev_budget_usecs.
Kuniyuki Iwashima kuniyu@amazon.com net: Fix a data-race around netdev_budget.
Kuniyuki Iwashima kuniyu@amazon.com net: Fix a data-race around sysctl_net_busy_read.
Kuniyuki Iwashima kuniyu@amazon.com net: Fix a data-race around sysctl_net_busy_poll.
Kuniyuki Iwashima kuniyu@amazon.com net: Fix a data-race around sysctl_tstamp_allow_data.
Kuniyuki Iwashima kuniyu@amazon.com ratelimit: Fix data-races in ___ratelimit().
Kuniyuki Iwashima kuniyu@amazon.com net: Fix data-races around netdev_tstamp_prequeue.
Kuniyuki Iwashima kuniyu@amazon.com net: Fix data-races around weight_p and dev_weight_[rt]x_bias.
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_tunnel: restrict it to netdev family
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_osf: restrict osf to ipv4, ipv6 and inet families
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_payload: do not truncate csum_offset and csum_type
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_payload: report ERANGE for too long offset and length
Florian Westphal fw@strlen.de netfilter: ebtables: reject blobs that don't provide all entry points
Maciej Żenczykowski maze@google.com net: ipvtap - add __init/__exit annotations to module init/exit funcs
Jonathan Toppins jtoppins@redhat.com bonding: 802.3ad: fix no transmission of LACPDUs
Bernard Pidoux f6bvp@free.fr rose: check NULL rose_loopback_neigh->loopback
Herbert Xu herbert@gondor.apana.org.au af_key: Do not call xfrm_probe_algs in parallel
Xin Xiong xiongx18@fudan.edu.cn xfrm: fix refcount leak in __xfrm_policy_check()
Hui Su suhui_kernel@163.com kernel/sched: Remove dl_boosted flag comment
Juri Lelli juri.lelli@redhat.com sched/deadline: Fix priority inheritance with multiple scheduling classes
Lucas Stach l.stach@pengutronix.de sched/deadline: Fix stale throttling on de-/boosted tasks
Daniel Bristot de Oliveira bristot@redhat.com sched/deadline: Unthrottle PI boosted threads while enqueuing
Basavaraj Natikar Basavaraj.Natikar@amd.com pinctrl: amd: Don't save/restore interrupt status and wake status bits
Randy Dunlap rdunlap@infradead.org kernel/sys_ni: add compat entry for fadvise64_64
Helge Deller deller@gmx.de parisc: Fix exception handler for fldw and fstw instructions
Gaosheng Cui cuigaosheng1@huawei.com audit: fix potential double free on error path from fsnotify_add_inode_mark
-------------
Diffstat:
.../hw-vuln/processor_mmio_stale_data.rst | 14 +++ Makefile | 4 +- arch/arm64/include/asm/mmu.h | 2 +- arch/arm64/kernel/kaslr.c | 5 +- arch/arm64/kernel/setup.c | 9 +- arch/arm64/mm/mmu.c | 15 +-- arch/parisc/kernel/unaligned.c | 2 +- arch/s390/hypfs/hypfs_diag.c | 2 +- arch/s390/hypfs/inode.c | 2 +- arch/s390/kernel/process.c | 22 +++- arch/s390/mm/fault.c | 4 +- arch/x86/include/asm/cpufeatures.h | 3 +- arch/x86/kernel/cpu/bugs.c | 14 ++- arch/x86/kernel/cpu/common.c | 34 ++++-- arch/x86/kernel/unwind_orc.c | 15 ++- drivers/block/loop.c | 5 + drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 5 + drivers/hid/hid-steam.c | 10 ++ drivers/hid/hidraw.c | 3 + drivers/md/md.c | 1 + drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 1 + drivers/net/bonding/bond_3ad.c | 38 +++--- drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 ++++++++-- drivers/net/ipvlan/ipvtap.c | 4 +- drivers/pinctrl/pinctrl-amd.c | 11 +- drivers/scsi/storvsc_drv.c | 2 +- drivers/video/fbdev/pm2fb.c | 5 + fs/btrfs/xattr.c | 3 + include/asm-generic/sections.h | 7 +- include/linux/netfilter_bridge/ebtables.h | 4 - include/linux/rmap.h | 7 +- include/linux/sched.h | 14 ++- include/net/busy_poll.h | 2 +- kernel/audit_fsnotify.c | 1 + kernel/kprobes.c | 9 +- kernel/sched/core.c | 11 +- kernel/sched/deadline.c | 131 +++++++++++++-------- kernel/sys_ni.c | 1 + kernel/trace/ftrace.c | 10 ++ lib/ratelimit.c | 12 +- mm/mmap.c | 20 +++- mm/rmap.c | 31 ++--- net/bluetooth/l2cap_core.c | 10 +- net/bridge/netfilter/ebtable_broute.c | 8 -- net/bridge/netfilter/ebtable_filter.c | 8 -- net/bridge/netfilter/ebtable_nat.c | 8 -- net/bridge/netfilter/ebtables.c | 8 +- net/core/dev.c | 14 +-- net/core/neighbour.c | 27 ++++- net/core/skbuff.c | 2 +- net/core/sock.c | 2 +- net/core/sysctl_net_core.c | 15 ++- net/key/af_key.c | 3 + net/netfilter/Kconfig | 1 - net/netfilter/nft_osf.c | 18 ++- net/netfilter/nft_payload.c | 29 +++-- net/netfilter/nft_tunnel.c | 1 + net/rose/rose_loopback.c | 3 +- net/sched/sch_generic.c | 2 +- net/socket.c | 2 +- net/xfrm/xfrm_policy.c | 1 + scripts/Makefile.modpost | 3 +- tools/testing/selftests/bpf/test_align.c | 27 +++-- tools/testing/selftests/bpf/test_verifier.c | 32 ++--- 64 files changed, 493 insertions(+), 285 deletions(-)
From: Gaosheng Cui cuigaosheng1@huawei.com
commit ad982c3be4e60c7d39c03f782733503cbd88fd2a upstream.
Audit_alloc_mark() assign pathname to audit_mark->path, on error path from fsnotify_add_inode_mark(), fsnotify_put_mark will free memory of audit_mark->path, but the caller of audit_alloc_mark will free the pathname again, so there will be double free problem.
Fix this by resetting audit_mark->path to NULL pointer on error path from fsnotify_add_inode_mark().
Cc: stable@vger.kernel.org Fixes: 7b1293234084d ("fsnotify: Add group pointer in fsnotify_init_mark()") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/audit_fsnotify.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/audit_fsnotify.c +++ b/kernel/audit_fsnotify.c @@ -111,6 +111,7 @@ struct audit_fsnotify_mark *audit_alloc_
ret = fsnotify_add_inode_mark(&audit_mark->mark, inode, true); if (ret < 0) { + audit_mark->path = NULL; fsnotify_put_mark(&audit_mark->mark); audit_mark = ERR_PTR(ret); }
From: Helge Deller deller@gmx.de
commit 7ae1f5508d9a33fd58ed3059bd2d569961e3b8bd upstream.
The exception handler is broken for unaligned memory acceses with fldw and fstw instructions, because it trashes or uses randomly some other floating point register than the one specified in the instruction word on loads and stores.
The instruction "fldw 0(addr),%fr22L" (and the other fldw/fstw instructions) encode the target register (%fr22) in the rightmost 5 bits of the instruction word. The 7th rightmost bit of the instruction word defines if the left or right half of %fr22 should be used.
While processing unaligned address accesses, the FR3() define is used to extract the offset into the local floating-point register set. But the calculation in FR3() was buggy, so that for example instead of %fr22, register %fr12 [((22 * 2) & 0x1f) = 12] was used.
This bug has been since forever in the parisc kernel and I wonder why it wasn't detected earlier. Interestingly I noticed this bug just because the libime debian package failed to build on *native* hardware, while it successfully built in qemu.
This patch corrects the bitshift and masking calculation in FR3().
Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/unaligned.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/parisc/kernel/unaligned.c +++ b/arch/parisc/kernel/unaligned.c @@ -121,7 +121,7 @@ #define R1(i) (((i)>>21)&0x1f) #define R2(i) (((i)>>16)&0x1f) #define R3(i) ((i)&0x1f) -#define FR3(i) ((((i)<<1)&0x1f)|(((i)>>6)&1)) +#define FR3(i) ((((i)&0x1f)<<1)|(((i)>>6)&1)) #define IM(i,n) (((i)>>1&((1<<(n-1))-1))|((i)&1?((0-1L)<<(n-1)):0)) #define IM5_2(i) IM((i)>>16,5) #define IM5_3(i) IM((i),5)
From: Randy Dunlap rdunlap@infradead.org
commit a8faed3a02eeb75857a3b5d660fa80fe79db77a3 upstream.
When CONFIG_ADVISE_SYSCALLS is not set/enabled and CONFIG_COMPAT is set/enabled, the riscv compat_syscall_table references 'compat_sys_fadvise64_64', which is not defined:
riscv64-linux-ld: arch/riscv/kernel/compat_syscall_table.o:(.rodata+0x6f8): undefined reference to `compat_sys_fadvise64_64'
Add 'fadvise64_64' to kernel/sys_ni.c as a conditional COMPAT function so that when CONFIG_ADVISE_SYSCALLS is not set, there is a fallback function available.
Link: https://lkml.kernel.org/r/20220807220934.5689-1-rdunlap@infradead.org Fixes: d3ac21cacc24 ("mm: Support compiling out madvise and fadvise") Signed-off-by: Randy Dunlap rdunlap@infradead.org Suggested-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Arnd Bergmann arnd@arndb.de Cc: Josh Triplett josh@joshtriplett.org Cc: Paul Walmsley paul.walmsley@sifive.com Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Albert Ou aou@eecs.berkeley.edu Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sys_ni.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/sys_ni.c +++ b/kernel/sys_ni.c @@ -257,6 +257,7 @@ COND_SYSCALL_COMPAT(keyctl);
/* mm/fadvise.c */ COND_SYSCALL(fadvise64_64); +COND_SYSCALL_COMPAT(fadvise64_64);
/* mm/, CONFIG_MMU only */ COND_SYSCALL(swapon);
From: Basavaraj Natikar Basavaraj.Natikar@amd.com
commit b8c824a869f220c6b46df724f85794349bafbf23 upstream.
Saving/restoring interrupt and wake status bits across suspend can cause the suspend to fail if an IRQ is serviced across the suspend cycle.
Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Basavaraj Natikar Basavaraj.Natikar@amd.com Fixes: 79d2c8bede2c ("pinctrl/amd: save pin registers over suspend/resume") Link: https://lore.kernel.org/r/20220613064127.220416-3-Basavaraj.Natikar@amd.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/pinctrl-amd.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/drivers/pinctrl/pinctrl-amd.c +++ b/drivers/pinctrl/pinctrl-amd.c @@ -798,6 +798,7 @@ static int amd_gpio_suspend(struct devic struct platform_device *pdev = to_platform_device(dev); struct amd_gpio *gpio_dev = platform_get_drvdata(pdev); struct pinctrl_desc *desc = gpio_dev->pctrl->desc; + unsigned long flags; int i;
for (i = 0; i < desc->npins; i++) { @@ -806,7 +807,9 @@ static int amd_gpio_suspend(struct devic if (!amd_gpio_should_save(gpio_dev, pin)) continue;
- gpio_dev->saved_regs[i] = readl(gpio_dev->base + pin*4); + raw_spin_lock_irqsave(&gpio_dev->lock, flags); + gpio_dev->saved_regs[i] = readl(gpio_dev->base + pin * 4) & ~PIN_IRQ_PENDING; + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); }
return 0; @@ -817,6 +820,7 @@ static int amd_gpio_resume(struct device struct platform_device *pdev = to_platform_device(dev); struct amd_gpio *gpio_dev = platform_get_drvdata(pdev); struct pinctrl_desc *desc = gpio_dev->pctrl->desc; + unsigned long flags; int i;
for (i = 0; i < desc->npins; i++) { @@ -825,7 +829,10 @@ static int amd_gpio_resume(struct device if (!amd_gpio_should_save(gpio_dev, pin)) continue;
- writel(gpio_dev->saved_regs[i], gpio_dev->base + pin*4); + raw_spin_lock_irqsave(&gpio_dev->lock, flags); + gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4) & PIN_IRQ_PENDING; + writel(gpio_dev->saved_regs[i], gpio_dev->base + pin * 4); + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); }
return 0;
From: Daniel Bristot de Oliveira bristot@redhat.com
commit feff2e65efd8d84cf831668e182b2ce73c604bbb upstream.
stress-ng has a test (stress-ng --cyclic) that creates a set of threads under SCHED_DEADLINE with the following parameters:
dl_runtime = 10000 (10 us) dl_deadline = 100000 (100 us) dl_period = 100000 (100 us)
These parameters are very aggressive. When using a system without HRTICK set, these threads can easily execute longer than the dl_runtime because the throttling happens with 1/HZ resolution.
During the main part of the test, the system works just fine because the workload does not try to run over the 10 us. The problem happens at the end of the test, on the exit() path. During exit(), the threads need to do some cleanups that require real-time mutex locks, mainly those related to memory management, resulting in this scenario:
Note: locks are rt_mutexes... ------------------------------------------------------------------------ TASK A: TASK B: TASK C: activation activation activation
lock(a): OK! lock(b): OK! <overrun runtime> lock(a) -> block (task A owns it) -> self notice/set throttled +--< -> arm replenished timer | switch-out | lock(b) | -> <C prio > B prio> | -> boost TASK B | unlock(a) switch-out | -> handle lock a to B | -> wakeup(B) | -> B is throttled: | -> do not enqueue | switch-out | | +---------------------> replenishment timer -> TASK B is boosted: -> do not enqueue ------------------------------------------------------------------------
BOOM: TASK B is runnable but !enqueued, holding TASK C: the system crashes with hung task C.
This problem is avoided by removing the throttle state from the boosted thread while boosting it (by TASK A in the example above), allowing it to be queued and run boosted.
The next replenishment will take care of the runtime overrun, pushing the deadline further away. See the "while (dl_se->runtime <= 0)" on replenish_dl_entity() for more information.
Reported-by: Mark Simmons msimmons@redhat.com Signed-off-by: Daniel Bristot de Oliveira bristot@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Juri Lelli juri.lelli@redhat.com Tested-by: Mark Simmons msimmons@redhat.com Link: https://lkml.kernel.org/r/5076e003450835ec74e6fa5917d02c4fa41687e6.160017029... [Ankit: Regenerated the patch for v4.19.y] Signed-off-by: Ankit Jain ankitja@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/deadline.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
--- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1484,6 +1484,27 @@ static void enqueue_task_dl(struct rq *r */ if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) { pi_se = &pi_task->dl; + /* + * Because of delays in the detection of the overrun of a + * thread's runtime, it might be the case that a thread + * goes to sleep in a rt mutex with negative runtime. As + * a consequence, the thread will be throttled. + * + * While waiting for the mutex, this thread can also be + * boosted via PI, resulting in a thread that is throttled + * and boosted at the same time. + * + * In this case, the boost overrides the throttle. + */ + if (p->dl.dl_throttled) { + /* + * The replenish timer needs to be canceled. No + * problem if it fires concurrently: boosted threads + * are ignored in dl_task_timer(). + */ + hrtimer_try_to_cancel(&p->dl.dl_timer); + p->dl.dl_throttled = 0; + } } else if (!dl_prio(p->normal_prio)) { /* * Special case in which we have a !SCHED_DEADLINE task
From: Lucas Stach l.stach@pengutronix.de
commit 46fcc4b00c3cca8adb9b7c9afdd499f64e427135 upstream.
When a boosted task gets throttled, what normally happens is that it's immediately enqueued again with ENQUEUE_REPLENISH, which replenishes the runtime and clears the dl_throttled flag. There is a special case however: if the throttling happened on sched-out and the task has been deboosted in the meantime, the replenish is skipped as the task will return to its normal scheduling class. This leaves the task with the dl_throttled flag set.
Now if the task gets boosted up to the deadline scheduling class again while it is sleeping, it's still in the throttled state. The normal wakeup however will enqueue the task with ENQUEUE_REPLENISH not set, so we don't actually place it on the rq. Thus we end up with a task that is runnable, but not actually on the rq and neither a immediate replenishment happens, nor is the replenishment timer set up, so the task is stuck in forever-throttled limbo.
Clear the dl_throttled flag before dropping back to the normal scheduling class to fix this issue.
Signed-off-by: Lucas Stach l.stach@pengutronix.de Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Juri Lelli juri.lelli@redhat.com Link: https://lkml.kernel.org/r/20200831110719.2126930-1-l.stach@pengutronix.de [Ankit: Regenerated the patch for v4.19.y] Signed-off-by: Ankit Jain ankitja@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sched/deadline.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -1507,12 +1507,15 @@ static void enqueue_task_dl(struct rq *r } } else if (!dl_prio(p->normal_prio)) { /* - * Special case in which we have a !SCHED_DEADLINE task - * that is going to be deboosted, but exceeds its - * runtime while doing so. No point in replenishing - * it, as it's going to return back to its original - * scheduling class after this. + * Special case in which we have a !SCHED_DEADLINE task that is going + * to be deboosted, but exceeds its runtime while doing so. No point in + * replenishing it, as it's going to return back to its original + * scheduling class after this. If it has been throttled, we need to + * clear the flag, otherwise the task may wake up as throttled after + * being boosted again with no means to replenish the runtime and clear + * the throttle. */ + p->dl.dl_throttled = 0; BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH); return; }
From: Juri Lelli juri.lelli@redhat.com
commit 2279f540ea7d05f22d2f0c4224319330228586bc upstream.
Glenn reported that "an application [he developed produces] a BUG in deadline.c when a SCHED_DEADLINE task contends with CFS tasks on nested PTHREAD_PRIO_INHERIT mutexes. I believe the bug is triggered when a CFS task that was boosted by a SCHED_DEADLINE task boosts another CFS task (nested priority inheritance).
------------[ cut here ]------------ kernel BUG at kernel/sched/deadline.c:1462! invalid opcode: 0000 [#1] PREEMPT SMP CPU: 12 PID: 19171 Comm: dl_boost_bug Tainted: ... Hardware name: ... RIP: 0010:enqueue_task_dl+0x335/0x910 Code: ... RSP: 0018:ffffc9000c2bbc68 EFLAGS: 00010002 RAX: 0000000000000009 RBX: ffff888c0af94c00 RCX: ffffffff81e12500 RDX: 000000000000002e RSI: ffff888c0af94c00 RDI: ffff888c10b22600 RBP: ffffc9000c2bbd08 R08: 0000000000000009 R09: 0000000000000078 R10: ffffffff81e12440 R11: ffffffff81e1236c R12: ffff888bc8932600 R13: ffff888c0af94eb8 R14: ffff888c10b22600 R15: ffff888bc8932600 FS: 00007fa58ac55700(0000) GS:ffff888c10b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa58b523230 CR3: 0000000bf44ab003 CR4: 00000000007606e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: ? intel_pstate_update_util_hwp+0x13/0x170 rt_mutex_setprio+0x1cc/0x4b0 task_blocks_on_rt_mutex+0x225/0x260 rt_spin_lock_slowlock_locked+0xab/0x2d0 rt_spin_lock_slowlock+0x50/0x80 hrtimer_grab_expiry_lock+0x20/0x30 hrtimer_cancel+0x13/0x30 do_nanosleep+0xa0/0x150 hrtimer_nanosleep+0xe1/0x230 ? __hrtimer_init_sleeper+0x60/0x60 __x64_sys_nanosleep+0x8d/0xa0 do_syscall_64+0x4a/0x100 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x7fa58b52330d ... ---[ end trace 0000000000000002 ]—
He also provided a simple reproducer creating the situation below:
So the execution order of locking steps are the following (N1 and N2 are non-deadline tasks. D1 is a deadline task. M1 and M2 are mutexes that are enabled * with priority inheritance.)
Time moves forward as this timeline goes down:
N1 N2 D1 | | | | | | Lock(M1) | | | | | | Lock(M2) | | | | | | Lock(M2) | | | | Lock(M1) | | (!!bug triggered!) |
Daniel reported a similar situation as well, by just letting ksoftirqd run with DEADLINE (and eventually block on a mutex).
Problem is that boosted entities (Priority Inheritance) use static DEADLINE parameters of the top priority waiter. However, there might be cases where top waiter could be a non-DEADLINE entity that is currently boosted by a DEADLINE entity from a different lock chain (i.e., nested priority chains involving entities of non-DEADLINE classes). In this case, top waiter static DEADLINE parameters could be null (initialized to 0 at fork()) and replenish_dl_entity() would hit a BUG().
Fix this by keeping track of the original donor and using its parameters when a task is boosted.
Reported-by: Glenn Elliott glenn@aurora.tech Reported-by: Daniel Bristot de Oliveira bristot@redhat.com Signed-off-by: Juri Lelli juri.lelli@redhat.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-by: Daniel Bristot de Oliveira bristot@redhat.com Link: https://lkml.kernel.org/r/20201117061432.517340-1-juri.lelli@redhat.com [Ankit: Regenerated the patch for v4.19.y] Signed-off-by: Ankit Jain ankitja@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sched.h | 10 ++++ kernel/sched/core.c | 11 ++--- kernel/sched/deadline.c | 97 ++++++++++++++++++++++++++---------------------- 3 files changed, 68 insertions(+), 50 deletions(-)
--- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -546,7 +546,6 @@ struct sched_dl_entity { * overruns. */ unsigned int dl_throttled : 1; - unsigned int dl_boosted : 1; unsigned int dl_yielded : 1; unsigned int dl_non_contending : 1; unsigned int dl_overrun : 1; @@ -565,6 +564,15 @@ struct sched_dl_entity { * time. */ struct hrtimer inactive_timer; + +#ifdef CONFIG_RT_MUTEXES + /* + * Priority Inheritance. When a DEADLINE scheduling entity is boosted + * pi_se points to the donor, otherwise points to the dl_se it belongs + * to (the original one/itself). + */ + struct sched_dl_entity *pi_se; +#endif };
union rcu_special { --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -3869,20 +3869,21 @@ void rt_mutex_setprio(struct task_struct if (!dl_prio(p->normal_prio) || (pi_task && dl_prio(pi_task->prio) && dl_entity_preempt(&pi_task->dl, &p->dl))) { - p->dl.dl_boosted = 1; + p->dl.pi_se = pi_task->dl.pi_se; queue_flag |= ENQUEUE_REPLENISH; - } else - p->dl.dl_boosted = 0; + } else { + p->dl.pi_se = &p->dl; + } p->sched_class = &dl_sched_class; } else if (rt_prio(prio)) { if (dl_prio(oldprio)) - p->dl.dl_boosted = 0; + p->dl.pi_se = &p->dl; if (oldprio < prio) queue_flag |= ENQUEUE_HEAD; p->sched_class = &rt_sched_class; } else { if (dl_prio(oldprio)) - p->dl.dl_boosted = 0; + p->dl.pi_se = &p->dl; if (rt_prio(oldprio)) p->rt.timeout = 0; p->sched_class = &fair_sched_class; --- a/kernel/sched/deadline.c +++ b/kernel/sched/deadline.c @@ -43,6 +43,28 @@ static inline int on_dl_rq(struct sched_ return !RB_EMPTY_NODE(&dl_se->rb_node); }
+#ifdef CONFIG_RT_MUTEXES +static inline struct sched_dl_entity *pi_of(struct sched_dl_entity *dl_se) +{ + return dl_se->pi_se; +} + +static inline bool is_dl_boosted(struct sched_dl_entity *dl_se) +{ + return pi_of(dl_se) != dl_se; +} +#else +static inline struct sched_dl_entity *pi_of(struct sched_dl_entity *dl_se) +{ + return dl_se; +} + +static inline bool is_dl_boosted(struct sched_dl_entity *dl_se) +{ + return false; +} +#endif + #ifdef CONFIG_SMP static inline struct dl_bw *dl_bw_of(int i) { @@ -657,7 +679,7 @@ static inline void setup_new_dl_entity(s struct dl_rq *dl_rq = dl_rq_of_se(dl_se); struct rq *rq = rq_of_dl_rq(dl_rq);
- WARN_ON(dl_se->dl_boosted); + WARN_ON(is_dl_boosted(dl_se)); WARN_ON(dl_time_before(rq_clock(rq), dl_se->deadline));
/* @@ -695,21 +717,20 @@ static inline void setup_new_dl_entity(s * could happen are, typically, a entity voluntarily trying to overcome its * runtime, or it just underestimated it during sched_setattr(). */ -static void replenish_dl_entity(struct sched_dl_entity *dl_se, - struct sched_dl_entity *pi_se) +static void replenish_dl_entity(struct sched_dl_entity *dl_se) { struct dl_rq *dl_rq = dl_rq_of_se(dl_se); struct rq *rq = rq_of_dl_rq(dl_rq);
- BUG_ON(pi_se->dl_runtime <= 0); + BUG_ON(pi_of(dl_se)->dl_runtime <= 0);
/* * This could be the case for a !-dl task that is boosted. * Just go with full inherited parameters. */ if (dl_se->dl_deadline == 0) { - dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline; - dl_se->runtime = pi_se->dl_runtime; + dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline; + dl_se->runtime = pi_of(dl_se)->dl_runtime; }
if (dl_se->dl_yielded && dl_se->runtime > 0) @@ -722,8 +743,8 @@ static void replenish_dl_entity(struct s * arbitrary large. */ while (dl_se->runtime <= 0) { - dl_se->deadline += pi_se->dl_period; - dl_se->runtime += pi_se->dl_runtime; + dl_se->deadline += pi_of(dl_se)->dl_period; + dl_se->runtime += pi_of(dl_se)->dl_runtime; }
/* @@ -737,8 +758,8 @@ static void replenish_dl_entity(struct s */ if (dl_time_before(dl_se->deadline, rq_clock(rq))) { printk_deferred_once("sched: DL replenish lagged too much\n"); - dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline; - dl_se->runtime = pi_se->dl_runtime; + dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline; + dl_se->runtime = pi_of(dl_se)->dl_runtime; }
if (dl_se->dl_yielded) @@ -771,8 +792,7 @@ static void replenish_dl_entity(struct s * task with deadline equal to period this is the same of using * dl_period instead of dl_deadline in the equation above. */ -static bool dl_entity_overflow(struct sched_dl_entity *dl_se, - struct sched_dl_entity *pi_se, u64 t) +static bool dl_entity_overflow(struct sched_dl_entity *dl_se, u64 t) { u64 left, right;
@@ -794,9 +814,9 @@ static bool dl_entity_overflow(struct sc * of anything below microseconds resolution is actually fiction * (but still we want to give the user that illusion >;). */ - left = (pi_se->dl_deadline >> DL_SCALE) * (dl_se->runtime >> DL_SCALE); + left = (pi_of(dl_se)->dl_deadline >> DL_SCALE) * (dl_se->runtime >> DL_SCALE); right = ((dl_se->deadline - t) >> DL_SCALE) * - (pi_se->dl_runtime >> DL_SCALE); + (pi_of(dl_se)->dl_runtime >> DL_SCALE);
return dl_time_before(right, left); } @@ -881,24 +901,23 @@ static inline bool dl_is_implicit(struct * Please refer to the comments update_dl_revised_wakeup() function to find * more about the Revised CBS rule. */ -static void update_dl_entity(struct sched_dl_entity *dl_se, - struct sched_dl_entity *pi_se) +static void update_dl_entity(struct sched_dl_entity *dl_se) { struct dl_rq *dl_rq = dl_rq_of_se(dl_se); struct rq *rq = rq_of_dl_rq(dl_rq);
if (dl_time_before(dl_se->deadline, rq_clock(rq)) || - dl_entity_overflow(dl_se, pi_se, rq_clock(rq))) { + dl_entity_overflow(dl_se, rq_clock(rq))) {
if (unlikely(!dl_is_implicit(dl_se) && !dl_time_before(dl_se->deadline, rq_clock(rq)) && - !dl_se->dl_boosted)){ + !is_dl_boosted(dl_se))) { update_dl_revised_wakeup(dl_se, rq); return; }
- dl_se->deadline = rq_clock(rq) + pi_se->dl_deadline; - dl_se->runtime = pi_se->dl_runtime; + dl_se->deadline = rq_clock(rq) + pi_of(dl_se)->dl_deadline; + dl_se->runtime = pi_of(dl_se)->dl_runtime; } }
@@ -997,7 +1016,7 @@ static enum hrtimer_restart dl_task_time * The task might have been boosted by someone else and might be in the * boosting/deboosting path, its not throttled. */ - if (dl_se->dl_boosted) + if (is_dl_boosted(dl_se)) goto unlock;
/* @@ -1025,7 +1044,7 @@ static enum hrtimer_restart dl_task_time * but do not enqueue -- wait for our wakeup to do that. */ if (!task_on_rq_queued(p)) { - replenish_dl_entity(dl_se, dl_se); + replenish_dl_entity(dl_se); goto unlock; }
@@ -1115,7 +1134,7 @@ static inline void dl_check_constrained_
if (dl_time_before(dl_se->deadline, rq_clock(rq)) && dl_time_before(rq_clock(rq), dl_next_period(dl_se))) { - if (unlikely(dl_se->dl_boosted || !start_dl_timer(p))) + if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(p))) return; dl_se->dl_throttled = 1; if (dl_se->runtime > 0) @@ -1246,7 +1265,7 @@ throttle: dl_se->dl_overrun = 1;
__dequeue_task_dl(rq, curr, 0); - if (unlikely(dl_se->dl_boosted || !start_dl_timer(curr))) + if (unlikely(is_dl_boosted(dl_se) || !start_dl_timer(curr))) enqueue_task_dl(rq, curr, ENQUEUE_REPLENISH);
if (!is_leftmost(curr, &rq->dl)) @@ -1440,8 +1459,7 @@ static void __dequeue_dl_entity(struct s }
static void -enqueue_dl_entity(struct sched_dl_entity *dl_se, - struct sched_dl_entity *pi_se, int flags) +enqueue_dl_entity(struct sched_dl_entity *dl_se, int flags) { BUG_ON(on_dl_rq(dl_se));
@@ -1452,9 +1470,9 @@ enqueue_dl_entity(struct sched_dl_entity */ if (flags & ENQUEUE_WAKEUP) { task_contending(dl_se, flags); - update_dl_entity(dl_se, pi_se); + update_dl_entity(dl_se); } else if (flags & ENQUEUE_REPLENISH) { - replenish_dl_entity(dl_se, pi_se); + replenish_dl_entity(dl_se); } else if ((flags & ENQUEUE_RESTORE) && dl_time_before(dl_se->deadline, rq_clock(rq_of_dl_rq(dl_rq_of_se(dl_se))))) { @@ -1471,19 +1489,7 @@ static void dequeue_dl_entity(struct sch
static void enqueue_task_dl(struct rq *rq, struct task_struct *p, int flags) { - struct task_struct *pi_task = rt_mutex_get_top_task(p); - struct sched_dl_entity *pi_se = &p->dl; - - /* - * Use the scheduling parameters of the top pi-waiter task if: - * - we have a top pi-waiter which is a SCHED_DEADLINE task AND - * - our dl_boosted is set (i.e. the pi-waiter's (absolute) deadline is - * smaller than our deadline OR we are a !SCHED_DEADLINE task getting - * boosted due to a SCHED_DEADLINE pi-waiter). - * Otherwise we keep our runtime and deadline. - */ - if (pi_task && dl_prio(pi_task->normal_prio) && p->dl.dl_boosted) { - pi_se = &pi_task->dl; + if (is_dl_boosted(&p->dl)) { /* * Because of delays in the detection of the overrun of a * thread's runtime, it might be the case that a thread @@ -1516,7 +1522,7 @@ static void enqueue_task_dl(struct rq *r * the throttle. */ p->dl.dl_throttled = 0; - BUG_ON(!p->dl.dl_boosted || flags != ENQUEUE_REPLENISH); + BUG_ON(!is_dl_boosted(&p->dl) || flags != ENQUEUE_REPLENISH); return; }
@@ -1553,7 +1559,7 @@ static void enqueue_task_dl(struct rq *r return; }
- enqueue_dl_entity(&p->dl, pi_se, flags); + enqueue_dl_entity(&p->dl, flags);
if (!task_current(rq, p) && p->nr_cpus_allowed > 1) enqueue_pushable_dl_task(rq, p); @@ -2715,11 +2721,14 @@ void __dl_clear_params(struct task_struc dl_se->dl_bw = 0; dl_se->dl_density = 0;
- dl_se->dl_boosted = 0; dl_se->dl_throttled = 0; dl_se->dl_yielded = 0; dl_se->dl_non_contending = 0; dl_se->dl_overrun = 0; + +#ifdef CONFIG_RT_MUTEXES + dl_se->pi_se = dl_se; +#endif }
bool dl_param_changed(struct task_struct *p, const struct sched_attr *attr)
From: Hui Su suhui_kernel@163.com
commit 0e3872499de1a1230cef5221607d71aa09264bd5 upstream.
since commit 2279f540ea7d ("sched/deadline: Fix priority inheritance with multiple scheduling classes"), we should not keep it here.
Signed-off-by: Hui Su suhui_kernel@163.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Daniel Bristot de Oliveira bristot@redhat.com Link: https://lore.kernel.org/r/20220107095254.GA49258@localhost.localdomain [Ankit: Regenerated the patch for v4.19.y] Signed-off-by: Ankit Jain ankitja@vmware.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/sched.h | 4 ---- 1 file changed, 4 deletions(-)
--- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -528,10 +528,6 @@ struct sched_dl_entity { * task has to wait for a replenishment to be performed at the * next firing of dl_timer. * - * @dl_boosted tells if we are boosted due to DI. If so we are - * outside bandwidth enforcement mechanism (but only until we - * exit the critical section); - * * @dl_yielded tells if task gave up the CPU before consuming * all its available runtime during the last job. *
From: Xin Xiong xiongx18@fudan.edu.cn
[ Upstream commit 9c9cb23e00ddf45679b21b4dacc11d1ae7961ebe ]
The issue happens on an error path in __xfrm_policy_check(). When the fetching process of the object `pols[1]` fails, the function simply returns 0, forgetting to decrement the reference count of `pols[0]`, which is incremented earlier by either xfrm_sk_policy_lookup() or xfrm_policy_lookup(). This may result in memory leaks.
Fix it by decreasing the reference count of `pols[0]` in that path.
Fixes: 134b0fc544ba ("IPsec: propagate security module errors up from flow_cache_lookup") Signed-off-by: Xin Xiong xiongx18@fudan.edu.cn Signed-off-by: Xin Tan tanxin.ctf@gmail.com Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/xfrm/xfrm_policy.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c index 3582f77bab6a8..1cd21a8c4deac 100644 --- a/net/xfrm/xfrm_policy.c +++ b/net/xfrm/xfrm_policy.c @@ -2403,6 +2403,7 @@ int __xfrm_policy_check(struct sock *sk, int dir, struct sk_buff *skb, if (pols[1]) { if (IS_ERR(pols[1])) { XFRM_INC_STATS(net, LINUX_MIB_XFRMINPOLERROR); + xfrm_pol_put(pols[0]); return 0; } pols[1]->curlft.use_time = ktime_get_real_seconds();
From: Herbert Xu herbert@gondor.apana.org.au
[ Upstream commit ba953a9d89a00c078b85f4b190bc1dde66fe16b5 ]
When namespace support was added to xfrm/afkey, it caused the previously single-threaded call to xfrm_probe_algs to become multi-threaded. This is buggy and needs to be fixed with a mutex.
Reported-by: Abhishek Shah abhishek.shah@columbia.edu Fixes: 283bc9f35bbb ("xfrm: Namespacify xfrm state/policy locks") Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Steffen Klassert steffen.klassert@secunet.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/key/af_key.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/key/af_key.c b/net/key/af_key.c index af67e0d265c05..337c6bc8211ed 100644 --- a/net/key/af_key.c +++ b/net/key/af_key.c @@ -1707,9 +1707,12 @@ static int pfkey_register(struct sock *sk, struct sk_buff *skb, const struct sad pfk->registered |= (1<<hdr->sadb_msg_satype); }
+ mutex_lock(&pfkey_mutex); xfrm_probe_algs();
supp_skb = compose_sadb_supported(hdr, GFP_KERNEL | __GFP_ZERO); + mutex_unlock(&pfkey_mutex); + if (!supp_skb) { if (hdr->sadb_msg_satype != SADB_SATYPE_UNSPEC) pfk->registered &= ~(1<<hdr->sadb_msg_satype);
From: Bernard Pidoux f6bvp@free.fr
[ Upstream commit 3c53cd65dece47dd1f9d3a809f32e59d1d87b2b8 ]
Commit 3b3fd068c56e3fbea30090859216a368398e39bf added NULL check for `rose_loopback_neigh->dev` in rose_loopback_timer() but omitted to check rose_loopback_neigh->loopback.
It thus prevents *all* rose connect.
The reason is that a special rose_neigh loopback has a NULL device.
/proc/net/rose_neigh illustrates it via rose_neigh_show() function : [...] seq_printf(seq, "%05d %-9s %-4s %3d %3d %3s %3s %3lu %3lu", rose_neigh->number, (rose_neigh->loopback) ? "RSLOOP-0" : ax2asc(buf, &rose_neigh->callsign), rose_neigh->dev ? rose_neigh->dev->name : "???", rose_neigh->count,
/proc/net/rose_neigh displays special rose_loopback_neigh->loopback as callsign RSLOOP-0:
addr callsign dev count use mode restart t0 tf digipeaters 00001 RSLOOP-0 ??? 1 2 DCE yes 0 0
By checking rose_loopback_neigh->loopback, rose_rx_call_request() is called even in case rose_loopback_neigh->dev is NULL. This repairs rose connections.
Verification with rose client application FPAC:
FPAC-Node v 4.1.3 (built Aug 5 2022) for LINUX (help = h) F6BVP-4 (Commands = ?) : u Users - AX.25 Level 2 sessions : Port Callsign Callsign AX.25 state ROSE state NetRom status axudp F6BVP-5 -> F6BVP-9 Connected Connected ---------
Fixes: 3b3fd068c56e ("rose: Fix Null pointer dereference in rose_send_frame()") Signed-off-by: Bernard Pidoux f6bvp@free.fr Suggested-by: Francois Romieu romieu@fr.zoreil.com Cc: Thomas DL9SAU Osterried thomas@osterried.de Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rose/rose_loopback.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/rose/rose_loopback.c b/net/rose/rose_loopback.c index c318e5c9f6df3..56eea298b8ef7 100644 --- a/net/rose/rose_loopback.c +++ b/net/rose/rose_loopback.c @@ -99,7 +99,8 @@ static void rose_loopback_timer(struct timer_list *unused) }
if (frametype == ROSE_CALL_REQUEST) { - if (!rose_loopback_neigh->dev) { + if (!rose_loopback_neigh->dev && + !rose_loopback_neigh->loopback) { kfree_skb(skb); continue; }
From: Jonathan Toppins jtoppins@redhat.com
[ Upstream commit d745b5062ad2b5da90a5e728d7ca884fc07315fd ]
This is caused by the global variable ad_ticks_per_sec being zero as demonstrated by the reproducer script discussed below. This causes all timer values in __ad_timer_to_ticks to be zero, resulting in the periodic timer to never fire.
To reproduce: Run the script in `tools/testing/selftests/drivers/net/bonding/bond-break-lacpdu-tx.sh` which puts bonding into a state where it never transmits LACPDUs.
line 44: ip link add fbond type bond mode 4 miimon 200 \ xmit_hash_policy 1 ad_actor_sys_prio 65535 lacp_rate fast setting bond param: ad_actor_sys_prio given: params.ad_actor_system = 0 call stack: bond_option_ad_actor_sys_prio() -> bond_3ad_update_ad_actor_settings() -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio -> ad.system.sys_mac_addr = bond->dev->dev_addr; because params.ad_actor_system == 0 results: ad.system.sys_mac_addr = bond->dev->dev_addr
line 48: ip link set fbond address 52:54:00:3B:7C:A6 setting bond MAC addr call stack: bond->dev->dev_addr = new_mac
line 52: ip link set fbond type bond ad_actor_sys_prio 65535 setting bond param: ad_actor_sys_prio given: params.ad_actor_system = 0 call stack: bond_option_ad_actor_sys_prio() -> bond_3ad_update_ad_actor_settings() -> set ad.system.sys_priority = bond->params.ad_actor_sys_prio -> ad.system.sys_mac_addr = bond->dev->dev_addr; because params.ad_actor_system == 0 results: ad.system.sys_mac_addr = bond->dev->dev_addr
line 60: ip link set veth1-bond down master fbond given: params.ad_actor_system = 0 params.mode = BOND_MODE_8023AD ad.system.sys_mac_addr == bond->dev->dev_addr call stack: bond_enslave -> bond_3ad_initialize(); because first slave -> if ad.system.sys_mac_addr != bond->dev->dev_addr return results: Nothing is run in bond_3ad_initialize() because dev_addr equals sys_mac_addr leaving the global ad_ticks_per_sec zero as it is never initialized anywhere else.
The if check around the contents of bond_3ad_initialize() is no longer needed due to commit 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately") which sets ad.system.sys_mac_addr if any one of the bonding parameters whos set function calls bond_3ad_update_ad_actor_settings(). This is because if ad.system.sys_mac_addr is zero it will be set to the current bond mac address, this causes the if check to never be true.
Fixes: 5ee14e6d336f ("bonding: 3ad: apply ad_actor settings changes immediately") Signed-off-by: Jonathan Toppins jtoppins@redhat.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_3ad.c | 38 ++++++++++++++-------------------- 1 file changed, 16 insertions(+), 22 deletions(-)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index b3eaef31b7673..a6bb7e915f74f 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -1977,30 +1977,24 @@ void bond_3ad_initiate_agg_selection(struct bonding *bond, int timeout) */ void bond_3ad_initialize(struct bonding *bond, u16 tick_resolution) { - /* check that the bond is not initialized yet */ - if (!MAC_ADDRESS_EQUAL(&(BOND_AD_INFO(bond).system.sys_mac_addr), - bond->dev->dev_addr)) { - - BOND_AD_INFO(bond).aggregator_identifier = 0; - - BOND_AD_INFO(bond).system.sys_priority = - bond->params.ad_actor_sys_prio; - if (is_zero_ether_addr(bond->params.ad_actor_system)) - BOND_AD_INFO(bond).system.sys_mac_addr = - *((struct mac_addr *)bond->dev->dev_addr); - else - BOND_AD_INFO(bond).system.sys_mac_addr = - *((struct mac_addr *)bond->params.ad_actor_system); + BOND_AD_INFO(bond).aggregator_identifier = 0; + BOND_AD_INFO(bond).system.sys_priority = + bond->params.ad_actor_sys_prio; + if (is_zero_ether_addr(bond->params.ad_actor_system)) + BOND_AD_INFO(bond).system.sys_mac_addr = + *((struct mac_addr *)bond->dev->dev_addr); + else + BOND_AD_INFO(bond).system.sys_mac_addr = + *((struct mac_addr *)bond->params.ad_actor_system);
- /* initialize how many times this module is called in one - * second (should be about every 100ms) - */ - ad_ticks_per_sec = tick_resolution; + /* initialize how many times this module is called in one + * second (should be about every 100ms) + */ + ad_ticks_per_sec = tick_resolution;
- bond_3ad_initiate_agg_selection(bond, - AD_AGGREGATOR_SELECTION_TIMER * - ad_ticks_per_sec); - } + bond_3ad_initiate_agg_selection(bond, + AD_AGGREGATOR_SELECTION_TIMER * + ad_ticks_per_sec); }
/**
From: Maciej Żenczykowski maze@google.com
[ Upstream commit 4b2e3a17e9f279325712b79fb01d1493f9e3e005 ]
Looks to have been left out in an oversight.
Cc: Mahesh Bandewar maheshb@google.com Cc: Sainath Grandhi sainath.grandhi@intel.com Fixes: 235a9d89da97 ('ipvtap: IP-VLAN based tap driver') Signed-off-by: Maciej Żenczykowski maze@google.com Link: https://lore.kernel.org/r/20220821130808.12143-1-zenczykowski@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ipvlan/ipvtap.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ipvlan/ipvtap.c b/drivers/net/ipvlan/ipvtap.c index 0bcc07f346c3e..2e517e30c5ac1 100644 --- a/drivers/net/ipvlan/ipvtap.c +++ b/drivers/net/ipvlan/ipvtap.c @@ -193,7 +193,7 @@ static struct notifier_block ipvtap_notifier_block __read_mostly = { .notifier_call = ipvtap_device_event, };
-static int ipvtap_init(void) +static int __init ipvtap_init(void) { int err;
@@ -227,7 +227,7 @@ static int ipvtap_init(void) } module_init(ipvtap_init);
-static void ipvtap_exit(void) +static void __exit ipvtap_exit(void) { rtnl_link_unregister(&ipvtap_link_ops); unregister_netdevice_notifier(&ipvtap_notifier_block);
From: Florian Westphal fw@strlen.de
[ Upstream commit 7997eff82828304b780dc0a39707e1946d6f1ebf ]
Harshit Mogalapalli says: In ebt_do_table() function dereferencing 'private->hook_entry[hook]' can lead to NULL pointer dereference. [..] Kernel panic:
general protection fault, probably for non-canonical address 0xdffffc0000000005: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000028-0x000000000000002f] [..] RIP: 0010:ebt_do_table+0x1dc/0x1ce0 Code: 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 5c 16 00 00 48 b8 00 00 00 00 00 fc ff df 49 8b 6c df 08 48 8d 7d 2c 48 89 fa 48 c1 ea 03 <0f> b6 14 02 48 89 f8 83 e0 07 83 c0 03 38 d0 7c 08 84 d2 0f 85 88 [..] Call Trace: nf_hook_slow+0xb1/0x170 __br_forward+0x289/0x730 maybe_deliver+0x24b/0x380 br_flood+0xc6/0x390 br_dev_xmit+0xa2e/0x12c0
For some reason ebtables rejects blobs that provide entry points that are not supported by the table, but what it should instead reject is the opposite: blobs that DO NOT provide an entry point supported by the table.
t->valid_hooks is the bitmask of hooks (input, forward ...) that will see packets. Providing an entry point that is not support is harmless (never called/used), but the inverse isn't: it results in a crash because the ebtables traverser doesn't expect a NULL blob for a location its receiving packets for.
Instead of fixing all the individual checks, do what iptables is doing and reject all blobs that differ from the expected hooks.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Reported-by: syzkaller syzkaller@googlegroups.com Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter_bridge/ebtables.h | 4 ---- net/bridge/netfilter/ebtable_broute.c | 8 -------- net/bridge/netfilter/ebtable_filter.c | 8 -------- net/bridge/netfilter/ebtable_nat.c | 8 -------- net/bridge/netfilter/ebtables.c | 8 +------- 5 files changed, 1 insertion(+), 35 deletions(-)
diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h index c6935be7c6ca3..954ffe32f6227 100644 --- a/include/linux/netfilter_bridge/ebtables.h +++ b/include/linux/netfilter_bridge/ebtables.h @@ -94,10 +94,6 @@ struct ebt_table { struct ebt_replace_kernel *table; unsigned int valid_hooks; rwlock_t lock; - /* e.g. could be the table explicitly only allows certain - * matches, targets, ... 0 == let it in */ - int (*check)(const struct ebt_table_info *info, - unsigned int valid_hooks); /* the data used by the kernel */ struct ebt_table_info *private; struct module *me; diff --git a/net/bridge/netfilter/ebtable_broute.c b/net/bridge/netfilter/ebtable_broute.c index 276b60262981c..b21c8a317be73 100644 --- a/net/bridge/netfilter/ebtable_broute.c +++ b/net/bridge/netfilter/ebtable_broute.c @@ -33,18 +33,10 @@ static struct ebt_replace_kernel initial_table = { .entries = (char *)&initial_chain, };
-static int check(const struct ebt_table_info *info, unsigned int valid_hooks) -{ - if (valid_hooks & ~(1 << NF_BR_BROUTING)) - return -EINVAL; - return 0; -} - static const struct ebt_table broute_table = { .name = "broute", .table = &initial_table, .valid_hooks = 1 << NF_BR_BROUTING, - .check = check, .me = THIS_MODULE, };
diff --git a/net/bridge/netfilter/ebtable_filter.c b/net/bridge/netfilter/ebtable_filter.c index 550324c516ee3..c71795e4c18cf 100644 --- a/net/bridge/netfilter/ebtable_filter.c +++ b/net/bridge/netfilter/ebtable_filter.c @@ -42,18 +42,10 @@ static struct ebt_replace_kernel initial_table = { .entries = (char *)initial_chains, };
-static int check(const struct ebt_table_info *info, unsigned int valid_hooks) -{ - if (valid_hooks & ~FILTER_VALID_HOOKS) - return -EINVAL; - return 0; -} - static const struct ebt_table frame_filter = { .name = "filter", .table = &initial_table, .valid_hooks = FILTER_VALID_HOOKS, - .check = check, .me = THIS_MODULE, };
diff --git a/net/bridge/netfilter/ebtable_nat.c b/net/bridge/netfilter/ebtable_nat.c index c0fb3ca518af8..44dde9e635e24 100644 --- a/net/bridge/netfilter/ebtable_nat.c +++ b/net/bridge/netfilter/ebtable_nat.c @@ -42,18 +42,10 @@ static struct ebt_replace_kernel initial_table = { .entries = (char *)initial_chains, };
-static int check(const struct ebt_table_info *info, unsigned int valid_hooks) -{ - if (valid_hooks & ~NAT_VALID_HOOKS) - return -EINVAL; - return 0; -} - static const struct ebt_table frame_nat = { .name = "nat", .table = &initial_table, .valid_hooks = NAT_VALID_HOOKS, - .check = check, .me = THIS_MODULE, };
diff --git a/net/bridge/netfilter/ebtables.c b/net/bridge/netfilter/ebtables.c index f59230e4fc295..ea27bacbd0057 100644 --- a/net/bridge/netfilter/ebtables.c +++ b/net/bridge/netfilter/ebtables.c @@ -1003,8 +1003,7 @@ static int do_replace_finish(struct net *net, struct ebt_replace *repl, goto free_iterate; }
- /* the table doesn't like it */ - if (t->check && (ret = t->check(newinfo, repl->valid_hooks))) + if (repl->valid_hooks != t->valid_hooks) goto free_unlock;
if (repl->num_counters && repl->num_counters != t->private->nentries) { @@ -1197,11 +1196,6 @@ int ebt_register_table(struct net *net, const struct ebt_table *input_table, if (ret != 0) goto free_chainstack;
- if (table->check && table->check(newinfo, table->valid_hooks)) { - ret = -EINVAL; - goto free_chainstack; - } - table->private = newinfo; rwlock_init(&table->lock); mutex_lock(&ebt_mutex);
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 94254f990c07e9ddf1634e0b727fab821c3b5bf9 ]
Instead of offset and length are truncation to u8, report ERANGE.
Fixes: 96518518cc41 ("netfilter: add nftables") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_payload.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index fd87216bc0a99..04b9df9e39554 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -398,6 +398,7 @@ nft_payload_select_ops(const struct nft_ctx *ctx, { enum nft_payload_bases base; unsigned int offset, len; + int err;
if (tb[NFTA_PAYLOAD_BASE] == NULL || tb[NFTA_PAYLOAD_OFFSET] == NULL || @@ -423,8 +424,13 @@ nft_payload_select_ops(const struct nft_ctx *ctx, if (tb[NFTA_PAYLOAD_DREG] == NULL) return ERR_PTR(-EINVAL);
- offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET])); - len = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_LEN])); + err = nft_parse_u32_check(tb[NFTA_PAYLOAD_OFFSET], U8_MAX, &offset); + if (err < 0) + return ERR_PTR(err); + + err = nft_parse_u32_check(tb[NFTA_PAYLOAD_LEN], U8_MAX, &len); + if (err < 0) + return ERR_PTR(err);
if (len <= 4 && is_power_of_2(len) && IS_ALIGNED(offset, len) && base != NFT_PAYLOAD_LL_HEADER)
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 7044ab281febae9e2fa9b0b247693d6026166293 ]
Instead report ERANGE if csum_offset is too long, and EOPNOTSUPP if type is not support.
Fixes: 7ec3f7b47b8d ("netfilter: nft_payload: add packet mangling support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_payload.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index 04b9df9e39554..5732b32ab9320 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -332,6 +332,8 @@ static int nft_payload_set_init(const struct nft_ctx *ctx, const struct nlattr * const tb[]) { struct nft_payload_set *priv = nft_expr_priv(expr); + u32 csum_offset, csum_type = NFT_PAYLOAD_CSUM_NONE; + int err;
priv->base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE])); priv->offset = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_OFFSET])); @@ -339,11 +341,15 @@ static int nft_payload_set_init(const struct nft_ctx *ctx, priv->sreg = nft_parse_register(tb[NFTA_PAYLOAD_SREG]);
if (tb[NFTA_PAYLOAD_CSUM_TYPE]) - priv->csum_type = - ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE])); - if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) - priv->csum_offset = - ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_OFFSET])); + csum_type = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_CSUM_TYPE])); + if (tb[NFTA_PAYLOAD_CSUM_OFFSET]) { + err = nft_parse_u32_check(tb[NFTA_PAYLOAD_CSUM_OFFSET], U8_MAX, + &csum_offset); + if (err < 0) + return err; + + priv->csum_offset = csum_offset; + } if (tb[NFTA_PAYLOAD_CSUM_FLAGS]) { u32 flags;
@@ -354,13 +360,14 @@ static int nft_payload_set_init(const struct nft_ctx *ctx, priv->csum_flags = flags; }
- switch (priv->csum_type) { + switch (csum_type) { case NFT_PAYLOAD_CSUM_NONE: case NFT_PAYLOAD_CSUM_INET: break; default: return -EOPNOTSUPP; } + priv->csum_type = csum_type;
return nft_validate_register_load(priv->sreg, priv->len); }
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 5f3b7aae14a706d0d7da9f9e39def52ff5fc3d39 ]
As it was originally intended, restrict extension to supported families.
Fixes: b96af92d6eaf ("netfilter: nf_tables: implement Passive OS fingerprint module in nft_osf") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_osf.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/net/netfilter/nft_osf.c b/net/netfilter/nft_osf.c index e259454b6a643..4fac2d9a4b885 100644 --- a/net/netfilter/nft_osf.c +++ b/net/netfilter/nft_osf.c @@ -81,9 +81,21 @@ static int nft_osf_validate(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nft_data **data) { - return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) | - (1 << NF_INET_PRE_ROUTING) | - (1 << NF_INET_FORWARD)); + unsigned int hooks; + + switch (ctx->family) { + case NFPROTO_IPV4: + case NFPROTO_IPV6: + case NFPROTO_INET: + hooks = (1 << NF_INET_LOCAL_IN) | + (1 << NF_INET_PRE_ROUTING) | + (1 << NF_INET_FORWARD); + break; + default: + return -EOPNOTSUPP; + } + + return nft_chain_validate_hooks(ctx->chain, hooks); }
static struct nft_expr_type nft_osf_type;
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 01e4092d53bc4fe122a6e4b6d664adbd57528ca3 ]
Only allow to use this expression from NFPROTO_NETDEV family.
Fixes: af308b94a2a4 ("netfilter: nf_tables: add tunnel support") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_tunnel.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/netfilter/nft_tunnel.c b/net/netfilter/nft_tunnel.c index 8ae948fd9dcfc..3fc55c81f16ac 100644 --- a/net/netfilter/nft_tunnel.c +++ b/net/netfilter/nft_tunnel.c @@ -104,6 +104,7 @@ static const struct nft_expr_ops nft_tunnel_get_ops = {
static struct nft_expr_type nft_tunnel_type __read_mostly = { .name = "tunnel", + .family = NFPROTO_NETDEV, .ops = &nft_tunnel_get_ops, .policy = nft_tunnel_policy, .maxattr = NFTA_TUNNEL_MAX,
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit bf955b5ab8f6f7b0632cdef8e36b14e4f6e77829 ]
While reading weight_p, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Also, dev_[rt]x_weight can be read/written at the same time. So, we need to use READ_ONCE() and WRITE_ONCE() for its access. Moreover, to use the same weight_p while changing dev_[rt]x_weight, we add a mutex in proc_do_dev_weight().
Fixes: 3d48b53fb2ae ("net: dev_weight: TX/RX orthogonality") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 2 +- net/core/sysctl_net_core.c | 15 +++++++++------ net/sched/sch_generic.c | 2 +- 3 files changed, 11 insertions(+), 8 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 42f6ff8b9703c..6ee4390863fbe 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -5851,7 +5851,7 @@ static int process_backlog(struct napi_struct *napi, int quota) net_rps_action_and_irq_enable(sd); }
- napi->weight = dev_rx_weight; + napi->weight = READ_ONCE(dev_rx_weight); while (again) { struct sk_buff *skb;
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c index 0a0bf80623658..d7e39167ceca0 100644 --- a/net/core/sysctl_net_core.c +++ b/net/core/sysctl_net_core.c @@ -231,14 +231,17 @@ static int set_default_qdisc(struct ctl_table *table, int write, static int proc_do_dev_weight(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, loff_t *ppos) { - int ret; + static DEFINE_MUTEX(dev_weight_mutex); + int ret, weight;
+ mutex_lock(&dev_weight_mutex); ret = proc_dointvec(table, write, buffer, lenp, ppos); - if (ret != 0) - return ret; - - dev_rx_weight = weight_p * dev_weight_rx_bias; - dev_tx_weight = weight_p * dev_weight_tx_bias; + if (!ret && write) { + weight = READ_ONCE(weight_p); + WRITE_ONCE(dev_rx_weight, weight * dev_weight_rx_bias); + WRITE_ONCE(dev_tx_weight, weight * dev_weight_tx_bias); + } + mutex_unlock(&dev_weight_mutex);
return ret; } diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index cad2586c34734..c966dacf1130b 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -397,7 +397,7 @@ static inline bool qdisc_restart(struct Qdisc *q, int *packets)
void __qdisc_run(struct Qdisc *q) { - int quota = dev_tx_weight; + int quota = READ_ONCE(dev_tx_weight); int packets;
while (qdisc_restart(q, &packets)) {
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 61adf447e38664447526698872e21c04623afb8e ]
While reading netdev_tstamp_prequeue, it can be changed concurrently. Thus, we need to add READ_ONCE() to its readers.
Fixes: 3b098e2d7c69 ("net: Consistent skb timestamping") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/core/dev.c b/net/core/dev.c index 6ee4390863fbe..d5bef2bb0b7c8 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -4474,7 +4474,7 @@ static int netif_rx_internal(struct sk_buff *skb) { int ret;
- net_timestamp_check(netdev_tstamp_prequeue, skb); + net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);
trace_netif_rx(skb);
@@ -4794,7 +4794,7 @@ static int __netif_receive_skb_core(struct sk_buff **pskb, bool pfmemalloc, int ret = NET_RX_DROP; __be16 type;
- net_timestamp_check(!netdev_tstamp_prequeue, skb); + net_timestamp_check(!READ_ONCE(netdev_tstamp_prequeue), skb);
trace_netif_receive_skb(skb);
@@ -5146,7 +5146,7 @@ static int netif_receive_skb_internal(struct sk_buff *skb) { int ret;
- net_timestamp_check(netdev_tstamp_prequeue, skb); + net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb);
if (skb_defer_rx_timestamp(skb)) return NET_RX_SUCCESS; @@ -5176,7 +5176,7 @@ static void netif_receive_skb_list_internal(struct list_head *head)
INIT_LIST_HEAD(&sublist); list_for_each_entry_safe(skb, next, head, list) { - net_timestamp_check(netdev_tstamp_prequeue, skb); + net_timestamp_check(READ_ONCE(netdev_tstamp_prequeue), skb); skb_list_del_init(skb); if (!skb_defer_rx_timestamp(skb)) list_add_tail(&skb->list, &sublist);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 6bae8ceb90ba76cdba39496db936164fa672b9be ]
While reading rs->interval and rs->burst, they can be changed concurrently via sysctl (e.g. net_ratelimit_state). Thus, we need to add READ_ONCE() to their readers.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- lib/ratelimit.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/lib/ratelimit.c b/lib/ratelimit.c index d01f471352390..b805702de84dd 100644 --- a/lib/ratelimit.c +++ b/lib/ratelimit.c @@ -27,10 +27,16 @@ */ int ___ratelimit(struct ratelimit_state *rs, const char *func) { + /* Paired with WRITE_ONCE() in .proc_handler(). + * Changing two values seperately could be inconsistent + * and some message could be lost. (See: net_ratelimit_state). + */ + int interval = READ_ONCE(rs->interval); + int burst = READ_ONCE(rs->burst); unsigned long flags; int ret;
- if (!rs->interval) + if (!interval) return 1;
/* @@ -45,7 +51,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func) if (!rs->begin) rs->begin = jiffies;
- if (time_is_before_jiffies(rs->begin + rs->interval)) { + if (time_is_before_jiffies(rs->begin + interval)) { if (rs->missed) { if (!(rs->flags & RATELIMIT_MSG_ON_RELEASE)) { printk_deferred(KERN_WARNING @@ -57,7 +63,7 @@ int ___ratelimit(struct ratelimit_state *rs, const char *func) rs->begin = jiffies; rs->printed = 0; } - if (rs->burst && rs->burst > rs->printed) { + if (burst && burst > rs->printed) { rs->printed++; ret = 1; } else {
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit d2154b0afa73c0159b2856f875c6b4fe7cf6a95e ]
While reading sysctl_tstamp_allow_data, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: b245be1f4db1 ("net-timestamp: no-payload only sysctl") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/skbuff.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c index c623c129d0ab6..e0be1f8651bbe 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4377,7 +4377,7 @@ static bool skb_may_tx_timestamp(struct sock *sk, bool tsonly) { bool ret;
- if (likely(sysctl_tstamp_allow_data || tsonly)) + if (likely(READ_ONCE(sysctl_tstamp_allow_data) || tsonly)) return true;
read_lock_bh(&sk->sk_callback_lock);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit c42b7cddea47503411bfb5f2f93a4154aaffa2d9 ]
While reading sysctl_net_busy_poll, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 060212928670 ("net: add low latency socket poll") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/busy_poll.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/net/busy_poll.h b/include/net/busy_poll.h index c76a5e9894dac..8f42f6f3af86f 100644 --- a/include/net/busy_poll.h +++ b/include/net/busy_poll.h @@ -43,7 +43,7 @@ extern unsigned int sysctl_net_busy_poll __read_mostly;
static inline bool net_busy_loop_on(void) { - return sysctl_net_busy_poll; + return READ_ONCE(sysctl_net_busy_poll); }
static inline bool sk_can_busy_loop(const struct sock *sk)
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit e59ef36f0795696ab229569c153936bfd068d21c ]
While reading sysctl_net_busy_read, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 2d48d67fa8cd ("net: poll/select low latency socket support") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/sock.c b/net/core/sock.c index 79f085df52cef..cd23a8e4556ca 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2856,7 +2856,7 @@ void sock_init_data(struct socket *sock, struct sock *sk)
#ifdef CONFIG_NET_RX_BUSY_POLL sk->sk_napi_id = 0; - sk->sk_ll_usec = sysctl_net_busy_read; + sk->sk_ll_usec = READ_ONCE(sysctl_net_busy_read); #endif
sk->sk_max_pacing_rate = ~0U;
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 2e0c42374ee32e72948559d2ae2f7ba3dc6b977c ]
While reading netdev_budget, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 51b0bdedb8e7 ("[NET]: Separate two usages of netdev_max_backlog.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c index d5bef2bb0b7c8..c93068ea2e4f2 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6336,7 +6336,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) struct softnet_data *sd = this_cpu_ptr(&softnet_data); unsigned long time_limit = jiffies + usecs_to_jiffies(netdev_budget_usecs); - int budget = netdev_budget; + int budget = READ_ONCE(netdev_budget); LIST_HEAD(list); LIST_HEAD(repoll);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit fa45d484c52c73f79db2c23b0cdfc6c6455093ad ]
While reading netdev_budget_usecs, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 7acf8a1e8a28 ("Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/core/dev.c b/net/core/dev.c index c93068ea2e4f2..880b096eef8a6 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6335,7 +6335,7 @@ static __latent_entropy void net_rx_action(struct softirq_action *h) { struct softnet_data *sd = this_cpu_ptr(&softnet_data); unsigned long time_limit = jiffies + - usecs_to_jiffies(netdev_budget_usecs); + usecs_to_jiffies(READ_ONCE(netdev_budget_usecs)); int budget = READ_ONCE(netdev_budget); LIST_HEAD(list); LIST_HEAD(repoll);
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 3c9ba81d72047f2e81bb535d42856517b613aba7 ]
While reading sysctl_somaxconn, it can be changed concurrently. Thus, we need to add READ_ONCE() to its reader.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/socket.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/socket.c b/net/socket.c index e5cc9f2b981ed..a5167f03c31db 100644 --- a/net/socket.c +++ b/net/socket.c @@ -1619,7 +1619,7 @@ int __sys_listen(int fd, int backlog)
sock = sockfd_lookup_light(fd, &err, &fput_needed); if (sock) { - somaxconn = sock_net(sock->sk)->core.sysctl_somaxconn; + somaxconn = READ_ONCE(sock_net(sock->sk)->core.sysctl_somaxconn); if ((unsigned int)backlog > somaxconn) backlog = somaxconn;
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit 25d7a5f5a6bb15a2dae0a3f39ea5dda215024726 ]
The ixgbe_ptp_start_cyclecounter is intended to be called whenever the cyclecounter parameters need to be changed.
Since commit a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices"), this function has cleared the SYSTIME registers and reset the TSAUXC DISABLE_SYSTIME bit.
While these need to be cleared during ixgbe_ptp_reset, it is wrong to clear them during ixgbe_ptp_start_cyclecounter. This function may be called during both reset and link status change. When link changes, the SYSTIME counter is still operating normally, but the cyclecounter should be updated to account for the possibly changed parameters.
Clearing SYSTIME when link changes causes the timecounter to jump because the cycle counter now reads zero.
Extract the SYSTIME initialization out to a new function and call this during ixgbe_ptp_reset. This prevents the timecounter adjustment and avoids an unnecessary reset of the current time.
This also restores the original SYSTIME clearing that occurred during ixgbe_ptp_reset before the commit above.
Reported-by: Steve Payne spayne@aurora.tech Reported-by: Ilya Evenbach ievenbach@aurora.tech Fixes: a9763f3cb54c ("ixgbe: Update PTP to support X550EM_x devices") Signed-off-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Gurucharan gurucharanx.g@intel.com (A Contingent worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c | 59 +++++++++++++++----- 1 file changed, 46 insertions(+), 13 deletions(-)
diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c index b3e0d8bb5cbd8..eec68cc9288c8 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ptp.c @@ -1066,7 +1066,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter) struct cyclecounter cc; unsigned long flags; u32 incval = 0; - u32 tsauxc = 0; u32 fuse0 = 0;
/* For some of the boards below this mask is technically incorrect. @@ -1101,18 +1100,6 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter) case ixgbe_mac_x550em_a: case ixgbe_mac_X550: cc.read = ixgbe_ptp_read_X550; - - /* enable SYSTIME counter */ - IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0); - IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0); - IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0); - tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC); - IXGBE_WRITE_REG(hw, IXGBE_TSAUXC, - tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME); - IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS); - IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC); - - IXGBE_WRITE_FLUSH(hw); break; case ixgbe_mac_X540: cc.read = ixgbe_ptp_read_82599; @@ -1144,6 +1131,50 @@ void ixgbe_ptp_start_cyclecounter(struct ixgbe_adapter *adapter) spin_unlock_irqrestore(&adapter->tmreg_lock, flags); }
+/** + * ixgbe_ptp_init_systime - Initialize SYSTIME registers + * @adapter: the ixgbe private board structure + * + * Initialize and start the SYSTIME registers. + */ +static void ixgbe_ptp_init_systime(struct ixgbe_adapter *adapter) +{ + struct ixgbe_hw *hw = &adapter->hw; + u32 tsauxc; + + switch (hw->mac.type) { + case ixgbe_mac_X550EM_x: + case ixgbe_mac_x550em_a: + case ixgbe_mac_X550: + tsauxc = IXGBE_READ_REG(hw, IXGBE_TSAUXC); + + /* Reset SYSTIME registers to 0 */ + IXGBE_WRITE_REG(hw, IXGBE_SYSTIMR, 0); + IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0); + IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0); + + /* Reset interrupt settings */ + IXGBE_WRITE_REG(hw, IXGBE_TSIM, IXGBE_TSIM_TXTS); + IXGBE_WRITE_REG(hw, IXGBE_EIMS, IXGBE_EIMS_TIMESYNC); + + /* Activate the SYSTIME counter */ + IXGBE_WRITE_REG(hw, IXGBE_TSAUXC, + tsauxc & ~IXGBE_TSAUXC_DISABLE_SYSTIME); + break; + case ixgbe_mac_X540: + case ixgbe_mac_82599EB: + /* Reset SYSTIME registers to 0 */ + IXGBE_WRITE_REG(hw, IXGBE_SYSTIML, 0); + IXGBE_WRITE_REG(hw, IXGBE_SYSTIMH, 0); + break; + default: + /* Other devices aren't supported */ + return; + }; + + IXGBE_WRITE_FLUSH(hw); +} + /** * ixgbe_ptp_reset * @adapter: the ixgbe private board structure @@ -1170,6 +1201,8 @@ void ixgbe_ptp_reset(struct ixgbe_adapter *adapter)
ixgbe_ptp_start_cyclecounter(adapter);
+ ixgbe_ptp_init_systime(adapter); + spin_lock_irqsave(&adapter->tmreg_lock, flags); timecounter_init(&adapter->hw_tc, &adapter->hw_cc, ktime_to_ns(ktime_get_real()));
From: Goldwyn Rodrigues rgoldwyn@suse.de
commit b51111271b0352aa596c5ae8faf06939e91b3b68 upstream.
For a filesystem which has btrfs read-only property set to true, all write operations including xattr should be denied. However, security xattr can still be changed even if btrfs ro property is true.
This happens because xattr_permission() does not have any restrictions on security.*, system.* and in some cases trusted.* from VFS and the decision is left to the underlying filesystem. See comments in xattr_permission() for more details.
This patch checks if the root is read-only before performing the set xattr operation.
Testcase:
DEV=/dev/vdb MNT=/mnt
mkfs.btrfs -f $DEV mount $DEV $MNT echo "file one" > $MNT/f1
setfattr -n "security.one" -v 2 $MNT/f1 btrfs property set /mnt ro true
setfattr -n "security.one" -v 1 $MNT/f1
umount $MNT
CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo wqu@suse.com Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Goldwyn Rodrigues rgoldwyn@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/xattr.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/xattr.c +++ b/fs/btrfs/xattr.c @@ -369,6 +369,9 @@ static int btrfs_xattr_handler_set(const const char *name, const void *buffer, size_t size, int flags) { + if (btrfs_root_readonly(BTRFS_I(inode)->root)) + return -EROFS; + name = xattr_full_name(handler, name); return btrfs_setxattr(NULL, inode, name, buffer, size, flags); }
From: Chen Zhongjin chenzhongjin@huawei.com
commit fc2e426b1161761561624ebd43ce8c8d2fa058da upstream.
When meeting ftrace trampolines in ORC unwinding, unwinder uses address of ftrace_{regs_}call address to find the ORC entry, which gets next frame at sp+176.
If there is an IRQ hitting at sub $0xa8,%rsp, the next frame should be sp+8 instead of 176. It makes unwinder skip correct frame and throw warnings such as "wrong direction" or "can't access registers", etc, depending on the content of the incorrect frame address.
By adding the base address ftrace_{regs_}caller with the offset *ip - ops->trampoline*, we can get the correct address to find the ORC entry.
Also change "caller" to "tramp_addr" to make variable name conform to its content.
[ mingo: Clarified the changelog a bit. ]
Fixes: 6be7fa3c74d1 ("ftrace, orc, x86: Handle ftrace dynamically allocated trampolines") Signed-off-by: Chen Zhongjin chenzhongjin@huawei.com Signed-off-by: Ingo Molnar mingo@kernel.org Reviewed-by: Steven Rostedt (Google) rostedt@goodmis.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220819084334.244016-1-chenzhongjin@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/unwind_orc.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/arch/x86/kernel/unwind_orc.c +++ b/arch/x86/kernel/unwind_orc.c @@ -89,22 +89,27 @@ static struct orc_entry *orc_find(unsign static struct orc_entry *orc_ftrace_find(unsigned long ip) { struct ftrace_ops *ops; - unsigned long caller; + unsigned long tramp_addr, offset;
ops = ftrace_ops_trampoline(ip); if (!ops) return NULL;
+ /* Set tramp_addr to the start of the code copied by the trampoline */ if (ops->flags & FTRACE_OPS_FL_SAVE_REGS) - caller = (unsigned long)ftrace_regs_call; + tramp_addr = (unsigned long)ftrace_regs_caller; else - caller = (unsigned long)ftrace_call; + tramp_addr = (unsigned long)ftrace_caller; + + /* Now place tramp_addr to the location within the trampoline ip is at */ + offset = ip - ops->trampoline; + tramp_addr += offset;
/* Prevent unlikely recursion */ - if (ip == caller) + if (ip == tramp_addr) return NULL;
- return orc_find(caller); + return orc_find(tramp_addr); } #else static struct orc_entry *orc_ftrace_find(unsigned long ip)
From: Siddh Raman Pant code@siddh.me
commit c490a0b5a4f36da3918181a8acdc6991d967c5f3 upstream.
The userspace can configure a loop using an ioctl call, wherein a configuration of type loop_config is passed (see lo_ioctl()'s case on line 1550 of drivers/block/loop.c). This proceeds to call loop_configure() which in turn calls loop_set_status_from_info() (see line 1050 of loop.c), passing &config->info which is of type loop_info64*. This function then sets the appropriate values, like the offset.
loop_device has lo_offset of type loff_t (see line 52 of loop.c), which is typdef-chained to long long, whereas loop_info64 has lo_offset of type __u64 (see line 56 of include/uapi/linux/loop.h).
The function directly copies offset from info to the device as follows (See line 980 of loop.c): lo->lo_offset = info->lo_offset;
This results in an overflow, which triggers a warning in iomap_iter() due to a call to iomap_iter_done() which has: WARN_ON_ONCE(iter->iomap.offset > iter->pos);
Thus, check for negative value during loop_set_status_from_info().
Bug report: https://syzkaller.appspot.com/bug?id=c620fe14aac810396d3c3edc9ad73848bf69a29...
Reported-and-tested-by: syzbot+a8e049cd3abd342936b6@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Reviewed-by: Matthew Wilcox (Oracle) willy@infradead.org Signed-off-by: Siddh Raman Pant code@siddh.me Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20220823160810.181275-1-code@siddh.me Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/loop.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/block/loop.c +++ b/drivers/block/loop.c @@ -1351,6 +1351,11 @@ loop_get_status(struct loop_device *lo, info->lo_number = lo->lo_number; info->lo_offset = lo->lo_offset; info->lo_sizelimit = lo->lo_sizelimit; + + /* loff_t vars have been assigned __u64 */ + if (lo->lo_offset < 0 || lo->lo_sizelimit < 0) + return -EOVERFLOW; + info->lo_flags = lo->lo_flags; memcpy(info->lo_file_name, lo->lo_file_name, LO_NAME_SIZE); memcpy(info->lo_crypt_name, lo->lo_crypt_name, LO_NAME_SIZE);
From: Quanyang Wang quanyang.wang@windriver.com
commit 0c7d7cc2b4fe2e74ef8728f030f0f1674f9f6aee upstream.
There are two problems with the current code of memory_intersects:
First, it doesn't check whether the region (begin, end) falls inside the region (virt, vend), that is (virt < begin && vend > end).
The second problem is if vend is equal to begin, it will return true but this is wrong since vend (virt + size) is not the last address of the memory region but (virt + size -1) is. The wrong determination will trigger the misreporting when the function check_for_illegal_area calls memory_intersects to check if the dma region intersects with stext region.
The misreporting is as below (stext is at 0x80100000): WARNING: CPU: 0 PID: 77 at kernel/dma/debug.c:1073 check_for_illegal_area+0x130/0x168 DMA-API: chipidea-usb2 e0002000.usb: device driver maps memory from kernel text or rodata [addr=800f0000] [len=65536] Modules linked in: CPU: 1 PID: 77 Comm: usb-storage Not tainted 5.19.0-yocto-standard #5 Hardware name: Xilinx Zynq Platform unwind_backtrace from show_stack+0x18/0x1c show_stack from dump_stack_lvl+0x58/0x70 dump_stack_lvl from __warn+0xb0/0x198 __warn from warn_slowpath_fmt+0x80/0xb4 warn_slowpath_fmt from check_for_illegal_area+0x130/0x168 check_for_illegal_area from debug_dma_map_sg+0x94/0x368 debug_dma_map_sg from __dma_map_sg_attrs+0x114/0x128 __dma_map_sg_attrs from dma_map_sg_attrs+0x18/0x24 dma_map_sg_attrs from usb_hcd_map_urb_for_dma+0x250/0x3b4 usb_hcd_map_urb_for_dma from usb_hcd_submit_urb+0x194/0x214 usb_hcd_submit_urb from usb_sg_wait+0xa4/0x118 usb_sg_wait from usb_stor_bulk_transfer_sglist+0xa0/0xec usb_stor_bulk_transfer_sglist from usb_stor_bulk_srb+0x38/0x70 usb_stor_bulk_srb from usb_stor_Bulk_transport+0x150/0x360 usb_stor_Bulk_transport from usb_stor_invoke_transport+0x38/0x440 usb_stor_invoke_transport from usb_stor_control_thread+0x1e0/0x238 usb_stor_control_thread from kthread+0xf8/0x104 kthread from ret_from_fork+0x14/0x2c
Refactor memory_intersects to fix the two problems above.
Before the 1d7db834a027e ("dma-debug: use memory_intersects() directly"), memory_intersects is called only by printk_late_init:
printk_late_init -> init_section_intersects ->memory_intersects.
There were few places where memory_intersects was called.
When commit 1d7db834a027e ("dma-debug: use memory_intersects() directly") was merged and CONFIG_DMA_API_DEBUG is enabled, the DMA subsystem uses it to check for an illegal area and the calltrace above is triggered.
[akpm@linux-foundation.org: fix nearby comment typo] Link: https://lkml.kernel.org/r/20220819081145.948016-1-quanyang.wang@windriver.co... Fixes: 979559362516 ("asm/sections: add helpers to check for section data") Signed-off-by: Quanyang Wang quanyang.wang@windriver.com Cc: Ard Biesheuvel ardb@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Thierry Reding treding@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/asm-generic/sections.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/include/asm-generic/sections.h +++ b/include/asm-generic/sections.h @@ -100,7 +100,7 @@ static inline bool memory_contains(void /** * memory_intersects - checks if the region occupied by an object intersects * with another memory region - * @begin: virtual address of the beginning of the memory regien + * @begin: virtual address of the beginning of the memory region * @end: virtual address of the end of the memory region * @virt: virtual address of the memory object * @size: size of the memory object @@ -113,7 +113,10 @@ static inline bool memory_intersects(voi { void *vend = virt + size;
- return (virt >= begin && virt < end) || (vend >= begin && vend < end); + if (virt < end && vend > begin) + return true; + + return false; }
/**
From: Brian Foster bfoster@redhat.com
commit 13cccafe0edcd03bf1c841de8ab8a1c8e34f77d9 upstream.
The pointers for guarded storage and runtime instrumentation control blocks are stored in the thread_struct of the associated task. These pointers are initially copied on fork() via arch_dup_task_struct() and then cleared via copy_thread() before fork() returns. If fork() happens to fail after the initial task dup and before copy_thread(), the newly allocated task and associated thread_struct memory are freed via free_task() -> arch_release_task_struct(). This results in a double free of the guarded storage and runtime info structs because the fields in the failed task still refer to memory associated with the source task.
This problem can manifest as a BUG_ON() in set_freepointer() (with CONFIG_SLAB_FREELIST_HARDENED enabled) or KASAN splat (if enabled) when running trinity syscall fuzz tests on s390x. To avoid this problem, clear the associated pointer fields in arch_dup_task_struct() immediately after the new task is copied. Note that the RI flag is still cleared in copy_thread() because it resides in thread stack memory and that is where stack info is copied.
Signed-off-by: Brian Foster bfoster@redhat.com Fixes: 8d9047f8b967c ("s390/runtime instrumentation: simplify task exit handling") Fixes: 7b83c6297d2fc ("s390/guarded storage: simplify task exit handling") Cc: stable@vger.kernel.org # 4.15 Reviewed-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Link: https://lore.kernel.org/r/20220816155407.537372-1-bfoster@redhat.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kernel/process.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
--- a/arch/s390/kernel/process.c +++ b/arch/s390/kernel/process.c @@ -75,6 +75,18 @@ int arch_dup_task_struct(struct task_str
memcpy(dst, src, arch_task_struct_size); dst->thread.fpu.regs = dst->thread.fpu.fprs; + + /* + * Don't transfer over the runtime instrumentation or the guarded + * storage control block pointers. These fields are cleared here instead + * of in copy_thread() to avoid premature freeing of associated memory + * on fork() failure. Wait to clear the RI flag because ->stack still + * refers to the source thread. + */ + dst->thread.ri_cb = NULL; + dst->thread.gs_cb = NULL; + dst->thread.gs_bc_cb = NULL; + return 0; }
@@ -131,13 +143,11 @@ int copy_thread_tls(unsigned long clone_ frame->childregs.flags = 0; if (new_stackp) frame->childregs.gprs[15] = new_stackp; - - /* Don't copy runtime instrumentation info */ - p->thread.ri_cb = NULL; + /* + * Clear the runtime instrumentation flag after the above childregs + * copy. The CB pointer was already cleared in arch_dup_task_struct(). + */ frame->childregs.psw.mask &= ~PSW_MASK_RI; - /* Don't copy guarded storage control block */ - p->thread.gs_cb = NULL; - p->thread.gs_bc_cb = NULL;
/* Set a new TLS ? */ if (clone_flags & CLONE_SETTLS) {
From: David Hildenbrand david@redhat.com
commit f96f7a40874d7c746680c0b9f57cef2262ae551f upstream.
Patch series "mm/hugetlb: fix write-fault handling for shared mappings", v2.
I observed that hugetlb does not support/expect write-faults in shared mappings that would have to map the R/O-mapped page writable -- and I found two case where we could currently get such faults and would erroneously map an anon page into a shared mapping.
Reproducers part of the patches.
I propose to backport both fixes to stable trees. The first fix needs a small adjustment.
This patch (of 2):
Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable.
Looks like we were wrong:
-------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h>
#define HUGETLB_SIZE (2 * 1024 * 1024u)
static void clear_softdirty(void) { int fd = open("/proc/self/clear_refs", O_WRONLY); const char *ctrl = "4"; int ret;
if (fd < 0) { fprintf(stderr, "open(clear_refs) failed\n"); exit(1); } ret = write(fd, ctrl, strlen(ctrl)); if (ret != strlen(ctrl)) { fprintf(stderr, "write(clear_refs) failed\n"); exit(1); } close(fd); }
int main(int argc, char **argv) { char *map; int fd;
fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT); if (!fd) { fprintf(stderr, "open() failed\n"); return -errno; } if (ftruncate(fd, HUGETLB_SIZE)) { fprintf(stderr, "ftruncate() failed\n"); return -errno; }
map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (map == MAP_FAILED) { fprintf(stderr, "mmap() failed\n"); return -errno; }
*map = 0;
if (mprotect(map, HUGETLB_SIZE, PROT_READ)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; }
clear_softdirty();
if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; }
*map = 0;
return 0; } --------------------------------------------------------------------------
Above test fails with SIGBUS when there is only a single free hugetlb page. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped)
And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0
Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support softdirty tracking.
Link: https://lkml.kernel.org/r/20220811103435.188481-1-david@redhat.com Link: https://lkml.kernel.org/r/20220811103435.188481-2-david@redhat.com Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Signed-off-by: David Hildenbrand david@redhat.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Cc: Peter Feiner pfeiner@google.com Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Cyrill Gorcunov gorcunov@openvz.org Cc: Pavel Emelyanov xemul@parallels.com Cc: Jamie Liu jamieliu@google.com Cc: Hugh Dickins hughd@google.com Cc: Naoya Horiguchi n-horiguchi@ah.jp.nec.com Cc: Bjorn Helgaas bhelgaas@google.com Cc: Muchun Song songmuchun@bytedance.com Cc: Peter Xu peterx@redhat.com Cc: stable@vger.kernel.org [3.18+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: David Hildenbrand david@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mmap.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -1640,8 +1640,12 @@ int vma_wants_writenotify(struct vm_area pgprot_val(vm_pgprot_modify(vm_page_prot, vm_flags))) return 0;
- /* Do we need to track softdirty? */ - if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY)) + /* + * Do we need to track softdirty? hugetlb does not support softdirty + * tracking yet. + */ + if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !(vm_flags & VM_SOFTDIRTY) && + !is_vm_hugetlb_page(vma)) return 1;
/* Specialty mapping? */
From: Guoqing Jiang guoqing.jiang@linux.dev
commit 0dd84b319352bb8ba64752d4e45396d8b13e6018 upstream.
From the link [1], we can see raid1d was running even after the path
raid_dtr -> md_stop -> __md_stop.
Let's stop write first in destructor to align with normal md-raid to fix the KASAN issue.
[1]. https://lore.kernel.org/linux-raid/CAPhsuW5gc4AakdGNdF8ubpezAuDLFOYUO_sfMZce...
Fixes: 48df498daf62 ("md: move bitmap_destroy to the beginning of __md_stop") Reported-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Guoqing Jiang guoqing.jiang@linux.dev Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/md.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -5937,6 +5937,7 @@ void md_stop(struct mddev *mddev) /* stop the array and free an attached data structures. * This is called from dm-raid */ + __md_stop_writes(mddev); __md_stop(mddev); bioset_exit(&mddev->bio_set); bioset_exit(&mddev->sync_set);
From: Saurabh Sengar ssengar@linux.microsoft.com
commit d957e7ffb2c72410bcc1a514153a46719255a5da upstream.
storvsc_error_wq workqueue should not be marked as WQ_MEM_RECLAIM as it doesn't need to make forward progress under memory pressure. Marking this workqueue as WQ_MEM_RECLAIM may cause deadlock while flushing a non-WQ_MEM_RECLAIM workqueue. In the current state it causes the following warning:
[ 14.506347] ------------[ cut here ]------------ [ 14.506354] workqueue: WQ_MEM_RECLAIM storvsc_error_wq_0:storvsc_remove_lun is flushing !WQ_MEM_RECLAIM events_freezable_power_:disk_events_workfn [ 14.506360] WARNING: CPU: 0 PID: 8 at <-snip->kernel/workqueue.c:2623 check_flush_dependency+0xb5/0x130 [ 14.506390] CPU: 0 PID: 8 Comm: kworker/u4:0 Not tainted 5.4.0-1086-azure #91~18.04.1-Ubuntu [ 14.506391] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 05/09/2022 [ 14.506393] Workqueue: storvsc_error_wq_0 storvsc_remove_lun [ 14.506395] RIP: 0010:check_flush_dependency+0xb5/0x130 <-snip-> [ 14.506408] Call Trace: [ 14.506412] __flush_work+0xf1/0x1c0 [ 14.506414] __cancel_work_timer+0x12f/0x1b0 [ 14.506417] ? kernfs_put+0xf0/0x190 [ 14.506418] cancel_delayed_work_sync+0x13/0x20 [ 14.506420] disk_block_events+0x78/0x80 [ 14.506421] del_gendisk+0x3d/0x2f0 [ 14.506423] sr_remove+0x28/0x70 [ 14.506427] device_release_driver_internal+0xef/0x1c0 [ 14.506428] device_release_driver+0x12/0x20 [ 14.506429] bus_remove_device+0xe1/0x150 [ 14.506431] device_del+0x167/0x380 [ 14.506432] __scsi_remove_device+0x11d/0x150 [ 14.506433] scsi_remove_device+0x26/0x40 [ 14.506434] storvsc_remove_lun+0x40/0x60 [ 14.506436] process_one_work+0x209/0x400 [ 14.506437] worker_thread+0x34/0x400 [ 14.506439] kthread+0x121/0x140 [ 14.506440] ? process_one_work+0x400/0x400 [ 14.506441] ? kthread_park+0x90/0x90 [ 14.506443] ret_from_fork+0x35/0x40 [ 14.506445] ---[ end trace 2d9633159fdc6ee7 ]---
Link: https://lore.kernel.org/r/1659628534-17539-1-git-send-email-ssengar@linux.mi... Fixes: 436ad9413353 ("scsi: storvsc: Allow only one remove lun work item to be issued per lun") Reviewed-by: Michael Kelley mikelley@microsoft.com Signed-off-by: Saurabh Sengar ssengar@linux.microsoft.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/storvsc_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/scsi/storvsc_drv.c +++ b/drivers/scsi/storvsc_drv.c @@ -1858,7 +1858,7 @@ static int storvsc_probe(struct hv_devic */ host_dev->handle_error_wq = alloc_ordered_workqueue("storvsc_error_wq_%d", - WQ_MEM_RECLAIM, + 0, host->host_no); if (!host_dev->handle_error_wq) goto err_out2;
From: Jann Horn jannh@google.com
commit b67fbebd4cf980aecbcc750e1462128bffe8ae15 upstream.
Some drivers rely on having all VMAs through which a PFN might be accessible listed in the rmap for correctness. However, on X86, it was possible for a VMA with stale TLB entries to not be listed in the rmap.
This was fixed in mainline with commit b67fbebd4cf9 ("mmu_gather: Force tlb-flush VM_PFNMAP vmas"), but that commit relies on preceding refactoring in commit 18ba064e42df3 ("mmu_gather: Let there be one tlb_{start,end}_vma() implementation") and commit 1e9fdf21a4339 ("mmu_gather: Remove per arch tlb_{start,end}_vma()").
This patch provides equivalent protection without needing that refactoring, by forcing a TLB flush between removing PTEs in unmap_vmas() and the call to unlink_file_vma() in free_pgtables().
[This is a stable-specific rewrite of the upstream commit!] Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mmap.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/mm/mmap.c +++ b/mm/mmap.c @@ -2572,6 +2572,18 @@ static void unmap_region(struct mm_struc tlb_gather_mmu(&tlb, mm, start, end); update_hiwater_rss(mm); unmap_vmas(&tlb, vma, start, end); + + /* + * Ensure we have no stale TLB entries by the time this mapping is + * removed from the rmap. + * Note that we don't have to worry about nested flushes here because + * we're holding the mm semaphore for removing the mapping - so any + * concurrent flush in this region has to be coming through the rmap, + * and we synchronize against that using the rmap lock. + */ + if ((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) != 0) + tlb_flush_mmu(&tlb); + free_pgtables(&tlb, vma, prev ? prev->vm_end : FIRST_USER_ADDRESS, next ? next->vm_start : USER_PGTABLES_CEILING); tlb_finish_mmu(&tlb, start, end);
From: Hsin-Yi Wang hsinyi@chromium.org
commit e112b032a72c78f15d0c803c5dc6be444c2e6c66 upstream.
Currently in arm64, FDT is mapped to RO before it's passed to early_init_dt_scan(). However, there might be some codes (eg. commit "fdt: add support for rng-seed") that need to modify FDT during init. Map FDT to RO after early fixups are done.
Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org Reviewed-by: Stephen Boyd swboyd@chromium.org Reviewed-by: Mike Rapoport rppt@linux.ibm.com Signed-off-by: Will Deacon will@kernel.org [mkbestas: fixed trivial conflicts for 4.19 backport] Signed-off-by: Michael Bestas mkbestas@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/include/asm/mmu.h | 2 +- arch/arm64/kernel/kaslr.c | 5 +---- arch/arm64/kernel/setup.c | 9 ++++++++- arch/arm64/mm/mmu.c | 15 +-------------- 4 files changed, 11 insertions(+), 20 deletions(-)
--- a/arch/arm64/include/asm/mmu.h +++ b/arch/arm64/include/asm/mmu.h @@ -98,7 +98,7 @@ extern void init_mem_pgprot(void); extern void create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys, unsigned long virt, phys_addr_t size, pgprot_t prot, bool page_mappings_only); -extern void *fixmap_remap_fdt(phys_addr_t dt_phys); +extern void *fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot); extern void mark_linear_text_alias_ro(void);
#endif /* !__ASSEMBLY__ */ --- a/arch/arm64/kernel/kaslr.c +++ b/arch/arm64/kernel/kaslr.c @@ -65,9 +65,6 @@ out: return default_cmdline; }
-extern void *__init __fixmap_remap_fdt(phys_addr_t dt_phys, int *size, - pgprot_t prot); - /* * This routine will be executed with the kernel mapped at its default virtual * address, and if it returns successfully, the kernel will be remapped, and @@ -96,7 +93,7 @@ u64 __init kaslr_early_init(u64 dt_phys) * attempt at mapping the FDT in setup_machine() */ early_fixmap_init(); - fdt = __fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); + fdt = fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); if (!fdt) return 0;
--- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -183,9 +183,13 @@ static void __init smp_build_mpidr_hash(
static void __init setup_machine_fdt(phys_addr_t dt_phys) { - void *dt_virt = fixmap_remap_fdt(dt_phys); + int size; + void *dt_virt = fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL); const char *name;
+ if (dt_virt) + memblock_reserve(dt_phys, size); + if (!dt_virt || !early_init_dt_scan(dt_virt)) { pr_crit("\n" "Error: invalid device tree blob at physical address %pa (virtual address 0x%p)\n" @@ -197,6 +201,9 @@ static void __init setup_machine_fdt(phy cpu_relax(); }
+ /* Early fixups are done, map the FDT as read-only now */ + fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL_RO); + name = of_flat_dt_get_machine_name(); if (!name) return; --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -859,7 +859,7 @@ void __set_fixmap(enum fixed_addresses i } }
-void *__init __fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot) +void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot) { const u64 dt_virt_base = __fix_to_virt(FIX_FDT); int offset; @@ -912,19 +912,6 @@ void *__init __fixmap_remap_fdt(phys_add return dt_virt; }
-void *__init fixmap_remap_fdt(phys_addr_t dt_phys) -{ - void *dt_virt; - int size; - - dt_virt = __fixmap_remap_fdt(dt_phys, &size, PAGE_KERNEL_RO); - if (!dt_virt) - return NULL; - - memblock_reserve(dt_phys, size); - return dt_virt; -} - int __init arch_ioremap_pud_supported(void) { /*
From: Maxim Mikityanskiy maximmi@nvidia.com
commit 2fa7d94afc1afbb4d702760c058dc2d7ed30f226 upstream.
The first commit cited below attempts to fix the off-by-one error that appeared in some comparisons with an open range. Due to this error, arithmetically equivalent pieces of code could get different verdicts from the verifier, for example (pseudocode):
// 1. Passes the verifier: if (data + 8 > data_end) return early read *(u64 *)data, i.e. [data; data+7]
// 2. Rejected by the verifier (should still pass): if (data + 7 >= data_end) return early read *(u64 *)data, i.e. [data; data+7]
The attempted fix, however, shifts the range by one in a wrong direction, so the bug not only remains, but also such piece of code starts failing in the verifier:
// 3. Rejected by the verifier, but the check is stricter than in #1. if (data + 8 >= data_end) return early read *(u64 *)data, i.e. [data; data+7]
The change performed by that fix converted an off-by-one bug into off-by-two. The second commit cited below added the BPF selftests written to ensure than code chunks like #3 are rejected, however, they should be accepted.
This commit fixes the off-by-two error by adjusting new_range in the right direction and fixes the tests by changing the range into the one that should actually fail.
Fixes: fb2a311a31d3 ("bpf: fix off by one for range markings with L{T, E} patterns") Fixes: b37242c773b2 ("bpf: add test cases to bpf selftests to cover all access tests") Signed-off-by: Maxim Mikityanskiy maximmi@nvidia.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20211130181607.593149-1-maximmi@nvidia.com [OP: cherry-pick selftest changes only] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/test_verifier.c | 32 ++++++++++++++-------------- 1 file changed, 16 insertions(+), 16 deletions(-)
--- a/tools/testing/selftests/bpf/test_verifier.c +++ b/tools/testing/selftests/bpf/test_verifier.c @@ -9108,10 +9108,10 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9166,10 +9166,10 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9279,9 +9279,9 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9451,9 +9451,9 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data_end)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9564,10 +9564,10 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGT, BPF_REG_3, BPF_REG_1, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9622,10 +9622,10 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLT, BPF_REG_1, BPF_REG_3, 1), BPF_JMP_IMM(BPF_JA, 0, 0, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9735,9 +9735,9 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JGE, BPF_REG_1, BPF_REG_3, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), }, @@ -9907,9 +9907,9 @@ static struct bpf_test tests[] = { BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, offsetof(struct xdp_md, data)), BPF_MOV64_REG(BPF_REG_1, BPF_REG_2), - BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 8), + BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 6), BPF_JMP_REG(BPF_JLE, BPF_REG_3, BPF_REG_1, 1), - BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -8), + BPF_LDX_MEM(BPF_DW, BPF_REG_0, BPF_REG_1, -6), BPF_MOV64_IMM(BPF_REG_0, 0), BPF_EXIT_INSN(), },
From: Stanislav Fomichev sdf@google.com
commit 5366d2269139ba8eb6a906d73a0819947e3e4e0a upstream.
Commit 294f2fc6da27 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()") changed the way verifier logs some of its state, adjust the test_align accordingly. Where possible, I tried to not copy-paste the entire log line and resorted to dropping the last closing brace instead.
Fixes: 294f2fc6da27 ("bpf: Verifer, adjust_scalar_min_max_vals to always call update_reg_bounds()") Signed-off-by: Stanislav Fomichev sdf@google.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20200515194904.229296-1-sdf@google.com [OP: adjust for 4.19 selftests, apply only the relevant diffs] Signed-off-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/test_align.c | 27 ++++++++++++++------------- 1 file changed, 14 insertions(+), 13 deletions(-)
--- a/tools/testing/selftests/bpf/test_align.c +++ b/tools/testing/selftests/bpf/test_align.c @@ -359,15 +359,15 @@ static struct bpf_align_test tests[] = { * is still (4n), fixed offset is not changed. * Also, we create a new reg->id. */ - {29, "R5_w=pkt(id=4,off=18,r=0,umax_value=2040,var_off=(0x0; 0x7fc))"}, + {29, "R5_w=pkt(id=4,off=18,r=0,umax_value=2040,var_off=(0x0; 0x7fc)"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (18) * which is 20. Then the variable offset is (4n), so * the total offset is 4-byte aligned and meets the * load's requirements. */ - {33, "R4=pkt(id=4,off=22,r=22,umax_value=2040,var_off=(0x0; 0x7fc))"}, - {33, "R5=pkt(id=4,off=18,r=22,umax_value=2040,var_off=(0x0; 0x7fc))"}, + {33, "R4=pkt(id=4,off=22,r=22,umax_value=2040,var_off=(0x0; 0x7fc)"}, + {33, "R5=pkt(id=4,off=18,r=22,umax_value=2040,var_off=(0x0; 0x7fc)"}, }, }, { @@ -410,15 +410,15 @@ static struct bpf_align_test tests[] = { /* Adding 14 makes R6 be (4n+2) */ {9, "R6_w=inv(id=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, /* Packet pointer has (4n+2) offset */ - {11, "R5_w=pkt(id=1,off=0,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, - {13, "R4=pkt(id=1,off=4,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, + {11, "R5_w=pkt(id=1,off=0,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc)"}, + {13, "R4=pkt(id=1,off=4,r=0,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc)"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (0) * which is 2. Then the variable offset is (4n+2), so * the total offset is 4-byte aligned and meets the * load's requirements. */ - {15, "R5=pkt(id=1,off=0,r=4,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc))"}, + {15, "R5=pkt(id=1,off=0,r=4,umin_value=14,umax_value=1034,var_off=(0x2; 0x7fc)"}, /* Newly read value in R6 was shifted left by 2, so has * known alignment of 4. */ @@ -426,15 +426,15 @@ static struct bpf_align_test tests[] = { /* Added (4n) to packet pointer's (4n+2) var_off, giving * another (4n+2). */ - {19, "R5_w=pkt(id=2,off=0,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc))"}, - {21, "R4=pkt(id=2,off=4,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc))"}, + {19, "R5_w=pkt(id=2,off=0,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc)"}, + {21, "R4=pkt(id=2,off=4,r=0,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc)"}, /* At the time the word size load is performed from R5, * its total fixed offset is NET_IP_ALIGN + reg->off (0) * which is 2. Then the variable offset is (4n+2), so * the total offset is 4-byte aligned and meets the * load's requirements. */ - {23, "R5=pkt(id=2,off=0,r=4,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc))"}, + {23, "R5=pkt(id=2,off=0,r=4,umin_value=14,umax_value=2054,var_off=(0x2; 0xffc)"}, }, }, { @@ -469,11 +469,11 @@ static struct bpf_align_test tests[] = { .matches = { {4, "R5_w=pkt_end(id=0,off=0,imm=0)"}, /* (ptr - ptr) << 2 == unknown, (4n) */ - {6, "R5_w=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc))"}, + {6, "R5_w=inv(id=0,smax_value=9223372036854775804,umax_value=18446744073709551612,var_off=(0x0; 0xfffffffffffffffc)"}, /* (4n) + 14 == (4n+2). We blow our bounds, because * the add could overflow. */ - {7, "R5=inv(id=0,var_off=(0x2; 0xfffffffffffffffc))"}, + {7, "R5=inv(id=0,smin_value=-9223372036854775806,smax_value=9223372036854775806,umin_value=2,umax_value=18446744073709551614,var_off=(0x2; 0xfffffffffffffffc)"}, /* Checked s>=0 */ {9, "R5=inv(id=0,umin_value=2,umax_value=9223372036854775806,var_off=(0x2; 0x7ffffffffffffffc))"}, /* packet pointer + nonnegative (4n+2) */ @@ -528,7 +528,7 @@ static struct bpf_align_test tests[] = { /* New unknown value in R7 is (4n) */ {11, "R7_w=inv(id=0,umax_value=1020,var_off=(0x0; 0x3fc))"}, /* Subtracting it from R6 blows our unsigned bounds */ - {12, "R6=inv(id=0,smin_value=-1006,smax_value=1034,var_off=(0x2; 0xfffffffffffffffc))"}, + {12, "R6=inv(id=0,smin_value=-1006,smax_value=1034,umin_value=2,umax_value=18446744073709551614,var_off=(0x2; 0xfffffffffffffffc)"}, /* Checked s>= 0 */ {14, "R6=inv(id=0,umin_value=2,umax_value=1034,var_off=(0x2; 0x7fc))"}, /* At the time the word size load is performed from R5, @@ -537,7 +537,8 @@ static struct bpf_align_test tests[] = { * the total offset is 4-byte aligned and meets the * load's requirements. */ - {20, "R5=pkt(id=1,off=0,r=4,umin_value=2,umax_value=1034,var_off=(0x2; 0x7fc))"}, + {20, "R5=pkt(id=1,off=0,r=4,umin_value=2,umax_value=1034,var_off=(0x2; 0x7fc)"}, + }, }, {
From: Gerald Schaefer gerald.schaefer@linux.ibm.com
commit 41ac42f137080bc230b5882e3c88c392ab7f2d32 upstream.
For non-protection pXd_none() page faults in do_dat_exception(), we call do_exception() with access == (VM_READ | VM_WRITE | VM_EXEC). In do_exception(), vma->vm_flags is checked against that before calling handle_mm_fault().
Since commit 92f842eac7ee3 ("[S390] store indication fault optimization"), we call handle_mm_fault() with FAULT_FLAG_WRITE, when recognizing that it was a write access. However, the vma flags check is still only checking against (VM_READ | VM_WRITE | VM_EXEC), and therefore also calling handle_mm_fault() with FAULT_FLAG_WRITE in cases where the vma does not allow VM_WRITE.
Fix this by changing access check in do_exception() to VM_WRITE only, when recognizing write access.
Link: https://lkml.kernel.org/r/20220811103435.188481-3-david@redhat.com Fixes: 92f842eac7ee3 ("[S390] store indication fault optimization") Cc: stable@vger.kernel.org Reported-by: David Hildenbrand david@redhat.com Reviewed-by: Heiko Carstens hca@linux.ibm.com Signed-off-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Gerald Schaefer gerald.schaefer@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/mm/fault.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/s390/mm/fault.c +++ b/arch/s390/mm/fault.c @@ -455,7 +455,9 @@ static inline vm_fault_t do_exception(st flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; if (user_mode(regs)) flags |= FAULT_FLAG_USER; - if (access == VM_WRITE || (trans_exc_code & store_indication) == 0x400) + if ((trans_exc_code & store_indication) == 0x400) + access = VM_WRITE; + if (access == VM_WRITE) flags |= FAULT_FLAG_WRITE; down_read(&mm->mmap_sem);
From: Pawan Gupta pawan.kumar.gupta@linux.intel.com
commit 7df548840c496b0141fb2404b889c346380c2b22 upstream.
Older Intel CPUs that are not in the affected processor list for MMIO Stale Data vulnerabilities currently report "Not affected" in sysfs, which may not be correct. Vulnerability status for these older CPUs is unknown.
Add known-not-affected CPUs to the whitelist. Report "unknown" mitigation status for CPUs that are not in blacklist, whitelist and also don't enumerate MSR ARCH_CAPABILITIES bits that reflect hardware immunity to MMIO Stale Data vulnerabilities.
Mitigation is not deployed when the status is unknown.
[ bp: Massage, fixup. ]
Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data") Suggested-by: Andrew Cooper andrew.cooper3@citrix.com Suggested-by: Tony Luck tony.luck@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/a932c154772f2121794a5f2eded1a11013114711.165784626... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst | 14 ++++ arch/x86/include/asm/cpufeatures.h | 3 arch/x86/kernel/cpu/bugs.c | 14 +++- arch/x86/kernel/cpu/common.c | 34 ++++++---- 4 files changed, 51 insertions(+), 14 deletions(-)
--- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst +++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst @@ -230,6 +230,20 @@ The possible values in this file are: * - 'Mitigation: Clear CPU buffers' - The processor is vulnerable and the CPU buffer clearing mitigation is enabled. + * - 'Unknown: No mitigations' + - The processor vulnerability status is unknown because it is + out of Servicing period. Mitigation is not attempted. + +Definitions: +------------ + +Servicing period: The process of providing functional and security updates to +Intel processors or platforms, utilizing the Intel Platform Update (IPU) +process or other similar mechanisms. + +End of Servicing Updates (ESU): ESU is the date at which Intel will no +longer provide Servicing, such as through IPU or other similar update +processes. ESU dates will typically be aligned to end of quarter.
If the processor is vulnerable then the following information is appended to the above information: --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -396,6 +396,7 @@ #define X86_BUG_ITLB_MULTIHIT X86_BUG(23) /* CPU may incur MCE during certain page attribute changes */ #define X86_BUG_SRBDS X86_BUG(24) /* CPU may leak RNG bits if not mitigated */ #define X86_BUG_MMIO_STALE_DATA X86_BUG(25) /* CPU is affected by Processor MMIO Stale Data vulnerabilities */ -#define X86_BUG_EIBRS_PBRSB X86_BUG(26) /* EIBRS is vulnerable to Post Barrier RSB Predictions */ +#define X86_BUG_MMIO_UNKNOWN X86_BUG(26) /* CPU is too old and its MMIO Stale Data status is unknown */ +#define X86_BUG_EIBRS_PBRSB X86_BUG(27) /* EIBRS is vulnerable to Post Barrier RSB Predictions */
#endif /* _ASM_X86_CPUFEATURES_H */ --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -396,7 +396,8 @@ static void __init mmio_select_mitigatio u64 ia32_cap;
if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) || - cpu_mitigations_off()) { + boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN) || + cpu_mitigations_off()) { mmio_mitigation = MMIO_MITIGATION_OFF; return; } @@ -501,6 +502,8 @@ out: pr_info("TAA: %s\n", taa_strings[taa_mitigation]); if (boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA)) pr_info("MMIO Stale Data: %s\n", mmio_strings[mmio_mitigation]); + else if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN)) + pr_info("MMIO Stale Data: Unknown: No mitigations\n"); }
static void __init md_clear_select_mitigation(void) @@ -1868,6 +1871,9 @@ static ssize_t tsx_async_abort_show_stat
static ssize_t mmio_stale_data_show_state(char *buf) { + if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN)) + return sysfs_emit(buf, "Unknown: No mitigations\n"); + if (mmio_mitigation == MMIO_MITIGATION_OFF) return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);
@@ -1995,6 +2001,7 @@ static ssize_t cpu_show_common(struct de return srbds_show_state(buf);
case X86_BUG_MMIO_STALE_DATA: + case X86_BUG_MMIO_UNKNOWN: return mmio_stale_data_show_state(buf);
default: @@ -2051,6 +2058,9 @@ ssize_t cpu_show_srbds(struct device *de
ssize_t cpu_show_mmio_stale_data(struct device *dev, struct device_attribute *attr, char *buf) { - return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA); + if (boot_cpu_has_bug(X86_BUG_MMIO_UNKNOWN)) + return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_UNKNOWN); + else + return cpu_show_common(dev, attr, buf, X86_BUG_MMIO_STALE_DATA); } #endif --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -955,6 +955,7 @@ static void identify_cpu_without_cpuid(s #define NO_SWAPGS BIT(6) #define NO_ITLB_MULTIHIT BIT(7) #define NO_EIBRS_PBRSB BIT(8) +#define NO_MMIO BIT(9)
#define VULNWL(_vendor, _family, _model, _whitelist) \ { X86_VENDOR_##_vendor, _family, _model, X86_FEATURE_ANY, _whitelist } @@ -972,6 +973,11 @@ static const __initconst struct x86_cpu_ VULNWL(NSC, 5, X86_MODEL_ANY, NO_SPECULATION),
/* Intel Family 6 */ + VULNWL_INTEL(TIGERLAKE, NO_MMIO), + VULNWL_INTEL(TIGERLAKE_L, NO_MMIO), + VULNWL_INTEL(ALDERLAKE, NO_MMIO), + VULNWL_INTEL(ALDERLAKE_L, NO_MMIO), + VULNWL_INTEL(ATOM_SALTWELL, NO_SPECULATION | NO_ITLB_MULTIHIT), VULNWL_INTEL(ATOM_SALTWELL_TABLET, NO_SPECULATION | NO_ITLB_MULTIHIT), VULNWL_INTEL(ATOM_SALTWELL_MID, NO_SPECULATION | NO_ITLB_MULTIHIT), @@ -989,9 +995,9 @@ static const __initconst struct x86_cpu_
VULNWL_INTEL(ATOM_AIRMONT_MID, NO_L1TF | MSBDS_ONLY | NO_SWAPGS | NO_ITLB_MULTIHIT),
- VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB), + VULNWL_INTEL(ATOM_GOLDMONT, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_INTEL(ATOM_GOLDMONT_X, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_INTEL(ATOM_GOLDMONT_PLUS, NO_MDS | NO_L1TF | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO | NO_EIBRS_PBRSB),
/* * Technically, swapgs isn't serializing on AMD (despite it previously @@ -1006,13 +1012,13 @@ static const __initconst struct x86_cpu_ VULNWL_INTEL(ATOM_TREMONT_X, NO_ITLB_MULTIHIT | NO_EIBRS_PBRSB),
/* AMD Family 0xf - 0x12 */ - VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), - VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), + VULNWL_AMD(0x0f, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_AMD(0x10, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_AMD(0x11, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), + VULNWL_AMD(0x12, NO_MELTDOWN | NO_SSB | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO),
/* FAMILY_ANY must be last, otherwise 0x0f - 0x12 matches won't work */ - VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT), + VULNWL_AMD(X86_FAMILY_ANY, NO_MELTDOWN | NO_L1TF | NO_MDS | NO_SWAPGS | NO_ITLB_MULTIHIT | NO_MMIO), {} };
@@ -1152,10 +1158,16 @@ static void __init cpu_set_bug_bits(stru * Affected CPU list is generally enough to enumerate the vulnerability, * but for virtualization case check for ARCH_CAP MSR bits also, VMM may * not want the guest to enumerate the bug. + * + * Set X86_BUG_MMIO_UNKNOWN for CPUs that are neither in the blacklist, + * nor in the whitelist and also don't enumerate MSR ARCH_CAP MMIO bits. */ - if (cpu_matches(cpu_vuln_blacklist, MMIO) && - !arch_cap_mmio_immune(ia32_cap)) - setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA); + if (!arch_cap_mmio_immune(ia32_cap)) { + if (cpu_matches(cpu_vuln_blacklist, MMIO)) + setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA); + else if (!cpu_matches(cpu_vuln_whitelist, NO_MMIO)) + setup_force_cpu_bug(X86_BUG_MMIO_UNKNOWN); + }
if (cpu_has(c, X86_FEATURE_IBRS_ENHANCED) && !cpu_matches(cpu_vuln_whitelist, NO_EIBRS_PBRSB) &&
From: Jing Leng jleng@ambarella.com
commit 23a0cb8e3225122496bfa79172005c587c2d64bf upstream.
When building an external module, if users don't need to separate the compilation output and source code, they run the following command: "make -C $(LINUX_SRC_DIR) M=$(PWD)". At this point, "$(KBUILD_EXTMOD)" and "$(src)" are the same.
If they need to separate them, they run "make -C $(KERNEL_SRC_DIR) O=$(KERNEL_OUT_DIR) M=$(OUT_DIR) src=$(PWD)". Before running the command, they need to copy "Kbuild" or "Makefile" to "$(OUT_DIR)" to prevent compilation failure.
So the kernel should change the included path to avoid the copy operation.
Signed-off-by: Jing Leng jleng@ambarella.com [masahiro: I do not think "M=$(OUT_DIR) src=$(PWD)" is the official way, but this patch is a nice clean up anyway.] Signed-off-by: Masahiro Yamada masahiroy@kernel.org [nsc: updated context for v4.19] Signed-off-by: Nicolas Schier n.schier@avm.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/Makefile.modpost | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/scripts/Makefile.modpost +++ b/scripts/Makefile.modpost @@ -51,8 +51,7 @@ obj := $(KBUILD_EXTMOD) src := $(obj)
# Include the module's Makefile to find KBUILD_EXTRA_SYMBOLS -include $(if $(wildcard $(KBUILD_EXTMOD)/Kbuild), \ - $(KBUILD_EXTMOD)/Kbuild, $(KBUILD_EXTMOD)/Makefile) +include $(if $(wildcard $(src)/Kbuild), $(src)/Kbuild, $(src)/Makefile) endif
include scripts/Makefile.lib
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit b840304fb46cdf7012722f456bce06f151b3e81b upstream.
This attempts to fix the follow errors:
In function 'memcmp', inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9, inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2003:15: ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread] 44 | #define __underlying_memcmp __builtin_memcmp | ^ ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp' 420 | return __underlying_memcmp(p, q, size); | ^~~~~~~~~~~~~~~~~~~ In function 'memcmp', inlined from 'bacmp' at ./include/net/bluetooth/bluetooth.h:347:9, inlined from 'l2cap_global_chan_by_psm' at net/bluetooth/l2cap_core.c:2004:15: ./include/linux/fortify-string.h:44:33: error: '__builtin_memcmp' specified bound 6 exceeds source size 0 [-Werror=stringop-overread] 44 | #define __underlying_memcmp __builtin_memcmp | ^ ./include/linux/fortify-string.h:420:16: note: in expansion of macro '__underlying_memcmp' 420 | return __underlying_memcmp(p, q, size); | ^~~~~~~~~~~~~~~~~~~
Fixes: 332f1795ca20 ("Bluetooth: L2CAP: Fix l2cap_global_chan_by_psm regression") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Cc: Sudip Mukherjee sudipm.mukherjee@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -1826,11 +1826,11 @@ static struct l2cap_chan *l2cap_global_c src_match = !bacmp(&c->src, src); dst_match = !bacmp(&c->dst, dst); if (src_match && dst_match) { - c = l2cap_chan_hold_unless_zero(c); - if (c) { - read_unlock(&chan_list_lock); - return c; - } + if (!l2cap_chan_hold_unless_zero(c)) + continue; + + read_unlock(&chan_list_lock); + return c; }
/* Closest match */
From: Lee Jones lee.jones@linaro.org
commit cd11d1a6114bd4bc6450ae59f6e110ec47362126 upstream.
It is possible for a malicious device to forgo submitting a Feature Report. The HID Steam driver presently makes no prevision for this and de-references the 'struct hid_report' pointer obtained from the HID devices without first checking its validity. Let's change that.
Cc: Jiri Kosina jikos@kernel.org Cc: Benjamin Tissoires benjamin.tissoires@redhat.com Cc: linux-input@vger.kernel.org Fixes: c164d6abf3841 ("HID: add driver for Valve Steam Controller") Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hid-steam.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/drivers/hid/hid-steam.c +++ b/drivers/hid/hid-steam.c @@ -134,6 +134,11 @@ static int steam_recv_report(struct stea int ret;
r = steam->hdev->report_enum[HID_FEATURE_REPORT].report_id_hash[0]; + if (!r) { + hid_err(steam->hdev, "No HID_FEATURE_REPORT submitted - nothing to read\n"); + return -EINVAL; + } + if (hid_report_len(r) < 64) return -EINVAL;
@@ -165,6 +170,11 @@ static int steam_send_report(struct stea int ret;
r = steam->hdev->report_enum[HID_FEATURE_REPORT].report_id_hash[0]; + if (!r) { + hid_err(steam->hdev, "No HID_FEATURE_REPORT submitted - nothing to read\n"); + return -EINVAL; + } + if (hid_report_len(r) < 64) return -EINVAL;
From: Dongliang Mu mudongliangabcd@gmail.com
commit 945a9a8e448b65bec055d37eba58f711b39f66f0 upstream.
The error handling code in pvr2_hdw_create forgets to unregister the v4l2 device. When pvr2_hdw_create returns back to pvr2_context_create, it calls pvr2_context_destroy to destroy context, but mp->hdw is NULL, which leads to that pvr2_hdw_destroy directly returns.
Fix this by adding v4l2_device_unregister to decrease the refcount of usb interface.
Reported-by: syzbot+77b432d57c4791183ed4@syzkaller.appspotmail.com Signed-off-by: Dongliang Mu mudongliangabcd@gmail.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/usb/pvrusb2/pvrusb2-hdw.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/media/usb/pvrusb2/pvrusb2-hdw.c +++ b/drivers/media/usb/pvrusb2/pvrusb2-hdw.c @@ -2602,6 +2602,7 @@ struct pvr2_hdw *pvr2_hdw_create(struct del_timer_sync(&hdw->encoder_run_timer); del_timer_sync(&hdw->encoder_wait_timer); flush_work(&hdw->workpoll); + v4l2_device_unregister(&hdw->v4l2_dev); usb_free_urb(hdw->ctl_read_urb); usb_free_urb(hdw->ctl_write_urb); kfree(hdw->ctl_read_buffer);
From: Karthik Alapati mail@karthek.com
commit a5623a203cffe2d2b84d2f6c989d9017db1856af upstream.
Free the buffered reports before deleting the list entry.
BUG: memory leak unreferenced object 0xffff88810e72f180 (size 32): comm "softirq", pid 0, jiffies 4294945143 (age 16.080s) hex dump (first 32 bytes): 64 f3 c6 6a d1 88 07 04 00 00 00 00 00 00 00 00 d..j............ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff814ac6c3>] kmemdup+0x23/0x50 mm/util.c:128 [<ffffffff8357c1d2>] kmemdup include/linux/fortify-string.h:440 [inline] [<ffffffff8357c1d2>] hidraw_report_event+0xa2/0x150 drivers/hid/hidraw.c:521 [<ffffffff8356ddad>] hid_report_raw_event+0x27d/0x740 drivers/hid/hid-core.c:1992 [<ffffffff8356e41e>] hid_input_report+0x1ae/0x270 drivers/hid/hid-core.c:2065 [<ffffffff835f0d3f>] hid_irq_in+0x1ff/0x250 drivers/hid/usbhid/hid-core.c:284 [<ffffffff82d3c7f9>] __usb_hcd_giveback_urb+0xf9/0x230 drivers/usb/core/hcd.c:1670 [<ffffffff82d3cc26>] usb_hcd_giveback_urb+0x1b6/0x1d0 drivers/usb/core/hcd.c:1747 [<ffffffff82ef1e14>] dummy_timer+0x8e4/0x14c0 drivers/usb/gadget/udc/dummy_hcd.c:1988 [<ffffffff812f50a8>] call_timer_fn+0x38/0x200 kernel/time/timer.c:1474 [<ffffffff812f5586>] expire_timers kernel/time/timer.c:1519 [inline] [<ffffffff812f5586>] __run_timers.part.0+0x316/0x430 kernel/time/timer.c:1790 [<ffffffff812f56e4>] __run_timers kernel/time/timer.c:1768 [inline] [<ffffffff812f56e4>] run_timer_softirq+0x44/0x90 kernel/time/timer.c:1803 [<ffffffff848000e6>] __do_softirq+0xe6/0x2ea kernel/softirq.c:571 [<ffffffff81246db0>] invoke_softirq kernel/softirq.c:445 [inline] [<ffffffff81246db0>] __irq_exit_rcu kernel/softirq.c:650 [inline] [<ffffffff81246db0>] irq_exit_rcu+0xc0/0x110 kernel/softirq.c:662 [<ffffffff84574f02>] sysvec_apic_timer_interrupt+0xa2/0xd0 arch/x86/kernel/apic/apic.c:1106 [<ffffffff84600c8b>] asm_sysvec_apic_timer_interrupt+0x1b/0x20 arch/x86/include/asm/idtentry.h:649 [<ffffffff8458a070>] native_safe_halt arch/x86/include/asm/irqflags.h:51 [inline] [<ffffffff8458a070>] arch_safe_halt arch/x86/include/asm/irqflags.h:89 [inline] [<ffffffff8458a070>] acpi_safe_halt drivers/acpi/processor_idle.c:111 [inline] [<ffffffff8458a070>] acpi_idle_do_entry+0xc0/0xd0 drivers/acpi/processor_idle.c:554
Link: https://syzkaller.appspot.com/bug?id=19a04b43c75ed1092021010419b5e560a8172c4... Reported-by: syzbot+f59100a0428e6ded9443@syzkaller.appspotmail.com Signed-off-by: Karthik Alapati mail@karthek.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/hidraw.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/hid/hidraw.c +++ b/drivers/hid/hidraw.c @@ -354,10 +354,13 @@ static int hidraw_release(struct inode * unsigned int minor = iminor(inode); struct hidraw_list *list = file->private_data; unsigned long flags; + int i;
mutex_lock(&minors_lock);
spin_lock_irqsave(&hidraw_table[minor]->list_lock, flags); + for (i = list->tail; i < list->head; i++) + kfree(list->buffer[i].value); list_del(&list->node); spin_unlock_irqrestore(&hidraw_table[minor]->list_lock, flags); kfree(list);
From: Letu Ren fantasquex@gmail.com
commit 19f953e7435644b81332dd632ba1b2d80b1e37af upstream.
In `do_fb_ioctl()` of fbmem.c, if cmd is FBIOPUT_VSCREENINFO, var will be copied from user, then go through `fb_set_var()` and `info->fbops->fb_check_var()` which could may be `pm2fb_check_var()`. Along the path, `var->pixclock` won't be modified. This function checks whether reciprocal of `var->pixclock` is too high. If `var->pixclock` is zero, there will be a divide by zero error. So, it is necessary to check whether denominator is zero to avoid crash. As this bug is found by Syzkaller, logs are listed below.
divide error in pm2fb_check_var Call Trace: <TASK> fb_set_var+0x367/0xeb0 drivers/video/fbdev/core/fbmem.c:1015 do_fb_ioctl+0x234/0x670 drivers/video/fbdev/core/fbmem.c:1110 fb_ioctl+0xdd/0x130 drivers/video/fbdev/core/fbmem.c:1189
Reported-by: Zheyu Ma zheyuma97@gmail.com Signed-off-by: Letu Ren fantasquex@gmail.com Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/pm2fb.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/video/fbdev/pm2fb.c +++ b/drivers/video/fbdev/pm2fb.c @@ -616,6 +616,11 @@ static int pm2fb_check_var(struct fb_var return -EINVAL; }
+ if (!var->pixclock) { + DPRINTK("pixclock is zero\n"); + return -EINVAL; + } + if (PICOS2KHZ(var->pixclock) > PM2_MAX_PIXCLOCK) { DPRINTK("pixclock too high (%ldKHz)\n", PICOS2KHZ(var->pixclock));
From: Yang Jihong yangjihong1@huawei.com
commit c3b0f72e805f0801f05fa2aa52011c4bfc694c44 upstream.
ftrace_startup does not remove ops from ftrace_ops_list when ftrace_startup_enable fails:
register_ftrace_function ftrace_startup __register_ftrace_function ... add_ftrace_ops(&ftrace_ops_list, ops) ... ... ftrace_startup_enable // if ftrace failed to modify, ftrace_disabled is set to 1 ... return 0 // ops is in the ftrace_ops_list.
When ftrace_disabled = 1, unregister_ftrace_function simply returns without doing anything: unregister_ftrace_function ftrace_shutdown if (unlikely(ftrace_disabled)) return -ENODEV; // return here, __unregister_ftrace_function is not executed, // as a result, ops is still in the ftrace_ops_list __unregister_ftrace_function ...
If ops is dynamically allocated, it will be free later, in this case, is_ftrace_trampoline accesses NULL pointer:
is_ftrace_trampoline ftrace_ops_trampoline do_for_each_ftrace_op(op, ftrace_ops_list) // OOPS! op may be NULL!
Syzkaller reports as follows: [ 1203.506103] BUG: kernel NULL pointer dereference, address: 000000000000010b [ 1203.508039] #PF: supervisor read access in kernel mode [ 1203.508798] #PF: error_code(0x0000) - not-present page [ 1203.509558] PGD 800000011660b067 P4D 800000011660b067 PUD 130fb8067 PMD 0 [ 1203.510560] Oops: 0000 [#1] SMP KASAN PTI [ 1203.511189] CPU: 6 PID: 29532 Comm: syz-executor.2 Tainted: G B W 5.10.0 #8 [ 1203.512324] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 1203.513895] RIP: 0010:is_ftrace_trampoline+0x26/0xb0 [ 1203.514644] Code: ff eb d3 90 41 55 41 54 49 89 fc 55 53 e8 f2 00 fd ff 48 8b 1d 3b 35 5d 03 e8 e6 00 fd ff 48 8d bb 90 00 00 00 e8 2a 81 26 00 <48> 8b ab 90 00 00 00 48 85 ed 74 1d e8 c9 00 fd ff 48 8d bb 98 00 [ 1203.518838] RSP: 0018:ffffc900012cf960 EFLAGS: 00010246 [ 1203.520092] RAX: 0000000000000000 RBX: 000000000000007b RCX: ffffffff8a331866 [ 1203.521469] RDX: 0000000000000000 RSI: 0000000000000008 RDI: 000000000000010b [ 1203.522583] RBP: 0000000000000000 R08: 0000000000000000 R09: ffffffff8df18b07 [ 1203.523550] R10: fffffbfff1be3160 R11: 0000000000000001 R12: 0000000000478399 [ 1203.524596] R13: 0000000000000000 R14: ffff888145088000 R15: 0000000000000008 [ 1203.525634] FS: 00007f429f5f4700(0000) GS:ffff8881daf00000(0000) knlGS:0000000000000000 [ 1203.526801] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1203.527626] CR2: 000000000000010b CR3: 0000000170e1e001 CR4: 00000000003706e0 [ 1203.528611] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 1203.529605] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Therefore, when ftrace_startup_enable fails, we need to rollback registration process and remove ops from ftrace_ops_list.
Link: https://lkml.kernel.org/r/20220818032659.56209-1-yangjihong1@huawei.com
Suggested-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2748,6 +2748,16 @@ static int ftrace_startup(struct ftrace_
ftrace_startup_enable(command);
+ /* + * If ftrace is in an undefined state, we just remove ops from list + * to prevent the NULL pointer, instead of totally rolling it back and + * free trampoline, because those actions could cause further damage. + */ + if (unlikely(ftrace_disabled)) { + __unregister_ftrace_function(ops); + return -ENODEV; + } + ops->flags &= ~FTRACE_OPS_FL_ADDING;
return 0;
From: Jann Horn jannh@google.com
commit 2555283eb40df89945557273121e9393ef9b542b upstream.
anon_vma->degree tracks the combined number of child anon_vmas and VMAs that use the anon_vma as their ->anon_vma.
anon_vma_clone() then assumes that for any anon_vma attached to src->anon_vma_chain other than src->anon_vma, it is impossible for it to be a leaf node of the VMA tree, meaning that for such VMAs ->degree is elevated by 1 because of a child anon_vma, meaning that if ->degree equals 1 there are no VMAs that use the anon_vma as their ->anon_vma.
This assumption is wrong because the ->degree optimization leads to leaf nodes being abandoned on anon_vma_clone() - an existing anon_vma is reused and no new parent-child relationship is created. So it is possible to reuse an anon_vma for one VMA while it is still tied to another VMA.
This is an issue because is_mergeable_anon_vma() and its callers assume that if two VMAs have the same ->anon_vma, the list of anon_vmas attached to the VMAs is guaranteed to be the same. When this assumption is violated, vma_merge() can merge pages into a VMA that is not attached to the corresponding anon_vma, leading to dangling page->mapping pointers that will be dereferenced during rmap walks.
Fix it by separately tracking the number of child anon_vmas and the number of VMAs using the anon_vma as their ->anon_vma.
Fixes: 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy") Cc: stable@kernel.org Acked-by: Michal Hocko mhocko@suse.com Acked-by: Vlastimil Babka vbabka@suse.cz Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/rmap.h | 7 +++++-- mm/rmap.c | 31 +++++++++++++++++-------------- 2 files changed, 22 insertions(+), 16 deletions(-)
--- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -39,12 +39,15 @@ struct anon_vma { atomic_t refcount;
/* - * Count of child anon_vmas and VMAs which points to this anon_vma. + * Count of child anon_vmas. Equals to the count of all anon_vmas that + * have ->parent pointing to this one, including itself. * * This counter is used for making decision about reusing anon_vma * instead of forking new one. See comments in function anon_vma_clone. */ - unsigned degree; + unsigned long num_children; + /* Count of VMAs whose ->anon_vma pointer points to this object. */ + unsigned long num_active_vmas;
struct anon_vma *parent; /* Parent of this anon_vma */
--- a/mm/rmap.c +++ b/mm/rmap.c @@ -82,7 +82,8 @@ static inline struct anon_vma *anon_vma_ anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL); if (anon_vma) { atomic_set(&anon_vma->refcount, 1); - anon_vma->degree = 1; /* Reference for first vma */ + anon_vma->num_children = 0; + anon_vma->num_active_vmas = 0; anon_vma->parent = anon_vma; /* * Initialise the anon_vma root to point to itself. If called @@ -190,6 +191,7 @@ int __anon_vma_prepare(struct vm_area_st anon_vma = anon_vma_alloc(); if (unlikely(!anon_vma)) goto out_enomem_free_avc; + anon_vma->num_children++; /* self-parent link for new root */ allocated = anon_vma; }
@@ -199,8 +201,7 @@ int __anon_vma_prepare(struct vm_area_st if (likely(!vma->anon_vma)) { vma->anon_vma = anon_vma; anon_vma_chain_link(vma, avc, anon_vma); - /* vma reference or self-parent link for new root */ - anon_vma->degree++; + anon_vma->num_active_vmas++; allocated = NULL; avc = NULL; } @@ -279,19 +280,19 @@ int anon_vma_clone(struct vm_area_struct anon_vma_chain_link(dst, avc, anon_vma);
/* - * Reuse existing anon_vma if its degree lower than two, - * that means it has no vma and only one anon_vma child. + * Reuse existing anon_vma if it has no vma and only one + * anon_vma child. * - * Do not chose parent anon_vma, otherwise first child - * will always reuse it. Root anon_vma is never reused: + * Root anon_vma is never reused: * it has self-parent reference and at least one child. */ - if (!dst->anon_vma && anon_vma != src->anon_vma && - anon_vma->degree < 2) + if (!dst->anon_vma && + anon_vma->num_children < 2 && + anon_vma->num_active_vmas == 0) dst->anon_vma = anon_vma; } if (dst->anon_vma) - dst->anon_vma->degree++; + dst->anon_vma->num_active_vmas++; unlock_anon_vma_root(root); return 0;
@@ -341,6 +342,7 @@ int anon_vma_fork(struct vm_area_struct anon_vma = anon_vma_alloc(); if (!anon_vma) goto out_error; + anon_vma->num_active_vmas++; avc = anon_vma_chain_alloc(GFP_KERNEL); if (!avc) goto out_error_free_anon_vma; @@ -361,7 +363,7 @@ int anon_vma_fork(struct vm_area_struct vma->anon_vma = anon_vma; anon_vma_lock_write(anon_vma); anon_vma_chain_link(vma, avc, anon_vma); - anon_vma->parent->degree++; + anon_vma->parent->num_children++; anon_vma_unlock_write(anon_vma);
return 0; @@ -393,7 +395,7 @@ void unlink_anon_vmas(struct vm_area_str * to free them outside the lock. */ if (RB_EMPTY_ROOT(&anon_vma->rb_root.rb_root)) { - anon_vma->parent->degree--; + anon_vma->parent->num_children--; continue; }
@@ -401,7 +403,7 @@ void unlink_anon_vmas(struct vm_area_str anon_vma_chain_free(avc); } if (vma->anon_vma) - vma->anon_vma->degree--; + vma->anon_vma->num_active_vmas--; unlock_anon_vma_root(root);
/* @@ -412,7 +414,8 @@ void unlink_anon_vmas(struct vm_area_str list_for_each_entry_safe(avc, next, &vma->anon_vma_chain, same_vma) { struct anon_vma *anon_vma = avc->anon_vma;
- VM_WARN_ON(anon_vma->degree); + VM_WARN_ON(anon_vma->num_children); + VM_WARN_ON(anon_vma->num_active_vmas); put_anon_vma(anon_vma);
list_del(&avc->same_vma);
From: Fudong Wang Fudong.Wang@amd.com
[ Upstream commit b2a93490201300a749ad261b5c5d05cb50179c44 ]
[Why] After ODM clock off, optc underflow bit will be kept there always and clear not work. We need to clear that before clock off.
[How] Clear that if have when clock off.
Reviewed-by: Alvin Lee alvin.lee2@amd.com Acked-by: Tom Chung chiahsuan.chung@amd.com Signed-off-by: Fudong Wang Fudong.Wang@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c index 411f89218e019..cb5c44b339e09 100644 --- a/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c +++ b/drivers/gpu/drm/amd/display/dc/dcn10/dcn10_optc.c @@ -452,6 +452,11 @@ void optc1_enable_optc_clock(struct timing_generator *optc, bool enable) OTG_CLOCK_ON, 1, 1, 1000); } else { + + //last chance to clear underflow, otherwise, it will always there due to clock is off. + if (optc->funcs->is_optc_underflow_occurred(optc) == true) + optc->funcs->clear_optc_underflow(optc); + REG_UPDATE_2(OTG_CLOCK_CONTROL, OTG_CLOCK_GATE_DIS, 0, OTG_CLOCK_EN, 0);
From: Denis V. Lunev den@openvz.org
[ Upstream commit 66ba215cb51323e4e55e38fd5f250e0fae0cbc94 ]
Normal processing of ARP request (usually this is Ethernet broadcast packet) coming to the host is looking like the following: * the packet comes to arp_process() call and is passed through routing procedure * the request is put into the queue using pneigh_enqueue() if corresponding ARP record is not local (common case for container records on the host) * the request is processed by timer (within 80 jiffies by default) and ARP reply is sent from the same arp_process() using NEIGH_CB(skb)->flags & LOCALLY_ENQUEUED condition (flag is set inside pneigh_enqueue())
And here the problem comes. Linux kernel calls pneigh_queue_purge() which destroys the whole queue of ARP requests on ANY network interface start/stop event through __neigh_ifdown().
This is actually not a problem within the original world as network interface start/stop was accessible to the host 'root' only, which could do more destructive things. But the world is changed and there are Linux containers available. Here container 'root' has an access to this API and could be considered as untrusted user in the hosting (container's) world.
Thus there is an attack vector to other containers on node when container's root will endlessly start/stop interfaces. We have observed similar situation on a real production node when docker container was doing such activity and thus other containers on the node become not accessible.
The patch proposed doing very simple thing. It drops only packets from the same namespace in the pneigh_queue_purge() where network interface state change is detected. This is enough to prevent the problem for the whole node preserving original semantics of the code.
v2: - do del_timer_sync() if queue is empty after pneigh_queue_purge() v3: - rebase to net tree
Cc: "David S. Miller" davem@davemloft.net Cc: Eric Dumazet edumazet@google.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: Daniel Borkmann daniel@iogearbox.net Cc: David Ahern dsahern@kernel.org Cc: Yajun Deng yajun.deng@linux.dev Cc: Roopa Prabhu roopa@nvidia.com Cc: Christian Brauner brauner@kernel.org Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: Alexey Kuznetsov kuznet@ms2.inr.ac.ru Cc: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com Cc: Konstantin Khorenko khorenko@virtuozzo.com Cc: kernel@openvz.org Cc: devel@openvz.org Investigated-by: Alexander Mikhalitsyn alexander.mikhalitsyn@virtuozzo.com Signed-off-by: Denis V. Lunev den@openvz.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/neighbour.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-)
diff --git a/net/core/neighbour.c b/net/core/neighbour.c index 6233e9856016e..65e80aaa09481 100644 --- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -224,14 +224,23 @@ static int neigh_del_timer(struct neighbour *n) return 0; }
-static void pneigh_queue_purge(struct sk_buff_head *list) +static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net) { + unsigned long flags; struct sk_buff *skb;
- while ((skb = skb_dequeue(list)) != NULL) { - dev_put(skb->dev); - kfree_skb(skb); + spin_lock_irqsave(&list->lock, flags); + skb = skb_peek(list); + while (skb != NULL) { + struct sk_buff *skb_next = skb_peek_next(skb, list); + if (net == NULL || net_eq(dev_net(skb->dev), net)) { + __skb_unlink(skb, list); + dev_put(skb->dev); + kfree_skb(skb); + } + skb = skb_next; } + spin_unlock_irqrestore(&list->lock, flags); }
static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev) @@ -297,9 +306,9 @@ int neigh_ifdown(struct neigh_table *tbl, struct net_device *dev) write_lock_bh(&tbl->lock); neigh_flush_dev(tbl, dev); pneigh_ifdown_and_unlock(tbl, dev); - - del_timer_sync(&tbl->proxy_timer); - pneigh_queue_purge(&tbl->proxy_queue); + pneigh_queue_purge(&tbl->proxy_queue, dev_net(dev)); + if (skb_queue_empty_lockless(&tbl->proxy_queue)) + del_timer_sync(&tbl->proxy_timer); return 0; } EXPORT_SYMBOL(neigh_ifdown); @@ -1614,7 +1623,7 @@ int neigh_table_clear(int index, struct neigh_table *tbl) /* It is not clean... Fix it to unload IPv6 module safely */ cancel_delayed_work_sync(&tbl->gc_work); del_timer_sync(&tbl->proxy_timer); - pneigh_queue_purge(&tbl->proxy_queue); + pneigh_queue_purge(&tbl->proxy_queue, NULL); neigh_ifdown(tbl, NULL); if (atomic_read(&tbl->entries)) pr_crit("neighbour leakage\n");
From: Juergen Gross jgross@suse.com
[ Upstream commit 7b6670b03641ac308aaa6fa2e6f964ac993b5ea3 ]
When booting under KVM the following error messages are issued:
hypfs.7f5705: The hardware system does not support hypfs hypfs.7a79f0: Initialization of hypfs failed with rc=-61
Demote the severity of first message from "error" to "info" and issue the second message only in other error cases.
Signed-off-by: Juergen Gross jgross@suse.com Acked-by: Heiko Carstens hca@linux.ibm.com Acked-by: Christian Borntraeger borntraeger@linux.ibm.com Link: https://lore.kernel.org/r/20220620094534.18967-1-jgross@suse.com [arch/s390/hypfs/hypfs_diag.c changed description] Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/hypfs/hypfs_diag.c | 2 +- arch/s390/hypfs/inode.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/hypfs/hypfs_diag.c b/arch/s390/hypfs/hypfs_diag.c index 3452e18bb1ca8..38105ba35c814 100644 --- a/arch/s390/hypfs/hypfs_diag.c +++ b/arch/s390/hypfs/hypfs_diag.c @@ -437,7 +437,7 @@ __init int hypfs_diag_init(void) int rc;
if (diag204_probe()) { - pr_err("The hardware system does not support hypfs\n"); + pr_info("The hardware system does not support hypfs\n"); return -ENODATA; } if (diag204_info_type == DIAG204_INFO_EXT) { diff --git a/arch/s390/hypfs/inode.c b/arch/s390/hypfs/inode.c index e4d17d9ea93d8..4af5c0dd9fbe2 100644 --- a/arch/s390/hypfs/inode.c +++ b/arch/s390/hypfs/inode.c @@ -494,9 +494,9 @@ static int __init hypfs_init(void) hypfs_vm_exit(); fail_hypfs_diag_exit: hypfs_diag_exit(); + pr_err("Initialization of hypfs failed with rc=%i\n", rc); fail_dbfs_exit: hypfs_dbfs_exit(); - pr_err("Initialization of hypfs failed with rc=%i\n", rc); return rc; } device_initcall(hypfs_init)
From: Geert Uytterhoeven geert@linux-m68k.org
[ Upstream commit aa5762c34213aba7a72dc58e70601370805fa794 ]
NF_CONNTRACK_PROCFS was marked obsolete in commit 54b07dca68557b09 ("netfilter: provide config option to disable ancient procfs parts") in v3.3.
Signed-off-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/Kconfig | 1 - 1 file changed, 1 deletion(-)
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig index 56cddadb65d0c..92e0514f624fa 100644 --- a/net/netfilter/Kconfig +++ b/net/netfilter/Kconfig @@ -117,7 +117,6 @@ config NF_CONNTRACK_ZONES
config NF_CONNTRACK_PROCFS bool "Supply CT list in procfs (OBSOLETE)" - default y depends on PROC_FS ---help--- This option enables for the list of known conntrack entries
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 9c80e79906b4ca440d09e7f116609262bb747909 upstream.
The assumption in __disable_kprobe() is wrong, and it could try to disarm an already disarmed kprobe and fire the WARN_ONCE() below. [0] We can easily reproduce this issue.
1. Write 0 to /sys/kernel/debug/kprobes/enabled.
# echo 0 > /sys/kernel/debug/kprobes/enabled
2. Run execsnoop. At this time, one kprobe is disabled.
# /usr/share/bcc/tools/execsnoop & [1] 2460 PCOMM PID PPID RET ARGS
# cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]
3. Write 1 to /sys/kernel/debug/kprobes/enabled, which changes kprobes_all_disarmed to false but does not arm the disabled kprobe.
# echo 1 > /sys/kernel/debug/kprobes/enabled
# cat /sys/kernel/debug/kprobes/list ffffffff91345650 r __x64_sys_execve+0x0 [FTRACE] ffffffff91345650 k __x64_sys_execve+0x0 [DISABLED][FTRACE]
4. Kill execsnoop, when __disable_kprobe() calls disarm_kprobe() for the disabled kprobe and hits the WARN_ONCE() in __disarm_kprobe_ftrace().
# fg /usr/share/bcc/tools/execsnoop ^C
Actually, WARN_ONCE() is fired twice, and __unregister_kprobe_top() misses some cleanups and leaves the aggregated kprobe in the hash table. Then, __unregister_trace_kprobe() initialises tk->rp.kp.list and creates an infinite loop like this.
aggregated kprobe.list -> kprobe.list -. ^ | '.__.'
In this situation, these commands fall into the infinite loop and result in RCU stall or soft lockup.
cat /sys/kernel/debug/kprobes/list : show_kprobe_addr() enters into the infinite loop with RCU.
/usr/share/bcc/tools/execsnoop : warn_kprobe_rereg() holds kprobe_mutex, and __get_valid_kprobe() is stuck in the loop.
To avoid the issue, make sure we don't call disarm_kprobe() for disabled kprobes.
[0] Failed to disarm kprobe-ftrace at __x64_sys_execve+0x0/0x40 (error -2) WARNING: CPU: 6 PID: 2460 at kernel/kprobes.c:1130 __disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Modules linked in: ena CPU: 6 PID: 2460 Comm: execsnoop Not tainted 5.19.0+ #28 Hardware name: Amazon EC2 c5.2xlarge/, BIOS 1.0 10/16/2017 RIP: 0010:__disarm_kprobe_ftrace.isra.19 (kernel/kprobes.c:1129) Code: 24 8b 02 eb c1 80 3d c4 83 f2 01 00 75 d4 48 8b 75 00 89 c2 48 c7 c7 90 fa 0f 92 89 04 24 c6 05 ab 83 01 e8 e4 94 f0 ff <0f> 0b 8b 04 24 eb b1 89 c6 48 c7 c7 60 fa 0f 92 89 04 24 e8 cc 94 RSP: 0018:ffff9e6ec154bd98 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffff930f7b00 RCX: 0000000000000001 RDX: 0000000080000001 RSI: ffffffff921461c5 RDI: 00000000ffffffff RBP: ffff89c504286da8 R08: 0000000000000000 R09: c0000000fffeffff R10: 0000000000000000 R11: ffff9e6ec154bc28 R12: ffff89c502394e40 R13: ffff89c502394c00 R14: ffff9e6ec154bc00 R15: 0000000000000000 FS: 00007fe800398740(0000) GS:ffff89c812d80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000000c00057f010 CR3: 0000000103b54006 CR4: 00000000007706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> __disable_kprobe (kernel/kprobes.c:1716) disable_kprobe (kernel/kprobes.c:2392) __disable_trace_kprobe (kernel/trace/trace_kprobe.c:340) disable_trace_kprobe (kernel/trace/trace_kprobe.c:429) perf_trace_event_unreg.isra.2 (./include/linux/tracepoint.h:93 kernel/trace/trace_event_perf.c:168) perf_kprobe_destroy (kernel/trace/trace_event_perf.c:295) _free_event (kernel/events/core.c:4971) perf_event_release_kernel (kernel/events/core.c:5176) perf_release (kernel/events/core.c:5186) __fput (fs/file_table.c:321) task_work_run (./include/linux/sched.h:2056 (discriminator 1) kernel/task_work.c:179 (discriminator 1)) exit_to_user_mode_prepare (./include/linux/resume_user_mode.h:49 kernel/entry/common.c:169 kernel/entry/common.c:201) syscall_exit_to_user_mode (./arch/x86/include/asm/jump_label.h:55 ./arch/x86/include/asm/nospec-branch.h:384 ./arch/x86/include/asm/entry-common.h:94 kernel/entry/common.c:133 kernel/entry/common.c:296) do_syscall_64 (arch/x86/entry/common.c:87) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) RIP: 0033:0x7fe7ff210654 Code: 15 79 89 20 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb be 0f 1f 00 8b 05 9a cd 20 00 48 63 ff 85 c0 75 11 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3a f3 c3 48 83 ec 18 48 89 7c 24 08 e8 34 fc RSP: 002b:00007ffdbd1d3538 EFLAGS: 00000246 ORIG_RAX: 0000000000000003 RAX: 0000000000000000 RBX: 0000000000000008 RCX: 00007fe7ff210654 RDX: 0000000000000000 RSI: 0000000000002401 RDI: 0000000000000008 RBP: 0000000000000000 R08: 94ae31d6fda838a4 R0900007fe8001c9d30 R10: 00007ffdbd1d34b0 R11: 0000000000000246 R12: 00007ffdbd1d3600 R13: 0000000000000000 R14: fffffffffffffffc R15: 00007ffdbd1d3560 </TASK>
Link: https://lkml.kernel.org/r/20220813020509.90805-1-kuniyu@amazon.com Fixes: 69d54b916d83 ("kprobes: makes kprobes/enabled works correctly for optimized kprobes.") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reported-by: Ayushman Dutta ayudutta@amazon.com Cc: "Naveen N. Rao" naveen.n.rao@linux.ibm.com Cc: Anil S Keshavamurthy anil.s.keshavamurthy@intel.com Cc: "David S. Miller" davem@davemloft.net Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Wang Nan wangnan0@huawei.com Cc: Kuniyuki Iwashima kuniyu@amazon.com Cc: Kuniyuki Iwashima kuni1840@gmail.com Cc: Ayushman Dutta ayudutta@amazon.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kprobes.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -1709,11 +1709,12 @@ static struct kprobe *__disable_kprobe(s /* Try to disarm and disable this/parent probe */ if (p == orig_p || aggr_kprobe_disabled(orig_p)) { /* - * If kprobes_all_disarmed is set, orig_p - * should have already been disarmed, so - * skip unneed disarming process. + * Don't be lazy here. Even if 'kprobes_all_disarmed' + * is false, 'orig_p' might not have been armed yet. + * Note arm_all_kprobes() __tries__ to arm all kprobes + * on the best effort basis. */ - if (!kprobes_all_disarmed) { + if (!kprobes_all_disarmed && !kprobe_disabled(orig_p)) { ret = disarm_kprobe(orig_p, true); if (ret) { p->flags &= ~KPROBE_FLAG_DISABLED;
From: Yang Yingliang yangyingliang@huawei.com
commit d5485d9dd24e1d04e5509916515260186eb1455c upstream.
It is not allowed to call kfree_skb() from hardware interrupt context or with interrupts being disabled. So add all skb to a tmp list, then free them after spin_unlock_irqrestore() at once.
Fixes: 66ba215cb513 ("neigh: fix possible DoS due to net iface start/stop loop") Suggested-by: Denis V. Lunev den@openvz.org Signed-off-by: Yang Yingliang yangyingliang@huawei.com Reviewed-by: Nikolay Aleksandrov razor@blackwall.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/neighbour.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/net/core/neighbour.c +++ b/net/core/neighbour.c @@ -226,21 +226,27 @@ static int neigh_del_timer(struct neighb
static void pneigh_queue_purge(struct sk_buff_head *list, struct net *net) { + struct sk_buff_head tmp; unsigned long flags; struct sk_buff *skb;
+ skb_queue_head_init(&tmp); spin_lock_irqsave(&list->lock, flags); skb = skb_peek(list); while (skb != NULL) { struct sk_buff *skb_next = skb_peek_next(skb, list); if (net == NULL || net_eq(dev_net(skb->dev), net)) { __skb_unlink(skb, list); - dev_put(skb->dev); - kfree_skb(skb); + __skb_queue_tail(&tmp, skb); } skb = skb_next; } spin_unlock_irqrestore(&list->lock, flags); + + while ((skb = __skb_dequeue(&tmp))) { + dev_put(skb->dev); + kfree_skb(skb); + } }
static void neigh_flush_dev(struct neigh_table *tbl, struct net_device *dev)
On Fri, 02 Sep 2022 14:18:20 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 04 Sep 2022 12:13:47 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.257-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v4.19: 10 builds: 10 pass, 0 fail 22 boots: 22 pass, 0 fail 40 tests: 40 pass, 0 fail
Linux version: 4.19.257-rc1-g56ebf961480f Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On 9/2/22 06:18, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 04 Sep 2022 12:13:47 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.257-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Fri, Sep 02, 2022 at 02:18:20PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 04 Sep 2022 12:13:47 +0000. Anything received after that time might be too late.
Build results: total: 157 pass: 157 fail: 0 Qemu test results: total: 422 pass: 422 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
Hi Greg,
On Fri, Sep 02, 2022 at 02:18:20PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 04 Sep 2022 12:13:47 +0000. Anything received after that time might be too late.
Build test (gcc version 11.3.1 20220819): mips: 63 configs -> no failure arm: 115 configs -> no failure arm64: 2 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1]
[1]. https://openqa.qa.codethink.co.uk/tests/1754
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
-- Regards Sudip
On Fri, 2 Sept 2022 at 17:55, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Sun, 04 Sep 2022 12:13:47 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.257-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro's test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 4.19.257-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-4.19.y * git commit: 56ebf961480f5507db5b5ff29f49ab7c03b05a24 * git describe: v4.19.256-57-g56ebf961480f * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19....
## No test Regressions (compared to v4.19.256)
## No metric Regressions (compared to v4.19.256)
## No test Fixes (compared to v4.19.256)
## No metric Fixes (compared to v4.19.256)
## Test result summary total: 84836, pass: 74060, fail: 668, skip: 9690, xfail: 418
## Build Summary * arc: 10 total, 10 passed, 0 failed * arm: 291 total, 286 passed, 5 failed * arm64: 58 total, 57 passed, 1 failed * i386: 26 total, 25 passed, 1 failed * mips: 35 total, 35 passed, 0 failed * parisc: 12 total, 12 passed, 0 failed * powerpc: 54 total, 54 passed, 0 failed * s390: 12 total, 12 passed, 0 failed * sh: 24 total, 24 passed, 0 failed * sparc: 12 total, 12 passed, 0 failed * x86_64: 52 total, 51 passed, 1 failed
## Test suites summary * fwts * igt-gpu-tools * kunit * kvm-unit-tests * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
Hi!
This is the start of the stable review cycle for the 4.19.257 release. There are 56 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-4...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
linux-stable-mirror@lists.linaro.org