This is the start of the stable review cycle for the 4.9.257 release. There are 43 patches in this series, all of which will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.257-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.9.257-rc1
Shih-Yuan Lee (FourDollars) sylee@canonical.com ALSA: hda/realtek - Fix typo of pincfg for Dell quirk
Nadav Amit namit@vmware.com iommu/vt-d: Do not use flush-queue when caching-mode is on
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: thermal: Do not call acpi_thermal_check() directly
Benjamin Valentin benpicco@googlemail.com Input: xpad - sync supported devices with fork on GitHub
Dave Hansen dave.hansen@linux.intel.com x86/apic: Add extra serialization for non-serializing MSRs
Josh Poimboeuf jpoimboe@redhat.com x86/build: Disable CET instrumentation in the kernel
Hugh Dickins hughd@google.com mm: thp: fix MADV_REMOVE deadlock on shmem THP
Muchun Song songmuchun@bytedance.com mm: hugetlb: remove VM_BUG_ON_PAGE from page_huge_active
Muchun Song songmuchun@bytedance.com mm: hugetlb: fix a race between isolating and freeing page
Muchun Song songmuchun@bytedance.com mm: hugetlbfs: fix cannot migrate the fallocated HugeTLB page
Russell King rmk+kernel@armlinux.org.uk ARM: footbridge: fix dc21285 PCI configuration accessors
Fengnan Chang fengnanchang@gmail.com mmc: core: Limit retries when analyse of SDIO tuples fails
Aurelien Aptel aaptel@suse.com cifs: report error instead of invalid when revalidating a dentry fails
Mathias Nyman mathias.nyman@linux.intel.com xhci: fix bounce buffer usage for non-sg list case
Wang ShaoBo bobo.shaobowang@huawei.com kretprobe: Avoid re-registration of the same kretprobe earlier
Felix Fietkau nbd@nbd.name mac80211: fix station rate table updates on assoc
Heiko Stuebner heiko.stuebner@theobroma-systems.com usb: dwc2: Fix endpoint direction check in ep_from_windex
Jeremy Figgins kernel@jeremyfiggins.com USB: usblp: don't call usb_set_interface if there's a single alt
Dan Carpenter dan.carpenter@oracle.com USB: gadget: legacy: fix an error code in eth_bind()
Arnd Bergmann arnd@arndb.de elfcore: fix building with clang
Xie He xie.he.0141@gmail.com net: lapb: Copy the skb before sending a packet
Alexey Dobriyan adobriyan@gmail.com Input: i8042 - unbreak Pegatron C15B
Christoph Schemmel christoph.schemmel@gmail.com USB: serial: option: Adding support for Cinterion MV31
Chenxin Jin bg4akv@hotmail.com USB: serial: cp210x: add new VID/PID for supporting Teraoka AD2000
Pho Tran Pho.Tran@silabs.com USB: serial: cp210x: add pid/vid for WSDA-200-USB
Sasha Levin sashal@kernel.org stable: clamp SUBLEVEL in 4.4 and 4.9
Josh Poimboeuf jpoimboe@redhat.com objtool: Don't fail on missing symbol table
Brian King brking@linux.vnet.ibm.com scsi: ibmvfc: Set default timeout to avoid crash during migration
Felix Fietkau nbd@nbd.name mac80211: fix fast-rx encryption check
Javed Hasan jhasan@marvell.com scsi: libfc: Avoid invoking response handler twice if ep is already completed
Thomas Gleixner tglx@linutronix.de futex: Handle faults correctly for PI futexes
Thomas Gleixner tglx@linutronix.de futex: Simplify fixup_pi_state_owner()
Thomas Gleixner tglx@linutronix.de futex: Use pi_state_update_owner() in put_pi_state()
Thomas Gleixner tglx@linutronix.de rtmutex: Remove unused argument from rt_mutex_proxy_unlock()
Thomas Gleixner tglx@linutronix.de futex: Provide and use pi_state_update_owner()
Thomas Gleixner tglx@linutronix.de futex: Replace pointless printk in fixup_owner()
Peter Zijlstra peterz@infradead.org futex: Avoid violating the 10th rule of futex
Peter Zijlstra peterz@infradead.org futex: Rework inconsistent rt_mutex/futex_q state
Peter Zijlstra peterz@infradead.org futex: Remove rt_mutex_deadlock_account_*()
Peter Zijlstra peterz@infradead.org futex,rt_mutex: Provide futex specific rt_mutex API
Eric Dumazet edumazet@google.com net_sched: reject silly cell_log in qdisc_get_rtab()
Lijun Pan ljp@linux.ibm.com ibmvnic: Ensure that CRQ entry read are correctly ordered
Pan Bian bianpan2016@163.com net: dsa: bcm_sf2: put device node before return
-------------
Diffstat:
 Makefile                              |  12 +-
 arch/arm/mach-footbridge/dc21285.c    |  12 +-
 arch/x86/Makefile                     |   3 +
 arch/x86/include/asm/apic.h           |  10 --
 arch/x86/include/asm/barrier.h        |  18 +++
 arch/x86/kernel/apic/apic.c           |   4 +
 arch/x86/kernel/apic/x2apic_cluster.c |   6 +-
 arch/x86/kernel/apic/x2apic_phys.c    |   6 +-
 drivers/acpi/thermal.c                |  55 ++++---
 drivers/input/joystick/xpad.c         |  17 ++-
 drivers/input/serio/i8042-x86ia64io.h |   2 +
 drivers/iommu/intel-iommu.c           |   6 +
 drivers/mmc/core/sdio_cis.c           |   6 +
 drivers/net/dsa/bcm_sf2.c             |   8 +-
 drivers/net/ethernet/ibm/ibmvnic.c    |   6 +
 drivers/scsi/ibmvscsi/ibmvfc.c        |   4 +-
 drivers/scsi/libfc/fc_exch.c          |  16 +-
 drivers/usb/class/usblp.c             |  19 ++-
 drivers/usb/dwc2/gadget.c             |   8 +-
 drivers/usb/gadget/legacy/ether.c     |   4 +-
 drivers/usb/host/xhci-ring.c          |  31 ++--
 drivers/usb/serial/cp210x.c           |   2 +
 drivers/usb/serial/option.c           |   6 +
 fs/cifs/dir.c                         |  22 ++-
 fs/hugetlbfs/inode.c                  |   3 +-
 include/linux/elfcore.h               |  22 +++
 include/linux/hugetlb.h               |   3 +
 kernel/Makefile                       |   1 -
 kernel/elfcore.c                      |  25 ---
 kernel/futex.c                        | 276 +++++++++++++++++++---------------
 kernel/kprobes.c                      |   4 +
 kernel/locking/rtmutex-debug.c        |   9 --
 kernel/locking/rtmutex-debug.h        |   3 -
 kernel/locking/rtmutex.c              | 127 ++++++++++------
 kernel/locking/rtmutex.h              |   2 -
 kernel/locking/rtmutex_common.h       |  12 +-
 mm/huge_memory.c                      |  37 +++--
 mm/hugetlb.c                          |   9 +-
 net/lapb/lapb_out.c                   |   3 +-
 net/mac80211/driver-ops.c             |   5 +-
 net/mac80211/rate.c                   |   3 +-
 net/mac80211/rx.c                     |   2 +
 net/sched/sch_api.c                   |   3 +-
 sound/pci/hda/patch_realtek.c         |   2 +-
 tools/objtool/elf.c                   |   7 +-
 45 files changed, 521 insertions(+), 320 deletions(-)
From: Pan Bian bianpan2016@163.com
commit cf3c46631e1637582f517a574c77cd6c05793817 upstream.
Put the device node dn before returning an error code on the failure path.
Fixes: 461cd1b03e32 ("net: dsa: bcm_sf2: Register our slave MDIO bus")
Signed-off-by: Pan Bian bianpan2016@163.com
Link: https://lore.kernel.org/r/20210121123343.26330-1-bianpan2016@163.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/dsa/bcm_sf2.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/net/dsa/bcm_sf2.c
+++ b/drivers/net/dsa/bcm_sf2.c
@@ -515,15 +515,19 @@ static int bcm_sf2_mdio_register(struct
 	/* Find our integrated MDIO bus node */
 	dn = of_find_compatible_node(NULL, NULL, "brcm,unimac-mdio");
 	priv->master_mii_bus = of_mdio_find_bus(dn);
-	if (!priv->master_mii_bus)
+	if (!priv->master_mii_bus) {
+		of_node_put(dn);
 		return -EPROBE_DEFER;
+	}
 
 	get_device(&priv->master_mii_bus->dev);
 	priv->master_mii_dn = dn;
 
 	priv->slave_mii_bus = devm_mdiobus_alloc(ds->dev);
-	if (!priv->slave_mii_bus)
+	if (!priv->slave_mii_bus) {
+		of_node_put(dn);
 		return -ENOMEM;
+	}
 
 	priv->slave_mii_bus->priv = priv;
 	priv->slave_mii_bus->name = "sf2 slave mii";
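For readers less familiar with OF node reference counting: of_find_compatible_node() returns a node with its reference count raised, so every exit path has to drop that reference, which is what the added of_node_put() calls above do. A minimal kernel-style sketch of the general pattern follows; the probe function and the demo_setup() helper are hypothetical, this is not the bcm_sf2 code.

static int demo_probe(struct platform_device *pdev)
{
	struct device_node *dn;
	int err;

	/* of_find_compatible_node() takes a reference on the returned node */
	dn = of_find_compatible_node(NULL, NULL, "vendor,demo-block");
	if (!dn)
		return -ENODEV;

	err = demo_setup(pdev, dn);		/* hypothetical helper */
	if (err) {
		of_node_put(dn);		/* drop the reference on every error path */
		return err;
	}

	/* keep the reference for the device's lifetime; put it in .remove() */
	platform_set_drvdata(pdev, dn);
	return 0;
}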
From: Lijun Pan ljp@linux.ibm.com
commit e41aec79e62fa50f940cf222d1e9577f14e149dc upstream.
Ensure that received Command-Response Queue (CRQ) entries are read in order by the driver. A dma_rmb() barrier is added before accessing the CRQ descriptor to ensure the entire descriptor is read before it is processed.
Fixes: 032c5e82847a ("Driver for IBM System i/p VNIC protocol")
Signed-off-by: Lijun Pan ljp@linux.ibm.com
Link: https://lore.kernel.org/r/20210128013442.88319-1-ljp@linux.ibm.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/net/ethernet/ibm/ibmvnic.c | 6 ++++++
 1 file changed, 6 insertions(+)
--- a/drivers/net/ethernet/ibm/ibmvnic.c
+++ b/drivers/net/ethernet/ibm/ibmvnic.c
@@ -3496,6 +3496,12 @@ static irqreturn_t ibmvnic_interrupt(int
 	while (!done) {
 		/* Pull all the valid messages off the CRQ */
 		while ((crq = ibmvnic_next_crq(adapter)) != NULL) {
+			/* This barrier makes sure ibmvnic_next_crq()'s
+			 * crq->generic.first & IBMVNIC_CRQ_CMD_RSP is loaded
+			 * before ibmvnic_handle_crq()'s
+			 * switch(gen_crq->first) and switch(gen_crq->cmd).
+			 */
+			dma_rmb();
 			ibmvnic_handle_crq(crq, adapter);
 			crq->generic.first = 0;
 		}
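The underlying idea is the usual consumer-side pattern for device-written descriptors: first observe the flag that marks the entry as valid, then issue dma_rmb() so the CPU cannot read the rest of the descriptor before that flag. A minimal kernel-style sketch with hypothetical descriptor fields and a hypothetical demo_handle() callback (not the ibmvnic code):

struct demo_crq_desc {
	u8     first;	/* the device sets a "valid" bit here last */
	u8     cmd;
	__be16 len;
	__be32 data;
};

static void demo_drain_queue(struct demo_crq_desc *desc)
{
	while (desc->first & 0x80) {	/* 1) observe the valid marker */
		dma_rmb();		/* 2) order it before the payload reads */
		demo_handle(desc->cmd, desc->data);	/* hypothetical handler */
		desc->first = 0;	/* 3) hand the slot back to the device */
		desc++;
	}
}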
From: Eric Dumazet edumazet@google.com
commit e4bedf48aaa5552bc1f49703abd17606e7e6e82a upstream
iproute2 probably never goes beyond 8 for the cell exponent, but stick to the max shift exponent for signed 32bit.
UBSAN reported:

UBSAN: shift-out-of-bounds in net/sched/sch_api.c:389:22
shift exponent 130 is too large for 32-bit type 'int'
CPU: 1 PID: 8450 Comm: syz-executor586 Not tainted 5.11.0-rc3-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x183/0x22e lib/dump_stack.c:120
 ubsan_epilogue lib/ubsan.c:148 [inline]
 __ubsan_handle_shift_out_of_bounds+0x432/0x4d0 lib/ubsan.c:395
 __detect_linklayer+0x2a9/0x330 net/sched/sch_api.c:389
 qdisc_get_rtab+0x2b5/0x410 net/sched/sch_api.c:435
 cbq_init+0x28f/0x12c0 net/sched/sch_cbq.c:1180
 qdisc_create+0x801/0x1470 net/sched/sch_api.c:1246
 tc_modify_qdisc+0x9e3/0x1fc0 net/sched/sch_api.c:1662
 rtnetlink_rcv_msg+0xb1d/0xe60 net/core/rtnetlink.c:5564
 netlink_rcv_skb+0x1f0/0x460 net/netlink/af_netlink.c:2494
 netlink_unicast_kernel net/netlink/af_netlink.c:1304 [inline]
 netlink_unicast+0x7de/0x9b0 net/netlink/af_netlink.c:1330
 netlink_sendmsg+0xaa6/0xe90 net/netlink/af_netlink.c:1919
 sock_sendmsg_nosec net/socket.c:652 [inline]
 sock_sendmsg net/socket.c:672 [inline]
 ____sys_sendmsg+0x5a2/0x900 net/socket.c:2345
 ___sys_sendmsg net/socket.c:2399 [inline]
 __sys_sendmsg+0x319/0x400 net/socket.c:2432
 do_syscall_64+0x2d/0x70 arch/x86/entry/common.c:46
 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Eric Dumazet edumazet@google.com
Reported-by: syzbot syzkaller@googlegroups.com
Acked-by: Cong Wang cong.wang@bytedance.com
Link: https://lore.kernel.org/r/20210114160637.1660597-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski kuba@kernel.org
[sudip: adjust context]
Signed-off-by: Sudip Mukherjee sudipm.mukherjee@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/sched/sch_api.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -393,7 +393,8 @@ struct qdisc_rate_table *qdisc_get_rtab(
 {
 	struct qdisc_rate_table *rtab;
 
-	if (tab == NULL || r->rate == 0 || r->cell_log == 0 ||
+	if (tab == NULL || r->rate == 0 ||
+	    r->cell_log == 0 || r->cell_log >= 32 ||
 	    nla_len(tab) != TC_RTAB_SIZE)
 		return NULL;
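The reason for the new bound: r->cell_log is later used as a shift count on a 32-bit integer, and in C shifting a 32-bit value by 32 or more bits is undefined behaviour, which is exactly what UBSAN flagged with the exponent of 130. A standalone sketch of the validation logic, with a hypothetical function name, not the kernel code itself:

#include <stdint.h>
#include <stdio.h>

/* Mirrors the added check: reject a cell_log that would be used to shift
 * a 32-bit value by 32 or more bits (undefined behaviour in C). */
static int cell_log_is_valid(uint32_t rate, uint8_t cell_log)
{
	return rate != 0 && cell_log != 0 && cell_log < 32;
}

int main(void)
{
	printf("cell_log 8:   %s\n", cell_log_is_valid(1000, 8) ? "ok" : "rejected");
	printf("cell_log 130: %s\n", cell_log_is_valid(1000, 130) ? "ok" : "rejected");
	return 0;
}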
From: Peter Zijlstra peterz@infradead.org
[ Upstream commit 5293c2efda37775346885c7e924d4ef7018ea60b ]
Part of what makes futex_unlock_pi() intricate is that rt_mutex_futex_unlock() -> rt_mutex_slowunlock() can drop rt_mutex::wait_lock.
This means it cannot rely on the atomicity of wait_lock, which would be preferred in order to not rely on hb->lock so much.
The reason rt_mutex_slowunlock() needs to drop wait_lock is because it can race with the rt_mutex fastpath, however futexes have their own fast path.
Since futexes already have a bunch of separate rt_mutex accessors, complete that set and implement a rt_mutex variant without fastpath for them.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.702962446@infradead.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
[Lee: Back-ported to solve a dependency]
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c                  | 30 ++++++++++-----------
 kernel/locking/rtmutex.c        | 56 +++++++++++++++++++++++++++++-----------
 kernel/locking/rtmutex_common.h |  8 ++++-
 3 files changed, 61 insertions(+), 33 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -941,7 +941,7 @@ static void exit_pi_state_list(struct ta pi_state->owner = NULL; raw_spin_unlock_irq(&curr->pi_lock);
- rt_mutex_unlock(&pi_state->pi_mutex); + rt_mutex_futex_unlock(&pi_state->pi_mutex);
spin_unlock(&hb->lock);
@@ -1441,20 +1441,18 @@ static int wake_futex_pi(u32 __user *uad pi_state->owner = new_owner; raw_spin_unlock(&new_owner->pi_lock);
- raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); - - deboost = rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); - /* - * First unlock HB so the waiter does not spin on it once he got woken - * up. Second wake up the waiter before the priority is adjusted. If we - * deboost first (and lose our higher priority), then the task might get - * scheduled away before the wake up can take place. + * We've updated the uservalue, this unlock cannot fail. */ + deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); - wake_up_q(&wake_q); - if (deboost) + + if (deboost) { + wake_up_q(&wake_q); rt_mutex_adjust_prio(current); + }
return 0; } @@ -2397,7 +2395,7 @@ static int fixup_owner(u32 __user *uaddr * task acquired the rt_mutex after we removed ourself from the * rt_mutex waiters list. */ - if (rt_mutex_trylock(&q->pi_state->pi_mutex)) { + if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) { locked = 1; goto out; } @@ -2721,7 +2719,7 @@ retry_private: if (!trylock) { ret = rt_mutex_timed_futex_lock(&q.pi_state->pi_mutex, to); } else { - ret = rt_mutex_trylock(&q.pi_state->pi_mutex); + ret = rt_mutex_futex_trylock(&q.pi_state->pi_mutex); /* Fixup the trylock return value: */ ret = ret ? 0 : -EWOULDBLOCK; } @@ -2744,7 +2742,7 @@ retry_private: * it and return the fault to userspace. */ if (ret && (rt_mutex_owner(&q.pi_state->pi_mutex) == current)) - rt_mutex_unlock(&q.pi_state->pi_mutex); + rt_mutex_futex_unlock(&q.pi_state->pi_mutex);
/* Unqueue and drop the lock */ unqueue_me_pi(&q); @@ -3051,7 +3049,7 @@ static int futex_wait_requeue_pi(u32 __u spin_lock(q.lock_ptr); ret = fixup_pi_state_owner(uaddr2, &q, current); if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) - rt_mutex_unlock(&q.pi_state->pi_mutex); + rt_mutex_futex_unlock(&q.pi_state->pi_mutex); /* * Drop the reference to the pi state which * the requeue_pi() code acquired for us. @@ -3094,7 +3092,7 @@ static int futex_wait_requeue_pi(u32 __u * userspace. */ if (ret && rt_mutex_owner(pi_mutex) == current) - rt_mutex_unlock(pi_mutex); + rt_mutex_futex_unlock(pi_mutex);
/* Unqueue and drop the lock. */ unqueue_me_pi(&q); --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -1519,15 +1519,23 @@ EXPORT_SYMBOL_GPL(rt_mutex_lock_interrup
/* * Futex variant with full deadlock detection. + * Futex variants must not use the fast-path, see __rt_mutex_futex_unlock(). */ -int rt_mutex_timed_futex_lock(struct rt_mutex *lock, +int __sched rt_mutex_timed_futex_lock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout) { might_sleep();
- return rt_mutex_timed_fastlock(lock, TASK_INTERRUPTIBLE, timeout, - RT_MUTEX_FULL_CHAINWALK, - rt_mutex_slowlock); + return rt_mutex_slowlock(lock, TASK_INTERRUPTIBLE, + timeout, RT_MUTEX_FULL_CHAINWALK); +} + +/* + * Futex variant, must not use fastpath. + */ +int __sched rt_mutex_futex_trylock(struct rt_mutex *lock) +{ + return rt_mutex_slowtrylock(lock); }
/** @@ -1586,20 +1594,38 @@ void __sched rt_mutex_unlock(struct rt_m EXPORT_SYMBOL_GPL(rt_mutex_unlock);
/** - * rt_mutex_futex_unlock - Futex variant of rt_mutex_unlock - * @lock: the rt_mutex to be unlocked - * - * Returns: true/false indicating whether priority adjustment is - * required or not. + * Futex variant, that since futex variants do not use the fast-path, can be + * simple and will not need to retry. */ -bool __sched rt_mutex_futex_unlock(struct rt_mutex *lock, - struct wake_q_head *wqh) +bool __sched __rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wake_q) { - if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) { - rt_mutex_deadlock_account_unlock(current); - return false; + lockdep_assert_held(&lock->wait_lock); + + debug_rt_mutex_unlock(lock); + + if (!rt_mutex_has_waiters(lock)) { + lock->owner = NULL; + return false; /* done */ + } + + mark_wakeup_next_waiter(wake_q, lock); + return true; /* deboost and wakeups */ +} + +void __sched rt_mutex_futex_unlock(struct rt_mutex *lock) +{ + WAKE_Q(wake_q); + bool deboost; + + raw_spin_lock_irq(&lock->wait_lock); + deboost = __rt_mutex_futex_unlock(lock, &wake_q); + raw_spin_unlock_irq(&lock->wait_lock); + + if (deboost) { + wake_up_q(&wake_q); + rt_mutex_adjust_prio(current); } - return rt_mutex_slowunlock(lock, wqh); }
/** --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -113,8 +113,12 @@ extern int rt_mutex_wait_proxy_lock(stru extern bool rt_mutex_cleanup_proxy_lock(struct rt_mutex *lock, struct rt_mutex_waiter *waiter); extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to); -extern bool rt_mutex_futex_unlock(struct rt_mutex *lock, - struct wake_q_head *wqh); +extern int rt_mutex_futex_trylock(struct rt_mutex *l); + +extern void rt_mutex_futex_unlock(struct rt_mutex *lock); +extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock, + struct wake_q_head *wqh); + extern void rt_mutex_adjust_prio(struct task_struct *task);
#ifdef CONFIG_DEBUG_RT_MUTEXES
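To make the fast-path/slow-path distinction above concrete: the rt_mutex fast path is a single compare-and-swap on the owner field, while the slow paths take rt_mutex::wait_lock to walk the waiter list, and the futex-only variants introduced here deliberately skip the fast path. A standalone sketch of that general shape, using hypothetical names and C11 atomics plus pthreads rather than the kernel primitives:

#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

struct demo_mutex {
	_Atomic(pthread_t *) owner;	/* NULL when the lock is free */
	pthread_mutex_t wait_lock;	/* protects the (elided) waiter list */
};

int demo_trylock_fastpath(struct demo_mutex *m, pthread_t *self)
{
	pthread_t *expected = NULL;

	/* Fast path: one compare-and-swap, wait_lock never taken. */
	return atomic_compare_exchange_strong(&m->owner, &expected, self);
}

void demo_unlock_slowpath(struct demo_mutex *m)
{
	/* Slow path: must hold wait_lock to inspect the waiter list. It is
	 * this fast-path/slow-path interleaving that the futex-only variants
	 * sidestep by never using the cmpxchg fast path at all. */
	pthread_mutex_lock(&m->wait_lock);
	/* ... hand the lock to the next waiter, or clear the owner ... */
	atomic_store(&m->owner, NULL);
	pthread_mutex_unlock(&m->wait_lock);
}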
From: Peter Zijlstra peterz@infradead.org
These are unused and clutter up the code.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.652692478@infradead.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
[Lee: Back-ported to solve a dependency]
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/locking/rtmutex-debug.c |  9 --------
 kernel/locking/rtmutex-debug.h |  3 --
 kernel/locking/rtmutex.c       | 42 +++++++++++++++--------------------------
 kernel/locking/rtmutex.h       |  2 -
 4 files changed, 16 insertions(+), 40 deletions(-)
--- a/kernel/locking/rtmutex-debug.c +++ b/kernel/locking/rtmutex-debug.c @@ -173,12 +173,3 @@ void debug_rt_mutex_init(struct rt_mutex lock->name = name; }
-void -rt_mutex_deadlock_account_lock(struct rt_mutex *lock, struct task_struct *task) -{ -} - -void rt_mutex_deadlock_account_unlock(struct task_struct *task) -{ -} - --- a/kernel/locking/rtmutex-debug.h +++ b/kernel/locking/rtmutex-debug.h @@ -9,9 +9,6 @@ * This file contains macros used solely by rtmutex.c. Debug version. */
-extern void -rt_mutex_deadlock_account_lock(struct rt_mutex *lock, struct task_struct *task); -extern void rt_mutex_deadlock_account_unlock(struct task_struct *task); extern void debug_rt_mutex_init_waiter(struct rt_mutex_waiter *waiter); extern void debug_rt_mutex_free_waiter(struct rt_mutex_waiter *waiter); extern void debug_rt_mutex_init(struct rt_mutex *lock, const char *name); --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -956,8 +956,6 @@ takeit: */ rt_mutex_set_owner(lock, task);
- rt_mutex_deadlock_account_lock(lock, task); - return 1; }
@@ -1365,8 +1363,6 @@ static bool __sched rt_mutex_slowunlock(
debug_rt_mutex_unlock(lock);
- rt_mutex_deadlock_account_unlock(current); - /* * We must be careful here if the fast path is enabled. If we * have no waiters queued we cannot set owner to NULL here @@ -1432,11 +1428,10 @@ rt_mutex_fastlock(struct rt_mutex *lock, struct hrtimer_sleeper *timeout, enum rtmutex_chainwalk chwalk)) { - if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) { - rt_mutex_deadlock_account_lock(lock, current); + if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) return 0; - } else - return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK); + + return slowfn(lock, state, NULL, RT_MUTEX_MIN_CHAINWALK); }
static inline int @@ -1448,21 +1443,19 @@ rt_mutex_timed_fastlock(struct rt_mutex enum rtmutex_chainwalk chwalk)) { if (chwalk == RT_MUTEX_MIN_CHAINWALK && - likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) { - rt_mutex_deadlock_account_lock(lock, current); + likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) return 0; - } else - return slowfn(lock, state, timeout, chwalk); + + return slowfn(lock, state, timeout, chwalk); }
static inline int rt_mutex_fasttrylock(struct rt_mutex *lock, int (*slowfn)(struct rt_mutex *lock)) { - if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) { - rt_mutex_deadlock_account_lock(lock, current); + if (likely(rt_mutex_cmpxchg_acquire(lock, NULL, current))) return 1; - } + return slowfn(lock); }
@@ -1472,19 +1465,18 @@ rt_mutex_fastunlock(struct rt_mutex *loc struct wake_q_head *wqh)) { WAKE_Q(wake_q); + bool deboost;
- if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) { - rt_mutex_deadlock_account_unlock(current); + if (likely(rt_mutex_cmpxchg_release(lock, current, NULL))) + return;
- } else { - bool deboost = slowfn(lock, &wake_q); + deboost = slowfn(lock, &wake_q);
- wake_up_q(&wake_q); + wake_up_q(&wake_q);
- /* Undo pi boosting if necessary: */ - if (deboost) - rt_mutex_adjust_prio(current); - } + /* Undo pi boosting if necessary: */ + if (deboost) + rt_mutex_adjust_prio(current); }
/** @@ -1682,7 +1674,6 @@ void rt_mutex_init_proxy_locked(struct r __rt_mutex_init(lock, NULL); debug_rt_mutex_proxy_lock(lock, proxy_owner); rt_mutex_set_owner(lock, proxy_owner); - rt_mutex_deadlock_account_lock(lock, proxy_owner); }
/** @@ -1698,7 +1689,6 @@ void rt_mutex_proxy_unlock(struct rt_mut { debug_rt_mutex_proxy_unlock(lock); rt_mutex_set_owner(lock, NULL); - rt_mutex_deadlock_account_unlock(proxy_owner); }
/** --- a/kernel/locking/rtmutex.h +++ b/kernel/locking/rtmutex.h @@ -11,8 +11,6 @@ */
#define rt_mutex_deadlock_check(l) (0) -#define rt_mutex_deadlock_account_lock(m, t) do { } while (0) -#define rt_mutex_deadlock_account_unlock(l) do { } while (0) #define debug_rt_mutex_init_waiter(w) do { } while (0) #define debug_rt_mutex_free_waiter(w) do { } while (0) #define debug_rt_mutex_lock(l) do { } while (0)
From: Peter Zijlstra peterz@infradead.org
[Upstream commit 73d786bd043ebc855f349c81ea805f6b11cbf2aa ]
There is a weird state in the futex_unlock_pi() path when it interleaves with a concurrent futex_lock_pi() at the point where it drops hb->lock.
In this case, it can happen that the rt_mutex wait_list and the futex_q disagree on pending waiters, in particular rt_mutex will find no pending waiters where futex_q thinks there are. In this case the rt_mutex unlock code cannot assign an owner.
The futex side fixup code has to cleanup the inconsistencies with quite a bunch of interesting corner cases.
Simplify all this by changing wake_futex_pi() to return -EAGAIN when this situation occurs. This then gives the futex_lock_pi() code the opportunity to continue and the retried futex_unlock_pi() will now observe a coherent state.
The only problem is that this breaks RT timeliness guarantees. That is, consider the following scenario:
T1 and T2 are both pinned to CPU0. prio(T2) > prio(T1)
CPU0
T1
  lock_pi()
  queue_me()  <- Waiter is visible

preemption

T2
  unlock_pi()
    loops with -EAGAIN forever
Which is undesirable for PI primitives. Future patches will rectify this.
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: juri.lelli@arm.com
Cc: bigeasy@linutronix.de
Cc: xlpang@redhat.com
Cc: rostedt@goodmis.org
Cc: mathieu.desnoyers@efficios.com
Cc: jdesfossez@efficios.com
Cc: dvhart@infradead.org
Cc: bristot@redhat.com
Link: http://lkml.kernel.org/r/20170322104151.850383690@infradead.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
[Lee: Back-ported to solve a dependency]
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c | 50 ++++++++++++++------------------------------------
 1 file changed, 14 insertions(+), 36 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -1394,12 +1394,19 @@ static int wake_futex_pi(u32 __user *uad new_owner = rt_mutex_next_owner(&pi_state->pi_mutex);
/* - * It is possible that the next waiter (the one that brought - * this owner to the kernel) timed out and is no longer - * waiting on the lock. + * When we interleave with futex_lock_pi() where it does + * rt_mutex_timed_futex_lock(), we might observe @this futex_q waiter, + * but the rt_mutex's wait_list can be empty (either still, or again, + * depending on which side we land). + * + * When this happens, give up our locks and try again, giving the + * futex_lock_pi() instance time to complete, either by waiting on the + * rtmutex or removing itself from the futex queue. */ - if (!new_owner) - new_owner = this->task; + if (!new_owner) { + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); + return -EAGAIN; + }
/* * We pass it to the next owner. The WAITERS bit is always @@ -2372,7 +2379,6 @@ static long futex_wait_restart(struct re */ static int fixup_owner(u32 __user *uaddr, struct futex_q *q, int locked) { - struct task_struct *owner; int ret = 0;
if (locked) { @@ -2386,43 +2392,15 @@ static int fixup_owner(u32 __user *uaddr }
/* - * Catch the rare case, where the lock was released when we were on the - * way back before we locked the hash bucket. - */ - if (q->pi_state->owner == current) { - /* - * Try to get the rt_mutex now. This might fail as some other - * task acquired the rt_mutex after we removed ourself from the - * rt_mutex waiters list. - */ - if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) { - locked = 1; - goto out; - } - - /* - * pi_state is incorrect, some other task did a lock steal and - * we returned due to timeout or signal without taking the - * rt_mutex. Too late. - */ - raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock); - owner = rt_mutex_owner(&q->pi_state->pi_mutex); - if (!owner) - owner = rt_mutex_next_owner(&q->pi_state->pi_mutex); - raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock); - ret = fixup_pi_state_owner(uaddr, q, owner); - goto out; - } - - /* * Paranoia check. If we did not take the lock, then we should not be * the owner of the rt_mutex. */ - if (rt_mutex_owner(&q->pi_state->pi_mutex) == current) + if (rt_mutex_owner(&q->pi_state->pi_mutex) == current) { printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p " "pi-state %p\n", ret, q->pi_state->pi_mutex.owner, q->pi_state->owner); + }
out: return ret ? ret : locked;
From: Peter Zijlstra peterz@infradead.org
commit c1e2f0eaf015fb7076d51a339011f2383e6dd389 upstream.
Julia reported futex state corruption in the following scenario:
waiter waker stealer (prio > waiter)
futex(WAIT_REQUEUE_PI, uaddr, uaddr2, timeout=[N ms]) futex_wait_requeue_pi() futex_wait_queue_me() freezable_schedule() <scheduled out> futex(LOCK_PI, uaddr2) futex(CMP_REQUEUE_PI, uaddr, uaddr2, 1, 0) /* requeues waiter to uaddr2 */ futex(UNLOCK_PI, uaddr2) wake_futex_pi() cmp_futex_value_locked(uaddr2, waiter) wake_up_q() <woken by waker> <hrtimer_wakeup() fires, clears sleeper->task> futex(LOCK_PI, uaddr2) __rt_mutex_start_proxy_lock() try_to_take_rt_mutex() /* steals lock */ rt_mutex_set_owner(lock, stealer) <preempted> <scheduled in> rt_mutex_wait_proxy_lock() __rt_mutex_slowlock() try_to_take_rt_mutex() /* fails, lock held by stealer */ if (timeout && !timeout->task) return -ETIMEDOUT; fixup_owner() /* lock wasn't acquired, so, fixup_pi_state_owner skipped */
return -ETIMEDOUT;
/*
 * At this point, we've returned -ETIMEDOUT to userspace, but the
 * futex word shows waiter to be the owner, and the pi_mutex has
 * stealer as the owner
 */
futex_lock(LOCK_PI, uaddr2) -> bails with EDEADLK, futex word says we're owner.
And suggested that what commit:
73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
removes from fixup_owner() looks to be just what is needed. And indeed it is -- I completely missed that requeue_pi could also result in this case. So we need to restore that, except that subsequent patches, like commit:
16ffa12d7425 ("futex: Pull rt_mutex_futex_unlock() out from under hb->lock")
changed all the locking rules. Even without that, the sequence:
-	if (rt_mutex_futex_trylock(&q->pi_state->pi_mutex)) {
-		locked = 1;
-		goto out;
-	}

-	raw_spin_lock_irq(&q->pi_state->pi_mutex.wait_lock);
-	owner = rt_mutex_owner(&q->pi_state->pi_mutex);
-	if (!owner)
-		owner = rt_mutex_next_owner(&q->pi_state->pi_mutex);
-	raw_spin_unlock_irq(&q->pi_state->pi_mutex.wait_lock);
-	ret = fixup_pi_state_owner(uaddr, q, owner);
already suggests there were races; otherwise we'd never have to look at next_owner.
So instead of doing 3 consecutive wait_lock sections with who knows what races, we do it all in a single section. Additionally, the usage of pi_state->owner in fixup_owner() was only safe because only the rt_mutex owner would modify it, which this additional case wrecks.
Luckily the values can only change away from, and not to, the value we're testing; this means we can do a speculative test and double check once we have the wait_lock.
Fixes: 73d786bd043e ("futex: Rework inconsistent rt_mutex/futex_q state")
Reported-by: Julia Cartwright julia@ni.com
Reported-by: Gratian Crisan gratian.crisan@ni.com
Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Tested-by: Julia Cartwright julia@ni.com
Tested-by: Gratian Crisan gratian.crisan@ni.com
Cc: Darren Hart dvhart@infradead.org
Link: https://lkml.kernel.org/r/20171208124939.7livp7no2ov65rrc@hirez.programming....
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
[Lee: Back-ported to solve a dependency]
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c                  | 80 +++++++++++++++++++++++++++++++++-------
 kernel/locking/rtmutex.c        | 26 +++++++++----
 kernel/locking/rtmutex_common.h |  1 +
 3 files changed, 87 insertions(+), 20 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -2262,30 +2262,34 @@ static void unqueue_me_pi(struct futex_q spin_unlock(q->lock_ptr); }
-/* - * Fixup the pi_state owner with the new owner. - * - * Must be called with hash bucket lock held and mm->sem held for non - * private futexes. - */ static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, - struct task_struct *newowner) + struct task_struct *argowner) { - u32 newtid = task_pid_vnr(newowner) | FUTEX_WAITERS; struct futex_pi_state *pi_state = q->pi_state; - struct task_struct *oldowner = pi_state->owner; u32 uval, uninitialized_var(curval), newval; + struct task_struct *oldowner, *newowner; + u32 newtid; int ret;
+ lockdep_assert_held(q->lock_ptr); + + oldowner = pi_state->owner; /* Owner died? */ if (!pi_state->owner) newtid |= FUTEX_OWNER_DIED;
/* - * We are here either because we stole the rtmutex from the - * previous highest priority waiter or we are the highest priority - * waiter but failed to get the rtmutex the first time. - * We have to replace the newowner TID in the user space variable. + * We are here because either: + * + * - we stole the lock and pi_state->owner needs updating to reflect + * that (@argowner == current), + * + * or: + * + * - someone stole our lock and we need to fix things to point to the + * new owner (@argowner == NULL). + * + * Either way, we have to replace the TID in the user space variable. * This must be atomic as we have to preserve the owner died bit here. * * Note: We write the user space value _before_ changing the pi_state @@ -2299,6 +2303,39 @@ static int fixup_pi_state_owner(u32 __us * in lookup_pi_state. */ retry: + if (!argowner) { + if (oldowner != current) { + /* + * We raced against a concurrent self; things are + * already fixed up. Nothing to do. + */ + return 0; + } + + if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) { + /* We got the lock after all, nothing to fix. */ + return 0; + } + + /* + * Since we just failed the trylock; there must be an owner. + */ + newowner = rt_mutex_owner(&pi_state->pi_mutex); + BUG_ON(!newowner); + } else { + WARN_ON_ONCE(argowner != current); + if (oldowner == current) { + /* + * We raced against a concurrent self; things are + * already fixed up. Nothing to do. + */ + return 0; + } + newowner = argowner; + } + + newtid = task_pid_vnr(newowner) | FUTEX_WAITERS; + if (get_futex_value_locked(&uval, uaddr)) goto handle_fault;
@@ -2385,12 +2422,29 @@ static int fixup_owner(u32 __user *uaddr /* * Got the lock. We might not be the anticipated owner if we * did a lock-steal - fix up the PI-state in that case: + * + * Speculative pi_state->owner read (we don't hold wait_lock); + * since we own the lock pi_state->owner == current is the + * stable state, anything else needs more attention. */ if (q->pi_state->owner != current) ret = fixup_pi_state_owner(uaddr, q, current); goto out; }
+ /* + * If we didn't get the lock; check if anybody stole it from us. In + * that case, we need to fix up the uval to point to them instead of + * us, otherwise bad things happen. [10] + * + * Another speculative read; pi_state->owner == current is unstable + * but needs our attention. + */ + if (q->pi_state->owner == current) { + ret = fixup_pi_state_owner(uaddr, q, NULL); + goto out; + } + /* * Paranoia check. If we did not take the lock, then we should not be * the owner of the rt_mutex. --- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -1314,6 +1314,19 @@ rt_mutex_slowlock(struct rt_mutex *lock, return ret; }
+static inline int __rt_mutex_slowtrylock(struct rt_mutex *lock) +{ + int ret = try_to_take_rt_mutex(lock, current, NULL); + + /* + * try_to_take_rt_mutex() sets the lock waiters bit + * unconditionally. Clean this up. + */ + fixup_rt_mutex_waiters(lock); + + return ret; +} + /* * Slow path try-lock function: */ @@ -1336,13 +1349,7 @@ static inline int rt_mutex_slowtrylock(s */ raw_spin_lock_irqsave(&lock->wait_lock, flags);
- ret = try_to_take_rt_mutex(lock, current, NULL); - - /* - * try_to_take_rt_mutex() sets the lock waiters bit - * unconditionally. Clean this up. - */ - fixup_rt_mutex_waiters(lock); + ret = __rt_mutex_slowtrylock(lock);
raw_spin_unlock_irqrestore(&lock->wait_lock, flags);
@@ -1530,6 +1537,11 @@ int __sched rt_mutex_futex_trylock(struc return rt_mutex_slowtrylock(lock); }
+int __sched __rt_mutex_futex_trylock(struct rt_mutex *lock) +{ + return __rt_mutex_slowtrylock(lock); +} + /** * rt_mutex_timed_lock - lock a rt_mutex interruptible * the timeout structure is provided --- a/kernel/locking/rtmutex_common.h +++ b/kernel/locking/rtmutex_common.h @@ -114,6 +114,7 @@ extern bool rt_mutex_cleanup_proxy_lock( struct rt_mutex_waiter *waiter); extern int rt_mutex_timed_futex_lock(struct rt_mutex *l, struct hrtimer_sleeper *to); extern int rt_mutex_futex_trylock(struct rt_mutex *l); +extern int __rt_mutex_futex_trylock(struct rt_mutex *l);
extern void rt_mutex_futex_unlock(struct rt_mutex *lock); extern bool __rt_mutex_futex_unlock(struct rt_mutex *lock,
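The "speculative test, then double check under wait_lock" approach described above is a general lock-free idiom: a racy read may give a definitive negative answer only because the value can never change towards the state being tested for, while a positive answer still has to be revalidated under the lock. A small standalone sketch of that shape (hypothetical names, pthreads and C11 atomics rather than the futex machinery):

#include <pthread.h>
#include <stdatomic.h>

struct demo_state {
	_Atomic(pthread_t) owner_tid;	/* may change concurrently          */
	pthread_mutex_t    lock;	/* serializes the actual fixup work */
};

/* Returns 1 if a fixup was performed, 0 if none was needed. */
int demo_fixup_if_owner(struct demo_state *st)
{
	/* Speculative, lock-free read: "not us" can never turn into "us",
	 * so a negative answer is final and we avoid taking the lock. */
	if (!pthread_equal(atomic_load(&st->owner_tid), pthread_self()))
		return 0;

	/* A positive answer is unstable and must be re-checked under the lock. */
	pthread_mutex_lock(&st->lock);
	if (pthread_equal(atomic_load(&st->owner_tid), pthread_self())) {
		/* still the owner: the actual (elided) fixup would go here */
		pthread_mutex_unlock(&st->lock);
		return 1;
	}
	pthread_mutex_unlock(&st->lock);
	return 0;
}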
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 04b79c55201f02ffd675e1231d731365e335c307 ]
If that unexpected case of inconsistent arguments ever happens then the futex state is left completely inconsistent and the printk is not really helpful. Replace it with a warning and make the state consistent.
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2447,14 +2447,10 @@ static int fixup_owner(u32 __user *uaddr
 
 	/*
 	 * Paranoia check. If we did not take the lock, then we should not be
-	 * the owner of the rt_mutex.
+	 * the owner of the rt_mutex. Warn and establish consistent state.
 	 */
-	if (rt_mutex_owner(&q->pi_state->pi_mutex) == current) {
-		printk(KERN_ERR "fixup_owner: ret = %d pi-mutex: %p "
-				"pi-state %p\n", ret,
-				q->pi_state->pi_mutex.owner,
-				q->pi_state->owner);
-	}
+	if (WARN_ON_ONCE(rt_mutex_owner(&q->pi_state->pi_mutex) == current))
+		return fixup_pi_state_owner(uaddr, q, current);
 
 out:
 	return ret ? ret : locked;
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit c5cade200ab9a2a3be9e7f32a752c8d86b502ec7 ]
Updating pi_state::owner is done at several places with the same code. Provide a function for it and use that at the obvious places.
This is also a preparation for a bug fix to avoid yet another copy of the same code or alternatively introducing a completely impenetrable mess of gotos.
Originally-by: Peter Zijlstra peterz@infradead.org
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c | 64 +++++++++++++++++++++++++++++----------------------------
 1 file changed, 33 insertions(+), 31 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -837,6 +837,29 @@ static struct futex_pi_state * alloc_pi_ return pi_state; }
+static void pi_state_update_owner(struct futex_pi_state *pi_state, + struct task_struct *new_owner) +{ + struct task_struct *old_owner = pi_state->owner; + + lockdep_assert_held(&pi_state->pi_mutex.wait_lock); + + if (old_owner) { + raw_spin_lock(&old_owner->pi_lock); + WARN_ON(list_empty(&pi_state->list)); + list_del_init(&pi_state->list); + raw_spin_unlock(&old_owner->pi_lock); + } + + if (new_owner) { + raw_spin_lock(&new_owner->pi_lock); + WARN_ON(!list_empty(&pi_state->list)); + list_add(&pi_state->list, &new_owner->pi_state_list); + pi_state->owner = new_owner; + raw_spin_unlock(&new_owner->pi_lock); + } +} + /* * Drops a reference to the pi_state object and frees or caches it * when the last reference is gone. @@ -1432,26 +1455,16 @@ static int wake_futex_pi(u32 __user *uad else ret = -EINVAL; } - if (ret) { - raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); - return ret; - } - - raw_spin_lock(&pi_state->owner->pi_lock); - WARN_ON(list_empty(&pi_state->list)); - list_del_init(&pi_state->list); - raw_spin_unlock(&pi_state->owner->pi_lock);
- raw_spin_lock(&new_owner->pi_lock); - WARN_ON(!list_empty(&pi_state->list)); - list_add(&pi_state->list, &new_owner->pi_state_list); - pi_state->owner = new_owner; - raw_spin_unlock(&new_owner->pi_lock); - - /* - * We've updated the uservalue, this unlock cannot fail. - */ - deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + if (!ret) { + /* + * This is a point of no return; once we modified the uval + * there is no going back and subsequent operations must + * not fail. + */ + pi_state_update_owner(pi_state, new_owner); + deboost = __rt_mutex_futex_unlock(&pi_state->pi_mutex, &wake_q); + }
raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); spin_unlock(&hb->lock); @@ -2353,19 +2366,8 @@ retry: * We fixed up user space. Now we need to fix the pi_state * itself. */ - if (pi_state->owner != NULL) { - raw_spin_lock_irq(&pi_state->owner->pi_lock); - WARN_ON(list_empty(&pi_state->list)); - list_del_init(&pi_state->list); - raw_spin_unlock_irq(&pi_state->owner->pi_lock); - } - - pi_state->owner = newowner; + pi_state_update_owner(pi_state, newowner);
- raw_spin_lock_irq(&newowner->pi_lock); - WARN_ON(!list_empty(&pi_state->list)); - list_add(&pi_state->list, &newowner->pi_state_list); - raw_spin_unlock_irq(&newowner->pi_lock); return 0;
/*
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 2156ac1934166d6deb6cd0f6ffc4c1076ec63697 ]

Nothing uses the argument. Remove it as preparation to use pi_state_update_owner().
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c                  | 2 +-
 kernel/locking/rtmutex.c        | 3 +--
 kernel/locking/rtmutex_common.h | 3 +--
 3 files changed, 3 insertions(+), 5 deletions(-)
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -883,7 +883,7 @@ static void put_pi_state(struct futex_pi
 		list_del_init(&pi_state->list);
 		raw_spin_unlock_irq(&pi_state->owner->pi_lock);
 
-		rt_mutex_proxy_unlock(&pi_state->pi_mutex, pi_state->owner);
+		rt_mutex_proxy_unlock(&pi_state->pi_mutex);
 	}
 
 	if (current->pi_state_cache)
--- a/kernel/locking/rtmutex.c
+++ b/kernel/locking/rtmutex.c
@@ -1696,8 +1696,7 @@ void rt_mutex_init_proxy_locked(struct r
  * No locking. Caller has to do serializing itself
  * Special API call for PI-futex support
  */
-void rt_mutex_proxy_unlock(struct rt_mutex *lock,
-			   struct task_struct *proxy_owner)
+void rt_mutex_proxy_unlock(struct rt_mutex *lock)
 {
 	debug_rt_mutex_proxy_unlock(lock);
 	rt_mutex_set_owner(lock, NULL);
--- a/kernel/locking/rtmutex_common.h
+++ b/kernel/locking/rtmutex_common.h
@@ -102,8 +102,7 @@ enum rtmutex_chainwalk {
 extern struct task_struct *rt_mutex_next_owner(struct rt_mutex *lock);
 extern void rt_mutex_init_proxy_locked(struct rt_mutex *lock,
 				       struct task_struct *proxy_owner);
-extern void rt_mutex_proxy_unlock(struct rt_mutex *lock,
-				  struct task_struct *proxy_owner);
+extern void rt_mutex_proxy_unlock(struct rt_mutex *lock);
 extern int rt_mutex_start_proxy_lock(struct rt_mutex *lock,
 				     struct rt_mutex_waiter *waiter,
 				     struct task_struct *task);
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 6ccc84f917d33312eb2846bd7b567639f585ad6d ]
No point in open coding it. This way it gains the extra sanity checks.
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -879,10 +879,7 @@ static void put_pi_state(struct futex_pi
 	 * and has cleaned up the pi_state already
 	 */
 	if (pi_state->owner) {
-		raw_spin_lock_irq(&pi_state->owner->pi_lock);
-		list_del_init(&pi_state->list);
-		raw_spin_unlock_irq(&pi_state->owner->pi_lock);
-
+		pi_state_update_owner(pi_state, NULL);
 		rt_mutex_proxy_unlock(&pi_state->pi_mutex);
 	}
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit f2dac39d93987f7de1e20b3988c8685523247ae2 ]
Too many gotos already and an upcoming fix would make it even more unreadable.
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c | 41 +++++++++++++++++++++++++++--------------
 1 file changed, 27 insertions(+), 14 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -2272,18 +2272,16 @@ static void unqueue_me_pi(struct futex_q spin_unlock(q->lock_ptr); }
-static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, - struct task_struct *argowner) +static int __fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, + struct task_struct *argowner) { struct futex_pi_state *pi_state = q->pi_state; - u32 uval, uninitialized_var(curval), newval; struct task_struct *oldowner, *newowner; - u32 newtid; - int ret; - - lockdep_assert_held(q->lock_ptr); + u32 uval, curval, newval, newtid; + int err = 0;
oldowner = pi_state->owner; + /* Owner died? */ if (!pi_state->owner) newtid |= FUTEX_OWNER_DIED; @@ -2324,7 +2322,7 @@ retry:
if (__rt_mutex_futex_trylock(&pi_state->pi_mutex)) { /* We got the lock after all, nothing to fix. */ - return 0; + return 1; }
/* @@ -2339,7 +2337,7 @@ retry: * We raced against a concurrent self; things are * already fixed up. Nothing to do. */ - return 0; + return 1; } newowner = argowner; } @@ -2380,7 +2378,7 @@ retry: handle_fault: spin_unlock(q->lock_ptr);
- ret = fault_in_user_writeable(uaddr); + err = fault_in_user_writeable(uaddr);
spin_lock(q->lock_ptr);
@@ -2388,12 +2386,27 @@ handle_fault: * Check if someone else fixed it for us: */ if (pi_state->owner != oldowner) - return 0; + return argowner == current; + + /* Retry if err was -EAGAIN or the fault in succeeded */ + if (!err) + goto retry;
- if (ret) - return ret; + return err; +} + +static int fixup_pi_state_owner(u32 __user *uaddr, struct futex_q *q, + struct task_struct *argowner) +{ + struct futex_pi_state *pi_state = q->pi_state; + int ret; + + lockdep_assert_held(q->lock_ptr);
- goto retry; + raw_spin_lock_irq(&pi_state->pi_mutex.wait_lock); + ret = __fixup_pi_state_owner(uaddr, q, argowner); + raw_spin_unlock_irq(&pi_state->pi_mutex.wait_lock); + return ret; }
static long futex_wait_restart(struct restart_block *restart);
From: Thomas Gleixner tglx@linutronix.de
fixup_pi_state_owner() tries to ensure that the state of the rtmutex, pi_state and the user space value related to the PI futex are consistent before returning to user space. In case that the user space value update faults and the fault cannot be resolved by faulting the page in via fault_in_user_writeable() the function returns with -EFAULT and leaves the rtmutex and pi_state owner state inconsistent.
A subsequent futex_unlock_pi() operates on the inconsistent pi_state and releases the rtmutex despite not owning it which can corrupt the RB tree of the rtmutex and cause a subsequent kernel stack use after free.
It was suggested to loop forever in fixup_pi_state_owner() if the fault cannot be resolved, but that results in runaway tasks which is especially undesired when the problem happens due to a programming error and not due to malice.
As the user space value cannot be fixed up, the proper solution is to make the rtmutex and the pi_state consistent so both have the same owner. This leaves the user space value out of sync. Any subsequent operation on the futex will fail because the 10th rule of PI futexes (pi_state owner and user space value are consistent) has been violated.
As a consequence this removes the inept attempts of 'fixing' the situation in case that the current task owns the rtmutex when returning with an unresolvable fault by unlocking the rtmutex which left pi_state::owner and rtmutex::owner out of sync in a different and only slightly less dangerous way.
Fixes: 1b7558e457ed ("futexes: fix fault handling in futex_lock_pi")
Reported-by: gzobqq@gmail.com
Signed-off-by: Thomas Gleixner tglx@linutronix.de
Acked-by: Peter Zijlstra (Intel) peterz@infradead.org
Cc: stable@vger.kernel.org
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/futex.c | 38 ++++++++++++++++++++------------------
 1 file changed, 20 insertions(+), 18 deletions(-)
--- a/kernel/futex.c +++ b/kernel/futex.c @@ -1017,7 +1017,8 @@ static void exit_pi_state_list(struct ta * FUTEX_OWNER_DIED bit. See [4] * * [10] There is no transient state which leaves owner and user space - * TID out of sync. + * TID out of sync. Except one error case where the kernel is denied + * write access to the user address, see fixup_pi_state_owner(). */
/* @@ -2392,6 +2393,24 @@ handle_fault: if (!err) goto retry;
+ /* + * fault_in_user_writeable() failed so user state is immutable. At + * best we can make the kernel state consistent but user state will + * be most likely hosed and any subsequent unlock operation will be + * rejected due to PI futex rule [10]. + * + * Ensure that the rtmutex owner is also the pi_state owner despite + * the user space value claiming something different. There is no + * point in unlocking the rtmutex if current is the owner as it + * would need to wait until the next waiter has taken the rtmutex + * to guarantee consistent state. Keep it simple. Userspace asked + * for this wreckaged state. + * + * The rtmutex has an owner - either current or some other + * task. See the EAGAIN loop above. + */ + pi_state_update_owner(pi_state, rt_mutex_owner(&pi_state->pi_mutex)); + return err; }
@@ -2777,13 +2796,6 @@ retry_private: if (res) ret = (res < 0) ? res : 0;
- /* - * If fixup_owner() faulted and was unable to handle the fault, unlock - * it and return the fault to userspace. - */ - if (ret && (rt_mutex_owner(&q.pi_state->pi_mutex) == current)) - rt_mutex_futex_unlock(&q.pi_state->pi_mutex); - /* Unqueue and drop the lock */ unqueue_me_pi(&q);
@@ -3088,8 +3100,6 @@ static int futex_wait_requeue_pi(u32 __u if (q.pi_state && (q.pi_state->owner != current)) { spin_lock(q.lock_ptr); ret = fixup_pi_state_owner(uaddr2, &q, current); - if (ret && rt_mutex_owner(&q.pi_state->pi_mutex) == current) - rt_mutex_futex_unlock(&q.pi_state->pi_mutex); /* * Drop the reference to the pi state which * the requeue_pi() code acquired for us. @@ -3126,14 +3136,6 @@ static int futex_wait_requeue_pi(u32 __u if (res) ret = (res < 0) ? res : 0;
- /* - * If fixup_pi_state_owner() faulted and was unable to handle - * the fault, unlock the rt_mutex and return the fault to - * userspace. - */ - if (ret && rt_mutex_owner(pi_mutex) == current) - rt_mutex_futex_unlock(pi_mutex); - /* Unqueue and drop the lock. */ unqueue_me_pi(&q); }
From: Javed Hasan jhasan@marvell.com
[ Upstream commit b2b0f16fa65e910a3ec8771206bb49ee87a54ac5 ]
A race condition exists between the response handler getting called because of exchange_mgr_reset() (which clears out all the active XIDs) and the response we get via an interrupt.
Sequence of events:
rport ba0200: Port timeout, state PLOGI
rport ba0200: Port entered PLOGI state from PLOGI state
xid 1052: Exchange timer armed : 20000 msecs

xid timer armed here

rport ba0200: Received LOGO request while in state PLOGI
rport ba0200: Delete port
rport ba0200: work event 3
rport ba0200: lld callback ev 3
bnx2fc: rport_event_hdlr: event = 3, port_id = 0xba0200
bnx2fc: ba0200 - rport not created Yet!!

/* Here we reset any outstanding exchanges before freeing rport using the exch_mgr_reset() */

xid 1052: Exchange timer canceled

/* Here we got two responses for one xid */

xid 1052: invoking resp(), esb 20000000 state 3
xid 1052: invoking resp(), esb 20000000 state 3
xid 1052: fc_rport_plogi_resp() : ep->resp_active 2
xid 1052: fc_rport_plogi_resp() : ep->resp_active 2
Skip the response if the exchange is already completed.
Link: https://lore.kernel.org/r/20201215194731.2326-1-jhasan@marvell.com
Signed-off-by: Javed Hasan jhasan@marvell.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/scsi/libfc/fc_exch.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/libfc/fc_exch.c b/drivers/scsi/libfc/fc_exch.c index d0a86ef806522..59fd6101f188b 100644 --- a/drivers/scsi/libfc/fc_exch.c +++ b/drivers/scsi/libfc/fc_exch.c @@ -1585,8 +1585,13 @@ static void fc_exch_recv_seq_resp(struct fc_exch_mgr *mp, struct fc_frame *fp) rc = fc_exch_done_locked(ep); WARN_ON(fc_seq_exch(sp) != ep); spin_unlock_bh(&ep->ex_lock); - if (!rc) + if (!rc) { fc_exch_delete(ep); + } else { + FC_EXCH_DBG(ep, "ep is completed already," + "hence skip calling the resp\n"); + goto skip_resp; + } }
/* @@ -1605,6 +1610,7 @@ static void fc_exch_recv_seq_resp(struct fc_exch_mgr *mp, struct fc_frame *fp) if (!fc_invoke_resp(ep, sp, fp)) fc_frame_free(fp);
+skip_resp: fc_exch_release(ep); return; rel: @@ -1848,10 +1854,16 @@ static void fc_exch_reset(struct fc_exch *ep)
fc_exch_hold(ep);
- if (!rc) + if (!rc) { fc_exch_delete(ep); + } else { + FC_EXCH_DBG(ep, "ep is completed already," + "hence skip calling the resp\n"); + goto skip_resp; + }
fc_invoke_resp(ep, sp, ERR_PTR(-FC_EX_CLOSED)); +skip_resp: fc_seq_set_resp(sp, NULL, ep->arg); fc_exch_release(ep); }
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 622d3b4e39381262da7b18ca1ed1311df227de86 ]
When using WEP, the default unicast key needs to be selected, instead of the STA PTK.
Signed-off-by: Felix Fietkau nbd@nbd.name
Link: https://lore.kernel.org/r/20201218184718.93650-5-nbd@nbd.name
Signed-off-by: Johannes Berg johannes.berg@intel.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 net/mac80211/rx.c | 2 ++
 1 file changed, 2 insertions(+)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 9be82ed02e0e5..c38d68131d02e 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -3802,6 +3802,8 @@ void ieee80211_check_fast_rx(struct sta_info *sta)
 
 	rcu_read_lock();
 	key = rcu_dereference(sta->ptk[sta->ptk_idx]);
+	if (!key)
+		key = rcu_dereference(sdata->default_unicast_key);
 	if (key) {
 		switch (key->conf.cipher) {
 		case WLAN_CIPHER_SUITE_TKIP:
From: Brian King brking@linux.vnet.ibm.com
[ Upstream commit 764907293edc1af7ac857389af9dc858944f53dc ]
While testing live partition mobility, we have observed occasional crashes of the Linux partition. What we've seen is that during the live migration, for specific configurations with large amounts of memory, slow network links, and workloads that are changing memory a lot, the partition can end up being suspended for 30 seconds or longer. This resulted in the following scenario:
CPU 0 CPU 1 ------------------------------- ---------------------------------- scsi_queue_rq migration_store -> blk_mq_start_request -> rtas_ibm_suspend_me -> blk_add_timer -> on_each_cpu(rtas_percpu_suspend_me _______________________________________V | V -> IPI from CPU 1 -> rtas_percpu_suspend_me -> __rtas_suspend_last_cpu
-- Linux partition suspended for > 30 seconds -- -> for_each_online_cpu(cpu) plpar_hcall_norets(H_PROD -> scsi_dispatch_cmd -> scsi_times_out -> scsi_abort_command -> queue_delayed_work -> ibmvfc_queuecommand_lck -> ibmvfc_send_event -> ibmvfc_send_crq - returns H_CLOSED <- returns SCSI_MLQUEUE_HOST_BUSY -> __blk_mq_requeue_request
-> scmd_eh_abort_handler -> scsi_try_to_abort_cmd - returns SUCCESS -> scsi_queue_insert
Normally, the SCMD_STATE_COMPLETE bit would protect against the command completion and the timeout, but that doesn't work here, since we don't check that at all in the SCSI_MLQUEUE_HOST_BUSY path.
In this case we end up calling scsi_queue_insert on a request that has already been queued, or possibly even freed, and we crash.
The patch below simply increases the default I/O timeout to avoid this race condition. This is also the timeout value that nearly all IBM SAN storage recommends setting as the default value.
Link: https://lore.kernel.org/r/1610463998-19791-1-git-send-email-brking@linux.vne...
Signed-off-by: Brian King brking@linux.vnet.ibm.com
Signed-off-by: Martin K. Petersen martin.petersen@oracle.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/scsi/ibmvscsi/ibmvfc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/ibmvscsi/ibmvfc.c b/drivers/scsi/ibmvscsi/ibmvfc.c
index 04b3ac17531db..7865feb8e5e83 100644
--- a/drivers/scsi/ibmvscsi/ibmvfc.c
+++ b/drivers/scsi/ibmvscsi/ibmvfc.c
@@ -2891,8 +2891,10 @@ static int ibmvfc_slave_configure(struct scsi_device *sdev)
 	unsigned long flags = 0;
 
 	spin_lock_irqsave(shost->host_lock, flags);
-	if (sdev->type == TYPE_DISK)
+	if (sdev->type == TYPE_DISK) {
 		sdev->allow_restart = 1;
+		blk_queue_rq_timeout(sdev->request_queue, 120 * HZ);
+	}
 	spin_unlock_irqrestore(shost->host_lock, flags);
 	return 0;
 }
From: Josh Poimboeuf jpoimboe@redhat.com
[ Upstream commit 1d489151e9f9d1647110277ff77282fe4d96d09b ]
Thanks to a recent binutils change which doesn't generate unused symbols, it's now possible for thunk_64.o to be completely empty without CONFIG_PREEMPTION: no text, no data, no symbols.
We could edit the Makefile to only build that file when CONFIG_PREEMPTION is enabled, but that will likely create confusion if/when the thunks end up getting used by some other code again.
Just ignore it and move on.
Reported-by: Nathan Chancellor natechancellor@gmail.com
Reviewed-by: Nathan Chancellor natechancellor@gmail.com
Reviewed-by: Miroslav Benes mbenes@suse.cz
Tested-by: Nathan Chancellor natechancellor@gmail.com
Link: https://github.com/ClangBuiltLinux/linux/issues/1254
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/objtool/elf.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/objtool/elf.c b/tools/objtool/elf.c
index d84c28eac262d..0ba5bb51bd93c 100644
--- a/tools/objtool/elf.c
+++ b/tools/objtool/elf.c
@@ -226,8 +226,11 @@ static int read_symbols(struct elf *elf)
 
 	symtab = find_section_by_name(elf, ".symtab");
 	if (!symtab) {
-		WARN("missing symbol table");
-		return -1;
+		/*
+		 * A missing symbol table is actually possible if it's an empty
+		 * .o file. This can happen for thunk_64.o.
+		 */
+		return 0;
 	}
 
 	symbols_nr = symtab->sh.sh_size / symtab->sh.sh_entsize;
Right now SUBLEVEL is overflowing, and some userspace may start treating 4.9.256 as 4.10. While out of tree modules have different ways of extracting the version number (and we're generally ok with breaking them), we do care about breaking userspace and it would appear that this overflow might do just that.
Our rules around userspace ABI in the stable kernel are pretty simple: we don't break it. Thus, while userspace may be checking major/minor, it shouldn't be doing anything with sublevel.
This patch applies a big band-aid to the 4.9 and 4.4 kernels in the form of clamping their sublevel to 255.
The clamp is done for the purpose of LINUX_VERSION_CODE only, and extracting the version number from the Makefile or "make kernelversion" will continue to work as intended.
We might need to do it later in newer trees, but maybe we'll have a better solution by then, so I'm ignoring that problem for now.
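For illustration, here is a minimal userspace demo of the collision being clamped; the KERNEL_VERSION() macro below is the same one the Makefile emits, everything else is just example code:

#include <stdio.h>

#define KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

int main(void)
{
	/* the sublevel only has 8 bits, so 4.9.256 collides with 4.10.0 */
	printf("4.9.255 -> 0x%06x\n", KERNEL_VERSION(4, 9, 255));
	printf("4.9.256 -> 0x%06x\n", KERNEL_VERSION(4, 9, 256));
	printf("4.10.0  -> 0x%06x\n", KERNEL_VERSION(4, 10, 0));
	return 0;
}

With the clamp in place, every 4.9.y release past .255 keeps reporting LINUX_VERSION_CODE 0x0409ff instead of spilling over into what looks like 4.10.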
Signed-off-by: Sasha Levin sashal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/Makefile
+++ b/Makefile
@@ -1141,7 +1141,7 @@ endef
 
 define filechk_version.h
 	(echo #define LINUX_VERSION_CODE $(shell \
-	expr $(VERSION) * 65536 + 0$(PATCHLEVEL) * 256 + 0$(SUBLEVEL)); \
+	expr $(VERSION) * 65536 + 0$(PATCHLEVEL) * 256 + 255); \
 	echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) + (c))';)
 endef
From: Pho Tran Pho.Tran@silabs.com
commit 3c4f6ecd93442f4376a58b38bb40ee0b8c46e0e6 upstream.
Add the VID/PID information for the WSDA-200-USB from LORD Corporation: VID 199b, PID ba30.
Signed-off-by: Pho Tran pho.tran@silabs.com [ johan: amend comment with product name ] Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -198,6 +198,7 @@ static const struct usb_device_id id_tab
 	{ USB_DEVICE(0x1901, 0x0194) }, /* GE Healthcare Remote Alarm Box */
 	{ USB_DEVICE(0x1901, 0x0195) }, /* GE B850/B650/B450 CP2104 DP UART interface */
 	{ USB_DEVICE(0x1901, 0x0196) }, /* GE B850 CP2105 DP UART interface */
+	{ USB_DEVICE(0x199B, 0xBA30) }, /* LORD WSDA-200-USB */
 	{ USB_DEVICE(0x19CF, 0x3000) }, /* Parrot NMEA GPS Flight Recorder */
 	{ USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */
 	{ USB_DEVICE(0x1B1C, 0x1C00) }, /* Corsair USB Dongle */
From: Chenxin Jin bg4akv@hotmail.com
commit 43377df70480f82919032eb09832e9646a8a5efb upstream.
Teraoka AD2000 uses the CP210x driver, but the chip VID/PID is customized with 0988/0578. We need the driver to support the new VID/PID.
Signed-off-by: Chenxin Jin bg4akv@hotmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -58,6 +58,7 @@ static const struct usb_device_id id_tab
 	{ USB_DEVICE(0x08e6, 0x5501) }, /* Gemalto Prox-PU/CU contactless smartcard reader */
 	{ USB_DEVICE(0x08FD, 0x000A) }, /* Digianswer A/S , ZigBee/802.15.4 MAC Device */
 	{ USB_DEVICE(0x0908, 0x01FF) }, /* Siemens RUGGEDCOM USB Serial Console */
+	{ USB_DEVICE(0x0988, 0x0578) }, /* Teraoka AD2000 */
 	{ USB_DEVICE(0x0B00, 0x3070) }, /* Ingenico 3070 */
 	{ USB_DEVICE(0x0BED, 0x1100) }, /* MEI (TM) Cashflow-SC Bill/Voucher Acceptor */
 	{ USB_DEVICE(0x0BED, 0x1101) }, /* MEI series 2000 Combo Acceptor */
From: Christoph Schemmel christoph.schemmel@gmail.com
commit e478d6029dca9d8462f426aee0d32896ef64f10f upstream.
Adding support for Cinterion device MV31 for enumeration with PID 0x00B3 and 0x00B7.
usb-devices output for 0x00B3

T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 6 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1
P: Vendor=1e2d ProdID=00b3 Rev=04.14
S: Manufacturer=Cinterion
S: Product=Cinterion PID 0x00B3 USB Mobile Broadband
S: SerialNumber=b3246eed
C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=896mA
I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim
I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=ff Prot=ff Driver=cdc_wdm
I: If#=0x4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
usb-devices output for 0x00B7

T: Bus=04 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=5000 MxCh= 0
D: Ver= 3.20 Cls=ef(misc ) Sub=02 Prot=01 MxPS= 9 #Cfgs= 1
P: Vendor=1e2d ProdID=00b7 Rev=04.14
S: Manufacturer=Cinterion
S: Product=Cinterion PID 0x00B3 USB Mobile Broadband
S: SerialNumber=b3246eed
C: #Ifs= 4 Cfg#= 1 Atr=a0 MxPwr=896mA
I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I: If#=0x1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option
I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=30 Driver=option
Signed-off-by: Christoph Schemmel christoph.schemmel@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -425,6 +425,8 @@ static void option_instat_callback(struc #define CINTERION_PRODUCT_AHXX_2RMNET 0x0084 #define CINTERION_PRODUCT_AHXX_AUDIO 0x0085 #define CINTERION_PRODUCT_CLS8 0x00b0 +#define CINTERION_PRODUCT_MV31_MBIM 0x00b3 +#define CINTERION_PRODUCT_MV31_RMNET 0x00b7
/* Olivetti products */ #define OLIVETTI_VENDOR_ID 0x0b3c @@ -1896,6 +1898,10 @@ static const struct usb_device_id option { USB_DEVICE(SIEMENS_VENDOR_ID, CINTERION_PRODUCT_HC25_MDMNET) }, { USB_DEVICE(SIEMENS_VENDOR_ID, CINTERION_PRODUCT_HC28_MDM) }, /* HC28 enumerates with Siemens or Cinterion VID depending on FW revision */ { USB_DEVICE(SIEMENS_VENDOR_ID, CINTERION_PRODUCT_HC28_MDMNET) }, + { USB_DEVICE_INTERFACE_CLASS(CINTERION_VENDOR_ID, CINTERION_PRODUCT_MV31_MBIM, 0xff), + .driver_info = RSVD(3)}, + { USB_DEVICE_INTERFACE_CLASS(CINTERION_VENDOR_ID, CINTERION_PRODUCT_MV31_RMNET, 0xff), + .driver_info = RSVD(0)}, { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD100), .driver_info = RSVD(4) }, { USB_DEVICE(OLIVETTI_VENDOR_ID, OLIVETTI_PRODUCT_OLICARD120),
From: Alexey Dobriyan adobriyan@gmail.com
[ Upstream commit a3a9060ecad030e2c7903b2b258383d2c716b56c ]
g++ reports
drivers/input/serio/i8042-x86ia64io.h:225:3: error: ‘.matches’ designator used multiple times in the same initializer list
C99 semantics is that the last duplicate initialiser wins, so the DMI entry gets silently overwritten.
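For illustration, a standalone snippet showing that C99 rule; the struct and strings below are stand-ins for the dmi_system_id table, not the driver code:

#include <stdio.h>

struct dmi_entry {
	const char *vendor;
	const char *product;
};

/* With duplicate designators in one initializer list the last one silently
 * wins; gcc only warns with -Woverride-init (part of -Wextra), while g++
 * rejects it with the error quoted above. */
static const struct dmi_entry entry = {
	.vendor  = "PEGATRON CORPORATION",
	.product = "C15B",
	.vendor  = "ByteSpeed LLC",              /* overrides the first .vendor */
	.product = "ByteSpeed Laptop C15B",      /* overrides the first .product */
};

int main(void)
{
	/* Only the ByteSpeed strings survive; the Pegatron entry is lost,
	 * which is exactly what the missing "}, {" caused in the table. */
	printf("%s / %s\n", entry.vendor, entry.product);
	return 0;
}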
Fixes: a48491c65b51 ("Input: i8042 - add ByteSpeed touchpad to noloop table") Signed-off-by: Alexey Dobriyan adobriyan@gmail.com Acked-by: Po-Hsu Lin po-hsu.lin@canonical.com Link: https://lore.kernel.org/r/20201228072335.GA27766@localhost.localdomain Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042-x86ia64io.h | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h
index fa07be0b4500e..2317f8d3fef6f 100644
--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -223,6 +223,8 @@ static const struct dmi_system_id __initconst i8042_dmi_noloop_table[] = {
 			DMI_MATCH(DMI_SYS_VENDOR, "PEGATRON CORPORATION"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "C15B"),
 		},
+	},
+	{
 		.matches = {
 			DMI_MATCH(DMI_SYS_VENDOR, "ByteSpeed LLC"),
 			DMI_MATCH(DMI_PRODUCT_NAME, "ByteSpeed Laptop C15B"),
From: Xie He xie.he.0141@gmail.com
[ Upstream commit 88c7a9fd9bdd3e453f04018920964c6f848a591a ]
When sending a packet, we will prepend it with an LAPB header. This modifies the shared parts of a cloned skb, so we should copy the skb rather than just clone it, before we prepend the header.
In "Documentation/networking/driver.rst" (the 2nd point), it states that drivers shouldn't modify the shared parts of a cloned skb when transmitting.
The "dev_queue_xmit_nit" function in "net/core/dev.c", which is called when an skb is being sent, clones the skb and sents the clone to AF_PACKET sockets. Because the LAPB drivers first remove a 1-byte pseudo-header before handing over the skb to us, if we don't copy the skb before prepending the LAPB header, the first byte of the packets received on AF_PACKET sockets can be corrupted.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Xie He xie.he.0141@gmail.com Acked-by: Martin Schiller ms@dev.tdt.de Link: https://lore.kernel.org/r/20210201055706.415842-1-xie.he.0141@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/lapb/lapb_out.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/lapb/lapb_out.c b/net/lapb/lapb_out.c
index 482c94d9d958a..d1c7dcc234486 100644
--- a/net/lapb/lapb_out.c
+++ b/net/lapb/lapb_out.c
@@ -87,7 +87,8 @@ void lapb_kick(struct lapb_cb *lapb)
 		skb = skb_dequeue(&lapb->write_queue);
 
 		do {
-			if ((skbn = skb_clone(skb, GFP_ATOMIC)) == NULL) {
+			skbn = skb_copy(skb, GFP_ATOMIC);
+			if (!skbn) {
 				skb_queue_head(&lapb->write_queue, skb);
 				break;
 			}
From: Arnd Bergmann arnd@arndb.de
commit 6e7b64b9dd6d96537d816ea07ec26b7dedd397b9 upstream.
kernel/elfcore.c only contains weak symbols, which triggers a bug with clang in combination with recordmcount:
Cannot find symbol for section 2: .text. kernel/elfcore.o: failed
Move the empty stubs into linux/elfcore.h as inline functions. As only two architectures use these, just use the architecture specific Kconfig symbols to key off the declaration.
Link: https://lkml.kernel.org/r/20201204165742.3815221-2-arnd@kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Cc: Nathan Chancellor natechancellor@gmail.com Cc: Nick Desaulniers ndesaulniers@google.com Cc: Barret Rhoden brho@google.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/elfcore.h | 22 ++++++++++++++++++++++ kernel/Makefile | 1 - kernel/elfcore.c | 25 ------------------------- 3 files changed, 22 insertions(+), 26 deletions(-) delete mode 100644 kernel/elfcore.c
--- a/include/linux/elfcore.h +++ b/include/linux/elfcore.h @@ -55,6 +55,7 @@ static inline int elf_core_copy_task_xfp } #endif
+#if defined(CONFIG_UM) || defined(CONFIG_IA64) /* * These functions parameterize elf_core_dump in fs/binfmt_elf.c to write out * extra segments containing the gate DSO contents. Dumping its @@ -69,5 +70,26 @@ elf_core_write_extra_phdrs(struct coredu extern int elf_core_write_extra_data(struct coredump_params *cprm); extern size_t elf_core_extra_data_size(void); +#else +static inline Elf_Half elf_core_extra_phdrs(void) +{ + return 0; +} + +static inline int elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) +{ + return 1; +} + +static inline int elf_core_write_extra_data(struct coredump_params *cprm) +{ + return 1; +} + +static inline size_t elf_core_extra_data_size(void) +{ + return 0; +} +#endif
#endif /* _LINUX_ELFCORE_H */ --- a/kernel/Makefile +++ b/kernel/Makefile @@ -90,7 +90,6 @@ obj-$(CONFIG_TASK_DELAY_ACCT) += delayac obj-$(CONFIG_TASKSTATS) += taskstats.o tsacct.o obj-$(CONFIG_TRACEPOINTS) += tracepoint.o obj-$(CONFIG_LATENCYTOP) += latencytop.o -obj-$(CONFIG_ELFCORE) += elfcore.o obj-$(CONFIG_FUNCTION_TRACER) += trace/ obj-$(CONFIG_TRACING) += trace/ obj-$(CONFIG_TRACE_CLOCK) += trace/ --- a/kernel/elfcore.c +++ /dev/null @@ -1,25 +0,0 @@ -#include <linux/elf.h> -#include <linux/fs.h> -#include <linux/mm.h> -#include <linux/binfmts.h> -#include <linux/elfcore.h> - -Elf_Half __weak elf_core_extra_phdrs(void) -{ - return 0; -} - -int __weak elf_core_write_extra_phdrs(struct coredump_params *cprm, loff_t offset) -{ - return 1; -} - -int __weak elf_core_write_extra_data(struct coredump_params *cprm) -{ - return 1; -} - -size_t __weak elf_core_extra_data_size(void) -{ - return 0; -}
From: Dan Carpenter dan.carpenter@oracle.com
commit 3e1f4a2e1184ae6ad7f4caf682ced9554141a0f4 upstream.
This code should return -ENOMEM if the allocation fails but it currently returns success.
Fixes: 9b95236eebdb ("usb: gadget: ether: allocate and init otg descriptor by otg capabilities") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Link: https://lore.kernel.org/r/YBKE9rqVuJEOUWpW@mwanda Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/gadget/legacy/ether.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/usb/gadget/legacy/ether.c
+++ b/drivers/usb/gadget/legacy/ether.c
@@ -407,8 +407,10 @@ static int eth_bind(struct usb_composite
 		struct usb_descriptor_header *usb_desc;
 
 		usb_desc = usb_otg_descriptor_alloc(gadget);
-		if (!usb_desc)
+		if (!usb_desc) {
+			status = -ENOMEM;
 			goto fail1;
+		}
 		usb_otg_descriptor_init(gadget, usb_desc);
 		otg_desc[0] = usb_desc;
 		otg_desc[1] = NULL;
From: Jeremy Figgins kernel@jeremyfiggins.com
commit d8c6edfa3f4ee0d45d7ce5ef18d1245b78774b9d upstream.
Some devices, such as the Winbond Electronics Corp. Virtual Com Port (Vendor=0416, ProdId=5011), lockup when usb_set_interface() or usb_clear_halt() are called. This device has only a single altsetting, so it should not be necessary to call usb_set_interface().
Acked-by: Pete Zaitcev zaitcev@redhat.com Signed-off-by: Jeremy Figgins kernel@jeremyfiggins.com Link: https://lore.kernel.org/r/YAy9kJhM/rG8EQXC@watson Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/usblp.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/drivers/usb/class/usblp.c +++ b/drivers/usb/class/usblp.c @@ -1349,14 +1349,17 @@ static int usblp_set_protocol(struct usb if (protocol < USBLP_FIRST_PROTOCOL || protocol > USBLP_LAST_PROTOCOL) return -EINVAL;
- alts = usblp->protocol[protocol].alt_setting; - if (alts < 0) - return -EINVAL; - r = usb_set_interface(usblp->dev, usblp->ifnum, alts); - if (r < 0) { - printk(KERN_ERR "usblp: can't set desired altsetting %d on interface %d\n", - alts, usblp->ifnum); - return r; + /* Don't unnecessarily set the interface if there's a single alt. */ + if (usblp->intf->num_altsetting > 1) { + alts = usblp->protocol[protocol].alt_setting; + if (alts < 0) + return -EINVAL; + r = usb_set_interface(usblp->dev, usblp->ifnum, alts); + if (r < 0) { + printk(KERN_ERR "usblp: can't set desired altsetting %d on interface %d\n", + alts, usblp->ifnum); + return r; + } }
usblp->bidir = (usblp->protocol[protocol].epread != NULL);
From: Heiko Stuebner heiko.stuebner@theobroma-systems.com
commit f670e9f9c8cac716c3506c6bac9e997b27ad441a upstream.
dwc2_hsotg_process_req_status uses ep_from_windex() to retrieve the endpoint for the index provided in the wIndex request param.
In a test-case with a rndis gadget running and sending a malformed packet to it like:

    dev.ctrl_transfer(
        0x82,      # bmRequestType
        0x00,      # bRequest
        0x0000,    # wValue
        0x0001,    # wIndex
        0x00       # wLength
    )

it is possible to cause a crash:
[ 217.533022] dwc2 ff300000.usb: dwc2_hsotg_process_req_status: USB_REQ_GET_STATUS
[ 217.559003] Unable to handle kernel read from unreadable memory at virtual address 0000000000000088
...
[ 218.313189] Call trace:
[ 218.330217]  ep_from_windex+0x3c/0x54
[ 218.348565]  usb_gadget_giveback_request+0x10/0x20
[ 218.368056]  dwc2_hsotg_complete_request+0x144/0x184
This happens because ep_from_windex wants to compare the endpoint direction even if index_to_ep() didn't return an endpoint due to the direction not matching.
The fix is easy insofar that the actual direction check is already happening when calling index_to_ep() which will return NULL if there is no endpoint for the targeted direction, so the offending check can go away completely.
Fixes: c6f5c050e2a7 ("usb: dwc2: gadget: add bi-directional endpoint support") Cc: stable@vger.kernel.org Reported-by: Gerhard Klostermeier gerhard.klostermeier@syss.de Signed-off-by: Heiko Stuebner heiko.stuebner@theobroma-systems.com Link: https://lore.kernel.org/r/20210127103919.58215-1-heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/dwc2/gadget.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/drivers/usb/dwc2/gadget.c
+++ b/drivers/usb/dwc2/gadget.c
@@ -942,7 +942,6 @@ static void dwc2_hsotg_complete_oursetup
 static struct dwc2_hsotg_ep *ep_from_windex(struct dwc2_hsotg *hsotg, u32 windex)
 {
-	struct dwc2_hsotg_ep *ep;
 	int dir = (windex & USB_DIR_IN) ? 1 : 0;
 	int idx = windex & 0x7F;
 
@@ -952,12 +951,7 @@ static struct dwc2_hsotg_ep *ep_from_win
 	if (idx > hsotg->num_of_eps)
 		return NULL;
 
-	ep = index_to_ep(hsotg, idx, dir);
-
-	if (idx && ep->dir_in != dir)
-		return NULL;
-
-	return ep;
+	return index_to_ep(hsotg, idx, dir);
 }
/**
From: Felix Fietkau nbd@nbd.name
commit 18fe0fae61252b5ae6e26553e2676b5fac555951 upstream.
If the driver uses .sta_add, station entries are only uploaded after the sta is in assoc state. Fix early station rate table updates by deferring them until the sta has been uploaded.
Cc: stable@vger.kernel.org Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20210201083324.3134-1-nbd@nbd.name [use rcu_access_pointer() instead since we won't dereference here] Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mac80211/driver-ops.c | 5 ++++- net/mac80211/rate.c | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/net/mac80211/driver-ops.c +++ b/net/mac80211/driver-ops.c @@ -128,8 +128,11 @@ int drv_sta_state(struct ieee80211_local } else if (old_state == IEEE80211_STA_AUTH && new_state == IEEE80211_STA_ASSOC) { ret = drv_sta_add(local, sdata, &sta->sta); - if (ret == 0) + if (ret == 0) { sta->uploaded = true; + if (rcu_access_pointer(sta->sta.rates)) + drv_sta_rate_tbl_update(local, sdata, &sta->sta); + } } else if (old_state == IEEE80211_STA_ASSOC && new_state == IEEE80211_STA_AUTH) { drv_sta_remove(local, sdata, &sta->sta); --- a/net/mac80211/rate.c +++ b/net/mac80211/rate.c @@ -892,7 +892,8 @@ int rate_control_set_rates(struct ieee80 if (old) kfree_rcu(old, rcu_head);
- drv_sta_rate_tbl_update(hw_to_local(hw), sta->sdata, pubsta); + if (sta->uploaded) + drv_sta_rate_tbl_update(hw_to_local(hw), sta->sdata, pubsta);
return 0; }
From: Wang ShaoBo bobo.shaobowang@huawei.com
commit 0188b87899ffc4a1d36a0badbe77d56c92fd91dc upstream.
Our system encountered a re-init error when re-registering the same kretprobe, where the kretprobe_instance in rp->free_instances is illegally accessed after re-init.
An implementation to avoid re-registration has been introduced for kprobe before, but it is missing for register_kretprobe(). We must check if the kprobe has been re-registered before re-initializing the kretprobe, otherwise it will destroy the data struct of the registered kretprobe, which can lead to a memory leak, a system crash, and other unexpected behaviors.
We use check_kprobe_rereg() to check if the kprobe has been re-registered before running register_kretprobe()'s body, giving a warning message and terminating the registration process.
Link: https://lkml.kernel.org/r/20210128124427.2031088-1-bobo.shaobowang@huawei.co...
Cc: stable@vger.kernel.org Fixes: 1f0ab40976460 ("kprobes: Prevent re-registration of the same kprobe") [ The above commit should have been done for kretprobes too ] Acked-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Acked-by: Ananth N Mavinakayanahalli ananth@linux.ibm.com Acked-by: Masami Hiramatsu mhiramat@kernel.org Signed-off-by: Wang ShaoBo bobo.shaobowang@huawei.com Signed-off-by: Cheng Jian cj.chengjian@huawei.com Signed-off-by: Steven Rostedt (VMware) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kprobes.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1884,6 +1884,10 @@ int register_kretprobe(struct kretprobe
 	int i;
 	void *addr;
 
+	/* If only rp->kp.addr is specified, check reregistering kprobes */
+	if (rp->kp.addr && check_kprobe_rereg(&rp->kp))
+		return -EINVAL;
+
 	if (kretprobe_blacklist_size) {
 		addr = kprobe_addr(&rp->kp);
 		if (IS_ERR(addr))
From: Mathias Nyman mathias.nyman@linux.intel.com
commit d4a610635400ccc382792f6be69427078541c678 upstream.
xhci driver may in some special cases need to copy small amounts of payload data to a bounce buffer in order to meet the boundary and alignment restrictions set by the xHCI specification.
In the majority of these cases the data is in a sg list, and the driver incorrectly assumed that data is always in urb->sg when using the bounce buffer.
If data instead is contiguous, and in urb->transfer_buffer, we may still need to bounce buffer a small part if data starts very close (less than packet size) to a 64k boundary.
Check if sg list is used before copying data to/from it.
Fixes: f9c589e142d0 ("xhci: TD-fragment, align the unsplittable case with a bounce buffer") Cc: stable@vger.kernel.org Reported-by: Andreas Hartmann andihartmann@01019freenet.de Tested-by: Andreas Hartmann andihartmann@01019freenet.de Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20210203113702.436762-2-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-ring.c | 31 ++++++++++++++++++++----------- 1 file changed, 20 insertions(+), 11 deletions(-)
--- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -692,11 +692,16 @@ void xhci_unmap_td_bounce_buffer(struct dma_unmap_single(dev, seg->bounce_dma, ring->bounce_buf_len, DMA_FROM_DEVICE); /* for in tranfers we need to copy the data from bounce to sg */ - len = sg_pcopy_from_buffer(urb->sg, urb->num_sgs, seg->bounce_buf, - seg->bounce_len, seg->bounce_offs); - if (len != seg->bounce_len) - xhci_warn(xhci, "WARN Wrong bounce buffer read length: %zu != %d\n", - len, seg->bounce_len); + if (urb->num_sgs) { + len = sg_pcopy_from_buffer(urb->sg, urb->num_sgs, seg->bounce_buf, + seg->bounce_len, seg->bounce_offs); + if (len != seg->bounce_len) + xhci_warn(xhci, "WARN Wrong bounce buffer read length: %zu != %d\n", + len, seg->bounce_len); + } else { + memcpy(urb->transfer_buffer + seg->bounce_offs, seg->bounce_buf, + seg->bounce_len); + } seg->bounce_len = 0; seg->bounce_offs = 0; } @@ -3196,12 +3201,16 @@ static int xhci_align_td(struct xhci_hcd
/* create a max max_pkt sized bounce buffer pointed to by last trb */ if (usb_urb_dir_out(urb)) { - len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, - seg->bounce_buf, new_buff_len, enqd_len); - if (len != new_buff_len) - xhci_warn(xhci, - "WARN Wrong bounce buffer write length: %zu != %d\n", - len, new_buff_len); + if (urb->num_sgs) { + len = sg_pcopy_to_buffer(urb->sg, urb->num_sgs, + seg->bounce_buf, new_buff_len, enqd_len); + if (len != new_buff_len) + xhci_warn(xhci, "WARN Wrong bounce buffer write length: %zu != %d\n", + len, new_buff_len); + } else { + memcpy(seg->bounce_buf, urb->transfer_buffer + enqd_len, new_buff_len); + } + seg->bounce_dma = dma_map_single(dev, seg->bounce_buf, max_pkt, DMA_TO_DEVICE); } else {
From: Aurelien Aptel aaptel@suse.com
commit 21b200d091826a83aafc95d847139b2b0582f6d1 upstream.
Assuming
- //HOST/a is mounted on /mnt
- //HOST/b is mounted on /mnt/b

On a slow connection, running 'df' and killing it while it's processing /mnt/b can make cifs_get_inode_info() return -ERESTARTSYS.

This triggers the following chain of events:
=> the dentry revalidation fails
=> the dentry is put and released
=> the superblock associated with the dentry is put
=> /mnt/b is unmounted
This patch makes cifs_d_revalidate() return the error instead of 0 (invalid) when cifs_revalidate_dentry() fails, except for ENOENT (file deleted) and ESTALE (file recreated).
Signed-off-by: Aurelien Aptel aaptel@suse.com Suggested-by: Shyam Prasad N nspmangalore@gmail.com Reviewed-by: Shyam Prasad N nspmangalore@gmail.com CC: stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/dir.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
--- a/fs/cifs/dir.c +++ b/fs/cifs/dir.c @@ -830,6 +830,7 @@ static int cifs_d_revalidate(struct dentry *direntry, unsigned int flags) { struct inode *inode; + int rc;
if (flags & LOOKUP_RCU) return -ECHILD; @@ -839,8 +840,25 @@ cifs_d_revalidate(struct dentry *direntr if ((flags & LOOKUP_REVAL) && !CIFS_CACHE_READ(CIFS_I(inode))) CIFS_I(inode)->time = 0; /* force reval */
- if (cifs_revalidate_dentry(direntry)) - return 0; + rc = cifs_revalidate_dentry(direntry); + if (rc) { + cifs_dbg(FYI, "cifs_revalidate_dentry failed with rc=%d", rc); + switch (rc) { + case -ENOENT: + case -ESTALE: + /* + * Those errors mean the dentry is invalid + * (file was deleted or recreated) + */ + return 0; + default: + /* + * Otherwise some unexpected error happened + * report it as-is to VFS layer + */ + return rc; + } + } else { /* * If the inode wasn't known to be a dfs entry when
From: Fengnan Chang fengnanchang@gmail.com
commit f92e04f764b86e55e522988e6f4b6082d19a2721 upstream.
When analysing tuples fails we may loop indefinitely to retry. Let's avoid this by using a 10s timeout and bail if not completed earlier.
Signed-off-by: Fengnan Chang fengnanchang@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20210123033230.36442-1-fengnanchang@gmail.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/sdio_cis.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/mmc/core/sdio_cis.c +++ b/drivers/mmc/core/sdio_cis.c @@ -24,6 +24,8 @@ #include "sdio_cis.h" #include "sdio_ops.h"
+#define SDIO_READ_CIS_TIMEOUT_MS (10 * 1000) /* 10s */ + static int cistpl_vers_1(struct mmc_card *card, struct sdio_func *func, const unsigned char *buf, unsigned size) { @@ -269,6 +271,8 @@ static int sdio_read_cis(struct mmc_card
do { unsigned char tpl_code, tpl_link; + unsigned long timeout = jiffies + + msecs_to_jiffies(SDIO_READ_CIS_TIMEOUT_MS);
ret = mmc_io_rw_direct(card, 0, 0, ptr++, 0, &tpl_code); if (ret) @@ -321,6 +325,8 @@ static int sdio_read_cis(struct mmc_card prev = &this->next;
if (ret == -ENOENT) { + if (time_after(jiffies, timeout)) + break; /* warn about unknown tuples */ pr_warn_ratelimited("%s: queuing unknown" " CIS tuple 0x%02x (%u bytes)\n",
From: Russell King rmk+kernel@armlinux.org.uk
commit 39d3454c3513840eb123b3913fda6903e45ce671 upstream.
Building with gcc 4.9.2 reveals a latent bug in the PCI accessors for Footbridge platforms, which causes a fatal alignment fault while accessing IO memory. Fix this by making the assembly volatile.
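For background, the GCC rule being relied on here, shown with a standalone x86 example (the kernel change itself is ARM assembly; the functions below are illustrative, not kernel code): an asm statement that has output operands but no "volatile" is treated as a pure computation, so the compiler may delete, duplicate, or move it, which is exactly what must not happen for I/O accesses with side effects.

#include <stdint.h>

static uint32_t mmio_read_plain(const volatile uint32_t *addr)
{
	uint32_t v;
	/* No "volatile": if v ends up unused, GCC may drop this load
	 * entirely or reorder it relative to surrounding accesses. */
	asm("movl (%1), %0" : "=r" (v) : "r" (addr));
	return v;
}

static uint32_t mmio_read_volatile(const volatile uint32_t *addr)
{
	uint32_t v;
	/* "volatile" forces the access to be emitted exactly as written. */
	asm volatile("movl (%1), %0" : "=r" (v) : "r" (addr));
	return v;
}

int main(void)
{
	uint32_t fake_reg = 42;

	mmio_read_plain(&fake_reg);     /* result unused: the load may vanish at -O2 */
	return mmio_read_volatile(&fake_reg) == 42 ? 0 : 1;
}

Comparing the -O2 assembly of the two helpers shows the difference the one-word change below makes for the config-space accessors.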
Cc: stable@vger.kernel.org Signed-off-by: Russell King rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/mach-footbridge/dc21285.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/arch/arm/mach-footbridge/dc21285.c +++ b/arch/arm/mach-footbridge/dc21285.c @@ -69,15 +69,15 @@ dc21285_read_config(struct pci_bus *bus, if (addr) switch (size) { case 1: - asm("ldrb %0, [%1, %2]" + asm volatile("ldrb %0, [%1, %2]" : "=r" (v) : "r" (addr), "r" (where) : "cc"); break; case 2: - asm("ldrh %0, [%1, %2]" + asm volatile("ldrh %0, [%1, %2]" : "=r" (v) : "r" (addr), "r" (where) : "cc"); break; case 4: - asm("ldr %0, [%1, %2]" + asm volatile("ldr %0, [%1, %2]" : "=r" (v) : "r" (addr), "r" (where) : "cc"); break; } @@ -103,17 +103,17 @@ dc21285_write_config(struct pci_bus *bus if (addr) switch (size) { case 1: - asm("strb %0, [%1, %2]" + asm volatile("strb %0, [%1, %2]" : : "r" (value), "r" (addr), "r" (where) : "cc"); break; case 2: - asm("strh %0, [%1, %2]" + asm volatile("strh %0, [%1, %2]" : : "r" (value), "r" (addr), "r" (where) : "cc"); break; case 4: - asm("str %0, [%1, %2]" + asm volatile("str %0, [%1, %2]" : : "r" (value), "r" (addr), "r" (where) : "cc"); break;
From: Muchun Song songmuchun@bytedance.com
commit 585fc0d2871c9318c949fbf45b1f081edd489e96 upstream.
If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong.
Only export set_page_huge_active and just leave clear_page_huge_active as static, because there are no external users.
Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e36f5 (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: Muchun Song songmuchun@bytedance.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: David Hildenbrand david@redhat.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hugetlbfs/inode.c | 3 ++- include/linux/hugetlb.h | 3 +++ mm/hugetlb.c | 2 +- 3 files changed, 6 insertions(+), 2 deletions(-)
--- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -665,8 +665,9 @@ static long hugetlbfs_fallocate(struct f
mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+ set_page_huge_active(page); /* - * page_put due to reference from alloc_huge_page() + * put_page() due to reference from alloc_huge_page() * unlock_page because locked by add_to_page_cache() */ put_page(page); --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -502,6 +502,9 @@ static inline void hugetlb_count_sub(lon { atomic_long_sub(l, &mm->hugetlb_usage); } + +void set_page_huge_active(struct page *page); + #else /* CONFIG_HUGETLB_PAGE */ struct hstate {}; #define alloc_huge_page(v, a, r) NULL --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1215,7 +1215,7 @@ bool page_huge_active(struct page *page) }
/* never called for tail page */ -static void set_page_huge_active(struct page *page) +void set_page_huge_active(struct page *page) { VM_BUG_ON_PAGE(!PageHeadHuge(page), page); SetPagePrivate(&page[1]);
From: Muchun Song songmuchun@bytedance.com
commit 0eb2df2b5629794020f75e94655e1994af63f0d4 upstream.
There is a race between isolate_huge_page() and __free_huge_page().
CPU0:                                   CPU1:

if (PageHuge(page))
                                        put_page(page)
                                          __free_huge_page(page)
                                            spin_lock(&hugetlb_lock)
                                            update_and_free_page(page)
                                              set_compound_page_dtor(page, NULL_COMPOUND_DTOR)
                                            spin_unlock(&hugetlb_lock)
isolate_huge_page(page)
  // trigger BUG_ON
  VM_BUG_ON_PAGE(!PageHead(page), page)
  spin_lock(&hugetlb_lock)
  page_huge_active(page)
    // trigger BUG_ON
    VM_BUG_ON_PAGE(!PageHuge(page), page)
  spin_unlock(&hugetlb_lock)
When we isolate a HugeTLB page on CPU0 while it is simultaneously being freed to the buddy allocator on CPU1, we can trigger a BUG_ON on CPU0, because the page has already been freed to the buddy allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage") Signed-off-by: Muchun Song songmuchun@bytedance.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: David Hildenbrand david@redhat.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/hugetlb.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4657,9 +4657,9 @@ bool isolate_huge_page(struct page *page
 {
 	bool ret = true;
 
-	VM_BUG_ON_PAGE(!PageHead(page), page);
 	spin_lock(&hugetlb_lock);
-	if (!page_huge_active(page) || !get_page_unless_zero(page)) {
+	if (!PageHeadHuge(page) || !page_huge_active(page) ||
+	    !get_page_unless_zero(page)) {
 		ret = false;
 		goto unlock;
 	}
From: Muchun Song songmuchun@bytedance.com
commit ecbf4724e6061b4b01be20f6d797d64d462b2bc8 upstream.
page_huge_active() can be called from scan_movable_pages(), which does not hold a reference count on the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed in parallel. Then we will trigger the BUG_ON in page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE.
Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com Fixes: 7e1f049efb86 ("mm: hugetlb: cleanup using paeg_huge_active()") Signed-off-by: Muchun Song songmuchun@bytedance.com Reviewed-by: Mike Kravetz mike.kravetz@oracle.com Acked-by: Michal Hocko mhocko@suse.com Reviewed-by: Oscar Salvador osalvador@suse.de Cc: David Hildenbrand david@redhat.com Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/hugetlb.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1210,8 +1210,7 @@ struct hstate *size_to_hstate(unsigned l
  */
 bool page_huge_active(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageHuge(page), page);
-	return PageHead(page) && PagePrivate(&page[1]);
+	return PageHeadHuge(page) && PagePrivate(&page[1]);
 }
/* never called for tail page */
From: Hugh Dickins hughd@google.com
commit 1c2f67308af4c102b4e1e6cd6f69819ae59408e0 upstream.
Sergey reported deadlock between kswapd correctly doing its usual lock_page(page) followed by down_read(page->mapping->i_mmap_rwsem), and madvise(MADV_REMOVE) on an madvise(MADV_HUGEPAGE) area doing down_write(page->mapping->i_mmap_rwsem) followed by lock_page(page).
This happened when shmem_fallocate(punch hole)'s unmap_mapping_range() reaches zap_pmd_range()'s call to __split_huge_pmd(). The same deadlock could occur when partially truncating a mapped huge tmpfs file, or using fallocate(FALLOC_FL_PUNCH_HOLE) on it.
__split_huge_pmd()'s page lock was added in 5.8, to make sure that any concurrent use of reuse_swap_page() (holding page lock) could not catch the anon THP's mapcounts and swapcounts while they were being split.
Fortunately, reuse_swap_page() is never applied to a shmem or file THP (not even by khugepaged, which checks PageSwapCache before calling), and anonymous THPs are never created in shmem or file areas: so that __split_huge_pmd()'s page lock can only be necessary for anonymous THPs, on which there is no risk of deadlock with i_mmap_rwsem.
Link: https://lkml.kernel.org/r/alpine.LSU.2.11.2101161409470.2022@eggly.anvils Fixes: c444eb564fb1 ("mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()") Signed-off-by: Hugh Dickins hughd@google.com Reported-by: Sergey Senozhatsky sergey.senozhatsky.work@gmail.com Reviewed-by: Andrea Arcangeli aarcange@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/huge_memory.c | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1753,7 +1753,7 @@ void __split_huge_pmd(struct vm_area_str spinlock_t *ptl; struct mm_struct *mm = vma->vm_mm; unsigned long haddr = address & HPAGE_PMD_MASK; - bool was_locked = false; + bool do_unlock_page = false; pmd_t _pmd;
mmu_notifier_invalidate_range_start(mm, haddr, haddr + HPAGE_PMD_SIZE); @@ -1766,7 +1766,6 @@ void __split_huge_pmd(struct vm_area_str VM_BUG_ON(freeze && !page); if (page) { VM_WARN_ON_ONCE(!PageLocked(page)); - was_locked = true; if (page != pmd_page(*pmd)) goto out; } @@ -1775,19 +1774,29 @@ repeat: if (pmd_trans_huge(*pmd)) { if (!page) { page = pmd_page(*pmd); - if (unlikely(!trylock_page(page))) { - get_page(page); - _pmd = *pmd; - spin_unlock(ptl); - lock_page(page); - spin_lock(ptl); - if (unlikely(!pmd_same(*pmd, _pmd))) { - unlock_page(page); + /* + * An anonymous page must be locked, to ensure that a + * concurrent reuse_swap_page() sees stable mapcount; + * but reuse_swap_page() is not used on shmem or file, + * and page lock must not be taken when zap_pmd_range() + * calls __split_huge_pmd() while i_mmap_lock is held. + */ + if (PageAnon(page)) { + if (unlikely(!trylock_page(page))) { + get_page(page); + _pmd = *pmd; + spin_unlock(ptl); + lock_page(page); + spin_lock(ptl); + if (unlikely(!pmd_same(*pmd, _pmd))) { + unlock_page(page); + put_page(page); + page = NULL; + goto repeat; + } put_page(page); - page = NULL; - goto repeat; } - put_page(page); + do_unlock_page = true; } } if (PageMlocked(page)) @@ -1797,7 +1806,7 @@ repeat: __split_huge_pmd_locked(vma, pmd, haddr, freeze); out: spin_unlock(ptl); - if (!was_locked && page) + if (do_unlock_page) unlock_page(page); mmu_notifier_invalidate_range_end(mm, haddr, haddr + HPAGE_PMD_SIZE); }
From: Josh Poimboeuf jpoimboe@redhat.com
commit 20bf2b378729c4a0366a53e2018a0b70ace94bcd upstream.
With retpolines disabled, some configurations of GCC, and specifically the GCC versions 9 and 10 in Ubuntu will add Intel CET instrumentation to the kernel by default. That breaks certain tracing scenarios by adding a superfluous ENDBR64 instruction before the fentry call, for functions which can be called indirectly.
CET instrumentation isn't currently necessary in the kernel, as CET is only supported in user space. Disable it unconditionally and move it into the x86's Makefile as CET/CFI... enablement should be a per-arch decision anyway.
[ bp: Massage and extend commit message. ]
Fixes: 29be86d7f9cb ("kbuild: add -fcf-protection=none when using retpoline flags") Reported-by: Nikolay Borisov nborisov@suse.com Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Nikolay Borisov nborisov@suse.com Tested-by: Nikolay Borisov nborisov@suse.com Cc: stable@vger.kernel.org Cc: Seth Forshee seth.forshee@canonical.com Cc: Masahiro Yamada yamada.masahiro@socionext.com Link: https://lkml.kernel.org/r/20210128215219.6kct3h2eiustncws@treble Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Makefile | 6 ------ arch/x86/Makefile | 3 +++ 2 files changed, 3 insertions(+), 6 deletions(-)
--- a/Makefile +++ b/Makefile @@ -841,12 +841,6 @@ KBUILD_CFLAGS += $(call cc-option,-Wer # change __FILE__ to the relative path from the srctree KBUILD_CFLAGS += $(call cc-option,-fmacro-prefix-map=$(srctree)/=)
-# ensure -fcf-protection is disabled when using retpoline as it is -# incompatible with -mindirect-branch=thunk-extern -ifdef CONFIG_RETPOLINE -KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none) -endif - # use the deterministic mode of AR if available KBUILD_ARFLAGS := $(call ar-option,D)
--- a/arch/x86/Makefile +++ b/arch/x86/Makefile @@ -137,6 +137,9 @@ else KBUILD_CFLAGS += -mno-red-zone KBUILD_CFLAGS += -mcmodel=kernel
+ # Intel CET isn't enabled in the kernel + KBUILD_CFLAGS += $(call cc-option,-fcf-protection=none) + # -funit-at-a-time shrinks the kernel .text considerably # unfortunately it makes reading oopses harder. KBUILD_CFLAGS += $(call cc-option,-funit-at-a-time)
From: Dave Hansen dave.hansen@linux.intel.com
commit 25a068b8e9a4eb193d755d58efcb3c98928636e0 upstream.
Jan Kiszka reported that the x2apic_wrmsr_fence() function uses a plain MFENCE while the Intel SDM (10.12.3 MSR Access in x2APIC Mode) calls for MFENCE; LFENCE.
Short summary: we have special MSRs that have weaker ordering than all the rest. Add fencing consistent with current SDM recommendations.
This is not known to cause any issues in practice, only in theory.
Longer story below:
The reason the kernel uses a different semantic is that the SDM changed (roughly in late 2017). The SDM changed because folks at Intel were auditing all of the recommended fences in the SDM and realized that the x2apic fences were insufficient.
Why was the plain MFENCE judged insufficient?
WRMSR itself is normally a serializing instruction. No fences are needed because the instruction itself serializes everything.
But, there are explicit exceptions for this serializing behavior written into the WRMSR instruction documentation for two classes of MSRs: IA32_TSC_DEADLINE and the X2APIC MSRs.
Back to x2apic: WRMSR is *not* serializing in this specific case. But why is MFENCE insufficient? MFENCE makes writes visible, but only affects load/store instructions. WRMSR is unfortunately not a load/store instruction and is unaffected by MFENCE. This means that a non-serializing WRMSR could be reordered by the CPU to execute before the writes made visible by the MFENCE have even occurred in the first place.
This means that an x2apic IPI could theoretically be triggered before there is any (visible) data to process.
Does this affect anything in practice? I honestly don't know. It seems quite possible that by the time an interrupt gets to consume the (not yet) MFENCE'd data, it has become visible, mostly by accident.
To be safe, add the SDM-recommended fences for all x2apic WRMSRs.
This also leaves open the question of the _other_ weakly-ordered WRMSR: MSR_IA32_TSC_DEADLINE. While it has the same ordering architecture as the x2APIC MSRs, it seems substantially less likely to be a problem in practice. While writes to the in-memory Local Vector Table (LVT) might theoretically be reordered with respect to a weakly-ordered WRMSR like TSC_DEADLINE, the SDM has this to say:
In x2APIC mode, the WRMSR instruction is used to write to the LVT entry. The processor ensures the ordering of this write and any subsequent WRMSR to the deadline; no fencing is required.
But, that might still leave xAPIC exposed. The safest thing to do for now is to add the extra, recommended LFENCE.
[ bp: Massage commit message, fix typos, drop accidentally added newline to tools/arch/x86/include/asm/barrier.h. ]
Reported-by: Jan Kiszka jan.kiszka@siemens.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200305174708.F77040DD@viggo.jf.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/apic.h | 10 ---------- arch/x86/include/asm/barrier.h | 18 ++++++++++++++++++ arch/x86/kernel/apic/apic.c | 4 ++++ arch/x86/kernel/apic/x2apic_cluster.c | 6 ++++-- arch/x86/kernel/apic/x2apic_phys.c | 6 ++++-- 5 files changed, 30 insertions(+), 14 deletions(-)
--- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -176,16 +176,6 @@ static inline void lapic_update_tsc_freq #endif /* !CONFIG_X86_LOCAL_APIC */
#ifdef CONFIG_X86_X2APIC -/* - * Make previous memory operations globally visible before - * sending the IPI through x2apic wrmsr. We need a serializing instruction or - * mfence for this. - */ -static inline void x2apic_wrmsr_fence(void) -{ - asm volatile("mfence" : : : "memory"); -} - static inline void native_apic_msr_write(u32 reg, u32 v) { if (reg == APIC_DFR || reg == APIC_ID || reg == APIC_LDR || --- a/arch/x86/include/asm/barrier.h +++ b/arch/x86/include/asm/barrier.h @@ -110,4 +110,22 @@ do { \
#include <asm-generic/barrier.h>
+/* + * Make previous memory operations globally visible before + * a WRMSR. + * + * MFENCE makes writes visible, but only affects load/store + * instructions. WRMSR is unfortunately not a load/store + * instruction and is unaffected by MFENCE. The LFENCE ensures + * that the WRMSR is not reordered. + * + * Most WRMSRs are full serializing instructions themselves and + * do not require this barrier. This is only required for the + * IA32_TSC_DEADLINE and X2APIC MSRs. + */ +static inline void weak_wrmsr_fence(void) +{ + asm volatile("mfence; lfence" : : : "memory"); +} + #endif /* _ASM_X86_BARRIER_H */ --- a/arch/x86/kernel/apic/apic.c +++ b/arch/x86/kernel/apic/apic.c @@ -42,6 +42,7 @@ #include <asm/x86_init.h> #include <asm/pgalloc.h> #include <linux/atomic.h> +#include <asm/barrier.h> #include <asm/mpspec.h> #include <asm/i8259.h> #include <asm/proto.h> @@ -476,6 +477,9 @@ static int lapic_next_deadline(unsigned { u64 tsc;
+ /* This MSR is special and need a special fence: */ + weak_wrmsr_fence(); + tsc = rdtsc(); wrmsrl(MSR_IA32_TSC_DEADLINE, tsc + (((u64) delta) * TSC_DIVISOR)); return 0; --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -27,7 +27,8 @@ static void x2apic_send_IPI(int cpu, int { u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
- x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); }
@@ -40,7 +41,8 @@ __x2apic_send_IPI_mask(const struct cpum unsigned long flags; u32 dest;
- x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence();
local_irq_save(flags);
--- a/arch/x86/kernel/apic/x2apic_phys.c +++ b/arch/x86/kernel/apic/x2apic_phys.c @@ -40,7 +40,8 @@ static void x2apic_send_IPI(int cpu, int { u32 dest = per_cpu(x86_cpu_to_apicid, cpu);
- x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence(); __x2apic_send_IPI_dest(dest, vector, APIC_DEST_PHYSICAL); }
@@ -51,7 +52,8 @@ __x2apic_send_IPI_mask(const struct cpum unsigned long this_cpu; unsigned long flags;
- x2apic_wrmsr_fence(); + /* x2apic MSRs are special and need a special fence: */ + weak_wrmsr_fence();
local_irq_save(flags);
From: Benjamin Valentin benpicco@googlemail.com
commit 9bbd77d5bbc9aff8cb74d805c31751f5f0691ba8 upstream.
There is a fork of this driver on GitHub [0] that has been updated with new device IDs.
Merge those into the mainline driver, so the out-of-tree fork is not needed for users of those devices anymore.
[0] https://github.com/paroj/xpad
Signed-off-by: Benjamin Valentin benpicco@googlemail.com Link: https://lore.kernel.org/r/20210121142523.1b6b050f@rechenknecht2k11 Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/joystick/xpad.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
--- a/drivers/input/joystick/xpad.c +++ b/drivers/input/joystick/xpad.c @@ -232,9 +232,17 @@ static const struct xpad_device { { 0x0e6f, 0x0213, "Afterglow Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x021f, "Rock Candy Gamepad for Xbox 360", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0246, "Rock Candy Gamepad for Xbox One 2015", 0, XTYPE_XBOXONE }, - { 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a0, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a1, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a2, "PDP Wired Controller for Xbox One - Crimson Red", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a4, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x02a6, "PDP Wired Controller for Xbox One - Camo Series", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a7, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02a8, "PDP Xbox One Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02ab, "PDP Controller for Xbox One", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02ad, "PDP Wired Controller for Xbox One - Stealth Series", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02b3, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, + { 0x0e6f, 0x02b8, "Afterglow Prismatic Wired Controller", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0301, "Logic3 Controller", 0, XTYPE_XBOX360 }, { 0x0e6f, 0x0346, "Rock Candy Gamepad for Xbox One 2016", 0, XTYPE_XBOXONE }, { 0x0e6f, 0x0401, "Logic3 Controller", 0, XTYPE_XBOX360 }, @@ -313,6 +321,9 @@ static const struct xpad_device { { 0x1bad, 0xfa01, "MadCatz GamePad", 0, XTYPE_XBOX360 }, { 0x1bad, 0xfd00, "Razer Onza TE", 0, XTYPE_XBOX360 }, { 0x1bad, 0xfd01, "Razer Onza", 0, XTYPE_XBOX360 }, + { 0x20d6, 0x2001, "BDA Xbox Series X Wired Controller", 0, XTYPE_XBOXONE }, + { 0x20d6, 0x281f, "PowerA Wired Controller For Xbox 360", 0, XTYPE_XBOX360 }, + { 0x2e24, 0x0652, "Hyperkin Duke X-Box One pad", 0, XTYPE_XBOXONE }, { 0x24c6, 0x5000, "Razer Atrox Arcade Stick", MAP_TRIGGERS_TO_BUTTONS, XTYPE_XBOX360 }, { 0x24c6, 0x5300, "PowerA MINI PROEX Controller", 0, XTYPE_XBOX360 }, { 0x24c6, 0x5303, "Xbox Airflo wired controller", 0, XTYPE_XBOX360 }, @@ -446,8 +457,12 @@ static const struct usb_device_id xpad_t XPAD_XBOX360_VENDOR(0x162e), /* Joytech X-Box 360 controllers */ XPAD_XBOX360_VENDOR(0x1689), /* Razer Onza */ XPAD_XBOX360_VENDOR(0x1bad), /* Harminix Rock Band Guitar and Drums */ + XPAD_XBOX360_VENDOR(0x20d6), /* PowerA Controllers */ + XPAD_XBOXONE_VENDOR(0x20d6), /* PowerA Controllers */ XPAD_XBOX360_VENDOR(0x24c6), /* PowerA Controllers */ XPAD_XBOXONE_VENDOR(0x24c6), /* PowerA Controllers */ + XPAD_XBOXONE_VENDOR(0x2e24), /* Hyperkin Duke X-Box One pad */ + XPAD_XBOX360_VENDOR(0x2f24), /* GameSir Controllers */ { } };
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 81b704d3e4674e09781d331df73d76675d5ad8cb upstream.
Calling acpi_thermal_check() from acpi_thermal_notify() directly is problematic if _TMP triggers Notify () on the thermal zone for which it has been evaluated (which happens on some systems), because it causes a new acpi_thermal_notify() invocation to be queued up every time and if that takes place too often, an indefinite number of pending work items may accumulate in kacpi_notify_wq over time.
Besides, it is not really useful to queue up a new invocation of acpi_thermal_check() if one of them is pending already.
For these reasons, rework acpi_thermal_notify() to queue up a thermal check instead of calling acpi_thermal_check() directly and only allow one thermal check to be pending at a time. Moreover, only allow one acpi_thermal_check_fn() instance at a time to run thermal_zone_device_update() for one thermal zone and make it return early if it sees other instances running for the same thermal zone.
While at it, fold acpi_thermal_check() into acpi_thermal_check_fn(), as it is only called from there after the other changes made here.
[This issue appears to have been exposed by commit 6d25be5782e4 ("sched/core, workqueues: Distangle worker accounting from rq lock"), but it is unclear why it was not visible earlier.]
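As a rough userspace sketch of the gating added here, with C11 atomics standing in for the kernel's atomic_t; the initial value of 3 mirrors the patch, everything else is illustrative:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int thermal_check_count = 3;

/* like atomic_add_unless(&count, -1, 1): decrement unless the value is
 * already 1, and report whether the decrement happened */
static bool take_check_slot(void)
{
	int cur = atomic_load(&thermal_check_count);

	while (cur != 1) {
		if (atomic_compare_exchange_weak(&thermal_check_count, &cur, cur - 1))
			return true;
	}
	return false;
}

int main(void)
{
	/* three checks arriving back to back, none finished yet (a finished
	 * check would atomic_fetch_add(&thermal_check_count, 1) again) */
	for (int i = 1; i <= 3; i++)
		printf("check %d: %s\n", i, take_check_slot()
		       ? "allowed to run thermal_zone_device_update()"
		       : "returns early, another instance already did the work");
	return 0;
}

At most one extra instance is allowed to wait on the mutex behind the one doing the update; anything beyond that returns early, mirroring the comment in the hunk below, while the work_pending() check keeps duplicates out of kacpi_notify_wq in the first place.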
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=208877 Reported-by: Stephen Berman stephen.berman@gmx.net Diagnosed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Tested-by: Stephen Berman stephen.berman@gmx.net Cc: All applicable stable@vger.kernel.org [bigeasy: Backported to v4.9.y, use atomic_t instead of refcount_t] Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/thermal.c | 55 +++++++++++++++++++++++++++++++++---------------- 1 file changed, 38 insertions(+), 17 deletions(-)
--- a/drivers/acpi/thermal.c +++ b/drivers/acpi/thermal.c @@ -188,6 +188,8 @@ struct acpi_thermal { int tz_enabled; int kelvin_offset; struct work_struct thermal_check_work; + struct mutex thermal_check_lock; + atomic_t thermal_check_count; };
/* -------------------------------------------------------------------------- @@ -513,17 +515,6 @@ static int acpi_thermal_get_trip_points( return 0; }
-static void acpi_thermal_check(void *data) -{ - struct acpi_thermal *tz = data; - - if (!tz->tz_enabled) - return; - - thermal_zone_device_update(tz->thermal_zone, - THERMAL_EVENT_UNSPECIFIED); -} - /* sys I/F for generic thermal sysfs support */
static int thermal_get_temp(struct thermal_zone_device *thermal, int *temp) @@ -557,6 +548,8 @@ static int thermal_get_mode(struct therm return 0; }
+static void acpi_thermal_check_fn(struct work_struct *work); + static int thermal_set_mode(struct thermal_zone_device *thermal, enum thermal_device_mode mode) { @@ -582,7 +575,7 @@ static int thermal_set_mode(struct therm ACPI_DEBUG_PRINT((ACPI_DB_INFO, "%s kernel ACPI thermal control\n", tz->tz_enabled ? "Enable" : "Disable")); - acpi_thermal_check(tz); + acpi_thermal_check_fn(&tz->thermal_check_work); } return 0; } @@ -951,6 +944,12 @@ static void acpi_thermal_unregister_ther Driver Interface -------------------------------------------------------------------------- */
+static void acpi_queue_thermal_check(struct acpi_thermal *tz) +{ + if (!work_pending(&tz->thermal_check_work)) + queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work); +} + static void acpi_thermal_notify(struct acpi_device *device, u32 event) { struct acpi_thermal *tz = acpi_driver_data(device); @@ -961,17 +960,17 @@ static void acpi_thermal_notify(struct a
switch (event) { case ACPI_THERMAL_NOTIFY_TEMPERATURE: - acpi_thermal_check(tz); + acpi_queue_thermal_check(tz); break; case ACPI_THERMAL_NOTIFY_THRESHOLDS: acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_THRESHOLDS); - acpi_thermal_check(tz); + acpi_queue_thermal_check(tz); acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break; case ACPI_THERMAL_NOTIFY_DEVICES: acpi_thermal_trips_update(tz, ACPI_TRIPS_REFRESH_DEVICES); - acpi_thermal_check(tz); + acpi_queue_thermal_check(tz); acpi_bus_generate_netlink_event(device->pnp.device_class, dev_name(&device->dev), event, 0); break; @@ -1071,7 +1070,27 @@ static void acpi_thermal_check_fn(struct { struct acpi_thermal *tz = container_of(work, struct acpi_thermal, thermal_check_work); - acpi_thermal_check(tz); + + if (!tz->tz_enabled) + return; + /* + * In general, it is not sufficient to check the pending bit, because + * subsequent instances of this function may be queued after one of them + * has started running (e.g. if _TMP sleeps). Avoid bailing out if just + * one of them is running, though, because it may have done the actual + * check some time ago, so allow at least one of them to block on the + * mutex while another one is running the update. + */ + if (!atomic_add_unless(&tz->thermal_check_count, -1, 1)) + return; + + mutex_lock(&tz->thermal_check_lock); + + thermal_zone_device_update(tz->thermal_zone, THERMAL_EVENT_UNSPECIFIED); + + atomic_inc(&tz->thermal_check_count); + + mutex_unlock(&tz->thermal_check_lock); }
static int acpi_thermal_add(struct acpi_device *device) @@ -1103,6 +1122,8 @@ static int acpi_thermal_add(struct acpi_ if (result) goto free_memory;
+ atomic_set(&tz->thermal_check_count, 3); + mutex_init(&tz->thermal_check_lock); INIT_WORK(&tz->thermal_check_work, acpi_thermal_check_fn);
pr_info(PREFIX "%s [%s] (%ld C)\n", acpi_device_name(device), @@ -1168,7 +1189,7 @@ static int acpi_thermal_resume(struct de tz->state.active |= tz->trips.active[i].flags.enabled; }
- queue_work(acpi_thermal_pm_queue, &tz->thermal_check_work); + acpi_queue_thermal_check(tz);
return AE_OK; }
From: Nadav Amit namit@vmware.com
commit 29b32839725f8c89a41cb6ee054c85f3116ea8b5 upstream.
When an Intel IOMMU is virtualized, and a physical device is passed-through to the VM, changes of the virtual IOMMU need to be propagated to the physical IOMMU. The hypervisor therefore needs to monitor PTE mappings in the IOMMU page-tables. Intel specifications provide "caching-mode" capability that a virtual IOMMU uses to report that the IOMMU is virtualized and a TLB flush is needed after mapping to allow the hypervisor to propagate virtual IOMMU mappings to the physical IOMMU. To the best of my knowledge no real physical IOMMU reports "caching-mode" as turned on.
Synchronizing the virtual and the physical IOMMU tables is expensive if the hypervisor is unaware which PTEs have changed, as the hypervisor is required to walk all the virtualized tables and look for changes. Consequently, domain flushes are much more expensive than page-specific flushes on virtualized IOMMUs with passthrough devices. The kernel therefore exploited the "caching-mode" indication to avoid domain flushing and use page-specific flushing in virtualized environments. See commit 78d5f0f500e6 ("intel-iommu: Avoid global flushes with caching mode.")
This behavior changed after commit 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing"). Now, when batched TLB flushing is used (the default), full TLB domain flushes are performed frequently, requiring the hypervisor to perform expensive synchronization between the virtual TLB and the physical one.
Getting batched TLB flushes to use page-specific invalidations again in such circumstances is not easy, since the TLB invalidation scheme assumes that "full" domain TLB flushes are performed for scalability.
Disable batched TLB flushes when caching-mode is on, as the performance benefit from using batched TLB invalidations is likely to be much smaller than the overhead of the virtual-to-physical IOMMU page-tables synchronization.
Fixes: 13cf01744608 ("iommu/vt-d: Make use of iova deferred flushing")
Signed-off-by: Nadav Amit namit@vmware.com
Cc: David Woodhouse dwmw2@infradead.org
Cc: Lu Baolu baolu.lu@linux.intel.com
Cc: Joerg Roedel joro@8bytes.org
Cc: Will Deacon will@kernel.org
Cc: stable@vger.kernel.org
Acked-by: Lu Baolu baolu.lu@linux.intel.com
Link: https://lore.kernel.org/r/20210127175317.1600473-1-namit@vmware.com
Signed-off-by: Joerg Roedel jroedel@suse.de
Signed-off-by: Nadav Amit namit@vmware.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/iommu/intel-iommu.c | 6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -3323,6 +3323,12 @@ static int __init init_dmars(void)

 		if (!ecap_pass_through(iommu->ecap))
 			hw_pass_through = 0;
+
+		if (!intel_iommu_strict && cap_caching_mode(iommu->cap)) {
+			pr_info("Disable batched IOTLB flush due to virtualization");
+			intel_iommu_strict = 1;
+		}
+
 #ifdef CONFIG_INTEL_IOMMU_SVM
 		if (pasid_enabled(iommu))
 			intel_svm_alloc_pasid_tables(iommu);
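To make the policy in the hunk above easy to follow, here is a toy, self-contained model of the decision it implements; it is not the kernel's code path, and the names (pick_flush_policy, FLUSH_BATCHED, FLUSH_STRICT) are invented for the sketch. It only shows that a caching-mode IOMMU overrides the batched default in favour of strict, page-selective invalidation.

/*
 * Illustration only: when the IOMMU reports caching mode (i.e. it is
 * emulated by a hypervisor), prefer strict per-page invalidation over
 * deferred batched flushing, because every domain-wide flush forces the
 * hypervisor to re-scan the whole shadowed page table.
 */
#include <stdbool.h>
#include <stdio.h>

enum flush_policy { FLUSH_BATCHED, FLUSH_STRICT };

static enum flush_policy pick_flush_policy(bool user_wants_strict,
					   bool caching_mode)
{
	if (user_wants_strict)
		return FLUSH_STRICT;
	if (caching_mode) {
		/* mirrors the patch: override the batched default under CM */
		printf("Disable batched IOTLB flush due to virtualization\n");
		return FLUSH_STRICT;
	}
	return FLUSH_BATCHED;
}

int main(void)
{
	/* Bare metal: batched flushing stays enabled. */
	printf("bare metal  -> %s\n",
	       pick_flush_policy(false, false) == FLUSH_STRICT ? "strict" : "batched");
	/* Virtualized IOMMU (caching mode set): forced to strict. */
	printf("virtualized -> %s\n",
	       pick_flush_policy(false, true) == FLUSH_STRICT ? "strict" : "batched");
	return 0;
}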
From: Shih-Yuan Lee (FourDollars) sylee@canonical.com
commit b4576de87243c32fab50dda9f8eba1e3cf13a7e2 upstream.
The PIN number for Dell headset mode of ALC3271 is wrong.
Fixes: fcc6c877a01f ("ALSA: hda/realtek - Support Dell headset mode for ALC3271")
Signed-off-by: Shih-Yuan Lee (FourDollars) sylee@canonical.com
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_realtek.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6284,7 +6284,7 @@ static const struct snd_hda_pin_quirk al
 	SND_HDA_PIN_QUIRK(0x10ec0299, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
 		ALC225_STANDARD_PINS,
 		{0x12, 0xb7a60130},
-		{0x13, 0xb8a60140},
+		{0x13, 0xb8a61140},
 		{0x17, 0x90170110}),
 	{}
 };
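To see what the one-nibble change above actually alters, here is a small standalone decoder (not part of the patch) that splits the old and new pin-config values into the fields of the HD Audio "configuration default" layout. The field offsets follow the HDA specification; the helper name decode_pincfg is invented for this sketch. Running it shows that only the colour nibble (bits 15:12) differs between 0xb8a60140 and 0xb8a61140.

/*
 * Decode HDA configuration-default pin values.  Field layout per the HDA
 * spec: sequence 3:0, default association 7:4, misc 11:8, color 15:12,
 * connection type 19:16, default device 23:20, location 29:24,
 * port connectivity 31:30.
 */
#include <stdio.h>

static void decode_pincfg(const char *name, unsigned int cfg)
{
	printf("%s: seq=%u assoc=%u misc=%u color=%u conn_type=%u dev=%u loc=%u port=%u\n",
	       name,
	       cfg & 0xf,            /* sequence */
	       (cfg >> 4) & 0xf,     /* default association */
	       (cfg >> 8) & 0xf,     /* misc */
	       (cfg >> 12) & 0xf,    /* color */
	       (cfg >> 16) & 0xf,    /* connection type */
	       (cfg >> 20) & 0xf,    /* default device */
	       (cfg >> 24) & 0x3f,   /* location */
	       (cfg >> 30) & 0x3);   /* port connectivity */
}

int main(void)
{
	decode_pincfg("old 0xb8a60140", 0xb8a60140); /* value the quirk had */
	decode_pincfg("new 0xb8a61140", 0xb8a61140); /* value the patch sets */
	return 0;
}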
On 2/8/21 7:00 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.257 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.257-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
Tested-by: Florian Fainelli f.fainelli@gmail.com
On ARCH_BRCMSTB using 32-bit ARM and 64-bit ARM kernels, no regressions observed, thanks!
On 2/8/21 8:00 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.257 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.257-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, Feb 08, 2021 at 04:00:26PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 4.9.257 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000. Anything received after that time might be too late.
Build results:
	total: 168 pass: 168 fail: 0
Qemu test results:
	total: 382 pass: 382 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 8 Feb 2021 at 20:35, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 4.9.257 release. There are 43 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 10 Feb 2021 14:57:55 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.257-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
Summary
------------------------------------------------------------------------
kernel: 4.9.257-rc1
git repo: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
git branch: linux-4.9.y
git commit: cffa06cb070f1ce0015e19c0f54b7a2b86ab4d24
git describe: v4.9.256-44-gcffa06cb070f
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.9.y/build/v4.9.25...
No regressions (compared to build v4.9.255-14-gd4780bd0ccef)
No fixes (compared to build v4.9.255-14-gd4780bd0ccef)
Ran 39352 total tests in the following environments and test suites.
Environments
--------------
- arm
- arm64
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-64k_page_size
- juno-r2 - arm64
- juno-r2-compat
- juno-r2-kasan
- mips
- qemu-arm64-kasan
- qemu-x86_64-kasan
- qemu_arm
- qemu_arm64
- qemu_arm64-compat
- qemu_i386
- qemu_x86_64
- qemu_x86_64-compat
- sparc
- x15 - arm
- x86_64
- x86-kasan - x86_64
Test Suites
-----------
* build
* linux-log-parser
* igt-gpu-tools
* install-android-platform-tools-r2600
* kselftest-android
* kselftest-bpf
* kselftest-capabilities
* kselftest-cgroup
* kselftest-clone3
* kselftest-core
* kselftest-cpu-hotplug
* kselftest-cpufreq
* kselftest-intel_pstate
* kselftest-kvm
* kselftest-lib
* kselftest-livepatch
* kselftest-lkdtm
* kselftest-membarrier
* kselftest-ptrace
* kselftest-rseq
* kselftest-rtc
* kselftest-seccomp
* kselftest-sigaltstack
* kselftest-size
* kselftest-splice
* kselftest-static_keys
* kselftest-sysctl
* kselftest-timens
* kselftest-timers
* kselftest-tmpfs
* kselftest-tpm2
* kselftest-user
* kselftest-zram
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-controllers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-tracing-tests
* perf
* v4l2-compliance
* fwts
* kselftest-efivarfs
* kselftest-filesystems
* kselftest-firmware
* kselftest-fpu
* kselftest-futex
* kselftest-gpio
* kselftest-ipc
* kselftest-ir
* kselftest-kcmp
* libhugetlbfs
* ltp-fs-tests
* ltp-hugetlb-tests
* ltp-mm-tests
* network-basic-tests
* kvm-unit-tests
* ltp-open-posix-tests
* kselftest-vm
* kselftest-kexec
* kselftest-x86
--
Linaro LKFT
https://lkft.linaro.org