This is the start of the stable review cycle for the 4.14.121 release. There are 63 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed 22 May 2019 11:50:54 AM UTC. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.121-rc...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 4.14.121-rc1
zhangyi (F) yi.zhang@huawei.com ext4: fix compile error when using BUFFER_TRACE
Eric Dumazet edumazet@google.com iov_iter: optimize page_copy_sane()
Sean Christopherson sean.j.christopherson@intel.com KVM: x86: Skip EFER vs. guest CPUID checks for host-initiated writes
Michał Wadowski wadosm@gmail.com ALSA: hda/realtek - Fix for Lenovo B50-70 inverted internal microphone bug
Sahitya Tummala stummala@codeaurora.org ext4: fix use-after-free in dx_release()
Lukas Czerner lczerner@redhat.com ext4: fix data corruption caused by overlapping unaligned and aligned IO
Sriram Rajagopalan sriramr@arista.com ext4: zero out the unused memory region in the extent tree block
Jiufei Xue jiufei.xue@linux.alibaba.com fs/writeback.c: use rcu_barrier() to wait for inflight wb switches going into workqueue when umount
Greg Kroah-Hartman gregkh@linuxfoundation.org fib_rules: fix error in backport of e9919a24d302 ("fib_rules: return 0...")
Eric Biggers ebiggers@google.com crypto: ccm - fix incompatibility between "ccm" and "ccm_base"
Eric Biggers ebiggers@google.com crypto: salsa20 - don't access already-freed walk.iv
Eric Biggers ebiggers@google.com crypto: arm64/aes-neonbs - don't access already-freed walk.iv
Kamlakant Patel kamlakantp@marvell.com ipmi:ssif: compare block number correctly for multi-part return messages
Debabrata Banerjee dbanerje@akamai.com ext4: fix ext4_show_options for file systems w/o journal
Kirill Tkhai ktkhai@virtuozzo.com ext4: actually request zeroing of inode table after grow
Barret Rhoden brho@google.com ext4: fix use-after-free race with debug_want_extra_isize
Coly Li colyli@suse.de bcache: never set KEY_PTRS of journal key to 0 in journal_reclaim()
Liang Chen liangchen.linux@gmail.com bcache: fix a race between cache register and cacheset unregister
Filipe Manana fdmanana@suse.com Btrfs: do not start a transaction at iterate_extent_inodes()
Filipe Manana fdmanana@suse.com Btrfs: do not start a transaction during fiemap
Pan Bian bianpan2016@163.com ext4: avoid drop reference to iloc.bh twice
Theodore Ts'o tytso@mit.edu ext4: ignore e_value_offs for xattrs with value-in-ea-inode
Jan Kara jack@suse.cz ext4: make sanity check in mballoc more strict
Jiufei Xue jiufei.xue@linux.alibaba.com jbd2: check superblock mapped prior to committing
Sergei Trofimovich slyfox@gentoo.org tty/vt: fix write/write race in ioctl(KDSKBSENT) handler
Yifeng Li tomli@tomli.me tty: vt.c: Fix TIOCL_BLANKSCREEN console blanking if blankinterval == 0
Alexander Sverdlin alexander.sverdlin@nokia.com mtd: spi-nor: intel-spi: Avoid crossing 4K address boundary on read/write
Dmitry Osipenko digetx@gmail.com mfd: max77620: Fix swapped FPS_PERIOD_MAX_US values
Steve Twiss stwiss.opensource@diasemi.com mfd: da9063: Fix OTP control register names to match datasheets for DA9063/63L
Andrea Arcangeli aarcange@redhat.com userfaultfd: use RCU to free the task struct when fork fails
Shuning Zhang sunny.s.zhang@oracle.com ocfs2: fix ocfs2 read inode data panic in ocfs2_iget
Jiri Kosina jkosina@suse.cz mm/mincore.c: make mincore() more conservative
Daniel Borkmann daniel@iogearbox.net bpf, arm64: remove prefetch insn in xadd mapping
Curtis Malainey cujomalainey@chromium.org ASoC: RT5677-SPI: Disable 16Bit SPI Transfers
Jon Hunter jonathanh@nvidia.com ASoC: max98090: Fix restore of DAPM Muxes
Kailang Yang kailang@realtek.com ALSA: hda/realtek - EAPD turn on later
Hui Wang hui.wang@canonical.com ALSA: hda/hdmi - Consider eld_valid when reporting jack event
Hui Wang hui.wang@canonical.com ALSA: hda/hdmi - Read the pin sense from register when repolling
Wenwen Wang wang6495@umn.edu ALSA: usb-audio: Fix a memory leak bug
Eric Biggers ebiggers@google.com crypto: arm/aes-neonbs - don't access already-freed walk.iv
Zhang Zhijie zhangzj@rock-chips.com crypto: rockchip - update IV buffer to contain the next IV
Eric Biggers ebiggers@google.com crypto: gcm - fix incompatibility between "gcm" and "gcm_base"
Eric Biggers ebiggers@google.com crypto: x86/crct10dif-pcl - fix use via crypto_shash_digest()
Eric Biggers ebiggers@google.com crypto: crct10dif-generic - fix use via crypto_shash_digest()
Eric Biggers ebiggers@google.com crypto: skcipher - don't WARN on unprocessed data after slow walk step
Daniel Axtens dja@axtens.net crypto: vmx - fix copy-paste error in CTR mode
Eric Biggers ebiggers@google.com crypto: chacha20poly1305 - set cra_name correctly
Peter Zijlstra peterz@infradead.org sched/x86: Save [ER]FLAGS on context switch
Jean-Philippe Brucker jean-philippe.brucker@arm.com arm64: Save and restore OSDLR_EL1 across suspend/resume
Jean-Philippe Brucker jean-philippe.brucker@arm.com arm64: Clear OSDLR_EL1 on CPU boot
Vincenzo Frascino vincenzo.frascino@arm.com arm64: compat: Reduce address limit
Gustavo A. R. Silva gustavo@embeddedor.com power: supply: axp288_charger: Fix unchecked return value
Wen Yang wen.yang99@zte.com.cn ARM: exynos: Fix a leaked reference by adding missing of_node_put
Sylwester Nawrocki s.nawrocki@samsung.com ARM: dts: exynos: Fix audio (microphone) routing on Odroid XU3
Stuart Menefy stuart.menefy@mathembedded.com ARM: dts: exynos: Fix interrupt for shared EINTs on Exynos5260
Josh Poimboeuf jpoimboe@redhat.com objtool: Fix function fallthrough detection
Andy Lutomirski luto@kernel.org x86/speculation/mds: Improve CPU buffer clear documentation
Andy Lutomirski luto@kernel.org x86/speculation/mds: Revert CPU buffer clear on double fault exit
Dexuan Cui decui@microsoft.com PCI: hv: Add pci_destroy_slot() in pci_devices_present_work(), if necessary
Dexuan Cui decui@microsoft.com PCI: hv: Add hv_pci_remove_slots() when we unload the driver
Dexuan Cui decui@microsoft.com PCI: hv: Fix a memory leak in hv_eject_device_work()
Waiman Long longman@redhat.com locking/rwsem: Prevent decrement of reader count before increment
Sasha Levin sashal@kernel.org net: core: another layer of lists, around PF_MEMALLOC skb handling
-------------
Diffstat:
 Documentation/x86/mds.rst | 44 +++-------------
 Makefile | 4 +-
 arch/arm/boot/dts/exynos5260.dtsi | 2 +-
 arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 2 +-
 arch/arm/crypto/aes-neonbs-glue.c | 2 +
 arch/arm/mach-exynos/firmware.c | 1 +
 arch/arm/mach-exynos/suspend.c | 2 +
 arch/arm64/crypto/aes-neonbs-glue.c | 2 +
 arch/arm64/include/asm/processor.h | 8 +++
 arch/arm64/kernel/debug-monitors.c | 1 +
 arch/arm64/mm/proc.S | 34 ++++++------
 arch/arm64/net/bpf_jit.h | 6 ---
 arch/arm64/net/bpf_jit_comp.c | 1 -
 arch/x86/crypto/crct10dif-pclmul_glue.c | 13 ++---
 arch/x86/entry/entry_32.S | 2 +
 arch/x86/entry/entry_64.S | 2 +
 arch/x86/include/asm/switch_to.h | 1 +
 arch/x86/kernel/process_32.c | 7 +++
 arch/x86/kernel/process_64.c | 8 +++
 arch/x86/kernel/traps.c | 8 ---
 arch/x86/kvm/x86.c | 37 ++++++++-----
 crypto/ccm.c | 44 +++++++---------
 crypto/chacha20poly1305.c | 4 +-
 crypto/crct10dif_generic.c | 11 ++--
 crypto/gcm.c | 34 ++++--------
 crypto/salsa20_generic.c | 2 +-
 crypto/skcipher.c | 9 +++-
 drivers/char/ipmi/ipmi_ssif.c | 6 ++-
 drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c | 25 ++++++---
 drivers/crypto/vmx/aesp8-ppc.pl | 4 +-
 drivers/md/bcache/journal.c | 11 ++--
 drivers/md/bcache/super.c | 2 +-
 drivers/mtd/spi-nor/intel-spi.c | 8 +++
 drivers/pci/host/pci-hyperv.c | 21 ++++++++
 drivers/power/supply/axp288_charger.c | 4 ++
 drivers/tty/vt/keyboard.c | 33 +++++++++---
 drivers/tty/vt/vt.c | 2 -
 fs/btrfs/backref.c | 34 +++++++-----
 fs/ext4/extents.c | 17 +++++-
 fs/ext4/file.c | 7 +++
 fs/ext4/inode.c | 2 +-
 fs/ext4/ioctl.c | 2 +-
 fs/ext4/mballoc.c | 2 +-
 fs/ext4/namei.c | 5 +-
 fs/ext4/resize.c | 1 +
 fs/ext4/super.c | 60 +++++++++++++---------
 fs/ext4/xattr.c | 2 +-
 fs/fs-writeback.c | 11 ++--
 fs/jbd2/journal.c | 4 ++
 fs/ocfs2/export.c | 30 ++++++++++-
 include/linux/list.h | 30 +++++++++++
 include/linux/mfd/da9063/registers.h | 6 +--
 include/linux/mfd/max77620.h | 4 +-
 kernel/fork.c | 31 ++++++++++-
 kernel/locking/rwsem-xadd.c | 44 +++++++++++-----
 lib/iov_iter.c | 17 +++++-
 mm/mincore.c | 23 ++++++++-
 net/core/fib_rules.c | 1 +
 sound/pci/hda/patch_hdmi.c | 11 +++-
 sound/pci/hda/patch_realtek.c | 5 +-
 sound/soc/codecs/max98090.c | 12 ++---
 sound/soc/codecs/rt5677-spi.c | 35 ++++++-------
 sound/usb/mixer.c | 2 +
 tools/objtool/check.c | 3 +-
 64 files changed, 527 insertions(+), 281 deletions(-)
[ Upstream commit 78ed8cc25986ac5c21762eeddc1e86e94d422e36 ]
First example of a layer splitting the list (rather than merely taking individual packets off it). Involves new list.h function, list_cut_before(), like list_cut_position() but cuts on the other side of the given entry.
Signed-off-by: Edward Cree ecree@solarflare.com
Signed-off-by: David S. Miller davem@davemloft.net
[sl: cut out non list.h bits, we only want list_cut_before]
Signed-off-by: Sasha Levin sashal@kernel.org
---
 include/linux/list.h | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

--- a/include/linux/list.h
+++ b/include/linux/list.h
@@ -285,6 +285,36 @@ static inline void list_cut_position(str
 		__list_cut_position(list, head, entry);
 }

+/**
+ * list_cut_before - cut a list into two, before given entry
+ * @list: a new list to add all removed entries
+ * @head: a list with entries
+ * @entry: an entry within head, could be the head itself
+ *
+ *	This helper moves the initial part of @head, up to but
+ *	excluding @entry, from @head to @list. You should pass
+ *	in @entry an element you know is on @head. @list should
+ *	be an empty list or a list you do not care about losing
+ *	its data.
+ *	If @entry == @head, all entries on @head are moved to
+ *	@list.
+ */
+static inline void list_cut_before(struct list_head *list,
+				   struct list_head *head,
+				   struct list_head *entry)
+{
+	if (head->next == entry) {
+		INIT_LIST_HEAD(list);
+		return;
+	}
+	list->next = head->next;
+	list->next->prev = list;
+	list->prev = entry->prev;
+	list->prev->next = list;
+	head->next = entry;
+	entry->prev = head;
+}
+
 static inline void __list_splice(const struct list_head *list,
 				 struct list_head *prev,
 				 struct list_head *next)
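For anyone who wants to poke at the new helper outside the kernel, here is a minimal userspace sketch. The struct list_head stand-in, list_add_tail(), list_count() and the two demo_*() functions are simplified or hypothetical scaffolding written for this illustration; only the body of list_cut_before() is taken from the patch above.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

static void INIT_LIST_HEAD(struct list_head *list)
{
	list->next = list;
	list->prev = list;
}

static void list_add_tail(struct list_head *node, struct list_head *head)
{
	node->prev = head->prev;
	node->next = head;
	head->prev->next = node;
	head->prev = node;
}

static int list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Same body as the helper added by the patch. */
static void list_cut_before(struct list_head *list,
			    struct list_head *head,
			    struct list_head *entry)
{
	if (head->next == entry) {
		INIT_LIST_HEAD(list);
		return;
	}
	list->next = head->next;
	list->next->prev = list;
	list->prev = entry->prev;
	list->prev->next = list;
	head->next = entry;
	entry->prev = head;
}

/* Hypothetical helper: number of entries on a list. */
static int list_count(const struct list_head *head)
{
	const struct list_head *p;
	int n = 0;

	for (p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Build head -> a -> b -> c and cut before c: a and b move to 'cut'. */
static int demo_cut_before(void)
{
	struct list_head head, cut, a, b, c;

	INIT_LIST_HEAD(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	list_cut_before(&cut, &head, &c);
	assert(cut.next == &a && cut.prev == &b);
	assert(head.next == &c && head.prev == &c);
	return list_count(&cut);
}

/* Passing the head itself as @entry moves every entry to 'cut'. */
static int demo_cut_all(void)
{
	struct list_head head, cut, a, b, c;

	INIT_LIST_HEAD(&head);
	list_add_tail(&a, &head);
	list_add_tail(&b, &head);
	list_add_tail(&c, &head);

	list_cut_before(&cut, &head, &head);
	assert(list_empty(&head));
	return list_count(&cut);
}
```

Calling demo_cut_before() and demo_cut_all() from a test main() exercises both the "cut in the middle" case and the "@entry == @head" case described in the kernel-doc comment.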
[ Upstream commit a9e9bcb45b1525ba7aea26ed9441e8632aeeda58 ]
During my rwsem testing, it was found that after a down_read(), the reader count may occasionally become 0 or even negative. Consequently, a writer may steal the lock at that time and execute with the reader in parallel thus breaking the mutual exclusion guarantee of the write lock. In other words, both readers and writer can become rwsem owners simultaneously.
The current reader wakeup code does it in one pass to clear waiter->task and put them into wake_q before fully incrementing the reader count. Once waiter->task is cleared, the corresponding reader may see it, finish the critical section and do unlock to decrement the count before the count is incremented. This is not a problem if there is only one reader to wake up as the count has been pre-incremented by 1. It is a problem if there is more than one reader to be woken up, because a writer can then steal the lock.
The wakeup was actually done in 2 passes before the following v4.9 commit:
70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
To fix this problem, the wakeup is now done in two passes again. In the first pass, we collect the readers and count them. The reader count is then fully incremented. In the second pass, the waiter->task is then cleared and they are put into wake_q to be woken up later.
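The ordering described above can be modelled in a few lines of single-threaded C. This is only a sketch of the idea, not the rwsem code: struct waiter, wake_readers_two_pass() and demo_wake() are hypothetical names, the queue is a plain array, and the reader count is a bare long with no bias encoding.

```c
#include <assert.h>
#include <stddef.h>

enum waiter_type { WAITING_FOR_READ, WAITING_FOR_WRITE };

/* Hypothetical, simplified waiter: 'task' is nonzero while still queued. */
struct waiter {
	enum waiter_type type;
	int task;
};

/*
 * Wake the leading run of readers in queue[0..n).
 * Returns the number of readers woken; *count models the reader count.
 */
static size_t wake_readers_two_pass(struct waiter *queue, size_t n, long *count)
{
	size_t woken = 0, i;

	/* Pass 1: count the readers at the front, stop at the first writer. */
	for (i = 0; i < n; i++) {
		if (queue[i].type == WAITING_FOR_WRITE)
			break;
		woken++;
	}

	/* Fully increment the count before any waiter can see a wakeup. */
	*count += (long)woken;

	/* Pass 2: only now clear 'task', which is what the waiter polls on. */
	for (i = 0; i < woken; i++)
		queue[i].task = 0;

	return woken;
}

/* Queue of R, R, W, R: two readers are woken, the rest stay queued. */
static long demo_wake(void)
{
	struct waiter queue[] = {
		{ WAITING_FOR_READ, 1 },
		{ WAITING_FOR_READ, 1 },
		{ WAITING_FOR_WRITE, 1 },
		{ WAITING_FOR_READ, 1 },
	};
	long count = 0;
	size_t woken = wake_readers_two_pass(queue, 4, &count);

	assert(woken == 2);
	assert(queue[0].task == 0 && queue[1].task == 0);
	assert(queue[2].task == 1 && queue[3].task == 1);
	return count;
}
```

The point of the split is that a woken reader which immediately unlocks can only decrement a count that already includes it, so the count cannot transiently drop to 0 or below while other readers hold the lock.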
Signed-off-by: Waiman Long longman@redhat.com
Acked-by: Linus Torvalds torvalds@linux-foundation.org
Cc: Borislav Petkov bp@alien8.de
Cc: Davidlohr Bueso dave@stgolabs.net
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Tim Chen tim.c.chen@linux.intel.com
Cc: Will Deacon will.deacon@arm.com
Cc: huang ying huang.ying.caritas@gmail.com
Fixes: 70800c3c0cc5 ("locking/rwsem: Scan the wait_list for readers only once")
Link: http://lkml.kernel.org/r/20190428212557.13482-2-longman@redhat.com
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 kernel/locking/rwsem-xadd.c | 44 +++++++++++++++++++++++++------------
 1 file changed, 30 insertions(+), 14 deletions(-)

diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
index c75017326c37a..3f5be624c7649 100644
--- a/kernel/locking/rwsem-xadd.c
+++ b/kernel/locking/rwsem-xadd.c
@@ -130,6 +130,7 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 {
 	struct rwsem_waiter *waiter, *tmp;
 	long oldcount, woken = 0, adjustment = 0;
+	struct list_head wlist;

 	/*
 	 * Take a peek at the queue head waiter such that we can determine
@@ -188,18 +189,42 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 	 * of the queue. We know that woken will be at least 1 as we accounted
 	 * for above. Note we increment the 'active part' of the count by the
 	 * number of readers before waking any processes up.
+	 *
+	 * We have to do wakeup in 2 passes to prevent the possibility that
+	 * the reader count may be decremented before it is incremented. It
+	 * is because the to-be-woken waiter may not have slept yet. So it
+	 * may see waiter->task got cleared, finish its critical section and
+	 * do an unlock before the reader count increment.
+	 *
+	 * 1) Collect the read-waiters in a separate list, count them and
+	 *    fully increment the reader count in rwsem.
+	 * 2) For each waiters in the new list, clear waiter->task and
+	 *    put them into wake_q to be woken up later.
 	 */
-	list_for_each_entry_safe(waiter, tmp, &sem->wait_list, list) {
-		struct task_struct *tsk;
-
+	list_for_each_entry(waiter, &sem->wait_list, list) {
 		if (waiter->type == RWSEM_WAITING_FOR_WRITE)
 			break;

 		woken++;
-		tsk = waiter->task;
+	}
+	list_cut_before(&wlist, &sem->wait_list, &waiter->list);
+
+	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
+	if (list_empty(&sem->wait_list)) {
+		/* hit end of list above */
+		adjustment -= RWSEM_WAITING_BIAS;
+	}
+
+	if (adjustment)
+		atomic_long_add(adjustment, &sem->count);
+
+	/* 2nd pass */
+	list_for_each_entry_safe(waiter, tmp, &wlist, list) {
+		struct task_struct *tsk;

+		tsk = waiter->task;
 		get_task_struct(tsk);
-		list_del(&waiter->list);
+
 		/*
 		 * Ensure calling get_task_struct() before setting the reader
 		 * waiter to nil such that rwsem_down_read_failed() cannot
@@ -215,15 +240,6 @@ static void __rwsem_mark_wake(struct rw_semaphore *sem,
 		/* wake_q_add() already take the task ref */
 		put_task_struct(tsk);
 	}
-
-	adjustment = woken * RWSEM_ACTIVE_READ_BIAS - adjustment;
-	if (list_empty(&sem->wait_list)) {
-		/* hit end of list above */
-		adjustment -= RWSEM_WAITING_BIAS;
-	}
-
-	if (adjustment)
-		atomic_long_add(adjustment, &sem->count);
 }

 /*
[ Upstream commit 05f151a73ec2b23ffbff706e5203e729a995cdc2 ]
When a device is created in new_pcichild_device(), hpdev->refs is set to 2 (i.e. the initial value of 1 plus the get_pcichild()).
When we hot remove the device from the host, in a Linux VM we first call hv_pci_eject_device(), which increases hpdev->refs by get_pcichild() and then schedules a work of hv_eject_device_work(), so hpdev->refs becomes 3 (let's ignore the paired get/put_pcichild() in other places). But in hv_eject_device_work(), currently we only call put_pcichild() twice, meaning the 'hpdev' struct can't be freed in put_pcichild().
Add one put_pcichild() to fix the memory leak.
The device can also be removed when we run "rmmod pci-hyperv". On this path (hv_pci_remove() -> hv_pci_bus_exit() -> hv_pci_devices_present()), hpdev->refs is 2, and we do correctly call put_pcichild() twice in pci_devices_present_work().
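The reference arithmetic on the eject path can be checked with a toy model. This is a sketch only: struct child_dev, child_get(), child_put() and eject_path_frees() are hypothetical names standing in for hpdev->refs, get_pcichild() and put_pcichild(), and "freed" is just a flag rather than a real kfree().

```c
#include <assert.h>

/* Hypothetical stand-in for struct hv_pci_dev and its refs counter. */
struct child_dev {
	int refs;
	int freed;
};

static void child_get(struct child_dev *d)
{
	d->refs++;
}

static void child_put(struct child_dev *d)
{
	assert(d->refs > 0);
	if (--d->refs == 0)
		d->freed = 1;	/* the real code would free the struct here */
}

/*
 * Model the hot-remove path with a given number of puts in the eject
 * work function. Returns 1 if the device ends up freed, 0 if it leaks.
 */
static int eject_path_frees(int puts_in_eject_work)
{
	struct child_dev d = { 1, 0 };	/* initial reference */
	int i;

	child_get(&d);			/* creation: refs == 2 */
	child_get(&d);			/* eject request: refs == 3 */

	for (i = 0; i < puts_in_eject_work; i++)
		child_put(&d);

	return d.freed;
}
```

With two puts the counter stops at 1 and the struct is never released; the third put added by the patch is what brings it to 0.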
Fixes: 4daace0d8ce8 ("PCI: hv: Add paravirtual PCI front-end for Microsoft Hyper-V VMs")
Signed-off-by: Dexuan Cui decui@microsoft.com
[lorenzo.pieralisi@arm.com: commit log rework]
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Reviewed-by: Stephen Hemminger stephen@networkplumber.org
Reviewed-by: Michael Kelley mikelley@microsoft.com
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pci/host/pci-hyperv.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 53d1c08cef4dc..292450c7da625 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -1941,6 +1941,7 @@ static void hv_eject_device_work(struct work_struct *work)
 			 VM_PKT_DATA_INBAND, 0);

 	put_pcichild(hpdev, hv_pcidev_ref_childlist);
+	put_pcichild(hpdev, hv_pcidev_ref_initial);
 	put_pcichild(hpdev, hv_pcidev_ref_pnp);
 	put_hvpcibus(hpdev->hbus);
 }
[ Upstream commit 15becc2b56c6eda3d9bf5ae993bafd5661c1fad1 ]
When we unload the pci-hyperv host controller driver, the host does not send us a PCI_EJECT message.
In this case we also need to make sure the sysfs PCI slot directory is removed, otherwise a command on a slot file, e.g.:
"cat /sys/bus/pci/slots/2/address"
will trigger a
"BUG: unable to handle kernel paging request"
and, if we unload/reload the driver several times, we would end up with stale slot entries in PCI slot directories in /sys/bus/pci/slots/:

root@localhost:~# ls -rtl /sys/bus/pci/slots/
total 0
drwxr-xr-x 2 root root 0 Feb 7 10:49 2
drwxr-xr-x 2 root root 0 Feb 7 10:49 2-1
drwxr-xr-x 2 root root 0 Feb 7 10:51 2-2
Add the missing code to remove the PCI slot and fix the current behaviour.
Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui decui@microsoft.com
[lorenzo.pieralisi@arm.com: reformatted the log]
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Reviewed-by: Stephen Hemminger sthemmin@microsoft.com
Reviewed-by: Michael Kelley mikelley@microsoft.com
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pci/host/pci-hyperv.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index 292450c7da625..a5825bbcded72 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -1513,6 +1513,21 @@ static void hv_pci_assign_slots(struct hv_pcibus_device *hbus)
 	}
 }

+/*
+ * Remove entries in sysfs pci slot directory.
+ */
+static void hv_pci_remove_slots(struct hv_pcibus_device *hbus)
+{
+	struct hv_pci_dev *hpdev;
+
+	list_for_each_entry(hpdev, &hbus->children, list_entry) {
+		if (!hpdev->pci_slot)
+			continue;
+		pci_destroy_slot(hpdev->pci_slot);
+		hpdev->pci_slot = NULL;
+	}
+}
+
 /**
  * create_root_hv_pci_bus() - Expose a new root PCI bus
  * @hbus:	Root PCI bus, as understood by this driver
@@ -2719,6 +2734,7 @@ static int hv_pci_remove(struct hv_device *hdev)
 		pci_lock_rescan_remove();
 		pci_stop_root_bus(hbus->pci_bus);
 		pci_remove_root_bus(hbus->pci_bus);
+		hv_pci_remove_slots(hbus);
 		pci_unlock_rescan_remove();
 		hbus->state = hv_pcibus_removed;
 	}
[ Upstream commit 340d455699400f2c2c0f9b3f703ade3085cdb501 ]
When we hot-remove a device, usually the host sends us a PCI_EJECT message, and a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
When we execute the quick hot-add/hot-remove test, the host may not send us the PCI_EJECT message if the guest has not fully finished the initialization by sending the PCI_RESOURCES_ASSIGNED* message to the host, so it's potentially unsafe to only depend on the pci_destroy_slot() in hv_eject_device_work() because the code path
create_root_hv_pci_bus() -> hv_pci_assign_slots()
is not called in this case. Note: in this case, the host still sends the guest a PCI_BUS_RELATIONS message with bus_rel->device_count == 0.
In the quick hot-add/hot-remove test, the following race is possible: before the code path
pci_devices_present_work() -> new_pcichild_device()
adds the new device into the hbus->children list, we may have already received the PCI_EJECT message, and since the tasklet handler
hv_pci_onchannelcallback()
may fail to find the "hpdev" by calling
get_pcichild_wslot(hbus, dev_message->wslot.slot)
hv_pci_eject_device() is not called. Later, the continued execution of
create_root_hv_pci_bus() -> hv_pci_assign_slots()
creates the slot and the PCI_BUS_RELATIONS message with bus_rel->device_count == 0 removes the device from hbus->children, and we end up being unable to remove the slot in
hv_pci_remove() -> hv_pci_remove_slots()
Remove the slot in pci_devices_present_work() when the device is removed to address this race.
pci_devices_present_work() and hv_eject_device_work() run in the single-threaded hbus->wq, so there is not a double-remove issue for the slot.
We cannot offload hv_pci_eject_device() from hv_pci_onchannelcallback() to the workqueue, because we need hv_pci_onchannelcallback() to synchronously call hv_pci_eject_device() to poll the channel ringbuffer to work around the "hangs in hv_compose_msi_msg()" issue fixed in commit de0aa7b2f97d ("PCI: hv: Fix 2 hang issues in hv_compose_msi_msg()").
Fixes: a15f2c08c708 ("PCI: hv: support reporting serial number as slot information")
Signed-off-by: Dexuan Cui decui@microsoft.com
[lorenzo.pieralisi@arm.com: rewritten commit log]
Signed-off-by: Lorenzo Pieralisi lorenzo.pieralisi@arm.com
Reviewed-by: Stephen Hemminger stephen@networkplumber.org
Reviewed-by: Michael Kelley mikelley@microsoft.com
Cc: stable@vger.kernel.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/pci/host/pci-hyperv.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/drivers/pci/host/pci-hyperv.c b/drivers/pci/host/pci-hyperv.c
index a5825bbcded72..f591de23f3d35 100644
--- a/drivers/pci/host/pci-hyperv.c
+++ b/drivers/pci/host/pci-hyperv.c
@@ -1824,6 +1824,10 @@ static void pci_devices_present_work(struct work_struct *work)
 		hpdev = list_first_entry(&removed, struct hv_pci_dev,
 					 list_entry);
 		list_del(&hpdev->list_entry);
+
+		if (hpdev->pci_slot)
+			pci_destroy_slot(hpdev->pci_slot);
+
 		put_pcichild(hpdev, hv_pcidev_ref_initial);
 	}
From: Andy Lutomirski luto@kernel.org
commit 88640e1dcd089879530a49a8d212d1814678dfe7 upstream.
The double fault ESPFIX path doesn't return to user mode at all -- it returns back to the kernel by simulating a #GP fault. prepare_exit_to_usermode() will run on the way out of general_protection before running user code.
Signed-off-by: Andy Lutomirski luto@kernel.org
Cc: Borislav Petkov bp@suse.de
Cc: Frederic Weisbecker frederic@kernel.org
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Jon Masters jcm@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user")
Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 Documentation/x86/mds.rst | 7 -------
 arch/x86/kernel/traps.c | 8 --------
 2 files changed, 15 deletions(-)

--- a/Documentation/x86/mds.rst
+++ b/Documentation/x86/mds.rst
@@ -158,13 +158,6 @@ Mitigation points
      mitigated on the return from do_nmi() to provide almost complete
      coverage.

-   - Double fault (#DF):
-
-     A double fault is usually fatal, but the ESPFIX workaround, which can
-     be triggered from user space through modify_ldt(2) is a recoverable
-     double fault. #DF uses the paranoid exit path, so explicit mitigation
-     in the double fault handler is required.
-
    - Machine Check Exception (#MC):

      Another corner case is a #MC which hits between the CPU buffer clear
--- a/arch/x86/kernel/traps.c
+++ b/arch/x86/kernel/traps.c
@@ -58,7 +58,6 @@
 #include <asm/alternative.h>
 #include <asm/fpu/xstate.h>
 #include <asm/trace/mpx.h>
-#include <asm/nospec-branch.h>
 #include <asm/mpx.h>
 #include <asm/vm86.h>

@@ -386,13 +385,6 @@ dotraplinkage void do_double_fault(struc
 		regs->ip = (unsigned long)general_protection;
 		regs->sp = (unsigned long)&gpregs->orig_ax;

-		/*
-		 * This situation can be triggered by userspace via
-		 * modify_ldt(2) and the return does not take the regular
-		 * user space exit, so a CPU buffer clear is required when
-		 * MDS mitigation is enabled.
-		 */
-		mds_user_clear_cpu_buffers();
 		return;
 	}
 #endif
From: Andy Lutomirski luto@kernel.org
commit 9d8d0294e78a164d407133dea05caf4b84247d6a upstream.
On x86_64, all returns to usermode go through prepare_exit_to_usermode(), with the sole exception of do_nmi(). This even includes machine checks -- this was added several years ago to support MCE recovery. Update the documentation.
Signed-off-by: Andy Lutomirski luto@kernel.org
Cc: Borislav Petkov bp@suse.de
Cc: Frederic Weisbecker frederic@kernel.org
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Jon Masters jcm@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 04dcbdb80578 ("x86/speculation/mds: Clear CPU buffers on exit to user")
Link: http://lkml.kernel.org/r/999fa9e126ba6a48e9d214d2f18dbde5c62ac55c.1557865329...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 Documentation/x86/mds.rst | 39 +++++++--------------------------------
 1 file changed, 7 insertions(+), 32 deletions(-)

--- a/Documentation/x86/mds.rst
+++ b/Documentation/x86/mds.rst
@@ -142,38 +142,13 @@ Mitigation points
    mds_user_clear.

    The mitigation is invoked in prepare_exit_to_usermode() which covers
-   most of the kernel to user space transitions. There are a few exceptions
-   which are not invoking prepare_exit_to_usermode() on return to user
-   space. These exceptions use the paranoid exit code.
-
-   - Non Maskable Interrupt (NMI):
-
-     Access to sensible data like keys, credentials in the NMI context is
-     mostly theoretical: The CPU can do prefetching or execute a
-     misspeculated code path and thereby fetching data which might end up
-     leaking through a buffer.
-
-     But for mounting other attacks the kernel stack address of the task is
-     already valuable information. So in full mitigation mode, the NMI is
-     mitigated on the return from do_nmi() to provide almost complete
-     coverage.
-
-   - Machine Check Exception (#MC):
-
-     Another corner case is a #MC which hits between the CPU buffer clear
-     invocation and the actual return to user. As this still is in kernel
-     space it takes the paranoid exit path which does not clear the CPU
-     buffers. So the #MC handler repopulates the buffers to some
-     extent. Machine checks are not reliably controllable and the window is
-     extremly small so mitigation would just tick a checkbox that this
-     theoretical corner case is covered. To keep the amount of special
-     cases small, ignore #MC.
-
-   - Debug Exception (#DB):
-
-     This takes the paranoid exit path only when the INT1 breakpoint is in
-     kernel space. #DB on a user space address takes the regular exit path,
-     so no extra mitigation required.
+   all but one of the kernel to user space transitions. The exception
+   is when we return from a Non Maskable Interrupt (NMI), which is
+   handled directly in do_nmi().
+
+   (The reason that NMI is special is that prepare_exit_to_usermode() can
+   enable IRQs. In NMI context, NMIs are blocked, and we don't want to
+   enable IRQs with NMIs blocked.)

2. C-State transition
From: Josh Poimboeuf jpoimboe@redhat.com
commit e6f393bc939d566ce3def71232d8013de9aaadde upstream.
When a function falls through to the next function due to a compiler bug, objtool prints some obscure warnings. For example:
  drivers/regulator/core.o: warning: objtool: regulator_count_voltages()+0x95: return with modified stack frame
  drivers/regulator/core.o: warning: objtool: regulator_count_voltages()+0x0: stack state mismatch: cfa1=7+32 cfa2=7+8
Instead it should be printing:
drivers/regulator/core.o: warning: objtool: regulator_supply_is_couple() falls through to next function regulator_count_voltages()
This used to work, but was broken by the following commit:
13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
The padding nops at the end of a function aren't actually part of the function, as defined by the symbol table. So the 'func' variable in validate_branch() is getting cleared to NULL when a padding nop is encountered, breaking the fallthrough detection.
If the current instruction doesn't have a function associated with it, just consider it to be part of the previously detected function by not overwriting the previous value of 'func'.
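The effect of not overwriting 'func' can be seen in a small model. This is a simplified sketch, not objtool's code: struct insn carries only a function name (NULL for padding nops), and detects_fallthrough() and demo_fallthrough() are hypothetical helpers that compare the remembered function against the current one the way the fallthrough check is described above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical instruction record: func is NULL for padding nops. */
struct insn {
	const char *func;
};

/*
 * Walk the instructions in order and report whether a fallthrough into a
 * different function is noticed. keep_previous selects the fixed behaviour
 * (leave 'func' alone when the instruction has no function).
 */
static int detects_fallthrough(const struct insn *insns, size_t n,
			       int keep_previous)
{
	const char *func = NULL;
	size_t i;

	assert(insns != NULL);

	for (i = 0; i < n; i++) {
		const char *cur = insns[i].func;

		/* Fallthrough: we were in one function and are now in another. */
		if (func && cur && strcmp(func, cur) != 0)
			return 1;

		if (keep_previous) {
			if (cur)
				func = cur;	/* fixed: nops do not reset func */
		} else {
			func = cur;		/* old: a nop clears func to NULL */
		}
	}
	return 0;
}

/* One function, a padding nop, then the next function. */
static int demo_fallthrough(int keep_previous)
{
	const struct insn insns[] = {
		{ "regulator_supply_is_couple" },
		{ NULL },
		{ "regulator_count_voltages" },
	};

	return detects_fallthrough(insns, sizeof(insns) / sizeof(insns[0]),
				   keep_previous);
}
```

In this model demo_fallthrough(0) misses the fallthrough because the nop resets the remembered function, while demo_fallthrough(1) reports it.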
Reported-by: kbuild test robot lkp@intel.com
Signed-off-by: Josh Poimboeuf jpoimboe@redhat.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Peter Zijlstra peterz@infradead.org
Cc: Thomas Gleixner tglx@linutronix.de
Cc: stable@vger.kernel.org
Fixes: 13810435b9a7 ("objtool: Support GCC 8's cold subfunctions")
Link: http://lkml.kernel.org/r/546d143820cd08a46624ae8440d093dd6c902cae.1557766718...
Signed-off-by: Ingo Molnar mingo@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 tools/objtool/check.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -1779,7 +1779,8 @@ static int validate_branch(struct objtoo
 			return 1;
 		}

-		func = insn->func ? insn->func->pfunc : NULL;
+		if (insn->func)
+			func = insn->func->pfunc;

 		if (func && insn->ignore) {
 			WARN_FUNC("BUG: why am I validating an ignored function?",
From: Stuart Menefy stuart.menefy@mathembedded.com
commit b7ed69d67ff0788d8463e599dd5dd1b45c701a7e upstream.
Fix the interrupt information for the GPIO lines with a shared EINT interrupt.
Fixes: 16d7ff2642e7 ("ARM: dts: add dts files for exynos5260 SoC")
Cc: stable@vger.kernel.org
Signed-off-by: Stuart Menefy stuart.menefy@mathembedded.com
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm/boot/dts/exynos5260.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/boot/dts/exynos5260.dtsi
+++ b/arch/arm/boot/dts/exynos5260.dtsi
@@ -226,7 +226,7 @@
 			wakeup-interrupt-controller {
 				compatible = "samsung,exynos4210-wakeup-eint";
 				interrupt-parent = <&gic>;
-				interrupts = <GIC_SPI 32 IRQ_TYPE_LEVEL_HIGH>;
+				interrupts = <GIC_SPI 48 IRQ_TYPE_LEVEL_HIGH>;
 			};
 		};
From: Sylwester Nawrocki s.nawrocki@samsung.com
commit 9b23e1a3e8fde76e8cc0e366ab1ed4ffb4440feb upstream.
The name of CODEC input widget to which microphone is connected through the "Headphone" jack is "IN12" not "IN1". This fixes microphone support on Odroid XU3.
Cc: stable@vger.kernel.org # v4.14+
Signed-off-by: Sylwester Nawrocki s.nawrocki@samsung.com
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi
+++ b/arch/arm/boot/dts/exynos5422-odroidxu3-audio.dtsi
@@ -23,7 +23,7 @@
 			"Headphone Jack", "HPL",
 			"Headphone Jack", "HPR",
 			"Headphone Jack", "MICBIAS",
-			"IN1", "Headphone Jack",
+			"IN12", "Headphone Jack",
 			"Speakers", "SPKL",
 			"Speakers", "SPKR";
From: Wen Yang wen.yang99@zte.com.cn
commit 629266bf7229cd6a550075f5961f95607b823b59 upstream.
The call to of_get_next_child returns a node pointer with its refcount incremented, so it must be explicitly decremented after the last usage.

Detected by coccinelle with warnings like:
  arch/arm/mach-exynos/firmware.c:201:2-8: ERROR: missing of_node_put; acquired a node pointer with refcount incremented on line 193, but without a corresponding object release within this function.
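The pattern coccinelle is flagging can be shown with a toy refcount. This is a sketch under stated assumptions: struct node, node_get(), node_put(), init_with_node() and refs_after_init() are hypothetical names, with node_get()/node_put() standing in for a reference-taking lookup and of_node_put().

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted device tree node. */
struct node {
	int refcount;
};

/* Models a lookup that returns the node with its refcount incremented. */
static struct node *node_get(struct node *n)
{
	n->refcount++;
	return n;
}

/* Models of_node_put(): drop one reference. */
static void node_put(struct node *n)
{
	assert(n->refcount > 0);
	n->refcount--;
}

/*
 * Init routine with an early-return path. Every exit must drop the
 * reference taken at the top. Returns 0 on success, -1 on the error path.
 */
static int init_with_node(struct node *n, int has_property)
{
	struct node *np = node_get(n);

	if (!has_property) {
		node_put(np);	/* error path: release before returning */
		return -1;
	}
	node_put(np);		/* success path: release after the last use */

	return 0;
}

/* Returns the refcount left behind after running the init routine. */
static int refs_after_init(int has_property)
{
	struct node n = { 1 };

	init_with_node(&n, has_property);
	return n.refcount;
}
```

Both paths leave the count where it started; dropping either node_put() call leaves it one too high, which is the leak the patch closes.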
Cc: stable@vger.kernel.org
Signed-off-by: Wen Yang wen.yang99@zte.com.cn
Signed-off-by: Krzysztof Kozlowski krzk@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm/mach-exynos/firmware.c | 1 +
 arch/arm/mach-exynos/suspend.c | 2 ++
 2 files changed, 3 insertions(+)

--- a/arch/arm/mach-exynos/firmware.c
+++ b/arch/arm/mach-exynos/firmware.c
@@ -205,6 +205,7 @@ void __init exynos_firmware_init(void)
 		return;

 	addr = of_get_address(nd, 0, NULL, NULL);
+	of_node_put(nd);
 	if (!addr) {
 		pr_err("%s: No address specified.\n", __func__);
 		return;
--- a/arch/arm/mach-exynos/suspend.c
+++ b/arch/arm/mach-exynos/suspend.c
@@ -649,8 +649,10 @@ void __init exynos_pm_init(void)

 	if (WARN_ON(!of_find_property(np, "interrupt-controller", NULL))) {
 		pr_warn("Outdated DT detected, suspend/resume will NOT work\n");
+		of_node_put(np);
 		return;
 	}
+	of_node_put(np);

 	pm_data = (const struct exynos_pm_data *) match->data;
From: Gustavo A. R. Silva gustavo@embeddedor.com
commit c3422ad5f84a66739ec6a37251ca27638c85b6be upstream.
Currently there is no check on platform_get_irq() return value in case it fails, hence never actually reporting any errors and causing unexpected behavior when using such value as argument for function regmap_irq_get_virq().
Fix this by adding a proper check, a message reporting any error, and a return of *pirq* on failure.
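The check-before-use pattern can be sketched outside the kernel as follows. The names `toy_map_virq()` and `toy_register_irqs()` and the "+100" mapping are assumptions made up for this illustration; they are not the regmap or platform-device API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the virq lookup: any mapping will do here. */
static int toy_map_virq(int pirq)
{
	assert(pirq >= 0);	/* a negative index is what the check prevents */
	return pirq + 100;
}

/*
 * Walks a list of platform IRQ numbers. A negative entry models a
 * failed lookup: it is returned unchanged instead of being passed on.
 */
static int toy_register_irqs(const int *pirqs, size_t n, int *virqs)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (pirqs[i] < 0)
			return pirqs[i];	/* propagate the error code */
		virqs[i] = toy_map_virq(pirqs[i]);
	}
	return 0;
}
```

Returning the negative value as-is keeps the original error code visible to the caller, which matches what the commit message describes.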
Addresses-Coverity-ID: 1443940 ("Improper use of negative value")
Fixes: 843735b788a4 ("power: axp288_charger: axp288 charger driver")
Cc: stable@vger.kernel.org
Signed-off-by: Gustavo A. R. Silva gustavo@embeddedor.com
Reviewed-by: Hans de Goede hdegoede@redhat.com
Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/power/supply/axp288_charger.c | 4 ++++
 1 file changed, 4 insertions(+)

--- a/drivers/power/supply/axp288_charger.c
+++ b/drivers/power/supply/axp288_charger.c
@@ -881,6 +881,10 @@ static int axp288_charger_probe(struct p
 	/* Register charger interrupts */
 	for (i = 0; i < CHRG_INTR_END; i++) {
 		pirq = platform_get_irq(info->pdev, i);
+		if (pirq < 0) {
+			dev_err(&pdev->dev, "Failed to get IRQ: %d\n", pirq);
+			return pirq;
+		}
 		info->irq[i] = regmap_irq_get_virq(info->regmap_irqc, pirq);
 		if (info->irq[i] < 0) {
 			dev_warn(&info->pdev->dev,
From: Vincenzo Frascino vincenzo.frascino@arm.com
commit d263119387de9975d2acba1dfd3392f7c5979c18 upstream.
Currently, compat tasks running on arm64 can allocate memory up to TASK_SIZE_32 (UL(0x100000000)).
This means that mmap() allocations, if we treat them as returning an array, are not compliant with section 6.5.8 of the C standard (C99), which states: "If the expression P points to an element of an array object and the expression Q points to the last element of the same array object, the pointer expression Q+1 compares greater than P".
Redefine TASK_SIZE_32 to address the issue.
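The wrap-around behind this can be shown with plain integer arithmetic. The sketch below assumes 4K pages and models a 32-bit pointer as a `uint32_t`; `COMPAT_LIMIT`, `TOY_PAGE_SIZE` and both helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define COMPAT_LIMIT	0x100000000ULL	/* the old TASK_SIZE_32 value */
#define TOY_PAGE_SIZE	0x1000ULL	/* assumption: 4K pages */

/* The one-past-the-end address as a 32-bit task would see it. */
static uint32_t one_past_end(uint64_t map_end)
{
	assert(map_end <= COMPAT_LIMIT);
	return (uint32_t)map_end;	/* truncation models 32-bit pointers */
}

/*
 * Returns 1 if a mapping that starts at 'start' and runs up to 'limit'
 * still satisfies "Q+1 compares greater than P" in 32-bit arithmetic.
 */
static int end_compares_greater(uint32_t start, uint64_t limit)
{
	return one_past_end(limit) > start;
}
```

With the old limit the end address truncates to 0, so the comparison fails for a mapping in the last page; with the limit lowered by one page the end address stays representable.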
Cc: Catalin Marinas catalin.marinas@arm.com
Cc: Will Deacon will.deacon@arm.com
Cc: Jann Horn jannh@google.com
Cc: stable@vger.kernel.org
Reported-by: Jann Horn jannh@google.com
Signed-off-by: Vincenzo Frascino vincenzo.frascino@arm.com
[will: fixed typo in comment]
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm64/include/asm/processor.h | 8 ++++++++
 1 file changed, 8 insertions(+)

--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -49,7 +49,15 @@
  * TASK_UNMAPPED_BASE - the lower boundary of the mmap VM area.
  */
 #ifdef CONFIG_COMPAT
+#ifdef CONFIG_ARM64_64K_PAGES
+/*
+ * With CONFIG_ARM64_64K_PAGES enabled, the last page is occupied
+ * by the compat vectors page.
+ */
 #define TASK_SIZE_32		UL(0x100000000)
+#else
+#define TASK_SIZE_32		(UL(0x100000000) - PAGE_SIZE)
+#endif /* CONFIG_ARM64_64K_PAGES */
 #define TASK_SIZE		(test_thread_flag(TIF_32BIT) ? \
 				TASK_SIZE_32 : TASK_SIZE_64)
 #define TASK_SIZE_OF(tsk)	(test_tsk_thread_flag(tsk, TIF_32BIT) ? \
From: Jean-Philippe Brucker jean-philippe.brucker@arm.com
commit 6fda41bf12615ee7c3ddac88155099b1a8cf8d00 upstream.
Some firmwares may reboot CPUs with OS Double Lock set. Make sure that it is unlocked, in order to use debug exceptions.
Cc: stable@vger.kernel.org
Signed-off-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com
Signed-off-by: Will Deacon will.deacon@arm.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm64/kernel/debug-monitors.c | 1 +
 1 file changed, 1 insertion(+)

--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -133,6 +133,7 @@ NOKPROBE_SYMBOL(disable_debug_monitors);
  */
 static int clear_os_lock(unsigned int cpu)
 {
+	write_sysreg(0, osdlr_el1);
 	write_sysreg(0, oslar_el1);
 	isb();
 	return 0;
From: Jean-Philippe Brucker jean-philippe.brucker@arm.com
commit 827a108e354db633698f0b4a10c1ffd2b1f8d1d0 upstream.
When the CPU comes out of suspend, the firmware may have modified the OS Double Lock Register. Save it in an unused slot of cpu_suspend_ctx, and restore it on resume.
Cc: stable@vger.kernel.org Signed-off-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/mm/proc.S | 34 ++++++++++++++++++---------------- 1 file changed, 18 insertions(+), 16 deletions(-)
--- a/arch/arm64/mm/proc.S +++ b/arch/arm64/mm/proc.S @@ -64,24 +64,25 @@ ENTRY(cpu_do_suspend) mrs x2, tpidr_el0 mrs x3, tpidrro_el0 mrs x4, contextidr_el1 - mrs x5, cpacr_el1 - mrs x6, tcr_el1 - mrs x7, vbar_el1 - mrs x8, mdscr_el1 - mrs x9, oslsr_el1 - mrs x10, sctlr_el1 + mrs x5, osdlr_el1 + mrs x6, cpacr_el1 + mrs x7, tcr_el1 + mrs x8, vbar_el1 + mrs x9, mdscr_el1 + mrs x10, oslsr_el1 + mrs x11, sctlr_el1 alternative_if_not ARM64_HAS_VIRT_HOST_EXTN - mrs x11, tpidr_el1 + mrs x12, tpidr_el1 alternative_else - mrs x11, tpidr_el2 + mrs x12, tpidr_el2 alternative_endif - mrs x12, sp_el0 + mrs x13, sp_el0 stp x2, x3, [x0] - stp x4, xzr, [x0, #16] - stp x5, x6, [x0, #32] - stp x7, x8, [x0, #48] - stp x9, x10, [x0, #64] - stp x11, x12, [x0, #80] + stp x4, x5, [x0, #16] + stp x6, x7, [x0, #32] + stp x8, x9, [x0, #48] + stp x10, x11, [x0, #64] + stp x12, x13, [x0, #80] ret ENDPROC(cpu_do_suspend)
@@ -104,8 +105,8 @@ ENTRY(cpu_do_resume) msr cpacr_el1, x6
/* Don't change t0sz here, mask those bits when restoring */ - mrs x5, tcr_el1 - bfi x8, x5, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH + mrs x7, tcr_el1 + bfi x8, x7, TCR_T0SZ_OFFSET, TCR_TxSZ_WIDTH
msr tcr_el1, x8 msr vbar_el1, x9 @@ -129,6 +130,7 @@ alternative_endif /* * Restore oslsr_el1 by writing oslar_el1 */ + msr osdlr_el1, x5 ubfx x11, x11, #1, #1 msr oslar_el1, x11 reset_pmuserenr_el0 x0 // Disable PMU access from EL0
From: Peter Zijlstra peterz@infradead.org
commit 6690e86be83ac75832e461c141055b5d601c0a6d upstream.
Effectively reverts commit:
2c7577a75837 ("sched/x86_64: Don't save flags on context switch")
Specifically because SMAP uses FLAGS.AC which invalidates the claim that the kernel has clean flags.
In particular, while preemption from interrupt return is fine (the IRET frame on the exception stack contains FLAGS), it breaks any code that does synchronous scheduling, including preempt_enable().
This has become a significant issue ever since commit:
5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses")
provided a means of having 'normal' C code between STAC / CLAC, exposing the FLAGS.AC state. So far this hasn't led to trouble; however, fix it before it comes apart.
Reported-by: Julien Thierry julien.thierry@arm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Andy Lutomirski luto@amacapital.net Cc: Borislav Petkov bp@alien8.de Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: stable@kernel.org Fixes: 5b24a7a2aa20 ("Add 'unsafe' user access functions for batched accesses") Signed-off-by: Ingo Molnar mingo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/entry/entry_32.S | 2 ++ arch/x86/entry/entry_64.S | 2 ++ arch/x86/include/asm/switch_to.h | 1 + arch/x86/kernel/process_32.c | 7 +++++++ arch/x86/kernel/process_64.c | 8 ++++++++ 5 files changed, 20 insertions(+)
--- a/arch/x86/entry/entry_32.S +++ b/arch/x86/entry/entry_32.S @@ -234,6 +234,7 @@ ENTRY(__switch_to_asm) pushl %ebx pushl %edi pushl %esi + pushfl
/* switch stack */ movl %esp, TASK_threadsp(%eax) @@ -256,6 +257,7 @@ ENTRY(__switch_to_asm) #endif
/* restore callee-saved registers */ + popfl popl %esi popl %edi popl %ebx --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -342,6 +342,7 @@ ENTRY(__switch_to_asm) pushq %r13 pushq %r14 pushq %r15 + pushfq
/* switch stack */ movq %rsp, TASK_threadsp(%rdi) @@ -364,6 +365,7 @@ ENTRY(__switch_to_asm) #endif
/* restore callee-saved registers */ + popfq popq %r15 popq %r14 popq %r13 --- a/arch/x86/include/asm/switch_to.h +++ b/arch/x86/include/asm/switch_to.h @@ -41,6 +41,7 @@ asmlinkage void ret_from_fork(void); * order of the fields must match the code in __switch_to_asm(). */ struct inactive_task_frame { + unsigned long flags; #ifdef CONFIG_X86_64 unsigned long r15; unsigned long r14; --- a/arch/x86/kernel/process_32.c +++ b/arch/x86/kernel/process_32.c @@ -132,6 +132,13 @@ int copy_thread_tls(unsigned long clone_ struct task_struct *tsk; int err;
+ /* + * For a new task use the RESET flags value since there is no before. + * All the status flags are zero; DF and all the system flags must also + * be 0, specifically IF must be 0 because we context switch to the new + * task with interrupts disabled. + */ + frame->flags = X86_EFLAGS_FIXED; frame->bp = 0; frame->ret_addr = (unsigned long) ret_from_fork; p->thread.sp = (unsigned long) fork_frame; --- a/arch/x86/kernel/process_64.c +++ b/arch/x86/kernel/process_64.c @@ -278,6 +278,14 @@ int copy_thread_tls(unsigned long clone_ childregs = task_pt_regs(p); fork_frame = container_of(childregs, struct fork_frame, regs); frame = &fork_frame->frame; + + /* + * For a new task use the RESET flags value since there is no before. + * All the status flags are zero; DF and all the system flags must also + * be 0, specifically IF must be 0 because we context switch to the new + * task with interrupts disabled. + */ + frame->flags = X86_EFLAGS_FIXED; frame->bp = 0; frame->ret_addr = (unsigned long) ret_from_fork; p->thread.sp = (unsigned long) fork_frame;
From: Eric Biggers ebiggers@google.com
commit 5e27f38f1f3f45a0c938299c3a34a2d2db77165a upstream.
If the rfc7539 template is instantiated with specific implementations, e.g. "rfc7539(chacha20-generic,poly1305-generic)" rather than "rfc7539(chacha20,poly1305)", then the implementation names end up included in the instance's cra_name. This is incorrect because it then prevents all users from allocating "rfc7539(chacha20,poly1305)", if the highest priority implementations of chacha20 and poly1305 were selected. Also, the self-tests aren't run on an instance allocated in this way.
Fix it by setting the instance's cra_name from the underlying algorithms' actual cra_names, rather than from the requested names. This matches what other templates do.
Fixes: 71ebc4d1b27d ("crypto: chacha20poly1305 - Add a ChaCha20-Poly1305 AEAD construction, RFC7539")
Cc: stable@vger.kernel.org # v4.2+
Cc: Martin Willi martin@strongswan.org
Signed-off-by: Eric Biggers ebiggers@google.com
Reviewed-by: Martin Willi martin@strongswan.org
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 crypto/chacha20poly1305.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/crypto/chacha20poly1305.c
+++ b/crypto/chacha20poly1305.c
@@ -647,8 +647,8 @@ static int chachapoly_create(struct cryp

 	err = -ENAMETOOLONG;
 	if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME,
-		     "%s(%s,%s)", name, chacha_name,
-		     poly_name) >= CRYPTO_MAX_ALG_NAME)
+		     "%s(%s,%s)", name, chacha->base.cra_name,
+		     poly->cra_name) >= CRYPTO_MAX_ALG_NAME)
 		goto out_drop_chacha;
 	if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
 		     "%s(%s,%s)", name, chacha->base.cra_driver_name,
From: Daniel Axtens dja@axtens.net
commit dcf7b48212c0fab7df69e84fab22d6cb7c8c0fb9 upstream.
The original assembly imported from OpenSSL has two copy-paste errors in handling CTR mode. When dealing with a 2 or 3 block tail, the code branches to the CBC decryption exit path, rather than to the CTR exit path.
This leads to corruption of the IV, which leads to subsequent blocks being corrupted.
This can be detected with libkcapi test suite, which is available at https://github.com/smuellerDD/libkcapi
Reported-by: Ondrej Mosnáček omosnacek@gmail.com
Fixes: 5c380d623ed3 ("crypto: vmx - Add support for VMS instructions by ASM")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Axtens dja@axtens.net
Tested-by: Michael Ellerman mpe@ellerman.id.au
Tested-by: Ondrej Mosnacek omosnacek@gmail.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 drivers/crypto/vmx/aesp8-ppc.pl | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/crypto/vmx/aesp8-ppc.pl
+++ b/drivers/crypto/vmx/aesp8-ppc.pl
@@ -1815,7 +1815,7 @@ Lctr32_enc8x_three:
 	stvx_u		$out1,$x10,$out
 	stvx_u		$out2,$x20,$out
 	addi		$out,$out,0x30
-	b		Lcbc_dec8x_done
+	b		Lctr32_enc8x_done

 .align	5
 Lctr32_enc8x_two:
@@ -1827,7 +1827,7 @@ Lctr32_enc8x_two:
 	stvx_u		$out0,$x00,$out
 	stvx_u		$out1,$x10,$out
 	addi		$out,$out,0x20
-	b		Lcbc_dec8x_done
+	b		Lctr32_enc8x_done

 .align	5
 Lctr32_enc8x_one:
From: Eric Biggers ebiggers@google.com
commit dcaca01a42cc2c425154a13412b4124293a6e11e upstream.
skcipher_walk_done() assumes it's a bug if, after the "slow" path is executed where the next chunk of data is processed via a bounce buffer, the algorithm says it didn't process all bytes. Thus it WARNs on this.
However, this can happen legitimately when the message needs to be evenly divisible into "blocks" but isn't, and the algorithm has a 'walksize' greater than the block size. For example, ecb-aes-neonbs sets 'walksize' to 128 bytes and only supports messages evenly divisible into 16-byte blocks. If, say, 17 message bytes remain but they straddle scatterlist elements, the skcipher_walk code will take the "slow" path and pass the algorithm all 17 bytes in the bounce buffer. But the algorithm will only be able to process 16 bytes, triggering the WARN.
Fix this by just removing the WARN_ON(). Returning -EINVAL, as the code already does, is the right behavior.
This bug was detected by my patches that improve testmgr to fuzz algorithms against their generic implementation.
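The 17-byte case from the description can be worked through with a small sketch. It assumes, as the commit message implies, that the value handed to skcipher_walk_done() on this path is the number of bytes left unprocessed; `toy_processed()`, `toy_leftover()` and `toy_walk_done_slow()` are hypothetical helpers, not the skcipher walk API.

```c
#include <assert.h>
#include <errno.h>

#define TOY_BLOCK_SIZE 16u	/* AES block size used in the example */

/* Bytes a block-only algorithm can consume out of nbytes. */
static unsigned int toy_processed(unsigned int nbytes)
{
	return nbytes - (nbytes % TOY_BLOCK_SIZE);
}

/* Bytes the algorithm reports back as not processed. */
static unsigned int toy_leftover(unsigned int nbytes)
{
	return nbytes - toy_processed(nbytes);
}

/*
 * Models the slow-path branch after the patch: a nonzero leftover is
 * turned into -EINVAL, with no warning emitted.
 */
static int toy_walk_done_slow(unsigned int leftover)
{
	if (leftover)
		return -EINVAL;
	return 0;
}
```

For 17 remaining bytes the algorithm consumes 16 and leaves 1, so the slow path ends with -EINVAL; for 32 bytes nothing is left over and the walk finishes normally.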
Fixes: b286d8b1a690 ("crypto: skcipher - Add skcipher walk interface")
Cc: stable@vger.kernel.org # v4.10+
Signed-off-by: Eric Biggers ebiggers@google.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 crypto/skcipher.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/crypto/skcipher.c
+++ b/crypto/skcipher.c
@@ -131,8 +131,13 @@ unmap_src:
 		memcpy(walk->dst.virt.addr, walk->page, n);
 		skcipher_unmap_dst(walk);
 	} else if (unlikely(walk->flags & SKCIPHER_WALK_SLOW)) {
-		if (WARN_ON(err)) {
-			/* unexpected case; didn't process all bytes */
+		if (err) {
+			/*
+			 * Didn't process all bytes. Either the algorithm is
+			 * broken, or this was the last step and it turned out
+			 * the message wasn't evenly divisible into blocks but
+			 * the algorithm requires it.
+			 */
 			err = -EINVAL;
 			goto finish;
 		}
From: Eric Biggers ebiggers@google.com
commit 307508d1072979f4435416f87936f87eaeb82054 upstream.
The ->digest() method of crct10dif-generic reads the current CRC value from the shash_desc context. But this value is uninitialized, causing crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses crypto_shash_update(), not crypto_shash_digest(). Likewise, crypto_shash_digest() is not yet tested by the crypto self-tests because those only test the ahash API which only uses shash init/update/final.
This bug was detected by my patches that improve testmgr to fuzz algorithms against their generic implementation.
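The effect of starting a one-shot digest from a stale CRC can be reproduced in user space. The sketch below is a plain bitwise CRC-16/T10-DIF (polynomial 0x8bb7, initial value 0, no reflection) written for illustration; it is not the kernel's table-driven crc_t10dif_generic(), and the `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16/T10-DIF update, MSB first, polynomial 0x8bb7. */
static uint16_t toy_crc_t10dif(uint16_t crc, const uint8_t *data, size_t len)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= (uint16_t)(data[i] << 8);
		for (bit = 0; bit < 8; bit++) {
			if (crc & 0x8000)
				crc = (uint16_t)((crc << 1) ^ 0x8bb7);
			else
				crc = (uint16_t)(crc << 1);
		}
	}
	return crc;
}

/* One-shot digest as after the fix: always starts from 0. */
static uint16_t toy_digest_fixed(const uint8_t *data, size_t len)
{
	return toy_crc_t10dif(0, data, len);
}

/* One-shot digest as before the fix: starts from whatever was in the context. */
static uint16_t toy_digest_buggy(uint16_t stale_ctx, const uint8_t *data,
				 size_t len)
{
	return toy_crc_t10dif(stale_ctx, data, len);
}
```

An init/update/final sequence never hits the problem because init sets the starting value; only the one-shot path skipped that step.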
Fixes: 2d31e518a428 ("crypto: crct10dif - Wrap crc_t10dif function all to use crypto transform framework") Cc: stable@vger.kernel.org # v3.11+ Cc: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/crct10dif_generic.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/crypto/crct10dif_generic.c +++ b/crypto/crct10dif_generic.c @@ -65,10 +65,9 @@ static int chksum_final(struct shash_des return 0; }
-static int __chksum_finup(__u16 *crcp, const u8 *data, unsigned int len, - u8 *out) +static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out) { - *(__u16 *)out = crc_t10dif_generic(*crcp, data, len); + *(__u16 *)out = crc_t10dif_generic(crc, data, len); return 0; }
@@ -77,15 +76,13 @@ static int chksum_finup(struct shash_des { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
- return __chksum_finup(&ctx->crc, data, len, out); + return __chksum_finup(ctx->crc, data, len, out); }
static int chksum_digest(struct shash_desc *desc, const u8 *data, unsigned int length, u8 *out) { - struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); - - return __chksum_finup(&ctx->crc, data, length, out); + return __chksum_finup(0, data, length, out); }
static struct shash_alg alg = {
From: Eric Biggers ebiggers@google.com
commit dec3d0b1071a0f3194e66a83d26ecf4aa8c5910e upstream.
The ->digest() method of crct10dif-pclmul reads the current CRC value from the shash_desc context. But this value is uninitialized, causing crypto_shash_digest() to compute the wrong result. Fix it.
Probably this wasn't noticed before because lib/crc-t10dif.c only uses crypto_shash_update(), not crypto_shash_digest(). Likewise, crypto_shash_digest() is not yet tested by the crypto self-tests because those only test the ahash API which only uses shash init/update/final.
Fixes: 0b95a7f85718 ("crypto: crct10dif - Glue code to cast accelerated CRCT10DIF assembly as a crypto transform") Cc: stable@vger.kernel.org # v3.11+ Cc: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/crypto/crct10dif-pclmul_glue.c | 13 +++++-------- 1 file changed, 5 insertions(+), 8 deletions(-)
--- a/arch/x86/crypto/crct10dif-pclmul_glue.c +++ b/arch/x86/crypto/crct10dif-pclmul_glue.c @@ -76,15 +76,14 @@ static int chksum_final(struct shash_des return 0; }
-static int __chksum_finup(__u16 *crcp, const u8 *data, unsigned int len, - u8 *out) +static int __chksum_finup(__u16 crc, const u8 *data, unsigned int len, u8 *out) { if (irq_fpu_usable()) { kernel_fpu_begin(); - *(__u16 *)out = crc_t10dif_pcl(*crcp, data, len); + *(__u16 *)out = crc_t10dif_pcl(crc, data, len); kernel_fpu_end(); } else - *(__u16 *)out = crc_t10dif_generic(*crcp, data, len); + *(__u16 *)out = crc_t10dif_generic(crc, data, len); return 0; }
@@ -93,15 +92,13 @@ static int chksum_finup(struct shash_des { struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
- return __chksum_finup(&ctx->crc, data, len, out); + return __chksum_finup(ctx->crc, data, len, out); }
static int chksum_digest(struct shash_desc *desc, const u8 *data, unsigned int length, u8 *out) { - struct chksum_desc_ctx *ctx = shash_desc_ctx(desc); - - return __chksum_finup(&ctx->crc, data, length, out); + return __chksum_finup(0, data, length, out); }
static struct shash_alg alg = {
From: Eric Biggers ebiggers@google.com
commit f699594d436960160f6d5ba84ed4a222f20d11cd upstream.
GCM instances can be created by either the "gcm" template, which only allows choosing the block cipher, e.g. "gcm(aes)"; or by "gcm_base", which allows choosing the ctr and ghash implementations, e.g. "gcm_base(ctr(aes-generic),ghash-generic)".
However, a "gcm_base" instance prevents a "gcm" instance from being registered using the same implementations. Nor will the instance be found by lookups of "gcm". This can be used as a denial of service. Moreover, "gcm_base" instances are never tested by the crypto self-tests, even if there are compatible "gcm" tests.
The root cause of these problems is that instances of the two templates use different cra_names. Therefore, fix these problems by making "gcm_base" instances set the same cra_name as "gcm" instances, e.g. "gcm(aes)" instead of "gcm_base(ctr(aes-generic),ghash-generic)".
This requires extracting the block cipher name from the name of the ctr algorithm. It also requires starting to verify that the algorithms are really ctr and ghash, not something else entirely. But it would be bizarre if anyone were actually using non-gcm-compatible algorithms with gcm_base, so this shouldn't break anyone in practice.
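The name derivation described above amounts to checking for a "ctr(" prefix and reusing the remainder of the string. A minimal sketch, assuming a 128-byte name limit; `TOY_MAX_ALG_NAME` and `toy_gcm_name()` are hypothetical names for this illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_ALG_NAME 128	/* assumption standing in for CRYPTO_MAX_ALG_NAME */

/*
 * Builds the "gcm" name from a ctr algorithm name, e.g.
 * "ctr(aes)" -> "gcm(aes)". The closing parenthesis comes from the
 * ctr name itself. Returns 0 on success, -1 if the input is not a
 * ctr algorithm or the result would not fit.
 */
static int toy_gcm_name(char *out, const char *ctr_cra_name)
{
	int n;

	if (strncmp(ctr_cra_name, "ctr(", 4) != 0)
		return -1;

	n = snprintf(out, TOY_MAX_ALG_NAME, "gcm(%s", ctr_cra_name + 4);
	if (n < 0 || n >= TOY_MAX_ALG_NAME)
		return -1;
	return 0;
}
```

Because the name is derived from the underlying algorithm rather than from what the user typed, both templates end up with the same instance name for the same cipher.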
Fixes: d00aa19b507b ("[CRYPTO] gcm: Allow block cipher parameter") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- crypto/gcm.c | 34 +++++++++++----------------------- 1 file changed, 11 insertions(+), 23 deletions(-)
--- a/crypto/gcm.c +++ b/crypto/gcm.c @@ -616,7 +616,6 @@ static void crypto_gcm_free(struct aead_
static int crypto_gcm_create_common(struct crypto_template *tmpl, struct rtattr **tb, - const char *full_name, const char *ctr_name, const char *ghash_name) { @@ -657,7 +656,8 @@ static int crypto_gcm_create_common(stru goto err_free_inst;
err = -EINVAL; - if (ghash->digestsize != 16) + if (strcmp(ghash->base.cra_name, "ghash") != 0 || + ghash->digestsize != 16) goto err_drop_ghash;
crypto_set_skcipher_spawn(&ctx->ctr, aead_crypto_instance(inst)); @@ -669,24 +669,24 @@ static int crypto_gcm_create_common(stru
ctr = crypto_spawn_skcipher_alg(&ctx->ctr);
- /* We only support 16-byte blocks. */ + /* The skcipher algorithm must be CTR mode, using 16-byte blocks. */ err = -EINVAL; - if (crypto_skcipher_alg_ivsize(ctr) != 16) + if (strncmp(ctr->base.cra_name, "ctr(", 4) != 0 || + crypto_skcipher_alg_ivsize(ctr) != 16 || + ctr->base.cra_blocksize != 1) goto out_put_ctr;
- /* Not a stream cipher? */ - if (ctr->base.cra_blocksize != 1) + err = -ENAMETOOLONG; + if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME, + "gcm(%s", ctr->base.cra_name + 4) >= CRYPTO_MAX_ALG_NAME) goto out_put_ctr;
- err = -ENAMETOOLONG; if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME, "gcm_base(%s,%s)", ctr->base.cra_driver_name, ghash_alg->cra_driver_name) >= CRYPTO_MAX_ALG_NAME) goto out_put_ctr;
- memcpy(inst->alg.base.cra_name, full_name, CRYPTO_MAX_ALG_NAME); - inst->alg.base.cra_flags = (ghash->base.cra_flags | ctr->base.cra_flags) & CRYPTO_ALG_ASYNC; inst->alg.base.cra_priority = (ghash->base.cra_priority + @@ -728,7 +728,6 @@ static int crypto_gcm_create(struct cryp { const char *cipher_name; char ctr_name[CRYPTO_MAX_ALG_NAME]; - char full_name[CRYPTO_MAX_ALG_NAME];
cipher_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(cipher_name)) @@ -738,12 +737,7 @@ static int crypto_gcm_create(struct cryp CRYPTO_MAX_ALG_NAME) return -ENAMETOOLONG;
- if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "gcm(%s)", cipher_name) >= - CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; - - return crypto_gcm_create_common(tmpl, tb, full_name, - ctr_name, "ghash"); + return crypto_gcm_create_common(tmpl, tb, ctr_name, "ghash"); }
static struct crypto_template crypto_gcm_tmpl = { @@ -757,7 +751,6 @@ static int crypto_gcm_base_create(struct { const char *ctr_name; const char *ghash_name; - char full_name[CRYPTO_MAX_ALG_NAME];
ctr_name = crypto_attr_alg_name(tb[1]); if (IS_ERR(ctr_name)) @@ -767,12 +760,7 @@ static int crypto_gcm_base_create(struct if (IS_ERR(ghash_name)) return PTR_ERR(ghash_name);
- if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "gcm_base(%s,%s)", - ctr_name, ghash_name) >= CRYPTO_MAX_ALG_NAME) - return -ENAMETOOLONG; - - return crypto_gcm_create_common(tmpl, tb, full_name, - ctr_name, ghash_name); + return crypto_gcm_create_common(tmpl, tb, ctr_name, ghash_name); }
static struct crypto_template crypto_gcm_base_tmpl = {
From: Zhang Zhijie zhangzj@rock-chips.com
commit f0cfd57b43fec65761ca61d3892b983a71515f23 upstream.
The Kernel Crypto API requires a CBC implementation to output the next IV to the IV buffer. So the last block of the ciphertext should be copied into the assigned IV buffer.
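For CBC, the next IV is simply the final ciphertext block. A minimal sketch over a flat buffer follows; `toy_next_iv()` is a hypothetical helper, and the real driver works on scatterlists rather than a contiguous array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copies the last ivsize bytes of the ciphertext into iv.
 * When decrypting in place, this has to run before the ciphertext is
 * overwritten with plaintext.
 */
static void toy_next_iv(uint8_t *iv, const uint8_t *ciphertext,
			size_t total, size_t ivsize)
{
	assert(total >= ivsize);
	memcpy(iv, ciphertext + total - ivsize, ivsize);
}
```

A caller that chains two requests would pass the updated IV buffer into the second request unchanged.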
Reported-by: Eric Biggers ebiggers@google.com Fixes: 433cd2c617bf ("crypto: rockchip - add crypto driver for rk3288") Cc: stable@vger.kernel.org # v4.5+ Signed-off-by: Zhang Zhijie zhangzj@rock-chips.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c | 25 +++++++++++++++------ 1 file changed, 18 insertions(+), 7 deletions(-)
--- a/drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c +++ b/drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c @@ -250,9 +250,14 @@ static int rk_set_data_start(struct rk_c u8 *src_last_blk = page_address(sg_page(dev->sg_src)) + dev->sg_src->offset + dev->sg_src->length - ivsize;
- /* store the iv that need to be updated in chain mode */ - if (ctx->mode & RK_CRYPTO_DEC) + /* Store the iv that need to be updated in chain mode. + * And update the IV buffer to contain the next IV for decryption mode. + */ + if (ctx->mode & RK_CRYPTO_DEC) { memcpy(ctx->iv, src_last_blk, ivsize); + sg_pcopy_to_buffer(dev->first, dev->src_nents, req->info, + ivsize, dev->total - ivsize); + }
err = dev->load_data(dev, dev->sg_src, dev->sg_dst); if (!err) @@ -288,13 +293,19 @@ static void rk_iv_copyback(struct rk_cry struct ablkcipher_request *req = ablkcipher_request_cast(dev->async_req); struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req); + struct rk_cipher_ctx *ctx = crypto_ablkcipher_ctx(tfm); u32 ivsize = crypto_ablkcipher_ivsize(tfm);
- if (ivsize == DES_BLOCK_SIZE) - memcpy_fromio(req->info, dev->reg + RK_CRYPTO_TDES_IV_0, - ivsize); - else if (ivsize == AES_BLOCK_SIZE) - memcpy_fromio(req->info, dev->reg + RK_CRYPTO_AES_IV_0, ivsize); + /* Update the IV buffer to contain the next IV for encryption mode. */ + if (!(ctx->mode & RK_CRYPTO_DEC)) { + if (dev->aligned) { + memcpy(req->info, sg_virt(dev->sg_dst) + + dev->sg_dst->length - ivsize, ivsize); + } else { + memcpy(req->info, dev->addr_vir + + dev->count - ivsize, ivsize); + } + } }
static void rk_update_iv(struct rk_crypto_info *dev)
From: Eric Biggers ebiggers@google.com
commit 767f015ea0b7ab9d60432ff6cd06b664fd71f50f upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
arm32 xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and it was actually broken prior to the alignmask being removed by commit cc477bf64573 ("crypto: arm/aes - replace bit-sliced OpenSSL NEON code"). Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt().
Fixes: e4e7f10bfc40 ("ARM: add support for bit sliced AES using NEON instructions")
Cc: stable@vger.kernel.org # v3.13+
Signed-off-by: Eric Biggers ebiggers@google.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 arch/arm/crypto/aes-neonbs-glue.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/arch/arm/crypto/aes-neonbs-glue.c
+++ b/arch/arm/crypto/aes-neonbs-glue.c
@@ -280,6 +280,8 @@ static int __xts_crypt(struct skcipher_r
 	int err;

 	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;

 	crypto_cipher_encrypt_one(ctx->tweak_tfm, walk.iv, walk.iv);
From: Wenwen Wang wang6495@umn.edu
commit cb5173594d50c72b7bfa14113dfc5084b4d2f726 upstream.
In parse_audio_selector_unit(), the string array 'namelist' is allocated through kmalloc_array(), and each string pointer in this array, i.e., 'namelist[]', is allocated through kmalloc() in the following for loop. Then, a control instance 'kctl' is created by invoking snd_ctl_new1(). If an error occurs during the creation process, the string array 'namelist', including all string pointers in the array 'namelist[]', should be freed, before the error code ENOMEM is returned. However, the current code does not free 'namelist[]', resulting in memory leaks.
To fix the above issue, free all string pointers 'namelist[]' in a loop.
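The cleanup rule here is that an array of allocated strings needs each element freed before the array itself. A user-space sketch with malloc/free in place of kmalloc/kfree; `toy_alloc_names()` and `toy_free_names()` are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocates n strings of len bytes each, or returns NULL on failure. */
static char **toy_alloc_names(size_t n, size_t len)
{
	char **names;
	size_t i;

	assert(len > 0);
	names = calloc(n, sizeof(*names));
	if (!names)
		return NULL;

	for (i = 0; i < n; i++) {
		names[i] = malloc(len);
		if (!names[i]) {
			/* Unwind the entries allocated so far. */
			while (i--)
				free(names[i]);
			free(names);
			return NULL;
		}
		names[i][0] = '\0';
	}
	return names;
}

/* Error-path cleanup: every element first, then the array. */
static void toy_free_names(char **names, size_t n)
{
	size_t i;

	if (!names)
		return;
	for (i = 0; i < n; i++)
		free(names[i]);
	free(names);
}
```

Freeing only the outer array, as the pre-patch error path did, would leave every element allocation unreachable.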
Signed-off-by: Wenwen Wang wang6495@umn.edu
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/usb/mixer.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -2184,6 +2184,8 @@ static int parse_audio_selector_unit(str
 	kctl = snd_ctl_new1(&mixer_selectunit_ctl, cval);
 	if (! kctl) {
 		usb_audio_err(state->chip, "cannot malloc kcontrol\n");
+		for (i = 0; i < desc->bNrInPins; i++)
+			kfree(namelist[i]);
 		kfree(namelist);
 		kfree(cval);
 		return -ENOMEM;
From: Hui Wang hui.wang@canonical.com
commit 8c2e6728c2bf95765b724e07d0278ae97cd1ee0d upstream.
The driver checks for monitor presence in three situations: when resuming from suspend, when polling starts, and when an interrupt triggers. In each of these, jack_dirty is first set to 1, then hda_jack.c reads pin_sense from the register, and after the read jack_dirty is set to 0. But hdmi_repoll_work() is also enabled in these three situations and subsequently reads pin_sense a couple more times. Since jack_dirty is now 0, it no longer reads the register; instead it uses the shadow pin_sense that was read the first time.
It is meaningless to check the shadow pin_sense a couple of times; we need to read the register to check the real plugging state, so we set jack_dirty to 1 in hdmi_repoll_work().
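The dirty-flag caching at issue can be modelled in a few lines. This is a toy under stated assumptions: `struct toy_jack` and `toy_read_sense()` are hypothetical, and the hardware value is passed in as a parameter instead of being read from a codec register.

```c
#include <assert.h>

/* Hypothetical cached jack state: a shadow value plus a dirty flag. */
struct toy_jack {
	int dirty;
	unsigned int shadow;
};

/*
 * Returns the pin sense. The "register" (hw_value) is consulted only
 * when the dirty flag is set; otherwise the shadow copy is returned.
 * hw_reads counts how often the register was actually read.
 */
static unsigned int toy_read_sense(struct toy_jack *jack,
				   unsigned int hw_value, int *hw_reads)
{
	assert(jack && hw_reads);
	if (jack->dirty) {
		jack->shadow = hw_value;
		jack->dirty = 0;
		(*hw_reads)++;
	}
	return jack->shadow;
}
```

A repoll that does not set the dirty flag first keeps getting the first reading back, no matter what the hardware reports in the meantime.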
Signed-off-by: Hui Wang hui.wang@canonical.com
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org

---
 sound/pci/hda/patch_hdmi.c | 5 +++++
 1 file changed, 5 insertions(+)

--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -1661,6 +1661,11 @@ static void hdmi_repoll_eld(struct work_
 		container_of(to_delayed_work(work), struct hdmi_spec_per_pin, work);
 	struct hda_codec *codec = per_pin->codec;
 	struct hdmi_spec *spec = codec->spec;
+	struct hda_jack_tbl *jack;
+
+	jack = snd_hda_jack_tbl_get(codec, per_pin->pin_nid);
+	if (jack)
+		jack->jack_dirty = 1;

 	if (per_pin->repoll_count++ > 6)
 		per_pin->repoll_count = 0;
From: Hui Wang hui.wang@canonical.com
commit 7f641e26a6df9269cb25dd7a4b0a91d6586ed441 upstream.
On machines with an AMD or Nvidia GPU, we often hit this issue: after S3, there are 4 HDMI/DP audio devices in gnome-sound-setting even though no monitor is plugged in.
When this problem happens, checking /proc/asound/cardX/eld#N.M shows monitor_present=1 and eld_valid=0.
The root cause is that the BIOS or GPU driver marks PRESENCE as valid even when no monitor is plugged in, and of course the driver will not get valid eld_data subsequently.
In this situation we should not report the jack_plugged event. To achieve that, change the function hdmi_present_sense_via_verbs(). This function reads pin_sense via snd_hda_pin_sense(); after that call, jack_dirty is 0. Before exiting via_verbs(), we change the shadow pin_sense according to both monitor_present and eld_valid. Then, in snd_hda_jack_report_sync(), since jack_dirty is still 0, the jack event is reported according to this modified shadow pin_sense.
After this change, the driver will not report the Jack_is_plugged event through hdmi_present_sense_via_verbs() if monitor_present is 1 and eld_valid is 0.
Signed-off-by: Hui Wang hui.wang@canonical.com
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_hdmi.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/sound/pci/hda/patch_hdmi.c
+++ b/sound/pci/hda/patch_hdmi.c
@@ -1549,9 +1549,11 @@ static bool hdmi_present_sense_via_verbs
 	ret = !repoll || !eld->monitor_present || eld->eld_valid;
 
 	jack = snd_hda_jack_tbl_get(codec, pin_nid);
-	if (jack)
+	if (jack) {
 		jack->block_report = !ret;
-
+		jack->pin_sense = (eld->monitor_present && eld->eld_valid) ?
+			AC_PINSENSE_PRESENCE : 0;
+	}
 	mutex_unlock(&per_pin->lock);
 	return ret;
 }
From: Kailang Yang kailang@realtek.com
commit 607ca3bd220f4022e6f5356026b19dafc363863a upstream.
Turn EAPD on after setting the pin output.
[ NOTE: This change is supposed to reduce possible click noises at (runtime) PM resume. The functionality should be the same (i.e. the verbs are executed correctly) regardless of the order, so this should be safe to apply for all codecs -- tiwai ]
Signed-off-by: Kailang Yang kailang@realtek.com
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_realtek.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -781,11 +781,10 @@ static int alc_init(struct hda_codec *co
 	if (spec->init_hook)
 		spec->init_hook(codec);
 
+	snd_hda_gen_init(codec);
 	alc_fix_pll(codec);
 	alc_auto_init_amp(codec, spec->init_amp);
 
-	snd_hda_gen_init(codec);
-
 	snd_hda_apply_fixup(codec, HDA_FIXUP_ACT_INIT);
 
 	return 0;
From: Jon Hunter jonathanh@nvidia.com
commit ecb2795c08bc825ebd604997e5be440b060c5b18 upstream.
The max98090 driver defines 3 DAPM muxes; one for the right line output (LINMOD Mux), one for the left headphone mixer source (MIXHPLSEL Mux) and one for the right headphone mixer source (MIXHPRSEL Mux). The same bit is used for the mux as well as the DAPM enable, and although the mux can be correctly configured, after playback has completed, the mux will be reset during the disable phase. This is preventing the state of these muxes from being saved and restored correctly on system reboot. Fix this by marking these muxes as SND_SOC_NOPM.
Note that this has been verified on the Tegra124 Nyan Big, which features the MAX98090 codec.
Signed-off-by: Jon Hunter jonathanh@nvidia.com
Signed-off-by: Mark Brown broonie@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/soc/codecs/max98090.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

--- a/sound/soc/codecs/max98090.c
+++ b/sound/soc/codecs/max98090.c
@@ -1209,14 +1209,14 @@ static const struct snd_soc_dapm_widget
 		&max98090_right_rcv_mixer_controls[0],
 		ARRAY_SIZE(max98090_right_rcv_mixer_controls)),
 
-	SND_SOC_DAPM_MUX("LINMOD Mux", M98090_REG_LOUTR_MIXER,
-		M98090_LINMOD_SHIFT, 0, &max98090_linmod_mux),
+	SND_SOC_DAPM_MUX("LINMOD Mux", SND_SOC_NOPM, 0, 0,
+		&max98090_linmod_mux),
 
-	SND_SOC_DAPM_MUX("MIXHPLSEL Mux", M98090_REG_HP_CONTROL,
-		M98090_MIXHPLSEL_SHIFT, 0, &max98090_mixhplsel_mux),
+	SND_SOC_DAPM_MUX("MIXHPLSEL Mux", SND_SOC_NOPM, 0, 0,
+		&max98090_mixhplsel_mux),
 
-	SND_SOC_DAPM_MUX("MIXHPRSEL Mux", M98090_REG_HP_CONTROL,
-		M98090_MIXHPRSEL_SHIFT, 0, &max98090_mixhprsel_mux),
+	SND_SOC_DAPM_MUX("MIXHPRSEL Mux", SND_SOC_NOPM, 0, 0,
+		&max98090_mixhprsel_mux),
 
 	SND_SOC_DAPM_PGA("HP Left Out", M98090_REG_OUTPUT_ENABLE,
 		M98090_HPLEN_SHIFT, 0, NULL, 0),
From: Curtis Malainey cujomalainey@chromium.org
commit a46eb523220e242affb9a6bc9bb8efc05f4f7459 upstream.
The current algorithm allows 3 types of transfers: 16-bit, 32-bit and burst. According to Realtek, 16-bit transfers are restricted to the memory region 0x18020000 ~ 0x18021000, which is the memory location of the I2C registers. The current algorithm does not uphold this restriction and therefore fails to complete writes.
Since this has been broken for some time, it is likely that no one is using it. It is better to simply disable the 16-bit writes. This will allow users to properly load firmware over SPI without data corruption.
Signed-off-by: Curtis Malainey cujomalainey@chromium.org
Reviewed-by: Ben Zhang benzh@chromium.org
Signed-off-by: Mark Brown broonie@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/soc/codecs/rt5677-spi.c | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

--- a/sound/soc/codecs/rt5677-spi.c
+++ b/sound/soc/codecs/rt5677-spi.c
@@ -58,13 +58,15 @@ static DEFINE_MUTEX(spi_mutex);
  * RT5677_SPI_READ/WRITE_32: Transfer 4 bytes
  * RT5677_SPI_READ/WRITE_BURST: Transfer any multiples of 8 bytes
  *
- * For example, reading 260 bytes at 0x60030002 uses the following commands:
- * 0x60030002 RT5677_SPI_READ_16 2 bytes
+ * Note:
+ * 16 Bit writes and reads are restricted to the address range
+ * 0x18020000 ~ 0x18021000
+ *
+ * For example, reading 256 bytes at 0x60030004 uses the following commands:
  * 0x60030004 RT5677_SPI_READ_32 4 bytes
  * 0x60030008 RT5677_SPI_READ_BURST 240 bytes
  * 0x600300F8 RT5677_SPI_READ_BURST 8 bytes
  * 0x60030100 RT5677_SPI_READ_32 4 bytes
- * 0x60030104 RT5677_SPI_READ_16 2 bytes
  *
  * Input:
  * @read: true for read commands; false for write commands
@@ -79,15 +81,13 @@ static u8 rt5677_spi_select_cmd(bool rea
 {
 	u8 cmd;
 
-	if (align == 2 || align == 6 || remain == 2) {
-		cmd = RT5677_SPI_READ_16;
-		*len = 2;
-	} else if (align == 4 || remain <= 6) {
+	if (align == 4 || remain <= 4) {
 		cmd = RT5677_SPI_READ_32;
 		*len = 4;
 	} else {
 		cmd = RT5677_SPI_READ_BURST;
-		*len = min_t(u32, remain & ~7, RT5677_SPI_BURST_LEN);
+		*len = (((remain - 1) >> 3) + 1) << 3;
+		*len = min_t(u32, *len, RT5677_SPI_BURST_LEN);
 	}
 	return read ? cmd : cmd + 1;
 }
@@ -108,7 +108,7 @@ static void rt5677_spi_reverse(u8 *dst,
 	}
 }
 
-/* Read DSP address space using SPI. addr and len have to be 2-byte aligned. */
+/* Read DSP address space using SPI. addr and len have to be 4-byte aligned. */
 int rt5677_spi_read(u32 addr, void *rxbuf, size_t len)
 {
 	u32 offset;
@@ -124,7 +124,7 @@ int rt5677_spi_read(u32 addr, void *rxbu
 	if (!g_spi)
 		return -ENODEV;
 
-	if ((addr & 1) || (len & 1)) {
+	if ((addr & 3) || (len & 3)) {
 		dev_err(&g_spi->dev, "Bad read align 0x%x(%zu)\n", addr, len);
 		return -EACCES;
 	}
@@ -159,13 +159,13 @@ int rt5677_spi_read(u32 addr, void *rxbu
 }
 EXPORT_SYMBOL_GPL(rt5677_spi_read);
 
-/* Write DSP address space using SPI. addr has to be 2-byte aligned.
- * If len is not 2-byte aligned, an extra byte of zero is written at the end
+/* Write DSP address space using SPI. addr has to be 4-byte aligned.
+ * If len is not 4-byte aligned, then extra zeros are written at the end
  * as padding.
  */
 int rt5677_spi_write(u32 addr, const void *txbuf, size_t len)
 {
-	u32 offset, len_with_pad = len;
+	u32 offset;
 	int status = 0;
 	struct spi_transfer t;
 	struct spi_message m;
@@ -178,22 +178,19 @@ int rt5677_spi_write(u32 addr, const voi
 	if (!g_spi)
 		return -ENODEV;
 
-	if (addr & 1) {
+	if (addr & 3) {
 		dev_err(&g_spi->dev, "Bad write align 0x%x(%zu)\n", addr, len);
 		return -EACCES;
 	}
 
-	if (len & 1)
-		len_with_pad = len + 1;
-
 	memset(&t, 0, sizeof(t));
 	t.tx_buf = buf;
 	t.speed_hz = RT5677_SPI_FREQ;
 	spi_message_init_with_transfers(&m, &t, 1);
 
-	for (offset = 0; offset < len_with_pad;) {
+	for (offset = 0; offset < len;) {
 		spi_cmd = rt5677_spi_select_cmd(false, (addr + offset) & 7,
-				len_with_pad - offset, &t.len);
+				len - offset, &t.len);
 
 		/* Construct SPI message header */
 		buf[0] = spi_cmd;
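To make the new command selection easier to follow, here is a small userspace model of the patched rt5677_spi_select_cmd() logic. It is only a sketch: the command codes and the 240-byte burst limit are assumptions standing in for RT5677_SPI_READ_32, RT5677_SPI_READ_BURST and RT5677_SPI_BURST_LEN, and select_cmd is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values; the real ones live in the driver. */
#define SPI_READ_32	0x2
#define SPI_READ_BURST	0x4
#define SPI_BURST_LEN	240u

/*
 * align:  (addr + offset) & 7, i.e. position within an 8-byte window
 * remain: bytes still to transfer
 * len:    out parameter, size of the next transfer
 */
static uint8_t select_cmd(bool read, uint32_t align, uint32_t remain,
			  uint32_t *len)
{
	uint8_t cmd;

	assert(remain > 0 && len != NULL);

	if (align == 4 || remain <= 4) {
		/* Not 8-byte aligned, or only a short tail left: 32-bit access. */
		cmd = SPI_READ_32;
		*len = 4;
	} else {
		/* Round remain up to a multiple of 8, then cap at the burst size. */
		cmd = SPI_READ_BURST;
		*len = (((remain - 1) >> 3) + 1) << 3;
		if (*len > SPI_BURST_LEN)
			*len = SPI_BURST_LEN;
	}
	/* Each write command is assumed to be the read command plus one. */
	return read ? cmd : (uint8_t)(cmd + 1);
}
```

Note that, following the arithmetic in the diff, a burst length is rounded up rather than down, so a 12-byte remainder at an 8-byte-aligned address becomes a single 16-byte burst.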
From: Daniel Borkmann daniel@iogearbox.net
commit 8968c67a82ab7501bc3b9439c3624a49b42fe54c upstream.
Prefetch-with-intent-to-write is currently part of the XADD mapping in the AArch64 JIT and follows the kernel's implementation of atomic_add. This may interfere with other threads executing the LDXR/STXR loop, leading to potential starvation and fairness issues. Drop the optional prefetch instruction.
Fixes: 85f68fe89832 ("bpf, arm64: implement jiting of BPF_XADD")
Reported-by: Will Deacon will.deacon@arm.com
Signed-off-by: Daniel Borkmann daniel@iogearbox.net
Acked-by: Jean-Philippe Brucker jean-philippe.brucker@arm.com
Acked-by: Will Deacon will.deacon@arm.com
Signed-off-by: Alexei Starovoitov ast@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/net/bpf_jit.h | 6 ------
 arch/arm64/net/bpf_jit_comp.c | 1 -
 2 files changed, 7 deletions(-)

--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -100,12 +100,6 @@
 #define A64_STXR(sf, Rt, Rn, Rs) \
 	A64_LSX(sf, Rt, Rn, Rs, STORE_EX)
 
-/* Prefetch */
-#define A64_PRFM(Rn, type, target, policy) \
-	aarch64_insn_gen_prefetch(Rn, AARCH64_INSN_PRFM_TYPE_##type, \
-				  AARCH64_INSN_PRFM_TARGET_##target, \
-				  AARCH64_INSN_PRFM_POLICY_##policy)
-
 /* Add/subtract (immediate) */
 #define A64_ADDSUB_IMM(sf, Rd, Rn, imm12, type) \
 	aarch64_insn_gen_add_sub_imm(Rd, Rn, imm12, \
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -712,7 +712,6 @@ emit_cond_jmp:
 	case BPF_STX | BPF_XADD | BPF_DW:
 		emit_a64_mov_i(1, tmp, off, ctx);
 		emit(A64_ADD(1, tmp, tmp, dst), ctx);
-		emit(A64_PRFM(tmp, PST, L1, STRM), ctx);
 		emit(A64_LDXR(isdw, tmp2, tmp), ctx);
 		emit(A64_ADD(isdw, tmp2, tmp2, src), ctx);
 		emit(A64_STXR(isdw, tmp2, tmp, tmp3), ctx);
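For readers less familiar with BPF_XADD: the sequence the JIT emits above (LDXR, ADD, STXR) amounts to an atomic add to a memory location. A rough userspace analogue, assuming C11 <stdatomic.h> is available, is sketched below. The helper name xadd64 is hypothetical, and this is an illustration of the semantics rather than the JIT's code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: atomically add 'val' to '*addr'.
 * The old value is discarded, since the operation modelled here
 * only updates memory and does not hand a result back.
 */
static void xadd64(_Atomic uint64_t *addr, uint64_t val)
{
	assert(addr != NULL);
	/* Relaxed ordering: the emitted sequence contains no barriers. */
	(void)atomic_fetch_add_explicit(addr, val, memory_order_relaxed);
}
```

On AArch64 targets without LSE atomics, a compiler would typically lower this to the same kind of exclusive load/store retry loop, which is why an extra prefetch in front of it can affect other threads spinning on the same line.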
From: Jiri Kosina jkosina@suse.cz
commit 134fca9063ad4851de767d1768180e5dede9a881 upstream.
The semantics of what mincore() considers to be resident is not completely clear, but Linux has always (since 2.3.52, which is when mincore() was initially done) treated it as "page is available in page cache".
That's potentially a problem, as that [in]directly exposes meta-information about pagecache / memory mapping state even about memory not strictly belonging to the process executing the syscall, opening possibilities for sidechannel attacks.
Change the semantics of mincore() so that it only reveals pagecache information for non-anonymous mappings that belong to files that the calling process could (if it tried to) successfully open for writing; otherwise we'd be including shared non-exclusive mappings, which
- is the sidechannel
- is not the usecase for mincore(), as that's primarily used for data, not (shared) text
[jkosina@suse.cz: v2]
Link: http://lkml.kernel.org/r/20190312141708.6652-2-vbabka@suse.cz
[mhocko@suse.com: restructure can_do_mincore() conditions]
Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1903062342020.19912@cbobk.fhfr.pm
Signed-off-by: Jiri Kosina jkosina@suse.cz
Signed-off-by: Vlastimil Babka vbabka@suse.cz
Acked-by: Josh Snyder joshs@netflix.com
Acked-by: Michal Hocko mhocko@suse.com
Originally-by: Linus Torvalds torvalds@linux-foundation.org
Originally-by: Dominique Martinet asmadeus@codewreck.org
Cc: Andy Lutomirski luto@amacapital.net
Cc: Dave Chinner david@fromorbit.com
Cc: Kevin Easton kevin@guarana.org
Cc: Matthew Wilcox willy@infradead.org
Cc: Cyril Hrubis chrubis@suse.cz
Cc: Tejun Heo tj@kernel.org
Cc: Kirill A. Shutemov kirill@shutemov.name
Cc: Daniel Gruss daniel@gruss.cc
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 mm/mincore.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -169,6 +169,22 @@ out:
 	return 0;
 }
 
+static inline bool can_do_mincore(struct vm_area_struct *vma)
+{
+	if (vma_is_anonymous(vma))
+		return true;
+	if (!vma->vm_file)
+		return false;
+	/*
+	 * Reveal pagecache information only for non-anonymous mappings that
+	 * correspond to the files the calling process could (if tried) open
+	 * for writing; otherwise we'd be including shared non-exclusive
+	 * mappings, which opens a side channel.
+	 */
+	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+}
+
 /*
  * Do a chunk of "sys_mincore()". We've already checked
  * all the arguments, we hold the mmap semaphore: we should
@@ -189,8 +205,13 @@ static long do_mincore(unsigned long add
 	vma = find_vma(current->mm, addr);
 	if (!vma || addr < vma->vm_start)
 		return -ENOMEM;
-	mincore_walk.mm = vma->vm_mm;
 	end = min(vma->vm_end, addr + (pages << PAGE_SHIFT));
+	if (!can_do_mincore(vma)) {
+		unsigned long pages = DIV_ROUND_UP(end - addr, PAGE_SIZE);
+		memset(vec, 1, pages);
+		return pages;
+	}
+	mincore_walk.mm = vma->vm_mm;
 	err = walk_page_range(addr, end, &mincore_walk);
 	if (err < 0)
 		return err;
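From userspace, the path that this patch leaves unchanged is the anonymous one: can_do_mincore() returns true for anonymous mappings, so a process still gets real residency information for its own private memory. The sketch below, assuming a Linux system with mmap() and mincore() available, maps one anonymous page, touches it, and asks whether it is resident. The function name is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a freshly touched anonymous page is reported resident,
 * 0 if it is not, and -1 if mmap() or mincore() failed.
 */
static int first_anon_page_resident(void)
{
	long psz = sysconf(_SC_PAGESIZE);
	unsigned char vec = 0;
	void *p;
	int ret;

	assert(psz > 0);

	p = mmap(NULL, (size_t)psz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	/* Write to the page so that it is actually faulted in. */
	memset(p, 0xaa, (size_t)psz);

	/* One status byte per page; bit 0 means "resident". */
	ret = mincore(p, (size_t)psz, &vec);
	munmap(p, (size_t)psz);

	return ret < 0 ? -1 : (vec & 1);
}
```

For a file-backed mapping that the caller could not open for writing, the patched kernel instead fills the vector with 1s, so every page appears resident regardless of the real pagecache state.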
From: Shuning Zhang sunny.s.zhang@oracle.com
commit e091eab028f9253eac5c04f9141bbc9d170acab3 upstream.
In some cases, ocfs2_iget() reads the data of an inode that has already been deleted, which makes the system panic. We should therefore check whether the inode has been deleted and tell the caller that it is a bad inode.
For example, ocfs2 is used as the backend of NFS, and the client is NFSv3. The issue can be reproduced with the following steps.
on the nfs server side, ..../patha/pathb
Step 1: Process A is scheduled before it calls the function fh_verify.
Step 2: Process B is removing 'pathb' and has just completed the call to dput. The dentry of 'pathb' has then been deleted from the dcache, along with all its ancestors. The link between the dentry and the inode was removed through hlist_del_init. The call stack is: dentry_iput->hlist_del_init(&dentry->d_u.d_alias)
At this time, the inode is still in the dcache.
Step 3: Process A calls ocfs2_get_dentry, which gets the inode from the dcache, so the refcount of the inode is 1. The call stack is: nfsd3_proc_getacl->fh_verify->exportfs_decode_fh->fh_to_dentry(ocfs2_get_dentry)
Step 4: Dirty pages are flushed by bdi threads, so the inode of 'patha' is evicted and this directory is deleted. The inode of 'pathb' cannot be evicted because its refcount is 1.
Step 5: Process A keeps running and calls reconnect_path (in exportfs_decode_fh), which calls ocfs2_get_parent. That function gets the block number of the parent directory (patha) by the name "..", then reads the data from disk by that block number. But this inode has been deleted, so the system panics.
Process A                               Process B
1. in nfsd3_proc_getacl               |
2.                                    | dput
3. fh_to_dentry(ocfs2_get_dentry)     |
4. bdi flush dirty cache              |
5. ocfs2_iget                         |
[283465.542049] OCFS2: ERROR (device sdp): ocfs2_validate_inode_block: Invalid dinode #580640: OCFS2_VALID_FL not set
[283465.545490] Kernel panic - not syncing: OCFS2: (device sdp): panic forced after error
[283465.546889] CPU: 5 PID: 12416 Comm: nfsd Tainted: G W 4.1.12-124.18.6.el6uek.bug28762940v3.x86_64 #2
[283465.548382] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
[283465.549657] 0000000000000000 ffff8800a56fb7b8 ffffffff816e839c ffffffffa0514758
[283465.550392] 000000000008dc20 ffff8800a56fb838 ffffffff816e62d3 0000000000000008
[283465.551056] ffff880000000010 ffff8800a56fb848 ffff8800a56fb7e8 ffff88005df9f000
[283465.551710] Call Trace:
[283465.552516] [<ffffffff816e839c>] dump_stack+0x63/0x81
[283465.553291] [<ffffffff816e62d3>] panic+0xcb/0x21b
[283465.554037] [<ffffffffa04e66b0>] ocfs2_handle_error+0xf0/0xf0 [ocfs2]
[283465.554882] [<ffffffffa04e7737>] __ocfs2_error+0x67/0x70 [ocfs2]
[283465.555768] [<ffffffffa049c0f9>] ocfs2_validate_inode_block+0x229/0x230 [ocfs2]
[283465.556683] [<ffffffffa047bcbc>] ocfs2_read_blocks+0x46c/0x7b0 [ocfs2]
[283465.557408] [<ffffffffa049bed0>] ? ocfs2_inode_cache_io_unlock+0x20/0x20 [ocfs2]
[283465.557973] [<ffffffffa049f0eb>] ocfs2_read_inode_block_full+0x3b/0x60 [ocfs2]
[283465.558525] [<ffffffffa049f5ba>] ocfs2_iget+0x4aa/0x880 [ocfs2]
[283465.559082] [<ffffffffa049146e>] ocfs2_get_parent+0x9e/0x220 [ocfs2]
[283465.559622] [<ffffffff81297c05>] reconnect_path+0xb5/0x300
[283465.560156] [<ffffffff81297f46>] exportfs_decode_fh+0xf6/0x2b0
[283465.560708] [<ffffffffa062faf0>] ? nfsd_proc_getattr+0xa0/0xa0 [nfsd]
[283465.561262] [<ffffffff810a8196>] ? prepare_creds+0x26/0x110
[283465.561932] [<ffffffffa0630860>] fh_verify+0x350/0x660 [nfsd]
[283465.562862] [<ffffffffa0637804>] ? nfsd_cache_lookup+0x44/0x630 [nfsd]
[283465.563697] [<ffffffffa063a8b9>] nfsd3_proc_getattr+0x69/0xf0 [nfsd]
[283465.564510] [<ffffffffa062cf60>] nfsd_dispatch+0xe0/0x290 [nfsd]
[283465.565358] [<ffffffffa05eb892>] ? svc_tcp_adjust_wspace+0x12/0x30 [sunrpc]
[283465.566272] [<ffffffffa05ea652>] svc_process_common+0x412/0x6a0 [sunrpc]
[283465.567155] [<ffffffffa05eaa03>] svc_process+0x123/0x210 [sunrpc]
[283465.568020] [<ffffffffa062c90f>] nfsd+0xff/0x170 [nfsd]
[283465.568962] [<ffffffffa062c810>] ? nfsd_destroy+0x80/0x80 [nfsd]
[283465.570112] [<ffffffff810a622b>] kthread+0xcb/0xf0
[283465.571099] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
[283465.572114] [<ffffffff816f11b8>] ret_from_fork+0x58/0x90
[283465.573156] [<ffffffff810a6160>] ? kthread_create_on_node+0x180/0x180
Link: http://lkml.kernel.org/r/1554185919-3010-1-git-send-email-sunny.s.zhang@orac...
Signed-off-by: Shuning Zhang sunny.s.zhang@oracle.com
Reviewed-by: Joseph Qi jiangqi903@gmail.com
Cc: Mark Fasheh mark@fasheh.com
Cc: Joel Becker jlbec@evilplan.org
Cc: Junxiao Bi junxiao.bi@oracle.com
Cc: Changwei Ge gechangwei@live.cn
Cc: piaojun piaojun@huawei.com
Cc: "Gang He" ghe@suse.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ocfs2/export.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

--- a/fs/ocfs2/export.c
+++ b/fs/ocfs2/export.c
@@ -148,16 +148,24 @@ static struct dentry *ocfs2_get_parent(s
 	u64 blkno;
 	struct dentry *parent;
 	struct inode *dir = d_inode(child);
+	int set;
 
 	trace_ocfs2_get_parent(child, child->d_name.len, child->d_name.name,
 			       (unsigned long long)OCFS2_I(dir)->ip_blkno);
 
+	status = ocfs2_nfs_sync_lock(OCFS2_SB(dir->i_sb), 1);
+	if (status < 0) {
+		mlog(ML_ERROR, "getting nfs sync lock(EX) failed %d\n", status);
+		parent = ERR_PTR(status);
+		goto bail;
+	}
+
 	status = ocfs2_inode_lock(dir, NULL, 0);
 	if (status < 0) {
 		if (status != -ENOENT)
 			mlog_errno(status);
 		parent = ERR_PTR(status);
-		goto bail;
+		goto unlock_nfs_sync;
 	}
 
 	status = ocfs2_lookup_ino_from_name(dir, "..", 2, &blkno);
@@ -166,11 +174,31 @@ static struct dentry *ocfs2_get_parent(s
 		goto bail_unlock;
 	}
 
+	status = ocfs2_test_inode_bit(OCFS2_SB(dir->i_sb), blkno, &set);
+	if (status < 0) {
+		if (status == -EINVAL) {
+			status = -ESTALE;
+		} else
+			mlog(ML_ERROR, "test inode bit failed %d\n", status);
+		parent = ERR_PTR(status);
+		goto bail_unlock;
+	}
+
+	trace_ocfs2_get_dentry_test_bit(status, set);
+	if (!set) {
+		status = -ESTALE;
+		parent = ERR_PTR(status);
+		goto bail_unlock;
+	}
+
 	parent = d_obtain_alias(ocfs2_iget(OCFS2_SB(dir->i_sb), blkno, 0, 0));
 
 bail_unlock:
 	ocfs2_inode_unlock(dir, 0);
 
+unlock_nfs_sync:
+	ocfs2_nfs_sync_unlock(OCFS2_SB(dir->i_sb), 1);
+
 bail:
 	trace_ocfs2_get_parent_end(parent);
 
From: Andrea Arcangeli aarcange@redhat.com
commit c3f3ce049f7d97cc7ec9c01cb51d9ec74e0f37c2 upstream.
The task structure is freed while get_mem_cgroup_from_mm() holds rcu_read_lock() and dereferences mm->owner.
get_mem_cgroup_from_mm()                failing fork()
----                                    ---
task = mm->owner
                                        mm->owner = NULL;
                                        free(task)
if (task) *task; /* use after free */
The fix consists in freeing the task with RCU also in the fork failure case, exactly like it always happens for the regular exit(2) path. That is enough to make the rcu_read_lock hold in get_mem_cgroup_from_mm() (left side above) effective to avoid a use after free when dereferencing the task structure.
An alternative fix would be to defer delivery of the userfaultfd contexts to the monitor until after fork() is guaranteed to succeed. That would require more changes, because it would create a strict ordering dependency where the uffd methods would need to be called beyond the last potentially failing branch in order to be safe. This solution, by contrast, only adds a dependency in common code: set mm->owner to NULL and free the task struct that mm->owner pointed to with RCU if fork ends up failing. The userfaultfd methods can still be called anywhere during the fork runtime, and the monitor will keep discarding orphaned "mm" coming from failed forks in userland.
This race condition cannot trigger if CONFIG_MEMCG is set to =n at build time.
[aarcange@redhat.com: improve changelog, reduce #ifdefs per Michal]
Link: http://lkml.kernel.org/r/20190429035752.4508-1-aarcange@redhat.com
Link: http://lkml.kernel.org/r/20190325225636.11635-2-aarcange@redhat.com
Fixes: 893e26e61d04 ("userfaultfd: non-cooperative: Add fork() event")
Signed-off-by: Andrea Arcangeli aarcange@redhat.com
Tested-by: zhong jiang zhongjiang@huawei.com
Reported-by: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: Oleg Nesterov oleg@redhat.com
Cc: Jann Horn jannh@google.com
Cc: Hugh Dickins hughd@google.com
Cc: Mike Rapoport rppt@linux.vnet.ibm.com
Cc: Mike Kravetz mike.kravetz@oracle.com
Cc: Peter Xu peterx@redhat.com
Cc: Jason Gunthorpe jgg@mellanox.com
Cc: "Kirill A . Shutemov" kirill.shutemov@linux.intel.com
Cc: Michal Hocko mhocko@suse.com
Cc: zhong jiang zhongjiang@huawei.com
Cc: syzbot+cbb52e396df3e565ab02@syzkaller.appspotmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 kernel/fork.c | 31 +++++++++++++++++++++++++++++--
 1 file changed, 29 insertions(+), 2 deletions(-)

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -790,6 +790,15 @@ static void mm_init_aio(struct mm_struct
 #endif
 }
 
+static __always_inline void mm_clear_owner(struct mm_struct *mm,
+					   struct task_struct *p)
+{
+#ifdef CONFIG_MEMCG
+	if (mm->owner == p)
+		WRITE_ONCE(mm->owner, NULL);
+#endif
+}
+
 static void mm_init_owner(struct mm_struct *mm, struct task_struct *p)
 {
 #ifdef CONFIG_MEMCG
@@ -1211,6 +1220,7 @@ static struct mm_struct *dup_mm(struct t
 free_pt:
 	/* don't put binfmt in mmput, we haven't got module yet */
 	mm->binfmt = NULL;
+	mm_init_owner(mm, NULL);
 	mmput(mm);
 
 fail_nomem:
@@ -1528,6 +1538,21 @@ static inline void rcu_copy_process(stru
 #endif /* #ifdef CONFIG_TASKS_RCU */
 }
 
+static void __delayed_free_task(struct rcu_head *rhp)
+{
+	struct task_struct *tsk = container_of(rhp, struct task_struct, rcu);
+
+	free_task(tsk);
+}
+
+static __always_inline void delayed_free_task(struct task_struct *tsk)
+{
+	if (IS_ENABLED(CONFIG_MEMCG))
+		call_rcu(&tsk->rcu, __delayed_free_task);
+	else
+		free_task(tsk);
+}
+
 /*
  * This creates a new process as a copy of the old one,
  * but does not actually start it yet.
@@ -1960,8 +1985,10 @@ bad_fork_cleanup_io:
 bad_fork_cleanup_namespaces:
 	exit_task_namespaces(p);
 bad_fork_cleanup_mm:
-	if (p->mm)
+	if (p->mm) {
+		mm_clear_owner(p->mm, p);
 		mmput(p->mm);
+	}
 bad_fork_cleanup_signal:
 	if (!(clone_flags & CLONE_THREAD))
 		free_signal_struct(p->signal);
@@ -1992,7 +2019,7 @@ bad_fork_cleanup_count:
 bad_fork_free:
 	p->state = TASK_DEAD;
 	put_task_stack(p);
-	free_task(p);
+	delayed_free_task(p);
 fork_out:
 	return ERR_PTR(retval);
 }
From: Steve Twiss stwiss.opensource@diasemi.com
commit 6b4814a9451add06d457e198be418bf6a3e6a990 upstream.
There is a mismatch between the DA9063 and DA9063L datasheets provided by Dialog Semiconductor and the register names in the MFD registers file. The changes are for the OTP (one-time-programming) control registers. The two naming errors are OPT instead of OTP, and COUNT instead of CONT (i.e. control).
Cc: Stable stable@vger.kernel.org
Signed-off-by: Steve Twiss stwiss.opensource@diasemi.com
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/mfd/da9063/registers.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

--- a/include/linux/mfd/da9063/registers.h
+++ b/include/linux/mfd/da9063/registers.h
@@ -215,9 +215,9 @@
 
 /* DA9063 Configuration registers */
 /* OTP */
-#define DA9063_REG_OPT_COUNT 0x101
-#define DA9063_REG_OPT_ADDR 0x102
-#define DA9063_REG_OPT_DATA 0x103
+#define DA9063_REG_OTP_CONT 0x101
+#define DA9063_REG_OTP_ADDR 0x102
+#define DA9063_REG_OTP_DATA 0x103
 
 /* Customer Trim and Configuration */
 #define DA9063_REG_T_OFFSET 0x104
From: Dmitry Osipenko digetx@gmail.com
commit ea611d1cc180fbb56982c83cd5142a2b34881f5c upstream.
The FPS_PERIOD_MAX_US definitions for MAX20024 and MAX77620 are swapped. Fix them.
Cc: stable stable@vger.kernel.org
Signed-off-by: Dmitry Osipenko digetx@gmail.com
Signed-off-by: Lee Jones lee.jones@linaro.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 include/linux/mfd/max77620.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/mfd/max77620.h
+++ b/include/linux/mfd/max77620.h
@@ -136,8 +136,8 @@
 #define MAX77620_FPS_PERIOD_MIN_US 40
 #define MAX20024_FPS_PERIOD_MIN_US 20
 
-#define MAX77620_FPS_PERIOD_MAX_US 2560
-#define MAX20024_FPS_PERIOD_MAX_US 5120
+#define MAX20024_FPS_PERIOD_MAX_US 2560
+#define MAX77620_FPS_PERIOD_MAX_US 5120
 
 #define MAX77620_REG_FPS_GPIO1 0x54
 #define MAX77620_REG_FPS_GPIO2 0x55
From: Alexander Sverdlin alexander.sverdlin@nokia.com
commit 2b75ebeea6f4937d4d05ec4982c471cef9a29b7f upstream.
It was observed that reads crossing a 4K address boundary fail.
This limitation is mentioned in Intel documents:
Intel(R) 9 Series Chipset Family Platform Controller Hub (PCH) Datasheet:
"5.26.3 Flash Access Program Register Access: * Program Register Accesses are not allowed to cross a 4 KB boundary..."
Enhanced Serial Peripheral Interface (eSPI) Interface Base Specification (for Client and Server Platforms):
"5.1.4 Address For other memory transactions, the address may start or end at any byte boundary. However, the address and payload length combination must not cross the naturally aligned address boundary of the corresponding Maximum Payload Size. It must not cross a 4 KB address boundary."
Avoid this by splitting an operation crossing the boundary into two operations.
Fixes: 8afda8b26d01 ("spi-nor: Add support for Intel SPI serial flash controller")
Cc: stable@vger.kernel.org
Reported-by: Romain Porte romain.porte@nokia.com
Tested-by: Pascal Fabreges pascal.fabreges@nokia.com
Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com
Reviewed-by: Tudor Ambarus tudor.ambarus@microchip.com
Acked-by: Mika Westerberg mika.westerberg@linux.intel.com
Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/mtd/spi-nor/intel-spi.c | 8 ++++++++
 1 file changed, 8 insertions(+)

--- a/drivers/mtd/spi-nor/intel-spi.c
+++ b/drivers/mtd/spi-nor/intel-spi.c
@@ -503,6 +503,10 @@ static ssize_t intel_spi_read(struct spi
 	while (len > 0) {
 		block_size = min_t(size_t, len, INTEL_SPI_FIFO_SZ);
 
+		/* Read cannot cross 4K boundary */
+		block_size = min_t(loff_t, from + block_size,
+				   round_up(from + 1, SZ_4K)) - from;
+
 		writel(from, ispi->base + FADDR);
 
 		val = readl(ispi->base + HSFSTS_CTL);
@@ -553,6 +557,10 @@ static ssize_t intel_spi_write(struct sp
 	while (len > 0) {
 		block_size = min_t(size_t, len, INTEL_SPI_FIFO_SZ);
 
+		/* Write cannot cross 4K boundary */
+		block_size = min_t(loff_t, to + block_size,
+				   round_up(to + 1, SZ_4K)) - to;
+
 		writel(to, ispi->base + FADDR);
 
 		val = readl(ispi->base + HSFSTS_CTL);
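The clamping arithmetic added above can be checked in isolation. The sketch below is a userspace model under stated assumptions: FIFO_SZ stands in for INTEL_SPI_FIFO_SZ and is assumed to be 64 bytes, and clamp_block is a hypothetical name. It limits a chunk to the FIFO size and then to the next 4K boundary after the start offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIFO_SZ	64u	/* assumed to match INTEL_SPI_FIFO_SZ */
#define SZ_4K	0x1000u

/* Size of the next chunk starting at 'from', with 'len' bytes left. */
static size_t clamp_block(uint64_t from, size_t len)
{
	size_t block_size;
	uint64_t end, boundary;

	assert(len > 0);

	/* First limit: never more than the FIFO holds. */
	block_size = len < FIFO_SZ ? len : FIFO_SZ;

	/* Same result as round_up(from + 1, SZ_4K): next boundary above 'from'. */
	boundary = (from | (SZ_4K - 1)) + 1;

	/* Second limit: stop at the boundary rather than crossing it. */
	end = from + block_size;
	if (end > boundary)
		end = boundary;

	return (size_t)(end - from);
}
```

For example, a 64-byte request at offset 0xFFE is cut to 2 bytes, and the loop in the driver then issues the remainder starting at 0x1000 as a second operation.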
From: Yifeng Li tomli@tomli.me
commit 75ddbc1fb11efac87b611d48e9802f6fe2bb2163 upstream.
Previously, in the userspace, it was possible to use the "setterm" command from util-linux to blank the VT console by default, using the following command.
According to the man page,
The force option keeps the screen blank even if a key is pressed.
It was implemented by calling TIOCL_BLANKSCREEN.
	case BLANKSCREEN:
		ioctlarg = TIOCL_BLANKSCREEN;
		if (ioctl(STDIN_FILENO, TIOCLINUX, &ioctlarg))
			warn(_("cannot force blank"));
		break;
However, after Linux 4.12, this command stopped working, which is unexpected. Inspecting the kernel source shows that the issue was triggered by a side effect of commit a4199f5eb809 ("tty: Disable default console blanking interval").
The console blanking is implemented by the function do_blank_screen() in vt.c: "blank_state" will be initialized to "blank_normal_wait" in con_init() if AND ONLY IF ("blankinterval" > 0). If "blankinterval" is 0, "blank_state" will be "blank_off" (== 0) and a call to do_blank_screen() will always abort: even if the user requests forced blanking by calling TIOCL_BLANKSCREEN, the console won't be blanked.
This behavior is unexpected from a user's point of view, since it is not mentioned in any documentation. The setterm man page suggests it will always work, and the kernel comment in uapi/linux/tiocl.h says
/* keep screen blank even if a key is pressed */
#define TIOCL_BLANKSCREEN 14
To fix it, we simply remove the "blank_state != blank_normal_wait" check from do_blank_screen(). As pointed out by Nicolas Pitre, this check doesn't logically make sense and is safe to remove.
Suggested-by: Nicolas Pitre nicolas.pitre@linaro.org
Fixes: a4199f5eb809 ("tty: Disable default console blanking interval")
Signed-off-by: Yifeng Li tomli@tomli.me
Cc: stable stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 drivers/tty/vt/vt.c | 2 --
 1 file changed, 2 deletions(-)

--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -3840,8 +3840,6 @@ void do_blank_screen(int entering_gfx)
 		return;
 	}
 
-	if (blank_state != blank_normal_wait)
-		return;
 	blank_state = blank_off;
 
 	/* don't blank graphics */
From: Sergei Trofimovich slyfox@gentoo.org
commit 46ca3f735f345c9d87383dd3a09fa5d43870770e upstream.
The bug manifests as an attempt to access deallocated memory:
BUG: unable to handle kernel paging request at ffff9c8735448000
#PF error: [PROT] [WRITE]
PGD 288a05067 P4D 288a05067 PUD 288a07067 PMD 7f60c2063 PTE 80000007f5448161
Oops: 0003 [#1] PREEMPT SMP
CPU: 6 PID: 388 Comm: loadkeys Tainted: G C 5.0.0-rc6-00153-g5ded5871030e #91
Hardware name: Gigabyte Technology Co., Ltd. To be filled by O.E.M./H77M-D3H, BIOS F12 11/14/2013
RIP: 0010:__memmove+0x81/0x1a0
Code: 4c 89 4f 10 4c 89 47 18 48 8d 7f 20 73 d4 48 83 c2 20 e9 a2 00 00 00 66 90 48 89 d1 4c 8b 5c 16 f8 4c 8d 54 17 f8 48 c1 e9 03 <f3> 48 a5 4d 89 1a e9 0c 01 00 00 0f 1f 40 00 48 89 d1 4c 8b 1e 49
RSP: 0018:ffffa1b9002d7d08 EFLAGS: 00010203
RAX: ffff9c873541af43 RBX: ffff9c873541af43 RCX: 00000c6f105cd6bf
RDX: 0000637882e986b6 RSI: ffff9c8735447ffb RDI: ffff9c8735447ffb
RBP: ffff9c8739cd3800 R08: ffff9c873b802f00 R09: 00000000fffff73b
R10: ffffffffb82b35f1 R11: 00505b1b004d5b1b R12: 0000000000000000
R13: ffff9c873541af3d R14: 000000000000000b R15: 000000000000000c
FS: 00007f450c390580(0000) GS:ffff9c873f180000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffff9c8735448000 CR3: 00000007e213c002 CR4: 00000000000606e0
Call Trace:
 vt_do_kdgkb_ioctl+0x34d/0x440
 vt_ioctl+0xba3/0x1190
 ? __bpf_prog_run32+0x39/0x60
 ? mem_cgroup_commit_charge+0x7b/0x4e0
 tty_ioctl+0x23f/0x920
 ? preempt_count_sub+0x98/0xe0
 ? __seccomp_filter+0x67/0x600
 do_vfs_ioctl+0xa2/0x6a0
 ? syscall_trace_enter+0x192/0x2d0
 ksys_ioctl+0x3a/0x70
 __x64_sys_ioctl+0x16/0x20
 do_syscall_64+0x54/0xe0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
The bug manifests on systemd systems with multiple vtcon devices:
  # cat /sys/devices/virtual/vtconsole/vtcon0/name
  (S) dummy device
  # cat /sys/devices/virtual/vtconsole/vtcon1/name
  (M) frame buffer device
There systemd runs the 'loadkeys' tool in parallel for each vtcon instance. This causes two parallel ioctl(KDSKBSENT) calls to race when adding the same entry to the 'func_table' array at:
drivers/tty/vt/keyboard.c:vt_do_kdgkb_ioctl()
The function has no locking around writes to 'func_table'.
The simplest reproducer is to have an initramfs with the following init on an 8-CPU x86_64 machine:
#!/bin/sh
loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 &
loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & loadkeys -q windowkeys ru4 & wait
The change adds a lock on the write path only. Reads are still racy.
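The write-path pattern described above (take the lock, and if a bigger buffer is needed, drop the lock, allocate, then retake the lock and re-check) can be sketched in userspace roughly as follows. This is only an illustration under assumptions: buf_lock, buf, buf_size, buf_used and buf_append() are hypothetical names, and a pthread mutex stands in for the kernel spinlock used by the patch.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical shared state; stands in for 'func_buf' and friends. */
static pthread_mutex_t buf_lock = PTHREAD_MUTEX_INITIALIZER;
static char *buf;        /* current backing storage */
static size_t buf_size;  /* bytes allocated */
static size_t buf_used;  /* bytes in use */

/* Append 's' (including its NUL) to the shared buffer; 0 on success, -1 on OOM. */
static int buf_append(const char *s)
{
	size_t need = strlen(s) + 1;
	char *fresh = NULL;      /* replacement buffer, allocated outside the lock */
	size_t fresh_size = 0;

again:
	pthread_mutex_lock(&buf_lock);
	if (buf_used + need > buf_size) {
		size_t want = buf_size ? buf_size : 16;

		while (want < buf_used + need)
			want <<= 1;
		if (fresh_size != want) {
			/* Mirror the patch: drop the lock, allocate, then retry. */
			pthread_mutex_unlock(&buf_lock);
			free(fresh);
			fresh = malloc(want);
			fresh_size = want;
			if (!fresh)
				return -1;
			goto again;
		}
		/* Sizes were re-read under the lock, so this copy is consistent. */
		if (buf_used)
			memcpy(fresh, buf, buf_used);
		free(buf);
		buf = fresh;
		buf_size = fresh_size;
		fresh = NULL;
	}
	memcpy(buf + buf_used, s, need);
	buf_used += need;
	pthread_mutex_unlock(&buf_lock);
	free(fresh);  /* only non-NULL if another writer grew the buffer first */
	return 0;
}
```

The retry matters because another writer may change the sizes while the lock is dropped, so the required size is recomputed each time the lock is retaken.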
CC: Greg Kroah-Hartman gregkh@linuxfoundation.org CC: Jiri Slaby jslaby@suse.com Link: https://lkml.org/lkml/2019/2/17/256 Signed-off-by: Sergei Trofimovich slyfox@gentoo.org Cc: stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/tty/vt/keyboard.c | 33 +++++++++++++++++++++++++++------ 1 file changed, 27 insertions(+), 6 deletions(-)
--- a/drivers/tty/vt/keyboard.c +++ b/drivers/tty/vt/keyboard.c @@ -122,6 +122,7 @@ static const int NR_TYPES = ARRAY_SIZE(m static struct input_handler kbd_handler; static DEFINE_SPINLOCK(kbd_event_lock); static DEFINE_SPINLOCK(led_lock); +static DEFINE_SPINLOCK(func_buf_lock); /* guard 'func_buf' and friends */ static unsigned long key_down[BITS_TO_LONGS(KEY_CNT)]; /* keyboard key bitmap */ static unsigned char shift_down[NR_SHIFT]; /* shift state counters.. */ static bool dead_key_next; @@ -1959,11 +1960,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb char *p; u_char *q; u_char __user *up; - int sz; + int sz, fnw_sz; int delta; char *first_free, *fj, *fnw; int i, j, k; int ret; + unsigned long flags;
if (!capable(CAP_SYS_TTY_CONFIG)) perm = 0; @@ -2006,7 +2008,14 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb goto reterr; }
+ fnw = NULL; + fnw_sz = 0; + /* race aginst other writers */ + again: + spin_lock_irqsave(&func_buf_lock, flags); q = func_table[i]; + + /* fj pointer to next entry after 'q' */ first_free = funcbufptr + (funcbufsize - funcbufleft); for (j = i+1; j < MAX_NR_FUNC && !func_table[j]; j++) ; @@ -2014,10 +2023,12 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb fj = func_table[j]; else fj = first_free; - + /* buffer usage increase by new entry */ delta = (q ? -strlen(q) : 1) + strlen(kbs->kb_string); + if (delta <= funcbufleft) { /* it fits in current buf */ if (j < MAX_NR_FUNC) { + /* make enough space for new entry at 'fj' */ memmove(fj + delta, fj, first_free - fj); for (k = j; k < MAX_NR_FUNC; k++) if (func_table[k]) @@ -2030,20 +2041,28 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb sz = 256; while (sz < funcbufsize - funcbufleft + delta) sz <<= 1; - fnw = kmalloc(sz, GFP_KERNEL); - if(!fnw) { - ret = -ENOMEM; - goto reterr; + if (fnw_sz != sz) { + spin_unlock_irqrestore(&func_buf_lock, flags); + kfree(fnw); + fnw = kmalloc(sz, GFP_KERNEL); + fnw_sz = sz; + if (!fnw) { + ret = -ENOMEM; + goto reterr; + } + goto again; }
if (!q) func_table[i] = fj; + /* copy data before insertion point to new location */ if (fj > funcbufptr) memmove(fnw, funcbufptr, fj - funcbufptr); for (k = 0; k < j; k++) if (func_table[k]) func_table[k] = fnw + (func_table[k] - funcbufptr);
+ /* copy data after insertion point to new location */ if (first_free > fj) { memmove(fnw + (fj - funcbufptr) + delta, fj, first_free - fj); for (k = j; k < MAX_NR_FUNC; k++) @@ -2056,7 +2075,9 @@ int vt_do_kdgkb_ioctl(int cmd, struct kb funcbufleft = funcbufleft - delta + sz - funcbufsize; funcbufsize = sz; } + /* finally insert item itself */ strcpy(func_table[i], kbs->kb_string); + spin_unlock_irqrestore(&func_buf_lock, flags); break; } ret = 0;
From: Jiufei Xue jiufei.xue@linux.alibaba.com
commit 742b06b5628f2cd23cb51a034cb54dc33c6162c5 upstream.
We hit a BUG at fs/buffer.c:3057 if we detached the nbd device before unmounting ext4 filesystem.
The typical chain of events leading to the BUG: jbd2_write_superblock submit_bh submit_bh_wbc BUG_ON(!buffer_mapped(bh));
The block device is removed and all the pages are invalidated. JBD2 was trying to write journal superblock to the block device which is no longer present.
Fix this by checking the journal superblock's buffer head prior to submitting.
Reported-by: Eric Ren renzhen@linux.alibaba.com Signed-off-by: Jiufei Xue jiufei.xue@linux.alibaba.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/journal.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/jbd2/journal.c +++ b/fs/jbd2/journal.c @@ -1353,6 +1353,10 @@ static int jbd2_write_superblock(journal journal_superblock_t *sb = journal->j_superblock; int ret;
+ /* Buffer got discarded which means block device got invalidated */ + if (!buffer_mapped(bh)) + return -EIO; + trace_jbd2_write_superblock(journal, write_flags); if (!(journal->j_flags & JBD2_BARRIER)) write_flags &= ~(REQ_FUA | REQ_PREFLUSH);
From: Jan Kara jack@suse.cz
commit 31562b954b60f02acb91b7349dc6432d3f8c3c5f upstream.
The sanity check in mb_find_extent() only checked that returned extent does not extend past blocksize * 8, however it should not extend past EXT4_CLUSTERS_PER_GROUP(sb). This can happen when clusters_per_group < blocksize * 8 and the tail of the bitmap is not properly filled by 1s which happened e.g. when ancient kernels have grown the filesystem.
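As a rough illustration of the difference between the two bounds, the sketch below compares the old check (bitmap block capacity) with the new one (clusters actually in the group). The helper names and the sample numbers are assumptions made for this example, not values taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Old bound: the extent must fit in one bitmap block (blocksize * 8 bits). */
static bool fits_bitmap(unsigned start, unsigned len, unsigned blkbits)
{
	return start + len <= (1u << (blkbits + 3));
}

/* New bound: the extent must fit in the clusters the group really has. */
static bool fits_group(unsigned start, unsigned len, unsigned clusters_per_group)
{
	return start + len <= clusters_per_group;
}
```

With a 4096-byte block (blkbits = 12) the bitmap holds 32768 bits, so an extent ending at cluster 2100 passes the old bound even if the group only has, say, 2048 clusters; the new bound rejects it.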
Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/mballoc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -1555,7 +1555,7 @@ static int mb_find_extent(struct ext4_bu ex->fe_len += 1 << order; }
- if (ex->fe_start + ex->fe_len > (1 << (e4b->bd_blkbits + 3))) { + if (ex->fe_start + ex->fe_len > EXT4_CLUSTERS_PER_GROUP(e4b->bd_sb)) { /* Should never happen! (but apparently sometimes does?!?) */ WARN_ON(1); ext4_error(e4b->bd_sb, "corruption or bug in mb_find_extent "
From: Theodore Ts'o tytso@mit.edu
commit e5d01196c0428a206f307e9ee5f6842964098ff0 upstream.
In other places in fs/ext4/xattr.c, if e_value_inum is non-zero, the code ignores the value in e_value_offs. The e_value_offs *should* be zero, but we shouldn't depend upon it, since it might not be true in a corrupted/fuzzed file system.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202897 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=202877 Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/xattr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1698,7 +1698,7 @@ static int ext4_xattr_set_entry(struct e
/* No failures allowed past this point. */
- if (!s->not_found && here->e_value_size && here->e_value_offs) { + if (!s->not_found && here->e_value_size && !here->e_value_inum) { /* Remove the old value. */ void *first_val = s->base + min_offs; size_t offs = le16_to_cpu(here->e_value_offs);
From: Pan Bian bianpan2016@163.com
commit 8c380ab4b7b59c0c602743810be1b712514eaebc upstream.
The reference to iloc.bh has been dropped in ext4_mark_iloc_dirty. However, the reference is dropped again if error occurs during ext4_handle_dirty_metadata, which may result in use-after-free bugs.
Fixes: fb265c9cb49e("ext4: add ext4_sb_bread() to disambiguate ENOMEM cases") Signed-off-by: Pan Bian bianpan2016@163.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/resize.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -849,6 +849,7 @@ static int add_new_gdb(handle_t *handle, err = ext4_handle_dirty_metadata(handle, NULL, gdb_bh); if (unlikely(err)) { ext4_std_error(sb, err); + iloc.bh = NULL; goto errout; } brelse(dind);
From: Filipe Manana fdmanana@suse.com
commit 03628cdbc64db6262e50d0357960a4e9562676a1 upstream.
During fiemap, for regular extents (non inline) we need to check if they are shared and if they are, set the shared bit. Checking if an extent is shared requires checking the delayed references of the currently running transaction, since some reference might have not yet hit the extent tree and be only in the in-memory delayed references.
However we were using a transaction join for this, which creates a new transaction when there is no transaction currently running. That means that two more potential failures can happen: creating the transaction and committing it. Further, if no write activity is currently happening in the system, and fiemap calls keep being done, we end up creating and committing transactions that do nothing.
In some extreme cases this can result in the commit of the transaction created by fiemap to fail with ENOSPC when updating the root item of a subvolume tree because a join does not reserve any space, leading to a trace like the following:
heisenberg kernel: ------------[ cut here ]------------
heisenberg kernel: BTRFS: Transaction aborted (error -28)
heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
heisenberg kernel: Call Trace:
heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs]
heisenberg kernel: ? _cond_resched+0x15/0x30
heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs]
heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs]
heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs]
heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
heisenberg kernel: kthread+0x112/0x130
heisenberg kernel: ? kthread_bind+0x30/0x30
heisenberg kernel: ret_from_fork+0x35/0x40
heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
Since fiemap (and btrfs_check_shared()) is a read-only operation, do not do a transaction join to avoid the overhead of creating a new transaction (if there is currently no running transaction) and introducing a potential point of failure when the new transaction gets committed, instead use a transaction attach to grab a handle for the currently running transaction if any.
Reported-by: Christoph Anton Mitterer calestyo@scientia.net Link: https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4... Fixes: afce772e87c36c ("btrfs: fix check_shared for fiemap ioctl") CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/backref.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
--- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1452,8 +1452,8 @@ int btrfs_find_all_roots(struct btrfs_tr * callers (such as fiemap) which want to know whether the extent is * shared but do not need a ref count. * - * This attempts to allocate a transaction in order to account for - * delayed refs, but continues on even when the alloc fails. + * This attempts to attach to the running transaction in order to account for + * delayed refs, but continues on even when no running transaction exists. * * Return: 0 if extent is not shared, 1 if it is shared, < 0 on error. */ @@ -1476,13 +1476,16 @@ int btrfs_check_shared(struct btrfs_root tmp = ulist_alloc(GFP_NOFS); roots = ulist_alloc(GFP_NOFS); if (!tmp || !roots) { - ulist_free(tmp); - ulist_free(roots); - return -ENOMEM; + ret = -ENOMEM; + goto out; }
- trans = btrfs_join_transaction(root); + trans = btrfs_attach_transaction(root); if (IS_ERR(trans)) { + if (PTR_ERR(trans) != -ENOENT && PTR_ERR(trans) != -EROFS) { + ret = PTR_ERR(trans); + goto out; + } trans = NULL; down_read(&fs_info->commit_root_sem); } else { @@ -1515,6 +1518,7 @@ int btrfs_check_shared(struct btrfs_root } else { up_read(&fs_info->commit_root_sem); } +out: ulist_free(tmp); ulist_free(roots); return ret;
From: Filipe Manana fdmanana@suse.com
commit bfc61c36260ca990937539cd648ede3cd749bc10 upstream.
When finding out which inodes have references on a particular extent, done by backref.c:iterate_extent_inodes(), from the BTRFS_IOC_LOGICAL_INO (both v1 and v2) ioctl and from scrub we use the transaction join API to grab a reference on the currently running transaction, since in order to give accurate results we need to inspect the delayed references of the currently running transaction.
However, if there is currently no running transaction, the join operation will create a new transaction. This is inefficient as the transaction will eventually be committed, doing unnecessary IO and introducing a potential point of failure that will lead to a transaction abort due to -ENOSPC, as recently reported [1].
That's because the join creates the transaction but does not reserve any space, so when attempting to update the root item of the root passed to btrfs_join_transaction(), during the transaction commit, we can end up failing with -ENOSPC. Users of a join operation are supposed to actually do some filesystem changes and reserve space by some means, which is not the case for iterate_extent_inodes(): it is a read-only operation in all contexts from which it is called.
The reported [1] -ENOSPC failure stack trace is the following:
heisenberg kernel: ------------[ cut here ]------------
heisenberg kernel: BTRFS: Transaction aborted (error -28)
heisenberg kernel: WARNING: CPU: 0 PID: 7137 at fs/btrfs/root-tree.c:136 btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: CPU: 0 PID: 7137 Comm: btrfs-transacti Not tainted 4.19.0-4-amd64 #1 Debian 4.19.28-2
heisenberg kernel: Hardware name: FUJITSU LIFEBOOK U757/FJNB2A5, BIOS Version 1.21 03/19/2018
heisenberg kernel: RIP: 0010:btrfs_update_root+0x22b/0x320 [btrfs]
(...)
heisenberg kernel: RSP: 0018:ffffb5448828bd40 EFLAGS: 00010286
heisenberg kernel: RAX: 0000000000000000 RBX: ffff8ed56bccef50 RCX: 0000000000000006
heisenberg kernel: RDX: 0000000000000007 RSI: 0000000000000092 RDI: ffff8ed6bda166a0
heisenberg kernel: RBP: 00000000ffffffe4 R08: 00000000000003df R09: 0000000000000007
heisenberg kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff8ed63396a078
heisenberg kernel: R13: ffff8ed092d7c800 R14: ffff8ed64f5db028 R15: ffff8ed6bd03d068
heisenberg kernel: FS: 0000000000000000(0000) GS:ffff8ed6bda00000(0000) knlGS:0000000000000000
heisenberg kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
heisenberg kernel: CR2: 00007f46f75f8000 CR3: 0000000310a0a002 CR4: 00000000003606f0
heisenberg kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
heisenberg kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
heisenberg kernel: Call Trace:
heisenberg kernel: commit_fs_roots+0x166/0x1d0 [btrfs]
heisenberg kernel: ? _cond_resched+0x15/0x30
heisenberg kernel: ? btrfs_run_delayed_refs+0xac/0x180 [btrfs]
heisenberg kernel: btrfs_commit_transaction+0x2bd/0x870 [btrfs]
heisenberg kernel: ? start_transaction+0x9d/0x3f0 [btrfs]
heisenberg kernel: transaction_kthread+0x147/0x180 [btrfs]
heisenberg kernel: ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
heisenberg kernel: kthread+0x112/0x130
heisenberg kernel: ? kthread_bind+0x30/0x30
heisenberg kernel: ret_from_fork+0x35/0x40
heisenberg kernel: ---[ end trace 05de912e30e012d9 ]---
So fix that by using the attach API, which does not create a transaction when there is currently no running transaction.
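The error handling that goes with an attach can be modelled outside the kernel as below. This is a hypothetical sketch: struct txn and resolve_attach() are invented for illustration and only mirror the rule visible in the diff, namely that -ENOENT and -EROFS mean "no running transaction, fall back to the commit root" while any other error is returned to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct txn { int id; };  /* hypothetical stand-in for a transaction handle */

/*
 * Model of handling an attach result. 'attach_err' is 0 on success or a
 * negative errno. On return, *out is the handle to use, or NULL when the
 * caller should read the committed state instead.
 */
static int resolve_attach(struct txn *running, int attach_err, struct txn **out)
{
	*out = NULL;
	if (attach_err == 0) {
		*out = running;
		return 0;
	}
	if (attach_err == -ENOENT || attach_err == -EROFS)
		return 0;       /* nothing to attach to: not an error */
	return attach_err;      /* a real failure, e.g. -ENOMEM */
}
```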
[1] https://lore.kernel.org/linux-btrfs/b2a668d7124f1d3e410367f587926f622b3f03a4...
Reported-by: Zygo Blaxell ce3g8jdj@umail.furryterror.org CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/backref.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -1907,13 +1907,19 @@ int iterate_extent_inodes(struct btrfs_f extent_item_objectid);
if (!search_commit_root) { - trans = btrfs_join_transaction(fs_info->extent_root); - if (IS_ERR(trans)) - return PTR_ERR(trans); + trans = btrfs_attach_transaction(fs_info->extent_root); + if (IS_ERR(trans)) { + if (PTR_ERR(trans) != -ENOENT && + PTR_ERR(trans) != -EROFS) + return PTR_ERR(trans); + trans = NULL; + } + } + + if (trans) btrfs_get_tree_mod_seq(fs_info, &tree_mod_seq_elem); - } else { + else down_read(&fs_info->commit_root_sem); - }
ret = btrfs_find_all_leafs(trans, fs_info, extent_item_objectid, tree_mod_seq_elem.seq, &refs, @@ -1945,7 +1951,7 @@ int iterate_extent_inodes(struct btrfs_f
free_leaf_list(refs); out: - if (!search_commit_root) { + if (trans) { btrfs_put_tree_mod_seq(fs_info, &tree_mod_seq_elem); btrfs_end_transaction(trans); } else {
From: Liang Chen liangchen.linux@gmail.com
commit a4b732a248d12cbdb46999daf0bf288c011335eb upstream.
There is a race between cache device register and cache set unregister. For an already registered cache device, register_bcache will call bch_is_open to iterate through all cachesets and check every cache there. The race occurs if cache_set_free executes at the same time and clears the caches right before ca is dereferenced in bch_is_open_cache. To close the race, let's make sure the clean up work is protected by the bch_register_lock as well.
This issue can be reproduced as follows,
while true; do echo /dev/XXX> /sys/fs/bcache/register ; done&
while true; do echo 1> /sys/block/XXX/bcache/set/unregister ; done &
and results in the following oops,
[ +0.000053] BUG: unable to handle kernel NULL pointer dereference at 0000000000000998 [ +0.000457] #PF error: [normal kernel read fault] [ +0.000464] PGD 800000003ca9d067 P4D 800000003ca9d067 PUD 3ca9c067 PMD 0 [ +0.000388] Oops: 0000 [#1] SMP PTI [ +0.000269] CPU: 1 PID: 3266 Comm: bash Not tainted 5.0.0+ #6 [ +0.000346] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.fc28 04/01/2014 [ +0.000472] RIP: 0010:register_bcache+0x1829/0x1990 [bcache] [ +0.000344] Code: b0 48 83 e8 50 48 81 fa e0 e1 10 c0 0f 84 a9 00 00 00 48 89 c6 48 89 ca 0f b7 ba 54 04 00 00 4c 8b 82 60 0c 00 00 85 ff 74 2f <49> 3b a8 98 09 00 00 74 4e 44 8d 47 ff 31 ff 49 c1 e0 03 eb 0d [ +0.000839] RSP: 0018:ffff92ee804cbd88 EFLAGS: 00010202 [ +0.000328] RAX: ffffffffc010e190 RBX: ffff918b5c6b5000 RCX: ffff918b7d8e0000 [ +0.000399] RDX: ffff918b7d8e0000 RSI: ffffffffc010e190 RDI: 0000000000000001 [ +0.000398] RBP: ffff918b7d318340 R08: 0000000000000000 R09: ffffffffb9bd2d7a [ +0.000385] R10: ffff918b7eb253c0 R11: ffffb95980f51200 R12: ffffffffc010e1a0 [ +0.000411] R13: fffffffffffffff2 R14: 000000000000000b R15: ffff918b7e232620 [ +0.000384] FS: 00007f955bec2740(0000) GS:ffff918b7eb00000(0000) knlGS:0000000000000000 [ +0.000420] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ +0.000801] CR2: 0000000000000998 CR3: 000000003cad6000 CR4: 00000000001406e0 [ +0.000837] Call Trace: [ +0.000682] ? _cond_resched+0x10/0x20 [ +0.000691] ? __kmalloc+0x131/0x1b0 [ +0.000710] kernfs_fop_write+0xfa/0x170 [ +0.000733] __vfs_write+0x2e/0x190 [ +0.000688] ? inode_security+0x10/0x30 [ +0.000698] ? selinux_file_permission+0xd2/0x120 [ +0.000752] ? security_file_permission+0x2b/0x100 [ +0.000753] vfs_write+0xa8/0x1a0 [ +0.000676] ksys_write+0x4d/0xb0 [ +0.000699] do_syscall_64+0x3a/0xf0 [ +0.000692] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Signed-off-by: Liang Chen liangchen.linux@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Coly Li colyli@suse.de Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/bcache/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/md/bcache/super.c +++ b/drivers/md/bcache/super.c @@ -1357,6 +1357,7 @@ static void cache_set_free(struct closur bch_btree_cache_free(c); bch_journal_free(c);
+ mutex_lock(&bch_register_lock); for_each_cache(ca, c, i) if (ca) { ca->set = NULL; @@ -1379,7 +1380,6 @@ static void cache_set_free(struct closur mempool_destroy(c->search); kfree(c->devices);
- mutex_lock(&bch_register_lock); list_del(&c->list); mutex_unlock(&bch_register_lock);
From: Coly Li colyli@suse.de
commit 1bee2addc0c8470c8aaa65ef0599eeae96dd88bc upstream.
In journal_reclaim() ja->cur_idx of each cache will be updated to reclaim available journal buckets. Variable 'int n' is used to count how many caches are successfully reclaimed, then n is set to c->journal.key by SET_KEY_PTRS(). Later in journal_write_unlocked(), a for_each_cache() loop will write the jset data onto each cache.
The problem is, if all journal buckets on each cache are full, the following code in journal_reclaim(),
529 for_each_cache(ca, c, iter) {
530         struct journal_device *ja = &ca->journal;
531         unsigned int next = (ja->cur_idx + 1) % ca->sb.njournal_buckets;
532
533         /* No space available on this device */
534         if (next == ja->discard_idx)
535                 continue;
536
537         ja->cur_idx = next;
538         k->ptr[n++] = MAKE_PTR(0,
539                           bucket_to_sector(c, ca->sb.d[ja->cur_idx]),
540                           ca->sb.nr_this_dev);
541 }
542
543 bkey_init(k);
544 SET_KEY_PTRS(k, n);
If there is no available bucket to reclaim, the if() condition at line 534 will always be true, and n remains 0. Then at line 544, SET_KEY_PTRS() will set the KEY_PTRS field of c->journal.key to 0.
Setting the KEY_PTRS field of c->journal.key to 0 is wrong, because in journal_write_unlocked() the journal data is written in the following loop,
649 for (i = 0; i < KEY_PTRS(k); i++) {
650-671         submit journal data to cache device
672 }
If the KEY_PTRS field is set to 0 in journal_reclaim(), the journal data won't be written to the cache device here. If the system crashes or reboots before the bkeys of the lost journal entries are written into btree nodes, data corruption will be reported during bcache reload after rebooting the system.
In fact there is only one cache in a cache set, so there is no need to set the KEY_PTRS field in journal_reclaim() at all. But in order to keep the for_each_cache() logic consistent for now, this patch fixes the above problem by not setting the KEY_PTRS field of the journal key to 0 if there is no bucket available to reclaim.
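The effect of the fix can be shown with a small userspace model. The types and names here (struct jdev, struct jkey, reclaim(), NCACHES) are hypothetical simplifications of the bcache structures; the point is only that the pointer count of the key is left alone when nothing was reclaimed.

```c
#include <assert.h>

#define NCACHES 2  /* hypothetical; the commit notes real cache sets have one */

struct jdev { unsigned cur_idx, discard_idx, nbuckets; };
struct jkey { unsigned nptrs; unsigned ptr[NCACHES]; };

/* Advance every device that has a free bucket; return how many advanced. */
static unsigned reclaim(struct jdev *devs, struct jkey *k)
{
	unsigned n = 0;
	unsigned i;

	for (i = 0; i < NCACHES; i++) {
		unsigned next = (devs[i].cur_idx + 1) % devs[i].nbuckets;

		if (next == devs[i].discard_idx)
			continue;       /* no space available on this device */
		devs[i].cur_idx = next;
		k->ptr[n++] = next;
	}
	/* Only update the pointer count when something was reclaimed. */
	if (n)
		k->nptrs = n;
	return n;
}
```

When every device is full the key keeps its previous pointer count, so a later write loop bounded by that count still has a target.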
Signed-off-by: Coly Li colyli@suse.de Reviewed-by: Hannes Reinecke hare@suse.com Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/md/bcache/journal.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/md/bcache/journal.c +++ b/drivers/md/bcache/journal.c @@ -512,11 +512,11 @@ static void journal_reclaim(struct cache ca->sb.nr_this_dev); }
- bkey_init(k); - SET_KEY_PTRS(k, n); - - if (n) + if (n) { + bkey_init(k); + SET_KEY_PTRS(k, n); c->journal.blocks_free = c->sb.bucket_size >> c->block_bits; + } out: if (!journal_full(&c->journal)) __closure_wake_up(&c->journal.wait); @@ -641,6 +641,9 @@ static void journal_write_unlocked(struc ca->journal.seq[ca->journal.cur_idx] = w->data->seq; }
+ /* If KEY_PTRS(k) == 0, this jset gets lost in air */ + BUG_ON(i == 0); + atomic_dec_bug(&fifo_back(&c->journal.pin)); bch_journal_next(&c->journal); journal_reclaim(c);
From: Barret Rhoden brho@google.com
commit 7bc04c5c2cc467c5b40f2b03ba08da174a0d5fa7 upstream.
When remounting with debug_want_extra_isize, we were not performing the same checks that we do during a normal mount. That allowed us to set a value for s_want_extra_isize that reached outside the s_inode_size.
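A simplified model of the final bounds check may help. It covers only the "does the requested extra size fit in the inode" test, not the defaulting logic, and the two constants are assumptions made for this sketch (128 for the old inode size, 32 for the default extra size) rather than values quoted from the patch.

```c
#include <assert.h>

#define GOOD_OLD_INODE_SIZE 128u  /* assumed value of EXT4_GOOD_OLD_INODE_SIZE */
#define DEFAULT_EXTRA        32u  /* assumed sizeof(struct ext4_inode) - 128 */

/* Return the extra inode size to use for a requested 'want'. */
static unsigned clamp_want_extra_isize(unsigned inode_size, unsigned want)
{
	/* A request that reaches past the on-disk inode falls back to the default. */
	if (GOOD_OLD_INODE_SIZE + want > inode_size)
		return DEFAULT_EXTRA;
	return want;
}
```

Running the same helper on both the mount and the remount path is what keeps an oversized value supplied at remount time from being accepted.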
Fixes: e2b911c53584 ("ext4: clean up feature test macros with predicate functions") Reported-by: syzbot+f584efa0ac7213c226b7@syzkaller.appspotmail.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Barret Rhoden brho@google.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 58 ++++++++++++++++++++++++++++++++------------------------ 1 file changed, 34 insertions(+), 24 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3454,6 +3454,37 @@ int ext4_calculate_overhead(struct super return 0; }
+static void ext4_clamp_want_extra_isize(struct super_block *sb) +{ + struct ext4_sb_info *sbi = EXT4_SB(sb); + struct ext4_super_block *es = sbi->s_es; + + /* determine the minimum size of new large inodes, if present */ + if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE && + sbi->s_want_extra_isize == 0) { + sbi->s_want_extra_isize = sizeof(struct ext4_inode) - + EXT4_GOOD_OLD_INODE_SIZE; + if (ext4_has_feature_extra_isize(sb)) { + if (sbi->s_want_extra_isize < + le16_to_cpu(es->s_want_extra_isize)) + sbi->s_want_extra_isize = + le16_to_cpu(es->s_want_extra_isize); + if (sbi->s_want_extra_isize < + le16_to_cpu(es->s_min_extra_isize)) + sbi->s_want_extra_isize = + le16_to_cpu(es->s_min_extra_isize); + } + } + /* Check if enough inode space is available */ + if (EXT4_GOOD_OLD_INODE_SIZE + sbi->s_want_extra_isize > + sbi->s_inode_size) { + sbi->s_want_extra_isize = sizeof(struct ext4_inode) - + EXT4_GOOD_OLD_INODE_SIZE; + ext4_msg(sb, KERN_INFO, + "required extra inode space not available"); + } +} + static void ext4_set_resv_clusters(struct super_block *sb) { ext4_fsblk_t resv_clusters; @@ -4320,30 +4351,7 @@ no_journal: if (ext4_setup_super(sb, es, sb_rdonly(sb))) sb->s_flags |= MS_RDONLY;
- /* determine the minimum size of new large inodes, if present */ - if (sbi->s_inode_size > EXT4_GOOD_OLD_INODE_SIZE && - sbi->s_want_extra_isize == 0) { - sbi->s_want_extra_isize = sizeof(struct ext4_inode) - - EXT4_GOOD_OLD_INODE_SIZE; - if (ext4_has_feature_extra_isize(sb)) { - if (sbi->s_want_extra_isize < - le16_to_cpu(es->s_want_extra_isize)) - sbi->s_want_extra_isize = - le16_to_cpu(es->s_want_extra_isize); - if (sbi->s_want_extra_isize < - le16_to_cpu(es->s_min_extra_isize)) - sbi->s_want_extra_isize = - le16_to_cpu(es->s_min_extra_isize); - } - } - /* Check if enough inode space is available */ - if (EXT4_GOOD_OLD_INODE_SIZE + sbi->s_want_extra_isize > - sbi->s_inode_size) { - sbi->s_want_extra_isize = sizeof(struct ext4_inode) - - EXT4_GOOD_OLD_INODE_SIZE; - ext4_msg(sb, KERN_INFO, "required extra inode space not" - "available"); - } + ext4_clamp_want_extra_isize(sb);
ext4_set_resv_clusters(sb);
@@ -5128,6 +5136,8 @@ static int ext4_remount(struct super_blo goto restore_opts; }
+ ext4_clamp_want_extra_isize(sb); + if ((old_opts.s_mount_opt & EXT4_MOUNT_JOURNAL_CHECKSUM) ^ test_opt(sb, JOURNAL_CHECKSUM)) { ext4_msg(sb, KERN_ERR, "changing journal_checksum "
From: Kirill Tkhai ktkhai@virtuozzo.com
commit 310a997fd74de778b9a4848a64be9cda9f18764a upstream.
The number of block groups can never decrease, since only online grow is supported.
But after a grow has occurred, we have to zero the inode tables of the newly created block groups.
Fixes: 19c5246d2516 ("ext4: add new online resize interface") Signed-off-by: Kirill Tkhai ktkhai@virtuozzo.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -918,7 +918,7 @@ group_add_out: if (err == 0) err = err2; mnt_drop_write_file(filp); - if (!err && (o_group > EXT4_SB(sb)->s_groups_count) && + if (!err && (o_group < EXT4_SB(sb)->s_groups_count) && ext4_has_group_desc_csum(sb) && test_opt(sb, INIT_INODE_TABLE)) err = ext4_register_li_request(sb, o_group);
From: Debabrata Banerjee dbanerje@akamai.com
commit 50b29d8f033a7c88c5bc011abc2068b1691ab755 upstream.
Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as I assume was intended, all other options were blown away leading to _ext4_show_options() output being incorrect.
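The difference between the two expressions is easy to see on a small bitmask. The flag values below are made up for the example and are not the real EXT4_MOUNT_* values.

```c
#include <assert.h>

/* Hypothetical flag values, for illustration only. */
#define OPT_A        0x01u
#define OPT_CHECKSUM 0x02u
#define OPT_B        0x04u

/* What 'opts &= flag' does: keeps only 'flag', dropping everything else. */
static unsigned keep_only(unsigned opts, unsigned flag)
{
	return opts & flag;
}

/* What 'opts &= ~flag' does: clears 'flag', keeping everything else. */
static unsigned clear_flag(unsigned opts, unsigned flag)
{
	return opts & ~flag;
}
```

Starting from all three options set (0x07), the first form leaves 0x02 and the second leaves 0x05, which is the intended result.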
Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal") Signed-off-by: Debabrata Banerjee dbanerje@akamai.com Signed-off-by: Theodore Ts'o tytso@mit.edu Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4209,7 +4209,7 @@ static int ext4_fill_super(struct super_ "data=, fs mounted w/o journal"); goto failed_mount_wq; } - sbi->s_def_mount_opt &= EXT4_MOUNT_JOURNAL_CHECKSUM; + sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM; clear_opt(sb, JOURNAL_CHECKSUM); clear_opt(sb, DATA_FLAGS); sbi->s_journal = NULL;
From: Kamlakant Patel kamlakantp@marvell.com
commit 55be8658c7e2feb11a5b5b33ee031791dbd23a69 upstream.
According to ipmi spec, block number is a number that is incremented, starting with 0, for each new block of message data returned using the middle transaction.
Here, the 'blocknum' is data[0] which always starts from zero(0) and 'ssif_info->multi_pos' starts from 1. So, we need to add +1 to blocknum while comparing with multi_pos.
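The off-by-one can be written down as a one-line predicate. block_in_sequence() is a hypothetical helper name; the comparison itself is the one the patch introduces.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * blocknum counts from 0 for the first middle block, while multi_pos
 * counts from 1, so an in-sequence block satisfies blocknum + 1 == multi_pos.
 */
static bool block_in_sequence(unsigned blocknum, unsigned multi_pos)
{
	return blocknum + 1 == multi_pos;
}
```

With the old comparison (blocknum != multi_pos) the very first middle block, blocknum 0 against multi_pos 1, would already be treated as out of sequence.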
Fixes: 7d6380cd40f79 ("ipmi:ssif: Fix handling of multi-part return messages"). Reported-by: Kiran Kolukuluru kirank@ami.com Signed-off-by: Kamlakant Patel kamlakantp@marvell.com Message-Id: 1556106615-18722-1-git-send-email-kamlakantp@marvell.com [Also added a debug log if the block numbers don't match.] Signed-off-by: Corey Minyard cminyard@mvista.com Cc: stable@vger.kernel.org # 4.4 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/ipmi/ipmi_ssif.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -703,12 +703,16 @@ static void msg_done_handler(struct ssif /* End of read */ len = ssif_info->multi_len; data = ssif_info->data; - } else if (blocknum != ssif_info->multi_pos) { + } else if (blocknum + 1 != ssif_info->multi_pos) { /* * Out of sequence block, just abort. Block * numbers start at zero for the second block, * but multi_pos starts at one, so the +1. */ + if (ssif_info->ssif_debug & SSIF_DEBUG_MSG) + dev_dbg(&ssif_info->client->dev, + "Received message out of sequence, expected %u, got %u\n", + ssif_info->multi_pos - 1, blocknum); result = -EIO; } else { ssif_inc_stat(ssif_info, received_message_parts);
From: Eric Biggers ebiggers@google.com
commit 4a8108b70508df0b6c4ffa4a3974dab93dcbe851 upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
xts-aes-neonbs doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and unconditionally accessing walk.iv has caused a real problem in other algorithms. Thus, update xts-aes-neonbs to start checking the return value of skcipher_walk_virt().
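The "check the return value before touching walk.iv" rule can be illustrated with a userspace model. Everything here is hypothetical (struct walk, walk_start(), crypt_checked()); in particular the model sets the pointer to NULL on failure to stay well defined, whereas in the scenario the commit describes the pointer would be left dangling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct walk { unsigned char *iv; };  /* hypothetical stand-in for the walk state */

/* Copies the IV into a private buffer, then optionally fails and frees it. */
static int walk_start(struct walk *w, const unsigned char *iv, size_t ivlen,
		      bool fail_later)
{
	w->iv = malloc(ivlen);
	if (!w->iv)
		return -ENOMEM;
	memcpy(w->iv, iv, ivlen);
	if (fail_later) {
		free(w->iv);
		w->iv = NULL;   /* the model avoids a dangling pointer */
		return -EINVAL;
	}
	return 0;
}

/* Reads the first IV byte only if the walk started successfully. */
static int crypt_checked(const unsigned char *iv, size_t ivlen, bool fail,
			 unsigned char *first_iv_byte)
{
	struct walk w;
	int err = walk_start(&w, iv, ivlen, fail);

	if (err)
		return err;     /* never touch w.iv after a failure */
	*first_iv_byte = w.iv[0];
	free(w.iv);
	return 0;
}
```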
Fixes: 1abee99eafab ("crypto: arm64/aes - reimplement bit-sliced ARM/NEON implementation for arm64")
Cc: stable@vger.kernel.org # v4.11+
Signed-off-by: Eric Biggers ebiggers@google.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/arm64/crypto/aes-neonbs-glue.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/arch/arm64/crypto/aes-neonbs-glue.c
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -307,6 +307,8 @@ static int __xts_crypt(struct skcipher_r
 	int err;

 	err = skcipher_walk_virt(&walk, req, true);
+	if (err)
+		return err;

 	kernel_neon_begin();
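The pattern described in the commit message can be modelled in user space. The sketch below is an assumption-laden stand-in, not the kernel API: struct walk, walk_start() and crypt_first_iv_byte() are hypothetical names, and the "fail" flag simply simulates a setup step that fails after the IV has been copied.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct walk {			/* hypothetical stand-in for the walk state */
	unsigned char *iv;	/* aligned copy owned by the walk */
};

/* Copies the IV, then optionally fails and releases the copy again. */
static int walk_start(struct walk *w, const unsigned char *iv, size_t ivsize,
		      int fail)
{
	assert(ivsize > 0);
	w->iv = malloc(ivsize);
	if (!w->iv)
		return -ENOMEM;
	memcpy(w->iv, iv, ivsize);
	if (fail) {
		free(w->iv);
		w->iv = NULL;	/* the sketch clears the pointer for safety */
		return -EINVAL;
	}
	return 0;
}

/* Caller that checks the return value before it touches w.iv. */
static int crypt_first_iv_byte(const unsigned char *iv, size_t ivsize,
			       int fail, unsigned char *out)
{
	struct walk w;
	int err = walk_start(&w, iv, ivsize, fail);

	if (err)
		return err;	/* bail out before w.iv is read */
	*out = w.iv[0];
	free(w.iv);
	return 0;
}
```

Without the early return, the caller would read w.iv after the setup step had already released it, which is the use-after-free the patch guards against.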
From: Eric Biggers ebiggers@google.com
commit edaf28e996af69222b2cb40455dbb5459c2b875a upstream.
If the user-provided IV needs to be aligned to the algorithm's alignmask, then skcipher_walk_virt() copies the IV into a new aligned buffer walk.iv. But skcipher_walk_virt() can fail afterwards, and then if the caller unconditionally accesses walk.iv, it's a use-after-free.
salsa20-generic doesn't set an alignmask, so currently it isn't affected by this despite unconditionally accessing walk.iv. However this is more subtle than desired, and it was actually broken prior to the alignmask being removed by commit b62b3db76f73 ("crypto: salsa20-generic - cleanup and convert to skcipher API").
Since salsa20-generic does not update the IV and does not need any IV alignment, update it to use req->iv instead of walk.iv.
Fixes: 2407d60872dd ("[CRYPTO] salsa20: Salsa20 stream cipher")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers ebiggers@google.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 crypto/salsa20_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/crypto/salsa20_generic.c
+++ b/crypto/salsa20_generic.c
@@ -186,7 +186,7 @@ static int encrypt(struct blkcipher_desc
 	blkcipher_walk_init(&walk, dst, src, nbytes);
 	err = blkcipher_walk_virt_block(desc, &walk, 64);

-	salsa20_ivsetup(ctx, walk.iv);
+	salsa20_ivsetup(ctx, desc->info);

 	while (walk.nbytes >= 64) {
 		salsa20_encrypt_bytes(ctx, walk.dst.virt.addr,
From: Eric Biggers ebiggers@google.com
commit 6a1faa4a43f5fabf9cbeaa742d916e7b5e73120f upstream.
CCM instances can be created by either the "ccm" template, which only allows choosing the block cipher, e.g. "ccm(aes)"; or by "ccm_base", which allows choosing the ctr and cbcmac implementations, e.g. "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
However, a "ccm_base" instance prevents a "ccm" instance from being registered using the same implementations. Nor will the instance be found by lookups of "ccm". This can be used as a denial of service. Moreover, "ccm_base" instances are never tested by the crypto self-tests, even if there are compatible "ccm" tests.
The root cause of these problems is that instances of the two templates use different cra_names. Therefore, fix these problems by making "ccm_base" instances set the same cra_name as "ccm" instances, e.g. "ccm(aes)" instead of "ccm_base(ctr(aes-generic),cbcmac(aes-generic))".
This requires extracting the block cipher name from the name of the ctr and cbcmac algorithms. It also requires starting to verify that the algorithms are really ctr and cbcmac using the same block cipher, not something else entirely. But it would be bizarre if anyone were actually using non-ccm-compatible algorithms with ccm_base, so this shouldn't break anyone in practice.
Fixes: 4a49b499dfa0 ("[CRYPTO] ccm: Added CCM mode")
Cc: stable@vger.kernel.org
Signed-off-by: Eric Biggers ebiggers@google.com
Signed-off-by: Herbert Xu herbert@gondor.apana.org.au
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 crypto/ccm.c | 44 ++++++++++++++++++--------------------------
 1 file changed, 18 insertions(+), 26 deletions(-)

--- a/crypto/ccm.c
+++ b/crypto/ccm.c
@@ -455,7 +455,6 @@ static void crypto_ccm_free(struct aead_

 static int crypto_ccm_create_common(struct crypto_template *tmpl,
 				    struct rtattr **tb,
-				    const char *full_name,
 				    const char *ctr_name,
 				    const char *mac_name)
 {
@@ -483,7 +482,8 @@ static int crypto_ccm_create_common(stru

 	mac = __crypto_hash_alg_common(mac_alg);
 	err = -EINVAL;
-	if (mac->digestsize != 16)
+	if (strncmp(mac->base.cra_name, "cbcmac(", 7) != 0 ||
+	    mac->digestsize != 16)
 		goto out_put_mac;

 	inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
@@ -506,23 +506,27 @@ static int crypto_ccm_create_common(stru

 	ctr = crypto_spawn_skcipher_alg(&ictx->ctr);

-	/* Not a stream cipher? */
+	/* The skcipher algorithm must be CTR mode, using 16-byte blocks. */
 	err = -EINVAL;
-	if (ctr->base.cra_blocksize != 1)
+	if (strncmp(ctr->base.cra_name, "ctr(", 4) != 0 ||
+	    crypto_skcipher_alg_ivsize(ctr) != 16 ||
+	    ctr->base.cra_blocksize != 1)
 		goto err_drop_ctr;

-	/* We want the real thing! */
-	if (crypto_skcipher_alg_ivsize(ctr) != 16)
+	/* ctr and cbcmac must use the same underlying block cipher. */
+	if (strcmp(ctr->base.cra_name + 4, mac->base.cra_name + 7) != 0)
 		goto err_drop_ctr;

 	err = -ENAMETOOLONG;
+	if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME,
+		     "ccm(%s", ctr->base.cra_name + 4) >= CRYPTO_MAX_ALG_NAME)
+		goto err_drop_ctr;
+
 	if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
 		     "ccm_base(%s,%s)", ctr->base.cra_driver_name,
 		     mac->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
 		goto err_drop_ctr;

-	memcpy(inst->alg.base.cra_name, full_name, CRYPTO_MAX_ALG_NAME);
-
 	inst->alg.base.cra_flags = ctr->base.cra_flags & CRYPTO_ALG_ASYNC;
 	inst->alg.base.cra_priority = (mac->base.cra_priority +
 				       ctr->base.cra_priority) / 2;
@@ -564,7 +568,6 @@ static int crypto_ccm_create(struct cryp
 	const char *cipher_name;
 	char ctr_name[CRYPTO_MAX_ALG_NAME];
 	char mac_name[CRYPTO_MAX_ALG_NAME];
-	char full_name[CRYPTO_MAX_ALG_NAME];

 	cipher_name = crypto_attr_alg_name(tb[1]);
 	if (IS_ERR(cipher_name))
@@ -578,12 +581,7 @@ static int crypto_ccm_create(struct cryp
 		     cipher_name) >= CRYPTO_MAX_ALG_NAME)
 		return -ENAMETOOLONG;

-	if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "ccm(%s)", cipher_name) >=
-	    CRYPTO_MAX_ALG_NAME)
-		return -ENAMETOOLONG;
-
-	return crypto_ccm_create_common(tmpl, tb, full_name, ctr_name,
-					mac_name);
+	return crypto_ccm_create_common(tmpl, tb, ctr_name, mac_name);
 }

 static struct crypto_template crypto_ccm_tmpl = {
@@ -596,23 +594,17 @@ static int crypto_ccm_base_create(struct
 				  struct rtattr **tb)
 {
 	const char *ctr_name;
-	const char *cipher_name;
-	char full_name[CRYPTO_MAX_ALG_NAME];
+	const char *mac_name;

 	ctr_name = crypto_attr_alg_name(tb[1]);
 	if (IS_ERR(ctr_name))
 		return PTR_ERR(ctr_name);

-	cipher_name = crypto_attr_alg_name(tb[2]);
-	if (IS_ERR(cipher_name))
-		return PTR_ERR(cipher_name);
-
-	if (snprintf(full_name, CRYPTO_MAX_ALG_NAME, "ccm_base(%s,%s)",
-		     ctr_name, cipher_name) >= CRYPTO_MAX_ALG_NAME)
-		return -ENAMETOOLONG;
+	mac_name = crypto_attr_alg_name(tb[2]);
+	if (IS_ERR(mac_name))
+		return PTR_ERR(mac_name);

-	return crypto_ccm_create_common(tmpl, tb, full_name, ctr_name,
-					cipher_name);
+	return crypto_ccm_create_common(tmpl, tb, ctr_name, mac_name);
 }

 static struct crypto_template crypto_ccm_base_tmpl = {
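The name derivation this patch introduces can be tried outside the kernel. The following is only a sketch under stated assumptions: ccm_name_from_parts() and its return codes (-1, -2) are hypothetical, while the prefix checks, the "+ 4" / "+ 7" offsets and the "ccm(%s" format string mirror the hunks above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "ccm(<cipher>)" from names such as "ctr(aes)"
 * and "cbcmac(aes)". Returns 0 on success, -1 if the inputs are not a
 * ctr/cbcmac pair over the same cipher, -2 if the result does not fit.
 */
static int ccm_name_from_parts(const char *ctr_name, const char *mac_name,
			       char *out, size_t outlen)
{
	int n;

	assert(outlen > 0);

	if (strncmp(ctr_name, "ctr(", 4) != 0 ||
	    strncmp(mac_name, "cbcmac(", 7) != 0)
		return -1;

	/* Both tails must name the same block cipher, e.g. "aes)". */
	if (strcmp(ctr_name + 4, mac_name + 7) != 0)
		return -1;

	/*
	 * ctr_name + 4 is "aes)": it already carries the closing
	 * parenthesis, which is why the format string has none.
	 */
	n = snprintf(out, outlen, "ccm(%s", ctr_name + 4);
	if (n < 0 || (size_t)n >= outlen)
		return -2;	/* truncated */

	return 0;
}
```

For "ctr(aes)" and "cbcmac(aes)" this yields "ccm(aes)", the same cra_name a plain "ccm" instance would get; a mismatched pair such as "ctr(aes)" with "cbcmac(des)" is rejected.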
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
When commit e9919a24d302 ("fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied") was backported to 4.9.y, it changed the logic a bit as err should have been reset before exiting the test, like it happens in the original logic.
If this is not set, errors happen :(
Reported-by: Nathan Chancellor natechancellor@gmail.com
Reported-by: David Ahern dsahern@gmail.com
Reported-by: Florian Westphal fw@strlen.de
Cc: Hangbin Liu liuhangbin@gmail.com
Cc: David S. Miller davem@davemloft.net
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 net/core/fib_rules.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/net/core/fib_rules.c b/net/core/fib_rules.c
index bb26457e8c21..c03dd2104d33 100644
--- a/net/core/fib_rules.c
+++ b/net/core/fib_rules.c
@@ -430,6 +430,7 @@ int fib_nl_newrule(struct sk_buff *skb, struct nlmsghdr *nlh)
 		goto errout_free;

 	if (rule_exists(ops, frh, tb, rule)) {
+		err = 0;
 		if (nlh->nlmsg_flags & NLM_F_EXCL)
 			err = -EEXIST;
 		goto errout_free;
From: Jiufei Xue jiufei.xue@linux.alibaba.com
commit ec084de929e419e51bcdafaafe567d9e7d0273b7 upstream.
synchronize_rcu() does not wait for call_rcu() callbacks, so an inode wb switch may not have reached the workqueue by the time synchronize_rcu() returns. Thus previously scheduled switches may not be finished even after flushing the workqueue, which causes the NULL pointer dereference shown below.

VFS: Busy inodes after unmount of vdd. Self-destruct in 5 seconds. Have a nice day...
BUG: unable to handle kernel NULL pointer dereference at 0000000000000278
 evict+0xb3/0x180
 iput+0x1b0/0x230
 inode_switch_wbs_work_fn+0x3c0/0x6a0
 worker_thread+0x4e/0x490
 ? process_one_work+0x410/0x410
 kthread+0xe6/0x100
 ret_from_fork+0x39/0x50

Replace the synchronize_rcu() call with rcu_barrier() to wait for all pending callbacks to finish. Also increment isw_nr_in_flight after call_rcu() in inode_switch_wbs(), which makes more sense.
Link: http://lkml.kernel.org/r/20190429024108.54150-1-jiufei.xue@linux.alibaba.com
Signed-off-by: Jiufei Xue jiufei.xue@linux.alibaba.com
Acked-by: Tejun Heo tj@kernel.org
Suggested-by: Tejun Heo tj@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Andrew Morton akpm@linux-foundation.org
Signed-off-by: Linus Torvalds torvalds@linux-foundation.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/fs-writeback.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -530,8 +530,6 @@ static void inode_switch_wbs(struct inod

 	isw->inode = inode;

-	atomic_inc(&isw_nr_in_flight);
-
 	/*
 	 * In addition to synchronizing among switchers, I_WB_SWITCH tells
 	 * the RCU protected stat update paths to grab the mapping's
@@ -539,6 +537,9 @@ static void inode_switch_wbs(struct inod
 	 * Let's continue after I_WB_SWITCH is guaranteed to be visible.
 	 */
 	call_rcu(&isw->rcu_head, inode_switch_wbs_rcu_fn);
+
+	atomic_inc(&isw_nr_in_flight);
+
 	goto out_unlock;

 out_free:
@@ -908,7 +909,11 @@ restart:
 void cgroup_writeback_umount(void)
 {
 	if (atomic_read(&isw_nr_in_flight)) {
-		synchronize_rcu();
+		/*
+		 * Use rcu_barrier() to wait for all pending callbacks to
+		 * ensure that all in-flight wb switches are in the workqueue.
+		 */
+		rcu_barrier();
 		flush_workqueue(isw_wq);
 	}
 }
From: Sriram Rajagopalan sriramr@arista.com
commit 592acbf16821288ecdc4192c47e3774a4c48bb64 upstream.
This commit zeroes out the unused memory region in the buffer_head corresponding to the extent metablock after writing the extent header and the corresponding extent node entries.
This is done to prevent random uninitialized data from getting into the filesystem when the extent block is synced.
This fixes CVE-2019-11833.
Signed-off-by: Sriram Rajagopalan sriramr@arista.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext4/extents.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1047,6 +1047,7 @@ static int ext4_ext_split(handle_t *hand
 	__le32 border;
 	ext4_fsblk_t *ablocks = NULL; /* array of allocated blocks */
 	int err = 0;
+	size_t ext_size = 0;

 	/* make decision: where to split? */
 	/* FIXME: now decision is simplest: at current extent */
@@ -1138,6 +1139,10 @@ static int ext4_ext_split(handle_t *hand
 		le16_add_cpu(&neh->eh_entries, m);
 	}

+	/* zero out unused area in the extent block */
+	ext_size = sizeof(struct ext4_extent_header) +
+		sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries);
+	memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);
 	ext4_extent_block_csum_set(inode, neh);
 	set_buffer_uptodate(bh);
 	unlock_buffer(bh);
@@ -1217,6 +1222,11 @@ static int ext4_ext_split(handle_t *hand
 				sizeof(struct ext4_extent_idx) * m);
 			le16_add_cpu(&neh->eh_entries, m);
 		}
+		/* zero out unused area in the extent block */
+		ext_size = sizeof(struct ext4_extent_header) +
+		   (sizeof(struct ext4_extent) * le16_to_cpu(neh->eh_entries));
+		memset(bh->b_data + ext_size, 0,
+			inode->i_sb->s_blocksize - ext_size);
 		ext4_extent_block_csum_set(inode, neh);
 		set_buffer_uptodate(bh);
 		unlock_buffer(bh);
@@ -1282,6 +1292,7 @@ static int ext4_ext_grow_indepth(handle_
 	ext4_fsblk_t newblock, goal = 0;
 	struct ext4_super_block *es = EXT4_SB(inode->i_sb)->s_es;
 	int err = 0;
+	size_t ext_size = 0;

 	/* Try to prepend new index to old one */
 	if (ext_depth(inode))
@@ -1307,9 +1318,11 @@ static int ext4_ext_grow_indepth(handle_
 		goto out;
 	}

+	ext_size = sizeof(EXT4_I(inode)->i_data);
 	/* move top-level index/leaf into new block */
-	memmove(bh->b_data, EXT4_I(inode)->i_data,
-		sizeof(EXT4_I(inode)->i_data));
+	memmove(bh->b_data, EXT4_I(inode)->i_data, ext_size);
+	/* zero out unused area in the extent block */
+	memset(bh->b_data + ext_size, 0, inode->i_sb->s_blocksize - ext_size);

 	/* set size of new block */
 	neh = ext_block_hdr(bh);
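The zeroing step can be illustrated with a small user-space sketch. The helper names below are hypothetical, and the header and entry sizes are passed in as plain numbers rather than taken from the ext4 structures; only the "used bytes = header + entries, then memset the rest of the block" arithmetic follows the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Bytes occupied by a header followed by `entries` fixed-size records. */
static size_t extent_bytes_used(size_t header_size, size_t entry_size,
				unsigned int entries)
{
	return header_size + entry_size * entries;
}

/* Zero everything in the block after the first `used` bytes. */
static void zero_block_tail(unsigned char *block, size_t blocksize,
			    size_t used)
{
	/* Guard against underflow in blocksize - used. */
	assert(used <= blocksize);
	memset(block + used, 0, blocksize - used);
}
```

Anything past the last valid entry is then guaranteed to be zero when the block is written out, instead of whatever happened to be in the buffer before.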
From: Lukas Czerner lczerner@redhat.com
commit 57a0da28ced8707cb9f79f071a016b9d005caf5a upstream.
Unaligned AIO must be serialized because the zeroing of partial blocks of unaligned AIO can result in data corruption if it overlaps another in-flight IO.

Currently we wait for all unwritten extents before we submit unaligned AIO, which protects data when unaligned AIO follows overlapping IO. However, if an unaligned AIO is followed by overlapping aligned AIO, we can still end up corrupting data.
To fix this, we must make sure that the unaligned AIO is the only IO in flight by waiting for unwritten extents conversion not just before the IO submission, but right after it as well.
This problem can be reproduced by xfstest generic/538
Signed-off-by: Lukas Czerner lczerner@redhat.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext4/file.c | 7 +++++++
 1 file changed, 7 insertions(+)

--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -262,6 +262,13 @@ ext4_file_write_iter(struct kiocb *iocb,
 	}

 	ret = __generic_file_write_iter(iocb, from);
+	/*
+	 * Unaligned direct AIO must be the only IO in flight. Otherwise
+	 * overlapping aligned IO after unaligned might result in data
+	 * corruption.
+	 */
+	if (ret == -EIOCBQUEUED && unaligned_aio)
+		ext4_unwritten_wait(inode);
 	inode_unlock(inode);

 	if (ret > 0)
From: Sahitya Tummala stummala@codeaurora.org
commit 08fc98a4d6424af66eb3ac4e2cedd2fc927ed436 upstream.
The buffer_head (frames[0].bh) and its corresponding page can potentially be freed once brelse() is done inside the for loop but before the for loop exits in dx_release(). It can be freed in another context, when the page cache is flushed via drop_caches_sysctl_handler(). This results in the data abort below when accessing info->indirect_levels in dx_release().

Unable to handle kernel paging request at virtual address ffffffc17ac3e01e
Call trace:
 dx_release+0x70/0x90
 ext4_htree_fill_tree+0x2d4/0x300
 ext4_readdir+0x244/0x6f8
 iterate_dir+0xbc/0x160
 SyS_getdents64+0x94/0x174
Signed-off-by: Sahitya Tummala stummala@codeaurora.org
Signed-off-by: Theodore Ts'o tytso@mit.edu
Reviewed-by: Andreas Dilger adilger@dilger.ca
Cc: stable@kernel.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext4/namei.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -870,12 +870,15 @@ static void dx_release(struct dx_frame *
 {
 	struct dx_root_info *info;
 	int i;
+	unsigned int indirect_levels;

 	if (frames[0].bh == NULL)
 		return;

 	info = &((struct dx_root *)frames[0].bh->b_data)->info;
-	for (i = 0; i <= info->indirect_levels; i++) {
+	/* save local copy, "info" may be freed after brelse() */
+	indirect_levels = info->indirect_levels;
+	for (i = 0; i <= indirect_levels; i++) {
 		if (frames[i].bh == NULL)
 			break;
 		brelse(frames[i].bh);
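The "save a local copy before releasing" idea can be shown with a user-space model. Everything here is a hypothetical stand-in: struct frame is not struct dx_frame, release() is not brelse(), and the buffer is poisoned rather than freed so that the example stays well-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NFRAMES 3

struct frame {			/* hypothetical stand-in for a frame */
	unsigned char *data;	/* NULL once released */
};

/* After this call the buffer content can no longer be trusted. */
static void release(struct frame *f)
{
	memset(f->data, 0xff, 1);	/* poison instead of freeing */
	f->data = NULL;
}

/*
 * Releases frames 0..levels, where the level count is stored in the first
 * byte of frame 0's buffer. Returns the number of frames released.
 */
static int release_all(struct frame *frames)
{
	unsigned int levels;
	int i, released = 0;

	if (frames[0].data == NULL)
		return 0;

	/* Local copy, taken before the first release can invalidate it. */
	levels = frames[0].data[0];
	assert(levels < NFRAMES);

	for (i = 0; i <= (int)levels; i++) {
		if (frames[i].data == NULL)
			break;
		release(&frames[i]);
		released++;
	}
	return released;
}
```

If the loop condition re-read the level count from frame 0's buffer on every iteration, the second iteration would already be looking at released (here: poisoned) memory.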
From: Michał Wadowski wadosm@gmail.com
commit 56df90b631fc027fe28b70d41352d820797239bb upstream.
Add a quirk for the Realtek codec in the Lenovo B50-70 that fixes the inverted internal microphone channel. The IdeaPad Y410P has the same PCI SSID as the Lenovo B50-70, but its existing entry was meant to fix noise and did not seem to help in later kernel versions. So I replaced the IdeaPad Y410P device description with B50-70 and applied the inverted microphone fix.
Bugzilla: https://bugs.launchpad.net/ubuntu/+source/alsa-driver/+bug/1524215
Signed-off-by: Michał Wadowski wadosm@gmail.com
Cc: stable@vger.kernel.org
Signed-off-by: Takashi Iwai tiwai@suse.de
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 sound/pci/hda/patch_realtek.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6550,7 +6550,7 @@ static const struct snd_pci_quirk alc269
 	SND_PCI_QUIRK(0x17aa, 0x313c, "ThinkCentre Station", ALC294_FIXUP_LENOVO_MIC_LOCATION),
 	SND_PCI_QUIRK(0x17aa, 0x3902, "Lenovo E50-80", ALC269_FIXUP_DMIC_THINKPAD_ACPI),
 	SND_PCI_QUIRK(0x17aa, 0x3977, "IdeaPad S210", ALC283_FIXUP_INT_MIC),
-	SND_PCI_QUIRK(0x17aa, 0x3978, "IdeaPad Y410P", ALC269_FIXUP_NO_SHUTUP),
+	SND_PCI_QUIRK(0x17aa, 0x3978, "Lenovo B50-70", ALC269_FIXUP_DMIC_THINKPAD_ACPI),
 	SND_PCI_QUIRK(0x17aa, 0x5013, "Thinkpad", ALC269_FIXUP_LIMIT_INT_MIC_BOOST),
 	SND_PCI_QUIRK(0x17aa, 0x501a, "Thinkpad", ALC283_FIXUP_INT_MIC),
 	SND_PCI_QUIRK(0x17aa, 0x501e, "Thinkpad L440", ALC292_FIXUP_TPT440_DOCK),
From: Sean Christopherson sean.j.christopherson@intel.com
commit 11988499e62b310f3bf6f6d0a807a06d3f9ccc96 upstream.
KVM allows userspace to violate consistency checks related to the guest's CPUID model to some degree. Generally speaking, userspace has carte blanche when it comes to guest state so long as jamming invalid state won't negatively affect the host.
Currently this seems to be a non-issue as most of the interesting EFER checks are missing, e.g. NX and LME, but those will be added shortly. Proactively exempt userspace from the CPUID checks so as not to break userspace.
Note, the efer_reserved_bits check still applies to userspace writes as that mask reflects the host's capabilities, e.g. KVM shouldn't allow a guest to run with NX=1 if it has been disabled in the host.
Fixes: d80174745ba39 ("KVM: SVM: Only allow setting of EFER_SVME when CPUID SVM is set")
Cc: stable@vger.kernel.org
Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com
Signed-off-by: Paolo Bonzini pbonzini@redhat.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kvm/x86.c | 37 ++++++++++++++++++++++++-------------
 1 file changed, 24 insertions(+), 13 deletions(-)

--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1107,31 +1107,42 @@ static int do_get_msr_feature(struct kvm
 	return 0;
 }

-bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
+static bool __kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
 {
-	if (efer & efer_reserved_bits)
-		return false;
-
 	if (efer & EFER_FFXSR && !guest_cpuid_has(vcpu, X86_FEATURE_FXSR_OPT))
-			return false;
+		return false;

 	if (efer & EFER_SVME && !guest_cpuid_has(vcpu, X86_FEATURE_SVM))
-			return false;
+		return false;

 	return true;
+
+}
+bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer)
+{
+	if (efer & efer_reserved_bits)
+		return false;
+
+	return __kvm_valid_efer(vcpu, efer);
 }
 EXPORT_SYMBOL_GPL(kvm_valid_efer);

-static int set_efer(struct kvm_vcpu *vcpu, u64 efer)
+static int set_efer(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
 	u64 old_efer = vcpu->arch.efer;
+	u64 efer = msr_info->data;

-	if (!kvm_valid_efer(vcpu, efer))
-		return 1;
+	if (efer & efer_reserved_bits)
+		return false;

-	if (is_paging(vcpu)
-	    && (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
-		return 1;
+	if (!msr_info->host_initiated) {
+		if (!__kvm_valid_efer(vcpu, efer))
+			return 1;
+
+		if (is_paging(vcpu) &&
+		    (vcpu->arch.efer & EFER_LME) != (efer & EFER_LME))
+			return 1;
+	}

 	efer &= ~EFER_LMA;
 	efer |= vcpu->arch.efer & EFER_LMA;
@@ -2240,7 +2251,7 @@ int kvm_set_msr_common(struct kvm_vcpu *
 		vcpu->arch.arch_capabilities = data;
 		break;
 	case MSR_EFER:
-		return set_efer(vcpu, data);
+		return set_efer(vcpu, msr_info);
 	case MSR_K7_HWCR:
 		data &= ~(u64)0x40;	/* ignore flush filter disable */
 		data &= ~(u64)0x100;	/* ignore ignne emulation enable */
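The split between host-capability checks and guest-CPUID checks can be sketched as a standalone function. This is a simplified model under assumptions: struct vcpu_model and check_efer_write() are hypothetical, only the SVME check is modelled, and the sketch returns 1 for every rejected write as its own convention rather than reproducing the return values in the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFER_NX   (1ULL << 11)	/* EFER.NXE bit */
#define SKETCH_EFER_SVME (1ULL << 12)	/* EFER.SVME bit */

struct vcpu_model {			/* hypothetical, not struct kvm_vcpu */
	uint64_t reserved_bits;		/* bits the host cannot support */
	bool guest_has_svm;		/* what the guest CPUID advertises */
};

/* Returns 0 if the write is accepted, 1 if it is rejected. */
static int check_efer_write(const struct vcpu_model *v, uint64_t efer,
			    bool host_initiated)
{
	assert(v != NULL);

	/* Host capability check: applies to every writer. */
	if (efer & v->reserved_bits)
		return 1;

	/* Guest CPUID consistency: only enforced for guest writes. */
	if (!host_initiated && (efer & SKETCH_EFER_SVME) && !v->guest_has_svm)
		return 1;

	return 0;
}
```

In this model a userspace (host-initiated) write of SVME succeeds even when the guest CPUID lacks SVM, while a reserved bit is refused regardless of who writes it.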
From: Eric Dumazet edumazet@google.com
commit 6daef95b8c914866a46247232a048447fff97279 upstream.
Avoid cache line miss dereferencing struct page if we can.
page_copy_sane() mostly deals with order-0 pages.
Extra cache line miss is visible on TCP recvmsg() calls dealing with GRO packets (typically 45 page frags are attached to one skb).
Bringing the 45 struct pages into cpu cache while copying the data is not free, since the freeing of the skb (and associated page frags put_page()) can happen after cache lines have been evicted.
Signed-off-by: Eric Dumazet edumazet@google.com
Cc: Al Viro viro@zeniv.linux.org.uk
Signed-off-by: Al Viro viro@zeniv.linux.org.uk
Cc: Matthew Wilcox willy@infradead.org
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 lib/iov_iter.c | 17 +++++++++++++++--
 1 file changed, 15 insertions(+), 2 deletions(-)

--- a/lib/iov_iter.c
+++ b/lib/iov_iter.c
@@ -687,8 +687,21 @@ EXPORT_SYMBOL(_copy_from_iter_full_nocac

 static inline bool page_copy_sane(struct page *page, size_t offset, size_t n)
 {
-	struct page *head = compound_head(page);
-	size_t v = n + offset + page_address(page) - page_address(head);
+	struct page *head;
+	size_t v = n + offset;
+
+	/*
+	 * The general case needs to access the page order in order
+	 * to compute the page size.
+	 * However, we mostly deal with order-0 pages and thus can
+	 * avoid a possible cache line miss for requests that fit all
+	 * page orders.
+	 */
+	if (n <= v && v <= PAGE_SIZE)
+		return true;
+
+	head = compound_head(page);
+	v += (page - head) << PAGE_SHIFT;

 	if (likely(n <= v && v <= (PAGE_SIZE << compound_order(head))))
 		return true;
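The fast path added here relies on an unsigned wrap-around check. The sketch below models only that fast path in user space; fits_in_one_page() is a hypothetical name and the 4096-byte page size is an assumption of the example, not something the patch fixes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE ((size_t)4096)	/* assumed page size */

/* True if [offset, offset + n) fits within a single order-0 page. */
static bool fits_in_one_page(size_t offset, size_t n)
{
	size_t v = n + offset;	/* unsigned addition, may wrap around */

	/* A page-sized limit only makes sense for in-page offsets here. */
	assert(SKETCH_PAGE_SIZE > 0);

	/*
	 * If the sum wrapped, v is smaller than n, so "n <= v" rejects the
	 * request before the size comparison is even considered.
	 */
	return n <= v && v <= SKETCH_PAGE_SIZE;
}
```

When this check succeeds, the request is valid for any page order, so the kernel function can return without touching the head page; when it fails, the real code falls through to the general compound-page check, which this sketch does not model.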
From: zhangyi (F) yi.zhang@huawei.com
commit ddccb6dbe780d68133191477571cb7c69e17bb8c upstream.
Fix compile error below when using BUFFER_TRACE.
fs/ext4/inode.c: In function ‘ext4_expand_extra_isize’:
fs/ext4/inode.c:5979:19: error: request for member ‘bh’ in something not a structure or union
  BUFFER_TRACE(iloc.bh, "get_write_access");

Fixes: c03b45b853f58 ("ext4, project: expand inode extra size if possible")
Signed-off-by: zhangyi (F) yi.zhang@huawei.com
Signed-off-by: Theodore Ts'o tytso@mit.edu
Reviewed-by: Jan Kara jack@suse.cz
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 fs/ext4/inode.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5818,7 +5818,7 @@ int ext4_expand_extra_isize(struct inode

 	ext4_write_lock_xattr(inode, &no_expand);

-	BUFFER_TRACE(iloc.bh, "get_write_access");
+	BUFFER_TRACE(iloc->bh, "get_write_access");
 	error = ext4_journal_get_write_access(handle, iloc->bh);
 	if (error) {
 		brelse(iloc->bh);
stable-rc/linux-4.14.y boot: 117 boots: 1 failed, 113 passed with 1 offline, 1 untried/unknown, 1 conflict (v4.14.120-64-gffedd7fd95e8)
Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.14.y/kernel/v4.14...
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-4.14.y/kernel/v4.14.120-64...

Tree: stable-rc
Branch: linux-4.14.y
Git Describe: v4.14.120-64-gffedd7fd95e8
Git Commit: ffedd7fd95e8d03834094434754a33dbc060770d
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Tested: 62 unique boards, 24 SoC families, 14 builds out of 201
Boot Regressions Detected:

arm:

    omap2plus_defconfig:
        gcc-8:
            omap4-panda:
                lab-baylibre: new failure (last pass: v4.14.120)

Boot Failure Detected:

arm64:
    defconfig:
        gcc-8:
            rk3399-firefly: 1 failed lab

Offline Platforms:

arm:

    multi_v7_defconfig:
        gcc-8
            stih410-b2120: 1 offline lab

Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.)

arm:
    omap2plus_defconfig:
        omap4-panda:
            lab-baylibre: FAIL (gcc-8)
            lab-baylibre-seattle: PASS (gcc-8)
--- For more info write to info@kernelci.org
On 20/05/2019 13:13, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.121 release.
> [...]
All tests are passing for Tegra ...
Test results for stable-v4.14:
    8 builds:	8 pass, 0 fail
    16 boots:	16 pass, 0 fail
    24 tests:	24 pass, 0 fail

Linux version:	4.14.121-rc1-gffedd7f
Boards tested:	tegra124-jetson-tk1, tegra20-ventana,
                tegra210-p2371-2180, tegra30-cardhu-a04

Cheers
Jon
On Mon, 20 May 2019 at 17:49, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
> This is the start of the stable review cycle for the 4.14.121 release.
> [...]
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary
------------------------------------------------------------------------

kernel: 4.14.121-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.14.y
git commit: ffedd7fd95e8d03834094434754a33dbc060770d
git describe: v4.14.120-64-gffedd7fd95e8
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.120-6...
No regressions (compared to build v4.14.120)
No fixes (compared to build v4.14.120)
Ran 23544 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c - arm64
- i386
- juno-r2 - arm64
- qemu_arm
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64

Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libhugetlbfs
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* ltp-timers-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-open-posix-tests
* kvm-unit-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
* ssuite
On 5/20/19 6:13 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.14.121 release.
> [...]
Compiled and booted on my test system. No dmesg regressions.
thanks,
-- Shuah