This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.5.5-rc1
Michał Mirosław mirq-linux@rere.qmqm.pl mmc: core: Rework wp-gpio handling
Michał Mirosław mirq-linux@rere.qmqm.pl gpio: add gpiod_toggle_active_low()
Trond Myklebust trondmy@gmail.com NFSv4: Add accounting for the number of active delegations held
Jens Axboe axboe@kernel.dk io-wq: add support for inheriting ->fs
Chengguang Xu cgxu519@mykernel.net ext4: choose hardlimit when softlimit is larger than hardlimit in ext4_statfs_project()
Chris Wilson chris@chris-wilson.co.uk drm/i915/pmu: Correct the rc6 offset upon enabling
Jernej Skrabec jernej.skrabec@siol.net Revert "drm/sun4i: drv: Allow framebuffer modifiers in mode config"
Trond Myklebust trondmy@gmail.com NFSv4: Ensure the delegation cred is pinned when we call delegreturn
Olga Kornievskaia kolga@netapp.com NFSv4.1 make cachethis=no for writes
Kim Phillips kim.phillips@amd.com perf stat: Don't report a null stalled cycles per insn metric
Oliver Upton oupton@google.com KVM: nVMX: Handle pending #DB when injecting INIT VM-exit
Oliver Upton oupton@google.com KVM: x86: Mask off reserved bit from #DB exception payload
Marc Zyngier maz@kernel.org arm64: dts: fast models: Fix FVP PCI interrupt-map property
Xiubo Li xiubli@redhat.com ceph: noacl mount option is effectively ignored
Petr Pavlu petr.pavlu@suse.com cifs: fix mount option display for sec=krb5i
Sara Sharon sara.sharon@intel.com mac80211: fix quiet mode activation in action frames
Mike Jones michael-a1.jones@analog.com hwmon: (pmbus/ltc2978) Fix PMBus polling of MFR_COMMON definitions.
Kan Liang kan.liang@linux.intel.com perf/x86/intel: Fix inaccurate period in context switch for auto-reload
Stephen Boyd swboyd@chromium.org spmi: pmic-arb: Set lockdep class for hierarchical irq domains
Johannes Berg johannes.berg@intel.com mac80211: use more bits for ack_frame_id
Qais Yousef qais.yousef@arm.com sched/uclamp: Reject negative values in cpu_uclamp_write()
Luca Weiss luca@z3ntu.xyz Input: ili210x - fix return value of is_visible function
Nathan Chancellor natechancellor@gmail.com s390/time: Fix clk type in get_tod_clock
Leon Romanovsky leon@kernel.org RDMA/core: Fix protection fault in get_pkey_idx_qp_list
Zhu Yanjun yanjunz@mellanox.com RDMA/rxe: Fix soft lockup problem due to using tasklets in softirq
Kamal Heib kamalheib1@gmail.com RDMA/hfi1: Fix memory leak in _dev_comp_vect_mappings_create
Krishnamraju Eraparaju krishna2@chelsio.com RDMA/iw_cxgb4: initiate CLOSE when entering TERM
Avihai Horon avihaih@mellanox.com RDMA/core: Fix invalid memory access in spec_filter_size
Yonatan Cohen yonatanc@mellanox.com IB/umad: Fix kernel crash while unloading ib_umad
Kaike Wan kaike.wan@intel.com IB/rdmavt: Reset all QPs when the device is shut down
Mike Marciniszyn mike.marciniszyn@intel.com IB/hfi1: Close window for pq and request coliding
Kaike Wan kaike.wan@intel.com IB/hfi1: Acquire lock to release TID entries when user file is closed
Mark Zhang markz@mellanox.com IB/mlx5: Return failure when rts2rts_qp_counters_set_id is not supported
Colin Ian King colin.king@canonical.com drivers: ipmi: fix off-by-one bounds check that leads to a out-of-bounds write
Yi Zhang yi.zhang@redhat.com nvme: fix the parameter order for nvme_get_log in nvme_get_fw_slot_info
Marek Behún marek.behun@nic.cz bus: moxtet: fix potential stack buffer overflow
Alex Deucher alexander.deucher@amd.com drm/amdgpu:/navi10: use the ODCAP enum to index the caps array
Alex Deucher alexander.deucher@amd.com drm/amdgpu: update smu_v11_0_pptable.h
Boris Brezillon boris.brezillon@collabora.com drm/panfrost: Make sure the shrinker does not reclaim referenced BOs
José Roberto de Souza jose.souza@intel.com drm/mst: Fix possible NULL pointer dereference in drm_dp_mst_process_up_req()
Daniel Vetter daniel.vetter@ffwll.ch drm/vgem: Close use-after-free race in vgem_gem_create
Christian Borntraeger borntraeger@de.ibm.com s390/uv: Fix handling of length extensions
Harald Freudenberger freude@linux.ibm.com s390/pkey: fix missing length of protected key on return
Kim Phillips kim.phillips@amd.com perf/x86/amd: Add missing L2 misses event spec to AMD Family 17h's event map
Sean Christopherson sean.j.christopherson@intel.com KVM: x86/mmu: Fix struct guest_walker arrays for 5-level paging
Sean Christopherson sean.j.christopherson@intel.com KVM: nVMX: Use correct root level for nested EPT shadow page tables
Robert Richter rrichter@marvell.com EDAC/mc: Fix use-after-free and memleaks during device removal
Robert Richter rrichter@marvell.com EDAC/sysfs: Remove csrow objects on errors
zhangyi (F) yi.zhang@huawei.com jbd2: do not clear the BH_Mapped flag when forgetting a metadata buffer
zhangyi (F) yi.zhang@huawei.com jbd2: move the clearing of b_modified flag to the journal_unmap_buffer()
Ronnie Sahlberg lsahlber@redhat.com cifs: make sure we do not overflow the max EA buffer size
Chuck Lever chuck.lever@oracle.com xprtrdma: Fix DMA scatter-gather list mapping imbalance
Tejun Heo tj@kernel.org cgroup: init_tasks shouldn't be linked to the root cgroup
Will Deacon will@kernel.org arm64: ssbs: Fix context-switch when SSBS is present on all CPUs
Paul Thomas pthomas8589@gmail.com gpio: xilinx: Fix bug where the wrong GPIO register is written to
Krzysztof Kozlowski krzk@kernel.org ARM: npcm: Bring back GPIOLIB support
David Sterba dsterba@suse.com btrfs: log message when rw remount is attempted with unclean tree-log
David Sterba dsterba@suse.com btrfs: print message when tree-log replay starts
Wenwen Wang wenwen@cs.uga.edu btrfs: ref-verify: fix memory leaks
Filipe Manana fdmanana@suse.com Btrfs: fix race between using extent maps and merging them
Theodore Ts'o tytso@mit.edu ext4: improve explanation of a mount failure caused by a misconfigured kernel
Shijie Luo luoshijie1@huawei.com ext4: add cond_resched() to ext4_protect_reserved_inode
Jan Kara jack@suse.cz ext4: fix checksum errors with indexed dirs
Theodore Ts'o tytso@mit.edu ext4: fix support for inode sizes > 1024 bytes
Andreas Dilger adilger@dilger.ca ext4: don't assume that mmp_nodename/bdevname have NUL
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPICA: Introduce acpi_any_gpe_status_set()
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: PM: s2idle: Avoid possible race related to the EC GPE
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: EC: Fix flushing of pending work
Arvind Sankar nivedita@alum.mit.edu ALSA: usb-audio: Apply sample rate quirk for Audioengine D1
Takashi Iwai tiwai@suse.de ALSA: hda/realtek - Fix silent output on MSI-GL73
Kailang Yang kailang@realtek.com ALSA: hda/realtek - Add more codec supported Headset Button
Takashi Iwai tiwai@suse.de ALSA: pcm: Fix double hw_free calls
Takashi Iwai tiwai@suse.de ALSA: usb-audio: Fix UAC2/3 effect unit parsing
Alexander Tsoy alexander@tsoy.me ALSA: usb-audio: Add clock validity quirk for Denon MC7000/MCX8000
Benjamin Tissoires benjamin.tissoires@redhat.com Input: synaptics - remove the LEN0049 dmi id from topbuttonpad list
Gaurav Agrawal agrawalgaurav@gnome.org Input: synaptics - enable SMBus on ThinkPad L470
Lyude Paul lyude@redhat.com Input: synaptics - switch T470s to RMI4 by default
Jens Axboe axboe@kernel.dk io_uring: retry raw bdev writes if we hit -EOPNOTSUPP
Pavel Begunkov asml.silence@gmail.com io_uring: fix deferred req iovec leak
-------------
Diffstat:
 Makefile                                           |  4 +-
 arch/arm/mach-npcm/Kconfig                         |  2 +-
 arch/arm64/boot/dts/arm/fvp-base-revc.dts          |  8 +-
 arch/arm64/kernel/process.c                        |  7 ++
 arch/s390/boot/uv.c                                |  3 +-
 arch/s390/include/asm/timex.h                      |  2 +-
 arch/x86/events/amd/core.c                         |  1 +
 arch/x86/events/intel/ds.c                         |  2 +
 arch/x86/kvm/mmu/paging_tmpl.h                     |  2 +-
 arch/x86/kvm/vmx/nested.c                          | 28 +++++++
 arch/x86/kvm/vmx/vmx.c                             |  3 +
 arch/x86/kvm/x86.c                                 |  8 ++
 drivers/acpi/acpica/achware.h                      |  2 +
 drivers/acpi/acpica/evxfgpe.c                      | 32 ++++++++
 drivers/acpi/acpica/hwgpe.c                        | 71 +++++++++++++++++
 drivers/acpi/ec.c                                  | 44 ++++++-----
 drivers/acpi/sleep.c                               | 50 ++++++++----
 drivers/bus/moxtet.c                               |  2 +-
 drivers/char/ipmi/ipmb_dev_int.c                   |  2 +-
 drivers/edac/edac_mc.c                             | 12 +--
 drivers/edac/edac_mc_sysfs.c                       | 18 +----
 drivers/gpio/gpio-xilinx.c                         |  5 +-
 drivers/gpio/gpiolib-of.c                          |  4 -
 drivers/gpio/gpiolib.c                             | 11 +++
 .../gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h  | 46 +++++++----
 drivers/gpu/drm/amd/powerplay/navi10_ppt.c         | 22 +++---
 drivers/gpu/drm/drm_dp_mst_topology.c              |  3 +-
 drivers/gpu/drm/i915/i915_pmu.c                    | 12 +++
 drivers/gpu/drm/panfrost/panfrost_drv.c            |  1 +
 drivers/gpu/drm/panfrost/panfrost_gem.h            |  6 ++
 drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c   |  3 +
 drivers/gpu/drm/panfrost/panfrost_job.c            |  7 +-
 drivers/gpu/drm/sun4i/sun4i_drv.c                  |  1 -
 drivers/gpu/drm/vgem/vgem_drv.c                    |  9 ++-
 drivers/hwmon/pmbus/ltc2978.c                      |  4 +-
 drivers/infiniband/core/security.c                 | 24 +++---
 drivers/infiniband/core/user_mad.c                 |  5 +-
 drivers/infiniband/core/uverbs_cmd.c               | 15 ++--
 drivers/infiniband/hw/cxgb4/cm.c                   |  4 +
 drivers/infiniband/hw/cxgb4/qp.c                   |  4 +-
 drivers/infiniband/hw/hfi1/affinity.c              |  2 +
 drivers/infiniband/hw/hfi1/file_ops.c              | 52 ++++++++-----
 drivers/infiniband/hw/hfi1/hfi.h                   |  5 +-
 drivers/infiniband/hw/hfi1/user_exp_rcv.c          |  5 +-
 drivers/infiniband/hw/hfi1/user_sdma.c             | 17 ++--
 drivers/infiniband/hw/mlx5/qp.c                    |  9 ++-
 drivers/infiniband/sw/rdmavt/qp.c                  | 84 ++++++++++++--------
 drivers/infiniband/sw/rxe/rxe_comp.c               |  8 +-
 drivers/input/mouse/synaptics.c                    |  4 +-
 drivers/input/touchscreen/ili210x.c                |  2 +-
 drivers/mmc/core/host.c                            | 11 +--
 drivers/mmc/core/slot-gpio.c                       |  3 +
 drivers/mmc/host/pxamci.c                          |  8 +-
 drivers/mmc/host/sdhci-esdhc-imx.c                 |  3 +-
 drivers/nvme/host/core.c                           |  2 +-
 drivers/s390/crypto/pkey_api.c                     |  2 +-
 drivers/spmi/spmi-pmic-arb.c                       |  4 +
 fs/btrfs/disk-io.c                                 |  1 +
 fs/btrfs/extent_map.c                              | 11 +++
 fs/btrfs/ref-verify.c                              |  5 ++
 fs/btrfs/super.c                                   |  2 +
 fs/ceph/super.c                                    |  8 +-
 fs/cifs/cifsfs.c                                   |  6 +-
 fs/cifs/smb2ops.c                                  | 35 ++++++++-
 fs/ext4/block_validity.c                           |  1 +
 fs/ext4/dir.c                                      | 14 ++--
 fs/ext4/ext4.h                                     |  5 +-
 fs/ext4/inode.c                                    | 12 +++
 fs/ext4/mmp.c                                      | 12 +--
 fs/ext4/namei.c                                    |  7 ++
 fs/ext4/super.c                                    | 55 +++++++------
 fs/io-wq.c                                         |  8 ++
 fs/io-wq.h                                         |  4 +-
 fs/io_uring.c                                      | 53 +++++--------
 fs/jbd2/commit.c                                   | 46 ++++++-----
 fs/jbd2/transaction.c                              | 10 ++-
 fs/nfs/delegation.c                                | 47 +++++++----
 fs/nfs/nfs4proc.c                                  |  2 +-
 include/acpi/acpixf.h                              |  1 +
 include/linux/gpio/consumer.h                      |  7 ++
 include/linux/suspend.h                            |  2 +-
 include/net/mac80211.h                             | 11 ++-
 kernel/cgroup/cgroup.c                             | 13 ++--
 kernel/power/suspend.c                             |  9 ++-
 kernel/sched/core.c                                |  2 +-
 net/mac80211/cfg.c                                 |  2 +-
 net/mac80211/mlme.c                                |  8 +-
 net/mac80211/tx.c                                  |  2 +-
 net/sunrpc/xprtrdma/frwr_ops.c                     | 13 ++--
 sound/core/pcm_native.c                            |  3 +-
 sound/pci/hda/patch_realtek.c                      |  4 +
 sound/usb/clock.c                                  | 91 +++++++++++++++-------
 sound/usb/clock.h                                  |  4 +-
 sound/usb/format.c                                 |  3 +-
 sound/usb/mixer.c                                  | 12 ++-
 sound/usb/quirks.c                                 |  1 +
 tools/perf/util/stat-shadow.c                      |  6 --
 97 files changed, 842 insertions(+), 406 deletions(-)
From: Pavel Begunkov asml.silence@gmail.com
commit 1e95081cb5b4cf77065d37866f57cf3c90a3df78 upstream.
After a defer, a request will be prepared (which includes allocating an iovec if needed) and then submitted through io_wq_submit_work(), but not through a custom handler (e.g. io_rw_async()/io_sendrecv_async()). However, this leaks the iovec, as the request is now running in io-wq and the code goes as follows:
io_read() {
	if (!io_wq_current_is_worker())
		kfree(iovec);
}
Put all deallocation logic in io_{read,write,send,recv}(), which will leave the memory allocated only when going async with -EAGAIN.
It also fixes a leak after failed io_alloc_async_ctx() in io_{recv,send}_msg().
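[ Illustration only, not part of the patch: a minimal user-space sketch of the ownership rule described above. The struct async_ctx and issue() below are hypothetical stand-ins for req->io and io_read()/io_send() and friends. ]

  #include <errno.h>
  #include <stdio.h>
  #include <stdlib.h>

  /* Hypothetical stand-in for the per-request async context (req->io). */
  struct async_ctx {
          char *iov;              /* iovec kept alive for the async retry */
  };

  /*
   * The issuing function frees the iovec it allocated unconditionally,
   * except when it goes async with -EAGAIN, in which case ownership has
   * already been handed over to the async context.
   */
  static int issue(struct async_ctx *ctx, int going_async)
  {
          char *iovec = malloc(64);       /* the "allocated" (non fast_iov) case */

          if (!iovec)
                  return -ENOMEM;

          if (going_async) {
                  ctx->iov = iovec;       /* handed over, must not be freed here */
                  return -EAGAIN;
          }

          free(iovec);                    /* no io_wq_current_is_worker() check */
          return 0;
  }

  int main(void)
  {
          struct async_ctx ctx = { .iov = NULL };

          printf("inline completion: %d\n", issue(&ctx, 0));
          printf("going async:       %d\n", issue(&ctx, 1));
          free(ctx.iov);                  /* freed when the async context is torn down */
          return 0;
  }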
Cc: stable@vger.kernel.org # 5.5 Signed-off-by: Pavel Begunkov asml.silence@gmail.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/io_uring.c | 47 ++++++++++++----------------------------------- 1 file changed, 12 insertions(+), 35 deletions(-)
--- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -1786,17 +1786,6 @@ static int io_alloc_async_ctx(struct io_ return req->io == NULL; }
-static void io_rw_async(struct io_wq_work **workptr) -{ - struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work); - struct iovec *iov = NULL; - - if (req->io->rw.iov != req->io->rw.fast_iov) - iov = req->io->rw.iov; - io_wq_submit_work(workptr); - kfree(iov); -} - static int io_setup_async_rw(struct io_kiocb *req, ssize_t io_size, struct iovec *iovec, struct iovec *fast_iov, struct iov_iter *iter) @@ -1810,7 +1799,6 @@ static int io_setup_async_rw(struct io_k
io_req_map_rw(req, io_size, iovec, fast_iov, iter); } - req->work.func = io_rw_async; return 0; }
@@ -1897,8 +1885,7 @@ copy_iov: } } out_free: - if (!io_wq_current_is_worker()) - kfree(iovec); + kfree(iovec); return ret; }
@@ -2003,8 +1990,7 @@ copy_iov: } } out_free: - if (!io_wq_current_is_worker()) - kfree(iovec); + kfree(iovec); return ret; }
@@ -2174,19 +2160,6 @@ static int io_sync_file_range(struct io_ return 0; }
-#if defined(CONFIG_NET) -static void io_sendrecv_async(struct io_wq_work **workptr) -{ - struct io_kiocb *req = container_of(*workptr, struct io_kiocb, work); - struct iovec *iov = NULL; - - if (req->io->rw.iov != req->io->rw.fast_iov) - iov = req->io->msg.iov; - io_wq_submit_work(workptr); - kfree(iov); -} -#endif - static int io_sendmsg_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe) { #if defined(CONFIG_NET) @@ -2254,17 +2227,19 @@ static int io_sendmsg(struct io_kiocb *r if (force_nonblock && ret == -EAGAIN) { if (req->io) return -EAGAIN; - if (io_alloc_async_ctx(req)) + if (io_alloc_async_ctx(req)) { + if (kmsg && kmsg->iov != kmsg->fast_iov) + kfree(kmsg->iov); return -ENOMEM; + } memcpy(&req->io->msg, &io.msg, sizeof(io.msg)); - req->work.func = io_sendrecv_async; return -EAGAIN; } if (ret == -ERESTARTSYS) ret = -EINTR; }
- if (!io_wq_current_is_worker() && kmsg && kmsg->iov != kmsg->fast_iov) + if (kmsg && kmsg->iov != kmsg->fast_iov) kfree(kmsg->iov); io_cqring_add_event(req, ret); if (ret < 0) @@ -2346,17 +2321,19 @@ static int io_recvmsg(struct io_kiocb *r if (force_nonblock && ret == -EAGAIN) { if (req->io) return -EAGAIN; - if (io_alloc_async_ctx(req)) + if (io_alloc_async_ctx(req)) { + if (kmsg && kmsg->iov != kmsg->fast_iov) + kfree(kmsg->iov); return -ENOMEM; + } memcpy(&req->io->msg, &io.msg, sizeof(io.msg)); - req->work.func = io_sendrecv_async; return -EAGAIN; } if (ret == -ERESTARTSYS) ret = -EINTR; }
- if (!io_wq_current_is_worker() && kmsg && kmsg->iov != kmsg->fast_iov) + if (kmsg && kmsg->iov != kmsg->fast_iov) kfree(kmsg->iov); io_cqring_add_event(req, ret); if (ret < 0)
From: Jens Axboe axboe@kernel.dk
commit faac996ccd5da95bc56b91aa80f2643c2d0a1c56 upstream.
For non-blocking issue, we set IOCB_NOWAIT in the kiocb. However, on a raw block device, this yields an -EOPNOTSUPP return, as non-blocking writes aren't supported. Turn this -EOPNOTSUPP into -EAGAIN, so we retry from blocking context with IOCB_NOWAIT cleared.
Cc: stable@vger.kernel.org # 5.5 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/io_uring.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -1978,6 +1978,12 @@ static int io_write(struct io_kiocb *req
 		ret2 = call_write_iter(req->file, kiocb, &iter);
 	else
 		ret2 = loop_rw_iter(WRITE, req->file, kiocb, &iter);
+	/*
+	 * Raw bdev writes will -EOPNOTSUPP for IOCB_NOWAIT. Just
+	 * retry them without IOCB_NOWAIT.
+	 */
+	if (ret2 == -EOPNOTSUPP && (kiocb->ki_flags & IOCB_NOWAIT))
+		ret2 = -EAGAIN;
 	if (!force_nonblock || ret2 != -EAGAIN) {
 		kiocb_done(kiocb, ret2, nxt, req->in_async);
 	} else {
From: Lyude Paul lyude@redhat.com
commit bf502391353b928e63096127e5fd8482080203f5 upstream.
This supports RMI4 and everything seems to work, including the touchpad buttons. So, let's enable this by default.
Signed-off-by: Lyude Paul lyude@redhat.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200204194322.112638-1-lyude@redhat.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/mouse/synaptics.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -169,6 +169,7 @@ static const char * const smbus_pnp_ids[
 	"LEN004a", /* W541 */
 	"LEN005b", /* P50 */
 	"LEN005e", /* T560 */
+	"LEN006c", /* T470s */
 	"LEN0071", /* T480 */
 	"LEN0072", /* X1 Carbon Gen 5 (2017) - Elan/ALPS trackpoint */
 	"LEN0073", /* X1 Carbon G5 (Elantech) */
From: Gaurav Agrawal agrawalgaurav@gnome.org
commit b8a3d819f872e0a3a0a6db0dbbcd48071042fb98 upstream.
Add touchpad LEN2044 to the list, as it is capable of working with psmouse.synaptics_intertouch=1
Signed-off-by: Gaurav Agrawal agrawalgaurav@gnome.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/CADdtggVzVJq5gGNmFhKSz2MBwjTpdN5YVOdr4D3Hkkv=KZRc9... Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/mouse/synaptics.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -180,6 +180,7 @@ static const char * const smbus_pnp_ids[
 	"LEN0097", /* X280 -> ALPS trackpoint */
 	"LEN009b", /* T580 */
 	"LEN200f", /* T450s */
+	"LEN2044", /* L470 */
 	"LEN2054", /* E480 */
 	"LEN2055", /* E580 */
 	"SYN3052", /* HP EliteBook 840 G4 */
From: Benjamin Tissoires benjamin.tissoires@redhat.com
commit 5179a9dfa9440c1781816e2c9a183d1d2512dc61 upstream.
The Yoga 11e is using LEN0049, but it doesn't have a trackstick.
Thus, there is no need to create a software top buttons row.
However, it seems that the device works under SMBus, so keep it as part of the smbus_pnp_ids.
Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200115013023.9710-1-benjamin.tissoires@redhat.co... Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/mouse/synaptics.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/mouse/synaptics.c
+++ b/drivers/input/mouse/synaptics.c
@@ -146,7 +146,6 @@ static const char * const topbuttonpad_p
 	"LEN0042", /* Yoga */
 	"LEN0045",
 	"LEN0047",
-	"LEN0049",
 	"LEN2000", /* S540 */
 	"LEN2001", /* Edge E431 */
 	"LEN2002", /* Edge E531 */
@@ -166,6 +165,7 @@ static const char * const smbus_pnp_ids[
 	/* all of the topbuttonpad_pnp_ids are valid, we just add some extras */
 	"LEN0048", /* X1 Carbon 3 */
 	"LEN0046", /* X250 */
+	"LEN0049", /* Yoga 11e */
 	"LEN004a", /* W541 */
 	"LEN005b", /* P50 */
 	"LEN005e", /* T560 */
From: Alexander Tsoy alexander@tsoy.me
commit 9f35a31283775e6f6af73fb2c95c686a4c0acac7 upstream.
It should be safe to ignore clock validity check result if the following conditions are met:
 - only one single sample rate is supported;
 - the terminal is directly connected to the clock source;
 - the clock type is internal.
This is to deal with some Denon DJ controllers that always report that the clock is invalid.
Tested-by: Tobias Oszlanyi toszlanyi@yahoo.de Signed-off-by: Alexander Tsoy alexander@tsoy.me Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200212235450.697348-1-alexander@tsoy.me Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/clock.c | 91 ++++++++++++++++++++++++++++++++++++----------------- sound/usb/clock.h | 4 +- sound/usb/format.c | 3 - 3 files changed, 66 insertions(+), 32 deletions(-)
--- a/sound/usb/clock.c +++ b/sound/usb/clock.c @@ -151,8 +151,34 @@ static int uac_clock_selector_set_val(st return ret; }
+/* + * Assume the clock is valid if clock source supports only one single sample + * rate, the terminal is connected directly to it (there is no clock selector) + * and clock type is internal. This is to deal with some Denon DJ controllers + * that always reports that clock is invalid. + */ +static bool uac_clock_source_is_valid_quirk(struct snd_usb_audio *chip, + struct audioformat *fmt, + int source_id) +{ + if (fmt->protocol == UAC_VERSION_2) { + struct uac_clock_source_descriptor *cs_desc = + snd_usb_find_clock_source(chip->ctrl_intf, source_id); + + if (!cs_desc) + return false; + + return (fmt->nr_rates == 1 && + (fmt->clock & 0xff) == cs_desc->bClockID && + (cs_desc->bmAttributes & 0x3) != + UAC_CLOCK_SOURCE_TYPE_EXT); + } + + return false; +} + static bool uac_clock_source_is_valid(struct snd_usb_audio *chip, - int protocol, + struct audioformat *fmt, int source_id) { int err; @@ -160,7 +186,7 @@ static bool uac_clock_source_is_valid(st struct usb_device *dev = chip->dev; u32 bmControls;
- if (protocol == UAC_VERSION_3) { + if (fmt->protocol == UAC_VERSION_3) { struct uac3_clock_source_descriptor *cs_desc = snd_usb_find_clock_source_v3(chip->ctrl_intf, source_id);
@@ -194,10 +220,14 @@ static bool uac_clock_source_is_valid(st return false; }
- return data ? true : false; + if (data) + return true; + else + return uac_clock_source_is_valid_quirk(chip, fmt, source_id); }
-static int __uac_clock_find_source(struct snd_usb_audio *chip, int entity_id, +static int __uac_clock_find_source(struct snd_usb_audio *chip, + struct audioformat *fmt, int entity_id, unsigned long *visited, bool validate) { struct uac_clock_source_descriptor *source; @@ -217,7 +247,7 @@ static int __uac_clock_find_source(struc source = snd_usb_find_clock_source(chip->ctrl_intf, entity_id); if (source) { entity_id = source->bClockID; - if (validate && !uac_clock_source_is_valid(chip, UAC_VERSION_2, + if (validate && !uac_clock_source_is_valid(chip, fmt, entity_id)) { usb_audio_err(chip, "clock source %d is not valid, cannot use\n", @@ -248,8 +278,9 @@ static int __uac_clock_find_source(struc }
cur = ret; - ret = __uac_clock_find_source(chip, selector->baCSourceID[ret - 1], - visited, validate); + ret = __uac_clock_find_source(chip, fmt, + selector->baCSourceID[ret - 1], + visited, validate); if (!validate || ret > 0 || !chip->autoclock) return ret;
@@ -260,8 +291,9 @@ static int __uac_clock_find_source(struc if (i == cur) continue;
- ret = __uac_clock_find_source(chip, selector->baCSourceID[i - 1], - visited, true); + ret = __uac_clock_find_source(chip, fmt, + selector->baCSourceID[i - 1], + visited, true); if (ret < 0) continue;
@@ -281,14 +313,16 @@ static int __uac_clock_find_source(struc /* FIXME: multipliers only act as pass-thru element for now */ multiplier = snd_usb_find_clock_multiplier(chip->ctrl_intf, entity_id); if (multiplier) - return __uac_clock_find_source(chip, multiplier->bCSourceID, - visited, validate); + return __uac_clock_find_source(chip, fmt, + multiplier->bCSourceID, + visited, validate);
return -EINVAL; }
-static int __uac3_clock_find_source(struct snd_usb_audio *chip, int entity_id, - unsigned long *visited, bool validate) +static int __uac3_clock_find_source(struct snd_usb_audio *chip, + struct audioformat *fmt, int entity_id, + unsigned long *visited, bool validate) { struct uac3_clock_source_descriptor *source; struct uac3_clock_selector_descriptor *selector; @@ -307,7 +341,7 @@ static int __uac3_clock_find_source(stru source = snd_usb_find_clock_source_v3(chip->ctrl_intf, entity_id); if (source) { entity_id = source->bClockID; - if (validate && !uac_clock_source_is_valid(chip, UAC_VERSION_3, + if (validate && !uac_clock_source_is_valid(chip, fmt, entity_id)) { usb_audio_err(chip, "clock source %d is not valid, cannot use\n", @@ -338,7 +372,8 @@ static int __uac3_clock_find_source(stru }
cur = ret; - ret = __uac3_clock_find_source(chip, selector->baCSourceID[ret - 1], + ret = __uac3_clock_find_source(chip, fmt, + selector->baCSourceID[ret - 1], visited, validate); if (!validate || ret > 0 || !chip->autoclock) return ret; @@ -350,8 +385,9 @@ static int __uac3_clock_find_source(stru if (i == cur) continue;
- ret = __uac3_clock_find_source(chip, selector->baCSourceID[i - 1], - visited, true); + ret = __uac3_clock_find_source(chip, fmt, + selector->baCSourceID[i - 1], + visited, true); if (ret < 0) continue;
@@ -372,7 +408,8 @@ static int __uac3_clock_find_source(stru multiplier = snd_usb_find_clock_multiplier_v3(chip->ctrl_intf, entity_id); if (multiplier) - return __uac3_clock_find_source(chip, multiplier->bCSourceID, + return __uac3_clock_find_source(chip, fmt, + multiplier->bCSourceID, visited, validate);
return -EINVAL; @@ -389,18 +426,18 @@ static int __uac3_clock_find_source(stru * * Returns the clock source UnitID (>=0) on success, or an error. */ -int snd_usb_clock_find_source(struct snd_usb_audio *chip, int protocol, - int entity_id, bool validate) +int snd_usb_clock_find_source(struct snd_usb_audio *chip, + struct audioformat *fmt, bool validate) { DECLARE_BITMAP(visited, 256); memset(visited, 0, sizeof(visited));
- switch (protocol) { + switch (fmt->protocol) { case UAC_VERSION_2: - return __uac_clock_find_source(chip, entity_id, visited, + return __uac_clock_find_source(chip, fmt, fmt->clock, visited, validate); case UAC_VERSION_3: - return __uac3_clock_find_source(chip, entity_id, visited, + return __uac3_clock_find_source(chip, fmt, fmt->clock, visited, validate); default: return -EINVAL; @@ -501,8 +538,7 @@ static int set_sample_rate_v2v3(struct s * automatic clock selection if the current clock is not * valid. */ - clock = snd_usb_clock_find_source(chip, fmt->protocol, - fmt->clock, true); + clock = snd_usb_clock_find_source(chip, fmt, true); if (clock < 0) { /* We did not find a valid clock, but that might be * because the current sample rate does not match an @@ -510,8 +546,7 @@ static int set_sample_rate_v2v3(struct s * and we will do another validation after setting the * rate. */ - clock = snd_usb_clock_find_source(chip, fmt->protocol, - fmt->clock, false); + clock = snd_usb_clock_find_source(chip, fmt, false); if (clock < 0) return clock; } @@ -577,7 +612,7 @@ static int set_sample_rate_v2v3(struct s
validation: /* validate clock after rate change */ - if (!uac_clock_source_is_valid(chip, fmt->protocol, clock)) + if (!uac_clock_source_is_valid(chip, fmt, clock)) return -ENXIO; return 0; } --- a/sound/usb/clock.h +++ b/sound/usb/clock.h @@ -6,7 +6,7 @@ int snd_usb_init_sample_rate(struct snd_ struct usb_host_interface *alts, struct audioformat *fmt, int rate);
-int snd_usb_clock_find_source(struct snd_usb_audio *chip, int protocol, - int entity_id, bool validate); +int snd_usb_clock_find_source(struct snd_usb_audio *chip, + struct audioformat *fmt, bool validate);
#endif /* __USBAUDIO_CLOCK_H */ --- a/sound/usb/format.c +++ b/sound/usb/format.c @@ -322,8 +322,7 @@ static int parse_audio_format_rates_v2v3 struct usb_device *dev = chip->dev; unsigned char tmp[2], *data; int nr_triplets, data_size, ret = 0, ret_l6; - int clock = snd_usb_clock_find_source(chip, fp->protocol, - fp->clock, false); + int clock = snd_usb_clock_find_source(chip, fp, false);
if (clock < 0) { dev_err(&dev->dev,
From: Takashi Iwai tiwai@suse.de
commit d75a170fd848f037a1e28893ad10be7a4c51f8a6 upstream.
We've got a regression report about M-Audio Fast Track C400 device, and the git bisection resulted in the commit e0ccdef92653 ("ALSA: usb-audio: Clean up check_input_term()"). This commit was about the rewrite of the input terminal parser, and it's not too obvious from the change what really broke. The answer is: it's the interpretation of UAC2/3 effect units.
In the original code, a UAC2 effect unit was handled as if it were a UAC1 processing unit, because UAC1 PU and UAC2/3 EU share the same descriptor subtype number (0x07). The old code went through a complex switch-case fallthrough, finally bailing out in the middle:
	if (protocol == UAC_VERSION_2 &&
	    hdr[2] == UAC2_EFFECT_UNIT) {
		/* UAC2/UAC1 unit IDs overlap here in an
		 * uncompatible way. Ignore this unit for now.
		 */
		return 0;
	}
... and this special handling was missing in the new code; the new code treats a UAC2/3 effect unit as if it were equivalent to a processing unit.

Actually, the old code was too confusing. The effect unit has a unit description that is incompatible with the processing unit's, so we shouldn't have dealt with the EU in the same way.

This patch addresses the regression by moving the effect unit handling into its own parser function. A dedicated parser function makes the distinction from the PU clear, which improves the readability, too.

The EU parser just sets the type and the id, like older kernels did. Once proper effect unit support is added, we can revisit this parser function, but for now, let's keep this simple setup as is.
Fixes: e0ccdef92653 ("ALSA: usb-audio: Clean up check_input_term()") Cc: stable@vger.kernel.org BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206147 Link: https://lore.kernel.org/r/20200211160521.31990-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/mixer.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
--- a/sound/usb/mixer.c +++ b/sound/usb/mixer.c @@ -897,6 +897,15 @@ static int parse_term_proc_unit(struct m return 0; }
+static int parse_term_effect_unit(struct mixer_build *state, + struct usb_audio_term *term, + void *p1, int id) +{ + term->type = UAC3_EFFECT_UNIT << 16; /* virtual type */ + term->id = id; + return 0; +} + static int parse_term_uac2_clock_source(struct mixer_build *state, struct usb_audio_term *term, void *p1, int id) @@ -981,8 +990,7 @@ static int __check_input_term(struct mix UAC3_PROCESSING_UNIT); case PTYPE(UAC_VERSION_2, UAC2_EFFECT_UNIT): case PTYPE(UAC_VERSION_3, UAC3_EFFECT_UNIT): - return parse_term_proc_unit(state, term, p1, id, - UAC3_EFFECT_UNIT); + return parse_term_effect_unit(state, term, p1, id); case PTYPE(UAC_VERSION_1, UAC1_EXTENSION_UNIT): case PTYPE(UAC_VERSION_2, UAC2_EXTENSION_UNIT_V2): case PTYPE(UAC_VERSION_3, UAC3_EXTENSION_UNIT):
From: Takashi Iwai tiwai@suse.de
commit 0fbb027b44e79700da80e4b8bd1c1914d4796af6 upstream.
The commit 66f2d19f8116 ("ALSA: pcm: Fix memory leak at closing a stream without hw_free") tried to fix the regression wrt the missing hw_free call at closing without SNDRV_PCM_IOCTL_HW_FREE ioctl. However, the code change mistakenly dropped the state check, resulting in hw_free being called twice when SNDRV_PCM_IOCTL_HW_FREE was issued beforehand. For most drivers this is almost harmless, but drivers like SOF now show another regression.

This patch adds the state condition check before calling do_hw_free() when releasing the stream, to avoid the double hw_free calls.
Fixes: 66f2d19f8116 ("ALSA: pcm: Fix memory leak at closing a stream without hw_free") Reported-by: Bard Liao yung-chuan.liao@linux.intel.com Reported-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Tested-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/s5hd0ajyprg.wl-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/core/pcm_native.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/core/pcm_native.c
+++ b/sound/core/pcm_native.c
@@ -2474,7 +2474,8 @@ void snd_pcm_release_substream(struct sn
 	snd_pcm_drop(substream);
 	if (substream->hw_opened) {
-		do_hw_free(substream);
+		if (substream->runtime->status->state != SNDRV_PCM_STATE_OPEN)
+			do_hw_free(substream);
 		substream->ops->close(substream);
 		substream->hw_opened = 0;
 	}
From: Kailang Yang kailang@realtek.com
commit 2b3b6497c38d123934de68ea82a247b557d95290 upstream.
Add supported Headset Button for ALC215/ALC285/ALC289.
Signed-off-by: Kailang Yang kailang@realtek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/948f70b4488f4cc2b629a39ce4e4be33@realtek.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -5701,8 +5701,11 @@ static void alc_fixup_headset_jack(struc
 		break;
 	case HDA_FIXUP_ACT_INIT:
 		switch (codec->core.vendor_id) {
+		case 0x10ec0215:
 		case 0x10ec0225:
+		case 0x10ec0285:
 		case 0x10ec0295:
+		case 0x10ec0289:
 		case 0x10ec0299:
 			alc_write_coef_idx(codec, 0x48, 0xd011);
 			alc_update_coef_idx(codec, 0x49, 0x007f, 0x0045);
From: Takashi Iwai tiwai@suse.de
commit 7dafba3762d6c0083ded00a48f8c1a158bc86717 upstream.
The MSI-GL73 laptop with ALC1220 codec requires a workaround similar to the one for Clevo laptops, to enforce the DAC/mixer connection path. Set up a quirk entry for that.
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=204159 Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200212081047.27727-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -2447,6 +2447,7 @@ static const struct snd_pci_quirk alc882
 	SND_PCI_QUIRK(0x1071, 0x8258, "Evesham Voyaeger", ALC882_FIXUP_EAPD),
 	SND_PCI_QUIRK(0x1458, 0xa002, "Gigabyte EP45-DS3/Z87X-UD3H", ALC889_FIXUP_FRONT_HP_NO_PRESENCE),
 	SND_PCI_QUIRK(0x1458, 0xa0b8, "Gigabyte AZ370-Gaming", ALC1220_FIXUP_GB_DUAL_CODECS),
+	SND_PCI_QUIRK(0x1462, 0x1276, "MSI-GL73", ALC1220_FIXUP_CLEVO_P950),
 	SND_PCI_QUIRK(0x1462, 0x7350, "MSI-7350", ALC889_FIXUP_CD),
 	SND_PCI_QUIRK(0x1462, 0xda57, "MSI Z270-Gaming", ALC1220_FIXUP_GB_DUAL_CODECS),
 	SND_PCI_QUIRK_VENDOR(0x1462, "MSI", ALC882_FIXUP_GPIO3),
From: Arvind Sankar nivedita@alum.mit.edu
commit 93f9d1a4ac5930654c17412e3911b46ece73755a upstream.
The Audioengine D1 (0x2912:0x30c8) does support reading the sample rate, but it returns the rate in byte-reversed order.
When setting sampling rate, the driver produces these warning messages:

[168840.944226] usb 3-2.2: current rate 4500480 is different from the runtime rate 44100
[168854.930414] usb 3-2.2: current rate 8436480 is different from the runtime rate 48000
[168905.185825] usb 3-2.1.2: current rate 30465 is different from the runtime rate 96000
As can be seen from the hexadecimal conversion, the current rate read back is byte-reversed from the rate that was set.
44100 == 0x00ac44, 4500480 == 0x44ac00
48000 == 0x00bb80, 8436480 == 0x80bb00
96000 == 0x017700,   30465 == 0x007701
Rather than implementing a new quirk to reverse the order, just skip checking the rate to avoid spamming the log.
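[ Illustration only, not part of the patch: a small user-space program that reproduces the byte-reversed values shown above, assuming the rate is carried as a 3-byte little-endian quantity. ]

  #include <stdint.h>
  #include <stdio.h>

  /* Reverse the three bytes of a 24-bit sample-rate value. */
  static uint32_t swap24(uint32_t rate)
  {
          return ((rate & 0x0000ffu) << 16) |
                  (rate & 0x00ff00u) |
                 ((rate & 0xff0000u) >> 16);
  }

  int main(void)
  {
          const uint32_t rates[] = { 44100, 48000, 96000 };

          for (unsigned int i = 0; i < 3; i++)
                  printf("set %6u -> device reports %7u\n",
                         rates[i], swap24(rates[i]));
          return 0;
  }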
Signed-off-by: Arvind Sankar nivedita@alum.mit.edu Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20200211162235.1639889-1-nivedita@alum.mit.edu Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- sound/usb/quirks.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -1402,6 +1402,7 @@ bool snd_usb_get_sample_rate_quirk(struc
 	case USB_ID(0x1395, 0x740a): /* Sennheiser DECT */
 	case USB_ID(0x1901, 0x0191): /* GE B850V3 CP2114 audio interface */
 	case USB_ID(0x21B4, 0x0081): /* AudioQuest DragonFly */
+	case USB_ID(0x2912, 0x30c8): /* Audioengine D1 */
 		return true;
 	}
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit f0ac20c3f6137910c8a927953e8a92f5b3716166 upstream.
Commit 016b87ca5c8c ("ACPI: EC: Rework flushing of pending work") introduced a subtle bug into the flushing of pending EC work while suspended to idle, which may cause the EC driver to fail to re-enable the EC GPE after handling a non-wakeup event (like a battery status change event, for example).
The problem is that the work item flushed by flush_scheduled_work() in __acpi_ec_flush_work() may disable the EC GPE and schedule another work item expected to re-enable it, but that new work item is not flushed, so __acpi_ec_flush_work() returns with the EC GPE disabled and the CPU running it goes into an idle state subsequently. If all of the other CPUs are in idle states at that point, the EC GPE won't be re-enabled until at least one CPU is woken up by another interrupt source, so system wakeup events that would normally come from the EC then don't work.
This is reproducible on a Dell XPS13 9360 in my office which sometimes stops reacting to power button and lid events (triggered by the EC on that machine) after switching from AC power to battery power or vice versa while suspended to idle (each of those switches causes the EC GPE to trigger for several times in a row, but they are not system wakeup events).
To avoid this problem, it is necessary to drain the workqueue entirely in __acpi_ec_flush_work(), but that cannot be done with respect to system_wq, because work items may be added to it from other places while __acpi_ec_flush_work() is running. For this reason, make the EC driver use a dedicated workqueue for EC events processing (let that workqueue be ordered so that EC events are processed sequentially) and use drain_workqueue() on it in __acpi_ec_flush_work().
Fixes: 016b87ca5c8c ("ACPI: EC: Rework flushing of pending work") Cc: 5.4+ stable@vger.kernel.org # 5.4+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/ec.c | 44 ++++++++++++++++++++++++++------------------ 1 file changed, 26 insertions(+), 18 deletions(-)
--- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -179,6 +179,7 @@ EXPORT_SYMBOL(first_ec);
static struct acpi_ec *boot_ec; static bool boot_ec_is_ecdt = false; +static struct workqueue_struct *ec_wq; static struct workqueue_struct *ec_query_wq;
static int EC_FLAGS_QUERY_HANDSHAKE; /* Needs QR_EC issued when SCI_EVT set */ @@ -469,7 +470,7 @@ static void acpi_ec_submit_query(struct ec_dbg_evt("Command(%s) submitted/blocked", acpi_ec_cmd_string(ACPI_EC_COMMAND_QUERY)); ec->nr_pending_queries++; - schedule_work(&ec->work); + queue_work(ec_wq, &ec->work); } }
@@ -535,7 +536,7 @@ static void acpi_ec_enable_event(struct #ifdef CONFIG_PM_SLEEP static void __acpi_ec_flush_work(void) { - flush_scheduled_work(); /* flush ec->work */ + drain_workqueue(ec_wq); /* flush ec->work */ flush_workqueue(ec_query_wq); /* flush queries */ }
@@ -556,8 +557,8 @@ static void acpi_ec_disable_event(struct
void acpi_ec_flush_work(void) { - /* Without ec_query_wq there is nothing to flush. */ - if (!ec_query_wq) + /* Without ec_wq there is nothing to flush. */ + if (!ec_wq) return;
__acpi_ec_flush_work(); @@ -2115,25 +2116,33 @@ static struct acpi_driver acpi_ec_driver .drv.pm = &acpi_ec_pm, };
-static inline int acpi_ec_query_init(void) +static void acpi_ec_destroy_workqueues(void) { - if (!ec_query_wq) { - ec_query_wq = alloc_workqueue("kec_query", 0, - ec_max_queries); - if (!ec_query_wq) - return -ENODEV; + if (ec_wq) { + destroy_workqueue(ec_wq); + ec_wq = NULL; } - return 0; -} - -static inline void acpi_ec_query_exit(void) -{ if (ec_query_wq) { destroy_workqueue(ec_query_wq); ec_query_wq = NULL; } }
+static int acpi_ec_init_workqueues(void) +{ + if (!ec_wq) + ec_wq = alloc_ordered_workqueue("kec", 0); + + if (!ec_query_wq) + ec_query_wq = alloc_workqueue("kec_query", 0, ec_max_queries); + + if (!ec_wq || !ec_query_wq) { + acpi_ec_destroy_workqueues(); + return -ENODEV; + } + return 0; +} + static const struct dmi_system_id acpi_ec_no_wakeup[] = { { .ident = "Thinkpad X1 Carbon 6th", @@ -2164,8 +2173,7 @@ int __init acpi_ec_init(void) int result; int ecdt_fail, dsdt_fail;
- /* register workqueue for _Qxx evaluations */ - result = acpi_ec_query_init(); + result = acpi_ec_init_workqueues(); if (result) return result;
@@ -2196,6 +2204,6 @@ static void __exit acpi_ec_exit(void) {
acpi_bus_unregister_driver(&acpi_ec_driver); - acpi_ec_query_exit(); + acpi_ec_destroy_workqueues(); } #endif /* 0 */
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit e3728b50cd9be7d4b1469447cdf1feb93e3b7adb upstream.
It is theoretically possible for the ACPI EC GPE to be set after the s2idle_ops->wake() called from s2idle_loop() has returned and before the subsequent pm_wakeup_pending() check is carried out. If that happens, the resulting wakeup event will cause the system to resume even though it may be a spurious one.
To avoid that race, first make the ->wake() callback in struct platform_s2idle_ops return a bool value indicating whether or not to let the system resume, and rearrange s2idle_loop() to use that value instead of the direct pm_wakeup_pending() call if ->wake() is present.
Next, rework acpi_s2idle_wake() to process EC events and check pm_wakeup_pending() before re-arming the SCI for system wakeup to prevent it from triggering prematurely and add comments to that function to explain the rationale for the new code flow.
Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow") Cc: 5.4+ stable@vger.kernel.org # 5.4+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/sleep.c | 46 ++++++++++++++++++++++++++++++++-------------- include/linux/suspend.h | 2 +- kernel/power/suspend.c | 9 +++++---- 3 files changed, 38 insertions(+), 19 deletions(-)
--- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -987,21 +987,28 @@ static void acpi_s2idle_sync(void) acpi_os_wait_events_complete(); /* synchronize Notify handling */ }
-static void acpi_s2idle_wake(void) +static bool acpi_s2idle_wake(void) { - /* - * If IRQD_WAKEUP_ARMED is set for the SCI at this point, the SCI has - * not triggered while suspended, so bail out. - */ - if (!acpi_sci_irq_valid() || - irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq))) - return; - - /* - * If there are EC events to process, the wakeup may be a spurious one - * coming from the EC. - */ - if (acpi_ec_dispatch_gpe()) { + if (!acpi_sci_irq_valid()) + return pm_wakeup_pending(); + + while (pm_wakeup_pending()) { + /* + * If IRQD_WAKEUP_ARMED is set for the SCI at this point, the + * SCI has not triggered while suspended, so bail out (the + * wakeup is pending anyway and the SCI is not the source of + * it). + */ + if (irqd_is_wakeup_armed(irq_get_irq_data(acpi_sci_irq))) + return true; + + /* + * If there are no EC events to process, the wakeup is regarded + * as a genuine one. + */ + if (!acpi_ec_dispatch_gpe()) + return true; + /* * Cancel the wakeup and process all pending events in case * there are any wakeup ones in there. @@ -1014,8 +1021,19 @@ static void acpi_s2idle_wake(void)
acpi_s2idle_sync();
+ /* + * The SCI is in the "suspended" state now and it cannot produce + * new wakeup events till the rearming below, so if any of them + * are pending here, they must be resulting from the processing + * of EC events above or coming from somewhere else. + */ + if (pm_wakeup_pending()) + return true; + rearm_wake_irq(acpi_sci_irq); } + + return false; }
static void acpi_s2idle_restore_early(void) --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -191,7 +191,7 @@ struct platform_s2idle_ops { int (*begin)(void); int (*prepare)(void); int (*prepare_late)(void); - void (*wake)(void); + bool (*wake)(void); void (*restore_early)(void); void (*restore)(void); void (*end)(void); --- a/kernel/power/suspend.c +++ b/kernel/power/suspend.c @@ -131,11 +131,12 @@ static void s2idle_loop(void) * to avoid them upfront. */ for (;;) { - if (s2idle_ops && s2idle_ops->wake) - s2idle_ops->wake(); - - if (pm_wakeup_pending()) + if (s2idle_ops && s2idle_ops->wake) { + if (s2idle_ops->wake()) + break; + } else if (pm_wakeup_pending()) { break; + }
pm_wakeup_clear(false);
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit ea128834dd76f9a72a35d011c651fa96658f06a7 upstream.
Introduce a new helper function, acpi_any_gpe_status_set(), for checking the status bits of all enabled GPEs in one go.
It is needed to distinguish spurious SCIs from genuine ones when deciding whether or not to wake up the system from suspend-to-idle.
Cc: 5.4+ stable@vger.kernel.org # 5.4+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/acpica/achware.h | 2 + drivers/acpi/acpica/evxfgpe.c | 32 ++++++++++++++++++ drivers/acpi/acpica/hwgpe.c | 71 ++++++++++++++++++++++++++++++++++++++++++ include/acpi/acpixf.h | 1 4 files changed, 106 insertions(+)
--- a/drivers/acpi/acpica/achware.h +++ b/drivers/acpi/acpica/achware.h @@ -101,6 +101,8 @@ acpi_status acpi_hw_enable_all_runtime_g
acpi_status acpi_hw_enable_all_wakeup_gpes(void);
+u8 acpi_hw_check_all_gpes(void); + acpi_status acpi_hw_enable_runtime_gpe_block(struct acpi_gpe_xrupt_info *gpe_xrupt_info, struct acpi_gpe_block_info *gpe_block, --- a/drivers/acpi/acpica/evxfgpe.c +++ b/drivers/acpi/acpica/evxfgpe.c @@ -795,6 +795,38 @@ acpi_status acpi_enable_all_wakeup_gpes(
ACPI_EXPORT_SYMBOL(acpi_enable_all_wakeup_gpes)
+/****************************************************************************** + * + * FUNCTION: acpi_any_gpe_status_set + * + * PARAMETERS: None + * + * RETURN: Whether or not the status bit is set for any GPE + * + * DESCRIPTION: Check the status bits of all enabled GPEs and return TRUE if any + * of them is set or FALSE otherwise. + * + ******************************************************************************/ +u32 acpi_any_gpe_status_set(void) +{ + acpi_status status; + u8 ret; + + ACPI_FUNCTION_TRACE(acpi_any_gpe_status_set); + + status = acpi_ut_acquire_mutex(ACPI_MTX_EVENTS); + if (ACPI_FAILURE(status)) { + return (FALSE); + } + + ret = acpi_hw_check_all_gpes(); + (void)acpi_ut_release_mutex(ACPI_MTX_EVENTS); + + return (ret); +} + +ACPI_EXPORT_SYMBOL(acpi_any_gpe_status_set) + /******************************************************************************* * * FUNCTION: acpi_install_gpe_block --- a/drivers/acpi/acpica/hwgpe.c +++ b/drivers/acpi/acpica/hwgpe.c @@ -446,6 +446,53 @@ acpi_hw_enable_wakeup_gpe_block(struct a
/****************************************************************************** * + * FUNCTION: acpi_hw_get_gpe_block_status + * + * PARAMETERS: gpe_xrupt_info - GPE Interrupt info + * gpe_block - Gpe Block info + * + * RETURN: Success + * + * DESCRIPTION: Produce a combined GPE status bits mask for the given block. + * + ******************************************************************************/ + +static acpi_status +acpi_hw_get_gpe_block_status(struct acpi_gpe_xrupt_info *gpe_xrupt_info, + struct acpi_gpe_block_info *gpe_block, + void *ret_ptr) +{ + struct acpi_gpe_register_info *gpe_register_info; + u64 in_enable, in_status; + acpi_status status; + u8 *ret = ret_ptr; + u32 i; + + /* Examine each GPE Register within the block */ + + for (i = 0; i < gpe_block->register_count; i++) { + gpe_register_info = &gpe_block->register_info[i]; + + status = acpi_hw_read(&in_enable, + &gpe_register_info->enable_address); + if (ACPI_FAILURE(status)) { + continue; + } + + status = acpi_hw_read(&in_status, + &gpe_register_info->status_address); + if (ACPI_FAILURE(status)) { + continue; + } + + *ret |= in_enable & in_status; + } + + return (AE_OK); +} + +/****************************************************************************** + * * FUNCTION: acpi_hw_disable_all_gpes * * PARAMETERS: None @@ -510,4 +557,28 @@ acpi_status acpi_hw_enable_all_wakeup_gp return_ACPI_STATUS(status); }
+/****************************************************************************** + * + * FUNCTION: acpi_hw_check_all_gpes + * + * PARAMETERS: None + * + * RETURN: Combined status of all GPEs + * + * DESCRIPTION: Check all enabled GPEs in all GPE blocks and return TRUE if the + * status bit is set for at least one of them of FALSE otherwise. + * + ******************************************************************************/ + +u8 acpi_hw_check_all_gpes(void) +{ + u8 ret = 0; + + ACPI_FUNCTION_TRACE(acpi_hw_check_all_gpes); + + (void)acpi_ev_walk_gpe_list(acpi_hw_get_gpe_block_status, &ret); + + return (ret != 0); +} + #endif /* !ACPI_REDUCED_HARDWARE */ --- a/include/acpi/acpixf.h +++ b/include/acpi/acpixf.h @@ -752,6 +752,7 @@ ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_disable_all_gpes(void)) ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_runtime_gpes(void)) ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_enable_all_wakeup_gpes(void)) +ACPI_HW_DEPENDENT_RETURN_UINT32(u32 acpi_any_gpe_status_set(void))
ACPI_HW_DEPENDENT_RETURN_STATUS(acpi_status acpi_get_gpe_device(u32 gpe_index,
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit fdde0ff8590b4c1c41b3227f5ac4265fccccb96b upstream.
If the platform triggers a spurious SCI even though the status bit is not set for any GPE when the system is suspended to idle, it will be treated as a genuine wakeup, so avoid that by checking if any GPEs are active at all before returning 'true' from acpi_s2idle_wake().
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206413 Fixes: 56b991849009 ("PM: sleep: Simplify suspend-to-idle control flow") Reported-by: Tsuchiya Yuto kitakar@gmail.com Cc: 5.4+ stable@vger.kernel.org # 5.4+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/acpi/sleep.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/acpi/sleep.c +++ b/drivers/acpi/sleep.c @@ -1003,10 +1003,16 @@ static bool acpi_s2idle_wake(void) return true;
/* - * If there are no EC events to process, the wakeup is regarded - * as a genuine one. + * If there are no EC events to process and at least one of the + * other enabled GPEs is active, the wakeup is regarded as a + * genuine one. + * + * Note that the checks below must be carried out in this order + * to avoid returning prematurely due to a change of the EC GPE + * status bit from unset to set between the checks with the + * status bits of all the other GPEs unset. */ - if (!acpi_ec_dispatch_gpe()) + if (acpi_any_gpe_status_set() && !acpi_ec_dispatch_gpe()) return true;
/*
From: Andreas Dilger adilger@dilger.ca
commit 14c9ca0583eee8df285d68a0e6ec71053efd2228 upstream.
Don't assume that the mmp_nodename and mmp_bdevname strings are NUL terminated, since they are filled in by snprintf(), which is not guaranteed to do so.
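[ Illustration only, not part of the patch: a user-space sketch of the precision-limited "%.*s" format that the fix relies on; the nodename buffer here is hypothetical. ]

  #include <stdio.h>
  #include <string.h>

  int main(void)
  {
          char nodename[8];

          /* Fill the buffer completely; no room is left for a trailing NUL. */
          memcpy(nodename, "12345678", sizeof(nodename));

          /*
           * "%.*s" prints at most sizeof(nodename) bytes, so it is safe even
           * though the buffer is not NUL-terminated.
           */
          printf("last update node: %.*s\n", (int)sizeof(nodename), nodename);
          return 0;
  }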
Link: https://lore.kernel.org/r/1580076215-1048-1-git-send-email-adilger@dilger.ca Signed-off-by: Andreas Dilger adilger@dilger.ca Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/mmp.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/fs/ext4/mmp.c +++ b/fs/ext4/mmp.c @@ -120,10 +120,10 @@ void __dump_mmp_msg(struct super_block * { __ext4_warning(sb, function, line, "%s", msg); __ext4_warning(sb, function, line, - "MMP failure info: last update time: %llu, last update " - "node: %s, last update device: %s", - (long long unsigned int) le64_to_cpu(mmp->mmp_time), - mmp->mmp_nodename, mmp->mmp_bdevname); + "MMP failure info: last update time: %llu, last update node: %.*s, last update device: %.*s", + (unsigned long long)le64_to_cpu(mmp->mmp_time), + (int)sizeof(mmp->mmp_nodename), mmp->mmp_nodename, + (int)sizeof(mmp->mmp_bdevname), mmp->mmp_bdevname); }
/* @@ -154,6 +154,7 @@ static int kmmpd(void *data) mmp_check_interval = max(EXT4_MMP_CHECK_MULT * mmp_update_interval, EXT4_MMP_MIN_CHECK_INTERVAL); mmp->mmp_check_interval = cpu_to_le16(mmp_check_interval); + BUILD_BUG_ON(sizeof(mmp->mmp_bdevname) < BDEVNAME_SIZE); bdevname(bh->b_bdev, mmp->mmp_bdevname);
memcpy(mmp->mmp_nodename, init_utsname()->nodename, @@ -375,7 +376,8 @@ skip: /* * Start a kernel thread to update the MMP block periodically. */ - EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, mmpd_data, "kmmpd-%s", + EXT4_SB(sb)->s_mmp_tsk = kthread_run(kmmpd, mmpd_data, "kmmpd-%.*s", + (int)sizeof(mmp->mmp_bdevname), bdevname(bh->b_bdev, mmp->mmp_bdevname)); if (IS_ERR(EXT4_SB(sb)->s_mmp_tsk)) {
From: Theodore Ts'o tytso@mit.edu
commit 4f97a68192bd33b9963b400759cef0ca5963af00 upstream.
A recent commit, 9803387c55f7 ("ext4: validate the debug_want_extra_isize mount option at parse time"), moved mount-time checks around. One of those changes moved the inode size check before the blocksize variable was set to the blocksize of the file system, so after 9803387c55f7 the inode size was effectively checked against the minimum allowable blocksize, which in practice on most systems would be 1024 bytes. This caused file systems with inode sizes larger than 1024 bytes to be rejected with a message:
EXT4-fs (sdXX): unsupported inode size: 4096
Fixes: 9803387c55f7 ("ext4: validate the debug_want_extra_isize mount option at parse time") Link: https://lore.kernel.org/r/20200206225252.GA3673@mit.edu Reported-by: Herbert Poetzl herbert@13thfloor.at Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -3768,6 +3768,15 @@ static int ext4_fill_super(struct super_ */ sbi->s_li_wait_mult = EXT4_DEF_LI_WAIT_MULT;
+ blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size); + if (blocksize < EXT4_MIN_BLOCK_SIZE || + blocksize > EXT4_MAX_BLOCK_SIZE) { + ext4_msg(sb, KERN_ERR, + "Unsupported filesystem blocksize %d (%d log_block_size)", + blocksize, le32_to_cpu(es->s_log_block_size)); + goto failed_mount; + } + if (le32_to_cpu(es->s_rev_level) == EXT4_GOOD_OLD_REV) { sbi->s_inode_size = EXT4_GOOD_OLD_INODE_SIZE; sbi->s_first_ino = EXT4_GOOD_OLD_FIRST_INO; @@ -3785,6 +3794,7 @@ static int ext4_fill_super(struct super_ ext4_msg(sb, KERN_ERR, "unsupported inode size: %d", sbi->s_inode_size); + ext4_msg(sb, KERN_ERR, "blocksize: %d", blocksize); goto failed_mount; } /* @@ -3988,14 +3998,6 @@ static int ext4_fill_super(struct super_ if (!ext4_feature_set_ok(sb, (sb_rdonly(sb)))) goto failed_mount;
- blocksize = BLOCK_SIZE << le32_to_cpu(es->s_log_block_size); - if (blocksize < EXT4_MIN_BLOCK_SIZE || - blocksize > EXT4_MAX_BLOCK_SIZE) { - ext4_msg(sb, KERN_ERR, - "Unsupported filesystem blocksize %d (%d log_block_size)", - blocksize, le32_to_cpu(es->s_log_block_size)); - goto failed_mount; - } if (le32_to_cpu(es->s_log_block_size) > (EXT4_MAX_BLOCK_LOG_SIZE - EXT4_MIN_BLOCK_LOG_SIZE)) { ext4_msg(sb, KERN_ERR,
From: Jan Kara jack@suse.cz
commit 48a34311953d921235f4d7bbd2111690d2e469cf upstream.
DIR_INDEX has been introduced as a compat ext4 feature. That means that even kernels / tools that don't understand the feature may modify the filesystem. This works because for kernels not understanding indexed dir format, internal htree nodes appear just as empty directory entries. Index dir aware kernels then check the htree structure is still consistent before using the data. This all worked reasonably well until metadata checksums were introduced. The problem is that these effectively made DIR_INDEX only ro-compatible because internal htree nodes store checksums in a different place than normal directory blocks. Thus any modification ignorant to DIR_INDEX (or just clearing EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch and trigger kernel errors. So we have to be more careful when dealing with indexed directories on filesystems with checksumming enabled.
1) We just disallow loading any directory inodes with EXT4_INDEX_FL when DIR_INDEX is not enabled. This is harsh, but it should be very rare (it means someone disabled DIR_INDEX on an existing filesystem and didn't run e2fsck), e2fsck can fix the problem, and we don't want to answer the difficult question: "Should we rather corrupt the directory more or should we ignore that DIR_INDEX feature is not set?"

2) When we find out the htree structure is corrupted (but the filesystem and the directory should support htrees), we keep ignoring the htree information for reading, but we refuse to add new entries to the directory to avoid corrupting it further.
Link: https://lore.kernel.org/r/20200210144316.22081-1-jack@suse.cz Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes") Reviewed-by: Andreas Dilger adilger@dilger.ca Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/dir.c | 14 ++++++++------ fs/ext4/ext4.h | 5 ++++- fs/ext4/inode.c | 12 ++++++++++++ fs/ext4/namei.c | 7 +++++++ 4 files changed, 31 insertions(+), 7 deletions(-)
--- a/fs/ext4/dir.c +++ b/fs/ext4/dir.c @@ -129,12 +129,14 @@ static int ext4_readdir(struct file *fil if (err != ERR_BAD_DX_DIR) { return err; } - /* - * We don't set the inode dirty flag since it's not - * critical that it get flushed back to the disk. - */ - ext4_clear_inode_flag(file_inode(file), - EXT4_INODE_INDEX); + /* Can we just clear INDEX flag to ignore htree information? */ + if (!ext4_has_metadata_csum(sb)) { + /* + * We don't set the inode dirty flag since it's not + * critical that it gets flushed back to the disk. + */ + ext4_clear_inode_flag(inode, EXT4_INODE_INDEX); + } }
if (ext4_has_inline_data(inode)) { --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2482,8 +2482,11 @@ void ext4_insert_dentry(struct inode *in struct ext4_filename *fname); static inline void ext4_update_dx_flag(struct inode *inode) { - if (!ext4_has_feature_dir_index(inode->i_sb)) + if (!ext4_has_feature_dir_index(inode->i_sb)) { + /* ext4_iget() should have caught this... */ + WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb)); ext4_clear_inode_flag(inode, EXT4_INODE_INDEX); + } } static const unsigned char ext4_filetype_table[] = { DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4615,6 +4615,18 @@ struct inode *__ext4_iget(struct super_b ret = -EFSCORRUPTED; goto bad_inode; } + /* + * If dir_index is not enabled but there's dir with INDEX flag set, + * we'd normally treat htree data as empty space. But with metadata + * checksumming that corrupts checksums so forbid that. + */ + if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) && + ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) { + ext4_error_inode(inode, function, line, 0, + "iget: Dir with htree data on filesystem without dir_index feature."); + ret = -EFSCORRUPTED; + goto bad_inode; + } ei->i_disksize = inode->i_size; #ifdef CONFIG_QUOTA ei->i_reserved_quota = 0; --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -2207,6 +2207,13 @@ static int ext4_add_entry(handle_t *hand retval = ext4_dx_add_entry(handle, &fname, dir, inode); if (!retval || (retval != ERR_BAD_DX_DIR)) goto out; + /* Can we just ignore htree data? */ + if (ext4_has_metadata_csum(sb)) { + EXT4_ERROR_INODE(dir, + "Directory has corrupted htree index."); + retval = -EFSCORRUPTED; + goto out; + } ext4_clear_inode_flag(dir, EXT4_INODE_INDEX); dx_fallback++; ext4_mark_inode_dirty(handle, dir);
From: Shijie Luo luoshijie1@huawei.com
commit af133ade9a40794a37104ecbcc2827c0ea373a3c upstream.
When the journal size is set too big with "mkfs.ext4 -J size=", or when we mount a crafted image whose journal inode->i_size is too big, the "while (i < num)" loop holds the CPU for too long. This can cause a soft lockup.
[ 529.357541] Call trace:
[ 529.357551] dump_backtrace+0x0/0x198
[ 529.357555] show_stack+0x24/0x30
[ 529.357562] dump_stack+0xa4/0xcc
[ 529.357568] watchdog_timer_fn+0x300/0x3e8
[ 529.357574] __hrtimer_run_queues+0x114/0x358
[ 529.357576] hrtimer_interrupt+0x104/0x2d8
[ 529.357580] arch_timer_handler_virt+0x38/0x58
[ 529.357584] handle_percpu_devid_irq+0x90/0x248
[ 529.357588] generic_handle_irq+0x34/0x50
[ 529.357590] __handle_domain_irq+0x68/0xc0
[ 529.357593] gic_handle_irq+0x6c/0x150
[ 529.357595] el1_irq+0xb8/0x140
[ 529.357599] __ll_sc_atomic_add_return_acquire+0x14/0x20
[ 529.357668] ext4_map_blocks+0x64/0x5c0 [ext4]
[ 529.357693] ext4_setup_system_zone+0x330/0x458 [ext4]
[ 529.357717] ext4_fill_super+0x2170/0x2ba8 [ext4]
[ 529.357722] mount_bdev+0x1a8/0x1e8
[ 529.357746] ext4_mount+0x44/0x58 [ext4]
[ 529.357748] mount_fs+0x50/0x170
[ 529.357752] vfs_kern_mount.part.9+0x54/0x188
[ 529.357755] do_mount+0x5ac/0xd78
[ 529.357758] ksys_mount+0x9c/0x118
[ 529.357760] __arm64_sys_mount+0x28/0x38
[ 529.357764] el0_svc_common+0x78/0x130
[ 529.357766] el0_svc_handler+0x38/0x78
[ 529.357769] el0_svc+0x8/0xc
[ 541.356516] watchdog: BUG: soft lockup - CPU#0 stuck for 23s! [mount:18674]
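The usual remedy for this class of soft lockup is to give the scheduler a chance to run on every pass through a loop whose bound comes from (possibly crafted) on-disk metadata, which is what the one-line fix below does. A minimal sketch of the pattern, with process_blocks() as a hypothetical stand-in for the per-iteration work:

    /* Sketch only: yield the CPU inside a potentially huge loop so the
     * soft-lockup watchdog does not fire on crafted or oversized input.
     */
    static void walk_all_blocks(unsigned int num)
    {
            unsigned int i = 0;
            int n;

            while (i < num) {
                    cond_resched();                 /* let other tasks run */
                    n = process_blocks(i, num - i); /* hypothetical helper */
                    if (n <= 0)
                            break;
                    i += n;
            }
    }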
Link: https://lore.kernel.org/r/20200211011752.29242-1-luoshijie1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Shijie Luo luoshijie1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/block_validity.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/ext4/block_validity.c +++ b/fs/ext4/block_validity.c @@ -207,6 +207,7 @@ static int ext4_protect_reserved_inode(s return PTR_ERR(inode); num = (inode->i_size + sb->s_blocksize - 1) >> sb->s_blocksize_bits; while (i < num) { + cond_resched(); map.m_lblk = i; map.m_len = num - i; n = ext4_map_blocks(NULL, inode, &map, 0);
From: Theodore Ts'o tytso@mit.edu
commit d65d87a07476aa17df2dcb3ad18c22c154315bec upstream.
If CONFIG_QFMT_V2 is not enabled, but CONFIG_QUOTA is enabled, when a user tries to mount a file system with the quota or project quota enabled, the kernel will emit a very confusing message:
EXT4-fs warning (device vdc): ext4_enable_quotas:5914: Failed to enable quota tracking (type=0, err=-3). Please run e2fsck to fix.
EXT4-fs (vdc): mount failed
We will now report an explanatory message indicating which kernel configuration options have to be enabled, to avoid customer/sysadmin confusion.
Link: https://lore.kernel.org/r/20200215012738.565735-1-tytso@mit.edu Google-Bug-Id: 149093531 Fixes: 7c319d328505b778 ("ext4: make quota as first class supported feature") Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ext4/super.c | 14 ++++---------- 1 file changed, 4 insertions(+), 10 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -2964,17 +2964,11 @@ static int ext4_feature_set_ok(struct su return 0; }
-#ifndef CONFIG_QUOTA - if (ext4_has_feature_quota(sb) && !readonly) { +#if !defined(CONFIG_QUOTA) || !defined(CONFIG_QFMT_V2) + if (!readonly && (ext4_has_feature_quota(sb) || + ext4_has_feature_project(sb))) { ext4_msg(sb, KERN_ERR, - "Filesystem with quota feature cannot be mounted RDWR " - "without CONFIG_QUOTA"); - return 0; - } - if (ext4_has_feature_project(sb) && !readonly) { - ext4_msg(sb, KERN_ERR, - "Filesystem with project quota feature cannot be mounted RDWR " - "without CONFIG_QUOTA"); + "The kernel was not built with CONFIG_QUOTA and CONFIG_QFMT_V2"); return 0; } #endif /* CONFIG_QUOTA */
From: Filipe Manana fdmanana@suse.com
commit ac05ca913e9f3871126d61da275bfe8516ff01ca upstream.
We have a few cases where we allow an extent map that is in an extent map tree to be merged with other extents in the tree. Such cases include the unpinning of an extent after the respective ordered extent completed or after logging an extent during a fast fsync. This can lead to subtle and dangerous problems because when doing the merge some other task might be using the same extent map and, as a consequence, see an inconsistent state of it - for example, it sees the new length but still the old start offset.
With luck this triggers a BUG_ON(), and not some silent bug, such as the following one in __do_readpage():
$ cat -n fs/btrfs/extent_io.c
  3061  static int __do_readpage(struct extent_io_tree *tree,
  3062                           struct page *page,
(...)
  3127          em = __get_extent_map(inode, page, pg_offset, cur,
  3128                                end - cur + 1, get_extent, em_cached);
  3129          if (IS_ERR_OR_NULL(em)) {
  3130                  SetPageError(page);
  3131                  unlock_extent(tree, cur, end);
  3132                  break;
  3133          }
  3134          extent_offset = cur - em->start;
  3135          BUG_ON(extent_map_end(em) <= cur);
(...)
Consider the following example scenario, where we end up hitting the BUG_ON() in __do_readpage().
We have an inode with a size of 8KiB and 2 extent maps:
extent A: file offset 0, length 4KiB, disk_bytenr = X, persisted on disk by a previous transaction
extent B: file offset 4KiB, length 4KiB, disk_bytenr = X + 4KiB, not yet persisted but writeback started for it already. The extent map is pinned since there's writeback and an ordered extent in progress, so it can not be merged with extent map A yet
The following sequence of steps leads to the BUG_ON():
1) The ordered extent for extent B completes, the respective page gets its writeback bit cleared and the extent map is unpinned, at that point it is not yet merged with extent map A because it's in the list of modified extents;
2) Due to memory pressure, or some other reason, the MM subsystem releases the page corresponding to extent B - btrfs_releasepage() is called and returns 1, meaning the page can be released as it's not dirty, not under writeback anymore and the extent range is not locked in the inode's iotree. However the extent map is not released, either because we are not in a context that allows memory allocations to block or because the inode's size is smaller than 16MiB - in this case our inode has a size of 8KiB;
3) Task B needs to read extent B and ends up in __do_readpage() through the btrfs_readpage() callback. At __do_readpage() it gets a reference to extent map B;
4) Task A, doing a fast fsync, calls clear_em_logging() against extent map B while holding the write lock on the inode's extent map tree - this results in try_merge_map() being called and, since it is now possible to merge extent map B with extent map A (extent map B was removed from the list of modified extents), the merging begins - it sets extent map B's start offset to 0 (it was 4KiB), but before it increments the map's length to 8KiB (4KiB + 4KiB), task B is at:
BUG_ON(extent_map_end(em) <= cur);
The call to extent_map_end() sees the extent map has a start of 0 and a length still at 4KiB, so it returns 4KiB and 'cur' is 4KiB, so the BUG_ON() is triggered.
So it's dangerous to modify an extent map that is in the tree, because some other task might have got a reference to it before and still using it, and needs to see a consistent map while using it. Generally this is very rare since most paths that lookup and use extent maps also have the file range locked in the inode's iotree. The fsync path is pretty much the only exception where we don't do it to avoid serialization with concurrent reads.
Fix this by not allowing an extent map to be merged if it's being used by tasks other than the one attempting to merge the extent map (that is, when the reference count of the extent map is greater than 2).
Reported-by: ryusuke1925 st13s20@gm.ibaraki-ct.ac.jp Reported-by: Koki Mitani koki.mitani.xg@hco.ntt.co.jp Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=206211 CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik josef@toxicpanda.com Signed-off-by: Filipe Manana fdmanana@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/extent_map.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -237,6 +237,17 @@ static void try_merge_map(struct extent_ struct extent_map *merge = NULL; struct rb_node *rb;
+ /* + * We can't modify an extent map that is in the tree and that is being + * used by another task, as it can cause that other task to see it in + * inconsistent state during the merging. We always have 1 reference for + * the tree and 1 for this task (which is unpinning the extent map or + * clearing the logging flag), so anything > 2 means it's being used by + * other tasks too. + */ + if (refcount_read(&em->refs) > 2) + return; + if (em->start != 0) { rb = rb_prev(&em->rb_node); if (rb)
From: Wenwen Wang wenwen@cs.uga.edu
commit f311ade3a7adf31658ed882aaab9f9879fdccef7 upstream.
In btrfs_ref_tree_mod(), 'ref' and 'ra' are allocated through kzalloc() and kmalloc(), respectively. In the following code, if an error occurs, the execution will be redirected to 'out' or 'out_unlock' and the function will be exited. However, on some of the paths, 'ref' and 'ra' are not deallocated, leading to memory leaks. For example, if 'action' is BTRFS_ADD_DELAYED_EXTENT, add_block_entry() will be invoked. If the return value indicates an error, the execution will be redirected to 'out'. But, 'ref' is not deallocated on this path, causing a memory leak.
To fix the above issues, deallocate both 'ref' and 'ra' before exiting from the function when an error is encountered.
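For readers less familiar with this error-handling style, the shape of the bug and of the fix can be sketched as follows. This is a simplified illustration of the pattern, not the verbatim btrfs_ref_tree_mod() code (the actual patch, shown below, adds the missing kfree() calls on each early-exit path instead of introducing a common cleanup label):

    /* Sketch: both allocations must be released on every error path. */
    ref = kzalloc(sizeof(*ref), GFP_NOFS);
    ra = kmalloc(sizeof(*ra), GFP_NOFS);
    if (!ref || !ra)
            goto out_free;

    be = add_block_entry(fs_info, bytenr, num_bytes, ref_root);
    if (IS_ERR(be)) {
            ret = PTR_ERR(be);
            goto out_free;          /* previously leaked 'ref' here */
    }
    ...
    out_free:
            kfree(ref);             /* kfree(NULL) is a no-op, so this is safe */
            kfree(ra);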
CC: stable@vger.kernel.org # 4.15+ Signed-off-by: Wenwen Wang wenwen@cs.uga.edu Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/ref-verify.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -744,6 +744,7 @@ int btrfs_ref_tree_mod(struct btrfs_fs_i */ be = add_block_entry(fs_info, bytenr, num_bytes, ref_root); if (IS_ERR(be)) { + kfree(ref); kfree(ra); ret = PTR_ERR(be); goto out; @@ -757,6 +758,8 @@ int btrfs_ref_tree_mod(struct btrfs_fs_i "re-allocated a block that still has references to it!"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); + kfree(ref); + kfree(ra); goto out_unlock; }
@@ -819,6 +822,7 @@ int btrfs_ref_tree_mod(struct btrfs_fs_i "dropping a ref for a existing root that doesn't have a ref on the block"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); + kfree(ref); kfree(ra); goto out_unlock; } @@ -834,6 +838,7 @@ int btrfs_ref_tree_mod(struct btrfs_fs_i "attempting to add another ref for an existing ref on a tree block"); dump_block_entry(fs_info, be); dump_ref_action(fs_info, ra); + kfree(ref); kfree(ra); goto out_unlock; }
From: David Sterba dsterba@suse.com
commit e8294f2f6aa6208ed0923aa6d70cea3be178309a upstream.
There's no logged information about tree-log replay although this is something that points to previous unclean unmount. Other filesystems report that as well.
Suggested-by: Chris Murphy lists@colorremedies.com CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain anand.jain@oracle.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/disk-io.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3164,6 +3164,7 @@ int __cold open_ctree(struct super_block /* do not make disk changes in broken FS or nologreplay is given */ if (btrfs_super_log_root(disk_super) != 0 && !btrfs_test_opt(fs_info, NOLOGREPLAY)) { + btrfs_info(fs_info, "start tree-log replay"); ret = btrfs_replay_log(fs_info, fs_devices); if (ret) { err = ret;
From: David Sterba dsterba@suse.com
commit 10a3a3edc5b89a8cd095bc63495fb1e0f42047d9 upstream.
A remount to a read-write filesystem is not safe when there's tree-log to be replayed. Files that could be opened until now might be affected by the changes in the tree-log.
A regular mount is needed to replay the log so the filesystem presents the consistent view with the pending changes included.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Anand Jain anand.jain@oracle.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/btrfs/super.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1803,6 +1803,8 @@ static int btrfs_remount(struct super_bl }
if (btrfs_super_log_root(fs_info->super_copy) != 0) { + btrfs_warn(fs_info, + "mount required to replay tree-log, cannot remount read-write"); ret = -EINVAL; goto restore; }
From: Krzysztof Kozlowski krzk@kernel.org
commit e383e871ab54f073c2a798a9e0bde7f1d0528de8 upstream.
The CONFIG_ARCH_REQUIRE_GPIOLIB is gone since commit 65053e1a7743 ("gpio: delete ARCH_[WANTS_OPTIONAL|REQUIRE]_GPIOLIB") and all platforms should explicitly select GPIOLIB to have it.
Link: https://lore.kernel.org/r/20200130195525.4525-1-krzk@kernel.org Cc: stable@vger.kernel.org Fixes: 65053e1a7743 ("gpio: delete ARCH_[WANTS_OPTIONAL|REQUIRE]_GPIOLIB") Signed-off-by: Krzysztof Kozlowski krzk@kernel.org Signed-off-by: Olof Johansson olof@lixom.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm/mach-npcm/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/mach-npcm/Kconfig +++ b/arch/arm/mach-npcm/Kconfig @@ -11,7 +11,7 @@ config ARCH_NPCM7XX depends on ARCH_MULTI_V7 select PINCTRL_NPCM7XX select NPCM7XX_TIMER - select ARCH_REQUIRE_GPIOLIB + select GPIOLIB select CACHE_L2X0 select ARM_GIC select HAVE_ARM_TWD if SMP
From: Paul Thomas pthomas8589@gmail.com
commit c3afa804c58e5c30ac63858b527fffadc88bce82 upstream.
Care is taken with "index"; however, in the current version the actual xgpio_writereg() uses index for the data but xgpio_regoffset(chip, i) for the offset, and since i has already been incremented this is incorrect. This patch fixes it so that index is used for the offset too.
Cc: stable@vger.kernel.org Signed-off-by: Paul Thomas pthomas8589@gmail.com Link: https://lore.kernel.org/r/20200125221410.8022-1-pthomas8589@gmail.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpio/gpio-xilinx.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/gpio/gpio-xilinx.c +++ b/drivers/gpio/gpio-xilinx.c @@ -147,9 +147,10 @@ static void xgpio_set_multiple(struct gp for (i = 0; i < gc->ngpio; i++) { if (*mask == 0) break; + /* Once finished with an index write it out to the register */ if (index != xgpio_index(chip, i)) { xgpio_writereg(chip->regs + XGPIO_DATA_OFFSET + - xgpio_regoffset(chip, i), + index * XGPIO_CHANNEL_OFFSET, chip->gpio_state[index]); spin_unlock_irqrestore(&chip->gpio_lock[index], flags); index = xgpio_index(chip, i); @@ -165,7 +166,7 @@ static void xgpio_set_multiple(struct gp }
xgpio_writereg(chip->regs + XGPIO_DATA_OFFSET + - xgpio_regoffset(chip, i), chip->gpio_state[index]); + index * XGPIO_CHANNEL_OFFSET, chip->gpio_state[index]);
spin_unlock_irqrestore(&chip->gpio_lock[index], flags); }
From: Will Deacon will@kernel.org
commit fca3d33d8ad61eb53eca3ee4cac476d1e31b9008 upstream.
When all CPUs in the system implement the SSBS extension, the SSBS field in PSTATE is the definitive indication of the mitigation state. Further, when the CPUs implement the SSBS manipulation instructions (advertised to userspace via an HWCAP), EL0 can toggle the SSBS field directly and so we cannot rely on any shadow state such as TIF_SSBD at all.
Avoid forcing the SSBS field in context-switch on such a system, and simply rely on the PSTATE register instead.
Cc: stable@vger.kernel.org Cc: Catalin Marinas catalin.marinas@arm.com Cc: Srinivas Ramana sramana@codeaurora.org Fixes: cbdf8a189a66 ("arm64: Force SSBS on context switch") Reviewed-by: Marc Zyngier maz@kernel.org Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/kernel/process.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/arch/arm64/kernel/process.c +++ b/arch/arm64/kernel/process.c @@ -466,6 +466,13 @@ static void ssbs_thread_switch(struct ta if (unlikely(next->flags & PF_KTHREAD)) return;
+ /* + * If all CPUs implement the SSBS extension, then we just need to + * context-switch the PSTATE field. + */ + if (cpu_have_feature(cpu_feature(SSBS))) + return; + /* If the mitigation is enabled, then we leave SSBS clear. */ if ((arm64_get_ssbd_state() == ARM64_SSBD_FORCE_ENABLE) || test_tsk_thread_flag(next, TIF_SSBD))
From: Tejun Heo tj@kernel.org
commit 0cd9d33ace336bc424fc30944aa3defd6786e4fe upstream.
5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists() optimization") removed lazy initialization of css_sets so that new tasks are always lniked to its css_set. In the process, it incorrectly ended up adding init_tasks to root css_set. They show up as PID 0's in root's cgroup.procs triggering warnings in systemd and generally confusing people.
Fix it by skipping css_set linking for init_tasks.
Signed-off-by: Tejun Heo tj@kernel.org Reported-by: https://github.com/joanbm Link: https://github.com/systemd/systemd/issues/14682 Fixes: 5153faac18d2 ("cgroup: remove cgroup_enable_task_cg_lists() optimization") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/cgroup/cgroup.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/kernel/cgroup/cgroup.c +++ b/kernel/cgroup/cgroup.c @@ -5932,11 +5932,14 @@ void cgroup_post_fork(struct task_struct
spin_lock_irq(&css_set_lock);
- WARN_ON_ONCE(!list_empty(&child->cg_list)); - cset = task_css_set(current); /* current is @child's parent */ - get_css_set(cset); - cset->nr_tasks++; - css_set_move_task(child, NULL, cset, false); + /* init tasks are special, only link regular threads */ + if (likely(child->pid)) { + WARN_ON_ONCE(!list_empty(&child->cg_list)); + cset = task_css_set(current); /* current is @child's parent */ + get_css_set(cset); + cset->nr_tasks++; + css_set_move_task(child, NULL, cset, false); + }
/* * If the cgroup has to be frozen, the new task has too. Let's set
From: Chuck Lever chuck.lever@oracle.com
commit ca1c671302825182629d3c1a60363cee6f5455bb upstream.
The @nents value that was passed to ib_dma_map_sg() has to be passed to the matching ib_dma_unmap_sg() call. If ib_dma_map_sg() chooses to concatenate sg entries, it will return a different nents value than it was passed.
The bug was exposed by recent changes to the AMD IOMMU driver, which enabled sg entry concatenation.
Looking all the way back to commit 4143f34e01e9 ("xprtrdma: Port to new memory registration API") and reviewing other kernel ULPs, it's not clear that the frwr_map() logic was ever correct for this case.
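Put differently, two counts are in play: the nents given to ib_dma_map_sg() must later be handed to ib_dma_unmap_sg(), while the (possibly smaller) value it returns is what describes the mapped SG list from then on. A simplified sketch of the pairing, not the verbatim frwr_ops.c code:

    /* Sketch: keep both counts and use each one in the right place. */
    int sg_nents  = i;                                    /* entries we built */
    int dma_nents = ib_dma_map_sg(device, sg, sg_nents, dir);
    if (!dma_nents)
            goto out_err;

    n = ib_map_mr_sg(mr, sg, dma_nents, NULL, PAGE_SIZE); /* mapped count */
    ...
    ib_dma_unmap_sg(device, sg, sg_nents, dir);           /* original count */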
Reported-by: Andre Tomt andre@tomt.net Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Reviewed-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/sunrpc/xprtrdma/frwr_ops.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -298,8 +298,8 @@ struct rpcrdma_mr_seg *frwr_map(struct r { struct rpcrdma_ia *ia = &r_xprt->rx_ia; struct ib_reg_wr *reg_wr; + int i, n, dma_nents; struct ib_mr *ibmr; - int i, n; u8 key;
if (nsegs > ia->ri_max_frwr_depth) @@ -323,15 +323,16 @@ struct rpcrdma_mr_seg *frwr_map(struct r break; } mr->mr_dir = rpcrdma_data_dir(writing); + mr->mr_nents = i;
- mr->mr_nents = - ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, i, mr->mr_dir); - if (!mr->mr_nents) + dma_nents = ib_dma_map_sg(ia->ri_id->device, mr->mr_sg, mr->mr_nents, + mr->mr_dir); + if (!dma_nents) goto out_dmamap_err;
ibmr = mr->frwr.fr_mr; - n = ib_map_mr_sg(ibmr, mr->mr_sg, mr->mr_nents, NULL, PAGE_SIZE); - if (unlikely(n != mr->mr_nents)) + n = ib_map_mr_sg(ibmr, mr->mr_sg, dma_nents, NULL, PAGE_SIZE); + if (n != dma_nents) goto out_mapmr_err;
ibmr->iova &= 0x00000000ffffffff;
From: Ronnie Sahlberg lsahlber@redhat.com
commit 85db6b7ae65f33be4bb44f1c28261a3faa126437 upstream.
RHBZ: 1752437
Before we add a new EA we should check that this will not overflow the maximum buffer we have available to read the EAs back. Otherwise we can get into a situation where the EAs are so big that we cannot read them back to the client, and thus we can no longer list or delete them.
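Concretely, the client first queries how much EA data the file already carries and refuses the set request if the new name/value pair would no longer fit into the largest buffer that can be used to read the EAs back, keeping a small fudge factor for a racing set-EA request. A simplified sketch of the check added below (the constants come from the cifs headers; this is not the verbatim smb2_set_ea() code):

    /* Sketch: would the new EA push us past what we can read back later? */
    size_t max_readable = CIFSMaxBufSize -
                          MAX_SMB2_CREATE_RESPONSE_SIZE -
                          MAX_SMB2_CLOSE_RESPONSE_SIZE;

    if (used_len + ea_name_len + ea_value_len + 1 > max_readable - 256)
            return -ENOSPC;   /* refuse rather than create an unreadable EA set */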
Signed-off-by: Ronnie Sahlberg lsahlber@redhat.com Signed-off-by: Steve French stfrench@microsoft.com CC: Stable stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/smb2ops.c | 35 ++++++++++++++++++++++++++++++++++- 1 file changed, 34 insertions(+), 1 deletion(-)
--- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -1115,7 +1115,8 @@ smb2_set_ea(const unsigned int xid, stru void *data[1]; struct smb2_file_full_ea_info *ea = NULL; struct kvec close_iov[1]; - int rc; + struct smb2_query_info_rsp *rsp; + int rc, used_len = 0;
if (smb3_encryption_required(tcon)) flags |= CIFS_TRANSFORM_REQ; @@ -1138,6 +1139,38 @@ smb2_set_ea(const unsigned int xid, stru cifs_sb); if (rc == -ENODATA) goto sea_exit; + } else { + /* If we are adding a attribute we should first check + * if there will be enough space available to store + * the new EA. If not we should not add it since we + * would not be able to even read the EAs back. + */ + rc = smb2_query_info_compound(xid, tcon, utf16_path, + FILE_READ_EA, + FILE_FULL_EA_INFORMATION, + SMB2_O_INFO_FILE, + CIFSMaxBufSize - + MAX_SMB2_CREATE_RESPONSE_SIZE - + MAX_SMB2_CLOSE_RESPONSE_SIZE, + &rsp_iov[1], &resp_buftype[1], cifs_sb); + if (rc == 0) { + rsp = (struct smb2_query_info_rsp *)rsp_iov[1].iov_base; + used_len = le32_to_cpu(rsp->OutputBufferLength); + } + free_rsp_buf(resp_buftype[1], rsp_iov[1].iov_base); + resp_buftype[1] = CIFS_NO_BUFFER; + memset(&rsp_iov[1], 0, sizeof(rsp_iov[1])); + rc = 0; + + /* Use a fudge factor of 256 bytes in case we collide + * with a different set_EAs command. + */ + if(CIFSMaxBufSize - MAX_SMB2_CREATE_RESPONSE_SIZE - + MAX_SMB2_CLOSE_RESPONSE_SIZE - 256 < + used_len + ea_name_len + ea_value_len + 1) { + rc = -ENOSPC; + goto sea_exit; + } } }
From: zhangyi (F) yi.zhang@huawei.com
commit 6a66a7ded12baa6ebbb2e3e82f8cb91382814839 upstream.
There is no need to delay the clearing of b_modified flag to the transaction committing time when unmapping the journalled buffer, so just move it to the journal_unmap_buffer().
Link: https://lore.kernel.org/r/20200213063821.30455-2-yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/commit.c | 43 +++++++++++++++---------------------------- fs/jbd2/transaction.c | 10 ++++++---- 2 files changed, 21 insertions(+), 32 deletions(-)
--- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -976,34 +976,21 @@ restart_loop: * it. */
/* - * A buffer which has been freed while still being journaled by - * a previous transaction. - */ - if (buffer_freed(bh)) { - /* - * If the running transaction is the one containing - * "add to orphan" operation (b_next_transaction != - * NULL), we have to wait for that transaction to - * commit before we can really get rid of the buffer. - * So just clear b_modified to not confuse transaction - * credit accounting and refile the buffer to - * BJ_Forget of the running transaction. If the just - * committed transaction contains "add to orphan" - * operation, we can completely invalidate the buffer - * now. We are rather through in that since the - * buffer may be still accessible when blocksize < - * pagesize and it is attached to the last partial - * page. - */ - jh->b_modified = 0; - if (!jh->b_next_transaction) { - clear_buffer_freed(bh); - clear_buffer_jbddirty(bh); - clear_buffer_mapped(bh); - clear_buffer_new(bh); - clear_buffer_req(bh); - bh->b_bdev = NULL; - } + * A buffer which has been freed while still being journaled + * by a previous transaction, refile the buffer to BJ_Forget of + * the running transaction. If the just committed transaction + * contains "add to orphan" operation, we can completely + * invalidate the buffer now. We are rather through in that + * since the buffer may be still accessible when blocksize < + * pagesize and it is attached to the last partial page. + */ + if (buffer_freed(bh) && !jh->b_next_transaction) { + clear_buffer_freed(bh); + clear_buffer_jbddirty(bh); + clear_buffer_mapped(bh); + clear_buffer_new(bh); + clear_buffer_req(bh); + bh->b_bdev = NULL; }
if (buffer_jbddirty(bh)) { --- a/fs/jbd2/transaction.c +++ b/fs/jbd2/transaction.c @@ -2329,14 +2329,16 @@ static int journal_unmap_buffer(journal_ return -EBUSY; } /* - * OK, buffer won't be reachable after truncate. We just set - * j_next_transaction to the running transaction (if there is - * one) and mark buffer as freed so that commit code knows it - * should clear dirty bits when it is done with the buffer. + * OK, buffer won't be reachable after truncate. We just clear + * b_modified to not confuse transaction credit accounting, and + * set j_next_transaction to the running transaction (if there + * is one) and mark buffer as freed so that commit code knows + * it should clear dirty bits when it is done with the buffer. */ set_buffer_freed(bh); if (journal->j_running_transaction && buffer_jbddirty(bh)) jh->b_next_transaction = journal->j_running_transaction; + jh->b_modified = 0; spin_unlock(&journal->j_list_lock); spin_unlock(&jh->b_state_lock); write_unlock(&journal->j_state_lock);
From: zhangyi (F) yi.zhang@huawei.com
commit c96dceeabf765d0b1b1f29c3bf50a5c01315b820 upstream.
Commit 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction") sets the BH_Freed flag when forgetting a metadata buffer which belongs to the committing transaction; it indicates that the committing process should clear the dirty bits when it is done with the buffer. But it also clears the BH_Mapped flag at the same time, which may trigger the NULL pointer oops below when block_size < PAGE_SIZE.
rmdir 1                  kjournald2                       mkdir 2
                         jbd2_journal_commit_transaction
                           commit transaction N
jbd2_journal_forget
set_buffer_freed(bh1)
                         jbd2_journal_commit_transaction
                           commit transaction N+1
                           ...
                           clear_buffer_mapped(bh1)
                                                          ext4_getblk(bh2 ummapped)
                                                          ...
                                                          grow_dev_page
                                                          init_page_buffers
                                                            bh1->b_private=NULL
                                                            bh2->b_private=NULL
                                                          jbd2_journal_put_journal_head(jh1)
                                                          __journal_remove_journal_head(hb1)
                                                            jh1 is NULL and trigger oops
*) Dir entry blocks bh1 and bh2 belong to one page, and bh2 has already been unmapped.
For the metadata buffers we are forgetting, keeping the mapped flag and clearing the dirty flags is enough, so this patch picks out these buffers and keeps their BH_Mapped flag.
Link: https://lore.kernel.org/r/20200213063821.30455-3-yi.zhang@huawei.com Fixes: 904cdbd41d74 ("jbd2: clear dirty flag when revoking a buffer from an older transaction") Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: zhangyi (F) yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/jbd2/commit.c | 25 +++++++++++++++++++++---- 1 file changed, 21 insertions(+), 4 deletions(-)
--- a/fs/jbd2/commit.c +++ b/fs/jbd2/commit.c @@ -985,12 +985,29 @@ restart_loop: * pagesize and it is attached to the last partial page. */ if (buffer_freed(bh) && !jh->b_next_transaction) { + struct address_space *mapping; + clear_buffer_freed(bh); clear_buffer_jbddirty(bh); - clear_buffer_mapped(bh); - clear_buffer_new(bh); - clear_buffer_req(bh); - bh->b_bdev = NULL; + + /* + * Block device buffers need to stay mapped all the + * time, so it is enough to clear buffer_jbddirty and + * buffer_freed bits. For the file mapping buffers (i.e. + * journalled data) we need to unmap buffer and clear + * more bits. We also need to be careful about the check + * because the data page mapping can get cleared under + * out hands, which alse need not to clear more bits + * because the page and buffers will be freed and can + * never be reused once we are done with them. + */ + mapping = READ_ONCE(bh->b_page->mapping); + if (mapping && !sb_is_blkdev_sb(mapping->host->i_sb)) { + clear_buffer_mapped(bh); + clear_buffer_new(bh); + clear_buffer_req(bh); + bh->b_bdev = NULL; + } }
if (buffer_jbddirty(bh)) {
From: Robert Richter rrichter@marvell.com
commit 4d59588c09f2a2daedad2a544d4d1b602ab3a8af upstream.
All created csrow objects must be removed in the error path of edac_create_csrow_objects(). The objects have been added as devices.
They need to be removed by doing a device_del() *and* put_device() call to also free their memory. The missing put_device() leaves a memory leak. Use device_unregister() instead of device_del() which properly unregisters the device doing both.
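For reference, device_unregister() is essentially the combination the error path needs here; a simplified sketch of the relationship (see drivers/base/core.c for the real implementation):

    /* device_unregister() == device_del() + put_device():
     * device_del() removes the device from sysfs and the device hierarchy,
     * put_device() drops the reference so the release() callback can run
     * and free the memory. Calling only device_del() leaks that memory.
     */
    void device_unregister(struct device *dev)
    {
            device_del(dev);
            put_device(dev);
    }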
Fixes: 7adc05d2dc3a ("EDAC/sysfs: Drop device references properly") Signed-off-by: Robert Richter rrichter@marvell.com Signed-off-by: Borislav Petkov bp@suse.de Tested-by: John Garry john.garry@huawei.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200212120340.4764-4-rrichter@marvell.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/edac/edac_mc_sysfs.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/edac/edac_mc_sysfs.c +++ b/drivers/edac/edac_mc_sysfs.c @@ -447,8 +447,7 @@ error: csrow = mci->csrows[i]; if (!nr_pages_per_csrow(csrow)) continue; - - device_del(&mci->csrows[i]->dev); + device_unregister(&mci->csrows[i]->dev); }
return err;
From: Robert Richter rrichter@marvell.com
commit 216aa145aaf379a50b17afc812db71d893bd6683 upstream.
A test kernel with the options DEBUG_TEST_DRIVER_REMOVE, KASAN and DEBUG_KMEMLEAK set revealed several issues when removing an mci device:
1) Use-after-free:
On 27.11.19 17:07:33, John Garry wrote:
[ 22.104498] BUG: KASAN: use-after-free in edac_remove_sysfs_mci_device+0x148/0x180
The use-after-free is caused by the mci_for_each_dimm() macro called in edac_remove_sysfs_mci_device(). The iterator was introduced with
c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator").
The iterator loop calls device_unregister(&dimm->dev), which removes the sysfs entry of the device, but also frees the dimm struct in dimm_attr_release(). When incrementing the loop in mci_for_each_dimm(), the dimm struct is accessed again, after having been freed already.
The fix is to free all the mci device's subsequent dimm and csrow objects at a later point, in _edac_mc_free(), when the mci device itself is being freed.
This keeps the data structures intact and the mci device can be fully used until its removal. The change allows the safe usage of mci_for_each_dimm() to release dimm devices from sysfs.
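The hazard is the classic one of freeing the object that the loop cursor still needs. A small, generic C illustration of the pattern (not the EDAC code; the EDAC fix takes the second option of deferring the free until the whole structure is torn down):

    struct node { struct node *next; };

    /* Given a populated list 'head': */

    /* BROKEN: free(n) is followed by n = n->next, which reads memory that
     * was just freed - the same class of bug KASAN reported above.
     */
    for (struct node *n = head; n; n = n->next)
            free(n);

    /* SAFE: remember the successor before releasing the current node ... */
    for (struct node *n = head, *next; n; n = next) {
            next = n->next;
            free(n);
    }
    /* ... or keep the objects alive and free them all in one place once
     * nothing iterates over them anymore, as _edac_mc_free() now does.
     */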
2) Memory leaks:
Following memory leaks have been detected:
# grep edac /sys/kernel/debug/kmemleak | sort | uniq -c
   1 [<000000003c0f58f9>] edac_mc_alloc+0x3bc/0x9d0        # mci->csrows
  16 [<00000000bb932dc0>] edac_mc_alloc+0x49c/0x9d0        # csr->channels
  16 [<00000000e2734dba>] edac_mc_alloc+0x518/0x9d0        # csr->channels[chn]
   1 [<00000000eb040168>] edac_mc_alloc+0x5c8/0x9d0        # mci->dimms
  34 [<00000000ef737c29>] ghes_edac_register+0x1c8/0x3f8   # see edac_mc_alloc()
All leaks are from memory allocated by edac_mc_alloc().
Note: The test above shows that edac_mc_alloc() was called here from ghes_edac_register(), thus both functions show up in the stack trace but the module causing the leaks is edac_mc. The comments with the data structures involved were made manually by analyzing the objdump.
The data structures listed above and created by edac_mc_alloc() are not properly removed during device removal, which is done in edac_mc_free().
There are two paths implemented to remove the device, depending on device registration: _edac_mc_free() is called if the device is not registered, and edac_unregister_sysfs() otherwise.
The implementations differ. For the sysfs case, the mci device removal lacks the removal of subsequent data structures (csrows, channels, dimms). This causes the memory leaks (see mci_attr_release()).
[ bp: Massage commit message. ]
Fixes: c498afaf7df8 ("EDAC: Introduce an mci_for_each_dimm() iterator") Fixes: faa2ad09c01c ("edac_mc: edac_mc_free() cannot assume mem_ctl_info is registered in sysfs.") Fixes: 7a623c039075 ("edac: rewrite the sysfs code to use struct device") Reported-by: John Garry john.garry@huawei.com Signed-off-by: Robert Richter rrichter@marvell.com Signed-off-by: Borislav Petkov bp@suse.de Tested-by: John Garry john.garry@huawei.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200212120340.4764-3-rrichter@marvell.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/edac/edac_mc.c | 12 +++--------- drivers/edac/edac_mc_sysfs.c | 15 +++------------ 2 files changed, 6 insertions(+), 21 deletions(-)
--- a/drivers/edac/edac_mc.c +++ b/drivers/edac/edac_mc.c @@ -505,16 +505,10 @@ void edac_mc_free(struct mem_ctl_info *m { edac_dbg(1, "\n");
- /* If we're not yet registered with sysfs free only what was allocated - * in edac_mc_alloc(). - */ - if (!device_is_registered(&mci->dev)) { - _edac_mc_free(mci); - return; - } + if (device_is_registered(&mci->dev)) + edac_unregister_sysfs(mci);
- /* the mci instance is freed here, when the sysfs object is dropped */ - edac_unregister_sysfs(mci); + _edac_mc_free(mci); } EXPORT_SYMBOL_GPL(edac_mc_free);
--- a/drivers/edac/edac_mc_sysfs.c +++ b/drivers/edac/edac_mc_sysfs.c @@ -276,10 +276,7 @@ static const struct attribute_group *csr
static void csrow_attr_release(struct device *dev) { - struct csrow_info *csrow = container_of(dev, struct csrow_info, dev); - - edac_dbg(1, "device %s released\n", dev_name(dev)); - kfree(csrow); + /* release device with _edac_mc_free() */ }
static const struct device_type csrow_attr_type = { @@ -607,10 +604,7 @@ static const struct attribute_group *dim
static void dimm_attr_release(struct device *dev) { - struct dimm_info *dimm = container_of(dev, struct dimm_info, dev); - - edac_dbg(1, "device %s released\n", dev_name(dev)); - kfree(dimm); + /* release device with _edac_mc_free() */ }
static const struct device_type dimm_attr_type = { @@ -892,10 +886,7 @@ static const struct attribute_group *mci
static void mci_attr_release(struct device *dev) { - struct mem_ctl_info *mci = container_of(dev, struct mem_ctl_info, dev); - - edac_dbg(1, "device %s released\n", dev_name(dev)); - kfree(mci); + /* release device with _edac_mc_free() */ }
static const struct device_type mci_attr_type = {
From: Sean Christopherson sean.j.christopherson@intel.com
commit 148d735eb55d32848c3379e460ce365f2c1cbe4b upstream.
Hardcode the EPT page-walk level for L2 to be 4 levels, as KVM's MMU currently also hardcodes the page walk level for nested EPT to be 4 levels. The L2 guest is all but guaranteed to soft hang on its first instruction when L1 is using EPT, as KVM will construct 4-level page tables and then tell hardware to use 5-level page tables.
Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/vmx.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/x86/kvm/vmx/vmx.c +++ b/arch/x86/kvm/vmx/vmx.c @@ -2968,6 +2968,9 @@ void vmx_set_cr0(struct kvm_vcpu *vcpu,
static int get_ept_level(struct kvm_vcpu *vcpu) { + /* Nested EPT currently only supports 4-level walks. */ + if (is_guest_mode(vcpu) && nested_cpu_has_ept(get_vmcs12(vcpu))) + return 4; if (cpu_has_vmx_ept_5levels() && (cpuid_maxphyaddr(vcpu) > 48)) return 5; return 4;
From: Sean Christopherson sean.j.christopherson@intel.com
commit f6ab0107a4942dbf9a5cf0cca3f37e184870a360 upstream.
Define PT_MAX_FULL_LEVELS as PT64_ROOT_MAX_LEVEL, i.e. 5, to fix shadow paging for 5-level guest page tables. PT_MAX_FULL_LEVELS is used to size the arrays that track guest pages table information, i.e. using a "max levels" of 4 causes KVM to access garbage beyond the end of an array when querying state for level 5 entries. E.g. FNAME(gpte_changed) will read garbage and most likely return %true for a level 5 entry, soft-hanging the guest because FNAME(fetch) will restart the guest instead of creating SPTEs because it thinks the guest PTE has changed.
Note, KVM doesn't yet support 5-level nested EPT, so PT_MAX_FULL_LEVELS gets to stay "4" for the PTTYPE_EPT case.
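The failure mode is plain array sizing: per-level state arrays dimensioned with a too-small "max levels" constant end up being indexed with the real walk level. A hedged illustration of the shape of the bug (illustrative structure, not KVM's actual one):

    #define MAX_LEVELS 4                       /* too small for 5-level paging */

    struct walker_state {
            unsigned long gpte[MAX_LEVELS];    /* indexed by level - 1 */
    };

    /* With a 5-level guest page table, level can be 5, so gpte[level - 1]
     * touches gpte[4] - one element past the end of the array. Sizing the
     * array with the real maximum (PT64_ROOT_MAX_LEVEL, i.e. 5) avoids this.
     */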
Fixes: 855feb673640 ("KVM: MMU: Add 5 level EPT & Shadow page table support.") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson sean.j.christopherson@intel.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/mmu/paging_tmpl.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/x86/kvm/mmu/paging_tmpl.h +++ b/arch/x86/kvm/mmu/paging_tmpl.h @@ -33,7 +33,7 @@ #define PT_GUEST_ACCESSED_SHIFT PT_ACCESSED_SHIFT #define PT_HAVE_ACCESSED_DIRTY(mmu) true #ifdef CONFIG_X86_64 - #define PT_MAX_FULL_LEVELS 4 + #define PT_MAX_FULL_LEVELS PT64_ROOT_MAX_LEVEL #define CMPXCHG cmpxchg #else #define CMPXCHG cmpxchg64
From: Kim Phillips kim.phillips@amd.com
commit 25d387287cf0330abf2aad761ce6eee67326a355 upstream.
Commit 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") claimed L2 misses were unsupported because they were not found in its referenced documentation, whose link has now moved [1].
That old documentation listed PMCx064 unit mask bit 3 as:
"LsRdBlkC: LS Read Block C S L X Change to X Miss."
and bit 0 as:
"IcFillMiss: IC Fill Miss"
We now have new public documentation [2] with improved descriptions, that clearly indicate what events those unit mask bits represent:
Bit 3 now clearly states:
"LsRdBlkC: Data Cache Req Miss in L2 (all types)"
and bit 0 is:
"IcFillMiss: Instruction Cache Req Miss in L2."
So we can now add support for L2 misses in perf's genericised events as PMCx064 with both the above unit masks.
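For readers decoding the new event map entry below: in these x86 event maps the low byte is the event select and the next byte the unit mask, so 0x0964 combines PMCx064 with unit mask bits 0 and 3 quoted above. A small sketch of that reading (F17H_EVENT is an illustrative macro, not one from the kernel):

    /* Reading the constant: low byte = event select, next byte = unit mask. */
    #define F17H_EVENT(event, umask)   (((umask) << 8) | (event))

    /* 0x0964 == F17H_EVENT(0x64, 0x09): PMCx064 with unit mask bits 0 and 3,
     * i.e. IcFillMiss + LsRdBlkC, the two "miss in L2" sub-events above.
     */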
[1] The commit's original documentation reference, "Processor Programming Reference (PPR) for AMD Family 17h Model 01h, Revision B1 Processors", originally available here:
https://www.amd.com/system/files/TechDocs/54945_PPR_Family_17h_Models_00h-0F...
is now available here:
https://developer.amd.com/wordpress/media/2017/11/54945_PPR_Family_17h_Model...
[2] "Processor Programming Reference (PPR) for Family 17h Model 31h, Revision B0 Processors", available here:
https://developer.amd.com/wp-content/resources/55803_0.54-PUB.pdf
Fixes: 3fe3331bb285 ("perf/x86/amd: Add event map for AMD Family 17h") Reported-by: Babu Moger babu.moger@amd.com Signed-off-by: Kim Phillips kim.phillips@amd.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Tested-by: Babu Moger babu.moger@amd.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20200121171232.28839-1-kim.phillips@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/amd/core.c | 1 + 1 file changed, 1 insertion(+)
--- a/arch/x86/events/amd/core.c +++ b/arch/x86/events/amd/core.c @@ -246,6 +246,7 @@ static const u64 amd_f17h_perfmon_event_ [PERF_COUNT_HW_CPU_CYCLES] = 0x0076, [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0, [PERF_COUNT_HW_CACHE_REFERENCES] = 0xff60, + [PERF_COUNT_HW_CACHE_MISSES] = 0x0964, [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2, [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3, [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x0287,
From: Harald Freudenberger freude@linux.ibm.com
commit aab73d278d49c718b722ff5052e16c9cddf144d4 upstream.
The pkey ioctl call PKEY_SEC2PROTK updates a struct pkey_protkey on return. The protected key value and the protected key type are stored in that struct, but the len information was not updated. This patch fixes that, so the len field now reflects the actual size of the protected key value returned.
Fixes: efc598e6c8a9 ("s390/zcrypt: move cca misc functions to new code file") Cc: Stable stable@vger.kernel.org Signed-off-by: Harald Freudenberger freude@linux.ibm.com Reported-by: Christian Rund RUNDC@de.ibm.com Suggested-by: Ingo Franzki ifranzki@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/s390/crypto/pkey_api.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/s390/crypto/pkey_api.c +++ b/drivers/s390/crypto/pkey_api.c @@ -774,7 +774,7 @@ static long pkey_unlocked_ioctl(struct f return -EFAULT; rc = cca_sec2protkey(ksp.cardnr, ksp.domain, ksp.seckey.seckey, ksp.protkey.protkey, - NULL, &ksp.protkey.type); + &ksp.protkey.len, &ksp.protkey.type); DEBUG_DBG("%s cca_sec2protkey()=%d\n", __func__, rc); if (rc) break;
From: Christian Borntraeger borntraeger@de.ibm.com
commit 27dc0700c3be7c681cea03c5230b93d02f623492 upstream.
The query parameter block might contain additional information and can be extended in the future. If the size of the block does not suffice we get an error code of rc=0x100. The buffer will contain all information up to the specified size and the hypervisor/guest simply do not need the additional information as they do not know about the new data. That means that we can (and must) accept rc=0x100 as success.
Cc: stable@vger.kernel.org Reviewed-by: Cornelia Huck cohuck@redhat.com Fixes: 5abb9351dfd9 ("s390/uv: introduce guest side ultravisor code") Signed-off-by: Christian Borntraeger borntraeger@de.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/boot/uv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/s390/boot/uv.c +++ b/arch/s390/boot/uv.c @@ -15,7 +15,8 @@ void uv_query_info(void) if (!test_facility(158)) return;
- if (uv_call(0, (uint64_t)&uvcb)) + /* rc==0x100 means that there is additional data we do not process */ + if (uv_call(0, (uint64_t)&uvcb) && uvcb.header.rc != 0x100) return;
if (test_bit_inv(BIT_UVC_CMD_SET_SHARED_ACCESS, (unsigned long *)uvcb.inst_calls_list) &&
From: Daniel Vetter daniel.vetter@ffwll.ch
commit 4b848f20eda5974020f043ca14bacf7a7e634fc8 upstream.
There are two references floating around here (for the object reference, not the handle_count reference; that's a different thing):
- The temporary reference held by vgem_gem_create, acquired by creating the object and released by calling drm_gem_object_put_unlocked.
- The reference held by the object handle, created by drm_gem_handle_create. This one generally outlives the function, except if a 2nd thread races with a GEM_CLOSE ioctl call.
So usually everything is correct, except in that race case, where the access to gem_object->size could be looking at freed data already. Which again isn't a real problem (userspace shot its feet off already with the race, we could return garbage), but maybe someone can exploit this as an information leak.
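The safe ordering, in general terms, is: hold your own reference for as long as the object's fields are dereferenced, and drop it only afterwards. A simplified sketch of the corrected flow using the names from the diff below (error handling trimmed):

    obj = vgem_gem_create(dev, file, &handle, size);  /* we still own one ref */
    if (IS_ERR(obj))
            return PTR_ERR(obj);

    args->size = obj->size;              /* safe: our reference is still held */
    drm_gem_object_put_unlocked(obj);    /* only the handle's ref remains now */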
Cc: Dan Carpenter dan.carpenter@oracle.com Cc: Hillf Danton hdanton@sina.com Reported-by: syzbot+0dc4444774d419e916c8@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Emil Velikov emil.velikov@collabora.com Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Sean Paul seanpaul@chromium.org Cc: Chris Wilson chris@chris-wilson.co.uk Cc: Eric Anholt eric@anholt.net Cc: Sam Ravnborg sam@ravnborg.org Cc: Rob Clark robdclark@chromium.org Reviewed-by: Chris Wilson chris@chris-wilson.co.uk Signed-off-by: Daniel Vetter daniel.vetter@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20200202132133.1891846-1-danie... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/vgem/vgem_drv.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/vgem/vgem_drv.c +++ b/drivers/gpu/drm/vgem/vgem_drv.c @@ -196,9 +196,10 @@ static struct drm_gem_object *vgem_gem_c return ERR_CAST(obj);
ret = drm_gem_handle_create(file, &obj->base, handle); - drm_gem_object_put_unlocked(&obj->base); - if (ret) + if (ret) { + drm_gem_object_put_unlocked(&obj->base); return ERR_PTR(ret); + }
return &obj->base; } @@ -221,7 +222,9 @@ static int vgem_gem_dumb_create(struct d args->size = gem_object->size; args->pitch = pitch;
- DRM_DEBUG("Created object of size %lld\n", size); + drm_gem_object_put_unlocked(gem_object); + + DRM_DEBUG("Created object of size %llu\n", args->size);
return 0; }
From: José Roberto de Souza jose.souza@intel.com
commit 8ccb5bf7619c6523e7a4384a84b72e7be804298c upstream.
According to the DP specification, DP_SINK_EVENT_NOTIFY is also a broadcast message, but as this function only handles DP_CONNECTION_STATUS_NOTIFY, I will only make the static analyzer that caught this issue happy by not calling drm_dp_get_mst_branch_device_by_guid() with a NULL guid, causing drm_dp_mst_process_up_req() to return in the "if (!mstb)" check right below.
Fixes: 9408cc94eb04 ("drm/dp_mst: Handle UP requests asynchronously") Cc: Lyude Paul lyude@redhat.com Cc: Sean Paul sean@poorly.run Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: José Roberto de Souza jose.souza@intel.com [added cc to stable] Signed-off-by: Lyude Paul lyude@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20200129232448.84704-1-jose.so... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/drm_dp_mst_topology.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/drm_dp_mst_topology.c @@ -3772,7 +3772,8 @@ drm_dp_mst_process_up_req(struct drm_dp_ else if (msg->req_type == DP_RESOURCE_STATUS_NOTIFY) guid = msg->u.resource_stat.guid;
- mstb = drm_dp_get_mst_branch_device_by_guid(mgr, guid); + if (guid) + mstb = drm_dp_get_mst_branch_device_by_guid(mgr, guid); } else { mstb = drm_dp_get_mst_branch_device(mgr, hdr->lct, hdr->rad); }
From: Boris Brezillon boris.brezillon@collabora.com
commit 7e0cf7e9936c4358b0863357b90aa12afe6489da upstream.
Userspace might tag a BO purgeable while it's still referenced by GPU jobs. We need to make sure the shrinker does not purge such BOs until all jobs referencing them have finished.
Fixes: 013b65101315 ("drm/panfrost: Add madvise and shrinker support") Cc: stable@vger.kernel.org Signed-off-by: Boris Brezillon boris.brezillon@collabora.com Reviewed-by: Steven Price steven.price@arm.com Signed-off-by: Rob Herring robh@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20191129135908.2439529-9-boris... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/panfrost/panfrost_drv.c | 1 + drivers/gpu/drm/panfrost/panfrost_gem.h | 6 ++++++ drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c | 3 +++ drivers/gpu/drm/panfrost/panfrost_job.c | 7 ++++++- 4 files changed, 16 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -166,6 +166,7 @@ panfrost_lookup_bos(struct drm_device *d break; }
+ atomic_inc(&bo->gpu_usecount); job->mappings[i] = mapping; }
--- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -30,6 +30,12 @@ struct panfrost_gem_object { struct mutex lock; } mappings;
+ /* + * Count the number of jobs referencing this BO so we don't let the + * shrinker reclaim this object prematurely. + */ + atomic_t gpu_usecount; + bool noexec :1; bool is_heap :1; }; --- a/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem_shrinker.c @@ -41,6 +41,9 @@ static bool panfrost_gem_purge(struct dr struct drm_gem_shmem_object *shmem = to_drm_gem_shmem_obj(obj); struct panfrost_gem_object *bo = to_panfrost_bo(obj);
+ if (atomic_read(&bo->gpu_usecount)) + return false; + if (!mutex_trylock(&shmem->pages_lock)) return false;
--- a/drivers/gpu/drm/panfrost/panfrost_job.c +++ b/drivers/gpu/drm/panfrost/panfrost_job.c @@ -269,8 +269,13 @@ static void panfrost_job_cleanup(struct dma_fence_put(job->render_done_fence);
if (job->mappings) { - for (i = 0; i < job->bo_count; i++) + for (i = 0; i < job->bo_count; i++) { + if (!job->mappings[i]) + break; + + atomic_dec(&job->mappings[i]->obj->gpu_usecount); panfrost_gem_mapping_put(job->mappings[i]); + } kvfree(job->mappings); }
From: Alex Deucher alexander.deucher@amd.com
commit c1d66bc2e531b4ed3a9464b8e87144cc6b2fd63f upstream.
Update to the latest changes.
Reviewed-by: Evan Quan evan.quan@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 5.5.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h | 46 ++++++++++++------ 1 file changed, 32 insertions(+), 14 deletions(-)
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h +++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0_pptable.h @@ -39,21 +39,39 @@ #define SMU_11_0_PP_OVERDRIVE_VERSION 0x0800 #define SMU_11_0_PP_POWERSAVINGCLOCK_VERSION 0x0100
+enum SMU_11_0_ODFEATURE_CAP { + SMU_11_0_ODCAP_GFXCLK_LIMITS = 0, + SMU_11_0_ODCAP_GFXCLK_CURVE, + SMU_11_0_ODCAP_UCLK_MAX, + SMU_11_0_ODCAP_POWER_LIMIT, + SMU_11_0_ODCAP_FAN_ACOUSTIC_LIMIT, + SMU_11_0_ODCAP_FAN_SPEED_MIN, + SMU_11_0_ODCAP_TEMPERATURE_FAN, + SMU_11_0_ODCAP_TEMPERATURE_SYSTEM, + SMU_11_0_ODCAP_MEMORY_TIMING_TUNE, + SMU_11_0_ODCAP_FAN_ZERO_RPM_CONTROL, + SMU_11_0_ODCAP_AUTO_UV_ENGINE, + SMU_11_0_ODCAP_AUTO_OC_ENGINE, + SMU_11_0_ODCAP_AUTO_OC_MEMORY, + SMU_11_0_ODCAP_FAN_CURVE, + SMU_11_0_ODCAP_COUNT, +}; + enum SMU_11_0_ODFEATURE_ID { - SMU_11_0_ODFEATURE_GFXCLK_LIMITS = 1 << 0, //GFXCLK Limit feature - SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << 1, //GFXCLK Curve feature - SMU_11_0_ODFEATURE_UCLK_MAX = 1 << 2, //UCLK Limit feature - SMU_11_0_ODFEATURE_POWER_LIMIT = 1 << 3, //Power Limit feature - SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT = 1 << 4, //Fan Acoustic RPM feature - SMU_11_0_ODFEATURE_FAN_SPEED_MIN = 1 << 5, //Minimum Fan Speed feature - SMU_11_0_ODFEATURE_TEMPERATURE_FAN = 1 << 6, //Fan Target Temperature Limit feature - SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM = 1 << 7, //Operating Temperature Limit feature - SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE = 1 << 8, //AC Timing Tuning feature - SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << 9, //Zero RPM feature - SMU_11_0_ODFEATURE_AUTO_UV_ENGINE = 1 << 10, //Auto Under Volt GFXCLK feature - SMU_11_0_ODFEATURE_AUTO_OC_ENGINE = 1 << 11, //Auto Over Clock GFXCLK feature - SMU_11_0_ODFEATURE_AUTO_OC_MEMORY = 1 << 12, //Auto Over Clock MCLK feature - SMU_11_0_ODFEATURE_FAN_CURVE = 1 << 13, //VICTOR TODO + SMU_11_0_ODFEATURE_GFXCLK_LIMITS = 1 << SMU_11_0_ODCAP_GFXCLK_LIMITS, //GFXCLK Limit feature + SMU_11_0_ODFEATURE_GFXCLK_CURVE = 1 << SMU_11_0_ODCAP_GFXCLK_CURVE, //GFXCLK Curve feature + SMU_11_0_ODFEATURE_UCLK_MAX = 1 << SMU_11_0_ODCAP_UCLK_MAX, //UCLK Limit feature + SMU_11_0_ODFEATURE_POWER_LIMIT = 1 << SMU_11_0_ODCAP_POWER_LIMIT, //Power Limit feature + SMU_11_0_ODFEATURE_FAN_ACOUSTIC_LIMIT = 1 << SMU_11_0_ODCAP_FAN_ACOUSTIC_LIMIT, //Fan Acoustic RPM feature + SMU_11_0_ODFEATURE_FAN_SPEED_MIN = 1 << SMU_11_0_ODCAP_FAN_SPEED_MIN, //Minimum Fan Speed feature + SMU_11_0_ODFEATURE_TEMPERATURE_FAN = 1 << SMU_11_0_ODCAP_TEMPERATURE_FAN, //Fan Target Temperature Limit feature + SMU_11_0_ODFEATURE_TEMPERATURE_SYSTEM = 1 << SMU_11_0_ODCAP_TEMPERATURE_SYSTEM, //Operating Temperature Limit feature + SMU_11_0_ODFEATURE_MEMORY_TIMING_TUNE = 1 << SMU_11_0_ODCAP_MEMORY_TIMING_TUNE, //AC Timing Tuning feature + SMU_11_0_ODFEATURE_FAN_ZERO_RPM_CONTROL = 1 << SMU_11_0_ODCAP_FAN_ZERO_RPM_CONTROL, //Zero RPM feature + SMU_11_0_ODFEATURE_AUTO_UV_ENGINE = 1 << SMU_11_0_ODCAP_AUTO_UV_ENGINE, //Auto Under Volt GFXCLK feature + SMU_11_0_ODFEATURE_AUTO_OC_ENGINE = 1 << SMU_11_0_ODCAP_AUTO_OC_ENGINE, //Auto Over Clock GFXCLK feature + SMU_11_0_ODFEATURE_AUTO_OC_MEMORY = 1 << SMU_11_0_ODCAP_AUTO_OC_MEMORY, //Auto Over Clock MCLK feature + SMU_11_0_ODFEATURE_FAN_CURVE = 1 << SMU_11_0_ODCAP_FAN_CURVE, //Fan Curve feature SMU_11_0_ODFEATURE_COUNT = 14, }; #define SMU_11_0_MAX_ODFEATURE 32 //Maximum Number of OD Features
From: Alex Deucher alexander.deucher@amd.com
commit e33a8cfda5198fc09554fdd77ba246de42c886bd upstream.
Check the ODCAP capability indices rather than the ODFEATURE_ID flags. This avoids possibly reading past the end of the cap array.
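The distinction matters because the two enums live in different spaces: the SMU_11_0_ODCAP_* values are small array indices, while the SMU_11_0_ODFEATURE_* values are single-bit masks built from them, and the cap[] array only has room for the index form. A small illustration of the out-of-bounds risk (generic C; the values are taken from the header shown in the previous patch):

    uint32_t cap[32];                   /* SMU_11_0_MAX_ODFEATURE == 32 */
    uint32_t ok, bad;

    /* Index form: SMU_11_0_ODCAP_UCLK_MAX == 2, safely inside the array. */
    ok  = cap[SMU_11_0_ODCAP_UCLK_MAX];

    /* Flag form: SMU_11_0_ODFEATURE_UCLK_MAX == (1 << 2) == 4 still happens
     * to be in range, but e.g. SMU_11_0_ODFEATURE_FAN_CURVE == (1 << 13)
     * == 8192 indexes far past the end of cap[32] - the bug removed here.
     */
    bad = cap[SMU_11_0_ODFEATURE_FAN_CURVE];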
Reviewed-by: Evan Quan evan.quan@amd.com Reported-by: Aleksandr Mezin mezin.alexander@gmail.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org # 5.5.x Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/amd/powerplay/navi10_ppt.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
--- a/drivers/gpu/drm/amd/powerplay/navi10_ppt.c +++ b/drivers/gpu/drm/amd/powerplay/navi10_ppt.c @@ -705,9 +705,9 @@ static bool navi10_is_support_fine_grain return dpm_desc->SnapToDiscrete == 0 ? true : false; }
-static inline bool navi10_od_feature_is_supported(struct smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_ID feature) +static inline bool navi10_od_feature_is_supported(struct smu_11_0_overdrive_table *od_table, enum SMU_11_0_ODFEATURE_CAP cap) { - return od_table->cap[feature]; + return od_table->cap[cap]; }
static void navi10_od_setting_get_range(struct smu_11_0_overdrive_table *od_table, @@ -815,7 +815,7 @@ static int navi10_print_clk_levels(struc case SMU_OD_SCLK: if (!smu->od_enabled || !od_table || !od_settings) break; - if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_GFXCLK_LIMITS)) + if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_GFXCLK_LIMITS)) break; size += sprintf(buf + size, "OD_SCLK:\n"); size += sprintf(buf + size, "0: %uMhz\n1: %uMhz\n", od_table->GfxclkFmin, od_table->GfxclkFmax); @@ -823,7 +823,7 @@ static int navi10_print_clk_levels(struc case SMU_OD_MCLK: if (!smu->od_enabled || !od_table || !od_settings) break; - if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_UCLK_MAX)) + if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_UCLK_MAX)) break; size += sprintf(buf + size, "OD_MCLK:\n"); size += sprintf(buf + size, "1: %uMHz\n", od_table->UclkFmax); @@ -831,7 +831,7 @@ static int navi10_print_clk_levels(struc case SMU_OD_VDDC_CURVE: if (!smu->od_enabled || !od_table || !od_settings) break; - if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_GFXCLK_CURVE)) + if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_GFXCLK_CURVE)) break; size += sprintf(buf + size, "OD_VDDC_CURVE:\n"); for (i = 0; i < 3; i++) { @@ -856,7 +856,7 @@ static int navi10_print_clk_levels(struc break; size = sprintf(buf, "%s:\n", "OD_RANGE");
- if (navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_GFXCLK_LIMITS)) { + if (navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_GFXCLK_LIMITS)) { navi10_od_setting_get_range(od_settings, SMU_11_0_ODSETTING_GFXCLKFMIN, &min_value, NULL); navi10_od_setting_get_range(od_settings, SMU_11_0_ODSETTING_GFXCLKFMAX, @@ -865,14 +865,14 @@ static int navi10_print_clk_levels(struc min_value, max_value); }
- if (navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_UCLK_MAX)) { + if (navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_UCLK_MAX)) { navi10_od_setting_get_range(od_settings, SMU_11_0_ODSETTING_UCLKFMAX, &min_value, &max_value); size += sprintf(buf + size, "MCLK: %7uMhz %10uMhz\n", min_value, max_value); }
- if (navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_GFXCLK_CURVE)) { + if (navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_GFXCLK_CURVE)) { navi10_od_setting_get_range(od_settings, SMU_11_0_ODSETTING_VDDGFXCURVEFREQ_P1, &min_value, &max_value); size += sprintf(buf + size, "VDDC_CURVE_SCLK[0]: %7uMhz %10uMhz\n", @@ -1956,7 +1956,7 @@ static int navi10_od_edit_dpm_table(stru
switch (type) { case PP_OD_EDIT_SCLK_VDDC_TABLE: - if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_GFXCLK_LIMITS)) { + if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_GFXCLK_LIMITS)) { pr_warn("GFXCLK_LIMITS not supported!\n"); return -ENOTSUPP; } @@ -2002,7 +2002,7 @@ static int navi10_od_edit_dpm_table(stru } break; case PP_OD_EDIT_MCLK_VDDC_TABLE: - if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_UCLK_MAX)) { + if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_UCLK_MAX)) { pr_warn("UCLK_MAX not supported!\n"); return -ENOTSUPP; } @@ -2043,7 +2043,7 @@ static int navi10_od_edit_dpm_table(stru } break; case PP_OD_EDIT_VDDC_CURVE: - if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODFEATURE_GFXCLK_CURVE)) { + if (!navi10_od_feature_is_supported(od_settings, SMU_11_0_ODCAP_GFXCLK_CURVE)) { pr_warn("GFXCLK_CURVE not supported!\n"); return -ENOTSUPP; }
From: Marek Behún marek.behun@nic.cz
commit 3bf3c9744694803bd2d6f0ee70a6369b980530fd upstream.
The input_read function declares the size of the hex array relative to sizeof(buf), but buf is a pointer argument of the function, so this only yields the pointer size. The hex array is meant to hold the hexadecimal representation of the bin array, so it should be sized relative to sizeof(bin).
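A small standalone illustration of the underlying C pitfall (not the moxtet code itself): an array argument decays to a pointer inside the callee, so sizeof() there is the pointer size, not the array length.

#include <stdio.h>

static void callee(char *buf)
{
        /* buf has decayed to a pointer here, so sizeof(buf) is the pointer
         * size (typically 8), not the length of the caller's array.
         */
        printf("sizeof(buf) in callee: %zu\n", sizeof(buf));
}

int main(void)
{
        char bin[16];

        printf("sizeof(bin) at definition: %zu\n", sizeof(bin)); /* 16 */
        callee(bin);                                             /* 8 on 64-bit */
        return 0;
}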
Link: https://lore.kernel.org/r/20200215142130.22743-1-marek.behun@nic.cz Fixes: 5bc7f990cd98 ("bus: Add support for Moxtet bus") Signed-off-by: Marek Behún marek.behun@nic.cz Reported-by: sohu0106 sohu0106@126.com Signed-off-by: Olof Johansson olof@lixom.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/bus/moxtet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/bus/moxtet.c +++ b/drivers/bus/moxtet.c @@ -466,7 +466,7 @@ static ssize_t input_read(struct file *f { struct moxtet *moxtet = file->private_data; u8 bin[TURRIS_MOX_MAX_MODULES]; - u8 hex[sizeof(buf) * 2 + 1]; + u8 hex[sizeof(bin) * 2 + 1]; int ret, n;
ret = moxtet_spi_read(moxtet, bin);
From: Yi Zhang yi.zhang@redhat.com
commit f25372ffc3f6c2684b57fb718219137e6ee2b64c upstream.
The nvme fw-activate operation will produce the warning log below; fix it by correcting the parameter order in the nvme_get_log() call.
[ 113.231513] nvme nvme0: Get FW SLOT INFO log error
Fixes: 0e98719b0e4b ("nvme: simplify the API for getting log pages") Reported-by: Sujith Pandel sujith_pandel@dell.com Reviewed-by: David Milburn dmilburn@redhat.com Signed-off-by: Yi Zhang yi.zhang@redhat.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -3867,7 +3867,7 @@ static void nvme_get_fw_slot_info(struct if (!log) return;
- if (nvme_get_log(ctrl, NVME_NSID_ALL, 0, NVME_LOG_FW_SLOT, log, + if (nvme_get_log(ctrl, NVME_NSID_ALL, NVME_LOG_FW_SLOT, 0, log, sizeof(*log), 0)) dev_warn(ctrl->device, "Get FW SLOT INFO log error\n"); kfree(log);
From: Colin Ian King colin.king@canonical.com
commit e0354d147e5889b5faa12e64fa38187aed39aad4 upstream.
The end of buffer check is off-by-one since the check is against an index that is pre-incremented before a store to buf[]. Fix this by adjusting the bounds check appropriately.
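A self-contained sketch of the off-by-one pattern described above (the message size is a stand-in, not the real struct ipmb_msg):

#include <stdio.h>

#define MSG_SIZE 8      /* stand-in for sizeof(struct ipmb_msg) */

int main(void)
{
        unsigned char buf[MSG_SIZE];
        unsigned int idx = 0;
        unsigned char val = 0xab;

        /* The store uses a pre-increment, so the highest index ever written
         * is idx + 1.  Guarding with "idx >= MSG_SIZE" would still allow a
         * write to buf[MSG_SIZE]; the guard has to be "idx >= MSG_SIZE - 1".
         */
        while (idx < MSG_SIZE - 1)
                buf[++idx] = val;

        printf("last index written: %u (array ends at %u)\n", idx, MSG_SIZE - 1);
        return 0;
}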
Addresses-Coverity: ("Out-of-bounds write") Fixes: 51bd6f291583 ("Add support for IPMB driver") Signed-off-by: Colin Ian King colin.king@canonical.com Message-Id: 20200114144031.358003-1-colin.king@canonical.com Reviewed-by: Asmaa Mnebhi asmaa@mellanox.com Signed-off-by: Corey Minyard cminyard@mvista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/char/ipmi/ipmb_dev_int.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/char/ipmi/ipmb_dev_int.c +++ b/drivers/char/ipmi/ipmb_dev_int.c @@ -253,7 +253,7 @@ static int ipmb_slave_cb(struct i2c_clie break;
case I2C_SLAVE_WRITE_RECEIVED: - if (ipmb_dev->msg_idx >= sizeof(struct ipmb_msg)) + if (ipmb_dev->msg_idx >= sizeof(struct ipmb_msg) - 1) break;
buf[++ipmb_dev->msg_idx] = *val;
From: Mark Zhang markz@mellanox.com
commit 10189e8e6fe8dcde13435f9354800429c4474fb1 upstream.
When binding a QP with a counter and the QP state is not RESET, return failure if the rts2rts_qp_counters_set_id is not supported by the device.
This is to prevent cases like manual bind for Connect-IB devices from returning success when the feature is not supported.
Fixes: d14133dd4161 ("IB/mlx5: Support set qp counter") Link: https://lore.kernel.org/r/20200126171708.5167-1-leon@kernel.org Signed-off-by: Mark Zhang markz@mellanox.com Reviewed-by: Maor Gottlieb maorg@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/mlx5/qp.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -3394,9 +3394,6 @@ static int __mlx5_ib_qp_set_counter(stru struct mlx5_ib_qp_base *base; u32 set_id;
- if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id)) - return 0; - if (counter) set_id = counter->id; else @@ -6529,6 +6526,7 @@ void mlx5_ib_drain_rq(struct ib_qp *qp) */ int mlx5_ib_qp_set_counter(struct ib_qp *qp, struct rdma_counter *counter) { + struct mlx5_ib_dev *dev = to_mdev(qp->device); struct mlx5_ib_qp *mqp = to_mqp(qp); int err = 0;
@@ -6538,6 +6536,11 @@ int mlx5_ib_qp_set_counter(struct ib_qp goto out; }
+ if (!MLX5_CAP_GEN(dev->mdev, rts2rts_qp_counters_set_id)) { + err = -EOPNOTSUPP; + goto out; + } + if (mqp->state == IB_QPS_RTS) { err = __mlx5_ib_qp_set_counter(qp, counter); if (!err)
From: Kaike Wan kaike.wan@intel.com
commit a70ed0f2e6262e723ae8d70accb984ba309eacc2 upstream.
Each user context is allocated a certain number of RcvArray (TID) entries and these entries are managed through TID groups. These groups are put into one of three lists in each user context: tid_group_list, tid_used_list, and tid_full_list, depending on the number of used TID entries within each group. When TID packets are expected, one or more TID groups will be allocated. After the packets are received, the TID groups will be freed. Since multiple user threads may access the TID groups simultaneously, a mutex exp_mutex is used to synchronize the access. However, when the user file is closed, it tries to release all TID groups without acquiring the mutex first, which risks a race condition with another thread that may be releasing its TID groups, leading to data corruption.
This patch addresses the issue by acquiring the mutex first before releasing the TID groups when the file is closed.
Fixes: 3abb33ac6521 ("staging/hfi1: Add TID cache receive init and free funcs") Link: https://lore.kernel.org/r/20200210131026.87408.86853.stgit@awfm-01.aw.intel.... Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Kaike Wan kaike.wan@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/hfi1/user_exp_rcv.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c +++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c @@ -142,10 +142,12 @@ void hfi1_user_exp_rcv_free(struct hfi1_ { struct hfi1_ctxtdata *uctxt = fd->uctxt;
+ mutex_lock(&uctxt->exp_mutex); if (!EXP_TID_SET_EMPTY(uctxt->tid_full_list)) unlock_exp_tids(uctxt, &uctxt->tid_full_list, fd); if (!EXP_TID_SET_EMPTY(uctxt->tid_used_list)) unlock_exp_tids(uctxt, &uctxt->tid_used_list, fd); + mutex_unlock(&uctxt->exp_mutex);
kfree(fd->invalid_tids); fd->invalid_tids = NULL;
From: Mike Marciniszyn mike.marciniszyn@intel.com
commit be8638344c70bf492963ace206a9896606b6922d upstream.
Cleaning up a pq can result in the following warning and panic:
WARNING: CPU: 52 PID: 77418 at lib/list_debug.c:53 __list_del_entry+0x63/0xd0 list_del corruption, ffff88cb2c6ac068->next is LIST_POISON1 (dead000000000100) Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 Call Trace: [<ffffffff90365ac0>] dump_stack+0x19/0x1b [<ffffffff8fc98b78>] __warn+0xd8/0x100 [<ffffffff8fc98bff>] warn_slowpath_fmt+0x5f/0x80 [<ffffffff8ff970c3>] __list_del_entry+0x63/0xd0 [<ffffffff8ff9713d>] list_del+0xd/0x30 [<ffffffff8fddda70>] kmem_cache_destroy+0x50/0x110 [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1] [<ffffffff8fe4519c>] __fput+0xec/0x260 [<ffffffff8fe453fe>] ____fput+0xe/0x10 [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0 [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0 [<ffffffff90379134>] int_signal+0x12/0x17 BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 PGD 2cdab19067 PUD 2f7bfdb067 PMD 0 Oops: 0000 [#1] SMP Modules linked in: mmfs26(OE) mmfslinux(OE) tracedev(OE) 8021q garp mrp ib_isert iscsi_target_mod target_core_mod crc_t10dif crct10dif_generic opa_vnic rpcrdma ib_iser libiscsi scsi_transport_iscsi ib_ipoib(OE) bridge stp llc iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crct10dif_pclmul crct10dif_common crc32_pclmul ghash_clmulni_intel ast aesni_intel ttm lrw gf128mul glue_helper ablk_helper drm_kms_helper cryptd syscopyarea sysfillrect sysimgblt fb_sys_fops drm pcspkr joydev lpc_ich mei_me drm_panel_orientation_quirks i2c_i801 mei wmi ipmi_si ipmi_devintf ipmi_msghandler nfit libnvdimm acpi_power_meter acpi_pad hfi1(OE) rdmavt(OE) rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_core binfmt_misc numatools(OE) xpmem(OE) ip_tables nfsv3 nfs_acl nfs lockd grace sunrpc fscache igb ahci i2c_algo_bit libahci dca ptp libata pps_core crc32c_intel [last unloaded: i2c_algo_bit] CPU: 52 PID: 77418 Comm: pvbatch Kdump: loaded Tainted: G W OE ------------ 3.10.0-957.38.3.el7.x86_64 #1 Hardware name: HPE.COM HPE SGI 8600-XA730i Gen10/X11DPT-SB-SG007, BIOS SBED1229 01/22/2019 task: ffff88cc26db9040 ti: ffff88b5393a8000 task.ti: ffff88b5393a8000 RIP: 0010:[<ffffffff8fe1f93e>] [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 RSP: 0018:ffff88b5393abd60 EFLAGS: 00010287 RAX: 0000000000000000 RBX: ffff88cb2c6ac000 RCX: 0000000000000003 RDX: 0000000000000400 RSI: 0000000000000400 RDI: 
ffffffff9095b800 RBP: ffff88b5393abdb0 R08: ffffffff9095b808 R09: ffffffff8ff77c19 R10: ffff88b73ce1f160 R11: ffffddecddde9800 R12: ffff88cb2c6ac000 R13: 000000000000000c R14: ffff88cf3fdca780 R15: 0000000000000000 FS: 00002aaaaab52500(0000) GS:ffff88b73ce00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 0000002d27664000 CR4: 00000000007607e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: [<ffffffff8fe20d44>] __kmem_cache_shutdown+0x14/0x80 [<ffffffff8fddda78>] kmem_cache_destroy+0x58/0x110 [<ffffffffc0328130>] hfi1_user_sdma_free_queues+0xf0/0x200 [hfi1] [<ffffffffc02e2350>] hfi1_file_close+0x70/0x1e0 [hfi1] [<ffffffff8fe4519c>] __fput+0xec/0x260 [<ffffffff8fe453fe>] ____fput+0xe/0x10 [<ffffffff8fcbfd1b>] task_work_run+0xbb/0xe0 [<ffffffff8fc2bc65>] do_notify_resume+0xa5/0xc0 [<ffffffff90379134>] int_signal+0x12/0x17 Code: 00 00 ba 00 04 00 00 0f 4f c2 3d 00 04 00 00 89 45 bc 0f 84 e7 01 00 00 48 63 45 bc 49 8d 04 c4 48 89 45 b0 48 8b 80 c8 00 00 00 <48> 8b 78 10 48 89 45 c0 48 83 c0 10 48 89 45 d0 48 8b 17 48 39 RIP [<ffffffff8fe1f93e>] kmem_cache_close+0x7e/0x300 RSP <ffff88b5393abd60> CR2: 0000000000000010
The panic is the result of slab entries being freed during the destruction of the pq slab.
The code attempts to quiesce the pq, but looking for n_req == 0 doesn't account for new requests.
Fix the issue by using SRCU to get a pq pointer and adjust the pq free logic to NULL the fd pq pointer prior to the quiesce.
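A hedged, kernel-style sketch (simplified types, not the actual hfi1 structures or functions) of the SRCU pattern the fix introduces: readers pick up fd->pq under srcu_read_lock(), while the free path publishes NULL first and only frees after synchronize_srcu() guarantees no reader can still see the old pointer.

#include <linux/srcu.h>
#include <linux/slab.h>

/* Illustrative types only; the real hfi1 structures carry much more. */
struct pkt_q { int n_reqs; };

struct filedata {
        struct srcu_struct pq_srcu;
        struct pkt_q __rcu *pq;
};

/* Reader side (e.g. the write_iter path): pq cannot be freed while the
 * SRCU read-side critical section is open.
 */
static int submit(struct filedata *fd)
{
        struct pkt_q *pq;
        int idx, ret = -EIO;

        idx = srcu_read_lock(&fd->pq_srcu);
        pq = srcu_dereference(fd->pq, &fd->pq_srcu);
        if (pq)
                ret = 0;        /* safe to queue work against pq here */
        srcu_read_unlock(&fd->pq_srcu, idx);
        return ret;
}

/* Teardown: publish NULL first, wait for readers, then free. */
static void teardown(struct filedata *fd, struct pkt_q *pq)
{
        rcu_assign_pointer(fd->pq, NULL);       /* new lookups now see NULL  */
        synchronize_srcu(&fd->pq_srcu);         /* wait out existing readers */
        kfree(pq);                              /* nothing can reach pq now  */
}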
Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20200210131033.87408.81174.stgit@awfm-01.aw.intel.... Reviewed-by: Kaike Wan kaike.wan@intel.com Signed-off-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/hfi1/file_ops.c | 52 ++++++++++++++++++------------ drivers/infiniband/hw/hfi1/hfi.h | 5 ++ drivers/infiniband/hw/hfi1/user_exp_rcv.c | 3 - drivers/infiniband/hw/hfi1/user_sdma.c | 17 ++++++--- 4 files changed, 48 insertions(+), 29 deletions(-)
--- a/drivers/infiniband/hw/hfi1/file_ops.c +++ b/drivers/infiniband/hw/hfi1/file_ops.c @@ -200,23 +200,24 @@ static int hfi1_file_open(struct inode *
fd = kzalloc(sizeof(*fd), GFP_KERNEL);
- if (fd) { - fd->rec_cpu_num = -1; /* no cpu affinity by default */ - fd->mm = current->mm; - mmgrab(fd->mm); - fd->dd = dd; - kobject_get(&fd->dd->kobj); - fp->private_data = fd; - } else { - fp->private_data = NULL; - - if (atomic_dec_and_test(&dd->user_refcount)) - complete(&dd->user_comp); - - return -ENOMEM; - } - + if (!fd || init_srcu_struct(&fd->pq_srcu)) + goto nomem; + spin_lock_init(&fd->pq_rcu_lock); + spin_lock_init(&fd->tid_lock); + spin_lock_init(&fd->invalid_lock); + fd->rec_cpu_num = -1; /* no cpu affinity by default */ + fd->mm = current->mm; + mmgrab(fd->mm); + fd->dd = dd; + kobject_get(&fd->dd->kobj); + fp->private_data = fd; return 0; +nomem: + kfree(fd); + fp->private_data = NULL; + if (atomic_dec_and_test(&dd->user_refcount)) + complete(&dd->user_comp); + return -ENOMEM; }
static long hfi1_file_ioctl(struct file *fp, unsigned int cmd, @@ -301,21 +302,30 @@ static long hfi1_file_ioctl(struct file static ssize_t hfi1_write_iter(struct kiocb *kiocb, struct iov_iter *from) { struct hfi1_filedata *fd = kiocb->ki_filp->private_data; - struct hfi1_user_sdma_pkt_q *pq = fd->pq; + struct hfi1_user_sdma_pkt_q *pq; struct hfi1_user_sdma_comp_q *cq = fd->cq; int done = 0, reqs = 0; unsigned long dim = from->nr_segs; + int idx;
- if (!cq || !pq) + idx = srcu_read_lock(&fd->pq_srcu); + pq = srcu_dereference(fd->pq, &fd->pq_srcu); + if (!cq || !pq) { + srcu_read_unlock(&fd->pq_srcu, idx); return -EIO; + }
- if (!iter_is_iovec(from) || !dim) + if (!iter_is_iovec(from) || !dim) { + srcu_read_unlock(&fd->pq_srcu, idx); return -EINVAL; + }
trace_hfi1_sdma_request(fd->dd, fd->uctxt->ctxt, fd->subctxt, dim);
- if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) + if (atomic_read(&pq->n_reqs) == pq->n_max_reqs) { + srcu_read_unlock(&fd->pq_srcu, idx); return -ENOSPC; + }
while (dim) { int ret; @@ -333,6 +343,7 @@ static ssize_t hfi1_write_iter(struct ki reqs++; }
+ srcu_read_unlock(&fd->pq_srcu, idx); return reqs; }
@@ -707,6 +718,7 @@ done: if (atomic_dec_and_test(&dd->user_refcount)) complete(&dd->user_comp);
+ cleanup_srcu_struct(&fdata->pq_srcu); kfree(fdata); return 0; } --- a/drivers/infiniband/hw/hfi1/hfi.h +++ b/drivers/infiniband/hw/hfi1/hfi.h @@ -1436,10 +1436,13 @@ struct mmu_rb_handler;
/* Private data for file operations */ struct hfi1_filedata { + struct srcu_struct pq_srcu; struct hfi1_devdata *dd; struct hfi1_ctxtdata *uctxt; struct hfi1_user_sdma_comp_q *cq; - struct hfi1_user_sdma_pkt_q *pq; + /* update side lock for SRCU */ + spinlock_t pq_rcu_lock; + struct hfi1_user_sdma_pkt_q __rcu *pq; u16 subctxt; /* for cpu affinity; -1 if none */ int rec_cpu_num; --- a/drivers/infiniband/hw/hfi1/user_exp_rcv.c +++ b/drivers/infiniband/hw/hfi1/user_exp_rcv.c @@ -87,9 +87,6 @@ int hfi1_user_exp_rcv_init(struct hfi1_f { int ret = 0;
- spin_lock_init(&fd->tid_lock); - spin_lock_init(&fd->invalid_lock); - fd->entry_to_rb = kcalloc(uctxt->expected_count, sizeof(struct rb_node *), GFP_KERNEL); --- a/drivers/infiniband/hw/hfi1/user_sdma.c +++ b/drivers/infiniband/hw/hfi1/user_sdma.c @@ -179,7 +179,6 @@ int hfi1_user_sdma_alloc_queues(struct h pq = kzalloc(sizeof(*pq), GFP_KERNEL); if (!pq) return -ENOMEM; - pq->dd = dd; pq->ctxt = uctxt->ctxt; pq->subctxt = fd->subctxt; @@ -236,7 +235,7 @@ int hfi1_user_sdma_alloc_queues(struct h goto pq_mmu_fail; }
- fd->pq = pq; + rcu_assign_pointer(fd->pq, pq); fd->cq = cq;
return 0; @@ -264,8 +263,14 @@ int hfi1_user_sdma_free_queues(struct hf
trace_hfi1_sdma_user_free_queues(uctxt->dd, uctxt->ctxt, fd->subctxt);
- pq = fd->pq; + spin_lock(&fd->pq_rcu_lock); + pq = srcu_dereference_check(fd->pq, &fd->pq_srcu, + lockdep_is_held(&fd->pq_rcu_lock)); if (pq) { + rcu_assign_pointer(fd->pq, NULL); + spin_unlock(&fd->pq_rcu_lock); + synchronize_srcu(&fd->pq_srcu); + /* at this point there can be no more new requests */ if (pq->handler) hfi1_mmu_rb_unregister(pq->handler); iowait_sdma_drain(&pq->busy); @@ -277,7 +282,8 @@ int hfi1_user_sdma_free_queues(struct hf kfree(pq->req_in_use); kmem_cache_destroy(pq->txreq_cache); kfree(pq); - fd->pq = NULL; + } else { + spin_unlock(&fd->pq_rcu_lock); } if (fd->cq) { vfree(fd->cq->comps); @@ -321,7 +327,8 @@ int hfi1_user_sdma_process_request(struc { int ret = 0, i; struct hfi1_ctxtdata *uctxt = fd->uctxt; - struct hfi1_user_sdma_pkt_q *pq = fd->pq; + struct hfi1_user_sdma_pkt_q *pq = + srcu_dereference(fd->pq, &fd->pq_srcu); struct hfi1_user_sdma_comp_q *cq = fd->cq; struct hfi1_devdata *dd = pq->dd; unsigned long idx = 0;
From: Kaike Wan kaike.wan@intel.com
commit f92e48718889b3d49cee41853402aa88cac84a6b upstream.
When the hfi1 device is shut down during a system reboot, it is possible that some QPs might not have been freed by the ULPs. More requests could then be posted for sending, and a lingering timer could be triggered to schedule more packet sends, leading to a crash:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000102 IP: [ffffffff810a65f2] __queue_work+0x32/0x3c0 PGD 0 Oops: 0000 1 SMP Modules linked in: nvmet_rdma(OE) nvmet(OE) nvme(OE) dm_round_robin nvme_rdma(OE) nvme_fabrics(OE) nvme_core(OE) pal_raw(POE) pal_pmt(POE) pal_cache(POE) pal_pile(POE) pal(POE) pal_compatible(OE) rpcrdma sunrpc ib_isert iscsi_target_mod target_core_mod ib_iser libiscsi scsi_transport_iscsi ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm mlx4_ib sb_edac edac_core intel_powerclamp coretemp intel_rapl iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support mxm_wmi ipmi_ssif pcspkr ses enclosure joydev scsi_transport_sas i2c_i801 sg mei_me lpc_ich mei ioatdma shpchp ipmi_si ipmi_devintf ipmi_msghandler wmi acpi_power_meter acpi_pad dm_multipath hangcheck_timer ip_tables ext4 mbcache jbd2 mlx4_en sd_mod crc_t10dif crct10dif_generic mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm mlx4_core crct10dif_pclmul crct10dif_common hfi1(OE) igb crc32c_intel rdmavt(OE) ahci ib_core libahci libata ptp megaraid_sas pps_core dca i2c_algo_bit i2c_core devlink dm_mirror dm_region_hash dm_log dm_mod CPU: 23 PID: 0 Comm: swapper/23 Tainted: P OE ------------ 3.10.0-693.el7.x86_64 #1 Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0028.121720182203 12/17/2018 task: ffff8808f4ec4f10 ti: ffff8808f4ed8000 task.ti: ffff8808f4ed8000 RIP: 0010:[ffffffff810a65f2] [ffffffff810a65f2] __queue_work+0x32/0x3c0 RSP: 0018:ffff88105df43d48 EFLAGS: 00010046 RAX: 0000000000000086 RBX: 0000000000000086 RCX: 0000000000000000 RDX: ffff880f74e758b0 RSI: 0000000000000000 RDI: 000000000000001f RBP: ffff88105df43d80 R08: ffff8808f3c583c8 R09: ffff8808f3c58000 R10: 0000000000000002 R11: ffff88105df43da8 R12: ffff880f74e758b0 R13: 000000000000001f R14: 0000000000000000 R15: ffff88105a300000 FS: 0000000000000000(0000) GS:ffff88105df40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000102 CR3: 00000000019f2000 CR4: 00000000001407e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Stack: ffff88105b6dd708 0000001f00000286 0000000000000086 ffff88105a300000 ffff880f74e75800 0000000000000000 ffff88105a300000 ffff88105df43d98 ffffffff810a6b85 ffff88105a301e80 ffff88105df43dc8 ffffffffc0224cde Call Trace: IRQ
[ffffffff810a6b85] queue_work_on+0x45/0x50 [ffffffffc0224cde] _hfi1_schedule_send+0x6e/0xc0 [hfi1] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffffc0224d62] hfi1_schedule_send+0x32/0x70 [hfi1] [ffffffffc0170644] rvt_rc_timeout+0xd4/0x120 [rdmavt] [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffff81097316] call_timer_fn+0x36/0x110 [ffffffffc0170570] ? get_map_page+0x60/0x60 [rdmavt] [ffffffff8109982d] run_timer_softirq+0x22d/0x310 [ffffffff81090b3f] __do_softirq+0xef/0x280 [ffffffff816b6a5c] call_softirq+0x1c/0x30 [ffffffff8102d3c5] do_softirq+0x65/0xa0 [ffffffff81090ec5] irq_exit+0x105/0x110 [ffffffff816b76c2] smp_apic_timer_interrupt+0x42/0x50 [ffffffff816b5c1d] apic_timer_interrupt+0x6d/0x80 EOI
[ffffffff81527a02] ? cpuidle_enter_state+0x52/0xc0 [ffffffff81527b48] cpuidle_idle_call+0xd8/0x210 [ffffffff81034fee] arch_cpu_idle+0xe/0x30 [ffffffff810e7bca] cpu_startup_entry+0x14a/0x1c0 [ffffffff81051af6] start_secondary+0x1b6/0x230 Code: 89 e5 41 57 41 56 49 89 f6 41 55 41 89 fd 41 54 49 89 d4 53 48 83 ec 10 89 7d d4 9c 58 0f 1f 44 00 00 f6 c4 02 0f 85 be 02 00 00 41 f6 86 02 01 00 00 01 0f 85 58 02 00 00 49 c7 c7 28 19 01 00 RIP [ffffffff810a65f2] __queue_work+0x32/0x3c0 RSP ffff88105df43d48 CR2: 0000000000000102
The solution is to reset the QPs before the device resources are freed. This reset will change the QP state to prevent post sends and delete timers to prevent callbacks.
Fixes: 0acb0cc7ecc1 ("IB/rdmavt: Initialize and teardown of qpn table") Link: https://lore.kernel.org/r/20200210131040.87408.38161.stgit@awfm-01.aw.intel.... Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Kaike Wan kaike.wan@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/sw/rdmavt/qp.c | 84 +++++++++++++++++++++++--------------- 1 file changed, 51 insertions(+), 33 deletions(-)
--- a/drivers/infiniband/sw/rdmavt/qp.c +++ b/drivers/infiniband/sw/rdmavt/qp.c @@ -61,6 +61,8 @@ #define RVT_RWQ_COUNT_THRESHOLD 16
static void rvt_rc_timeout(struct timer_list *t); +static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, + enum ib_qp_type type);
/* * Convert the AETH RNR timeout code into the number of microseconds. @@ -452,40 +454,41 @@ no_qp_table: }
/** - * free_all_qps - check for QPs still in use + * rvt_free_qp_cb - callback function to reset a qp + * @qp: the qp to reset + * @v: a 64-bit value + * + * This function resets the qp and removes it from the + * qp hash table. + */ +static void rvt_free_qp_cb(struct rvt_qp *qp, u64 v) +{ + unsigned int *qp_inuse = (unsigned int *)v; + struct rvt_dev_info *rdi = ib_to_rvt(qp->ibqp.device); + + /* Reset the qp and remove it from the qp hash list */ + rvt_reset_qp(rdi, qp, qp->ibqp.qp_type); + + /* Increment the qp_inuse count */ + (*qp_inuse)++; +} + +/** + * rvt_free_all_qps - check for QPs still in use * @rdi: rvt device info structure * * There should not be any QPs still in use. * Free memory for table. + * Return the number of QPs still in use. */ static unsigned rvt_free_all_qps(struct rvt_dev_info *rdi) { - unsigned long flags; - struct rvt_qp *qp; - unsigned n, qp_inuse = 0; - spinlock_t *ql; /* work around too long line below */ - - if (rdi->driver_f.free_all_qps) - qp_inuse = rdi->driver_f.free_all_qps(rdi); + unsigned int qp_inuse = 0;
qp_inuse += rvt_mcast_tree_empty(rdi);
- if (!rdi->qp_dev) - return qp_inuse; + rvt_qp_iter(rdi, (u64)&qp_inuse, rvt_free_qp_cb);
- ql = &rdi->qp_dev->qpt_lock; - spin_lock_irqsave(ql, flags); - for (n = 0; n < rdi->qp_dev->qp_table_size; n++) { - qp = rcu_dereference_protected(rdi->qp_dev->qp_table[n], - lockdep_is_held(ql)); - RCU_INIT_POINTER(rdi->qp_dev->qp_table[n], NULL); - - for (; qp; qp = rcu_dereference_protected(qp->next, - lockdep_is_held(ql))) - qp_inuse++; - } - spin_unlock_irqrestore(ql, flags); - synchronize_rcu(); return qp_inuse; }
@@ -902,14 +905,14 @@ static void rvt_init_qp(struct rvt_dev_i }
/** - * rvt_reset_qp - initialize the QP state to the reset state + * _rvt_reset_qp - initialize the QP state to the reset state * @qp: the QP to reset * @type: the QP type * * r_lock, s_hlock, and s_lock are required to be held by the caller */ -static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, - enum ib_qp_type type) +static void _rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, + enum ib_qp_type type) __must_hold(&qp->s_lock) __must_hold(&qp->s_hlock) __must_hold(&qp->r_lock) @@ -955,6 +958,27 @@ static void rvt_reset_qp(struct rvt_dev_ lockdep_assert_held(&qp->s_lock); }
+/** + * rvt_reset_qp - initialize the QP state to the reset state + * @rdi: the device info + * @qp: the QP to reset + * @type: the QP type + * + * This is the wrapper function to acquire the r_lock, s_hlock, and s_lock + * before calling _rvt_reset_qp(). + */ +static void rvt_reset_qp(struct rvt_dev_info *rdi, struct rvt_qp *qp, + enum ib_qp_type type) +{ + spin_lock_irq(&qp->r_lock); + spin_lock(&qp->s_hlock); + spin_lock(&qp->s_lock); + _rvt_reset_qp(rdi, qp, type); + spin_unlock(&qp->s_lock); + spin_unlock(&qp->s_hlock); + spin_unlock_irq(&qp->r_lock); +} + /** rvt_free_qpn - Free a qpn from the bit map * @qpt: QP table * @qpn: queue pair number to free @@ -1546,7 +1570,7 @@ int rvt_modify_qp(struct ib_qp *ibqp, st switch (new_state) { case IB_QPS_RESET: if (qp->state != IB_QPS_RESET) - rvt_reset_qp(rdi, qp, ibqp->qp_type); + _rvt_reset_qp(rdi, qp, ibqp->qp_type); break;
case IB_QPS_RTR: @@ -1695,13 +1719,7 @@ int rvt_destroy_qp(struct ib_qp *ibqp, s struct rvt_qp *qp = ibqp_to_rvtqp(ibqp); struct rvt_dev_info *rdi = ib_to_rvt(ibqp->device);
- spin_lock_irq(&qp->r_lock); - spin_lock(&qp->s_hlock); - spin_lock(&qp->s_lock); rvt_reset_qp(rdi, qp, ibqp->qp_type); - spin_unlock(&qp->s_lock); - spin_unlock(&qp->s_hlock); - spin_unlock_irq(&qp->r_lock);
wait_event(qp->wait, !atomic_read(&qp->refcount)); /* qpn is now available for use again */
From: Yonatan Cohen yonatanc@mellanox.com
commit 9ea04d0df6e6541c6736b43bff45f1e54875a1db upstream.
When disassociating a device from umad we must ensure that sysfs access is prevented before blocking the fops, otherwise assumptions in sysfs don't hold:
CPU0 CPU1 ib_umad_kill_port() ibdev_show() port->ib_dev = NULL dev_name(port->ib_dev)
The prior patch made an error in moving the device_destroy(); it should have been split into device_del() (above) and put_device() (below). At this point we already have the split, so move the device_del() back to its original place.
kernel stack PF: error_code(0x0000) - not-present page Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI RIP: 0010:ibdev_show+0x18/0x50 [ib_umad] RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000 RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870 RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000 R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40 R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58 FS: 00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0 Call Trace: dev_attr_show+0x15/0x50 sysfs_kf_seq_show+0xb8/0x1a0 seq_read+0x12d/0x350 vfs_read+0x89/0x140 ksys_read+0x55/0xd0 do_syscall_64+0x55/0x1b0 entry_SYSCALL_64_after_hwframe+0x44/0xa9:
Fixes: cf7ad3030271 ("IB/umad: Avoid destroying device while it is accessed") Link: https://lore.kernel.org/r/20200212072635.682689-9-leon@kernel.org Signed-off-by: Yonatan Cohen yonatanc@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Reviewed-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/user_mad.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/core/user_mad.c +++ b/drivers/infiniband/core/user_mad.c @@ -1312,6 +1312,9 @@ static void ib_umad_kill_port(struct ib_ struct ib_umad_file *file; int id;
+ cdev_device_del(&port->sm_cdev, &port->sm_dev); + cdev_device_del(&port->cdev, &port->dev); + mutex_lock(&port->file_mutex);
/* Mark ib_dev NULL and block ioctl or other file ops to progress @@ -1331,8 +1334,6 @@ static void ib_umad_kill_port(struct ib_
mutex_unlock(&port->file_mutex);
- cdev_device_del(&port->sm_cdev, &port->sm_dev); - cdev_device_del(&port->cdev, &port->dev); ida_free(&umad_ida, port->dev_num);
/* balances device_initialize() */
From: Avihai Horon avihaih@mellanox.com
commit a72f4ac1d778f7bde93dfee69bfc23377ec3d74f upstream.
Add a check that the size specified in the flow spec header doesn't cause an overflow when calculating the filter size, thus preventing access to invalid memory. The following crash from syzkaller revealed it:
kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 17834 Comm: syz-executor.3 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:memchr_inv+0xd3/0x330 Code: 89 f9 89 f5 83 e1 07 0f 85 f9 00 00 00 49 89 d5 49 c1 ed 03 45 85 ed 74 6f 48 89 d9 48 b8 00 00 00 00 00 fc ff df 48 c1 e9 03 <80> 3c 01 00 0f 85 0d 02 00 00 44 0f b6 e5 48 b8 01 01 01 01 01 01 RSP: 0018:ffffc9000a13fa50 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 7fff88810de9d820 RCX: 0ffff11021bd3b04 RDX: 000000000000fff8 RSI: 0000000000000000 RDI: 7fff88810de9d820 RBP: 0000000000000000 R08: ffff888110d69018 R09: 0000000000000009 R10: 0000000000000001 R11: ffffed10236267cc R12: 0000000000000004 R13: 0000000000001fff R14: ffff88810de9d820 R15: 0000000000000040 FS: 00007f9ee0e51700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000000115ea0006 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: spec_filter_size.part.16+0x34/0x50 ib_uverbs_kern_spec_to_ib_spec_filter+0x691/0x770 ib_uverbs_ex_create_flow+0x9ea/0x1b40 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9 RIP: 0033:0x465b49 Code: f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f9ee0e50c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000073bf00 RCX: 0000000000465b49 RDX: 00000000000003a0 RSI: 00000000200007c0 RDI: 0000000000000004 RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007f9ee0e516bc R13: 00000000004ca2da R14: 000000000070deb8 R15: 00000000ffffffff Modules linked in: Dumping ftrace buffer: (ftrace buffer empty)
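A standalone userspace sketch of the guard the patch adds (sizes are hypothetical; __builtin_sub_overflow() stands in for what the kernel's check_sub_overflow() maps to on compilers that provide the builtin):

#include <stdio.h>
#include <stddef.h>

#define SPEC_HDR_SIZE 8u        /* stand-in for sizeof(struct ib_uverbs_flow_spec_hdr) */

int main(void)
{
        size_t user_size = 4;   /* attacker-chosen, smaller than the header */
        size_t filter_sz;

        /* A wrap-around here means the user-supplied header size was bogus
         * and must be rejected instead of being turned into a huge filter
         * length.
         */
        if (__builtin_sub_overflow(user_size, (size_t)SPEC_HDR_SIZE, &filter_sz)) {
                puts("rejected: size would underflow");
                return 1;
        }
        printf("filter size: %zu\n", filter_sz / 2);    /* half mask, half value */
        return 0;
}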
Fixes: 94e03f11ad1f ("IB/uverbs: Add support for flow tag") Link: https://lore.kernel.org/r/20200126171500.4623-1-leon@kernel.org Signed-off-by: Avihai Horon avihaih@mellanox.com Reviewed-by: Maor Gottlieb maorg@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/uverbs_cmd.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/drivers/infiniband/core/uverbs_cmd.c +++ b/drivers/infiniband/core/uverbs_cmd.c @@ -2720,12 +2720,6 @@ static int kern_spec_to_ib_spec_action(s return 0; }
-static size_t kern_spec_filter_sz(const struct ib_uverbs_flow_spec_hdr *spec) -{ - /* Returns user space filter size, includes padding */ - return (spec->size - sizeof(struct ib_uverbs_flow_spec_hdr)) / 2; -} - static ssize_t spec_filter_size(const void *kern_spec_filter, u16 kern_filter_size, u16 ib_real_filter_sz) { @@ -2869,11 +2863,16 @@ int ib_uverbs_kern_spec_to_ib_spec_filte static int kern_spec_to_ib_spec_filter(struct ib_uverbs_flow_spec *kern_spec, union ib_flow_spec *ib_spec) { - ssize_t kern_filter_sz; + size_t kern_filter_sz; void *kern_spec_mask; void *kern_spec_val;
- kern_filter_sz = kern_spec_filter_sz(&kern_spec->hdr); + if (check_sub_overflow((size_t)kern_spec->hdr.size, + sizeof(struct ib_uverbs_flow_spec_hdr), + &kern_filter_sz)) + return -EINVAL; + + kern_filter_sz /= 2;
kern_spec_val = (void *)kern_spec + sizeof(struct ib_uverbs_flow_spec_hdr);
From: Krishnamraju Eraparaju krishna2@chelsio.com
commit d219face9059f38ad187bde133451a2a308fdb7c upstream.
As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, always initiate a CLOSE when entering into TERM state.
In c4iw_modify_qp(), the disconnect operation should only be performed when the modify_qp call is invoked from ib_core. All other internal modify_qp calls (invoked within iw_cxgb4) that need a disconnect should call c4iw_ep_disconnect() explicitly after modify_qp. Otherwise, deadlocks like the one below can occur:
Call Trace: schedule+0x2f/0xa0 schedule_preempt_disabled+0xa/0x10 __mutex_lock.isra.5+0x2d0/0x4a0 c4iw_ep_disconnect+0x39/0x430 => tries to reacquire ep lock again c4iw_modify_qp+0x468/0x10d0 rx_data+0x218/0x570 => acquires ep lock process_work+0x5f/0x70 process_one_work+0x1a7/0x3b0 worker_thread+0x30/0x390 kthread+0x112/0x130 ret_from_fork+0x35/0x40
Fixes: d2c33370ae73 ("RDMA/iw_cxgb4: Always disconnect when QP is transitioning to TERMINATE state") Link: https://lore.kernel.org/r/20200204091230.7210-1-krishna2@chelsio.com Signed-off-by: Krishnamraju Eraparaju krishna2@chelsio.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/cxgb4/cm.c | 4 ++++ drivers/infiniband/hw/cxgb4/qp.c | 4 ++-- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/infiniband/hw/cxgb4/cm.c +++ b/drivers/infiniband/hw/cxgb4/cm.c @@ -3036,6 +3036,10 @@ static int terminate(struct c4iw_dev *de C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); }
+ /* As per draft-hilland-iwarp-verbs-v1.0, sec 6.2.3, + * when entering the TERM state the RNIC MUST initiate a CLOSE. + */ + c4iw_ep_disconnect(ep, 1, GFP_KERNEL); c4iw_put_ep(&ep->com); } else pr_warn("TERM received tid %u no ep/qp\n", tid); --- a/drivers/infiniband/hw/cxgb4/qp.c +++ b/drivers/infiniband/hw/cxgb4/qp.c @@ -1948,10 +1948,10 @@ int c4iw_modify_qp(struct c4iw_dev *rhp, qhp->attr.layer_etype = attrs->layer_etype; qhp->attr.ecode = attrs->ecode; ep = qhp->ep; - c4iw_get_ep(&ep->com); - disconnect = 1; if (!internal) { + c4iw_get_ep(&ep->com); terminate = 1; + disconnect = 1; } else { terminate = qhp->attr.send_term; ret = rdma_fini(rhp, qhp, ep);
From: Kamal Heib kamalheib1@gmail.com
commit 8a4f300b978edbbaa73ef9eca660e45eb9f13873 upstream.
Make sure to free the allocated cpumask_var_t's to avoid the following memory leak reported by kmemleak:
$ cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8897f812d6a8 (size 8): comm "kworker/1:1", pid 347, jiffies 4294751400 (age 101.703s) hex dump (first 8 bytes): 00 00 00 00 00 00 00 00 ........ backtrace: [<00000000bff49664>] alloc_cpumask_var_node+0x4c/0xb0 [<0000000075d3ca81>] hfi1_comp_vectors_set_up+0x20f/0x800 [hfi1] [<0000000098d420df>] hfi1_init_dd+0x3311/0x4960 [hfi1] [<0000000071be7e52>] init_one+0x25e/0xf10 [hfi1] [<000000005483d4c2>] local_pci_probe+0xd4/0x180 [<000000007c3cbc6e>] work_for_cpu_fn+0x51/0xa0 [<000000001d626905>] process_one_work+0x8f0/0x17b0 [<000000007e569e7e>] worker_thread+0x536/0xb50 [<00000000fd39a4a5>] kthread+0x30c/0x3d0 [<0000000056f2edb3>] ret_from_fork+0x3a/0x50
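A hedged, kernel-style sketch (hypothetical function name) of the alloc/free pattern the fix restores: the temporary cpumasks must be released on the success path, not only on the error path.

#include <linux/cpumask.h>
#include <linux/gfp.h>

static int build_comp_vect_mapping(void)
{
        cpumask_var_t available, non_intr;
        int ret = 0;

        if (!zalloc_cpumask_var(&available, GFP_KERNEL))
                return -ENOMEM;
        if (!zalloc_cpumask_var(&non_intr, GFP_KERNEL)) {
                ret = -ENOMEM;
                goto free_available;
        }

        /* ... compute the completion vector to CPU mapping ... */

        free_cpumask_var(non_intr);     /* previously leaked on success */
free_available:
        free_cpumask_var(available);
        return ret;
}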
Fixes: 5d18ee67d4c1 ("IB/{hfi1, rdmavt, qib}: Implement CQ completion vector support") Link: https://lore.kernel.org/r/20200205110530.12129-1-kamalheib1@gmail.com Signed-off-by: Kamal Heib kamalheib1@gmail.com Reviewed-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/hw/hfi1/affinity.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/infiniband/hw/hfi1/affinity.c +++ b/drivers/infiniband/hw/hfi1/affinity.c @@ -479,6 +479,8 @@ static int _dev_comp_vect_mappings_creat rvt_get_ibdev_name(&(dd)->verbs_dev.rdi), i, cpu); }
+ free_cpumask_var(available_cpus); + free_cpumask_var(non_intr_cpus); return 0;
fail:
From: Zhu Yanjun yanjunz@mellanox.com
commit 8ac0e6641c7ca14833a2a8c6f13d8e0a435e535c upstream.
When running stress tests with RXE, call traces like the following often occur:
watchdog: BUG: soft lockup - CPU#2 stuck for 22s! [swapper/2:0] ... Call Trace: <IRQ> create_object+0x3f/0x3b0 kmem_cache_alloc_node_trace+0x129/0x2d0 __kmalloc_reserve.isra.52+0x2e/0x80 __alloc_skb+0x83/0x270 rxe_init_packet+0x99/0x150 [rdma_rxe] rxe_requester+0x34e/0x11a0 [rdma_rxe] rxe_do_task+0x85/0xf0 [rdma_rxe] tasklet_action_common.isra.21+0xeb/0x100 __do_softirq+0xd0/0x298 irq_exit+0xc5/0xd0 smp_apic_timer_interrupt+0x68/0x120 apic_timer_interrupt+0xf/0x20 </IRQ> ...
The root cause is that a tasklet is actually a softirq. In a tasklet handler, another softirq handler is triggered. Usually these softirq handlers run on the same CPU core, so this causes the soft lockup bug.
Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://lore.kernel.org/r/20200212072635.682689-8-leon@kernel.org Signed-off-by: Zhu Yanjun yanjunz@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/sw/rxe/rxe_comp.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/infiniband/sw/rxe/rxe_comp.c +++ b/drivers/infiniband/sw/rxe/rxe_comp.c @@ -329,7 +329,7 @@ static inline enum comp_state check_ack( qp->comp.psn = pkt->psn; if (qp->req.wait_psn) { qp->req.wait_psn = 0; - rxe_run_task(&qp->req.task, 1); + rxe_run_task(&qp->req.task, 0); } } return COMPST_ERROR_RETRY; @@ -463,7 +463,7 @@ static void do_complete(struct rxe_qp *q */ if (qp->req.wait_fence) { qp->req.wait_fence = 0; - rxe_run_task(&qp->req.task, 1); + rxe_run_task(&qp->req.task, 0); } }
@@ -479,7 +479,7 @@ static inline enum comp_state complete_a if (qp->req.need_rd_atomic) { qp->comp.timeout_retry = 0; qp->req.need_rd_atomic = 0; - rxe_run_task(&qp->req.task, 1); + rxe_run_task(&qp->req.task, 0); } }
@@ -725,7 +725,7 @@ int rxe_completer(void *arg) RXE_CNT_COMP_RETRY); qp->req.need_retry = 1; qp->comp.started_retry = 1; - rxe_run_task(&qp->req.task, 1); + rxe_run_task(&qp->req.task, 0); }
if (pkt) {
From: Leon Romanovsky leonro@mellanox.com
commit 1dd017882e01d2fcd9c5dbbf1eb376211111c393 upstream.
We don't need to mark the pkey as valid when the user set only one of the pkey index or the port number; otherwise it will result in a NULL pointer dereference while accessing the uninitialized pkey list. The following crash from Syzkaller revealed it.
kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: 0000 [#1] SMP KASAN PTI CPU: 1 PID: 14753 Comm: syz-executor.2 Not tainted 5.5.0-rc5 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 RIP: 0010:get_pkey_idx_qp_list+0x161/0x2d0 Code: 01 00 00 49 8b 5e 20 4c 39 e3 0f 84 b9 00 00 00 e8 e4 42 6e fe 48 8d 7b 10 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48 c1 ea 03 <0f> b6 04 02 84 c0 74 08 3c 01 0f 8e d0 00 00 00 48 8d 7d 04 48 b8 RSP: 0018:ffffc9000bc6f950 EFLAGS: 00010202 RAX: dffffc0000000000 RBX: 0000000000000000 RCX: ffffffff82c8bdec RDX: 0000000000000002 RSI: ffffc900030a8000 RDI: 0000000000000010 RBP: ffff888112c8ce80 R08: 0000000000000004 R09: fffff5200178df1f R10: 0000000000000001 R11: fffff5200178df1f R12: ffff888115dc4430 R13: ffff888115da8498 R14: ffff888115dc4410 R15: ffff888115da8000 FS: 00007f20777de700(0000) GS:ffff88811b100000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2f721000 CR3: 00000001173ca002 CR4: 0000000000360ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: port_pkey_list_insert+0xd7/0x7c0 ib_security_modify_qp+0x6fa/0xfc0 _ib_modify_qp+0x8c4/0xbf0 modify_qp+0x10da/0x16d0 ib_uverbs_modify_qp+0x9a/0x100 ib_uverbs_write+0xaa5/0xdf0 __vfs_write+0x7c/0x100 vfs_write+0x168/0x4a0 ksys_write+0xc8/0x200 do_syscall_64+0x9c/0x390 entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: d291f1a65232 ("IB/core: Enforce PKey security on QPs") Link: https://lore.kernel.org/r/20200212080651.GB679970@unreal Signed-off-by: Maor Gottlieb maorg@mellanox.com Signed-off-by: Leon Romanovsky leonro@mellanox.com Message-Id: 20200212080651.GB679970@unreal Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/infiniband/core/security.c | 24 +++++++++--------------- 1 file changed, 9 insertions(+), 15 deletions(-)
--- a/drivers/infiniband/core/security.c +++ b/drivers/infiniband/core/security.c @@ -339,22 +339,16 @@ static struct ib_ports_pkeys *get_new_pp if (!new_pps) return NULL;
- if (qp_attr_mask & (IB_QP_PKEY_INDEX | IB_QP_PORT)) { - if (!qp_pps) { - new_pps->main.port_num = qp_attr->port_num; - new_pps->main.pkey_index = qp_attr->pkey_index; - } else { - new_pps->main.port_num = (qp_attr_mask & IB_QP_PORT) ? - qp_attr->port_num : - qp_pps->main.port_num; - - new_pps->main.pkey_index = - (qp_attr_mask & IB_QP_PKEY_INDEX) ? - qp_attr->pkey_index : - qp_pps->main.pkey_index; - } + if (qp_attr_mask & IB_QP_PORT) + new_pps->main.port_num = + (qp_pps) ? qp_pps->main.port_num : qp_attr->port_num; + if (qp_attr_mask & IB_QP_PKEY_INDEX) + new_pps->main.pkey_index = (qp_pps) ? qp_pps->main.pkey_index : + qp_attr->pkey_index; + if ((qp_attr_mask & IB_QP_PKEY_INDEX) && (qp_attr_mask & IB_QP_PORT)) new_pps->main.state = IB_PORT_PKEY_VALID; - } else if (qp_pps) { + + if (!(qp_attr_mask & (IB_QP_PKEY_INDEX || IB_QP_PORT)) && qp_pps) { new_pps->main.port_num = qp_pps->main.port_num; new_pps->main.pkey_index = qp_pps->main.pkey_index; if (qp_pps->main.state != IB_PORT_PKEY_NOT_VALID)
From: Nathan Chancellor natechancellor@gmail.com
commit 0f8a206df7c920150d2aa45574fba0ab7ff6be4f upstream.
Clang warns:
In file included from ../arch/s390/boot/startup.c:3: In file included from ../include/linux/elf.h:5: In file included from ../arch/s390/include/asm/elf.h:132: In file included from ../include/linux/compat.h:10: In file included from ../include/linux/time.h:74: In file included from ../include/linux/time32.h:13: In file included from ../include/linux/timex.h:65: ../arch/s390/include/asm/timex.h:160:20: warning: passing 'unsigned char [16]' to parameter of type 'char *' converts between pointers to integer types with different sign [-Wpointer-sign] get_tod_clock_ext(clk); ^~~ ../arch/s390/include/asm/timex.h:149:44: note: passing argument to parameter 'clk' here static inline void get_tod_clock_ext(char *clk) ^
Change clk's type to just be char so that it matches what happens in get_tod_clock_ext.
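A minimal standalone illustration of the -Wpointer-sign mismatch (hypothetical names, not the s390 code): passing an unsigned char buffer to a function taking char * is what trips the warning quoted above, and matching the types silences it.

#include <stdio.h>

/* Build with: cc -Wpointer-sign example.c */
static void get_clock_ext(char *clk)   /* parameter type mirrors get_tod_clock_ext() */
{
        clk[0] = 0;
}

int main(void)
{
        char clk[16];   /* declaring this as "unsigned char clk[16]" would trip
                         * -Wpointer-sign at the call below, which is the warning
                         * the patch silences */

        get_clock_ext(clk);
        printf("%d\n", clk[0]);
        return 0;
}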
Fixes: 57b28f66316d ("[S390] s390_hypfs: Add new attributes") Link: https://github.com/ClangBuiltLinux/linux/issues/861 Link: http://lkml.kernel.org/r/20200208140858.47970-1-natechancellor@gmail.com Reviewed-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/s390/include/asm/timex.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/s390/include/asm/timex.h +++ b/arch/s390/include/asm/timex.h @@ -155,7 +155,7 @@ static inline void get_tod_clock_ext(cha
static inline unsigned long long get_tod_clock(void) { - unsigned char clk[STORE_CLOCK_EXT_SIZE]; + char clk[STORE_CLOCK_EXT_SIZE];
get_tod_clock_ext(clk); return *((unsigned long long *)&clk[1]);
From: Luca Weiss luca@z3ntu.xyz
commit fbd1ec000213c8b457dd4fb15b6de9ba02ec5482 upstream.
The is_visible function is expected to return the permissions associated with an attribute of the sysfs group, or 0 if the attribute should not be visible.
Change the code to return the attribute permissions when the attribute should be visible, which resolves the warning:
Attribute calibrate: Invalid permissions 01
Fixes: cc12ba1872c6 ("Input: ili210x - optionally show calibrate sysfs attribute") Signed-off-by: Luca Weiss luca@z3ntu.xyz Reviewed-by: Sven Van Asbroeck TheSven73@gmail.com Link: https://lore.kernel.org/r/20200209145628.649409-1-luca@z3ntu.xyz Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/input/touchscreen/ili210x.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/input/touchscreen/ili210x.c +++ b/drivers/input/touchscreen/ili210x.c @@ -321,7 +321,7 @@ static umode_t ili210x_calibrate_visible struct i2c_client *client = to_i2c_client(dev); struct ili210x *priv = i2c_get_clientdata(client);
- return priv->chip->has_calibrate_reg; + return priv->chip->has_calibrate_reg ? attr->mode : 0; }
static const struct attribute_group ili210x_attr_group = {
From: Qais Yousef qais.yousef@arm.com
commit b562d140649966d4daedd0483a8fe59ad3bb465a upstream.
The check to ensure that the newly written value for cpu.uclamp.{min,max} is within the [0:100] range wasn't working because of the signed comparison:
7301 if (req.percent > UCLAMP_PERCENT_SCALE) { 7302 req.ret = -ERANGE; 7303 return req; 7304 }
# echo -1 > cpu.uclamp.min # cat cpu.uclamp.min 42949671.96
Cast req.percent into u64 to force the comparison to be unsigned and work as intended in capacity_from_percent().
# echo -1 > cpu.uclamp.min sh: write error: Numerical result out of range
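A standalone sketch of the signed-vs-unsigned pitfall (the UCLAMP_PERCENT_SCALE value is assumed here for illustration):

#include <stdio.h>

#define UCLAMP_PERCENT_SCALE 10000LL    /* assumed: 100.00% with two fractional digits */

int main(void)
{
        long long percent = -1;         /* what "echo -1" ends up parsed as */

        /* Signed comparison: -1 > 10000 is false, so the bogus value slips
         * through the range check.  Casting to an unsigned 64-bit type first
         * turns -1 into a huge value and the check rejects it, as intended.
         */
        printf("signed check:   %s\n",
               percent > UCLAMP_PERCENT_SCALE ? "rejected" : "accepted");
        printf("unsigned check: %s\n",
               (unsigned long long)percent > (unsigned long long)UCLAMP_PERCENT_SCALE ?
               "rejected" : "accepted");
        return 0;
}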
Fixes: 2480c093130f ("sched/uclamp: Extend CPU's cgroup controller") Signed-off-by: Qais Yousef qais.yousef@arm.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lkml.kernel.org/r/20200114210947.14083-1-qais.yousef@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- kernel/sched/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -7260,7 +7260,7 @@ capacity_from_percent(char *buf) &req.percent); if (req.ret) return req; - if (req.percent > UCLAMP_PERCENT_SCALE) { + if ((u64)req.percent > UCLAMP_PERCENT_SCALE) { req.ret = -ERANGE; return req; }
From: Johannes Berg johannes.berg@intel.com
commit f2b18baca9539c6a3116d48b70972c7a2ba5d766 upstream.
It turns out that this wasn't a good idea, I hit a test failure in hwsim due to this. That particular failure was easily worked around, but it raised questions: if an AP needs to, for example, send action frames to each connected station, the current limit is nowhere near enough (especially if those stations are sleeping and the frames are queued for a while.)
Shuffle around some bits to make more room for ack_frame_id to allow up to 8192 queued-up frames; that's enough for queueing 4 frames to each connected station, even at the maximum of 2007 stations on a single AP.
We take the bits from band (which currently needs only 2, but I leave 3 in case we add another band) and from hw_queue, which needs only 4 since it has a limit of 16 queues.
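A hedged standalone sketch (layout illustrative only, not the real struct ieee80211_tx_info) of how the shuffled bitfields add up:

#include <stdio.h>

struct tx_info_bits {
        unsigned int band:3,
                     ack_frame_id:13,
                     hw_queue:4,
                     tx_time_est:10;    /* 3 + 13 + 4 + 10 = 30 bits, 2 spare */
};

int main(void)
{
        printf("distinct ack ids: %u\n", 1u << 13);     /* 8192 */
        printf("sizeof(bitfield): %zu bytes\n", sizeof(struct tx_info_bits));
        return 0;
}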
Fixes: 6912daed05e1 ("mac80211: Shrink the size of ack_frame_id to make room for tx_time_est") Signed-off-by: Johannes Berg johannes.berg@intel.com Acked-by: Toke Høiland-Jørgensen toke@redhat.com Link: https://lore.kernel.org/r/20200115122549.b9a4ef9f4980.Ied52ed90150220b83a280... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- include/net/mac80211.h | 11 +++++------ net/mac80211/cfg.c | 2 +- net/mac80211/tx.c | 2 +- 3 files changed, 7 insertions(+), 8 deletions(-)
--- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -1004,12 +1004,11 @@ ieee80211_rate_get_vht_nss(const struct struct ieee80211_tx_info { /* common information */ u32 flags; - u8 band; - - u8 hw_queue; - - u16 ack_frame_id:6; - u16 tx_time_est:10; + u32 band:3, + ack_frame_id:13, + hw_queue:4, + tx_time_est:10; + /* 2 free bits */
union { struct { --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -3450,7 +3450,7 @@ int ieee80211_attach_ack_skb(struct ieee
spin_lock_irqsave(&local->ack_status_lock, spin_flags); id = idr_alloc(&local->ack_status_frames, ack_skb, - 1, 0x40, GFP_ATOMIC); + 1, 0x2000, GFP_ATOMIC); spin_unlock_irqrestore(&local->ack_status_lock, spin_flags);
if (id < 0) { --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -2442,7 +2442,7 @@ static int ieee80211_store_ack_skb(struc
spin_lock_irqsave(&local->ack_status_lock, flags); id = idr_alloc(&local->ack_status_frames, ack_skb, - 1, 0x40, GFP_ATOMIC); + 1, 0x2000, GFP_ATOMIC); spin_unlock_irqrestore(&local->ack_status_lock, flags);
if (id >= 0) {
From: Stephen Boyd swboyd@chromium.org
commit 2d5a2f913b658a7ae984773a63318ed4daadf4af upstream.
I see the following lockdep splat in the qcom pinctrl driver when attempting to suspend the device.
WARNING: possible recursive locking detected 5.4.11 #3 Tainted: G W -------------------------------------------- cat/3074 is trying to acquire lock: ffffff81f49804c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
but task is already holding lock: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
other info that might help us debug this: Possible unsafe locking scenario:
CPU0 ---- lock(&irq_desc_lock_class); lock(&irq_desc_lock_class);
*** DEADLOCK ***
May be due to missing lock nesting notation
6 locks held by cat/3074: #0: ffffff81f01d9420 (sb_writers#7){.+.+}, at: vfs_write+0xd0/0x1a4 #1: ffffff81bd7d2080 (&of->mutex){+.+.}, at: kernfs_fop_write+0x12c/0x1fc #2: ffffff81f4c322f0 (kn->count#337){.+.+}, at: kernfs_fop_write+0x134/0x1fc #3: ffffffe411a41d60 (system_transition_mutex){+.+.}, at: pm_suspend+0x108/0x348 #4: ffffff81f1c5e970 (&dev->mutex){....}, at: __device_suspend+0x168/0x41c #5: ffffff81f1cc10c0 (&irq_desc_lock_class){-.-.}, at: __irq_get_desc_lock+0x64/0x94
stack backtrace: CPU: 5 PID: 3074 Comm: cat Tainted: G W 5.4.11 #3 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x174 show_stack+0x20/0x2c dump_stack+0xc8/0x124 __lock_acquire+0x460/0x2388 lock_acquire+0x1cc/0x210 _raw_spin_lock_irqsave+0x64/0x80 __irq_get_desc_lock+0x64/0x94 irq_set_irq_wake+0x40/0x144 qpnpint_irq_set_wake+0x28/0x34 set_irq_wake_real+0x40/0x5c irq_set_irq_wake+0x70/0x144 pm8941_pwrkey_suspend+0x34/0x44 platform_pm_suspend+0x34/0x60 dpm_run_callback+0x64/0xcc __device_suspend+0x310/0x41c dpm_suspend+0xf8/0x298 dpm_suspend_start+0x84/0xb4 suspend_devices_and_enter+0xbc/0x620 pm_suspend+0x210/0x348 state_store+0xb0/0x108 kobj_attr_store+0x14/0x24 sysfs_kf_write+0x4c/0x64 kernfs_fop_write+0x15c/0x1fc __vfs_write+0x54/0x18c vfs_write+0xe4/0x1a4 ksys_write+0x7c/0xe4 __arm64_sys_write+0x20/0x2c el0_svc_common+0xa8/0x160 el0_svc_handler+0x7c/0x98 el0_svc+0x8/0xc
Set a lockdep class when we map the irq so that irq_set_wake() doesn't warn about a lockdep bug that doesn't exist.
Fixes: 12a9eeaebba3 ("spmi: pmic-arb: convert to v2 irq interfaces to support hierarchical IRQ chips") Cc: Douglas Anderson dianders@chromium.org Cc: Brian Masney masneyb@onstation.org Cc: Lina Iyer ilina@codeaurora.org Cc: Maulik Shah mkshah@codeaurora.org Cc: Bjorn Andersson bjorn.andersson@linaro.org Signed-off-by: Stephen Boyd swboyd@chromium.org Link: https://lore.kernel.org/r/20200121183748.68662-1-swboyd@chromium.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/spmi/spmi-pmic-arb.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/spmi/spmi-pmic-arb.c +++ b/drivers/spmi/spmi-pmic-arb.c @@ -731,6 +731,7 @@ static int qpnpint_irq_domain_translate( return 0; }
+static struct lock_class_key qpnpint_irq_lock_class, qpnpint_irq_request_class;
static void qpnpint_irq_domain_map(struct spmi_pmic_arb *pmic_arb, struct irq_domain *domain, unsigned int virq, @@ -746,6 +747,9 @@ static void qpnpint_irq_domain_map(struc else handler = handle_level_irq;
+ + irq_set_lockdep_class(virq, &qpnpint_irq_lock_class, + &qpnpint_irq_request_class); irq_domain_set_info(domain, virq, hwirq, &pmic_arb_irqchip, pmic_arb, handler, NULL, NULL); }
From: Kan Liang kan.liang@linux.intel.com
commit f861854e1b435b27197417f6f90d87188003cb24 upstream.
Perf doesn't take the left period into account when auto-reload is enabled with fixed period sampling mode in context switch.
Here is the MSR trace of the perf command as below. (The MSR trace is simplified from a ftrace log.)
#perf record -e cycles:p -c 2000000 -- ./triad_loop
//The MSR trace of task schedule out //perf disable all counters, disable PEBS, disable GP counter 0, //read GP counter 0, and re-enable all counters. //The counter 0 stops at 0xfffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0 write_msr: MSR_P6_EVNTSEL0(186), value 40003003c rdpmc: 0, value fffffff82840 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
//The MSR trace of the same task schedule in again //perf disable all counters, enable and set GP counter 0, //enable PEBS, and re-enable all counters. //0xffffffe17b80 (-2000000) is written to GP counter 0. write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0 write_msr: MSR_IA32_PMC0(4c1), value ffffffe17b80 write_msr: MSR_P6_EVNTSEL0(186), value 40043003c write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1 write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
When the same task schedules in again, the counter should start from the previous left period. However, it starts from the fixed period -2000000 again.
A special variant of intel_pmu_save_and_restart() is used for auto-reload, which doesn't update the hwc->period_left. When the monitored task schedules in again, perf doesn't know the left period. The fixed period is used, which is inaccurate.
With auto-reload, the counter always has a negative counter value. So the left period is -value. Update the period_left in intel_pmu_save_and_restart_reload().
With the patch:
//The MSR trace of task schedule out
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 0
write_msr: MSR_P6_EVNTSEL0(186), value 40003003c
rdpmc: 0, value ffffffe25cbc
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
//The MSR trace of the same task schedule in again
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value 0
write_msr: MSR_IA32_PMC0(4c1), value ffffffe25cbc
write_msr: MSR_P6_EVNTSEL0(186), value 40043003c
write_msr: MSR_IA32_PEBS_ENABLE(3f1), value 1
write_msr: MSR_CORE_PERF_GLOBAL_CTRL(38f), value f000000ff
Fixes: d31fc13fdcb2 ("perf/x86/intel: Fix event update for auto-reload") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lkml.kernel.org/r/20200121190125.3389-1-kan.liang@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/events/intel/ds.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/x86/events/intel/ds.c +++ b/arch/x86/events/intel/ds.c @@ -1713,6 +1713,8 @@ intel_pmu_save_and_restart_reload(struct old = ((s64)(prev_raw_count << shift) >> shift); local64_add(new - old + count * period, &event->count);
+ local64_set(&hwc->period_left, -new); + perf_event_update_userpage(event);
return 0;
From: Mike Jones michael-a1.jones@analog.com
commit cf2b012c90e74e85d8aea7d67e48868069cfee0c upstream.
Change 21537dc driver PMBus polling of MFR_COMMON from bits 5/4 to bits 6/5. This fixes an LTC297X family bug where polling always returns "not busy" even when the part is busy. It also fixes an LTC388X and LTM467X bug where polling used PEND and NOT_IN_TRANS while BUSY was not polled, which can lead to NACKing of commands. LTC388X and LTM467X modules now poll BUSY and PEND, increasing reliability by eliminating NACKing of commands.
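As a hedged sketch of how the corrected bits would be used when polling MFR_COMMON (the helper name, register argument, sleep interval, and loop structure below are assumptions, not the driver's actual code; only the defines come from the patch):

#include <linux/bits.h>
#include <linux/delay.h>
#include <linux/errno.h>
#include <linux/i2c.h>
#include <linux/jiffies.h>
#include <linux/types.h>

#define LTC_POLL_TIMEOUT	100	/* in milli-seconds */
#define LTC_NOT_BUSY		BIT(6)
#define LTC_NOT_PENDING		BIT(5)

/* Poll MFR_COMMON until the part reports neither busy nor pending. */
static int ltc_wait_ready_sketch(struct i2c_client *client, u8 mfr_common_reg)
{
	const u8 mask = LTC_NOT_BUSY | LTC_NOT_PENDING;
	unsigned long timeout = jiffies + msecs_to_jiffies(LTC_POLL_TIMEOUT);
	int status;

	do {
		status = i2c_smbus_read_byte_data(client, mfr_common_reg);
		if (status < 0)
			return status;
		if ((status & mask) == mask)
			return 0;
		usleep_range(1000, 2000);
	} while (time_before(jiffies, timeout));

	return -ETIMEDOUT;
}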
Signed-off-by: Mike Jones michael-a1.jones@analog.com Link: https://lore.kernel.org/r/1580234400-2829-2-git-send-email-michael-a1.jones@... Fixes: e04d1ce9bbb49 ("hwmon: (ltc2978) Add polling for chips requiring it") Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/hwmon/pmbus/ltc2978.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/hwmon/pmbus/ltc2978.c +++ b/drivers/hwmon/pmbus/ltc2978.c @@ -82,8 +82,8 @@ enum chips { ltc2974, ltc2975, ltc2977,
#define LTC_POLL_TIMEOUT 100 /* in milli-seconds */
-#define LTC_NOT_BUSY BIT(5) -#define LTC_NOT_PENDING BIT(4) +#define LTC_NOT_BUSY BIT(6) +#define LTC_NOT_PENDING BIT(5)
/* * LTC2978 clears peak data whenever the CLEAR_FAULTS command is executed, which
From: Sara Sharon sara.sharon@intel.com
commit 2bf973ff9b9aeceb8acda629ae65341820d4b35b upstream.
Previously I intended to ignore quiet mode in probe responses; however, I ended up ignoring it for action frames instead. As a matter of fact, this path isn't invoked for probe responses to start with. Just revert this patch.
Signed-off-by: Sara Sharon sara.sharon@intel.com Fixes: 7976b1e9e3bf ("mac80211: ignore quiet mode in probe") Signed-off-by: Luca Coelho luciano.coelho@intel.com Link: https://lore.kernel.org/r/20200131111300.891737-15-luca@coelho.fi Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- net/mac80211/mlme.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/net/mac80211/mlme.c +++ b/net/mac80211/mlme.c @@ -8,7 +8,7 @@ * Copyright 2007, Michael Wu flamingice@sourmilk.net * Copyright 2013-2014 Intel Mobile Communications GmbH * Copyright (C) 2015 - 2017 Intel Deutschland GmbH - * Copyright (C) 2018 - 2019 Intel Corporation + * Copyright (C) 2018 - 2020 Intel Corporation */
#include <linux/delay.h> @@ -1311,7 +1311,7 @@ ieee80211_sta_process_chanswitch(struct if (!res) { ch_switch.timestamp = timestamp; ch_switch.device_timestamp = device_timestamp; - ch_switch.block_tx = beacon ? csa_ie.mode : 0; + ch_switch.block_tx = csa_ie.mode; ch_switch.chandef = csa_ie.chandef; ch_switch.count = csa_ie.count; ch_switch.delay = csa_ie.max_switch_time; @@ -1404,7 +1404,7 @@ ieee80211_sta_process_chanswitch(struct
sdata->vif.csa_active = true; sdata->csa_chandef = csa_ie.chandef; - sdata->csa_block_tx = ch_switch.block_tx; + sdata->csa_block_tx = csa_ie.mode; ifmgd->csa_ignored_same_chan = false;
if (sdata->csa_block_tx) @@ -1438,7 +1438,7 @@ ieee80211_sta_process_chanswitch(struct * reset when the disconnection worker runs. */ sdata->vif.csa_active = true; - sdata->csa_block_tx = ch_switch.block_tx; + sdata->csa_block_tx = csa_ie.mode;
ieee80211_queue_work(&local->hw, &ifmgd->csa_connection_drop_work); mutex_unlock(&local->chanctx_mtx);
From: Petr Pavlu petr.pavlu@suse.com
commit 3f6166aaf19902f2f3124b5426405e292e8974dd upstream.
Fix the display of sec=krb5i, which was wrongly interleaved with cruid, resulting in the string "sec=krb5,cruid=<...>i" instead of "sec=krb5i,cruid=<...>".
Fixes: 96281b9e46eb ("smb3: for kerberos mounts display the credential uid used") Signed-off-by: Petr Pavlu petr.pavlu@suse.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/cifs/cifsfs.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -414,7 +414,7 @@ cifs_show_security(struct seq_file *s, s seq_puts(s, "ntlm"); break; case Kerberos: - seq_printf(s, "krb5,cruid=%u", from_kuid_munged(&init_user_ns,ses->cred_uid)); + seq_puts(s, "krb5"); break; case RawNTLMSSP: seq_puts(s, "ntlmssp"); @@ -427,6 +427,10 @@ cifs_show_security(struct seq_file *s, s
if (ses->sign) seq_puts(s, "i"); + + if (ses->sectype == Kerberos) + seq_printf(s, ",cruid=%u", + from_kuid_munged(&init_user_ns, ses->cred_uid)); }
static void
From: Xiubo Li xiubli@redhat.com
commit 3b20bc2fe4c0cfd82d35838965dc7ff0b93415c6 upstream.
With the old mount API, the mount options parsing function is called in ceph_mount(), just after the default posix acl flag is set, so we can control enabling/disabling it via the mount option.

But with the new mount API, the mount options parsing function is called before ceph_get_tree(), so the posix acl will always be enabled.
Fixes: 82995cc6c5ae ("libceph, rbd, ceph: convert to use the new mount API") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/ceph/super.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/ceph/super.c +++ b/fs/ceph/super.c @@ -1020,10 +1020,6 @@ static int ceph_get_tree(struct fs_conte if (!fc->source) return invalf(fc, "ceph: No source");
-#ifdef CONFIG_CEPH_FS_POSIX_ACL - fc->sb_flags |= SB_POSIXACL; -#endif - /* create client (which we may/may not use) */ fsc = create_fs_client(pctx->opts, pctx->copts); pctx->opts = NULL; @@ -1141,6 +1137,10 @@ static int ceph_init_fs_context(struct f fsopt->max_readdir_bytes = CEPH_MAX_READDIR_BYTES_DEFAULT; fsopt->congestion_kb = default_congestion_kb();
+#ifdef CONFIG_CEPH_FS_POSIX_ACL + fc->sb_flags |= SB_POSIXACL; +#endif + fc->fs_private = pctx; fc->ops = &ceph_context_ops; return 0;
From: Marc Zyngier maz@kernel.org
commit 3543d7ddd55fe12c37e8a9db846216c51846015b upstream.
The interrupt map for the FVP's PCI node is missing the parent-unit-address cells for each of the INTx entries, leading to the kernel code failing to parse the entries correctly.
Add the missing zero cells, which are pretty useless as far as the GIC is concerned, but which the spec requires. This allows INTx to be usable on the model, and VFIO to work correctly.
Fixes: fa083b99eb28 ("arm64: dts: fast models: Add DTS fo Base RevC FVP") Signed-off-by: Marc Zyngier maz@kernel.org Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/arm64/boot/dts/arm/fvp-base-revc.dts | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/arm/fvp-base-revc.dts +++ b/arch/arm64/boot/dts/arm/fvp-base-revc.dts @@ -161,10 +161,10 @@ bus-range = <0x0 0x1>; reg = <0x0 0x40000000 0x0 0x10000000>; ranges = <0x2000000 0x0 0x50000000 0x0 0x50000000 0x0 0x10000000>; - interrupt-map = <0 0 0 1 &gic GIC_SPI 168 IRQ_TYPE_LEVEL_HIGH>, - <0 0 0 2 &gic GIC_SPI 169 IRQ_TYPE_LEVEL_HIGH>, - <0 0 0 3 &gic GIC_SPI 170 IRQ_TYPE_LEVEL_HIGH>, - <0 0 0 4 &gic GIC_SPI 171 IRQ_TYPE_LEVEL_HIGH>; + interrupt-map = <0 0 0 1 &gic 0 0 GIC_SPI 168 IRQ_TYPE_LEVEL_HIGH>, + <0 0 0 2 &gic 0 0 GIC_SPI 169 IRQ_TYPE_LEVEL_HIGH>, + <0 0 0 3 &gic 0 0 GIC_SPI 170 IRQ_TYPE_LEVEL_HIGH>, + <0 0 0 4 &gic 0 0 GIC_SPI 171 IRQ_TYPE_LEVEL_HIGH>; interrupt-map-mask = <0x0 0x0 0x0 0x7>; msi-map = <0x0 &its 0x0 0x10000>; iommu-map = <0x0 &smmu 0x0 0x10000>;
From: Oliver Upton oupton@google.com
commit 307f1cfa269657c63cfe2c932386fcc24684d9dd upstream.
KVM defines the #DB payload as compatible with the 'pending debug exceptions' field under VMX, not DR6. Mask off bit 12 when applying the payload to DR6, as it is reserved on DR6 but not the 'pending debug exceptions' field.
Fixes: f10c729ff965 ("kvm: vmx: Defer setting of DR6 until #DB delivery") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/x86.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -437,6 +437,14 @@ void kvm_deliver_exception_payload(struc * for #DB exceptions under VMX. */ vcpu->arch.dr6 ^= payload & DR6_RTM; + + /* + * The #DB payload is defined as compatible with the 'pending + * debug exceptions' field under VMX, not DR6. While bit 12 is + * defined in the 'pending debug exceptions' field (enabled + * breakpoint), it is reserved and must be zero in DR6. + */ + vcpu->arch.dr6 &= ~BIT(12); break; case PF_VECTOR: vcpu->arch.cr2 = payload;
From: Oliver Upton oupton@google.com
commit 684c0422da71da0cd81319c90b8099b563b13da4 upstream.
SDM 27.3.4 states that the 'pending debug exceptions' VMCS field will be populated if a VM-exit caused by an INIT signal takes priority over a debug-trap. Emulate this behavior when synthesizing an INIT signal VM-exit into L1.
Fixes: 4b9852f4f389 ("KVM: x86: Fix INIT signal handling in various CPU states") Signed-off-by: Oliver Upton oupton@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- arch/x86/kvm/vmx/nested.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -3583,6 +3583,33 @@ static void nested_vmx_inject_exception_ nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, intr_info, exit_qual); }
+/* + * Returns true if a debug trap is pending delivery. + * + * In KVM, debug traps bear an exception payload. As such, the class of a #DB + * exception may be inferred from the presence of an exception payload. + */ +static inline bool vmx_pending_dbg_trap(struct kvm_vcpu *vcpu) +{ + return vcpu->arch.exception.pending && + vcpu->arch.exception.nr == DB_VECTOR && + vcpu->arch.exception.payload; +} + +/* + * Certain VM-exits set the 'pending debug exceptions' field to indicate a + * recognized #DB (data or single-step) that has yet to be delivered. Since KVM + * represents these debug traps with a payload that is said to be compatible + * with the 'pending debug exceptions' field, write the payload to the VMCS + * field if a VM-exit is delivered before the debug trap. + */ +static void nested_vmx_update_pending_dbg(struct kvm_vcpu *vcpu) +{ + if (vmx_pending_dbg_trap(vcpu)) + vmcs_writel(GUEST_PENDING_DBG_EXCEPTIONS, + vcpu->arch.exception.payload); +} + static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) { struct vcpu_vmx *vmx = to_vmx(vcpu); @@ -3595,6 +3622,7 @@ static int vmx_check_nested_events(struc test_bit(KVM_APIC_INIT, &apic->pending_events)) { if (block_nested_events) return -EBUSY; + nested_vmx_update_pending_dbg(vcpu); clear_bit(KVM_APIC_INIT, &apic->pending_events); nested_vmx_vmexit(vcpu, EXIT_REASON_INIT_SIGNAL, 0, 0); return 0;
From: Kim Phillips kim.phillips@amd.com
commit 80cc7bb6c104d733bff60ddda09f19139c61507c upstream.
For data collected on machines that support front-end stalled cycles, such as modern AMD CPU families, commit 146540fb545b ("perf stat: Always separate stalled cycles per insn") introduces a new line in CSV output with a leading comma that upsets some automated scripts. Scripts have to use "-e ex_ret_instr" to work around this issue after upgrading to a version of perf with that commit.
We could add "if (have_frontend_stalled && !config->csv_sep)" to the not (total && avg) else clause, to emphasize that CSV users are usually scripts, and are written to do only what is needed, i.e., they wouldn't typically invoke "perf stat" without specifying an explicit event list.
But - let alone CSV output - why should users now tolerate a constant 0-reporting extra line in regular terminal output?:
BEFORE:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
   181,110,981      instructions              #    0.58  insn per cycle
                                               #    0.00  stalled cycles per insn
   309,876,469      cycles
1.002202582 seconds time elapsed
The user would not like to see the now permanent:
"0.00 stalled cycles per insn"
line fixture, as it gives no useful information.
So this patch removes the printing of the zeroed stalled cycles line altogether, almost reverting the very original commit fb4605ba47e7 ("perf stat: Check for frontend stalled for metrics"), which seems like it was written to normalize --metric-only column output of common Intel machines at the time: modern Intel machines have ceased to support the genericised frontend stalled metrics AFAICT.
AFTER:
$ sudo perf stat --all-cpus -einstructions,cycles -- sleep 1
Performance counter stats for 'system wide':
   244,071,432      instructions              #    0.69  insn per cycle
   355,353,490      cycles
1.001862516 seconds time elapsed
Output behaviour when stalled cycles is indeed measured is not affected (BEFORE == AFTER):
$ sudo perf stat --all-cpus -einstructions,cycles,stalled-cycles-frontend -- sleep 1
Performance counter stats for 'system wide':
   247,227,799      instructions              #    0.63  insn per cycle
                                               #    0.26  stalled cycles per insn
   394,745,636      cycles
    63,194,485      stalled-cycles-frontend   #   16.01% frontend cycles idle
1.002079770 seconds time elapsed
Fixes: 146540fb545b ("perf stat: Always separate stalled cycles per insn") Signed-off-by: Kim Phillips kim.phillips@amd.com Acked-by: Andi Kleen ak@linux.intel.com Acked-by: Jiri Olsa jolsa@redhat.com Acked-by: Song Liu songliubraving@fb.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Cong Wang xiyou.wangcong@gmail.com Cc: Davidlohr Bueso dave@stgolabs.net Cc: Jin Yao yao.jin@linux.intel.com Cc: Kan Liang kan.liang@linux.intel.com Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: http://lore.kernel.org/lkml/20200207230613.26709-1-kim.phillips@amd.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- tools/perf/util/stat-shadow.c | 6 ------ 1 file changed, 6 deletions(-)
--- a/tools/perf/util/stat-shadow.c +++ b/tools/perf/util/stat-shadow.c @@ -18,7 +18,6 @@ * AGGR_NONE: Use matching CPU * AGGR_THREAD: Not supported? */ -static bool have_frontend_stalled;
struct runtime_stat rt_stat; struct stats walltime_nsecs_stats; @@ -144,7 +143,6 @@ void runtime_stat__exit(struct runtime_s
void perf_stat__init_shadow_stats(void) { - have_frontend_stalled = pmu_have_event("cpu", "stalled-cycles-frontend"); runtime_stat__init(&rt_stat); }
@@ -853,10 +851,6 @@ void perf_stat__print_shadow_stats(struc print_metric(config, ctxp, NULL, "%7.2f ", "stalled cycles per insn", ratio); - } else if (have_frontend_stalled) { - out->new_line(config, ctxp); - print_metric(config, ctxp, NULL, "%7.2f ", - "stalled cycles per insn", 0); } } else if (perf_evsel__match(evsel, HARDWARE, HW_BRANCH_MISSES)) { if (runtime_stat_n(st, STAT_BRANCHES, ctx, cpu) != 0)
From: Olga Kornievskaia kolga@netapp.com
commit cd1b659d8ce7697ee9799b64f887528315b9097b upstream.
Turning caching off for writes on the server should improve performance.
Fixes: fba83f34119a ("NFS: Pass "privileged" value to nfs4_init_sequence()") Signed-off-by: Olga Kornievskaia kolga@netapp.com Reviewed-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/nfs4proc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -5295,7 +5295,7 @@ static void nfs4_proc_write_setup(struct hdr->timestamp = jiffies;
msg->rpc_proc = &nfs4_procedures[NFSPROC4_CLNT_WRITE]; - nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 1, 0); + nfs4_init_sequence(&hdr->args.seq_args, &hdr->res.seq_res, 0, 0); nfs4_state_protect_write(server->nfs_client, clnt, msg, hdr); }
From: Trond Myklebust trondmy@gmail.com
commit 5d63944f8206a80636ae8cb4b9107d3b49f43d37 upstream.
Ensure we don't release the delegation cred during the call to nfs4_proc_delegreturn().
Fixes: ee05f456772d ("NFSv4: Fix races between open and delegreturn") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- fs/nfs/delegation.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-)
--- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -222,13 +222,18 @@ void nfs_inode_reclaim_delegation(struct
static int nfs_do_return_delegation(struct inode *inode, struct nfs_delegation *delegation, int issync) { + const struct cred *cred; int res = 0;
- if (!test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) - res = nfs4_proc_delegreturn(inode, - delegation->cred, + if (!test_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) { + spin_lock(&delegation->lock); + cred = get_cred(delegation->cred); + spin_unlock(&delegation->lock); + res = nfs4_proc_delegreturn(inode, cred, &delegation->stateid, issync); + put_cred(cred); + } return res; }
From: Jernej Skrabec jernej.skrabec@siol.net
commit cf913e9683273f2640501094fa63a67e29f437b3 upstream.
This reverts commit 9db9c0cf5895e4ddde2814360cae7bea9282edd2.
Setting mode_config.allow_fb_modifiers manually is completely unnecessary. It is set automatically by drm_universal_plane_init() based on whether a modifier list is provided or not. Even worse, it breaks DE2 and DE3, as they don't support any modifiers besides linear. Modifier-aware applications can be confused by the empty modifier list provided - at least the linear modifier should be included, but it isn't for DE2 and DE3.
Fixes: 9db9c0cf5895 ("drm/sun4i: drv: Allow framebuffer modifiers in mode config") Signed-off-by: Jernej Skrabec jernej.skrabec@siol.net Reviewed-by: Paul Kocialkowski paul.kocialkowski@bootlin.com Signed-off-by: Maxime Ripard maxime@cerno.tech Link: https://patchwork.freedesktop.org/patch/msgid/20200126065937.9564-1-jernej.s... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/sun4i/sun4i_drv.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/gpu/drm/sun4i/sun4i_drv.c +++ b/drivers/gpu/drm/sun4i/sun4i_drv.c @@ -85,7 +85,6 @@ static int sun4i_drv_bind(struct device }
drm_mode_config_init(drm); - drm->mode_config.allow_fb_modifiers = true;
ret = component_bind_all(drm->dev, drm); if (ret) {
From: Chris Wilson chris@chris-wilson.co.uk
commit 88a9c66d998b1d2dac412fcd458c5d17d70513c8 upstream.
The rc6 residency starts ticking from 0 at BIOS POST, but the kernel starts measuring the time from its boot. If we start measuring I915_PMU_RC6_RESIDENCY while the GT is idle, we start our sampling from 0 and then, upon first activity (park/unpark), add in all the rc6 residency since boot. After the first park with the sampler engaged, the sleep/active counters are aligned.
v2: With a wakeref to be sure
Closes: https://gitlab.freedesktop.org/drm/intel/issues/973 Fixes: df6a42053513 ("drm/i915/pmu: Ensure monotonic rc6") Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Tvrtko Ursulin tvrtko.ursulin@intel.com Reviewed-by: Tvrtko Ursulin tvrtko.ursulin@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20200114105648.2172026-1-chris... (cherry picked from commit f4e9894b6952a2819937f363cd42e7cd7894a1e4) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
--- drivers/gpu/drm/i915/i915_pmu.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/gpu/drm/i915/i915_pmu.c +++ b/drivers/gpu/drm/i915/i915_pmu.c @@ -594,8 +594,10 @@ static void i915_pmu_enable(struct perf_ container_of(event->pmu, typeof(*i915), pmu.base); unsigned int bit = event_enabled_bit(event); struct i915_pmu *pmu = &i915->pmu; + intel_wakeref_t wakeref; unsigned long flags;
+ wakeref = intel_runtime_pm_get(&i915->runtime_pm); spin_lock_irqsave(&pmu->lock, flags);
/* @@ -605,6 +607,14 @@ static void i915_pmu_enable(struct perf_ BUILD_BUG_ON(ARRAY_SIZE(pmu->enable_count) != I915_PMU_MASK_BITS); GEM_BUG_ON(bit >= ARRAY_SIZE(pmu->enable_count)); GEM_BUG_ON(pmu->enable_count[bit] == ~0); + + if (pmu->enable_count[bit] == 0 && + config_enabled_mask(I915_PMU_RC6_RESIDENCY) & BIT_ULL(bit)) { + pmu->sample[__I915_SAMPLE_RC6_LAST_REPORTED].cur = 0; + pmu->sample[__I915_SAMPLE_RC6].cur = __get_rc6(&i915->gt); + pmu->sleep_last = ktime_get(); + } + pmu->enable |= BIT_ULL(bit); pmu->enable_count[bit]++;
@@ -645,6 +655,8 @@ static void i915_pmu_enable(struct perf_ * an existing non-zero value. */ local64_set(&event->hw.prev_count, __i915_pmu_event_read(event)); + + intel_runtime_pm_put(&i915->runtime_pm, wakeref); }
static void i915_pmu_disable(struct perf_event *event)
From: Chengguang Xu cgxu519@mykernel.net
[ Upstream commit 57c32ea42f8e802bda47010418e25043e0c9337f ]
Setting a softlimit larger than the hardlimit seems meaningless for disk quota, but it is currently allowed. In this case, there may be some confusion for users when they run the df command on a directory which has project quota.

For example, we set a 20M softlimit and a 10M hardlimit on block usage for the project quota of test_dir (project id 123).
[root@hades mnt_ext4]# repquota -P -a
*** Report for project quotas on device /dev/loop0
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
Project         used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
0         --        13       0       0             2     0     0
123       --     10237   20480   10240             5   200   100
The result of df command as below:
[root@hades mnt_ext4]# df -h test_dir
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       20M   10M   10M  50% /home/cgxu/test/mnt_ext4
Even though it looks like there is another 10M of free space to use, if we write new data to the directory test_dir (which inherits the project id), the write will fail with -EDQUOT.
After this patch, the df result looks like below.
[root@hades mnt_ext4]# df -h test_dir
Filesystem      Size  Used Avail Use% Mounted on
/dev/loop0       10M   10M  3.0K 100% /home/cgxu/test/mnt_ext4
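The limit selection the patch open-codes amounts to "take the smaller non-zero limit, treating zero as unlimited". A standalone sketch under that reading (the helper name and the test values are illustrative only, not from the kernel):

#include <stdint.h>
#include <stdio.h>

/* Smaller of the non-zero limits; 0 means "no limit". */
static uint64_t pick_limit(uint64_t soft, uint64_t hard)
{
	uint64_t limit = 0;

	if (soft && (!limit || soft < limit))
		limit = soft;
	if (hard && (!limit || hard < limit))
		limit = hard;
	return limit;
}

int main(void)
{
	/* 20M soft / 10M hard (in 1K blocks), as in the example: hard wins. */
	printf("%llu\n", (unsigned long long)pick_limit(20480, 10240)); /* 10240 */
	/* Only a soft limit configured: the soft limit is used. */
	printf("%llu\n", (unsigned long long)pick_limit(20480, 0));     /* 20480 */
	return 0;
}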
Signed-off-by: Chengguang Xu cgxu519@mykernel.net Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20191016022501.760-1-cgxu519@mykernel.net Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/super.c | 23 +++++++++++++++++------ 1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 937d8bc1dda74..c51d7ef2e4675 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5536,9 +5536,15 @@ static int ext4_statfs_project(struct super_block *sb, return PTR_ERR(dquot); spin_lock(&dquot->dq_dqb_lock);
- limit = (dquot->dq_dqb.dqb_bsoftlimit ? - dquot->dq_dqb.dqb_bsoftlimit : - dquot->dq_dqb.dqb_bhardlimit) >> sb->s_blocksize_bits; + limit = 0; + if (dquot->dq_dqb.dqb_bsoftlimit && + (!limit || dquot->dq_dqb.dqb_bsoftlimit < limit)) + limit = dquot->dq_dqb.dqb_bsoftlimit; + if (dquot->dq_dqb.dqb_bhardlimit && + (!limit || dquot->dq_dqb.dqb_bhardlimit < limit)) + limit = dquot->dq_dqb.dqb_bhardlimit; + limit >>= sb->s_blocksize_bits; + if (limit && buf->f_blocks > limit) { curblock = (dquot->dq_dqb.dqb_curspace + dquot->dq_dqb.dqb_rsvspace) >> sb->s_blocksize_bits; @@ -5548,9 +5554,14 @@ static int ext4_statfs_project(struct super_block *sb, (buf->f_blocks - curblock) : 0; }
- limit = dquot->dq_dqb.dqb_isoftlimit ? - dquot->dq_dqb.dqb_isoftlimit : - dquot->dq_dqb.dqb_ihardlimit; + limit = 0; + if (dquot->dq_dqb.dqb_isoftlimit && + (!limit || dquot->dq_dqb.dqb_isoftlimit < limit)) + limit = dquot->dq_dqb.dqb_isoftlimit; + if (dquot->dq_dqb.dqb_ihardlimit && + (!limit || dquot->dq_dqb.dqb_ihardlimit < limit)) + limit = dquot->dq_dqb.dqb_ihardlimit; + if (limit && buf->f_files > limit) { buf->f_files = limit; buf->f_ffree =
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 9392a27d88b9707145d713654eb26f0c29789e50 ]
Some work items need this for relative path lookups; make it available like the other inherited state (credentials, mm, etc.).
Cc: stable@vger.kernel.org # 5.3+ Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io-wq.c | 8 ++++++++ fs/io-wq.h | 4 +++- 2 files changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/io-wq.c b/fs/io-wq.c index 5147d2213b019..0dc4bb6de6566 100644 --- a/fs/io-wq.c +++ b/fs/io-wq.c @@ -16,6 +16,7 @@ #include <linux/slab.h> #include <linux/kthread.h> #include <linux/rculist_nulls.h> +#include <linux/fs_struct.h>
#include "io-wq.h"
@@ -58,6 +59,7 @@ struct io_worker { struct mm_struct *mm; const struct cred *creds; struct files_struct *restore_files; + struct fs_struct *restore_fs; };
#if BITS_PER_LONG == 64 @@ -150,6 +152,9 @@ static bool __io_worker_unuse(struct io_wqe *wqe, struct io_worker *worker) task_unlock(current); }
+ if (current->fs != worker->restore_fs) + current->fs = worker->restore_fs; + /* * If we have an active mm, we need to drop the wq lock before unusing * it. If we do, return true and let the caller retry the idle loop. @@ -310,6 +315,7 @@ static void io_worker_start(struct io_wqe *wqe, struct io_worker *worker)
worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING); worker->restore_files = current->files; + worker->restore_fs = current->fs; io_wqe_inc_running(wqe, worker); }
@@ -456,6 +462,8 @@ static void io_worker_handle_work(struct io_worker *worker) } if (!worker->creds) worker->creds = override_creds(wq->creds); + if (work->fs && current->fs != work->fs) + current->fs = work->fs; if (test_bit(IO_WQ_BIT_CANCEL, &wq->state)) work->flags |= IO_WQ_WORK_CANCEL; if (worker->mm) diff --git a/fs/io-wq.h b/fs/io-wq.h index 3f5e356de9805..bbab98d1d328b 100644 --- a/fs/io-wq.h +++ b/fs/io-wq.h @@ -72,6 +72,7 @@ struct io_wq_work { }; void (*func)(struct io_wq_work **); struct files_struct *files; + struct fs_struct *fs; unsigned flags; };
@@ -79,8 +80,9 @@ struct io_wq_work { do { \ (work)->list.next = NULL; \ (work)->func = _func; \ - (work)->flags = 0; \ (work)->files = NULL; \ + (work)->fs = NULL; \ + (work)->flags = 0; \ } while (0) \
typedef void (get_work_fn)(struct io_wq_work *);
From: Trond Myklebust trondmy@gmail.com
[ Upstream commit d2269ea14ebd2a73f291d6b3a7a7d320ec00270c ]
In order to better manage our delegation caching, add a counter to track the number of active delegations.
Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Anna Schumaker Anna.Schumaker@Netapp.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/delegation.c | 36 ++++++++++++++++++++++++------------ 1 file changed, 24 insertions(+), 12 deletions(-)
diff --git a/fs/nfs/delegation.c b/fs/nfs/delegation.c index 5f02d922f2173..8e322bacde699 100644 --- a/fs/nfs/delegation.c +++ b/fs/nfs/delegation.c @@ -25,13 +25,29 @@ #include "internal.h" #include "nfs4trace.h"
-static void nfs_free_delegation(struct nfs_delegation *delegation) +static atomic_long_t nfs_active_delegations; + +static void __nfs_free_delegation(struct nfs_delegation *delegation) { put_cred(delegation->cred); delegation->cred = NULL; kfree_rcu(delegation, rcu); }
+static void nfs_mark_delegation_revoked(struct nfs_delegation *delegation) +{ + if (!test_and_set_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) { + delegation->stateid.type = NFS4_INVALID_STATEID_TYPE; + atomic_long_dec(&nfs_active_delegations); + } +} + +static void nfs_free_delegation(struct nfs_delegation *delegation) +{ + nfs_mark_delegation_revoked(delegation); + __nfs_free_delegation(delegation); +} + /** * nfs_mark_delegation_referenced - set delegation's REFERENCED flag * @delegation: delegation to process @@ -348,7 +364,8 @@ nfs_update_inplace_delegation(struct nfs_delegation *delegation, delegation->stateid.seqid = update->stateid.seqid; smp_wmb(); delegation->type = update->type; - clear_bit(NFS_DELEGATION_REVOKED, &delegation->flags); + if (test_and_clear_bit(NFS_DELEGATION_REVOKED, &delegation->flags)) + atomic_long_inc(&nfs_active_delegations); } }
@@ -428,6 +445,8 @@ int nfs_inode_set_delegation(struct inode *inode, const struct cred *cred, rcu_assign_pointer(nfsi->delegation, delegation); delegation = NULL;
+ atomic_long_inc(&nfs_active_delegations); + trace_nfs4_set_delegation(inode, type);
spin_lock(&inode->i_lock); @@ -437,7 +456,7 @@ int nfs_inode_set_delegation(struct inode *inode, const struct cred *cred, out: spin_unlock(&clp->cl_lock); if (delegation != NULL) - nfs_free_delegation(delegation); + __nfs_free_delegation(delegation); if (freeme != NULL) { nfs_do_return_delegation(inode, freeme, 0); nfs_free_delegation(freeme); @@ -765,13 +784,6 @@ static void nfs_client_mark_return_unused_delegation_types(struct nfs_client *cl rcu_read_unlock(); }
-static void nfs_mark_delegation_revoked(struct nfs_server *server, - struct nfs_delegation *delegation) -{ - set_bit(NFS_DELEGATION_REVOKED, &delegation->flags); - delegation->stateid.type = NFS4_INVALID_STATEID_TYPE; -} - static void nfs_revoke_delegation(struct inode *inode, const nfs4_stateid *stateid) { @@ -799,7 +811,7 @@ static void nfs_revoke_delegation(struct inode *inode, } spin_unlock(&delegation->lock); } - nfs_mark_delegation_revoked(NFS_SERVER(inode), delegation); + nfs_mark_delegation_revoked(delegation); ret = true; out: rcu_read_unlock(); @@ -838,7 +850,7 @@ void nfs_delegation_mark_returned(struct inode *inode, delegation->stateid.seqid = stateid->seqid; }
- nfs_mark_delegation_revoked(NFS_SERVER(inode), delegation); + nfs_mark_delegation_revoked(delegation);
out_clear_returning: clear_bit(NFS_DELEGATION_RETURNING, &delegation->flags);
From: Michał Mirosław mirq-linux@rere.qmqm.pl
[ Upstream commit d3a5bcb4a17f1ad072484bb92c42519ff3aba6e1 ]
Add the possibility to toggle the active-low flag of a GPIO descriptor. This is useful for compatibility code, where defaults are inverted relative to the DT GPIO flags or the active-low flag is taken from elsewhere.
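A hedged usage sketch for the new helper (the device, the "wp" connection ID, and the invert decision below are placeholder assumptions; only gpiod_toggle_active_low() itself is introduced by this patch):

#include <linux/device.h>
#include <linux/err.h>
#include <linux/gpio/consumer.h>
#include <linux/types.h>

static int request_wp_gpio(struct device *dev, bool invert,
			   struct gpio_desc **out)
{
	struct gpio_desc *desc;

	desc = devm_gpiod_get_optional(dev, "wp", GPIOD_IN);
	if (IS_ERR(desc))
		return PTR_ERR(desc);

	/* Polarity comes from somewhere other than the DT flags. */
	if (desc && invert)
		gpiod_toggle_active_low(desc);

	*out = desc;
	return 0;
}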
Acked-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Link: https://lore.kernel.org/r/7ce0338e01ad17fa5a227176813941b41a7c35c1.157603163... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib.c | 11 +++++++++++ include/linux/gpio/consumer.h | 7 +++++++ 2 files changed, 18 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 78a16e42f222e..bcfbfded9ba3f 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -3371,6 +3371,17 @@ int gpiod_is_active_low(const struct gpio_desc *desc) } EXPORT_SYMBOL_GPL(gpiod_is_active_low);
+/** + * gpiod_toggle_active_low - toggle whether a GPIO is active-low or not + * @desc: the gpio descriptor to change + */ +void gpiod_toggle_active_low(struct gpio_desc *desc) +{ + VALIDATE_DESC_VOID(desc); + change_bit(FLAG_ACTIVE_LOW, &desc->flags); +} +EXPORT_SYMBOL_GPL(gpiod_toggle_active_low); + /* I/O calls are only valid after configuration completed; the relevant * "is this a valid GPIO" error checks should already have been done. * diff --git a/include/linux/gpio/consumer.h b/include/linux/gpio/consumer.h index 5215fdba6b9a6..bf2d017dd7b71 100644 --- a/include/linux/gpio/consumer.h +++ b/include/linux/gpio/consumer.h @@ -158,6 +158,7 @@ int gpiod_set_raw_array_value_cansleep(unsigned int array_size,
int gpiod_set_debounce(struct gpio_desc *desc, unsigned debounce); int gpiod_set_transitory(struct gpio_desc *desc, bool transitory); +void gpiod_toggle_active_low(struct gpio_desc *desc);
int gpiod_is_active_low(const struct gpio_desc *desc); int gpiod_cansleep(const struct gpio_desc *desc); @@ -483,6 +484,12 @@ static inline int gpiod_set_transitory(struct gpio_desc *desc, bool transitory) return -ENOSYS; }
+static inline void gpiod_toggle_active_low(struct gpio_desc *desc) +{ + /* GPIO can never have been requested */ + WARN_ON(desc); +} + static inline int gpiod_is_active_low(const struct gpio_desc *desc) { /* GPIO can never have been requested */
From: Michał Mirosław mirq-linux@rere.qmqm.pl
[ Upstream commit 9073d10b098973519044f5fcdc25586810b435da ]
Use the MMC_CAP2_RO_ACTIVE_HIGH flag as an indicator that the GPIO line is to be inverted compared to the DT/platform-specified polarity. The flag is not used after init in GPIO mode anyway. No functional changes intended.
Signed-off-by: Michał Mirosław mirq-linux@rere.qmqm.pl Link: https://lore.kernel.org/r/a60f563f11bbff821da2fa2949ca82922b144860.157603163... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpiolib-of.c | 4 ---- drivers/mmc/core/host.c | 11 ++++------- drivers/mmc/core/slot-gpio.c | 3 +++ drivers/mmc/host/pxamci.c | 8 ++++---- drivers/mmc/host/sdhci-esdhc-imx.c | 3 ++- 5 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/drivers/gpio/gpiolib-of.c b/drivers/gpio/gpiolib-of.c index b696e4598a240..b0e79bed59520 100644 --- a/drivers/gpio/gpiolib-of.c +++ b/drivers/gpio/gpiolib-of.c @@ -147,10 +147,6 @@ static void of_gpio_flags_quirks(struct device_node *np, if (of_property_read_bool(np, "cd-inverted")) *flags ^= OF_GPIO_ACTIVE_LOW; } - if (!strcmp(propname, "wp-gpios")) { - if (of_property_read_bool(np, "wp-inverted")) - *flags ^= OF_GPIO_ACTIVE_LOW; - } } /* * Some GPIO fixed regulator quirks. diff --git a/drivers/mmc/core/host.c b/drivers/mmc/core/host.c index 105b7a7c02513..b3484def0a8b0 100644 --- a/drivers/mmc/core/host.c +++ b/drivers/mmc/core/host.c @@ -176,7 +176,6 @@ int mmc_of_parse(struct mmc_host *host) u32 bus_width, drv_type, cd_debounce_delay_ms; int ret; bool cd_cap_invert, cd_gpio_invert = false; - bool ro_cap_invert, ro_gpio_invert = false;
if (!dev || !dev_fwnode(dev)) return 0; @@ -255,9 +254,11 @@ int mmc_of_parse(struct mmc_host *host) }
/* Parse Write Protection */ - ro_cap_invert = device_property_read_bool(dev, "wp-inverted");
- ret = mmc_gpiod_request_ro(host, "wp", 0, 0, &ro_gpio_invert); + if (device_property_read_bool(dev, "wp-inverted")) + host->caps2 |= MMC_CAP2_RO_ACTIVE_HIGH; + + ret = mmc_gpiod_request_ro(host, "wp", 0, 0, NULL); if (!ret) dev_info(host->parent, "Got WP GPIO\n"); else if (ret != -ENOENT && ret != -ENOSYS) @@ -266,10 +267,6 @@ int mmc_of_parse(struct mmc_host *host) if (device_property_read_bool(dev, "disable-wp")) host->caps2 |= MMC_CAP2_NO_WRITE_PROTECT;
- /* See the comment on CD inversion above */ - if (ro_cap_invert ^ ro_gpio_invert) - host->caps2 |= MMC_CAP2_RO_ACTIVE_HIGH; - if (device_property_read_bool(dev, "cap-sd-highspeed")) host->caps |= MMC_CAP_SD_HIGHSPEED; if (device_property_read_bool(dev, "cap-mmc-highspeed")) diff --git a/drivers/mmc/core/slot-gpio.c b/drivers/mmc/core/slot-gpio.c index da2596c5fa28d..582ec3d720f64 100644 --- a/drivers/mmc/core/slot-gpio.c +++ b/drivers/mmc/core/slot-gpio.c @@ -241,6 +241,9 @@ int mmc_gpiod_request_ro(struct mmc_host *host, const char *con_id, return ret; }
+ if (host->caps2 & MMC_CAP2_RO_ACTIVE_HIGH) + gpiod_toggle_active_low(desc); + if (gpio_invert) *gpio_invert = !gpiod_is_active_low(desc);
diff --git a/drivers/mmc/host/pxamci.c b/drivers/mmc/host/pxamci.c index 024acc1b0a2ea..b2bbcb09a49e6 100644 --- a/drivers/mmc/host/pxamci.c +++ b/drivers/mmc/host/pxamci.c @@ -740,16 +740,16 @@ static int pxamci_probe(struct platform_device *pdev) goto out; }
+ if (!host->pdata->gpio_card_ro_invert) + mmc->caps2 |= MMC_CAP2_RO_ACTIVE_HIGH; + ret = mmc_gpiod_request_ro(mmc, "wp", 0, 0, NULL); if (ret && ret != -ENOENT) { dev_err(dev, "Failed requesting gpio_ro\n"); goto out; } - if (!ret) { + if (!ret) host->use_ro_gpio = true; - mmc->caps2 |= host->pdata->gpio_card_ro_invert ? - 0 : MMC_CAP2_RO_ACTIVE_HIGH; - }
if (host->pdata->init) host->pdata->init(dev, pxamci_detect_irq, mmc); diff --git a/drivers/mmc/host/sdhci-esdhc-imx.c b/drivers/mmc/host/sdhci-esdhc-imx.c index 1c988d6a24330..dccb4df465126 100644 --- a/drivers/mmc/host/sdhci-esdhc-imx.c +++ b/drivers/mmc/host/sdhci-esdhc-imx.c @@ -1381,13 +1381,14 @@ static int sdhci_esdhc_imx_probe_nondt(struct platform_device *pdev, host->mmc->parent->platform_data); /* write_protect */ if (boarddata->wp_type == ESDHC_WP_GPIO) { + host->mmc->caps2 |= MMC_CAP2_RO_ACTIVE_HIGH; + err = mmc_gpiod_request_ro(host->mmc, "wp", 0, 0, NULL); if (err) { dev_err(mmc_dev(host->mmc), "failed to request write-protect gpio!\n"); return err; } - host->mmc->caps2 |= MMC_CAP2_RO_ACTIVE_HIGH; }
/* card_detect */
On 2/18/20 12:54 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
thanks, -- Shuah
On Tue, Feb 18, 2020 at 04:02:35PM -0700, shuah wrote:
On 2/18/20 12:54 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Thanks for testing all of these and letting me know.
greg k-h
On Wed, 19 Feb 2020 at 01:32, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Summary ------------------------------------------------------------------------
kernel: 5.5.5-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.5.y
git commit: 8aa3a43b129c3b5516b2310f4a93256da87dc711
git describe: v5.5.4-81-g8aa3a43b129c
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.5-oe/build/v5.5.4-81-g8...
No regressions (compared to build v5.5.4)
No fixes (compared to build v5.5.4)
Ran 25790 total tests in the following environments and test suites.
Environments
--------------
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- nxp-ls2088
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86
Test Suites
-----------
* build
* install-android-platform-tools-r2600
* kselftest
* libgpiod
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* perf
* spectre-meltdown-checker-test
* v4l2-compliance
* ltp-fs-tests
* ltp-syscalls-tests
* ltp-open-posix-tests
* network-basic-tests
* kvm-unit-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-none
On Wed, Feb 19, 2020 at 10:00:27AM +0530, Naresh Kamboju wrote:
On Wed, 19 Feb 2020 at 01:32, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Great, thanks for testing all of them and letting me know.
greg k-h
On 18/02/2020 19:54, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.5:
    13 builds: 13 pass, 0 fail
    22 boots:  22 pass, 0 fail
    40 tests:  40 pass, 0 fail
Linux version: 5.5.5-rc1-g8aa3a43b129c Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Cheers Jon
On Wed, Feb 19, 2020 at 11:06:26AM +0000, Jon Hunter wrote:
On 18/02/2020 19:54, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.5.5-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.5.y and the diffstat can be found below.
thanks,
greg k-h
All tests are passing for Tegra ...
Test results for stable-v5.5:
    13 builds: 13 pass, 0 fail
    22 boots:  22 pass, 0 fail
    40 tests:  40 pass, 0 fail
Linux version: 5.5.5-rc1-g8aa3a43b129c Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Thanks for testing all of these and letting me know.
greg k-h
On Tue, Feb 18, 2020 at 08:54:21PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
Build results:
	total: 157 pass: 157 fail: 0
Qemu test results:
	total: 412 pass: 412 fail: 0
Guenter
On Wed, Feb 19, 2020 at 10:09:54AM -0800, Guenter Roeck wrote:
On Tue, Feb 18, 2020 at 08:54:21PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.5.5 release. There are 80 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Feb 2020 19:03:19 +0000. Anything received after that time might be too late.
Build results:
	total: 157 pass: 157 fail: 0
Qemu test results:
	total: 412 pass: 412 fail: 0
Great, thanks for testing all of them and letting me know.
greg k-h
linux-stable-mirror@lists.linaro.org