This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.10.178-rc1
Kuniyuki Iwashima kuniyu@amazon.com sysctl: Fix data-races in proc_dou8vec_minmax().
Valentin Schneider vschneid@redhat.com panic, kexec: make __crash_kexec() NMI safe
Valentin Schneider vschneid@redhat.com kexec: turn all kexec_mutex acquisitions into trylocks
Arnd Bergmann arnd@arndb.de kexec: move locking into do_kexec_load
Nathan Chancellor nathan@kernel.org riscv: Handle zicsr/zifencei issues between clang and binutils
Masahiro Yamada masahiroy@kernel.org kbuild: check CONFIG_AS_IS_LLVM instead of LLVM_IAS
Nathan Chancellor nathan@kernel.org kbuild: Switch to 'f' variants of integrated assembler flag
Masahiro Yamada masahiroy@kernel.org kbuild: check the minimum assembler version in Kconfig
Steve Clevenger scclevenger@os.amperecomputing.com coresight-etm4: Fix for() loop drvdata->nr_addr_cmp range bug
George Cherian george.cherian@marvell.com watchdog: sbsa_wdog: Make sure the timeout programming is within the limits
Waiman Long longman@redhat.com cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods
Waiman Long longman@redhat.com cgroup/cpuset: Make cpuset_fork() handle CLONE_INTO_CGROUP properly
Waiman Long longman@redhat.com cgroup/cpuset: Skip spread flags update on v2
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
Gregor Herburger gregor.herburger@tq-group.com i2c: ocores: generate stop condition after timeout in polling mode
Matija Glavinic Pecotic matija.glavinic-pecotic.ext@nokia.com x86/rtc: Remove __init for runtime functions
Vincent Guittot vincent.guittot@linaro.org sched/fair: Fix imbalance overflow
zgpeng zgpeng.linux@gmail.com sched/fair: Move calculate of avg_load to a better location
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/papr_scm: Update the NUMA distance table for the target node
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/pseries: Add support for FORM2 associativity
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/pseries: Add a helper for form1 cpu distance
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/pseries: Consolidate different NUMA distance update code paths
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/pseries: Rename TYPE1_AFFINITY to FORM1_AFFINITY
Aneesh Kumar K.V aneesh.kumar@linux.ibm.com powerpc/pseries: rename min_common_depth to primary_domain_index
ZhaoLong Wang wangzhaolong1@huawei.com ubi: Fix deadlock caused by recursively holding work_sem
Lee Jones lee.jones@linaro.org mtd: ubi: wl: Fix a couple of kernel-doc issues
Zhihao Cheng chengzhihao1@huawei.com ubi: Fix failure attaching when vid_hdr offset equals to (sub)page size
Waiman Long longman@redhat.com cgroup/cpuset: Wake up cpuset_attach_wq tasks in cpuset_cancel_attach()
Basavaraj Natikar Basavaraj.Natikar@amd.com x86/PCI: Add quirk for AMD XHCI controller that loses MSI-X state in D3hot
Jiri Kosina jkosina@suse.cz scsi: ses: Handle enclosure with just a primary component gracefully
Ivan Bornyakov i.bornyakov@metrotek.ru net: sfp: initialize sfp->i2c_block_size at sfp allocation
Mathis Salmen mathis.salmen@matsal.de riscv: add icache flush for nommu sigreturn trampoline
Robbie Harwood rharwood@redhat.com asymmetric_keys: log on fatal failures in PE/pkcs7
Robbie Harwood rharwood@redhat.com verify_pefile: relax wrapper length check
Hans de Goede hdegoede@redhat.com drm: panel-orientation-quirks: Add quirk for Lenovo Yoga Book X90F
Hans de Goede hdegoede@redhat.com efi: sysfb_efi: Add quirk for Lenovo Yoga Book X91F/L
Alexander Stein alexander.stein@ew.tq-group.com i2c: imx-lpi2c: clean rx/tx buffers upon new message
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org wifi: mwifiex: mark OF related data as maybe unused
Grant Grundler grundler@chromium.org power: supply: cros_usbpd: reclassify "default case!" as debug
Andrii Nakryiko andrii@kernel.org libbpf: Fix single-line struct definition output in btf_dump
Roman Gushchin roman.gushchin@linux.dev net: macb: fix a memory corruption in extended buffer descriptor mode
Eric Dumazet edumazet@google.com udp6: fix potential access to stale information
Saravanan Vajravel saravanan.vajravel@broadcom.com RDMA/core: Fix GID entry ref leak when create_ah fails
Xin Long lucien.xin@gmail.com sctp: fix a potential overflow in sctp_ifwdtsn_skip
Ziyang Xuan william.xuanziyang@huawei.com net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume()
Denis Plotnikov den-plotnikov@yandex-team.ru qlcnic: check pci_reset_function result
Christophe JAILLET christophe.jaillet@wanadoo.fr drm/armada: Fix a potential double free in an error handling path
YueHaibing yuehaibing@huawei.com tcp: restrict net.ipv4.tcp_app_win
Eric Dumazet edumazet@google.com tcp: convert elligible sysctls to u8
Eric Dumazet edumazet@google.com ipv4: shrink netns_ipv4 with sysctl conversions
Eric Dumazet edumazet@google.com sysctl: add proc_dou8vec_minmax()
Harshit Mogalapalli harshit.m.mogalapalli@oracle.com niu: Fix missing unwind goto in niu_alloc_channels()
Zheng Wang zyytlz.wz@163.com 9p/xen : Fix use after free bug in xen_9pfs_front_remove due to race condition
Mark Zhang markzhang@nvidia.com RDMA/cma: Allow UD qp_type to join multicast only
Maher Sanalla msanalla@nvidia.com IB/mlx5: Add support for 400G_8X lane speed
Meir Lichtinger meirl@mellanox.com IB/mlx5: Add support for NDR link speed
Chunyan Zhang chunyan.zhang@unisoc.com clk: sprd: set max_register according to mapping range
Christophe Kerello christophe.kerello@foss.st.com mtd: rawnand: stm32_fmc2: use timings.mode instead of checking tRC_min
Christophe Kerello christophe.kerello@foss.st.com mtd: rawnand: stm32_fmc2: remove unsupported EDO mode
Arseniy Krasnov avkrasnov@sberdevices.ru mtd: rawnand: meson: fix bitmask for length in command word
Bang Li libang.linuxer@gmail.com mtdblock: tolerate corrected bit-flips
Daniel Vetter daniel.vetter@ffwll.ch fbmem: Reject FB_ACTIVATE_KD_TEXT from userspace
Christoph Hellwig hch@lst.de btrfs: fix fast csum implementation detection
David Sterba dsterba@suse.com btrfs: print checksum type and implementation at mount time
Min Li lm0963hack@gmail.com Bluetooth: Fix race condition in hidp_session_thread
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix use-after-free in l2cap_disconnect_{req,rsp}
Oswald Buddenhagen oswald.buddenhagen@gmx.de ALSA: hda/sigmatel: fix S/PDIF out on Intel D*45* motherboards
Xu Biang xubiang@hust.edu.cn ALSA: firewire-tascam: add missing unwind goto in snd_tscm_stream_start_duplex()
Oswald Buddenhagen oswald.buddenhagen@gmx.de ALSA: i2c/cs8427: fix iec958 mixer control deactivation
Oswald Buddenhagen oswald.buddenhagen@gmx.de ALSA: hda/sigmatel: add pin overrides for Intel DP45SG motherboard
Oswald Buddenhagen oswald.buddenhagen@gmx.de ALSA: emu10k1: fix capture interrupt handler unlinking
Kornel Dulęba korneld@chromium.org Revert "pinctrl: amd: Disable and mask interrupts on resume"
Eduard Zingerman eddyz87@gmail.com bpftool: Print newline before '}' for struct with padding only fields
Heming Zhao ocfs2-devel@oss.oracle.com ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown
Gaosheng Cui cuigaosheng1@huawei.com Revert "media: ti: cal: fix possible memory leak in cal_ctx_create()"
Robert Foss robert.foss@linaro.org drm/bridge: lt9611: Fix PLL being unable to lock
Tommi Rantala tommi.t.rantala@nokia.com selftests: intel_pstate: ftime() is deprecated
Rongwei Wang rongwei.wang@linux.alibaba.com mm/swap: fix swap_info_struct race between swapoff and get_swap_pages()
Zheng Yejian zhengyejian1@huawei.com ring-buffer: Fix race while reader and writer are on the same page
Karol Herbst kherbst@redhat.com drm/nouveau/disp: Support more modes by checking with lower bpc
Boris Brezillon boris.brezillon@collabora.com drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path
Jason Montleon jmontleo@redhat.com ASoC: hdac_hdmi: use set_stream() instead of set_tdm_slots()
Steven Rostedt (Google) rostedt@goodmis.org tracing: Free error logs of tracing instances
Michal Sojka michal.sojka@cvut.cz can: isotp: isotp_ops: fix poll() to not report false EPOLLOUT events
Oleksij Rempel linux@rempel-privat.de can: j1939: j1939_tp_tx_dat_new(): fix out-of-bounds memory access
Zheng Yejian zhengyejian1@huawei.com ftrace: Fix issue that 'direct->addr' not restored in modify_ftrace_direct()
John Keeping john@metanate.com ftrace: Mark get_lock_parent_ip() __always_inline
Kan Liang kan.liang@linux.intel.com perf/core: Fix the same task check in perf_event_set_output
Zhong Jinghua zhongjinghua@huawei.com scsi: iscsi_tcp: Check that sock is valid before iscsi_set_param()
Nuno Sá nuno.sa@analog.com iio: adc: ad7791: fix IRQ flags
Jeremy Soller jeremy@system76.com ALSA: hda/realtek: Add quirk for Clevo X370SNW
Geert Uytterhoeven geert+renesas@glider.be dt-bindings: serial: renesas,scif: Fix 4th IRQ for 4-IRQ SCIFs
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix sysfs interface lifetime
Ryusuke Konishi konishi.ryusuke@gmail.com nilfs2: fix potential UAF of struct nilfs_sc_info in nilfs_segctor_thread()
Sherry Sun sherry.sun@nxp.com tty: serial: fsl_lpuart: avoid checking for transfer complete when UARTCTRL_SBK is asserted in lpuart32_tx_empty
Biju Das biju.das.jz@bp.renesas.com tty: serial: sh-sci: Fix Rx on RZ/G2L SCI
Biju Das biju.das.jz@bp.renesas.com tty: serial: sh-sci: Fix transmit end interrupt handler
Kai-Heng Feng kai.heng.feng@canonical.com iio: light: cm32181: Unregister second I2C client if present
William Breathitt Gray william.gray@linaro.org iio: dac: cio-dac: Fix max DAC write value check for 12-bit
Lars-Peter Clausen lars@metafoo.de iio: adc: ti-ads7950: Set `can_sleep` flag for GPIO chip
Bjørn Mork bjorn@mork.no USB: serial: option: add Quectel RM500U-CN modem
Enrico Sau enrico.sau@gmail.com USB: serial: option: add Telit FE990 compositions
RD Babiera rdbabiera@google.com usb: typec: altmodes/displayport: Fix configure initial pin assignment
Kees Jan Koster kjkoster@kjkoster.org USB: serial: cp210x: add Silicon Labs IFS-USB-DATACABLE IDs
D Scott Phillips scott@os.amperecomputing.com xhci: also avoid the XHCI_ZERO_64B_REGS quirk with a passthrough iommu
Wayne Chang waynec@nvidia.com usb: xhci: tegra: fix sleep in atomic call
Dai Ngo dai.ngo@oracle.com NFSD: callback request does not use correct credential for AUTH_SYS
Jeff Layton jlayton@kernel.org sunrpc: only free unix grouplist after RCU settles
Corinna Vinschen vinschen@redhat.com net: stmmac: fix up RX flow hash indirection table when setting channels
Siddharth Vadapalli s-vadapalli@ti.com net: ethernet: ti: am65-cpsw: Fix mdio cleanup in probe
Dhruva Gole d-gole@ti.com gpio: davinci: Add irq chip flag to skip set wake
Ziyang Xuan william.xuanziyang@huawei.com ipv6: Fix an uninit variable access bug in __ip6_make_skb()
Sricharan Ramabadhran quic_srichara@quicinc.com net: qrtr: Do not do DEL_SERVER broadcast after DEL_CLIENT
Xin Long lucien.xin@gmail.com sctp: check send stream number after wait_for_sndbuf
Jakub Kicinski kuba@kernel.org net: don't let netpoll invoke NAPI if in xmit context
Eric Dumazet edumazet@google.com icmp: guard against too small mtu
Ziyang Xuan william.xuanziyang@huawei.com net: qrtr: Fix a refcount bug in qrtr_recvmsg()
Luca Weiss luca@z3ntu.xyz net: qrtr: combine nameservice into main module
Felix Fietkau nbd@nbd.name wifi: mac80211: fix invalid drv_sta_pre_rcu_remove calls for non-uploaded sta
Nico Boehr nrb@linux.ibm.com KVM: s390: pv: fix external interruption loop not always detected
Uwe Kleine-König u.kleine-koenig@pengutronix.de pwm: sprd: Explicitly set .polarity in .get_state()
Uwe Kleine-König u.kleine-koenig@pengutronix.de pwm: cros-ec: Explicitly set .polarity in .get_state()
Mohammed Gamal mgamal@redhat.com Drivers: vmbus: Check for channel allocation before looking up relids
Randy Dunlap rdunlap@infradead.org gpio: GPIO_REGMAP: select REGMAP instead of depending on it
-------------
Diffstat:
.../devicetree/bindings/serial/renesas,scif.yaml | 4 +- Documentation/networking/ip-sysctl.rst | 2 + Documentation/powerpc/associativity.rst | 104 +++++ Documentation/sound/hd-audio/models.rst | 2 +- Makefile | 12 +- arch/powerpc/include/asm/firmware.h | 7 +- arch/powerpc/include/asm/prom.h | 3 +- arch/powerpc/include/asm/topology.h | 6 +- arch/powerpc/kernel/prom_init.c | 3 +- arch/powerpc/mm/numa.c | 433 ++++++++++++++++----- arch/powerpc/platforms/pseries/firmware.c | 3 +- arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 + arch/powerpc/platforms/pseries/hotplug-memory.c | 2 + arch/powerpc/platforms/pseries/lpar.c | 4 +- arch/powerpc/platforms/pseries/papr_scm.c | 7 + arch/riscv/Kconfig | 22 ++ arch/riscv/Makefile | 12 +- arch/riscv/kernel/signal.c | 9 +- arch/s390/kvm/intercept.c | 32 +- arch/x86/kernel/sysfb_efi.c | 8 + arch/x86/kernel/x86_init.c | 4 +- arch/x86/pci/fixup.c | 21 + crypto/asymmetric_keys/pkcs7_verify.c | 10 +- crypto/asymmetric_keys/verify_pefile.c | 32 +- drivers/clk/sprd/common.c | 9 +- drivers/gpio/Kconfig | 2 +- drivers/gpio/gpio-davinci.c | 2 +- drivers/gpu/drm/armada/armada_drv.c | 1 - drivers/gpu/drm/bridge/lontium-lt9611.c | 1 + drivers/gpu/drm/drm_panel_orientation_quirks.c | 13 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 ++ drivers/gpu/drm/nouveau/nouveau_dp.c | 8 +- drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + drivers/hv/connection.c | 4 + drivers/hwtracing/coresight/coresight-etm4x-core.c | 2 +- drivers/i2c/busses/i2c-imx-lpi2c.c | 2 + drivers/i2c/busses/i2c-ocores.c | 35 +- drivers/iio/adc/ad7791.c | 2 +- drivers/iio/adc/ti-ads7950.c | 1 + drivers/iio/dac/cio-dac.c | 4 +- drivers/iio/light/cm32181.c | 12 + drivers/infiniband/core/cma.c | 60 +-- drivers/infiniband/core/verbs.c | 2 + drivers/infiniband/hw/mlx5/main.c | 16 + drivers/media/platform/ti-vpe/cal.c | 4 +- drivers/mtd/mtdblock.c | 12 +- drivers/mtd/nand/raw/meson_nand.c | 6 +- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 3 + drivers/mtd/ubi/build.c | 21 +- drivers/mtd/ubi/wl.c | 5 +- drivers/net/ethernet/cadence/macb_main.c | 4 + drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +- drivers/net/ethernet/sun/niu.c | 2 +- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 6 +- drivers/net/phy/sfp.c | 13 +- drivers/net/wireless/marvell/mwifiex/pcie.c | 2 +- drivers/net/wireless/marvell/mwifiex/sdio.c | 2 +- drivers/pinctrl/pinctrl-amd.c | 36 +- drivers/power/supply/cros_usbpd-charger.c | 2 +- drivers/pwm/pwm-cros-ec.c | 1 + drivers/pwm/pwm-sprd.c | 1 + drivers/scsi/iscsi_tcp.c | 3 +- drivers/scsi/ses.c | 20 +- drivers/tty/serial/fsl_lpuart.c | 8 +- drivers/tty/serial/sh-sci.c | 10 +- drivers/usb/host/xhci-tegra.c | 6 +- drivers/usb/host/xhci.c | 6 +- drivers/usb/serial/cp210x.c | 1 + drivers/usb/serial/option.c | 10 + drivers/usb/typec/altmodes/displayport.c | 6 +- drivers/video/fbdev/core/fbmem.c | 2 + drivers/watchdog/sbsa_gwdt.c | 1 + fs/btrfs/disk-io.c | 17 + fs/btrfs/super.c | 2 - fs/nfsd/nfs4callback.c | 4 +- fs/nilfs2/segment.c | 3 +- fs/nilfs2/super.c | 2 + fs/nilfs2/the_nilfs.c | 12 +- fs/ocfs2/dlmglue.c | 8 +- fs/ocfs2/super.c | 3 +- fs/proc/proc_sysctl.c | 6 + include/linux/ftrace.h | 2 +- include/linux/kexec.h | 2 +- include/linux/sysctl.h | 2 + include/net/netns/ipv4.h | 100 ++--- init/Kconfig | 12 + kernel/cgroup/cpuset.c | 214 +++++++--- kernel/events/core.c | 2 +- kernel/kexec.c | 41 +- kernel/kexec_core.c | 28 +- kernel/kexec_file.c | 4 +- kernel/kexec_internal.h | 15 +- kernel/ksysfs.c | 7 +- kernel/sched/fair.c | 15 +- kernel/sysctl.c | 65 ++++ kernel/trace/ftrace.c | 15 +- kernel/trace/ring_buffer.c | 13 +- kernel/trace/trace.c | 1 + mm/swapfile.c | 3 +- net/9p/trans_xen.c | 4 + net/bluetooth/hidp/core.c | 2 +- net/bluetooth/l2cap_core.c | 24 +- net/can/isotp.c | 17 +- net/can/j1939/transport.c | 5 +- net/core/netpoll.c | 19 +- net/ipv4/icmp.c | 5 + net/ipv4/sysctl_net_ipv4.c | 203 +++++----- net/ipv6/ip6_output.c | 7 +- net/ipv6/udp.c | 8 +- net/mac80211/sta_info.c | 3 +- net/qrtr/Makefile | 3 +- net/qrtr/{qrtr.c => af_qrtr.c} | 15 + net/qrtr/ns.c | 15 +- net/sctp/socket.c | 4 + net/sctp/stream_interleave.c | 3 +- net/sunrpc/svcauth_unix.c | 17 +- scripts/Kconfig.include | 6 + scripts/as-version.sh | 69 ++++ scripts/dummy-tools/gcc | 6 + sound/firewire/tascam/tascam-stream.c | 2 +- sound/i2c/cs8427.c | 7 +- sound/pci/emu10k1/emupcm.c | 4 +- sound/pci/hda/patch_realtek.c | 1 + sound/pci/hda/patch_sigmatel.c | 10 + sound/soc/codecs/hdac_hdmi.c | 17 +- tools/lib/bpf/btf_dump.c | 11 +- tools/testing/selftests/intel_pstate/aperf.c | 22 +- 128 files changed, 1652 insertions(+), 615 deletions(-)
From: Randy Dunlap rdunlap@infradead.org
[ Upstream commit d49765b5f4320a402fbc4ed5edfd73d87640f27c ]
REGMAP is a hidden (not user visible) symbol. Users cannot set it directly thru "make *config", so drivers should select it instead of depending on it if they need it.
Consistently using "select" or "depends on" can also help reduce Kconfig circular dependency issues.
Therefore, change the use of "depends on REGMAP" to "select REGMAP".
Fixes: ebe363197e52 ("gpio: add a reusable generic gpio_chip using regmap") Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Michael Walle michael@walle.cc Cc: Linus Walleij linus.walleij@linaro.org Cc: Bartosz Golaszewski brgl@bgdev.pl Cc: linux-gpio@vger.kernel.org Acked-by: Michael Walle michael@walle.cc Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index d1300fc003ed7..39f3e13664099 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -99,7 +99,7 @@ config GPIO_GENERIC tristate
config GPIO_REGMAP - depends on REGMAP + select REGMAP tristate
# put drivers in the right section, in alphabetical order
From: Mohammed Gamal mgamal@redhat.com
[ Upstream commit 1eb65c8687316c65140b48fad27133d583178e15 ]
relid2channel() assumes vmbus channel array to be allocated when called. However, in cases such as kdump/kexec, not all relids will be reset by the host. When the second kernel boots and if the guest receives a vmbus interrupt during vmbus driver initialization before vmbus_connect() is called, before it finishes, or if it fails, the vmbus interrupt service routine is called which in turn calls relid2channel() and can cause a null pointer dereference.
Print a warning and error out in relid2channel() for a channel id that's invalid in the second kernel.
Fixes: 8b6a877c060e ("Drivers: hv: vmbus: Replace the per-CPU channel lists with a global array of channels")
Signed-off-by: Mohammed Gamal mgamal@redhat.com Reviewed-by: Dexuan Cui decui@microsoft.com Link: https://lore.kernel.org/r/20230217204411.212709-1-mgamal@redhat.com Signed-off-by: Wei Liu wei.liu@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hv/connection.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c index bfd7f00a59ecf..683fdfa3e723e 100644 --- a/drivers/hv/connection.c +++ b/drivers/hv/connection.c @@ -305,6 +305,10 @@ void vmbus_disconnect(void) */ struct vmbus_channel *relid2channel(u32 relid) { + if (vmbus_connection.channels == NULL) { + pr_warn_once("relid2channel: relid=%d: No channels mapped!\n", relid); + return NULL; + } if (WARN_ON(relid >= MAX_CHANNEL_RELIDS)) return NULL; return READ_ONCE(vmbus_connection.channels[relid]);
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 30006b77c7e130e01d1ab2148cc8abf73dfcc4bf ]
The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly.
Reviewed-by: Guenter Roeck groeck@chromium.org Fixes: 1f0d3bb02785 ("pwm: Add ChromeOS EC PWM driver") Link: https://lore.kernel.org/r/20230228135508.1798428-3-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pwm/pwm-cros-ec.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/pwm/pwm-cros-ec.c b/drivers/pwm/pwm-cros-ec.c index c1c337969e4ec..d4f4a133656c7 100644 --- a/drivers/pwm/pwm-cros-ec.c +++ b/drivers/pwm/pwm-cros-ec.c @@ -154,6 +154,7 @@ static void cros_ec_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
state->enabled = (ret > 0); state->period = EC_PWM_MAX_DUTY; + state->polarity = PWM_POLARITY_NORMAL;
/* * Note that "disabled" and "duty cycle == 0" are treated the same. If
On Tue, Apr 18, 2023 at 02:20:22PM +0200, Greg Kroah-Hartman wrote:
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 30006b77c7e130e01d1ab2148cc8abf73dfcc4bf ]
The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly.
Reviewed-by: Guenter Roeck groeck@chromium.org Fixes: 1f0d3bb02785 ("pwm: Add ChromeOS EC PWM driver") Link: https://lore.kernel.org/r/20230228135508.1798428-3-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
I see you picked this one and the similar sprd patch, but not
8caa81eb950c pwm: meson: Explicitly set .polarity in .get_state() b20b097128d9 pwm: iqs620a: Explicitly set .polarity in .get_state() 6f5793798014 pwm: hibvt: Explicitly set .polarity in .get_state()
(At least I didn't get a mail about these). These should qualify in the same way.
Maybe you also want to pick
1271a7b98e79 pwm: Zero-initialize the pwm_state passed to driver's .get_state()
Best regards Uwe
On Tue, Apr 18, 2023 at 03:01:21PM +0200, Uwe Kleine-König wrote:
On Tue, Apr 18, 2023 at 02:20:22PM +0200, Greg Kroah-Hartman wrote:
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 30006b77c7e130e01d1ab2148cc8abf73dfcc4bf ]
The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly.
Reviewed-by: Guenter Roeck groeck@chromium.org Fixes: 1f0d3bb02785 ("pwm: Add ChromeOS EC PWM driver") Link: https://lore.kernel.org/r/20230228135508.1798428-3-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
I see you picked this one and the similar sprd patch, but not
8caa81eb950c pwm: meson: Explicitly set .polarity in .get_state() b20b097128d9 pwm: iqs620a: Explicitly set .polarity in .get_state() 6f5793798014 pwm: hibvt: Explicitly set .polarity in .get_state()
(At least I didn't get a mail about these). These should qualify in the same way.
They didn't all apply very well (one did).
Maybe you also want to pick
1271a7b98e79 pwm: Zero-initialize the pwm_state passed to driver's .get_state()
I've queued that up to 6.2 only, the other trees it didn't apply to, want to send a backport?
thanks,
greg k-h
On Sat, Apr 22, 2023 at 05:05:08PM +0200, Greg Kroah-Hartman wrote:
On Tue, Apr 18, 2023 at 03:01:21PM +0200, Uwe Kleine-König wrote:
On Tue, Apr 18, 2023 at 02:20:22PM +0200, Greg Kroah-Hartman wrote:
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 30006b77c7e130e01d1ab2148cc8abf73dfcc4bf ]
The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly.
Reviewed-by: Guenter Roeck groeck@chromium.org Fixes: 1f0d3bb02785 ("pwm: Add ChromeOS EC PWM driver") Link: https://lore.kernel.org/r/20230228135508.1798428-3-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org
I see you picked this one and the similar sprd patch, but not
8caa81eb950c pwm: meson: Explicitly set .polarity in .get_state() b20b097128d9 pwm: iqs620a: Explicitly set .polarity in .get_state() 6f5793798014 pwm: hibvt: Explicitly set .polarity in .get_state()
(At least I didn't get a mail about these). These should qualify in the same way.
They didn't all apply very well (one did).
Nope, that one broke the build, so none of these applied, which is probably why Sasha didn't do it :)
thanks,
greg k-h
From: Uwe Kleine-König u.kleine-koenig@pengutronix.de
[ Upstream commit 2be4dcf6627e1bcbbef8e6ba1811f5127d39202c ]
The driver only supports normal polarity. Complete the implementation of .get_state() by setting .polarity accordingly.
Fixes: 8aae4b02e8a6 ("pwm: sprd: Add Spreadtrum PWM support") Link: https://lore.kernel.org/r/20230228135508.1798428-5-u.kleine-koenig@pengutron... Signed-off-by: Uwe Kleine-König u.kleine-koenig@pengutronix.de Signed-off-by: Thierry Reding thierry.reding@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/pwm/pwm-sprd.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/pwm/pwm-sprd.c b/drivers/pwm/pwm-sprd.c index 9eeb59cb81b68..b23456d38bd22 100644 --- a/drivers/pwm/pwm-sprd.c +++ b/drivers/pwm/pwm-sprd.c @@ -109,6 +109,7 @@ static void sprd_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm, duty = val & SPRD_PWM_DUTY_MSK; tmp = (prescale + 1) * NSEC_PER_SEC * duty; state->duty_cycle = DIV_ROUND_CLOSEST_ULL(tmp, chn->clk_rate); + state->polarity = PWM_POLARITY_NORMAL;
/* Disable PWM clocks if the PWM channel is not in enable state. */ if (!state->enabled)
From: Nico Boehr nrb@linux.ibm.com
[ Upstream commit 21f27df854008b86349a203bf97fef79bb11f53e ]
To determine whether the guest has caused an external interruption loop upon code 20 (external interrupt) intercepts, the ext_new_psw needs to be inspected to see whether external interrupts are enabled.
Under non-PV, ext_new_psw can simply be taken from guest lowcore. Under PV, KVM can only access the encrypted guest lowcore and hence the ext_new_psw must not be taken from guest lowcore.
handle_external_interrupt() incorrectly did that and hence was not able to reliably tell whether an external interruption loop is happening or not. False negatives cause spurious failures of my kvm-unit-test for extint loops[1] under PV.
Since code 20 is only caused under PV if and only if the guest's ext_new_psw is enabled for external interrupts, false positive detection of a external interruption loop can not happen.
Fix this issue by instead looking at the guest PSW in the state description. Since the PSW swap for external interrupt is done by the ultravisor before the intercept is caused, this reliably tells whether the guest is enabled for external interrupts in the ext_new_psw.
Also update the comments to explain better what is happening.
[1] https://lore.kernel.org/kvm/20220812062151.1980937-4-nrb@linux.ibm.com/
Signed-off-by: Nico Boehr nrb@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com Fixes: 201ae986ead7 ("KVM: s390: protvirt: Implement interrupt injection") Link: https://lore.kernel.org/r/20230213085520.100756-2-nrb@linux.ibm.com Message-Id: 20230213085520.100756-2-nrb@linux.ibm.com Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/kvm/intercept.c | 32 ++++++++++++++++++++++++-------- 1 file changed, 24 insertions(+), 8 deletions(-)
diff --git a/arch/s390/kvm/intercept.c b/arch/s390/kvm/intercept.c index 77909d362b78f..5be68190901f9 100644 --- a/arch/s390/kvm/intercept.c +++ b/arch/s390/kvm/intercept.c @@ -270,10 +270,18 @@ static int handle_prog(struct kvm_vcpu *vcpu) /** * handle_external_interrupt - used for external interruption interceptions * - * This interception only occurs if the CPUSTAT_EXT_INT bit was set, or if - * the new PSW does not have external interrupts disabled. In the first case, - * we've got to deliver the interrupt manually, and in the second case, we - * drop to userspace to handle the situation there. + * This interception occurs if: + * - the CPUSTAT_EXT_INT bit was already set when the external interrupt + * occurred. In this case, the interrupt needs to be injected manually to + * preserve interrupt priority. + * - the external new PSW has external interrupts enabled, which will cause an + * interruption loop. We drop to userspace in this case. + * + * The latter case can be detected by inspecting the external mask bit in the + * external new psw. + * + * Under PV, only the latter case can occur, since interrupt priorities are + * handled in the ultravisor. */ static int handle_external_interrupt(struct kvm_vcpu *vcpu) { @@ -284,10 +292,18 @@ static int handle_external_interrupt(struct kvm_vcpu *vcpu)
vcpu->stat.exit_external_interrupt++;
- rc = read_guest_lc(vcpu, __LC_EXT_NEW_PSW, &newpsw, sizeof(psw_t)); - if (rc) - return rc; - /* We can not handle clock comparator or timer interrupt with bad PSW */ + if (kvm_s390_pv_cpu_is_protected(vcpu)) { + newpsw = vcpu->arch.sie_block->gpsw; + } else { + rc = read_guest_lc(vcpu, __LC_EXT_NEW_PSW, &newpsw, sizeof(psw_t)); + if (rc) + return rc; + } + + /* + * Clock comparator or timer interrupt with external interrupt enabled + * will cause interrupt loop. Drop to userspace. + */ if ((eic == EXT_IRQ_CLK_COMP || eic == EXT_IRQ_CPU_TIMER) && (newpsw.mask & PSW_MASK_EXT)) return -EOPNOTSUPP;
From: Felix Fietkau nbd@nbd.name
[ Upstream commit 12b220a6171faf10638ab683a975cadcf1a352d6 ]
Avoid potential data corruption issues caused by uninitialized driver private data structures.
Reported-by: Brian Coverstone brian@mainsequence.net Fixes: 6a9d1b91f34d ("mac80211: add pre-RCU-sync sta removal driver operation") Signed-off-by: Felix Fietkau nbd@nbd.name Link: https://lore.kernel.org/r/20230324120924.38412-3-nbd@nbd.name Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/sta_info.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index d572478c4d68e..2e84360990f0c 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -1040,7 +1040,8 @@ static int __must_check __sta_info_destroy_part1(struct sta_info *sta) list_del_rcu(&sta->list); sta->removed = true;
- drv_sta_pre_rcu_remove(local, sta->sdata, sta); + if (sta->uploaded) + drv_sta_pre_rcu_remove(local, sta->sdata, sta);
if (sdata->vif.type == NL80211_IFTYPE_AP_VLAN && rcu_access_pointer(sdata->u.vlan.sta) == sta)
From: Luca Weiss luca@z3ntu.xyz
[ Upstream commit a365023a76f231cc2fc6e33797e66f3bcaa9f9a9 ]
Previously with CONFIG_QRTR=m a separate ns.ko would be built which wasn't done on purpose and should be included in qrtr.ko.
Rename qrtr.c to af_qrtr.c so we can build a qrtr.ko with both af_qrtr.c and ns.c.
Signed-off-by: Luca Weiss luca@z3ntu.xyz Reviewed-by: Bjorn Andersson bjorn.andersson@linaro.org Tested-By: Steev Klimaszewski steev@kali.org Reviewed-by: Manivannan Sadhasivam mani@kernel.org Link: https://lore.kernel.org/r/20210928171156.6353-1-luca@z3ntu.xyz Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 44d807320000 ("net: qrtr: Fix a refcount bug in qrtr_recvmsg()") Signed-off-by: Sasha Levin sashal@kernel.org --- net/qrtr/Makefile | 3 ++- net/qrtr/{qrtr.c => af_qrtr.c} | 0 2 files changed, 2 insertions(+), 1 deletion(-) rename net/qrtr/{qrtr.c => af_qrtr.c} (100%)
diff --git a/net/qrtr/Makefile b/net/qrtr/Makefile index 1b1411d158a73..8e0605f88a73d 100644 --- a/net/qrtr/Makefile +++ b/net/qrtr/Makefile @@ -1,5 +1,6 @@ # SPDX-License-Identifier: GPL-2.0-only -obj-$(CONFIG_QRTR) := qrtr.o ns.o +obj-$(CONFIG_QRTR) += qrtr.o +qrtr-y := af_qrtr.o ns.o
obj-$(CONFIG_QRTR_SMD) += qrtr-smd.o qrtr-smd-y := smd.o diff --git a/net/qrtr/qrtr.c b/net/qrtr/af_qrtr.c similarity index 100% rename from net/qrtr/qrtr.c rename to net/qrtr/af_qrtr.c
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 44d807320000db0d0013372ad39b53e12d52f758 ]
Syzbot reported a bug as following:
refcount_t: addition on 0; use-after-free. ... RIP: 0010:refcount_warn_saturate+0x17c/0x1f0 lib/refcount.c:25 ... Call Trace: <TASK> __refcount_add include/linux/refcount.h:199 [inline] __refcount_inc include/linux/refcount.h:250 [inline] refcount_inc include/linux/refcount.h:267 [inline] kref_get include/linux/kref.h:45 [inline] qrtr_node_acquire net/qrtr/af_qrtr.c:202 [inline] qrtr_node_lookup net/qrtr/af_qrtr.c:398 [inline] qrtr_send_resume_tx net/qrtr/af_qrtr.c:1003 [inline] qrtr_recvmsg+0x85f/0x990 net/qrtr/af_qrtr.c:1070 sock_recvmsg_nosec net/socket.c:1017 [inline] sock_recvmsg+0xe2/0x160 net/socket.c:1038 qrtr_ns_worker+0x170/0x1700 net/qrtr/ns.c:688 process_one_work+0x991/0x15c0 kernel/workqueue.c:2390 worker_thread+0x669/0x1090 kernel/workqueue.c:2537
It occurs in the concurrent scenario of qrtr_recvmsg() and qrtr_endpoint_unregister() as following:
cpu0 cpu1 qrtr_recvmsg qrtr_endpoint_unregister qrtr_send_resume_tx qrtr_node_release qrtr_node_lookup mutex_lock(&qrtr_node_lock) spin_lock_irqsave(&qrtr_nodes_lock, ) refcount_dec_and_test(&node->ref) [node->ref == 0] radix_tree_lookup [node != NULL] __qrtr_node_release qrtr_node_acquire spin_lock_irqsave(&qrtr_nodes_lock, ) kref_get(&node->ref) [WARNING] ... mutex_unlock(&qrtr_node_lock)
Use qrtr_node_lock to protect qrtr_node_lookup() implementation, this is actually improving the protection of node reference.
Fixes: 0a7e0d0ef054 ("net: qrtr: Migrate node lookup tree to spinlock") Reported-by: syzbot+a7492efaa5d61b51db23@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=a7492efaa5d61b51db23 Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/qrtr/af_qrtr.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c index 13448ca5aeff2..d0f0b2b8dce2f 100644 --- a/net/qrtr/af_qrtr.c +++ b/net/qrtr/af_qrtr.c @@ -388,10 +388,12 @@ static struct qrtr_node *qrtr_node_lookup(unsigned int nid) struct qrtr_node *node; unsigned long flags;
+ mutex_lock(&qrtr_node_lock); spin_lock_irqsave(&qrtr_nodes_lock, flags); node = radix_tree_lookup(&qrtr_nodes, nid); node = qrtr_node_acquire(node); spin_unlock_irqrestore(&qrtr_nodes_lock, flags); + mutex_unlock(&qrtr_node_lock);
return node; }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 7d63b67125382ff0ffdfca434acbc94a38bd092b ]
syzbot was able to trigger a panic [1] in icmp_glue_bits(), or more exactly in skb_copy_and_csum_bits()
There is no repro yet, but I think the issue is that syzbot manages to lower device mtu to a small value, fooling __icmp_send()
__icmp_send() must make sure there is enough room for the packet to include at least the headers.
We might in the future refactor skb_copy_and_csum_bits() and its callers to no longer crash when something bad happens.
[1] kernel BUG at net/core/skbuff.c:3343 ! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 0 PID: 15766 Comm: syz-executor.0 Not tainted 6.3.0-rc4-syzkaller-00039-gffe78bbd5121 #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014 RIP: 0010:skb_copy_and_csum_bits+0x798/0x860 net/core/skbuff.c:3343 Code: f0 c1 c8 08 41 89 c6 e9 73 ff ff ff e8 61 48 d4 f9 e9 41 fd ff ff 48 8b 7c 24 48 e8 52 48 d4 f9 e9 c3 fc ff ff e8 c8 27 84 f9 <0f> 0b 48 89 44 24 28 e8 3c 48 d4 f9 48 8b 44 24 28 e9 9d fb ff ff RSP: 0018:ffffc90000007620 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000001e8 RCX: 0000000000000100 RDX: ffff8880276f6280 RSI: ffffffff87fdd138 RDI: 0000000000000005 RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000 R10: 00000000000001e8 R11: 0000000000000001 R12: 000000000000003c R13: 0000000000000000 R14: ffff888028244868 R15: 0000000000000b0e FS: 00007fbc81f1c700(0000) GS:ffff88802ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b2df43000 CR3: 00000000744db000 CR4: 0000000000150ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> icmp_glue_bits+0x7b/0x210 net/ipv4/icmp.c:353 __ip_append_data+0x1d1b/0x39f0 net/ipv4/ip_output.c:1161 ip_append_data net/ipv4/ip_output.c:1343 [inline] ip_append_data+0x115/0x1a0 net/ipv4/ip_output.c:1322 icmp_push_reply+0xa8/0x440 net/ipv4/icmp.c:370 __icmp_send+0xb80/0x1430 net/ipv4/icmp.c:765 ipv4_send_dest_unreach net/ipv4/route.c:1239 [inline] ipv4_link_failure+0x5a9/0x9e0 net/ipv4/route.c:1246 dst_link_failure include/net/dst.h:423 [inline] arp_error_report+0xcb/0x1c0 net/ipv4/arp.c:296 neigh_invalidate+0x20d/0x560 net/core/neighbour.c:1079 neigh_timer_handler+0xc77/0xff0 net/core/neighbour.c:1166 call_timer_fn+0x1a0/0x580 kernel/time/timer.c:1700 expire_timers+0x29b/0x4b0 kernel/time/timer.c:1751 __run_timers kernel/time/timer.c:2022 [inline]
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: syzbot+d373d60fddbdc915e666@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20230330174502.1915328-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/icmp.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c index a1aacf5e75a6a..0bbc047e8f6e1 100644 --- a/net/ipv4/icmp.c +++ b/net/ipv4/icmp.c @@ -755,6 +755,11 @@ void __icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info, room = 576; room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen; room -= sizeof(struct icmphdr); + /* Guard against tiny mtu. We need to include at least one + * IP network header for this message to make any sense. + */ + if (room <= (int)sizeof(struct iphdr)) + goto ende;
icmp_param.data_len = skb_in->len - icmp_param.offset; if (icmp_param.data_len > room)
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 275b471e3d2daf1472ae8fa70dc1b50c9e0b9e75 ]
Commit 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix") narrowed down the region under netif_tx_trylock() inside netpoll_send_skb(). (At that point in time netif_tx_trylock() would lock all queues of the device.) Taking the tx lock was problematic because driver's cleanup method may take the same lock. So the change made us hold the xmit lock only around xmit, and expected the driver to take care of locking within ->ndo_poll_controller().
Unfortunately this only works if netpoll isn't itself called with the xmit lock already held. Netpoll code is careful and uses trylock(). The drivers, however, may be using plain lock(). Printing while holding the xmit lock is going to result in rare deadlocks.
Luckily we record the xmit lock owners, so we can scan all the queues, the same way we scan NAPI owners. If any of the xmit locks is held by the local CPU we better not attempt any polling.
It would be nice if we could narrow down the check to only the NAPIs and the queue we're trying to use. I don't see a way to do that now.
Reported-by: Roman Gushchin roman.gushchin@linux.dev Fixes: 0db3dc73f7a3 ("[NETPOLL]: tx lock deadlock fix") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/netpoll.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/net/core/netpoll.c b/net/core/netpoll.c index 960948290001e..2ad22511b9c6d 100644 --- a/net/core/netpoll.c +++ b/net/core/netpoll.c @@ -137,6 +137,20 @@ static void queue_process(struct work_struct *work) } }
+static int netif_local_xmit_active(struct net_device *dev) +{ + int i; + + for (i = 0; i < dev->num_tx_queues; i++) { + struct netdev_queue *txq = netdev_get_tx_queue(dev, i); + + if (READ_ONCE(txq->xmit_lock_owner) == smp_processor_id()) + return 1; + } + + return 0; +} + static void poll_one_napi(struct napi_struct *napi) { int work; @@ -183,7 +197,10 @@ void netpoll_poll_dev(struct net_device *dev) if (!ni || down_trylock(&ni->dev_lock)) return;
- if (!netif_running(dev)) { + /* Some drivers will take the same locks in poll and xmit, + * we can't poll if local CPU is already in xmit. + */ + if (!netif_running(dev) || netif_local_xmit_active(dev)) { up(&ni->dev_lock); return; }
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 2584024b23552c00d95b50255e47bd18d306d31a ]
This patch fixes a corner case where the asoc out stream count may change after wait_for_sndbuf.
When the main thread in the client starts a connection, if its out stream count is set to N while the in stream count in the server is set to N - 2, another thread in the client keeps sending the msgs with stream number N - 1, and waits for sndbuf before processing INIT_ACK.
However, after processing INIT_ACK, the out stream count in the client is shrunk to N - 2, the same to the in stream count in the server. The crash occurs when the thread waiting for sndbuf is awake and sends the msg in a non-existing stream(N - 1), the call trace is as below:
KASAN: null-ptr-deref in range [0x0000000000000038-0x000000000000003f] Call Trace: <TASK> sctp_cmd_send_msg net/sctp/sm_sideeffect.c:1114 [inline] sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1777 [inline] sctp_side_effects net/sctp/sm_sideeffect.c:1199 [inline] sctp_do_sm+0x197d/0x5310 net/sctp/sm_sideeffect.c:1170 sctp_primitive_SEND+0x9f/0xc0 net/sctp/primitive.c:163 sctp_sendmsg_to_asoc+0x10eb/0x1a30 net/sctp/socket.c:1868 sctp_sendmsg+0x8d4/0x1d90 net/sctp/socket.c:2026 inet_sendmsg+0x9d/0xe0 net/ipv4/af_inet.c:825 sock_sendmsg_nosec net/socket.c:722 [inline] sock_sendmsg+0xde/0x190 net/socket.c:745
The fix is to add an unlikely check for the send stream number after the thread wakes up from the wait_for_sndbuf.
Fixes: 5bbbbe32a431 ("sctp: introduce stream scheduler foundations") Reported-by: syzbot+47c24ca20a2fa01f082e@syzkaller.appspotmail.com Signed-off-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/socket.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/sctp/socket.c b/net/sctp/socket.c index e9b4ea3d934fa..3a68d65f7d153 100644 --- a/net/sctp/socket.c +++ b/net/sctp/socket.c @@ -1831,6 +1831,10 @@ static int sctp_sendmsg_to_asoc(struct sctp_association *asoc, err = sctp_wait_for_sndbuf(asoc, &timeo, msg_len); if (err) goto err; + if (unlikely(sinfo->sinfo_stream >= asoc->stream.outcnt)) { + err = -EINVAL; + goto err; + } }
if (sctp_state(asoc, CLOSED)) {
From: Sricharan Ramabadhran quic_srichara@quicinc.com
[ Upstream commit 839349d13905927d8a567ca4d21d88c82028e31d ]
On the remote side, when QRTR socket is removed, af_qrtr will call qrtr_port_remove() which broadcasts the DEL_CLIENT packet to all neighbours including local NS. NS upon receiving the DEL_CLIENT packet, will remove the lookups associated with the node:port and broadcasts the DEL_SERVER packet.
But on the host side, due to the arrival of the DEL_CLIENT packet, the NS would've already deleted the server belonging to that port. So when the remote's NS again broadcasts the DEL_SERVER for that port, it throws below error message on the host:
"failed while handling packet from 2:-2"
So fix this error by not broadcasting the DEL_SERVER packet when the DEL_CLIENT packet gets processed."
Fixes: 0c2204a4ad71 ("net: qrtr: Migrate nameservice to kernel from userspace") Reviewed-by: Manivannan Sadhasivam mani@kernel.org Signed-off-by: Ram Kumar Dharuman quic_ramd@quicinc.com Signed-off-by: Sricharan Ramabadhran quic_srichara@quicinc.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/qrtr/ns.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c index fe81e03851686..713e9940d88bb 100644 --- a/net/qrtr/ns.c +++ b/net/qrtr/ns.c @@ -273,7 +273,7 @@ static struct qrtr_server *server_add(unsigned int service, return NULL; }
-static int server_del(struct qrtr_node *node, unsigned int port) +static int server_del(struct qrtr_node *node, unsigned int port, bool bcast) { struct qrtr_lookup *lookup; struct qrtr_server *srv; @@ -286,7 +286,7 @@ static int server_del(struct qrtr_node *node, unsigned int port) radix_tree_delete(&node->servers, port);
/* Broadcast the removal of local servers */ - if (srv->node == qrtr_ns.local_node) + if (srv->node == qrtr_ns.local_node && bcast) service_announce_del(&qrtr_ns.bcast_sq, srv);
/* Announce the service's disappearance to observers */ @@ -372,7 +372,7 @@ static int ctrl_cmd_bye(struct sockaddr_qrtr *from) } slot = radix_tree_iter_resume(slot, &iter); rcu_read_unlock(); - server_del(node, srv->port); + server_del(node, srv->port, true); rcu_read_lock(); } rcu_read_unlock(); @@ -458,10 +458,13 @@ static int ctrl_cmd_del_client(struct sockaddr_qrtr *from, kfree(lookup); }
- /* Remove the server belonging to this port */ + /* Remove the server belonging to this port but don't broadcast + * DEL_SERVER. Neighbours would've already removed the server belonging + * to this port due to the DEL_CLIENT broadcast from qrtr_port_remove(). + */ node = node_get(node_id); if (node) - server_del(node, port); + server_del(node, port, false);
/* Advertise the removal of this client to all local servers */ local_node = node_get(qrtr_ns.local_node); @@ -574,7 +577,7 @@ static int ctrl_cmd_del_server(struct sockaddr_qrtr *from, if (!node) return -ENOENT;
- return server_del(node, port); + return server_del(node, port, true); }
static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit ea30388baebcce37fd594d425a65037ca35e59e8 ]
Syzbot reported a bug as following:
===================================================== BUG: KMSAN: uninit-value in arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline] BUG: KMSAN: uninit-value in arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline] BUG: KMSAN: uninit-value in atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline] BUG: KMSAN: uninit-value in __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956 arch_atomic64_inc arch/x86/include/asm/atomic64_64.h:88 [inline] arch_atomic_long_inc include/linux/atomic/atomic-long.h:161 [inline] atomic_long_inc include/linux/atomic/atomic-instrumented.h:1429 [inline] __ip6_make_skb+0x2f37/0x30f0 net/ipv6/ip6_output.c:1956 ip6_finish_skb include/net/ipv6.h:1122 [inline] ip6_push_pending_frames+0x10e/0x550 net/ipv6/ip6_output.c:1987 rawv6_push_pending_frames+0xb12/0xb90 net/ipv6/raw.c:579 rawv6_sendmsg+0x297e/0x2e60 net/ipv6/raw.c:922 inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530 __sys_sendmsg net/socket.c:2559 [inline] __do_sys_sendmsg net/socket.c:2568 [inline] __se_sys_sendmsg net/socket.c:2566 [inline] __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at: slab_post_alloc_hook mm/slab.h:766 [inline] slab_alloc_node mm/slub.c:3452 [inline] __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491 __do_kmalloc_node mm/slab_common.c:967 [inline] __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988 kmalloc_reserve net/core/skbuff.c:492 [inline] __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565 alloc_skb include/linux/skbuff.h:1270 [inline] __ip6_append_data+0x51c1/0x6bb0 net/ipv6/ip6_output.c:1684 ip6_append_data+0x411/0x580 net/ipv6/ip6_output.c:1854 rawv6_sendmsg+0x2882/0x2e60 net/ipv6/raw.c:915 inet_sendmsg+0x101/0x180 net/ipv4/af_inet.c:827 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg net/socket.c:734 [inline] ____sys_sendmsg+0xa8e/0xe70 net/socket.c:2476 ___sys_sendmsg+0x2a1/0x3f0 net/socket.c:2530 __sys_sendmsg net/socket.c:2559 [inline] __do_sys_sendmsg net/socket.c:2568 [inline] __se_sys_sendmsg net/socket.c:2566 [inline] __x64_sys_sendmsg+0x367/0x540 net/socket.c:2566 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
It is because icmp6hdr does not in skb linear region under the scenario of SOCK_RAW socket. Access icmp6_hdr(skb)->icmp6_type directly will trigger the uninit variable access bug.
Use a local variable icmp6_type to carry the correct value in different scenarios.
Fixes: 14878f75abd5 ("[IPV6]: Add ICMPMsgStats MIB (RFC 4293) [rev 2]") Reported-by: syzbot+8257f4dcef79de670baf@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=3d605ec1d0a7f2a269a1a6936ac7f2b85975ee9... Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index e427f5040a08e..c62e44224bf84 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1925,8 +1925,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, IP6_UPD_PO_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUT, skb->len); if (proto == IPPROTO_ICMPV6) { struct inet6_dev *idev = ip6_dst_idev(skb_dst(skb)); + u8 icmp6_type;
- ICMP6MSGOUT_INC_STATS(net, idev, icmp6_hdr(skb)->icmp6_type); + if (sk->sk_socket->type == SOCK_RAW && !inet_sk(sk)->hdrincl) + icmp6_type = fl6->fl6_icmp_type; + else + icmp6_type = icmp6_hdr(skb)->icmp6_type; + ICMP6MSGOUT_INC_STATS(net, idev, icmp6_type); ICMP6_INC_STATS(net, idev, ICMP6_MIB_OUTMSGS); }
From: Dhruva Gole d-gole@ti.com
[ Upstream commit 7b75c4703609a3ebaf67271813521bc0281e1ec1 ]
Add the IRQCHIP_SKIP_SET_WAKE flag since there are no special IRQ Wake bits that can be set to enable wakeup IRQ.
Fixes: 3d9edf09d452 ("[ARM] 4457/2: davinci: GPIO support") Signed-off-by: Dhruva Gole d-gole@ti.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-davinci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/gpio-davinci.c b/drivers/gpio/gpio-davinci.c index 6f2138503726a..80597e90de9c6 100644 --- a/drivers/gpio/gpio-davinci.c +++ b/drivers/gpio/gpio-davinci.c @@ -326,7 +326,7 @@ static struct irq_chip gpio_irqchip = { .irq_enable = gpio_irq_enable, .irq_disable = gpio_irq_disable, .irq_set_type = gpio_irq_type, - .flags = IRQCHIP_SET_TYPE_MASKED, + .flags = IRQCHIP_SET_TYPE_MASKED | IRQCHIP_SKIP_SET_WAKE, };
static void gpio_irq_handler(struct irq_desc *desc)
From: Siddharth Vadapalli s-vadapalli@ti.com
[ Upstream commit c6b486fb33680ad5a3a6390ce693c835caaae3f7 ]
In the am65_cpsw_nuss_probe() function's cleanup path, the call to of_platform_device_destroy() for the common->mdio_dev device is invoked unconditionally. It is possible that either the MDIO node is not present in the device-tree, or the MDIO node is disabled in the device-tree. In both these cases, the MDIO device is not created, resulting in a NULL pointer dereference when the of_platform_device_destroy() function is invoked on the common->mdio_dev device on the cleanup path.
Fix this by ensuring that the common->mdio_dev device exists, before attempting to invoke of_platform_device_destroy().
Fixes: a45cfcc69a25 ("net: ethernet: ti: am65-cpsw-nuss: use of_platform_device_create() for mdio") Signed-off-by: Siddharth Vadapalli s-vadapalli@ti.com Reviewed-by: Roger Quadros rogerq@kernel.org Link: https://lore.kernel.org/r/20230403090321.835877-1-s-vadapalli@ti.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/ti/am65-cpsw-nuss.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/ti/am65-cpsw-nuss.c b/drivers/net/ethernet/ti/am65-cpsw-nuss.c index 4074310abcff4..e4af1f506b833 100644 --- a/drivers/net/ethernet/ti/am65-cpsw-nuss.c +++ b/drivers/net/ethernet/ti/am65-cpsw-nuss.c @@ -2153,7 +2153,8 @@ static int am65_cpsw_nuss_probe(struct platform_device *pdev) return 0;
err_of_clear: - of_platform_device_destroy(common->mdio_dev, NULL); + if (common->mdio_dev) + of_platform_device_destroy(common->mdio_dev, NULL); err_pm_clear: pm_runtime_put_sync(dev); pm_runtime_disable(dev); @@ -2179,7 +2180,8 @@ static int am65_cpsw_nuss_remove(struct platform_device *pdev) */ am65_cpsw_nuss_cleanup_ndev(common);
- of_platform_device_destroy(common->mdio_dev, NULL); + if (common->mdio_dev) + of_platform_device_destroy(common->mdio_dev, NULL);
pm_runtime_put_sync(&pdev->dev); pm_runtime_disable(&pdev->dev);
From: Corinna Vinschen vinschen@redhat.com
[ Upstream commit 218c597325f4faf7b7a6049233a30d7842b5b2dc ]
stmmac_reinit_queues() fails to fix up the RX hash. Even if the number of channels gets restricted, the output of `ethtool -x' indicates that all RX queues are used:
$ ethtool -l enp0s29f2 Channel parameters for enp0s29f2: Pre-set maximums: RX: 8 TX: 8 Other: n/a Combined: n/a Current hardware settings: RX: 8 TX: 8 Other: n/a Combined: n/a $ ethtool -x enp0s29f2 RX flow hash indirection table for enp0s29f2 with 8 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 0 1 2 3 4 5 6 7 [...] $ ethtool -L enp0s29f2 rx 3 $ ethtool -x enp0s29f2 RX flow hash indirection table for enp0s29f2 with 3 RX ring(s): 0: 0 1 2 3 4 5 6 7 8: 0 1 2 3 4 5 6 7 [...]
Fix this by setting the indirection table according to the number of specified queues. The result is now as expected:
$ ethtool -L enp0s29f2 rx 3 $ ethtool -x enp0s29f2 RX flow hash indirection table for enp0s29f2 with 3 RX ring(s): 0: 0 1 2 0 1 2 0 1 8: 2 0 1 2 0 1 2 0 [...]
Tested on Intel Elkhart Lake.
Fixes: 0366f7e06a6b ("net: stmmac: add ethtool support for get/set channels") Signed-off-by: Corinna Vinschen vinschen@redhat.com Link: https://lore.kernel.org/r/20230403121120.489138-1-vinschen@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 04c59102a2863..de66406c50572 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4940,7 +4940,7 @@ static void stmmac_napi_del(struct net_device *dev) int stmmac_reinit_queues(struct net_device *dev, u32 rx_cnt, u32 tx_cnt) { struct stmmac_priv *priv = netdev_priv(dev); - int ret = 0; + int ret = 0, i;
if (netif_running(dev)) stmmac_release(dev); @@ -4949,6 +4949,10 @@ int stmmac_reinit_queues(struct net_device *dev, u32 rx_cnt, u32 tx_cnt)
priv->plat->rx_queues_to_use = rx_cnt; priv->plat->tx_queues_to_use = tx_cnt; + if (!netif_is_rxfh_configured(dev)) + for (i = 0; i < ARRAY_SIZE(priv->rss.table); i++) + priv->rss.table[i] = ethtool_rxfh_indir_default(i, + rx_cnt);
stmmac_napi_add(dev);
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 5085e41f9e83a1bec51da1f20b54f2ec3a13a3fe ]
While the unix_gid object is rcu-freed, the group_info list that it contains is not. Ensure that we only put the group list reference once we are really freeing the unix_gid object.
Reported-by: Zhi Li yieli@redhat.com Link: https://bugzilla.redhat.com/show_bug.cgi?id=2183056 Signed-off-by: Jeff Layton jlayton@kernel.org Fixes: fd5d2f78261b ("SUNRPC: Make server side AUTH_UNIX use lockless lookups") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svcauth_unix.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 97c0bddba7a30..60754a292589b 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -424,14 +424,23 @@ static int unix_gid_hash(kuid_t uid) return hash_long(from_kuid(&init_user_ns, uid), GID_HASHBITS); }
-static void unix_gid_put(struct kref *kref) +static void unix_gid_free(struct rcu_head *rcu) { - struct cache_head *item = container_of(kref, struct cache_head, ref); - struct unix_gid *ug = container_of(item, struct unix_gid, h); + struct unix_gid *ug = container_of(rcu, struct unix_gid, rcu); + struct cache_head *item = &ug->h; + if (test_bit(CACHE_VALID, &item->flags) && !test_bit(CACHE_NEGATIVE, &item->flags)) put_group_info(ug->gi); - kfree_rcu(ug, rcu); + kfree(ug); +} + +static void unix_gid_put(struct kref *kref) +{ + struct cache_head *item = container_of(kref, struct cache_head, ref); + struct unix_gid *ug = container_of(item, struct unix_gid, h); + + call_rcu(&ug->rcu, unix_gid_free); }
static int unix_gid_match(struct cache_head *corig, struct cache_head *cnew)
From: Dai Ngo dai.ngo@oracle.com
[ Upstream commit 7de82c2f36fb26aa78440bbf0efcf360b691d98b ]
Currently callback request does not use the credential specified in CREATE_SESSION if the security flavor for the back channel is AUTH_SYS.
Problem was discovered by pynfs 4.1 DELEG5 and DELEG7 test with error: DELEG5 st_delegation.testCBSecParms : FAILURE expected callback with uid, gid == 17, 19, got 0, 0
Signed-off-by: Dai Ngo dai.ngo@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Fixes: 8276c902bbe9 ("SUNRPC: remove uid and gid from struct auth_cred") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfs4callback.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/nfsd/nfs4callback.c b/fs/nfsd/nfs4callback.c index af2064e36ac61..f5b7ad0847f20 100644 --- a/fs/nfsd/nfs4callback.c +++ b/fs/nfsd/nfs4callback.c @@ -875,8 +875,8 @@ static const struct cred *get_backchannel_cred(struct nfs4_client *clp, struct r if (!kcred) return NULL;
- kcred->uid = ses->se_cb_sec.uid; - kcred->gid = ses->se_cb_sec.gid; + kcred->fsuid = ses->se_cb_sec.uid; + kcred->fsgid = ses->se_cb_sec.gid; return kcred; } }
From: Wayne Chang waynec@nvidia.com
commit 4c7f9d2e413dc06a157c4e5dccde84aaf4655eb3 upstream.
When we set the dual-role port to Host mode, we observed the following splat: [ 167.057718] BUG: sleeping function called from invalid context at include/linux/sched/mm.h:229 [ 167.057872] Workqueue: events tegra_xusb_usb_phy_work [ 167.057954] Call trace: [ 167.057962] dump_backtrace+0x0/0x210 [ 167.057996] show_stack+0x30/0x50 [ 167.058020] dump_stack_lvl+0x64/0x84 [ 167.058065] dump_stack+0x14/0x34 [ 167.058100] __might_resched+0x144/0x180 [ 167.058140] __might_sleep+0x64/0xd0 [ 167.058171] slab_pre_alloc_hook.constprop.0+0xa8/0x110 [ 167.058202] __kmalloc_track_caller+0x74/0x2b0 [ 167.058233] kvasprintf+0xa4/0x190 [ 167.058261] kasprintf+0x58/0x90 [ 167.058285] tegra_xusb_find_port_node.isra.0+0x58/0xd0 [ 167.058334] tegra_xusb_find_port+0x38/0xa0 [ 167.058380] tegra_xusb_padctl_get_usb3_companion+0x38/0xd0 [ 167.058430] tegra_xhci_id_notify+0x8c/0x1e0 [ 167.058473] notifier_call_chain+0x88/0x100 [ 167.058506] atomic_notifier_call_chain+0x44/0x70 [ 167.058537] tegra_xusb_usb_phy_work+0x60/0xd0 [ 167.058581] process_one_work+0x1dc/0x4c0 [ 167.058618] worker_thread+0x54/0x410 [ 167.058650] kthread+0x188/0x1b0 [ 167.058672] ret_from_fork+0x10/0x20
The function tegra_xusb_padctl_get_usb3_companion eventually calls tegra_xusb_find_port and this in turn calls kasprintf which might sleep and so cannot be called from an atomic context.
Fix this by moving the call to tegra_xusb_padctl_get_usb3_companion to the tegra_xhci_id_work function where it is really needed.
Fixes: f836e7843036 ("usb: xhci-tegra: Add OTG support") Cc: stable@vger.kernel.org Signed-off-by: Wayne Chang waynec@nvidia.com Signed-off-by: Haotien Hsu haotienh@nvidia.com Link: https://lore.kernel.org/r/20230327095548.1599470-1-haotienh@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-tegra.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/usb/host/xhci-tegra.c +++ b/drivers/usb/host/xhci-tegra.c @@ -1175,6 +1175,9 @@ static void tegra_xhci_id_work(struct wo
mutex_unlock(&tegra->lock);
+ tegra->otg_usb3_port = tegra_xusb_padctl_get_usb3_companion(tegra->padctl, + tegra->otg_usb2_port); + if (tegra->host_mode) { /* switch to host mode */ if (tegra->otg_usb3_port >= 0) { @@ -1243,9 +1246,6 @@ static int tegra_xhci_id_notify(struct n }
tegra->otg_usb2_port = tegra_xusb_get_usb2_port(tegra, usbphy); - tegra->otg_usb3_port = tegra_xusb_padctl_get_usb3_companion( - tegra->padctl, - tegra->otg_usb2_port);
tegra->host_mode = (usbphy->last_event == USB_EVENT_ID) ? true : false;
From: D Scott Phillips scott@os.amperecomputing.com
commit ecaa4902439298f6b0e29f47424a86b310a9ff4f upstream.
Previously the quirk was skipped when no iommu was present. The same rationale for skipping the quirk also applies in the iommu.passthrough=1 case.
Skip applying the XHCI_ZERO_64B_REGS quirk if the device's iommu domain is passthrough.
Fixes: 12de0a35c996 ("xhci: Add quirk to zero 64bit registers on Renesas PCIe controllers") Cc: stable stable@kernel.org Signed-off-by: D Scott Phillips scott@os.amperecomputing.com Acked-by: Marc Zyngier maz@kernel.org Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20230330143056.1390020-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -9,6 +9,7 @@ */
#include <linux/pci.h> +#include <linux/iommu.h> #include <linux/iopoll.h> #include <linux/irq.h> #include <linux/log2.h> @@ -226,6 +227,7 @@ int xhci_reset(struct xhci_hcd *xhci, u6 static void xhci_zero_64b_regs(struct xhci_hcd *xhci) { struct device *dev = xhci_to_hcd(xhci)->self.sysdev; + struct iommu_domain *domain; int err, i; u64 val; u32 intrs; @@ -244,7 +246,9 @@ static void xhci_zero_64b_regs(struct xh * an iommu. Doing anything when there is no iommu is definitely * unsafe... */ - if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !device_iommu_mapped(dev)) + domain = iommu_get_domain_for_dev(dev); + if (!(xhci->quirks & XHCI_ZERO_64B_REGS) || !domain || + domain->type == IOMMU_DOMAIN_IDENTITY) return;
xhci_info(xhci, "Zeroing 64bit base registers, expecting fault\n");
From: Kees Jan Koster kjkoster@kjkoster.org
commit 71f8afa2b66e356f435b6141b4a9ccf953e18356 upstream.
The Silicon Labs IFS-USB-DATACABLE is used in conjunction with for example the Quint UPSes. It is used to enable Modbus communication with the UPS to query configuration, power and battery status.
Signed-off-by: Kees Jan Koster kjkoster@kjkoster.org Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/cp210x.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/serial/cp210x.c +++ b/drivers/usb/serial/cp210x.c @@ -124,6 +124,7 @@ static const struct usb_device_id id_tab { USB_DEVICE(0x10C4, 0x826B) }, /* Cygnal Integrated Products, Inc., Fasttrax GPS demonstration module */ { USB_DEVICE(0x10C4, 0x8281) }, /* Nanotec Plug & Drive */ { USB_DEVICE(0x10C4, 0x8293) }, /* Telegesis ETRX2USB */ + { USB_DEVICE(0x10C4, 0x82AA) }, /* Silicon Labs IFS-USB-DATACABLE used with Quint UPS */ { USB_DEVICE(0x10C4, 0x82EF) }, /* CESINEL FALCO 6105 AC Power Supply */ { USB_DEVICE(0x10C4, 0x82F1) }, /* CESINEL MEDCAL EFD Earth Fault Detector */ { USB_DEVICE(0x10C4, 0x82F2) }, /* CESINEL MEDCAL ST Network Analyzer */
From: RD Babiera rdbabiera@google.com
commit eddebe39602efe631b83ff8d03f26eba12cfd760 upstream.
While determining the initial pin assignment to be sent in the configure message, using the DP_PIN_ASSIGN_DP_ONLY_MASK mask causes the DFP_U to send both Pin Assignment C and E when both are supported by the DFP_U and UFP_U. The spec (Table 5-7 DFP_U Pin Assignment Selection Mandates, VESA DisplayPort Alt Mode Standard v2.0) indicates that the DFP_U never selects Pin Assignment E when Pin Assignment C is offered.
Update the DP_PIN_ASSIGN_DP_ONLY_MASK conditional to intially select only Pin Assignment C if it is available.
Fixes: 0e3bb7d6894d ("usb: typec: Add driver for DisplayPort alternate mode") Cc: stable@vger.kernel.org Signed-off-by: RD Babiera rdbabiera@google.com Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20230329215159.2046932-1-rdbabiera@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/altmodes/displayport.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/usb/typec/altmodes/displayport.c +++ b/drivers/usb/typec/altmodes/displayport.c @@ -101,8 +101,12 @@ static int dp_altmode_configure(struct d if (dp->data.status & DP_STATUS_PREFER_MULTI_FUNC && pin_assign & DP_PIN_ASSIGN_MULTI_FUNC_MASK) pin_assign &= DP_PIN_ASSIGN_MULTI_FUNC_MASK; - else if (pin_assign & DP_PIN_ASSIGN_DP_ONLY_MASK) + else if (pin_assign & DP_PIN_ASSIGN_DP_ONLY_MASK) { pin_assign &= DP_PIN_ASSIGN_DP_ONLY_MASK; + /* Default to pin assign C if available */ + if (pin_assign & BIT(DP_PIN_ASSIGN_C)) + pin_assign = BIT(DP_PIN_ASSIGN_C); + }
if (!pin_assign) return -EINVAL;
From: Enrico Sau enrico.sau@gmail.com
commit 773e8e7d07b753474b2ccd605ff092faaa9e65b9 upstream.
Add the following Telit FE990 compositions:
0x1080: tty, adb, rmnet, tty, tty, tty, tty 0x1081: tty, adb, mbim, tty, tty, tty, tty 0x1082: rndis, tty, adb, tty, tty, tty, tty 0x1083: tty, adb, ecm, tty, tty, tty, tty
Signed-off-by: Enrico Sau enrico.sau@gmail.com Link: https://lore.kernel.org/r/20230314090059.77876-1-enrico.sau@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1300,6 +1300,14 @@ static const struct usb_device_id option .driver_info = NCTRL(0) | RSVD(1) }, { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1075, 0xff), /* Telit FN990 (PCIe) */ .driver_info = RSVD(0) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1080, 0xff), /* Telit FE990 (rmnet) */ + .driver_info = NCTRL(0) | RSVD(1) | RSVD(2) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1081, 0xff), /* Telit FE990 (MBIM) */ + .driver_info = NCTRL(0) | RSVD(1) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1082, 0xff), /* Telit FE990 (RNDIS) */ + .driver_info = NCTRL(2) | RSVD(3) }, + { USB_DEVICE_INTERFACE_CLASS(TELIT_VENDOR_ID, 0x1083, 0xff), /* Telit FE990 (ECM) */ + .driver_info = NCTRL(0) | RSVD(1) }, { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910), .driver_info = NCTRL(0) | RSVD(1) | RSVD(3) }, { USB_DEVICE(TELIT_VENDOR_ID, TELIT_PRODUCT_ME910_DUAL_MODEM),
From: Bjørn Mork bjorn@mork.no
commit 7708a3858e69db91a8b69487994f33b96d20192a upstream.
This modem supports several modes with a class network function and a number of serial functions, all using ff/00/00
The device ID is the same in all modes.
RNDIS mode ---------- T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2c7c ProdID=0900 Rev= 4.04 S: Manufacturer=Quectel S: Product=RM500U-CN S: SerialNumber=0123456789ABCDEF C:* #Ifs= 7 Cfg#= 1 Atr=c0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=e0(wlcon) Sub=01 Prot=03 I:* If#= 0 Alt= 0 #EPs= 1 Cls=e0(wlcon) Sub=01 Prot=03 Driver=rndis_host E: Ad=82(I) Atr=03(Int.) MxPS= 8 Ivl=32ms I:* If#= 1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=rndis_host E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
ECM mode -------- T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 2 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2c7c ProdID=0900 Rev= 4.04 S: Manufacturer=Quectel S: Product=RM500U-CN S: SerialNumber=0123456789ABCDEF C:* #Ifs= 7 Cfg#= 1 Atr=c0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=06 Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=06 Prot=00 Driver=cdc_ether E: Ad=82(I) Atr=03(Int.) MxPS= 16 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
NCM mode -------- T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 5 Spd=480 MxCh= 0 D: Ver= 2.10 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=2c7c ProdID=0900 Rev= 4.04 S: Manufacturer=Quectel S: Product=RM500U-CN S: SerialNumber=0123456789ABCDEF C:* #Ifs= 7 Cfg#= 1 Atr=c0 MxPwr=500mA A: FirstIf#= 0 IfCount= 2 Cls=02(comm.) Sub=0d Prot=00 I:* If#= 0 Alt= 0 #EPs= 1 Cls=02(comm.) Sub=0d Prot=00 Driver=cdc_ncm E: Ad=82(I) Atr=03(Int.) MxPS= 16 Ivl=32ms I: If#= 1 Alt= 0 #EPs= 0 Cls=0a(data ) Sub=00 Prot=01 Driver=cdc_ncm I:* If#= 1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=01 Driver=cdc_ncm E: Ad=81(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=01(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=83(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=02(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=84(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=03(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=85(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=04(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=86(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=05(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms I:* If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=00 Prot=00 Driver=option E: Ad=87(I) Atr=02(Bulk) MxPS= 512 Ivl=0ms E: Ad=06(O) Atr=02(Bulk) MxPS= 512 Ivl=0ms
Reported-by: Andrew Green askgreen@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Bjørn Mork bjorn@mork.no Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/serial/option.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/usb/serial/option.c +++ b/drivers/usb/serial/option.c @@ -1198,6 +1198,8 @@ static const struct usb_device_id option { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM520N, 0xff, 0xff, 0x30) }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM520N, 0xff, 0, 0x40) }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_RM520N, 0xff, 0, 0) }, + { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, 0x0900, 0xff, 0, 0), /* RM500U-CN */ + .driver_info = ZLP }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200U, 0xff, 0, 0) }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200S_CN, 0xff, 0, 0) }, { USB_DEVICE_AND_INTERFACE_INFO(QUECTEL_VENDOR_ID, QUECTEL_PRODUCT_EC200T, 0xff, 0, 0) },
From: Lars-Peter Clausen lars@metafoo.de
commit 363c7dc72f79edd55bf1c4380e0fbf7f1bbc2c86 upstream.
The ads7950 uses a mutex as well as SPI transfers in its GPIO callbacks. This means these callbacks can sleep and the `can_sleep` flag should be set.
Having the flag set will make sure that warnings are generated when calling any of the callbacks from a potentially non-sleeping context.
Fixes: c97dce792dc8 ("iio: adc: ti-ads7950: add GPIO support") Signed-off-by: Lars-Peter Clausen lars@metafoo.de Acked-by: David Lechner david@lechnology.com Link: https://lore.kernel.org/r/20230312210933.2275376-1-lars@metafoo.de Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/ti-ads7950.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/iio/adc/ti-ads7950.c +++ b/drivers/iio/adc/ti-ads7950.c @@ -634,6 +634,7 @@ static int ti_ads7950_probe(struct spi_d st->chip.label = dev_name(&st->spi->dev); st->chip.parent = &st->spi->dev; st->chip.owner = THIS_MODULE; + st->chip.can_sleep = true; st->chip.base = -1; st->chip.ngpio = TI_ADS7950_NUM_GPIOS; st->chip.get_direction = ti_ads7950_get_direction;
From: William Breathitt Gray william.gray@linaro.org
commit c3701185ee1973845db088d8b0fc443397ab0eb2 upstream.
The CIO-DAC series of devices only supports DAC values up to 12-bit rather than 16-bit. Trying to write a 16-bit value results in only the lower 12 bits affecting the DAC output which is not what the user expects. Instead, adjust the DAC write value check to reject values larger than 12-bit so that they fail explicitly as invalid for the user.
Fixes: 3b8df5fd526e ("iio: Add IIO support for the Measurement Computing CIO-DAC family") Cc: stable@vger.kernel.org Signed-off-by: William Breathitt Gray william.gray@linaro.org Link: https://lore.kernel.org/r/20230311002248.8548-1-william.gray@linaro.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/dac/cio-dac.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/iio/dac/cio-dac.c +++ b/drivers/iio/dac/cio-dac.c @@ -66,8 +66,8 @@ static int cio_dac_write_raw(struct iio_ if (mask != IIO_CHAN_INFO_RAW) return -EINVAL;
- /* DAC can only accept up to a 16-bit value */ - if ((unsigned int)val > 65535) + /* DAC can only accept up to a 12-bit value */ + if ((unsigned int)val > 4095) return -EINVAL;
priv->chan_out_states[chan->channel] = val;
From: Kai-Heng Feng kai.heng.feng@canonical.com
commit 099cc90a5a62e68b2fe3a42da011ab929b98bf73 upstream.
If a second dummy client that talks to the actual I2C address was created in probe(), there should be a proper cleanup on driver and device removal to avoid leakage.
So unregister the dummy client via another callback.
Reviewed-by: Hans de Goede hdegoede@redhat.com Suggested-by: Hans de Goede hdegoede@redhat.com Fixes: c1e62062ff54 ("iio: light: cm32181: Handle CM3218 ACPI devices with 2 I2C resources") Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2152281 Signed-off-by: Kai-Heng Feng kai.heng.feng@canonical.com Link: https://lore.kernel.org/r/20230223020059.2013993-1-kai.heng.feng@canonical.c... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/light/cm32181.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
--- a/drivers/iio/light/cm32181.c +++ b/drivers/iio/light/cm32181.c @@ -429,6 +429,14 @@ static const struct iio_info cm32181_inf .attrs = &cm32181_attribute_group, };
+static void cm32181_unregister_dummy_client(void *data) +{ + struct i2c_client *client = data; + + /* Unregister the dummy client */ + i2c_unregister_device(client); +} + static int cm32181_probe(struct i2c_client *client) { struct device *dev = &client->dev; @@ -458,6 +466,10 @@ static int cm32181_probe(struct i2c_clie client = i2c_acpi_new_device(dev, 1, &board_info); if (IS_ERR(client)) return PTR_ERR(client); + + ret = devm_add_action_or_reset(dev, cm32181_unregister_dummy_client, client); + if (ret) + return ret; }
cm32181 = iio_priv(indio_dev);
From: Biju Das biju.das.jz@bp.renesas.com
commit b43a18647f03c87e77d50d6fe74904b61b96323e upstream.
The fourth interrupt on SCI port is transmit end interrupt compared to the break interrupt on other port types. So, shuffle the interrupts to fix the transmit end interrupt handler.
Fixes: e1d0be616186 ("sh-sci: Add h8300 SCI") Cc: stable stable@kernel.org Suggested-by: Geert Uytterhoeven geert+renesas@glider.be Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Link: https://lore.kernel.org/r/20230317150403.154094-1-biju.das.jz@bp.renesas.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sh-sci.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -31,6 +31,7 @@ #include <linux/ioport.h> #include <linux/ktime.h> #include <linux/major.h> +#include <linux/minmax.h> #include <linux/module.h> #include <linux/mm.h> #include <linux/of.h> @@ -2923,6 +2924,13 @@ static int sci_init_single(struct platfo sci_port->irqs[i] = platform_get_irq(dev, i); }
+ /* + * The fourth interrupt on SCI port is transmit end interrupt, so + * shuffle the interrupts. + */ + if (p->type == PORT_SCI) + swap(sci_port->irqs[SCIx_BRI_IRQ], sci_port->irqs[SCIx_TEI_IRQ]); + /* The SCI generates several interrupts. They can be muxed together or * connected to different interrupt lines. In the muxed case only one * interrupt resource is specified as there is only one interrupt ID.
From: Biju Das biju.das.jz@bp.renesas.com
commit f92ed0cd9328aed918ebb0ebb64d259eccbcc6e7 upstream.
SCI IP on RZ/G2L alike SoCs do not need regshift compared to other SCI IPs on the SH platform. Currently, it does regshift and configuring Rx wrongly. Drop adding regshift for RZ/G2L alike SoCs.
Fixes: dfc80387aefb ("serial: sh-sci: Compute the regshift value for SCI ports") Cc: stable@vger.kernel.org Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Link: https://lore.kernel.org/r/20230321114753.75038-3-biju.das.jz@bp.renesas.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sh-sci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/sh-sci.c +++ b/drivers/tty/serial/sh-sci.c @@ -2996,7 +2996,7 @@ static int sci_init_single(struct platfo port->flags = UPF_FIXED_PORT | UPF_BOOT_AUTOCONF | p->flags; port->fifosize = sci_port->params->fifosize;
- if (port->type == PORT_SCI) { + if (port->type == PORT_SCI && !dev->dev.of_node) { if (sci_port->reg_size >= 0x20) port->regshift = 2; else
From: Sherry Sun sherry.sun@nxp.com
commit 9425914f3de6febbd6250395f56c8279676d9c3c upstream.
According to LPUART RM, Transmission Complete Flag becomes 0 if queuing a break character by writing 1 to CTRL[SBK], so here need to avoid checking for transmission complete when UARTCTRL_SBK is asserted, otherwise the lpuart32_tx_empty may never get TIOCSER_TEMT.
Commit 2411fd94ceaa("tty: serial: fsl_lpuart: skip waiting for transmission complete when UARTCTRL_SBK is asserted") only fix it in lpuart32_set_termios(), here also fix it in lpuart32_tx_empty().
Fixes: 380c966c093e ("tty: serial: fsl_lpuart: add 32-bit register interface support") Cc: stable stable@kernel.org Signed-off-by: Sherry Sun sherry.sun@nxp.com Link: https://lore.kernel.org/r/20230323054415.20363-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/fsl_lpuart.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/tty/serial/fsl_lpuart.c +++ b/drivers/tty/serial/fsl_lpuart.c @@ -809,11 +809,17 @@ static unsigned int lpuart32_tx_empty(st struct lpuart_port, port); unsigned long stat = lpuart32_read(port, UARTSTAT); unsigned long sfifo = lpuart32_read(port, UARTFIFO); + unsigned long ctrl = lpuart32_read(port, UARTCTRL);
if (sport->dma_tx_in_progress) return 0;
- if (stat & UARTSTAT_TC && sfifo & UARTFIFO_TXEMPT) + /* + * LPUART Transmission Complete Flag may never be set while queuing a break + * character, so avoid checking for transmission complete when UARTCTRL_SBK + * is asserted. + */ + if ((stat & UARTSTAT_TC && sfifo & UARTFIFO_TXEMPT) || ctrl & UARTCTRL_SBK) return TIOCSER_TEMT;
return 0;
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 6be49d100c22ffea3287a4b19d7639d259888e33 upstream.
The finalization of nilfs_segctor_thread() can race with nilfs_segctor_kill_thread() which terminates that thread, potentially causing a use-after-free BUG as KASAN detected.
At the end of nilfs_segctor_thread(), it assigns NULL to "sc_task" member of "struct nilfs_sc_info" to indicate the thread has finished, and then notifies nilfs_segctor_kill_thread() of this using waitqueue "sc_wait_task" on the struct nilfs_sc_info.
However, here, immediately after the NULL assignment to "sc_task", it is possible that nilfs_segctor_kill_thread() will detect it and return to continue the deallocation, freeing the nilfs_sc_info structure before the thread does the notification.
This fixes the issue by protecting the NULL assignment to "sc_task" and its notification, with spinlock "sc_state_lock" of the struct nilfs_sc_info. Since nilfs_segctor_kill_thread() does a final check to see if "sc_task" is NULL with "sc_state_lock" locked, this can eliminate the race.
Link: https://lkml.kernel.org/r/20230327175318.8060-1-konishi.ryusuke@gmail.com Reported-by: syzbot+b08ebcc22f8f3e6be43a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/00000000000000660d05f7dfa877@google.com Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/segment.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/nilfs2/segment.c +++ b/fs/nilfs2/segment.c @@ -2614,11 +2614,10 @@ static int nilfs_segctor_thread(void *ar goto loop;
end_thread: - spin_unlock(&sci->sc_state_lock); - /* end sync. */ sci->sc_task = NULL; wake_up(&sci->sc_wait_task); /* for nilfs_segctor_kill_thread() */ + spin_unlock(&sci->sc_state_lock); return 0; }
From: Ryusuke Konishi konishi.ryusuke@gmail.com
commit 42560f9c92cc43dce75dbf06cc0d840dced39b12 upstream.
The current nilfs2 sysfs support has issues with the timing of creation and deletion of sysfs entries, potentially leading to null pointer dereferences, use-after-free, and lockdep warnings.
Some of the sysfs attributes for nilfs2 per-filesystem instance refer to metadata file "cpfile", "sufile", or "dat", but nilfs_sysfs_create_device_group that creates those attributes is executed before the inodes for these metadata files are loaded, and nilfs_sysfs_delete_device_group which deletes these sysfs entries is called after releasing their metadata file inodes.
Therefore, access to some of these sysfs attributes may occur outside of the lifetime of these metadata files, resulting in inode NULL pointer dereferences or use-after-free.
In addition, the call to nilfs_sysfs_create_device_group() is made during the locking period of the semaphore "ns_sem" of nilfs object, so the shrinker call caused by the memory allocation for the sysfs entries, may derive lock dependencies "ns_sem" -> (shrinker) -> "locks acquired in nilfs_evict_inode()".
Since nilfs2 may acquire "ns_sem" deep in the call stack holding other locks via its error handler __nilfs_error(), this causes lockdep to report circular locking. This is a false positive and no circular locking actually occurs as no inodes exist yet when nilfs_sysfs_create_device_group() is called. Fortunately, the lockdep warnings can be resolved by simply moving the call to nilfs_sysfs_create_device_group() out of "ns_sem".
This fixes these sysfs issues by revising where the device's sysfs interface is created/deleted and keeping its lifetime within the lifetime of the metadata files above.
Link: https://lkml.kernel.org/r/20230330205515.6167-1-konishi.ryusuke@gmail.com Fixes: dd70edbde262 ("nilfs2: integrate sysfs support into driver") Signed-off-by: Ryusuke Konishi konishi.ryusuke@gmail.com Reported-by: syzbot+979fa7f9c0d086fdc282@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000003414b505f7885f7e@google.com Reported-by: syzbot+5b7d542076d9bddc3c6a@syzkaller.appspotmail.com Link: https://lkml.kernel.org/r/0000000000006ac86605f5f44eb9@google.com Cc: Viacheslav Dubeyko slava@dubeyko.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nilfs2/super.c | 2 ++ fs/nilfs2/the_nilfs.c | 12 +++++++----- 2 files changed, 9 insertions(+), 5 deletions(-)
--- a/fs/nilfs2/super.c +++ b/fs/nilfs2/super.c @@ -482,6 +482,7 @@ static void nilfs_put_super(struct super up_write(&nilfs->ns_sem); }
+ nilfs_sysfs_delete_device_group(nilfs); iput(nilfs->ns_sufile); iput(nilfs->ns_cpfile); iput(nilfs->ns_dat); @@ -1105,6 +1106,7 @@ nilfs_fill_super(struct super_block *sb, nilfs_put_root(fsroot);
failed_unload: + nilfs_sysfs_delete_device_group(nilfs); iput(nilfs->ns_sufile); iput(nilfs->ns_cpfile); iput(nilfs->ns_dat); --- a/fs/nilfs2/the_nilfs.c +++ b/fs/nilfs2/the_nilfs.c @@ -87,7 +87,6 @@ void destroy_nilfs(struct the_nilfs *nil { might_sleep(); if (nilfs_init(nilfs)) { - nilfs_sysfs_delete_device_group(nilfs); brelse(nilfs->ns_sbh[0]); brelse(nilfs->ns_sbh[1]); } @@ -305,6 +304,10 @@ int load_nilfs(struct the_nilfs *nilfs, goto failed; }
+ err = nilfs_sysfs_create_device_group(sb); + if (unlikely(err)) + goto sysfs_error; + if (valid_fs) goto skip_recovery;
@@ -366,6 +369,9 @@ int load_nilfs(struct the_nilfs *nilfs, goto failed;
failed_unload: + nilfs_sysfs_delete_device_group(nilfs); + + sysfs_error: iput(nilfs->ns_cpfile); iput(nilfs->ns_sufile); iput(nilfs->ns_dat); @@ -697,10 +703,6 @@ int init_nilfs(struct the_nilfs *nilfs, if (err) goto failed_sbh;
- err = nilfs_sysfs_create_device_group(sb); - if (err) - goto failed_sbh; - set_nilfs_init(nilfs); err = 0; out:
From: Geert Uytterhoeven geert+renesas@glider.be
commit 7b21f329ae0ab6361c0aebfc094db95821490cd1 upstream.
The fourth interrupt on SCIF variants with four interrupts (RZ/A1) is the Break interrupt, not the Transmit End interrupt (like on SCI(g)). Update the description and interrupt name to fix this.
Fixes: 384d00fae8e51f8f ("dt-bindings: serial: sh-sci: Convert to json-schema") Cc: stable stable@kernel.org Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/719d1582e0ebbe3d674e3a48fc26295e1475a4c3.167904639... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/serial/renesas,scif.yaml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/Documentation/devicetree/bindings/serial/renesas,scif.yaml +++ b/Documentation/devicetree/bindings/serial/renesas,scif.yaml @@ -74,7 +74,7 @@ properties: - description: Error interrupt - description: Receive buffer full interrupt - description: Transmit buffer empty interrupt - - description: Transmit End interrupt + - description: Break interrupt - items: - description: Error interrupt - description: Receive buffer full interrupt @@ -89,7 +89,7 @@ properties: - const: eri - const: rxi - const: txi - - const: tei + - const: bri - items: - const: eri - const: rxi
From: Jeremy Soller jeremy@system76.com
commit 36d4d213c6d4fffae2645a601e8ae996de4c3645 upstream.
Fixes speaker output and headset detection on Clevo X370SNW.
Signed-off-by: Jeremy Soller jeremy@system76.com Signed-off-by: Tim Crawford tcrawford@system76.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230331162317.14992-1-tcrawford@system76.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+)
--- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -2632,6 +2632,7 @@ static const struct snd_pci_quirk alc882 SND_PCI_QUIRK(0x1462, 0xda57, "MSI Z270-Gaming", ALC1220_FIXUP_GB_DUAL_CODECS), SND_PCI_QUIRK_VENDOR(0x1462, "MSI", ALC882_FIXUP_GPIO3), SND_PCI_QUIRK(0x147b, 0x107a, "Abit AW9D-MAX", ALC882_FIXUP_ABIT_AW9D_MAX), + SND_PCI_QUIRK(0x1558, 0x3702, "Clevo X370SN[VW]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x50d3, "Clevo PC50[ER][CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65d1, "Clevo PB51[ER][CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS), SND_PCI_QUIRK(0x1558, 0x65d2, "Clevo PB51R[CDF]", ALC1220_FIXUP_CLEVO_PB51ED_PINS),
From: Nuno Sá nuno.sa@analog.com
[ Upstream commit 0c6ef985a1fd8a74dcb5cad941ddcadd55cb8697 ]
The interrupt is triggered on the falling edge rather than being a level low interrupt.
Fixes: da4d3d6bb9f6 ("iio: adc: ad-sigma-delta: Allow custom IRQ flags") Signed-off-by: Nuno Sá nuno.sa@analog.com Link: https://lore.kernel.org/r/20230120124645.819910-1-nuno.sa@analog.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad7791.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ad7791.c b/drivers/iio/adc/ad7791.c index d57ad966e17c1..f3502f12653b3 100644 --- a/drivers/iio/adc/ad7791.c +++ b/drivers/iio/adc/ad7791.c @@ -253,7 +253,7 @@ static const struct ad_sigma_delta_info ad7791_sigma_delta_info = { .has_registers = true, .addr_shift = 4, .read_mask = BIT(3), - .irq_flags = IRQF_TRIGGER_LOW, + .irq_flags = IRQF_TRIGGER_FALLING, };
static int ad7791_read_raw(struct iio_dev *indio_dev,
From: Zhong Jinghua zhongjinghua@huawei.com
[ Upstream commit 48b19b79cfa37b1e50da3b5a8af529f994c08901 ]
The validity of sock should be checked before assignment to avoid incorrect values. Commit 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") introduced this change which may lead to inconsistent values of tcp_sw_conn->sendpage and conn->datadgst_en.
Fix the issue by moving the position of the assignment.
Fixes: 57569c37f0ad ("scsi: iscsi: iscsi_tcp: Fix null-ptr-deref while calling getpeername()") Signed-off-by: Zhong Jinghua zhongjinghua@huawei.com Link: https://lore.kernel.org/r/20230329071739.2175268-1-zhongjinghua@huaweicloud.... Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/iscsi_tcp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/scsi/iscsi_tcp.c b/drivers/scsi/iscsi_tcp.c index 252d7881f99c2..def9fac7aa4f4 100644 --- a/drivers/scsi/iscsi_tcp.c +++ b/drivers/scsi/iscsi_tcp.c @@ -721,13 +721,12 @@ static int iscsi_sw_tcp_conn_set_param(struct iscsi_cls_conn *cls_conn, iscsi_set_param(cls_conn, param, buf, buflen); break; case ISCSI_PARAM_DATADGST_EN: - iscsi_set_param(cls_conn, param, buf, buflen); - mutex_lock(&tcp_sw_conn->sock_lock); if (!tcp_sw_conn->sock) { mutex_unlock(&tcp_sw_conn->sock_lock); return -ENOTCONN; } + iscsi_set_param(cls_conn, param, buf, buflen); tcp_sw_conn->sendpage = conn->datadgst_en ? sock_no_sendpage : tcp_sw_conn->sock->ops->sendpage; mutex_unlock(&tcp_sw_conn->sock_lock);
From: Kan Liang kan.liang@linux.intel.com
[ Upstream commit 24d3ae2f37d8bc3c14b31d353c5d27baf582b6a6 ]
The same task check in perf_event_set_output has some potential issues for some usages.
For the current perf code, there is a problem if using of perf_event_open() to have multiple samples getting into the same mmap’d memory when they are both attached to the same process. https://lore.kernel.org/all/92645262-D319-4068-9C44-2409EF44888E@gmail.com/ Because the event->ctx is not ready when the perf_event_set_output() is invoked in the perf_event_open().
Besides the above issue, before the commit bd2756811766 ("perf: Rewrite core context handling"), perf record can errors out when sampling with a hardware event and a software event as below. $ perf record -e cycles,dummy --per-thread ls failed to mmap with 22 (Invalid argument) That's because that prior to the commit a hardware event and a software event are from different task context.
The problem should be a long time issue since commit c3f00c70276d ("perk: Separate find_get_context() from event initialization").
The task struct is stored in the event->hw.target for each per-thread event. It is a more reliable way to determine whether two events are attached to the same task.
The event->hw.target was also introduced several years ago by the commit 50f16a8bf9d7 ("perf: Remove type specific target pointers"). It can not only be used to fix the issue with the current code, but also back port to fix the issues with an older kernel.
Note: The event->hw.target was introduced later than commit c3f00c70276d. The patch may cannot be applied between the commit c3f00c70276d and commit 50f16a8bf9d7. Anybody that wants to back-port this at that period may have to find other solutions.
Fixes: c3f00c70276d ("perf: Separate find_get_context() from event initialization") Signed-off-by: Kan Liang kan.liang@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Zhengjun Xing zhengjun.xing@linux.intel.com Link: https://lkml.kernel.org/r/20230322202449.512091-1-kan.liang@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/events/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index e2e1371fbb9d3..f0df3bc0e6415 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -11622,7 +11622,7 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event) /* * If its not a per-cpu rb, it must be the same task. */ - if (output_event->cpu == -1 && output_event->ctx != event->ctx) + if (output_event->cpu == -1 && output_event->hw.target != event->hw.target) goto out;
/*
From: John Keeping john@metanate.com
commit ea65b41807a26495ff2a73dd8b1bab2751940887 upstream.
If the compiler decides not to inline this function then preemption tracing will always show an IP inside the preemption disabling path and never the function actually calling preempt_{enable,disable}.
Link: https://lore.kernel.org/linux-trace-kernel/20230327173647.1690849-1-john@met...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: stable@vger.kernel.org Fixes: f904f58263e1d ("sched/debug: Fix preempt_disable_ip recording for preempt_disable()") Signed-off-by: John Keeping john@metanate.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/ftrace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -811,7 +811,7 @@ static inline void __ftrace_enabled_rest #define CALLER_ADDR5 ((unsigned long)ftrace_return_address(5)) #define CALLER_ADDR6 ((unsigned long)ftrace_return_address(6))
-static inline unsigned long get_lock_parent_ip(void) +static __always_inline unsigned long get_lock_parent_ip(void) { unsigned long addr = CALLER_ADDR0;
From: Zheng Yejian zhengyejian1@huawei.com
commit 2a2d8c51defb446e8d89a83f42f8e5cd529111e9 upstream.
Syzkaller report a WARNING: "WARN_ON(!direct)" in modify_ftrace_direct().
Root cause is 'direct->addr' was changed from 'old_addr' to 'new_addr' but not restored if error happened on calling ftrace_modify_direct_caller(). Then it can no longer find 'direct' by that 'old_addr'.
To fix it, restore 'direct->addr' to 'old_addr' explicitly in error path.
Link: https://lore.kernel.org/linux-trace-kernel/20230330025223.1046087-1-zhengyej...
Cc: stable@vger.kernel.org Cc: mhiramat@kernel.org Cc: mark.rutland@arm.com Cc: ast@kernel.org Cc: daniel@iogearbox.net Fixes: 8a141dd7f706 ("ftrace: Fix modify_ftrace_direct.") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ftrace.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -5390,12 +5390,15 @@ int modify_ftrace_direct(unsigned long i ret = 0; }
- if (unlikely(ret && new_direct)) { - direct->count++; - list_del_rcu(&new_direct->next); - synchronize_rcu_tasks(); - kfree(new_direct); - ftrace_direct_func_count--; + if (ret) { + direct->addr = old_addr; + if (unlikely(new_direct)) { + direct->count++; + list_del_rcu(&new_direct->next); + synchronize_rcu_tasks(); + kfree(new_direct); + ftrace_direct_func_count--; + } }
out_unlock:
From: Oleksij Rempel o.rempel@pengutronix.de
commit b45193cb4df556fe6251b285a5ce44046dd36b4a upstream.
In the j1939_tp_tx_dat_new() function, an out-of-bounds memory access could occur during the memcpy() operation if the size of skb->cb is larger than the size of struct j1939_sk_buff_cb. This is because the memcpy() operation uses the size of skb->cb, leading to a read beyond the struct j1939_sk_buff_cb.
Updated the memcpy() operation to use the size of struct j1939_sk_buff_cb instead of the size of skb->cb. This ensures that the memcpy() operation only reads the memory within the bounds of struct j1939_sk_buff_cb, preventing out-of-bounds memory access.
Additionally, add a BUILD_BUG_ON() to check that the size of skb->cb is greater than or equal to the size of struct j1939_sk_buff_cb. This ensures that the skb->cb buffer is large enough to hold the j1939_sk_buff_cb structure.
Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Reported-by: Shuangpeng Bai sjb7183@psu.edu Tested-by: Shuangpeng Bai sjb7183@psu.edu Signed-off-by: Oleksij Rempel o.rempel@pengutronix.de Link: https://groups.google.com/g/syzkaller/c/G_LL-C3plRs/m/-8xCi6dCAgAJ Link: https://lore.kernel.org/all/20230404073128.3173900-1-o.rempel@pengutronix.de Cc: stable@vger.kernel.org [mkl: rephrase commit message] Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/j1939/transport.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/can/j1939/transport.c +++ b/net/can/j1939/transport.c @@ -600,7 +600,10 @@ sk_buff *j1939_tp_tx_dat_new(struct j193 /* reserve CAN header */ skb_reserve(skb, offsetof(struct can_frame, data));
- memcpy(skb->cb, re_skcb, sizeof(skb->cb)); + /* skb->cb must be large enough to hold a j1939_sk_buff_cb structure */ + BUILD_BUG_ON(sizeof(skb->cb) < sizeof(*re_skcb)); + + memcpy(skb->cb, re_skcb, sizeof(*re_skcb)); skcb = j1939_skb_to_cb(skb); if (swap_src_dst) j1939_skbcb_swap(skcb);
From: Michal Sojka michal.sojka@cvut.cz
commit 79e19fa79cb5d5f1b3bf3e3ae24989ccb93c7b7b upstream.
When using select()/poll()/epoll() with a non-blocking ISOTP socket to wait for when non-blocking write is possible, a false EPOLLOUT event is sometimes returned. This can happen at least after sending a message which must be split to multiple CAN frames.
The reason is that isotp_sendmsg() returns -EAGAIN when tx.state is not equal to ISOTP_IDLE and this behavior is not reflected in datagram_poll(), which is used in isotp_ops.
This is fixed by introducing ISOTP-specific poll function, which suppresses the EPOLLOUT events in that case.
v2: https://lore.kernel.org/all/20230302092812.320643-1-michal.sojka@cvut.cz v1: https://lore.kernel.org/all/20230224010659.48420-1-michal.sojka@cvut.cz https://lore.kernel.org/all/b53a04a2-ba1f-3858-84c1-d3eb3301ae15@hartkopp.ne...
Signed-off-by: Michal Sojka michal.sojka@cvut.cz Reported-by: Jakub Jira jirajak2@fel.cvut.cz Tested-by: Oliver Hartkopp socketcan@hartkopp.net Acked-by: Oliver Hartkopp socketcan@hartkopp.net Fixes: e057dd3fc20f ("can: add ISO 15765-2:2016 transport protocol") Link: https://lore.kernel.org/all/20230331125511.372783-1-michal.sojka@cvut.cz Cc: stable@vger.kernel.org Signed-off-by: Marc Kleine-Budde mkl@pengutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/can/isotp.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
--- a/net/can/isotp.c +++ b/net/can/isotp.c @@ -1480,6 +1480,21 @@ static int isotp_init(struct sock *sk) return 0; }
+static __poll_t isotp_poll(struct file *file, struct socket *sock, poll_table *wait) +{ + struct sock *sk = sock->sk; + struct isotp_sock *so = isotp_sk(sk); + + __poll_t mask = datagram_poll(file, sock, wait); + poll_wait(file, &so->wait, wait); + + /* Check for false positives due to TX state */ + if ((mask & EPOLLWRNORM) && (so->tx.state != ISOTP_IDLE)) + mask &= ~(EPOLLOUT | EPOLLWRNORM); + + return mask; +} + static int isotp_sock_no_ioctlcmd(struct socket *sock, unsigned int cmd, unsigned long arg) { @@ -1495,7 +1510,7 @@ static const struct proto_ops isotp_ops .socketpair = sock_no_socketpair, .accept = sock_no_accept, .getname = isotp_getname, - .poll = datagram_poll, + .poll = isotp_poll, .ioctl = isotp_sock_no_ioctlcmd, .gettstamp = sock_gettstamp, .listen = sock_no_listen,
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 3357c6e429643231e60447b52ffbb7ac895aca22 upstream.
When a tracing instance is removed, the error messages that hold errors that occurred in the instance needs to be freed. The following reports a memory leak:
# cd /sys/kernel/tracing # mkdir instances/foo # echo 'hist:keys=x' > instances/foo/events/sched/sched_switch/trigger # cat instances/foo/error_log [ 117.404795] hist:sched:sched_switch: error: Couldn't find field Command: hist:keys=x ^ # rmdir instances/foo
Then check for memory leaks:
# echo scan > /sys/kernel/debug/kmemleak # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88810d8ec700 (size 192): comm "bash", pid 869, jiffies 4294950577 (age 215.752s) hex dump (first 32 bytes): 60 dd 68 61 81 88 ff ff 60 dd 68 61 81 88 ff ff `.ha....`.ha.... a0 30 8c 83 ff ff ff ff 26 00 0a 00 00 00 00 00 .0......&....... backtrace: [<00000000dae26536>] kmalloc_trace+0x2a/0xa0 [<00000000b2938940>] tracing_log_err+0x277/0x2e0 [<000000004a0e1b07>] parse_atom+0x966/0xb40 [<0000000023b24337>] parse_expr+0x5f3/0xdb0 [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560 [<00000000293a9645>] trigger_process_regex+0x135/0x1a0 [<000000005c22b4f2>] event_trigger_write+0x87/0xf0 [<000000002cadc509>] vfs_write+0x162/0x670 [<0000000059c3b9be>] ksys_write+0xca/0x170 [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0 [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc unreferenced object 0xffff888170c35a00 (size 32): comm "bash", pid 869, jiffies 4294950577 (age 215.752s) hex dump (first 32 bytes): 0a 20 20 43 6f 6d 6d 61 6e 64 3a 20 68 69 73 74 . Command: hist 3a 6b 65 79 73 3d 78 0a 00 00 00 00 00 00 00 00 :keys=x......... backtrace: [<000000006a747de5>] __kmalloc+0x4d/0x160 [<000000000039df5f>] tracing_log_err+0x29b/0x2e0 [<000000004a0e1b07>] parse_atom+0x966/0xb40 [<0000000023b24337>] parse_expr+0x5f3/0xdb0 [<00000000594ad074>] event_hist_trigger_parse+0x27f8/0x3560 [<00000000293a9645>] trigger_process_regex+0x135/0x1a0 [<000000005c22b4f2>] event_trigger_write+0x87/0xf0 [<000000002cadc509>] vfs_write+0x162/0x670 [<0000000059c3b9be>] ksys_write+0xca/0x170 [<00000000f1cddc00>] do_syscall_64+0x3e/0xc0 [<00000000868ac68c>] entry_SYSCALL_64_after_hwframe+0x72/0xdc
The problem is that the error log needs to be freed when the instance is removed.
Link: https://lore.kernel.org/lkml/76134d9f-a5ba-6a0d-37b3-28310b4a1e91@alu.unizg.... Link: https://lore.kernel.org/linux-trace-kernel/20230404194504.5790b95f@gandalf.l...
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Mark Rutland mark.rutland@arm.com Cc: Thorsten Leemhuis regressions@leemhuis.info Cc: Ulf Hansson ulf.hansson@linaro.org Cc: Eric Biggers ebiggers@kernel.org Fixes: 2f754e771b1a6 ("tracing: Have the error logs show up in the proper instances") Reported-by: Mirsad Goran Todorovac mirsad.todorovac@alu.unizg.hr Tested-by: Mirsad Todorovac mirsad.todorovac@alu.unizg.hr Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -8895,6 +8895,7 @@ static int __remove_instance(struct trac ftrace_destroy_function_files(tr); tracefs_remove(tr->dir); free_trace_buffers(tr); + clear_tracing_err_log(tr);
for (i = 0; i < tr->nr_topts; i++) { kfree(tr->topts[i].topts);
From: Jason Montleon jmontleo@redhat.com
commit f6887a71bdd2f0dcba9b8180dd2223cfa8637e85 upstream.
hdac_hdmi was not updated to use set_stream() instead of set_tdm_slots() in the original commit so HDMI no longer produces audio.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/regressions/CAJD_bPKQdtaExvVEKxhQ47G-ZXDA=k+gzhMJRHL... Fixes: 636110411ca7 ("ASoC: Intel/SOF: use set_stream() instead of set_tdm_slots() for HDAudio") Signed-off-by: Jason Montleon jmontleo@redhat.com Reviewed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20230324170711.2526-1-jmontleo@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/hdac_hdmi.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/sound/soc/codecs/hdac_hdmi.c +++ b/sound/soc/codecs/hdac_hdmi.c @@ -436,23 +436,28 @@ static int hdac_hdmi_setup_audio_infofra return 0; }
-static int hdac_hdmi_set_tdm_slot(struct snd_soc_dai *dai, - unsigned int tx_mask, unsigned int rx_mask, - int slots, int slot_width) +static int hdac_hdmi_set_stream(struct snd_soc_dai *dai, + void *stream, int direction) { struct hdac_hdmi_priv *hdmi = snd_soc_dai_get_drvdata(dai); struct hdac_device *hdev = hdmi->hdev; struct hdac_hdmi_dai_port_map *dai_map; struct hdac_hdmi_pcm *pcm; + struct hdac_stream *hstream;
- dev_dbg(&hdev->dev, "%s: strm_tag: %d\n", __func__, tx_mask); + if (!stream) + return -EINVAL; + + hstream = (struct hdac_stream *)stream; + + dev_dbg(&hdev->dev, "%s: strm_tag: %d\n", __func__, hstream->stream_tag);
dai_map = &hdmi->dai_map[dai->id];
pcm = hdac_hdmi_get_pcm_from_cvt(hdmi, dai_map->cvt);
if (pcm) - pcm->stream_tag = (tx_mask << 4); + pcm->stream_tag = (hstream->stream_tag << 4);
return 0; } @@ -1544,7 +1549,7 @@ static const struct snd_soc_dai_ops hdmi .startup = hdac_hdmi_pcm_open, .shutdown = hdac_hdmi_pcm_close, .hw_params = hdac_hdmi_set_hw_params, - .set_tdm_slot = hdac_hdmi_set_tdm_slot, + .set_stream = hdac_hdmi_set_stream, };
/*
From: Boris Brezillon boris.brezillon@collabora.com
commit 764a2ab9eb56e1200083e771aab16186836edf1d upstream.
Make sure all bo->base.pages entries are either NULL or pointing to a valid page before calling drm_gem_shmem_put_pages().
Reported-by: Tomeu Vizoso tomeu.vizoso@collabora.com Cc: stable@vger.kernel.org Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations") Signed-off-by: Boris Brezillon boris.brezillon@collabora.com Reviewed-by: Steven Price steven.price@arm.com Link: https://patchwork.freedesktop.org/patch/msgid/20210521093811.1018992-1-boris... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -458,6 +458,7 @@ static int panfrost_mmu_map_fault_addr(s if (IS_ERR(pages[i])) { mutex_unlock(&bo->base.pages_lock); ret = PTR_ERR(pages[i]); + pages[i] = NULL; goto err_pages; } }
From: Karol Herbst kherbst@redhat.com
commit 7f67aa097e875c87fba024e850cf405342300059 upstream.
This allows us to advertise more modes especially on HDR displays.
Fixes using 4K@60 modes on my TV and main display both using a HDMI to DP adapter. Also fixes similar issues for users running into this.
Cc: stable@vger.kernel.org # 5.10+ Signed-off-by: Karol Herbst kherbst@redhat.com Reviewed-by: Lyude Paul lyude@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20230330223938.4025569-1-kherb... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/dispnv50/disp.c | 32 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/nouveau/nouveau_dp.c | 8 +++++--- 2 files changed, 37 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -396,6 +396,35 @@ nv50_outp_atomic_check_view(struct drm_e return 0; }
+static void +nv50_outp_atomic_fix_depth(struct drm_encoder *encoder, struct drm_crtc_state *crtc_state) +{ + struct nv50_head_atom *asyh = nv50_head_atom(crtc_state); + struct nouveau_encoder *nv_encoder = nouveau_encoder(encoder); + struct drm_display_mode *mode = &asyh->state.adjusted_mode; + unsigned int max_rate, mode_rate; + + switch (nv_encoder->dcb->type) { + case DCB_OUTPUT_DP: + max_rate = nv_encoder->dp.link_nr * nv_encoder->dp.link_bw; + + /* we don't support more than 10 anyway */ + asyh->or.bpc = min_t(u8, asyh->or.bpc, 10); + + /* reduce the bpc until it works out */ + while (asyh->or.bpc > 6) { + mode_rate = DIV_ROUND_UP(mode->clock * asyh->or.bpc * 3, 8); + if (mode_rate <= max_rate) + break; + + asyh->or.bpc -= 2; + } + break; + default: + break; + } +} + static int nv50_outp_atomic_check(struct drm_encoder *encoder, struct drm_crtc_state *crtc_state, @@ -414,6 +443,9 @@ nv50_outp_atomic_check(struct drm_encode if (crtc_state->mode_changed || crtc_state->connectors_changed) asyh->or.bpc = connector->display_info.bpc;
+ /* We might have to reduce the bpc */ + nv50_outp_atomic_fix_depth(encoder, crtc_state); + return 0; }
--- a/drivers/gpu/drm/nouveau/nouveau_dp.c +++ b/drivers/gpu/drm/nouveau/nouveau_dp.c @@ -220,8 +220,6 @@ void nouveau_dp_irq(struct nouveau_drm * }
/* TODO: - * - Use the minimum possible BPC here, once we add support for the max bpc - * property. * - Validate against the DP caps advertised by the GPU (we don't check these * yet) */ @@ -233,7 +231,11 @@ nv50_dp_mode_valid(struct drm_connector { const unsigned int min_clock = 25000; unsigned int max_rate, mode_rate, ds_max_dotclock, clock = mode->clock; - const u8 bpp = connector->display_info.bpc * 3; + /* Check with the minmum bpc always, so we can advertise better modes. + * In particlar not doing this causes modes to be dropped on HDR + * displays as we might check with a bpc of 16 even. + */ + const u8 bpp = 6 * 3;
if (mode->flags & DRM_MODE_FLAG_INTERLACE && !outp->caps.dp_interlace) return MODE_NO_INTERLACE;
From: Zheng Yejian zhengyejian1@huawei.com
commit 6455b6163d8c680366663cdb8c679514d55fc30c upstream.
When user reads file 'trace_pipe', kernel keeps printing following logs that warn at "cpu_buffer->reader_page->read > rb_page_size(reader)" in rb_get_reader_page(). It just looks like there's an infinite loop in tracing_read_pipe(). This problem occurs several times on arm64 platform when testing v5.10 and below.
Call trace: rb_get_reader_page+0x248/0x1300 rb_buffer_peek+0x34/0x160 ring_buffer_peek+0xbc/0x224 peek_next_entry+0x98/0xbc __find_next_entry+0xc4/0x1c0 trace_find_next_entry_inc+0x30/0x94 tracing_read_pipe+0x198/0x304 vfs_read+0xb4/0x1e0 ksys_read+0x74/0x100 __arm64_sys_read+0x24/0x30 el0_svc_common.constprop.0+0x7c/0x1bc do_el0_svc+0x2c/0x94 el0_svc+0x20/0x30 el0_sync_handler+0xb0/0xb4 el0_sync+0x160/0x180
Then I dump the vmcore and look into the problematic per_cpu ring_buffer, I found that tail_page/commit_page/reader_page are on the same page while reader_page->read is obviously abnormal: tail_page == commit_page == reader_page == { .write = 0x100d20, .read = 0x8f9f4805, // Far greater than 0xd20, obviously abnormal!!! .entries = 0x10004c, .real_end = 0x0, .page = { .time_stamp = 0x857257416af0, .commit = 0xd20, // This page hasn't been full filled. // .data[0...0xd20] seems normal. } }
The root cause is most likely the race that reader and writer are on the same page while reader saw an event that not fully committed by writer.
To fix this, add memory barriers to make sure the reader can see the content of what is committed. Since commit a0fcaaed0c46 ("ring-buffer: Fix race between reset page and reading page") has added the read barrier in rb_get_reader_page(), here we just need to add the write barrier.
Link: https://lore.kernel.org/linux-trace-kernel/20230325021247.2923907-1-zhengyej...
Cc: stable@vger.kernel.org Fixes: 77ae365eca89 ("ring-buffer: make lockless") Suggested-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/ring_buffer.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-)
--- a/kernel/trace/ring_buffer.c +++ b/kernel/trace/ring_buffer.c @@ -2962,6 +2962,10 @@ rb_set_commit_to_write(struct ring_buffe if (RB_WARN_ON(cpu_buffer, rb_is_reader_page(cpu_buffer->tail_page))) return; + /* + * No need for a memory barrier here, as the update + * of the tail_page did it for this page. + */ local_set(&cpu_buffer->commit_page->page->commit, rb_page_write(cpu_buffer->commit_page)); rb_inc_page(cpu_buffer, &cpu_buffer->commit_page); @@ -2971,6 +2975,8 @@ rb_set_commit_to_write(struct ring_buffe while (rb_commit_index(cpu_buffer) != rb_page_write(cpu_buffer->commit_page)) {
+ /* Make sure the readers see the content of what is committed. */ + smp_wmb(); local_set(&cpu_buffer->commit_page->page->commit, rb_page_write(cpu_buffer->commit_page)); RB_WARN_ON(cpu_buffer, @@ -4390,7 +4396,12 @@ rb_get_reader_page(struct ring_buffer_pe
/* * Make sure we see any padding after the write update - * (see rb_reset_tail()) + * (see rb_reset_tail()). + * + * In addition, a writer may be writing on the reader page + * if the page has not been fully filled, so the read barrier + * is also needed to make sure we see the content of what is + * committed by the writer (see rb_set_commit_to_write()). */ smp_rmb();
From: Rongwei Wang rongwei.wang@linux.alibaba.com
commit 6fe7d6b992113719e96744d974212df3fcddc76c upstream.
The si->lock must be held when deleting the si from the available list. Otherwise, another thread can re-add the si to the available list, which can lead to memory corruption. The only place we have found where this happens is in the swapoff path. This case can be described as below:
core 0 core 1 swapoff
del_from_avail_list(si) waiting
try lock si->lock acquire swap_avail_lock and re-add si into swap_avail_head
acquire si->lock but missing si already being added again, and continuing to clear SWP_WRITEOK, etc.
It can be easily found that a massive warning messages can be triggered inside get_swap_pages() by some special cases, for example, we call madvise(MADV_PAGEOUT) on blocks of touched memory concurrently, meanwhile, run much swapon-swapoff operations (e.g. stress-ng-swap).
However, in the worst case, panic can be caused by the above scene. In swapoff(), the memory used by si could be kept in swap_info[] after turning off a swap. This means memory corruption will not be caused immediately until allocated and reset for a new swap in the swapon path. A panic message caused: (with CONFIG_PLIST_DEBUG enabled)
------------[ cut here ]------------ top: 00000000e58a3003, n: 0000000013e75cda, p: 000000008cd4451a prev: 0000000035b1e58a, n: 000000008cd4451a, p: 000000002150ee8d next: 000000008cd4451a, n: 000000008cd4451a, p: 000000008cd4451a WARNING: CPU: 21 PID: 1843 at lib/plist.c:60 plist_check_prev_next_node+0x50/0x70 Modules linked in: rfkill(E) crct10dif_ce(E)... CPU: 21 PID: 1843 Comm: stress-ng Kdump: ... 5.10.134+ Hardware name: Alibaba Cloud ECS, BIOS 0.0.0 02/06/2015 pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--) pc : plist_check_prev_next_node+0x50/0x70 lr : plist_check_prev_next_node+0x50/0x70 sp : ffff0018009d3c30 x29: ffff0018009d3c40 x28: ffff800011b32a98 x27: 0000000000000000 x26: ffff001803908000 x25: ffff8000128ea088 x24: ffff800011b32a48 x23: 0000000000000028 x22: ffff001800875c00 x21: ffff800010f9e520 x20: ffff001800875c00 x19: ffff001800fdc6e0 x18: 0000000000000030 x17: 0000000000000000 x16: 0000000000000000 x15: 0736076307640766 x14: 0730073007380731 x13: 0736076307640766 x12: 0730073007380731 x11: 000000000004058d x10: 0000000085a85b76 x9 : ffff8000101436e4 x8 : ffff800011c8ce08 x7 : 0000000000000000 x6 : 0000000000000001 x5 : ffff0017df9ed338 x4 : 0000000000000001 x3 : ffff8017ce62a000 x2 : ffff0017df9ed340 x1 : 0000000000000000 x0 : 0000000000000000 Call trace: plist_check_prev_next_node+0x50/0x70 plist_check_head+0x80/0xf0 plist_add+0x28/0x140 add_to_avail_list+0x9c/0xf0 _enable_swap_info+0x78/0xb4 __do_sys_swapon+0x918/0xa10 __arm64_sys_swapon+0x20/0x30 el0_svc_common+0x8c/0x220 do_el0_svc+0x2c/0x90 el0_svc+0x1c/0x30 el0_sync_handler+0xa8/0xb0 el0_sync+0x148/0x180 irq event stamp: 2082270
Now, si->lock locked before calling 'del_from_avail_list()' to make sure other thread see the si had been deleted and SWP_WRITEOK cleared together, will not reinsert again.
This problem exists in versions after stable 5.10.y.
Link: https://lkml.kernel.org/r/20230404154716.23058-1-rongwei.wang@linux.alibaba.... Fixes: a2468cc9bfdff ("swap: choose swap device according to numa node") Tested-by: Yongchen Yin wb-yyc939293@alibaba-inc.com Signed-off-by: Rongwei Wang rongwei.wang@linux.alibaba.com Cc: Bagas Sanjaya bagasdotme@gmail.com Cc: Matthew Wilcox (Oracle) willy@infradead.org Cc: Aaron Lu aaron.lu@intel.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/swapfile.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -666,6 +666,7 @@ static void __del_from_avail_list(struct { int nid;
+ assert_spin_locked(&p->lock); for_each_node(nid) plist_del(&p->avail_lists[nid], &swap_avail_heads[nid]); } @@ -2611,8 +2612,8 @@ SYSCALL_DEFINE1(swapoff, const char __us spin_unlock(&swap_lock); goto out_dput; } - del_from_avail_list(p); spin_lock(&p->lock); + del_from_avail_list(p); if (p->prio < 0) { struct swap_info_struct *si = p; int nid;
From: Tommi Rantala tommi.t.rantala@nokia.com
commit fc4a3a1bf9ad799181e4d4ec9c2598c0405bc27d upstream.
Use clock_gettime() instead of deprecated ftime().
aperf.c: In function ‘main’: aperf.c:58:2: warning: ‘ftime’ is deprecated [-Wdeprecated-declarations] 58 | ftime(&before); | ^~~~~ In file included from aperf.c:9: /usr/include/sys/timeb.h:39:12: note: declared here 39 | extern int ftime (struct timeb *__timebuf) | ^~~~~
Signed-off-by: Tommi Rantala tommi.t.rantala@nokia.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Ricardo Cañuelo ricardo.canuelo@collabora.com Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/oe-kbuild-all/202304060514.ELO1BqLI-lkp@intel.com/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/intel_pstate/aperf.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
--- a/tools/testing/selftests/intel_pstate/aperf.c +++ b/tools/testing/selftests/intel_pstate/aperf.c @@ -10,8 +10,12 @@ #include <sched.h> #include <errno.h> #include <string.h> +#include <time.h> #include "../kselftest.h"
+#define MSEC_PER_SEC 1000L +#define NSEC_PER_MSEC 1000000L + void usage(char *name) { printf ("Usage: %s cpunum\n", name); } @@ -22,7 +26,7 @@ int main(int argc, char **argv) { long long tsc, old_tsc, new_tsc; long long aperf, old_aperf, new_aperf; long long mperf, old_mperf, new_mperf; - struct timeb before, after; + struct timespec before, after; long long int start, finish, total; cpu_set_t cpuset;
@@ -55,7 +59,10 @@ int main(int argc, char **argv) { return 1; }
- ftime(&before); + if (clock_gettime(CLOCK_MONOTONIC, &before) < 0) { + perror("clock_gettime"); + return 1; + } pread(fd, &old_tsc, sizeof(old_tsc), 0x10); pread(fd, &old_aperf, sizeof(old_mperf), 0xe7); pread(fd, &old_mperf, sizeof(old_aperf), 0xe8); @@ -64,7 +71,10 @@ int main(int argc, char **argv) { sqrt(i); }
- ftime(&after); + if (clock_gettime(CLOCK_MONOTONIC, &after) < 0) { + perror("clock_gettime"); + return 1; + } pread(fd, &new_tsc, sizeof(new_tsc), 0x10); pread(fd, &new_aperf, sizeof(new_mperf), 0xe7); pread(fd, &new_mperf, sizeof(new_aperf), 0xe8); @@ -73,11 +83,11 @@ int main(int argc, char **argv) { aperf = new_aperf-old_aperf; mperf = new_mperf-old_mperf;
- start = before.time*1000 + before.millitm; - finish = after.time*1000 + after.millitm; + start = before.tv_sec*MSEC_PER_SEC + before.tv_nsec/NSEC_PER_MSEC; + finish = after.tv_sec*MSEC_PER_SEC + after.tv_nsec/NSEC_PER_MSEC; total = finish - start;
- printf("runTime: %4.2f\n", 1.0*total/1000); + printf("runTime: %4.2f\n", 1.0*total/MSEC_PER_SEC); printf("freq: %7.0f\n", tsc / (1.0*aperf / (1.0 * mperf)) / total); return 0; }
From: Robert Foss robert.foss@linaro.org
commit 2a9df204be0bbb896e087f00b9ee3fc559d5a608 upstream.
This fixes PLL being unable to lock, and is derived from an equivalent downstream commit.
Available LT9611 documentation does not list this register, neither does LT9611UXC (which is a different chip).
This commit has been confirmed to fix HDMI output on DragonBoard 845c.
Suggested-by: Amit Pundir amit.pundir@linaro.org Reviewed-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Robert Foss robert.foss@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20221213150304.4189760-1-rober... Signed-off-by: Amit Pundir amit.pundir@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/lontium-lt9611.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/bridge/lontium-lt9611.c +++ b/drivers/gpu/drm/bridge/lontium-lt9611.c @@ -256,6 +256,7 @@ static int lt9611_pll_setup(struct lt961 { 0x8126, 0x55 }, { 0x8127, 0x66 }, { 0x8128, 0x88 }, + { 0x812a, 0x20 }, };
regmap_multi_reg_write(lt9611->regmap, reg_cfg, ARRAY_SIZE(reg_cfg));
From: Gaosheng Cui cuigaosheng1@huawei.com
This reverts commit c7a218cbf67fffcd99b76ae3b5e9c2e8bef17c8c.
The memory of ctx is allocated by devm_kzalloc in cal_ctx_create, it should not be freed by kfree when cal_ctx_v4l2_init() fails, otherwise kfree() will cause double free, so revert this patch.
The memory of ctx is allocated by kzalloc since commit 9e67f24e4d9 ("media: ti-vpe: cal: fix ctx uninitialization"), so the fixes tag of patch c7a218cbf67fis not entirely accurate, mainline should merge this patch, but it should not be merged into 5.10, so we just revert this patch for this branch.
Fixes: c7a218cbf67f ("media: ti: cal: fix possible memory leak in cal_ctx_create()") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/ti-vpe/cal.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/drivers/media/platform/ti-vpe/cal.c +++ b/drivers/media/platform/ti-vpe/cal.c @@ -624,10 +624,8 @@ static struct cal_ctx *cal_ctx_create(st ctx->cport = inst;
ret = cal_ctx_v4l2_init(ctx); - if (ret) { - kfree(ctx); + if (ret) return NULL; - }
return ctx; }
From: Heming Zhao ocfs2-devel@oss.oracle.com
commit 550842cc60987b269e31b222283ade3e1b6c7fc8 upstream.
After commit 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error"), any procedure after ocfs2_dlm_init() fails will trigger crash when calling ocfs2_dlm_shutdown().
ie: On local mount mode, no dlm resource is initialized. If ocfs2_mount_volume() fails in ocfs2_find_slot(), error handling will call ocfs2_dlm_shutdown(), then does dlm resource cleanup job, which will trigger kernel crash.
This solution should bypass uninitialized resources in ocfs2_dlm_shutdown().
Link: https://lkml.kernel.org/r/20220815085754.20417-1-heming.zhao@suse.com Fixes: 0737e01de9c4 ("ocfs2: ocfs2_mount_volume does cleanup job before return error") Signed-off-by: Heming Zhao heming.zhao@suse.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/dlmglue.c | 8 +++++--- fs/ocfs2/super.c | 3 +-- 2 files changed, 6 insertions(+), 5 deletions(-)
--- a/fs/ocfs2/dlmglue.c +++ b/fs/ocfs2/dlmglue.c @@ -3396,10 +3396,12 @@ void ocfs2_dlm_shutdown(struct ocfs2_sup ocfs2_lock_res_free(&osb->osb_nfs_sync_lockres); ocfs2_lock_res_free(&osb->osb_orphan_scan.os_lockres);
- ocfs2_cluster_disconnect(osb->cconn, hangup_pending); - osb->cconn = NULL; + if (osb->cconn) { + ocfs2_cluster_disconnect(osb->cconn, hangup_pending); + osb->cconn = NULL;
- ocfs2_dlm_shutdown_debug(osb); + ocfs2_dlm_shutdown_debug(osb); + } }
static int ocfs2_drop_lock(struct ocfs2_super *osb, --- a/fs/ocfs2/super.c +++ b/fs/ocfs2/super.c @@ -1922,8 +1922,7 @@ static void ocfs2_dismount_volume(struct !ocfs2_is_hard_readonly(osb)) hangup_needed = 1;
- if (osb->cconn) - ocfs2_dlm_shutdown(osb, hangup_needed); + ocfs2_dlm_shutdown(osb, hangup_needed);
ocfs2_blockcheck_stats_debugfs_remove(&osb->osb_ecc_stats); debugfs_remove_recursive(osb->osb_debug_root);
From: Eduard Zingerman eddyz87@gmail.com
[ Upstream commit 44a726c3f23cf762ef4ce3c1709aefbcbe97f62c ]
btf_dump_emit_struct_def attempts to print empty structures at a single line, e.g. `struct empty {}`. However, it has to account for a case when there are no regular but some padding fields in the struct. In such case `vlen` would be zero, but size would be non-zero.
E.g. here is struct bpf_timer from vmlinux.h before this patch:
struct bpf_timer { long: 64; long: 64;};
And after this patch:
struct bpf_dynptr { long: 64; long: 64; };
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Signed-off-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/bpf/20221001104425.415768-1-eddyz87@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/btf_dump.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 558d34fbd331c..6a8d8ed34b760 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -969,7 +969,11 @@ static void btf_dump_emit_struct_def(struct btf_dump *d, if (is_struct) btf_dump_emit_bit_padding(d, off, t->size * 8, align, false, lvl + 1);
- if (vlen) + /* + * Keep `struct empty {}` on a single line, + * only print newline when there are regular or padding fields. + */ + if (vlen || t->size) btf_dump_printf(d, "\n"); btf_dump_printf(d, "%s}", pfx(lvl)); if (packed)
From: Kornel Dulęba korneld@chromium.org
commit 534e465845ebfb4a97eb5459d3931a0b35e3b9a5 upstream.
This reverts commit b26cd9325be4c1fcd331b77f10acb627c560d4d7.
This patch introduces a regression on Lenovo Z13, which can't wake from the lid with it applied; and some unspecified AMD based Dell platforms are unable to wake from hitting the power button
Signed-off-by: Kornel Dulęba korneld@chromium.org Reviewed-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20230411134932.292287-1-korneld@chromium.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pinctrl/pinctrl-amd.c | 36 ++++++++++++++++-------------------- 1 file changed, 16 insertions(+), 20 deletions(-)
--- a/drivers/pinctrl/pinctrl-amd.c +++ b/drivers/pinctrl/pinctrl-amd.c @@ -764,34 +764,32 @@ static const struct pinconf_ops amd_pinc .pin_config_group_set = amd_pinconf_group_set, };
-static void amd_gpio_irq_init_pin(struct amd_gpio *gpio_dev, int pin) +static void amd_gpio_irq_init(struct amd_gpio *gpio_dev) { - const struct pin_desc *pd; + struct pinctrl_desc *desc = gpio_dev->pctrl->desc; unsigned long flags; u32 pin_reg, mask; + int i;
mask = BIT(WAKE_CNTRL_OFF_S0I3) | BIT(WAKE_CNTRL_OFF_S3) | BIT(INTERRUPT_MASK_OFF) | BIT(INTERRUPT_ENABLE_OFF) | BIT(WAKE_CNTRL_OFF_S4);
- pd = pin_desc_get(gpio_dev->pctrl, pin); - if (!pd) - return; + for (i = 0; i < desc->npins; i++) { + int pin = desc->pins[i].number; + const struct pin_desc *pd = pin_desc_get(gpio_dev->pctrl, pin);
- raw_spin_lock_irqsave(&gpio_dev->lock, flags); - pin_reg = readl(gpio_dev->base + pin * 4); - pin_reg &= ~mask; - writel(pin_reg, gpio_dev->base + pin * 4); - raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); -} + if (!pd) + continue;
-static void amd_gpio_irq_init(struct amd_gpio *gpio_dev) -{ - struct pinctrl_desc *desc = gpio_dev->pctrl->desc; - int i; + raw_spin_lock_irqsave(&gpio_dev->lock, flags);
- for (i = 0; i < desc->npins; i++) - amd_gpio_irq_init_pin(gpio_dev, i); + pin_reg = readl(gpio_dev->base + i * 4); + pin_reg &= ~mask; + writel(pin_reg, gpio_dev->base + i * 4); + + raw_spin_unlock_irqrestore(&gpio_dev->lock, flags); + } }
#ifdef CONFIG_PM_SLEEP @@ -844,10 +842,8 @@ static int amd_gpio_resume(struct device for (i = 0; i < desc->npins; i++) { int pin = desc->pins[i].number;
- if (!amd_gpio_should_save(gpio_dev, pin)) { - amd_gpio_irq_init_pin(gpio_dev, pin); + if (!amd_gpio_should_save(gpio_dev, pin)) continue; - }
raw_spin_lock_irqsave(&gpio_dev->lock, flags); gpio_dev->saved_regs[i] |= readl(gpio_dev->base + pin * 4) & PIN_IRQ_PENDING;
From: Oswald Buddenhagen oswald.buddenhagen@gmx.de
commit b09c551c77c7e01dc6e4f3c8bf06b5ffa7b06db5 upstream.
Due to two copy/pastos, closing the MIC or EFX capture device would make a running ADC capture hang due to unsetting its interrupt handler. In principle, this would have also allowed dereferencing dangling pointers, but we're actually rather thorough at disabling and flushing the ints.
While it may sound like one, this actually wasn't a hypothetical bug: PortAudio will open a capture stream at startup (and close it right away) even if not asked to. If the first device is busy, it will just proceed with the next one ... thus killing a concurrent capture.
Signed-off-by: Oswald Buddenhagen oswald.buddenhagen@gmx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230405201220.2197923-1-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/emu10k1/emupcm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/sound/pci/emu10k1/emupcm.c +++ b/sound/pci/emu10k1/emupcm.c @@ -1232,7 +1232,7 @@ static int snd_emu10k1_capture_mic_close { struct snd_emu10k1 *emu = snd_pcm_substream_chip(substream);
- emu->capture_interrupt = NULL; + emu->capture_mic_interrupt = NULL; emu->pcm_capture_mic_substream = NULL; return 0; } @@ -1340,7 +1340,7 @@ static int snd_emu10k1_capture_efx_close { struct snd_emu10k1 *emu = snd_pcm_substream_chip(substream);
- emu->capture_interrupt = NULL; + emu->capture_efx_interrupt = NULL; emu->pcm_capture_efx_substream = NULL; return 0; }
From: Oswald Buddenhagen oswald.buddenhagen@gmx.de
commit c17f8fd31700392b1bb9e7b66924333568cb3700 upstream.
Like the other boards from the D*45* series, this one sets up the outputs not quite correctly.
Signed-off-by: Oswald Buddenhagen oswald.buddenhagen@gmx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230405201220.2197826-1-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/sound/hd-audio/models.rst | 2 +- sound/pci/hda/patch_sigmatel.c | 2 ++ 2 files changed, 3 insertions(+), 1 deletion(-)
--- a/Documentation/sound/hd-audio/models.rst +++ b/Documentation/sound/hd-audio/models.rst @@ -704,7 +704,7 @@ ref no-jd BIOS setup but without jack-detection intel - Intel DG45* mobos + Intel D*45* mobos dell-m6-amic Dell desktops/laptops with analog mics dell-m6-dmic --- a/sound/pci/hda/patch_sigmatel.c +++ b/sound/pci/hda/patch_sigmatel.c @@ -1955,6 +1955,8 @@ static const struct snd_pci_quirk stac92 "DFI LanParty", STAC_92HD73XX_REF), SND_PCI_QUIRK(PCI_VENDOR_ID_DFI, 0x3101, "DFI LanParty", STAC_92HD73XX_REF), + SND_PCI_QUIRK(PCI_VENDOR_ID_INTEL, 0x5001, + "Intel DP45SG", STAC_92HD73XX_INTEL), SND_PCI_QUIRK(PCI_VENDOR_ID_INTEL, 0x5002, "Intel DG45ID", STAC_92HD73XX_INTEL), SND_PCI_QUIRK(PCI_VENDOR_ID_INTEL, 0x5003,
From: Oswald Buddenhagen oswald.buddenhagen@gmx.de
commit e98e7a82bca2b6dce3e03719cff800ec913f9af7 upstream.
snd_cs8427_iec958_active() would always delete SNDRV_CTL_ELEM_ACCESS_INACTIVE, even though the function has an argument `active`.
Signed-off-by: Oswald Buddenhagen oswald.buddenhagen@gmx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230405201219.2197811-1-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/i2c/cs8427.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/sound/i2c/cs8427.c +++ b/sound/i2c/cs8427.c @@ -553,10 +553,13 @@ int snd_cs8427_iec958_active(struct snd_ if (snd_BUG_ON(!cs8427)) return -ENXIO; chip = cs8427->private_data; - if (active) + if (active) { memcpy(chip->playback.pcm_status, chip->playback.def_status, 24); - chip->playback.pcm_ctl->vd[0].access &= ~SNDRV_CTL_ELEM_ACCESS_INACTIVE; + chip->playback.pcm_ctl->vd[0].access &= ~SNDRV_CTL_ELEM_ACCESS_INACTIVE; + } else { + chip->playback.pcm_ctl->vd[0].access |= SNDRV_CTL_ELEM_ACCESS_INACTIVE; + } snd_ctl_notify(cs8427->bus->card, SNDRV_CTL_EVENT_MASK_VALUE | SNDRV_CTL_EVENT_MASK_INFO, &chip->playback.pcm_ctl->id);
From: Xu Biang xubiang@hust.edu.cn
commit fb4a624f88f658c7b7ae124452bd42eaa8ac7168 upstream.
Smatch Warns: sound/firewire/tascam/tascam-stream.c:493 snd_tscm_stream_start_duplex() warn: missing unwind goto?
The direct return will cause the stream list of "&tscm->domain" unemptied and the session in "tscm" unfinished if amdtp_domain_start() returns with an error.
Fix this by changing the direct return to a goto which will empty the stream list of "&tscm->domain" and finish the session in "tscm".
The snd_tscm_stream_start_duplex() function is called in the prepare callback of PCM. According to "ALSA Kernel API Documentation", the prepare callback of PCM will be called many times at each setup. So, if the "&d->streams" list is not emptied, when the prepare callback is called next time, snd_tscm_stream_start_duplex() will receive -EBUSY from amdtp_domain_add_stream() that tries to add an existing stream to the domain. The error handling code after the "error" label will be executed in this case, and the "&d->streams" list will be emptied. So not emptying the "&d->streams" list will not cause an issue. But it is more efficient and readable to empty it on the first error by changing the direct return to a goto statement.
The session in "tscm" has been begun before amdtp_domain_start(), so it needs to be finished when amdtp_domain_start() fails.
Fixes: c281d46a51e3 ("ALSA: firewire-tascam: support AMDTP domain") Signed-off-by: Xu Biang xubiang@hust.edu.cn Reviewed-by: Dan Carpenter error27@gmail.com Acked-by: Takashi Sakamoto o-takashi@sakamocchi.jp Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406132801.105108-1-xubiang@hust.edu.cn Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/firewire/tascam/tascam-stream.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/firewire/tascam/tascam-stream.c +++ b/sound/firewire/tascam/tascam-stream.c @@ -475,7 +475,7 @@ int snd_tscm_stream_start_duplex(struct
err = amdtp_domain_start(&tscm->domain, 0); if (err < 0) - return err; + goto error;
if (!amdtp_stream_wait_callback(&tscm->rx_stream, CALLBACK_TIMEOUT) ||
From: Oswald Buddenhagen oswald.buddenhagen@gmx.de
commit f342ac00da1064eb4f94b1f4bcacbdfea955797a upstream.
The BIOS botches this one completely - it says the 2nd S/PDIF output is used, while in fact it's the 1st one. This is tested on DP45SG, but I'm assuming it's valid for the other boards in the series as well.
Also add some comments regarding the pins. FWIW, the codec is apparently still sold by Tempo Semiconductor, Inc., where one can download the documentation.
Signed-off-by: Oswald Buddenhagen oswald.buddenhagen@gmx.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230405201220.2197826-2-oswald.buddenhagen@gmx.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_sigmatel.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/sound/pci/hda/patch_sigmatel.c +++ b/sound/pci/hda/patch_sigmatel.c @@ -1707,6 +1707,7 @@ static const struct snd_pci_quirk stac92 };
static const struct hda_pintbl ref92hd73xx_pin_configs[] = { + // Port A-H { 0x0a, 0x02214030 }, { 0x0b, 0x02a19040 }, { 0x0c, 0x01a19020 }, @@ -1715,9 +1716,12 @@ static const struct hda_pintbl ref92hd73 { 0x0f, 0x01014010 }, { 0x10, 0x01014020 }, { 0x11, 0x01014030 }, + // CD in { 0x12, 0x02319040 }, + // Digial Mic ins { 0x13, 0x90a000f0 }, { 0x14, 0x90a000f0 }, + // Digital outs { 0x22, 0x01452050 }, { 0x23, 0x01452050 }, {} @@ -1758,6 +1762,7 @@ static const struct hda_pintbl alienware };
static const struct hda_pintbl intel_dg45id_pin_configs[] = { + // Analog outputs { 0x0a, 0x02214230 }, { 0x0b, 0x02A19240 }, { 0x0c, 0x01013214 }, @@ -1765,6 +1770,9 @@ static const struct hda_pintbl intel_dg4 { 0x0e, 0x01A19250 }, { 0x0f, 0x01011212 }, { 0x10, 0x01016211 }, + // Digital output + { 0x22, 0x01451380 }, + { 0x23, 0x40f000f0 }, {} };
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
commit a2a9339e1c9deb7e1e079e12e27a0265aea8421a upstream.
Similar to commit d0be8347c623 ("Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put"), just use l2cap_chan_hold_unless_zero to prevent referencing a channel that is about to be destroyed.
Cc: stable@kernel.org Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Min Li lm0963hack@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/l2cap_core.c | 24 ++++++------------------ 1 file changed, 6 insertions(+), 18 deletions(-)
--- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4647,33 +4647,27 @@ static inline int l2cap_disconnect_req(s
BT_DBG("scid 0x%4.4x dcid 0x%4.4x", scid, dcid);
- mutex_lock(&conn->chan_lock); - - chan = __l2cap_get_chan_by_scid(conn, dcid); + chan = l2cap_get_chan_by_scid(conn, dcid); if (!chan) { - mutex_unlock(&conn->chan_lock); cmd_reject_invalid_cid(conn, cmd->ident, dcid, scid); return 0; }
- l2cap_chan_hold(chan); - l2cap_chan_lock(chan); - rsp.dcid = cpu_to_le16(chan->scid); rsp.scid = cpu_to_le16(chan->dcid); l2cap_send_cmd(conn, cmd->ident, L2CAP_DISCONN_RSP, sizeof(rsp), &rsp);
chan->ops->set_shutdown(chan);
+ mutex_lock(&conn->chan_lock); l2cap_chan_del(chan, ECONNRESET); + mutex_unlock(&conn->chan_lock);
chan->ops->close(chan);
l2cap_chan_unlock(chan); l2cap_chan_put(chan);
- mutex_unlock(&conn->chan_lock); - return 0; }
@@ -4693,33 +4687,27 @@ static inline int l2cap_disconnect_rsp(s
BT_DBG("dcid 0x%4.4x scid 0x%4.4x", dcid, scid);
- mutex_lock(&conn->chan_lock); - - chan = __l2cap_get_chan_by_scid(conn, scid); + chan = l2cap_get_chan_by_scid(conn, scid); if (!chan) { mutex_unlock(&conn->chan_lock); return 0; }
- l2cap_chan_hold(chan); - l2cap_chan_lock(chan); - if (chan->state != BT_DISCONN) { l2cap_chan_unlock(chan); l2cap_chan_put(chan); - mutex_unlock(&conn->chan_lock); return 0; }
+ mutex_lock(&conn->chan_lock); l2cap_chan_del(chan, 0); + mutex_unlock(&conn->chan_lock);
chan->ops->close(chan);
l2cap_chan_unlock(chan); l2cap_chan_put(chan);
- mutex_unlock(&conn->chan_lock); - return 0; }
From: Min Li lm0963hack@gmail.com
commit c95930abd687fcd1aa040dc4fe90dff947916460 upstream.
There is a potential race condition in hidp_session_thread that may lead to use-after-free. For instance, the timer is active while hidp_del_timer is called in hidp_session_thread(). After hidp_session_put, then 'session' will be freed, causing kernel panic when hidp_idle_timeout is running.
The solution is to use del_timer_sync instead of del_timer.
Here is the call trace:
? hidp_session_probe+0x780/0x780 call_timer_fn+0x2d/0x1e0 __run_timers.part.0+0x569/0x940 hidp_session_probe+0x780/0x780 call_timer_fn+0x1e0/0x1e0 ktime_get+0x5c/0xf0 lapic_next_deadline+0x2c/0x40 clockevents_program_event+0x205/0x320 run_timer_softirq+0xa9/0x1b0 __do_softirq+0x1b9/0x641 __irq_exit_rcu+0xdc/0x190 irq_exit_rcu+0xe/0x20 sysvec_apic_timer_interrupt+0xa1/0xc0
Cc: stable@vger.kernel.org Signed-off-by: Min Li lm0963hack@gmail.com Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/hidp/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/bluetooth/hidp/core.c +++ b/net/bluetooth/hidp/core.c @@ -433,7 +433,7 @@ static void hidp_set_timer(struct hidp_s static void hidp_del_timer(struct hidp_session *session) { if (session->idle_to > 0) - del_timer(&session->timer); + del_timer_sync(&session->timer); }
static void hidp_process_report(struct hidp_session *session, int type,
From: David Sterba dsterba@suse.com
commit c8a5f8ca9a9c7d5c5bc31d54f47ea9d86f93ed69 upstream.
Per user request, print the checksum type and implementation at mount time among the messages. The checksum is user configurable and the actual crypto implementation is useful to see for performance reasons. The same information is also available after mount in /sys/fs/FSID/checksum file.
Example:
[25.323662] BTRFS info (device vdb): using sha256 (sha256-generic) checksum algorithm
Link: https://github.com/kdave/btrfs-progs/issues/483 Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Nikolay Borisov nborisov@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2246,6 +2246,9 @@ static int btrfs_init_csum_hash(struct b
fs_info->csum_shash = csum_shash;
+ btrfs_info(fs_info, "using %s (%s) checksum algorithm", + btrfs_super_csum_name(csum_type), + crypto_shash_driver_name(csum_shash)); return 0; }
From: Christoph Hellwig hch@lst.de
commit 68d99ab0e9221ef54506f827576c5a914680eeaf upstream.
The BTRFS_FS_CSUM_IMPL_FAST flag is currently set whenever a non-generic crc32c is detected, which is the incorrect check if the file system uses a different checksumming algorithm. Refactor the code to only check this if crc32c is actually used. Note that in an ideal world the information if an algorithm is hardware accelerated or not should be provided by the crypto API instead, but that's left for another day.
CC: stable@vger.kernel.org # 5.4.x: c8a5f8ca9a9c: btrfs: print checksum type and implementation at mount time CC: stable@vger.kernel.org # 5.4.x Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 14 ++++++++++++++ fs/btrfs/super.c | 2 -- 2 files changed, 14 insertions(+), 2 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2246,6 +2246,20 @@ static int btrfs_init_csum_hash(struct b
fs_info->csum_shash = csum_shash;
+ /* + * Check if the checksum implementation is a fast accelerated one. + * As-is this is a bit of a hack and should be replaced once the csum + * implementations provide that information themselves. + */ + switch (csum_type) { + case BTRFS_CSUM_TYPE_CRC32: + if (!strstr(crypto_shash_driver_name(csum_shash), "generic")) + set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags); + break; + default: + break; + } + btrfs_info(fs_info, "using %s (%s) checksum algorithm", btrfs_super_csum_name(csum_type), crypto_shash_driver_name(csum_shash)); --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -1692,8 +1692,6 @@ static struct dentry *btrfs_mount_root(s } else { snprintf(s->s_id, sizeof(s->s_id), "%pg", bdev); btrfs_sb(s)->bdev_holder = fs_type; - if (!strstr(crc32c_impl(), "generic")) - set_bit(BTRFS_FS_CSUM_IMPL_FAST, &fs_info->flags); error = btrfs_fill_super(s, fs_devices, data); } if (!error)
From: Daniel Vetter daniel.vetter@ffwll.ch
commit 6fd33a3333c7916689b8f051a185defe4dd515b0 upstream.
This is an oversight from dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore") - I failed to realize that nasty userspace could set this.
It's not pretty to mix up kernel-internal and userspace uapi flags like this, but since the entire fb_var_screeninfo structure is uapi we'd need to either add a new parameter to the ->fb_set_par callback and fb_set_par() function, which has a _lot_ of users. Or some other fairly ugly side-channel int fb_info. Neither is a pretty prospect.
Instead just correct the issue at hand by filtering out this kernel-internal flag in the ioctl handling code.
Reviewed-by: Javier Martinez Canillas javierm@redhat.com Acked-by: Maarten Lankhorst maarten.lankhorst@linux.intel.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Fixes: dc5bdb68b5b3 ("drm/fb-helper: Fix vt restore") Cc: Alex Deucher alexander.deucher@amd.com Cc: shlomo@fastmail.com Cc: Michel Dänzer michel@daenzer.net Cc: Noralf Trønnes noralf@tronnes.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Daniel Vetter daniel.vetter@intel.com Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.7+ Cc: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com Cc: Geert Uytterhoeven geert@linux-m68k.org Cc: Nathan Chancellor natechancellor@gmail.com Cc: Qiujun Huang hqjagain@gmail.com Cc: Peter Rosin peda@axentia.se Cc: linux-fbdev@vger.kernel.org Cc: Helge Deller deller@gmx.de Cc: Sam Ravnborg sam@ravnborg.org Cc: Geert Uytterhoeven geert+renesas@glider.be Cc: Samuel Thibault samuel.thibault@ens-lyon.org Cc: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Cc: Shigeru Yoshida syoshida@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20230404193934.472457-1-daniel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/core/fbmem.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/video/fbdev/core/fbmem.c +++ b/drivers/video/fbdev/core/fbmem.c @@ -1117,6 +1117,8 @@ static long do_fb_ioctl(struct fb_info * case FBIOPUT_VSCREENINFO: if (copy_from_user(&var, argp, sizeof(var))) return -EFAULT; + /* only for kernel-internal use */ + var.activate &= ~FB_ACTIVATE_KD_TEXT; console_lock(); lock_fb_info(info); ret = fbcon_modechange_possible(info, &var);
From: Bang Li libang.linuxer@gmail.com
commit 0c3089601f064d80b3838eceb711fcac04bceaad upstream.
mtd_read() may return -EUCLEAN in case of corrected bit-flips.This particular condition should not be treated like an error.
Signed-off-by: Bang Li libang.linuxer@gmail.com Fixes: e47f68587b82 ("mtd: check for max_bitflips in mtd_read_oob()") Cc: stable@vger.kernel.org # v3.7 Acked-by: Richard Weinberger richard@nod.at Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230328163012.4264-1-libang.linuxer@gmail... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/mtdblock.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/drivers/mtd/mtdblock.c +++ b/drivers/mtd/mtdblock.c @@ -153,7 +153,7 @@ static int do_cached_write (struct mtdbl mtdblk->cache_state = STATE_EMPTY; ret = mtd_read(mtd, sect_start, sect_size, &retlen, mtdblk->cache_data); - if (ret) + if (ret && !mtd_is_bitflip(ret)) return ret; if (retlen != sect_size) return -EIO; @@ -188,8 +188,12 @@ static int do_cached_read (struct mtdblk pr_debug("mtdblock: read on "%s" at 0x%lx, size 0x%x\n", mtd->name, pos, len);
- if (!sect_size) - return mtd_read(mtd, pos, len, &retlen, buf); + if (!sect_size) { + ret = mtd_read(mtd, pos, len, &retlen, buf); + if (ret && !mtd_is_bitflip(ret)) + return ret; + return 0; + }
while (len > 0) { unsigned long sect_start = (pos/sect_size)*sect_size; @@ -209,7 +213,7 @@ static int do_cached_read (struct mtdblk memcpy (buf, mtdblk->cache_data + offset, size); } else { ret = mtd_read(mtd, pos, size, &retlen, buf); - if (ret) + if (ret && !mtd_is_bitflip(ret)) return ret; if (retlen != size) return -EIO;
From: Arseniy Krasnov avkrasnov@sberdevices.ru
commit 93942b70461574ca7fc3d91494ca89b16a4c64c7 upstream.
Valid mask is 0x3FFF, without this patch the following problems were found:
1) [ 0.938914] Could not find a valid ONFI parameter page, trying bit-wise majority to recover it [ 0.947384] ONFI parameter recovery failed, aborting
2) Read with disabled ECC mode was broken.
Fixes: 8fae856c5350 ("mtd: rawnand: meson: add support for Amlogic NAND flash controller") Cc: Stable@vger.kernel.org Signed-off-by: Arseniy Krasnov AVKrasnov@sberdevices.ru Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/3794ffbf-dfea-e96f-1f97-fe235b005e19@sberd... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/meson_nand.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/mtd/nand/raw/meson_nand.c +++ b/drivers/mtd/nand/raw/meson_nand.c @@ -276,7 +276,7 @@ static void meson_nfc_cmd_access(struct
if (raw) { len = mtd->writesize + mtd->oobsize; - cmd = (len & GENMASK(5, 0)) | scrambler | DMA_DIR(dir); + cmd = (len & GENMASK(13, 0)) | scrambler | DMA_DIR(dir); writel(cmd, nfc->reg_base + NFC_REG_CMD); return; } @@ -540,7 +540,7 @@ static int meson_nfc_read_buf(struct nan if (ret) goto out;
- cmd = NFC_CMD_N2M | (len & GENMASK(5, 0)); + cmd = NFC_CMD_N2M | (len & GENMASK(13, 0)); writel(cmd, nfc->reg_base + NFC_REG_CMD);
meson_nfc_drain_cmd(nfc); @@ -564,7 +564,7 @@ static int meson_nfc_write_buf(struct na if (ret) return ret;
- cmd = NFC_CMD_M2N | (len & GENMASK(5, 0)); + cmd = NFC_CMD_M2N | (len & GENMASK(13, 0)); writel(cmd, nfc->reg_base + NFC_REG_CMD);
meson_nfc_drain_cmd(nfc);
From: Christophe Kerello christophe.kerello@foss.st.com
commit f71e0e329c152c7f11ddfd97ffc62aba152fad3f upstream.
Remove the EDO mode support from as the FMC2 controller does not support the feature.
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Cc: stable@vger.kernel.org #v5.4+ Reviewed-by: Tudor Ambarus tudor.ambarus@linaro.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230328155819.225521-2-christophe.kerello... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/mtd/nand/raw/stm32_fmc2_nand.c +++ b/drivers/mtd/nand/raw/stm32_fmc2_nand.c @@ -1525,6 +1525,9 @@ static int stm32_fmc2_nfc_setup_interfac if (IS_ERR(sdrt)) return PTR_ERR(sdrt);
+ if (sdrt->tRC_min < 30000) + return -EOPNOTSUPP; + if (chipnr == NAND_DATA_IFACE_CHECK_ONLY) return 0;
From: Christophe Kerello christophe.kerello@foss.st.com
commit ddbb664b6ab8de7dffa388ae0c88cd18616494e5 upstream.
Use timings.mode value instead of checking tRC_min timing for EDO mode support.
Signed-off-by: Christophe Kerello christophe.kerello@foss.st.com Fixes: 2cd457f328c1 ("mtd: rawnand: stm32_fmc2: add STM32 FMC2 NAND flash controller driver") Cc: stable@vger.kernel.org #v5.10+ Reviewed-by: Tudor Ambarus tudor.ambarus@linaro.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20230328155819.225521-3-christophe.kerello... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/stm32_fmc2_nand.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/mtd/nand/raw/stm32_fmc2_nand.c +++ b/drivers/mtd/nand/raw/stm32_fmc2_nand.c @@ -1525,7 +1525,7 @@ static int stm32_fmc2_nfc_setup_interfac if (IS_ERR(sdrt)) return PTR_ERR(sdrt);
- if (sdrt->tRC_min < 30000) + if (conf->timings.mode > 3) return -EOPNOTSUPP;
if (chipnr == NAND_DATA_IFACE_CHECK_ONLY)
From: Chunyan Zhang chunyan.zhang@unisoc.com
[ Upstream commit 47d43086531f10539470a63e8ad92803e686a3dd ]
In sprd clock driver, regmap_config.max_register was set to a fixed value which is likely larger than the address range configured in device tree, when reading registers through debugfs it would cause access violation.
Fixes: d41f59fd92f2 ("clk: sprd: Add common infrastructure") Signed-off-by: Chunyan Zhang chunyan.zhang@unisoc.com Link: https://lore.kernel.org/r/20230316023624.758204-1-chunyan.zhang@unisoc.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/sprd/common.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/clk/sprd/common.c b/drivers/clk/sprd/common.c index ce81e4087a8fc..2bfbab8db94bf 100644 --- a/drivers/clk/sprd/common.c +++ b/drivers/clk/sprd/common.c @@ -17,7 +17,6 @@ static const struct regmap_config sprdclk_regmap_config = { .reg_bits = 32, .reg_stride = 4, .val_bits = 32, - .max_register = 0xffff, .fast_io = true, };
@@ -43,6 +42,8 @@ int sprd_clk_regmap_init(struct platform_device *pdev, struct device *dev = &pdev->dev; struct device_node *node = dev->of_node, *np; struct regmap *regmap; + struct resource *res; + struct regmap_config reg_config = sprdclk_regmap_config;
if (of_find_property(node, "sprd,syscon", NULL)) { regmap = syscon_regmap_lookup_by_phandle(node, "sprd,syscon"); @@ -59,12 +60,14 @@ int sprd_clk_regmap_init(struct platform_device *pdev, return PTR_ERR(regmap); } } else { - base = devm_platform_ioremap_resource(pdev, 0); + base = devm_platform_get_and_ioremap_resource(pdev, 0, &res); if (IS_ERR(base)) return PTR_ERR(base);
+ reg_config.max_register = resource_size(res) - reg_config.reg_stride; + regmap = devm_regmap_init_mmio(&pdev->dev, base, - &sprdclk_regmap_config); + ®_config); if (IS_ERR(regmap)) { pr_err("failed to init regmap\n"); return PTR_ERR(regmap);
From: Meir Lichtinger meirl@mellanox.com
[ Upstream commit f946e45f59ef01ff54ffb3b1eba3a8e7915e7326 ]
The IBTA specification has new speed - NDR. That speed supports signaling rate of 100Gb. mlx5 IB driver translates link modes reported by ConnectX device to IB speed and width. Added translation of new 100Gb, 200Gb and 400Gb link modes to NDR IB type and width of x1, x2 or x4 respectively.
Link: https://lore.kernel.org/r/20201026133738.1340432-3-leon@kernel.org Signed-off-by: Meir Lichtinger meirl@mellanox.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Stable-dep-of: 88c9483faf15 ("IB/mlx5: Add support for 400G_8X lane speed") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/main.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index eb69bec77e5d4..638da09ff8380 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -425,10 +425,22 @@ static int translate_eth_ext_proto_oper(u32 eth_proto_oper, u16 *active_speed, *active_width = IB_WIDTH_2X; *active_speed = IB_SPEED_HDR; break; + case MLX5E_PROT_MASK(MLX5E_100GAUI_1_100GBASE_CR_KR): + *active_width = IB_WIDTH_1X; + *active_speed = IB_SPEED_NDR; + break; case MLX5E_PROT_MASK(MLX5E_200GAUI_4_200GBASE_CR4_KR4): *active_width = IB_WIDTH_4X; *active_speed = IB_SPEED_HDR; break; + case MLX5E_PROT_MASK(MLX5E_200GAUI_2_200GBASE_CR2_KR2): + *active_width = IB_WIDTH_2X; + *active_speed = IB_SPEED_NDR; + break; + case MLX5E_PROT_MASK(MLX5E_400GAUI_4_400GBASE_CR4_KR4): + *active_width = IB_WIDTH_4X; + *active_speed = IB_SPEED_NDR; + break; default: return -EINVAL; }
From: Maher Sanalla msanalla@nvidia.com
[ Upstream commit 88c9483faf15ada14eca82714114656893063458 ]
Currently, when driver queries PTYS to report which link speed is being used on its RoCE ports, it does not check the case of having 400Gbps transmitted over 8 lanes. Thus it fails to report the said speed and instead it defaults to report 10G over 4 lanes.
Add a check for the said speed when querying PTYS and report it back correctly when needed.
Fixes: 08e8676f1607 ("IB/mlx5: Add support for 50Gbps per lane link modes") Signed-off-by: Maher Sanalla msanalla@nvidia.com Reviewed-by: Aya Levin ayal@nvidia.com Reviewed-by: Saeed Mahameed saeedm@nvidia.com Link: https://lore.kernel.org/r/ec9040548d119d22557d6a4b4070d6f421701fd4.167897399... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 638da09ff8380..5ef37902e96b5 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -437,6 +437,10 @@ static int translate_eth_ext_proto_oper(u32 eth_proto_oper, u16 *active_speed, *active_width = IB_WIDTH_2X; *active_speed = IB_SPEED_NDR; break; + case MLX5E_PROT_MASK(MLX5E_400GAUI_8): + *active_width = IB_WIDTH_8X; + *active_speed = IB_SPEED_HDR; + break; case MLX5E_PROT_MASK(MLX5E_400GAUI_4_400GBASE_CR4_KR4): *active_width = IB_WIDTH_4X; *active_speed = IB_SPEED_NDR;
From: Mark Zhang markzhang@nvidia.com
[ Upstream commit 58e84f6b3e84e46524b7e5a916b53c1ad798bc8f ]
As for multicast: - The SIDR is the only mode that makes sense; - Besides PS_UDP, other port spaces like PS_IB is also allowed, as it is UD compatible. In this case qkey also needs to be set [1].
This patch allows only UD qp_type to join multicast, and set qkey to default if it's not set, to fix an uninit-value error: the ib->rec.qkey field is accessed without being initialized.
===================================================== BUG: KMSAN: uninit-value in cma_set_qkey drivers/infiniband/core/cma.c:510 [inline] BUG: KMSAN: uninit-value in cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570 cma_set_qkey drivers/infiniband/core/cma.c:510 [inline] cma_make_mc_event+0xb73/0xe00 drivers/infiniband/core/cma.c:4570 cma_iboe_join_multicast drivers/infiniband/core/cma.c:4782 [inline] rdma_join_multicast+0x2b83/0x30a0 drivers/infiniband/core/cma.c:4814 ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479 ucma_join_multicast+0x1e3/0x250 drivers/infiniband/core/ucma.c:1546 ucma_write+0x639/0x6d0 drivers/infiniband/core/ucma.c:1732 vfs_write+0x8ce/0x2030 fs/read_write.c:588 ksys_write+0x28c/0x520 fs/read_write.c:643 __do_sys_write fs/read_write.c:655 [inline] __se_sys_write fs/read_write.c:652 [inline] __ia32_sys_write+0xdb/0x120 fs/read_write.c:652 do_syscall_32_irqs_on arch/x86/entry/common.c:114 [inline] __do_fast_syscall_32+0x96/0xf0 arch/x86/entry/common.c:180 do_fast_syscall_32+0x34/0x70 arch/x86/entry/common.c:205 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:248 entry_SYSENTER_compat_after_hwframe+0x4d/0x5c
Local variable ib.i created at: cma_iboe_join_multicast drivers/infiniband/core/cma.c:4737 [inline] rdma_join_multicast+0x586/0x30a0 drivers/infiniband/core/cma.c:4814 ucma_process_join+0xa76/0xf60 drivers/infiniband/core/ucma.c:1479
CPU: 0 PID: 29874 Comm: syz-executor.3 Not tainted 5.16.0-rc3-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 =====================================================
[1] https://lore.kernel.org/linux-rdma/20220117183832.GD84788@nvidia.com/
Fixes: b5de0c60cc30 ("RDMA/cma: Fix use after free race in roce multicast join") Reported-by: syzbot+8fcbb77276d43cc8b693@syzkaller.appspotmail.com Signed-off-by: Mark Zhang markzhang@nvidia.com Link: https://lore.kernel.org/r/58a4a98323b5e6b1282e83f6b76960d06e43b9fa.167930990... Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/cma.c | 60 ++++++++++++++++++++--------------- 1 file changed, 34 insertions(+), 26 deletions(-)
diff --git a/drivers/infiniband/core/cma.c b/drivers/infiniband/core/cma.c index 9ed5de38e372f..fdcad8d6a5a07 100644 --- a/drivers/infiniband/core/cma.c +++ b/drivers/infiniband/core/cma.c @@ -505,22 +505,11 @@ static inline unsigned short cma_family(struct rdma_id_private *id_priv) return id_priv->id.route.addr.src_addr.ss_family; }
-static int cma_set_qkey(struct rdma_id_private *id_priv, u32 qkey) +static int cma_set_default_qkey(struct rdma_id_private *id_priv) { struct ib_sa_mcmember_rec rec; int ret = 0;
- if (id_priv->qkey) { - if (qkey && id_priv->qkey != qkey) - return -EINVAL; - return 0; - } - - if (qkey) { - id_priv->qkey = qkey; - return 0; - } - switch (id_priv->id.ps) { case RDMA_PS_UDP: case RDMA_PS_IB: @@ -540,6 +529,16 @@ static int cma_set_qkey(struct rdma_id_private *id_priv, u32 qkey) return ret; }
+static int cma_set_qkey(struct rdma_id_private *id_priv, u32 qkey) +{ + if (!qkey || + (id_priv->qkey && (id_priv->qkey != qkey))) + return -EINVAL; + + id_priv->qkey = qkey; + return 0; +} + static void cma_translate_ib(struct sockaddr_ib *sib, struct rdma_dev_addr *dev_addr) { dev_addr->dev_type = ARPHRD_INFINIBAND; @@ -1107,7 +1106,7 @@ static int cma_ib_init_qp_attr(struct rdma_id_private *id_priv, *qp_attr_mask = IB_QP_STATE | IB_QP_PKEY_INDEX | IB_QP_PORT;
if (id_priv->id.qp_type == IB_QPT_UD) { - ret = cma_set_qkey(id_priv, 0); + ret = cma_set_default_qkey(id_priv); if (ret) return ret;
@@ -4312,7 +4311,10 @@ static int cma_send_sidr_rep(struct rdma_id_private *id_priv, memset(&rep, 0, sizeof rep); rep.status = status; if (status == IB_SIDR_SUCCESS) { - ret = cma_set_qkey(id_priv, qkey); + if (qkey) + ret = cma_set_qkey(id_priv, qkey); + else + ret = cma_set_default_qkey(id_priv); if (ret) return ret; rep.qp_num = id_priv->qp_num; @@ -4516,9 +4518,7 @@ static void cma_make_mc_event(int status, struct rdma_id_private *id_priv, enum ib_gid_type gid_type; struct net_device *ndev;
- if (!status) - status = cma_set_qkey(id_priv, be32_to_cpu(multicast->rec.qkey)); - else + if (status) pr_debug_ratelimited("RDMA CM: MULTICAST_ERROR: failed to join multicast. status %d\n", status);
@@ -4546,7 +4546,7 @@ static void cma_make_mc_event(int status, struct rdma_id_private *id_priv, }
event->param.ud.qp_num = 0xFFFFFF; - event->param.ud.qkey = be32_to_cpu(multicast->rec.qkey); + event->param.ud.qkey = id_priv->qkey;
out: if (ndev) @@ -4565,8 +4565,11 @@ static int cma_ib_mc_handler(int status, struct ib_sa_multicast *multicast) READ_ONCE(id_priv->state) == RDMA_CM_DESTROYING) goto out;
- cma_make_mc_event(status, id_priv, multicast, &event, mc); - ret = cma_cm_event_handler(id_priv, &event); + ret = cma_set_qkey(id_priv, be32_to_cpu(multicast->rec.qkey)); + if (!ret) { + cma_make_mc_event(status, id_priv, multicast, &event, mc); + ret = cma_cm_event_handler(id_priv, &event); + } rdma_destroy_ah_attr(&event.param.ud.ah_attr); WARN_ON(ret);
@@ -4619,9 +4622,11 @@ static int cma_join_ib_multicast(struct rdma_id_private *id_priv, if (ret) return ret;
- ret = cma_set_qkey(id_priv, 0); - if (ret) - return ret; + if (!id_priv->qkey) { + ret = cma_set_default_qkey(id_priv); + if (ret) + return ret; + }
cma_set_mgid(id_priv, (struct sockaddr *) &mc->addr, &rec.mgid); rec.qkey = cpu_to_be32(id_priv->qkey); @@ -4709,9 +4714,6 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv, cma_iboe_set_mgid(addr, &ib.rec.mgid, gid_type);
ib.rec.pkey = cpu_to_be16(0xffff); - if (id_priv->id.ps == RDMA_PS_UDP) - ib.rec.qkey = cpu_to_be32(RDMA_UDP_QKEY); - if (dev_addr->bound_dev_if) ndev = dev_get_by_index(dev_addr->net, dev_addr->bound_dev_if); if (!ndev) @@ -4737,6 +4739,9 @@ static int cma_iboe_join_multicast(struct rdma_id_private *id_priv, if (err || !ib.rec.mtu) return err ?: -EINVAL;
+ if (!id_priv->qkey) + cma_set_default_qkey(id_priv); + rdma_ip2gid((struct sockaddr *)&id_priv->id.route.addr.src_addr, &ib.rec.port_gid); INIT_WORK(&mc->iboe_join.work, cma_iboe_join_work_handler); @@ -4762,6 +4767,9 @@ int rdma_join_multicast(struct rdma_cm_id *id, struct sockaddr *addr, READ_ONCE(id_priv->state) != RDMA_CM_ADDR_RESOLVED)) return -EINVAL;
+ if (id_priv->id.qp_type != IB_QPT_UD) + return -EINVAL; + mc = kzalloc(sizeof(*mc), GFP_KERNEL); if (!mc) return -ENOMEM;
From: Zheng Wang zyytlz.wz@163.com
[ Upstream commit ea4f1009408efb4989a0f139b70fb338e7f687d0 ]
In xen_9pfs_front_probe, it calls xen_9pfs_front_alloc_dataring to init priv->rings and bound &ring->work with p9_xen_response.
When it calls xen_9pfs_front_event_handler to handle IRQ requests, it will finally call schedule_work to start the work.
When we call xen_9pfs_front_remove to remove the driver, there may be a sequence as follows:
Fix it by finishing the work before cleanup in xen_9pfs_front_free.
Note that, this bug is found by static analysis, which might be false positive.
CPU0 CPU1
|p9_xen_response xen_9pfs_front_remove| xen_9pfs_front_free| kfree(priv) | //free priv | |p9_tag_lookup |//use priv->client
Fixes: 71ebd71921e4 ("xen/9pfs: connect to the backend") Signed-off-by: Zheng Wang zyytlz.wz@163.com Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Signed-off-by: Eric Van Hensbergen ericvh@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/9p/trans_xen.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/9p/trans_xen.c b/net/9p/trans_xen.c index 220e8f4ac0cfe..da056170849bf 100644 --- a/net/9p/trans_xen.c +++ b/net/9p/trans_xen.c @@ -300,6 +300,10 @@ static void xen_9pfs_front_free(struct xen_9pfs_front_priv *priv) write_unlock(&xen_9pfs_lock);
for (i = 0; i < priv->num_rings; i++) { + struct xen_9pfs_dataring *ring = &priv->rings[i]; + + cancel_work_sync(&ring->work); + if (!priv->rings[i].intf) break; if (priv->rings[i].irq > 0)
From: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
[ Upstream commit 8ce07be703456acb00e83d99f3b8036252c33b02 ]
Smatch reports: drivers/net/ethernet/sun/niu.c:4525 niu_alloc_channels() warn: missing unwind goto?
If niu_rbr_fill() fails, then we are directly returning 'err' without freeing the channels.
Fix this by changing direct return to a goto 'out_err'.
Fixes: a3138df9f20e ("[NIU]: Add Sun Neptune ethernet driver.") Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Reviewed-by: Simon Horman simon.horman@corigine.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/sun/niu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/sun/niu.c b/drivers/net/ethernet/sun/niu.c index 860644d182ab0..1a269fa8c1a07 100644 --- a/drivers/net/ethernet/sun/niu.c +++ b/drivers/net/ethernet/sun/niu.c @@ -4503,7 +4503,7 @@ static int niu_alloc_channels(struct niu *np)
err = niu_rbr_fill(np, rp, GFP_KERNEL); if (err) - return err; + goto out_err; }
tx_rings = kcalloc(num_tx_rings, sizeof(struct tx_ring_info),
From: Eric Dumazet edumazet@google.com
[ Upstream commit cb9444130662c6c13022579c861098f212db2562 ]
Networking has many sysctls that could fit in one u8.
This patch adds proc_dou8vec_minmax() for this purpose.
Note that the .extra1 and .extra2 fields are pointing to integers, because it makes conversions easier.
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: dc5110c2d959 ("tcp: restrict net.ipv4.tcp_app_win") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/proc_sysctl.c | 6 ++++ include/linux/sysctl.h | 2 ++ kernel/sysctl.c | 65 ++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 73 insertions(+)
diff --git a/fs/proc/proc_sysctl.c b/fs/proc/proc_sysctl.c index cd7c6c4af83ad..1655b7b2a5abe 100644 --- a/fs/proc/proc_sysctl.c +++ b/fs/proc/proc_sysctl.c @@ -1106,6 +1106,11 @@ static int sysctl_check_table_array(const char *path, struct ctl_table *table) err |= sysctl_err(path, table, "array not allowed"); }
+ if (table->proc_handler == proc_dou8vec_minmax) { + if (table->maxlen != sizeof(u8)) + err |= sysctl_err(path, table, "array not allowed"); + } + return err; }
@@ -1121,6 +1126,7 @@ static int sysctl_check_table(const char *path, struct ctl_table *table) (table->proc_handler == proc_douintvec) || (table->proc_handler == proc_douintvec_minmax) || (table->proc_handler == proc_dointvec_minmax) || + (table->proc_handler == proc_dou8vec_minmax) || (table->proc_handler == proc_dointvec_jiffies) || (table->proc_handler == proc_dointvec_userhz_jiffies) || (table->proc_handler == proc_dointvec_ms_jiffies) || diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h index 161eba9fd9122..4393de94cb32d 100644 --- a/include/linux/sysctl.h +++ b/include/linux/sysctl.h @@ -53,6 +53,8 @@ int proc_douintvec(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_dointvec_minmax(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_douintvec_minmax(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos); +int proc_dou8vec_minmax(struct ctl_table *table, int write, void *buffer, + size_t *lenp, loff_t *ppos); int proc_dointvec_jiffies(struct ctl_table *, int, void *, size_t *, loff_t *); int proc_dointvec_userhz_jiffies(struct ctl_table *, int, void *, size_t *, loff_t *); diff --git a/kernel/sysctl.c b/kernel/sysctl.c index d8b7b28463135..43e907f4cac79 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1061,6 +1061,65 @@ int proc_douintvec_minmax(struct ctl_table *table, int write, do_proc_douintvec_minmax_conv, ¶m); }
+/** + * proc_dou8vec_minmax - read a vector of unsigned chars with min/max values + * @table: the sysctl table + * @write: %TRUE if this is a write to the sysctl file + * @buffer: the user buffer + * @lenp: the size of the user buffer + * @ppos: file position + * + * Reads/writes up to table->maxlen/sizeof(u8) unsigned chars + * values from/to the user buffer, treated as an ASCII string. Negative + * strings are not allowed. + * + * This routine will ensure the values are within the range specified by + * table->extra1 (min) and table->extra2 (max). + * + * Returns 0 on success or an error on write when the range check fails. + */ +int proc_dou8vec_minmax(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + struct ctl_table tmp; + unsigned int min = 0, max = 255U, val; + u8 *data = table->data; + struct do_proc_douintvec_minmax_conv_param param = { + .min = &min, + .max = &max, + }; + int res; + + /* Do not support arrays yet. */ + if (table->maxlen != sizeof(u8)) + return -EINVAL; + + if (table->extra1) { + min = *(unsigned int *) table->extra1; + if (min > 255U) + return -EINVAL; + } + if (table->extra2) { + max = *(unsigned int *) table->extra2; + if (max > 255U) + return -EINVAL; + } + + tmp = *table; + + tmp.maxlen = sizeof(val); + tmp.data = &val; + val = *data; + res = do_proc_douintvec(&tmp, write, buffer, lenp, ppos, + do_proc_douintvec_minmax_conv, ¶m); + if (res) + return res; + if (write) + *data = val; + return 0; +} +EXPORT_SYMBOL_GPL(proc_dou8vec_minmax); + static int do_proc_dopipe_max_size_conv(unsigned long *lvalp, unsigned int *valp, int write, void *data) @@ -1612,6 +1671,12 @@ int proc_douintvec_minmax(struct ctl_table *table, int write, return -ENOSYS; }
+int proc_dou8vec_minmax(struct ctl_table *table, int write, + void *buffer, size_t *lenp, loff_t *ppos) +{ + return -ENOSYS; +} + int proc_dointvec_jiffies(struct ctl_table *table, int write, void *buffer, size_t *lenp, loff_t *ppos) {
From: Eric Dumazet edumazet@google.com
[ Upstream commit 4b6bbf17d4e1939afa72821879fc033d725e9491 ]
These sysctls that can fit in one byte instead of one int are converted to save space and thus reduce cache line misses.
- icmp_echo_ignore_all, icmp_echo_ignore_broadcasts, - icmp_ignore_bogus_error_responses, icmp_errors_use_inbound_ifaddr - tcp_ecn, tcp_ecn_fallback - ip_default_ttl, ip_no_pmtu_disc, ip_fwd_use_pmtu - ip_nonlocal_bind, ip_autobind_reuse - ip_dynaddr, ip_early_demux, raw_l3mdev_accept - nexthop_compat_mode, fwmark_reflect
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: dc5110c2d959 ("tcp: restrict net.ipv4.tcp_app_win") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netns/ipv4.h | 32 +++++++++---------- net/ipv4/sysctl_net_ipv4.c | 64 +++++++++++++++++++------------------- 2 files changed, 48 insertions(+), 48 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 75484f425e558..92e3d8fe954ab 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -84,36 +84,36 @@ struct netns_ipv4 { struct xt_table *nat_table; #endif
- int sysctl_icmp_echo_ignore_all; - int sysctl_icmp_echo_ignore_broadcasts; - int sysctl_icmp_ignore_bogus_error_responses; + u8 sysctl_icmp_echo_ignore_all; + u8 sysctl_icmp_echo_ignore_broadcasts; + u8 sysctl_icmp_ignore_bogus_error_responses; + u8 sysctl_icmp_errors_use_inbound_ifaddr; int sysctl_icmp_ratelimit; int sysctl_icmp_ratemask; - int sysctl_icmp_errors_use_inbound_ifaddr;
struct local_ports ip_local_ports;
- int sysctl_tcp_ecn; - int sysctl_tcp_ecn_fallback; + u8 sysctl_tcp_ecn; + u8 sysctl_tcp_ecn_fallback;
- int sysctl_ip_default_ttl; - int sysctl_ip_no_pmtu_disc; - int sysctl_ip_fwd_use_pmtu; + u8 sysctl_ip_default_ttl; + u8 sysctl_ip_no_pmtu_disc; + u8 sysctl_ip_fwd_use_pmtu; int sysctl_ip_fwd_update_priority; - int sysctl_ip_nonlocal_bind; - int sysctl_ip_autobind_reuse; + u8 sysctl_ip_nonlocal_bind; + u8 sysctl_ip_autobind_reuse; /* Shall we try to damage output packets if routing dev changes? */ - int sysctl_ip_dynaddr; - int sysctl_ip_early_demux; + u8 sysctl_ip_dynaddr; + u8 sysctl_ip_early_demux; #ifdef CONFIG_NET_L3_MASTER_DEV - int sysctl_raw_l3mdev_accept; + u8 sysctl_raw_l3mdev_accept; #endif int sysctl_tcp_early_demux; int sysctl_udp_early_demux;
- int sysctl_nexthop_compat_mode; + u8 sysctl_nexthop_compat_mode;
- int sysctl_fwmark_reflect; + u8 sysctl_fwmark_reflect; int sysctl_tcp_fwmark_accept; #ifdef CONFIG_NET_L3_MASTER_DEV int sysctl_tcp_l3mdev_accept; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 439970e02ac65..cb587bdd683a6 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -540,30 +540,30 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "icmp_echo_ignore_all", .data = &init_net.ipv4.sysctl_icmp_echo_ignore_all, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "icmp_echo_ignore_broadcasts", .data = &init_net.ipv4.sysctl_icmp_echo_ignore_broadcasts, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "icmp_ignore_bogus_error_responses", .data = &init_net.ipv4.sysctl_icmp_ignore_bogus_error_responses, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "icmp_errors_use_inbound_ifaddr", .data = &init_net.ipv4.sysctl_icmp_errors_use_inbound_ifaddr, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "icmp_ratelimit", @@ -590,9 +590,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "raw_l3mdev_accept", .data = &init_net.ipv4.sysctl_raw_l3mdev_accept, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, @@ -600,30 +600,30 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_ecn", .data = &init_net.ipv4.sysctl_tcp_ecn, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_ecn_fallback", .data = &init_net.ipv4.sysctl_tcp_ecn_fallback, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "ip_dynaddr", .data = &init_net.ipv4.sysctl_ip_dynaddr, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "ip_early_demux", .data = &init_net.ipv4.sysctl_ip_early_demux, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "udp_early_demux", @@ -642,18 +642,18 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "nexthop_compat_mode", .data = &init_net.ipv4.sysctl_nexthop_compat_mode, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "ip_default_ttl", .data = &init_net.ipv4.sysctl_ip_default_ttl, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = &ip_ttl_min, .extra2 = &ip_ttl_max, }, @@ -674,16 +674,16 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "ip_no_pmtu_disc", .data = &init_net.ipv4.sysctl_ip_no_pmtu_disc, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "ip_forward_use_pmtu", .data = &init_net.ipv4.sysctl_ip_fwd_use_pmtu, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "ip_forward_update_priority", @@ -697,25 +697,25 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "ip_nonlocal_bind", .data = &init_net.ipv4.sysctl_ip_nonlocal_bind, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "ip_autobind_reuse", .data = &init_net.ipv4.sysctl_ip_autobind_reuse, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "fwmark_reflect", .data = &init_net.ipv4.sysctl_fwmark_reflect, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_fwmark_accept",
From: Eric Dumazet edumazet@google.com
[ Upstream commit 4ecc1baf362c5df2dcabe242511e38ee28486545 ]
Many tcp sysctls are either bools or small ints that can fit into u8.
Reducing space taken by sysctls can save few cache line misses when sending/receiving data while cpu caches are empty, for example after cpu idle period.
This is hard to measure with typical network performance tests, but after this patch, struct netns_ipv4 has shrunk by three cache lines.
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: dc5110c2d959 ("tcp: restrict net.ipv4.tcp_app_win") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netns/ipv4.h | 68 +++++++++---------- net/ipv4/sysctl_net_ipv4.c | 136 ++++++++++++++++++------------------- 2 files changed, 102 insertions(+), 102 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 92e3d8fe954ab..d8b320cf54ba0 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -114,11 +114,11 @@ struct netns_ipv4 { u8 sysctl_nexthop_compat_mode;
u8 sysctl_fwmark_reflect; - int sysctl_tcp_fwmark_accept; + u8 sysctl_tcp_fwmark_accept; #ifdef CONFIG_NET_L3_MASTER_DEV - int sysctl_tcp_l3mdev_accept; + u8 sysctl_tcp_l3mdev_accept; #endif - int sysctl_tcp_mtu_probing; + u8 sysctl_tcp_mtu_probing; int sysctl_tcp_mtu_probe_floor; int sysctl_tcp_base_mss; int sysctl_tcp_min_snd_mss; @@ -126,46 +126,47 @@ struct netns_ipv4 { u32 sysctl_tcp_probe_interval;
int sysctl_tcp_keepalive_time; - int sysctl_tcp_keepalive_probes; int sysctl_tcp_keepalive_intvl; + u8 sysctl_tcp_keepalive_probes;
- int sysctl_tcp_syn_retries; - int sysctl_tcp_synack_retries; - int sysctl_tcp_syncookies; + u8 sysctl_tcp_syn_retries; + u8 sysctl_tcp_synack_retries; + u8 sysctl_tcp_syncookies; int sysctl_tcp_reordering; - int sysctl_tcp_retries1; - int sysctl_tcp_retries2; - int sysctl_tcp_orphan_retries; + u8 sysctl_tcp_retries1; + u8 sysctl_tcp_retries2; + u8 sysctl_tcp_orphan_retries; + u8 sysctl_tcp_tw_reuse; int sysctl_tcp_fin_timeout; unsigned int sysctl_tcp_notsent_lowat; - int sysctl_tcp_tw_reuse; - int sysctl_tcp_sack; - int sysctl_tcp_window_scaling; - int sysctl_tcp_timestamps; - int sysctl_tcp_early_retrans; - int sysctl_tcp_recovery; - int sysctl_tcp_thin_linear_timeouts; - int sysctl_tcp_slow_start_after_idle; - int sysctl_tcp_retrans_collapse; - int sysctl_tcp_stdurg; - int sysctl_tcp_rfc1337; - int sysctl_tcp_abort_on_overflow; - int sysctl_tcp_fack; + u8 sysctl_tcp_sack; + u8 sysctl_tcp_window_scaling; + u8 sysctl_tcp_timestamps; + u8 sysctl_tcp_early_retrans; + u8 sysctl_tcp_recovery; + u8 sysctl_tcp_thin_linear_timeouts; + u8 sysctl_tcp_slow_start_after_idle; + u8 sysctl_tcp_retrans_collapse; + u8 sysctl_tcp_stdurg; + u8 sysctl_tcp_rfc1337; + u8 sysctl_tcp_abort_on_overflow; + u8 sysctl_tcp_fack; /* obsolete */ int sysctl_tcp_max_reordering; - int sysctl_tcp_dsack; - int sysctl_tcp_app_win; int sysctl_tcp_adv_win_scale; - int sysctl_tcp_frto; - int sysctl_tcp_nometrics_save; - int sysctl_tcp_no_ssthresh_metrics_save; - int sysctl_tcp_moderate_rcvbuf; - int sysctl_tcp_tso_win_divisor; - int sysctl_tcp_workaround_signed_windows; + u8 sysctl_tcp_dsack; + u8 sysctl_tcp_app_win; + u8 sysctl_tcp_frto; + u8 sysctl_tcp_nometrics_save; + u8 sysctl_tcp_no_ssthresh_metrics_save; + u8 sysctl_tcp_moderate_rcvbuf; + u8 sysctl_tcp_tso_win_divisor; + u8 sysctl_tcp_workaround_signed_windows; int sysctl_tcp_limit_output_bytes; int sysctl_tcp_challenge_ack_limit; - int sysctl_tcp_min_tso_segs; int sysctl_tcp_min_rtt_wlen; - int sysctl_tcp_autocorking; + u8 sysctl_tcp_min_tso_segs; + u8 sysctl_tcp_autocorking; + u8 sysctl_tcp_reflect_tos; int sysctl_tcp_invalid_ratelimit; int sysctl_tcp_pacing_ss_ratio; int sysctl_tcp_pacing_ca_ratio; @@ -183,7 +184,6 @@ struct netns_ipv4 { unsigned int sysctl_tcp_fastopen_blackhole_timeout; atomic_t tfo_active_disable_times; unsigned long tfo_active_disable_stamp; - int sysctl_tcp_reflect_tos;
int sysctl_udp_wmem_min; int sysctl_udp_rmem_min; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index cb587bdd683a6..1a2506f795d4e 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -720,17 +720,17 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_fwmark_accept", .data = &init_net.ipv4.sysctl_tcp_fwmark_accept, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, #ifdef CONFIG_NET_L3_MASTER_DEV { .procname = "tcp_l3mdev_accept", .data = &init_net.ipv4.sysctl_tcp_l3mdev_accept, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, @@ -738,9 +738,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_mtu_probing", .data = &init_net.ipv4.sysctl_tcp_mtu_probing, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_base_mss", @@ -842,9 +842,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_keepalive_probes", .data = &init_net.ipv4.sysctl_tcp_keepalive_probes, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_keepalive_intvl", @@ -856,26 +856,26 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_syn_retries", .data = &init_net.ipv4.sysctl_tcp_syn_retries, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = &tcp_syn_retries_min, .extra2 = &tcp_syn_retries_max }, { .procname = "tcp_synack_retries", .data = &init_net.ipv4.sysctl_tcp_synack_retries, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, #ifdef CONFIG_SYN_COOKIES { .procname = "tcp_syncookies", .data = &init_net.ipv4.sysctl_tcp_syncookies, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, #endif { @@ -888,24 +888,24 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_retries1", .data = &init_net.ipv4.sysctl_tcp_retries1, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra2 = &tcp_retr1_max }, { .procname = "tcp_retries2", .data = &init_net.ipv4.sysctl_tcp_retries2, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_orphan_retries", .data = &init_net.ipv4.sysctl_tcp_orphan_retries, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_fin_timeout", @@ -924,9 +924,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_tw_reuse", .data = &init_net.ipv4.sysctl_tcp_tw_reuse, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = &two, }, @@ -1012,88 +1012,88 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_sack", .data = &init_net.ipv4.sysctl_tcp_sack, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_window_scaling", .data = &init_net.ipv4.sysctl_tcp_window_scaling, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_timestamps", .data = &init_net.ipv4.sysctl_tcp_timestamps, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_early_retrans", .data = &init_net.ipv4.sysctl_tcp_early_retrans, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = &four, }, { .procname = "tcp_recovery", .data = &init_net.ipv4.sysctl_tcp_recovery, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_thin_linear_timeouts", .data = &init_net.ipv4.sysctl_tcp_thin_linear_timeouts, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_slow_start_after_idle", .data = &init_net.ipv4.sysctl_tcp_slow_start_after_idle, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_retrans_collapse", .data = &init_net.ipv4.sysctl_tcp_retrans_collapse, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_stdurg", .data = &init_net.ipv4.sysctl_tcp_stdurg, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_rfc1337", .data = &init_net.ipv4.sysctl_tcp_rfc1337, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_abort_on_overflow", .data = &init_net.ipv4.sysctl_tcp_abort_on_overflow, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_fack", .data = &init_net.ipv4.sysctl_tcp_fack, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_max_reordering", @@ -1105,16 +1105,16 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_dsack", .data = &init_net.ipv4.sysctl_tcp_dsack, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_app_win", .data = &init_net.ipv4.sysctl_tcp_app_win, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_adv_win_scale", @@ -1128,46 +1128,46 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_frto", .data = &init_net.ipv4.sysctl_tcp_frto, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_no_metrics_save", .data = &init_net.ipv4.sysctl_tcp_nometrics_save, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_no_ssthresh_metrics_save", .data = &init_net.ipv4.sysctl_tcp_no_ssthresh_metrics_save, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "tcp_moderate_rcvbuf", .data = &init_net.ipv4.sysctl_tcp_moderate_rcvbuf, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_tso_win_divisor", .data = &init_net.ipv4.sysctl_tcp_tso_win_divisor, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec, + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_workaround_signed_windows", .data = &init_net.ipv4.sysctl_tcp_workaround_signed_windows, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec + .proc_handler = proc_dou8vec_minmax, }, { .procname = "tcp_limit_output_bytes", @@ -1186,9 +1186,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_min_tso_segs", .data = &init_net.ipv4.sysctl_tcp_min_tso_segs, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ONE, .extra2 = &gso_max_segs, }, @@ -1204,9 +1204,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_autocorking", .data = &init_net.ipv4.sysctl_tcp_autocorking, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, @@ -1277,9 +1277,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_reflect_tos", .data = &init_net.ipv4.sysctl_tcp_reflect_tos, - .maxlen = sizeof(int), + .maxlen = sizeof(u8), .mode = 0644, - .proc_handler = proc_dointvec_minmax, + .proc_handler = proc_dou8vec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, },
Greg,
On Tue, Apr 18, 2023 at 02:21:35PM +0200, Greg Kroah-Hartman wrote:
From: Eric Dumazet edumazet@google.com
[ Upstream commit 4ecc1baf362c5df2dcabe242511e38ee28486545 ]
Many tcp sysctls are either bools or small ints that can fit into u8.
Reducing space taken by sysctls can save few cache line misses when sending/receiving data while cpu caches are empty, for example after cpu idle period.
This is hard to measure with typical network performance tests, but after this patch, struct netns_ipv4 has shrunk by three cache lines.
This commit in 5.10 needs fix from d24f511b04b8 ("tcp: fix tcp_min_tso_segs sysctl") which have incorrectly set commit id in Fixes tag. (Perhaps, that's why it's missed.)
Fixes: 47996b489bdc ("tcp: convert elligible sysctls to u8")
Thanks,
Signed-off-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: dc5110c2d959 ("tcp: restrict net.ipv4.tcp_app_win") Signed-off-by: Sasha Levin sashal@kernel.org
include/net/netns/ipv4.h | 68 +++++++++---------- net/ipv4/sysctl_net_ipv4.c | 136 ++++++++++++++++++------------------- 2 files changed, 102 insertions(+), 102 deletions(-)
diff --git a/include/net/netns/ipv4.h b/include/net/netns/ipv4.h index 92e3d8fe954ab..d8b320cf54ba0 100644 --- a/include/net/netns/ipv4.h +++ b/include/net/netns/ipv4.h @@ -114,11 +114,11 @@ struct netns_ipv4 { u8 sysctl_nexthop_compat_mode; u8 sysctl_fwmark_reflect;
- int sysctl_tcp_fwmark_accept;
- u8 sysctl_tcp_fwmark_accept;
#ifdef CONFIG_NET_L3_MASTER_DEV
- int sysctl_tcp_l3mdev_accept;
- u8 sysctl_tcp_l3mdev_accept;
#endif
- int sysctl_tcp_mtu_probing;
- u8 sysctl_tcp_mtu_probing; int sysctl_tcp_mtu_probe_floor; int sysctl_tcp_base_mss; int sysctl_tcp_min_snd_mss;
@@ -126,46 +126,47 @@ struct netns_ipv4 { u32 sysctl_tcp_probe_interval; int sysctl_tcp_keepalive_time;
- int sysctl_tcp_keepalive_probes; int sysctl_tcp_keepalive_intvl;
- u8 sysctl_tcp_keepalive_probes;
- int sysctl_tcp_syn_retries;
- int sysctl_tcp_synack_retries;
- int sysctl_tcp_syncookies;
- u8 sysctl_tcp_syn_retries;
- u8 sysctl_tcp_synack_retries;
- u8 sysctl_tcp_syncookies; int sysctl_tcp_reordering;
- int sysctl_tcp_retries1;
- int sysctl_tcp_retries2;
- int sysctl_tcp_orphan_retries;
- u8 sysctl_tcp_retries1;
- u8 sysctl_tcp_retries2;
- u8 sysctl_tcp_orphan_retries;
- u8 sysctl_tcp_tw_reuse; int sysctl_tcp_fin_timeout; unsigned int sysctl_tcp_notsent_lowat;
- int sysctl_tcp_tw_reuse;
- int sysctl_tcp_sack;
- int sysctl_tcp_window_scaling;
- int sysctl_tcp_timestamps;
- int sysctl_tcp_early_retrans;
- int sysctl_tcp_recovery;
- int sysctl_tcp_thin_linear_timeouts;
- int sysctl_tcp_slow_start_after_idle;
- int sysctl_tcp_retrans_collapse;
- int sysctl_tcp_stdurg;
- int sysctl_tcp_rfc1337;
- int sysctl_tcp_abort_on_overflow;
- int sysctl_tcp_fack;
- u8 sysctl_tcp_sack;
- u8 sysctl_tcp_window_scaling;
- u8 sysctl_tcp_timestamps;
- u8 sysctl_tcp_early_retrans;
- u8 sysctl_tcp_recovery;
- u8 sysctl_tcp_thin_linear_timeouts;
- u8 sysctl_tcp_slow_start_after_idle;
- u8 sysctl_tcp_retrans_collapse;
- u8 sysctl_tcp_stdurg;
- u8 sysctl_tcp_rfc1337;
- u8 sysctl_tcp_abort_on_overflow;
- u8 sysctl_tcp_fack; /* obsolete */ int sysctl_tcp_max_reordering;
- int sysctl_tcp_dsack;
- int sysctl_tcp_app_win; int sysctl_tcp_adv_win_scale;
- int sysctl_tcp_frto;
- int sysctl_tcp_nometrics_save;
- int sysctl_tcp_no_ssthresh_metrics_save;
- int sysctl_tcp_moderate_rcvbuf;
- int sysctl_tcp_tso_win_divisor;
- int sysctl_tcp_workaround_signed_windows;
- u8 sysctl_tcp_dsack;
- u8 sysctl_tcp_app_win;
- u8 sysctl_tcp_frto;
- u8 sysctl_tcp_nometrics_save;
- u8 sysctl_tcp_no_ssthresh_metrics_save;
- u8 sysctl_tcp_moderate_rcvbuf;
- u8 sysctl_tcp_tso_win_divisor;
- u8 sysctl_tcp_workaround_signed_windows; int sysctl_tcp_limit_output_bytes; int sysctl_tcp_challenge_ack_limit;
- int sysctl_tcp_min_tso_segs; int sysctl_tcp_min_rtt_wlen;
- int sysctl_tcp_autocorking;
- u8 sysctl_tcp_min_tso_segs;
- u8 sysctl_tcp_autocorking;
- u8 sysctl_tcp_reflect_tos; int sysctl_tcp_invalid_ratelimit; int sysctl_tcp_pacing_ss_ratio; int sysctl_tcp_pacing_ca_ratio;
@@ -183,7 +184,6 @@ struct netns_ipv4 { unsigned int sysctl_tcp_fastopen_blackhole_timeout; atomic_t tfo_active_disable_times; unsigned long tfo_active_disable_stamp;
- int sysctl_tcp_reflect_tos;
int sysctl_udp_wmem_min; int sysctl_udp_rmem_min; diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index cb587bdd683a6..1a2506f795d4e 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -720,17 +720,17 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_fwmark_accept", .data = &init_net.ipv4.sysctl_tcp_fwmark_accept,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec,
},.proc_handler = proc_dou8vec_minmax,
#ifdef CONFIG_NET_L3_MASTER_DEV { .procname = "tcp_l3mdev_accept", .data = &init_net.ipv4.sysctl_tcp_l3mdev_accept,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, },.proc_handler = proc_dou8vec_minmax,
@@ -738,9 +738,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_mtu_probing", .data = &init_net.ipv4.sysctl_tcp_mtu_probing,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec,
}, { .procname = "tcp_base_mss",.proc_handler = proc_dou8vec_minmax,
@@ -842,9 +842,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_keepalive_probes", .data = &init_net.ipv4.sysctl_tcp_keepalive_probes,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_keepalive_intvl",.proc_handler = proc_dou8vec_minmax,
@@ -856,26 +856,26 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_syn_retries", .data = &init_net.ipv4.sysctl_tcp_syn_retries,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = &tcp_syn_retries_min, .extra2 = &tcp_syn_retries_max }, { .procname = "tcp_synack_retries", .data = &init_net.ipv4.sysctl_tcp_synack_retries,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
},.proc_handler = proc_dou8vec_minmax,
#ifdef CONFIG_SYN_COOKIES { .procname = "tcp_syncookies", .data = &init_net.ipv4.sysctl_tcp_syncookies,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
},.proc_handler = proc_dou8vec_minmax,
#endif { @@ -888,24 +888,24 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_retries1", .data = &init_net.ipv4.sysctl_tcp_retries1,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra2 = &tcp_retr1_max }, { .procname = "tcp_retries2", .data = &init_net.ipv4.sysctl_tcp_retries2,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_orphan_retries", .data = &init_net.ipv4.sysctl_tcp_orphan_retries,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_fin_timeout",.proc_handler = proc_dou8vec_minmax,
@@ -924,9 +924,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_tw_reuse", .data = &init_net.ipv4.sysctl_tcp_tw_reuse,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO, .extra2 = &two, },.proc_handler = proc_dou8vec_minmax,
@@ -1012,88 +1012,88 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_sack", .data = &init_net.ipv4.sysctl_tcp_sack,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_window_scaling", .data = &init_net.ipv4.sysctl_tcp_window_scaling,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_timestamps", .data = &init_net.ipv4.sysctl_tcp_timestamps,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_early_retrans", .data = &init_net.ipv4.sysctl_tcp_early_retrans,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO, .extra2 = &four, }, { .procname = "tcp_recovery", .data = &init_net.ipv4.sysctl_tcp_recovery,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec,
}, { .procname = "tcp_thin_linear_timeouts", .data = &init_net.ipv4.sysctl_tcp_thin_linear_timeouts,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_slow_start_after_idle", .data = &init_net.ipv4.sysctl_tcp_slow_start_after_idle,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_retrans_collapse", .data = &init_net.ipv4.sysctl_tcp_retrans_collapse,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_stdurg", .data = &init_net.ipv4.sysctl_tcp_stdurg,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_rfc1337", .data = &init_net.ipv4.sysctl_tcp_rfc1337,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_abort_on_overflow", .data = &init_net.ipv4.sysctl_tcp_abort_on_overflow,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_fack", .data = &init_net.ipv4.sysctl_tcp_fack,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_max_reordering",.proc_handler = proc_dou8vec_minmax,
@@ -1105,16 +1105,16 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_dsack", .data = &init_net.ipv4.sysctl_tcp_dsack,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_app_win", .data = &init_net.ipv4.sysctl_tcp_app_win,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_adv_win_scale",.proc_handler = proc_dou8vec_minmax,
@@ -1128,46 +1128,46 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_frto", .data = &init_net.ipv4.sysctl_tcp_frto,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_no_metrics_save", .data = &init_net.ipv4.sysctl_tcp_nometrics_save,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec,
}, { .procname = "tcp_no_ssthresh_metrics_save", .data = &init_net.ipv4.sysctl_tcp_no_ssthresh_metrics_save,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, }, { .procname = "tcp_moderate_rcvbuf", .data = &init_net.ipv4.sysctl_tcp_moderate_rcvbuf,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec,
}, { .procname = "tcp_tso_win_divisor", .data = &init_net.ipv4.sysctl_tcp_tso_win_divisor,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec,
}, { .procname = "tcp_workaround_signed_windows", .data = &init_net.ipv4.sysctl_tcp_workaround_signed_windows,.proc_handler = proc_dou8vec_minmax,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec
}, { .procname = "tcp_limit_output_bytes",.proc_handler = proc_dou8vec_minmax,
@@ -1186,9 +1186,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_min_tso_segs", .data = &init_net.ipv4.sysctl_tcp_min_tso_segs,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ONE, .extra2 = &gso_max_segs, },.proc_handler = proc_dou8vec_minmax,
@@ -1204,9 +1204,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_autocorking", .data = &init_net.ipv4.sysctl_tcp_autocorking,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, },.proc_handler = proc_dou8vec_minmax,
@@ -1277,9 +1277,9 @@ static struct ctl_table ipv4_net_table[] = { { .procname = "tcp_reflect_tos", .data = &init_net.ipv4.sysctl_tcp_reflect_tos,
.maxlen = sizeof(int),
.mode = 0644,.maxlen = sizeof(u8),
.proc_handler = proc_dointvec_minmax,
.extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, },.proc_handler = proc_dou8vec_minmax,
-- 2.39.2
From: YueHaibing yuehaibing@huawei.com
[ Upstream commit dc5110c2d959c1707e12df5f792f41d90614adaa ]
UBSAN: shift-out-of-bounds in net/ipv4/tcp_input.c:555:23 shift exponent 255 is too large for 32-bit type 'int' CPU: 1 PID: 7907 Comm: ssh Not tainted 6.3.0-rc4-00161-g62bad54b26db-dirty #206 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x136/0x150 __ubsan_handle_shift_out_of_bounds+0x21f/0x5a0 tcp_init_transfer.cold+0x3a/0xb9 tcp_finish_connect+0x1d0/0x620 tcp_rcv_state_process+0xd78/0x4d60 tcp_v4_do_rcv+0x33d/0x9d0 __release_sock+0x133/0x3b0 release_sock+0x58/0x1b0
'maxwin' is int, shifting int for 32 or more bits is undefined behaviour.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: YueHaibing yuehaibing@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/networking/ip-sysctl.rst | 2 ++ net/ipv4/sysctl_net_ipv4.c | 3 +++ 2 files changed, 5 insertions(+)
diff --git a/Documentation/networking/ip-sysctl.rst b/Documentation/networking/ip-sysctl.rst index 0158dff638873..df26cf4110ef5 100644 --- a/Documentation/networking/ip-sysctl.rst +++ b/Documentation/networking/ip-sysctl.rst @@ -272,6 +272,8 @@ tcp_app_win - INTEGER Reserve max(window/2^tcp_app_win, mss) of window for application buffer. Value 0 is special, it means that nothing is reserved.
+ Possible values are [0, 31], inclusive. + Default: 31
tcp_autocorking - BOOLEAN diff --git a/net/ipv4/sysctl_net_ipv4.c b/net/ipv4/sysctl_net_ipv4.c index 1a2506f795d4e..3a34e9768bff0 100644 --- a/net/ipv4/sysctl_net_ipv4.c +++ b/net/ipv4/sysctl_net_ipv4.c @@ -37,6 +37,7 @@ static int ip_local_port_range_min[] = { 1, 1 }; static int ip_local_port_range_max[] = { 65535, 65535 }; static int tcp_adv_win_scale_min = -31; static int tcp_adv_win_scale_max = 31; +static int tcp_app_win_max = 31; static int tcp_min_snd_mss_min = TCP_MIN_SND_MSS; static int tcp_min_snd_mss_max = 65535; static int ip_privileged_port_min; @@ -1115,6 +1116,8 @@ static struct ctl_table ipv4_net_table[] = { .maxlen = sizeof(u8), .mode = 0644, .proc_handler = proc_dou8vec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = &tcp_app_win_max, }, { .procname = "tcp_adv_win_scale",
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit b89ce1177d42d5c124e83f3858818cd4e6a2c46f ]
'priv' is a managed resource, so there is no need to free it explicitly or there will be a double free().
Fixes: 90ad200b4cbc ("drm/armada: Use devm_drm_dev_alloc") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/c4f3c9207a9fce35cb6dd2cc60e755... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/armada/armada_drv.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/armada/armada_drv.c b/drivers/gpu/drm/armada/armada_drv.c index 980d3f1f8f16e..2d1e1e48f0eec 100644 --- a/drivers/gpu/drm/armada/armada_drv.c +++ b/drivers/gpu/drm/armada/armada_drv.c @@ -102,7 +102,6 @@ static int armada_drm_bind(struct device *dev) if (ret) { dev_err(dev, "[" DRM_NAME ":%s] can't kick out simple-fb: %d\n", __func__, ret); - kfree(priv); return ret; }
From: Denis Plotnikov den-plotnikov@yandex-team.ru
[ Upstream commit 7573099e10ca69c3be33995c1fcd0d241226816d ]
Static code analyzer complains to unchecked return value. The result of pci_reset_function() is unchecked. Despite, the issue is on the FLR supported code path and in that case reset can be done with pcie_flr(), the patch uses less invasive approach by adding the result check of pci_reset_function().
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 7e2cf4feba05 ("qlcnic: change driver hardware interface mechanism") Signed-off-by: Denis Plotnikov den-plotnikov@yandex-team.ru Reviewed-by: Simon Horman simon.horman@corigine.com Reviewed-by: Bjorn Helgaas bhelgaas@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c index 87f76bac2e463..eb827b86ecae8 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_ctx.c @@ -628,7 +628,13 @@ int qlcnic_fw_create_ctx(struct qlcnic_adapter *dev) int i, err, ring;
if (dev->flags & QLCNIC_NEED_FLR) { - pci_reset_function(dev->pdev); + err = pci_reset_function(dev->pdev); + if (err) { + dev_err(&dev->pdev->dev, + "Adapter reset failed (%d). Please reboot\n", + err); + return err; + } dev->flags &= ~QLCNIC_NEED_FLR; }
From: Ziyang Xuan william.xuanziyang@huawei.com
[ Upstream commit 6417070918de3bcdbe0646e7256dae58fd8083ba ]
Syzbot reported a bug as following:
===================================================== BUG: KMSAN: uninit-value in qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230 qrtr_tx_resume+0x185/0x1f0 net/qrtr/af_qrtr.c:230 qrtr_endpoint_post+0xf85/0x11b0 net/qrtr/af_qrtr.c:519 qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108 call_write_iter include/linux/fs.h:2189 [inline] aio_write+0x63a/0x950 fs/aio.c:1600 io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019 __do_sys_io_submit fs/aio.c:2078 [inline] __se_sys_io_submit+0x293/0x770 fs/aio.c:2048 __x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
Uninit was created at: slab_post_alloc_hook mm/slab.h:766 [inline] slab_alloc_node mm/slub.c:3452 [inline] __kmem_cache_alloc_node+0x71f/0xce0 mm/slub.c:3491 __do_kmalloc_node mm/slab_common.c:967 [inline] __kmalloc_node_track_caller+0x114/0x3b0 mm/slab_common.c:988 kmalloc_reserve net/core/skbuff.c:492 [inline] __alloc_skb+0x3af/0x8f0 net/core/skbuff.c:565 __netdev_alloc_skb+0x120/0x7d0 net/core/skbuff.c:630 qrtr_endpoint_post+0xbd/0x11b0 net/qrtr/af_qrtr.c:446 qrtr_tun_write_iter+0x270/0x400 net/qrtr/tun.c:108 call_write_iter include/linux/fs.h:2189 [inline] aio_write+0x63a/0x950 fs/aio.c:1600 io_submit_one+0x1d1c/0x3bf0 fs/aio.c:2019 __do_sys_io_submit fs/aio.c:2078 [inline] __se_sys_io_submit+0x293/0x770 fs/aio.c:2048 __x64_sys_io_submit+0x92/0xd0 fs/aio.c:2048 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3d/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
It is because that skb->len requires at least sizeof(struct qrtr_ctrl_pkt) in qrtr_tx_resume(). And skb->len equals to size in qrtr_endpoint_post(). But size is less than sizeof(struct qrtr_ctrl_pkt) when qrtr_cb->type equals to QRTR_TYPE_RESUME_TX in qrtr_endpoint_post() under the syzbot scenario. This triggers the uninit variable access bug.
Add size check when qrtr_cb->type equals to QRTR_TYPE_RESUME_TX in qrtr_endpoint_post() to fix the bug.
Fixes: 5fdeb0d372ab ("net: qrtr: Implement outgoing flow control") Reported-by: syzbot+4436c9630a45820fda76@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?id=c14607f0963d27d5a3d5f4c8639b500909e4354... Suggested-by: Manivannan Sadhasivam mani@kernel.org Signed-off-by: Ziyang Xuan william.xuanziyang@huawei.com Reviewed-by: Simon Horman simon.horman@corigine.com Link: https://lore.kernel.org/r/20230410012352.3997823-1-william.xuanziyang@huawei... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/qrtr/af_qrtr.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/net/qrtr/af_qrtr.c b/net/qrtr/af_qrtr.c index d0f0b2b8dce2f..71c2295d4a573 100644 --- a/net/qrtr/af_qrtr.c +++ b/net/qrtr/af_qrtr.c @@ -492,6 +492,11 @@ int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len) if (!size || len != ALIGN(size, 4) + hdrlen) goto err;
+ if ((cb->type == QRTR_TYPE_NEW_SERVER || + cb->type == QRTR_TYPE_RESUME_TX) && + size < sizeof(struct qrtr_ctrl_pkt)) + goto err; + if (cb->dst_port != QRTR_PORT_CTRL && cb->type != QRTR_TYPE_DATA && cb->type != QRTR_TYPE_RESUME_TX) goto err; @@ -500,6 +505,14 @@ int qrtr_endpoint_post(struct qrtr_endpoint *ep, const void *data, size_t len)
qrtr_node_assign(node, cb->src_node);
+ if (cb->type == QRTR_TYPE_NEW_SERVER) { + /* Remote node endpoint can bridge other distant nodes */ + const struct qrtr_ctrl_pkt *pkt; + + pkt = data + hdrlen; + qrtr_node_assign(node, le32_to_cpu(pkt->server.node)); + } + if (cb->type == QRTR_TYPE_RESUME_TX) { qrtr_tx_resume(node, skb); } else {
From: Xin Long lucien.xin@gmail.com
[ Upstream commit 32832a2caf82663870126c5186cf8f86c8b2a649 ]
Currently, when traversing ifwdtsn skips with _sctp_walk_ifwdtsn, it only checks the pos against the end of the chunk. However, the data left for the last pos may be < sizeof(struct sctp_ifwdtsn_skip), and dereference it as struct sctp_ifwdtsn_skip may cause coverflow.
This patch fixes it by checking the pos against "the end of the chunk - sizeof(struct sctp_ifwdtsn_skip)" in sctp_ifwdtsn_skip, similar to sctp_fwdtsn_skip.
Fixes: 0fc2ea922c8a ("sctp: implement validate_ftsn for sctp_stream_interleave") Signed-off-by: Xin Long lucien.xin@gmail.com Link: https://lore.kernel.org/r/2a71bffcd80b4f2c61fac6d344bb2f11c8fd74f7.168115581... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sctp/stream_interleave.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/sctp/stream_interleave.c b/net/sctp/stream_interleave.c index 6b13f737ebf2e..e3aad75cb11d9 100644 --- a/net/sctp/stream_interleave.c +++ b/net/sctp/stream_interleave.c @@ -1162,7 +1162,8 @@ static void sctp_generate_iftsn(struct sctp_outq *q, __u32 ctsn)
#define _sctp_walk_ifwdtsn(pos, chunk, end) \ for (pos = chunk->subh.ifwdtsn_hdr->skip; \ - (void *)pos < (void *)chunk->subh.ifwdtsn_hdr->skip + (end); pos++) + (void *)pos <= (void *)chunk->subh.ifwdtsn_hdr->skip + (end) - \ + sizeof(struct sctp_ifwdtsn_skip); pos++)
#define sctp_walk_ifwdtsn(pos, ch) \ _sctp_walk_ifwdtsn((pos), (ch), ntohs((ch)->chunk_hdr->length) - \
From: Saravanan Vajravel saravanan.vajravel@broadcom.com
[ Upstream commit aca3b0fa3d04b40c96934d86cc224cccfa7ea8e0 ]
If AH create request fails, release sgid_attr to avoid GID entry referrence leak reported while releasing GID table
Fixes: 1a1f460ff151 ("RDMA: Hold the sgid_attr inside the struct ib_ah/qp") Link: https://lore.kernel.org/r/20230401063424.342204-1-saravanan.vajravel@broadco... Reviewed-by: Selvin Xavier selvin.xavier@broadcom.com Signed-off-by: Saravanan Vajravel saravanan.vajravel@broadcom.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/core/verbs.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index 5123be0ab02f5..4fcabe5a84bee 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -535,6 +535,8 @@ static struct ib_ah *_rdma_create_ah(struct ib_pd *pd,
ret = device->ops.create_ah(ah, &init_attr, udata); if (ret) { + if (ah->sgid_attr) + rdma_put_gid_attr(ah->sgid_attr); kfree(ah); return ERR_PTR(ret); }
From: Eric Dumazet edumazet@google.com
[ Upstream commit 1c5950fc6fe996235f1d18539b9c6b64b597f50f ]
lena wang reported an issue caused by udpv6_sendmsg() mangling msg->msg_name and msg->msg_namelen, which are later read from ____sys_sendmsg() :
/* * If this is sendmmsg() and sending to current destination address was * successful, remember it. */ if (used_address && err >= 0) { used_address->name_len = msg_sys->msg_namelen; if (msg_sys->msg_name) memcpy(&used_address->name, msg_sys->msg_name, used_address->name_len); }
udpv6_sendmsg() wants to pretend the remote address family is AF_INET in order to call udp_sendmsg().
A fix would be to modify the address in-place, instead of using a local variable, but this could have other side effects.
Instead, restore initial values before we return from udpv6_sendmsg().
Fixes: c71d8ebe7a44 ("net: Fix security_socket_sendmsg() bypass problem.") Reported-by: lena wang lena.wang@mediatek.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Maciej Żenczykowski maze@google.com Link: https://lore.kernel.org/r/20230412130308.1202254-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/udp.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 1805cc5f7418b..20cc08210c700 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -1340,9 +1340,11 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) msg->msg_name = &sin; msg->msg_namelen = sizeof(sin); do_udp_sendmsg: - if (__ipv6_only_sock(sk)) - return -ENETUNREACH; - return udp_sendmsg(sk, msg, len); + err = __ipv6_only_sock(sk) ? + -ENETUNREACH : udp_sendmsg(sk, msg, len); + msg->msg_name = sin6; + msg->msg_namelen = addr_len; + return err; } }
From: Roman Gushchin roman.gushchin@linux.dev
[ Upstream commit e8b74453555872851bdd7ea43a7c0ec39659834f ]
For quite some time we were chasing a bug which looked like a sudden permanent failure of networking and mmc on some of our devices. The bug was very sensitive to any software changes and even more to any kernel debug options.
Finally we got a setup where the problem was reproducible with CONFIG_DMA_API_DEBUG=y and it revealed the issue with the rx dma:
[ 16.992082] ------------[ cut here ]------------ [ 16.996779] DMA-API: macb ff0b0000.ethernet: device driver tries to free DMA memory it has not allocated [device address=0x0000000875e3e244] [size=1536 bytes] [ 17.011049] WARNING: CPU: 0 PID: 85 at kernel/dma/debug.c:1011 check_unmap+0x6a0/0x900 [ 17.018977] Modules linked in: xxxxx [ 17.038823] CPU: 0 PID: 85 Comm: irq/55-8000f000 Not tainted 5.4.0 #28 [ 17.045345] Hardware name: xxxxx [ 17.049528] pstate: 60000005 (nZCv daif -PAN -UAO) [ 17.054322] pc : check_unmap+0x6a0/0x900 [ 17.058243] lr : check_unmap+0x6a0/0x900 [ 17.062163] sp : ffffffc010003c40 [ 17.065470] x29: ffffffc010003c40 x28: 000000004000c03c [ 17.070783] x27: ffffffc010da7048 x26: ffffff8878e38800 [ 17.076095] x25: ffffff8879d22810 x24: ffffffc010003cc8 [ 17.081407] x23: 0000000000000000 x22: ffffffc010a08750 [ 17.086719] x21: ffffff8878e3c7c0 x20: ffffffc010acb000 [ 17.092032] x19: 0000000875e3e244 x18: 0000000000000010 [ 17.097343] x17: 0000000000000000 x16: 0000000000000000 [ 17.102647] x15: ffffff8879e4a988 x14: 0720072007200720 [ 17.107959] x13: 0720072007200720 x12: 0720072007200720 [ 17.113261] x11: 0720072007200720 x10: 0720072007200720 [ 17.118565] x9 : 0720072007200720 x8 : 000000000000022d [ 17.123869] x7 : 0000000000000015 x6 : 0000000000000098 [ 17.129173] x5 : 0000000000000000 x4 : 0000000000000000 [ 17.134475] x3 : 00000000ffffffff x2 : ffffffc010a1d370 [ 17.139778] x1 : b420c9d75d27bb00 x0 : 0000000000000000 [ 17.145082] Call trace: [ 17.147524] check_unmap+0x6a0/0x900 [ 17.151091] debug_dma_unmap_page+0x88/0x90 [ 17.155266] gem_rx+0x114/0x2f0 [ 17.158396] macb_poll+0x58/0x100 [ 17.161705] net_rx_action+0x118/0x400 [ 17.165445] __do_softirq+0x138/0x36c [ 17.169100] irq_exit+0x98/0xc0 [ 17.172234] __handle_domain_irq+0x64/0xc0 [ 17.176320] gic_handle_irq+0x5c/0xc0 [ 17.179974] el1_irq+0xb8/0x140 [ 17.183109] xiic_process+0x5c/0xe30 [ 17.186677] irq_thread_fn+0x28/0x90 [ 17.190244] irq_thread+0x208/0x2a0 [ 17.193724] kthread+0x130/0x140 [ 17.196945] ret_from_fork+0x10/0x20 [ 17.200510] ---[ end trace 7240980785f81d6f ]---
[ 237.021490] ------------[ cut here ]------------ [ 237.026129] DMA-API: exceeded 7 overlapping mappings of cacheline 0x0000000021d79e7b [ 237.033886] WARNING: CPU: 0 PID: 0 at kernel/dma/debug.c:499 add_dma_entry+0x214/0x240 [ 237.041802] Modules linked in: xxxxx [ 237.061637] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 5.4.0 #28 [ 237.068941] Hardware name: xxxxx [ 237.073116] pstate: 80000085 (Nzcv daIf -PAN -UAO) [ 237.077900] pc : add_dma_entry+0x214/0x240 [ 237.081986] lr : add_dma_entry+0x214/0x240 [ 237.086072] sp : ffffffc010003c30 [ 237.089379] x29: ffffffc010003c30 x28: ffffff8878a0be00 [ 237.094683] x27: 0000000000000180 x26: ffffff8878e387c0 [ 237.099987] x25: 0000000000000002 x24: 0000000000000000 [ 237.105290] x23: 000000000000003b x22: ffffffc010a0fa00 [ 237.110594] x21: 0000000021d79e7b x20: ffffffc010abe600 [ 237.115897] x19: 00000000ffffffef x18: 0000000000000010 [ 237.121201] x17: 0000000000000000 x16: 0000000000000000 [ 237.126504] x15: ffffffc010a0fdc8 x14: 0720072007200720 [ 237.131807] x13: 0720072007200720 x12: 0720072007200720 [ 237.137111] x11: 0720072007200720 x10: 0720072007200720 [ 237.142415] x9 : 0720072007200720 x8 : 0000000000000259 [ 237.147718] x7 : 0000000000000001 x6 : 0000000000000000 [ 237.153022] x5 : ffffffc010003a20 x4 : 0000000000000001 [ 237.158325] x3 : 0000000000000006 x2 : 0000000000000007 [ 237.163628] x1 : 8ac721b3a7dc1c00 x0 : 0000000000000000 [ 237.168932] Call trace: [ 237.171373] add_dma_entry+0x214/0x240 [ 237.175115] debug_dma_map_page+0xf8/0x120 [ 237.179203] gem_rx_refill+0x190/0x280 [ 237.182942] gem_rx+0x224/0x2f0 [ 237.186075] macb_poll+0x58/0x100 [ 237.189384] net_rx_action+0x118/0x400 [ 237.193125] __do_softirq+0x138/0x36c [ 237.196780] irq_exit+0x98/0xc0 [ 237.199914] __handle_domain_irq+0x64/0xc0 [ 237.204000] gic_handle_irq+0x5c/0xc0 [ 237.207654] el1_irq+0xb8/0x140 [ 237.210789] arch_cpu_idle+0x40/0x200 [ 237.214444] default_idle_call+0x18/0x30 [ 237.218359] do_idle+0x200/0x280 [ 237.221578] cpu_startup_entry+0x20/0x30 [ 237.225493] rest_init+0xe4/0xf0 [ 237.228713] arch_call_rest_init+0xc/0x14 [ 237.232714] start_kernel+0x47c/0x4a8 [ 237.236367] ---[ end trace 7240980785f81d70 ]---
Lars was fast to find an explanation: according to the datasheet bit 2 of the rx buffer descriptor entry has a different meaning in the extended mode: Address [2] of beginning of buffer, or in extended buffer descriptor mode (DMA configuration register [28] = 1), indicates a valid timestamp in the buffer descriptor entry.
The macb driver didn't mask this bit while getting an address and it eventually caused a memory corruption and a dma failure.
The problem is resolved by explicitly clearing the problematic bit if hw timestamping is used.
Fixes: 7b4296148066 ("net: macb: Add support for PTP timestamps in DMA descriptors") Signed-off-by: Roman Gushchin roman.gushchin@linux.dev Co-developed-by: Lars-Peter Clausen lars@metafoo.de Signed-off-by: Lars-Peter Clausen lars@metafoo.de Acked-by: Nicolas Ferre nicolas.ferre@microchip.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20230412232144.770336-1-roman.gushchin@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cadence/macb_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index e0d62e2513879..70d57ef95fb15 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -884,6 +884,10 @@ static dma_addr_t macb_get_addr(struct macb *bp, struct macb_dma_desc *desc) } #endif addr |= MACB_BF(RX_WADDR, MACB_BFEXT(RX_WADDR, desc->addr)); +#ifdef CONFIG_MACB_USE_HWSTAMP + if (bp->hw_dma_cap & HW_DMA_CAP_PTP) + addr &= ~GEM_BIT(DMA_RXVALID); +#endif return addr; }
From: Andrii Nakryiko andrii@kernel.org
[ Upstream commit 872aec4b5f635d94111d48ec3c57fbe078d64e7d ]
btf_dump APIs emit unnecessary tabs when emitting struct/union definition that fits on the single line. Before this patch we'd get:
struct blah {<tab>};
This patch fixes this and makes sure that we get more natural:
struct blah {};
Fixes: 44a726c3f23c ("bpftool: Print newline before '}' for struct with padding only fields") Signed-off-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Link: https://lore.kernel.org/bpf/20221212211505.558851-2-andrii@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/bpf/btf_dump.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/tools/lib/bpf/btf_dump.c b/tools/lib/bpf/btf_dump.c index 6a8d8ed34b760..61aa2c47fbd5e 100644 --- a/tools/lib/bpf/btf_dump.c +++ b/tools/lib/bpf/btf_dump.c @@ -973,9 +973,12 @@ static void btf_dump_emit_struct_def(struct btf_dump *d, * Keep `struct empty {}` on a single line, * only print newline when there are regular or padding fields. */ - if (vlen || t->size) + if (vlen || t->size) { btf_dump_printf(d, "\n"); - btf_dump_printf(d, "%s}", pfx(lvl)); + btf_dump_printf(d, "%s}", pfx(lvl)); + } else { + btf_dump_printf(d, "}"); + } if (packed) btf_dump_printf(d, " __attribute__((packed))"); }
From: Grant Grundler grundler@chromium.org
[ Upstream commit 14c76b2e75bca4d96e2b85a0c12aa43e84fe3f74 ]
This doesn't need to be printed every second as an error: ... <3>[17438.628385] cros-usbpd-charger cros-usbpd-charger.3.auto: Port 1: default case! <3>[17439.634176] cros-usbpd-charger cros-usbpd-charger.3.auto: Port 1: default case! <3>[17440.640298] cros-usbpd-charger cros-usbpd-charger.3.auto: Port 1: default case! ...
Reduce priority from ERROR to DEBUG.
Signed-off-by: Grant Grundler grundler@chromium.org Reviewed-by: Guenter Roeck groeck@chromium.org Signed-off-by: Sebastian Reichel sebastian.reichel@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/power/supply/cros_usbpd-charger.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/power/supply/cros_usbpd-charger.c b/drivers/power/supply/cros_usbpd-charger.c index d89e08efd2ad0..0a4f02e4ae7ba 100644 --- a/drivers/power/supply/cros_usbpd-charger.c +++ b/drivers/power/supply/cros_usbpd-charger.c @@ -276,7 +276,7 @@ static int cros_usbpd_charger_get_power_info(struct port_data *port) port->psy_current_max = 0; break; default: - dev_err(dev, "Port %d: default case!\n", port->port_number); + dev_dbg(dev, "Port %d: default case!\n", port->port_number); port->psy_usb_type = POWER_SUPPLY_USB_TYPE_SDP; }
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 139f6973bf140c65d4d1d4bde5485badb4454d7a ]
The driver can be compile tested with !CONFIG_OF making certain data unused:
drivers/net/wireless/marvell/mwifiex/sdio.c:498:34: error: ‘mwifiex_sdio_of_match_table’ defined but not used [-Werror=unused-const-variable=] drivers/net/wireless/marvell/mwifiex/pcie.c:175:34: error: ‘mwifiex_pcie_of_match_table’ defined but not used [-Werror=unused-const-variable=]
Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Simon Horman simon.horman@corigine.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20230312132523.352182-1-krzysztof.kozlowski@linaro... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/marvell/mwifiex/pcie.c | 2 +- drivers/net/wireless/marvell/mwifiex/sdio.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c index b0024893a1cba..50c34630ca302 100644 --- a/drivers/net/wireless/marvell/mwifiex/pcie.c +++ b/drivers/net/wireless/marvell/mwifiex/pcie.c @@ -183,7 +183,7 @@ static const struct mwifiex_pcie_device mwifiex_pcie8997 = { .can_ext_scan = true, };
-static const struct of_device_id mwifiex_pcie_of_match_table[] = { +static const struct of_device_id mwifiex_pcie_of_match_table[] __maybe_unused = { { .compatible = "pci11ab,2b42" }, { .compatible = "pci1b4b,2b42" }, { } diff --git a/drivers/net/wireless/marvell/mwifiex/sdio.c b/drivers/net/wireless/marvell/mwifiex/sdio.c index 7fb6eef409285..b09e60fedeb16 100644 --- a/drivers/net/wireless/marvell/mwifiex/sdio.c +++ b/drivers/net/wireless/marvell/mwifiex/sdio.c @@ -484,7 +484,7 @@ static struct memory_type_mapping mem_type_mapping_tbl[] = { {"EXTLAST", NULL, 0, 0xFE}, };
-static const struct of_device_id mwifiex_sdio_of_match_table[] = { +static const struct of_device_id mwifiex_sdio_of_match_table[] __maybe_unused = { { .compatible = "marvell,sd8787" }, { .compatible = "marvell,sd8897" }, { .compatible = "marvell,sd8997" },
From: Alexander Stein alexander.stein@ew.tq-group.com
[ Upstream commit 987dd36c0141f6ab9f0fbf14d6b2ec3342dedb2f ]
When start sending a new message clear the Rx & Tx buffer pointers in order to avoid using stale pointers.
Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Tested-by: Emanuele Ghidoli emanuele.ghidoli@toradex.com Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-imx-lpi2c.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/i2c/busses/i2c-imx-lpi2c.c b/drivers/i2c/busses/i2c-imx-lpi2c.c index 2018dbcf241e9..d45ec26d51cb9 100644 --- a/drivers/i2c/busses/i2c-imx-lpi2c.c +++ b/drivers/i2c/busses/i2c-imx-lpi2c.c @@ -462,6 +462,8 @@ static int lpi2c_imx_xfer(struct i2c_adapter *adapter, if (num == 1 && msgs[0].len == 0) goto stop;
+ lpi2c_imx->rx_buf = NULL; + lpi2c_imx->tx_buf = NULL; lpi2c_imx->delivered = 0; lpi2c_imx->msglen = msgs[i].len; init_completion(&lpi2c_imx->complete);
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 5ed213dd64681f84a01ceaa82fb336cf7d59ddcf ]
Another Lenovo convertable which reports a landscape resolution of 1920x1200 with a pitch of (1920 * 4) bytes, while the actual framebuffer has a resolution of 1200x1920 with a pitch of (1200 * 4) bytes.
Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Javier Martinez Canillas javierm@redhat.com Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/sysfb_efi.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/arch/x86/kernel/sysfb_efi.c b/arch/x86/kernel/sysfb_efi.c index 9ea65611fba0b..fff04d2859765 100644 --- a/arch/x86/kernel/sysfb_efi.c +++ b/arch/x86/kernel/sysfb_efi.c @@ -272,6 +272,14 @@ static const struct dmi_system_id efifb_dmi_swap_width_height[] __initconst = { "IdeaPad Duet 3 10IGL5"), }, }, + { + /* Lenovo Yoga Book X91F / X91L */ + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "LENOVO"), + /* Non exact match to match F + L versions */ + DMI_MATCH(DMI_PRODUCT_NAME, "Lenovo YB1-X91"), + }, + }, {}, };
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 03aecb1acbcd7a660f97d645ca6c09d9de27ff9d ]
Like the Windows Lenovo Yoga Book X91F/L the Android Lenovo Yoga Book X90F/L has a portrait 1200x1920 screen used in landscape mode, add a quirk for this.
When the quirk for the X91F/L was initially added it was written to also apply to the X90F/L but this does not work because the Android version of the Yoga Book uses completely different DMI strings. Also adjust the X91F/L quirk to reflect that it only applies to the X91F/L models.
Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Javier Martinez Canillas javierm@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20230301095218.28457-1-hdegoed... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/drm_panel_orientation_quirks.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_panel_orientation_quirks.c b/drivers/gpu/drm/drm_panel_orientation_quirks.c index 8768073794fbf..6106fa7c43028 100644 --- a/drivers/gpu/drm/drm_panel_orientation_quirks.c +++ b/drivers/gpu/drm/drm_panel_orientation_quirks.c @@ -284,10 +284,17 @@ static const struct dmi_system_id orientation_data[] = { DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "IdeaPad Duet 3 10IGL5"), }, .driver_data = (void *)&lcd1200x1920_rightside_up, - }, { /* Lenovo Yoga Book X90F / X91F / X91L */ + }, { /* Lenovo Yoga Book X90F / X90L */ .matches = { - /* Non exact match to match all versions */ - DMI_MATCH(DMI_PRODUCT_NAME, "Lenovo YB1-X9"), + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Intel Corporation"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "CHERRYVIEW D1 PLATFORM"), + DMI_EXACT_MATCH(DMI_PRODUCT_VERSION, "YETI-11"), + }, + .driver_data = (void *)&lcd1200x1920_rightside_up, + }, { /* Lenovo Yoga Book X91F / X91L */ + .matches = { + /* Non exact match to match F + L versions */ + DMI_MATCH(DMI_PRODUCT_NAME, "Lenovo YB1-X91"), }, .driver_data = (void *)&lcd1200x1920_rightside_up, }, { /* OneGX1 Pro */
From: Robbie Harwood rharwood@redhat.com
[ Upstream commit 4fc5c74dde69a7eda172514aaeb5a7df3600adb3 ]
The PE Format Specification (section "The Attribute Certificate Table (Image Only)") states that `dwLength` is to be rounded up to 8-byte alignment when used for traversal. Therefore, the field is not required to be an 8-byte multiple in the first place.
Accordingly, pesign has not performed this alignment since version 0.110. This causes kexec failure on pesign'd binaries with "PEFILE: Signature wrapper len wrong". Update the comment and relax the check.
Signed-off-by: Robbie Harwood rharwood@redhat.com Signed-off-by: David Howells dhowells@redhat.com cc: Jarkko Sakkinen jarkko@kernel.org cc: Eric Biederman ebiederm@xmission.com cc: Herbert Xu herbert@gondor.apana.org.au cc: keyrings@vger.kernel.org cc: linux-crypto@vger.kernel.org cc: kexec@lists.infradead.org Link: https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#the-attribut... Link: https://github.com/rhboot/pesign Link: https://lore.kernel.org/r/20230220171254.592347-2-rharwood@redhat.com/ # v2 Signed-off-by: Sasha Levin sashal@kernel.org --- crypto/asymmetric_keys/verify_pefile.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/crypto/asymmetric_keys/verify_pefile.c b/crypto/asymmetric_keys/verify_pefile.c index 7553ab18db898..fe1bb374239d7 100644 --- a/crypto/asymmetric_keys/verify_pefile.c +++ b/crypto/asymmetric_keys/verify_pefile.c @@ -135,11 +135,15 @@ static int pefile_strip_sig_wrapper(const void *pebuf, pr_debug("sig wrapper = { %x, %x, %x }\n", wrapper.length, wrapper.revision, wrapper.cert_type);
- /* Both pesign and sbsign round up the length of certificate table - * (in optional header data directories) to 8 byte alignment. + /* sbsign rounds up the length of certificate table (in optional + * header data directories) to 8 byte alignment. However, the PE + * specification states that while entries are 8-byte aligned, this is + * not included in their length, and as a result, pesign has not + * rounded up since 0.110. */ - if (round_up(wrapper.length, 8) != ctx->sig_len) { - pr_debug("Signature wrapper len wrong\n"); + if (wrapper.length > ctx->sig_len) { + pr_debug("Signature wrapper bigger than sig len (%x > %x)\n", + ctx->sig_len, wrapper.length); return -ELIBBAD; } if (wrapper.revision != WIN_CERT_REVISION_2_0) {
From: Robbie Harwood rharwood@redhat.com
[ Upstream commit 3584c1dbfffdabf8e3dc1dd25748bb38dd01cd43 ]
These particular errors can be encountered while trying to kexec when secureboot lockdown is in place. Without this change, even with a signed debug build, one still needs to reboot the machine to add the appropriate dyndbg parameters (since lockdown blocks debugfs).
Accordingly, upgrade all pr_debug() before fatal error into pr_warn().
Signed-off-by: Robbie Harwood rharwood@redhat.com Signed-off-by: David Howells dhowells@redhat.com cc: Jarkko Sakkinen jarkko@kernel.org cc: Eric Biederman ebiederm@xmission.com cc: Herbert Xu herbert@gondor.apana.org.au cc: keyrings@vger.kernel.org cc: linux-crypto@vger.kernel.org cc: kexec@lists.infradead.org Link: https://lore.kernel.org/r/20230220171254.592347-3-rharwood@redhat.com/ # v2 Signed-off-by: Sasha Levin sashal@kernel.org --- crypto/asymmetric_keys/pkcs7_verify.c | 10 +++++----- crypto/asymmetric_keys/verify_pefile.c | 24 ++++++++++++------------ 2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/crypto/asymmetric_keys/pkcs7_verify.c b/crypto/asymmetric_keys/pkcs7_verify.c index ce49820caa97f..01e54450c846f 100644 --- a/crypto/asymmetric_keys/pkcs7_verify.c +++ b/crypto/asymmetric_keys/pkcs7_verify.c @@ -79,16 +79,16 @@ static int pkcs7_digest(struct pkcs7_message *pkcs7, }
if (sinfo->msgdigest_len != sig->digest_size) { - pr_debug("Sig %u: Invalid digest size (%u)\n", - sinfo->index, sinfo->msgdigest_len); + pr_warn("Sig %u: Invalid digest size (%u)\n", + sinfo->index, sinfo->msgdigest_len); ret = -EBADMSG; goto error; }
if (memcmp(sig->digest, sinfo->msgdigest, sinfo->msgdigest_len) != 0) { - pr_debug("Sig %u: Message digest doesn't match\n", - sinfo->index); + pr_warn("Sig %u: Message digest doesn't match\n", + sinfo->index); ret = -EKEYREJECTED; goto error; } @@ -488,7 +488,7 @@ int pkcs7_supply_detached_data(struct pkcs7_message *pkcs7, const void *data, size_t datalen) { if (pkcs7->data) { - pr_debug("Data already supplied\n"); + pr_warn("Data already supplied\n"); return -EINVAL; } pkcs7->data = data; diff --git a/crypto/asymmetric_keys/verify_pefile.c b/crypto/asymmetric_keys/verify_pefile.c index fe1bb374239d7..22beaf2213a22 100644 --- a/crypto/asymmetric_keys/verify_pefile.c +++ b/crypto/asymmetric_keys/verify_pefile.c @@ -74,7 +74,7 @@ static int pefile_parse_binary(const void *pebuf, unsigned int pelen, break;
default: - pr_debug("Unknown PEOPT magic = %04hx\n", pe32->magic); + pr_warn("Unknown PEOPT magic = %04hx\n", pe32->magic); return -ELIBBAD; }
@@ -95,7 +95,7 @@ static int pefile_parse_binary(const void *pebuf, unsigned int pelen, ctx->certs_size = ddir->certs.size;
if (!ddir->certs.virtual_address || !ddir->certs.size) { - pr_debug("Unsigned PE binary\n"); + pr_warn("Unsigned PE binary\n"); return -ENODATA; }
@@ -127,7 +127,7 @@ static int pefile_strip_sig_wrapper(const void *pebuf, unsigned len;
if (ctx->sig_len < sizeof(wrapper)) { - pr_debug("Signature wrapper too short\n"); + pr_warn("Signature wrapper too short\n"); return -ELIBBAD; }
@@ -142,16 +142,16 @@ static int pefile_strip_sig_wrapper(const void *pebuf, * rounded up since 0.110. */ if (wrapper.length > ctx->sig_len) { - pr_debug("Signature wrapper bigger than sig len (%x > %x)\n", - ctx->sig_len, wrapper.length); + pr_warn("Signature wrapper bigger than sig len (%x > %x)\n", + ctx->sig_len, wrapper.length); return -ELIBBAD; } if (wrapper.revision != WIN_CERT_REVISION_2_0) { - pr_debug("Signature is not revision 2.0\n"); + pr_warn("Signature is not revision 2.0\n"); return -ENOTSUPP; } if (wrapper.cert_type != WIN_CERT_TYPE_PKCS_SIGNED_DATA) { - pr_debug("Signature certificate type is not PKCS\n"); + pr_warn("Signature certificate type is not PKCS\n"); return -ENOTSUPP; }
@@ -164,7 +164,7 @@ static int pefile_strip_sig_wrapper(const void *pebuf, ctx->sig_offset += sizeof(wrapper); ctx->sig_len -= sizeof(wrapper); if (ctx->sig_len < 4) { - pr_debug("Signature data missing\n"); + pr_warn("Signature data missing\n"); return -EKEYREJECTED; }
@@ -198,7 +198,7 @@ static int pefile_strip_sig_wrapper(const void *pebuf, return 0; } not_pkcs7: - pr_debug("Signature data not PKCS#7\n"); + pr_warn("Signature data not PKCS#7\n"); return -ELIBBAD; }
@@ -341,8 +341,8 @@ static int pefile_digest_pe(const void *pebuf, unsigned int pelen, digest_size = crypto_shash_digestsize(tfm);
if (digest_size != ctx->digest_len) { - pr_debug("Digest size mismatch (%zx != %x)\n", - digest_size, ctx->digest_len); + pr_warn("Digest size mismatch (%zx != %x)\n", + digest_size, ctx->digest_len); ret = -EBADMSG; goto error_no_desc; } @@ -373,7 +373,7 @@ static int pefile_digest_pe(const void *pebuf, unsigned int pelen, * PKCS#7 certificate. */ if (memcmp(digest, ctx->digest, ctx->digest_len) != 0) { - pr_debug("Digest mismatch\n"); + pr_warn("Digest mismatch\n"); ret = -EKEYREJECTED; } else { pr_debug("The digests match!\n");
From: Mathis Salmen mathis.salmen@matsal.de
commit 8d736482749f6d350892ef83a7a11d43cd49981e upstream.
In a NOMMU kernel, sigreturn trampolines are generated on the user stack by setup_rt_frame. Currently, these trampolines are not instruction fenced, thus their visibility to ifetch is not guaranteed.
This patch adds a flush_icache_range in setup_rt_frame to fix this problem.
Signed-off-by: Mathis Salmen mathis.salmen@matsal.de Fixes: 6bd33e1ece52 ("riscv: add nommu support") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/signal.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/arch/riscv/kernel/signal.c +++ b/arch/riscv/kernel/signal.c @@ -16,6 +16,7 @@ #include <asm/vdso.h> #include <asm/switch_to.h> #include <asm/csr.h> +#include <asm/cacheflush.h>
extern u32 __user_rt_sigreturn[2];
@@ -178,6 +179,7 @@ static int setup_rt_frame(struct ksignal { struct rt_sigframe __user *frame; long err = 0; + unsigned long __maybe_unused addr;
frame = get_sigframe(ksig, regs, sizeof(*frame)); if (!access_ok(frame, sizeof(*frame))) @@ -206,7 +208,12 @@ static int setup_rt_frame(struct ksignal if (copy_to_user(&frame->sigreturn_code, __user_rt_sigreturn, sizeof(frame->sigreturn_code))) return -EFAULT; - regs->ra = (unsigned long)&frame->sigreturn_code; + + addr = (unsigned long)&frame->sigreturn_code; + /* Make sure the two instructions are pushed to icache. */ + flush_icache_range(addr, addr + sizeof(frame->sigreturn_code)); + + regs->ra = addr; #endif /* CONFIG_MMU */
/*
From: Ivan Bornyakov i.bornyakov@metrotek.ru
commit 813c2dd78618f108fdcf9cd726ea90f081ee2881 upstream.
sfp->i2c_block_size is initialized at SFP module insertion in sfp_sm_mod_probe(). Because of that, if SFP module was never inserted since boot, sfp_read() call will lead to zero-length I2C read attempt, and not all I2C controllers are happy with zero-length reads.
One way to issue sfp_read() on empty SFP cage is to execute ethtool -m. If SFP module was never plugged since boot, there will be a zero-length I2C read attempt.
# ethtool -m xge0 i2c i2c-3: adapter quirk: no zero length (addr 0x0050, size 0, read) Cannot get Module EEPROM data: Operation not supported
If SFP module was plugged then removed at least once, sfp->i2c_block_size will be initialized and ethtool -m will fail with different exit code and without I2C error
# ethtool -m xge0 Cannot get Module EEPROM data: Remote I/O error
Fix this by initializing sfp->i2_block_size at struct sfp allocation stage so no wild sfp_read() could issue zero-length I2C read.
Signed-off-by: Ivan Bornyakov i.bornyakov@metrotek.ru Fixes: 0d035bed2a4a ("net: sfp: VSOL V2801F / CarlitoxxPro CPGOS03-0490 v2.0 workaround") Cc: stable@vger.kernel.org Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/phy/sfp.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
--- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -207,6 +207,12 @@ static const enum gpiod_flags gpio_flags */ #define SFP_PHY_ADDR 22
+/* SFP_EEPROM_BLOCK_SIZE is the size of data chunk to read the EEPROM + * at a time. Some SFP modules and also some Linux I2C drivers do not like + * reads longer than 16 bytes. + */ +#define SFP_EEPROM_BLOCK_SIZE 16 + struct sff_data { unsigned int gpios; bool (*module_supported)(const struct sfp_eeprom_id *id); @@ -1754,11 +1760,7 @@ static int sfp_sm_mod_probe(struct sfp * u8 check; int ret;
- /* Some SFP modules and also some Linux I2C drivers do not like reads - * longer than 16 bytes, so read the EEPROM in chunks of 16 bytes at - * a time. - */ - sfp->i2c_block_size = 16; + sfp->i2c_block_size = SFP_EEPROM_BLOCK_SIZE;
ret = sfp_read(sfp, false, 0, &id.base, sizeof(id.base)); if (ret < 0) { @@ -2385,6 +2387,7 @@ static struct sfp *sfp_alloc(struct devi return ERR_PTR(-ENOMEM);
sfp->dev = dev; + sfp->i2c_block_size = SFP_EEPROM_BLOCK_SIZE;
mutex_init(&sfp->sm_mutex); mutex_init(&sfp->st_mutex);
From: Jiri Kosina jkosina@suse.cz
commit c8e22b7a1694bb8d025ea636816472739d859145 upstream.
This reverts commit 3fe97ff3d949 ("scsi: ses: Don't attach if enclosure has no components") and introduces proper handling of case where there are no detected secondary components, but primary component (enumerated in num_enclosures) does exist. That fix was originally proposed by Ding Hui dinghui@sangfor.com.cn.
Completely ignoring devices that have one primary enclosure and no secondary one results in ses_intf_add() bailing completely
scsi 2:0:0:254: enclosure has no enumerated components scsi 2:0:0:254: Failed to bind enclosure -12ven in valid configurations such
even on valid configurations with 1 primary and 0 secondary enclosures as below:
# sg_ses /dev/sg0 3PARdata SES 3321 Supported diagnostic pages: Supported Diagnostic Pages [sdp] [0x0] Configuration (SES) [cf] [0x1] Short Enclosure Status (SES) [ses] [0x8] # sg_ses -p cf /dev/sg0 3PARdata SES 3321 Configuration diagnostic page: number of secondary subenclosures: 0 generation code: 0x0 enclosure descriptor list Subenclosure identifier: 0 [primary] relative ES process id: 0, number of ES processes: 1 number of type descriptor headers: 1 enclosure logical identifier (hex): 20000002ac02068d enclosure vendor: 3PARdata product: VV rev: 3321 type descriptor header and text list Element type: Unspecified, subenclosure id: 0 number of possible elements: 1
The changelog for the original fix follows
===== We can get a crash when disconnecting the iSCSI session, the call trace like this:
[ffff00002a00fb70] kfree at ffff00000830e224 [ffff00002a00fba0] ses_intf_remove at ffff000001f200e4 [ffff00002a00fbd0] device_del at ffff0000086b6a98 [ffff00002a00fc50] device_unregister at ffff0000086b6d58 [ffff00002a00fc70] __scsi_remove_device at ffff00000870608c [ffff00002a00fca0] scsi_remove_device at ffff000008706134 [ffff00002a00fcc0] __scsi_remove_target at ffff0000087062e4 [ffff00002a00fd10] scsi_remove_target at ffff0000087064c0 [ffff00002a00fd70] __iscsi_unbind_session at ffff000001c872c4 [ffff00002a00fdb0] process_one_work at ffff00000810f35c [ffff00002a00fe00] worker_thread at ffff00000810f648 [ffff00002a00fe70] kthread at ffff000008116e98
In ses_intf_add, components count could be 0, and kcalloc 0 size scomp, but not saved in edev->component[i].scratch
In this situation, edev->component[0].scratch is an invalid pointer, when kfree it in ses_intf_remove_enclosure, a crash like above would happen The call trace also could be other random cases when kfree cannot catch the invalid pointer
We should not use edev->component[] array when the components count is 0 We also need check index when use edev->component[] array in ses_enclosure_data_process =====
Reported-by: Michal Kolar mich.k@seznam.cz Originally-by: Ding Hui dinghui@sangfor.com.cn Cc: stable@vger.kernel.org Fixes: 3fe97ff3d949 ("scsi: ses: Don't attach if enclosure has no components") Signed-off-by: Jiri Kosina jkosina@suse.cz Link: https://lore.kernel.org/r/nycvar.YFH.7.76.2304042122270.29760@cbobk.fhfr.pm Tested-by: Michal Kolar mich.k@seznam.cz Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/ses.c | 20 ++++++++------------ 1 file changed, 8 insertions(+), 12 deletions(-)
--- a/drivers/scsi/ses.c +++ b/drivers/scsi/ses.c @@ -503,9 +503,6 @@ static int ses_enclosure_find_by_addr(st int i; struct ses_component *scomp;
- if (!edev->component[0].scratch) - return 0; - for (i = 0; i < edev->components; i++) { scomp = edev->component[i].scratch; if (scomp->addr != efd->addr) @@ -596,8 +593,10 @@ static void ses_enclosure_data_process(s components++, type_ptr[0], name); - else + else if (components < edev->components) ecomp = &edev->component[components++]; + else + ecomp = ERR_PTR(-EINVAL);
if (!IS_ERR(ecomp)) { if (addl_desc_ptr) { @@ -728,11 +727,6 @@ static int ses_intf_add(struct device *c components += type_ptr[1]; }
- if (components == 0) { - sdev_printk(KERN_WARNING, sdev, "enclosure has no enumerated components\n"); - goto err_free; - } - ses_dev->page1 = buf; ses_dev->page1_len = len; buf = NULL; @@ -774,9 +768,11 @@ static int ses_intf_add(struct device *c buf = NULL; } page2_not_supported: - scomp = kcalloc(components, sizeof(struct ses_component), GFP_KERNEL); - if (!scomp) - goto err_free; + if (components > 0) { + scomp = kcalloc(components, sizeof(struct ses_component), GFP_KERNEL); + if (!scomp) + goto err_free; + }
edev = enclosure_register(cdev->parent, dev_name(&sdev->sdev_gendev), components, &ses_enclosure_callbacks);
From: Basavaraj Natikar Basavaraj.Natikar@amd.com
commit f195fc1e9715ba826c3b62d58038f760f66a4fe9 upstream.
The AMD [1022:15b8] USB controller loses some internal functional MSI-X context when transitioning from D0 to D3hot. BIOS normally traps D0->D3hot and D3hot->D0 transitions so it can save and restore that internal context, but some firmware in the field can't do this because it fails to clear the AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit.
Clear AMD_15B8_RCC_DEV2_EPF0_STRAP2 NO_SOFT_RESET bit before USB controller initialization during boot.
Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@glanzmann.de/T/#u Link: https://lore.kernel.org/r/20230329172859.699743-1-Basavaraj.Natikar@amd.com Reported-by: Thomas Glanzmann thomas@glanzmann.de Tested-by: Thomas Glanzmann thomas@glanzmann.de Signed-off-by: Basavaraj Natikar Basavaraj.Natikar@amd.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/pci/fixup.c | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+)
--- a/arch/x86/pci/fixup.c +++ b/arch/x86/pci/fixup.c @@ -7,6 +7,7 @@ #include <linux/dmi.h> #include <linux/pci.h> #include <linux/vgaarb.h> +#include <asm/amd_nb.h> #include <asm/hpet.h> #include <asm/pci_x86.h>
@@ -824,3 +825,23 @@ static void rs690_fix_64bit_dma(struct p DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x7910, rs690_fix_64bit_dma);
#endif + +#ifdef CONFIG_AMD_NB + +#define AMD_15B8_RCC_DEV2_EPF0_STRAP2 0x10136008 +#define AMD_15B8_RCC_DEV2_EPF0_STRAP2_NO_SOFT_RESET_DEV2_F0_MASK 0x00000080L + +static void quirk_clear_strap_no_soft_reset_dev2_f0(struct pci_dev *dev) +{ + u32 data; + + if (!amd_smn_read(0, AMD_15B8_RCC_DEV2_EPF0_STRAP2, &data)) { + data &= ~AMD_15B8_RCC_DEV2_EPF0_STRAP2_NO_SOFT_RESET_DEV2_F0_MASK; + if (amd_smn_write(0, AMD_15B8_RCC_DEV2_EPF0_STRAP2, data)) + pci_err(dev, "Failed to write data 0x%x\n", data); + } else { + pci_err(dev, "Failed to read data\n"); + } +} +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, 0x15b8, quirk_clear_strap_no_soft_reset_dev2_f0); +#endif
From: Waiman Long longman@redhat.com
commit ba9182a89626d5f83c2ee4594f55cb9c1e60f0e2 upstream.
After a successful cpuset_can_attach() call which increments the attach_in_progress flag, either cpuset_cancel_attach() or cpuset_attach() will be called later. In cpuset_attach(), tasks in cpuset_attach_wq, if present, will be woken up at the end. That is not the case in cpuset_cancel_attach(). So missed wakeup is possible if the attach operation is somehow cancelled. Fix that by doing the wakeup in cpuset_cancel_attach() as well.
Fixes: e44193d39e8d ("cpuset: let hotplug propagation work wait for task attaching") Signed-off-by: Waiman Long longman@redhat.com Reviewed-by: Michal Koutný mkoutny@suse.com Cc: stable@vger.kernel.org # v3.11+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2188,11 +2188,15 @@ out_unlock: static void cpuset_cancel_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; + struct cpuset *cs;
cgroup_taskset_first(tset, &css); + cs = css_cs(css);
percpu_down_write(&cpuset_rwsem); - css_cs(css)->attach_in_progress--; + cs->attach_in_progress--; + if (!cs->attach_in_progress) + wake_up(&cpuset_attach_wq); percpu_up_write(&cpuset_rwsem); }
From: Zhihao Cheng chengzhihao1@huawei.com
commit 1e020e1b96afdecd20680b5b5be2a6ffc3d27628 upstream.
Following process will make ubi attaching failed since commit 1b42b1a36fc946 ("ubi: ensure that VID header offset ... size"):
ID="0xec,0xa1,0x00,0x15" # 128M 128KB 2KB modprobe nandsim id_bytes=$ID flash_eraseall /dev/mtd0 modprobe ubi mtd="0,2048" # set vid_hdr offset as 2048 (one page) (dmesg): ubi0 error: ubi_attach_mtd_dev [ubi]: VID header offset 2048 too large. UBI error: cannot attach mtd0 UBI error: cannot initialize UBI, error -22
Rework original solution, the key point is making sure 'vid_hdr_shift + UBI_VID_HDR_SIZE < ubi->vid_hdr_alsize', so we should check vid_hdr_shift rather not vid_hdr_offset. Then, ubi still support (sub)page aligined VID header offset.
Fixes: 1b42b1a36fc946 ("ubi: ensure that VID header offset ... size") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Tested-by: Nicolas Schichan nschichan@freebox.fr Tested-by: Miquel Raynal miquel.raynal@bootlin.com # v5.10, v4.19 Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/ubi/build.c | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-)
--- a/drivers/mtd/ubi/build.c +++ b/drivers/mtd/ubi/build.c @@ -665,12 +665,6 @@ static int io_init(struct ubi_device *ub ubi->ec_hdr_alsize = ALIGN(UBI_EC_HDR_SIZE, ubi->hdrs_min_io_size); ubi->vid_hdr_alsize = ALIGN(UBI_VID_HDR_SIZE, ubi->hdrs_min_io_size);
- if (ubi->vid_hdr_offset && ((ubi->vid_hdr_offset + UBI_VID_HDR_SIZE) > - ubi->vid_hdr_alsize)) { - ubi_err(ubi, "VID header offset %d too large.", ubi->vid_hdr_offset); - return -EINVAL; - } - dbg_gen("min_io_size %d", ubi->min_io_size); dbg_gen("max_write_size %d", ubi->max_write_size); dbg_gen("hdrs_min_io_size %d", ubi->hdrs_min_io_size); @@ -688,6 +682,21 @@ static int io_init(struct ubi_device *ub ubi->vid_hdr_aloffset; }
+ /* + * Memory allocation for VID header is ubi->vid_hdr_alsize + * which is described in comments in io.c. + * Make sure VID header shift + UBI_VID_HDR_SIZE not exceeds + * ubi->vid_hdr_alsize, so that all vid header operations + * won't access memory out of bounds. + */ + if ((ubi->vid_hdr_shift + UBI_VID_HDR_SIZE) > ubi->vid_hdr_alsize) { + ubi_err(ubi, "Invalid VID header offset %d, VID header shift(%d)" + " + VID header size(%zu) > VID header aligned size(%d).", + ubi->vid_hdr_offset, ubi->vid_hdr_shift, + UBI_VID_HDR_SIZE, ubi->vid_hdr_alsize); + return -EINVAL; + } + /* Similar for the data offset */ ubi->leb_start = ubi->vid_hdr_offset + UBI_VID_HDR_SIZE; ubi->leb_start = ALIGN(ubi->leb_start, ubi->min_io_size);
From: Lee Jones lee.jones@linaro.org
[ Upstream commit ab4e4de9fd8b469823a645f05f2c142e9270b012 ]
Fixes the following W=1 kernel build warning(s):
drivers/mtd/ubi/wl.c:584: warning: Function parameter or member 'nested' not described in 'schedule_erase' drivers/mtd/ubi/wl.c:1075: warning: Excess function parameter 'shutdown' description in '__erase_worker'
Cc: Richard Weinberger richard@nod.at Cc: Miquel Raynal miquel.raynal@bootlin.com Cc: Vignesh Raghavendra vigneshr@ti.com Cc: linux-mtd@lists.infradead.org Signed-off-by: Lee Jones lee.jones@linaro.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20201109182206.3037326-13-lee.jones@linaro... Stable-dep-of: f773f0a331d6 ("ubi: Fix deadlock caused by recursively holding work_sem") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/ubi/wl.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 6da09263e0b9f..2ee0e60c43c2e 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -575,6 +575,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk, * @vol_id: the volume ID that last used this PEB * @lnum: the last used logical eraseblock number for the PEB * @torture: if the physical eraseblock has to be tortured + * @nested: denotes whether the work_sem is already held in read mode * * This function returns zero in case of success and a %-ENOMEM in case of * failure. @@ -1066,8 +1067,6 @@ static int ensure_wear_leveling(struct ubi_device *ubi, int nested) * __erase_worker - physical eraseblock erase worker function. * @ubi: UBI device description object * @wl_wrk: the work object - * @shutdown: non-zero if the worker has to free memory and exit - * because the WL sub-system is shutting down * * This function erases a physical eraseblock and perform torture testing if * needed. It also takes care about marking the physical eraseblock bad if
From: ZhaoLong Wang wangzhaolong1@huawei.com
[ Upstream commit f773f0a331d6c41733b17bebbc1b6cae12e016f5 ]
During the processing of the bgt, if the sync_erase() return -EBUSY or some other error code in __erase_worker(),schedule_erase() called again lead to the down_read(ubi->work_sem) hold twice and may get block by down_write(ubi->work_sem) in ubi_update_fastmap(), which cause deadlock.
ubi bgt other task do_work down_read(&ubi->work_sem) ubi_update_fastmap erase_worker # Blocked by down_read __erase_worker down_write(&ubi->work_sem) schedule_erase schedule_ubi_work down_read(&ubi->work_sem)
Fix this by changing input parameter @nested of the schedule_erase() to 'true' to avoid recursively acquiring the down_read(&ubi->work_sem).
Also, fix the incorrect comment about @nested parameter of the schedule_erase() because when down_write(ubi->work_sem) is held, the @nested is also need be true.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217093 Fixes: 2e8f08deabbc ("ubi: Fix races around ubi_refill_pools()") Signed-off-by: ZhaoLong Wang wangzhaolong1@huawei.com Reviewed-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mtd/ubi/wl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/ubi/wl.c b/drivers/mtd/ubi/wl.c index 2ee0e60c43c2e..4427018ad4d9b 100644 --- a/drivers/mtd/ubi/wl.c +++ b/drivers/mtd/ubi/wl.c @@ -575,7 +575,7 @@ static int erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk, * @vol_id: the volume ID that last used this PEB * @lnum: the last used logical eraseblock number for the PEB * @torture: if the physical eraseblock has to be tortured - * @nested: denotes whether the work_sem is already held in read mode + * @nested: denotes whether the work_sem is already held * * This function returns zero in case of success and a %-ENOMEM in case of * failure. @@ -1121,7 +1121,7 @@ static int __erase_worker(struct ubi_device *ubi, struct ubi_work *wl_wrk) int err1;
/* Re-schedule the LEB for erasure */ - err1 = schedule_erase(ubi, e, vol_id, lnum, 0, false); + err1 = schedule_erase(ubi, e, vol_id, lnum, 0, true); if (err1) { spin_lock(&ubi->wl_lock); wl_entry_destroy(ubi, e);
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 7e35ef662ca05c42dbc2f401bb76d9219dd7fd02 ]
No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Reviewed-by: David Gibson david@gibson.dropbear.id.au Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210812132223.225214-2-aneesh.kumar@linux.ibm.com Stable-dep-of: b277fc793daf ("powerpc/papr_scm: Update the NUMA distance table for the target node") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/numa.c | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 275c60f92a7ce..a21f62fcda1e8 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -51,7 +51,7 @@ EXPORT_SYMBOL(numa_cpu_lookup_table); EXPORT_SYMBOL(node_to_cpumask_map); EXPORT_SYMBOL(node_data);
-static int min_common_depth; +static int primary_domain_index; static int n_mem_addr_cells, n_mem_size_cells; static int form1_affinity;
@@ -232,8 +232,8 @@ static int associativity_to_nid(const __be32 *associativity) if (!numa_enabled) goto out;
- if (of_read_number(associativity, 1) >= min_common_depth) - nid = of_read_number(&associativity[min_common_depth], 1); + if (of_read_number(associativity, 1) >= primary_domain_index) + nid = of_read_number(&associativity[primary_domain_index], 1);
/* POWER4 LPAR uses 0xffff as invalid node */ if (nid == 0xffff || nid >= nr_node_ids) @@ -284,9 +284,9 @@ int of_node_to_nid(struct device_node *device) } EXPORT_SYMBOL(of_node_to_nid);
-static int __init find_min_common_depth(void) +static int __init find_primary_domain_index(void) { - int depth; + int index; struct device_node *root;
if (firmware_has_feature(FW_FEATURE_OPAL)) @@ -326,7 +326,7 @@ static int __init find_min_common_depth(void) }
if (form1_affinity) { - depth = of_read_number(distance_ref_points, 1); + index = of_read_number(distance_ref_points, 1); } else { if (distance_ref_points_depth < 2) { printk(KERN_WARNING "NUMA: " @@ -334,7 +334,7 @@ static int __init find_min_common_depth(void) goto err; }
- depth = of_read_number(&distance_ref_points[1], 1); + index = of_read_number(&distance_ref_points[1], 1); }
/* @@ -348,7 +348,7 @@ static int __init find_min_common_depth(void) }
of_node_put(root); - return depth; + return index;
err: of_node_put(root); @@ -437,16 +437,16 @@ int of_drconf_to_nid_single(struct drmem_lmb *lmb) int nid = default_nid; int rc, index;
- if ((min_common_depth < 0) || !numa_enabled) + if ((primary_domain_index < 0) || !numa_enabled) return default_nid;
rc = of_get_assoc_arrays(&aa); if (rc) return default_nid;
- if (min_common_depth <= aa.array_sz && + if (primary_domain_index <= aa.array_sz && !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < aa.n_arrays) { - index = lmb->aa_index * aa.array_sz + min_common_depth - 1; + index = lmb->aa_index * aa.array_sz + primary_domain_index - 1; nid = of_read_number(&aa.arrays[index], 1);
if (nid == 0xffff || nid >= nr_node_ids) @@ -708,18 +708,18 @@ static int __init parse_numa_properties(void) return -1; }
- min_common_depth = find_min_common_depth(); + primary_domain_index = find_primary_domain_index();
- if (min_common_depth < 0) { + if (primary_domain_index < 0) { /* - * if we fail to parse min_common_depth from device tree + * if we fail to parse primary_domain_index from device tree * mark the numa disabled, boot with numa disabled. */ numa_enabled = false; - return min_common_depth; + return primary_domain_index; }
- dbg("NUMA associativity depth for CPU/Memory: %d\n", min_common_depth); + dbg("NUMA associativity depth for CPU/Memory: %d\n", primary_domain_index);
/* * Even though we connect cpus to numa domains later in SMP @@ -926,7 +926,7 @@ static void __init find_possible_nodes(void) goto out; }
- max_nodes = of_read_number(&domains[min_common_depth], 1); + max_nodes = of_read_number(&domains[primary_domain_index], 1); pr_info("Partition configured for %d NUMA nodes.\n", max_nodes);
for (i = 0; i < max_nodes; i++) { @@ -935,7 +935,7 @@ static void __init find_possible_nodes(void) }
prop_length /= sizeof(int); - if (prop_length > min_common_depth + 2) + if (prop_length > primary_domain_index + 2) coregroup_enabled = 1;
out: @@ -1268,7 +1268,7 @@ int cpu_to_coregroup_id(int cpu) goto out;
index = of_read_number(associativity, 1); - if (index > min_common_depth + 1) + if (index > primary_domain_index + 1) return of_read_number(&associativity[index - 1], 1);
out:
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 0eacd06bb8adea8dd9edb0a30144166d9f227e64 ]
Also make related code cleanup that will allow adding FORM2_AFFINITY in later patches. No functional change in this patch.
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Reviewed-by: David Gibson david@gibson.dropbear.id.au Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210812132223.225214-3-aneesh.kumar@linux.ibm.com Stable-dep-of: b277fc793daf ("powerpc/papr_scm: Update the NUMA distance table for the target node") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/firmware.h | 4 +-- arch/powerpc/include/asm/prom.h | 2 +- arch/powerpc/kernel/prom_init.c | 2 +- arch/powerpc/mm/numa.c | 35 ++++++++++++++--------- arch/powerpc/platforms/pseries/firmware.c | 2 +- 5 files changed, 26 insertions(+), 19 deletions(-)
diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h index aa6a5ef5d4830..0cf648d829f15 100644 --- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -44,7 +44,7 @@ #define FW_FEATURE_OPAL ASM_CONST(0x0000000010000000) #define FW_FEATURE_SET_MODE ASM_CONST(0x0000000040000000) #define FW_FEATURE_BEST_ENERGY ASM_CONST(0x0000000080000000) -#define FW_FEATURE_TYPE1_AFFINITY ASM_CONST(0x0000000100000000) +#define FW_FEATURE_FORM1_AFFINITY ASM_CONST(0x0000000100000000) #define FW_FEATURE_PRRN ASM_CONST(0x0000000200000000) #define FW_FEATURE_DRMEM_V2 ASM_CONST(0x0000000400000000) #define FW_FEATURE_DRC_INFO ASM_CONST(0x0000000800000000) @@ -69,7 +69,7 @@ enum { FW_FEATURE_SPLPAR | FW_FEATURE_LPAR | FW_FEATURE_CMO | FW_FEATURE_VPHN | FW_FEATURE_XCMO | FW_FEATURE_SET_MODE | FW_FEATURE_BEST_ENERGY | - FW_FEATURE_TYPE1_AFFINITY | FW_FEATURE_PRRN | + FW_FEATURE_FORM1_AFFINITY | FW_FEATURE_PRRN | FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 | FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE | FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR | diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index 324a13351749a..df9fec9d232cb 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -147,7 +147,7 @@ extern int of_read_drc_info_cell(struct property **prop, #define OV5_MSI 0x0201 /* PCIe/MSI support */ #define OV5_CMO 0x0480 /* Cooperative Memory Overcommitment */ #define OV5_XCMO 0x0440 /* Page Coalescing */ -#define OV5_TYPE1_AFFINITY 0x0580 /* Type 1 NUMA affinity */ +#define OV5_FORM1_AFFINITY 0x0580 /* FORM1 NUMA affinity */ #define OV5_PRRN 0x0540 /* Platform Resource Reassignment */ #define OV5_HP_EVT 0x0604 /* Hot Plug Event support */ #define OV5_RESIZE_HPT 0x0601 /* Hash Page Table resizing */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index 9e71c0739f08d..a3bf3587a4162 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -1069,7 +1069,7 @@ static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = { #else 0, #endif - .associativity = OV5_FEAT(OV5_TYPE1_AFFINITY) | OV5_FEAT(OV5_PRRN), + .associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN), .bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT), .micro_checkpoint = 0, .reserved0 = 0, diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index a21f62fcda1e8..415cd3d258ff8 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -53,7 +53,10 @@ EXPORT_SYMBOL(node_data);
static int primary_domain_index; static int n_mem_addr_cells, n_mem_size_cells; -static int form1_affinity; + +#define FORM0_AFFINITY 0 +#define FORM1_AFFINITY 1 +static int affinity_form;
#define MAX_DISTANCE_REF_POINTS 4 static int distance_ref_points_depth; @@ -190,7 +193,7 @@ int __node_distance(int a, int b) int i; int distance = LOCAL_DISTANCE;
- if (!form1_affinity) + if (affinity_form == FORM0_AFFINITY) return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
for (i = 0; i < distance_ref_points_depth; i++) { @@ -210,7 +213,7 @@ static void initialize_distance_lookup_table(int nid, { int i;
- if (!form1_affinity) + if (affinity_form != FORM1_AFFINITY) return;
for (i = 0; i < distance_ref_points_depth; i++) { @@ -289,6 +292,17 @@ static int __init find_primary_domain_index(void) int index; struct device_node *root;
+ /* + * Check for which form of affinity. + */ + if (firmware_has_feature(FW_FEATURE_OPAL)) { + affinity_form = FORM1_AFFINITY; + } else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) { + dbg("Using form 1 affinity\n"); + affinity_form = FORM1_AFFINITY; + } else + affinity_form = FORM0_AFFINITY; + if (firmware_has_feature(FW_FEATURE_OPAL)) root = of_find_node_by_path("/ibm,opal"); else @@ -318,23 +332,16 @@ static int __init find_primary_domain_index(void) }
distance_ref_points_depth /= sizeof(int); - - if (firmware_has_feature(FW_FEATURE_OPAL) || - firmware_has_feature(FW_FEATURE_TYPE1_AFFINITY)) { - dbg("Using form 1 affinity\n"); - form1_affinity = 1; - } - - if (form1_affinity) { - index = of_read_number(distance_ref_points, 1); - } else { + if (affinity_form == FORM0_AFFINITY) { if (distance_ref_points_depth < 2) { printk(KERN_WARNING "NUMA: " - "short ibm,associativity-reference-points\n"); + "short ibm,associativity-reference-points\n"); goto err; }
index = of_read_number(&distance_ref_points[1], 1); + } else { + index = of_read_number(distance_ref_points, 1); }
/* diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c index 4c7b7f5a2ebca..5d4c2bc20bbab 100644 --- a/arch/powerpc/platforms/pseries/firmware.c +++ b/arch/powerpc/platforms/pseries/firmware.c @@ -119,7 +119,7 @@ struct vec5_fw_feature {
static __initdata struct vec5_fw_feature vec5_fw_features_table[] = { - {FW_FEATURE_TYPE1_AFFINITY, OV5_TYPE1_AFFINITY}, + {FW_FEATURE_FORM1_AFFINITY, OV5_FORM1_AFFINITY}, {FW_FEATURE_PRRN, OV5_PRRN}, {FW_FEATURE_DRMEM_V2, OV5_DRMEM_V2}, {FW_FEATURE_DRC_INFO, OV5_DRC_INFO},
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 8ddc6448ec5a5ef50eaa581a7dec0e12a02850ff ]
The associativity details of the newly added resourced are collected from the hypervisor via "ibm,configure-connector" rtas call. Update the numa distance details of the newly added numa node after the above call.
Instead of updating NUMA distance every time we lookup a node id from the associativity property, add helpers that can be used during boot which does this only once. Also remove the distance update from node id lookup helpers.
Currently, we duplicate parsing code for ibm,associativity and ibm,associativity-lookup-arrays in the kernel. The associativity array provided by these device tree properties are very similar and hence can use a helper to parse the node id and numa distance details.
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210812132223.225214-4-aneesh.kumar@linux.ibm.com Stable-dep-of: b277fc793daf ("powerpc/papr_scm: Update the NUMA distance table for the target node") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/topology.h | 2 + arch/powerpc/mm/numa.c | 212 +++++++++++++----- arch/powerpc/platforms/pseries/hotplug-cpu.c | 2 + .../platforms/pseries/hotplug-memory.c | 2 + 4 files changed, 161 insertions(+), 57 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 3beeb030cd78e..1604920d8d2de 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -64,6 +64,7 @@ static inline int early_cpu_to_node(int cpu) }
int of_drconf_to_nid_single(struct drmem_lmb *lmb); +void update_numa_distance(struct device_node *node);
#else
@@ -93,6 +94,7 @@ static inline int of_drconf_to_nid_single(struct drmem_lmb *lmb) return first_online_node; }
+static inline void update_numa_distance(struct device_node *node) {} #endif /* CONFIG_NUMA */
#if defined(CONFIG_NUMA) && defined(CONFIG_PPC_SPLPAR) diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 415cd3d258ff8..e61593ae25c9e 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -208,50 +208,35 @@ int __node_distance(int a, int b) } EXPORT_SYMBOL(__node_distance);
-static void initialize_distance_lookup_table(int nid, - const __be32 *associativity) +static int __associativity_to_nid(const __be32 *associativity, + int max_array_sz) { - int i; + int nid; + /* + * primary_domain_index is 1 based array index. + */ + int index = primary_domain_index - 1;
- if (affinity_form != FORM1_AFFINITY) - return; + if (!numa_enabled || index >= max_array_sz) + return NUMA_NO_NODE;
- for (i = 0; i < distance_ref_points_depth; i++) { - const __be32 *entry; + nid = of_read_number(&associativity[index], 1);
- entry = &associativity[be32_to_cpu(distance_ref_points[i]) - 1]; - distance_lookup_table[nid][i] = of_read_number(entry, 1); - } + /* POWER4 LPAR uses 0xffff as invalid node */ + if (nid == 0xffff || nid >= nr_node_ids) + nid = NUMA_NO_NODE; + return nid; } - /* * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA * info is found. */ static int associativity_to_nid(const __be32 *associativity) { - int nid = NUMA_NO_NODE; - - if (!numa_enabled) - goto out; - - if (of_read_number(associativity, 1) >= primary_domain_index) - nid = of_read_number(&associativity[primary_domain_index], 1); - - /* POWER4 LPAR uses 0xffff as invalid node */ - if (nid == 0xffff || nid >= nr_node_ids) - nid = NUMA_NO_NODE; - - if (nid > 0 && - of_read_number(associativity, 1) >= distance_ref_points_depth) { - /* - * Skip the length field and send start of associativity array - */ - initialize_distance_lookup_table(nid, associativity + 1); - } + int array_sz = of_read_number(associativity, 1);
-out: - return nid; + /* Skip the first element in the associativity array */ + return __associativity_to_nid((associativity + 1), array_sz); }
/* Returns the nid associated with the given device tree node, @@ -287,6 +272,60 @@ int of_node_to_nid(struct device_node *device) } EXPORT_SYMBOL(of_node_to_nid);
+static void __initialize_form1_numa_distance(const __be32 *associativity, + int max_array_sz) +{ + int i, nid; + + if (affinity_form != FORM1_AFFINITY) + return; + + nid = __associativity_to_nid(associativity, max_array_sz); + if (nid != NUMA_NO_NODE) { + for (i = 0; i < distance_ref_points_depth; i++) { + const __be32 *entry; + int index = be32_to_cpu(distance_ref_points[i]) - 1; + + /* + * broken hierarchy, return with broken distance table + */ + if (WARN(index >= max_array_sz, "Broken ibm,associativity property")) + return; + + entry = &associativity[index]; + distance_lookup_table[nid][i] = of_read_number(entry, 1); + } + } +} + +static void initialize_form1_numa_distance(const __be32 *associativity) +{ + int array_sz; + + array_sz = of_read_number(associativity, 1); + /* Skip the first element in the associativity array */ + __initialize_form1_numa_distance(associativity + 1, array_sz); +} + +/* + * Used to update distance information w.r.t newly added node. + */ +void update_numa_distance(struct device_node *node) +{ + if (affinity_form == FORM0_AFFINITY) + return; + else if (affinity_form == FORM1_AFFINITY) { + const __be32 *associativity; + + associativity = of_get_associativity(node); + if (!associativity) + return; + + initialize_form1_numa_distance(associativity); + return; + } +} + static int __init find_primary_domain_index(void) { int index; @@ -433,6 +472,38 @@ static int of_get_assoc_arrays(struct assoc_arrays *aa) return 0; }
+static int get_nid_and_numa_distance(struct drmem_lmb *lmb) +{ + struct assoc_arrays aa = { .arrays = NULL }; + int default_nid = NUMA_NO_NODE; + int nid = default_nid; + int rc, index; + + if ((primary_domain_index < 0) || !numa_enabled) + return default_nid; + + rc = of_get_assoc_arrays(&aa); + if (rc) + return default_nid; + + if (primary_domain_index <= aa.array_sz && + !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < aa.n_arrays) { + const __be32 *associativity; + + index = lmb->aa_index * aa.array_sz; + associativity = &aa.arrays[index]; + nid = __associativity_to_nid(associativity, aa.array_sz); + if (nid > 0 && affinity_form == FORM1_AFFINITY) { + /* + * lookup array associativity entries have + * no length of the array as the first element. + */ + __initialize_form1_numa_distance(associativity, aa.array_sz); + } + } + return nid; +} + /* * This is like of_node_to_nid_single() for memory represented in the * ibm,dynamic-reconfiguration-memory node. @@ -453,26 +524,19 @@ int of_drconf_to_nid_single(struct drmem_lmb *lmb)
if (primary_domain_index <= aa.array_sz && !(lmb->flags & DRCONF_MEM_AI_INVALID) && lmb->aa_index < aa.n_arrays) { - index = lmb->aa_index * aa.array_sz + primary_domain_index - 1; - nid = of_read_number(&aa.arrays[index], 1); - - if (nid == 0xffff || nid >= nr_node_ids) - nid = default_nid; + const __be32 *associativity;
- if (nid > 0) { - index = lmb->aa_index * aa.array_sz; - initialize_distance_lookup_table(nid, - &aa.arrays[index]); - } + index = lmb->aa_index * aa.array_sz; + associativity = &aa.arrays[index]; + nid = __associativity_to_nid(associativity, aa.array_sz); } - return nid; }
#ifdef CONFIG_PPC_SPLPAR -static int vphn_get_nid(long lcpu) + +static int __vphn_get_associativity(long lcpu, __be32 *associativity) { - __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; long rc, hwid;
/* @@ -492,12 +556,30 @@ static int vphn_get_nid(long lcpu)
rc = hcall_vphn(hwid, VPHN_FLAG_VCPU, associativity); if (rc == H_SUCCESS) - return associativity_to_nid(associativity); + return 0; }
+ return -1; +} + +static int vphn_get_nid(long lcpu) +{ + __be32 associativity[VPHN_ASSOC_BUFSIZE] = {0}; + + + if (!__vphn_get_associativity(lcpu, associativity)) + return associativity_to_nid(associativity); + return NUMA_NO_NODE; + } #else + +static int __vphn_get_associativity(long lcpu, __be32 *associativity) +{ + return -1; +} + static int vphn_get_nid(long unused) { return NUMA_NO_NODE; @@ -692,7 +774,7 @@ static int __init numa_setup_drmem_lmb(struct drmem_lmb *lmb, size = read_n_cells(n_mem_size_cells, usm); }
- nid = of_drconf_to_nid_single(lmb); + nid = get_nid_and_numa_distance(lmb); fake_numa_create_new_node(((base + size) >> PAGE_SHIFT), &nid); node_set_online(nid); @@ -709,6 +791,7 @@ static int __init parse_numa_properties(void) struct device_node *memory; int default_nid = 0; unsigned long i; + const __be32 *associativity;
if (numa_enabled == 0) { printk(KERN_WARNING "NUMA disabled by user\n"); @@ -734,18 +817,30 @@ static int __init parse_numa_properties(void) * each node to be onlined must have NODE_DATA etc backing it. */ for_each_present_cpu(i) { + __be32 vphn_assoc[VPHN_ASSOC_BUFSIZE]; struct device_node *cpu; - int nid = vphn_get_nid(i); + int nid = NUMA_NO_NODE;
- /* - * Don't fall back to default_nid yet -- we will plug - * cpus into nodes once the memory scan has discovered - * the topology. - */ - if (nid == NUMA_NO_NODE) { + memset(vphn_assoc, 0, VPHN_ASSOC_BUFSIZE * sizeof(__be32)); + + if (__vphn_get_associativity(i, vphn_assoc) == 0) { + nid = associativity_to_nid(vphn_assoc); + initialize_form1_numa_distance(vphn_assoc); + } else { + + /* + * Don't fall back to default_nid yet -- we will plug + * cpus into nodes once the memory scan has discovered + * the topology. + */ cpu = of_get_cpu_node(i, NULL); BUG_ON(!cpu); - nid = of_node_to_nid_single(cpu); + + associativity = of_get_associativity(cpu); + if (associativity) { + nid = associativity_to_nid(associativity); + initialize_form1_numa_distance(associativity); + } of_node_put(cpu); }
@@ -783,8 +878,11 @@ static int __init parse_numa_properties(void) * have associativity properties. If none, then * everything goes to default_nid. */ - nid = of_node_to_nid_single(memory); - if (nid < 0) + associativity = of_get_associativity(memory); + if (associativity) { + nid = associativity_to_nid(associativity); + initialize_form1_numa_distance(associativity); + } else nid = default_nid;
fake_numa_create_new_node(((start + size) >> PAGE_SHIFT), &nid); diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c index 325f3b220f360..1f8f97210d143 100644 --- a/arch/powerpc/platforms/pseries/hotplug-cpu.c +++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c @@ -484,6 +484,8 @@ static ssize_t dlpar_cpu_add(u32 drc_index) return saved_rc; }
+ update_numa_distance(dn); + rc = dlpar_online_cpu(dn); if (rc) { saved_rc = rc; diff --git a/arch/powerpc/platforms/pseries/hotplug-memory.c b/arch/powerpc/platforms/pseries/hotplug-memory.c index 7efe6ec5d14a4..a5f968b5fa3a8 100644 --- a/arch/powerpc/platforms/pseries/hotplug-memory.c +++ b/arch/powerpc/platforms/pseries/hotplug-memory.c @@ -180,6 +180,8 @@ static int update_lmb_associativity_index(struct drmem_lmb *lmb) return -ENODEV; }
+ update_numa_distance(lmb_node); + dr_node = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory"); if (!dr_node) { dlpar_free_cc_nodes(lmb_node);
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit ef31cb83d19c4c589d650747cd5a7e502be9f665 ]
This helper is only used with the dispatch trace log collection. A later patch will add Form2 affinity support and this change helps in keeping that simpler. Also add a comment explaining we don't expect the code to be called with FORM0
Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Reviewed-by: David Gibson david@gibson.dropbear.id.au Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210812132223.225214-5-aneesh.kumar@linux.ibm.com Stable-dep-of: b277fc793daf ("powerpc/papr_scm: Update the NUMA distance table for the target node") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/topology.h | 4 ++-- arch/powerpc/mm/numa.c | 10 +++++++++- arch/powerpc/platforms/pseries/lpar.c | 4 ++-- 3 files changed, 13 insertions(+), 5 deletions(-)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index 1604920d8d2de..b239ef589ae06 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -36,7 +36,7 @@ static inline int pcibus_to_node(struct pci_bus *bus) cpu_all_mask : \ cpumask_of_node(pcibus_to_node(bus)))
-extern int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc); +int cpu_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc); extern int __node_distance(int, int); #define node_distance(a, b) __node_distance(a, b)
@@ -84,7 +84,7 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
-static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) +static inline int cpu_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) { return 0; } diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index e61593ae25c9e..010476abec344 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -166,7 +166,7 @@ static void unmap_cpu_from_node(unsigned long cpu) } #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
-int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) +static int __cpu_form1_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) { int dist = 0;
@@ -182,6 +182,14 @@ int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) return dist; }
+int cpu_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) +{ + /* We should not get called with FORM0 */ + VM_WARN_ON(affinity_form == FORM0_AFFINITY); + + return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc); +} + /* must hold reference to node during call */ static const __be32 *of_get_associativity(struct device_node *dev) { diff --git a/arch/powerpc/platforms/pseries/lpar.c b/arch/powerpc/platforms/pseries/lpar.c index 115d196560b8b..28396a7e77d6f 100644 --- a/arch/powerpc/platforms/pseries/lpar.c +++ b/arch/powerpc/platforms/pseries/lpar.c @@ -261,7 +261,7 @@ static int cpu_relative_dispatch_distance(int last_disp_cpu, int cur_disp_cpu) if (!last_disp_cpu_assoc || !cur_disp_cpu_assoc) return -EIO;
- return cpu_distance(last_disp_cpu_assoc, cur_disp_cpu_assoc); + return cpu_relative_distance(last_disp_cpu_assoc, cur_disp_cpu_assoc); }
static int cpu_home_node_dispatch_distance(int disp_cpu) @@ -281,7 +281,7 @@ static int cpu_home_node_dispatch_distance(int disp_cpu) if (!disp_cpu_assoc || !vcpu_assoc) return -EIO;
- return cpu_distance(disp_cpu_assoc, vcpu_assoc); + return cpu_relative_distance(disp_cpu_assoc, vcpu_assoc); }
static void update_vcpu_disp_stat(int disp_cpu)
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 1c6b5a7e74052768977855f95d6b8812f6e7772c ]
PAPR interface currently supports two different ways of communicating resource grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping. Form 0 is the older format and is now considered deprecated. This patch adds another resource grouping named FORM2.
Signed-off-by: Daniel Henrique Barboza danielhb413@gmail.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com Stable-dep-of: b277fc793daf ("powerpc/papr_scm: Update the NUMA distance table for the target node") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/powerpc/associativity.rst | 104 ++++++++++++ arch/powerpc/include/asm/firmware.h | 3 +- arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/kernel/prom_init.c | 3 +- arch/powerpc/mm/numa.c | 187 ++++++++++++++++++---- arch/powerpc/platforms/pseries/firmware.c | 1 + 6 files changed, 262 insertions(+), 37 deletions(-) create mode 100644 Documentation/powerpc/associativity.rst
diff --git a/Documentation/powerpc/associativity.rst b/Documentation/powerpc/associativity.rst new file mode 100644 index 0000000000000..07e7dd3d6c87e --- /dev/null +++ b/Documentation/powerpc/associativity.rst @@ -0,0 +1,104 @@ +============================ +NUMA resource associativity +============================= + +Associativity represents the groupings of the various platform resources into +domains of substantially similar mean performance relative to resources outside +of that domain. Resources subsets of a given domain that exhibit better +performance relative to each other than relative to other resources subsets +are represented as being members of a sub-grouping domain. This performance +characteristic is presented in terms of NUMA node distance within the Linux kernel. +From the platform view, these groups are also referred to as domains. + +PAPR interface currently supports different ways of communicating these resource +grouping details to the OS. These are referred to as Form 0, Form 1 and Form2 +associativity grouping. Form 0 is the oldest format and is now considered deprecated. + +Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5 property". +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1. +A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity +bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used. + +Form 0 +----- +Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE). + +Form 1 +----- +With Form 1 a combination of ibm,associativity-reference-points, and ibm,associativity +device tree properties are used to determine the NUMA distance between resource groups/domains. + +The “ibm,associativity” property contains a list of one or more numbers (domainID) +representing the resource’s platform grouping domains. + +The “ibm,associativity-reference-points” property contains a list of one or more numbers +(domainID index) that represents the 1 based ordinal in the associativity lists. +The list of domainID indexes represents an increasing hierarchy of resource grouping. + +ex: +{ primary domainID index, secondary domainID index, tertiary domainID index.. } + +Linux kernel uses the domainID at the primary domainID index as the NUMA node id. +Linux kernel computes NUMA distance between two domains by recursively comparing +if they belong to the same higher-level domains. For mismatch at every higher +level of the resource group, the kernel doubles the NUMA distance between the +comparing domains. + +Form 2 +------- +Form 2 associativity format adds separate device tree properties representing NUMA node distance +thereby making the node distance computation flexible. Form 2 also allows flexible primary +domain numbering. With numa distance computation now detached from the index value in +"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain +ids at the same domainID index representing resource groups of different performance/latency +characteristics. + +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the +"ibm,architecture-vec-5" property. + +"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing +the domainIDs present in the system. The offset of the domainID in this property is +used as an index while computing numa distance information via "ibm,numa-distance-table". + +prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by +N domainID encoded as with encode-int + +For ex: +"ibm,numa-lookup-index-table" = {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when +computing the distance of domain 8 from other domains present in the system. For the rest of +this document, this offset will be referred to as domain distance offset. + +"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA +distance between resource groups/domains present in the system. + +prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by +N distance values encoded as with encode-bytes. The max distance value we could encode is 255. +The number N must be equal to the square of m where m is the number of domainIDs in the +numa-lookup-index-table. + +For ex: +ibm,numa-lookup-index-table = <3 0 8 40>; +ibm,numa-distace-table = <9>, /bits/ 8 < 10 20 80 + 20 10 160 + 80 160 10>; + | 0 8 40 +--|------------ + | +0 | 10 20 80 + | +8 | 20 10 160 + | +40| 80 160 10 + +A possible "ibm,associativity" property for resources in node 0, 8 and 40 + +{ 3, 6, 7, 0 } +{ 3, 6, 9, 8 } +{ 3, 6, 7, 40} + +With "ibm,associativity-reference-points" { 0x3 } + +"ibm,lookup-index-table" helps in having a compact representation of distance matrix. +Since domainID can be sparse, the matrix of distances can also be effectively sparse. +With "ibm,lookup-index-table" we can achieve a compact representation of +distance information. diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h index 0cf648d829f15..89a31f1c7b118 100644 --- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -53,6 +53,7 @@ #define FW_FEATURE_ULTRAVISOR ASM_CONST(0x0000004000000000) #define FW_FEATURE_STUFF_TCE ASM_CONST(0x0000008000000000) #define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0000010000000000) +#define FW_FEATURE_FORM2_AFFINITY ASM_CONST(0x0000020000000000)
#ifndef __ASSEMBLY__
@@ -73,7 +74,7 @@ enum { FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 | FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE | FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR | - FW_FEATURE_RPT_INVALIDATE, + FW_FEATURE_RPT_INVALIDATE | FW_FEATURE_FORM2_AFFINITY, FW_FEATURE_PSERIES_ALWAYS = 0, FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR, FW_FEATURE_POWERNV_ALWAYS = 0, diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index df9fec9d232cb..5c80152e8f188 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -149,6 +149,7 @@ extern int of_read_drc_info_cell(struct property **prop, #define OV5_XCMO 0x0440 /* Page Coalescing */ #define OV5_FORM1_AFFINITY 0x0580 /* FORM1 NUMA affinity */ #define OV5_PRRN 0x0540 /* Platform Resource Reassignment */ +#define OV5_FORM2_AFFINITY 0x0520 /* Form2 NUMA affinity */ #define OV5_HP_EVT 0x0604 /* Hot Plug Event support */ #define OV5_RESIZE_HPT 0x0601 /* Hash Page Table resizing */ #define OV5_PFO_HW_RNG 0x1180 /* PFO Random Number Generator */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index a3bf3587a4162..6f7ad80763159 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -1069,7 +1069,8 @@ static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = { #else 0, #endif - .associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN), + .associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN) | + OV5_FEAT(OV5_FORM2_AFFINITY), .bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT), .micro_checkpoint = 0, .reserved0 = 0, diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 010476abec344..cfc170935a58b 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -56,12 +56,17 @@ static int n_mem_addr_cells, n_mem_size_cells;
#define FORM0_AFFINITY 0 #define FORM1_AFFINITY 1 +#define FORM2_AFFINITY 2 static int affinity_form;
#define MAX_DISTANCE_REF_POINTS 4 static int distance_ref_points_depth; static const __be32 *distance_ref_points; static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS]; +static int numa_distance_table[MAX_NUMNODES][MAX_NUMNODES] = { + [0 ... MAX_NUMNODES - 1] = { [0 ... MAX_NUMNODES - 1] = -1 } +}; +static int numa_id_index_table[MAX_NUMNODES] = { [0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE };
/* * Allocate node_to_cpumask_map based on number of available nodes @@ -166,6 +171,54 @@ static void unmap_cpu_from_node(unsigned long cpu) } #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */
+static int __associativity_to_nid(const __be32 *associativity, + int max_array_sz) +{ + int nid; + /* + * primary_domain_index is 1 based array index. + */ + int index = primary_domain_index - 1; + + if (!numa_enabled || index >= max_array_sz) + return NUMA_NO_NODE; + + nid = of_read_number(&associativity[index], 1); + + /* POWER4 LPAR uses 0xffff as invalid node */ + if (nid == 0xffff || nid >= nr_node_ids) + nid = NUMA_NO_NODE; + return nid; +} +/* + * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA + * info is found. + */ +static int associativity_to_nid(const __be32 *associativity) +{ + int array_sz = of_read_number(associativity, 1); + + /* Skip the first element in the associativity array */ + return __associativity_to_nid((associativity + 1), array_sz); +} + +static int __cpu_form2_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) +{ + int dist; + int node1, node2; + + node1 = associativity_to_nid(cpu1_assoc); + node2 = associativity_to_nid(cpu2_assoc); + + dist = numa_distance_table[node1][node2]; + if (dist <= LOCAL_DISTANCE) + return 0; + else if (dist <= REMOTE_DISTANCE) + return 1; + else + return 2; +} + static int __cpu_form1_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) { int dist = 0; @@ -186,8 +239,9 @@ int cpu_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) { /* We should not get called with FORM0 */ VM_WARN_ON(affinity_form == FORM0_AFFINITY); - - return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc); + if (affinity_form == FORM1_AFFINITY) + return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc); + return __cpu_form2_relative_distance(cpu1_assoc, cpu2_assoc); }
/* must hold reference to node during call */ @@ -201,7 +255,9 @@ int __node_distance(int a, int b) int i; int distance = LOCAL_DISTANCE;
- if (affinity_form == FORM0_AFFINITY) + if (affinity_form == FORM2_AFFINITY) + return numa_distance_table[a][b]; + else if (affinity_form == FORM0_AFFINITY) return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
for (i = 0; i < distance_ref_points_depth; i++) { @@ -216,37 +272,6 @@ int __node_distance(int a, int b) } EXPORT_SYMBOL(__node_distance);
-static int __associativity_to_nid(const __be32 *associativity, - int max_array_sz) -{ - int nid; - /* - * primary_domain_index is 1 based array index. - */ - int index = primary_domain_index - 1; - - if (!numa_enabled || index >= max_array_sz) - return NUMA_NO_NODE; - - nid = of_read_number(&associativity[index], 1); - - /* POWER4 LPAR uses 0xffff as invalid node */ - if (nid == 0xffff || nid >= nr_node_ids) - nid = NUMA_NO_NODE; - return nid; -} -/* - * Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA - * info is found. - */ -static int associativity_to_nid(const __be32 *associativity) -{ - int array_sz = of_read_number(associativity, 1); - - /* Skip the first element in the associativity array */ - return __associativity_to_nid((associativity + 1), array_sz); -} - /* Returns the nid associated with the given device tree node, * or -1 if not found. */ @@ -320,6 +345,8 @@ static void initialize_form1_numa_distance(const __be32 *associativity) */ void update_numa_distance(struct device_node *node) { + int nid; + if (affinity_form == FORM0_AFFINITY) return; else if (affinity_form == FORM1_AFFINITY) { @@ -332,6 +359,84 @@ void update_numa_distance(struct device_node *node) initialize_form1_numa_distance(associativity); return; } + + /* FORM2 affinity */ + nid = of_node_to_nid_single(node); + if (nid == NUMA_NO_NODE) + return; + + /* + * With FORM2 we expect NUMA distance of all possible NUMA + * nodes to be provided during boot. + */ + WARN(numa_distance_table[nid][nid] == -1, + "NUMA distance details for node %d not provided\n", nid); +} + +/* + * ibm,numa-lookup-index-table= {N, domainid1, domainid2, ..... domainidN} + * ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6, .... N elements} + */ +static void initialize_form2_numa_distance_lookup_table(void) +{ + int i, j; + struct device_node *root; + const __u8 *numa_dist_table; + const __be32 *numa_lookup_index; + int numa_dist_table_length; + int max_numa_index, distance_index; + + if (firmware_has_feature(FW_FEATURE_OPAL)) + root = of_find_node_by_path("/ibm,opal"); + else + root = of_find_node_by_path("/rtas"); + if (!root) + root = of_find_node_by_path("/"); + + numa_lookup_index = of_get_property(root, "ibm,numa-lookup-index-table", NULL); + max_numa_index = of_read_number(&numa_lookup_index[0], 1); + + /* first element of the array is the size and is encode-int */ + numa_dist_table = of_get_property(root, "ibm,numa-distance-table", NULL); + numa_dist_table_length = of_read_number((const __be32 *)&numa_dist_table[0], 1); + /* Skip the size which is encoded int */ + numa_dist_table += sizeof(__be32); + + pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n", + numa_dist_table_length, max_numa_index); + + for (i = 0; i < max_numa_index; i++) + /* +1 skip the max_numa_index in the property */ + numa_id_index_table[i] = of_read_number(&numa_lookup_index[i + 1], 1); + + + if (numa_dist_table_length != max_numa_index * max_numa_index) { + WARN(1, "Wrong NUMA distance information\n"); + /* consider everybody else just remote. */ + for (i = 0; i < max_numa_index; i++) { + for (j = 0; j < max_numa_index; j++) { + int nodeA = numa_id_index_table[i]; + int nodeB = numa_id_index_table[j]; + + if (nodeA == nodeB) + numa_distance_table[nodeA][nodeB] = LOCAL_DISTANCE; + else + numa_distance_table[nodeA][nodeB] = REMOTE_DISTANCE; + } + } + } + + distance_index = 0; + for (i = 0; i < max_numa_index; i++) { + for (j = 0; j < max_numa_index; j++) { + int nodeA = numa_id_index_table[i]; + int nodeB = numa_id_index_table[j]; + + numa_distance_table[nodeA][nodeB] = numa_dist_table[distance_index++]; + pr_debug("dist[%d][%d]=%d ", nodeA, nodeB, numa_distance_table[nodeA][nodeB]); + } + } + of_node_put(root); }
static int __init find_primary_domain_index(void) @@ -344,6 +449,9 @@ static int __init find_primary_domain_index(void) */ if (firmware_has_feature(FW_FEATURE_OPAL)) { affinity_form = FORM1_AFFINITY; + } else if (firmware_has_feature(FW_FEATURE_FORM2_AFFINITY)) { + dbg("Using form 2 affinity\n"); + affinity_form = FORM2_AFFINITY; } else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) { dbg("Using form 1 affinity\n"); affinity_form = FORM1_AFFINITY; @@ -388,9 +496,12 @@ static int __init find_primary_domain_index(void)
index = of_read_number(&distance_ref_points[1], 1); } else { + /* + * Both FORM1 and FORM2 affinity find the primary domain details + * at the same offset. + */ index = of_read_number(distance_ref_points, 1); } - /* * Warn and cap if the hardware supports more than * MAX_DISTANCE_REF_POINTS domains. @@ -819,6 +930,12 @@ static int __init parse_numa_properties(void)
dbg("NUMA associativity depth for CPU/Memory: %d\n", primary_domain_index);
+ /* + * If it is FORM2 initialize the distance table here. + */ + if (affinity_form == FORM2_AFFINITY) + initialize_form2_numa_distance_lookup_table(); + /* * Even though we connect cpus to numa domains later in SMP * init, we need to know the node ids now. This is because diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c index 5d4c2bc20bbab..f162156b7b68d 100644 --- a/arch/powerpc/platforms/pseries/firmware.c +++ b/arch/powerpc/platforms/pseries/firmware.c @@ -123,6 +123,7 @@ vec5_fw_features_table[] = { {FW_FEATURE_PRRN, OV5_PRRN}, {FW_FEATURE_DRMEM_V2, OV5_DRMEM_V2}, {FW_FEATURE_DRC_INFO, OV5_DRC_INFO}, + {FW_FEATURE_FORM2_AFFINITY, OV5_FORM2_AFFINITY}, };
static void __init fw_vec5_feature_init(const char *vec5, unsigned long len)
Hi Greg,
On Tue, Apr 18, 2023 at 02:22:04PM +0200, Greg Kroah-Hartman wrote:
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit 1c6b5a7e74052768977855f95d6b8812f6e7772c ]
PAPR interface currently supports two different ways of communicating resource grouping details to the OS. These are referred to as Form 0 and Form 1 associativity grouping. Form 0 is the older format and is now considered deprecated. This patch adds another resource grouping named FORM2.
Signed-off-by: Daniel Henrique Barboza danielhb413@gmail.com Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20210812132223.225214-6-aneesh.kumar@linux.ibm.com Stable-dep-of: b277fc793daf ("powerpc/papr_scm: Update the NUMA distance table for the target node") Signed-off-by: Sasha Levin sashal@kernel.org
Documentation/powerpc/associativity.rst | 104 ++++++++++++ arch/powerpc/include/asm/firmware.h | 3 +- arch/powerpc/include/asm/prom.h | 1 + arch/powerpc/kernel/prom_init.c | 3 +- arch/powerpc/mm/numa.c | 187 ++++++++++++++++++---- arch/powerpc/platforms/pseries/firmware.c | 1 + 6 files changed, 262 insertions(+), 37 deletions(-) create mode 100644 Documentation/powerpc/associativity.rst
diff --git a/Documentation/powerpc/associativity.rst b/Documentation/powerpc/associativity.rst new file mode 100644 index 0000000000000..07e7dd3d6c87e --- /dev/null +++ b/Documentation/powerpc/associativity.rst @@ -0,0 +1,104 @@ +============================ +NUMA resource associativity +=============================
+Associativity represents the groupings of the various platform resources into +domains of substantially similar mean performance relative to resources outside +of that domain. Resources subsets of a given domain that exhibit better +performance relative to each other than relative to other resources subsets +are represented as being members of a sub-grouping domain. This performance +characteristic is presented in terms of NUMA node distance within the Linux kernel. +From the platform view, these groups are also referred to as domains.
+PAPR interface currently supports different ways of communicating these resource +grouping details to the OS. These are referred to as Form 0, Form 1 and Form2 +associativity grouping. Form 0 is the oldest format and is now considered deprecated.
+Hypervisor indicates the type/form of associativity used via "ibm,architecture-vec-5 property". +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1. +A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity +bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
+Form 0 +----- +Form 0 associativity supports only two NUMA distances (LOCAL and REMOTE).
+Form 1 +----- +With Form 1 a combination of ibm,associativity-reference-points, and ibm,associativity +device tree properties are used to determine the NUMA distance between resource groups/domains.
+The “ibm,associativity” property contains a list of one or more numbers (domainID) +representing the resource’s platform grouping domains.
+The “ibm,associativity-reference-points” property contains a list of one or more numbers +(domainID index) that represents the 1 based ordinal in the associativity lists. +The list of domainID indexes represents an increasing hierarchy of resource grouping.
+ex: +{ primary domainID index, secondary domainID index, tertiary domainID index.. }
+Linux kernel uses the domainID at the primary domainID index as the NUMA node id. +Linux kernel computes NUMA distance between two domains by recursively comparing +if they belong to the same higher-level domains. For mismatch at every higher +level of the resource group, the kernel doubles the NUMA distance between the +comparing domains.
+Form 2 +------- +Form 2 associativity format adds separate device tree properties representing NUMA node distance +thereby making the node distance computation flexible. Form 2 also allows flexible primary +domain numbering. With numa distance computation now detached from the index value in +"ibm,associativity-reference-points" property, Form 2 allows a large number of primary domain +ids at the same domainID index representing resource groups of different performance/latency +characteristics.
+Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the +"ibm,architecture-vec-5" property.
+"ibm,numa-lookup-index-table" property contains a list of one or more numbers representing +the domainIDs present in the system. The offset of the domainID in this property is +used as an index while computing numa distance information via "ibm,numa-distance-table".
+prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by +N domainID encoded as with encode-int
+For ex: +"ibm,numa-lookup-index-table" = {4, 0, 8, 250, 252}. The offset of domainID 8 (2) is used when +computing the distance of domain 8 from other domains present in the system. For the rest of +this document, this offset will be referred to as domain distance offset.
+"ibm,numa-distance-table" property contains a list of one or more numbers representing the NUMA +distance between resource groups/domains present in the system.
+prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by +N distance values encoded as with encode-bytes. The max distance value we could encode is 255. +The number N must be equal to the square of m where m is the number of domainIDs in the +numa-lookup-index-table.
+For ex: +ibm,numa-lookup-index-table = <3 0 8 40>; +ibm,numa-distace-table = <9>, /bits/ 8 < 10 20 80
20 10 160
80 160 10>;
- | 0 8 40
+--|------------
- |
+0 | 10 20 80
- |
+8 | 20 10 160
- |
+40| 80 160 10
+A possible "ibm,associativity" property for resources in node 0, 8 and 40
+{ 3, 6, 7, 0 } +{ 3, 6, 9, 8 } +{ 3, 6, 7, 40}
+With "ibm,associativity-reference-points" { 0x3 }
+"ibm,lookup-index-table" helps in having a compact representation of distance matrix. +Since domainID can be sparse, the matrix of distances can also be effectively sparse. +With "ibm,lookup-index-table" we can achieve a compact representation of +distance information. diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h index 0cf648d829f15..89a31f1c7b118 100644 --- a/arch/powerpc/include/asm/firmware.h +++ b/arch/powerpc/include/asm/firmware.h @@ -53,6 +53,7 @@ #define FW_FEATURE_ULTRAVISOR ASM_CONST(0x0000004000000000) #define FW_FEATURE_STUFF_TCE ASM_CONST(0x0000008000000000) #define FW_FEATURE_RPT_INVALIDATE ASM_CONST(0x0000010000000000) +#define FW_FEATURE_FORM2_AFFINITY ASM_CONST(0x0000020000000000) #ifndef __ASSEMBLY__ @@ -73,7 +74,7 @@ enum { FW_FEATURE_HPT_RESIZE | FW_FEATURE_DRMEM_V2 | FW_FEATURE_DRC_INFO | FW_FEATURE_BLOCK_REMOVE | FW_FEATURE_PAPR_SCM | FW_FEATURE_ULTRAVISOR |
FW_FEATURE_RPT_INVALIDATE,
FW_FEATURE_PSERIES_ALWAYS = 0, FW_FEATURE_POWERNV_POSSIBLE = FW_FEATURE_OPAL | FW_FEATURE_ULTRAVISOR, FW_FEATURE_POWERNV_ALWAYS = 0,FW_FEATURE_RPT_INVALIDATE | FW_FEATURE_FORM2_AFFINITY,
diff --git a/arch/powerpc/include/asm/prom.h b/arch/powerpc/include/asm/prom.h index df9fec9d232cb..5c80152e8f188 100644 --- a/arch/powerpc/include/asm/prom.h +++ b/arch/powerpc/include/asm/prom.h @@ -149,6 +149,7 @@ extern int of_read_drc_info_cell(struct property **prop, #define OV5_XCMO 0x0440 /* Page Coalescing */ #define OV5_FORM1_AFFINITY 0x0580 /* FORM1 NUMA affinity */ #define OV5_PRRN 0x0540 /* Platform Resource Reassignment */ +#define OV5_FORM2_AFFINITY 0x0520 /* Form2 NUMA affinity */ #define OV5_HP_EVT 0x0604 /* Hot Plug Event support */ #define OV5_RESIZE_HPT 0x0601 /* Hash Page Table resizing */ #define OV5_PFO_HW_RNG 0x1180 /* PFO Random Number Generator */ diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c index a3bf3587a4162..6f7ad80763159 100644 --- a/arch/powerpc/kernel/prom_init.c +++ b/arch/powerpc/kernel/prom_init.c @@ -1069,7 +1069,8 @@ static const struct ibm_arch_vec ibm_architecture_vec_template __initconst = { #else 0, #endif
.associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN),
.associativity = OV5_FEAT(OV5_FORM1_AFFINITY) | OV5_FEAT(OV5_PRRN) |
.bin_opts = OV5_FEAT(OV5_RESIZE_HPT) | OV5_FEAT(OV5_HP_EVT), .micro_checkpoint = 0, .reserved0 = 0,OV5_FEAT(OV5_FORM2_AFFINITY),
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index 010476abec344..cfc170935a58b 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -56,12 +56,17 @@ static int n_mem_addr_cells, n_mem_size_cells; #define FORM0_AFFINITY 0 #define FORM1_AFFINITY 1 +#define FORM2_AFFINITY 2 static int affinity_form; #define MAX_DISTANCE_REF_POINTS 4 static int distance_ref_points_depth; static const __be32 *distance_ref_points; static int distance_lookup_table[MAX_NUMNODES][MAX_DISTANCE_REF_POINTS]; +static int numa_distance_table[MAX_NUMNODES][MAX_NUMNODES] = {
- [0 ... MAX_NUMNODES - 1] = { [0 ... MAX_NUMNODES - 1] = -1 }
+}; +static int numa_id_index_table[MAX_NUMNODES] = { [0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE }; /*
- Allocate node_to_cpumask_map based on number of available nodes
@@ -166,6 +171,54 @@ static void unmap_cpu_from_node(unsigned long cpu) } #endif /* CONFIG_HOTPLUG_CPU || CONFIG_PPC_SPLPAR */ +static int __associativity_to_nid(const __be32 *associativity,
int max_array_sz)
+{
- int nid;
- /*
* primary_domain_index is 1 based array index.
*/
- int index = primary_domain_index - 1;
- if (!numa_enabled || index >= max_array_sz)
return NUMA_NO_NODE;
- nid = of_read_number(&associativity[index], 1);
- /* POWER4 LPAR uses 0xffff as invalid node */
- if (nid == 0xffff || nid >= nr_node_ids)
nid = NUMA_NO_NODE;
- return nid;
+} +/*
- Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
- info is found.
- */
+static int associativity_to_nid(const __be32 *associativity) +{
- int array_sz = of_read_number(associativity, 1);
- /* Skip the first element in the associativity array */
- return __associativity_to_nid((associativity + 1), array_sz);
+}
+static int __cpu_form2_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) +{
- int dist;
- int node1, node2;
- node1 = associativity_to_nid(cpu1_assoc);
- node2 = associativity_to_nid(cpu2_assoc);
- dist = numa_distance_table[node1][node2];
- if (dist <= LOCAL_DISTANCE)
return 0;
- else if (dist <= REMOTE_DISTANCE)
return 1;
- else
return 2;
+}
static int __cpu_form1_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) { int dist = 0; @@ -186,8 +239,9 @@ int cpu_relative_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc) { /* We should not get called with FORM0 */ VM_WARN_ON(affinity_form == FORM0_AFFINITY);
- return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc);
- if (affinity_form == FORM1_AFFINITY)
return __cpu_form1_relative_distance(cpu1_assoc, cpu2_assoc);
- return __cpu_form2_relative_distance(cpu1_assoc, cpu2_assoc);
} /* must hold reference to node during call */ @@ -201,7 +255,9 @@ int __node_distance(int a, int b) int i; int distance = LOCAL_DISTANCE;
- if (affinity_form == FORM0_AFFINITY)
- if (affinity_form == FORM2_AFFINITY)
return numa_distance_table[a][b];
- else if (affinity_form == FORM0_AFFINITY) return ((a == b) ? LOCAL_DISTANCE : REMOTE_DISTANCE);
for (i = 0; i < distance_ref_points_depth; i++) { @@ -216,37 +272,6 @@ int __node_distance(int a, int b) } EXPORT_SYMBOL(__node_distance); -static int __associativity_to_nid(const __be32 *associativity,
int max_array_sz)
-{
- int nid;
- /*
* primary_domain_index is 1 based array index.
*/
- int index = primary_domain_index - 1;
- if (!numa_enabled || index >= max_array_sz)
return NUMA_NO_NODE;
- nid = of_read_number(&associativity[index], 1);
- /* POWER4 LPAR uses 0xffff as invalid node */
- if (nid == 0xffff || nid >= nr_node_ids)
nid = NUMA_NO_NODE;
- return nid;
-} -/*
- Returns nid in the range [0..nr_node_ids], or -1 if no useful NUMA
- info is found.
- */
-static int associativity_to_nid(const __be32 *associativity) -{
- int array_sz = of_read_number(associativity, 1);
- /* Skip the first element in the associativity array */
- return __associativity_to_nid((associativity + 1), array_sz);
-}
/* Returns the nid associated with the given device tree node,
- or -1 if not found.
*/ @@ -320,6 +345,8 @@ static void initialize_form1_numa_distance(const __be32 *associativity) */ void update_numa_distance(struct device_node *node) {
- int nid;
- if (affinity_form == FORM0_AFFINITY) return; else if (affinity_form == FORM1_AFFINITY) {
@@ -332,6 +359,84 @@ void update_numa_distance(struct device_node *node) initialize_form1_numa_distance(associativity); return; }
- /* FORM2 affinity */
- nid = of_node_to_nid_single(node);
- if (nid == NUMA_NO_NODE)
return;
- /*
* With FORM2 we expect NUMA distance of all possible NUMA
* nodes to be provided during boot.
*/
- WARN(numa_distance_table[nid][nid] == -1,
"NUMA distance details for node %d not provided\n", nid);
+}
+/*
- ibm,numa-lookup-index-table= {N, domainid1, domainid2, ..... domainidN}
- ibm,numa-distance-table = { N, 1, 2, 4, 5, 1, 6, .... N elements}
- */
+static void initialize_form2_numa_distance_lookup_table(void) +{
- int i, j;
- struct device_node *root;
- const __u8 *numa_dist_table;
- const __be32 *numa_lookup_index;
- int numa_dist_table_length;
- int max_numa_index, distance_index;
- if (firmware_has_feature(FW_FEATURE_OPAL))
root = of_find_node_by_path("/ibm,opal");
- else
root = of_find_node_by_path("/rtas");
- if (!root)
root = of_find_node_by_path("/");
- numa_lookup_index = of_get_property(root, "ibm,numa-lookup-index-table", NULL);
- max_numa_index = of_read_number(&numa_lookup_index[0], 1);
- /* first element of the array is the size and is encode-int */
- numa_dist_table = of_get_property(root, "ibm,numa-distance-table", NULL);
- numa_dist_table_length = of_read_number((const __be32 *)&numa_dist_table[0], 1);
- /* Skip the size which is encoded int */
- numa_dist_table += sizeof(__be32);
- pr_debug("numa_dist_table_len = %d, numa_dist_indexes_len = %d\n",
numa_dist_table_length, max_numa_index);
- for (i = 0; i < max_numa_index; i++)
/* +1 skip the max_numa_index in the property */
numa_id_index_table[i] = of_read_number(&numa_lookup_index[i + 1], 1);
- if (numa_dist_table_length != max_numa_index * max_numa_index) {
WARN(1, "Wrong NUMA distance information\n");
/* consider everybody else just remote. */
for (i = 0; i < max_numa_index; i++) {
for (j = 0; j < max_numa_index; j++) {
int nodeA = numa_id_index_table[i];
int nodeB = numa_id_index_table[j];
if (nodeA == nodeB)
numa_distance_table[nodeA][nodeB] = LOCAL_DISTANCE;
else
numa_distance_table[nodeA][nodeB] = REMOTE_DISTANCE;
}
}
- }
- distance_index = 0;
- for (i = 0; i < max_numa_index; i++) {
for (j = 0; j < max_numa_index; j++) {
int nodeA = numa_id_index_table[i];
int nodeB = numa_id_index_table[j];
numa_distance_table[nodeA][nodeB] = numa_dist_table[distance_index++];
pr_debug("dist[%d][%d]=%d ", nodeA, nodeB, numa_distance_table[nodeA][nodeB]);
}
- }
- of_node_put(root);
} static int __init find_primary_domain_index(void) @@ -344,6 +449,9 @@ static int __init find_primary_domain_index(void) */ if (firmware_has_feature(FW_FEATURE_OPAL)) { affinity_form = FORM1_AFFINITY;
- } else if (firmware_has_feature(FW_FEATURE_FORM2_AFFINITY)) {
dbg("Using form 2 affinity\n");
} else if (firmware_has_feature(FW_FEATURE_FORM1_AFFINITY)) { dbg("Using form 1 affinity\n"); affinity_form = FORM1_AFFINITY;affinity_form = FORM2_AFFINITY;
@@ -388,9 +496,12 @@ static int __init find_primary_domain_index(void) index = of_read_number(&distance_ref_points[1], 1); } else {
/*
* Both FORM1 and FORM2 affinity find the primary domain details
* at the same offset.
index = of_read_number(distance_ref_points, 1); }*/
- /*
- Warn and cap if the hardware supports more than
- MAX_DISTANCE_REF_POINTS domains.
@@ -819,6 +930,12 @@ static int __init parse_numa_properties(void) dbg("NUMA associativity depth for CPU/Memory: %d\n", primary_domain_index);
- /*
* If it is FORM2 initialize the distance table here.
*/
- if (affinity_form == FORM2_AFFINITY)
initialize_form2_numa_distance_lookup_table();
- /*
- Even though we connect cpus to numa domains later in SMP
- init, we need to know the node ids now. This is because
diff --git a/arch/powerpc/platforms/pseries/firmware.c b/arch/powerpc/platforms/pseries/firmware.c index 5d4c2bc20bbab..f162156b7b68d 100644 --- a/arch/powerpc/platforms/pseries/firmware.c +++ b/arch/powerpc/platforms/pseries/firmware.c @@ -123,6 +123,7 @@ vec5_fw_features_table[] = { {FW_FEATURE_PRRN, OV5_PRRN}, {FW_FEATURE_DRMEM_V2, OV5_DRMEM_V2}, {FW_FEATURE_DRC_INFO, OV5_DRC_INFO},
- {FW_FEATURE_FORM2_AFFINITY, OV5_FORM2_AFFINITY},
}; static void __init fw_vec5_feature_init(const char *vec5, unsigned long len) -- 2.39.2
This commit causes documentation build to fail:
| Sphinx parallel build error: | docutils.utils.SystemMessage: /build/linux/Documentation/powerpc/associativity.rst:1: (SEVERE/4) Title overline & underline mismatch. | | ============================ | NUMA resource associativity | =============================
This in fact was fixed with upstream f50da6edbf1e ("powerpc/doc: Fix htmldocs errors") but was not applied along with the above changes.
Can you please pick f50da6edbf1e as well for 5.10.y?
#regzbot ^introduced 1c6b5a7e7405 #regzbot ^introduced 453b3188be89 #regzbot fixed-by: f50da6edbf1e
Regards, Salvatore
From: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com
[ Upstream commit b277fc793daf258877b4c0744b52f69d6e6ba22e ]
Platform device helper routines won't update the NUMA distance table while creating a platform device, even if the device is present on a NUMA node that doesn't have memory or CPU. This is especially true for pmem devices. If the target node of the pmem device is not online, we find the nearest online node to the device and associate the pmem device with that online node. To find the nearest online node, we should have the numa distance table updated correctly. Update the distance information during the device probe.
For a papr scm device on NUMA node 3 distance_lookup_table value for distance_ref_points_depth = 2 before and after fix is below:
Before fix: node 3 distance depth 0 - 0 node 3 distance depth 1 - 0 node 4 distance depth 0 - 4 node 4 distance depth 1 - 2 node 5 distance depth 0 - 5 node 5 distance depth 1 - 1
After fix node 3 distance depth 0 - 3 node 3 distance depth 1 - 1 node 4 distance depth 0 - 4 node 4 distance depth 1 - 2 node 5 distance depth 0 - 5 node 5 distance depth 1 - 1
Without the fix, the nearest numa node to the pmem device (NUMA node 3) will be picked as 4. After the fix, we get the correct numa node which is 5.
Fixes: da1115fdbd6e ("powerpc/nvdimm: Pick nearby online node if the device node is not online") Signed-off-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20230404041433.1781804-1-aneesh.kumar@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/mm/numa.c | 1 + arch/powerpc/platforms/pseries/papr_scm.c | 7 +++++++ 2 files changed, 8 insertions(+)
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c index cfc170935a58b..ce8569e16f0c4 100644 --- a/arch/powerpc/mm/numa.c +++ b/arch/powerpc/mm/numa.c @@ -372,6 +372,7 @@ void update_numa_distance(struct device_node *node) WARN(numa_distance_table[nid][nid] == -1, "NUMA distance details for node %d not provided\n", nid); } +EXPORT_SYMBOL_GPL(update_numa_distance);
/* * ibm,numa-lookup-index-table= {N, domainid1, domainid2, ..... domainidN} diff --git a/arch/powerpc/platforms/pseries/papr_scm.c b/arch/powerpc/platforms/pseries/papr_scm.c index 057acbb9116dd..e3b7698b4762c 100644 --- a/arch/powerpc/platforms/pseries/papr_scm.c +++ b/arch/powerpc/platforms/pseries/papr_scm.c @@ -1079,6 +1079,13 @@ static int papr_scm_probe(struct platform_device *pdev) return -ENODEV; }
+ /* + * open firmware platform device create won't update the NUMA + * distance table. For PAPR SCM devices we use numa_map_to_online_node() + * to find the nearest online NUMA node and that requires correct + * distance table information. + */ + update_numa_distance(dn);
p = kzalloc(sizeof(*p), GFP_KERNEL); if (!p)
From: zgpeng zgpeng.linux@gmail.com
[ Upstream commit 06354900787f25bf5be3c07a68e3cdbc5bf0fa69 ]
In calculate_imbalance function, when the value of local->avg_load is greater than or equal to busiest->avg_load, the calculated sds->avg_load is not used. So this calculation can be placed in a more appropriate position.
Signed-off-by: zgpeng zgpeng@tencent.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Samuel Liao samuelliao@tencent.com Reviewed-by: Vincent Guittot vincent.guittot@linaro.org Link: https://lore.kernel.org/r/1649239025-10010-1-git-send-email-zgpeng@tencent.c... Stable-dep-of: 91dcf1e8068e ("sched/fair: Fix imbalance overflow") Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index bb70a7856277f..22139e97b2a8e 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9342,8 +9342,6 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s local->avg_load = (local->group_load * SCHED_CAPACITY_SCALE) / local->group_capacity;
- sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) / - sds->total_capacity; /* * If the local group is more loaded than the selected * busiest group don't try to pull any tasks. @@ -9352,6 +9350,9 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s env->imbalance = 0; return; } + + sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) / + sds->total_capacity; }
/*
From: Vincent Guittot vincent.guittot@linaro.org
[ Upstream commit 91dcf1e8068e9a8823e419a7a34ff4341275fb70 ]
When local group is fully busy but its average load is above system load, computing the imbalance will overflow and local group is not the best target for pulling this load.
Fixes: 0b0695f2b34a ("sched/fair: Rework load_balance()") Reported-by: Tingjia Cao tjcao980311@gmail.com Signed-off-by: Vincent Guittot vincent.guittot@linaro.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Tested-by: Tingjia Cao tjcao980311@gmail.com Link: https://lore.kernel.org/lkml/CABcWv9_DAhVBOq2=W=2ypKE9dKM5s2DvoV8-U0+GDwwuKZ... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/sched/fair.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 22139e97b2a8e..57a58bc48021a 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -9353,6 +9353,16 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
sds->avg_load = (sds->total_load * SCHED_CAPACITY_SCALE) / sds->total_capacity; + + /* + * If the local group is more loaded than the average system + * load, don't try to pull any tasks. + */ + if (local->avg_load >= sds->avg_load) { + env->imbalance = 0; + return; + } + }
/*
From: Matija Glavinic Pecotic matija.glavinic-pecotic.ext@nokia.com
[ Upstream commit 775d3c514c5b2763a50ab7839026d7561795924d ]
set_rtc_noop(), get_rtc_noop() are after booting, therefore their __init annotation is wrong.
A crash was observed on an x86 platform where CMOS RTC is unused and disabled via device tree. set_rtc_noop() was invoked from ntp: sync_hw_clock(), although CONFIG_RTC_SYSTOHC=n, however sync_cmos_clock() doesn't honour that.
Workqueue: events_power_efficient sync_hw_clock RIP: 0010:set_rtc_noop Call Trace: update_persistent_clock64 sync_hw_clock
Fix this by dropping the __init annotation from set/get_rtc_noop().
Fixes: c311ed6183f4 ("x86/init: Allow DT configured systems to disable RTC at boot time") Signed-off-by: Matija Glavinic Pecotic matija.glavinic-pecotic.ext@nokia.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/59f7ceb1-446b-1d3d-0bc8-1f0ee94b1e18@nokia.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/x86_init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/x86_init.c b/arch/x86/kernel/x86_init.c index a3038d8deb6a4..b758eeea6090b 100644 --- a/arch/x86/kernel/x86_init.c +++ b/arch/x86/kernel/x86_init.c @@ -32,8 +32,8 @@ static int __init iommu_init_noop(void) { return 0; } static void iommu_shutdown_noop(void) { } bool __init bool_x86_init_noop(void) { return false; } void x86_op_int_noop(int cpu) { } -static __init int set_rtc_noop(const struct timespec64 *now) { return -EINVAL; } -static __init void get_rtc_noop(struct timespec64 *now) { } +static int set_rtc_noop(const struct timespec64 *now) { return -EINVAL; } +static void get_rtc_noop(struct timespec64 *now) { }
static __initconst const struct of_device_id of_cmos_match[] = { { .compatible = "motorola,mc146818" },
From: Gregor Herburger gregor.herburger@tq-group.com
[ Upstream commit f8160d3b35fc94491bb0cb974dbda310ef96c0e2 ]
In polling mode, no stop condition is generated after a timeout. This causes SCL to remain low and thereby block the bus. If this happens during a transfer it can cause slaves to misinterpret the subsequent transfer and return wrong values.
To solve this, pass the ETIMEDOUT error up from ocores_process_polling() instead of setting STATE_ERROR directly. The caller is adjusted to call ocores_process_timeout() on error both in polling and in IRQ mode, which will set STATE_ERROR and generate a stop condition.
Fixes: 69c8c0c0efa8 ("i2c: ocores: add polling interface") Signed-off-by: Gregor Herburger gregor.herburger@tq-group.com Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Acked-by: Peter Korsgaard peter@korsgaard.com Reviewed-by: Andrew Lunn andrew@lunn.ch Reviewed-by: Federico Vaga federico.vaga@cern.ch Signed-off-by: Wolfram Sang wsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-ocores.c | 35 ++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/drivers/i2c/busses/i2c-ocores.c b/drivers/i2c/busses/i2c-ocores.c index f5fc75b65a194..71e26aa6bd8ff 100644 --- a/drivers/i2c/busses/i2c-ocores.c +++ b/drivers/i2c/busses/i2c-ocores.c @@ -343,18 +343,18 @@ static int ocores_poll_wait(struct ocores_i2c *i2c) * ocores_isr(), we just add our polling code around it. * * It can run in atomic context + * + * Return: 0 on success, -ETIMEDOUT on timeout */ -static void ocores_process_polling(struct ocores_i2c *i2c) +static int ocores_process_polling(struct ocores_i2c *i2c) { - while (1) { - irqreturn_t ret; - int err; + irqreturn_t ret; + int err = 0;
+ while (1) { err = ocores_poll_wait(i2c); - if (err) { - i2c->state = STATE_ERROR; + if (err) break; /* timeout */ - }
ret = ocores_isr(-1, i2c); if (ret == IRQ_NONE) @@ -365,13 +365,15 @@ static void ocores_process_polling(struct ocores_i2c *i2c) break; } } + + return err; }
static int ocores_xfer_core(struct ocores_i2c *i2c, struct i2c_msg *msgs, int num, bool polling) { - int ret; + int ret = 0; u8 ctrl;
ctrl = oc_getreg(i2c, OCI2C_CONTROL); @@ -389,15 +391,16 @@ static int ocores_xfer_core(struct ocores_i2c *i2c, oc_setreg(i2c, OCI2C_CMD, OCI2C_CMD_START);
if (polling) { - ocores_process_polling(i2c); + ret = ocores_process_polling(i2c); } else { - ret = wait_event_timeout(i2c->wait, - (i2c->state == STATE_ERROR) || - (i2c->state == STATE_DONE), HZ); - if (ret == 0) { - ocores_process_timeout(i2c); - return -ETIMEDOUT; - } + if (wait_event_timeout(i2c->wait, + (i2c->state == STATE_ERROR) || + (i2c->state == STATE_DONE), HZ) == 0) + ret = -ETIMEDOUT; + } + if (ret) { + ocores_process_timeout(i2c); + return ret; }
return (i2c->state == STATE_DONE) ? num : -EIO;
From: Waiman Long longman@redhat.com
Commit b94f9ac79a7395c2d6171cc753cc27942df0be73 upstream.
Since commit 1243dc518c9d ("cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem"), cpuset_mutex has been replaced by cpuset_rwsem which is a percpu rwsem. However, the comments in kernel/cgroup/cpuset.c still reference cpuset_mutex which are now incorrect.
Change all the references of cpuset_mutex to cpuset_rwsem.
Fixes: 1243dc518c9d ("cgroup/cpuset: Convert cpuset_mutex to percpu_rwsem") Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 56 +++++++++++++++++++++++++------------------------ 1 file changed, 29 insertions(+), 27 deletions(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -299,17 +299,19 @@ static struct cpuset top_cpuset = { if (is_cpuset_online(((des_cs) = css_cs((pos_css)))))
/* - * There are two global locks guarding cpuset structures - cpuset_mutex and + * There are two global locks guarding cpuset structures - cpuset_rwsem and * callback_lock. We also require taking task_lock() when dereferencing a * task's cpuset pointer. See "The task_lock() exception", at the end of this - * comment. + * comment. The cpuset code uses only cpuset_rwsem write lock. Other + * kernel subsystems can use cpuset_read_lock()/cpuset_read_unlock() to + * prevent change to cpuset structures. * * A task must hold both locks to modify cpusets. If a task holds - * cpuset_mutex, then it blocks others wanting that mutex, ensuring that it + * cpuset_rwsem, it blocks others wanting that rwsem, ensuring that it * is the only task able to also acquire callback_lock and be able to * modify cpusets. It can perform various checks on the cpuset structure * first, knowing nothing will change. It can also allocate memory while - * just holding cpuset_mutex. While it is performing these checks, various + * just holding cpuset_rwsem. While it is performing these checks, various * callback routines can briefly acquire callback_lock to query cpusets. * Once it is ready to make the changes, it takes callback_lock, blocking * everyone else. @@ -380,7 +382,7 @@ static inline bool is_in_v2_mode(void) * One way or another, we guarantee to return some non-empty subset * of cpu_online_mask. * - * Call with callback_lock or cpuset_mutex held. + * Call with callback_lock or cpuset_rwsem held. */ static void guarantee_online_cpus(struct cpuset *cs, struct cpumask *pmask) { @@ -410,7 +412,7 @@ static void guarantee_online_cpus(struct * One way or another, we guarantee to return some non-empty subset * of node_states[N_MEMORY]. * - * Call with callback_lock or cpuset_mutex held. + * Call with callback_lock or cpuset_rwsem held. */ static void guarantee_online_mems(struct cpuset *cs, nodemask_t *pmask) { @@ -422,7 +424,7 @@ static void guarantee_online_mems(struct /* * update task's spread flag if cpuset's page/slab spread flag is set * - * Call with callback_lock or cpuset_mutex held. + * Call with callback_lock or cpuset_rwsem held. */ static void cpuset_update_task_spread_flag(struct cpuset *cs, struct task_struct *tsk) @@ -443,7 +445,7 @@ static void cpuset_update_task_spread_fl * * One cpuset is a subset of another if all its allowed CPUs and * Memory Nodes are a subset of the other, and its exclusive flags - * are only set if the other's are set. Call holding cpuset_mutex. + * are only set if the other's are set. Call holding cpuset_rwsem. */
static int is_cpuset_subset(const struct cpuset *p, const struct cpuset *q) @@ -552,7 +554,7 @@ static inline void free_cpuset(struct cp * If we replaced the flag and mask values of the current cpuset * (cur) with those values in the trial cpuset (trial), would * our various subset and exclusive rules still be valid? Presumes - * cpuset_mutex held. + * cpuset_rwsem held. * * 'cur' is the address of an actual, in-use cpuset. Operations * such as list traversal that depend on the actual address of the @@ -675,7 +677,7 @@ static void update_domain_attr_tree(stru rcu_read_unlock(); }
-/* Must be called with cpuset_mutex held. */ +/* Must be called with cpuset_rwsem held. */ static inline int nr_cpusets(void) { /* jump label reference count + the top-level cpuset */ @@ -701,7 +703,7 @@ static inline int nr_cpusets(void) * domains when operating in the severe memory shortage situations * that could cause allocation failures below. * - * Must be called with cpuset_mutex held. + * Must be called with cpuset_rwsem held. * * The three key local variables below are: * cp - cpuset pointer, used (together with pos_css) to perform a @@ -980,7 +982,7 @@ partition_and_rebuild_sched_domains(int * 'cpus' is removed, then call this routine to rebuild the * scheduler's dynamic sched domains. * - * Call with cpuset_mutex held. Takes get_online_cpus(). + * Call with cpuset_rwsem held. Takes get_online_cpus(). */ static void rebuild_sched_domains_locked(void) { @@ -1053,7 +1055,7 @@ void rebuild_sched_domains(void) * @cs: the cpuset in which each task's cpus_allowed mask needs to be changed * * Iterate through each task of @cs updating its cpus_allowed to the - * effective cpuset's. As this function is called with cpuset_mutex held, + * effective cpuset's. As this function is called with cpuset_rwsem held, * cpuset membership stays stable. */ static void update_tasks_cpumask(struct cpuset *cs) @@ -1328,7 +1330,7 @@ static int update_parent_subparts_cpumas * * On legacy hierachy, effective_cpus will be the same with cpu_allowed. * - * Called with cpuset_mutex held + * Called with cpuset_rwsem held */ static void update_cpumasks_hier(struct cpuset *cs, struct tmpmasks *tmp) { @@ -1688,12 +1690,12 @@ static void *cpuset_being_rebound; * @cs: the cpuset in which each task's mems_allowed mask needs to be changed * * Iterate through each task of @cs updating its mems_allowed to the - * effective cpuset's. As this function is called with cpuset_mutex held, + * effective cpuset's. As this function is called with cpuset_rwsem held, * cpuset membership stays stable. */ static void update_tasks_nodemask(struct cpuset *cs) { - static nodemask_t newmems; /* protected by cpuset_mutex */ + static nodemask_t newmems; /* protected by cpuset_rwsem */ struct css_task_iter it; struct task_struct *task;
@@ -1706,7 +1708,7 @@ static void update_tasks_nodemask(struct * take while holding tasklist_lock. Forks can happen - the * mpol_dup() cpuset_being_rebound check will catch such forks, * and rebind their vma mempolicies too. Because we still hold - * the global cpuset_mutex, we know that no other rebind effort + * the global cpuset_rwsem, we know that no other rebind effort * will be contending for the global variable cpuset_being_rebound. * It's ok if we rebind the same mm twice; mpol_rebind_mm() * is idempotent. Also migrate pages in each mm to new nodes. @@ -1752,7 +1754,7 @@ static void update_tasks_nodemask(struct * * On legacy hiearchy, effective_mems will be the same with mems_allowed. * - * Called with cpuset_mutex held + * Called with cpuset_rwsem held */ static void update_nodemasks_hier(struct cpuset *cs, nodemask_t *new_mems) { @@ -1805,7 +1807,7 @@ static void update_nodemasks_hier(struct * mempolicies and if the cpuset is marked 'memory_migrate', * migrate the tasks pages to the new memory. * - * Call with cpuset_mutex held. May take callback_lock during call. + * Call with cpuset_rwsem held. May take callback_lock during call. * Will take tasklist_lock, scan tasklist for tasks in cpuset cs, * lock each such tasks mm->mmap_lock, scan its vma's and rebind * their mempolicies to the cpusets new mems_allowed. @@ -1895,7 +1897,7 @@ static int update_relax_domain_level(str * @cs: the cpuset in which each task's spread flags needs to be changed * * Iterate through each task of @cs updating its spread flags. As this - * function is called with cpuset_mutex held, cpuset membership stays + * function is called with cpuset_rwsem held, cpuset membership stays * stable. */ static void update_tasks_flags(struct cpuset *cs) @@ -1915,7 +1917,7 @@ static void update_tasks_flags(struct cp * cs: the cpuset to update * turning_on: whether the flag is being set or cleared * - * Call with cpuset_mutex held. + * Call with cpuset_rwsem held. */
static int update_flag(cpuset_flagbits_t bit, struct cpuset *cs, @@ -1964,7 +1966,7 @@ out: * cs: the cpuset to update * new_prs: new partition root state * - * Call with cpuset_mutex held. + * Call with cpuset_rwsem held. */ static int update_prstate(struct cpuset *cs, int new_prs) { @@ -2145,7 +2147,7 @@ static int fmeter_getrate(struct fmeter
static struct cpuset *cpuset_attach_old_cs;
-/* Called by cgroups to determine if a cpuset is usable; cpuset_mutex held */ +/* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */ static int cpuset_can_attach(struct cgroup_taskset *tset) { struct cgroup_subsys_state *css; @@ -2201,7 +2203,7 @@ static void cpuset_cancel_attach(struct }
/* - * Protected by cpuset_mutex. cpus_attach is used only by cpuset_attach() + * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach() * but we can't allocate it dynamically there. Define it global and * allocate from cpuset_init(). */ @@ -2209,7 +2211,7 @@ static cpumask_var_t cpus_attach;
static void cpuset_attach(struct cgroup_taskset *tset) { - /* static buf protected by cpuset_mutex */ + /* static buf protected by cpuset_rwsem */ static nodemask_t cpuset_attach_nodemask_to; struct task_struct *task; struct task_struct *leader; @@ -2402,7 +2404,7 @@ static ssize_t cpuset_write_resmask(stru * operation like this one can lead to a deadlock through kernfs * active_ref protection. Let's break the protection. Losing the * protection is okay as we check whether @cs is online after - * grabbing cpuset_mutex anyway. This only happens on the legacy + * grabbing cpuset_rwsem anyway. This only happens on the legacy * hierarchies. */ css_get(&cs->css); @@ -3641,7 +3643,7 @@ void __cpuset_memory_pressure_bump(void) * - Used for /proc/<pid>/cpuset. * - No need to task_lock(tsk) on this tsk->cpuset reference, as it * doesn't really matter if tsk->cpuset changes after we read it, - * and we take cpuset_mutex, keeping cpuset_attach() from changing it + * and we take cpuset_rwsem, keeping cpuset_attach() from changing it * anyway. */ int proc_cpuset_show(struct seq_file *m, struct pid_namespace *ns,
From: Waiman Long longman@redhat.com
commit 18f9a4d47527772515ad6cbdac796422566e6440 upstream.
Cpuset v2 has no spread flags to set. So we can skip spread flags update if cpuset v2 is being used. Also change the name to cpuset_update_task_spread_flags() to indicate that there are multiple spread flags.
Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -424,11 +424,15 @@ static void guarantee_online_mems(struct /* * update task's spread flag if cpuset's page/slab spread flag is set * - * Call with callback_lock or cpuset_rwsem held. + * Call with callback_lock or cpuset_rwsem held. The check can be skipped + * if on default hierarchy. */ -static void cpuset_update_task_spread_flag(struct cpuset *cs, +static void cpuset_update_task_spread_flags(struct cpuset *cs, struct task_struct *tsk) { + if (cgroup_subsys_on_dfl(cpuset_cgrp_subsys)) + return; + if (is_spread_page(cs)) task_set_spread_page(tsk); else @@ -1907,7 +1911,7 @@ static void update_tasks_flags(struct cp
css_task_iter_start(&cs->css, 0, &it); while ((task = css_task_iter_next(&it))) - cpuset_update_task_spread_flag(cs, task); + cpuset_update_task_spread_flags(cs, task); css_task_iter_end(&it); }
@@ -2241,7 +2245,7 @@ static void cpuset_attach(struct cgroup_ WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach));
cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to); - cpuset_update_task_spread_flag(cs, task); + cpuset_update_task_spread_flags(cs, task); }
/*
From: Waiman Long longman@redhat.com
commit 42a11bf5c5436e91b040aeb04063be1710bb9f9c upstream.
By default, the clone(2) syscall spawn a child process into the same cgroup as its parent. With the use of the CLONE_INTO_CGROUP flag introduced by commit ef2c41cf38a7 ("clone3: allow spawning processes into cgroups"), the child will be spawned into a different cgroup which is somewhat similar to writing the child's tid into "cgroup.threads".
The current cpuset_fork() method does not properly handle the CLONE_INTO_CGROUP case where the cpuset of the child may be different from that of its parent. Update the cpuset_fork() method to treat the CLONE_INTO_CGROUP case similar to cpuset_attach().
Since the newly cloned task has not been running yet, its actual memory usage isn't known. So it is not necessary to make change to mm in cpuset_fork().
Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Reported-by: Giuseppe Scrivano gscrivan@redhat.com Signed-off-by: Waiman Long longman@redhat.com Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 60 +++++++++++++++++++++++++++++++++++-------------- 1 file changed, 44 insertions(+), 16 deletions(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2207,16 +2207,29 @@ static void cpuset_cancel_attach(struct }
/* - * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach() + * Protected by cpuset_rwsem. cpus_attach is used only by cpuset_attach_task() * but we can't allocate it dynamically there. Define it global and * allocate from cpuset_init(). */ static cpumask_var_t cpus_attach; +static nodemask_t cpuset_attach_nodemask_to; + +static void cpuset_attach_task(struct cpuset *cs, struct task_struct *task) +{ + percpu_rwsem_assert_held(&cpuset_rwsem); + + /* + * can_attach beforehand should guarantee that this doesn't + * fail. TODO: have a better way to handle failure here + */ + WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach)); + + cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to); + cpuset_update_task_spread_flags(cs, task); +}
static void cpuset_attach(struct cgroup_taskset *tset) { - /* static buf protected by cpuset_rwsem */ - static nodemask_t cpuset_attach_nodemask_to; struct task_struct *task; struct task_struct *leader; struct cgroup_subsys_state *css; @@ -2237,16 +2250,8 @@ static void cpuset_attach(struct cgroup_
guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
- cgroup_taskset_for_each(task, css, tset) { - /* - * can_attach beforehand should guarantee that this doesn't - * fail. TODO: have a better way to handle failure here - */ - WARN_ON_ONCE(set_cpus_allowed_ptr(task, cpus_attach)); - - cpuset_change_task_nodemask(task, &cpuset_attach_nodemask_to); - cpuset_update_task_spread_flags(cs, task); - } + cgroup_taskset_for_each(task, css, tset) + cpuset_attach_task(cs, task);
/* * Change mm for all threadgroup leaders. This is expensive and may @@ -2914,11 +2919,34 @@ static void cpuset_bind(struct cgroup_su */ static void cpuset_fork(struct task_struct *task) { - if (task_css_is_root(task, cpuset_cgrp_id)) + struct cpuset *cs; + bool same_cs; + + rcu_read_lock(); + cs = task_cs(task); + same_cs = (cs == task_cs(current)); + rcu_read_unlock(); + + if (same_cs) { + if (cs == &top_cpuset) + return; + + set_cpus_allowed_ptr(task, current->cpus_ptr); + task->mems_allowed = current->mems_allowed; return; + } + + /* CLONE_INTO_CGROUP */ + percpu_down_write(&cpuset_rwsem); + guarantee_online_mems(cs, &cpuset_attach_nodemask_to);
- set_cpus_allowed_ptr(task, current->cpus_ptr); - task->mems_allowed = current->mems_allowed; + /* prepare for attach */ + if (cs == &top_cpuset) + cpumask_copy(cpus_attach, cpu_possible_mask); + else + guarantee_online_cpus(cs, cpus_attach); + cpuset_attach_task(cs, task); + percpu_up_write(&cpuset_rwsem); }
struct cgroup_subsys cpuset_cgrp_subsys = {
From: Waiman Long longman@redhat.com
commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
In the case of CLONE_INTO_CGROUP, not all cpusets are ready to accept new tasks. It is too late to check that in cpuset_fork(). So we need to add the cpuset_can_fork() and cpuset_cancel_fork() methods to pre-check it before we can allow attachment to a different cpuset.
We also need to set the attach_in_progress flag to alert other code that a new task is going to be added to the cpuset.
Fixes: ef2c41cf38a7 ("clone3: allow spawning processes into cgroups") Suggested-by: Michal Koutný mkoutny@suse.com Signed-off-by: Waiman Long longman@redhat.com Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 83 insertions(+), 5 deletions(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2151,6 +2151,18 @@ static int fmeter_getrate(struct fmeter
static struct cpuset *cpuset_attach_old_cs;
+/* + * Check to see if a cpuset can accept a new task + * For v1, cpus_allowed and mems_allowed can't be empty. + */ +static int cpuset_can_attach_check(struct cpuset *cs) +{ + if (!is_in_v2_mode() && + (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))) + return -ENOSPC; + return 0; +} + /* Called by cgroups to determine if a cpuset is usable; cpuset_rwsem held */ static int cpuset_can_attach(struct cgroup_taskset *tset) { @@ -2165,10 +2177,8 @@ static int cpuset_can_attach(struct cgro
percpu_down_write(&cpuset_rwsem);
- /* allow moving tasks into an empty cpuset if on default hierarchy */ - ret = -ENOSPC; - if (!is_in_v2_mode() && - (cpumask_empty(cs->cpus_allowed) || nodes_empty(cs->mems_allowed))) + ret = cpuset_can_attach_check(cs); + if (ret) goto out_unlock;
cgroup_taskset_for_each(task, css, tset) { @@ -2185,7 +2195,6 @@ static int cpuset_can_attach(struct cgro * changes which zero cpus/mems_allowed. */ cs->attach_in_progress++; - ret = 0; out_unlock: percpu_up_write(&cpuset_rwsem); return ret; @@ -2913,6 +2922,68 @@ static void cpuset_bind(struct cgroup_su }
/* + * In case the child is cloned into a cpuset different from its parent, + * additional checks are done to see if the move is allowed. + */ +static int cpuset_can_fork(struct task_struct *task, struct css_set *cset) +{ + struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]); + bool same_cs; + int ret; + + rcu_read_lock(); + same_cs = (cs == task_cs(current)); + rcu_read_unlock(); + + if (same_cs) + return 0; + + lockdep_assert_held(&cgroup_mutex); + percpu_down_write(&cpuset_rwsem); + + /* Check to see if task is allowed in the cpuset */ + ret = cpuset_can_attach_check(cs); + if (ret) + goto out_unlock; + + ret = task_can_attach(task, cs->effective_cpus); + if (ret) + goto out_unlock; + + ret = security_task_setscheduler(task); + if (ret) + goto out_unlock; + + /* + * Mark attach is in progress. This makes validate_change() fail + * changes which zero cpus/mems_allowed. + */ + cs->attach_in_progress++; +out_unlock: + percpu_up_write(&cpuset_rwsem); + return ret; +} + +static void cpuset_cancel_fork(struct task_struct *task, struct css_set *cset) +{ + struct cpuset *cs = css_cs(cset->subsys[cpuset_cgrp_id]); + bool same_cs; + + rcu_read_lock(); + same_cs = (cs == task_cs(current)); + rcu_read_unlock(); + + if (same_cs) + return; + + percpu_down_write(&cpuset_rwsem); + cs->attach_in_progress--; + if (!cs->attach_in_progress) + wake_up(&cpuset_attach_wq); + percpu_up_write(&cpuset_rwsem); +} + +/* * Make sure the new task conform to the current state of its parent, * which could have been changed by cpuset just after it inherits the * state from the parent and before it sits on the cgroup's task list. @@ -2946,6 +3017,11 @@ static void cpuset_fork(struct task_stru else guarantee_online_cpus(cs, cpus_attach); cpuset_attach_task(cs, task); + + cs->attach_in_progress--; + if (!cs->attach_in_progress) + wake_up(&cpuset_attach_wq); + percpu_up_write(&cpuset_rwsem); }
@@ -2959,6 +3035,8 @@ struct cgroup_subsys cpuset_cgrp_subsys .attach = cpuset_attach, .post_attach = cpuset_post_attach, .bind = cpuset_bind, + .can_fork = cpuset_can_fork, + .cancel_fork = cpuset_cancel_fork, .fork = cpuset_fork, .legacy_cftypes = legacy_files, .dfl_cftypes = dfl_files,
From: George Cherian george.cherian@marvell.com
commit 000987a38b53c172f435142a4026dd71378ca464 upstream.
Make sure to honour the max_hw_heartbeat_ms while programming the timeout value to WOR. Clamp the timeout passed to sbsa_gwdt_set_timeout() to make sure the programmed value is within the permissible range.
Fixes: abd3ac7902fb ("watchdog: sbsa: Support architecture version 1")
Signed-off-by: George Cherian george.cherian@marvell.com Reviewed-by: Guenter Roeck linux@roeck-us.net Link: https://lore.kernel.org/r/20230209021117.1512097-1-george.cherian@marvell.co... Signed-off-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Wim Van Sebroeck wim@linux-watchdog.org Signed-off-by: Tyler Hicks (Microsoft) code@tyhicks.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/watchdog/sbsa_gwdt.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/watchdog/sbsa_gwdt.c +++ b/drivers/watchdog/sbsa_gwdt.c @@ -121,6 +121,7 @@ static int sbsa_gwdt_set_timeout(struct struct sbsa_gwdt *gwdt = watchdog_get_drvdata(wdd);
wdd->timeout = timeout; + timeout = clamp_t(unsigned int, timeout, 1, wdd->max_hw_heartbeat_ms / 1000);
if (action) writel(gwdt->clk * timeout,
From: Steve Clevenger scclevenger@os.amperecomputing.com
commit bf84937e882009075f57fd213836256fc65d96bc upstream.
In etm4_enable_hw, fix for() loop range to represent address comparator pairs.
Fixes: 2e1cdfe184b5 ("coresight-etm4x: Adding CoreSight ETM4x driver") Cc: stable@vger.kernel.org Signed-off-by: Steve Clevenger scclevenger@os.amperecomputing.com Reviewed-by: James Clark james.clark@arm.com Signed-off-by: Suzuki K Poulose suzuki.poulose@arm.com Link: https://lore.kernel.org/r/4a4ee61ce8ef402615a4528b21a051de3444fb7b.167754007... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwtracing/coresight/coresight-etm4x-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/hwtracing/coresight/coresight-etm4x-core.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-core.c @@ -179,7 +179,7 @@ static int etm4_enable_hw(struct etmv4_d writel_relaxed(config->ss_pe_cmp[i], drvdata->base + TRCSSPCICRn(i)); } - for (i = 0; i < drvdata->nr_addr_cmp; i++) { + for (i = 0; i < drvdata->nr_addr_cmp * 2; i++) { writeq_relaxed(config->addr_val[i], drvdata->base + TRCACVRn(i)); writeq_relaxed(config->addr_acc[i],
From: Masahiro Yamada masahiroy@kernel.org
commit ba64beb17493a4bfec563100c86a462a15926f24 upstream.
Documentation/process/changes.rst defines the minimum assembler version (binutils version), but we have never checked it in the build time.
Kbuild never invokes 'as' directly because all assembly files in the kernel tree are *.S, hence must be preprocessed. I do not expect raw assembly source files (*.s) would be added to the kernel tree.
Therefore, we always use $(CC) as the assembler driver, and commit aa824e0c962b ("kbuild: remove AS variable") removed 'AS'. However, we are still interested in the version of the assembler acting behind.
As usual, the --version option prints the version string.
$ as --version | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1
But, we do not have $(AS). So, we can add the -Wa prefix so that $(CC) passes --version down to the backing assembler.
$ gcc -Wa,--version | head -n 1 gcc: fatal error: no input files compilation terminated.
OK, we need to input something to satisfy gcc.
$ gcc -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1
The combination of Clang and GNU assembler works in the same way:
$ clang -no-integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 GNU assembler (GNU Binutils for Ubuntu) 2.35.1
Clang with the integrated assembler fails like this:
$ clang -integrated-as -Wa,--version -c -x assembler /dev/null -o /dev/null | head -n 1 clang: error: unsupported argument '--version' to option 'Wa,'
For the last case, checking the error message is fragile. If the proposal for -Wa,--version support [1] is accepted, this may not be even an error in the future.
One easy way is to check if -integrated-as is present in the passed arguments. We did not pass -integrated-as to CLANG_FLAGS before, but we can make it explicit.
Nathan pointed out -integrated-as is the default for all of the architectures/targets that the kernel cares about, but it goes along with "explicit is better than implicit" policy. [2]
With all this in my mind, I implemented scripts/as-version.sh to check the assembler version in Kconfig time.
$ scripts/as-version.sh gcc GNU 23501 $ scripts/as-version.sh clang -no-integrated-as GNU 23501 $ scripts/as-version.sh clang -integrated-as LLVM 0
[1]: https://github.com/ClangBuiltLinux/linux/issues/1320 [2]: https://lore.kernel.org/linux-kbuild/20210307044253.v3h47ucq6ng25iay@archlin...
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Nathan Chancellor nathan@kernel.org [nathan: Backport to 5.10. Drop minimum version checking, as it is not required in 5.10] Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Makefile | 4 ++ init/Kconfig | 12 ++++++++ scripts/Kconfig.include | 6 ++++ scripts/as-version.sh | 69 ++++++++++++++++++++++++++++++++++++++++++++++++ scripts/dummy-tools/gcc | 6 ++++ 5 files changed, 96 insertions(+), 1 deletion(-)
diff --git a/Makefile b/Makefile index 71caf5938361..44d9aff4f66a 100644 --- a/Makefile +++ b/Makefile @@ -580,7 +580,9 @@ endif ifneq ($(GCC_TOOLCHAIN),) CLANG_FLAGS += --gcc-toolchain=$(GCC_TOOLCHAIN) endif -ifneq ($(LLVM_IAS),1) +ifeq ($(LLVM_IAS),1) +CLANG_FLAGS += -integrated-as +else CLANG_FLAGS += -no-integrated-as endif CLANG_FLAGS += -Werror=unknown-warning-option diff --git a/init/Kconfig b/init/Kconfig index eba883d6d9ed..9807c66b24bb 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -47,6 +47,18 @@ config CLANG_VERSION int default $(shell,$(srctree)/scripts/clang-version.sh $(CC))
+config AS_IS_GNU + def_bool $(success,test "$(as-name)" = GNU) + +config AS_IS_LLVM + def_bool $(success,test "$(as-name)" = LLVM) + +config AS_VERSION + int + # Use clang version if this is the integrated assembler + default CLANG_VERSION if AS_IS_LLVM + default $(as-version) + config LLD_VERSION int default $(shell,$(srctree)/scripts/lld-version.sh $(LD)) diff --git a/scripts/Kconfig.include b/scripts/Kconfig.include index a5fe72c504ff..6d37cb780452 100644 --- a/scripts/Kconfig.include +++ b/scripts/Kconfig.include @@ -42,6 +42,12 @@ $(error-if,$(failure,command -v $(LD)),linker '$(LD)' not found) # Fail if the linker is gold as it's not capable of linking the kernel proper $(error-if,$(success, $(LD) -v | grep -q gold), gold linker '$(LD)' not supported)
+# Get the assembler name, version, and error out if it is not supported. +as-info := $(shell,$(srctree)/scripts/as-version.sh $(CC) $(CLANG_FLAGS)) +$(error-if,$(success,test -z "$(as-info)"),Sorry$(comma) this assembler is not supported.) +as-name := $(shell,set -- $(as-info) && echo $1) +as-version := $(shell,set -- $(as-info) && echo $2) + # machine bit flags # $(m32-flag): -m32 if the compiler supports it, or an empty string otherwise. # $(m64-flag): -m64 if the compiler supports it, or an empty string otherwise. diff --git a/scripts/as-version.sh b/scripts/as-version.sh new file mode 100755 index 000000000000..851dcb8a0e68 --- /dev/null +++ b/scripts/as-version.sh @@ -0,0 +1,69 @@ +#!/bin/sh +# SPDX-License-Identifier: GPL-2.0-only +# +# Print the assembler name and its version in a 5 or 6-digit form. +# Also, perform the minimum version check. +# (If it is the integrated assembler, return 0 as the version, and +# skip the version check.) + +set -e + +# Convert the version string x.y.z to a canonical 5 or 6-digit form. +get_canonical_version() +{ + IFS=. + set -- $1 + + # If the 2nd or 3rd field is missing, fill it with a zero. + # + # The 4th field, if present, is ignored. + # This occurs in development snapshots as in 2.35.1.20201116 + echo $((10000 * $1 + 100 * ${2:-0} + ${3:-0})) +} + +# Clang fails to handle -Wa,--version unless -no-integrated-as is given. +# We check -(f)integrated-as, expecting it is explicitly passed in for the +# integrated assembler case. +check_integrated_as() +{ + while [ $# -gt 0 ]; do + if [ "$1" = -integrated-as -o "$1" = -fintegrated-as ]; then + # For the intergrated assembler, we do not check the + # version here. It is the same as the clang version, and + # it has been already checked by scripts/cc-version.sh. + echo LLVM 0 + exit 0 + fi + shift + done +} + +check_integrated_as "$@" + +orig_args="$@" + +# Get the first line of the --version output. +IFS=' +' +set -- $(LC_ALL=C "$@" -Wa,--version -c -x assembler /dev/null -o /dev/null 2>/dev/null) + +# Split the line on spaces. +IFS=' ' +set -- $1 + +if [ "$1" = GNU -a "$2" = assembler ]; then + shift $(($# - 1)) + version=$1 + name=GNU +else + echo "$orig_args: unknown assembler invoked" >&2 + exit 1 +fi + +# Some distributions append a package release number, as in 2.34-4.fc32 +# Trim the hyphen and any characters that follow. +version=${version%-*} + +cversion=$(get_canonical_version $version) + +echo $name $cversion diff --git a/scripts/dummy-tools/gcc b/scripts/dummy-tools/gcc index 346757a87dbc..485427f40dba 100755 --- a/scripts/dummy-tools/gcc +++ b/scripts/dummy-tools/gcc @@ -67,6 +67,12 @@ if arg_contain -E "$@"; then fi fi
+# To set CONFIG_AS_IS_GNU +if arg_contain -Wa,--version "$@"; then + echo "GNU assembler (scripts/dummy-tools) 2.50" + exit 0 +fi + if arg_contain -S "$@"; then # For scripts/gcc-x86-*-has-stack-protector.sh if arg_contain -fstack-protector "$@"; then
From: Nathan Chancellor nathan@kernel.org
commit 2185a7e4b0ade86c2c57fc63d4a7535c40254bd0 upstream.
It has been brought up a few times in various code reviews that clang 3.5 introduced -f{,no-}integrated-as as the preferred way to enable and disable the integrated assembler, mentioning that -{no-,}integrated-as are now considered legacy flags.
Switch the kernel over to using those variants in case there is ever a time where clang decides to remove the non-'f' variants of the flag.
Also, fix a typo in a comment ("intergrated" -> "integrated").
Link: https://releases.llvm.org/3.5.0/tools/clang/docs/ReleaseNotes.html#new-compi... Reviewed-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Masahiro Yamada masahiroy@kernel.org [nathan: Backport to 5.10] Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Makefile | 4 ++-- scripts/as-version.sh | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-)
--- a/Makefile +++ b/Makefile @@ -581,9 +581,9 @@ ifneq ($(GCC_TOOLCHAIN),) CLANG_FLAGS += --gcc-toolchain=$(GCC_TOOLCHAIN) endif ifeq ($(LLVM_IAS),1) -CLANG_FLAGS += -integrated-as +CLANG_FLAGS += -fintegrated-as else -CLANG_FLAGS += -no-integrated-as +CLANG_FLAGS += -fno-integrated-as endif CLANG_FLAGS += -Werror=unknown-warning-option KBUILD_CFLAGS += $(CLANG_FLAGS) --- a/scripts/as-version.sh +++ b/scripts/as-version.sh @@ -21,14 +21,14 @@ get_canonical_version() echo $((10000 * $1 + 100 * ${2:-0} + ${3:-0})) }
-# Clang fails to handle -Wa,--version unless -no-integrated-as is given. -# We check -(f)integrated-as, expecting it is explicitly passed in for the +# Clang fails to handle -Wa,--version unless -fno-integrated-as is given. +# We check -fintegrated-as, expecting it is explicitly passed in for the # integrated assembler case. check_integrated_as() { while [ $# -gt 0 ]; do - if [ "$1" = -integrated-as -o "$1" = -fintegrated-as ]; then - # For the intergrated assembler, we do not check the + if [ "$1" = -fintegrated-as ]; then + # For the integrated assembler, we do not check the # version here. It is the same as the clang version, and # it has been already checked by scripts/cc-version.sh. echo LLVM 0
From: Masahiro Yamada masahiroy@kernel.org
commit 52cc02b910284d6bddba46cce402044ab775f314 upstream.
LLVM_IAS is the user interface to set the -(no-)integrated-as flag, and it should be used only for that purpose.
LLVM_IAS is checked in some places to determine the assembler type, but it is not precise.
For example,
$ make CC=gcc LLVM_IAS=1
... will use the GNU assembler (i.e. binutils) since LLVM_IAS=1 is effective only when $(CC) is clang.
Of course, 'CC=gcc LLVM_IAS=1' is an odd combination, but the build system can be more robust against such insane input.
Commit ba64beb17493a ("kbuild: check the minimum assembler version in Kconfig") introduced CONFIG_AS_IS_GNU/LLVM, which is more precise because Kconfig checks the version string from the assembler in use.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Nick Desaulniers ndesaulniers@google.com Reviewed-by: Nathan Chancellor nathan@kernel.org [nathan: Backport to 5.10] Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Makefile | 2 +- arch/riscv/Makefile | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/Makefile +++ b/Makefile @@ -851,7 +851,7 @@ else DEBUG_CFLAGS += -g endif
-ifeq ($(LLVM_IAS),1) +ifdef CONFIG_AS_IS_LLVM KBUILD_AFLAGS += -g else KBUILD_AFLAGS += -Wa,-gdwarf-2 --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -40,7 +40,7 @@ ifeq ($(CONFIG_LD_IS_LLD),y) ifeq ($(shell test $(CONFIG_LLD_VERSION) -lt 150000; echo $$?),0) KBUILD_CFLAGS += -mno-relax KBUILD_AFLAGS += -mno-relax -ifneq ($(LLVM_IAS),1) +ifndef CONFIG_AS_IS_LLVM KBUILD_CFLAGS += -Wa,-mno-relax KBUILD_AFLAGS += -Wa,-mno-relax endif
From: Nathan Chancellor nathan@kernel.org
commit e89c2e815e76471cb507bd95728bf26da7976430 upstream.
There are two related issues that appear in certain combinations with clang and GNU binutils.
The first occurs when a version of clang that supports zicsr or zifencei via '-march=' [1] (i.e, >= 17.x) is used in combination with a version of GNU binutils that do not recognize zicsr and zifencei in the '-march=' value (i.e., < 2.36):
riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/file.o riscv64-linux-gnu-ld: -march=rv64i2p0_m2p0_a2p0_c2p0_zicsr2p0_zifencei2p0: Invalid or unknown z ISA extension: 'zifencei' riscv64-linux-gnu-ld: failed to merge target specific data of file fs/efivarfs/super.o
The second occurs when a version of clang that does not support zicsr or zifencei via '-march=' (i.e., <= 16.x) is used in combination with a version of GNU as that defaults to a newer ISA base spec, which requires specifying zicsr and zifencei in the '-march=' value explicitly (i.e, >= 2.38):
../arch/riscv/kernel/kexec_relocate.S: Assembler messages: ../arch/riscv/kernel/kexec_relocate.S:147: Error: unrecognized opcode `fence.i', extension `zifencei' required clang-12: error: assembler command failed with exit code 1 (use -v to see invocation)
This is the same issue addressed by commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38") (see [2] for additional information) but older versions of clang miss out on it because the cc-option check fails:
clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr' clang-12: error: invalid arch name 'rv64imac_zicsr_zifencei', unsupported standard user-level extension 'zicsr'
To resolve the first issue, only attempt to add zicsr and zifencei to the march string when using the GNU assembler 2.38 or newer, which is when the default ISA spec was updated, requiring these extensions to be specified explicitly. LLVM implements an older version of the base specification for all currently released versions, so these instructions are available as part of the 'i' extension. If LLVM's implementation is updated in the future, a CONFIG_AS_IS_LLVM condition can be added to CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI.
To resolve the second issue, use version 2.2 of the base ISA spec when using an older version of clang that does not support zicsr or zifencei via '-march=', as that is the spec version most compatible with the one clang/LLVM implements and avoids the need to specify zicsr and zifencei explicitly due to still being a part of 'i'.
[1]: https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694e... [2]: https://lore.kernel.org/ZAxT7T9Xy1Fo3d5W@aurel32.net/
Cc: stable@vger.kernel.org Link: https://github.com/ClangBuiltLinux/linux/issues/1808 Co-developed-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: Nathan Chancellor nathan@kernel.org Acked-by: Conor Dooley conor.dooley@microchip.com Link: https://lore.kernel.org/r/20230313-riscv-zicsr-zifencei-fiasco-v1-1-dd1b7840... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/Kconfig | 22 ++++++++++++++++++++++ arch/riscv/Makefile | 10 ++++++---- 2 files changed, 28 insertions(+), 4 deletions(-)
--- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -331,6 +331,28 @@ config RISCV_BASE_PMU
endmenu
+config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI + def_bool y + # https://sourceware.org/git/?p=binutils-gdb.git%3Ba=commit%3Bh=aed44286efa8ae... + depends on AS_IS_GNU && AS_VERSION >= 23800 + help + Newer binutils versions default to ISA spec version 20191213 which + moves some instructions from the I extension to the Zicsr and Zifencei + extensions. + +config TOOLCHAIN_NEEDS_OLD_ISA_SPEC + def_bool y + depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI + # https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694e... + depends on CC_IS_CLANG && CLANG_VERSION < 170000 + help + Certain versions of clang do not support zicsr and zifencei via -march + but newer versions of binutils require it for the reasons noted in the + help text of CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. This + option causes an older ISA spec compatible with these older versions + of clang to be passed to GAS, which has the same result as passing zicsr + and zifencei to -march. + config FPU bool "FPU support" default y --- a/arch/riscv/Makefile +++ b/arch/riscv/Makefile @@ -53,10 +53,12 @@ riscv-march-$(CONFIG_ARCH_RV64I) := rv64 riscv-march-$(CONFIG_FPU) := $(riscv-march-y)fd riscv-march-$(CONFIG_RISCV_ISA_C) := $(riscv-march-y)c
-# Newer binutils versions default to ISA spec version 20191213 which moves some -# instructions from the I extension to the Zicsr and Zifencei extensions. -toolchain-need-zicsr-zifencei := $(call cc-option-yn, -march=$(riscv-march-y)_zicsr_zifencei) -riscv-march-$(toolchain-need-zicsr-zifencei) := $(riscv-march-y)_zicsr_zifencei +ifdef CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC +KBUILD_CFLAGS += -Wa,-misa-spec=2.2 +KBUILD_AFLAGS += -Wa,-misa-spec=2.2 +else +riscv-march-$(CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI) := $(riscv-march-y)_zicsr_zifencei +endif
KBUILD_CFLAGS += -march=$(subst fd,,$(riscv-march-y)) KBUILD_AFLAGS += -march=$(riscv-march-y)
From: Arnd Bergmann arnd@arndb.de
commit 4b692e861619353ce069e547a67c8d0e32d9ef3d upstream.
Patch series "compat: remove compat_alloc_user_space", v5.
Going through compat_alloc_user_space() to convert indirect system call arguments tends to add complexity compared to handling the native and compat logic in the same code.
This patch (of 6):
The locking is the same between the native and compat version of sys_kexec_load(), so it can be done in the common implementation to reduce duplication.
Link: https://lkml.kernel.org/r/20210727144859.4150043-1-arnd@kernel.org Link: https://lkml.kernel.org/r/20210727144859.4150043-2-arnd@kernel.org Signed-off-by: Arnd Bergmann arnd@arndb.de Co-developed-by: Eric Biederman ebiederm@xmission.com Co-developed-by: Christoph Hellwig hch@infradead.org Acked-by: "Eric W. Biederman" ebiederm@xmission.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: Will Deacon will@kernel.org Cc: Thomas Bogendoerfer tsbogend@alpha.franken.de Cc: "James E.J. Bottomley" James.Bottomley@HansenPartnership.com Cc: Helge Deller deller@gmx.de Cc: Michael Ellerman mpe@ellerman.id.au Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Paul Mackerras paulus@samba.org Cc: Heiko Carstens hca@linux.ibm.com Cc: Vasily Gorbik gor@linux.ibm.com Cc: Christian Borntraeger borntraeger@de.ibm.com Cc: "David S. Miller" davem@davemloft.net Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Borislav Petkov bp@alien8.de Cc: "H. Peter Anvin" hpa@zytor.com Cc: Al Viro viro@zeniv.linux.org.uk Cc: Feng Tang feng.tang@intel.com Cc: Christoph Hellwig hch@lst.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Wen Yang wenyang.linux@foxmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kexec.c | 44 ++++++++++++++++---------------------------- 1 file changed, 16 insertions(+), 28 deletions(-)
--- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -110,6 +110,17 @@ static int do_kexec_load(unsigned long e unsigned long i; int ret;
+ /* + * Because we write directly to the reserved memory region when loading + * crash kernels we need a mutex here to prevent multiple crash kernels + * from attempting to load simultaneously, and to prevent a crash kernel + * from loading over the top of a in use crash kernel. + * + * KISS: always take the mutex. + */ + if (!mutex_trylock(&kexec_mutex)) + return -EBUSY; + if (flags & KEXEC_ON_CRASH) { dest_image = &kexec_crash_image; if (kexec_crash_image) @@ -121,7 +132,8 @@ static int do_kexec_load(unsigned long e if (nr_segments == 0) { /* Uninstall image */ kimage_free(xchg(dest_image, NULL)); - return 0; + ret = 0; + goto out_unlock; } if (flags & KEXEC_ON_CRASH) { /* @@ -134,7 +146,7 @@ static int do_kexec_load(unsigned long e
ret = kimage_alloc_init(&image, entry, nr_segments, segments, flags); if (ret) - return ret; + goto out_unlock;
if (flags & KEXEC_PRESERVE_CONTEXT) image->preserve_context = 1; @@ -171,6 +183,8 @@ out: arch_kexec_protect_crashkres();
kimage_free(image); +out_unlock: + mutex_unlock(&kexec_mutex); return ret; }
@@ -247,21 +261,8 @@ SYSCALL_DEFINE4(kexec_load, unsigned lon ((flags & KEXEC_ARCH_MASK) != KEXEC_ARCH_DEFAULT)) return -EINVAL;
- /* Because we write directly to the reserved memory - * region when loading crash kernels we need a mutex here to - * prevent multiple crash kernels from attempting to load - * simultaneously, and to prevent a crash kernel from loading - * over the top of a in use crash kernel. - * - * KISS: always take the mutex. - */ - if (!mutex_trylock(&kexec_mutex)) - return -EBUSY; - result = do_kexec_load(entry, nr_segments, segments, flags);
- mutex_unlock(&kexec_mutex); - return result; }
@@ -301,21 +302,8 @@ COMPAT_SYSCALL_DEFINE4(kexec_load, compa return -EFAULT; }
- /* Because we write directly to the reserved memory - * region when loading crash kernels we need a mutex here to - * prevent multiple crash kernels from attempting to load - * simultaneously, and to prevent a crash kernel from loading - * over the top of a in use crash kernel. - * - * KISS: always take the mutex. - */ - if (!mutex_trylock(&kexec_mutex)) - return -EBUSY; - result = do_kexec_load(entry, nr_segments, ksegments, flags);
- mutex_unlock(&kexec_mutex); - return result; } #endif
From: Valentin Schneider vschneid@redhat.com
commit 7bb5da0d490b2d836c5218f5186ee588d2145310 upstream.
Patch series "kexec, panic: Making crash_kexec() NMI safe", v4.
This patch (of 2):
Most acquistions of kexec_mutex are done via mutex_trylock() - those were a direct "translation" from:
8c5a1cf0ad3a ("kexec: use a mutex for locking rather than xchg()")
there have however been two additions since then that use mutex_lock(): crash_get_memory_size() and crash_shrink_memory().
A later commit will replace said mutex with an atomic variable, and locking operations will become atomic_cmpxchg(). Rather than having those mutex_lock() become while (atomic_cmpxchg(&lock, 0, 1)), turn them into trylocks that can return -EBUSY on acquisition failure.
This does halve the printable size of the crash kernel, but that's still neighbouring 2G for 32bit kernels which should be ample enough.
Link: https://lkml.kernel.org/r/20220630223258.4144112-1-vschneid@redhat.com Link: https://lkml.kernel.org/r/20220630223258.4144112-2-vschneid@redhat.com Signed-off-by: Valentin Schneider vschneid@redhat.com Cc: Arnd Bergmann arnd@arndb.de Cc: "Eric W . Biederman" ebiederm@xmission.com Cc: Juri Lelli jlelli@redhat.com Cc: Luis Claudio R. Goncalves lgoncalv@redhat.com Cc: Miaohe Lin linmiaohe@huawei.com Cc: Petr Mladek pmladek@suse.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Baoquan He bhe@redhat.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Wen Yang wenyang.linux@foxmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/kexec.h | 2 +- kernel/kexec_core.c | 12 ++++++++---- kernel/ksysfs.c | 7 ++++++- 3 files changed, 15 insertions(+), 6 deletions(-)
--- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -380,8 +380,8 @@ extern note_buf_t __percpu *crash_notes; extern bool kexec_in_progress;
int crash_shrink_memory(unsigned long new_size); -size_t crash_get_memory_size(void); void crash_free_reserved_phys_range(unsigned long begin, unsigned long end); +ssize_t crash_get_memory_size(void);
void arch_kexec_protect_crashkres(void); void arch_kexec_unprotect_crashkres(void); --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -989,13 +989,16 @@ void crash_kexec(struct pt_regs *regs) } }
-size_t crash_get_memory_size(void) +ssize_t crash_get_memory_size(void) { - size_t size = 0; + ssize_t size = 0; + + if (!mutex_trylock(&kexec_mutex)) + return -EBUSY;
- mutex_lock(&kexec_mutex); if (crashk_res.end != crashk_res.start) size = resource_size(&crashk_res); + mutex_unlock(&kexec_mutex); return size; } @@ -1016,7 +1019,8 @@ int crash_shrink_memory(unsigned long ne unsigned long old_size; struct resource *ram_res;
- mutex_lock(&kexec_mutex); + if (!mutex_trylock(&kexec_mutex)) + return -EBUSY;
if (kexec_crash_image) { ret = -ENOENT; --- a/kernel/ksysfs.c +++ b/kernel/ksysfs.c @@ -106,7 +106,12 @@ KERNEL_ATTR_RO(kexec_crash_loaded); static ssize_t kexec_crash_size_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { - return sprintf(buf, "%zu\n", crash_get_memory_size()); + ssize_t size = crash_get_memory_size(); + + if (size < 0) + return size; + + return sprintf(buf, "%zd\n", size); } static ssize_t kexec_crash_size_store(struct kobject *kobj, struct kobj_attribute *attr,
From: Valentin Schneider vschneid@redhat.com
commit 05c6257433b7212f07a7e53479a8ab038fc1666a upstream.
Attempting to get a crash dump out of a debug PREEMPT_RT kernel via an NMI panic() doesn't work. The cause of that lies in the PREEMPT_RT definition of mutex_trylock():
if (IS_ENABLED(CONFIG_DEBUG_RT_MUTEXES) && WARN_ON_ONCE(!in_task())) return 0;
This prevents an nmi_panic() from executing the main body of __crash_kexec() which does the actual kexec into the kdump kernel. The warning and return are explained by:
6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") [...] The reasons for this are:
1) There is a potential deadlock in the slowpath
2) Another cpu which blocks on the rtmutex will boost the task which allegedly locked the rtmutex, but that cannot work because the hard/softirq context borrows the task context.
Furthermore, grabbing the lock isn't NMI safe, so do away with kexec_mutex and replace it with an atomic variable. This is somewhat overzealous as *some* callsites could keep using a mutex (e.g. the sysfs-facing ones like crash_shrink_memory()), but this has the benefit of involving a single unified lock and preventing any future NMI-related surprises.
Tested by triggering NMI panics via:
$ echo 1 > /proc/sys/kernel/panic_on_unrecovered_nmi $ echo 1 > /proc/sys/kernel/unknown_nmi_panic $ echo 1 > /proc/sys/kernel/panic
$ ipmitool power diag
Link: https://lkml.kernel.org/r/20220630223258.4144112-3-vschneid@redhat.com Fixes: 6ce47fd961fa ("rtmutex: Warn if trylock is called from hard/softirq context") Signed-off-by: Valentin Schneider vschneid@redhat.com Cc: Arnd Bergmann arnd@arndb.de Cc: Baoquan He bhe@redhat.com Cc: "Eric W . Biederman" ebiederm@xmission.com Cc: Juri Lelli jlelli@redhat.com Cc: Luis Claudio R. Goncalves lgoncalv@redhat.com Cc: Miaohe Lin linmiaohe@huawei.com Cc: Petr Mladek pmladek@suse.com Cc: Sebastian Andrzej Siewior bigeasy@linutronix.de Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Wen Yang wenyang.linux@foxmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kexec.c | 11 ++++------- kernel/kexec_core.c | 20 ++++++++++---------- kernel/kexec_file.c | 4 ++-- kernel/kexec_internal.h | 15 ++++++++++++++- 4 files changed, 30 insertions(+), 20 deletions(-)
--- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -112,13 +112,10 @@ static int do_kexec_load(unsigned long e
/* * Because we write directly to the reserved memory region when loading - * crash kernels we need a mutex here to prevent multiple crash kernels - * from attempting to load simultaneously, and to prevent a crash kernel - * from loading over the top of a in use crash kernel. - * - * KISS: always take the mutex. + * crash kernels we need a serialization here to prevent multiple crash + * kernels from attempting to load simultaneously. */ - if (!mutex_trylock(&kexec_mutex)) + if (!kexec_trylock()) return -EBUSY;
if (flags & KEXEC_ON_CRASH) { @@ -184,7 +181,7 @@ out:
kimage_free(image); out_unlock: - mutex_unlock(&kexec_mutex); + kexec_unlock(); return ret; }
--- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -45,7 +45,7 @@ #include <crypto/sha.h> #include "kexec_internal.h"
-DEFINE_MUTEX(kexec_mutex); +atomic_t __kexec_lock = ATOMIC_INIT(0);
/* Per cpu memory for storing cpu states in case of system crash. */ note_buf_t __percpu *crash_notes; @@ -943,7 +943,7 @@ int kexec_load_disabled; */ void __noclone __crash_kexec(struct pt_regs *regs) { - /* Take the kexec_mutex here to prevent sys_kexec_load + /* Take the kexec_lock here to prevent sys_kexec_load * running on one cpu from replacing the crash kernel * we are using after a panic on a different cpu. * @@ -951,7 +951,7 @@ void __noclone __crash_kexec(struct pt_r * of memory the xchg(&kexec_crash_image) would be * sufficient. But since I reuse the memory... */ - if (mutex_trylock(&kexec_mutex)) { + if (kexec_trylock()) { if (kexec_crash_image) { struct pt_regs fixed_regs;
@@ -960,7 +960,7 @@ void __noclone __crash_kexec(struct pt_r machine_crash_shutdown(&fixed_regs); machine_kexec(kexec_crash_image); } - mutex_unlock(&kexec_mutex); + kexec_unlock(); } } STACK_FRAME_NON_STANDARD(__crash_kexec); @@ -993,13 +993,13 @@ ssize_t crash_get_memory_size(void) { ssize_t size = 0;
- if (!mutex_trylock(&kexec_mutex)) + if (!kexec_trylock()) return -EBUSY;
if (crashk_res.end != crashk_res.start) size = resource_size(&crashk_res);
- mutex_unlock(&kexec_mutex); + kexec_unlock(); return size; }
@@ -1019,7 +1019,7 @@ int crash_shrink_memory(unsigned long ne unsigned long old_size; struct resource *ram_res;
- if (!mutex_trylock(&kexec_mutex)) + if (!kexec_trylock()) return -EBUSY;
if (kexec_crash_image) { @@ -1058,7 +1058,7 @@ int crash_shrink_memory(unsigned long ne insert_resource(&iomem_resource, ram_res);
unlock: - mutex_unlock(&kexec_mutex); + kexec_unlock(); return ret; }
@@ -1130,7 +1130,7 @@ int kernel_kexec(void) { int error = 0;
- if (!mutex_trylock(&kexec_mutex)) + if (!kexec_trylock()) return -EBUSY; if (!kexec_image) { error = -EINVAL; @@ -1205,7 +1205,7 @@ int kernel_kexec(void) #endif
Unlock: - mutex_unlock(&kexec_mutex); + kexec_unlock(); return error; }
--- a/kernel/kexec_file.c +++ b/kernel/kexec_file.c @@ -343,7 +343,7 @@ SYSCALL_DEFINE5(kexec_file_load, int, ke
image = NULL;
- if (!mutex_trylock(&kexec_mutex)) + if (!kexec_trylock()) return -EBUSY;
dest_image = &kexec_image; @@ -415,7 +415,7 @@ out: if ((flags & KEXEC_FILE_ON_CRASH) && kexec_crash_image) arch_kexec_protect_crashkres();
- mutex_unlock(&kexec_mutex); + kexec_unlock(); kimage_free(image); return ret; } --- a/kernel/kexec_internal.h +++ b/kernel/kexec_internal.h @@ -15,7 +15,20 @@ int kimage_is_destination_range(struct k
int machine_kexec_post_load(struct kimage *image);
-extern struct mutex kexec_mutex; +/* + * Whatever is used to serialize accesses to the kexec_crash_image needs to be + * NMI safe, as __crash_kexec() can happen during nmi_panic(), so here we use a + * "simple" atomic variable that is acquired with a cmpxchg(). + */ +extern atomic_t __kexec_lock; +static inline bool kexec_trylock(void) +{ + return atomic_cmpxchg_acquire(&__kexec_lock, 0, 1) == 0; +} +static inline void kexec_unlock(void) +{ + atomic_set_release(&__kexec_lock, 0); +}
#ifdef CONFIG_KEXEC_FILE #include <linux/purgatory.h>
From: Kuniyuki Iwashima kuniyu@amazon.com
commit 7dee5d7747a69aa2be41f04c6a7ecfe3ac8cdf18 upstream.
A sysctl variable is accessed concurrently, and there is always a chance of data-race. So, all readers and writers need some basic protection to avoid load/store-tearing.
This patch changes proc_dou8vec_minmax() to use READ_ONCE() and WRITE_ONCE() internally to fix data-races on the sysctl side. For now, proc_dou8vec_minmax() itself is tolerant to a data-race, but we still need to add annotations on the other subsystem's side.
Fixes: cb9444130662 ("sysctl: add proc_dou8vec_minmax()") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/sysctl.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -1109,13 +1109,13 @@ int proc_dou8vec_minmax(struct ctl_table
tmp.maxlen = sizeof(val); tmp.data = &val; - val = *data; + val = READ_ONCE(*data); res = do_proc_douintvec(&tmp, write, buffer, lenp, ppos, do_proc_douintvec_minmax_conv, ¶m); if (res) return res; if (write) - *data = val; + WRITE_ONCE(*data, val); return 0; } EXPORT_SYMBOL_GPL(proc_dou8vec_minmax);
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Suspected commit, cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
-- Linaro LKFT https://lkft.linaro.org
On Tue, Apr 18, 2023 at 08:38:47PM +0530, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
That's a documentation patch, it can not:
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex);
Cause this.
What arch is failing here? This builds for x86.
thanks,
greg k-h
On Tue, 18 Apr 2023 at 21:04, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 18, 2023 at 08:38:47PM +0530, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
That's a documentation patch, it can not:
Sorry for my mistake in trimming the email at the wrong place.
I have pasted down of the email as this suspected patch,
cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex);
Cause this.
What arch is failing here? This builds for x86.
Not for me.
List of regression builds.
Regressions found on x86_64: - build/gcc-12-defconfig
Regressions found on i386: - build/gcc-12-defconfig - build/gcc-11-lkftconfig
Regressions found on arm64: - build/gcc-11-lkftconfig
Regressions found on arm: - build/gcc-12-vexpress_defconfig - build/gcc-8-vexpress_defconfig
Regressions found on mips: - build/gcc-8-defconfig - build/gcc-12-defconfig
For 5.10 defconfigs failed, https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.10.y/build/v5.10....
For 5.15 defconfigs failed, https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15....
- Naresh
On 4/18/23 09:45, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 21:04, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 18, 2023 at 08:38:47PM +0530, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
That's a documentation patch, it can not:
Sorry for my mistake in trimming the email at the wrong place.
I have pasted down of the email as this suspected patch,
cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex);
Cause this.
What arch is failing here? This builds for x86.
Not for me.
It is failing for me on x86_64 with CONFIG_CGROUPS=y and CONFIG_CGROUP_CPUACCT=y
kernel/cgroup/cpuset.c: In function ‘cpuset_can_fork’: kernel/cgroup/cpuset.c:2941:30: error: ‘cgroup_mutex’ undeclared (first use in this function); did you mean ‘cgroup_put’? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~ ./include/linux/lockdep.h:393:61: note: in definition of macro ‘lockdep_assert_held’ 393 | #define lockdep_assert_held(l) do { (void)(l); } while (0) | ^ kernel/cgroup/cpuset.c:2941:30: note: each undeclared identifier is reported only once for each function it appears in 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~ ./include/linux/lockdep.h:393:61: note: in definition of macro ‘lockdep_assert_held’ 393 | #define lockdep_assert_held(l) do { (void)(l); } while (0)
thanks, -- Shuah
Hi all,
On 18/04/23 9:30 pm, Shuah Khan wrote:
On 4/18/23 09:45, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 21:04, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Apr 18, 2023 at 08:38:47PM +0530, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
That's a documentation patch, it can not:
Sorry for my mistake in trimming the email at the wrong place.
I have pasted down of the email as this suspected patch,
cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex);
Cause this.
What arch is failing here? This builds for x86.
Not for me.
It is failing for me on x86_64 with CONFIG_CGROUPS=y and CONFIG_CGROUP_CPUACCT=y
Added some observations on 5.15.y thread:
https://lore.kernel.org/all/bd46521c-a167-2872-fecb-2b0f32855a24@oracle.com/
Thanks, Harshit
kernel/cgroup/cpuset.c: In function ‘cpuset_can_fork’: kernel/cgroup/cpuset.c:2941:30: error: ‘cgroup_mutex’ undeclared (first use in this function); did you mean ‘cgroup_put’? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~ ./include/linux/lockdep.h:393:61: note: in definition of macro ‘lockdep_assert_held’ 393 | #define lockdep_assert_held(l) do { (void)(l); } while (0) | ^ kernel/cgroup/cpuset.c:2941:30: note: each undeclared identifier is reported only once for each function it appears in 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~ ./include/linux/lockdep.h:393:61: note: in definition of macro ‘lockdep_assert_held’ 393 | #define lockdep_assert_held(l) do { (void)(l); } while (0)
thanks, -- Shuah
On 4/18/23 08:34, Greg Kroah-Hartman wrote:
On Tue, Apr 18, 2023 at 08:38:47PM +0530, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
That's a documentation patch, it can not:
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex);
Cause this.
What arch is failing here? This builds for x86.
No, it doesn't.
Build reference: v5.10.177-125-g19b9d9b9f62e Compiler version: x86_64-linux-gcc (GCC) 11.3.0 Assembler version: GNU assembler (GNU Binutils) 2.39
Building x86_64:defconfig ... failed -------------- Error log: In file included from include/linux/rcupdate.h:29, from include/linux/rculist.h:11, from include/linux/pid.h:5, from include/linux/sched.h:14, from include/linux/ratelimit.h:6, from include/linux/dev_printk.h:16, from include/linux/device.h:15, from include/linux/node.h:18, from include/linux/cpu.h:17, from kernel/cgroup/cpuset.c:25: kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~ include/linux/lockdep.h:393:61: note: in definition of macro 'lockdep_assert_held' 393 | #define lockdep_assert_held(l) do { (void)(l); } while (0) | ^ kernel/cgroup/cpuset.c:2941:30: note: each undeclared identifier is reported only once for each function it appears in 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~ include/linux/lockdep.h:393:61: note: in definition of macro 'lockdep_assert_held' 393 | #define lockdep_assert_held(l) do { (void)(l); } while (0) | ^ make[3]: *** [scripts/Makefile.build:286: kernel/cgroup/cpuset.o] Error 1 make[3]: *** Waiting for unfinished jobs.... make[2]: *** [scripts/Makefile.build:503: kernel/cgroup] Error 2 make[2]: *** Waiting for unfinished jobs.... make[1]: *** [Makefile:1828: kernel] Error 2 make[1]: *** Waiting for unfinished jobs.... make: *** [Makefile:192: __sub-make] Error 2
On 4/18/23 11:08, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Suspected commit, cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
Oh, cgroup_mutex needs the recent commit 354ed59744295 ("mm: multi-gen LRU: kill switch") to make it available in include/linux/cgroup.h. I did my testing with a debug Kconfig and so didn't catch that. This line can be safely removed.
Regards, Longman
On Tue, Apr 18, 2023 at 01:24:12PM -0400, Waiman Long wrote:
On 4/18/23 11:08, Naresh Kamboju wrote:
On Tue, 18 Apr 2023 at 18:03, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
Suspected commit, cgroup/cpuset: Add cpuset_can_fork() and cpuset_cancel_fork() methods commit eee87853794187f6adbe19533ed79c8b44b36a91 upstream.
Oh, cgroup_mutex needs the recent commit 354ed59744295 ("mm: multi-gen LRU: kill switch") to make it available in include/linux/cgroup.h. I did my testing with a debug Kconfig and so didn't catch that. This line can be safely removed.
I can confirm removing just this line fixed 5.15 x86_64 build for me. Thanks!
--Tom
Hi!
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
Following build errors noticed on 5.15 and 5.10.,
Waiman Long longman@redhat.com cgroup/cpuset: Change references of cpuset_mutex to cpuset_rwsem
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared (first use in this function); did you mean 'cgroup_put'? 2941 | lockdep_assert_held(&cgroup_mutex); | ^~~~~~~~~~~~
For the record, we can see the same failure in CIP testing.
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/pipelines/84...
And it is the only build error I can see in a quick search.
Best regards, Pavel
On 4/18/2023 5:20 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.178-rc... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On Tue, Apr 18, 2023 at 02:20:19PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.10.178 release. There are 124 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 20 Apr 2023 12:02:44 +0000. Anything received after that time might be too late.
Build results: total: 162 pass: 148 fail: 14 Failed builds: alpha:allmodconfig arm:omap2plus_defconfig arm:vexpress_defconfig arm64:defconfig i386:defconfig ia64:defconfig mips:defconfig parisc:allmodconfig parisc64:generic-64bit_defconfig powerpc:defconfig powerpc:ppc64e_defconfig powerpc:cell_defconfig s390:defconfig x86_64:defconfig Qemu test results: total: 485 pass: 477 fail: 8 Failed tests: mipsel64:64r6el_defconfig:notests:nonet:smp:ide:hd mipsel64:64r6el_defconfig:notests:nonet:smp:ide:hd mipsel64:64r6el_defconfig:notests:nonet:smp:ide:cd s390:defconfig:nolocktests:smp2:net,default:initrd s390:defconfig:nolocktests:smp2:virtio-blk-ccw:net,virtio-net-pci:rootfs s390:defconfig:nolocktests:smp2:scsi[virtio-ccw]:net,default:rootfs s390:defconfig:nolocktests:virtio-pci:net,virtio-net-pci:rootfs s390:defconfig:nolocktests:scsi[virtio-pci]:net,default:rootfs
Caused by:
kernel/cgroup/cpuset.c: In function 'cpuset_can_fork': kernel/cgroup/cpuset.c:2941:30: error: 'cgroup_mutex' undeclared
Guenter
linux-stable-mirror@lists.linaro.org