This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.9.6-rc1
Oleg Nesterov oleg@redhat.com zap_pid_ns_processes: clear TIF_NOTIFY_SIGNAL along with TIF_SIGPENDING
Jean Delvare jdelvare@suse.de i2c: designware: Fix the functionality flags of the slave-only interface
Jean Delvare jdelvare@suse.de i2c: at91: Fix the functionality flags of the slave-only interface
Yongzhi Liu hyperlyzcs@gmail.com misc: microchip: pci1xxxx: Fix a memory leak in the error handling of gp_aux_bus_probe()
Hans de Goede hdegoede@redhat.com mei: vsc: Fix wrong invocation of ACPI SID method
Shichao Lai shichaorai@gmail.com usb-storage: alauda: Check whether the media is initialized
Rob Herring (Arm) robh@kernel.org dt-bindings: usb: realtek,rts5411: Add missing "additionalProperties" on child nodes
Andy Shevchenko andriy.shevchenko@linux.intel.com serial: 8250_dw: Don't use struct dw8250_data outside of 8250_dw
Stefan Berger stefanb@linux.ibm.com ima: Fix use-after-free on a dentry's dname.name
Sicong Huang congei42@163.com greybus: Fix use-after-free bug in gb_interface_release due to race condition.
Beleswar Padhi b-padhi@ti.com remoteproc: k3-r5: Jump to error handling labels in start/stop errors
Miaohe Lin linmiaohe@huawei.com mm/huge_memory: don't unpoison huge_zero_folio
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: fix a crash on 7265
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: support iwl_dev_tx_power_cmd_v8
Filipe Manana fdmanana@suse.com btrfs: zoned: fix use-after-free due to race with dev replace
Tomi Valkeinen tomi.valkeinen@ideasonboard.com pmdomain: ti-sci: Fix duplicate PD referrals
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: pci: Add Lunar Lake support
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: pci: Add Meteor Lake-S support
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: pci: Add Sapphire Rapids SOC support
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: pci: Add Granite Rapids SOC support
Alexander Shishkin alexander.shishkin@linux.intel.com intel_th: pci: Add Granite Rapids support
Imre Deak imre.deak@intel.com drm/i915: Fix audio component initialization
Vidya Srinivas vidya.srinivas@intel.com drm/i915/dpt: Make DPT object unshrinkable
Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com drm/xe: Properly handle alloc_guc_id() failure
Wachowski, Karol karol.wachowski@intel.com drm/shmem-helper: Fix BUG_ON() on mmap(PROT_WRITE, MAP_PRIVATE)
Chris Wilson chris@chris-wilson.co.uk drm/i915/gt: Disarm breadcrumbs if engines are already idle
Daniel Bristot de Oliveira bristot@kernel.org rtla/auto-analysis: Replace \t with spaces
Daniel Bristot de Oliveira bristot@kernel.org rtla/timerlat: Simplify "no value" printing on top
Nam Cao namcao@linutronix.de riscv: force PAGE_SIZE linear mapping if debug_pagealloc is enabled
Nam Cao namcao@linutronix.de riscv: rewrite __kernel_map_pages() to fix sleeping in invalid context
Wayne Lin Wayne.Lin@amd.com drm/mst: Fix NULL pointer dereference at drm_dp_add_payload_part2
Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com iio: invensense: fix interrupt timestamp alignment
Dimitri Fedrau dima.fedrau@gmail.com iio: temperature: mcp9600: Fix temperature reading for negative values
Nuno Sa nuno.sa@analog.com iio: adc: axi-adc: make sure AXI clock is enabled
Beleswar Padhi b-padhi@ti.com remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs
Apurva Nandan a-nandan@ti.com remoteproc: k3-r5: Wait for core0 power-up before powering up core1
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/bridge: aux-hpd-bridge: correct devm_drm_dp_hpd_bridge_add() stub
Nuno Sa nuno.sa@analog.com dmaengine: axi-dmac: fix possible race in remove()
Rick Wertenbroek rick.wertenbroek@gmail.com PCI: rockchip-ep: Remove wrong mask on subsys_vendor_id
Mikulas Patocka mpatocka@redhat.com dm-integrity: set discard_granularity to logical block size
Su Yue glass.su@suse.com ocfs2: fix races between hole punching and AIO+DIO
Su Yue glass.su@suse.com ocfs2: use coarse time for new created files
Su Yue glass.su@suse.com ocfs2: update inode fsync transaction id in ocfs2_unlink and ocfs2_link
Baoquan He bhe@redhat.com kexec: fix the unexpected kexec_dprintk() macro
Rik van Riel riel@surriel.com fs/proc: fix softlockup in __read_vmcore
Trond Myklebust trond.myklebust@hammerspace.com knfsd: LOOKUP can return an illegal error value
Vamshi Gajjela vamshigajjela@google.com spmi: hisi-spmi-controller: Do not override device identifier
Hagar Gamal Halim Hemdan hagarhem@amazon.com vmci: prevent speculation leaks by sanitizing event in event_deliver()
Fedor Pchelkin pchelkin@ispras.ru dma-buf: handle testing kthreads creation failure
Thadeu Lima de Souza Cascardo cascardo@igalia.com sock_map: avoid race between sock_map_close and sk_psock_put
Niklas Cassel cassel@kernel.org ata: libata-core: Add ATA_HORKAGE_NOLPM for AMD Radeon S3 SSD
Niklas Cassel cassel@kernel.org ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1
Niklas Cassel cassel@kernel.org ata: libata-core: Add ATA_HORKAGE_NOLPM for Apacer AS340
Jason Nader dev@kayoway.com ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake
Damien Le Moal dlemoal@kernel.org null_blk: Print correct max open zones limit in null_init_zoned_dev()
Matthias Maennich maennich@google.com kheaders: explicitly define file modes for archived headers
Steven Rostedt (Google) rostedt@goodmis.org tracing/selftests: Fix kprobe event name test for .isra. functions
Nam Cao namcao@linutronix.de riscv: fix overlap of allocated page and PTR_ERR
Carlos Llamas cmllamas@google.com locking/atomic: scripts: fix ${atomic}_sub_and_test() kerneldoc
Johannes Berg johannes.berg@intel.com wifi: mt76: mt7615: add missing chanctx ops
Bitterblue Smith rtl8821cerfe2@gmail.com wifi: rtlwifi: Ignore IEEE80211_CONF_CHANGE_RETRY_LIMITS
Johannes Berg johannes.berg@intel.com wifi: cfg80211: validate HE operation element parsing
Adrian Hunter adrian.hunter@intel.com perf script: Show also errors for --insn-trace option
Adrian Hunter adrian.hunter@intel.com perf auxtrace: Fix multiple use of --itrace option
Haifeng Xu haifeng.xu@shopee.com perf/core: Fix missing wakeup when waiting for context reference
Yazen Ghannam yazen.ghannam@amd.com x86/amd_nb: Check for invalid SMN reads
David Kaplan david.kaplan@amd.com x86/kexec: Fix bug with call depth tracking
Hagar Hemdan hagarhem@amazon.com irqchip/gic-v3-its: Fix potential race condition in its_vlpi_prop_update()
Samuel Holland samuel.holland@sifive.com irqchip/sifive-plic: Chain to parent IRQ after handlers are ready
YonglongLi liyonglong@chinatelecom.cn mptcp: pm: update add_addr counters after connect
YonglongLi liyonglong@chinatelecom.cn mptcp: pm: inc RmAddr MIB counter once per RM_ADDR ID
Paolo Abeni pabeni@redhat.com mptcp: ensure snd_una is properly initialized on connect
Marek Szyprowski m.szyprowski@samsung.com drm/exynos: hdmi: report safe 640x480 mode as a fallback when no EDID found
Jani Nikula jani.nikula@intel.com drm/exynos/vidi: fix memory leak in .get_modes()
Jan Beulich jbeulich@suse.com memblock: make memblock_set_node() also warn about use of MAX_NUMNODES
Jan Beulich jbeulich@suse.com x86/mm/numa: Use NUMA_NO_NODE when calling memblock_set_node()
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: ACPI: Invalidate trip points with temperature of 0 or below
Mario Limonciello mario.limonciello@amd.com ACPI: x86: Force StorageD3Enable on more products
Yazen Ghannam yazen.ghannam@amd.com RAS/AMD/ATL: Use system settings for MI300 DRAM to normalized address translation
Yazen Ghannam yazen.ghannam@amd.com RAS/AMD/ATL: Fix MI300 bank hash
John David Anglin dave@parisc-linux.org parisc: Try to fix random segmentation faults in package builds
Dirk Behme dirk.behme@de.bosch.com drivers: core: synchronize really_probe() and dev_uevent()
Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com iio: imu: inv_icm42600: delete unneeded update watermark call
Harshit Mogalapalli harshit.m.mogalapalli@oracle.com iio: temperature: mlx90635: Fix ERR_PTR dereference in mlx90635_probe()
Adam Rizkalla ajarizzo@gmail.com iio: pressure: bmp280: Fix BMP580 temperature reading
Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com iio: invensense: fix odr switching to same value
Vasileios Amoiridis vassilisamir@gmail.com iio: imu: bmi323: Fix trigger notification in case of error
Marc Ferland marc.ferland@sonatest.com iio: dac: ad5592r: fix temperature channel scaling value
David Lechner dlechner@baylibre.com iio: adc: ad9467: fix scan type sign
Benjamin Segall bsegall@google.com x86/boot: Don't add the EFI stub to targets, again
Hans de Goede hdegoede@redhat.com leds: class: Revert: "If no default trigger is given, make hw_control trigger the default trigger"
Oleg Nesterov oleg@redhat.com tick/nohz_full: Don't abuse smp_call_function_single() in tick_setup_device()
Namjae Jeon linkinjeon@kernel.org ksmbd: fix missing use of get_write in in smb2_set_ea()
Namjae Jeon linkinjeon@kernel.org ksmbd: move leading slash check to smb2_get_name()
Yongzhi Liu hyperlyzcs@gmail.com misc: microchip: pci1xxxx: fix double free in the error handling of gp_aux_bus_probe()
Aleksandr Mishin amishin@t-argos.ru bnxt_en: Adjust logging of firmware messages in case of released token in __hwrm_send()
Rao Shoaib Rao.Shoaib@oracle.com af_unix: Read with MSG_PEEK loops if the first unread byte is OOB
Michael Chan michael.chan@broadcom.com bnxt_en: Cap the size of HWRM_PORT_PHY_QCFG forwarded response
Taehee Yoo ap420073@gmail.com ionic: fix use after netif_napi_del()
Riana Tauro riana.tauro@intel.com drm/xe: move disable_c6 call
Rodrigo Vivi rodrigo.vivi@intel.com drm/xe: Remove mem_access from guc_pc calls
Andrzej Hajda andrzej.hajda@intel.com drm/xe: flush engine buffers before signalling user fence on all engines
Riana Tauro riana.tauro@intel.com drm/xe/xe_gt_idle: use GT forcewake domain assertion
Nikolay Aleksandrov razor@blackwall.org net: bridge: mst: fix suspicious rcu usage in br_mst_set_state
Nikolay Aleksandrov razor@blackwall.org net: bridge: mst: pass vlan group directly to br_mst_vlan_set_state
Petr Pavlu petr.pavlu@suse.com net/ipv6: Fix the RT cache flush via sysctl using a previous delay
Daniel Wagner dwagner@suse.de nvmet-passthru: propagate status from id override functions
Chengming Zhou chengming.zhou@linux.dev block: fix request.queuelist usage in flush
Su Hui suhui@nfschina.com block: sed-opal: avoid possible wrong address reference in read_sed_opal_key()
Xiaolei Wang xiaolei.wang@windriver.com net: stmmac: replace priv->speed with the portTransmitRate from the tc-cbs parameters
Joshua Washington joshwash@google.com gve: ignore nonrelevant GSO type bits when processing TSO headers
Kory Maincent kory.maincent@bootlin.com net: pse-pd: Use EOPNOTSUPP error code instead of ENOTSUPP
Ziqi Chen quic_ziqichen@quicinc.com scsi: ufs: core: Quiesce request queues before checking pending cmds
Kees Cook kees@kernel.org x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking
Uros Bizjak ubizjak@gmail.com x86/asm: Use %c/%n instead of %P operand modifier in asm templates
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type
Davide Ornaghi d.ornaghi97@gmail.com netfilter: nft_inner: validate mandatory meta and payload
Vasily Khoruzhick anarsoul@gmail.com drm/nouveau: don't attempt to schedule hpd_work on headless cards
Eric Dumazet edumazet@google.com tcp: use signed arithmetic in tcp_rtx_probe0_timed_out()
Johannes Berg johannes.berg@intel.com net/sched: initialize noop_qdisc owner
Pauli Virtanen pav@iki.fi Bluetooth: fix connection setup in l2cap_connect
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: L2CAP: Fix rejecting L2CAP_CONN_PARAM_UPDATE_REQ
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Fix not using correct handle
Gal Pressman gal@nvidia.com net/mlx5e: Fix features validation check for tunneled UDP (non-VXLAN) packets
Gal Pressman gal@nvidia.com geneve: Fix incorrect inner network header offset when innerprotoinherit is set
Andy Shevchenko andriy.shevchenko@linux.intel.com net dsa: qca8k: fix usages of device_get_named_child_node()
Eric Dumazet edumazet@google.com tcp: fix race in tcp_v6_syn_recv_sock()
Adam Miotk adam.miotk@arm.com drm/bridge/panel: Fix runtime warning on panel bridge release
Amjad Ouled-Ameur amjad.ouled-ameur@arm.com drm/komeda: check for error-valued pointer
David Wei dw@davidwei.uk netdevsim: fix backwards compatibility in nsim_get_iflink()
Sagar Cheluvegowda quic_scheluve@quicinc.com net: stmmac: dwmac-qcom-ethqos: Configure host DMA width
Aleksandr Mishin amishin@t-argos.ru liquidio: Adjust a NULL pointer handling path in lio_vf_rep_copy_packet
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: core: Do not fail cdev registration because of invalid initial state
Jie Wang wangjie125@huawei.com net: hns3: add cond_resched() to hns3 ring buffer init process
Yonglong Liu liuyonglong@huawei.com net: hns3: fix kernel crash problem in concurrent scenario
Csókás, Bence csokas.bence@prolan.hu net: sfp: Always call `sfp_sm_mod_remove()` on remove
Masahiro Yamada masahiroy@kernel.org modpost: do not warn about missing MODULE_DESCRIPTION() for vmlinux.o
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-race of sk->sk_state in unix_accept().
Ian Forbes ian.forbes@broadcom.com drm/vmwgfx: Don't memcmp equivalent pointers
Ian Forbes ian.forbes@broadcom.com drm/vmwgfx: Remove STDU logic from generic mode_valid function
Ian Forbes ian.forbes@broadcom.com drm/vmwgfx: 3D disabled should not effect STDU memory limits
Ian Forbes ian.forbes@broadcom.com drm/vmwgfx: Filter modes which exceed graphics memory
José Expósito jose.exposito89@gmail.com HID: logitech-dj: Fix memory leak in logi_dj_recv_switch_to_dj_mode()
Su Hui suhui@nfschina.com io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()
Breno Leitao leitao@debian.org io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
Lu Baolu baolu.lu@linux.intel.com iommu: Return right value in iommu_sva_bind_device()
Kun(llfl) llfl@linux.alibaba.com iommu/amd: Fix sysfs leak in iommu init
Nikita Zhandarovich n.zhandarovich@fintech.ru HID: core: remove unnecessary WARN_ON() in implement()
Matthias Schiffer matthias.schiffer@ew.tq-group.com gpio: tqmx86: fix broken IRQ_TYPE_EDGE_BOTH interrupt type
Matthias Schiffer matthias.schiffer@ew.tq-group.com gpio: tqmx86: store IRQ trigger type and unmask status separately
Matthias Schiffer matthias.schiffer@ew.tq-group.com gpio: tqmx86: introduce shadow register for GPIO output value
Gregor Herburger gregor.herburger@tq-group.com gpio: tqmx86: fix typo in Kconfig label
Armin Wolf W_Armin@gmx.de platform/x86: dell-smbios: Fix wrong token data in sysfs
Chen Ni nichen@iscas.ac.cn drm/panel: sitronix-st7789v: Add check for of_drm_get_panel_orientation
Weiwen Hu huweiwen@linux.alibaba.com nvme: fix nvme_pr_* status code parsing
John Hubbard jhubbard@nvidia.com selftests/futex: don't pass a const char* to asprintf(3)
Masami Hiramatsu (Google) mhiramat@kernel.org selftests/tracing: Fix event filter test to retry up to 10 times
NeilBrown neilb@suse.de NFS: add barriers when testing for NFS_FSDATA_BLOCKED
Chen Hanxiao chenhx.fnst@fujitsu.com SUNRPC: return proper error from gss_wrap_req_priv
Olga Kornievskaia kolga@netapp.com NFSv4.1 enforce rootpath check in fs_location query
Samuel Holland samuel.holland@sifive.com clk: sifive: Do not register clkdevs for PRCI clocks
Masami Hiramatsu (Google) mhiramat@kernel.org selftests/ftrace: Fix to check required event file
Mark Brown broonie@kernel.org kselftest/alsa: Ensure _GNU_SOURCE is defined
Baokun Li libaokun1@huawei.com cachefiles: flush all requests after setting CACHEFILES_DEAD
Baokun Li libaokun1@huawei.com cachefiles: defer exposing anon_fd until after copy_to_user() succeeds
Baokun Li libaokun1@huawei.com cachefiles: never get a new anonymous fd if ondemand_id is valid
Baokun Li libaokun1@huawei.com cachefiles: remove err_put_fd label in cachefiles_ondemand_daemon_read()
Baokun Li libaokun1@huawei.com cachefiles: add spin_lock for cachefiles_ondemand_info
Baokun Li libaokun1@huawei.com cachefiles: fix slab-use-after-free in cachefiles_ondemand_daemon_read()
Baokun Li libaokun1@huawei.com cachefiles: fix slab-use-after-free in cachefiles_ondemand_get_fd()
Baokun Li libaokun1@huawei.com cachefiles: remove requests from xarray during flushing requests
Baokun Li libaokun1@huawei.com cachefiles: add output string to cachefiles_obj_[get|put]_ondemand_fd
Li Zhijian lizhijian@fujitsu.com cxl/region: Fix memregion leaks in devm_cxl_add_region()
Dave Jiang dave.jiang@intel.com cxl/test: Add missing vmalloc.h for tools/testing/cxl/test/mem.c
Chen Ni nichen@iscas.ac.cn HID: nvidia-shield: Add missing check for input_ff_create_memless
Michael Ellerman mpe@ellerman.id.au powerpc/uaccess: Fix build errors seen with GCC 13/14
Hari Bathini hbathini@linux.ibm.com powerpc/85xx: fix compile error without CONFIG_CRASH_DUMP
Ziwei Xiao ziweixiao@google.com gve: Clear napi->skb before dev_kfree_skb_any()
Martin K. Petersen martin.petersen@oracle.com scsi: sd: Use READ(16) when reading block zero on large capacity disks
Breno Leitao leitao@debian.org scsi: mpt3sas: Avoid test/set_bit() operating in non-allocated memory
Damien Le Moal dlemoal@kernel.org scsi: mpi3mr: Fix ATA NCQ priority support
Damien Le Moal dlemoal@kernel.org scsi: core: Disable CDL by default
Damien Le Moal dlemoal@kernel.org ata: libata-scsi: Set the RMB bit only for removable media devices
Aapo Vienamo aapo.vienamo@linux.intel.com thunderbolt: debugfs: Fix margin debugfs node creation condition
Kuangyi Chiang ki.chiang65@gmail.com xhci: Apply broken streams quirk to Etron EJ188 xHCI host
Hector Martin marcan@marcan.st xhci: Handle TD clearing for multiple streams case
Kuangyi Chiang ki.chiang65@gmail.com xhci: Apply reset resume quirk to Etron EJ188 xHCI host
Mathias Nyman mathias.nyman@linux.intel.com xhci: Set correct transferred length for cancelled bulk transfers
Greg Kroah-Hartman gregkh@linuxfoundation.org jfs: xattr: fix buffer overflow for invalid xattr
Mickaël Salaün mic@digikod.net landlock: Fix d_parent walk
Douglas Anderson dianders@chromium.org serial: port: Don't block system suspend even if bytes are left to xmit
Doug Brown doug@schmorgal.com serial: 8250_pxa: Configure tx_loadsz to match FIFO IRQ level
Ilpo Järvinen ilpo.jarvinen@linux.intel.com tty: n_tty: Fix buffer offsets when lookahead is used
Wentong Wu wentong.wu@intel.com mei: vsc: Don't stop/restart mei device during system suspend/resume
Tomas Winkler tomas.winkler@intel.com mei: me: release irq in mei_me_pci_resume error path
Kyle Tso kyletso@google.com usb: typec: tcpm: Ignore received Hard Reset in TOGGLING state
Amit Sunil Dhamne amitsd@google.com usb: typec: tcpm: fix use-after-free case in tcpm_register_source_caps
John Ernberg john.ernberg@actia.se USB: xen-hcd: Traverse host/ when CONFIG_USB_XEN_HCD is selected
Andrey Konovalov andreyknvl@gmail.com kcov, usb: disable interrupts in kcov_remote_start_usb_softirq
Alan Stern stern@rowland.harvard.edu USB: class: cdc-wdm: Fix CPU lockup caused by excessive log messages
Pavel Begunkov asml.silence@gmail.com io_uring: fix cancellation overwriting req->flags
Pavel Begunkov asml.silence@gmail.com io_uring/rsrc: don't lock while !TASK_RUNNING
Greg Kroah-Hartman gregkh@linuxfoundation.org .editorconfig: remove trim_trailing_whitespace option
Steven Rostedt (Google) rostedt@goodmis.org eventfs: Update all the eventfs_inodes from the events descriptor
Dev Jain dev.jain@arm.com selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation
Nathan Chancellor nathan@kernel.org selftests/mm: ksft_exit functions do not return
Dave Hansen dave.hansen@linux.intel.com x86/cpu: Provide default cache line size if not enumerated
Borislav Petkov (AMD) bp@alien8.de x86/cpu: Get rid of an unnecessary local variable in get_cpu_address_sizes()
Matthew Brost matthew.brost@intel.com drm/xe: Use ordered WQ for G2H handler
Su Hui suhui@nfschina.com net: ethtool: fix the error condition in ethtool_get_phy_stats_ethtool()
Eric Dumazet edumazet@google.com ipv6: fix possible race in __fib6_drop_pcpu_from()
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-race of sk->sk_shutdown in sk_diag_fill().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Use skb_queue_len_lockless() in sk_diag_show_rqlen().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Use skb_queue_empty_lockless() in unix_release_sock().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Use unix_recvq_full_lockless() in unix_stream_connect().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-race of net->unx.sysctl_max_dgram_qlen.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-races around sk->sk_sndbuf.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-races around sk->sk_state in UNIX_DIAG.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-race of sk->sk_state in unix_stream_read_skb().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-races around sk->sk_state in sendmsg() and recvmsg().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-race of sk->sk_state in unix_stream_connect().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-races around sk->sk_state in unix_write_space() and poll().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annotate data-race of sk->sk_state in unix_inq_len().
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Annodate data-races around sk->sk_state for writers.
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Set sk->sk_state under unix_state_lock() for truly disconencted peer.
Aleksandr Mishin amishin@t-argos.ru net: wwan: iosm: Fix tainted pointer delete is case of region creation fail
Sasha Neftin sasha.neftin@intel.com igc: Fix Energy Efficient Ethernet support declaration
Larysa Zaremba larysa.zaremba@intel.com ice: map XDP queues to vectors in ice_vsi_map_rings_to_vectors()
Larysa Zaremba larysa.zaremba@intel.com ice: add flag to distinguish reset from .ndo_bpf in XDP rings config
Larysa Zaremba larysa.zaremba@intel.com ice: remove af_xdp_zc_qps bitmap
Jacob Keller jacob.e.keller@intel.com ice: fix reads from NVM Shadow RAM on E830 and E825-C devices
Jacob Keller jacob.e.keller@intel.com ice: fix iteration of TLVs in Preserved Fields Area
Karol Kolacinski karol.kolacinski@intel.com ptp: Fix error message on failed pin verification
Eric Dumazet edumazet@google.com net/sched: taprio: always validate TCA_TAPRIO_ATTR_PRIOMAP
Aleksandr Mishin amishin@t-argos.ru net/mlx5: Fix tainted pointer delete is case of flow rules creation fail
Shay Drory shayd@nvidia.com net/mlx5: Always stop health timer during driver removal
Moshe Shemesh moshe@nvidia.com net/mlx5: Stop waiting for PCI if pci channel is offline
Frank Wunderlich frank-w@public-files.de net: ethernet: mtk_eth_soc: handle dma buffer size soc specific
Jakub Kicinski kuba@kernel.org rtnetlink: make the "split" NLM_DONE handling generic
Jason Xing kernelxing@tencent.com mptcp: count CLOSE-WAIT sockets for MPTCP_MIB_CURRESTAB
Jason Xing kernelxing@tencent.com tcp: count CLOSE-WAIT sockets for TCP_MIB_CURRESTAB
Hangyu Hua hbh25y@gmail.com net: sched: sch_multiq: fix possible OOB write in multiq_tune()
Taehee Yoo ap420073@gmail.com ionic: fix kernel panic in XDP_TX action
Tristram Ha tristram.ha@microchip.com net: phy: Micrel KSZ8061: fix errata solution not taking effect problem
Wen Gu guwen@linux.alibaba.com net/smc: avoid overwriting when adjusting sock bufsizes
Subbaraya Sundeep sbhatta@marvell.com octeontx2-af: Always allocate PF entries from low prioriy zone
Jiri Olsa jolsa@kernel.org bpf: Set run context for rawtp test_run callback
Jakub Kicinski kuba@kernel.org net: tls: fix marking packets as decrypted
Eric Dumazet edumazet@google.com ipv6: sr: block BH in seg6_output_core() and seg6_input_core()
Eric Dumazet edumazet@google.com ipv6: ioam: block BH from ioam6_output()
Matthias Stocker mstocker@barracuda.com vmxnet3: disable rx data ring on dma allocation failure
Ravi Bangoria ravi.bangoria@amd.com KVM: SEV-ES: Delegate LBR virtualization to the processor
Ravi Bangoria ravi.bangoria@amd.com KVM: SEV-ES: Disallow SEV-ES guests when X86_FEATURE_LBRV is absent
Cong Wang cong.wang@bytedance.com bpf: Fix a potential use-after-free in bpf_link_free()
Tristram Ha tristram.ha@microchip.com net: phy: micrel: fix KSZ9477 PHY issues after suspend/resume
DelphineCCChiu delphine_cc_chiu@wiwynn.com net/ncsi: Fix the multi thread manner of NCSI driver
Duoming Zhou duoming@zju.edu.cn ax25: Replace kfree() in ax25_dev_free() with ax25_dev_put()
Lars Kellogg-Stedman lars@oddbit.com ax25: Fix refcount imbalance on inbound connections
Heng Qi hengqi@linux.alibaba.com virtio_net: fix possible dim status unrecoverable
Quan Zhou zhouquan@iscas.ac.cn RISC-V: KVM: Fix incorrect reg_subtype labels in kvm_riscv_vcpu_set_reg_isa_ext function
Yong-Xuan Wang yongxuan.wang@sifive.com RISC-V: KVM: No need to use mask when hart-index-bit is 0
Chanwoo Lee cw9316.lee@samsung.com scsi: ufs: mcq: Fix error output and clean up ufshcd_mcq_abort()
Lingbo Kong quic_lingbok@quicinc.com wifi: mac80211: correctly parse Spatial Reuse Parameter Set element
Lingbo Kong quic_lingbok@quicinc.com wifi: mac80211: fix Spatial Reuse element size check
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: mvm: don't read past the mfuart notifcation
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: mvm: check n_ssids before accessing the ssids
Shahar S Matityahu shahar.s.matityahu@intel.com wifi: iwlwifi: dbg_ini: move iwl_dbg_tlv_free outside of debugfs ifdef
Mordechay Goodstein mordechay.goodstein@intel.com wifi: iwlwifi: mvm: set properly mac header
Johannes Berg johannes.berg@intel.com wifi: iwlwifi: mvm: revert gen2 TX A-MPDU size to 64
Miri Korenblit miriam.rachel.korenblit@intel.com wifi: iwlwifi: mvm: don't initialize csa_work twice
Aditya Kumar Singh quic_adisi@quicinc.com wifi: mac80211: pass proper link id for channel switch started notification
Lin Ma linma@zju.edu.cn wifi: cfg80211: pmsr: use correct nla_get_uX functions
Remi Pommarel repk@triplefau.lt wifi: cfg80211: Lock wiphy in cfg80211_get_station
Johannes Berg johannes.berg@intel.com wifi: cfg80211: fully move wiphy work to unbound workqueue
Remi Pommarel repk@triplefau.lt wifi: mac80211: Fix deadlock in ieee80211_sta_ps_deliver_wakeup()
Nicolas Escande nico.escande@gmail.com wifi: mac80211: mesh: Fix leak of mesh_preq_queue objects
Arnd Bergmann arnd@arndb.de cpufreq: amd-pstate: remove global header file
Perry Yuan perry.yuan@amd.com cpufreq: amd-pstate: Add quirk for the pstate CPPC capabilities missing
Perry Yuan perry.yuan@amd.com cpufreq: amd-pstate: Unify computation of {max,min,nominal,lowest_nonlinear}_freq
Baochen Qiang quic_bqiang@quicinc.com wifi: ath11k: move power type check to ASSOC stage when connecting to 6 GHz AP
Carl Huang quic_cjhuang@quicinc.com wifi: ath11k: fix WCN6750 firmware crash caused by 17 num_vdevs
-------------
Diffstat:
.editorconfig | 3 - .../devicetree/bindings/usb/realtek,rts5411.yaml | 1 + MAINTAINERS | 1 - Makefile | 4 +- arch/parisc/include/asm/cacheflush.h | 15 +- arch/parisc/include/asm/pgtable.h | 27 +- arch/parisc/kernel/cache.c | 421 +++++++++++++-------- arch/powerpc/include/asm/uaccess.h | 16 + arch/powerpc/platforms/85xx/smp.c | 9 +- arch/riscv/kvm/aia_device.c | 7 +- arch/riscv/kvm/vcpu_onereg.c | 4 +- arch/riscv/mm/init.c | 24 +- arch/riscv/mm/pageattr.c | 28 +- arch/x86/boot/compressed/Makefile | 4 +- arch/x86/boot/main.c | 4 +- arch/x86/include/asm/alternative.h | 22 +- arch/x86/include/asm/atomic64_32.h | 2 +- arch/x86/include/asm/cpufeature.h | 2 +- arch/x86/include/asm/irq_stack.h | 2 +- arch/x86/include/asm/uaccess.h | 6 +- arch/x86/kernel/amd_nb.c | 9 +- arch/x86/kernel/cpu/common.c | 21 +- arch/x86/kernel/machine_kexec_64.c | 11 +- arch/x86/kvm/svm/sev.c | 19 +- arch/x86/kvm/svm/svm.c | 24 +- arch/x86/kvm/svm/svm.h | 4 +- arch/x86/lib/getuser.S | 6 +- arch/x86/mm/numa.c | 6 +- block/blk-flush.c | 3 +- block/sed-opal.c | 2 +- drivers/acpi/thermal.c | 8 +- drivers/acpi/x86/utils.c | 24 +- drivers/ata/ahci.c | 1 - drivers/ata/libata-core.c | 9 +- drivers/ata/libata-scsi.c | 8 +- drivers/base/core.c | 3 + drivers/block/null_blk/zoned.c | 2 +- drivers/clk/sifive/sifive-prci.c | 8 - drivers/cpufreq/amd-pstate-ut.c | 3 +- drivers/cpufreq/amd-pstate.c | 209 ++++++---- {include/linux => drivers/cpufreq}/amd-pstate.h | 27 -- drivers/cxl/core/region.c | 18 +- drivers/dma-buf/st-dma-fence.c | 6 + drivers/dma/dma-axi-dmac.c | 2 +- drivers/gpio/Kconfig | 2 +- drivers/gpio/gpio-tqmx86.c | 110 ++++-- .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +- .../drm/arm/display/komeda/komeda_pipeline_state.c | 2 +- drivers/gpu/drm/bridge/panel.c | 7 +- drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 +- drivers/gpu/drm/drm_gem_shmem_helper.c | 3 + drivers/gpu/drm/exynos/exynos_drm_vidi.c | 7 +- drivers/gpu/drm/exynos/exynos_hdmi.c | 7 +- drivers/gpu/drm/i915/display/intel_audio.c | 32 +- drivers/gpu/drm/i915/display/intel_audio.h | 1 + .../gpu/drm/i915/display/intel_display_driver.c | 2 + drivers/gpu/drm/i915/display/intel_dp_mst.c | 2 +- drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 +- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 15 +- drivers/gpu/drm/nouveau/dispnv04/disp.c | 2 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 4 +- drivers/gpu/drm/nouveau/nouveau_display.c | 6 +- drivers/gpu/drm/nouveau/nouveau_drv.h | 1 + drivers/gpu/drm/panel/panel-sitronix-st7789v.c | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 7 - drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 - drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 28 +- drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 45 ++- drivers/gpu/drm/xe/xe_gt_idle.c | 9 +- drivers/gpu/drm/xe/xe_guc_ct.c | 4 + drivers/gpu/drm/xe/xe_guc_pc.c | 70 +--- drivers/gpu/drm/xe/xe_guc_submit.c | 1 + drivers/gpu/drm/xe/xe_ring_ops.c | 18 +- drivers/greybus/interface.c | 1 + drivers/hid/hid-core.c | 1 - drivers/hid/hid-logitech-dj.c | 4 +- drivers/hid/hid-nvidia-shield.c | 4 +- drivers/hwtracing/intel_th/pci.c | 25 ++ drivers/i2c/busses/i2c-at91-slave.c | 3 +- drivers/i2c/busses/i2c-designware-slave.c | 2 +- drivers/iio/adc/ad9467.c | 4 +- drivers/iio/adc/adi-axi-adc.c | 5 + .../iio/common/inv_sensors/inv_sensors_timestamp.c | 15 +- drivers/iio/dac/ad5592r-base.c | 2 +- drivers/iio/imu/bmi323/bmi323_core.c | 5 +- drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c | 4 - drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c | 4 - drivers/iio/pressure/bmp280-core.c | 10 +- drivers/iio/temperature/mcp9600.c | 3 +- drivers/iio/temperature/mlx90635.c | 6 +- drivers/iommu/amd/init.c | 9 + drivers/irqchip/irq-gic-v3-its.c | 44 +-- drivers/irqchip/irq-sifive-plic.c | 34 +- drivers/leds/led-class.c | 6 - drivers/md/dm-integrity.c | 1 + drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c | 9 +- drivers/misc/mei/pci-me.c | 4 +- drivers/misc/mei/platform-vsc.c | 39 +- drivers/misc/mei/vsc-fw-loader.c | 2 +- drivers/misc/vmw_vmci/vmci_event.c | 6 +- drivers/net/dsa/qca/qca8k-leds.c | 12 +- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 51 +++ drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c | 2 +- drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 12 +- drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c | 11 +- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 8 +- drivers/net/ethernet/google/gve/gve_tx_dqo.c | 20 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 + drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 2 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 21 +- drivers/net/ethernet/intel/ice/ice.h | 44 ++- drivers/net/ethernet/intel/ice/ice_base.c | 3 + drivers/net/ethernet/intel/ice/ice_lib.c | 29 +- drivers/net/ethernet/intel/ice/ice_main.c | 144 ++++--- drivers/net/ethernet/intel/ice/ice_nvm.c | 116 +++++- drivers/net/ethernet/intel/ice/ice_type.h | 14 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 13 +- drivers/net/ethernet/intel/igc/igc_ethtool.c | 9 +- drivers/net/ethernet/intel/igc/igc_main.c | 4 + .../net/ethernet/marvell/octeontx2/af/rvu_npc.c | 33 +- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 106 ++++-- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 9 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 4 + drivers/net/ethernet/mellanox/mlx5/core/health.c | 8 + .../net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 8 +- .../net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 4 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 + drivers/net/ethernet/pensando/ionic/ionic_lif.c | 4 +- drivers/net/ethernet/pensando/ionic/ionic_txrx.c | 1 + .../ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 4 + drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 25 +- drivers/net/geneve.c | 10 +- drivers/net/netdevsim/netdev.c | 3 +- drivers/net/phy/micrel.c | 104 ++++- drivers/net/phy/sfp.c | 3 +- drivers/net/virtio_net.c | 2 +- drivers/net/vmxnet3/vmxnet3_drv.c | 2 +- drivers/net/wireless/ath/ath11k/core.c | 2 +- drivers/net/wireless/ath/ath11k/mac.c | 38 +- drivers/net/wireless/intel/iwlwifi/fw/api/power.h | 30 ++ drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 2 +- drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 18 +- drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 +- .../net/wireless/intel/iwlwifi/mvm/mld-mac80211.c | 2 - drivers/net/wireless/intel/iwlwifi/mvm/rs.h | 9 +- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 5 +- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 4 +- drivers/net/wireless/mediatek/mt76/mt7615/main.c | 4 + drivers/net/wireless/realtek/rtlwifi/core.c | 15 - drivers/net/wwan/iosm/iosm_ipc_devlink.c | 2 +- drivers/nvme/host/pr.c | 2 +- drivers/nvme/target/passthru.c | 6 +- drivers/pci/controller/pcie-rockchip-ep.c | 6 +- drivers/platform/x86/dell/dell-smbios-base.c | 92 ++--- drivers/pmdomain/ti/ti_sci_pm_domains.c | 20 +- drivers/ptp/ptp_chardev.c | 3 +- drivers/ras/amd/atl/internal.h | 2 +- drivers/ras/amd/atl/system.c | 2 +- drivers/ras/amd/atl/umc.c | 160 +++++--- drivers/remoteproc/ti_k3_r5_remoteproc.c | 58 ++- drivers/scsi/mpi3mr/mpi3mr_app.c | 62 +++ drivers/scsi/mpt3sas/mpt3sas_base.c | 19 + drivers/scsi/mpt3sas/mpt3sas_base.h | 3 - drivers/scsi/mpt3sas/mpt3sas_ctl.c | 4 +- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 23 -- drivers/scsi/scsi.c | 7 + drivers/scsi/scsi_transport_sas.c | 23 ++ drivers/scsi/sd.c | 17 +- drivers/spmi/hisi-spmi-controller.c | 1 - drivers/thermal/thermal_core.c | 13 +- drivers/thunderbolt/debugfs.c | 5 +- drivers/tty/n_tty.c | 22 +- drivers/tty/serial/8250/8250_dw.c | 9 +- drivers/tty/serial/8250/8250_dwlib.c | 3 +- drivers/tty/serial/8250/8250_dwlib.h | 3 +- drivers/tty/serial/8250/8250_pxa.c | 1 + drivers/tty/serial/serial_port.c | 7 + drivers/ufs/core/ufs-mcq.c | 17 +- drivers/ufs/core/ufshcd.c | 6 +- drivers/usb/Makefile | 1 + drivers/usb/class/cdc-wdm.c | 4 +- drivers/usb/core/hcd.c | 12 +- drivers/usb/host/xhci-pci.c | 7 + drivers/usb/host/xhci-ring.c | 59 ++- drivers/usb/host/xhci.h | 1 + drivers/usb/storage/alauda.c | 9 +- drivers/usb/typec/tcpm/tcpm.c | 5 +- fs/btrfs/zoned.c | 13 +- fs/cachefiles/daemon.c | 3 +- fs/cachefiles/internal.h | 5 + fs/cachefiles/ondemand.c | 165 +++++--- fs/jfs/xattr.c | 4 +- fs/nfs/dir.c | 47 ++- fs/nfs/nfs4proc.c | 23 +- fs/nfsd/nfsfh.c | 4 +- fs/ocfs2/file.c | 2 + fs/ocfs2/namei.c | 4 +- fs/proc/vmcore.c | 2 + fs/smb/server/smb2pdu.c | 22 +- fs/smb/server/vfs.c | 17 +- fs/smb/server/vfs.h | 3 +- fs/smb/server/vfs_cache.c | 3 +- fs/tracefs/event_inode.c | 59 ++- include/drm/bridge/aux-bridge.h | 2 +- include/drm/display/drm_dp_mst_helper.h | 1 - include/linux/atomic/atomic-arch-fallback.h | 6 +- include/linux/atomic/atomic-instrumented.h | 8 +- include/linux/atomic/atomic-long.h | 4 +- include/linux/io_uring_types.h | 3 +- include/linux/iommu.h | 2 +- include/linux/kcov.h | 47 ++- include/linux/kexec.h | 6 +- include/linux/pse-pd/pse.h | 4 +- include/net/bluetooth/hci_core.h | 36 +- include/net/ip_tunnels.h | 5 +- include/net/rtnetlink.h | 1 + include/scsi/scsi_transport_sas.h | 2 + include/trace/events/cachefiles.h | 8 +- io_uring/cancel.h | 4 +- io_uring/io-wq.c | 57 +-- io_uring/io_uring.c | 1 + io_uring/rsrc.c | 1 + kernel/bpf/syscall.c | 11 +- kernel/events/core.c | 13 + kernel/gen_kheaders.sh | 2 +- kernel/pid_namespace.c | 1 + kernel/time/tick-common.c | 42 +- mm/memblock.c | 4 + mm/memory-failure.c | 7 + net/ax25/af_ax25.c | 6 + net/ax25/ax25_dev.c | 2 +- net/bluetooth/hci_sync.c | 2 +- net/bluetooth/l2cap_core.c | 12 +- net/bpf/test_run.c | 6 + net/bridge/br_mst.c | 13 +- net/core/rtnetlink.c | 44 ++- net/core/sock_map.c | 16 +- net/ethtool/ioctl.c | 2 +- net/ipv4/devinet.c | 2 +- net/ipv4/fib_frontend.c | 7 +- net/ipv4/tcp.c | 9 +- net/ipv4/tcp_timer.c | 6 +- net/ipv6/ioam6_iptunnel.c | 8 +- net/ipv6/ip6_fib.c | 6 +- net/ipv6/route.c | 5 +- net/ipv6/seg6_iptunnel.c | 14 +- net/ipv6/tcp_ipv6.c | 3 +- net/mac80211/cfg.c | 4 +- net/mac80211/he.c | 10 +- net/mac80211/mesh_pathtbl.c | 13 + net/mac80211/parse.c | 2 +- net/mac80211/sta_info.c | 4 +- net/mptcp/pm_netlink.c | 21 +- net/mptcp/protocol.c | 10 +- net/ncsi/internal.h | 2 + net/ncsi/ncsi-manage.c | 75 ++-- net/ncsi/ncsi-rsp.c | 4 +- net/netfilter/ipset/ip_set_core.c | 93 +++-- net/netfilter/ipset/ip_set_list_set.c | 30 +- net/netfilter/nft_meta.c | 3 + net/netfilter/nft_payload.c | 4 + net/sched/sch_generic.c | 1 + net/sched/sch_multiq.c | 2 +- net/sched/sch_taprio.c | 15 +- net/smc/af_smc.c | 22 +- net/sunrpc/auth_gss/auth_gss.c | 4 +- net/unix/af_unix.c | 108 +++--- net/unix/diag.c | 12 +- net/wireless/core.c | 2 +- net/wireless/pmsr.c | 8 +- net/wireless/scan.c | 3 +- net/wireless/sysfs.c | 4 +- net/wireless/util.c | 7 +- scripts/atomic/kerneldoc/sub_and_test | 2 +- scripts/mod/modpost.c | 5 +- security/integrity/ima/ima_api.c | 16 +- security/integrity/ima/ima_template_lib.c | 17 +- security/landlock/fs.c | 13 +- tools/perf/builtin-script.c | 2 +- tools/perf/util/auxtrace.c | 4 +- tools/testing/cxl/test/mem.c | 1 + tools/testing/selftests/alsa/Makefile | 2 +- .../ftrace/test.d/dynevent/test_duplicates.tc | 2 +- .../ftrace/test.d/filter/event-filter-function.tc | 20 +- .../ftrace/test.d/kprobe/kprobe_eventname.tc | 3 +- .../selftests/futex/functional/futex_requeue_pi.c | 2 +- tools/testing/selftests/mm/compaction_test.c | 77 ++-- tools/testing/selftests/mm/cow.c | 2 +- tools/testing/selftests/mm/gup_longterm.c | 2 +- tools/testing/selftests/mm/gup_test.c | 4 +- tools/testing/selftests/mm/ksm_functional_tests.c | 2 +- tools/testing/selftests/mm/madv_populate.c | 2 +- tools/testing/selftests/mm/mkdirty.c | 2 +- tools/testing/selftests/mm/pagemap_ioctl.c | 4 +- tools/testing/selftests/mm/soft-dirty.c | 2 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 5 +- tools/tracing/rtla/src/timerlat_aa.c | 109 +++--- tools/tracing/rtla/src/timerlat_top.c | 17 +- 299 files changed, 3209 insertions(+), 1761 deletions(-)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carl Huang quic_cjhuang@quicinc.com
[ Upstream commit ed281c6ab6eb8a914f06c74dfeaebde15b34a3f4 ]
WCN6750 firmware crashes because of num_vdevs changed from 4 to 17 in ath11k_init_wmi_config_qca6390() as the ab->hw_params.num_vdevs is 17. This is caused by commit f019f4dff2e4 ("wifi: ath11k: support 2 station interfaces") which assigns ab->hw_params.num_vdevs directly to config->num_vdevs in ath11k_init_wmi_config_qca6390(), therefore WCN6750 firmware crashes as it can't support such a big num_vdevs.
Fix it by assign 3 to num_vdevs in hw_params for WCN6750 as 3 is sufficient too.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3 Tested-on: WCN6750 hw1.0 AHB WLAN.MSL.1.0.1-01371-QCAMSLSWPLZ-1
Fixes: f019f4dff2e4 ("wifi: ath11k: support 2 station interfaces") Reported-by: Luca Weiss luca.weiss@fairphone.com Tested-by: Luca Weiss luca.weiss@fairphone.com Closes: https://lore.kernel.org/r/D15TIIDIIESY.D1EKKJLZINMA@fairphone.com/ Signed-off-by: Carl Huang quic_cjhuang@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://msgid.link/20240520030757.2209395-1-quic_cjhuang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ath/ath11k/core.c b/drivers/net/wireless/ath/ath11k/core.c index c78bce19bd754..5d07585e59c17 100644 --- a/drivers/net/wireless/ath/ath11k/core.c +++ b/drivers/net/wireless/ath/ath11k/core.c @@ -595,7 +595,7 @@ static const struct ath11k_hw_params ath11k_hw_params[] = { .coldboot_cal_ftm = true, .cbcal_restart_fw = false, .fw_mem_mode = 0, - .num_vdevs = 16 + 1, + .num_vdevs = 3, .num_peers = 512, .supports_suspend = false, .hal_desc_sz = sizeof(struct hal_rx_desc_qcn9074),
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baochen Qiang quic_bqiang@quicinc.com
[ Upstream commit 6e16782d6b4a724f9c9dcd49471219643593b60c ]
With commit bc8a0fac8677 ("wifi: mac80211: don't set bss_conf in parsing") ath11k fails to connect to 6 GHz AP.
This is because currently ath11k checks AP's power type in ath11k_mac_op_assign_vif_chanctx() which would be called in AUTH stage. However with above commit power type is not available until ASSOC stage. As a result power type check fails and therefore connection fails.
Fix this by moving power type check to ASSOC stage, also move regulatory rules update there because it depends on power type.
Tested-on: WCN6855 hw2.0 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.30
Fixes: bc8a0fac8677 ("wifi: mac80211: don't set bss_conf in parsing") Signed-off-by: Baochen Qiang quic_bqiang@quicinc.com Acked-by: Jeff Johnson quic_jjohnson@quicinc.com Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://msgid.link/20240424064019.4847-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath11k/mac.c | 38 ++++++++++++++++++--------- 1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/mac.c b/drivers/net/wireless/ath/ath11k/mac.c index 2fca415322aec..790277f547bb4 100644 --- a/drivers/net/wireless/ath/ath11k/mac.c +++ b/drivers/net/wireless/ath/ath11k/mac.c @@ -7889,8 +7889,6 @@ ath11k_mac_op_assign_vif_chanctx(struct ieee80211_hw *hw, struct ath11k_base *ab = ar->ab; struct ath11k_vif *arvif = ath11k_vif_to_arvif(vif); int ret; - struct cur_regulatory_info *reg_info; - enum ieee80211_ap_reg_power power_type;
mutex_lock(&ar->conf_mutex);
@@ -7901,17 +7899,6 @@ ath11k_mac_op_assign_vif_chanctx(struct ieee80211_hw *hw, if (ath11k_wmi_supports_6ghz_cc_ext(ar) && ctx->def.chan->band == NL80211_BAND_6GHZ && arvif->vdev_type == WMI_VDEV_TYPE_STA) { - reg_info = &ab->reg_info_store[ar->pdev_idx]; - power_type = vif->bss_conf.power_type; - - ath11k_dbg(ab, ATH11K_DBG_MAC, "chanctx power type %d\n", power_type); - - if (power_type == IEEE80211_REG_UNSET_AP) { - ret = -EINVAL; - goto out; - } - - ath11k_reg_handle_chan_list(ab, reg_info, power_type); arvif->chanctx = *ctx; ath11k_mac_parse_tx_pwr_env(ar, vif, ctx); } @@ -9525,6 +9512,8 @@ static int ath11k_mac_op_sta_state(struct ieee80211_hw *hw, struct ath11k *ar = hw->priv; struct ath11k_vif *arvif = ath11k_vif_to_arvif(vif); struct ath11k_sta *arsta = ath11k_sta_to_arsta(sta); + enum ieee80211_ap_reg_power power_type; + struct cur_regulatory_info *reg_info; struct ath11k_peer *peer; int ret = 0;
@@ -9604,6 +9593,29 @@ static int ath11k_mac_op_sta_state(struct ieee80211_hw *hw, ath11k_warn(ar->ab, "Unable to authorize peer %pM vdev %d: %d\n", sta->addr, arvif->vdev_id, ret); } + + if (!ret && + ath11k_wmi_supports_6ghz_cc_ext(ar) && + arvif->vdev_type == WMI_VDEV_TYPE_STA && + arvif->chanctx.def.chan && + arvif->chanctx.def.chan->band == NL80211_BAND_6GHZ) { + reg_info = &ar->ab->reg_info_store[ar->pdev_idx]; + power_type = vif->bss_conf.power_type; + + if (power_type == IEEE80211_REG_UNSET_AP) { + ath11k_warn(ar->ab, "invalid power type %d\n", + power_type); + ret = -EINVAL; + } else { + ret = ath11k_reg_handle_chan_list(ar->ab, + reg_info, + power_type); + if (ret) + ath11k_warn(ar->ab, + "failed to handle chan list with power type %d\n", + power_type); + } + } } else if (old_state == IEEE80211_STA_AUTHORIZED && new_state == IEEE80211_STA_ASSOC) { spin_lock_bh(&ar->ab->base_lock);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Perry Yuan perry.yuan@amd.com
[ Upstream commit 5547c0ebfc2efdab6ee93a7fd4d9c411ad87013e ]
Currently the amd_get_{min, max, nominal, lowest_nonlinear}_freq() helpers computes the values of min_freq, max_freq, nominal_freq and lowest_nominal_freq respectively afresh from cppc_get_perf_caps(). This is not necessary as there are fields in cpudata to cache these values.
To simplify this, add a single helper function named amd_pstate_init_freq() which computes all these frequencies at once, and caches it in cpudata.
Use the cached values everywhere else in the code.
Acked-by: Huang Rui ray.huang@amd.com Reviewed-by: Li Meng li.meng@amd.com Tested-by: Dhananjay Ugwekar Dhananjay.Ugwekar@amd.com Co-developed-by: Gautham R. Shenoy gautham.shenoy@amd.com Signed-off-by: Gautham R. Shenoy gautham.shenoy@amd.com Signed-off-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 779b8a14afde ("cpufreq: amd-pstate: remove global header file") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/amd-pstate.c | 126 ++++++++++++++++------------------- 1 file changed, 59 insertions(+), 67 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index e263db0385ab7..e8385ca32f1fc 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -622,74 +622,22 @@ static void amd_pstate_adjust_perf(unsigned int cpu,
static int amd_get_min_freq(struct amd_cpudata *cpudata) { - struct cppc_perf_caps cppc_perf; - - int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); - if (ret) - return ret; - - /* Switch to khz */ - return cppc_perf.lowest_freq * 1000; + return READ_ONCE(cpudata->min_freq); }
static int amd_get_max_freq(struct amd_cpudata *cpudata) { - struct cppc_perf_caps cppc_perf; - u32 max_perf, max_freq, nominal_freq, nominal_perf; - u64 boost_ratio; - - int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); - if (ret) - return ret; - - nominal_freq = cppc_perf.nominal_freq; - nominal_perf = READ_ONCE(cpudata->nominal_perf); - max_perf = READ_ONCE(cpudata->highest_perf); - - boost_ratio = div_u64(max_perf << SCHED_CAPACITY_SHIFT, - nominal_perf); - - max_freq = nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT; - - /* Switch to khz */ - return max_freq * 1000; + return READ_ONCE(cpudata->max_freq); }
static int amd_get_nominal_freq(struct amd_cpudata *cpudata) { - struct cppc_perf_caps cppc_perf; - - int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); - if (ret) - return ret; - - /* Switch to khz */ - return cppc_perf.nominal_freq * 1000; + return READ_ONCE(cpudata->nominal_freq); }
static int amd_get_lowest_nonlinear_freq(struct amd_cpudata *cpudata) { - struct cppc_perf_caps cppc_perf; - u32 lowest_nonlinear_freq, lowest_nonlinear_perf, - nominal_freq, nominal_perf; - u64 lowest_nonlinear_ratio; - - int ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); - if (ret) - return ret; - - nominal_freq = cppc_perf.nominal_freq; - nominal_perf = READ_ONCE(cpudata->nominal_perf); - - lowest_nonlinear_perf = cppc_perf.lowest_nonlinear_perf; - - lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf << SCHED_CAPACITY_SHIFT, - nominal_perf); - - lowest_nonlinear_freq = nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT; - - /* Switch to khz */ - return lowest_nonlinear_freq * 1000; + return READ_ONCE(cpudata->lowest_nonlinear_freq); }
static int amd_pstate_set_boost(struct cpufreq_policy *policy, int state) @@ -844,6 +792,53 @@ static void amd_pstate_update_limits(unsigned int cpu) mutex_unlock(&amd_pstate_driver_lock); }
+/** + * amd_pstate_init_freq: Initialize the max_freq, min_freq, + * nominal_freq and lowest_nonlinear_freq for + * the @cpudata object. + * + * Requires: highest_perf, lowest_perf, nominal_perf and + * lowest_nonlinear_perf members of @cpudata to be + * initialized. + * + * Returns 0 on success, non-zero value on failure. + */ +static int amd_pstate_init_freq(struct amd_cpudata *cpudata) +{ + int ret; + u32 min_freq; + u32 highest_perf, max_freq; + u32 nominal_perf, nominal_freq; + u32 lowest_nonlinear_perf, lowest_nonlinear_freq; + u32 boost_ratio, lowest_nonlinear_ratio; + struct cppc_perf_caps cppc_perf; + + + ret = cppc_get_perf_caps(cpudata->cpu, &cppc_perf); + if (ret) + return ret; + + min_freq = cppc_perf.lowest_freq * 1000; + nominal_freq = cppc_perf.nominal_freq; + nominal_perf = READ_ONCE(cpudata->nominal_perf); + + highest_perf = READ_ONCE(cpudata->highest_perf); + boost_ratio = div_u64(highest_perf << SCHED_CAPACITY_SHIFT, nominal_perf); + max_freq = (nominal_freq * boost_ratio >> SCHED_CAPACITY_SHIFT) * 1000; + + lowest_nonlinear_perf = READ_ONCE(cpudata->lowest_nonlinear_perf); + lowest_nonlinear_ratio = div_u64(lowest_nonlinear_perf << SCHED_CAPACITY_SHIFT, + nominal_perf); + lowest_nonlinear_freq = (nominal_freq * lowest_nonlinear_ratio >> SCHED_CAPACITY_SHIFT) * 1000; + + WRITE_ONCE(cpudata->min_freq, min_freq); + WRITE_ONCE(cpudata->lowest_nonlinear_freq, lowest_nonlinear_freq); + WRITE_ONCE(cpudata->nominal_freq, nominal_freq); + WRITE_ONCE(cpudata->max_freq, max_freq); + + return 0; +} + static int amd_pstate_cpu_init(struct cpufreq_policy *policy) { int min_freq, max_freq, nominal_freq, lowest_nonlinear_freq, ret; @@ -871,6 +866,10 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy) if (ret) goto free_cpudata1;
+ ret = amd_pstate_init_freq(cpudata); + if (ret) + goto free_cpudata1; + min_freq = amd_get_min_freq(cpudata); max_freq = amd_get_max_freq(cpudata); nominal_freq = amd_get_nominal_freq(cpudata); @@ -912,13 +911,8 @@ static int amd_pstate_cpu_init(struct cpufreq_policy *policy) goto free_cpudata2; }
- /* Initial processor data capability frequencies */ - cpudata->max_freq = max_freq; - cpudata->min_freq = min_freq; cpudata->max_limit_freq = max_freq; cpudata->min_limit_freq = min_freq; - cpudata->nominal_freq = nominal_freq; - cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq;
policy->driver_data = cpudata;
@@ -1333,6 +1327,10 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) if (ret) goto free_cpudata1;
+ ret = amd_pstate_init_freq(cpudata); + if (ret) + goto free_cpudata1; + min_freq = amd_get_min_freq(cpudata); max_freq = amd_get_max_freq(cpudata); nominal_freq = amd_get_nominal_freq(cpudata); @@ -1349,12 +1347,6 @@ static int amd_pstate_epp_cpu_init(struct cpufreq_policy *policy) /* It will be updated by governor */ policy->cur = policy->cpuinfo.min_freq;
- /* Initial processor data capability frequencies */ - cpudata->max_freq = max_freq; - cpudata->min_freq = min_freq; - cpudata->nominal_freq = nominal_freq; - cpudata->lowest_nonlinear_freq = lowest_nonlinear_freq; - policy->driver_data = cpudata;
cpudata->epp_cached = amd_pstate_get_epp(cpudata, 0);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Perry Yuan perry.yuan@amd.com
[ Upstream commit eb8b6c36820214df96e7e86d8614d93f6b028f28 ]
Add quirks table to get CPPC capabilities issue fixed by providing correct perf or frequency values while driver loading.
If CPPC capabilities are not defined in the ACPI tables or wrongly defined by platform firmware, it needs to use quick to get those issues fixed with correct workaround values to make pstate driver can be loaded even though there are CPPC capabilities errors.
The workaround will match the broken BIOS which lack of CPPC capabilities nominal_freq and lowest_freq definition in the ACPI table.
$ cat /sys/devices/system/cpu/cpu0/acpi_cppc/lowest_freq 0 $ cat /sys/devices/system/cpu/cpu0/acpi_cppc/nominal_freq 0
Acked-by: Huang Rui ray.huang@amd.com Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Gautham R. Shenoy gautham.shenoy@amd.com Tested-by: Dhananjay Ugwekar Dhananjay.Ugwekar@amd.com Signed-off-by: Perry Yuan perry.yuan@amd.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 779b8a14afde ("cpufreq: amd-pstate: remove global header file") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/amd-pstate.c | 53 ++++++++++++++++++++++++++++++++++-- include/linux/amd-pstate.h | 6 ++++ 2 files changed, 57 insertions(+), 2 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index e8385ca32f1fc..605b037913ff8 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -68,6 +68,7 @@ static struct cpufreq_driver amd_pstate_epp_driver; static int cppc_state = AMD_PSTATE_UNDEFINED; static bool cppc_enabled; static bool amd_pstate_prefcore = true; +static struct quirk_entry *quirks;
/* * AMD Energy Preference Performance (EPP) @@ -112,6 +113,41 @@ static unsigned int epp_values[] = {
typedef int (*cppc_mode_transition_fn)(int);
+static struct quirk_entry quirk_amd_7k62 = { + .nominal_freq = 2600, + .lowest_freq = 550, +}; + +static int __init dmi_matched_7k62_bios_bug(const struct dmi_system_id *dmi) +{ + /** + * match the broken bios for family 17h processor support CPPC V2 + * broken BIOS lack of nominal_freq and lowest_freq capabilities + * definition in ACPI tables + */ + if (boot_cpu_has(X86_FEATURE_ZEN2)) { + quirks = dmi->driver_data; + pr_info("Overriding nominal and lowest frequencies for %s\n", dmi->ident); + return 1; + } + + return 0; +} + +static const struct dmi_system_id amd_pstate_quirks_table[] __initconst = { + { + .callback = dmi_matched_7k62_bios_bug, + .ident = "AMD EPYC 7K62", + .matches = { + DMI_MATCH(DMI_BIOS_VERSION, "5.14"), + DMI_MATCH(DMI_BIOS_RELEASE, "12/12/2019"), + }, + .driver_data = &quirk_amd_7k62, + }, + {} +}; +MODULE_DEVICE_TABLE(dmi, amd_pstate_quirks_table); + static inline int get_mode_idx_from_str(const char *str, size_t size) { int i; @@ -818,8 +854,16 @@ static int amd_pstate_init_freq(struct amd_cpudata *cpudata) if (ret) return ret;
- min_freq = cppc_perf.lowest_freq * 1000; - nominal_freq = cppc_perf.nominal_freq; + if (quirks && quirks->lowest_freq) + min_freq = quirks->lowest_freq * 1000; + else + min_freq = cppc_perf.lowest_freq * 1000; + + if (quirks && quirks->nominal_freq) + nominal_freq = quirks->nominal_freq ; + else + nominal_freq = cppc_perf.nominal_freq; + nominal_perf = READ_ONCE(cpudata->nominal_perf);
highest_perf = READ_ONCE(cpudata->highest_perf); @@ -1664,6 +1708,11 @@ static int __init amd_pstate_init(void) if (cpufreq_get_current_driver()) return -EEXIST;
+ quirks = NULL; + + /* check if this machine need CPPC quirks */ + dmi_check_system(amd_pstate_quirks_table); + switch (cppc_state) { case AMD_PSTATE_UNDEFINED: /* Disable on the following configs by default: diff --git a/include/linux/amd-pstate.h b/include/linux/amd-pstate.h index d21838835abda..7b2cbb892fd91 100644 --- a/include/linux/amd-pstate.h +++ b/include/linux/amd-pstate.h @@ -124,4 +124,10 @@ static const char * const amd_pstate_mode_string[] = { [AMD_PSTATE_GUIDED] = "guided", NULL, }; + +struct quirk_entry { + u32 nominal_freq; + u32 lowest_freq; +}; + #endif /* _LINUX_AMD_PSTATE_H */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 779b8a14afde110dd3502566be907289eba72447 ]
When extra warnings are enabled, gcc points out a global variable definition in a header:
In file included from drivers/cpufreq/amd-pstate-ut.c:29: include/linux/amd-pstate.h:123:27: error: 'amd_pstate_mode_string' defined but not used [-Werror=unused-const-variable=] 123 | static const char * const amd_pstate_mode_string[] = { | ^~~~~~~~~~~~~~~~~~~~~~
This header is only included from two files in the same directory, and one of them uses only a single definition from it, so clean it up by moving most of the contents into the driver that uses them, and making shared bits a local header file.
Fixes: 36c5014e5460 ("cpufreq: amd-pstate: optimize driver working mode selection in amd_pstate_param()") Signed-off-by: Arnd Bergmann arnd@arndb.de Acked-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- MAINTAINERS | 1 - drivers/cpufreq/amd-pstate-ut.c | 3 +- drivers/cpufreq/amd-pstate.c | 34 ++++++++++++++++++- .../linux => drivers/cpufreq}/amd-pstate.h | 33 ------------------ 4 files changed, 35 insertions(+), 36 deletions(-) rename {include/linux => drivers/cpufreq}/amd-pstate.h (81%)
diff --git a/MAINTAINERS b/MAINTAINERS index 28e20975c26f5..3121709d99e3b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1066,7 +1066,6 @@ L: linux-pm@vger.kernel.org S: Supported F: Documentation/admin-guide/pm/amd-pstate.rst F: drivers/cpufreq/amd-pstate* -F: include/linux/amd-pstate.h F: tools/power/x86/amd_pstate_tracer/amd_pstate_trace.py
AMD PTDMA DRIVER diff --git a/drivers/cpufreq/amd-pstate-ut.c b/drivers/cpufreq/amd-pstate-ut.c index f04ae67dda372..fc275d41d51e9 100644 --- a/drivers/cpufreq/amd-pstate-ut.c +++ b/drivers/cpufreq/amd-pstate-ut.c @@ -26,10 +26,11 @@ #include <linux/module.h> #include <linux/moduleparam.h> #include <linux/fs.h> -#include <linux/amd-pstate.h>
#include <acpi/cppc_acpi.h>
+#include "amd-pstate.h" + /* * Abbreviations: * amd_pstate_ut: used as a shortform for AMD P-State unit test. diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 605b037913ff8..6c989d859b396 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -36,7 +36,6 @@ #include <linux/delay.h> #include <linux/uaccess.h> #include <linux/static_call.h> -#include <linux/amd-pstate.h> #include <linux/topology.h>
#include <acpi/processor.h> @@ -46,6 +45,8 @@ #include <asm/processor.h> #include <asm/cpufeature.h> #include <asm/cpu_device_id.h> + +#include "amd-pstate.h" #include "amd-pstate-trace.h"
#define AMD_PSTATE_TRANSITION_LATENCY 20000 @@ -53,6 +54,37 @@ #define CPPC_HIGHEST_PERF_PERFORMANCE 196 #define CPPC_HIGHEST_PERF_DEFAULT 166
+#define AMD_CPPC_EPP_PERFORMANCE 0x00 +#define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80 +#define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF +#define AMD_CPPC_EPP_POWERSAVE 0xFF + +/* + * enum amd_pstate_mode - driver working mode of amd pstate + */ +enum amd_pstate_mode { + AMD_PSTATE_UNDEFINED = 0, + AMD_PSTATE_DISABLE, + AMD_PSTATE_PASSIVE, + AMD_PSTATE_ACTIVE, + AMD_PSTATE_GUIDED, + AMD_PSTATE_MAX, +}; + +static const char * const amd_pstate_mode_string[] = { + [AMD_PSTATE_UNDEFINED] = "undefined", + [AMD_PSTATE_DISABLE] = "disable", + [AMD_PSTATE_PASSIVE] = "passive", + [AMD_PSTATE_ACTIVE] = "active", + [AMD_PSTATE_GUIDED] = "guided", + NULL, +}; + +struct quirk_entry { + u32 nominal_freq; + u32 lowest_freq; +}; + /* * TODO: We need more time to fine tune processors with shared memory solution * with community together. diff --git a/include/linux/amd-pstate.h b/drivers/cpufreq/amd-pstate.h similarity index 81% rename from include/linux/amd-pstate.h rename to drivers/cpufreq/amd-pstate.h index 7b2cbb892fd91..bc341f35908d7 100644 --- a/include/linux/amd-pstate.h +++ b/drivers/cpufreq/amd-pstate.h @@ -1,7 +1,5 @@ /* SPDX-License-Identifier: GPL-2.0-only */ /* - * linux/include/linux/amd-pstate.h - * * Copyright (C) 2022 Advanced Micro Devices, Inc. * * Author: Meng Li li.meng@amd.com @@ -12,11 +10,6 @@
#include <linux/pm_qos.h>
-#define AMD_CPPC_EPP_PERFORMANCE 0x00 -#define AMD_CPPC_EPP_BALANCE_PERFORMANCE 0x80 -#define AMD_CPPC_EPP_BALANCE_POWERSAVE 0xBF -#define AMD_CPPC_EPP_POWERSAVE 0xFF - /********************************************************************* * AMD P-state INTERFACE * *********************************************************************/ @@ -104,30 +97,4 @@ struct amd_cpudata { bool suspended; };
-/* - * enum amd_pstate_mode - driver working mode of amd pstate - */ -enum amd_pstate_mode { - AMD_PSTATE_UNDEFINED = 0, - AMD_PSTATE_DISABLE, - AMD_PSTATE_PASSIVE, - AMD_PSTATE_ACTIVE, - AMD_PSTATE_GUIDED, - AMD_PSTATE_MAX, -}; - -static const char * const amd_pstate_mode_string[] = { - [AMD_PSTATE_UNDEFINED] = "undefined", - [AMD_PSTATE_DISABLE] = "disable", - [AMD_PSTATE_PASSIVE] = "passive", - [AMD_PSTATE_ACTIVE] = "active", - [AMD_PSTATE_GUIDED] = "guided", - NULL, -}; - -struct quirk_entry { - u32 nominal_freq; - u32 lowest_freq; -}; - #endif /* _LINUX_AMD_PSTATE_H */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolas Escande nico.escande@gmail.com
[ Upstream commit b7d7f11a291830fdf69d3301075dd0fb347ced84 ]
The hwmp code use objects of type mesh_preq_queue, added to a list in ieee80211_if_mesh, to keep track of mpath we need to resolve. If the mpath gets deleted, ex mesh interface is removed, the entries in that list will never get cleaned. Fix this by flushing all corresponding items of the preq_queue in mesh_path_flush_pending().
This should take care of KASAN reports like this:
unreferenced object 0xffff00000668d800 (size 128): comm "kworker/u8:4", pid 67, jiffies 4295419552 (age 1836.444s) hex dump (first 32 bytes): 00 1f 05 09 00 00 ff ff 00 d5 68 06 00 00 ff ff ..........h..... 8e 97 ea eb 3e b8 01 00 00 00 00 00 00 00 00 00 ....>........... backtrace: [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c [<00000000049bd418>] kmalloc_trace+0x34/0x80 [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8 [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4 [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764 [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4 [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440 [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4 [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508 [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c [<00000000b36425d1>] worker_thread+0x9c/0x634 [<0000000005852dd5>] kthread+0x1bc/0x1c4 [<000000005fccd770>] ret_from_fork+0x10/0x20 unreferenced object 0xffff000009051f00 (size 128): comm "kworker/u8:4", pid 67, jiffies 4295419553 (age 1836.440s) hex dump (first 32 bytes): 90 d6 92 0d 00 00 ff ff 00 d8 68 06 00 00 ff ff ..........h..... 36 27 92 e4 02 e0 01 00 00 58 79 06 00 00 ff ff 6'.......Xy..... backtrace: [<000000007302a0b6>] __kmem_cache_alloc_node+0x1e0/0x35c [<00000000049bd418>] kmalloc_trace+0x34/0x80 [<0000000000d792bb>] mesh_queue_preq+0x44/0x2a8 [<00000000c99c3696>] mesh_nexthop_resolve+0x198/0x19c [<00000000926bf598>] ieee80211_xmit+0x1d0/0x1f4 [<00000000fc8c2284>] __ieee80211_subif_start_xmit+0x30c/0x764 [<000000005926ee38>] ieee80211_subif_start_xmit+0x9c/0x7a4 [<000000004c86e916>] dev_hard_start_xmit+0x174/0x440 [<0000000023495647>] __dev_queue_xmit+0xe24/0x111c [<00000000cfe9ca78>] batadv_send_skb_packet+0x180/0x1e4 [<000000007bacc5d5>] batadv_v_elp_periodic_work+0x2f4/0x508 [<00000000adc3cd94>] process_one_work+0x4b8/0xa1c [<00000000b36425d1>] worker_thread+0x9c/0x634 [<0000000005852dd5>] kthread+0x1bc/0x1c4 [<000000005fccd770>] ret_from_fork+0x10/0x20
Fixes: 050ac52cbe1f ("mac80211: code for on-demand Hybrid Wireless Mesh Protocol") Signed-off-by: Nicolas Escande nico.escande@gmail.com Link: https://msgid.link/20240528142605.1060566-1-nico.escande@gmail.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/mesh_pathtbl.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/net/mac80211/mesh_pathtbl.c b/net/mac80211/mesh_pathtbl.c index a6b62169f0848..c0a5c75cddcb9 100644 --- a/net/mac80211/mesh_pathtbl.c +++ b/net/mac80211/mesh_pathtbl.c @@ -1017,10 +1017,23 @@ void mesh_path_discard_frame(struct ieee80211_sub_if_data *sdata, */ void mesh_path_flush_pending(struct mesh_path *mpath) { + struct ieee80211_sub_if_data *sdata = mpath->sdata; + struct ieee80211_if_mesh *ifmsh = &sdata->u.mesh; + struct mesh_preq_queue *preq, *tmp; struct sk_buff *skb;
while ((skb = skb_dequeue(&mpath->frame_queue)) != NULL) mesh_path_discard_frame(mpath->sdata, skb); + + spin_lock_bh(&ifmsh->mesh_preq_queue_lock); + list_for_each_entry_safe(preq, tmp, &ifmsh->preq_queue.list, list) { + if (ether_addr_equal(mpath->dst, preq->dst)) { + list_del(&preq->list); + kfree(preq); + --ifmsh->preq_queue_len; + } + } + spin_unlock_bh(&ifmsh->mesh_preq_queue_lock); }
/**
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
[ Upstream commit 44c06bbde6443de206b30f513100b5670b23fc5e ]
The ieee80211_sta_ps_deliver_wakeup() function takes sta->ps_lock to synchronizes with ieee80211_tx_h_unicast_ps_buf() which is called from softirq context. However using only spin_lock() to get sta->ps_lock in ieee80211_sta_ps_deliver_wakeup() does not prevent softirq to execute on this same CPU, to run ieee80211_tx_h_unicast_ps_buf() and try to take this same lock ending in deadlock. Below is an example of rcu stall that arises in such situation.
rcu: INFO: rcu_sched self-detected stall on CPU rcu: 2-....: (42413413 ticks this GP) idle=b154/1/0x4000000000000000 softirq=1763/1765 fqs=21206996 rcu: (t=42586894 jiffies g=2057 q=362405 ncpus=4) CPU: 2 PID: 719 Comm: wpa_supplicant Tainted: G W 6.4.0-02158-g1b062f552873 #742 Hardware name: RPT (r1) (DT) pstate: 00000005 (nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : queued_spin_lock_slowpath+0x58/0x2d0 lr : invoke_tx_handlers_early+0x5b4/0x5c0 sp : ffff00001ef64660 x29: ffff00001ef64660 x28: ffff000009bc1070 x27: ffff000009bc0ad8 x26: ffff000009bc0900 x25: ffff00001ef647a8 x24: 0000000000000000 x23: ffff000009bc0900 x22: ffff000009bc0900 x21: ffff00000ac0e000 x20: ffff00000a279e00 x19: ffff00001ef646e8 x18: 0000000000000000 x17: ffff800016468000 x16: ffff00001ef608c0 x15: 0010533c93f64f80 x14: 0010395c9faa3946 x13: 0000000000000000 x12: 00000000fa83b2da x11: 000000012edeceea x10: ffff0000010fbe00 x9 : 0000000000895440 x8 : 000000000010533c x7 : ffff00000ad8b740 x6 : ffff00000c350880 x5 : 0000000000000007 x4 : 0000000000000001 x3 : 0000000000000000 x2 : 0000000000000000 x1 : 0000000000000001 x0 : ffff00000ac0e0e8 Call trace: queued_spin_lock_slowpath+0x58/0x2d0 ieee80211_tx+0x80/0x12c ieee80211_tx_pending+0x110/0x278 tasklet_action_common.constprop.0+0x10c/0x144 tasklet_action+0x20/0x28 _stext+0x11c/0x284 ____do_softirq+0xc/0x14 call_on_irq_stack+0x24/0x34 do_softirq_own_stack+0x18/0x20 do_softirq+0x74/0x7c __local_bh_enable_ip+0xa0/0xa4 _ieee80211_wake_txqs+0x3b0/0x4b8 __ieee80211_wake_queue+0x12c/0x168 ieee80211_add_pending_skbs+0xec/0x138 ieee80211_sta_ps_deliver_wakeup+0x2a4/0x480 ieee80211_mps_sta_status_update.part.0+0xd8/0x11c ieee80211_mps_sta_status_update+0x18/0x24 sta_apply_parameters+0x3bc/0x4c0 ieee80211_change_station+0x1b8/0x2dc nl80211_set_station+0x444/0x49c genl_family_rcv_msg_doit.isra.0+0xa4/0xfc genl_rcv_msg+0x1b0/0x244 netlink_rcv_skb+0x38/0x10c genl_rcv+0x34/0x48 netlink_unicast+0x254/0x2bc netlink_sendmsg+0x190/0x3b4 ____sys_sendmsg+0x1e8/0x218 ___sys_sendmsg+0x68/0x8c __sys_sendmsg+0x44/0x84 __arm64_sys_sendmsg+0x20/0x28 do_el0_svc+0x6c/0xe8 el0_svc+0x14/0x48 el0t_64_sync_handler+0xb0/0xb4 el0t_64_sync+0x14c/0x150
Using spin_lock_bh()/spin_unlock_bh() instead prevents softirq to raise on the same CPU that is holding the lock.
Fixes: 1d147bfa6429 ("mac80211: fix AP powersave TX vs. wakeup race") Signed-off-by: Remi Pommarel repk@triplefau.lt Link: https://msgid.link/8e36fe07d0fbc146f89196cd47a53c8a0afe84aa.1716910344.git.r... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/sta_info.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index da5fdd6f5c852..aa22f09e6d145 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -1724,7 +1724,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta) skb_queue_head_init(&pending);
/* sync with ieee80211_tx_h_unicast_ps_buf */ - spin_lock(&sta->ps_lock); + spin_lock_bh(&sta->ps_lock); /* Send all buffered frames to the station */ for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) { int count = skb_queue_len(&pending), tmp; @@ -1753,7 +1753,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta) */ clear_sta_flag(sta, WLAN_STA_PSPOLL); clear_sta_flag(sta, WLAN_STA_UAPSD); - spin_unlock(&sta->ps_lock); + spin_unlock_bh(&sta->ps_lock);
atomic_dec(&ps->num_sta_ps);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit e296c95eac655008d5a709b8cf54d0018da1c916 ]
Previously I had moved the wiphy work to the unbound system workqueue, but missed that when it restarts and during resume it was still using the normal system workqueue. Fix that.
Fixes: 91d20ab9d9ca ("wifi: cfg80211: use system_unbound_wq for wiphy work") Reviewed-by: Miriam Rachel Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240522124126.7ca959f2cbd3.I3e2a71ef445d167b84000ccf934e... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/core.c | 2 +- net/wireless/sysfs.c | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/wireless/core.c b/net/wireless/core.c index 3fb1b637352a9..4b1f45e3070e0 100644 --- a/net/wireless/core.c +++ b/net/wireless/core.c @@ -431,7 +431,7 @@ static void cfg80211_wiphy_work(struct work_struct *work) if (wk) { list_del_init(&wk->entry); if (!list_empty(&rdev->wiphy_work_list)) - schedule_work(work); + queue_work(system_unbound_wq, work); spin_unlock_irq(&rdev->wiphy_work_lock);
wk->func(&rdev->wiphy, wk); diff --git a/net/wireless/sysfs.c b/net/wireless/sysfs.c index 565511a3f461e..62f26618f6747 100644 --- a/net/wireless/sysfs.c +++ b/net/wireless/sysfs.c @@ -5,7 +5,7 @@ * * Copyright 2005-2006 Jiri Benc jbenc@suse.cz * Copyright 2006 Johannes Berg johannes@sipsolutions.net - * Copyright (C) 2020-2021, 2023 Intel Corporation + * Copyright (C) 2020-2021, 2023-2024 Intel Corporation */
#include <linux/device.h> @@ -137,7 +137,7 @@ static int wiphy_resume(struct device *dev) if (rdev->wiphy.registered && rdev->ops->resume) ret = rdev_resume(rdev); rdev->suspended = false; - schedule_work(&rdev->wiphy_work); + queue_work(system_unbound_wq, &rdev->wiphy_work); wiphy_unlock(&rdev->wiphy);
if (ret)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Remi Pommarel repk@triplefau.lt
[ Upstream commit 642f89daa34567d02f312d03e41523a894906dae ]
Wiphy should be locked before calling rdev_get_station() (see lockdep assert in ieee80211_get_station()).
This fixes the following kernel NULL dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000050 Mem abort info: ESR = 0x0000000096000006 EC = 0x25: DABT (current EL), IL = 32 bits SET = 0, FnV = 0 EA = 0, S1PTW = 0 FSC = 0x06: level 2 translation fault Data abort info: ISV = 0, ISS = 0x00000006 CM = 0, WnR = 0 user pgtable: 4k pages, 48-bit VAs, pgdp=0000000003001000 [0000000000000050] pgd=0800000002dca003, p4d=0800000002dca003, pud=08000000028e9003, pmd=0000000000000000 Internal error: Oops: 0000000096000006 [#1] SMP Modules linked in: netconsole dwc3_meson_g12a dwc3_of_simple dwc3 ip_gre gre ath10k_pci ath10k_core ath9k ath9k_common ath9k_hw ath CPU: 0 PID: 1091 Comm: kworker/u8:0 Not tainted 6.4.0-02144-g565f9a3a7911-dirty #705 Hardware name: RPT (r1) (DT) Workqueue: bat_events batadv_v_elp_throughput_metric_update pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : ath10k_sta_statistics+0x10/0x2dc [ath10k_core] lr : sta_set_sinfo+0xcc/0xbd4 sp : ffff000007b43ad0 x29: ffff000007b43ad0 x28: ffff0000071fa900 x27: ffff00000294ca98 x26: ffff000006830880 x25: ffff000006830880 x24: ffff00000294c000 x23: 0000000000000001 x22: ffff000007b43c90 x21: ffff800008898acc x20: ffff00000294c6e8 x19: ffff000007b43c90 x18: 0000000000000000 x17: 445946354d552d78 x16: 62661f7200000000 x15: 57464f445946354d x14: 0000000000000000 x13: 00000000000000e3 x12: d5f0acbcebea978e x11: 00000000000000e3 x10: 000000010048fe41 x9 : 0000000000000000 x8 : ffff000007b43d90 x7 : 000000007a1e2125 x6 : 0000000000000000 x5 : ffff0000024e0900 x4 : ffff800000a0250c x3 : ffff000007b43c90 x2 : ffff00000294ca98 x1 : ffff000006831920 x0 : 0000000000000000 Call trace: ath10k_sta_statistics+0x10/0x2dc [ath10k_core] sta_set_sinfo+0xcc/0xbd4 ieee80211_get_station+0x2c/0x44 cfg80211_get_station+0x80/0x154 batadv_v_elp_get_throughput+0x138/0x1fc batadv_v_elp_throughput_metric_update+0x1c/0xa4 process_one_work+0x1ec/0x414 worker_thread+0x70/0x46c kthread+0xdc/0xe0 ret_from_fork+0x10/0x20 Code: a9bb7bfd 910003fd a90153f3 f9411c40 (f9402814)
This happens because STA has time to disconnect and reconnect before batadv_v_elp_throughput_metric_update() delayed work gets scheduled. In this situation, ath10k_sta_state() can be in the middle of resetting arsta data when the work queue get chance to be scheduled and ends up accessing it. Locking wiphy prevents that.
Fixes: 7406353d43c8 ("cfg80211: implement cfg80211_get_station cfg80211 API") Signed-off-by: Remi Pommarel repk@triplefau.lt Reviewed-by: Nicolas Escande nico.escande@gmail.com Acked-by: Antonio Quartulli a@unstable.cc Link: https://msgid.link/983b24a6a176e0800c01aedcd74480d9b551cb13.1716046653.git.r... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/util.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/wireless/util.c b/net/wireless/util.c index 2bde8a3546313..082c6f9c5416e 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -2549,6 +2549,7 @@ int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr, { struct cfg80211_registered_device *rdev; struct wireless_dev *wdev; + int ret;
wdev = dev->ieee80211_ptr; if (!wdev) @@ -2560,7 +2561,11 @@ int cfg80211_get_station(struct net_device *dev, const u8 *mac_addr,
memset(sinfo, 0, sizeof(*sinfo));
- return rdev_get_station(rdev, dev, mac_addr, sinfo); + wiphy_lock(&rdev->wiphy); + ret = rdev_get_station(rdev, dev, mac_addr, sinfo); + wiphy_unlock(&rdev->wiphy); + + return ret; } EXPORT_SYMBOL(cfg80211_get_station);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lin Ma linma@zju.edu.cn
[ Upstream commit ab904521f4de52fef4f179d2dfc1877645ef5f5c ]
The commit 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") defines four attributes NL80211_PMSR_FTM_REQ_ATTR_ {NUM_BURSTS_EXP}/{BURST_PERIOD}/{BURST_DURATION}/{FTMS_PER_BURST} in following ways.
static const struct nla_policy nl80211_pmsr_ftm_req_attr_policy[NL80211_PMSR_FTM_REQ_ATTR_MAX + 1] = { ... [NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP] = NLA_POLICY_MAX(NLA_U8, 15), [NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD] = { .type = NLA_U16 }, [NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION] = NLA_POLICY_MAX(NLA_U8, 15), [NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST] = NLA_POLICY_MAX(NLA_U8, 31), ... };
That is, those attributes are expected to be NLA_U8 and NLA_U16 types. However, the consumers of these attributes in `pmsr_parse_ftm` blindly all use `nla_get_u32`, which is incorrect and causes functionality issues on little-endian platforms. Hence, fix them with the correct `nla_get_u8` and `nla_get_u16` functions.
Fixes: 9bb7e0f24e7e ("cfg80211: add peer measurement with FTM initiator API") Signed-off-by: Lin Ma linma@zju.edu.cn Link: https://msgid.link/20240521075059.47999-1-linma@zju.edu.cn Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/wireless/pmsr.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/wireless/pmsr.c b/net/wireless/pmsr.c index e106dcea39778..c569c37da3175 100644 --- a/net/wireless/pmsr.c +++ b/net/wireless/pmsr.c @@ -56,7 +56,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.burst_period = 0; if (tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD]) out->ftm.burst_period = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD]); + nla_get_u16(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_PERIOD]);
out->ftm.asap = !!tb[NL80211_PMSR_FTM_REQ_ATTR_ASAP]; if (out->ftm.asap && !capa->ftm.asap) { @@ -75,7 +75,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.num_bursts_exp = 0; if (tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP]) out->ftm.num_bursts_exp = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP]); + nla_get_u8(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_BURSTS_EXP]);
if (capa->ftm.max_bursts_exponent >= 0 && out->ftm.num_bursts_exp > capa->ftm.max_bursts_exponent) { @@ -88,7 +88,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.burst_duration = 15; if (tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION]) out->ftm.burst_duration = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION]); + nla_get_u8(tb[NL80211_PMSR_FTM_REQ_ATTR_BURST_DURATION]);
out->ftm.ftms_per_burst = 0; if (tb[NL80211_PMSR_FTM_REQ_ATTR_FTMS_PER_BURST]) @@ -107,7 +107,7 @@ static int pmsr_parse_ftm(struct cfg80211_registered_device *rdev, out->ftm.ftmr_retries = 3; if (tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_FTMR_RETRIES]) out->ftm.ftmr_retries = - nla_get_u32(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_FTMR_RETRIES]); + nla_get_u8(tb[NL80211_PMSR_FTM_REQ_ATTR_NUM_FTMR_RETRIES]);
out->ftm.request_lci = !!tb[NL80211_PMSR_FTM_REQ_ATTR_REQUEST_LCI]; if (out->ftm.request_lci && !capa->ftm.request_lci) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aditya Kumar Singh quic_adisi@quicinc.com
[ Upstream commit 8ecc4d7a7cd3e9704b63b8e4f6cd8b6b7314210f ]
Original changes[1] posted is having proper changes. However, at the same time, there was chandef puncturing changes which had a conflict with this. While applying, two errors crept in - a) Whitespace error. b) Link ID being passed to channel switch started notifier function is 0. However proper link ID is present in the function.
Fix these now.
[1] https://lore.kernel.org/all/20240130140918.1172387-5-quic_adisi@quicinc.com/
Fixes: 1a96bb4e8a79 ("wifi: mac80211: start and finalize channel switch on link basis") Signed-off-by: Aditya Kumar Singh quic_adisi@quicinc.com Link: https://msgid.link/20240509032555.263933-1-quic_adisi@quicinc.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/cfg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index 07abaf7820c56..51dc2d9dd6b84 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -4012,7 +4012,7 @@ __ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, goto out; }
- link_data->csa_chanreq = chanreq; + link_data->csa_chanreq = chanreq; link_conf->csa_active = true;
if (params->block_tx && @@ -4023,7 +4023,7 @@ __ieee80211_channel_switch(struct wiphy *wiphy, struct net_device *dev, }
cfg80211_ch_switch_started_notify(sdata->dev, - &link_data->csa_chanreq.oper, 0, + &link_data->csa_chanreq.oper, link_id, params->count, params->block_tx);
if (changed) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit 92158790ce4391ce4c35d8dfbce759195e4724cb ]
The initialization of this worker moved to iwl_mvm_mac_init_mvmvif but we removed only from the pre-MLD version of the add_interface callback. Remove it also from the MLD version.
Fixes: 0bcc2155983e ("wifi: iwlwifi: mvm: init vif works only once") Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Reviewed-by: Johannes Berg johannes.berg@intel.com Link: https://msgid.link/20240512152312.4f15b41604f0.Iec912158e5a706175531d3736d77... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c | 2 -- 1 file changed, 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c b/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c index df183a79db4c8..43f3002ede464 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mld-mac80211.c @@ -75,8 +75,6 @@ static int iwl_mvm_mld_mac_add_interface(struct ieee80211_hw *hw, goto out_free_bf;
iwl_mvm_tcm_add_vif(mvm, vif); - INIT_DELAYED_WORK(&mvmvif->csa_work, - iwl_mvm_channel_switch_disconnect_wk);
if (vif->type == NL80211_IFTYPE_MONITOR) { mvm->monitor_on = true;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 4a7aace2899711592327463c1a29ffee44fcc66e ]
We don't actually support >64 even for HE devices, so revert back to 64. This fixes an issue where the session is refused because the queue is configured differently from the actual session later.
Fixes: 514c30696fbc ("iwlwifi: add support for IEEE802.11ax") Signed-off-by: Johannes Berg johannes.berg@intel.com Reviewed-by: Liad Kaufman liad.kaufman@intel.com Reviewed-by: Luciano Coelho luciano.coelho@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240510170500.52f7b4cf83aa.If47e43adddf7fe250ed7f5571fbb... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/rs.h | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rs.h b/drivers/net/wireless/intel/iwlwifi/mvm/rs.h index 376b23b409dca..6cd4ec4d8f344 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rs.h +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rs.h @@ -122,13 +122,8 @@ enum {
#define LINK_QUAL_AGG_FRAME_LIMIT_DEF (63) #define LINK_QUAL_AGG_FRAME_LIMIT_MAX (63) -/* - * FIXME - various places in firmware API still use u8, - * e.g. LQ command and SCD config command. - * This should be 256 instead. - */ -#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_DEF (255) -#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_MAX (255) +#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_DEF (64) +#define LINK_QUAL_AGG_FRAME_LIMIT_GEN2_MAX (64) #define LINK_QUAL_AGG_FRAME_LIMIT_MIN (0)
#define LQ_SIZE 2 /* 2 mode tables: "Active" and "Search" */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mordechay Goodstein mordechay.goodstein@intel.com
[ Upstream commit 0f2e9f6f21d1ff292363cdfb5bc4d492eeaff76e ]
In the driver we only use skb_put* for adding data to the skb, hence data never moves and skb_reset_mac_haeder would set mac_header to the first time data was added and not to mac80211 header, fix this my using the actual len of bytes added for setting the mac header.
Fixes: 3f7a9d577d47 ("wifi: iwlwifi: mvm: simplify by using SKB MAC header pointer") Signed-off-by: Mordechay Goodstein mordechay.goodstein@intel.com Reviewed-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240510170500.12f2de2909c3.I72a819b96f2fe55bde192a8fd31a... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c index ce8d83c771a70..8ac5c045fcfcb 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/rxmq.c @@ -2456,8 +2456,11 @@ void iwl_mvm_rx_monitor_no_data(struct iwl_mvm *mvm, struct napi_struct *napi, * * We mark it as mac header, for upper layers to know where * all radio tap header ends. + * + * Since data doesn't move data while putting data on skb and that is + * the only way we use, data + len is the next place that hdr would be put */ - skb_reset_mac_header(skb); + skb_set_mac_header(skb, skb->len);
/* * Override the nss from the rx_vec since the rate_n_flags has
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shahar S Matityahu shahar.s.matityahu@intel.com
[ Upstream commit 87821b67dea87addbc4ab093ba752753b002176a ]
The driver should call iwl_dbg_tlv_free even if debugfs is not defined since ini mode does not depend on debugfs ifdef.
Fixes: 68f6f492c4fa ("iwlwifi: trans: support loading ini TLVs from external file") Signed-off-by: Shahar S Matityahu shahar.s.matityahu@intel.com Reviewed-by: Luciano Coelho luciano.coelho@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240510170500.c8e3723f55b0.I5e805732b0be31ee6b83c642ec65... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/iwl-drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c index 1b7254569a37a..6c27ef2f7c7e5 100644 --- a/drivers/net/wireless/intel/iwlwifi/iwl-drv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-drv.c @@ -1821,8 +1821,8 @@ struct iwl_drv *iwl_drv_start(struct iwl_trans *trans) err_fw: #ifdef CONFIG_IWLWIFI_DEBUGFS debugfs_remove_recursive(drv->dbgfs_drv); - iwl_dbg_tlv_free(drv->trans); #endif + iwl_dbg_tlv_free(drv->trans); kfree(drv); err: return ERR_PTR(ret);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miri Korenblit miriam.rachel.korenblit@intel.com
[ Upstream commit 60d62757df30b74bf397a2847a6db7385c6ee281 ]
In some versions of cfg80211, the ssids poinet might be a valid one even though n_ssids is 0. Accessing the pointer in this case will cuase an out-of-bound access. Fix this by checking n_ssids first.
Fixes: c1a7515393e4 ("iwlwifi: mvm: add adaptive dwell support") Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Reviewed-by: Ilan Peer ilan.peer@intel.com Reviewed-by: Johannes Berg johannes.berg@intel.com Link: https://msgid.link/20240513132416.6e4d1762bf0d.I5a0e6cc8f02050a766db704d1559... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 22bc032cffc8b..525d8efcc1475 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1303,7 +1303,7 @@ static void iwl_mvm_scan_umac_dwell(struct iwl_mvm *mvm, if (IWL_MVM_ADWELL_MAX_BUDGET) cmd->v7.adwell_max_budget = cpu_to_le16(IWL_MVM_ADWELL_MAX_BUDGET); - else if (params->ssids && params->ssids[0].ssid_len) + else if (params->n_ssids && params->ssids[0].ssid_len) cmd->v7.adwell_max_budget = cpu_to_le16(IWL_SCAN_ADWELL_MAX_BUDGET_DIRECTED_SCAN); else @@ -1405,7 +1405,7 @@ iwl_mvm_scan_umac_dwell_v11(struct iwl_mvm *mvm, if (IWL_MVM_ADWELL_MAX_BUDGET) general_params->adwell_max_budget = cpu_to_le16(IWL_MVM_ADWELL_MAX_BUDGET); - else if (params->ssids && params->ssids[0].ssid_len) + else if (params->n_ssids && params->ssids[0].ssid_len) general_params->adwell_max_budget = cpu_to_le16(IWL_SCAN_ADWELL_MAX_BUDGET_DIRECTED_SCAN); else
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
[ Upstream commit 4bb95f4535489ed830cf9b34b0a891e384d1aee4 ]
In case the firmware sends a notification that claims it has more data than it has, we will read past that was allocated for the notification. Remove the print of the buffer, we won't see it by default. If needed, we can see the content with tracing.
This was reported by KFENCE.
Fixes: bdccdb854f2f ("iwlwifi: mvm: support MFUART dump in case of MFUART assert") Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Reviewed-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240513132416.ba82a01a559e.Ia91dd20f5e1ca1ad380b95e68aeb... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 10 ---------- 1 file changed, 10 deletions(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c index e1c2b7fc92ab9..c56212c2c3066 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c @@ -94,20 +94,10 @@ void iwl_mvm_mfu_assert_dump_notif(struct iwl_mvm *mvm, { struct iwl_rx_packet *pkt = rxb_addr(rxb); struct iwl_mfu_assert_dump_notif *mfu_dump_notif = (void *)pkt->data; - __le32 *dump_data = mfu_dump_notif->data; - int n_words = le32_to_cpu(mfu_dump_notif->data_size) / sizeof(__le32); - int i;
if (mfu_dump_notif->index_num == 0) IWL_INFO(mvm, "MFUART assert id 0x%x occurred\n", le32_to_cpu(mfu_dump_notif->assert_id)); - - for (i = 0; i < n_words; i++) - IWL_DEBUG_INFO(mvm, - "MFUART assert dump, dword %u: 0x%08x\n", - le16_to_cpu(mfu_dump_notif->index_num) * - n_words + i, - le32_to_cpu(dump_data[i])); }
static bool iwl_alive_fn(struct iwl_notif_wait_data *notif_wait,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lingbo Kong quic_lingbok@quicinc.com
[ Upstream commit 0c2fd18f7ec552796179c14f13a0e06942f09d16 ]
Currently, the way to check the size of Spatial Reuse IE data in the ieee80211_parse_extension_element() is incorrect.
This is because the len variable in the ieee80211_parse_extension_element() function is equal to the size of Spatial Reuse IE data minus one and the value of returned by the ieee80211_he_spr_size() function is equal to the length of Spatial Reuse IE data. So the result of the len >= ieee80211_he_spr_size(data) statement always false.
To address this issue and make it consistent with the logic used elsewhere with ieee80211_he_oper_size(), change the "len >= ieee80211_he_spr_size(data)" to “len >= ieee80211_he_spr_size(data) - 1”.
Fixes: 9d0480a7c05b ("wifi: mac80211: move element parsing to a new file") Signed-off-by: Lingbo Kong quic_lingbok@quicinc.com Link: https://msgid.link/20240516021854.5682-2-quic_lingbok@quicinc.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/parse.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/parse.c b/net/mac80211/parse.c index 55e5497f89781..055a60e90979b 100644 --- a/net/mac80211/parse.c +++ b/net/mac80211/parse.c @@ -111,7 +111,7 @@ ieee80211_parse_extension_element(u32 *crc, if (params->mode < IEEE80211_CONN_MODE_HE) break; if (len >= sizeof(*elems->he_spr) && - len >= ieee80211_he_spr_size(data)) + len >= ieee80211_he_spr_size(data) - 1) elems->he_spr = data; break; case WLAN_EID_EXT_HE_6GHZ_CAPA:
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lingbo Kong quic_lingbok@quicinc.com
[ Upstream commit a26d8dc5227f449a54518a8b40733a54c6600a8b ]
Currently, the way of parsing Spatial Reuse Parameter Set element is incorrect and some members of struct ieee80211_he_obss_pd are not assigned.
To address this issue, it must be parsed in the order of the elements of Spatial Reuse Parameter Set defined in the IEEE Std 802.11ax specification.
The diagram of the Spatial Reuse Parameter Set element (IEEE Std 802.11ax -2021-9.4.2.252).
------------------------------------------------------------------------- | | | | |Non-SRG| SRG | SRG | SRG | SRG | |Element|Length| Element | SR |OBSS PD|OBSS PD|OBSS PD| BSS |Partial| | ID | | ID |Control| Max | Min | Max |Color | BSSID | | | |Extension| | Offset| Offset|Offset |Bitmap|Bitmap | -------------------------------------------------------------------------
Fixes: 1ced169cc1c2 ("mac80211: allow setting spatial reuse parameters from bss_conf") Signed-off-by: Lingbo Kong quic_lingbok@quicinc.com Link: https://msgid.link/20240516021854.5682-3-quic_lingbok@quicinc.com Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/he.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/he.c b/net/mac80211/he.c index 9f5ffdc9db284..ecbb042dd0433 100644 --- a/net/mac80211/he.c +++ b/net/mac80211/he.c @@ -230,15 +230,21 @@ ieee80211_he_spr_ie_to_bss_conf(struct ieee80211_vif *vif,
if (!he_spr_ie_elem) return; + + he_obss_pd->sr_ctrl = he_spr_ie_elem->he_sr_control; data = he_spr_ie_elem->optional;
if (he_spr_ie_elem->he_sr_control & IEEE80211_HE_SPR_NON_SRG_OFFSET_PRESENT) - data++; + he_obss_pd->non_srg_max_offset = *data++; + if (he_spr_ie_elem->he_sr_control & IEEE80211_HE_SPR_SRG_INFORMATION_PRESENT) { - he_obss_pd->max_offset = *data++; he_obss_pd->min_offset = *data++; + he_obss_pd->max_offset = *data++; + memcpy(he_obss_pd->bss_color_bitmap, data, 8); + data += 8; + memcpy(he_obss_pd->partial_bssid_bitmap, data, 8); he_obss_pd->enable = true; } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chanwoo Lee cw9316.lee@samsung.com
[ Upstream commit d53b681ce9ca7db5ef4ecb8d2cf465ae4a031264 ]
An error unrelated to ufshcd_try_to_abort_task is being logged and can cause confusion. Modify ufshcd_mcq_abort() to print the result of the abort failure. For readability, return immediately instead of 'goto'.
Fixes: f1304d442077 ("scsi: ufs: mcq: Added ufshcd_mcq_abort()") Signed-off-by: Chanwoo Lee cw9316.lee@samsung.com Link: https://lore.kernel.org/r/20240524015904.1116005-1-cw9316.lee@samsung.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ufs/core/ufs-mcq.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-)
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c index 005d63ab1f441..8944548c30fa1 100644 --- a/drivers/ufs/core/ufs-mcq.c +++ b/drivers/ufs/core/ufs-mcq.c @@ -634,20 +634,20 @@ int ufshcd_mcq_abort(struct scsi_cmnd *cmd) struct ufshcd_lrb *lrbp = &hba->lrb[tag]; struct ufs_hw_queue *hwq; unsigned long flags; - int err = FAILED; + int err;
if (!ufshcd_cmd_inflight(lrbp->cmd)) { dev_err(hba->dev, "%s: skip abort. cmd at tag %d already completed.\n", __func__, tag); - goto out; + return FAILED; }
/* Skip task abort in case previous aborts failed and report failure */ if (lrbp->req_abort_skip) { dev_err(hba->dev, "%s: skip abort. tag %d failed earlier\n", __func__, tag); - goto out; + return FAILED; }
hwq = ufshcd_mcq_req_to_hwq(hba, scsi_cmd_to_rq(cmd)); @@ -659,7 +659,7 @@ int ufshcd_mcq_abort(struct scsi_cmnd *cmd) */ dev_err(hba->dev, "%s: cmd found in sq. hwq=%d, tag=%d\n", __func__, hwq->id, tag); - goto out; + return FAILED; }
/* @@ -667,18 +667,17 @@ int ufshcd_mcq_abort(struct scsi_cmnd *cmd) * in the completion queue either. Query the device to see if * the command is being processed in the device. */ - if (ufshcd_try_to_abort_task(hba, tag)) { + err = ufshcd_try_to_abort_task(hba, tag); + if (err) { dev_err(hba->dev, "%s: device abort failed %d\n", __func__, err); lrbp->req_abort_skip = true; - goto out; + return FAILED; }
- err = SUCCESS; spin_lock_irqsave(&hwq->cq_lock, flags); if (ufshcd_cmd_inflight(lrbp->cmd)) ufshcd_release_scsi_cmd(hba, lrbp); spin_unlock_irqrestore(&hwq->cq_lock, flags);
-out: - return err; + return SUCCESS; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yong-Xuan Wang yongxuan.wang@sifive.com
[ Upstream commit 2d707b4e37f9b0c37b8b2392f91b04c5b63ea538 ]
When the maximum hart number within groups is 1, hart-index-bit is set to 0. Consequently, there is no need to restore the hart ID from IMSIC addresses and hart-index-bit settings. Currently, QEMU and kvmtool do not pass correct hart-index-bit values when the maximum hart number is a power of 2, thereby avoiding this issue. Corresponding patches for QEMU and kvmtool will also be dispatched.
Fixes: 89d01306e34d ("RISC-V: KVM: Implement device interface for AIA irqchip") Signed-off-by: Yong-Xuan Wang yongxuan.wang@sifive.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Link: https://lore.kernel.org/r/20240415064905.25184-1-yongxuan.wang@sifive.com Signed-off-by: Anup Patel anup@brainfault.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/kvm/aia_device.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/riscv/kvm/aia_device.c b/arch/riscv/kvm/aia_device.c index 0eb689351b7d0..5cd407c6a8e4f 100644 --- a/arch/riscv/kvm/aia_device.c +++ b/arch/riscv/kvm/aia_device.c @@ -237,10 +237,11 @@ static gpa_t aia_imsic_ppn(struct kvm_aia *aia, gpa_t addr)
static u32 aia_imsic_hart_index(struct kvm_aia *aia, gpa_t addr) { - u32 hart, group = 0; + u32 hart = 0, group = 0;
- hart = (addr >> (aia->nr_guest_bits + IMSIC_MMIO_PAGE_SHIFT)) & - GENMASK_ULL(aia->nr_hart_bits - 1, 0); + if (aia->nr_hart_bits) + hart = (addr >> (aia->nr_guest_bits + IMSIC_MMIO_PAGE_SHIFT)) & + GENMASK_ULL(aia->nr_hart_bits - 1, 0); if (aia->nr_group_bits) group = (addr >> aia->nr_group_shift) & GENMASK_ULL(aia->nr_group_bits - 1, 0);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quan Zhou zhouquan@iscas.ac.cn
[ Upstream commit c66f3b40b17d3dfc4b6abb5efde8e71c46971821 ]
In the function kvm_riscv_vcpu_set_reg_isa_ext, the original code used incorrect reg_subtype labels KVM_REG_RISCV_SBI_MULTI_EN/DIS. These have been corrected to KVM_REG_RISCV_ISA_MULTI_EN/DIS respectively. Although they are numerically equivalent, the actual processing will not result in errors, but it may lead to ambiguous code semantics.
Fixes: 613029442a4b ("RISC-V: KVM: Extend ONE_REG to enable/disable multiple ISA extensions") Signed-off-by: Quan Zhou zhouquan@iscas.ac.cn Reviewed-by: Andrew Jones ajones@ventanamicro.com Link: https://lore.kernel.org/r/ff1c6771a67d660db94372ac9aaa40f51e5e0090.171642937... Signed-off-by: Anup Patel anup@brainfault.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/kvm/vcpu_onereg.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/riscv/kvm/vcpu_onereg.c b/arch/riscv/kvm/vcpu_onereg.c index 994adc26db4b1..e5706f5f2c71a 100644 --- a/arch/riscv/kvm/vcpu_onereg.c +++ b/arch/riscv/kvm/vcpu_onereg.c @@ -718,9 +718,9 @@ static int kvm_riscv_vcpu_set_reg_isa_ext(struct kvm_vcpu *vcpu, switch (reg_subtype) { case KVM_REG_RISCV_ISA_SINGLE: return riscv_vcpu_set_isa_ext_single(vcpu, reg_num, reg_val); - case KVM_REG_RISCV_SBI_MULTI_EN: + case KVM_REG_RISCV_ISA_MULTI_EN: return riscv_vcpu_set_isa_ext_multi(vcpu, reg_num, reg_val, true); - case KVM_REG_RISCV_SBI_MULTI_DIS: + case KVM_REG_RISCV_ISA_MULTI_DIS: return riscv_vcpu_set_isa_ext_multi(vcpu, reg_num, reg_val, false); default: return -ENOENT;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Heng Qi hengqi@linux.alibaba.com
[ Upstream commit 9e0945b1901c9eed4fbee3b8a3870487b2bdc936 ]
When the dim worker is scheduled, if it no longer needs to issue commands, dim may not be able to return to the working state later.
For example, the following single queue scenario: 1. The dim worker of rxq0 is scheduled, and the dim status is changed to DIM_APPLY_NEW_PROFILE; 2. dim is disabled or parameters have not been modified; 3. virtnet_rx_dim_work exits directly;
Then, even if net_dim is invoked again, it cannot work because the state is not restored to DIM_START_MEASURE.
Fixes: 6208799553a8 ("virtio-net: support rx netdim") Signed-off-by: Heng Qi hengqi@linux.alibaba.com Reviewed-by: Jiri Pirko jiri@nvidia.com Acked-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Xuan Zhuo xuanzhuo@linux.alibaba.com Link: https://lore.kernel.org/r/20240528134116.117426-2-hengqi@linux.alibaba.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 115c3c5414f2a..574b052a517d7 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -3589,10 +3589,10 @@ static void virtnet_rx_dim_work(struct work_struct *work) if (err) pr_debug("%s: Failed to send dim parameters on rxq%d\n", dev->name, qnum); - dim->state = DIM_START_MEASURE; } }
+ dim->state = DIM_START_MEASURE; rtnl_unlock(); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lars Kellogg-Stedman lars@oddbit.com
[ Upstream commit 3c34fb0bd4a4237592c5ecb5b2e2531900c55774 ]
When releasing a socket in ax25_release(), we call netdev_put() to decrease the refcount on the associated ax.25 device. However, the execution path for accepting an incoming connection never calls netdev_hold(). This imbalance leads to refcount errors, and ultimately to kernel crashes.
A typical call trace for the above situation will start with one of the following errors:
refcount_t: decrement hit 0; leaking memory. refcount_t: underflow; use-after-free.
And will then have a trace like:
Call Trace: <TASK> ? show_regs+0x64/0x70 ? __warn+0x83/0x120 ? refcount_warn_saturate+0xb2/0x100 ? report_bug+0x158/0x190 ? prb_read_valid+0x20/0x30 ? handle_bug+0x3e/0x70 ? exc_invalid_op+0x1c/0x70 ? asm_exc_invalid_op+0x1f/0x30 ? refcount_warn_saturate+0xb2/0x100 ? refcount_warn_saturate+0xb2/0x100 ax25_release+0x2ad/0x360 __sock_release+0x35/0xa0 sock_close+0x19/0x20 [...]
On reboot (or any attempt to remove the interface), the kernel gets stuck in an infinite loop:
unregister_netdevice: waiting for ax0 to become free. Usage count = 0
This patch corrects these issues by ensuring that we call netdev_hold() and ax25_dev_hold() for new connections in ax25_accept(). This makes the logic leading to ax25_accept() match the logic for ax25_bind(): in both cases we increment the refcount, which is ultimately decremented in ax25_release().
Fixes: 9fd75b66b8f6 ("ax25: Fix refcount leaks caused by ax25_cb_del()") Signed-off-by: Lars Kellogg-Stedman lars@oddbit.com Tested-by: Duoming Zhou duoming@zju.edu.cn Tested-by: Dan Cross crossd@gmail.com Tested-by: Chris Maness christopher.maness@gmail.com Reviewed-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/20240529210242.3346844-2-lars@oddbit.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ax25/af_ax25.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c index 9169efb2f43aa..5fff5930e4deb 100644 --- a/net/ax25/af_ax25.c +++ b/net/ax25/af_ax25.c @@ -1378,8 +1378,10 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags, { struct sk_buff *skb; struct sock *newsk; + ax25_dev *ax25_dev; DEFINE_WAIT(wait); struct sock *sk; + ax25_cb *ax25; int err = 0;
if (sock->state != SS_UNCONNECTED) @@ -1434,6 +1436,10 @@ static int ax25_accept(struct socket *sock, struct socket *newsock, int flags, kfree_skb(skb); sk_acceptq_removed(sk); newsock->state = SS_CONNECTED; + ax25 = sk_to_ax25(newsk); + ax25_dev = ax25->ax25_dev; + netdev_hold(ax25_dev->dev, &ax25->dev_tracker, GFP_ATOMIC); + ax25_dev_hold(ax25_dev);
out: release_sock(sk);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Duoming Zhou duoming@zju.edu.cn
[ Upstream commit 166fcf86cd34e15c7f383eda4642d7a212393008 ]
The object "ax25_dev" is managed by reference counting. Thus it should not be directly released by kfree(), replace with ax25_dev_put().
Fixes: d01ffb9eee4a ("ax25: add refcount in ax25_dev to avoid UAF bugs") Suggested-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Duoming Zhou duoming@zju.edu.cn Reviewed-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/20240530051733.11416-1-duoming@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ax25/ax25_dev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c index c9d55b99a7a57..67ae6b8c52989 100644 --- a/net/ax25/ax25_dev.c +++ b/net/ax25/ax25_dev.c @@ -193,7 +193,7 @@ void __exit ax25_dev_free(void) list_for_each_entry_safe(s, n, &ax25_dev_list, list) { netdev_put(s->dev, &s->dev_tracker); list_del(&s->list); - kfree(s); + ax25_dev_put(s); } spin_unlock_bh(&ax25_dev_lock); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: DelphineCCChiu delphine_cc_chiu@wiwynn.com
[ Upstream commit e85e271dec0270982afed84f70dc37703fcc1d52 ]
Currently NCSI driver will send several NCSI commands back to back without waiting the response of previous NCSI command or timeout in some state when NIC have multi channel. This operation against the single thread manner defined by NCSI SPEC(section 6.3.2.3 in DSP0222_1.1.1)
According to NCSI SPEC(section 6.2.13.1 in DSP0222_1.1.1), we should probe one channel at a time by sending NCSI commands (Clear initial state, Get version ID, Get capabilities...), than repeat this steps until the max number of channels which we got from NCSI command (Get capabilities) has been probed.
Fixes: e6f44ed6d04d ("net/ncsi: Package and channel management") Signed-off-by: DelphineCCChiu delphine_cc_chiu@wiwynn.com Link: https://lore.kernel.org/r/20240529065856.825241-1-delphine_cc_chiu@wiwynn.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ncsi/internal.h | 2 ++ net/ncsi/ncsi-manage.c | 73 +++++++++++++++++++++--------------------- net/ncsi/ncsi-rsp.c | 4 ++- 3 files changed, 41 insertions(+), 38 deletions(-)
diff --git a/net/ncsi/internal.h b/net/ncsi/internal.h index 374412ed780b6..ef0f8f73826f5 100644 --- a/net/ncsi/internal.h +++ b/net/ncsi/internal.h @@ -325,6 +325,7 @@ struct ncsi_dev_priv { spinlock_t lock; /* Protect the NCSI device */ unsigned int package_probe_id;/* Current ID during probe */ unsigned int package_num; /* Number of packages */ + unsigned int channel_probe_id;/* Current cahnnel ID during probe */ struct list_head packages; /* List of packages */ struct ncsi_channel *hot_channel; /* Channel was ever active */ struct ncsi_request requests[256]; /* Request table */ @@ -343,6 +344,7 @@ struct ncsi_dev_priv { bool multi_package; /* Enable multiple packages */ bool mlx_multi_host; /* Enable multi host Mellanox */ u32 package_whitelist; /* Packages to configure */ + unsigned char channel_count; /* Num of channels to probe */ };
struct ncsi_cmd_arg { diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index 745c788f1d1df..5ecf611c88200 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -510,17 +510,19 @@ static void ncsi_suspend_channel(struct ncsi_dev_priv *ndp)
break; case ncsi_dev_state_suspend_gls: - ndp->pending_req_num = np->channel_num; + ndp->pending_req_num = 1;
nca.type = NCSI_PKT_CMD_GLS; nca.package = np->id; + nca.channel = ndp->channel_probe_id; + ret = ncsi_xmit_cmd(&nca); + if (ret) + goto error; + ndp->channel_probe_id++;
- nd->state = ncsi_dev_state_suspend_dcnt; - NCSI_FOR_EACH_CHANNEL(np, nc) { - nca.channel = nc->id; - ret = ncsi_xmit_cmd(&nca); - if (ret) - goto error; + if (ndp->channel_probe_id == ndp->channel_count) { + ndp->channel_probe_id = 0; + nd->state = ncsi_dev_state_suspend_dcnt; }
break; @@ -1345,7 +1347,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) { struct ncsi_dev *nd = &ndp->ndev; struct ncsi_package *np; - struct ncsi_channel *nc; struct ncsi_cmd_arg nca; unsigned char index; int ret; @@ -1423,23 +1424,6 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp)
nd->state = ncsi_dev_state_probe_cis; break; - case ncsi_dev_state_probe_cis: - ndp->pending_req_num = NCSI_RESERVED_CHANNEL; - - /* Clear initial state */ - nca.type = NCSI_PKT_CMD_CIS; - nca.package = ndp->active_package->id; - for (index = 0; index < NCSI_RESERVED_CHANNEL; index++) { - nca.channel = index; - ret = ncsi_xmit_cmd(&nca); - if (ret) - goto error; - } - - nd->state = ncsi_dev_state_probe_gvi; - if (IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY)) - nd->state = ncsi_dev_state_probe_keep_phy; - break; case ncsi_dev_state_probe_keep_phy: ndp->pending_req_num = 1;
@@ -1452,14 +1436,17 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp)
nd->state = ncsi_dev_state_probe_gvi; break; + case ncsi_dev_state_probe_cis: case ncsi_dev_state_probe_gvi: case ncsi_dev_state_probe_gc: case ncsi_dev_state_probe_gls: np = ndp->active_package; - ndp->pending_req_num = np->channel_num; + ndp->pending_req_num = 1;
- /* Retrieve version, capability or link status */ - if (nd->state == ncsi_dev_state_probe_gvi) + /* Clear initial state Retrieve version, capability or link status */ + if (nd->state == ncsi_dev_state_probe_cis) + nca.type = NCSI_PKT_CMD_CIS; + else if (nd->state == ncsi_dev_state_probe_gvi) nca.type = NCSI_PKT_CMD_GVI; else if (nd->state == ncsi_dev_state_probe_gc) nca.type = NCSI_PKT_CMD_GC; @@ -1467,19 +1454,29 @@ static void ncsi_probe_channel(struct ncsi_dev_priv *ndp) nca.type = NCSI_PKT_CMD_GLS;
nca.package = np->id; - NCSI_FOR_EACH_CHANNEL(np, nc) { - nca.channel = nc->id; - ret = ncsi_xmit_cmd(&nca); - if (ret) - goto error; - } + nca.channel = ndp->channel_probe_id;
- if (nd->state == ncsi_dev_state_probe_gvi) + ret = ncsi_xmit_cmd(&nca); + if (ret) + goto error; + + if (nd->state == ncsi_dev_state_probe_cis) { + nd->state = ncsi_dev_state_probe_gvi; + if (IS_ENABLED(CONFIG_NCSI_OEM_CMD_KEEP_PHY) && ndp->channel_probe_id == 0) + nd->state = ncsi_dev_state_probe_keep_phy; + } else if (nd->state == ncsi_dev_state_probe_gvi) { nd->state = ncsi_dev_state_probe_gc; - else if (nd->state == ncsi_dev_state_probe_gc) + } else if (nd->state == ncsi_dev_state_probe_gc) { nd->state = ncsi_dev_state_probe_gls; - else + } else { + nd->state = ncsi_dev_state_probe_cis; + ndp->channel_probe_id++; + } + + if (ndp->channel_probe_id == ndp->channel_count) { + ndp->channel_probe_id = 0; nd->state = ncsi_dev_state_probe_dp; + } break; case ncsi_dev_state_probe_dp: ndp->pending_req_num = 1; @@ -1780,6 +1777,7 @@ struct ncsi_dev *ncsi_register_dev(struct net_device *dev, ndp->requests[i].ndp = ndp; timer_setup(&ndp->requests[i].timer, ncsi_request_timeout, 0); } + ndp->channel_count = NCSI_RESERVED_CHANNEL;
spin_lock_irqsave(&ncsi_dev_lock, flags); list_add_tail_rcu(&ndp->node, &ncsi_dev_list); @@ -1813,6 +1811,7 @@ int ncsi_start_dev(struct ncsi_dev *nd)
if (!(ndp->flags & NCSI_DEV_PROBED)) { ndp->package_probe_id = 0; + ndp->channel_probe_id = 0; nd->state = ncsi_dev_state_probe; schedule_work(&ndp->work); return 0; diff --git a/net/ncsi/ncsi-rsp.c b/net/ncsi/ncsi-rsp.c index bee290d0f48b6..e28be33bdf2c4 100644 --- a/net/ncsi/ncsi-rsp.c +++ b/net/ncsi/ncsi-rsp.c @@ -795,12 +795,13 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr) struct ncsi_rsp_gc_pkt *rsp; struct ncsi_dev_priv *ndp = nr->ndp; struct ncsi_channel *nc; + struct ncsi_package *np; size_t size;
/* Find the channel */ rsp = (struct ncsi_rsp_gc_pkt *)skb_network_header(nr->rsp); ncsi_find_package_and_channel(ndp, rsp->rsp.common.channel, - NULL, &nc); + &np, &nc); if (!nc) return -ENODEV;
@@ -835,6 +836,7 @@ static int ncsi_rsp_handler_gc(struct ncsi_request *nr) */ nc->vlan_filter.bitmap = U64_MAX; nc->vlan_filter.n_vids = rsp->vlan_cnt; + np->ndp->channel_count = rsp->channel_cnt;
return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tristram Ha tristram.ha@microchip.com
[ Upstream commit 6149db4997f582e958da675092f21c666e3b67b7 ]
When the PHY is powered up after powered down most of the registers are reset, so the PHY setup code needs to be done again. In addition the interrupt register will need to be setup again so that link status indication works again.
Fixes: 26dd2974c5b5 ("net: phy: micrel: Move KSZ9477 errata fixes to PHY driver") Signed-off-by: Tristram Ha tristram.ha@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/micrel.c | 62 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 56 insertions(+), 6 deletions(-)
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index 13370439a7cae..c2d99344ade41 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -1858,7 +1858,7 @@ static const struct ksz9477_errata_write ksz9477_errata_writes[] = { {0x1c, 0x20, 0xeeee}, };
-static int ksz9477_config_init(struct phy_device *phydev) +static int ksz9477_phy_errata(struct phy_device *phydev) { int err; int i; @@ -1886,16 +1886,30 @@ static int ksz9477_config_init(struct phy_device *phydev) return err; }
+ err = genphy_restart_aneg(phydev); + if (err) + return err; + + return err; +} + +static int ksz9477_config_init(struct phy_device *phydev) +{ + int err; + + /* Only KSZ9897 family of switches needs this fix. */ + if ((phydev->phy_id & 0xf) == 1) { + err = ksz9477_phy_errata(phydev); + if (err) + return err; + } + /* According to KSZ9477 Errata DS80000754C (Module 4) all EEE modes * in this switch shall be regarded as broken. */ if (phydev->dev_flags & MICREL_NO_EEE) phydev->eee_broken_modes = -1;
- err = genphy_restart_aneg(phydev); - if (err) - return err; - return kszphy_config_init(phydev); }
@@ -2004,6 +2018,42 @@ static int kszphy_resume(struct phy_device *phydev) return 0; }
+static int ksz9477_resume(struct phy_device *phydev) +{ + int ret; + + /* No need to initialize registers if not powered down. */ + ret = phy_read(phydev, MII_BMCR); + if (ret < 0) + return ret; + if (!(ret & BMCR_PDOWN)) + return 0; + + genphy_resume(phydev); + + /* After switching from power-down to normal mode, an internal global + * reset is automatically generated. Wait a minimum of 1 ms before + * read/write access to the PHY registers. + */ + usleep_range(1000, 2000); + + /* Only KSZ9897 family of switches needs this fix. */ + if ((phydev->phy_id & 0xf) == 1) { + ret = ksz9477_phy_errata(phydev); + if (ret) + return ret; + } + + /* Enable PHY Interrupts */ + if (phy_interrupt_is_valid(phydev)) { + phydev->interrupts = PHY_INTERRUPT_ENABLED; + if (phydev->drv->config_intr) + phydev->drv->config_intr(phydev); + } + + return 0; +} + static int kszphy_probe(struct phy_device *phydev) { const struct kszphy_type *type = phydev->drv->driver_data; @@ -4980,7 +5030,7 @@ static struct phy_driver ksphy_driver[] = { .config_intr = kszphy_config_intr, .handle_interrupt = kszphy_handle_interrupt, .suspend = genphy_suspend, - .resume = genphy_resume, + .resume = ksz9477_resume, .get_features = ksz9477_get_features, } };
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang cong.wang@bytedance.com
[ Upstream commit 2884dc7d08d98a89d8d65121524bb7533183a63a ]
After commit 1a80dbcb2dba, bpf_link can be freed by link->ops->dealloc_deferred, but the code still tests and uses link->ops->dealloc afterward, which leads to a use-after-free as reported by syzbot. Actually, one of them should be sufficient, so just call one of them instead of both. Also add a WARN_ON() in case of any problematic implementation.
Fixes: 1a80dbcb2dba ("bpf: support deferring bpf_link dealloc to after RCU grace period") Reported-by: syzbot+1989ee16d94720836244@syzkaller.appspotmail.com Signed-off-by: Cong Wang cong.wang@bytedance.com Signed-off-by: Daniel Borkmann daniel@iogearbox.net Acked-by: Jiri Olsa jolsa@kernel.org Link: https://lore.kernel.org/bpf/20240602182703.207276-1-xiyou.wangcong@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/syscall.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c index cb61d8880dbe0..52ffe33356418 100644 --- a/kernel/bpf/syscall.c +++ b/kernel/bpf/syscall.c @@ -2985,6 +2985,7 @@ static int bpf_obj_get(const union bpf_attr *attr) void bpf_link_init(struct bpf_link *link, enum bpf_link_type type, const struct bpf_link_ops *ops, struct bpf_prog *prog) { + WARN_ON(ops->dealloc && ops->dealloc_deferred); atomic64_set(&link->refcnt, 1); link->type = type; link->id = 0; @@ -3043,16 +3044,17 @@ static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu) /* bpf_link_free is guaranteed to be called from process context */ static void bpf_link_free(struct bpf_link *link) { + const struct bpf_link_ops *ops = link->ops; bool sleepable = false;
bpf_link_free_id(link->id); if (link->prog) { sleepable = link->prog->sleepable; /* detach BPF program, clean up used resources */ - link->ops->release(link); + ops->release(link); bpf_prog_put(link->prog); } - if (link->ops->dealloc_deferred) { + if (ops->dealloc_deferred) { /* schedule BPF link deallocation; if underlying BPF program * is sleepable, we need to first wait for RCU tasks trace * sync, then go through "classic" RCU grace period @@ -3061,9 +3063,8 @@ static void bpf_link_free(struct bpf_link *link) call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp); else call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp); - } - if (link->ops->dealloc) - link->ops->dealloc(link); + } else if (ops->dealloc) + ops->dealloc(link); }
static void bpf_link_put_deferred(struct work_struct *work)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ravi Bangoria ravi.bangoria@amd.com
[ Upstream commit d922056215617eedfbdbc29fe49953423686fe5e ]
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. So, prevent SEV-ES guests when LBRV support is missing.
[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Signed-off-by: Ravi Bangoria ravi.bangoria@amd.com Message-ID: 20240531044644.768-3-ravi.bangoria@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/sev.c | 6 ++++++ arch/x86/kvm/svm/svm.c | 16 +++++++--------- arch/x86/kvm/svm/svm.h | 1 + 3 files changed, 14 insertions(+), 9 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 759581bb2128d..43b7d76a27a56 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -2269,6 +2269,12 @@ void __init sev_hardware_setup(void) if (!boot_cpu_has(X86_FEATURE_SEV_ES)) goto out;
+ if (!lbrv) { + WARN_ONCE(!boot_cpu_has(X86_FEATURE_LBRV), + "LBRV must be present for SEV-ES support"); + goto out; + } + /* Has the system been allocated ASIDs for SEV-ES? */ if (min_sev_asid == 1) goto out; diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 308416b50b036..3363e5ba0fff5 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -215,7 +215,7 @@ int vgif = true; module_param(vgif, int, 0444);
/* enable/disable LBR virtualization */ -static int lbrv = true; +int lbrv = true; module_param(lbrv, int, 0444);
static int tsc_scaling = true; @@ -5260,6 +5260,12 @@ static __init int svm_hardware_setup(void)
nrips = nrips && boot_cpu_has(X86_FEATURE_NRIPS);
+ if (lbrv) { + if (!boot_cpu_has(X86_FEATURE_LBRV)) + lbrv = false; + else + pr_info("LBR virtualization supported\n"); + } /* * Note, SEV setup consumes npt_enabled and enable_mmio_caching (which * may be modified by svm_adjust_mmio_mask()), as well as nrips. @@ -5313,14 +5319,6 @@ static __init int svm_hardware_setup(void) svm_x86_ops.set_vnmi_pending = NULL; }
- - if (lbrv) { - if (!boot_cpu_has(X86_FEATURE_LBRV)) - lbrv = false; - else - pr_info("LBR virtualization supported\n"); - } - if (!enable_pmu) pr_info("PMU virtualization is disabled\n");
diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 33878efdebc82..4bf9af529ae03 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -39,6 +39,7 @@ extern int vgif; extern bool intercept_smi; extern bool x2avic_enabled; extern bool vnmi; +extern int lbrv;
/* * Clean bits in VMCB.
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ravi Bangoria ravi.bangoria@amd.com
[ Upstream commit b7e4be0a224fe5c6be30c1c8bdda8d2317ad6ba4 ]
As documented in APM[1], LBR Virtualization must be enabled for SEV-ES guests. Although KVM currently enforces LBRV for SEV-ES guests, there are multiple issues with it:
o MSR_IA32_DEBUGCTLMSR is still intercepted. Since MSR_IA32_DEBUGCTLMSR interception is used to dynamically toggle LBRV for performance reasons, this can be fatal for SEV-ES guests. For ex SEV-ES guest on Zen3:
[guest ~]# wrmsr 0x1d9 0x4 KVM: entry failed, hardware error 0xffffffff EAX=00000004 EBX=00000000 ECX=000001d9 EDX=00000000
Fix this by never intercepting MSR_IA32_DEBUGCTLMSR for SEV-ES guests. No additional save/restore logic is required since MSR_IA32_DEBUGCTLMSR is of swap type A.
o KVM will disable LBRV if userspace sets MSR_IA32_DEBUGCTLMSR before the VMSA is encrypted. Fix this by moving LBRV enablement code post VMSA encryption.
[1]: AMD64 Architecture Programmer's Manual Pub. 40332, Rev. 4.07 - June 2023, Vol 2, 15.35.2 Enabling SEV-ES. https://bugzilla.kernel.org/attachment.cgi?id=304653
Fixes: 376c6d285017 ("KVM: SVM: Provide support for SEV-ES vCPU creation/loading") Co-developed-by: Nikunj A Dadhania nikunj@amd.com Signed-off-by: Nikunj A Dadhania nikunj@amd.com Signed-off-by: Ravi Bangoria ravi.bangoria@amd.com Message-ID: 20240531044644.768-4-ravi.bangoria@amd.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kvm/svm/sev.c | 13 ++++++++----- arch/x86/kvm/svm/svm.c | 8 +++++++- arch/x86/kvm/svm/svm.h | 3 ++- 3 files changed, 17 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c index 43b7d76a27a56..4471b4e08d23d 100644 --- a/arch/x86/kvm/svm/sev.c +++ b/arch/x86/kvm/svm/sev.c @@ -666,6 +666,14 @@ static int __sev_launch_update_vmsa(struct kvm *kvm, struct kvm_vcpu *vcpu, return ret;
vcpu->arch.guest_state_protected = true; + + /* + * SEV-ES guest mandates LBR Virtualization to be _always_ ON. Enable it + * only after setting guest_state_protected because KVM_SET_MSRS allows + * dynamic toggling of LBRV (for performance reason) on write access to + * MSR_IA32_DEBUGCTLMSR when guest_state_protected is not set. + */ + svm_enable_lbrv(vcpu); return 0; }
@@ -3040,7 +3048,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) struct kvm_vcpu *vcpu = &svm->vcpu;
svm->vmcb->control.nested_ctl |= SVM_NESTED_CTL_SEV_ES_ENABLE; - svm->vmcb->control.virt_ext |= LBR_CTL_ENABLE_MASK;
/* * An SEV-ES guest requires a VMSA area that is a separate from the @@ -3092,10 +3099,6 @@ static void sev_es_init_vmcb(struct vcpu_svm *svm) /* Clear intercepts on selected MSRs */ set_msr_interception(vcpu, svm->msrpm, MSR_EFER, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_CR_PAT, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1); - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1); }
void sev_init_vmcb(struct vcpu_svm *svm) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 3363e5ba0fff5..4650153afa465 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -99,6 +99,7 @@ static const struct svm_direct_access_msrs { { .index = MSR_IA32_SPEC_CTRL, .always = false }, { .index = MSR_IA32_PRED_CMD, .always = false }, { .index = MSR_IA32_FLUSH_CMD, .always = false }, + { .index = MSR_IA32_DEBUGCTLMSR, .always = false }, { .index = MSR_IA32_LASTBRANCHFROMIP, .always = false }, { .index = MSR_IA32_LASTBRANCHTOIP, .always = false }, { .index = MSR_IA32_LASTINTFROMIP, .always = false }, @@ -990,7 +991,7 @@ void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb) vmcb_mark_dirty(to_vmcb, VMCB_LBR); }
-static void svm_enable_lbrv(struct kvm_vcpu *vcpu) +void svm_enable_lbrv(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu);
@@ -1000,6 +1001,9 @@ static void svm_enable_lbrv(struct kvm_vcpu *vcpu) set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTFROMIP, 1, 1); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1);
+ if (sev_es_guest(vcpu->kvm)) + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_DEBUGCTLMSR, 1, 1); + /* Move the LBR msrs to the vmcb02 so that the guest can see them. */ if (is_guest_mode(vcpu)) svm_copy_lbrs(svm->vmcb, svm->vmcb01.ptr); @@ -1009,6 +1013,8 @@ static void svm_disable_lbrv(struct kvm_vcpu *vcpu) { struct vcpu_svm *svm = to_svm(vcpu);
+ KVM_BUG_ON(sev_es_guest(vcpu->kvm), vcpu->kvm); + svm->vmcb->control.virt_ext &= ~LBR_CTL_ENABLE_MASK; set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHFROMIP, 0, 0); set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTBRANCHTOIP, 0, 0); diff --git a/arch/x86/kvm/svm/svm.h b/arch/x86/kvm/svm/svm.h index 4bf9af529ae03..2ed3015e03f13 100644 --- a/arch/x86/kvm/svm/svm.h +++ b/arch/x86/kvm/svm/svm.h @@ -30,7 +30,7 @@ #define IOPM_SIZE PAGE_SIZE * 3 #define MSRPM_SIZE PAGE_SIZE * 2
-#define MAX_DIRECT_ACCESS_MSRS 47 +#define MAX_DIRECT_ACCESS_MSRS 48 #define MSRPM_OFFSETS 32 extern u32 msrpm_offsets[MSRPM_OFFSETS] __read_mostly; extern bool npt_enabled; @@ -544,6 +544,7 @@ u32 *svm_vcpu_alloc_msrpm(void); void svm_vcpu_init_msrpm(struct kvm_vcpu *vcpu, u32 *msrpm); void svm_vcpu_free_msrpm(u32 *msrpm); void svm_copy_lbrs(struct vmcb *to_vmcb, struct vmcb *from_vmcb); +void svm_enable_lbrv(struct kvm_vcpu *vcpu); void svm_update_lbrv(struct kvm_vcpu *vcpu);
int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthias Stocker mstocker@barracuda.com
[ Upstream commit ffbe335b8d471f79b259e950cb20999700670456 ]
When vmxnet3_rq_create() fails to allocate memory for rq->data_ring.base, the subsequent call to vmxnet3_rq_destroy_all_rxdataring does not reset rq->data_ring.desc_size for the data ring that failed, which presumably causes the hypervisor to reference it on packet reception.
To fix this bug, rq->data_ring.desc_size needs to be set to 0 to tell the hypervisor to disable this feature.
[ 95.436876] kernel BUG at net/core/skbuff.c:207! [ 95.439074] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 95.440411] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 6.9.3-dirty #1 [ 95.441558] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018 [ 95.443481] RIP: 0010:skb_panic+0x4d/0x4f [ 95.444404] Code: 4f 70 50 8b 87 c0 00 00 00 50 8b 87 bc 00 00 00 50 ff b7 d0 00 00 00 4c 8b 8f c8 00 00 00 48 c7 c7 68 e8 be 9f e8 63 58 f9 ff <0f> 0b 48 8b 14 24 48 c7 c1 d0 73 65 9f e8 a1 ff ff ff 48 8b 14 24 [ 95.447684] RSP: 0018:ffffa13340274dd0 EFLAGS: 00010246 [ 95.448762] RAX: 0000000000000089 RBX: ffff8fbbc72b02d0 RCX: 000000000000083f [ 95.450148] RDX: 0000000000000000 RSI: 00000000000000f6 RDI: 000000000000083f [ 95.451520] RBP: 000000000000002d R08: 0000000000000000 R09: ffffa13340274c60 [ 95.452886] R10: ffffffffa04ed468 R11: 0000000000000002 R12: 0000000000000000 [ 95.454293] R13: ffff8fbbdab3c2d0 R14: ffff8fbbdbd829e0 R15: ffff8fbbdbd809e0 [ 95.455682] FS: 0000000000000000(0000) GS:ffff8fbeefd80000(0000) knlGS:0000000000000000 [ 95.457178] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 95.458340] CR2: 00007fd0d1f650c8 CR3: 0000000115f28000 CR4: 00000000000406f0 [ 95.459791] Call Trace: [ 95.460515] <IRQ> [ 95.461180] ? __die_body.cold+0x19/0x27 [ 95.462150] ? die+0x2e/0x50 [ 95.462976] ? do_trap+0xca/0x110 [ 95.463973] ? do_error_trap+0x6a/0x90 [ 95.464966] ? skb_panic+0x4d/0x4f [ 95.465901] ? exc_invalid_op+0x50/0x70 [ 95.466849] ? skb_panic+0x4d/0x4f [ 95.467718] ? asm_exc_invalid_op+0x1a/0x20 [ 95.468758] ? skb_panic+0x4d/0x4f [ 95.469655] skb_put.cold+0x10/0x10 [ 95.470573] vmxnet3_rq_rx_complete+0x862/0x11e0 [vmxnet3] [ 95.471853] vmxnet3_poll_rx_only+0x36/0xb0 [vmxnet3] [ 95.473185] __napi_poll+0x2b/0x160 [ 95.474145] net_rx_action+0x2c6/0x3b0 [ 95.475115] handle_softirqs+0xe7/0x2a0 [ 95.476122] __irq_exit_rcu+0x97/0xb0 [ 95.477109] common_interrupt+0x85/0xa0 [ 95.478102] </IRQ> [ 95.478846] <TASK> [ 95.479603] asm_common_interrupt+0x26/0x40 [ 95.480657] RIP: 0010:pv_native_safe_halt+0xf/0x20 [ 95.481801] Code: 22 d7 e9 54 87 01 00 0f 1f 40 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa eb 07 0f 00 2d 93 ba 3b 00 fb f4 <e9> 2c 87 01 00 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 [ 95.485563] RSP: 0018:ffffa133400ffe58 EFLAGS: 00000246 [ 95.486882] RAX: 0000000000004000 RBX: ffff8fbbc1d14064 RCX: 0000000000000000 [ 95.488477] RDX: ffff8fbeefd80000 RSI: ffff8fbbc1d14000 RDI: 0000000000000001 [ 95.490067] RBP: ffff8fbbc1d14064 R08: ffffffffa0652260 R09: 00000000000010d3 [ 95.491683] R10: 0000000000000018 R11: ffff8fbeefdb4764 R12: ffffffffa0652260 [ 95.493389] R13: ffffffffa06522e0 R14: 0000000000000001 R15: 0000000000000000 [ 95.495035] acpi_safe_halt+0x14/0x20 [ 95.496127] acpi_idle_do_entry+0x2f/0x50 [ 95.497221] acpi_idle_enter+0x7f/0xd0 [ 95.498272] cpuidle_enter_state+0x81/0x420 [ 95.499375] cpuidle_enter+0x2d/0x40 [ 95.500400] do_idle+0x1e5/0x240 [ 95.501385] cpu_startup_entry+0x29/0x30 [ 95.502422] start_secondary+0x11c/0x140 [ 95.503454] common_startup_64+0x13e/0x141 [ 95.504466] </TASK> [ 95.505197] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set nf_tables vsock_loopback vmw_vsock_virtio_transport_common qrtr vmw_vsock_vmci_transport vsock sunrpc binfmt_misc pktcdvd vmw_balloon pcspkr vmw_vmci i2c_piix4 joydev loop dm_multipath nfnetlink zram crct10dif_pclmul crc32_pclmul vmwgfx crc32c_intel polyval_clmulni polyval_generic ghash_clmulni_intel sha512_ssse3 sha256_ssse3 vmxnet3 sha1_ssse3 drm_ttm_helper vmw_pvscsi ttm ata_generic pata_acpi serio_raw scsi_dh_rdac scsi_dh_emc scsi_dh_alua ip6_tables ip_tables fuse [ 95.516536] ---[ end trace 0000000000000000 ]---
Fixes: 6f4833383e85 ("net: vmxnet3: Fix NULL pointer dereference in vmxnet3_rq_rx_complete()") Signed-off-by: Matthias Stocker mstocker@barracuda.com Reviewed-by: Subbaraya Sundeep sbhatta@marvell.com Reviewed-by: Ronak Doshi ronak.doshi@broadcom.com Link: https://lore.kernel.org/r/20240531103711.101961-1-mstocker@barracuda.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 0578864792b60..beebe09eb88ff 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -2034,8 +2034,8 @@ vmxnet3_rq_destroy_all_rxdataring(struct vmxnet3_adapter *adapter) rq->data_ring.base, rq->data_ring.basePA); rq->data_ring.base = NULL; - rq->data_ring.desc_size = 0; } + rq->data_ring.desc_size = 0; } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2fe40483ec257de2a0d819ef88e3e76c7e261319 ]
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled.
Disabling preemption in ioam6_output() is not good enough, because ioam6_output() is called from process context, lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter ioam6_output() and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of preempt_disable().
Fixes: 8cb3bf8bff3c ("ipv6: ioam: Add support for the ip6ip6 encapsulation") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Justin Iurman justin.iurman@uliege.be Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240531132636.2637995-2-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ioam6_iptunnel.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c index 7563f8c6aa87c..bf7120ecea1eb 100644 --- a/net/ipv6/ioam6_iptunnel.c +++ b/net/ipv6/ioam6_iptunnel.c @@ -351,9 +351,9 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) goto drop;
if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) { - preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&ilwt->cache); - preempt_enable(); + local_bh_enable();
if (unlikely(!dst)) { struct ipv6hdr *hdr = ipv6_hdr(skb); @@ -373,9 +373,9 @@ static int ioam6_output(struct net *net, struct sock *sk, struct sk_buff *skb) goto drop; }
- preempt_disable(); + local_bh_disable(); dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr); - preempt_enable(); + local_bh_enable(); }
skb_dst_drop(skb);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit c0b98ac1cc104f48763cdb27b1e9ac25fd81fc90 ]
As explained in commit 1378817486d6 ("tipc: block BH before using dst_cache"), net/core/dst_cache.c helpers need to be called with BH disabled.
Disabling preemption in seg6_output_core() is not good enough, because seg6_output_core() is called from process context, lwtunnel_output() only uses rcu_read_lock().
We might be interrupted by a softirq, re-enter seg6_output_core() and corrupt dst_cache data structures.
Fix the race by using local_bh_disable() instead of preempt_disable().
Apply a similar change in seg6_input_core().
Fixes: fa79581ea66c ("ipv6: sr: fix several BUGs when preemption is enabled") Fixes: 6c8702c60b88 ("ipv6: sr: add support for SRH encapsulation and injection with lwtunnels") Signed-off-by: Eric Dumazet edumazet@google.com Cc: David Lebrun dlebrun@google.com Acked-by: Paolo Abeni pabeni@redhat.com Link: https://lore.kernel.org/r/20240531132636.2637995-4-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/seg6_iptunnel.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-)
diff --git a/net/ipv6/seg6_iptunnel.c b/net/ipv6/seg6_iptunnel.c index a75df2ec8db0d..098632adc9b5a 100644 --- a/net/ipv6/seg6_iptunnel.c +++ b/net/ipv6/seg6_iptunnel.c @@ -464,23 +464,21 @@ static int seg6_input_core(struct net *net, struct sock *sk,
slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
- preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&slwt->cache); - preempt_enable();
if (!dst) { ip6_route_input(skb); dst = skb_dst(skb); if (!dst->error) { - preempt_disable(); dst_cache_set_ip6(&slwt->cache, dst, &ipv6_hdr(skb)->saddr); - preempt_enable(); } } else { skb_dst_drop(skb); skb_dst_set(skb, dst); } + local_bh_enable();
err = skb_cow_head(skb, LL_RESERVED_SPACE(dst->dev)); if (unlikely(err)) @@ -536,9 +534,9 @@ static int seg6_output_core(struct net *net, struct sock *sk,
slwt = seg6_lwt_lwtunnel(orig_dst->lwtstate);
- preempt_disable(); + local_bh_disable(); dst = dst_cache_get(&slwt->cache); - preempt_enable(); + local_bh_enable();
if (unlikely(!dst)) { struct ipv6hdr *hdr = ipv6_hdr(skb); @@ -558,9 +556,9 @@ static int seg6_output_core(struct net *net, struct sock *sk, goto drop; }
- preempt_disable(); + local_bh_disable(); dst_cache_set_ip6(&slwt->cache, dst, &fl6.saddr); - preempt_enable(); + local_bh_enable(); }
skb_dst_drop(skb);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit a535d59432370343058755100ee75ab03c0e3f91 ]
For TLS offload we mark packets with skb->decrypted to make sure they don't escape the host without getting encrypted first. The crypto state lives in the socket, so it may get detached by a call to skb_orphan(). As a safety check - the egress path drops all packets with skb->decrypted and no "crypto-safe" socket.
The skb marking was added to sendpage only (and not sendmsg), because tls_device injected data into the TCP stack using sendpage. This special case was missed when sendpage got folded into sendmsg.
Fixes: c5c37af6ecad ("tcp: Convert do_tcp_sendpages() to use MSG_SPLICE_PAGES") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240530232607.82686-1-kuba@kernel.org Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 66d77faca64f6..5c79836e4c9e7 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1159,6 +1159,9 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
process_backlog++;
+#ifdef CONFIG_SKB_DECRYPTED + skb->decrypted = !!(flags & MSG_SENDPAGE_DECRYPTED); +#endif tcp_skb_entail(sk, skb); copy = size_goal;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Olsa jolsa@kernel.org
[ Upstream commit d0d1df8ba18abc57f28fb3bc053b2bf319367f2c ]
syzbot reported crash when rawtp program executed through the test_run interface calls bpf_get_attach_cookie helper or any other helper that touches task->bpf_ctx pointer.
Setting the run context (task->bpf_ctx pointer) for test_run callback.
Fixes: 7adfc6c9b315 ("bpf: Add bpf_get_attach_cookie() BPF helper to access bpf_cookie value") Reported-by: syzbot+3ab78ff125b7979e45f9@syzkaller.appspotmail.com Signed-off-by: Jiri Olsa jolsa@kernel.org Signed-off-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Daniel Borkmann daniel@iogearbox.net Closes: https://syzkaller.appspot.com/bug?extid=3ab78ff125b7979e45f9 Link: https://lore.kernel.org/bpf/20240604150024.359247-1-jolsa@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bpf/test_run.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 61efeadaff8db..4cd29fb490f7c 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -719,10 +719,16 @@ static void __bpf_prog_test_run_raw_tp(void *data) { struct bpf_raw_tp_test_run_info *info = data; + struct bpf_trace_run_ctx run_ctx = {}; + struct bpf_run_ctx *old_run_ctx; + + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
rcu_read_lock(); info->retval = bpf_prog_run(info->prog, info->ctx); rcu_read_unlock(); + + bpf_reset_run_ctx(old_run_ctx); }
int bpf_prog_test_run_raw_tp(struct bpf_prog *prog,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Subbaraya Sundeep sbhatta@marvell.com
[ Upstream commit 8b0f7410942cdc420c4557eda02bfcdf60ccec17 ]
PF mcam entries has to be at low priority always so that VF can install longest prefix match rules at higher priority. This was taken care currently but when priority allocation wrt reference entry is requested then entries are allocated from mid-zone instead of low priority zone. Fix this and always allocate entries from low priority zone for PFs.
Fixes: 7df5b4b260dd ("octeontx2-af: Allocate low priority entries for PF") Signed-off-by: Subbaraya Sundeep sbhatta@marvell.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/af/rvu_npc.c | 33 ++++++++++++------- 1 file changed, 22 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c index e8b73b9d75e31..97722ce8c4cb3 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_npc.c @@ -2519,7 +2519,17 @@ static int npc_mcam_alloc_entries(struct npc_mcam *mcam, u16 pcifunc, * - when available free entries are less. * Lower priority ones out of avaialble free entries are always * chosen when 'high vs low' question arises. + * + * For a VF base MCAM match rule is set by its PF. And all the + * further MCAM rules installed by VF on its own are + * concatenated with the base rule set by its PF. Hence PF entries + * should be at lower priority compared to VF entries. Otherwise + * base rule is hit always and rules installed by VF will be of + * no use. Hence if the request is from PF then allocate low + * priority entries. */ + if (!(pcifunc & RVU_PFVF_FUNC_MASK)) + goto lprio_alloc;
/* Get the search range for priority allocation request */ if (req->priority) { @@ -2528,17 +2538,6 @@ static int npc_mcam_alloc_entries(struct npc_mcam *mcam, u16 pcifunc, goto alloc; }
- /* For a VF base MCAM match rule is set by its PF. And all the - * further MCAM rules installed by VF on its own are - * concatenated with the base rule set by its PF. Hence PF entries - * should be at lower priority compared to VF entries. Otherwise - * base rule is hit always and rules installed by VF will be of - * no use. Hence if the request is from PF and NOT a priority - * allocation request then allocate low priority entries. - */ - if (!(pcifunc & RVU_PFVF_FUNC_MASK)) - goto lprio_alloc; - /* Find out the search range for non-priority allocation request * * Get MCAM free entry count in middle zone. @@ -2568,6 +2567,18 @@ static int npc_mcam_alloc_entries(struct npc_mcam *mcam, u16 pcifunc, reverse = true; start = 0; end = mcam->bmap_entries; + /* Ensure PF requests are always at bottom and if PF requests + * for higher/lower priority entry wrt reference entry then + * honour that criteria and start search for entries from bottom + * and not in mid zone. + */ + if (!(pcifunc & RVU_PFVF_FUNC_MASK) && + req->priority == NPC_MCAM_HIGHER_PRIO) + end = req->ref_entry; + + if (!(pcifunc & RVU_PFVF_FUNC_MASK) && + req->priority == NPC_MCAM_LOWER_PRIO) + start = req->ref_entry; }
alloc:
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wen Gu guwen@linux.alibaba.com
[ Upstream commit fb0aa0781a5f457e3864da68af52c3b1f4f7fd8f ]
When copying smc settings to clcsock, avoid setting clcsock's sk_sndbuf to sysctl_tcp_wmem[1], since this may overwrite the value set by tcp_sndbuf_expand() in TCP connection establishment.
And the other setting sk_{snd|rcv}buf to sysctl value in smc_adjust_sock_bufsizes() can also be omitted since the initialization of smc sock and clcsock has set sk_{snd|rcv}buf to smc.sysctl_{w|r}mem or ipv4_sysctl_tcp_{w|r}mem[1].
Fixes: 30c3c4a4497c ("net/smc: Use correct buffer sizes when switching between TCP and SMC") Link: https://lore.kernel.org/r/5eaf3858-e7fd-4db8-83e8-3d7a3e0e9ae2@linux.alibaba... Signed-off-by: Wen Gu guwen@linux.alibaba.com Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Reviewed-by: Gerd Bayer gbayer@linux.ibm.com, too. Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/af_smc.c | 22 ++-------------------- 1 file changed, 2 insertions(+), 20 deletions(-)
diff --git a/net/smc/af_smc.c b/net/smc/af_smc.c index 4b52b3b159c0e..5f9f3d4c1df5f 100644 --- a/net/smc/af_smc.c +++ b/net/smc/af_smc.c @@ -460,29 +460,11 @@ static int smc_bind(struct socket *sock, struct sockaddr *uaddr, static void smc_adjust_sock_bufsizes(struct sock *nsk, struct sock *osk, unsigned long mask) { - struct net *nnet = sock_net(nsk); - nsk->sk_userlocks = osk->sk_userlocks; - if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) { + if (osk->sk_userlocks & SOCK_SNDBUF_LOCK) nsk->sk_sndbuf = osk->sk_sndbuf; - } else { - if (mask == SK_FLAGS_SMC_TO_CLC) - WRITE_ONCE(nsk->sk_sndbuf, - READ_ONCE(nnet->ipv4.sysctl_tcp_wmem[1])); - else - WRITE_ONCE(nsk->sk_sndbuf, - 2 * READ_ONCE(nnet->smc.sysctl_wmem)); - } - if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) { + if (osk->sk_userlocks & SOCK_RCVBUF_LOCK) nsk->sk_rcvbuf = osk->sk_rcvbuf; - } else { - if (mask == SK_FLAGS_SMC_TO_CLC) - WRITE_ONCE(nsk->sk_rcvbuf, - READ_ONCE(nnet->ipv4.sysctl_tcp_rmem[1])); - else - WRITE_ONCE(nsk->sk_rcvbuf, - 2 * READ_ONCE(nnet->smc.sysctl_rmem)); - } }
static void smc_copy_sock_settings(struct sock *nsk, struct sock *osk,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tristram Ha tristram.ha@microchip.com
[ Upstream commit 0a8d3f2e3e8d8aea8af017e14227b91d5989b696 ]
KSZ8061 needs to write to a MMD register at driver initialization to fix an errata. This worked in 5.0 kernel but not in newer kernels. The issue is the main phylib code no longer resets PHY at the very beginning. Calling phy resuming code later will reset the chip if it is already powered down at the beginning. This wipes out the MMD register write. Solution is to implement a phy resume function for KSZ8061 to take care of this problem.
Fixes: 232ba3a51cc2 ("net: phy: Micrel KSZ8061: link failure after cable connect") Signed-off-by: Tristram Ha tristram.ha@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/micrel.c | 42 +++++++++++++++++++++++++++++++++++++++- 1 file changed, 41 insertions(+), 1 deletion(-)
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index c2d99344ade41..4b22bb6393e26 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -785,6 +785,17 @@ static int ksz8061_config_init(struct phy_device *phydev) { int ret;
+ /* Chip can be powered down by the bootstrap code. */ + ret = phy_read(phydev, MII_BMCR); + if (ret < 0) + return ret; + if (ret & BMCR_PDOWN) { + ret = phy_write(phydev, MII_BMCR, ret & ~BMCR_PDOWN); + if (ret < 0) + return ret; + usleep_range(1000, 2000); + } + ret = phy_write_mmd(phydev, MDIO_MMD_PMAPMD, MDIO_DEVID1, 0xB61A); if (ret) return ret; @@ -2054,6 +2065,35 @@ static int ksz9477_resume(struct phy_device *phydev) return 0; }
+static int ksz8061_resume(struct phy_device *phydev) +{ + int ret; + + /* This function can be called twice when the Ethernet device is on. */ + ret = phy_read(phydev, MII_BMCR); + if (ret < 0) + return ret; + if (!(ret & BMCR_PDOWN)) + return 0; + + genphy_resume(phydev); + usleep_range(1000, 2000); + + /* Re-program the value after chip is reset. */ + ret = phy_write_mmd(phydev, MDIO_MMD_PMAPMD, MDIO_DEVID1, 0xB61A); + if (ret) + return ret; + + /* Enable PHY Interrupts */ + if (phy_interrupt_is_valid(phydev)) { + phydev->interrupts = PHY_INTERRUPT_ENABLED; + if (phydev->drv->config_intr) + phydev->drv->config_intr(phydev); + } + + return 0; +} + static int kszphy_probe(struct phy_device *phydev) { const struct kszphy_type *type = phydev->drv->driver_data; @@ -4876,7 +4916,7 @@ static struct phy_driver ksphy_driver[] = { .config_intr = kszphy_config_intr, .handle_interrupt = kszphy_handle_interrupt, .suspend = kszphy_suspend, - .resume = kszphy_resume, + .resume = ksz8061_resume, }, { .phy_id = PHY_ID_KSZ9021, .phy_id_mask = 0x000ffffe,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 491aee894a08bc9b8bb52e7363b9d4bc6403f363 ]
In the XDP_TX path, ionic driver sends a packet to the TX path with rx page and corresponding dma address. After tx is done, ionic_tx_clean() frees that page. But RX ring buffer isn't reset to NULL. So, it uses a freed page, which causes kernel panic.
BUG: unable to handle page fault for address: ffff8881576c110c PGD 773801067 P4D 773801067 PUD 87f086067 PMD 87efca067 PTE 800ffffea893e060 Oops: Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN NOPTI CPU: 1 PID: 25 Comm: ksoftirqd/1 Not tainted 6.9.0+ #11 Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021 RIP: 0010:bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f Code: 00 53 41 55 41 56 41 57 b8 01 00 00 00 48 8b 5f 08 4c 8b 77 00 4c 89 f7 48 83 c7 0e 48 39 d8 RSP: 0018:ffff888104e6fa28 EFLAGS: 00010283 RAX: 0000000000000002 RBX: ffff8881576c1140 RCX: 0000000000000002 RDX: ffffffffc0051f64 RSI: ffffc90002d33048 RDI: ffff8881576c110e RBP: ffff888104e6fa88 R08: 0000000000000000 R09: ffffed1027a04a23 R10: 0000000000000000 R11: 0000000000000000 R12: ffff8881b03a21a8 R13: ffff8881589f800f R14: ffff8881576c1100 R15: 00000001576c1100 FS: 0000000000000000(0000) GS:ffff88881ae00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8881576c110c CR3: 0000000767a90000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x20/0x70 ? page_fault_oops+0x254/0x790 ? __pfx_page_fault_oops+0x10/0x10 ? __pfx_is_prefetch.constprop.0+0x10/0x10 ? search_bpf_extables+0x165/0x260 ? fixup_exception+0x4a/0x970 ? exc_page_fault+0xcb/0xe0 ? asm_exc_page_fault+0x22/0x30 ? 0xffffffffc0051f64 ? bpf_prog_f0b8caeac1068a55_balancer_ingress+0x3b/0x44f ? do_raw_spin_unlock+0x54/0x220 ionic_rx_service+0x11ab/0x3010 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? ionic_tx_clean+0x29b/0xc60 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? __pfx_ionic_tx_clean+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? ionic_tx_cq_service+0x25d/0xa00 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ? __pfx_ionic_rx_service+0x10/0x10 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ionic_cq_service+0x69/0x150 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] ionic_txrx_napi+0x11a/0x540 [ionic 9180c3001ab627d82bbc5f3ebe8a0decaf6bb864] __napi_poll.constprop.0+0xa0/0x440 net_rx_action+0x7e7/0xc30 ? __pfx_net_rx_action+0x10/0x10
Fixes: 8eeed8373e1c ("ionic: Add XDP_TX support") Signed-off-by: Taehee Yoo ap420073@gmail.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Reviewed-by: Brett Creeley brett.creeley@amd.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/pensando/ionic/ionic_txrx.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c index 5dba6d2d633cb..2427610f4306d 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_txrx.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_txrx.c @@ -586,6 +586,7 @@ static bool ionic_run_xdp(struct ionic_rx_stats *stats, netdev_dbg(netdev, "tx ionic_xdp_post_frame err %d\n", err); goto out_xdp_abort; } + buf_info->page = NULL; stats->xdp_tx++;
/* the Tx completion will free the buffers */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangyu Hua hbh25y@gmail.com
[ Upstream commit affc18fdc694190ca7575b9a86632a73b9fe043d ]
q->bands will be assigned to qopt->bands to execute subsequent code logic after kmalloc. So the old q->bands should not be used in kmalloc. Otherwise, an out-of-bounds write will occur.
Fixes: c2999f7fb05b ("net: sched: multiq: don't call qdisc_put() while holding tree lock") Signed-off-by: Hangyu Hua hbh25y@gmail.com Acked-by: Cong Wang cong.wang@bytedance.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_multiq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/sched/sch_multiq.c b/net/sched/sch_multiq.c index 79e93a19d5fab..06e03f5cd7ce1 100644 --- a/net/sched/sch_multiq.c +++ b/net/sched/sch_multiq.c @@ -185,7 +185,7 @@ static int multiq_tune(struct Qdisc *sch, struct nlattr *opt,
qopt->bands = qdisc_dev(sch)->real_num_tx_queues;
- removed = kmalloc(sizeof(*removed) * (q->max_bands - q->bands), + removed = kmalloc(sizeof(*removed) * (q->max_bands - qopt->bands), GFP_KERNEL); if (!removed) return -ENOMEM;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Xing kernelxing@tencent.com
[ Upstream commit a46d0ea5c94205f40ecf912d1bb7806a8a64704f ]
According to RFC 1213, we should also take CLOSE-WAIT sockets into consideration:
"tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT."
After this, CurrEstab counter will display the total number of ESTABLISHED and CLOSE-WAIT sockets.
The logic of counting When we increment the counter? a) if we change the state to ESTABLISHED. b) if we change the state from SYN-RECEIVED to CLOSE-WAIT.
When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK.
Please note: there are two chances that old state of socket can be changed to CLOSE-WAIT in tcp_fin(). One is SYN-RECV, the other is ESTABLISHED. So we have to take care of the former case.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jason Xing kernelxing@tencent.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 5c79836e4c9e7..77ee1eda3fd86 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -2640,6 +2640,10 @@ void tcp_set_state(struct sock *sk, int state) if (oldstate != TCP_ESTABLISHED) TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRESTAB); break; + case TCP_CLOSE_WAIT: + if (oldstate == TCP_SYN_RECV) + TCP_INC_STATS(sock_net(sk), TCP_MIB_CURRESTAB); + break;
case TCP_CLOSE: if (oldstate == TCP_CLOSE_WAIT || oldstate == TCP_ESTABLISHED) @@ -2651,7 +2655,7 @@ void tcp_set_state(struct sock *sk, int state) inet_put_port(sk); fallthrough; default: - if (oldstate == TCP_ESTABLISHED) + if (oldstate == TCP_ESTABLISHED || oldstate == TCP_CLOSE_WAIT) TCP_DEC_STATS(sock_net(sk), TCP_MIB_CURRESTAB); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Xing kernelxing@tencent.com
[ Upstream commit 9633e9377e6af0244f7381e86b9aac5276f5be97 ]
Like previous patch does in TCP, we need to adhere to RFC 1213:
"tcpCurrEstab OBJECT-TYPE ... The number of TCP connections for which the current state is either ESTABLISHED or CLOSE- WAIT."
So let's consider CLOSE-WAIT sockets.
The logic of counting When we increment the counter? a) Only if we change the state to ESTABLISHED.
When we decrement the counter? a) if the socket leaves ESTABLISHED and will never go into CLOSE-WAIT, say, on the client side, changing from ESTABLISHED to FIN-WAIT-1. b) if the socket leaves CLOSE-WAIT, say, on the server side, changing from CLOSE-WAIT to LAST-ACK.
Fixes: d9cd27b8cd19 ("mptcp: add CurrEstab MIB counter support") Signed-off-by: Jason Xing kernelxing@tencent.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/protocol.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c index 965eb69dc5de3..327dcf06edd47 100644 --- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -2907,9 +2907,14 @@ void mptcp_set_state(struct sock *sk, int state) if (oldstate != TCP_ESTABLISHED) MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_CURRESTAB); break; - + case TCP_CLOSE_WAIT: + /* Unlike TCP, MPTCP sk would not have the TCP_SYN_RECV state: + * MPTCP "accepted" sockets will be created later on. So no + * transition from TCP_SYN_RECV to TCP_CLOSE_WAIT. + */ + break; default: - if (oldstate == TCP_ESTABLISHED) + if (oldstate == TCP_ESTABLISHED || oldstate == TCP_CLOSE_WAIT) MPTCP_DEC_STATS(sock_net(sk), MPTCP_MIB_CURRESTAB); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 5b4b62a169e10401cca34a6e7ac39161986f5605 ]
Jaroslav reports Dell's OMSA Systems Management Data Engine expects NLM_DONE in a separate recvmsg(), both for rtnl_dump_ifinfo() and inet_dump_ifaddr(). We already added a similar fix previously in commit 460b0d33cf10 ("inet: bring NLM_DONE out to a separate recv() again")
Instead of modifying all the dump handlers, and making them look different than modern for_each_netdev_dump()-based dump handlers - put the workaround in rtnetlink code. This will also help us move the custom rtnl-locking from af_netlink in the future (in net-next).
Note that this change is not touching rtnl_dump_all(). rtnl_dump_all() is different kettle of fish and a potential problem. We now mix families in a single recvmsg(), but NLM_DONE is not coalesced.
Tested:
./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_addr.yaml \ --dump getaddr --json '{"ifa-family": 2}'
./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_route.yaml \ --dump getroute --json '{"rtm-family": 2}'
./cli.py --dbg-small-recv 4096 --spec netlink/specs/rt_link.yaml \ --dump getlink
Fixes: 3e41af90767d ("rtnetlink: use xarray iterator to implement rtnl_dump_ifinfo()") Fixes: cdb2f80f1c10 ("inet: use xa_array iterator to implement inet_dump_ifaddr()") Reported-by: Jaroslav Pulchart jaroslav.pulchart@gooddata.com Link: https://lore.kernel.org/all/CAK8fFZ7MKoFSEzMBDAOjoUt+vTZRRQgLDNXEOfdCCXSoXXK... Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/rtnetlink.h | 1 + net/core/rtnetlink.c | 44 +++++++++++++++++++++++++++++++++++++++-- net/ipv4/devinet.c | 2 +- net/ipv4/fib_frontend.c | 7 +------ 4 files changed, 45 insertions(+), 9 deletions(-)
diff --git a/include/net/rtnetlink.h b/include/net/rtnetlink.h index 3bfb80bad1739..b45d57b5968af 100644 --- a/include/net/rtnetlink.h +++ b/include/net/rtnetlink.h @@ -13,6 +13,7 @@ enum rtnl_link_flags { RTNL_FLAG_DOIT_UNLOCKED = BIT(0), RTNL_FLAG_BULK_DEL_SUPPORTED = BIT(1), RTNL_FLAG_DUMP_UNLOCKED = BIT(2), + RTNL_FLAG_DUMP_SPLIT_NLM_DONE = BIT(3), /* legacy behavior */ };
enum rtnl_kinds { diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index 8ba6a4e4be266..74e6f9746fb30 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -6484,6 +6484,46 @@ static int rtnl_mdb_del(struct sk_buff *skb, struct nlmsghdr *nlh,
/* Process one rtnetlink message. */
+static int rtnl_dumpit(struct sk_buff *skb, struct netlink_callback *cb) +{ + rtnl_dumpit_func dumpit = cb->data; + int err; + + /* Previous iteration have already finished, avoid calling->dumpit() + * again, it may not expect to be called after it reached the end. + */ + if (!dumpit) + return 0; + + err = dumpit(skb, cb); + + /* Old dump handlers used to send NLM_DONE as in a separate recvmsg(). + * Some applications which parse netlink manually depend on this. + */ + if (cb->flags & RTNL_FLAG_DUMP_SPLIT_NLM_DONE) { + if (err < 0 && err != -EMSGSIZE) + return err; + if (!err) + cb->data = NULL; + + return skb->len; + } + return err; +} + +static int rtnetlink_dump_start(struct sock *ssk, struct sk_buff *skb, + const struct nlmsghdr *nlh, + struct netlink_dump_control *control) +{ + if (control->flags & RTNL_FLAG_DUMP_SPLIT_NLM_DONE) { + WARN_ON(control->data); + control->data = control->dump; + control->dump = rtnl_dumpit; + } + + return netlink_dump_start(ssk, skb, nlh, control); +} + static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, struct netlink_ext_ack *extack) { @@ -6548,7 +6588,7 @@ static int rtnetlink_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh, .module = owner, .flags = flags, }; - err = netlink_dump_start(rtnl, skb, nlh, &c); + err = rtnetlink_dump_start(rtnl, skb, nlh, &c); /* netlink_dump_start() will keep a reference on * module if dump is still in progress. */ @@ -6694,7 +6734,7 @@ void __init rtnetlink_init(void) register_netdevice_notifier(&rtnetlink_dev_notifier);
rtnl_register(PF_UNSPEC, RTM_GETLINK, rtnl_getlink, - rtnl_dump_ifinfo, 0); + rtnl_dump_ifinfo, RTNL_FLAG_DUMP_SPLIT_NLM_DONE); rtnl_register(PF_UNSPEC, RTM_SETLINK, rtnl_setlink, NULL, 0); rtnl_register(PF_UNSPEC, RTM_NEWLINK, rtnl_newlink, NULL, 0); rtnl_register(PF_UNSPEC, RTM_DELLINK, rtnl_dellink, NULL, 0); diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 8382cc998bff8..84b5d1ccf716a 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -2801,7 +2801,7 @@ void __init devinet_init(void) rtnl_register(PF_INET, RTM_NEWADDR, inet_rtm_newaddr, NULL, 0); rtnl_register(PF_INET, RTM_DELADDR, inet_rtm_deladdr, NULL, 0); rtnl_register(PF_INET, RTM_GETADDR, NULL, inet_dump_ifaddr, - RTNL_FLAG_DUMP_UNLOCKED); + RTNL_FLAG_DUMP_UNLOCKED | RTNL_FLAG_DUMP_SPLIT_NLM_DONE); rtnl_register(PF_INET, RTM_GETNETCONF, inet_netconf_get_devconf, inet_netconf_dump_devconf, RTNL_FLAG_DOIT_UNLOCKED | RTNL_FLAG_DUMP_UNLOCKED); diff --git a/net/ipv4/fib_frontend.c b/net/ipv4/fib_frontend.c index c484b1c0fc00a..7ad2cafb92763 100644 --- a/net/ipv4/fib_frontend.c +++ b/net/ipv4/fib_frontend.c @@ -1050,11 +1050,6 @@ static int inet_dump_fib(struct sk_buff *skb, struct netlink_callback *cb) e++; } } - - /* Don't let NLM_DONE coalesce into a message, even if it could. - * Some user space expects NLM_DONE in a separate recv(). - */ - err = skb->len; out:
cb->args[1] = e; @@ -1665,5 +1660,5 @@ void __init ip_fib_init(void) rtnl_register(PF_INET, RTM_NEWROUTE, inet_rtm_newroute, NULL, 0); rtnl_register(PF_INET, RTM_DELROUTE, inet_rtm_delroute, NULL, 0); rtnl_register(PF_INET, RTM_GETROUTE, NULL, inet_dump_fib, - RTNL_FLAG_DUMP_UNLOCKED); + RTNL_FLAG_DUMP_UNLOCKED | RTNL_FLAG_DUMP_SPLIT_NLM_DONE); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Wunderlich frank-w@public-files.de
[ Upstream commit c57e558194430d10d5e5f4acd8a8655b68dade13 ]
The mainline MTK ethernet driver suffers long time from rarly but annoying tx queue timeouts. We think that this is caused by fixed dma sizes hardcoded for all SoCs.
We suspect this problem arises from a low level of free TX DMADs, the TX Ring alomost full.
The transmit timeout is caused by the Tx queue not waking up. The Tx queue stops when the free counter is less than ring->thres, and it will wake up once the free counter is greater than ring->thres. If the CPU is too late to wake up the Tx queues, it may cause a transmit timeout. Therefore, we increased the TX and RX DMADs to improve this error situation.
Use the dma-size implementation from SDK in a per SoC manner. In difference to SDK we have no RSS feature yet, so all RX/TX sizes should be raised from 512 to 2048 byte except fqdma on mt7988 to avoid the tx timeout issue.
Fixes: 656e705243fd ("net-next: mediatek: add support for MT7623 ethernet") Suggested-by: Daniel Golle daniel@makrotopia.org Signed-off-by: Frank Wunderlich frank-w@public-files.de Reviewed-by: Jacob Keller jacob.e.keller@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mediatek/mtk_eth_soc.c | 104 +++++++++++++------- drivers/net/ethernet/mediatek/mtk_eth_soc.h | 9 +- 2 files changed, 77 insertions(+), 36 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.c b/drivers/net/ethernet/mediatek/mtk_eth_soc.c index d7d73295f0dc4..41d9b0684be74 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.c +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.c @@ -1131,9 +1131,9 @@ static int mtk_init_fq_dma(struct mtk_eth *eth) { const struct mtk_soc_data *soc = eth->soc; dma_addr_t phy_ring_tail; - int cnt = MTK_QDMA_RING_SIZE; + int cnt = soc->tx.fq_dma_size; dma_addr_t dma_addr; - int i; + int i, j, len;
if (MTK_HAS_CAPS(eth->soc->caps, MTK_SRAM)) eth->scratch_ring = eth->sram_base; @@ -1142,40 +1142,46 @@ static int mtk_init_fq_dma(struct mtk_eth *eth) cnt * soc->tx.desc_size, ð->phy_scratch_ring, GFP_KERNEL); + if (unlikely(!eth->scratch_ring)) return -ENOMEM;
- eth->scratch_head = kcalloc(cnt, MTK_QDMA_PAGE_SIZE, GFP_KERNEL); - if (unlikely(!eth->scratch_head)) - return -ENOMEM; + phy_ring_tail = eth->phy_scratch_ring + soc->tx.desc_size * (cnt - 1);
- dma_addr = dma_map_single(eth->dma_dev, - eth->scratch_head, cnt * MTK_QDMA_PAGE_SIZE, - DMA_FROM_DEVICE); - if (unlikely(dma_mapping_error(eth->dma_dev, dma_addr))) - return -ENOMEM; + for (j = 0; j < DIV_ROUND_UP(soc->tx.fq_dma_size, MTK_FQ_DMA_LENGTH); j++) { + len = min_t(int, cnt - j * MTK_FQ_DMA_LENGTH, MTK_FQ_DMA_LENGTH); + eth->scratch_head[j] = kcalloc(len, MTK_QDMA_PAGE_SIZE, GFP_KERNEL);
- phy_ring_tail = eth->phy_scratch_ring + soc->tx.desc_size * (cnt - 1); + if (unlikely(!eth->scratch_head[j])) + return -ENOMEM;
- for (i = 0; i < cnt; i++) { - dma_addr_t addr = dma_addr + i * MTK_QDMA_PAGE_SIZE; - struct mtk_tx_dma_v2 *txd; + dma_addr = dma_map_single(eth->dma_dev, + eth->scratch_head[j], len * MTK_QDMA_PAGE_SIZE, + DMA_FROM_DEVICE);
- txd = eth->scratch_ring + i * soc->tx.desc_size; - txd->txd1 = addr; - if (i < cnt - 1) - txd->txd2 = eth->phy_scratch_ring + - (i + 1) * soc->tx.desc_size; + if (unlikely(dma_mapping_error(eth->dma_dev, dma_addr))) + return -ENOMEM;
- txd->txd3 = TX_DMA_PLEN0(MTK_QDMA_PAGE_SIZE); - if (MTK_HAS_CAPS(soc->caps, MTK_36BIT_DMA)) - txd->txd3 |= TX_DMA_PREP_ADDR64(addr); - txd->txd4 = 0; - if (mtk_is_netsys_v2_or_greater(eth)) { - txd->txd5 = 0; - txd->txd6 = 0; - txd->txd7 = 0; - txd->txd8 = 0; + for (i = 0; i < cnt; i++) { + struct mtk_tx_dma_v2 *txd; + + txd = eth->scratch_ring + (j * MTK_FQ_DMA_LENGTH + i) * soc->tx.desc_size; + txd->txd1 = dma_addr + i * MTK_QDMA_PAGE_SIZE; + if (j * MTK_FQ_DMA_LENGTH + i < cnt) + txd->txd2 = eth->phy_scratch_ring + + (j * MTK_FQ_DMA_LENGTH + i + 1) * soc->tx.desc_size; + + txd->txd3 = TX_DMA_PLEN0(MTK_QDMA_PAGE_SIZE); + if (MTK_HAS_CAPS(soc->caps, MTK_36BIT_DMA)) + txd->txd3 |= TX_DMA_PREP_ADDR64(dma_addr + i * MTK_QDMA_PAGE_SIZE); + + txd->txd4 = 0; + if (mtk_is_netsys_v2_or_greater(eth)) { + txd->txd5 = 0; + txd->txd6 = 0; + txd->txd7 = 0; + txd->txd8 = 0; + } } }
@@ -2457,7 +2463,7 @@ static int mtk_tx_alloc(struct mtk_eth *eth) if (MTK_HAS_CAPS(soc->caps, MTK_QDMA)) ring_size = MTK_QDMA_RING_SIZE; else - ring_size = MTK_DMA_SIZE; + ring_size = soc->tx.dma_size;
ring->buf = kcalloc(ring_size, sizeof(*ring->buf), GFP_KERNEL); @@ -2465,8 +2471,8 @@ static int mtk_tx_alloc(struct mtk_eth *eth) goto no_tx_mem;
if (MTK_HAS_CAPS(soc->caps, MTK_SRAM)) { - ring->dma = eth->sram_base + ring_size * sz; - ring->phys = eth->phy_scratch_ring + ring_size * (dma_addr_t)sz; + ring->dma = eth->sram_base + soc->tx.fq_dma_size * sz; + ring->phys = eth->phy_scratch_ring + soc->tx.fq_dma_size * (dma_addr_t)sz; } else { ring->dma = dma_alloc_coherent(eth->dma_dev, ring_size * sz, &ring->phys, GFP_KERNEL); @@ -2588,6 +2594,7 @@ static void mtk_tx_clean(struct mtk_eth *eth) static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag) { const struct mtk_reg_map *reg_map = eth->soc->reg_map; + const struct mtk_soc_data *soc = eth->soc; struct mtk_rx_ring *ring; int rx_data_len, rx_dma_size, tx_ring_size; int i; @@ -2595,7 +2602,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag) if (MTK_HAS_CAPS(eth->soc->caps, MTK_QDMA)) tx_ring_size = MTK_QDMA_RING_SIZE; else - tx_ring_size = MTK_DMA_SIZE; + tx_ring_size = soc->tx.dma_size;
if (rx_flag == MTK_RX_FLAGS_QDMA) { if (ring_no) @@ -2610,7 +2617,7 @@ static int mtk_rx_alloc(struct mtk_eth *eth, int ring_no, int rx_flag) rx_dma_size = MTK_HW_LRO_DMA_SIZE; } else { rx_data_len = ETH_DATA_LEN; - rx_dma_size = MTK_DMA_SIZE; + rx_dma_size = soc->rx.dma_size; }
ring->frag_size = mtk_max_frag_size(rx_data_len); @@ -3139,7 +3146,10 @@ static void mtk_dma_free(struct mtk_eth *eth) mtk_rx_clean(eth, ð->rx_ring[i], false); }
- kfree(eth->scratch_head); + for (i = 0; i < DIV_ROUND_UP(soc->tx.fq_dma_size, MTK_FQ_DMA_LENGTH); i++) { + kfree(eth->scratch_head[i]); + eth->scratch_head[i] = NULL; + } }
static bool mtk_hw_reset_check(struct mtk_eth *eth) @@ -5043,11 +5053,14 @@ static const struct mtk_soc_data mt2701_data = { .desc_size = sizeof(struct mtk_tx_dma), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), .irq_done_mask = MTK_RX_DONE_INT, .dma_l4_valid = RX_DMA_L4_VALID, + .dma_size = MTK_DMA_SIZE(2K), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, }, @@ -5067,11 +5080,14 @@ static const struct mtk_soc_data mt7621_data = { .desc_size = sizeof(struct mtk_tx_dma), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), .irq_done_mask = MTK_RX_DONE_INT, .dma_l4_valid = RX_DMA_L4_VALID, + .dma_size = MTK_DMA_SIZE(2K), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, }, @@ -5093,11 +5109,14 @@ static const struct mtk_soc_data mt7622_data = { .desc_size = sizeof(struct mtk_tx_dma), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), .irq_done_mask = MTK_RX_DONE_INT, .dma_l4_valid = RX_DMA_L4_VALID, + .dma_size = MTK_DMA_SIZE(2K), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, }, @@ -5118,11 +5137,14 @@ static const struct mtk_soc_data mt7623_data = { .desc_size = sizeof(struct mtk_tx_dma), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), .irq_done_mask = MTK_RX_DONE_INT, .dma_l4_valid = RX_DMA_L4_VALID, + .dma_size = MTK_DMA_SIZE(2K), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, }, @@ -5141,11 +5163,14 @@ static const struct mtk_soc_data mt7629_data = { .desc_size = sizeof(struct mtk_tx_dma), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), .irq_done_mask = MTK_RX_DONE_INT, .dma_l4_valid = RX_DMA_L4_VALID, + .dma_size = MTK_DMA_SIZE(2K), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, }, @@ -5167,6 +5192,8 @@ static const struct mtk_soc_data mt7981_data = { .desc_size = sizeof(struct mtk_tx_dma_v2), .dma_max_len = MTK_TX_DMA_BUF_LEN_V2, .dma_len_offset = 8, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), @@ -5174,6 +5201,7 @@ static const struct mtk_soc_data mt7981_data = { .dma_l4_valid = RX_DMA_L4_VALID_V2, .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), }, };
@@ -5193,6 +5221,8 @@ static const struct mtk_soc_data mt7986_data = { .desc_size = sizeof(struct mtk_tx_dma_v2), .dma_max_len = MTK_TX_DMA_BUF_LEN_V2, .dma_len_offset = 8, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), @@ -5200,6 +5230,7 @@ static const struct mtk_soc_data mt7986_data = { .dma_l4_valid = RX_DMA_L4_VALID_V2, .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), }, };
@@ -5219,6 +5250,8 @@ static const struct mtk_soc_data mt7988_data = { .desc_size = sizeof(struct mtk_tx_dma_v2), .dma_max_len = MTK_TX_DMA_BUF_LEN_V2, .dma_len_offset = 8, + .dma_size = MTK_DMA_SIZE(2K), + .fq_dma_size = MTK_DMA_SIZE(4K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma_v2), @@ -5226,6 +5259,7 @@ static const struct mtk_soc_data mt7988_data = { .dma_l4_valid = RX_DMA_L4_VALID_V2, .dma_max_len = MTK_TX_DMA_BUF_LEN_V2, .dma_len_offset = 8, + .dma_size = MTK_DMA_SIZE(2K), }, };
@@ -5240,6 +5274,7 @@ static const struct mtk_soc_data rt5350_data = { .desc_size = sizeof(struct mtk_tx_dma), .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), }, .rx = { .desc_size = sizeof(struct mtk_rx_dma), @@ -5247,6 +5282,7 @@ static const struct mtk_soc_data rt5350_data = { .dma_l4_valid = RX_DMA_L4_VALID_PDMA, .dma_max_len = MTK_TX_DMA_BUF_LEN, .dma_len_offset = 16, + .dma_size = MTK_DMA_SIZE(2K), }, };
diff --git a/drivers/net/ethernet/mediatek/mtk_eth_soc.h b/drivers/net/ethernet/mediatek/mtk_eth_soc.h index 39b50de1decbf..a25c33b9a4f34 100644 --- a/drivers/net/ethernet/mediatek/mtk_eth_soc.h +++ b/drivers/net/ethernet/mediatek/mtk_eth_soc.h @@ -32,7 +32,9 @@ #define MTK_TX_DMA_BUF_LEN 0x3fff #define MTK_TX_DMA_BUF_LEN_V2 0xffff #define MTK_QDMA_RING_SIZE 2048 -#define MTK_DMA_SIZE 512 +#define MTK_DMA_SIZE(x) (SZ_##x) +#define MTK_FQ_DMA_HEAD 32 +#define MTK_FQ_DMA_LENGTH 2048 #define MTK_RX_ETH_HLEN (ETH_HLEN + ETH_FCS_LEN) #define MTK_RX_HLEN (NET_SKB_PAD + MTK_RX_ETH_HLEN + NET_IP_ALIGN) #define MTK_DMA_DUMMY_DESC 0xffffffff @@ -1176,6 +1178,8 @@ struct mtk_soc_data { u32 desc_size; u32 dma_max_len; u32 dma_len_offset; + u32 dma_size; + u32 fq_dma_size; } tx; struct { u32 desc_size; @@ -1183,6 +1187,7 @@ struct mtk_soc_data { u32 dma_l4_valid; u32 dma_max_len; u32 dma_len_offset; + u32 dma_size; } rx; };
@@ -1264,7 +1269,7 @@ struct mtk_eth { struct napi_struct rx_napi; void *scratch_ring; dma_addr_t phy_scratch_ring; - void *scratch_head; + void *scratch_head[MTK_FQ_DMA_HEAD]; struct clk *clks[MTK_CLK_MAX];
struct mii_bus *mii_bus;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit 33afbfcc105a572159750f2ebee834a8a70fdd96 ]
In case pci channel becomes offline the driver should not wait for PCI reads during health dump and recovery flow. The driver has timeout for each of these loops trying to read PCI, so it would fail anyway. However, in case of recovery waiting till timeout may cause the pci error_detected() callback fail to meet pci_dpc_recovered() wait timeout.
Fixes: b3bd076f7501 ("net/mlx5: Report devlink health on FW fatal issues") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Shay Drori shayd@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/fw.c | 4 ++++ drivers/net/ethernet/mellanox/mlx5/core/health.c | 8 ++++++++ drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c | 4 ++++ 3 files changed, 16 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fw.c b/drivers/net/ethernet/mellanox/mlx5/core/fw.c index e7faf7e73ca48..6c7f2471fe629 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fw.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fw.c @@ -373,6 +373,10 @@ int mlx5_cmd_fast_teardown_hca(struct mlx5_core_dev *dev) do { if (mlx5_get_nic_state(dev) == MLX5_INITIAL_SEG_NIC_INTERFACE_DISABLED) break; + if (pci_channel_offline(dev->pdev)) { + mlx5_core_err(dev, "PCI channel offline, stop waiting for NIC IFC\n"); + return -EACCES; + }
cond_resched(); } while (!time_after(jiffies, end)); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index ad38e31822df1..a6329ca2d9bff 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -248,6 +248,10 @@ void mlx5_error_sw_reset(struct mlx5_core_dev *dev) do { if (mlx5_get_nic_state(dev) == MLX5_INITIAL_SEG_NIC_INTERFACE_DISABLED) break; + if (pci_channel_offline(dev->pdev)) { + mlx5_core_err(dev, "PCI channel offline, stop waiting for NIC IFC\n"); + goto unlock; + }
msleep(20); } while (!time_after(jiffies, end)); @@ -317,6 +321,10 @@ int mlx5_health_wait_pci_up(struct mlx5_core_dev *dev) mlx5_core_warn(dev, "device is being removed, stop waiting for PCI\n"); return -ENODEV; } + if (pci_channel_offline(dev->pdev)) { + mlx5_core_err(dev, "PCI channel offline, stop waiting for PCI\n"); + return -EACCES; + } msleep(100); } return 0; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c index 6b774e0c27665..d0b595ba61101 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/pci_vsc.c @@ -74,6 +74,10 @@ int mlx5_vsc_gw_lock(struct mlx5_core_dev *dev) ret = -EBUSY; goto pci_unlock; } + if (pci_channel_offline(dev->pdev)) { + ret = -EACCES; + goto pci_unlock; + }
/* Check if semaphore is already locked */ ret = vsc_read(dev, VSC_SEMAPHORE_OFFSET, &lock_val);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shay Drory shayd@nvidia.com
[ Upstream commit c8b3f38d2dae0397944814d691a419c451f9906f ]
Currently, if teardown_hca fails to execute during driver removal, mlx5 does not stop the health timer. Afterwards, mlx5 continue with driver teardown. This may lead to a UAF bug, which results in page fault Oops[1], since the health timer invokes after resources were freed.
Hence, stop the health monitor even if teardown_hca fails.
[1] mlx5_core 0000:18:00.0: E-Switch: Unload vfs: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: Disable: mode(LEGACY), nvfs(0), necvfs(0), active vports(0) mlx5_core 0000:18:00.0: E-Switch: cleanup mlx5_core 0000:18:00.0: wait_func:1155:(pid 1967079): TEARDOWN_HCA(0x103) timeout. Will cause a leak of a command resource mlx5_core 0000:18:00.0: mlx5_function_close:1288:(pid 1967079): tear_down_hca failed, skip cleanup BUG: unable to handle page fault for address: ffffa26487064230 PGD 100c00067 P4D 100c00067 PUD 100e5a067 PMD 105ed7067 PTE 0 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Tainted: G OE ------- --- 6.7.0-68.fc38.x86_64 #1 Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0013.121520200651 12/15/2020 RIP: 0010:ioread32be+0x34/0x60 RSP: 0018:ffffa26480003e58 EFLAGS: 00010292 RAX: ffffa26487064200 RBX: ffff9042d08161a0 RCX: ffff904c108222c0 RDX: 000000010bbf1b80 RSI: ffffffffc055ddb0 RDI: ffffa26487064230 RBP: ffff9042d08161a0 R08: 0000000000000022 R09: ffff904c108222e8 R10: 0000000000000004 R11: 0000000000000441 R12: ffffffffc055ddb0 R13: ffffa26487064200 R14: ffffa26480003f00 R15: ffff904c108222c0 FS: 0000000000000000(0000) GS:ffff904c10800000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffa26487064230 CR3: 00000002c4420006 CR4: 00000000007706f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <IRQ> ? __die+0x23/0x70 ? page_fault_oops+0x171/0x4e0 ? exc_page_fault+0x175/0x180 ? asm_exc_page_fault+0x26/0x30 ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] ? ioread32be+0x34/0x60 mlx5_health_check_fatal_sensors+0x20/0x100 [mlx5_core] ? __pfx_poll_health+0x10/0x10 [mlx5_core] poll_health+0x42/0x230 [mlx5_core] ? __next_timer_interrupt+0xbc/0x110 ? __pfx_poll_health+0x10/0x10 [mlx5_core] call_timer_fn+0x21/0x130 ? __pfx_poll_health+0x10/0x10 [mlx5_core] __run_timers+0x222/0x2c0 run_timer_softirq+0x1d/0x40 __do_softirq+0xc9/0x2c8 __irq_exit_rcu+0xa6/0xc0 sysvec_apic_timer_interrupt+0x72/0x90 </IRQ> <TASK> asm_sysvec_apic_timer_interrupt+0x1a/0x20 RIP: 0010:cpuidle_enter_state+0xcc/0x440 ? cpuidle_enter_state+0xbd/0x440 cpuidle_enter+0x2d/0x40 do_idle+0x20d/0x270 cpu_startup_entry+0x2a/0x30 rest_init+0xd0/0xd0 arch_call_rest_init+0xe/0x30 start_kernel+0x709/0xa90 x86_64_start_reservations+0x18/0x30 x86_64_start_kernel+0x96/0xa0 secondary_startup_64_no_verify+0x18f/0x19b ---[ end trace 0000000000000000 ]---
Fixes: 9b98d395b85d ("net/mlx5: Start health poll at earlier stage of driver load") Signed-off-by: Shay Drory shayd@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 6574c145dc1e2..459a836a5d9c1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -1298,6 +1298,9 @@ static int mlx5_function_teardown(struct mlx5_core_dev *dev, bool boot)
if (!err) mlx5_function_disable(dev, boot); + else + mlx5_stop_health_poll(dev, boot); + return err; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Mishin amishin@t-argos.ru
[ Upstream commit 229bedbf62b13af5aba6525ad10b62ad38d9ccb5 ]
In case of flow rule creation fail in mlx5_lag_create_port_sel_table(), instead of previously created rules, the tainted pointer is deleted deveral times. Fix this bug by using correct flow rules pointers.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 352899f384d4 ("net/mlx5: Lag, use buckets in hash mode") Signed-off-by: Aleksandr Mishin amishin@t-argos.ru Reviewed-by: Jacob Keller jacob.e.keller@intel.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Link: https://lore.kernel.org/r/20240604100552.25201-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c index 101b3bb908638..e12bc4cd80661 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/port_sel.c @@ -88,9 +88,13 @@ static int mlx5_lag_create_port_sel_table(struct mlx5_lag *ldev, &dest, 1); if (IS_ERR(lag_definer->rules[idx])) { err = PTR_ERR(lag_definer->rules[idx]); - while (i--) - while (j--) + do { + while (j--) { + idx = i * ldev->buckets + j; mlx5_del_flow_rules(lag_definer->rules[idx]); + } + j = ldev->buckets; + } while (i--); goto destroy_fg; } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit f921a58ae20852d188f70842431ce6519c4fdc36 ]
If one TCA_TAPRIO_ATTR_PRIOMAP attribute has been provided, taprio_parse_mqprio_opt() must validate it, or userspace can inject arbitrary data to the kernel, the second time taprio_change() is called.
First call (with valid attributes) sets dev->num_tc to a non zero value.
Second call (with arbitrary mqprio attributes) returns early from taprio_parse_mqprio_opt() and bad things can happen.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reported-by: Noam Rathaus noamr@ssd-disclosure.com Signed-off-by: Eric Dumazet edumazet@google.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Link: https://lore.kernel.org/r/20240604181511.769870-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_taprio.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
diff --git a/net/sched/sch_taprio.c b/net/sched/sch_taprio.c index 5c3f8a278fc2f..0b150b13bee7a 100644 --- a/net/sched/sch_taprio.c +++ b/net/sched/sch_taprio.c @@ -1176,16 +1176,13 @@ static int taprio_parse_mqprio_opt(struct net_device *dev, { bool allow_overlapping_txqs = TXTIME_ASSIST_IS_ENABLED(taprio_flags);
- if (!qopt && !dev->num_tc) { - NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary"); - return -EINVAL; - } - - /* If num_tc is already set, it means that the user already - * configured the mqprio part - */ - if (dev->num_tc) + if (!qopt) { + if (!dev->num_tc) { + NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary"); + return -EINVAL; + } return 0; + }
/* taprio imposes that traffic classes map 1:n to tx queues */ if (qopt->num_tc > dev->num_tx_queues) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Karol Kolacinski karol.kolacinski@intel.com
[ Upstream commit 323a359f9b077f382f4483023d096a4d316fd135 ]
On failed verification of PTP clock pin, error message prints channel number instead of pin index after "pin", which is incorrect.
Fix error message by adding channel number to the message and printing pin number instead of channel number.
Fixes: 6092315dfdec ("ptp: introduce programmable pins.") Signed-off-by: Karol Kolacinski karol.kolacinski@intel.com Acked-by: Richard Cochran richardcochran@gmail.com Link: https://lore.kernel.org/r/20240604120555.16643-1-karol.kolacinski@intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ptp/ptp_chardev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/ptp/ptp_chardev.c b/drivers/ptp/ptp_chardev.c index 7513018c9f9ac..2067b0120d083 100644 --- a/drivers/ptp/ptp_chardev.c +++ b/drivers/ptp/ptp_chardev.c @@ -85,7 +85,8 @@ int ptp_set_pinfunc(struct ptp_clock *ptp, unsigned int pin, }
if (info->verify(info, pin, func, chan)) { - pr_err("driver cannot use function %u on pin %u\n", func, chan); + pr_err("driver cannot use function %u and channel %u on pin %u\n", + func, chan, pin); return -EOPNOTSUPP; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit 03e4a092be8ce3de7c1baa7ae14e68b64e3ea644 ]
The ice_get_pfa_module_tlv() function iterates over the Type-Length-Value structures in the Preserved Fields Area (PFA) of the NVM. This is used by the driver to access data such as the Part Board Assembly identifier.
The function uses simple logic to iterate over the PFA. First, the pointer to the PFA in the NVM is read. Then the total length of the PFA is read from the first word.
A pointer to the first TLV is initialized, and a simple loop iterates over each TLV. The pointer is moved forward through the NVM until it exceeds the PFA area.
The logic seems sound, but it is missing a key detail. The Preserved Fields Area length includes one additional final word. This is documented in the device data sheet as a dummy word which contains 0xFFFF. All NVMs have this extra word.
If the driver tries to scan for a TLV that is not in the PFA, it will read past the size of the PFA. It reads and interprets the last dummy word of the PFA as a TLV with type 0xFFFF. It then reads the word following the PFA as a length.
The PFA resides within the Shadow RAM portion of the NVM, which is relatively small. All of its offsets are within a 16-bit size. The PFA pointer and TLV pointer are stored by the driver as 16-bit values.
In almost all cases, the word following the PFA will be such that interpreting it as a length will result in 16-bit arithmetic overflow. Once overflowed, the new next_tlv value is now below the maximum offset of the PFA. Thus, the driver will continue to iterate the data as TLVs. In the worst case, the driver hits on a sequence of reads which loop back to reading the same offsets in an endless loop.
To fix this, we need to correct the loop iteration check to account for this extra word at the end of the PFA. This alone is sufficient to resolve the known cases of this issue in the field. However, it is plausible that an NVM could be misconfigured or have corrupt data which results in the same kind of overflow. Protect against this by using check_add_overflow when calculating both the maximum offset of the TLVs, and when calculating the next_tlv offset at the end of each loop iteration. This ensures that the driver will not get stuck in an infinite loop when scanning the PFA.
Fixes: e961b679fb0b ("ice: add board identifier info to devlink .info_get") Co-developed-by: Paul Greenwalt paul.greenwalt@intel.com Signed-off-by: Paul Greenwalt paul.greenwalt@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-1-e3563... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_nvm.c | 28 ++++++++++++++++++------ 1 file changed, 21 insertions(+), 7 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c index d4e05d2cb30c4..a0ad950cc76d9 100644 --- a/drivers/net/ethernet/intel/ice/ice_nvm.c +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c @@ -441,8 +441,7 @@ int ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len, u16 module_type) { - u16 pfa_len, pfa_ptr; - u16 next_tlv; + u16 pfa_len, pfa_ptr, next_tlv, max_tlv; int status;
status = ice_read_sr_word(hw, ICE_SR_PFA_PTR, &pfa_ptr); @@ -455,11 +454,23 @@ ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len, ice_debug(hw, ICE_DBG_INIT, "Failed to read PFA length.\n"); return status; } + + /* The Preserved Fields Area contains a sequence of Type-Length-Value + * structures which define its contents. The PFA length includes all + * of the TLVs, plus the initial length word itself, *and* one final + * word at the end after all of the TLVs. + */ + if (check_add_overflow(pfa_ptr, pfa_len - 1, &max_tlv)) { + dev_warn(ice_hw_to_dev(hw), "PFA starts at offset %u. PFA length of %u caused 16-bit arithmetic overflow.\n", + pfa_ptr, pfa_len); + return -EINVAL; + } + /* Starting with first TLV after PFA length, iterate through the list * of TLVs to find the requested one. */ next_tlv = pfa_ptr + 1; - while (next_tlv < pfa_ptr + pfa_len) { + while (next_tlv < max_tlv) { u16 tlv_sub_module_type; u16 tlv_len;
@@ -483,10 +494,13 @@ ice_get_pfa_module_tlv(struct ice_hw *hw, u16 *module_tlv, u16 *module_tlv_len, } return -EINVAL; } - /* Check next TLV, i.e. current TLV pointer + length + 2 words - * (for current TLV's type and length) - */ - next_tlv = next_tlv + tlv_len + 2; + + if (check_add_overflow(next_tlv, 2, &next_tlv) || + check_add_overflow(next_tlv, tlv_len, &next_tlv)) { + dev_warn(ice_hw_to_dev(hw), "TLV of type %u and length 0x%04x caused 16-bit arithmetic overflow. The PFA starts at 0x%04x and has length of 0x%04x\n", + tlv_sub_module_type, tlv_len, pfa_ptr, pfa_len); + return -EINVAL; + } } /* Module does not exist */ return -ENOENT;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jacob Keller jacob.e.keller@intel.com
[ Upstream commit cfa747a66e5da34793ac08c26b814e7709613fab ]
The ice driver reads data from the Shadow RAM portion of the NVM during initialization, including data used to identify the NVM image and device, such as the ETRACK ID used to populate devlink dev info fw.bundle.
Currently it is using a fixed offset defined by ICE_CSS_HEADER_LENGTH to compute the appropriate offset. This worked fine for E810 and E822 devices which both have CSS header length of 330 words.
Other devices, including both E825-C and E830 devices have different sizes for their CSS header. The use of a hard coded value results in the driver reading from the wrong block in the NVM when attempting to access the Shadow RAM copy. This results in the driver reporting the fw.bundle as 0x0 in both the devlink dev info and ethtool -i output.
The first E830 support was introduced by commit ba20ecb1d1bb ("ice: Hook up 4 E830 devices by adding their IDs") and the first E825-C support was introducted by commit f64e18944233 ("ice: introduce new E825C devices family")
The NVM actually contains the CSS header length embedded in it. Remove the hard coded value and replace it with logic to read the length from the NVM directly. This is more resilient against all existing and future hardware, vs looking up the expected values from a table. It ensures the driver will read from the appropriate place when determining the ETRACK ID value used for populating the fw.bundle_id and for reporting in ethtool -i.
The CSS header length for both the active and inactive flash bank is stored in the ice_bank_info structure to avoid unnecessary duplicate work when accessing multiple words of the Shadow RAM. Both banks are read in the unlikely event that the header length is different for the NVM in the inactive bank, rather than being different only by the overall device family.
Fixes: ba20ecb1d1bb ("ice: Hook up 4 E830 devices by adding their IDs") Co-developed-by: Paul Greenwalt paul.greenwalt@intel.com Signed-off-by: Paul Greenwalt paul.greenwalt@intel.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Tested-by: Pucha Himasekhar Reddy himasekharx.reddy.pucha@intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-2-e3563... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_nvm.c | 88 ++++++++++++++++++++++- drivers/net/ethernet/intel/ice/ice_type.h | 14 ++-- 2 files changed, 93 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_nvm.c b/drivers/net/ethernet/intel/ice/ice_nvm.c index a0ad950cc76d9..8510a02afedcc 100644 --- a/drivers/net/ethernet/intel/ice/ice_nvm.c +++ b/drivers/net/ethernet/intel/ice/ice_nvm.c @@ -375,11 +375,25 @@ ice_read_nvm_module(struct ice_hw *hw, enum ice_bank_select bank, u32 offset, u1 * * Read the specified word from the copy of the Shadow RAM found in the * specified NVM module. + * + * Note that the Shadow RAM copy is always located after the CSS header, and + * is aligned to 64-byte (32-word) offsets. */ static int ice_read_nvm_sr_copy(struct ice_hw *hw, enum ice_bank_select bank, u32 offset, u16 *data) { - return ice_read_nvm_module(hw, bank, ICE_NVM_SR_COPY_WORD_OFFSET + offset, data); + u32 sr_copy; + + switch (bank) { + case ICE_ACTIVE_FLASH_BANK: + sr_copy = roundup(hw->flash.banks.active_css_hdr_len, 32); + break; + case ICE_INACTIVE_FLASH_BANK: + sr_copy = roundup(hw->flash.banks.inactive_css_hdr_len, 32); + break; + } + + return ice_read_nvm_module(hw, bank, sr_copy + offset, data); }
/** @@ -1024,6 +1038,72 @@ static int ice_determine_active_flash_banks(struct ice_hw *hw) return 0; }
+/** + * ice_get_nvm_css_hdr_len - Read the CSS header length from the NVM CSS header + * @hw: pointer to the HW struct + * @bank: whether to read from the active or inactive flash bank + * @hdr_len: storage for header length in words + * + * Read the CSS header length from the NVM CSS header and add the Authentication + * header size, and then convert to words. + * + * Return: zero on success, or a negative error code on failure. + */ +static int +ice_get_nvm_css_hdr_len(struct ice_hw *hw, enum ice_bank_select bank, + u32 *hdr_len) +{ + u16 hdr_len_l, hdr_len_h; + u32 hdr_len_dword; + int status; + + status = ice_read_nvm_module(hw, bank, ICE_NVM_CSS_HDR_LEN_L, + &hdr_len_l); + if (status) + return status; + + status = ice_read_nvm_module(hw, bank, ICE_NVM_CSS_HDR_LEN_H, + &hdr_len_h); + if (status) + return status; + + /* CSS header length is in DWORD, so convert to words and add + * authentication header size + */ + hdr_len_dword = hdr_len_h << 16 | hdr_len_l; + *hdr_len = (hdr_len_dword * 2) + ICE_NVM_AUTH_HEADER_LEN; + + return 0; +} + +/** + * ice_determine_css_hdr_len - Discover CSS header length for the device + * @hw: pointer to the HW struct + * + * Determine the size of the CSS header at the start of the NVM module. This + * is useful for locating the Shadow RAM copy in the NVM, as the Shadow RAM is + * always located just after the CSS header. + * + * Return: zero on success, or a negative error code on failure. + */ +static int ice_determine_css_hdr_len(struct ice_hw *hw) +{ + struct ice_bank_info *banks = &hw->flash.banks; + int status; + + status = ice_get_nvm_css_hdr_len(hw, ICE_ACTIVE_FLASH_BANK, + &banks->active_css_hdr_len); + if (status) + return status; + + status = ice_get_nvm_css_hdr_len(hw, ICE_INACTIVE_FLASH_BANK, + &banks->inactive_css_hdr_len); + if (status) + return status; + + return 0; +} + /** * ice_init_nvm - initializes NVM setting * @hw: pointer to the HW struct @@ -1070,6 +1150,12 @@ int ice_init_nvm(struct ice_hw *hw) return status; }
+ status = ice_determine_css_hdr_len(hw); + if (status) { + ice_debug(hw, ICE_DBG_NVM, "Failed to determine Shadow RAM copy offsets.\n"); + return status; + } + status = ice_get_nvm_ver_info(hw, ICE_ACTIVE_FLASH_BANK, &flash->nvm); if (status) { ice_debug(hw, ICE_DBG_INIT, "Failed to read NVM info.\n"); diff --git a/drivers/net/ethernet/intel/ice/ice_type.h b/drivers/net/ethernet/intel/ice/ice_type.h index 9ff92dba58236..0aacd0d050b8e 100644 --- a/drivers/net/ethernet/intel/ice/ice_type.h +++ b/drivers/net/ethernet/intel/ice/ice_type.h @@ -481,6 +481,8 @@ struct ice_bank_info { u32 orom_size; /* Size of OROM bank */ u32 netlist_ptr; /* Pointer to 1st Netlist bank */ u32 netlist_size; /* Size of Netlist bank */ + u32 active_css_hdr_len; /* Active CSS header length */ + u32 inactive_css_hdr_len; /* Inactive CSS header length */ enum ice_flash_bank nvm_bank; /* Active NVM bank */ enum ice_flash_bank orom_bank; /* Active OROM bank */ enum ice_flash_bank netlist_bank; /* Active Netlist bank */ @@ -1084,17 +1086,13 @@ struct ice_aq_get_set_rss_lut_params { #define ICE_SR_SECTOR_SIZE_IN_WORDS 0x800
/* CSS Header words */ +#define ICE_NVM_CSS_HDR_LEN_L 0x02 +#define ICE_NVM_CSS_HDR_LEN_H 0x03 #define ICE_NVM_CSS_SREV_L 0x14 #define ICE_NVM_CSS_SREV_H 0x15
-/* Length of CSS header section in words */ -#define ICE_CSS_HEADER_LENGTH 330 - -/* Offset of Shadow RAM copy in the NVM bank area. */ -#define ICE_NVM_SR_COPY_WORD_OFFSET roundup(ICE_CSS_HEADER_LENGTH, 32) - -/* Size in bytes of Option ROM trailer */ -#define ICE_NVM_OROM_TRAILER_LENGTH (2 * ICE_CSS_HEADER_LENGTH) +/* Length of Authentication header section in words */ +#define ICE_NVM_AUTH_HEADER_LEN 0x08
/* The Link Topology Netlist section is stored as a series of words. It is * stored in the NVM as a TLV, with the first two words containing the type
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larysa Zaremba larysa.zaremba@intel.com
[ Upstream commit adbf5a42341f6ea038d3626cd4437d9f0ad0b2dd ]
Referenced commit has introduced a bitmap to distinguish between ZC and copy-mode AF_XDP queues, because xsk_get_pool_from_qid() does not do this for us.
The bitmap would be especially useful when restoring previous state after rebuild, if only it was not reallocated in the process. This leads to e.g. xdpsock dying after changing number of queues.
Instead of preserving the bitmap during the rebuild, remove it completely and distinguish between ZC and copy-mode queues based on the presence of a device associated with the pool.
Fixes: e102db780e1c ("ice: track AF_XDP ZC enabled queues in bitmap") Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Signed-off-by: Larysa Zaremba larysa.zaremba@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Chandan Kumar Rout chandanx.rout@intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-3-e3563... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 32 ++++++++++++++++-------- drivers/net/ethernet/intel/ice/ice_lib.c | 8 ------ drivers/net/ethernet/intel/ice/ice_xsk.c | 13 +++++----- 3 files changed, 27 insertions(+), 26 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 365c03d1c4622..ca9b56d625841 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -412,7 +412,6 @@ struct ice_vsi { struct ice_tc_cfg tc_cfg; struct bpf_prog *xdp_prog; struct ice_tx_ring **xdp_rings; /* XDP ring array */ - unsigned long *af_xdp_zc_qps; /* tracks AF_XDP ZC enabled qps */ u16 num_xdp_txq; /* Used XDP queues */ u8 xdp_mapping_mode; /* ICE_MAP_MODE_[CONTIG|SCATTER] */
@@ -748,6 +747,25 @@ static inline void ice_set_ring_xdp(struct ice_tx_ring *ring) ring->flags |= ICE_TX_FLAGS_RING_XDP; }
+/** + * ice_get_xp_from_qid - get ZC XSK buffer pool bound to a queue ID + * @vsi: pointer to VSI + * @qid: index of a queue to look at XSK buff pool presence + * + * Return: A pointer to xsk_buff_pool structure if there is a buffer pool + * attached and configured as zero-copy, NULL otherwise. + */ +static inline struct xsk_buff_pool *ice_get_xp_from_qid(struct ice_vsi *vsi, + u16 qid) +{ + struct xsk_buff_pool *pool = xsk_get_pool_from_qid(vsi->netdev, qid); + + if (!ice_is_xdp_ena_vsi(vsi)) + return NULL; + + return (pool && pool->dev) ? pool : NULL; +} + /** * ice_xsk_pool - get XSK buffer pool bound to a ring * @ring: Rx ring to use @@ -760,10 +778,7 @@ static inline struct xsk_buff_pool *ice_xsk_pool(struct ice_rx_ring *ring) struct ice_vsi *vsi = ring->vsi; u16 qid = ring->q_index;
- if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps)) - return NULL; - - return xsk_get_pool_from_qid(vsi->netdev, qid); + return ice_get_xp_from_qid(vsi, qid); }
/** @@ -788,12 +803,7 @@ static inline void ice_tx_xsk_pool(struct ice_vsi *vsi, u16 qid) if (!ring) return;
- if (!ice_is_xdp_ena_vsi(vsi) || !test_bit(qid, vsi->af_xdp_zc_qps)) { - ring->xsk_pool = NULL; - return; - } - - ring->xsk_pool = xsk_get_pool_from_qid(vsi->netdev, qid); + ring->xsk_pool = ice_get_xp_from_qid(vsi, qid); }
/** diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 558422120312b..7d401a4dc4513 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -117,14 +117,8 @@ static int ice_vsi_alloc_arrays(struct ice_vsi *vsi) if (!vsi->q_vectors) goto err_vectors;
- vsi->af_xdp_zc_qps = bitmap_zalloc(max_t(int, vsi->alloc_txq, vsi->alloc_rxq), GFP_KERNEL); - if (!vsi->af_xdp_zc_qps) - goto err_zc_qps; - return 0;
-err_zc_qps: - devm_kfree(dev, vsi->q_vectors); err_vectors: devm_kfree(dev, vsi->rxq_map); err_rxq_map: @@ -328,8 +322,6 @@ static void ice_vsi_free_arrays(struct ice_vsi *vsi)
dev = ice_pf_to_dev(pf);
- bitmap_free(vsi->af_xdp_zc_qps); - vsi->af_xdp_zc_qps = NULL; /* free the ring and vector containers */ devm_kfree(dev, vsi->q_vectors); vsi->q_vectors = NULL; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 1857220d27fee..86a865788e345 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -269,7 +269,6 @@ static int ice_xsk_pool_disable(struct ice_vsi *vsi, u16 qid) if (!pool) return -EINVAL;
- clear_bit(qid, vsi->af_xdp_zc_qps); xsk_pool_dma_unmap(pool, ICE_RX_DMA_ATTR);
return 0; @@ -300,8 +299,6 @@ ice_xsk_pool_enable(struct ice_vsi *vsi, struct xsk_buff_pool *pool, u16 qid) if (err) return err;
- set_bit(qid, vsi->af_xdp_zc_qps); - return 0; }
@@ -349,11 +346,13 @@ ice_realloc_rx_xdp_bufs(struct ice_rx_ring *rx_ring, bool pool_present) int ice_realloc_zc_buf(struct ice_vsi *vsi, bool zc) { struct ice_rx_ring *rx_ring; - unsigned long q; + uint i; + + ice_for_each_rxq(vsi, i) { + rx_ring = vsi->rx_rings[i]; + if (!rx_ring->xsk_pool) + continue;
- for_each_set_bit(q, vsi->af_xdp_zc_qps, - max_t(int, vsi->alloc_txq, vsi->alloc_rxq)) { - rx_ring = vsi->rx_rings[q]; if (ice_realloc_rx_xdp_bufs(rx_ring, zc)) return -ENOMEM; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larysa Zaremba larysa.zaremba@intel.com
[ Upstream commit 744d197162c2070a6045a71e2666ed93a57cc65d ]
Commit 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") has placed ice_vsi_free_q_vectors() after ice_destroy_xdp_rings() in the rebuild process. The behaviour of the XDP rings config functions is context-dependent, so the change of order has led to ice_destroy_xdp_rings() doing additional work and removing XDP prog, when it was supposed to be preserved.
Also, dependency on the PF state reset flags creates an additional, fortunately less common problem:
* PFR is requested e.g. by tx_timeout handler * .ndo_bpf() is asked to delete the program, calls ice_destroy_xdp_rings(), but reset flag is set, so rings are destroyed without deleting the program * ice_vsi_rebuild tries to delete non-existent XDP rings, because the program is still on the VSI * system crashes
With a similar race, when requested to attach a program, ice_prepare_xdp_rings() can actually skip setting the program in the VSI and nevertheless report success.
Instead of reverting to the old order of function calls, add an enum argument to both ice_prepare_xdp_rings() and ice_destroy_xdp_rings() in order to distinguish between calls from rebuild and .ndo_bpf().
Fixes: efc2214b6047 ("ice: Add support for XDP") Reviewed-by: Igor Bagnucki igor.bagnucki@intel.com Signed-off-by: Larysa Zaremba larysa.zaremba@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Chandan Kumar Rout chandanx.rout@intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-4-e3563... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 11 +++++++++-- drivers/net/ethernet/intel/ice/ice_lib.c | 5 +++-- drivers/net/ethernet/intel/ice/ice_main.c | 22 ++++++++++++---------- 3 files changed, 24 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index ca9b56d625841..a3286964d6c31 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -932,9 +932,16 @@ int ice_down(struct ice_vsi *vsi); int ice_down_up(struct ice_vsi *vsi); int ice_vsi_cfg_lan(struct ice_vsi *vsi); struct ice_vsi *ice_lb_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi); + +enum ice_xdp_cfg { + ICE_XDP_CFG_FULL, /* Fully apply new config in .ndo_bpf() */ + ICE_XDP_CFG_PART, /* Save/use part of config in VSI rebuild */ +}; + int ice_vsi_determine_xdp_res(struct ice_vsi *vsi); -int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog); -int ice_destroy_xdp_rings(struct ice_vsi *vsi); +int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, + enum ice_xdp_cfg cfg_type); +int ice_destroy_xdp_rings(struct ice_vsi *vsi, enum ice_xdp_cfg cfg_type); int ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags); diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 7d401a4dc4513..5de7c50b439e1 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2334,7 +2334,8 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params) ret = ice_vsi_determine_xdp_res(vsi); if (ret) goto unroll_vector_base; - ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog); + ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog, + ICE_XDP_CFG_PART); if (ret) goto unroll_vector_base; } @@ -2485,7 +2486,7 @@ void ice_vsi_decfg(struct ice_vsi *vsi) /* return value check can be skipped here, it always returns * 0 if reset is in progress */ - ice_destroy_xdp_rings(vsi); + ice_destroy_xdp_rings(vsi, ICE_XDP_CFG_PART);
ice_vsi_clear_rings(vsi); ice_vsi_free_q_vectors(vsi); diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 33a164fa325ac..b53fe27dbed7d 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2674,10 +2674,12 @@ static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog) * ice_prepare_xdp_rings - Allocate, configure and setup Tx rings for XDP * @vsi: VSI to bring up Tx rings used by XDP * @prog: bpf program that will be assigned to VSI + * @cfg_type: create from scratch or restore the existing configuration * * Return 0 on success and negative value on error */ -int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog) +int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, + enum ice_xdp_cfg cfg_type) { u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 }; int xdp_rings_rem = vsi->num_xdp_txq; @@ -2753,7 +2755,7 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog) * taken into account at the end of ice_vsi_rebuild, where * ice_cfg_vsi_lan is being called */ - if (ice_is_reset_in_progress(pf->state)) + if (cfg_type == ICE_XDP_CFG_PART) return 0;
/* tell the Tx scheduler that right now we have @@ -2805,22 +2807,21 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog) /** * ice_destroy_xdp_rings - undo the configuration made by ice_prepare_xdp_rings * @vsi: VSI to remove XDP rings + * @cfg_type: disable XDP permanently or allow it to be restored later * * Detach XDP rings from irq vectors, clean up the PF bitmap and free * resources */ -int ice_destroy_xdp_rings(struct ice_vsi *vsi) +int ice_destroy_xdp_rings(struct ice_vsi *vsi, enum ice_xdp_cfg cfg_type) { u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 }; struct ice_pf *pf = vsi->back; int i, v_idx;
/* q_vectors are freed in reset path so there's no point in detaching - * rings; in case of rebuild being triggered not from reset bits - * in pf->state won't be set, so additionally check first q_vector - * against NULL + * rings */ - if (ice_is_reset_in_progress(pf->state) || !vsi->q_vectors[0]) + if (cfg_type == ICE_XDP_CFG_PART) goto free_qmap;
ice_for_each_q_vector(vsi, v_idx) { @@ -2861,7 +2862,7 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi) if (static_key_enabled(&ice_xdp_locking_key)) static_branch_dec(&ice_xdp_locking_key);
- if (ice_is_reset_in_progress(pf->state) || !vsi->q_vectors[0]) + if (cfg_type == ICE_XDP_CFG_PART) return 0;
ice_vsi_assign_bpf_prog(vsi, NULL); @@ -2972,7 +2973,8 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, if (xdp_ring_err) { NL_SET_ERR_MSG_MOD(extack, "Not enough Tx resources for XDP"); } else { - xdp_ring_err = ice_prepare_xdp_rings(vsi, prog); + xdp_ring_err = ice_prepare_xdp_rings(vsi, prog, + ICE_XDP_CFG_FULL); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed"); } @@ -2983,7 +2985,7 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog, NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Rx resources failed"); } else if (ice_is_xdp_ena_vsi(vsi) && !prog) { xdp_features_clear_redirect_target(vsi->netdev); - xdp_ring_err = ice_destroy_xdp_rings(vsi); + xdp_ring_err = ice_destroy_xdp_rings(vsi, ICE_XDP_CFG_FULL); if (xdp_ring_err) NL_SET_ERR_MSG_MOD(extack, "Freeing XDP Tx resources failed"); /* reallocate Rx queues that were used for zero-copy */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Larysa Zaremba larysa.zaremba@intel.com
[ Upstream commit f3df4044254c98128890b512bf19cc05588f1fe5 ]
ice_pf_dcb_recfg() re-maps queues to vectors with ice_vsi_map_rings_to_vectors(), which does not restore the previous state for XDP queues. This leads to no AF_XDP traffic after rebuild.
Map XDP queues to vectors in ice_vsi_map_rings_to_vectors(). Also, move the code around, so XDP queues are mapped independently only through .ndo_bpf().
Fixes: 6624e780a577 ("ice: split ice_vsi_setup into smaller functions") Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Signed-off-by: Larysa Zaremba larysa.zaremba@intel.com Reviewed-by: Simon Horman horms@kernel.org Tested-by: Chandan Kumar Rout chandanx.rout@intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-5-e3563... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice.h | 1 + drivers/net/ethernet/intel/ice/ice_base.c | 3 + drivers/net/ethernet/intel/ice/ice_lib.c | 14 ++-- drivers/net/ethernet/intel/ice/ice_main.c | 96 ++++++++++++++--------- 4 files changed, 68 insertions(+), 46 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index a3286964d6c31..8e40f26aa5060 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -942,6 +942,7 @@ int ice_vsi_determine_xdp_res(struct ice_vsi *vsi); int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, enum ice_xdp_cfg cfg_type); int ice_destroy_xdp_rings(struct ice_vsi *vsi, enum ice_xdp_cfg cfg_type); +void ice_map_xdp_rings(struct ice_vsi *vsi); int ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames, u32 flags); diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index a545a7917e4fc..9d23a436d2a6a 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -860,6 +860,9 @@ void ice_vsi_map_rings_to_vectors(struct ice_vsi *vsi) } rx_rings_rem -= rx_rings_per_v; } + + if (ice_is_xdp_ena_vsi(vsi)) + ice_map_xdp_rings(vsi); }
/** diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c index 5de7c50b439e1..acf732ce04ed6 100644 --- a/drivers/net/ethernet/intel/ice/ice_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_lib.c @@ -2323,13 +2323,6 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params) if (ret) goto unroll_vector_base;
- ice_vsi_map_rings_to_vectors(vsi); - - /* Associate q_vector rings to napi */ - ice_vsi_set_napi_queues(vsi); - - vsi->stat_offsets_loaded = false; - if (ice_is_xdp_ena_vsi(vsi)) { ret = ice_vsi_determine_xdp_res(vsi); if (ret) @@ -2340,6 +2333,13 @@ ice_vsi_cfg_def(struct ice_vsi *vsi, struct ice_vsi_cfg_params *params) goto unroll_vector_base; }
+ ice_vsi_map_rings_to_vectors(vsi); + + /* Associate q_vector rings to napi */ + ice_vsi_set_napi_queues(vsi); + + vsi->stat_offsets_loaded = false; + /* ICE_VSI_CTRL does not need RSS so skip RSS processing */ if (vsi->type != ICE_VSI_CTRL) /* Do not exit if configuring RSS had an issue, at diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index b53fe27dbed7d..10fef2e726b39 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -2670,6 +2670,60 @@ static void ice_vsi_assign_bpf_prog(struct ice_vsi *vsi, struct bpf_prog *prog) bpf_prog_put(old_prog); }
+static struct ice_tx_ring *ice_xdp_ring_from_qid(struct ice_vsi *vsi, int qid) +{ + struct ice_q_vector *q_vector; + struct ice_tx_ring *ring; + + if (static_key_enabled(&ice_xdp_locking_key)) + return vsi->xdp_rings[qid % vsi->num_xdp_txq]; + + q_vector = vsi->rx_rings[qid]->q_vector; + ice_for_each_tx_ring(ring, q_vector->tx) + if (ice_ring_is_xdp(ring)) + return ring; + + return NULL; +} + +/** + * ice_map_xdp_rings - Map XDP rings to interrupt vectors + * @vsi: the VSI with XDP rings being configured + * + * Map XDP rings to interrupt vectors and perform the configuration steps + * dependent on the mapping. + */ +void ice_map_xdp_rings(struct ice_vsi *vsi) +{ + int xdp_rings_rem = vsi->num_xdp_txq; + int v_idx, q_idx; + + /* follow the logic from ice_vsi_map_rings_to_vectors */ + ice_for_each_q_vector(vsi, v_idx) { + struct ice_q_vector *q_vector = vsi->q_vectors[v_idx]; + int xdp_rings_per_v, q_id, q_base; + + xdp_rings_per_v = DIV_ROUND_UP(xdp_rings_rem, + vsi->num_q_vectors - v_idx); + q_base = vsi->num_xdp_txq - xdp_rings_rem; + + for (q_id = q_base; q_id < (q_base + xdp_rings_per_v); q_id++) { + struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_id]; + + xdp_ring->q_vector = q_vector; + xdp_ring->next = q_vector->tx.tx_ring; + q_vector->tx.tx_ring = xdp_ring; + } + xdp_rings_rem -= xdp_rings_per_v; + } + + ice_for_each_rxq(vsi, q_idx) { + vsi->rx_rings[q_idx]->xdp_ring = ice_xdp_ring_from_qid(vsi, + q_idx); + ice_tx_xsk_pool(vsi, q_idx); + } +} + /** * ice_prepare_xdp_rings - Allocate, configure and setup Tx rings for XDP * @vsi: VSI to bring up Tx rings used by XDP @@ -2682,7 +2736,6 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, enum ice_xdp_cfg cfg_type) { u16 max_txqs[ICE_MAX_TRAFFIC_CLASS] = { 0 }; - int xdp_rings_rem = vsi->num_xdp_txq; struct ice_pf *pf = vsi->back; struct ice_qs_cfg xdp_qs_cfg = { .qs_mutex = &pf->avail_q_mutex, @@ -2695,8 +2748,7 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, .mapping_mode = ICE_VSI_MAP_CONTIG }; struct device *dev; - int i, v_idx; - int status; + int status, i;
dev = ice_pf_to_dev(pf); vsi->xdp_rings = devm_kcalloc(dev, vsi->num_xdp_txq, @@ -2715,42 +2767,6 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, if (ice_xdp_alloc_setup_rings(vsi)) goto clear_xdp_rings;
- /* follow the logic from ice_vsi_map_rings_to_vectors */ - ice_for_each_q_vector(vsi, v_idx) { - struct ice_q_vector *q_vector = vsi->q_vectors[v_idx]; - int xdp_rings_per_v, q_id, q_base; - - xdp_rings_per_v = DIV_ROUND_UP(xdp_rings_rem, - vsi->num_q_vectors - v_idx); - q_base = vsi->num_xdp_txq - xdp_rings_rem; - - for (q_id = q_base; q_id < (q_base + xdp_rings_per_v); q_id++) { - struct ice_tx_ring *xdp_ring = vsi->xdp_rings[q_id]; - - xdp_ring->q_vector = q_vector; - xdp_ring->next = q_vector->tx.tx_ring; - q_vector->tx.tx_ring = xdp_ring; - } - xdp_rings_rem -= xdp_rings_per_v; - } - - ice_for_each_rxq(vsi, i) { - if (static_key_enabled(&ice_xdp_locking_key)) { - vsi->rx_rings[i]->xdp_ring = vsi->xdp_rings[i % vsi->num_xdp_txq]; - } else { - struct ice_q_vector *q_vector = vsi->rx_rings[i]->q_vector; - struct ice_tx_ring *ring; - - ice_for_each_tx_ring(ring, q_vector->tx) { - if (ice_ring_is_xdp(ring)) { - vsi->rx_rings[i]->xdp_ring = ring; - break; - } - } - } - ice_tx_xsk_pool(vsi, i); - } - /* omit the scheduler update if in reset path; XDP queues will be * taken into account at the end of ice_vsi_rebuild, where * ice_cfg_vsi_lan is being called @@ -2758,6 +2774,8 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog, if (cfg_type == ICE_XDP_CFG_PART) return 0;
+ ice_map_xdp_rings(vsi); + /* tell the Tx scheduler that right now we have * additional queues */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sasha Neftin sasha.neftin@intel.com
[ Upstream commit 7d67d11fbe194f71298263f48e33ae2afa38197e ]
The commit 01cf893bf0f4 ("net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities") removed SUPPORTED_Autoneg field but left inappropriate ethtool_keee structure initialization. When "ethtool --show <device>" (get_eee) invoke, the 'ethtool_keee' structure was accidentally overridden. Remove the 'ethtool_keee' overriding and add EEE declaration as per IEEE specification that allows reporting Energy Efficient Ethernet capabilities.
Examples: Before fix: ethtool --show-eee enp174s0 EEE settings for enp174s0: EEE status: not supported
After fix: EEE settings for enp174s0: EEE status: disabled Tx LPI: disabled Supported EEE link modes: 100baseT/Full 1000baseT/Full 2500baseT/Full
Fixes: 01cf893bf0f4 ("net: intel: i40e/igc: Remove setting Autoneg in EEE capabilities") Suggested-by: Dima Ruinskiy dima.ruinskiy@intel.com Signed-off-by: Sasha Neftin sasha.neftin@intel.com Tested-by: Naama Meir naamax.meir@linux.intel.com Signed-off-by: Jacob Keller jacob.e.keller@intel.com Link: https://lore.kernel.org/r/20240603-net-2024-05-30-intel-net-fixes-v2-6-e3563... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_ethtool.c | 9 +++++++-- drivers/net/ethernet/intel/igc/igc_main.c | 4 ++++ 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_ethtool.c b/drivers/net/ethernet/intel/igc/igc_ethtool.c index 1a64f1ca6ca86..e699412d22f68 100644 --- a/drivers/net/ethernet/intel/igc/igc_ethtool.c +++ b/drivers/net/ethernet/intel/igc/igc_ethtool.c @@ -1629,12 +1629,17 @@ static int igc_ethtool_get_eee(struct net_device *netdev, struct igc_hw *hw = &adapter->hw; u32 eeer;
+ linkmode_set_bit(ETHTOOL_LINK_MODE_2500baseT_Full_BIT, + edata->supported); + linkmode_set_bit(ETHTOOL_LINK_MODE_1000baseT_Full_BIT, + edata->supported); + linkmode_set_bit(ETHTOOL_LINK_MODE_100baseT_Full_BIT, + edata->supported); + if (hw->dev_spec._base.eee_enable) mii_eee_cap1_mod_linkmode_t(edata->advertised, adapter->eee_advert);
- *edata = adapter->eee; - eeer = rd32(IGC_EEER);
/* EEE status on negotiated link */ diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 4d975d620a8e4..58bc96021bb4c 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -12,6 +12,7 @@ #include <linux/bpf_trace.h> #include <net/xdp_sock_drv.h> #include <linux/pci.h> +#include <linux/mdio.h>
#include <net/ipv6.h>
@@ -4876,6 +4877,9 @@ void igc_up(struct igc_adapter *adapter) /* start the watchdog. */ hw->mac.get_link_status = true; schedule_work(&adapter->watchdog_task); + + adapter->eee_advert = MDIO_EEE_100TX | MDIO_EEE_1000T | + MDIO_EEE_2_5GT; }
/**
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Mishin amishin@t-argos.ru
[ Upstream commit b0c9a26435413b81799047a7be53255640432547 ]
In case of region creation fail in ipc_devlink_create_region(), previously created regions delete process starts from tainted pointer which actually holds error code value. Fix this bug by decreasing region index before delete.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 4dcd183fbd67 ("net: wwan: iosm: devlink registration") Signed-off-by: Aleksandr Mishin amishin@t-argos.ru Acked-by: Sergey Ryazanov ryazanov.s.a@gmail.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240604082500.20769-1-amishin@t-argos.ru Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wwan/iosm/iosm_ipc_devlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wwan/iosm/iosm_ipc_devlink.c b/drivers/net/wwan/iosm/iosm_ipc_devlink.c index 2fe724d623c06..33c5a46f1b922 100644 --- a/drivers/net/wwan/iosm/iosm_ipc_devlink.c +++ b/drivers/net/wwan/iosm/iosm_ipc_devlink.c @@ -210,7 +210,7 @@ static int ipc_devlink_create_region(struct iosm_devlink *devlink) rc = PTR_ERR(devlink->cd_regions[i]); dev_err(devlink->dev, "Devlink region fail,err %d", rc); /* Delete previously created regions */ - for ( ; i >= 0; i--) + for (i--; i >= 0; i--) devlink_region_destroy(devlink->cd_regions[i]); goto region_create_fail; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 26bfb8b57063f52b867f9b6c8d1742fcb5bd656c ]
When a SOCK_DGRAM socket connect()s to another socket, the both sockets' sk->sk_state are changed to TCP_ESTABLISHED so that we can register them to BPF SOCKMAP.
When the socket disconnects from the peer by connect(AF_UNSPEC), the state is set back to TCP_CLOSE.
Then, the peer's state is also set to TCP_CLOSE, but the update is done locklessly and unconditionally.
Let's say socket A connect()ed to B, B connect()ed to C, and A disconnects from B.
After the first two connect()s, all three sockets' sk->sk_state are TCP_ESTABLISHED:
$ ss -xa Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess u_dgr ESTAB 0 0 @A 641 * 642 u_dgr ESTAB 0 0 @B 642 * 643 u_dgr ESTAB 0 0 @C 643 * 0
And after the disconnect, B's state is TCP_CLOSE even though it's still connected to C and C's state is TCP_ESTABLISHED.
$ ss -xa Netid State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess u_dgr UNCONN 0 0 @A 641 * 0 u_dgr UNCONN 0 0 @B 642 * 643 u_dgr ESTAB 0 0 @C 643 * 0
In this case, we cannot register B to SOCKMAP.
So, when a socket disconnects from the peer, we should not set TCP_CLOSE to the peer if the peer is connected to yet another socket, and this must be done under unix_state_lock().
Note that we use WRITE_ONCE() for sk->sk_state as there are many lockless readers. These data-races will be fixed in the following patches.
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 439c531744a27..c0cf7137979c7 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -570,7 +570,6 @@ static void unix_dgram_disconnected(struct sock *sk, struct sock *other) sk_error_report(other); } } - other->sk_state = TCP_CLOSE; }
static void unix_sock_destructor(struct sock *sk) @@ -1424,8 +1423,15 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
unix_state_double_unlock(sk, other);
- if (other != old_peer) + if (other != old_peer) { unix_dgram_disconnected(sk, old_peer); + + unix_state_lock(old_peer); + if (!unix_peer(old_peer)) + WRITE_ONCE(old_peer->sk_state, TCP_CLOSE); + unix_state_unlock(old_peer); + } + sock_put(old_peer); } else { unix_peer(sk) = other;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 942238f9735a4a4ebf8274b218d9a910158941d1 ]
sk->sk_state is changed under unix_state_lock(), but it's read locklessly in many places.
This patch adds WRITE_ONCE() on the writer side.
We will add READ_ONCE() to the lockless readers in the following patches.
Fixes: 83301b5367a9 ("af_unix: Set TCP_ESTABLISHED for datagram sockets too") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index c0cf7137979c7..0a9c3975d4303 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -616,7 +616,7 @@ static void unix_release_sock(struct sock *sk, int embrion) u->path.dentry = NULL; u->path.mnt = NULL; state = sk->sk_state; - sk->sk_state = TCP_CLOSE; + WRITE_ONCE(sk->sk_state, TCP_CLOSE);
skpair = unix_peer(sk); unix_peer(sk) = NULL; @@ -738,7 +738,8 @@ static int unix_listen(struct socket *sock, int backlog) if (backlog > sk->sk_max_ack_backlog) wake_up_interruptible_all(&u->peer_wait); sk->sk_max_ack_backlog = backlog; - sk->sk_state = TCP_LISTEN; + WRITE_ONCE(sk->sk_state, TCP_LISTEN); + /* set credentials so connect can copy them */ init_peercred(sk); err = 0; @@ -1401,7 +1402,8 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr, if (err) goto out_unlock;
- sk->sk_state = other->sk_state = TCP_ESTABLISHED; + WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED); + WRITE_ONCE(other->sk_state, TCP_ESTABLISHED); } else { /* * 1003.1g breaking connected state with AF_UNSPEC @@ -1418,7 +1420,7 @@ static int unix_dgram_connect(struct socket *sock, struct sockaddr *addr,
unix_peer(sk) = other; if (!other) - sk->sk_state = TCP_CLOSE; + WRITE_ONCE(sk->sk_state, TCP_CLOSE); unix_dgram_peer_wake_disconnect_wakeup(sk, old_peer);
unix_state_double_unlock(sk, other); @@ -1638,7 +1640,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, copy_peercred(sk, other);
sock->state = SS_CONNECTED; - sk->sk_state = TCP_ESTABLISHED; + WRITE_ONCE(sk->sk_state, TCP_ESTABLISHED); sock_hold(newsk);
smp_mb__after_atomic(); /* sock_hold() does an atomic_inc() */ @@ -2097,7 +2099,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, unix_peer(sk) = NULL; unix_dgram_peer_wake_disconnect_wakeup(sk, other);
- sk->sk_state = TCP_CLOSE; + WRITE_ONCE(sk->sk_state, TCP_CLOSE); unix_state_unlock(sk);
unix_dgram_disconnected(sk, other);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 3a0f38eb285c8c2eead4b3230c7ac2983707599d ]
ioctl(SIOCINQ) calls unix_inq_len() that checks sk->sk_state first and returns -EINVAL if it's TCP_LISTEN.
Then, for SOCK_STREAM sockets, unix_inq_len() returns the number of bytes in recvq.
However, unix_inq_len() does not hold unix_state_lock(), and the concurrent listen() might change the state after checking sk->sk_state.
If the race occurs, 0 is returned for the listener, instead of -EINVAL, because the length of skb with embryo is 0.
We could hold unix_state_lock() in unix_inq_len(), but it's overkill given the result is true for pre-listen() TCP_CLOSE state.
So, let's use READ_ONCE() for sk->sk_state in unix_inq_len().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0a9c3975d4303..989c2c76dce66 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -3064,7 +3064,7 @@ long unix_inq_len(struct sock *sk) struct sk_buff *skb; long amount = 0;
- if (sk->sk_state == TCP_LISTEN) + if (READ_ONCE(sk->sk_state) == TCP_LISTEN) return -EINVAL;
spin_lock(&sk->sk_receive_queue.lock);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit eb0718fb3e97ad0d6f4529b810103451c90adf94 ]
unix_poll() and unix_dgram_poll() read sk->sk_state locklessly and calls unix_writable() which also reads sk->sk_state without holding unix_state_lock().
Let's use READ_ONCE() in unix_poll() and unix_dgram_poll() and pass it to unix_writable().
While at it, we remove TCP_SYN_SENT check in unix_dgram_poll() as that state does not exist for AF_UNIX socket since the code was added.
Fixes: 1586a5877db9 ("af_unix: do not report POLLOUT on listeners") Fixes: 3c73419c09a5 ("af_unix: fix 'poll for write'/ connected DGRAM sockets") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 989c2c76dce66..a3eb241eb064e 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -530,9 +530,9 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) return 0; }
-static int unix_writable(const struct sock *sk) +static int unix_writable(const struct sock *sk, unsigned char state) { - return sk->sk_state != TCP_LISTEN && + return state != TCP_LISTEN && (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; }
@@ -541,7 +541,7 @@ static void unix_write_space(struct sock *sk) struct socket_wq *wq;
rcu_read_lock(); - if (unix_writable(sk)) { + if (unix_writable(sk, READ_ONCE(sk->sk_state))) { wq = rcu_dereference(sk->sk_wq); if (skwq_has_sleeper(wq)) wake_up_interruptible_sync_poll(&wq->wait, @@ -3176,12 +3176,14 @@ static int unix_compat_ioctl(struct socket *sock, unsigned int cmd, unsigned lon static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wait) { struct sock *sk = sock->sk; + unsigned char state; __poll_t mask; u8 shutdown;
sock_poll_wait(file, sock, wait); mask = 0; shutdown = READ_ONCE(sk->sk_shutdown); + state = READ_ONCE(sk->sk_state);
/* exceptional events? */ if (READ_ONCE(sk->sk_err)) @@ -3203,14 +3205,14 @@ static __poll_t unix_poll(struct file *file, struct socket *sock, poll_table *wa
/* Connection-based need to check for termination and startup */ if ((sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) && - sk->sk_state == TCP_CLOSE) + state == TCP_CLOSE) mask |= EPOLLHUP;
/* * we set writable also when the other side has shut down the * connection. This prevents stuck sockets. */ - if (unix_writable(sk)) + if (unix_writable(sk, state)) mask |= EPOLLOUT | EPOLLWRNORM | EPOLLWRBAND;
return mask; @@ -3221,12 +3223,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, { struct sock *sk = sock->sk, *other; unsigned int writable; + unsigned char state; __poll_t mask; u8 shutdown;
sock_poll_wait(file, sock, wait); mask = 0; shutdown = READ_ONCE(sk->sk_shutdown); + state = READ_ONCE(sk->sk_state);
/* exceptional events? */ if (READ_ONCE(sk->sk_err) || @@ -3246,19 +3250,14 @@ static __poll_t unix_dgram_poll(struct file *file, struct socket *sock, mask |= EPOLLIN | EPOLLRDNORM;
/* Connection-based need to check for termination and startup */ - if (sk->sk_type == SOCK_SEQPACKET) { - if (sk->sk_state == TCP_CLOSE) - mask |= EPOLLHUP; - /* connection hasn't started yet? */ - if (sk->sk_state == TCP_SYN_SENT) - return mask; - } + if (sk->sk_type == SOCK_SEQPACKET && state == TCP_CLOSE) + mask |= EPOLLHUP;
/* No write status requested, avoid expensive OUT tests. */ if (!(poll_requested_events(wait) & (EPOLLWRBAND|EPOLLWRNORM|EPOLLOUT))) return mask;
- writable = unix_writable(sk); + writable = unix_writable(sk, state); if (writable) { unix_state_lock(sk);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit a9bf9c7dc6a5899c01cb8f6e773a66315a5cd4b7 ]
As small optimisation, unix_stream_connect() prefetches the client's sk->sk_state without unix_state_lock() and checks if it's TCP_CLOSE.
Later, sk->sk_state is checked again under unix_state_lock().
Let's use READ_ONCE() for the first check and TCP_CLOSE directly for the second check.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index a3eb241eb064e..f95aba56425fa 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1481,7 +1481,6 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, struct sk_buff *skb = NULL; long timeo; int err; - int st;
err = unix_validate_addr(sunaddr, addr_len); if (err) @@ -1571,9 +1570,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
Well, and we have to recheck the state after socket locked. */ - st = sk->sk_state; - - switch (st) { + switch (READ_ONCE(sk->sk_state)) { case TCP_CLOSE: /* This is ok... continue with connect */ break; @@ -1588,7 +1585,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr,
unix_state_lock_nested(sk, U_LOCK_SECOND);
- if (sk->sk_state != st) { + if (sk->sk_state != TCP_CLOSE) { unix_state_unlock(sk); unix_state_unlock(other); sock_put(other);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 8a34d4e8d9742a24f74998f45a6a98edd923319b ]
The following functions read sk->sk_state locklessly and proceed only if the state is TCP_ESTABLISHED.
* unix_stream_sendmsg * unix_stream_read_generic * unix_seqpacket_sendmsg * unix_seqpacket_recvmsg
Let's use READ_ONCE() there.
Fixes: a05d2ad1c1f3 ("af_unix: Only allow recv on connected seqpacket sockets.") Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f95aba56425fa..d92e664032121 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2273,7 +2273,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, }
if (msg->msg_namelen) { - err = sk->sk_state == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP; + err = READ_ONCE(sk->sk_state) == TCP_ESTABLISHED ? -EISCONN : -EOPNOTSUPP; goto out_err; } else { err = -ENOTCONN; @@ -2387,7 +2387,7 @@ static int unix_seqpacket_sendmsg(struct socket *sock, struct msghdr *msg, if (err) return err;
- if (sk->sk_state != TCP_ESTABLISHED) + if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED) return -ENOTCONN;
if (msg->msg_namelen) @@ -2401,7 +2401,7 @@ static int unix_seqpacket_recvmsg(struct socket *sock, struct msghdr *msg, { struct sock *sk = sock->sk;
- if (sk->sk_state != TCP_ESTABLISHED) + if (READ_ONCE(sk->sk_state) != TCP_ESTABLISHED) return -ENOTCONN;
return unix_dgram_recvmsg(sock, msg, size, flags); @@ -2730,7 +2730,7 @@ static int unix_stream_read_generic(struct unix_stream_read_state *state, size_t size = state->size; unsigned int last_len;
- if (unlikely(sk->sk_state != TCP_ESTABLISHED)) { + if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)) { err = -EINVAL; goto out; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit af4c733b6b1aded4dc808fafece7dfe6e9d2ebb3 ]
unix_stream_read_skb() is called from sk->sk_data_ready() context where unix_state_lock() is not held.
Let's use READ_ONCE() there.
Fixes: 77462de14a43 ("af_unix: Add read_sock for stream socket types") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index d92e664032121..9f266a7679cbc 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2706,7 +2706,7 @@ static struct sk_buff *manage_oob(struct sk_buff *skb, struct sock *sk,
static int unix_stream_read_skb(struct sock *sk, skb_read_actor_t recv_actor) { - if (unlikely(sk->sk_state != TCP_ESTABLISHED)) + if (unlikely(READ_ONCE(sk->sk_state) != TCP_ESTABLISHED)) return -ENOTCONN;
return unix_read_skb(sk, recv_actor);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 0aa3be7b3e1f8f997312cc4705f8165e02806f8f ]
While dumping AF_UNIX sockets via UNIX_DIAG, sk->sk_state is read locklessly.
Let's use READ_ONCE() there.
Note that the result could be inconsistent if the socket is dumped during the state change. This is common for other SOCK_DIAG and similar interfaces.
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Fixes: 2aac7a2cb0d9 ("unix_diag: Pending connections IDs NLA") Fixes: 45a96b9be6ec ("unix_diag: Dumping all sockets core") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/diag.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c index ae39538c5042b..116cf508aea4a 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -65,7 +65,7 @@ static int sk_diag_dump_icons(struct sock *sk, struct sk_buff *nlskb) u32 *buf; int i;
- if (sk->sk_state == TCP_LISTEN) { + if (READ_ONCE(sk->sk_state) == TCP_LISTEN) { spin_lock(&sk->sk_receive_queue.lock);
attr = nla_reserve(nlskb, UNIX_DIAG_ICONS, @@ -103,7 +103,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) { struct unix_diag_rqlen rql;
- if (sk->sk_state == TCP_LISTEN) { + if (READ_ONCE(sk->sk_state) == TCP_LISTEN) { rql.udiag_rqueue = sk->sk_receive_queue.qlen; rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { @@ -136,7 +136,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r rep = nlmsg_data(nlh); rep->udiag_family = AF_UNIX; rep->udiag_type = sk->sk_type; - rep->udiag_state = sk->sk_state; + rep->udiag_state = READ_ONCE(sk->sk_state); rep->pad = 0; rep->udiag_ino = sk_ino; sock_diag_save_cookie(sk, rep->udiag_cookie); @@ -215,7 +215,7 @@ static int unix_diag_dump(struct sk_buff *skb, struct netlink_callback *cb) sk_for_each(sk, &net->unx.table.buckets[slot]) { if (num < s_num) goto next; - if (!(req->udiag_states & (1 << sk->sk_state))) + if (!(req->udiag_states & (1 << READ_ONCE(sk->sk_state)))) goto next; if (sk_diag_dump(sk, skb, req, sk_user_ns(skb->sk), NETLINK_CB(cb->skb).portid,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit b0632e53e0da8054e36bc973f0eec69d30f1b7c6 ]
sk_setsockopt() changes sk->sk_sndbuf under lock_sock(), but it's not used in af_unix.c.
Let's use READ_ONCE() to read sk->sk_sndbuf in unix_writable(), unix_dgram_sendmsg(), and unix_stream_sendmsg().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 9f266a7679cbc..84bc1de2fd967 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -533,7 +533,7 @@ static int unix_dgram_peer_wake_me(struct sock *sk, struct sock *other) static int unix_writable(const struct sock *sk, unsigned char state) { return state != TCP_LISTEN && - (refcount_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf; + (refcount_read(&sk->sk_wmem_alloc) << 2) <= READ_ONCE(sk->sk_sndbuf); }
static void unix_write_space(struct sock *sk) @@ -2014,7 +2014,7 @@ static int unix_dgram_sendmsg(struct socket *sock, struct msghdr *msg, }
err = -EMSGSIZE; - if (len > sk->sk_sndbuf - 32) + if (len > READ_ONCE(sk->sk_sndbuf) - 32) goto out;
if (len > SKB_MAX_ALLOC) { @@ -2294,7 +2294,7 @@ static int unix_stream_sendmsg(struct socket *sock, struct msghdr *msg, &err, 0); } else { /* Keep two messages in the pipe so it schedules better */ - size = min_t(int, size, (sk->sk_sndbuf >> 1) - 64); + size = min_t(int, size, (READ_ONCE(sk->sk_sndbuf) >> 1) - 64);
/* allow fallback to order-0 allocations */ size = min_t(int, size, SKB_MAX_HEAD(0) + UNIX_SKB_FRAGS_SZ);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit bd9f2d05731f6a112d0c7391a0d537bfc588dbe6 ]
net->unx.sysctl_max_dgram_qlen is exposed as a sysctl knob and can be changed concurrently.
Let's use READ_ONCE() in unix_create1().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 84bc1de2fd967..0c217ac17e053 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -976,7 +976,7 @@ static struct sock *unix_create1(struct net *net, struct socket *sock, int kern, sk->sk_hash = unix_unbound_hash(sk); sk->sk_allocation = GFP_KERNEL_ACCOUNT; sk->sk_write_space = unix_write_space; - sk->sk_max_ack_backlog = net->unx.sysctl_max_dgram_qlen; + sk->sk_max_ack_backlog = READ_ONCE(net->unx.sysctl_max_dgram_qlen); sk->sk_destruct = unix_sock_destructor; u = unix_sk(sk); u->inflight = 0;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 45d872f0e65593176d880ec148f41ad7c02e40a7 ]
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes advantage of this characteristics; it does not hold the listener's unix_state_lock() and only acquires recvq lock to pop one skb.
It means unix_state_lock() does not prevent the queue length from changing in unix_stream_connect().
Thus, we need to use unix_recvq_full_lockless() to avoid data-race.
Now we remove unix_recvq_full() as no one uses it.
Note that we can remove READ_ONCE() for sk->sk_max_ack_backlog in unix_recvq_full_lockless() because of the following reasons:
(1) For SOCK_DGRAM, it is a written-once field in unix_create1()
(2) For SOCK_STREAM and SOCK_SEQPACKET, it is changed under the listener's unix_state_lock() in unix_listen(), and we hold the lock in unix_stream_connect()
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index 0c217ac17e053..f0760afad71fe 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -221,15 +221,9 @@ static inline int unix_may_send(struct sock *sk, struct sock *osk) return unix_peer(osk) == NULL || unix_our_peer(sk, osk); }
-static inline int unix_recvq_full(const struct sock *sk) -{ - return skb_queue_len(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; -} - static inline int unix_recvq_full_lockless(const struct sock *sk) { - return skb_queue_len_lockless(&sk->sk_receive_queue) > - READ_ONCE(sk->sk_max_ack_backlog); + return skb_queue_len_lockless(&sk->sk_receive_queue) > sk->sk_max_ack_backlog; }
struct sock *unix_peer_get(struct sock *s) @@ -1545,7 +1539,7 @@ static int unix_stream_connect(struct socket *sock, struct sockaddr *uaddr, if (other->sk_shutdown & RCV_SHUTDOWN) goto out_unlock;
- if (unix_recvq_full(other)) { + if (unix_recvq_full_lockless(other)) { err = -EAGAIN; if (!timeo) goto out_unlock;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 83690b82d228b3570565ebd0b41873933238b97f ]
If the socket type is SOCK_STREAM or SOCK_SEQPACKET, unix_release_sock() checks the length of the peer socket's recvq under unix_state_lock().
However, unix_stream_read_generic() calls skb_unlink() after releasing the lock. Also, for SOCK_SEQPACKET, __skb_try_recv_datagram() unlinks skb without unix_state_lock().
Thues, unix_state_lock() does not protect qlen.
Let's use skb_queue_empty_lockless() in unix_release_sock().
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c index f0760afad71fe..cbc011ceb89b4 100644 --- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -631,7 +631,7 @@ static void unix_release_sock(struct sock *sk, int embrion) unix_state_lock(skpair); /* No more writes */ WRITE_ONCE(skpair->sk_shutdown, SHUTDOWN_MASK); - if (!skb_queue_empty(&sk->sk_receive_queue) || embrion) + if (!skb_queue_empty_lockless(&sk->sk_receive_queue) || embrion) WRITE_ONCE(skpair->sk_err, ECONNRESET); unix_state_unlock(skpair); skpair->sk_state_change(skpair);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 5d915e584d8408211d4567c22685aae8820bfc55 ]
We can dump the socket queue length via UNIX_DIAG by specifying UDIAG_SHOW_RQLEN.
If sk->sk_state is TCP_LISTEN, we return the recv queue length, but here we do not hold recvq lock.
Let's use skb_queue_len_lockless() in sk_diag_show_rqlen().
Fixes: c9da99e6475f ("unix_diag: Fixup RQLEN extension report") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c index 116cf508aea4a..321336f91a0af 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -104,7 +104,7 @@ static int sk_diag_show_rqlen(struct sock *sk, struct sk_buff *nlskb) struct unix_diag_rqlen rql;
if (READ_ONCE(sk->sk_state) == TCP_LISTEN) { - rql.udiag_rqueue = sk->sk_receive_queue.qlen; + rql.udiag_rqueue = skb_queue_len_lockless(&sk->sk_receive_queue); rql.udiag_wqueue = sk->sk_max_ack_backlog; } else { rql.udiag_rqueue = (u32) unix_inq_len(sk);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit efaf24e30ec39ebbea9112227485805a48b0ceb1 ]
While dumping sockets via UNIX_DIAG, we do not hold unix_state_lock().
Let's use READ_ONCE() to read sk->sk_shutdown.
Fixes: e4e541a84863 ("sock-diag: Report shutdown for inet and unix sockets (v2)") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/unix/diag.c b/net/unix/diag.c index 321336f91a0af..937edf4afed41 100644 --- a/net/unix/diag.c +++ b/net/unix/diag.c @@ -165,7 +165,7 @@ static int sk_diag_fill(struct sock *sk, struct sk_buff *skb, struct unix_diag_r sock_diag_put_meminfo(sk, skb, UNIX_DIAG_MEMINFO)) goto out_nlmsg_trim;
- if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, sk->sk_shutdown)) + if (nla_put_u8(skb, UNIX_DIAG_SHUTDOWN, READ_ONCE(sk->sk_shutdown))) goto out_nlmsg_trim;
if ((req->udiag_show & UDIAG_SHOW_UID) &&
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit b01e1c030770ff3b4fe37fc7cc6bca03f594133f ]
syzbot found a race in __fib6_drop_pcpu_from() [1]
If compiler reads more than once (*ppcpu_rt), second read could read NULL, if another cpu clears the value in rt6_get_pcpu_route().
Add a READ_ONCE() to prevent this race.
Also add rcu_read_lock()/rcu_read_unlock() because we rely on RCU protection while dereferencing pcpu_rt.
[1]
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000012: 0000 [#1] PREEMPT SMP KASAN PTI KASAN: null-ptr-deref in range [0x0000000000000090-0x0000000000000097] CPU: 0 PID: 7543 Comm: kworker/u8:17 Not tainted 6.10.0-rc1-syzkaller-00013-g2bfcfd584ff5 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 Workqueue: netns cleanup_net RIP: 0010:__fib6_drop_pcpu_from.part.0+0x10a/0x370 net/ipv6/ip6_fib.c:984 Code: f8 48 c1 e8 03 80 3c 28 00 0f 85 16 02 00 00 4d 8b 3f 4d 85 ff 74 31 e8 74 a7 fa f7 49 8d bf 90 00 00 00 48 89 f8 48 c1 e8 03 <80> 3c 28 00 0f 85 1e 02 00 00 49 8b 87 90 00 00 00 48 8b 0c 24 48 RSP: 0018:ffffc900040df070 EFLAGS: 00010206 RAX: 0000000000000012 RBX: 0000000000000001 RCX: ffffffff89932e16 RDX: ffff888049dd1e00 RSI: ffffffff89932d7c RDI: 0000000000000091 RBP: dffffc0000000000 R08: 0000000000000005 R09: 0000000000000007 R10: 0000000000000001 R11: 0000000000000006 R12: ffff88807fa080b8 R13: fffffbfff1a9a07d R14: ffffed100ff41022 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8880b9200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000001b32c26000 CR3: 000000005d56e000 CR4: 00000000003526f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> __fib6_drop_pcpu_from net/ipv6/ip6_fib.c:966 [inline] fib6_drop_pcpu_from net/ipv6/ip6_fib.c:1027 [inline] fib6_purge_rt+0x7f2/0x9f0 net/ipv6/ip6_fib.c:1038 fib6_del_route net/ipv6/ip6_fib.c:1998 [inline] fib6_del+0xa70/0x17b0 net/ipv6/ip6_fib.c:2043 fib6_clean_node+0x426/0x5b0 net/ipv6/ip6_fib.c:2205 fib6_walk_continue+0x44f/0x8d0 net/ipv6/ip6_fib.c:2127 fib6_walk+0x182/0x370 net/ipv6/ip6_fib.c:2175 fib6_clean_tree+0xd7/0x120 net/ipv6/ip6_fib.c:2255 __fib6_clean_all+0x100/0x2d0 net/ipv6/ip6_fib.c:2271 rt6_sync_down_dev net/ipv6/route.c:4906 [inline] rt6_disable_ip+0x7ed/0xa00 net/ipv6/route.c:4911 addrconf_ifdown.isra.0+0x117/0x1b40 net/ipv6/addrconf.c:3855 addrconf_notify+0x223/0x19e0 net/ipv6/addrconf.c:3778 notifier_call_chain+0xb9/0x410 kernel/notifier.c:93 call_netdevice_notifiers_info+0xbe/0x140 net/core/dev.c:1992 call_netdevice_notifiers_extack net/core/dev.c:2030 [inline] call_netdevice_notifiers net/core/dev.c:2044 [inline] dev_close_many+0x333/0x6a0 net/core/dev.c:1585 unregister_netdevice_many_notify+0x46d/0x19f0 net/core/dev.c:11193 unregister_netdevice_many net/core/dev.c:11276 [inline] default_device_exit_batch+0x85b/0xae0 net/core/dev.c:11759 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178 cleanup_net+0x5b7/0xbf0 net/core/net_namespace.c:640 process_one_work+0x9fb/0x1b60 kernel/workqueue.c:3231 process_scheduled_works kernel/workqueue.c:3312 [inline] worker_thread+0x6c8/0xf70 kernel/workqueue.c:3393 kthread+0x2c1/0x3a0 kernel/kthread.c:389 ret_from_fork+0x45/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
Fixes: d52d3997f843 ("ipv6: Create percpu rt6_info") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Martin KaFai Lau kafai@fb.com Link: https://lore.kernel.org/r/20240604193549.981839-1-edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_fib.c | 6 +++++- net/ipv6/route.c | 1 + 2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/ip6_fib.c b/net/ipv6/ip6_fib.c index c1f62352a4814..1ace4ac3ee04c 100644 --- a/net/ipv6/ip6_fib.c +++ b/net/ipv6/ip6_fib.c @@ -965,6 +965,7 @@ static void __fib6_drop_pcpu_from(struct fib6_nh *fib6_nh, if (!fib6_nh->rt6i_pcpu) return;
+ rcu_read_lock(); /* release the reference to this fib entry from * all of its cached pcpu routes */ @@ -973,7 +974,9 @@ static void __fib6_drop_pcpu_from(struct fib6_nh *fib6_nh, struct rt6_info *pcpu_rt;
ppcpu_rt = per_cpu_ptr(fib6_nh->rt6i_pcpu, cpu); - pcpu_rt = *ppcpu_rt; + + /* Paired with xchg() in rt6_get_pcpu_route() */ + pcpu_rt = READ_ONCE(*ppcpu_rt);
/* only dropping the 'from' reference if the cached route * is using 'match'. The cached pcpu_rt->from only changes @@ -987,6 +990,7 @@ static void __fib6_drop_pcpu_from(struct fib6_nh *fib6_nh, fib6_info_release(from); } } + rcu_read_unlock(); }
struct fib6_nh_pcpu_arg { diff --git a/net/ipv6/route.c b/net/ipv6/route.c index f090e7bcb784f..bca6f33c7bb9e 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -1409,6 +1409,7 @@ static struct rt6_info *rt6_get_pcpu_route(const struct fib6_result *res) struct rt6_info *prev, **p;
p = this_cpu_ptr(res->nh->rt6i_pcpu); + /* Paired with READ_ONCE() in __fib6_drop_pcpu_from() */ prev = xchg(p, NULL); if (prev) { dst_dev_put(&prev->dst);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Hui suhui@nfschina.com
[ Upstream commit 0dcc53abf58d572d34c5313de85f607cd33fc691 ]
Clang static checker (scan-build) warning: net/ethtool/ioctl.c:line 2233, column 2 Called function pointer is null (null dereference).
Return '-EOPNOTSUPP' when 'ops->get_ethtool_phy_stats' is NULL to fix this typo error.
Fixes: 201ed315f967 ("net/ethtool/ioctl: split ethtool_get_phy_stats into multiple helpers") Signed-off-by: Su Hui suhui@nfschina.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Reviewed-by: Hariprasad Kelam hkelam@marvell.com Link: https://lore.kernel.org/r/20240605034742.921751-1-suhui@nfschina.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ethtool/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index 5a55270aa86e8..e645d751a5e89 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -2220,7 +2220,7 @@ static int ethtool_get_phy_stats_ethtool(struct net_device *dev, const struct ethtool_ops *ops = dev->ethtool_ops; int n_stats, ret;
- if (!ops || !ops->get_sset_count || ops->get_ethtool_phy_stats) + if (!ops || !ops->get_sset_count || !ops->get_ethtool_phy_stats) return -EOPNOTSUPP;
n_stats = ops->get_sset_count(dev, ETH_SS_PHY_STATS);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Brost matthew.brost@intel.com
[ Upstream commit 2d9c72f676e6f79a021b74c6c1c88235e7d5b722 ]
System work queues are shared, use a dedicated work queue for G2H processing to avoid G2H processing getting block behind system tasks.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Francois Dugast francois.dugast@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240506034758.3697397-1-matth... (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 8bbfa45798e2e..6ac86936faaf9 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -146,6 +146,10 @@ int xe_guc_ct_init(struct xe_guc_ct *ct)
xe_assert(xe, !(guc_ct_size() % PAGE_SIZE));
+ ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", 0); + if (!ct->g2h_wq) + return -ENOMEM; + ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", 0); if (!ct->g2h_wq) return -ENOMEM;
On Wed, Jun 19, 2024 at 02:53:52PM +0200, Greg Kroah-Hartman wrote:
6.9-stable review patch. If anyone has any objections, please let me know.
Hi Greg,
This patch seems to be a duplicate and should be dropped.
Thanks, Francois
From: Matthew Brost matthew.brost@intel.com
[ Upstream commit 2d9c72f676e6f79a021b74c6c1c88235e7d5b722 ]
System work queues are shared, use a dedicated work queue for G2H processing to avoid G2H processing getting block behind system tasks.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Francois Dugast francois.dugast@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240506034758.3697397-1-matth... (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org
drivers/gpu/drm/xe/xe_guc_ct.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 8bbfa45798e2e..6ac86936faaf9 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -146,6 +146,10 @@ int xe_guc_ct_init(struct xe_guc_ct *ct) xe_assert(xe, !(guc_ct_size() % PAGE_SIZE));
- ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", 0);
- if (!ct->g2h_wq)
return -ENOMEM;
- ct->g2h_wq = alloc_ordered_workqueue("xe-g2h-wq", 0); if (!ct->g2h_wq) return -ENOMEM;
-- 2.43.0
On Wed, Jun 19, 2024 at 04:03:29PM +0200, Francois Dugast wrote:
On Wed, Jun 19, 2024 at 02:53:52PM +0200, Greg Kroah-Hartman wrote:
6.9-stable review patch. If anyone has any objections, please let me know.
Hi Greg,
This patch seems to be a duplicate and should be dropped.
How are we supposed to be able to determine this?
When you all check in commits into multiple branches, and tag them for stable: and then they hit Linus's tree, and all hell breaks loose on our scripts. "Normally" this tag:
(cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't. Why not, what went wrong?
I'll go drop this, but ugh, what a mess. It makes me dread every drm patch that gets tagged for stable, and so I postpone taking them until I am done with everything else and can't ignore them anymore.
Please fix your broken process.
greg k-h
On Wed, Jun 19, 2024 at 04:16:56PM +0200, Greg Kroah-Hartman wrote:
On Wed, Jun 19, 2024 at 04:03:29PM +0200, Francois Dugast wrote:
On Wed, Jun 19, 2024 at 02:53:52PM +0200, Greg Kroah-Hartman wrote:
6.9-stable review patch. If anyone has any objections, please let me know.
Hi Greg,
This patch seems to be a duplicate and should be dropped.
Please also drop the 6.9.7-rc1:
https://lore.kernel.org/stable/20240625085557.190548833@linuxfoundation.org/...
How are we supposed to be able to determine this?
When you all check in commits into multiple branches, and tag them for stable: and then they hit Linus's tree, and all hell breaks loose on our scripts. "Normally" this tag:
(cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Why not, what went wrong?
worst thing in this case is that git applied this cleanly, although the change was already there.
But also a timing thing. The faulty patch was already in the master. At the moment we applied the fix in our drm-xe-next, we had already sent the latest changes to the upcoming merge-window, so it propagated there as a cherry-pick, but we had to also send to the current -rc cycle and then the second cherry-pick also goes there.
This fast propagation to the current active development branch in general shouldn't be a problem, but a good thing so it is ensured that the fix gets quickly there. But clearly this configure a problem to the later propagation to the stable trees.
I'll go drop this, but ugh, what a mess. It makes me dread every drm patch that gets tagged for stable, and so I postpone taking them until I am done with everything else and can't ignore them anymore.
Please fix your broken process.
When you say drm, do you have same problem with patches coming from other drm drivers, drm-misc, or is it really only Intel trees? (only drm-intel (i915) and drm-xe)?
greg k-h
On Tue, Jun 25, 2024 at 08:03:33AM -0400, Rodrigo Vivi wrote:
On Wed, Jun 19, 2024 at 04:16:56PM +0200, Greg Kroah-Hartman wrote:
On Wed, Jun 19, 2024 at 04:03:29PM +0200, Francois Dugast wrote:
On Wed, Jun 19, 2024 at 02:53:52PM +0200, Greg Kroah-Hartman wrote:
6.9-stable review patch. If anyone has any objections, please let me know.
Hi Greg,
This patch seems to be a duplicate and should be dropped.
Please also drop the 6.9.7-rc1:
https://lore.kernel.org/stable/20240625085557.190548833@linuxfoundation.org/...
Done.
How are we supposed to be able to determine this?
When you all check in commits into multiple branches, and tag them for stable: and then they hit Linus's tree, and all hell breaks loose on our scripts. "Normally" this tag:
(cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
Why not, what went wrong?
worst thing in this case is that git applied this cleanly, although the change was already there.
Yup.
But also a timing thing. The faulty patch was already in the master. At the moment we applied the fix in our drm-xe-next, we had already sent the latest changes to the upcoming merge-window, so it propagated there as a cherry-pick, but we had to also send to the current -rc cycle and then the second cherry-pick also goes there.
This fast propagation to the current active development branch in general shouldn't be a problem, but a good thing so it is ensured that the fix gets quickly there. But clearly this configure a problem to the later propagation to the stable trees.
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'll go drop this, but ugh, what a mess. It makes me dread every drm patch that gets tagged for stable, and so I postpone taking them until I am done with everything else and can't ignore them anymore.
Please fix your broken process.
When you say drm, do you have same problem with patches coming from other drm drivers, drm-misc, or is it really only Intel trees? (only drm-intel (i915) and drm-xe)?
Intel trees have traditionally been the worst, but normally you all give me some cherry-pick clue on the commits so I can weed them out. That didn't happen here.
But, I will note that the AMD drm tree is now starting to do this, in much worse ways than the Intel tree because there is NO cherry-pick information anywhere, so again, I have no idea what to do and massive conflicts happen.
So AMD is copying your bad behavior, please, both of you stop and fix this so that it isn't so broken.
Again, I understand the need/want to have multiple versions of the same patch in different branches to get fixes merged quicker, but when you do that, please give me a way to determine this, otherwise we have no chance.
thanks,
greg k-h
On Tue, 25 Jun 2024, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 08:03:33AM -0400, Rodrigo Vivi wrote:
On Wed, Jun 19, 2024 at 04:16:56PM +0200, Greg Kroah-Hartman wrote:
On Wed, Jun 19, 2024 at 04:03:29PM +0200, Francois Dugast wrote:
On Wed, Jun 19, 2024 at 02:53:52PM +0200, Greg Kroah-Hartman wrote:
6.9-stable review patch. If anyone has any objections, please let me know.
Hi Greg,
This patch seems to be a duplicate and should be dropped.
Please also drop the 6.9.7-rc1:
https://lore.kernel.org/stable/20240625085557.190548833@linuxfoundation.org/...
Done.
How are we supposed to be able to determine this?
When you all check in commits into multiple branches, and tag them for stable: and then they hit Linus's tree, and all hell breaks loose on our scripts. "Normally" this tag:
(cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
Why not, what went wrong?
worst thing in this case is that git applied this cleanly, although the change was already there.
Yup.
But also a timing thing. The faulty patch was already in the master. At the moment we applied the fix in our drm-xe-next, we had already sent the latest changes to the upcoming merge-window, so it propagated there as a cherry-pick, but we had to also send to the current -rc cycle and then the second cherry-pick also goes there.
This fast propagation to the current active development branch in general shouldn't be a problem, but a good thing so it is ensured that the fix gets quickly there. But clearly this configure a problem to the later propagation to the stable trees.
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'll go drop this, but ugh, what a mess. It makes me dread every drm patch that gets tagged for stable, and so I postpone taking them until I am done with everything else and can't ignore them anymore.
Please fix your broken process.
When you say drm, do you have same problem with patches coming from other drm drivers, drm-misc, or is it really only Intel trees? (only drm-intel (i915) and drm-xe)?
Intel trees have traditionally been the worst, but normally you all give me some cherry-pick clue on the commits so I can weed them out. That didn't happen here.
But, I will note that the AMD drm tree is now starting to do this, in much worse ways than the Intel tree because there is NO cherry-pick information anywhere, so again, I have no idea what to do and massive conflicts happen.
So AMD is copying your bad behavior, please, both of you stop and fix this so that it isn't so broken.
Again, I understand the need/want to have multiple versions of the same patch in different branches to get fixes merged quicker, but when you do that, please give me a way to determine this, otherwise we have no chance.
To be fair, this one seems to have been an accident. The same commit was cherry-picked to *two* different branches by two different people [1][2], and this is something we try not to do. Any cherry-picks should go to one tree only, it's checked by our scripts, but it's not race free when two people are doing this roughly at the same time.
BR, Jani.
[1] https://lore.kernel.org/r/Zjz7SzCvfA3vQRxu@fedora [2] https://lore.kernel.org/r/c3rduifdp5wipkljdpuq4x6uowkc2uyzgdoft4txvp6mgvzjaj...
On Tue, Jun 25, 2024 at 03:41:24PM +0300, Jani Nikula wrote:
On Tue, 25 Jun 2024, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Tue, Jun 25, 2024 at 08:03:33AM -0400, Rodrigo Vivi wrote:
On Wed, Jun 19, 2024 at 04:16:56PM +0200, Greg Kroah-Hartman wrote:
On Wed, Jun 19, 2024 at 04:03:29PM +0200, Francois Dugast wrote:
On Wed, Jun 19, 2024 at 02:53:52PM +0200, Greg Kroah-Hartman wrote:
6.9-stable review patch. If anyone has any objections, please let me know.
Hi Greg,
This patch seems to be a duplicate and should be dropped.
Please also drop the 6.9.7-rc1:
https://lore.kernel.org/stable/20240625085557.190548833@linuxfoundation.org/...
Done.
How are we supposed to be able to determine this?
When you all check in commits into multiple branches, and tag them for stable: and then they hit Linus's tree, and all hell breaks loose on our scripts. "Normally" this tag:
(cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
Why not, what went wrong?
worst thing in this case is that git applied this cleanly, although the change was already there.
Yup.
But also a timing thing. The faulty patch was already in the master. At the moment we applied the fix in our drm-xe-next, we had already sent the latest changes to the upcoming merge-window, so it propagated there as a cherry-pick, but we had to also send to the current -rc cycle and then the second cherry-pick also goes there.
This fast propagation to the current active development branch in general shouldn't be a problem, but a good thing so it is ensured that the fix gets quickly there. But clearly this configure a problem to the later propagation to the stable trees.
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'll go drop this, but ugh, what a mess. It makes me dread every drm patch that gets tagged for stable, and so I postpone taking them until I am done with everything else and can't ignore them anymore.
Please fix your broken process.
When you say drm, do you have same problem with patches coming from other drm drivers, drm-misc, or is it really only Intel trees? (only drm-intel (i915) and drm-xe)?
Intel trees have traditionally been the worst, but normally you all give me some cherry-pick clue on the commits so I can weed them out. That didn't happen here.
But, I will note that the AMD drm tree is now starting to do this, in much worse ways than the Intel tree because there is NO cherry-pick information anywhere, so again, I have no idea what to do and massive conflicts happen.
So AMD is copying your bad behavior, please, both of you stop and fix this so that it isn't so broken.
Again, I understand the need/want to have multiple versions of the same patch in different branches to get fixes merged quicker, but when you do that, please give me a way to determine this, otherwise we have no chance.
To be fair, this one seems to have been an accident. The same commit was cherry-picked to *two* different branches by two different people [1][2], and this is something we try not to do. Any cherry-picks should go to one tree only, it's checked by our scripts, but it's not race free when two people are doing this roughly at the same time.
any cherry-pick SHOULD have the git id referenced when they are cherry-picked, that's what the id is there for. Please always do that.
greg k-h
> (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
My thought was on the stable scripts doing something like that.
For each candidate commit, check if it has the tag line (cherry picked from commit <original-hash>)
if so, then something like: if git rev-parse --quiet --verify <original-hash> || \ git log --grep="cherry picked from commit <original-hash> -E --oneline >/dev/null; then echo "One version of this patch is already in tree. Skipping..." # send-email?! else #attempt to apply the candidate commit...
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'm afraid this is not accurate. Our tooling is taking care of that for us. More details below.
To be fair, this one seems to have been an accident. The same commit was cherry-picked to *two* different branches by two different people [1][2], and this is something we try not to do. Any cherry-picks should go to one tree only, it's checked by our scripts, but it's not race free when two people are doing this roughly at the same time.
Also I don't believe there's anything wrong here. It was a coincidence on the timing, but one is drm-xe-next-fixes-2024-05-09-1 and the other drm-xe-fixes-2024-05-09
both maintained by different people at that time.
any cherry-pick SHOULD have the git id referenced when they are cherry-picked, that's what the id is there for. Please always do that.
Original commit hash is 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de 50aec9665e0b ("drm/xe: Use ordered WQ for G2H handler")
drm-xe-next-fixes-2024-05-09-1 has commit 2d9c72f676e6 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
drm-xe-fixes-2024-05-09 has commit c002bfe644a2 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
Perhaps git itself should be smart to avoid it since the info is there, but unfortunately it is not. So probably a script on top like above could help us to minimize cases like this.
Thanks a lot from the support and patience here.
greg k-h
On Tue, Jun 25, 2024 at 03:02:02PM -0400, Rodrigo Vivi wrote:
> > (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de)
Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
My thought was on the stable scripts doing something like that.
For each candidate commit, check if it has the tag line (cherry picked from commit <original-hash>)
Right there, that fails with how the drm tree works. So you are going to have to come up with something else for how to check this. Or fix your process to make this work.
Look at the commits tagged for stable in the -rc1 merge window. They don't have the "cherry picked" wording as they were not cherry picked! They were the "originals" that were cherry picked from.
if so, then something like: if git rev-parse --quiet --verify <original-hash> || \ git log --grep="cherry picked from commit <original-hash> -E --oneline >/dev/null; then echo "One version of this patch is already in tree. Skipping..." # send-email?! else #attempt to apply the candidate commit...
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'm afraid this is not accurate. Our tooling is taking care of that for us.
Then your tooling needs to be fixed.
To be fair, this one seems to have been an accident. The same commit was cherry-picked to *two* different branches by two different people [1][2], and this is something we try not to do. Any cherry-picks should go to one tree only, it's checked by our scripts, but it's not race free when two people are doing this roughly at the same time.
Also I don't believe there's anything wrong here. It was a coincidence on the timing, but one is drm-xe-next-fixes-2024-05-09-1 and the other drm-xe-fixes-2024-05-09
both maintained by different people at that time.
any cherry-pick SHOULD have the git id referenced when they are cherry-picked, that's what the id is there for. Please always do that.
Original commit hash is 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de 50aec9665e0b ("drm/xe: Use ordered WQ for G2H handler")
And that's not in Linus's tree.
drm-xe-next-fixes-2024-05-09-1 has commit 2d9c72f676e6 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
drm-xe-fixes-2024-05-09 has commit c002bfe644a2 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
Ok, but again, the original is not in Linus's tree.
So how am I do know that the cherry picked line was already in the tree? Added it twice is odd, but really, that's not the common failure here at all. The real problem is the -rc1 window where there is NOT a commit id that can be matched up with (as it happening with a few commits here, and with lots of AMD patches now.)
thanks,
greg k-h
On Tue, Jun 25, 2024 at 10:12:05PM GMT, Greg Kroah-Hartman wrote:
On Tue, Jun 25, 2024 at 03:02:02PM -0400, Rodrigo Vivi wrote:
> > > > (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) > > Would help out here, but it doesn't.
I wonder if there would be a way of automate this on stable scripts to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
But I do understand that any change like this would cause a 'latency' on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
My thought was on the stable scripts doing something like that.
For each candidate commit, check if it has the tag line (cherry picked from commit <original-hash>)
Right there, that fails with how the drm tree works. So you are going to have to come up with something else for how to check this. Or fix your process to make this work.
Look at the commits tagged for stable in the -rc1 merge window. They don't have the "cherry picked" wording as they were not cherry picked! They were the "originals" that were cherry picked from.
if so, then something like: if git rev-parse --quiet --verify <original-hash> || \ git log --grep="cherry picked from commit <original-hash> -E --oneline >/dev/null; then echo "One version of this patch is already in tree. Skipping..." # send-email?! else #attempt to apply the candidate commit...
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'm afraid this is not accurate. Our tooling is taking care of that for us.
Then your tooling needs to be fixed.
To be fair, this one seems to have been an accident. The same commit was cherry-picked to *two* different branches by two different people [1][2], and this is something we try not to do. Any cherry-picks should go to one tree only, it's checked by our scripts, but it's not race free when two people are doing this roughly at the same time.
Also I don't believe there's anything wrong here. It was a coincidence on the timing, but one is drm-xe-next-fixes-2024-05-09-1 and the other drm-xe-fixes-2024-05-09
both maintained by different people at that time.
any cherry-pick SHOULD have the git id referenced when they are cherry-picked, that's what the id is there for. Please always do that.
Original commit hash is 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de 50aec9665e0b ("drm/xe: Use ordered WQ for G2H handler")
And that's not in Linus's tree.
drm-xe-next-fixes-2024-05-09-1 has commit 2d9c72f676e6 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
drm-xe-fixes-2024-05-09 has commit c002bfe644a2 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
Ok, but again, the original is not in Linus's tree.
current workflow is:
All patches go to drm-xe-next which may be targeting the next kernel release or the next->next. Then the fixes (and less frequently, other commits which are not important here) are cherry-picked by maintainers to the current or next kernel release. In other words, the cherry-pick always targets a version 1 less than the original patch.
couple of issues I see:
1) original commit not being in Linus's tree
If you are traversing the branch and you have a "Fixes: XXXX" where XXXX is from a previous release, you will most likely see the "cherry picked from <original>" *before* the <original> commit showing up in that tree.
Examples looking at the most recent fixes in Linus's tree with
$ git rev-parse origin/master 6d6444ba82053c716fb5ac83346202659023044e
$ git log --format="%h %s%n%(trailers)" -n3 --grep "^Fixes:" \ 6d6444ba82053c716fb5ac83346202659023044e \ -- drivers/gpu/drm/xe
d21d44dbdde8 in v6.10-rc5, fixes aef4eb7c7dec from v6.9 but f0ccd2d805e55e12b430d5d6b9acd9f891af455e is not in any tag.
2470b141bfae in v6.10-rc4, fixes 975e4a3795d4 from v6.8 but 6800e63cf97bae62bca56d8e691544540d945f53 is not in any tag.
cd554e1e118a in v6.10-rc4, fixes 0698ff57bf32 from v6.10-rc3 but b321cb83a375bcc18cd0a4b62bdeaf6905cca769 is not in any tag.
Without a change in the workflow, I don't think you can rely on the original commit hash being present in Linus's tree, unless you wait one entire kernel release cycle.
2) Duplicate commits with "cherry picked from ... "
This only happened because it was cherry-picked in 2 branches, both for current and for next kernel releases. This can only happen during the last rcN as drm-xe-next-fixes is only used there.
I think (2) is easier to fix: we should really add more checks in dim to avoid that: if a commit is a candidate to drm-xe-fixes it should never be a candidate to drm-xe-next-fixes.
As for (1), I don't see how it could be fixed without a workflow change. That would require committers to add the commits to the right branch rather than relying on maintainers to cherry-pick them to the -fixes branch after they are merged in drm-xe-next. What dim could do is to automate that for the committer and figure out the branch by itself based on the 2 commit hashes.
For all the above I used drm-xe-* as example, but it applies to drm-intel-* too.
Without the fix to (2) and workflow change from (1), this situation can be avoided from the -stable scripts by checking if the commit or any cherry-pick thereof is already in the stable tree:
git merge-base --is-ancestor $original_commit HEAD git log -n1 --grep "^(cherry picked from commit $original_commit)"
the second search is very expensive though.... probably need to find a way to limit the search space.
Lucas De Marchi
So how am I do know that the cherry picked line was already in the tree? Added it twice is odd, but really, that's not the common failure here at all. The real problem is the -rc1 window where there is NOT a commit id that can be matched up with (as it happening with a few commits here, and with lots of AMD patches now.)
thanks,
greg k-h
On Thu, Jun 27, 2024 at 06:32:37PM -0500, Lucas De Marchi wrote:
On Tue, Jun 25, 2024 at 10:12:05PM GMT, Greg Kroah-Hartman wrote:
On Tue, Jun 25, 2024 at 03:02:02PM -0400, Rodrigo Vivi wrote:
> > > > > > (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) > > > > Would help out here, but it doesn't. > > I wonder if there would be a way of automate this on stable scripts > to avoid attempting a cherry pick that is already there.
Please tell me how to do so.
> But I do understand that any change like this would cause a 'latency' > on the scripts and slow down everything.
Depends, I already put a huge latency on drm stable patches because of this mess. And I see few, if any, actual backports for when I report FAILED stable patches, so what is going to get slower than it currently is?
My thought was on the stable scripts doing something like that.
For each candidate commit, check if it has the tag line (cherry picked from commit <original-hash>)
Right there, that fails with how the drm tree works. So you are going to have to come up with something else for how to check this. Or fix your process to make this work.
Look at the commits tagged for stable in the -rc1 merge window. They don't have the "cherry picked" wording as they were not cherry picked! They were the "originals" that were cherry picked from.
if so, then something like: if git rev-parse --quiet --verify <original-hash> || \ git log --grep="cherry picked from commit <original-hash> -E --oneline >/dev/null; then echo "One version of this patch is already in tree. Skipping..." # send-email?! else #attempt to apply the candidate commit...
Normally you all tag these cherry-picks as such. You didn't do that here or either place, so there was no way for anyone to know. Please fix that.
I'm afraid this is not accurate. Our tooling is taking care of that for us.
Then your tooling needs to be fixed.
To be fair, this one seems to have been an accident. The same commit was cherry-picked to *two* different branches by two different people [1][2], and this is something we try not to do. Any cherry-picks should go to one tree only, it's checked by our scripts, but it's not race free when two people are doing this roughly at the same time.
Also I don't believe there's anything wrong here. It was a coincidence on the timing, but one is drm-xe-next-fixes-2024-05-09-1 and the other drm-xe-fixes-2024-05-09
both maintained by different people at that time.
any cherry-pick SHOULD have the git id referenced when they are cherry-picked, that's what the id is there for. Please always do that.
Original commit hash is 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de 50aec9665e0b ("drm/xe: Use ordered WQ for G2H handler")
And that's not in Linus's tree.
drm-xe-next-fixes-2024-05-09-1 has commit 2d9c72f676e6 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com
drm-xe-fixes-2024-05-09 has commit c002bfe644a2 ("drm/xe: Use ordered WQ for G2H handler") which contains: (cherry picked from commit 50aec9665e0babd62b9eee4e613d9a1ef8d2b7de) Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com
Ok, but again, the original is not in Linus's tree.
current workflow is:
All patches go to drm-xe-next which may be targeting the next kernel release or the next->next. Then the fixes (and less frequently, other commits which are not important here) are cherry-picked by maintainers to the current or next kernel release. In other words, the cherry-pick always targets a version 1 less than the original patch.
That's fine, as long as that cherry-pick references the git id of the original one. Which did not happen here, and does NOT happen at all for all drm branches (like AMD right now).
couple of issues I see:
- original commit not being in Linus's tree
If you are traversing the branch and you have a "Fixes: XXXX" where XXXX is from a previous release, you will most likely see the "cherry picked from <original>" *before* the <original> commit showing up in that tree.
Examples looking at the most recent fixes in Linus's tree with
$ git rev-parse origin/master 6d6444ba82053c716fb5ac83346202659023044e
$ git log --format="%h %s%n%(trailers)" -n3 --grep "^Fixes:" \ 6d6444ba82053c716fb5ac83346202659023044e \ -- drivers/gpu/drm/xe
d21d44dbdde8 in v6.10-rc5, fixes aef4eb7c7dec from v6.9 but f0ccd2d805e55e12b430d5d6b9acd9f891af455e is not in any tag.
2470b141bfae in v6.10-rc4, fixes 975e4a3795d4 from v6.8 but 6800e63cf97bae62bca56d8e691544540d945f53 is not in any tag.
cd554e1e118a in v6.10-rc4, fixes 0698ff57bf32 from v6.10-rc3 but b321cb83a375bcc18cd0a4b62bdeaf6905cca769 is not in any tag.
Without a change in the workflow, I don't think you can rely on the original commit hash being present in Linus's tree, unless you wait one entire kernel release cycle.
Which you obviously do not want me to do as you wanted that fix in the tree sooner.
So this is a pain, but I'm kind of used to it by now as you all seem to like this workflow for whatever reason...
- Duplicate commits with "cherry picked from ... "
This only happened because it was cherry-picked in 2 branches, both for current and for next kernel releases. This can only happen during the last rcN as drm-xe-next-fixes is only used there.
I think (2) is easier to fix: we should really add more checks in dim to avoid that: if a commit is a candidate to drm-xe-fixes it should never be a candidate to drm-xe-next-fixes.
Please fix.
As for (1), I don't see how it could be fixed without a workflow change. That would require committers to add the commits to the right branch rather than relying on maintainers to cherry-pick them to the -fixes branch after they are merged in drm-xe-next. What dim could do is to automate that for the committer and figure out the branch by itself based on the 2 commit hashes.
For all the above I used drm-xe-* as example, but it applies to drm-intel-* too.
Without the fix to (2) and workflow change from (1), this situation can be avoided from the -stable scripts by checking if the commit or any cherry-pick thereof is already in the stable tree:
git merge-base --is-ancestor $original_commit HEAD git log -n1 --grep "^(cherry picked from commit $original_commit)"
the second search is very expensive though.... probably need to find a way to limit the search space.
That's still a major pain, as is your current workflow. So until you all fix this up, I will continue to complain as the existing way drm patches flow into stable trees is a major pain which causes me to dread looking at them every time.
greg k-h
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Borislav Petkov (AMD) bp@alien8.de
[ Upstream commit 95bfb35269b2e85cff0dd2c957b2d42ebf95ae5f ]
Drop 'vp_bits_from_cpuid' as it is not really needed.
No functional changes.
Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Ingo Molnar mingo@kernel.org Link: https://lore.kernel.org/r/20240316120706.4352-1-bp@alien8.de Stable-dep-of: 2a38e4ca3022 ("x86/cpu: Provide default cache line size if not enumerated") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/common.c | 17 +++++++---------- 1 file changed, 7 insertions(+), 10 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index ae987a26f26e4..d636991536a5f 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1053,18 +1053,9 @@ void get_cpu_cap(struct cpuinfo_x86 *c) void get_cpu_address_sizes(struct cpuinfo_x86 *c) { u32 eax, ebx, ecx, edx; - bool vp_bits_from_cpuid = true;
if (!cpu_has(c, X86_FEATURE_CPUID) || - (c->extended_cpuid_level < 0x80000008)) - vp_bits_from_cpuid = false; - - if (vp_bits_from_cpuid) { - cpuid(0x80000008, &eax, &ebx, &ecx, &edx); - - c->x86_virt_bits = (eax >> 8) & 0xff; - c->x86_phys_bits = eax & 0xff; - } else { + (c->extended_cpuid_level < 0x80000008)) { if (IS_ENABLED(CONFIG_X86_64)) { c->x86_clflush_size = 64; c->x86_phys_bits = 36; @@ -1078,7 +1069,13 @@ void get_cpu_address_sizes(struct cpuinfo_x86 *c) cpu_has(c, X86_FEATURE_PSE36)) c->x86_phys_bits = 36; } + } else { + cpuid(0x80000008, &eax, &ebx, &ecx, &edx); + + c->x86_virt_bits = (eax >> 8) & 0xff; + c->x86_phys_bits = eax & 0xff; } + c->x86_cache_bits = c->x86_phys_bits; c->x86_cache_alignment = c->x86_clflush_size; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Hansen dave.hansen@linux.intel.com
[ Upstream commit 2a38e4ca302280fdcce370ba2bee79bac16c4587 ]
tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH] will end up reporting cache_line_size()==0 and bad things happen. Fill in a default on those to avoid the problem.
Long Story:
The kernel dies a horrible death if c->x86_cache_alignment (aka. cache_line_size() is 0. Normally, this value is populated from c->x86_clflush_size.
Right now the code is set up to get c->x86_clflush_size from two places. First, modern CPUs get it from CPUID. Old CPUs that don't have leaf 0x80000008 (or CPUID at all) just get some sane defaults from the kernel in get_cpu_address_sizes().
The vast majority of CPUs that have leaf 0x80000008 also get ->x86_clflush_size from CPUID. But there are oddballs.
Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:
cpuid(0x00000001, &tfms, &misc, &junk, &cap0); if (cap0 & (1<<19)) c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;
So they: land in get_cpu_address_sizes() and see that CPUID has level 0x80000008 and jump into the side of the if() that does not fill in c->x86_clflush_size. That assigns a 0 to c->x86_cache_alignment, and hilarity ensues in code like:
buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()), GFP_KERNEL);
To fix this, always provide a sane value for ->x86_clflush_size.
Big thanks to Andy Shevchenko for finding and reporting this and also providing a first pass at a fix. But his fix was only partial and only worked on the Quark CPUs. It would not, for instance, have worked on the QEMU config.
1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel/... 2. You can also get this behavior if you use "-cpu 486,+clzero" in QEMU.
[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog because bpetkov brutally murdered it recently. ]
Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach") Reported-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Tested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Tested-by: Jörn Heusipp osmanx@heusipp.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linux... Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.de... Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.in... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/common.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c index d636991536a5f..1982007828276 100644 --- a/arch/x86/kernel/cpu/common.c +++ b/arch/x86/kernel/cpu/common.c @@ -1074,6 +1074,10 @@ void get_cpu_address_sizes(struct cpuinfo_x86 *c)
c->x86_virt_bits = (eax >> 8) & 0xff; c->x86_phys_bits = eax & 0xff; + + /* Provide a sane default if not enumerated: */ + if (!c->x86_clflush_size) + c->x86_clflush_size = 32; }
c->x86_cache_bits = c->x86_phys_bits;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
[ Upstream commit 69e545edbe8b17c26aa06ef7e430d0be7f08d876 ]
After commit f7d5bcd35d42 ("selftests: kselftest: Mark functions that unconditionally call exit() as __noreturn"), ksft_exit_...() functions are marked as __noreturn, which means the return type should not be 'int' but 'void' because they are not returning anything (and never were since exit() has always been called).
To facilitate updating the return type of these functions, remove 'return' before the calls to ksft_exit_...(), as __noreturn prevents the compiler from warning that a caller of the ksft_exit functions does not return a value because the program will terminate upon calling these functions.
Reviewed-by: Muhammad Usama Anjum usama.anjum@collabora.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Nathan Chancellor nathan@kernel.org Signed-off-by: Shuah Khan skhan@linuxfoundation.org Stable-dep-of: fb9293b6b015 ("selftests/mm: compaction_test: fix bogus test success and reduce probability of OOM-killer invocation") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/mm/compaction_test.c | 6 +++--- tools/testing/selftests/mm/cow.c | 2 +- tools/testing/selftests/mm/gup_longterm.c | 2 +- tools/testing/selftests/mm/gup_test.c | 4 ++-- tools/testing/selftests/mm/ksm_functional_tests.c | 2 +- tools/testing/selftests/mm/madv_populate.c | 2 +- tools/testing/selftests/mm/mkdirty.c | 2 +- tools/testing/selftests/mm/pagemap_ioctl.c | 4 ++-- tools/testing/selftests/mm/soft-dirty.c | 2 +- 9 files changed, 13 insertions(+), 13 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c index 326a893647bab..5e9bd1da9370c 100644 --- a/tools/testing/selftests/mm/compaction_test.c +++ b/tools/testing/selftests/mm/compaction_test.c @@ -185,7 +185,7 @@ int main(int argc, char **argv) ksft_print_header();
if (prereq() || geteuid()) - return ksft_exit_skip("Prerequisites unsatisfied\n"); + ksft_exit_skip("Prerequisites unsatisfied\n");
ksft_set_plan(1);
@@ -233,7 +233,7 @@ int main(int argc, char **argv) }
if (check_compaction(mem_free, hugepage_size) == 0) - return ksft_exit_pass(); + ksft_exit_pass();
- return ksft_exit_fail(); + ksft_exit_fail(); } diff --git a/tools/testing/selftests/mm/cow.c b/tools/testing/selftests/mm/cow.c index 363bf5f801be5..fe078d6e18064 100644 --- a/tools/testing/selftests/mm/cow.c +++ b/tools/testing/selftests/mm/cow.c @@ -1779,5 +1779,5 @@ int main(int argc, char **argv) if (err) ksft_exit_fail_msg("%d out of %d tests failed\n", err, ksft_test_num()); - return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/gup_longterm.c b/tools/testing/selftests/mm/gup_longterm.c index ad168d35b23b7..d7eaca5bbe9b1 100644 --- a/tools/testing/selftests/mm/gup_longterm.c +++ b/tools/testing/selftests/mm/gup_longterm.c @@ -456,5 +456,5 @@ int main(int argc, char **argv) if (err) ksft_exit_fail_msg("%d out of %d tests failed\n", err, ksft_test_num()); - return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/gup_test.c b/tools/testing/selftests/mm/gup_test.c index 7821cf45c323b..bdeaac67ff9aa 100644 --- a/tools/testing/selftests/mm/gup_test.c +++ b/tools/testing/selftests/mm/gup_test.c @@ -229,7 +229,7 @@ int main(int argc, char **argv) break; } ksft_test_result_skip("Please run this test as root\n"); - return ksft_exit_pass(); + ksft_exit_pass(); }
p = mmap(NULL, size, PROT_READ | PROT_WRITE, flags, filed, 0); @@ -268,5 +268,5 @@ int main(int argc, char **argv)
free(tid);
- return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/ksm_functional_tests.c b/tools/testing/selftests/mm/ksm_functional_tests.c index d615767e396be..508287560c455 100644 --- a/tools/testing/selftests/mm/ksm_functional_tests.c +++ b/tools/testing/selftests/mm/ksm_functional_tests.c @@ -646,5 +646,5 @@ int main(int argc, char **argv) if (err) ksft_exit_fail_msg("%d out of %d tests failed\n", err, ksft_test_num()); - return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/madv_populate.c b/tools/testing/selftests/mm/madv_populate.c index 17bcb07f19f34..ef7d911da13e0 100644 --- a/tools/testing/selftests/mm/madv_populate.c +++ b/tools/testing/selftests/mm/madv_populate.c @@ -307,5 +307,5 @@ int main(int argc, char **argv) if (err) ksft_exit_fail_msg("%d out of %d tests failed\n", err, ksft_test_num()); - return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/mkdirty.c b/tools/testing/selftests/mm/mkdirty.c index 301abb99e027e..b8a7efe9204ea 100644 --- a/tools/testing/selftests/mm/mkdirty.c +++ b/tools/testing/selftests/mm/mkdirty.c @@ -375,5 +375,5 @@ int main(void) if (err) ksft_exit_fail_msg("%d out of %d tests failed\n", err, ksft_test_num()); - return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/pagemap_ioctl.c b/tools/testing/selftests/mm/pagemap_ioctl.c index d59517ed3d48b..2d785aca72a5c 100644 --- a/tools/testing/selftests/mm/pagemap_ioctl.c +++ b/tools/testing/selftests/mm/pagemap_ioctl.c @@ -1484,7 +1484,7 @@ int main(int argc, char *argv[]) ksft_print_header();
if (init_uffd()) - return ksft_exit_pass(); + ksft_exit_pass();
ksft_set_plan(115);
@@ -1660,5 +1660,5 @@ int main(int argc, char *argv[]) userfaultfd_tests();
close(pagemap_fd); - return ksft_exit_pass(); + ksft_exit_pass(); } diff --git a/tools/testing/selftests/mm/soft-dirty.c b/tools/testing/selftests/mm/soft-dirty.c index 7dbfa53d93a05..d9dbf879748b2 100644 --- a/tools/testing/selftests/mm/soft-dirty.c +++ b/tools/testing/selftests/mm/soft-dirty.c @@ -209,5 +209,5 @@ int main(int argc, char **argv)
close(pagemap_fd);
- return ksft_exit_pass(); + ksft_exit_pass(); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dev Jain dev.jain@arm.com
[ Upstream commit fb9293b6b0156fbf6ab97a1625d99a29c36d9f0c ]
Reset nr_hugepages to zero before the start of the test.
If a non-zero number of hugepages is already set before the start of the test, the following problems arise:
- The probability of the test getting OOM-killed increases. Proof: The test wants to run on 80% of available memory to prevent OOM-killing (see original code comments). Let the value of mem_free at the start of the test, when nr_hugepages = 0, be x. In the other case, when nr_hugepages > 0, let the memory consumed by hugepages be y. In the former case, the test operates on 0.8 * x of memory. In the latter, the test operates on 0.8 * (x - y) of memory, with y already filled, hence, memory consumed is y + 0.8 * (x - y) = 0.8 * x + 0.2 * y > 0.8 * x. Q.E.D
- The probability of a bogus test success increases. Proof: Let the memory consumed by hugepages be greater than 25% of x, with x and y defined as above. The definition of compaction_index is c_index = (x - y)/z where z is the memory consumed by hugepages after trying to increase them again. In check_compaction(), we set the number of hugepages to zero, and then increase them back; the probability that they will be set back to consume at least y amount of memory again is very high (since there is not much delay between the two attempts of changing nr_hugepages). Hence, z >= y > (x/4) (by the 25% assumption). Therefore, c_index = (x - y)/z <= (x - y)/y = x/y - 1 < 4 - 1 = 3 hence, c_index can always be forced to be less than 3, thereby the test succeeding always. Q.E.D
Link: https://lkml.kernel.org/r/20240521074358.675031-4-dev.jain@arm.com Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory") Signed-off-by: Dev Jain dev.jain@arm.com Cc: stable@vger.kernel.org Cc: Anshuman Khandual anshuman.khandual@arm.com Cc: Shuah Khan shuah@kernel.org Cc: Sri Jayaramappa sjayaram@akamai.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/mm/compaction_test.c | 71 ++++++++++++++------ 1 file changed, 49 insertions(+), 22 deletions(-)
diff --git a/tools/testing/selftests/mm/compaction_test.c b/tools/testing/selftests/mm/compaction_test.c index 5e9bd1da9370c..e140558e6f53f 100644 --- a/tools/testing/selftests/mm/compaction_test.c +++ b/tools/testing/selftests/mm/compaction_test.c @@ -82,13 +82,16 @@ int prereq(void) return -1; }
-int check_compaction(unsigned long mem_free, unsigned long hugepage_size) +int check_compaction(unsigned long mem_free, unsigned long hugepage_size, + unsigned long initial_nr_hugepages) { unsigned long nr_hugepages_ul; int fd, ret = -1; int compaction_index = 0; - char initial_nr_hugepages[20] = {0}; char nr_hugepages[20] = {0}; + char init_nr_hugepages[20] = {0}; + + sprintf(init_nr_hugepages, "%lu", initial_nr_hugepages);
/* We want to test with 80% of available memory. Else, OOM killer comes in to play */ @@ -102,23 +105,6 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size) goto out; }
- if (read(fd, initial_nr_hugepages, sizeof(initial_nr_hugepages)) <= 0) { - ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n", - strerror(errno)); - goto close_fd; - } - - lseek(fd, 0, SEEK_SET); - - /* Start with the initial condition of 0 huge pages*/ - if (write(fd, "0", sizeof(char)) != sizeof(char)) { - ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n", - strerror(errno)); - goto close_fd; - } - - lseek(fd, 0, SEEK_SET); - /* Request a large number of huge pages. The Kernel will allocate as much as it can */ if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) { @@ -146,8 +132,8 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size)
lseek(fd, 0, SEEK_SET);
- if (write(fd, initial_nr_hugepages, strlen(initial_nr_hugepages)) - != strlen(initial_nr_hugepages)) { + if (write(fd, init_nr_hugepages, strlen(init_nr_hugepages)) + != strlen(init_nr_hugepages)) { ksft_print_msg("Failed to write value to /proc/sys/vm/nr_hugepages: %s\n", strerror(errno)); goto close_fd; @@ -171,6 +157,41 @@ int check_compaction(unsigned long mem_free, unsigned long hugepage_size) return ret; }
+int set_zero_hugepages(unsigned long *initial_nr_hugepages) +{ + int fd, ret = -1; + char nr_hugepages[20] = {0}; + + fd = open("/proc/sys/vm/nr_hugepages", O_RDWR | O_NONBLOCK); + if (fd < 0) { + ksft_print_msg("Failed to open /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); + goto out; + } + if (read(fd, nr_hugepages, sizeof(nr_hugepages)) <= 0) { + ksft_print_msg("Failed to read from /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); + goto close_fd; + } + + lseek(fd, 0, SEEK_SET); + + /* Start with the initial condition of 0 huge pages */ + if (write(fd, "0", sizeof(char)) != sizeof(char)) { + ksft_print_msg("Failed to write 0 to /proc/sys/vm/nr_hugepages: %s\n", + strerror(errno)); + goto close_fd; + } + + *initial_nr_hugepages = strtoul(nr_hugepages, NULL, 10); + ret = 0; + + close_fd: + close(fd); + + out: + return ret; +}
int main(int argc, char **argv) { @@ -181,6 +202,7 @@ int main(int argc, char **argv) unsigned long mem_free = 0; unsigned long hugepage_size = 0; long mem_fragmentable_MB = 0; + unsigned long initial_nr_hugepages;
ksft_print_header();
@@ -189,6 +211,10 @@ int main(int argc, char **argv)
ksft_set_plan(1);
+ /* Start the test without hugepages reducing mem_free */ + if (set_zero_hugepages(&initial_nr_hugepages)) + ksft_exit_fail(); + lim.rlim_cur = RLIM_INFINITY; lim.rlim_max = RLIM_INFINITY; if (setrlimit(RLIMIT_MEMLOCK, &lim)) @@ -232,7 +258,8 @@ int main(int argc, char **argv) entry = entry->next; }
- if (check_compaction(mem_free, hugepage_size) == 0) + if (check_compaction(mem_free, hugepage_size, + initial_nr_hugepages) == 0) ksft_exit_pass();
ksft_exit_fail();
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
[ Upstream commit 340f0c7067a95281ad13734f8225f49c6cf52067 ]
The change to update the permissions of the eventfs_inode had the misconception that using the tracefs_inode would find all the eventfs_inodes that have been updated and reset them on remount. The problem with this approach is that the eventfs_inodes are freed when they are no longer used (basically the reason the eventfs system exists). When they are freed, the updated eventfs_inodes are not reset on a remount because their tracefs_inodes have been freed.
Instead, since the events directory eventfs_inode always has a tracefs_inode pointing to it (it is not freed when finished), and the events directory has a link to all its children, have the eventfs_remount() function only operate on the events eventfs_inode and have it descend into its children updating their uid and gids.
Link: https://lore.kernel.org/all/CAK7LNARXgaWw3kH9JgrnH4vK6fr8LDkNKf3wq8NhMWJrVwJ... Link: https://lore.kernel.org/linux-trace-kernel/20240523051539.754424703@goodmis....
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Andrew Morton akpm@linux-foundation.org Fixes: baa23a8d4360d ("tracefs: Reset permissions on remount if permissions are options") Reported-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/tracefs/event_inode.c | 51 ++++++++++++++++++++++++++++++---------- 1 file changed, 39 insertions(+), 12 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c index 55a40a730b10c..129d0f54ba62f 100644 --- a/fs/tracefs/event_inode.c +++ b/fs/tracefs/event_inode.c @@ -305,33 +305,60 @@ static const struct file_operations eventfs_file_operations = { .llseek = generic_file_llseek, };
-/* - * On a remount of tracefs, if UID or GID options are set, then - * the mount point inode permissions should be used. - * Reset the saved permission flags appropriately. - */ -void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid) +static void eventfs_set_attrs(struct eventfs_inode *ei, bool update_uid, kuid_t uid, + bool update_gid, kgid_t gid, int level) { - struct eventfs_inode *ei = ti->private; + struct eventfs_inode *ei_child;
- if (!ei) + /* Update events/<system>/<event> */ + if (WARN_ON_ONCE(level > 3)) return;
- if (update_uid) + if (update_uid) { ei->attr.mode &= ~EVENTFS_SAVE_UID; + ei->attr.uid = uid; + }
- if (update_gid) + if (update_gid) { ei->attr.mode &= ~EVENTFS_SAVE_GID; + ei->attr.gid = gid; + } + + list_for_each_entry(ei_child, &ei->children, list) { + eventfs_set_attrs(ei_child, update_uid, uid, update_gid, gid, level + 1); + }
if (!ei->entry_attrs) return;
for (int i = 0; i < ei->nr_entries; i++) { - if (update_uid) + if (update_uid) { ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_UID; - if (update_gid) + ei->entry_attrs[i].uid = uid; + } + if (update_gid) { ei->entry_attrs[i].mode &= ~EVENTFS_SAVE_GID; + ei->entry_attrs[i].gid = gid; + } } + +} + +/* + * On a remount of tracefs, if UID or GID options are set, then + * the mount point inode permissions should be used. + * Reset the saved permission flags appropriately. + */ +void eventfs_remount(struct tracefs_inode *ti, bool update_uid, bool update_gid) +{ + struct eventfs_inode *ei = ti->private; + + /* Only the events directory does the updates */ + if (!ei || !ei->is_events || ei->is_freed) + return; + + eventfs_set_attrs(ei, update_uid, ti->vfs_inode.i_uid, + update_gid, ti->vfs_inode.i_gid, 0); }
/* Return the evenfs_inode of the "events" directory */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 7da9dfdd5a3dbfd3d2450d9c6a3d1d699d625c43 upstream.
Some editors (like the vim variants), when seeing "trim_whitespace" decide to do just that for all of the whitespace in the file you are saving, even if it is not on a line that you have modified. This plays havoc with diffs and is NOT something that should be intended.
As the "only trim whitespace on modified lines" is not part of the editorconfig standard yet, just delete these lines from the .editorconfig file so that we don't end up with diffs that are automatically rejected by maintainers for containing things they shouldn't.
Cc: Danny Lin danny@kdrag0n.dev Cc: Íñigo Huguet ihuguet@redhat.com Cc: Mickaël Salaün mic@digikod.net Cc: Masahiro Yamada masahiroy@kernel.org Fixes: 5a602de99797 ("Add .editorconfig file for basic formatting") Acked-by: Vincent Mailhol mailhol.vincent@wanadoo.fr Link: https://lore.kernel.org/r/2024061137-jawless-dipped-e789@gregkh Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- .editorconfig | 3 --- 1 file changed, 3 deletions(-)
--- a/.editorconfig +++ b/.editorconfig @@ -5,7 +5,6 @@ root = true [{*.{awk,c,dts,dtsi,dtso,h,mk,s,S},Kconfig,Makefile,Makefile.*}] charset = utf-8 end_of_line = lf -trim_trailing_whitespace = true insert_final_newline = true indent_style = tab indent_size = 8 @@ -13,7 +12,6 @@ indent_size = 8 [*.{json,py,rs}] charset = utf-8 end_of_line = lf -trim_trailing_whitespace = true insert_final_newline = true indent_style = space indent_size = 4 @@ -26,7 +24,6 @@ indent_size = 8 [*.yaml] charset = utf-8 end_of_line = lf -trim_trailing_whitespace = unset insert_final_newline = true indent_style = space indent_size = 2
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit 54559642b96116b45e4b5ca7fd9f7835b8561272 upstream.
There is a report of io_rsrc_ref_quiesce() locking a mutex while not TASK_RUNNING, which is due to forgetting restoring the state back after io_run_task_work_sig() and attempts to break out of the waiting loop.
do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff815d2494>] prepare_to_wait+0xa4/0x380 kernel/sched/wait.c:237 WARNING: CPU: 2 PID: 397056 at kernel/sched/core.c:10099 __might_sleep+0x114/0x160 kernel/sched/core.c:10099 RIP: 0010:__might_sleep+0x114/0x160 kernel/sched/core.c:10099 Call Trace: <TASK> __mutex_lock_common kernel/locking/mutex.c:585 [inline] __mutex_lock+0xb4/0x940 kernel/locking/mutex.c:752 io_rsrc_ref_quiesce+0x590/0x940 io_uring/rsrc.c:253 io_sqe_buffers_unregister+0xa2/0x340 io_uring/rsrc.c:799 __io_uring_register io_uring/register.c:424 [inline] __do_sys_io_uring_register+0x5b9/0x2400 io_uring/register.c:613 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xd8/0x270 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x6f/0x77
Reported-by: Li Shi sl1589472800@gmail.com Fixes: 4ea15b56f0810 ("io_uring/rsrc: use wq for quiescing") Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/77966bc104e25b0534995d5dbb152332bc8f31c0.171819695... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/rsrc.c | 1 + 1 file changed, 1 insertion(+)
--- a/io_uring/rsrc.c +++ b/io_uring/rsrc.c @@ -250,6 +250,7 @@ __cold static int io_rsrc_ref_quiesce(st
ret = io_run_task_work_sig(ctx); if (ret < 0) { + __set_current_state(TASK_RUNNING); mutex_lock(&ctx->uring_lock); if (list_empty(&ctx->rsrc_ref_list)) ret = 0;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
commit f4a1254f2a076afb0edd473589bf40f9b4d36b41 upstream.
Only the current owner of a request is allowed to write into req->flags. Hence, the cancellation path should never touch it. Add a new field instead of the flag, move it into the 3rd cache line because it should always be initialised. poll_refs can move further as polling is an involved process anyway.
It's a minimal patch, in the future we can and should find a better place for it and remove now unused REQ_F_CANCEL_SEQ.
Fixes: 521223d7c229f ("io_uring/cancel: don't default to setting req->work.cancel_seq") Cc: stable@vger.kernel.org Reported-by: Li Shi sl1589472800@gmail.com Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/6827b129f8f0ad76fa9d1f0a773de938b240ffab.171832343... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/io_uring_types.h | 3 ++- io_uring/cancel.h | 4 ++-- io_uring/io_uring.c | 1 + 3 files changed, 5 insertions(+), 3 deletions(-)
--- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -653,7 +653,7 @@ struct io_kiocb { struct io_rsrc_node *rsrc_node;
atomic_t refs; - atomic_t poll_refs; + bool cancel_seq_set; struct io_task_work io_task_work; /* for polled requests, i.e. IORING_OP_POLL_ADD and async armed poll */ struct hlist_node hash_node; @@ -662,6 +662,7 @@ struct io_kiocb { /* opcode allocated if it needs to store data for async defer */ void *async_data; /* linked requests, IFF REQ_F_HARDLINK or REQ_F_LINK are set */ + atomic_t poll_refs; struct io_kiocb *link; /* custom credentials, valid IFF REQ_F_CREDS is set */ const struct cred *creds; --- a/io_uring/cancel.h +++ b/io_uring/cancel.h @@ -27,10 +27,10 @@ bool io_cancel_req_match(struct io_kiocb
static inline bool io_cancel_match_sequence(struct io_kiocb *req, int sequence) { - if ((req->flags & REQ_F_CANCEL_SEQ) && sequence == req->work.cancel_seq) + if (req->cancel_seq_set && sequence == req->work.cancel_seq) return true;
- req->flags |= REQ_F_CANCEL_SEQ; + req->cancel_seq_set = true; req->work.cancel_seq = sequence; return false; } --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2211,6 +2211,7 @@ static int io_init_req(struct io_ring_ct req->file = NULL; req->rsrc_node = NULL; req->task = current; + req->cancel_seq_set = false;
if (unlikely(opcode >= IORING_OP_LAST)) { req->opcode = 0;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alan Stern stern@rowland.harvard.edu
commit 22f00812862564b314784167a89f27b444f82a46 upstream.
The syzbot fuzzer found that the interrupt-URB completion callback in the cdc-wdm driver was taking too long, and the driver's immediate resubmission of interrupt URBs with -EPROTO status combined with the dummy-hcd emulation to cause a CPU lockup:
cdc_wdm 1-1:1.0: nonzero urb status received: -71 cdc_wdm 1-1:1.0: wdm_int_callback - 0 bytes watchdog: BUG: soft lockup - CPU#0 stuck for 26s! [syz-executor782:6625] CPU#0 Utilization every 4s during lockup: #1: 98% system, 0% softirq, 3% hardirq, 0% idle #2: 98% system, 0% softirq, 3% hardirq, 0% idle #3: 98% system, 0% softirq, 3% hardirq, 0% idle #4: 98% system, 0% softirq, 3% hardirq, 0% idle #5: 98% system, 1% softirq, 3% hardirq, 0% idle Modules linked in: irq event stamp: 73096 hardirqs last enabled at (73095): [<ffff80008037bc00>] console_emit_next_record kernel/printk/printk.c:2935 [inline] hardirqs last enabled at (73095): [<ffff80008037bc00>] console_flush_all+0x650/0xb74 kernel/printk/printk.c:2994 hardirqs last disabled at (73096): [<ffff80008af10b00>] __el1_irq arch/arm64/kernel/entry-common.c:533 [inline] hardirqs last disabled at (73096): [<ffff80008af10b00>] el1_interrupt+0x24/0x68 arch/arm64/kernel/entry-common.c:551 softirqs last enabled at (73048): [<ffff8000801ea530>] softirq_handle_end kernel/softirq.c:400 [inline] softirqs last enabled at (73048): [<ffff8000801ea530>] handle_softirqs+0xa60/0xc34 kernel/softirq.c:582 softirqs last disabled at (73043): [<ffff800080020de8>] __do_softirq+0x14/0x20 kernel/softirq.c:588 CPU: 0 PID: 6625 Comm: syz-executor782 Tainted: G W 6.10.0-rc2-syzkaller-g8867bbd4a056 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024
Testing showed that the problem did not occur if the two error messages -- the first two lines above -- were removed; apparently adding material to the kernel log takes a surprisingly large amount of time.
In any case, the best approach for preventing these lockups and to avoid spamming the log with thousands of error messages per second is to ratelimit the two dev_err() calls. Therefore we replace them with dev_err_ratelimited().
Signed-off-by: Alan Stern stern@rowland.harvard.edu Suggested-by: Greg KH gregkh@linuxfoundation.org Reported-and-tested-by: syzbot+5f996b83575ef4058638@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/00000000000073d54b061a6a1c65@google.com/ Reported-and-tested-by: syzbot+1b2abad17596ad03dcff@syzkaller.appspotmail.com Closes: https://lore.kernel.org/linux-usb/000000000000f45085061aa9b37e@google.com/ Fixes: 9908a32e94de ("USB: remove err() macro from usb class drivers") Link: https://lore.kernel.org/linux-usb/40dfa45b-5f21-4eef-a8c1-51a2f320e267@rowla... Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/29855215-52f5-4385-b058-91f42c2bee18@rowland.harva... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/class/cdc-wdm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/usb/class/cdc-wdm.c +++ b/drivers/usb/class/cdc-wdm.c @@ -266,14 +266,14 @@ static void wdm_int_callback(struct urb dev_err(&desc->intf->dev, "Stall on int endpoint\n"); goto sw; /* halt is cleared in work */ default: - dev_err(&desc->intf->dev, + dev_err_ratelimited(&desc->intf->dev, "nonzero urb status received: %d\n", status); break; } }
if (urb->actual_length < sizeof(struct usb_cdc_notification)) { - dev_err(&desc->intf->dev, "wdm_int_callback - %d bytes\n", + dev_err_ratelimited(&desc->intf->dev, "wdm_int_callback - %d bytes\n", urb->actual_length); goto exit; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrey Konovalov andreyknvl@gmail.com
commit f85d39dd7ed89ffdd622bc1de247ffba8d961504 upstream.
After commit 8fea0c8fda30 ("usb: core: hcd: Convert from tasklet to BH workqueue"), usb_giveback_urb_bh() runs in the BH workqueue with interrupts enabled.
Thus, the remote coverage collection section in usb_giveback_urb_bh()-> __usb_hcd_giveback_urb() might be interrupted, and the interrupt handler might invoke __usb_hcd_giveback_urb() again.
This breaks KCOV, as it does not support nested remote coverage collection sections within the same context (neither in task nor in softirq).
Update kcov_remote_start/stop_usb_softirq() to disable interrupts for the duration of the coverage collection section to avoid nested sections in the softirq context (in addition to such in the task context, which are already handled).
Reported-by: Tetsuo Handa penguin-kernel@i-love.sakura.ne.jp Closes: https://lore.kernel.org/linux-usb/0f4d1964-7397-485b-bc48-11c01e2fcbca@I-lov... Closes: https://syzkaller.appspot.com/bug?extid=0438378d6f157baae1a2 Suggested-by: Alan Stern stern@rowland.harvard.edu Fixes: 8fea0c8fda30 ("usb: core: hcd: Convert from tasklet to BH workqueue") Cc: stable@vger.kernel.org Acked-by: Dmitry Vyukov dvyukov@google.com Signed-off-by: Andrey Konovalov andreyknvl@gmail.com Link: https://lore.kernel.org/r/20240527173538.4989-1-andrey.konovalov@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/core/hcd.c | 12 +++++++----- include/linux/kcov.h | 47 ++++++++++++++++++++++++++++++++++++++--------- 2 files changed, 45 insertions(+), 14 deletions(-)
--- a/drivers/usb/core/hcd.c +++ b/drivers/usb/core/hcd.c @@ -1623,6 +1623,7 @@ static void __usb_hcd_giveback_urb(struc struct usb_hcd *hcd = bus_to_hcd(urb->dev->bus); struct usb_anchor *anchor = urb->anchor; int status = urb->unlinked; + unsigned long flags;
urb->hcpriv = NULL; if (unlikely((urb->transfer_flags & URB_SHORT_NOT_OK) && @@ -1640,13 +1641,14 @@ static void __usb_hcd_giveback_urb(struc /* pass ownership to the completion handler */ urb->status = status; /* - * This function can be called in task context inside another remote - * coverage collection section, but kcov doesn't support that kind of - * recursion yet. Only collect coverage in softirq context for now. + * Only collect coverage in the softirq context and disable interrupts + * to avoid scenarios with nested remote coverage collection sections + * that KCOV does not support. + * See the comment next to kcov_remote_start_usb_softirq() for details. */ - kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum); + flags = kcov_remote_start_usb_softirq((u64)urb->dev->bus->busnum); urb->complete(urb); - kcov_remote_stop_softirq(); + kcov_remote_stop_softirq(flags);
usb_anchor_resume_wakeups(anchor); atomic_dec(&urb->use_count); --- a/include/linux/kcov.h +++ b/include/linux/kcov.h @@ -55,21 +55,47 @@ static inline void kcov_remote_start_usb
/* * The softirq flavor of kcov_remote_*() functions is introduced as a temporary - * work around for kcov's lack of nested remote coverage sections support in - * task context. Adding support for nested sections is tracked in: - * https://bugzilla.kernel.org/show_bug.cgi?id=210337 + * workaround for KCOV's lack of nested remote coverage sections support. + * + * Adding support is tracked in https://bugzilla.kernel.org/show_bug.cgi?id=210337. + * + * kcov_remote_start_usb_softirq(): + * + * 1. Only collects coverage when called in the softirq context. This allows + * avoiding nested remote coverage collection sections in the task context. + * For example, USB/IP calls usb_hcd_giveback_urb() in the task context + * within an existing remote coverage collection section. Thus, KCOV should + * not attempt to start collecting coverage within the coverage collection + * section in __usb_hcd_giveback_urb() in this case. + * + * 2. Disables interrupts for the duration of the coverage collection section. + * This allows avoiding nested remote coverage collection sections in the + * softirq context (a softirq might occur during the execution of a work in + * the BH workqueue, which runs with in_serving_softirq() > 0). + * For example, usb_giveback_urb_bh() runs in the BH workqueue with + * interrupts enabled, so __usb_hcd_giveback_urb() might be interrupted in + * the middle of its remote coverage collection section, and the interrupt + * handler might invoke __usb_hcd_giveback_urb() again. */
-static inline void kcov_remote_start_usb_softirq(u64 id) +static inline unsigned long kcov_remote_start_usb_softirq(u64 id) { - if (in_serving_softirq()) + unsigned long flags = 0; + + if (in_serving_softirq()) { + local_irq_save(flags); kcov_remote_start_usb(id); + } + + return flags; }
-static inline void kcov_remote_stop_softirq(void) +static inline void kcov_remote_stop_softirq(unsigned long flags) { - if (in_serving_softirq()) + if (in_serving_softirq()) { kcov_remote_stop(); + local_irq_restore(flags); + } }
#ifdef CONFIG_64BIT @@ -103,8 +129,11 @@ static inline u64 kcov_common_handle(voi } static inline void kcov_remote_start_common(u64 id) {} static inline void kcov_remote_start_usb(u64 id) {} -static inline void kcov_remote_start_usb_softirq(u64 id) {} -static inline void kcov_remote_stop_softirq(void) {} +static inline unsigned long kcov_remote_start_usb_softirq(u64 id) +{ + return 0; +} +static inline void kcov_remote_stop_softirq(unsigned long flags) {}
#endif /* CONFIG_KCOV */ #endif /* _LINUX_KCOV_H */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Ernberg john.ernberg@actia.se
commit 8475ffcfb381a77075562207ce08552414a80326 upstream.
If no other USB HCDs are selected when compiling a small pure virutal machine, the Xen HCD driver cannot be built.
Fix it by traversing down host/ if CONFIG_USB_XEN_HCD is selected.
Fixes: 494ed3997d75 ("usb: Introduce Xen pvUSB frontend (xen hcd)") Cc: stable@vger.kernel.org # v5.17+ Signed-off-by: John Ernberg john.ernberg@actia.se Link: https://lore.kernel.org/r/20240517114345.1190755-1-john.ernberg@actia.se Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/Makefile | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/Makefile +++ b/drivers/usb/Makefile @@ -35,6 +35,7 @@ obj-$(CONFIG_USB_R8A66597_HCD) += host/ obj-$(CONFIG_USB_FSL_USB2) += host/ obj-$(CONFIG_USB_FOTG210_HCD) += host/ obj-$(CONFIG_USB_MAX3421_HCD) += host/ +obj-$(CONFIG_USB_XEN_HCD) += host/
obj-$(CONFIG_USB_C67X00_HCD) += c67x00/
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amit Sunil Dhamne amitsd@google.com
commit e7e921918d905544500ca7a95889f898121ba886 upstream.
There could be a potential use-after-free case in tcpm_register_source_caps(). This could happen when: * new (say invalid) source caps are advertised * the existing source caps are unregistered * tcpm_register_source_caps() returns with an error as usb_power_delivery_register_capabilities() fails
This causes port->partner_source_caps to hold on to the now freed source caps.
Reset port->partner_source_caps value to NULL after unregistering existing source caps.
Fixes: 230ecdf71a64 ("usb: typec: tcpm: unregister existing source caps before re-registration") Cc: stable@vger.kernel.org Signed-off-by: Amit Sunil Dhamne amitsd@google.com Reviewed-by: Ondrej Jirman megi@xff.cz Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Link: https://lore.kernel.org/r/20240514220134.2143181-1-amitsd@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -3014,8 +3014,10 @@ static int tcpm_register_source_caps(str memcpy(caps.pdo, port->source_caps, sizeof(u32) * port->nr_source_caps); caps.role = TYPEC_SOURCE;
- if (cap) + if (cap) { usb_power_delivery_unregister_capabilities(cap); + port->partner_source_caps = NULL; + }
cap = usb_power_delivery_register_capabilities(port->partner_pd, &caps); if (IS_ERR(cap))
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kyle Tso kyletso@google.com
commit fc8fb9eea94d8f476e15f3a4a7addeb16b3b99d6 upstream.
Similar to what fixed in Commit a6fe37f428c1 ("usb: typec: tcpm: Skip hard reset when in error recovery"), the handling of the received Hard Reset has to be skipped during TOGGLING state.
[ 4086.021288] VBUS off [ 4086.021295] pending state change SNK_READY -> SNK_UNATTACHED @ 650 ms [rev2 NONE_AMS] [ 4086.022113] VBUS VSAFE0V [ 4086.022117] state change SNK_READY -> SNK_UNATTACHED [rev2 NONE_AMS] [ 4086.022447] VBUS off [ 4086.022450] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS] [ 4086.023060] VBUS VSAFE0V [ 4086.023064] state change SNK_UNATTACHED -> SNK_UNATTACHED [rev2 NONE_AMS] [ 4086.023070] disable BIST MODE TESTDATA [ 4086.023766] disable vbus discharge ret:0 [ 4086.023911] Setting usb_comm capable false [ 4086.028874] Setting voltage/current limit 0 mV 0 mA [ 4086.028888] polarity 0 [ 4086.030305] Requesting mux state 0, usb-role 0, orientation 0 [ 4086.033539] Start toggling [ 4086.038496] state change SNK_UNATTACHED -> TOGGLING [rev2 NONE_AMS]
// This Hard Reset is unexpected [ 4086.038499] Received hard reset [ 4086.038501] state change TOGGLING -> HARD_RESET_START [rev2 HARD_RESET]
Fixes: f0690a25a140 ("staging: typec: USB Type-C Port Manager (tcpm)") Cc: stable@vger.kernel.org Signed-off-by: Kyle Tso kyletso@google.com Reviewed-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240520154858.1072347-1-kyletso@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -6174,6 +6174,7 @@ static void _tcpm_pd_hard_reset(struct t port->tcpc->set_bist_data(port->tcpc, false);
switch (port->state) { + case TOGGLING: case ERROR_RECOVERY: case PORT_RESET: case PORT_RESET_WAIT_OFF:
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomas Winkler tomas.winkler@intel.com
commit 283cb234ef95d94c61f59e1cd070cd9499b51292 upstream.
The mei_me_pci_resume doesn't release irq on the error path, in case mei_start() fails.
Cc: stable@kernel.org Fixes: 33ec08263147 ("mei: revamp mei reset state machine") Signed-off-by: Tomas Winkler tomas.winkler@intel.com Link: https://lore.kernel.org/r/20240604090728.1027307-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/mei/pci-me.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/misc/mei/pci-me.c +++ b/drivers/misc/mei/pci-me.c @@ -385,8 +385,10 @@ static int mei_me_pci_resume(struct devi }
err = mei_restart(dev); - if (err) + if (err) { + free_irq(pdev->irq, dev); return err; + }
/* Start timer if stopped in suspend */ schedule_delayed_work(&dev->timer_work, HZ);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wentong Wu wentong.wu@intel.com
commit 9b5e045029d8bded4c6979874ed3abc347c1415c upstream.
The dynamically created mei client device (mei csi) is used as one V4L2 sub device of the whole video pipeline, and the V4L2 connection graph is built by software node. The mei_stop() and mei_restart() will delete the old mei csi client device and create a new mei client device, which will cause the software node information saved in old mei csi device lost and the whole video pipeline will be broken.
Removing mei_stop()/mei_restart() during system suspend/resume can fix the issue above and won't impact hardware actual power saving logic.
Fixes: f6085a96c973 ("mei: vsc: Unregister interrupt handler for system suspend") Cc: stable@vger.kernel.org # for 6.8+ Reported-by: Hao Yao hao.yao@intel.com Signed-off-by: Wentong Wu wentong.wu@intel.com Reviewed-by: Sakari Ailus sakari.ailus@linux.intel.com Tested-by: Jason Chen jason.z.chen@intel.com Tested-by: Sakari Ailus sakari.ailus@linux.intel.com Acked-by: Tomas Winkler tomas.winkler@intel.com Link: https://lore.kernel.org/r/20240527123835.522384-1-wentong.wu@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/mei/platform-vsc.c | 39 +++++++++++++++------------------------ 1 file changed, 15 insertions(+), 24 deletions(-)
--- a/drivers/misc/mei/platform-vsc.c +++ b/drivers/misc/mei/platform-vsc.c @@ -399,41 +399,32 @@ static void mei_vsc_remove(struct platfo
static int mei_vsc_suspend(struct device *dev) { - struct mei_device *mei_dev = dev_get_drvdata(dev); - struct mei_vsc_hw *hw = mei_dev_to_vsc_hw(mei_dev); + struct mei_device *mei_dev; + int ret = 0;
- mei_stop(mei_dev); + mei_dev = dev_get_drvdata(dev); + if (!mei_dev) + return -ENODEV;
- mei_disable_interrupts(mei_dev); + mutex_lock(&mei_dev->device_lock);
- vsc_tp_free_irq(hw->tp); + if (!mei_write_is_idle(mei_dev)) + ret = -EAGAIN;
- return 0; + mutex_unlock(&mei_dev->device_lock); + + return ret; }
static int mei_vsc_resume(struct device *dev) { - struct mei_device *mei_dev = dev_get_drvdata(dev); - struct mei_vsc_hw *hw = mei_dev_to_vsc_hw(mei_dev); - int ret; - - ret = vsc_tp_request_irq(hw->tp); - if (ret) - return ret; - - ret = mei_restart(mei_dev); - if (ret) - goto err_free; + struct mei_device *mei_dev;
- /* start timer if stopped in suspend */ - schedule_delayed_work(&mei_dev->timer_work, HZ); + mei_dev = dev_get_drvdata(dev); + if (!mei_dev) + return -ENODEV;
return 0; - -err_free: - vsc_tp_free_irq(hw->tp); - - return ret; }
static DEFINE_SIMPLE_DEV_PM_OPS(mei_vsc_pm_ops, mei_vsc_suspend, mei_vsc_resume);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilpo Järvinen ilpo.jarvinen@linux.intel.com
commit b19ab7ee2c4c1ec5f27c18413c3ab63907f7d55c upstream.
When lookahead has "consumed" some characters (la_count > 0), n_tty_receive_buf_standard() and n_tty_receive_buf_closing() for characters beyond the la_count are given wrong cp/fp offsets which leads to duplicating and losing some characters.
If la_count > 0, correct buffer pointers and make count consistent too (the latter is not strictly necessary to fix the issue but seems more logical to adjust all variables immediately to keep state consistent).
Reported-by: Vadym Krevs vkrevs@yahoo.com Fixes: 6bb6fa6908eb ("tty: Implement lookahead to process XON/XOFF timely") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218834 Tested-by: Vadym Krevs vkrevs@yahoo.com Cc: stable@vger.kernel.org Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20240514140429.12087-1-ilpo.jarvinen@linux.intel.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/n_tty.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
--- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1619,15 +1619,25 @@ static void __receive_buf(struct tty_str else if (ldata->raw || (L_EXTPROC(tty) && !preops)) n_tty_receive_buf_raw(tty, cp, fp, count); else if (tty->closing && !L_EXTPROC(tty)) { - if (la_count > 0) + if (la_count > 0) { n_tty_receive_buf_closing(tty, cp, fp, la_count, true); - if (count > la_count) - n_tty_receive_buf_closing(tty, cp, fp, count - la_count, false); + cp += la_count; + if (fp) + fp += la_count; + count -= la_count; + } + if (count > 0) + n_tty_receive_buf_closing(tty, cp, fp, count, false); } else { - if (la_count > 0) + if (la_count > 0) { n_tty_receive_buf_standard(tty, cp, fp, la_count, true); - if (count > la_count) - n_tty_receive_buf_standard(tty, cp, fp, count - la_count, false); + cp += la_count; + if (fp) + fp += la_count; + count -= la_count; + } + if (count > 0) + n_tty_receive_buf_standard(tty, cp, fp, count, false);
flush_echoes(tty); if (tty->ops->flush_chars)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Doug Brown doug@schmorgal.com
commit 5208e7ced520a813b4f4774451fbac4e517e78b2 upstream.
The FIFO is 64 bytes, but the FCR is configured to fire the TX interrupt when the FIFO is half empty (bit 3 = 0). Thus, we should only write 32 bytes when a TX interrupt occurs.
This fixes a problem observed on the PXA168 that dropped a bunch of TX bytes during large transmissions.
Fixes: ab28f51c77cd ("serial: rewrite pxa2xx-uart to use 8250_core") Signed-off-by: Doug Brown doug@schmorgal.com Link: https://lore.kernel.org/r/20240519191929.122202-1-doug@schmorgal.com Cc: stable stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/8250_pxa.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/tty/serial/8250/8250_pxa.c +++ b/drivers/tty/serial/8250/8250_pxa.c @@ -125,6 +125,7 @@ static int serial_pxa_probe(struct platf uart.port.iotype = UPIO_MEM32; uart.port.regshift = 2; uart.port.fifosize = 64; + uart.tx_loadsz = 32; uart.dl_write = serial_pxa_dl_write;
ret = serial8250_register_8250_port(&uart);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Douglas Anderson dianders@chromium.org
commit ca84cd379b45e9b1775b9e026f069a3a886b409d upstream.
Recently, suspend testing on sc7180-trogdor based devices has started to sometimes fail with messages like this:
port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0 port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16 port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs port a88000.serial:0.0: PM: failed to suspend: error -16
I could reproduce these problems by logging in via an agetty on the debug serial port (which was _not_ used for kernel console) and running: cat /var/log/messages ...and then (via an SSH session) forcing a few suspend/resume cycles.
Tracing through the code and doing some printf()-based debugging shows that the -16 (-EBUSY) comes from the recently added serial_port_runtime_suspend().
The idea of the serial_port_runtime_suspend() function is to prevent the port from being _runtime_ suspended if it still has bytes left to transmit. Having bytes left to transmit isn't a reason to block _system_ suspend, though. If a serdev device in the kernel needs to block system suspend it should block its own suspend and it can use serdev_device_wait_until_sent() to ensure bytes are sent.
The DEFINE_RUNTIME_DEV_PM_OPS() used by the serial_port code means that the system suspend function will be pm_runtime_force_suspend(). In pm_runtime_force_suspend() we can see that before calling the runtime suspend function we'll call pm_runtime_disable(). This should be a reliable way to detect that we're called from system suspend and that we shouldn't look for busyness.
Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") Cc: stable@vger.kernel.org Reviewed-by: Tony Lindgren tony.lindgren@linux.intel.com Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://lore.kernel.org/r/20240531080914.v3.1.I2395e66cf70c6e67d774c56943825... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/serial_port.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/tty/serial/serial_port.c +++ b/drivers/tty/serial/serial_port.c @@ -63,6 +63,13 @@ static int serial_port_runtime_suspend(s if (port->flags & UPF_DEAD) return 0;
+ /* + * Nothing to do on pm_runtime_force_suspend(), see + * DEFINE_RUNTIME_DEV_PM_OPS. + */ + if (!pm_runtime_enabled(dev)) + return 0; + uart_port_lock_irqsave(port, &flags); if (!port_dev->tx_enabled) { uart_port_unlock_irqrestore(port, flags);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mickaël Salaün mic@digikod.net
commit 88da52ccd66e65f2e63a6c35c9dff55d448ef4dc upstream.
The WARN_ON_ONCE() in collect_domain_accesses() can be triggered when trying to link a root mount point. This cannot work in practice because this directory is mounted, but the VFS check is done after the call to security_path_link().
Do not use source directory's d_parent when the source directory is the mount point.
Cc: Günther Noack gnoack@google.com Cc: Paul Moore paul@paul-moore.com Cc: stable@vger.kernel.org Reported-by: syzbot+bf4903dc7e12b18ebc87@syzkaller.appspotmail.com Fixes: b91c3e4ea756 ("landlock: Add support for file reparenting with LANDLOCK_ACCESS_FS_REFER") Closes: https://lore.kernel.org/r/000000000000553d3f0618198200@google.com Link: https://lore.kernel.org/r/20240516181935.1645983-2-mic@digikod.net [mic: Fix commit message] Signed-off-by: Mickaël Salaün mic@digikod.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/landlock/fs.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/security/landlock/fs.c +++ b/security/landlock/fs.c @@ -950,6 +950,7 @@ static int current_check_refer_path(stru bool allow_parent1, allow_parent2; access_mask_t access_request_parent1, access_request_parent2; struct path mnt_dir; + struct dentry *old_parent; layer_mask_t layer_masks_parent1[LANDLOCK_NUM_ACCESS_FS] = {}, layer_masks_parent2[LANDLOCK_NUM_ACCESS_FS] = {};
@@ -997,9 +998,17 @@ static int current_check_refer_path(stru mnt_dir.mnt = new_dir->mnt; mnt_dir.dentry = new_dir->mnt->mnt_root;
+ /* + * old_dentry may be the root of the common mount point and + * !IS_ROOT(old_dentry) at the same time (e.g. with open_tree() and + * OPEN_TREE_CLONE). We do not need to call dget(old_parent) because + * we keep a reference to old_dentry. + */ + old_parent = (old_dentry == mnt_dir.dentry) ? old_dentry : + old_dentry->d_parent; + /* new_dir->dentry is equal to new_dentry->d_parent */ - allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry, - old_dentry->d_parent, + allow_parent1 = collect_domain_accesses(dom, mnt_dir.dentry, old_parent, &layer_masks_parent1); allow_parent2 = collect_domain_accesses( dom, mnt_dir.dentry, new_dir->dentry, &layer_masks_parent2);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Greg Kroah-Hartman gregkh@linuxfoundation.org
commit 7c55b78818cfb732680c4a72ab270cc2d2ee3d0f upstream.
When an xattr size is not what is expected, it is printed out to the kernel log in hex format as a form of debugging. But when that xattr size is bigger than the expected size, printing it out can cause an access off the end of the buffer.
Fix this all up by properly restricting the size of the debug hex dump in the kernel log.
Reported-by: syzbot+9dfe490c8176301c1d06@syzkaller.appspotmail.com Cc: Dave Kleikamp shaggy@kernel.org Link: https://lore.kernel.org/r/2024051433-slider-cloning-98f9@gregkh Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/jfs/xattr.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/jfs/xattr.c +++ b/fs/jfs/xattr.c @@ -557,9 +557,11 @@ static int ea_get(struct inode *inode, s
size_check: if (EALIST_SIZE(ea_buf->xattr) != ea_size) { + int size = min_t(int, EALIST_SIZE(ea_buf->xattr), ea_size); + printk(KERN_ERR "ea_get: invalid extended attribute\n"); print_hex_dump(KERN_ERR, "", DUMP_PREFIX_ADDRESS, 16, 1, - ea_buf->xattr, ea_size, 1); + ea_buf->xattr, size, 1); ea_release(inode, ea_buf); rc = -EIO; goto clean_up;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit f0260589b439e2637ad54a2b25f00a516ef28a57 upstream.
The transferred length is set incorrectly for cancelled bulk transfer TDs in case the bulk transfer ring stops on the last transfer block with a 'Stop - Length Invalid' completion code.
length essentially ends up being set to the requested length: urb->actual_length = urb->transfer_buffer_length
Length for 'Stop - Length Invalid' cases should be the sum of all TRB transfer block lengths up to the one the ring stopped on, _excluding_ the one stopped on.
Fix this by always summing up TRB lengths for 'Stop - Length Invalid' bulk cases.
This issue was discovered by Alan Stern while debugging https://bugzilla.kernel.org/show_bug.cgi?id=218890, but does not solve that bug. Issue is older than 4.10 kernel but fix won't apply to those due to major reworks in that area.
Tested-by: Pierre Tomon pierretom+12@ik.me Cc: stable@vger.kernel.org # v4.10+ Cc: Alan Stern stern@rowland.harvard.edu Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240611120610.3264502-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-ring.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -2535,9 +2535,8 @@ static int process_bulk_intr_td(struct x goto finish_td; case COMP_STOPPED_LENGTH_INVALID: /* stopped on ep trb with invalid length, exclude it */ - ep_trb_len = 0; - remaining = 0; - break; + td->urb->actual_length = sum_trb_lengths(xhci, ep_ring, ep_trb); + goto finish_td; case COMP_USB_TRANSACTION_ERROR: if (xhci->quirks & XHCI_NO_SOFT_RETRY || (ep->err_count++ > MAX_SOFT_RETRY) ||
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuangyi Chiang ki.chiang65@gmail.com
commit 17bd54555c2aaecfdb38e2734149f684a73fa584 upstream.
As described in commit c877b3b2ad5c ("xhci: Add reset on resume quirk for asrock p67 host"), EJ188 have the same issue as EJ168, where completely dies on resume. So apply XHCI_RESET_ON_RESUME quirk to EJ188 as well.
Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang ki.chiang65@gmail.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240611120610.3264502-3-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-pci.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -36,6 +36,7 @@
#define PCI_VENDOR_ID_ETRON 0x1b6f #define PCI_DEVICE_ID_EJ168 0x7023 +#define PCI_DEVICE_ID_EJ188 0x7052
#define PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI 0x8c31 #define PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI 0x9c31 @@ -402,6 +403,10 @@ static void xhci_pci_quirks(struct devic xhci->quirks |= XHCI_TRUST_TX_LENGTH; xhci->quirks |= XHCI_BROKEN_STREAMS; } + if (pdev->vendor == PCI_VENDOR_ID_ETRON && + pdev->device == PCI_DEVICE_ID_EJ188) + xhci->quirks |= XHCI_RESET_ON_RESUME; + if (pdev->vendor == PCI_VENDOR_ID_RENESAS && pdev->device == 0x0014) { xhci->quirks |= XHCI_TRUST_TX_LENGTH;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hector Martin marcan@marcan.st
commit 5ceac4402f5d975e5a01c806438eb4e554771577 upstream.
When multiple streams are in use, multiple TDs might be in flight when an endpoint is stopped. We need to issue a Set TR Dequeue Pointer for each, to ensure everything is reset properly and the caches cleared. Change the logic so that any N>1 TDs found active for different streams are deferred until after the first one is processed, calling xhci_invalidate_cancelled_tds() again from xhci_handle_cmd_set_deq() to queue another command until we are done with all of them. Also change the error/"should never happen" paths to ensure we at least clear any affected TDs, even if we can't issue a command to clear the hardware cache, and complain loudly with an xhci_warn() if this ever happens.
This problem case dates back to commit e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") early on in the XHCI driver's life, when stream support was first added. It was then identified but not fixed nor made into a warning in commit 674f8438c121 ("xhci: split handling halted endpoints into two steps"), which added a FIXME comment for the problem case (without materially changing the behavior as far as I can tell, though the new logic made the problem more obvious).
Then later, in commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs."), it was acknowledged again.
[Mathias: commit 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") was a targeted regression fix to the previously mentioned patch. Users reported issues with usb stuck after unmounting/disconnecting UAS devices. This rolled back the TD clearing of multiple streams to its original state.]
Apparently the commit author was aware of the problem (yet still chose to submit it): It was still mentioned as a FIXME, an xhci_dbg() was added to log the problem condition, and the remaining issue was mentioned in the commit description. The choice of making the log type xhci_dbg() for what is, at this point, a completely unhandled and known broken condition is puzzling and unfortunate, as it guarantees that no actual users would see the log in production, thereby making it nigh undebuggable (indeed, even if you turn on DEBUG, the message doesn't really hint at there being a problem at all).
It took me *months* of random xHC crashes to finally find a reliable repro and be able to do a deep dive debug session, which could all have been avoided had this unhandled, broken condition been actually reported with a warning, as it should have been as a bug intentionally left in unfixed (never mind that it shouldn't have been left in at all).
Another fix to solve clearing the caches of all stream rings with cancelled TDs is needed, but not as urgent.
3 years after that statement and 14 years after the original bug was introduced, I think it's finally time to fix it. And maybe next time let's not leave bugs unfixed (that are actually worse than the original bug), and let's actually get people to review kernel commits please.
Fixes xHC crashes and IOMMU faults with UAS devices when handling errors/faults. Easiest repro is to use `hdparm` to mark an early sector (e.g. 1024) on a disk as bad, then `cat /dev/sdX > /dev/null` in a loop. At least in the case of JMicron controllers, the read errors end up having to cancel two TDs (for two queued requests to different streams) and the one that didn't get cleared properly ends up faulting the xHC entirely when it tries to access DMA pages that have since been unmapped, referred to by the stale TDs. This normally happens quickly (after two or three loops). After this fix, I left the `cat` in a loop running overnight and experienced no xHC failures, with all read errors recovered properly. Repro'd and tested on an Apple M1 Mac Mini (dwc3 host).
On systems without an IOMMU, this bug would instead silently corrupt freed memory, making this a security bug (even on systems with IOMMUs this could silently corrupt memory belonging to other USB devices on the same controller, so it's still a security bug). Given that the kernel autoprobes partition tables, I'm pretty sure a malicious USB device pretending to be a UAS device and reporting an error with the right timing could deliberately trigger a UAF and write to freed memory, with no user action.
[Mathias: Commit message and code comment edit, original at:] https://lore.kernel.org/linux-usb/20240524-xhci-streams-v1-1-6b1f13819bea@ma...
Fixes: e9df17eb1408 ("USB: xhci: Correct assumptions about number of rings per endpoint.") Fixes: 94f339147fc3 ("xhci: Fix failure to give back some cached cancelled URBs.") Fixes: 674f8438c121 ("xhci: split handling halted endpoints into two steps") Cc: stable@vger.kernel.org Cc: security@kernel.org Reviewed-by: Neal Gompa neal@gompa.dev Signed-off-by: Hector Martin marcan@marcan.st Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240611120610.3264502-5-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-ring.c | 54 ++++++++++++++++++++++++++++++++++--------- drivers/usb/host/xhci.h | 1 2 files changed, 44 insertions(+), 11 deletions(-)
--- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -1034,13 +1034,27 @@ static int xhci_invalidate_cancelled_tds break; case TD_DIRTY: /* TD is cached, clear it */ case TD_HALTED: + case TD_CLEARING_CACHE_DEFERRED: + if (cached_td) { + if (cached_td->urb->stream_id != td->urb->stream_id) { + /* Multiple streams case, defer move dq */ + xhci_dbg(xhci, + "Move dq deferred: stream %u URB %p\n", + td->urb->stream_id, td->urb); + td->cancel_status = TD_CLEARING_CACHE_DEFERRED; + break; + } + + /* Should never happen, but clear the TD if it does */ + xhci_warn(xhci, + "Found multiple active URBs %p and %p in stream %u?\n", + td->urb, cached_td->urb, + td->urb->stream_id); + td_to_noop(xhci, ring, cached_td, false); + cached_td->cancel_status = TD_CLEARED; + } + td->cancel_status = TD_CLEARING_CACHE; - if (cached_td) - /* FIXME stream case, several stopped rings */ - xhci_dbg(xhci, - "Move dq past stream %u URB %p instead of stream %u URB %p\n", - td->urb->stream_id, td->urb, - cached_td->urb->stream_id, cached_td->urb); cached_td = td; break; } @@ -1060,10 +1074,16 @@ static int xhci_invalidate_cancelled_tds if (err) { /* Failed to move past cached td, just set cached TDs to no-op */ list_for_each_entry_safe(td, tmp_td, &ep->cancelled_td_list, cancelled_td_list) { - if (td->cancel_status != TD_CLEARING_CACHE) + /* + * Deferred TDs need to have the deq pointer set after the above command + * completes, so if that failed we just give up on all of them (and + * complain loudly since this could cause issues due to caching). + */ + if (td->cancel_status != TD_CLEARING_CACHE && + td->cancel_status != TD_CLEARING_CACHE_DEFERRED) continue; - xhci_dbg(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n", - td->urb); + xhci_warn(xhci, "Failed to clear cancelled cached URB %p, mark clear anyway\n", + td->urb); td_to_noop(xhci, ring, td, false); td->cancel_status = TD_CLEARED; } @@ -1350,6 +1370,7 @@ static void xhci_handle_cmd_set_deq(stru struct xhci_ep_ctx *ep_ctx; struct xhci_slot_ctx *slot_ctx; struct xhci_td *td, *tmp_td; + bool deferred = false;
ep_index = TRB_TO_EP_INDEX(le32_to_cpu(trb->generic.field[3])); stream_id = TRB_TO_STREAM_ID(le32_to_cpu(trb->generic.field[2])); @@ -1436,6 +1457,8 @@ static void xhci_handle_cmd_set_deq(stru xhci_dbg(ep->xhci, "%s: Giveback cancelled URB %p TD\n", __func__, td->urb); xhci_td_cleanup(ep->xhci, td, ep_ring, td->status); + } else if (td->cancel_status == TD_CLEARING_CACHE_DEFERRED) { + deferred = true; } else { xhci_dbg(ep->xhci, "%s: Keep cancelled URB %p TD as cancel_status is %d\n", __func__, td->urb, td->cancel_status); @@ -1445,8 +1468,17 @@ cleanup: ep->ep_state &= ~SET_DEQ_PENDING; ep->queued_deq_seg = NULL; ep->queued_deq_ptr = NULL; - /* Restart any rings with pending URBs */ - ring_doorbell_for_active_rings(xhci, slot_id, ep_index); + + if (deferred) { + /* We have more streams to clear */ + xhci_dbg(ep->xhci, "%s: Pending TDs to clear, continuing with invalidation\n", + __func__); + xhci_invalidate_cancelled_tds(ep); + } else { + /* Restart any rings with pending URBs */ + xhci_dbg(ep->xhci, "%s: All TDs cleared, ring doorbell\n", __func__); + ring_doorbell_for_active_rings(xhci, slot_id, ep_index); + } }
static void xhci_handle_cmd_reset_ep(struct xhci_hcd *xhci, int slot_id, --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1276,6 +1276,7 @@ enum xhci_cancelled_td_status { TD_DIRTY = 0, TD_HALTED, TD_CLEARING_CACHE, + TD_CLEARING_CACHE_DEFERRED, TD_CLEARED, };
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuangyi Chiang ki.chiang65@gmail.com
commit 91f7a1524a92c70ffe264db8bdfa075f15bbbeb9 upstream.
As described in commit 8f873c1ff4ca ("xhci: Blacklist using streams on the Etron EJ168 controller"), EJ188 have the same issue as EJ168, where Streams do not work reliable on EJ188. So apply XHCI_BROKEN_STREAMS quirk to EJ188 as well.
Cc: stable@vger.kernel.org Signed-off-by: Kuangyi Chiang ki.chiang65@gmail.com Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240611120610.3264502-4-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-pci.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -404,8 +404,10 @@ static void xhci_pci_quirks(struct devic xhci->quirks |= XHCI_BROKEN_STREAMS; } if (pdev->vendor == PCI_VENDOR_ID_ETRON && - pdev->device == PCI_DEVICE_ID_EJ188) + pdev->device == PCI_DEVICE_ID_EJ188) { xhci->quirks |= XHCI_RESET_ON_RESUME; + xhci->quirks |= XHCI_BROKEN_STREAMS; + }
if (pdev->vendor == PCI_VENDOR_ID_RENESAS && pdev->device == 0x0014) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aapo Vienamo aapo.vienamo@linux.intel.com
commit 985cfe501b74f214905ab4817acee0df24627268 upstream.
The margin debugfs node controls the "Enable Margin Test" field of the lane margining operations. This field selects between either low or high voltage margin values for voltage margin test or left or right timing margin values for timing margin test.
According to the USB4 specification, whether or not the "Enable Margin Test" control applies, depends on the values of the "Independent High/Low Voltage Margin" or "Independent Left/Right Timing Margin" capability fields for voltage and timing margin tests respectively. The pre-existing condition enabled the debugfs node also in the case where both low/high or left/right margins are returned, which is incorrect. This change only enables the debugfs node in question, if the specific required capability values are met.
Signed-off-by: Aapo Vienamo aapo.vienamo@linux.intel.com Fixes: d0f1e0c2a699 ("thunderbolt: Add support for receiver lane margining") Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thunderbolt/debugfs.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/thunderbolt/debugfs.c +++ b/drivers/thunderbolt/debugfs.c @@ -943,8 +943,9 @@ static void margining_port_init(struct t debugfs_create_file("run", 0600, dir, port, &margining_run_fops); debugfs_create_file("results", 0600, dir, port, &margining_results_fops); debugfs_create_file("test", 0600, dir, port, &margining_test_fops); - if (independent_voltage_margins(usb4) || - (supports_time(usb4) && independent_time_margins(usb4))) + if (independent_voltage_margins(usb4) == USB4_MARGIN_CAP_0_VOLTAGE_HL || + (supports_time(usb4) && + independent_time_margins(usb4) == USB4_MARGIN_CAP_1_TIME_LR)) debugfs_create_file("margin", 0600, dir, port, &margining_margin_fops); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit a6a75edc8669a4f030546c7390808ef0cc034742 upstream.
The SCSI Removable Media Bit (RMB) should only be set for removable media, where the device stays and the media changes, e.g. CD-ROM or floppy.
The ATA removable media device bit is obsoleted since ATA-8 ACS (2006), but before that it was used to indicate that the device can have its media removed (while the device stays).
Commit 8a3e33cf92c7 ("ata: ahci: find eSATA ports and flag them as removable") introduced a change to set the RMB bit if the port has either the eSATA bit or the hot-plug capable bit set. The reasoning was that the author wanted his eSATA ports to get treated like a USB stick.
This is however wrong. See "20-082r23SPC-6: Removable Medium Bit Expectations" which has since been integrated to SPC, which states that:
""" Reports have been received that some USB Memory Stick device servers set the removable medium (RMB) bit to one. The rub comes when the medium is actually removed, because... The device server is removed concurrently with the medium removal. If there is no device server, then there is no device server that is waiting to have removable medium inserted.
Sufficient numbers of SCSI analysts see such a device: - not as a device that supports removable medium; but - as a removable, hot pluggable device. """
The definition of the RMB bit in the SPC specification has since been clarified to match this.
Thus, a USB stick should not have the RMB bit set (and neither shall an eSATA nor a hot-plug capable port).
Commit dc8b4afc4a04 ("ata: ahci: don't mark HotPlugCapable Ports as external/removable") then changed so that the RMB bit is only set for the eSATA bit (and not for the hot-plug capable bit), because of a lot of bug reports of SATA devices were being automounted by udisks. However, treating eSATA and hot-plug capable ports differently is not correct.
From the AHCI 1.3.1 spec:
Hot Plug Capable Port (HPCP): When set to '1', indicates that this port's signal and power connectors are externally accessible via a joint signal and power connector for blindmate device hot plug.
So a hot-plug capable port is an external port, just like commit 45b96d65ec68 ("ata: ahci: a hotplug capable port is an external port") claims.
In order to not violate the SPC specification, modify the SCSI INQUIRY data to only set the RMB bit if the ATA device can have its media removed.
This fixes a reported problem where GNOME/udisks was automounting devices connected to hot-plug capable ports.
Fixes: 45b96d65ec68 ("ata: ahci: a hotplug capable port is an external port") Cc: stable@vger.kernel.org Reviewed-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Thomas Weißschuh linux@weissschuh.net Tested-by: Thomas Weißschuh linux@weissschuh.net Reported-by: Thomas Weißschuh linux@weissschuh.net Closes: https://lore.kernel.org/linux-ide/c0de8262-dc4b-4c22-9fac-33432e5bddd3@t-8ch... Signed-off-by: Damien Le Moal dlemoal@kernel.org [cassel: wrote commit message] Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/libata-scsi.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/ata/libata-scsi.c +++ b/drivers/ata/libata-scsi.c @@ -1828,11 +1828,11 @@ static unsigned int ata_scsiop_inq_std(s 2 };
- /* set scsi removable (RMB) bit per ata bit, or if the - * AHCI port says it's external (Hotplug-capable, eSATA). + /* + * Set the SCSI Removable Media Bit (RMB) if the ATA removable media + * device bit (obsolete since ATA-8 ACS) is set. */ - if (ata_id_removable(args->id) || - (args->dev->link->ap->pflags & ATA_PFLAG_EXTERNAL)) + if (ata_id_removable(args->id)) hdr[1] |= (1 << 7);
if (args->dev->class == ATA_DEV_ZAC) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 52912ca87e2b810e5acdcdc452593d30c9187d8f upstream.
For SCSI devices supporting the Command Duration Limits feature set, the user can enable/disable this feature use through the sysfs device attribute "cdl_enable". This attribute modification triggers a call to scsi_cdl_enable() to enable and disable the feature for ATA devices and set the scsi device cdl_enable field to the user provided bool value. For SCSI devices supporting CDL, the feature set is always enabled and scsi_cdl_enable() is reduced to setting the cdl_enable field.
However, for ATA devices, a drive may spin-up with the CDL feature enabled by default. But the SCSI device cdl_enable field is always initialized to false (CDL disabled), regardless of the actual device CDL feature state. For ATA devices managed by libata (or libsas), libata-core always disables the CDL feature set when the device is attached, thus syncing the state of the CDL feature on the device and of the SCSI device cdl_enable field. However, for ATA devices connected to a SAS HBA, the CDL feature is not disabled on scan for ATA devices that have this feature enabled by default, leading to an inconsistent state of the feature on the device with the SCSI device cdl_enable field.
Avoid this inconsistency by adding a call to scsi_cdl_enable() in scsi_cdl_check() to make sure that the device-side state of the CDL feature set always matches the scsi device cdl_enable field state. This implies that CDL will always be disabled for ATA devices connected to SAS HBAs, which is consistent with libata/libsas initialization of the device.
Reported-by: Scott McCoy scott.mccoy@wdc.com Fixes: 1b22cfb14142 ("scsi: core: Allow enabling and disabling command duration limits") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Link: https://lore.kernel.org/r/20240607012507.111488-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel cassel@kernel.org Reviewed-by: Igor Pylypiv ipylypiv@google.com Reviewed-by: Hannes Reinecke hare@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/scsi.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/scsi/scsi.c +++ b/drivers/scsi/scsi.c @@ -673,6 +673,13 @@ void scsi_cdl_check(struct scsi_device * sdev->use_10_for_rw = 0;
sdev->cdl_supported = 1; + + /* + * If the device supports CDL, make sure that the current drive + * feature status is consistent with the user controlled + * cdl_enable state. + */ + scsi_cdl_enable(sdev, sdev->cdl_enable); } else { sdev->cdl_supported = 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 90e6f08915ec6efe46570420412a65050ec826b2 upstream.
The function mpi3mr_qcmd() of the mpi3mr driver is able to indicate to the HBA if a read or write command directed at an ATA device should be translated to an NCQ read/write command with the high prioiryt bit set when the request uses the RT priority class and the user has enabled NCQ priority through sysfs.
However, unlike the mpt3sas driver, the mpi3mr driver does not define the sas_ncq_prio_supported and sas_ncq_prio_enable sysfs attributes, so the ncq_prio_enable field of struct mpi3mr_sdev_priv_data is never actually set and NCQ Priority cannot ever be used.
Fix this by defining these missing atributes to allow a user to check if an ATA device supports NCQ priority and to enable/disable the use of NCQ priority. To do this, lift the function scsih_ncq_prio_supp() out of the mpt3sas driver and make it the generic SCSI SAS transport function sas_ata_ncq_prio_supported(). Nothing in that function is hardware specific, so this function can be used in both the mpt3sas driver and the mpi3mr driver.
Reported-by: Scott McCoy scott.mccoy@wdc.com Fixes: 023ab2a9b4ed ("scsi: mpi3mr: Add support for queue command processing") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Link: https://lore.kernel.org/r/20240611083435.92961-1-dlemoal@kernel.org Reviewed-by: Niklas Cassel cassel@kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/mpi3mr/mpi3mr_app.c | 62 +++++++++++++++++++++++++++++++++++ drivers/scsi/mpt3sas/mpt3sas_base.h | 3 - drivers/scsi/mpt3sas/mpt3sas_ctl.c | 4 +- drivers/scsi/mpt3sas/mpt3sas_scsih.c | 23 ------------ drivers/scsi/scsi_transport_sas.c | 23 ++++++++++++ include/scsi/scsi_transport_sas.h | 2 + 6 files changed, 89 insertions(+), 28 deletions(-)
--- a/drivers/scsi/mpi3mr/mpi3mr_app.c +++ b/drivers/scsi/mpi3mr/mpi3mr_app.c @@ -2158,10 +2158,72 @@ persistent_id_show(struct device *dev, s } static DEVICE_ATTR_RO(persistent_id);
+/** + * sas_ncq_prio_supported_show - Indicate if device supports NCQ priority + * @dev: pointer to embedded device + * @attr: sas_ncq_prio_supported attribute descriptor + * @buf: the buffer returned + * + * A sysfs 'read-only' sdev attribute, only works with SATA devices + */ +static ssize_t +sas_ncq_prio_supported_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct scsi_device *sdev = to_scsi_device(dev); + + return sysfs_emit(buf, "%d\n", sas_ata_ncq_prio_supported(sdev)); +} +static DEVICE_ATTR_RO(sas_ncq_prio_supported); + +/** + * sas_ncq_prio_enable_show - send prioritized io commands to device + * @dev: pointer to embedded device + * @attr: sas_ncq_prio_enable attribute descriptor + * @buf: the buffer returned + * + * A sysfs 'read/write' sdev attribute, only works with SATA devices + */ +static ssize_t +sas_ncq_prio_enable_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct scsi_device *sdev = to_scsi_device(dev); + struct mpi3mr_sdev_priv_data *sdev_priv_data = sdev->hostdata; + + if (!sdev_priv_data) + return 0; + + return sysfs_emit(buf, "%d\n", sdev_priv_data->ncq_prio_enable); +} + +static ssize_t +sas_ncq_prio_enable_store(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) +{ + struct scsi_device *sdev = to_scsi_device(dev); + struct mpi3mr_sdev_priv_data *sdev_priv_data = sdev->hostdata; + bool ncq_prio_enable = 0; + + if (kstrtobool(buf, &ncq_prio_enable)) + return -EINVAL; + + if (!sas_ata_ncq_prio_supported(sdev)) + return -EINVAL; + + sdev_priv_data->ncq_prio_enable = ncq_prio_enable; + + return strlen(buf); +} +static DEVICE_ATTR_RW(sas_ncq_prio_enable); + static struct attribute *mpi3mr_dev_attrs[] = { &dev_attr_sas_address.attr, &dev_attr_device_handle.attr, &dev_attr_persistent_id.attr, + &dev_attr_sas_ncq_prio_supported.attr, + &dev_attr_sas_ncq_prio_enable.attr, NULL, };
--- a/drivers/scsi/mpt3sas/mpt3sas_base.h +++ b/drivers/scsi/mpt3sas/mpt3sas_base.h @@ -2048,9 +2048,6 @@ void mpt3sas_setup_direct_io(struct MPT3SAS_ADAPTER *ioc, struct scsi_cmnd *scmd, struct _raid_device *raid_device, Mpi25SCSIIORequest_t *mpi_request);
-/* NCQ Prio Handling Check */ -bool scsih_ncq_prio_supp(struct scsi_device *sdev); - void mpt3sas_setup_debugfs(struct MPT3SAS_ADAPTER *ioc); void mpt3sas_destroy_debugfs(struct MPT3SAS_ADAPTER *ioc); void mpt3sas_init_debugfs(void); --- a/drivers/scsi/mpt3sas/mpt3sas_ctl.c +++ b/drivers/scsi/mpt3sas/mpt3sas_ctl.c @@ -4088,7 +4088,7 @@ sas_ncq_prio_supported_show(struct devic { struct scsi_device *sdev = to_scsi_device(dev);
- return sysfs_emit(buf, "%d\n", scsih_ncq_prio_supp(sdev)); + return sysfs_emit(buf, "%d\n", sas_ata_ncq_prio_supported(sdev)); } static DEVICE_ATTR_RO(sas_ncq_prio_supported);
@@ -4123,7 +4123,7 @@ sas_ncq_prio_enable_store(struct device if (kstrtobool(buf, &ncq_prio_enable)) return -EINVAL;
- if (!scsih_ncq_prio_supp(sdev)) + if (!sas_ata_ncq_prio_supported(sdev)) return -EINVAL;
sas_device_priv_data->ncq_prio_enable = ncq_prio_enable; --- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c +++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c @@ -12573,29 +12573,6 @@ scsih_pci_mmio_enabled(struct pci_dev *p return PCI_ERS_RESULT_RECOVERED; }
-/** - * scsih_ncq_prio_supp - Check for NCQ command priority support - * @sdev: scsi device struct - * - * This is called when a user indicates they would like to enable - * ncq command priorities. This works only on SATA devices. - */ -bool scsih_ncq_prio_supp(struct scsi_device *sdev) -{ - struct scsi_vpd *vpd; - bool ncq_prio_supp = false; - - rcu_read_lock(); - vpd = rcu_dereference(sdev->vpd_pg89); - if (!vpd || vpd->len < 214) - goto out; - - ncq_prio_supp = (vpd->data[213] >> 4) & 1; -out: - rcu_read_unlock(); - - return ncq_prio_supp; -} /* * The pci device ids are defined in mpi/mpi2_cnfg.h. */ --- a/drivers/scsi/scsi_transport_sas.c +++ b/drivers/scsi/scsi_transport_sas.c @@ -416,6 +416,29 @@ unsigned int sas_is_tlr_enabled(struct s } EXPORT_SYMBOL_GPL(sas_is_tlr_enabled);
+/** + * sas_ata_ncq_prio_supported - Check for ATA NCQ command priority support + * @sdev: SCSI device + * + * Check if an ATA device supports NCQ priority using VPD page 89h (ATA + * Information). Since this VPD page is implemented only for ATA devices, + * this function always returns false for SCSI devices. + */ +bool sas_ata_ncq_prio_supported(struct scsi_device *sdev) +{ + struct scsi_vpd *vpd; + bool ncq_prio_supported = false; + + rcu_read_lock(); + vpd = rcu_dereference(sdev->vpd_pg89); + if (vpd && vpd->len >= 214) + ncq_prio_supported = (vpd->data[213] >> 4) & 1; + rcu_read_unlock(); + + return ncq_prio_supported; +} +EXPORT_SYMBOL_GPL(sas_ata_ncq_prio_supported); + /* * SAS Phy attributes */ --- a/include/scsi/scsi_transport_sas.h +++ b/include/scsi/scsi_transport_sas.h @@ -200,6 +200,8 @@ unsigned int sas_is_tlr_enabled(struct s void sas_disable_tlr(struct scsi_device *); void sas_enable_tlr(struct scsi_device *);
+bool sas_ata_ncq_prio_supported(struct scsi_device *sdev); + extern struct sas_rphy *sas_end_device_alloc(struct sas_port *); extern struct sas_rphy *sas_expander_alloc(struct sas_port *, enum sas_device_type); void sas_rphy_free(struct sas_rphy *);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit 4254dfeda82f20844299dca6c38cbffcfd499f41 upstream.
There is a potential out-of-bounds access when using test_bit() on a single word. The test_bit() and set_bit() functions operate on long values, and when testing or setting a single word, they can exceed the word boundary. KASAN detects this issue and produces a dump:
BUG: KASAN: slab-out-of-bounds in _scsih_add_device.constprop.0 (./arch/x86/include/asm/bitops.h:60 ./include/asm-generic/bitops/instrumented-atomic.h:29 drivers/scsi/mpt3sas/mpt3sas_scsih.c:7331) mpt3sas
Write of size 8 at addr ffff8881d26e3c60 by task kworker/u1536:2/2965
For full log, please look at [1].
Make the allocation at least the size of sizeof(unsigned long) so that set_bit() and test_bit() have sufficient room for read/write operations without overwriting unallocated memory.
[1] Link: https://lore.kernel.org/all/ZkNcALr3W3KGYYJG@gmail.com/
Fixes: c696f7b83ede ("scsi: mpt3sas: Implement device_remove_in_progress check in IOCTL path") Cc: stable@vger.kernel.org Suggested-by: Keith Busch kbusch@kernel.org Signed-off-by: Breno Leitao leitao@debian.org Link: https://lore.kernel.org/r/20240605085530.499432-1-leitao@debian.org Reviewed-by: Keith Busch kbusch@kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/mpt3sas/mpt3sas_base.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c +++ b/drivers/scsi/mpt3sas/mpt3sas_base.c @@ -8512,6 +8512,12 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPT ioc->pd_handles_sz = (ioc->facts.MaxDevHandle / 8); if (ioc->facts.MaxDevHandle % 8) ioc->pd_handles_sz++; + /* + * pd_handles_sz should have, at least, the minimal room for + * set_bit()/test_bit(), otherwise out-of-memory touch may occur. + */ + ioc->pd_handles_sz = ALIGN(ioc->pd_handles_sz, sizeof(unsigned long)); + ioc->pd_handles = kzalloc(ioc->pd_handles_sz, GFP_KERNEL); if (!ioc->pd_handles) { @@ -8529,6 +8535,13 @@ mpt3sas_base_attach(struct MPT3SAS_ADAPT ioc->pend_os_device_add_sz = (ioc->facts.MaxDevHandle / 8); if (ioc->facts.MaxDevHandle % 8) ioc->pend_os_device_add_sz++; + + /* + * pend_os_device_add_sz should have, at least, the minimal room for + * set_bit()/test_bit(), otherwise out-of-memory may occur. + */ + ioc->pend_os_device_add_sz = ALIGN(ioc->pend_os_device_add_sz, + sizeof(unsigned long)); ioc->pend_os_device_add = kzalloc(ioc->pend_os_device_add_sz, GFP_KERNEL); if (!ioc->pend_os_device_add) { @@ -8820,6 +8833,12 @@ _base_check_ioc_facts_changes(struct MPT if (ioc->facts.MaxDevHandle % 8) pd_handles_sz++;
+ /* + * pd_handles should have, at least, the minimal room for + * set_bit()/test_bit(), otherwise out-of-memory touch may + * occur. + */ + pd_handles_sz = ALIGN(pd_handles_sz, sizeof(unsigned long)); pd_handles = krealloc(ioc->pd_handles, pd_handles_sz, GFP_KERNEL); if (!pd_handles) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin K. Petersen martin.petersen@oracle.com
commit 7926d51f73e0434a6250c2fd1a0555f98d9a62da upstream.
Commit 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") triggered a read to LBA 0 before attempting to inquire about device characteristics. This was done because some protocol bridge devices will return generic values until an attached storage device's media has been accessed.
Pierre Tomon reported that this change caused problems on a large capacity external drive connected via a bridge device. The bridge in question does not appear to implement the READ(10) command.
Issue a READ(16) instead of READ(10) when a device has been identified as preferring 16-byte commands (use_16_for_rw heuristic).
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218890 Link: https://lore.kernel.org/r/70dd7ae0-b6b1-48e1-bb59-53b7c7f18274@rowland.harva... Link: https://lore.kernel.org/r/20240605022521.3960956-1-martin.petersen@oracle.co... Fixes: 321da3dc1f3c ("scsi: sd: usb_storage: uas: Access media prior to querying device properties") Cc: stable@vger.kernel.org Reported-by: Pierre Tomon pierretom+12@ik.me Suggested-by: Alan Stern stern@rowland.harvard.edu Tested-by: Pierre Tomon pierretom+12@ik.me Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/scsi/sd.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)
--- a/drivers/scsi/sd.c +++ b/drivers/scsi/sd.c @@ -3572,16 +3572,23 @@ static bool sd_validate_opt_xfer_size(st
static void sd_read_block_zero(struct scsi_disk *sdkp) { - unsigned int buf_len = sdkp->device->sector_size; - char *buffer, cmd[10] = { }; + struct scsi_device *sdev = sdkp->device; + unsigned int buf_len = sdev->sector_size; + u8 *buffer, cmd[16] = { };
buffer = kmalloc(buf_len, GFP_KERNEL); if (!buffer) return;
- cmd[0] = READ_10; - put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */ - put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */ + if (sdev->use_16_for_rw) { + cmd[0] = READ_16; + put_unaligned_be64(0, &cmd[2]); /* Logical block address 0 */ + put_unaligned_be32(1, &cmd[10]);/* Transfer 1 logical block */ + } else { + cmd[0] = READ_10; + put_unaligned_be32(0, &cmd[2]); /* Logical block address 0 */ + put_unaligned_be16(1, &cmd[7]); /* Transfer 1 logical block */ + }
scsi_execute_cmd(sdkp->device, cmd, REQ_OP_DRV_IN, buffer, buf_len, SD_TIMEOUT, sdkp->max_retries, NULL);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ziwei Xiao ziweixiao@google.com
commit 6f4d93b78ade0a4c2cafd587f7b429ce95abb02e upstream.
gve_rx_free_skb incorrectly leaves napi->skb referencing an skb after it is freed with dev_kfree_skb_any(). This can result in a subsequent call to napi_get_frags returning a dangling pointer.
Fix this by clearing napi->skb before the skb is freed.
Fixes: 9b8dd5e5ea48 ("gve: DQO: Add RX path") Cc: stable@vger.kernel.org Reported-by: Shailend Chand shailend@google.com Signed-off-by: Ziwei Xiao ziweixiao@google.com Reviewed-by: Harshitha Ramamurthy hramamurthy@google.com Reviewed-by: Shailend Chand shailend@google.com Reviewed-by: Praveen Kaligineedi pkaligineedi@google.com Link: https://lore.kernel.org/r/20240612001654.923887-1-ziweixiao@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/google/gve/gve_rx_dqo.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/net/ethernet/google/gve/gve_rx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_rx_dqo.c @@ -581,11 +581,13 @@ static void gve_rx_skb_hash(struct sk_bu skb_set_hash(skb, le32_to_cpu(compl_desc->hash), hash_type); }
-static void gve_rx_free_skb(struct gve_rx_ring *rx) +static void gve_rx_free_skb(struct napi_struct *napi, struct gve_rx_ring *rx) { if (!rx->ctx.skb_head) return;
+ if (rx->ctx.skb_head == napi->skb) + napi->skb = NULL; dev_kfree_skb_any(rx->ctx.skb_head); rx->ctx.skb_head = NULL; rx->ctx.skb_tail = NULL; @@ -884,7 +886,7 @@ int gve_rx_poll_dqo(struct gve_notify_bl
err = gve_rx_dqo(napi, rx, compl_desc, complq->head, rx->q_num); if (err < 0) { - gve_rx_free_skb(rx); + gve_rx_free_skb(napi, rx); u64_stats_update_begin(&rx->statss); if (err == -ENOMEM) rx->rx_skb_alloc_fail++; @@ -927,7 +929,7 @@ int gve_rx_poll_dqo(struct gve_notify_bl
/* gve_rx_complete_skb() will consume skb if successful */ if (gve_rx_complete_skb(rx, napi, compl_desc, feat) != 0) { - gve_rx_free_skb(rx); + gve_rx_free_skb(napi, rx); u64_stats_update_begin(&rx->statss); rx->rx_desc_err_dropped_pkt++; u64_stats_update_end(&rx->statss);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hari Bathini hbathini@linux.ibm.com
commit 7b090b6ff51b9a9f002139660672f662b95f0630 upstream.
Since commit 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency"), crashing_cpu is not available without CONFIG_CRASH_DUMP. Fix compile error on 64-BIT 85xx owing to this change.
Fixes: 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency") Cc: stable@vger.kernel.org # v6.9+ Reported-by: Christian Zigotzky chzigotzky@xenosoft.de Closes: https://lore.kernel.org/all/fa247ae4-5825-4dbe-a737-d93b7ab4d4b9@xenosoft.de... Suggested-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Hari Bathini hbathini@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240510080757.560159-1-hbathini@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/platforms/85xx/smp.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/arch/powerpc/platforms/85xx/smp.c +++ b/arch/powerpc/platforms/85xx/smp.c @@ -398,6 +398,7 @@ static void mpc85xx_smp_kexec_cpu_down(i hard_irq_disable(); mpic_teardown_this_cpu(secondary);
+#ifdef CONFIG_CRASH_DUMP if (cpu == crashing_cpu && cpu_thread_in_core(cpu) != 0) { /* * We enter the crash kernel on whatever cpu crashed, @@ -406,9 +407,11 @@ static void mpc85xx_smp_kexec_cpu_down(i */ disable_threadbit = 1; disable_cpu = cpu_first_thread_sibling(cpu); - } else if (sibling != crashing_cpu && - cpu_thread_in_core(cpu) == 0 && - cpu_thread_in_core(sibling) != 0) { + } else if (sibling == crashing_cpu) { + return; + } +#endif + if (cpu_thread_in_core(cpu) == 0 && cpu_thread_in_core(sibling) != 0) { disable_threadbit = 2; disable_cpu = sibling; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
commit 2d43cc701b96f910f50915ac4c2a0cae5deb734c upstream.
Building ppc64le_defconfig with GCC 14 fails with assembler errors:
CC fs/readdir.o /tmp/ccdQn0mD.s: Assembler messages: /tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4) /tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4) ... [6 lines] /tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)
A snippet of the asm shows:
# ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end); ld 9,0(29) # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1] # 210 "../fs/readdir.c" 1 1: std 9,18(8) # put_user # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]
The 'std' instruction requires a 4-byte aligned displacement because it is a DS-form instruction, and as the assembler says, 18 is not a multiple of 4.
A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.
The fix is to change the constraint on the memory operand to put_user(), from "m" which is a general memory reference to "YZ".
The "Z" constraint is documented in the GCC manual PowerPC machine constraints, and specifies a "memory operand accessed with indexed or indirect addressing". "Y" is not documented in the manual but specifies a "memory operand for a DS-form instruction". Using both allows the compiler to generate a DS-form "std" or X-form "stdx" as appropriate.
The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because the "Y" constraint does not guarantee 4-byte alignment when prefixed instructions are enabled.
Unfortunately clang doesn't support the "Y" constraint so that has to be behind an ifdef.
Although the build error is only seen with GCC 13/14, that appears to just be luck. The constraint has been incorrect since it was first added.
Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()") Cc: stable@vger.kernel.org # v5.10+ Suggested-by: Kewen Lin linkw@gcc.gnu.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240529123029.146953-1-mpe@ellerman.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/include/asm/uaccess.h | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
--- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -92,9 +92,25 @@ __pu_failed: \ : label) #endif
+#ifdef CONFIG_CC_IS_CLANG +#define DS_FORM_CONSTRAINT "Z<>" +#else +#define DS_FORM_CONSTRAINT "YZ<>" +#endif + #ifdef __powerpc64__ +#ifdef CONFIG_PPC_KERNEL_PREFIXED #define __put_user_asm2_goto(x, ptr, label) \ __put_user_asm_goto(x, ptr, label, "std") +#else +#define __put_user_asm2_goto(x, addr, label) \ + asm goto ("1: std%U1%X1 %0,%1 # put_user\n" \ + EX_TABLE(1b, %l2) \ + : \ + : "r" (x), DS_FORM_CONSTRAINT (*addr) \ + : \ + : label) +#endif // CONFIG_PPC_KERNEL_PREFIXED #else /* __powerpc64__ */ #define __put_user_asm2_goto(x, addr, label) \ asm goto( \
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ni nichen@iscas.ac.cn
[ Upstream commit 0a3f9f7fc59feb8a91a2793b8b60977895c72365 ]
Add check for the return value of input_ff_create_memless() and return the error if it fails in order to catch the error.
Fixes: 09308562d4af ("HID: nvidia-shield: Initial driver implementation with Thunderstrike support") Signed-off-by: Chen Ni nichen@iscas.ac.cn Reviewed-by: Rahul Rameshbabu rrameshbabu@nvidia.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-nvidia-shield.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-nvidia-shield.c b/drivers/hid/hid-nvidia-shield.c index 58b15750dbb0a..ff9078ad19611 100644 --- a/drivers/hid/hid-nvidia-shield.c +++ b/drivers/hid/hid-nvidia-shield.c @@ -283,7 +283,9 @@ static struct input_dev *shield_haptics_create( return haptics;
input_set_capability(haptics, EV_FF, FF_RUMBLE); - input_ff_create_memless(haptics, NULL, play_effect); + ret = input_ff_create_memless(haptics, NULL, play_effect); + if (ret) + goto err;
ret = input_register_device(haptics); if (ret)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Jiang dave.jiang@intel.com
[ Upstream commit d55510527153d17a3af8cc2df69c04f95ae1350d ]
tools/testing/cxl/test/mem.c uses vmalloc() and vfree() but does not include linux/vmalloc.h. Kernel v6.10 made changes that causes the currently included headers not depend on vmalloc.h and therefore mem.c can no longer compile. Add linux/vmalloc.h to fix compile issue.
CC [M] tools/testing/cxl/test/mem.o tools/testing/cxl/test/mem.c: In function ‘label_area_release’: tools/testing/cxl/test/mem.c:1428:9: error: implicit declaration of function ‘vfree’; did you mean ‘kvfree’? [-Werror=implicit-function-declaration] 1428 | vfree(lsa); | ^~~~~ | kvfree tools/testing/cxl/test/mem.c: In function ‘cxl_mock_mem_probe’: tools/testing/cxl/test/mem.c:1466:22: error: implicit declaration of function ‘vmalloc’; did you mean ‘kmalloc’? [-Werror=implicit-function-declaration] 1466 | mdata->lsa = vmalloc(LSA_SIZE); | ^~~~~~~ | kmalloc
Fixes: 7d3eb23c4ccf ("tools/testing/cxl: Introduce a mock memory device + driver") Reviewed-by: Dan Williams dan.j.williams@intel.com Reviewed-by: Alison Schofield alison.schofield@intel.com Link: https://lore.kernel.org/r/20240528225551.1025977-1-dave.jiang@intel.com Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/cxl/test/mem.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/cxl/test/mem.c b/tools/testing/cxl/test/mem.c index 35ee41e435ab3..0d0ca63de8629 100644 --- a/tools/testing/cxl/test/mem.c +++ b/tools/testing/cxl/test/mem.c @@ -3,6 +3,7 @@
#include <linux/platform_device.h> #include <linux/mod_devicetable.h> +#include <linux/vmalloc.h> #include <linux/module.h> #include <linux/delay.h> #include <linux/sizes.h>
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Zhijian lizhijian@fujitsu.com
[ Upstream commit 49ba7b515c4c0719b866d16f068e62d16a8a3dd1 ]
Move the mode verification to __create_region() before allocating the memregion to avoid the memregion leaks.
Fixes: 6e099264185d ("cxl/region: Add volatile region creation support") Signed-off-by: Li Zhijian lizhijian@fujitsu.com Reviewed-by: Dan Williams dan.j.williams@intel.com Link: https://lore.kernel.org/r/20240507053421.456439-1-lizhijian@fujitsu.com Signed-off-by: Dave Jiang dave.jiang@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cxl/core/region.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index 812b2948b6c65..18b95149640b6 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -2352,15 +2352,6 @@ static struct cxl_region *devm_cxl_add_region(struct cxl_root_decoder *cxlrd, struct device *dev; int rc;
- switch (mode) { - case CXL_DECODER_RAM: - case CXL_DECODER_PMEM: - break; - default: - dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode); - return ERR_PTR(-EINVAL); - } - cxlr = cxl_region_alloc(cxlrd, id); if (IS_ERR(cxlr)) return cxlr; @@ -2415,6 +2406,15 @@ static struct cxl_region *__create_region(struct cxl_root_decoder *cxlrd, { int rc;
+ switch (mode) { + case CXL_DECODER_RAM: + case CXL_DECODER_PMEM: + break; + default: + dev_err(&cxlrd->cxlsd.cxld.dev, "unsupported mode %d\n", mode); + return ERR_PTR(-EINVAL); + } + rc = memregion_alloc(GFP_KERNEL); if (rc < 0) return ERR_PTR(rc);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit cc5ac966f26193ab185cc43d64d9f1ae998ccb6e ]
This lets us see the correct trace output.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-2-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/trace/events/cachefiles.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index cf4b98b9a9edc..e3213af847cdf 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -127,7 +127,9 @@ enum cachefiles_error_trace { EM(cachefiles_obj_see_lookup_cookie, "SEE lookup_cookie") \ EM(cachefiles_obj_see_lookup_failed, "SEE lookup_failed") \ EM(cachefiles_obj_see_withdraw_cookie, "SEE withdraw_cookie") \ - E_(cachefiles_obj_see_withdrawal, "SEE withdrawal") + EM(cachefiles_obj_see_withdrawal, "SEE withdrawal") \ + EM(cachefiles_obj_get_ondemand_fd, "GET ondemand_fd") \ + E_(cachefiles_obj_put_ondemand_fd, "PUT ondemand_fd")
#define cachefiles_coherency_traces \ EM(cachefiles_coherency_check_aux, "BAD aux ") \
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 0fc75c5940fa634d84e64c93bfc388e1274ed013 ]
Even with CACHEFILES_DEAD set, we can still read the requests, so in the following concurrency the request may be used after it has been freed:
mount | daemon_thread1 | daemon_thread2 ------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read // close dev fd cachefiles_flush_reqs complete(&REQ_A->done) kfree(REQ_A) xa_lock(&cache->reqs); cachefiles_ondemand_select_req req->msg.opcode != CACHEFILES_OP_READ // req use-after-free !!! xa_unlock(&cache->reqs); xa_destroy(&cache->reqs)
Hence remove requests from cache->reqs when flushing them to avoid accessing freed requests.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-3-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/daemon.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index 6465e25742309..ccb7b707ea4b7 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -159,6 +159,7 @@ static void cachefiles_flush_reqs(struct cachefiles_cache *cache) xa_for_each(xa, index, req) { req->error = -EIO; complete(&req->done); + __xa_erase(xa, index); } xa_unlock(xa);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit de3e26f9e5b76fc628077578c001c4a51bf54d06 ]
We got the following issue in a fuzz test of randomly issuing the restore command:
================================================================== BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0x609/0xab0 Write of size 4 at addr ffff888109164a80 by task ondemand-04-dae/4962
CPU: 11 PID: 4962 Comm: ondemand-04-dae Not tainted 6.8.0-rc7-dirty #542 Call Trace: kasan_report+0x94/0xc0 cachefiles_ondemand_daemon_read+0x609/0xab0 vfs_read+0x169/0xb50 ksys_read+0xf5/0x1e0
Allocated by task 626: __kmalloc+0x1df/0x4b0 cachefiles_ondemand_send_req+0x24d/0x690 cachefiles_create_tmpfile+0x249/0xb30 cachefiles_create_file+0x6f/0x140 cachefiles_look_up_object+0x29c/0xa60 cachefiles_lookup_cookie+0x37d/0xca0 fscache_cookie_state_machine+0x43c/0x1230 [...]
Freed by task 626: kfree+0xf1/0x2c0 cachefiles_ondemand_send_req+0x568/0x690 cachefiles_create_tmpfile+0x249/0xb30 cachefiles_create_file+0x6f/0x140 cachefiles_look_up_object+0x29c/0xa60 cachefiles_lookup_cookie+0x37d/0xca0 fscache_cookie_state_machine+0x43c/0x1230 [...] ==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2 ------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done)
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd copy_to_user(_buffer, msg, n) process_open_req(REQ_A) ------ restore ------ cachefiles_ondemand_restore xas_for_each(&xas, req, ULONG_MAX) xas_set_mark(&xas, CACHEFILES_REQ_NEW);
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req
write(devfd, ("copen %u,%llu", msg->msg_id, size)); cachefiles_ondemand_copen xa_erase(&cache->reqs, id) complete(&REQ_A->done) kfree(REQ_A) cachefiles_ondemand_get_fd(REQ_A) fd = get_unused_fd_flags file = anon_inode_getfile fd_install(fd, file) load = (void *)REQ_A->msg.data; load->fd = fd; // load UAF !!!
This issue is caused by issuing a restore command when the daemon is still alive, which results in a request being processed multiple times thus triggering a UAF. So to avoid this problem, add an additional reference count to cachefiles_req, which is held while waiting and reading, and then released when the waiting and reading is over.
Note that since there is only one reference count for waiting, we need to avoid the same request being completed multiple times, so we can only complete the request if it is successfully removed from the xarray.
Fixes: e73fa11a356c ("cachefiles: add restore command to recover inflight ondemand read requests") Suggested-by: Hou Tao houtao1@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-4-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/internal.h | 1 + fs/cachefiles/ondemand.c | 23 +++++++++++++++++++---- 2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index d33169f0018b1..7745b8abc3aa3 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -138,6 +138,7 @@ static inline bool cachefiles_in_ondemand_mode(struct cachefiles_cache *cache) struct cachefiles_req { struct cachefiles_object *object; struct completion done; + refcount_t ref; int error; struct cachefiles_msg msg; }; diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 4ba42f1fa3b40..c011fb24d2382 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -4,6 +4,12 @@ #include <linux/uio.h> #include "internal.h"
+static inline void cachefiles_req_put(struct cachefiles_req *req) +{ + if (refcount_dec_and_test(&req->ref)) + kfree(req); +} + static int cachefiles_ondemand_fd_release(struct inode *inode, struct file *file) { @@ -330,6 +336,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
xas_clear_mark(&xas, CACHEFILES_REQ_NEW); cache->req_id_next = xas.xa_index + 1; + refcount_inc(&req->ref); xa_unlock(&cache->reqs);
id = xas.xa_index; @@ -356,15 +363,22 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, complete(&req->done); }
+ cachefiles_req_put(req); return n;
err_put_fd: if (msg->opcode == CACHEFILES_OP_OPEN) close_fd(((struct cachefiles_open *)msg->data)->fd); error: - xa_erase(&cache->reqs, id); - req->error = ret; - complete(&req->done); + xas_reset(&xas); + xas_lock(&xas); + if (xas_load(&xas) == req) { + req->error = ret; + complete(&req->done); + xas_store(&xas, NULL); + } + xas_unlock(&xas); + cachefiles_req_put(req); return ret; }
@@ -395,6 +409,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, goto out; }
+ refcount_set(&req->ref, 1); req->object = object; init_completion(&req->done); req->msg.opcode = opcode; @@ -456,7 +471,7 @@ static int cachefiles_ondemand_send_req(struct cachefiles_object *object, wake_up_all(&cache->daemon_pollwq); wait_for_completion(&req->done); ret = req->error; - kfree(req); + cachefiles_req_put(req); return ret; out: /* Reset the object to close state in error handling path.
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit da4a827416066191aafeeccee50a8836a826ba10 ]
We got the following issue in a fuzz test of randomly issuing the restore command:
================================================================== BUG: KASAN: slab-use-after-free in cachefiles_ondemand_daemon_read+0xb41/0xb60 Read of size 8 at addr ffff888122e84088 by task ondemand-04-dae/963
CPU: 13 PID: 963 Comm: ondemand-04-dae Not tainted 6.8.0-dirty #564 Call Trace: kasan_report+0x93/0xc0 cachefiles_ondemand_daemon_read+0xb41/0xb60 vfs_read+0x169/0xb50 ksys_read+0xf5/0x1e0
Allocated by task 116: kmem_cache_alloc+0x140/0x3a0 cachefiles_lookup_cookie+0x140/0xcd0 fscache_cookie_state_machine+0x43c/0x1230 [...]
Freed by task 792: kmem_cache_free+0xfe/0x390 cachefiles_put_object+0x241/0x480 fscache_cookie_state_machine+0x5c8/0x1230 [...] ==================================================================
Following is the process that triggers the issue:
mount | daemon_thread1 | daemon_thread2 ------------------------------------------------------------ cachefiles_withdraw_cookie cachefiles_ondemand_clean_object(object) cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done)
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req msg->object_id = req->object->ondemand->ondemand_id ------ restore ------ cachefiles_ondemand_restore xas_for_each(&xas, req, ULONG_MAX) xas_set_mark(&xas, CACHEFILES_REQ_NEW)
cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req copy_to_user(_buffer, msg, n) xa_erase(&cache->reqs, id) complete(&REQ_A->done) ------ close(fd) ------ cachefiles_ondemand_fd_release cachefiles_put_object cachefiles_put_object kmem_cache_free(cachefiles_object_jar, object) REQ_A->object->ondemand->ondemand_id // object UAF !!!
When we see the request within xa_lock, req->object must not have been freed yet, so grab the reference count of object before xa_unlock to avoid the above issue.
Fixes: 0a7e54c1959c ("cachefiles: resend an open request if the read request's object is closed") Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-5-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/ondemand.c | 3 +++ include/trace/events/cachefiles.h | 6 +++++- 2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index c011fb24d2382..3dd002108a872 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -337,6 +337,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, xas_clear_mark(&xas, CACHEFILES_REQ_NEW); cache->req_id_next = xas.xa_index + 1; refcount_inc(&req->ref); + cachefiles_grab_object(req->object, cachefiles_obj_get_read_req); xa_unlock(&cache->reqs);
id = xas.xa_index; @@ -357,6 +358,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, goto err_put_fd; }
+ cachefiles_put_object(req->object, cachefiles_obj_put_read_req); /* CLOSE request has no reply */ if (msg->opcode == CACHEFILES_OP_CLOSE) { xa_erase(&cache->reqs, id); @@ -370,6 +372,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, if (msg->opcode == CACHEFILES_OP_OPEN) close_fd(((struct cachefiles_open *)msg->data)->fd); error: + cachefiles_put_object(req->object, cachefiles_obj_put_read_req); xas_reset(&xas); xas_lock(&xas); if (xas_load(&xas) == req) { diff --git a/include/trace/events/cachefiles.h b/include/trace/events/cachefiles.h index e3213af847cdf..7d931db02b934 100644 --- a/include/trace/events/cachefiles.h +++ b/include/trace/events/cachefiles.h @@ -33,6 +33,8 @@ enum cachefiles_obj_ref_trace { cachefiles_obj_see_withdrawal, cachefiles_obj_get_ondemand_fd, cachefiles_obj_put_ondemand_fd, + cachefiles_obj_get_read_req, + cachefiles_obj_put_read_req, };
enum fscache_why_object_killed { @@ -129,7 +131,9 @@ enum cachefiles_error_trace { EM(cachefiles_obj_see_withdraw_cookie, "SEE withdraw_cookie") \ EM(cachefiles_obj_see_withdrawal, "SEE withdrawal") \ EM(cachefiles_obj_get_ondemand_fd, "GET ondemand_fd") \ - E_(cachefiles_obj_put_ondemand_fd, "PUT ondemand_fd") + EM(cachefiles_obj_put_ondemand_fd, "PUT ondemand_fd") \ + EM(cachefiles_obj_get_read_req, "GET read_req") \ + E_(cachefiles_obj_put_read_req, "PUT read_req")
#define cachefiles_coherency_traces \ EM(cachefiles_coherency_check_aux, "BAD aux ") \
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 0a790040838c736495d5afd6b2d636f159f817f1 ]
The following concurrency may cause a read request to fail to be completed and result in a hung:
t1 | t2 --------------------------------------------------------- cachefiles_ondemand_copen req = xa_erase(&cache->reqs, id) // Anon fd is maliciously closed. cachefiles_ondemand_fd_release xa_lock(&cache->reqs) cachefiles_ondemand_set_object_close(object) xa_unlock(&cache->reqs) cachefiles_ondemand_set_object_open // No one will ever close it again. cachefiles_ondemand_daemon_read cachefiles_ondemand_select_req // Get a read req but its fd is already closed. // The daemon can't issue a cread ioctl with an closed fd, then hung.
So add spin_lock for cachefiles_ondemand_info to protect ondemand_id and state, thus we can avoid the above problem in cachefiles_ondemand_copen() by using ondemand_id to determine if fd has been closed.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-8-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/internal.h | 1 + fs/cachefiles/ondemand.c | 35 ++++++++++++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 7745b8abc3aa3..45c8bed605383 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -55,6 +55,7 @@ struct cachefiles_ondemand_info { int ondemand_id; enum cachefiles_object_state state; struct cachefiles_object *object; + spinlock_t lock; };
/* diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 3dd002108a872..7b667ddc84ca5 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -16,13 +16,16 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct cachefiles_object *object = file->private_data; struct cachefiles_cache *cache = object->volume->cache; struct cachefiles_ondemand_info *info = object->ondemand; - int object_id = info->ondemand_id; + int object_id; struct cachefiles_req *req; XA_STATE(xas, &cache->reqs, 0);
xa_lock(&cache->reqs); + spin_lock(&info->lock); + object_id = info->ondemand_id; info->ondemand_id = CACHEFILES_ONDEMAND_ID_CLOSED; cachefiles_ondemand_set_object_close(object); + spin_unlock(&info->lock);
/* Only flush CACHEFILES_REQ_NEW marked req to avoid race with daemon_read */ xas_for_each_marked(&xas, req, ULONG_MAX, CACHEFILES_REQ_NEW) { @@ -122,6 +125,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) { struct cachefiles_req *req; struct fscache_cookie *cookie; + struct cachefiles_ondemand_info *info; char *pid, *psize; unsigned long id; long size; @@ -172,6 +176,33 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) goto out; }
+ info = req->object->ondemand; + spin_lock(&info->lock); + /* + * The anonymous fd was closed before copen ? Fail the request. + * + * t1 | t2 + * --------------------------------------------------------- + * cachefiles_ondemand_copen + * req = xa_erase(&cache->reqs, id) + * // Anon fd is maliciously closed. + * cachefiles_ondemand_fd_release + * xa_lock(&cache->reqs) + * cachefiles_ondemand_set_object_close(object) + * xa_unlock(&cache->reqs) + * cachefiles_ondemand_set_object_open + * // No one will ever close it again. + * cachefiles_ondemand_daemon_read + * cachefiles_ondemand_select_req + * + * Get a read req but its fd is already closed. The daemon can't + * issue a cread ioctl with an closed fd, then hung. + */ + if (info->ondemand_id == CACHEFILES_ONDEMAND_ID_CLOSED) { + spin_unlock(&info->lock); + req->error = -EBADFD; + goto out; + } cookie = req->object->cookie; cookie->object_size = size; if (size) @@ -181,6 +212,7 @@ int cachefiles_ondemand_copen(struct cachefiles_cache *cache, char *args) trace_cachefiles_ondemand_copen(req->object, id, size);
cachefiles_ondemand_set_object_open(req->object); + spin_unlock(&info->lock); wake_up_all(&cache->daemon_pollwq);
out: @@ -596,6 +628,7 @@ int cachefiles_ondemand_init_obj_info(struct cachefiles_object *object, return -ENOMEM;
object->ondemand->object = object; + spin_lock_init(&object->ondemand->lock); INIT_WORK(&object->ondemand->ondemand_work, ondemand_object_worker); return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 3e6d704f02aa4c50c7bc5fe91a4401df249a137b ]
The err_put_fd label is only used once, so remove it to make the code more readable. In addition, the logic for deleting error request and CLOSE request is merged to simplify the code.
Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-6-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Reviewed-by: Jia Zhu zhujia.zj@bytedance.com Reviewed-by: Gao Xiang hsiangkao@linux.alibaba.com Reviewed-by: Jingbo Xu jefflexu@linux.alibaba.com Signed-off-by: Christian Brauner brauner@kernel.org Stable-dep-of: 4988e35e95fc ("cachefiles: never get a new anonymous fd if ondemand_id is valid") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/ondemand.c | 45 ++++++++++++++-------------------------- 1 file changed, 16 insertions(+), 29 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index 7b667ddc84ca5..c8389bb118f5b 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -337,7 +337,6 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, { struct cachefiles_req *req; struct cachefiles_msg *msg; - unsigned long id = 0; size_t n; int ret = 0; XA_STATE(xas, &cache->reqs, cache->req_id_next); @@ -372,49 +371,37 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, cachefiles_grab_object(req->object, cachefiles_obj_get_read_req); xa_unlock(&cache->reqs);
- id = xas.xa_index; - if (msg->opcode == CACHEFILES_OP_OPEN) { ret = cachefiles_ondemand_get_fd(req); if (ret) { cachefiles_ondemand_set_object_close(req->object); - goto error; + goto out; } }
- msg->msg_id = id; + msg->msg_id = xas.xa_index; msg->object_id = req->object->ondemand->ondemand_id;
if (copy_to_user(_buffer, msg, n) != 0) { ret = -EFAULT; - goto err_put_fd; - } - - cachefiles_put_object(req->object, cachefiles_obj_put_read_req); - /* CLOSE request has no reply */ - if (msg->opcode == CACHEFILES_OP_CLOSE) { - xa_erase(&cache->reqs, id); - complete(&req->done); + if (msg->opcode == CACHEFILES_OP_OPEN) + close_fd(((struct cachefiles_open *)msg->data)->fd); } - - cachefiles_req_put(req); - return n; - -err_put_fd: - if (msg->opcode == CACHEFILES_OP_OPEN) - close_fd(((struct cachefiles_open *)msg->data)->fd); -error: +out: cachefiles_put_object(req->object, cachefiles_obj_put_read_req); - xas_reset(&xas); - xas_lock(&xas); - if (xas_load(&xas) == req) { - req->error = ret; - complete(&req->done); - xas_store(&xas, NULL); + /* Remove error request and CLOSE request has no reply */ + if (ret || msg->opcode == CACHEFILES_OP_CLOSE) { + xas_reset(&xas); + xas_lock(&xas); + if (xas_load(&xas) == req) { + req->error = ret; + complete(&req->done); + xas_store(&xas, NULL); + } + xas_unlock(&xas); } - xas_unlock(&xas); cachefiles_req_put(req); - return ret; + return ret ? ret : n; }
typedef int (*init_req_fn)(struct cachefiles_req *req, void *private);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 4988e35e95fc938bdde0e15880fe72042fc86acf ]
Now every time the daemon reads an open request, it gets a new anonymous fd and ondemand_id. With the introduction of "restore", it is possible to read the same open request more than once, and therefore an object can have more than one anonymous fd.
If the anonymous fd is not unique, the following concurrencies will result in an fd leak:
t1 | t2 | t3 ------------------------------------------------------------ cachefiles_ondemand_init_object cachefiles_ondemand_send_req REQ_A = kzalloc(sizeof(*req) + data_len) wait_for_completion(&REQ_A->done) cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd load->fd = fd0 ondemand_id = object_id0 ------ restore ------ cachefiles_ondemand_restore // restore REQ_A cachefiles_daemon_read cachefiles_ondemand_daemon_read REQ_A = cachefiles_ondemand_select_req cachefiles_ondemand_get_fd load->fd = fd1 ondemand_id = object_id1 process_open_req(REQ_A) write(devfd, ("copen %u,%llu", msg->msg_id, size)) cachefiles_ondemand_copen xa_erase(&cache->reqs, id) complete(&REQ_A->done) kfree(REQ_A) process_open_req(REQ_A) // copen fails due to no req // daemon close(fd1) cachefiles_ondemand_fd_release // set object closed -- umount -- cachefiles_withdraw_cookie cachefiles_ondemand_clean_object cachefiles_ondemand_init_close_req if (!cachefiles_ondemand_object_is_open(object)) return -ENOENT; // The fd0 is not closed until the daemon exits.
However, the anonymous fd holds the reference count of the object and the object holds the reference count of the cookie. So even though the cookie has been relinquished, it will not be unhashed and freed until the daemon exits.
In fscache_hash_cookie(), when the same cookie is found in the hash list, if the cookie is set with the FSCACHE_COOKIE_RELINQUISHED bit, then the new cookie waits for the old cookie to be unhashed, while the old cookie is waiting for the leaked fd to be closed, if the daemon does not exit in time it will trigger a hung task.
To avoid this, allocate a new anonymous fd only if no anonymous fd has been allocated (ondemand_id == 0) or if the previously allocated anonymous fd has been closed (ondemand_id == -1). Moreover, returns an error if ondemand_id is valid, letting the daemon know that the current userland restore logic is abnormal and needs to be checked.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-9-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/ondemand.c | 34 ++++++++++++++++++++++++++++------ 1 file changed, 28 insertions(+), 6 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index c8389bb118f5b..dbcd4161ea3a1 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -14,11 +14,18 @@ static int cachefiles_ondemand_fd_release(struct inode *inode, struct file *file) { struct cachefiles_object *object = file->private_data; - struct cachefiles_cache *cache = object->volume->cache; - struct cachefiles_ondemand_info *info = object->ondemand; + struct cachefiles_cache *cache; + struct cachefiles_ondemand_info *info; int object_id; struct cachefiles_req *req; - XA_STATE(xas, &cache->reqs, 0); + XA_STATE(xas, NULL, 0); + + if (!object) + return 0; + + info = object->ondemand; + cache = object->volume->cache; + xas.xa = &cache->reqs;
xa_lock(&cache->reqs); spin_lock(&info->lock); @@ -275,22 +282,39 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) goto err_put_fd; }
+ spin_lock(&object->ondemand->lock); + if (object->ondemand->ondemand_id > 0) { + spin_unlock(&object->ondemand->lock); + /* Pair with check in cachefiles_ondemand_fd_release(). */ + file->private_data = NULL; + ret = -EEXIST; + goto err_put_file; + } + file->f_mode |= FMODE_PWRITE | FMODE_LSEEK; fd_install(fd, file);
load = (void *)req->msg.data; load->fd = fd; object->ondemand->ondemand_id = object_id; + spin_unlock(&object->ondemand->lock);
cachefiles_get_unbind_pincount(cache); trace_cachefiles_ondemand_open(object, &req->msg, load); return 0;
+err_put_file: + fput(file); err_put_fd: put_unused_fd(fd); err_free_id: xa_erase(&cache->ondemand_ids, object_id); err: + spin_lock(&object->ondemand->lock); + /* Avoid marking an opened object as closed. */ + if (object->ondemand->ondemand_id <= 0) + cachefiles_ondemand_set_object_close(object); + spin_unlock(&object->ondemand->lock); cachefiles_put_object(object, cachefiles_obj_put_ondemand_fd); return ret; } @@ -373,10 +397,8 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache,
if (msg->opcode == CACHEFILES_OP_OPEN) { ret = cachefiles_ondemand_get_fd(req); - if (ret) { - cachefiles_ondemand_set_object_close(req->object); + if (ret) goto out; - } }
msg->msg_id = xas.xa_index;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 4b4391e77a6bf24cba2ef1590e113d9b73b11039 ]
After installing the anonymous fd, we can now see it in userland and close it. However, at this point we may not have gotten the reference count of the cache, but we will put it during colse fd, so this may cause a cache UAF.
So grab the cache reference count before fd_install(). In addition, by kernel convention, fd is taken over by the user land after fd_install(), and the kernel should not call close_fd() after that, i.e., it should call fd_install() after everything is ready, thus fd_install() is called after copy_to_user() succeeds.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Suggested-by: Hou Tao houtao1@huawei.com Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-10-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/ondemand.c | 53 +++++++++++++++++++++++++--------------- 1 file changed, 33 insertions(+), 20 deletions(-)
diff --git a/fs/cachefiles/ondemand.c b/fs/cachefiles/ondemand.c index dbcd4161ea3a1..89f118d68d125 100644 --- a/fs/cachefiles/ondemand.c +++ b/fs/cachefiles/ondemand.c @@ -4,6 +4,11 @@ #include <linux/uio.h> #include "internal.h"
+struct ondemand_anon_file { + struct file *file; + int fd; +}; + static inline void cachefiles_req_put(struct cachefiles_req *req) { if (refcount_dec_and_test(&req->ref)) @@ -250,14 +255,14 @@ int cachefiles_ondemand_restore(struct cachefiles_cache *cache, char *args) return 0; }
-static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) +static int cachefiles_ondemand_get_fd(struct cachefiles_req *req, + struct ondemand_anon_file *anon_file) { struct cachefiles_object *object; struct cachefiles_cache *cache; struct cachefiles_open *load; - struct file *file; u32 object_id; - int ret, fd; + int ret;
object = cachefiles_grab_object(req->object, cachefiles_obj_get_ondemand_fd); @@ -269,16 +274,16 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) if (ret < 0) goto err;
- fd = get_unused_fd_flags(O_WRONLY); - if (fd < 0) { - ret = fd; + anon_file->fd = get_unused_fd_flags(O_WRONLY); + if (anon_file->fd < 0) { + ret = anon_file->fd; goto err_free_id; }
- file = anon_inode_getfile("[cachefiles]", &cachefiles_ondemand_fd_fops, - object, O_WRONLY); - if (IS_ERR(file)) { - ret = PTR_ERR(file); + anon_file->file = anon_inode_getfile("[cachefiles]", + &cachefiles_ondemand_fd_fops, object, O_WRONLY); + if (IS_ERR(anon_file->file)) { + ret = PTR_ERR(anon_file->file); goto err_put_fd; }
@@ -286,16 +291,15 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) if (object->ondemand->ondemand_id > 0) { spin_unlock(&object->ondemand->lock); /* Pair with check in cachefiles_ondemand_fd_release(). */ - file->private_data = NULL; + anon_file->file->private_data = NULL; ret = -EEXIST; goto err_put_file; }
- file->f_mode |= FMODE_PWRITE | FMODE_LSEEK; - fd_install(fd, file); + anon_file->file->f_mode |= FMODE_PWRITE | FMODE_LSEEK;
load = (void *)req->msg.data; - load->fd = fd; + load->fd = anon_file->fd; object->ondemand->ondemand_id = object_id; spin_unlock(&object->ondemand->lock);
@@ -304,9 +308,11 @@ static int cachefiles_ondemand_get_fd(struct cachefiles_req *req) return 0;
err_put_file: - fput(file); + fput(anon_file->file); + anon_file->file = NULL; err_put_fd: - put_unused_fd(fd); + put_unused_fd(anon_file->fd); + anon_file->fd = ret; err_free_id: xa_erase(&cache->ondemand_ids, object_id); err: @@ -363,6 +369,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, struct cachefiles_msg *msg; size_t n; int ret = 0; + struct ondemand_anon_file anon_file; XA_STATE(xas, &cache->reqs, cache->req_id_next);
xa_lock(&cache->reqs); @@ -396,7 +403,7 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, xa_unlock(&cache->reqs);
if (msg->opcode == CACHEFILES_OP_OPEN) { - ret = cachefiles_ondemand_get_fd(req); + ret = cachefiles_ondemand_get_fd(req, &anon_file); if (ret) goto out; } @@ -404,10 +411,16 @@ ssize_t cachefiles_ondemand_daemon_read(struct cachefiles_cache *cache, msg->msg_id = xas.xa_index; msg->object_id = req->object->ondemand->ondemand_id;
- if (copy_to_user(_buffer, msg, n) != 0) { + if (copy_to_user(_buffer, msg, n) != 0) ret = -EFAULT; - if (msg->opcode == CACHEFILES_OP_OPEN) - close_fd(((struct cachefiles_open *)msg->data)->fd); + + if (msg->opcode == CACHEFILES_OP_OPEN) { + if (ret < 0) { + fput(anon_file.file); + put_unused_fd(anon_file.fd); + goto out; + } + fd_install(anon_file.fd, anon_file.file); } out: cachefiles_put_object(req->object, cachefiles_obj_put_read_req);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 85e833cd7243bda7285492b0653c3abb1e2e757b ]
In ondemand mode, when the daemon is processing an open request, if the kernel flags the cache as CACHEFILES_DEAD, the cachefiles_daemon_write() will always return -EIO, so the daemon can't pass the copen to the kernel. Then the kernel process that is waiting for the copen triggers a hung_task.
Since the DEAD state is irreversible, it can only be exited by closing /dev/cachefiles. Therefore, after calling cachefiles_io_error() to mark the cache as CACHEFILES_DEAD, if in ondemand mode, flush all requests to avoid the above hungtask. We may still be able to read some of the cached data before closing the fd of /dev/cachefiles.
Note that this relies on the patch that adds reference counting to the req, otherwise it may UAF.
Fixes: c8383054506c ("cachefiles: notify the user daemon when looking up cookie") Signed-off-by: Baokun Li libaokun1@huawei.com Link: https://lore.kernel.org/r/20240522114308.2402121-12-libaokun@huaweicloud.com Acked-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cachefiles/daemon.c | 2 +- fs/cachefiles/internal.h | 3 +++ 2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/cachefiles/daemon.c b/fs/cachefiles/daemon.c index ccb7b707ea4b7..06cdf1a8a16f6 100644 --- a/fs/cachefiles/daemon.c +++ b/fs/cachefiles/daemon.c @@ -133,7 +133,7 @@ static int cachefiles_daemon_open(struct inode *inode, struct file *file) return 0; }
-static void cachefiles_flush_reqs(struct cachefiles_cache *cache) +void cachefiles_flush_reqs(struct cachefiles_cache *cache) { struct xarray *xa = &cache->reqs; struct cachefiles_req *req; diff --git a/fs/cachefiles/internal.h b/fs/cachefiles/internal.h index 45c8bed605383..6845a90cdfcce 100644 --- a/fs/cachefiles/internal.h +++ b/fs/cachefiles/internal.h @@ -188,6 +188,7 @@ extern int cachefiles_has_space(struct cachefiles_cache *cache, * daemon.c */ extern const struct file_operations cachefiles_daemon_fops; +extern void cachefiles_flush_reqs(struct cachefiles_cache *cache); extern void cachefiles_get_unbind_pincount(struct cachefiles_cache *cache); extern void cachefiles_put_unbind_pincount(struct cachefiles_cache *cache);
@@ -426,6 +427,8 @@ do { \ pr_err("I/O Error: " FMT"\n", ##__VA_ARGS__); \ fscache_io_error((___cache)->cache); \ set_bit(CACHEFILES_DEAD, &(___cache)->flags); \ + if (cachefiles_in_ondemand_mode(___cache)) \ + cachefiles_flush_reqs(___cache); \ } while (0)
#define cachefiles_io_error_obj(object, FMT, ...) \
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Brown broonie@kernel.org
[ Upstream commit 2032e61e24fe9fe55d6c7a34fb5506c911b3e280 ]
The pcmtest driver tests use the kselftest harness which requires that _GNU_SOURCE is defined but nothing causes it to be defined. Since the KHDR_INCLUDES Makefile variable has had the required define added let's use that, this should provide some futureproofing.
Fixes: daef47b89efd ("selftests: Compile kselftest headers with -D_GNU_SOURCE") Signed-off-by: Mark Brown broonie@kernel.org Reviewed-by: Muhammad Usama Anjum usama.anjum@collabora.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/alsa/Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/alsa/Makefile b/tools/testing/selftests/alsa/Makefile index 5af9ba8a4645b..c1ce39874e2b5 100644 --- a/tools/testing/selftests/alsa/Makefile +++ b/tools/testing/selftests/alsa/Makefile @@ -1,7 +1,7 @@ # SPDX-License-Identifier: GPL-2.0 #
-CFLAGS += $(shell pkg-config --cflags alsa) +CFLAGS += $(shell pkg-config --cflags alsa) $(KHDR_INCLUDES) LDLIBS += $(shell pkg-config --libs alsa) ifeq ($(LDLIBS),) LDLIBS += -lasound
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
[ Upstream commit f6c3c83db1d939ebdb8c8922748ae647d8126d91 ]
The dynevent/test_duplicates.tc test case uses `syscalls/sys_enter_openat` event for defining eprobe on it. Since this `syscalls` events depend on CONFIG_FTRACE_SYSCALLS=y, if it is not set, the test will fail.
Add the event file to `required` line so that the test will return `unsupported` result.
Fixes: 297e1dcdca3d ("selftests/ftrace: Add selftest for testing duplicate eprobes and kprobes") Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/ftrace/test.d/dynevent/test_duplicates.tc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/dynevent/test_duplicates.tc b/tools/testing/selftests/ftrace/test.d/dynevent/test_duplicates.tc index d3a79da215c8b..5f72abe6fa79b 100644 --- a/tools/testing/selftests/ftrace/test.d/dynevent/test_duplicates.tc +++ b/tools/testing/selftests/ftrace/test.d/dynevent/test_duplicates.tc @@ -1,7 +1,7 @@ #!/bin/sh # SPDX-License-Identifier: GPL-2.0 # description: Generic dynamic event - check if duplicate events are caught -# requires: dynamic_events "e[:[<group>/][<event>]] <attached-group>.<attached-event> [<args>]":README +# requires: dynamic_events "e[:[<group>/][<event>]] <attached-group>.<attached-event> [<args>]":README events/syscalls/sys_enter_openat
echo 0 > events/enable
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
[ Upstream commit 2607133196c35f31892ee199ce7ffa717bea4ad1 ]
These clkdevs were unnecessary, because systems using this driver always look up clocks using the devicetree. And as Russell King points out[1], since the provided device name was truncated, lookups via clkdev would never match.
Recently, commit 8d532528ff6a ("clkdev: report over-sized strings when creating clkdev entries") caused clkdev registration to fail due to the truncation, and this now prevents the driver from probing. Fix the driver by removing the clkdev registration.
Link: https://lore.kernel.org/linux-clk/ZkfYqj+OcAxd9O2t@shell.armlinux.org.uk/ [1] Fixes: 30b8e27e3b58 ("clk: sifive: add a driver for the SiFive FU540 PRCI IP block") Fixes: 8d532528ff6a ("clkdev: report over-sized strings when creating clkdev entries") Reported-by: Guenter Roeck linux@roeck-us.net Closes: https://lore.kernel.org/linux-clk/7eda7621-0dde-4153-89e4-172e4c095d01@roeck... Suggested-by: Russell King linux@armlinux.org.uk Signed-off-by: Samuel Holland samuel.holland@sifive.com Link: https://lore.kernel.org/r/20240528001432.1200403-1-samuel.holland@sifive.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/sifive/sifive-prci.c | 8 -------- 1 file changed, 8 deletions(-)
diff --git a/drivers/clk/sifive/sifive-prci.c b/drivers/clk/sifive/sifive-prci.c index 25b8e1a80ddce..b32a59fe55e74 100644 --- a/drivers/clk/sifive/sifive-prci.c +++ b/drivers/clk/sifive/sifive-prci.c @@ -4,7 +4,6 @@ * Copyright (C) 2020 Zong Li */
-#include <linux/clkdev.h> #include <linux/delay.h> #include <linux/io.h> #include <linux/module.h> @@ -537,13 +536,6 @@ static int __prci_register_clocks(struct device *dev, struct __prci_data *pd, return r; }
- r = clk_hw_register_clkdev(&pic->hw, pic->name, dev_name(dev)); - if (r) { - dev_warn(dev, "Failed to register clkdev for %s: %d\n", - init.name, r); - return r; - } - pd->hw_clks.hws[i] = &pic->hw; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olga Kornievskaia kolga@netapp.com
[ Upstream commit 28568c906c1bb5f7560e18082ed7d6295860f1c2 ]
In commit 4ca9f31a2be66 ("NFSv4.1 test and add 4.1 trunking transport"), we introduce the ability to query the NFS server for possible trunking locations of the existing filesystem. However, we never checked the returned file system path for these alternative locations. According to the RFC, the server can say that the filesystem currently known under "fs_root" of fs_location also resides under these server locations under the following "rootpath" pathname. The client cannot handle trunking a filesystem that reside under different location under different paths other than what the main path is. This patch enforces the check that fs_root path and rootpath path in fs_location reply is the same.
Fixes: 4ca9f31a2be6 ("NFSv4.1 test and add 4.1 trunking transport") Signed-off-by: Olga Kornievskaia kolga@netapp.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/nfs4proc.c | 23 ++++++++++++++++++++++- 1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c index c93c12063b3af..3a816c4a6d5e2 100644 --- a/fs/nfs/nfs4proc.c +++ b/fs/nfs/nfs4proc.c @@ -4023,6 +4023,23 @@ static void test_fs_location_for_trunking(struct nfs4_fs_location *location, } }
+static bool _is_same_nfs4_pathname(struct nfs4_pathname *path1, + struct nfs4_pathname *path2) +{ + int i; + + if (path1->ncomponents != path2->ncomponents) + return false; + for (i = 0; i < path1->ncomponents; i++) { + if (path1->components[i].len != path2->components[i].len) + return false; + if (memcmp(path1->components[i].data, path2->components[i].data, + path1->components[i].len)) + return false; + } + return true; +} + static int _nfs4_discover_trunking(struct nfs_server *server, struct nfs_fh *fhandle) { @@ -4056,9 +4073,13 @@ static int _nfs4_discover_trunking(struct nfs_server *server, if (status) goto out_free_3;
- for (i = 0; i < locations->nlocations; i++) + for (i = 0; i < locations->nlocations; i++) { + if (!_is_same_nfs4_pathname(&locations->fs_path, + &locations->locations[i].rootpath)) + continue; test_fs_location_for_trunking(&locations->locations[i], clp, server); + } out_free_3: kfree(locations->fattr); out_free_2:
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Hanxiao chenhx.fnst@fujitsu.com
[ Upstream commit 33c94d7e3cb84f6d130678d6d59ba475a6c489cf ]
don't return 0 if snd_buf->len really greater than snd_buf->buflen
Signed-off-by: Chen Hanxiao chenhx.fnst@fujitsu.com Fixes: 0c77668ddb4e ("SUNRPC: Introduce trace points in rpc_auth_gss.ko") Reviewed-by: Benjamin Coddington bcodding@redhat.com Reviewed-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/auth_gss/auth_gss.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index c7af0220f82f4..369310909fc98 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -1875,8 +1875,10 @@ gss_wrap_req_priv(struct rpc_cred *cred, struct gss_cl_ctx *ctx, offset = (u8 *)p - (u8 *)snd_buf->head[0].iov_base; maj_stat = gss_wrap(ctx->gc_gss_ctx, offset, snd_buf, inpages); /* slack space should prevent this ever happening: */ - if (unlikely(snd_buf->len > snd_buf->buflen)) + if (unlikely(snd_buf->len > snd_buf->buflen)) { + status = -EIO; goto wrap_failed; + } /* We're assuming that when GSS_S_CONTEXT_EXPIRED, the encryption was * done anyway, so it's safe to put the request on the wire: */ if (maj_stat == GSS_S_CONTEXT_EXPIRED)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
[ Upstream commit 99bc9f2eb3f79a2b4296d9bf43153e1d10ca50d3 ]
dentry->d_fsdata is set to NFS_FSDATA_BLOCKED while unlinking or renaming-over a file to ensure that no open succeeds while the NFS operation progressed on the server.
Setting dentry->d_fsdata to NFS_FSDATA_BLOCKED is done under ->d_lock after checking the refcount is not elevated. Any attempt to open the file (through that name) will go through lookp_open() which will take ->d_lock while incrementing the refcount, we can be sure that once the new value is set, __nfs_lookup_revalidate() *will* see the new value and will block.
We don't have any locking guarantee that when we set ->d_fsdata to NULL, the wait_var_event() in __nfs_lookup_revalidate() will notice. wait/wake primitives do NOT provide barriers to guarantee order. We must use smp_load_acquire() in wait_var_event() to ensure we look at an up-to-date value, and must use smp_store_release() before wake_up_var().
This patch adds those barrier functions and factors out block_revalidate() and unblock_revalidate() far clarity.
There is also a hypothetical bug in that if memory allocation fails (which never happens in practice) we might leave ->d_fsdata locked. This patch adds the missing call to unblock_revalidate().
Reported-and-tested-by: Richard Kojedzinszky richard+debian+bugreport@kojedz.in Closes: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1071501 Fixes: 3c59366c207e ("NFS: don't unhash dentry during unlink/rename") Signed-off-by: NeilBrown neilb@suse.de Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfs/dir.c | 47 ++++++++++++++++++++++++++++++++--------------- 1 file changed, 32 insertions(+), 15 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index ac505671efbdb..bdd6cb33a3708 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -1802,9 +1802,10 @@ __nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags, if (parent != READ_ONCE(dentry->d_parent)) return -ECHILD; } else { - /* Wait for unlink to complete */ + /* Wait for unlink to complete - see unblock_revalidate() */ wait_var_event(&dentry->d_fsdata, - dentry->d_fsdata != NFS_FSDATA_BLOCKED); + smp_load_acquire(&dentry->d_fsdata) + != NFS_FSDATA_BLOCKED); parent = dget_parent(dentry); ret = reval(d_inode(parent), dentry, flags); dput(parent); @@ -1817,6 +1818,29 @@ static int nfs_lookup_revalidate(struct dentry *dentry, unsigned int flags) return __nfs_lookup_revalidate(dentry, flags, nfs_do_lookup_revalidate); }
+static void block_revalidate(struct dentry *dentry) +{ + /* old devname - just in case */ + kfree(dentry->d_fsdata); + + /* Any new reference that could lead to an open + * will take ->d_lock in lookup_open() -> d_lookup(). + * Holding this lock ensures we cannot race with + * __nfs_lookup_revalidate() and removes and need + * for further barriers. + */ + lockdep_assert_held(&dentry->d_lock); + + dentry->d_fsdata = NFS_FSDATA_BLOCKED; +} + +static void unblock_revalidate(struct dentry *dentry) +{ + /* store_release ensures wait_var_event() sees the update */ + smp_store_release(&dentry->d_fsdata, NULL); + wake_up_var(&dentry->d_fsdata); +} + /* * A weaker form of d_revalidate for revalidating just the d_inode(dentry) * when we don't really care about the dentry name. This is called when a @@ -2501,15 +2525,12 @@ int nfs_unlink(struct inode *dir, struct dentry *dentry) spin_unlock(&dentry->d_lock); goto out; } - /* old devname */ - kfree(dentry->d_fsdata); - dentry->d_fsdata = NFS_FSDATA_BLOCKED; + block_revalidate(dentry);
spin_unlock(&dentry->d_lock); error = nfs_safe_remove(dentry); nfs_dentry_remove_handle_error(dir, dentry, error); - dentry->d_fsdata = NULL; - wake_up_var(&dentry->d_fsdata); + unblock_revalidate(dentry); out: trace_nfs_unlink_exit(dir, dentry, error); return error; @@ -2616,8 +2637,7 @@ nfs_unblock_rename(struct rpc_task *task, struct nfs_renamedata *data) { struct dentry *new_dentry = data->new_dentry;
- new_dentry->d_fsdata = NULL; - wake_up_var(&new_dentry->d_fsdata); + unblock_revalidate(new_dentry); }
/* @@ -2679,11 +2699,6 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir, if (WARN_ON(new_dentry->d_flags & DCACHE_NFSFS_RENAMED) || WARN_ON(new_dentry->d_fsdata == NFS_FSDATA_BLOCKED)) goto out; - if (new_dentry->d_fsdata) { - /* old devname */ - kfree(new_dentry->d_fsdata); - new_dentry->d_fsdata = NULL; - }
spin_lock(&new_dentry->d_lock); if (d_count(new_dentry) > 2) { @@ -2705,7 +2720,7 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir, new_dentry = dentry; new_inode = NULL; } else { - new_dentry->d_fsdata = NFS_FSDATA_BLOCKED; + block_revalidate(new_dentry); must_unblock = true; spin_unlock(&new_dentry->d_lock); } @@ -2717,6 +2732,8 @@ int nfs_rename(struct mnt_idmap *idmap, struct inode *old_dir, task = nfs_async_rename(old_dir, new_dir, old_dentry, new_dentry, must_unblock ? nfs_unblock_rename : NULL); if (IS_ERR(task)) { + if (must_unblock) + unblock_revalidate(new_dentry); error = PTR_ERR(task); goto out; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masami Hiramatsu (Google) mhiramat@kernel.org
[ Upstream commit 0f42bdf59b4e428485aa922bef871bfa6cc505e0 ]
Commit eb50d0f250e9 ("selftests/ftrace: Choose target function for filter test from samples") choose the target function from samples, but sometimes this test failes randomly because the target function does not hit at the next time. So retry getting samples up to 10 times.
Fixes: eb50d0f250e9 ("selftests/ftrace: Choose target function for filter test from samples") Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../test.d/filter/event-filter-function.tc | 20 ++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc index 3f74c09c56b62..118247b8dd84d 100644 --- a/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc +++ b/tools/testing/selftests/ftrace/test.d/filter/event-filter-function.tc @@ -10,7 +10,6 @@ fail() { #msg }
sample_events() { - echo > trace echo 1 > events/kmem/kmem_cache_free/enable echo 1 > tracing_on ls > /dev/null @@ -22,6 +21,7 @@ echo 0 > tracing_on echo 0 > events/enable
echo "Get the most frequently calling function" +echo > trace sample_events
target_func=`cat trace | grep -o 'call_site=([^+]*)' | sed 's/call_site=//' | sort | uniq -c | sort | tail -n 1 | sed 's/^[ 0-9]*//'` @@ -32,7 +32,16 @@ echo > trace
echo "Test event filter function name" echo "call_site.function == $target_func" > events/kmem/kmem_cache_free/filter + +sample_events +max_retry=10 +while [ `grep kmem_cache_free trace| wc -l` -eq 0 ]; do sample_events +max_retry=$((max_retry - 1)) +if [ $max_retry -eq 0 ]; then + exit_fail +fi +done
hitcnt=`grep kmem_cache_free trace| grep $target_func | wc -l` misscnt=`grep kmem_cache_free trace| grep -v $target_func | wc -l` @@ -49,7 +58,16 @@ address=`grep " ${target_func}$" /proc/kallsyms | cut -d' ' -f1`
echo "Test event filter function address" echo "call_site.function == 0x$address" > events/kmem/kmem_cache_free/filter +echo > trace +sample_events +max_retry=10 +while [ `grep kmem_cache_free trace| wc -l` -eq 0 ]; do sample_events +max_retry=$((max_retry - 1)) +if [ $max_retry -eq 0 ]; then + exit_fail +fi +done
hitcnt=`grep kmem_cache_free trace| grep $target_func | wc -l` misscnt=`grep kmem_cache_free trace| grep -v $target_func | wc -l`
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: John Hubbard jhubbard@nvidia.com
[ Upstream commit 4bf15b1c657d22d1d70173e43264e4606dfe75ff ]
When building with clang, via:
make LLVM=1 -C tools/testing/selftests
...clang issues this warning:
futex_requeue_pi.c:403:17: warning: passing 'const char **' to parameter of type 'char **' discards qualifiers in nested pointer types [-Wincompatible-pointer-types-discards-qualifiers]
This warning fires because test_name is passed into asprintf(3), which then changes it.
Fix this by simply removing the const qualifier. This is a local automatic variable in a very short function, so there is not much need to use the compiler to enforce const-ness at this scope.
[1] https://lore.kernel.org/all/20240329-selftests-libmk-llvm-rfc-v1-1-2f9ed7d1c...
Fixes: f17d8a87ecb5 ("selftests: fuxex: Report a unique test name per run of futex_requeue_pi") Reviewed-by: Davidlohr Bueso dave@stgolabs.net Signed-off-by: John Hubbard jhubbard@nvidia.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/futex/functional/futex_requeue_pi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/futex/functional/futex_requeue_pi.c b/tools/testing/selftests/futex/functional/futex_requeue_pi.c index 7f3ca5c78df12..215c6cb539b4a 100644 --- a/tools/testing/selftests/futex/functional/futex_requeue_pi.c +++ b/tools/testing/selftests/futex/functional/futex_requeue_pi.c @@ -360,7 +360,7 @@ int unit_test(int broadcast, long lock, int third_party_owner, long timeout_ns)
int main(int argc, char *argv[]) { - const char *test_name; + char *test_name; int c, ret;
while ((c = getopt(argc, argv, "bchlot:v:")) != -1) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Weiwen Hu huweiwen@linux.alibaba.com
[ Upstream commit b1a1fdd7096dd2d67911b07f8118ff113d815db4 ]
Fix the parsing if extra status bits (e.g. MORE) is present.
Fixes: 7fb42780d06c ("nvme: Convert NVMe errors to PR errors") Signed-off-by: Weiwen Hu huweiwen@linux.alibaba.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pr.c b/drivers/nvme/host/pr.c index e05571b2a1b0c..8fa1ffcdaed48 100644 --- a/drivers/nvme/host/pr.c +++ b/drivers/nvme/host/pr.c @@ -77,7 +77,7 @@ static int nvme_sc_to_pr_err(int nvme_sc) if (nvme_is_path_error(nvme_sc)) return PR_STS_PATH_FAILED;
- switch (nvme_sc) { + switch (nvme_sc & 0x7ff) { case NVME_SC_SUCCESS: return PR_STS_SUCCESS; case NVME_SC_RESERVATION_CONFLICT:
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ni nichen@iscas.ac.cn
[ Upstream commit 629f2b4e05225e53125aaf7ff0b87d5d53897128 ]
Add check for the return value of of_drm_get_panel_orientation() and return the error if it fails in order to catch the error.
Fixes: b27c0f6d208d ("drm/panel: sitronix-st7789v: add panel orientation support") Signed-off-by: Chen Ni nichen@iscas.ac.cn Reviewed-by: Michael Riesch michael.riesch@wolfvision.net Acked-by: Jessica Zhang quic_jesszhan@quicinc.com Link: https://lore.kernel.org/r/20240528030832.2529471-1-nichen@iscas.ac.cn Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240528030832.2529471-1-niche... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-sitronix-st7789v.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panel/panel-sitronix-st7789v.c b/drivers/gpu/drm/panel/panel-sitronix-st7789v.c index e8f385b9c6182..28bfc48a91272 100644 --- a/drivers/gpu/drm/panel/panel-sitronix-st7789v.c +++ b/drivers/gpu/drm/panel/panel-sitronix-st7789v.c @@ -643,7 +643,9 @@ static int st7789v_probe(struct spi_device *spi) if (ret) return dev_err_probe(dev, ret, "Failed to get backlight\n");
- of_drm_get_panel_orientation(spi->dev.of_node, &ctx->orientation); + ret = of_drm_get_panel_orientation(spi->dev.of_node, &ctx->orientation); + if (ret) + return dev_err_probe(&spi->dev, ret, "Failed to get orientation\n");
drm_panel_add(&ctx->panel);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Armin Wolf W_Armin@gmx.de
[ Upstream commit 1981b296f858010eae409548fd297659b2cc570e ]
When reading token data from sysfs on my Inspiron 3505, the token locations and values are wrong. This happens because match_attribute() blindly assumes that all entries in da_tokens have an associated entry in token_attrs.
This however is not true as soon as da_tokens[] contains zeroed token entries. Those entries are being skipped when initialising token_attrs, breaking the core assumption of match_attribute().
Fix this by defining an extra struct for each pair of token attributes and use container_of() to retrieve token information.
Tested on a Dell Inspiron 3050.
Fixes: 33b9ca1e53b4 ("platform/x86: dell-smbios: Add a sysfs interface for SMBIOS tokens") Signed-off-by: Armin Wolf W_Armin@gmx.de Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Link: https://lore.kernel.org/r/20240528204903.445546-1-W_Armin@gmx.de Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/dell/dell-smbios-base.c | 92 ++++++++------------ 1 file changed, 36 insertions(+), 56 deletions(-)
diff --git a/drivers/platform/x86/dell/dell-smbios-base.c b/drivers/platform/x86/dell/dell-smbios-base.c index e61bfaf8b5c48..86b95206cb1bd 100644 --- a/drivers/platform/x86/dell/dell-smbios-base.c +++ b/drivers/platform/x86/dell/dell-smbios-base.c @@ -11,6 +11,7 @@ */ #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+#include <linux/container_of.h> #include <linux/kernel.h> #include <linux/module.h> #include <linux/capability.h> @@ -25,11 +26,16 @@ static u32 da_supported_commands; static int da_num_tokens; static struct platform_device *platform_device; static struct calling_interface_token *da_tokens; -static struct device_attribute *token_location_attrs; -static struct device_attribute *token_value_attrs; +static struct token_sysfs_data *token_entries; static struct attribute **token_attrs; static DEFINE_MUTEX(smbios_mutex);
+struct token_sysfs_data { + struct device_attribute location_attr; + struct device_attribute value_attr; + struct calling_interface_token *token; +}; + struct smbios_device { struct list_head list; struct device *device; @@ -416,47 +422,26 @@ static void __init find_tokens(const struct dmi_header *dm, void *dummy) } }
-static int match_attribute(struct device *dev, - struct device_attribute *attr) -{ - int i; - - for (i = 0; i < da_num_tokens * 2; i++) { - if (!token_attrs[i]) - continue; - if (strcmp(token_attrs[i]->name, attr->attr.name) == 0) - return i/2; - } - dev_dbg(dev, "couldn't match: %s\n", attr->attr.name); - return -EINVAL; -} - static ssize_t location_show(struct device *dev, struct device_attribute *attr, char *buf) { - int i; + struct token_sysfs_data *data = container_of(attr, struct token_sysfs_data, location_attr);
if (!capable(CAP_SYS_ADMIN)) return -EPERM;
- i = match_attribute(dev, attr); - if (i > 0) - return sysfs_emit(buf, "%08x", da_tokens[i].location); - return 0; + return sysfs_emit(buf, "%08x", data->token->location); }
static ssize_t value_show(struct device *dev, struct device_attribute *attr, char *buf) { - int i; + struct token_sysfs_data *data = container_of(attr, struct token_sysfs_data, value_attr);
if (!capable(CAP_SYS_ADMIN)) return -EPERM;
- i = match_attribute(dev, attr); - if (i > 0) - return sysfs_emit(buf, "%08x", da_tokens[i].value); - return 0; + return sysfs_emit(buf, "%08x", data->token->value); }
static struct attribute_group smbios_attribute_group = { @@ -473,22 +458,15 @@ static int build_tokens_sysfs(struct platform_device *dev) { char *location_name; char *value_name; - size_t size; int ret; int i, j;
- /* (number of tokens + 1 for null terminated */ - size = sizeof(struct device_attribute) * (da_num_tokens + 1); - token_location_attrs = kzalloc(size, GFP_KERNEL); - if (!token_location_attrs) + token_entries = kcalloc(da_num_tokens, sizeof(*token_entries), GFP_KERNEL); + if (!token_entries) return -ENOMEM; - token_value_attrs = kzalloc(size, GFP_KERNEL); - if (!token_value_attrs) - goto out_allocate_value;
/* need to store both location and value + terminator*/ - size = sizeof(struct attribute *) * ((2 * da_num_tokens) + 1); - token_attrs = kzalloc(size, GFP_KERNEL); + token_attrs = kcalloc((2 * da_num_tokens) + 1, sizeof(*token_attrs), GFP_KERNEL); if (!token_attrs) goto out_allocate_attrs;
@@ -496,27 +474,32 @@ static int build_tokens_sysfs(struct platform_device *dev) /* skip empty */ if (da_tokens[i].tokenID == 0) continue; + + token_entries[i].token = &da_tokens[i]; + /* add location */ location_name = kasprintf(GFP_KERNEL, "%04x_location", da_tokens[i].tokenID); if (location_name == NULL) goto out_unwind_strings; - sysfs_attr_init(&token_location_attrs[i].attr); - token_location_attrs[i].attr.name = location_name; - token_location_attrs[i].attr.mode = 0444; - token_location_attrs[i].show = location_show; - token_attrs[j++] = &token_location_attrs[i].attr; + + sysfs_attr_init(&token_entries[i].location_attr.attr); + token_entries[i].location_attr.attr.name = location_name; + token_entries[i].location_attr.attr.mode = 0444; + token_entries[i].location_attr.show = location_show; + token_attrs[j++] = &token_entries[i].location_attr.attr;
/* add value */ value_name = kasprintf(GFP_KERNEL, "%04x_value", da_tokens[i].tokenID); if (value_name == NULL) goto loop_fail_create_value; - sysfs_attr_init(&token_value_attrs[i].attr); - token_value_attrs[i].attr.name = value_name; - token_value_attrs[i].attr.mode = 0444; - token_value_attrs[i].show = value_show; - token_attrs[j++] = &token_value_attrs[i].attr; + + sysfs_attr_init(&token_entries[i].value_attr.attr); + token_entries[i].value_attr.attr.name = value_name; + token_entries[i].value_attr.attr.mode = 0444; + token_entries[i].value_attr.show = value_show; + token_attrs[j++] = &token_entries[i].value_attr.attr; continue;
loop_fail_create_value: @@ -532,14 +515,12 @@ static int build_tokens_sysfs(struct platform_device *dev)
out_unwind_strings: while (i--) { - kfree(token_location_attrs[i].attr.name); - kfree(token_value_attrs[i].attr.name); + kfree(token_entries[i].location_attr.attr.name); + kfree(token_entries[i].value_attr.attr.name); } kfree(token_attrs); out_allocate_attrs: - kfree(token_value_attrs); -out_allocate_value: - kfree(token_location_attrs); + kfree(token_entries);
return -ENOMEM; } @@ -551,12 +532,11 @@ static void free_group(struct platform_device *pdev) sysfs_remove_group(&pdev->dev.kobj, &smbios_attribute_group); for (i = 0; i < da_num_tokens; i++) { - kfree(token_location_attrs[i].attr.name); - kfree(token_value_attrs[i].attr.name); + kfree(token_entries[i].location_attr.attr.name); + kfree(token_entries[i].value_attr.attr.name); } kfree(token_attrs); - kfree(token_value_attrs); - kfree(token_location_attrs); + kfree(token_entries); }
static int __init dell_smbios_init(void)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gregor Herburger gregor.herburger@tq-group.com
[ Upstream commit 8c219e52ca4d9a67cd6a7074e91bf29b55edc075 ]
Fix description for GPIO_TQMX86 from QTMX86 to TQMx86.
Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Gregor Herburger gregor.herburger@tq-group.com Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/e0e38c9944ad6d281d9a662a45d289b88edc808e.171706399... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig index b50d0b4708497..cbfcfefdb5da1 100644 --- a/drivers/gpio/Kconfig +++ b/drivers/gpio/Kconfig @@ -1549,7 +1549,7 @@ config GPIO_TPS68470 are "output only" GPIOs.
config GPIO_TQMX86 - tristate "TQ-Systems QTMX86 GPIO" + tristate "TQ-Systems TQMx86 GPIO" depends on MFD_TQMX86 || COMPILE_TEST depends on HAS_IOPORT_MAP select GPIOLIB_IRQCHIP
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthias Schiffer matthias.schiffer@ew.tq-group.com
[ Upstream commit 9d6a811b522ba558bcb4ec01d12e72a0af8e9f6e ]
The TQMx86 GPIO controller uses the same register address for input and output data. Reading the register will always return current inputs rather than the previously set outputs (regardless of the current direction setting). Therefore, using a RMW pattern does not make sense when setting output values. Instead, the previously set output register value needs to be stored as a shadow register.
As there is no reliable way to get the current output values from the hardware, also initialize all channels to 0, to ensure that stored and actual output values match. This should usually not have any effect in practise, as the TQMx86 UEFI sets all outputs to 0 during boot.
Also prepare for extension of the driver to more than 8 GPIOs by using DECLARE_BITMAP.
Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/d0555933becd45fa92a85675d26e4d59343ddc01.171706399... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-tqmx86.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-)
diff --git a/drivers/gpio/gpio-tqmx86.c b/drivers/gpio/gpio-tqmx86.c index 3a28c1f273c39..b7e2dbbdc4ebe 100644 --- a/drivers/gpio/gpio-tqmx86.c +++ b/drivers/gpio/gpio-tqmx86.c @@ -6,6 +6,7 @@ * Vadim V.Vlasov vvlasov@dev.rtsoft.ru */
+#include <linux/bitmap.h> #include <linux/bitops.h> #include <linux/errno.h> #include <linux/gpio/driver.h> @@ -38,6 +39,7 @@ struct tqmx86_gpio_data { void __iomem *io_base; int irq; raw_spinlock_t spinlock; + DECLARE_BITMAP(output, TQMX86_NGPIO); u8 irq_type[TQMX86_NGPI]; };
@@ -64,15 +66,10 @@ static void tqmx86_gpio_set(struct gpio_chip *chip, unsigned int offset, { struct tqmx86_gpio_data *gpio = gpiochip_get_data(chip); unsigned long flags; - u8 val;
raw_spin_lock_irqsave(&gpio->spinlock, flags); - val = tqmx86_gpio_read(gpio, TQMX86_GPIOD); - if (value) - val |= BIT(offset); - else - val &= ~BIT(offset); - tqmx86_gpio_write(gpio, val, TQMX86_GPIOD); + __assign_bit(offset, gpio->output, value); + tqmx86_gpio_write(gpio, bitmap_get_value8(gpio->output, 0), TQMX86_GPIOD); raw_spin_unlock_irqrestore(&gpio->spinlock, flags); }
@@ -277,6 +274,13 @@ static int tqmx86_gpio_probe(struct platform_device *pdev)
tqmx86_gpio_write(gpio, (u8)~TQMX86_DIR_INPUT_MASK, TQMX86_GPIODD);
+ /* + * Reading the previous output state is not possible with TQMx86 hardware. + * Initialize all outputs to 0 to have a defined state that matches the + * shadow register. + */ + tqmx86_gpio_write(gpio, 0, TQMX86_GPIOD); + chip = &gpio->chip; chip->label = "gpio-tqmx86"; chip->owner = THIS_MODULE;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthias Schiffer matthias.schiffer@ew.tq-group.com
[ Upstream commit 08af509efdf8dad08e972b48de0e2c2a7919ea8b ]
irq_set_type() should not implicitly unmask the IRQ.
All accesses to the interrupt configuration register are moved to a new helper tqmx86_gpio_irq_config(). We also introduce the new rule that accessing irq_type must happen while locked, which will become significant for fixing EDGE_BOTH handling.
Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Link: https://lore.kernel.org/r/6aa4f207f77cb58ef64ffb947e91949b0f753ccd.171706399... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-tqmx86.c | 48 ++++++++++++++++++++++---------------- 1 file changed, 28 insertions(+), 20 deletions(-)
diff --git a/drivers/gpio/gpio-tqmx86.c b/drivers/gpio/gpio-tqmx86.c index b7e2dbbdc4ebe..7e428c872a257 100644 --- a/drivers/gpio/gpio-tqmx86.c +++ b/drivers/gpio/gpio-tqmx86.c @@ -29,15 +29,19 @@ #define TQMX86_GPIIC 3 /* GPI Interrupt Configuration Register */ #define TQMX86_GPIIS 4 /* GPI Interrupt Status Register */
+#define TQMX86_GPII_NONE 0 #define TQMX86_GPII_FALLING BIT(0) #define TQMX86_GPII_RISING BIT(1) #define TQMX86_GPII_MASK (BIT(0) | BIT(1)) #define TQMX86_GPII_BITS 2 +/* Stored in irq_type with GPII bits */ +#define TQMX86_INT_UNMASKED BIT(2)
struct tqmx86_gpio_data { struct gpio_chip chip; void __iomem *io_base; int irq; + /* Lock must be held for accessing output and irq_type fields */ raw_spinlock_t spinlock; DECLARE_BITMAP(output, TQMX86_NGPIO); u8 irq_type[TQMX86_NGPI]; @@ -104,21 +108,32 @@ static int tqmx86_gpio_get_direction(struct gpio_chip *chip, return GPIO_LINE_DIRECTION_OUT; }
+static void tqmx86_gpio_irq_config(struct tqmx86_gpio_data *gpio, int offset) + __must_hold(&gpio->spinlock) +{ + u8 type = TQMX86_GPII_NONE, gpiic; + + if (gpio->irq_type[offset] & TQMX86_INT_UNMASKED) + type = gpio->irq_type[offset] & TQMX86_GPII_MASK; + + gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); + gpiic &= ~(TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS)); + gpiic |= type << (offset * TQMX86_GPII_BITS); + tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); +} + static void tqmx86_gpio_irq_mask(struct irq_data *data) { unsigned int offset = (data->hwirq - TQMX86_NGPO); struct tqmx86_gpio_data *gpio = gpiochip_get_data( irq_data_get_irq_chip_data(data)); unsigned long flags; - u8 gpiic, mask; - - mask = TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS);
raw_spin_lock_irqsave(&gpio->spinlock, flags); - gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); - gpiic &= ~mask; - tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); + gpio->irq_type[offset] &= ~TQMX86_INT_UNMASKED; + tqmx86_gpio_irq_config(gpio, offset); raw_spin_unlock_irqrestore(&gpio->spinlock, flags); + gpiochip_disable_irq(&gpio->chip, irqd_to_hwirq(data)); }
@@ -128,16 +143,12 @@ static void tqmx86_gpio_irq_unmask(struct irq_data *data) struct tqmx86_gpio_data *gpio = gpiochip_get_data( irq_data_get_irq_chip_data(data)); unsigned long flags; - u8 gpiic, mask; - - mask = TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS);
gpiochip_enable_irq(&gpio->chip, irqd_to_hwirq(data)); + raw_spin_lock_irqsave(&gpio->spinlock, flags); - gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); - gpiic &= ~mask; - gpiic |= gpio->irq_type[offset] << (offset * TQMX86_GPII_BITS); - tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); + gpio->irq_type[offset] |= TQMX86_INT_UNMASKED; + tqmx86_gpio_irq_config(gpio, offset); raw_spin_unlock_irqrestore(&gpio->spinlock, flags); }
@@ -148,7 +159,7 @@ static int tqmx86_gpio_irq_set_type(struct irq_data *data, unsigned int type) unsigned int offset = (data->hwirq - TQMX86_NGPO); unsigned int edge_type = type & IRQF_TRIGGER_MASK; unsigned long flags; - u8 new_type, gpiic; + u8 new_type;
switch (edge_type) { case IRQ_TYPE_EDGE_RISING: @@ -164,13 +175,10 @@ static int tqmx86_gpio_irq_set_type(struct irq_data *data, unsigned int type) return -EINVAL; /* not supported */ }
- gpio->irq_type[offset] = new_type; - raw_spin_lock_irqsave(&gpio->spinlock, flags); - gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); - gpiic &= ~((TQMX86_GPII_MASK) << (offset * TQMX86_GPII_BITS)); - gpiic |= new_type << (offset * TQMX86_GPII_BITS); - tqmx86_gpio_write(gpio, gpiic, TQMX86_GPIIC); + gpio->irq_type[offset] &= ~TQMX86_GPII_MASK; + gpio->irq_type[offset] |= new_type; + tqmx86_gpio_irq_config(gpio, offset); raw_spin_unlock_irqrestore(&gpio->spinlock, flags);
return 0;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthias Schiffer matthias.schiffer@ew.tq-group.com
[ Upstream commit 90dd7de4ef7ba584823dfbeba834c2919a4bb55b ]
The TQMx86 GPIO controller only supports falling and rising edge triggers, but not both. Fix this by implementing a software both-edge mode that toggles the edge type after every interrupt.
Fixes: b868db94a6a7 ("gpio: tqmx86: Add GPIO from for this IO controller") Co-developed-by: Gregor Herburger gregor.herburger@tq-group.com Signed-off-by: Gregor Herburger gregor.herburger@tq-group.com Signed-off-by: Matthias Schiffer matthias.schiffer@ew.tq-group.com Link: https://lore.kernel.org/r/515324f0491c4d44f4ef49f170354aca002d81ef.171706399... Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-tqmx86.c | 46 ++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-)
diff --git a/drivers/gpio/gpio-tqmx86.c b/drivers/gpio/gpio-tqmx86.c index 7e428c872a257..f2e7e8754d95d 100644 --- a/drivers/gpio/gpio-tqmx86.c +++ b/drivers/gpio/gpio-tqmx86.c @@ -32,6 +32,10 @@ #define TQMX86_GPII_NONE 0 #define TQMX86_GPII_FALLING BIT(0) #define TQMX86_GPII_RISING BIT(1) +/* Stored in irq_type as a trigger type, but not actually valid as a register + * value, so the name doesn't use "GPII" + */ +#define TQMX86_INT_BOTH (BIT(0) | BIT(1)) #define TQMX86_GPII_MASK (BIT(0) | BIT(1)) #define TQMX86_GPII_BITS 2 /* Stored in irq_type with GPII bits */ @@ -113,9 +117,15 @@ static void tqmx86_gpio_irq_config(struct tqmx86_gpio_data *gpio, int offset) { u8 type = TQMX86_GPII_NONE, gpiic;
- if (gpio->irq_type[offset] & TQMX86_INT_UNMASKED) + if (gpio->irq_type[offset] & TQMX86_INT_UNMASKED) { type = gpio->irq_type[offset] & TQMX86_GPII_MASK;
+ if (type == TQMX86_INT_BOTH) + type = tqmx86_gpio_get(&gpio->chip, offset + TQMX86_NGPO) + ? TQMX86_GPII_FALLING + : TQMX86_GPII_RISING; + } + gpiic = tqmx86_gpio_read(gpio, TQMX86_GPIIC); gpiic &= ~(TQMX86_GPII_MASK << (offset * TQMX86_GPII_BITS)); gpiic |= type << (offset * TQMX86_GPII_BITS); @@ -169,7 +179,7 @@ static int tqmx86_gpio_irq_set_type(struct irq_data *data, unsigned int type) new_type = TQMX86_GPII_FALLING; break; case IRQ_TYPE_EDGE_BOTH: - new_type = TQMX86_GPII_FALLING | TQMX86_GPII_RISING; + new_type = TQMX86_INT_BOTH; break; default: return -EINVAL; /* not supported */ @@ -189,8 +199,8 @@ static void tqmx86_gpio_irq_handler(struct irq_desc *desc) struct gpio_chip *chip = irq_desc_get_handler_data(desc); struct tqmx86_gpio_data *gpio = gpiochip_get_data(chip); struct irq_chip *irq_chip = irq_desc_get_chip(desc); - unsigned long irq_bits; - int i = 0; + unsigned long irq_bits, flags; + int i; u8 irq_status;
chained_irq_enter(irq_chip, desc); @@ -199,6 +209,34 @@ static void tqmx86_gpio_irq_handler(struct irq_desc *desc) tqmx86_gpio_write(gpio, irq_status, TQMX86_GPIIS);
irq_bits = irq_status; + + raw_spin_lock_irqsave(&gpio->spinlock, flags); + for_each_set_bit(i, &irq_bits, TQMX86_NGPI) { + /* + * Edge-both triggers are implemented by flipping the edge + * trigger after each interrupt, as the controller only supports + * either rising or falling edge triggers, but not both. + * + * Internally, the TQMx86 GPIO controller has separate status + * registers for rising and falling edge interrupts. GPIIC + * configures which bits from which register are visible in the + * interrupt status register GPIIS and defines what triggers the + * parent IRQ line. Writing to GPIIS always clears both rising + * and falling interrupt flags internally, regardless of the + * currently configured trigger. + * + * In consequence, we can cleanly implement the edge-both + * trigger in software by first clearing the interrupt and then + * setting the new trigger based on the current GPIO input in + * tqmx86_gpio_irq_config() - even if an edge arrives between + * reading the input and setting the trigger, we will have a new + * interrupt pending. + */ + if ((gpio->irq_type[i] & TQMX86_GPII_MASK) == TQMX86_INT_BOTH) + tqmx86_gpio_irq_config(gpio, i); + } + raw_spin_unlock_irqrestore(&gpio->spinlock, flags); + for_each_set_bit(i, &irq_bits, TQMX86_NGPI) generic_handle_domain_irq(gpio->chip.irq.domain, i + TQMX86_NGPO);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikita Zhandarovich n.zhandarovich@fintech.ru
[ Upstream commit 4aa2dcfbad538adf7becd0034a3754e1bd01b2b5 ]
Syzkaller hit a warning [1] in a call to implement() when trying to write a value into a field of smaller size in an output report.
Since implement() already has a warn message printed out with the help of hid_warn() and value in question gets trimmed with: ... value &= m; ... WARN_ON may be considered superfluous. Remove it to suppress future syzkaller triggers.
[1] WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 implement drivers/hid/hid-core.c:1451 [inline] WARNING: CPU: 0 PID: 5084 at drivers/hid/hid-core.c:1451 hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863 Modules linked in: CPU: 0 PID: 5084 Comm: syz-executor424 Not tainted 6.9.0-rc7-syzkaller-00183-gcf87f46fd34d #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:implement drivers/hid/hid-core.c:1451 [inline] RIP: 0010:hid_output_report+0x548/0x760 drivers/hid/hid-core.c:1863 ... Call Trace: <TASK> __usbhid_submit_report drivers/hid/usbhid/hid-core.c:591 [inline] usbhid_submit_report+0x43d/0x9e0 drivers/hid/usbhid/hid-core.c:636 hiddev_ioctl+0x138b/0x1f00 drivers/hid/usbhid/hiddev.c:726 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:904 [inline] __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:890 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f ...
Fixes: 95d1c8951e5b ("HID: simplify implement() a bit") Reported-by: syzbot+5186630949e3c55f0799@syzkaller.appspotmail.com Suggested-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Nikita Zhandarovich n.zhandarovich@fintech.ru Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-core.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/hid/hid-core.c b/drivers/hid/hid-core.c index de7a477d66656..751a73ae7de75 100644 --- a/drivers/hid/hid-core.c +++ b/drivers/hid/hid-core.c @@ -1448,7 +1448,6 @@ static void implement(const struct hid_device *hid, u8 *report, hid_warn(hid, "%s() called with too large value %d (n: %d)! (%s)\n", __func__, value, n, current->comm); - WARN_ON(1); value &= m; } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kun(llfl) llfl@linux.alibaba.com
[ Upstream commit a295ec52c8624883885396fde7b4df1a179627c3 ]
During the iommu initialization, iommu_init_pci() adds sysfs nodes. However, these nodes aren't remove in free_iommu_resources() subsequently.
Fixes: 39ab9555c241 ("iommu: Add sysfs bindings for struct iommu_device") Signed-off-by: Kun(llfl) llfl@linux.alibaba.com Reviewed-by: Suravee Suthikulpanit suravee.suthikulpanit@amd.com Link: https://lore.kernel.org/r/c8e0d11c6ab1ee48299c288009cf9c5dae07b42d.171521500... Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/amd/init.c | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c index f440ca440d924..e740dc54c4685 100644 --- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -1678,8 +1678,17 @@ static void __init free_pci_segments(void) } }
+static void __init free_sysfs(struct amd_iommu *iommu) +{ + if (iommu->iommu.dev) { + iommu_device_unregister(&iommu->iommu); + iommu_device_sysfs_remove(&iommu->iommu); + } +} + static void __init free_iommu_one(struct amd_iommu *iommu) { + free_sysfs(iommu); free_cwwb_sem(iommu); free_command_buffer(iommu); free_event_buffer(iommu);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lu Baolu baolu.lu@linux.intel.com
[ Upstream commit 89e8a2366e3bce584b6c01549d5019c5cda1205e ]
iommu_sva_bind_device() should return either a sva bond handle or an ERR_PTR value in error cases. Existing drivers (idxd and uacce) only check the return value with IS_ERR(). This could potentially lead to a kernel NULL pointer dereference issue if the function returns NULL instead of an error pointer.
In reality, this doesn't cause any problems because iommu_sva_bind_device() only returns NULL when the kernel is not configured with CONFIG_IOMMU_SVA. In this case, iommu_dev_enable_feature(dev, IOMMU_DEV_FEAT_SVA) will return an error, and the device drivers won't call iommu_sva_bind_device() at all.
Fixes: 26b25a2b98e4 ("iommu: Bind process address spaces to devices") Signed-off-by: Lu Baolu baolu.lu@linux.intel.com Reviewed-by: Jean-Philippe Brucker jean-philippe@linaro.org Reviewed-by: Kevin Tian kevin.tian@intel.com Reviewed-by: Vasant Hegde vasant.hegde@amd.com Link: https://lore.kernel.org/r/20240528042528.71396-1-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/iommu.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/iommu.h b/include/linux/iommu.h index 2e925b5eba534..3b67d59a36bf9 100644 --- a/include/linux/iommu.h +++ b/include/linux/iommu.h @@ -1537,7 +1537,7 @@ struct iommu_domain *iommu_sva_domain_alloc(struct device *dev, static inline struct iommu_sva * iommu_sva_bind_device(struct device *dev, struct mm_struct *mm) { - return NULL; + return ERR_PTR(-ENODEV); }
static inline void iommu_sva_unbind_device(struct iommu_sva *handle)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
[ Upstream commit 8a565304927fbd28c9f028c492b5c1714002cbab ]
Utilize set_bit() and test_bit() on worker->flags within io_uring/io-wq to address potential data races.
The structure io_worker->flags may be accessed through various data paths, leading to concurrency issues. When KCSAN is enabled, it reveals data races occurring in io_worker_handle_work and io_wq_activate_free_worker functions.
BUG: KCSAN: data-race in io_worker_handle_work / io_wq_activate_free_worker write to 0xffff8885c4246404 of 4 bytes by task 49071 on cpu 28: io_worker_handle_work (io_uring/io-wq.c:434 io_uring/io-wq.c:569) io_wq_worker (io_uring/io-wq.c:?) <snip>
read to 0xffff8885c4246404 of 4 bytes by task 49024 on cpu 5: io_wq_activate_free_worker (io_uring/io-wq.c:? io_uring/io-wq.c:285) io_wq_enqueue (io_uring/io-wq.c:947) io_queue_iowq (io_uring/io_uring.c:524) io_req_task_submit (io_uring/io_uring.c:1511) io_handle_tw_list (io_uring/io_uring.c:1198) <snip>
Line numbers against commit 18daea77cca6 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm").
These races involve writes and reads to the same memory location by different tasks running on different CPUs. To mitigate this, refactor the code to use atomic operations such as set_bit(), test_bit(), and clear_bit() instead of basic "and" and "or" operations. This ensures thread-safe manipulation of worker flags.
Also, move `create_index` to avoid holes in the structure.
Signed-off-by: Breno Leitao leitao@debian.org Link: https://lore.kernel.org/r/20240507170002.2269003-1-leitao@debian.org Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 91215f70ea85 ("io_uring/io-wq: avoid garbage value of 'match' in io_wq_enqueue()") Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/io-wq.c | 47 ++++++++++++++++++++++++----------------------- 1 file changed, 24 insertions(+), 23 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c index 318ed067dbf64..4a07742349048 100644 --- a/io_uring/io-wq.c +++ b/io_uring/io-wq.c @@ -25,10 +25,10 @@ #define WORKER_IDLE_TIMEOUT (5 * HZ)
enum { - IO_WORKER_F_UP = 1, /* up and active */ - IO_WORKER_F_RUNNING = 2, /* account as running */ - IO_WORKER_F_FREE = 4, /* worker on free list */ - IO_WORKER_F_BOUND = 8, /* is doing bounded work */ + IO_WORKER_F_UP = 0, /* up and active */ + IO_WORKER_F_RUNNING = 1, /* account as running */ + IO_WORKER_F_FREE = 2, /* worker on free list */ + IO_WORKER_F_BOUND = 3, /* is doing bounded work */ };
enum { @@ -44,7 +44,8 @@ enum { */ struct io_worker { refcount_t ref; - unsigned flags; + int create_index; + unsigned long flags; struct hlist_nulls_node nulls_node; struct list_head all_list; struct task_struct *task; @@ -58,7 +59,6 @@ struct io_worker {
unsigned long create_state; struct callback_head create_work; - int create_index;
union { struct rcu_head rcu; @@ -165,7 +165,7 @@ static inline struct io_wq_acct *io_work_get_acct(struct io_wq *wq,
static inline struct io_wq_acct *io_wq_get_acct(struct io_worker *worker) { - return io_get_acct(worker->wq, worker->flags & IO_WORKER_F_BOUND); + return io_get_acct(worker->wq, test_bit(IO_WORKER_F_BOUND, &worker->flags)); }
static void io_worker_ref_put(struct io_wq *wq) @@ -225,7 +225,7 @@ static void io_worker_exit(struct io_worker *worker) wait_for_completion(&worker->ref_done);
raw_spin_lock(&wq->lock); - if (worker->flags & IO_WORKER_F_FREE) + if (test_bit(IO_WORKER_F_FREE, &worker->flags)) hlist_nulls_del_rcu(&worker->nulls_node); list_del_rcu(&worker->all_list); raw_spin_unlock(&wq->lock); @@ -410,7 +410,7 @@ static void io_wq_dec_running(struct io_worker *worker) struct io_wq_acct *acct = io_wq_get_acct(worker); struct io_wq *wq = worker->wq;
- if (!(worker->flags & IO_WORKER_F_UP)) + if (!test_bit(IO_WORKER_F_UP, &worker->flags)) return;
if (!atomic_dec_and_test(&acct->nr_running)) @@ -430,8 +430,8 @@ static void io_wq_dec_running(struct io_worker *worker) */ static void __io_worker_busy(struct io_wq *wq, struct io_worker *worker) { - if (worker->flags & IO_WORKER_F_FREE) { - worker->flags &= ~IO_WORKER_F_FREE; + if (test_bit(IO_WORKER_F_FREE, &worker->flags)) { + clear_bit(IO_WORKER_F_FREE, &worker->flags); raw_spin_lock(&wq->lock); hlist_nulls_del_init_rcu(&worker->nulls_node); raw_spin_unlock(&wq->lock); @@ -444,8 +444,8 @@ static void __io_worker_busy(struct io_wq *wq, struct io_worker *worker) static void __io_worker_idle(struct io_wq *wq, struct io_worker *worker) __must_hold(wq->lock) { - if (!(worker->flags & IO_WORKER_F_FREE)) { - worker->flags |= IO_WORKER_F_FREE; + if (!test_bit(IO_WORKER_F_FREE, &worker->flags)) { + set_bit(IO_WORKER_F_FREE, &worker->flags); hlist_nulls_add_head_rcu(&worker->nulls_node, &wq->free_list); } } @@ -634,7 +634,8 @@ static int io_wq_worker(void *data) bool exit_mask = false, last_timeout = false; char buf[TASK_COMM_LEN];
- worker->flags |= (IO_WORKER_F_UP | IO_WORKER_F_RUNNING); + set_mask_bits(&worker->flags, 0, + BIT(IO_WORKER_F_UP) | BIT(IO_WORKER_F_RUNNING));
snprintf(buf, sizeof(buf), "iou-wrk-%d", wq->task->pid); set_task_comm(current, buf); @@ -698,11 +699,11 @@ void io_wq_worker_running(struct task_struct *tsk)
if (!worker) return; - if (!(worker->flags & IO_WORKER_F_UP)) + if (!test_bit(IO_WORKER_F_UP, &worker->flags)) return; - if (worker->flags & IO_WORKER_F_RUNNING) + if (test_bit(IO_WORKER_F_RUNNING, &worker->flags)) return; - worker->flags |= IO_WORKER_F_RUNNING; + set_bit(IO_WORKER_F_RUNNING, &worker->flags); io_wq_inc_running(worker); }
@@ -716,12 +717,12 @@ void io_wq_worker_sleeping(struct task_struct *tsk)
if (!worker) return; - if (!(worker->flags & IO_WORKER_F_UP)) + if (!test_bit(IO_WORKER_F_UP, &worker->flags)) return; - if (!(worker->flags & IO_WORKER_F_RUNNING)) + if (!test_bit(IO_WORKER_F_RUNNING, &worker->flags)) return;
- worker->flags &= ~IO_WORKER_F_RUNNING; + clear_bit(IO_WORKER_F_RUNNING, &worker->flags); io_wq_dec_running(worker); }
@@ -735,7 +736,7 @@ static void io_init_new_worker(struct io_wq *wq, struct io_worker *worker, raw_spin_lock(&wq->lock); hlist_nulls_add_head_rcu(&worker->nulls_node, &wq->free_list); list_add_tail_rcu(&worker->all_list, &wq->all_list); - worker->flags |= IO_WORKER_F_FREE; + set_bit(IO_WORKER_F_FREE, &worker->flags); raw_spin_unlock(&wq->lock); wake_up_new_task(tsk); } @@ -841,7 +842,7 @@ static bool create_io_worker(struct io_wq *wq, int index) init_completion(&worker->ref_done);
if (index == IO_WQ_ACCT_BOUND) - worker->flags |= IO_WORKER_F_BOUND; + set_bit(IO_WORKER_F_BOUND, &worker->flags);
tsk = create_io_thread(io_wq_worker, worker, NUMA_NO_NODE); if (!IS_ERR(tsk)) { @@ -927,8 +928,8 @@ static bool io_wq_work_match_item(struct io_wq_work *work, void *data) void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work) { struct io_wq_acct *acct = io_work_get_acct(wq, work); + unsigned long work_flags = work->flags; struct io_cb_cancel_data match; - unsigned work_flags = work->flags; bool do_create;
/*
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Hui suhui@nfschina.com
[ Upstream commit 91215f70ea8541e9011c0b48f8b59b9e0ce6953b ]
Clang static checker (scan-build) warning: o_uring/io-wq.c:line 1051, column 3 The expression is an uninitialized value. The computed value will also be garbage.
'match.nr_pending' is used in io_acct_cancel_pending_work(), but it is not fully initialized. Change the order of assignment for 'match' to fix this problem.
Fixes: 42abc95f05bf ("io-wq: decouple work_list protection from the big wqe->lock") Signed-off-by: Su Hui suhui@nfschina.com Link: https://lore.kernel.org/r/20240604121242.2661244-1-suhui@nfschina.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/io-wq.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/io_uring/io-wq.c b/io_uring/io-wq.c index 4a07742349048..8a99aabcac2c3 100644 --- a/io_uring/io-wq.c +++ b/io_uring/io-wq.c @@ -929,7 +929,11 @@ void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work) { struct io_wq_acct *acct = io_work_get_acct(wq, work); unsigned long work_flags = work->flags; - struct io_cb_cancel_data match; + struct io_cb_cancel_data match = { + .fn = io_wq_work_match_item, + .data = work, + .cancel_all = false, + }; bool do_create;
/* @@ -967,10 +971,6 @@ void io_wq_enqueue(struct io_wq *wq, struct io_wq_work *work) raw_spin_unlock(&wq->lock);
/* fatal condition, failed to create the first worker */ - match.fn = io_wq_work_match_item, - match.data = work, - match.cancel_all = false, - io_acct_cancel_pending_work(wq, acct, &match); } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit ce3af2ee95170b7d9e15fff6e500d67deab1e7b3 ]
Fix a memory leak on logi_dj_recv_send_report() error path.
Fixes: 6f20d3261265 ("HID: logitech-dj: Fix error handling in logi_dj_recv_switch_to_dj_mode()") Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-logitech-dj.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-logitech-dj.c b/drivers/hid/hid-logitech-dj.c index 3c3c497b6b911..37958edec55f5 100644 --- a/drivers/hid/hid-logitech-dj.c +++ b/drivers/hid/hid-logitech-dj.c @@ -1284,8 +1284,10 @@ static int logi_dj_recv_switch_to_dj_mode(struct dj_receiver_dev *djrcv_dev, */ msleep(50);
- if (retval) + if (retval) { + kfree(dj_report); return retval; + } }
/*
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Forbes ian.forbes@broadcom.com
[ Upstream commit 426826933109093503e7ef15d49348fc5ab505fe ]
SVGA requires individual surfaces to fit within graphics memory (max_mob_pages) which means that modes with a final buffer size that would exceed graphics memory must be pruned otherwise creation will fail.
Additionally llvmpipe requires its buffer height and width to be a multiple of its tile size which is 64. As a result we have to anticipate that llvmpipe will round up the mode size passed to it by the compositor when it creates buffers and filter modes where this rounding exceeds graphics memory.
This fixes an issue where VMs with low graphics memory (< 64MiB) configured with high resolution mode boot to a black screen because surface creation fails.
Fixes: d947d1b71deb ("drm/vmwgfx: Add and connect connector helper function") Signed-off-by: Ian Forbes ian.forbes@broadcom.com Signed-off-by: Zack Rusin zack.rusin@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-2-ian.forbe... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 45 ++++++++++++++++++++++++++-- 1 file changed, 43 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c index 3c8414a13dbad..dbc44ecbd1f4a 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c @@ -41,7 +41,14 @@ #define vmw_connector_to_stdu(x) \ container_of(x, struct vmw_screen_target_display_unit, base.connector)
- +/* + * Some renderers such as llvmpipe will align the width and height of their + * buffers to match their tile size. We need to keep this in mind when exposing + * modes to userspace so that this possible over-allocation will not exceed + * graphics memory. 64x64 pixels seems to be a reasonable upper bound for the + * tile size of current renderers. + */ +#define GPU_TILE_SIZE 64
enum stdu_content_type { SAME_AS_DISPLAY = 0, @@ -830,7 +837,41 @@ static void vmw_stdu_connector_destroy(struct drm_connector *connector) vmw_stdu_destroy(vmw_connector_to_stdu(connector)); }
+static enum drm_mode_status +vmw_stdu_connector_mode_valid(struct drm_connector *connector, + struct drm_display_mode *mode) +{ + enum drm_mode_status ret; + struct drm_device *dev = connector->dev; + struct vmw_private *dev_priv = vmw_priv(dev); + u64 assumed_cpp = dev_priv->assume_16bpp ? 2 : 4; + /* Align width and height to account for GPU tile over-alignment */ + u64 required_mem = ALIGN(mode->hdisplay, GPU_TILE_SIZE) * + ALIGN(mode->vdisplay, GPU_TILE_SIZE) * + assumed_cpp; + required_mem = ALIGN(required_mem, PAGE_SIZE); + + ret = drm_mode_validate_size(mode, dev_priv->stdu_max_width, + dev_priv->stdu_max_height); + if (ret != MODE_OK) + return ret;
+ ret = drm_mode_validate_size(mode, dev_priv->texture_max_width, + dev_priv->texture_max_height); + if (ret != MODE_OK) + return ret; + + if (required_mem > dev_priv->max_primary_mem) + return MODE_MEM; + + if (required_mem > dev_priv->max_mob_pages * PAGE_SIZE) + return MODE_MEM; + + if (required_mem > dev_priv->max_mob_size) + return MODE_MEM; + + return MODE_OK; +}
static const struct drm_connector_funcs vmw_stdu_connector_funcs = { .dpms = vmw_du_connector_dpms, @@ -846,7 +887,7 @@ static const struct drm_connector_funcs vmw_stdu_connector_funcs = { static const struct drm_connector_helper_funcs vmw_stdu_connector_helper_funcs = { .get_modes = vmw_connector_get_modes, - .mode_valid = vmw_connector_mode_valid + .mode_valid = vmw_stdu_connector_mode_valid };
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Forbes ian.forbes@broadcom.com
[ Upstream commit fb5e19d2dd03eb995ccd468d599b2337f7f66555 ]
This limit became a hard cap starting with the change referenced below. Surface creation on the device will fail if the requested size is larger than this limit so altering the value arbitrarily will expose modes that are too large for the device's hard limits.
Fixes: 7ebb47c9f9ab ("drm/vmwgfx: Read new register for GB memory when available")
Signed-off-by: Ian Forbes ian.forbes@broadcom.com Signed-off-by: Zack Rusin zack.rusin@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-3-ian.forbe... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 7 ------- 1 file changed, 7 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 58fb40c93100a..bea576434e475 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -956,13 +956,6 @@ static int vmw_driver_load(struct vmw_private *dev_priv, u32 pci_id) vmw_read(dev_priv, SVGA_REG_SUGGESTED_GBOBJECT_MEM_SIZE_KB);
- /* - * Workaround for low memory 2D VMs to compensate for the - * allocation taken by fbdev - */ - if (!(dev_priv->capabilities & SVGA_CAP_3D)) - mem_size *= 3; - dev_priv->max_mob_pages = mem_size * 1024 / PAGE_SIZE; dev_priv->max_primary_mem = vmw_read(dev_priv, SVGA_REG_MAX_PRIMARY_MEM);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Forbes ian.forbes@broadcom.com
[ Upstream commit dde1de06bd7248fd83c4ce5cf0dbe9e4e95bbb91 ]
STDU has its own mode_valid function now so this logic can be removed from the generic version.
Fixes: 935f795045a6 ("drm/vmwgfx: Refactor drm connector probing for display modes")
Signed-off-by: Ian Forbes ian.forbes@broadcom.com Signed-off-by: Zack Rusin zack.rusin@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20240521184720.767-4-ian.forbe... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_drv.h | 3 --- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 26 +++++++++----------------- 2 files changed, 9 insertions(+), 20 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h index b019a1a1787af..c1430e55474cb 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.h @@ -1066,9 +1066,6 @@ void vmw_kms_cursor_snoop(struct vmw_surface *srf, int vmw_kms_write_svga(struct vmw_private *vmw_priv, unsigned width, unsigned height, unsigned pitch, unsigned bpp, unsigned depth); -bool vmw_kms_validate_mode_vram(struct vmw_private *dev_priv, - uint32_t pitch, - uint32_t height); int vmw_kms_present(struct vmw_private *dev_priv, struct drm_file *file_priv, struct vmw_framebuffer *vfb, diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c index 84ae4e10a2ebe..42fcf4698aba9 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@ -2157,13 +2157,12 @@ int vmw_kms_write_svga(struct vmw_private *vmw_priv, return 0; }
+static bool vmw_kms_validate_mode_vram(struct vmw_private *dev_priv, - uint32_t pitch, - uint32_t height) + u64 pitch, + u64 height) { - return ((u64) pitch * (u64) height) < (u64) - ((dev_priv->active_display_unit == vmw_du_screen_target) ? - dev_priv->max_primary_mem : dev_priv->vram_size); + return (pitch * height) < (u64)dev_priv->vram_size; }
/** @@ -2859,25 +2858,18 @@ int vmw_du_helper_plane_update(struct vmw_du_update_plane *update) enum drm_mode_status vmw_connector_mode_valid(struct drm_connector *connector, struct drm_display_mode *mode) { + enum drm_mode_status ret; struct drm_device *dev = connector->dev; struct vmw_private *dev_priv = vmw_priv(dev); - u32 max_width = dev_priv->texture_max_width; - u32 max_height = dev_priv->texture_max_height; u32 assumed_cpp = 4;
if (dev_priv->assume_16bpp) assumed_cpp = 2;
- if (dev_priv->active_display_unit == vmw_du_screen_target) { - max_width = min(dev_priv->stdu_max_width, max_width); - max_height = min(dev_priv->stdu_max_height, max_height); - } - - if (max_width < mode->hdisplay) - return MODE_BAD_HVALUE; - - if (max_height < mode->vdisplay) - return MODE_BAD_VVALUE; + ret = drm_mode_validate_size(mode, dev_priv->texture_max_width, + dev_priv->texture_max_height); + if (ret != MODE_OK) + return ret;
if (!vmw_kms_validate_mode_vram(dev_priv, mode->hdisplay * assumed_cpp,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ian Forbes ian.forbes@broadcom.com
[ Upstream commit 5703fc058efdafcdd6b70776ee562478f0753acb ]
These pointers are frequently the same and memcmp does not compare the pointers before comparing their contents so this was wasting cycles comparing 16 KiB of memory which will always be equal.
Fixes: bb6780aa5a1d ("drm/vmwgfx: Diff cursors when using cmds") Signed-off-by: Ian Forbes ian.forbes@broadcom.com Signed-off-by: Zack Rusin zack.rusin@broadcom.com Link: https://patchwork.freedesktop.org/patch/msgid/20240328190716.27367-1-ian.for... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c index 42fcf4698aba9..11755d143e657 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@ -216,7 +216,7 @@ static bool vmw_du_cursor_plane_has_changed(struct vmw_plane_state *old_vps, new_image = vmw_du_cursor_plane_acquire_image(new_vps);
changed = false; - if (old_image && new_image) + if (old_image && new_image && old_image != new_image) changed = memcmp(old_image, new_image, size) != 0;
return changed;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 1b536948e805aab61a48c5aa5db10c9afee880bd ]
Once sk->sk_state is changed to TCP_LISTEN, it never changes.
unix_accept() takes the advantage and reads sk->sk_state without holding unix_state_lock().
Let's use READ_ONCE() there.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -1704,7 +1704,7 @@ static int unix_accept(struct socket *so goto out;
err = -EINVAL; - if (sk->sk_state != TCP_LISTEN) + if (READ_ONCE(sk->sk_state) != TCP_LISTEN) goto out;
/* If socket state is TCP_LISTEN it cannot change (for now...),
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 9185afeac2a3dcce8300a5684291a43c2838cfd6 ]
Building with W=1 incorrectly emits the following warning:
WARNING: modpost: missing MODULE_DESCRIPTION() in vmlinux.o
This check should apply only to modules.
Fixes: 1fffe7a34c89 ("script: modpost: emit a warning when the description is missing") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Vincenzo Palazzo vincenzopalazzodev@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/mod/modpost.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/scripts/mod/modpost.c b/scripts/mod/modpost.c index 2f5b91da5afa9..c27c762e68807 100644 --- a/scripts/mod/modpost.c +++ b/scripts/mod/modpost.c @@ -1652,10 +1652,11 @@ static void read_symbols(const char *modname) namespace = get_next_modinfo(&info, "import_ns", namespace); } + + if (extra_warn && !get_modinfo(&info, "description")) + warn("missing MODULE_DESCRIPTION() in %s\n", modname); }
- if (extra_warn && !get_modinfo(&info, "description")) - warn("missing MODULE_DESCRIPTION() in %s\n", modname); for (sym = info.symtab_start; sym < info.symtab_stop; sym++) { symname = remove_dot(info.strtab + sym->st_name);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Csókás, Bence csokas.bence@prolan.hu
[ Upstream commit e96b2933152fd87b6a41765b2f58b158fde855b6 ]
If the module is in SFP_MOD_ERROR, `sfp_sm_mod_remove()` will not be run. As a consequence, `sfp_hwmon_remove()` is not getting run either, leaving a stale `hwmon` device behind. `sfp_sm_mod_remove()` itself checks `sfp->sm_mod_state` anyways, so this check was not really needed in the first place.
Fixes: d2e816c0293f ("net: sfp: handle module remove outside state machine") Signed-off-by: "Csókás, Bence" csokas.bence@prolan.hu Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://lore.kernel.org/r/20240605084251.63502-1-csokas.bence@prolan.hu Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/sfp.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/phy/sfp.c b/drivers/net/phy/sfp.c index f75c9eb3958ef..d999d9baadb26 100644 --- a/drivers/net/phy/sfp.c +++ b/drivers/net/phy/sfp.c @@ -2418,8 +2418,7 @@ static void sfp_sm_module(struct sfp *sfp, unsigned int event)
/* Handle remove event globally, it resets this state machine */ if (event == SFP_E_REMOVE) { - if (sfp->sm_mod_state > SFP_MOD_PROBE) - sfp_sm_mod_remove(sfp); + sfp_sm_mod_remove(sfp); sfp_sm_mod_next(sfp, SFP_MOD_EMPTY, 0); return; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yonglong Liu liuyonglong@huawei.com
[ Upstream commit 12cda920212a49fa22d9e8b9492ac4ea013310a4 ]
When link status change, the nic driver need to notify the roce driver to handle this event, but at this time, the roce driver may uninit, then cause kernel crash.
To fix the problem, when link status change, need to check whether the roce registered, and when uninit, need to wait link update finish.
Fixes: 45e92b7e4e27 ("net: hns3: add calling roce callback function when link status change") Signed-off-by: Yonglong Liu liuyonglong@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../hisilicon/hns3/hns3pf/hclge_main.c | 21 ++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index ce60332d83c39..990d7bd397aa8 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3042,9 +3042,7 @@ static void hclge_push_link_status(struct hclge_dev *hdev)
static void hclge_update_link_status(struct hclge_dev *hdev) { - struct hnae3_handle *rhandle = &hdev->vport[0].roce; struct hnae3_handle *handle = &hdev->vport[0].nic; - struct hnae3_client *rclient = hdev->roce_client; struct hnae3_client *client = hdev->nic_client; int state; int ret; @@ -3068,8 +3066,15 @@ static void hclge_update_link_status(struct hclge_dev *hdev)
client->ops->link_status_change(handle, state); hclge_config_mac_tnl_int(hdev, state); - if (rclient && rclient->ops->link_status_change) - rclient->ops->link_status_change(rhandle, state); + + if (test_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state)) { + struct hnae3_handle *rhandle = &hdev->vport[0].roce; + struct hnae3_client *rclient = hdev->roce_client; + + if (rclient && rclient->ops->link_status_change) + rclient->ops->link_status_change(rhandle, + state); + }
hclge_push_link_status(hdev); } @@ -11245,6 +11250,12 @@ static int hclge_init_client_instance(struct hnae3_client *client, return ret; }
+static bool hclge_uninit_need_wait(struct hclge_dev *hdev) +{ + return test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state) || + test_bit(HCLGE_STATE_LINK_UPDATING, &hdev->state); +} + static void hclge_uninit_client_instance(struct hnae3_client *client, struct hnae3_ae_dev *ae_dev) { @@ -11253,7 +11264,7 @@ static void hclge_uninit_client_instance(struct hnae3_client *client,
if (hdev->roce_client) { clear_bit(HCLGE_STATE_ROCE_REGISTERED, &hdev->state); - while (test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) + while (hclge_uninit_need_wait(hdev)) msleep(HCLGE_WAIT_RESET_DONE);
hdev->roce_client->ops->uninit_instance(&vport->roce, 0);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit 968fde83841a8c23558dfbd0a0c69d636db52b55 ]
Currently hns3 ring buffer init process would hold cpu too long with big Tx/Rx ring depth. This could cause soft lockup.
So this patch adds cond_resched() to the process. Then cpu can break to run other tasks instead of busy looping.
Fixes: a723fb8efe29 ("net: hns3: refine for set ring parameters") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++++ drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 2 ++ 2 files changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 19668a8d22f76..c9258b1b2b429 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3539,6 +3539,9 @@ static int hns3_alloc_ring_buffers(struct hns3_enet_ring *ring) ret = hns3_alloc_and_attach_buffer(ring, i); if (ret) goto out_buffer_fail; + + if (!(i % HNS3_RESCHED_BD_NUM)) + cond_resched(); }
return 0; @@ -5111,6 +5114,7 @@ int hns3_init_all_ring(struct hns3_nic_priv *priv) }
u64_stats_init(&priv->ring[i].syncp); + cond_resched(); }
return 0; diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h index acd756b0c7c9a..d36c4ed16d8dd 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h @@ -214,6 +214,8 @@ enum hns3_nic_state { #define HNS3_CQ_MODE_EQE 1U #define HNS3_CQ_MODE_CQE 0U
+#define HNS3_RESCHED_BD_NUM 1024 + enum hns3_pkt_l2t_type { HNS3_L2_TYPE_UNICAST, HNS3_L2_TYPE_MULTICAST,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 1af89dedc8a58006d8e385b1e0d2cd24df8a3b69 ]
It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") causes the ACPI fan driver to fail probing on some systems which turns out to be due to the _FST control method returning an invalid value until _FSL is first evaluated for the given fan. If this happens, the .get_cur_state() cooling device callback returns an error and __thermal_cooling_device_register() fails as uses that callback after commit 31a0fa0019b0.
Arguably, _FST should not return an invalid value even if it is evaluated before _FSL, so this may be regarded as a platform firmware issue, but at the same time it is not a good enough reason for failing the cooling device registration where the initial cooling device state is only needed to initialize a thermal debug facility.
Accordingly, modify __thermal_cooling_device_register() to avoid calling thermal_debug_cdev_add() instead of returning an error if the initial .get_cur_state() callback invocation fails.
Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabo... Reported-by: Laura Nao laura.nao@collabora.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Daniel Lezcano daniel.lezcano@linaro.org Tested-by: Laura Nao laura.nao@collabora.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_core.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 38b7d02384d7c..258482036f1e9 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -936,9 +936,17 @@ __thermal_cooling_device_register(struct device_node *np, if (ret) goto out_cdev_type;
+ /* + * The cooling device's current state is only needed for debug + * initialization below, so a failure to get it does not cause + * the entire cooling device initialization to fail. However, + * the debug will not work for the device if its initial state + * cannot be determined and drivers are responsible for ensuring + * that this will not happen. + */ ret = cdev->ops->get_cur_state(cdev, ¤t_state); if (ret) - goto out_cdev_type; + current_state = ULONG_MAX;
thermal_cooling_device_setup_sysfs(cdev);
@@ -953,7 +961,8 @@ __thermal_cooling_device_register(struct device_node *np, return ERR_PTR(ret); }
- thermal_debug_cdev_add(cdev, current_state); + if (current_state <= cdev->max_state) + thermal_debug_cdev_add(cdev, current_state);
/* Add 'this' new cdev to the global cdev list */ mutex_lock(&thermal_list_lock);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Mishin amishin@t-argos.ru
[ Upstream commit c44711b78608c98a3e6b49ce91678cd0917d5349 ]
In lio_vf_rep_copy_packet() pg_info->page is compared to a NULL value, but then it is unconditionally passed to skb_add_rx_frag() which looks strange and could lead to null pointer dereference.
lio_vf_rep_copy_packet() call trace looks like: octeon_droq_process_packets octeon_droq_fast_process_packets octeon_droq_dispatch_pkt octeon_create_recv_info ...search in the dispatch_list... ->disp_fn(rdisp->rinfo, ...) lio_vf_rep_pkt_recv(struct octeon_recv_info *recv_info, ...) In this path there is no code which sets pg_info->page to NULL. So this check looks unneeded and doesn't solve potential problem. But I guess the author had reason to add a check and I have no such card and can't do real test. In addition, the code in the function liquidio_push_packet() in liquidio/lio_core.c does exactly the same.
Based on this, I consider the most acceptable compromise solution to adjust this issue by moving skb_add_rx_frag() into conditional scope.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 1f233f327913 ("liquidio: switchdev support for LiquidIO NIC") Signed-off-by: Aleksandr Mishin amishin@t-argos.ru Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c index aa6c0dfb6f1ca..e26b4ed33dc83 100644 --- a/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c +++ b/drivers/net/ethernet/cavium/liquidio/lio_vf_rep.c @@ -272,13 +272,12 @@ lio_vf_rep_copy_packet(struct octeon_device *oct, pg_info->page_offset; memcpy(skb->data, va, MIN_SKB_SIZE); skb_put(skb, MIN_SKB_SIZE); + skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, + pg_info->page, + pg_info->page_offset + MIN_SKB_SIZE, + len - MIN_SKB_SIZE, + LIO_RXBUFFER_SZ); } - - skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, - pg_info->page, - pg_info->page_offset + MIN_SKB_SIZE, - len - MIN_SKB_SIZE, - LIO_RXBUFFER_SZ); } else { struct octeon_skb_page_info *pg_info = ((struct octeon_skb_page_info *)(skb->cb));
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sagar Cheluvegowda quic_scheluve@quicinc.com
[ Upstream commit 0579f27249047006a818e463ee66a6c314d04cea ]
Commit 070246e4674b ("net: stmmac: Fix for mismatched host/device DMA address width") added support in the stmmac driver for platform drivers to indicate the host DMA width, but left it up to authors of the specific platforms to indicate if their width differed from the addr64 register read from the MAC itself.
Qualcomm's EMAC4 integration supports only up to 36 bit width (as opposed to the addr64 register indicating 40 bit width). Let's indicate that in the platform driver to avoid a scenario where the driver will allocate descriptors of size that is supported by the CPU which in our case is 36 bit, but as the addr64 register is still capable of 40 bits the device will use two descriptors as one address.
Fixes: 8c4d92e82d50 ("net: stmmac: dwmac-qcom-ethqos: add support for emac4 on sa8775p platforms") Signed-off-by: Sagar Cheluvegowda quic_scheluve@quicinc.com Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Andrew Halaney ahalaney@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c index e254b21fdb598..65d7370b47d57 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-qcom-ethqos.c @@ -93,6 +93,7 @@ struct ethqos_emac_driver_data { bool has_emac_ge_3; const char *link_clk_name; bool has_integrated_pcs; + u32 dma_addr_width; struct dwmac4_addrs dwmac4_addrs; };
@@ -276,6 +277,7 @@ static const struct ethqos_emac_driver_data emac_v4_0_0_data = { .has_emac_ge_3 = true, .link_clk_name = "phyaux", .has_integrated_pcs = true, + .dma_addr_width = 36, .dwmac4_addrs = { .dma_chan = 0x00008100, .dma_chan_offset = 0x1000, @@ -845,6 +847,8 @@ static int qcom_ethqos_probe(struct platform_device *pdev) plat_dat->flags |= STMMAC_FLAG_RX_CLK_RUNS_IN_LPI; if (data->has_integrated_pcs) plat_dat->flags |= STMMAC_FLAG_HAS_INTEGRATED_PCS; + if (data->dma_addr_width) + plat_dat->host_dma_width = data->dma_addr_width;
if (ethqos->serdes_phy) { plat_dat->serdes_powerup = qcom_ethqos_serdes_powerup;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Wei dw@davidwei.uk
[ Upstream commit 5add2f7288468f35a374620dabf126c13baaea9c ]
The default ndo_get_iflink() implementation returns the current ifindex of the netdev. But the overridden nsim_get_iflink() returns 0 if the current nsim is not linked, breaking backwards compatibility for userspace that depend on this behaviour.
Fix the problem by returning the current ifindex if not linked to a peer.
Fixes: 8debcf5832c3 ("netdevsim: add ndo_get_iflink() implementation") Reported-by: Yu Watanabe watanabe.yu@gmail.com Suggested-by: Yu Watanabe watanabe.yu@gmail.com Signed-off-by: David Wei dw@davidwei.uk Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/netdevsim/netdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/netdevsim/netdev.c b/drivers/net/netdevsim/netdev.c index 8330bc0bcb7e5..d405809030aab 100644 --- a/drivers/net/netdevsim/netdev.c +++ b/drivers/net/netdevsim/netdev.c @@ -292,7 +292,8 @@ static int nsim_get_iflink(const struct net_device *dev)
rcu_read_lock(); peer = rcu_dereference(nsim->peer); - iflink = peer ? READ_ONCE(peer->netdev->ifindex) : 0; + iflink = peer ? READ_ONCE(peer->netdev->ifindex) : + READ_ONCE(dev->ifindex); rcu_read_unlock();
return iflink;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amjad Ouled-Ameur amjad.ouled-ameur@arm.com
[ Upstream commit b880018edd3a577e50366338194dee9b899947e0 ]
komeda_pipeline_get_state() may return an error-valued pointer, thus check the pointer for negative or null value before dereferencing.
Fixes: 502932a03fce ("drm/komeda: Add the initial scaler support for CORE") Signed-off-by: Amjad Ouled-Ameur amjad.ouled-ameur@arm.com Signed-off-by: Maxime Ripard mripard@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20240610102056.40406-1-amjad.o... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c index f3e744172673c..f4e76b46ca327 100644 --- a/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c +++ b/drivers/gpu/drm/arm/display/komeda/komeda_pipeline_state.c @@ -259,7 +259,7 @@ komeda_component_get_avail_scaler(struct komeda_component *c, u32 avail_scalers;
pipe_st = komeda_pipeline_get_state(c->pipeline, state); - if (!pipe_st) + if (IS_ERR_OR_NULL(pipe_st)) return NULL;
avail_scalers = (pipe_st->active_comps & KOMEDA_PIPELINE_SCALERS) ^
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adam Miotk adam.miotk@arm.com
[ Upstream commit ce62600c4dbee8d43b02277669dd91785a9b81d9 ]
Device managed panel bridge wrappers are created by calling to drm_panel_bridge_add_typed() and registering a release handler for clean-up when the device gets unbound.
Since the memory for this bridge is also managed and linked to the panel device, the release function should not try to free that memory. Moreover, the call to devm_kfree() inside drm_panel_bridge_remove() will fail in this case and emit a warning because the panel bridge resource is no longer on the device resources list (it has been removed from there before the call to release handlers).
Fixes: 67022227ffb1 ("drm/bridge: Add a devm_ allocator for panel bridge.") Signed-off-by: Adam Miotk adam.miotk@arm.com Signed-off-by: Maxime Ripard mripard@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20240610102739.139852-1-adam.m... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/panel.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/bridge/panel.c b/drivers/gpu/drm/bridge/panel.c index 7f41525f7a6e6..3d6e8f096a5d4 100644 --- a/drivers/gpu/drm/bridge/panel.c +++ b/drivers/gpu/drm/bridge/panel.c @@ -358,9 +358,12 @@ EXPORT_SYMBOL(drm_panel_bridge_set_orientation);
static void devm_drm_panel_bridge_release(struct device *dev, void *res) { - struct drm_bridge **bridge = res; + struct drm_bridge *bridge = *(struct drm_bridge **)res;
- drm_panel_bridge_remove(*bridge); + if (!bridge) + return; + + drm_bridge_remove(bridge); }
/**
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit d37fe4255abe8e7b419b90c5847e8ec2b8debb08 ]
tcp_v6_syn_recv_sock() calls ip6_dst_store() before inet_sk(newsk)->pinet6 has been set up.
This means ip6_dst_store() writes over the parent (listener) np->dst_cookie.
This is racy because multiple threads could share the same parent and their final np->dst_cookie could be wrong.
Move ip6_dst_store() call after inet_sk(newsk)->pinet6 has been changed and after the copy of parent ipv6_pinfo.
Fixes: e994b2f0fb92 ("tcp: do not lock listener to process SYN packets") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/tcp_ipv6.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 5873b3c3562ed..2b2eda5a2894d 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -1438,7 +1438,6 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff * */
newsk->sk_gso_type = SKB_GSO_TCPV6; - ip6_dst_store(newsk, dst, NULL, NULL); inet6_sk_rx_dst_set(newsk, skb);
inet_sk(newsk)->pinet6 = tcp_inet6_sk(newsk); @@ -1449,6 +1448,8 @@ static struct sock *tcp_v6_syn_recv_sock(const struct sock *sk, struct sk_buff *
memcpy(newnp, np, sizeof(struct ipv6_pinfo));
+ ip6_dst_store(newsk, dst, NULL, NULL); + newsk->sk_v6_daddr = ireq->ir_v6_rmt_addr; newnp->saddr = ireq->ir_v6_loc_addr; newsk->sk_v6_rcv_saddr = ireq->ir_v6_loc_addr;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit d029edefed39647c797c2710aedd9d31f84c069e ]
The documentation for device_get_named_child_node() mentions this important point:
" The caller is responsible for calling fwnode_handle_put() on the returned fwnode pointer. "
Add fwnode_handle_put() to avoid leaked references.
Fixes: 1e264f9d2918 ("net: dsa: qca8k: add LEDs basic support") Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/qca/qca8k-leds.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/qca/qca8k-leds.c b/drivers/net/dsa/qca/qca8k-leds.c index 811ebeeff4ed7..43ac68052baf9 100644 --- a/drivers/net/dsa/qca/qca8k-leds.c +++ b/drivers/net/dsa/qca/qca8k-leds.c @@ -431,8 +431,11 @@ qca8k_parse_port_leds(struct qca8k_priv *priv, struct fwnode_handle *port, int p init_data.devicename = kasprintf(GFP_KERNEL, "%s:0%d", priv->internal_mdio_bus->id, port_num); - if (!init_data.devicename) + if (!init_data.devicename) { + fwnode_handle_put(led); + fwnode_handle_put(leds); return -ENOMEM; + }
ret = devm_led_classdev_register_ext(priv->dev, &port_led->cdev, &init_data); if (ret) @@ -441,6 +444,7 @@ qca8k_parse_port_leds(struct qca8k_priv *priv, struct fwnode_handle *port, int p kfree(init_data.devicename); }
+ fwnode_handle_put(leds); return 0; }
@@ -471,9 +475,13 @@ qca8k_setup_led_ctrl(struct qca8k_priv *priv) * the correct port for LED setup. */ ret = qca8k_parse_port_leds(priv, port, qca8k_port_to_phy(port_num)); - if (ret) + if (ret) { + fwnode_handle_put(port); + fwnode_handle_put(ports); return ret; + } }
+ fwnode_handle_put(ports); return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gal Pressman gal@nvidia.com
[ Upstream commit c6ae073f5903f6c6439d0ac855836a4da5c0a701 ]
When innerprotoinherit is set, the tunneled packets do not have an inner Ethernet header. Change 'maclen' to not always assume the header length is ETH_HLEN, as there might not be a MAC header.
This resolves issues with drivers (e.g. mlx5, in mlx5e_tx_tunnel_accel()) who rely on the skb inner network header offset to be correct, and use it for TX offloads.
Fixes: d8a6213d70ac ("geneve: fix header validation in geneve[6]_xmit_skb") Signed-off-by: Gal Pressman gal@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/geneve.c | 10 ++++++---- include/net/ip_tunnels.h | 5 +++-- 2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/net/geneve.c b/drivers/net/geneve.c index 6c2835086b57e..76053d2c15958 100644 --- a/drivers/net/geneve.c +++ b/drivers/net/geneve.c @@ -811,6 +811,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev, struct geneve_dev *geneve, const struct ip_tunnel_info *info) { + bool inner_proto_inherit = geneve->cfg.inner_proto_inherit; bool xnet = !net_eq(geneve->net, dev_net(geneve->dev)); struct geneve_sock *gs4 = rcu_dereference(geneve->sock4); const struct ip_tunnel_key *key = &info->key; @@ -822,7 +823,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev, __be16 sport; int err;
- if (!skb_vlan_inet_prepare(skb)) + if (!skb_vlan_inet_prepare(skb, inner_proto_inherit)) return -EINVAL;
if (!gs4) @@ -903,7 +904,7 @@ static int geneve_xmit_skb(struct sk_buff *skb, struct net_device *dev, }
err = geneve_build_skb(&rt->dst, skb, info, xnet, sizeof(struct iphdr), - geneve->cfg.inner_proto_inherit); + inner_proto_inherit); if (unlikely(err)) return err;
@@ -919,6 +920,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev, struct geneve_dev *geneve, const struct ip_tunnel_info *info) { + bool inner_proto_inherit = geneve->cfg.inner_proto_inherit; bool xnet = !net_eq(geneve->net, dev_net(geneve->dev)); struct geneve_sock *gs6 = rcu_dereference(geneve->sock6); const struct ip_tunnel_key *key = &info->key; @@ -929,7 +931,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev, __be16 sport; int err;
- if (!skb_vlan_inet_prepare(skb)) + if (!skb_vlan_inet_prepare(skb, inner_proto_inherit)) return -EINVAL;
if (!gs6) @@ -991,7 +993,7 @@ static int geneve6_xmit_skb(struct sk_buff *skb, struct net_device *dev, ttl = ttl ? : ip6_dst_hoplimit(dst); } err = geneve_build_skb(dst, skb, info, xnet, sizeof(struct ipv6hdr), - geneve->cfg.inner_proto_inherit); + inner_proto_inherit); if (unlikely(err)) return err;
diff --git a/include/net/ip_tunnels.h b/include/net/ip_tunnels.h index c286cc2e766ee..cc666da626517 100644 --- a/include/net/ip_tunnels.h +++ b/include/net/ip_tunnels.h @@ -363,9 +363,10 @@ static inline bool pskb_inet_may_pull(struct sk_buff *skb)
/* Variant of pskb_inet_may_pull(). */ -static inline bool skb_vlan_inet_prepare(struct sk_buff *skb) +static inline bool skb_vlan_inet_prepare(struct sk_buff *skb, + bool inner_proto_inherit) { - int nhlen = 0, maclen = ETH_HLEN; + int nhlen = 0, maclen = inner_proto_inherit ? 0 : ETH_HLEN; __be16 type = skb->protocol;
/* Essentially this is skb_protocol(skb, true)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gal Pressman gal@nvidia.com
[ Upstream commit 791b4089e326271424b78f2fae778b20e53d071b ]
Move the vxlan_features_check() call to after we verified the packet is a tunneled VXLAN packet.
Without this, tunneled UDP non-VXLAN packets (for ex. GENENVE) might wrongly not get offloaded. In some cases, it worked by chance as GENEVE header is the same size as VXLAN, but it is obviously incorrect.
Fixes: e3cfc7e6b7bd ("net/mlx5e: TX, Add geneve tunnel stateless offload support") Signed-off-by: Gal Pressman gal@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index 47be07af214ff..981a3e058840d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -4738,7 +4738,7 @@ static netdev_features_t mlx5e_tunnel_features_check(struct mlx5e_priv *priv,
/* Verify if UDP port is being offloaded by HW */ if (mlx5_vxlan_lookup_port(priv->mdev->vxlan, port)) - return features; + return vxlan_features_check(skb, features);
#if IS_ENABLED(CONFIG_GENEVE) /* Support Geneve offload for default UDP port */ @@ -4764,7 +4764,6 @@ netdev_features_t mlx5e_features_check(struct sk_buff *skb, struct mlx5e_priv *priv = netdev_priv(netdev);
features = vlan_features_check(skb, features); - features = vxlan_features_check(skb, features);
/* Validate if the tunneled packet is being offloaded by HW */ if (skb->encapsulation &&
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 86fbd9f63a6b42b8f158361334f5a25762aea358 ]
When setting up an advertisement the code shall always attempt to use the handle set by the instance since it may not be equal to the instance ID.
Fixes: e77f43d531af ("Bluetooth: hci_core: Fix not handling hdev->le_num_of_adv_sets=1") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_sync.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c index 64f794d198cdc..7bfa6b59ba87e 100644 --- a/net/bluetooth/hci_sync.c +++ b/net/bluetooth/hci_sync.c @@ -1194,7 +1194,7 @@ int hci_setup_ext_adv_instance_sync(struct hci_dev *hdev, u8 instance)
cp.own_addr_type = own_addr_type; cp.channel_map = hdev->le_adv_channel_map; - cp.handle = instance; + cp.handle = adv ? adv->handle : instance;
if (flags & MGMT_ADV_FLAG_SEC_2M) { cp.primary_phy = HCI_ADV_PHY_1M;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 806a5198c05987b748b50f3d0c0cfb3d417381a4 ]
This removes the bogus check for max > hcon->le_conn_max_interval since the later is just the initial maximum conn interval not the maximum the stack could support which is really 3200=4000ms.
In order to pass GAP/CONN/CPUP/BV-05-C one shall probably enter values of the following fields in IXIT that would cause hci_check_conn_params to fail:
TSPX_conn_update_int_min TSPX_conn_update_int_max TSPX_conn_update_peripheral_latency TSPX_conn_update_supervision_timeout
Link: https://github.com/bluez/bluez/issues/847 Fixes: e4b019515f95 ("Bluetooth: Enforce validation on max value of connection interval") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/bluetooth/hci_core.h | 36 ++++++++++++++++++++++++++++---- net/bluetooth/l2cap_core.c | 8 +------ 2 files changed, 33 insertions(+), 11 deletions(-)
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index 970101184cf10..17b2a7baa4d8f 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -2113,18 +2113,46 @@ static inline int hci_check_conn_params(u16 min, u16 max, u16 latency, { u16 max_latency;
- if (min > max || min < 6 || max > 3200) + if (min > max) { + BT_WARN("min %d > max %d", min, max); return -EINVAL; + } + + if (min < 6) { + BT_WARN("min %d < 6", min); + return -EINVAL; + } + + if (max > 3200) { + BT_WARN("max %d > 3200", max); + return -EINVAL; + } + + if (to_multiplier < 10) { + BT_WARN("to_multiplier %d < 10", to_multiplier); + return -EINVAL; + }
- if (to_multiplier < 10 || to_multiplier > 3200) + if (to_multiplier > 3200) { + BT_WARN("to_multiplier %d > 3200", to_multiplier); return -EINVAL; + }
- if (max >= to_multiplier * 8) + if (max >= to_multiplier * 8) { + BT_WARN("max %d >= to_multiplier %d * 8", max, to_multiplier); return -EINVAL; + }
max_latency = (to_multiplier * 4 / max) - 1; - if (latency > 499 || latency > max_latency) + if (latency > 499) { + BT_WARN("latency %d > 499", latency); return -EINVAL; + } + + if (latency > max_latency) { + BT_WARN("latency %d > max_latency %d", latency, max_latency); + return -EINVAL; + }
return 0; } diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index 4a633c1b68825..e3f5d830e42bf 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4645,13 +4645,7 @@ static inline int l2cap_conn_param_update_req(struct l2cap_conn *conn,
memset(&rsp, 0, sizeof(rsp));
- if (max > hcon->le_conn_max_interval) { - BT_DBG("requested connection interval exceeds current bounds."); - err = -EINVAL; - } else { - err = hci_check_conn_params(min, max, latency, to_multiplier); - } - + err = hci_check_conn_params(min, max, latency, to_multiplier); if (err) rsp.result = cpu_to_le16(L2CAP_CONN_PARAM_REJECTED); else
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pauli Virtanen pav@iki.fi
[ Upstream commit c695439d198d30e10553a3b98360c5efe77b6903 ]
The amp_id argument of l2cap_connect() was removed in commit 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support")
It was always called with amp_id == 0, i.e. AMP_ID_BREDR == 0x00 (ie. non-AMP controller). In the above commit, the code path for amp_id != 0 was preserved, although it should have used the amp_id == 0 one.
Restore the previous behavior of the non-AMP code path, to fix problems with L2CAP connections.
Fixes: 84a4bb6548a2 ("Bluetooth: HCI: Remove HCI_AMP support") Signed-off-by: Pauli Virtanen pav@iki.fi Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/l2cap_core.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c index e3f5d830e42bf..9394a158d1b1a 100644 --- a/net/bluetooth/l2cap_core.c +++ b/net/bluetooth/l2cap_core.c @@ -4009,8 +4009,8 @@ static void l2cap_connect(struct l2cap_conn *conn, struct l2cap_cmd_hdr *cmd, status = L2CAP_CS_AUTHOR_PEND; chan->ops->defer(chan); } else { - l2cap_state_change(chan, BT_CONNECT2); - result = L2CAP_CR_PEND; + l2cap_state_change(chan, BT_CONFIG); + result = L2CAP_CR_SUCCESS; status = L2CAP_CS_NO_INFO; } } else {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit 44180feaccf266d9b0b28cc4ceaac019817deb5c ]
When the noop_qdisc owner isn't initialized, then it will be 0, so packets will erroneously be regarded as having been subject to recursion as long as only CPU 0 queues them. For non-SMP, that's all packets, of course. This causes a change in what's reported to userspace, normally noop_qdisc would drop packets silently, but with this change the syscall returns -ENOBUFS if RECVERR is also set on the socket.
Fix this by initializing the owner field to -1, just like it would be for dynamically allocated qdiscs by qdisc_alloc().
Fixes: 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion") Signed-off-by: Johannes Berg johannes.berg@intel.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240607175340.786bfb938803.I493bf8422e36be4454c08... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_generic.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index 4a2c763e2d116..10b1491d55809 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -673,6 +673,7 @@ struct Qdisc noop_qdisc = { .qlen = 0, .lock = __SPIN_LOCK_UNLOCKED(noop_qdisc.skb_bad_txq.lock), }, + .owner = -1, }; EXPORT_SYMBOL(noop_qdisc);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 36534d3c54537bf098224a32dc31397793d4594d ]
Due to timer wheel implementation, a timer will usually fire after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s has a granularity of 64ms. For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after inet_csk(sk)->icsk_timeout whenever the timer interrupt finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Menglong Dong imagedong@tencent.com Acked-by: Neal Cardwell ncardwell@google.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_timer.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/tcp_timer.c b/net/ipv4/tcp_timer.c index d1ad20ce1c8c7..f96f68cf7961c 100644 --- a/net/ipv4/tcp_timer.c +++ b/net/ipv4/tcp_timer.c @@ -483,8 +483,12 @@ static bool tcp_rtx_probe0_timed_out(const struct sock *sk, { const struct tcp_sock *tp = tcp_sk(sk); const int timeout = TCP_RTO_MAX * 2; - u32 rcv_delta; + s32 rcv_delta;
+ /* Note: timer interrupt might have been delayed by at least one jiffy, + * and tp->rcv_tstamp might very well have been written recently. + * rcv_delta can thus be negative. + */ rcv_delta = inet_csk(sk)->icsk_timeout - tp->rcv_tstamp; if (rcv_delta <= timeout) return false;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasily Khoruzhick anarsoul@gmail.com
[ Upstream commit b96a225377b6602299a03d2ce3c289b68cd41bb7 ]
If the card doesn't have display hardware, hpd_work and hpd_lock are left uninitialized which causes BUG when attempting to schedule hpd_work on runtime PM resume.
Fix it by adding headless flag to DRM and skip any hpd if it's set.
Fixes: ae1aadb1eb8d ("nouveau: don't fail driver load if no display hw present.") Link: https://gitlab.freedesktop.org/drm/nouveau/-/issues/337 Signed-off-by: Vasily Khoruzhick anarsoul@gmail.com Reviewed-by: Ben Skeggs bskeggs@nvidia.com Signed-off-by: Danilo Krummrich dakr@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240607221032.25918-1-anarsou... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/nouveau/dispnv04/disp.c | 2 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +- drivers/gpu/drm/nouveau/nouveau_display.c | 6 +++++- drivers/gpu/drm/nouveau/nouveau_drv.h | 1 + 4 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/dispnv04/disp.c b/drivers/gpu/drm/nouveau/dispnv04/disp.c index 13705c5f14973..4b7497a8755cd 100644 --- a/drivers/gpu/drm/nouveau/dispnv04/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv04/disp.c @@ -68,7 +68,7 @@ nv04_display_fini(struct drm_device *dev, bool runtime, bool suspend) if (nv_two_heads(dev)) NVWriteCRTC(dev, 1, NV_PCRTC_INTR_EN_0, 0);
- if (!runtime) + if (!runtime && !drm->headless) cancel_work_sync(&drm->hpd_work);
if (!suspend) diff --git a/drivers/gpu/drm/nouveau/dispnv50/disp.c b/drivers/gpu/drm/nouveau/dispnv50/disp.c index 0c3d88ad0b0ea..88151209033d5 100644 --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -2680,7 +2680,7 @@ nv50_display_fini(struct drm_device *dev, bool runtime, bool suspend) nv50_mstm_fini(nouveau_encoder(encoder)); }
- if (!runtime) + if (!runtime && !drm->headless) cancel_work_sync(&drm->hpd_work); }
diff --git a/drivers/gpu/drm/nouveau/nouveau_display.c b/drivers/gpu/drm/nouveau/nouveau_display.c index f28f9a8574586..60c32244211d0 100644 --- a/drivers/gpu/drm/nouveau/nouveau_display.c +++ b/drivers/gpu/drm/nouveau/nouveau_display.c @@ -450,6 +450,9 @@ nouveau_display_hpd_resume(struct drm_device *dev) { struct nouveau_drm *drm = nouveau_drm(dev);
+ if (drm->headless) + return; + spin_lock_irq(&drm->hpd_lock); drm->hpd_pending = ~0; spin_unlock_irq(&drm->hpd_lock); @@ -635,7 +638,7 @@ nouveau_display_fini(struct drm_device *dev, bool suspend, bool runtime) } drm_connector_list_iter_end(&conn_iter);
- if (!runtime) + if (!runtime && !drm->headless) cancel_work_sync(&drm->hpd_work);
drm_kms_helper_poll_disable(dev); @@ -729,6 +732,7 @@ nouveau_display_create(struct drm_device *dev) /* no display hw */ if (ret == -ENODEV) { ret = 0; + drm->headless = true; goto disp_create_err; }
diff --git a/drivers/gpu/drm/nouveau/nouveau_drv.h b/drivers/gpu/drm/nouveau/nouveau_drv.h index e239c6bf4afa4..25fca98a20bcd 100644 --- a/drivers/gpu/drm/nouveau/nouveau_drv.h +++ b/drivers/gpu/drm/nouveau/nouveau_drv.h @@ -276,6 +276,7 @@ struct nouveau_drm { /* modesetting */ struct nvbios vbios; struct nouveau_display *display; + bool headless; struct work_struct hpd_work; spinlock_t hpd_lock; u32 hpd_pending;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Davide Ornaghi d.ornaghi97@gmail.com
[ Upstream commit c4ab9da85b9df3692f861512fe6c9812f38b7471 ]
Check for mandatory netlink attributes in payload and meta expression when used embedded from the inner expression, otherwise NULL pointer dereference is possible from userspace.
Fixes: a150d122b6bd ("netfilter: nft_meta: add inner match support") Fixes: 3a07327d10a0 ("netfilter: nft_inner: support for inner tunnel header matching") Signed-off-by: Davide Ornaghi d.ornaghi97@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_meta.c | 3 +++ net/netfilter/nft_payload.c | 4 ++++ 2 files changed, 7 insertions(+)
diff --git a/net/netfilter/nft_meta.c b/net/netfilter/nft_meta.c index ba0d3683a45d3..9139ce38ea7b9 100644 --- a/net/netfilter/nft_meta.c +++ b/net/netfilter/nft_meta.c @@ -839,6 +839,9 @@ static int nft_meta_inner_init(const struct nft_ctx *ctx, struct nft_meta *priv = nft_expr_priv(expr); unsigned int len;
+ if (!tb[NFTA_META_KEY] || !tb[NFTA_META_DREG]) + return -EINVAL; + priv->key = ntohl(nla_get_be32(tb[NFTA_META_KEY])); switch (priv->key) { case NFT_META_PROTOCOL: diff --git a/net/netfilter/nft_payload.c b/net/netfilter/nft_payload.c index 0c43d748e23ae..50429cbd42da4 100644 --- a/net/netfilter/nft_payload.c +++ b/net/netfilter/nft_payload.c @@ -650,6 +650,10 @@ static int nft_payload_inner_init(const struct nft_ctx *ctx, struct nft_payload *priv = nft_expr_priv(expr); u32 base;
+ if (!tb[NFTA_PAYLOAD_BASE] || !tb[NFTA_PAYLOAD_OFFSET] || + !tb[NFTA_PAYLOAD_LEN] || !tb[NFTA_PAYLOAD_DREG]) + return -EINVAL; + base = ntohl(nla_get_be32(tb[NFTA_PAYLOAD_BASE])); switch (base) { case NFT_PAYLOAD_TUN_HEADER:
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jozsef Kadlecsik kadlec@netfilter.org
[ Upstream commit 4e7aaa6b82d63e8ddcbfb56b4fd3d014ca586f10 ]
Lion Ackermann reported that there is a race condition between namespace cleanup in ipset and the garbage collection of the list:set type. The namespace cleanup can destroy the list:set type of sets while the gc of the set type is waiting to run in rcu cleanup. The latter uses data from the destroyed set which thus leads use after free. The patch contains the following parts:
- When destroying all sets, first remove the garbage collectors, then wait if needed and then destroy the sets. - Fix the badly ordered "wait then remove gc" for the destroy a single set case. - Fix the missing rcu locking in the list:set type in the userspace test case. - Use proper RCU list handlings in the list:set type.
The patch depends on c1193d9bbbd3 (netfilter: ipset: Add list flush to cancel_gc).
Fixes: 97f7cf1cd80e (netfilter: ipset: fix performance regression in swap operation) Reported-by: Lion Ackermann nnamrec@gmail.com Tested-by: Lion Ackermann nnamrec@gmail.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipset/ip_set_core.c | 81 +++++++++++++++------------ net/netfilter/ipset/ip_set_list_set.c | 30 +++++----- 2 files changed, 60 insertions(+), 51 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 3184cc6be4c9d..c7ae4d9bf3d24 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1172,23 +1172,50 @@ ip_set_setname_policy[IPSET_ATTR_CMD_MAX + 1] = { .len = IPSET_MAXNAMELEN - 1 }, };
+/* In order to return quickly when destroying a single set, it is split + * into two stages: + * - Cancel garbage collector + * - Destroy the set itself via call_rcu() + */ + static void -ip_set_destroy_set(struct ip_set *set) +ip_set_destroy_set_rcu(struct rcu_head *head) { - pr_debug("set: %s\n", set->name); + struct ip_set *set = container_of(head, struct ip_set, rcu);
- /* Must call it without holding any lock */ set->variant->destroy(set); module_put(set->type->me); kfree(set); }
static void -ip_set_destroy_set_rcu(struct rcu_head *head) +_destroy_all_sets(struct ip_set_net *inst) { - struct ip_set *set = container_of(head, struct ip_set, rcu); + struct ip_set *set; + ip_set_id_t i; + bool need_wait = false;
- ip_set_destroy_set(set); + /* First cancel gc's: set:list sets are flushed as well */ + for (i = 0; i < inst->ip_set_max; i++) { + set = ip_set(inst, i); + if (set) { + set->variant->cancel_gc(set); + if (set->type->features & IPSET_TYPE_NAME) + need_wait = true; + } + } + /* Must wait for flush to be really finished */ + if (need_wait) + rcu_barrier(); + for (i = 0; i < inst->ip_set_max; i++) { + set = ip_set(inst, i); + if (set) { + ip_set(inst, i) = NULL; + set->variant->destroy(set); + module_put(set->type->me); + kfree(set); + } + } }
static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info, @@ -1202,11 +1229,10 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info, if (unlikely(protocol_min_failed(attr))) return -IPSET_ERR_PROTOCOL;
- /* Commands are serialized and references are * protected by the ip_set_ref_lock. * External systems (i.e. xt_set) must call - * ip_set_put|get_nfnl_* functions, that way we + * ip_set_nfnl_get_* functions, that way we * can safely check references here. * * list:set timer can only decrement the reference @@ -1214,8 +1240,6 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info, * without holding the lock. */ if (!attr[IPSET_ATTR_SETNAME]) { - /* Must wait for flush to be really finished in list:set */ - rcu_barrier(); read_lock_bh(&ip_set_ref_lock); for (i = 0; i < inst->ip_set_max; i++) { s = ip_set(inst, i); @@ -1226,15 +1250,7 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info, } inst->is_destroyed = true; read_unlock_bh(&ip_set_ref_lock); - for (i = 0; i < inst->ip_set_max; i++) { - s = ip_set(inst, i); - if (s) { - ip_set(inst, i) = NULL; - /* Must cancel garbage collectors */ - s->variant->cancel_gc(s); - ip_set_destroy_set(s); - } - } + _destroy_all_sets(inst); /* Modified by ip_set_destroy() only, which is serialized */ inst->is_destroyed = false; } else { @@ -1255,12 +1271,12 @@ static int ip_set_destroy(struct sk_buff *skb, const struct nfnl_info *info, features = s->type->features; ip_set(inst, i) = NULL; read_unlock_bh(&ip_set_ref_lock); + /* Must cancel garbage collectors */ + s->variant->cancel_gc(s); if (features & IPSET_TYPE_NAME) { /* Must wait for flush to be really finished */ rcu_barrier(); } - /* Must cancel garbage collectors */ - s->variant->cancel_gc(s); call_rcu(&s->rcu, ip_set_destroy_set_rcu); } return 0; @@ -2365,30 +2381,25 @@ ip_set_net_init(struct net *net) }
static void __net_exit -ip_set_net_exit(struct net *net) +ip_set_net_pre_exit(struct net *net) { struct ip_set_net *inst = ip_set_pernet(net);
- struct ip_set *set = NULL; - ip_set_id_t i; - inst->is_deleted = true; /* flag for ip_set_nfnl_put */ +}
- nfnl_lock(NFNL_SUBSYS_IPSET); - for (i = 0; i < inst->ip_set_max; i++) { - set = ip_set(inst, i); - if (set) { - ip_set(inst, i) = NULL; - set->variant->cancel_gc(set); - ip_set_destroy_set(set); - } - } - nfnl_unlock(NFNL_SUBSYS_IPSET); +static void __net_exit +ip_set_net_exit(struct net *net) +{ + struct ip_set_net *inst = ip_set_pernet(net); + + _destroy_all_sets(inst); kvfree(rcu_dereference_protected(inst->ip_set_list, 1)); }
static struct pernet_operations ip_set_net_ops = { .init = ip_set_net_init, + .pre_exit = ip_set_net_pre_exit, .exit = ip_set_net_exit, .id = &ip_set_net_id, .size = sizeof(struct ip_set_net), diff --git a/net/netfilter/ipset/ip_set_list_set.c b/net/netfilter/ipset/ip_set_list_set.c index 54e2a1dd7f5f5..bfae7066936bb 100644 --- a/net/netfilter/ipset/ip_set_list_set.c +++ b/net/netfilter/ipset/ip_set_list_set.c @@ -79,7 +79,7 @@ list_set_kadd(struct ip_set *set, const struct sk_buff *skb, struct set_elem *e; int ret;
- list_for_each_entry(e, &map->members, list) { + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -99,7 +99,7 @@ list_set_kdel(struct ip_set *set, const struct sk_buff *skb, struct set_elem *e; int ret;
- list_for_each_entry(e, &map->members, list) { + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -188,9 +188,10 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext, struct list_set *map = set->data; struct set_adt_elem *d = value; struct set_elem *e, *next, *prev = NULL; - int ret; + int ret = 0;
- list_for_each_entry(e, &map->members, list) { + rcu_read_lock(); + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -201,6 +202,7 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext,
if (d->before == 0) { ret = 1; + goto out; } else if (d->before > 0) { next = list_next_entry(e, list); ret = !list_is_last(&e->list, &map->members) && @@ -208,9 +210,11 @@ list_set_utest(struct ip_set *set, void *value, const struct ip_set_ext *ext, } else { ret = prev && prev->id == d->refid; } - return ret; + goto out; } - return 0; +out: + rcu_read_unlock(); + return ret; }
static void @@ -239,7 +243,7 @@ list_set_uadd(struct ip_set *set, void *value, const struct ip_set_ext *ext,
/* Find where to add the new entry */ n = prev = next = NULL; - list_for_each_entry(e, &map->members, list) { + list_for_each_entry_rcu(e, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -316,9 +320,9 @@ list_set_udel(struct ip_set *set, void *value, const struct ip_set_ext *ext, { struct list_set *map = set->data; struct set_adt_elem *d = value; - struct set_elem *e, *next, *prev = NULL; + struct set_elem *e, *n, *next, *prev = NULL;
- list_for_each_entry(e, &map->members, list) { + list_for_each_entry_safe(e, n, &map->members, list) { if (SET_WITH_TIMEOUT(set) && ip_set_timeout_expired(ext_timeout(e, set))) continue; @@ -424,14 +428,8 @@ static void list_set_destroy(struct ip_set *set) { struct list_set *map = set->data; - struct set_elem *e, *n;
- list_for_each_entry_safe(e, n, &map->members, list) { - list_del(&e->list); - ip_set_put_byindex(map->net, e->id); - ip_set_ext_destroy(set, e); - kfree(e); - } + WARN_ON_ONCE(!list_empty(&map->members)); kfree(map);
set->data = NULL;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Uros Bizjak ubizjak@gmail.com
[ Upstream commit 41cd2e1ee96e56401a18dbce6f42f0bdaebcbf3b ]
The "P" asm operand modifier is a x86 target-specific modifier.
When used with a constant, the "P" modifier emits "cst" instead of "$cst". This property is currently used to emit the bare constant without all syntax-specific prefixes.
The generic "c" resp. "n" operand modifier should be used instead.
No functional changes intended.
Signed-off-by: Uros Bizjak ubizjak@gmail.com Signed-off-by: Ingo Molnar mingo@kernel.org Cc: Linus Torvalds torvalds@linux-foundation.org Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Ard Biesheuvel ardb@kernel.org Cc: "H. Peter Anvin" hpa@zytor.com Link: https://lore.kernel.org/r/20240319104418.284519-3-ubizjak@gmail.com Stable-dep-of: 8c860ed825cb ("x86/uaccess: Fix missed zeroing of ia32 u64 get_user() range checking") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/boot/main.c | 4 ++-- arch/x86/include/asm/alternative.h | 22 +++++++++++----------- arch/x86/include/asm/atomic64_32.h | 2 +- arch/x86/include/asm/cpufeature.h | 2 +- arch/x86/include/asm/irq_stack.h | 2 +- arch/x86/include/asm/uaccess.h | 4 ++-- 6 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/arch/x86/boot/main.c b/arch/x86/boot/main.c index c4ea5258ab558..9049f390d8347 100644 --- a/arch/x86/boot/main.c +++ b/arch/x86/boot/main.c @@ -119,8 +119,8 @@ static void init_heap(void) char *stack_end;
if (boot_params.hdr.loadflags & CAN_USE_HEAP) { - asm("leal %P1(%%esp),%0" - : "=r" (stack_end) : "i" (-STACK_SIZE)); + asm("leal %n1(%%esp),%0" + : "=r" (stack_end) : "i" (STACK_SIZE));
heap_end = (char *) ((size_t)boot_params.hdr.heap_end_ptr + 0x200); diff --git a/arch/x86/include/asm/alternative.h b/arch/x86/include/asm/alternative.h index 67b68d0d17d1e..0cb2396de066d 100644 --- a/arch/x86/include/asm/alternative.h +++ b/arch/x86/include/asm/alternative.h @@ -294,10 +294,10 @@ static inline int alternatives_text_reserved(void *start, void *end) * Otherwise, if CPU has feature1, newinstr1 is used. * Otherwise, oldinstr is used. */ -#define alternative_input_2(oldinstr, newinstr1, ft_flags1, newinstr2, \ - ft_flags2, input...) \ - asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, ft_flags1, \ - newinstr2, ft_flags2) \ +#define alternative_input_2(oldinstr, newinstr1, ft_flags1, newinstr2, \ + ft_flags2, input...) \ + asm_inline volatile(ALTERNATIVE_2(oldinstr, newinstr1, ft_flags1, \ + newinstr2, ft_flags2) \ : : "i" (0), ## input)
/* Like alternative_input, but with a single output argument */ @@ -307,7 +307,7 @@ static inline int alternatives_text_reserved(void *start, void *end)
/* Like alternative_io, but for replacing a direct call with another one. */ #define alternative_call(oldfunc, newfunc, ft_flags, output, input...) \ - asm_inline volatile (ALTERNATIVE("call %P[old]", "call %P[new]", ft_flags) \ + asm_inline volatile (ALTERNATIVE("call %c[old]", "call %c[new]", ft_flags) \ : output : [old] "i" (oldfunc), [new] "i" (newfunc), ## input)
/* @@ -316,12 +316,12 @@ static inline int alternatives_text_reserved(void *start, void *end) * Otherwise, if CPU has feature1, function1 is used. * Otherwise, old function is used. */ -#define alternative_call_2(oldfunc, newfunc1, ft_flags1, newfunc2, ft_flags2, \ - output, input...) \ - asm_inline volatile (ALTERNATIVE_2("call %P[old]", "call %P[new1]", ft_flags1,\ - "call %P[new2]", ft_flags2) \ - : output, ASM_CALL_CONSTRAINT \ - : [old] "i" (oldfunc), [new1] "i" (newfunc1), \ +#define alternative_call_2(oldfunc, newfunc1, ft_flags1, newfunc2, ft_flags2, \ + output, input...) \ + asm_inline volatile (ALTERNATIVE_2("call %c[old]", "call %c[new1]", ft_flags1, \ + "call %c[new2]", ft_flags2) \ + : output, ASM_CALL_CONSTRAINT \ + : [old] "i" (oldfunc), [new1] "i" (newfunc1), \ [new2] "i" (newfunc2), ## input)
/* diff --git a/arch/x86/include/asm/atomic64_32.h b/arch/x86/include/asm/atomic64_32.h index 3486d91b8595f..d510405e4e1de 100644 --- a/arch/x86/include/asm/atomic64_32.h +++ b/arch/x86/include/asm/atomic64_32.h @@ -24,7 +24,7 @@ typedef struct {
#ifdef CONFIG_X86_CMPXCHG64 #define __alternative_atomic64(f, g, out, in...) \ - asm volatile("call %P[func]" \ + asm volatile("call %c[func]" \ : out : [func] "i" (atomic64_##g##_cx8), ## in)
#define ATOMIC64_DECL(sym) ATOMIC64_DECL_ONE(sym##_cx8) diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h index 686e92d2663ee..3508f3fc928d4 100644 --- a/arch/x86/include/asm/cpufeature.h +++ b/arch/x86/include/asm/cpufeature.h @@ -173,7 +173,7 @@ extern void clear_cpu_cap(struct cpuinfo_x86 *c, unsigned int bit); static __always_inline bool _static_cpu_has(u16 bit) { asm goto( - ALTERNATIVE_TERNARY("jmp 6f", %P[feature], "", "jmp %l[t_no]") + ALTERNATIVE_TERNARY("jmp 6f", %c[feature], "", "jmp %l[t_no]") ".pushsection .altinstr_aux,"ax"\n" "6:\n" " testb %[bitnum]," _ASM_RIP(%P[cap_byte]) "\n" diff --git a/arch/x86/include/asm/irq_stack.h b/arch/x86/include/asm/irq_stack.h index 798183867d789..b71ad173f8776 100644 --- a/arch/x86/include/asm/irq_stack.h +++ b/arch/x86/include/asm/irq_stack.h @@ -100,7 +100,7 @@ }
#define ASM_CALL_ARG0 \ - "call %P[__func] \n" \ + "call %c[__func] \n" \ ASM_REACHABLE
#define ASM_CALL_ARG1 \ diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 237dc8cdd12b9..0f9bab92a43d7 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -78,7 +78,7 @@ extern int __get_user_bad(void); int __ret_gu; \ register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \ __chk_user_ptr(ptr); \ - asm volatile("call __" #fn "_%P4" \ + asm volatile("call __" #fn "_%c4" \ : "=a" (__ret_gu), "=r" (__val_gu), \ ASM_CALL_CONSTRAINT \ : "0" (ptr), "i" (sizeof(*(ptr)))); \ @@ -177,7 +177,7 @@ extern void __put_user_nocheck_8(void); __chk_user_ptr(__ptr); \ __ptr_pu = __ptr; \ __val_pu = __x; \ - asm volatile("call __" #fn "_%P[size]" \ + asm volatile("call __" #fn "_%c[size]" \ : "=c" (__ret_pu), \ ASM_CALL_CONSTRAINT \ : "0" (__ptr_pu), \
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kees Cook kees@kernel.org
[ Upstream commit 8c860ed825cb85f6672cd7b10a8f33e3498a7c81 ]
When reworking the range checking for get_user(), the get_user_8() case on 32-bit wasn't zeroing the high register. (The jump to bad_get_user_8 was accidentally dropped.) Restore the correct error handling destination (and rename the jump to using the expected ".L" prefix).
While here, switch to using a named argument ("size") for the call template ("%c4" to "%c[size]") as already used in the other call templates in this file.
Found after moving the usercopy selftests to KUnit:
# usercopy_test_invalid: EXPECTATION FAILED at lib/usercopy_kunit.c:278 Expected val_u64 == 0, but val_u64 == -60129542144 (0xfffffff200000000)
Closes: https://lore.kernel.org/all/CABVgOSn=tb=Lj9SxHuT4_9MTjjKVxsq-ikdXC4kGHO4CfKV... Fixes: b19b74bc99b1 ("x86/mm: Rework address range check in get_user() and put_user()") Reported-by: David Gow davidgow@google.com Signed-off-by: Kees Cook kees@kernel.org Signed-off-by: Dave Hansen dave.hansen@linux.intel.com Reviewed-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reviewed-by: Qiuxu Zhuo qiuxu.zhuo@intel.com Tested-by: David Gow davidgow@google.com Link: https://lore.kernel.org/all/20240610210213.work.143-kees%40kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/uaccess.h | 4 ++-- arch/x86/lib/getuser.S | 6 +++++- 2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h index 0f9bab92a43d7..3a7755c1a4410 100644 --- a/arch/x86/include/asm/uaccess.h +++ b/arch/x86/include/asm/uaccess.h @@ -78,10 +78,10 @@ extern int __get_user_bad(void); int __ret_gu; \ register __inttype(*(ptr)) __val_gu asm("%"_ASM_DX); \ __chk_user_ptr(ptr); \ - asm volatile("call __" #fn "_%c4" \ + asm volatile("call __" #fn "_%c[size]" \ : "=a" (__ret_gu), "=r" (__val_gu), \ ASM_CALL_CONSTRAINT \ - : "0" (ptr), "i" (sizeof(*(ptr)))); \ + : "0" (ptr), [size] "i" (sizeof(*(ptr)))); \ instrument_get_user(__val_gu); \ (x) = (__force __typeof__(*(ptr))) __val_gu; \ __builtin_expect(__ret_gu, 0); \ diff --git a/arch/x86/lib/getuser.S b/arch/x86/lib/getuser.S index 10d5ed8b5990f..a1cb3a4e6742d 100644 --- a/arch/x86/lib/getuser.S +++ b/arch/x86/lib/getuser.S @@ -44,7 +44,11 @@ or %rdx, %rax .else cmp $TASK_SIZE_MAX-\size+1, %eax +.if \size != 8 jae .Lbad_get_user +.else + jae .Lbad_get_user_8 +.endif sbb %edx, %edx /* array_index_mask_nospec() */ and %edx, %eax .endif @@ -154,7 +158,7 @@ SYM_CODE_END(__get_user_handle_exception) #ifdef CONFIG_X86_32 SYM_CODE_START_LOCAL(__get_user_8_handle_exception) ASM_CLAC -bad_get_user_8: +.Lbad_get_user_8: xor %edx,%edx xor %ecx,%ecx mov $(-EFAULT),%_ASM_AX
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ziqi Chen quic_ziqichen@quicinc.com
[ Upstream commit 77691af484e28af7a692e511b9ed5ca63012ec6e ]
In ufshcd_clock_scaling_prepare(), after SCSI layer is blocked, ufshcd_pending_cmds() is called to check whether there are pending transactions or not. And only if there are no pending transactions can we proceed to kickstart the clock scaling sequence.
ufshcd_pending_cmds() traverses over all SCSI devices and calls sbitmap_weight() on their budget_map. sbitmap_weight() can be broken down to three steps:
1. Calculate the nr outstanding bits set in the 'word' bitmap.
2. Calculate the nr outstanding bits set in the 'cleared' bitmap.
3. Subtract the result from step 1 by the result from step 2.
This can lead to a race condition as outlined below:
Assume there is one pending transaction in the request queue of one SCSI device, say sda, and the budget token of this request is 0, the 'word' is 0x1 and the 'cleared' is 0x0.
1. When step 1 executes, it gets the result as 1.
2. Before step 2 executes, block layer tries to dispatch a new request to sda. Since the SCSI layer is blocked, the request cannot pass through SCSI but the block layer would do budget_get() and budget_put() to sda's budget map regardless, so the 'word' has become 0x3 and 'cleared' has become 0x2 (assume the new request got budget token 1).
3. When step 2 executes, it gets the result as 1.
4. When step 3 executes, it gets the result as 0, meaning there is no pending transactions, which is wrong.
Thread A Thread B ufshcd_pending_cmds() __blk_mq_sched_dispatch_requests() | | sbitmap_weight(word) | | scsi_mq_get_budget() | | | scsi_mq_put_budget() | | sbitmap_weight(cleared) ...
When this race condition happens, the clock scaling sequence is started with transactions still in flight, leading to subsequent hibernate enter failure, broken link, task abort and back to back error recovery.
Fix this race condition by quiescing the request queues before calling ufshcd_pending_cmds() so that block layer won't touch the budget map when ufshcd_pending_cmds() is working on it. In addition, remove the SCSI layer blocking/unblocking to reduce redundancies and latencies.
Fixes: 8d077ede48c1 ("scsi: ufs: Optimize the command queueing code") Co-developed-by: Can Guo quic_cang@quicinc.com Signed-off-by: Can Guo quic_cang@quicinc.com Signed-off-by: Ziqi Chen quic_ziqichen@quicinc.com Link: https://lore.kernel.org/r/1717754818-39863-1-git-send-email-quic_ziqichen@qu... Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ufs/core/ufshcd.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 1322a9c318cff..ce1abd5d725ad 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -1392,7 +1392,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us) * make sure that there are no outstanding requests when * clock scaling is in progress */ - ufshcd_scsi_block_requests(hba); + blk_mq_quiesce_tagset(&hba->host->tag_set); mutex_lock(&hba->wb_mutex); down_write(&hba->clk_scaling_lock);
@@ -1401,7 +1401,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us) ret = -EBUSY; up_write(&hba->clk_scaling_lock); mutex_unlock(&hba->wb_mutex); - ufshcd_scsi_unblock_requests(hba); + blk_mq_unquiesce_tagset(&hba->host->tag_set); goto out; }
@@ -1422,7 +1422,7 @@ static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, int err, bool sc
mutex_unlock(&hba->wb_mutex);
- ufshcd_scsi_unblock_requests(hba); + blk_mq_unquiesce_tagset(&hba->host->tag_set); ufshcd_release(hba); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kory Maincent kory.maincent@bootlin.com
[ Upstream commit 144ba8580bcb82b2686c3d1a043299d844b9a682 ]
ENOTSUPP is not a SUSV4 error code, prefer EOPNOTSUPP as reported by checkpatch script.
Fixes: 18ff0bcda6d1 ("ethtool: add interface to interact with Ethernet Power Equipment") Reviewed-by: Andrew Lunn andrew@lunn.ch Acked-by: Oleksij Rempel o.rempel@pengutronix.de Signed-off-by: Kory Maincent kory.maincent@bootlin.com Link: https://lore.kernel.org/r/20240610083426.740660-1-kory.maincent@bootlin.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/pse-pd/pse.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/pse-pd/pse.h b/include/linux/pse-pd/pse.h index fb724c65c77bc..5ce0cd76956e0 100644 --- a/include/linux/pse-pd/pse.h +++ b/include/linux/pse-pd/pse.h @@ -114,14 +114,14 @@ static inline int pse_ethtool_get_status(struct pse_control *psec, struct netlink_ext_ack *extack, struct pse_control_status *status) { - return -ENOTSUPP; + return -EOPNOTSUPP; }
static inline int pse_ethtool_set_config(struct pse_control *psec, struct netlink_ext_ack *extack, const struct pse_control_config *config) { - return -ENOTSUPP; + return -EOPNOTSUPP; }
#endif
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joshua Washington joshwash@google.com
[ Upstream commit 1b9f756344416e02b41439bf2324b26aa25e141c ]
TSO currently fails when the skb's gso_type field has more than one bit set.
TSO packets can be passed from userspace using PF_PACKET, TUNTAP and a few others, using virtio_net_hdr (e.g., PACKET_VNET_HDR). This includes virtualization, such as QEMU, a real use-case.
The gso_type and gso_size fields as passed from userspace in virtio_net_hdr are not trusted blindly by the kernel. It adds gso_type |= SKB_GSO_DODGY to force the packet to enter the software GSO stack for verification.
This issue might similarly come up when the CWR bit is set in the TCP header for congestion control, causing the SKB_GSO_TCP_ECN gso_type bit to be set.
Fixes: a57e5de476be ("gve: DQO: Add TX path") Signed-off-by: Joshua Washington joshwash@google.com Reviewed-by: Praveen Kaligineedi pkaligineedi@google.com Reviewed-by: Harshitha Ramamurthy hramamurthy@google.com Reviewed-by: Willem de Bruijn willemb@google.com Suggested-by: Eric Dumazet edumazet@google.com Acked-by: Andrei Vagin avagin@gmail.com
v2 - Remove unnecessary comments, remove line break between fixes tag and signoffs.
v3 - Add back unrelated empty line removal.
Link: https://lore.kernel.org/r/20240610225729.2985343-1-joshwash@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/google/gve/gve_tx_dqo.c | 20 +++++--------------- 1 file changed, 5 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/google/gve/gve_tx_dqo.c b/drivers/net/ethernet/google/gve/gve_tx_dqo.c index bc34b6cd3a3e5..917a79a47e19c 100644 --- a/drivers/net/ethernet/google/gve/gve_tx_dqo.c +++ b/drivers/net/ethernet/google/gve/gve_tx_dqo.c @@ -555,28 +555,18 @@ static int gve_prep_tso(struct sk_buff *skb) if (unlikely(skb_shinfo(skb)->gso_size < GVE_TX_MIN_TSO_MSS_DQO)) return -1;
+ if (!(skb_shinfo(skb)->gso_type & (SKB_GSO_TCPV4 | SKB_GSO_TCPV6))) + return -EINVAL; + /* Needed because we will modify header. */ err = skb_cow_head(skb, 0); if (err < 0) return err;
tcp = tcp_hdr(skb); - - /* Remove payload length from checksum. */ paylen = skb->len - skb_transport_offset(skb); - - switch (skb_shinfo(skb)->gso_type) { - case SKB_GSO_TCPV4: - case SKB_GSO_TCPV6: - csum_replace_by_diff(&tcp->check, - (__force __wsum)htonl(paylen)); - - /* Compute length of segmentation header. */ - header_len = skb_tcp_all_headers(skb); - break; - default: - return -EINVAL; - } + csum_replace_by_diff(&tcp->check, (__force __wsum)htonl(paylen)); + header_len = skb_tcp_all_headers(skb);
if (unlikely(header_len > GVE_TX_MAX_HDR_SIZE_DQO)) return -EINVAL;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiaolei Wang xiaolei.wang@windriver.com
[ Upstream commit be27b896529787e23a35ae4befb6337ce73fcca0 ]
The current cbs parameter depends on speed after uplinking, which is not needed and will report a configuration error if the port is not initially connected. The UAPI exposed by tc-cbs requires userspace to recalculate the send slope anyway, because the formula depends on port_transmit_rate (see man tc-cbs), which is not an invariant from tc's perspective. Therefore, we use offload->sendslope and offload->idleslope to derive the original port_transmit_rate from the CBS formula.
Fixes: 1f705bc61aee ("net: stmmac: Add support for CBS QDISC") Signed-off-by: Xiaolei Wang xiaolei.wang@windriver.com Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Reviewed-by: Vladimir Oltean olteanv@gmail.com Link: https://lore.kernel.org/r/20240608143524.2065736-1-xiaolei.wang@windriver.co... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/stmicro/stmmac/stmmac_tc.c | 25 ++++++++----------- 1 file changed, 11 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c index 620c16e9be3a6..b1896379dbab5 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c @@ -343,10 +343,11 @@ static int tc_setup_cbs(struct stmmac_priv *priv, struct tc_cbs_qopt_offload *qopt) { u32 tx_queues_count = priv->plat->tx_queues_to_use; + s64 port_transmit_rate_kbps; u32 queue = qopt->queue; - u32 ptr, speed_div; u32 mode_to_use; u64 value; + u32 ptr; int ret;
/* Queue 0 is not AVB capable */ @@ -355,30 +356,26 @@ static int tc_setup_cbs(struct stmmac_priv *priv, if (!priv->dma_cap.av) return -EOPNOTSUPP;
+ port_transmit_rate_kbps = qopt->idleslope - qopt->sendslope; + /* Port Transmit Rate and Speed Divider */ - switch (priv->speed) { + switch (div_s64(port_transmit_rate_kbps, 1000)) { case SPEED_10000: - ptr = 32; - speed_div = 10000000; - break; case SPEED_5000: ptr = 32; - speed_div = 5000000; break; case SPEED_2500: - ptr = 8; - speed_div = 2500000; - break; case SPEED_1000: ptr = 8; - speed_div = 1000000; break; case SPEED_100: ptr = 4; - speed_div = 100000; break; default: - return -EOPNOTSUPP; + netdev_err(priv->dev, + "Invalid portTransmitRate %lld (idleSlope - sendSlope)\n", + port_transmit_rate_kbps); + return -EINVAL; }
mode_to_use = priv->plat->tx_queues_cfg[queue].mode_to_use; @@ -398,10 +395,10 @@ static int tc_setup_cbs(struct stmmac_priv *priv, }
/* Final adjustments for HW */ - value = div_s64(qopt->idleslope * 1024ll * ptr, speed_div); + value = div_s64(qopt->idleslope * 1024ll * ptr, port_transmit_rate_kbps); priv->plat->tx_queues_cfg[queue].idle_slope = value & GENMASK(31, 0);
- value = div_s64(-qopt->sendslope * 1024ll * ptr, speed_div); + value = div_s64(-qopt->sendslope * 1024ll * ptr, port_transmit_rate_kbps); priv->plat->tx_queues_cfg[queue].send_slope = value & GENMASK(31, 0);
value = qopt->hicredit * 1024ll * 8;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Hui suhui@nfschina.com
[ Upstream commit 9b1ebce6a1fded90d4a1c6c57dc6262dac4c4c14 ]
Clang static checker (scan-build) warning: block/sed-opal.c:line 317, column 3 Value stored to 'ret' is never read.
Fix this problem by returning the error code when keyring_search() failed. Otherwise, 'key' will have a wrong value when 'kerf' stores the error code.
Fixes: 3bfeb6125664 ("block: sed-opal: keyring support for SED keys") Signed-off-by: Su Hui suhui@nfschina.com Link: https://lore.kernel.org/r/20240611073659.429582-1-suhui@nfschina.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/sed-opal.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/sed-opal.c b/block/sed-opal.c index 14fe0fef811cf..598fd3e7fcc8e 100644 --- a/block/sed-opal.c +++ b/block/sed-opal.c @@ -314,7 +314,7 @@ static int read_sed_opal_key(const char *key_name, u_char *buffer, int buflen) &key_type_user, key_name, true);
if (IS_ERR(kref)) - ret = PTR_ERR(kref); + return PTR_ERR(kref);
key = key_ref_to_ptr(kref); down_read(&key->sem);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chengming Zhou chengming.zhou@linux.dev
[ Upstream commit d0321c812d89c5910d8da8e4b10c891c6b96ff70 ]
Friedrich Weber reported a kernel crash problem and bisected to commit 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine").
The root cause is that we use "list_move_tail(&rq->queuelist, pending)" in the PREFLUSH/POSTFLUSH sequences. But rq->queuelist.next == xxx since it's popped out from plug->cached_rq in __blk_mq_alloc_requests_batch(). We don't initialize its queuelist just for this first request, although the queuelist of all later popped requests will be initialized.
Fix it by changing to use "list_add_tail(&rq->queuelist, pending)" so rq->queuelist doesn't need to be initialized. It should be ok since rq can't be on any list when PREFLUSH or POSTFLUSH, has no move actually.
Please note the commit 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine") also has another requirement that no drivers would touch rq->queuelist after blk_mq_end_request() since we will reuse it to add rq to the post-flush pending list in POSTFLUSH. If this is not true, we will have to revert that commit IMHO.
This updated version adds "list_del_init(&rq->queuelist)" in flush rq callback since the dm layer may submit request of a weird invalid format (REQ_FSEQ_PREFLUSH | REQ_FSEQ_POSTFLUSH), which causes double list_add if without this "list_del_init(&rq->queuelist)". The weird invalid format problem should be fixed in dm layer.
Reported-by: Friedrich Weber f.weber@proxmox.com Closes: https://lore.kernel.org/lkml/14b89dfb-505c-49f7-aebb-01c54451db40@proxmox.co... Closes: https://lore.kernel.org/lkml/c9d03ff7-27c5-4ebd-b3f6-5a90d96f35ba@proxmox.co... Fixes: 81ada09cc25e ("blk-flush: reuse rq queuelist in flush state machine") Cc: Christoph Hellwig hch@lst.de Cc: ming.lei@redhat.com Cc: bvanassche@acm.org Tested-by: Friedrich Weber f.weber@proxmox.com Signed-off-by: Chengming Zhou chengming.zhou@linux.dev Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/20240608143115.972486-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-flush.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/block/blk-flush.c b/block/blk-flush.c index b0f314f4bc149..9944414bf7eed 100644 --- a/block/blk-flush.c +++ b/block/blk-flush.c @@ -183,7 +183,7 @@ static void blk_flush_complete_seq(struct request *rq, /* queue for flush */ if (list_empty(pending)) fq->flush_pending_since = jiffies; - list_move_tail(&rq->queuelist, pending); + list_add_tail(&rq->queuelist, pending); break;
case REQ_FSEQ_DATA: @@ -261,6 +261,7 @@ static enum rq_end_io_ret flush_end_io(struct request *flush_rq, unsigned int seq = blk_flush_cur_seq(rq);
BUG_ON(seq != REQ_FSEQ_PREFLUSH && seq != REQ_FSEQ_POSTFLUSH); + list_del_init(&rq->queuelist); blk_flush_complete_seq(rq, fq, seq, error); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Wagner dwagner@suse.de
[ Upstream commit d76584e53f4244dbc154bec447c3852600acc914 ]
The id override functions return a status which is not propagated to the caller.
Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Daniel Wagner dwagner@suse.de Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/passthru.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index bb4a69d538fd1..f003782d4ecff 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -226,13 +226,13 @@ static void nvmet_passthru_execute_cmd_work(struct work_struct *w) req->cmd->common.opcode == nvme_admin_identify) { switch (req->cmd->identify.cns) { case NVME_ID_CNS_CTRL: - nvmet_passthru_override_id_ctrl(req); + status = nvmet_passthru_override_id_ctrl(req); break; case NVME_ID_CNS_NS: - nvmet_passthru_override_id_ns(req); + status = nvmet_passthru_override_id_ns(req); break; case NVME_ID_CNS_NS_DESC_LIST: - nvmet_passthru_override_id_descs(req); + status = nvmet_passthru_override_id_descs(req); break; } } else if (status < 0)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Pavlu petr.pavlu@suse.com
[ Upstream commit 14a20e5b4ad998793c5f43b0330d9e1388446cf3 ]
The net.ipv6.route.flush system parameter takes a value which specifies a delay used during the flush operation for aging exception routes. The written value is however not used in the currently requested flush and instead utilized only in the next one.
A problem is that ipv6_sysctl_rtcache_flush() first reads the old value of net->ipv6.sysctl.flush_delay into a local delay variable and then calls proc_dointvec() which actually updates the sysctl based on the provided input.
Fix the problem by switching the order of the two operations.
Fixes: 4990509f19e8 ("[NETNS][IPV6]: Make sysctls route per namespace.") Signed-off-by: Petr Pavlu petr.pavlu@suse.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://lore.kernel.org/r/20240607112828.30285-1-petr.pavlu@suse.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/route.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/ipv6/route.c b/net/ipv6/route.c index bca6f33c7bb9e..8f8c8fcfd1c21 100644 --- a/net/ipv6/route.c +++ b/net/ipv6/route.c @@ -6343,12 +6343,12 @@ static int ipv6_sysctl_rtcache_flush(struct ctl_table *ctl, int write, if (!write) return -EINVAL;
- net = (struct net *)ctl->extra1; - delay = net->ipv6.sysctl.flush_delay; ret = proc_dointvec(ctl, write, buffer, lenp, ppos); if (ret) return ret;
+ net = (struct net *)ctl->extra1; + delay = net->ipv6.sysctl.flush_delay; fib6_run_gc(delay <= 0 ? 0 : (unsigned long)delay, net, delay > 0); return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit 36c92936e868601fa1f43da6758cf55805043509 ]
Pass the already obtained vlan group pointer to br_mst_vlan_set_state() instead of dereferencing it again. Each caller has already correctly dereferenced it for their context. This change is required for the following suspicious RCU dereference fix. No functional changes intended.
Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240609103654.914987-2-razor@blackwall.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_mst.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c index 3c66141d34d62..1de72816b0fb2 100644 --- a/net/bridge/br_mst.c +++ b/net/bridge/br_mst.c @@ -73,11 +73,10 @@ int br_mst_get_state(const struct net_device *dev, u16 msti, u8 *state) } EXPORT_SYMBOL_GPL(br_mst_get_state);
-static void br_mst_vlan_set_state(struct net_bridge_port *p, struct net_bridge_vlan *v, +static void br_mst_vlan_set_state(struct net_bridge_vlan_group *vg, + struct net_bridge_vlan *v, u8 state) { - struct net_bridge_vlan_group *vg = nbp_vlan_group(p); - if (br_vlan_get_state(v) == state) return;
@@ -121,7 +120,7 @@ int br_mst_set_state(struct net_bridge_port *p, u16 msti, u8 state, if (v->brvlan->msti != msti) continue;
- br_mst_vlan_set_state(p, v, state); + br_mst_vlan_set_state(vg, v, state); }
out: @@ -140,13 +139,13 @@ static void br_mst_vlan_sync_state(struct net_bridge_vlan *pv, u16 msti) * it. */ if (v != pv && v->brvlan->msti == msti) { - br_mst_vlan_set_state(pv->port, pv, v->state); + br_mst_vlan_set_state(vg, pv, v->state); return; } }
/* Otherwise, start out in a new MSTI with all ports disabled. */ - return br_mst_vlan_set_state(pv->port, pv, BR_STATE_DISABLED); + return br_mst_vlan_set_state(vg, pv, BR_STATE_DISABLED); }
int br_mst_vlan_set_msti(struct net_bridge_vlan *mv, u16 msti)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit 546ceb1dfdac866648ec959cbc71d9525bd73462 ]
I converted br_mst_set_state to RCU to avoid a vlan use-after-free but forgot to change the vlan group dereference helper. Switch to vlan group RCU deref helper to fix the suspicious rcu usage warning.
Fixes: 3a7c1661ae13 ("net: bridge: mst: fix vlan use-after-free") Reported-by: syzbot+9bbe2de1bc9d470eb5fe@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9bbe2de1bc9d470eb5fe Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Link: https://lore.kernel.org/r/20240609103654.914987-3-razor@blackwall.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_mst.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/bridge/br_mst.c b/net/bridge/br_mst.c index 1de72816b0fb2..1820f09ff59ce 100644 --- a/net/bridge/br_mst.c +++ b/net/bridge/br_mst.c @@ -102,7 +102,7 @@ int br_mst_set_state(struct net_bridge_port *p, u16 msti, u8 state, int err = 0;
rcu_read_lock(); - vg = nbp_vlan_group(p); + vg = nbp_vlan_group_rcu(p); if (!vg) goto out;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Riana Tauro riana.tauro@intel.com
[ Upstream commit 7c877115da4196fa108dcfefd49f5a9b67b8d8ca ]
The rc6 registers used in disable_c6 function belong to the GT forcewake domain. Hence change the forcewake assertion to check GT forcewake domain.
v2: add fixes tag (Himal)
Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set") Signed-off-by: Riana Tauro riana.tauro@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Himal Prasad Ghimiray himal.prasad.ghimiray@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240606100842.956072-2-riana.... Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com (cherry picked from commit 21b708554648177a0078962c31629bce31ef5d83) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_gt_idle.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c index 9fcae65b64699..1521b3d9933c7 100644 --- a/drivers/gpu/drm/xe/xe_gt_idle.c +++ b/drivers/gpu/drm/xe/xe_gt_idle.c @@ -184,7 +184,7 @@ void xe_gt_idle_enable_c6(struct xe_gt *gt) void xe_gt_idle_disable_c6(struct xe_gt *gt) { xe_device_assert_mem_access(gt_to_xe(gt)); - xe_force_wake_assert_held(gt_to_fw(gt), XE_FORCEWAKE_ALL); + xe_force_wake_assert_held(gt_to_fw(gt), XE_FW_GT);
xe_mmio_write32(gt, PG_ENABLE, 0); xe_mmio_write32(gt, RC_CONTROL, 0);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrzej Hajda andrzej.hajda@intel.com
[ Upstream commit b5e3a9b83f352a737b77a01734a6661d1130ed49 ]
Tests show that user fence signalling requires kind of write barrier, otherwise not all writes performed by the workload will be available to userspace. It is already done for render and compute, we need it also for the rest: video, gsc, copy.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda andrzej.hajda@intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted... (cherry picked from commit 3ad7d18c5dad75ed38098c7cc3bc9594b4701399) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_ring_ops.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c index 5b2b37b598130..820c8d73c464f 100644 --- a/drivers/gpu/drm/xe/xe_ring_ops.c +++ b/drivers/gpu/drm/xe/xe_ring_ops.c @@ -79,6 +79,16 @@ static int emit_store_imm_ggtt(u32 addr, u32 value, u32 *dw, int i) return i; }
+static int emit_flush_dw(u32 *dw, int i) +{ + dw[i++] = MI_FLUSH_DW | MI_FLUSH_IMM_DW; + dw[i++] = 0; + dw[i++] = 0; + dw[i++] = 0; + + return i; +} + static int emit_flush_imm_ggtt(u32 addr, u32 value, bool invalidate_tlb, u32 *dw, int i) { @@ -233,10 +243,12 @@ static void __emit_job_gen12_simple(struct xe_sched_job *job, struct xe_lrc *lrc
i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
- if (job->user_fence.used) + if (job->user_fence.used) { + i = emit_flush_dw(dw, i); i = emit_store_imm_ppgtt_posted(job->user_fence.addr, job->user_fence.value, dw, i); + }
i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
@@ -292,10 +304,12 @@ static void __emit_job_gen12_video(struct xe_sched_job *job, struct xe_lrc *lrc,
i = emit_bb_start(batch_addr, ppgtt_flag, dw, i);
- if (job->user_fence.used) + if (job->user_fence.used) { + i = emit_flush_dw(dw, i); i = emit_store_imm_ppgtt_posted(job->user_fence.addr, job->user_fence.value, dw, i); + }
i = emit_flush_imm_ggtt(xe_lrc_seqno_ggtt_addr(lrc), seqno, false, dw, i);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rodrigo Vivi rodrigo.vivi@intel.com
[ Upstream commit 1e941c9881ec20f6d0173bcd344a605bb89cb121 ]
We are now protected by init, sysfs, or removal and don't need these mem_access protections around GuC_PC anymore.
Reviewed-by: Matthew Auld matthew.auld@intel.com Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240222163937.138342-6-rodrig... Stable-dep-of: 2470b141bfae ("drm/xe: move disable_c6 call") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_guc_pc.c | 64 ++++++---------------------------- 1 file changed, 10 insertions(+), 54 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index 2839d685631bc..f4b031b8d9deb 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -381,8 +381,6 @@ u32 xe_guc_pc_get_act_freq(struct xe_guc_pc *pc) struct xe_device *xe = gt_to_xe(gt); u32 freq;
- xe_device_mem_access_get(gt_to_xe(gt)); - /* When in RC6, actual frequency reported will be 0. */ if (GRAPHICS_VERx100(xe) >= 1270) { freq = xe_mmio_read32(gt, MTL_MIRROR_TARGET_WP1); @@ -394,8 +392,6 @@ u32 xe_guc_pc_get_act_freq(struct xe_guc_pc *pc)
freq = decode_freq(freq);
- xe_device_mem_access_put(gt_to_xe(gt)); - return freq; }
@@ -412,14 +408,13 @@ int xe_guc_pc_get_cur_freq(struct xe_guc_pc *pc, u32 *freq) struct xe_gt *gt = pc_to_gt(pc); int ret;
- xe_device_mem_access_get(gt_to_xe(gt)); /* * GuC SLPC plays with cur freq request when GuCRC is enabled * Block RC6 for a more reliable read. */ ret = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); if (ret) - goto out; + return ret;
*freq = xe_mmio_read32(gt, RPNSWREQ);
@@ -427,9 +422,7 @@ int xe_guc_pc_get_cur_freq(struct xe_guc_pc *pc, u32 *freq) *freq = decode_freq(*freq);
XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL)); -out: - xe_device_mem_access_put(gt_to_xe(gt)); - return ret; + return 0; }
/** @@ -451,12 +444,7 @@ u32 xe_guc_pc_get_rp0_freq(struct xe_guc_pc *pc) */ u32 xe_guc_pc_get_rpe_freq(struct xe_guc_pc *pc) { - struct xe_gt *gt = pc_to_gt(pc); - struct xe_device *xe = gt_to_xe(gt); - - xe_device_mem_access_get(xe); pc_update_rp_values(pc); - xe_device_mem_access_put(xe);
return pc->rpe_freq; } @@ -485,7 +473,6 @@ int xe_guc_pc_get_min_freq(struct xe_guc_pc *pc, u32 *freq) struct xe_gt *gt = pc_to_gt(pc); int ret;
- xe_device_mem_access_get(pc_to_xe(pc)); mutex_lock(&pc->freq_lock); if (!pc->freq_ready) { /* Might be in the middle of a gt reset */ @@ -511,7 +498,6 @@ int xe_guc_pc_get_min_freq(struct xe_guc_pc *pc, u32 *freq) XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL)); out: mutex_unlock(&pc->freq_lock); - xe_device_mem_access_put(pc_to_xe(pc)); return ret; }
@@ -528,7 +514,6 @@ int xe_guc_pc_set_min_freq(struct xe_guc_pc *pc, u32 freq) { int ret;
- xe_device_mem_access_get(pc_to_xe(pc)); mutex_lock(&pc->freq_lock); if (!pc->freq_ready) { /* Might be in the middle of a gt reset */ @@ -544,8 +529,6 @@ int xe_guc_pc_set_min_freq(struct xe_guc_pc *pc, u32 freq)
out: mutex_unlock(&pc->freq_lock); - xe_device_mem_access_put(pc_to_xe(pc)); - return ret; }
@@ -561,7 +544,6 @@ int xe_guc_pc_get_max_freq(struct xe_guc_pc *pc, u32 *freq) { int ret;
- xe_device_mem_access_get(pc_to_xe(pc)); mutex_lock(&pc->freq_lock); if (!pc->freq_ready) { /* Might be in the middle of a gt reset */ @@ -577,7 +559,6 @@ int xe_guc_pc_get_max_freq(struct xe_guc_pc *pc, u32 *freq)
out: mutex_unlock(&pc->freq_lock); - xe_device_mem_access_put(pc_to_xe(pc)); return ret; }
@@ -594,7 +575,6 @@ int xe_guc_pc_set_max_freq(struct xe_guc_pc *pc, u32 freq) { int ret;
- xe_device_mem_access_get(pc_to_xe(pc)); mutex_lock(&pc->freq_lock); if (!pc->freq_ready) { /* Might be in the middle of a gt reset */ @@ -610,7 +590,6 @@ int xe_guc_pc_set_max_freq(struct xe_guc_pc *pc, u32 freq)
out: mutex_unlock(&pc->freq_lock); - xe_device_mem_access_put(pc_to_xe(pc)); return ret; }
@@ -623,8 +602,6 @@ enum xe_gt_idle_state xe_guc_pc_c_status(struct xe_guc_pc *pc) struct xe_gt *gt = pc_to_gt(pc); u32 reg, gt_c_state;
- xe_device_mem_access_get(gt_to_xe(gt)); - if (GRAPHICS_VERx100(gt_to_xe(gt)) >= 1270) { reg = xe_mmio_read32(gt, MTL_MIRROR_TARGET_WP1); gt_c_state = REG_FIELD_GET(MTL_CC_MASK, reg); @@ -633,8 +610,6 @@ enum xe_gt_idle_state xe_guc_pc_c_status(struct xe_guc_pc *pc) gt_c_state = REG_FIELD_GET(RCN_MASK, reg); }
- xe_device_mem_access_put(gt_to_xe(gt)); - switch (gt_c_state) { case GT_C6: return GT_IDLE_C6; @@ -654,9 +629,7 @@ u64 xe_guc_pc_rc6_residency(struct xe_guc_pc *pc) struct xe_gt *gt = pc_to_gt(pc); u32 reg;
- xe_device_mem_access_get(gt_to_xe(gt)); reg = xe_mmio_read32(gt, GT_GFX_RC6); - xe_device_mem_access_put(gt_to_xe(gt));
return reg; } @@ -670,9 +643,7 @@ u64 xe_guc_pc_mc6_residency(struct xe_guc_pc *pc) struct xe_gt *gt = pc_to_gt(pc); u64 reg;
- xe_device_mem_access_get(gt_to_xe(gt)); reg = xe_mmio_read32(gt, MTL_MEDIA_MC6); - xe_device_mem_access_put(gt_to_xe(gt));
return reg; } @@ -801,23 +772,19 @@ int xe_guc_pc_gucrc_disable(struct xe_guc_pc *pc) if (xe->info.skip_guc_pc) return 0;
- xe_device_mem_access_get(pc_to_xe(pc)); - ret = pc_action_setup_gucrc(pc, XE_GUCRC_HOST_CONTROL); if (ret) - goto out; + return ret;
ret = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); if (ret) - goto out; + return ret;
xe_gt_idle_disable_c6(gt);
XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL));
-out: - xe_device_mem_access_put(pc_to_xe(pc)); - return ret; + return 0; }
static void pc_init_pcode_freq(struct xe_guc_pc *pc) @@ -870,11 +837,9 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
xe_gt_assert(gt, xe_device_uc_enabled(xe));
- xe_device_mem_access_get(pc_to_xe(pc)); - ret = xe_force_wake_get(gt_to_fw(gt), XE_FORCEWAKE_ALL); if (ret) - goto out_fail_force_wake; + return ret;
if (xe->info.skip_guc_pc) { if (xe->info.platform != XE_PVC) @@ -914,8 +879,6 @@ int xe_guc_pc_start(struct xe_guc_pc *pc)
out: XE_WARN_ON(xe_force_wake_put(gt_to_fw(gt), XE_FORCEWAKE_ALL)); -out_fail_force_wake: - xe_device_mem_access_put(pc_to_xe(pc)); return ret; }
@@ -928,12 +891,9 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc) struct xe_device *xe = pc_to_xe(pc); int ret;
- xe_device_mem_access_get(pc_to_xe(pc)); - if (xe->info.skip_guc_pc) { xe_gt_idle_disable_c6(pc_to_gt(pc)); - ret = 0; - goto out; + return 0; }
mutex_lock(&pc->freq_lock); @@ -942,16 +902,14 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc)
ret = pc_action_shutdown(pc); if (ret) - goto out; + return ret;
if (wait_for_pc_state(pc, SLPC_GLOBAL_STATE_NOT_RUNNING)) { drm_err(&pc_to_xe(pc)->drm, "GuC PC Shutdown failed\n"); - ret = -EIO; + return -EIO; }
-out: - xe_device_mem_access_put(pc_to_xe(pc)); - return ret; + return 0; }
/** @@ -965,9 +923,7 @@ static void xe_guc_pc_fini(struct drm_device *drm, void *arg) struct xe_device *xe = pc_to_xe(pc);
if (xe->info.skip_guc_pc) { - xe_device_mem_access_get(xe); xe_gt_idle_disable_c6(pc_to_gt(pc)); - xe_device_mem_access_put(xe); return; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Riana Tauro riana.tauro@intel.com
[ Upstream commit 2470b141bfae2b9695b5b6823e3b978b22d33dde ]
disable c6 called in guc_pc_fini_hw is unreachable.
GuC PC init returns earlier if skip_guc_pc is true and never registers the finish call thus making disable_c6 unreachable.
move this call to gt idle.
v2: rebase v3: add fixes tag (Himal)
Fixes: 975e4a3795d4 ("drm/xe: Manually setup C6 when skip_guc_pc is set") Signed-off-by: Riana Tauro riana.tauro@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240606100842.956072-3-riana.... Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com (cherry picked from commit 6800e63cf97bae62bca56d8e691544540d945f53) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_gt_idle.c | 7 +++++++ drivers/gpu/drm/xe/xe_guc_pc.c | 6 ------ 2 files changed, 7 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_idle.c b/drivers/gpu/drm/xe/xe_gt_idle.c index 1521b3d9933c7..e7a39ad7adba0 100644 --- a/drivers/gpu/drm/xe/xe_gt_idle.c +++ b/drivers/gpu/drm/xe/xe_gt_idle.c @@ -126,6 +126,13 @@ static const struct attribute *gt_idle_attrs[] = { static void gt_idle_sysfs_fini(struct drm_device *drm, void *arg) { struct kobject *kobj = arg; + struct xe_gt *gt = kobj_to_gt(kobj->parent); + + if (gt_to_xe(gt)->info.skip_guc_pc) { + XE_WARN_ON(xe_force_wake_get(gt_to_fw(gt), XE_FW_GT)); + xe_gt_idle_disable_c6(gt); + xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); + }
sysfs_remove_files(kobj, gt_idle_attrs); kobject_put(kobj); diff --git a/drivers/gpu/drm/xe/xe_guc_pc.c b/drivers/gpu/drm/xe/xe_guc_pc.c index f4b031b8d9deb..dd9d65f923da7 100644 --- a/drivers/gpu/drm/xe/xe_guc_pc.c +++ b/drivers/gpu/drm/xe/xe_guc_pc.c @@ -920,12 +920,6 @@ int xe_guc_pc_stop(struct xe_guc_pc *pc) static void xe_guc_pc_fini(struct drm_device *drm, void *arg) { struct xe_guc_pc *pc = arg; - struct xe_device *xe = pc_to_xe(pc); - - if (xe->info.skip_guc_pc) { - xe_gt_idle_disable_c6(pc_to_gt(pc)); - return; - }
xe_force_wake_get(gt_to_fw(pc_to_gt(pc)), XE_FORCEWAKE_ALL); XE_WARN_ON(xe_guc_pc_gucrc_disable(pc));
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Taehee Yoo ap420073@gmail.com
[ Upstream commit 79f18a41dd056115d685f3b0a419c7cd40055e13 ]
When queues are started, netif_napi_add() and napi_enable() are called. If there are 4 queues and only 3 queues are used for the current configuration, only 3 queues' napi should be registered and enabled. The ionic_qcq_enable() checks whether the .poll pointer is not NULL for enabling only the using queue' napi. Unused queues' napi will not be registered by netif_napi_add(), so the .poll pointer indicates NULL. But it couldn't distinguish whether the napi was unregistered or not because netif_napi_del() doesn't reset the .poll pointer to NULL. So, ionic_qcq_enable() calls napi_enable() for the queue, which was unregistered by netif_napi_del().
Reproducer: ethtool -L <interface name> rx 1 tx 1 combined 0 ethtool -L <interface name> rx 0 tx 0 combined 1 ethtool -L <interface name> rx 0 tx 0 combined 4
Splat looks like: kernel BUG at net/core/dev.c:6666! Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1057 Comm: kworker/3:3 Not tainted 6.10.0-rc2+ #16 Workqueue: events ionic_lif_deferred_work [ionic] RIP: 0010:napi_enable+0x3b/0x40 Code: 48 89 c2 48 83 e2 f6 80 b9 61 09 00 00 00 74 0d 48 83 bf 60 01 00 00 00 74 03 80 ce 01 f0 4f RSP: 0018:ffffb6ed83227d48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff97560cda0828 RCX: 0000000000000029 RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff97560cda0a28 RBP: ffffb6ed83227d50 R08: 0000000000000400 R09: 0000000000000001 R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000 R13: ffff97560ce3c1a0 R14: 0000000000000000 R15: ffff975613ba0a20 FS: 0000000000000000(0000) GS:ffff975d5f780000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8f734ee200 CR3: 0000000103e50000 CR4: 00000000007506f0 PKRU: 55555554 Call Trace: <TASK> ? die+0x33/0x90 ? do_trap+0xd9/0x100 ? napi_enable+0x3b/0x40 ? do_error_trap+0x83/0xb0 ? napi_enable+0x3b/0x40 ? napi_enable+0x3b/0x40 ? exc_invalid_op+0x4e/0x70 ? napi_enable+0x3b/0x40 ? asm_exc_invalid_op+0x16/0x20 ? napi_enable+0x3b/0x40 ionic_qcq_enable+0xb7/0x180 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_start_queues+0xc4/0x290 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_link_status_check+0x11c/0x170 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] ionic_lif_deferred_work+0x129/0x280 [ionic 59bdfc8a035436e1c4224ff7d10789e3f14643f8] process_one_work+0x145/0x360 worker_thread+0x2bb/0x3d0 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2d/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30
Fixes: 0f3154e6bcb3 ("ionic: Add Tx and Rx handling") Signed-off-by: Taehee Yoo ap420073@gmail.com Reviewed-by: Brett Creeley brett.creeley@amd.com Reviewed-by: Shannon Nelson shannon.nelson@amd.com Link: https://lore.kernel.org/r/20240612060446.1754392-1-ap420073@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/pensando/ionic/ionic_lif.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/pensando/ionic/ionic_lif.c b/drivers/net/ethernet/pensando/ionic/ionic_lif.c index 7f0c6cdc375e3..0cd819bc4ae35 100644 --- a/drivers/net/ethernet/pensando/ionic/ionic_lif.c +++ b/drivers/net/ethernet/pensando/ionic/ionic_lif.c @@ -304,10 +304,8 @@ static int ionic_qcq_enable(struct ionic_qcq *qcq) if (ret) return ret;
- if (qcq->napi.poll) - napi_enable(&qcq->napi); - if (qcq->flags & IONIC_QCQ_F_INTR) { + napi_enable(&qcq->napi); irq_set_affinity_hint(qcq->intr.vector, &qcq->intr.affinity_mask); ionic_intr_mask(idev->intr_ctrl, qcq->intr.index,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 7d9df38c9c037ab84502ce7eeae9f1e1e7e72603 ]
Firmware interface 1.10.2.118 has increased the size of HWRM_PORT_PHY_QCFG response beyond the maximum size that can be forwarded. When the VF's link state is not the default auto state, the PF will need to forward the response back to the VF to indicate the forced state. This regression may cause the VF to fail to initialize.
Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum 96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the new speeds2 fields are beyond the legacy structure. Also modify bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96 bytes to make this failure more obvious.
Fixes: 84a911db8305 ("bnxt_en: Update firmware interface to 1.10.2.118") Reviewed-by: Somnath Kotur somnath.kotur@broadcom.com Reviewed-by: Pavan Chebbi pavan.chebbi@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20240612231736.57823-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.h | 51 +++++++++++++++++++ .../net/ethernet/broadcom/bnxt/bnxt_sriov.c | 12 ++++- 2 files changed, 61 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.h b/drivers/net/ethernet/broadcom/bnxt/bnxt.h index dd849e715c9ba..c46abcccfca10 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.h +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.h @@ -1419,6 +1419,57 @@ struct bnxt_l2_filter { atomic_t refcnt; };
+/* Compat version of hwrm_port_phy_qcfg_output capped at 96 bytes. The + * first 95 bytes are identical to hwrm_port_phy_qcfg_output in bnxt_hsi.h. + * The last valid byte in the compat version is different. + */ +struct hwrm_port_phy_qcfg_output_compat { + __le16 error_code; + __le16 req_type; + __le16 seq_id; + __le16 resp_len; + u8 link; + u8 active_fec_signal_mode; + __le16 link_speed; + u8 duplex_cfg; + u8 pause; + __le16 support_speeds; + __le16 force_link_speed; + u8 auto_mode; + u8 auto_pause; + __le16 auto_link_speed; + __le16 auto_link_speed_mask; + u8 wirespeed; + u8 lpbk; + u8 force_pause; + u8 module_status; + __le32 preemphasis; + u8 phy_maj; + u8 phy_min; + u8 phy_bld; + u8 phy_type; + u8 media_type; + u8 xcvr_pkg_type; + u8 eee_config_phy_addr; + u8 parallel_detect; + __le16 link_partner_adv_speeds; + u8 link_partner_adv_auto_mode; + u8 link_partner_adv_pause; + __le16 adv_eee_link_speed_mask; + __le16 link_partner_adv_eee_link_speed_mask; + __le32 xcvr_identifier_type_tx_lpi_timer; + __le16 fec_cfg; + u8 duplex_state; + u8 option_flags; + char phy_vendor_name[16]; + char phy_vendor_partnumber[16]; + __le16 support_pam4_speeds; + __le16 force_pam4_link_speed; + __le16 auto_pam4_link_speed_mask; + u8 link_partner_pam4_adv_speeds; + u8 valid; +}; + struct bnxt_link_info { u8 phy_type; u8 media_type; diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c index 175192ebaa773..22898d3d088b0 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c @@ -950,8 +950,11 @@ static int bnxt_hwrm_fwd_resp(struct bnxt *bp, struct bnxt_vf_info *vf, struct hwrm_fwd_resp_input *req; int rc;
- if (BNXT_FWD_RESP_SIZE_ERR(msg_size)) + if (BNXT_FWD_RESP_SIZE_ERR(msg_size)) { + netdev_warn_once(bp->dev, "HWRM fwd response too big (%d bytes)\n", + msg_size); return -EINVAL; + }
rc = hwrm_req_init(bp, req, HWRM_FWD_RESP); if (!rc) { @@ -1085,7 +1088,7 @@ static int bnxt_vf_set_link(struct bnxt *bp, struct bnxt_vf_info *vf) rc = bnxt_hwrm_exec_fwd_resp( bp, vf, sizeof(struct hwrm_port_phy_qcfg_input)); } else { - struct hwrm_port_phy_qcfg_output phy_qcfg_resp = {0}; + struct hwrm_port_phy_qcfg_output_compat phy_qcfg_resp = {}; struct hwrm_port_phy_qcfg_input *phy_qcfg_req;
phy_qcfg_req = @@ -1096,6 +1099,11 @@ static int bnxt_vf_set_link(struct bnxt *bp, struct bnxt_vf_info *vf) mutex_unlock(&bp->link_lock); phy_qcfg_resp.resp_len = cpu_to_le16(sizeof(phy_qcfg_resp)); phy_qcfg_resp.seq_id = phy_qcfg_req->seq_id; + /* New SPEEDS2 fields are beyond the legacy structure, so + * clear the SPEEDS2_SUPPORTED flag. + */ + phy_qcfg_resp.option_flags &= + ~PORT_PHY_QCAPS_RESP_FLAGS2_SPEEDS2_SUPPORTED; phy_qcfg_resp.valid = 1;
if (vf->flags & BNXT_VF_LINK_UP) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rao Shoaib Rao.Shoaib@oracle.com
[ Upstream commit a6736a0addd60fccc3a3508461d72314cc609772 ]
Read with MSG_PEEK flag loops if the first byte to read is an OOB byte. commit 22dd70eb2c3d ("af_unix: Don't peek OOB data without MSG_OOB.") addresses the loop issue but does not address the issue that no data beyond OOB byte can be read.
from socket import * c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) c1.send(b'a', MSG_OOB)
1
c1.send(b'b')
1
c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
from socket import * c1, c2 = socketpair(AF_UNIX, SOCK_STREAM) c2.setsockopt(SOL_SOCKET, SO_OOBINLINE, 1) c1.send(b'a', MSG_OOB)
1
c1.send(b'b')
1
c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'a'
c2.recv(1, MSG_DONTWAIT)
b'a'
c2.recv(1, MSG_PEEK | MSG_DONTWAIT)
b'b'
Fixes: 314001f0bf92 ("af_unix: Add OOB support") Signed-off-by: Rao Shoaib Rao.Shoaib@oracle.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240611084639.2248934-1-Rao.Shoaib@oracle.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/unix/af_unix.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
--- a/net/unix/af_unix.c +++ b/net/unix/af_unix.c @@ -2672,18 +2672,18 @@ static struct sk_buff *manage_oob(struct if (skb == u->oob_skb) { if (copied) { skb = NULL; - } else if (sock_flag(sk, SOCK_URGINLINE)) { - if (!(flags & MSG_PEEK)) { + } else if (!(flags & MSG_PEEK)) { + if (sock_flag(sk, SOCK_URGINLINE)) { WRITE_ONCE(u->oob_skb, NULL); consume_skb(skb); + } else { + __skb_unlink(skb, &sk->sk_receive_queue); + WRITE_ONCE(u->oob_skb, NULL); + unlinked_skb = skb; + skb = skb_peek(&sk->sk_receive_queue); } - } else if (flags & MSG_PEEK) { - skb = NULL; - } else { - __skb_unlink(skb, &sk->sk_receive_queue); - WRITE_ONCE(u->oob_skb, NULL); - unlinked_skb = skb; - skb = skb_peek(&sk->sk_receive_queue); + } else if (!sock_flag(sk, SOCK_URGINLINE)) { + skb = skb_peek_next(skb, &sk->sk_receive_queue); } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksandr Mishin amishin@t-argos.ru
[ Upstream commit a9b9741854a9fe9df948af49ca5514e0ed0429df ]
In case of token is released due to token->state == BNXT_HWRM_DEFERRED, released token (set to NULL) is used in log messages. This issue is expected to be prevented by HWRM_ERR_CODE_PF_UNAVAILABLE error code. But this error code is returned by recent firmware. So some firmware may not return it. This may lead to NULL pointer dereference. Adjust this issue by adding token pointer check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 8fa4219dba8e ("bnxt_en: add dynamic debug support for HWRM messages") Suggested-by: Michael Chan michael.chan@broadcom.com Signed-off-by: Aleksandr Mishin amishin@t-argos.ru Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Reviewed-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20240611082547.12178-1-amishin@t-argos.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c index 1df3d56cc4b51..d2fd2d04ed474 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_hwrm.c @@ -680,7 +680,7 @@ static int __hwrm_send(struct bnxt *bp, struct bnxt_hwrm_ctx *ctx) req_type); else if (rc && rc != HWRM_ERR_CODE_PF_UNAVAILABLE) hwrm_err(bp, ctx, "hwrm req_type 0x%x seq id 0x%x error 0x%x\n", - req_type, token->seq_id, rc); + req_type, le16_to_cpu(ctx->req->seq_id), rc); rc = __hwrm_to_stderr(rc); exit: if (token)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yongzhi Liu hyperlyzcs@gmail.com
commit 086c6cbcc563c81d55257f9b27e14faf1d0963d3 upstream.
When auxiliary_device_add() returns error and then calls auxiliary_device_uninit(), callback function gp_auxiliary_device_release() calls ida_free() and kfree(aux_device_wrapper) to free memory. We should't call them again in the error handling path.
Fix this by skipping the redundant cleanup functions.
Fixes: 393fc2f5948f ("misc: microchip: pci1xxxx: load auxiliary bus driver for the PIO function in the multi-function endpoint of pci1xxxx device.") Signed-off-by: Yongzhi Liu hyperlyzcs@gmail.com Link: https://lore.kernel.org/r/20240523121434.21855-3-hyperlyzcs@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c +++ b/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c @@ -111,6 +111,7 @@ static int gp_aux_bus_probe(struct pci_d
err_aux_dev_add_1: auxiliary_device_uninit(&aux_bus->aux_device_wrapper[1]->aux_dev); + goto err_aux_dev_add_0;
err_aux_dev_init_1: ida_free(&gp_client_ida, aux_bus->aux_device_wrapper[1]->aux_dev.id); @@ -120,6 +121,7 @@ err_ida_alloc_1:
err_aux_dev_add_0: auxiliary_device_uninit(&aux_bus->aux_device_wrapper[0]->aux_dev); + goto err_ret;
err_aux_dev_init_0: ida_free(&gp_client_ida, aux_bus->aux_device_wrapper[0]->aux_dev.id); @@ -127,6 +129,7 @@ err_aux_dev_init_0: err_ida_alloc_0: kfree(aux_bus->aux_device_wrapper[0]);
+err_ret: return retval; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit 1cdeca6a7264021e20157de0baf7880ff0ced822 upstream.
If the directory name in the root of the share starts with character like 镜(0x955c) or Ṝ(0x1e5c), it (and anything inside) cannot be accessed. The leading slash check must be checked after converting unicode to nls string.
Cc: stable@vger.kernel.org Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -630,6 +630,12 @@ smb2_get_name(const char *src, const int return name; }
+ if (*name == '\') { + pr_err("not allow directory name included leading slash\n"); + kfree(name); + return ERR_PTR(-EINVAL); + } + ksmbd_conv_path_to_unix(name); ksmbd_strip_last_slash(name); return name; @@ -2842,20 +2848,11 @@ int smb2_open(struct ksmbd_work *work) }
if (req->NameLength) { - if ((req->CreateOptions & FILE_DIRECTORY_FILE_LE) && - *(char *)req->Buffer == '\') { - pr_err("not allow directory name included leading slash\n"); - rc = -EINVAL; - goto err_out2; - } - name = smb2_get_name((char *)req + le16_to_cpu(req->NameOffset), le16_to_cpu(req->NameLength), work->conn->local_nls); if (IS_ERR(name)) { rc = PTR_ERR(name); - if (rc != -ENOMEM) - rc = -ENOENT; name = NULL; goto err_out2; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit 2bfc4214c69c62da13a9da8e3c3db5539da2ccd3 upstream.
Fix an issue where get_write is not used in smb2_set_ea().
Fixes: 6fc0a265e1b9 ("ksmbd: fix potential circular locking issue in smb2_set_ea()") Cc: stable@vger.kernel.org Reported-by: Wang Zhaolong wangzhaolong1@huawei.com Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 7 ++++--- fs/smb/server/vfs.c | 17 +++++++++++------ fs/smb/server/vfs.h | 3 ++- fs/smb/server/vfs_cache.c | 3 ++- 4 files changed, 19 insertions(+), 11 deletions(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -2367,7 +2367,8 @@ static int smb2_set_ea(struct smb2_ea_in if (rc > 0) { rc = ksmbd_vfs_remove_xattr(idmap, path, - attr_name); + attr_name, + get_write);
if (rc < 0) { ksmbd_debug(SMB, @@ -2382,7 +2383,7 @@ static int smb2_set_ea(struct smb2_ea_in } else { rc = ksmbd_vfs_setxattr(idmap, path, attr_name, value, le16_to_cpu(eabuf->EaValueLength), - 0, true); + 0, get_write); if (rc < 0) { ksmbd_debug(SMB, "ksmbd_vfs_setxattr is failed(%d)\n", @@ -2474,7 +2475,7 @@ static int smb2_remove_smb_xattrs(const !strncmp(&name[XATTR_USER_PREFIX_LEN], STREAM_PREFIX, STREAM_PREFIX_LEN)) { err = ksmbd_vfs_remove_xattr(idmap, path, - name); + name, true); if (err) ksmbd_debug(SMB, "remove xattr failed : %s\n", name); --- a/fs/smb/server/vfs.c +++ b/fs/smb/server/vfs.c @@ -1058,16 +1058,21 @@ int ksmbd_vfs_fqar_lseek(struct ksmbd_fi }
int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap, - const struct path *path, char *attr_name) + const struct path *path, char *attr_name, + bool get_write) { int err;
- err = mnt_want_write(path->mnt); - if (err) - return err; + if (get_write == true) { + err = mnt_want_write(path->mnt); + if (err) + return err; + }
err = vfs_removexattr(idmap, path->dentry, attr_name); - mnt_drop_write(path->mnt); + + if (get_write == true) + mnt_drop_write(path->mnt);
return err; } @@ -1380,7 +1385,7 @@ int ksmbd_vfs_remove_sd_xattrs(struct mn ksmbd_debug(SMB, "%s, len %zd\n", name, strlen(name));
if (!strncmp(name, XATTR_NAME_SD, XATTR_NAME_SD_LEN)) { - err = ksmbd_vfs_remove_xattr(idmap, path, name); + err = ksmbd_vfs_remove_xattr(idmap, path, name, true); if (err) ksmbd_debug(SMB, "remove xattr failed : %s\n", name); } --- a/fs/smb/server/vfs.h +++ b/fs/smb/server/vfs.h @@ -114,7 +114,8 @@ int ksmbd_vfs_setxattr(struct mnt_idmap int ksmbd_vfs_xattr_stream_name(char *stream_name, char **xattr_stream_name, size_t *xattr_stream_name_size, int s_type); int ksmbd_vfs_remove_xattr(struct mnt_idmap *idmap, - const struct path *path, char *attr_name); + const struct path *path, char *attr_name, + bool get_write); int ksmbd_vfs_kern_path_locked(struct ksmbd_work *work, char *name, unsigned int flags, struct path *parent_path, struct path *path, bool caseless); --- a/fs/smb/server/vfs_cache.c +++ b/fs/smb/server/vfs_cache.c @@ -254,7 +254,8 @@ static void __ksmbd_inode_close(struct k ci->m_flags &= ~S_DEL_ON_CLS_STREAM; err = ksmbd_vfs_remove_xattr(file_mnt_idmap(filp), &filp->f_path, - fp->stream.name); + fp->stream.name, + true); if (err) pr_err("remove xattr failed : %s\n", fp->stream.name);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
commit 07c54cc5988f19c9642fd463c2dbdac7fc52f777 upstream.
After the recent commit 5097cbcb38e6 ("sched/isolation: Prevent boot crash when the boot CPU is nohz_full") the kernel no longer crashes, but there is another problem.
In this case tick_setup_device() calls tick_take_do_timer_from_boot() to update tick_do_timer_cpu and this triggers the WARN_ON_ONCE(irqs_disabled) in smp_call_function_single().
Kill tick_take_do_timer_from_boot() and just use WRITE_ONCE(), the new comment explains why this is safe (thanks Thomas!).
Fixes: 08ae95f4fd3b ("nohz_full: Allow the boot CPU to be nohz_full") Signed-off-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240528122019.GA28794@redhat.com Link: https://lore.kernel.org/all/20240522151742.GA10400@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/tick-common.c | 42 ++++++++++++++---------------------------- 1 file changed, 14 insertions(+), 28 deletions(-)
--- a/kernel/time/tick-common.c +++ b/kernel/time/tick-common.c @@ -178,26 +178,6 @@ void tick_setup_periodic(struct clock_ev } }
-#ifdef CONFIG_NO_HZ_FULL -static void giveup_do_timer(void *info) -{ - int cpu = *(unsigned int *)info; - - WARN_ON(tick_do_timer_cpu != smp_processor_id()); - - tick_do_timer_cpu = cpu; -} - -static void tick_take_do_timer_from_boot(void) -{ - int cpu = smp_processor_id(); - int from = tick_do_timer_boot_cpu; - - if (from >= 0 && from != cpu) - smp_call_function_single(from, giveup_do_timer, &cpu, 1); -} -#endif - /* * Setup the tick device */ @@ -221,19 +201,25 @@ static void tick_setup_device(struct tic tick_next_period = ktime_get(); #ifdef CONFIG_NO_HZ_FULL /* - * The boot CPU may be nohz_full, in which case set - * tick_do_timer_boot_cpu so the first housekeeping - * secondary that comes up will take do_timer from - * us. + * The boot CPU may be nohz_full, in which case the + * first housekeeping secondary will take do_timer() + * from it. */ if (tick_nohz_full_cpu(cpu)) tick_do_timer_boot_cpu = cpu;
- } else if (tick_do_timer_boot_cpu != -1 && - !tick_nohz_full_cpu(cpu)) { - tick_take_do_timer_from_boot(); + } else if (tick_do_timer_boot_cpu != -1 && !tick_nohz_full_cpu(cpu)) { tick_do_timer_boot_cpu = -1; - WARN_ON(READ_ONCE(tick_do_timer_cpu) != cpu); + /* + * The boot CPU will stay in periodic (NOHZ disabled) + * mode until clocksource_done_booting() called after + * smp_init() selects a high resolution clocksource and + * timekeeping_notify() kicks the NOHZ stuff alive. + * + * So this WRITE_ONCE can only race with the READ_ONCE + * check in tick_periodic() but this race is harmless. + */ + WRITE_ONCE(tick_do_timer_cpu, cpu); #endif }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit fcf2a9970ef587d8f358560c381ee6115a9108aa upstream.
Commit 66601a29bb23 ("leds: class: If no default trigger is given, make hw_control trigger the default trigger") causes ledtrig-netdev to get set as default trigger on various network LEDs.
This causes users to hit a pre-existing AB-BA deadlock issue in ledtrig-netdev between the LED-trigger locks and the rtnl mutex, resulting in hung tasks in kernels >= 6.9.
Solving the deadlock is non trivial, so for now revert the change to set the hw_control trigger as default trigger, so that ledtrig-netdev no longer gets activated automatically for various network LEDs.
The netdev trigger is not needed because the network LEDs are usually under hw-control and the netdev trigger tries to leave things that way so setting it as the active trigger for the LED class device is a no-op.
Fixes: 66601a29bb23 ("leds: class: If no default trigger is given, make hw_control trigger the default trigger") Reported-by: Genes Lists lists@sapience.com Closes: https://lore.kernel.org/all/9d189ec329cfe68ed68699f314e191a10d4b5eda.camel@s... Reported-by: Johannes Wüller johanneswueller@gmail.com Closes: https://lore.kernel.org/lkml/e441605c-eaf2-4c2d-872b-d8e541f4cf60@gmail.com/ Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Andrew Lunn andrew@lunn.ch Acked-by: Lee Jones lee@kernel.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/leds/led-class.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c index 24fcff682b24..ba1be15cfd8e 100644 --- a/drivers/leds/led-class.c +++ b/drivers/leds/led-class.c @@ -552,12 +552,6 @@ int led_classdev_register_ext(struct device *parent, led_init_core(led_cdev);
#ifdef CONFIG_LEDS_TRIGGERS - /* - * If no default trigger was given and hw_control_trigger is set, - * make it the default trigger. - */ - if (!led_cdev->default_trigger && led_cdev->hw_control_trigger) - led_cdev->default_trigger = led_cdev->hw_control_trigger; led_trigger_set_default(led_cdev); #endif
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Segall bsegall@google.com
commit b2747f108b8034271fd5289bd8f3a7003e0775a3 upstream.
This is a re-commit of
da05b143a308 ("x86/boot: Don't add the EFI stub to targets")
after the tagged patch incorrectly reverted it.
vmlinux-objs-y is added to targets, with an assumption that they are all relative to $(obj); adding a $(objtree)/drivers/... path causes the build to incorrectly create a useless arch/x86/boot/compressed/drivers/... directory tree.
Fix this just by using a different make variable for the EFI stub.
Fixes: cb8bda8ad443 ("x86/boot/compressed: Rename efi_thunk_64.S to efi-mixed.S") Signed-off-by: Ben Segall bsegall@google.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Ard Biesheuvel ardb@kernel.org Cc: stable@vger.kernel.org # v6.1+ Link: https://lore.kernel.org/r/xm267ceukksz.fsf@bsegall.svl.corp.google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/boot/compressed/Makefile | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/boot/compressed/Makefile +++ b/arch/x86/boot/compressed/Makefile @@ -116,9 +116,9 @@ vmlinux-objs-$(CONFIG_UNACCEPTED_MEMORY)
vmlinux-objs-$(CONFIG_EFI) += $(obj)/efi.o vmlinux-objs-$(CONFIG_EFI_MIXED) += $(obj)/efi_mixed.o -vmlinux-objs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a +vmlinux-libs-$(CONFIG_EFI_STUB) += $(objtree)/drivers/firmware/efi/libstub/lib.a
-$(obj)/vmlinux: $(vmlinux-objs-y) FORCE +$(obj)/vmlinux: $(vmlinux-objs-y) $(vmlinux-libs-y) FORCE $(call if_changed,ld)
OBJCOPYFLAGS_vmlinux.bin := -R .comment -S
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Lechner dlechner@baylibre.com
commit 8a01ef749b0a632f0e1f4ead0f08b3310d99fcb1 upstream.
According to the IIO documentation, the sign in the scan type should be lower case. The ad9467 driver was incorrectly using upper case.
Fix by changing to lower case.
Fixes: 4606d0f4b05f ("iio: adc: ad9467: add support for AD9434 high-speed ADC") Fixes: ad6797120238 ("iio: adc: ad9467: add support AD9467 ADC") Signed-off-by: David Lechner dlechner@baylibre.com Link: https://lore.kernel.org/r/20240503-ad9467-fix-scan-type-sign-v1-1-c7a1a066eb... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/ad9467.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/iio/adc/ad9467.c +++ b/drivers/iio/adc/ad9467.c @@ -225,11 +225,11 @@ static void __ad9467_get_scale(struct ad }
static const struct iio_chan_spec ad9434_channels[] = { - AD9467_CHAN(0, 0, 12, 'S'), + AD9467_CHAN(0, 0, 12, 's'), };
static const struct iio_chan_spec ad9467_channels[] = { - AD9467_CHAN(0, 0, 16, 'S'), + AD9467_CHAN(0, 0, 16, 's'), };
static const struct ad9467_chip_info ad9467_chip_tbl = {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Ferland marc.ferland@sonatest.com
commit 279428df888319bf68f2686934897301a250bb84 upstream.
The scale value for the temperature channel is (assuming Vref=2.5 and the datasheet):
376.7897513
When calculating both val and val2 for the temperature scale we use (3767897513/25) and multiply it by Vref (here I assume 2500mV) to obtain:
2500 * (3767897513/25) ==> 376789751300
Finally we divide with remainder by 10^9 to get:
val = 376 val2 = 789751300
However, we return IIO_VAL_INT_PLUS_MICRO (should have been NANO) as the scale type. So when converting the raw temperature value to the 'processed' temperature value we will get (assuming raw=810, offset=-753):
processed = (raw + offset) * scale_val = (810 + -753) * 376 = 21432
processed += div((raw + offset) * scale_val2, 10^6) += div((810 + -753) * 789751300, 10^6) += 45015 ==> 66447 ==> 66.4 Celcius
instead of the expected 21.5 Celsius.
Fix this issue by changing IIO_VAL_INT_PLUS_MICRO to IIO_VAL_INT_PLUS_NANO.
Fixes: 56ca9db862bf ("iio: dac: Add support for the AD5592R/AD5593R ADCs/DACs") Signed-off-by: Marc Ferland marc.ferland@sonatest.com Link: https://lore.kernel.org/r/20240501150554.1871390-1-marc.ferland@sonatest.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/dac/ad5592r-base.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/iio/dac/ad5592r-base.c +++ b/drivers/iio/dac/ad5592r-base.c @@ -415,7 +415,7 @@ static int ad5592r_read_raw(struct iio_d s64 tmp = *val * (3767897513LL / 25LL); *val = div_s64_rem(tmp, 1000000000LL, val2);
- return IIO_VAL_INT_PLUS_MICRO; + return IIO_VAL_INT_PLUS_NANO; }
mutex_lock(&st->lock);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vasileios Amoiridis vassilisamir@gmail.com
commit bedb2ccb566de5ca0c336ca3fd3588cea6d50414 upstream.
In case of error in the bmi323_trigger_handler() function, the function exits without calling the iio_trigger_notify_done() which is responsible for informing the attached trigger that the process is done and in case there is a .reenable(), to call it.
Fixes: 8a636db3aa57 ("iio: imu: Add driver for BMI323 IMU") Signed-off-by: Vasileios Amoiridis vassilisamir@gmail.com Link: https://lore.kernel.org/r/20240508155407.139805-1-vassilisamir@gmail.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/imu/bmi323/bmi323_core.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/iio/imu/bmi323/bmi323_core.c +++ b/drivers/iio/imu/bmi323/bmi323_core.c @@ -1391,7 +1391,7 @@ static irqreturn_t bmi323_trigger_handle &data->buffer.channels, ARRAY_SIZE(data->buffer.channels)); if (ret) - return IRQ_NONE; + goto out; } else { for_each_set_bit(bit, indio_dev->active_scan_mask, BMI323_CHAN_MAX) { @@ -1400,13 +1400,14 @@ static irqreturn_t bmi323_trigger_handle &data->buffer.channels[index++], BMI323_BYTES_PER_SAMPLE); if (ret) - return IRQ_NONE; + goto out; } }
iio_push_to_buffers_with_timestamp(indio_dev, &data->buffer, iio_get_time_ns(indio_dev));
+out: iio_trigger_notify_done(indio_dev->trig);
return IRQ_HANDLED;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com
commit 95444b9eeb8c5c0330563931d70c61ca3b101548 upstream.
ODR switching happens in 2 steps, update to store the new value and then apply when the ODR change flag is received in the data. When switching to the same ODR value, the ODR change flag is never happening, and frequency switching is blocked waiting for the never coming apply.
Fix the issue by preventing update to happen when switching to same ODR value.
Fixes: 0ecc363ccea7 ("iio: make invensense timestamp module generic") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com Link: https://lore.kernel.org/r/20240524124851.567485-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/common/inv_sensors/inv_sensors_timestamp.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c +++ b/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c @@ -60,11 +60,15 @@ EXPORT_SYMBOL_NS_GPL(inv_sensors_timesta int inv_sensors_timestamp_update_odr(struct inv_sensors_timestamp *ts, uint32_t period, bool fifo) { + uint32_t mult; + /* when FIFO is on, prevent odr change if one is already pending */ if (fifo && ts->new_mult != 0) return -EAGAIN;
- ts->new_mult = period / ts->chip.clock_period; + mult = period / ts->chip.clock_period; + if (mult != ts->mult) + ts->new_mult = mult;
return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adam Rizkalla ajarizzo@gmail.com
commit 0f0f6306617cb4b6231fc9d4ec68ab9a56dba7c0 upstream.
Fix overflow issue when storing BMP580 temperature reading and properly preserve sign of 24-bit data.
Signed-off-by: Adam Rizkalla ajarizzo@gmail.com Tested-By: Vasileios Amoiridis vassilisamir@gmail.com Acked-by: Angel Iglesias ang.iglesiasg@gmail.com Link: https://lore.kernel.org/r/Zin2udkXRD0+GrML@adam-asahi.lan Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/pressure/bmp280-core.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/iio/pressure/bmp280-core.c +++ b/drivers/iio/pressure/bmp280-core.c @@ -1394,12 +1394,12 @@ static int bmp580_read_temp(struct bmp28
/* * Temperature is returned in Celsius degrees in fractional - * form down 2^16. We rescale by x1000 to return milli Celsius - * to respect IIO ABI. + * form down 2^16. We rescale by x1000 to return millidegrees + * Celsius to respect IIO ABI. */ - *val = raw_temp * 1000; - *val2 = 16; - return IIO_VAL_FRACTIONAL_LOG2; + raw_temp = sign_extend32(raw_temp, 23); + *val = ((s64)raw_temp * 1000) / (1 << 16); + return IIO_VAL_INT; }
static int bmp580_read_press(struct bmp280_data *data, int *val, int *val2)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
commit a23c14b062d8800a2192077d83273bbfe6c7552d upstream.
When devm_regmap_init_i2c() fails, regmap_ee could be error pointer, instead of checking for IS_ERR(regmap_ee), regmap is checked which looks like a copy paste error.
Fixes: a1d1ba5e1c28 ("iio: temperature: mlx90635 MLX90635 IR Temperature sensor") Reviewed-by: Crt Moricmo@melexis.com Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Link: https://lore.kernel.org/r/20240513203427.3208696-1-harshit.m.mogalapalli@ora... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/temperature/mlx90635.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/iio/temperature/mlx90635.c +++ b/drivers/iio/temperature/mlx90635.c @@ -947,9 +947,9 @@ static int mlx90635_probe(struct i2c_cli "failed to allocate regmap\n");
regmap_ee = devm_regmap_init_i2c(client, &mlx90635_regmap_ee); - if (IS_ERR(regmap)) - return dev_err_probe(&client->dev, PTR_ERR(regmap), - "failed to allocate regmap\n"); + if (IS_ERR(regmap_ee)) + return dev_err_probe(&client->dev, PTR_ERR(regmap_ee), + "failed to allocate EEPROM regmap\n");
mlx90635 = iio_priv(indio_dev); i2c_set_clientdata(client, indio_dev);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com
commit 245f3b149e6cc3ac6ee612cdb7042263bfc9e73c upstream.
Update watermark will be done inside the hwfifo_set_watermark callback just after the update_scan_mode. It is useless to do it here.
Fixes: 7f85e42a6c54 ("iio: imu: inv_icm42600: add buffer support in iio devices") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com Link: https://lore.kernel.org/r/20240527210008.612932-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c | 4 ---- drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c | 4 ---- 2 files changed, 8 deletions(-)
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c +++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_accel.c @@ -129,10 +129,6 @@ static int inv_icm42600_accel_update_sca /* update data FIFO write */ inv_sensors_timestamp_apply_odr(ts, 0, 0, 0); ret = inv_icm42600_buffer_set_fifo_en(st, fifo_en | st->fifo.en); - if (ret) - goto out_unlock; - - ret = inv_icm42600_buffer_update_watermark(st);
out_unlock: mutex_unlock(&st->lock); --- a/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c +++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_gyro.c @@ -129,10 +129,6 @@ static int inv_icm42600_gyro_update_scan /* update data FIFO write */ inv_sensors_timestamp_apply_odr(ts, 0, 0, 0); ret = inv_icm42600_buffer_set_fifo_en(st, fifo_en | st->fifo.en); - if (ret) - goto out_unlock; - - ret = inv_icm42600_buffer_update_watermark(st);
out_unlock: mutex_unlock(&st->lock);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dirk Behme dirk.behme@de.bosch.com
commit c0a40097f0bc81deafc15f9195d1fb54595cd6d0 upstream.
Synchronize the dev->driver usage in really_probe() and dev_uevent(). These can run in different threads, what can result in the following race condition for dev->driver uninitialization:
Thread #1: ==========
really_probe() { ... probe_failed: ... device_unbind_cleanup(dev) { ... dev->driver = NULL; // <= Failed probe sets dev->driver to NULL ... } ... }
Thread #2: ==========
dev_uevent() { ... if (dev->driver) // If dev->driver is NULLed from really_probe() from here on, // after above check, the system crashes add_uevent_var(env, "DRIVER=%s", dev->driver->name); ... }
really_probe() holds the lock, already. So nothing needs to be done there. dev_uevent() is called with lock held, often, too. But not always. What implies that we can't add any locking in dev_uevent() itself. So fix this race by adding the lock to the non-protected path. This is the path where above race is observed:
dev_uevent+0x235/0x380 uevent_show+0x10c/0x1f0 <= Add lock here dev_attr_show+0x3a/0xa0 sysfs_kf_seq_show+0x17c/0x250 kernfs_seq_show+0x7c/0x90 seq_read_iter+0x2d7/0x940 kernfs_fop_read_iter+0xc6/0x310 vfs_read+0x5bc/0x6b0 ksys_read+0xeb/0x1b0 __x64_sys_read+0x42/0x50 x64_sys_call+0x27ad/0x2d30 do_syscall_64+0xcd/0x1d0 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Similar cases are reported by syzkaller in
https://syzkaller.appspot.com/bug?extid=ffa8143439596313a85a
But these are regarding the *initialization* of dev->driver
dev->driver = drv;
As this switches dev->driver to non-NULL these reports can be considered to be false-positives (which should be "fixed" by this commit, as well, though).
The same issue was reported and tried to be fixed back in 2015 in
https://lore.kernel.org/lkml/1421259054-2574-1-git-send-email-a.sangwan@sams...
already.
Fixes: 239378f16aa1 ("Driver core: add uevent vars for devices of a class") Cc: stable stable@kernel.org Cc: syzbot+ffa8143439596313a85a@syzkaller.appspotmail.com Cc: Ashish Sangwan a.sangwan@samsung.com Cc: Namjae Jeon namjae.jeon@samsung.com Signed-off-by: Dirk Behme dirk.behme@de.bosch.com Link: https://lore.kernel.org/r/20240513050634.3964461-1-dirk.behme@de.bosch.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -2738,8 +2738,11 @@ static ssize_t uevent_show(struct device if (!env) return -ENOMEM;
+ /* Synchronize with really_probe() */ + device_lock(dev); /* let the kset specific function add its keys */ retval = kset->uevent_ops->uevent(&dev->kobj, env); + device_unlock(dev); if (retval) goto out;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: John David Anglin dave@parisc-linux.org
commit 72d95924ee35c8cd16ef52f912483ee938a34d49 upstream.
PA-RISC systems with PA8800 and PA8900 processors have had problems with random segmentation faults for many years. Systems with earlier processors are much more stable.
Systems with PA8800 and PA8900 processors have a large L2 cache which needs per page flushing for decent performance when a large range is flushed. The combined cache in these systems is also more sensitive to non-equivalent aliases than the caches in earlier systems.
The majority of random segmentation faults that I have looked at appear to be memory corruption in memory allocated using mmap and malloc.
My first attempt at fixing the random faults didn't work. On reviewing the cache code, I realized that there were two issues which the existing code didn't handle correctly. Both relate to cache move-in. Another issue is that the present bit in PTEs is racy.
1) PA-RISC caches have a mind of their own and they can speculatively load data and instructions for a page as long as there is a entry in the TLB for the page which allows move-in. TLBs are local to each CPU. Thus, the TLB entry for a page must be purged before flushing the page. This is particularly important on SMP systems.
In some of the flush routines, the flush routine would be called and then the TLB entry would be purged. This was because the flush routine needed the TLB entry to do the flush.
2) My initial approach to trying the fix the random faults was to try and use flush_cache_page_if_present for all flush operations. This actually made things worse and led to a couple of hardware lockups. It finally dawned on me that some lines weren't being flushed because the pte check code was racy. This resulted in random inequivalent mappings to physical pages.
The __flush_cache_page tmpalias flush sets up its own TLB entry and it doesn't need the existing TLB entry. As long as we can find the pte pointer for the vm page, we can get the pfn and physical address of the page. We can also purge the TLB entry for the page before doing the flush. Further, __flush_cache_page uses a special TLB entry that inhibits cache move-in.
When switching page mappings, we need to ensure that lines are removed from the cache. It is not sufficient to just flush the lines to memory as they may come back.
This made it clear that we needed to implement all the required flush operations using tmpalias routines. This includes flushes for user and kernel pages.
After modifying the code to use tmpalias flushes, it became clear that the random segmentation faults were not fully resolved. The frequency of faults was worse on systems with a 64 MB L2 (PA8900) and systems with more CPUs (rp4440).
The warning that I added to flush_cache_page_if_present to detect pages that couldn't be flushed triggered frequently on some systems.
Helge and I looked at the pages that couldn't be flushed and found that the PTE was either cleared or for a swap page. Ignoring pages that were swapped out seemed okay but pages with cleared PTEs seemed problematic.
I looked at routines related to pte_clear and noticed ptep_clear_flush. The default implementation just flushes the TLB entry. However, it was obvious that on parisc we need to flush the cache page as well. If we don't flush the cache page, stale lines will be left in the cache and cause random corruption. Once a PTE is cleared, there is no way to find the physical address associated with the PTE and flush the associated page at a later time.
I implemented an updated change with a parisc specific version of ptep_clear_flush. It fixed the random data corruption on Helge's rp4440 and rp3440, as well as on my c8000.
At this point, I realized that I could restore the code where we only flush in flush_cache_page_if_present if the page has been accessed. However, for this, we also need to flush the cache when the accessed bit is cleared in ptep_clear_flush_young to keep things synchronized. The default implementation only flushes the TLB entry.
Other changes in this version are:
1) Implement parisc specific version of ptep_get. It's identical to default but needed in arch/parisc/include/asm/pgtable.h. 2) Revise parisc implementation of ptep_test_and_clear_young to use ptep_get (READ_ONCE). 3) Drop parisc implementation of ptep_get_and_clear. We can use default. 4) Revise flush_kernel_vmap_range and invalidate_kernel_vmap_range to use full data cache flush. 5) Move flush_cache_vmap and flush_cache_vunmap to cache.c. Handle VM_IOREMAP case in flush_cache_vmap.
At this time, I don't know whether it is better to always flush when the PTE present bit is set or when both the accessed and present bits are set. The later saves flushing pages that haven't been accessed, but we need to flush in ptep_clear_flush_young. It also needs a page table lookup to find the PTE pointer. The lpa instruction only needs a page table lookup when the PTE entry isn't in the TLB.
We don't atomically handle setting and clearing the _PAGE_ACCESSED bit. If we miss an update, we may miss a flush and the cache may get corrupted. Whether the current code is effectively atomic depends on process control.
When CONFIG_FLUSH_PAGE_ACCESSED is set to zero, the page will eventually be flushed when the PTE is cleared or in flush_cache_page_if_present. The _PAGE_ACCESSED bit is not used, so the problem is avoided.
The flush method can be selected using the CONFIG_FLUSH_PAGE_ACCESSED define in cache.c. The default is 0. I didn't see a large difference in performance.
Signed-off-by: John David Anglin dave.anglin@bell.net Cc: stable@vger.kernel.org # v6.6+ Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/include/asm/cacheflush.h | 15 - arch/parisc/include/asm/pgtable.h | 27 +- arch/parisc/kernel/cache.c | 411 +++++++++++++++++++++-------------- 3 files changed, 274 insertions(+), 179 deletions(-)
--- a/arch/parisc/include/asm/cacheflush.h +++ b/arch/parisc/include/asm/cacheflush.h @@ -31,18 +31,17 @@ void flush_cache_all_local(void); void flush_cache_all(void); void flush_cache_mm(struct mm_struct *mm);
-void flush_kernel_dcache_page_addr(const void *addr); - #define flush_kernel_dcache_range(start,size) \ flush_kernel_dcache_range_asm((start), (start)+(size));
+/* The only way to flush a vmap range is to flush whole cache */ #define ARCH_IMPLEMENTS_FLUSH_KERNEL_VMAP_RANGE 1 void flush_kernel_vmap_range(void *vaddr, int size); void invalidate_kernel_vmap_range(void *vaddr, int size);
-#define flush_cache_vmap(start, end) flush_cache_all() +void flush_cache_vmap(unsigned long start, unsigned long end); #define flush_cache_vmap_early(start, end) do { } while (0) -#define flush_cache_vunmap(start, end) flush_cache_all() +void flush_cache_vunmap(unsigned long start, unsigned long end);
void flush_dcache_folio(struct folio *folio); #define flush_dcache_folio flush_dcache_folio @@ -77,17 +76,11 @@ void flush_cache_page(struct vm_area_str void flush_cache_range(struct vm_area_struct *vma, unsigned long start, unsigned long end);
-/* defined in pacache.S exported in cache.c used by flush_anon_page */ -void flush_dcache_page_asm(unsigned long phys_addr, unsigned long vaddr); - #define ARCH_HAS_FLUSH_ANON_PAGE void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr);
#define ARCH_HAS_FLUSH_ON_KUNMAP -static inline void kunmap_flush_on_unmap(const void *addr) -{ - flush_kernel_dcache_page_addr(addr); -} +void kunmap_flush_on_unmap(const void *addr);
#endif /* _PARISC_CACHEFLUSH_H */
--- a/arch/parisc/include/asm/pgtable.h +++ b/arch/parisc/include/asm/pgtable.h @@ -448,14 +448,17 @@ static inline pte_t pte_swp_clear_exclus return pte; }
+static inline pte_t ptep_get(pte_t *ptep) +{ + return READ_ONCE(*ptep); +} +#define ptep_get ptep_get + static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { pte_t pte;
- if (!pte_young(*ptep)) - return 0; - - pte = *ptep; + pte = ptep_get(ptep); if (!pte_young(pte)) { return 0; } @@ -463,17 +466,10 @@ static inline int ptep_test_and_clear_yo return 1; }
-struct mm_struct; -static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep) -{ - pte_t old_pte; - - old_pte = *ptep; - set_pte(ptep, __pte(0)); - - return old_pte; -} +int ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep); +pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep);
+struct mm_struct; static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr, pte_t *ptep) { set_pte(ptep, pte_wrprotect(*ptep)); @@ -511,7 +507,8 @@ static inline void ptep_set_wrprotect(st #define HAVE_ARCH_UNMAPPED_AREA_TOPDOWN
#define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG -#define __HAVE_ARCH_PTEP_GET_AND_CLEAR +#define __HAVE_ARCH_PTEP_CLEAR_YOUNG_FLUSH +#define __HAVE_ARCH_PTEP_CLEAR_FLUSH #define __HAVE_ARCH_PTEP_SET_WRPROTECT #define __HAVE_ARCH_PTE_SAME
--- a/arch/parisc/kernel/cache.c +++ b/arch/parisc/kernel/cache.c @@ -20,6 +20,7 @@ #include <linux/sched.h> #include <linux/sched/mm.h> #include <linux/syscalls.h> +#include <linux/vmalloc.h> #include <asm/pdc.h> #include <asm/cache.h> #include <asm/cacheflush.h> @@ -31,20 +32,31 @@ #include <asm/mmu_context.h> #include <asm/cachectl.h>
+#define PTR_PAGE_ALIGN_DOWN(addr) PTR_ALIGN_DOWN(addr, PAGE_SIZE) + +/* + * When nonzero, use _PAGE_ACCESSED bit to try to reduce the number + * of page flushes done flush_cache_page_if_present. There are some + * pros and cons in using this option. It may increase the risk of + * random segmentation faults. + */ +#define CONFIG_FLUSH_PAGE_ACCESSED 0 + int split_tlb __ro_after_init; int dcache_stride __ro_after_init; int icache_stride __ro_after_init; EXPORT_SYMBOL(dcache_stride);
+/* Internal implementation in arch/parisc/kernel/pacache.S */ void flush_dcache_page_asm(unsigned long phys_addr, unsigned long vaddr); EXPORT_SYMBOL(flush_dcache_page_asm); void purge_dcache_page_asm(unsigned long phys_addr, unsigned long vaddr); void flush_icache_page_asm(unsigned long phys_addr, unsigned long vaddr); - -/* Internal implementation in arch/parisc/kernel/pacache.S */ void flush_data_cache_local(void *); /* flushes local data-cache only */ void flush_instruction_cache_local(void); /* flushes local code-cache only */
+static void flush_kernel_dcache_page_addr(const void *addr); + /* On some machines (i.e., ones with the Merced bus), there can be * only a single PxTLB broadcast at a time; this must be guaranteed * by software. We need a spinlock around all TLB flushes to ensure @@ -321,6 +333,18 @@ __flush_cache_page(struct vm_area_struct { if (!static_branch_likely(&parisc_has_cache)) return; + + /* + * The TLB is the engine of coherence on parisc. The CPU is + * entitled to speculate any page with a TLB mapping, so here + * we kill the mapping then flush the page along a special flush + * only alias mapping. This guarantees that the page is no-longer + * in the cache for any process and nor may it be speculatively + * read in (until the user or kernel specifically accesses it, + * of course). + */ + flush_tlb_page(vma, vmaddr); + preempt_disable(); flush_dcache_page_asm(physaddr, vmaddr); if (vma->vm_flags & VM_EXEC) @@ -328,46 +352,44 @@ __flush_cache_page(struct vm_area_struct preempt_enable(); }
-static void flush_user_cache_page(struct vm_area_struct *vma, unsigned long vmaddr) +static void flush_kernel_dcache_page_addr(const void *addr) { - unsigned long flags, space, pgd, prot; -#ifdef CONFIG_TLB_PTLOCK - unsigned long pgd_lock; -#endif + unsigned long vaddr = (unsigned long)addr; + unsigned long flags;
- vmaddr &= PAGE_MASK; + /* Purge TLB entry to remove translation on all CPUs */ + purge_tlb_start(flags); + pdtlb(SR_KERNEL, addr); + purge_tlb_end(flags);
+ /* Use tmpalias flush to prevent data cache move-in */ preempt_disable(); + flush_dcache_page_asm(__pa(vaddr), vaddr); + preempt_enable(); +}
- /* Set context for flush */ - local_irq_save(flags); - prot = mfctl(8); - space = mfsp(SR_USER); - pgd = mfctl(25); -#ifdef CONFIG_TLB_PTLOCK - pgd_lock = mfctl(28); -#endif - switch_mm_irqs_off(NULL, vma->vm_mm, NULL); - local_irq_restore(flags); - - flush_user_dcache_range_asm(vmaddr, vmaddr + PAGE_SIZE); - if (vma->vm_flags & VM_EXEC) - flush_user_icache_range_asm(vmaddr, vmaddr + PAGE_SIZE); - flush_tlb_page(vma, vmaddr); +static void flush_kernel_icache_page_addr(const void *addr) +{ + unsigned long vaddr = (unsigned long)addr; + unsigned long flags;
- /* Restore previous context */ - local_irq_save(flags); -#ifdef CONFIG_TLB_PTLOCK - mtctl(pgd_lock, 28); -#endif - mtctl(pgd, 25); - mtsp(space, SR_USER); - mtctl(prot, 8); - local_irq_restore(flags); + /* Purge TLB entry to remove translation on all CPUs */ + purge_tlb_start(flags); + pdtlb(SR_KERNEL, addr); + purge_tlb_end(flags);
+ /* Use tmpalias flush to prevent instruction cache move-in */ + preempt_disable(); + flush_icache_page_asm(__pa(vaddr), vaddr); preempt_enable(); }
+void kunmap_flush_on_unmap(const void *addr) +{ + flush_kernel_dcache_page_addr(addr); +} +EXPORT_SYMBOL(kunmap_flush_on_unmap); + void flush_icache_pages(struct vm_area_struct *vma, struct page *page, unsigned int nr) { @@ -375,13 +397,16 @@ void flush_icache_pages(struct vm_area_s
for (;;) { flush_kernel_dcache_page_addr(kaddr); - flush_kernel_icache_page(kaddr); + flush_kernel_icache_page_addr(kaddr); if (--nr == 0) break; kaddr += PAGE_SIZE; } }
+/* + * Walk page directory for MM to find PTEP pointer for address ADDR. + */ static inline pte_t *get_ptep(struct mm_struct *mm, unsigned long addr) { pte_t *ptep = NULL; @@ -410,6 +435,41 @@ static inline bool pte_needs_flush(pte_t == (_PAGE_PRESENT | _PAGE_ACCESSED); }
+/* + * Return user physical address. Returns 0 if page is not present. + */ +static inline unsigned long get_upa(struct mm_struct *mm, unsigned long addr) +{ + unsigned long flags, space, pgd, prot, pa; +#ifdef CONFIG_TLB_PTLOCK + unsigned long pgd_lock; +#endif + + /* Save context */ + local_irq_save(flags); + prot = mfctl(8); + space = mfsp(SR_USER); + pgd = mfctl(25); +#ifdef CONFIG_TLB_PTLOCK + pgd_lock = mfctl(28); +#endif + + /* Set context for lpa_user */ + switch_mm_irqs_off(NULL, mm, NULL); + pa = lpa_user(addr); + + /* Restore previous context */ +#ifdef CONFIG_TLB_PTLOCK + mtctl(pgd_lock, 28); +#endif + mtctl(pgd, 25); + mtsp(space, SR_USER); + mtctl(prot, 8); + local_irq_restore(flags); + + return pa; +} + void flush_dcache_folio(struct folio *folio) { struct address_space *mapping = folio_flush_mapping(folio); @@ -458,50 +518,23 @@ void flush_dcache_folio(struct folio *fo if (addr + nr * PAGE_SIZE > vma->vm_end) nr = (vma->vm_end - addr) / PAGE_SIZE;
- if (parisc_requires_coherency()) { - for (i = 0; i < nr; i++) { - pte_t *ptep = get_ptep(vma->vm_mm, - addr + i * PAGE_SIZE); - if (!ptep) - continue; - if (pte_needs_flush(*ptep)) - flush_user_cache_page(vma, - addr + i * PAGE_SIZE); - /* Optimise accesses to the same table? */ - pte_unmap(ptep); - } - } else { + if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1)) + != (addr & (SHM_COLOUR - 1))) { + for (i = 0; i < nr; i++) + __flush_cache_page(vma, + addr + i * PAGE_SIZE, + (pfn + i) * PAGE_SIZE); /* - * The TLB is the engine of coherence on parisc: - * The CPU is entitled to speculate any page - * with a TLB mapping, so here we kill the - * mapping then flush the page along a special - * flush only alias mapping. This guarantees that - * the page is no-longer in the cache for any - * process and nor may it be speculatively read - * in (until the user or kernel specifically - * accesses it, of course) + * Software is allowed to have any number + * of private mappings to a page. */ - for (i = 0; i < nr; i++) - flush_tlb_page(vma, addr + i * PAGE_SIZE); - if (old_addr == 0 || (old_addr & (SHM_COLOUR - 1)) - != (addr & (SHM_COLOUR - 1))) { - for (i = 0; i < nr; i++) - __flush_cache_page(vma, - addr + i * PAGE_SIZE, - (pfn + i) * PAGE_SIZE); - /* - * Software is allowed to have any number - * of private mappings to a page. - */ - if (!(vma->vm_flags & VM_SHARED)) - continue; - if (old_addr) - pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n", - old_addr, addr, vma->vm_file); - if (nr == folio_nr_pages(folio)) - old_addr = addr; - } + if (!(vma->vm_flags & VM_SHARED)) + continue; + if (old_addr) + pr_err("INEQUIVALENT ALIASES 0x%lx and 0x%lx in file %pD\n", + old_addr, addr, vma->vm_file); + if (nr == folio_nr_pages(folio)) + old_addr = addr; } WARN_ON(++count == 4096); } @@ -591,35 +624,28 @@ extern void purge_kernel_dcache_page_asm extern void clear_user_page_asm(void *, unsigned long); extern void copy_user_page_asm(void *, void *, unsigned long);
-void flush_kernel_dcache_page_addr(const void *addr) -{ - unsigned long flags; - - flush_kernel_dcache_page_asm(addr); - purge_tlb_start(flags); - pdtlb(SR_KERNEL, addr); - purge_tlb_end(flags); -} -EXPORT_SYMBOL(flush_kernel_dcache_page_addr); - static void flush_cache_page_if_present(struct vm_area_struct *vma, - unsigned long vmaddr, unsigned long pfn) + unsigned long vmaddr) { +#if CONFIG_FLUSH_PAGE_ACCESSED bool needs_flush = false; - pte_t *ptep; + pte_t *ptep, pte;
- /* - * The pte check is racy and sometimes the flush will trigger - * a non-access TLB miss. Hopefully, the page has already been - * flushed. - */ ptep = get_ptep(vma->vm_mm, vmaddr); if (ptep) { - needs_flush = pte_needs_flush(*ptep); + pte = ptep_get(ptep); + needs_flush = pte_needs_flush(pte); pte_unmap(ptep); } if (needs_flush) - flush_cache_page(vma, vmaddr, pfn); + __flush_cache_page(vma, vmaddr, PFN_PHYS(pte_pfn(pte))); +#else + struct mm_struct *mm = vma->vm_mm; + unsigned long physaddr = get_upa(mm, vmaddr); + + if (physaddr) + __flush_cache_page(vma, vmaddr, PAGE_ALIGN_DOWN(physaddr)); +#endif }
void copy_user_highpage(struct page *to, struct page *from, @@ -629,7 +655,7 @@ void copy_user_highpage(struct page *to,
kfrom = kmap_local_page(from); kto = kmap_local_page(to); - flush_cache_page_if_present(vma, vaddr, page_to_pfn(from)); + __flush_cache_page(vma, vaddr, PFN_PHYS(page_to_pfn(from))); copy_page_asm(kto, kfrom); kunmap_local(kto); kunmap_local(kfrom); @@ -638,16 +664,17 @@ void copy_user_highpage(struct page *to, void copy_to_user_page(struct vm_area_struct *vma, struct page *page, unsigned long user_vaddr, void *dst, void *src, int len) { - flush_cache_page_if_present(vma, user_vaddr, page_to_pfn(page)); + __flush_cache_page(vma, user_vaddr, PFN_PHYS(page_to_pfn(page))); memcpy(dst, src, len); - flush_kernel_dcache_range_asm((unsigned long)dst, (unsigned long)dst + len); + flush_kernel_dcache_page_addr(PTR_PAGE_ALIGN_DOWN(dst)); }
void copy_from_user_page(struct vm_area_struct *vma, struct page *page, unsigned long user_vaddr, void *dst, void *src, int len) { - flush_cache_page_if_present(vma, user_vaddr, page_to_pfn(page)); + __flush_cache_page(vma, user_vaddr, PFN_PHYS(page_to_pfn(page))); memcpy(dst, src, len); + flush_kernel_dcache_page_addr(PTR_PAGE_ALIGN_DOWN(src)); }
/* __flush_tlb_range() @@ -681,32 +708,10 @@ int __flush_tlb_range(unsigned long sid,
static void flush_cache_pages(struct vm_area_struct *vma, unsigned long start, unsigned long end) { - unsigned long addr, pfn; - pte_t *ptep; + unsigned long addr;
- for (addr = start; addr < end; addr += PAGE_SIZE) { - bool needs_flush = false; - /* - * The vma can contain pages that aren't present. Although - * the pte search is expensive, we need the pte to find the - * page pfn and to check whether the page should be flushed. - */ - ptep = get_ptep(vma->vm_mm, addr); - if (ptep) { - needs_flush = pte_needs_flush(*ptep); - pfn = pte_pfn(*ptep); - pte_unmap(ptep); - } - if (needs_flush) { - if (parisc_requires_coherency()) { - flush_user_cache_page(vma, addr); - } else { - if (WARN_ON(!pfn_valid(pfn))) - return; - __flush_cache_page(vma, addr, PFN_PHYS(pfn)); - } - } - } + for (addr = start; addr < end; addr += PAGE_SIZE) + flush_cache_page_if_present(vma, addr); }
static inline unsigned long mm_total_size(struct mm_struct *mm) @@ -757,21 +762,19 @@ void flush_cache_range(struct vm_area_st if (WARN_ON(IS_ENABLED(CONFIG_SMP) && arch_irqs_disabled())) return; flush_tlb_range(vma, start, end); - flush_cache_all(); + if (vma->vm_flags & VM_EXEC) + flush_cache_all(); + else + flush_data_cache(); return; }
- flush_cache_pages(vma, start, end); + flush_cache_pages(vma, start & PAGE_MASK, end); }
void flush_cache_page(struct vm_area_struct *vma, unsigned long vmaddr, unsigned long pfn) { - if (WARN_ON(!pfn_valid(pfn))) - return; - if (parisc_requires_coherency()) - flush_user_cache_page(vma, vmaddr); - else - __flush_cache_page(vma, vmaddr, PFN_PHYS(pfn)); + __flush_cache_page(vma, vmaddr, PFN_PHYS(pfn)); }
void flush_anon_page(struct vm_area_struct *vma, struct page *page, unsigned long vmaddr) @@ -779,34 +782,133 @@ void flush_anon_page(struct vm_area_stru if (!PageAnon(page)) return;
- if (parisc_requires_coherency()) { - if (vma->vm_flags & VM_SHARED) - flush_data_cache(); - else - flush_user_cache_page(vma, vmaddr); + __flush_cache_page(vma, vmaddr, PFN_PHYS(page_to_pfn(page))); +} + +int ptep_clear_flush_young(struct vm_area_struct *vma, unsigned long addr, + pte_t *ptep) +{ + pte_t pte = ptep_get(ptep); + + if (!pte_young(pte)) + return 0; + set_pte(ptep, pte_mkold(pte)); +#if CONFIG_FLUSH_PAGE_ACCESSED + __flush_cache_page(vma, addr, PFN_PHYS(pte_pfn(pte))); +#endif + return 1; +} + +/* + * After a PTE is cleared, we have no way to flush the cache for + * the physical page. On PA8800 and PA8900 processors, these lines + * can cause random cache corruption. Thus, we must flush the cache + * as well as the TLB when clearing a PTE that's valid. + */ +pte_t ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, + pte_t *ptep) +{ + struct mm_struct *mm = (vma)->vm_mm; + pte_t pte = ptep_get_and_clear(mm, addr, ptep); + unsigned long pfn = pte_pfn(pte); + + if (pfn_valid(pfn)) + __flush_cache_page(vma, addr, PFN_PHYS(pfn)); + else if (pte_accessible(mm, pte)) + flush_tlb_page(vma, addr); + + return pte; +} + +/* + * The physical address for pages in the ioremap case can be obtained + * from the vm_struct struct. I wasn't able to successfully handle the + * vmalloc and vmap cases. We have an array of struct page pointers in + * the uninitialized vmalloc case but the flush failed using page_to_pfn. + */ +void flush_cache_vmap(unsigned long start, unsigned long end) +{ + unsigned long addr, physaddr; + struct vm_struct *vm; + + /* Prevent cache move-in */ + flush_tlb_kernel_range(start, end); + + if (end - start >= parisc_cache_flush_threshold) { + flush_cache_all(); return; }
- flush_tlb_page(vma, vmaddr); - preempt_disable(); - flush_dcache_page_asm(page_to_phys(page), vmaddr); - preempt_enable(); + if (WARN_ON_ONCE(!is_vmalloc_addr((void *)start))) { + flush_cache_all(); + return; + } + + vm = find_vm_area((void *)start); + if (WARN_ON_ONCE(!vm)) { + flush_cache_all(); + return; + } + + /* The physical addresses of IOREMAP regions are contiguous */ + if (vm->flags & VM_IOREMAP) { + physaddr = vm->phys_addr; + for (addr = start; addr < end; addr += PAGE_SIZE) { + preempt_disable(); + flush_dcache_page_asm(physaddr, start); + flush_icache_page_asm(physaddr, start); + preempt_enable(); + physaddr += PAGE_SIZE; + } + return; + } + + flush_cache_all(); } +EXPORT_SYMBOL(flush_cache_vmap);
+/* + * The vm_struct has been retired and the page table is set up. The + * last page in the range is a guard page. Its physical address can't + * be determined using lpa, so there is no way to flush the range + * using flush_dcache_page_asm. + */ +void flush_cache_vunmap(unsigned long start, unsigned long end) +{ + /* Prevent cache move-in */ + flush_tlb_kernel_range(start, end); + flush_data_cache(); +} +EXPORT_SYMBOL(flush_cache_vunmap); + +/* + * On systems with PA8800/PA8900 processors, there is no way to flush + * a vmap range other than using the architected loop to flush the + * entire cache. The page directory is not set up, so we can't use + * fdc, etc. FDCE/FICE don't work to flush a portion of the cache. + * L2 is physically indexed but FDCE/FICE instructions in virtual + * mode output their virtual address on the core bus, not their + * real address. As a result, the L2 cache index formed from the + * virtual address will most likely not be the same as the L2 index + * formed from the real address. + */ void flush_kernel_vmap_range(void *vaddr, int size) { unsigned long start = (unsigned long)vaddr; unsigned long end = start + size;
- if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) && - (unsigned long)size >= parisc_cache_flush_threshold) { - flush_tlb_kernel_range(start, end); - flush_data_cache(); + flush_tlb_kernel_range(start, end); + + if (!static_branch_likely(&parisc_has_dcache)) + return; + + /* If interrupts are disabled, we can only do local flush */ + if (WARN_ON(IS_ENABLED(CONFIG_SMP) && arch_irqs_disabled())) { + flush_data_cache_local(NULL); return; }
- flush_kernel_dcache_range_asm(start, end); - flush_tlb_kernel_range(start, end); + flush_data_cache(); } EXPORT_SYMBOL(flush_kernel_vmap_range);
@@ -818,15 +920,18 @@ void invalidate_kernel_vmap_range(void * /* Ensure DMA is complete */ asm_syncdma();
- if ((!IS_ENABLED(CONFIG_SMP) || !arch_irqs_disabled()) && - (unsigned long)size >= parisc_cache_flush_threshold) { - flush_tlb_kernel_range(start, end); - flush_data_cache(); + flush_tlb_kernel_range(start, end); + + if (!static_branch_likely(&parisc_has_dcache)) + return; + + /* If interrupts are disabled, we can only do local flush */ + if (WARN_ON(IS_ENABLED(CONFIG_SMP) && arch_irqs_disabled())) { + flush_data_cache_local(NULL); return; }
- purge_kernel_dcache_range_asm(start, end); - flush_tlb_kernel_range(start, end); + flush_data_cache(); } EXPORT_SYMBOL(invalidate_kernel_vmap_range);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yazen Ghannam yazen.ghannam@amd.com
commit fe8a08973a0dea9757394c5adbdc3c0a03b0b432 upstream.
Apply the SID bits to the correct offset in the Bank value. Do this in the temporary value so they don't need to be masked off later.
Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support") Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-1-2f11547a178c@amd.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ras/amd/atl/umc.c | 9 ++------- 1 file changed, 2 insertions(+), 7 deletions(-)
diff --git a/drivers/ras/amd/atl/umc.c b/drivers/ras/amd/atl/umc.c index 59b6169093f7..5cb92330dc67 100644 --- a/drivers/ras/amd/atl/umc.c +++ b/drivers/ras/amd/atl/umc.c @@ -189,16 +189,11 @@ static unsigned long convert_dram_to_norm_addr_mi300(unsigned long addr)
/* Calculate hash for PC bit. */ if (addr_hash.pc.xor_enable) { - /* Bits SID[1:0] act as Bank[6:5] for PC hash, so apply them here. */ - bank |= sid << 5; - temp = bitwise_xor_bits(col & addr_hash.pc.col_xor); temp ^= bitwise_xor_bits(row & addr_hash.pc.row_xor); - temp ^= bitwise_xor_bits(bank & addr_hash.bank_xor); + /* Bits SID[1:0] act as Bank[5:4] for PC hash, so apply them here. */ + temp ^= bitwise_xor_bits((bank | sid << NUM_BANK_BITS) & addr_hash.bank_xor); pc ^= temp; - - /* Drop SID bits for the sake of debug printing later. */ - bank &= 0x1F; }
/* Reconstruct the normalized address starting with NA[4:0] = 0 */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yazen Ghannam yazen.ghannam@amd.com
commit ba437905b4fbf0ee1686c175069239a1cc292558 upstream.
The currently used normalized address format is not applicable to all MI300 systems. This leads to incorrect results during address translation.
Drop the fixed layout and construct the normalized address from system settings.
Fixes: 87a612375307 ("RAS/AMD/ATL: Add MI300 DRAM to normalized address translation support") Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240607-mi300-dram-xl-fix-v1-2-2f11547a178c@amd.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ras/amd/atl/internal.h | 2 drivers/ras/amd/atl/system.c | 2 drivers/ras/amd/atl/umc.c | 157 ++++++++++++++++++++++++++++++----------- 3 files changed, 117 insertions(+), 44 deletions(-)
--- a/drivers/ras/amd/atl/internal.h +++ b/drivers/ras/amd/atl/internal.h @@ -224,7 +224,7 @@ int df_indirect_read_broadcast(u16 node,
int get_df_system_info(void); int determine_node_id(struct addr_ctx *ctx, u8 socket_num, u8 die_num); -int get_addr_hash_mi300(void); +int get_umc_info_mi300(void);
int get_address_map(struct addr_ctx *ctx);
--- a/drivers/ras/amd/atl/system.c +++ b/drivers/ras/amd/atl/system.c @@ -127,7 +127,7 @@ static int df4_determine_df_rev(u32 reg) if (reg == DF_FUNC0_ID_MI300) { df_cfg.flags.heterogeneous = 1;
- if (get_addr_hash_mi300()) + if (get_umc_info_mi300()) return -EINVAL; }
--- a/drivers/ras/amd/atl/umc.c +++ b/drivers/ras/amd/atl/umc.c @@ -68,6 +68,8 @@ struct xor_bits { };
#define NUM_BANK_BITS 4 +#define NUM_COL_BITS 5 +#define NUM_SID_BITS 2
static struct { /* UMC::CH::AddrHashBank */ @@ -80,7 +82,22 @@ static struct { u8 bank_xor; } addr_hash;
+static struct { + u8 bank[NUM_BANK_BITS]; + u8 col[NUM_COL_BITS]; + u8 sid[NUM_SID_BITS]; + u8 num_row_lo; + u8 num_row_hi; + u8 row_lo; + u8 row_hi; + u8 pc; +} bit_shifts; + #define MI300_UMC_CH_BASE 0x90000 +#define MI300_ADDR_CFG (MI300_UMC_CH_BASE + 0x30) +#define MI300_ADDR_SEL (MI300_UMC_CH_BASE + 0x40) +#define MI300_COL_SEL_LO (MI300_UMC_CH_BASE + 0x50) +#define MI300_ADDR_SEL_2 (MI300_UMC_CH_BASE + 0xA4) #define MI300_ADDR_HASH_BANK0 (MI300_UMC_CH_BASE + 0xC8) #define MI300_ADDR_HASH_PC (MI300_UMC_CH_BASE + 0xE0) #define MI300_ADDR_HASH_PC2 (MI300_UMC_CH_BASE + 0xE4) @@ -90,17 +107,42 @@ static struct { #define ADDR_HASH_ROW_XOR GENMASK(31, 14) #define ADDR_HASH_BANK_XOR GENMASK(5, 0)
+#define ADDR_CFG_NUM_ROW_LO GENMASK(11, 8) +#define ADDR_CFG_NUM_ROW_HI GENMASK(15, 12) + +#define ADDR_SEL_BANK0 GENMASK(3, 0) +#define ADDR_SEL_BANK1 GENMASK(7, 4) +#define ADDR_SEL_BANK2 GENMASK(11, 8) +#define ADDR_SEL_BANK3 GENMASK(15, 12) +#define ADDR_SEL_BANK4 GENMASK(20, 16) +#define ADDR_SEL_ROW_LO GENMASK(27, 24) +#define ADDR_SEL_ROW_HI GENMASK(31, 28) + +#define COL_SEL_LO_COL0 GENMASK(3, 0) +#define COL_SEL_LO_COL1 GENMASK(7, 4) +#define COL_SEL_LO_COL2 GENMASK(11, 8) +#define COL_SEL_LO_COL3 GENMASK(15, 12) +#define COL_SEL_LO_COL4 GENMASK(19, 16) + +#define ADDR_SEL_2_BANK5 GENMASK(4, 0) +#define ADDR_SEL_2_CHAN GENMASK(15, 12) + /* * Read UMC::CH::AddrHash{Bank,PC,PC2} registers to get XOR bits used - * for hashing. Do this during module init, since the values will not - * change during run time. + * for hashing. + * + * Also, read UMC::CH::Addr{Cfg,Sel,Sel2} and UMC::CH:ColSelLo registers to + * get the values needed to reconstruct the normalized address. Apply additional + * offsets to the raw register values, as needed. + * + * Do this during module init, since the values will not change during run time. * * These registers are instantiated for each UMC across each AMD Node. * However, they should be identically programmed due to the fixed hardware * design of MI300 systems. So read the values from Node 0 UMC 0 and keep a * single global structure for simplicity. */ -int get_addr_hash_mi300(void) +int get_umc_info_mi300(void) { u32 temp; int ret; @@ -130,6 +172,44 @@ int get_addr_hash_mi300(void)
addr_hash.bank_xor = FIELD_GET(ADDR_HASH_BANK_XOR, temp);
+ ret = amd_smn_read(0, MI300_ADDR_CFG, &temp); + if (ret) + return ret; + + bit_shifts.num_row_hi = FIELD_GET(ADDR_CFG_NUM_ROW_HI, temp); + bit_shifts.num_row_lo = 10 + FIELD_GET(ADDR_CFG_NUM_ROW_LO, temp); + + ret = amd_smn_read(0, MI300_ADDR_SEL, &temp); + if (ret) + return ret; + + bit_shifts.bank[0] = 5 + FIELD_GET(ADDR_SEL_BANK0, temp); + bit_shifts.bank[1] = 5 + FIELD_GET(ADDR_SEL_BANK1, temp); + bit_shifts.bank[2] = 5 + FIELD_GET(ADDR_SEL_BANK2, temp); + bit_shifts.bank[3] = 5 + FIELD_GET(ADDR_SEL_BANK3, temp); + /* Use BankBit4 for the SID0 position. */ + bit_shifts.sid[0] = 5 + FIELD_GET(ADDR_SEL_BANK4, temp); + bit_shifts.row_lo = 12 + FIELD_GET(ADDR_SEL_ROW_LO, temp); + bit_shifts.row_hi = 24 + FIELD_GET(ADDR_SEL_ROW_HI, temp); + + ret = amd_smn_read(0, MI300_COL_SEL_LO, &temp); + if (ret) + return ret; + + bit_shifts.col[0] = 2 + FIELD_GET(COL_SEL_LO_COL0, temp); + bit_shifts.col[1] = 2 + FIELD_GET(COL_SEL_LO_COL1, temp); + bit_shifts.col[2] = 2 + FIELD_GET(COL_SEL_LO_COL2, temp); + bit_shifts.col[3] = 2 + FIELD_GET(COL_SEL_LO_COL3, temp); + bit_shifts.col[4] = 2 + FIELD_GET(COL_SEL_LO_COL4, temp); + + ret = amd_smn_read(0, MI300_ADDR_SEL_2, &temp); + if (ret) + return ret; + + /* Use BankBit5 for the SID1 position. */ + bit_shifts.sid[1] = 5 + FIELD_GET(ADDR_SEL_2_BANK5, temp); + bit_shifts.pc = 5 + FIELD_GET(ADDR_SEL_2_CHAN, temp); + return 0; }
@@ -146,9 +226,6 @@ int get_addr_hash_mi300(void) * The MCA address format is as follows: * MCA_ADDR[27:0] = {S[1:0], P[0], R[14:0], B[3:0], C[4:0], Z[0]} * - * The normalized address format is fixed in hardware and is as follows: - * NA[30:0] = {S[1:0], R[13:0], C4, B[1:0], B[3:2], C[3:2], P, C[1:0], Z[4:0]} - * * Additionally, the PC and Bank bits may be hashed. This must be accounted for before * reconstructing the normalized address. */ @@ -158,18 +235,10 @@ int get_addr_hash_mi300(void) #define MI300_UMC_MCA_PC BIT(25) #define MI300_UMC_MCA_SID GENMASK(27, 26)
-#define MI300_NA_COL_1_0 GENMASK(6, 5) -#define MI300_NA_PC BIT(7) -#define MI300_NA_COL_3_2 GENMASK(9, 8) -#define MI300_NA_BANK_3_2 GENMASK(11, 10) -#define MI300_NA_BANK_1_0 GENMASK(13, 12) -#define MI300_NA_COL_4 BIT(14) -#define MI300_NA_ROW GENMASK(28, 15) -#define MI300_NA_SID GENMASK(30, 29) - static unsigned long convert_dram_to_norm_addr_mi300(unsigned long addr) { - u16 i, col, row, bank, pc, sid, temp; + u16 i, col, row, bank, pc, sid; + u32 temp;
col = FIELD_GET(MI300_UMC_MCA_COL, addr); bank = FIELD_GET(MI300_UMC_MCA_BANK, addr); @@ -199,34 +268,38 @@ static unsigned long convert_dram_to_nor /* Reconstruct the normalized address starting with NA[4:0] = 0 */ addr = 0;
- /* NA[6:5] = Column[1:0] */ - temp = col & 0x3; - addr |= FIELD_PREP(MI300_NA_COL_1_0, temp); - - /* NA[7] = PC */ - addr |= FIELD_PREP(MI300_NA_PC, pc); - - /* NA[9:8] = Column[3:2] */ - temp = (col >> 2) & 0x3; - addr |= FIELD_PREP(MI300_NA_COL_3_2, temp); - - /* NA[11:10] = Bank[3:2] */ - temp = (bank >> 2) & 0x3; - addr |= FIELD_PREP(MI300_NA_BANK_3_2, temp); - - /* NA[13:12] = Bank[1:0] */ - temp = bank & 0x3; - addr |= FIELD_PREP(MI300_NA_BANK_1_0, temp); - - /* NA[14] = Column[4] */ - temp = (col >> 4) & 0x1; - addr |= FIELD_PREP(MI300_NA_COL_4, temp); + /* Column bits */ + for (i = 0; i < NUM_COL_BITS; i++) { + temp = (col >> i) & 0x1; + addr |= temp << bit_shifts.col[i]; + }
- /* NA[28:15] = Row[13:0] */ - addr |= FIELD_PREP(MI300_NA_ROW, row); + /* Bank bits */ + for (i = 0; i < NUM_BANK_BITS; i++) { + temp = (bank >> i) & 0x1; + addr |= temp << bit_shifts.bank[i]; + } + + /* Row lo bits */ + for (i = 0; i < bit_shifts.num_row_lo; i++) { + temp = (row >> i) & 0x1; + addr |= temp << (i + bit_shifts.row_lo); + }
- /* NA[30:29] = SID[1:0] */ - addr |= FIELD_PREP(MI300_NA_SID, sid); + /* Row hi bits */ + for (i = 0; i < bit_shifts.num_row_hi; i++) { + temp = (row >> (i + bit_shifts.num_row_lo)) & 0x1; + addr |= temp << (i + bit_shifts.row_hi); + } + + /* PC bit */ + addr |= pc << bit_shifts.pc; + + /* SID bits */ + for (i = 0; i < NUM_SID_BITS; i++) { + temp = (sid >> i) & 0x1; + addr |= temp << bit_shifts.sid[i]; + }
pr_debug("Addr=0x%016lx", addr); pr_debug("Bank=%u Row=%u Column=%u PC=%u SID=%u", bank, row, col, pc, sid);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit e79a10652bbd320649da705ca1ea0c04351af403 upstream.
A Rembrandt-based HP thin client is reported to have problems where the NVME disk isn't present after resume from s2idle.
This is because the NVME disk wasn't put into D3 at suspend, and that happened because the StorageD3Enable _DSD was missing in the BIOS.
As AMD's architecture requires that the NVME is in D3 for s2idle, adjust the criteria for force_storage_d3 to match *all* Zen SoCs when the FADT advertises low power idle support.
This will ensure that any future products with this BIOS deficiency don't need to be added to the allow list of overrides.
Cc: All applicable stable@vger.kernel.org Signed-off-by: Mario Limonciello mario.limonciello@amd.com Acked-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/x86/utils.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-)
--- a/drivers/acpi/x86/utils.c +++ b/drivers/acpi/x86/utils.c @@ -197,16 +197,16 @@ bool acpi_device_override_status(struct }
/* - * AMD systems from Renoir and Lucienne *require* that the NVME controller + * AMD systems from Renoir onwards *require* that the NVME controller * is put into D3 over a Modern Standby / suspend-to-idle cycle. * * This is "typically" accomplished using the `StorageD3Enable` * property in the _DSD that is checked via the `acpi_storage_d3` function - * but this property was introduced after many of these systems launched - * and most OEM systems don't have it in their BIOS. + * but some OEM systems still don't have it in their BIOS. * * The Microsoft documentation for StorageD3Enable mentioned that Windows has - * a hardcoded allowlist for D3 support, which was used for these platforms. + * a hardcoded allowlist for D3 support as well as a registry key to override + * the BIOS, which has been used for these cases. * * This allows quirking on Linux in a similar fashion. * @@ -219,19 +219,15 @@ bool acpi_device_override_status(struct * https://bugzilla.kernel.org/show_bug.cgi?id=216773 * https://bugzilla.kernel.org/show_bug.cgi?id=217003 * 2) On at least one HP system StorageD3Enable is missing on the second NVME - disk in the system. + * disk in the system. + * 3) On at least one HP Rembrandt system StorageD3Enable is missing on the only + * NVME device. */ -static const struct x86_cpu_id storage_d3_cpu_ids[] = { - X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 24, NULL), /* Picasso */ - X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 96, NULL), /* Renoir */ - X86_MATCH_VENDOR_FAM_MODEL(AMD, 23, 104, NULL), /* Lucienne */ - X86_MATCH_VENDOR_FAM_MODEL(AMD, 25, 80, NULL), /* Cezanne */ - {} -}; - bool force_storage_d3(void) { - return x86_match_cpu(storage_d3_cpu_ids); + if (!cpu_feature_enabled(X86_FEATURE_ZEN)) + return false; + return acpi_gbl_FADT.flags & ACPI_FADT_LOW_POWER_S0; }
/*
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 7f18bd49cb6b6a3ab6d860fefccdc94f2a247db0 upstream.
It is reported that commit 950210887670 ("thermal: core: Drop trips_disabled bitmask") causes the maximum frequency of CPUs to drop further down with every system sleep-wake cycle on Intel Core i7-4710HQ.
This turns out to be due to a trip point whose temperature is equal to 0 degrees Celsius which is acted on every time the system wakes from sleep.
Before commit 950210887670 this trip point would be disabled wia the trips_disabled bitmask, but now it is treated as a valid one.
Since ACPI thermal control is generally about protection against overheating, trip points with temperature of 0 centigrade or below are not particularly useful there, so initialize them all as invalid which fixes the problem at hand.
Fixes: 950210887670 ("thermal: core: Drop trips_disabled bitmask") Closes: https://lore.kernel.org/linux-pm/3f71747b-f852-4ee0-b384-cf46b2aefa3f@gmx.co... Reported-by: Tibor Billes tbilles@gmx.com Tested-by: Tibor Billes tbilles@gmx.com Cc: 6.7+ stable@vger.kernel.org # 6.7+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/thermal.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/drivers/acpi/thermal.c +++ b/drivers/acpi/thermal.c @@ -168,11 +168,17 @@ static int acpi_thermal_get_polling_freq
static int acpi_thermal_temp(struct acpi_thermal *tz, int temp_deci_k) { + int temp; + if (temp_deci_k == THERMAL_TEMP_INVALID) return THERMAL_TEMP_INVALID;
- return deci_kelvin_to_millicelsius_with_offset(temp_deci_k, + temp = deci_kelvin_to_millicelsius_with_offset(temp_deci_k, tz->kelvin_offset); + if (temp <= 0) + return THERMAL_TEMP_INVALID; + + return temp; }
static bool acpi_thermal_trip_valid(struct acpi_thermal_trip *acpi_trip)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich jbeulich@suse.com
commit 3ac36aa7307363b7247ccb6f6a804e11496b2b36 upstream.
memblock_set_node() warns about using MAX_NUMNODES, see
e0eec24e2e19 ("memblock: make memblock_set_node() also warn about use of MAX_NUMNODES")
for details.
Reported-by: Narasimhan V Narasimhan.V@amd.com Signed-off-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org [bp: commit message] Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Mike Rapoport (IBM) rppt@kernel.org Tested-by: Paul E. McKenney paulmck@kernel.org Link: https://lore.kernel.org/r/20240603141005.23261-1-bp@kernel.org Link: https://lore.kernel.org/r/abadb736-a239-49e4-ab42-ace7acdd4278@suse.com Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/mm/numa.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/arch/x86/mm/numa.c +++ b/arch/x86/mm/numa.c @@ -493,7 +493,7 @@ static void __init numa_clear_kernel_nod for_each_reserved_mem_region(mb_region) { int nid = memblock_get_region_node(mb_region);
- if (nid != MAX_NUMNODES) + if (nid != NUMA_NO_NODE) node_set(nid, reserved_nodemask); }
@@ -614,9 +614,9 @@ static int __init numa_init(int (*init_f nodes_clear(node_online_map); memset(&numa_meminfo, 0, sizeof(numa_meminfo)); WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.memory, - MAX_NUMNODES)); + NUMA_NO_NODE)); WARN_ON(memblock_set_node(0, ULLONG_MAX, &memblock.reserved, - MAX_NUMNODES)); + NUMA_NO_NODE)); /* In case that parsing SRAT failed. */ WARN_ON(memblock_clear_hotplug(0, ULLONG_MAX)); numa_reset_distance();
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jan Beulich jbeulich@suse.com
commit e0eec24e2e199873f43df99ec39773ad3af2bff7 upstream.
On an (old) x86 system with SRAT just covering space above 4Gb:
ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0xfffffffff] hotplug
the commit referenced below leads to this NUMA configuration no longer being refused by a CONFIG_NUMA=y kernel (previously
NUMA: nodes only cover 6144MB of your 8185MB e820 RAM. Not used. No NUMA configuration found Faking a node at [mem 0x0000000000000000-0x000000027fffffff]
was seen in the log directly after the message quoted above), because of memblock_validate_numa_coverage() checking for NUMA_NO_NODE (only). This in turn led to memblock_alloc_range_nid()'s warning about MAX_NUMNODES triggering, followed by a NULL deref in memmap_init() when trying to access node 64's (NODE_SHIFT=6) node data.
To compensate said change, make memblock_set_node() warn on and adjust a passed in value of MAX_NUMNODES, just like various other functions already do.
Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware") Signed-off-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1c8a058c-5365-4f27-a9f1-3aeb7fb3e7b2@suse.com Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memblock.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/mm/memblock.c +++ b/mm/memblock.c @@ -1339,6 +1339,10 @@ int __init_memblock memblock_set_node(ph int start_rgn, end_rgn; int i, ret;
+ if (WARN_ONCE(nid == MAX_NUMNODES, + "Usage of MAX_NUMNODES is deprecated. Use NUMA_NO_NODE instead\n")) + nid = NUMA_NO_NODE; + ret = memblock_isolate_range(type, base, size, &start_rgn, &end_rgn); if (ret) return ret;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jani Nikula jani.nikula@intel.com
commit 38e3825631b1f314b21e3ade00b5a4d737eb054e upstream.
The duplicated EDID is never freed. Fix it.
Cc: stable@vger.kernel.org Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Inki Dae inki.dae@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/exynos/exynos_drm_vidi.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/exynos/exynos_drm_vidi.c +++ b/drivers/gpu/drm/exynos/exynos_drm_vidi.c @@ -309,6 +309,7 @@ static int vidi_get_modes(struct drm_con struct vidi_context *ctx = ctx_from_connector(connector); struct edid *edid; int edid_len; + int count;
/* * the edid data comes from user side and it would be set @@ -328,7 +329,11 @@ static int vidi_get_modes(struct drm_con
drm_connector_update_edid_property(connector, edid);
- return drm_add_edid_modes(connector, edid); + count = drm_add_edid_modes(connector, edid); + + kfree(edid); + + return count; }
static const struct drm_connector_helper_funcs vidi_connector_helper_funcs = {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marek Szyprowski m.szyprowski@samsung.com
commit 799d4b392417ed6889030a5b2335ccb6dcf030ab upstream.
When reading EDID fails and driver reports no modes available, the DRM core adds an artificial 1024x786 mode to the connector. Unfortunately some variants of the Exynos HDMI (like the one in Exynos4 SoCs) are not able to drive such mode, so report a safe 640x480 mode instead of nothing in case of the EDID reading failure.
This fixes the following issue observed on Trats2 board since commit 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()"):
[drm] Exynos DRM: using 11c00000.fimd device for DMA mapping operations exynos-drm exynos-drm: bound 11c00000.fimd (ops fimd_component_ops) exynos-drm exynos-drm: bound 12c10000.mixer (ops mixer_component_ops) exynos-dsi 11c80000.dsi: [drm:samsung_dsim_host_attach] Attached s6e8aa0 device (lanes:4 bpp:24 mode-flags:0x10b) exynos-drm exynos-drm: bound 11c80000.dsi (ops exynos_dsi_component_ops) exynos-drm exynos-drm: bound 12d00000.hdmi (ops hdmi_component_ops) [drm] Initialized exynos 1.1.0 20180330 for exynos-drm on minor 1 exynos-hdmi 12d00000.hdmi: [drm:hdmiphy_enable.part.0] *ERROR* PLL could not reach steady state panel-samsung-s6e8aa0 11c80000.dsi.0: ID: 0xa2, 0x20, 0x8c exynos-mixer 12c10000.mixer: timeout waiting for VSYNC ------------[ cut here ]------------ WARNING: CPU: 1 PID: 11 at drivers/gpu/drm/drm_atomic_helper.c:1682 drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8 [CRTC:70:crtc-1] vblank wait timed out Modules linked in: CPU: 1 PID: 11 Comm: kworker/u16:0 Not tainted 6.9.0-rc5-next-20240424 #14913 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events_unbound deferred_probe_work_func Call trace: unwind_backtrace from show_stack+0x10/0x14 show_stack from dump_stack_lvl+0x68/0x88 dump_stack_lvl from __warn+0x7c/0x1c4 __warn from warn_slowpath_fmt+0x11c/0x1a8 warn_slowpath_fmt from drm_atomic_helper_wait_for_vblanks.part.0+0x2b0/0x2b8 drm_atomic_helper_wait_for_vblanks.part.0 from drm_atomic_helper_commit_tail_rpm+0x7c/0x8c drm_atomic_helper_commit_tail_rpm from commit_tail+0x9c/0x184 commit_tail from drm_atomic_helper_commit+0x168/0x190 drm_atomic_helper_commit from drm_atomic_commit+0xb4/0xe0 drm_atomic_commit from drm_client_modeset_commit_atomic+0x23c/0x27c drm_client_modeset_commit_atomic from drm_client_modeset_commit_locked+0x60/0x1cc drm_client_modeset_commit_locked from drm_client_modeset_commit+0x24/0x40 drm_client_modeset_commit from __drm_fb_helper_restore_fbdev_mode_unlocked+0x9c/0xc4 __drm_fb_helper_restore_fbdev_mode_unlocked from drm_fb_helper_set_par+0x2c/0x3c drm_fb_helper_set_par from fbcon_init+0x3d8/0x550 fbcon_init from visual_init+0xc0/0x108 visual_init from do_bind_con_driver+0x1b8/0x3a4 do_bind_con_driver from do_take_over_console+0x140/0x1ec do_take_over_console from do_fbcon_takeover+0x70/0xd0 do_fbcon_takeover from fbcon_fb_registered+0x19c/0x1ac fbcon_fb_registered from register_framebuffer+0x190/0x21c register_framebuffer from __drm_fb_helper_initial_config_and_unlock+0x350/0x574 __drm_fb_helper_initial_config_and_unlock from exynos_drm_fbdev_client_hotplug+0x6c/0xb0 exynos_drm_fbdev_client_hotplug from drm_client_register+0x58/0x94 drm_client_register from exynos_drm_bind+0x160/0x190 exynos_drm_bind from try_to_bring_up_aggregate_device+0x200/0x2d8 try_to_bring_up_aggregate_device from __component_add+0xb0/0x170 __component_add from mixer_probe+0x74/0xcc mixer_probe from platform_probe+0x5c/0xb8 platform_probe from really_probe+0xe0/0x3d8 really_probe from __driver_probe_device+0x9c/0x1e4 __driver_probe_device from driver_probe_device+0x30/0xc0 driver_probe_device from __device_attach_driver+0xa8/0x120 __device_attach_driver from bus_for_each_drv+0x80/0xcc bus_for_each_drv from __device_attach+0xac/0x1fc __device_attach from bus_probe_device+0x8c/0x90 bus_probe_device from deferred_probe_work_func+0x98/0xe0 deferred_probe_work_func from process_one_work+0x240/0x6d0 process_one_work from worker_thread+0x1a0/0x3f4 worker_thread from kthread+0x104/0x138 kthread from ret_from_fork+0x14/0x28 Exception stack(0xf0895fb0 to 0xf0895ff8) ... irq event stamp: 82357 hardirqs last enabled at (82363): [<c01a96e8>] vprintk_emit+0x308/0x33c hardirqs last disabled at (82368): [<c01a969c>] vprintk_emit+0x2bc/0x33c softirqs last enabled at (81614): [<c0101644>] __do_softirq+0x320/0x500 softirqs last disabled at (81609): [<c012dfe0>] __irq_exit_rcu+0x130/0x184 ---[ end trace 0000000000000000 ]--- exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [CRTC:70:crtc-1] commit wait timed out exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [CONNECTOR:74:HDMI-A-1] commit wait timed out exynos-drm exynos-drm: [drm] *ERROR* flip_done timed out exynos-drm exynos-drm: [drm] *ERROR* [PLANE:56:plane-5] commit wait timed out exynos-mixer 12c10000.mixer: timeout waiting for VSYNC
Cc: stable@vger.kernel.org Fixes: 13d5b040363c ("drm/exynos: do not return negative values from .get_modes()") Signed-off-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Inki Dae inki.dae@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/exynos/exynos_hdmi.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/exynos/exynos_hdmi.c +++ b/drivers/gpu/drm/exynos/exynos_hdmi.c @@ -887,11 +887,11 @@ static int hdmi_get_modes(struct drm_con int ret;
if (!hdata->ddc_adpt) - return 0; + goto no_edid;
edid = drm_get_edid(connector, hdata->ddc_adpt); if (!edid) - return 0; + goto no_edid;
hdata->dvi_mode = !connector->display_info.is_hdmi; DRM_DEV_DEBUG_KMS(hdata->dev, "%s : width[%d] x height[%d]\n", @@ -906,6 +906,9 @@ static int hdmi_get_modes(struct drm_con kfree(edid);
return ret; + +no_edid: + return drm_add_modes_noedid(connector, 640, 480); }
static int hdmi_find_phy_conf(struct hdmi_context *hdata, u32 pixel_clock)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
commit 8031b58c3a9b1db3ef68b3bd749fbee2e1e1aaa3 upstream.
This is strictly related to commit fb7a0d334894 ("mptcp: ensure snd_nxt is properly initialized on connect"). It turns out that syzkaller can trigger the retransmit after fallback and before processing any other incoming packet - so that snd_una is still left uninitialized.
Address the issue explicitly initializing snd_una together with snd_nxt and write_seq.
Suggested-by: Mat Martineau martineau@kernel.org Fixes: 8fd738049ac3 ("mptcp: fallback in case of simultaneous connect") Cc: stable@vger.kernel.org Reported-by: Christoph Paasch cpaasch@apple.com Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/485 Signed-off-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-1-1ab... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/protocol.c | 1 + 1 file changed, 1 insertion(+)
--- a/net/mptcp/protocol.c +++ b/net/mptcp/protocol.c @@ -3731,6 +3731,7 @@ static int mptcp_connect(struct sock *sk
WRITE_ONCE(msk->write_seq, subflow->idsn); WRITE_ONCE(msk->snd_nxt, subflow->idsn); + WRITE_ONCE(msk->snd_una, subflow->idsn); if (likely(!__mptcp_check_fallback(msk))) MPTCP_INC_STATS(sock_net(sk), MPTCP_MIB_MPCAPABLEACTIVE);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: YonglongLi liyonglong@chinatelecom.cn
commit 6a09788c1a66e3d8b04b3b3e7618cc817bb60ae9 upstream.
The RmAddr MIB counter is supposed to be incremented once when a valid RM_ADDR has been received. Before this patch, it could have been incremented as many times as the number of subflows connected to the linked address ID, so it could have been 0, 1 or more than 1.
The "RmSubflow" is incremented after a local operation. In this case, it is normal to tied it with the number of subflows that have been actually removed.
The "remove invalid addresses" MP Join subtest has been modified to validate this case. A broadcast IP address is now used instead: the client will not be able to create a subflow to this address. The consequence is that when receiving the RM_ADDR with the ID attached to this broadcast IP address, no subflow linked to this ID will be found.
Fixes: 7a7e52e38a40 ("mptcp: add RM_ADDR related mibs") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: YonglongLi liyonglong@chinatelecom.cn Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-2-1ab... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 5 ++++- tools/testing/selftests/net/mptcp/mptcp_join.sh | 3 ++- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -813,10 +813,13 @@ static void mptcp_pm_nl_rm_addr_or_subfl spin_lock_bh(&msk->pm.lock);
removed = true; - __MPTCP_INC_STATS(sock_net(sk), rm_type); + if (rm_type == MPTCP_MIB_RMSUBFLOW) + __MPTCP_INC_STATS(sock_net(sk), rm_type); } if (rm_type == MPTCP_MIB_RMSUBFLOW) __set_bit(rm_id ? rm_id : msk->mpc_endpoint_id, msk->pm.id_avail_bitmap); + else if (rm_type == MPTCP_MIB_RMADDR) + __MPTCP_INC_STATS(sock_net(sk), rm_type); if (!removed) continue;
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2359,7 +2359,8 @@ remove_tests() pm_nl_set_limits $ns1 3 3 pm_nl_add_endpoint $ns1 10.0.12.1 flags signal pm_nl_add_endpoint $ns1 10.0.3.1 flags signal - pm_nl_add_endpoint $ns1 10.0.14.1 flags signal + # broadcast IP: no packet for this address will be received on ns1 + pm_nl_add_endpoint $ns1 224.0.0.1 flags signal pm_nl_set_limits $ns2 3 3 addr_nr_ns1=-3 speed=10 \ run_tests $ns1 $ns2 10.0.1.1
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: YonglongLi liyonglong@chinatelecom.cn
commit 40eec1795cc27b076d49236649a29507c7ed8c2d upstream.
The creation of new subflows can fail for different reasons. If no subflow have been created using the received ADD_ADDR, the related counters should not be updated, otherwise they will never be decremented for events related to this ID later on.
For the moment, the number of accepted ADD_ADDR is only decremented upon the reception of a related RM_ADDR, and only if the remote address ID is currently being used by at least one subflow. In other words, if no subflow can be created with the received address, the counter will not be decremented. In this case, it is then important not to increment pm.add_addr_accepted counter, and not to modify pm.accept_addr bit.
Note that this patch does not modify the behaviour in case of failures later on, e.g. if the MP Join is dropped or rejected.
The "remove invalid addresses" MP Join subtest has been modified to validate this case. The broadcast IP address is added before the "valid" address that will be used to successfully create a subflow, and the limit is decreased by one: without this patch, it was not possible to create the last subflow, because:
- the broadcast address would have been accepted even if it was not usable: the creation of a subflow to this address results in an error,
- the limit of 2 accepted ADD_ADDR would have then been reached.
Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Co-developed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Signed-off-by: YonglongLi liyonglong@chinatelecom.cn Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240607-upstream-net-20240607-misc-fixes-v1-3-1ab... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 16 ++++++++++------ tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 ++-- 2 files changed, 12 insertions(+), 8 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -676,6 +676,7 @@ static void mptcp_pm_nl_add_addr_receive unsigned int add_addr_accept_max; struct mptcp_addr_info remote; unsigned int subflows_max; + bool sf_created = false; int i, nr;
add_addr_accept_max = mptcp_pm_get_add_addr_accept_max(msk); @@ -703,15 +704,18 @@ static void mptcp_pm_nl_add_addr_receive if (nr == 0) return;
- msk->pm.add_addr_accepted++; - if (msk->pm.add_addr_accepted >= add_addr_accept_max || - msk->pm.subflows >= subflows_max) - WRITE_ONCE(msk->pm.accept_addr, false); - spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) - __mptcp_subflow_connect(sk, &addrs[i], &remote); + if (__mptcp_subflow_connect(sk, &addrs[i], &remote) == 0) + sf_created = true; spin_lock_bh(&msk->pm.lock); + + if (sf_created) { + msk->pm.add_addr_accepted++; + if (msk->pm.add_addr_accepted >= add_addr_accept_max || + msk->pm.subflows >= subflows_max) + WRITE_ONCE(msk->pm.accept_addr, false); + } }
void mptcp_pm_nl_addr_send_ack(struct mptcp_sock *msk) --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2358,10 +2358,10 @@ remove_tests() if reset "remove invalid addresses"; then pm_nl_set_limits $ns1 3 3 pm_nl_add_endpoint $ns1 10.0.12.1 flags signal - pm_nl_add_endpoint $ns1 10.0.3.1 flags signal # broadcast IP: no packet for this address will be received on ns1 pm_nl_add_endpoint $ns1 224.0.0.1 flags signal - pm_nl_set_limits $ns2 3 3 + pm_nl_add_endpoint $ns1 10.0.3.1 flags signal + pm_nl_set_limits $ns2 2 2 addr_nr_ns1=-3 speed=10 \ run_tests $ns1 $ns2 10.0.1.1 chk_join_nr 1 1 1
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
commit e306a894bd511804ba9db7c00ca9cc05b55df1f2 upstream.
Now that the PLIC uses a platform driver, the driver is probed later in the boot process, where interrupts from peripherals might already be pending.
As a result, plic_handle_irq() may be called as early as the call to irq_set_chained_handler() completes. But this call happens before the per-context handler is completely set up, so there is a window where plic_handle_irq() can see incomplete per-context state and crash.
Avoid this by delaying the call to irq_set_chained_handler() until all handlers from all PLICs are initialized.
Fixes: 8ec99b033147 ("irqchip/sifive-plic: Convert PLIC driver into a platform driver") Reported-by: Geert Uytterhoeven geert@linux-m68k.org Signed-off-by: Samuel Holland samuel.holland@sifive.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Geert Uytterhoeven geert+renesas@glider.be Reviewed-by: Anup Patel anup@brainfault.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240529215458.937817-1-samuel.holland@sifive.com Closes: https://lore.kernel.org/r/CAMuHMdVYFFR7K5SbHBLY-JHhb7YpgGMS_hnRWm8H0KD-wBo+4... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-sifive-plic.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-)
--- a/drivers/irqchip/irq-sifive-plic.c +++ b/drivers/irqchip/irq-sifive-plic.c @@ -85,7 +85,7 @@ struct plic_handler { struct plic_priv *priv; }; static int plic_parent_irq __ro_after_init; -static bool plic_cpuhp_setup_done __ro_after_init; +static bool plic_global_setup_done __ro_after_init; static DEFINE_PER_CPU(struct plic_handler, plic_handlers);
static int plic_irq_set_type(struct irq_data *d, unsigned int type); @@ -490,10 +490,8 @@ static int plic_probe(struct platform_de unsigned long plic_quirks = 0; struct plic_handler *handler; u32 nr_irqs, parent_hwirq; - struct irq_domain *domain; struct plic_priv *priv; irq_hw_number_t hwirq; - bool cpuhp_setup;
if (is_of_node(dev->fwnode)) { const struct of_device_id *id; @@ -552,14 +550,6 @@ static int plic_probe(struct platform_de continue; }
- /* Find parent domain and register chained handler */ - domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(), DOMAIN_BUS_ANY); - if (!plic_parent_irq && domain) { - plic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT); - if (plic_parent_irq) - irq_set_chained_handler(plic_parent_irq, plic_handle_irq); - } - /* * When running in M-mode we need to ignore the S-mode handler. * Here we assume it always comes later, but that might be a @@ -600,25 +590,35 @@ done: goto fail_cleanup_contexts;
/* - * We can have multiple PLIC instances so setup cpuhp state + * We can have multiple PLIC instances so setup global state * and register syscore operations only once after context * handlers of all online CPUs are initialized. */ - if (!plic_cpuhp_setup_done) { - cpuhp_setup = true; + if (!plic_global_setup_done) { + struct irq_domain *domain; + bool global_setup = true; + for_each_online_cpu(cpu) { handler = per_cpu_ptr(&plic_handlers, cpu); if (!handler->present) { - cpuhp_setup = false; + global_setup = false; break; } } - if (cpuhp_setup) { + + if (global_setup) { + /* Find parent domain and register chained handler */ + domain = irq_find_matching_fwnode(riscv_get_intc_hwnode(), DOMAIN_BUS_ANY); + if (domain) + plic_parent_irq = irq_create_mapping(domain, RV_IRQ_EXT); + if (plic_parent_irq) + irq_set_chained_handler(plic_parent_irq, plic_handle_irq); + cpuhp_setup_state(CPUHP_AP_IRQ_SIFIVE_PLIC_STARTING, "irqchip/sifive/plic:starting", plic_starting_cpu, plic_dying_cpu); register_syscore_ops(&plic_irq_syscore_ops); - plic_cpuhp_setup_done = true; + plic_global_setup_done = true; } }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hagar Hemdan hagarhem@amazon.com
commit b97e8a2f7130a4b30d1502003095833d16c028b3 upstream.
its_vlpi_prop_update() calls lpi_write_config() which obtains the mapping information for a VLPI without lock held. So it could race with its_vlpi_unmap().
Since all calls from its_irq_set_vcpu_affinity() require the same lock to be held, hoist the locking there instead of sprinkling the locking all over the place.
This bug was discovered using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc.
[ tglx: Use guard() instead of goto ]
Fixes: 015ec0386ab6 ("irqchip/gic-v3-its: Add VLPI configuration handling") Suggested-by: Marc Zyngier maz@kernel.org Signed-off-by: Hagar Hemdan hagarhem@amazon.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20240531162144.28650-1-hagarhem@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/irqchip/irq-gic-v3-its.c | 44 ++++++++++----------------------------- 1 file changed, 12 insertions(+), 32 deletions(-)
--- a/drivers/irqchip/irq-gic-v3-its.c +++ b/drivers/irqchip/irq-gic-v3-its.c @@ -1846,28 +1846,22 @@ static int its_vlpi_map(struct irq_data { struct its_device *its_dev = irq_data_get_irq_chip_data(d); u32 event = its_get_event_id(d); - int ret = 0;
if (!info->map) return -EINVAL;
- raw_spin_lock(&its_dev->event_map.vlpi_lock); - if (!its_dev->event_map.vm) { struct its_vlpi_map *maps;
maps = kcalloc(its_dev->event_map.nr_lpis, sizeof(*maps), GFP_ATOMIC); - if (!maps) { - ret = -ENOMEM; - goto out; - } + if (!maps) + return -ENOMEM;
its_dev->event_map.vm = info->map->vm; its_dev->event_map.vlpi_maps = maps; } else if (its_dev->event_map.vm != info->map->vm) { - ret = -EINVAL; - goto out; + return -EINVAL; }
/* Get our private copy of the mapping information */ @@ -1899,46 +1893,32 @@ static int its_vlpi_map(struct irq_data its_dev->event_map.nr_vlpis++; }
-out: - raw_spin_unlock(&its_dev->event_map.vlpi_lock); - return ret; + return 0; }
static int its_vlpi_get(struct irq_data *d, struct its_cmd_info *info) { struct its_device *its_dev = irq_data_get_irq_chip_data(d); struct its_vlpi_map *map; - int ret = 0; - - raw_spin_lock(&its_dev->event_map.vlpi_lock);
map = get_vlpi_map(d);
- if (!its_dev->event_map.vm || !map) { - ret = -EINVAL; - goto out; - } + if (!its_dev->event_map.vm || !map) + return -EINVAL;
/* Copy our mapping information to the incoming request */ *info->map = *map;
-out: - raw_spin_unlock(&its_dev->event_map.vlpi_lock); - return ret; + return 0; }
static int its_vlpi_unmap(struct irq_data *d) { struct its_device *its_dev = irq_data_get_irq_chip_data(d); u32 event = its_get_event_id(d); - int ret = 0;
- raw_spin_lock(&its_dev->event_map.vlpi_lock); - - if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) { - ret = -EINVAL; - goto out; - } + if (!its_dev->event_map.vm || !irqd_is_forwarded_to_vcpu(d)) + return -EINVAL;
/* Drop the virtual mapping */ its_send_discard(its_dev, event); @@ -1962,9 +1942,7 @@ static int its_vlpi_unmap(struct irq_dat kfree(its_dev->event_map.vlpi_maps); }
-out: - raw_spin_unlock(&its_dev->event_map.vlpi_lock); - return ret; + return 0; }
static int its_vlpi_prop_update(struct irq_data *d, struct its_cmd_info *info) @@ -1992,6 +1970,8 @@ static int its_irq_set_vcpu_affinity(str if (!is_v4(its_dev->its)) return -EINVAL;
+ guard(raw_spinlock_irq)(&its_dev->event_map.vlpi_lock); + /* Unmap request? */ if (!info) return its_vlpi_unmap(d);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Kaplan david.kaplan@amd.com
commit 93c1800b3799f17375989b0daf76497dd3e80922 upstream.
The call to cc_platform_has() triggers a fault and system crash if call depth tracking is active because the GS segment has been reset by load_segments() and GS_BASE is now 0 but call depth tracking uses per-CPU variables to operate.
Call cc_platform_has() earlier in the function when GS is still valid.
[ bp: Massage. ]
Fixes: 5d8213864ade ("x86/retbleed: Add SKL return thunk") Signed-off-by: David Kaplan david.kaplan@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Tom Lendacky thomas.lendacky@amd.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240603083036.637-1-bp@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/machine_kexec_64.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/machine_kexec_64.c +++ b/arch/x86/kernel/machine_kexec_64.c @@ -295,8 +295,15 @@ void machine_kexec_cleanup(struct kimage void machine_kexec(struct kimage *image) { unsigned long page_list[PAGES_NR]; - void *control_page; + unsigned int host_mem_enc_active; int save_ftrace_enabled; + void *control_page; + + /* + * This must be done before load_segments() since if call depth tracking + * is used then GS must be valid to make any function calls. + */ + host_mem_enc_active = cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT);
#ifdef CONFIG_KEXEC_JUMP if (image->preserve_context) @@ -358,7 +365,7 @@ void machine_kexec(struct kimage *image) (unsigned long)page_list, image->start, image->preserve_context, - cc_platform_has(CC_ATTR_HOST_MEM_ENCRYPT)); + host_mem_enc_active);
#ifdef CONFIG_KEXEC_JUMP if (image->preserve_context)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yazen Ghannam yazen.ghannam@amd.com
commit c625dabbf1c4a8e77e4734014f2fde7aa9071a1f upstream.
AMD Zen-based systems use a System Management Network (SMN) that provides access to implementation-specific registers.
SMN accesses are done indirectly through an index/data pair in PCI config space. The PCI config access may fail and return an error code. This would prevent the "read" value from being updated.
However, the PCI config access may succeed, but the return value may be invalid. This is in similar fashion to PCI bad reads, i.e. return all bits set.
Most systems will return 0 for SMN addresses that are not accessible. This is in line with AMD convention that unavailable registers are Read-as-Zero/Writes-Ignored.
However, some systems will return a "PCI Error Response" instead. This value, along with an error code of 0 from the PCI config access, will confuse callers of the amd_smn_read() function.
Check for this condition, clear the return value, and set a proper error code.
Fixes: ddfe43cdc0da ("x86/amd_nb: Add SMN and Indirect Data Fabric access for AMD Fam17h") Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20230403164244.471141-1-yazen.ghannam@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/amd_nb.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/arch/x86/kernel/amd_nb.c +++ b/arch/x86/kernel/amd_nb.c @@ -215,7 +215,14 @@ out:
int amd_smn_read(u16 node, u32 address, u32 *value) { - return __amd_smn_rw(node, address, value, false); + int err = __amd_smn_rw(node, address, value, false); + + if (PCI_POSSIBLE_ERROR(*value)) { + err = -ENODEV; + *value = 0; + } + + return err; } EXPORT_SYMBOL_GPL(amd_smn_read);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haifeng Xu haifeng.xu@shopee.com
commit 74751ef5c1912ebd3e65c3b65f45587e05ce5d36 upstream.
In our production environment, we found many hung tasks which are blocked for more than 18 hours. Their call traces are like this:
[346278.191038] __schedule+0x2d8/0x890 [346278.191046] schedule+0x4e/0xb0 [346278.191049] perf_event_free_task+0x220/0x270 [346278.191056] ? init_wait_var_entry+0x50/0x50 [346278.191060] copy_process+0x663/0x18d0 [346278.191068] kernel_clone+0x9d/0x3d0 [346278.191072] __do_sys_clone+0x5d/0x80 [346278.191076] __x64_sys_clone+0x25/0x30 [346278.191079] do_syscall_64+0x5c/0xc0 [346278.191083] ? syscall_exit_to_user_mode+0x27/0x50 [346278.191086] ? do_syscall_64+0x69/0xc0 [346278.191088] ? irqentry_exit_to_user_mode+0x9/0x20 [346278.191092] ? irqentry_exit+0x19/0x30 [346278.191095] ? exc_page_fault+0x89/0x160 [346278.191097] ? asm_exc_page_fault+0x8/0x30 [346278.191102] entry_SYSCALL_64_after_hwframe+0x44/0xae
The task was waiting for the refcount become to 1, but from the vmcore, we found the refcount has already been 1. It seems that the task didn't get woken up by perf_event_release_kernel() and got stuck forever. The below scenario may cause the problem.
Thread A Thread B ... ... perf_event_free_task perf_event_release_kernel ... acquire event->child_mutex ... get_ctx ... release event->child_mutex acquire ctx->mutex ... perf_free_event (acquire/release event->child_mutex) ... release ctx->mutex wait_var_event acquire ctx->mutex acquire event->child_mutex # move existing events to free_list release event->child_mutex release ctx->mutex put_ctx ... ...
In this case, all events of the ctx have been freed, so we couldn't find the ctx in free_list and Thread A will miss the wakeup. It's thus necessary to add a wakeup after dropping the reference.
Fixes: 1cf8dfe8a661 ("perf/core: Fix race between close() and fork()") Signed-off-by: Haifeng Xu haifeng.xu@shopee.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Frederic Weisbecker frederic@kernel.org Acked-by: Mark Rutland mark.rutland@arm.com Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240513103948.33570-1-haifeng.xu@shopee.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/events/core.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -5365,6 +5365,7 @@ int perf_event_release_kernel(struct per again: mutex_lock(&event->child_mutex); list_for_each_entry(child, &event->child_list, child_list) { + void *var = NULL;
/* * Cannot change, child events are not migrated, see the @@ -5405,11 +5406,23 @@ again: * this can't be the last reference. */ put_event(event); + } else { + var = &ctx->refcount; }
mutex_unlock(&event->child_mutex); mutex_unlock(&ctx->mutex); put_ctx(ctx); + + if (var) { + /* + * If perf_event_free_task() has deleted all events from the + * ctx while the child_mutex got released above, make sure to + * notify about the preceding put_ctx(). + */ + smp_mb(); /* pairs with wait_var_event() */ + wake_up_var(var); + } goto again; } mutex_unlock(&event->child_mutex);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit bb69c912c4e8005cf1ee6c63782d2fc28838dee2 upstream.
If the --itrace option is used more than once, the options are combined, but "i" and "y" (sub-)options can be corrupted because itrace_do_parse_synth_opts() incorrectly overwrites the period type and period with default values.
For example, with:
--itrace=i0ns --itrace=e
The processing of "--itrace=e", resets the "i" period from 0 nanoseconds to the default 100 microseconds.
Fix by performing the default setting of period type and period only if "i" or "y" are present in the currently processed --itrace value.
Fixes: f6986c95af84ff2a ("perf session: Add instruction tracing options") Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/util/auxtrace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/tools/perf/util/auxtrace.c +++ b/tools/perf/util/auxtrace.c @@ -1466,6 +1466,7 @@ int itrace_do_parse_synth_opts(struct it char *endptr; bool period_type_set = false; bool period_set = false; + bool iy = false;
synth_opts->set = true;
@@ -1484,6 +1485,7 @@ int itrace_do_parse_synth_opts(struct it switch (*p++) { case 'i': case 'y': + iy = true; if (p[-1] == 'y') synth_opts->cycles = true; else @@ -1649,7 +1651,7 @@ int itrace_do_parse_synth_opts(struct it } } out: - if (synth_opts->instructions || synth_opts->cycles) { + if (iy) { if (!period_type_set) synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Adrian Hunter adrian.hunter@intel.com
commit d4a98b45fbe6d06f4b79ed90d0bb05ced8674c23 upstream.
The trace could be misleading if trace errors are not taken into account, so display them also by adding the itrace "e" option.
Note --call-trace and --call-ret-trace already add the itrace "e" option.
Fixes: b585ebdb5912cf14 ("perf script: Add --insn-trace for instruction decoding") Reviewed-by: Andi Kleen ak@linux.intel.com Signed-off-by: Adrian Hunter adrian.hunter@intel.com Cc: Andi Kleen ak@linux.intel.com Cc: Ian Rogers irogers@google.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240315071334.3478-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/perf/builtin-script.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3806,7 +3806,7 @@ static int parse_insn_trace(const struct if (ret < 0) return ret;
- itrace_parse_synth_opts(opt, "i0ns", 0); + itrace_parse_synth_opts(opt, "i0nse", 0); symbol_conf.nanosecs = true; return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit 4dc3a3893dae5a7f73e5809273aca0f1f3548d55 upstream.
Validate that the HE operation element has the correct length before parsing it.
Cc: stable@vger.kernel.org Fixes: 645f3d85129d ("wifi: cfg80211: handle UHB AP and STA power type") Reviewed-by: Miriam Rachel Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240523120533.677025eb4a92.I44c091029ef113c294e8fe8b9bf8... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/wireless/scan.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/wireless/scan.c b/net/wireless/scan.c index 127853877a0a..8daed8232b05 100644 --- a/net/wireless/scan.c +++ b/net/wireless/scan.c @@ -2128,7 +2128,8 @@ static bool cfg80211_6ghz_power_type_valid(const u8 *ie, size_t ielen, struct ieee80211_he_operation *he_oper;
tmp = cfg80211_find_ext_elem(WLAN_EID_EXT_HE_OPERATION, ie, ielen); - if (tmp && tmp->datalen >= sizeof(*he_oper) + 1) { + if (tmp && tmp->datalen >= sizeof(*he_oper) + 1 && + tmp->datalen >= ieee80211_he_oper_size(tmp->data + 1)) { const struct ieee80211_he_6ghz_oper *he_6ghz_oper;
he_oper = (void *)&tmp->data[1];
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bitterblue Smith rtl8821cerfe2@gmail.com
commit 819bda58e77bb67974f94dc1aa11b0556b6f6889 upstream.
Since commit 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers") ieee80211_hw_config() is no longer called with changed = ~0. rtlwifi relied on ~0 in order to ignore the default retry limits of 4/7, preferring 48/48 in station mode and 7/7 in AP/IBSS.
RTL8192DU has a lot of packet loss with the default limits from mac80211. Fix it by ignoring IEEE80211_CONF_CHANGE_RETRY_LIMITS completely, because it's the simplest solution.
Link: https://lore.kernel.org/linux-wireless/cedd13d7691f4692b2a2fa5a24d44a22@real... Cc: stable@vger.kernel.org # 6.9.x Signed-off-by: Bitterblue Smith rtl8821cerfe2@gmail.com Acked-by: Ping-Ke Shih pkshih@realtek.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://msgid.link/1fabb8e4-adf3-47ae-8462-8aea963bc2a5@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/realtek/rtlwifi/core.c | 15 --------------- 1 file changed, 15 deletions(-)
diff --git a/drivers/net/wireless/realtek/rtlwifi/core.c b/drivers/net/wireless/realtek/rtlwifi/core.c index 2e60a6991ca1..42b7db12b1bd 100644 --- a/drivers/net/wireless/realtek/rtlwifi/core.c +++ b/drivers/net/wireless/realtek/rtlwifi/core.c @@ -633,21 +633,6 @@ static int rtl_op_config(struct ieee80211_hw *hw, u32 changed) } }
- if (changed & IEEE80211_CONF_CHANGE_RETRY_LIMITS) { - rtl_dbg(rtlpriv, COMP_MAC80211, DBG_LOUD, - "IEEE80211_CONF_CHANGE_RETRY_LIMITS %x\n", - hw->conf.long_frame_max_tx_count); - /* brought up everything changes (changed == ~0) indicates first - * open, so use our default value instead of that of wiphy. - */ - if (changed != ~0) { - mac->retry_long = hw->conf.long_frame_max_tx_count; - mac->retry_short = hw->conf.long_frame_max_tx_count; - rtlpriv->cfg->ops->set_hw_reg(hw, HW_VAR_RETRY_LIMIT, - (u8 *)(&hw->conf.long_frame_max_tx_count)); - } - } - if (changed & IEEE80211_CONF_CHANGE_CHANNEL && !rtlpriv->proximity.proxim_on) { struct ieee80211_channel *channel = hw->conf.chandef.chan;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
commit 40cecacabc460f5074398753feb9ed7d43e8dfa6 upstream.
Here's another one I missed during the initial conversion, fix that.
Cc: stable@vger.kernel.org Reported-by: Rene Petersen renepetersen@posteo.de Fixes: 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx drivers") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218895 Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://msgid.link/20240528142308.3f7db1821e68.I531135d7ad76331a50244d6d5288... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/mediatek/mt76/mt7615/main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/wireless/mediatek/mt76/mt7615/main.c b/drivers/net/wireless/mediatek/mt76/mt7615/main.c index 0971c164b57e..c27acaf0eb1c 100644 --- a/drivers/net/wireless/mediatek/mt76/mt7615/main.c +++ b/drivers/net/wireless/mediatek/mt76/mt7615/main.c @@ -1326,6 +1326,10 @@ static void mt7615_set_rekey_data(struct ieee80211_hw *hw, #endif /* CONFIG_PM */
const struct ieee80211_ops mt7615_ops = { + .add_chanctx = ieee80211_emulate_add_chanctx, + .remove_chanctx = ieee80211_emulate_remove_chanctx, + .change_chanctx = ieee80211_emulate_change_chanctx, + .switch_vif_chanctx = ieee80211_emulate_switch_vif_chanctx, .tx = mt7615_tx, .start = mt7615_start, .stop = mt7615_stop,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carlos Llamas cmllamas@google.com
commit f92a59f6d12e31ead999fee9585471b95a8ae8a3 upstream.
For ${atomic}_sub_and_test() the @i parameter is the value to subtract, not add. Fix the typo in the kerneldoc template and generate the headers with this update.
Fixes: ad8110706f38 ("locking/atomic: scripts: generate kerneldoc comments") Suggested-by: Mark Rutland mark.rutland@arm.com Signed-off-by: Carlos Llamas cmllamas@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Acked-by: Mark Rutland mark.rutland@arm.com Reviewed-by: Kees Cook keescook@chromium.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240515133844.3502360-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/atomic/atomic-arch-fallback.h | 6 +++--- include/linux/atomic/atomic-instrumented.h | 8 ++++---- include/linux/atomic/atomic-long.h | 4 ++-- scripts/atomic/kerneldoc/sub_and_test | 2 +- 4 files changed, 10 insertions(+), 10 deletions(-)
--- a/include/linux/atomic/atomic-arch-fallback.h +++ b/include/linux/atomic/atomic-arch-fallback.h @@ -2242,7 +2242,7 @@ raw_atomic_try_cmpxchg_relaxed(atomic_t
/** * raw_atomic_sub_and_test() - atomic subtract and test if zero with full ordering - * @i: int value to add + * @i: int value to subtract * @v: pointer to atomic_t * * Atomically updates @v to (@v - @i) with full ordering. @@ -4368,7 +4368,7 @@ raw_atomic64_try_cmpxchg_relaxed(atomic6
/** * raw_atomic64_sub_and_test() - atomic subtract and test if zero with full ordering - * @i: s64 value to add + * @i: s64 value to subtract * @v: pointer to atomic64_t * * Atomically updates @v to (@v - @i) with full ordering. @@ -4690,4 +4690,4 @@ raw_atomic64_dec_if_positive(atomic64_t }
#endif /* _LINUX_ATOMIC_FALLBACK_H */ -// 14850c0b0db20c62fdc78ccd1d42b98b88d76331 +// b565db590afeeff0d7c9485ccbca5bb6e155749f --- a/include/linux/atomic/atomic-instrumented.h +++ b/include/linux/atomic/atomic-instrumented.h @@ -1349,7 +1349,7 @@ atomic_try_cmpxchg_relaxed(atomic_t *v,
/** * atomic_sub_and_test() - atomic subtract and test if zero with full ordering - * @i: int value to add + * @i: int value to subtract * @v: pointer to atomic_t * * Atomically updates @v to (@v - @i) with full ordering. @@ -2927,7 +2927,7 @@ atomic64_try_cmpxchg_relaxed(atomic64_t
/** * atomic64_sub_and_test() - atomic subtract and test if zero with full ordering - * @i: s64 value to add + * @i: s64 value to subtract * @v: pointer to atomic64_t * * Atomically updates @v to (@v - @i) with full ordering. @@ -4505,7 +4505,7 @@ atomic_long_try_cmpxchg_relaxed(atomic_l
/** * atomic_long_sub_and_test() - atomic subtract and test if zero with full ordering - * @i: long value to add + * @i: long value to subtract * @v: pointer to atomic_long_t * * Atomically updates @v to (@v - @i) with full ordering. @@ -5050,4 +5050,4 @@ atomic_long_dec_if_positive(atomic_long_
#endif /* _LINUX_ATOMIC_INSTRUMENTED_H */ -// ce5b65e0f1f8a276268b667194581d24bed219d4 +// 8829b337928e9508259079d32581775ececd415b --- a/include/linux/atomic/atomic-long.h +++ b/include/linux/atomic/atomic-long.h @@ -1535,7 +1535,7 @@ raw_atomic_long_try_cmpxchg_relaxed(atom
/** * raw_atomic_long_sub_and_test() - atomic subtract and test if zero with full ordering - * @i: long value to add + * @i: long value to subtract * @v: pointer to atomic_long_t * * Atomically updates @v to (@v - @i) with full ordering. @@ -1809,4 +1809,4 @@ raw_atomic_long_dec_if_positive(atomic_l }
#endif /* _LINUX_ATOMIC_LONG_H */ -// 1c4a26fc77f345342953770ebe3c4d08e7ce2f9a +// eadf183c3600b8b92b91839dd3be6bcc560c752d --- a/scripts/atomic/kerneldoc/sub_and_test +++ b/scripts/atomic/kerneldoc/sub_and_test @@ -1,7 +1,7 @@ cat <<EOF /** * ${class}${atomicname}() - atomic subtract and test if zero with ${desc_order} ordering - * @i: ${int} value to add + * @i: ${int} value to subtract * @v: pointer to ${atomic}_t * * Atomically updates @v to (@v - @i) with ${desc_order} ordering.
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nam Cao namcao@linutronix.de
commit 994af1825a2aa286f4903ff64a1c7378b52defe6 upstream.
On riscv32, it is possible for the last page in virtual address space (0xfffff000) to be allocated. This page overlaps with PTR_ERR, so that shouldn't happen.
There is already some code to ensure memblock won't allocate the last page. However, buddy allocator is left unchecked.
Fix this by reserving physical memory that would be mapped at virtual addresses greater than 0xfffff000.
Reported-by: Björn Töpel bjorn@kernel.org Closes: https://lore.kernel.org/linux-riscv/878r1ibpdn.fsf@all.your.base.are.belong.... Fixes: 76d2a0493a17 ("RISC-V: Init and Halt Code") Signed-off-by: Nam Cao namcao@linutronix.de Cc: stable@vger.kernel.org Tested-by: Björn Töpel bjorn@rivosinc.com Reviewed-by: Björn Töpel bjorn@rivosinc.com Reviewed-by: Mike Rapoport (IBM) rppt@kernel.org Link: https://lore.kernel.org/r/20240425115201.3044202-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/init.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-)
--- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -235,18 +235,19 @@ static void __init setup_bootmem(void) kernel_map.va_pa_offset = PAGE_OFFSET - phys_ram_base;
/* - * memblock allocator is not aware of the fact that last 4K bytes of - * the addressable memory can not be mapped because of IS_ERR_VALUE - * macro. Make sure that last 4k bytes are not usable by memblock - * if end of dram is equal to maximum addressable memory. For 64-bit - * kernel, this problem can't happen here as the end of the virtual - * address space is occupied by the kernel mapping then this check must - * be done as soon as the kernel mapping base address is determined. + * Reserve physical address space that would be mapped to virtual + * addresses greater than (void *)(-PAGE_SIZE) because: + * - This memory would overlap with ERR_PTR + * - This memory belongs to high memory, which is not supported + * + * This is not applicable to 64-bit kernel, because virtual addresses + * after (void *)(-PAGE_SIZE) are not linearly mapped: they are + * occupied by kernel mapping. Also it is unrealistic for high memory + * to exist on 64-bit platforms. */ if (!IS_ENABLED(CONFIG_64BIT)) { - max_mapped_addr = __pa(~(ulong)0); - if (max_mapped_addr == (phys_ram_end - 1)) - memblock_set_current_limit(max_mapped_addr - 4096); + max_mapped_addr = __va_to_pa_nodebug(-PAGE_SIZE); + memblock_reserve(max_mapped_addr, (phys_addr_t)-max_mapped_addr); }
min_low_pfn = PFN_UP(phys_ram_base);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 23a4b108accc29a6125ed14de4a044689ffeda78 upstream.
The kprobe_eventname.tc test checks if a function with .isra. can have a kprobe attached to it. It loops through the kallsyms file for all the functions that have the .isra. name, and checks if it exists in the available_filter_functions file, and if it does, it uses it to attach a kprobe to it.
The issue is that kprobes can not attach to functions that are listed more than once in available_filter_functions. With the latest kernel, the function that is found is: rapl_event_update.isra.0
# grep rapl_event_update.isra.0 /sys/kernel/tracing/available_filter_functions rapl_event_update.isra.0 rapl_event_update.isra.0
It is listed twice. This causes the attached kprobe to it to fail which in turn fails the test. Instead of just picking the function function that is found in available_filter_functions, pick the first one that is listed only once in available_filter_functions.
Cc: stable@vger.kernel.org Fixes: 604e3548236d ("selftests/ftrace: Select an existing function in kprobe_eventname test") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc +++ b/tools/testing/selftests/ftrace/test.d/kprobe/kprobe_eventname.tc @@ -30,7 +30,8 @@ find_dot_func() { fi
grep " [tT] .*.isra..*" /proc/kallsyms | cut -f 3 -d " " | while read f; do - if grep -s $f available_filter_functions; then + cnt=`grep -s $f available_filter_functions | wc -l`; + if [ $cnt -eq 1 ]; then echo $f break fi
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthias Maennich maennich@google.com
commit 3bd27a847a3a4827a948387cc8f0dbc9fa5931d5 upstream.
Build environments might be running with different umask settings resulting in indeterministic file modes for the files contained in kheaders.tar.xz. The file itself is served with 444, i.e. world readable. Archive the files explicitly with 744,a+X to improve reproducibility across build environments.
--mode=0444 is not suitable as directories need to be executable. Also, 444 makes it hard to delete all the readonly files after extraction.
Cc: stable@vger.kernel.org Signed-off-by: Matthias Maennich maennich@google.com Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/gen_kheaders.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/gen_kheaders.sh +++ b/kernel/gen_kheaders.sh @@ -89,7 +89,7 @@ find $cpio_dir -type f -print0 |
# Create archive and try to normalize metadata for reproducibility. tar "${KBUILD_BUILD_TIMESTAMP:+--mtime=$KBUILD_BUILD_TIMESTAMP}" \ - --owner=0 --group=0 --sort=name --numeric-owner \ + --owner=0 --group=0 --sort=name --numeric-owner --mode=u=rw,go=r,a+X \ -I $XZ -cf $tarfile -C $cpio_dir/ . > /dev/null
echo $headers_md5 > kernel/kheaders.md5
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Damien Le Moal dlemoal@kernel.org
commit 233e27b4d21c3e44eb863f03e566d3a22e81a7ae upstream.
When changing the maximum number of open zones, print that number instead of the total number of zones.
Fixes: dc4d137ee3b7 ("null_blk: add support for max open/active zone limit for zoned devices") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal dlemoal@kernel.org Reviewed-by: Niklas Cassel cassel@kernel.org Link: https://lore.kernel.org/r/20240528062852.437599-1-dlemoal@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/null_blk/zoned.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/block/null_blk/zoned.c +++ b/drivers/block/null_blk/zoned.c @@ -113,7 +113,7 @@ int null_init_zoned_dev(struct nullb_dev if (dev->zone_max_active && dev->zone_max_open > dev->zone_max_active) { dev->zone_max_open = dev->zone_max_active; pr_info("changed the maximum number of open zones to %u\n", - dev->nr_zones); + dev->zone_max_open); } else if (dev->zone_max_open >= dev->nr_zones - dev->zone_nr_conv) { dev->zone_max_open = 0; pr_info("zone_max_open limit disabled, limit >= zone count\n");
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Nader dev@kayoway.com
commit 9e2f46cd87473c70d01fcaf8a559809e6d18dd50 upstream.
Commit b8b8b4e0c052 ("ata: ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list") added Intel Alder Lake to the ahci_pci_tbl.
Because of the way that the Intel PCS quirk was implemented, having an explicit entry in the ahci_pci_tbl caused the Intel PCS quirk to be applied. (The quirk was not being applied if there was no explict entry.)
Thus, entries that were added to the ahci_pci_tbl also got the Intel PCS quirk applied.
The quirk was cleaned up in commit 7edbb6059274 ("ahci: clean up intel_pcs_quirk"), such that it is clear which entries that actually applies the Intel PCS quirk.
Newer Intel AHCI controllers do not need the Intel PCS quirk, and applying it when not needed actually breaks some platforms.
Do not apply the Intel PCS quirk for Intel Alder Lake. This is in line with how things worked before commit b8b8b4e0c052 ("ata: ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list"), such that certain platforms using Intel Alder Lake will work once again.
Cc: stable@vger.kernel.org # 6.7 Fixes: b8b8b4e0c052 ("ata: ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list") Signed-off-by: Jason Nader dev@kayoway.com Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/ahci.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index 6548f10e61d9..07d66d2c5f0d 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -429,7 +429,6 @@ static const struct pci_device_id ahci_pci_tbl[] = { { PCI_VDEVICE(INTEL, 0x02d7), board_ahci_pcs_quirk }, /* Comet Lake PCH RAID */ /* Elkhart Lake IDs 0x4b60 & 0x4b62 https://sata-io.org/product/8803 not tested yet */ { PCI_VDEVICE(INTEL, 0x4b63), board_ahci_pcs_quirk }, /* Elkhart Lake AHCI */ - { PCI_VDEVICE(INTEL, 0x7ae2), board_ahci_pcs_quirk }, /* Alder Lake-P AHCI */
/* JMicron 360/1/3/5/6, match class to avoid IDE function */ { PCI_VENDOR_ID_JMICRON, PCI_ANY_ID, PCI_ANY_ID, PCI_ANY_ID,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niklas Cassel cassel@kernel.org
commit 3cb648c4dd3e8dde800fb3659250ed11f2d9efa5 upstream.
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type") dropped the board_ahci_low_power board type, and instead enables LPM if: -The AHCI controller reports that it supports LPM (Partial/Slumber), and -CONFIG_SATA_MOBILE_LPM_POLICY != 0, and -The port is not defined as external in the per port PxCMD register, and -The port is not defined as hotplug capable in the per port PxCMD register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the drive has to report that it supports DIPM. However, the HBA will reject device requests to enter LPM states which the HBA does not support.
The problem is that Apacer AS340 drives do not handle low power modes correctly. The problem was most likely not seen before because no one had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Tim Teichmann teichmanntim@outlook.de Closes: https://lore.kernel.org/linux-ide/87bk4pbve8.ffs@tglx/ Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/libata-core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4199,6 +4199,9 @@ static const struct ata_blacklist_entry ATA_HORKAGE_ZERO_AFTER_TRIM | ATA_HORKAGE_NOLPM },
+ /* Apacer models with LPM issues */ + { "Apacer AS340*", NULL, ATA_HORKAGE_NOLPM }, + /* These specific Samsung models/firmware-revs do not handle LPM well */ { "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM }, { "SAMSUNG SSD PM830 mSATA *", "CXM13D1Q", ATA_HORKAGE_NOLPM },
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niklas Cassel cassel@kernel.org
commit 86aaa7e9d641c1ad1035ed2df88b8d0b48c86b30 upstream.
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type") dropped the board_ahci_low_power board type, and instead enables LPM if: -The AHCI controller reports that it supports LPM (Partial/Slumber), and -CONFIG_SATA_MOBILE_LPM_POLICY != 0, and -The port is not defined as external in the per port PxCMD register, and -The port is not defined as hotplug capable in the per port PxCMD register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the drive has to report that it supports DIPM. However, the HBA will reject device requests to enter LPM states which the HBA does not support.
The problem is that Crucial CT240BX500SSD1 drives do not handle low power modes correctly. The problem was most likely not seen before because no one had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Aarrayy lp610mh@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832 Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/libata-core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4180,8 +4180,9 @@ static const struct ata_blacklist_entry { "PIONEER BD-RW BDR-207M", NULL, ATA_HORKAGE_NOLPM }, { "PIONEER BD-RW BDR-205", NULL, ATA_HORKAGE_NOLPM },
- /* Crucial BX100 SSD 500GB has broken LPM support */ + /* Crucial devices with broken LPM support */ { "CT500BX100SSD1", NULL, ATA_HORKAGE_NOLPM }, + { "CT240BX500SSD1", NULL, ATA_HORKAGE_NOLPM },
/* 512GB MX100 with MU01 firmware has both queued TRIM and LPM issues */ { "Crucial_CT512MX100*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM |
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niklas Cassel cassel@kernel.org
commit 473880369304cfd4445720cdd8bae4c6f1e16e60 upstream.
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type") dropped the board_ahci_low_power board type, and instead enables LPM if: -The AHCI controller reports that it supports LPM (Partial/Slumber), and -CONFIG_SATA_MOBILE_LPM_POLICY != 0, and -The port is not defined as external in the per port PxCMD register, and -The port is not defined as hotplug capable in the per port PxCMD register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the drive has to report that it supports DIPM. However, the HBA will reject device requests to enter LPM states which the HBA does not support.
The problem is that AMD Radeon S3 SSD drives do not handle low power modes correctly. The problem was most likely not seen before because no one had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see command timeouts if we do (even though the drive claims to support both HIPM and DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Doru Iorgulescu doru.iorgulescu1@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832 Reviewed-by: Mika Westerberg mika.westerberg@linux.intel.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Niklas Cassel cassel@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/libata-core.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/ata/libata-core.c +++ b/drivers/ata/libata-core.c @@ -4203,6 +4203,9 @@ static const struct ata_blacklist_entry /* Apacer models with LPM issues */ { "Apacer AS340*", NULL, ATA_HORKAGE_NOLPM },
+ /* AMD Radeon devices with broken LPM support */ + { "R3SL240G", NULL, ATA_HORKAGE_NOLPM }, + /* These specific Samsung models/firmware-revs do not handle LPM well */ { "SAMSUNG MZMPC128HBFU-000MV", "CXM14M1Q", ATA_HORKAGE_NOLPM }, { "SAMSUNG SSD PM830 mSATA *", "CXM13D1Q", ATA_HORKAGE_NOLPM },
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thadeu Lima de Souza Cascardo cascardo@igalia.com
commit 4b4647add7d3c8530493f7247d11e257ee425bf0 upstream.
sk_psock_get will return NULL if the refcount of psock has gone to 0, which will happen when the last call of sk_psock_put is done. However, sk_psock_drop may not have finished yet, so the close callback will still point to sock_map_close despite psock being NULL.
This can be reproduced with a thread deleting an element from the sock map, while the second one creates a socket, adds it to the map and closes it.
That will trigger the WARN_ON_ONCE:
------------[ cut here ]------------ WARNING: CPU: 1 PID: 7220 at net/core/sock_map.c:1701 sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701 Modules linked in: CPU: 1 PID: 7220 Comm: syz-executor380 Not tainted 6.9.0-syzkaller-07726-g3c999d1ae3c7 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 04/02/2024 RIP: 0010:sock_map_close+0x2a2/0x2d0 net/core/sock_map.c:1701 Code: df e8 92 29 88 f8 48 8b 1b 48 89 d8 48 c1 e8 03 42 80 3c 20 00 74 08 48 89 df e8 79 29 88 f8 4c 8b 23 eb 89 e8 4f 15 23 f8 90 <0f> 0b 90 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d e9 13 26 3d 02 RSP: 0018:ffffc9000441fda8 EFLAGS: 00010293 RAX: ffffffff89731ae1 RBX: ffffffff94b87540 RCX: ffff888029470000 RDX: 0000000000000000 RSI: ffffffff8bcab5c0 RDI: ffffffff8c1faba0 RBP: 0000000000000000 R08: ffffffff92f9b61f R09: 1ffffffff25f36c3 R10: dffffc0000000000 R11: fffffbfff25f36c4 R12: ffffffff89731840 R13: ffff88804b587000 R14: ffff88804b587000 R15: ffffffff89731870 FS: 000055555e080380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 00000000207d4000 CR4: 0000000000350ef0 Call Trace: <TASK> unix_release+0x87/0xc0 net/unix/af_unix.c:1048 __sock_release net/socket.c:659 [inline] sock_close+0xbe/0x240 net/socket.c:1421 __fput+0x42b/0x8a0 fs/file_table.c:422 __do_sys_close fs/open.c:1556 [inline] __se_sys_close fs/open.c:1541 [inline] __x64_sys_close+0x7f/0x110 fs/open.c:1541 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7fb37d618070 Code: 00 00 48 c7 c2 b8 ff ff ff f7 d8 64 89 02 b8 ff ff ff ff eb d4 e8 10 2c 00 00 80 3d 31 f0 07 00 00 74 17 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 48 c3 0f 1f 80 00 00 00 00 48 83 ec 18 89 7c RSP: 002b:00007ffcd4a525d8 EFLAGS: 00000202 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000000000000005 RCX: 00007fb37d618070 RDX: 0000000000000010 RSI: 00000000200001c0 RDI: 0000000000000004 RBP: 0000000000000000 R08: 0000000100000000 R09: 0000000100000000 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 </TASK>
Use sk_psock, which will only check that the pointer is not been set to NULL yet, which should only happen after the callbacks are restored. If, then, a reference can still be gotten, we may call sk_psock_stop and cancel psock->work.
As suggested by Paolo Abeni, reorder the condition so the control flow is less convoluted.
After that change, the reproducer does not trigger the WARN_ON_ONCE anymore.
Suggested-by: Paolo Abeni pabeni@redhat.com Reported-by: syzbot+07a2e4a1a57118ef7355@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=07a2e4a1a57118ef7355 Fixes: aadb2bb83ff7 ("sock_map: Fix a potential use-after-free in sock_map_close()") Fixes: 5b4a79ba65a1 ("bpf, sockmap: Don't let sock_map_{close,destroy,unhash} call itself") Cc: stable@vger.kernel.org Signed-off-by: Thadeu Lima de Souza Cascardo cascardo@igalia.com Acked-by: Jakub Sitnicki jakub@cloudflare.com Link: https://lore.kernel.org/r/20240524144702.1178377-1-cascardo@igalia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/core/sock_map.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-)
--- a/net/core/sock_map.c +++ b/net/core/sock_map.c @@ -1639,19 +1639,23 @@ void sock_map_close(struct sock *sk, lon
lock_sock(sk); rcu_read_lock(); - psock = sk_psock_get(sk); - if (unlikely(!psock)) { - rcu_read_unlock(); - release_sock(sk); - saved_close = READ_ONCE(sk->sk_prot)->close; - } else { + psock = sk_psock(sk); + if (likely(psock)) { saved_close = psock->saved_close; sock_map_remove_links(sk, psock); + psock = sk_psock_get(sk); + if (unlikely(!psock)) + goto no_psock; rcu_read_unlock(); sk_psock_stop(psock); release_sock(sk); cancel_delayed_work_sync(&psock->work); sk_psock_put(sk, psock); + } else { + saved_close = READ_ONCE(sk->sk_prot)->close; +no_psock: + rcu_read_unlock(); + release_sock(sk); }
/* Make sure we do not recurse. This is a bug.
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
commit 6cb05d89fd62a76a9b74bd16211fb0930e89fea8 upstream.
kthread creation may possibly fail inside race_signal_callback(). In such a case stop the already started threads, put the already taken references to them and return with error code.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 2989f6451084 ("dma-buf: Add selftests for dma-fence") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Reviewed-by: T.J. Mercier tjmercier@google.com Link: https://patchwork.freedesktop.org/patch/msgid/20240522181308.841686-1-pchelk... Signed-off-by: Christian König christian.koenig@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma-buf/st-dma-fence.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/dma-buf/st-dma-fence.c +++ b/drivers/dma-buf/st-dma-fence.c @@ -540,6 +540,12 @@ static int race_signal_callback(void *ar t[i].before = pass; t[i].task = kthread_run(thread_signal_callback, &t[i], "dma-fence:%d", i); + if (IS_ERR(t[i].task)) { + ret = PTR_ERR(t[i].task); + while (--i >= 0) + kthread_stop_put(t[i].task); + return ret; + } get_task_struct(t[i].task); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hagar Gamal Halim Hemdan hagarhem@amazon.com
commit 8003f00d895310d409b2bf9ef907c56b42a4e0f4 upstream.
Coverity spotted that event_msg is controlled by user-space, event_msg->event_data.event is passed to event_deliver() and used as an index without sanitization.
This change ensures that the event index is sanitized to mitigate any possibility of speculative information leaks.
This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc.
Only compile tested, no access to HW.
Fixes: 1d990201f9bb ("VMCI: event handling implementation.") Cc: stable stable@kernel.org Signed-off-by: Hagar Gamal Halim Hemdan hagarhem@amazon.com Link: https://lore.kernel.org/stable/20231127193533.46174-1-hagarhem%40amazon.com Link: https://lore.kernel.org/r/20240430085916.4753-1-hagarhem@amazon.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/vmw_vmci/vmci_event.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/misc/vmw_vmci/vmci_event.c +++ b/drivers/misc/vmw_vmci/vmci_event.c @@ -9,6 +9,7 @@ #include <linux/vmw_vmci_api.h> #include <linux/list.h> #include <linux/module.h> +#include <linux/nospec.h> #include <linux/sched.h> #include <linux/slab.h> #include <linux/rculist.h> @@ -86,9 +87,12 @@ static void event_deliver(struct vmci_ev { struct vmci_subscription *cur; struct list_head *subscriber_list; + u32 sanitized_event, max_vmci_event;
rcu_read_lock(); - subscriber_list = &subscriber_array[event_msg->event_data.event]; + max_vmci_event = ARRAY_SIZE(subscriber_array); + sanitized_event = array_index_nospec(event_msg->event_data.event, max_vmci_event); + subscriber_list = &subscriber_array[sanitized_event]; list_for_each_entry_rcu(cur, subscriber_list, node) { cur->callback(cur->id, &event_msg->event_data, cur->callback_data);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vamshi Gajjela vamshigajjela@google.com
commit eda4923d78d634482227c0b189d9b7ca18824146 upstream.
'nr' member of struct spmi_controller, which serves as an identifier for the controller/bus. This value is a dynamic ID assigned in spmi_controller_alloc, and overriding it from the driver results in an ida_free error "ida_free called for id=xx which is not allocated".
Signed-off-by: Vamshi Gajjela vamshigajjela@google.com Fixes: 70f59c90c819 ("staging: spmi: add Hikey 970 SPMI controller driver") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240228185116.1269-1-vamshigajjela@google.com Signed-off-by: Stephen Boyd sboyd@kernel.org Link: https://lore.kernel.org/r/20240507210809.3479953-5-sboyd@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/spmi/hisi-spmi-controller.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/spmi/hisi-spmi-controller.c +++ b/drivers/spmi/hisi-spmi-controller.c @@ -300,7 +300,6 @@ static int spmi_controller_probe(struct
spin_lock_init(&spmi_controller->lock);
- ctrl->nr = spmi_controller->channel; ctrl->dev.parent = pdev->dev.parent; ctrl->dev.of_node = of_node_get(pdev->dev.of_node);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Trond Myklebust trond.myklebust@hammerspace.com
commit e221c45da3770962418fb30c27d941bbc70d595a upstream.
The 'NFS error' NFSERR_OPNOTSUPP is not described by any of the official NFS related RFCs, but appears to have snuck into some older .x files for NFSv2. Either way, it is not in RFC1094, RFC1813 or any of the NFSv4 RFCs, so should not be returned by the knfsd server, and particularly not by the "LOOKUP" operation.
Instead, let's return NFSERR_STALE, which is more appropriate if the filesystem encodes the filehandle as FILEID_INVALID.
Cc: stable@vger.kernel.org Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfsfh.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/nfsd/nfsfh.c +++ b/fs/nfsd/nfsfh.c @@ -573,7 +573,7 @@ fh_compose(struct svc_fh *fhp, struct sv _fh_update(fhp, exp, dentry); if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) { fh_put(fhp); - return nfserr_opnotsupp; + return nfserr_stale; }
return 0; @@ -599,7 +599,7 @@ fh_update(struct svc_fh *fhp)
_fh_update(fhp, fhp->fh_export, dentry); if (fhp->fh_handle.fh_fileid_type == FILEID_INVALID) - return nfserr_opnotsupp; + return nfserr_stale; return 0; out_bad: printk(KERN_ERR "fh_update: fh not verified!\n");
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rik van Riel riel@surriel.com
commit 5cbcb62dddf5346077feb82b7b0c9254222d3445 upstream.
While taking a kernel core dump with makedumpfile on a larger system, softlockup messages often appear.
While softlockup warnings can be harmless, they can also interfere with things like RCU freeing memory, which can be problematic when the kdump kexec image is configured with as little memory as possible.
Avoid the softlockup, and give things like work items and RCU a chance to do their thing during __read_vmcore by adding a cond_resched.
Link: https://lkml.kernel.org/r/20240507091858.36ff767f@imladris.surriel.com Signed-off-by: Rik van Riel riel@surriel.com Acked-by: Baoquan He bhe@redhat.com Cc: Dave Young dyoung@redhat.com Cc: Vivek Goyal vgoyal@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/proc/vmcore.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/proc/vmcore.c +++ b/fs/proc/vmcore.c @@ -383,6 +383,8 @@ static ssize_t __read_vmcore(struct iov_ /* leave now if filled buffer already */ if (!iov_iter_count(iter)) return acc; + + cond_resched(); }
list_for_each_entry(m, &vmcore_list, list) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baoquan He bhe@redhat.com
commit f4af41bf177add167e39e4b0203460b1d0b531f6 upstream.
Jiri reported that the current kexec_dprintk() always prints out debugging message whenever kexec/kdmmp loading is triggered. That is not wanted. The debugging message is supposed to be printed out when 'kexec -s -d' is specified for kexec/kdump loading.
After investigating, the reason is the current kexec_dprintk() takes printk(KERN_INFO) or printk(KERN_DEBUG) depending on whether '-d' is specified. However, distros usually have defaulg log level like below:
[~]# cat /proc/sys/kernel/printk 7 4 1 7
So, even though '-d' is not specified, printk(KERN_DEBUG) also always prints out. I thought printk(KERN_DEBUG) is equal to pr_debug(), it's not.
Fix it by changing to use pr_info() instead which are expected to work.
Link: https://lkml.kernel.org/r/20240409042238.1240462-1-bhe@redhat.com Fixes: cbc2fe9d9cb2 ("kexec_file: add kexec_file flag to control debug printing") Signed-off-by: Baoquan He bhe@redhat.com Reported-by: Jiri Slaby jirislaby@kernel.org Closes: https://lore.kernel.org/all/4c775fca-5def-4a2d-8437-7130b02722a2@kernel.org Reviewed-by: Dave Young dyoung@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/kexec.h | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/include/linux/kexec.h b/include/linux/kexec.h index 060835bb82d5..f31bd304df45 100644 --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -461,10 +461,8 @@ static inline void arch_kexec_pre_free_pages(void *vaddr, unsigned int pages) {
extern bool kexec_file_dbg_print;
-#define kexec_dprintk(fmt, ...) \ - printk("%s" fmt, \ - kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG, \ - ##__VA_ARGS__) +#define kexec_dprintk(fmt, arg...) \ + do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0)
#else /* !CONFIG_KEXEC_CORE */ struct pt_regs;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Yue glass.su@suse.com
commit 8c40984eeb8804cffcd28640f427f4fe829243fc upstream.
transaction id should be updated in ocfs2_unlink and ocfs2_link. Otherwise, inode link will be wrong after journal replay even fsync was called before power failure: ======================================================================= $ touch testdir/bar $ ln testdir/bar testdir/bar_link $ fsync testdir/bar $ stat -c %h $SCRATCH_MNT/testdir/bar 1 $ stat -c %h $SCRATCH_MNT/testdir/bar 1 =======================================================================
Link: https://lkml.kernel.org/r/20240408082041.20925-4-glass.su@suse.com Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Su Yue glass.su@suse.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Joel Becker jlbec@evilplan.org Cc: Jun Piao piaojun@huawei.com Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Mark Fasheh mark@fasheh.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/namei.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -797,6 +797,7 @@ static int ocfs2_link(struct dentry *old ocfs2_set_links_count(fe, inode->i_nlink); fe->i_ctime = cpu_to_le64(inode_get_ctime_sec(inode)); fe->i_ctime_nsec = cpu_to_le32(inode_get_ctime_nsec(inode)); + ocfs2_update_inode_fsync_trans(handle, inode, 0); ocfs2_journal_dirty(handle, fe_bh);
err = ocfs2_add_entry(handle, dentry, inode, @@ -993,6 +994,7 @@ static int ocfs2_unlink(struct inode *di drop_nlink(inode); drop_nlink(inode); ocfs2_set_links_count(fe, inode->i_nlink); + ocfs2_update_inode_fsync_trans(handle, inode, 0); ocfs2_journal_dirty(handle, fe_bh);
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Yue glass.su@suse.com
commit b8cb324277ee16f3eca3055b96fce4735a5a41c6 upstream.
The default atime related mount option is '-o realtime' which means file atime should be updated if atime <= ctime or atime <= mtime. atime should be updated in the following scenario, but it is not: ========================================================== $ rm /mnt/testfile; $ echo test > /mnt/testfile $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 $ sleep 5 $ cat /mnt/testfile > /dev/null $ stat -c "%X %Y %Z" /mnt/testfile 1711881646 1711881646 1711881646 ==========================================================
And the reason the atime in the test is not updated is that ocfs2 calls ktime_get_real_ts64() in __ocfs2_mknod_locked during file creation. Then inode_set_ctime_current() is called in inode_set_ctime_current() calls ktime_get_coarse_real_ts64() to get current time.
ktime_get_real_ts64() is more accurate than ktime_get_coarse_real_ts64(). In my test box, I saw ctime set by ktime_get_coarse_real_ts64() is less than ktime_get_real_ts64() even ctime is set later. The ctime of the new inode is smaller than atime.
The call trace is like:
ocfs2_create ocfs2_mknod __ocfs2_mknod_locked ....
ktime_get_real_ts64 <------- set atime,ctime,mtime, more accurate ocfs2_populate_inode ... ocfs2_init_acl ocfs2_acl_set_mode inode_set_ctime_current current_time ktime_get_coarse_real_ts64 <-------less accurate
ocfs2_file_read_iter ocfs2_inode_lock_atime ocfs2_should_update_atime atime <= ctime ? <-------- false, ctime < atime due to accuracy
So here call ktime_get_coarse_real_ts64 to set inode time coarser while creating new files. It may lower the accuracy of file times. But it's not a big deal since we already use coarse time in other places like ocfs2_update_inode_atime and inode_set_ctime_current.
Link: https://lkml.kernel.org/r/20240408082041.20925-5-glass.su@suse.com Fixes: c62c38f6b91b ("ocfs2: replace CURRENT_TIME macro") Signed-off-by: Su Yue glass.su@suse.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Mark Fasheh mark@fasheh.com Cc: Joel Becker jlbec@evilplan.org Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Jun Piao piaojun@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/namei.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ocfs2/namei.c +++ b/fs/ocfs2/namei.c @@ -566,7 +566,7 @@ static int __ocfs2_mknod_locked(struct i fe->i_last_eb_blk = 0; strcpy(fe->i_signature, OCFS2_INODE_SIGNATURE); fe->i_flags |= cpu_to_le32(OCFS2_VALID_FL); - ktime_get_real_ts64(&ts); + ktime_get_coarse_real_ts64(&ts); fe->i_atime = fe->i_ctime = fe->i_mtime = cpu_to_le64(ts.tv_sec); fe->i_mtime_nsec = fe->i_ctime_nsec = fe->i_atime_nsec =
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Yue glass.su@suse.com
commit 952b023f06a24b2ad6ba67304c4c84d45bea2f18 upstream.
After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block", fstests/generic/300 become from always failed to sometimes failed:
======================================================================== [ 473.293420 ] run fstests generic/300
[ 475.296983 ] JBD2: Ignoring recovery information on journal [ 475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0) with ordered data mode. [ 494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag: Owner 5668 has an extent at cpos 78723 which can no longer be found [ 494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2 once the filesystem is unmounted. [ 494.292018 ] OCFS2: File system is now read-only. [ 494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272 ERROR: status = -30 [ 494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374 ERROR: status = -3 fio: io_u error on file /mnt/scratch/racer: Read-only file system: write offset=460849152, buflen=131072 =========================================================================
In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add unwritten extents to a list. extents are also inserted into extent tree in ocfs2_write_begin_nolock. Then another thread call fallocate to puch a hole at one of the unwritten extent. The extent at cpos was removed by ocfs2_remove_extent(). At end io worker thread, ocfs2_search_extent_list found there is no such extent at the cpos.
T1 T2 T3 inode lock ... insert extents ... inode unlock ocfs2_fallocate __ocfs2_change_file_space inode lock lock ip_alloc_sem ocfs2_remove_inode_range inode ocfs2_remove_btree_range ocfs2_remove_extent ^---remove the extent at cpos 78723 ... unlock ip_alloc_sem inode unlock ocfs2_dio_end_io ocfs2_dio_end_io_write lock ip_alloc_sem ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_search_extent_list ^---failed to find extent ... unlock ip_alloc_sem
In most filesystems, fallocate is not compatible with racing with AIO+DIO, so fix it by adding to wait for all dio before fallocate/punch_hole like ext4.
Link: https://lkml.kernel.org/r/20240408082041.20925-3-glass.su@suse.com Fixes: b25801038da5 ("ocfs2: Support xfs style space reservation ioctls") Signed-off-by: Su Yue glass.su@suse.com Reviewed-by: Joseph Qi joseph.qi@linux.alibaba.com Cc: Changwei Ge gechangwei@live.cn Cc: Gang He ghe@suse.com Cc: Joel Becker jlbec@evilplan.org Cc: Jun Piao piaojun@huawei.com Cc: Junxiao Bi junxiao.bi@oracle.com Cc: Mark Fasheh mark@fasheh.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ocfs2/file.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/ocfs2/file.c +++ b/fs/ocfs2/file.c @@ -1936,6 +1936,8 @@ static int __ocfs2_change_file_space(str
inode_lock(inode);
+ /* Wait all existing dio workers, newcomers will block on i_rwsem */ + inode_dio_wait(inode); /* * This prevents concurrent writes on other nodes */
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit 69381cf88a8dfa0ab27fb801b78be813e7e8fb80 upstream.
dm-integrity could set discard_granularity lower than the logical block size. This could result in failures when sending discard requests to dm-integrity.
This fix is needed for kernels prior to 6.10.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Reported-by: Eric Wheeler linux-integrity@lists.ewheeler.net Cc: stable@vger.kernel.org # <= 6.9 Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-integrity.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/md/dm-integrity.c b/drivers/md/dm-integrity.c index 7f3dc8ee6ab8..417fddebe367 100644 --- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -3492,6 +3492,7 @@ static void dm_integrity_io_hints(struct dm_target *ti, struct queue_limits *lim limits->physical_block_size = ic->sectors_per_block << SECTOR_SHIFT; blk_limits_io_min(limits, ic->sectors_per_block << SECTOR_SHIFT); limits->dma_alignment = limits->logical_block_size - 1; + limits->discard_granularity = ic->sectors_per_block << SECTOR_SHIFT; } limits->max_integrity_segments = USHRT_MAX; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rick Wertenbroek rick.wertenbroek@gmail.com
commit 2dba285caba53f309d6060fca911b43d63f41697 upstream.
Remove wrong mask on subsys_vendor_id. Both the Vendor ID and Subsystem Vendor ID are u16 variables and are written to a u32 register of the controller. The Subsystem Vendor ID was always 0 because the u16 value was masked incorrectly with GENMASK(31,16) resulting in all lower 16 bits being set to 0 prior to the shift.
Remove both masks as they are unnecessary and set the register correctly i.e., the lower 16-bits are the Vendor ID and the upper 16-bits are the Subsystem Vendor ID.
This is documented in the RK3399 TRM section 17.6.7.1.17
[kwilczynski: removed unnecesary newline] Fixes: cf590b078391 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Link: https://lore.kernel.org/linux-pci/20240403144508.489835-1-rick.wertenbroek@g... Signed-off-by: Rick Wertenbroek rick.wertenbroek@gmail.com Signed-off-by: Krzysztof Wilczyński kwilczynski@kernel.org Signed-off-by: Bjorn Helgaas bhelgaas@google.com Reviewed-by: Damien Le Moal dlemoal@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/controller/pcie-rockchip-ep.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-)
--- a/drivers/pci/controller/pcie-rockchip-ep.c +++ b/drivers/pci/controller/pcie-rockchip-ep.c @@ -98,10 +98,8 @@ static int rockchip_pcie_ep_write_header
/* All functions share the same vendor ID with function 0 */ if (fn == 0) { - u32 vid_regs = (hdr->vendorid & GENMASK(15, 0)) | - (hdr->subsys_vendor_id & GENMASK(31, 16)) << 16; - - rockchip_pcie_write(rockchip, vid_regs, + rockchip_pcie_write(rockchip, + hdr->vendorid | hdr->subsys_vendor_id << 16, PCIE_CORE_CONFIG_VENDOR); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nuno Sa nuno.sa@analog.com
commit 1bc31444209c8efae98cb78818131950d9a6f4d6 upstream.
We need to first free the IRQ before calling of_dma_controller_free(). Otherwise we could get an interrupt and schedule a tasklet while removing the DMA controller.
Fixes: 0e3b67b348b8 ("dmaengine: Add support for the Analog Devices AXI-DMAC DMA controller") Cc: stable@kernel.org Signed-off-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240328-axi-dmac-devm-probe-v3-1-523c0176df70@ana... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/dma/dma-axi-dmac.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/dma/dma-axi-dmac.c +++ b/drivers/dma/dma-axi-dmac.c @@ -1134,8 +1134,8 @@ static void axi_dmac_remove(struct platf { struct axi_dmac *dmac = platform_get_drvdata(pdev);
- of_dma_controller_free(pdev->dev.of_node); free_irq(dmac->irq, dmac); + of_dma_controller_free(pdev->dev.of_node); tasklet_kill(&dmac->chan.vchan.task); dma_async_device_unregister(&dmac->dma_dev); clk_disable_unprepare(dmac->clk);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
commit 51474ab44abf907023a8a875e799b07de461e466 upstream.
If CONFIG_DRM_AUX_HPD_BRIDGE is not enabled, the aux-bridge.h header provides a stub for the bridge's functions. Correct the arguments list of one of those stubs to match the argument list of the non-stubbed function.
Fixes: e5ca263508f7 ("drm/bridge: aux-hpd: separate allocation and registration") Reported-by: kernel test robot lkp@intel.com Cc: stable stable@kernel.org Closes: https://lore.kernel.org/oe-kbuild-all/202405110428.TMCfb1Ut-lkp@intel.com/ Cc: Johan Hovold johan+linaro@kernel.org Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20240511-fix-aux-hpd-stubs-v1-1-98dae71dfaec@linar... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/drm/bridge/aux-bridge.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/drm/bridge/aux-bridge.h +++ b/include/drm/bridge/aux-bridge.h @@ -33,7 +33,7 @@ static inline struct auxiliary_device *d return NULL; }
-static inline int devm_drm_dp_hpd_bridge_add(struct auxiliary_device *adev) +static inline int devm_drm_dp_hpd_bridge_add(struct device *dev, struct auxiliary_device *adev) { return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Apurva Nandan a-nandan@ti.com
commit 61f6f68447aba08aeaa97593af3a7d85a114891f upstream.
PSC controller has a limitation that it can only power-up the second core when the first core is in ON state. Power-state for core0 should be equal to or higher than core1, else the kernel is seen hanging during rproc loading.
Make the powering up of cores sequential, by waiting for the current core to power-up before proceeding to the next core, with a timeout of 2sec. Add a wait queue event in k3_r5_cluster_rproc_init call, that will wait for the current core to be released from reset before proceeding with the next core.
Fixes: 6dedbd1d5443 ("remoteproc: k3-r5: Add a remoteproc driver for R5F subsystem") Signed-off-by: Apurva Nandan a-nandan@ti.com Signed-off-by: Beleswar Padhi b-padhi@ti.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240430105307.1190615-2-b-padhi@ti.com Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/remoteproc/ti_k3_r5_remoteproc.c | 33 +++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+)
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c @@ -103,12 +103,14 @@ struct k3_r5_soc_data { * @dev: cached device pointer * @mode: Mode to configure the Cluster - Split or LockStep * @cores: list of R5 cores within the cluster + * @core_transition: wait queue to sync core state changes * @soc_data: SoC-specific feature data for a R5FSS */ struct k3_r5_cluster { struct device *dev; enum cluster_mode mode; struct list_head cores; + wait_queue_head_t core_transition; const struct k3_r5_soc_data *soc_data; };
@@ -128,6 +130,7 @@ struct k3_r5_cluster { * @atcm_enable: flag to control ATCM enablement * @btcm_enable: flag to control BTCM enablement * @loczrama: flag to dictate which TCM is at device address 0x0 + * @released_from_reset: flag to signal when core is out of reset */ struct k3_r5_core { struct list_head elem; @@ -144,6 +147,7 @@ struct k3_r5_core { u32 atcm_enable; u32 btcm_enable; u32 loczrama; + bool released_from_reset; };
/** @@ -460,6 +464,8 @@ static int k3_r5_rproc_prepare(struct rp ret); return ret; } + core->released_from_reset = true; + wake_up_interruptible(&cluster->core_transition);
/* * Newer IP revisions like on J7200 SoCs support h/w auto-initialization @@ -1140,6 +1146,12 @@ static int k3_r5_rproc_configure_mode(st return ret; }
+ /* + * Skip the waiting mechanism for sequential power-on of cores if the + * core has already been booted by another entity. + */ + core->released_from_reset = c_state; + ret = ti_sci_proc_get_status(core->tsp, &boot_vec, &cfg, &ctrl, &stat); if (ret < 0) { @@ -1280,6 +1292,26 @@ init_rmem: cluster->mode == CLUSTER_MODE_SINGLECPU || cluster->mode == CLUSTER_MODE_SINGLECORE) break; + + /* + * R5 cores require to be powered on sequentially, core0 + * should be in higher power state than core1 in a cluster + * So, wait for current core to power up before proceeding + * to next core and put timeout of 2sec for each core. + * + * This waiting mechanism is necessary because + * rproc_auto_boot_callback() for core1 can be called before + * core0 due to thread execution order. + */ + ret = wait_event_interruptible_timeout(cluster->core_transition, + core->released_from_reset, + msecs_to_jiffies(2000)); + if (ret <= 0) { + dev_err(dev, + "Timed out waiting for %s core to power up!\n", + rproc->name); + return ret; + } }
return 0; @@ -1709,6 +1741,7 @@ static int k3_r5_probe(struct platform_d cluster->dev = dev; cluster->soc_data = data; INIT_LIST_HEAD(&cluster->cores); + init_waitqueue_head(&cluster->core_transition);
ret = of_property_read_u32(np, "ti,cluster-mode", &cluster->mode); if (ret < 0 && ret != -EINVAL) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Beleswar Padhi b-padhi@ti.com
commit 3c8a9066d584f5010b6f4ba03bf6b19d28973d52 upstream.
PSC controller has a limitation that it can only power-up the second core when the first core is in ON state. Power-state for core0 should be equal to or higher than core1.
Therefore, prevent core1 from powering up before core0 during the start process from sysfs. Similarly, prevent core0 from shutting down before core1 has been shut down from sysfs.
Fixes: 6dedbd1d5443 ("remoteproc: k3-r5: Add a remoteproc driver for R5F subsystem") Signed-off-by: Beleswar Padhi b-padhi@ti.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240430105307.1190615-3-b-padhi@ti.com Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/remoteproc/ti_k3_r5_remoteproc.c | 23 +++++++++++++++++++++-- 1 file changed, 21 insertions(+), 2 deletions(-)
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c @@ -548,7 +548,7 @@ static int k3_r5_rproc_start(struct rpro struct k3_r5_rproc *kproc = rproc->priv; struct k3_r5_cluster *cluster = kproc->cluster; struct device *dev = kproc->dev; - struct k3_r5_core *core; + struct k3_r5_core *core0, *core; u32 boot_addr; int ret;
@@ -574,6 +574,15 @@ static int k3_r5_rproc_start(struct rpro goto unroll_core_run; } } else { + /* do not allow core 1 to start before core 0 */ + core0 = list_first_entry(&cluster->cores, struct k3_r5_core, + elem); + if (core != core0 && core0->rproc->state == RPROC_OFFLINE) { + dev_err(dev, "%s: can not start core 1 before core 0\n", + __func__); + return -EPERM; + } + ret = k3_r5_core_run(core); if (ret) goto put_mbox; @@ -619,7 +628,8 @@ static int k3_r5_rproc_stop(struct rproc { struct k3_r5_rproc *kproc = rproc->priv; struct k3_r5_cluster *cluster = kproc->cluster; - struct k3_r5_core *core = kproc->core; + struct device *dev = kproc->dev; + struct k3_r5_core *core1, *core = kproc->core; int ret;
/* halt all applicable cores */ @@ -632,6 +642,15 @@ static int k3_r5_rproc_stop(struct rproc } } } else { + /* do not allow core 0 to stop before core 1 */ + core1 = list_last_entry(&cluster->cores, struct k3_r5_core, + elem); + if (core != core1 && core1->rproc->state != RPROC_OFFLINE) { + dev_err(dev, "%s: can not stop core 0 before core 1\n", + __func__); + return -EPERM; + } + ret = k3_r5_core_halt(core); if (ret) goto out;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nuno Sa nuno.sa@analog.com
commit 80721776c5af6f6dce7d84ba8df063957aa425a2 upstream.
We can only access the IP core registers if the bus clock is enabled. As such we need to get and enable it and not rely on anyone else to do it.
Note this clock is a very fundamental one that is typically enabled pretty early during boot. Independently of that, we should really rely on it to be enabled.
Fixes: ef04070692a2 ("iio: adc: adi-axi-adc: add support for AXI ADC IP core") Signed-off-by: Nuno Sa nuno.sa@analog.com Link: https://lore.kernel.org/r/20240426-ad9467-new-features-v2-4-6361fc3ba1cc@ana... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/adc/adi-axi-adc.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/iio/adc/adi-axi-adc.c +++ b/drivers/iio/adc/adi-axi-adc.c @@ -175,6 +175,7 @@ static int adi_axi_adc_probe(struct plat struct adi_axi_adc_state *st; void __iomem *base; unsigned int ver; + struct clk *clk; int ret;
st = devm_kzalloc(&pdev->dev, sizeof(*st), GFP_KERNEL); @@ -195,6 +196,10 @@ static int adi_axi_adc_probe(struct plat if (!expected_ver) return -ENODEV;
+ clk = devm_clk_get_enabled(&pdev->dev, NULL); + if (IS_ERR(clk)) + return PTR_ERR(clk); + /* * Force disable the core. Up to the frontend to enable us. And we can * still read/write registers...
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dimitri Fedrau dima.fedrau@gmail.com
commit 827dca3129708a8465bde90c86c2e3c38e62dd4f upstream.
Temperature is stored as 16bit value in two's complement format. Current implementation ignores the sign bit. Make it aware of the sign bit by using sign_extend32.
Fixes: 3f6b9598b6df ("iio: temperature: Add MCP9600 thermocouple EMF converter") Signed-off-by: Dimitri Fedrau dima.fedrau@gmail.com Reviewed-by: Marcelo Schmitt marcelo.schmitt1@gmail.com Tested-by: Andrew Hepp andrew.hepp@ahepp.dev Link: https://lore.kernel.org/r/20240424185913.1177127-1-dima.fedrau@gmail.com Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/temperature/mcp9600.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/temperature/mcp9600.c b/drivers/iio/temperature/mcp9600.c index 46845804292b..7a3eef5d5e75 100644 --- a/drivers/iio/temperature/mcp9600.c +++ b/drivers/iio/temperature/mcp9600.c @@ -52,7 +52,8 @@ static int mcp9600_read(struct mcp9600_data *data,
if (ret < 0) return ret; - *val = ret; + + *val = sign_extend32(ret, 15);
return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com
commit 0340dc4c82590d8735c58cf904a8aa1173273ab5 upstream.
Restrict interrupt timestamp alignment for not overflowing max/min period thresholds.
Fixes: 0ecc363ccea7 ("iio: make invensense timestamp module generic") Cc: stable@vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol jean-baptiste.maneyrol@tdk.com Link: https://lore.kernel.org/r/20240426135814.141837-1-inv.git-commit@tdk.com Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iio/common/inv_sensors/inv_sensors_timestamp.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c +++ b/drivers/iio/common/inv_sensors/inv_sensors_timestamp.c @@ -105,6 +105,9 @@ static bool inv_update_chip_period(struc
static void inv_align_timestamp_it(struct inv_sensors_timestamp *ts) { + const int64_t period_min = ts->min_period * ts->mult; + const int64_t period_max = ts->max_period * ts->mult; + int64_t add_max, sub_max; int64_t delta, jitter; int64_t adjust;
@@ -112,11 +115,13 @@ static void inv_align_timestamp_it(struc delta = ts->it.lo - ts->timestamp;
/* adjust timestamp while respecting jitter */ + add_max = period_max - (int64_t)ts->period; + sub_max = period_min - (int64_t)ts->period; jitter = INV_SENSORS_TIMESTAMP_JITTER((int64_t)ts->period, ts->chip.jitter); if (delta > jitter) - adjust = jitter; + adjust = add_max; else if (delta < -jitter) - adjust = -jitter; + adjust = sub_max; else adjust = 0;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wayne Lin Wayne.Lin@amd.com
commit 5a507b7d2be15fddb95bf8dee01110b723e2bcd9 upstream.
[Why] Commit: - commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload allocation/removement") accidently overwrite the commit - commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2") which cause regression.
[How] Recover the original NULL fix and remove the unnecessary input parameter 'state' for drm_dp_add_payload_part2().
Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload allocation/removement") Reported-by: Leon Weiß leon.weiss@ruhr-uni-bochum.de Link: https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.camel@ruh... Cc: lyude@redhat.com Cc: imre.deak@intel.com Cc: stable@vger.kernel.org Cc: regressions@lists.linux.dev Reviewed-by: Harry Wentland harry.wentland@amd.com Acked-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Wayne Lin Wayne.Lin@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20240307062957.2323620-1-Wayne... (cherry picked from commit 4545614c1d8da603e57b60dd66224d81b6ffc305) Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 2 +- drivers/gpu/drm/display/drm_dp_mst_topology.c | 4 +--- drivers/gpu/drm/i915/display/intel_dp_mst.c | 2 +- drivers/gpu/drm/nouveau/dispnv50/disp.c | 2 +- include/drm/display/drm_dp_mst_helper.h | 1 - 5 files changed, 4 insertions(+), 7 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c @@ -363,7 +363,7 @@ void dm_helpers_dp_mst_send_payload_allo mst_state = to_drm_dp_mst_topology_state(mst_mgr->base.state); new_payload = drm_atomic_get_mst_payload_state(mst_state, aconnector->mst_output_port);
- ret = drm_dp_add_payload_part2(mst_mgr, mst_state->base.state, new_payload); + ret = drm_dp_add_payload_part2(mst_mgr, new_payload);
if (ret) { amdgpu_dm_set_mst_status(&aconnector->mst_status, --- a/drivers/gpu/drm/display/drm_dp_mst_topology.c +++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c @@ -3421,7 +3421,6 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part /** * drm_dp_add_payload_part2() - Execute payload update part 2 * @mgr: Manager to use. - * @state: The global atomic state * @payload: The payload to update * * If @payload was successfully assigned a starting time slot by drm_dp_add_payload_part1(), this @@ -3430,14 +3429,13 @@ EXPORT_SYMBOL(drm_dp_remove_payload_part * Returns: 0 on success, negative error code on failure. */ int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr, - struct drm_atomic_state *state, struct drm_dp_mst_atomic_payload *payload) { int ret = 0;
/* Skip failed payloads */ if (payload->payload_allocation_status != DRM_DP_MST_PAYLOAD_ALLOCATION_DFP) { - drm_dbg_kms(state->dev, "Part 1 of payload creation for %s failed, skipping part 2\n", + drm_dbg_kms(mgr->dev, "Part 1 of payload creation for %s failed, skipping part 2\n", payload->port->connector->name); return -EIO; } --- a/drivers/gpu/drm/i915/display/intel_dp_mst.c +++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c @@ -1160,7 +1160,7 @@ static void intel_mst_enable_dp(struct i if (first_mst_stream) intel_ddi_wait_for_fec_status(encoder, pipe_config, true);
- drm_dp_add_payload_part2(&intel_dp->mst_mgr, &state->base, + drm_dp_add_payload_part2(&intel_dp->mst_mgr, drm_atomic_get_mst_payload_state(mst_state, connector->port));
if (DISPLAY_VER(dev_priv) >= 12) --- a/drivers/gpu/drm/nouveau/dispnv50/disp.c +++ b/drivers/gpu/drm/nouveau/dispnv50/disp.c @@ -915,7 +915,7 @@ nv50_msto_cleanup(struct drm_atomic_stat msto->disabled = false; drm_dp_remove_payload_part2(mgr, new_mst_state, old_payload, new_payload); } else if (msto->enabled) { - drm_dp_add_payload_part2(mgr, state, new_payload); + drm_dp_add_payload_part2(mgr, new_payload); msto->enabled = false; } } --- a/include/drm/display/drm_dp_mst_helper.h +++ b/include/drm/display/drm_dp_mst_helper.h @@ -851,7 +851,6 @@ int drm_dp_add_payload_part1(struct drm_ struct drm_dp_mst_topology_state *mst_state, struct drm_dp_mst_atomic_payload *payload); int drm_dp_add_payload_part2(struct drm_dp_mst_topology_mgr *mgr, - struct drm_atomic_state *state, struct drm_dp_mst_atomic_payload *payload); void drm_dp_remove_payload_part1(struct drm_dp_mst_topology_mgr *mgr, struct drm_dp_mst_topology_state *mst_state,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nam Cao namcao@linutronix.de
commit fb1cf0878328fe75d47f0aed0a65b30126fcefc4 upstream.
__kernel_map_pages() is a debug function which clears the valid bit in page table entry for deallocated pages to detect illegal memory accesses to freed pages.
This function set/clear the valid bit using __set_memory(). __set_memory() acquires init_mm's semaphore, and this operation may sleep. This is problematic, because __kernel_map_pages() can be called in atomic context, and thus is illegal to sleep. An example warning that this causes:
BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1578 in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2, name: kthreadd preempt_count: 2, expected: 0 CPU: 0 PID: 2 Comm: kthreadd Not tainted 6.9.0-g1d4c6d784ef6 #37 Hardware name: riscv-virtio,qemu (DT) Call Trace: [<ffffffff800060dc>] dump_backtrace+0x1c/0x24 [<ffffffff8091ef6e>] show_stack+0x2c/0x38 [<ffffffff8092baf8>] dump_stack_lvl+0x5a/0x72 [<ffffffff8092bb24>] dump_stack+0x14/0x1c [<ffffffff8003b7ac>] __might_resched+0x104/0x10e [<ffffffff8003b7f4>] __might_sleep+0x3e/0x62 [<ffffffff8093276a>] down_write+0x20/0x72 [<ffffffff8000cf00>] __set_memory+0x82/0x2fa [<ffffffff8000d324>] __kernel_map_pages+0x5a/0xd4 [<ffffffff80196cca>] __alloc_pages_bulk+0x3b2/0x43a [<ffffffff8018ee82>] __vmalloc_node_range+0x196/0x6ba [<ffffffff80011904>] copy_process+0x72c/0x17ec [<ffffffff80012ab4>] kernel_clone+0x60/0x2fe [<ffffffff80012f62>] kernel_thread+0x82/0xa0 [<ffffffff8003552c>] kthreadd+0x14a/0x1be [<ffffffff809357de>] ret_from_fork+0xe/0x1c
Rewrite this function with apply_to_existing_page_range(). It is fine to not have any locking, because __kernel_map_pages() works with pages being allocated/deallocated and those pages are not changed by anyone else in the meantime.
Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support") Signed-off-by: Nam Cao namcao@linutronix.de Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/1289ecba9606a19917bc12b6c27da8aa23e1e5ae.171575093... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/pageattr.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-)
--- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -387,17 +387,33 @@ int set_direct_map_default_noflush(struc }
#ifdef CONFIG_DEBUG_PAGEALLOC +static int debug_pagealloc_set_page(pte_t *pte, unsigned long addr, void *data) +{ + int enable = *(int *)data; + + unsigned long val = pte_val(ptep_get(pte)); + + if (enable) + val |= _PAGE_PRESENT; + else + val &= ~_PAGE_PRESENT; + + set_pte(pte, __pte(val)); + + return 0; +} + void __kernel_map_pages(struct page *page, int numpages, int enable) { if (!debug_pagealloc_enabled()) return;
- if (enable) - __set_memory((unsigned long)page_address(page), numpages, - __pgprot(_PAGE_PRESENT), __pgprot(0)); - else - __set_memory((unsigned long)page_address(page), numpages, - __pgprot(0), __pgprot(_PAGE_PRESENT)); + unsigned long start = (unsigned long)page_address(page); + unsigned long size = PAGE_SIZE * numpages; + + apply_to_existing_page_range(&init_mm, start, size, debug_pagealloc_set_page, &enable); + + flush_tlb_kernel_range(start, start + size); } #endif
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nam Cao namcao@linutronix.de
commit c67ddf59ac44adc60649730bf8347e37c516b001 upstream.
debug_pagealloc is a debug feature which clears the valid bit in page table entry for freed pages to detect illegal accesses to freed memory.
For this feature to work, virtual mapping must have PAGE_SIZE resolution. (No, we cannot map with huge pages and split them only when needed; because pages can be allocated/freed in atomic context and page splitting cannot be done in atomic context)
Force linear mapping to use small pages if debug_pagealloc is enabled.
Note that it is not necessary to force the entire linear mapping, but only those that are given to memory allocator. Some parts of memory can keep using huge page mapping (for example, kernel's executable code). But these parts are minority, so keep it simple. This is just a debug feature, some extra overhead should be acceptable.
Fixes: 5fde3db5eb02 ("riscv: add ARCH_SUPPORTS_DEBUG_PAGEALLOC support") Signed-off-by: Nam Cao namcao@linutronix.de Cc: stable@vger.kernel.org Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/2e391fa6c6f9b3fcf1b41cefbace02ee4ab4bf59.171575093... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/init.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -669,6 +669,9 @@ void __init create_pgd_mapping(pgd_t *pg static uintptr_t __init best_map_size(phys_addr_t pa, uintptr_t va, phys_addr_t size) { + if (debug_pagealloc_enabled()) + return PAGE_SIZE; + if (pgtable_l5_enabled && !(pa & (P4D_SIZE - 1)) && !(va & (P4D_SIZE - 1)) && size >= P4D_SIZE) return P4D_SIZE;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit 5f0769331a965675cdfec97c09f3f6e875d7c246 upstream.
Instead of printing three times the same output, print it only once, reducing lines and being sure that all no values have the same length.
It also fixes an extra '\n' when running the with kernel threads, like here:
=============== %< ============== Timer Latency
0 00:00:01 | IRQ Timer Latency (us) | Thread Timer Latency (us) CPU COUNT | cur min avg max | cur min avg max 2 #0 | - - - - | 161 161 161 161 3 #0 | - - - - | 161 161 161 161 8 #1 | 54 54 54 54 | - - - -'\n'
---------------|----------------------------------------|--------------------------------------- ALL #1 e0 | 54 54 54 | 161 161 161 =============== %< ==============
This '\n' should have been removed with the user-space support that added another '\n' if not running with kernel threads.
Link: https://lkml.kernel.org/r/0a4d8085e7cd706733a5dc10a81ca38b82bd4992.171396896...
Cc: stable@vger.kernel.org Cc: Jonathan Corbet corbet@lwn.net Cc: Juri Lelli juri.lelli@redhat.com Fixes: cdca4f4e5e8e ("rtla/timerlat_top: Add timerlat user-space support") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/timerlat_top.c | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-)
--- a/tools/tracing/rtla/src/timerlat_top.c +++ b/tools/tracing/rtla/src/timerlat_top.c @@ -212,6 +212,8 @@ static void timerlat_top_header(struct o trace_seq_printf(s, "\n"); }
+static const char *no_value = " -"; + /* * timerlat_top_print - prints the output of a given CPU */ @@ -239,10 +241,7 @@ static void timerlat_top_print(struct os trace_seq_printf(s, "%3d #%-9d |", cpu, cpu_data->irq_count);
if (!cpu_data->irq_count) { - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - |"); + trace_seq_printf(s, "%s %s %s %s |", no_value, no_value, no_value, no_value); } else { trace_seq_printf(s, "%9llu ", cpu_data->cur_irq / params->output_divisor); trace_seq_printf(s, "%9llu ", cpu_data->min_irq / params->output_divisor); @@ -251,10 +250,7 @@ static void timerlat_top_print(struct os }
if (!cpu_data->thread_count) { - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - "); - trace_seq_printf(s, " -\n"); + trace_seq_printf(s, "%s %s %s %s", no_value, no_value, no_value, no_value); } else { trace_seq_printf(s, "%9llu ", cpu_data->cur_thread / divisor); trace_seq_printf(s, "%9llu ", cpu_data->min_thread / divisor); @@ -271,10 +267,7 @@ static void timerlat_top_print(struct os trace_seq_printf(s, " |");
if (!cpu_data->user_count) { - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - "); - trace_seq_printf(s, " - "); - trace_seq_printf(s, " -\n"); + trace_seq_printf(s, "%s %s %s %s\n", no_value, no_value, no_value, no_value); } else { trace_seq_printf(s, "%9llu ", cpu_data->cur_user / divisor); trace_seq_printf(s, "%9llu ", cpu_data->min_user / divisor);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daniel Bristot de Oliveira bristot@kernel.org
commit a40e5e4dd0207485dee75e2b8e860d5853bcc5f7 upstream.
When copying timerlat auto-analysis from a terminal to some web pages or chats, the \t are being replaced with a single ' ' or ' ', breaking the output.
For example: ## CPU 3 hit stop tracing, analyzing it ## IRQ handler delay: 1.30 us (0.11 %) IRQ latency: 1.90 us Timerlat IRQ duration: 3.00 us (0.24 %) Blocking thread: 1223.16 us (99.00 %) insync:4048 1223.16 us IRQ interference 4.93 us (0.40 %) local_timer:236 4.93 us ------------------------------------------------------------------------ Thread latency: 1235.47 us (100%)
Replace \t with spaces to avoid this problem.
Link: https://lkml.kernel.org/r/ec7ed2b2809c22ab0dfc8eb7c805ab9cddc4254a.171396896...
Cc: stable@vger.kernel.org Cc: Jonathan Corbet corbet@lwn.net Cc: Juri Lelli juri.lelli@redhat.com Fixes: 27e348b221f6 ("rtla/timerlat: Add auto-analysis core") Signed-off-by: Daniel Bristot de Oliveira bristot@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/timerlat_aa.c | 111 ++++++++++++++++++++--------------- 1 file changed, 64 insertions(+), 47 deletions(-)
--- a/tools/tracing/rtla/src/timerlat_aa.c +++ b/tools/tracing/rtla/src/timerlat_aa.c @@ -16,6 +16,9 @@ enum timelat_state { TIMERLAT_WAITING_THREAD, };
+/* Used to fill spaces in the output */ +static const char *spaces = " "; + #define MAX_COMM 24
/* @@ -274,14 +277,17 @@ static int timerlat_aa_nmi_handler(struc taa_data->prev_irq_timstamp = start;
trace_seq_reset(taa_data->prev_irqs_seq); - trace_seq_printf(taa_data->prev_irqs_seq, "\t%24s \t\t\t%9.2f us\n", - "nmi", ns_to_usf(duration)); + trace_seq_printf(taa_data->prev_irqs_seq, " %24s %.*s %9.2f us\n", + "nmi", + 24, spaces, + ns_to_usf(duration)); return 0; }
taa_data->thread_nmi_sum += duration; - trace_seq_printf(taa_data->nmi_seq, " %24s \t\t\t%9.2f us\n", - "nmi", ns_to_usf(duration)); + trace_seq_printf(taa_data->nmi_seq, " %24s %.*s %9.2f us\n", + "nmi", + 24, spaces, ns_to_usf(duration));
return 0; } @@ -323,8 +329,10 @@ static int timerlat_aa_irq_handler(struc taa_data->prev_irq_timstamp = start;
trace_seq_reset(taa_data->prev_irqs_seq); - trace_seq_printf(taa_data->prev_irqs_seq, "\t%24s:%-3llu \t\t%9.2f us\n", - desc, vector, ns_to_usf(duration)); + trace_seq_printf(taa_data->prev_irqs_seq, " %24s:%-3llu %.*s %9.2f us\n", + desc, vector, + 15, spaces, + ns_to_usf(duration)); return 0; }
@@ -372,8 +380,10 @@ static int timerlat_aa_irq_handler(struc * IRQ interference. */ taa_data->thread_irq_sum += duration; - trace_seq_printf(taa_data->irqs_seq, " %24s:%-3llu \t %9.2f us\n", - desc, vector, ns_to_usf(duration)); + trace_seq_printf(taa_data->irqs_seq, " %24s:%-3llu %.*s %9.2f us\n", + desc, vector, + 24, spaces, + ns_to_usf(duration));
return 0; } @@ -408,8 +418,10 @@ static int timerlat_aa_softirq_handler(s
taa_data->thread_softirq_sum += duration;
- trace_seq_printf(taa_data->softirqs_seq, "\t%24s:%-3llu \t %9.2f us\n", - softirq_name[vector], vector, ns_to_usf(duration)); + trace_seq_printf(taa_data->softirqs_seq, " %24s:%-3llu %.*s %9.2f us\n", + softirq_name[vector], vector, + 24, spaces, + ns_to_usf(duration)); return 0; }
@@ -452,8 +464,10 @@ static int timerlat_aa_thread_handler(st } else { taa_data->thread_thread_sum += duration;
- trace_seq_printf(taa_data->threads_seq, "\t%24s:%-3llu \t\t%9.2f us\n", - comm, pid, ns_to_usf(duration)); + trace_seq_printf(taa_data->threads_seq, " %24s:%-12llu %.*s %9.2f us\n", + comm, pid, + 15, spaces, + ns_to_usf(duration)); }
return 0; @@ -482,7 +496,8 @@ static int timerlat_aa_stack_handler(str function = tep_find_function(taa_ctx->tool->trace.tep, caller[i]); if (!function) break; - trace_seq_printf(taa_data->stack_seq, "\t\t-> %s\n", function); + trace_seq_printf(taa_data->stack_seq, " %.*s -> %s\n", + 14, spaces, function); } } return 0; @@ -568,23 +583,24 @@ static void timerlat_thread_analysis(str exp_irq_ts = taa_data->timer_irq_start_time - taa_data->timer_irq_start_delay; if (exp_irq_ts < taa_data->prev_irq_timstamp + taa_data->prev_irq_duration) { if (taa_data->prev_irq_timstamp < taa_data->timer_irq_start_time) - printf(" Previous IRQ interference: \t\t up to %9.2f us\n", - ns_to_usf(taa_data->prev_irq_duration)); + printf(" Previous IRQ interference: %.*s up to %9.2f us\n", + 16, spaces, + ns_to_usf(taa_data->prev_irq_duration)); }
/* * The delay that the IRQ suffered before starting. */ - printf(" IRQ handler delay: %16s %9.2f us (%.2f %%)\n", - (ns_to_usf(taa_data->timer_exit_from_idle) > 10) ? "(exit from idle)" : "", - ns_to_usf(taa_data->timer_irq_start_delay), - ns_to_per(total, taa_data->timer_irq_start_delay)); + printf(" IRQ handler delay: %.*s %16s %9.2f us (%.2f %%)\n", 16, spaces, + (ns_to_usf(taa_data->timer_exit_from_idle) > 10) ? "(exit from idle)" : "", + ns_to_usf(taa_data->timer_irq_start_delay), + ns_to_per(total, taa_data->timer_irq_start_delay));
/* * Timerlat IRQ. */ - printf(" IRQ latency: \t\t\t\t %9.2f us\n", - ns_to_usf(taa_data->tlat_irq_latency)); + printf(" IRQ latency: %.*s %9.2f us\n", 40, spaces, + ns_to_usf(taa_data->tlat_irq_latency));
if (irq) { /* @@ -595,15 +611,16 @@ static void timerlat_thread_analysis(str * so it will be displayed, it is the key. */ printf(" Blocking thread:\n"); - printf(" %24s:%-9llu\n", - taa_data->run_thread_comm, taa_data->run_thread_pid); + printf(" %.*s %24s:%-9llu\n", 6, spaces, taa_data->run_thread_comm, + taa_data->run_thread_pid); } else { /* * The duration of the IRQ handler that handled the timerlat IRQ. */ - printf(" Timerlat IRQ duration: \t\t %9.2f us (%.2f %%)\n", - ns_to_usf(taa_data->timer_irq_duration), - ns_to_per(total, taa_data->timer_irq_duration)); + printf(" Timerlat IRQ duration: %.*s %9.2f us (%.2f %%)\n", + 30, spaces, + ns_to_usf(taa_data->timer_irq_duration), + ns_to_per(total, taa_data->timer_irq_duration));
/* * The amount of time that the current thread postponed the scheduler. @@ -611,13 +628,13 @@ static void timerlat_thread_analysis(str * Recalling that it is net from NMI/IRQ/Softirq interference, so there * is no need to compute values here. */ - printf(" Blocking thread: \t\t\t %9.2f us (%.2f %%)\n", - ns_to_usf(taa_data->thread_blocking_duration), - ns_to_per(total, taa_data->thread_blocking_duration)); - - printf(" %24s:%-9llu %9.2f us\n", - taa_data->run_thread_comm, taa_data->run_thread_pid, - ns_to_usf(taa_data->thread_blocking_duration)); + printf(" Blocking thread: %.*s %9.2f us (%.2f %%)\n", 36, spaces, + ns_to_usf(taa_data->thread_blocking_duration), + ns_to_per(total, taa_data->thread_blocking_duration)); + + printf(" %.*s %24s:%-9llu %.*s %9.2f us\n", 6, spaces, + taa_data->run_thread_comm, taa_data->run_thread_pid, + 12, spaces, ns_to_usf(taa_data->thread_blocking_duration)); }
/* @@ -629,9 +646,9 @@ static void timerlat_thread_analysis(str * NMIs can happen during the IRQ, so they are always possible. */ if (taa_data->thread_nmi_sum) - printf(" NMI interference \t\t\t %9.2f us (%.2f %%)\n", - ns_to_usf(taa_data->thread_nmi_sum), - ns_to_per(total, taa_data->thread_nmi_sum)); + printf(" NMI interference %.*s %9.2f us (%.2f %%)\n", 36, spaces, + ns_to_usf(taa_data->thread_nmi_sum), + ns_to_per(total, taa_data->thread_nmi_sum));
/* * If it is an IRQ latency, the other factors can be skipped. @@ -643,9 +660,9 @@ static void timerlat_thread_analysis(str * Prints the interference caused by IRQs to the thread latency. */ if (taa_data->thread_irq_sum) { - printf(" IRQ interference \t\t\t %9.2f us (%.2f %%)\n", - ns_to_usf(taa_data->thread_irq_sum), - ns_to_per(total, taa_data->thread_irq_sum)); + printf(" IRQ interference %.*s %9.2f us (%.2f %%)\n", 36, spaces, + ns_to_usf(taa_data->thread_irq_sum), + ns_to_per(total, taa_data->thread_irq_sum));
trace_seq_do_printf(taa_data->irqs_seq); } @@ -654,9 +671,9 @@ static void timerlat_thread_analysis(str * Prints the interference caused by Softirqs to the thread latency. */ if (taa_data->thread_softirq_sum) { - printf(" Softirq interference \t\t\t %9.2f us (%.2f %%)\n", - ns_to_usf(taa_data->thread_softirq_sum), - ns_to_per(total, taa_data->thread_softirq_sum)); + printf(" Softirq interference %.*s %9.2f us (%.2f %%)\n", 32, spaces, + ns_to_usf(taa_data->thread_softirq_sum), + ns_to_per(total, taa_data->thread_softirq_sum));
trace_seq_do_printf(taa_data->softirqs_seq); } @@ -670,9 +687,9 @@ static void timerlat_thread_analysis(str * timer handling latency. */ if (taa_data->thread_thread_sum) { - printf(" Thread interference \t\t\t %9.2f us (%.2f %%)\n", - ns_to_usf(taa_data->thread_thread_sum), - ns_to_per(total, taa_data->thread_thread_sum)); + printf(" Thread interference %.*s %9.2f us (%.2f %%)\n", 33, spaces, + ns_to_usf(taa_data->thread_thread_sum), + ns_to_per(total, taa_data->thread_thread_sum));
trace_seq_do_printf(taa_data->threads_seq); } @@ -682,8 +699,8 @@ static void timerlat_thread_analysis(str */ print_total: printf("------------------------------------------------------------------------\n"); - printf(" %s latency: \t\t\t %9.2f us (100%%)\n", irq ? "IRQ" : "Thread", - ns_to_usf(total)); + printf(" %s latency: %.*s %9.2f us (100%%)\n", irq ? " IRQ" : "Thread", + 37, spaces, ns_to_usf(total)); }
static int timerlat_auto_analysis_collect_trace(struct timerlat_aa_context *taa_ctx)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chris Wilson chris@chris-wilson.co.uk
commit 70cb9188ffc75e643debf292fcddff36c9dbd4ae upstream.
The breadcrumbs use a GT wakeref for guarding the interrupt, but are disarmed during release of the engine wakeref. This leaves a hole where we may attach a breadcrumb just as the engine is parking (after it has parked its breadcrumbs), execute the irq worker with some signalers still attached, but never be woken again.
That issue manifests itself in CI with IGT runner timeouts while tests are waiting indefinitely for release of all GT wakerefs.
<6> [209.151778] i915: Running live_engine_pm_selftests/live_engine_busy_stats <7> [209.231628] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_5 <7> [209.231816] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_4 <7> [209.231944] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_3 <7> [209.232056] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling PW_2 <7> [209.232166] i915 0000:00:02.0: [drm:intel_power_well_disable [i915]] disabling DC_off <7> [209.232270] i915 0000:00:02.0: [drm:skl_enable_dc6 [i915]] Enabling DC6 <7> [209.232368] i915 0000:00:02.0: [drm:gen9_set_dc_state.part.0 [i915]] Setting DC state from 00 to 02 <4> [299.356116] [IGT] Inactivity timeout exceeded. Killing the current test with SIGQUIT. ... <6> [299.356526] sysrq: Show State ... <6> [299.373964] task:i915_selftest state:D stack:11784 pid:5578 tgid:5578 ppid:873 flags:0x00004002 <6> [299.373967] Call Trace: <6> [299.373968] <TASK> <6> [299.373970] __schedule+0x3bb/0xda0 <6> [299.373974] schedule+0x41/0x110 <6> [299.373976] intel_wakeref_wait_for_idle+0x82/0x100 [i915] <6> [299.374083] ? __pfx_var_wake_function+0x10/0x10 <6> [299.374087] live_engine_busy_stats+0x9b/0x500 [i915] <6> [299.374173] __i915_subtests+0xbe/0x240 [i915] <6> [299.374277] ? __pfx___intel_gt_live_setup+0x10/0x10 [i915] <6> [299.374369] ? __pfx___intel_gt_live_teardown+0x10/0x10 [i915] <6> [299.374456] intel_engine_live_selftests+0x1c/0x30 [i915] <6> [299.374547] __run_selftests+0xbb/0x190 [i915] <6> [299.374635] i915_live_selftests+0x4b/0x90 [i915] <6> [299.374717] i915_pci_probe+0x10d/0x210 [i915]
At the end of the interrupt worker, if there are no more engines awake, disarm the breadcrumb and go to sleep.
Fixes: 9d5612ca165a ("drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission") Closes: https://gitlab.freedesktop.org/drm/intel/issues/10026 Signed-off-by: Chris Wilson chris@chris-wilson.co.uk Cc: Andrzej Hajda andrzej.hajda@intel.com Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Janusz Krzysztofik janusz.krzysztofik@linux.intel.com Acked-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Andrzej Hajda andrzej.hajda@intel.com Reviewed-by: Andi Shyti andi.shyti@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240423165505.465734-2-janusz... (cherry picked from commit fbad43eccae5cb14594195c20113369aabaa22b5) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c @@ -263,8 +263,13 @@ static void signal_irq_work(struct irq_w i915_request_put(rq); }
+ /* Lazy irq enabling after HW submission */ if (!READ_ONCE(b->irq_armed) && !list_empty(&b->signalers)) intel_breadcrumbs_arm_irq(b); + + /* And confirm that we still want irqs enabled before we yield */ + if (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) + intel_breadcrumbs_disarm_irq(b); }
struct intel_breadcrumbs * @@ -315,13 +320,7 @@ void __intel_breadcrumbs_park(struct int return;
/* Kick the work once more to drain the signalers, and disarm the irq */ - irq_work_sync(&b->irq_work); - while (READ_ONCE(b->irq_armed) && !atomic_read(&b->active)) { - local_irq_disable(); - signal_irq_work(&b->irq_work); - local_irq_enable(); - cond_resched(); - } + irq_work_queue(&b->irq_work); }
void intel_breadcrumbs_free(struct kref *kref) @@ -404,7 +403,7 @@ static void insert_breadcrumb(struct i91 * the request as it may have completed and raised the interrupt as * we were attaching it into the lists. */ - if (!b->irq_armed || __i915_request_is_complete(rq)) + if (!READ_ONCE(b->irq_armed) || __i915_request_is_complete(rq)) irq_work_queue(&b->irq_work); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wachowski, Karol karol.wachowski@intel.com
commit 39bc27bd688066a63e56f7f64ad34fae03fbe3b8 upstream.
Lack of check for copy-on-write (COW) mapping in drm_gem_shmem_mmap allows users to call mmap with PROT_WRITE and MAP_PRIVATE flag causing a kernel panic due to BUG_ON in vmf_insert_pfn_prot: BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags));
Return -EINVAL early if COW mapping is detected.
This bug affects all drm drivers using default shmem helpers. It can be reproduced by this simple example: void *ptr = mmap(0, size, PROT_WRITE, MAP_PRIVATE, fd, mmap_offset); ptr[0] = 0;
Fixes: 2194a63a818d ("drm: Add library for shmem backed GEM objects") Cc: Noralf Trønnes noralf@tronnes.org Cc: Eric Anholt eric@anholt.net Cc: Rob Herring robh@kernel.org Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Wachowski, Karol karol.wachowski@intel.com Signed-off-by: Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com Signed-off-by: Daniel Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20240520100514.925681-1-jacek.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_gem_shmem_helper.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c +++ b/drivers/gpu/drm/drm_gem_shmem_helper.c @@ -610,6 +610,9 @@ int drm_gem_shmem_mmap(struct drm_gem_sh return ret; }
+ if (is_cow_mapping(vma->vm_flags)) + return -EINVAL; + dma_resv_lock(shmem->base.resv, NULL); ret = drm_gem_shmem_get_pages(shmem); dma_resv_unlock(shmem->base.resv);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com
commit 6c5cd0807c79eb4c0cda70b48f6be668a241d584 upstream.
Release the submission_state lock if alloc_guc_id() fails.
v2: Add Fixes tag and CC stable kernel
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Cc: stable@vger.kernel.org # v6.8+ Signed-off-by: Niranjana Vishwanathapura niranjana.vishwanathapura@intel.com Reviewed-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Matthew Brost matthew.brost@intel.com Signed-off-by: José Roberto de Souza jose.souza@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240521201711.4934-1-niranjan... (cherry picked from commit 40672b792a36894aff3a337b695f6136ee6ac5d4) Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/xe_guc_submit.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -1257,6 +1257,7 @@ static int guc_exec_queue_init(struct xe return 0;
err_entity: + mutex_unlock(&guc->submission_state.lock); xe_sched_entity_fini(&ge->entity); err_sched: xe_sched_fini(&ge->sched);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vidya Srinivas vidya.srinivas@intel.com
commit 43e2b37e2ab660c3565d4cff27922bc70e79c3f1 upstream.
In some scenarios, the DPT object gets shrunk but the actual framebuffer did not and thus its still there on the DPT's vm->bound_list. Then it tries to rewrite the PTEs via a stale CPU mapping. This causes panic.
Cc: stable@vger.kernel.org Reported-by: Shawn Lee shawn.c.lee@intel.com Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt") Signed-off-by: Vidya Srinivas vidya.srinivas@intel.com [vsyrjala: Add TODO comment] Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240520165634.1162470-1-vidya... (cherry picked from commit 51064d471c53dcc8eddd2333c3f1c1d9131ba36c) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h @@ -284,7 +284,9 @@ bool i915_gem_object_has_iomem(const str static inline bool i915_gem_object_is_shrinkable(const struct drm_i915_gem_object *obj) { - return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE); + /* TODO: make DPT shrinkable when it has no bound vmas */ + return i915_gem_object_type_has(obj, I915_GEM_OBJECT_IS_SHRINKABLE) && + !obj->is_dpt; }
static inline bool
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Imre Deak imre.deak@intel.com
commit 75800e2e4203ea83bbc9d4f63ad97ea582244a08 upstream.
After registering the audio component in i915_audio_component_init() the audio driver may call i915_audio_component_get_power() via the component ops. This could program AUD_FREQ_CNTRL with an uninitialized value if the latter function is called before display.audio.freq_cntrl gets initialized. The get_power() function also does a modeset which in the above case happens too early before the initialization step and triggers the
"Reject display access from task"
error message added by the Fixes: commit below.
Fix the above issue by registering the audio component only after the initialization step.
Fixes: 87c1694533c9 ("drm/i915: save AUD_FREQ_CNTRL state at audio domain suspend") Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10291 Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Imre Deak imre.deak@intel.com Reviewed-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240521143022.3784539-1-imre.... (cherry picked from commit fdd0b80172758ce284f19fa8a26d90c61e4371d2) Signed-off-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_audio.c | 32 +++++++++++++------- drivers/gpu/drm/i915/display/intel_audio.h | 1 drivers/gpu/drm/i915/display/intel_display_driver.c | 2 + 3 files changed, 24 insertions(+), 11 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_audio.c +++ b/drivers/gpu/drm/i915/display/intel_audio.c @@ -1252,17 +1252,6 @@ static const struct component_ops i915_a static void i915_audio_component_init(struct drm_i915_private *i915) { u32 aud_freq, aud_freq_init; - int ret; - - ret = component_add_typed(i915->drm.dev, - &i915_audio_component_bind_ops, - I915_COMPONENT_AUDIO); - if (ret < 0) { - drm_err(&i915->drm, - "failed to add audio component (%d)\n", ret); - /* continue with reduced functionality */ - return; - }
if (DISPLAY_VER(i915) >= 9) { aud_freq_init = intel_de_read(i915, AUD_FREQ_CNTRL); @@ -1285,6 +1274,21 @@ static void i915_audio_component_init(st
/* init with current cdclk */ intel_audio_cdclk_change_post(i915); +} + +static void i915_audio_component_register(struct drm_i915_private *i915) +{ + int ret; + + ret = component_add_typed(i915->drm.dev, + &i915_audio_component_bind_ops, + I915_COMPONENT_AUDIO); + if (ret < 0) { + drm_err(&i915->drm, + "failed to add audio component (%d)\n", ret); + /* continue with reduced functionality */ + return; + }
i915->display.audio.component_registered = true; } @@ -1317,6 +1321,12 @@ void intel_audio_init(struct drm_i915_pr i915_audio_component_init(i915); }
+void intel_audio_register(struct drm_i915_private *i915) +{ + if (!i915->display.audio.lpe.platdev) + i915_audio_component_register(i915); +} + /** * intel_audio_deinit() - deinitialize the audio driver * @i915: the i915 drm device private data --- a/drivers/gpu/drm/i915/display/intel_audio.h +++ b/drivers/gpu/drm/i915/display/intel_audio.h @@ -28,6 +28,7 @@ void intel_audio_codec_get_config(struct void intel_audio_cdclk_change_pre(struct drm_i915_private *dev_priv); void intel_audio_cdclk_change_post(struct drm_i915_private *dev_priv); void intel_audio_init(struct drm_i915_private *dev_priv); +void intel_audio_register(struct drm_i915_private *i915); void intel_audio_deinit(struct drm_i915_private *dev_priv); void intel_audio_sdp_split_update(const struct intel_crtc_state *crtc_state);
--- a/drivers/gpu/drm/i915/display/intel_display_driver.c +++ b/drivers/gpu/drm/i915/display/intel_display_driver.c @@ -542,6 +542,8 @@ void intel_display_driver_register(struc
intel_display_driver_enable_user_access(i915);
+ intel_audio_register(i915); + intel_display_debugfs_register(i915);
/*
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit e44937889bdf4ecd1f0c25762b7226406b9b7a69 upstream.
Add support for the Trace Hub in Granite Rapids.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-11-alexander.shishkin@linux... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -305,6 +305,11 @@ static const struct pci_device_id intel_ .driver_data = (kernel_ulong_t)&intel_th_2x, }, { + /* Granite Rapids */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x0963), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, + { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), .driver_data = (kernel_ulong_t)&intel_th_2x,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit 854afe461b009801a171b3a49c5f75ea43e4c04c upstream.
Add support for the Trace Hub in Granite Rapids SOC.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-12-alexander.shishkin@linux... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -310,6 +310,11 @@ static const struct pci_device_id intel_ .driver_data = (kernel_ulong_t)&intel_th_2x, }, { + /* Granite Rapids SOC */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x3256), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, + { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), .driver_data = (kernel_ulong_t)&intel_th_2x,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit 2e1da7efabe05cb0cf0b358883b2bc89080ed0eb upstream.
Add support for the Trace Hub in Sapphire Rapids SOC.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-13-alexander.shishkin@linux... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -315,6 +315,11 @@ static const struct pci_device_id intel_ .driver_data = (kernel_ulong_t)&intel_th_2x, }, { + /* Sapphire Rapids SOC */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x3456), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, + { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), .driver_data = (kernel_ulong_t)&intel_th_2x,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit c4a30def564d75e84718b059d1a62cc79b137cf9 upstream.
Add support for the Trace Hub in Meteor Lake-S.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-14-alexander.shishkin@linux... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -295,6 +295,11 @@ static const struct pci_device_id intel_ .driver_data = (kernel_ulong_t)&intel_th_2x, }, { + /* Meteor Lake-S */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7f26), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, + { /* Raptor Lake-S */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x7a26), .driver_data = (kernel_ulong_t)&intel_th_2x,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Shishkin alexander.shishkin@linux.intel.com
commit f866b65322bfbc8fcca13c25f49e1a5c5a93ae4d upstream.
Add support for the Trace Hub in Lunar Lake.
Signed-off-by: Alexander Shishkin alexander.shishkin@linux.intel.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: stable@kernel.org Link: https://lore.kernel.org/r/20240429130119.1518073-16-alexander.shishkin@linux... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hwtracing/intel_th/pci.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/hwtracing/intel_th/pci.c +++ b/drivers/hwtracing/intel_th/pci.c @@ -325,6 +325,11 @@ static const struct pci_device_id intel_ .driver_data = (kernel_ulong_t)&intel_th_2x, }, { + /* Lunar Lake */ + PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0xa824), + .driver_data = (kernel_ulong_t)&intel_th_2x, + }, + { /* Alder Lake CPU */ PCI_DEVICE(PCI_VENDOR_ID_INTEL, 0x466f), .driver_data = (kernel_ulong_t)&intel_th_2x,
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
commit 670c900f69645db394efb38934b3344d8804171a upstream.
When the dts file has multiple referrers to a single PD (e.g. simple-framebuffer and dss nodes both point to the DSS power-domain) the ti-sci driver will create two power domains, both with the same ID, and that will cause problems as one of the power domains will hide the other one.
Fix this checking if a PD with the ID has already been created, and only create a PD for new IDs.
Fixes: efa5c01cd7ee ("soc: ti: ti_sci_pm_domains: switch to use multiple genpds instead of one") Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240415-ti-sci-pd-v1-1-a0e56b8ad897@ideasonboard.... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/ti/ti_sci_pm_domains.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-)
--- a/drivers/pmdomain/ti/ti_sci_pm_domains.c +++ b/drivers/pmdomain/ti/ti_sci_pm_domains.c @@ -114,6 +114,18 @@ static const struct of_device_id ti_sci_ }; MODULE_DEVICE_TABLE(of, ti_sci_pm_domain_matches);
+static bool ti_sci_pm_idx_exists(struct ti_sci_genpd_provider *pd_provider, u32 idx) +{ + struct ti_sci_pm_domain *pd; + + list_for_each_entry(pd, &pd_provider->pd_list, node) { + if (pd->idx == idx) + return true; + } + + return false; +} + static int ti_sci_pm_domain_probe(struct platform_device *pdev) { struct device *dev = &pdev->dev; @@ -149,8 +161,14 @@ static int ti_sci_pm_domain_probe(struct break;
if (args.args_count >= 1 && args.np == dev->of_node) { - if (args.args[0] > max_id) + if (args.args[0] > max_id) { max_id = args.args[0]; + } else { + if (ti_sci_pm_idx_exists(pd_provider, args.args[0])) { + index++; + continue; + } + }
pd = devm_kzalloc(dev, sizeof(*pd), GFP_KERNEL); if (!pd) {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 0090d6e1b210551e63cf43958dc7a1ec942cdde9 upstream.
While loading a zone's info during creation of a block group, we can race with a device replace operation and then trigger a use-after-free on the device that was just replaced (source device of the replace operation).
This happens because at btrfs_load_zone_info() we extract a device from the chunk map into a local variable and then use the device while not under the protection of the device replace rwsem. So if there's a device replace operation happening when we extract the device and that device is the source of the replace operation, we will trigger a use-after-free if before we finish using the device the replace operation finishes and frees the device.
Fix this by enlarging the critical section under the protection of the device replace rwsem so that all uses of the device are done inside the critical section.
CC: stable@vger.kernel.org # 6.1.x: 15c12fcc50a1: btrfs: zoned: introduce a zone_info struct in btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x: 09a46725cc84: btrfs: zoned: factor out per-zone logic from btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x: 9e0e3e74dc69: btrfs: zoned: factor out single bg handling from btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x: 87463f7e0250: btrfs: zoned: factor out DUP bg handling from btrfs_load_block_group_zone_info CC: stable@vger.kernel.org # 6.1.x Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/zoned.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
--- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1290,7 +1290,7 @@ static int btrfs_load_zone_info(struct b struct btrfs_chunk_map *map) { struct btrfs_dev_replace *dev_replace = &fs_info->dev_replace; - struct btrfs_device *device = map->stripes[zone_idx].dev; + struct btrfs_device *device; int dev_replace_is_ongoing = 0; unsigned int nofs_flag; struct blk_zone zone; @@ -1298,7 +1298,11 @@ static int btrfs_load_zone_info(struct b
info->physical = map->stripes[zone_idx].physical;
+ down_read(&dev_replace->rwsem); + device = map->stripes[zone_idx].dev; + if (!device->bdev) { + up_read(&dev_replace->rwsem); info->alloc_offset = WP_MISSING_DEV; return 0; } @@ -1308,6 +1312,7 @@ static int btrfs_load_zone_info(struct b __set_bit(zone_idx, active);
if (!btrfs_dev_is_sequential(device, info->physical)) { + up_read(&dev_replace->rwsem); info->alloc_offset = WP_CONVENTIONAL; return 0; } @@ -1315,11 +1320,9 @@ static int btrfs_load_zone_info(struct b /* This zone will be used for allocation, so mark this zone non-empty. */ btrfs_dev_clear_zone_empty(device, info->physical);
- down_read(&dev_replace->rwsem); dev_replace_is_ongoing = btrfs_dev_replace_is_ongoing(dev_replace); if (dev_replace_is_ongoing && dev_replace->tgtdev != NULL) btrfs_dev_clear_zone_empty(dev_replace->tgtdev, info->physical); - up_read(&dev_replace->rwsem);
/* * The group is mapped to a sequential zone. Get the zone write pointer @@ -1330,6 +1333,7 @@ static int btrfs_load_zone_info(struct b ret = btrfs_get_dev_zone(device, info->physical, &zone); memalloc_nofs_restore(nofs_flag); if (ret) { + up_read(&dev_replace->rwsem); if (ret != -EIO && ret != -EOPNOTSUPP) return ret; info->alloc_offset = WP_MISSING_DEV; @@ -1341,6 +1345,7 @@ static int btrfs_load_zone_info(struct b "zoned: unexpected conventional zone %llu on device %s (devid %llu)", zone.start << SECTOR_SHIFT, rcu_str_deref(device->name), device->devid); + up_read(&dev_replace->rwsem); return -EIO; }
@@ -1368,6 +1373,8 @@ static int btrfs_load_zone_info(struct b break; }
+ up_read(&dev_replace->rwsem); + return 0; }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
commit 8f892e225f416fcf2b55a0f9161162e08e2b0cc7 upstream.
This just adds a __le32 that we (currently) don't use.
Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240319100755.29ff7a88ddac.I39cf2ff1d1ddf0fa62722538698d... Signed-off-by: Johannes Berg johannes.berg@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=218963 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/fw/api/power.h | 30 ++++++++++++++++++++++ drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 4 ++ drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c | 4 ++ 3 files changed, 36 insertions(+), 2 deletions(-)
--- a/drivers/net/wireless/intel/iwlwifi/fw/api/power.h +++ b/drivers/net/wireless/intel/iwlwifi/fw/api/power.h @@ -385,6 +385,33 @@ struct iwl_dev_tx_power_cmd_v7 { __le32 timer_period; __le32 flags; } __packed; /* TX_REDUCED_POWER_API_S_VER_7 */ + +/** + * struct iwl_dev_tx_power_cmd_v8 - TX power reduction command version 8 + * @per_chain: per chain restrictions + * @enable_ack_reduction: enable or disable close range ack TX power + * reduction. + * @per_chain_restriction_changed: is per_chain_restriction has changed + * from last command. used if set_mode is + * IWL_TX_POWER_MODE_SET_SAR_TIMER. + * note: if not changed, the command is used for keep alive only. + * @reserved: reserved (padding) + * @timer_period: timer in milliseconds. if expires FW will change to default + * BIOS values. relevant if setMode is IWL_TX_POWER_MODE_SET_SAR_TIMER + * @flags: reduce power flags. + * @tpc_vlp_backoff_level: user backoff of UNII5,7 VLP channels in USA. + * Not in use. + */ +struct iwl_dev_tx_power_cmd_v8 { + __le16 per_chain[IWL_NUM_CHAIN_TABLES_V2][IWL_NUM_CHAIN_LIMITS][IWL_NUM_SUB_BANDS_V2]; + u8 enable_ack_reduction; + u8 per_chain_restriction_changed; + u8 reserved[2]; + __le32 timer_period; + __le32 flags; + __le32 tpc_vlp_backoff_level; +} __packed; /* TX_REDUCED_POWER_API_S_VER_8 */ + /** * struct iwl_dev_tx_power_cmd - TX power reduction command (multiversion) * @common: common part of the command @@ -392,6 +419,8 @@ struct iwl_dev_tx_power_cmd_v7 { * @v4: version 4 part of the command * @v5: version 5 part of the command * @v6: version 6 part of the command + * @v7: version 7 part of the command + * @v8: version 8 part of the command */ struct iwl_dev_tx_power_cmd { struct iwl_dev_tx_power_common common; @@ -401,6 +430,7 @@ struct iwl_dev_tx_power_cmd { struct iwl_dev_tx_power_cmd_v5 v5; struct iwl_dev_tx_power_cmd_v6 v6; struct iwl_dev_tx_power_cmd_v7 v7; + struct iwl_dev_tx_power_cmd_v8 v8; }; };
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c @@ -886,11 +886,13 @@ int iwl_mvm_sar_select_profile(struct iw u32 n_subbands; u8 cmd_ver = iwl_fw_lookup_cmd_ver(mvm->fw, cmd_id, IWL_FW_CMD_VER_UNKNOWN); - if (cmd_ver == 7) { + if (cmd_ver >= 7) { len = sizeof(cmd.v7); n_subbands = IWL_NUM_SUB_BANDS_V2; per_chain = cmd.v7.per_chain[0][0]; cmd.v7.flags = cpu_to_le32(mvm->fwrt.reduced_power_flags); + if (cmd_ver == 8) + len = sizeof(cmd.v8); } else if (cmd_ver == 6) { len = sizeof(cmd.v6); n_subbands = IWL_NUM_SUB_BANDS_V2; --- a/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/mac80211.c @@ -1399,7 +1399,9 @@ int iwl_mvm_set_tx_power(struct iwl_mvm if (tx_power == IWL_DEFAULT_MAX_TX_POWER) cmd.common.pwr_restriction = cpu_to_le16(IWL_DEV_MAX_TX_POWER);
- if (cmd_ver == 7) + if (cmd_ver == 8) + len = sizeof(cmd.v8); + else if (cmd_ver == 7) len = sizeof(cmd.v7); else if (cmd_ver == 6) len = sizeof(cmd.v6);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
commit 788e4c75f831d06fcfbbec1d455fac429521e607 upstream.
Since IWL_FW_CMD_VER_UNKNOWN = 99, then my change to consider cmd_ver >= 7 instead of cmd_ver = 7 included also firmwares that don't advertise the command version at all. This made us send a command with a bad size and because of that, the firmware hit a BAD_COMMAND immediately after handling the REDUCE_TX_POWER_CMD command.
Fixes: 8f892e225f41 ("wifi: iwlwifi: mvm: support iwl_dev_tx_power_cmd_v8") Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Reviewed-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240512072733.eb20ff5050d3.Ie4fc6f5496cd296fd6ff20d15e98... Signed-off-by: Johannes Berg johannes.berg@intel.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=218963 Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c @@ -884,8 +884,8 @@ int iwl_mvm_sar_select_profile(struct iw int ret; u16 len = 0; u32 n_subbands; - u8 cmd_ver = iwl_fw_lookup_cmd_ver(mvm->fw, cmd_id, - IWL_FW_CMD_VER_UNKNOWN); + u8 cmd_ver = iwl_fw_lookup_cmd_ver(mvm->fw, cmd_id, 3); + if (cmd_ver >= 7) { len = sizeof(cmd.v7); n_subbands = IWL_NUM_SUB_BANDS_V2;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miaohe Lin linmiaohe@huawei.com
commit fe6f86f4b40855a130a19aa589f9ba7f650423f4 upstream.
When I did memory failure tests recently, below panic occurs:
kernel BUG at include/linux/mm.h:1135! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 9 PID: 137 Comm: kswapd1 Not tainted 6.9.0-rc4-00491-gd5ce28f156fe-dirty #14 RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0 Call Trace: <TASK> do_shrink_slab+0x14f/0x6a0 shrink_slab+0xca/0x8c0 shrink_node+0x2d0/0x7d0 balance_pgdat+0x33a/0x720 kswapd+0x1f3/0x410 kthread+0xd5/0x100 ret_from_fork+0x2f/0x50 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: mce_inject hwpoison_inject ---[ end trace 0000000000000000 ]--- RIP: 0010:shrink_huge_zero_page_scan+0x168/0x1a0 RSP: 0018:ffff9933c6c57bd0 EFLAGS: 00000246 RAX: 000000000000003e RBX: 0000000000000000 RCX: ffff88f61fc5c9c8 RDX: 0000000000000000 RSI: 0000000000000027 RDI: ffff88f61fc5c9c0 RBP: ffffcd7c446b0000 R08: ffffffff9a9405f0 R09: 0000000000005492 R10: 00000000000030ea R11: ffffffff9a9405f0 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: ffff88e703c4ac00 FS: 0000000000000000(0000) GS:ffff88f61fc40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055f4da6e9878 CR3: 0000000c71048000 CR4: 00000000000006f0
The root cause is that HWPoison flag will be set for huge_zero_folio without increasing the folio refcnt. But then unpoison_memory() will decrease the folio refcnt unexpectedly as it appears like a successfully hwpoisoned folio leading to VM_BUG_ON_PAGE(page_ref_count(page) == 0) when releasing huge_zero_folio.
Skip unpoisoning huge_zero_folio in unpoison_memory() to fix this issue. We're not prepared to unpoison huge_zero_folio yet.
Link: https://lkml.kernel.org/r/20240516122608.22610-1-linmiaohe@huawei.com Fixes: 478d134e9506 ("mm/huge_memory: do not overkill when splitting huge_zero_page") Signed-off-by: Miaohe Lin linmiaohe@huawei.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Yang Shi shy828301@gmail.com Reviewed-by: Oscar Salvador osalvador@suse.de Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: Xu Yu xuyu@linux.alibaba.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Miaohe Lin linmiaohe@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2550,6 +2550,13 @@ int unpoison_memory(unsigned long pfn) goto unlock_mutex; }
+ if (is_huge_zero_page(&folio->page)) { + unpoison_pr_info("Unpoison: huge zero page is not supported %#lx\n", + pfn, &unpoison_rs); + ret = -EOPNOTSUPP; + goto unlock_mutex; + } + if (!PageHWPoison(p)) { unpoison_pr_info("Unpoison: Page was already unpoisoned %#lx\n", pfn, &unpoison_rs);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Beleswar Padhi b-padhi@ti.com
commit 1dc7242f6ee0c99852cb90676d7fe201cf5de422 upstream.
In case of errors during core start operation from sysfs, the driver directly returns with the -EPERM error code. Fix this to ensure that mailbox channels are freed on error before returning by jumping to the 'put_mbox' error handling label. Similarly, jump to the 'out' error handling label to return with required -EPERM error code during the core stop operation from sysfs.
Fixes: 3c8a9066d584 ("remoteproc: k3-r5: Do not allow core1 to power up before core0 via sysfs") Signed-off-by: Beleswar Padhi b-padhi@ti.com Link: https://lore.kernel.org/r/20240506141849.1735679-1-b-padhi@ti.com Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/remoteproc/ti_k3_r5_remoteproc.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/drivers/remoteproc/ti_k3_r5_remoteproc.c +++ b/drivers/remoteproc/ti_k3_r5_remoteproc.c @@ -580,7 +580,8 @@ static int k3_r5_rproc_start(struct rpro if (core != core0 && core0->rproc->state == RPROC_OFFLINE) { dev_err(dev, "%s: can not start core 1 before core 0\n", __func__); - return -EPERM; + ret = -EPERM; + goto put_mbox; }
ret = k3_r5_core_run(core); @@ -648,7 +649,8 @@ static int k3_r5_rproc_stop(struct rproc if (core != core1 && core1->rproc->state != RPROC_OFFLINE) { dev_err(dev, "%s: can not stop core 0 before core 1\n", __func__); - return -EPERM; + ret = -EPERM; + goto out; }
ret = k3_r5_core_halt(core);
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sicong Huang congei42@163.com
commit 5c9c5d7f26acc2c669c1dcf57d1bb43ee99220ce upstream.
In gb_interface_create, &intf->mode_switch_completion is bound with gb_interface_mode_switch_work. Then it will be started by gb_interface_request_mode_switch. Here is the relevant code. if (!queue_work(system_long_wq, &intf->mode_switch_work)) { ... }
If we call gb_interface_release to make cleanup, there may be an unfinished work. This function will call kfree to free the object "intf". However, if gb_interface_mode_switch_work is scheduled to run after kfree, it may cause use-after-free error as gb_interface_mode_switch_work will use the object "intf". The possible execution flow that may lead to the issue is as follows:
CPU0 CPU1
| gb_interface_create | gb_interface_request_mode_switch gb_interface_release | kfree(intf) (free) | | gb_interface_mode_switch_work | mutex_lock(&intf->mutex) (use)
Fix it by canceling the work before kfree.
Signed-off-by: Sicong Huang congei42@163.com Link: https://lore.kernel.org/r/20240416080313.92306-1-congei42@163.com Cc: Ronnie Sahlberg rsahlberg@ciq.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/greybus/interface.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/greybus/interface.c +++ b/drivers/greybus/interface.c @@ -693,6 +693,7 @@ static void gb_interface_release(struct
trace_gb_interface_release(intf);
+ cancel_work_sync(&intf->mode_switch_work); kfree(intf); }
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Berger stefanb@linux.ibm.com
commit be84f32bb2c981ca670922e047cdde1488b233de upstream.
->d_name.name can change on rename and the earlier value can be freed; there are conditions sufficient to stabilize it (->d_lock on dentry, ->d_lock on its parent, ->i_rwsem exclusive on the parent's inode, rename_lock), but none of those are met at any of the sites. Take a stable snapshot of the name instead.
Link: https://lore.kernel.org/all/20240202182732.GE2087318@ZenIV/ Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Stefan Berger stefanb@linux.ibm.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/integrity/ima/ima_api.c | 16 ++++++++++++---- security/integrity/ima/ima_template_lib.c | 17 ++++++++++++++--- 2 files changed, 26 insertions(+), 7 deletions(-)
--- a/security/integrity/ima/ima_api.c +++ b/security/integrity/ima/ima_api.c @@ -245,8 +245,8 @@ int ima_collect_measurement(struct ima_i const char *audit_cause = "failed"; struct inode *inode = file_inode(file); struct inode *real_inode = d_real_inode(file_dentry(file)); - const char *filename = file->f_path.dentry->d_name.name; struct ima_max_digest_data hash; + struct name_snapshot filename; struct kstat stat; int result = 0; int length; @@ -317,9 +317,13 @@ out: if (file->f_flags & O_DIRECT) audit_cause = "failed(directio)";
+ take_dentry_name_snapshot(&filename, file->f_path.dentry); + integrity_audit_msg(AUDIT_INTEGRITY_DATA, inode, - filename, "collect_data", audit_cause, - result, 0); + filename.name.name, "collect_data", + audit_cause, result, 0); + + release_dentry_name_snapshot(&filename); } return result; } @@ -432,6 +436,7 @@ out: */ const char *ima_d_path(const struct path *path, char **pathbuf, char *namebuf) { + struct name_snapshot filename; char *pathname = NULL;
*pathbuf = __getname(); @@ -445,7 +450,10 @@ const char *ima_d_path(const struct path }
if (!pathname) { - strscpy(namebuf, path->dentry->d_name.name, NAME_MAX); + take_dentry_name_snapshot(&filename, path->dentry); + strscpy(namebuf, filename.name.name, NAME_MAX); + release_dentry_name_snapshot(&filename); + pathname = namebuf; }
--- a/security/integrity/ima/ima_template_lib.c +++ b/security/integrity/ima/ima_template_lib.c @@ -483,7 +483,10 @@ static int ima_eventname_init_common(str bool size_limit) { const char *cur_filename = NULL; + struct name_snapshot filename; u32 cur_filename_len = 0; + bool snapshot = false; + int ret;
BUG_ON(event_data->filename == NULL && event_data->file == NULL);
@@ -496,7 +499,10 @@ static int ima_eventname_init_common(str }
if (event_data->file) { - cur_filename = event_data->file->f_path.dentry->d_name.name; + take_dentry_name_snapshot(&filename, + event_data->file->f_path.dentry); + snapshot = true; + cur_filename = filename.name.name; cur_filename_len = strlen(cur_filename); } else /* @@ -505,8 +511,13 @@ static int ima_eventname_init_common(str */ cur_filename_len = IMA_EVENT_NAME_LEN_MAX; out: - return ima_write_template_field_data(cur_filename, cur_filename_len, - DATA_FMT_STRING, field_data); + ret = ima_write_template_field_data(cur_filename, cur_filename_len, + DATA_FMT_STRING, field_data); + + if (snapshot) + release_dentry_name_snapshot(&filename); + + return ret; }
/*
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 87d80bfbd577912462061b1a45c0ed9c7fcb872f ]
The container of the struct dw8250_port_data is private to the actual driver. In particular, 8250_lpss and 8250_dw use different data types that are assigned to the UART port private_data. Hence, it must not be used outside the specific driver.
Currently the only cpr_val is required by the common code, make it be available via struct dw8250_port_data.
This fixes the UART breakage on Intel Galileo boards.
Fixes: 593dea000bc1 ("serial: 8250: dw: Allow to use a fallback CPR value if not synthesized") Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20240514190730.2787071-2-andriy.shevchenko@linux.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/8250/8250_dw.c | 9 +++++++-- drivers/tty/serial/8250/8250_dwlib.c | 3 +-- drivers/tty/serial/8250/8250_dwlib.h | 3 ++- 3 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c index 1300c92b8702a..94d680e4b5353 100644 --- a/drivers/tty/serial/8250/8250_dw.c +++ b/drivers/tty/serial/8250/8250_dw.c @@ -55,6 +55,7 @@ #define DW_UART_QUIRK_SKIP_SET_RATE BIT(2) #define DW_UART_QUIRK_IS_DMA_FC BIT(3) #define DW_UART_QUIRK_APMC0D08 BIT(4) +#define DW_UART_QUIRK_CPR_VALUE BIT(5)
static inline struct dw8250_data *clk_to_dw8250_data(struct notifier_block *nb) { @@ -445,6 +446,10 @@ static void dw8250_prepare_rx_dma(struct uart_8250_port *p) static void dw8250_quirks(struct uart_port *p, struct dw8250_data *data) { unsigned int quirks = data->pdata ? data->pdata->quirks : 0; + u32 cpr_value = data->pdata ? data->pdata->cpr_value : 0; + + if (quirks & DW_UART_QUIRK_CPR_VALUE) + data->data.cpr_value = cpr_value;
#ifdef CONFIG_64BIT if (quirks & DW_UART_QUIRK_OCTEON) { @@ -727,8 +732,8 @@ static const struct dw8250_platform_data dw8250_armada_38x_data = {
static const struct dw8250_platform_data dw8250_renesas_rzn1_data = { .usr_reg = DW_UART_USR, - .cpr_val = 0x00012f32, - .quirks = DW_UART_QUIRK_IS_DMA_FC, + .cpr_value = 0x00012f32, + .quirks = DW_UART_QUIRK_CPR_VALUE | DW_UART_QUIRK_IS_DMA_FC, };
static const struct dw8250_platform_data dw8250_starfive_jh7100_data = { diff --git a/drivers/tty/serial/8250/8250_dwlib.c b/drivers/tty/serial/8250/8250_dwlib.c index 3e33ddf7bc800..5a2520943dfd5 100644 --- a/drivers/tty/serial/8250/8250_dwlib.c +++ b/drivers/tty/serial/8250/8250_dwlib.c @@ -242,7 +242,6 @@ static const struct serial_rs485 dw8250_rs485_supported = { void dw8250_setup_port(struct uart_port *p) { struct dw8250_port_data *pd = p->private_data; - struct dw8250_data *data = to_dw8250_data(pd); struct uart_8250_port *up = up_to_u8250p(p); u32 reg, old_dlf;
@@ -278,7 +277,7 @@ void dw8250_setup_port(struct uart_port *p)
reg = dw8250_readl_ext(p, DW_UART_CPR); if (!reg) { - reg = data->pdata->cpr_val; + reg = pd->cpr_value; dev_dbg(p->dev, "CPR is not available, using 0x%08x instead\n", reg); } if (!reg) diff --git a/drivers/tty/serial/8250/8250_dwlib.h b/drivers/tty/serial/8250/8250_dwlib.h index f13e91f2cace9..794a9014cdac1 100644 --- a/drivers/tty/serial/8250/8250_dwlib.h +++ b/drivers/tty/serial/8250/8250_dwlib.h @@ -19,6 +19,7 @@ struct dw8250_port_data { struct uart_8250_dma dma;
/* Hardware configuration */ + u32 cpr_value; u8 dlf_size;
/* RS485 variables */ @@ -27,7 +28,7 @@ struct dw8250_port_data {
struct dw8250_platform_data { u8 usr_reg; - u32 cpr_val; + u32 cpr_value; unsigned int quirks; };
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Herring (Arm) robh@kernel.org
[ Upstream commit e4228cfd092351c2d9b1a3048b2070287291ccbb ]
All nodes need an explicit additionalProperties or unevaluatedProperties unless a $ref has one that's false. As that is not the case with usb-device.yaml, "additionalProperties" is needed here.
Fixes: c44d9dab31d6 ("dt-bindings: usb: Add downstream facing ports to realtek binding") Signed-off-by: "Rob Herring (Arm)" robh@kernel.org Reviewed-by: Stephen Boyd swboyd@chromium.org Link: https://lore.kernel.org/r/20240523194500.2958192-1-robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/devicetree/bindings/usb/realtek,rts5411.yaml | 1 + 1 file changed, 1 insertion(+)
diff --git a/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml b/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml index 0874fc21f66fb..6577a61cc0753 100644 --- a/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml +++ b/Documentation/devicetree/bindings/usb/realtek,rts5411.yaml @@ -65,6 +65,7 @@ patternProperties: description: The hard wired USB devices type: object $ref: /schemas/usb/usb-device.yaml + additionalProperties: true
required: - peer-hub
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shichao Lai shichaorai@gmail.com
[ Upstream commit 16637fea001ab3c8df528a8995b3211906165a30 ]
The member "uzonesize" of struct alauda_info will remain 0 if alauda_init_media() fails, potentially causing divide errors in alauda_read_data() and alauda_write_lba(). - Add a member "media_initialized" to struct alauda_info. - Change a condition in alauda_check_media() to ensure the first initialization. - Add an error check for the return value of alauda_init_media().
Fixes: e80b0fade09e ("[PATCH] USB Storage: add alauda support") Reported-by: xingwei lee xrivendell7@gmail.com Reported-by: yue sun samsun1006219@gmail.com Reviewed-by: Alan Stern stern@rowland.harvard.edu Signed-off-by: Shichao Lai shichaorai@gmail.com Link: https://lore.kernel.org/r/20240526012745.2852061-1-shichaorai@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/storage/alauda.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/storage/alauda.c b/drivers/usb/storage/alauda.c index 115f05a6201a1..40d34cc28344a 100644 --- a/drivers/usb/storage/alauda.c +++ b/drivers/usb/storage/alauda.c @@ -105,6 +105,8 @@ struct alauda_info { unsigned char sense_key; unsigned long sense_asc; /* additional sense code */ unsigned long sense_ascq; /* additional sense code qualifier */ + + bool media_initialized; };
#define short_pack(lsb,msb) ( ((u16)(lsb)) | ( ((u16)(msb))<<8 ) ) @@ -476,11 +478,12 @@ static int alauda_check_media(struct us_data *us) }
/* Check for media change */ - if (status[0] & 0x08) { + if (status[0] & 0x08 || !info->media_initialized) { usb_stor_dbg(us, "Media change detected\n"); alauda_free_maps(&MEDIA_INFO(us)); - alauda_init_media(us); - + rc = alauda_init_media(us); + if (rc == USB_STOR_TRANSPORT_GOOD) + info->media_initialized = true; info->sense_key = UNIT_ATTENTION; info->sense_asc = 0x28; info->sense_ascq = 0x00;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit af076156ec6d70332f1555754e99d4a3771ec297 ]
When using an initializer for a union only one of the union members must be initialized. The initializer for the acpi_object union variable passed as argument to the SID ACPI method was initializing both the type and the integer members of the union.
Unfortunately rather then complaining about this gcc simply ignores the first initializer and only used the second integer.value = 1 initializer. Leaving type set to 0 which leads to the argument being skipped by acpi acpi_ns_evaluate() resulting in:
ACPI Warning: _SB.PC00.SPI1.SPFD.CVFD.SID: Insufficient arguments - Caller passed 0, method requires 1 (20240322/nsarguments-232)
Fix this by initializing only the integer struct part of the union and initializing both members of the integer struct.
Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Sakari Ailus sakari.ailus@linux.intel.com Fixes: 566f5ca97680 ("mei: Add transport driver for IVSC device") Reviewed-by: Wentong Wu wentong.wu@intel.com Link: https://lore.kernel.org/r/20240603205050.505389-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/mei/vsc-fw-loader.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/misc/mei/vsc-fw-loader.c b/drivers/misc/mei/vsc-fw-loader.c index ffa4ccd96a104..596a9d695dfc1 100644 --- a/drivers/misc/mei/vsc-fw-loader.c +++ b/drivers/misc/mei/vsc-fw-loader.c @@ -252,7 +252,7 @@ static int vsc_get_sensor_name(struct vsc_fw_loader *fw_loader, { struct acpi_buffer buffer = { ACPI_ALLOCATE_BUFFER }; union acpi_object obj = { - .type = ACPI_TYPE_INTEGER, + .integer.type = ACPI_TYPE_INTEGER, .integer.value = 1, }; struct acpi_object_list arg_list = {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yongzhi Liu hyperlyzcs@gmail.com
[ Upstream commit 77427e3d5c353e3dd98c7c0af322f8d9e3131ace ]
There is a memory leak (forget to free allocated buffers) in a memory allocation failure path.
Fix it to jump to the correct error handling code.
Fixes: 393fc2f5948f ("misc: microchip: pci1xxxx: load auxiliary bus driver for the PIO function in the multi-function endpoint of pci1xxxx device.") Signed-off-by: Yongzhi Liu hyperlyzcs@gmail.com Reviewed-by: Kumaravel Thiagarajan kumaravel.thiagarajan@microchip.com Link: https://lore.kernel.org/r/20240523121434.21855-4-hyperlyzcs@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c b/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c index de75d89ef53e8..34c9be437432a 100644 --- a/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c +++ b/drivers/misc/mchp_pci1xxxx/mchp_pci1xxxx_gp.c @@ -69,8 +69,10 @@ static int gp_aux_bus_probe(struct pci_dev *pdev, const struct pci_device_id *id
aux_bus->aux_device_wrapper[1] = kzalloc(sizeof(*aux_bus->aux_device_wrapper[1]), GFP_KERNEL); - if (!aux_bus->aux_device_wrapper[1]) - return -ENOMEM; + if (!aux_bus->aux_device_wrapper[1]) { + retval = -ENOMEM; + goto err_aux_dev_add_0; + }
retval = ida_alloc(&gp_client_ida, GFP_KERNEL); if (retval < 0)
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean Delvare jdelvare@suse.de
[ Upstream commit d6d5645e5fc1233a7ba950de4a72981c394a2557 ]
When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities.
Fixes: 9d3ca54b550c ("i2c: at91: added slave mode support") Signed-off-by: Jean Delvare jdelvare@suse.de Cc: Juergen Fitschen me@jue.yt Cc: Ludovic Desroches ludovic.desroches@microchip.com Cc: Codrin Ciubotariu codrin.ciubotariu@microchip.com Cc: Andi Shyti andi.shyti@kernel.org Cc: Nicolas Ferre nicolas.ferre@microchip.com Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: Claudiu Beznea claudiu.beznea@tuxon.dev Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-at91-slave.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/i2c/busses/i2c-at91-slave.c b/drivers/i2c/busses/i2c-at91-slave.c index d6eeea5166c04..131a67d9d4a68 100644 --- a/drivers/i2c/busses/i2c-at91-slave.c +++ b/drivers/i2c/busses/i2c-at91-slave.c @@ -106,8 +106,7 @@ static int at91_unreg_slave(struct i2c_client *slave)
static u32 at91_twi_func(struct i2c_adapter *adapter) { - return I2C_FUNC_SLAVE | I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL - | I2C_FUNC_SMBUS_READ_BLOCK_DATA; + return I2C_FUNC_SLAVE; }
static const struct i2c_algorithm at91_twi_algorithm_slave = {
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jean Delvare jdelvare@suse.de
[ Upstream commit cbf3fb5b29e99e3689d63a88c3cddbffa1b8de99 ]
When an I2C adapter acts only as a slave, it should not claim to support I2C master capabilities.
Fixes: 5b6d721b266a ("i2c: designware: enable SLAVE in platform module") Signed-off-by: Jean Delvare jdelvare@suse.de Cc: Luis Oliveira lolivei@synopsys.com Cc: Jarkko Nikula jarkko.nikula@linux.intel.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Mika Westerberg mika.westerberg@linux.intel.com Cc: Jan Dabros jsd@semihalf.com Cc: Andi Shyti andi.shyti@kernel.org Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Jarkko Nikula jarkko.nikula@linux.intel.com Tested-by: Jarkko Nikula jarkko.nikula@linux.intel.com Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/i2c/busses/i2c-designware-slave.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-designware-slave.c b/drivers/i2c/busses/i2c-designware-slave.c index 2e079cf20bb5b..78e2c47e3d7da 100644 --- a/drivers/i2c/busses/i2c-designware-slave.c +++ b/drivers/i2c/busses/i2c-designware-slave.c @@ -220,7 +220,7 @@ static const struct i2c_algorithm i2c_dw_algo = {
void i2c_dw_configure_slave(struct dw_i2c_dev *dev) { - dev->functionality = I2C_FUNC_SLAVE | DW_IC_DEFAULT_FUNCTIONALITY; + dev->functionality = I2C_FUNC_SLAVE;
dev->slave_cfg = DW_IC_CON_RX_FIFO_FULL_HLD_CTRL | DW_IC_CON_RESTART_EN | DW_IC_CON_STOP_DET_IFADDRESSED;
6.9-stable review patch. If anyone has any objections, please let me know.
------------------
From: Oleg Nesterov oleg@redhat.com
[ Upstream commit 7fea700e04bd3f424c2d836e98425782f97b494e ]
kernel_wait4() doesn't sleep and returns -EINTR if there is no eligible child and signal_pending() is true.
That is why zap_pid_ns_processes() clears TIF_SIGPENDING but this is not enough, it should also clear TIF_NOTIFY_SIGNAL to make signal_pending() return false and avoid a busy-wait loop.
Link: https://lkml.kernel.org/r/20240608120616.GB7947@redhat.com Fixes: 12db8b690010 ("entry: Add support for TIF_NOTIFY_SIGNAL") Signed-off-by: Oleg Nesterov oleg@redhat.com Reported-by: Rachel Menge rachelmenge@linux.microsoft.com Closes: https://lore.kernel.org/all/1386cd49-36d0-4a5c-85e9-bc42056a5a38@linux.micro... Reviewed-by: Boqun Feng boqun.feng@gmail.com Tested-by: Wei Fu fuweid89@gmail.com Reviewed-by: Jens Axboe axboe@kernel.dk Cc: Allen Pais apais@linux.microsoft.com Cc: Christian Brauner brauner@kernel.org Cc: Frederic Weisbecker frederic@kernel.org Cc: Joel Fernandes (Google) joel@joelfernandes.org Cc: Joel Granados j.granados@samsung.com Cc: Josh Triplett josh@joshtriplett.org Cc: Lai Jiangshan jiangshanlai@gmail.com Cc: Mateusz Guzik mjguzik@gmail.com Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Cc: Mike Christie michael.christie@oracle.com Cc: Neeraj Upadhyay neeraj.upadhyay@kernel.org Cc: Paul E. McKenney paulmck@kernel.org Cc: Steven Rostedt (Google) rostedt@goodmis.org Cc: Zqiang qiang.zhang1211@gmail.com Cc: Thomas Gleixner tglx@linutronix.de Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/pid_namespace.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 7ade20e952321..415201ca0c7e4 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -218,6 +218,7 @@ void zap_pid_ns_processes(struct pid_namespace *pid_ns) */ do { clear_thread_flag(TIF_SIGPENDING); + clear_thread_flag(TIF_NOTIFY_SIGNAL); rc = kernel_wait4(-1, NULL, __WALL, NULL); } while (rc != -ECHILD);
On 6/19/2024 1:52 PM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
Hello,
On Wed, 19 Jun 2024 14:52:39 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] 93f303762da5 ("Linux 6.9.6-rc1")
Thanks, SJ
[...]
---
ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: debugfs_target_ids_read_before_terminate_race.sh ok 9 selftests: damon: debugfs_target_ids_pid_leak.sh ok 10 selftests: damon: sysfs.sh ok 11 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 12 selftests: damon: sysfs_update_schemes_tried_regions_hang.py ok 13 selftests: damon: sysfs_update_schemes_tried_regions_wss_estimation.py ok 14 selftests: damon: damos_quota.py ok 15 selftests: damon: damos_apply_interval.py ok 16 selftests: damon: reclaim.sh ok 17 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
On Wed, 19 Jun 2024 14:52:39 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.9: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 116 tests: 116 pass, 0 fail
Linux version: 6.9.6-rc1-g93f303762da5 Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
Hi!
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
CIP testing did not find any problems here:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6...
6.6 passes our testing, too:
https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/tree/linux-6...
Tested-by: Pavel Machek (CIP) pavel@denx.de
Best regards, Pavel
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On Wed, Jun 19, 2024 at 02:52:39PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
No regressions found on WSL (x86 and arm64).
Built, booted, and reviewed dmesg.
Thank you. :)
Tested-by: Kelsey Steele kelseysteele@linux.microsoft.com
On Wed, Jun 19, 2024 at 02:52:39PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed the kernel on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
Am 19.06.2024 um 14:52 schrieb Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
Builds, boots and works fine w/o regressions on my 2-socket Ivy Bridge Xeon E5-2697 v2 server.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Beste Grüße, Peter Schneider
On Wed, Jun 19, 2024 at 02:52:39PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
1) The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3 - https://lore.kernel.org/all/CA+G9fYvKmr84WzTArmfaypKM9+=Aw0uXCtuUKHQKFCNMGJy...
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
2) The LTP controllers cgroup_fj_stress test suite causing kernel oops on arm64 Juno-r2 (with the clang-night build toolchain). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000009 Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP pc : xprt_alloc_slot+0x54/0x1c8 lr : xprt_alloc_slot+0x30/0x1c8
Details of crash log: 1) Crash log: ----------- cgroup_fj_stress 1 TINFO: Running: cgroup_fj_stress.sh cpuacct 200 1 none cgroup_fj_stress 1 TINFO: timeout per run is 0h 50m 0s tst_cgroup.c:764: TINFO: Mounted V1 cpuacct CGroup on /scratch/ltp-iiltEE0UOm/cgroup_cpuacct cgroup_fj_stress 1 TINFO: test starts with cgroup version 1 cgroup_fj_stress 1 TINFO: Creating subgroups ... [ 1785.477847] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1785.486682] Mem abort info: [ 1785.489477] ESR = 0x0000000096000004 [ 1785.493232] EC = 0x25: DABT (current EL), IL = 32 bits [ 1785.498555] SET = 0, FnV = 0 [ 1785.501613] EA = 0, S1PTW = 0 [ 1785.504757] FSC = 0x04: level 0 translation fault [ 1785.509643] Data abort info: [ 1785.512526] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1785.518021] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1785.523082] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 .. [ 1786.235715] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 .. [ 1786.286238] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 .. [ 1786.336761] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.345564] Mem abort info: [ 1786.348359] ESR = 0x0000000096000004 [ 1786.352112] EC = 0x25: DABT (current EL), IL = 32 bits [ 1786.357434] SET = 0, FnV = 0 [ 1786.360492] EA = 0, S1PTW = 0 [ 1786.363637] FSC = 0x04: level 0 translation fault [ 1786.368523] Data abort info: [ 1786.371405] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1786.376900] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1786.381960] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1786.387284] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.387293] Insufficient stack space to handle exception! [ 1786.387296] ESR: 0x0000000096000047 -- DABT (current EL) [ 1786.387302] FAR: 0xffff80008399ffe0 [ 1786.387306] Task stack: [0xffff8000839a0000..0xffff8000839a4000] [ 1786.387312] IRQ stack: [0xffff8000837f8000..0xffff8000837fc000] [ 1786.387319] Overflow stack: [0xffff00097ec95320..0xffff00097ec96320] [ 1786.387327] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.9.6-rc1 #1 [ 1786.387338] Hardware name: ARM Juno development board (r2) (DT) [ 1786.387344] pstate: a00003c5 (NzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1786.387355] pc : _prb_read_valid (kernel/printk/printk_ringbuffer.c:2109) [ 1786.387374] lr : prb_read_valid (kernel/printk/printk_ringbuffer.c:2183) [ 1786.387385] sp : ffff80008399ffe0 [ 1786.387390] x29: ffff8000839a0030 x28: ffff000800365f00 x27: ffff800082530008 [ 1786.387407] x26: ffff8000834e33b8 x25: ffff8000839a00b0 x24: 0000000000000001 [ 1786.387423] x23: ffff8000839a00a8 x22: ffff8000830e3e40 x21: 0000000000001e9e [ 1786.387438] x20: 0000000000000000 x19: ffff8000839a01c8 x18: 0000000000000010 [ 1786.387453] x17: 72646461206c6175 x16: 7472697620746120 x15: 65636e6572656665 [ 1786.387468] x14: 726564207265746e x13: 3037303030303030 x12: 3030303030303030 [ 1786.387483] x11: 2073736572646461 x10: ffff800083151ea0 x9 : ffff80008014273c [ 1786.387498] x8 : ffff8000839a0120 x7 : 0000000000000000 x6 : 0000000000000e9f [ 1786.387512] x5 : ffff8000839a00c8 x4 : ffff8000837157c0 x3 : 0000000000000000 [ 1786.387526] x2 : ffff8000839a00b0 x1 : 0000000000000000 x0 : ffff8000830e3f58 [ 1786.387542] Kernel panic - not syncing: kernel stack overflow [ 1786.387549] SMP: stopping secondary CPUs [ 1787.510055] SMP: failed to stop secondary CPUs 0,4 [ 1787.510065] Kernel Offset: disabled [ 1787.510068] CPU features: 0x4,00001061,e0100000,0200421b [ 1787.510076] Memory Limit: none [ 1787.680436] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
2) Kernel oops log: ----------- [ 1094.253182] __secondary_switched+0xb8/0xc0 [ 1094.258306] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000009 [ 1094.267132] Mem abort info: [ 1094.269938] ESR = 0x0000000096000044 [ 1094.273701] EC = 0x25: DABT (current EL), IL = 32 bits [ 1094.279031] SET = 0, FnV = 0 [ 1094.282097] EA = 0, S1PTW = 0 [ 1094.285242] FSC = ranslation fault [ 1094.290136] Data abort info: [ 1094.293019] ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000 [ 1094.298523] CM = 0, WnR = 1, TnD = 0, TagAccess = 0 [ 1094.303592] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1094.308921] user pgtable: 4k bit VAs, pgdp=00000008a2a34000 [ 1094.315383] [0000000000000009] pgd=0000000000000000, p4d=0000000000000000 [ 1094.322211] Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP [ 1094.328489] Modules linked in: btrfs xor xor_neon raid6_pq zstd_compress libcrc38x hdlcd cec drm_dma_helper onboard_usb_hub crct10dif_ce drm_kms_helper fuse drm backlight dm_mod ip_tables x_tables [ 1094.346744] CPU: 1 PID: 161 Comm: systemd-journal Tainted: G W 6.9.6-rc1 #1 [ 1094.355112] Hardware name: ARM Juno development board (r2) (DT) [ 1094.361038] pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1094.368013] pc : xprt_alloc_slot+0x54/0x1c8 [ 1094.372208] lr : xprt_alloc_slot+0x30/0x1c8 [ 1094.376398] sp : ffff800082dc37e0 [ 1094.379713] x29: ffff800082dc37e0 x28: ffff8000814d31c8 x27: 0000000000008080 [ 1094.386868] x26: ffff8000825da000 x25: 0000000000000001 x24: 0000000000440100 [ 1094.394022] x23: ffff000821759300 x22: 0000000000002102 x21: ffff00082d39d000 [ 1094.401176] x20: ffff00082201f800 x19: ffff0008225bf400 x18: 0000000000000000 [ 1094.408329] x17: 0000000000000000 x16: 0000000000000800 x15: 8080008000000000 [ 1094.415483] x14: 0000ff0064656873 x13: ffff800082dc0000 x12: 0000000000000022 [ 1094.422636] x11: dead000000000100 x10: 0000000000000001 x9 : 0000000000000000 [ 1094.429790] x8 : ffff00082d39d0c8 x7 : 0000000000000000 x6 : 0000000000000000 [ 1094.436942] x5 : 0000000000000000 x4 : ffff00097ec4c530 x3 : ffff800082dc3790 [ 1094.444096] x2 : ffff000821759300 x1 : 0000000000000000 x0 : ffff800081583770 [ 1094.451249] Call trace: [ 1094.453694] xprt_alloc_slot+0x54/0x1c8 [ 1094.457536] xprt_reserve+0x6c/0xe8 [ 1094.461029] call_reserve+0x2c/0x40 [ 1094.464522] __rpc_execute+0x124/0x640 [ 1094.468280] rpc_execute+0x100/0x280 [ 1094.471862] rpc_run_task+0x124/0x1e8 [ 1094.475528] rpc_call_sync+0x58/0xc0 [ 1094.479106] nfs3_proc_getattr+0x94/0xf8 [ 1094.483037] __nfs_revalidate_inode+0x13c/0x310 [ 1094.487575] nfs_access_get_cached+0x23c/0x3b8 [ 1094.492024] nfs_do_access+0x74/0x2b8 [ 1094.495689] nfs_permission+0xb8/0x1e0 [ 1094.499441] inode_permission+0xc4/0x170 [ 1094.503371] link_path_walk+0x100/0x3e0 [ 1094.507215] path_lookupat+0x74/0x130 [ 1094.510882] filename_lookup+0xdc/0x1d8 [ 1094.514724] user_path_at_empty+0x58/0x108 [ 1094.518828] do_faccessat+0x178/0x330 [ 1094.522495] __arm64_sys_faccessat+0x30/0x48 [ 1094.526771] invoke_syscall+0x4c/0x118 [ 1094.530528] el0_svc_common+0x8c/0xf0 [ 1094.534197] do_el0_svc+0x28/0x40 [ 1094.537518] el0_svc+0x40/0x88 [ 1094.540576] el0t_64_sync_handler+0x90/0x100 [ 1094.544853] el0t_64_sync+0x190/0x198 [ 1094.548522] Code: d280200b f2fbd5ab 5280044c d1032115 (f9000549) [ 1094.554623] ---[ end trace 0000000000000000 ]--- [ 1094.559268] note: systemd-journal[161] exited with preempt_count 1 [ 1115.569495] rcu: INFO: rcu_preempt self-detected stall on CPU
Links: ------ 1) - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2i6h1Ah6I8C... - https://lkft.validation.linaro.org/scheduler/job/7687060#L23314
2) - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://lkft.validation.linaro.org/scheduler/job/7688690#L16336
Build details: ------- * kernel: 6.9.6-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.9.y * git commit: 93f303762da5a9d9c2c72cac615d4d092ce42b1f * git describe: v6.9.5-282-g93f303762da5
-- Linaro LKFT https://lkft.linaro.org
On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
The LTP controllers cgroup_fj_stress test suite causing kernel oops on arm64 Juno-r2 (with the clang-night build toolchain). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000009 Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP pc : xprt_alloc_slot+0x54/0x1c8 lr : xprt_alloc_slot+0x30/0x1c8
And these are regressions? Any chance to run 'git bisect'?
thanks,
greg k-h
On Thu, 20 Jun 2024 at 17:59, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
The LTP controllers cgroup_fj_stress test suite causing kernel oops on arm64 Juno-r2 (with the clang-night build toolchain). Unable to handle kernel NULL pointer dereference at virtual address 0000000000000009 Internal error: Oops: 0000000096000044 [#1] PREEMPT SMP pc : xprt_alloc_slot+0x54/0x1c8 lr : xprt_alloc_slot+0x30/0x1c8
And these are regressions? Any chance to run 'git bisect'?
it's difficult to reproduce the first one so we haven't been able to bisect it. It seemd like David and Baolin might have an idea what's causing it.
- Naresh
On 20.06.24 15:14, Naresh Kamboju wrote:
On Thu, 20 Jun 2024 at 17:59, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
How is that related to 6.9.6-rc1? That report is from mainline (6.10.rc3).
Can you share a similar kernel dmesg output from the issue on 6.9.6-rc1?
On Thu, 20 Jun 2024 at 19:23, David Hildenbrand david@redhat.com wrote:
On 20.06.24 15:14, Naresh Kamboju wrote:
On Thu, 20 Jun 2024 at 17:59, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
How is that related to 6.9.6-rc1? That report is from mainline (6.10.rc3).
Can you share a similar kernel dmesg output from the issue on 6.9.6-rc1?
I request you to use this link for detailed boot log, test log and crash log. - https://lkft.validation.linaro.org/scheduler/job/7687060#L23314
Few more logs related to build artifacts links provided in the original email thread and bottom of this email.
crash log: ---
[ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033] [ 0.000000] Linux version 6.9.6-rc1 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.2.0-12) 13.2.0, GNU ld (GNU Binutils for Debian) 2.42) #1 SMP PREEMPT @1718817000 ... [ 1786.336761] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.345564] Mem abort info: [ 1786.348359] ESR = 0x0000000096000004 [ 1786.352112] EC = 0x25: DABT (current EL), IL = 32 bits [ 1786.357434] SET = 0, FnV = 0 [ 1786.360492] EA = 0, S1PTW = 0 [ 1786.363637] FSC = 0x04: level 0 translation fault [ 1786.368523] Data abort info: [ 1786.371405] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1786.376900] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1786.381960] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1786.387284] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.387293] Insufficient stack space to handle exception! [ 1786.387296] ESR: 0x0000000096000047 -- DABT (current EL) [ 1786.387302] FAR: 0xffff80008399ffe0 [ 1786.387306] Task stack: [0xffff8000839a0000..0xffff8000839a4000] [ 1786.387312] IRQ stack: [0xffff8000837f8000..0xffff8000837fc000] [ 1786.387319] Overflow stack: [0xffff00097ec95320..0xffff00097ec96320] [ 1786.387327] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.9.6-rc1 #1 [ 1786.387338] Hardware name: ARM Juno development board (r2) (DT) [ 1786.387344] pstate: a00003c5 (NzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1786.387355] pc : _prb_read_valid (kernel/printk/printk_ringbuffer.c:2109) [ 1786.387374] lr : prb_read_valid (kernel/printk/printk_ringbuffer.c:2183) [ 1786.387385] sp : ffff80008399ffe0 [ 1786.387390] x29: ffff8000839a0030 x28: ffff000800365f00 x27: ffff800082530008 [ 1786.387407] x26: ffff8000834e33b8 x25: ffff8000839a00b0 x24: 0000000000000001 [ 1786.387423] x23: ffff8000839a00a8 x22: ffff8000830e3e40 x21: 0000000000001e9e [ 1786.387438] x20: 0000000000000000 x19: ffff8000839a01c8 x18: 0000000000000010 [ 1786.387453] x17: 72646461206c6175 x16: 7472697620746120 x15: 65636e6572656665 [ 1786.387468] x14: 726564207265746e x13: 3037303030303030 x12: 3030303030303030 [ 1786.387483] x11: 2073736572646461 x10: ffff800083151ea0 x9 : ffff80008014273c [ 1786.387498] x8 : ffff8000839a0120 x7 : 0000000000000000 x6 : 0000000000000e9f [ 1786.387512] x5 : ffff8000839a00c8 x4 : ffff8000837157c0 x3 : 0000000000000000 [ 1786.387526] x2 : ffff8000839a00b0 x1 : 0000000000000000 x0 : ffff8000830e3f58 [ 1786.387542] Kernel panic - not syncing: kernel stack overflow [ 1786.387549] SMP: stopping secondary CPUs [ 1787.510055] SMP: failed to stop secondary CPUs 0,4 [ 1787.510065] Kernel Offset: disabled [ 1787.510068] CPU features: 0x4,00001061,e0100000,0200421b [ 1787.510076] Memory Limit: none [ 1787.680436] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
1) - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.9.y/build/v6.9.5-... - https://tuxapi.tuxsuite.com/v1/groups/linaro/projects/lkft/tests/2i6h1Ah6I8C... - https://lkft.validation.linaro.org/scheduler/job/7687060#L23314
- Naresh
-- Cheers,
David / dhildenb
On 20.06.24 16:02, Naresh Kamboju wrote:
On Thu, 20 Jun 2024 at 19:23, David Hildenbrand david@redhat.com wrote:
On 20.06.24 15:14, Naresh Kamboju wrote:
On Thu, 20 Jun 2024 at 17:59, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3 - https://lore.kernel.org/all/CA+G9fYvKmr84WzTArmfaypKM9+=Aw0uXCtuUKHQKFCNMGJy...
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
How is that related to 6.9.6-rc1? That report is from mainline (6.10.rc3).
Can you share a similar kernel dmesg output from the issue on 6.9.6-rc1?
I request you to use this link for detailed boot log, test log and crash log.
Few more logs related to build artifacts links provided in the original email thread and bottom of this email.
crash log:
Thanks, so this is something different than the
"BUG: Bad page map in process fork13 BUG: Bad rss-counter state mm:"
stuff on mainline you referenced.
Looks like some recursive exception until we exhausted the stack.
Trying to connect the dots here, can you enlighten me how this is related to the fork13 mainline report?
[ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033] [ 0.000000] Linux version 6.9.6-rc1 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.2.0-12) 13.2.0, GNU ld (GNU Binutils for Debian) 2.42) #1 SMP PREEMPT @1718817000 ... [ 1786.336761] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.345564] Mem abort info: [ 1786.348359] ESR = 0x0000000096000004 [ 1786.352112] EC = 0x25: DABT (current EL), IL = 32 bits [ 1786.357434] SET = 0, FnV = 0 [ 1786.360492] EA = 0, S1PTW = 0 [ 1786.363637] FSC = 0x04: level 0 translation fault [ 1786.368523] Data abort info: [ 1786.371405] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1786.376900] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1786.381960] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1786.387284] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.387293] Insufficient stack space to handle exception! [ 1786.387296] ESR: 0x0000000096000047 -- DABT (current EL) [ 1786.387302] FAR: 0xffff80008399ffe0 [ 1786.387306] Task stack: [0xffff8000839a0000..0xffff8000839a4000] [ 1786.387312] IRQ stack: [0xffff8000837f8000..0xffff8000837fc000] [ 1786.387319] Overflow stack: [0xffff00097ec95320..0xffff00097ec96320] [ 1786.387327] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.9.6-rc1 #1 [ 1786.387338] Hardware name: ARM Juno development board (r2) (DT) [ 1786.387344] pstate: a00003c5 (NzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1786.387355] pc : _prb_read_valid (kernel/printk/printk_ringbuffer.c:2109) [ 1786.387374] lr : prb_read_valid (kernel/printk/printk_ringbuffer.c:2183) [ 1786.387385] sp : ffff80008399ffe0 [ 1786.387390] x29: ffff8000839a0030 x28: ffff000800365f00 x27: ffff800082530008 [ 1786.387407] x26: ffff8000834e33b8 x25: ffff8000839a00b0 x24: 0000000000000001 [ 1786.387423] x23: ffff8000839a00a8 x22: ffff8000830e3e40 x21: 0000000000001e9e [ 1786.387438] x20: 0000000000000000 x19: ffff8000839a01c8 x18: 0000000000000010 [ 1786.387453] x17: 72646461206c6175 x16: 7472697620746120 x15: 65636e6572656665 [ 1786.387468] x14: 726564207265746e x13: 3037303030303030 x12: 3030303030303030 [ 1786.387483] x11: 2073736572646461 x10: ffff800083151ea0 x9 : ffff80008014273c [ 1786.387498] x8 : ffff8000839a0120 x7 : 0000000000000000 x6 : 0000000000000e9f [ 1786.387512] x5 : ffff8000839a00c8 x4 : ffff8000837157c0 x3 : 0000000000000000 [ 1786.387526] x2 : ffff8000839a00b0 x1 : 0000000000000000 x0 : ffff8000830e3f58 [ 1786.387542] Kernel panic - not syncing: kernel stack overflow [ 1786.387549] SMP: stopping secondary CPUs [ 1787.510055] SMP: failed to stop secondary CPUs 0,4 [ 1787.510065] Kernel Offset: disabled [ 1787.510068] CPU features: 0x4,00001061,e0100000,0200421b [ 1787.510076] Memory Limit: none [ 1787.680436] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
On Thu, 20 Jun 2024 at 19:50, David Hildenbrand david@redhat.com wrote:
On 20.06.24 16:02, Naresh Kamboju wrote:
On Thu, 20 Jun 2024 at 19:23, David Hildenbrand david@redhat.com wrote:
On 20.06.24 15:14, Naresh Kamboju wrote:
On Thu, 20 Jun 2024 at 17:59, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Jun 20, 2024 at 05:21:09PM +0530, Naresh Kamboju wrote:
On Wed, 19 Jun 2024 at 18:41, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote: > > This is the start of the stable review cycle for the 6.9.6 release. > There are 281 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... > or in the git tree and branch at: > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y > and the diffstat can be found below. > > thanks, > > greg k-h
There are two major issues on arm64 Juno-r2 on Linux stable-rc 6.9.6-rc1
Reported-by: Linux Kernel Functional Testing lkft@linaro.org
The LTP controllers cgroup_fj_stress test cases causing kernel crash on arm64 Juno-r2 with compat mode testing with stable-rc 6.9 kernel.
In the recent past I have reported this issues on Linux mainline.
LTP: fork13: kernel panic on rk3399-rock-pi-4 running mainline 6.10.rc3 - https://lore.kernel.org/all/CA+G9fYvKmr84WzTArmfaypKM9+=Aw0uXCtuUKHQKFCNMGJy...
it goes like this, Unable to handle kernel NULL pointer dereference at virtual address ... Insufficient stack space to handle exception! end Kernel panic - not syncing: kernel stack overflow
How is that related to 6.9.6-rc1? That report is from mainline (6.10.rc3).
Can you share a similar kernel dmesg output from the issue on 6.9.6-rc1?
I request you to use this link for detailed boot log, test log and crash log.
Few more logs related to build artifacts links provided in the original email thread and bottom of this email.
crash log:
Thanks for investigating this crash report.
Thanks, so this is something different than the
"BUG: Bad page map in process fork13 BUG: Bad rss-counter state mm:"
stuff on mainline you referenced.
Looks like some recursive exception until we exhausted the stack.
You are right ! I see only one common case is, exhaust the stack.
Trying to connect the dots here, can you enlighten me how this is related to the fork13 mainline report?
I am not sure about the relation between these two reports. But as a common practice I have shared that report information.
[ 0.000000] Booting Linux on physical CPU 0x0000000100 [0x410fd033] [ 0.000000] Linux version 6.9.6-rc1 (tuxmake@tuxmake) (aarch64-linux-gnu-gcc (Debian 13.2.0-12) 13.2.0, GNU ld (GNU Binutils for Debian) 2.42) #1 SMP PREEMPT @1718817000 ... [ 1786.336761] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.345564] Mem abort info: [ 1786.348359] ESR = 0x0000000096000004 [ 1786.352112] EC = 0x25: DABT (current EL), IL = 32 bits [ 1786.357434] SET = 0, FnV = 0 [ 1786.360492] EA = 0, S1PTW = 0 [ 1786.363637] FSC = 0x04: level 0 translation fault [ 1786.368523] Data abort info: [ 1786.371405] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000 [ 1786.376900] CM = 0, WnR = 0, TnD = 0, TagAccess = 0 [ 1786.381960] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0 [ 1786.387284] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000070 [ 1786.387293] Insufficient stack space to handle exception! [ 1786.387296] ESR: 0x0000000096000047 -- DABT (current EL) [ 1786.387302] FAR: 0xffff80008399ffe0 [ 1786.387306] Task stack: [0xffff8000839a0000..0xffff8000839a4000] [ 1786.387312] IRQ stack: [0xffff8000837f8000..0xffff8000837fc000] [ 1786.387319] Overflow stack: [0xffff00097ec95320..0xffff00097ec96320] [ 1786.387327] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 6.9.6-rc1 #1 [ 1786.387338] Hardware name: ARM Juno development board (r2) (DT) [ 1786.387344] pstate: a00003c5 (NzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 1786.387355] pc : _prb_read_valid (kernel/printk/printk_ringbuffer.c:2109) [ 1786.387374] lr : prb_read_valid (kernel/printk/printk_ringbuffer.c:2183) [ 1786.387385] sp : ffff80008399ffe0 [ 1786.387390] x29: ffff8000839a0030 x28: ffff000800365f00 x27: ffff800082530008 [ 1786.387407] x26: ffff8000834e33b8 x25: ffff8000839a00b0 x24: 0000000000000001 [ 1786.387423] x23: ffff8000839a00a8 x22: ffff8000830e3e40 x21: 0000000000001e9e [ 1786.387438] x20: 0000000000000000 x19: ffff8000839a01c8 x18: 0000000000000010 [ 1786.387453] x17: 72646461206c6175 x16: 7472697620746120 x15: 65636e6572656665 [ 1786.387468] x14: 726564207265746e x13: 3037303030303030 x12: 3030303030303030 [ 1786.387483] x11: 2073736572646461 x10: ffff800083151ea0 x9 : ffff80008014273c [ 1786.387498] x8 : ffff8000839a0120 x7 : 0000000000000000 x6 : 0000000000000e9f [ 1786.387512] x5 : ffff8000839a00c8 x4 : ffff8000837157c0 x3 : 0000000000000000 [ 1786.387526] x2 : ffff8000839a00b0 x1 : 0000000000000000 x0 : ffff8000830e3f58 [ 1786.387542] Kernel panic - not syncing: kernel stack overflow [ 1786.387549] SMP: stopping secondary CPUs [ 1787.510055] SMP: failed to stop secondary CPUs 0,4 [ 1787.510065] Kernel Offset: disabled [ 1787.510068] CPU features: 0x4,00001061,e0100000,0200421b [ 1787.510076] Memory Limit: none [ 1787.680436] ---[ end Kernel panic - not syncing: kernel stack overflow ]---
-- Cheers,
David / dhildenb
- Naresh
Trying to connect the dots here, can you enlighten me how this is related to the fork13 mainline report?
I am not sure about the relation between these two reports. But as a common practice I have shared that report information.
Yes, I was just briefly concerned that we are seeing the same issue as we saw once on 6.10-rc3 also on a 6.9 kernel -- also because the patch list don't contain that much MM stuff.
On 6/19/24 5:52 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On 6/19/24 06:52, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.9.6 release. There are 281 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Fri, 21 Jun 2024 12:55:11 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.9.6-rc1.g... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.9.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
linux-stable-mirror@lists.linaro.org