This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.10.7-rc1
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Input: MT - limit max slots
Namjae Jeon linkinjeon@kernel.org ksmbd: fix race condition between destroy_previous_session() and smb2 operations()
Yonghong Song yonghong.song@linux.dev selftests/bpf: Add a test to verify previous stacksafe() fix
Boyuan Zhang boyuan.zhang@amd.com drm/amdgpu/vcn: not pause dpg for unified queue
Boyuan Zhang boyuan.zhang@amd.com drm/amdgpu/vcn: identify unified queue in sw init
Christian Brauner brauner@kernel.org Revert "pidfd: prevent creation of pidfds for kthreads"
Matthew Brost matthew.brost@intel.com drm/xe: Do not dereference NULL job->fence in trace points
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: join: check re-using ID of closed subflow
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: join: validate fullmesh endp on 1st sf
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: avoid possible UaF when selecting endp
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: fullmesh: select the right ID later
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only in-kernel cannot have entries with ID 0
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDR
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only decrement add_addr_accepted for MPJ req
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only mark 'subflow' endp as available
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: remove mptcp_pm_remove_subflow()
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused flushed subflows
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused removed subflows
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused removed ADD_ADDR
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org thermal: of: Fix OF node leak in of_thermal_zone_find() error paths
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org thermal: of: Fix OF node leak in thermal_of_zone_register()
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org thermal: of: Fix OF node leak in thermal_of_trips_init() error path
Dave Airlie airlied@redhat.com nouveau/firmware: use dma non-coherent allocator
Peng Fan peng.fan@nxp.com pmdomain: imx: wait SSAR when i.MX93 power domain on
Alexander Stein alexander.stein@ew.tq-group.com pmdomain: imx: scu-pd: Remove duplicated clocks
Steve French stfrench@microsoft.com smb3: fix broken cached reads when posix locks
Ben Whitten ben.whitten@gmail.com mmc: dw_mmc: allow biu and ciu clocks to defer
Mengqi Zhang mengqi.zhang@mediatek.com mmc: mtk-sd: receive cmd8 data when hs400 tuning fail
Waiman Long longman@redhat.com cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if cpus.exclusive not set
Chen Ridong chenridong@huawei.com cgroup/cpuset: fix panic caused by partcmd_update
Marc Zyngier maz@kernel.org KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3
Zenghui Yu yuzenghui@huawei.com KVM: arm64: vgic-debug: Don't put unmarked LPIs
Nikolay Kuratov kniv@yandex-team.ru cxgb4: add forgotten u64 ivlan cast before shift
Michael Ellerman mpe@ellerman.id.au ata: pata_macio: Fix DMA table overflow
Werner Sembach wse@tuxedocomputers.com Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination
Werner Sembach wse@tuxedocomputers.com Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3
Nicolin Chen nicolinc@nvidia.com iommufd/device: Fix hwpt at err_unresv in iommufd_device_do_replace()
Jason Gerecke jason.gerecke@wacom.com HID: wacom: Defer calculation of resolution until resolution_code is known
Jiaxun Yang jiaxun.yang@flygoat.com MIPS: Loongson64: Set timer mode in cpu-probe
Martin Whitaker foss@martin-whitaker.me.uk net: dsa: microchip: fix PTP config failure when using multiple ports
Mengyuan Lou mengyuanlou@net-swift.com net: ngbe: Fix phy mode set to external phy
Harald Freudenberger freude@linux.ibm.com s390/ap: Refine AP bus bindings complete processing
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com platform/x86: ISST: Fix return value on last invalid resource
Hans de Goede hdegoede@redhat.com platform/x86: dell-uart-backlight: Use acpi_video_get_backlight_type()
Hans de Goede hdegoede@redhat.com ACPI: video: Add backlight=native quirk for Dell OptiPlex 7760 AIO
Hans de Goede hdegoede@redhat.com ACPI: video: Add Dell UART backlight controller detection
Alex Deucher alexander.deucher@amd.com drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
Candice Li candice.li@amd.com drm/amdgpu: Validate TA binary size
Namjae Jeon linkinjeon@kernel.org ksmbd: the buffer of smb2 query dir response has at least 1 byte
Chaotian Jing chaotian.jing@mediatek.com scsi: core: Fix the return value of scsi_logical_block_count()
Griffin Kroah-Hartman griffin@kroah.com Bluetooth: MGMT: Add error handling to pair_device()
Ming Lei ming.lei@redhat.com nvme: move stopping keep-alive into nvme_uninit_ctrl()
Paulo Alcantara pc@manguebit.com smb: client: ignore unhandled reparse tags
Alexander Gordeev agordeev@linux.ibm.com s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
Alexander Gordeev agordeev@linux.ibm.com s390/boot: Avoid possible physmem_info segment corruption
Yang Ruibin 11162571@vivo.com thermal/debugfs: Fix the NULL vs IS_ERR() confusion in debugfs_create_dir()
Matthew Brost matthew.brost@intel.com drm/xe: Free job before xe_exec_queue_put
Thomas Hellström thomas.hellstrom@linux.intel.com drm/xe: Don't initialize fences at xe_sched_job_create()
Thomas Hellström thomas.hellstrom@linux.intel.com drm/xe: Split lrc seqno fence creation up
Matthew Brost matthew.brost@intel.com drm/xe: Decouple job seqno and lrc seqno
Rodrigo Vivi rodrigo.vivi@intel.com drm/xe: Relax runtime pm protection during execution
Stuart Summers stuart.summers@intel.com drm/xe: Fix missing workqueue destroy in xe_gt_pagefault
Jens Axboe axboe@kernel.dk io_uring/kbuf: sanitize peek buffer setup
Dan Carpenter dan.carpenter@linaro.org mmc: mmc_test: Fix NULL dereference on allocation failure
Matthew Brost matthew.brost@intel.com drm/xe: Fix tile fini sequence
Matthew Auld matthew.auld@intel.com drm/xe: reset mmio mappings with devm
Matthew Auld matthew.auld@intel.com drm/xe/mmio: move mmio_fini over to devm
Lucas De Marchi lucas.demarchi@intel.com drm/xe: Fix opregion leak
Matthew Auld matthew.auld@intel.com drm/xe/display: stop calling domains_driver_remove twice
Suraj Kandpal suraj.kandpal@intel.com drm/i915/hdcp: Use correct cp_irq_count
Vignesh Raghavendra vigneshr@ti.com spi: spi-cadence-quadspi: Fix OSPI NOR failures during system resume
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm: fix the highest_bank_bit for sc7180
Tejun Heo tj@kernel.org workqueue: Fix spruious data race in __flush_work()
Will Deacon will@kernel.org workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask()
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: take plane rotation into account for wide planes
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: relax YUV requirements
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: limit QCM2290 to RGB formats only
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: cleanup FB if dpu_format_populate_layout fails
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dp: reset the link phy params before link training
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dpu: move dpu_encoder's connector assignment to atomic_enable()
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dp: fix the max supported bpp logic
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: don't play tricks with debug macros
Alexandra Winter wintera@linux.ibm.com s390/iucv: Fix vargs handling in iucv_alloc_device()
Menglong Dong menglong8.dong@gmail.com net: ovs: fix ovs_drop_reasons error
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Fix dangling multicast addresses
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Always disable promiscuous mode
Bharat Bhushan bbhushan2@marvell.com octeontx2-af: Fix CPT AF register offset calculation
Pablo Neira Ayuso pablo@netfilter.org netfilter: flowtable: validate vlan header
Somnath Kotur somnath.kotur@broadcom.com bnxt_en: Fix double DMA unmapping for XDP_REDIRECT
Eric Dumazet edumazet@google.com ipv6: prevent possible UAF in ip6_xmit()
Eric Dumazet edumazet@google.com ipv6: fix possible UAF in ip6_finish_output2()
Eric Dumazet edumazet@google.com ipv6: prevent UAF in ip6_send_skb()
Ido Schimmel idosch@nvidia.com selftests: mlxsw: ethtool_lanes: Source ethtool lib from correct path
Felix Fietkau nbd@nbd.name udp: fix receiving fraglist GSO packets
Stephen Hemminger stephen@networkplumber.org netem: fix return value if duplicate enqueue fails
Joseph Huang Joseph.Huang@garmin.com net: dsa: mv88e6xxx: Fix out-of-bound access
Paolo Abeni pabeni@redhat.com igb: cope with large MAX_SKB_FRAGS
Dan Carpenter dan.carpenter@linaro.org dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()
Michal Swiatkowski michal.swiatkowski@linux.intel.com ice: use internal pf id instead of function number
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix truesize operations for PAGE_SIZE >= 8192
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix ICE_LAST_OFFSET formula
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix page reuse when PAGE_SIZE is over 8k
Nikolay Aleksandrov razor@blackwall.org bonding: fix xfrm state handling when clearing active slave
Nikolay Aleksandrov razor@blackwall.org bonding: fix xfrm real_dev null pointer dereference
Nikolay Aleksandrov razor@blackwall.org bonding: fix null pointer deref in bond_ipsec_offload_ok
Nikolay Aleksandrov razor@blackwall.org bonding: fix bond_ipsec_offload_ok return type
Thomas Bogendoerfer tbogendoerfer@suse.de ip6_tunnel: Fix broken GRO
Sebastian Andrzej Siewior bigeasy@linutronix.de netfilter: nft_counter: Synchronize nft_counter_reset() against reader.
Sebastian Andrzej Siewior bigeasy@linutronix.de netfilter: nft_counter: Disable BH in nft_counter_offload_stats().
Kuniyuki Iwashima kuniyu@amazon.com kcm: Serialise kcm_sendmsg() for the same socket.
Jeremy Kerr jk@codeconstruct.com.au net: mctp: test: Use correct skb for route input check
Florian Westphal fw@strlen.de tcp: prevent concurrent execution of tcp_sk_exit_batch
Hangbin Liu liuhangbin@gmail.com selftests: udpgro: no need to load xdp for gro
Hangbin Liu liuhangbin@gmail.com selftests: udpgro: report error when receive failed
Simon Horman horms@kernel.org tc-testing: don't access non-existent variable on exception
Patrisious Haddad phaddad@nvidia.com net/mlx5: Fix IPsec RoCE MPV trace call
Carolina Jubran cjubran@nvidia.com net/mlx5e: XPS, Fix oversight of Multi-PF Netdev changes
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: serialize access to the injection/extraction groups
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: fix QoS class for injected packets with "ocelot-8021q"
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: SMP: Fix assumption of Central always being Initiator
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_core: Fix LE quote calculation
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: HCI: Invert LE State quirk to be opt-out rather then opt-in
Masahiro Yamada masahiroy@kernel.org kbuild: avoid scripts/kallsyms parsing /dev/null
Masahiro Yamada masahiroy@kernel.org kbuild: merge temporary vmlinux for BTF and kallsyms
Alexandre Courbot gnurou@gmail.com Makefile: add $(srctree) to dependency of compile_commands.json target
Takashi Iwai tiwai@suse.de ALSA: hda/tas2781: Use correct endian conversion
Maximilian Luz luzmaximilian@gmail.com platform/surface: aggregator: Fix warning when controller is destroyed in probe
Baochen Qiang quic_bqiang@quicinc.com wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850
Mikulas Patocka mpatocka@redhat.com dm suspend: return -ERESTARTSYS instead of -EINTR
Su Hui suhui@nfschina.com smb/client: avoid possible NULL dereference in cifs_free_subrequest()
David Howells dhowells@redhat.com cifs: Add a tracepoint to track credits involved in R/W requests
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Use governor_data to reduce overhead
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Add .manage() callback
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Split bang_bang_control()
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Drop unnecessary cooling device target state checks
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Don't register panel_power_savings on OLED panels
Li Lingfeng lilingfeng3@huawei.com block: Fix lockdep warning in blk_mq_mark_tag_wait
Samuel Holland samuel.holland@sifive.com arm64: Fix KASAN random tag seed initialization
Ryo Takakura takakura@valinux.co.jp printk/panic: Allow cpu backtraces to be written into ringbuffer during panic
Nysal Jan K.A nysal@linux.ibm.com powerpc/topology: Check if a core is online
Nysal Jan K.A nysal@linux.ibm.com cpu/SMT: Enable SMT only if a core is online
Olivier Langlois olivier@trillion01.com io_uring/napi: check napi_enabled in io_napi_add() before proceeding
Pavel Begunkov asml.silence@gmail.com io_uring/napi: use ktime in busy polling
Thorsten Blum thorsten.blum@toblux.com io_uring/napi: Remove unnecessary s64 cast
Eric Farman farman@linux.ibm.com s390/dasd: Remove DMA alignment
Masahiro Yamada masahiroy@kernel.org rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Masahiro Yamada masahiroy@kernel.org rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Miguel Ojeda ojeda@kernel.org rust: work around `bindgen` 0.69.0 issue
Maíra Canal mcanal@igalia.com drm/v3d: Fix out-of-bounds read in `v3d_csd_job_run()`
Parsa Poorshikhian parsa.poorsh@gmail.com ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7
Asmaa Mnebhi asmaa@nvidia.com gpio: mlxbf3: Support shutdown() function
Barak Biber bbiber@nvidia.com iommu: Restore lost return in iommu_report_device_fault()
Song Liu song@kernel.org kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
Song Liu song@kernel.org kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols
Jann Horn jannh@google.com kallsyms: get rid of code for absolute kallsyms
Masahiro Yamada masahiroy@kernel.org kbuild: remove PROVIDE() for kallsyms symbols
Masahiro Yamada masahiroy@kernel.org kbuild: refactor variables in scripts/link-vmlinux.sh
Jie Wang wangjie125@huawei.com net: hns3: fix a deadlock problem when config TC during resetting
Peiyang Wang wangpeiyang1@huawei.com net: hns3: use the user's cfg after reset
Jie Wang wangjie125@huawei.com net: hns3: fix wrong use of semaphore up
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: net: lib: kill PIDs before del netns
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: net: lib: ignore possible errors
Cong Wang cong.wang@bytedance.com vsock: fix recursive ->recvmsg calls
Abhinav Jain jain.abhinav177@gmail.com selftest: af_unix: Fix kselftest compilation warnings
Phil Sutter phil@nwl.cc netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
Phil Sutter phil@nwl.cc netfilter: nf_tables: Introduce nf_tables_getobj_single
Phil Sutter phil@nwl.cc netfilter: nf_tables: Audit log dump reset after the fact
Florian Westphal fw@strlen.de netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
Donald Hunter donald.hunter@gmail.com netfilter: flowtable: initialise extack before use
Donald Hunter donald.hunter@gmail.com netfilter: nfnetlink: Initialise extack before use in ACKs
Tom Hughes tom@compton.nu netfilter: allow ipv6 fragments to arrive on different devices
Subash Abhinov Kasiviswanathan quic_subashab@quicinc.com tcp: Update window clamping condition
Eugene Syromiatnikov esyr@redhat.com mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size
David Thompson davthompson@nvidia.com mlxbf_gige: disable RX filters until RX path initialized
Zheng Zhang everything411@qq.com net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: check busy flag in MDIO operations
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: pass value in phy_write operation
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: fix port MAC configuration in full duplex mode
Radhey Shyam Pandey radhey.shyam.pandey@amd.com net: axienet: Fix register defines comment description
Dan Carpenter dan.carpenter@linaro.org atm: idt77252: prevent use after free in dequeue_rx()
Cosmin Ratiu cratiu@nvidia.com net/mlx5e: Correctly report errors for ethtool rx flows
Dragos Tatulea dtatulea@nvidia.com net/mlx5e: Take state lock during tx timeout reporter
Tariq Toukan tariqt@nvidia.com net/mlx5: SD, Do not query MPIR register if no sd_group
Eric Dumazet edumazet@google.com gtp: pull network headers in gtp_dev_xmit()
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix qbv tx latency by setting gtxoffset
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix reset adapter logics when tx mode change
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix qbv_config_change_errors logics
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer
Naohiro Aota naohiro.aota@wdc.com btrfs: fix invalid mapping of extent xarray state
Dominique Martinet asmadeus@codewreck.org 9p: Fix DIO read through netfs
Yonghong Song yonghong.song@linux.dev bpf: Fix a kernel verifier crash in stacksafe()
Leon Hwang leon.hwang@linux.dev bpf: Fix updating attached freplace prog in prog_array map
yangerkun yangerkun@huawei.com libfs: fix infinite directory reads for offset dir
Omar Sandoval osandov@fb.com filelock: fix name of file_lease slab cache
Matthew Wilcox (Oracle) willy@infradead.org netfs: Fault in smaller chunks for non-large folio mappings
Claudio Imbrenda imbrenda@linux.ibm.com s390/uv: Panic for set and remove shared access UVC errors
Christian Brauner brauner@kernel.org pidfd: prevent creation of pidfds for kthreads
David (Ming Qiang) Wu David.Wu3@amd.com drm/amd/amdgpu: command submission parser for JPEG
Alex Deucher alexander.deucher@amd.com drm/amdgpu/jpeg4: properly set atomics vmid field
Alex Deucher alexander.deucher@amd.com drm/amdgpu/jpeg2: properly set atomics vmid field
Melissa Wen mwen@igalia.com drm/amd/display: fix cursor offset on rotation 180
Loan Chen lo-an.chen@amd.com drm/amd/display: Enable otg synchronization logic for DCN321
Hamza Mahfooz hamza.mahfooz@amd.com drm/amd/display: fix s2idle entry for DCN3.5+
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Adjust cursor position
Al Viro viro@zeniv.linux.org.uk memcg_write_event_control(): fix a user-triggerable oops
Bas Nieuwenhuizen bas@basnieuwenhuizen.nl drm/amdgpu: Actually check flags for all context ops.
Qu Wenruo wqu@suse.com btrfs: only enable extent map shrinker for DEBUG builds
Qu Wenruo wqu@suse.com btrfs: tree-checker: add dev extent item checks
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: properly take lock to read/update block group's zoned variables
Filipe Manana fdmanana@suse.com btrfs: only run the extent map shrinker from kswapd tasks
Josef Bacik josef@toxicpanda.com btrfs: check delayed refs when we're checking if a ref exists
Filipe Manana fdmanana@suse.com btrfs: send: allow cloning non-aligned extent if it ends at i_size
Qu Wenruo wqu@suse.com btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
Zi Yan ziy@nvidia.com mm/numa: no task_numa_fault() call if PTE is changed
Hailong Liu hailong.liu@oppo.com mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Zi Yan ziy@nvidia.com mm/numa: no task_numa_fault() call if PMD is changed
Suren Baghdasaryan surenb@google.com alloc_tag: introduce clear_page_tag_ref() helper function
Muhammad Usama Anjum usama.anjum@collabora.com selftests: memfd_secret: don't build memfd_secret test on unsupported arches
Waiman Long longman@redhat.com mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
Suren Baghdasaryan surenb@google.com alloc_tag: mark pages reserved during CMA activation as not tagged
Zhen Lei thunder.leizhen@huawei.com selinux: add the processing of the failure of avc_add_xperms_decision()
Zhen Lei thunder.leizhen@huawei.com selinux: fix potential counting error in avc_add_xperms_decision()
Max Kellermann max.kellermann@ionos.com fs/netfs/fscache_cookie: add missing "n_accesses" check
Janne Grunau j@jannau.net wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion
Long Li longli@microsoft.com net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
Hans de Goede hdegoede@redhat.com media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
Haiyang Zhang haiyangz@microsoft.com net: mana: Fix RX buf alloc_size alignment and atomic op panic
Yu Kuai yukuai3@huawei.com md/raid1: Fix data corruption for degraded array with slow disk
David Hildenbrand david@redhat.com mm/hugetlb: fix hugetlb vs. core-mm PT locking
Kirill A. Shutemov kirill.shutemov@linux.intel.com mm: fix endless reclaim on machines with unaccepted memory
Dan Carpenter dan.carpenter@linaro.org rtla/osnoise: Prevent NULL dereference in error handling
Pedro Falcato pedro.falcato@gmail.com mseal: fix is_madv_discard()
Kyle Huey me@kylehuey.com perf/bpf: Don't call bpf_overflow_handler() for tracing events
Steven Rostedt rostedt@goodmis.org tracing: Return from tracing_buffers_read() if the file has been closed
Andi Shyti andi.shyti@kernel.org i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
Al Viro viro@zeniv.linux.org.uk fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Zhihao Cheng chengzhihao1@huawei.com vfs: Don't evict inode under the inode lru traversing context
Mikulas Patocka mpatocka@redhat.com dm persistent data: fix memory allocation failure
Khazhismel Kumykov khazhy@google.com dm resume: don't return EINVAL when signalled
Haibo Xu haibo1.xu@intel.com arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: EC: Evaluate _REG outside the EC scope more carefully
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPICA: Add a depth argument to acpi_execute_reg_methods()
Breno Leitao leitao@debian.org i2c: tegra: Do not mark ACPI devices as irq safe
Steve French stfrench@microsoft.com smb3: fix lock breakage for cached writes
Celeste Liu coelacanthushex@gmail.com riscv: entry: always initialize regs->a0 to -ENOSYS
Nam Cao namcao@linutronix.de riscv: change XIP's kernel_map.size to be size of the entire kernel
David Gstir david@sigma-star.at KEYS: trusted: dcp: fix leak of blob encryption key
David Gstir david@sigma-star.at KEYS: trusted: fix DCP blob payload length assignment
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Call __thermal_cdev_update() directly
Michael Mueller mimu@linux.ibm.com KVM: s390: fix validity interception issue when gisa is switched off
Stefan Haberland sth@linux.ibm.com s390/dasd: fix error recovery leading to data corruption on ESE devices
Takashi Iwai tiwai@suse.de ALSA: timer: Relax start tick time check for slave timer elements
Baojun Xu baojun.xu@ti.com ALSA: hda/tas2781: fix wrong calibrated data order
Mika Westerberg mika.westerberg@linux.intel.com thunderbolt: Mark XDomain as unplugged when router is removed
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
Marc Zyngier maz@kernel.org usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()
Hans de Goede hdegoede@redhat.com usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]
Juan José Arboleda soyjuanarbol@gmail.com ALSA: usb-audio: Support Yamaha P-125 quirk entry
Lianqin Hu hulianqin@vivo.com ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET
Eli Billauer eli.billauer@gmail.com char: xillybus: Check USB endpoints when probing device
Eli Billauer eli.billauer@gmail.com char: xillybus: Refine workqueue handling
Eli Billauer eli.billauer@gmail.com char: xillybus: Don't destroy workqueue from work item running on it
Jann Horn jannh@google.com fuse: Initialize beyond-EOF page contents before setting uptodate
David Howells dhowells@redhat.com netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
Paul Moore paul@paul-moore.com selinux: revert our use of vma_is_initial_heap()
Xu Yang xu.yang_2@nxp.com Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"
Griffin Kroah-Hartman griffin@kroah.com Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"
Griffin Kroah-Hartman griffin@kroah.com Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
Rafael J. Wysocki rafael.j.wysocki@intel.com Revert "ACPI: EC: Evaluate orphan _REG under EC device"
Mathieu Othacehe m.othacehe@gmail.com tty: atmel_serial: use the correct RTS flag.
Peng Fan peng.fan@nxp.com tty: serial: fsl_lpuart: mark last busy before uart_add_one_port
Masahiro Yamada masahiroy@kernel.org tty: vt: conmakehash: remove non-portable code printing comment header
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 3 +- Makefile | 6 +- arch/arm64/kernel/acpi_numa.c | 2 +- arch/arm64/kernel/setup.c | 3 - arch/arm64/kernel/smp.c | 2 + arch/arm64/kvm/sys_regs.c | 6 + arch/arm64/kvm/vgic/vgic-debug.c | 2 +- arch/arm64/kvm/vgic/vgic.h | 7 + arch/mips/kernel/cpu-probe.c | 4 + arch/powerpc/include/asm/topology.h | 13 ++ arch/riscv/kernel/traps.c | 4 +- arch/riscv/mm/init.c | 4 +- arch/s390/boot/startup.c | 55 ++++--- arch/s390/boot/vmem.c | 14 +- arch/s390/boot/vmlinux.lds.S | 7 +- arch/s390/include/asm/page.h | 3 +- arch/s390/include/asm/uv.h | 5 +- arch/s390/kernel/vmlinux.lds.S | 2 +- arch/s390/kvm/kvm-s390.h | 7 +- arch/s390/tools/relocs.c | 2 +- block/blk-mq-tag.c | 5 +- drivers/acpi/acpica/acevents.h | 6 +- drivers/acpi/acpica/evregion.c | 12 +- drivers/acpi/acpica/evxfregn.c | 64 +------- drivers/acpi/ec.c | 14 +- drivers/acpi/internal.h | 1 + drivers/acpi/scan.c | 2 + drivers/acpi/video_detect.c | 22 +++ drivers/ata/pata_macio.c | 23 ++- drivers/atm/idt77252.c | 9 +- drivers/bluetooth/btintel.c | 10 -- drivers/bluetooth/btintel_pcie.c | 3 - drivers/bluetooth/btmtksdio.c | 3 - drivers/bluetooth/btrtl.c | 1 - drivers/bluetooth/btusb.c | 4 +- drivers/bluetooth/hci_qca.c | 4 +- drivers/bluetooth/hci_vhci.c | 2 - drivers/char/xillybus/xillyusb.c | 42 ++++- drivers/gpio/gpio-mlxbf3.c | 14 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 8 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 53 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 4 +- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 63 ++++++- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.h | 7 +- drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 1 + drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 18 +- drivers/gpu/drm/amd/amdgpu/soc15d.h | 6 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 32 +++- .../drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 4 +- .../display/dc/resource/dcn321/dcn321_resource.c | 3 + drivers/gpu/drm/i915/display/intel_dp_hdcp.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 14 +- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 20 ++- drivers/gpu/drm/msm/dp/dp_ctrl.c | 2 + drivers/gpu/drm/msm/dp/dp_panel.c | 19 ++- drivers/gpu/drm/msm/msm_mdss.c | 2 +- drivers/gpu/drm/nouveau/nvkm/core/firmware.c | 9 +- drivers/gpu/drm/nouveau/nvkm/falcon/fw.c | 6 + drivers/gpu/drm/v3d/v3d_sched.c | 14 +- drivers/gpu/drm/xe/display/xe_display.c | 6 +- drivers/gpu/drm/xe/xe_device.c | 4 +- drivers/gpu/drm/xe/xe_exec_queue.c | 19 --- drivers/gpu/drm/xe/xe_exec_queue_types.h | 10 -- drivers/gpu/drm/xe/xe_gt_pagefault.c | 18 +- drivers/gpu/drm/xe/xe_guc_submit.c | 8 +- drivers/gpu/drm/xe/xe_hw_fence.c | 59 +++++-- drivers/gpu/drm/xe/xe_hw_fence.h | 7 +- drivers/gpu/drm/xe/xe_lrc.c | 48 +++++- drivers/gpu/drm/xe/xe_lrc.h | 3 + drivers/gpu/drm/xe/xe_mmio.c | 41 ++++- drivers/gpu/drm/xe/xe_mmio.h | 2 +- drivers/gpu/drm/xe/xe_ring_ops.c | 22 +-- drivers/gpu/drm/xe/xe_sched_job.c | 182 ++++++++++++--------- drivers/gpu/drm/xe/xe_sched_job.h | 7 +- drivers/gpu/drm/xe/xe_sched_job_types.h | 20 ++- drivers/gpu/drm/xe/xe_trace.h | 11 +- drivers/hid/wacom_wac.c | 4 +- drivers/i2c/busses/i2c-qcom-geni.c | 4 +- drivers/i2c/busses/i2c-tegra.c | 4 +- drivers/input/input-mt.c | 3 + drivers/input/serio/i8042-acpipnpio.h | 20 +-- drivers/input/serio/i8042.c | 10 +- drivers/iommu/io-pgfault.c | 1 + drivers/iommu/iommufd/device.c | 2 +- drivers/md/dm-ioctl.c | 22 ++- drivers/md/dm.c | 4 +- drivers/md/persistent-data/dm-space-map-metadata.c | 4 +- drivers/md/raid1.c | 14 +- drivers/misc/fastrpc.c | 22 +-- drivers/mmc/core/mmc_test.c | 9 +- drivers/mmc/host/dw_mmc.c | 8 + drivers/mmc/host/mtk-sd.c | 8 +- drivers/net/bonding/bond_main.c | 21 +-- drivers/net/bonding/bond_options.c | 2 +- drivers/net/dsa/microchip/ksz_ptp.c | 5 +- drivers/net/dsa/mv88e6xxx/global1_atu.c | 3 +- drivers/net/dsa/ocelot/felix.c | 11 ++ drivers/net/dsa/vitesse-vsc73xx-core.c | 45 ++++- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 5 - drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 3 +- .../net/ethernet/freescale/dpaa2/dpaa2-switch.c | 7 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 28 +++- .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 3 + .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 +- .../net/ethernet/intel/ice/devlink/devlink_port.c | 4 +- drivers/net/ethernet/intel/ice/ice_base.c | 21 ++- drivers/net/ethernet/intel/ice/ice_txrx.c | 47 +----- drivers/net/ethernet/intel/igb/igb_main.c | 1 + drivers/net/ethernet/intel/igc/igc_defines.h | 6 + drivers/net/ethernet/intel/igc/igc_main.c | 8 +- drivers/net/ethernet/intel/igc/igc_tsn.c | 76 +++++++-- drivers/net/ethernet/intel/igc/igc_tsn.h | 1 + .../net/ethernet/marvell/octeontx2/af/rvu_cpt.c | 23 ++- drivers/net/ethernet/mediatek/mtk_wed.c | 6 +- .../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 + .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +- .../mellanox/mlx5/core/lib/ipsec_fs_roce.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c | 18 +- .../net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h | 8 + .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 10 ++ .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h | 2 + .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 50 +++++- drivers/net/ethernet/microsoft/mana/mana_en.c | 28 +++- drivers/net/ethernet/mscc/ocelot.c | 91 ++++++++++- drivers/net/ethernet/mscc/ocelot_fdma.c | 3 +- drivers/net/ethernet/mscc/ocelot_vsc7514.c | 4 + drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c | 8 +- drivers/net/ethernet/xilinx/xilinx_axienet.h | 17 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 25 +-- drivers/net/gtp.c | 3 + drivers/net/wireless/ath/ath12k/dp_tx.c | 72 ++++++++ drivers/net/wireless/ath/ath12k/hw.c | 6 + drivers/net/wireless/ath/ath12k/hw.h | 4 + drivers/net/wireless/ath/ath12k/mac.c | 1 + .../broadcom/brcm80211/brcmfmac/cfg80211.c | 13 +- drivers/nvme/host/core.c | 2 +- drivers/platform/surface/aggregator/controller.c | 3 +- drivers/platform/x86/dell/Kconfig | 1 + drivers/platform/x86/dell/dell-uart-backlight.c | 8 + .../x86/intel/speed_select_if/isst_tpmi_core.c | 3 +- drivers/pmdomain/imx/imx93-pd.c | 5 +- drivers/pmdomain/imx/scu-pd.c | 5 - drivers/s390/block/dasd.c | 36 ++-- drivers/s390/block/dasd_3990_erp.c | 10 +- drivers/s390/block/dasd_eckd.c | 57 +++---- drivers/s390/block/dasd_genhd.c | 1 - drivers/s390/block/dasd_int.h | 2 +- drivers/s390/crypto/ap_bus.c | 7 +- drivers/spi/spi-cadence-quadspi.c | 14 +- .../media/atomisp/pci/ia_css_stream_public.h | 8 +- .../staging/media/atomisp/pci/sh_css_internal.h | 19 ++- drivers/thermal/gov_bang_bang.c | 91 ++++++++--- drivers/thermal/thermal_core.c | 3 +- drivers/thermal/thermal_debugfs.c | 6 +- drivers/thermal/thermal_of.c | 23 ++- drivers/thunderbolt/switch.c | 1 + drivers/tty/serial/8250/8250_omap.c | 33 +--- drivers/tty/serial/atmel_serial.c | 2 +- drivers/tty/serial/fsl_lpuart.c | 1 + drivers/tty/vt/conmakehash.c | 12 +- drivers/usb/host/xhci-mem.c | 2 +- drivers/usb/host/xhci.c | 8 +- drivers/usb/misc/usb-ljca.c | 1 + drivers/usb/typec/tcpm/tcpm.c | 1 - fs/9p/vfs_addr.c | 3 +- fs/afs/file.c | 3 +- fs/btrfs/delayed-ref.c | 67 ++++++++ fs/btrfs/delayed-ref.h | 2 + fs/btrfs/extent-tree.c | 51 +++++- fs/btrfs/extent_io.c | 14 +- fs/btrfs/extent_map.c | 22 +-- fs/btrfs/free-space-cache.c | 14 +- fs/btrfs/send.c | 52 ++++-- fs/btrfs/super.c | 18 +- fs/btrfs/tree-checker.c | 74 ++++++++- fs/ceph/addr.c | 25 ++- fs/file.c | 28 ++-- fs/fuse/dev.c | 6 +- fs/inode.c | 39 ++++- fs/libfs.c | 35 ++-- fs/locks.c | 2 +- fs/netfs/buffered_read.c | 8 +- fs/netfs/buffered_write.c | 2 +- fs/netfs/fscache_cookie.c | 4 + fs/netfs/io.c | 161 +++++++++++++++++- fs/nfs/fscache.c | 3 +- fs/smb/client/cifsglob.h | 17 +- fs/smb/client/file.c | 58 +++++-- fs/smb/client/reparse.c | 11 +- fs/smb/client/smb1ops.c | 2 +- fs/smb/client/smb2ops.c | 42 ++++- fs/smb/client/smb2pdu.c | 40 ++++- fs/smb/client/trace.h | 55 ++++++- fs/smb/client/transport.c | 8 +- fs/smb/server/connection.c | 34 +++- fs/smb/server/connection.h | 3 +- fs/smb/server/mgmt/user_session.c | 8 + fs/smb/server/smb2pdu.c | 5 +- include/acpi/acpixf.h | 5 +- include/acpi/video.h | 1 + include/asm-generic/vmlinux.lds.h | 19 --- include/linux/bitmap.h | 12 ++ include/linux/bpf_verifier.h | 4 +- include/linux/dsa/ocelot.h | 47 ++++++ include/linux/fs.h | 5 + include/linux/hugetlb.h | 33 +++- include/linux/io_uring_types.h | 2 +- include/linux/mm.h | 11 ++ include/linux/panic.h | 1 + include/linux/pgalloc_tag.h | 13 ++ include/linux/thermal.h | 1 + include/net/af_vsock.h | 4 + include/net/bluetooth/hci.h | 17 +- include/net/bluetooth/hci_core.h | 2 +- include/net/kcm.h | 1 + include/net/mana/mana.h | 1 + include/scsi/scsi_cmnd.h | 2 +- include/soc/mscc/ocelot.h | 12 +- include/trace/events/netfs.h | 1 + include/uapi/misc/fastrpc.h | 3 - init/Kconfig | 25 +-- io_uring/io_uring.h | 2 +- io_uring/kbuf.c | 9 +- io_uring/napi.c | 50 +++--- io_uring/napi.h | 2 +- kernel/bpf/verifier.c | 5 +- kernel/cgroup/cpuset.c | 5 +- kernel/cpu.c | 12 +- kernel/events/core.c | 3 +- kernel/kallsyms.c | 60 +------ kernel/kallsyms_internal.h | 6 - kernel/kallsyms_selftest.c | 22 +-- kernel/panic.c | 8 +- kernel/printk/printk.c | 2 +- kernel/trace/trace.c | 2 +- kernel/vmcore_info.c | 4 - kernel/workqueue.c | 47 +++--- mm/huge_memory.c | 29 ++-- mm/memcontrol.c | 7 +- mm/memory-failure.c | 20 ++- mm/memory.c | 33 ++-- mm/mm_init.c | 12 +- mm/mseal.c | 14 +- mm/page_alloc.c | 51 +++--- mm/vmalloc.c | 11 +- net/bluetooth/hci_core.c | 19 +-- net/bluetooth/hci_event.c | 2 +- net/bluetooth/mgmt.c | 4 + net/bluetooth/smp.c | 146 ++++++++--------- net/bridge/br_netfilter_hooks.c | 6 +- net/dsa/tag_ocelot.c | 37 +---- net/ipv4/tcp_input.c | 28 ++-- net/ipv4/tcp_ipv4.c | 14 ++ net/ipv4/udp_offload.c | 3 +- net/ipv6/ip6_output.c | 10 ++ net/ipv6/ip6_tunnel.c | 12 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 + net/iucv/iucv.c | 4 +- net/kcm/kcmsock.c | 4 + net/mctp/test/route-test.c | 2 +- net/mptcp/diag.c | 2 +- net/mptcp/pm.c | 13 -- net/mptcp/pm_netlink.c | 142 ++++++++++------ net/mptcp/protocol.h | 3 - net/netfilter/nf_flow_table_inet.c | 3 + net/netfilter/nf_flow_table_ip.c | 3 + net/netfilter/nf_flow_table_offload.c | 2 +- net/netfilter/nf_tables_api.c | 163 ++++++++++++------ net/netfilter/nfnetlink.c | 5 +- net/netfilter/nfnetlink_queue.c | 35 +++- net/netfilter/nft_counter.c | 9 +- net/openvswitch/datapath.c | 2 +- net/sched/sch_netem.c | 47 ++++-- net/vmw_vsock/af_vsock.c | 50 +++--- net/vmw_vsock/vsock_bpf.c | 4 +- scripts/kallsyms.c | 109 ++++-------- scripts/link-vmlinux.sh | 106 ++++++------ scripts/rust_is_available.sh | 6 +- security/keys/trusted-keys/trusted_dcp.c | 35 ++-- security/selinux/avc.c | 8 +- security/selinux/hooks.c | 12 +- sound/core/timer.c | 2 +- sound/pci/hda/patch_realtek.c | 1 - sound/pci/hda/tas2781_hda_i2c.c | 14 +- sound/usb/quirks-table.h | 1 + sound/usb/quirks.c | 2 + tools/perf/tests/vmlinux-kallsyms.c | 1 - tools/testing/selftests/bpf/progs/iters.c | 54 ++++++ tools/testing/selftests/core/close_range_test.c | 35 ++++ .../selftests/drivers/net/mlxsw/ethtool_lanes.sh | 3 +- tools/testing/selftests/mm/Makefile | 2 + tools/testing/selftests/mm/run_vmtests.sh | 3 + tools/testing/selftests/net/af_unix/msg_oob.c | 2 +- tools/testing/selftests/net/lib.sh | 11 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 28 +++- tools/testing/selftests/net/udpgro.sh | 53 +++--- tools/testing/selftests/tc-testing/tdc.py | 1 - tools/tracing/rtla/src/osnoise_top.c | 11 +- 305 files changed, 3476 insertions(+), 1744 deletions(-)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
commit 7258fdd7d7459616b3fe1a603e33900584b10c13 upstream.
Commit 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env") included <linux/limits.h>, which invoked another (wrong) patch that tried to address a build error on macOS.
According to the specification [1], the correct header to use PATH_MAX is <limits.h>.
The minimal fix would be to replace <linux/limits.h> with <limits.h>.
However, the following commits seem questionable to me:
- 3bd85c6c97b2 ("tty: vt: conmakehash: Don't mention the full path of the input in output") - 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env")
These commits made too many efforts to cope with a comment header in drivers/tty/vt/consolemap_deftbl.c:
/* * Do not edit this file; it was automatically generated by * * conmakehash drivers/tty/vt/cp437.uni > [this file] * */
With this commit, the header part of the generate C file will be simplified as follows:
/* * Automatically generated file; Do not edit. */
BTW, another series of excessive efforts for a comment header can be seen in the following:
- 5ef6dc08cfde ("lib/build_OID_registry: don't mention the full path of the script in output") - 2fe29fe94563 ("lib/build_OID_registry: avoid non-destructive substitution for Perl < 5.13.2 compat")
[1]: https://pubs.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html
Fixes: 6e20753da6bc ("tty: vt: conmakehash: cope with abs_srctree no longer in env") Cc: stable stable@kernel.org Reported-by: Daniel Gomez da.gomez@samsung.com Closes: https://lore.kernel.org/all/20240807-macos-build-support-v1-11-4cd1ded85694@... Signed-off-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240809160853.1269466-1-masahiroy@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/vt/conmakehash.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/drivers/tty/vt/conmakehash.c b/drivers/tty/vt/conmakehash.c index 82d9db68b2ce..a931fcde7ad9 100644 --- a/drivers/tty/vt/conmakehash.c +++ b/drivers/tty/vt/conmakehash.c @@ -11,8 +11,6 @@ * Copyright (C) 1995-1997 H. Peter Anvin */
-#include <libgen.h> -#include <linux/limits.h> #include <stdio.h> #include <stdlib.h> #include <sysexits.h> @@ -79,7 +77,6 @@ int main(int argc, char *argv[]) { FILE *ctbl; const char *tblname; - char base_tblname[PATH_MAX]; char buffer[65536]; int fontlen; int i, nuni, nent; @@ -245,20 +242,15 @@ int main(int argc, char *argv[]) for ( i = 0 ; i < fontlen ; i++ ) nuni += unicount[i];
- strncpy(base_tblname, tblname, PATH_MAX); - base_tblname[PATH_MAX - 1] = 0; printf("\ /*\n\ - * Do not edit this file; it was automatically generated by\n\ - *\n\ - * conmakehash %s > [this file]\n\ - *\n\ + * Automatically generated file; Do not edit.\n\ */\n\ \n\ #include <linux/types.h>\n\ \n\ u8 dfont_unicount[%d] = \n\ -{\n\t", basename(base_tblname), fontlen); +{\n\t", fontlen);
for ( i = 0 ; i < fontlen ; i++ ) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Fan peng.fan@nxp.com
commit dc98d76a15bc29a9a4e76f2f65f39f3e590fb15c upstream.
With "earlycon initcall_debug=1 loglevel=8" in bootargs, kernel sometimes boot hang. It is because normal console still is not ready, but runtime suspend is called, so early console putchar will hang in waiting TRDE set in UARTSTAT.
The lpuart driver has auto suspend delay set to 3000ms, but during uart_add_one_port, a child device serial ctrl will added and probed with its pm runtime enabled(see serial_ctrl.c). The runtime suspend call path is: device_add |-> bus_probe_device |->device_initial_probe |->__device_attach |-> pm_runtime_get_sync(dev->parent); |-> pm_request_idle(dev); |-> pm_runtime_put(dev->parent);
So in the end, before normal console ready, the lpuart get runtime suspended. And earlycon putchar will hang.
To address the issue, mark last busy just after pm_runtime_enable, three seconds is long enough to switch from bootconsole to normal console.
Fixes: 43543e6f539b ("tty: serial: fsl_lpuart: Add runtime pm support") Cc: stable stable@kernel.org Signed-off-by: Peng Fan peng.fan@nxp.com Link: https://lore.kernel.org/r/20240808140325.580105-1-peng.fan@oss.nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/fsl_lpuart.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/tty/serial/fsl_lpuart.c +++ b/drivers/tty/serial/fsl_lpuart.c @@ -2923,6 +2923,7 @@ static int lpuart_probe(struct platform_ pm_runtime_set_autosuspend_delay(&pdev->dev, UART_AUTOSUSPEND_TIMEOUT); pm_runtime_set_active(&pdev->dev); pm_runtime_enable(&pdev->dev); + pm_runtime_mark_last_busy(&pdev->dev);
ret = lpuart_global_reset(sport); if (ret)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathieu Othacehe othacehe@gnu.org
commit c9f6613b16123989f2c3bd04b1d9b2365d6914e7 upstream.
In RS485 mode, the RTS pin is driven high by hardware when the transmitter is operating. This behaviour cannot be changed. This means that the driver should claim that it supports SER_RS485_RTS_ON_SEND and not SER_RS485_RTS_AFTER_SEND.
Otherwise, when configuring the port with the SER_RS485_RTS_ON_SEND, one get the following warning:
kern.warning kernel: atmel_usart_serial atmel_usart_serial.2.auto: ttyS1 (1): invalid RTS setting, using RTS_AFTER_SEND instead
which is contradictory with what's really happening.
Signed-off-by: Mathieu Othacehe othacehe@gnu.org Cc: stable stable@kernel.org Tested-by: Alexander Dahl ada@thorsis.com Fixes: af47c491e3c7 ("serial: atmel: Fill in rs485_supported") Link: https://lore.kernel.org/r/20240808060637.19886-1-othacehe@gnu.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/atmel_serial.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/tty/serial/atmel_serial.c +++ b/drivers/tty/serial/atmel_serial.c @@ -2514,7 +2514,7 @@ static const struct uart_ops atmel_pops };
static const struct serial_rs485 atmel_rs485_supported = { - .flags = SER_RS485_ENABLED | SER_RS485_RTS_AFTER_SEND | SER_RS485_RX_DURING_TX, + .flags = SER_RS485_ENABLED | SER_RS485_RTS_ON_SEND | SER_RS485_RX_DURING_TX, .delay_rts_before_send = 1, .delay_rts_after_send = 1, };
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 779bac9994452f6a894524f70c00cfb0cd4b6364 upstream.
This reverts commit 0e6b6dedf168 ("Revert "ACPI: EC: Evaluate orphan _REG under EC device") because the problem addressed by it will be addressed differently in what follows.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Cc: All applicable stable@vger.kernel.org Link: https://patch.msgid.link/3236716.5fSG56mABF@rjwysocki.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/acpica/acevents.h | 4 --- drivers/acpi/acpica/evregion.c | 6 +++- drivers/acpi/acpica/evxfregn.c | 54 ----------------------------------------- drivers/acpi/ec.c | 3 -- include/acpi/acpixf.h | 4 --- 5 files changed, 5 insertions(+), 66 deletions(-)
--- a/drivers/acpi/acpica/acevents.h +++ b/drivers/acpi/acpica/acevents.h @@ -191,10 +191,6 @@ void acpi_ev_execute_reg_methods(struct acpi_namespace_node *node, acpi_adr_space_type space_id, u32 function);
-void -acpi_ev_execute_orphan_reg_method(struct acpi_namespace_node *node, - acpi_adr_space_type space_id); - acpi_status acpi_ev_execute_reg_method(union acpi_operand_object *region_obj, u32 function);
--- a/drivers/acpi/acpica/evregion.c +++ b/drivers/acpi/acpica/evregion.c @@ -20,6 +20,10 @@ extern u8 acpi_gbl_default_address_space
/* Local prototypes */
+static void +acpi_ev_execute_orphan_reg_method(struct acpi_namespace_node *device_node, + acpi_adr_space_type space_id); + static acpi_status acpi_ev_reg_run(acpi_handle obj_handle, u32 level, void *context, void **return_value); @@ -814,7 +818,7 @@ acpi_ev_reg_run(acpi_handle obj_handle, * ******************************************************************************/
-void +static void acpi_ev_execute_orphan_reg_method(struct acpi_namespace_node *device_node, acpi_adr_space_type space_id) { --- a/drivers/acpi/acpica/evxfregn.c +++ b/drivers/acpi/acpica/evxfregn.c @@ -306,57 +306,3 @@ acpi_execute_reg_methods(acpi_handle dev }
ACPI_EXPORT_SYMBOL(acpi_execute_reg_methods) - -/******************************************************************************* - * - * FUNCTION: acpi_execute_orphan_reg_method - * - * PARAMETERS: device - Handle for the device - * space_id - The address space ID - * - * RETURN: Status - * - * DESCRIPTION: Execute an "orphan" _REG method that appears under an ACPI - * device. This is a _REG method that has no corresponding region - * within the device's scope. - * - ******************************************************************************/ -acpi_status -acpi_execute_orphan_reg_method(acpi_handle device, acpi_adr_space_type space_id) -{ - struct acpi_namespace_node *node; - acpi_status status; - - ACPI_FUNCTION_TRACE(acpi_execute_orphan_reg_method); - - /* Parameter validation */ - - if (!device) { - return_ACPI_STATUS(AE_BAD_PARAMETER); - } - - status = acpi_ut_acquire_mutex(ACPI_MTX_NAMESPACE); - if (ACPI_FAILURE(status)) { - return_ACPI_STATUS(status); - } - - /* Convert and validate the device handle */ - - node = acpi_ns_validate_handle(device); - if (node) { - - /* - * If an "orphan" _REG method is present in the device's scope - * for the given address space ID, run it. - */ - - acpi_ev_execute_orphan_reg_method(node, space_id); - } else { - status = AE_BAD_PARAMETER; - } - - (void)acpi_ut_release_mutex(ACPI_MTX_NAMESPACE); - return_ACPI_STATUS(status); -} - -ACPI_EXPORT_SYMBOL(acpi_execute_orphan_reg_method) --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1507,9 +1507,6 @@ static int ec_install_handlers(struct ac
if (call_reg && !test_bit(EC_FLAGS_EC_REG_CALLED, &ec->flags)) { acpi_execute_reg_methods(scope_handle, ACPI_ADR_SPACE_EC); - if (scope_handle != ec->handle) - acpi_execute_orphan_reg_method(ec->handle, ACPI_ADR_SPACE_EC); - set_bit(EC_FLAGS_EC_REG_CALLED, &ec->flags); }
--- a/include/acpi/acpixf.h +++ b/include/acpi/acpixf.h @@ -663,10 +663,6 @@ ACPI_EXTERNAL_RETURN_STATUS(acpi_status acpi_adr_space_type space_id)) ACPI_EXTERNAL_RETURN_STATUS(acpi_status - acpi_execute_orphan_reg_method(acpi_handle device, - acpi_adr_space_type - space_id)) -ACPI_EXTERNAL_RETURN_STATUS(acpi_status acpi_remove_address_space_handler(acpi_handle device, acpi_adr_space_type
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Griffin Kroah-Hartman griffin@kroah.com
commit 9bb5e74b2bf88fbb024bb15ded3b011e02c673be upstream.
This reverts commit bab2f5e8fd5d2f759db26b78d9db57412888f187.
Joel reported that this commit breaks userspace and stops sensors in SDM845 from working. Also breaks other qcom SoC devices running postmarketOS.
Cc: stable stable@kernel.org Cc: Ekansh Gupta quic_ekangupt@quicinc.com Cc: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reported-by: Joel Selvaraj joelselvaraj.oss@gmail.com Link: https://lore.kernel.org/r/9a9f5646-a554-4b65-8122-d212bb665c81@umsystem.edu Signed-off-by: Griffin Kroah-Hartman griffin@kroah.com Acked-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Fixes: bab2f5e8fd5d ("misc: fastrpc: Restrict untrusted app to attach to privileged PD") Link: https://lore.kernel.org/r/20240815094920.8242-1-griffin@kroah.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/misc/fastrpc.c | 22 +++------------------- include/uapi/misc/fastrpc.h | 3 --- 2 files changed, 3 insertions(+), 22 deletions(-)
--- a/drivers/misc/fastrpc.c +++ b/drivers/misc/fastrpc.c @@ -2087,16 +2087,6 @@ err_invoke: return err; }
-static int is_attach_rejected(struct fastrpc_user *fl) -{ - /* Check if the device node is non-secure */ - if (!fl->is_secure_dev) { - dev_dbg(&fl->cctx->rpdev->dev, "untrusted app trying to attach to privileged DSP PD\n"); - return -EACCES; - } - return 0; -} - static long fastrpc_device_ioctl(struct file *file, unsigned int cmd, unsigned long arg) { @@ -2109,19 +2099,13 @@ static long fastrpc_device_ioctl(struct err = fastrpc_invoke(fl, argp); break; case FASTRPC_IOCTL_INIT_ATTACH: - err = is_attach_rejected(fl); - if (!err) - err = fastrpc_init_attach(fl, ROOT_PD); + err = fastrpc_init_attach(fl, ROOT_PD); break; case FASTRPC_IOCTL_INIT_ATTACH_SNS: - err = is_attach_rejected(fl); - if (!err) - err = fastrpc_init_attach(fl, SENSORS_PD); + err = fastrpc_init_attach(fl, SENSORS_PD); break; case FASTRPC_IOCTL_INIT_CREATE_STATIC: - err = is_attach_rejected(fl); - if (!err) - err = fastrpc_init_create_static_process(fl, argp); + err = fastrpc_init_create_static_process(fl, argp); break; case FASTRPC_IOCTL_INIT_CREATE: err = fastrpc_init_create_process(fl, argp); --- a/include/uapi/misc/fastrpc.h +++ b/include/uapi/misc/fastrpc.h @@ -8,14 +8,11 @@ #define FASTRPC_IOCTL_ALLOC_DMA_BUFF _IOWR('R', 1, struct fastrpc_alloc_dma_buf) #define FASTRPC_IOCTL_FREE_DMA_BUFF _IOWR('R', 2, __u32) #define FASTRPC_IOCTL_INVOKE _IOWR('R', 3, struct fastrpc_invoke) -/* This ioctl is only supported with secure device nodes */ #define FASTRPC_IOCTL_INIT_ATTACH _IO('R', 4) #define FASTRPC_IOCTL_INIT_CREATE _IOWR('R', 5, struct fastrpc_init_create) #define FASTRPC_IOCTL_MMAP _IOWR('R', 6, struct fastrpc_req_mmap) #define FASTRPC_IOCTL_MUNMAP _IOWR('R', 7, struct fastrpc_req_munmap) -/* This ioctl is only supported with secure device nodes */ #define FASTRPC_IOCTL_INIT_ATTACH_SNS _IO('R', 8) -/* This ioctl is only supported with secure device nodes */ #define FASTRPC_IOCTL_INIT_CREATE_STATIC _IOWR('R', 9, struct fastrpc_init_create_static) #define FASTRPC_IOCTL_MEM_MAP _IOWR('R', 10, struct fastrpc_mem_map) #define FASTRPC_IOCTL_MEM_UNMAP _IOWR('R', 11, struct fastrpc_mem_unmap)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Griffin Kroah-Hartman griffin@kroah.com
commit 0863bffda1131fd2fa9c05b653ad9ee3d8db127e upstream.
This reverts commit 68e6939ea9ec3d6579eadeab16060339cdeaf940.
Kevin reported that this causes a crash during suspend on platforms that dont use PM domains.
Link: https://lore.kernel.org/r/7ha5hgpchq.fsf@baylibre.com Cc: Thomas Richard thomas.richard@bootlin.com Fixes: 68e6939ea9ec ("serial: 8250_omap: Set the console genpd always on if no console suspend") Cc: stable stable@kernel.org Reported-by: Kevin Hilman khilman@kernel.org Signed-off-by: Griffin Kroah-Hartman griffin@kroah.com Link: https://lore.kernel.org/r/20240814111747.82371-1-griffin@kroah.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/8250/8250_omap.c | 33 +++++------------------------ 1 file changed, 5 insertions(+), 28 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c index 1af9aed99c65..afef1dd4ddf4 100644 --- a/drivers/tty/serial/8250/8250_omap.c +++ b/drivers/tty/serial/8250/8250_omap.c @@ -27,7 +27,6 @@ #include <linux/pm_wakeirq.h> #include <linux/dma-mapping.h> #include <linux/sys_soc.h> -#include <linux/pm_domain.h>
#include "8250.h"
@@ -119,12 +118,6 @@ #define UART_OMAP_TO_L 0x26 #define UART_OMAP_TO_H 0x27
-/* - * Copy of the genpd flags for the console. - * Only used if console suspend is disabled - */ -static unsigned int genpd_flags_console; - struct omap8250_priv { void __iomem *membase; int line; @@ -1655,7 +1648,6 @@ static int omap8250_suspend(struct device *dev) { struct omap8250_priv *priv = dev_get_drvdata(dev); struct uart_8250_port *up = serial8250_get_port(priv->line); - struct generic_pm_domain *genpd = pd_to_genpd(dev->pm_domain); int err = 0;
serial8250_suspend_port(priv->line); @@ -1666,19 +1658,8 @@ static int omap8250_suspend(struct device *dev) if (!device_may_wakeup(dev)) priv->wer = 0; serial_out(up, UART_OMAP_WER, priv->wer); - if (uart_console(&up->port)) { - if (console_suspend_enabled) - err = pm_runtime_force_suspend(dev); - else { - /* - * The pd shall not be powered-off (no console suspend). - * Make copy of genpd flags before to set it always on. - * The original value is restored during the resume. - */ - genpd_flags_console = genpd->flags; - genpd->flags |= GENPD_FLAG_ALWAYS_ON; - } - } + if (uart_console(&up->port) && console_suspend_enabled) + err = pm_runtime_force_suspend(dev); flush_work(&priv->qos_work);
return err; @@ -1688,16 +1669,12 @@ static int omap8250_resume(struct device *dev) { struct omap8250_priv *priv = dev_get_drvdata(dev); struct uart_8250_port *up = serial8250_get_port(priv->line); - struct generic_pm_domain *genpd = pd_to_genpd(dev->pm_domain); int err;
if (uart_console(&up->port) && console_suspend_enabled) { - if (console_suspend_enabled) { - err = pm_runtime_force_resume(dev); - if (err) - return err; - } else - genpd->flags = genpd_flags_console; + err = pm_runtime_force_resume(dev); + if (err) + return err; }
serial8250_resume_port(priv->line);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xu Yang xu.yang_2@nxp.com
commit 21ea1ce37fc267dc45fe27517bbde926211683df upstream.
This reverts commit bf20c69cf3cf9c6445c4925dd9a8a6ca1b78bfdf.
During tcpm_init() stage, if the VBUS is still present after tcpm_reset_port(), then we assume that VBUS will off and goto safe0v after a specific discharge time. Following a TCPM_VBUS_EVENT event if VBUS reach to off state. TCPM_VBUS_EVENT event may be set during PORT_RESET handling stage. If pd_events reset to 0 after TCPM_VBUS_EVENT set, we will lost this VBUS event. Then the port state machine may stuck at one state.
Before:
[ 2.570172] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev1 NONE_AMS] [ 2.570179] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms] [ 2.570182] pending state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED @ 920 ms [rev1 NONE_AMS] [ 3.490213] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [delayed 920 ms] [ 3.490220] Start toggling [ 3.546050] CC1: 0 -> 0, CC2: 0 -> 2 [state TOGGLING, polarity 0, connected] [ 3.546057] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS]
After revert this patch, we can see VBUS off event and the port will goto expected state.
[ 2.441992] pending state change PORT_RESET -> PORT_RESET_WAIT_OFF @ 100 ms [rev1 NONE_AMS] [ 2.441999] state change PORT_RESET -> PORT_RESET_WAIT_OFF [delayed 100 ms] [ 2.442002] pending state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED @ 920 ms [rev1 NONE_AMS] [ 2.442122] VBUS off [ 2.442125] state change PORT_RESET_WAIT_OFF -> SNK_UNATTACHED [rev1 NONE_AMS] [ 2.442127] VBUS VSAFE0V [ 2.442351] CC1: 0 -> 0, CC2: 0 -> 0 [state SNK_UNATTACHED, polarity 0, disconnected] [ 2.442357] Start toggling [ 2.491850] CC1: 0 -> 0, CC2: 0 -> 2 [state TOGGLING, polarity 0, connected] [ 2.491858] state change TOGGLING -> SRC_ATTACH_WAIT [rev1 NONE_AMS] [ 2.491863] pending state change SRC_ATTACH_WAIT -> SNK_TRY @ 200 ms [rev1 NONE_AMS] [ 2.691905] state change SRC_ATTACH_WAIT -> SNK_TRY [delayed 200 ms]
Fixes: bf20c69cf3cf ("usb: typec: tcpm: clear pd_event queue in PORT_RESET") Cc: stable@vger.kernel.org Signed-off-by: Xu Yang xu.yang_2@nxp.com Acked-by: Heikki Krogerus heikki.krogerus@linux.intel.com Link: https://lore.kernel.org/r/20240809112901.535072-1-xu.yang_2@nxp.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/typec/tcpm/tcpm.c | 1 - 1 file changed, 1 deletion(-)
--- a/drivers/usb/typec/tcpm/tcpm.c +++ b/drivers/usb/typec/tcpm/tcpm.c @@ -5630,7 +5630,6 @@ static void run_state_machine(struct tcp break; case PORT_RESET: tcpm_reset_port(port); - port->pd_events = 0; if (port->self_powered) tcpm_set_cc(port, TYPEC_CC_OPEN); else
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Moore paul@paul-moore.com
commit 05a3d6e9307250a5911d75308e4363466794ab21 upstream.
Unfortunately it appears that vma_is_initial_heap() is currently broken for applications that do not currently have any heap allocated, e.g. brk == start_brk. The breakage is such that it will cause SELinux to check for the process/execheap permission on memory regions that cross brk/start_brk even when there is no heap.
The proper fix would be to correct vma_is_initial_heap(), but as there are multiple callers I am hesitant to unilaterally modify the helper out of concern that I would end up breaking some other subsystem. The mm developers have been made aware of the situation and hopefully they will have a fix at some point in the future, but we need a fix soon so we are simply going to revert our use of vma_is_initial_heap() in favor of our old logic/code which works as expected, even in the face of a zero size heap. We can return to using vma_is_initial_heap() at some point in the future when it is fixed.
Cc: stable@vger.kernel.org Reported-by: Marc Reisner reisner.marc@gmail.com Closes: https://lore.kernel.org/all/ZrPmoLKJEf1wiFmM@marcreisner.com Fixes: 68df1baf158f ("selinux: use vma_is_initial_stack() and vma_is_initial_heap()") Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/selinux/hooks.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
--- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3852,7 +3852,17 @@ static int selinux_file_mprotect(struct if (default_noexec && (prot & PROT_EXEC) && !(vma->vm_flags & VM_EXEC)) { int rc = 0; - if (vma_is_initial_heap(vma)) { + /* + * We don't use the vma_is_initial_heap() helper as it has + * a history of problems and is currently broken on systems + * where there is no heap, e.g. brk == start_brk. Before + * replacing the conditional below with vma_is_initial_heap(), + * or something similar, please ensure that the logic is the + * same as what we have below or you have tested every possible + * corner case you can think to test. + */ + if (vma->vm_start >= vma->vm_mm->start_brk && + vma->vm_end <= vma->vm_mm->brk) { rc = avc_has_perm(sid, sid, SECCLASS_PROCESS, PROCESS__EXECHEAP, NULL); } else if (!vma->vm_file && (vma_is_initial_stack(vma) ||
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
commit 8e5ced7804cb9184c4a23f8054551240562a8eda upstream.
This reverts commit ae678317b95e760607c7b20b97c9cd4ca9ed6e1a.
Revert the patch that removes the deprecated use of PG_private_2 in netfslib for the moment as Ceph is actually still using this to track data copied to the cache.
Fixes: ae678317b95e ("netfs: Remove deprecated use of PG_private_2 as a second writeback flag") Reported-by: Max Kellermann max.kellermann@ionos.com Signed-off-by: David Howells dhowells@redhat.com cc: Ilya Dryomov idryomov@gmail.com cc: Xiubo Li xiubli@redhat.com cc: Jeff Layton jlayton@kernel.org cc: Matthew Wilcox willy@infradead.org cc: ceph-devel@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: linux-mm@kvack.org https: //lore.kernel.org/r/3575457.1722355300@warthog.procyon.org.uk Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ceph/addr.c | 19 +++++ fs/netfs/buffered_read.c | 8 ++ fs/netfs/io.c | 144 +++++++++++++++++++++++++++++++++++++++++++ include/trace/events/netfs.h | 1 4 files changed, 170 insertions(+), 2 deletions(-)
--- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -498,6 +498,11 @@ const struct netfs_request_ops ceph_netf };
#ifdef CONFIG_CEPH_FSCACHE +static void ceph_set_page_fscache(struct page *page) +{ + folio_start_private_2(page_folio(page)); /* [DEPRECATED] */ +} + static void ceph_fscache_write_terminated(void *priv, ssize_t error, bool was_async) { struct inode *inode = priv; @@ -515,6 +520,10 @@ static void ceph_fscache_write_to_cache( ceph_fscache_write_terminated, inode, true, caching); } #else +static inline void ceph_set_page_fscache(struct page *page) +{ +} + static inline void ceph_fscache_write_to_cache(struct inode *inode, u64 off, u64 len, bool caching) { } @@ -706,6 +715,8 @@ static int writepage_nounlock(struct pag len = wlen;
set_page_writeback(page); + if (caching) + ceph_set_page_fscache(page); ceph_fscache_write_to_cache(inode, page_off, len, caching);
if (IS_ENCRYPTED(inode)) { @@ -789,6 +800,8 @@ static int ceph_writepage(struct page *p return AOP_WRITEPAGE_ACTIVATE; }
+ folio_wait_private_2(page_folio(page)); /* [DEPRECATED] */ + err = writepage_nounlock(page, wbc); if (err == -ERESTARTSYS) { /* direct memory reclaimer was killed by SIGKILL. return 0 @@ -1062,7 +1075,8 @@ get_more_pages: unlock_page(page); break; } - if (PageWriteback(page)) { + if (PageWriteback(page) || + PagePrivate2(page) /* [DEPRECATED] */) { if (wbc->sync_mode == WB_SYNC_NONE) { doutc(cl, "%p under writeback\n", page); unlock_page(page); @@ -1070,6 +1084,7 @@ get_more_pages: } doutc(cl, "waiting on writeback %p\n", page); wait_on_page_writeback(page); + folio_wait_private_2(page_folio(page)); /* [DEPRECATED] */ }
if (!clear_page_dirty_for_io(page)) { @@ -1254,6 +1269,8 @@ new_request: }
set_page_writeback(page); + if (caching) + ceph_set_page_fscache(page); len += thp_size(page); } ceph_fscache_write_to_cache(inode, offset, len, caching); --- a/fs/netfs/buffered_read.c +++ b/fs/netfs/buffered_read.c @@ -466,7 +466,7 @@ retry: if (!netfs_is_cache_enabled(ctx) && netfs_skip_folio_read(folio, pos, len, false)) { netfs_stat(&netfs_n_rh_write_zskip); - goto have_folio; + goto have_folio_no_wait; }
rreq = netfs_alloc_request(mapping, file, @@ -507,6 +507,12 @@ retry: netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
have_folio: + if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) { + ret = folio_wait_private_2_killable(folio); + if (ret < 0) + goto error; + } +have_folio_no_wait: *_folio = folio; kleave(" = 0"); return 0; --- a/fs/netfs/io.c +++ b/fs/netfs/io.c @@ -99,6 +99,146 @@ static void netfs_rreq_completed(struct }
/* + * [DEPRECATED] Deal with the completion of writing the data to the cache. We + * have to clear the PG_fscache bits on the folios involved and release the + * caller's ref. + * + * May be called in softirq mode and we inherit a ref from the caller. + */ +static void netfs_rreq_unmark_after_write(struct netfs_io_request *rreq, + bool was_async) +{ + struct netfs_io_subrequest *subreq; + struct folio *folio; + pgoff_t unlocked = 0; + bool have_unlocked = false; + + rcu_read_lock(); + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + XA_STATE(xas, &rreq->mapping->i_pages, subreq->start / PAGE_SIZE); + + xas_for_each(&xas, folio, (subreq->start + subreq->len - 1) / PAGE_SIZE) { + if (xas_retry(&xas, folio)) + continue; + + /* We might have multiple writes from the same huge + * folio, but we mustn't unlock a folio more than once. + */ + if (have_unlocked && folio->index <= unlocked) + continue; + unlocked = folio_next_index(folio) - 1; + trace_netfs_folio(folio, netfs_folio_trace_end_copy); + folio_end_private_2(folio); + have_unlocked = true; + } + } + + rcu_read_unlock(); + netfs_rreq_completed(rreq, was_async); +} + +static void netfs_rreq_copy_terminated(void *priv, ssize_t transferred_or_error, + bool was_async) /* [DEPRECATED] */ +{ + struct netfs_io_subrequest *subreq = priv; + struct netfs_io_request *rreq = subreq->rreq; + + if (IS_ERR_VALUE(transferred_or_error)) { + netfs_stat(&netfs_n_rh_write_failed); + trace_netfs_failure(rreq, subreq, transferred_or_error, + netfs_fail_copy_to_cache); + } else { + netfs_stat(&netfs_n_rh_write_done); + } + + trace_netfs_sreq(subreq, netfs_sreq_trace_write_term); + + /* If we decrement nr_copy_ops to 0, the ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_copy_ops)) + netfs_rreq_unmark_after_write(rreq, was_async); + + netfs_put_subrequest(subreq, was_async, netfs_sreq_trace_put_terminated); +} + +/* + * [DEPRECATED] Perform any outstanding writes to the cache. We inherit a ref + * from the caller. + */ +static void netfs_rreq_do_write_to_cache(struct netfs_io_request *rreq) +{ + struct netfs_cache_resources *cres = &rreq->cache_resources; + struct netfs_io_subrequest *subreq, *next, *p; + struct iov_iter iter; + int ret; + + trace_netfs_rreq(rreq, netfs_rreq_trace_copy); + + /* We don't want terminating writes trying to wake us up whilst we're + * still going through the list. + */ + atomic_inc(&rreq->nr_copy_ops); + + list_for_each_entry_safe(subreq, p, &rreq->subrequests, rreq_link) { + if (!test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags)) { + list_del_init(&subreq->rreq_link); + netfs_put_subrequest(subreq, false, + netfs_sreq_trace_put_no_copy); + } + } + + list_for_each_entry(subreq, &rreq->subrequests, rreq_link) { + /* Amalgamate adjacent writes */ + while (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) { + next = list_next_entry(subreq, rreq_link); + if (next->start != subreq->start + subreq->len) + break; + subreq->len += next->len; + list_del_init(&next->rreq_link); + netfs_put_subrequest(next, false, + netfs_sreq_trace_put_merged); + } + + ret = cres->ops->prepare_write(cres, &subreq->start, &subreq->len, + subreq->len, rreq->i_size, true); + if (ret < 0) { + trace_netfs_failure(rreq, subreq, ret, netfs_fail_prepare_write); + trace_netfs_sreq(subreq, netfs_sreq_trace_write_skip); + continue; + } + + iov_iter_xarray(&iter, ITER_SOURCE, &rreq->mapping->i_pages, + subreq->start, subreq->len); + + atomic_inc(&rreq->nr_copy_ops); + netfs_stat(&netfs_n_rh_write); + netfs_get_subrequest(subreq, netfs_sreq_trace_get_copy_to_cache); + trace_netfs_sreq(subreq, netfs_sreq_trace_write); + cres->ops->write(cres, subreq->start, &iter, + netfs_rreq_copy_terminated, subreq); + } + + /* If we decrement nr_copy_ops to 0, the usage ref belongs to us. */ + if (atomic_dec_and_test(&rreq->nr_copy_ops)) + netfs_rreq_unmark_after_write(rreq, false); +} + +static void netfs_rreq_write_to_cache_work(struct work_struct *work) /* [DEPRECATED] */ +{ + struct netfs_io_request *rreq = + container_of(work, struct netfs_io_request, work); + + netfs_rreq_do_write_to_cache(rreq); +} + +static void netfs_rreq_write_to_cache(struct netfs_io_request *rreq) /* [DEPRECATED] */ +{ + rreq->work.func = netfs_rreq_write_to_cache_work; + if (!queue_work(system_unbound_wq, &rreq->work)) + BUG(); +} + +/* * Handle a short read. */ static void netfs_rreq_short_read(struct netfs_io_request *rreq, @@ -275,6 +415,10 @@ again: clear_bit_unlock(NETFS_RREQ_IN_PROGRESS, &rreq->flags); wake_up_bit(&rreq->flags, NETFS_RREQ_IN_PROGRESS);
+ if (test_bit(NETFS_RREQ_COPY_TO_CACHE, &rreq->flags) && + test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) + return netfs_rreq_write_to_cache(rreq); + netfs_rreq_completed(rreq, was_async); }
--- a/include/trace/events/netfs.h +++ b/include/trace/events/netfs.h @@ -145,6 +145,7 @@ EM(netfs_folio_trace_clear_g, "clear-g") \ EM(netfs_folio_trace_clear_s, "clear-s") \ EM(netfs_folio_trace_copy_to_cache, "mark-copy") \ + EM(netfs_folio_trace_end_copy, "end-copy") \ EM(netfs_folio_trace_filled_gaps, "filled-gaps") \ EM(netfs_folio_trace_kill, "kill") \ EM(netfs_folio_trace_kill_cc, "kill-cc") \
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
commit 3c0da3d163eb32f1f91891efaade027fa9b245b9 upstream.
fuse_notify_store(), unlike fuse_do_readpage(), does not enable page zeroing (because it can be used to change partial page contents).
So fuse_notify_store() must be more careful to fully initialize page contents (including parts of the page that are beyond end-of-file) before marking the page uptodate.
The current code can leave beyond-EOF page contents uninitialized, which makes these uninitialized page contents visible to userspace via mmap().
This is an information leak, but only affects systems which do not enable init-on-alloc (via CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y or the corresponding kernel command line parameter).
Link: https://bugs.chromium.org/p/project-zero/issues/detail?id=2574 Cc: stable@kernel.org Fixes: a1d75f258230 ("fuse: add store request") Signed-off-by: Jann Horn jannh@google.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/fuse/dev.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/fuse/dev.c +++ b/fs/fuse/dev.c @@ -1618,9 +1618,11 @@ static int fuse_notify_store(struct fuse
this_num = min_t(unsigned, num, PAGE_SIZE - offset); err = fuse_copy_page(cs, &page, offset, this_num, 0); - if (!err && offset == 0 && - (this_num == PAGE_SIZE || file_size == end)) + if (!PageUptodate(page) && !err && offset == 0 && + (this_num == PAGE_SIZE || file_size == end)) { + zero_user_segment(page, this_num, PAGE_SIZE); SetPageUptodate(page); + } unlock_page(page); put_page(page);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Billauer eli.billauer@gmail.com
commit ccbde4b128ef9c73d14d0d7817d68ef795f6d131 upstream.
Triggered by a kref decrement, destroy_workqueue() may be called from within a work item for destroying its own workqueue. This illegal situation is averted by adding a module-global workqueue for exclusive use of the offending work item. Other work items continue to be queued on per-device workqueues to ensure performance.
Reported-by: syzbot+91dbdfecdd3287734d8e@syzkaller.appspotmail.com Cc: stable stable@kernel.org Closes: https://lore.kernel.org/lkml/0000000000000ab25a061e1dfe9f@google.com/ Signed-off-by: Eli Billauer eli.billauer@gmail.com Link: https://lore.kernel.org/r/20240801121126.60183-1-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/xillybus/xillyusb.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
--- a/drivers/char/xillybus/xillyusb.c +++ b/drivers/char/xillybus/xillyusb.c @@ -50,6 +50,7 @@ MODULE_LICENSE("GPL v2"); static const char xillyname[] = "xillyusb";
static unsigned int fifo_buf_order; +static struct workqueue_struct *wakeup_wq;
#define USB_VENDOR_ID_XILINX 0x03fd #define USB_VENDOR_ID_ALTERA 0x09fb @@ -569,10 +570,6 @@ static void cleanup_dev(struct kref *kre * errors if executed. The mechanism relies on that xdev->error is assigned * a non-zero value by report_io_error() prior to queueing wakeup_all(), * which prevents bulk_in_work() from calling process_bulk_in(). - * - * The fact that wakeup_all() and bulk_in_work() are queued on the same - * workqueue makes their concurrent execution very unlikely, however the - * kernel's API doesn't seem to ensure this strictly. */
static void wakeup_all(struct work_struct *work) @@ -627,7 +624,7 @@ static void report_io_error(struct xilly
if (do_once) { kref_get(&xdev->kref); /* xdev is used by work item */ - queue_work(xdev->workq, &xdev->wakeup_workitem); + queue_work(wakeup_wq, &xdev->wakeup_workitem); } }
@@ -2258,6 +2255,10 @@ static int __init xillyusb_init(void) { int rc = 0;
+ wakeup_wq = alloc_workqueue(xillyname, 0, 0); + if (!wakeup_wq) + return -ENOMEM; + if (LOG2_INITIAL_FIFO_BUF_SIZE > PAGE_SHIFT) fifo_buf_order = LOG2_INITIAL_FIFO_BUF_SIZE - PAGE_SHIFT; else @@ -2265,11 +2266,16 @@ static int __init xillyusb_init(void)
rc = usb_register(&xillyusb_driver);
+ if (rc) + destroy_workqueue(wakeup_wq); + return rc; }
static void __exit xillyusb_exit(void) { + destroy_workqueue(wakeup_wq); + usb_deregister(&xillyusb_driver); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Billauer eli.billauer@gmail.com
commit ad899c301c880766cc709aad277991b3ab671b66 upstream.
As the wakeup work item now runs on a separate workqueue, it needs to be flushed separately along with flushing the device's workqueue.
Also, move the destroy_workqueue() call to the end of the exit method, so that deinitialization is done in the opposite order of initialization.
Fixes: ccbde4b128ef ("char: xillybus: Don't destroy workqueue from work item running on it") Cc: stable stable@kernel.org Signed-off-by: Eli Billauer eli.billauer@gmail.com Link: https://lore.kernel.org/r/20240816070200.50695-1-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/xillybus/xillyusb.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/char/xillybus/xillyusb.c +++ b/drivers/char/xillybus/xillyusb.c @@ -2093,9 +2093,11 @@ static int xillyusb_discovery(struct usb * just after responding with the IDT, there is no reason for any * work item to be running now. To be sure that xdev->channels * is updated on anything that might run in parallel, flush the - * workqueue, which rarely does anything. + * device's workqueue and the wakeup work item. This rarely + * does anything. */ flush_workqueue(xdev->workq); + flush_work(&xdev->wakeup_workitem);
xdev->num_channels = num_channels;
@@ -2274,9 +2276,9 @@ static int __init xillyusb_init(void)
static void __exit xillyusb_exit(void) { - destroy_workqueue(wakeup_wq); - usb_deregister(&xillyusb_driver); + + destroy_workqueue(wakeup_wq); }
module_init(xillyusb_init);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eli Billauer eli.billauer@gmail.com
commit 2374bf7558de915edc6ec8cb10ec3291dfab9594 upstream.
Ensure, as the driver probes the device, that all endpoints that the driver may attempt to access exist and are of the correct type.
All XillyUSB devices must have a Bulk IN and Bulk OUT endpoint at address 1. This is verified in xillyusb_setup_base_eps().
On top of that, a XillyUSB device may have additional Bulk OUT endpoints. The information about these endpoints' addresses is deduced from a data structure (the IDT) that the driver fetches from the device while probing it. These endpoints are checked in setup_channels().
A XillyUSB device never has more than one IN endpoint, as all data towards the host is multiplexed in this single Bulk IN endpoint. This is why setup_channels() only checks OUT endpoints.
Reported-by: syzbot+eac39cba052f2e750dbe@syzkaller.appspotmail.com Cc: stable stable@kernel.org Closes: https://lore.kernel.org/all/0000000000001d44a6061f7a54ee@google.com/T/ Fixes: a53d1202aef1 ("char: xillybus: Add driver for XillyUSB (Xillybus variant for USB)"). Signed-off-by: Eli Billauer eli.billauer@gmail.com Link: https://lore.kernel.org/r/20240816070200.50695-2-eli.billauer@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/xillybus/xillyusb.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
--- a/drivers/char/xillybus/xillyusb.c +++ b/drivers/char/xillybus/xillyusb.c @@ -1903,6 +1903,13 @@ static const struct file_operations xill
static int xillyusb_setup_base_eps(struct xillyusb_dev *xdev) { + struct usb_device *udev = xdev->udev; + + /* Verify that device has the two fundamental bulk in/out endpoints */ + if (usb_pipe_type_check(udev, usb_sndbulkpipe(udev, MSG_EP_NUM)) || + usb_pipe_type_check(udev, usb_rcvbulkpipe(udev, IN_EP_NUM))) + return -ENODEV; + xdev->msg_ep = endpoint_alloc(xdev, MSG_EP_NUM | USB_DIR_OUT, bulk_out_work, 1, 2); if (!xdev->msg_ep) @@ -1932,14 +1939,15 @@ static int setup_channels(struct xillyus __le16 *chandesc, int num_channels) { - struct xillyusb_channel *chan; + struct usb_device *udev = xdev->udev; + struct xillyusb_channel *chan, *new_channels; int i;
chan = kcalloc(num_channels, sizeof(*chan), GFP_KERNEL); if (!chan) return -ENOMEM;
- xdev->channels = chan; + new_channels = chan;
for (i = 0; i < num_channels; i++, chan++) { unsigned int in_desc = le16_to_cpu(*chandesc++); @@ -1968,6 +1976,15 @@ static int setup_channels(struct xillyus */
if ((out_desc & 0x80) && i < 14) { /* Entry is valid */ + if (usb_pipe_type_check(udev, + usb_sndbulkpipe(udev, i + 2))) { + dev_err(xdev->dev, + "Missing BULK OUT endpoint %d\n", + i + 2); + kfree(new_channels); + return -ENODEV; + } + chan->writable = 1; chan->out_synchronous = !!(out_desc & 0x40); chan->out_seekable = !!(out_desc & 0x20); @@ -1977,6 +1994,7 @@ static int setup_channels(struct xillyus } }
+ xdev->channels = new_channels; return 0; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lianqin Hu hulianqin@vivo.com
commit 004eb8ba776ccd3e296ea6f78f7ae7985b12824e upstream.
Audio control requests that sets sampling frequency sometimes fail on this card. Adding delay between control messages eliminates that problem.
Signed-off-by: Lianqin Hu hulianqin@vivo.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/TYUPR06MB6217FF67076AF3E49E12C877D2842@TYUPR06MB621... Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/sound/usb/quirks.c +++ b/sound/usb/quirks.c @@ -2221,6 +2221,8 @@ static const struct usb_audio_quirk_flag QUIRK_FLAG_GENERIC_IMPLICIT_FB), DEVICE_FLG(0x2b53, 0x0031, /* Fiero SC-01 (firmware v1.1.0) */ QUIRK_FLAG_GENERIC_IMPLICIT_FB), + DEVICE_FLG(0x2d95, 0x8021, /* VIVO USB-C-XE710 HEADSET */ + QUIRK_FLAG_CTL_MSG_DELAY_1M), DEVICE_FLG(0x30be, 0x0101, /* Schiit Hel */ QUIRK_FLAG_IGNORE_CTL_ERROR), DEVICE_FLG(0x413c, 0xa506, /* Dell AE515 sound bar */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Juan José Arboleda soyjuanarbol@gmail.com
commit c286f204ce6ba7b48e3dcba53eda7df8eaa64dd9 upstream.
This patch adds a USB quirk for the Yamaha P-125 digital piano.
Signed-off-by: Juan José Arboleda soyjuanarbol@gmail.com Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20240813161053.70256-1-soyjuanarbol@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/quirks-table.h | 1 + 1 file changed, 1 insertion(+)
--- a/sound/usb/quirks-table.h +++ b/sound/usb/quirks-table.h @@ -273,6 +273,7 @@ YAMAHA_DEVICE(0x105a, NULL), YAMAHA_DEVICE(0x105b, NULL), YAMAHA_DEVICE(0x105c, NULL), YAMAHA_DEVICE(0x105d, NULL), +YAMAHA_DEVICE(0x1718, "P-125"), { USB_DEVICE(0x0499, 0x1503), .driver_info = (unsigned long) & (const struct snd_usb_audio_quirk) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 3ed486e383ccee9b0c8d727608f12a937c6603ca upstream.
Add LJCA GPIO support for the Lunar Lake platform.
New HID taken from out of tree ivsc-driver git repo.
Link: https://github.com/intel/ivsc-driver/commit/47e7c4a446c8ea8c741ff5a32fa7b19f... Cc: stable stable@kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240812095038.555837-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/misc/usb-ljca.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/usb/misc/usb-ljca.c +++ b/drivers/usb/misc/usb-ljca.c @@ -169,6 +169,7 @@ static const struct acpi_device_id ljca_ { "INTC1096" }, { "INTC100B" }, { "INTC10D1" }, + { "INTC10B5" }, {}, };
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
commit dcdb52d948f3a17ccd3fce757d9bd981d7c32039 upstream.
If xhci_mem_init() fails, it calls into xhci_mem_cleanup() to mop up the damage. If it fails early enough, before xhci->interrupters is allocated but after xhci->max_interrupters has been set, which happens in most (all?) cases, things get uglier, as xhci_mem_cleanup() unconditionally derefences xhci->interrupters. With prejudice.
Gate the interrupt freeing loop with a check on xhci->interrupters being non-NULL.
Found while debugging a DMA allocation issue that led the XHCI driver on this exact path.
Fixes: c99b38c41234 ("xhci: add support to allocate several interrupters") Cc: Mathias Nyman mathias.nyman@linux.intel.com Cc: Wesley Cheng quic_wcheng@quicinc.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org # 6.8+ Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240809124408.505786-2-mathias.nyman@linux.intel.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci-mem.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/usb/host/xhci-mem.c +++ b/drivers/usb/host/xhci-mem.c @@ -1877,7 +1877,7 @@ void xhci_mem_cleanup(struct xhci_hcd *x
cancel_delayed_work_sync(&xhci->cmd_timer);
- for (i = 0; i < xhci->max_interrupters; i++) { + for (i = 0; xhci->interrupters && i < xhci->max_interrupters; i++) { if (xhci->interrupters[i]) { xhci_remove_interrupter(xhci, xhci->interrupters[i]); xhci_free_interrupter(xhci, xhci->interrupters[i]);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mathias Nyman mathias.nyman@linux.intel.com
commit af8e119f52e9c13e556be9e03f27957554a84656 upstream.
re-enumerating full-speed devices after a failed address device command can trigger a NULL pointer dereference.
Full-speed devices may need to reconfigure the endpoint 0 Max Packet Size value during enumeration. Usb core calls usb_ep0_reinit() in this case, which ends up calling xhci_configure_endpoint().
On Panther point xHC the xhci_configure_endpoint() function will additionally check and reserve bandwidth in software. Other hosts do this in hardware
If xHC address device command fails then a new xhci_virt_device structure is allocated as part of re-enabling the slot, but the bandwidth table pointers are not set up properly here. This triggers the NULL pointer dereference the next time usb_ep0_reinit() is called and xhci_configure_endpoint() tries to check and reserve bandwidth
[46710.713538] usb 3-1: new full-speed USB device number 5 using xhci_hcd [46710.713699] usb 3-1: Device not responding to setup address. [46710.917684] usb 3-1: Device not responding to setup address. [46711.125536] usb 3-1: device not accepting address 5, error -71 [46711.125594] BUG: kernel NULL pointer dereference, address: 0000000000000008 [46711.125600] #PF: supervisor read access in kernel mode [46711.125603] #PF: error_code(0x0000) - not-present page [46711.125606] PGD 0 P4D 0 [46711.125610] Oops: Oops: 0000 [#1] PREEMPT SMP PTI [46711.125615] CPU: 1 PID: 25760 Comm: kworker/1:2 Not tainted 6.10.3_2 #1 [46711.125620] Hardware name: Gigabyte Technology Co., Ltd. [46711.125623] Workqueue: usb_hub_wq hub_event [usbcore] [46711.125668] RIP: 0010:xhci_reserve_bandwidth (drivers/usb/host/xhci.c
Fix this by making sure bandwidth table pointers are set up correctly after a failed address device command, and additionally by avoiding checking for bandwidth in cases like this where no actual endpoints are added or removed, i.e. only context for default control endpoint 0 is evaluated.
Reported-by: Karel Balej balejk@matfyz.cz Closes: https://lore.kernel.org/linux-usb/D3CKQQAETH47.1MUO22RTCH2O3@matfyz.cz/ Cc: stable@vger.kernel.org Fixes: 651aaf36a7d7 ("usb: xhci: Handle USB transaction error on address command") Signed-off-by: Mathias Nyman mathias.nyman@linux.intel.com Link: https://lore.kernel.org/r/20240815141117.2702314-2-mathias.nyman@linux.intel... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/usb/host/xhci.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/usb/host/xhci.c +++ b/drivers/usb/host/xhci.c @@ -2837,7 +2837,7 @@ static int xhci_configure_endpoint(struc xhci->num_active_eps); return -ENOMEM; } - if ((xhci->quirks & XHCI_SW_BW_CHECKING) && + if ((xhci->quirks & XHCI_SW_BW_CHECKING) && !ctx_change && xhci_reserve_bandwidth(xhci, virt_dev, command->in_ctx)) { if ((xhci->quirks & XHCI_EP_LIMIT_QUIRK)) xhci_free_host_resources(xhci, ctrl_ctx); @@ -4200,8 +4200,10 @@ static int xhci_setup_device(struct usb_ mutex_unlock(&xhci->mutex); ret = xhci_disable_slot(xhci, udev->slot_id); xhci_free_virt_device(xhci, udev->slot_id); - if (!ret) - xhci_alloc_dev(hcd, udev); + if (!ret) { + if (xhci_alloc_dev(hcd, udev) == 1) + xhci_setup_addressable_virt_dev(xhci, udev); + } kfree(command->completion); kfree(command); return -EPROTO;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mika Westerberg mika.westerberg@linux.intel.com
commit e2006140ad2e01a02ed0aff49cc2ae3ceeb11f8d upstream.
I noticed that when we do discrete host router NVM upgrade and it gets hot-removed from the PCIe side as a result of NVM firmware authentication, if there is another host connected with enabled paths we hang in tearing them down. This is due to fact that the Thunderbolt networking driver also tries to cleanup the paths and ends up blocking in tb_disconnect_xdomain_paths() waiting for the domain lock.
However, at this point we already cleaned the paths in tb_stop() so there is really no need for tb_disconnect_xdomain_paths() to do that anymore. Furthermore it already checks if the XDomain is unplugged and bails out early so take advantage of that and mark the XDomain as unplugged when we remove the parent router.
Cc: stable@vger.kernel.org Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thunderbolt/switch.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/thunderbolt/switch.c +++ b/drivers/thunderbolt/switch.c @@ -3392,6 +3392,7 @@ void tb_switch_remove(struct tb_switch * tb_switch_remove(port->remote->sw); port->remote = NULL; } else if (port->xdomain) { + port->xdomain->is_unplugged = true; tb_xdomain_remove(port->xdomain); port->xdomain = NULL; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baojun Xu baojun.xu@ti.com
commit 3beddef84d90590270465a907de1cfe2539ac70d upstream.
Wrong calibration data order cause sound too low in some device. Fix wrong calibrated data order, add calibration data converssion by get_unaligned_be32() after reading from UEFI.
Fixes: 5be27f1e3ec9 ("ALSA: hda/tas2781: Add tas2781 HDA driver") Cc: stable@vger.kernel.org Signed-off-by: Baojun Xu baojun.xu@ti.com Link: https://patch.msgid.link/20240813043749.108-1-shenghao-ding@ti.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/tas2781_hda_i2c.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -2,10 +2,12 @@ // // TAS2781 HDA I2C driver // -// Copyright 2023 Texas Instruments, Inc. +// Copyright 2023 - 2024 Texas Instruments, Inc. // // Author: Shenghao Ding shenghao-ding@ti.com +// Current maintainer: Baojun Xu baojun.xu@ti.com
+#include <asm/unaligned.h> #include <linux/acpi.h> #include <linux/crc8.h> #include <linux/crc32.h> @@ -519,20 +521,22 @@ static void tas2781_apply_calib(struct t static const unsigned char rgno_array[CALIB_MAX] = { 0x74, 0x0c, 0x14, 0x70, 0x7c, }; - unsigned char *data; + int offset = 0; int i, j, rc; + __be32 data;
for (i = 0; i < tas_priv->ndev; i++) { - data = tas_priv->cali_data.data + - i * TASDEVICE_SPEAKER_CALIBRATION_SIZE; for (j = 0; j < CALIB_MAX; j++) { + data = get_unaligned_be32( + &tas_priv->cali_data.data[offset]); rc = tasdevice_dev_bulk_write(tas_priv, i, TASDEVICE_REG(0, page_array[j], rgno_array[j]), - &(data[4 * j]), 4); + (unsigned char *)&data, 4); if (rc < 0) dev_err(tas_priv->dev, "chn %d calib %d bulk_wr err = %d\n", i, j, rc); + offset += 4; } } }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
commit ccbfcac05866ebe6eb3bc6d07b51d4ed4fcde436 upstream.
The recent addition of a sanity check for a too low start tick time seems breaking some applications that uses aloop with a certain slave timer setup. They may have the initial resolution 0, hence it's treated as if it were a too low value.
Relax and skip the check for the slave timer instance for addressing the regression.
Fixes: 4a63bd179fa8 ("ALSA: timer: Set lower bound of start tick time") Cc: stable@vger.kernel.org Link: https://github.com/raspberrypi/linux/issues/6294 Link: https://patch.msgid.link/20240810084833.10939-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/core/timer.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/sound/core/timer.c +++ b/sound/core/timer.c @@ -547,7 +547,7 @@ static int snd_timer_start1(struct snd_t /* check the actual time for the start tick; * bail out as error if it's way too low (< 100us) */ - if (start) { + if (start && !(timer->hw.flags & SNDRV_TIMER_HW_SLAVE)) { if ((u64)snd_timer_hw_resolution(timer) * ticks < 100000) return -EINVAL; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stefan Haberland sth@linux.ibm.com
commit 7db4042336580dfd75cb5faa82c12cd51098c90b upstream.
Extent Space Efficient (ESE) or thin provisioned volumes need to be formatted on demand during usual IO processing.
The dasd_ese_needs_format function checks for error codes that signal the non existence of a proper track format.
The check for incorrect length is to imprecise since other error cases leading to transport of insufficient data also have this flag set. This might lead to data corruption in certain error cases for example during a storage server warmstart.
Fix by removing the check for incorrect length and replacing by explicitly checking for invalid track format in transport mode.
Also remove the check for file protected since this is not a valid ESE handling case.
Cc: stable@vger.kernel.org # 5.3+ Fixes: 5e2b17e712cf ("s390/dasd: Add dynamic formatting support for ESE volumes") Reviewed-by: Jan Hoeppner hoeppner@linux.ibm.com Signed-off-by: Stefan Haberland sth@linux.ibm.com Link: https://lore.kernel.org/r/20240812125733.126431-3-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/block/dasd.c | 36 +++++++++++++++--------- drivers/s390/block/dasd_3990_erp.c | 10 +----- drivers/s390/block/dasd_eckd.c | 55 ++++++++++++++++--------------------- drivers/s390/block/dasd_int.h | 2 - 4 files changed, 50 insertions(+), 53 deletions(-)
--- a/drivers/s390/block/dasd.c +++ b/drivers/s390/block/dasd.c @@ -1601,9 +1601,15 @@ static int dasd_ese_needs_format(struct if (!sense) return 0;
- return !!(sense[1] & SNS1_NO_REC_FOUND) || - !!(sense[1] & SNS1_FILE_PROTECTED) || - scsw_cstat(&irb->scsw) == SCHN_STAT_INCORR_LEN; + if (sense[1] & SNS1_NO_REC_FOUND) + return 1; + + if ((sense[1] & SNS1_INV_TRACK_FORMAT) && + scsw_is_tm(&irb->scsw) && + !(sense[2] & SNS2_ENV_DATA_PRESENT)) + return 1; + + return 0; }
static int dasd_ese_oos_cond(u8 *sense) @@ -1624,7 +1630,7 @@ void dasd_int_handler(struct ccw_device struct dasd_device *device; unsigned long now; int nrf_suppressed = 0; - int fp_suppressed = 0; + int it_suppressed = 0; struct request *req; u8 *sense = NULL; int expires; @@ -1679,8 +1685,9 @@ void dasd_int_handler(struct ccw_device */ sense = dasd_get_sense(irb); if (sense) { - fp_suppressed = (sense[1] & SNS1_FILE_PROTECTED) && - test_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); + it_suppressed = (sense[1] & SNS1_INV_TRACK_FORMAT) && + !(sense[2] & SNS2_ENV_DATA_PRESENT) && + test_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags); nrf_suppressed = (sense[1] & SNS1_NO_REC_FOUND) && test_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags);
@@ -1695,7 +1702,7 @@ void dasd_int_handler(struct ccw_device return; } } - if (!(fp_suppressed || nrf_suppressed)) + if (!(it_suppressed || nrf_suppressed)) device->discipline->dump_sense_dbf(device, irb, "int");
if (device->features & DASD_FEATURE_ERPLOG) @@ -2459,14 +2466,17 @@ retry: rc = 0; list_for_each_entry_safe(cqr, n, ccw_queue, blocklist) { /* - * In some cases the 'File Protected' or 'Incorrect Length' - * error might be expected and error recovery would be - * unnecessary in these cases. Check if the according suppress - * bit is set. + * In some cases certain errors might be expected and + * error recovery would be unnecessary in these cases. + * Check if the according suppress bit is set. */ sense = dasd_get_sense(&cqr->irb); - if (sense && sense[1] & SNS1_FILE_PROTECTED && - test_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags)) + if (sense && (sense[1] & SNS1_INV_TRACK_FORMAT) && + !(sense[2] & SNS2_ENV_DATA_PRESENT) && + test_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags)) + continue; + if (sense && (sense[1] & SNS1_NO_REC_FOUND) && + test_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags)) continue; if (scsw_cstat(&cqr->irb.scsw) == 0x40 && test_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags)) --- a/drivers/s390/block/dasd_3990_erp.c +++ b/drivers/s390/block/dasd_3990_erp.c @@ -1386,14 +1386,8 @@ dasd_3990_erp_file_prot(struct dasd_ccw_
struct dasd_device *device = erp->startdev;
- /* - * In some cases the 'File Protected' error might be expected and - * log messages shouldn't be written then. - * Check if the according suppress bit is set. - */ - if (!test_bit(DASD_CQR_SUPPRESS_FP, &erp->flags)) - dev_err(&device->cdev->dev, - "Accessing the DASD failed because of a hardware error\n"); + dev_err(&device->cdev->dev, + "Accessing the DASD failed because of a hardware error\n");
return dasd_3990_erp_cleanup(erp, DASD_CQR_FAILED);
--- a/drivers/s390/block/dasd_eckd.c +++ b/drivers/s390/block/dasd_eckd.c @@ -2274,6 +2274,7 @@ dasd_eckd_analysis_ccw(struct dasd_devic cqr->status = DASD_CQR_FILLED; /* Set flags to suppress output for expected errors */ set_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags); + set_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags);
return cqr; } @@ -2555,7 +2556,6 @@ dasd_eckd_build_check_tcw(struct dasd_de cqr->buildclk = get_tod_clock(); cqr->status = DASD_CQR_FILLED; /* Set flags to suppress output for expected errors */ - set_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); set_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags);
return cqr; @@ -4129,8 +4129,6 @@ static struct dasd_ccw_req *dasd_eckd_bu
/* Set flags to suppress output for expected errors */ if (dasd_eckd_is_ese(basedev)) { - set_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); - set_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags); set_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags); }
@@ -4632,9 +4630,8 @@ static struct dasd_ccw_req *dasd_eckd_bu
/* Set flags to suppress output for expected errors */ if (dasd_eckd_is_ese(basedev)) { - set_bit(DASD_CQR_SUPPRESS_FP, &cqr->flags); - set_bit(DASD_CQR_SUPPRESS_IL, &cqr->flags); set_bit(DASD_CQR_SUPPRESS_NRF, &cqr->flags); + set_bit(DASD_CQR_SUPPRESS_IT, &cqr->flags); }
return cqr; @@ -5779,36 +5776,32 @@ static void dasd_eckd_dump_sense(struct { u8 *sense = dasd_get_sense(irb);
- if (scsw_is_tm(&irb->scsw)) { - /* - * In some cases the 'File Protected' or 'Incorrect Length' - * error might be expected and log messages shouldn't be written - * then. Check if the according suppress bit is set. - */ - if (sense && (sense[1] & SNS1_FILE_PROTECTED) && - test_bit(DASD_CQR_SUPPRESS_FP, &req->flags)) - return; - if (scsw_cstat(&irb->scsw) == 0x40 && - test_bit(DASD_CQR_SUPPRESS_IL, &req->flags)) - return; + /* + * In some cases certain errors might be expected and + * log messages shouldn't be written then. + * Check if the according suppress bit is set. + */ + if (sense && (sense[1] & SNS1_INV_TRACK_FORMAT) && + !(sense[2] & SNS2_ENV_DATA_PRESENT) && + test_bit(DASD_CQR_SUPPRESS_IT, &req->flags)) + return;
- dasd_eckd_dump_sense_tcw(device, req, irb); - } else { - /* - * In some cases the 'Command Reject' or 'No Record Found' - * error might be expected and log messages shouldn't be - * written then. Check if the according suppress bit is set. - */ - if (sense && sense[0] & SNS0_CMD_REJECT && - test_bit(DASD_CQR_SUPPRESS_CR, &req->flags)) - return; + if (sense && sense[0] & SNS0_CMD_REJECT && + test_bit(DASD_CQR_SUPPRESS_CR, &req->flags)) + return;
- if (sense && sense[1] & SNS1_NO_REC_FOUND && - test_bit(DASD_CQR_SUPPRESS_NRF, &req->flags)) - return; + if (sense && sense[1] & SNS1_NO_REC_FOUND && + test_bit(DASD_CQR_SUPPRESS_NRF, &req->flags)) + return;
+ if (scsw_cstat(&irb->scsw) == 0x40 && + test_bit(DASD_CQR_SUPPRESS_IL, &req->flags)) + return; + + if (scsw_is_tm(&irb->scsw)) + dasd_eckd_dump_sense_tcw(device, req, irb); + else dasd_eckd_dump_sense_ccw(device, req, irb); - } }
static int dasd_eckd_reload_device(struct dasd_device *device) --- a/drivers/s390/block/dasd_int.h +++ b/drivers/s390/block/dasd_int.h @@ -196,7 +196,7 @@ struct dasd_ccw_req { * The following flags are used to suppress output of certain errors. */ #define DASD_CQR_SUPPRESS_NRF 4 /* Suppress 'No Record Found' error */ -#define DASD_CQR_SUPPRESS_FP 5 /* Suppress 'File Protected' error*/ +#define DASD_CQR_SUPPRESS_IT 5 /* Suppress 'Invalid Track' error*/ #define DASD_CQR_SUPPRESS_IL 6 /* Suppress 'Incorrect Length' error */ #define DASD_CQR_SUPPRESS_CR 7 /* Suppress 'Command Reject' error */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Mueller mimu@linux.ibm.com
commit 5a44bb061d04b0306f2aa8add761d86d152b9377 upstream.
We might run into a SIE validity if gisa has been disabled either via using kernel parameter "kvm.use_gisa=0" or by setting the related sysfs attribute to N (echo N >/sys/module/kvm/parameters/use_gisa).
The validity is caused by an invalid value in the SIE control block's gisa designation. That happens because we pass the uninitialized gisa origin to virt_to_phys() before writing it to the gisa designation.
To fix this we return 0 in kvm_s390_get_gisa_desc() if the origin is 0. kvm_s390_get_gisa_desc() is used to determine which gisa designation to set in the SIE control block. A value of 0 in the gisa designation disables gisa usage.
The issue surfaces in the host kernel with the following kernel message as soon a new kvm guest start is attemted.
kvm: unhandled validity intercept 0x1011 WARNING: CPU: 0 PID: 781237 at arch/s390/kvm/intercept.c:101 kvm_handle_sie_intercept+0x42e/0x4d0 [kvm] Modules linked in: vhost_net tap tun xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT xt_tcpudp nft_compat x_tables nf_nat_tftp nf_conntrack_tftp vfio_pci_core irqbypass vhost_vsock vmw_vsock_virtio_transport_common vsock vhost vhost_iotlb kvm nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables sunrpc mlx5_ib ib_uverbs ib_core mlx5_core uvdevice s390_trng eadm_sch vfio_ccw zcrypt_cex4 mdev vfio_iommu_type1 vfio sch_fq_codel drm i2c_core loop drm_panel_orientation_quirks configfs nfnetlink lcs ctcm fsm dm_service_time ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common dm_mirror dm_region_hash dm_log zfcp scsi_transport_fc scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey zcrypt dm_multipath rng_core autofs4 [last unloaded: vfio_pci] CPU: 0 PID: 781237 Comm: CPU 0/KVM Not tainted 6.10.0-08682-gcad9f11498ea #6 Hardware name: IBM 3931 A01 701 (LPAR) Krnl PSW : 0704c00180000000 000003d93deb0122 (kvm_handle_sie_intercept+0x432/0x4d0 [kvm]) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3 Krnl GPRS: 000003d900000027 000003d900000023 0000000000000028 000002cd00000000 000002d063a00900 00000359c6daf708 00000000000bebb5 0000000000001eff 000002cfd82e9000 000002cfd80bc000 0000000000001011 000003d93deda412 000003ff8962df98 000003d93de77ce0 000003d93deb011e 00000359c6daf960 Krnl Code: 000003d93deb0112: c020fffe7259 larl %r2,000003d93de7e5c4 000003d93deb0118: c0e53fa8beac brasl %r14,000003d9bd3c7e70 #000003d93deb011e: af000000 mc 0,0 >000003d93deb0122: a728ffea lhi %r2,-22 000003d93deb0126: a7f4fe24 brc 15,000003d93deafd6e 000003d93deb012a: 9101f0b0 tm 176(%r15),1 000003d93deb012e: a774fe48 brc 7,000003d93deafdbe 000003d93deb0132: 40a0f0ae sth %r10,174(%r15) Call Trace: [<000003d93deb0122>] kvm_handle_sie_intercept+0x432/0x4d0 [kvm] ([<000003d93deb011e>] kvm_handle_sie_intercept+0x42e/0x4d0 [kvm]) [<000003d93deacc10>] vcpu_post_run+0x1d0/0x3b0 [kvm] [<000003d93deaceda>] __vcpu_run+0xea/0x2d0 [kvm] [<000003d93dead9da>] kvm_arch_vcpu_ioctl_run+0x16a/0x430 [kvm] [<000003d93de93ee0>] kvm_vcpu_ioctl+0x190/0x7c0 [kvm] [<000003d9bd728b4e>] vfs_ioctl+0x2e/0x70 [<000003d9bd72a092>] __s390x_sys_ioctl+0xc2/0xd0 [<000003d9be0e9222>] __do_syscall+0x1f2/0x2e0 [<000003d9be0f9a90>] system_call+0x70/0x98 Last Breaking-Event-Address: [<000003d9bd3c7f58>] __warn_printk+0xe8/0xf0
Cc: stable@vger.kernel.org Reported-by: Christian Borntraeger borntraeger@linux.ibm.com Fixes: fe0ef0030463 ("KVM: s390: sort out physical vs virtual pointers usage") Signed-off-by: Michael Mueller mimu@linux.ibm.com Tested-by: Christian Borntraeger borntraeger@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Link: https://lore.kernel.org/r/20240801123109.2782155-1-mimu@linux.ibm.com Message-ID: 20240801123109.2782155-1-mimu@linux.ibm.com Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/kvm/kvm-s390.h | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
--- a/arch/s390/kvm/kvm-s390.h +++ b/arch/s390/kvm/kvm-s390.h @@ -267,7 +267,12 @@ static inline unsigned long kvm_s390_get
static inline u32 kvm_s390_get_gisa_desc(struct kvm *kvm) { - u32 gd = virt_to_phys(kvm->arch.gisa_int.origin); + u32 gd; + + if (!kvm->arch.gisa_int.origin) + return 0; + + gd = virt_to_phys(kvm->arch.gisa_int.origin);
if (gd && sclp.has_gisaf) gd |= GISA_FORMAT1;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit b9b6ee6fe258ce4d89592593efcd3d798c418859 upstream.
Instead of clearing the "updated" flag for each cooling device affected by the trip point crossing in bang_bang_control() and walking all thermal instances to run thermal_cdev_update() for all of the affected cooling devices, call __thermal_cdev_update() directly for each of them.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Peter Kästle peter@piie.net Reviewed-by: Zhang Rui rui.zhang@intel.com Cc: 6.10+ stable@vger.kernel.org # 6.10+ Link: https://patch.msgid.link/13583081.uLZWGnKmhe@rjwysocki.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/gov_bang_bang.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -79,12 +79,9 @@ static void bang_bang_control(struct the dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
mutex_lock(&instance->cdev->lock); - instance->cdev->updated = false; /* cdev needs update */ + __thermal_cdev_update(instance->cdev); mutex_unlock(&instance->cdev->lock); } - - list_for_each_entry(instance, &tz->thermal_instances, tz_node) - thermal_cdev_update(instance->cdev); }
static struct thermal_governor thermal_gov_bang_bang = {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Gstir david@sigma-star.at
commit 6486cad00a8b7f8585983408c152bbe33dda529b upstream.
The DCP trusted key type uses the wrong helper function to store the blob's payload length which can lead to the wrong byte order being used in case this would ever run on big endian architectures.
Fix by using correct helper function.
Cc: stable@vger.kernel.org # v6.10+ Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys") Suggested-by: Richard Weinberger richard@nod.at Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202405240610.fj53EK0q-lkp@intel.com/ Signed-off-by: David Gstir david@sigma-star.at Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/keys/trusted-keys/trusted_dcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/security/keys/trusted-keys/trusted_dcp.c b/security/keys/trusted-keys/trusted_dcp.c index b5f81a05be36..b0947f072a98 100644 --- a/security/keys/trusted-keys/trusted_dcp.c +++ b/security/keys/trusted-keys/trusted_dcp.c @@ -222,7 +222,7 @@ static int trusted_dcp_seal(struct trusted_key_payload *p, char *datablob) return ret; }
- b->payload_len = get_unaligned_le32(&p->key_len); + put_unaligned_le32(p->key_len, &b->payload_len); p->blob_len = blen; return 0; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Gstir david@sigma-star.at
commit 0e28bf61a5f9ab30be3f3b4eafb8d097e39446bb upstream.
Trusted keys unseal the key blob on load, but keep the sealed payload in the blob field so that every subsequent read (export) will simply convert this field to hex and send it to userspace.
With DCP-based trusted keys, we decrypt the blob encryption key (BEK) in the Kernel due hardware limitations and then decrypt the blob payload. BEK decryption is done in-place which means that the trusted key blob field is modified and it consequently holds the BEK in plain text. Every subsequent read of that key thus send the plain text BEK instead of the encrypted BEK to userspace.
This issue only occurs when importing a trusted DCP-based key and then exporting it again. This should rarely happen as the common use cases are to either create a new trusted key and export it, or import a key blob and then just use it without exporting it again.
Fix this by performing BEK decryption and encryption in a dedicated buffer. Further always wipe the plain text BEK buffer to prevent leaking the key via uninitialized memory.
Cc: stable@vger.kernel.org # v6.10+ Fixes: 2e8a0f40a39c ("KEYS: trusted: Introduce NXP DCP-backed trusted keys") Signed-off-by: David Gstir david@sigma-star.at Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/keys/trusted-keys/trusted_dcp.c | 33 +++++++++++++++--------- 1 file changed, 21 insertions(+), 12 deletions(-)
diff --git a/security/keys/trusted-keys/trusted_dcp.c b/security/keys/trusted-keys/trusted_dcp.c index b0947f072a98..4edc5bbbcda3 100644 --- a/security/keys/trusted-keys/trusted_dcp.c +++ b/security/keys/trusted-keys/trusted_dcp.c @@ -186,20 +186,21 @@ static int do_aead_crypto(u8 *in, u8 *out, size_t len, u8 *key, u8 *nonce, return ret; }
-static int decrypt_blob_key(u8 *key) +static int decrypt_blob_key(u8 *encrypted_key, u8 *plain_key) { - return do_dcp_crypto(key, key, false); + return do_dcp_crypto(encrypted_key, plain_key, false); }
-static int encrypt_blob_key(u8 *key) +static int encrypt_blob_key(u8 *plain_key, u8 *encrypted_key) { - return do_dcp_crypto(key, key, true); + return do_dcp_crypto(plain_key, encrypted_key, true); }
static int trusted_dcp_seal(struct trusted_key_payload *p, char *datablob) { struct dcp_blob_fmt *b = (struct dcp_blob_fmt *)p->blob; int blen, ret; + u8 plain_blob_key[AES_KEYSIZE_128];
blen = calc_blob_len(p->key_len); if (blen > MAX_BLOB_SIZE) @@ -207,30 +208,36 @@ static int trusted_dcp_seal(struct trusted_key_payload *p, char *datablob)
b->fmt_version = DCP_BLOB_VERSION; get_random_bytes(b->nonce, AES_KEYSIZE_128); - get_random_bytes(b->blob_key, AES_KEYSIZE_128); + get_random_bytes(plain_blob_key, AES_KEYSIZE_128);
- ret = do_aead_crypto(p->key, b->payload, p->key_len, b->blob_key, + ret = do_aead_crypto(p->key, b->payload, p->key_len, plain_blob_key, b->nonce, true); if (ret) { pr_err("Unable to encrypt blob payload: %i\n", ret); - return ret; + goto out; }
- ret = encrypt_blob_key(b->blob_key); + ret = encrypt_blob_key(plain_blob_key, b->blob_key); if (ret) { pr_err("Unable to encrypt blob key: %i\n", ret); - return ret; + goto out; }
put_unaligned_le32(p->key_len, &b->payload_len); p->blob_len = blen; - return 0; + ret = 0; + +out: + memzero_explicit(plain_blob_key, sizeof(plain_blob_key)); + + return ret; }
static int trusted_dcp_unseal(struct trusted_key_payload *p, char *datablob) { struct dcp_blob_fmt *b = (struct dcp_blob_fmt *)p->blob; int blen, ret; + u8 plain_blob_key[AES_KEYSIZE_128];
if (b->fmt_version != DCP_BLOB_VERSION) { pr_err("DCP blob has bad version: %i, expected %i\n", @@ -248,14 +255,14 @@ static int trusted_dcp_unseal(struct trusted_key_payload *p, char *datablob) goto out; }
- ret = decrypt_blob_key(b->blob_key); + ret = decrypt_blob_key(b->blob_key, plain_blob_key); if (ret) { pr_err("Unable to decrypt blob key: %i\n", ret); goto out; }
ret = do_aead_crypto(b->payload, p->key, p->key_len + DCP_BLOB_AUTHLEN, - b->blob_key, b->nonce, false); + plain_blob_key, b->nonce, false); if (ret) { pr_err("Unwrap of DCP payload failed: %i\n", ret); goto out; @@ -263,6 +270,8 @@ static int trusted_dcp_unseal(struct trusted_key_payload *p, char *datablob)
ret = 0; out: + memzero_explicit(plain_blob_key, sizeof(plain_blob_key)); + return ret; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nam Cao namcao@linutronix.de
commit 57d76bc51fd80824bcc0c84a5b5ec944f1b51edd upstream.
With XIP kernel, kernel_map.size is set to be only the size of data part of the kernel. This is inconsistent with "normal" kernel, who sets it to be the size of the entire kernel.
More importantly, XIP kernel fails to boot if CONFIG_DEBUG_VIRTUAL is enabled, because there are checks on virtual addresses with the assumption that kernel_map.size is the size of the entire kernel (these checks are in arch/riscv/mm/physaddr.c).
Change XIP's kernel_map.size to be the size of the entire kernel.
Signed-off-by: Nam Cao namcao@linutronix.de Cc: stable@vger.kernel.org # v6.1+ Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/20240508191917.2892064-1-namcao@linutronix.de Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/mm/init.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/riscv/mm/init.c +++ b/arch/riscv/mm/init.c @@ -931,7 +931,7 @@ static void __init create_kernel_page_ta PMD_SIZE, PAGE_KERNEL_EXEC);
/* Map the data in RAM */ - end_va = kernel_map.virt_addr + XIP_OFFSET + kernel_map.size; + end_va = kernel_map.virt_addr + kernel_map.size; for (va = kernel_map.virt_addr + XIP_OFFSET; va < end_va; va += PMD_SIZE) create_pgd_mapping(pgdir, va, kernel_map.phys_addr + (va - (kernel_map.virt_addr + XIP_OFFSET)), @@ -1100,7 +1100,7 @@ asmlinkage void __init setup_vm(uintptr_
phys_ram_base = CONFIG_PHYS_RAM_BASE; kernel_map.phys_addr = (uintptr_t)CONFIG_PHYS_RAM_BASE; - kernel_map.size = (uintptr_t)(&_end) - (uintptr_t)(&_sdata); + kernel_map.size = (uintptr_t)(&_end) - (uintptr_t)(&_start);
kernel_map.va_kernel_xip_pa_offset = kernel_map.virt_addr - kernel_map.xiprom; #else
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Celeste Liu coelacanthushex@gmail.com
commit 61119394631f219e23ce98bcc3eb993a64a8ea64 upstream.
Otherwise when the tracer changes syscall number to -1, the kernel fails to initialize a0 with -ENOSYS and subsequently fails to return the error code of the failed syscall to userspace. For example, it will break strace syscall tampering.
Fixes: 52449c17bdd1 ("riscv: entry: set a0 = -ENOSYS only when syscall != -1") Reported-by: "Dmitry V. Levin" ldv@strace.io Reviewed-by: Björn Töpel bjorn@rivosinc.com Cc: stable@vger.kernel.org Signed-off-by: Celeste Liu CoelacanthusHex@gmail.com Link: https://lore.kernel.org/r/20240627142338.5114-2-CoelacanthusHex@gmail.com Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/traps.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/riscv/kernel/traps.c +++ b/arch/riscv/kernel/traps.c @@ -319,6 +319,7 @@ void do_trap_ecall_u(struct pt_regs *reg
regs->epc += 4; regs->orig_a0 = regs->a0; + regs->a0 = -ENOSYS;
riscv_v_vstate_discard(regs);
@@ -328,8 +329,7 @@ void do_trap_ecall_u(struct pt_regs *reg
if (syscall >= 0 && syscall < NR_syscalls) syscall_handler(regs, syscall); - else if (syscall != -1) - regs->a0 = -ENOSYS; + /* * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(), * so the maximum stack offset is 1k bytes (10 bits).
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French stfrench@microsoft.com
commit 836bb3268db405cf9021496ac4dbc26d3e4758fe upstream.
Mandatory locking is enforced for cached writes, which violates default posix semantics, and also it is enforced inconsistently. This apparently breaks recent versions of libreoffice, but can also be demonstrated by opening a file twice from the same client, locking it from handle one and writing to it from handle two (which fails, returning EACCES).
Since there was already a mount option "forcemandatorylock" (which defaults to off), with this change only when the user intentionally specifies "forcemandatorylock" on mount will we break posix semantics on write to a locked range (ie we will only fail the write in this case, if the user mounts with "forcemandatorylock").
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks") Cc: stable@vger.kernel.org Cc: Pavel Shilovsky piastryyy@gmail.com Reported-by: abartlet@samba.org Reported-by: Kevin Ottens kevin.ottens@enioka.com Reviewed-by: David Howells dhowells@redhat.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/file.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -2719,6 +2719,7 @@ cifs_writev(struct kiocb *iocb, struct i struct inode *inode = file->f_mapping->host; struct cifsInodeInfo *cinode = CIFS_I(inode); struct TCP_Server_Info *server = tlink_tcon(cfile->tlink)->ses->server; + struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb); ssize_t rc;
rc = netfs_start_io_write(inode); @@ -2735,12 +2736,16 @@ cifs_writev(struct kiocb *iocb, struct i if (rc <= 0) goto out;
- if (!cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from), + if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) && + (cifs_find_lock_conflict(cfile, iocb->ki_pos, iov_iter_count(from), server->vals->exclusive_lock_type, 0, - NULL, CIFS_WRITE_OP)) - rc = netfs_buffered_write_iter_locked(iocb, from, NULL); - else + NULL, CIFS_WRITE_OP))) { rc = -EACCES; + goto out; + } + + rc = netfs_buffered_write_iter_locked(iocb, from, NULL); + out: up_read(&cinode->lock_sem); netfs_end_io_write(inode);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Breno Leitao leitao@debian.org
commit 14d069d92951a3e150c0a81f2ca3b93e54da913b upstream.
On ACPI machines, the tegra i2c module encounters an issue due to a mutex being called inside a spinlock. This leads to the following bug:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:585 ...
Call trace: __might_sleep __mutex_lock_common mutex_lock_nested acpi_subsys_runtime_resume rpm_resume tegra_i2c_xfer
The problem arises because during __pm_runtime_resume(), the spinlock &dev->power.lock is acquired before rpm_resume() is called. Later, rpm_resume() invokes acpi_subsys_runtime_resume(), which relies on mutexes, triggering the error.
To address this issue, devices on ACPI are now marked as not IRQ-safe, considering the dependency of acpi_subsys_runtime_resume() on mutexes.
Fixes: bd2fdedbf2ba ("i2c: tegra: Add the ACPI support") Cc: stable@vger.kernel.org # v5.17+ Co-developed-by: Michael van der Westhuizen rmikey@meta.com Signed-off-by: Michael van der Westhuizen rmikey@meta.com Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Dmitry Osipenko digetx@gmail.com Reviewed-by: Andy Shevchenko andy@kernel.org Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-tegra.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/i2c/busses/i2c-tegra.c +++ b/drivers/i2c/busses/i2c-tegra.c @@ -1802,9 +1802,9 @@ static int tegra_i2c_probe(struct platfo * domain. * * VI I2C device shouldn't be marked as IRQ-safe because VI I2C won't - * be used for atomic transfers. + * be used for atomic transfers. ACPI device is not IRQ safe also. */ - if (!IS_VI(i2c_dev)) + if (!IS_VI(i2c_dev) && !has_acpi_companion(i2c_dev->dev)) pm_runtime_irq_safe(i2c_dev->dev);
pm_runtime_enable(i2c_dev->dev);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit cdf65d73e001fde600b18d7e45afadf559425ce5 upstream.
A subsequent change will need to pass a depth argument to acpi_execute_reg_methods(), so prepare that function for it.
No intentional functional changes.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Cc: All applicable stable@vger.kernel.org Link: https://patch.msgid.link/8451567.NyiUUSuA9g@rjwysocki.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/acpica/acevents.h | 2 +- drivers/acpi/acpica/evregion.c | 6 ++++-- drivers/acpi/acpica/evxfregn.c | 10 +++++++--- drivers/acpi/ec.c | 2 +- include/acpi/acpixf.h | 1 + 5 files changed, 14 insertions(+), 7 deletions(-)
--- a/drivers/acpi/acpica/acevents.h +++ b/drivers/acpi/acpica/acevents.h @@ -188,7 +188,7 @@ acpi_ev_detach_region(union acpi_operand u8 acpi_ns_is_locked);
void -acpi_ev_execute_reg_methods(struct acpi_namespace_node *node, +acpi_ev_execute_reg_methods(struct acpi_namespace_node *node, u32 max_depth, acpi_adr_space_type space_id, u32 function);
acpi_status --- a/drivers/acpi/acpica/evregion.c +++ b/drivers/acpi/acpica/evregion.c @@ -65,6 +65,7 @@ acpi_status acpi_ev_initialize_op_region acpi_gbl_default_address_spaces [i])) { acpi_ev_execute_reg_methods(acpi_gbl_root_node, + ACPI_UINT32_MAX, acpi_gbl_default_address_spaces [i], ACPI_REG_CONNECT); } @@ -672,6 +673,7 @@ cleanup1: * FUNCTION: acpi_ev_execute_reg_methods * * PARAMETERS: node - Namespace node for the device + * max_depth - Depth to which search for _REG * space_id - The address space ID * function - Passed to _REG: On (1) or Off (0) * @@ -683,7 +685,7 @@ cleanup1: ******************************************************************************/
void -acpi_ev_execute_reg_methods(struct acpi_namespace_node *node, +acpi_ev_execute_reg_methods(struct acpi_namespace_node *node, u32 max_depth, acpi_adr_space_type space_id, u32 function) { struct acpi_reg_walk_info info; @@ -717,7 +719,7 @@ acpi_ev_execute_reg_methods(struct acpi_ * regions and _REG methods. (i.e. handlers must be installed for all * regions of this Space ID before we can run any _REG methods) */ - (void)acpi_ns_walk_namespace(ACPI_TYPE_ANY, node, ACPI_UINT32_MAX, + (void)acpi_ns_walk_namespace(ACPI_TYPE_ANY, node, max_depth, ACPI_NS_WALK_UNLOCK, acpi_ev_reg_run, NULL, &info, NULL);
--- a/drivers/acpi/acpica/evxfregn.c +++ b/drivers/acpi/acpica/evxfregn.c @@ -85,7 +85,8 @@ acpi_install_address_space_handler_inter /* Run all _REG methods for this address space */
if (run_reg) { - acpi_ev_execute_reg_methods(node, space_id, ACPI_REG_CONNECT); + acpi_ev_execute_reg_methods(node, ACPI_UINT32_MAX, space_id, + ACPI_REG_CONNECT); }
unlock_and_exit: @@ -263,6 +264,7 @@ ACPI_EXPORT_SYMBOL(acpi_remove_address_s * FUNCTION: acpi_execute_reg_methods * * PARAMETERS: device - Handle for the device + * max_depth - Depth to which search for _REG * space_id - The address space ID * * RETURN: Status @@ -271,7 +273,8 @@ ACPI_EXPORT_SYMBOL(acpi_remove_address_s * ******************************************************************************/ acpi_status -acpi_execute_reg_methods(acpi_handle device, acpi_adr_space_type space_id) +acpi_execute_reg_methods(acpi_handle device, u32 max_depth, + acpi_adr_space_type space_id) { struct acpi_namespace_node *node; acpi_status status; @@ -296,7 +299,8 @@ acpi_execute_reg_methods(acpi_handle dev
/* Run all _REG methods for this address space */
- acpi_ev_execute_reg_methods(node, space_id, ACPI_REG_CONNECT); + acpi_ev_execute_reg_methods(node, max_depth, space_id, + ACPI_REG_CONNECT); } else { status = AE_BAD_PARAMETER; } --- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1506,7 +1506,7 @@ static int ec_install_handlers(struct ac }
if (call_reg && !test_bit(EC_FLAGS_EC_REG_CALLED, &ec->flags)) { - acpi_execute_reg_methods(scope_handle, ACPI_ADR_SPACE_EC); + acpi_execute_reg_methods(scope_handle, ACPI_UINT32_MAX, ACPI_ADR_SPACE_EC); set_bit(EC_FLAGS_EC_REG_CALLED, &ec->flags); }
--- a/include/acpi/acpixf.h +++ b/include/acpi/acpixf.h @@ -660,6 +660,7 @@ ACPI_EXTERNAL_RETURN_STATUS(acpi_status void *context)) ACPI_EXTERNAL_RETURN_STATUS(acpi_status acpi_execute_reg_methods(acpi_handle device, + u32 nax_depth, acpi_adr_space_type space_id)) ACPI_EXTERNAL_RETURN_STATUS(acpi_status
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 71bf41b8e913ec9fc91f0d39ab8fb320229ec604 upstream.
Commit 60fa6ae6e6d0 ("ACPI: EC: Install address space handler at the namespace root") caused _REG methods for EC operation regions outside the EC device scope to be evaluated which on some systems leads to the evaluation of _REG methods in the scopes of device objects representing devices that are not present and not functional according to the _STA return values. Some of those device objects represent EC "alternatives" and if _REG is evaluated for their operation regions, the platform firmware may be confused and the platform may start to behave incorrectly.
To avoid this problem, only evaluate _REG for EC operation regions located in the scopes of device objects representing known-to-be-present devices.
For this purpose, partially revert commit 60fa6ae6e6d0 and trigger the evaluation of _REG for EC operation regions from acpi_bus_attach() for the known-valid devices.
Fixes: 60fa6ae6e6d0 ("ACPI: EC: Install address space handler at the namespace root") Link: https://lore.kernel.org/linux-acpi/1f76b7e2-1928-4598-8037-28a1785c2d13@redh... Link: https://bugzilla.redhat.com/show_bug.cgi?id=2298938 Link: https://bugzilla.redhat.com/show_bug.cgi?id=2302253 Reported-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Cc: All applicable stable@vger.kernel.org Link: https://patch.msgid.link/23612351.6Emhk5qWAg@rjwysocki.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/ec.c | 11 +++++++++-- drivers/acpi/internal.h | 1 + drivers/acpi/scan.c | 2 ++ 3 files changed, 12 insertions(+), 2 deletions(-)
--- a/drivers/acpi/ec.c +++ b/drivers/acpi/ec.c @@ -1487,12 +1487,13 @@ static bool install_gpio_irq_event_handl static int ec_install_handlers(struct acpi_ec *ec, struct acpi_device *device, bool call_reg) { - acpi_handle scope_handle = ec == first_ec ? ACPI_ROOT_OBJECT : ec->handle; acpi_status status;
acpi_ec_start(ec, false);
if (!test_bit(EC_FLAGS_EC_HANDLER_INSTALLED, &ec->flags)) { + acpi_handle scope_handle = ec == first_ec ? ACPI_ROOT_OBJECT : ec->handle; + acpi_ec_enter_noirq(ec); status = acpi_install_address_space_handler_no_reg(scope_handle, ACPI_ADR_SPACE_EC, @@ -1506,7 +1507,7 @@ static int ec_install_handlers(struct ac }
if (call_reg && !test_bit(EC_FLAGS_EC_REG_CALLED, &ec->flags)) { - acpi_execute_reg_methods(scope_handle, ACPI_UINT32_MAX, ACPI_ADR_SPACE_EC); + acpi_execute_reg_methods(ec->handle, ACPI_UINT32_MAX, ACPI_ADR_SPACE_EC); set_bit(EC_FLAGS_EC_REG_CALLED, &ec->flags); }
@@ -1721,6 +1722,12 @@ static void acpi_ec_remove(struct acpi_d } }
+void acpi_ec_register_opregions(struct acpi_device *adev) +{ + if (first_ec && first_ec->handle != adev->handle) + acpi_execute_reg_methods(adev->handle, 1, ACPI_ADR_SPACE_EC); +} + static acpi_status ec_parse_io_ports(struct acpi_resource *resource, void *context) { --- a/drivers/acpi/internal.h +++ b/drivers/acpi/internal.h @@ -223,6 +223,7 @@ int acpi_ec_add_query_handler(struct acp acpi_handle handle, acpi_ec_query_func func, void *data); void acpi_ec_remove_query_handler(struct acpi_ec *ec, u8 query_bit); +void acpi_ec_register_opregions(struct acpi_device *adev);
#ifdef CONFIG_PM_SLEEP void acpi_ec_flush_work(void); --- a/drivers/acpi/scan.c +++ b/drivers/acpi/scan.c @@ -2264,6 +2264,8 @@ static int acpi_bus_attach(struct acpi_d if (device->handler) goto ok;
+ acpi_ec_register_opregions(device); + if (!device->flags.initialized) { device->flags.power_manageable = device->power.states[ACPI_STATE_D0].flags.valid;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haibo Xu haibo1.xu@intel.com
commit a21dcf0ea8566ebbe011c79d6ed08cdfea771de3 upstream.
Currently, only acpi_early_node_map[0] was initialized to NUMA_NO_NODE. To ensure all the values were properly initialized, switch to initialize all of them to NUMA_NO_NODE.
Fixes: e18962491696 ("arm64: numa: rework ACPI NUMA initialization") Cc: stable@vger.kernel.org # 4.19.x Reported-by: Andrew Jones ajones@ventanamicro.com Suggested-by: Andrew Jones ajones@ventanamicro.com Signed-off-by: Haibo Xu haibo1.xu@intel.com Reviewed-by: Anshuman Khandual anshuman.khandual@arm.com Reviewed-by: Sunil V L sunilvl@ventanamicro.com Reviewed-by: Andrew Jones ajones@ventanamicro.com Acked-by: Catalin Marinas catalin.marinas@arm.com Acked-by: Lorenzo Pieralisi lpieralisi@kernel.org Reviewed-by: Hanjun Guo guohanjun@huawei.com Link: https://lore.kernel.org/r/853d7f74aa243f6f5999e203246f0d1ae92d2b61.172282842... Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/acpi_numa.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/kernel/acpi_numa.c +++ b/arch/arm64/kernel/acpi_numa.c @@ -27,7 +27,7 @@
#include <asm/numa.h>
-static int acpi_early_node_map[NR_CPUS] __initdata = { NUMA_NO_NODE }; +static int acpi_early_node_map[NR_CPUS] __initdata = { [0 ... NR_CPUS - 1] = NUMA_NO_NODE };
int __init acpi_numa_get_nid(unsigned int cpu) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Khazhismel Kumykov khazhy@google.com
commit 7a636b4f03af9d541205f69e373672e7b2b60a8a upstream.
If the dm_resume method is called on a device that is not suspended, the method will suspend the device briefly, before resuming it (so that the table will be swapped).
However, there was a bug that the return value of dm_suspended_md was not checked. dm_suspended_md may return an error when it is interrupted by a signal. In this case, do_resume would call dm_swap_table, which would return -EINVAL.
This commit fixes the logic, so that error returned by dm_suspend is checked and the resume operation is undone.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Signed-off-by: Khazhismel Kumykov khazhy@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-ioctl.c | 22 ++++++++++++++++++++-- 1 file changed, 20 insertions(+), 2 deletions(-)
--- a/drivers/md/dm-ioctl.c +++ b/drivers/md/dm-ioctl.c @@ -1181,8 +1181,26 @@ static int do_resume(struct dm_ioctl *pa suspend_flags &= ~DM_SUSPEND_LOCKFS_FLAG; if (param->flags & DM_NOFLUSH_FLAG) suspend_flags |= DM_SUSPEND_NOFLUSH_FLAG; - if (!dm_suspended_md(md)) - dm_suspend(md, suspend_flags); + if (!dm_suspended_md(md)) { + r = dm_suspend(md, suspend_flags); + if (r) { + down_write(&_hash_lock); + hc = dm_get_mdptr(md); + if (hc && !hc->new_map) { + hc->new_map = new_map; + new_map = NULL; + } else { + r = -ENXIO; + } + up_write(&_hash_lock); + if (new_map) { + dm_sync_table(md); + dm_table_destroy(new_map); + } + dm_put(md); + return r; + } + }
old_size = dm_get_size(md); old_map = dm_swap_table(md, new_map);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
commit faada2174c08662ae98b439c69efe3e79382c538 upstream.
kmalloc is unreliable when allocating more than 8 pages of memory. It may fail when there is plenty of free memory but the memory is fragmented. Zdenek Kabelac observed such failure in his tests.
This commit changes kmalloc to kvmalloc - kvmalloc will fall back to vmalloc if the large allocation fails.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Reported-by: Zdenek Kabelac zkabelac@redhat.com Reviewed-by: Mike Snitzer snitzer@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/persistent-data/dm-space-map-metadata.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/md/persistent-data/dm-space-map-metadata.c +++ b/drivers/md/persistent-data/dm-space-map-metadata.c @@ -277,7 +277,7 @@ static void sm_metadata_destroy(struct d { struct sm_metadata *smm = container_of(sm, struct sm_metadata, sm);
- kfree(smm); + kvfree(smm); }
static int sm_metadata_get_nr_blocks(struct dm_space_map *sm, dm_block_t *count) @@ -772,7 +772,7 @@ struct dm_space_map *dm_sm_metadata_init { struct sm_metadata *smm;
- smm = kmalloc(sizeof(*smm), GFP_KERNEL); + smm = kvmalloc(sizeof(*smm), GFP_KERNEL); if (!smm) return ERR_PTR(-ENOMEM);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng chengzhihao1@huawei.com
commit 2a0629834cd82f05d424bbc193374f9a43d1f87d upstream.
The inode reclaiming process(See function prune_icache_sb) collects all reclaimable inodes and mark them with I_FREEING flag at first, at that time, other processes will be stuck if they try getting these inodes (See function find_inode_fast), then the reclaiming process destroy the inodes by function dispose_list(). Some filesystems(eg. ext4 with ea_inode feature, ubifs with xattr) may do inode lookup in the inode evicting callback function, if the inode lookup is operated under the inode lru traversing context, deadlock problems may happen.
Case 1: In function ext4_evict_inode(), the ea inode lookup could happen if ea_inode feature is enabled, the lookup process will be stuck under the evicting context like this:
1. File A has inode i_reg and an ea inode i_ea 2. getfattr(A, xattr_buf) // i_ea is added into lru // lru->i_ea 3. Then, following three processes running like this:
PA PB echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // i_reg is added into lru, lru->i_ea->i_reg prune_icache_sb list_lru_walk_one inode_lru_isolate i_ea->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(i_reg) spin_unlock(&i_reg->i_lock) spin_unlock(lru_lock) rm file A i_reg->nlink = 0 iput(i_reg) // i_reg->nlink is 0, do evict ext4_evict_inode ext4_xattr_delete_inode ext4_xattr_inode_dec_ref_all ext4_xattr_inode_iget ext4_iget(i_ea->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(i_ea) ----→ AA deadlock dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&i_ea->i_state)
Case 2: In deleted inode writing function ubifs_jnl_write_inode(), file deleting process holds BASEHD's wbuf->io_mutex while getting the xattr inode, which could race with inode reclaiming process(The reclaiming process could try locking BASEHD's wbuf->io_mutex in inode evicting function), then an ABBA deadlock problem would happen as following:
1. File A has inode ia and a xattr(with inode ixa), regular file B has inode ib and a xattr. 2. getfattr(A, xattr_buf) // ixa is added into lru // lru->ixa 3. Then, following three processes running like this:
PA PB PC echo 2 > /proc/sys/vm/drop_caches shrink_slab prune_dcache_sb // ib and ia are added into lru, lru->ixa->ib->ia prune_icache_sb list_lru_walk_one inode_lru_isolate ixa->i_state |= I_FREEING // set inode state inode_lru_isolate __iget(ib) spin_unlock(&ib->i_lock) spin_unlock(lru_lock) rm file B ib->nlink = 0 rm file A iput(ia) ubifs_evict_inode(ia) ubifs_jnl_delete_inode(ia) ubifs_jnl_write_inode(ia) make_reservation(BASEHD) // Lock wbuf->io_mutex ubifs_iget(ixa->i_ino) iget_locked find_inode_fast __wait_on_freeing_inode(ixa) | iput(ib) // ib->nlink is 0, do evict | ubifs_evict_inode | ubifs_jnl_delete_inode(ib) ↓ ubifs_jnl_write_inode ABBA deadlock ←-----make_reservation(BASEHD) dispose_list // cannot be executed by prune_icache_sb wake_up_bit(&ixa->i_state)
Fix the possible deadlock by using new inode state flag I_LRU_ISOLATING to pin the inode in memory while inode_lru_isolate() reclaims its pages instead of using ordinary inode reference. This way inode deletion cannot be triggered from inode_lru_isolate() thus avoiding the deadlock. evict() is made to wait for I_LRU_ISOLATING to be cleared before proceeding with inode cleanup.
Link: https://lore.kernel.org/all/37c29c42-7685-d1f0-067d-63582ffac405@huaweicloud... Link: https://bugzilla.kernel.org/show_bug.cgi?id=219022 Fixes: e50e5129f384 ("ext4: xattr-in-inode support") Fixes: 7959cf3a7506 ("ubifs: journal: Handle xattrs like files") Cc: stable@vger.kernel.org Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Link: https://lore.kernel.org/r/20240809031628.1069873-1-chengzhihao@huaweicloud.c... Reviewed-by: Jan Kara jack@suse.cz Suggested-by: Jan Kara jack@suse.cz Suggested-by: Mateusz Guzik mjguzik@gmail.com Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/inode.c | 39 +++++++++++++++++++++++++++++++++++++-- include/linux/fs.h | 5 +++++ 2 files changed, 42 insertions(+), 2 deletions(-)
--- a/fs/inode.c +++ b/fs/inode.c @@ -486,6 +486,39 @@ static void inode_lru_list_del(struct in this_cpu_dec(nr_unused); }
+static void inode_pin_lru_isolating(struct inode *inode) +{ + lockdep_assert_held(&inode->i_lock); + WARN_ON(inode->i_state & (I_LRU_ISOLATING | I_FREEING | I_WILL_FREE)); + inode->i_state |= I_LRU_ISOLATING; +} + +static void inode_unpin_lru_isolating(struct inode *inode) +{ + spin_lock(&inode->i_lock); + WARN_ON(!(inode->i_state & I_LRU_ISOLATING)); + inode->i_state &= ~I_LRU_ISOLATING; + smp_mb(); + wake_up_bit(&inode->i_state, __I_LRU_ISOLATING); + spin_unlock(&inode->i_lock); +} + +static void inode_wait_for_lru_isolating(struct inode *inode) +{ + spin_lock(&inode->i_lock); + if (inode->i_state & I_LRU_ISOLATING) { + DEFINE_WAIT_BIT(wq, &inode->i_state, __I_LRU_ISOLATING); + wait_queue_head_t *wqh; + + wqh = bit_waitqueue(&inode->i_state, __I_LRU_ISOLATING); + spin_unlock(&inode->i_lock); + __wait_on_bit(wqh, &wq, bit_wait, TASK_UNINTERRUPTIBLE); + spin_lock(&inode->i_lock); + WARN_ON(inode->i_state & I_LRU_ISOLATING); + } + spin_unlock(&inode->i_lock); +} + /** * inode_sb_list_add - add inode to the superblock list of inodes * @inode: inode to add @@ -655,6 +688,8 @@ static void evict(struct inode *inode)
inode_sb_list_del(inode);
+ inode_wait_for_lru_isolating(inode); + /* * Wait for flusher thread to be done with the inode so that filesystem * does not start destroying it while writeback is still running. Since @@ -843,7 +878,7 @@ static enum lru_status inode_lru_isolate * be under pressure before the cache inside the highmem zone. */ if (inode_has_buffers(inode) || !mapping_empty(&inode->i_data)) { - __iget(inode); + inode_pin_lru_isolating(inode); spin_unlock(&inode->i_lock); spin_unlock(lru_lock); if (remove_inode_buffers(inode)) { @@ -855,7 +890,7 @@ static enum lru_status inode_lru_isolate __count_vm_events(PGINODESTEAL, reap); mm_account_reclaimed_pages(reap); } - iput(inode); + inode_unpin_lru_isolating(inode); spin_lock(lru_lock); return LRU_RETRY; } --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -2370,6 +2370,9 @@ static inline void kiocb_clone(struct ki * * I_PINNING_FSCACHE_WB Inode is pinning an fscache object for writeback. * + * I_LRU_ISOLATING Inode is pinned being isolated from LRU without holding + * i_count. + * * Q: What is the difference between I_WILL_FREE and I_FREEING? */ #define I_DIRTY_SYNC (1 << 0) @@ -2393,6 +2396,8 @@ static inline void kiocb_clone(struct ki #define I_DONTCACHE (1 << 16) #define I_SYNC_QUEUED (1 << 17) #define I_PINNING_NETFS_WB (1 << 18) +#define __I_LRU_ISOLATING 19 +#define I_LRU_ISOLATING (1 << __I_LRU_ISOLATING)
#define I_DIRTY_INODE (I_DIRTY_SYNC | I_DIRTY_DATASYNC) #define I_DIRTY (I_DIRTY_INODE | I_DIRTY_PAGES)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 9a2fa1472083580b6c66bdaf291f591e1170123a upstream.
copy_fd_bitmaps(new, old, count) is expected to copy the first count/BITS_PER_LONG bits from old->full_fds_bits[] and fill the rest with zeroes. What it does is copying enough words (BITS_TO_LONGS(count/BITS_PER_LONG)), then memsets the rest. That works fine, *if* all bits past the cutoff point are clear. Otherwise we are risking garbage from the last word we'd copied.
For most of the callers that is true - expand_fdtable() has count equal to old->max_fds, so there's no open descriptors past count, let alone fully occupied words in ->open_fds[], which is what bits in ->full_fds_bits[] correspond to.
The other caller (dup_fd()) passes sane_fdtable_size(old_fdt, max_fds), which is the smallest multiple of BITS_PER_LONG that covers all opened descriptors below max_fds. In the common case (copying on fork()) max_fds is ~0U, so all opened descriptors will be below it and we are fine, by the same reasons why the call in expand_fdtable() is safe.
Unfortunately, there is a case where max_fds is less than that and where we might, indeed, end up with junk in ->full_fds_bits[] - close_range(from, to, CLOSE_RANGE_UNSHARE) with * descriptor table being currently shared * 'to' being above the current capacity of descriptor table * 'from' being just under some chunk of opened descriptors. In that case we end up with observably wrong behaviour - e.g. spawn a child with CLONE_FILES, get all descriptors in range 0..127 open, then close_range(64, ~0U, CLOSE_RANGE_UNSHARE) and watch dup(0) ending up with descriptor #128, despite #64 being observably not open.
The minimally invasive fix would be to deal with that in dup_fd(). If this proves to add measurable overhead, we can go that way, but let's try to fix copy_fd_bitmaps() first.
* new helper: bitmap_copy_and_expand(to, from, bits_to_copy, size). * make copy_fd_bitmaps() take the bitmap size in words, rather than bits; it's 'count' argument is always a multiple of BITS_PER_LONG, so we are not losing any information, and that way we can use the same helper for all three bitmaps - compiler will see that count is a multiple of BITS_PER_LONG for the large ones, so it'll generate plain memcpy()+memset().
Reproducer added to tools/testing/selftests/core/close_range_test.c
Cc: stable@vger.kernel.org Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/file.c | 28 ++++++++----------- include/linux/bitmap.h | 12 ++++++++ tools/testing/selftests/core/close_range_test.c | 35 ++++++++++++++++++++++++ 3 files changed, 59 insertions(+), 16 deletions(-)
--- a/fs/file.c +++ b/fs/file.c @@ -46,27 +46,23 @@ static void free_fdtable_rcu(struct rcu_ #define BITBIT_NR(nr) BITS_TO_LONGS(BITS_TO_LONGS(nr)) #define BITBIT_SIZE(nr) (BITBIT_NR(nr) * sizeof(long))
+#define fdt_words(fdt) ((fdt)->max_fds / BITS_PER_LONG) // words in ->open_fds /* * Copy 'count' fd bits from the old table to the new table and clear the extra * space if any. This does not copy the file pointers. Called with the files * spinlock held for write. */ -static void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt, - unsigned int count) +static inline void copy_fd_bitmaps(struct fdtable *nfdt, struct fdtable *ofdt, + unsigned int copy_words) { - unsigned int cpy, set; + unsigned int nwords = fdt_words(nfdt);
- cpy = count / BITS_PER_BYTE; - set = (nfdt->max_fds - count) / BITS_PER_BYTE; - memcpy(nfdt->open_fds, ofdt->open_fds, cpy); - memset((char *)nfdt->open_fds + cpy, 0, set); - memcpy(nfdt->close_on_exec, ofdt->close_on_exec, cpy); - memset((char *)nfdt->close_on_exec + cpy, 0, set); - - cpy = BITBIT_SIZE(count); - set = BITBIT_SIZE(nfdt->max_fds) - cpy; - memcpy(nfdt->full_fds_bits, ofdt->full_fds_bits, cpy); - memset((char *)nfdt->full_fds_bits + cpy, 0, set); + bitmap_copy_and_extend(nfdt->open_fds, ofdt->open_fds, + copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG); + bitmap_copy_and_extend(nfdt->close_on_exec, ofdt->close_on_exec, + copy_words * BITS_PER_LONG, nwords * BITS_PER_LONG); + bitmap_copy_and_extend(nfdt->full_fds_bits, ofdt->full_fds_bits, + copy_words, nwords); }
/* @@ -84,7 +80,7 @@ static void copy_fdtable(struct fdtable memcpy(nfdt->fd, ofdt->fd, cpy); memset((char *)nfdt->fd + cpy, 0, set);
- copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds); + copy_fd_bitmaps(nfdt, ofdt, fdt_words(ofdt)); }
/* @@ -379,7 +375,7 @@ struct files_struct *dup_fd(struct files open_files = sane_fdtable_size(old_fdt, max_fds); }
- copy_fd_bitmaps(new_fdt, old_fdt, open_files); + copy_fd_bitmaps(new_fdt, old_fdt, open_files / BITS_PER_LONG);
old_fds = old_fdt->fd; new_fds = new_fdt->fd; --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -270,6 +270,18 @@ static inline void bitmap_copy_clear_tai dst[nbits / BITS_PER_LONG] &= BITMAP_LAST_WORD_MASK(nbits); }
+static inline void bitmap_copy_and_extend(unsigned long *to, + const unsigned long *from, + unsigned int count, unsigned int size) +{ + unsigned int copy = BITS_TO_LONGS(count); + + memcpy(to, from, copy * sizeof(long)); + if (count % BITS_PER_LONG) + to[copy - 1] &= BITMAP_LAST_WORD_MASK(count); + memset(to + copy, 0, bitmap_size(size) - copy * sizeof(long)); +} + /* * On 32-bit systems bitmaps are represented as u32 arrays internally. On LE64 * machines the order of hi and lo parts of numbers match the bitmap structure. --- a/tools/testing/selftests/core/close_range_test.c +++ b/tools/testing/selftests/core/close_range_test.c @@ -589,4 +589,39 @@ TEST(close_range_cloexec_unshare_syzbot) EXPECT_EQ(close(fd3), 0); }
+TEST(close_range_bitmap_corruption) +{ + pid_t pid; + int status; + struct __clone_args args = { + .flags = CLONE_FILES, + .exit_signal = SIGCHLD, + }; + + /* get the first 128 descriptors open */ + for (int i = 2; i < 128; i++) + EXPECT_GE(dup2(0, i), 0); + + /* get descriptor table shared */ + pid = sys_clone3(&args, sizeof(args)); + ASSERT_GE(pid, 0); + + if (pid == 0) { + /* unshare and truncate descriptor table down to 64 */ + if (sys_close_range(64, ~0U, CLOSE_RANGE_UNSHARE)) + exit(EXIT_FAILURE); + + ASSERT_EQ(fcntl(64, F_GETFD), -1); + /* ... and verify that the range 64..127 is not + stuck "fully used" according to secondary bitmap */ + EXPECT_EQ(dup(0), 64) + exit(EXIT_FAILURE); + exit(EXIT_SUCCESS); + } + + EXPECT_EQ(waitpid(pid, &status, 0), pid); + EXPECT_EQ(true, WIFEXITED(status)); + EXPECT_EQ(0, WEXITSTATUS(status)); +} + TEST_HARNESS_MAIN
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andi Shyti andi.shyti@kernel.org
commit 4e91fa1ef3ce6290b4c598e54b5eb6cf134fbec8 upstream.
Add the missing geni_icc_disable() call before returning in the geni_i2c_runtime_resume() function.
Commit 9ba48db9f77c ("i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume") by Gaosheng missed disabling the interconnect in one case.
Fixes: bf225ed357c6 ("i2c: i2c-qcom-geni: Add interconnect support") Cc: Gaosheng Cui cuigaosheng1@huawei.com Cc: stable@vger.kernel.org # v5.9+ Signed-off-by: Andi Shyti andi.shyti@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/i2c/busses/i2c-qcom-geni.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/i2c/busses/i2c-qcom-geni.c +++ b/drivers/i2c/busses/i2c-qcom-geni.c @@ -986,8 +986,10 @@ static int __maybe_unused geni_i2c_runti return ret;
ret = clk_prepare_enable(gi2c->core_clk); - if (ret) + if (ret) { + geni_icc_disable(&gi2c->se); return ret; + }
ret = geni_se_resources_on(&gi2c->se); if (ret) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt rostedt@goodmis.org
commit d0949cd44a62c4c41b30ea7ae94d8c887f586882 upstream.
When running the following:
# cd /sys/kernel/tracing/ # echo 1 > events/sched/sched_waking/enable # echo 1 > events/sched/sched_switch/enable # echo 0 > tracing_on # dd if=per_cpu/cpu0/trace_pipe_raw of=/tmp/raw0.dat
The dd task would get stuck in an infinite loop in the kernel. What would happen is the following:
When ring_buffer_read_page() returns -1 (no data) then a check is made to see if the buffer is empty (as happens when the page is not full), it will call wait_on_pipe() to wait until the ring buffer has data. When it is it will try again to read data (unless O_NONBLOCK is set).
The issue happens when there's a reader and the file descriptor is closed. The wait_on_pipe() will return when that is the case. But this loop will continue to try again and wait_on_pipe() will again return immediately and the loop will continue and never stop.
Simply check if the file was closed before looping and exit out if it is.
Cc: stable@vger.kernel.org Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Mathieu Desnoyers mathieu.desnoyers@efficios.com Link: https://lore.kernel.org/20240808235730.78bf63e5@rorschach.local.home Fixes: 2aa043a55b9a7 ("tracing/ring-buffer: Fix wait_on_pipe() race") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 10cd38bce2f1..ebe7ce2f5f4a 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -7956,7 +7956,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf, trace_access_unlock(iter->cpu_file);
if (ret < 0) { - if (trace_empty(iter)) { + if (trace_empty(iter) && !iter->closed) { if ((filp->f_flags & O_NONBLOCK)) return -EAGAIN;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kyle Huey me@kylehuey.com
commit 100bff23818eb61751ed05d64a7df36ce9728a4d upstream.
The regressing commit is new in 6.10. It assumed that anytime event->prog is set bpf_overflow_handler() should be invoked to execute the attached bpf program. This assumption is false for tracing events, and as a result the regressing commit broke bpftrace by invoking the bpf handler with garbage inputs on overflow.
Prior to the regression the overflow handlers formed a chain (of length 0, 1, or 2) and perf_event_set_bpf_handler() (the !tracing case) added bpf_overflow_handler() to that chain, while perf_event_attach_bpf_prog() (the tracing case) did not. Both set event->prog. The chain of overflow handlers was replaced by a single overflow handler slot and a fixed call to bpf_overflow_handler() when appropriate. This modifies the condition there to check event->prog->type == BPF_PROG_TYPE_PERF_EVENT, restoring the previous behavior and fixing bpftrace.
Signed-off-by: Kyle Huey khuey@kylehuey.com Suggested-by: Andrii Nakryiko andrii.nakryiko@gmail.com Reported-by: Joe Damato jdamato@fastly.com Closes: https://lore.kernel.org/lkml/ZpFfocvyF3KHaSzF@LQ3V64L9R2/ Fixes: f11f10bfa1ca ("perf/bpf: Call BPF handler directly, not through overflow machinery") Cc: stable@vger.kernel.org Tested-by: Joe Damato jdamato@fastly.com # bpftrace Acked-by: Andrii Nakryiko andrii@kernel.org Link: https://lore.kernel.org/r/20240813151727.28797-1-jdamato@fastly.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/events/core.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -9708,7 +9708,8 @@ static int __perf_event_overflow(struct
ret = __perf_event_account_interrupt(event, throttle);
- if (event->prog && !bpf_overflow_handler(event, data, regs)) + if (event->prog && event->prog->type == BPF_PROG_TYPE_PERF_EVENT && + !bpf_overflow_handler(event, data, regs)) return ret;
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pedro Falcato pedro.falcato@gmail.com
commit e46bc2e7eb90a370bc27fa2fd98cb8251e7da1ec upstream.
is_madv_discard did its check wrong. MADV_ flags are not bitwise, they're normal sequential numbers. So, for instance: behavior & (/* ... */ | MADV_REMOVE)
tagged both MADV_REMOVE and MADV_RANDOM (bit 0 set) as discard operations.
As a result the kernel could erroneously block certain madvises (e.g MADV_RANDOM or MADV_HUGEPAGE) on sealed VMAs due to them sharing bits with blocked MADV operations (e.g REMOVE or WIPEONFORK).
This is obviously incorrect, so use a switch statement instead.
Link: https://lkml.kernel.org/r/20240807173336.2523757-1-pedro.falcato@gmail.com Link: https://lkml.kernel.org/r/20240807173336.2523757-2-pedro.falcato@gmail.com Fixes: 8be7258aad44 ("mseal: add mseal syscall") Signed-off-by: Pedro Falcato pedro.falcato@gmail.com Tested-by: Jeff Xu jeffxu@chromium.org Reviewed-by: Jeff Xu jeffxu@chromium.org Cc: Kees Cook kees@kernel.org Cc: Liam R. Howlett Liam.Howlett@oracle.com Cc: Shuah Khan shuah@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mseal.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/mm/mseal.c b/mm/mseal.c index bf783bba8ed0..15bba28acc00 100644 --- a/mm/mseal.c +++ b/mm/mseal.c @@ -40,9 +40,17 @@ static bool can_modify_vma(struct vm_area_struct *vma)
static bool is_madv_discard(int behavior) { - return behavior & - (MADV_FREE | MADV_DONTNEED | MADV_DONTNEED_LOCKED | - MADV_REMOVE | MADV_DONTFORK | MADV_WIPEONFORK); + switch (behavior) { + case MADV_FREE: + case MADV_DONTNEED: + case MADV_DONTNEED_LOCKED: + case MADV_REMOVE: + case MADV_DONTFORK: + case MADV_WIPEONFORK: + return true; + } + + return false; }
static bool is_ro_anon(struct vm_area_struct *vma)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit 90574d2a675947858b47008df8d07f75ea50d0d0 upstream.
If the "tool->data" allocation fails then there is no need to call osnoise_free_top() and, in fact, doing so will lead to a NULL dereference.
Cc: stable@vger.kernel.org Cc: John Kacur jkacur@redhat.com Cc: "Luis Claudio R. Goncalves" lgoncalv@redhat.com Cc: Clark Williams williams@redhat.com Fixes: 1eceb2fc2ca5 ("rtla/osnoise: Add osnoise top mode") Link: https://lore.kernel.org/f964ed1f-64d2-4fde-ad3e-708331f8f358@stanley.mountai... Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/tracing/rtla/src/osnoise_top.c | 11 ++++------- 1 file changed, 4 insertions(+), 7 deletions(-)
--- a/tools/tracing/rtla/src/osnoise_top.c +++ b/tools/tracing/rtla/src/osnoise_top.c @@ -640,8 +640,10 @@ struct osnoise_tool *osnoise_init_top(st return NULL;
tool->data = osnoise_alloc_top(nr_cpus); - if (!tool->data) - goto out_err; + if (!tool->data) { + osnoise_destroy_tool(tool); + return NULL; + }
tool->params = params;
@@ -649,11 +651,6 @@ struct osnoise_tool *osnoise_init_top(st osnoise_top_handler, NULL);
return tool; - -out_err: - osnoise_free_top(tool->data); - osnoise_destroy_tool(tool); - return NULL; }
static int stop_tracing;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kirill A. Shutemov kirill.shutemov@linux.intel.com
commit 807174a93d24c456503692dc3f5af322ee0b640a upstream.
Unaccepted memory is considered unusable free memory, which is not counted as free on the zone watermark check. This causes get_page_from_freelist() to accept more memory to hit the high watermark, but it creates problems in the reclaim path.
The reclaim path encounters a failed zone watermark check and attempts to reclaim memory. This is usually successful, but if there is little or no reclaimable memory, it can result in endless reclaim with little to no progress. This can occur early in the boot process, just after start of the init process when the only reclaimable memory is the page cache of the init executable and its libraries.
Make unaccepted memory free from watermark check point of view. This way unaccepted memory will never be the trigger of memory reclaim. Accept more memory in the get_page_from_freelist() if needed.
Link: https://lkml.kernel.org/r/20240809114854.3745464-2-kirill.shutemov@linux.int... Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory") Signed-off-by: Kirill A. Shutemov kirill.shutemov@linux.intel.com Reported-by: Jianxiong Gao jxgao@google.com Acked-by: David Hildenbrand david@redhat.com Tested-by: Jianxiong Gao jxgao@google.com Cc: Borislav Petkov bp@alien8.de Cc: Johannes Weiner hannes@cmpxchg.org Cc: Kirill A. Shutemov kirill.shutemov@linux.intel.com Cc: Matthew Wilcox willy@infradead.org Cc: Mel Gorman mgorman@suse.de Cc: Mike Rapoport (Microsoft) rppt@kernel.org Cc: Tom Lendacky thomas.lendacky@amd.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org [6.5+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page_alloc.c | 42 ++++++++++++++++++++---------------------- 1 file changed, 20 insertions(+), 22 deletions(-)
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -287,7 +287,7 @@ EXPORT_SYMBOL(nr_online_nodes);
static bool page_contains_unaccepted(struct page *page, unsigned int order); static void accept_page(struct page *page, unsigned int order); -static bool try_to_accept_memory(struct zone *zone, unsigned int order); +static bool cond_accept_memory(struct zone *zone, unsigned int order); static inline bool has_unaccepted_memory(void); static bool __free_unaccepted(struct page *page);
@@ -3059,9 +3059,6 @@ static inline long __zone_watermark_unus if (!(alloc_flags & ALLOC_CMA)) unusable_free += zone_page_state(z, NR_FREE_CMA_PAGES); #endif -#ifdef CONFIG_UNACCEPTED_MEMORY - unusable_free += zone_page_state(z, NR_UNACCEPTED); -#endif
return unusable_free; } @@ -3355,6 +3352,8 @@ retry: } }
+ cond_accept_memory(zone, order); + /* * Detect whether the number of free pages is below high * watermark. If so, we will decrease pcp->high and free @@ -3380,10 +3379,8 @@ check_alloc_wmark: gfp_mask)) { int ret;
- if (has_unaccepted_memory()) { - if (try_to_accept_memory(zone, order)) - goto try_this_zone; - } + if (cond_accept_memory(zone, order)) + goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* @@ -3437,10 +3434,8 @@ try_this_zone:
return page; } else { - if (has_unaccepted_memory()) { - if (try_to_accept_memory(zone, order)) - goto try_this_zone; - } + if (cond_accept_memory(zone, order)) + goto try_this_zone;
#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT /* Try again if zone has deferred pages */ @@ -6933,9 +6928,6 @@ static bool try_to_accept_memory_one(str struct page *page; bool last;
- if (list_empty(&zone->unaccepted_pages)) - return false; - spin_lock_irqsave(&zone->lock, flags); page = list_first_entry_or_null(&zone->unaccepted_pages, struct page, lru); @@ -6961,23 +6953,29 @@ static bool try_to_accept_memory_one(str return true; }
-static bool try_to_accept_memory(struct zone *zone, unsigned int order) +static bool cond_accept_memory(struct zone *zone, unsigned int order) { long to_accept; - int ret = false; + bool ret = false; + + if (!has_unaccepted_memory()) + return false; + + if (list_empty(&zone->unaccepted_pages)) + return false;
/* How much to accept to get to high watermark? */ to_accept = high_wmark_pages(zone) - (zone_page_state(zone, NR_FREE_PAGES) - - __zone_watermark_unusable_free(zone, order, 0)); + __zone_watermark_unusable_free(zone, order, 0) - + zone_page_state(zone, NR_UNACCEPTED));
- /* Accept at least one page */ - do { + while (to_accept > 0) { if (!try_to_accept_memory_one(zone)) break; ret = true; to_accept -= MAX_ORDER_NR_PAGES; - } while (to_accept > 0); + }
return ret; } @@ -7020,7 +7018,7 @@ static void accept_page(struct page *pag { }
-static bool try_to_accept_memory(struct zone *zone, unsigned int order) +static bool cond_accept_memory(struct zone *zone, unsigned int order) { return false; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Hildenbrand david@redhat.com
commit 5f75cfbd6bb02295ddaed48adf667b6c828ce07b upstream.
We recently made GUP's common page table walking code to also walk hugetlb VMAs without most hugetlb special-casing, preparing for the future of having less hugetlb-specific page table walking code in the codebase. Turns out that we missed one page table locking detail: page table locking for hugetlb folios that are not mapped using a single PMD/PUD.
Assume we have hugetlb folio that spans multiple PTEs (e.g., 64 KiB hugetlb folios on arm64 with 4 KiB base page size). GUP, as it walks the page tables, will perform a pte_offset_map_lock() to grab the PTE table lock.
However, hugetlb that concurrently modifies these page tables would actually grab the mm->page_table_lock: with USE_SPLIT_PTE_PTLOCKS, the locks would differ. Something similar can happen right now with hugetlb folios that span multiple PMDs when USE_SPLIT_PMD_PTLOCKS.
This issue can be reproduced [1], for example triggering:
[ 3105.936100] ------------[ cut here ]------------ [ 3105.939323] WARNING: CPU: 31 PID: 2732 at mm/gup.c:142 try_grab_folio+0x11c/0x188 [ 3105.944634] Modules linked in: [...] [ 3105.974841] CPU: 31 PID: 2732 Comm: reproducer Not tainted 6.10.0-64.eln141.aarch64 #1 [ 3105.980406] Hardware name: QEMU KVM Virtual Machine, BIOS edk2-20240524-4.fc40 05/24/2024 [ 3105.986185] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 3105.991108] pc : try_grab_folio+0x11c/0x188 [ 3105.994013] lr : follow_page_pte+0xd8/0x430 [ 3105.996986] sp : ffff80008eafb8f0 [ 3105.999346] x29: ffff80008eafb900 x28: ffffffe8d481f380 x27: 00f80001207cff43 [ 3106.004414] x26: 0000000000000001 x25: 0000000000000000 x24: ffff80008eafba48 [ 3106.009520] x23: 0000ffff9372f000 x22: ffff7a54459e2000 x21: ffff7a546c1aa978 [ 3106.014529] x20: ffffffe8d481f3c0 x19: 0000000000610041 x18: 0000000000000001 [ 3106.019506] x17: 0000000000000001 x16: ffffffffffffffff x15: 0000000000000000 [ 3106.024494] x14: ffffb85477fdfe08 x13: 0000ffff9372ffff x12: 0000000000000000 [ 3106.029469] x11: 1fffef4a88a96be1 x10: ffff7a54454b5f0c x9 : ffffb854771b12f0 [ 3106.034324] x8 : 0008000000000000 x7 : ffff7a546c1aa980 x6 : 0008000000000080 [ 3106.038902] x5 : 00000000001207cf x4 : 0000ffff9372f000 x3 : ffffffe8d481f000 [ 3106.043420] x2 : 0000000000610041 x1 : 0000000000000001 x0 : 0000000000000000 [ 3106.047957] Call trace: [ 3106.049522] try_grab_folio+0x11c/0x188 [ 3106.051996] follow_pmd_mask.constprop.0.isra.0+0x150/0x2e0 [ 3106.055527] follow_page_mask+0x1a0/0x2b8 [ 3106.058118] __get_user_pages+0xf0/0x348 [ 3106.060647] faultin_page_range+0xb0/0x360 [ 3106.063651] do_madvise+0x340/0x598
Let's make huge_pte_lockptr() effectively use the same PT locks as any core-mm page table walker would. Add ptep_lockptr() to obtain the PTE page table lock using a pte pointer -- unfortunately we cannot convert pte_lockptr() because virt_to_page() doesn't work with kmap'ed page tables we can have with CONFIG_HIGHPTE.
Handle CONFIG_PGTABLE_LEVELS correctly by checking in reverse order, such that when e.g., CONFIG_PGTABLE_LEVELS==2 with PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE will work as expected. Document why that works.
There is one ugly case: powerpc 8xx, whereby we have an 8 MiB hugetlb folio being mapped using two PTE page tables. While hugetlb wants to take the PMD table lock, core-mm would grab the PTE table lock of one of both PTE page tables. In such corner cases, we have to make sure that both locks match, which is (fortunately!) currently guaranteed for 8xx as it does not support SMP and consequently doesn't use split PT locks.
[1] https://lore.kernel.org/all/1bbfcc7f-f222-45a5-ac44-c5a1381c596d@redhat.com/
Link: https://lkml.kernel.org/r/20240801204748.99107-1-david@redhat.com Fixes: 9cb28da54643 ("mm/gup: handle hugetlb in the generic follow_page_mask code") Signed-off-by: David Hildenbrand david@redhat.com Acked-by: Peter Xu peterx@redhat.com Reviewed-by: Baolin Wang baolin.wang@linux.alibaba.com Tested-by: Baolin Wang baolin.wang@linux.alibaba.com Cc: Peter Xu peterx@redhat.com Cc: Oscar Salvador osalvador@suse.de Cc: Muchun Song muchun.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/hugetlb.h | 33 ++++++++++++++++++++++++++++++--- include/linux/mm.h | 11 +++++++++++ 2 files changed, 41 insertions(+), 3 deletions(-)
--- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -967,10 +967,37 @@ static inline bool htlb_allow_alloc_fall static inline spinlock_t *huge_pte_lockptr(struct hstate *h, struct mm_struct *mm, pte_t *pte) { - if (huge_page_size(h) == PMD_SIZE) + const unsigned long size = huge_page_size(h); + + VM_WARN_ON(size == PAGE_SIZE); + + /* + * hugetlb must use the exact same PT locks as core-mm page table + * walkers would. When modifying a PTE table, hugetlb must take the + * PTE PT lock, when modifying a PMD table, hugetlb must take the PMD + * PT lock etc. + * + * The expectation is that any hugetlb folio smaller than a PMD is + * always mapped into a single PTE table and that any hugetlb folio + * smaller than a PUD (but at least as big as a PMD) is always mapped + * into a single PMD table. + * + * If that does not hold for an architecture, then that architecture + * must disable split PT locks such that all *_lockptr() functions + * will give us the same result: the per-MM PT lock. + * + * Note that with e.g., CONFIG_PGTABLE_LEVELS=2 where + * PGDIR_SIZE==P4D_SIZE==PUD_SIZE==PMD_SIZE, we'd use pud_lockptr() + * and core-mm would use pmd_lockptr(). However, in such configurations + * split PMD locks are disabled -- they don't make sense on a single + * PGDIR page table -- and the end result is the same. + */ + if (size >= PUD_SIZE) + return pud_lockptr(mm, (pud_t *) pte); + else if (size >= PMD_SIZE || IS_ENABLED(CONFIG_HIGHPTE)) return pmd_lockptr(mm, (pmd_t *) pte); - VM_BUG_ON(huge_page_size(h) == PAGE_SIZE); - return &mm->page_table_lock; + /* pte_alloc_huge() only applies with !CONFIG_HIGHPTE */ + return ptep_lockptr(mm, pte); }
#ifndef hugepages_supported --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2960,6 +2960,13 @@ static inline spinlock_t *pte_lockptr(st return ptlock_ptr(page_ptdesc(pmd_page(*pmd))); }
+static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte) +{ + BUILD_BUG_ON(IS_ENABLED(CONFIG_HIGHPTE)); + BUILD_BUG_ON(MAX_PTRS_PER_PTE * sizeof(pte_t) > PAGE_SIZE); + return ptlock_ptr(virt_to_ptdesc(pte)); +} + static inline bool ptlock_init(struct ptdesc *ptdesc) { /* @@ -2984,6 +2991,10 @@ static inline spinlock_t *pte_lockptr(st { return &mm->page_table_lock; } +static inline spinlock_t *ptep_lockptr(struct mm_struct *mm, pte_t *pte) +{ + return &mm->page_table_lock; +} static inline void ptlock_cache_init(void) {} static inline bool ptlock_init(struct ptdesc *ptdesc) { return true; } static inline void ptlock_free(struct ptdesc *ptdesc) {}
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yu Kuai yukuai3@huawei.com
commit c916ca35308d3187c9928664f9be249b22a3a701 upstream.
read_balance() will avoid reading from slow disks as much as possible, however, if valid data only lands in slow disks, and a new normal disk is still in recovery, unrecovered data can be read:
raid1_read_request read_balance raid1_should_read_first -> return false choose_best_rdev -> normal disk is not recovered, return -1 choose_bb_rdev -> missing the checking of recovery, return the normal disk -> read unrecovered data
Root cause is that the checking of recovery is missing in choose_bb_rdev(). Hence add such checking to fix the problem.
Also fix similar problem in choose_slow_rdev().
Cc: stable@vger.kernel.org Fixes: 9f3ced792203 ("md/raid1: factor out choose_bb_rdev() from read_balance()") Fixes: dfa8ecd167c1 ("md/raid1: factor out choose_slow_rdev() from read_balance()") Reported-and-tested-by: Mateusz Jończyk mat.jonczyk@o2.pl Closes: https://lore.kernel.org/all/9952f532-2554-44bf-b906-4880b2e88e3a@o2.pl/ Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20240803091137.3197008-1-yukuai1@huaweicloud.com Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/raid1.c | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 7acfe7c9dc8d..761989d67906 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -617,6 +617,12 @@ static int choose_first_rdev(struct r1conf *conf, struct r1bio *r1_bio, return -1; }
+static bool rdev_in_recovery(struct md_rdev *rdev, struct r1bio *r1_bio) +{ + return !test_bit(In_sync, &rdev->flags) && + rdev->recovery_offset < r1_bio->sector + r1_bio->sectors; +} + static int choose_bb_rdev(struct r1conf *conf, struct r1bio *r1_bio, int *max_sectors) { @@ -635,6 +641,7 @@ static int choose_bb_rdev(struct r1conf *conf, struct r1bio *r1_bio,
rdev = conf->mirrors[disk].rdev; if (!rdev || test_bit(Faulty, &rdev->flags) || + rdev_in_recovery(rdev, r1_bio) || test_bit(WriteMostly, &rdev->flags)) continue;
@@ -673,7 +680,8 @@ static int choose_slow_rdev(struct r1conf *conf, struct r1bio *r1_bio,
rdev = conf->mirrors[disk].rdev; if (!rdev || test_bit(Faulty, &rdev->flags) || - !test_bit(WriteMostly, &rdev->flags)) + !test_bit(WriteMostly, &rdev->flags) || + rdev_in_recovery(rdev, r1_bio)) continue;
/* there are no bad blocks, we can use this disk */ @@ -733,9 +741,7 @@ static bool rdev_readable(struct md_rdev *rdev, struct r1bio *r1_bio) if (!rdev || test_bit(Faulty, &rdev->flags)) return false;
- /* still in recovery */ - if (!test_bit(In_sync, &rdev->flags) && - rdev->recovery_offset < r1_bio->sector + r1_bio->sectors) + if (rdev_in_recovery(rdev, r1_bio)) return false;
/* don't read from slow disk unless have to */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Haiyang Zhang haiyangz@microsoft.com
commit 32316f676b4ee87c0404d333d248ccf777f739bc upstream.
The MANA driver's RX buffer alloc_size is passed into napi_build_skb() to create SKB. skb_shinfo(skb) is located at the end of skb, and its alignment is affected by the alloc_size passed into napi_build_skb(). The size needs to be aligned properly for better performance and atomic operations. Otherwise, on ARM64 CPU, for certain MTU settings like 4000, atomic operations may panic on the skb_shinfo(skb)->dataref due to alignment fault.
To fix this bug, add proper alignment to the alloc_size calculation.
Sample panic info: [ 253.298819] Unable to handle kernel paging request at virtual address ffff000129ba5cce [ 253.300900] Mem abort info: [ 253.301760] ESR = 0x0000000096000021 [ 253.302825] EC = 0x25: DABT (current EL), IL = 32 bits [ 253.304268] SET = 0, FnV = 0 [ 253.305172] EA = 0, S1PTW = 0 [ 253.306103] FSC = 0x21: alignment fault Call trace: __skb_clone+0xfc/0x198 skb_clone+0x78/0xe0 raw6_local_deliver+0xfc/0x228 ip6_protocol_deliver_rcu+0x80/0x500 ip6_input_finish+0x48/0x80 ip6_input+0x48/0xc0 ip6_sublist_rcv_finish+0x50/0x78 ip6_sublist_rcv+0x1cc/0x2b8 ipv6_list_rcv+0x100/0x150 __netif_receive_skb_list_core+0x180/0x220 netif_receive_skb_list_internal+0x198/0x2a8 __napi_poll+0x138/0x250 net_rx_action+0x148/0x330 handle_softirqs+0x12c/0x3a0
Cc: stable@vger.kernel.org Fixes: 80f6215b450e ("net: mana: Add support for jumbo frame") Signed-off-by: Haiyang Zhang haiyangz@microsoft.com Reviewed-by: Long Li longli@microsoft.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/microsoft/mana/mana_en.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -599,7 +599,11 @@ static void mana_get_rxbuf_cfg(int mtu, else *headroom = XDP_PACKET_HEADROOM;
- *alloc_size = mtu + MANA_RXBUF_PAD + *headroom; + *alloc_size = SKB_DATA_ALIGN(mtu + MANA_RXBUF_PAD + *headroom); + + /* Using page pool in this case, so alloc_size is PAGE_SIZE */ + if (*alloc_size < PAGE_SIZE) + *alloc_size = PAGE_SIZE;
*datasize = mtu + ETH_HLEN; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 63de936b513f7a9ce559194d3269ac291f4f4662 upstream.
Commit a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG) support") broke BYT support because it removed a seemingly unused field from struct sh_css_sp_config and a seemingly unused value from enum ia_css_input_mode.
But these are part of the ABI between the kernel and firmware on ISP2400 and this part of the TPG support removal changes broke ISP2400 support.
ISP2401 support was not affected because on ISP2401 only a part of struct sh_css_sp_config is used.
Restore the removed field and enum value to fix this.
Fixes: a0821ca14bb8 ("media: atomisp: Remove test pattern generator (TPG) support") Cc: stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/media/atomisp/pci/ia_css_stream_public.h | 8 ++++-- drivers/staging/media/atomisp/pci/sh_css_internal.h | 19 ++++++++++++--- 2 files changed, 22 insertions(+), 5 deletions(-)
--- a/drivers/staging/media/atomisp/pci/ia_css_stream_public.h +++ b/drivers/staging/media/atomisp/pci/ia_css_stream_public.h @@ -27,12 +27,16 @@ #include "ia_css_prbs.h" #include "ia_css_input_port.h"
-/* Input modes, these enumerate all supported input modes. - * Note that not all ISP modes support all input modes. +/* + * Input modes, these enumerate all supported input modes. + * This enum is part of the atomisp firmware ABI and must + * NOT be changed! + * Note that not all ISP modes support all input modes. */ enum ia_css_input_mode { IA_CSS_INPUT_MODE_SENSOR, /** data from sensor */ IA_CSS_INPUT_MODE_FIFO, /** data from input-fifo */ + IA_CSS_INPUT_MODE_TPG, /** data from test-pattern generator */ IA_CSS_INPUT_MODE_PRBS, /** data from pseudo-random bit stream */ IA_CSS_INPUT_MODE_MEMORY, /** data from a frame in memory */ IA_CSS_INPUT_MODE_BUFFERED_SENSOR /** data is sent through mipi buffer */ --- a/drivers/staging/media/atomisp/pci/sh_css_internal.h +++ b/drivers/staging/media/atomisp/pci/sh_css_internal.h @@ -341,7 +341,14 @@ struct sh_css_sp_input_formatter_set {
#define IA_CSS_MIPI_SIZE_CHECK_MAX_NOF_ENTRIES_PER_PORT (3)
-/* SP configuration information */ +/* + * SP configuration information + * + * This struct is part of the atomisp firmware ABI and is directly copied + * to ISP DRAM by sh_css_store_sp_group_to_ddr() + * + * Do NOT change this struct's layout or remove seemingly unused fields! + */ struct sh_css_sp_config { u8 no_isp_sync; /* Signal host immediately after start */ u8 enable_raw_pool_locking; /** Enable Raw Buffer Locking for HALv3 Support */ @@ -351,6 +358,10 @@ struct sh_css_sp_config { host (true) or when they are passed to the preview/video pipe (false). */
+ /* + * Note the fields below are only used on the ISP2400 not on the ISP2401, + * sh_css_store_sp_group_to_ddr() skip copying these when run on the ISP2401. + */ struct { u8 a_changed; u8 b_changed; @@ -360,11 +371,13 @@ struct sh_css_sp_config { } input_formatter;
sync_generator_cfg_t sync_gen; + tpg_cfg_t tpg; prbs_cfg_t prbs; input_system_cfg_t input_circuit; u8 input_circuit_cfg_changed; - u32 mipi_sizes_for_check[N_CSI_PORTS][IA_CSS_MIPI_SIZE_CHECK_MAX_NOF_ENTRIES_PER_PORT]; - u8 enable_isys_event_queue; + u32 mipi_sizes_for_check[N_CSI_PORTS][IA_CSS_MIPI_SIZE_CHECK_MAX_NOF_ENTRIES_PER_PORT]; + /* These last 2 fields are used on both the ISP2400 and the ISP2401 */ + u8 enable_isys_event_queue; u8 disable_cont_vf; };
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Long Li longli@microsoft.com
commit 58a63729c957621f1990c3494c702711188ca347 upstream.
After napi_complete_done() is called when NAPI is polling in the current process context, another NAPI may be scheduled and start running in softirq on another CPU and may ring the doorbell before the current CPU does. When combined with unnecessary rings when there is no need to arm the CQ, it triggers error paths in the hardware.
This patch fixes this by calling napi_complete_done() after doorbell rings. It limits the number of unnecessary rings when there is no need to arm. MANA hardware specifies that there must be one doorbell ring every 8 CQ wraparounds. This driver guarantees one doorbell ring as soon as the number of consumed CQEs exceeds 4 CQ wraparounds. In practical workloads, the 4 CQ wraparounds proves to be big enough that it rarely exceeds this limit before all the napi weight is consumed.
To implement this, add a per-CQ counter cq->work_done_since_doorbell, and make sure the CQ is armed as soon as passing 4 wraparounds of the CQ.
Cc: stable@vger.kernel.org Fixes: e1b5683ff62e ("net: mana: Move NAPI from EQ to CQ") Reviewed-by: Haiyang Zhang haiyangz@microsoft.com Signed-off-by: Long Li longli@microsoft.com Link: https://patch.msgid.link/1723219138-29887-1-git-send-email-longli@linuxonhyp... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/microsoft/mana/mana_en.c | 22 ++++++++++++++-------- include/net/mana/mana.h | 1 + 2 files changed, 15 insertions(+), 8 deletions(-)
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c @@ -1777,7 +1777,6 @@ static void mana_poll_rx_cq(struct mana_ static int mana_cq_handler(void *context, struct gdma_queue *gdma_queue) { struct mana_cq *cq = context; - u8 arm_bit; int w;
WARN_ON_ONCE(cq->gdma_cq != gdma_queue); @@ -1788,16 +1787,23 @@ static int mana_cq_handler(void *context mana_poll_tx_cq(cq);
w = cq->work_done; + cq->work_done_since_doorbell += w;
- if (w < cq->budget && - napi_complete_done(&cq->napi, w)) { - arm_bit = SET_ARM_BIT; - } else { - arm_bit = 0; + if (w < cq->budget) { + mana_gd_ring_cq(gdma_queue, SET_ARM_BIT); + cq->work_done_since_doorbell = 0; + napi_complete_done(&cq->napi, w); + } else if (cq->work_done_since_doorbell > + cq->gdma_cq->queue_size / COMP_ENTRY_SIZE * 4) { + /* MANA hardware requires at least one doorbell ring every 8 + * wraparounds of CQ even if there is no need to arm the CQ. + * This driver rings the doorbell as soon as we have exceeded + * 4 wraparounds. + */ + mana_gd_ring_cq(gdma_queue, 0); + cq->work_done_since_doorbell = 0; }
- mana_gd_ring_cq(gdma_queue, arm_bit); - return w; }
--- a/include/net/mana/mana.h +++ b/include/net/mana/mana.h @@ -274,6 +274,7 @@ struct mana_cq { /* NAPI data */ struct napi_struct napi; int work_done; + int work_done_since_doorbell; int budget; };
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Janne Grunau j@jannau.net
commit 2ad4e1ada8eebafa2d75a4b75eeeca882de6ada1 upstream.
wpa_supplicant 2.11 sends since 1efdba5fdc2c ("Handle PMKSA flush in the driver for SAE/OWE offload cases") SSID based PMKSA del commands. brcmfmac is not prepared and tries to dereference the NULL bssid and pmkid pointers in cfg80211_pmksa. PMKID_V3 operations support SSID based updates so copy the SSID.
Fixes: a96202acaea4 ("wifi: brcmfmac: cfg80211: Add support for PMKID_V3 operations") Cc: stable@vger.kernel.org # 6.4.x Signed-off-by: Janne Grunau j@jannau.net Reviewed-by: Neal Gompa neal@gompa.dev Acked-by: Arend van Spriel arend.vanspriel@broadcom.com Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://patch.msgid.link/20240803-brcmfmac_pmksa_del_ssid-v1-1-4e85f19135e1@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 13 +++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c @@ -4320,9 +4320,16 @@ brcmf_pmksa_v3_op(struct brcmf_if *ifp, /* Single PMK operation */ pmk_op->count = cpu_to_le16(1); length += sizeof(struct brcmf_pmksa_v3); - memcpy(pmk_op->pmk[0].bssid, pmksa->bssid, ETH_ALEN); - memcpy(pmk_op->pmk[0].pmkid, pmksa->pmkid, WLAN_PMKID_LEN); - pmk_op->pmk[0].pmkid_len = WLAN_PMKID_LEN; + if (pmksa->bssid) + memcpy(pmk_op->pmk[0].bssid, pmksa->bssid, ETH_ALEN); + if (pmksa->pmkid) { + memcpy(pmk_op->pmk[0].pmkid, pmksa->pmkid, WLAN_PMKID_LEN); + pmk_op->pmk[0].pmkid_len = WLAN_PMKID_LEN; + } + if (pmksa->ssid && pmksa->ssid_len) { + memcpy(pmk_op->pmk[0].ssid.SSID, pmksa->ssid, pmksa->ssid_len); + pmk_op->pmk[0].ssid.SSID_len = pmksa->ssid_len; + } pmk_op->pmk[0].time_left = cpu_to_le32(alive ? BRCMF_PMKSA_NO_EXPIRY : 0); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
commit f71aa06398aabc2e3eaac25acdf3d62e0094ba70 upstream.
This fixes a NULL pointer dereference bug due to a data race which looks like this:
BUG: kernel NULL pointer dereference, address: 0000000000000008 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 33 PID: 16573 Comm: kworker/u97:799 Not tainted 6.8.7-cm4all1-hp+ #43 Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 10/17/2018 Workqueue: events_unbound netfs_rreq_write_to_cache_work RIP: 0010:cachefiles_prepare_write+0x30/0xa0 Code: 57 41 56 45 89 ce 41 55 49 89 cd 41 54 49 89 d4 55 53 48 89 fb 48 83 ec 08 48 8b 47 08 48 83 7f 10 00 48 89 34 24 48 8b 68 20 <48> 8b 45 08 4c 8b 38 74 45 49 8b 7f 50 e8 4e a9 b0 ff 48 8b 73 10 RSP: 0018:ffffb4e78113bde0 EFLAGS: 00010286 RAX: ffff976126be6d10 RBX: ffff97615cdb8438 RCX: 0000000000020000 RDX: ffff97605e6c4c68 RSI: ffff97605e6c4c60 RDI: ffff97615cdb8438 RBP: 0000000000000000 R08: 0000000000278333 R09: 0000000000000001 R10: ffff97605e6c4600 R11: 0000000000000001 R12: ffff97605e6c4c68 R13: 0000000000020000 R14: 0000000000000001 R15: ffff976064fe2c00 FS: 0000000000000000(0000) GS:ffff9776dfd40000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000008 CR3: 000000005942c002 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x1f/0x70 ? page_fault_oops+0x15d/0x440 ? search_module_extables+0xe/0x40 ? fixup_exception+0x22/0x2f0 ? exc_page_fault+0x5f/0x100 ? asm_exc_page_fault+0x22/0x30 ? cachefiles_prepare_write+0x30/0xa0 netfs_rreq_write_to_cache_work+0x135/0x2e0 process_one_work+0x137/0x2c0 worker_thread+0x2e9/0x400 ? __pfx_worker_thread+0x10/0x10 kthread+0xcc/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> Modules linked in: CR2: 0000000000000008 ---[ end trace 0000000000000000 ]---
This happened because fscache_cookie_state_machine() was slow and was still running while another process invoked fscache_unuse_cookie(); this led to a fscache_cookie_lru_do_one() call, setting the FSCACHE_COOKIE_DO_LRU_DISCARD flag, which was picked up by fscache_cookie_state_machine(), withdrawing the cookie via cachefiles_withdraw_cookie(), clearing cookie->cache_priv.
At the same time, yet another process invoked cachefiles_prepare_write(), which found a NULL pointer in this code line:
struct cachefiles_object *object = cachefiles_cres_object(cres);
The next line crashes, obviously:
struct cachefiles_cache *cache = object->volume->cache;
During cachefiles_prepare_write(), the "n_accesses" counter is non-zero (via fscache_begin_operation()). The cookie must not be withdrawn until it drops to zero.
The counter is checked by fscache_cookie_state_machine() before switching to FSCACHE_COOKIE_STATE_RELINQUISHING and FSCACHE_COOKIE_STATE_WITHDRAWING (in "case FSCACHE_COOKIE_STATE_FAILED"), but not for FSCACHE_COOKIE_STATE_LRU_DISCARDING ("case FSCACHE_COOKIE_STATE_ACTIVE").
This patch adds the missing check. With a non-zero access counter, the function returns and the next fscache_end_cookie_access() call will queue another fscache_cookie_state_machine() call to handle the still-pending FSCACHE_COOKIE_DO_LRU_DISCARD.
Fixes: 12bb21a29c19 ("fscache: Implement cookie user counting and resource pinning") Signed-off-by: Max Kellermann max.kellermann@ionos.com Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/20240729162002.3436763-2-dhowells@redhat.com cc: Jeff Layton jlayton@kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org cc: stable@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/netfs/fscache_cookie.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/netfs/fscache_cookie.c +++ b/fs/netfs/fscache_cookie.c @@ -741,6 +741,10 @@ again_locked: spin_lock(&cookie->lock); } if (test_bit(FSCACHE_COOKIE_DO_LRU_DISCARD, &cookie->flags)) { + if (atomic_read(&cookie->n_accesses) != 0) + /* still being accessed: postpone it */ + break; + __fscache_set_cookie_state(cookie, FSCACHE_COOKIE_STATE_LRU_DISCARDING); wake = true;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhen Lei thunder.leizhen@huawei.com
commit 379d9af3f3da2da1bbfa67baf1820c72a080d1f1 upstream.
The count increases only when a node is successfully added to the linked list.
Cc: stable@vger.kernel.org Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls") Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Acked-by: Stephen Smalley stephen.smalley.work@gmail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/selinux/avc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/security/selinux/avc.c +++ b/security/selinux/avc.c @@ -330,12 +330,12 @@ static int avc_add_xperms_decision(struc { struct avc_xperms_decision_node *dest_xpd;
- node->ae.xp_node->xp.len++; dest_xpd = avc_xperms_decision_alloc(src->used); if (!dest_xpd) return -ENOMEM; avc_copy_xperms_decision(&dest_xpd->xpd, src); list_add(&dest_xpd->xpd_list, &node->ae.xp_node->xpd_head); + node->ae.xp_node->xp.len++; return 0; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhen Lei thunder.leizhen@huawei.com
commit 6dd1e4c045afa6a4ba5d46f044c83bd357c593c2 upstream.
When avc_add_xperms_decision() fails, the information recorded by the new avc node is incomplete. In this case, the new avc node should be released instead of replacing the old avc node.
Cc: stable@vger.kernel.org Fixes: fa1aa143ac4a ("selinux: extended permissions for ioctls") Suggested-by: Stephen Smalley stephen.smalley.work@gmail.com Signed-off-by: Zhen Lei thunder.leizhen@huawei.com Acked-by: Stephen Smalley stephen.smalley.work@gmail.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/selinux/avc.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/security/selinux/avc.c +++ b/security/selinux/avc.c @@ -907,7 +907,11 @@ static int avc_update_node(u32 event, u3 node->ae.avd.auditdeny &= ~perms; break; case AVC_CALLBACK_ADD_XPERMS: - avc_add_xperms_decision(node, xpd); + rc = avc_add_xperms_decision(node, xpd); + if (rc) { + avc_node_kill(node); + goto out_unlock; + } break; } avc_node_replace(node, orig);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suren Baghdasaryan surenb@google.com
commit 766c163c2068b45330664fb67df67268e588a22d upstream.
During CMA activation, pages in CMA area are prepared and then freed without being allocated. This triggers warnings when memory allocation debug config (CONFIG_MEM_ALLOC_PROFILING_DEBUG) is enabled. Fix this by marking these pages not tagged before freeing them.
Link: https://lkml.kernel.org/r/20240813150758.855881-2-surenb@google.com Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Suren Baghdasaryan surenb@google.com Acked-by: David Hildenbrand david@redhat.com Cc: Kees Cook keescook@chromium.org Cc: Kent Overstreet kent.overstreet@linux.dev Cc: Pasha Tatashin pasha.tatashin@soleen.com Cc: Sourav Panda souravpanda@google.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org [6.10] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mm_init.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2293,6 +2293,8 @@ void __init init_cma_reserved_pageblock(
set_pageblock_migratetype(page, MIGRATE_CMA); set_page_refcounted(page); + /* pages were reserved and not allocated */ + clear_page_tag_ref(page); __free_pages(page, pageblock_order);
adjust_managed_page_count(page, pageblock_nr_pages);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
commit d75abd0d0bc29e6ebfebbf76d11b4067b35844af upstream.
The memory_failure_cpu structure is a per-cpu structure. Access to its content requires the use of get_cpu_var() to lock in the current CPU and disable preemption. The use of a regular spinlock_t for locking purpose is fine for a non-RT kernel.
Since the integration of RT spinlock support into the v5.15 kernel, a spinlock_t in a RT kernel becomes a sleeping lock and taking a sleeping lock in a preemption disabled context is illegal resulting in the following kind of warning.
[12135.732244] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48 [12135.732248] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 270076, name: kworker/0:0 [12135.732252] preempt_count: 1, expected: 0 [12135.732255] RCU nest depth: 2, expected: 2 : [12135.732420] Hardware name: Dell Inc. PowerEdge R640/0HG0J8, BIOS 2.10.2 02/24/2021 [12135.732423] Workqueue: kacpi_notify acpi_os_execute_deferred [12135.732433] Call Trace: [12135.732436] <TASK> [12135.732450] dump_stack_lvl+0x57/0x81 [12135.732461] __might_resched.cold+0xf4/0x12f [12135.732479] rt_spin_lock+0x4c/0x100 [12135.732491] memory_failure_queue+0x40/0xe0 [12135.732503] ghes_do_memory_failure+0x53/0x390 [12135.732516] ghes_do_proc.constprop.0+0x229/0x3e0 [12135.732575] ghes_proc+0xf9/0x1a0 [12135.732591] ghes_notify_hed+0x6a/0x150 [12135.732602] notifier_call_chain+0x43/0xb0 [12135.732626] blocking_notifier_call_chain+0x43/0x60 [12135.732637] acpi_ev_notify_dispatch+0x47/0x70 [12135.732648] acpi_os_execute_deferred+0x13/0x20 [12135.732654] process_one_work+0x41f/0x500 [12135.732695] worker_thread+0x192/0x360 [12135.732715] kthread+0x111/0x140 [12135.732733] ret_from_fork+0x29/0x50 [12135.732779] </TASK>
Fix it by using a raw_spinlock_t for locking instead.
Also move the pr_err() out of the lock critical section and after put_cpu_ptr() to avoid indeterminate latency and the possibility of sleep with this call.
[longman@redhat.com: don't hold percpu ref across pr_err(), per Miaohe] Link: https://lkml.kernel.org/r/20240807181130.1122660-1-longman@redhat.com Link: https://lkml.kernel.org/r/20240806164107.1044956-1-longman@redhat.com Fixes: 0f383b6dc96e ("locking/spinlock: Provide RT variant") Signed-off-by: Waiman Long longman@redhat.com Acked-by: Miaohe Lin linmiaohe@huawei.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Juri Lelli juri.lelli@redhat.com Cc: Len Brown len.brown@intel.com Cc: Naoya Horiguchi nao.horiguchi@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory-failure.c | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-)
--- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -2406,7 +2406,7 @@ struct memory_failure_entry { struct memory_failure_cpu { DECLARE_KFIFO(fifo, struct memory_failure_entry, MEMORY_FAILURE_FIFO_SIZE); - spinlock_t lock; + raw_spinlock_t lock; struct work_struct work; };
@@ -2432,20 +2432,22 @@ void memory_failure_queue(unsigned long { struct memory_failure_cpu *mf_cpu; unsigned long proc_flags; + bool buffer_overflow; struct memory_failure_entry entry = { .pfn = pfn, .flags = flags, };
mf_cpu = &get_cpu_var(memory_failure_cpu); - spin_lock_irqsave(&mf_cpu->lock, proc_flags); - if (kfifo_put(&mf_cpu->fifo, entry)) + raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags); + buffer_overflow = !kfifo_put(&mf_cpu->fifo, entry); + if (!buffer_overflow) schedule_work_on(smp_processor_id(), &mf_cpu->work); - else + raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); + put_cpu_var(memory_failure_cpu); + if (buffer_overflow) pr_err("buffer overflow when queuing memory failure at %#lx\n", pfn); - spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); - put_cpu_var(memory_failure_cpu); } EXPORT_SYMBOL_GPL(memory_failure_queue);
@@ -2458,9 +2460,9 @@ static void memory_failure_work_func(str
mf_cpu = container_of(work, struct memory_failure_cpu, work); for (;;) { - spin_lock_irqsave(&mf_cpu->lock, proc_flags); + raw_spin_lock_irqsave(&mf_cpu->lock, proc_flags); gotten = kfifo_get(&mf_cpu->fifo, &entry); - spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); + raw_spin_unlock_irqrestore(&mf_cpu->lock, proc_flags); if (!gotten) break; if (entry.flags & MF_SOFT_OFFLINE) @@ -2490,7 +2492,7 @@ static int __init memory_failure_init(vo
for_each_possible_cpu(cpu) { mf_cpu = &per_cpu(memory_failure_cpu, cpu); - spin_lock_init(&mf_cpu->lock); + raw_spin_lock_init(&mf_cpu->lock); INIT_KFIFO(mf_cpu->fifo); INIT_WORK(&mf_cpu->work, memory_failure_work_func); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Muhammad Usama Anjum usama.anjum@collabora.com
commit 7c5e8d212d7d81991a580e7de3904ea213d9a852 upstream.
[1] mentions that memfd_secret is only supported on arm64, riscv, x86 and x86_64 for now. It doesn't support other architectures. I found the build error on arm and decided to send the fix as it was creating noise on KernelCI:
memfd_secret.c: In function 'memfd_secret': memfd_secret.c:42:24: error: '__NR_memfd_secret' undeclared (first use in this function); did you mean 'memfd_secret'? 42 | return syscall(__NR_memfd_secret, flags); | ^~~~~~~~~~~~~~~~~ | memfd_secret
Hence I'm adding condition that memfd_secret should only be compiled on supported architectures.
Also check in run_vmtests script if memfd_secret binary is present before executing it.
Link: https://lkml.kernel.org/r/20240812061522.1933054-1-usama.anjum@collabora.com Link: https://lore.kernel.org/all/20210518072034.31572-7-rppt@kernel.org/ [1] Link: https://lkml.kernel.org/r/20240809075642.403247-1-usama.anjum@collabora.com Fixes: 76fe17ef588a ("secretmem: test: add basic selftest for memfd_secret(2)") Signed-off-by: Muhammad Usama Anjum usama.anjum@collabora.com Reviewed-by: Shuah Khan skhan@linuxfoundation.org Acked-by: Mike Rapoport (Microsoft) rppt@kernel.org Cc: Albert Ou aou@eecs.berkeley.edu Cc: James Bottomley James.Bottomley@HansenPartnership.com Cc: Mike Rapoport (Microsoft) rppt@kernel.org Cc: Palmer Dabbelt palmer@dabbelt.com Cc: Paul Walmsley paul.walmsley@sifive.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/Makefile | 2 ++ tools/testing/selftests/mm/run_vmtests.sh | 3 +++ 2 files changed, 5 insertions(+)
--- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -51,7 +51,9 @@ TEST_GEN_FILES += madv_populate TEST_GEN_FILES += map_fixed_noreplace TEST_GEN_FILES += map_hugetlb TEST_GEN_FILES += map_populate +ifneq (,$(filter $(ARCH),arm64 riscv riscv64 x86 x86_64)) TEST_GEN_FILES += memfd_secret +endif TEST_GEN_FILES += migration TEST_GEN_FILES += mkdirty TEST_GEN_FILES += mlock-random-test --- a/tools/testing/selftests/mm/run_vmtests.sh +++ b/tools/testing/selftests/mm/run_vmtests.sh @@ -367,8 +367,11 @@ CATEGORY="hmm" run_test bash ./test_hmm. # MADV_POPULATE_READ and MADV_POPULATE_WRITE tests CATEGORY="madv_populate" run_test ./madv_populate
+if [ -x ./memfd_secret ] +then (echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope 2>&1) | tap_prefix CATEGORY="memfd_secret" run_test ./memfd_secret +fi
# KSM KSM_MERGE_TIME_HUGE_PAGES test with size of 100 CATEGORY="ksm" run_test ./ksm_tests -H -s 100
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suren Baghdasaryan surenb@google.com
commit a8fc28dad6d574582cdf2f7e78c73c59c623df30 upstream.
In several cases we are freeing pages which were not allocated using common page allocators. For such cases, in order to keep allocation accounting correct, we should clear the page tag to indicate that the page being freed is expected to not have a valid allocation tag. Introduce clear_page_tag_ref() helper function to be used for this.
Link: https://lkml.kernel.org/r/20240813150758.855881-1-surenb@google.com Fixes: d224eb0287fb ("codetag: debug: mark codetags for reserved pages as empty") Signed-off-by: Suren Baghdasaryan surenb@google.com Suggested-by: David Hildenbrand david@redhat.com Acked-by: David Hildenbrand david@redhat.com Reviewed-by: Pasha Tatashin pasha.tatashin@soleen.com Cc: Kees Cook keescook@chromium.org Cc: Kent Overstreet kent.overstreet@linux.dev Cc: Sourav Panda souravpanda@google.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org [6.10] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/pgalloc_tag.h | 13 +++++++++++++ mm/mm_init.c | 10 +--------- mm/page_alloc.c | 9 +-------- 3 files changed, 15 insertions(+), 17 deletions(-)
--- a/include/linux/pgalloc_tag.h +++ b/include/linux/pgalloc_tag.h @@ -43,6 +43,18 @@ static inline void put_page_tag_ref(unio page_ext_put(page_ext_from_codetag_ref(ref)); }
+static inline void clear_page_tag_ref(struct page *page) +{ + if (mem_alloc_profiling_enabled()) { + union codetag_ref *ref = get_page_tag_ref(page); + + if (ref) { + set_codetag_empty(ref); + put_page_tag_ref(ref); + } + } +} + static inline void pgalloc_tag_add(struct page *page, struct task_struct *task, unsigned int nr) { @@ -126,6 +138,7 @@ static inline void pgalloc_tag_sub_pages
static inline union codetag_ref *get_page_tag_ref(struct page *page) { return NULL; } static inline void put_page_tag_ref(union codetag_ref *ref) {} +static inline void clear_page_tag_ref(struct page *page) {} static inline void pgalloc_tag_add(struct page *page, struct task_struct *task, unsigned int nr) {} static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {} --- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -2507,15 +2507,7 @@ void __init memblock_free_pages(struct p }
/* pages were reserved and not allocated */ - if (mem_alloc_profiling_enabled()) { - union codetag_ref *ref = get_page_tag_ref(page); - - if (ref) { - set_codetag_empty(ref); - put_page_tag_ref(ref); - } - } - + clear_page_tag_ref(page); __free_pages_core(page, order); }
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -5806,14 +5806,7 @@ unsigned long free_reserved_area(void *s
void free_reserved_page(struct page *page) { - if (mem_alloc_profiling_enabled()) { - union codetag_ref *ref = get_page_tag_ref(page); - - if (ref) { - set_codetag_empty(ref); - put_page_tag_ref(ref); - } - } + clear_page_tag_ref(page); ClearPageReserved(page); init_page_count(page); __free_page(page);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zi Yan ziy@nvidia.com
commit fd8c35a92910f4829b7c99841f39b1b952c259d5 upstream.
When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling") restructured do_huge_pmd_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pmd_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-3-ziy@nvidia.com Fixes: c5b5a3dd2c1f ("mm: thp: refactor NUMA fault handling") Reported-by: "Huang, Ying" ying.huang@intel.com Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel... Signed-off-by: Zi Yan ziy@nvidia.com Acked-by: David Hildenbrand david@redhat.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: "Huang, Ying" ying.huang@intel.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Mel Gorman mgorman@suse.de Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/huge_memory.c | 29 +++++++++++++---------------- 1 file changed, 13 insertions(+), 16 deletions(-)
--- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1672,7 +1672,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) { spin_unlock(vmf->ptl); - goto out; + return 0; }
pmd = pmd_modify(oldpmd, vma->vm_page_prot); @@ -1715,22 +1715,16 @@ vm_fault_t do_huge_pmd_numa_page(struct if (!migrate_misplaced_folio(folio, vma, target_nid)) { flags |= TNF_MIGRATED; nid = target_nid; - } else { - flags |= TNF_MIGRATE_FAIL; - vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); - if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) { - spin_unlock(vmf->ptl); - goto out; - } - goto out_map; - } - -out: - if (nid != NUMA_NO_NODE) task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags); + return 0; + }
- return 0; - + flags |= TNF_MIGRATE_FAIL; + vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd); + if (unlikely(!pmd_same(oldpmd, *vmf->pmd))) { + spin_unlock(vmf->ptl); + return 0; + } out_map: /* Restore the PMD */ pmd = pmd_modify(oldpmd, vma->vm_page_prot); @@ -1740,7 +1734,10 @@ out_map: set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd); update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); spin_unlock(vmf->ptl); - goto out; + + if (nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, nid, HPAGE_PMD_NR, flags); + return 0; }
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hailong Liu hailong.liu@oppo.com
commit 61ebe5a747da649057c37be1c37eb934b4af79ca upstream.
The __vmap_pages_range_noflush() assumes its argument pages** contains pages with the same page shift. However, since commit e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations"), if gfp_flags includes __GFP_NOFAIL with high order in vm_area_alloc_pages() and page allocation failed for high order, the pages** may contain two different page shifts (high order and order-0). This could lead __vmap_pages_range_noflush() to perform incorrect mappings, potentially resulting in memory corruption.
Users might encounter this as follows (vmap_allow_huge = true, 2M is for PMD_SIZE):
kvmalloc(2M, __GFP_NOFAIL|GFP_X) __vmalloc_node_range_noprof(vm_flags=VM_ALLOW_HUGE_VMAP) vm_area_alloc_pages(order=9) ---> order-9 allocation failed and fallback to order-0 vmap_pages_range() vmap_pages_range_noflush() __vmap_pages_range_noflush(page_shift = 21) ----> wrong mapping happens
We can remove the fallback code because if a high-order allocation fails, __vmalloc_node_range_noprof() will retry with order-0. Therefore, it is unnecessary to fallback to order-0 here. Therefore, fix this by removing the fallback code.
Link: https://lkml.kernel.org/r/20240808122019.3361-1-hailong.liu@oppo.com Fixes: e9c3cda4d86e ("mm, vmalloc: fix high order __GFP_NOFAIL allocations") Signed-off-by: Hailong Liu hailong.liu@oppo.com Reported-by: Tangquan Zheng zhengtangquan@oppo.com Reviewed-by: Baoquan He bhe@redhat.com Reviewed-by: Uladzislau Rezki (Sony) urezki@gmail.com Acked-by: Barry Song baohua@kernel.org Acked-by: Michal Hocko mhocko@suse.com Cc: Matthew Wilcox willy@infradead.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/vmalloc.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
--- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -3583,15 +3583,8 @@ vm_area_alloc_pages(gfp_t gfp, int nid, page = alloc_pages_noprof(alloc_gfp, order); else page = alloc_pages_node_noprof(nid, alloc_gfp, order); - if (unlikely(!page)) { - if (!nofail) - break; - - /* fall back to the zero order allocations */ - alloc_gfp |= __GFP_NOFAIL; - order = 0; - continue; - } + if (unlikely(!page)) + break;
/* * Higher order allocations must be able to be treated as
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zi Yan ziy@nvidia.com
commit 40b760cfd44566bca791c80e0720d70d75382b84 upstream.
When handling a numa page fault, task_numa_fault() should be called by a process that restores the page table of the faulted folio to avoid duplicated stats counting. Commit b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") restructured do_numa_page() and did not avoid task_numa_fault() call in the second page table check after a numa migration failure. Fix it by making all !pte_same() return immediately.
This issue can cause task_numa_fault() being called more than necessary and lead to unexpected numa balancing results (It is hard to tell whether the issue will cause positive or negative performance impact due to duplicated numa fault counting).
Link: https://lkml.kernel.org/r/20240809145906.1513458-2-ziy@nvidia.com Fixes: b99a342d4f11 ("NUMA balancing: reduce TLB flush via delaying mapping on hint page fault") Signed-off-by: Zi Yan ziy@nvidia.com Reported-by: "Huang, Ying" ying.huang@intel.com Closes: https://lore.kernel.org/linux-mm/87zfqfw0yw.fsf@yhuang6-desk2.ccr.corp.intel... Acked-by: David Hildenbrand david@redhat.com Cc: Baolin Wang baolin.wang@linux.alibaba.com Cc: Kefeng Wang wangkefeng.wang@huawei.com Cc: Mel Gorman mgorman@suse.de Cc: Yang Shi shy828301@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memory.c | 33 ++++++++++++++++----------------- 1 file changed, 16 insertions(+), 17 deletions(-)
--- a/mm/memory.c +++ b/mm/memory.c @@ -5155,7 +5155,7 @@ static vm_fault_t do_numa_page(struct vm
if (unlikely(!pte_same(old_pte, vmf->orig_pte))) { pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + return 0; }
pte = pte_modify(old_pte, vma->vm_page_prot); @@ -5218,23 +5218,19 @@ static vm_fault_t do_numa_page(struct vm if (!migrate_misplaced_folio(folio, vma, target_nid)) { nid = target_nid; flags |= TNF_MIGRATED; - } else { - flags |= TNF_MIGRATE_FAIL; - vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, - vmf->address, &vmf->ptl); - if (unlikely(!vmf->pte)) - goto out; - if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) { - pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; - } - goto out_map; + task_numa_fault(last_cpupid, nid, nr_pages, flags); + return 0; }
-out: - if (nid != NUMA_NO_NODE) - task_numa_fault(last_cpupid, nid, nr_pages, flags); - return 0; + flags |= TNF_MIGRATE_FAIL; + vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + if (unlikely(!vmf->pte)) + return 0; + if (unlikely(!pte_same(ptep_get(vmf->pte), vmf->orig_pte))) { + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; + } out_map: /* * Make it present again, depending on how arch implements @@ -5247,7 +5243,10 @@ out_map: numa_rebuild_single_mapping(vmf, vma, vmf->address, vmf->pte, writable); pte_unmap_unlock(vmf->pte, vmf->ptl); - goto out; + + if (nid != NUMA_NO_NODE) + task_numa_fault(last_cpupid, nid, nr_pages, flags); + return 0; }
static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 31723c9542dba1681cc3720571fdf12ffe0eddd9 upstream.
[REPORT] There is a bug report that kernel is rejecting a mismatching inode mode and its dir item:
[ 1881.553937] BTRFS critical (device dm-0): inode mode mismatch with dir: inode mode=040700 btrfs type=2 dir type=0
[CAUSE] It looks like the inode mode is correct, while the dir item type 0 is BTRFS_FT_UNKNOWN, which should not be generated by btrfs at all.
This may be caused by a memory bit flip.
[ENHANCEMENT] Although tree-checker is not able to do any cross-leaf verification, for this particular case we can at least reject any dir type with BTRFS_FT_UNKNOWN.
So here we enhance the dir type check from [0, BTRFS_FT_MAX), to (0, BTRFS_FT_MAX). Although the existing corruption can not be fixed just by such enhanced checking, it should prevent the same 0x2->0x0 bitflip for dir type to reach disk in the future.
Reported-by: Kota nospam@kota.moe Link: https://lore.kernel.org/linux-btrfs/CACsxjPYnQF9ZF-0OhH16dAx50=BXXOcP74MxBc3... CC: stable@vger.kernel.org # 5.4+ Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-checker.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -551,9 +551,10 @@ static int check_dir_item(struct extent_
/* dir type check */ dir_type = btrfs_dir_ftype(leaf, di); - if (unlikely(dir_type >= BTRFS_FT_MAX)) { + if (unlikely(dir_type <= BTRFS_FT_UNKNOWN || + dir_type >= BTRFS_FT_MAX)) { dir_item_err(leaf, slot, - "invalid dir item type, have %u expect [0, %u)", + "invalid dir item type, have %u expect (0, %u)", dir_type, BTRFS_FT_MAX); return -EUCLEAN; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit 46a6e10a1ab16cc71d4a3cab73e79aabadd6b8ea upstream.
If we a find that an extent is shared but its end offset is not sector size aligned, then we don't clone it and issue write operations instead. This is because the reflink (remap_file_range) operation does not allow to clone unaligned ranges, except if the end offset of the range matches the i_size of the source and destination files (and the start offset is sector size aligned).
While this is not incorrect because send can only guarantee that a file has the same data in the source and destination snapshots, it's not optimal and generates confusion and surprising behaviour for users.
For example, running this test:
$ cat test.sh #!/bin/bash
DEV=/dev/sdi MNT=/mnt/sdi
mkfs.btrfs -f $DEV mount $DEV $MNT
# Use a file size not aligned to any possible sector size. file_size=$((1 * 1024 * 1024 + 5)) # 1MB + 5 bytes dd if=/dev/random of=$MNT/foo bs=$file_size count=1 cp --reflink=always $MNT/foo $MNT/bar
btrfs subvolume snapshot -r $MNT/ $MNT/snap rm -f /tmp/send-test btrfs send -f /tmp/send-test $MNT/snap
umount $MNT mkfs.btrfs -f $DEV mount $DEV $MNT
btrfs receive -vv -f /tmp/send-test $MNT
xfs_io -r -c "fiemap -v" $MNT/snap/bar
umount $MNT
Gives the following result:
(...) mkfile o258-7-0 rename o258-7-0 -> bar write bar - offset=0 length=49152 write bar - offset=49152 length=49152 write bar - offset=98304 length=49152 write bar - offset=147456 length=49152 write bar - offset=196608 length=49152 write bar - offset=245760 length=49152 write bar - offset=294912 length=49152 write bar - offset=344064 length=49152 write bar - offset=393216 length=49152 write bar - offset=442368 length=49152 write bar - offset=491520 length=49152 write bar - offset=540672 length=49152 write bar - offset=589824 length=49152 write bar - offset=638976 length=49152 write bar - offset=688128 length=49152 write bar - offset=737280 length=49152 write bar - offset=786432 length=49152 write bar - offset=835584 length=49152 write bar - offset=884736 length=49152 write bar - offset=933888 length=49152 write bar - offset=983040 length=49152 write bar - offset=1032192 length=16389 chown bar - uid=0, gid=0 chmod bar - mode=0644 utimes bar utimes BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=06d640da-9ca1-604c-b87c-3375175a8eb3, stransid=7 /mnt/sdi/snap/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2055]: 26624..28679 2056 0x1
There's no clone operation to clone extents from the file foo into file bar and fiemap confirms there's no shared flag (0x2000).
So update send_write_or_clone() so that it proceeds with cloning if the source and destination ranges end at the i_size of the respective files.
After this changes the result of the test is:
(...) mkfile o258-7-0 rename o258-7-0 -> bar clone bar - source=foo source offset=0 offset=0 length=1048581 chown bar - uid=0, gid=0 chmod bar - mode=0644 utimes bar utimes BTRFS_IOC_SET_RECEIVED_SUBVOL uuid=582420f3-ea7d-564e-bbe5-ce440d622190, stransid=7 /mnt/sdi/snap/bar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..2055]: 26624..28679 2056 0x2001
A test case for fstests will also follow up soon.
Link: https://github.com/kdave/btrfs-progs/issues/572#issuecomment-2282841416 CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/send.c | 54 ++++++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 40 insertions(+), 14 deletions(-)
--- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -6158,25 +6158,51 @@ static int send_write_or_clone(struct se u64 offset = key->offset; u64 end; u64 bs = sctx->send_root->fs_info->sectorsize; + struct btrfs_file_extent_item *ei; + u64 disk_byte; + u64 data_offset; + u64 num_bytes; + struct btrfs_inode_info info = { 0 };
end = min_t(u64, btrfs_file_extent_end(path), sctx->cur_inode_size); if (offset >= end) return 0;
- if (clone_root && IS_ALIGNED(end, bs)) { - struct btrfs_file_extent_item *ei; - u64 disk_byte; - u64 data_offset; - - ei = btrfs_item_ptr(path->nodes[0], path->slots[0], - struct btrfs_file_extent_item); - disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei); - data_offset = btrfs_file_extent_offset(path->nodes[0], ei); - ret = clone_range(sctx, path, clone_root, disk_byte, - data_offset, offset, end - offset); - } else { - ret = send_extent_data(sctx, path, offset, end - offset); - } + num_bytes = end - offset; + + if (!clone_root) + goto write_data; + + if (IS_ALIGNED(end, bs)) + goto clone_data; + + /* + * If the extent end is not aligned, we can clone if the extent ends at + * the i_size of the inode and the clone range ends at the i_size of the + * source inode, otherwise the clone operation fails with -EINVAL. + */ + if (end != sctx->cur_inode_size) + goto write_data; + + ret = get_inode_info(clone_root->root, clone_root->ino, &info); + if (ret < 0) + return ret; + + if (clone_root->offset + num_bytes == info.size) + goto clone_data; + +write_data: + ret = send_extent_data(sctx, path, offset, num_bytes); + sctx->cur_inode_next_write_offset = end; + return ret; + +clone_data: + ei = btrfs_item_ptr(path->nodes[0], path->slots[0], + struct btrfs_file_extent_item); + disk_byte = btrfs_file_extent_disk_bytenr(path->nodes[0], ei); + data_offset = btrfs_file_extent_offset(path->nodes[0], ei); + ret = clone_range(sctx, path, clone_root, disk_byte, data_offset, offset, + num_bytes); sctx->cur_inode_next_write_offset = end; return ret; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
commit 42fac187b5c746227c92d024f1caf33bc1d337e4 upstream.
In the patch 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume") I added some code to handle file systems that had been corrupted by a bug that incorrectly skipped updating the drop progress key while dropping a snapshot. This code would check to see if we had already deleted our reference for a child block, and skip the deletion if we had already.
Unfortunately there is a bug, as the check would only check the on-disk references. I made an incorrect assumption that blocks in an already deleted snapshot that was having the deletion resume on mount wouldn't be modified.
If we have 2 pending deleted snapshots that share blocks, we can easily modify the rules for a block. Take the following example
subvolume a exists, and subvolume b is a snapshot of subvolume a. They share references to block 1. Block 1 will have 2 full references, one for subvolume a and one for subvolume b, and it belongs to subvolume a (btrfs_header_owner(block 1) == subvolume a).
When deleting subvolume a, we will drop our full reference for block 1, and because we are the owner we will drop our full reference for all of block 1's children, convert block 1 to FULL BACKREF, and add a shared reference to all of block 1's children.
Then we will start the snapshot deletion of subvolume b. We look up the extent info for block 1, which checks delayed refs and tells us that FULL BACKREF is set, so sets parent to the bytenr of block 1. However because this is a resumed snapshot deletion, we call into check_ref_exists(). Because check_ref_exists() only looks at the disk, it doesn't find the shared backref for the child of block 1, and thus returns 0 and we skip deleting the reference for the child of block 1 and continue. This orphans the child of block 1.
The fix is to lookup the delayed refs, similar to what we do in btrfs_lookup_extent_info(). However we only care about whether the reference exists or not. If we fail to find our reference on disk, go look up the bytenr in the delayed refs, and if it exists look for an existing ref in the delayed ref head. If that exists then we know we can delete the reference safely and carry on. If it doesn't exist we know we have to skip over this block.
This bug has existed since I introduced this fix, however requires having multiple deleted snapshots pending when we unmount. We noticed this in production because our shutdown path stops the container on the system, which deletes a bunch of subvolumes, and then reboots the box. This gives us plenty of opportunities to hit this issue. Looking at the history we've seen this occasionally in production, but we had a big spike recently thanks to faster machines getting jobs with multiple subvolumes in the job.
Chris Mason wrote a reproducer which does the following
mount /dev/nvme4n1 /btrfs btrfs subvol create /btrfs/s1 simoop -E -f 4k -n 200000 -z /btrfs/s1 while(true) ; do btrfs subvol snap /btrfs/s1 /btrfs/s2 simoop -f 4k -n 200000 -r 10 -z /btrfs/s2 btrfs subvol snap /btrfs/s2 /btrfs/s3 btrfs balance start -dusage=80 /btrfs btrfs subvol del /btrfs/s2 /btrfs/s3 umount /btrfs btrfsck /dev/nvme4n1 || exit 1 mount /dev/nvme4n1 /btrfs done
On the second loop this would fail consistently, with my patch it has been running for hours and hasn't failed.
I also used dm-log-writes to capture the state of the failure so I could debug the problem. Using the existing failure case to test my patch validated that it fixes the problem.
Fixes: 78c52d9eb6b7 ("btrfs: check for refs on snapshot delete resume") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Josef Bacik josef@toxicpanda.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/delayed-ref.c | 67 +++++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/delayed-ref.h | 2 + fs/btrfs/extent-tree.c | 51 ++++++++++++++++++++++++++++++++----- 3 files changed, 114 insertions(+), 6 deletions(-)
--- a/fs/btrfs/delayed-ref.c +++ b/fs/btrfs/delayed-ref.c @@ -1169,6 +1169,73 @@ btrfs_find_delayed_ref_head(struct btrfs return find_ref_head(delayed_refs, bytenr, false); }
+static int find_comp(struct btrfs_delayed_ref_node *entry, u64 root, u64 parent) +{ + int type = parent ? BTRFS_SHARED_BLOCK_REF_KEY : BTRFS_TREE_BLOCK_REF_KEY; + + if (type < entry->type) + return -1; + if (type > entry->type) + return 1; + + if (type == BTRFS_TREE_BLOCK_REF_KEY) { + if (root < entry->ref_root) + return -1; + if (root > entry->ref_root) + return 1; + } else { + if (parent < entry->parent) + return -1; + if (parent > entry->parent) + return 1; + } + return 0; +} + +/* + * Check to see if a given root/parent reference is attached to the head. This + * only checks for BTRFS_ADD_DELAYED_REF references that match, as that + * indicates the reference exists for the given root or parent. This is for + * tree blocks only. + * + * @head: the head of the bytenr we're searching. + * @root: the root objectid of the reference if it is a normal reference. + * @parent: the parent if this is a shared backref. + */ +bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head, + u64 root, u64 parent) +{ + struct rb_node *node; + bool found = false; + + lockdep_assert_held(&head->mutex); + + spin_lock(&head->lock); + node = head->ref_tree.rb_root.rb_node; + while (node) { + struct btrfs_delayed_ref_node *entry; + int ret; + + entry = rb_entry(node, struct btrfs_delayed_ref_node, ref_node); + ret = find_comp(entry, root, parent); + if (ret < 0) { + node = node->rb_left; + } else if (ret > 0) { + node = node->rb_right; + } else { + /* + * We only want to count ADD actions, as drops mean the + * ref doesn't exist. + */ + if (entry->action == BTRFS_ADD_DELAYED_REF) + found = true; + break; + } + } + spin_unlock(&head->lock); + return found; +} + void __cold btrfs_delayed_ref_exit(void) { kmem_cache_destroy(btrfs_delayed_ref_head_cachep); --- a/fs/btrfs/delayed-ref.h +++ b/fs/btrfs/delayed-ref.h @@ -389,6 +389,8 @@ int btrfs_delayed_refs_rsv_refill(struct void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info, u64 num_bytes); bool btrfs_check_space_for_delayed_refs(struct btrfs_fs_info *fs_info); +bool btrfs_find_delayed_tree_ref(struct btrfs_delayed_ref_head *head, + u64 root, u64 parent);
static inline u64 btrfs_delayed_ref_owner(struct btrfs_delayed_ref_node *node) { --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -5387,23 +5387,62 @@ static int check_ref_exists(struct btrfs struct btrfs_root *root, u64 bytenr, u64 parent, int level) { + struct btrfs_delayed_ref_root *delayed_refs; + struct btrfs_delayed_ref_head *head; struct btrfs_path *path; struct btrfs_extent_inline_ref *iref; int ret; + bool exists = false;
path = btrfs_alloc_path(); if (!path) return -ENOMEM; - +again: ret = lookup_extent_backref(trans, path, &iref, bytenr, root->fs_info->nodesize, parent, btrfs_root_id(root), level, 0); + if (ret != -ENOENT) { + /* + * If we get 0 then we found our reference, return 1, else + * return the error if it's not -ENOENT; + */ + btrfs_free_path(path); + return (ret < 0 ) ? ret : 1; + } + + /* + * We could have a delayed ref with this reference, so look it up while + * we're holding the path open to make sure we don't race with the + * delayed ref running. + */ + delayed_refs = &trans->transaction->delayed_refs; + spin_lock(&delayed_refs->lock); + head = btrfs_find_delayed_ref_head(delayed_refs, bytenr); + if (!head) + goto out; + if (!mutex_trylock(&head->mutex)) { + /* + * We're contended, means that the delayed ref is running, get a + * reference and wait for the ref head to be complete and then + * try again. + */ + refcount_inc(&head->refs); + spin_unlock(&delayed_refs->lock); + + btrfs_release_path(path); + + mutex_lock(&head->mutex); + mutex_unlock(&head->mutex); + btrfs_put_delayed_ref_head(head); + goto again; + } + + exists = btrfs_find_delayed_tree_ref(head, root->root_key.objectid, parent); + mutex_unlock(&head->mutex); +out: + spin_unlock(&delayed_refs->lock); btrfs_free_path(path); - if (ret == -ENOENT) - return 0; - if (ret < 0) - return ret; - return 1; + return exists ? 1 : 0; }
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Filipe Manana fdmanana@suse.com
commit ae1e766f623f7a2a889a0b09eb076dd9a60efbe9 upstream.
Currently the extent map shrinker can be run by any task when attempting to allocate memory and there's enough memory pressure to trigger it.
To avoid too much latency we stop iterating over extent maps and removing them once the task needs to reschedule. This logic was introduced in commit b3ebb9b7e92a ("btrfs: stop extent map shrinker if reschedule is needed").
While that solved high latency problems for some use cases, it's still not enough because with a too high number of tasks entering the extent map shrinker code, either due to memory allocations or because they are a kswapd task, we end up having a very high level of contention on some spin locks, namely:
1) The fs_info->fs_roots_radix_lock spin lock, which we need to find roots to iterate over their inodes;
2) The spin lock of the xarray used to track open inodes for a root (struct btrfs_root::inodes) - on 6.10 kernels and below, it used to be a red black tree and the spin lock was root->inode_lock;
3) The fs_info->delayed_iput_lock spin lock since the shrinker adds delayed iputs (calls btrfs_add_delayed_iput()).
Instead of allowing the extent map shrinker to be run by any task, make it run only by kswapd tasks. This still solves the problem of running into OOM situations due to an unbounded extent map creation, which is simple to trigger by direct IO writes, as described in the changelog of commit 956a17d9d050 ("btrfs: add a shrinker for extent maps"), and by a similar case when doing buffered IO on files with a very large number of holes (keeping the file open and creating many holes, whose extent maps are only released when the file is closed).
Reported-by: kzd kzd@56709.net Link: https://bugzilla.kernel.org/show_bug.cgi?id=219121 Reported-by: Octavia Togami octavia.togami@gmail.com Link: https://lore.kernel.org/linux-btrfs/CAHPNGSSt-a4ZZWrtJdVyYnJFscFjP9S7rMcvEMa... Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps") CC: stable@vger.kernel.org # 6.10+ Tested-by: kzd kzd@56709.net Tested-by: Octavia Togami octavia.togami@gmail.com Signed-off-by: Filipe Manana fdmanana@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent_map.c | 22 ++++++---------------- fs/btrfs/super.c | 10 ++++++++++ 2 files changed, 16 insertions(+), 16 deletions(-)
--- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -1065,8 +1065,7 @@ static long btrfs_scan_inode(struct btrf return 0;
/* - * We want to be fast because we can be called from any path trying to - * allocate memory, so if the lock is busy we don't want to spend time + * We want to be fast so if the lock is busy we don't want to spend time * waiting for it - either some task is about to do IO for the inode or * we may have another task shrinking extent maps, here in this code, so * skip this inode. @@ -1109,9 +1108,7 @@ next: /* * Stop if we need to reschedule or there's contention on the * lock. This is to avoid slowing other tasks trying to take the - * lock and because the shrinker might be called during a memory - * allocation path and we want to avoid taking a very long time - * and slowing down all sorts of tasks. + * lock. */ if (need_resched() || rwlock_needbreak(&tree->lock)) break; @@ -1139,12 +1136,7 @@ static long btrfs_scan_root(struct btrfs if (ctx->scanned >= ctx->nr_to_scan) break;
- /* - * We may be called from memory allocation paths, so we don't - * want to take too much time and slowdown tasks. - */ - if (need_resched()) - break; + cond_resched();
inode = btrfs_find_first_inode(root, min_ino); } @@ -1202,14 +1194,12 @@ long btrfs_free_extent_maps(struct btrfs ctx.last_ino); }
- /* - * We may be called from memory allocation paths, so we don't want to - * take too much time and slowdown tasks, so stop if we need reschedule. - */ - while (ctx.scanned < ctx.nr_to_scan && !need_resched()) { + while (ctx.scanned < ctx.nr_to_scan) { struct btrfs_root *root; unsigned long count;
+ cond_resched(); + spin_lock(&fs_info->fs_roots_radix_lock); count = radix_tree_gang_lookup(&fs_info->fs_roots_radix, (void **)&root, --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -28,6 +28,7 @@ #include <linux/btrfs.h> #include <linux/security.h> #include <linux/fs_parser.h> +#include <linux/swap.h> #include "messages.h" #include "delayed-inode.h" #include "ctree.h" @@ -2394,6 +2395,15 @@ static long btrfs_free_cached_objects(st const long nr_to_scan = min_t(unsigned long, LONG_MAX, sc->nr_to_scan); struct btrfs_fs_info *fs_info = btrfs_sb(sb);
+ /* + * We may be called from any task trying to allocate memory and we don't + * want to slow it down with scanning and dropping extent maps. It would + * also cause heavy lock contention if many tasks concurrently enter + * here. Therefore only allow kswapd tasks to scan and drop extent maps. + */ + if (!current_is_kswapd()) + return 0; + return btrfs_free_extent_maps(fs_info, nr_to_scan); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
commit e30729d4bd4001881be4d1ad4332a5d4985398f8 upstream.
__btrfs_add_free_space_zoned() references and modifies bg's alloc_offset, ro, and zone_unusable, but without taking the lock. It is mostly safe because they monotonically increase (at least for now) and this function is mostly called by a transaction commit, which is serialized by itself.
Still, taking the lock is a safer and correct option and I'm going to add a change to reset zone_unusable while a block group is still alive. So, add locking around the operations.
Fixes: 169e0da91a21 ("btrfs: zoned: track unusable bytes for zones") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/free-space-cache.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
--- a/fs/btrfs/free-space-cache.c +++ b/fs/btrfs/free-space-cache.c @@ -2698,15 +2698,16 @@ static int __btrfs_add_free_space_zoned( u64 offset = bytenr - block_group->start; u64 to_free, to_unusable; int bg_reclaim_threshold = 0; - bool initial = ((size == block_group->length) && (block_group->alloc_offset == 0)); + bool initial; u64 reclaimable_unusable;
- WARN_ON(!initial && offset + size > block_group->zone_capacity); + spin_lock(&block_group->lock);
+ initial = ((size == block_group->length) && (block_group->alloc_offset == 0)); + WARN_ON(!initial && offset + size > block_group->zone_capacity); if (!initial) bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
- spin_lock(&ctl->tree_lock); if (!used) to_free = size; else if (initial) @@ -2719,7 +2720,9 @@ static int __btrfs_add_free_space_zoned( to_free = offset + size - block_group->alloc_offset; to_unusable = size - to_free;
+ spin_lock(&ctl->tree_lock); ctl->free_space += to_free; + spin_unlock(&ctl->tree_lock); /* * If the block group is read-only, we should account freed space into * bytes_readonly. @@ -2728,11 +2731,8 @@ static int __btrfs_add_free_space_zoned( block_group->zone_unusable += to_unusable; WARN_ON(block_group->zone_unusable > block_group->length); } - spin_unlock(&ctl->tree_lock); if (!used) { - spin_lock(&block_group->lock); block_group->alloc_offset -= size; - spin_unlock(&block_group->lock); }
reclaimable_unusable = block_group->zone_unusable - @@ -2746,6 +2746,8 @@ static int __btrfs_add_free_space_zoned( btrfs_mark_bg_to_reclaim(block_group); }
+ spin_unlock(&block_group->lock); + return 0; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 008e2512dc5696ab2dc5bf264e98a9fe9ceb830e upstream.
[REPORT] There is a corruption report that btrfs refused to mount a fs that has overlapping dev extents:
BTRFS error (device sdc): dev extent devid 4 physical offset 14263979671552 overlap with previous dev extent end 14263980982272 BTRFS error (device sdc): failed to verify dev extents against chunks: -117 BTRFS error (device sdc): open_ctree failed
[CAUSE] The direct cause is very obvious, there is a bad dev extent item with incorrect length.
With btrfs check reporting two overlapping extents, the second one shows some clue on the cause:
ERROR: dev extent devid 4 offset 14263979671552 len 6488064 overlap with previous dev extent end 14263980982272 ERROR: dev extent devid 13 offset 2257707008000 len 6488064 overlap with previous dev extent end 2257707270144 ERROR: errors found in extent allocation tree or chunk allocation
The second one looks like a bitflip happened during new chunk allocation: hex(2257707008000) = 0x20da9d30000 hex(2257707270144) = 0x20da9d70000 diff = 0x00000040000
So it looks like a bitflip happened during new dev extent allocation, resulting the second overlap.
Currently we only do the dev-extent verification at mount time, but if the corruption is caused by memory bitflip, we really want to catch it before writing the corruption to the storage.
Furthermore the dev extent items has the following key definition:
(<device id> DEV_EXTENT <physical offset>)
Thus we can not just rely on the generic key order check to make sure there is no overlapping.
[ENHANCEMENT] Introduce dedicated dev extent checks, including:
- Fixed member checks * chunk_tree should always be BTRFS_CHUNK_TREE_OBJECTID (3) * chunk_objectid should always be BTRFS_FIRST_CHUNK_CHUNK_TREE_OBJECTID (256)
- Alignment checks * chunk_offset should be aligned to sectorsize * length should be aligned to sectorsize * key.offset should be aligned to sectorsize
- Overlap checks If the previous key is also a dev-extent item, with the same device id, make sure we do not overlap with the previous dev extent.
Reported: Stefan N stefannnau@gmail.com Link: https://lore.kernel.org/linux-btrfs/CA+W5K0rSO3koYTo=nzxxTm1-Pdu1HYgVxEpgJ=a... CC: stable@vger.kernel.org # 5.10+ Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-checker.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 69 insertions(+)
--- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1718,6 +1718,72 @@ static int check_raid_stripe_extent(cons return 0; }
+static int check_dev_extent_item(const struct extent_buffer *leaf, + const struct btrfs_key *key, + int slot, + struct btrfs_key *prev_key) +{ + struct btrfs_dev_extent *de; + const u32 sectorsize = leaf->fs_info->sectorsize; + + de = btrfs_item_ptr(leaf, slot, struct btrfs_dev_extent); + /* Basic fixed member checks. */ + if (unlikely(btrfs_dev_extent_chunk_tree(leaf, de) != + BTRFS_CHUNK_TREE_OBJECTID)) { + generic_err(leaf, slot, + "invalid dev extent chunk tree id, has %llu expect %llu", + btrfs_dev_extent_chunk_tree(leaf, de), + BTRFS_CHUNK_TREE_OBJECTID); + return -EUCLEAN; + } + if (unlikely(btrfs_dev_extent_chunk_objectid(leaf, de) != + BTRFS_FIRST_CHUNK_TREE_OBJECTID)) { + generic_err(leaf, slot, + "invalid dev extent chunk objectid, has %llu expect %llu", + btrfs_dev_extent_chunk_objectid(leaf, de), + BTRFS_FIRST_CHUNK_TREE_OBJECTID); + return -EUCLEAN; + } + /* Alignment check. */ + if (unlikely(!IS_ALIGNED(key->offset, sectorsize))) { + generic_err(leaf, slot, + "invalid dev extent key.offset, has %llu not aligned to %u", + key->offset, sectorsize); + return -EUCLEAN; + } + if (unlikely(!IS_ALIGNED(btrfs_dev_extent_chunk_offset(leaf, de), + sectorsize))) { + generic_err(leaf, slot, + "invalid dev extent chunk offset, has %llu not aligned to %u", + btrfs_dev_extent_chunk_objectid(leaf, de), + sectorsize); + return -EUCLEAN; + } + if (unlikely(!IS_ALIGNED(btrfs_dev_extent_length(leaf, de), + sectorsize))) { + generic_err(leaf, slot, + "invalid dev extent length, has %llu not aligned to %u", + btrfs_dev_extent_length(leaf, de), sectorsize); + return -EUCLEAN; + } + /* Overlap check with previous dev extent. */ + if (slot && prev_key->objectid == key->objectid && + prev_key->type == key->type) { + struct btrfs_dev_extent *prev_de; + u64 prev_len; + + prev_de = btrfs_item_ptr(leaf, slot - 1, struct btrfs_dev_extent); + prev_len = btrfs_dev_extent_length(leaf, prev_de); + if (unlikely(prev_key->offset + prev_len > key->offset)) { + generic_err(leaf, slot, + "dev extent overlap, prev offset %llu len %llu current offset %llu", + prev_key->objectid, prev_len, key->offset); + return -EUCLEAN; + } + } + return 0; +} + /* * Common point to switch the item-specific validation. */ @@ -1754,6 +1820,9 @@ static enum btrfs_tree_block_status chec case BTRFS_DEV_ITEM_KEY: ret = check_dev_item(leaf, key, slot); break; + case BTRFS_DEV_EXTENT_KEY: + ret = check_dev_extent_item(leaf, key, slot, prev_key); + break; case BTRFS_INODE_ITEM_KEY: ret = check_inode_item(leaf, key, slot); break;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 534f7eff9239c1b0af852fc33f5af2b62c00eddf upstream.
Although there are several patches improving the extent map shrinker, there are still reports of too frequent shrinker behavior, taking too much CPU for the kswapd process.
So let's only enable extent shrinker for now, until we got more comprehensive understanding and a better solution.
Link: https://lore.kernel.org/linux-btrfs/3df4acd616a07ef4d2dc6bad668701504b412ffc... Link: https://lore.kernel.org/linux-btrfs/c30fd6b3-ca7a-4759-8a53-d42878bf84f7@gma... Fixes: 956a17d9d050 ("btrfs: add a shrinker for extent maps") CC: stable@vger.kernel.org # 6.10+ Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/super.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2387,7 +2387,13 @@ static long btrfs_nr_cached_objects(stru
trace_btrfs_extent_map_shrinker_count(fs_info, nr);
- return nr; + /* + * Only report the real number for DEBUG builds, as there are reports of + * serious performance degradation caused by too frequent shrinks. + */ + if (IS_ENABLED(CONFIG_BTRFS_DEBUG)) + return nr; + return 0; }
static long btrfs_free_cached_objects(struct super_block *sb, struct shrink_control *sc)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl
commit 0573a1e2ea7e35bff08944a40f1adf2bb35cea61 upstream.
Missing validation ...
Checked libdrm and it clears all the structs, so we should be safe to just check everything.
Signed-off-by: Bas Nieuwenhuizen bas@basnieuwenhuizen.nl Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c6b86421f1f9ddf9d706f2453159813ee39d0cf9) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c @@ -685,16 +685,24 @@ int amdgpu_ctx_ioctl(struct drm_device *
switch (args->in.op) { case AMDGPU_CTX_OP_ALLOC_CTX: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id); args->out.alloc.ctx_id = id; break; case AMDGPU_CTX_OP_FREE_CTX: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_free(fpriv, id); break; case AMDGPU_CTX_OP_QUERY_STATE: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_query(adev, fpriv, id, &args->out); break; case AMDGPU_CTX_OP_QUERY_STATE2: + if (args->in.flags) + return -EINVAL; r = amdgpu_ctx_query2(adev, fpriv, id, &args->out); break; case AMDGPU_CTX_OP_GET_STABLE_PSTATE:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 046667c4d3196938e992fba0dfcde570aa85cd0e upstream.
we are *not* guaranteed that anything past the terminating NUL is mapped (let alone initialized with anything sane).
Fixes: 0dea116876ee ("cgroup: implement eventfd-based generic API for notifications") Cc: stable@vger.kernel.org Cc: Andrew Morton akpm@linux-foundation.org Acked-by: Michal Hocko mhocko@suse.com Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/memcontrol.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5282,9 +5282,12 @@ static ssize_t memcg_write_event_control buf = endp + 1;
cfd = simple_strtoul(buf, &endp, 10); - if ((*endp != ' ') && (*endp != '\0')) + if (*endp == '\0') + buf = endp; + else if (*endp == ' ') + buf = endp + 1; + else return -EINVAL; - buf = endp + 1;
event = kzalloc(sizeof(*event), GFP_KERNEL); if (!event)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rodrigo Siqueira Rodrigo.Siqueira@amd.com
commit 56fb276d0244d430496f249335a44ae114dd5f54 upstream.
[why & how] When the commit 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror") was introduced, it used the wrong calculation for the position copy for X. This commit uses the correct calculation for that based on the original patch.
Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Acked-by: Wayne Lin wayne.lin@amd.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 8f9b23abbae5ffcd64856facd26a86b67195bc2f) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -3698,7 +3698,7 @@ void dcn10_set_cursor_position(struct pi (int)hubp->curs_attr.width || pos_cpy.x <= (int)hubp->curs_attr.width + pipe_ctx->plane_state->src_rect.x) { - pos_cpy.x = 2 * viewport_width - temp_x; + pos_cpy.x = temp_x + viewport_width; } } } else {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hamza Mahfooz hamza.mahfooz@amd.com
commit f6098641d3e1e4d4052ff9378857c831f9675f6b upstream.
To be able to get to the lowest power state when suspending systems with DCN3.5+, we must be in IPS before the display hardware is put into D3cold. So, to ensure that the system always reaches the lowest power state while suspending, force systems that support IPS to enter idle optimizations before entering D3cold.
Reviewed-by: Roman Li roman.li@amd.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 237193e21b29d4aa0617ffeea3d6f49e72999708) Cc: stable@vger.kernel.org # 6.10+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -2724,6 +2724,9 @@ static int dm_suspend(void *handle)
hpd_rx_irq_work_suspend(dm);
+ if (adev->dm.dc->caps.ips_support) + dc_allow_idle_optimizations(adev->dm.dc, true); + dc_set_power_state(dm->dc, DC_ACPI_CM_POWER_STATE_D3); dc_dmub_srv_set_power_state(dm->dc->ctx->dmub_srv, DC_ACPI_CM_POWER_STATE_D3);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Loan Chen lo-an.chen@amd.com
commit 0dbb81d44108a2a1004e5b485ef3fca5bc078424 upstream.
[Why] Tiled display cannot synchronize properly after S3. The fix for commit 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic") is not enable in DCN321, which causes the otg is excluded from synchronization.
[How] Enable otg synchronization logic in dcn321.
Fixes: 5f0c74915815 ("drm/amd/display: Fix for otg synchronization logic") Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Reviewed-by: Alvin Lee alvin.lee2@amd.com Signed-off-by: Loan Chen lo-an.chen@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit d6ed53712f583423db61fbb802606759e023bf7b) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c +++ b/drivers/gpu/drm/amd/display/dc/resource/dcn321/dcn321_resource.c @@ -1778,6 +1778,9 @@ static bool dcn321_resource_construct( dc->caps.color.mpc.ogam_rom_caps.hlg = 0; dc->caps.color.mpc.ocsc = 1;
+ /* Use pipe context based otg sync logic */ + dc->config.use_pipe_ctx_sync_logic = true; + dc->config.dc_mode_clk_limit_support = true; dc->config.enable_windowed_mpo_odm = true; /* read VBIOS LTTPR caps */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Melissa Wen mwen@igalia.com
commit 737222cebecbdbcdde2b69475c52bcb9ecfeb830 upstream.
[why & how] Cursor gets clipped off in the middle of the screen with hw rotation 180. Fix a miscalculation of cursor offset when it's placed near the edges in the pipe split case.
Cursor bugs with hw rotation were reported on AMD issue tracker: https://gitlab.freedesktop.org/drm/amd/-/issues/2247
The issues on rotation 270 was fixed by: https://lore.kernel.org/amd-gfx/20221118125935.4013669-22-Brian.Chang@amd.co... that partially addressed the rotation 180 too. So, this patch is the final bits for rotation 180.
Reported-by: Xaver Hugl xaver.hugl@gmail.com Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2247 Reviewed-by: Harry Wentland harry.wentland@amd.com Fixes: 9d84c7ef8a87 ("drm/amd/display: Correct cursor position on horizontal mirror") Signed-off-by: Melissa Wen mwen@igalia.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Tom Chung chiahsuan.chung@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 1fd2cf090096af8a25bf85564341cfc21cec659d) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c @@ -3605,7 +3605,7 @@ void dcn10_set_cursor_position(struct pi (int)hubp->curs_attr.width || pos_cpy.x <= (int)hubp->curs_attr.width + pipe_ctx->plane_state->src_rect.x) { - pos_cpy.x = temp_x + viewport_width; + pos_cpy.x = 2 * viewport_width - temp_x; } } } else {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit e414a304f2c5368a84f03ad34d29b89f965a33c9 upstream.
This needs to be set as well if the IB uses atomics.
Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 35c628774e50b3784c59e8ca7973f03bcb067132) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c @@ -543,11 +543,11 @@ void jpeg_v2_0_dec_ring_emit_ib(struct a
amdgpu_ring_write(ring, PACKETJ(mmUVD_LMI_JRBC_IB_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4))); + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(mmUVD_LMI_JPEG_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4))); + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(mmUVD_LMI_JRBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0));
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit e6c6bd6253e792cee6c5c065e106e87b9f0d9ae9 upstream.
This needs to be set as well if the IB uses atomics.
Reviewed-by: Leo Liu leo.liu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c6c2e8b6a427d4fecc7c36cffccb908185afcab2) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c @@ -773,11 +773,11 @@ void jpeg_v4_0_3_dec_ring_emit_ib(struct
amdgpu_ring_write(ring, PACKETJ(regUVD_LMI_JRBC_IB_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4))); + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(regUVD_LMI_JPEG_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4))); + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(regUVD_LMI_JRBC_IB_64BIT_BAR_LOW_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0));
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David (Ming Qiang) Wu David.Wu3@amd.com
commit 470516c2925493594a690bc4d05b1f4471d9f996 upstream.
Add JPEG IB command parser to ensure registers in the command are within the JPEG IP block.
Reviewed-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: David (Ming Qiang) Wu David.Wu3@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit a7f670d5d8e77b092404ca8a35bb0f8f89ed3117) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 + drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 61 ++++++++++++++++++++++++++++++- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.h | 7 +++ drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 1 drivers/gpu/drm/amd/amdgpu/soc15d.h | 6 +++ 5 files changed, 76 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c @@ -1057,6 +1057,9 @@ static int amdgpu_cs_patch_ibs(struct am r = amdgpu_ring_parse_cs(ring, p, job, ib); if (r) return r; + + if (ib->sa_bo) + ib->gpu_addr = amdgpu_sa_bo_gpu_addr(ib->sa_bo); } else { ib->ptr = (uint32_t *)kptr; r = amdgpu_ring_patch_cs_in_place(ring, p, job, ib); --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c @@ -23,6 +23,7 @@
#include "amdgpu.h" #include "amdgpu_jpeg.h" +#include "amdgpu_cs.h" #include "soc15.h" #include "soc15d.h" #include "jpeg_v4_0_3.h" @@ -773,7 +774,11 @@ void jpeg_v4_0_3_dec_ring_emit_ib(struct
amdgpu_ring_write(ring, PACKETJ(regUVD_LMI_JRBC_IB_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); - amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8))); + + if (ring->funcs->parse_cs) + amdgpu_ring_write(ring, 0); + else + amdgpu_ring_write(ring, (vmid | (vmid << 4) | (vmid << 8)));
amdgpu_ring_write(ring, PACKETJ(regUVD_LMI_JPEG_VMID_INTERNAL_OFFSET, 0, 0, PACKETJ_TYPE0)); @@ -1063,6 +1068,7 @@ static const struct amdgpu_ring_funcs jp .get_rptr = jpeg_v4_0_3_dec_ring_get_rptr, .get_wptr = jpeg_v4_0_3_dec_ring_get_wptr, .set_wptr = jpeg_v4_0_3_dec_ring_set_wptr, + .parse_cs = jpeg_v4_0_3_dec_ring_parse_cs, .emit_frame_size = SOC15_FLUSH_GPU_TLB_NUM_WREG * 6 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 8 + @@ -1227,3 +1233,56 @@ static void jpeg_v4_0_3_set_ras_funcs(st { adev->jpeg.ras = &jpeg_v4_0_3_ras; } + +/** + * jpeg_v4_0_3_dec_ring_parse_cs - command submission parser + * + * @parser: Command submission parser context + * @job: the job to parse + * @ib: the IB to parse + * + * Parse the command stream, return -EINVAL for invalid packet, + * 0 otherwise + */ +int jpeg_v4_0_3_dec_ring_parse_cs(struct amdgpu_cs_parser *parser, + struct amdgpu_job *job, + struct amdgpu_ib *ib) +{ + uint32_t i, reg, res, cond, type; + struct amdgpu_device *adev = parser->adev; + + for (i = 0; i < ib->length_dw ; i += 2) { + reg = CP_PACKETJ_GET_REG(ib->ptr[i]); + res = CP_PACKETJ_GET_RES(ib->ptr[i]); + cond = CP_PACKETJ_GET_COND(ib->ptr[i]); + type = CP_PACKETJ_GET_TYPE(ib->ptr[i]); + + if (res) /* only support 0 at the moment */ + return -EINVAL; + + switch (type) { + case PACKETJ_TYPE0: + if (cond != PACKETJ_CONDITION_CHECK0 || reg < JPEG_REG_RANGE_START || reg > JPEG_REG_RANGE_END) { + dev_err(adev->dev, "Invalid packet [0x%08x]!\n", ib->ptr[i]); + return -EINVAL; + } + break; + case PACKETJ_TYPE3: + if (cond != PACKETJ_CONDITION_CHECK3 || reg < JPEG_REG_RANGE_START || reg > JPEG_REG_RANGE_END) { + dev_err(adev->dev, "Invalid packet [0x%08x]!\n", ib->ptr[i]); + return -EINVAL; + } + break; + case PACKETJ_TYPE6: + if (ib->ptr[i] == CP_PACKETJ_NOP) + continue; + dev_err(adev->dev, "Invalid packet [0x%08x]!\n", ib->ptr[i]); + return -EINVAL; + default: + dev_err(adev->dev, "Unknown packet type %d !\n", type); + return -EINVAL; + } + } + + return 0; +} --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.h +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.h @@ -46,6 +46,9 @@
#define JRBC_DEC_EXTERNAL_REG_WRITE_ADDR 0x18000
+#define JPEG_REG_RANGE_START 0x4000 +#define JPEG_REG_RANGE_END 0x41c2 + extern const struct amdgpu_ip_block_version jpeg_v4_0_3_ip_block;
void jpeg_v4_0_3_dec_ring_emit_ib(struct amdgpu_ring *ring, @@ -62,5 +65,7 @@ void jpeg_v4_0_3_dec_ring_insert_end(str void jpeg_v4_0_3_dec_ring_emit_wreg(struct amdgpu_ring *ring, uint32_t reg, uint32_t val); void jpeg_v4_0_3_dec_ring_emit_reg_wait(struct amdgpu_ring *ring, uint32_t reg, uint32_t val, uint32_t mask); - +int jpeg_v4_0_3_dec_ring_parse_cs(struct amdgpu_cs_parser *parser, + struct amdgpu_job *job, + struct amdgpu_ib *ib); #endif /* __JPEG_V4_0_3_H__ */ --- a/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c +++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c @@ -523,6 +523,7 @@ static const struct amdgpu_ring_funcs jp .get_rptr = jpeg_v5_0_0_dec_ring_get_rptr, .get_wptr = jpeg_v5_0_0_dec_ring_get_wptr, .set_wptr = jpeg_v5_0_0_dec_ring_set_wptr, + .parse_cs = jpeg_v4_0_3_dec_ring_parse_cs, .emit_frame_size = SOC15_FLUSH_GPU_TLB_NUM_WREG * 6 + SOC15_FLUSH_GPU_TLB_NUM_REG_WAIT * 8 + --- a/drivers/gpu/drm/amd/amdgpu/soc15d.h +++ b/drivers/gpu/drm/amd/amdgpu/soc15d.h @@ -76,6 +76,12 @@ ((cond & 0xF) << 24) | \ ((type & 0xF) << 28))
+#define CP_PACKETJ_NOP 0x60000000 +#define CP_PACKETJ_GET_REG(x) ((x) & 0x3FFFF) +#define CP_PACKETJ_GET_RES(x) (((x) >> 18) & 0x3F) +#define CP_PACKETJ_GET_COND(x) (((x) >> 24) & 0xF) +#define CP_PACKETJ_GET_TYPE(x) (((x) >> 28) & 0xF) + /* Packet 3 types */ #define PACKET3_NOP 0x10 #define PACKET3_SET_BASE 0x11
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 3b5bbe798b2451820e74243b738268f51901e7d0 upstream.
It's currently possible to create pidfds for kthreads but it is unclear what that is supposed to mean. Until we have use-cases for it and we figured out what behavior we want block the creation of pidfds for kthreads.
Link: https://lore.kernel.org/r/20240731-gleis-mehreinnahmen-6bbadd128383@brauner Fixes: 32fcb426ec00 ("pid: add pidfd_open()") Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/fork.c | 25 ++++++++++++++++++++++--- 1 file changed, 22 insertions(+), 3 deletions(-)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -2069,11 +2069,24 @@ static int __pidfd_prepare(struct pid *p */ int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret) { - bool thread = flags & PIDFD_THREAD; - - if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID)) + if (!pid) return -EINVAL;
+ scoped_guard(rcu) { + struct task_struct *tsk; + + if (flags & PIDFD_THREAD) + tsk = pid_task(pid, PIDTYPE_PID); + else + tsk = pid_task(pid, PIDTYPE_TGID); + if (!tsk) + return -EINVAL; + + /* Don't create pidfds for kernel threads for now. */ + if (tsk->flags & PF_KTHREAD) + return -EINVAL; + } + return __pidfd_prepare(pid, flags, ret); }
@@ -2419,6 +2432,12 @@ __latent_entropy struct task_struct *cop if (clone_flags & CLONE_PIDFD) { int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
+ /* Don't create pidfds for kernel threads for now. */ + if (args->kthread) { + retval = -EINVAL; + goto bad_fork_free_pid; + } + /* Note that no task has been attached to @pid yet. */ retval = __pidfd_prepare(pid, flags, &pidfile); if (retval < 0)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Claudio Imbrenda imbrenda@linux.ibm.com
[ Upstream commit cff59d8631e1409ffdd22d9d717e15810181b32c ]
The return value uv_set_shared() and uv_remove_shared() (which are wrappers around the share() function) is not always checked. The system integrity of a protected guest depends on the Share and Unshare UVCs being successful. This means that any caller that fails to check the return value will compromise the security of the protected guest.
No code path that would lead to such violation of the security guarantees is currently exercised, since all the areas that are shared never get unshared during the lifetime of the system. This might change and become an issue in the future.
The Share and Unshare UVCs can only fail in case of hypervisor misbehaviour (either a bug or malicious behaviour). In such cases there is no reasonable way forward, and the system needs to panic.
This patch replaces the return at the end of the share() function with a panic, to guarantee system integrity.
Fixes: 5abb9351dfd9 ("s390/uv: introduce guest side ultravisor code") Signed-off-by: Claudio Imbrenda imbrenda@linux.ibm.com Reviewed-by: Christian Borntraeger borntraeger@linux.ibm.com Reviewed-by: Steffen Eiden seiden@linux.ibm.com Reviewed-by: Janosch Frank frankja@linux.ibm.com Link: https://lore.kernel.org/r/20240801112548.85303-1-imbrenda@linux.ibm.com Message-ID: 20240801112548.85303-1-imbrenda@linux.ibm.com [frankja@linux.ibm.com: Fixed up patch subject] Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/include/asm/uv.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/arch/s390/include/asm/uv.h b/arch/s390/include/asm/uv.h index 0e7bd3873907f..b2e2f9a4163c5 100644 --- a/arch/s390/include/asm/uv.h +++ b/arch/s390/include/asm/uv.h @@ -442,7 +442,10 @@ static inline int share(unsigned long addr, u16 cmd)
if (!uv_call(0, (u64)&uvcb)) return 0; - return -EINVAL; + pr_err("%s UVC failed (rc: 0x%x, rrc: 0x%x), possible hypervisor bug.\n", + uvcb.header.cmd == UVC_CMD_SET_SHARED_ACCESS ? "Share" : "Unshare", + uvcb.header.rc, uvcb.header.rrc); + panic("System security cannot be guaranteed unless the system panics now.\n"); }
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Wilcox (Oracle) willy@infradead.org
[ Upstream commit 98055bc3595500bcf2126b93b1595354bdb86a66 ]
As in commit 4e527d5841e2 ("iomap: fault in smaller chunks for non-large folio mappings"), we can see a performance loss for filesystems which have not yet been converted to large folios.
Fixes: c38f4e96e605 ("netfs: Provide func to copy data to pagecache for buffered write") Signed-off-by: Matthew Wilcox (Oracle) willy@infradead.org Link: https://lore.kernel.org/r/20240527201735.1898381-1-willy@infradead.org Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/netfs/buffered_write.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/netfs/buffered_write.c b/fs/netfs/buffered_write.c index ecbc99ec7d367..18055c1e01835 100644 --- a/fs/netfs/buffered_write.c +++ b/fs/netfs/buffered_write.c @@ -184,7 +184,7 @@ ssize_t netfs_perform_write(struct kiocb *iocb, struct iov_iter *iter, unsigned int bdp_flags = (iocb->ki_flags & IOCB_NOWAIT) ? BDP_ASYNC : 0; ssize_t written = 0, ret, ret2; loff_t i_size, pos = iocb->ki_pos, from, to; - size_t max_chunk = PAGE_SIZE << MAX_PAGECACHE_ORDER; + size_t max_chunk = mapping_max_folio_size(mapping); bool maybe_trouble = false;
if (unlikely(test_bit(NETFS_ICTX_WRITETHROUGH, &ctx->flags) ||
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Omar Sandoval osandov@fb.com
[ Upstream commit 3f65f3c099bcb27949e712f39ba836f21785924a ]
When struct file_lease was split out from struct file_lock, the name of the file_lock slab cache was copied to the new slab cache for file_lease. This name conflict causes confusion in /proc/slabinfo and /sys/kernel/slab. In particular, it caused failures in drgn's test case for slab cache merging.
Link: https://github.com/osandov/drgn/blob/9ad29fd86499eb32847473e928b6540872d3d59... Fixes: c69ff4071935 ("filelock: split leases out of struct file_lock") Signed-off-by: Omar Sandoval osandov@fb.com Link: https://lore.kernel.org/r/2d1d053da1cafb3e7940c4f25952da4f0af34e38.172229327... Reviewed-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/locks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/locks.c b/fs/locks.c index 9afb16e0683ff..e45cad40f8b6b 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2984,7 +2984,7 @@ static int __init filelock_init(void) filelock_cache = kmem_cache_create("file_lock_cache", sizeof(struct file_lock), 0, SLAB_PANIC, NULL);
- filelease_cache = kmem_cache_create("file_lock_cache", + filelease_cache = kmem_cache_create("file_lease_cache", sizeof(struct file_lease), 0, SLAB_PANIC, NULL);
for_each_possible_cpu(i) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: yangerkun yangerkun@huawei.com
[ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
After we switch tmpfs dir operations from simple_dir_operations to simple_offset_dir_operations, every rename happened will fill new dentry to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free key starting with octx->newx_offset, and then set newx_offset equals to free key + 1. This will lead to infinite readdir combine with rename happened at the same time, which fail generic/736 in xfstests(detail show as below).
1. create 5000 files(1 2 3...) under one dir 2. call readdir(man 3 readdir) once, and get one entry 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry) 4. loop 2~3, until readdir return nothing or we loop too many times(tmpfs break test with the second condition)
We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite directory reads") to fix it, record the last_index when we open dir, and do not emit the entry which index >= last_index. The file->private_data now used in offset dir can use directly to do this, and we also update the last_index when we llseek the dir file.
Fixes: a2e459555c5f ("shmem: stable directory offsets") Signed-off-by: yangerkun yangerkun@huawei.com Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com Reviewed-by: Chuck Lever chuck.lever@oracle.com [brauner: only update last_index after seek when offset is zero like Jan suggested] Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/libfs.c | 35 ++++++++++++++++++++++++----------- 1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c index b635ee5adbcce..65279e53fbf27 100644 --- a/fs/libfs.c +++ b/fs/libfs.c @@ -450,6 +450,14 @@ void simple_offset_destroy(struct offset_ctx *octx) mtree_destroy(&octx->mt); }
+static int offset_dir_open(struct inode *inode, struct file *file) +{ + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); + + file->private_data = (void *)ctx->next_offset; + return 0; +} + /** * offset_dir_llseek - Advance the read position of a directory descriptor * @file: an open directory whose position is to be updated @@ -463,6 +471,9 @@ void simple_offset_destroy(struct offset_ctx *octx) */ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) { + struct inode *inode = file->f_inode; + struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode); + switch (whence) { case SEEK_CUR: offset += file->f_pos; @@ -476,7 +487,8 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence) }
/* In this case, ->private_data is protected by f_pos_lock */ - file->private_data = NULL; + if (!offset) + file->private_data = (void *)ctx->next_offset; return vfs_setpos(file, offset, LONG_MAX); }
@@ -507,7 +519,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry) inode->i_ino, fs_umode_to_dtype(inode->i_mode)); }
-static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index) { struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode); struct dentry *dentry; @@ -515,17 +527,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) while (true) { dentry = offset_find_next(octx, ctx->pos); if (!dentry) - return ERR_PTR(-ENOENT); + return; + + if (dentry2offset(dentry) >= last_index) { + dput(dentry); + return; + }
if (!offset_dir_emit(ctx, dentry)) { dput(dentry); - break; + return; }
ctx->pos = dentry2offset(dentry) + 1; dput(dentry); } - return NULL; }
/** @@ -552,22 +568,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx) static int offset_readdir(struct file *file, struct dir_context *ctx) { struct dentry *dir = file->f_path.dentry; + long last_index = (long)file->private_data;
lockdep_assert_held(&d_inode(dir)->i_rwsem);
if (!dir_emit_dots(file, ctx)) return 0;
- /* In this case, ->private_data is protected by f_pos_lock */ - if (ctx->pos == DIR_OFFSET_MIN) - file->private_data = NULL; - else if (file->private_data == ERR_PTR(-ENOENT)) - return 0; - file->private_data = offset_iterate_dir(d_inode(dir), ctx); + offset_iterate_dir(d_inode(dir), ctx, last_index); return 0; }
const struct file_operations simple_offset_dir_operations = { + .open = offset_dir_open, .llseek = offset_dir_llseek, .iterate_shared = offset_readdir, .read = generic_read_dir,
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Hwang leon.hwang@linux.dev
[ Upstream commit fdad456cbcca739bae1849549c7a999857c56f88 ]
The commit f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") fixed a NULL pointer dereference panic, but didn't fix the issue that fails to update attached freplace prog to prog_array map.
Since commit 1c123c567fb1 ("bpf: Resolve fext program type when checking map compatibility"), freplace prog and its target prog are able to tail call each other.
And the commit 3aac1ead5eb6 ("bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach") sets prog->aux->dst_prog as NULL after attaching freplace prog to its target prog.
After loading freplace the prog_array's owner type is BPF_PROG_TYPE_SCHED_CLS. Then, after attaching freplace its prog->aux->dst_prog is NULL. Then, while updating freplace in prog_array the bpf_prog_map_compatible() incorrectly returns false because resolve_prog_type() returns BPF_PROG_TYPE_EXT instead of BPF_PROG_TYPE_SCHED_CLS. After this patch the resolve_prog_type() returns BPF_PROG_TYPE_SCHED_CLS and update to prog_array can succeed.
Fixes: f7866c358733 ("bpf: Fix null pointer dereference in resolve_prog_type() for BPF_PROG_TYPE_EXT") Cc: Toke Høiland-Jørgensen toke@redhat.com Cc: Martin KaFai Lau martin.lau@kernel.org Acked-by: Yonghong Song yonghong.song@linux.dev Signed-off-by: Leon Hwang leon.hwang@linux.dev Link: https://lore.kernel.org/r/20240728114612.48486-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bpf_verifier.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index ff2a6cdb1fa3f..5db4a3f354804 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -846,8 +846,8 @@ static inline u32 type_flag(u32 type) /* only use after check_attach_btf_id() */ static inline enum bpf_prog_type resolve_prog_type(const struct bpf_prog *prog) { - return (prog->type == BPF_PROG_TYPE_EXT && prog->aux->dst_prog) ? - prog->aux->dst_prog->type : prog->type; + return (prog->type == BPF_PROG_TYPE_EXT && prog->aux->saved_dst_prog_type) ? + prog->aux->saved_dst_prog_type : prog->type; }
static inline bool bpf_prog_check_recur(const struct bpf_prog *prog)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yonghong Song yonghong.song@linux.dev
[ Upstream commit bed2eb964c70b780fb55925892a74f26cb590b25 ]
Daniel Hodges reported a kernel verifier crash when playing with sched-ext. Further investigation shows that the crash is due to invalid memory access in stacksafe(). More specifically, it is the following code:
if (exact != NOT_EXACT && old->stack[spi].slot_type[i % BPF_REG_SIZE] != cur->stack[spi].slot_type[i % BPF_REG_SIZE]) return false;
The 'i' iterates old->allocated_stack. If cur->allocated_stack < old->allocated_stack the out-of-bound access will happen.
To fix the issue add 'i >= cur->allocated_stack' check such that if the condition is true, stacksafe() should fail. Otherwise, cur->stack[spi].slot_type[i % BPF_REG_SIZE] memory access is legal.
Fixes: 2793a8b015f7 ("bpf: exact states comparison for iterator convergence checks") Cc: Eduard Zingerman eddyz87@gmail.com Reported-by: Daniel Hodges hodgesd@meta.com Acked-by: Eduard Zingerman eddyz87@gmail.com Signed-off-by: Yonghong Song yonghong.song@linux.dev Link: https://lore.kernel.org/r/20240812214847.213612-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/verifier.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c index a8845cc299fec..521bd7efae038 100644 --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -16881,8 +16881,9 @@ static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, spi = i / BPF_REG_SIZE;
if (exact != NOT_EXACT && - old->stack[spi].slot_type[i % BPF_REG_SIZE] != - cur->stack[spi].slot_type[i % BPF_REG_SIZE]) + (i >= cur->allocated_stack || + old->stack[spi].slot_type[i % BPF_REG_SIZE] != + cur->stack[spi].slot_type[i % BPF_REG_SIZE])) return false;
if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit e3786b29c54cdae3490b07180a54e2461f42144c ]
If a program is watching a file on a 9p mount, it won't see any change in size if the file being exported by the server is changed directly in the source filesystem, presumably because 9p doesn't have change notifications, and because netfs skips the reads if the file is empty.
Fix this by attempting to read the full size specified when a DIO read is requested (such as when 9p is operating in unbuffered mode) and dealing with a short read if the EOF was less than the expected read.
To make this work, filesystems using netfslib must not set NETFS_SREQ_CLEAR_TAIL if performing a DIO read where that read hit the EOF. I don't want to mandatorily clear this flag in netfslib for DIO because, say, ceph might make a read from an object that is not completely filled, but does not reside at the end of file - and so we need to clear the excess.
This can be tested by watching an empty file over 9p within a VM (such as in the ktest framework):
while true; do read content; if [ -n "$content" ]; then echo $content; break; fi; done < /host/tmp/foo
then writing something into the empty file. The watcher should immediately display the file content and break out of the loop. Without this fix, it remains in the loop indefinitely.
Fixes: 80105ed2fd27 ("9p: Use netfslib read/write_iter") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218916 Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/1229195.1723211769@warthog.procyon.org.uk cc: Eric Van Hensbergen ericvh@kernel.org cc: Latchesar Ionkov lucho@ionkov.net cc: Christian Schoenebeck linux_oss@crudebyte.com cc: Marc Dionne marc.dionne@auristor.com cc: Ilya Dryomov idryomov@gmail.com cc: Steve French sfrench@samba.org cc: Paulo Alcantara pc@manguebit.com cc: Trond Myklebust trond.myklebust@hammerspace.com cc: v9fs@lists.linux.dev cc: linux-afs@lists.infradead.org cc: ceph-devel@vger.kernel.org cc: linux-cifs@vger.kernel.org cc: linux-nfs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Dominique Martinet asmadeus@codewreck.org Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/9p/vfs_addr.c | 3 ++- fs/afs/file.c | 3 ++- fs/ceph/addr.c | 6 ++++-- fs/netfs/io.c | 17 +++++++++++------ fs/nfs/fscache.c | 3 ++- fs/smb/client/file.c | 3 ++- 6 files changed, 23 insertions(+), 12 deletions(-)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a97ceb105cd8d..24fdc74caeba4 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -75,7 +75,8 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)
/* if we just extended the file size, any portion not in * cache won't be on server and is zeroes */ - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
netfs_subreq_terminated(subreq, err ?: total, false); } diff --git a/fs/afs/file.c b/fs/afs/file.c index c3f0c45ae9a9b..ec1be0091fdb5 100644 --- a/fs/afs/file.c +++ b/fs/afs/file.c @@ -242,7 +242,8 @@ static void afs_fetch_data_notify(struct afs_operation *op)
req->error = error; if (subreq) { - __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); netfs_subreq_terminated(subreq, error ?: req->actual_len, false); req->subreq = NULL; } else if (req->done) { diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index 73b5a07bf94de..d2194022132ec 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -246,7 +246,8 @@ static void finish_netfs_read(struct ceph_osd_request *req) if (err >= 0) { if (sparse && err > 0) err = ceph_sparse_ext_map_end(op); - if (err < subreq->len) + if (err < subreq->len && + subreq->rreq->origin != NETFS_DIO_READ) __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); if (IS_ENCRYPTED(inode) && err > 0) { err = ceph_fscrypt_decrypt_extents(inode, @@ -282,7 +283,8 @@ static bool ceph_netfs_issue_op_inline(struct netfs_io_subrequest *subreq) size_t len; int mode;
- __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); __clear_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
if (subreq->start >= inode->i_size) diff --git a/fs/netfs/io.c b/fs/netfs/io.c index f3abc5dfdbc0c..19ec6990dc91e 100644 --- a/fs/netfs/io.c +++ b/fs/netfs/io.c @@ -530,7 +530,8 @@ void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
if (transferred_or_error == 0) { if (__test_and_set_bit(NETFS_SREQ_NO_PROGRESS, &subreq->flags)) { - subreq->error = -ENODATA; + if (rreq->origin != NETFS_DIO_READ) + subreq->error = -ENODATA; goto failed; } } else { @@ -601,9 +602,14 @@ netfs_rreq_prepare_read(struct netfs_io_request *rreq, } if (subreq->len > ictx->zero_point - subreq->start) subreq->len = ictx->zero_point - subreq->start; + + /* We limit buffered reads to the EOF, but let the + * server deal with larger-than-EOF DIO/unbuffered + * reads. + */ + if (subreq->len > rreq->i_size - subreq->start) + subreq->len = rreq->i_size - subreq->start; } - if (subreq->len > rreq->i_size - subreq->start) - subreq->len = rreq->i_size - subreq->start; if (rreq->rsize && subreq->len > rreq->rsize) subreq->len = rreq->rsize;
@@ -739,11 +745,10 @@ int netfs_begin_read(struct netfs_io_request *rreq, bool sync) do { kdebug("submit %llx + %llx >= %llx", rreq->start, rreq->submitted, rreq->i_size); - if (rreq->origin == NETFS_DIO_READ && - rreq->start + rreq->submitted >= rreq->i_size) - break; if (!netfs_rreq_submit_slice(rreq, &io_iter)) break; + if (test_bit(NETFS_SREQ_NO_PROGRESS, &rreq->flags)) + break; if (test_bit(NETFS_RREQ_BLOCKED, &rreq->flags) && test_bit(NETFS_RREQ_NONBLOCK, &rreq->flags)) break; diff --git a/fs/nfs/fscache.c b/fs/nfs/fscache.c index ddc1ee0319554..bc20ba50283c8 100644 --- a/fs/nfs/fscache.c +++ b/fs/nfs/fscache.c @@ -361,7 +361,8 @@ void nfs_netfs_read_completion(struct nfs_pgio_header *hdr) return;
sreq = netfs->sreq; - if (test_bit(NFS_IOHDR_EOF, &hdr->flags)) + if (test_bit(NFS_IOHDR_EOF, &hdr->flags) && + sreq->rreq->origin != NETFS_DIO_READ) __set_bit(NETFS_SREQ_CLEAR_TAIL, &sreq->flags);
if (hdr->error) diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index 2e3c4d0277dbb..9e4f4e67768b9 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -196,7 +196,8 @@ static void cifs_req_issue_read(struct netfs_io_subrequest *subreq) goto out; }
- __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags); + if (subreq->rreq->origin != NETFS_DIO_READ) + __set_bit(NETFS_SREQ_CLEAR_TAIL, &subreq->flags);
rc = rdata->server->ops->async_readv(rdata); out:
Greg Kroah-Hartman wrote on Tue, Aug 27, 2024 at 04:36:47PM +0200:
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit e3786b29c54cdae3490b07180a54e2461f42144c ]
As much as I'd like to have this in, it breaks cifs so please hold this patch until at least these two patches also get backported (I didn't actually test the fix so not sure which is needed, *probably* either/both): 950b03d0f66 ("netfs: Fix missing iterator reset on retry of short read") https://lore.kernel.org/r/20240823200819.532106-8-dhowells@redhat.com ("netfs, cifs: Fix handling of short DIO read")
For some reason the former got in master but the later wasn't despite having been sent together, I might have missed some mails and only the first might actually be required.. David, Steve please let us know if just the first is enough.
Either way the 9p patch can wait a couple more weeks; stuck debian CI (9p) is bad but cifs corruptions are worse.
Thanks,
We are working through this regression with David this week - hopefully will have more info in a few days
On Tue, Aug 27, 2024 at 2:59 PM Dominique Martinet asmadeus@codewreck.org wrote:
Greg Kroah-Hartman wrote on Tue, Aug 27, 2024 at 04:36:47PM +0200:
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit e3786b29c54cdae3490b07180a54e2461f42144c ]
As much as I'd like to have this in, it breaks cifs so please hold this patch until at least these two patches also get backported (I didn't actually test the fix so not sure which is needed, *probably* either/both): 950b03d0f66 ("netfs: Fix missing iterator reset on retry of short read") https://lore.kernel.org/r/20240823200819.532106-8-dhowells@redhat.com ("netfs, cifs: Fix handling of short DIO read")
For some reason the former got in master but the later wasn't despite having been sent together, I might have missed some mails and only the first might actually be required.. David, Steve please let us know if just the first is enough.
Either way the 9p patch can wait a couple more weeks; stuck debian CI (9p) is bad but cifs corruptions are worse.
Thanks,
Dominique
On Wed, Aug 28, 2024 at 04:58:33AM +0900, Dominique Martinet wrote:
Greg Kroah-Hartman wrote on Tue, Aug 27, 2024 at 04:36:47PM +0200:
From: Dominique Martinet asmadeus@codewreck.org
[ Upstream commit e3786b29c54cdae3490b07180a54e2461f42144c ]
As much as I'd like to have this in, it breaks cifs so please hold this patch until at least these two patches also get backported (I didn't actually test the fix so not sure which is needed, *probably* either/both): 950b03d0f66 ("netfs: Fix missing iterator reset on retry of short read") https://lore.kernel.org/r/20240823200819.532106-8-dhowells@redhat.com ("netfs, cifs: Fix handling of short DIO read")
For some reason the former got in master but the later wasn't despite having been sent together, I might have missed some mails and only the first might actually be required.. David, Steve please let us know if just the first is enough.
Either way the 9p patch can wait a couple more weeks; stuck debian CI (9p) is bad but cifs corruptions are worse.
Ok, now dropped, thanks!
greg k-h
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
[ Upstream commit 6252690f7e1b173b86a4c27dfc046b351ab423e7 ]
In __extent_writepage_io(), we call btrfs_set_range_writeback() -> folio_start_writeback(), which clears PAGECACHE_TAG_DIRTY mark from the mapping xarray if the folio is not dirty. This worked fine before commit 97713b1a2ced ("btrfs: do not clear page dirty inside extent_write_locked_range()").
After the commit, however, the folio is still dirty at this point, so the mapping DIRTY tag is not cleared anymore. Then, __extent_writepage_io() calls btrfs_folio_clear_dirty() to clear the folio's dirty flag. That results in the page being unlocked with a "strange" state. The page is not PageDirty, but the mapping tag is set as PAGECACHE_TAG_DIRTY.
This strange state looks like causing a hang with a call trace below when running fstests generic/091 on a null_blk device. It is waiting for a folio lock.
While I don't have an exact relation between this hang and the strange state, fixing the state also fixes the hang. And, that state is worth fixing anyway.
This commit reorders btrfs_folio_clear_dirty() and btrfs_set_range_writeback() in __extent_writepage_io(), so that the PAGECACHE_TAG_DIRTY tag is properly removed from the xarray.
[464.274] task:fsx state:D stack:0 pid:3034 tgid:3034 ppid:2853 flags:0x00004002 [464.286] Call Trace: [464.291] <TASK> [464.295] __schedule+0x10ed/0x6260 [464.301] ? __pfx___blk_flush_plug+0x10/0x10 [464.308] ? __submit_bio+0x37c/0x450 [464.314] ? __pfx___schedule+0x10/0x10 [464.321] ? lock_release+0x567/0x790 [464.327] ? __pfx_lock_acquire+0x10/0x10 [464.334] ? __pfx_lock_release+0x10/0x10 [464.340] ? __pfx_lock_acquire+0x10/0x10 [464.347] ? __pfx_lock_release+0x10/0x10 [464.353] ? do_raw_spin_lock+0x12e/0x270 [464.360] schedule+0xdf/0x3b0 [464.365] io_schedule+0x8f/0xf0 [464.371] folio_wait_bit_common+0x2ca/0x6d0 [464.378] ? folio_wait_bit_common+0x1cc/0x6d0 [464.385] ? __pfx_folio_wait_bit_common+0x10/0x10 [464.392] ? __pfx_filemap_get_folios_tag+0x10/0x10 [464.400] ? __pfx_wake_page_function+0x10/0x10 [464.407] ? __pfx___might_resched+0x10/0x10 [464.414] ? do_raw_spin_unlock+0x58/0x1f0 [464.420] extent_write_cache_pages+0xe49/0x1620 [btrfs] [464.428] ? lock_acquire+0x435/0x500 [464.435] ? __pfx_extent_write_cache_pages+0x10/0x10 [btrfs] [464.443] ? btrfs_do_write_iter+0x493/0x640 [btrfs] [464.451] ? orc_find.part.0+0x1d4/0x380 [464.457] ? __pfx_lock_release+0x10/0x10 [464.464] ? __pfx_lock_release+0x10/0x10 [464.471] ? btrfs_do_write_iter+0x493/0x640 [btrfs] [464.478] btrfs_writepages+0x1cc/0x460 [btrfs] [464.485] ? __pfx_btrfs_writepages+0x10/0x10 [btrfs] [464.493] ? is_bpf_text_address+0x6e/0x100 [464.500] ? kernel_text_address+0x145/0x160 [464.507] ? unwind_get_return_address+0x5e/0xa0 [464.514] ? arch_stack_walk+0xac/0x100 [464.521] do_writepages+0x176/0x780 [464.527] ? lock_release+0x567/0x790 [464.533] ? __pfx_do_writepages+0x10/0x10 [464.540] ? __pfx_lock_acquire+0x10/0x10 [464.546] ? __pfx_stack_trace_save+0x10/0x10 [464.553] ? do_raw_spin_lock+0x12e/0x270 [464.560] ? do_raw_spin_unlock+0x58/0x1f0 [464.566] ? _raw_spin_unlock+0x23/0x40 [464.573] ? wbc_attach_and_unlock_inode+0x3da/0x7d0 [464.580] filemap_fdatawrite_wbc+0x113/0x180 [464.587] ? prepare_pages.constprop.0+0x13c/0x5c0 [btrfs] [464.596] __filemap_fdatawrite_range+0xaf/0xf0 [464.603] ? __pfx___filemap_fdatawrite_range+0x10/0x10 [464.611] ? trace_irq_enable.constprop.0+0xce/0x110 [464.618] ? kasan_quarantine_put+0xd7/0x1e0 [464.625] btrfs_start_ordered_extent+0x46f/0x570 [btrfs] [464.633] ? __pfx_btrfs_start_ordered_extent+0x10/0x10 [btrfs] [464.642] ? __clear_extent_bit+0x2c0/0x9d0 [btrfs] [464.650] btrfs_lock_and_flush_ordered_range+0xc6/0x180 [btrfs] [464.659] ? __pfx_btrfs_lock_and_flush_ordered_range+0x10/0x10 [btrfs] [464.669] btrfs_read_folio+0x12a/0x1d0 [btrfs] [464.676] ? __pfx_btrfs_read_folio+0x10/0x10 [btrfs] [464.684] ? __pfx_filemap_add_folio+0x10/0x10 [464.691] ? __pfx___might_resched+0x10/0x10 [464.698] ? __filemap_get_folio+0x1c5/0x450 [464.705] prepare_uptodate_page+0x12e/0x4d0 [btrfs] [464.713] prepare_pages.constprop.0+0x13c/0x5c0 [btrfs] [464.721] ? fault_in_iov_iter_readable+0xd2/0x240 [464.729] btrfs_buffered_write+0x5bd/0x12f0 [btrfs] [464.737] ? __pfx_btrfs_buffered_write+0x10/0x10 [btrfs] [464.745] ? __pfx_lock_release+0x10/0x10 [464.752] ? generic_write_checks+0x275/0x400 [464.759] ? down_write+0x118/0x1f0 [464.765] ? up_write+0x19b/0x500 [464.770] btrfs_direct_write+0x731/0xba0 [btrfs] [464.778] ? __pfx_btrfs_direct_write+0x10/0x10 [btrfs] [464.785] ? __pfx___might_resched+0x10/0x10 [464.792] ? lock_acquire+0x435/0x500 [464.798] ? lock_acquire+0x435/0x500 [464.804] btrfs_do_write_iter+0x494/0x640 [btrfs] [464.811] ? __pfx_btrfs_do_write_iter+0x10/0x10 [btrfs] [464.819] ? __pfx___might_resched+0x10/0x10 [464.825] ? rw_verify_area+0x6d/0x590 [464.831] vfs_write+0x5d7/0xf50 [464.837] ? __might_fault+0x9d/0x120 [464.843] ? __pfx_vfs_write+0x10/0x10 [464.849] ? btrfs_file_llseek+0xb1/0xfb0 [btrfs] [464.856] ? lock_release+0x567/0x790 [464.862] ksys_write+0xfb/0x1d0 [464.867] ? __pfx_ksys_write+0x10/0x10 [464.873] ? _raw_spin_unlock+0x23/0x40 [464.879] ? btrfs_getattr+0x4af/0x670 [btrfs] [464.886] ? vfs_getattr_nosec+0x79/0x340 [464.892] do_syscall_64+0x95/0x180 [464.898] ? __do_sys_newfstat+0xde/0xf0 [464.904] ? __pfx___do_sys_newfstat+0x10/0x10 [464.911] ? trace_irq_enable.constprop.0+0xce/0x110 [464.918] ? syscall_exit_to_user_mode+0xac/0x2a0 [464.925] ? do_syscall_64+0xa1/0x180 [464.931] ? trace_irq_enable.constprop.0+0xce/0x110 [464.939] ? trace_irq_enable.constprop.0+0xce/0x110 [464.946] ? syscall_exit_to_user_mode+0xac/0x2a0 [464.953] ? btrfs_file_llseek+0xb1/0xfb0 [btrfs] [464.960] ? do_syscall_64+0xa1/0x180 [464.966] ? btrfs_file_llseek+0xb1/0xfb0 [btrfs] [464.973] ? trace_irq_enable.constprop.0+0xce/0x110 [464.980] ? syscall_exit_to_user_mode+0xac/0x2a0 [464.987] ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs] [464.995] ? trace_irq_enable.constprop.0+0xce/0x110 [465.002] ? __pfx_btrfs_file_llseek+0x10/0x10 [btrfs] [465.010] ? do_syscall_64+0xa1/0x180 [465.016] ? lock_release+0x567/0x790 [465.022] ? __pfx_lock_acquire+0x10/0x10 [465.028] ? __pfx_lock_release+0x10/0x10 [465.034] ? trace_irq_enable.constprop.0+0xce/0x110 [465.042] ? syscall_exit_to_user_mode+0xac/0x2a0 [465.049] ? do_syscall_64+0xa1/0x180 [465.055] ? syscall_exit_to_user_mode+0xac/0x2a0 [465.062] ? do_syscall_64+0xa1/0x180 [465.068] ? syscall_exit_to_user_mode+0xac/0x2a0 [465.075] ? do_syscall_64+0xa1/0x180 [465.081] ? clear_bhb_loop+0x25/0x80 [465.087] ? clear_bhb_loop+0x25/0x80 [465.093] ? clear_bhb_loop+0x25/0x80 [465.099] entry_SYSCALL_64_after_hwframe+0x76/0x7e [465.106] RIP: 0033:0x7f093b8ee784 [465.111] RSP: 002b:00007ffc29d31b28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 [465.122] RAX: ffffffffffffffda RBX: 0000000000006000 RCX: 00007f093b8ee784 [465.131] RDX: 000000000001de00 RSI: 00007f093b6ed200 RDI: 0000000000000003 [465.141] RBP: 000000000001de00 R08: 0000000000006000 R09: 0000000000000000 [465.150] R10: 0000000000023e00 R11: 0000000000000202 R12: 0000000000006000 [465.160] R13: 0000000000023e00 R14: 0000000000023e00 R15: 0000000000000001 [465.170] </TASK> [465.174] INFO: lockdep is turned off.
Reported-by: Shinichiro Kawasaki shinichiro.kawasaki@wdc.com Fixes: 97713b1a2ced ("btrfs: do not clear page dirty inside extent_write_locked_range()") Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent_io.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c index 0486b1f911248..3bad7c0be1f10 100644 --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -1420,6 +1420,13 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, free_extent_map(em); em = NULL;
+ /* + * Although the PageDirty bit might be cleared before entering + * this function, subpage dirty bit is not cleared. + * So clear subpage dirty bit here so next time we won't submit + * page for range already written to disk. + */ + btrfs_folio_clear_dirty(fs_info, page_folio(page), cur, iosize); btrfs_set_range_writeback(inode, cur, cur + iosize - 1); if (!PageWriteback(page)) { btrfs_err(inode->root->fs_info, @@ -1427,13 +1434,6 @@ static noinline_for_stack int __extent_writepage_io(struct btrfs_inode *inode, page->index, cur, end); }
- /* - * Although the PageDirty bit is cleared before entering this - * function, subpage dirty bit is not cleared. - * So clear subpage dirty bit here so next time we won't submit - * page for range already written to disk. - */ - btrfs_folio_clear_dirty(fs_info, page_folio(page), cur, iosize);
submit_extent_page(bio_ctrl, disk_bytenr, page, iosize, cur - page_offset(page));
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Faizal Rahim faizal.abdul.rahim@linux.intel.com
[ Upstream commit e037a26ead187901f83cad9c503ccece5ff6817a ]
Testing uncovered that even when the taprio gate is closed, some packets still transmit.
According to i225/6 hardware errata [1], traffic might overflow the planned QBV window. This happens because MAC maintains an internal buffer, primarily for supporting half duplex retries. Therefore, even when the gate closes, residual MAC data in the buffer may still transmit.
To mitigate this for i226, reduce the MAC's internal buffer from 192 bytes to the recommended 88 bytes by modifying the RETX_CTL register value.
This follows guidelines from: [1] Ethernet Controller I225/I22 Spec Update Rev 2.1 Errata Item 9: TSN: Packet Transmission Might Cross Qbv Window [2] I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control
Note that the RETX_CTL register can't be used in TSN mode because half duplex feature cannot coexist with TSN.
Test Steps: 1. Send taprio cmd to board A: tc qdisc replace dev enp1s0 parent root handle 100 taprio \ num_tc 4 \ map 3 2 1 0 3 3 3 3 3 3 3 3 3 3 3 3 \ queues 1@0 1@1 1@2 1@3 \ base-time 0 \ sched-entry S 0x07 500000 \ sched-entry S 0x0f 500000 \ flags 0x2 \ txtime-delay 0
Note that for TC3, gate should open for 500us and close for another 500us.
3. Take tcpdump log on Board B.
4. Send udp packets via UDP tai app from Board A to Board B.
5. Analyze tcpdump log via wireshark log on Board B. Ensure that the total time from the first to the last packet received during one cycle for TC3 does not exceed 500us.
Fixes: 43546211738e ("igc: Add new device ID's") Signed-off-by: Faizal Rahim faizal.abdul.rahim@linux.intel.com Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Tested-by: Mor Bar-Gabay morx.bar.gabay@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_defines.h | 6 ++++ drivers/net/ethernet/intel/igc/igc_tsn.c | 34 ++++++++++++++++++++ 2 files changed, 40 insertions(+)
diff --git a/drivers/net/ethernet/intel/igc/igc_defines.h b/drivers/net/ethernet/intel/igc/igc_defines.h index 5f92b3c7c3d4a..511384f3ec5cb 100644 --- a/drivers/net/ethernet/intel/igc/igc_defines.h +++ b/drivers/net/ethernet/intel/igc/igc_defines.h @@ -404,6 +404,12 @@ #define IGC_DTXMXPKTSZ_TSN 0x19 /* 1600 bytes of max TX DMA packet size */ #define IGC_DTXMXPKTSZ_DEFAULT 0x98 /* 9728-byte Jumbo frames */
+/* Retry Buffer Control */ +#define IGC_RETX_CTL 0x041C +#define IGC_RETX_CTL_WATERMARK_MASK 0xF +#define IGC_RETX_CTL_QBVFULLTH_SHIFT 8 /* QBV Retry Buffer Full Threshold */ +#define IGC_RETX_CTL_QBVFULLEN 0x1000 /* Enable QBV Retry Buffer Full Threshold */ + /* Transmit Scheduling Latency */ /* Latency between transmission scheduling (LaunchTime) and the time * the packet is transmitted to the network in nanosecond. diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index 22cefb1eeedfa..46d4c3275bbb5 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -78,6 +78,15 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter) wr32(IGC_GTXOFFSET, txoffset); }
+static void igc_tsn_restore_retx_default(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + u32 retxctl; + + retxctl = rd32(IGC_RETX_CTL) & IGC_RETX_CTL_WATERMARK_MASK; + wr32(IGC_RETX_CTL, retxctl); +} + /* Returns the TSN specific registers to their default values after * the adapter is reset. */ @@ -91,6 +100,9 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter) wr32(IGC_TXPBS, I225_TXPBSIZE_DEFAULT); wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_DEFAULT);
+ if (igc_is_device_id_i226(hw)) + igc_tsn_restore_retx_default(adapter); + tqavctrl = rd32(IGC_TQAVCTRL); tqavctrl &= ~(IGC_TQAVCTRL_TRANSMIT_MODE_TSN | IGC_TQAVCTRL_ENHANCED_QAV | IGC_TQAVCTRL_FUTSCDDIS); @@ -111,6 +123,25 @@ static int igc_tsn_disable_offload(struct igc_adapter *adapter) return 0; }
+/* To partially fix i226 HW errata, reduce MAC internal buffering from 192 Bytes + * to 88 Bytes by setting RETX_CTL register using the recommendation from: + * a) Ethernet Controller I225/I226 Specification Update Rev 2.1 + * Item 9: TSN: Packet Transmission Might Cross the Qbv Window + * b) I225/6 SW User Manual Rev 1.2.4: Section 8.11.5 Retry Buffer Control + */ +static void igc_tsn_set_retx_qbvfullthreshold(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + u32 retxctl, watermark; + + retxctl = rd32(IGC_RETX_CTL); + watermark = retxctl & IGC_RETX_CTL_WATERMARK_MASK; + /* Set QBVFULLTH value using watermark and set QBVFULLEN */ + retxctl |= (watermark << IGC_RETX_CTL_QBVFULLTH_SHIFT) | + IGC_RETX_CTL_QBVFULLEN; + wr32(IGC_RETX_CTL, retxctl); +} + static int igc_tsn_enable_offload(struct igc_adapter *adapter) { struct igc_hw *hw = &adapter->hw; @@ -123,6 +154,9 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter) wr32(IGC_DTXMXPKTSZ, IGC_DTXMXPKTSZ_TSN); wr32(IGC_TXPBS, IGC_TXPBSIZE_TSN);
+ if (igc_is_device_id_i226(hw)) + igc_tsn_set_retx_qbvfullthreshold(adapter); + for (i = 0; i < adapter->num_tx_queues; i++) { struct igc_ring *ring = adapter->tx_ring[i]; u32 txqctl = 0;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Faizal Rahim faizal.abdul.rahim@linux.intel.com
[ Upstream commit f8d6acaee9d35cbff3c3cfad94641666c596f8da ]
When user issues these cmds: 1. Either a) or b) a) mqprio with hardware offload disabled b) taprio with txtime-assist feature enabled 2. etf 3. tc qdisc delete 4. taprio with base time in the past
At step 4, qbv_config_change_errors wrongly increased by 1.
Excerpt from IEEE 802.1Q-2018 8.6.9.3.1: "If AdminBaseTime specifies a time in the past, and the current schedule is running, then: Increment ConfigChangeError counter"
qbv_config_change_errors should only increase if base time is in the past and no taprio is active. In user perspective, taprio was not active when first triggered at step 4. However, i225/6 reuses qbv for etf, so qbv is enabled with a dummy schedule at step 2 where it enters igc_tsn_enable_offload() and qbv_count got incremented to 1. At step 4, it enters igc_tsn_enable_offload() again, qbv_count is incremented to 2. Because taprio is running, tc_setup_type is TC_SETUP_QDISC_ETF and qbv_count > 1, qbv_config_change_errors value got incremented.
This issue happens due to reliance on qbv_count field where a non-zero value indicates that taprio is running. But qbv_count increases regardless if taprio is triggered by user or by other tsn feature. It does not align with qbv_config_change_errors expectation where it is only concerned with taprio triggered by user.
Fixing this by relocating the qbv_config_change_errors logic to igc_save_qbv_schedule(), eliminating reliance on qbv_count and its inaccuracies from i225/6's multiple uses of qbv feature for other TSN features.
The new function created: igc_tsn_is_taprio_activated_by_user() uses taprio_offload_enable field to indicate that the current running taprio was triggered by user, instead of triggered by non-qbv feature like etf.
Fixes: ae4fe4698300 ("igc: Add qbv_config_change_errors counter") Signed-off-by: Faizal Rahim faizal.abdul.rahim@linux.intel.com Reviewed-by: Simon Horman horms@kernel.org Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Tested-by: Mor Bar-Gabay morx.bar.gabay@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_main.c | 8 ++++++-- drivers/net/ethernet/intel/igc/igc_tsn.c | 16 ++++++++-------- drivers/net/ethernet/intel/igc/igc_tsn.h | 1 + 3 files changed, 15 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_main.c b/drivers/net/ethernet/intel/igc/igc_main.c index 33069880c86c0..3041f8142324f 100644 --- a/drivers/net/ethernet/intel/igc/igc_main.c +++ b/drivers/net/ethernet/intel/igc/igc_main.c @@ -6319,12 +6319,16 @@ static int igc_save_qbv_schedule(struct igc_adapter *adapter, if (!validate_schedule(adapter, qopt)) return -EINVAL;
+ igc_ptp_read(adapter, &now); + + if (igc_tsn_is_taprio_activated_by_user(adapter) && + is_base_time_past(qopt->base_time, &now)) + adapter->qbv_config_change_errors++; + adapter->cycle_time = qopt->cycle_time; adapter->base_time = qopt->base_time; adapter->taprio_offload_enable = true;
- igc_ptp_read(adapter, &now); - for (n = 0; n < qopt->num_entries; n++) { struct tc_taprio_sched_entry *e = &qopt->entries[n];
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index 46d4c3275bbb5..8ed7b965484da 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -87,6 +87,14 @@ static void igc_tsn_restore_retx_default(struct igc_adapter *adapter) wr32(IGC_RETX_CTL, retxctl); }
+bool igc_tsn_is_taprio_activated_by_user(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + + return (rd32(IGC_BASET_H) || rd32(IGC_BASET_L)) && + adapter->taprio_offload_enable; +} + /* Returns the TSN specific registers to their default values after * the adapter is reset. */ @@ -296,14 +304,6 @@ static int igc_tsn_enable_offload(struct igc_adapter *adapter) s64 n = div64_s64(ktime_sub_ns(systim, base_time), cycle);
base_time = ktime_add_ns(base_time, (n + 1) * cycle); - - /* Increase the counter if scheduling into the past while - * Gate Control List (GCL) is running. - */ - if ((rd32(IGC_BASET_H) || rd32(IGC_BASET_L)) && - (adapter->tc_setup_type == TC_SETUP_QDISC_TAPRIO) && - (adapter->qbv_count > 1)) - adapter->qbv_config_change_errors++; } else { if (igc_is_device_id_i226(hw)) { ktime_t adjust_time, expires_time; diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.h b/drivers/net/ethernet/intel/igc/igc_tsn.h index b53e6af560b73..98ec845a86bf0 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.h +++ b/drivers/net/ethernet/intel/igc/igc_tsn.h @@ -7,5 +7,6 @@ int igc_tsn_offload_apply(struct igc_adapter *adapter); int igc_tsn_reset(struct igc_adapter *adapter); void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter); +bool igc_tsn_is_taprio_activated_by_user(struct igc_adapter *adapter);
#endif /* _IGC_BASE_H */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Faizal Rahim faizal.abdul.rahim@linux.intel.com
[ Upstream commit 0afeaeb5dae86aceded0d5f0c3a54d27858c0c6f ]
Following the "igc: Fix TX Hang issue when QBV Gate is close" changes, remaining issues with the reset adapter logic in igc_tsn_offload_apply() have been observed:
1. The reset adapter logics for i225 and i226 differ, although they should be the same according to the guidelines in I225/6 HW Design Section 7.5.2.1 on software initialization during tx mode changes. 2. The i225 resets adapter every time, even though tx mode doesn't change. This occurs solely based on the condition igc_is_device_id_i225() when calling schedule_work(). 3. i226 doesn't reset adapter for tsn->legacy tx mode changes. It only resets adapter for legacy->tsn tx mode transitions. 4. qbv_count introduced in the patch is actually not needed; in this context, a non-zero value of qbv_count is used to indicate if tx mode was unconditionally set to tsn in igc_tsn_enable_offload(). This could be replaced by checking the existing register IGC_TQAVCTRL_TRANSMIT_MODE_TSN bit.
This patch resolves all issues and enters schedule_work() to reset the adapter only when changing tx mode. It also removes reliance on qbv_count.
qbv_count field will be removed in a future patch.
Test ran:
1. Verify reset adapter behaviour in i225/6: a) Enrol a new GCL Reset adapter observed (tx mode change legacy->tsn) b) Enrol a new GCL without deleting qdisc No reset adapter observed (tx mode remain tsn->tsn) c) Delete qdisc Reset adapter observed (tx mode change tsn->legacy)
2. Tested scenario from "igc: Fix TX Hang issue when QBV Gate is closed" to confirm it remains resolved.
Fixes: 175c241288c0 ("igc: Fix TX Hang issue when QBV Gate is closed") Signed-off-by: Faizal Rahim faizal.abdul.rahim@linux.intel.com Reviewed-by: Simon Horman horms@kernel.org Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Tested-by: Mor Bar-Gabay morx.bar.gabay@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_tsn.c | 24 +++++++++++++++++++----- 1 file changed, 19 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index 8ed7b965484da..ada7514305171 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -49,6 +49,13 @@ static unsigned int igc_tsn_new_flags(struct igc_adapter *adapter) return new_flags; }
+static bool igc_tsn_is_tx_mode_in_tsn(struct igc_adapter *adapter) +{ + struct igc_hw *hw = &adapter->hw; + + return !!(rd32(IGC_TQAVCTRL) & IGC_TQAVCTRL_TRANSMIT_MODE_TSN); +} + void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter) { struct igc_hw *hw = &adapter->hw; @@ -365,15 +372,22 @@ int igc_tsn_reset(struct igc_adapter *adapter) return err; }
-int igc_tsn_offload_apply(struct igc_adapter *adapter) +static bool igc_tsn_will_tx_mode_change(struct igc_adapter *adapter) { - struct igc_hw *hw = &adapter->hw; + bool any_tsn_enabled = !!(igc_tsn_new_flags(adapter) & + IGC_FLAG_TSN_ANY_ENABLED); + + return (any_tsn_enabled && !igc_tsn_is_tx_mode_in_tsn(adapter)) || + (!any_tsn_enabled && igc_tsn_is_tx_mode_in_tsn(adapter)); +}
- /* Per I225/6 HW Design Section 7.5.2.1, transmit mode - * cannot be changed dynamically. Require reset the adapter. +int igc_tsn_offload_apply(struct igc_adapter *adapter) +{ + /* Per I225/6 HW Design Section 7.5.2.1 guideline, if tx mode change + * from legacy->tsn or tsn->legacy, then reset adapter is needed. */ if (netif_running(adapter->netdev) && - (igc_is_device_id_i225(hw) || !adapter->qbv_count)) { + igc_tsn_will_tx_mode_change(adapter)) { schedule_work(&adapter->reset_task); return 0; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Faizal Rahim faizal.abdul.rahim@linux.intel.com
[ Upstream commit 6c3fc0b1c3d073bd6fc3bf43dbd0e64240537464 ]
A large tx latency issue was discovered during testing when only QBV was enabled. The issue occurs because gtxoffset was not set when QBV is active, it was only set when launch time is active.
The patch "igc: Correct the launchtime offset" only sets gtxoffset when the launchtime_enable field is set by the user. Enabling launchtime_enable ultimately sets the register IGC_TXQCTL_QUEUE_MODE_LAUNCHT (referred to as LaunchT in the SW user manual).
Section 7.5.2.6 of the IGC i225/6 SW User Manual Rev 1.2.4 states: "The latency between transmission scheduling (launch time) and the time the packet is transmitted to the network is listed in Table 7-61."
However, the patch misinterprets the phrase "launch time" in that section by assuming it specifically refers to the LaunchT register, whereas it actually denotes the generic term for when a packet is released from the internal buffer to the MAC transmit logic.
This launch time, as per that section, also implicitly refers to the QBV gate open time, where a packet waits in the buffer for the QBV gate to open. Therefore, latency applies whenever QBV is in use. TSN features such as QBU and QAV reuse QBV, making the latency universal to TSN features.
Discussed with i226 HW owner (Shalev, Avi) and we were in agreement that the term "launch time" used in Section 7.5.2.6 is not clear and can be easily misinterpreted. Avi will update this section to: "When TQAVCTRL.TRANSMIT_MODE = TSN, the latency between transmission scheduling and the time the packet is transmitted to the network is listed in Table 7-61."
Fix this issue by using igc_tsn_is_tx_mode_in_tsn() as a condition to write to gtxoffset, aligning with the newly updated SW User Manual.
Tested: 1. Enrol taprio on talker board base-time 0 cycle-time 1000000 flags 0x2 index 0 cmd S gatemask 0x1 interval1 index 0 cmd S gatemask 0x1 interval2
Note: interval1 = interval for a 64 bytes packet to go through interval2 = cycle-time - interval1
2. Take tcpdump on listener board
3. Use udp tai app on talker to send packets to listener
4. Check the timestamp on listener via wireshark
Test Result: 100 Mbps: 113 ~193 ns 1000 Mbps: 52 ~ 84 ns 2500 Mbps: 95 ~ 223 ns
Note that the test result is similar to the patch "igc: Correct the launchtime offset".
Fixes: 790835fcc0cb ("igc: Correct the launchtime offset") Signed-off-by: Faizal Rahim faizal.abdul.rahim@linux.intel.com Reviewed-by: Simon Horman horms@kernel.org Acked-by: Vinicius Costa Gomes vinicius.gomes@intel.com Tested-by: Mor Bar-Gabay morx.bar.gabay@intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igc/igc_tsn.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/igc/igc_tsn.c b/drivers/net/ethernet/intel/igc/igc_tsn.c index ada7514305171..d68fa7f3d5f07 100644 --- a/drivers/net/ethernet/intel/igc/igc_tsn.c +++ b/drivers/net/ethernet/intel/igc/igc_tsn.c @@ -61,7 +61,7 @@ void igc_tsn_adjust_txtime_offset(struct igc_adapter *adapter) struct igc_hw *hw = &adapter->hw; u16 txoffset;
- if (!is_any_launchtime(adapter)) + if (!igc_tsn_is_tx_mode_in_tsn(adapter)) return;
switch (adapter->link_speed) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 3a3be7ff9224f424e485287b54be00d2c6bd9c40 ]
syzbot/KMSAN reported use of uninit-value in get_dev_xmit() [1]
We must make sure the IPv4 or Ipv6 header is pulled in skb->head before accessing fields in them.
Use pskb_inet_may_pull() to fix this issue.
[1] BUG: KMSAN: uninit-value in ipv6_pdp_find drivers/net/gtp.c:220 [inline] BUG: KMSAN: uninit-value in gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline] BUG: KMSAN: uninit-value in gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281 ipv6_pdp_find drivers/net/gtp.c:220 [inline] gtp_build_skb_ip6 drivers/net/gtp.c:1229 [inline] gtp_dev_xmit+0x1424/0x2540 drivers/net/gtp.c:1281 __netdev_start_xmit include/linux/netdevice.h:4913 [inline] netdev_start_xmit include/linux/netdevice.h:4922 [inline] xmit_one net/core/dev.c:3580 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3596 __dev_queue_xmit+0x358c/0x5610 net/core/dev.c:4423 dev_queue_xmit include/linux/netdevice.h:3105 [inline] packet_xmit+0x9c/0x6c0 net/packet/af_packet.c:276 packet_snd net/packet/af_packet.c:3145 [inline] packet_sendmsg+0x90e3/0xa3a0 net/packet/af_packet.c:3177 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212 x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Uninit was created at: slab_post_alloc_hook mm/slub.c:3994 [inline] slab_alloc_node mm/slub.c:4037 [inline] kmem_cache_alloc_node_noprof+0x6bf/0xb80 mm/slub.c:4080 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:583 __alloc_skb+0x363/0x7b0 net/core/skbuff.c:674 alloc_skb include/linux/skbuff.h:1320 [inline] alloc_skb_with_frags+0xc8/0xbf0 net/core/skbuff.c:6526 sock_alloc_send_pskb+0xa81/0xbf0 net/core/sock.c:2815 packet_alloc_skb net/packet/af_packet.c:2994 [inline] packet_snd net/packet/af_packet.c:3088 [inline] packet_sendmsg+0x749c/0xa3a0 net/packet/af_packet.c:3177 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x30f/0x380 net/socket.c:745 __sys_sendto+0x685/0x830 net/socket.c:2204 __do_sys_sendto net/socket.c:2216 [inline] __se_sys_sendto net/socket.c:2212 [inline] __x64_sys_sendto+0x125/0x1d0 net/socket.c:2212 x64_sys_call+0x3799/0x3c10 arch/x86/include/generated/asm/syscalls_64.h:45 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xcd/0x1e0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
CPU: 0 UID: 0 PID: 7115 Comm: syz.1.515 Not tainted 6.11.0-rc1-syzkaller-00043-g94ede2a3e913 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/27/2024
Fixes: 999cb275c807 ("gtp: add IPv6 support") Fixes: 459aa660eb1d ("gtp: add initial driver for datapath of GPRS Tunneling Protocol (GTP-U)") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Harald Welte laforge@gnumonks.org Reviewed-by: Pablo Neira Ayuso pablo@netfilter.org Link: https://patch.msgid.link/20240808132455.3413916-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/gtp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/gtp.c b/drivers/net/gtp.c index 427b91aca50d3..0696faf60013e 100644 --- a/drivers/net/gtp.c +++ b/drivers/net/gtp.c @@ -1269,6 +1269,9 @@ static netdev_tx_t gtp_dev_xmit(struct sk_buff *skb, struct net_device *dev) if (skb_cow_head(skb, dev->needed_headroom)) goto tx_err;
+ if (!pskb_inet_may_pull(skb)) + goto tx_err; + skb_reset_inner_headers(skb);
/* PDP context lookups in gtp_build_skb_*() need rcu read-side lock. */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tariq Toukan tariqt@nvidia.com
[ Upstream commit c31fe2b5095d8c84562ce90db07600f7e9f318df ]
Unconditionally calling the MPIR query on BF separate mode yields the FW syndrome below [1]. Do not call it unless admin clearly specified the SD group, i.e. expressing the intention of using the multi-PF netdev feature.
This fix covers cases not covered in commit fca3b4791850 ("net/mlx5: Do not query MPIR on embedded CPU function").
[1] mlx5_cmd_out_err:808:(pid 8267): ACCESS_REG(0x805) op_mod(0x1) failed, status bad system state(0x4), syndrome (0x685f19), err(-5)
Fixes: 678eb448055a ("net/mlx5: SD, Implement basic query and instantiation") Signed-off-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Gal Pressman gal@nvidia.com Link: https://patch.msgid.link/20240808144107.2095424-2-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/lib/sd.c | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c index f6deb5a3f8202..eeb0b7ea05f12 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c @@ -126,7 +126,7 @@ static bool mlx5_sd_is_supported(struct mlx5_core_dev *dev, u8 host_buses) }
static int mlx5_query_sd(struct mlx5_core_dev *dev, bool *sdm, - u8 *host_buses, u8 *sd_group) + u8 *host_buses) { u32 out[MLX5_ST_SZ_DW(mpir_reg)]; int err; @@ -135,10 +135,6 @@ static int mlx5_query_sd(struct mlx5_core_dev *dev, bool *sdm, if (err) return err;
- err = mlx5_query_nic_vport_sd_group(dev, sd_group); - if (err) - return err; - *sdm = MLX5_GET(mpir_reg, out, sdm); *host_buses = MLX5_GET(mpir_reg, out, host_buses);
@@ -166,19 +162,23 @@ static int sd_init(struct mlx5_core_dev *dev) if (mlx5_core_is_ecpf(dev)) return 0;
+ err = mlx5_query_nic_vport_sd_group(dev, &sd_group); + if (err) + return err; + + if (!sd_group) + return 0; + if (!MLX5_CAP_MCAM_REG(dev, mpir)) return 0;
- err = mlx5_query_sd(dev, &sdm, &host_buses, &sd_group); + err = mlx5_query_sd(dev, &sdm, &host_buses); if (err) return err;
if (!sdm) return 0;
- if (!sd_group) - return 0; - group_id = mlx5_sd_group_id(dev, sd_group);
if (!mlx5_sd_is_supported(dev, host_buses)) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit e6b5afd30b99b43682a7764e1a74a42fe4d5f4b3 ]
mlx5e_safe_reopen_channels() requires the state lock taken. The referenced changed in the Fixes tag removed the lock to fix another issue. This patch adds it back but at a later point (when calling mlx5e_safe_reopen_channels()) to avoid the deadlock referenced in the Fixes tag.
Fixes: eab0da38912e ("net/mlx5e: Fix possible deadlock on mlx5e_tx_timeout_work") Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Link: https://lore.kernel.org/all/ZplpKq8FKi3vwfxv@gmail.com/T/ Reviewed-by: Breno Leitao leitao@debian.org Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20240808144107.2095424-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c index 22918b2ef7f12..09433b91be176 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/reporter_tx.c @@ -146,7 +146,9 @@ static int mlx5e_tx_reporter_timeout_recover(void *ctx) return err; }
+ mutex_lock(&priv->state_lock); err = mlx5e_safe_reopen_channels(priv); + mutex_unlock(&priv->state_lock); if (!err) { to_ctx->status = 1; /* all channels recovered */ return err;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cosmin Ratiu cratiu@nvidia.com
[ Upstream commit cbc796be1779c4dbc9a482c7233995e2a8b6bfb3 ]
Previously, an ethtool rx flow with no attrs would not be added to the NIC as it has no rules to configure the hw with, but it would be reported as successful to the caller (return code 0). This is confusing for the user as ethtool then reports "Added rule $num", but no rule was actually added.
This change corrects that by instead reporting these wrong rules as -EINVAL.
Fixes: b29c61dac3a2 ("net/mlx5e: Ethtool steering flow validation refactoring") Signed-off-by: Cosmin Ratiu cratiu@nvidia.com Reviewed-by: Saeed Mahameed saeedm@nvidia.com Reviewed-by: Dragos Tatulea dtatulea@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20240808144107.2095424-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c index 3eccdadc03578..773624bb2c5d5 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_fs_ethtool.c @@ -734,7 +734,7 @@ mlx5e_ethtool_flow_replace(struct mlx5e_priv *priv, if (num_tuples <= 0) { netdev_warn(priv->netdev, "%s: flow is not valid %d\n", __func__, num_tuples); - return num_tuples; + return num_tuples < 0 ? num_tuples : -EINVAL; }
eth_ft = get_flow_table(priv, fs, num_tuples);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit a9a18e8f770c9b0703dab93580d0b02e199a4c79 ]
We can't dereference "skb" after calling vcc->push() because the skb is released.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/atm/idt77252.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/atm/idt77252.c b/drivers/atm/idt77252.c index e7f713cd70d3f..a876024d8a05f 100644 --- a/drivers/atm/idt77252.c +++ b/drivers/atm/idt77252.c @@ -1118,8 +1118,8 @@ dequeue_rx(struct idt77252_dev *card, struct rsq_entry *rsqe) rpp->len += skb->len;
if (stat & SAR_RSQE_EPDU) { + unsigned int len, truesize; unsigned char *l1l2; - unsigned int len;
l1l2 = (unsigned char *) ((unsigned long) skb->data + skb->len - 6);
@@ -1189,14 +1189,15 @@ dequeue_rx(struct idt77252_dev *card, struct rsq_entry *rsqe) ATM_SKB(skb)->vcc = vcc; __net_timestamp(skb);
+ truesize = skb->truesize; vcc->push(vcc, skb); atomic_inc(&vcc->stats->rx);
- if (skb->truesize > SAR_FB_SIZE_3) + if (truesize > SAR_FB_SIZE_3) add_rx_skb(card, 3, SAR_FB_SIZE_3, 1); - else if (skb->truesize > SAR_FB_SIZE_2) + else if (truesize > SAR_FB_SIZE_2) add_rx_skb(card, 2, SAR_FB_SIZE_2, 1); - else if (skb->truesize > SAR_FB_SIZE_1) + else if (truesize > SAR_FB_SIZE_1) add_rx_skb(card, 1, SAR_FB_SIZE_1, 1); else add_rx_skb(card, 0, SAR_FB_SIZE_0, 1);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Radhey Shyam Pandey radhey.shyam.pandey@amd.com
[ Upstream commit 9ff2f816e2aa65ca9a1cdf0954842f8173c0f48d ]
In axiethernet header fix register defines comment description to be inline with IP documentation. It updates MAC configuration register, MDIO configuration register and frame filter control description.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@amd.com Reviewed-by: Andrew Lunn andrew@lunn.ch Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet.h | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h index fa5500decc960..c7d9221fafdcb 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h @@ -160,16 +160,16 @@ #define XAE_RCW1_OFFSET 0x00000404 /* Rx Configuration Word 1 */ #define XAE_TC_OFFSET 0x00000408 /* Tx Configuration */ #define XAE_FCC_OFFSET 0x0000040C /* Flow Control Configuration */ -#define XAE_EMMC_OFFSET 0x00000410 /* EMAC mode configuration */ -#define XAE_PHYC_OFFSET 0x00000414 /* RGMII/SGMII configuration */ +#define XAE_EMMC_OFFSET 0x00000410 /* MAC speed configuration */ +#define XAE_PHYC_OFFSET 0x00000414 /* RX Max Frame Configuration */ #define XAE_ID_OFFSET 0x000004F8 /* Identification register */ -#define XAE_MDIO_MC_OFFSET 0x00000500 /* MII Management Config */ -#define XAE_MDIO_MCR_OFFSET 0x00000504 /* MII Management Control */ -#define XAE_MDIO_MWD_OFFSET 0x00000508 /* MII Management Write Data */ -#define XAE_MDIO_MRD_OFFSET 0x0000050C /* MII Management Read Data */ +#define XAE_MDIO_MC_OFFSET 0x00000500 /* MDIO Setup */ +#define XAE_MDIO_MCR_OFFSET 0x00000504 /* MDIO Control */ +#define XAE_MDIO_MWD_OFFSET 0x00000508 /* MDIO Write Data */ +#define XAE_MDIO_MRD_OFFSET 0x0000050C /* MDIO Read Data */ #define XAE_UAW0_OFFSET 0x00000700 /* Unicast address word 0 */ #define XAE_UAW1_OFFSET 0x00000704 /* Unicast address word 1 */ -#define XAE_FMI_OFFSET 0x00000708 /* Filter Mask Index */ +#define XAE_FMI_OFFSET 0x00000708 /* Frame Filter Control */ #define XAE_AF0_OFFSET 0x00000710 /* Address Filter 0 */ #define XAE_AF1_OFFSET 0x00000714 /* Address Filter 1 */
@@ -308,7 +308,7 @@ */ #define XAE_UAW1_UNICASTADDR_MASK 0x0000FFFF
-/* Bit masks for Axi Ethernet FMI register */ +/* Bit masks for Axi Ethernet FMC register */ #define XAE_FMI_PM_MASK 0x80000000 /* Promis. mode enable */ #define XAE_FMI_IND_MASK 0x00000003 /* Index Mask */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawel Dembicki paweldembicki@gmail.com
[ Upstream commit 63796bc2e97cd5ebcef60bad4953259d4ad11cb4 ]
According to the datasheet description ("Port Mode Procedure" in 5.6.2), the VSC73XX_MAC_CFG_WEXC_DIS bit is configured only for half duplex mode.
The WEXC_DIS bit is responsible for MAC behavior after an excessive collision. Let's set it as described in the datasheet.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Pawel Dembicki paweldembicki@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/vitesse-vsc73xx-core.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c index 4b031fefcec68..fc4030976bdce 100644 --- a/drivers/net/dsa/vitesse-vsc73xx-core.c +++ b/drivers/net/dsa/vitesse-vsc73xx-core.c @@ -817,6 +817,11 @@ static void vsc73xx_mac_link_up(struct phylink_config *config,
if (duplex == DUPLEX_FULL) val |= VSC73XX_MAC_CFG_FDX; + else + /* In datasheet description ("Port Mode Procedure" in 5.6.2) + * this bit is configured only for half duplex. + */ + val |= VSC73XX_MAC_CFG_WEXC_DIS;
/* This routine is described in the datasheet (below ARBDISC register * description) @@ -827,7 +832,6 @@ static void vsc73xx_mac_link_up(struct phylink_config *config, get_random_bytes(&seed, 1); val |= seed << VSC73XX_MAC_CFG_SEED_OFFSET; val |= VSC73XX_MAC_CFG_SEED_LOAD; - val |= VSC73XX_MAC_CFG_WEXC_DIS; vsc73xx_write(vsc, VSC73XX_BLOCK_MAC, port, VSC73XX_MAC_CFG, val);
/* Flow control for the PHY facing ports:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawel Dembicki paweldembicki@gmail.com
[ Upstream commit 5b9eebc2c7a5f0cc7950d918c1e8a4ad4bed5010 ]
In the 'vsc73xx_phy_write' function, the register value is missing, and the phy write operation always sends zeros.
This commit passes the value variable into the proper register.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Pawel Dembicki paweldembicki@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/vitesse-vsc73xx-core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c index fc4030976bdce..2725541b3c36f 100644 --- a/drivers/net/dsa/vitesse-vsc73xx-core.c +++ b/drivers/net/dsa/vitesse-vsc73xx-core.c @@ -534,7 +534,7 @@ static int vsc73xx_phy_write(struct dsa_switch *ds, int phy, int regnum, return 0; }
- cmd = (phy << 21) | (regnum << 16); + cmd = (phy << 21) | (regnum << 16) | val; ret = vsc73xx_write(vsc, VSC73XX_BLOCK_MII, 0, 1, cmd); if (ret) return ret;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pawel Dembicki paweldembicki@gmail.com
[ Upstream commit fa63c6434b6f6aaf9d8d599dc899bc0a074cc0ad ]
The VSC73xx has a busy flag used during MDIO operations. It is raised when MDIO read/write operations are in progress. Without it, PHYs are misconfigured and bus operations do not work as expected.
Fixes: 05bd97fc559d ("net: dsa: Add Vitesse VSC73xx DSA router driver") Reviewed-by: Linus Walleij linus.walleij@linaro.org Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: Pawel Dembicki paweldembicki@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/vitesse-vsc73xx-core.c | 37 +++++++++++++++++++++++++- 1 file changed, 36 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/vitesse-vsc73xx-core.c b/drivers/net/dsa/vitesse-vsc73xx-core.c index 2725541b3c36f..56bb77dbd28a2 100644 --- a/drivers/net/dsa/vitesse-vsc73xx-core.c +++ b/drivers/net/dsa/vitesse-vsc73xx-core.c @@ -38,6 +38,10 @@ #define VSC73XX_BLOCK_ARBITER 0x5 /* Only subblock 0 */ #define VSC73XX_BLOCK_SYSTEM 0x7 /* Only subblock 0 */
+/* MII Block subblock */ +#define VSC73XX_BLOCK_MII_INTERNAL 0x0 /* Internal MDIO subblock */ +#define VSC73XX_BLOCK_MII_EXTERNAL 0x1 /* External MDIO subblock */ + #define CPU_PORT 6 /* CPU port */
/* MAC Block registers */ @@ -196,6 +200,8 @@ #define VSC73XX_MII_CMD 0x1 #define VSC73XX_MII_DATA 0x2
+#define VSC73XX_MII_STAT_BUSY BIT(3) + /* Arbiter block 5 registers */ #define VSC73XX_ARBEMPTY 0x0c #define VSC73XX_ARBDISC 0x0e @@ -270,6 +276,7 @@ #define IS_739X(a) (IS_7395(a) || IS_7398(a))
#define VSC73XX_POLL_SLEEP_US 1000 +#define VSC73XX_MDIO_POLL_SLEEP_US 5 #define VSC73XX_POLL_TIMEOUT_US 10000
struct vsc73xx_counter { @@ -487,6 +494,22 @@ static int vsc73xx_detect(struct vsc73xx *vsc) return 0; }
+static int vsc73xx_mdio_busy_check(struct vsc73xx *vsc) +{ + int ret, err; + u32 val; + + ret = read_poll_timeout(vsc73xx_read, err, + err < 0 || !(val & VSC73XX_MII_STAT_BUSY), + VSC73XX_MDIO_POLL_SLEEP_US, + VSC73XX_POLL_TIMEOUT_US, false, vsc, + VSC73XX_BLOCK_MII, VSC73XX_BLOCK_MII_INTERNAL, + VSC73XX_MII_STAT, &val); + if (ret) + return ret; + return err; +} + static int vsc73xx_phy_read(struct dsa_switch *ds, int phy, int regnum) { struct vsc73xx *vsc = ds->priv; @@ -494,12 +517,20 @@ static int vsc73xx_phy_read(struct dsa_switch *ds, int phy, int regnum) u32 val; int ret;
+ ret = vsc73xx_mdio_busy_check(vsc); + if (ret) + return ret; + /* Setting bit 26 means "read" */ cmd = BIT(26) | (phy << 21) | (regnum << 16); ret = vsc73xx_write(vsc, VSC73XX_BLOCK_MII, 0, 1, cmd); if (ret) return ret; - msleep(2); + + ret = vsc73xx_mdio_busy_check(vsc); + if (ret) + return ret; + ret = vsc73xx_read(vsc, VSC73XX_BLOCK_MII, 0, 2, &val); if (ret) return ret; @@ -523,6 +554,10 @@ static int vsc73xx_phy_write(struct dsa_switch *ds, int phy, int regnum, u32 cmd; int ret;
+ ret = vsc73xx_mdio_busy_check(vsc); + if (ret) + return ret; + /* It was found through tedious experiments that this router * chip really hates to have it's PHYs reset. They * never recover if that happens: autonegotiation stops
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Zhang everything411@qq.com
[ Upstream commit db1b4bedb9b97c6d34b03d03815147c04fffe8b4 ]
When there are multiple ap interfaces on one band and with WED on, turning the interface down will cause a kernel panic on MT798X.
Previously, cb_priv was freed in mtk_wed_setup_tc_block() without marking NULL,and mtk_wed_setup_tc_block_cb() didn't check the value, too.
Assign NULL after free cb_priv in mtk_wed_setup_tc_block() and check NULL in mtk_wed_setup_tc_block_cb().
---------- Unable to handle kernel paging request at virtual address 0072460bca32b4f5 Call trace: mtk_wed_setup_tc_block_cb+0x4/0x38 0xffffffc0794084bc tcf_block_playback_offloads+0x70/0x1e8 tcf_block_unbind+0x6c/0xc8 ... ---------
Fixes: 799684448e3e ("net: ethernet: mtk_wed: introduce wed wo support") Signed-off-by: Zheng Zhang everything411@qq.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mediatek/mtk_wed.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mediatek/mtk_wed.c b/drivers/net/ethernet/mediatek/mtk_wed.c index 61334a71058c7..e212a4ba92751 100644 --- a/drivers/net/ethernet/mediatek/mtk_wed.c +++ b/drivers/net/ethernet/mediatek/mtk_wed.c @@ -2666,14 +2666,15 @@ mtk_wed_setup_tc_block_cb(enum tc_setup_type type, void *type_data, void *cb_pri { struct mtk_wed_flow_block_priv *priv = cb_priv; struct flow_cls_offload *cls = type_data; - struct mtk_wed_hw *hw = priv->hw; + struct mtk_wed_hw *hw = NULL;
- if (!tc_can_offload(priv->dev)) + if (!priv || !tc_can_offload(priv->dev)) return -EOPNOTSUPP;
if (type != TC_SETUP_CLSFLOWER) return -EOPNOTSUPP;
+ hw = priv->hw; return mtk_flow_offload_cmd(hw->eth, cls, hw->index); }
@@ -2729,6 +2730,7 @@ mtk_wed_setup_tc_block(struct mtk_wed_hw *hw, struct net_device *dev, flow_block_cb_remove(block_cb, f); list_del(&block_cb->driver_list); kfree(block_cb->cb_priv); + block_cb->cb_priv = NULL; } return 0; default:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Thompson davthompson@nvidia.com
[ Upstream commit df934abb185c71c9f2fa07a5013672d0cbd36560 ]
A recent change to the driver exposed a bug where the MAC RX filters (unicast MAC, broadcast MAC, and multicast MAC) are configured and enabled before the RX path is fully initialized. The result of this bug is that after the PHY is started packets that match these MAC RX filters start to flow into the RX FIFO. And then, after rx_init() is completed, these packets will go into the driver RX ring as well. If enough packets are received to fill the RX ring (default size is 128 packets) before the call to request_irq() completes, the driver RX function becomes stuck.
This bug is intermittent but is most likely to be seen where the oob_net0 interface is connected to a busy network with lots of broadcast and multicast traffic.
All the MAC RX filters must be disabled until the RX path is ready, i.e. all initialization is done and all the IRQs are installed.
Fixes: f7442a634ac0 ("mlxbf_gige: call request_irq() after NAPI initialized") Reviewed-by: Asmaa Mnebhi asmaa@nvidia.com Signed-off-by: David Thompson davthompson@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240809163612.12852-1-davthompson@nvidia.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlxbf_gige/mlxbf_gige.h | 8 +++ .../mellanox/mlxbf_gige/mlxbf_gige_main.c | 10 ++++ .../mellanox/mlxbf_gige/mlxbf_gige_regs.h | 2 + .../mellanox/mlxbf_gige/mlxbf_gige_rx.c | 50 ++++++++++++++++--- 4 files changed, 64 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h index bc94e75a7aebd..e7777700ee18a 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h @@ -40,6 +40,7 @@ */ #define MLXBF_GIGE_BCAST_MAC_FILTER_IDX 0 #define MLXBF_GIGE_LOCAL_MAC_FILTER_IDX 1 +#define MLXBF_GIGE_MAX_FILTER_IDX 3
/* Define for broadcast MAC literal */ #define BCAST_MAC_ADDR 0xFFFFFFFFFFFF @@ -175,6 +176,13 @@ enum mlxbf_gige_res { int mlxbf_gige_mdio_probe(struct platform_device *pdev, struct mlxbf_gige *priv); void mlxbf_gige_mdio_remove(struct mlxbf_gige *priv); + +void mlxbf_gige_enable_multicast_rx(struct mlxbf_gige *priv); +void mlxbf_gige_disable_multicast_rx(struct mlxbf_gige *priv); +void mlxbf_gige_enable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index); +void mlxbf_gige_disable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index); void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, unsigned int index, u64 dmac); void mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv, diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c index b157f0f1c5a88..385a56ac73481 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c @@ -168,6 +168,10 @@ static int mlxbf_gige_open(struct net_device *netdev) if (err) goto napi_deinit;
+ mlxbf_gige_enable_mac_rx_filter(priv, MLXBF_GIGE_BCAST_MAC_FILTER_IDX); + mlxbf_gige_enable_mac_rx_filter(priv, MLXBF_GIGE_LOCAL_MAC_FILTER_IDX); + mlxbf_gige_enable_multicast_rx(priv); + /* Set bits in INT_EN that we care about */ int_en = MLXBF_GIGE_INT_EN_HW_ACCESS_ERROR | MLXBF_GIGE_INT_EN_TX_CHECKSUM_INPUTS | @@ -379,6 +383,7 @@ static int mlxbf_gige_probe(struct platform_device *pdev) void __iomem *plu_base; void __iomem *base; int addr, phy_irq; + unsigned int i; int err;
base = devm_platform_ioremap_resource(pdev, MLXBF_GIGE_RES_MAC); @@ -423,6 +428,11 @@ static int mlxbf_gige_probe(struct platform_device *pdev) priv->rx_q_entries = MLXBF_GIGE_DEFAULT_RXQ_SZ; priv->tx_q_entries = MLXBF_GIGE_DEFAULT_TXQ_SZ;
+ for (i = 0; i <= MLXBF_GIGE_MAX_FILTER_IDX; i++) + mlxbf_gige_disable_mac_rx_filter(priv, i); + mlxbf_gige_disable_multicast_rx(priv); + mlxbf_gige_disable_promisc(priv); + /* Write initial MAC address to hardware */ mlxbf_gige_initial_mac(priv);
diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h index 98a8681c21b9c..4d14cb13fd64e 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h @@ -62,6 +62,8 @@ #define MLXBF_GIGE_TX_STATUS_DATA_FIFO_FULL BIT(1) #define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_START 0x0520 #define MLXBF_GIGE_RX_MAC_FILTER_DMAC_RANGE_END 0x0528 +#define MLXBF_GIGE_RX_MAC_FILTER_GENERAL 0x0530 +#define MLXBF_GIGE_RX_MAC_FILTER_EN_MULTICAST BIT(1) #define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC 0x0540 #define MLXBF_GIGE_RX_MAC_FILTER_COUNT_DISC_EN BIT(0) #define MLXBF_GIGE_RX_MAC_FILTER_COUNT_PASS 0x0548 diff --git a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c index 6999843584934..eb62620b63c7f 100644 --- a/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c +++ b/drivers/net/ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c @@ -11,15 +11,31 @@ #include "mlxbf_gige.h" #include "mlxbf_gige_regs.h"
-void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, - unsigned int index, u64 dmac) +void mlxbf_gige_enable_multicast_rx(struct mlxbf_gige *priv) { void __iomem *base = priv->base; - u64 control; + u64 data;
- /* Write destination MAC to specified MAC RX filter */ - writeq(dmac, base + MLXBF_GIGE_RX_MAC_FILTER + - (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE)); + data = readq(base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); + data |= MLXBF_GIGE_RX_MAC_FILTER_EN_MULTICAST; + writeq(data, base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); +} + +void mlxbf_gige_disable_multicast_rx(struct mlxbf_gige *priv) +{ + void __iomem *base = priv->base; + u64 data; + + data = readq(base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); + data &= ~MLXBF_GIGE_RX_MAC_FILTER_EN_MULTICAST; + writeq(data, base + MLXBF_GIGE_RX_MAC_FILTER_GENERAL); +} + +void mlxbf_gige_enable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index) +{ + void __iomem *base = priv->base; + u64 control;
/* Enable MAC receive filter mask for specified index */ control = readq(base + MLXBF_GIGE_CONTROL); @@ -27,6 +43,28 @@ void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, writeq(control, base + MLXBF_GIGE_CONTROL); }
+void mlxbf_gige_disable_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index) +{ + void __iomem *base = priv->base; + u64 control; + + /* Disable MAC receive filter mask for specified index */ + control = readq(base + MLXBF_GIGE_CONTROL); + control &= ~(MLXBF_GIGE_CONTROL_EN_SPECIFIC_MAC << index); + writeq(control, base + MLXBF_GIGE_CONTROL); +} + +void mlxbf_gige_set_mac_rx_filter(struct mlxbf_gige *priv, + unsigned int index, u64 dmac) +{ + void __iomem *base = priv->base; + + /* Write destination MAC to specified MAC RX filter */ + writeq(dmac, base + MLXBF_GIGE_RX_MAC_FILTER + + (index * MLXBF_GIGE_RX_MAC_FILTER_STRIDE)); +} + void mlxbf_gige_get_mac_rx_filter(struct mlxbf_gige *priv, unsigned int index, u64 *dmac) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eugene Syromiatnikov esyr@redhat.com
[ Upstream commit 655111b838cdabdb604f3625a9ff08c5eedb11da ]
ssn_offset field is u32 and is placed into the netlink response with nla_put_u32(), but only 2 bytes are reserved for the attribute payload in subflow_get_info_size() (even though it makes no difference in the end, as it is aligned up to 4 bytes). Supply the correct argument to the relevant nla_total_size() call to make it less confusing.
Fixes: 5147dfb50832 ("mptcp: allow dumping subflow context to userspace") Signed-off-by: Eugene Syromiatnikov esyr@redhat.com Reviewed-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240812065024.GA19719@asgard.redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mptcp/diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mptcp/diag.c b/net/mptcp/diag.c index 3ae46b545d2c2..2d3efb405437d 100644 --- a/net/mptcp/diag.c +++ b/net/mptcp/diag.c @@ -94,7 +94,7 @@ static size_t subflow_get_info_size(const struct sock *sk) nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_RELWRITE_SEQ */ nla_total_size_64bit(8) + /* MPTCP_SUBFLOW_ATTR_MAP_SEQ */ nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_MAP_SFSEQ */ - nla_total_size(2) + /* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */ + nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_SSN_OFFSET */ nla_total_size(2) + /* MPTCP_SUBFLOW_ATTR_MAP_DATALEN */ nla_total_size(4) + /* MPTCP_SUBFLOW_ATTR_FLAGS */ nla_total_size(1) + /* MPTCP_SUBFLOW_ATTR_ID_REM */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Subash Abhinov Kasiviswanathan quic_subashab@quicinc.com
[ Upstream commit a2cbb1603943281a604f5adc48079a148db5cb0d ]
This patch is based on the discussions between Neal Cardwell and Eric Dumazet in the link https://lore.kernel.org/netdev/20240726204105.1466841-1-quic_subashab@quicin...
It was correctly pointed out that tp->window_clamp would not be updated in cases where net.ipv4.tcp_moderate_rcvbuf=0 or if (copied <= tp->rcvq_space.space). While it is expected for most setups to leave the sysctl enabled, the latter condition may not end up hitting depending on the TCP receive queue size and the pattern of arriving data.
The updated check should be hit only on initial MSS update from TCP_MIN_MSS to measured MSS value and subsequently if there was an update to a larger value.
Fixes: 05f76b2d634e ("tcp: Adjust clamping window for applications specifying SO_RCVBUF") Signed-off-by: Sean Tranchetti quic_stranche@quicinc.com Signed-off-by: Subash Abhinov Kasiviswanathan quic_subashab@quicinc.com Acked-by: Neal Cardwell ncardwell@google.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_input.c | 28 ++++++++++++---------------- 1 file changed, 12 insertions(+), 16 deletions(-)
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ecd521108559f..2c52f6dcbd290 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -238,9 +238,14 @@ static void tcp_measure_rcv_mss(struct sock *sk, const struct sk_buff *skb) */ if (unlikely(len != icsk->icsk_ack.rcv_mss)) { u64 val = (u64)skb->len << TCP_RMEM_TO_WIN_SCALE; + u8 old_ratio = tcp_sk(sk)->scaling_ratio;
do_div(val, skb->truesize); tcp_sk(sk)->scaling_ratio = val ? val : 1; + + if (old_ratio != tcp_sk(sk)->scaling_ratio) + WRITE_ONCE(tcp_sk(sk)->window_clamp, + tcp_win_from_space(sk, sk->sk_rcvbuf)); } icsk->icsk_ack.rcv_mss = min_t(unsigned int, len, tcp_sk(sk)->advmss); @@ -754,7 +759,8 @@ void tcp_rcv_space_adjust(struct sock *sk) * <prev RTT . ><current RTT .. ><next RTT .... > */
- if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf)) { + if (READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_moderate_rcvbuf) && + !(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { u64 rcvwin, grow; int rcvbuf;
@@ -770,22 +776,12 @@ void tcp_rcv_space_adjust(struct sock *sk)
rcvbuf = min_t(u64, tcp_space_from_win(sk, rcvwin), READ_ONCE(sock_net(sk)->ipv4.sysctl_tcp_rmem[2])); - if (!(sk->sk_userlocks & SOCK_RCVBUF_LOCK)) { - if (rcvbuf > sk->sk_rcvbuf) { - WRITE_ONCE(sk->sk_rcvbuf, rcvbuf); - - /* Make the window clamp follow along. */ - WRITE_ONCE(tp->window_clamp, - tcp_win_from_space(sk, rcvbuf)); - } - } else { - /* Make the window clamp follow along while being bounded - * by SO_RCVBUF. - */ - int clamp = tcp_win_from_space(sk, min(rcvbuf, sk->sk_rcvbuf)); + if (rcvbuf > sk->sk_rcvbuf) { + WRITE_ONCE(sk->sk_rcvbuf, rcvbuf);
- if (clamp > tp->window_clamp) - WRITE_ONCE(tp->window_clamp, clamp); + /* Make the window clamp follow along. */ + WRITE_ONCE(tp->window_clamp, + tcp_win_from_space(sk, rcvbuf)); } } tp->rcvq_space.space = copied;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tom Hughes tom@compton.nu
[ Upstream commit 3cd740b985963f874a1a094f1969e998b9d05554 ]
Commit 264640fc2c5f4 ("ipv6: distinguish frag queues by device for multicast and link-local packets") modified the ipv6 fragment reassembly logic to distinguish frag queues by device for multicast and link-local packets but in fact only the main reassembly code limits the use of the device to those address types and the netfilter reassembly code uses the device for all packets.
This means that if fragments of a packet arrive on different interfaces then netfilter will fail to reassemble them and the fragments will be expired without going any further through the filters.
Fixes: 648700f76b03 ("inet: frags: use rhashtables for reassembly units") Signed-off-by: Tom Hughes tom@compton.nu Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/netfilter/nf_conntrack_reasm.c b/net/ipv6/netfilter/nf_conntrack_reasm.c index 5e1b50c6a44d2..3e9779ed7daec 100644 --- a/net/ipv6/netfilter/nf_conntrack_reasm.c +++ b/net/ipv6/netfilter/nf_conntrack_reasm.c @@ -154,6 +154,10 @@ static struct frag_queue *fq_find(struct net *net, __be32 id, u32 user, }; struct inet_frag_queue *q;
+ if (!(ipv6_addr_type(&hdr->daddr) & (IPV6_ADDR_MULTICAST | + IPV6_ADDR_LINKLOCAL))) + key.iif = 0; + q = inet_frag_find(nf_frag->fqdir, &key); if (!q) return NULL;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Donald Hunter donald.hunter@gmail.com
[ Upstream commit d1a7b382a9d3f0f3e5a80e0be2991c075fa4f618 ]
Add missing extack initialisation when ACKing BATCH_BEGIN and BATCH_END.
Fixes: bf2ac490d28c ("netfilter: nfnetlink: Handle ACK flags for batch messages") Signed-off-by: Donald Hunter donald.hunter@gmail.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nfnetlink.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nfnetlink.c b/net/netfilter/nfnetlink.c index 4abf660c7baff..932b3ddb34f13 100644 --- a/net/netfilter/nfnetlink.c +++ b/net/netfilter/nfnetlink.c @@ -427,8 +427,10 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh,
nfnl_unlock(subsys_id);
- if (nlh->nlmsg_flags & NLM_F_ACK) + if (nlh->nlmsg_flags & NLM_F_ACK) { + memset(&extack, 0, sizeof(extack)); nfnl_err_add(&err_list, nlh, 0, &extack); + }
while (skb->len >= nlmsg_total_size(0)) { int msglen, type; @@ -577,6 +579,7 @@ static void nfnetlink_rcv_batch(struct sk_buff *skb, struct nlmsghdr *nlh, ss->abort(net, oskb, NFNL_ABORT_NONE); netlink_ack(oskb, nlmsg_hdr(oskb), err, NULL); } else if (nlh->nlmsg_flags & NLM_F_ACK) { + memset(&extack, 0, sizeof(extack)); nfnl_err_add(&err_list, nlh, 0, &extack); } } else {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Donald Hunter donald.hunter@gmail.com
[ Upstream commit e9767137308daf906496613fd879808a07f006a2 ]
Fix missing initialisation of extack in flow offload.
Fixes: c29f74e0df7a ("netfilter: nf_flow_table: hardware offload support") Signed-off-by: Donald Hunter donald.hunter@gmail.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_offload.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c index a010b25076ca0..3d46372b538e5 100644 --- a/net/netfilter/nf_flow_table_offload.c +++ b/net/netfilter/nf_flow_table_offload.c @@ -841,8 +841,8 @@ static int nf_flow_offload_tuple(struct nf_flowtable *flowtable, struct list_head *block_cb_list) { struct flow_cls_offload cls_flow = {}; + struct netlink_ext_ack extack = {}; struct flow_block_cb *block_cb; - struct netlink_ext_ack extack; __be16 proto = ETH_P_ALL; int err, i = 0;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 7d8dc1c7be8d3509e8f5164dd5df64c8e34d7eeb ]
Conntrack assumes an unconfirmed entry (not yet committed to global hash table) has a refcount of 1 and is not visible to other cores.
With multicast forwarding this assumption breaks down because such skbs get cloned after being picked up, i.e. ct->use refcount is > 1.
Likewise, bridge netfilter will clone broad/mutlicast frames and all frames in case they need to be flood-forwarded during learning phase.
For ip multicast forwarding or plain bridge flood-forward this will "work" because packets don't leave softirq and are implicitly serialized.
With nfqueue this no longer holds true, the packets get queued and can be reinjected in arbitrary ways.
Disable this feature, I see no other solution.
After this patch, nfqueue cannot queue packets except the last multicast/broadcast packet.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/bridge/br_netfilter_hooks.c | 6 +++++- net/netfilter/nfnetlink_queue.c | 35 +++++++++++++++++++++++++++++++-- 2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/net/bridge/br_netfilter_hooks.c b/net/bridge/br_netfilter_hooks.c index bf30c50b56895..a9e1b56f854d4 100644 --- a/net/bridge/br_netfilter_hooks.c +++ b/net/bridge/br_netfilter_hooks.c @@ -619,8 +619,12 @@ static unsigned int br_nf_local_in(void *priv, if (likely(nf_ct_is_confirmed(ct))) return NF_ACCEPT;
+ if (WARN_ON_ONCE(refcount_read(&nfct->use) != 1)) { + nf_reset_ct(skb); + return NF_ACCEPT; + } + WARN_ON_ONCE(skb_shared(skb)); - WARN_ON_ONCE(refcount_read(&nfct->use) != 1);
/* We can't call nf_confirm here, it would create a dependency * on nf_conntrack module. diff --git a/net/netfilter/nfnetlink_queue.c b/net/netfilter/nfnetlink_queue.c index 55e28e1da66ec..e0716da256bf5 100644 --- a/net/netfilter/nfnetlink_queue.c +++ b/net/netfilter/nfnetlink_queue.c @@ -820,10 +820,41 @@ static bool nf_ct_drop_unconfirmed(const struct nf_queue_entry *entry) { #if IS_ENABLED(CONFIG_NF_CONNTRACK) static const unsigned long flags = IPS_CONFIRMED | IPS_DYING; - const struct nf_conn *ct = (void *)skb_nfct(entry->skb); + struct nf_conn *ct = (void *)skb_nfct(entry->skb); + unsigned long status; + unsigned int use;
- if (ct && ((ct->status & flags) == IPS_DYING)) + if (!ct) + return false; + + status = READ_ONCE(ct->status); + if ((status & flags) == IPS_DYING) return true; + + if (status & IPS_CONFIRMED) + return false; + + /* in some cases skb_clone() can occur after initial conntrack + * pickup, but conntrack assumes exclusive skb->_nfct ownership for + * unconfirmed entries. + * + * This happens for br_netfilter and with ip multicast routing. + * We can't be solved with serialization here because one clone could + * have been queued for local delivery. + */ + use = refcount_read(&ct->ct_general.use); + if (likely(use == 1)) + return false; + + /* Can't decrement further? Exclusive ownership. */ + if (!refcount_dec_not_one(&ct->ct_general.use)) + return false; + + skb_set_nfct(entry->skb, 0); + /* No nf_ct_put(): we already decremented .use and it cannot + * drop down to 0. + */ + return true; #endif return false; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit e0b6648b0446e59522819c75ba1dcb09e68d3e94 ]
In theory, dumpreset may fail and invalidate the preceeding log message. Fix this and use the occasion to prepare for object reset locking, which benefits from a few unrelated changes:
* Add an early call to nfnetlink_unicast if not resetting which effectively skips the audit logging but also unindents it. * Extract the table's name from the netlink attribute (which is verified via earlier table lookup) to not rely upon validity of the looked up table pointer. * Do not use local variable family, it will vanish.
Fixes: 8e6cf365e1d5 ("audit: log nftables configuration change events") Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 91cc3a81ba8f1..7ae055521cf36 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8012,6 +8012,7 @@ static int nf_tables_dump_obj_done(struct netlink_callback *cb) static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { + const struct nftables_pernet *nft_net = nft_pernet(info->net); struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_cur(info->net); u8 family = info->nfmsg->nfgen_family; @@ -8021,6 +8022,7 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, struct sk_buff *skb2; bool reset = false; u32 objtype; + char *buf; int err;
if (info->nlh->nlmsg_flags & NLM_F_DUMP) { @@ -8059,27 +8061,23 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) reset = true;
- if (reset) { - const struct nftables_pernet *nft_net; - char *buf; - - nft_net = nft_pernet(net); - buf = kasprintf(GFP_ATOMIC, "%s:%u", table->name, nft_net->base_seq); - - audit_log_nfcfg(buf, - family, - 1, - AUDIT_NFT_OP_OBJ_RESET, - GFP_ATOMIC); - kfree(buf); - } - err = nf_tables_fill_obj_info(skb2, net, NETLINK_CB(skb).portid, info->nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0, family, table, obj, reset); if (err < 0) goto err_fill_obj_info;
+ if (!reset) + return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); + + buf = kasprintf(GFP_ATOMIC, "%.*s:%u", + nla_len(nla[NFTA_OBJ_TABLE]), + (char *)nla_data(nla[NFTA_OBJ_TABLE]), + nft_net->base_seq); + audit_log_nfcfg(buf, info->nfmsg->nfgen_family, 1, + AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); + kfree(buf); + return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid);
err_fill_obj_info:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit 69fc3e9e90f1afc11f4015e6b75d18ab9acee348 ]
Outsource the reply skb preparation for non-dump getrule requests into a distinct function. Prep work for object reset locking.
Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: bd662c4218f9 ("netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 75 ++++++++++++++++++++--------------- 1 file changed, 44 insertions(+), 31 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 7ae055521cf36..2f32db2a09fec 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -8009,10 +8009,10 @@ static int nf_tables_dump_obj_done(struct netlink_callback *cb) }
/* called with rcu_read_lock held */ -static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, - const struct nlattr * const nla[]) +static struct sk_buff * +nf_tables_getobj_single(u32 portid, const struct nfnl_info *info, + const struct nlattr * const nla[], bool reset) { - const struct nftables_pernet *nft_net = nft_pernet(info->net); struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_cur(info->net); u8 family = info->nfmsg->nfgen_family; @@ -8020,52 +8020,69 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, struct net *net = info->net; struct nft_object *obj; struct sk_buff *skb2; - bool reset = false; u32 objtype; - char *buf; int err;
- if (info->nlh->nlmsg_flags & NLM_F_DUMP) { - struct netlink_dump_control c = { - .start = nf_tables_dump_obj_start, - .dump = nf_tables_dump_obj, - .done = nf_tables_dump_obj_done, - .module = THIS_MODULE, - .data = (void *)nla, - }; - - return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); - } - if (!nla[NFTA_OBJ_NAME] || !nla[NFTA_OBJ_TYPE]) - return -EINVAL; + return ERR_PTR(-EINVAL);
table = nft_table_lookup(net, nla[NFTA_OBJ_TABLE], family, genmask, 0); if (IS_ERR(table)) { NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_TABLE]); - return PTR_ERR(table); + return ERR_CAST(table); }
objtype = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE])); obj = nft_obj_lookup(net, table, nla[NFTA_OBJ_NAME], objtype, genmask); if (IS_ERR(obj)) { NL_SET_BAD_ATTR(extack, nla[NFTA_OBJ_NAME]); - return PTR_ERR(obj); + return ERR_CAST(obj); }
skb2 = alloc_skb(NLMSG_GOODSIZE, GFP_ATOMIC); if (!skb2) - return -ENOMEM; + return ERR_PTR(-ENOMEM); + + err = nf_tables_fill_obj_info(skb2, net, portid, + info->nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0, + family, table, obj, reset); + if (err < 0) { + kfree_skb(skb2); + return ERR_PTR(err); + } + + return skb2; +} + +static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, + const struct nlattr * const nla[]) +{ + struct nftables_pernet *nft_net = nft_pernet(info->net); + u32 portid = NETLINK_CB(skb).portid; + struct net *net = info->net; + struct sk_buff *skb2; + bool reset = false; + char *buf; + + if (info->nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .start = nf_tables_dump_obj_start, + .dump = nf_tables_dump_obj, + .done = nf_tables_dump_obj_done, + .module = THIS_MODULE, + .data = (void *)nla, + }; + + return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); + }
if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) reset = true;
- err = nf_tables_fill_obj_info(skb2, net, NETLINK_CB(skb).portid, - info->nlh->nlmsg_seq, NFT_MSG_NEWOBJ, 0, - family, table, obj, reset); - if (err < 0) - goto err_fill_obj_info; + skb2 = nf_tables_getobj_single(portid, info, nla, reset); + if (IS_ERR(skb2)) + return PTR_ERR(skb2);
if (!reset) return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); @@ -8078,11 +8095,7 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, AUDIT_NFT_OP_OBJ_RESET, GFP_ATOMIC); kfree(buf);
- return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); - -err_fill_obj_info: - kfree_skb(skb2); - return err; + return nfnetlink_unicast(skb2, net, portid); }
static void nft_obj_destroy(const struct nft_ctx *ctx, struct nft_object *obj)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Phil Sutter phil@nwl.cc
[ Upstream commit bd662c4218f9648e888bebde9468146965f3f8a0 ]
Objects' dump callbacks are not concurrency-safe per-se with reset bit set. If two CPUs perform a reset at the same time, at least counter and quota objects suffer from value underrun.
Prevent this by introducing dedicated locking callbacks for nfnetlink and the asynchronous dump handling to serialize access.
Fixes: 43da04a593d8 ("netfilter: nf_tables: atomic dump and reset for stateful objects") Signed-off-by: Phil Sutter phil@nwl.cc Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 72 ++++++++++++++++++++++++++++------- 1 file changed, 59 insertions(+), 13 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 2f32db2a09fec..41d7faeb101cf 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -7977,6 +7977,19 @@ static int nf_tables_dump_obj(struct sk_buff *skb, struct netlink_callback *cb) return skb->len; }
+static int nf_tables_dumpreset_obj(struct sk_buff *skb, + struct netlink_callback *cb) +{ + struct nftables_pernet *nft_net = nft_pernet(sock_net(skb->sk)); + int ret; + + mutex_lock(&nft_net->commit_mutex); + ret = nf_tables_dump_obj(skb, cb); + mutex_unlock(&nft_net->commit_mutex); + + return ret; +} + static int nf_tables_dump_obj_start(struct netlink_callback *cb) { struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; @@ -7993,12 +8006,18 @@ static int nf_tables_dump_obj_start(struct netlink_callback *cb) if (nla[NFTA_OBJ_TYPE]) ctx->type = ntohl(nla_get_be32(nla[NFTA_OBJ_TYPE]));
- if (NFNL_MSG_TYPE(cb->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) - ctx->reset = true; - return 0; }
+static int nf_tables_dumpreset_obj_start(struct netlink_callback *cb) +{ + struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; + + ctx->reset = true; + + return nf_tables_dump_obj_start(cb); +} + static int nf_tables_dump_obj_done(struct netlink_callback *cb) { struct nft_obj_dump_ctx *ctx = (void *)cb->ctx; @@ -8057,18 +8076,43 @@ nf_tables_getobj_single(u32 portid, const struct nfnl_info *info,
static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) +{ + u32 portid = NETLINK_CB(skb).portid; + struct sk_buff *skb2; + + if (info->nlh->nlmsg_flags & NLM_F_DUMP) { + struct netlink_dump_control c = { + .start = nf_tables_dump_obj_start, + .dump = nf_tables_dump_obj, + .done = nf_tables_dump_obj_done, + .module = THIS_MODULE, + .data = (void *)nla, + }; + + return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); + } + + skb2 = nf_tables_getobj_single(portid, info, nla, false); + if (IS_ERR(skb2)) + return PTR_ERR(skb2); + + return nfnetlink_unicast(skb2, info->net, portid); +} + +static int nf_tables_getobj_reset(struct sk_buff *skb, + const struct nfnl_info *info, + const struct nlattr * const nla[]) { struct nftables_pernet *nft_net = nft_pernet(info->net); u32 portid = NETLINK_CB(skb).portid; struct net *net = info->net; struct sk_buff *skb2; - bool reset = false; char *buf;
if (info->nlh->nlmsg_flags & NLM_F_DUMP) { struct netlink_dump_control c = { - .start = nf_tables_dump_obj_start, - .dump = nf_tables_dump_obj, + .start = nf_tables_dumpreset_obj_start, + .dump = nf_tables_dumpreset_obj, .done = nf_tables_dump_obj_done, .module = THIS_MODULE, .data = (void *)nla, @@ -8077,16 +8121,18 @@ static int nf_tables_getobj(struct sk_buff *skb, const struct nfnl_info *info, return nft_netlink_dump_start_rcu(info->sk, skb, info->nlh, &c); }
- if (NFNL_MSG_TYPE(info->nlh->nlmsg_type) == NFT_MSG_GETOBJ_RESET) - reset = true; + if (!try_module_get(THIS_MODULE)) + return -EINVAL; + rcu_read_unlock(); + mutex_lock(&nft_net->commit_mutex); + skb2 = nf_tables_getobj_single(portid, info, nla, true); + mutex_unlock(&nft_net->commit_mutex); + rcu_read_lock(); + module_put(THIS_MODULE);
- skb2 = nf_tables_getobj_single(portid, info, nla, reset); if (IS_ERR(skb2)) return PTR_ERR(skb2);
- if (!reset) - return nfnetlink_unicast(skb2, net, NETLINK_CB(skb).portid); - buf = kasprintf(GFP_ATOMIC, "%.*s:%u", nla_len(nla[NFTA_OBJ_TABLE]), (char *)nla_data(nla[NFTA_OBJ_TABLE]), @@ -9378,7 +9424,7 @@ static const struct nfnl_callback nf_tables_cb[NFT_MSG_MAX] = { .policy = nft_obj_policy, }, [NFT_MSG_GETOBJ_RESET] = { - .call = nf_tables_getobj, + .call = nf_tables_getobj_reset, .type = NFNL_CB_RCU, .attr_count = NFTA_OBJ_MAX, .policy = nft_obj_policy,
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Jain jain.abhinav177@gmail.com
[ Upstream commit 6c569b77f0300f8a9960277c7094fa0f128eb811 ]
Change expected_buf from (const void *) to (const char *) in function __recvpair(). This change fixes the below warnings during test compilation:
``` In file included from msg_oob.c:14: msg_oob.c: In function ‘__recvpair’:
../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’ msg_oob.c:235:17: note: in expansion of macro ‘TH_LOG’
../../kselftest_harness.h:106:40: warning: format ‘%s’ expects argument of type ‘char *’,but argument 6 has type ‘const void *’ [-Wformat=]
../../kselftest_harness.h:101:17: note: in expansion of macro ‘__TH_LOG’ msg_oob.c:259:25: note: in expansion of macro ‘TH_LOG’ ```
Fixes: d098d77232c3 ("selftest: af_unix: Add msg_oob.c.") Signed-off-by: Abhinav Jain jain.abhinav177@gmail.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://patch.msgid.link/20240814080743.1156166-1-jain.abhinav177@gmail.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/af_unix/msg_oob.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/af_unix/msg_oob.c b/tools/testing/selftests/net/af_unix/msg_oob.c index 16d0c172eaebe..535eb2c3d7d1c 100644 --- a/tools/testing/selftests/net/af_unix/msg_oob.c +++ b/tools/testing/selftests/net/af_unix/msg_oob.c @@ -209,7 +209,7 @@ static void __sendpair(struct __test_metadata *_metadata,
static void __recvpair(struct __test_metadata *_metadata, FIXTURE_DATA(msg_oob) *self, - const void *expected_buf, int expected_len, + const char *expected_buf, int expected_len, int buf_len, int flags) { int i, ret[2], recv_errno[2], expected_errno = 0;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cong Wang cong.wang@bytedance.com
[ Upstream commit 69139d2919dd4aa9a553c8245e7c63e82613e3fc ]
After a vsock socket has been added to a BPF sockmap, its prot->recvmsg has been replaced with vsock_bpf_recvmsg(). Thus the following recursiion could happen:
vsock_bpf_recvmsg() -> __vsock_recvmsg() -> vsock_connectible_recvmsg() -> prot->recvmsg() -> vsock_bpf_recvmsg() again
We need to fix it by calling the original ->recvmsg() without any BPF sockmap logic in __vsock_recvmsg().
Fixes: 634f1a7110b4 ("vsock: support sockmap") Reported-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Tested-by: syzbot+bdb4bd87b5e22058e2a4@syzkaller.appspotmail.com Cc: Bobby Eshleman bobby.eshleman@bytedance.com Cc: Michael S. Tsirkin mst@redhat.com Cc: Stefano Garzarella sgarzare@redhat.com Signed-off-by: Cong Wang cong.wang@bytedance.com Acked-by: Michael S. Tsirkin mst@redhat.com Link: https://patch.msgid.link/20240812022153.86512-1-xiyou.wangcong@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/af_vsock.h | 4 ++++ net/vmw_vsock/af_vsock.c | 50 +++++++++++++++++++++++---------------- net/vmw_vsock/vsock_bpf.c | 4 ++-- 3 files changed, 35 insertions(+), 23 deletions(-)
diff --git a/include/net/af_vsock.h b/include/net/af_vsock.h index 535701efc1e5c..24d970f7a4fa2 100644 --- a/include/net/af_vsock.h +++ b/include/net/af_vsock.h @@ -230,8 +230,12 @@ struct vsock_tap { int vsock_add_tap(struct vsock_tap *vt); int vsock_remove_tap(struct vsock_tap *vt); void vsock_deliver_tap(struct sk_buff *build_skb(void *opaque), void *opaque); +int __vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags); int vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags); +int __vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, + size_t len, int flags); int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags);
diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index 4b040285aa78c..0ff9b2dd86bac 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -1270,25 +1270,28 @@ static int vsock_dgram_connect(struct socket *sock, return err; }
+int __vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, + size_t len, int flags) +{ + struct sock *sk = sock->sk; + struct vsock_sock *vsk = vsock_sk(sk); + + return vsk->transport->dgram_dequeue(vsk, msg, len, flags); +} + int vsock_dgram_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, int flags) { #ifdef CONFIG_BPF_SYSCALL + struct sock *sk = sock->sk; const struct proto *prot; -#endif - struct vsock_sock *vsk; - struct sock *sk;
- sk = sock->sk; - vsk = vsock_sk(sk); - -#ifdef CONFIG_BPF_SYSCALL prot = READ_ONCE(sk->sk_prot); if (prot != &vsock_proto) return prot->recvmsg(sk, msg, len, flags, NULL); #endif
- return vsk->transport->dgram_dequeue(vsk, msg, len, flags); + return __vsock_dgram_recvmsg(sock, msg, len, flags); } EXPORT_SYMBOL_GPL(vsock_dgram_recvmsg);
@@ -2174,15 +2177,12 @@ static int __vsock_seqpacket_recvmsg(struct sock *sk, struct msghdr *msg, }
int -vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, - int flags) +__vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags) { struct sock *sk; struct vsock_sock *vsk; const struct vsock_transport *transport; -#ifdef CONFIG_BPF_SYSCALL - const struct proto *prot; -#endif int err;
sk = sock->sk; @@ -2233,14 +2233,6 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, goto out; }
-#ifdef CONFIG_BPF_SYSCALL - prot = READ_ONCE(sk->sk_prot); - if (prot != &vsock_proto) { - release_sock(sk); - return prot->recvmsg(sk, msg, len, flags, NULL); - } -#endif - if (sk->sk_type == SOCK_STREAM) err = __vsock_stream_recvmsg(sk, msg, len, flags); else @@ -2250,6 +2242,22 @@ vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, release_sock(sk); return err; } + +int +vsock_connectible_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, + int flags) +{ +#ifdef CONFIG_BPF_SYSCALL + struct sock *sk = sock->sk; + const struct proto *prot; + + prot = READ_ONCE(sk->sk_prot); + if (prot != &vsock_proto) + return prot->recvmsg(sk, msg, len, flags, NULL); +#endif + + return __vsock_connectible_recvmsg(sock, msg, len, flags); +} EXPORT_SYMBOL_GPL(vsock_connectible_recvmsg);
static int vsock_set_rcvlowat(struct sock *sk, int val) diff --git a/net/vmw_vsock/vsock_bpf.c b/net/vmw_vsock/vsock_bpf.c index a3c97546ab84a..c42c5cc18f324 100644 --- a/net/vmw_vsock/vsock_bpf.c +++ b/net/vmw_vsock/vsock_bpf.c @@ -64,9 +64,9 @@ static int __vsock_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int int err;
if (sk->sk_type == SOCK_STREAM || sk->sk_type == SOCK_SEQPACKET) - err = vsock_connectible_recvmsg(sock, msg, len, flags); + err = __vsock_connectible_recvmsg(sock, msg, len, flags); else if (sk->sk_type == SOCK_DGRAM) - err = vsock_dgram_recvmsg(sock, msg, len, flags); + err = __vsock_dgram_recvmsg(sock, msg, len, flags); else err = -EPROTOTYPE;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
[ Upstream commit 7e0620bc6a5ec6b340a0be40054f294ca26c010f ]
No need to disable errexit temporary, simply ignore the only possible and not handled error.
Reviewed-by: Geliang Tang geliang@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://lore.kernel.org/r/20240607-upstream-net-next-20240607-selftests-mptc... Signed-off-by: Jakub Kicinski kuba@kernel.org Stable-dep-of: 7965a7f32a53 ("selftests: net: lib: kill PIDs before del netns") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/lib.sh | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index 9155c914c064f..b2572aff6286f 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -128,25 +128,17 @@ slowwait_for_counter() cleanup_ns() { local ns="" - local errexit=0 local ret=0
- # disable errexit temporary - if [[ $- =~ "e" ]]; then - errexit=1 - set +e - fi - for ns in "$@"; do [ -z "${ns}" ] && continue - ip netns delete "${ns}" &> /dev/null + ip netns delete "${ns}" &> /dev/null || true if ! busywait $BUSYWAIT_TIMEOUT ip netns list | grep -vq "^$ns$" &> /dev/null; then echo "Warn: Failed to remove namespace $ns" ret=1 fi done
- [ $errexit -eq 1 ] && set -e return $ret }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
[ Upstream commit 7965a7f32a53d9ad807ce2c53bdda69ba104974f ]
When deleting netns, it is possible to still have some tasks running, e.g. background tasks like tcpdump running in the background, not stopped because the test has been interrupted.
Before deleting the netns, it is then safer to kill all attached PIDs, if any. That should reduce some noises after the end of some tests, and help with the debugging of some issues. That's why this modification is seen as a "fix".
Fixes: 25ae948b4478 ("selftests/net: add lib.sh") Acked-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Acked-by: Florian Westphal fw@strlen.de Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://patch.msgid.link/20240813-upstream-net-20240813-selftests-net-lib-ki... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/lib.sh | 1 + 1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/net/lib.sh b/tools/testing/selftests/net/lib.sh index b2572aff6286f..93de05fedd91a 100644 --- a/tools/testing/selftests/net/lib.sh +++ b/tools/testing/selftests/net/lib.sh @@ -132,6 +132,7 @@ cleanup_ns()
for ns in "$@"; do [ -z "${ns}" ] && continue + ip netns pids "${ns}" 2> /dev/null | xargs -r kill || true ip netns delete "${ns}" &> /dev/null || true if ! busywait $BUSYWAIT_TIMEOUT ip netns list | grep -vq "^$ns$" &> /dev/null; then echo "Warn: Failed to remove namespace $ns"
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit 8445d9d3c03101859663d34fda747f6a50947556 ]
Currently, if hns3 PF or VF FLR reset failed after five times retry, the reset done process will directly release the semaphore which has already released in hclge_reset_prepare_general. This will cause down operation fail.
So this patch fixes it by adding reset state judgement. The up operation is only called after successful PF FLR reset.
Fixes: 8627bdedc435 ("net: hns3: refactor the precedure of PF FLR") Fixes: f28368bb4542 ("net: hns3: refactor the procedure of VF FLR") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 4 ++-- drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 82574ce0194fb..125e04434611d 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -11516,8 +11516,8 @@ static void hclge_reset_done(struct hnae3_ae_dev *ae_dev) dev_err(&hdev->pdev->dev, "fail to rebuild, ret=%d\n", ret);
hdev->reset_type = HNAE3_NONE_RESET; - clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state); - up(&hdev->reset_sem); + if (test_and_clear_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) + up(&hdev->reset_sem); }
static void hclge_clear_resetting_state(struct hclge_dev *hdev) diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index 3735d2fed11f7..094a7c7b55921 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -1747,8 +1747,8 @@ static void hclgevf_reset_done(struct hnae3_ae_dev *ae_dev) ret);
hdev->reset_type = HNAE3_NONE_RESET; - clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state); - up(&hdev->reset_sem); + if (test_and_clear_bit(HCLGEVF_STATE_RST_HANDLING, &hdev->state)) + up(&hdev->reset_sem); }
static u32 hclgevf_get_fw_version(struct hnae3_handle *handle)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peiyang Wang wangpeiyang1@huawei.com
[ Upstream commit 30545e17eac1f50c5ef49644daf6af205100a965 ]
Consider the followed case that the user change speed and reset the net interface. Before the hw change speed successfully, the driver get old old speed from hw by timer task. After reset, the previous speed is config to hw. As a result, the new speed is configed successfully but lost after PF reset. The followed pictured shows more dirrectly.
+------+ +----+ +----+ | USER | | PF | | HW | +---+--+ +-+--+ +-+--+ | ethtool -s 100G | | +------------------>| set speed 100G | | +--------------------->| | | set successfully | | |<---------------------+---+ | |query cfg (timer task)| | | +--------------------->| | handle speed | | return 200G | | changing event | ethtool --reset |<---------------------+ | (100G) +------------------>| cfg previous speed |<--+ | | after reset (200G) | | +--------------------->| | | +---+ | |query cfg (timer task)| | | +--------------------->| | handle speed | | return 100G | | changing event | |<---------------------+ | (200G) | | |<--+ | |query cfg (timer task)| | +--------------------->| | | return 200G | | |<---------------------+ | | | v v v
This patch save new speed if hw change speed successfully, which will be used after reset successfully.
Fixes: 2d03eacc0b7e ("net: hns3: Only update mac configuation when necessary") Signed-off-by: Peiyang Wang wangpeiyang1@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../hisilicon/hns3/hns3pf/hclge_main.c | 24 ++++++++++++++----- .../hisilicon/hns3/hns3pf/hclge_mdio.c | 3 +++ 2 files changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 125e04434611d..465f0d5822837 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -2653,8 +2653,17 @@ static int hclge_cfg_mac_speed_dup_h(struct hnae3_handle *handle, int speed, { struct hclge_vport *vport = hclge_get_vport(handle); struct hclge_dev *hdev = vport->back; + int ret; + + ret = hclge_cfg_mac_speed_dup(hdev, speed, duplex, lane_num);
- return hclge_cfg_mac_speed_dup(hdev, speed, duplex, lane_num); + if (ret) + return ret; + + hdev->hw.mac.req_speed = speed; + hdev->hw.mac.req_duplex = duplex; + + return 0; }
static int hclge_set_autoneg_en(struct hclge_dev *hdev, bool enable) @@ -2956,17 +2965,20 @@ static int hclge_mac_init(struct hclge_dev *hdev) if (!test_bit(HCLGE_STATE_RST_HANDLING, &hdev->state)) hdev->hw.mac.duplex = HCLGE_MAC_FULL;
- ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.speed, - hdev->hw.mac.duplex, hdev->hw.mac.lane_num); - if (ret) - return ret; - if (hdev->hw.mac.support_autoneg) { ret = hclge_set_autoneg_en(hdev, hdev->hw.mac.autoneg); if (ret) return ret; }
+ if (!hdev->hw.mac.autoneg) { + ret = hclge_cfg_mac_speed_dup_hw(hdev, hdev->hw.mac.req_speed, + hdev->hw.mac.req_duplex, + hdev->hw.mac.lane_num); + if (ret) + return ret; + } + mac->link = 0;
if (mac->user_fec_mode & BIT(HNAE3_FEC_USER_DEF)) { diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c index 85fb11de43a12..80079657afebe 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c @@ -191,6 +191,9 @@ static void hclge_mac_adjust_link(struct net_device *netdev) if (ret) netdev_err(netdev, "failed to adjust link.\n");
+ hdev->hw.mac.req_speed = (u32)speed; + hdev->hw.mac.req_duplex = (u8)duplex; + ret = hclge_cfg_flowctrl(hdev); if (ret) netdev_err(netdev, "failed to configure flow control.\n");
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jie Wang wangjie125@huawei.com
[ Upstream commit be5e816d00a506719e9dbb1a9c861c5ced30a109 ]
When config TC during the reset process, may cause a deadlock, the flow is as below: pf reset start │ ▼ ...... setup tc │ │ ▼ ▼ DOWN: napi_disable() napi_disable()(skip) │ │ │ ▼ ▼ ...... ...... │ │ ▼ │ napi_enable() │ ▼ UINIT: netif_napi_del() │ ▼ ...... │ ▼ INIT: netif_napi_add() │ ▼ ...... global reset start │ │ ▼ ▼ UP: napi_enable()(skip) ...... │ │ ▼ ▼ ...... napi_disable()
In reset process, the driver will DOWN the port and then UINIT, in this case, the setup tc process will UP the port before UINIT, so cause the problem. Adds a DOWN process in UINIT to fix it.
Fixes: bb6b94a896d4 ("net: hns3: Add reset interface implementation in client") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Jijie Shao shaojijie@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index a5fc0209d628e..4cbc4d069a1f3 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -5724,6 +5724,9 @@ static int hns3_reset_notify_uninit_enet(struct hnae3_handle *handle) struct net_device *netdev = handle->kinfo.netdev; struct hns3_nic_priv *priv = netdev_priv(netdev);
+ if (!test_bit(HNS3_NIC_STATE_DOWN, &priv->state)) + hns3_nic_net_stop(netdev); + if (!test_and_clear_bit(HNS3_NIC_STATE_INITED, &priv->state)) { netdev_warn(netdev, "already uninitialized\n"); return 0;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit ddf41329839f49dadf26973cd845ea160ac1784d ]
Clean up the variables in scripts/link-vmlinux.sh
- Specify the extra objects directly in vmlinux_link() - Move the AS rule to kallsyms() - Set kallsymso and btf_vmlinux_bin_o where they are generated - Remove unneeded variable, kallsymso_prev - Introduce the btf_data variable - Introduce the strip_debug flag instead of checking the output name
No functional change intended.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Nicolas Schier nicolas@fjasle.eu Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols") Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/link-vmlinux.sh | 65 +++++++++++++++++++++-------------------- 1 file changed, 34 insertions(+), 31 deletions(-)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 518c70b8db507..3d9d7257143a0 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -45,7 +45,6 @@ info()
# Link of vmlinux # ${1} - output file -# ${2}, ${3}, ... - optional extra .o files vmlinux_link() { local output=${1} @@ -90,7 +89,7 @@ vmlinux_link() ldflags="${ldflags} ${wl}--script=${objtree}/${KBUILD_LDS}"
# The kallsyms linking does not need debug symbols included. - if [ "$output" != "${output#.tmp_vmlinux.kallsyms}" ] ; then + if [ -n "${strip_debug}" ] ; then ldflags="${ldflags} ${wl}--strip-debug" fi
@@ -101,7 +100,7 @@ vmlinux_link() ${ld} ${ldflags} -o ${output} \ ${wl}--whole-archive ${objs} ${wl}--no-whole-archive \ ${wl}--start-group ${libs} ${wl}--end-group \ - $@ ${ldlibs} + ${kallsymso} ${btf_vmlinux_bin_o} ${ldlibs} }
# generate .BTF typeinfo from DWARF debuginfo @@ -110,6 +109,7 @@ vmlinux_link() gen_btf() { local pahole_ver + local btf_data=${2}
if ! [ -x "$(command -v ${PAHOLE})" ]; then echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available" @@ -124,16 +124,16 @@ gen_btf()
vmlinux_link ${1}
- info "BTF" ${2} + info BTF "${btf_data}" LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
- # Create ${2} which contains just .BTF section but no symbols. Add + # Create ${btf_data} which contains just .BTF section but no symbols. Add # SHF_ALLOC because .BTF will be part of the vmlinux image. --strip-all # deletes all symbols including __start_BTF and __stop_BTF, which will # be redefined in the linker script. Add 2>/dev/null to suppress GNU # objcopy warnings: "empty loadable segment detected at ..." ${OBJCOPY} --only-section=.BTF --set-section-flags .BTF=alloc,readonly \ - --strip-all ${1} ${2} 2>/dev/null + --strip-all ${1} "${btf_data}" 2>/dev/null # Change e_type to ET_REL so that it can be used to link final vmlinux. # GNU ld 2.35+ and lld do not allow an ET_EXEC input. if is_enabled CONFIG_CPU_BIG_ENDIAN; then @@ -141,10 +141,12 @@ gen_btf() else et_rel='\1\0' fi - printf "${et_rel}" | dd of=${2} conv=notrunc bs=1 seek=16 status=none + printf "${et_rel}" | dd of="${btf_data}" conv=notrunc bs=1 seek=16 status=none + + btf_vmlinux_bin_o=${btf_data} }
-# Create ${2} .S file with all symbols from the ${1} object file +# Create ${2}.o file with all symbols from the ${1} object file kallsyms() { local kallsymopt; @@ -165,27 +167,25 @@ kallsyms() kallsymopt="${kallsymopt} --lto-clang" fi
- info KSYMS ${2} - scripts/kallsyms ${kallsymopt} ${1} > ${2} + info KSYMS "${2}.S" + scripts/kallsyms ${kallsymopt} "${1}" > "${2}.S" + + info AS "${2}.o" + ${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \ + ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} -c -o "${2}.o" "${2}.S" + + kallsymso=${2}.o }
# Perform one step in kallsyms generation, including temporary linking of # vmlinux. kallsyms_step() { - kallsymso_prev=${kallsymso} kallsyms_vmlinux=.tmp_vmlinux.kallsyms${1} - kallsymso=${kallsyms_vmlinux}.o - kallsyms_S=${kallsyms_vmlinux}.S - - vmlinux_link ${kallsyms_vmlinux} "${kallsymso_prev}" ${btf_vmlinux_bin_o} - mksysmap ${kallsyms_vmlinux} ${kallsyms_vmlinux}.syms - kallsyms ${kallsyms_vmlinux}.syms ${kallsyms_S}
- info AS ${kallsymso} - ${CC} ${NOSTDINC_FLAGS} ${LINUXINCLUDE} ${KBUILD_CPPFLAGS} \ - ${KBUILD_AFLAGS} ${KBUILD_AFLAGS_KERNEL} \ - -c -o ${kallsymso} ${kallsyms_S} + vmlinux_link "${kallsyms_vmlinux}" + mksysmap "${kallsyms_vmlinux}" "${kallsyms_vmlinux}.syms" + kallsyms "${kallsyms_vmlinux}.syms" "${kallsyms_vmlinux}" }
# Create map file with all symbols from ${1} @@ -223,19 +223,18 @@ fi
${MAKE} -f "${srctree}/scripts/Makefile.build" obj=init init/version-timestamp.o
-btf_vmlinux_bin_o="" +btf_vmlinux_bin_o= +kallsymso= +strip_debug= + if is_enabled CONFIG_DEBUG_INFO_BTF; then - btf_vmlinux_bin_o=.btf.vmlinux.bin.o - if ! gen_btf .tmp_vmlinux.btf $btf_vmlinux_bin_o ; then + if ! gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then echo >&2 "Failed to generate BTF for vmlinux" echo >&2 "Try to disable CONFIG_DEBUG_INFO_BTF" exit 1 fi fi
-kallsymso="" -kallsymso_prev="" -kallsyms_vmlinux="" if is_enabled CONFIG_KALLSYMS; then
# kallsyms support @@ -261,11 +260,13 @@ if is_enabled CONFIG_KALLSYMS; then # a) Verify that the System.map from vmlinux matches the map from # ${kallsymso}.
+ # The kallsyms linking does not need debug symbols included. + strip_debug=1 + kallsyms_step 1 - kallsyms_step 2 + size1=$(${CONFIG_SHELL} "${srctree}/scripts/file-size.sh" ${kallsymso})
- # step 3 - size1=$(${CONFIG_SHELL} "${srctree}/scripts/file-size.sh" ${kallsymso_prev}) + kallsyms_step 2 size2=$(${CONFIG_SHELL} "${srctree}/scripts/file-size.sh" ${kallsymso})
if [ $size1 -ne $size2 ] || [ -n "${KALLSYMS_EXTRA_PASS}" ]; then @@ -273,7 +274,9 @@ if is_enabled CONFIG_KALLSYMS; then fi fi
-vmlinux_link vmlinux "${kallsymso}" ${btf_vmlinux_bin_o} +strip_debug= + +vmlinux_link vmlinux
# fill in BTF IDs if is_enabled CONFIG_DEBUG_INFO_BTF && is_enabled CONFIG_BPF; then
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit c442db3f49f27e5a60a641b2ac9a3c6320796ed6 ]
This reimplements commit 951bcae6c5a0 ("kallsyms: Avoid weak references for kallsyms symbols") because I am not a big fan of PROVIDE().
As an alternative solution, this commit prepends one more kallsyms step.
KSYMS .tmp_vmlinux.kallsyms0.S # added AS .tmp_vmlinux.kallsyms0.o # added LD .tmp_vmlinux.btf BTF .btf.vmlinux.bin.o LD .tmp_vmlinux.kallsyms1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.o LD .tmp_vmlinux.kallsyms2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.o LD vmlinux
Step 0 takes /dev/null as input, and generates .tmp_vmlinux.kallsyms0.o, which has a valid kallsyms format with the empty symbol list, and can be linked to vmlinux. Since it is really small, the added compile-time cost is negligible.
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Acked-by: Ard Biesheuvel ardb@kernel.org Reviewed-by: Nicolas Schier nicolas@fjasle.eu Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols") Signed-off-by: Sasha Levin sashal@kernel.org --- include/asm-generic/vmlinux.lds.h | 19 ------------------- kernel/kallsyms_internal.h | 5 ----- scripts/kallsyms.c | 6 ------ scripts/link-vmlinux.sh | 9 +++++++-- 4 files changed, 7 insertions(+), 32 deletions(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h index 70bf1004076b2..f00a8e18f389f 100644 --- a/include/asm-generic/vmlinux.lds.h +++ b/include/asm-generic/vmlinux.lds.h @@ -451,30 +451,11 @@ #endif #endif
-/* - * Some symbol definitions will not exist yet during the first pass of the - * link, but are guaranteed to exist in the final link. Provide preliminary - * definitions that will be superseded in the final link to avoid having to - * rely on weak external linkage, which requires a GOT when used in position - * independent code. - */ -#define PRELIMINARY_SYMBOL_DEFINITIONS \ - PROVIDE(kallsyms_addresses = .); \ - PROVIDE(kallsyms_offsets = .); \ - PROVIDE(kallsyms_names = .); \ - PROVIDE(kallsyms_num_syms = .); \ - PROVIDE(kallsyms_relative_base = .); \ - PROVIDE(kallsyms_token_table = .); \ - PROVIDE(kallsyms_token_index = .); \ - PROVIDE(kallsyms_markers = .); \ - PROVIDE(kallsyms_seqs_of_names = .); - /* * Read only Data */ #define RO_DATA(align) \ . = ALIGN((align)); \ - PRELIMINARY_SYMBOL_DEFINITIONS \ .rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \ __start_rodata = .; \ *(.rodata) *(.rodata.*) \ diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h index 85480274fc8fb..925f2ab22639a 100644 --- a/kernel/kallsyms_internal.h +++ b/kernel/kallsyms_internal.h @@ -4,11 +4,6 @@
#include <linux/types.h>
-/* - * These will be re-linked against their real values during the second link - * stage. Preliminary values must be provided in the linker script using the - * PROVIDE() directive so that the first link stage can complete successfully. - */ extern const unsigned long kallsyms_addresses[]; extern const int kallsyms_offsets[]; extern const u8 kallsyms_names[]; diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index 47978efe4797c..fa53b5eef5530 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c @@ -259,12 +259,6 @@ static void shrink_table(void) } } table_cnt = pos; - - /* When valid symbol is not registered, exit to error */ - if (!table_cnt) { - fprintf(stderr, "No valid symbol.\n"); - exit(1); - } }
static void read_map(const char *in) diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 3d9d7257143a0..83d605ba7241a 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -227,6 +227,10 @@ btf_vmlinux_bin_o= kallsymso= strip_debug=
+if is_enabled CONFIG_KALLSYMS; then + kallsyms /dev/null .tmp_vmlinux.kallsyms0 +fi + if is_enabled CONFIG_DEBUG_INFO_BTF; then if ! gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then echo >&2 "Failed to generate BTF for vmlinux" @@ -239,9 +243,10 @@ if is_enabled CONFIG_KALLSYMS; then
# kallsyms support # Generate section listing all symbols and add it into vmlinux - # It's a three step process: + # It's a four step process: + # 0) Generate a dummy __kallsyms with empty symbol list. # 1) Link .tmp_vmlinux.kallsyms1 so it has all symbols and sections, - # but __kallsyms is empty. + # with a dummy __kallsyms. # Running kallsyms on that gives us .tmp_kallsyms1.o with # the right size # 2) Link .tmp_vmlinux.kallsyms2 so it now has a __kallsyms section of
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jann Horn jannh@google.com
[ Upstream commit 64e166099b69bfc09f667253358a15160b86ea43 ]
Commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture") removed the last use of the absolute kallsyms.
Signed-off-by: Jann Horn jannh@google.com Acked-by: Arnd Bergmann arnd@arndb.de Link: https://lore.kernel.org/all/20240221202655.2423854-1-jannh@google.com/ [masahiroy@kernel.org: rebase the code and reword the commit description] Signed-off-by: Masahiro Yamada masahiroy@kernel.org Stable-dep-of: 020925ce9299 ("kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols") Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 18 ------- kernel/kallsyms.c | 5 +- kernel/kallsyms_internal.h | 1 - kernel/vmcore_info.c | 4 -- scripts/kallsyms.c | 78 ++++++++++++----------------- scripts/link-vmlinux.sh | 4 -- tools/perf/tests/vmlinux-kallsyms.c | 1 - 7 files changed, 33 insertions(+), 78 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index d8a971b804d32..6e97693b675f2 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1789,24 +1789,6 @@ config KALLSYMS_ABSOLUTE_PERCPU depends on KALLSYMS default X86_64 && SMP
-config KALLSYMS_BASE_RELATIVE - bool - depends on KALLSYMS - default y - help - Instead of emitting them as absolute values in the native word size, - emit the symbol references in the kallsyms table as 32-bit entries, - each containing a relative value in the range [base, base + U32_MAX] - or, when KALLSYMS_ABSOLUTE_PERCPU is in effect, each containing either - an absolute value in the range [0, S32_MAX] or a relative value in the - range [base, base + S32_MAX], where base is the lowest relative symbol - address encountered in the image. - - On 64-bit builds, this reduces the size of the address table by 50%, - but more importantly, it results in entries whose values are build - time constants, and no relocation pass is required at runtime to fix - up the entries based on the runtime load address of the kernel. - # end of the "standard kernel features (expert users)" menu
config ARCH_HAS_MEMBARRIER_CALLBACKS diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index 98b9622d372e4..fb2c77368d187 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -148,9 +148,6 @@ static unsigned int get_symbol_offset(unsigned long pos)
unsigned long kallsyms_sym_address(int idx) { - if (!IS_ENABLED(CONFIG_KALLSYMS_BASE_RELATIVE)) - return kallsyms_addresses[idx]; - /* values are unsigned offsets if --absolute-percpu is not in effect */ if (!IS_ENABLED(CONFIG_KALLSYMS_ABSOLUTE_PERCPU)) return kallsyms_relative_base + (u32)kallsyms_offsets[idx]; @@ -325,7 +322,7 @@ static unsigned long get_symbol_pos(unsigned long addr, unsigned long symbol_start = 0, symbol_end = 0; unsigned long i, low, high, mid;
- /* Do a binary search on the sorted kallsyms_addresses array. */ + /* Do a binary search on the sorted kallsyms_offsets array. */ low = 0; high = kallsyms_num_syms;
diff --git a/kernel/kallsyms_internal.h b/kernel/kallsyms_internal.h index 925f2ab22639a..9633782f82500 100644 --- a/kernel/kallsyms_internal.h +++ b/kernel/kallsyms_internal.h @@ -4,7 +4,6 @@
#include <linux/types.h>
-extern const unsigned long kallsyms_addresses[]; extern const int kallsyms_offsets[]; extern const u8 kallsyms_names[];
diff --git a/kernel/vmcore_info.c b/kernel/vmcore_info.c index 1d5eadd9dd61c..8b4f8cc2e0ec0 100644 --- a/kernel/vmcore_info.c +++ b/kernel/vmcore_info.c @@ -216,12 +216,8 @@ static int __init crash_save_vmcoreinfo_init(void) VMCOREINFO_SYMBOL(kallsyms_num_syms); VMCOREINFO_SYMBOL(kallsyms_token_table); VMCOREINFO_SYMBOL(kallsyms_token_index); -#ifdef CONFIG_KALLSYMS_BASE_RELATIVE VMCOREINFO_SYMBOL(kallsyms_offsets); VMCOREINFO_SYMBOL(kallsyms_relative_base); -#else - VMCOREINFO_SYMBOL(kallsyms_addresses); -#endif /* CONFIG_KALLSYMS_BASE_RELATIVE */ #endif /* CONFIG_KALLSYMS */
arch_crash_save_vmcoreinfo(); diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index fa53b5eef5530..55a423519f2e5 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c @@ -6,7 +6,7 @@ * of the GNU General Public License, incorporated herein by reference. * * Usage: kallsyms [--all-symbols] [--absolute-percpu] - * [--base-relative] [--lto-clang] in.map > out.S + * [--lto-clang] in.map > out.S * * Table compression uses all the unused char codes on the symbols and * maps these to the most used substrings (tokens). For instance, it might @@ -63,7 +63,6 @@ static struct sym_entry **table; static unsigned int table_size, table_cnt; static int all_symbols; static int absolute_percpu; -static int base_relative; static int lto_clang;
static int token_profit[0x10000]; @@ -76,7 +75,7 @@ static unsigned char best_table_len[256]; static void usage(void) { fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] " - "[--base-relative] [--lto-clang] in.map > out.S\n"); + "[--lto-clang] in.map > out.S\n"); exit(1); }
@@ -491,54 +490,43 @@ static void write_src(void) printf("\t.short\t%d\n", best_idx[i]); printf("\n");
- if (!base_relative) - output_label("kallsyms_addresses"); - else - output_label("kallsyms_offsets"); + output_label("kallsyms_offsets");
for (i = 0; i < table_cnt; i++) { - if (base_relative) { - /* - * Use the offset relative to the lowest value - * encountered of all relative symbols, and emit - * non-relocatable fixed offsets that will be fixed - * up at runtime. - */ + /* + * Use the offset relative to the lowest value + * encountered of all relative symbols, and emit + * non-relocatable fixed offsets that will be fixed + * up at runtime. + */
- long long offset; - int overflow; - - if (!absolute_percpu) { - offset = table[i]->addr - relative_base; - overflow = (offset < 0 || offset > UINT_MAX); - } else if (symbol_absolute(table[i])) { - offset = table[i]->addr; - overflow = (offset < 0 || offset > INT_MAX); - } else { - offset = relative_base - table[i]->addr - 1; - overflow = (offset < INT_MIN || offset >= 0); - } - if (overflow) { - fprintf(stderr, "kallsyms failure: " - "%s symbol value %#llx out of range in relative mode\n", - symbol_absolute(table[i]) ? "absolute" : "relative", - table[i]->addr); - exit(EXIT_FAILURE); - } - printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym); - } else if (!symbol_absolute(table[i])) { - output_address(table[i]->addr); + long long offset; + int overflow; + + if (!absolute_percpu) { + offset = table[i]->addr - relative_base; + overflow = (offset < 0 || offset > UINT_MAX); + } else if (symbol_absolute(table[i])) { + offset = table[i]->addr; + overflow = (offset < 0 || offset > INT_MAX); } else { - printf("\tPTR\t%#llx\n", table[i]->addr); + offset = relative_base - table[i]->addr - 1; + overflow = (offset < INT_MIN || offset >= 0); + } + if (overflow) { + fprintf(stderr, "kallsyms failure: " + "%s symbol value %#llx out of range in relative mode\n", + symbol_absolute(table[i]) ? "absolute" : "relative", + table[i]->addr); + exit(EXIT_FAILURE); } + printf("\t.long\t%#x /* %s */\n", (int)offset, table[i]->sym); } printf("\n");
- if (base_relative) { - output_label("kallsyms_relative_base"); - output_address(relative_base); - printf("\n"); - } + output_label("kallsyms_relative_base"); + output_address(relative_base); + printf("\n");
if (lto_clang) for (i = 0; i < table_cnt; i++) @@ -820,7 +808,6 @@ int main(int argc, char **argv) static const struct option long_options[] = { {"all-symbols", no_argument, &all_symbols, 1}, {"absolute-percpu", no_argument, &absolute_percpu, 1}, - {"base-relative", no_argument, &base_relative, 1}, {"lto-clang", no_argument, <o_clang, 1}, {}, }; @@ -841,8 +828,7 @@ int main(int argc, char **argv) if (absolute_percpu) make_percpus_absolute(); sort_symbols(); - if (base_relative) - record_relative_base(); + record_relative_base(); optimize_token_table(); write_src();
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 83d605ba7241a..31581504489ef 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -159,10 +159,6 @@ kallsyms() kallsymopt="${kallsymopt} --absolute-percpu" fi
- if is_enabled CONFIG_KALLSYMS_BASE_RELATIVE; then - kallsymopt="${kallsymopt} --base-relative" - fi - if is_enabled CONFIG_LTO_CLANG; then kallsymopt="${kallsymopt} --lto-clang" fi diff --git a/tools/perf/tests/vmlinux-kallsyms.c b/tools/perf/tests/vmlinux-kallsyms.c index e30fd55f8e51d..cd3b480d20bd6 100644 --- a/tools/perf/tests/vmlinux-kallsyms.c +++ b/tools/perf/tests/vmlinux-kallsyms.c @@ -26,7 +26,6 @@ static bool is_ignored_symbol(const char *name, char type) * when --all-symbols is specified so exclude them to get a * stable symbol list. */ - "kallsyms_addresses", "kallsyms_offsets", "kallsyms_relative_base", "kallsyms_num_syms",
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Song Liu song@kernel.org
[ Upstream commit 020925ce92990c3bf59ab2cde386ac6d9ec734ff ]
Cleaning up the symbols causes various issues afterwards. Let's sort the list based on original name.
Signed-off-by: Song Liu song@kernel.org Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions") Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Tested-by: Masami Hiramatsu (Google) mhiramat@kernel.org Acked-by: Petr Mladek pmladek@suse.com Reviewed-by: Sami Tolvanen samitolvanen@google.com Reviewed-by: Luis Chamberlain mcgrof@kernel.org Link: https://lore.kernel.org/r/20240807220513.3100483-2-song@kernel.org Signed-off-by: Kees Cook kees@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/kallsyms.c | 31 ++----------------------------- scripts/link-vmlinux.sh | 4 ---- 2 files changed, 2 insertions(+), 33 deletions(-)
diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c index 55a423519f2e5..839d9c49f28ce 100644 --- a/scripts/kallsyms.c +++ b/scripts/kallsyms.c @@ -5,8 +5,7 @@ * This software may be used and distributed according to the terms * of the GNU General Public License, incorporated herein by reference. * - * Usage: kallsyms [--all-symbols] [--absolute-percpu] - * [--lto-clang] in.map > out.S + * Usage: kallsyms [--all-symbols] [--absolute-percpu] in.map > out.S * * Table compression uses all the unused char codes on the symbols and * maps these to the most used substrings (tokens). For instance, it might @@ -63,7 +62,6 @@ static struct sym_entry **table; static unsigned int table_size, table_cnt; static int all_symbols; static int absolute_percpu; -static int lto_clang;
static int token_profit[0x10000];
@@ -74,8 +72,7 @@ static unsigned char best_table_len[256];
static void usage(void) { - fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] " - "[--lto-clang] in.map > out.S\n"); + fprintf(stderr, "Usage: kallsyms [--all-symbols] [--absolute-percpu] in.map > out.S\n"); exit(1); }
@@ -345,25 +342,6 @@ static int symbol_absolute(const struct sym_entry *s) return s->percpu_absolute; }
-static void cleanup_symbol_name(char *s) -{ - char *p; - - /* - * ASCII[.] = 2e - * ASCII[0-9] = 30,39 - * ASCII[A-Z] = 41,5a - * ASCII[_] = 5f - * ASCII[a-z] = 61,7a - * - * As above, replacing the first '.' in ".llvm." with '\0' does not - * affect the main sorting, but it helps us with subsorting. - */ - p = strstr(s, ".llvm."); - if (p) - *p = '\0'; -} - static int compare_names(const void *a, const void *b) { int ret; @@ -528,10 +506,6 @@ static void write_src(void) output_address(relative_base); printf("\n");
- if (lto_clang) - for (i = 0; i < table_cnt; i++) - cleanup_symbol_name((char *)table[i]->sym); - sort_symbols_by_name(); output_label("kallsyms_seqs_of_names"); for (i = 0; i < table_cnt; i++) @@ -808,7 +782,6 @@ int main(int argc, char **argv) static const struct option long_options[] = { {"all-symbols", no_argument, &all_symbols, 1}, {"absolute-percpu", no_argument, &absolute_percpu, 1}, - {"lto-clang", no_argument, <o_clang, 1}, {}, };
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 31581504489ef..1e41b330550e6 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -159,10 +159,6 @@ kallsyms() kallsymopt="${kallsymopt} --absolute-percpu" fi
- if is_enabled CONFIG_LTO_CLANG; then - kallsymopt="${kallsymopt} --lto-clang" - fi - info KSYMS "${2}.S" scripts/kallsyms ${kallsymopt} "${1}" > "${2}.S"
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Song Liu song@kernel.org
[ Upstream commit fb6a421fb6153d97cf3058f9bd550b377b76a490 ]
With CONFIG_LTO_CLANG=y, the compiler may add .llvm.<hash> suffix to function names to avoid duplication. APIs like kallsyms_lookup_name() and kallsyms_on_each_match_symbol() tries to match these symbol names without the .llvm.<hash> suffix, e.g., match "c_stop" with symbol c_stop.llvm.17132674095431275852. This turned out to be problematic for use cases that require exact match, for example, livepatch.
Fix this by making the APIs to match symbols exactly.
Also cleanup kallsyms_selftests accordingly.
Signed-off-by: Song Liu song@kernel.org Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions") Tested-by: Masami Hiramatsu (Google) mhiramat@kernel.org Reviewed-by: Masami Hiramatsu (Google) mhiramat@kernel.org Acked-by: Petr Mladek pmladek@suse.com Reviewed-by: Sami Tolvanen samitolvanen@google.com Reviewed-by: Luis Chamberlain mcgrof@kernel.org Link: https://lore.kernel.org/r/20240807220513.3100483-3-song@kernel.org Signed-off-by: Kees Cook kees@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/kallsyms.c | 55 +++++--------------------------------- kernel/kallsyms_selftest.c | 22 +-------------- 2 files changed, 7 insertions(+), 70 deletions(-)
diff --git a/kernel/kallsyms.c b/kernel/kallsyms.c index fb2c77368d187..a9a0ca605d4a8 100644 --- a/kernel/kallsyms.c +++ b/kernel/kallsyms.c @@ -160,38 +160,6 @@ unsigned long kallsyms_sym_address(int idx) return kallsyms_relative_base - 1 - kallsyms_offsets[idx]; }
-static void cleanup_symbol_name(char *s) -{ - char *res; - - if (!IS_ENABLED(CONFIG_LTO_CLANG)) - return; - - /* - * LLVM appends various suffixes for local functions and variables that - * must be promoted to global scope as part of LTO. This can break - * hooking of static functions with kprobes. '.' is not a valid - * character in an identifier in C. Suffixes only in LLVM LTO observed: - * - foo.llvm.[0-9a-f]+ - */ - res = strstr(s, ".llvm."); - if (res) - *res = '\0'; - - return; -} - -static int compare_symbol_name(const char *name, char *namebuf) -{ - /* The kallsyms_seqs_of_names is sorted based on names after - * cleanup_symbol_name() (see scripts/kallsyms.c) if clang lto is enabled. - * To ensure correct bisection in kallsyms_lookup_names(), do - * cleanup_symbol_name(namebuf) before comparing name and namebuf. - */ - cleanup_symbol_name(namebuf); - return strcmp(name, namebuf); -} - static unsigned int get_symbol_seq(int index) { unsigned int i, seq = 0; @@ -219,7 +187,7 @@ static int kallsyms_lookup_names(const char *name, seq = get_symbol_seq(mid); off = get_symbol_offset(seq); kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); - ret = compare_symbol_name(name, namebuf); + ret = strcmp(name, namebuf); if (ret > 0) low = mid + 1; else if (ret < 0) @@ -236,7 +204,7 @@ static int kallsyms_lookup_names(const char *name, seq = get_symbol_seq(low - 1); off = get_symbol_offset(seq); kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); - if (compare_symbol_name(name, namebuf)) + if (strcmp(name, namebuf)) break; low--; } @@ -248,7 +216,7 @@ static int kallsyms_lookup_names(const char *name, seq = get_symbol_seq(high + 1); off = get_symbol_offset(seq); kallsyms_expand_symbol(off, namebuf, ARRAY_SIZE(namebuf)); - if (compare_symbol_name(name, namebuf)) + if (strcmp(name, namebuf)) break; high++; } @@ -407,8 +375,7 @@ static int kallsyms_lookup_buildid(unsigned long addr, if (modbuildid) *modbuildid = NULL;
- ret = strlen(namebuf); - goto found; + return strlen(namebuf); }
/* See if it's in a module or a BPF JITed image. */ @@ -422,8 +389,6 @@ static int kallsyms_lookup_buildid(unsigned long addr, ret = ftrace_mod_address_lookup(addr, symbolsize, offset, modname, namebuf);
-found: - cleanup_symbol_name(namebuf); return ret; }
@@ -450,8 +415,6 @@ const char *kallsyms_lookup(unsigned long addr,
int lookup_symbol_name(unsigned long addr, char *symname) { - int res; - symname[0] = '\0'; symname[KSYM_NAME_LEN - 1] = '\0';
@@ -462,16 +425,10 @@ int lookup_symbol_name(unsigned long addr, char *symname) /* Grab name */ kallsyms_expand_symbol(get_symbol_offset(pos), symname, KSYM_NAME_LEN); - goto found; + return 0; } /* See if it's in a module. */ - res = lookup_module_symbol_name(addr, symname); - if (res) - return res; - -found: - cleanup_symbol_name(symname); - return 0; + return lookup_module_symbol_name(addr, symname); }
/* Look up a kernel symbol and return it in a text buffer. */ diff --git a/kernel/kallsyms_selftest.c b/kernel/kallsyms_selftest.c index 2f84896a7bcbd..873f7c445488c 100644 --- a/kernel/kallsyms_selftest.c +++ b/kernel/kallsyms_selftest.c @@ -187,31 +187,11 @@ static void test_perf_kallsyms_lookup_name(void) stat.min, stat.max, div_u64(stat.sum, stat.real_cnt)); }
-static bool match_cleanup_name(const char *s, const char *name) -{ - char *p; - int len; - - if (!IS_ENABLED(CONFIG_LTO_CLANG)) - return false; - - p = strstr(s, ".llvm."); - if (!p) - return false; - - len = strlen(name); - if (p - s != len) - return false; - - return !strncmp(s, name, len); -} - static int find_symbol(void *data, const char *name, unsigned long addr) { struct test_stat *stat = (struct test_stat *)data;
- if (strcmp(name, stat->name) == 0 || - (!stat->perf && match_cleanup_name(name, stat->name))) { + if (!strcmp(name, stat->name)) { stat->real_cnt++; stat->addr = addr;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Barak Biber bbiber@nvidia.com
[ Upstream commit fca5b78511e98bdff2cdd55c172b23200a7b3404 ]
When iommu_report_device_fault gets called with a partial fault it is supposed to collect the fault into the group and then return.
Instead the return was accidently deleted which results in trying to process the fault and an eventual crash.
Deleting the return was a typo, put it back.
Fixes: 3dfa64aecbaf ("iommu: Make iommu_report_device_fault() return void") Signed-off-by: Barak Biber bbiber@nvidia.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Reviewed-by: Lu Baolu baolu.lu@linux.intel.com Link: https://lore.kernel.org/r/0-v1-e7153d9c8cee+1c6-iommu_fault_fix_jgg@nvidia.c... Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/io-pgfault.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/iommu/io-pgfault.c b/drivers/iommu/io-pgfault.c index 06d78fcc79fdb..f2c87c695a17c 100644 --- a/drivers/iommu/io-pgfault.c +++ b/drivers/iommu/io-pgfault.c @@ -192,6 +192,7 @@ void iommu_report_device_fault(struct device *dev, struct iopf_fault *evt) report_partial_fault(iopf_param, fault); iopf_put_dev_fault_param(iopf_param); /* A request that is not the last does not need to be ack'd */ + return; }
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Asmaa Mnebhi asmaa@nvidia.com
[ Upstream commit aad41832326723627ad8ac9ee8a543b6dca4454d ]
During Linux graceful reboot, the GPIO interrupts are not disabled. Since the drivers are not removed during graceful reboot, the logic to call mlxbf3_gpio_irq_disable() is not triggered. Interrupts that remain enabled can cause issues on subsequent boots.
For example, the mlxbf-gige driver contains PHY logic to bring up the link. If the gpio-mlxbf3 driver loads first, the mlxbf-gige driver will use a GPIO interrupt to bring up the link. Otherwise, it will use polling. The next time Linux boots and loads the drivers in this order, we encounter the issue: - mlxbf-gige loads first and uses polling while the GPIO10 interrupt is still enabled from the previous boot. So if the interrupt triggers, there is nothing to clear it. - gpio-mlxbf3 loads. - i2c-mlxbf loads. The interrupt doesn't trigger for I2C because it is shared with the GPIO interrupt line which was not cleared.
The solution is to add a shutdown function to the GPIO driver to clear and disable all interrupts. Also clear the interrupt after disabling it in mlxbf3_gpio_irq_disable().
Fixes: 38a700efc510 ("gpio: mlxbf3: Add gpio driver support") Signed-off-by: Asmaa Mnebhi asmaa@nvidia.com Reviewed-by: David Thompson davthompson@nvidia.com Reviewed-by: Andy Shevchenko andy@kernel.org Reviewed-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20240611171509.22151-1-asmaa@nvidia.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-mlxbf3.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/drivers/gpio/gpio-mlxbf3.c b/drivers/gpio/gpio-mlxbf3.c index d5906d419b0ab..10ea71273c891 100644 --- a/drivers/gpio/gpio-mlxbf3.c +++ b/drivers/gpio/gpio-mlxbf3.c @@ -39,6 +39,8 @@ #define MLXBF_GPIO_CAUSE_OR_EVTEN0 0x14 #define MLXBF_GPIO_CAUSE_OR_CLRCAUSE 0x18
+#define MLXBF_GPIO_CLR_ALL_INTS GENMASK(31, 0) + struct mlxbf3_gpio_context { struct gpio_chip gc;
@@ -82,6 +84,8 @@ static void mlxbf3_gpio_irq_disable(struct irq_data *irqd) val = readl(gs->gpio_cause_io + MLXBF_GPIO_CAUSE_OR_EVTEN0); val &= ~BIT(offset); writel(val, gs->gpio_cause_io + MLXBF_GPIO_CAUSE_OR_EVTEN0); + + writel(BIT(offset), gs->gpio_cause_io + MLXBF_GPIO_CAUSE_OR_CLRCAUSE); raw_spin_unlock_irqrestore(&gs->gc.bgpio_lock, flags);
gpiochip_disable_irq(gc, offset); @@ -253,6 +257,15 @@ static int mlxbf3_gpio_probe(struct platform_device *pdev) return 0; }
+static void mlxbf3_gpio_shutdown(struct platform_device *pdev) +{ + struct mlxbf3_gpio_context *gs = platform_get_drvdata(pdev); + + /* Disable and clear all interrupts */ + writel(0, gs->gpio_cause_io + MLXBF_GPIO_CAUSE_OR_EVTEN0); + writel(MLXBF_GPIO_CLR_ALL_INTS, gs->gpio_cause_io + MLXBF_GPIO_CAUSE_OR_CLRCAUSE); +} + static const struct acpi_device_id mlxbf3_gpio_acpi_match[] = { { "MLNXBF33", 0 }, {} @@ -265,6 +278,7 @@ static struct platform_driver mlxbf3_gpio_driver = { .acpi_match_table = mlxbf3_gpio_acpi_match, }, .probe = mlxbf3_gpio_probe, + .shutdown = mlxbf3_gpio_shutdown, }; module_platform_driver(mlxbf3_gpio_driver);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Parsa Poorshikhian parsa.poorsh@gmail.com
[ Upstream commit ef9718b3d54e822de294351251f3a574f8a082ce ]
Fix noise from speakers connected to AUX port when no sound is playing. The problem occurs because the `alc_shutup_pins` function includes a 0x10ec0257 vendor ID, which causes noise on Lenovo IdeaPad 3 15IAU7 with Realtek ALC257 codec when no sound is playing. Removing this vendor ID from the function fixes the bug.
Fixes: 70794b9563fe ("ALSA: hda/realtek: Add more codec ID to no shutup pins list") Signed-off-by: Parsa Poorshikhian parsa.poorsh@gmail.com Link: https://patch.msgid.link/20240810150939.330693-1-parsa.poorsh@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 3840565ef8b02..c9d76bca99232 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -583,7 +583,6 @@ static void alc_shutup_pins(struct hda_codec *codec) switch (codec->core.vendor_id) { case 0x10ec0236: case 0x10ec0256: - case 0x10ec0257: case 0x19e58326: case 0x10ec0283: case 0x10ec0285:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maíra Canal mcanal@igalia.com
[ Upstream commit 497d370a644d95a9f04271aa92cb96d32e84c770 ]
When enabling UBSAN on Raspberry Pi 5, we get the following warning:
[ 387.894977] UBSAN: array-index-out-of-bounds in drivers/gpu/drm/v3d/v3d_sched.c:320:3 [ 387.903868] index 7 is out of range for type '__u32 [7]' [ 387.909692] CPU: 0 PID: 1207 Comm: kworker/u16:2 Tainted: G WC 6.10.3-v8-16k-numa #151 [ 387.919166] Hardware name: Raspberry Pi 5 Model B Rev 1.0 (DT) [ 387.925961] Workqueue: v3d_csd drm_sched_run_job_work [gpu_sched] [ 387.932525] Call trace: [ 387.935296] dump_backtrace+0x170/0x1b8 [ 387.939403] show_stack+0x20/0x38 [ 387.942907] dump_stack_lvl+0x90/0xd0 [ 387.946785] dump_stack+0x18/0x28 [ 387.950301] __ubsan_handle_out_of_bounds+0x98/0xd0 [ 387.955383] v3d_csd_job_run+0x3a8/0x438 [v3d] [ 387.960707] drm_sched_run_job_work+0x520/0x6d0 [gpu_sched] [ 387.966862] process_one_work+0x62c/0xb48 [ 387.971296] worker_thread+0x468/0x5b0 [ 387.975317] kthread+0x1c4/0x1e0 [ 387.978818] ret_from_fork+0x10/0x20 [ 387.983014] ---[ end trace ]---
This happens because the UAPI provides only seven configuration registers and we are reading the eighth position of this u32 array.
Therefore, fix the out-of-bounds read in `v3d_csd_job_run()` by accessing only seven positions on the '__u32 [7]' array. The eighth register exists indeed on V3D 7.1, but it isn't currently used. That being so, let's guarantee that it remains unused and add a note that it could be set in a future patch.
Fixes: 0ad5bc1ce463 ("drm/v3d: fix up register addresses for V3D 7.x") Reported-by: Tvrtko Ursulin tvrtko.ursulin@igalia.com Signed-off-by: Maíra Canal mcanal@igalia.com Reviewed-by: Iago Toral Quiroga itoral@igalia.com Link: https://patchwork.freedesktop.org/patch/msgid/20240809152001.668314-1-mcanal... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/v3d/v3d_sched.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/v3d/v3d_sched.c b/drivers/gpu/drm/v3d/v3d_sched.c index 30d5366d62883..0132403b8159f 100644 --- a/drivers/gpu/drm/v3d/v3d_sched.c +++ b/drivers/gpu/drm/v3d/v3d_sched.c @@ -315,7 +315,7 @@ v3d_csd_job_run(struct drm_sched_job *sched_job) struct v3d_dev *v3d = job->base.v3d; struct drm_device *dev = &v3d->drm; struct dma_fence *fence; - int i, csd_cfg0_reg, csd_cfg_reg_count; + int i, csd_cfg0_reg;
v3d->csd_job = job;
@@ -335,9 +335,17 @@ v3d_csd_job_run(struct drm_sched_job *sched_job) v3d_switch_perfmon(v3d, &job->base);
csd_cfg0_reg = V3D_CSD_QUEUED_CFG0(v3d->ver); - csd_cfg_reg_count = v3d->ver < 71 ? 6 : 7; - for (i = 1; i <= csd_cfg_reg_count; i++) + for (i = 1; i <= 6; i++) V3D_CORE_WRITE(0, csd_cfg0_reg + 4 * i, job->args.cfg[i]); + + /* Although V3D 7.1 has an eighth configuration register, we are not + * using it. Therefore, make sure it remains unused. + * + * XXX: Set the CFG7 register + */ + if (v3d->ver >= 71) + V3D_CORE_WRITE(0, V3D_V7_CSD_QUEUED_CFG7, 0); + /* CFG0 write kicks off the job. */ V3D_CORE_WRITE(0, csd_cfg0_reg, job->args.cfg[0]);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miguel Ojeda ojeda@kernel.org
[ Upstream commit 9e98db17837093cb0f4dcfcc3524739d93249c45 ]
`bindgen` 0.69.0 contains a bug: `--version` does not work without providing a header [1]:
error: the following required arguments were not provided: <HEADER>
Usage: bindgen <FLAGS> <OPTIONS> <HEADER> -- <CLANG_ARGS>...
Thus, in preparation for supporting several `bindgen` versions, work around the issue by passing a dummy argument.
Include a comment so that we can remove the workaround in the future.
Link: https://github.com/rust-lang/rust-bindgen/pull/2678 [1] Reviewed-by: Finn Behrens me@kloenk.dev Tested-by: Benno Lossin benno.lossin@proton.me Tested-by: Andreas Hindborg a.hindborg@samsung.com Link: https://lore.kernel.org/r/20240709160615.998336-9-ojeda@kernel.org Signed-off-by: Miguel Ojeda ojeda@kernel.org Stable-dep-of: 5ce86c6c8613 ("rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT") Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 5 ++++- scripts/rust_is_available.sh | 6 +++++- 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index 6e97693b675f2..b7683506264f0 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1911,7 +1911,10 @@ config RUSTC_VERSION_TEXT config BINDGEN_VERSION_TEXT string depends on RUST - default $(shell,command -v $(BINDGEN) >/dev/null 2>&1 && $(BINDGEN) --version || echo n) + # The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 + # (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when + # the minimum version is upgraded past that (0.69.1 already fixed the issue). + default $(shell,command -v $(BINDGEN) >/dev/null 2>&1 && $(BINDGEN) --version workaround-for-0.69.0 || echo n)
# # Place an empty function call at each tracepoint site. Can be diff --git a/scripts/rust_is_available.sh b/scripts/rust_is_available.sh index 117018946b577..a6fdcf13e0e53 100755 --- a/scripts/rust_is_available.sh +++ b/scripts/rust_is_available.sh @@ -129,8 +129,12 @@ fi # Check that the Rust bindings generator is suitable. # # Non-stable and distributions' versions may have a version suffix, e.g. `-dev`. +# +# The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 +# (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when +# the minimum version is upgraded past that (0.69.1 already fixed the issue). rust_bindings_generator_output=$( \ - LC_ALL=C "$BINDGEN" --version 2>/dev/null + LC_ALL=C "$BINDGEN" --version workaround-for-0.69.0 2>/dev/null ) || rust_bindings_generator_code=$? if [ -n "$rust_bindings_generator_code" ]; then echo >&2 "***"
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 5ce86c6c861352c9346ebb5c96ed70cb67414aa3 ]
While this is a somewhat unusual case, I encountered odd error messages when I ran Kconfig in a foreign architecture chroot.
$ make allmodconfig sh: 1: rustc: not found sh: 1: bindgen: not found # # configuration written to .config #
The successful execution of 'command -v rustc' does not necessarily mean that 'rustc --version' will succeed.
$ sh -c 'command -v rustc' /home/masahiro/.cargo/bin/rustc $ sh -c 'rustc --version' sh: 1: rustc: not found
Here, 'rustc' is built for x86, and I ran it in an arm64 system.
The current code:
command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version || echo n
can be turned into:
command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version 2>/dev/null || echo n
However, I did not understand the necessity of 'command -v $(RUSTC)'.
I simplified it to:
$(RUSTC) --version 2>/dev/null || echo n
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240727140302.1806011-1-masahiroy@kernel.org [ Rebased on top of v6.11-rc1. - Miguel ] Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index b7683506264f0..051cbb22968bd 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1906,7 +1906,7 @@ config RUST config RUSTC_VERSION_TEXT string depends on RUST - default $(shell,command -v $(RUSTC) >/dev/null 2>&1 && $(RUSTC) --version || echo n) + default $(shell,$(RUSTC) --version 2>/dev/null || echo n)
config BINDGEN_VERSION_TEXT string @@ -1914,7 +1914,7 @@ config BINDGEN_VERSION_TEXT # The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 # (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when # the minimum version is upgraded past that (0.69.1 already fixed the issue). - default $(shell,command -v $(BINDGEN) >/dev/null 2>&1 && $(BINDGEN) --version workaround-for-0.69.0 || echo n) + default $(shell,$(BINDGEN) --version workaround-for-0.69.0 2>/dev/null || echo n)
# # Place an empty function call at each tracepoint site. Can be
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit aacf93e87f0d808ef46e621aa56caea336b4433c ]
Another oddity in these config entries is their default value can fall back to 'n', which is a value for bool or tristate symbols.
The '|| echo n' is an incorrect workaround to avoid the syntax error. This is not a big deal, as the entry is hidden by 'depends on RUST' in situations where '$(RUSTC) --version' or '$(BINDGEN) --version' fails. Anyway, it looks odd.
The default of a string type symbol should be a double-quoted string literal. Turn it into an empty string when the version command fails.
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support") Signed-off-by: Masahiro Yamada masahiroy@kernel.org Link: https://lore.kernel.org/r/20240727140302.1806011-2-masahiroy@kernel.org [ Rebased on top of v6.11-rc1. - Miguel ] Signed-off-by: Miguel Ojeda ojeda@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- init/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/init/Kconfig b/init/Kconfig index 051cbb22968bd..9684e5d2b81c6 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -1906,7 +1906,7 @@ config RUST config RUSTC_VERSION_TEXT string depends on RUST - default $(shell,$(RUSTC) --version 2>/dev/null || echo n) + default "$(shell,$(RUSTC) --version 2>/dev/null)"
config BINDGEN_VERSION_TEXT string @@ -1914,7 +1914,7 @@ config BINDGEN_VERSION_TEXT # The dummy parameter `workaround-for-0.69.0` is required to support 0.69.0 # (https://github.com/rust-lang/rust-bindgen/pull/2678). It can be removed when # the minimum version is upgraded past that (0.69.1 already fixed the issue). - default $(shell,$(BINDGEN) --version workaround-for-0.69.0 2>/dev/null || echo n) + default "$(shell,$(BINDGEN) --version workaround-for-0.69.0 2>/dev/null)"
# # Place an empty function call at each tracepoint site. Can be
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Farman farman@linux.ibm.com
[ Upstream commit 2a07bb64d80152701d507b1498237ed1b8d83866 ]
This reverts commit bc792884b76f ("s390/dasd: Establish DMA alignment").
Quoting the original commit: linux-next commit bf8d08532bc1 ("iomap: add support for dma aligned direct-io") changes the alignment requirement to come from the block device rather than the block size, and the default alignment requirement is 512-byte boundaries. Since DASD I/O has page alignments for IDAW/TIDAW requests, let's override this value to restore the expected behavior.
I mentioned TIDAW, but that was wrong. TIDAWs have no distinct alignment requirement (per p. 15-70 of POPS SA22-7832-13):
Unless otherwise specified, TIDAWs may designate a block of main storage on any boundary and length up to 4K bytes, provided the specified block does not cross a 4 K-byte boundary.
IDAWs do, but the original commit neglected that while ECKD DASD are typically formatted in 4096-byte blocks, they don't HAVE to be. Formatting an ECKD volume with smaller blocks is permitted (dasdfmt -b xxx), and the problematic commit enforces alignment properties to such a device that will result in errors, such as:
[test@host ~]# lsdasd -l a367 | grep blksz blksz: 512 [test@host ~]# mkfs.xfs -f /dev/disk/by-path/ccw-0.0.a367-part1 meta-data=/dev/dasdc1 isize=512 agcount=4, agsize=230075 blks = sectsz=512 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=1 data = bsize=4096 blocks=920299, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=4096 ascii-ci=0, ftype=1 log =internal log bsize=4096 blocks=16384, version=2 = sectsz=512 sunit=0 blks, lazy-count=1 realtime =none extsz=4096 blocks=0, rtextents=0 error reading existing superblock: Invalid argument mkfs.xfs: pwrite failed: Invalid argument libxfs_bwrite: write failed on (unknown) bno 0x70565c/0x100, err=22 mkfs.xfs: Releasing dirty buffer to free list! found dirty buffer (bulk) on free list! mkfs.xfs: pwrite failed: Invalid argument ...snipped...
The original commit omitted the FBA discipline for just this reason, but the formatted block size of the other disciplines was overlooked. The solution to all of this is to revert to the original behavior, such that the block size can be respected. There were two commits [1] that moved this code in the interim, so a straight git-revert is not possible, but the change is straightforward.
But what of the original problem? That was manifested with a direct-io QEMU guest, where QEMU itself was changed a month or two later with commit 25474d90aa ("block: use the request length for iov alignment") such that the blamed kernel commit is unnecessary.
[1] commit 0127a47f58c6 ("dasd: move queue setup to common code") commit fde07a4d74e3 ("dasd: use the atomic queue limits API")
Fixes: bc792884b76f ("s390/dasd: Establish DMA alignment") Reviewed-by: Stefan Haberland sth@linux.ibm.com Signed-off-by: Eric Farman farman@linux.ibm.com Signed-off-by: Stefan Haberland sth@linux.ibm.com Link: https://lore.kernel.org/r/20240812125733.126431-2-sth@linux.ibm.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/s390/block/dasd_genhd.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/s390/block/dasd_genhd.c b/drivers/s390/block/dasd_genhd.c index 4533dd055ca8e..23d6e638f381d 100644 --- a/drivers/s390/block/dasd_genhd.c +++ b/drivers/s390/block/dasd_genhd.c @@ -41,7 +41,6 @@ int dasd_gendisk_alloc(struct dasd_block *block) */ .max_segment_size = PAGE_SIZE, .seg_boundary_mask = PAGE_SIZE - 1, - .dma_alignment = PAGE_SIZE - 1, .max_segments = USHRT_MAX, }; struct gendisk *gdp;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thorsten Blum thorsten.blum@toblux.com
[ Upstream commit f7c696a56cc7d70515774a24057b473757ec6089 ]
Since the do_div() macro casts the divisor to u32 anyway, remove the unnecessary s64 cast and fix the following Coccinelle/coccicheck warning reported by do_div.cocci:
WARNING: do_div() does a 64-by-32 division, please consider using div64_s64 instead
Signed-off-by: Thorsten Blum thorsten.blum@toblux.com Link: https://lore.kernel.org/r/20240710010520.384009-2-thorsten.blum@toblux.com Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 84f2eecf9501 ("io_uring/napi: check napi_enabled in io_napi_add() before proceeding") Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/napi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/io_uring/napi.c b/io_uring/napi.c index 080d10e0e0afd..327e5f3a8abe0 100644 --- a/io_uring/napi.c +++ b/io_uring/napi.c @@ -285,7 +285,7 @@ void __io_napi_adjust_timeout(struct io_ring_ctx *ctx, struct io_wait_queue *iow s64 poll_to_ns = timespec64_to_ns(ts); if (poll_to_ns > 0) { u64 val = poll_to_ns + 999; - do_div(val, (s64) 1000); + do_div(val, 1000); poll_to = val; } }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pavel Begunkov asml.silence@gmail.com
[ Upstream commit 342b2e395d5f34c9f111a818556e617939f83a8c ]
It's more natural to use ktime/ns instead of keeping around usec, especially since we're comparing it against user provided timers, so convert napi busy poll internal handling to ktime. It's also nicer since the type (ktime_t vs unsigned long) now tells the unit of measure.
Keep everything as ktime, which we convert to/from micro seconds for IORING_[UN]REGISTER_NAPI. The net/ busy polling works seems to work with usec, however it's not real usec as shift by 10 is used to get it from nsecs, see busy_loop_current_time(), so it's easy to get truncated nsec back and we get back better precision.
Note, we can further improve it later by removing the truncation and maybe convincing net/ to use ktime/ns instead.
Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/95e7ec8d095069a3ed5d40a4bc6f8b586698bc7e.172200377... Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 84f2eecf9501 ("io_uring/napi: check napi_enabled in io_napi_add() before proceeding") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/io_uring_types.h | 2 +- io_uring/io_uring.h | 2 +- io_uring/napi.c | 48 +++++++++++++++++++--------------- io_uring/napi.h | 2 +- 4 files changed, 30 insertions(+), 24 deletions(-)
diff --git a/include/linux/io_uring_types.h b/include/linux/io_uring_types.h index 7abdc09271245..b18e998c8b887 100644 --- a/include/linux/io_uring_types.h +++ b/include/linux/io_uring_types.h @@ -410,7 +410,7 @@ struct io_ring_ctx { spinlock_t napi_lock; /* napi_list lock */
/* napi busy poll default timeout */ - unsigned int napi_busy_poll_to; + ktime_t napi_busy_poll_dt; bool napi_prefer_busy_poll; bool napi_enabled;
diff --git a/io_uring/io_uring.h b/io_uring/io_uring.h index 726e6367af4d3..af46d03d58847 100644 --- a/io_uring/io_uring.h +++ b/io_uring/io_uring.h @@ -43,7 +43,7 @@ struct io_wait_queue { ktime_t timeout;
#ifdef CONFIG_NET_RX_BUSY_POLL - unsigned int napi_busy_poll_to; + ktime_t napi_busy_poll_dt; bool napi_prefer_busy_poll; #endif }; diff --git a/io_uring/napi.c b/io_uring/napi.c index 327e5f3a8abe0..6bdb267e9c33c 100644 --- a/io_uring/napi.c +++ b/io_uring/napi.c @@ -33,6 +33,12 @@ static struct io_napi_entry *io_napi_hash_find(struct hlist_head *hash_list, return NULL; }
+static inline ktime_t net_to_ktime(unsigned long t) +{ + /* napi approximating usecs, reverse busy_loop_current_time */ + return ns_to_ktime(t << 10); +} + void __io_napi_add(struct io_ring_ctx *ctx, struct socket *sock) { struct hlist_head *hash_list; @@ -102,14 +108,14 @@ static inline void io_napi_remove_stale(struct io_ring_ctx *ctx, bool is_stale) __io_napi_remove_stale(ctx); }
-static inline bool io_napi_busy_loop_timeout(unsigned long start_time, - unsigned long bp_usec) +static inline bool io_napi_busy_loop_timeout(ktime_t start_time, + ktime_t bp) { - if (bp_usec) { - unsigned long end_time = start_time + bp_usec; - unsigned long now = busy_loop_current_time(); + if (bp) { + ktime_t end_time = ktime_add(start_time, bp); + ktime_t now = net_to_ktime(busy_loop_current_time());
- return time_after(now, end_time); + return ktime_after(now, end_time); }
return true; @@ -124,7 +130,8 @@ static bool io_napi_busy_loop_should_end(void *data, return true; if (io_should_wake(iowq) || io_has_work(iowq->ctx)) return true; - if (io_napi_busy_loop_timeout(start_time, iowq->napi_busy_poll_to)) + if (io_napi_busy_loop_timeout(net_to_ktime(start_time), + iowq->napi_busy_poll_dt)) return true;
return false; @@ -181,10 +188,12 @@ static void io_napi_blocking_busy_loop(struct io_ring_ctx *ctx, */ void io_napi_init(struct io_ring_ctx *ctx) { + u64 sys_dt = READ_ONCE(sysctl_net_busy_poll) * NSEC_PER_USEC; + INIT_LIST_HEAD(&ctx->napi_list); spin_lock_init(&ctx->napi_lock); ctx->napi_prefer_busy_poll = false; - ctx->napi_busy_poll_to = READ_ONCE(sysctl_net_busy_poll); + ctx->napi_busy_poll_dt = ns_to_ktime(sys_dt); }
/* @@ -217,7 +226,7 @@ void io_napi_free(struct io_ring_ctx *ctx) int io_register_napi(struct io_ring_ctx *ctx, void __user *arg) { const struct io_uring_napi curr = { - .busy_poll_to = ctx->napi_busy_poll_to, + .busy_poll_to = ktime_to_us(ctx->napi_busy_poll_dt), .prefer_busy_poll = ctx->napi_prefer_busy_poll }; struct io_uring_napi napi; @@ -232,7 +241,7 @@ int io_register_napi(struct io_ring_ctx *ctx, void __user *arg) if (copy_to_user(arg, &curr, sizeof(curr))) return -EFAULT;
- WRITE_ONCE(ctx->napi_busy_poll_to, napi.busy_poll_to); + WRITE_ONCE(ctx->napi_busy_poll_dt, napi.busy_poll_to * NSEC_PER_USEC); WRITE_ONCE(ctx->napi_prefer_busy_poll, !!napi.prefer_busy_poll); WRITE_ONCE(ctx->napi_enabled, true); return 0; @@ -249,14 +258,14 @@ int io_register_napi(struct io_ring_ctx *ctx, void __user *arg) int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg) { const struct io_uring_napi curr = { - .busy_poll_to = ctx->napi_busy_poll_to, + .busy_poll_to = ktime_to_us(ctx->napi_busy_poll_dt), .prefer_busy_poll = ctx->napi_prefer_busy_poll };
if (arg && copy_to_user(arg, &curr, sizeof(curr))) return -EFAULT;
- WRITE_ONCE(ctx->napi_busy_poll_to, 0); + WRITE_ONCE(ctx->napi_busy_poll_dt, 0); WRITE_ONCE(ctx->napi_prefer_busy_poll, false); WRITE_ONCE(ctx->napi_enabled, false); return 0; @@ -275,23 +284,20 @@ int io_unregister_napi(struct io_ring_ctx *ctx, void __user *arg) void __io_napi_adjust_timeout(struct io_ring_ctx *ctx, struct io_wait_queue *iowq, struct timespec64 *ts) { - unsigned int poll_to = READ_ONCE(ctx->napi_busy_poll_to); + ktime_t poll_dt = READ_ONCE(ctx->napi_busy_poll_dt);
if (ts) { struct timespec64 poll_to_ts;
- poll_to_ts = ns_to_timespec64(1000 * (s64)poll_to); + poll_to_ts = ns_to_timespec64(ktime_to_ns(poll_dt)); if (timespec64_compare(ts, &poll_to_ts) < 0) { s64 poll_to_ns = timespec64_to_ns(ts); - if (poll_to_ns > 0) { - u64 val = poll_to_ns + 999; - do_div(val, 1000); - poll_to = val; - } + if (poll_to_ns > 0) + poll_dt = ns_to_ktime(poll_to_ns); } }
- iowq->napi_busy_poll_to = poll_to; + iowq->napi_busy_poll_dt = poll_dt; }
/* @@ -320,7 +326,7 @@ int io_napi_sqpoll_busy_poll(struct io_ring_ctx *ctx) LIST_HEAD(napi_list); bool is_stale = false;
- if (!READ_ONCE(ctx->napi_busy_poll_to)) + if (!READ_ONCE(ctx->napi_busy_poll_dt)) return 0; if (list_empty_careful(&ctx->napi_list)) return 0; diff --git a/io_uring/napi.h b/io_uring/napi.h index 6fc0393d0dbef..babbee36cd3eb 100644 --- a/io_uring/napi.h +++ b/io_uring/napi.h @@ -55,7 +55,7 @@ static inline void io_napi_add(struct io_kiocb *req) struct io_ring_ctx *ctx = req->ctx; struct socket *sock;
- if (!READ_ONCE(ctx->napi_busy_poll_to)) + if (!READ_ONCE(ctx->napi_busy_poll_dt)) return;
sock = sock_from_file(req->file);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Olivier Langlois olivier@trillion01.com
[ Upstream commit 84f2eecf95018386c145ada19bb45b03bdb80d9e ]
doing so avoids the overhead of adding napi ids to all the rings that do not enable napi.
if no id is added to napi_list because napi is disabled, __io_napi_busy_loop() will not be called.
Signed-off-by: Olivier Langlois olivier@trillion01.com Fixes: b4ccc4dd1330 ("io_uring/napi: enable even with a timeout of 0") Link: https://lore.kernel.org/r/bd989ccef5fda14f5fd9888faf4fefcf66bd0369.172340013... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/napi.c | 2 +- io_uring/napi.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/io_uring/napi.c b/io_uring/napi.c index 6bdb267e9c33c..ab5d68d4440c4 100644 --- a/io_uring/napi.c +++ b/io_uring/napi.c @@ -311,7 +311,7 @@ void __io_napi_busy_loop(struct io_ring_ctx *ctx, struct io_wait_queue *iowq) { iowq->napi_prefer_busy_poll = READ_ONCE(ctx->napi_prefer_busy_poll);
- if (!(ctx->flags & IORING_SETUP_SQPOLL) && ctx->napi_enabled) + if (!(ctx->flags & IORING_SETUP_SQPOLL)) io_napi_blocking_busy_loop(ctx, iowq); }
diff --git a/io_uring/napi.h b/io_uring/napi.h index babbee36cd3eb..341d010cf66bc 100644 --- a/io_uring/napi.h +++ b/io_uring/napi.h @@ -55,7 +55,7 @@ static inline void io_napi_add(struct io_kiocb *req) struct io_ring_ctx *ctx = req->ctx; struct socket *sock;
- if (!READ_ONCE(ctx->napi_busy_poll_dt)) + if (!READ_ONCE(ctx->napi_enabled)) return;
sock = sock_from_file(req->file);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nysal Jan K.A nysal@linux.ibm.com
[ Upstream commit 6c17ea1f3eaa330d445ac14a9428402ce4e3055e ]
If a core is offline then enabling SMT should not online CPUs of this core. By enabling SMT, what is intended is either changing the SMT value from "off" to "on" or setting the SMT level (threads per core) from a lower to higher value.
On PowerPC the ppc64_cpu utility can be used, among other things, to perform the following functions:
ppc64_cpu --cores-on # Get the number of online cores ppc64_cpu --cores-on=X # Put exactly X cores online ppc64_cpu --offline-cores=X[,Y,...] # Put specified cores offline ppc64_cpu --smt={on|off|value} # Enable, disable or change SMT level
If the user has decided to offline certain cores, enabling SMT should not online CPUs in those cores. This patch fixes the issue and changes the behaviour as described, by introducing an arch specific function topology_is_core_online(). It is currently implemented only for PowerPC.
Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support") Reported-by: Tyrel Datwyler tyreld@linux.ibm.com Closes: https://groups.google.com/g/powerpc-utils-devel/c/wrwVzAAnRlI/m/5KJSoqP4BAAJ Signed-off-by: Nysal Jan K.A nysal@linux.ibm.com Reviewed-by: Shrikanth Hegde sshegde@linux.ibm.com Reviewed-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240731030126.956210-2-nysal@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/ABI/testing/sysfs-devices-system-cpu | 3 ++- kernel/cpu.c | 12 +++++++++++- 2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-devices-system-cpu b/Documentation/ABI/testing/sysfs-devices-system-cpu index e7e160954e798..0579860b55299 100644 --- a/Documentation/ABI/testing/sysfs-devices-system-cpu +++ b/Documentation/ABI/testing/sysfs-devices-system-cpu @@ -562,7 +562,8 @@ Description: Control Symmetric Multi Threading (SMT) ================ =========================================
If control status is "forceoff" or "notsupported" writes - are rejected. + are rejected. Note that enabling SMT on PowerPC skips + offline cores.
What: /sys/devices/system/cpu/cpuX/power/energy_perf_bias Date: March 2019 diff --git a/kernel/cpu.c b/kernel/cpu.c index 3d2bf1d50a0c4..6dee328bfe6fd 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2679,6 +2679,16 @@ int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) return ret; }
+/** + * Check if the core a CPU belongs to is online + */ +#if !defined(topology_is_core_online) +static inline bool topology_is_core_online(unsigned int cpu) +{ + return true; +} +#endif + int cpuhp_smt_enable(void) { int cpu, ret = 0; @@ -2689,7 +2699,7 @@ int cpuhp_smt_enable(void) /* Skip online CPUs and CPUs on offline nodes */ if (cpu_online(cpu) || !node_online(cpu_to_node(cpu))) continue; - if (!cpu_smt_thread_allowed(cpu)) + if (!cpu_smt_thread_allowed(cpu) || !topology_is_core_online(cpu)) continue; ret = _cpu_up(cpu, 0, CPUHP_ONLINE); if (ret)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nysal Jan K.A nysal@linux.ibm.com
[ Upstream commit 227bbaabe64b6f9cd98aa051454c1d4a194a8c6a ]
topology_is_core_online() checks if the core a CPU belongs to is online. The core is online if at least one of the sibling CPUs is online. The first CPU of an online core is also online in the common case, so this should be fairly quick.
Fixes: 73c58e7e1412 ("powerpc: Add HOTPLUG_SMT support") Signed-off-by: Nysal Jan K.A nysal@linux.ibm.com Reviewed-by: Shrikanth Hegde sshegde@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/20240731030126.956210-3-nysal@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/include/asm/topology.h | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h index f4e6f2dd04b73..16bacfe8c7a2c 100644 --- a/arch/powerpc/include/asm/topology.h +++ b/arch/powerpc/include/asm/topology.h @@ -145,6 +145,7 @@ static inline int cpu_to_coregroup_id(int cpu)
#ifdef CONFIG_HOTPLUG_SMT #include <linux/cpu_smt.h> +#include <linux/cpumask.h> #include <asm/cputhreads.h>
static inline bool topology_is_primary_thread(unsigned int cpu) @@ -156,6 +157,18 @@ static inline bool topology_smt_thread_allowed(unsigned int cpu) { return cpu_thread_in_core(cpu) < cpu_smt_num_threads; } + +#define topology_is_core_online topology_is_core_online +static inline bool topology_is_core_online(unsigned int cpu) +{ + int i, first_cpu = cpu_first_thread_sibling(cpu); + + for (i = first_cpu; i < first_cpu + threads_per_core; ++i) { + if (cpu_online(i)) + return true; + } + return false; +} #endif
#endif /* __KERNEL__ */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ryo Takakura takakura@valinux.co.jp
[ Upstream commit bcc954c6caba01fca143162d5fbb90e46aa1ad80 ]
commit 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer") disabled non-panic CPUs to further write messages to ringbuffer after panicked.
Since the commit, non-panicked CPU's are not allowed to write to ring buffer after panicked and CPU backtrace which is triggered after panicked to sample non-panicked CPUs' backtrace no longer serves its function as it has nothing to print.
Fix the issue by allowing non-panicked CPUs to write into ringbuffer while CPU backtrace is in flight.
Fixes: 779dbc2e78d7 ("printk: Avoid non-panic CPUs writing to ringbuffer") Signed-off-by: Ryo Takakura takakura@valinux.co.jp Reviewed-by: Petr Mladek pmladek@suse.com Link: https://lore.kernel.org/r/20240812072703.339690-1-takakura@valinux.co.jp Signed-off-by: Petr Mladek pmladek@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/panic.h | 1 + kernel/panic.c | 8 +++++++- kernel/printk/printk.c | 2 +- 3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/include/linux/panic.h b/include/linux/panic.h index 6717b15e798c3..556b4e2ad9aa5 100644 --- a/include/linux/panic.h +++ b/include/linux/panic.h @@ -16,6 +16,7 @@ extern void oops_enter(void); extern void oops_exit(void); extern bool oops_may_print(void);
+extern bool panic_triggering_all_cpu_backtrace; extern int panic_timeout; extern unsigned long panic_print; extern int panic_on_oops; diff --git a/kernel/panic.c b/kernel/panic.c index 8bff183d6180e..30342568e935f 100644 --- a/kernel/panic.c +++ b/kernel/panic.c @@ -63,6 +63,8 @@ unsigned long panic_on_taint; bool panic_on_taint_nousertaint = false; static unsigned int warn_limit __read_mostly;
+bool panic_triggering_all_cpu_backtrace; + int panic_timeout = CONFIG_PANIC_TIMEOUT; EXPORT_SYMBOL_GPL(panic_timeout);
@@ -252,8 +254,12 @@ void check_panic_on_warn(const char *origin) */ static void panic_other_cpus_shutdown(bool crash_kexec) { - if (panic_print & PANIC_PRINT_ALL_CPU_BT) + if (panic_print & PANIC_PRINT_ALL_CPU_BT) { + /* Temporary allow non-panic CPUs to write their backtraces. */ + panic_triggering_all_cpu_backtrace = true; trigger_all_cpu_backtrace(); + panic_triggering_all_cpu_backtrace = false; + }
/* * Note that smp_send_stop() is the usual SMP shutdown function, diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c index dddb15f48d595..c5d844f727f63 100644 --- a/kernel/printk/printk.c +++ b/kernel/printk/printk.c @@ -2316,7 +2316,7 @@ asmlinkage int vprintk_emit(int facility, int level, * non-panic CPUs are generating any messages, they will be * silently dropped. */ - if (other_cpu_in_panic()) + if (other_cpu_in_panic() && !panic_triggering_all_cpu_backtrace) return 0;
if (level == LOGLEVEL_SCHED) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Samuel Holland samuel.holland@sifive.com
[ Upstream commit f75c235565f90c4a17b125e47f1c68ef6b8c2bce ]
Currently, kasan_init_sw_tags() is called before setup_per_cpu_areas(), so per_cpu(prng_state, cpu) accesses the same address regardless of the value of "cpu", and the same seed value gets copied to the percpu area for every CPU. Fix this by moving the call to smp_prepare_boot_cpu(), which is the first architecture hook after setup_per_cpu_areas().
Fixes: 3c9e3aa11094 ("kasan: add tag related helper functions") Fixes: 3f41b6093823 ("kasan: fix random seed generation for tag-based mode") Signed-off-by: Samuel Holland samuel.holland@sifive.com Reviewed-by: Andrey Konovalov andreyknvl@gmail.com Link: https://lore.kernel.org/r/20240814091005.969756-1-samuel.holland@sifive.com Signed-off-by: Catalin Marinas catalin.marinas@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/setup.c | 3 --- arch/arm64/kernel/smp.c | 2 ++ 2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/arm64/kernel/setup.c b/arch/arm64/kernel/setup.c index a096e2451044d..b22d28ec80284 100644 --- a/arch/arm64/kernel/setup.c +++ b/arch/arm64/kernel/setup.c @@ -355,9 +355,6 @@ void __init __no_sanitize_address setup_arch(char **cmdline_p) smp_init_cpus(); smp_build_mpidr_hash();
- /* Init percpu seeds for random tags after cpus are set up. */ - kasan_init_sw_tags(); - #ifdef CONFIG_ARM64_SW_TTBR0_PAN /* * Make sure init_thread_info.ttbr0 always generates translation diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c index 5de85dccc09cd..05688f6a275f1 100644 --- a/arch/arm64/kernel/smp.c +++ b/arch/arm64/kernel/smp.c @@ -469,6 +469,8 @@ void __init smp_prepare_boot_cpu(void) init_gic_priority_masking();
kasan_init_hw_tags(); + /* Init percpu seeds for random tags after cpus are set up. */ + kasan_init_sw_tags(); }
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Lingfeng lilingfeng3@huawei.com
[ Upstream commit b313a8c835516bdda85025500be866ac8a74e022 ]
Lockdep reported a warning in Linux version 6.6:
[ 414.344659] ================================ [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.345155] WARNING: inconsistent lock state [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted [ 414.346221] -------------------------------- [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes: [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.351204] {IN-SOFTIRQ-W} state was registered at: [ 414.351751] lock_acquire+0x18d/0x460 [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60 [ 414.352769] __wake_up_common_lock+0x22/0x60 [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0 [ 414.353829] sbitmap_queue_clear+0xdd/0x270 [ 414.354338] blk_mq_put_tag+0xdf/0x170 [ 414.354807] __blk_mq_free_request+0x381/0x4d0 [ 414.355335] blk_mq_free_request+0x28b/0x3e0 [ 414.355847] __blk_mq_end_request+0x242/0xc30 [ 414.356367] scsi_end_request+0x2c1/0x830 [ 414.356863] scsi_io_completion+0x177/0x1610 [ 414.357379] scsi_complete+0x12f/0x260 [ 414.357856] blk_complete_reqs+0xba/0xf0 [ 414.358338] __do_softirq+0x1b0/0x7a2 [ 414.358796] irq_exit_rcu+0x14b/0x1a0 [ 414.359262] sysvec_call_function_single+0xaf/0xc0 [ 414.359828] asm_sysvec_call_function_single+0x1a/0x20 [ 414.360426] default_idle+0x1e/0x30 [ 414.360873] default_idle_call+0x9b/0x1f0 [ 414.361390] do_idle+0x2d2/0x3e0 [ 414.361819] cpu_startup_entry+0x55/0x60 [ 414.362314] start_secondary+0x235/0x2b0 [ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b [ 414.363413] irq event stamp: 428794 [ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200 [ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50 [ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2 [ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0 [ 414.367425] other info that might help us debug this: [ 414.368194] Possible unsafe locking scenario: [ 414.368900] CPU0 [ 414.369225] ---- [ 414.369548] lock(&sbq->ws[i].wait); [ 414.370000] <Interrupt> [ 414.370342] lock(&sbq->ws[i].wait); [ 414.370802] *** DEADLOCK *** [ 414.371569] 5 locks held by kworker/u10:3/1152: [ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0 [ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0 [ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00 [ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0 [ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0 [ 414.378607] stack backtrace: [ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6 [ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 414.381177] Workqueue: writeback wb_workfn (flush-253:0) [ 414.381805] Call Trace: [ 414.382136] <TASK> [ 414.382429] dump_stack_lvl+0x91/0xf0 [ 414.382884] mark_lock_irq+0xb3b/0x1260 [ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10 [ 414.383889] ? stack_trace_save+0x8e/0xc0 [ 414.384373] ? __pfx_stack_trace_save+0x10/0x10 [ 414.384903] ? graph_lock+0xcf/0x410 [ 414.385350] ? save_trace+0x3d/0xc70 [ 414.385808] mark_lock.part.20+0x56d/0xa90 [ 414.386317] mark_held_locks+0xb0/0x110 [ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50 [ 414.388422] trace_hardirqs_on+0x58/0x100 [ 414.388917] _raw_spin_unlock_irq+0x28/0x50 [ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0 [ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0 [ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0 [ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10 [ 414.392070] ? sbitmap_get+0x2b8/0x450 [ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0 [ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690 [ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420 [ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10 [ 414.394970] ? lock_acquire+0x18d/0x460 [ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00 [ 414.395986] ? __pfx_lock_acquire+0x10/0x10 [ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190 [ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00 [ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030 [ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10 [ 414.398897] ? writeback_sb_inodes+0x241/0xcc0 [ 414.399429] blk_mq_flush_plug_list+0x65/0x80 [ 414.399957] __blk_flush_plug+0x2f1/0x530 [ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10 [ 414.400999] blk_finish_plug+0x59/0xa0 [ 414.401467] wb_writeback+0x7cc/0x920 [ 414.401935] ? __pfx_wb_writeback+0x10/0x10 [ 414.402442] ? mark_held_locks+0xb0/0x110 [ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10 [ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.404062] wb_workfn+0x2b3/0xcf0 [ 414.404500] ? __pfx_wb_workfn+0x10/0x10 [ 414.404989] process_scheduled_works+0x432/0x13f0 [ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10 [ 414.406139] ? do_raw_spin_lock+0x101/0x2a0 [ 414.406641] ? assign_work+0x19b/0x240 [ 414.407106] ? lock_is_held_type+0x9d/0x110 [ 414.407604] worker_thread+0x6f2/0x1160 [ 414.408075] ? __kthread_parkme+0x62/0x210 [ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0 [ 414.409168] ? __kthread_parkme+0x13c/0x210 [ 414.409678] ? __pfx_worker_thread+0x10/0x10 [ 414.410191] kthread+0x33c/0x440 [ 414.410602] ? __pfx_kthread+0x10/0x10 [ 414.411068] ret_from_fork+0x4d/0x80 [ 414.411526] ? __pfx_kthread+0x10/0x10 [ 414.411993] ret_from_fork_asm+0x1b/0x30 [ 414.412489] </TASK>
When interrupt is turned on while a lock holding by spin_lock_irq it throws a warning because of potential deadlock.
blk_mq_prep_dispatch_rq blk_mq_get_driver_tag __blk_mq_get_driver_tag __blk_mq_alloc_driver_tag blk_mq_tag_busy -> tag is already busy // failed to get driver tag blk_mq_mark_tag_wait spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait) __add_wait_queue(wq, wait) -> wait queue active blk_mq_get_driver_tag __blk_mq_tag_busy -> 1) tag must be idle, which means there can't be inflight IO spin_lock_irq(&tags->lock) -> lock B (hctx->tags) spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally -> 2) context must be preempt by IO interrupt to trigger deadlock.
As shown above, the deadlock is not possible in theory, but the warning still need to be fixed.
Fix it by using spin_lock_irqsave to get lockB instead of spin_lock_irq.
Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'") Signed-off-by: Li Lingfeng lilingfeng3@huawei.com Reviewed-by: Ming Lei ming.lei@redhat.com Reviewed-by: Yu Kuai yukuai3@huawei.com Reviewed-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20240815024736.2040971-1-lilingfeng@huaweicloud.co... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq-tag.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq-tag.c b/block/blk-mq-tag.c index cc57e2dd9a0bb..2cafcf11ee8be 100644 --- a/block/blk-mq-tag.c +++ b/block/blk-mq-tag.c @@ -38,6 +38,7 @@ static void blk_mq_update_wake_batch(struct blk_mq_tags *tags, void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx) { unsigned int users; + unsigned long flags; struct blk_mq_tags *tags = hctx->tags;
/* @@ -56,11 +57,11 @@ void __blk_mq_tag_busy(struct blk_mq_hw_ctx *hctx) return; }
- spin_lock_irq(&tags->lock); + spin_lock_irqsave(&tags->lock, flags); users = tags->active_queues + 1; WRITE_ONCE(tags->active_queues, users); blk_mq_update_wake_batch(tags, users); - spin_unlock_irq(&tags->lock); + spin_unlock_irqrestore(&tags->lock, flags); }
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 76cb763e6ea62e838ccc8f7a1ea4246d690fccc9 upstream.
OLED panels don't support the ABM, they shouldn't offer the panel_power_savings attribute to the user. Check whether aux BL control support was enabled to decide whether to offer it.
Reported-by: Gergo Koteles soyer@irl.hu Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3359 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Tested-by: Gergo Koteles soyer@irl.hu Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 29 ++++++++++++++++++---- 1 file changed, 25 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -6562,12 +6562,34 @@ static const struct attribute_group amdg .attrs = amdgpu_attrs };
+static bool +amdgpu_dm_should_create_sysfs(struct amdgpu_dm_connector *amdgpu_dm_connector) +{ + if (amdgpu_dm_abm_level >= 0) + return false; + + if (amdgpu_dm_connector->base.connector_type != DRM_MODE_CONNECTOR_eDP) + return false; + + /* check for OLED panels */ + if (amdgpu_dm_connector->bl_idx >= 0) { + struct drm_device *drm = amdgpu_dm_connector->base.dev; + struct amdgpu_display_manager *dm = &drm_to_adev(drm)->dm; + struct amdgpu_dm_backlight_caps *caps; + + caps = &dm->backlight_caps[amdgpu_dm_connector->bl_idx]; + if (caps->aux_support) + return false; + } + + return true; +} + static void amdgpu_dm_connector_unregister(struct drm_connector *connector) { struct amdgpu_dm_connector *amdgpu_dm_connector = to_amdgpu_dm_connector(connector);
- if (connector->connector_type == DRM_MODE_CONNECTOR_eDP && - amdgpu_dm_abm_level < 0) + if (amdgpu_dm_should_create_sysfs(amdgpu_dm_connector)) sysfs_remove_group(&connector->kdev->kobj, &amdgpu_group);
drm_dp_aux_unregister(&amdgpu_dm_connector->dm_dp_aux.aux); @@ -6674,8 +6696,7 @@ amdgpu_dm_connector_late_register(struct to_amdgpu_dm_connector(connector); int r;
- if (connector->connector_type == DRM_MODE_CONNECTOR_eDP && - amdgpu_dm_abm_level < 0) { + if (amdgpu_dm_should_create_sysfs(amdgpu_dm_connector)) { r = sysfs_create_group(&connector->kdev->kobj, &amdgpu_group); if (r)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 2c637af8a74d9a2a52ee5456a75dd29c8cb52da5 ]
Some cooling device target state checks in bang_bang_control() done before setting the new target state are not necessary after recent changes, so drop them.
Also avoid updating the target state before checking it for unexpected values.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 84248e35d9b6 ("thermal: gov_bang_bang: Split bang_bang_control()") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/gov_bang_bang.c | 14 +++----------- 1 file changed, 3 insertions(+), 11 deletions(-)
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c index 2a6a651e9d921..b9474c6af72b5 100644 --- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -57,24 +57,16 @@ static void bang_bang_control(struct thermal_zone_device *tz, if (instance->trip != trip) continue;
- if (instance->target == THERMAL_NO_TARGET) - instance->target = 0; - - if (instance->target != 0 && instance->target != 1) { + if (instance->target != 0 && instance->target != 1 && + instance->target != THERMAL_NO_TARGET) pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n", instance->target, instance->name);
- instance->target = 1; - } - /* * Enable the fan when the trip is crossed on the way up and * disable it when the trip is crossed on the way down. */ - if (instance->target == 0 && crossed_up) - instance->target = 1; - else if (instance->target == 1 && !crossed_up) - instance->target = 0; + instance->target = crossed_up;
dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 84248e35d9b60e03df7276627e4e91fbaf80f73d ]
Move the setting of the thermal instance target state from bang_bang_control() into a separate function that will be also called in a different place going forward.
No intentional functional impact.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Peter Kästle peter@piie.net Reviewed-by: Zhang Rui rui.zhang@intel.com Cc: 6.10+ stable@vger.kernel.org # 6.10+ Link: https://patch.msgid.link/3313587.aeNJFYEL58@rjwysocki.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/gov_bang_bang.c | 42 ++++++++++++++++++--------------- 1 file changed, 23 insertions(+), 19 deletions(-)
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c index b9474c6af72b5..87cff3ea77a9d 100644 --- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -13,6 +13,27 @@
#include "thermal_core.h"
+static void bang_bang_set_instance_target(struct thermal_instance *instance, + unsigned int target) +{ + if (instance->target != 0 && instance->target != 1 && + instance->target != THERMAL_NO_TARGET) + pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n", + instance->target, instance->name); + + /* + * Enable the fan when the trip is crossed on the way up and disable it + * when the trip is crossed on the way down. + */ + instance->target = target; + + dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target); + + mutex_lock(&instance->cdev->lock); + __thermal_cdev_update(instance->cdev); + mutex_unlock(&instance->cdev->lock); +} + /** * bang_bang_control - controls devices associated with the given zone * @tz: thermal_zone_device @@ -54,25 +75,8 @@ static void bang_bang_control(struct thermal_zone_device *tz, tz->temperature, trip->hysteresis);
list_for_each_entry(instance, &tz->thermal_instances, tz_node) { - if (instance->trip != trip) - continue; - - if (instance->target != 0 && instance->target != 1 && - instance->target != THERMAL_NO_TARGET) - pr_debug("Unexpected state %ld of thermal instance %s in bang-bang\n", - instance->target, instance->name); - - /* - * Enable the fan when the trip is crossed on the way up and - * disable it when the trip is crossed on the way down. - */ - instance->target = crossed_up; - - dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target); - - mutex_lock(&instance->cdev->lock); - __thermal_cdev_update(instance->cdev); - mutex_unlock(&instance->cdev->lock); + if (instance->trip == trip) + bang_bang_set_instance_target(instance, crossed_up); } }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 5f64b4a1ab1b0412446d42e1fc2964c2cdb60b27 ]
After recent changes, the Bang-bang governor may not adjust the initial configuration of cooling devices to the actual situation.
Namely, if a cooling device bound to a certain trip point starts in the "on" state and the thermal zone temperature is below the threshold of that trip point, the trip point may never be crossed on the way up in which case the state of the cooling device will never be adjusted because the thermal core will never invoke the governor's .trip_crossed() callback. [Note that there is no issue if the zone temperature is at the trip threshold or above it to start with because .trip_crossed() will be invoked then to indicate the start of thermal mitigation for the given trip.]
To address this, add a .manage() callback to the Bang-bang governor and use it to ensure that all of the thermal instances managed by the governor have been initialized properly and the states of all of the cooling devices involved have been adjusted to the current zone temperature as appropriate.
Fixes: 530c932bdf75 ("thermal: gov_bang_bang: Use .trip_crossed() instead of .throttle()") Link: https://lore.kernel.org/linux-pm/1bfbbae5-42b0-4c7d-9544-e98855715294@piie.n... Cc: 6.10+ stable@vger.kernel.org # 6.10+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Peter Kästle peter@piie.net Reviewed-by: Zhang Rui rui.zhang@intel.com Link: https://patch.msgid.link/8419356.T7Z3S40VBb@rjwysocki.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/gov_bang_bang.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c index 87cff3ea77a9d..bc55e0698bfa8 100644 --- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -26,6 +26,7 @@ static void bang_bang_set_instance_target(struct thermal_instance *instance, * when the trip is crossed on the way down. */ instance->target = target; + instance->initialized = true;
dev_dbg(&instance->cdev->device, "target=%ld\n", instance->target);
@@ -80,8 +81,37 @@ static void bang_bang_control(struct thermal_zone_device *tz, } }
+static void bang_bang_manage(struct thermal_zone_device *tz) +{ + const struct thermal_trip_desc *td; + struct thermal_instance *instance; + + for_each_trip_desc(tz, td) { + const struct thermal_trip *trip = &td->trip; + + if (tz->temperature >= td->threshold || + trip->temperature == THERMAL_TEMP_INVALID || + trip->type == THERMAL_TRIP_CRITICAL || + trip->type == THERMAL_TRIP_HOT) + continue; + + /* + * If the initial cooling device state is "on", but the zone + * temperature is not above the trip point, the core will not + * call bang_bang_control() until the zone temperature reaches + * the trip point temperature which may be never. In those + * cases, set the initial state of the cooling device to 0. + */ + list_for_each_entry(instance, &tz->thermal_instances, tz_node) { + if (!instance->initialized && instance->trip == trip) + bang_bang_set_instance_target(instance, 0); + } + } +} + static struct thermal_governor thermal_gov_bang_bang = { .name = "bang_bang", .trip_crossed = bang_bang_control, + .manage = bang_bang_manage, }; THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 6e6f58a170ea98e44075b761f2da42a5aec47dfb ]
After running once, the for_each_trip_desc() loop in bang_bang_manage() is pure needless overhead because it is not going to make any changes unless a new cooling device has been bound to one of the trips in the thermal zone or the system is resuming from sleep.
For this reason, make bang_bang_manage() set governor_data for the thermal zone and check it upfront to decide whether or not it needs to do anything.
However, governor_data needs to be reset in some cases to let bang_bang_manage() know that it should walk the trips again, so add an .update_tz() callback to the governor and make the core additionally invoke it during system resume.
To avoid affecting the other users of that callback unnecessarily, add a special notification reason for system resume, THERMAL_TZ_RESUME, and also pass it to __thermal_zone_device_update() called during system resume for consistency.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Peter Kästle peter@piie.net Reviewed-by: Zhang Rui rui.zhang@intel.com Cc: 6.10+ stable@vger.kernel.org # 6.10+ Link: https://patch.msgid.link/2285575.iZASKD2KPV@rjwysocki.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/gov_bang_bang.c | 18 ++++++++++++++++++ drivers/thermal/thermal_core.c | 3 ++- include/linux/thermal.h | 1 + 3 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c index bc55e0698bfa8..daed67d19efb8 100644 --- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -86,6 +86,10 @@ static void bang_bang_manage(struct thermal_zone_device *tz) const struct thermal_trip_desc *td; struct thermal_instance *instance;
+ /* If the code below has run already, nothing needs to be done. */ + if (tz->governor_data) + return; + for_each_trip_desc(tz, td) { const struct thermal_trip *trip = &td->trip;
@@ -107,11 +111,25 @@ static void bang_bang_manage(struct thermal_zone_device *tz) bang_bang_set_instance_target(instance, 0); } } + + tz->governor_data = (void *)true; +} + +static void bang_bang_update_tz(struct thermal_zone_device *tz, + enum thermal_notify_event reason) +{ + /* + * Let bang_bang_manage() know that it needs to walk trips after binding + * a new cdev and after system resume. + */ + if (reason == THERMAL_TZ_BIND_CDEV || reason == THERMAL_TZ_RESUME) + tz->governor_data = NULL; }
static struct thermal_governor thermal_gov_bang_bang = { .name = "bang_bang", .trip_crossed = bang_bang_control, .manage = bang_bang_manage, + .update_tz = bang_bang_update_tz, }; THERMAL_GOVERNOR_DECLARE(thermal_gov_bang_bang); diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index f2d31bc48f529..b8d889ef4fa5e 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -1715,7 +1715,8 @@ static void thermal_zone_device_resume(struct work_struct *work) tz->suspended = false;
thermal_zone_device_init(tz); - __thermal_zone_device_update(tz, THERMAL_EVENT_UNSPECIFIED); + thermal_governor_update_tz(tz, THERMAL_TZ_RESUME); + __thermal_zone_device_update(tz, THERMAL_TZ_RESUME);
complete(&tz->resume); tz->resuming = false; diff --git a/include/linux/thermal.h b/include/linux/thermal.h index f1155c0439c4b..13f88317b81bf 100644 --- a/include/linux/thermal.h +++ b/include/linux/thermal.h @@ -55,6 +55,7 @@ enum thermal_notify_event { THERMAL_TZ_BIND_CDEV, /* Cooling dev is bind to the thermal zone */ THERMAL_TZ_UNBIND_CDEV, /* Cooling dev is unbind from the thermal zone */ THERMAL_INSTANCE_WEIGHT_CHANGED, /* Thermal instance weight changed */ + THERMAL_TZ_RESUME, /* Thermal zone is resuming after system sleep */ };
/**
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 519be989717c5bffaed1dc14a439e3872cb4bb8d ]
Add a tracepoint to track the credit changes and server in_flight value involved in the lifetime of a R/W request, logging it against the request/subreq debugging ID. This requires the debugging IDs to be recorded in the cifs_credits struct.
The tracepoint can be enabled with:
echo 1 >/sys/kernel/debug/tracing/events/cifs/smb3_rw_credits/enable
Also add a three-state flag to struct cifs_credits to note if we're interested in determining when the in_flight contribution ends and, if so, to track whether we've decremented the contribution yet.
Signed-off-by: David Howells dhowells@redhat.com Reviewed-by: Paulo Alcantara (Red Hat) pc@manguebit.com cc: Jeff Layton jlayton@kernel.org cc: linux-cifs@vger.kernel.org cc: netfs@lists.linux.dev cc: linux-fsdevel@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 74c2ab6d653b ("smb/client: avoid possible NULL dereference in cifs_free_subrequest()") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/cifsglob.h | 17 +++++++----- fs/smb/client/file.c | 32 ++++++++++++++++++++++- fs/smb/client/smb1ops.c | 2 +- fs/smb/client/smb2ops.c | 42 ++++++++++++++++++++++++++---- fs/smb/client/smb2pdu.c | 40 +++++++++++++++++++++++++--- fs/smb/client/trace.h | 55 ++++++++++++++++++++++++++++++++++++++- fs/smb/client/transport.c | 8 +++--- 7 files changed, 173 insertions(+), 23 deletions(-)
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index d4bcc7da700c6..0a271b9fbc622 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -290,7 +290,7 @@ struct smb_version_operations { int (*check_receive)(struct mid_q_entry *, struct TCP_Server_Info *, bool); void (*add_credits)(struct TCP_Server_Info *server, - const struct cifs_credits *credits, + struct cifs_credits *credits, const int optype); void (*set_credits)(struct TCP_Server_Info *, const int); int * (*get_credits_field)(struct TCP_Server_Info *, const int); @@ -550,8 +550,8 @@ struct smb_version_operations { size_t *, struct cifs_credits *); /* adjust previously taken mtu credits to request size */ int (*adjust_credits)(struct TCP_Server_Info *server, - struct cifs_credits *credits, - const unsigned int payload_size); + struct cifs_io_subrequest *subreq, + unsigned int /*enum smb3_rw_credits_trace*/ trace); /* check if we need to issue closedir */ bool (*dir_needs_close)(struct cifsFileInfo *); long (*fallocate)(struct file *, struct cifs_tcon *, int, loff_t, @@ -848,6 +848,9 @@ static inline void cifs_server_unlock(struct TCP_Server_Info *server) struct cifs_credits { unsigned int value; unsigned int instance; + unsigned int in_flight_check; + unsigned int rreq_debug_id; + unsigned int rreq_debug_index; };
static inline unsigned int @@ -873,7 +876,7 @@ has_credits(struct TCP_Server_Info *server, int *credits, int num_credits) }
static inline void -add_credits(struct TCP_Server_Info *server, const struct cifs_credits *credits, +add_credits(struct TCP_Server_Info *server, struct cifs_credits *credits, const int optype) { server->ops->add_credits(server, credits, optype); @@ -897,11 +900,11 @@ set_credits(struct TCP_Server_Info *server, const int val) }
static inline int -adjust_credits(struct TCP_Server_Info *server, struct cifs_credits *credits, - const unsigned int payload_size) +adjust_credits(struct TCP_Server_Info *server, struct cifs_io_subrequest *subreq, + unsigned int /* enum smb3_rw_credits_trace */ trace) { return server->ops->adjust_credits ? - server->ops->adjust_credits(server, credits, payload_size) : 0; + server->ops->adjust_credits(server, subreq, trace) : 0; }
static inline __le64 diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index 9e4f4e67768b9..b413cfef05422 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -80,6 +80,16 @@ static void cifs_prepare_write(struct netfs_io_subrequest *subreq) return netfs_prepare_write_failed(subreq); }
+ wdata->credits.rreq_debug_id = subreq->rreq->debug_id; + wdata->credits.rreq_debug_index = subreq->debug_index; + wdata->credits.in_flight_check = 1; + trace_smb3_rw_credits(wdata->rreq->debug_id, + wdata->subreq.debug_index, + wdata->credits.value, + server->credits, server->in_flight, + wdata->credits.value, + cifs_trace_rw_credits_write_prepare); + #ifdef CONFIG_CIFS_SMB_DIRECT if (server->smbd_conn) subreq->max_nr_segs = server->smbd_conn->max_frmr_depth; @@ -101,7 +111,7 @@ static void cifs_issue_write(struct netfs_io_subrequest *subreq) goto fail; }
- rc = adjust_credits(wdata->server, &wdata->credits, wdata->subreq.len); + rc = adjust_credits(wdata->server, wdata, cifs_trace_rw_credits_issue_write_adjust); if (rc) goto fail;
@@ -163,7 +173,18 @@ static bool cifs_clamp_length(struct netfs_io_subrequest *subreq) return false; }
+ rdata->credits.in_flight_check = 1; + rdata->credits.rreq_debug_id = rreq->debug_id; + rdata->credits.rreq_debug_index = subreq->debug_index; + + trace_smb3_rw_credits(rdata->rreq->debug_id, + rdata->subreq.debug_index, + rdata->credits.value, + server->credits, server->in_flight, 0, + cifs_trace_rw_credits_read_submit); + subreq->len = min_t(size_t, subreq->len, rsize); + #ifdef CONFIG_CIFS_SMB_DIRECT if (server->smbd_conn) subreq->max_nr_segs = server->smbd_conn->max_frmr_depth; @@ -295,6 +316,15 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq) #endif }
+ if (rdata->credits.value != 0) + trace_smb3_rw_credits(rdata->rreq->debug_id, + rdata->subreq.debug_index, + rdata->credits.value, + rdata->server ? rdata->server->credits : 0, + rdata->server ? rdata->server->in_flight : 0, + -rdata->credits.value, + cifs_trace_rw_credits_free_subreq); + add_credits_and_wake_if(rdata->server, &rdata->credits, 0); if (rdata->have_xid) free_xid(rdata->xid); diff --git a/fs/smb/client/smb1ops.c b/fs/smb/client/smb1ops.c index 212ec6f66ec65..e1f2feb56f45f 100644 --- a/fs/smb/client/smb1ops.c +++ b/fs/smb/client/smb1ops.c @@ -108,7 +108,7 @@ cifs_find_mid(struct TCP_Server_Info *server, char *buffer)
static void cifs_add_credits(struct TCP_Server_Info *server, - const struct cifs_credits *credits, const int optype) + struct cifs_credits *credits, const int optype) { spin_lock(&server->req_lock); server->credits += credits->value; diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index c8e536540895a..7fe59235f0901 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -66,7 +66,7 @@ change_conf(struct TCP_Server_Info *server)
static void smb2_add_credits(struct TCP_Server_Info *server, - const struct cifs_credits *credits, const int optype) + struct cifs_credits *credits, const int optype) { int *val, rc = -1; int scredits, in_flight; @@ -94,7 +94,21 @@ smb2_add_credits(struct TCP_Server_Info *server, server->conn_id, server->hostname, *val, add, server->in_flight); } - WARN_ON_ONCE(server->in_flight == 0); + if (credits->in_flight_check > 1) { + pr_warn_once("rreq R=%08x[%x] Credits not in flight\n", + credits->rreq_debug_id, credits->rreq_debug_index); + } else { + credits->in_flight_check = 2; + } + if (WARN_ON_ONCE(server->in_flight == 0)) { + pr_warn_once("rreq R=%08x[%x] Zero in_flight\n", + credits->rreq_debug_id, credits->rreq_debug_index); + trace_smb3_rw_credits(credits->rreq_debug_id, + credits->rreq_debug_index, + credits->value, + server->credits, server->in_flight, 0, + cifs_trace_rw_credits_zero_in_flight); + } server->in_flight--; if (server->in_flight == 0 && ((optype & CIFS_OP_MASK) != CIFS_NEG_OP) && @@ -283,16 +297,23 @@ smb2_wait_mtu_credits(struct TCP_Server_Info *server, size_t size,
static int smb2_adjust_credits(struct TCP_Server_Info *server, - struct cifs_credits *credits, - const unsigned int payload_size) + struct cifs_io_subrequest *subreq, + unsigned int /*enum smb3_rw_credits_trace*/ trace) { - int new_val = DIV_ROUND_UP(payload_size, SMB2_MAX_BUFFER_SIZE); + struct cifs_credits *credits = &subreq->credits; + int new_val = DIV_ROUND_UP(subreq->subreq.len, SMB2_MAX_BUFFER_SIZE); int scredits, in_flight;
if (!credits->value || credits->value == new_val) return 0;
if (credits->value < new_val) { + trace_smb3_rw_credits(subreq->rreq->debug_id, + subreq->subreq.debug_index, + credits->value, + server->credits, server->in_flight, + new_val - credits->value, + cifs_trace_rw_credits_no_adjust_up); trace_smb3_too_many_credits(server->CurrentMid, server->conn_id, server->hostname, 0, credits->value - new_val, 0); cifs_server_dbg(VFS, "request has less credits (%d) than required (%d)", @@ -308,6 +329,12 @@ smb2_adjust_credits(struct TCP_Server_Info *server, in_flight = server->in_flight; spin_unlock(&server->req_lock);
+ trace_smb3_rw_credits(subreq->rreq->debug_id, + subreq->subreq.debug_index, + credits->value, + server->credits, server->in_flight, + new_val - credits->value, + cifs_trace_rw_credits_old_session); trace_smb3_reconnect_detected(server->CurrentMid, server->conn_id, server->hostname, scredits, credits->value - new_val, in_flight); @@ -316,6 +343,11 @@ smb2_adjust_credits(struct TCP_Server_Info *server, return -EAGAIN; }
+ trace_smb3_rw_credits(subreq->rreq->debug_id, + subreq->subreq.debug_index, + credits->value, + server->credits, server->in_flight, + new_val - credits->value, trace); server->credits += credits->value - new_val; scredits = server->credits; in_flight = server->in_flight; diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 896147ba6660e..4cd5c33be2a1a 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -4505,8 +4505,15 @@ smb2_readv_callback(struct mid_q_entry *mid) struct TCP_Server_Info *server = rdata->server; struct smb2_hdr *shdr = (struct smb2_hdr *)rdata->iov[0].iov_base; - struct cifs_credits credits = { .value = 0, .instance = 0 }; + struct cifs_credits credits = { + .value = 0, + .instance = 0, + .rreq_debug_id = rdata->rreq->debug_id, + .rreq_debug_index = rdata->subreq.debug_index, + }; struct smb_rqst rqst = { .rq_iov = &rdata->iov[1], .rq_nvec = 1 }; + unsigned int rreq_debug_id = rdata->rreq->debug_id; + unsigned int subreq_debug_index = rdata->subreq.debug_index;
if (rdata->got_bytes) { rqst.rq_iter = rdata->subreq.io_iter; @@ -4590,10 +4597,16 @@ smb2_readv_callback(struct mid_q_entry *mid) if (rdata->subreq.start < rdata->subreq.rreq->i_size) rdata->result = 0; } + trace_smb3_rw_credits(rreq_debug_id, subreq_debug_index, rdata->credits.value, + server->credits, server->in_flight, + 0, cifs_trace_rw_credits_read_response_clear); rdata->credits.value = 0; INIT_WORK(&rdata->subreq.work, smb2_readv_worker); queue_work(cifsiod_wq, &rdata->subreq.work); release_mid(mid); + trace_smb3_rw_credits(rreq_debug_id, subreq_debug_index, 0, + server->credits, server->in_flight, + credits.value, cifs_trace_rw_credits_read_response_add); add_credits(server, &credits, 0); }
@@ -4650,7 +4663,7 @@ smb2_async_readv(struct cifs_io_subrequest *rdata) min_t(int, server->max_credits - server->credits, credit_request));
- rc = adjust_credits(server, &rdata->credits, rdata->subreq.len); + rc = adjust_credits(server, rdata, cifs_trace_rw_credits_call_readv_adjust); if (rc) goto async_readv_out;
@@ -4769,7 +4782,14 @@ smb2_writev_callback(struct mid_q_entry *mid) struct cifs_tcon *tcon = tlink_tcon(wdata->req->cfile->tlink); struct TCP_Server_Info *server = wdata->server; struct smb2_write_rsp *rsp = (struct smb2_write_rsp *)mid->resp_buf; - struct cifs_credits credits = { .value = 0, .instance = 0 }; + struct cifs_credits credits = { + .value = 0, + .instance = 0, + .rreq_debug_id = wdata->rreq->debug_id, + .rreq_debug_index = wdata->subreq.debug_index, + }; + unsigned int rreq_debug_id = wdata->rreq->debug_id; + unsigned int subreq_debug_index = wdata->subreq.debug_index; ssize_t result = 0; size_t written;
@@ -4840,9 +4860,15 @@ smb2_writev_callback(struct mid_q_entry *mid) tcon->tid, tcon->ses->Suid, wdata->subreq.start, wdata->subreq.len);
+ trace_smb3_rw_credits(rreq_debug_id, subreq_debug_index, wdata->credits.value, + server->credits, server->in_flight, + 0, cifs_trace_rw_credits_write_response_clear); wdata->credits.value = 0; cifs_write_subrequest_terminated(wdata, result ?: written, true); release_mid(mid); + trace_smb3_rw_credits(rreq_debug_id, subreq_debug_index, 0, + server->credits, server->in_flight, + credits.value, cifs_trace_rw_credits_write_response_add); add_credits(server, &credits, 0); }
@@ -4972,7 +4998,7 @@ smb2_async_writev(struct cifs_io_subrequest *wdata) min_t(int, server->max_credits - server->credits, credit_request));
- rc = adjust_credits(server, &wdata->credits, io_parms->length); + rc = adjust_credits(server, wdata, cifs_trace_rw_credits_call_writev_adjust); if (rc) goto async_writev_out;
@@ -4997,6 +5023,12 @@ smb2_async_writev(struct cifs_io_subrequest *wdata) cifs_small_buf_release(req); out: if (rc) { + trace_smb3_rw_credits(wdata->rreq->debug_id, + wdata->subreq.debug_index, + wdata->credits.value, + server->credits, server->in_flight, + -(int)wdata->credits.value, + cifs_trace_rw_credits_write_response_clear); add_credits_and_wake_if(wdata->server, &wdata->credits, 0); cifs_write_subrequest_terminated(wdata, rc, true); } diff --git a/fs/smb/client/trace.h b/fs/smb/client/trace.h index 36d47ce596317..36d5295c2a6f9 100644 --- a/fs/smb/client/trace.h +++ b/fs/smb/client/trace.h @@ -20,6 +20,22 @@ /* * Specify enums for tracing information. */ +#define smb3_rw_credits_traces \ + EM(cifs_trace_rw_credits_call_readv_adjust, "rd-call-adj") \ + EM(cifs_trace_rw_credits_call_writev_adjust, "wr-call-adj") \ + EM(cifs_trace_rw_credits_free_subreq, "free-subreq") \ + EM(cifs_trace_rw_credits_issue_read_adjust, "rd-issu-adj") \ + EM(cifs_trace_rw_credits_issue_write_adjust, "wr-issu-adj") \ + EM(cifs_trace_rw_credits_no_adjust_up, "no-adj-up ") \ + EM(cifs_trace_rw_credits_old_session, "old-session") \ + EM(cifs_trace_rw_credits_read_response_add, "rd-resp-add") \ + EM(cifs_trace_rw_credits_read_response_clear, "rd-resp-clr") \ + EM(cifs_trace_rw_credits_read_submit, "rd-submit ") \ + EM(cifs_trace_rw_credits_write_prepare, "wr-prepare ") \ + EM(cifs_trace_rw_credits_write_response_add, "wr-resp-add") \ + EM(cifs_trace_rw_credits_write_response_clear, "wr-resp-clr") \ + E_(cifs_trace_rw_credits_zero_in_flight, "ZERO-IN-FLT") + #define smb3_tcon_ref_traces \ EM(netfs_trace_tcon_ref_dec_dfs_refer, "DEC DfsRef") \ EM(netfs_trace_tcon_ref_free, "FRE ") \ @@ -59,7 +75,8 @@ #define EM(a, b) a, #define E_(a, b) a
-enum smb3_tcon_ref_trace { smb3_tcon_ref_traces } __mode(byte); +enum smb3_rw_credits_trace { smb3_rw_credits_traces } __mode(byte); +enum smb3_tcon_ref_trace { smb3_tcon_ref_traces } __mode(byte);
#undef EM #undef E_ @@ -71,6 +88,7 @@ enum smb3_tcon_ref_trace { smb3_tcon_ref_traces } __mode(byte); #define EM(a, b) TRACE_DEFINE_ENUM(a); #define E_(a, b) TRACE_DEFINE_ENUM(a);
+smb3_rw_credits_traces; smb3_tcon_ref_traces;
#undef EM @@ -1316,6 +1334,41 @@ TRACE_EVENT(smb3_tcon_ref, __entry->ref) );
+TRACE_EVENT(smb3_rw_credits, + TP_PROTO(unsigned int rreq_debug_id, + unsigned int subreq_debug_index, + unsigned int subreq_credits, + unsigned int server_credits, + int server_in_flight, + int credit_change, + enum smb3_rw_credits_trace trace), + TP_ARGS(rreq_debug_id, subreq_debug_index, subreq_credits, + server_credits, server_in_flight, credit_change, trace), + TP_STRUCT__entry( + __field(unsigned int, rreq_debug_id) + __field(unsigned int, subreq_debug_index) + __field(unsigned int, subreq_credits) + __field(unsigned int, server_credits) + __field(int, in_flight) + __field(int, credit_change) + __field(enum smb3_rw_credits_trace, trace) + ), + TP_fast_assign( + __entry->rreq_debug_id = rreq_debug_id; + __entry->subreq_debug_index = subreq_debug_index; + __entry->subreq_credits = subreq_credits; + __entry->server_credits = server_credits; + __entry->in_flight = server_in_flight; + __entry->credit_change = credit_change; + __entry->trace = trace; + ), + TP_printk("R=%08x[%x] %s cred=%u chg=%d pool=%u ifl=%d", + __entry->rreq_debug_id, __entry->subreq_debug_index, + __print_symbolic(__entry->trace, smb3_rw_credits_traces), + __entry->subreq_credits, __entry->credit_change, + __entry->server_credits, __entry->in_flight) + ); +
#undef EM #undef E_ diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index 012b9bd069952..adfe0d0587010 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -988,10 +988,10 @@ static void cifs_compound_callback(struct mid_q_entry *mid) { struct TCP_Server_Info *server = mid->server; - struct cifs_credits credits; - - credits.value = server->ops->get_credits(mid); - credits.instance = server->reconnect_instance; + struct cifs_credits credits = { + .value = server->ops->get_credits(mid), + .instance = server->reconnect_instance, + };
add_credits(server, &credits, mid->optype);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Su Hui suhui@nfschina.com
[ Upstream commit 74c2ab6d653b4c2354df65a7f7f2df1925a40a51 ]
Clang static checker (scan-build) warning: cifsglob.h:line 890, column 3 Access to field 'ops' results in a dereference of a null pointer.
Commit 519be989717c ("cifs: Add a tracepoint to track credits involved in R/W requests") adds a check for 'rdata->server', and let clang throw this warning about NULL dereference.
When 'rdata->credits.value != 0 && rdata->server == NULL' happens, add_credits_and_wake_if() will call rdata->server->ops->add_credits(). This will cause NULL dereference problem. Add a check for 'rdata->server' to avoid NULL dereference.
Cc: stable@vger.kernel.org Fixes: 69c3c023af25 ("cifs: Implement netfslib hooks") Reviewed-by: David Howells dhowells@redhat.com Signed-off-by: Su Hui suhui@nfschina.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/file.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/smb/client/file.c b/fs/smb/client/file.c index b413cfef05422..1fc66bcf49eb4 100644 --- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -316,7 +316,7 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq) #endif }
- if (rdata->credits.value != 0) + if (rdata->credits.value != 0) { trace_smb3_rw_credits(rdata->rreq->debug_id, rdata->subreq.debug_index, rdata->credits.value, @@ -324,8 +324,12 @@ static void cifs_free_subrequest(struct netfs_io_subrequest *subreq) rdata->server ? rdata->server->in_flight : 0, -rdata->credits.value, cifs_trace_rw_credits_free_subreq); + if (rdata->server) + add_credits_and_wake_if(rdata->server, &rdata->credits, 0); + else + rdata->credits.value = 0; + }
- add_credits_and_wake_if(rdata->server, &rdata->credits, 0); if (rdata->have_xid) free_xid(rdata->xid); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mikulas Patocka mpatocka@redhat.com
[ Upstream commit 1e1fd567d32fcf7544c6e09e0e5bc6c650da6e23 ]
This commit changes device mapper, so that it returns -ERESTARTSYS instead of -EINTR when it is interrupted by a signal (so that the ioctl can be restarted).
The manpage signal(7) says that the ioctl function should be restarted if the signal was handled with SA_RESTART.
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/dm.c b/drivers/md/dm.c index 13037d6a6f62a..6e15ac4e0845c 100644 --- a/drivers/md/dm.c +++ b/drivers/md/dm.c @@ -2594,7 +2594,7 @@ static int dm_wait_for_bios_completion(struct mapped_device *md, unsigned int ta break;
if (signal_pending_state(task_state, current)) { - r = -EINTR; + r = -ERESTARTSYS; break; }
@@ -2619,7 +2619,7 @@ static int dm_wait_for_completion(struct mapped_device *md, unsigned int task_st break;
if (signal_pending_state(task_state, current)) { - r = -EINTR; + r = -ERESTARTSYS; break; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baochen Qiang quic_bqiang@quicinc.com
[ Upstream commit 38055789d15155109b41602ad719d770af507030 ]
In transmit path, it is likely that the iova is not aligned to PCIe TLP max payload size, which is 128 for WCN7850. Normally in such cases hardware is expected to split the packet into several parts in a manner such that they, other than the first one, have aligned iova. However due to hardware limitations, WCN7850 does not behave like that properly with some specific unaligned iova in transmit path. This easily results in target hang in a KPI transmit test: packet send/receive failure, WMI command send timeout etc. Also fatal error seen in PCIe level:
... Capabilities: ... ... DevSta: ... FatalErr+ ... ... ...
Work around this by manually moving/reallocating payload buffer such that we can map it to a 128 bytes aligned iova. The moving requires sufficient head room or tail room in skb: for the former we can do ourselves a favor by asking some extra bytes when registering with mac80211, while for the latter we can do nothing.
Moving/reallocating buffer consumes additional CPU cycles, but the good news is that an aligned iova increases PCIe efficiency. In my tests on some X86 platforms the KPI results are almost consistent.
Since this is seen only with WCN7850, add a new hardware parameter to differentiate from others.
Tested-on: WCN7850 hw2.0 PCI WLAN.HMT.1.0.c5-00481-QCAHMTSWPL_V1.0_V2.0_SILICONZ-3
Signed-off-by: Baochen Qiang quic_bqiang@quicinc.com Cc: stable@vger.kernel.org Tested-by: Mark Pearson mpearson-lenovo@squebb.ca Signed-off-by: Kalle Valo quic_kvalo@quicinc.com Link: https://patch.msgid.link/20240715023814.20242-1-quic_bqiang@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/wireless/ath/ath12k/dp_tx.c | 72 +++++++++++++++++++++++++ drivers/net/wireless/ath/ath12k/hw.c | 6 +++ drivers/net/wireless/ath/ath12k/hw.h | 4 ++ drivers/net/wireless/ath/ath12k/mac.c | 1 + 4 files changed, 83 insertions(+)
diff --git a/drivers/net/wireless/ath/ath12k/dp_tx.c b/drivers/net/wireless/ath/ath12k/dp_tx.c index a7c7a868c14ce..fca9f7e510b41 100644 --- a/drivers/net/wireless/ath/ath12k/dp_tx.c +++ b/drivers/net/wireless/ath/ath12k/dp_tx.c @@ -124,6 +124,60 @@ static void ath12k_hal_tx_cmd_ext_desc_setup(struct ath12k_base *ab, HAL_TX_MSDU_EXT_INFO1_ENCRYPT_TYPE); }
+static void ath12k_dp_tx_move_payload(struct sk_buff *skb, + unsigned long delta, + bool head) +{ + unsigned long len = skb->len; + + if (head) { + skb_push(skb, delta); + memmove(skb->data, skb->data + delta, len); + skb_trim(skb, len); + } else { + skb_put(skb, delta); + memmove(skb->data + delta, skb->data, len); + skb_pull(skb, delta); + } +} + +static int ath12k_dp_tx_align_payload(struct ath12k_base *ab, + struct sk_buff **pskb) +{ + u32 iova_mask = ab->hw_params->iova_mask; + unsigned long offset, delta1, delta2; + struct sk_buff *skb2, *skb = *pskb; + unsigned int headroom = skb_headroom(skb); + int tailroom = skb_tailroom(skb); + int ret = 0; + + offset = (unsigned long)skb->data & iova_mask; + delta1 = offset; + delta2 = iova_mask - offset + 1; + + if (headroom >= delta1) { + ath12k_dp_tx_move_payload(skb, delta1, true); + } else if (tailroom >= delta2) { + ath12k_dp_tx_move_payload(skb, delta2, false); + } else { + skb2 = skb_realloc_headroom(skb, iova_mask); + if (!skb2) { + ret = -ENOMEM; + goto out; + } + + dev_kfree_skb_any(skb); + + offset = (unsigned long)skb2->data & iova_mask; + if (offset) + ath12k_dp_tx_move_payload(skb2, offset, true); + *pskb = skb2; + } + +out: + return ret; +} + int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif, struct sk_buff *skb) { @@ -145,6 +199,7 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif, u8 ring_selector, ring_map = 0; bool tcl_ring_retry; bool msdu_ext_desc = false; + u32 iova_mask = ab->hw_params->iova_mask;
if (test_bit(ATH12K_FLAG_CRASH_FLUSH, &ar->ab->dev_flags)) return -ESHUTDOWN; @@ -240,6 +295,23 @@ int ath12k_dp_tx(struct ath12k *ar, struct ath12k_vif *arvif, goto fail_remove_tx_buf; }
+ if (iova_mask && + (unsigned long)skb->data & iova_mask) { + ret = ath12k_dp_tx_align_payload(ab, &skb); + if (ret) { + ath12k_warn(ab, "failed to align TX buffer %d\n", ret); + /* don't bail out, give original buffer + * a chance even unaligned. + */ + goto map; + } + + /* hdr is pointing to a wrong place after alignment, + * so refresh it for later use. + */ + hdr = (void *)skb->data; + } +map: ti.paddr = dma_map_single(ab->dev, skb->data, skb->len, DMA_TO_DEVICE); if (dma_mapping_error(ab->dev, ti.paddr)) { atomic_inc(&ab->soc_stats.tx_err.misc_fail); diff --git a/drivers/net/wireless/ath/ath12k/hw.c b/drivers/net/wireless/ath/ath12k/hw.c index bff8cf97a18c6..2a92147d15fa1 100644 --- a/drivers/net/wireless/ath/ath12k/hw.c +++ b/drivers/net/wireless/ath/ath12k/hw.c @@ -922,6 +922,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = { .supports_sta_ps = false,
.acpi_guid = NULL, + + .iova_mask = 0, }, { .name = "wcn7850 hw2.0", @@ -997,6 +999,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = { .supports_sta_ps = true,
.acpi_guid = &wcn7850_uuid, + + .iova_mask = ATH12K_PCIE_MAX_PAYLOAD_SIZE - 1, }, { .name = "qcn9274 hw2.0", @@ -1067,6 +1071,8 @@ static const struct ath12k_hw_params ath12k_hw_params[] = { .supports_sta_ps = false,
.acpi_guid = NULL, + + .iova_mask = 0, }, };
diff --git a/drivers/net/wireless/ath/ath12k/hw.h b/drivers/net/wireless/ath/ath12k/hw.h index 2a314cfc8cb84..400bda17e02f6 100644 --- a/drivers/net/wireless/ath/ath12k/hw.h +++ b/drivers/net/wireless/ath/ath12k/hw.h @@ -96,6 +96,8 @@ #define ATH12K_M3_FILE "m3.bin" #define ATH12K_REGDB_FILE_NAME "regdb.bin"
+#define ATH12K_PCIE_MAX_PAYLOAD_SIZE 128 + enum ath12k_hw_rate_cck { ATH12K_HW_RATE_CCK_LP_11M = 0, ATH12K_HW_RATE_CCK_LP_5_5M, @@ -214,6 +216,8 @@ struct ath12k_hw_params { bool supports_sta_ps;
const guid_t *acpi_guid; + + u32 iova_mask; };
struct ath12k_hw_ops { diff --git a/drivers/net/wireless/ath/ath12k/mac.c b/drivers/net/wireless/ath/ath12k/mac.c index ead37a4e002a2..8474e25d2ac64 100644 --- a/drivers/net/wireless/ath/ath12k/mac.c +++ b/drivers/net/wireless/ath/ath12k/mac.c @@ -8737,6 +8737,7 @@ static int ath12k_mac_hw_register(struct ath12k_hw *ah)
hw->vif_data_size = sizeof(struct ath12k_vif); hw->sta_data_size = sizeof(struct ath12k_sta); + hw->extra_tx_headroom = ab->hw_params->iova_mask;
wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_CQM_RSSI_LIST); wiphy_ext_feature_set(wiphy, NL80211_EXT_FEATURE_STA_TX_PWR);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maximilian Luz luzmaximilian@gmail.com
[ Upstream commit bc923d594db21bee0ead128eb4bb78f7e77467a4 ]
There is a small window in ssam_serial_hub_probe() where the controller is initialized but has not been started yet. Specifically, between ssam_controller_init() and ssam_controller_start(). Any failure in this window, for example caused by a failure of serdev_device_open(), currently results in an incorrect warning being emitted.
In particular, any failure in this window results in the controller being destroyed via ssam_controller_destroy(). This function checks the state of the controller and, in an attempt to validate that the controller has been cleanly shut down before we try and deallocate any resources, emits a warning if that state is not SSAM_CONTROLLER_STOPPED.
However, since we have only just initialized the controller and have not yet started it, its state is SSAM_CONTROLLER_INITIALIZED. Note that this is the only point at which the controller has this state, as it will change after we start the controller with ssam_controller_start() and never revert back. Further, at this point no communication has taken place and the sender and receiver threads have not been started yet (and we may not even have an open serdev device either).
Therefore, it is perfectly safe to call ssam_controller_destroy() with a state of SSAM_CONTROLLER_INITIALIZED. This, however, means that the warning currently being emitted is incorrect. Fix it by extending the check.
Fixes: c167b9c7e3d6 ("platform/surface: Add Surface Aggregator subsystem") Signed-off-by: Maximilian Luz luzmaximilian@gmail.com Link: https://lore.kernel.org/r/20240811124645.246016-1-luzmaximilian@gmail.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/surface/aggregator/controller.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/platform/surface/aggregator/controller.c b/drivers/platform/surface/aggregator/controller.c index 7fc602e01487d..7e89f547999b2 100644 --- a/drivers/platform/surface/aggregator/controller.c +++ b/drivers/platform/surface/aggregator/controller.c @@ -1354,7 +1354,8 @@ void ssam_controller_destroy(struct ssam_controller *ctrl) if (ctrl->state == SSAM_CONTROLLER_UNINITIALIZED) return;
- WARN_ON(ctrl->state != SSAM_CONTROLLER_STOPPED); + WARN_ON(ctrl->state != SSAM_CONTROLLER_STOPPED && + ctrl->state != SSAM_CONTROLLER_INITIALIZED);
/* * Note: New events could still have been received after the previous
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Takashi Iwai tiwai@suse.de
[ Upstream commit 829e2a23121fb36ee30ea5145c2a85199f68e2c8 ]
The data conversion is done rather by a wrong function. We convert to BE32, not from BE32. Although the end result must be same, this was complained by the compiler.
Fix the code again and align with another similar function tas2563_apply_calib() that does already right.
Fixes: 3beddef84d90 ("ALSA: hda/tas2781: fix wrong calibrated data order") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202408141630.DiDUB8Z4-lkp@intel.com/ Link: https://patch.msgid.link/20240814100500.1944-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/tas2781_hda_i2c.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/sound/pci/hda/tas2781_hda_i2c.c b/sound/pci/hda/tas2781_hda_i2c.c index 1d4b044b78410..9e88d39eac1e2 100644 --- a/sound/pci/hda/tas2781_hda_i2c.c +++ b/sound/pci/hda/tas2781_hda_i2c.c @@ -527,8 +527,8 @@ static void tas2781_apply_calib(struct tasdevice_priv *tas_priv)
for (i = 0; i < tas_priv->ndev; i++) { for (j = 0; j < CALIB_MAX; j++) { - data = get_unaligned_be32( - &tas_priv->cali_data.data[offset]); + data = cpu_to_be32( + *(uint32_t *)&tas_priv->cali_data.data[offset]); rc = tasdevice_dev_bulk_write(tas_priv, i, TASDEVICE_REG(0, page_array[j], rgno_array[j]), (unsigned char *)&data, 4);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandre Courbot gnurou@gmail.com
[ Upstream commit 6fc9aacad49e3fbecd270c266850d50c453d52ef ]
When trying to build compile_commands.json for an external module against the kernel built in a separate output directory, the following error is displayed:
make[1]: *** No rule to make target 'scripts/clang-tools/gen_compile_commands.py', needed by 'compile_commands.json'. Stop.
This is because gen_compile_commands.py was previously looked up using a relative path to $(srctree), but commit b1992c3772e6 ("kbuild: use $(src) instead of $(srctree)/$(src) for source directory") stopped defining VPATH for external module builds.
Prefixing gen_compile_commands.py with $(srctree) fixes the problem.
Fixes: b1992c3772e6 ("kbuild: use $(src) instead of $(srctree)/$(src) for source directory") Signed-off-by: Alexandre Courbot gnurou@gmail.com Reviewed-by: Nicolas Schier nicolas@fjasle.eu Signed-off-by: Masahiro Yamada masahiroy@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- Makefile | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Makefile b/Makefile index 361a70264e1fb..194841a4efde9 100644 --- a/Makefile +++ b/Makefile @@ -1986,7 +1986,7 @@ nsdeps: modules quiet_cmd_gen_compile_commands = GEN $@ cmd_gen_compile_commands = $(PYTHON3) $< -a $(AR) -o $@ $(filter-out $<, $(real-prereqs))
-$(extmod_prefix)compile_commands.json: scripts/clang-tools/gen_compile_commands.py \ +$(extmod_prefix)compile_commands.json: $(srctree)/scripts/clang-tools/gen_compile_commands.py \ $(if $(KBUILD_EXTMOD),, vmlinux.a $(KBUILD_VMLINUX_LIBS)) \ $(if $(CONFIG_MODULES), $(MODORDER)) FORCE $(call if_changed,gen_compile_commands)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit b1a9a5e04767e2a78783e19c9e55c25812ceccc3 ]
CONFIG_DEBUG_INFO_BTF=y requires one additional link step. (.tmp_vmlinux.btf)
CONFIG_KALLSYMS=y requires two additional link steps. (.tmp_vmlinux.kallsyms1 and .tmp_vmlinux.kallsyms2)
Enabling both requires three additional link steps.
When CONFIG_DEBUG_INFO_BTF=y and CONFIG_KALLSYMS=y, the current build process is as follows:
KSYMS .tmp_vmlinux.kallsyms0.S AS .tmp_vmlinux.kallsyms0.o LD .tmp_vmlinux.btf # temporary vmlinux for BTF BTF .btf.vmlinux.bin.o LD .tmp_vmlinux.kallsyms1 # temporary vmlinux for kallsyms step 1 NM .tmp_vmlinux.kallsyms1.syms KSYMS .tmp_vmlinux.kallsyms1.S AS .tmp_vmlinux.kallsyms1.o LD .tmp_vmlinux.kallsyms2 # temporary vmlinux for kallsyms step 2 NM .tmp_vmlinux.kallsyms2.syms KSYMS .tmp_vmlinux.kallsyms2.S AS .tmp_vmlinux.kallsyms2.o LD vmlinux # final vmlinux
This is redundant because the BTF generation and the kallsyms step 1 can be performed against the same temporary vmlinux.
When both CONFIG_DEBUG_INFO_BTF and CONFIG_KALLSYMS are enabled, we can reduce the number of link steps by one.
This commit changes the build process as follows:
KSYMS .tmp_vmlinux0.kallsyms.S AS .tmp_vmlinux0.kallsyms.o LD .tmp_vmlinux1 # temporary vmlinux for BTF and kallsyms step 1 BTF .tmp_vmlinux1.btf.o NM .tmp_vmlinux1.syms KSYMS .tmp_vmlinux1.kallsyms.S AS .tmp_vmlinux1.kallsyms.o LD .tmp_vmlinux2 # temporary vmlinux for kallsyms step 2 NM .tmp_vmlinux2.syms KSYMS .tmp_vmlinux2.kallsyms.S AS .tmp_vmlinux2.kallsyms.o LD vmlinux # final vmlinux
Signed-off-by: Masahiro Yamada masahiroy@kernel.org Acked-by: Andrii Nakryiko andrii@kernel.org Stable-dep-of: 1472464c6248 ("kbuild: avoid scripts/kallsyms parsing /dev/null") Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/link-vmlinux.sh | 41 ++++++++++++++++++++++++----------------- 1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 1e41b330550e6..22d0bc8439863 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -105,11 +105,10 @@ vmlinux_link()
# generate .BTF typeinfo from DWARF debuginfo # ${1} - vmlinux image -# ${2} - file to dump raw BTF data into gen_btf() { local pahole_ver - local btf_data=${2} + local btf_data=${1}.btf.o
if ! [ -x "$(command -v ${PAHOLE})" ]; then echo >&2 "BTF: ${1}: pahole (${PAHOLE}) is not available" @@ -122,8 +121,6 @@ gen_btf() return 1 fi
- vmlinux_link ${1} - info BTF "${btf_data}" LLVM_OBJCOPY="${OBJCOPY}" ${PAHOLE} -J ${PAHOLE_FLAGS} ${1}
@@ -169,15 +166,13 @@ kallsyms() kallsymso=${2}.o }
-# Perform one step in kallsyms generation, including temporary linking of -# vmlinux. -kallsyms_step() +# Perform kallsyms for the given temporary vmlinux. +sysmap_and_kallsyms() { - kallsyms_vmlinux=.tmp_vmlinux.kallsyms${1} + mksysmap "${1}" "${1}.syms" + kallsyms "${1}.syms" "${1}.kallsyms"
- vmlinux_link "${kallsyms_vmlinux}" - mksysmap "${kallsyms_vmlinux}" "${kallsyms_vmlinux}.syms" - kallsyms "${kallsyms_vmlinux}.syms" "${kallsyms_vmlinux}" + kallsyms_sysmap=${1}.syms }
# Create map file with all symbols from ${1} @@ -220,11 +215,21 @@ kallsymso= strip_debug=
if is_enabled CONFIG_KALLSYMS; then - kallsyms /dev/null .tmp_vmlinux.kallsyms0 + kallsyms /dev/null .tmp_vmlinux0.kallsyms +fi + +if is_enabled CONFIG_KALLSYMS || is_enabled CONFIG_DEBUG_INFO_BTF; then + + # The kallsyms linking does not need debug symbols, but the BTF does. + if ! is_enabled CONFIG_DEBUG_INFO_BTF; then + strip_debug=1 + fi + + vmlinux_link .tmp_vmlinux1 fi
if is_enabled CONFIG_DEBUG_INFO_BTF; then - if ! gen_btf .tmp_vmlinux.btf .btf.vmlinux.bin.o ; then + if ! gen_btf .tmp_vmlinux1; then echo >&2 "Failed to generate BTF for vmlinux" echo >&2 "Try to disable CONFIG_DEBUG_INFO_BTF" exit 1 @@ -260,14 +265,16 @@ if is_enabled CONFIG_KALLSYMS; then # The kallsyms linking does not need debug symbols included. strip_debug=1
- kallsyms_step 1 + sysmap_and_kallsyms .tmp_vmlinux1 size1=$(${CONFIG_SHELL} "${srctree}/scripts/file-size.sh" ${kallsymso})
- kallsyms_step 2 + vmlinux_link .tmp_vmlinux2 + sysmap_and_kallsyms .tmp_vmlinux2 size2=$(${CONFIG_SHELL} "${srctree}/scripts/file-size.sh" ${kallsymso})
if [ $size1 -ne $size2 ] || [ -n "${KALLSYMS_EXTRA_PASS}" ]; then - kallsyms_step 3 + vmlinux_link .tmp_vmlinux3 + sysmap_and_kallsyms .tmp_vmlinux3 fi fi
@@ -293,7 +300,7 @@ fi
# step a (see comment above) if is_enabled CONFIG_KALLSYMS; then - if ! cmp -s System.map ${kallsyms_vmlinux}.syms; then + if ! cmp -s System.map "${kallsyms_sysmap}"; then echo >&2 Inconsistent kallsyms data echo >&2 'Try "make KALLSYMS_EXTRA_PASS=1" as a workaround' exit 1
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Masahiro Yamada masahiroy@kernel.org
[ Upstream commit 1472464c6248575bf2d01c7f076b94704bb32c95 ]
On macOS, as reported by Daniel Gomez, getline() sets ENOTTY to errno if it is requested to read from /dev/null.
If this is worth fixing, I would rather pass an empty file to scripts/kallsyms instead of adding the ugly #ifdef __APPLE__.
Fixes: c442db3f49f2 ("kbuild: remove PROVIDE() for kallsyms symbols") Reported-by: Daniel Gomez da.gomez@samsung.com Closes: https://lore.kernel.org/all/20240807-macos-build-support-v1-12-4cd1ded85694@... Signed-off-by: Masahiro Yamada masahiroy@kernel.org Reviewed-by: Nicolas Schier nicolas@fjasle.eu Reviewed-by: Daniel Gomez da.gomez@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/link-vmlinux.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/scripts/link-vmlinux.sh b/scripts/link-vmlinux.sh index 22d0bc8439863..070a319140e89 100755 --- a/scripts/link-vmlinux.sh +++ b/scripts/link-vmlinux.sh @@ -215,7 +215,8 @@ kallsymso= strip_debug=
if is_enabled CONFIG_KALLSYMS; then - kallsyms /dev/null .tmp_vmlinux0.kallsyms + truncate -s0 .tmp_vmlinux.kallsyms0.syms + kallsyms .tmp_vmlinux.kallsyms0.syms .tmp_vmlinux0.kallsyms fi
if is_enabled CONFIG_KALLSYMS || is_enabled CONFIG_DEBUG_INFO_BTF; then
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit aae6b81260fd9a7224f7eb4fc440d625852245bb ]
This inverts the LE State quirk so by default we assume the controllers would report valid states rather than invalid which is how quirks normally behave, also this would result in HCI command failing it the LE States are really broken thus exposing the controllers that are really broken in this respect.
Link: https://github.com/bluez/bluez/issues/584 Fixes: 220915857e29 ("Bluetooth: Adding driver and quirk defs for multi-role LE") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/bluetooth/btintel.c | 10 ---------- drivers/bluetooth/btintel_pcie.c | 3 --- drivers/bluetooth/btmtksdio.c | 3 --- drivers/bluetooth/btrtl.c | 1 - drivers/bluetooth/btusb.c | 4 ++-- drivers/bluetooth/hci_qca.c | 4 ++-- drivers/bluetooth/hci_vhci.c | 2 -- include/net/bluetooth/hci.h | 17 ++++++++++------- include/net/bluetooth/hci_core.h | 2 +- net/bluetooth/hci_event.c | 2 +- 10 files changed, 16 insertions(+), 32 deletions(-)
diff --git a/drivers/bluetooth/btintel.c b/drivers/bluetooth/btintel.c index 93900c37349c1..c084dc88d3d91 100644 --- a/drivers/bluetooth/btintel.c +++ b/drivers/bluetooth/btintel.c @@ -2876,9 +2876,6 @@ static int btintel_setup_combined(struct hci_dev *hdev) INTEL_ROM_LEGACY_NO_WBS_SUPPORT)) set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks); - if (ver.hw_variant == 0x08 && ver.fw_variant == 0x22) - set_bit(HCI_QUIRK_VALID_LE_STATES, - &hdev->quirks);
err = btintel_legacy_rom_setup(hdev, &ver); break; @@ -2887,7 +2884,6 @@ static int btintel_setup_combined(struct hci_dev *hdev) case 0x12: /* ThP */ case 0x13: /* HrP */ case 0x14: /* CcP */ - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); fallthrough; case 0x0c: /* WsP */ /* Apply the device specific HCI quirks @@ -2979,9 +2975,6 @@ static int btintel_setup_combined(struct hci_dev *hdev) /* These variants don't seem to support LE Coded PHY */ set_bit(HCI_QUIRK_BROKEN_LE_CODED, &hdev->quirks);
- /* Set Valid LE States quirk */ - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); - /* Setup MSFT Extension support */ btintel_set_msft_opcode(hdev, ver.hw_variant);
@@ -3003,9 +2996,6 @@ static int btintel_setup_combined(struct hci_dev *hdev) */ set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
- /* Apply LE States quirk from solar onwards */ - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); - /* Setup MSFT Extension support */ btintel_set_msft_opcode(hdev, INTEL_HW_VARIANT(ver_tlv.cnvi_bt)); diff --git a/drivers/bluetooth/btintel_pcie.c b/drivers/bluetooth/btintel_pcie.c index b8120b98a2395..1fd3b7073ab90 100644 --- a/drivers/bluetooth/btintel_pcie.c +++ b/drivers/bluetooth/btintel_pcie.c @@ -1182,9 +1182,6 @@ static int btintel_pcie_setup(struct hci_dev *hdev) */ set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
- /* Apply LE States quirk from solar onwards */ - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); - /* Setup MSFT Extension support */ btintel_set_msft_opcode(hdev, INTEL_HW_VARIANT(ver_tlv.cnvi_bt)); diff --git a/drivers/bluetooth/btmtksdio.c b/drivers/bluetooth/btmtksdio.c index 8ded9ef8089a2..bc4700ed3b782 100644 --- a/drivers/bluetooth/btmtksdio.c +++ b/drivers/bluetooth/btmtksdio.c @@ -1144,9 +1144,6 @@ static int btmtksdio_setup(struct hci_dev *hdev) } }
- /* Valid LE States quirk for MediaTek 7921 */ - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); - break; case 0x7663: case 0x7668: diff --git a/drivers/bluetooth/btrtl.c b/drivers/bluetooth/btrtl.c index 4f1e37b4f7802..bfcb41a57655f 100644 --- a/drivers/bluetooth/btrtl.c +++ b/drivers/bluetooth/btrtl.c @@ -1287,7 +1287,6 @@ void btrtl_set_quirks(struct hci_dev *hdev, struct btrtl_device_info *btrtl_dev) case CHIP_ID_8852C: case CHIP_ID_8851B: case CHIP_ID_8852BT: - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
/* RTL8852C needs to transmit mSBC data continuously without diff --git a/drivers/bluetooth/btusb.c b/drivers/bluetooth/btusb.c index 789c492df6fa2..0927f51867c26 100644 --- a/drivers/bluetooth/btusb.c +++ b/drivers/bluetooth/btusb.c @@ -4545,8 +4545,8 @@ static int btusb_probe(struct usb_interface *intf, if (id->driver_info & BTUSB_WIDEBAND_SPEECH) set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
- if (id->driver_info & BTUSB_VALID_LE_STATES) - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); + if (!(id->driver_info & BTUSB_VALID_LE_STATES)) + set_bit(HCI_QUIRK_BROKEN_LE_STATES, &hdev->quirks);
if (id->driver_info & BTUSB_DIGIANSWER) { data->cmdreq_type = USB_TYPE_VENDOR; diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 9a0bc86f9aace..34c36f0f781ea 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -2408,8 +2408,8 @@ static int qca_serdev_probe(struct serdev_device *serdev) set_bit(HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED, &hdev->quirks);
- if (data->capabilities & QCA_CAP_VALID_LE_STATES) - set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); + if (!(data->capabilities & QCA_CAP_VALID_LE_STATES)) + set_bit(HCI_QUIRK_BROKEN_LE_STATES, &hdev->quirks); }
return 0; diff --git a/drivers/bluetooth/hci_vhci.c b/drivers/bluetooth/hci_vhci.c index 28750a40f0ed5..b652d68f0ee14 100644 --- a/drivers/bluetooth/hci_vhci.c +++ b/drivers/bluetooth/hci_vhci.c @@ -425,8 +425,6 @@ static int __vhci_create_device(struct vhci_data *data, __u8 opcode) if (opcode & 0x80) set_bit(HCI_QUIRK_RAW_DEVICE, &hdev->quirks);
- set_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks); - if (hci_register_dev(hdev) < 0) { BT_ERR("Can't register HCI device"); hci_free_dev(hdev); diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h index e372a88e8c3f6..d1d073089f384 100644 --- a/include/net/bluetooth/hci.h +++ b/include/net/bluetooth/hci.h @@ -206,14 +206,17 @@ enum { */ HCI_QUIRK_WIDEBAND_SPEECH_SUPPORTED,
- /* When this quirk is set, the controller has validated that - * LE states reported through the HCI_LE_READ_SUPPORTED_STATES are - * valid. This mechanism is necessary as many controllers have - * been seen has having trouble initiating a connectable - * advertisement despite the state combination being reported as - * supported. + /* When this quirk is set, the LE states reported through the + * HCI_LE_READ_SUPPORTED_STATES are invalid/broken. + * + * This mechanism is necessary as many controllers have been seen has + * having trouble initiating a connectable advertisement despite the + * state combination being reported as supported. + * + * This quirk can be set before hci_register_dev is called or + * during the hdev->setup vendor callback. */ - HCI_QUIRK_VALID_LE_STATES, + HCI_QUIRK_BROKEN_LE_STATES,
/* When this quirk is set, then erroneous data reporting * is ignored. This is mainly due to the fact that the HCI diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h index b15f51ae3bfd9..c97ff64c9189f 100644 --- a/include/net/bluetooth/hci_core.h +++ b/include/net/bluetooth/hci_core.h @@ -826,7 +826,7 @@ extern struct mutex hci_cb_list_lock; } while (0)
#define hci_dev_le_state_simultaneous(hdev) \ - (test_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks) && \ + (!test_bit(HCI_QUIRK_BROKEN_LE_STATES, &hdev->quirks) && \ (hdev->le_states[4] & 0x08) && /* Central */ \ (hdev->le_states[4] & 0x40) && /* Peripheral */ \ (hdev->le_states[3] & 0x10)) /* Simultaneous */ diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c index a78f6d706cd43..59d9086db75fe 100644 --- a/net/bluetooth/hci_event.c +++ b/net/bluetooth/hci_event.c @@ -5921,7 +5921,7 @@ static struct hci_conn *check_pending_le_conn(struct hci_dev *hdev, * while we have an existing one in peripheral role. */ if (hdev->conn_hash.le_num_peripheral > 0 && - (!test_bit(HCI_QUIRK_VALID_LE_STATES, &hdev->quirks) || + (test_bit(HCI_QUIRK_BROKEN_LE_STATES, &hdev->quirks) || !(hdev->le_states[3] & 0x10))) return NULL;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 932021a11805b9da4bd6abf66fe233cccd59fe0e ]
Function hci_sched_le needs to update the respective counter variable inplace other the likes of hci_quote_sent would attempt to use the possible outdated value of conn->{le_cnt,acl_cnt}.
Link: https://github.com/bluez/bluez/issues/915 Fixes: 73d80deb7bdf ("Bluetooth: prioritizing data over HCI") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/hci_core.c | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c index 6ecb110bf46bc..b488d0742c966 100644 --- a/net/bluetooth/hci_core.c +++ b/net/bluetooth/hci_core.c @@ -3674,19 +3674,19 @@ static void hci_sched_le(struct hci_dev *hdev) { struct hci_chan *chan; struct sk_buff *skb; - int quote, cnt, tmp; + int quote, *cnt, tmp;
BT_DBG("%s", hdev->name);
if (!hci_conn_num(hdev, LE_LINK)) return;
- cnt = hdev->le_pkts ? hdev->le_cnt : hdev->acl_cnt; + cnt = hdev->le_pkts ? &hdev->le_cnt : &hdev->acl_cnt;
- __check_timeout(hdev, cnt, LE_LINK); + __check_timeout(hdev, *cnt, LE_LINK);
- tmp = cnt; - while (cnt && (chan = hci_chan_sent(hdev, LE_LINK, "e))) { + tmp = *cnt; + while (*cnt && (chan = hci_chan_sent(hdev, LE_LINK, "e))) { u32 priority = (skb_peek(&chan->data_q))->priority; while (quote-- && (skb = skb_peek(&chan->data_q))) { BT_DBG("chan %p skb %p len %d priority %u", chan, skb, @@ -3701,7 +3701,7 @@ static void hci_sched_le(struct hci_dev *hdev) hci_send_frame(hdev, skb); hdev->le_last_tx = jiffies;
- cnt--; + (*cnt)--; chan->sent++; chan->conn->sent++;
@@ -3711,12 +3711,7 @@ static void hci_sched_le(struct hci_dev *hdev) } }
- if (hdev->le_pkts) - hdev->le_cnt = cnt; - else - hdev->acl_cnt = cnt; - - if (cnt != tmp) + if (*cnt != tmp) hci_prio_recalculate(hdev, LE_LINK); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Luiz Augusto von Dentz luiz.von.dentz@intel.com
[ Upstream commit 28cd47f75185c4818b0fb1b46f2f02faaba96376 ]
SMP initiator role shall be considered the one that initiates the pairing procedure with SMP_CMD_PAIRING_REQ:
BLUETOOTH CORE SPECIFICATION Version 5.3 | Vol 3, Part H page 1557:
Figure 2.1: LE pairing phases
Note that by sending SMP_CMD_SECURITY_REQ it doesn't change the role to be Initiator.
Link: https://github.com/bluez/bluez/issues/567 Fixes: b28b4943660f ("Bluetooth: Add strict checks for allowed SMP PDUs") Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/bluetooth/smp.c | 144 ++++++++++++++++++++++---------------------- 1 file changed, 72 insertions(+), 72 deletions(-)
diff --git a/net/bluetooth/smp.c b/net/bluetooth/smp.c index 1e7ea3a4b7ef3..4f9fdf400584e 100644 --- a/net/bluetooth/smp.c +++ b/net/bluetooth/smp.c @@ -914,7 +914,7 @@ static int tk_request(struct l2cap_conn *conn, u8 remote_oob, u8 auth, * Confirms and the responder Enters the passkey. */ if (smp->method == OVERLAP) { - if (hcon->role == HCI_ROLE_MASTER) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) smp->method = CFM_PASSKEY; else smp->method = REQ_PASSKEY; @@ -964,7 +964,7 @@ static u8 smp_confirm(struct smp_chan *smp)
smp_send_cmd(smp->conn, SMP_CMD_PAIRING_CONFIRM, sizeof(cp), &cp);
- if (conn->hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_CONFIRM); else SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM); @@ -980,7 +980,8 @@ static u8 smp_random(struct smp_chan *smp) int ret;
bt_dev_dbg(conn->hcon->hdev, "conn %p %s", conn, - conn->hcon->out ? "initiator" : "responder"); + test_bit(SMP_FLAG_INITIATOR, &smp->flags) ? "initiator" : + "responder");
ret = smp_c1(smp->tk, smp->rrnd, smp->preq, smp->prsp, hcon->init_addr_type, &hcon->init_addr, @@ -994,7 +995,7 @@ static u8 smp_random(struct smp_chan *smp) return SMP_CONFIRM_FAILED; }
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { u8 stk[16]; __le64 rand = 0; __le16 ediv = 0; @@ -1256,14 +1257,15 @@ static void smp_distribute_keys(struct smp_chan *smp) rsp = (void *) &smp->prsp[1];
/* The responder sends its keys first */ - if (hcon->out && (smp->remote_key_dist & KEY_DIST_MASK)) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags) && + (smp->remote_key_dist & KEY_DIST_MASK)) { smp_allow_key_dist(smp); return; }
req = (void *) &smp->preq[1];
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { keydist = &rsp->init_key_dist; *keydist &= req->init_key_dist; } else { @@ -1432,7 +1434,7 @@ static int sc_mackey_and_ltk(struct smp_chan *smp, u8 mackey[16], u8 ltk[16]) struct hci_conn *hcon = smp->conn->hcon; u8 *na, *nb, a[7], b[7];
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { na = smp->prnd; nb = smp->rrnd; } else { @@ -1460,7 +1462,7 @@ static void sc_dhkey_check(struct smp_chan *smp) a[6] = hcon->init_addr_type; b[6] = hcon->resp_addr_type;
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { local_addr = a; remote_addr = b; memcpy(io_cap, &smp->preq[1], 3); @@ -1539,7 +1541,7 @@ static u8 sc_passkey_round(struct smp_chan *smp, u8 smp_op) /* The round is only complete when the initiator * receives pairing random. */ - if (!hcon->out) { + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); if (smp->passkey_round == 20) @@ -1567,7 +1569,7 @@ static u8 sc_passkey_round(struct smp_chan *smp, u8 smp_op)
SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); return 0; @@ -1578,7 +1580,7 @@ static u8 sc_passkey_round(struct smp_chan *smp, u8 smp_op) case SMP_CMD_PUBLIC_KEY: default: /* Initiating device starts the round */ - if (!hcon->out) + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return 0;
bt_dev_dbg(hdev, "Starting passkey round %u", @@ -1623,7 +1625,7 @@ static int sc_user_reply(struct smp_chan *smp, u16 mgmt_op, __le32 passkey) }
/* Initiator sends DHKey check first */ - if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { sc_dhkey_check(smp); SMP_ALLOW_CMD(smp, SMP_CMD_DHKEY_CHECK); } else if (test_and_clear_bit(SMP_FLAG_DHKEY_PENDING, &smp->flags)) { @@ -1746,7 +1748,7 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) struct smp_cmd_pairing rsp, *req = (void *) skb->data; struct l2cap_chan *chan = conn->smp; struct hci_dev *hdev = conn->hcon->hdev; - struct smp_chan *smp; + struct smp_chan *smp = chan->data; u8 key_size, auth, sec_level; int ret;
@@ -1755,16 +1757,14 @@ static u8 smp_cmd_pairing_req(struct l2cap_conn *conn, struct sk_buff *skb) if (skb->len < sizeof(*req)) return SMP_INVALID_PARAMS;
- if (conn->hcon->role != HCI_ROLE_SLAVE) + if (smp && test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return SMP_CMD_NOTSUPP;
- if (!chan->data) + if (!smp) { smp = smp_chan_create(conn); - else - smp = chan->data; - - if (!smp) - return SMP_UNSPECIFIED; + if (!smp) + return SMP_UNSPECIFIED; + }
/* We didn't start the pairing, so match remote */ auth = req->auth_req & AUTH_REQ_MASK(hdev); @@ -1946,7 +1946,7 @@ static u8 smp_cmd_pairing_rsp(struct l2cap_conn *conn, struct sk_buff *skb) if (skb->len < sizeof(*rsp)) return SMP_INVALID_PARAMS;
- if (conn->hcon->role != HCI_ROLE_MASTER) + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return SMP_CMD_NOTSUPP;
skb_pull(skb, sizeof(*rsp)); @@ -2041,7 +2041,7 @@ static u8 sc_check_confirm(struct smp_chan *smp) if (smp->method == REQ_PASSKEY || smp->method == DSP_PASSKEY) return sc_passkey_round(smp, SMP_CMD_PAIRING_CONFIRM);
- if (conn->hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM); @@ -2063,7 +2063,7 @@ static int fixup_sc_false_positive(struct smp_chan *smp) u8 auth;
/* The issue is only observed when we're in responder role */ - if (hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return SMP_UNSPECIFIED;
if (hci_dev_test_flag(hdev, HCI_SC_ONLY)) { @@ -2099,7 +2099,8 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb) struct hci_dev *hdev = hcon->hdev;
bt_dev_dbg(hdev, "conn %p %s", conn, - hcon->out ? "initiator" : "responder"); + test_bit(SMP_FLAG_INITIATOR, &smp->flags) ? "initiator" : + "responder");
if (skb->len < sizeof(smp->pcnf)) return SMP_INVALID_PARAMS; @@ -2121,7 +2122,7 @@ static u8 smp_cmd_pairing_confirm(struct l2cap_conn *conn, struct sk_buff *skb) return ret; }
- if (conn->hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RANDOM); @@ -2156,7 +2157,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) if (!test_bit(SMP_FLAG_SC, &smp->flags)) return smp_random(smp);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { pkax = smp->local_pk; pkbx = smp->remote_pk; na = smp->prnd; @@ -2169,7 +2170,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) }
if (smp->method == REQ_OOB) { - if (!hcon->out) + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd); SMP_ALLOW_CMD(smp, SMP_CMD_DHKEY_CHECK); @@ -2180,7 +2181,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) if (smp->method == REQ_PASSKEY || smp->method == DSP_PASSKEY) return sc_passkey_round(smp, SMP_CMD_PAIRING_RANDOM);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { u8 cfm[16];
err = smp_f4(smp->tfm_cmac, smp->remote_pk, smp->local_pk, @@ -2221,7 +2222,7 @@ static u8 smp_cmd_pairing_random(struct l2cap_conn *conn, struct sk_buff *skb) return SMP_UNSPECIFIED;
if (smp->method == REQ_OOB) { - if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { sc_dhkey_check(smp); SMP_ALLOW_CMD(smp, SMP_CMD_DHKEY_CHECK); } @@ -2295,10 +2296,27 @@ bool smp_sufficient_security(struct hci_conn *hcon, u8 sec_level, return false; }
+static void smp_send_pairing_req(struct smp_chan *smp, __u8 auth) +{ + struct smp_cmd_pairing cp; + + if (smp->conn->hcon->type == ACL_LINK) + build_bredr_pairing_cmd(smp, &cp, NULL); + else + build_pairing_cmd(smp->conn, &cp, NULL, auth); + + smp->preq[0] = SMP_CMD_PAIRING_REQ; + memcpy(&smp->preq[1], &cp, sizeof(cp)); + + smp_send_cmd(smp->conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); + SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); + + set_bit(SMP_FLAG_INITIATOR, &smp->flags); +} + static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb) { struct smp_cmd_security_req *rp = (void *) skb->data; - struct smp_cmd_pairing cp; struct hci_conn *hcon = conn->hcon; struct hci_dev *hdev = hcon->hdev; struct smp_chan *smp; @@ -2347,16 +2365,20 @@ static u8 smp_cmd_security_req(struct l2cap_conn *conn, struct sk_buff *skb)
skb_pull(skb, sizeof(*rp));
- memset(&cp, 0, sizeof(cp)); - build_pairing_cmd(conn, &cp, NULL, auth); + smp_send_pairing_req(smp, auth);
- smp->preq[0] = SMP_CMD_PAIRING_REQ; - memcpy(&smp->preq[1], &cp, sizeof(cp)); + return 0; +}
- smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); +static void smp_send_security_req(struct smp_chan *smp, __u8 auth) +{ + struct smp_cmd_security_req cp;
- return 0; + cp.auth_req = auth; + smp_send_cmd(smp->conn, SMP_CMD_SECURITY_REQ, sizeof(cp), &cp); + SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_REQ); + + clear_bit(SMP_FLAG_INITIATOR, &smp->flags); }
int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) @@ -2427,23 +2449,11 @@ int smp_conn_security(struct hci_conn *hcon, __u8 sec_level) authreq |= SMP_AUTH_MITM; }
- if (hcon->role == HCI_ROLE_MASTER) { - struct smp_cmd_pairing cp; - - build_pairing_cmd(conn, &cp, NULL, authreq); - smp->preq[0] = SMP_CMD_PAIRING_REQ; - memcpy(&smp->preq[1], &cp, sizeof(cp)); - - smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(cp), &cp); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); - } else { - struct smp_cmd_security_req cp; - cp.auth_req = authreq; - smp_send_cmd(conn, SMP_CMD_SECURITY_REQ, sizeof(cp), &cp); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_REQ); - } + if (hcon->role == HCI_ROLE_MASTER) + smp_send_pairing_req(smp, authreq); + else + smp_send_security_req(smp, authreq);
- set_bit(SMP_FLAG_INITIATOR, &smp->flags); ret = 0;
unlock: @@ -2694,8 +2704,6 @@ static int smp_cmd_sign_info(struct l2cap_conn *conn, struct sk_buff *skb)
static u8 sc_select_method(struct smp_chan *smp) { - struct l2cap_conn *conn = smp->conn; - struct hci_conn *hcon = conn->hcon; struct smp_cmd_pairing *local, *remote; u8 local_mitm, remote_mitm, local_io, remote_io, method;
@@ -2708,7 +2716,7 @@ static u8 sc_select_method(struct smp_chan *smp) * the "struct smp_cmd_pairing" from them we need to skip the * first byte which contains the opcode. */ - if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { local = (void *) &smp->preq[1]; remote = (void *) &smp->prsp[1]; } else { @@ -2777,7 +2785,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) /* Non-initiating device sends its public key after receiving * the key from the initiating device. */ - if (!hcon->out) { + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { err = sc_send_public_key(smp); if (err) return err; @@ -2839,7 +2847,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) }
if (smp->method == REQ_OOB) { - if (hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) smp_send_cmd(conn, SMP_CMD_PAIRING_RANDOM, sizeof(smp->prnd), smp->prnd);
@@ -2848,7 +2856,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) return 0; }
- if (hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_CONFIRM);
if (smp->method == REQ_PASSKEY) { @@ -2863,7 +2871,7 @@ static int smp_cmd_public_key(struct l2cap_conn *conn, struct sk_buff *skb) /* The Initiating device waits for the non-initiating device to * send the confirm value. */ - if (conn->hcon->out) + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) return 0;
err = smp_f4(smp->tfm_cmac, smp->local_pk, smp->remote_pk, smp->prnd, @@ -2897,7 +2905,7 @@ static int smp_cmd_dhkey_check(struct l2cap_conn *conn, struct sk_buff *skb) a[6] = hcon->init_addr_type; b[6] = hcon->resp_addr_type;
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { local_addr = a; remote_addr = b; memcpy(io_cap, &smp->prsp[1], 3); @@ -2922,7 +2930,7 @@ static int smp_cmd_dhkey_check(struct l2cap_conn *conn, struct sk_buff *skb) if (crypto_memneq(check->e, e, 16)) return SMP_DHKEY_CHECK_FAILED;
- if (!hcon->out) { + if (!test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { if (test_bit(SMP_FLAG_WAIT_USER, &smp->flags)) { set_bit(SMP_FLAG_DHKEY_PENDING, &smp->flags); return 0; @@ -2934,7 +2942,7 @@ static int smp_cmd_dhkey_check(struct l2cap_conn *conn, struct sk_buff *skb)
sc_add_ltk(smp);
- if (hcon->out) { + if (test_bit(SMP_FLAG_INITIATOR, &smp->flags)) { hci_le_start_enc(hcon, 0, 0, smp->tk, smp->enc_key_size); hcon->enc_key_size = smp->enc_key_size; } @@ -3083,7 +3091,6 @@ static void bredr_pairing(struct l2cap_chan *chan) struct l2cap_conn *conn = chan->conn; struct hci_conn *hcon = conn->hcon; struct hci_dev *hdev = hcon->hdev; - struct smp_cmd_pairing req; struct smp_chan *smp;
bt_dev_dbg(hdev, "chan %p", chan); @@ -3135,14 +3142,7 @@ static void bredr_pairing(struct l2cap_chan *chan)
bt_dev_dbg(hdev, "starting SMP over BR/EDR");
- /* Prepare and send the BR/EDR SMP Pairing Request */ - build_bredr_pairing_cmd(smp, &req, NULL); - - smp->preq[0] = SMP_CMD_PAIRING_REQ; - memcpy(&smp->preq[1], &req, sizeof(req)); - - smp_send_cmd(conn, SMP_CMD_PAIRING_REQ, sizeof(req), &req); - SMP_ALLOW_CMD(smp, SMP_CMD_PAIRING_RSP); + smp_send_pairing_req(smp, 0x00); }
static void smp_resume_cb(struct l2cap_chan *chan)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit 67c3ca2c5cfe6a50772514e3349b5e7b3b0fac03 ]
Problem description -------------------
On an NXP LS1028A (felix DSA driver) with the following configuration:
- ocelot-8021q tagging protocol - VLAN-aware bridge (with STP) spanning at least swp0 and swp1 - 8021q VLAN upper interfaces on swp0 and swp1: swp0.700, swp1.700 - ptp4l on swp0.700 and swp1.700
we see that the ptp4l instances do not see each other's traffic, and they all go to the grand master state due to the ANNOUNCE_RECEIPT_TIMEOUT_EXPIRES condition.
Jumping to the conclusion for the impatient -------------------------------------------
There is a zero-day bug in the ocelot switchdev driver in the way it handles VLAN-tagged packet injection. The correct logic already exists in the source code, in function ocelot_xmit_get_vlan_info() added by commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit"). But it is used only for normal NPI-based injection with the DSA "ocelot" tagging protocol. The other injection code paths (register-based and FDMA-based) roll their own wrong logic. This affects and was noticed on the DSA "ocelot-8021q" protocol because it uses register-based injection.
By moving ocelot_xmit_get_vlan_info() to a place that's common for both the DSA tagger and the ocelot switch library, it can also be called from ocelot_port_inject_frame() in ocelot.c.
We need to touch the lines with ocelot_ifh_port_set()'s prototype anyway, so let's rename it to something clearer regarding what it does, and add a kernel-doc. ocelot_ifh_set_basic() should do.
Investigation notes -------------------
Debugging reveals that PTP event (aka those carrying timestamps, like Sync) frames injected into swp0.700 (but also swp1.700) hit the wire with two VLAN tags:
00000000: 01 1b 19 00 00 00 00 01 02 03 04 05 81 00 02 bc ~~~~~~~~~~~ 00000010: 81 00 02 bc 88 f7 00 12 00 2c 00 00 02 00 00 00 ~~~~~~~~~~~ 00000020: 00 00 00 00 00 00 00 00 00 00 00 01 02 ff fe 03 00000030: 04 05 00 01 00 04 00 00 00 00 00 00 00 00 00 00 00000040: 00 00
The second (unexpected) VLAN tag makes felix_check_xtr_pkt() -> ptp_classify_raw() fail to see these as PTP packets at the link partner's receiving end, and return PTP_CLASS_NONE (because the BPF classifier is not written to expect 2 VLAN tags).
The reason why packets have 2 VLAN tags is because the transmission code treats VLAN incorrectly.
Neither ocelot switchdev, nor felix DSA, declare the NETIF_F_HW_VLAN_CTAG_TX feature. Therefore, at xmit time, all VLANs should be in the skb head, and none should be in the hwaccel area. This is done by:
static struct sk_buff *validate_xmit_vlan(struct sk_buff *skb, netdev_features_t features) { if (skb_vlan_tag_present(skb) && !vlan_hw_offload_capable(features, skb->vlan_proto)) skb = __vlan_hwaccel_push_inside(skb); return skb; }
But ocelot_port_inject_frame() handles things incorrectly:
ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb));
void ocelot_ifh_port_set(struct sk_buff *skb, void *ifh, int port, u32 rew_op) { (...) if (vlan_tag) ocelot_ifh_set_vlan_tci(ifh, vlan_tag); (...) }
The way __vlan_hwaccel_push_inside() pushes the tag inside the skb head is by calling:
static inline void __vlan_hwaccel_clear_tag(struct sk_buff *skb) { skb->vlan_present = 0; }
which does _not_ zero out skb->vlan_tci as seen by skb_vlan_tag_get(). This means that ocelot, when it calls skb_vlan_tag_get(), sees (and uses) a residual skb->vlan_tci, while the same VLAN tag is _already_ in the skb head.
The trivial fix for double VLAN headers is to replace the content of ocelot_ifh_port_set() with:
if (skb_vlan_tag_present(skb)) ocelot_ifh_set_vlan_tci(ifh, skb_vlan_tag_get(skb));
but this would not be correct either, because, as mentioned, vlan_hw_offload_capable() is false for us, so we'd be inserting dead code and we'd always transmit packets with VID=0 in the injection frame header.
I can't actually test the ocelot switchdev driver and rely exclusively on code inspection, but I don't think traffic from 8021q uppers has ever been injected properly, and not double-tagged. Thus I'm blaming the introduction of VLAN fields in the injection header - early driver code.
As hinted at in the early conclusion, what we _want_ to happen for VLAN transmission was already described once in commit 5ca721c54d86 ("net: dsa: tag_ocelot: set the classified VLAN during xmit").
ocelot_xmit_get_vlan_info() intends to ensure that if the port through which we're transmitting is under a VLAN-aware bridge, the outer VLAN tag from the skb head is stripped from there and inserted into the injection frame header (so that the packet is processed in hardware through that actual VLAN). And in all other cases, the packet is sent with VID=0 in the injection frame header, since the port is VLAN-unaware and has logic to strip this VID on egress (making it invisible to the wire).
Fixes: 08d02364b12f ("net: mscc: fix the injection header") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot.c | 29 +++++++++++---- drivers/net/ethernet/mscc/ocelot_fdma.c | 2 +- include/linux/dsa/ocelot.h | 47 +++++++++++++++++++++++++ include/soc/mscc/ocelot.h | 3 +- net/dsa/tag_ocelot.c | 37 ++----------------- 5 files changed, 75 insertions(+), 43 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index ed2fb44500b0c..69a4e5a90475b 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1193,17 +1193,34 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp) } EXPORT_SYMBOL(ocelot_can_inject);
-void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag) +/** + * ocelot_ifh_set_basic - Set basic information in Injection Frame Header + * @ifh: Pointer to Injection Frame Header memory + * @ocelot: Switch private data structure + * @port: Egress port number + * @rew_op: Egress rewriter operation for PTP + * @skb: Pointer to socket buffer (packet) + * + * Populate the Injection Frame Header with basic information for this skb: the + * analyzer bypass bit, destination port, VLAN info, egress rewriter info. + */ +void ocelot_ifh_set_basic(void *ifh, struct ocelot *ocelot, int port, + u32 rew_op, struct sk_buff *skb) { + struct ocelot_port *ocelot_port = ocelot->ports[port]; + u64 vlan_tci, tag_type; + + ocelot_xmit_get_vlan_info(skb, ocelot_port->bridge, &vlan_tci, + &tag_type); + ocelot_ifh_set_bypass(ifh, 1); ocelot_ifh_set_dest(ifh, BIT_ULL(port)); - ocelot_ifh_set_tag_type(ifh, IFH_TAG_TYPE_C); - if (vlan_tag) - ocelot_ifh_set_vlan_tci(ifh, vlan_tag); + ocelot_ifh_set_tag_type(ifh, tag_type); + ocelot_ifh_set_vlan_tci(ifh, vlan_tci); if (rew_op) ocelot_ifh_set_rew_op(ifh, rew_op); } -EXPORT_SYMBOL(ocelot_ifh_port_set); +EXPORT_SYMBOL(ocelot_ifh_set_basic);
void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb) @@ -1214,7 +1231,7 @@ void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) | QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
- ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); + ocelot_ifh_set_basic(ifh, ocelot, port, rew_op, skb);
for (i = 0; i < OCELOT_TAG_LEN / 4; i++) ocelot_write_rix(ocelot, ifh[i], QS_INJ_WR, grp); diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c index 312a468321544..87b59cc5e4416 100644 --- a/drivers/net/ethernet/mscc/ocelot_fdma.c +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c @@ -666,7 +666,7 @@ static int ocelot_fdma_prepare_skb(struct ocelot *ocelot, int port, u32 rew_op, ifh = skb_push(skb, OCELOT_TAG_LEN); skb_put(skb, ETH_FCS_LEN); memset(ifh, 0, OCELOT_TAG_LEN); - ocelot_ifh_port_set(ifh, port, rew_op, skb_vlan_tag_get(skb)); + ocelot_ifh_set_basic(ifh, ocelot, port, rew_op, skb);
return 0; } diff --git a/include/linux/dsa/ocelot.h b/include/linux/dsa/ocelot.h index dca2969015d80..6fbfbde68a37c 100644 --- a/include/linux/dsa/ocelot.h +++ b/include/linux/dsa/ocelot.h @@ -5,6 +5,8 @@ #ifndef _NET_DSA_TAG_OCELOT_H #define _NET_DSA_TAG_OCELOT_H
+#include <linux/if_bridge.h> +#include <linux/if_vlan.h> #include <linux/kthread.h> #include <linux/packing.h> #include <linux/skbuff.h> @@ -273,4 +275,49 @@ static inline u32 ocelot_ptp_rew_op(struct sk_buff *skb) return rew_op; }
+/** + * ocelot_xmit_get_vlan_info: Determine VLAN_TCI and TAG_TYPE for injected frame + * @skb: Pointer to socket buffer + * @br: Pointer to bridge device that the port is under, if any + * @vlan_tci: + * @tag_type: + * + * If the port is under a VLAN-aware bridge, remove the VLAN header from the + * payload and move it into the DSA tag, which will make the switch classify + * the packet to the bridge VLAN. Otherwise, leave the classified VLAN at zero, + * which is the pvid of standalone ports (OCELOT_STANDALONE_PVID), although not + * of VLAN-unaware bridge ports (that would be ocelot_vlan_unaware_pvid()). + * Anyway, VID 0 is fine because it is stripped on egress for these port modes, + * and source address learning is not performed for packets injected from the + * CPU anyway, so it doesn't matter that the VID is "wrong". + */ +static inline void ocelot_xmit_get_vlan_info(struct sk_buff *skb, + struct net_device *br, + u64 *vlan_tci, u64 *tag_type) +{ + struct vlan_ethhdr *hdr; + u16 proto, tci; + + if (!br || !br_vlan_enabled(br)) { + *vlan_tci = 0; + *tag_type = IFH_TAG_TYPE_C; + return; + } + + hdr = (struct vlan_ethhdr *)skb_mac_header(skb); + br_vlan_get_proto(br, &proto); + + if (ntohs(hdr->h_vlan_proto) == proto) { + vlan_remove_tag(skb, &tci); + *vlan_tci = tci; + } else { + rcu_read_lock(); + br_vlan_get_pvid_rcu(br, &tci); + rcu_read_unlock(); + *vlan_tci = tci; + } + + *tag_type = (proto != ETH_P_8021Q) ? IFH_TAG_TYPE_S : IFH_TAG_TYPE_C; +} + #endif diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 1e1b40f4e664e..0297bc2277927 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -969,7 +969,8 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target, bool ocelot_can_inject(struct ocelot *ocelot, int grp); void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb); -void ocelot_ifh_port_set(void *ifh, int port, u32 rew_op, u32 vlan_tag); +void ocelot_ifh_set_basic(void *ifh, struct ocelot *ocelot, int port, + u32 rew_op, struct sk_buff *skb); int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **skb); void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp); void ocelot_ptp_rx_timestamp(struct ocelot *ocelot, struct sk_buff *skb, diff --git a/net/dsa/tag_ocelot.c b/net/dsa/tag_ocelot.c index e0e4300bfbd3f..bf6608fc6be70 100644 --- a/net/dsa/tag_ocelot.c +++ b/net/dsa/tag_ocelot.c @@ -8,40 +8,6 @@ #define OCELOT_NAME "ocelot" #define SEVILLE_NAME "seville"
-/* If the port is under a VLAN-aware bridge, remove the VLAN header from the - * payload and move it into the DSA tag, which will make the switch classify - * the packet to the bridge VLAN. Otherwise, leave the classified VLAN at zero, - * which is the pvid of standalone and VLAN-unaware bridge ports. - */ -static void ocelot_xmit_get_vlan_info(struct sk_buff *skb, struct dsa_port *dp, - u64 *vlan_tci, u64 *tag_type) -{ - struct net_device *br = dsa_port_bridge_dev_get(dp); - struct vlan_ethhdr *hdr; - u16 proto, tci; - - if (!br || !br_vlan_enabled(br)) { - *vlan_tci = 0; - *tag_type = IFH_TAG_TYPE_C; - return; - } - - hdr = skb_vlan_eth_hdr(skb); - br_vlan_get_proto(br, &proto); - - if (ntohs(hdr->h_vlan_proto) == proto) { - vlan_remove_tag(skb, &tci); - *vlan_tci = tci; - } else { - rcu_read_lock(); - br_vlan_get_pvid_rcu(br, &tci); - rcu_read_unlock(); - *vlan_tci = tci; - } - - *tag_type = (proto != ETH_P_8021Q) ? IFH_TAG_TYPE_S : IFH_TAG_TYPE_C; -} - static void ocelot_xmit_common(struct sk_buff *skb, struct net_device *netdev, __be32 ifh_prefix, void **ifh) { @@ -53,7 +19,8 @@ static void ocelot_xmit_common(struct sk_buff *skb, struct net_device *netdev, u32 rew_op = 0; u64 qos_class;
- ocelot_xmit_get_vlan_info(skb, dp, &vlan_tci, &tag_type); + ocelot_xmit_get_vlan_info(skb, dsa_port_bridge_dev_get(dp), &vlan_tci, + &tag_type);
qos_class = netdev_get_num_tc(netdev) ? netdev_get_prio_tc_map(netdev, skb->priority) : skb->priority;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit e1b9e80236c540fa85d76e2d510d1b38e1968c5d ]
There are 2 distinct code paths (listed below) in the source code which set up an injection header for Ocelot(-like) switches. Code path (2) lacks the QoS class and source port being set correctly. Especially the improper QoS classification is a problem for the "ocelot-8021q" alternative DSA tagging protocol, because we support tc-taprio and each packet needs to be scheduled precisely through its time slot. This includes PTP, which is normally assigned to a traffic class other than 0, but would be sent through TC 0 nonetheless.
The code paths are:
(1) ocelot_xmit_common() from net/dsa/tag_ocelot.c - called only by the standard "ocelot" DSA tagging protocol which uses NPI-based injection - sets up bit fields in the tag manually to account for a small difference (destination port offset) between Ocelot and Seville. Namely, ocelot_ifh_set_dest() is omitted out of ocelot_xmit_common(), because there's also seville_ifh_set_dest().
(2) ocelot_ifh_set_basic(), called by: - ocelot_fdma_prepare_skb() for FDMA transmission of the ocelot switchdev driver - ocelot_port_xmit() -> ocelot_port_inject_frame() for register-based transmission of the ocelot switchdev driver - felix_port_deferred_xmit() -> ocelot_port_inject_frame() for the DSA tagger ocelot-8021q when it must transmit PTP frames (also through register-based injection). sets the bit fields according to its own logic.
The problem is that (2) doesn't call ocelot_ifh_set_qos_class(). Copying that logic from ocelot_xmit_common() fixes that.
Unfortunately, although desirable, it is not easily possible to de-duplicate code paths (1) and (2), and make net/dsa/tag_ocelot.c directly call ocelot_ifh_set_basic()), because of the ocelot/seville difference. This is the "minimal" fix with some logic duplicated (but at least more consolidated).
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mscc/ocelot.c | 10 +++++++++- drivers/net/ethernet/mscc/ocelot_fdma.c | 1 - 2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 69a4e5a90475b..9301716e21d58 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1208,13 +1208,21 @@ void ocelot_ifh_set_basic(void *ifh, struct ocelot *ocelot, int port, u32 rew_op, struct sk_buff *skb) { struct ocelot_port *ocelot_port = ocelot->ports[port]; + struct net_device *dev = skb->dev; u64 vlan_tci, tag_type; + int qos_class;
ocelot_xmit_get_vlan_info(skb, ocelot_port->bridge, &vlan_tci, &tag_type);
+ qos_class = netdev_get_num_tc(dev) ? + netdev_get_prio_tc_map(dev, skb->priority) : skb->priority; + + memset(ifh, 0, OCELOT_TAG_LEN); ocelot_ifh_set_bypass(ifh, 1); + ocelot_ifh_set_src(ifh, BIT_ULL(ocelot->num_phys_ports)); ocelot_ifh_set_dest(ifh, BIT_ULL(port)); + ocelot_ifh_set_qos_class(ifh, qos_class); ocelot_ifh_set_tag_type(ifh, tag_type); ocelot_ifh_set_vlan_tci(ifh, vlan_tci); if (rew_op) @@ -1225,7 +1233,7 @@ EXPORT_SYMBOL(ocelot_ifh_set_basic); void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb) { - u32 ifh[OCELOT_TAG_LEN / 4] = {0}; + u32 ifh[OCELOT_TAG_LEN / 4]; unsigned int i, count, last;
ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) | diff --git a/drivers/net/ethernet/mscc/ocelot_fdma.c b/drivers/net/ethernet/mscc/ocelot_fdma.c index 87b59cc5e4416..00326ae8c708b 100644 --- a/drivers/net/ethernet/mscc/ocelot_fdma.c +++ b/drivers/net/ethernet/mscc/ocelot_fdma.c @@ -665,7 +665,6 @@ static int ocelot_fdma_prepare_skb(struct ocelot *ocelot, int port, u32 rew_op,
ifh = skb_push(skb, OCELOT_TAG_LEN); skb_put(skb, ETH_FCS_LEN); - memset(ifh, 0, OCELOT_TAG_LEN); ocelot_ifh_set_basic(ifh, ocelot, port, rew_op, skb);
return 0;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vladimir Oltean vladimir.oltean@nxp.com
[ Upstream commit c5e12ac3beb0dd3a718296b2d8af5528e9ab728e ]
As explained by Horatiu Vultur in commit 603ead96582d ("net: sparx5: Add spinlock for frame transmission from CPU") which is for a similar hardware design, multiple CPUs can simultaneously perform injection or extraction. There are only 2 register groups for injection and 2 for extraction, and the driver only uses one of each. So we'd better serialize access using spin locks, otherwise frame corruption is possible.
Note that unlike in sparx5, FDMA in ocelot does not have this issue because struct ocelot_fdma_tx_ring already contains an xmit_lock.
I guess this is mostly a problem for NXP LS1028A, as that is dual core. I don't think VSC7514 is. So I'm blaming the commit where LS1028A (aka the felix DSA driver) started using register-based packet injection and extraction.
Fixes: 0a6f17c6ae21 ("net: dsa: tag_ocelot_8021q: add support for PTP timestamping") Signed-off-by: Vladimir Oltean vladimir.oltean@nxp.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/ocelot/felix.c | 11 +++++ drivers/net/ethernet/mscc/ocelot.c | 52 ++++++++++++++++++++++ drivers/net/ethernet/mscc/ocelot_vsc7514.c | 4 ++ include/soc/mscc/ocelot.h | 9 ++++ 4 files changed, 76 insertions(+)
diff --git a/drivers/net/dsa/ocelot/felix.c b/drivers/net/dsa/ocelot/felix.c index 61e95487732dc..f5d26e724ae65 100644 --- a/drivers/net/dsa/ocelot/felix.c +++ b/drivers/net/dsa/ocelot/felix.c @@ -528,7 +528,9 @@ static int felix_tag_8021q_setup(struct dsa_switch *ds) * so we need to be careful that there are no extra frames to be * dequeued over MMIO, since we would never know to discard them. */ + ocelot_lock_xtr_grp_bh(ocelot, 0); ocelot_drain_cpu_queue(ocelot, 0); + ocelot_unlock_xtr_grp_bh(ocelot, 0);
return 0; } @@ -1504,6 +1506,8 @@ static void felix_port_deferred_xmit(struct kthread_work *work) int port = xmit_work->dp->index; int retries = 10;
+ ocelot_lock_inj_grp(ocelot, 0); + do { if (ocelot_can_inject(ocelot, 0)) break; @@ -1512,6 +1516,7 @@ static void felix_port_deferred_xmit(struct kthread_work *work) } while (--retries);
if (!retries) { + ocelot_unlock_inj_grp(ocelot, 0); dev_err(ocelot->dev, "port %d failed to inject skb\n", port); ocelot_port_purge_txtstamp_skb(ocelot, port, skb); @@ -1521,6 +1526,8 @@ static void felix_port_deferred_xmit(struct kthread_work *work)
ocelot_port_inject_frame(ocelot, port, 0, rew_op, skb);
+ ocelot_unlock_inj_grp(ocelot, 0); + consume_skb(skb); kfree(xmit_work); } @@ -1671,6 +1678,8 @@ static bool felix_check_xtr_pkt(struct ocelot *ocelot) if (!felix->info->quirk_no_xtr_irq) return false;
+ ocelot_lock_xtr_grp(ocelot, grp); + while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) { struct sk_buff *skb; unsigned int type; @@ -1707,6 +1716,8 @@ static bool felix_check_xtr_pkt(struct ocelot *ocelot) ocelot_drain_cpu_queue(ocelot, 0); }
+ ocelot_unlock_xtr_grp(ocelot, grp); + return true; }
diff --git a/drivers/net/ethernet/mscc/ocelot.c b/drivers/net/ethernet/mscc/ocelot.c index 9301716e21d58..f4e027a6fe955 100644 --- a/drivers/net/ethernet/mscc/ocelot.c +++ b/drivers/net/ethernet/mscc/ocelot.c @@ -1099,6 +1099,48 @@ void ocelot_ptp_rx_timestamp(struct ocelot *ocelot, struct sk_buff *skb, } EXPORT_SYMBOL(ocelot_ptp_rx_timestamp);
+void ocelot_lock_inj_grp(struct ocelot *ocelot, int grp) + __acquires(&ocelot->inj_lock) +{ + spin_lock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_lock_inj_grp); + +void ocelot_unlock_inj_grp(struct ocelot *ocelot, int grp) + __releases(&ocelot->inj_lock) +{ + spin_unlock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_unlock_inj_grp); + +void ocelot_lock_xtr_grp(struct ocelot *ocelot, int grp) + __acquires(&ocelot->inj_lock) +{ + spin_lock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_lock_xtr_grp); + +void ocelot_unlock_xtr_grp(struct ocelot *ocelot, int grp) + __releases(&ocelot->inj_lock) +{ + spin_unlock(&ocelot->inj_lock); +} +EXPORT_SYMBOL_GPL(ocelot_unlock_xtr_grp); + +void ocelot_lock_xtr_grp_bh(struct ocelot *ocelot, int grp) + __acquires(&ocelot->xtr_lock) +{ + spin_lock_bh(&ocelot->xtr_lock); +} +EXPORT_SYMBOL_GPL(ocelot_lock_xtr_grp_bh); + +void ocelot_unlock_xtr_grp_bh(struct ocelot *ocelot, int grp) + __releases(&ocelot->xtr_lock) +{ + spin_unlock_bh(&ocelot->xtr_lock); +} +EXPORT_SYMBOL_GPL(ocelot_unlock_xtr_grp_bh); + int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb) { u64 timestamp, src_port, len; @@ -1109,6 +1151,8 @@ int ocelot_xtr_poll_frame(struct ocelot *ocelot, int grp, struct sk_buff **nskb) u32 val, *buf; int err;
+ lockdep_assert_held(&ocelot->xtr_lock); + err = ocelot_xtr_poll_xfh(ocelot, grp, xfh); if (err) return err; @@ -1184,6 +1228,8 @@ bool ocelot_can_inject(struct ocelot *ocelot, int grp) { u32 val = ocelot_read(ocelot, QS_INJ_STATUS);
+ lockdep_assert_held(&ocelot->inj_lock); + if (!(val & QS_INJ_STATUS_FIFO_RDY(BIT(grp)))) return false; if (val & QS_INJ_STATUS_WMARK_REACHED(BIT(grp))) @@ -1236,6 +1282,8 @@ void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 ifh[OCELOT_TAG_LEN / 4]; unsigned int i, count, last;
+ lockdep_assert_held(&ocelot->inj_lock); + ocelot_write_rix(ocelot, QS_INJ_CTRL_GAP_SIZE(1) | QS_INJ_CTRL_SOF, QS_INJ_CTRL, grp);
@@ -1272,6 +1320,8 @@ EXPORT_SYMBOL(ocelot_port_inject_frame);
void ocelot_drain_cpu_queue(struct ocelot *ocelot, int grp) { + lockdep_assert_held(&ocelot->xtr_lock); + while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) ocelot_read_rix(ocelot, QS_XTR_RD, grp); } @@ -2954,6 +3004,8 @@ int ocelot_init(struct ocelot *ocelot) mutex_init(&ocelot->fwd_domain_lock); spin_lock_init(&ocelot->ptp_clock_lock); spin_lock_init(&ocelot->ts_id_lock); + spin_lock_init(&ocelot->inj_lock); + spin_lock_init(&ocelot->xtr_lock);
ocelot->owq = alloc_ordered_workqueue("ocelot-owq", 0); if (!ocelot->owq) diff --git a/drivers/net/ethernet/mscc/ocelot_vsc7514.c b/drivers/net/ethernet/mscc/ocelot_vsc7514.c index 993212c3a7da6..c09dd2e3343cb 100644 --- a/drivers/net/ethernet/mscc/ocelot_vsc7514.c +++ b/drivers/net/ethernet/mscc/ocelot_vsc7514.c @@ -51,6 +51,8 @@ static irqreturn_t ocelot_xtr_irq_handler(int irq, void *arg) struct ocelot *ocelot = arg; int grp = 0, err;
+ ocelot_lock_xtr_grp(ocelot, grp); + while (ocelot_read(ocelot, QS_XTR_DATA_PRESENT) & BIT(grp)) { struct sk_buff *skb;
@@ -69,6 +71,8 @@ static irqreturn_t ocelot_xtr_irq_handler(int irq, void *arg) if (err < 0) ocelot_drain_cpu_queue(ocelot, 0);
+ ocelot_unlock_xtr_grp(ocelot, grp); + return IRQ_HANDLED; }
diff --git a/include/soc/mscc/ocelot.h b/include/soc/mscc/ocelot.h index 0297bc2277927..846132ca5503d 100644 --- a/include/soc/mscc/ocelot.h +++ b/include/soc/mscc/ocelot.h @@ -813,6 +813,9 @@ struct ocelot { const u32 *const *map; struct list_head stats_regions;
+ spinlock_t inj_lock; + spinlock_t xtr_lock; + u32 pool_size[OCELOT_SB_NUM][OCELOT_SB_POOL_NUM]; int packet_buffer_size; int num_frame_refs; @@ -966,6 +969,12 @@ void __ocelot_target_write_ix(struct ocelot *ocelot, enum ocelot_target target, u32 val, u32 reg, u32 offset);
/* Packet I/O */ +void ocelot_lock_inj_grp(struct ocelot *ocelot, int grp); +void ocelot_unlock_inj_grp(struct ocelot *ocelot, int grp); +void ocelot_lock_xtr_grp(struct ocelot *ocelot, int grp); +void ocelot_unlock_xtr_grp(struct ocelot *ocelot, int grp); +void ocelot_lock_xtr_grp_bh(struct ocelot *ocelot, int grp); +void ocelot_unlock_xtr_grp_bh(struct ocelot *ocelot, int grp); bool ocelot_can_inject(struct ocelot *ocelot, int grp); void ocelot_port_inject_frame(struct ocelot *ocelot, int port, int grp, u32 rew_op, struct sk_buff *skb);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Carolina Jubran cjubran@nvidia.com
[ Upstream commit a07e953dafe5ebd88942dc861dfb06eaf055fb07 ]
The offending commit overlooked the Multi-PF Netdev changes.
Revert mlx5e_set_default_xps_cpumasks to incorporate Multi-PF Netdev changes.
Fixes: bcee093751f8 ("net/mlx5e: Modifying channels number and updating TX queues") Signed-off-by: Carolina Jubran cjubran@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20240815071611.2211873-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index eedbcba226894..409f525f1703c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3005,15 +3005,18 @@ int mlx5e_update_tx_netdev_queues(struct mlx5e_priv *priv) static void mlx5e_set_default_xps_cpumasks(struct mlx5e_priv *priv, struct mlx5e_params *params) { - struct mlx5_core_dev *mdev = priv->mdev; - int num_comp_vectors, ix, irq; - - num_comp_vectors = mlx5_comp_vectors_max(mdev); + int ix;
for (ix = 0; ix < params->num_channels; ix++) { + int num_comp_vectors, irq, vec_ix; + struct mlx5_core_dev *mdev; + + mdev = mlx5_sd_ch_ix_get_dev(priv->mdev, ix); + num_comp_vectors = mlx5_comp_vectors_max(mdev); cpumask_clear(priv->scratchpad.cpumask); + vec_ix = mlx5_sd_ch_ix_get_vec_ix(mdev, ix);
- for (irq = ix; irq < num_comp_vectors; irq += params->num_channels) { + for (irq = vec_ix; irq < num_comp_vectors; irq += params->num_channels) { int cpu = mlx5_comp_vector_get_cpu(mdev, irq);
cpumask_set_cpu(cpu, priv->scratchpad.cpumask);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Patrisious Haddad phaddad@nvidia.com
[ Upstream commit 607e1df7bd47fe91cab85a97f57870a26d066137 ]
Prevent the call trace below from happening, by not allowing IPsec creation over a slave, if master device doesn't support IPsec.
WARNING: CPU: 44 PID: 16136 at kernel/locking/rwsem.c:240 down_read+0x75/0x94 Modules linked in: esp4_offload esp4 act_mirred act_vlan cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa mst_pciconf(OE) nfsv3 nfs_acl nfs lockd grace fscache netfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 nft_compat nft_counter nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill cuse fuse rpcrdma sunrpc rdma_ucm ib_srpt ib_isert iscsi_target_mod target_core_mod ib_umad ib_iser libiscsi scsi_transport_iscsi rdma_cm ib_ipoib iw_cm ib_cm ipmi_ssif intel_rapl_msr intel_rapl_common amd64_edac edac_mce_amd kvm_amd kvm irqbypass crct10dif_pclmul crc32_pclmul mlx5_ib ghash_clmulni_intel sha1_ssse3 dell_smbios ib_uverbs aesni_intel crypto_simd dcdbas wmi_bmof dell_wmi_descriptor cryptd pcspkr ib_core acpi_ipmi sp5100_tco ccp i2c_piix4 ipmi_si ptdma k10temp ipmi_devintf ipmi_msghandler acpi_power_meter acpi_cpufreq ext4 mbcache jbd2 sd_mod t10_pi sg mgag200 drm_kms_helper syscopyarea sysfillrect mlx5_core sysimgblt fb_sys_fops cec ahci libahci mlxfw drm pci_hyperv_intf libata tg3 sha256_ssse3 tls megaraid_sas i2c_algo_bit psample wmi dm_mirror dm_region_hash dm_log dm_mod [last unloaded: mst_pci] CPU: 44 PID: 16136 Comm: kworker/44:3 Kdump: loaded Tainted: GOE 5.15.0-20240509.el8uek.uek7_u3_update_v6.6_ipsec_bf.x86_64 #2 Hardware name: Dell Inc. PowerEdge R7525/074H08, BIOS 2.0.3 01/15/2021 Workqueue: events xfrm_state_gc_task RIP: 0010:down_read+0x75/0x94 Code: 00 48 8b 45 08 65 48 8b 14 25 80 fc 01 00 83 e0 02 48 09 d0 48 83 c8 01 48 89 45 08 5d 31 c0 89 c2 89 c6 89 c7 e9 cb 88 3b 00 <0f> 0b 48 8b 45 08 a8 01 74 b2 a8 02 75 ae 48 89 c2 48 83 ca 02 f0 RSP: 0018:ffffb26387773da8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffffa08b658af900 RCX: 0000000000000001 RDX: 0000000000000000 RSI: ff886bc5e1366f2f RDI: 0000000000000000 RBP: ffffa08b658af940 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: ffffa0a9bfb31540 R13: ffffa0a9bfb37900 R14: 0000000000000000 R15: ffffa0a9bfb37905 FS: 0000000000000000(0000) GS:ffffa0a9bfb00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 000055a45ed814e8 CR3: 000000109038a000 CR4: 0000000000350ee0 Call Trace: <TASK> ? show_trace_log_lvl+0x1d6/0x2f9 ? show_trace_log_lvl+0x1d6/0x2f9 ? mlx5_devcom_for_each_peer_begin+0x29/0x60 [mlx5_core] ? down_read+0x75/0x94 ? __warn+0x80/0x113 ? down_read+0x75/0x94 ? report_bug+0xa4/0x11d ? handle_bug+0x35/0x8b ? exc_invalid_op+0x14/0x75 ? asm_exc_invalid_op+0x16/0x1b ? down_read+0x75/0x94 ? down_read+0xe/0x94 mlx5_devcom_for_each_peer_begin+0x29/0x60 [mlx5_core] mlx5_ipsec_fs_roce_tx_destroy+0xb1/0x130 [mlx5_core] tx_destroy+0x1b/0xc0 [mlx5_core] tx_ft_put+0x53/0xc0 [mlx5_core] mlx5e_xfrm_free_state+0x45/0x90 [mlx5_core] ___xfrm_state_destroy+0x10f/0x1a2 xfrm_state_gc_task+0x81/0xa9 process_one_work+0x1f1/0x3c6 worker_thread+0x53/0x3e4 ? process_one_work.cold+0x46/0x3c kthread+0x127/0x144 ? set_kthread_struct+0x60/0x52 ret_from_fork+0x22/0x2d </TASK> ---[ end trace 5ef7896144d398e1 ]---
Fixes: dfbd229abeee ("net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic") Reviewed-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Patrisious Haddad phaddad@nvidia.com Signed-off-by: Tariq Toukan tariqt@nvidia.com Link: https://patch.msgid.link/20240815071611.2211873-5-tariqt@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c index 234cd00f71a1c..b7d4b1a2baf2e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/ipsec_fs_roce.c @@ -386,7 +386,8 @@ static int ipsec_fs_roce_tx_mpv_create(struct mlx5_core_dev *mdev, return -EOPNOTSUPP;
peer_priv = mlx5_devcom_get_next_peer_data(*ipsec_roce->devcom, &tmp); - if (!peer_priv) { + if (!peer_priv || !peer_priv->ipsec) { + mlx5_core_err(mdev, "IPsec not supported on master device\n"); err = -EOPNOTSUPP; goto release_peer; } @@ -455,7 +456,8 @@ static int ipsec_fs_roce_rx_mpv_create(struct mlx5_core_dev *mdev, return -EOPNOTSUPP;
peer_priv = mlx5_devcom_get_next_peer_data(*ipsec_roce->devcom, &tmp); - if (!peer_priv) { + if (!peer_priv || !peer_priv->ipsec) { + mlx5_core_err(mdev, "IPsec not supported on master device\n"); err = -EOPNOTSUPP; goto release_peer; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Simon Horman horms@kernel.org
[ Upstream commit a0c9fe5eecc97680323ee83780ea3eaf440ba1b7 ]
Since commit 255c1c7279ab ("tc-testing: Allow test cases to be skipped") the variable test_ordinal doesn't exist in call_pre_case(). So it should not be accessed when an exception occurs.
This resolves the following splat:
... During handling of the above exception, another exception occurred:
Traceback (most recent call last): File ".../tdc.py", line 1028, in <module> main() File ".../tdc.py", line 1022, in main set_operation_mode(pm, parser, args, remaining) File ".../tdc.py", line 966, in set_operation_mode catresults = test_runner_serial(pm, args, alltests) File ".../tdc.py", line 642, in test_runner_serial (index, tsr) = test_runner(pm, args, alltests) File ".../tdc.py", line 536, in test_runner res = run_one_test(pm, args, index, tidx) File ".../tdc.py", line 419, in run_one_test pm.call_pre_case(tidx) File ".../tdc.py", line 146, in call_pre_case print('test_ordinal is {}'.format(test_ordinal)) NameError: name 'test_ordinal' is not defined
Fixes: 255c1c7279ab ("tc-testing: Allow test cases to be skipped") Signed-off-by: Simon Horman horms@kernel.org Acked-by: Jamal Hadi Salim jhs@mojatatu.com Link: https://patch.msgid.link/20240815-tdc-test-ordinal-v1-1-0255c122a427@kernel.... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/tc-testing/tdc.py | 1 - 1 file changed, 1 deletion(-)
diff --git a/tools/testing/selftests/tc-testing/tdc.py b/tools/testing/selftests/tc-testing/tdc.py index ee349187636fc..4f255cec0c22e 100755 --- a/tools/testing/selftests/tc-testing/tdc.py +++ b/tools/testing/selftests/tc-testing/tdc.py @@ -143,7 +143,6 @@ class PluginMgr: except Exception as ee: print('exception {} in call to pre_case for {} plugin'. format(ee, pgn_inst.__class__)) - print('test_ordinal is {}'.format(test_ordinal)) print('testid is {}'.format(caseinfo['id'])) raise
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit 7167395a4be7930ecac6a33b4e54d7e3dd9ee209 ]
Currently, we only check the latest senders's exit code. If the receiver report failed, it is not recoreded. Fix it by checking the exit code of all the involved processes.
Before: bad GRO lookup ok multiple GRO socks ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
failed $ echo $? 0
After: bad GRO lookup ok multiple GRO socks ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520
failed $ echo $? 1
Fixes: 3327a9c46352 ("selftests: add functionals test for UDP GRO") Suggested-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgro.sh | 44 ++++++++++++++++----------- 1 file changed, 27 insertions(+), 17 deletions(-)
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index 11a1ebda564fd..4659cf01e4384 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -46,17 +46,19 @@ run_one() { local -r all="$@" local -r tx_args=${all%rx*} local -r rx_args=${all#*rx} + local ret=0
cfg_veth
- ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${rx_args} && \ - echo "ok" || \ - echo "failed" & + ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${rx_args} & + local PID1=$!
wait_local_port_listen ${PEER_NS} 8000 udp ./udpgso_bench_tx ${tx_args} - ret=$? - wait $(jobs -p) + check_err $? + wait ${PID1} + check_err $? + [ "$ret" -eq 0 ] && echo "ok" || echo "failed" return $ret }
@@ -73,6 +75,7 @@ run_one_nat() { local -r all="$@" local -r tx_args=${all%rx*} local -r rx_args=${all#*rx} + local ret=0
if [[ ${tx_args} = *-4* ]]; then ipt_cmd=iptables @@ -93,16 +96,17 @@ run_one_nat() { # ... so that GRO will match the UDP_GRO enabled socket, but packets # will land on the 'plain' one ip netns exec "${PEER_NS}" ./udpgso_bench_rx -G ${family} -b ${addr1} -n 0 & - pid=$! - ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${family} -b ${addr2%/*} ${rx_args} && \ - echo "ok" || \ - echo "failed"& + local PID1=$! + ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${family} -b ${addr2%/*} ${rx_args} & + local PID2=$!
wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} - ret=$? - kill -INT $pid - wait $(jobs -p) + check_err $? + kill -INT ${PID1} + wait ${PID2} + check_err $? + [ "$ret" -eq 0 ] && echo "ok" || echo "failed" return $ret }
@@ -111,20 +115,26 @@ run_one_2sock() { local -r all="$@" local -r tx_args=${all%rx*} local -r rx_args=${all#*rx} + local ret=0
cfg_veth
ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 1000 -R 10 ${rx_args} -p 12345 & - ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 2000 -R 10 ${rx_args} && \ - echo "ok" || \ - echo "failed" & + local PID1=$! + ip netns exec "${PEER_NS}" ./udpgso_bench_rx -C 2000 -R 10 ${rx_args} & + local PID2=$!
wait_local_port_listen "${PEER_NS}" 12345 udp ./udpgso_bench_tx ${tx_args} -p 12345 + check_err $? wait_local_port_listen "${PEER_NS}" 8000 udp ./udpgso_bench_tx ${tx_args} - ret=$? - wait $(jobs -p) + check_err $? + wait ${PID1} + check_err $? + wait ${PID2} + check_err $? + [ "$ret" -eq 0 ] && echo "ok" || echo "failed" return $ret }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit d7818402b1d80347c764001583f6d63fa68c2e1a ]
After commit d7db7775ea2e ("net: veth: do not manipulate GRO when using XDP"), there is no need to load XDP program to enable GRO. On the other hand, the current test is failed due to loading the XDP program. e.g.
# selftests: net: udpgro.sh # ipv4 # no GRO ok # no GRO chk cmsg ok # GRO ./udpgso_bench_rx: recv: bad packet len, got 1472, expected 14720 # # failed
[...]
# bad GRO lookup ok # multiple GRO socks ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520 # # ./udpgso_bench_rx: recv: bad packet len, got 1452, expected 14520 # # failed ok 1 selftests: net: udpgro.sh
After fix, all the test passed.
# ./udpgro.sh ipv4 no GRO ok [...] multiple GRO socks ok
Fixes: d7db7775ea2e ("net: veth: do not manipulate GRO when using XDP") Reported-by: Yi Chen yiche@redhat.com Closes: https://issues.redhat.com/browse/RHEL-53858 Reviewed-by: Toke Høiland-Jørgensen toke@redhat.com Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/udpgro.sh | 9 +-------- 1 file changed, 1 insertion(+), 8 deletions(-)
diff --git a/tools/testing/selftests/net/udpgro.sh b/tools/testing/selftests/net/udpgro.sh index 4659cf01e4384..d5ffd8c9172e1 100755 --- a/tools/testing/selftests/net/udpgro.sh +++ b/tools/testing/selftests/net/udpgro.sh @@ -7,8 +7,6 @@ source net_helper.sh
readonly PEER_NS="ns-peer-$(mktemp -u XXXXXX)"
-BPF_FILE="xdp_dummy.bpf.o" - # set global exit status, but never reset nonzero one. check_err() { @@ -38,7 +36,7 @@ cfg_veth() { ip -netns "${PEER_NS}" addr add dev veth1 192.168.1.1/24 ip -netns "${PEER_NS}" addr add dev veth1 2001:db8::1/64 nodad ip -netns "${PEER_NS}" link set dev veth1 up - ip -n "${PEER_NS}" link set veth1 xdp object ${BPF_FILE} section xdp + ip netns exec "${PEER_NS}" ethtool -K veth1 gro on }
run_one() { @@ -206,11 +204,6 @@ run_all() { return $ret }
-if [ ! -f ${BPF_FILE} ]; then - echo "Missing ${BPF_FILE}. Run 'make' first" - exit -1 -fi - if [[ $# -eq 0 ]]; then run_all elif [[ $1 == "__subprocess" ]]; then
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit 565d121b69980637f040eb4d84289869cdaabedf ]
Its possible that two threads call tcp_sk_exit_batch() concurrently, once from the cleanup_net workqueue, once from a task that failed to clone a new netns. In the latter case, error unwinding calls the exit handlers in reverse order for the 'failed' netns.
tcp_sk_exit_batch() calls tcp_twsk_purge(). Problem is that since commit b099ce2602d8 ("net: Batch inet_twsk_purge"), this function picks up twsk in any dying netns, not just the one passed in via exit_batch list.
This means that the error unwind of setup_net() can "steal" and destroy timewait sockets belonging to the exiting netns.
This allows the netns exit worker to proceed to call
WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount));
without the expected 1 -> 0 transition, which then splats.
At same time, error unwind path that is also running inet_twsk_purge() will splat as well:
WARNING: .. at lib/refcount.c:31 refcount_warn_saturate+0x1ed/0x210 ... refcount_dec include/linux/refcount.h:351 [inline] inet_twsk_kill+0x758/0x9c0 net/ipv4/inet_timewait_sock.c:70 inet_twsk_deschedule_put net/ipv4/inet_timewait_sock.c:221 inet_twsk_purge+0x725/0x890 net/ipv4/inet_timewait_sock.c:304 tcp_sk_exit_batch+0x1c/0x170 net/ipv4/tcp_ipv4.c:3522 ops_exit_list+0x128/0x180 net/core/net_namespace.c:178 setup_net+0x714/0xb40 net/core/net_namespace.c:375 copy_net_ns+0x2f0/0x670 net/core/net_namespace.c:508 create_new_namespaces+0x3ea/0xb10 kernel/nsproxy.c:110
... because refcount_dec() of tw_refcount unexpectedly dropped to 0.
This doesn't seem like an actual bug (no tw sockets got lost and I don't see a use-after-free) but as erroneous trigger of debug check.
Add a mutex to force strict ordering: the task that calls tcp_twsk_purge() blocks other task from doing final _dec_and_test before mutex-owner has removed all tw sockets of dying netns.
Fixes: e9bd0cca09d1 ("tcp: Don't allocate tcp_death_row outside of struct netns_ipv4.") Reported-by: syzbot+8ea26396ff85d23a8929@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/0000000000003a5292061f5e4e19@google.com/ Link: https://lore.kernel.org/netdev/20240812140104.GA21559@breakpoint.cc/ Signed-off-by: Florian Westphal fw@strlen.de Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Jason Xing kerneljasonxing@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20240812222857.29837-1-fw@strlen.de Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp_ipv4.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index a541659b6562b..8f8f93716ff85 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -95,6 +95,8 @@ EXPORT_SYMBOL(tcp_hashinfo);
static DEFINE_PER_CPU(struct sock *, ipv4_tcp_sk);
+static DEFINE_MUTEX(tcp_exit_batch_mutex); + static u32 tcp_v4_init_seq(const struct sk_buff *skb) { return secure_tcp_seq(ip_hdr(skb)->daddr, @@ -3509,6 +3511,16 @@ static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list) { struct net *net;
+ /* make sure concurrent calls to tcp_sk_exit_batch from net_cleanup_work + * and failed setup_net error unwinding path are serialized. + * + * tcp_twsk_purge() handles twsk in any dead netns, not just those in + * net_exit_list, the thread that dismantles a particular twsk must + * do so without other thread progressing to refcount_dec_and_test() of + * tcp_death_row.tw_refcount. + */ + mutex_lock(&tcp_exit_batch_mutex); + tcp_twsk_purge(net_exit_list);
list_for_each_entry(net, net_exit_list, exit_list) { @@ -3516,6 +3528,8 @@ static void __net_exit tcp_sk_exit_batch(struct list_head *net_exit_list) WARN_ON_ONCE(!refcount_dec_and_test(&net->ipv4.tcp_death_row.tw_refcount)); tcp_fastopen_ctx_destroy(net); } + + mutex_unlock(&tcp_exit_batch_mutex); }
static struct pernet_operations __net_initdata tcp_sk_ops = {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jeremy Kerr jk@codeconstruct.com.au
[ Upstream commit ce335db0621648472f9bb4b7191eb2e13a5793cf ]
In the MCTP route input test, we're routing one skb, then (when delivery is expected) checking the resulting routed skb.
However, we're currently checking the original skb length, rather than the routed skb. Check the routed skb instead; the original will have been freed at this point.
Fixes: 8892c0490779 ("mctp: Add route input to socket tests") Reported-by: Dan Carpenter dan.carpenter@linaro.org Closes: https://lore.kernel.org/kernel-janitors/4ad204f0-94cf-46c5-bdab-49592addf315... Signed-off-by: Jeremy Kerr jk@codeconstruct.com.au Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240816-mctp-kunit-skb-fix-v1-1-3c367ac89c27@codec... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/mctp/test/route-test.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mctp/test/route-test.c b/net/mctp/test/route-test.c index 77e5dd4222580..8551dab1d1e69 100644 --- a/net/mctp/test/route-test.c +++ b/net/mctp/test/route-test.c @@ -366,7 +366,7 @@ static void mctp_test_route_input_sk(struct kunit *test)
skb2 = skb_recv_datagram(sock->sk, MSG_DONTWAIT, &rc); KUNIT_EXPECT_NOT_ERR_OR_NULL(test, skb2); - KUNIT_EXPECT_EQ(test, skb->len, 1); + KUNIT_EXPECT_EQ(test, skb2->len, 1);
skb_free_datagram(sock->sk, skb2);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 807067bf014d4a3ae2cc55bd3de16f22a01eb580 ]
syzkaller reported UAF in kcm_release(). [0]
The scenario is
1. Thread A builds a skb with MSG_MORE and sets kcm->seq_skb.
2. Thread A resumes building skb from kcm->seq_skb but is blocked by sk_stream_wait_memory()
3. Thread B calls sendmsg() concurrently, finishes building kcm->seq_skb and puts the skb to the write queue
4. Thread A faces an error and finally frees skb that is already in the write queue
5. kcm_release() does double-free the skb in the write queue
When a thread is building a MSG_MORE skb, another thread must not touch it.
Let's add a per-sk mutex and serialise kcm_sendmsg().
[0]: BUG: KASAN: slab-use-after-free in __skb_unlink include/linux/skbuff.h:2366 [inline] BUG: KASAN: slab-use-after-free in __skb_dequeue include/linux/skbuff.h:2385 [inline] BUG: KASAN: slab-use-after-free in __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline] BUG: KASAN: slab-use-after-free in __skb_queue_purge include/linux/skbuff.h:3181 [inline] BUG: KASAN: slab-use-after-free in kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691 Read of size 8 at addr ffff0000ced0fc80 by task syz-executor329/6167
CPU: 1 PID: 6167 Comm: syz-executor329 Tainted: G B 6.8.0-rc5-syzkaller-g9abbc24128bc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/25/2024 Call trace: dump_backtrace+0x1b8/0x1e4 arch/arm64/kernel/stacktrace.c:291 show_stack+0x2c/0x3c arch/arm64/kernel/stacktrace.c:298 __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xd0/0x124 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:377 [inline] print_report+0x178/0x518 mm/kasan/report.c:488 kasan_report+0xd8/0x138 mm/kasan/report.c:601 __asan_report_load8_noabort+0x20/0x2c mm/kasan/report_generic.c:381 __skb_unlink include/linux/skbuff.h:2366 [inline] __skb_dequeue include/linux/skbuff.h:2385 [inline] __skb_queue_purge_reason include/linux/skbuff.h:3175 [inline] __skb_queue_purge include/linux/skbuff.h:3181 [inline] kcm_release+0x170/0x4c8 net/kcm/kcmsock.c:1691 __sock_release net/socket.c:659 [inline] sock_close+0xa4/0x1e8 net/socket.c:1421 __fput+0x30c/0x738 fs/file_table.c:376 ____fput+0x20/0x30 fs/file_table.c:404 task_work_run+0x230/0x2e0 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x618/0x1f64 kernel/exit.c:871 do_group_exit+0x194/0x22c kernel/exit.c:1020 get_signal+0x1500/0x15ec kernel/signal.c:2893 do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249 do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Allocated by task 6166: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x40/0x78 mm/kasan/common.c:68 kasan_save_alloc_info+0x70/0x84 mm/kasan/generic.c:626 unpoison_slab_object mm/kasan/common.c:314 [inline] __kasan_slab_alloc+0x74/0x8c mm/kasan/common.c:340 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3813 [inline] slab_alloc_node mm/slub.c:3860 [inline] kmem_cache_alloc_node+0x204/0x4c0 mm/slub.c:3903 __alloc_skb+0x19c/0x3d8 net/core/skbuff.c:641 alloc_skb include/linux/skbuff.h:1296 [inline] kcm_sendmsg+0x1d3c/0x2124 net/kcm/kcmsock.c:783 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_sendmsg+0x220/0x2c0 net/socket.c:768 splice_to_socket+0x7cc/0xd58 fs/splice.c:889 do_splice_from fs/splice.c:941 [inline] direct_splice_actor+0xec/0x1d8 fs/splice.c:1164 splice_direct_to_actor+0x438/0xa0c fs/splice.c:1108 do_splice_direct_actor fs/splice.c:1207 [inline] do_splice_direct+0x1e4/0x304 fs/splice.c:1233 do_sendfile+0x460/0xb3c fs/read_write.c:1295 __do_sys_sendfile64 fs/read_write.c:1362 [inline] __se_sys_sendfile64 fs/read_write.c:1348 [inline] __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1348 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x54/0x168 arch/arm64/kernel/entry-common.c:712 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
Freed by task 6167: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x40/0x78 mm/kasan/common.c:68 kasan_save_free_info+0x5c/0x74 mm/kasan/generic.c:640 poison_slab_object+0x124/0x18c mm/kasan/common.c:241 __kasan_slab_free+0x3c/0x78 mm/kasan/common.c:257 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2121 [inline] slab_free mm/slub.c:4299 [inline] kmem_cache_free+0x15c/0x3d4 mm/slub.c:4363 kfree_skbmem+0x10c/0x19c __kfree_skb net/core/skbuff.c:1109 [inline] kfree_skb_reason+0x240/0x6f4 net/core/skbuff.c:1144 kfree_skb include/linux/skbuff.h:1244 [inline] kcm_release+0x104/0x4c8 net/kcm/kcmsock.c:1685 __sock_release net/socket.c:659 [inline] sock_close+0xa4/0x1e8 net/socket.c:1421 __fput+0x30c/0x738 fs/file_table.c:376 ____fput+0x20/0x30 fs/file_table.c:404 task_work_run+0x230/0x2e0 kernel/task_work.c:180 exit_task_work include/linux/task_work.h:38 [inline] do_exit+0x618/0x1f64 kernel/exit.c:871 do_group_exit+0x194/0x22c kernel/exit.c:1020 get_signal+0x1500/0x15ec kernel/signal.c:2893 do_signal+0x23c/0x3b44 arch/arm64/kernel/signal.c:1249 do_notify_resume+0x74/0x1f4 arch/arm64/kernel/entry-common.c:148 exit_to_user_mode_prepare arch/arm64/kernel/entry-common.c:169 [inline] exit_to_user_mode arch/arm64/kernel/entry-common.c:178 [inline] el0_svc+0xac/0x168 arch/arm64/kernel/entry-common.c:713 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:730 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:598
The buggy address belongs to the object at ffff0000ced0fc80 which belongs to the cache skbuff_head_cache of size 240 The buggy address is located 0 bytes inside of freed 240-byte region [ffff0000ced0fc80, ffff0000ced0fd70)
The buggy address belongs to the physical page: page:00000000d35f4ae4 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10ed0f flags: 0x5ffc00000000800(slab|node=0|zone=2|lastcpupid=0x7ff) page_type: 0xffffffff() raw: 05ffc00000000800 ffff0000c1cbf640 fffffdffc3423100 dead000000000004 raw: 0000000000000000 00000000000c000c 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff0000ced0fb80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff0000ced0fc00: fb fb fb fb fb fb fc fc fc fc fc fc fc fc fc fc
ffff0000ced0fc80: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff0000ced0fd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fc fc ffff0000ced0fd80: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
Fixes: ab7ac4eb9832 ("kcm: Kernel Connection Multiplexor module") Reported-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b72d86aa5df17ce74c60 Tested-by: syzbot+b72d86aa5df17ce74c60@syzkaller.appspotmail.com Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://patch.msgid.link/20240815220437.69511-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/kcm.h | 1 + net/kcm/kcmsock.c | 4 ++++ 2 files changed, 5 insertions(+)
diff --git a/include/net/kcm.h b/include/net/kcm.h index 90279e5e09a5c..441e993be634c 100644 --- a/include/net/kcm.h +++ b/include/net/kcm.h @@ -70,6 +70,7 @@ struct kcm_sock { struct work_struct tx_work; struct list_head wait_psock_list; struct sk_buff *seq_skb; + struct mutex tx_mutex; u32 tx_stopped : 1;
/* Don't use bit fields here, these are set under different locks */ diff --git a/net/kcm/kcmsock.c b/net/kcm/kcmsock.c index 2f191e50d4fc9..d4118c796290e 100644 --- a/net/kcm/kcmsock.c +++ b/net/kcm/kcmsock.c @@ -755,6 +755,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) !(msg->msg_flags & MSG_MORE) : !!(msg->msg_flags & MSG_EOR); int err = -EPIPE;
+ mutex_lock(&kcm->tx_mutex); lock_sock(sk);
/* Per tcp_sendmsg this should be in poll */ @@ -926,6 +927,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) KCM_STATS_ADD(kcm->stats.tx_bytes, copied);
release_sock(sk); + mutex_unlock(&kcm->tx_mutex); return copied;
out_error: @@ -951,6 +953,7 @@ static int kcm_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) sk->sk_write_space(sk);
release_sock(sk); + mutex_unlock(&kcm->tx_mutex); return err; }
@@ -1204,6 +1207,7 @@ static void init_kcm_sock(struct kcm_sock *kcm, struct kcm_mux *mux) spin_unlock_bh(&mux->lock);
INIT_WORK(&kcm->tx_work, kcm_tx_work); + mutex_init(&kcm->tx_mutex);
spin_lock_bh(&mux->rx_lock); kcm_rcv_ready(kcm);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit 1eacdd71b3436b54d5fc8218c4bb0187d92a6892 ]
The sequence counter nft_counter_seq is a per-CPU counter. There is no lock associated with it. nft_counter_do_eval() is using the same counter and disables BH which suggest that it can be invoked from a softirq. This in turn means that nft_counter_offload_stats(), which disables only preemption, can be interrupted by nft_counter_do_eval() leading to two writer for one seqcount_t. This can lead to loosing stats or reading statistics while they are updated.
Disable BH during stats update in nft_counter_offload_stats() to ensure one writer at a time.
Fixes: b72920f6e4a9d ("netfilter: nftables: counter hardware offload support") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_counter.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c index 291ed2026367e..16f40b503d379 100644 --- a/net/netfilter/nft_counter.c +++ b/net/netfilter/nft_counter.c @@ -265,7 +265,7 @@ static void nft_counter_offload_stats(struct nft_expr *expr, struct nft_counter *this_cpu; seqcount_t *myseq;
- preempt_disable(); + local_bh_disable(); this_cpu = this_cpu_ptr(priv->counter); myseq = this_cpu_ptr(&nft_counter_seq);
@@ -273,7 +273,7 @@ static void nft_counter_offload_stats(struct nft_expr *expr, this_cpu->packets += stats->pkts; this_cpu->bytes += stats->bytes; write_seqcount_end(myseq); - preempt_enable(); + local_bh_enable(); }
void nft_counter_init_seqcount(void)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sebastian Andrzej Siewior bigeasy@linutronix.de
[ Upstream commit a0b39e2dc7017ac667b70bdeee5293e410fab2fb ]
nft_counter_reset() resets the counter by subtracting the previously retrieved value from the counter. This is a write operation on the counter and as such it requires to be performed with a write sequence of nft_counter_seq to serialize against its possible reader.
Update the packets/ bytes within write-sequence of nft_counter_seq.
Fixes: d84701ecbcd6a ("netfilter: nft_counter: rework atomic dump and reset") Signed-off-by: Sebastian Andrzej Siewior bigeasy@linutronix.de Reviewed-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_counter.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/net/netfilter/nft_counter.c b/net/netfilter/nft_counter.c index 16f40b503d379..eab0dc66bee6b 100644 --- a/net/netfilter/nft_counter.c +++ b/net/netfilter/nft_counter.c @@ -107,11 +107,16 @@ static void nft_counter_reset(struct nft_counter_percpu_priv *priv, struct nft_counter *total) { struct nft_counter *this_cpu; + seqcount_t *myseq;
local_bh_disable(); this_cpu = this_cpu_ptr(priv->counter); + myseq = this_cpu_ptr(&nft_counter_seq); + + write_seqcount_begin(myseq); this_cpu->packets -= total->packets; this_cpu->bytes -= total->bytes; + write_seqcount_end(myseq); local_bh_enable(); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Bogendoerfer tbogendoerfer@suse.de
[ Upstream commit 4b3e33fcc38f7750604b065c55a43e94c5bc3145 ]
GRO code checks for matching layer 2 headers to see, if packet belongs to the same flow and because ip6 tunnel set dev->hard_header_len this check fails in cases, where it shouldn't. To fix this don't set hard_header_len, but use needed_headroom like ipv4/ip_tunnel.c does.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Thomas Bogendoerfer tbogendoerfer@suse.de Link: https://patch.msgid.link/20240815151419.109864-1-tbogendoerfer@suse.de Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_tunnel.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c index 9dee0c1279554..87dfb565a9f81 100644 --- a/net/ipv6/ip6_tunnel.c +++ b/net/ipv6/ip6_tunnel.c @@ -1507,7 +1507,8 @@ static void ip6_tnl_link_config(struct ip6_tnl *t) tdev = __dev_get_by_index(t->net, p->link);
if (tdev) { - dev->hard_header_len = tdev->hard_header_len + t_hlen; + dev->needed_headroom = tdev->hard_header_len + + tdev->needed_headroom + t_hlen; mtu = min_t(unsigned int, tdev->mtu, IP6_MAX_MTU);
mtu = mtu - t_hlen; @@ -1731,7 +1732,9 @@ ip6_tnl_siocdevprivate(struct net_device *dev, struct ifreq *ifr, int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu) { struct ip6_tnl *tnl = netdev_priv(dev); + int t_hlen;
+ t_hlen = tnl->hlen + sizeof(struct ipv6hdr); if (tnl->parms.proto == IPPROTO_IPV6) { if (new_mtu < IPV6_MIN_MTU) return -EINVAL; @@ -1740,10 +1743,10 @@ int ip6_tnl_change_mtu(struct net_device *dev, int new_mtu) return -EINVAL; } if (tnl->parms.proto == IPPROTO_IPV6 || tnl->parms.proto == 0) { - if (new_mtu > IP6_MAX_MTU - dev->hard_header_len) + if (new_mtu > IP6_MAX_MTU - dev->hard_header_len - t_hlen) return -EINVAL; } else { - if (new_mtu > IP_MAX_MTU - dev->hard_header_len) + if (new_mtu > IP_MAX_MTU - dev->hard_header_len - t_hlen) return -EINVAL; } WRITE_ONCE(dev->mtu, new_mtu); @@ -1887,12 +1890,11 @@ ip6_tnl_dev_init_gen(struct net_device *dev) t_hlen = t->hlen + sizeof(struct ipv6hdr);
dev->type = ARPHRD_TUNNEL6; - dev->hard_header_len = LL_MAX_HEADER + t_hlen; dev->mtu = ETH_DATA_LEN - t_hlen; if (!(t->parms.flags & IP6_TNL_F_IGN_ENCAP_LIMIT)) dev->mtu -= 8; dev->min_mtu = ETH_MIN_MTU; - dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len; + dev->max_mtu = IP6_MAX_MTU - dev->hard_header_len - t_hlen;
netdev_hold(dev, &t->dev_tracker, GFP_KERNEL); netdev_lockdep_set_classes(dev);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit fc59b9a5f7201b9f7272944596113a82cc7773d5 ]
Fix the return type which should be bool.
Fixes: 955b785ec6b3 ("bonding: fix suspicious RCU usage in bond_ipsec_offload_ok()") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 18 ++++++------------ 1 file changed, 6 insertions(+), 12 deletions(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 2ed0da0684906..2370da4632149 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -599,34 +599,28 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) struct net_device *real_dev; struct slave *curr_active; struct bonding *bond; - int err; + bool ok = false;
bond = netdev_priv(bond_dev); rcu_read_lock(); curr_active = rcu_dereference(bond->curr_active_slave); real_dev = curr_active->dev;
- if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) { - err = false; + if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) goto out; - }
- if (!xs->xso.real_dev) { - err = false; + if (!xs->xso.real_dev) goto out; - }
if (!real_dev->xfrmdev_ops || !real_dev->xfrmdev_ops->xdo_dev_offload_ok || - netif_is_bond_master(real_dev)) { - err = false; + netif_is_bond_master(real_dev)) goto out; - }
- err = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); + ok = real_dev->xfrmdev_ops->xdo_dev_offload_ok(skb, xs); out: rcu_read_unlock(); - return err; + return ok; }
static const struct xfrmdev_ops bond_xfrmdev_ops = {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit 95c90e4ad89d493a7a14fa200082e466e2548f9d ]
We must check if there is an active slave before dereferencing the pointer.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Reviewed-by: Eric Dumazet edumazet@google.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 2370da4632149..55841a0e05a47 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -604,6 +604,8 @@ static bool bond_ipsec_offload_ok(struct sk_buff *skb, struct xfrm_state *xs) bond = netdev_priv(bond_dev); rcu_read_lock(); curr_active = rcu_dereference(bond->curr_active_slave); + if (!curr_active) + goto out; real_dev = curr_active->dev;
if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit f8cde9805981c50d0c029063dc7d82821806fc44 ]
We shouldn't set real_dev to NULL because packets can be in transit and xfrm might call xdo_dev_offload_ok() in parallel. All callbacks assume real_dev is set.
Example trace: kernel: BUG: unable to handle page fault for address: 0000000000001030 kernel: bond0: (slave eni0np1): making interface the new active one kernel: #PF: supervisor write access in kernel mode kernel: #PF: error_code(0x0002) - not-present page kernel: PGD 0 P4D 0 kernel: Oops: 0002 [#1] PREEMPT SMP kernel: CPU: 4 PID: 2237 Comm: ping Not tainted 6.7.7+ #12 kernel: Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014 kernel: RIP: 0010:nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: Code: e0 0f 0b 48 83 7f 38 00 74 de 0f 0b 48 8b 47 08 48 8b 37 48 8b 78 40 e9 b2 e5 9a d7 66 90 0f 1f 44 00 00 48 8b 86 80 02 00 00 <83> 80 30 10 00 00 01 b8 01 00 00 00 c3 0f 1f 80 00 00 00 00 0f 1f kernel: bond0: (slave eni0np1): making interface the new active one kernel: RSP: 0018:ffffabde81553b98 EFLAGS: 00010246 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: kernel: RAX: 0000000000000000 RBX: ffff9eb404e74900 RCX: ffff9eb403d97c60 kernel: RDX: ffffffffc090de10 RSI: ffff9eb404e74900 RDI: ffff9eb3c5de9e00 kernel: RBP: ffff9eb3c0a42000 R08: 0000000000000010 R09: 0000000000000014 kernel: R10: 7974203030303030 R11: 3030303030303030 R12: 0000000000000000 kernel: R13: ffff9eb3c5de9e00 R14: ffffabde81553cc8 R15: ffff9eb404c53000 kernel: FS: 00007f2a77a3ad00(0000) GS:ffff9eb43bd00000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000001030 CR3: 00000001122ab000 CR4: 0000000000350ef0 kernel: bond0: (slave eni0np1): making interface the new active one kernel: Call Trace: kernel: <TASK> kernel: ? __die+0x1f/0x60 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? page_fault_oops+0x142/0x4c0 kernel: ? do_user_addr_fault+0x65/0x670 kernel: ? kvm_read_and_reset_apf_flags+0x3b/0x50 kernel: bond0: (slave eni0np1): making interface the new active one kernel: ? exc_page_fault+0x7b/0x180 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? nsim_bpf_uninit+0x50/0x50 [netdevsim] kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ? nsim_ipsec_offload_ok+0xc/0x20 [netdevsim] kernel: bond0: (slave eni0np1): making interface the new active one kernel: bond_ipsec_offload_ok+0x7b/0x90 [bonding] kernel: xfrm_output+0x61/0x3b0 kernel: bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA kernel: ip_push_pending_frames+0x56/0x80
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_main.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 55841a0e05a47..b257504a85347 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -582,7 +582,6 @@ static void bond_ipsec_del_sa_all(struct bonding *bond) } else { slave->dev->xfrmdev_ops->xdo_dev_state_delete(ipsec->xs); } - ipsec->xs->xso.real_dev = NULL; } spin_unlock_bh(&bond->ipsec_lock); rcu_read_unlock();
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Aleksandrov razor@blackwall.org
[ Upstream commit c4c5c5d2ef40a9f67a9241dc5422eac9ffe19547 ]
If the active slave is cleared manually the xfrm state is not flushed. This leads to xfrm add/del imbalance and adding the same state multiple times. For example when the device cannot handle anymore states we get: [ 1169.884811] bond0: (slave eni0np1): bond_ipsec_add_sa_all: failed to add SA because it's filled with the same state after multiple active slave clearings. This change also has a few nice side effects: user-space gets a notification for the change, the old device gets its mac address and promisc/mcast adjusted properly.
Fixes: 18cb261afd7b ("bonding: support hardware encryption offload to slaves") Signed-off-by: Nikolay Aleksandrov razor@blackwall.org Reviewed-by: Hangbin Liu liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_options.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index bc80fb6397dcd..95d59a18c0223 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -936,7 +936,7 @@ static int bond_option_active_slave_set(struct bonding *bond, /* check to see if we are clearing active */ if (!slave_dev) { netdev_dbg(bond->dev, "Clearing current active slave\n"); - RCU_INIT_POINTER(bond->curr_active_slave, NULL); + bond_change_active_slave(bond, NULL); bond_select_active_slave(bond); } else { struct slave *old_active = rtnl_dereference(bond->curr_active_slave);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 50b2143356e888777fc5bca023c39f34f404613a ]
Architectures that have PAGE_SIZE >= 8192 such as arm64 should act the same as x86 currently, meaning reuse of a page should only take place when no one else is busy with it.
Do two things independently of underlying PAGE_SIZE: - store the page count under ice_rx_buf::pgcnt - then act upon its value vs ice_rx_buf::pagecnt_bias when making the decision regarding page reuse
Fixes: 2b245cb29421 ("ice: Implement transmit and NAPI support") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 12 +++--------- 1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 8d25b69812698..50211188c1a7a 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -837,16 +837,15 @@ ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf) if (!dev_page_is_reusable(page)) return false;
-#if (PAGE_SIZE < 8192) /* if we are only owner of page we can reuse it */ if (unlikely(rx_buf->pgcnt - pagecnt_bias > 1)) return false; -#else +#if (PAGE_SIZE >= 8192) #define ICE_LAST_OFFSET \ (SKB_WITH_OVERHEAD(PAGE_SIZE) - ICE_RXBUF_2048) if (rx_buf->page_offset > ICE_LAST_OFFSET) return false; -#endif /* PAGE_SIZE < 8192) */ +#endif /* PAGE_SIZE >= 8192) */
/* If we have drained the page fragment pool we need to update * the pagecnt_bias and page count so that we fully restock the @@ -949,12 +948,7 @@ ice_get_rx_buf(struct ice_rx_ring *rx_ring, const unsigned int size, struct ice_rx_buf *rx_buf;
rx_buf = &rx_ring->rx_buf[ntc]; - rx_buf->pgcnt = -#if (PAGE_SIZE < 8192) - page_count(rx_buf->page); -#else - 0; -#endif + rx_buf->pgcnt = page_count(rx_buf->page); prefetchw(rx_buf->page);
if (!size)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit b966ad832942b5a11e002f9b5ef102b08425b84a ]
For bigger PAGE_SIZE archs, ice driver works on 3k Rx buffers. Therefore, ICE_LAST_OFFSET should take into account ICE_RXBUF_3072, not ICE_RXBUF_2048.
Fixes: 7237f5b0dba4 ("ice: introduce legacy Rx flag") Suggested-by: Luiz Capitulino luizcap@redhat.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 50211188c1a7a..4b690952bb403 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -842,7 +842,7 @@ ice_can_reuse_rx_page(struct ice_rx_buf *rx_buf) return false; #if (PAGE_SIZE >= 8192) #define ICE_LAST_OFFSET \ - (SKB_WITH_OVERHEAD(PAGE_SIZE) - ICE_RXBUF_2048) + (SKB_WITH_OVERHEAD(PAGE_SIZE) - ICE_RXBUF_3072) if (rx_buf->page_offset > ICE_LAST_OFFSET) return false; #endif /* PAGE_SIZE >= 8192) */
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit d53d4dcce69be5773e2d0878c9899ebfbf58c393 ]
When working on multi-buffer packet on arch that has PAGE_SIZE >= 8192, truesize is calculated and stored in xdp_buff::frame_sz per each processed Rx buffer. This means that frame_sz will contain the truesize based on last received buffer, but commit 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") assumed this value will be constant for each buffer, which breaks the page recycling scheme and mess up the way we update the page::page_offset.
To fix this, let us work on constant truesize when PAGE_SIZE >= 8192 instead of basing this on size of a packet read from Rx descriptor. This way we can simplify the code and avoid calculating truesize per each received frame and on top of that when using xdp_update_skb_shared_info(), current formula for truesize update will be valid.
This means ice_rx_frame_truesize() can be removed altogether. Furthermore, first call to it within ice_clean_rx_irq() for 4k PAGE_SIZE was redundant as xdp_buff::frame_sz is initialized via xdp_init_buff() in ice_vsi_cfg_rxq(). This should have been removed at the point where xdp_buff struct started to be a member of ice_rx_ring and it was no longer a stack based variable.
There are two fixes tags as my understanding is that the first one exposed us to broken truesize and page_offset handling and then second introduced broken skb_shared_info update in ice_{construct,build}_skb().
Reported-and-tested-by: Luiz Capitulino luizcap@redhat.com Closes: https://lore.kernel.org/netdev/8f9e2a5c-fd30-4206-9311-946a06d031bb@redhat.c... Fixes: 1dc1a7e7f410 ("ice: Centrallize Rx buffer recycling") Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Tested-by: Chandan Kumar Rout chandanx.rout@intel.com (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_base.c | 21 ++++++++++++++- drivers/net/ethernet/intel/ice/ice_txrx.c | 33 ----------------------- 2 files changed, 20 insertions(+), 34 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 1facf179a96fd..f448d3a845642 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -512,6 +512,25 @@ static void ice_xsk_pool_fill_cb(struct ice_rx_ring *ring) xsk_pool_fill_cb(ring->xsk_pool, &desc); }
+/** + * ice_get_frame_sz - calculate xdp_buff::frame_sz + * @rx_ring: the ring being configured + * + * Return frame size based on underlying PAGE_SIZE + */ +static unsigned int ice_get_frame_sz(struct ice_rx_ring *rx_ring) +{ + unsigned int frame_sz; + +#if (PAGE_SIZE >= 8192) + frame_sz = rx_ring->rx_buf_len; +#else + frame_sz = ice_rx_pg_size(rx_ring) / 2; +#endif + + return frame_sz; +} + /** * ice_vsi_cfg_rxq - Configure an Rx queue * @ring: the ring being configured @@ -576,7 +595,7 @@ static int ice_vsi_cfg_rxq(struct ice_rx_ring *ring) } }
- xdp_init_buff(&ring->xdp, ice_rx_pg_size(ring) / 2, &ring->xdp_rxq); + xdp_init_buff(&ring->xdp, ice_get_frame_sz(ring), &ring->xdp_rxq); ring->xdp.data = NULL; ring->xdp_ext.pkt_ctx = &ring->pkt_ctx; err = ice_setup_rx_ctx(ring); diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 4b690952bb403..c9bc3f1add5d3 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -521,30 +521,6 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring) return -ENOMEM; }
-/** - * ice_rx_frame_truesize - * @rx_ring: ptr to Rx ring - * @size: size - * - * calculate the truesize with taking into the account PAGE_SIZE of - * underlying arch - */ -static unsigned int -ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, const unsigned int size) -{ - unsigned int truesize; - -#if (PAGE_SIZE < 8192) - truesize = ice_rx_pg_size(rx_ring) / 2; /* Must be power-of-2 */ -#else - truesize = rx_ring->rx_offset ? - SKB_DATA_ALIGN(rx_ring->rx_offset + size) + - SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) : - SKB_DATA_ALIGN(size); -#endif - return truesize; -} - /** * ice_run_xdp - Executes an XDP program on initialized xdp_buff * @rx_ring: Rx ring @@ -1154,11 +1130,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) bool failure; u32 first;
- /* Frame size depend on rx_ring setup when PAGE_SIZE=4K */ -#if (PAGE_SIZE < 8192) - xdp->frame_sz = ice_rx_frame_truesize(rx_ring, 0); -#endif - xdp_prog = READ_ONCE(rx_ring->xdp_prog); if (xdp_prog) { xdp_ring = rx_ring->xdp_ring; @@ -1217,10 +1188,6 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) hard_start = page_address(rx_buf->page) + rx_buf->page_offset - offset; xdp_prepare_buff(xdp, hard_start, offset, size, !!offset); -#if (PAGE_SIZE > 4096) - /* At larger PAGE_SIZE, frame_sz depend on len size */ - xdp->frame_sz = ice_rx_frame_truesize(rx_ring, size); -#endif xdp_buff_clear_frags_flag(xdp); } else if (ice_add_xdp_frag(rx_ring, xdp, rx_buf, size)) { break;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michal Swiatkowski michal.swiatkowski@linux.intel.com
[ Upstream commit 503ab6ee40fc103ea55cc9e50bb879e571d65aac ]
Use always the same pf id in devlink port number. When doing pass-through the PF to VM bus info func number can be any value.
Fixes: 2ae0aa4758b0 ("ice: Move devlink port to PF/VF struct") Reviewed-by: Wojciech Drewek wojciech.drewek@intel.com Suggested-by: Jiri Pirko jiri@resnulli.us Signed-off-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Signed-off-by: Tony Nguyen anthony.l.nguyen@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/devlink/devlink_port.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/devlink/devlink_port.c b/drivers/net/ethernet/intel/ice/devlink/devlink_port.c index 13e6790d3cae7..afcf64dab48a1 100644 --- a/drivers/net/ethernet/intel/ice/devlink/devlink_port.c +++ b/drivers/net/ethernet/intel/ice/devlink/devlink_port.c @@ -337,7 +337,7 @@ int ice_devlink_create_pf_port(struct ice_pf *pf) return -EIO;
attrs.flavour = DEVLINK_PORT_FLAVOUR_PHYSICAL; - attrs.phys.port_number = pf->hw.bus.func; + attrs.phys.port_number = pf->hw.pf_id;
/* As FW supports only port split options for whole device, * set port split options only for first PF. @@ -399,7 +399,7 @@ int ice_devlink_create_vf_port(struct ice_vf *vf) return -EINVAL;
attrs.flavour = DEVLINK_PORT_FLAVOUR_PCI_VF; - attrs.pci_vf.pf = pf->hw.bus.func; + attrs.pci_vf.pf = pf->hw.pf_id; attrs.pci_vf.vf = vf->vf_id;
ice_devlink_set_switch_id(pf, &attrs.switch_id);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit c50e7475961c36ec4d21d60af055b32f9436b431 ]
The dpaa2_switch_add_bufs() function returns the number of bufs that it was able to add. It returns BUFS_PER_CMD (7) for complete success or a smaller number if there are not enough pages available. However, the error checking is looking at the total number of bufs instead of the number which were added on this iteration. Thus the error checking only works correctly for the first iteration through the loop and subsequent iterations are always counted as a success.
Fix this by checking only the bufs added in the current iteration.
Fixes: 0b1b71370458 ("staging: dpaa2-switch: handle Rx path on control interface") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Ioana Ciornei ioana.ciornei@nxp.com Tested-by: Ioana Ciornei ioana.ciornei@nxp.com Link: https://patch.msgid.link/eec27f30-b43f-42b6-b8ee-04a6f83423b6@stanley.mounta... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c index a71f848adc054..a293b08f36d46 100644 --- a/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c +++ b/drivers/net/ethernet/freescale/dpaa2/dpaa2-switch.c @@ -2638,13 +2638,14 @@ static int dpaa2_switch_refill_bp(struct ethsw_core *ethsw)
static int dpaa2_switch_seed_bp(struct ethsw_core *ethsw) { - int *count, i; + int *count, ret, i;
for (i = 0; i < DPAA2_ETHSW_NUM_BUFS; i += BUFS_PER_CMD) { + ret = dpaa2_switch_add_bufs(ethsw, ethsw->bpid); count = ðsw->buf_count; - *count += dpaa2_switch_add_bufs(ethsw, ethsw->bpid); + *count += ret;
- if (unlikely(*count < BUFS_PER_CMD)) + if (unlikely(ret < BUFS_PER_CMD)) return -ENOMEM; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paolo Abeni pabeni@redhat.com
[ Upstream commit 8aba27c4a5020abdf60149239198297f88338a8d ]
Sabrina reports that the igb driver does not cope well with large MAX_SKB_FRAG values: setting MAX_SKB_FRAG to 45 causes payload corruption on TX.
An easy reproducer is to run ssh to connect to the machine. With MAX_SKB_FRAGS=17 it works, with MAX_SKB_FRAGS=45 it fails. This has been reported originally in https://bugzilla.redhat.com/show_bug.cgi?id=2265320
The root cause of the issue is that the driver does not take into account properly the (possibly large) shared info size when selecting the ring layout, and will try to fit two packets inside the same 4K page even when the 1st fraglist will trump over the 2nd head.
Address the issue by checking if 2K buffers are insufficient.
Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS") Reported-by: Jan Tluka jtluka@redhat.com Reported-by: Jirka Hladky jhladky@redhat.com Reported-by: Sabrina Dubroca sd@queasysnail.net Tested-by: Sabrina Dubroca sd@queasysnail.net Tested-by: Corinna Vinschen vinschen@redhat.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Corinna Vinschen vinschen@redhat.com Link: https://patch.msgid.link/20240816152034.1453285-1-vinschen@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/igb/igb_main.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/intel/igb/igb_main.c b/drivers/net/ethernet/intel/igb/igb_main.c index fce2930ae6af7..b6aa449aa56af 100644 --- a/drivers/net/ethernet/intel/igb/igb_main.c +++ b/drivers/net/ethernet/intel/igb/igb_main.c @@ -4809,6 +4809,7 @@ static void igb_set_rx_buffer_len(struct igb_adapter *adapter,
#if (PAGE_SIZE < 8192) if (adapter->max_frame_size > IGB_MAX_FRAME_BUILD_SKB || + IGB_2K_TOO_SMALL_WITH_PADDING || rd32(E1000_RCTL) & E1000_RCTL_SBP) set_ring_uses_large_buffer(rx_ring); #endif
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Joseph Huang Joseph.Huang@garmin.com
[ Upstream commit 528876d867a23b5198022baf2e388052ca67c952 ]
If an ATU violation was caused by a CPU Load operation, the SPID could be larger than DSA_MAX_PORTS (the size of mv88e6xxx_chip.ports[] array).
Fixes: 75c05a74e745 ("net: dsa: mv88e6xxx: Fix counting of ATU violations") Signed-off-by: Joseph Huang Joseph.Huang@garmin.com Reviewed-by: Andrew Lunn andrew@lunn.ch Link: https://patch.msgid.link/20240819235251.1331763-1-Joseph.Huang@garmin.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mv88e6xxx/global1_atu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/dsa/mv88e6xxx/global1_atu.c b/drivers/net/dsa/mv88e6xxx/global1_atu.c index ce3b3690c3c05..c47f068f56b32 100644 --- a/drivers/net/dsa/mv88e6xxx/global1_atu.c +++ b/drivers/net/dsa/mv88e6xxx/global1_atu.c @@ -457,7 +457,8 @@ static irqreturn_t mv88e6xxx_g1_atu_prob_irq_thread_fn(int irq, void *dev_id) trace_mv88e6xxx_atu_full_violation(chip->dev, spid, entry.portvec, entry.mac, fid); - chip->ports[spid].atu_full_violation++; + if (spid < ARRAY_SIZE(chip->ports)) + chip->ports[spid].atu_full_violation++; }
return IRQ_HANDLED;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephen Hemminger stephen@networkplumber.org
[ Upstream commit c07ff8592d57ed258afee5a5e04991a48dbaf382 ]
There is a bug in netem_enqueue() introduced by commit 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec") that can lead to a use-after-free.
This commit made netem_enqueue() always return NET_XMIT_SUCCESS when a packet is duplicated, which can cause the parent qdisc's q.qlen to be mistakenly incremented. When this happens qlen_notify() may be skipped on the parent during destruction, leaving a dangling pointer for some classful qdiscs like DRR.
There are two ways for the bug happen:
- If the duplicated packet is dropped by rootq->enqueue() and then the original packet is also dropped. - If rootq->enqueue() sends the duplicated packet to a different qdisc and the original packet is dropped.
In both cases NET_XMIT_SUCCESS is returned even though no packets are enqueued at the netem qdisc.
The fix is to defer the enqueue of the duplicate packet until after the original packet has been guaranteed to return NET_XMIT_SUCCESS.
Fixes: 5845f706388a ("net: netem: fix skb length BUG_ON in __skb_to_sgvec") Reported-by: Budimir Markovic markovicbudimir@gmail.com Signed-off-by: Stephen Hemminger stephen@networkplumber.org Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240819175753.5151-1-stephen@networkplumber.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_netem.c | 47 ++++++++++++++++++++++++++----------------- 1 file changed, 29 insertions(+), 18 deletions(-)
diff --git a/net/sched/sch_netem.c b/net/sched/sch_netem.c index edc72962ae63a..0f8d581438c39 100644 --- a/net/sched/sch_netem.c +++ b/net/sched/sch_netem.c @@ -446,12 +446,10 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, struct netem_sched_data *q = qdisc_priv(sch); /* We don't fill cb now as skb_unshare() may invalidate it */ struct netem_skb_cb *cb; - struct sk_buff *skb2; + struct sk_buff *skb2 = NULL; struct sk_buff *segs = NULL; unsigned int prev_len = qdisc_pkt_len(skb); int count = 1; - int rc = NET_XMIT_SUCCESS; - int rc_drop = NET_XMIT_DROP;
/* Do not fool qdisc_drop_all() */ skb->prev = NULL; @@ -480,19 +478,11 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, skb_orphan_partial(skb);
/* - * If we need to duplicate packet, then re-insert at top of the - * qdisc tree, since parent queuer expects that only one - * skb will be queued. + * If we need to duplicate packet, then clone it before + * original is modified. */ - if (count > 1 && (skb2 = skb_clone(skb, GFP_ATOMIC)) != NULL) { - struct Qdisc *rootq = qdisc_root_bh(sch); - u32 dupsave = q->duplicate; /* prevent duplicating a dup... */ - - q->duplicate = 0; - rootq->enqueue(skb2, rootq, to_free); - q->duplicate = dupsave; - rc_drop = NET_XMIT_SUCCESS; - } + if (count > 1) + skb2 = skb_clone(skb, GFP_ATOMIC);
/* * Randomized packet corruption. @@ -504,7 +494,8 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, if (skb_is_gso(skb)) { skb = netem_segment(skb, sch, to_free); if (!skb) - return rc_drop; + goto finish_segs; + segs = skb->next; skb_mark_not_on_list(skb); qdisc_skb_cb(skb)->pkt_len = skb->len; @@ -530,7 +521,24 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, /* re-link segs, so that qdisc_drop_all() frees them all */ skb->next = segs; qdisc_drop_all(skb, sch, to_free); - return rc_drop; + if (skb2) + __qdisc_drop(skb2, to_free); + return NET_XMIT_DROP; + } + + /* + * If doing duplication then re-insert at top of the + * qdisc tree, since parent queuer expects that only one + * skb will be queued. + */ + if (skb2) { + struct Qdisc *rootq = qdisc_root_bh(sch); + u32 dupsave = q->duplicate; /* prevent duplicating a dup... */ + + q->duplicate = 0; + rootq->enqueue(skb2, rootq, to_free); + q->duplicate = dupsave; + skb2 = NULL; }
qdisc_qstats_backlog_inc(sch, skb); @@ -601,9 +609,12 @@ static int netem_enqueue(struct sk_buff *skb, struct Qdisc *sch, }
finish_segs: + if (skb2) + __qdisc_drop(skb2, to_free); + if (segs) { unsigned int len, last_len; - int nb; + int rc, nb;
len = skb ? skb->len : 0; nb = skb ? 1 : 0;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Felix Fietkau nbd@nbd.name
[ Upstream commit b128ed5ab27330deeeaf51ea8bb69f1442a96f7f ]
When assembling fraglist GSO packets, udp4_gro_complete does not set skb->csum_start, which makes the extra validation in __udp_gso_segment fail.
Fixes: 89add40066f9 ("net: drop bad gso csum_start and offset in virtio_net_hdr") Signed-off-by: Felix Fietkau nbd@nbd.name Reviewed-by: Willem de Bruijn willemb@google.com Link: https://patch.msgid.link/20240819150621.59833-1-nbd@nbd.name Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/udp_offload.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index ee9af921556a7..5b54f4f32b1cd 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -279,7 +279,8 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, return ERR_PTR(-EINVAL);
if (unlikely(skb_checksum_start(gso_skb) != - skb_transport_header(gso_skb))) + skb_transport_header(gso_skb) && + !(skb_shinfo(gso_skb)->gso_type & SKB_GSO_FRAGLIST))) return ERR_PTR(-EINVAL);
if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit f8669d7b5f5d2d88959456ae9123d8bb6fdc1ebe ]
Source the ethtool library from the correct path and avoid the following error:
./ethtool_lanes.sh: line 14: ./../../../net/forwarding/ethtool_lib.sh: No such file or directory
Fixes: 40d269c000bd ("selftests: forwarding: Move several selftests") Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: Petr Machata petrm@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/2112faff02e536e1ac14beb4c2be09c9574b90ae.1724150067... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/drivers/net/mlxsw/ethtool_lanes.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/mlxsw/ethtool_lanes.sh b/tools/testing/selftests/drivers/net/mlxsw/ethtool_lanes.sh index 877cd6df94a10..fe905a7f34b3c 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/ethtool_lanes.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/ethtool_lanes.sh @@ -2,6 +2,7 @@ # SPDX-License-Identifier: GPL-2.0
lib_dir=$(dirname $0)/../../../net/forwarding +ethtool_lib_dir=$(dirname $0)/../hw
ALL_TESTS=" autoneg @@ -11,7 +12,7 @@ ALL_TESTS=" NUM_NETIFS=2 : ${TIMEOUT:=30000} # ms source $lib_dir/lib.sh -source $lib_dir/ethtool_lib.sh +source $ethtool_lib_dir/ethtool_lib.sh
setup_prepare() {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit faa389b2fbaaec7fd27a390b4896139f9da662e3 ]
syzbot reported an UAF in ip6_send_skb() [1]
After ip6_local_out() has returned, we no longer can safely dereference rt, unless we hold rcu_read_lock().
A similar issue has been fixed in commit a688caa34beb ("ipv6: take rcu lock in rawv6_send_hdrinc()")
Another potential issue in ip6_finish_output2() is handled in a separate patch.
[1] BUG: KASAN: slab-use-after-free in ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964 Read of size 8 at addr ffff88806dde4858 by task syz.1.380/6530
CPU: 1 UID: 0 PID: 6530 Comm: syz.1.380 Not tainted 6.11.0-rc3-syzkaller-00306-gdf6cbc62cc9b #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 ip6_send_skb+0x18d/0x230 net/ipv6/ip6_output.c:1964 rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588 rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 sock_write_iter+0x2dd/0x400 net/socket.c:1160 do_iter_readv_writev+0x60a/0x890 vfs_writev+0x37c/0xbb0 fs/read_write.c:971 do_writev+0x1b1/0x350 fs/read_write.c:1018 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f936bf79e79 Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007f936cd7f038 EFLAGS: 00000246 ORIG_RAX: 0000000000000014 RAX: ffffffffffffffda RBX: 00007f936c115f80 RCX: 00007f936bf79e79 RDX: 0000000000000001 RSI: 0000000020000040 RDI: 0000000000000004 RBP: 00007f936bfe7916 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 0000000000000000 R14: 00007f936c115f80 R15: 00007fff2860a7a8 </TASK>
Allocated by task 6530: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 unpoison_slab_object mm/kasan/common.c:312 [inline] __kasan_slab_alloc+0x66/0x80 mm/kasan/common.c:338 kasan_slab_alloc include/linux/kasan.h:201 [inline] slab_post_alloc_hook mm/slub.c:3988 [inline] slab_alloc_node mm/slub.c:4037 [inline] kmem_cache_alloc_noprof+0x135/0x2a0 mm/slub.c:4044 dst_alloc+0x12b/0x190 net/core/dst.c:89 ip6_blackhole_route+0x59/0x340 net/ipv6/route.c:2670 make_blackhole net/xfrm/xfrm_policy.c:3120 [inline] xfrm_lookup_route+0xd1/0x1c0 net/xfrm/xfrm_policy.c:3313 ip6_dst_lookup_flow+0x13e/0x180 net/ipv6/ip6_output.c:1257 rawv6_sendmsg+0x1283/0x23c0 net/ipv6/raw.c:898 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2597 ___sys_sendmsg net/socket.c:2651 [inline] __sys_sendmsg+0x2b0/0x3a0 net/socket.c:2680 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 45: kasan_save_stack mm/kasan/common.c:47 [inline] kasan_save_track+0x3f/0x80 mm/kasan/common.c:68 kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579 poison_slab_object+0xe0/0x150 mm/kasan/common.c:240 __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256 kasan_slab_free include/linux/kasan.h:184 [inline] slab_free_hook mm/slub.c:2252 [inline] slab_free mm/slub.c:4473 [inline] kmem_cache_free+0x145/0x350 mm/slub.c:4548 dst_destroy+0x2ac/0x460 net/core/dst.c:124 rcu_do_batch kernel/rcu/tree.c:2569 [inline] rcu_core+0xafd/0x1830 kernel/rcu/tree.c:2843 handle_softirqs+0x2c4/0x970 kernel/softirq.c:554 __do_softirq kernel/softirq.c:588 [inline] invoke_softirq kernel/softirq.c:428 [inline] __irq_exit_rcu+0xf4/0x1c0 kernel/softirq.c:637 irq_exit_rcu+0x9/0x30 kernel/softirq.c:649 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1043 [inline] sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1043 asm_sysvec_apic_timer_interrupt+0x1a/0x20 arch/x86/include/asm/idtentry.h:702
Last potentially related work creation: kasan_save_stack+0x3f/0x60 mm/kasan/common.c:47 __kasan_record_aux_stack+0xac/0xc0 mm/kasan/generic.c:541 __call_rcu_common kernel/rcu/tree.c:3106 [inline] call_rcu+0x167/0xa70 kernel/rcu/tree.c:3210 refdst_drop include/net/dst.h:263 [inline] skb_dst_drop include/net/dst.h:275 [inline] nf_ct_frag6_queue net/ipv6/netfilter/nf_conntrack_reasm.c:306 [inline] nf_ct_frag6_gather+0xb9a/0x2080 net/ipv6/netfilter/nf_conntrack_reasm.c:485 ipv6_defrag+0x2c8/0x3c0 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:67 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xc3/0x220 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] __ip6_local_out+0x6fa/0x800 net/ipv6/output_core.c:143 ip6_local_out+0x26/0x70 net/ipv6/output_core.c:153 ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1959 rawv6_push_pending_frames+0x75c/0x9e0 net/ipv6/raw.c:588 rawv6_sendmsg+0x19c7/0x23c0 net/ipv6/raw.c:926 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0x1a6/0x270 net/socket.c:745 sock_write_iter+0x2dd/0x400 net/socket.c:1160 do_iter_readv_writev+0x60a/0x890
Fixes: 0625491493d9 ("ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/20240820160859.3786976-2-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 784424ac41477..d44ddce4c9f4d 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1953,6 +1953,7 @@ int ip6_send_skb(struct sk_buff *skb) struct rt6_info *rt = dst_rt6_info(skb_dst(skb)); int err;
+ rcu_read_lock(); err = ip6_local_out(net, skb->sk, skb); if (err) { if (err > 0) @@ -1962,6 +1963,7 @@ int ip6_send_skb(struct sk_buff *skb) IPSTATS_MIB_OUTDISCARDS); }
+ rcu_read_unlock(); return err; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit da273b377ae0d9bd255281ed3c2adb228321687b ]
If skb_expand_head() returns NULL, skb has been freed and associated dst/idev could also have been freed.
We need to hold rcu_read_lock() to make sure the dst and associated idev are alive.
Fixes: 5796015fa968 ("ipv6: allocate enough headroom in ip6_finish_output2()") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Vasily Averin vasily.averin@linux.dev Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/20240820160859.3786976-3-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index d44ddce4c9f4d..8778431acffda 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -70,11 +70,15 @@ static int ip6_finish_output2(struct net *net, struct sock *sk, struct sk_buff *
/* Be paranoid, rather than too clever. */ if (unlikely(hh_len > skb_headroom(skb)) && dev->header_ops) { + /* Make sure idev stays alive */ + rcu_read_lock(); skb = skb_expand_head(skb, hh_len); if (!skb) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); + rcu_read_unlock(); return -ENOMEM; } + rcu_read_unlock(); }
hdr = ipv6_hdr(skb);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit 2d5ff7e339d04622d8282661df36151906d0e1c7 ]
If skb_expand_head() returns NULL, skb has been freed and the associated dst/idev could also have been freed.
We must use rcu_read_lock() to prevent a possible UAF.
Fixes: 0c9f227bee11 ("ipv6: use skb_expand_head in ip6_xmit") Signed-off-by: Eric Dumazet edumazet@google.com Cc: Vasily Averin vasily.averin@linux.dev Reviewed-by: David Ahern dsahern@kernel.org Link: https://patch.msgid.link/20240820160859.3786976-4-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/ip6_output.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 8778431acffda..c49344d8311ab 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -287,11 +287,15 @@ int ip6_xmit(const struct sock *sk, struct sk_buff *skb, struct flowi6 *fl6, head_room += opt->opt_nflen + opt->opt_flen;
if (unlikely(head_room > skb_headroom(skb))) { + /* Make sure idev stays alive */ + rcu_read_lock(); skb = skb_expand_head(skb, head_room); if (!skb) { IP6_INC_STATS(net, idev, IPSTATS_MIB_OUTDISCARDS); + rcu_read_unlock(); return -ENOBUFS; } + rcu_read_unlock(); }
if (opt) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Somnath Kotur somnath.kotur@broadcom.com
[ Upstream commit 8baeef7616d5194045c5a6b97fd1246b87c55b13 ]
Remove the dma_unmap_page_attrs() call in the driver's XDP_REDIRECT code path. This should have been removed when we let the page pool handle the DMA mapping. This bug causes the warning:
WARNING: CPU: 7 PID: 59 at drivers/iommu/dma-iommu.c:1198 iommu_dma_unmap_page+0xd5/0x100 CPU: 7 PID: 59 Comm: ksoftirqd/7 Tainted: G W 6.8.0-1010-gcp #11-Ubuntu Hardware name: Dell Inc. PowerEdge R7525/0PYVT1, BIOS 2.15.2 04/02/2024 RIP: 0010:iommu_dma_unmap_page+0xd5/0x100 Code: 89 ee 48 89 df e8 cb f2 69 ff 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 31 f6 31 ff 45 31 c0 e9 ab 17 71 00 <0f> 0b 48 83 c4 08 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 d2 31 c9 RSP: 0018:ffffab1fc0597a48 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff99ff838280c8 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 RBP: ffffab1fc0597a78 R08: 0000000000000002 R09: ffffab1fc0597c1c R10: ffffab1fc0597cd3 R11: ffff99ffe375acd8 R12: 00000000e65b9000 R13: 0000000000000050 R14: 0000000000001000 R15: 0000000000000002 FS: 0000000000000000(0000) GS:ffff9a06efb80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000565c34c37210 CR3: 00000005c7e3e000 CR4: 0000000000350ef0 ? show_regs+0x6d/0x80 ? __warn+0x89/0x150 ? iommu_dma_unmap_page+0xd5/0x100 ? report_bug+0x16a/0x190 ? handle_bug+0x51/0xa0 ? exc_invalid_op+0x18/0x80 ? iommu_dma_unmap_page+0xd5/0x100 ? iommu_dma_unmap_page+0x35/0x100 dma_unmap_page_attrs+0x55/0x220 ? bpf_prog_4d7e87c0d30db711_xdp_dispatcher+0x64/0x9f bnxt_rx_xdp+0x237/0x520 [bnxt_en] bnxt_rx_pkt+0x640/0xdd0 [bnxt_en] __bnxt_poll_work+0x1a1/0x3d0 [bnxt_en] bnxt_poll+0xaa/0x1e0 [bnxt_en] __napi_poll+0x33/0x1e0 net_rx_action+0x18a/0x2f0
Fixes: 578fcfd26e2a ("bnxt_en: Let the page pool manage the DMA mapping") Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Reviewed-by: Kalesh AP kalesh-anakkur.purayil@broadcom.com Signed-off-by: Somnath Kotur somnath.kotur@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://patch.msgid.link/20240820203415.168178-1-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c index 345681d5007e3..f88b641533fcc 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c @@ -297,11 +297,6 @@ bool bnxt_rx_xdp(struct bnxt *bp, struct bnxt_rx_ring_info *rxr, u16 cons, * redirect is coming from a frame received by the * bnxt_en driver. */ - rx_buf = &rxr->rx_buf_ring[cons]; - mapping = rx_buf->mapping - bp->rx_dma_offset; - dma_unmap_page_attrs(&pdev->dev, mapping, - BNXT_RX_PAGE_SIZE, bp->rx_dir, - DMA_ATTR_WEAK_ORDERING);
/* if we are unable to allocate a new buffer, abort and reuse */ if (bnxt_alloc_rx_data(bp, rxr, rxr->rx_prod, GFP_ATOMIC)) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 6ea14ccb60c8ab829349979b22b58a941ec4a3ee ]
Ensure there is sufficient room to access the protocol field of the VLAN header, validate it once before the flowtable lookup.
===================================================== BUG: KMSAN: uninit-value in nf_flow_offload_inet_hook+0x45a/0x5f0 net/netfilter/nf_flow_table_inet.c:32 nf_flow_offload_inet_hook+0x45a/0x5f0 net/netfilter/nf_flow_table_inet.c:32 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf4/0x400 net/netfilter/core.c:626 nf_hook_ingress include/linux/netfilter_netdev.h:34 [inline] nf_ingress net/core/dev.c:5440 [inline]
Fixes: 4cd91f7c290f ("netfilter: flowtable: add vlan support") Reported-by: syzbot+8407d9bb88cd4c6bf61a@syzkaller.appspotmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_flow_table_inet.c | 3 +++ net/netfilter/nf_flow_table_ip.c | 3 +++ 2 files changed, 6 insertions(+)
diff --git a/net/netfilter/nf_flow_table_inet.c b/net/netfilter/nf_flow_table_inet.c index 6eef15648b7b0..b0f1991719324 100644 --- a/net/netfilter/nf_flow_table_inet.c +++ b/net/netfilter/nf_flow_table_inet.c @@ -17,6 +17,9 @@ nf_flow_offload_inet_hook(void *priv, struct sk_buff *skb,
switch (skb->protocol) { case htons(ETH_P_8021Q): + if (!pskb_may_pull(skb, skb_mac_offset(skb) + sizeof(*veth))) + return NF_ACCEPT; + veth = (struct vlan_ethhdr *)skb_mac_header(skb); proto = veth->h_vlan_encapsulated_proto; break; diff --git a/net/netfilter/nf_flow_table_ip.c b/net/netfilter/nf_flow_table_ip.c index c2c005234dcd3..98edcaa37b38d 100644 --- a/net/netfilter/nf_flow_table_ip.c +++ b/net/netfilter/nf_flow_table_ip.c @@ -281,6 +281,9 @@ static bool nf_flow_skb_encap_protocol(struct sk_buff *skb, __be16 proto,
switch (skb->protocol) { case htons(ETH_P_8021Q): + if (!pskb_may_pull(skb, skb_mac_offset(skb) + sizeof(*veth))) + return false; + veth = (struct vlan_ethhdr *)skb_mac_header(skb); if (veth->h_vlan_encapsulated_proto == proto) { *offset += VLAN_HLEN;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bharat Bhushan bbhushan2@marvell.com
[ Upstream commit af688a99eb1fc7ef69774665d61e6be51cea627a ]
Some CPT AF registers are per LF and others are global. Translation of PF/VF local LF slot number to actual LF slot number is required only for accessing perf LF registers. CPT AF global registers access do not require any LF slot number. Also, there is no reason CPT PF/VF to know actual lf's register offset.
Without this fix microcode loading will fail, VFs cannot be created and hardware is not usable.
Fixes: bc35e28af789 ("octeontx2-af: replace cpt slot with lf id on reg write") Signed-off-by: Bharat Bhushan bbhushan2@marvell.com Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240821070558.1020101-1-bbhushan2@marvell.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/marvell/octeontx2/af/rvu_cpt.c | 23 +++++++++---------- 1 file changed, 11 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c index 3e09d22858147..daf4b951e9059 100644 --- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c +++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cpt.c @@ -632,7 +632,9 @@ int rvu_mbox_handler_cpt_inline_ipsec_cfg(struct rvu *rvu, return ret; }
-static bool is_valid_offset(struct rvu *rvu, struct cpt_rd_wr_reg_msg *req) +static bool validate_and_update_reg_offset(struct rvu *rvu, + struct cpt_rd_wr_reg_msg *req, + u64 *reg_offset) { u64 offset = req->reg_offset; int blkaddr, num_lfs, lf; @@ -663,6 +665,11 @@ static bool is_valid_offset(struct rvu *rvu, struct cpt_rd_wr_reg_msg *req) if (lf < 0) return false;
+ /* Translate local LF's offset to global CPT LF's offset to + * access LFX register. + */ + *reg_offset = (req->reg_offset & 0xFF000) + (lf << 3); + return true; } else if (!(req->hdr.pcifunc & RVU_PFVF_FUNC_MASK)) { /* Registers that can be accessed from PF */ @@ -697,7 +704,7 @@ int rvu_mbox_handler_cpt_rd_wr_register(struct rvu *rvu, struct cpt_rd_wr_reg_msg *rsp) { u64 offset = req->reg_offset; - int blkaddr, lf; + int blkaddr;
blkaddr = validate_and_get_cpt_blkaddr(req->blkaddr); if (blkaddr < 0) @@ -708,18 +715,10 @@ int rvu_mbox_handler_cpt_rd_wr_register(struct rvu *rvu, !is_cpt_vf(rvu, req->hdr.pcifunc)) return CPT_AF_ERR_ACCESS_DENIED;
- if (!is_valid_offset(rvu, req)) + if (!validate_and_update_reg_offset(rvu, req, &offset)) return CPT_AF_ERR_ACCESS_DENIED;
- /* Translate local LF used by VFs to global CPT LF */ - lf = rvu_get_lf(rvu, &rvu->hw->block[blkaddr], req->hdr.pcifunc, - (offset & 0xFFF) >> 3); - - /* Translate local LF's offset to global CPT LF's offset */ - offset &= 0xFF000; - offset += lf << 3; - - rsp->reg_offset = offset; + rsp->reg_offset = req->reg_offset; rsp->ret_val = req->ret_val; rsp->is_write = req->is_write;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit 4ae738dfef2c0323752ab81786e2d298c9939321 ]
If promiscuous mode is disabled when there are fewer than four multicast addresses, then it will not be reflected in the hardware. Fix this by always clearing the promiscuous mode flag even when we program multicast addresses.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson sean.anderson@linux.dev Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240822154059.1066595-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index fa510f4e26008..b2e4d0b11a7d7 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -450,6 +450,10 @@ static void axienet_set_multicast_list(struct net_device *ndev) } else if (!netdev_mc_empty(ndev)) { struct netdev_hw_addr *ha;
+ reg = axienet_ior(lp, XAE_FMI_OFFSET); + reg &= ~XAE_FMI_PM_MASK; + axienet_iow(lp, XAE_FMI_OFFSET, reg); + i = 0; netdev_for_each_mc_addr(ha, ndev) { if (i >= XAE_MULTICAST_CAM_TABLE_NUM)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sean Anderson sean.anderson@linux.dev
[ Upstream commit 797a68c9de0f5a5447baf4bd3bb9c10a3993435b ]
If a multicast address is removed but there are still some multicast addresses, that address would remain programmed into the frame filter. Fix this by explicitly setting the enable bit for each filter.
Fixes: 8a3b7a252dca ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Sean Anderson sean.anderson@linux.dev Reviewed-by: Simon Horman horms@kernel.org Link: https://patch.msgid.link/20240822154059.1066595-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/xilinx/xilinx_axienet.h | 1 + .../net/ethernet/xilinx/xilinx_axienet_main.c | 21 ++++++++----------- 2 files changed, 10 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet.h b/drivers/net/ethernet/xilinx/xilinx_axienet.h index c7d9221fafdcb..09c9f9787180b 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet.h +++ b/drivers/net/ethernet/xilinx/xilinx_axienet.h @@ -170,6 +170,7 @@ #define XAE_UAW0_OFFSET 0x00000700 /* Unicast address word 0 */ #define XAE_UAW1_OFFSET 0x00000704 /* Unicast address word 1 */ #define XAE_FMI_OFFSET 0x00000708 /* Frame Filter Control */ +#define XAE_FFE_OFFSET 0x0000070C /* Frame Filter Enable */ #define XAE_AF0_OFFSET 0x00000710 /* Address Filter 0 */ #define XAE_AF1_OFFSET 0x00000714 /* Address Filter 1 */
diff --git a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c index b2e4d0b11a7d7..559c0d60d9483 100644 --- a/drivers/net/ethernet/xilinx/xilinx_axienet_main.c +++ b/drivers/net/ethernet/xilinx/xilinx_axienet_main.c @@ -432,7 +432,7 @@ static int netdev_set_mac_address(struct net_device *ndev, void *p) */ static void axienet_set_multicast_list(struct net_device *ndev) { - int i; + int i = 0; u32 reg, af0reg, af1reg; struct axienet_local *lp = netdev_priv(ndev);
@@ -454,7 +454,6 @@ static void axienet_set_multicast_list(struct net_device *ndev) reg &= ~XAE_FMI_PM_MASK; axienet_iow(lp, XAE_FMI_OFFSET, reg);
- i = 0; netdev_for_each_mc_addr(ha, ndev) { if (i >= XAE_MULTICAST_CAM_TABLE_NUM) break; @@ -473,6 +472,7 @@ static void axienet_set_multicast_list(struct net_device *ndev) axienet_iow(lp, XAE_FMI_OFFSET, reg); axienet_iow(lp, XAE_AF0_OFFSET, af0reg); axienet_iow(lp, XAE_AF1_OFFSET, af1reg); + axienet_iow(lp, XAE_FFE_OFFSET, 1); i++; } } else { @@ -480,18 +480,15 @@ static void axienet_set_multicast_list(struct net_device *ndev) reg &= ~XAE_FMI_PM_MASK;
axienet_iow(lp, XAE_FMI_OFFSET, reg); - - for (i = 0; i < XAE_MULTICAST_CAM_TABLE_NUM; i++) { - reg = axienet_ior(lp, XAE_FMI_OFFSET) & 0xFFFFFF00; - reg |= i; - - axienet_iow(lp, XAE_FMI_OFFSET, reg); - axienet_iow(lp, XAE_AF0_OFFSET, 0); - axienet_iow(lp, XAE_AF1_OFFSET, 0); - } - dev_info(&ndev->dev, "Promiscuous mode disabled.\n"); } + + for (; i < XAE_MULTICAST_CAM_TABLE_NUM; i++) { + reg = axienet_ior(lp, XAE_FMI_OFFSET) & 0xFFFFFF00; + reg |= i; + axienet_iow(lp, XAE_FMI_OFFSET, reg); + axienet_iow(lp, XAE_FFE_OFFSET, 0); + } }
/**
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Menglong Dong menglong8.dong@gmail.com
[ Upstream commit 57fb67783c4011581882f32e656d738da1f82042 ]
There is something wrong with ovs_drop_reasons. ovs_drop_reasons[0] is "OVS_DROP_LAST_ACTION", but OVS_DROP_LAST_ACTION == __OVS_DROP_REASON + 1, which means that ovs_drop_reasons[1] should be "OVS_DROP_LAST_ACTION".
And as Adrian tested, without the patch, adding flow to drop packets results in:
drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97) origin: software input port ifindex: 8 timestamp: Tue Aug 20 10:19:17 2024 859853461 nsec protocol: 0x800 length: 98 original length: 98 drop reason: OVS_DROP_ACTION_ERROR
With the patch, the same results in:
drop at: do_execute_actions+0x197/0xb20 [openvsw (0xffffffffc0db6f97) origin: software input port ifindex: 8 timestamp: Tue Aug 20 10:16:13 2024 475856608 nsec protocol: 0x800 length: 98 original length: 98 drop reason: OVS_DROP_LAST_ACTION
Fix this by initializing ovs_drop_reasons with index.
Fixes: 9d802da40b7c ("net: openvswitch: add last-action drop reason") Signed-off-by: Menglong Dong dongml2@chinatelecom.cn Tested-by: Adrian Moreno amorenoz@redhat.com Reviewed-by: Adrian Moreno amorenoz@redhat.com Link: https://patch.msgid.link/20240821123252.186305-1-dongml2@chinatelecom.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/openvswitch/datapath.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/openvswitch/datapath.c b/net/openvswitch/datapath.c index 99d72543abd3a..78d9961fcd446 100644 --- a/net/openvswitch/datapath.c +++ b/net/openvswitch/datapath.c @@ -2706,7 +2706,7 @@ static struct pernet_operations ovs_net_ops = { };
static const char * const ovs_drop_reasons[] = { -#define S(x) (#x), +#define S(x) [(x) & ~SKB_DROP_REASON_SUBSYS_MASK] = (#x), OVS_DROP_REASONS(S) #undef S };
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexandra Winter wintera@linux.ibm.com
[ Upstream commit 0124fb0ebf3b0ef89892d42147c9387be3105318 ]
iucv_alloc_device() gets a format string and a varying number of arguments. This is incorrectly forwarded by calling dev_set_name() with the format string and a va_list, while dev_set_name() expects also a varying number of arguments.
Symptoms: Corrupted iucv device names, which can result in log messages like: sysfs: cannot create duplicate filename '/devices/iucv/hvc_iucv1827699952'
Fixes: 4452e8ef8c36 ("s390/iucv: Provide iucv_alloc_device() / iucv_release_device()") Link: https://bugzilla.suse.com/show_bug.cgi?id=1228425 Signed-off-by: Alexandra Winter wintera@linux.ibm.com Reviewed-by: Thorsten Winkler twinkler@linux.ibm.com Reviewed-by: Przemek Kitszel przemyslaw.kitszel@intel.com Link: https://patch.msgid.link/20240821091337.3627068-1-wintera@linux.ibm.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/iucv/iucv.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/iucv/iucv.c b/net/iucv/iucv.c index b7bf34a5eb37a..1235307020075 100644 --- a/net/iucv/iucv.c +++ b/net/iucv/iucv.c @@ -86,13 +86,15 @@ struct device *iucv_alloc_device(const struct attribute_group **attrs, { struct device *dev; va_list vargs; + char buf[20]; int rc;
dev = kzalloc(sizeof(*dev), GFP_KERNEL); if (!dev) goto out_error; va_start(vargs, fmt); - rc = dev_set_name(dev, fmt, vargs); + vsnprintf(buf, sizeof(buf), fmt, vargs); + rc = dev_set_name(dev, "%s", buf); va_end(vargs); if (rc) goto out_error;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit df24373435f5899a2a98b7d377479c8d4376613b ]
DPU debugging macros need to be converted to a proper drm_debug_* macros, however this is a going an intrusive patch, not suitable for a fix. Wire DPU_DEBUG and DPU_DEBUG_DRIVER to always use DRM_DEBUG_DRIVER to make sure that DPU debugging messages always end up in the drm debug messages and are controlled via the usual drm.debug mask.
I don't think that it is a good idea for a generic DPU_DEBUG macro to be tied to DRM_UT_KMS. It is used to report a debug message from driver, so by default it should go to the DRM_UT_DRIVER channel. While refactoring debug macros later on we might end up with particular messages going to ATOMIC or KMS, but DRIVER should be the default.
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/606932/ Link: https://lore.kernel.org/r/20240802-dpu-fix-wb-v2-2-7eac9eb8e895@linaro.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h index e2adc937ea63b..935ff6fd172c4 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h @@ -31,24 +31,14 @@ * @fmt: Pointer to format string */ #define DPU_DEBUG(fmt, ...) \ - do { \ - if (drm_debug_enabled(DRM_UT_KMS)) \ - DRM_DEBUG(fmt, ##__VA_ARGS__); \ - else \ - pr_debug(fmt, ##__VA_ARGS__); \ - } while (0) + DRM_DEBUG_DRIVER(fmt, ##__VA_ARGS__)
/** * DPU_DEBUG_DRIVER - macro for hardware driver logging * @fmt: Pointer to format string */ #define DPU_DEBUG_DRIVER(fmt, ...) \ - do { \ - if (drm_debug_enabled(DRM_UT_DRIVER)) \ - DRM_ERROR(fmt, ##__VA_ARGS__); \ - else \ - pr_debug(fmt, ##__VA_ARGS__); \ - } while (0) + DRM_DEBUG_DRIVER(fmt, ##__VA_ARGS__)
#define DPU_ERROR(fmt, ...) pr_err("[dpu error]" fmt, ##__VA_ARGS__) #define DPU_ERROR_RATELIMITED(fmt, ...) pr_err_ratelimited("[dpu error]" fmt, ##__VA_ARGS__)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit d19d5b8d8f6dab942ce5ddbcf34bf7275e778250 ]
Fix the dp_panel_get_supported_bpp() API to return the minimum supported bpp correctly for relevant cases and use this API to correct the behavior of DP driver which hard-codes the max supported bpp to 30.
This is incorrect because the number of lanes and max data rate supported by the lanes need to be taken into account.
Replace the hardcoded limit with the appropriate math which accounts for the accurate number of lanes and max data rate.
changes in v2: - Fix the dp_panel_get_supported_bpp() and use it - Drop the max_t usage as dp_panel_get_supported_bpp() already returns the min_bpp correctly now
changes in v3: - replace min_t with just min as all params are u32
Fixes: c943b4948b58 ("drm/msm/dp: add displayPort driver support") Reported-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/43 Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org # SM8350-HDK Reviewed-by: Stephen Boyd swboyd@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/607073/ Link: https://lore.kernel.org/r/20240805202009.1120981-1-quic_abhinavk@quicinc.com Signed-off-by: Stephen Boyd swboyd@chromium.org Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dp/dp_panel.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/msm/dp/dp_panel.c b/drivers/gpu/drm/msm/dp/dp_panel.c index 07db8f37cd06a..017fb8cc8ab67 100644 --- a/drivers/gpu/drm/msm/dp/dp_panel.c +++ b/drivers/gpu/drm/msm/dp/dp_panel.c @@ -90,22 +90,22 @@ static int dp_panel_read_dpcd(struct dp_panel *dp_panel) static u32 dp_panel_get_supported_bpp(struct dp_panel *dp_panel, u32 mode_edid_bpp, u32 mode_pclk_khz) { - struct dp_link_info *link_info; + const struct dp_link_info *link_info; const u32 max_supported_bpp = 30, min_supported_bpp = 18; - u32 bpp = 0, data_rate_khz = 0; + u32 bpp, data_rate_khz;
- bpp = min_t(u32, mode_edid_bpp, max_supported_bpp); + bpp = min(mode_edid_bpp, max_supported_bpp);
link_info = &dp_panel->link_info; data_rate_khz = link_info->num_lanes * link_info->rate * 8;
- while (bpp > min_supported_bpp) { + do { if (mode_pclk_khz * bpp <= data_rate_khz) - break; + return bpp; bpp -= 6; - } + } while (bpp > min_supported_bpp);
- return bpp; + return min_supported_bpp; }
static int dp_panel_update_modes(struct drm_connector *connector, @@ -442,8 +442,9 @@ int dp_panel_init_panel_info(struct dp_panel *dp_panel) drm_mode->clock); drm_dbg_dp(panel->drm_dev, "bpp = %d\n", dp_panel->dp_mode.bpp);
- dp_panel->dp_mode.bpp = max_t(u32, 18, - min_t(u32, dp_panel->dp_mode.bpp, 30)); + dp_panel->dp_mode.bpp = dp_panel_get_mode_bpp(dp_panel, dp_panel->dp_mode.bpp, + dp_panel->dp_mode.drm_mode.clock); + drm_dbg_dp(panel->drm_dev, "updated bpp = %d\n", dp_panel->dp_mode.bpp);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit aedf02e46eb549dac8db4821a6b9f0c6bf6e3990 ]
For cases where the crtc's connectors_changed was set without enable/active getting toggled , there is an atomic_enable() call followed by an atomic_disable() but without an atomic_mode_set().
This results in a NULL ptr access for the dpu_encoder_get_drm_fmt() call in the atomic_enable() as the dpu_encoder's connector was cleared in the atomic_disable() but not re-assigned as there was no atomic_mode_set() call.
Fix the NULL ptr access by moving the assignment for atomic_enable() and also use drm_atomic_get_new_connector_for_encoder() to get the connector from the atomic_state.
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Reported-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/59 Suggested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org # SM8350-HDK Patchwork: https://patchwork.freedesktop.org/patch/606729/ Link: https://lore.kernel.org/r/20240731191723.3050932-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c index 697ad4a640516..a6c5e3bc9bf15 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c @@ -1179,8 +1179,6 @@ static void dpu_encoder_virt_atomic_mode_set(struct drm_encoder *drm_enc,
cstate->num_mixers = num_lm;
- dpu_enc->connector = conn_state->connector; - for (i = 0; i < dpu_enc->num_phys_encs; i++) { struct dpu_encoder_phys *phys = dpu_enc->phys_encs[i];
@@ -1277,6 +1275,8 @@ static void dpu_encoder_virt_atomic_enable(struct drm_encoder *drm_enc,
dpu_enc->commit_done_timedout = false;
+ dpu_enc->connector = drm_atomic_get_new_connector_for_encoder(state, drm_enc); + cur_mode = &dpu_enc->base.crtc->state->adjusted_mode;
dpu_enc->wide_bus_en = dpu_encoder_is_widebus_enabled(drm_enc);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit 319aca883bfa1b85ee08411541b51b9a934ac858 ]
Before re-starting link training reset the link phy params namely the pre-emphasis and voltage swing levels otherwise the next link training begins at the previously cached levels which can result in link training failures.
Fixes: 8ede2ecc3e5e ("drm/msm/dp: Add DP compliance tests on Snapdragon Chipsets") Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Tested-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org # SM8350-HDK Reviewed-by: Stephen Boyd swboyd@chromium.org Patchwork: https://patchwork.freedesktop.org/patch/605946/ Link: https://lore.kernel.org/r/20240725220450.131245-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/dp/dp_ctrl.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/msm/dp/dp_ctrl.c b/drivers/gpu/drm/msm/dp/dp_ctrl.c index 7bc8a9f0657a9..f342fc5ae41ec 100644 --- a/drivers/gpu/drm/msm/dp/dp_ctrl.c +++ b/drivers/gpu/drm/msm/dp/dp_ctrl.c @@ -1286,6 +1286,8 @@ static int dp_ctrl_link_train(struct dp_ctrl_private *ctrl, link_info.rate = ctrl->link->link_params.rate; link_info.capabilities = DP_LINK_CAP_ENHANCED_FRAMING;
+ dp_link_reset_phy_params_vx_px(ctrl->link); + dp_aux_link_configure(ctrl->aux, &link_info);
if (drm_dp_max_downspread(dpcd))
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit bfa1a6283be390947d3649c482e5167186a37016 ]
If the dpu_format_populate_layout() fails, then FB is prepared, but not cleaned up. This ends up leaking the pin_count on the GEM object and causes a splat during DRM file closure:
msm_obj->pin_count WARNING: CPU: 2 PID: 569 at drivers/gpu/drm/msm/msm_gem.c:121 update_lru_locked+0xc4/0xcc [...] Call trace: update_lru_locked+0xc4/0xcc put_pages+0xac/0x100 msm_gem_free_object+0x138/0x180 drm_gem_object_free+0x1c/0x30 drm_gem_object_handle_put_unlocked+0x108/0x10c drm_gem_object_release_handle+0x58/0x70 idr_for_each+0x68/0xec drm_gem_release+0x28/0x40 drm_file_free+0x174/0x234 drm_release+0xb0/0x160 __fput+0xc0/0x2c8 __fput_sync+0x50/0x5c __arm64_sys_close+0x38/0x7c invoke_syscall+0x48/0x118 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x4c/0x120 el0t_64_sync_handler+0x100/0x12c el0t_64_sync+0x190/0x194 irq event stamp: 129818 hardirqs last enabled at (129817): [<ffffa5f6d953fcc0>] console_unlock+0x118/0x124 hardirqs last disabled at (129818): [<ffffa5f6da7dcf04>] el1_dbg+0x24/0x8c softirqs last enabled at (129808): [<ffffa5f6d94afc18>] handle_softirqs+0x4c8/0x4e8 softirqs last disabled at (129785): [<ffffa5f6d94105e4>] __do_softirq+0x14/0x20
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/600714/ Link: https://lore.kernel.org/r/20240625-dpu-mode-config-width-v5-1-501d984d634f@l... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c index 1c3a2657450c6..eabc4813c649c 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c @@ -680,6 +680,9 @@ static int dpu_plane_prepare_fb(struct drm_plane *plane, new_state->fb, &layout); if (ret) { DPU_ERROR_PLANE(pdpu, "failed to get format layout, %d\n", ret); + if (pstate->aspace) + msm_framebuffer_cleanup(new_state->fb, pstate->aspace, + pstate->needs_dirtyfb); return ret; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit 2db13c4a631505029ada9404e09a2b06a268c1c4 ]
The QCM2290 doesn't have CSC blocks, so it can not support YUV formats even on ViG blocks. Fix the formats declared by _VIG_SBLK_NOSCALE().
Fixes: 5334087ee743 ("drm/msm: add support for QCM2290 MDSS") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/601048/ Link: https://lore.kernel.org/r/20240627-dpu-virtual-wide-v5-1-5efb90cbb8be@linaro... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c index 9b72977feafa4..e61b5681f3bbd 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c @@ -308,8 +308,8 @@ static const u32 wb2_formats_rgb_yuv[] = { { \ .maxdwnscale = SSPP_UNITY_SCALE, \ .maxupscale = SSPP_UNITY_SCALE, \ - .format_list = plane_formats_yuv, \ - .num_formats = ARRAY_SIZE(plane_formats_yuv), \ + .format_list = plane_formats, \ + .num_formats = ARRAY_SIZE(plane_formats), \ .virt_format_list = plane_formats, \ .virt_num_formats = ARRAY_SIZE(plane_formats), \ }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit cb18195914e353ece0e789e365a5a16872169805 ]
YUV formats require only CSC to be enabled. Even decimated formats should not require scaler. Relax the requirement and don't check for the scaler block while checking if YUV format can be enabled.
Fixes: 25fdd5933e4c ("drm/msm: Add SDM845 DPU support") Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Patchwork: https://patchwork.freedesktop.org/patch/601049/ Link: https://lore.kernel.org/r/20240627-dpu-virtual-wide-v5-2-5efb90cbb8be@linaro... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c index eabc4813c649c..cbdb9628d962d 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c @@ -746,10 +746,9 @@ static int dpu_plane_atomic_check_pipe(struct dpu_plane *pdpu, min_src_size = MSM_FORMAT_IS_YUV(fmt) ? 2 : 1;
if (MSM_FORMAT_IS_YUV(fmt) && - (!pipe->sspp->cap->sblk->scaler_blk.len || - !pipe->sspp->cap->sblk->csc_blk.len)) { + !pipe->sspp->cap->sblk->csc_blk.len) { DPU_DEBUG_PLANE(pdpu, - "plane doesn't have scaler/csc for yuv\n"); + "plane doesn't have csc for yuv\n"); return -EINVAL; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dmitry Baryshkov dmitry.baryshkov@linaro.org
[ Upstream commit d3a785e4f983f523380e023d8a05fb6d04402957 ]
Take into account the plane rotation and flipping when calculating src positions for the wide plane parts.
This is not an issue yet, because rotation is only supported for the UBWC planes and wide UBWC planes are rejected anyway because in parallel multirect case only the half of the usual width is supported for tiled formats. However it's better to fix this now rather than stumbling upon it later.
Fixes: 80e8ae3b38ab ("drm/msm/dpu: add support for wide planes") Reviewed-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Patchwork: https://patchwork.freedesktop.org/patch/601059/ Link: https://lore.kernel.org/r/20240627-dpu-virtual-wide-v5-3-5efb90cbb8be@linaro... Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c index cbdb9628d962d..c31d283d1c6c1 100644 --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c @@ -865,6 +865,10 @@ static int dpu_plane_atomic_check(struct drm_plane *plane,
max_linewidth = pdpu->catalog->caps->max_linewidth;
+ drm_rect_rotate(&pipe_cfg->src_rect, + new_plane_state->fb->width, new_plane_state->fb->height, + new_plane_state->rotation); + if ((drm_rect_width(&pipe_cfg->src_rect) > max_linewidth) || _dpu_plane_calc_clk(&crtc_state->adjusted_mode, pipe_cfg) > max_mdp_clk_rate) { /* @@ -914,6 +918,14 @@ static int dpu_plane_atomic_check(struct drm_plane *plane, r_pipe_cfg->dst_rect.x1 = pipe_cfg->dst_rect.x2; }
+ drm_rect_rotate_inv(&pipe_cfg->src_rect, + new_plane_state->fb->width, new_plane_state->fb->height, + new_plane_state->rotation); + if (r_pipe->sspp) + drm_rect_rotate_inv(&r_pipe_cfg->src_rect, + new_plane_state->fb->width, new_plane_state->fb->height, + new_plane_state->rotation); + ret = dpu_plane_atomic_check_pipe(pdpu, pipe, pipe_cfg, fmt, &crtc_state->adjusted_mode); if (ret) return ret;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Will Deacon will@kernel.org
[ Upstream commit 38f7e14519d39cf524ddc02d4caee9b337dad703 ]
UBSAN reports the following 'subtraction overflow' error when booting in a virtual machine on Android:
| Internal error: UBSAN: integer subtraction overflow: 00000000f2005515 [#1] PREEMPT SMP | Modules linked in: | CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-00006-g3cbe9e5abd46-dirty #4 | Hardware name: linux,dummy-virt (DT) | pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--) | pc : cancel_delayed_work+0x34/0x44 | lr : cancel_delayed_work+0x2c/0x44 | sp : ffff80008002ba60 | x29: ffff80008002ba60 x28: 0000000000000000 x27: 0000000000000000 | x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000 | x23: 0000000000000000 x22: 0000000000000000 x21: ffff1f65014cd3c0 | x20: ffffc0e84c9d0da0 x19: ffffc0e84cab3558 x18: ffff800080009058 | x17: 00000000247ee1f8 x16: 00000000247ee1f8 x15: 00000000bdcb279d | x14: 0000000000000001 x13: 0000000000000075 x12: 00000a0000000000 | x11: ffff1f6501499018 x10: 00984901651fffff x9 : ffff5e7cc35af000 | x8 : 0000000000000001 x7 : 3d4d455453595342 x6 : 000000004e514553 | x5 : ffff1f6501499265 x4 : ffff1f650ff60b10 x3 : 0000000000000620 | x2 : ffff80008002ba78 x1 : 0000000000000000 x0 : 0000000000000000 | Call trace: | cancel_delayed_work+0x34/0x44 | deferred_probe_extend_timeout+0x20/0x70 | driver_register+0xa8/0x110 | __platform_driver_register+0x28/0x3c | syscon_init+0x24/0x38 | do_one_initcall+0xe4/0x338 | do_initcall_level+0xac/0x178 | do_initcalls+0x5c/0xa0 | do_basic_setup+0x20/0x30 | kernel_init_freeable+0x8c/0xf8 | kernel_init+0x28/0x1b4 | ret_from_fork+0x10/0x20 | Code: f9000fbf 97fffa2f 39400268 37100048 (d42aa2a0) | ---[ end trace 0000000000000000 ]--- | Kernel panic - not syncing: UBSAN: integer subtraction overflow: Fatal exception
This is due to shift_and_mask() using a signed immediate to construct the mask and being called with a shift of 31 (WORK_OFFQ_POOL_SHIFT) so that it ends up decrementing from INT_MIN.
Use an unsigned constant '1U' to generate the mask in shift_and_mask().
Cc: Tejun Heo tj@kernel.org Cc: Lai Jiangshan jiangshanlai@gmail.com Fixes: 1211f3b21c2a ("workqueue: Preserve OFFQ bits in cancel[_sync] paths") Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/workqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c index f98247ec99c20..c8687c0ab3645 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -896,7 +896,7 @@ static struct worker_pool *get_work_pool(struct work_struct *work)
static unsigned long shift_and_mask(unsigned long v, u32 shift, u32 bits) { - return (v >> shift) & ((1 << bits) - 1); + return (v >> shift) & ((1U << bits) - 1); }
static void work_offqd_unpack(struct work_offq_data *offqd, unsigned long data)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tejun Heo tj@kernel.org
[ Upstream commit 8bc35475ef1a23b0e224f3242eb11c76cab0ea88 ]
When flushing a work item for cancellation, __flush_work() knows that it exclusively owns the work item through its PENDING bit. 134874e2eee9 ("workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items") added a read of @work->data to determine whether to use busy wait for BH work items that are being canceled. While the read is safe when @from_cancel, @work->data was read before testing @from_cancel to simplify code structure:
data = *work_data_bits(work); if (from_cancel && !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {
While the read data was never used if !@from_cancel, this could trigger KCSAN data race detection spuriously:
================================================================== BUG: KCSAN: data-race in __flush_work / __flush_work
write to 0xffff8881223aa3e8 of 8 bytes by task 3998 on cpu 0: instrument_write include/linux/instrumented.h:41 [inline] ___set_bit include/asm-generic/bitops/instrumented-non-atomic.h:28 [inline] insert_wq_barrier kernel/workqueue.c:3790 [inline] start_flush_work kernel/workqueue.c:4142 [inline] __flush_work+0x30b/0x570 kernel/workqueue.c:4178 flush_work kernel/workqueue.c:4229 [inline] ...
read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1: __flush_work+0x42a/0x570 kernel/workqueue.c:4188 flush_work kernel/workqueue.c:4229 [inline] flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251 ...
value changed: 0x0000000000400000 -> 0xffff88810006c00d
Reorganize the code so that @from_cancel is tested before @work->data is accessed. The only problem is triggering KCSAN detection spuriously. This shouldn't need READ_ONCE() or other access qualifiers.
No functional changes.
Signed-off-by: Tejun Heo tj@kernel.org Reported-by: syzbot+b3e4f2f51ed645fd5df2@syzkaller.appspotmail.com Fixes: 134874e2eee9 ("workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items") Link: http://lkml.kernel.org/r/000000000000ae429e061eea2157@google.com Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/workqueue.c | 45 +++++++++++++++++++++++++-------------------- 1 file changed, 25 insertions(+), 20 deletions(-)
diff --git a/kernel/workqueue.c b/kernel/workqueue.c index c8687c0ab3645..c970eec25c5a0 100644 --- a/kernel/workqueue.c +++ b/kernel/workqueue.c @@ -4190,7 +4190,6 @@ static bool start_flush_work(struct work_struct *work, struct wq_barrier *barr, static bool __flush_work(struct work_struct *work, bool from_cancel) { struct wq_barrier barr; - unsigned long data;
if (WARN_ON(!wq_online)) return false; @@ -4208,29 +4207,35 @@ static bool __flush_work(struct work_struct *work, bool from_cancel) * was queued on a BH workqueue, we also know that it was running in the * BH context and thus can be busy-waited. */ - data = *work_data_bits(work); - if (from_cancel && - !WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) { - /* - * On RT, prevent a live lock when %current preempted soft - * interrupt processing or prevents ksoftirqd from running by - * keeping flipping BH. If the BH work item runs on a different - * CPU then this has no effect other than doing the BH - * disable/enable dance for nothing. This is copied from - * kernel/softirq.c::tasklet_unlock_spin_wait(). - */ - while (!try_wait_for_completion(&barr.done)) { - if (IS_ENABLED(CONFIG_PREEMPT_RT)) { - local_bh_disable(); - local_bh_enable(); - } else { - cpu_relax(); + if (from_cancel) { + unsigned long data = *work_data_bits(work); + + if (!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && + (data & WORK_OFFQ_BH)) { + /* + * On RT, prevent a live lock when %current preempted + * soft interrupt processing or prevents ksoftirqd from + * running by keeping flipping BH. If the BH work item + * runs on a different CPU then this has no effect other + * than doing the BH disable/enable dance for nothing. + * This is copied from + * kernel/softirq.c::tasklet_unlock_spin_wait(). + */ + while (!try_wait_for_completion(&barr.done)) { + if (IS_ENABLED(CONFIG_PREEMPT_RT)) { + local_bh_disable(); + local_bh_enable(); + } else { + cpu_relax(); + } } + goto out_destroy; } - } else { - wait_for_completion(&barr.done); }
+ wait_for_completion(&barr.done); + +out_destroy: destroy_work_on_stack(&barr.work); return true; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Abhinav Kumar quic_abhinavk@quicinc.com
[ Upstream commit 3e30296b374af33cb4c12ff93df0b1e5b2d0f80b ]
sc7180 programs the ubwc settings as 0x1e as that would mean a highest bank bit of 14 which matches what the GPU sets as well.
However, the highest_bank_bit field of the msm_mdss_data which is being used to program the SSPP's fetch configuration is programmed to a highest bank bit of 16 as 0x3 translates to 16 and not 14.
Fix the highest bank bit field used for the SSPP to match the mdss and gpu settings.
Fixes: 6f410b246209 ("drm/msm/mdss: populate missing data") Reviewed-by: Rob Clark robdclark@gmail.com Tested-by: Stephen Boyd swboyd@chromium.org # Trogdor.Lazor Patchwork: https://patchwork.freedesktop.org/patch/607625/ Link: https://lore.kernel.org/r/20240808235227.2701479-1-quic_abhinavk@quicinc.com Signed-off-by: Abhinav Kumar quic_abhinavk@quicinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/msm/msm_mdss.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/msm/msm_mdss.c b/drivers/gpu/drm/msm/msm_mdss.c index fab6ad4e5107c..ec75274178028 100644 --- a/drivers/gpu/drm/msm/msm_mdss.c +++ b/drivers/gpu/drm/msm/msm_mdss.c @@ -577,7 +577,7 @@ static const struct msm_mdss_data sc7180_data = { .ubwc_enc_version = UBWC_2_0, .ubwc_dec_version = UBWC_2_0, .ubwc_static = 0x1e, - .highest_bank_bit = 0x3, + .highest_bank_bit = 0x1, .reg_bus_bw = 76800, };
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vignesh Raghavendra vigneshr@ti.com
[ Upstream commit 57d5af2660e9443b081eeaf1c373b3ce48477828 ]
Its necessary to call pm_runtime_force_*() hooks as part of system suspend/resume calls so that the runtime_pm hooks get called. This ensures latest state of the IP is cached and restored during system sleep. This is especially true if runtime autosuspend is enabled as runtime suspend hooks may not be called at all before system sleeps.
Without this patch, OSPI NOR enumeration (READ_ID) fails during resume as context saved during suspend path is inconsistent.
Fixes: 078d62de433b ("spi: cadence-qspi: add system-wide suspend and resume callbacks") Signed-off-by: Vignesh Raghavendra vigneshr@ti.com Link: https://patch.msgid.link/20240814151237.3856184-1-vigneshr@ti.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-cadence-quadspi.c | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c index 05ebb03d319fc..d4607cb89c484 100644 --- a/drivers/spi/spi-cadence-quadspi.c +++ b/drivers/spi/spi-cadence-quadspi.c @@ -2000,13 +2000,25 @@ static int cqspi_runtime_resume(struct device *dev) static int cqspi_suspend(struct device *dev) { struct cqspi_st *cqspi = dev_get_drvdata(dev); + int ret;
- return spi_controller_suspend(cqspi->host); + ret = spi_controller_suspend(cqspi->host); + if (ret) + return ret; + + return pm_runtime_force_suspend(dev); }
static int cqspi_resume(struct device *dev) { struct cqspi_st *cqspi = dev_get_drvdata(dev); + int ret; + + ret = pm_runtime_force_resume(dev); + if (ret) { + dev_err(dev, "pm_runtime_force_resume failed on resume\n"); + return ret; + }
return spi_controller_resume(cqspi->host); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suraj Kandpal suraj.kandpal@intel.com
[ Upstream commit 5d41eeb6725e3e24853629e5d7635e4bc45d736e ]
We are checking cp_irq_count from the wrong hdcp structure which ends up giving timed out errors. We only increment the cp_irq_count of the primary connector's hdcp structure but here in case of multidisplay setup we end up checking the secondary connector's hdcp structure, which will not have its cp_irq_count incremented. This leads to a timed out at CP_IRQ error even though a CP_IRQ was raised. Extract it from the correct intel_hdcp structure.
--v2 -Explain why it was the wrong hdcp structure [Jani]
Fixes: 8c9e4f68b861 ("drm/i915/hdcp: Use per-device debugs") Signed-off-by: Suraj Kandpal suraj.kandpal@intel.com Reviewed-by: Ankit Nautiyal ankit.k.nautiyal@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240809114127.3940699-2-suraj... (cherry picked from commit dd925902634def895690426bf10e0a8b3e56f56d) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/display/intel_dp_hdcp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_hdcp.c b/drivers/gpu/drm/i915/display/intel_dp_hdcp.c index 92b03073acdd5..555428606e127 100644 --- a/drivers/gpu/drm/i915/display/intel_dp_hdcp.c +++ b/drivers/gpu/drm/i915/display/intel_dp_hdcp.c @@ -39,7 +39,9 @@ static u32 transcoder_to_stream_enc_status(enum transcoder cpu_transcoder) static void intel_dp_hdcp_wait_for_cp_irq(struct intel_connector *connector, int timeout) { - struct intel_hdcp *hdcp = &connector->hdcp; + struct intel_digital_port *dig_port = intel_attached_dig_port(connector); + struct intel_dp *dp = &dig_port->dp; + struct intel_hdcp *hdcp = &dp->attached_connector->hdcp; long ret;
#define C (hdcp->cp_irq_count_cached != atomic_read(&hdcp->cp_irq_count))
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Auld matthew.auld@intel.com
[ Upstream commit 48d74a0a45201de4efa016fb2f556889db37ed28 ]
Unclear why we call this twice.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Andrzej Hajda andrzej.hajda@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Andrzej Hajda andrzej.hajda@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-35-matth... Stable-dep-of: f4b2a0ae1a31 ("drm/xe: Fix opregion leak") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/display/xe_display.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index 0de0566e5b394..6ecaf83264d55 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -134,7 +134,6 @@ static void xe_display_fini_noirq(struct drm_device *dev, void *dummy) return;
intel_display_driver_remove_noirq(xe); - intel_power_domains_driver_remove(xe); }
int xe_display_init_noirq(struct xe_device *xe)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lucas De Marchi lucas.demarchi@intel.com
[ Upstream commit f4b2a0ae1a31fd3d1b5ca18ee08319b479cf9b5f ]
Being part o the display, ideally the setup and cleanup would be done by display itself. However this is a bigger refactor that needs to be done on both i915 and xe. For now, just fix the leak:
unreferenced object 0xffff8881a0300008 (size 192): comm "modprobe", pid 4354, jiffies 4295647021 hex dump (first 32 bytes): 00 00 87 27 81 88 ff ff 18 80 9b 00 00 c9 ff ff ...'............ 18 81 9b 00 00 c9 ff ff 00 00 00 00 00 00 00 00 ................ backtrace (crc 99260e31): [<ffffffff823ce65b>] kmemleak_alloc+0x4b/0x80 [<ffffffff81493be2>] kmalloc_trace_noprof+0x312/0x3d0 [<ffffffffa1345679>] intel_opregion_setup+0x89/0x700 [xe] [<ffffffffa125bfaf>] xe_display_init_noirq+0x2f/0x90 [xe] [<ffffffffa1199ec3>] xe_device_probe+0x7a3/0xbf0 [xe] [<ffffffffa11f3713>] xe_pci_probe+0x333/0x5b0 [xe] [<ffffffff81af6be8>] local_pci_probe+0x48/0xb0 [<ffffffff81af8778>] pci_device_probe+0xc8/0x280 [<ffffffff81d09048>] really_probe+0xf8/0x390 [<ffffffff81d0937a>] __driver_probe_device+0x8a/0x170 [<ffffffff81d09503>] driver_probe_device+0x23/0xb0 [<ffffffff81d097b7>] __driver_attach+0xc7/0x190 [<ffffffff81d0628d>] bus_for_each_dev+0x7d/0xd0 [<ffffffff81d0851e>] driver_attach+0x1e/0x30 [<ffffffff81d07ac7>] bus_add_driver+0x117/0x250
Fixes: 44e694958b95 ("drm/xe/display: Implement display support") Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240724215309.644423-1-lucas.... Signed-off-by: Lucas De Marchi lucas.demarchi@intel.com (cherry picked from commit 6f4e43a2f771b737d991142ec4f6d4b7ff31fbb4) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/display/xe_display.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/display/xe_display.c b/drivers/gpu/drm/xe/display/xe_display.c index 6ecaf83264d55..7cdc03dc40ed9 100644 --- a/drivers/gpu/drm/xe/display/xe_display.c +++ b/drivers/gpu/drm/xe/display/xe_display.c @@ -134,6 +134,7 @@ static void xe_display_fini_noirq(struct drm_device *dev, void *dummy) return;
intel_display_driver_remove_noirq(xe); + intel_opregion_cleanup(xe); }
int xe_display_init_noirq(struct xe_device *xe) @@ -159,8 +160,10 @@ int xe_display_init_noirq(struct xe_device *xe) intel_display_device_info_runtime_init(xe);
err = intel_display_driver_probe_noirq(xe); - if (err) + if (err) { + intel_opregion_cleanup(xe); return err; + }
return drmm_add_action_or_reset(&xe->drm, xe_display_fini_noirq, NULL); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Auld matthew.auld@intel.com
[ Upstream commit a0b834c8957a7d2848face008a12382a0ad11ffc ]
Not valid to touch mmio once the device is removed, so make sure we unmap on removal and not just when driver instance goes away. Also set the mmio pointers to NULL to hopefully catch such issues more easily.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Andrzej Hajda andrzej.hajda@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Andrzej Hajda andrzej.hajda@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-32-matth... Stable-dep-of: 15939ca77d44 ("drm/xe: Fix tile fini sequence") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_mmio.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index 334637511e750..2ebb2f0d6874e 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -386,13 +386,16 @@ void xe_mmio_probe_tiles(struct xe_device *xe) } }
-static void mmio_fini(struct drm_device *drm, void *arg) +static void mmio_fini(void *arg) { struct xe_device *xe = arg;
pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs); if (xe->mem.vram.mapping) iounmap(xe->mem.vram.mapping); + + xe->mem.vram.mapping = NULL; + xe->mmio.regs = NULL; }
int xe_mmio_init(struct xe_device *xe) @@ -417,7 +420,7 @@ int xe_mmio_init(struct xe_device *xe) root_tile->mmio.size = SZ_16M; root_tile->mmio.regs = xe->mmio.regs;
- return drmm_add_action_or_reset(&xe->drm, mmio_fini, xe); + return devm_add_action_or_reset(xe->drm.dev, mmio_fini, xe); }
u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Auld matthew.auld@intel.com
[ Upstream commit c7117419784f612d59ee565145f722e8b5541fe6 ]
Set our various mmio mappings to NULL. This should make it easier to catch something rogue trying to mess with mmio after device removal. For example, we might unmap everything and then start hitting some mmio address which has already been unmamped by us and then remapped by something else, causing all kinds of carnage.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Andrzej Hajda andrzej.hajda@intel.com Cc: Rodrigo Vivi rodrigo.vivi@intel.com Reviewed-by: Andrzej Hajda andrzej.hajda@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240522102143.128069-33-matth... Stable-dep-of: 15939ca77d44 ("drm/xe: Fix tile fini sequence") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_device.c | 4 +++- drivers/gpu/drm/xe/xe_mmio.c | 35 ++++++++++++++++++++++++++++------ drivers/gpu/drm/xe/xe_mmio.h | 2 +- 3 files changed, 33 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index 5ef9b50a20d01..a1cbdafbff75e 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -551,7 +551,9 @@ int xe_device_probe(struct xe_device *xe) if (err) return err;
- xe_mmio_probe_tiles(xe); + err = xe_mmio_probe_tiles(xe); + if (err) + return err;
xe_ttm_sys_mgr_init(xe);
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index 2ebb2f0d6874e..9d8fafdf51453 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -254,6 +254,21 @@ static int xe_mmio_tile_vram_size(struct xe_tile *tile, u64 *vram_size, return xe_force_wake_put(gt_to_fw(gt), XE_FW_GT); }
+static void vram_fini(void *arg) +{ + struct xe_device *xe = arg; + struct xe_tile *tile; + int id; + + if (xe->mem.vram.mapping) + iounmap(xe->mem.vram.mapping); + + xe->mem.vram.mapping = NULL; + + for_each_tile(tile, xe, id) + tile->mem.vram.mapping = NULL; +} + int xe_mmio_probe_vram(struct xe_device *xe) { struct xe_tile *tile; @@ -330,10 +345,20 @@ int xe_mmio_probe_vram(struct xe_device *xe) drm_info(&xe->drm, "Available VRAM: %pa, %pa\n", &xe->mem.vram.io_start, &available_size);
- return 0; + return devm_add_action_or_reset(xe->drm.dev, vram_fini, xe); }
-void xe_mmio_probe_tiles(struct xe_device *xe) +static void tiles_fini(void *arg) +{ + struct xe_device *xe = arg; + struct xe_tile *tile; + int id; + + for_each_tile(tile, xe, id) + tile->mmio.regs = NULL; +} + +int xe_mmio_probe_tiles(struct xe_device *xe) { size_t tile_mmio_size = SZ_16M, tile_mmio_ext_size = xe->info.tile_mmio_ext_size; u8 id, tile_count = xe->info.tile_count; @@ -384,6 +409,8 @@ void xe_mmio_probe_tiles(struct xe_device *xe) regs += tile_mmio_ext_size; } } + + return devm_add_action_or_reset(xe->drm.dev, tiles_fini, xe); }
static void mmio_fini(void *arg) @@ -391,10 +418,6 @@ static void mmio_fini(void *arg) struct xe_device *xe = arg;
pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs); - if (xe->mem.vram.mapping) - iounmap(xe->mem.vram.mapping); - - xe->mem.vram.mapping = NULL; xe->mmio.regs = NULL; }
diff --git a/drivers/gpu/drm/xe/xe_mmio.h b/drivers/gpu/drm/xe/xe_mmio.h index a3cd7b3036c73..a929d090bb2f1 100644 --- a/drivers/gpu/drm/xe/xe_mmio.h +++ b/drivers/gpu/drm/xe/xe_mmio.h @@ -21,7 +21,7 @@ struct xe_device; #define LMEM_BAR 2
int xe_mmio_init(struct xe_device *xe); -void xe_mmio_probe_tiles(struct xe_device *xe); +int xe_mmio_probe_tiles(struct xe_device *xe);
u8 xe_mmio_read8(struct xe_gt *gt, struct xe_reg reg); u16 xe_mmio_read16(struct xe_gt *gt, struct xe_reg reg);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Brost matthew.brost@intel.com
[ Upstream commit 15939ca77d4424f736e1e4953b4da2351cc9689d ]
Only set tile->mmio.regs to NULL if not the root tile in tile_fini. The root tile mmio regs is setup ealier in MMIO init thus it should be set to NULL in mmio_fini.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com Reviewed-by: Himal Prasad Ghimiray himal.prasad.ghimiray@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240809232830.3302251-1-matth... (cherry picked from commit 3396900aa273903639a1792afa4d23dc09bec291) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_mmio.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_mmio.c b/drivers/gpu/drm/xe/xe_mmio.c index 9d8fafdf51453..beb4e276ba845 100644 --- a/drivers/gpu/drm/xe/xe_mmio.c +++ b/drivers/gpu/drm/xe/xe_mmio.c @@ -355,7 +355,8 @@ static void tiles_fini(void *arg) int id;
for_each_tile(tile, xe, id) - tile->mmio.regs = NULL; + if (tile != xe_device_get_root_tile(xe)) + tile->mmio.regs = NULL; }
int xe_mmio_probe_tiles(struct xe_device *xe) @@ -416,9 +417,11 @@ int xe_mmio_probe_tiles(struct xe_device *xe) static void mmio_fini(void *arg) { struct xe_device *xe = arg; + struct xe_tile *root_tile = xe_device_get_root_tile(xe);
pci_iounmap(to_pci_dev(xe->drm.dev), xe->mmio.regs); xe->mmio.regs = NULL; + root_tile->mmio.regs = NULL; }
int xe_mmio_init(struct xe_device *xe)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit a1e627af32ed60713941cbfc8075d44cad07f6dd ]
If the "test->highmem = alloc_pages()" allocation fails then calling __free_pages(test->highmem) will result in a NULL dereference. Also change the error code to -ENOMEM instead of returning success.
Fixes: 2661081f5ab9 ("mmc_test: highmem tests") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Link: https://lore.kernel.org/r/8c90be28-67b4-4b0d-a105-034dc72a0b31@stanley.mount... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/core/mmc_test.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/core/mmc_test.c b/drivers/mmc/core/mmc_test.c index 8f7f587a0025b..b7f627a9fdeab 100644 --- a/drivers/mmc/core/mmc_test.c +++ b/drivers/mmc/core/mmc_test.c @@ -3125,13 +3125,13 @@ static ssize_t mtf_test_write(struct file *file, const char __user *buf, test->buffer = kzalloc(BUFFER_SIZE, GFP_KERNEL); #ifdef CONFIG_HIGHMEM test->highmem = alloc_pages(GFP_KERNEL | __GFP_HIGHMEM, BUFFER_ORDER); + if (!test->highmem) { + count = -ENOMEM; + goto free_test_buffer; + } #endif
-#ifdef CONFIG_HIGHMEM - if (test->buffer && test->highmem) { -#else if (test->buffer) { -#endif mutex_lock(&mmc_test_lock); mmc_test_run(test, testcase); mutex_unlock(&mmc_test_lock); @@ -3139,6 +3139,7 @@ static ssize_t mtf_test_write(struct file *file, const char __user *buf,
#ifdef CONFIG_HIGHMEM __free_pages(test->highmem, BUFFER_ORDER); +free_test_buffer: #endif kfree(test->buffer); kfree(test);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jens Axboe axboe@kernel.dk
[ Upstream commit e0ee967630c8ee67bb47a5b38d235cd5a8789c48 ]
Harden the buffer peeking a bit, by adding a sanity check for it having a valid size. Outside of that, arg->max_len is a size_t, though it's only ever set to a 32-bit value (as it's governed by MAX_RW_COUNT). Bump our needed check to a size_t so we know it fits. Finally, cap the calculated needed iov value to the PEEK_MAX_IMPORT, which is the maximum number of segments that should be peeked.
Fixes: 35c8711c8fc4 ("io_uring/kbuf: add helpers for getting/peeking multiple buffers") Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/kbuf.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/io_uring/kbuf.c b/io_uring/kbuf.c index c95dc1736dd93..1af2bd56af44a 100644 --- a/io_uring/kbuf.c +++ b/io_uring/kbuf.c @@ -218,10 +218,13 @@ static int io_ring_buffers_peek(struct io_kiocb *req, struct buf_sel_arg *arg,
buf = io_ring_head_to_buf(br, head, bl->mask); if (arg->max_len) { - int needed; + u32 len = READ_ONCE(buf->len); + size_t needed;
- needed = (arg->max_len + buf->len - 1) / buf->len; - needed = min(needed, PEEK_MAX_IMPORT); + if (unlikely(!len)) + return -ENOBUFS; + needed = (arg->max_len + len - 1) / len; + needed = min_not_zero(needed, (size_t) PEEK_MAX_IMPORT); if (nr_avail > needed) nr_avail = needed; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stuart Summers stuart.summers@intel.com
[ Upstream commit a6f78359ac75f24cac3c1bdd753c49c1877bcd82 ]
On driver reload we never free up the memory for the pagefault and access counter workqueues. Add those destroy calls here.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Stuart Summers stuart.summers@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/c9a951505271dc3a7aee76de765667... (cherry picked from commit 7586fc52b14e0b8edd0d1f8a434e0de2078b7b2b) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_gt_pagefault.c | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c b/drivers/gpu/drm/xe/xe_gt_pagefault.c index fa9e9853c53ba..67e8efcaa93f1 100644 --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c @@ -402,6 +402,18 @@ static void pf_queue_work_func(struct work_struct *w)
static void acc_queue_work_func(struct work_struct *w);
+static void pagefault_fini(void *arg) +{ + struct xe_gt *gt = arg; + struct xe_device *xe = gt_to_xe(gt); + + if (!xe->info.has_usm) + return; + + destroy_workqueue(gt->usm.acc_wq); + destroy_workqueue(gt->usm.pf_wq); +} + int xe_gt_pagefault_init(struct xe_gt *gt) { struct xe_device *xe = gt_to_xe(gt); @@ -429,10 +441,12 @@ int xe_gt_pagefault_init(struct xe_gt *gt) gt->usm.acc_wq = alloc_workqueue("xe_gt_access_counter_work_queue", WQ_UNBOUND | WQ_HIGHPRI, NUM_ACC_QUEUE); - if (!gt->usm.acc_wq) + if (!gt->usm.acc_wq) { + destroy_workqueue(gt->usm.pf_wq); return -ENOMEM; + }
- return 0; + return devm_add_action_or_reset(xe->drm.dev, pagefault_fini, gt); }
void xe_gt_pagefault_reset(struct xe_gt *gt)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rodrigo Vivi rodrigo.vivi@intel.com
[ Upstream commit ad1e331fc451a2cffc72ae193b843682ce237e24 ]
Limit the protection only during moments of actual job execution, and introduce protection for guc submit fini, which is currently unprotected due to the absence of exec_queue life protection.
In the regular use case scenario, user space will create an exec queue, and keep it alive to reuse that until it is done with that kind of workload.
For the regular desktop cases, it means that the exec_queue is alive even on idle scenarios where display goes off. This is unacceptable since this would entirely block runtime PM indefinitely, blocking deeper Package-C state. This would be a waste drainage of power.
Cc: Matthew Brost matthew.brost@intel.com Tested-by: Francois Dugast francois.dugast@intel.com Reviewed-by: Thomas Hellström thomas.hellstrom@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240522170105.327472-3-rodrig... Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_exec_queue.c | 14 -------------- drivers/gpu/drm/xe/xe_guc_submit.c | 3 +++ drivers/gpu/drm/xe/xe_sched_job.c | 10 +++------- 3 files changed, 6 insertions(+), 21 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 395de93579fa6..33b03605a1d15 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -106,7 +106,6 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
static int __xe_exec_queue_init(struct xe_exec_queue *q) { - struct xe_device *xe = gt_to_xe(q->gt); int i, err;
for (i = 0; i < q->width; ++i) { @@ -119,17 +118,6 @@ static int __xe_exec_queue_init(struct xe_exec_queue *q) if (err) goto err_lrc;
- /* - * Normally the user vm holds an rpm ref to keep the device - * awake, and the context holds a ref for the vm, however for - * some engines we use the kernels migrate vm underneath which offers no - * such rpm ref, or we lack a vm. Make sure we keep a ref here, so we - * can perform GuC CT actions when needed. Caller is expected to have - * already grabbed the rpm ref outside any sensitive locks. - */ - if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm)) - xe_pm_runtime_get_noresume(xe); - return 0;
err_lrc: @@ -216,8 +204,6 @@ void xe_exec_queue_fini(struct xe_exec_queue *q)
for (i = 0; i < q->width; ++i) xe_lrc_finish(q->lrc + i); - if (!(q->flags & EXEC_QUEUE_FLAG_PERMANENT) && (q->flags & EXEC_QUEUE_FLAG_VM || !q->vm)) - xe_pm_runtime_put(gt_to_xe(q->gt)); __xe_exec_queue_free(q); }
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 0f42971ff0a83..8c75791cbc4fb 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -35,6 +35,7 @@ #include "xe_macros.h" #include "xe_map.h" #include "xe_mocs.h" +#include "xe_pm.h" #include "xe_ring_ops_types.h" #include "xe_sched_job.h" #include "xe_trace.h" @@ -1011,6 +1012,7 @@ static void __guc_exec_queue_fini_async(struct work_struct *w) struct xe_exec_queue *q = ge->q; struct xe_guc *guc = exec_queue_to_guc(q);
+ xe_pm_runtime_get(guc_to_xe(guc)); trace_xe_exec_queue_destroy(q);
if (xe_exec_queue_is_lr(q)) @@ -1021,6 +1023,7 @@ static void __guc_exec_queue_fini_async(struct work_struct *w)
kfree(ge); xe_exec_queue_fini(q); + xe_pm_runtime_put(guc_to_xe(guc)); }
static void guc_exec_queue_fini_async(struct xe_exec_queue *q) diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index cd8a2fba54389..a4e030f5e019a 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -158,11 +158,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, for (i = 0; i < width; ++i) job->batch_addr[i] = batch_addr[i];
- /* All other jobs require a VM to be open which has a ref */ - if (unlikely(q->flags & EXEC_QUEUE_FLAG_KERNEL)) - xe_pm_runtime_get_noresume(job_to_xe(job)); - xe_device_assert_mem_access(job_to_xe(job)); - + xe_pm_runtime_get_noresume(job_to_xe(job)); trace_xe_sched_job_create(job); return job;
@@ -191,13 +187,13 @@ void xe_sched_job_destroy(struct kref *ref) { struct xe_sched_job *job = container_of(ref, struct xe_sched_job, refcount); + struct xe_device *xe = job_to_xe(job);
- if (unlikely(job->q->flags & EXEC_QUEUE_FLAG_KERNEL)) - xe_pm_runtime_put(job_to_xe(job)); xe_exec_queue_put(job->q); dma_fence_put(job->fence); drm_sched_job_cleanup(&job->drm); job_free(job); + xe_pm_runtime_put(xe); }
void xe_sched_job_set_error(struct xe_sched_job *job, int error)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Brost matthew.brost@intel.com
[ Upstream commit 08f7200899ca72dec550af092ae424b7db099abd ]
Tightly coupling these seqno presents problems if alternative fences for jobs are used. Decouple these for correctness.
v2: - Slightly reword commit message (Thomas) - Make sure the lrc fence ops are used in comparison (Thomas) - Assume seqno is unsigned rather than signed in format string (Thomas)
Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-2-thomas... Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_exec_queue.c | 2 +- drivers/gpu/drm/xe/xe_guc_submit.c | 5 +++-- drivers/gpu/drm/xe/xe_ring_ops.c | 12 ++++++------ drivers/gpu/drm/xe/xe_sched_job.c | 16 ++++++++-------- drivers/gpu/drm/xe/xe_sched_job.h | 5 +++++ drivers/gpu/drm/xe/xe_sched_job_types.h | 2 ++ drivers/gpu/drm/xe/xe_trace.h | 7 +++++-- 7 files changed, 30 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 33b03605a1d15..79d099e054916 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -98,7 +98,7 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe,
if (xe_exec_queue_is_parallel(q)) { q->parallel.composite_fence_ctx = dma_fence_context_alloc(1); - q->parallel.composite_fence_seqno = XE_FENCE_INITIAL_SEQNO; + q->parallel.composite_fence_seqno = 0; }
return q; diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c index 8c75791cbc4fb..e48285c81bf57 100644 --- a/drivers/gpu/drm/xe/xe_guc_submit.c +++ b/drivers/gpu/drm/xe/xe_guc_submit.c @@ -917,8 +917,9 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job) return DRM_GPU_SCHED_STAT_NOMINAL; }
- drm_notice(&xe->drm, "Timedout job: seqno=%u, guc_id=%d, flags=0x%lx", - xe_sched_job_seqno(job), q->guc->id, q->flags); + drm_notice(&xe->drm, "Timedout job: seqno=%u, lrc_seqno=%u, guc_id=%d, flags=0x%lx", + xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job), + q->guc->id, q->flags); xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_KERNEL, "Kernel-submitted job timed out\n"); xe_gt_WARN(q->gt, q->flags & EXEC_QUEUE_FLAG_VM && !exec_queue_killed(q), diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c index aca7a9af6e846..3b6aad8dea99a 100644 --- a/drivers/gpu/drm/xe/xe_ring_ops.c +++ b/drivers/gpu/drm/xe/xe_ring_ops.c @@ -412,7 +412,7 @@ static void emit_job_gen12_gsc(struct xe_sched_job *job)
__emit_job_gen12_simple(job, job->q->lrc, job->batch_addr[0], - xe_sched_job_seqno(job)); + xe_sched_job_lrc_seqno(job)); }
static void emit_job_gen12_copy(struct xe_sched_job *job) @@ -421,14 +421,14 @@ static void emit_job_gen12_copy(struct xe_sched_job *job)
if (xe_sched_job_is_migration(job->q)) { emit_migration_job_gen12(job, job->q->lrc, - xe_sched_job_seqno(job)); + xe_sched_job_lrc_seqno(job)); return; }
for (i = 0; i < job->q->width; ++i) __emit_job_gen12_simple(job, job->q->lrc + i, - job->batch_addr[i], - xe_sched_job_seqno(job)); + job->batch_addr[i], + xe_sched_job_lrc_seqno(job)); }
static void emit_job_gen12_video(struct xe_sched_job *job) @@ -439,7 +439,7 @@ static void emit_job_gen12_video(struct xe_sched_job *job) for (i = 0; i < job->q->width; ++i) __emit_job_gen12_video(job, job->q->lrc + i, job->batch_addr[i], - xe_sched_job_seqno(job)); + xe_sched_job_lrc_seqno(job)); }
static void emit_job_gen12_render_compute(struct xe_sched_job *job) @@ -449,7 +449,7 @@ static void emit_job_gen12_render_compute(struct xe_sched_job *job) for (i = 0; i < job->q->width; ++i) __emit_job_gen12_render_compute(job, job->q->lrc + i, job->batch_addr[i], - xe_sched_job_seqno(job)); + xe_sched_job_lrc_seqno(job)); }
static const struct xe_ring_ops ring_ops_gen12_gsc = { diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index a4e030f5e019a..874450be327ec 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -117,6 +117,7 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, err = PTR_ERR(job->fence); goto err_sched_job; } + job->lrc_seqno = job->fence->seqno; } else { struct dma_fence_array *cf;
@@ -132,6 +133,8 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, err = PTR_ERR(fences[j]); goto err_fences; } + if (!j) + job->lrc_seqno = fences[0]->seqno; }
cf = dma_fence_array_create(q->width, fences, @@ -144,10 +147,6 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, goto err_fences; }
- /* Sanity check */ - for (j = 0; j < q->width; ++j) - xe_assert(job_to_xe(job), cf->base.seqno == fences[j]->seqno); - job->fence = &cf->base; }
@@ -229,9 +228,9 @@ bool xe_sched_job_started(struct xe_sched_job *job) { struct xe_lrc *lrc = job->q->lrc;
- return !__dma_fence_is_later(xe_sched_job_seqno(job), + return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job), xe_lrc_start_seqno(lrc), - job->fence->ops); + dma_fence_array_first(job->fence)->ops); }
bool xe_sched_job_completed(struct xe_sched_job *job) @@ -243,8 +242,9 @@ bool xe_sched_job_completed(struct xe_sched_job *job) * parallel handshake is done. */
- return !__dma_fence_is_later(xe_sched_job_seqno(job), xe_lrc_seqno(lrc), - job->fence->ops); + return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job), + xe_lrc_seqno(lrc), + dma_fence_array_first(job->fence)->ops); }
void xe_sched_job_arm(struct xe_sched_job *job) diff --git a/drivers/gpu/drm/xe/xe_sched_job.h b/drivers/gpu/drm/xe/xe_sched_job.h index c75018f4660dc..002c3b5c0a5cb 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.h +++ b/drivers/gpu/drm/xe/xe_sched_job.h @@ -73,6 +73,11 @@ static inline u32 xe_sched_job_seqno(struct xe_sched_job *job) return job->fence->seqno; }
+static inline u32 xe_sched_job_lrc_seqno(struct xe_sched_job *job) +{ + return job->lrc_seqno; +} + static inline void xe_sched_job_add_migrate_flush(struct xe_sched_job *job, u32 flags) { diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h index 5e12724219fdd..990ddac55ed62 100644 --- a/drivers/gpu/drm/xe/xe_sched_job_types.h +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h @@ -37,6 +37,8 @@ struct xe_sched_job { /** @user_fence.value: write back value */ u64 value; } user_fence; + /** @lrc_seqno: LRC seqno */ + u32 lrc_seqno; /** @migrate_flush_flags: Additional flush flags for migration jobs */ u32 migrate_flush_flags; /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */ diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h index 2d56cfc09e421..6c6cecc58f63b 100644 --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -254,6 +254,7 @@ DECLARE_EVENT_CLASS(xe_sched_job,
TP_STRUCT__entry( __field(u32, seqno) + __field(u32, lrc_seqno) __field(u16, guc_id) __field(u32, guc_state) __field(u32, flags) @@ -264,6 +265,7 @@ DECLARE_EVENT_CLASS(xe_sched_job,
TP_fast_assign( __entry->seqno = xe_sched_job_seqno(job); + __entry->lrc_seqno = xe_sched_job_lrc_seqno(job); __entry->guc_id = job->q->guc->id; __entry->guc_state = atomic_read(&job->q->guc->state); @@ -273,8 +275,9 @@ DECLARE_EVENT_CLASS(xe_sched_job, __entry->batch_addr = (u64)job->batch_addr[0]; ),
- TP_printk("fence=%p, seqno=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d", - __entry->fence, __entry->seqno, __entry->guc_id, + TP_printk("fence=%p, seqno=%u, lrc_seqno=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d", + __entry->fence, __entry->seqno, + __entry->lrc_seqno, __entry->guc_id, __entry->batch_addr, __entry->guc_state, __entry->flags, __entry->error) );
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Hellström thomas.hellstrom@linux.intel.com
[ Upstream commit e183910ae4015214475b3248ce0b4c70f104f254 ]
Since sometimes a lock is required to initialize a seqno fence, and it might be desirable not to hold that lock while performing memory allocations, split the lrc seqno fence creation up into an allocation phase and an initialization phase.
Since lrc seqno fences under the hood are hw_fences, do the same for these and remove the xe_hw_fence_create() function since it is not used anymore.
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-3-thomas... Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_hw_fence.c | 59 +++++++++++++++++++++++++------- drivers/gpu/drm/xe/xe_hw_fence.h | 7 ++-- drivers/gpu/drm/xe/xe_lrc.c | 48 ++++++++++++++++++++++++-- drivers/gpu/drm/xe/xe_lrc.h | 3 ++ 4 files changed, 101 insertions(+), 16 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_hw_fence.c b/drivers/gpu/drm/xe/xe_hw_fence.c index f872ef1031272..35c0063a831af 100644 --- a/drivers/gpu/drm/xe/xe_hw_fence.c +++ b/drivers/gpu/drm/xe/xe_hw_fence.c @@ -208,23 +208,58 @@ static struct xe_hw_fence *to_xe_hw_fence(struct dma_fence *fence) return container_of(fence, struct xe_hw_fence, dma); }
-struct xe_hw_fence *xe_hw_fence_create(struct xe_hw_fence_ctx *ctx, - struct iosys_map seqno_map) +/** + * xe_hw_fence_alloc() - Allocate an hw fence. + * + * Allocate but don't initialize an hw fence. + * + * Return: Pointer to the allocated fence or + * negative error pointer on error. + */ +struct dma_fence *xe_hw_fence_alloc(void) { - struct xe_hw_fence *fence; + struct xe_hw_fence *hw_fence = fence_alloc();
- fence = fence_alloc(); - if (!fence) + if (!hw_fence) return ERR_PTR(-ENOMEM);
- fence->ctx = ctx; - fence->seqno_map = seqno_map; - INIT_LIST_HEAD(&fence->irq_link); + return &hw_fence->dma; +}
- dma_fence_init(&fence->dma, &xe_hw_fence_ops, &ctx->irq->lock, - ctx->dma_fence_ctx, ctx->next_seqno++); +/** + * xe_hw_fence_free() - Free an hw fence. + * @fence: Pointer to the fence to free. + * + * Frees an hw fence that hasn't yet been + * initialized. + */ +void xe_hw_fence_free(struct dma_fence *fence) +{ + fence_free(&fence->rcu); +}
- trace_xe_hw_fence_create(fence); +/** + * xe_hw_fence_init() - Initialize an hw fence. + * @fence: Pointer to the fence to initialize. + * @ctx: Pointer to the struct xe_hw_fence_ctx fence context. + * @seqno_map: Pointer to the map into where the seqno is blitted. + * + * Initializes a pre-allocated hw fence. + * After initialization, the fence is subject to normal + * dma-fence refcounting. + */ +void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx, + struct iosys_map seqno_map) +{ + struct xe_hw_fence *hw_fence = + container_of(fence, typeof(*hw_fence), dma); + + hw_fence->ctx = ctx; + hw_fence->seqno_map = seqno_map; + INIT_LIST_HEAD(&hw_fence->irq_link); + + dma_fence_init(fence, &xe_hw_fence_ops, &ctx->irq->lock, + ctx->dma_fence_ctx, ctx->next_seqno++);
- return fence; + trace_xe_hw_fence_create(hw_fence); } diff --git a/drivers/gpu/drm/xe/xe_hw_fence.h b/drivers/gpu/drm/xe/xe_hw_fence.h index cfe5fd603787e..f13a1c4982c73 100644 --- a/drivers/gpu/drm/xe/xe_hw_fence.h +++ b/drivers/gpu/drm/xe/xe_hw_fence.h @@ -24,7 +24,10 @@ void xe_hw_fence_ctx_init(struct xe_hw_fence_ctx *ctx, struct xe_gt *gt, struct xe_hw_fence_irq *irq, const char *name); void xe_hw_fence_ctx_finish(struct xe_hw_fence_ctx *ctx);
-struct xe_hw_fence *xe_hw_fence_create(struct xe_hw_fence_ctx *ctx, - struct iosys_map seqno_map); +struct dma_fence *xe_hw_fence_alloc(void);
+void xe_hw_fence_free(struct dma_fence *fence); + +void xe_hw_fence_init(struct dma_fence *fence, struct xe_hw_fence_ctx *ctx, + struct iosys_map seqno_map); #endif diff --git a/drivers/gpu/drm/xe/xe_lrc.c b/drivers/gpu/drm/xe/xe_lrc.c index d7bf7bc9dc145..995f47eb365c3 100644 --- a/drivers/gpu/drm/xe/xe_lrc.c +++ b/drivers/gpu/drm/xe/xe_lrc.c @@ -901,10 +901,54 @@ u32 xe_lrc_seqno_ggtt_addr(struct xe_lrc *lrc) return __xe_lrc_seqno_ggtt_addr(lrc); }
+/** + * xe_lrc_alloc_seqno_fence() - Allocate an lrc seqno fence. + * + * Allocate but don't initialize an lrc seqno fence. + * + * Return: Pointer to the allocated fence or + * negative error pointer on error. + */ +struct dma_fence *xe_lrc_alloc_seqno_fence(void) +{ + return xe_hw_fence_alloc(); +} + +/** + * xe_lrc_free_seqno_fence() - Free an lrc seqno fence. + * @fence: Pointer to the fence to free. + * + * Frees an lrc seqno fence that hasn't yet been + * initialized. + */ +void xe_lrc_free_seqno_fence(struct dma_fence *fence) +{ + xe_hw_fence_free(fence); +} + +/** + * xe_lrc_init_seqno_fence() - Initialize an lrc seqno fence. + * @lrc: Pointer to the lrc. + * @fence: Pointer to the fence to initialize. + * + * Initializes a pre-allocated lrc seqno fence. + * After initialization, the fence is subject to normal + * dma-fence refcounting. + */ +void xe_lrc_init_seqno_fence(struct xe_lrc *lrc, struct dma_fence *fence) +{ + xe_hw_fence_init(fence, &lrc->fence_ctx, __xe_lrc_seqno_map(lrc)); +} + struct dma_fence *xe_lrc_create_seqno_fence(struct xe_lrc *lrc) { - return &xe_hw_fence_create(&lrc->fence_ctx, - __xe_lrc_seqno_map(lrc))->dma; + struct dma_fence *fence = xe_lrc_alloc_seqno_fence(); + + if (IS_ERR(fence)) + return fence; + + xe_lrc_init_seqno_fence(lrc, fence); + return fence; }
s32 xe_lrc_seqno(struct xe_lrc *lrc) diff --git a/drivers/gpu/drm/xe/xe_lrc.h b/drivers/gpu/drm/xe/xe_lrc.h index d32fa31faa2cf..f57c5836dab87 100644 --- a/drivers/gpu/drm/xe/xe_lrc.h +++ b/drivers/gpu/drm/xe/xe_lrc.h @@ -38,6 +38,9 @@ void xe_lrc_write_ctx_reg(struct xe_lrc *lrc, int reg_nr, u32 val); u64 xe_lrc_descriptor(struct xe_lrc *lrc);
u32 xe_lrc_seqno_ggtt_addr(struct xe_lrc *lrc); +struct dma_fence *xe_lrc_alloc_seqno_fence(void); +void xe_lrc_free_seqno_fence(struct dma_fence *fence); +void xe_lrc_init_seqno_fence(struct xe_lrc *lrc, struct dma_fence *fence); struct dma_fence *xe_lrc_create_seqno_fence(struct xe_lrc *lrc); s32 xe_lrc_seqno(struct xe_lrc *lrc);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Hellström thomas.hellstrom@linux.intel.com
[ Upstream commit 0ac7a2c745e8a42803378b944fa0f4455b7240f6 ]
Pre-allocate but don't initialize fences at xe_sched_job_create(), and initialize / arm them instead at xe_sched_job_arm(). This makes it possible to move xe_sched_job_create() with its memory allocation out of any lock that is required for fence initialization, and that may not allow memory allocation under it.
Replaces the struct dma_fence_array for parallell jobs with a struct dma_fence_chain, since the former doesn't allow a split-up between allocation and initialization.
v2: - Rebase. - Don't always use the first lrc when initializing parallel lrc fences. - Use dma_fence_chain_contained() to access the lrc fences.
v4: - Add an assert that job->lrc_seqno == fence->seqno. (Matthew Brost)
Signed-off-by: Thomas Hellström thomas.hellstrom@linux.intel.com Reviewed-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Rodrigo Vivi rodrigo.vivi@intel.com #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20240527135912.152156-4-thomas... Stable-dep-of: 9e7f30563677 ("drm/xe: Free job before xe_exec_queue_put") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_exec_queue.c | 5 - drivers/gpu/drm/xe/xe_exec_queue_types.h | 10 -- drivers/gpu/drm/xe/xe_ring_ops.c | 12 +- drivers/gpu/drm/xe/xe_sched_job.c | 161 +++++++++++++---------- drivers/gpu/drm/xe/xe_sched_job_types.h | 18 ++- drivers/gpu/drm/xe/xe_trace.h | 2 +- 6 files changed, 115 insertions(+), 93 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_exec_queue.c b/drivers/gpu/drm/xe/xe_exec_queue.c index 79d099e054916..2ae4420e29353 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue.c +++ b/drivers/gpu/drm/xe/xe_exec_queue.c @@ -96,11 +96,6 @@ static struct xe_exec_queue *__xe_exec_queue_alloc(struct xe_device *xe, } }
- if (xe_exec_queue_is_parallel(q)) { - q->parallel.composite_fence_ctx = dma_fence_context_alloc(1); - q->parallel.composite_fence_seqno = 0; - } - return q; }
diff --git a/drivers/gpu/drm/xe/xe_exec_queue_types.h b/drivers/gpu/drm/xe/xe_exec_queue_types.h index ee78d497d838a..f0c40e8ad80a1 100644 --- a/drivers/gpu/drm/xe/xe_exec_queue_types.h +++ b/drivers/gpu/drm/xe/xe_exec_queue_types.h @@ -103,16 +103,6 @@ struct xe_exec_queue { struct xe_guc_exec_queue *guc; };
- /** - * @parallel: parallel submission state - */ - struct { - /** @parallel.composite_fence_ctx: context composite fence */ - u64 composite_fence_ctx; - /** @parallel.composite_fence_seqno: seqno for composite fence */ - u32 composite_fence_seqno; - } parallel; - /** @sched_props: scheduling properties */ struct { /** @sched_props.timeslice_us: timeslice period in micro-seconds */ diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c index 3b6aad8dea99a..c59c4373aefad 100644 --- a/drivers/gpu/drm/xe/xe_ring_ops.c +++ b/drivers/gpu/drm/xe/xe_ring_ops.c @@ -380,7 +380,7 @@ static void emit_migration_job_gen12(struct xe_sched_job *job,
dw[i++] = MI_ARB_ON_OFF | MI_ARB_DISABLE; /* Enabled again below */
- i = emit_bb_start(job->batch_addr[0], BIT(8), dw, i); + i = emit_bb_start(job->ptrs[0].batch_addr, BIT(8), dw, i);
if (!IS_SRIOV_VF(gt_to_xe(job->q->gt))) { /* XXX: Do we need this? Leaving for now. */ @@ -389,7 +389,7 @@ static void emit_migration_job_gen12(struct xe_sched_job *job, dw[i++] = preparser_disable(false); }
- i = emit_bb_start(job->batch_addr[1], BIT(8), dw, i); + i = emit_bb_start(job->ptrs[1].batch_addr, BIT(8), dw, i);
dw[i++] = MI_FLUSH_DW | MI_INVALIDATE_TLB | job->migrate_flush_flags | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_IMM_DW; @@ -411,7 +411,7 @@ static void emit_job_gen12_gsc(struct xe_sched_job *job) xe_gt_assert(gt, job->q->width <= 1); /* no parallel submission for GSCCS */
__emit_job_gen12_simple(job, job->q->lrc, - job->batch_addr[0], + job->ptrs[0].batch_addr, xe_sched_job_lrc_seqno(job)); }
@@ -427,7 +427,7 @@ static void emit_job_gen12_copy(struct xe_sched_job *job)
for (i = 0; i < job->q->width; ++i) __emit_job_gen12_simple(job, job->q->lrc + i, - job->batch_addr[i], + job->ptrs[i].batch_addr, xe_sched_job_lrc_seqno(job)); }
@@ -438,7 +438,7 @@ static void emit_job_gen12_video(struct xe_sched_job *job) /* FIXME: Not doing parallel handshake for now */ for (i = 0; i < job->q->width; ++i) __emit_job_gen12_video(job, job->q->lrc + i, - job->batch_addr[i], + job->ptrs[i].batch_addr, xe_sched_job_lrc_seqno(job)); }
@@ -448,7 +448,7 @@ static void emit_job_gen12_render_compute(struct xe_sched_job *job)
for (i = 0; i < job->q->width; ++i) __emit_job_gen12_render_compute(job, job->q->lrc + i, - job->batch_addr[i], + job->ptrs[i].batch_addr, xe_sched_job_lrc_seqno(job)); }
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index 874450be327ec..29f3201d7dfac 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -6,7 +6,7 @@ #include "xe_sched_job.h"
#include <drm/xe_drm.h> -#include <linux/dma-fence-array.h> +#include <linux/dma-fence-chain.h> #include <linux/slab.h>
#include "xe_device.h" @@ -29,7 +29,7 @@ int __init xe_sched_job_module_init(void) xe_sched_job_slab = kmem_cache_create("xe_sched_job", sizeof(struct xe_sched_job) + - sizeof(u64), 0, + sizeof(struct xe_job_ptrs), 0, SLAB_HWCACHE_ALIGN, NULL); if (!xe_sched_job_slab) return -ENOMEM; @@ -37,7 +37,7 @@ int __init xe_sched_job_module_init(void) xe_sched_job_parallel_slab = kmem_cache_create("xe_sched_job_parallel", sizeof(struct xe_sched_job) + - sizeof(u64) * + sizeof(struct xe_job_ptrs) * XE_HW_ENGINE_MAX_INSTANCE, 0, SLAB_HWCACHE_ALIGN, NULL); if (!xe_sched_job_parallel_slab) { @@ -79,26 +79,33 @@ static struct xe_device *job_to_xe(struct xe_sched_job *job) return gt_to_xe(job->q->gt); }
+/* Free unused pre-allocated fences */ +static void xe_sched_job_free_fences(struct xe_sched_job *job) +{ + int i; + + for (i = 0; i < job->q->width; ++i) { + struct xe_job_ptrs *ptrs = &job->ptrs[i]; + + if (ptrs->lrc_fence) + xe_lrc_free_seqno_fence(ptrs->lrc_fence); + if (ptrs->chain_fence) + dma_fence_chain_free(ptrs->chain_fence); + } +} + struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, u64 *batch_addr) { - struct xe_sched_job *job; - struct dma_fence **fences; bool is_migration = xe_sched_job_is_migration(q); + struct xe_sched_job *job; int err; - int i, j; + int i; u32 width;
/* only a kernel context can submit a vm-less job */ XE_WARN_ON(!q->vm && !(q->flags & EXEC_QUEUE_FLAG_KERNEL));
- /* Migration and kernel engines have their own locking */ - if (!(q->flags & (EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_VM))) { - lockdep_assert_held(&q->vm->lock); - if (!xe_vm_in_lr_mode(q->vm)) - xe_vm_assert_held(q->vm); - } - job = job_alloc(xe_exec_queue_is_parallel(q) || is_migration); if (!job) return ERR_PTR(-ENOMEM); @@ -111,43 +118,25 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, if (err) goto err_free;
- if (!xe_exec_queue_is_parallel(q)) { - job->fence = xe_lrc_create_seqno_fence(q->lrc); - if (IS_ERR(job->fence)) { - err = PTR_ERR(job->fence); - goto err_sched_job; - } - job->lrc_seqno = job->fence->seqno; - } else { - struct dma_fence_array *cf; + for (i = 0; i < q->width; ++i) { + struct dma_fence *fence = xe_lrc_alloc_seqno_fence(); + struct dma_fence_chain *chain;
- fences = kmalloc_array(q->width, sizeof(*fences), GFP_KERNEL); - if (!fences) { - err = -ENOMEM; + if (IS_ERR(fence)) { + err = PTR_ERR(fence); goto err_sched_job; } + job->ptrs[i].lrc_fence = fence;
- for (j = 0; j < q->width; ++j) { - fences[j] = xe_lrc_create_seqno_fence(q->lrc + j); - if (IS_ERR(fences[j])) { - err = PTR_ERR(fences[j]); - goto err_fences; - } - if (!j) - job->lrc_seqno = fences[0]->seqno; - } + if (i + 1 == q->width) + continue;
- cf = dma_fence_array_create(q->width, fences, - q->parallel.composite_fence_ctx, - q->parallel.composite_fence_seqno++, - false); - if (!cf) { - --q->parallel.composite_fence_seqno; + chain = dma_fence_chain_alloc(); + if (!chain) { err = -ENOMEM; - goto err_fences; + goto err_sched_job; } - - job->fence = &cf->base; + job->ptrs[i].chain_fence = chain; }
width = q->width; @@ -155,19 +144,14 @@ struct xe_sched_job *xe_sched_job_create(struct xe_exec_queue *q, width = 2;
for (i = 0; i < width; ++i) - job->batch_addr[i] = batch_addr[i]; + job->ptrs[i].batch_addr = batch_addr[i];
xe_pm_runtime_get_noresume(job_to_xe(job)); trace_xe_sched_job_create(job); return job;
-err_fences: - for (j = j - 1; j >= 0; --j) { - --q->lrc[j].fence_ctx.next_seqno; - dma_fence_put(fences[j]); - } - kfree(fences); err_sched_job: + xe_sched_job_free_fences(job); drm_sched_job_cleanup(&job->drm); err_free: xe_exec_queue_put(q); @@ -188,6 +172,7 @@ void xe_sched_job_destroy(struct kref *ref) container_of(ref, struct xe_sched_job, refcount); struct xe_device *xe = job_to_xe(job);
+ xe_sched_job_free_fences(job); xe_exec_queue_put(job->q); dma_fence_put(job->fence); drm_sched_job_cleanup(&job->drm); @@ -195,27 +180,32 @@ void xe_sched_job_destroy(struct kref *ref) xe_pm_runtime_put(xe); }
-void xe_sched_job_set_error(struct xe_sched_job *job, int error) +/* Set the error status under the fence to avoid racing with signaling */ +static bool xe_fence_set_error(struct dma_fence *fence, int error) { - if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &job->fence->flags)) - return; + unsigned long irq_flags; + bool signaled; + + spin_lock_irqsave(fence->lock, irq_flags); + signaled = test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags); + if (!signaled) + dma_fence_set_error(fence, error); + spin_unlock_irqrestore(fence->lock, irq_flags);
- dma_fence_set_error(job->fence, error); + return signaled; +}
- if (dma_fence_is_array(job->fence)) { - struct dma_fence_array *array = - to_dma_fence_array(job->fence); - struct dma_fence **child = array->fences; - unsigned int nchild = array->num_fences; +void xe_sched_job_set_error(struct xe_sched_job *job, int error) +{ + if (xe_fence_set_error(job->fence, error)) + return;
- do { - struct dma_fence *current_fence = *child++; + if (dma_fence_is_chain(job->fence)) { + struct dma_fence *iter;
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, - ¤t_fence->flags)) - continue; - dma_fence_set_error(current_fence, error); - } while (--nchild); + dma_fence_chain_for_each(iter, job->fence) + xe_fence_set_error(dma_fence_chain_contained(iter), + error); }
trace_xe_sched_job_set_error(job); @@ -230,7 +220,7 @@ bool xe_sched_job_started(struct xe_sched_job *job)
return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job), xe_lrc_start_seqno(lrc), - dma_fence_array_first(job->fence)->ops); + dma_fence_chain_contained(job->fence)->ops); }
bool xe_sched_job_completed(struct xe_sched_job *job) @@ -244,13 +234,24 @@ bool xe_sched_job_completed(struct xe_sched_job *job)
return !__dma_fence_is_later(xe_sched_job_lrc_seqno(job), xe_lrc_seqno(lrc), - dma_fence_array_first(job->fence)->ops); + dma_fence_chain_contained(job->fence)->ops); }
void xe_sched_job_arm(struct xe_sched_job *job) { struct xe_exec_queue *q = job->q; + struct dma_fence *fence, *prev; struct xe_vm *vm = q->vm; + u64 seqno = 0; + int i; + + /* Migration and kernel engines have their own locking */ + if (IS_ENABLED(CONFIG_LOCKDEP) && + !(q->flags & (EXEC_QUEUE_FLAG_KERNEL | EXEC_QUEUE_FLAG_VM))) { + lockdep_assert_held(&q->vm->lock); + if (!xe_vm_in_lr_mode(q->vm)) + xe_vm_assert_held(q->vm); + }
if (vm && !xe_sched_job_is_migration(q) && !xe_vm_in_lr_mode(vm) && (vm->batch_invalidate_tlb || vm->tlb_flush_seqno != q->tlb_flush_seqno)) { @@ -259,6 +260,27 @@ void xe_sched_job_arm(struct xe_sched_job *job) job->ring_ops_flush_tlb = true; }
+ /* Arm the pre-allocated fences */ + for (i = 0; i < q->width; prev = fence, ++i) { + struct dma_fence_chain *chain; + + fence = job->ptrs[i].lrc_fence; + xe_lrc_init_seqno_fence(&q->lrc[i], fence); + job->ptrs[i].lrc_fence = NULL; + if (!i) { + job->lrc_seqno = fence->seqno; + continue; + } else { + xe_assert(gt_to_xe(q->gt), job->lrc_seqno == fence->seqno); + } + + chain = job->ptrs[i - 1].chain_fence; + dma_fence_chain_init(chain, prev, fence, seqno++); + job->ptrs[i - 1].chain_fence = NULL; + fence = &chain->base; + } + + job->fence = fence; drm_sched_job_arm(&job->drm); }
@@ -318,7 +340,8 @@ xe_sched_job_snapshot_capture(struct xe_sched_job *job)
snapshot->batch_addr_len = q->width; for (i = 0; i < q->width; i++) - snapshot->batch_addr[i] = xe_device_uncanonicalize_addr(xe, job->batch_addr[i]); + snapshot->batch_addr[i] = + xe_device_uncanonicalize_addr(xe, job->ptrs[i].batch_addr);
return snapshot; } diff --git a/drivers/gpu/drm/xe/xe_sched_job_types.h b/drivers/gpu/drm/xe/xe_sched_job_types.h index 990ddac55ed62..0d3f76fb05cea 100644 --- a/drivers/gpu/drm/xe/xe_sched_job_types.h +++ b/drivers/gpu/drm/xe/xe_sched_job_types.h @@ -11,6 +11,20 @@ #include <drm/gpu_scheduler.h>
struct xe_exec_queue; +struct dma_fence; +struct dma_fence_chain; + +/** + * struct xe_job_ptrs - Per hw engine instance data + */ +struct xe_job_ptrs { + /** @lrc_fence: Pre-allocated uinitialized lrc fence.*/ + struct dma_fence *lrc_fence; + /** @chain_fence: Pre-allocated ninitialized fence chain node. */ + struct dma_fence_chain *chain_fence; + /** @batch_addr: Batch buffer address. */ + u64 batch_addr; +};
/** * struct xe_sched_job - XE schedule job (batch buffer tracking) @@ -43,8 +57,8 @@ struct xe_sched_job { u32 migrate_flush_flags; /** @ring_ops_flush_tlb: The ring ops need to flush TLB before payload. */ bool ring_ops_flush_tlb; - /** @batch_addr: batch buffer address of job */ - u64 batch_addr[]; + /** @ptrs: per instance pointers. */ + struct xe_job_ptrs ptrs[]; };
struct xe_sched_job_snapshot { diff --git a/drivers/gpu/drm/xe/xe_trace.h b/drivers/gpu/drm/xe/xe_trace.h index 6c6cecc58f63b..450f407c66e8f 100644 --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -272,7 +272,7 @@ DECLARE_EVENT_CLASS(xe_sched_job, __entry->flags = job->q->flags; __entry->error = job->fence->error; __entry->fence = job->fence; - __entry->batch_addr = (u64)job->batch_addr[0]; + __entry->batch_addr = (u64)job->ptrs[0].batch_addr; ),
TP_printk("fence=%p, seqno=%u, lrc_seqno=%u, guc_id=%d, batch_addr=0x%012llx, guc_state=0x%x, flags=0x%x, error=%d",
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Brost matthew.brost@intel.com
[ Upstream commit 9e7f30563677fbeff62d368d5d2a5ac7aaa9746a ]
Free job depends on job->vm being valid, the last xe_exec_queue_put can destroy the VM. Prevent UAF by freeing job before xe_exec_queue_put.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Nirmoy Das nirmoy.das@intel.com Reviewed-by: Jagmeet Randhawa jagmeet.randhawa@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240820202309.1260755-1-matth... (cherry picked from commit 32a42c93b74c8ca6d0915ea3eba21bceff53042f) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/xe/xe_sched_job.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index 29f3201d7dfac..2b064680abb96 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c +++ b/drivers/gpu/drm/xe/xe_sched_job.c @@ -171,12 +171,13 @@ void xe_sched_job_destroy(struct kref *ref) struct xe_sched_job *job = container_of(ref, struct xe_sched_job, refcount); struct xe_device *xe = job_to_xe(job); + struct xe_exec_queue *q = job->q;
xe_sched_job_free_fences(job); - xe_exec_queue_put(job->q); dma_fence_put(job->fence); drm_sched_job_cleanup(&job->drm); job_free(job); + xe_exec_queue_put(q); xe_pm_runtime_put(xe); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yang Ruibin 11162571@vivo.com
[ Upstream commit 57df60e1f981fa8c288a49012a4bbb02ae0ecdbc ]
The debugfs_create_dir() return value is never NULL, it is either a valid pointer or an error one.
Use IS_ERR() to check it.
Fixes: 7ef01f228c9f ("thermal/debugfs: Add thermal debugfs information for mitigation episodes") Fixes: 755113d76786 ("thermal/debugfs: Add thermal cooling device debugfs information") Signed-off-by: Yang Ruibin 11162571@vivo.com Link: https://patch.msgid.link/20240821075934.12145-1-11162571@vivo.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_debugfs.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/thermal/thermal_debugfs.c b/drivers/thermal/thermal_debugfs.c index 9424472291570..6d55f5fc4ca0f 100644 --- a/drivers/thermal/thermal_debugfs.c +++ b/drivers/thermal/thermal_debugfs.c @@ -178,11 +178,11 @@ struct thermal_debugfs { void thermal_debug_init(void) { d_root = debugfs_create_dir("thermal", NULL); - if (!d_root) + if (IS_ERR(d_root)) return;
d_cdev = debugfs_create_dir("cooling_devices", d_root); - if (!d_cdev) + if (IS_ERR(d_cdev)) return;
d_tz = debugfs_create_dir("thermal_zones", d_root); @@ -202,7 +202,7 @@ static struct thermal_debugfs *thermal_debugfs_add_id(struct dentry *d, int id) snprintf(ids, IDSLENGTH, "%d", id);
thermal_dbg->d_top = debugfs_create_dir(ids, d); - if (!thermal_dbg->d_top) { + if (IS_ERR(thermal_dbg->d_top)) { kfree(thermal_dbg); return NULL; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Gordeev agordeev@linux.ibm.com
[ Upstream commit d7fd2941ae9a67423d1c7bee985f240e4686634f ]
When physical memory for the kernel image is allocated it does not consider extra memory required for offsetting the image start to match it with the lower 20 bits of KASLR virtual base address. That might lead to kernel access beyond its memory range.
Suggested-by: Vasily Gorbik gor@linux.ibm.com Fixes: 693d41f7c938 ("s390/mm: Restore mapping of kernel image using large pages") Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Acked-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/boot/startup.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c index 5a36d5538dae8..7797446620b64 100644 --- a/arch/s390/boot/startup.c +++ b/arch/s390/boot/startup.c @@ -446,9 +446,9 @@ void startup_kernel(void) */ kaslr_large_page_offset = __kaslr_offset & ~_SEGMENT_MASK; if (kaslr_enabled()) { - unsigned long end = ident_map_size - kaslr_large_page_offset; + unsigned long size = kernel_size + kaslr_large_page_offset;
- __kaslr_offset_phys = randomize_within_range(kernel_size, _SEGMENT_SIZE, 0, end); + __kaslr_offset_phys = randomize_within_range(size, _SEGMENT_SIZE, 0, ident_map_size); } if (!__kaslr_offset_phys) __kaslr_offset_phys = nokaslr_offset_phys;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Gordeev agordeev@linux.ibm.com
[ Upstream commit 1642285e511c2a40b14e87a41aa8feace6123036 ]
Symbol offsets to the KASLR base do not match symbol address in the vmlinux image. That is the result of setting the KASLR base to the beginning of .text section as result of an optimization.
Revert that optimization and allocate virtual memory for the whole kernel image including __START_KERNEL bytes as per the linker script. That allows keeping the semantics of the KASLR base offset in sync with other architectures.
Rename __START_KERNEL to TEXT_OFFSET, since it represents the offset of the .text section within the kernel image, rather than a virtual address.
Still skip mapping TEXT_OFFSET bytes to save memory on pgtables and provoke exceptions in case an attempt to access this area is made, as no kernel symbol may reside there.
In case CONFIG_KASAN is enabled the location counter might exceed the value of TEXT_OFFSET, while the decompressor linker script forcefully resets it to TEXT_OFFSET, which leads to a sections overlap link failure. Use MAX() expression to avoid that.
Reported-by: Omar Sandoval osandov@osandov.com Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefaceb... Fixes: 56b1069c40c7 ("s390/boot: Rework deployment of the kernel image") Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Acked-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/s390/boot/startup.c | 55 ++++++++++++++++++---------------- arch/s390/boot/vmem.c | 14 +++++++-- arch/s390/boot/vmlinux.lds.S | 7 ++++- arch/s390/include/asm/page.h | 3 +- arch/s390/kernel/vmlinux.lds.S | 2 +- arch/s390/tools/relocs.c | 2 +- 6 files changed, 52 insertions(+), 31 deletions(-)
diff --git a/arch/s390/boot/startup.c b/arch/s390/boot/startup.c index 7797446620b64..6d88f241dd43a 100644 --- a/arch/s390/boot/startup.c +++ b/arch/s390/boot/startup.c @@ -161,7 +161,7 @@ static void kaslr_adjust_relocs(unsigned long min_addr, unsigned long max_addr, loc = (long)*reloc + phys_offset; if (loc < min_addr || loc > max_addr) error("64-bit relocation outside of kernel!\n"); - *(u64 *)loc += offset - __START_KERNEL; + *(u64 *)loc += offset; } }
@@ -176,7 +176,7 @@ static void kaslr_adjust_got(unsigned long offset) */ for (entry = (u64 *)vmlinux.got_start; entry < (u64 *)vmlinux.got_end; entry++) { if (*entry) - *entry += offset - __START_KERNEL; + *entry += offset; } }
@@ -251,7 +251,7 @@ static unsigned long setup_kernel_memory_layout(unsigned long kernel_size) vmemmap_size = SECTION_ALIGN_UP(pages) * sizeof(struct page);
/* choose kernel address space layout: 4 or 3 levels. */ - BUILD_BUG_ON(!IS_ALIGNED(__START_KERNEL, THREAD_SIZE)); + BUILD_BUG_ON(!IS_ALIGNED(TEXT_OFFSET, THREAD_SIZE)); BUILD_BUG_ON(!IS_ALIGNED(__NO_KASLR_START_KERNEL, THREAD_SIZE)); BUILD_BUG_ON(__NO_KASLR_END_KERNEL > _REGION1_SIZE); vsize = get_vmem_size(ident_map_size, vmemmap_size, vmalloc_size, _REGION3_SIZE); @@ -378,31 +378,25 @@ static void kaslr_adjust_vmlinux_info(long offset) #endif }
-static void fixup_vmlinux_info(void) -{ - vmlinux.entry -= __START_KERNEL; - kaslr_adjust_vmlinux_info(-__START_KERNEL); -} - void startup_kernel(void) { - unsigned long kernel_size = vmlinux.image_size + vmlinux.bss_size; - unsigned long nokaslr_offset_phys, kaslr_large_page_offset; - unsigned long amode31_lma = 0; + unsigned long vmlinux_size = vmlinux.image_size + vmlinux.bss_size; + unsigned long nokaslr_text_lma, text_lma = 0, amode31_lma = 0; + unsigned long kernel_size = TEXT_OFFSET + vmlinux_size; + unsigned long kaslr_large_page_offset; unsigned long max_physmem_end; unsigned long asce_limit; unsigned long safe_addr; psw_t psw;
- fixup_vmlinux_info(); setup_lpp();
/* * Non-randomized kernel physical start address must be _SEGMENT_SIZE * aligned (see blow). */ - nokaslr_offset_phys = ALIGN(mem_safe_offset(), _SEGMENT_SIZE); - safe_addr = PAGE_ALIGN(nokaslr_offset_phys + kernel_size); + nokaslr_text_lma = ALIGN(mem_safe_offset(), _SEGMENT_SIZE); + safe_addr = PAGE_ALIGN(nokaslr_text_lma + vmlinux_size);
/* * Reserve decompressor memory together with decompression heap, @@ -446,16 +440,27 @@ void startup_kernel(void) */ kaslr_large_page_offset = __kaslr_offset & ~_SEGMENT_MASK; if (kaslr_enabled()) { - unsigned long size = kernel_size + kaslr_large_page_offset; + unsigned long size = vmlinux_size + kaslr_large_page_offset;
- __kaslr_offset_phys = randomize_within_range(size, _SEGMENT_SIZE, 0, ident_map_size); + text_lma = randomize_within_range(size, _SEGMENT_SIZE, TEXT_OFFSET, ident_map_size); } - if (!__kaslr_offset_phys) - __kaslr_offset_phys = nokaslr_offset_phys; - __kaslr_offset_phys |= kaslr_large_page_offset; + if (!text_lma) + text_lma = nokaslr_text_lma; + text_lma |= kaslr_large_page_offset; + + /* + * [__kaslr_offset_phys..__kaslr_offset_phys + TEXT_OFFSET] region is + * never accessed via the kernel image mapping as per the linker script: + * + * . = TEXT_OFFSET; + * + * Therefore, this region could be used for something else and does + * not need to be reserved. See how it is skipped in setup_vmem(). + */ + __kaslr_offset_phys = text_lma - TEXT_OFFSET; kaslr_adjust_vmlinux_info(__kaslr_offset_phys); - physmem_reserve(RR_VMLINUX, __kaslr_offset_phys, kernel_size); - deploy_kernel((void *)__kaslr_offset_phys); + physmem_reserve(RR_VMLINUX, text_lma, vmlinux_size); + deploy_kernel((void *)text_lma);
/* vmlinux decompression is done, shrink reserved low memory */ physmem_reserve(RR_DECOMPRESSOR, 0, (unsigned long)_decompressor_end); @@ -474,7 +479,7 @@ void startup_kernel(void) if (kaslr_enabled()) amode31_lma = randomize_within_range(vmlinux.amode31_size, PAGE_SIZE, 0, SZ_2G); if (!amode31_lma) - amode31_lma = __kaslr_offset_phys - vmlinux.amode31_size; + amode31_lma = text_lma - vmlinux.amode31_size; physmem_reserve(RR_AMODE31, amode31_lma, vmlinux.amode31_size);
/* @@ -490,8 +495,8 @@ void startup_kernel(void) * - copy_bootdata() must follow setup_vmem() to propagate changes * to bootdata made by setup_vmem() */ - clear_bss_section(__kaslr_offset_phys); - kaslr_adjust_relocs(__kaslr_offset_phys, __kaslr_offset_phys + vmlinux.image_size, + clear_bss_section(text_lma); + kaslr_adjust_relocs(text_lma, text_lma + vmlinux.image_size, __kaslr_offset, __kaslr_offset_phys); kaslr_adjust_got(__kaslr_offset); setup_vmem(__kaslr_offset, __kaslr_offset + kernel_size, asce_limit); diff --git a/arch/s390/boot/vmem.c b/arch/s390/boot/vmem.c index 40cfce2687c43..3d4dae9905cd8 100644 --- a/arch/s390/boot/vmem.c +++ b/arch/s390/boot/vmem.c @@ -89,7 +89,7 @@ static void kasan_populate_shadow(unsigned long kernel_start, unsigned long kern } memgap_start = end; } - kasan_populate(kernel_start, kernel_end, POPULATE_KASAN_MAP_SHADOW); + kasan_populate(kernel_start + TEXT_OFFSET, kernel_end, POPULATE_KASAN_MAP_SHADOW); kasan_populate(0, (unsigned long)__identity_va(0), POPULATE_KASAN_ZERO_SHADOW); kasan_populate(AMODE31_START, AMODE31_END, POPULATE_KASAN_ZERO_SHADOW); if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) { @@ -466,7 +466,17 @@ void setup_vmem(unsigned long kernel_start, unsigned long kernel_end, unsigned l (unsigned long)__identity_va(end), POPULATE_IDENTITY); } - pgtable_populate(kernel_start, kernel_end, POPULATE_KERNEL); + + /* + * [kernel_start..kernel_start + TEXT_OFFSET] region is never + * accessed as per the linker script: + * + * . = TEXT_OFFSET; + * + * Therefore, skip mapping TEXT_OFFSET bytes to prevent access to + * [__kaslr_offset_phys..__kaslr_offset_phys + TEXT_OFFSET] region. + */ + pgtable_populate(kernel_start + TEXT_OFFSET, kernel_end, POPULATE_KERNEL); pgtable_populate(AMODE31_START, AMODE31_END, POPULATE_DIRECT); pgtable_populate(__abs_lowcore, __abs_lowcore + sizeof(struct lowcore), POPULATE_ABS_LOWCORE); diff --git a/arch/s390/boot/vmlinux.lds.S b/arch/s390/boot/vmlinux.lds.S index a750711d44c86..66670212a3611 100644 --- a/arch/s390/boot/vmlinux.lds.S +++ b/arch/s390/boot/vmlinux.lds.S @@ -109,7 +109,12 @@ SECTIONS #ifdef CONFIG_KERNEL_UNCOMPRESSED . = ALIGN(PAGE_SIZE); . += AMODE31_SIZE; /* .amode31 section */ - . = ALIGN(1 << 20); /* _SEGMENT_SIZE */ + + /* + * Make sure the location counter is not less than TEXT_OFFSET. + * _SEGMENT_SIZE is not available, use ALIGN(1 << 20) instead. + */ + . = MAX(TEXT_OFFSET, ALIGN(1 << 20)); #else . = ALIGN(8); #endif diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h index 224ff9d433ead..8cac1a737424d 100644 --- a/arch/s390/include/asm/page.h +++ b/arch/s390/include/asm/page.h @@ -276,8 +276,9 @@ static inline unsigned long virt_to_pfn(const void *kaddr) #define AMODE31_SIZE (3 * PAGE_SIZE)
#define KERNEL_IMAGE_SIZE (512 * 1024 * 1024) -#define __START_KERNEL 0x100000 #define __NO_KASLR_START_KERNEL CONFIG_KERNEL_IMAGE_BASE #define __NO_KASLR_END_KERNEL (__NO_KASLR_START_KERNEL + KERNEL_IMAGE_SIZE)
+#define TEXT_OFFSET 0x100000 + #endif /* _S390_PAGE_H */ diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S index a1ce3925ec719..52bd969b28283 100644 --- a/arch/s390/kernel/vmlinux.lds.S +++ b/arch/s390/kernel/vmlinux.lds.S @@ -39,7 +39,7 @@ PHDRS {
SECTIONS { - . = __START_KERNEL; + . = TEXT_OFFSET; .text : { _stext = .; /* Start of text section */ _text = .; /* Text and read-only data */ diff --git a/arch/s390/tools/relocs.c b/arch/s390/tools/relocs.c index a74dbd5c9896a..30a732c808f35 100644 --- a/arch/s390/tools/relocs.c +++ b/arch/s390/tools/relocs.c @@ -280,7 +280,7 @@ static int do_reloc(struct section *sec, Elf_Rel *rel) case R_390_GOTOFF64: break; case R_390_64: - add_reloc(&relocs64, offset - ehdr.e_entry); + add_reloc(&relocs64, offset); break; default: die("Unsupported relocation type: %d\n", r_type);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit ec686804117a0421cf31d54427768aaf93aa0069 ]
Just ignore reparse points that the client can't parse rather than bailing out and not opening the file or directory.
Reported-by: Marc 1marc1@gmail.com Closes: https://lore.kernel.org/r/CAMHwNVv-B+Q6wa0FEXrAuzdchzcJRsPKDDRrNaYZJd6X-+iJz... Fixes: 539aad7f14da ("smb: client: introduce ->parse_reparse_point()") Tested-by: Anthony Nandaa (Microsoft) profnandaa@gmail.com Signed-off-by: Paulo Alcantara (Red Hat) pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/reparse.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/fs/smb/client/reparse.c b/fs/smb/client/reparse.c index 689d8a506d459..48c27581ec511 100644 --- a/fs/smb/client/reparse.c +++ b/fs/smb/client/reparse.c @@ -378,6 +378,8 @@ int parse_reparse_point(struct reparse_data_buffer *buf, u32 plen, struct cifs_sb_info *cifs_sb, bool unicode, struct cifs_open_info_data *data) { + struct cifs_tcon *tcon = cifs_sb_master_tcon(cifs_sb); + data->reparse.buf = buf;
/* See MS-FSCC 2.1.2 */ @@ -394,12 +396,13 @@ int parse_reparse_point(struct reparse_data_buffer *buf, case IO_REPARSE_TAG_LX_FIFO: case IO_REPARSE_TAG_LX_CHR: case IO_REPARSE_TAG_LX_BLK: - return 0; + break; default: - cifs_dbg(VFS, "%s: unhandled reparse tag: 0x%08x\n", - __func__, le32_to_cpu(buf->ReparseTag)); - return -EOPNOTSUPP; + cifs_tcon_dbg(VFS | ONCE, "unhandled reparse tag: 0x%08x\n", + le32_to_cpu(buf->ReparseTag)); + break; } + return 0; }
int smb2_parse_reparse_point(struct cifs_sb_info *cifs_sb,
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ming Lei ming.lei@redhat.com
[ Upstream commit a54a93d0e3599b05856971734e15418ac551a14c ]
Commit 4733b65d82bd ("nvme: start keep-alive after admin queue setup") moves starting keep-alive from nvme_start_ctrl() into nvme_init_ctrl_finish(), but don't move stopping keep-alive into nvme_uninit_ctrl(), so keep-alive work can be started and keep pending after failing to start controller, finally use-after-free is triggered if nvme host driver is unloaded.
This patch fixes kernel panic when running nvme/004 in case that connection failure is triggered, by moving stopping keep-alive into nvme_uninit_ctrl().
This way is reasonable because keep-alive is now started in nvme_init_ctrl_finish().
Fixes: 3af755a46881 ("nvme: move nvme_stop_keep_alive() back to original position") Cc: Hannes Reinecke hare@suse.de Cc: Mark O'Donovan shiftee@posteo.net Reported-by: Changhui Zhong czhong@redhat.com Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Hannes Reinecke hare@suse.de Signed-off-by: Ming Lei ming.lei@redhat.com Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 782090ce0bc10..d973d063bbf50 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -4522,7 +4522,6 @@ void nvme_stop_ctrl(struct nvme_ctrl *ctrl) { nvme_mpath_stop(ctrl); nvme_auth_stop(ctrl); - nvme_stop_keep_alive(ctrl); nvme_stop_failfast_work(ctrl); flush_work(&ctrl->async_event_work); cancel_work_sync(&ctrl->fw_act_work); @@ -4558,6 +4557,7 @@ EXPORT_SYMBOL_GPL(nvme_start_ctrl);
void nvme_uninit_ctrl(struct nvme_ctrl *ctrl) { + nvme_stop_keep_alive(ctrl); nvme_hwmon_exit(ctrl); nvme_fault_inject_fini(&ctrl->fault_inject); dev_pm_qos_hide_latency_tolerance(ctrl->device);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Griffin Kroah-Hartman griffin@kroah.com
commit 538fd3921afac97158d4177139a0ad39f056dbb2 upstream.
hci_conn_params_add() never checks for a NULL value and could lead to a NULL pointer dereference causing a crash.
Fixed by adding error handling in the function.
Cc: Stable stable@kernel.org Fixes: 5157b8a503fa ("Bluetooth: Fix initializing conn_params in scan phase") Signed-off-by: Griffin Kroah-Hartman griffin@kroah.com Reported-by: Yiwei Zhang zhan4630@purdue.edu Signed-off-by: Luiz Augusto von Dentz luiz.von.dentz@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/bluetooth/mgmt.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/net/bluetooth/mgmt.c +++ b/net/bluetooth/mgmt.c @@ -3457,6 +3457,10 @@ static int pair_device(struct sock *sk, * will be kept and this function does nothing. */ p = hci_conn_params_add(hdev, &cp->addr.bdaddr, addr_type); + if (!p) { + err = -EIO; + goto unlock; + }
if (p->auto_connect == HCI_AUTO_CONN_EXPLICIT) p->auto_connect = HCI_AUTO_CONN_DISABLED;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chaotian Jing chaotian.jing@mediatek.com
commit f03e94f23b04c2b71c0044c1534921b3975ef10c upstream.
scsi_logical_block_count() should return the block count of a given SCSI command. The original implementation ended up shifting twice, leading to an incorrect count being returned. Fix the conversion between bytes and logical blocks.
Cc: stable@vger.kernel.org Fixes: 6a20e21ae1e2 ("scsi: core: Add helper to return number of logical blocks in a request") Signed-off-by: Chaotian Jing chaotian.jing@mediatek.com Link: https://lore.kernel.org/r/20240813053534.7720-1-chaotian.jing@mediatek.com Reviewed-by: Bart Van Assche bvanassche@acm.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/scsi/scsi_cmnd.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/scsi/scsi_cmnd.h +++ b/include/scsi/scsi_cmnd.h @@ -234,7 +234,7 @@ static inline sector_t scsi_get_lba(stru
static inline unsigned int scsi_logical_block_count(struct scsi_cmnd *scmd) { - unsigned int shift = ilog2(scmd->device->sector_size) - SECTOR_SHIFT; + unsigned int shift = ilog2(scmd->device->sector_size);
return blk_rq_bytes(scsi_cmd_to_rq(scmd)) >> shift; }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
commit ce61b605a00502c59311d0a4b1f58d62b48272d0 upstream.
When STATUS_NO_MORE_FILES status is set to smb2 query dir response, ->StructureSize is set to 9, which mean buffer has 1 byte. This issue occurs because ->Buffer[1] in smb2_query_directory_rsp to flex-array.
Fixes: eb3e28c1e89b ("smb3: Replace smb2pdu 1-element arrays with flex-arrays") Cc: stable@vger.kernel.org # v6.1+ Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -4406,7 +4406,8 @@ int smb2_query_dir(struct ksmbd_work *wo rsp->OutputBufferLength = cpu_to_le32(0); rsp->Buffer[0] = 0; rc = ksmbd_iov_pin_rsp(work, (void *)rsp, - sizeof(struct smb2_query_directory_rsp)); + offsetof(struct smb2_query_directory_rsp, Buffer) + + 1); if (rc) goto err_out; } else {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Candice Li candice.li@amd.com
commit c99769bceab4ecb6a067b9af11f9db281eea3e2a upstream.
Add TA binary size validation to avoid OOB write.
Signed-off-by: Candice Li candice.li@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit c0a04e3570d72aaf090962156ad085e37c62e442) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c @@ -166,6 +166,9 @@ static ssize_t ta_if_load_debugfs_write( if (ret) return -EFAULT;
+ if (ta_bin_len > PSP_1_MEG) + return -EINVAL; + copy_pos += sizeof(uint32_t);
ta_bin = kzalloc(ta_bin_len, GFP_KERNEL);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
commit e3e4bf58bad1576ac732a1429f53e3d4bfb82b4b upstream.
The workaround seems to cause stability issues on other SDMA 5.2.x IPs.
Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Acked-by: Ruijing Dong ruijing.dong@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com (cherry picked from commit 2dc3851ef7d9c5439ea8e9623fc36878f3b40649) Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 18 ++++++++++-------- 1 file changed, 10 insertions(+), 8 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c +++ b/drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c @@ -176,14 +176,16 @@ static void sdma_v5_2_ring_set_wptr(stru DRM_DEBUG("calling WDOORBELL64(0x%08x, 0x%016llx)\n", ring->doorbell_index, ring->wptr << 2); WDOORBELL64(ring->doorbell_index, ring->wptr << 2); - /* SDMA seems to miss doorbells sometimes when powergating kicks in. - * Updating the wptr directly will wake it. This is only safe because - * we disallow gfxoff in begin_use() and then allow it again in end_use(). - */ - WREG32(sdma_v5_2_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_WPTR), - lower_32_bits(ring->wptr << 2)); - WREG32(sdma_v5_2_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_WPTR_HI), - upper_32_bits(ring->wptr << 2)); + if (amdgpu_ip_version(adev, SDMA0_HWIP, 0) == IP_VERSION(5, 2, 1)) { + /* SDMA seems to miss doorbells sometimes when powergating kicks in. + * Updating the wptr directly will wake it. This is only safe because + * we disallow gfxoff in begin_use() and then allow it again in end_use(). + */ + WREG32(sdma_v5_2_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_WPTR), + lower_32_bits(ring->wptr << 2)); + WREG32(sdma_v5_2_get_reg_offset(adev, ring->me, mmSDMA0_GFX_RB_WPTR_HI), + upper_32_bits(ring->wptr << 2)); + } } else { DRM_DEBUG("Not using doorbell -- " "mmSDMA%i_GFX_RB_WPTR == 0x%08x "
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit cd8e468efb4fb2742e06328a75b282c35c1abf8d upstream.
Dell All In One (AIO) models released after 2017 use a backlight controller board connected to an UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501") Name (_CID, EisaId ("PNP0501")
Commit 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") has added support for this, but I neglected to tie this into acpi_video_get_backlight_type().
Now the first AIO has turned up which has not only the DSDT bits for this, but also an actual controller attached to the UART, yet it is not using this controller for backlight control.
Add support to acpi_video_get_backlight_type() for a new dell_uart backlight type. So that the existing infra to override the backlight control method on the commandline or with DMI quirks can be used.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Andy Shevchenko andy@kernel.org Link: https://patch.msgid.link/20240814190159.15650-2-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/video_detect.c | 7 +++++++ include/acpi/video.h | 1 + 2 files changed, 8 insertions(+)
--- a/drivers/acpi/video_detect.c +++ b/drivers/acpi/video_detect.c @@ -54,6 +54,8 @@ static void acpi_video_parse_cmdline(voi acpi_backlight_cmdline = acpi_backlight_nvidia_wmi_ec; if (!strcmp("apple_gmux", acpi_video_backlight_string)) acpi_backlight_cmdline = acpi_backlight_apple_gmux; + if (!strcmp("dell_uart", acpi_video_backlight_string)) + acpi_backlight_cmdline = acpi_backlight_dell_uart; if (!strcmp("none", acpi_video_backlight_string)) acpi_backlight_cmdline = acpi_backlight_none; } @@ -902,6 +904,7 @@ enum acpi_backlight_type __acpi_video_ge static DEFINE_MUTEX(init_mutex); static bool nvidia_wmi_ec_present; static bool apple_gmux_present; + static bool dell_uart_present; static bool native_available; static bool init_done; static long video_caps; @@ -916,6 +919,7 @@ enum acpi_backlight_type __acpi_video_ge &video_caps, NULL); nvidia_wmi_ec_present = nvidia_wmi_ec_supported(); apple_gmux_present = apple_gmux_detect(NULL, NULL); + dell_uart_present = acpi_dev_present("DELL0501", NULL, -1); init_done = true; } if (native) @@ -946,6 +950,9 @@ enum acpi_backlight_type __acpi_video_ge if (apple_gmux_present) return acpi_backlight_apple_gmux;
+ if (dell_uart_present) + return acpi_backlight_dell_uart; + /* Use ACPI video if available, except when native should be preferred. */ if ((video_caps & ACPI_VIDEO_BACKLIGHT) && !(native_available && prefer_native_over_acpi_video())) --- a/include/acpi/video.h +++ b/include/acpi/video.h @@ -50,6 +50,7 @@ enum acpi_backlight_type { acpi_backlight_native, acpi_backlight_nvidia_wmi_ec, acpi_backlight_apple_gmux, + acpi_backlight_dell_uart, };
#if IS_ENABLED(CONFIG_ACPI_VIDEO)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit 5c7bb62cb8f53de71d8ab3d619be22740da0b837 upstream.
Dell All In One (AIO) models released after 2017 may use a backlight controller board connected to an UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501") Name (_CID, EisaId ("PNP0501")
The Dell OptiPlex 7760 AIO has an ACPI device for one if its UARTs with the above _HID + _CID. Loading the dell-uart-backlight driver shows that there actually is a backlight controller board attached to the UART, which reports a firmware version of "G&MX01-V15".
But the backlight controller board does not actually control the backlight brightness and the GPU's native backlight control method does work.
Add a quirk to use the GPU's native backlight control method on this model.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Closes: https://bugzilla.redhat.com/show_bug.cgi?id=2303936 Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Andy Shevchenko andy@kernel.org Link: https://patch.msgid.link/20240814190159.15650-4-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/video_detect.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
--- a/drivers/acpi/video_detect.c +++ b/drivers/acpi/video_detect.c @@ -808,6 +808,21 @@ static const struct dmi_system_id video_ },
/* + * Dell AIO (All in Ones) which advertise an UART attached backlight + * controller board in their ACPI tables (and may even have one), but + * which need native backlight control nevertheless. + */ + { + /* https://bugzilla.redhat.com/show_bug.cgi?id=2303936 */ + .callback = video_detect_force_native, + /* Dell OptiPlex 7760 AIO */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."), + DMI_MATCH(DMI_PRODUCT_NAME, "OptiPlex 7760 AIO"), + }, + }, + + /* * Models which have nvidia-ec-wmi support, but should not use it. * Note this indicates a likely firmware bug on these models and should * be revisited if/when Linux gets support for dynamic mux mode.
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hans de Goede hdegoede@redhat.com
commit b5f0943001339c4d324a1af10470ce0bdd79f966 upstream.
The dell-uart-backlight driver supports backlight control on Dell All In One (AIO) models using a backlight controller board connected to an UART.
In DSDT this uart port will be defined as:
Name (_HID, "DELL0501") Name (_CID, EisaId ("PNP0501")
Now the first AIO has turned up which has not only the DSDT bits for this, but also an actual controller attached to the UART, yet it is not using this controller for backlight control.
Use the acpi_video_get_backlight_type() function from the ACPI video-detect code to check if the dell-uart-backlight driver should actually be used. This allows reusing the existing ACPI video-detect infra to override the backlight control method on the commandline or with DMI quirks.
Fixes: 484bae9e4d6a ("platform/x86: Add new Dell UART backlight driver") Cc: All applicable stable@vger.kernel.org Signed-off-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Andy Shevchenko andy@kernel.org Link: https://patch.msgid.link/20240814190159.15650-3-hdegoede@redhat.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/dell/Kconfig | 1 + drivers/platform/x86/dell/dell-uart-backlight.c | 8 ++++++++ 2 files changed, 9 insertions(+)
--- a/drivers/platform/x86/dell/Kconfig +++ b/drivers/platform/x86/dell/Kconfig @@ -148,6 +148,7 @@ config DELL_SMO8800 config DELL_UART_BACKLIGHT tristate "Dell AIO UART Backlight driver" depends on ACPI + depends on ACPI_VIDEO depends on BACKLIGHT_CLASS_DEVICE depends on SERIAL_DEV_BUS help --- a/drivers/platform/x86/dell/dell-uart-backlight.c +++ b/drivers/platform/x86/dell/dell-uart-backlight.c @@ -20,6 +20,7 @@ #include <linux/string.h> #include <linux/types.h> #include <linux/wait.h> +#include <acpi/video.h> #include "../serdev_helpers.h"
/* The backlight controller must respond within 1 second */ @@ -332,10 +333,17 @@ struct serdev_device_driver dell_uart_bl
static int dell_uart_bl_pdev_probe(struct platform_device *pdev) { + enum acpi_backlight_type bl_type; struct serdev_device *serdev; struct device *ctrl_dev; int ret;
+ bl_type = acpi_video_get_backlight_type(); + if (bl_type != acpi_backlight_dell_uart) { + dev_dbg(&pdev->dev, "Not loading (ACPI backlight type = %d)\n", bl_type); + return -ENODEV; + } + ctrl_dev = get_serdev_controller("DELL0501", NULL, 0, "serial0"); if (IS_ERR(ctrl_dev)) return PTR_ERR(ctrl_dev);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
commit 46ee21e9f59205e54943dfe51b2dc8a9352ca37d upstream.
When only the last resource is invalid, tpmi_sst_dev_add() is returing error even if there are other valid resources before. This function should return error when there are no valid resources.
Here tpmi_sst_dev_add() is returning "ret" variable. But this "ret" variable contains the failure status of last call to sst_main(), which failed for the invalid resource. But there may be other valid resources before the last entry.
To address this, do not update "ret" variable for sst_main() return status.
If there are no valid resources, it is already checked for by !inst below the loop and -ENODEV is returned.
Fixes: 9d1d36268f3d ("platform/x86: ISST: Support partitioned systems") Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Cc: stable@vger.kernel.org # 6.10+ Link: https://lore.kernel.org/r/20240816163626.415762-1-srinivas.pandruvada@linux.... Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c +++ b/drivers/platform/x86/intel/speed_select_if/isst_tpmi_core.c @@ -1549,8 +1549,7 @@ int tpmi_sst_dev_add(struct auxiliary_de goto unlock_free; }
- ret = sst_main(auxdev, &pd_info[i]); - if (ret) { + if (sst_main(auxdev, &pd_info[i])) { /* * This entry is not valid, hardware can partially * populate dies. In this case MMIO will have 0xFFs.
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Harald Freudenberger freude@linux.ibm.com
commit b4f5bd60d558f6ba451d7e76aa05782c07a182a3 upstream.
With the rework of the AP bus scan and the introduction of a bindings complete completion also the timing until the userspace finally receives a AP bus binding complete uevent had increased. Unfortunately this event triggers some important jobs for preparation of KVM guests, for example the modification of card/queue masks to reassign AP resources to the alternate AP queue device driver (vfio_ap) which is the precondition for building mediated devices which may be a precondition for starting KVM guests using AP resources.
This small fix now triggers the check for binding complete each time an AP device driver has registered. With this patch the bindings complete may be posted up to 30s earlier as there is no need to wait for the next AP bus scan any more.
Fixes: 778412ab915d ("s390/ap: rearm APQNs bindings complete completion") Signed-off-by: Harald Freudenberger freude@linux.ibm.com Reviewed-by: Holger Dengler dengler@linux.ibm.com Cc: stable@vger.kernel.org Acked-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Vasily Gorbik gor@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/ap_bus.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/s390/crypto/ap_bus.c b/drivers/s390/crypto/ap_bus.c index 0998b17ecb37..f9f682f19415 100644 --- a/drivers/s390/crypto/ap_bus.c +++ b/drivers/s390/crypto/ap_bus.c @@ -971,11 +971,16 @@ int ap_driver_register(struct ap_driver *ap_drv, struct module *owner, char *name) { struct device_driver *drv = &ap_drv->driver; + int rc;
drv->bus = &ap_bus_type; drv->owner = owner; drv->name = name; - return driver_register(drv); + rc = driver_register(drv); + + ap_check_bindings_complete(); + + return rc; } EXPORT_SYMBOL(ap_driver_register);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mengyuan Lou mengyuanlou@net-swift.com
commit f2916c83d746eb99f50f42c15cf4c47c2ea5f3b3 upstream.
The MAC only has add the TX delay and it can not be modified. MAC and PHY are both set the TX delay cause transmission problems. So just disable TX delay in PHY, when use rgmii to attach to external phy, set PHY_INTERFACE_MODE_RGMII_RXID to phy drivers. And it is does not matter to internal phy.
Fixes: bc2426d74aa3 ("net: ngbe: convert phylib to phylink") Signed-off-by: Mengyuan Lou mengyuanlou@net-swift.com Cc: stable@vger.kernel.org # 6.3+ Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://patch.msgid.link/E6759CF1387CF84C+20240820030425.93003-1-mengyuanlou... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c +++ b/drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c @@ -124,8 +124,12 @@ static int ngbe_phylink_init(struct wx * MAC_SYM_PAUSE | MAC_ASYM_PAUSE; config->mac_managed_pm = true;
- phy_mode = PHY_INTERFACE_MODE_RGMII_ID; - __set_bit(PHY_INTERFACE_MODE_RGMII_ID, config->supported_interfaces); + /* The MAC only has add the Tx delay and it can not be modified. + * So just disable TX delay in PHY, and it is does not matter to + * internal phy. + */ + phy_mode = PHY_INTERFACE_MODE_RGMII_RXID; + __set_bit(PHY_INTERFACE_MODE_RGMII_RXID, config->supported_interfaces);
phylink = phylink_create(config, NULL, phy_mode, &ngbe_mac_ops); if (IS_ERR(phylink))
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin Whitaker foss@martin-whitaker.me.uk
commit 6efea5135417ae8194485d1d05ea79a21cf1a11c upstream.
When performing the port_hwtstamp_set operation, ptp_schedule_worker() will be called if hardware timestamoing is enabled on any of the ports. When using multiple ports for PTP, port_hwtstamp_set is executed for each port. When called for the first time ptp_schedule_worker() returns 0. On subsequent calls it returns 1, indicating the worker is already scheduled. Currently the ksz driver treats 1 as an error and fails to complete the port_hwtstamp_set operation, thus leaving the timestamping configuration for those ports unchanged.
This patch fixes this by ignoring the ptp_schedule_worker() return value.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/7aae307a-35ca-4209-a850-7b2749d40f90@martin-whitaker... Fixes: bb01ad30570b0 ("net: dsa: microchip: ptp: manipulating absolute time using ptp hw clock") Signed-off-by: Martin Whitaker foss@martin-whitaker.me.uk Reviewed-by: Andrew Lunn andrew@lunn.ch Acked-by: Arun Ramadoss arun.ramadoss@microchip.com Link: https://patch.msgid.link/20240817094141.3332-1-foss@martin-whitaker.me.uk Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/dsa/microchip/ksz_ptp.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-)
--- a/drivers/net/dsa/microchip/ksz_ptp.c +++ b/drivers/net/dsa/microchip/ksz_ptp.c @@ -266,7 +266,6 @@ static int ksz_ptp_enable_mode(struct ks struct ksz_port *prt; struct dsa_port *dp; bool tag_en = false; - int ret;
dsa_switch_for_each_user_port(dp, dev->ds) { prt = &dev->ports[dp->index]; @@ -277,9 +276,7 @@ static int ksz_ptp_enable_mode(struct ks }
if (tag_en) { - ret = ptp_schedule_worker(ptp_data->clock, 0); - if (ret) - return ret; + ptp_schedule_worker(ptp_data->clock, 0); } else { ptp_cancel_worker_sync(ptp_data->clock); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiaxun Yang jiaxun.yang@flygoat.com
commit 1cb6ab446424649f03c82334634360c2e3043684 upstream.
Loongson64 C and G processors have EXTIMER feature which is conflicting with CP0 counter.
Although the processor resets in EXTIMER disabled & INTIMER enabled mode, which is compatible with MIPS CP0 compare, firmware may attempt to enable EXTIMER and interfere CP0 compare.
Set timer mode back to MIPS compatible mode to fix booting on systems with such firmware before we have an actual driver for EXTIMER.
Cc: stable@vger.kernel.org Signed-off-by: Jiaxun Yang jiaxun.yang@flygoat.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/kernel/cpu-probe.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/arch/mips/kernel/cpu-probe.c +++ b/arch/mips/kernel/cpu-probe.c @@ -1724,12 +1724,16 @@ static inline void cpu_probe_loongson(st c->ases |= (MIPS_ASE_LOONGSON_MMI | MIPS_ASE_LOONGSON_CAM | MIPS_ASE_LOONGSON_EXT | MIPS_ASE_LOONGSON_EXT2); c->ases &= ~MIPS_ASE_VZ; /* VZ of Loongson-3A2000/3000 is incomplete */ + change_c0_config6(LOONGSON_CONF6_EXTIMER | LOONGSON_CONF6_INTIMER, + LOONGSON_CONF6_INTIMER); break; case PRID_IMP_LOONGSON_64G: __cpu_name[cpu] = "ICT Loongson-3"; set_elf_platform(cpu, "loongson3a"); set_isa(c, MIPS_CPU_ISA_M64R2); decode_cpucfg(c); + change_c0_config6(LOONGSON_CONF6_EXTIMER | LOONGSON_CONF6_INTIMER, + LOONGSON_CONF6_INTIMER); break; default: panic("Unknown Loongson Processor ID!");
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jason Gerecke jason.gerecke@wacom.com
commit 1b8f9c1fb464968a5b18d3acc1da8c00bad24fad upstream.
The Wacom driver maps the HID_DG_TWIST usage to ABS_Z (rather than ABS_RZ) for historic reasons. When the code to support twist was introduced in commit 50066a042da5 ("HID: wacom: generic: Add support for height, tilt, and twist usages"), we were careful to write it in such a way that it had HID calculate the resolution of the twist axis assuming ABS_RZ instead (so that we would get correct angular behavior). This was broken with the introduction of commit 08a46b4190d3 ("HID: wacom: Set a default resolution for older tablets"), which moved the resolution calculation to occur *before* the adjustment from ABS_Z to ABS_RZ occurred.
This commit moves the calculation of resolution after the point that we are finished setting things up for its proper use.
Signed-off-by: Jason Gerecke jason.gerecke@wacom.com Fixes: 08a46b4190d3 ("HID: wacom: Set a default resolution for older tablets") Cc: stable@vger.kernel.org Signed-off-by: Jiri Kosina jkosina@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/hid/wacom_wac.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/hid/wacom_wac.c +++ b/drivers/hid/wacom_wac.c @@ -1924,12 +1924,14 @@ static void wacom_map_usage(struct input int fmax = field->logical_maximum; unsigned int equivalent_usage = wacom_equivalent_usage(usage->hid); int resolution_code = code; - int resolution = hidinput_calc_abs_res(field, resolution_code); + int resolution;
if (equivalent_usage == HID_DG_TWIST) { resolution_code = ABS_RZ; }
+ resolution = hidinput_calc_abs_res(field, resolution_code); + if (equivalent_usage == HID_GD_X) { fmin += features->offset_left; fmax -= features->offset_right;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicolin Chen nicolinc@nvidia.com
commit 950aeefb34923fe3c28ade35fe05f24e2c5b1d55 upstream.
The rewind routine should remove the reserved iovas added to the new hwpt.
Fixes: 89db31635c87 ("iommufd: Derive iommufd_hwpt_paging from iommufd_hw_pagetable") Cc: stable@vger.kernel.org Link: https://patch.msgid.link/r/20240718050130.1956804-1-nicolinc@nvidia.com Signed-off-by: Nicolin Chen nicolinc@nvidia.com Reviewed-by: Kevin Tian kevin.tian@intel.com Signed-off-by: Jason Gunthorpe jgg@nvidia.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/iommufd/device.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/iommu/iommufd/device.c +++ b/drivers/iommu/iommufd/device.c @@ -525,7 +525,7 @@ iommufd_device_do_replace(struct iommufd err_unresv: if (hwpt_is_paging(hwpt)) iommufd_group_remove_reserved_iova(igroup, - to_hwpt_paging(old_hwpt)); + to_hwpt_paging(hwpt)); err_unlock: mutex_unlock(&idev->igroup->lock); return ERR_PTR(rc);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Sembach wse@tuxedocomputers.com
commit 3d765ae2daccc570b3f4fbcb57eb321b12cdded2 upstream.
On s3 resume the i8042 driver tries to restore the controller to a known state by reinitializing things, however this can confuse the controller with different effects. Mostly occasionally unresponsive keyboards after resume.
These issues do not rise on s0ix resume as here the controller is assumed to preserved its state from before suspend.
This patch adds a quirk for devices where the reinitialization on s3 resume is not needed and might be harmful as described above. It does this by using the s0ix resume code path at selected locations.
This new quirk goes beyond what the preexisting reset=never quirk does, which only skips some reinitialization steps.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240104183118.779778-2-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/serio/i8042-acpipnpio.h | 10 +++++++--- drivers/input/serio/i8042.c | 10 +++++++--- 2 files changed, 14 insertions(+), 6 deletions(-)
--- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -83,6 +83,7 @@ static inline void i8042_write_command(i #define SERIO_QUIRK_KBDRESET BIT(12) #define SERIO_QUIRK_DRITEK BIT(13) #define SERIO_QUIRK_NOPNP BIT(14) +#define SERIO_QUIRK_FORCENORESTORE BIT(15)
/* Quirk table for different mainboards. Options similar or identical to i8042 * module parameters. @@ -1685,6 +1686,8 @@ static void __init i8042_check_quirks(vo if (quirks & SERIO_QUIRK_NOPNP) i8042_nopnp = true; #endif + if (quirks & SERIO_QUIRK_FORCENORESTORE) + i8042_forcenorestore = true; } #else static inline void i8042_check_quirks(void) {} @@ -1718,7 +1721,7 @@ static int __init i8042_platform_init(vo
i8042_check_quirks();
- pr_debug("Active quirks (empty means none):%s%s%s%s%s%s%s%s%s%s%s%s%s\n", + pr_debug("Active quirks (empty means none):%s%s%s%s%s%s%s%s%s%s%s%s%s%s\n", i8042_nokbd ? " nokbd" : "", i8042_noaux ? " noaux" : "", i8042_nomux ? " nomux" : "", @@ -1738,10 +1741,11 @@ static int __init i8042_platform_init(vo "", #endif #ifdef CONFIG_PNP - i8042_nopnp ? " nopnp" : ""); + i8042_nopnp ? " nopnp" : "", #else - ""); + "", #endif + i8042_forcenorestore ? " forcenorestore" : "");
retval = i8042_pnp_init(); if (retval) --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -115,6 +115,10 @@ module_param_named(nopnp, i8042_nopnp, b MODULE_PARM_DESC(nopnp, "Do not use PNP to detect controller settings"); #endif
+static bool i8042_forcenorestore; +module_param_named(forcenorestore, i8042_forcenorestore, bool, 0); +MODULE_PARM_DESC(forcenorestore, "Force no restore on s3 resume, copying s2idle behaviour"); + #define DEBUG #ifdef DEBUG static bool i8042_debug; @@ -1232,7 +1236,7 @@ static int i8042_pm_suspend(struct devic { int i;
- if (pm_suspend_via_firmware()) + if (!i8042_forcenorestore && pm_suspend_via_firmware()) i8042_controller_reset(true);
/* Set up serio interrupts for system wakeup. */ @@ -1248,7 +1252,7 @@ static int i8042_pm_suspend(struct devic
static int i8042_pm_resume_noirq(struct device *dev) { - if (!pm_resume_via_firmware()) + if (i8042_forcenorestore || !pm_resume_via_firmware()) i8042_interrupt(0, NULL);
return 0; @@ -1271,7 +1275,7 @@ static int i8042_pm_resume(struct device * not restore the controller state to whatever it had been at boot * time, so we do not need to do anything. */ - if (!pm_suspend_via_firmware()) + if (i8042_forcenorestore || !pm_suspend_via_firmware()) return 0;
/*
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Werner Sembach wse@tuxedocomputers.com
commit aaa4ca873d3da768896ffc909795359a01e853ef upstream.
The old quirk combination sometimes cause a laggy keyboard after boot. With the new quirk the initial issue of an unresponsive keyboard after s3 resume is also fixed, but it doesn't have the negative side effect of the sometimes laggy keyboard.
Signed-off-by: Werner Sembach wse@tuxedocomputers.com Cc: stable@vger.kernel.org Reviewed-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20240104183118.779778-3-wse@tuxedocomputers.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/serio/i8042-acpipnpio.h | 10 +--------- 1 file changed, 1 insertion(+), 9 deletions(-)
--- a/drivers/input/serio/i8042-acpipnpio.h +++ b/drivers/input/serio/i8042-acpipnpio.h @@ -1150,18 +1150,10 @@ static const struct dmi_system_id i8042_ SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) }, { - /* - * Setting SERIO_QUIRK_NOMUX or SERIO_QUIRK_RESET_ALWAYS makes - * the keyboard very laggy for ~5 seconds after boot and - * sometimes also after resume. - * However both are required for the keyboard to not fail - * completely sometimes after boot or resume. - */ .matches = { DMI_MATCH(DMI_BOARD_NAME, "N150CU"), }, - .driver_data = (void *)(SERIO_QUIRK_NOMUX | SERIO_QUIRK_RESET_ALWAYS | - SERIO_QUIRK_NOLOOP | SERIO_QUIRK_NOPNP) + .driver_data = (void *)(SERIO_QUIRK_FORCENORESTORE) }, { .matches = {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Ellerman mpe@ellerman.id.au
commit 822c8020aebcf5804a143b891e34f29873fee5e2 upstream.
Kolbjørn and Jonáš reported that their 32-bit PowerMacs were crashing in pata-macio since commit 09fe2bfa6b83 ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K").
For example:
kernel BUG at drivers/ata/pata_macio.c:544! Oops: Exception in kernel mode, sig: 5 [#1] BE PAGE_SIZE=4K MMU=Hash SMP NR_CPUS=2 DEBUG_PAGEALLOC PowerMac ... NIP pata_macio_qc_prep+0xf4/0x190 LR pata_macio_qc_prep+0xfc/0x190 Call Trace: 0xc1421660 (unreliable) ata_qc_issue+0x14c/0x2d4 __ata_scsi_queuecmd+0x200/0x53c ata_scsi_queuecmd+0x50/0xe0 scsi_queue_rq+0x788/0xb1c __blk_mq_issue_directly+0x58/0xf4 blk_mq_plug_issue_direct+0x8c/0x1b4 blk_mq_flush_plug_list.part.0+0x584/0x5e0 __blk_flush_plug+0xf8/0x194 __submit_bio+0x1b8/0x2e0 submit_bio_noacct_nocheck+0x230/0x304 btrfs_work_helper+0x200/0x338 process_one_work+0x1a8/0x338 worker_thread+0x364/0x4c0 kthread+0x100/0x104 start_kernel_thread+0x10/0x14
That commit increased max_segment_size to 64KB, with the justification that the SCSI core was already using that size when PAGE_SIZE == 64KB, and that there was existing logic to split over-sized requests.
However with a sufficiently large request, the splitting logic causes each sg to be split into two commands in the DMA table, leading to overflow of the DMA table, triggering the BUG_ON().
With default settings the bug doesn't trigger, because the request size is limited by max_sectors_kb == 1280, however max_sectors_kb can be increased, and apparently some distros do that by default using udev rules.
Fix the bug for 4KB kernels by reverting to the old max_segment_size.
For 64KB kernels the sg_tablesize needs to be halved, to allow for the possibility that each sg will be split into two.
Fixes: 09fe2bfa6b83 ("ata: pata_macio: Fix max_segment_size with PAGE_SIZE == 64K") Cc: stable@vger.kernel.org # v6.10+ Reported-by: Kolbjørn Barmen linux-ppc@kolla.no Closes: https://lore.kernel.org/all/62d248bb-e97a-25d2-bcf2-9160c518cae5@kolla.no/ Reported-by: Jonáš Vidra vidra@ufal.mff.cuni.cz Closes: https://lore.kernel.org/all/3b6441b8-06e6-45da-9e55-f92f2c86933e@ufal.mff.cu... Tested-by: Kolbjørn Barmen linux-ppc@kolla.no Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/ata/pata_macio.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-)
diff --git a/drivers/ata/pata_macio.c b/drivers/ata/pata_macio.c index 1b85e8bf4ef9..1cb8d24b088f 100644 --- a/drivers/ata/pata_macio.c +++ b/drivers/ata/pata_macio.c @@ -208,6 +208,19 @@ static const char* macio_ata_names[] = { /* Don't let a DMA segment go all the way to 64K */ #define MAX_DBDMA_SEG 0xff00
+#ifdef CONFIG_PAGE_SIZE_64KB +/* + * The SCSI core requires the segment size to cover at least a page, so + * for 64K page size kernels it must be at least 64K. However the + * hardware can't handle 64K, so pata_macio_qc_prep() will split large + * requests. To handle the split requests the tablesize must be halved. + */ +#define PATA_MACIO_MAX_SEGMENT_SIZE SZ_64K +#define PATA_MACIO_SG_TABLESIZE (MAX_DCMDS / 2) +#else +#define PATA_MACIO_MAX_SEGMENT_SIZE MAX_DBDMA_SEG +#define PATA_MACIO_SG_TABLESIZE MAX_DCMDS +#endif
/* * Wait 1s for disk to answer on IDE bus after a hard reset @@ -912,16 +925,10 @@ static int pata_macio_do_resume(struct pata_macio_priv *priv)
static const struct scsi_host_template pata_macio_sht = { __ATA_BASE_SHT(DRV_NAME), - .sg_tablesize = MAX_DCMDS, + .sg_tablesize = PATA_MACIO_SG_TABLESIZE, /* We may not need that strict one */ .dma_boundary = ATA_DMA_BOUNDARY, - /* - * The SCSI core requires the segment size to cover at least a page, so - * for 64K page size kernels this must be at least 64K. However the - * hardware can't handle 64K, so pata_macio_qc_prep() will split large - * requests. - */ - .max_segment_size = SZ_64K, + .max_segment_size = PATA_MACIO_MAX_SEGMENT_SIZE, .device_configure = pata_macio_device_configure, .sdev_groups = ata_common_sdev_groups, .can_queue = ATA_DEF_QUEUE,
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nikolay Kuratov kniv@yandex-team.ru
commit 80a1e7b83bb1834b5568a3872e64c05795d88f31 upstream.
It is done everywhere in cxgb4 code, e.g. in is_filter_exact_match() There is no reason it should not be done here
Found by Linux Verification Center (linuxtesting.org) with SVACE
Signed-off-by: Nikolay Kuratov kniv@yandex-team.ru Cc: stable@vger.kernel.org Fixes: 12b276fbf6e0 ("cxgb4: add support to create hash filters") Reviewed-by: Simon Horman horms@kernel.org Reviewed-by: Jacob Keller jacob.e.keller@intel.com Link: https://patch.msgid.link/20240819075408.92378-1-kniv@yandex-team.ru Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c +++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c @@ -1244,7 +1244,8 @@ static u64 hash_filter_ntuple(struct ch_ * in the Compressed Filter Tuple. */ if (tp->vlan_shift >= 0 && fs->mask.ivlan) - ntuple |= (FT_VLAN_VLD_F | fs->val.ivlan) << tp->vlan_shift; + ntuple |= (u64)(FT_VLAN_VLD_F | + fs->val.ivlan) << tp->vlan_shift;
if (tp->port_shift >= 0 && fs->mask.iport) ntuple |= (u64)fs->val.iport << tp->port_shift;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zenghui Yu yuzenghui@huawei.com
commit 2240a50e6294214de791729e9dcba6880fa7e44e upstream.
If there were LPIs being mapped behind our back (i.e., between .start() and .stop()), we would put them at iter_unmark_lpis() without checking if they were actually *marked*, which is obviously not good.
Switch to use the xa_for_each_marked() iterator to fix it.
Cc: stable@vger.kernel.org Fixes: 85d3ccc8b75b ("KVM: arm64: vgic-debug: Use an xarray mark for debug iterator") Signed-off-by: Zenghui Yu yuzenghui@huawei.com Reviewed-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/20240817101541.1664-1-yuzenghui@huawei.com Signed-off-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/vgic/vgic-debug.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/kvm/vgic/vgic-debug.c +++ b/arch/arm64/kvm/vgic/vgic-debug.c @@ -84,7 +84,7 @@ static void iter_unmark_lpis(struct kvm struct vgic_irq *irq; unsigned long intid;
- xa_for_each(&dist->lpi_xa, intid, irq) { + xa_for_each_marked(&dist->lpi_xa, intid, irq, LPI_XA_MARK_DEBUG_ITER) { xa_clear_mark(&dist->lpi_xa, intid, LPI_XA_MARK_DEBUG_ITER); vgic_put_irq(kvm, irq); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marc Zyngier maz@kernel.org
commit 3e6245ebe7ef341639e9a7e402b3ade8ad45a19f upstream.
On a system with a GICv3, if a guest hasn't been configured with GICv3 and that the host is not capable of GICv2 emulation, a write to any of the ICC_*SGI*_EL1 registers is trapped to EL2.
We therefore try to emulate the SGI access, only to hit a NULL pointer as no private interrupt is allocated (no GIC, remember?).
The obvious fix is to give the guest what it deserves, in the shape of a UNDEF exception.
Reported-by: Alexander Potapenko glider@google.com Signed-off-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240820100349.3544850-2-maz@kernel.org Signed-off-by: Oliver Upton oliver.upton@linux.dev Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kvm/sys_regs.c | 6 ++++++ arch/arm64/kvm/vgic/vgic.h | 7 +++++++ 2 files changed, 13 insertions(+)
--- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -33,6 +33,7 @@ #include <trace/events/kvm.h>
#include "sys_regs.h" +#include "vgic/vgic.h"
#include "trace.h"
@@ -428,6 +429,11 @@ static bool access_gic_sgi(struct kvm_vc { bool g1;
+ if (!kvm_has_gicv3(vcpu->kvm)) { + kvm_inject_undefined(vcpu); + return false; + } + if (!p->is_write) return read_from_write_only(vcpu, p, r);
--- a/arch/arm64/kvm/vgic/vgic.h +++ b/arch/arm64/kvm/vgic/vgic.h @@ -346,4 +346,11 @@ void vgic_v4_configure_vsgis(struct kvm void vgic_v4_get_vlpi_state(struct vgic_irq *irq, bool *val); int vgic_v4_request_vpe_irq(struct kvm_vcpu *vcpu, int irq);
+static inline bool kvm_has_gicv3(struct kvm *kvm) +{ + return (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif) && + irqchip_in_kernel(kvm) && + kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3); +} + #endif
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chen Ridong chenridong@huawei.com
commit 959ab6350add903e352890af53e86663739fcb9a upstream.
We find a bug as below: BUG: unable to handle page fault for address: 00000003 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 358 Comm: bash Tainted: G W I 6.6.0-10893-g60d6 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/4 RIP: 0010:partition_sched_domains_locked+0x483/0x600 Code: 01 48 85 d2 74 0d 48 83 05 29 3f f8 03 01 f3 48 0f bc c2 89 c0 48 9 RSP: 0018:ffffc90000fdbc58 EFLAGS: 00000202 RAX: 0000000100000003 RBX: ffff888100b3dfa0 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000002fe80 RBP: ffff888100b3dfb0 R08: 0000000000000001 R09: 0000000000000000 R10: ffffc90000fdbcb0 R11: 0000000000000004 R12: 0000000000000002 R13: ffff888100a92b48 R14: 0000000000000000 R15: 0000000000000000 FS: 00007f44a5425740(0000) GS:ffff888237d80000(0000) knlGS:0000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000100030973 CR3: 000000010722c000 CR4: 00000000000006e0 Call Trace: <TASK> ? show_regs+0x8c/0xa0 ? __die_body+0x23/0xa0 ? __die+0x3a/0x50 ? page_fault_oops+0x1d2/0x5c0 ? partition_sched_domains_locked+0x483/0x600 ? search_module_extables+0x2a/0xb0 ? search_exception_tables+0x67/0x90 ? kernelmode_fixup_or_oops+0x144/0x1b0 ? __bad_area_nosemaphore+0x211/0x360 ? up_read+0x3b/0x50 ? bad_area_nosemaphore+0x1a/0x30 ? exc_page_fault+0x890/0xd90 ? __lock_acquire.constprop.0+0x24f/0x8d0 ? __lock_acquire.constprop.0+0x24f/0x8d0 ? asm_exc_page_fault+0x26/0x30 ? partition_sched_domains_locked+0x483/0x600 ? partition_sched_domains_locked+0xf0/0x600 rebuild_sched_domains_locked+0x806/0xdc0 update_partition_sd_lb+0x118/0x130 cpuset_write_resmask+0xffc/0x1420 cgroup_file_write+0xb2/0x290 kernfs_fop_write_iter+0x194/0x290 new_sync_write+0xeb/0x160 vfs_write+0x16f/0x1d0 ksys_write+0x81/0x180 __x64_sys_write+0x21/0x30 x64_sys_call+0x2f25/0x4630 do_syscall_64+0x44/0xb0 entry_SYSCALL_64_after_hwframe+0x78/0xe2 RIP: 0033:0x7f44a553c887
It can be reproduced with cammands: cd /sys/fs/cgroup/ mkdir test cd test/ echo +cpuset > ../cgroup.subtree_control echo root > cpuset.cpus.partition cat /sys/fs/cgroup/cpuset.cpus.effective 0-3 echo 0-3 > cpuset.cpus // taking away all cpus from root
This issue is caused by the incorrect rebuilding of scheduling domains. In this scenario, test/cpuset.cpus.partition should be an invalid root and should not trigger the rebuilding of scheduling domains. When calling update_parent_effective_cpumask with partcmd_update, if newmask is not null, it should recheck newmask whether there are cpus is available for parect/cs that has tasks.
Fixes: 0c7f293efc87 ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2") Cc: stable@vger.kernel.org # v6.7+ Signed-off-by: Chen Ridong chenridong@huawei.com Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -1940,6 +1940,8 @@ static int update_parent_effective_cpuma part_error = PERR_CPUSEMPTY; goto write_error; } + /* Check newmask again, whether cpus are available for parent/cs */ + nocpu |= tasks_nocpu_error(parent, cs, newmask);
/* * partcmd_update with newmask:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Waiman Long longman@redhat.com
commit 311a1bdc44a8e06024df4fd3392be0dfc8298655 upstream.
Commit e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2") adds a user writable cpuset.cpus.exclusive file for setting exclusive CPUs to be used for the creation of partitions. Since then effective_xcpus depends on both the cpuset.cpus and cpuset.cpus.exclusive setting. If cpuset.cpus.exclusive is set, effective_xcpus will depend only on cpuset.cpus.exclusive. When it is not set, effective_xcpus will be set according to the cpuset.cpus value when the cpuset becomes a valid partition root.
When cpuset.cpus is being cleared by the user, effective_xcpus should only be cleared when cpuset.cpus.exclusive is not set. However, that is not currently the case.
# cd /sys/fs/cgroup/ # mkdir test # echo +cpuset > cgroup.subtree_control # cd test # echo 3 > cpuset.cpus.exclusive # cat cpuset.cpus.exclusive.effective 3 # echo > cpuset.cpus # cat cpuset.cpus.exclusive.effective // was cleared
Fix it by clearing effective_xcpus only if cpuset.cpus.exclusive is not set.
Fixes: e2ffe502ba45 ("cgroup/cpuset: Add cpuset.cpus.exclusive for v2") Cc: stable@vger.kernel.org # v6.7+ Reported-by: Chen Ridong chenridong@huawei.com Signed-off-by: Waiman Long longman@redhat.com Signed-off-by: Tejun Heo tj@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/cgroup/cpuset.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/kernel/cgroup/cpuset.c +++ b/kernel/cgroup/cpuset.c @@ -2474,7 +2474,8 @@ static int update_cpumask(struct cpuset */ if (!*buf) { cpumask_clear(trialcs->cpus_allowed); - cpumask_clear(trialcs->effective_xcpus); + if (cpumask_empty(trialcs->exclusive_cpus)) + cpumask_clear(trialcs->effective_xcpus); } else { retval = cpulist_parse(buf, trialcs->cpus_allowed); if (retval < 0)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mengqi Zhang mengqi.zhang@mediatek.com
commit 9374ae912dbb1eed8139ed75fd2c0f1b30ca454d upstream.
When we use cmd8 as the tuning command in hs400 mode, the command response sent back by some eMMC devices cannot be correctly sampled by MTK eMMC controller at some weak sample timing. In this case, command timeout error may occur. So we must receive the following data to make sure the next cmd8 send correctly.
Signed-off-by: Mengqi Zhang mengqi.zhang@mediatek.com Fixes: c4ac38c6539b ("mmc: mtk-sd: Add HS400 online tuning support") Cc: stable@vger.stable.com Link: https://lore.kernel.org/r/20240716013704.10578-1-mengqi.zhang@mediatek.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mtk-sd.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/drivers/mmc/host/mtk-sd.c +++ b/drivers/mmc/host/mtk-sd.c @@ -1230,7 +1230,7 @@ static bool msdc_cmd_done(struct msdc_ho }
if (!sbc_error && !(events & MSDC_INT_CMDRDY)) { - if (events & MSDC_INT_CMDTMO || + if ((events & MSDC_INT_CMDTMO && !host->hs400_tuning) || (!mmc_op_tuning(cmd->opcode) && !host->hs400_tuning)) /* * should not clear fifo/interrupt as the tune data @@ -1323,9 +1323,9 @@ static void msdc_start_command(struct ms static void msdc_cmd_next(struct msdc_host *host, struct mmc_request *mrq, struct mmc_command *cmd) { - if ((cmd->error && - !(cmd->error == -EILSEQ && - (mmc_op_tuning(cmd->opcode) || host->hs400_tuning))) || + if ((cmd->error && !host->hs400_tuning && + !(cmd->error == -EILSEQ && + mmc_op_tuning(cmd->opcode))) || (mrq->sbc && mrq->sbc->error)) msdc_request_done(host, mrq); else if (cmd == mrq->sbc)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ben Whitten ben.whitten@gmail.com
commit 6275c7bc8dd07644ea8142a1773d826800f0f3f7 upstream.
Fix a race condition if the clock provider comes up after mmc is probed, this causes mmc to fail without retrying. When given the DEFER error from the clk source, pass it on up the chain.
Fixes: f90a0612f0e1 ("mmc: dw_mmc: lookup for optional biu and ciu clocks") Signed-off-by: Ben Whitten ben.whitten@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240811212212.123255-1-ben.whitten@gmail.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/dw_mmc.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/mmc/host/dw_mmc.c +++ b/drivers/mmc/host/dw_mmc.c @@ -3293,6 +3293,10 @@ int dw_mci_probe(struct dw_mci *host) host->biu_clk = devm_clk_get(host->dev, "biu"); if (IS_ERR(host->biu_clk)) { dev_dbg(host->dev, "biu clock not available\n"); + ret = PTR_ERR(host->biu_clk); + if (ret == -EPROBE_DEFER) + return ret; + } else { ret = clk_prepare_enable(host->biu_clk); if (ret) { @@ -3304,6 +3308,10 @@ int dw_mci_probe(struct dw_mci *host) host->ciu_clk = devm_clk_get(host->dev, "ciu"); if (IS_ERR(host->ciu_clk)) { dev_dbg(host->dev, "ciu clock not available\n"); + ret = PTR_ERR(host->ciu_clk); + if (ret == -EPROBE_DEFER) + goto err_clk_biu; + host->bus_hz = host->pdata->bus_hz; } else { ret = clk_prepare_enable(host->ciu_clk);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steve French stfrench@microsoft.com
commit e4be320eeca842a3d7648258ee3673f1755a5a59 upstream.
Mandatory locking is enforced for cached reads, which violates default posix semantics, and also it is enforced inconsistently. This affected recent versions of libreoffice, and can be demonstrated by opening a file twice from the same client, locking it from handle one and trying to read from it from handle two (which fails, returning EACCES).
There is already a mount option "forcemandatorylock" (which defaults to off), so with this change only when the user intentionally specifies "forcemandatorylock" on mount will we break posix semantics on read to a locked range (ie we will only fail in this case, if the user mounts with "forcemandatorylock").
An earlier patch fixed the write path.
Fixes: 85160e03a79e ("CIFS: Implement caching mechanism for mandatory brlocks") Cc: stable@vger.kernel.org Cc: Pavel Shilovsky piastryyy@gmail.com Reviewed-by: David Howells dhowells@redhat.com Reported-by: abartlet@samba.org Reported-by: Kevin Ottens kevin.ottens@enioka.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/file.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/fs/smb/client/file.c +++ b/fs/smb/client/file.c @@ -2912,9 +2912,7 @@ cifs_strict_readv(struct kiocb *iocb, st if (!CIFS_CACHE_READ(cinode)) return netfs_unbuffered_read_iter(iocb, to);
- if (cap_unix(tcon->ses) && - (CIFS_UNIX_FCNTL_CAP & le64_to_cpu(tcon->fsUnixInfo.Capability)) && - ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0)) { + if ((cifs_sb->mnt_cifs_flags & CIFS_MOUNT_NOPOSIXBRL) == 0) { if (iocb->ki_flags & IOCB_DIRECT) return netfs_unbuffered_read_iter(iocb, to); return netfs_buffered_read_iter(iocb, to);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Stein alexander.stein@ew.tq-group.com
commit 50359c9c3cb3e55e840e3485f5ee37da5b2b16b6 upstream.
These clocks are already added to the list. Remove the duplicates ones.
Fixes: a67d780720ff ("genpd: imx: scu-pd: add more PDs") Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240717080334.2210988-1-alexander.stein@ew.tq-gro... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/imx/scu-pd.c | 5 ----- 1 file changed, 5 deletions(-)
--- a/drivers/pmdomain/imx/scu-pd.c +++ b/drivers/pmdomain/imx/scu-pd.c @@ -223,11 +223,6 @@ static const struct imx_sc_pd_range imx8 { "lvds1-pwm", IMX_SC_R_LVDS_1_PWM_0, 1, false, 0 }, { "lvds1-lpi2c", IMX_SC_R_LVDS_1_I2C_0, 2, true, 0 },
- { "mipi1", IMX_SC_R_MIPI_1, 1, 0 }, - { "mipi1-pwm0", IMX_SC_R_MIPI_1_PWM_0, 1, 0 }, - { "mipi1-i2c", IMX_SC_R_MIPI_1_I2C_0, 2, 1 }, - { "lvds1", IMX_SC_R_LVDS_1, 1, 0 }, - /* DC SS */ { "dc0", IMX_SC_R_DC_0, 1, false, 0 }, { "dc0-pll", IMX_SC_R_DC_0_PLL_0, 2, true, 0 },
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Peng Fan peng.fan@nxp.com
commit 52dd070c62e4ae2b5e7411b920e3f7a64235ecfb upstream.
With "quiet" set in bootargs, there is power domain failure: "imx93_power_domain 44462400.power-domain: pd_off timeout: name: 44462400.power-domain, stat: 4"
The current power on opertation takes ISO state as power on finished flag, but it is wrong. Before powering on operation really finishes, powering off comes and powering off will never finish because the last powering on still not finishes, so the following powering off actually not trigger hardware state machine to run. SSAR is the last step when powering on a domain, so need to wait SSAR done when powering on.
Since EdgeLock Enclave(ELE) handshake is involved in the flow, enlarge the waiting time to 10ms for both on and off to avoid timeout.
Cc: stable@vger.kernel.org Fixes: 0a0f7cc25d4a ("soc: imx: add i.MX93 SRC power domain driver") Reviewed-by: Jacky Bai ping.bai@nxp.com Signed-off-by: Peng Fan peng.fan@nxp.com Link: https://lore.kernel.org/r/20240814124740.2778952-1-peng.fan@oss.nxp.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pmdomain/imx/imx93-pd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/pmdomain/imx/imx93-pd.c +++ b/drivers/pmdomain/imx/imx93-pd.c @@ -20,6 +20,7 @@ #define FUNC_STAT_PSW_STAT_MASK BIT(0) #define FUNC_STAT_RST_STAT_MASK BIT(2) #define FUNC_STAT_ISO_STAT_MASK BIT(4) +#define FUNC_STAT_SSAR_STAT_MASK BIT(8)
struct imx93_power_domain { struct generic_pm_domain genpd; @@ -50,7 +51,7 @@ static int imx93_pd_on(struct generic_pm writel(val, addr + MIX_SLICE_SW_CTRL_OFF);
ret = readl_poll_timeout(addr + MIX_FUNC_STAT_OFF, val, - !(val & FUNC_STAT_ISO_STAT_MASK), 1, 10000); + !(val & FUNC_STAT_SSAR_STAT_MASK), 1, 10000); if (ret) { dev_err(domain->dev, "pd_on timeout: name: %s, stat: %x\n", genpd->name, val); return ret; @@ -72,7 +73,7 @@ static int imx93_pd_off(struct generic_p writel(val, addr + MIX_SLICE_SW_CTRL_OFF);
ret = readl_poll_timeout(addr + MIX_FUNC_STAT_OFF, val, - val & FUNC_STAT_PSW_STAT_MASK, 1, 1000); + val & FUNC_STAT_PSW_STAT_MASK, 1, 10000); if (ret) { dev_err(domain->dev, "pd_off timeout: name: %s, stat: %x\n", genpd->name, val); return ret;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
commit 9b340aeb26d50e9a9ec99599e2a39b035fac978e upstream.
Currently, enabling SG_DEBUG in the kernel will cause nouveau to hit a BUG() on startup, when the iommu is enabled:
kernel BUG at include/linux/scatterlist.h:187! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 7 PID: 930 Comm: (udev-worker) Not tainted 6.9.0-rc3Lyude-Test+ #30 Hardware name: MSI MS-7A39/A320M GAMING PRO (MS-7A39), BIOS 1.I0 01/22/2019 RIP: 0010:sg_init_one+0x85/0xa0 Code: 69 88 32 01 83 e1 03 f6 c3 03 75 20 a8 01 75 1e 48 09 cb 41 89 54 24 08 49 89 1c 24 41 89 6c 24 0c 5b 5d 41 5c e9 7b b9 88 00 <0f> 0b 0f 0b 0f 0b 48 8b 05 5e 46 9a 01 eb b2 66 66 2e 0f 1f 84 00 RSP: 0018:ffffa776017bf6a0 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffffa77600d87000 RCX: 000000000000002b RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffffa77680d87000 RBP: 000000000000e000 R08: 0000000000000000 R09: 0000000000000000 R10: ffff98f4c46aa508 R11: 0000000000000000 R12: ffff98f4c46aa508 R13: ffff98f4c46aa008 R14: ffffa77600d4a000 R15: ffffa77600d4a018 FS: 00007feeb5aae980(0000) GS:ffff98f5c4dc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f22cb9a4520 CR3: 00000001043ba000 CR4: 00000000003506f0 Call Trace: <TASK> ? die+0x36/0x90 ? do_trap+0xdd/0x100 ? sg_init_one+0x85/0xa0 ? do_error_trap+0x65/0x80 ? sg_init_one+0x85/0xa0 ? exc_invalid_op+0x50/0x70 ? sg_init_one+0x85/0xa0 ? asm_exc_invalid_op+0x1a/0x20 ? sg_init_one+0x85/0xa0 nvkm_firmware_ctor+0x14a/0x250 [nouveau] nvkm_falcon_fw_ctor+0x42/0x70 [nouveau] ga102_gsp_booter_ctor+0xb4/0x1a0 [nouveau] r535_gsp_oneinit+0xb3/0x15f0 [nouveau] ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? nvkm_udevice_new+0x95/0x140 [nouveau] ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? ktime_get+0x47/0xb0
Fix this by using the non-coherent allocator instead, I think there might be a better answer to this, but it involve ripping up some of APIs using sg lists.
Cc: stable@vger.kernel.org Fixes: 2541626cfb79 ("drm/nouveau/acr: use common falcon HS FW code for ACR FWs") Signed-off-by: Dave Airlie airlied@redhat.com Signed-off-by: Danilo Krummrich dakr@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/20240815201923.632803-1-airlie... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nvkm/core/firmware.c | 9 ++++++--- drivers/gpu/drm/nouveau/nvkm/falcon/fw.c | 6 ++++++ 2 files changed, 12 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/nouveau/nvkm/core/firmware.c +++ b/drivers/gpu/drm/nouveau/nvkm/core/firmware.c @@ -205,7 +205,8 @@ nvkm_firmware_dtor(struct nvkm_firmware break; case NVKM_FIRMWARE_IMG_DMA: nvkm_memory_unref(&memory); - dma_free_coherent(fw->device->dev, sg_dma_len(&fw->mem.sgl), fw->img, fw->phys); + dma_free_noncoherent(fw->device->dev, sg_dma_len(&fw->mem.sgl), + fw->img, fw->phys, DMA_TO_DEVICE); break; case NVKM_FIRMWARE_IMG_SGT: nvkm_memory_unref(&memory); @@ -236,10 +237,12 @@ nvkm_firmware_ctor(const struct nvkm_fir break; case NVKM_FIRMWARE_IMG_DMA: { dma_addr_t addr; - len = ALIGN(fw->len, PAGE_SIZE);
- fw->img = dma_alloc_coherent(fw->device->dev, len, &addr, GFP_KERNEL); + fw->img = dma_alloc_noncoherent(fw->device->dev, + len, &addr, + DMA_TO_DEVICE, + GFP_KERNEL); if (fw->img) { memcpy(fw->img, src, fw->len); fw->phys = addr; --- a/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c +++ b/drivers/gpu/drm/nouveau/nvkm/falcon/fw.c @@ -89,6 +89,12 @@ nvkm_falcon_fw_boot(struct nvkm_falcon_f nvkm_falcon_fw_dtor_sigs(fw); }
+ /* after last write to the img, sync dma mappings */ + dma_sync_single_for_device(fw->fw.device->dev, + fw->fw.phys, + sg_dma_len(&fw->fw.mem.sgl), + DMA_TO_DEVICE); + FLCNFW_DBG(fw, "resetting"); fw->func->reset(fw);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit afc954fd223ded70b1fa000767e2531db55cce58 upstream.
Terminating for_each_child_of_node() loop requires dropping OF node reference, so bailing out after thermal_of_populate_trip() error misses this. Solve the OF node reference leak with scoped for_each_child_of_node_scoped().
Fixes: d0c75fa2c17f ("thermal/of: Initialize trip points separately") Cc: All applicable stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Chen-Yu Tsai wenst@chromium.org Reviewed-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://patch.msgid.link/20240814195823.437597-1-krzysztof.kozlowski@linaro.... Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/thermal_of.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/thermal/thermal_of.c +++ b/drivers/thermal/thermal_of.c @@ -125,7 +125,7 @@ static int thermal_of_populate_trip(stru static struct thermal_trip *thermal_of_trips_init(struct device_node *np, int *ntrips) { struct thermal_trip *tt; - struct device_node *trips, *trip; + struct device_node *trips; int ret, count;
trips = of_get_child_by_name(np, "trips"); @@ -150,7 +150,7 @@ static struct thermal_trip *thermal_of_t *ntrips = count;
count = 0; - for_each_child_of_node(trips, trip) { + for_each_child_of_node_scoped(trips, trip) { ret = thermal_of_populate_trip(trip, &tt[count++]); if (ret) goto out_kfree;
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 662b52b761bfe0ba970e5823759798faf809b896 upstream.
thermal_of_zone_register() calls of_thermal_zone_find() which will iterate over OF nodes with for_each_available_child_of_node() to find matching thermal zone node. When it finds such, it exits the loop and returns the node. Prematurely ending for_each_available_child_of_node() loops requires dropping OF node reference, thus success of of_thermal_zone_find() means that caller must drop the reference.
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Cc: All applicable stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Chen-Yu Tsai wenst@chromium.org Reviewed-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://patch.msgid.link/20240814195823.437597-2-krzysztof.kozlowski@linaro.... Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/thermal_of.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/thermal/thermal_of.c +++ b/drivers/thermal/thermal_of.c @@ -491,7 +491,8 @@ static struct thermal_zone_device *therm trips = thermal_of_trips_init(np, &ntrips); if (IS_ERR(trips)) { pr_err("Failed to find trip points for %pOFn id=%d\n", sensor, id); - return ERR_CAST(trips); + ret = PTR_ERR(trips); + goto out_of_node_put; }
ret = thermal_of_monitor_init(np, &delay, &pdelay); @@ -519,6 +520,7 @@ static struct thermal_zone_device *therm goto out_kfree_trips; }
+ of_node_put(np); kfree(trips);
ret = thermal_zone_device_enable(tz); @@ -533,6 +535,8 @@ static struct thermal_zone_device *therm
out_kfree_trips: kfree(trips); +out_of_node_put: + of_node_put(np);
return ERR_PTR(ret); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit c0a1ef9c5be72ff28a5413deb1b3e1a066593c13 upstream.
Terminating for_each_available_child_of_node() loop requires dropping OF node reference, so bailing out on errors misses this. Solve the OF node reference leak with scoped for_each_available_child_of_node_scoped().
Fixes: 3fd6d6e2b4e8 ("thermal/of: Rework the thermal device tree initialization") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Chen-Yu Tsai wenst@chromium.org Reviewed-by: Daniel Lezcano daniel.lezcano@linaro.org Link: https://patch.msgid.link/20240814195823.437597-3-krzysztof.kozlowski@linaro.... Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/thermal_of.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/drivers/thermal/thermal_of.c +++ b/drivers/thermal/thermal_of.c @@ -184,14 +184,14 @@ static struct device_node *of_thermal_zo * Search for each thermal zone, a defined sensor * corresponding to the one passed as parameter */ - for_each_available_child_of_node(np, tz) { + for_each_available_child_of_node_scoped(np, child) {
int count, i;
- count = of_count_phandle_with_args(tz, "thermal-sensors", + count = of_count_phandle_with_args(child, "thermal-sensors", "#thermal-sensor-cells"); if (count <= 0) { - pr_err("%pOFn: missing thermal sensor\n", tz); + pr_err("%pOFn: missing thermal sensor\n", child); tz = ERR_PTR(-EINVAL); goto out; } @@ -200,18 +200,19 @@ static struct device_node *of_thermal_zo
int ret;
- ret = of_parse_phandle_with_args(tz, "thermal-sensors", + ret = of_parse_phandle_with_args(child, "thermal-sensors", "#thermal-sensor-cells", i, &sensor_specs); if (ret < 0) { - pr_err("%pOFn: Failed to read thermal-sensors cells: %d\n", tz, ret); + pr_err("%pOFn: Failed to read thermal-sensors cells: %d\n", child, ret); tz = ERR_PTR(ret); goto out; }
if ((sensor == sensor_specs.np) && id == (sensor_specs.args_count ? sensor_specs.args[0] : 0)) { - pr_debug("sensor %pOFn id=%d belongs to %pOFn\n", sensor, id, tz); + pr_debug("sensor %pOFn id=%d belongs to %pOFn\n", sensor, id, child); + tz = no_free_ptr(child); goto out; } }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit e255683c06df572ead96db5efb5d21be30c0efaa upstream.
If no subflow is attached to the 'signal' endpoint that is being removed, the addr ID will not be marked as available again.
Mark the linked ID as available when removing the address entry from the list to cover this case.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-1-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1431,7 +1431,10 @@ static bool mptcp_pm_remove_anno_addr(st ret = remove_anno_list_by_saddr(msk, addr); if (ret || force) { spin_lock_bh(&msk->pm.lock); - msk->pm.add_addr_signaled -= ret; + if (ret) { + __set_bit(addr->id, msk->pm.id_avail_bitmap); + msk->pm.add_addr_signaled--; + } mptcp_pm_remove_addr(msk, &list); spin_unlock_bh(&msk->pm.lock); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit edd8b5d868a4d459f3065493001e293901af758d upstream.
If no subflow is attached to the 'subflow' endpoint that is being removed, the addr ID will not be marked as available again.
Mark the linked ID as available when removing the 'subflow' endpoint if no subflow is attached to it.
While at it, the local_addr_used counter is decremented if the ID was marked as being used to reflect the reality, but also to allow adding new endpoints after that.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-3-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1469,8 +1469,17 @@ static int mptcp_nl_remove_subflow_and_s remove_subflow = lookup_subflow_by_saddr(&msk->conn_list, addr); mptcp_pm_remove_anno_addr(msk, addr, remove_subflow && !(entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT)); - if (remove_subflow) + + if (remove_subflow) { mptcp_pm_remove_subflow(msk, &list); + } else if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { + /* If the subflow has been used, but now closed */ + spin_lock_bh(&msk->pm.lock); + if (!__test_and_set_bit(entry->addr.id, msk->pm.id_avail_bitmap)) + msk->pm.local_addr_used--; + spin_unlock_bh(&msk->pm.lock); + } + release_sock(sk);
next:
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit ef34a6ea0cab1800f4b3c9c3c2cefd5091e03379 upstream.
If no subflows are attached to the 'subflow' endpoints that are being flushed, the corresponding addr IDs will not be marked as available again.
Mark all ID as being available when flushing all the 'subflow' endpoints, and reset local_addr_used counter to cover these cases.
Note that mptcp_pm_remove_addrs_and_subflows() helper is only called for flushing operations, not to remove a specific set of addresses and subflows.
Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-5-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1623,8 +1623,15 @@ static void mptcp_pm_remove_addrs_and_su mptcp_pm_remove_addr(msk, &alist); spin_unlock_bh(&msk->pm.lock); } + if (slist.nr) mptcp_pm_remove_subflow(msk, &slist); + + /* Reset counters: maybe some subflows have been removed before */ + spin_lock_bh(&msk->pm.lock); + bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); + msk->pm.local_addr_used = 0; + spin_unlock_bh(&msk->pm.lock); }
static void mptcp_nl_remove_addrs_list(struct net *net,
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit f448451aa62d54be16acb0034223c17e0d12bc69 upstream.
This helper is confusing. It is in pm.c, but it is specific to the in-kernel PM and it cannot be used by the userspace one. Also, it simply calls one in-kernel specific function with the PM lock, while the similar mptcp_pm_remove_addr() helper requires the PM lock.
What's left is the pr_debug(), which is not that useful, because a similar one is present in the only function called by this helper:
mptcp_pm_nl_rm_subflow_received()
After these modifications, this helper can be marked as 'static', and the lock can be taken only once in mptcp_pm_flush_addrs_and_subflows().
Note that it is not a bug fix, but it will help backporting the following commits.
Fixes: 0ee4261a3681 ("mptcp: implement mptcp_pm_remove_subflow") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-7-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm.c | 10 ---------- net/mptcp/pm_netlink.c | 16 +++++++--------- net/mptcp/protocol.h | 3 --- 3 files changed, 7 insertions(+), 22 deletions(-)
--- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -60,16 +60,6 @@ int mptcp_pm_remove_addr(struct mptcp_so return 0; }
-int mptcp_pm_remove_subflow(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_list) -{ - pr_debug("msk=%p, rm_list_nr=%d", msk, rm_list->nr); - - spin_lock_bh(&msk->pm.lock); - mptcp_pm_nl_rm_subflow_received(msk, rm_list); - spin_unlock_bh(&msk->pm.lock); - return 0; -} - /* path manager event handlers */
void mptcp_pm_new_connection(struct mptcp_sock *msk, const struct sock *ssk, int server_side) --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -857,8 +857,8 @@ static void mptcp_pm_nl_rm_addr_received mptcp_pm_nl_rm_addr_or_subflow(msk, &msk->pm.rm_list_rx, MPTCP_MIB_RMADDR); }
-void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk, - const struct mptcp_rm_list *rm_list) +static void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk, + const struct mptcp_rm_list *rm_list) { mptcp_pm_nl_rm_addr_or_subflow(msk, rm_list, MPTCP_MIB_RMSUBFLOW); } @@ -1471,7 +1471,9 @@ static int mptcp_nl_remove_subflow_and_s !(entry->flags & MPTCP_PM_ADDR_FLAG_IMPLICIT));
if (remove_subflow) { - mptcp_pm_remove_subflow(msk, &list); + spin_lock_bh(&msk->pm.lock); + mptcp_pm_nl_rm_subflow_received(msk, &list); + spin_unlock_bh(&msk->pm.lock); } else if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { /* If the subflow has been used, but now closed */ spin_lock_bh(&msk->pm.lock); @@ -1617,18 +1619,14 @@ static void mptcp_pm_remove_addrs_and_su alist.ids[alist.nr++] = entry->addr.id; }
+ spin_lock_bh(&msk->pm.lock); if (alist.nr) { - spin_lock_bh(&msk->pm.lock); msk->pm.add_addr_signaled -= alist.nr; mptcp_pm_remove_addr(msk, &alist); - spin_unlock_bh(&msk->pm.lock); } - if (slist.nr) - mptcp_pm_remove_subflow(msk, &slist); - + mptcp_pm_nl_rm_subflow_received(msk, &slist); /* Reset counters: maybe some subflows have been removed before */ - spin_lock_bh(&msk->pm.lock); bitmap_fill(msk->pm.id_avail_bitmap, MPTCP_PM_MAX_ADDR_ID + 1); msk->pm.local_addr_used = 0; spin_unlock_bh(&msk->pm.lock); --- a/net/mptcp/protocol.h +++ b/net/mptcp/protocol.h @@ -1021,7 +1021,6 @@ int mptcp_pm_announce_addr(struct mptcp_ const struct mptcp_addr_info *addr, bool echo); int mptcp_pm_remove_addr(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_list); -int mptcp_pm_remove_subflow(struct mptcp_sock *msk, const struct mptcp_rm_list *rm_list); void mptcp_pm_remove_addrs(struct mptcp_sock *msk, struct list_head *rm_list);
void mptcp_free_local_addr_list(struct mptcp_sock *msk); @@ -1128,8 +1127,6 @@ static inline u8 subflow_get_local_id(co
void __init mptcp_pm_nl_init(void); void mptcp_pm_nl_work(struct mptcp_sock *msk); -void mptcp_pm_nl_rm_subflow_received(struct mptcp_sock *msk, - const struct mptcp_rm_list *rm_list); unsigned int mptcp_pm_get_add_addr_signal_max(const struct mptcp_sock *msk); unsigned int mptcp_pm_get_add_addr_accept_max(const struct mptcp_sock *msk); unsigned int mptcp_pm_get_subflows_max(const struct mptcp_sock *msk);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 322ea3778965da72862cca2a0c50253aacf65fe6 upstream.
Adding the following warning ...
WARN_ON_ONCE(msk->pm.local_addr_used == 0)
... before decrementing the local_addr_used counter helped to find a bug when running the "remove single address" subtest from the mptcp_join.sh selftests.
Removing a 'signal' endpoint will trigger the removal of all subflows linked to this endpoint via mptcp_pm_nl_rm_addr_or_subflow() with rm_type == MPTCP_MIB_RMSUBFLOW. This will decrement the local_addr_used counter, which is wrong in this case because this counter is linked to 'subflow' endpoints, and here it is a 'signal' endpoint that is being removed.
Now, the counter is decremented, only if the ID is being used outside of mptcp_pm_nl_rm_addr_or_subflow(), only for 'subflow' endpoints, and if the ID is not 0 -- local_addr_used is not taking into account these ones. This marking of the ID as being available, and the decrement is done no matter if a subflow using this ID is currently available, because the subflow could have been closed before.
Fixes: 06faa2271034 ("mptcp: remove multi addresses and subflows in PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-8-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -833,10 +833,10 @@ static void mptcp_pm_nl_rm_addr_or_subfl if (rm_type == MPTCP_MIB_RMSUBFLOW) __MPTCP_INC_STATS(sock_net(sk), rm_type); } - if (rm_type == MPTCP_MIB_RMSUBFLOW) - __set_bit(rm_id ? rm_id : msk->mpc_endpoint_id, msk->pm.id_avail_bitmap); - else if (rm_type == MPTCP_MIB_RMADDR) + + if (rm_type == MPTCP_MIB_RMADDR) __MPTCP_INC_STATS(sock_net(sk), rm_type); + if (!removed) continue;
@@ -846,8 +846,6 @@ static void mptcp_pm_nl_rm_addr_or_subfl if (rm_type == MPTCP_MIB_RMADDR) { msk->pm.add_addr_accepted--; WRITE_ONCE(msk->pm.accept_addr, true); - } else if (rm_type == MPTCP_MIB_RMSUBFLOW) { - msk->pm.local_addr_used--; } } } @@ -1441,6 +1439,14 @@ static bool mptcp_pm_remove_anno_addr(st return ret; }
+static void __mark_subflow_endp_available(struct mptcp_sock *msk, u8 id) +{ + /* If it was marked as used, and not ID 0, decrement local_addr_used */ + if (!__test_and_set_bit(id ? : msk->mpc_endpoint_id, msk->pm.id_avail_bitmap) && + id && !WARN_ON_ONCE(msk->pm.local_addr_used == 0)) + msk->pm.local_addr_used--; +} + static int mptcp_nl_remove_subflow_and_signal_addr(struct net *net, const struct mptcp_pm_addr_entry *entry) { @@ -1474,11 +1480,11 @@ static int mptcp_nl_remove_subflow_and_s spin_lock_bh(&msk->pm.lock); mptcp_pm_nl_rm_subflow_received(msk, &list); spin_unlock_bh(&msk->pm.lock); - } else if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { - /* If the subflow has been used, but now closed */ + } + + if (entry->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) { spin_lock_bh(&msk->pm.lock); - if (!__test_and_set_bit(entry->addr.id, msk->pm.id_avail_bitmap)) - msk->pm.local_addr_used--; + __mark_subflow_endp_available(msk, list.ids[0]); spin_unlock_bh(&msk->pm.lock); }
@@ -1516,6 +1522,7 @@ static int mptcp_nl_remove_id_zero_addre spin_lock_bh(&msk->pm.lock); mptcp_pm_remove_addr(msk, &list); mptcp_pm_nl_rm_subflow_received(msk, &list); + __mark_subflow_endp_available(msk, 0); spin_unlock_bh(&msk->pm.lock); release_sock(sk);
@@ -1917,6 +1924,7 @@ static void mptcp_pm_nl_fullmesh(struct
spin_lock_bh(&msk->pm.lock); mptcp_pm_nl_rm_subflow_received(msk, &list); + __mark_subflow_endp_available(msk, list.ids[0]); mptcp_pm_create_subflow_or_signal_addr(msk); spin_unlock_bh(&msk->pm.lock); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 1c1f721375989579e46741f59523e39ec9b2a9bd upstream.
Adding the following warning ...
WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)
... before decrementing the add_addr_accepted counter helped to find a bug when running the "remove single subflow" subtest from the mptcp_join.sh selftest.
Removing a 'subflow' endpoint will first trigger a RM_ADDR, then the subflow closure. Before this patch, and upon the reception of the RM_ADDR, the other peer will then try to decrement this add_addr_accepted. That's not correct because the attached subflows have not been created upon the reception of an ADD_ADDR.
A way to solve that is to decrement the counter only if the attached subflow was an MP_JOIN to a remote id that was not 0, and initiated by the host receiving the RM_ADDR.
Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-9-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -829,7 +829,7 @@ static void mptcp_pm_nl_rm_addr_or_subfl mptcp_close_ssk(sk, ssk, subflow); spin_lock_bh(&msk->pm.lock);
- removed = true; + removed |= subflow->request_join; if (rm_type == MPTCP_MIB_RMSUBFLOW) __MPTCP_INC_STATS(sock_net(sk), rm_type); } @@ -843,7 +843,11 @@ static void mptcp_pm_nl_rm_addr_or_subfl if (!mptcp_pm_is_kernel(msk)) continue;
- if (rm_type == MPTCP_MIB_RMADDR) { + if (rm_type == MPTCP_MIB_RMADDR && rm_id && + !WARN_ON_ONCE(msk->pm.add_addr_accepted == 0)) { + /* Note: if the subflow has been closed before, this + * add_addr_accepted counter will not be decremented. + */ msk->pm.add_addr_accepted--; WRITE_ONCE(msk->pm.accept_addr, true); }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 0137a3c7c2ea3f9df8ebfc65d78b4ba712a187bb upstream.
The limits might have changed in between, it is best to check them before accepting new ADD_ADDR.
Fixes: d0876b2284cf ("mptcp: add the incoming RM_ADDR support") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-10-38035d40de5b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -848,8 +848,8 @@ static void mptcp_pm_nl_rm_addr_or_subfl /* Note: if the subflow has been closed before, this * add_addr_accepted counter will not be decremented. */ - msk->pm.add_addr_accepted--; - WRITE_ONCE(msk->pm.accept_addr, true); + if (--msk->pm.add_addr_accepted < mptcp_pm_get_add_addr_accept_max(msk)) + WRITE_ONCE(msk->pm.accept_addr, true); } } }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit ca6e55a703ca2894611bb5c5bca8bfd2290fd91e upstream.
The ID 0 is specific per MPTCP connections. The per netns entries cannot have this special ID 0 then.
But that's different for the userspace PM where the entries are per connection, they can then use this special ID 0.
Fixes: f40be0db0b76 ("mptcp: unify pm get_flags_and_ifindex_by_id") Cc: stable@vger.kernel.org Acked-by: Geliang Tang geliang@kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-11-38035d40de5b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm.c | 3 --- net/mptcp/pm_netlink.c | 4 ++++ 2 files changed, 4 insertions(+), 3 deletions(-)
--- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -434,9 +434,6 @@ int mptcp_pm_get_flags_and_ifindex_by_id *flags = 0; *ifindex = 0;
- if (!id) - return 0; - if (mptcp_pm_is_userspace(msk)) return mptcp_userspace_pm_get_flags_and_ifindex_by_id(msk, id, flags, ifindex); return mptcp_pm_nl_get_flags_and_ifindex_by_id(msk, id, flags, ifindex); --- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -1395,6 +1395,10 @@ int mptcp_pm_nl_get_flags_and_ifindex_by struct sock *sk = (struct sock *)msk; struct net *net = sock_net(sk);
+ /* No entries with ID 0 */ + if (id == 0) + return 0; + rcu_read_lock(); entry = __lookup_addr_by_id(pm_nl_get_pernet(net), id); if (entry) {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 09355f7abb9fbfc1a240be029837921ea417bf4f upstream.
When reacting upon the reception of an ADD_ADDR, the in-kernel PM first looks for fullmesh endpoints. If there are some, it will pick them, using their entry ID.
It should set the ID 0 when using the endpoint corresponding to the initial subflow, it is a special case imposed by the MPTCP specs.
Note that msk->mpc_endpoint_id might not be set when receiving the first ADD_ADDR from the server. So better to compare the addresses.
Fixes: 1a0d6136c5f0 ("mptcp: local addresses fullmesh") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-12-38035d40de5b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -636,6 +636,7 @@ static unsigned int fill_local_addresses { struct sock *sk = (struct sock *)msk; struct mptcp_pm_addr_entry *entry; + struct mptcp_addr_info mpc_addr; struct pm_nl_pernet *pernet; unsigned int subflows_max; int i = 0; @@ -643,6 +644,8 @@ static unsigned int fill_local_addresses pernet = pm_nl_get_pernet_from_msk(msk); subflows_max = mptcp_pm_get_subflows_max(msk);
+ mptcp_local_address((struct sock_common *)msk, &mpc_addr); + rcu_read_lock(); list_for_each_entry_rcu(entry, &pernet->local_addr_list, list) { if (!(entry->flags & MPTCP_PM_ADDR_FLAG_FULLMESH)) @@ -653,7 +656,13 @@ static unsigned int fill_local_addresses
if (msk->pm.subflows < subflows_max) { msk->pm.subflows++; - addrs[i++] = entry->addr; + addrs[i] = entry->addr; + + /* Special case for ID0: set the correct ID */ + if (mptcp_addresses_equal(&entry->addr, &mpc_addr, entry->addr.port)) + addrs[i].id = 0; + + i++; } } rcu_read_unlock();
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 48e50dcbcbaaf713d82bf2da5c16aeced94ad07d upstream.
select_local_address() and select_signal_address() both select an endpoint entry from the list inside an RCU protected section, but return a reference to it, to be read later on. If the entry is dereferenced after the RCU unlock, reading info could cause a Use-after-Free.
A simple solution is to copy the required info while inside the RCU protected section to avoid any risk of UaF later. The address ID might need to be modified later to handle the ID0 case later, so a copy seems OK to deal with.
Reported-by: Paolo Abeni pabeni@redhat.com Closes: https://lore.kernel.org/45cd30d3-7710-491c-ae4d-a1368c00beb1@redhat.com Fixes: 01cacb00b35c ("mptcp: add netlink-based PM") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-14-38035d40de5b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/pm_netlink.c | 64 ++++++++++++++++++++++++++----------------------- 1 file changed, 34 insertions(+), 30 deletions(-)
--- a/net/mptcp/pm_netlink.c +++ b/net/mptcp/pm_netlink.c @@ -143,11 +143,13 @@ static bool lookup_subflow_by_daddr(cons return false; }
-static struct mptcp_pm_addr_entry * +static bool select_local_address(const struct pm_nl_pernet *pernet, - const struct mptcp_sock *msk) + const struct mptcp_sock *msk, + struct mptcp_pm_addr_entry *new_entry) { - struct mptcp_pm_addr_entry *entry, *ret = NULL; + struct mptcp_pm_addr_entry *entry; + bool found = false;
msk_owned_by_me(msk);
@@ -159,17 +161,21 @@ select_local_address(const struct pm_nl_ if (!test_bit(entry->addr.id, msk->pm.id_avail_bitmap)) continue;
- ret = entry; + *new_entry = *entry; + found = true; break; } rcu_read_unlock(); - return ret; + + return found; }
-static struct mptcp_pm_addr_entry * -select_signal_address(struct pm_nl_pernet *pernet, const struct mptcp_sock *msk) +static bool +select_signal_address(struct pm_nl_pernet *pernet, const struct mptcp_sock *msk, + struct mptcp_pm_addr_entry *new_entry) { - struct mptcp_pm_addr_entry *entry, *ret = NULL; + struct mptcp_pm_addr_entry *entry; + bool found = false;
rcu_read_lock(); /* do not keep any additional per socket state, just signal @@ -184,11 +190,13 @@ select_signal_address(struct pm_nl_perne if (!(entry->flags & MPTCP_PM_ADDR_FLAG_SIGNAL)) continue;
- ret = entry; + *new_entry = *entry; + found = true; break; } rcu_read_unlock(); - return ret; + + return found; }
unsigned int mptcp_pm_get_add_addr_signal_max(const struct mptcp_sock *msk) @@ -512,9 +520,10 @@ __lookup_addr(struct pm_nl_pernet *perne
static void mptcp_pm_create_subflow_or_signal_addr(struct mptcp_sock *msk) { - struct mptcp_pm_addr_entry *local, *signal_and_subflow = NULL; struct sock *sk = (struct sock *)msk; + struct mptcp_pm_addr_entry local; unsigned int add_addr_signal_max; + bool signal_and_subflow = false; unsigned int local_addr_max; struct pm_nl_pernet *pernet; unsigned int subflows_max; @@ -565,23 +574,22 @@ static void mptcp_pm_create_subflow_or_s if (msk->pm.addr_signal & BIT(MPTCP_ADD_ADDR_SIGNAL)) return;
- local = select_signal_address(pernet, msk); - if (!local) + if (!select_signal_address(pernet, msk, &local)) goto subflow;
/* If the alloc fails, we are on memory pressure, not worth * continuing, and trying to create subflows. */ - if (!mptcp_pm_alloc_anno_list(msk, &local->addr)) + if (!mptcp_pm_alloc_anno_list(msk, &local.addr)) return;
- __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); + __clear_bit(local.addr.id, msk->pm.id_avail_bitmap); msk->pm.add_addr_signaled++; - mptcp_pm_announce_addr(msk, &local->addr, false); + mptcp_pm_announce_addr(msk, &local.addr, false); mptcp_pm_nl_addr_send_ack(msk);
- if (local->flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) - signal_and_subflow = local; + if (local.flags & MPTCP_PM_ADDR_FLAG_SUBFLOW) + signal_and_subflow = true; }
subflow: @@ -592,26 +600,22 @@ subflow: bool fullmesh; int i, nr;
- if (signal_and_subflow) { - local = signal_and_subflow; - signal_and_subflow = NULL; - } else { - local = select_local_address(pernet, msk); - if (!local) - break; - } + if (signal_and_subflow) + signal_and_subflow = false; + else if (!select_local_address(pernet, msk, &local)) + break;
- fullmesh = !!(local->flags & MPTCP_PM_ADDR_FLAG_FULLMESH); + fullmesh = !!(local.flags & MPTCP_PM_ADDR_FLAG_FULLMESH);
msk->pm.local_addr_used++; - __clear_bit(local->addr.id, msk->pm.id_avail_bitmap); - nr = fill_remote_addresses_vec(msk, &local->addr, fullmesh, addrs); + __clear_bit(local.addr.id, msk->pm.id_avail_bitmap); + nr = fill_remote_addresses_vec(msk, &local.addr, fullmesh, addrs); if (nr == 0) continue;
spin_unlock_bh(&msk->pm.lock); for (i = 0; i < nr; i++) - __mptcp_subflow_connect(sk, &local->addr, &addrs[i]); + __mptcp_subflow_connect(sk, &local.addr, &addrs[i]); spin_lock_bh(&msk->pm.lock); } mptcp_pm_nl_check_work_pending(msk);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 4878f9f8421f4587bee7b232c1c8a9d3a7d4d782 upstream.
This case was not covered, and the wrong ID was set before the previous commit.
The rest is not modified, it is just that it will increase the code coverage.
The right address ID can be verified by looking at the packet traces. We could automate that using Netfilter with some cBPF code for example, but that's always a bit cryptic. Packetdrill seems better fitted for that.
Fixes: 4f49d63352da ("selftests: mptcp: add fullmesh testcases") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-13-38035d40de5b... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 1 + 1 file changed, 1 insertion(+)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -3058,6 +3058,7 @@ fullmesh_tests() pm_nl_set_limits $ns1 1 3 pm_nl_set_limits $ns2 1 3 pm_nl_add_endpoint $ns1 10.0.2.1 flags signal + pm_nl_add_endpoint $ns2 10.0.1.2 flags subflow,fullmesh fullmesh=1 speed=slow \ run_tests $ns1 $ns2 10.0.1.1 chk_join_nr 3 3 3
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthieu Baerts (NGI0) matttbe@kernel.org
commit 65fb58afa341ad68e71e5c4d816b407e6a683a66 upstream.
This test extends "delete and re-add" to validate the previous commit. A new 'subflow' endpoint is added, but the subflow request will be rejected. The result is that no subflow will be established from this address.
Later, the endpoint is removed and re-added after having cleared the firewall rule. Before the previous commit, the client would not have been able to create this new subflow.
While at it, extra checks have been added to validate the expected numbers of MPJ and RM_ADDR.
The 'Fixes' tag here below is the same as the one from the previous commit: this patch here is not fixing anything wrong in the selftests, but it validates the previous fix for an issue introduced by this commit ID.
Fixes: b6c08380860b ("mptcp: remove addr and subflow in PM netlink") Cc: stable@vger.kernel.org Reviewed-by: Mat Martineau martineau@kernel.org Signed-off-by: Matthieu Baerts (NGI0) matttbe@kernel.org Link: https://patch.msgid.link/20240819-net-mptcp-pm-reusing-id-v1-4-38035d40de5b@... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 27 +++++++++++++++++++----- 1 file changed, 22 insertions(+), 5 deletions(-)
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -436,9 +436,10 @@ reset_with_tcp_filter() local ns="${!1}" local src="${2}" local target="${3}" + local chain="${4:-INPUT}"
if ! ip netns exec "${ns}" ${iptables} \ - -A INPUT \ + -A "${chain}" \ -s "${src}" \ -p tcp \ -j "${target}"; then @@ -3572,10 +3573,10 @@ endpoint_tests() mptcp_lib_kill_wait $tests_pid fi
- if reset "delete and re-add" && + if reset_with_tcp_filter "delete and re-add" ns2 10.0.3.2 REJECT OUTPUT && mptcp_lib_kallsyms_has "subflow_rebuild_header$"; then - pm_nl_set_limits $ns1 1 1 - pm_nl_set_limits $ns2 1 1 + pm_nl_set_limits $ns1 0 2 + pm_nl_set_limits $ns2 0 2 pm_nl_add_endpoint $ns2 10.0.2.2 id 2 dev ns2eth2 flags subflow test_linkfail=4 speed=20 \ run_tests $ns1 $ns2 10.0.1.1 & @@ -3592,11 +3593,27 @@ endpoint_tests() chk_subflow_nr "after delete" 1 chk_mptcp_info subflows 0 subflows 0
- pm_nl_add_endpoint $ns2 10.0.2.2 dev ns2eth2 flags subflow + pm_nl_add_endpoint $ns2 10.0.2.2 id 2 dev ns2eth2 flags subflow wait_mpj $ns2 chk_subflow_nr "after re-add" 2 chk_mptcp_info subflows 1 subflows 1 + + pm_nl_add_endpoint $ns2 10.0.3.2 id 3 flags subflow + wait_attempt_fail $ns2 + chk_subflow_nr "after new reject" 2 + chk_mptcp_info subflows 1 subflows 1 + + ip netns exec "${ns2}" ${iptables} -D OUTPUT -s "10.0.3.2" -p tcp -j REJECT + pm_nl_del_endpoint $ns2 3 10.0.3.2 + pm_nl_add_endpoint $ns2 10.0.3.2 id 3 flags subflow + wait_mpj $ns2 + chk_subflow_nr "after no reject" 3 + chk_mptcp_info subflows 2 subflows 2 + mptcp_lib_kill_wait $tests_pid + + chk_join_nr 3 3 3 + chk_rm_nr 1 1 fi }
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Matthew Brost matthew.brost@intel.com
commit 5d30de4311d2d4165e78dc021c5cacb7496b3491 upstream.
job->fence is not assigned until xe_sched_job_arm(), check for job->fence in xe_sched_job_seqno() so any usage of this function (trace points) do not result in NULL ptr dereference. Also check job->fence before assigning error in job trace points.
Fixes: 0ac7a2c745e8 ("drm/xe: Don't initialize fences at xe_sched_job_create()") Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Signed-off-by: Matthew Brost matthew.brost@intel.com Reviewed-by: Lucas De Marchi lucas.demarchi@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240605055041.2082074-1-matth... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/xe/xe_sched_job.h | 2 +- drivers/gpu/drm/xe/xe_trace.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/xe/xe_sched_job.h +++ b/drivers/gpu/drm/xe/xe_sched_job.h @@ -70,7 +70,7 @@ to_xe_sched_job(struct drm_sched_job *dr
static inline u32 xe_sched_job_seqno(struct xe_sched_job *job) { - return job->fence->seqno; + return job->fence ? job->fence->seqno : 0; }
static inline u32 xe_sched_job_lrc_seqno(struct xe_sched_job *job) --- a/drivers/gpu/drm/xe/xe_trace.h +++ b/drivers/gpu/drm/xe/xe_trace.h @@ -270,7 +270,7 @@ DECLARE_EVENT_CLASS(xe_sched_job, __entry->guc_state = atomic_read(&job->q->guc->state); __entry->flags = job->q->flags; - __entry->error = job->fence->error; + __entry->error = job->fence ? job->fence->error : 0; __entry->fence = job->fence; __entry->batch_addr = (u64)job->ptrs[0].batch_addr; ),
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Brauner brauner@kernel.org
commit 232590ea7fc125986a526e03081b98e5783f70d2 upstream.
This reverts commit 3b5bbe798b2451820e74243b738268f51901e7d0.
Eric reported that systemd-shutdown gets broken by blocking the creating of pidfds for kthreads as older versions seems to rely on being able to create a pidfd for any process in /proc.
Reported-by: Eric Biggers ebiggers@kernel.org Link: https://lore.kernel.org/r/20240818035818.GA1929@sol.localdomain Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/fork.c | 25 +++---------------------- 1 file changed, 3 insertions(+), 22 deletions(-)
--- a/kernel/fork.c +++ b/kernel/fork.c @@ -2069,23 +2069,10 @@ static int __pidfd_prepare(struct pid *p */ int pidfd_prepare(struct pid *pid, unsigned int flags, struct file **ret) { - if (!pid) - return -EINVAL; - - scoped_guard(rcu) { - struct task_struct *tsk; + bool thread = flags & PIDFD_THREAD;
- if (flags & PIDFD_THREAD) - tsk = pid_task(pid, PIDTYPE_PID); - else - tsk = pid_task(pid, PIDTYPE_TGID); - if (!tsk) - return -EINVAL; - - /* Don't create pidfds for kernel threads for now. */ - if (tsk->flags & PF_KTHREAD) - return -EINVAL; - } + if (!pid || !pid_has_task(pid, thread ? PIDTYPE_PID : PIDTYPE_TGID)) + return -EINVAL;
return __pidfd_prepare(pid, flags, ret); } @@ -2432,12 +2419,6 @@ __latent_entropy struct task_struct *cop if (clone_flags & CLONE_PIDFD) { int flags = (clone_flags & CLONE_THREAD) ? PIDFD_THREAD : 0;
- /* Don't create pidfds for kernel threads for now. */ - if (args->kthread) { - retval = -EINVAL; - goto bad_fork_free_pid; - } - /* Note that no task has been attached to @pid yet. */ retval = __pidfd_prepare(pid, flags, &pidfile); if (retval < 0)
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boyuan Zhang boyuan.zhang@amd.com
commit ecfa23c8df7ef3ea2a429dfe039341bf792e95b4 upstream.
Determine whether VCN using unified queue in sw_init, instead of calling functions later on.
v2: fix coding style
Signed-off-by: Boyuan Zhang boyuan.zhang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Ruijing Dong ruijing.dong@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 39 ++++++++++++-------------------- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 2 files changed, 16 insertions(+), 24 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -151,6 +151,10 @@ int amdgpu_vcn_sw_init(struct amdgpu_dev } }
+ /* from vcn4 and above, only unified queue is used */ + adev->vcn.using_unified_queue = + amdgpu_ip_version(adev, UVD_HWIP, 0) >= IP_VERSION(4, 0, 0); + hdr = (const struct common_firmware_header *)adev->vcn.fw[0]->data; adev->vcn.fw_version = le32_to_cpu(hdr->ucode_version);
@@ -279,18 +283,6 @@ int amdgpu_vcn_sw_fini(struct amdgpu_dev return 0; }
-/* from vcn4 and above, only unified queue is used */ -static bool amdgpu_vcn_using_unified_queue(struct amdgpu_ring *ring) -{ - struct amdgpu_device *adev = ring->adev; - bool ret = false; - - if (amdgpu_ip_version(adev, UVD_HWIP, 0) >= IP_VERSION(4, 0, 0)) - ret = true; - - return ret; -} - bool amdgpu_vcn_is_disabled_vcn(struct amdgpu_device *adev, enum vcn_ring_type type, uint32_t vcn_instance) { bool ret = false; @@ -728,12 +720,11 @@ static int amdgpu_vcn_dec_sw_send_msg(st struct amdgpu_job *job; struct amdgpu_ib *ib; uint64_t addr = AMDGPU_GPU_PAGE_ALIGN(ib_msg->gpu_addr); - bool sq = amdgpu_vcn_using_unified_queue(ring); uint32_t *ib_checksum; uint32_t ib_pack_in_dw; int i, r;
- if (sq) + if (adev->vcn.using_unified_queue) ib_size_dw += 8;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, @@ -746,7 +737,7 @@ static int amdgpu_vcn_dec_sw_send_msg(st ib->length_dw = 0;
/* single queue headers */ - if (sq) { + if (adev->vcn.using_unified_queue) { ib_pack_in_dw = sizeof(struct amdgpu_vcn_decode_buffer) / sizeof(uint32_t) + 4 + 2; /* engine info + decoding ib in dw */ ib_checksum = amdgpu_vcn_unified_ring_ib_header(ib, ib_pack_in_dw, false); @@ -765,7 +756,7 @@ static int amdgpu_vcn_dec_sw_send_msg(st for (i = ib->length_dw; i < ib_size_dw; ++i) ib->ptr[i] = 0x0;
- if (sq) + if (adev->vcn.using_unified_queue) amdgpu_vcn_unified_ring_ib_checksum(&ib_checksum, ib_pack_in_dw);
r = amdgpu_job_submit_direct(job, ring, &f); @@ -855,15 +846,15 @@ static int amdgpu_vcn_enc_get_create_msg struct dma_fence **fence) { unsigned int ib_size_dw = 16; + struct amdgpu_device *adev = ring->adev; struct amdgpu_job *job; struct amdgpu_ib *ib; struct dma_fence *f = NULL; uint32_t *ib_checksum = NULL; uint64_t addr; - bool sq = amdgpu_vcn_using_unified_queue(ring); int i, r;
- if (sq) + if (adev->vcn.using_unified_queue) ib_size_dw += 8;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, @@ -877,7 +868,7 @@ static int amdgpu_vcn_enc_get_create_msg
ib->length_dw = 0;
- if (sq) + if (adev->vcn.using_unified_queue) ib_checksum = amdgpu_vcn_unified_ring_ib_header(ib, 0x11, true);
ib->ptr[ib->length_dw++] = 0x00000018; @@ -899,7 +890,7 @@ static int amdgpu_vcn_enc_get_create_msg for (i = ib->length_dw; i < ib_size_dw; ++i) ib->ptr[i] = 0x0;
- if (sq) + if (adev->vcn.using_unified_queue) amdgpu_vcn_unified_ring_ib_checksum(&ib_checksum, 0x11);
r = amdgpu_job_submit_direct(job, ring, &f); @@ -922,15 +913,15 @@ static int amdgpu_vcn_enc_get_destroy_ms struct dma_fence **fence) { unsigned int ib_size_dw = 16; + struct amdgpu_device *adev = ring->adev; struct amdgpu_job *job; struct amdgpu_ib *ib; struct dma_fence *f = NULL; uint32_t *ib_checksum = NULL; uint64_t addr; - bool sq = amdgpu_vcn_using_unified_queue(ring); int i, r;
- if (sq) + if (adev->vcn.using_unified_queue) ib_size_dw += 8;
r = amdgpu_job_alloc_with_ib(ring->adev, NULL, NULL, @@ -944,7 +935,7 @@ static int amdgpu_vcn_enc_get_destroy_ms
ib->length_dw = 0;
- if (sq) + if (adev->vcn.using_unified_queue) ib_checksum = amdgpu_vcn_unified_ring_ib_header(ib, 0x11, true);
ib->ptr[ib->length_dw++] = 0x00000018; @@ -966,7 +957,7 @@ static int amdgpu_vcn_enc_get_destroy_ms for (i = ib->length_dw; i < ib_size_dw; ++i) ib->ptr[i] = 0x0;
- if (sq) + if (adev->vcn.using_unified_queue) amdgpu_vcn_unified_ring_ib_checksum(&ib_checksum, 0x11);
r = amdgpu_job_submit_direct(job, ring, &f); --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h @@ -329,6 +329,7 @@ struct amdgpu_vcn {
uint16_t inst_mask; uint8_t num_inst_per_aid; + bool using_unified_queue; };
struct amdgpu_fw_shared_rb_ptrs_struct {
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Boyuan Zhang boyuan.zhang@amd.com
commit 7d75ef3736a025db441be652c8cc8e84044a215f upstream.
For unified queue, DPG pause for encoding is done inside VCN firmware, so there is no need to pause dpg based on ring type in kernel.
For VCN3 and below, pausing DPG for encoding in kernel is still needed.
v2: add more comments v3: update commit message
Signed-off-by: Boyuan Zhang boyuan.zhang@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Ruijing Dong ruijing.dong@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c @@ -393,7 +393,9 @@ static void amdgpu_vcn_idle_work_handler for (i = 0; i < adev->vcn.num_enc_rings; ++i) fence[j] += amdgpu_fence_count_emitted(&adev->vcn.inst[j].ring_enc[i]);
- if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { + /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */ + if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG && + !adev->vcn.using_unified_queue) { struct dpg_pause_state new_state;
if (fence[j] || @@ -439,7 +441,9 @@ void amdgpu_vcn_ring_begin_use(struct am amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN, AMD_PG_STATE_UNGATE);
- if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) { + /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */ + if (adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG && + !adev->vcn.using_unified_queue) { struct dpg_pause_state new_state;
if (ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC) { @@ -465,8 +469,12 @@ void amdgpu_vcn_ring_begin_use(struct am
void amdgpu_vcn_ring_end_use(struct amdgpu_ring *ring) { + struct amdgpu_device *adev = ring->adev; + + /* Only set DPG pause for VCN3 or below, VCN4 and above will be handled by FW */ if (ring->adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG && - ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC) + ring->funcs->type == AMDGPU_RING_TYPE_VCN_ENC && + !adev->vcn.using_unified_queue) atomic_dec(&ring->adev->vcn.inst[ring->me].dpg_enc_submission_cnt);
atomic_dec(&ring->adev->vcn.total_submission_cnt);
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yonghong Song yonghong.song@linux.dev
commit 662c3e2db00f92e50c26e9dc4fe47c52223d9982 upstream.
A selftest is added such that without the previous patch, a crash can happen. With the previous patch, the test can run successfully. The new test is written in a way which mimics original crash case: main_prog static_prog_1 static_prog_2 where static_prog_1 has different paths to static_prog_2 and some path has stack allocated and some other path does not. A stacksafe() checking in static_prog_2() triggered the crash.
Signed-off-by: Yonghong Song yonghong.song@linux.dev Link: https://lore.kernel.org/r/20240812214852.214037-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Shung-Hsi Yu shung-hsi.yu@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/iters.c | 54 ++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+)
--- a/tools/testing/selftests/bpf/progs/iters.c +++ b/tools/testing/selftests/bpf/progs/iters.c @@ -1434,4 +1434,58 @@ int iter_arr_with_actual_elem_count(cons return sum; }
+__u32 upper, select_n, result; +__u64 global; + +static __noinline bool nest_2(char *str) +{ + /* some insns (including branch insns) to ensure stacksafe() is triggered + * in nest_2(). This way, stacksafe() can compare frame associated with nest_1(). + */ + if (str[0] == 't') + return true; + if (str[1] == 'e') + return true; + if (str[2] == 's') + return true; + if (str[3] == 't') + return true; + return false; +} + +static __noinline bool nest_1(int n) +{ + /* case 0: allocate stack, case 1: no allocate stack */ + switch (n) { + case 0: { + char comm[16]; + + if (bpf_get_current_comm(comm, 16)) + return false; + return nest_2(comm); + } + case 1: + return nest_2((char *)&global); + default: + return false; + } +} + +SEC("raw_tp") +__success +int iter_subprog_check_stacksafe(const void *ctx) +{ + long i; + + bpf_for(i, 0, upper) { + if (!nest_1(select_n)) { + result = 1; + return 0; + } + } + + result = 2; + return 0; +} + char _license[] SEC("license") = "GPL";
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit 76e98a158b207771a6c9a0de0a60522a446a3447 ]
If there is ->PreviousSessionId field in the session setup request, The session of the previous connection should be destroyed. During this, if the smb2 operation requests in the previous session are being processed, a racy issue could happen with ksmbd_destroy_file_table(). This patch sets conn->status to KSMBD_SESS_NEED_RECONNECT to block incoming operations and waits until on-going operations are complete (i.e. idle) before desctorying the previous session.
Fixes: c8efcc786146 ("ksmbd: add support for durable handles v1/v2") Cc: stable@vger.kernel.org # v6.6+ Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-25040 Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/connection.c | 34 +++++++++++++++++++++++++++++++++- fs/smb/server/connection.h | 3 ++- fs/smb/server/mgmt/user_session.c | 8 ++++++++ fs/smb/server/smb2pdu.c | 2 +- 4 files changed, 44 insertions(+), 3 deletions(-)
--- a/fs/smb/server/connection.c +++ b/fs/smb/server/connection.c @@ -165,11 +165,43 @@ void ksmbd_all_conn_set_status(u64 sess_ up_read(&conn_list_lock); }
-void ksmbd_conn_wait_idle(struct ksmbd_conn *conn, u64 sess_id) +void ksmbd_conn_wait_idle(struct ksmbd_conn *conn) { wait_event(conn->req_running_q, atomic_read(&conn->req_running) < 2); }
+int ksmbd_conn_wait_idle_sess_id(struct ksmbd_conn *curr_conn, u64 sess_id) +{ + struct ksmbd_conn *conn; + int rc, retry_count = 0, max_timeout = 120; + int rcount = 1; + +retry_idle: + if (retry_count >= max_timeout) + return -EIO; + + down_read(&conn_list_lock); + list_for_each_entry(conn, &conn_list, conns_list) { + if (conn->binding || xa_load(&conn->sessions, sess_id)) { + if (conn == curr_conn) + rcount = 2; + if (atomic_read(&conn->req_running) >= rcount) { + rc = wait_event_timeout(conn->req_running_q, + atomic_read(&conn->req_running) < rcount, + HZ); + if (!rc) { + up_read(&conn_list_lock); + retry_count++; + goto retry_idle; + } + } + } + } + up_read(&conn_list_lock); + + return 0; +} + int ksmbd_conn_write(struct ksmbd_work *work) { struct ksmbd_conn *conn = work->conn; --- a/fs/smb/server/connection.h +++ b/fs/smb/server/connection.h @@ -145,7 +145,8 @@ extern struct list_head conn_list; extern struct rw_semaphore conn_list_lock;
bool ksmbd_conn_alive(struct ksmbd_conn *conn); -void ksmbd_conn_wait_idle(struct ksmbd_conn *conn, u64 sess_id); +void ksmbd_conn_wait_idle(struct ksmbd_conn *conn); +int ksmbd_conn_wait_idle_sess_id(struct ksmbd_conn *curr_conn, u64 sess_id); struct ksmbd_conn *ksmbd_conn_alloc(void); void ksmbd_conn_free(struct ksmbd_conn *conn); bool ksmbd_conn_lookup_dialect(struct ksmbd_conn *c); --- a/fs/smb/server/mgmt/user_session.c +++ b/fs/smb/server/mgmt/user_session.c @@ -310,6 +310,7 @@ void destroy_previous_session(struct ksm { struct ksmbd_session *prev_sess; struct ksmbd_user *prev_user; + int err;
down_write(&sessions_table_lock); down_write(&conn->session_lock); @@ -324,8 +325,15 @@ void destroy_previous_session(struct ksm memcmp(user->passkey, prev_user->passkey, user->passkey_sz)) goto out;
+ ksmbd_all_conn_set_status(id, KSMBD_SESS_NEED_RECONNECT); + err = ksmbd_conn_wait_idle_sess_id(conn, id); + if (err) { + ksmbd_all_conn_set_status(id, KSMBD_SESS_NEED_NEGOTIATE); + goto out; + } ksmbd_destroy_file_table(&prev_sess->file_table); prev_sess->state = SMB2_SESSION_EXPIRED; + ksmbd_all_conn_set_status(id, KSMBD_SESS_NEED_NEGOTIATE); out: up_write(&conn->session_lock); up_write(&sessions_table_lock); --- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -2210,7 +2210,7 @@ int smb2_session_logoff(struct ksmbd_wor ksmbd_conn_unlock(conn);
ksmbd_close_session_fds(work); - ksmbd_conn_wait_idle(conn, sess_id); + ksmbd_conn_wait_idle(conn);
/* * Re-lookup session to validate if session is deleted
6.10-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
commit 99d3bf5f7377d42f8be60a6b9cb60fb0be34dceb upstream.
syzbot is reporting too large allocation at input_mt_init_slots(), for num_slots is supplied from userspace using ioctl(UI_DEV_CREATE).
Since nobody knows possible max slots, this patch chose 1024.
Reported-by: syzbot syzbot+0122fa359a69694395d5@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=0122fa359a69694395d5 Suggested-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Cc: George Kennedy george.kennedy@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/input/input-mt.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/input/input-mt.c +++ b/drivers/input/input-mt.c @@ -46,6 +46,9 @@ int input_mt_init_slots(struct input_dev return 0; if (mt) return mt->num_slots != num_slots ? -EINVAL : 0; + /* Arbitrary limit for avoiding too large memory allocation. */ + if (num_slots > 1024) + return -EINVAL;
mt = kzalloc(struct_size(mt, slots, num_slots), GFP_KERNEL); if (!mt)
On 8/27/24 07:35, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com o-- Florian
Hello,
On Tue, 27 Aug 2024 16:35:24 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] b49f2df929ba ("Linux 6.10.7-rc1")
Thanks, SJ
[...]
---
ok 7 selftests: damon: damos_quota_goal.py ok 8 selftests: damon: damos_apply_interval.py ok 9 selftests: damon: reclaim.sh ok 10 selftests: damon: lru_sort.sh ok 11 selftests: damon: debugfs_empty_targets.sh ok 12 selftests: damon: debugfs_huge_count_read_write.sh ok 13 selftests: damon: debugfs_duplicate_context_creation.sh ok 14 selftests: damon: debugfs_rm_non_contexts.sh ok 15 selftests: damon: debugfs_target_ids_read_before_terminate_race.sh ok 16 selftests: damon: debugfs_target_ids_pid_leak.sh ok 17 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 18 selftests: damon: sysfs_update_schemes_tried_regions_hang.py ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m
Am 27.08.2024 um 16:35 schrieb Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Builds, boots and works on my 2-socket Ivy Bridge Xeon E5-2697 v2 server. No dmesg oddities or regressions found.
Tested-by: Peter Schneider pschneider1968@googlemail.com
Beste Grüße, Peter Schneider
On Tue, Aug 27, 2024 at 04:35:24PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Tested-by: Mark Brown broonie@kernel.org
On Tue, 27 Aug 2024 at 20:32, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.10.7-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git commit: aa78b3c4e7eeb94e23ce019c419141739613862e * git describe: v6.10.6-274-gaa78b3c4e7ee * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.10.y/build/v6.10....
## Test Regressions (compared to v6.10.5-26-ga522cad06418)
## Metric Regressions (compared to v6.10.5-26-ga522cad06418)
## Test Fixes (compared to v6.10.5-26-ga522cad06418)
## Metric Fixes (compared to v6.10.5-26-ga522cad06418)
## Test result summary total: 171554, pass: 150636, fail: 1805, skip: 18905, xfail: 208
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 129 total, 127 passed, 2 failed * arm64: 41 total, 41 passed, 0 failed * i386: 28 total, 26 passed, 2 failed * mips: 26 total, 25 passed, 1 failed * parisc: 4 total, 3 passed, 1 failed * powerpc: 36 total, 35 passed, 1 failed * riscv: 19 total, 19 passed, 0 failed * s390: 14 total, 13 passed, 1 failed * sh: 10 total, 10 passed, 0 failed * sparc: 7 total, 6 passed, 1 failed * x86_64: 33 total, 33 passed, 0 failed
## Test suites summary * boot * commands * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-kcmp * kselftest-kvm * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-mincore * kselftest-mqueue * kselftest-net * kselftest-net-mptcp * kselftest-openat2 * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-tc-testing * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user_events * kselftest-vDSO * kselftest-x86 * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-hugetlb * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-smoke * ltp-syscalls * ltp-tracing * perf * rcutorture
-- Linaro LKFT https://lkft.linaro.org
On Tue, 27 Aug 2024 16:35:24 +0200 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.10.7-rc1
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Input: MT - limit max slots
Namjae Jeon linkinjeon@kernel.org ksmbd: fix race condition between destroy_previous_session() and smb2 operations()
Yonghong Song yonghong.song@linux.dev selftests/bpf: Add a test to verify previous stacksafe() fix
Boyuan Zhang boyuan.zhang@amd.com drm/amdgpu/vcn: not pause dpg for unified queue
Boyuan Zhang boyuan.zhang@amd.com drm/amdgpu/vcn: identify unified queue in sw init
Christian Brauner brauner@kernel.org Revert "pidfd: prevent creation of pidfds for kthreads"
Matthew Brost matthew.brost@intel.com drm/xe: Do not dereference NULL job->fence in trace points
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: join: check re-using ID of closed subflow
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: mptcp: join: validate fullmesh endp on 1st sf
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: avoid possible UaF when selecting endp
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: fullmesh: select the right ID later
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only in-kernel cannot have entries with ID 0
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: check add_addr_accept_max before accepting new ADD_ADDR
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only decrement add_addr_accepted for MPJ req
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: only mark 'subflow' endp as available
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: remove mptcp_pm_remove_subflow()
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused flushed subflows
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused removed subflows
Matthieu Baerts (NGI0) matttbe@kernel.org mptcp: pm: re-using ID of unused removed ADD_ADDR
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org thermal: of: Fix OF node leak in of_thermal_zone_find() error paths
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org thermal: of: Fix OF node leak in thermal_of_zone_register()
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org thermal: of: Fix OF node leak in thermal_of_trips_init() error path
Dave Airlie airlied@redhat.com nouveau/firmware: use dma non-coherent allocator
Peng Fan peng.fan@nxp.com pmdomain: imx: wait SSAR when i.MX93 power domain on
Alexander Stein alexander.stein@ew.tq-group.com pmdomain: imx: scu-pd: Remove duplicated clocks
Steve French stfrench@microsoft.com smb3: fix broken cached reads when posix locks
Ben Whitten ben.whitten@gmail.com mmc: dw_mmc: allow biu and ciu clocks to defer
Mengqi Zhang mengqi.zhang@mediatek.com mmc: mtk-sd: receive cmd8 data when hs400 tuning fail
Waiman Long longman@redhat.com cgroup/cpuset: Clear effective_xcpus on cpus_allowed clearing only if cpus.exclusive not set
Chen Ridong chenridong@huawei.com cgroup/cpuset: fix panic caused by partcmd_update
Marc Zyngier maz@kernel.org KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3
Zenghui Yu yuzenghui@huawei.com KVM: arm64: vgic-debug: Don't put unmarked LPIs
Nikolay Kuratov kniv@yandex-team.ru cxgb4: add forgotten u64 ivlan cast before shift
Michael Ellerman mpe@ellerman.id.au ata: pata_macio: Fix DMA table overflow
Werner Sembach wse@tuxedocomputers.com Input: i8042 - use new forcenorestore quirk to replace old buggy quirk combination
Werner Sembach wse@tuxedocomputers.com Input: i8042 - add forcenorestore quirk to leave controller untouched even on s3
Nicolin Chen nicolinc@nvidia.com iommufd/device: Fix hwpt at err_unresv in iommufd_device_do_replace()
Jason Gerecke jason.gerecke@wacom.com HID: wacom: Defer calculation of resolution until resolution_code is known
Jiaxun Yang jiaxun.yang@flygoat.com MIPS: Loongson64: Set timer mode in cpu-probe
Martin Whitaker foss@martin-whitaker.me.uk net: dsa: microchip: fix PTP config failure when using multiple ports
Mengyuan Lou mengyuanlou@net-swift.com net: ngbe: Fix phy mode set to external phy
Harald Freudenberger freude@linux.ibm.com s390/ap: Refine AP bus bindings complete processing
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com platform/x86: ISST: Fix return value on last invalid resource
Hans de Goede hdegoede@redhat.com platform/x86: dell-uart-backlight: Use acpi_video_get_backlight_type()
Hans de Goede hdegoede@redhat.com ACPI: video: Add backlight=native quirk for Dell OptiPlex 7760 AIO
Hans de Goede hdegoede@redhat.com ACPI: video: Add Dell UART backlight controller detection
Alex Deucher alexander.deucher@amd.com drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1
Candice Li candice.li@amd.com drm/amdgpu: Validate TA binary size
Namjae Jeon linkinjeon@kernel.org ksmbd: the buffer of smb2 query dir response has at least 1 byte
Chaotian Jing chaotian.jing@mediatek.com scsi: core: Fix the return value of scsi_logical_block_count()
Griffin Kroah-Hartman griffin@kroah.com Bluetooth: MGMT: Add error handling to pair_device()
Ming Lei ming.lei@redhat.com nvme: move stopping keep-alive into nvme_uninit_ctrl()
Paulo Alcantara pc@manguebit.com smb: client: ignore unhandled reparse tags
Alexander Gordeev agordeev@linux.ibm.com s390/boot: Fix KASLR base offset off by __START_KERNEL bytes
Alexander Gordeev agordeev@linux.ibm.com s390/boot: Avoid possible physmem_info segment corruption
Yang Ruibin 11162571@vivo.com thermal/debugfs: Fix the NULL vs IS_ERR() confusion in debugfs_create_dir()
Matthew Brost matthew.brost@intel.com drm/xe: Free job before xe_exec_queue_put
Thomas Hellstr�m thomas.hellstrom@linux.intel.com drm/xe: Don't initialize fences at xe_sched_job_create()
Thomas Hellstr�m thomas.hellstrom@linux.intel.com drm/xe: Split lrc seqno fence creation up
Matthew Brost matthew.brost@intel.com drm/xe: Decouple job seqno and lrc seqno
Rodrigo Vivi rodrigo.vivi@intel.com drm/xe: Relax runtime pm protection during execution
Stuart Summers stuart.summers@intel.com drm/xe: Fix missing workqueue destroy in xe_gt_pagefault
Jens Axboe axboe@kernel.dk io_uring/kbuf: sanitize peek buffer setup
Dan Carpenter dan.carpenter@linaro.org mmc: mmc_test: Fix NULL dereference on allocation failure
Matthew Brost matthew.brost@intel.com drm/xe: Fix tile fini sequence
Matthew Auld matthew.auld@intel.com drm/xe: reset mmio mappings with devm
Matthew Auld matthew.auld@intel.com drm/xe/mmio: move mmio_fini over to devm
Lucas De Marchi lucas.demarchi@intel.com drm/xe: Fix opregion leak
Matthew Auld matthew.auld@intel.com drm/xe/display: stop calling domains_driver_remove twice
Suraj Kandpal suraj.kandpal@intel.com drm/i915/hdcp: Use correct cp_irq_count
Vignesh Raghavendra vigneshr@ti.com spi: spi-cadence-quadspi: Fix OSPI NOR failures during system resume
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm: fix the highest_bank_bit for sc7180
Tejun Heo tj@kernel.org workqueue: Fix spruious data race in __flush_work()
Will Deacon will@kernel.org workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask()
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: take plane rotation into account for wide planes
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: relax YUV requirements
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: limit QCM2290 to RGB formats only
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: cleanup FB if dpu_format_populate_layout fails
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dp: reset the link phy params before link training
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dpu: move dpu_encoder's connector assignment to atomic_enable()
Abhinav Kumar quic_abhinavk@quicinc.com drm/msm/dp: fix the max supported bpp logic
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: don't play tricks with debug macros
Alexandra Winter wintera@linux.ibm.com s390/iucv: Fix vargs handling in iucv_alloc_device()
Menglong Dong menglong8.dong@gmail.com net: ovs: fix ovs_drop_reasons error
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Fix dangling multicast addresses
Sean Anderson sean.anderson@linux.dev net: xilinx: axienet: Always disable promiscuous mode
Bharat Bhushan bbhushan2@marvell.com octeontx2-af: Fix CPT AF register offset calculation
Pablo Neira Ayuso pablo@netfilter.org netfilter: flowtable: validate vlan header
Somnath Kotur somnath.kotur@broadcom.com bnxt_en: Fix double DMA unmapping for XDP_REDIRECT
Eric Dumazet edumazet@google.com ipv6: prevent possible UAF in ip6_xmit()
Eric Dumazet edumazet@google.com ipv6: fix possible UAF in ip6_finish_output2()
Eric Dumazet edumazet@google.com ipv6: prevent UAF in ip6_send_skb()
Ido Schimmel idosch@nvidia.com selftests: mlxsw: ethtool_lanes: Source ethtool lib from correct path
Felix Fietkau nbd@nbd.name udp: fix receiving fraglist GSO packets
Stephen Hemminger stephen@networkplumber.org netem: fix return value if duplicate enqueue fails
Joseph Huang Joseph.Huang@garmin.com net: dsa: mv88e6xxx: Fix out-of-bound access
Paolo Abeni pabeni@redhat.com igb: cope with large MAX_SKB_FRAGS
Dan Carpenter dan.carpenter@linaro.org dpaa2-switch: Fix error checking in dpaa2_switch_seed_bp()
Michal Swiatkowski michal.swiatkowski@linux.intel.com ice: use internal pf id instead of function number
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix truesize operations for PAGE_SIZE >= 8192
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix ICE_LAST_OFFSET formula
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: fix page reuse when PAGE_SIZE is over 8k
Nikolay Aleksandrov razor@blackwall.org bonding: fix xfrm state handling when clearing active slave
Nikolay Aleksandrov razor@blackwall.org bonding: fix xfrm real_dev null pointer dereference
Nikolay Aleksandrov razor@blackwall.org bonding: fix null pointer deref in bond_ipsec_offload_ok
Nikolay Aleksandrov razor@blackwall.org bonding: fix bond_ipsec_offload_ok return type
Thomas Bogendoerfer tbogendoerfer@suse.de ip6_tunnel: Fix broken GRO
Sebastian Andrzej Siewior bigeasy@linutronix.de netfilter: nft_counter: Synchronize nft_counter_reset() against reader.
Sebastian Andrzej Siewior bigeasy@linutronix.de netfilter: nft_counter: Disable BH in nft_counter_offload_stats().
Kuniyuki Iwashima kuniyu@amazon.com kcm: Serialise kcm_sendmsg() for the same socket.
Jeremy Kerr jk@codeconstruct.com.au net: mctp: test: Use correct skb for route input check
Florian Westphal fw@strlen.de tcp: prevent concurrent execution of tcp_sk_exit_batch
Hangbin Liu liuhangbin@gmail.com selftests: udpgro: no need to load xdp for gro
Hangbin Liu liuhangbin@gmail.com selftests: udpgro: report error when receive failed
Simon Horman horms@kernel.org tc-testing: don't access non-existent variable on exception
Patrisious Haddad phaddad@nvidia.com net/mlx5: Fix IPsec RoCE MPV trace call
Carolina Jubran cjubran@nvidia.com net/mlx5e: XPS, Fix oversight of Multi-PF Netdev changes
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: serialize access to the injection/extraction groups
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: fix QoS class for injected packets with "ocelot-8021q"
Vladimir Oltean vladimir.oltean@nxp.com net: mscc: ocelot: use ocelot_xmit_get_vlan_info() also for FDMA and register injection
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: SMP: Fix assumption of Central always being Initiator
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_core: Fix LE quote calculation
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: HCI: Invert LE State quirk to be opt-out rather then opt-in
Masahiro Yamada masahiroy@kernel.org kbuild: avoid scripts/kallsyms parsing /dev/null
Masahiro Yamada masahiroy@kernel.org kbuild: merge temporary vmlinux for BTF and kallsyms
Alexandre Courbot gnurou@gmail.com Makefile: add $(srctree) to dependency of compile_commands.json target
Takashi Iwai tiwai@suse.de ALSA: hda/tas2781: Use correct endian conversion
Maximilian Luz luzmaximilian@gmail.com platform/surface: aggregator: Fix warning when controller is destroyed in probe
Baochen Qiang quic_bqiang@quicinc.com wifi: ath12k: use 128 bytes aligned iova in transmit path for WCN7850
Mikulas Patocka mpatocka@redhat.com dm suspend: return -ERESTARTSYS instead of -EINTR
Su Hui suhui@nfschina.com smb/client: avoid possible NULL dereference in cifs_free_subrequest()
David Howells dhowells@redhat.com cifs: Add a tracepoint to track credits involved in R/W requests
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Use governor_data to reduce overhead
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Add .manage() callback
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Split bang_bang_control()
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Drop unnecessary cooling device target state checks
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Don't register panel_power_savings on OLED panels
Li Lingfeng lilingfeng3@huawei.com block: Fix lockdep warning in blk_mq_mark_tag_wait
Samuel Holland samuel.holland@sifive.com arm64: Fix KASAN random tag seed initialization
Ryo Takakura takakura@valinux.co.jp printk/panic: Allow cpu backtraces to be written into ringbuffer during panic
Nysal Jan K.A nysal@linux.ibm.com powerpc/topology: Check if a core is online
Nysal Jan K.A nysal@linux.ibm.com cpu/SMT: Enable SMT only if a core is online
Olivier Langlois olivier@trillion01.com io_uring/napi: check napi_enabled in io_napi_add() before proceeding
Pavel Begunkov asml.silence@gmail.com io_uring/napi: use ktime in busy polling
Thorsten Blum thorsten.blum@toblux.com io_uring/napi: Remove unnecessary s64 cast
Eric Farman farman@linux.ibm.com s390/dasd: Remove DMA alignment
Masahiro Yamada masahiroy@kernel.org rust: fix the default format for CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Masahiro Yamada masahiroy@kernel.org rust: suppress error messages from CONFIG_{RUSTC,BINDGEN}_VERSION_TEXT
Miguel Ojeda ojeda@kernel.org rust: work around `bindgen` 0.69.0 issue
Ma�ra Canal mcanal@igalia.com drm/v3d: Fix out-of-bounds read in `v3d_csd_job_run()`
Parsa Poorshikhian parsa.poorsh@gmail.com ALSA: hda/realtek: Fix noise from speakers on Lenovo IdeaPad 3 15IAU7
Asmaa Mnebhi asmaa@nvidia.com gpio: mlxbf3: Support shutdown() function
Barak Biber bbiber@nvidia.com iommu: Restore lost return in iommu_report_device_fault()
Song Liu song@kernel.org kallsyms: Match symbols exactly with CONFIG_LTO_CLANG
Song Liu song@kernel.org kallsyms: Do not cleanup .llvm.<hash> suffix before sorting symbols
Jann Horn jannh@google.com kallsyms: get rid of code for absolute kallsyms
Masahiro Yamada masahiroy@kernel.org kbuild: remove PROVIDE() for kallsyms symbols
Masahiro Yamada masahiroy@kernel.org kbuild: refactor variables in scripts/link-vmlinux.sh
Jie Wang wangjie125@huawei.com net: hns3: fix a deadlock problem when config TC during resetting
Peiyang Wang wangpeiyang1@huawei.com net: hns3: use the user's cfg after reset
Jie Wang wangjie125@huawei.com net: hns3: fix wrong use of semaphore up
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: net: lib: kill PIDs before del netns
Matthieu Baerts (NGI0) matttbe@kernel.org selftests: net: lib: ignore possible errors
Cong Wang cong.wang@bytedance.com vsock: fix recursive ->recvmsg calls
Abhinav Jain jain.abhinav177@gmail.com selftest: af_unix: Fix kselftest compilation warnings
Phil Sutter phil@nwl.cc netfilter: nf_tables: Add locking for NFT_MSG_GETOBJ_RESET requests
Phil Sutter phil@nwl.cc netfilter: nf_tables: Introduce nf_tables_getobj_single
Phil Sutter phil@nwl.cc netfilter: nf_tables: Audit log dump reset after the fact
Florian Westphal fw@strlen.de netfilter: nf_queue: drop packets with cloned unconfirmed conntracks
Donald Hunter donald.hunter@gmail.com netfilter: flowtable: initialise extack before use
Donald Hunter donald.hunter@gmail.com netfilter: nfnetlink: Initialise extack before use in ACKs
Tom Hughes tom@compton.nu netfilter: allow ipv6 fragments to arrive on different devices
Subash Abhinov Kasiviswanathan quic_subashab@quicinc.com tcp: Update window clamping condition
Eugene Syromiatnikov esyr@redhat.com mptcp: correct MPTCP_SUBFLOW_ATTR_SSN_OFFSET reserved size
David Thompson davthompson@nvidia.com mlxbf_gige: disable RX filters until RX path initialized
Zheng Zhang everything411@qq.com net: ethernet: mtk_wed: fix use-after-free panic in mtk_wed_setup_tc_block_cb()
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: check busy flag in MDIO operations
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: pass value in phy_write operation
Pawel Dembicki paweldembicki@gmail.com net: dsa: vsc73xx: fix port MAC configuration in full duplex mode
Radhey Shyam Pandey radhey.shyam.pandey@amd.com net: axienet: Fix register defines comment description
Dan Carpenter dan.carpenter@linaro.org atm: idt77252: prevent use after free in dequeue_rx()
Cosmin Ratiu cratiu@nvidia.com net/mlx5e: Correctly report errors for ethtool rx flows
Dragos Tatulea dtatulea@nvidia.com net/mlx5e: Take state lock during tx timeout reporter
Tariq Toukan tariqt@nvidia.com net/mlx5: SD, Do not query MPIR register if no sd_group
Eric Dumazet edumazet@google.com gtp: pull network headers in gtp_dev_xmit()
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix qbv tx latency by setting gtxoffset
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix reset adapter logics when tx mode change
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix qbv_config_change_errors logics
Faizal Rahim faizal.abdul.rahim@linux.intel.com igc: Fix packet still tx after gate close by reducing i226 MAC retry buffer
Naohiro Aota naohiro.aota@wdc.com btrfs: fix invalid mapping of extent xarray state
Dominique Martinet asmadeus@codewreck.org 9p: Fix DIO read through netfs
Yonghong Song yonghong.song@linux.dev bpf: Fix a kernel verifier crash in stacksafe()
Leon Hwang leon.hwang@linux.dev bpf: Fix updating attached freplace prog in prog_array map
yangerkun yangerkun@huawei.com libfs: fix infinite directory reads for offset dir
Omar Sandoval osandov@fb.com filelock: fix name of file_lease slab cache
Matthew Wilcox (Oracle) willy@infradead.org netfs: Fault in smaller chunks for non-large folio mappings
Claudio Imbrenda imbrenda@linux.ibm.com s390/uv: Panic for set and remove shared access UVC errors
Christian Brauner brauner@kernel.org pidfd: prevent creation of pidfds for kthreads
David (Ming Qiang) Wu David.Wu3@amd.com drm/amd/amdgpu: command submission parser for JPEG
Alex Deucher alexander.deucher@amd.com drm/amdgpu/jpeg4: properly set atomics vmid field
Alex Deucher alexander.deucher@amd.com drm/amdgpu/jpeg2: properly set atomics vmid field
Melissa Wen mwen@igalia.com drm/amd/display: fix cursor offset on rotation 180
Loan Chen lo-an.chen@amd.com drm/amd/display: Enable otg synchronization logic for DCN321
Hamza Mahfooz hamza.mahfooz@amd.com drm/amd/display: fix s2idle entry for DCN3.5+
Rodrigo Siqueira Rodrigo.Siqueira@amd.com drm/amd/display: Adjust cursor position
Al Viro viro@zeniv.linux.org.uk memcg_write_event_control(): fix a user-triggerable oops
Bas Nieuwenhuizen bas@basnieuwenhuizen.nl drm/amdgpu: Actually check flags for all context ops.
Qu Wenruo wqu@suse.com btrfs: only enable extent map shrinker for DEBUG builds
Qu Wenruo wqu@suse.com btrfs: tree-checker: add dev extent item checks
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: properly take lock to read/update block group's zoned variables
Filipe Manana fdmanana@suse.com btrfs: only run the extent map shrinker from kswapd tasks
Josef Bacik josef@toxicpanda.com btrfs: check delayed refs when we're checking if a ref exists
Filipe Manana fdmanana@suse.com btrfs: send: allow cloning non-aligned extent if it ends at i_size
Qu Wenruo wqu@suse.com btrfs: tree-checker: reject BTRFS_FT_UNKNOWN dir type
Zi Yan ziy@nvidia.com mm/numa: no task_numa_fault() call if PTE is changed
Hailong Liu hailong.liu@oppo.com mm/vmalloc: fix page mapping if vm_area_alloc_pages() with high order fallback to order 0
Zi Yan ziy@nvidia.com mm/numa: no task_numa_fault() call if PMD is changed
Suren Baghdasaryan surenb@google.com alloc_tag: introduce clear_page_tag_ref() helper function
Muhammad Usama Anjum usama.anjum@collabora.com selftests: memfd_secret: don't build memfd_secret test on unsupported arches
Waiman Long longman@redhat.com mm/memory-failure: use raw_spinlock_t in struct memory_failure_cpu
Suren Baghdasaryan surenb@google.com alloc_tag: mark pages reserved during CMA activation as not tagged
Zhen Lei thunder.leizhen@huawei.com selinux: add the processing of the failure of avc_add_xperms_decision()
Zhen Lei thunder.leizhen@huawei.com selinux: fix potential counting error in avc_add_xperms_decision()
Max Kellermann max.kellermann@ionos.com fs/netfs/fscache_cookie: add missing "n_accesses" check
Janne Grunau j@jannau.net wifi: brcmfmac: cfg80211: Handle SSID based pmksa deletion
Long Li longli@microsoft.com net: mana: Fix doorbell out of order violation and avoid unnecessary doorbell rings
Hans de Goede hdegoede@redhat.com media: atomisp: Fix streaming no longer working on BYT / ISP2400 devices
Haiyang Zhang haiyangz@microsoft.com net: mana: Fix RX buf alloc_size alignment and atomic op panic
Yu Kuai yukuai3@huawei.com md/raid1: Fix data corruption for degraded array with slow disk
David Hildenbrand david@redhat.com mm/hugetlb: fix hugetlb vs. core-mm PT locking
Kirill A. Shutemov kirill.shutemov@linux.intel.com mm: fix endless reclaim on machines with unaccepted memory
Dan Carpenter dan.carpenter@linaro.org rtla/osnoise: Prevent NULL dereference in error handling
Pedro Falcato pedro.falcato@gmail.com mseal: fix is_madv_discard()
Kyle Huey me@kylehuey.com perf/bpf: Don't call bpf_overflow_handler() for tracing events
Steven Rostedt rostedt@goodmis.org tracing: Return from tracing_buffers_read() if the file has been closed
Andi Shyti andi.shyti@kernel.org i2c: qcom-geni: Add missing geni_icc_disable in geni_i2c_runtime_resume
Al Viro viro@zeniv.linux.org.uk fix bitmap corruption on close_range() with CLOSE_RANGE_UNSHARE
Zhihao Cheng chengzhihao1@huawei.com vfs: Don't evict inode under the inode lru traversing context
Mikulas Patocka mpatocka@redhat.com dm persistent data: fix memory allocation failure
Khazhismel Kumykov khazhy@google.com dm resume: don't return EINVAL when signalled
Haibo Xu haibo1.xu@intel.com arm64: ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPI: EC: Evaluate _REG outside the EC scope more carefully
Rafael J. Wysocki rafael.j.wysocki@intel.com ACPICA: Add a depth argument to acpi_execute_reg_methods()
Breno Leitao leitao@debian.org i2c: tegra: Do not mark ACPI devices as irq safe
Steve French stfrench@microsoft.com smb3: fix lock breakage for cached writes
Celeste Liu coelacanthushex@gmail.com riscv: entry: always initialize regs->a0 to -ENOSYS
Nam Cao namcao@linutronix.de riscv: change XIP's kernel_map.size to be size of the entire kernel
David Gstir david@sigma-star.at KEYS: trusted: dcp: fix leak of blob encryption key
David Gstir david@sigma-star.at KEYS: trusted: fix DCP blob payload length assignment
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: gov_bang_bang: Call __thermal_cdev_update() directly
Michael Mueller mimu@linux.ibm.com KVM: s390: fix validity interception issue when gisa is switched off
Stefan Haberland sth@linux.ibm.com s390/dasd: fix error recovery leading to data corruption on ESE devices
Takashi Iwai tiwai@suse.de ALSA: timer: Relax start tick time check for slave timer elements
Baojun Xu baojun.xu@ti.com ALSA: hda/tas2781: fix wrong calibrated data order
Mika Westerberg mika.westerberg@linux.intel.com thunderbolt: Mark XDomain as unplugged when router is removed
Mathias Nyman mathias.nyman@linux.intel.com xhci: Fix Panther point NULL pointer deref at full-speed re-enumeration
Marc Zyngier maz@kernel.org usb: xhci: Check for xhci->interrupters being allocated in xhci_mem_clearup()
Hans de Goede hdegoede@redhat.com usb: misc: ljca: Add Lunar Lake ljca GPIO HID to ljca_gpio_hids[]
Juan Jos� Arboleda soyjuanarbol@gmail.com ALSA: usb-audio: Support Yamaha P-125 quirk entry
Lianqin Hu hulianqin@vivo.com ALSA: usb-audio: Add delay quirk for VIVO USB-C-XE710 HEADSET
Eli Billauer eli.billauer@gmail.com char: xillybus: Check USB endpoints when probing device
Eli Billauer eli.billauer@gmail.com char: xillybus: Refine workqueue handling
Eli Billauer eli.billauer@gmail.com char: xillybus: Don't destroy workqueue from work item running on it
Jann Horn jannh@google.com fuse: Initialize beyond-EOF page contents before setting uptodate
David Howells dhowells@redhat.com netfs, ceph: Revert "netfs: Remove deprecated use of PG_private_2 as a second writeback flag"
Paul Moore paul@paul-moore.com selinux: revert our use of vma_is_initial_heap()
Xu Yang xu.yang_2@nxp.com Revert "usb: typec: tcpm: clear pd_event queue in PORT_RESET"
Griffin Kroah-Hartman griffin@kroah.com Revert "serial: 8250_omap: Set the console genpd always on if no console suspend"
Griffin Kroah-Hartman griffin@kroah.com Revert "misc: fastrpc: Restrict untrusted app to attach to privileged PD"
Rafael J. Wysocki rafael.j.wysocki@intel.com Revert "ACPI: EC: Evaluate orphan _REG under EC device"
Mathieu Othacehe m.othacehe@gmail.com tty: atmel_serial: use the correct RTS flag.
Peng Fan peng.fan@nxp.com tty: serial: fsl_lpuart: mark last busy before uart_add_one_port
Masahiro Yamada masahiroy@kernel.org tty: vt: conmakehash: remove non-portable code printing comment header
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 3 +- Makefile | 6 +- arch/arm64/kernel/acpi_numa.c | 2 +- arch/arm64/kernel/setup.c | 3 - arch/arm64/kernel/smp.c | 2 + arch/arm64/kvm/sys_regs.c | 6 + arch/arm64/kvm/vgic/vgic-debug.c | 2 +- arch/arm64/kvm/vgic/vgic.h | 7 + arch/mips/kernel/cpu-probe.c | 4 + arch/powerpc/include/asm/topology.h | 13 ++ arch/riscv/kernel/traps.c | 4 +- arch/riscv/mm/init.c | 4 +- arch/s390/boot/startup.c | 55 ++++--- arch/s390/boot/vmem.c | 14 +- arch/s390/boot/vmlinux.lds.S | 7 +- arch/s390/include/asm/page.h | 3 +- arch/s390/include/asm/uv.h | 5 +- arch/s390/kernel/vmlinux.lds.S | 2 +- arch/s390/kvm/kvm-s390.h | 7 +- arch/s390/tools/relocs.c | 2 +- block/blk-mq-tag.c | 5 +- drivers/acpi/acpica/acevents.h | 6 +- drivers/acpi/acpica/evregion.c | 12 +- drivers/acpi/acpica/evxfregn.c | 64 +------- drivers/acpi/ec.c | 14 +- drivers/acpi/internal.h | 1 + drivers/acpi/scan.c | 2 + drivers/acpi/video_detect.c | 22 +++ drivers/ata/pata_macio.c | 23 ++- drivers/atm/idt77252.c | 9 +- drivers/bluetooth/btintel.c | 10 -- drivers/bluetooth/btintel_pcie.c | 3 - drivers/bluetooth/btmtksdio.c | 3 - drivers/bluetooth/btrtl.c | 1 - drivers/bluetooth/btusb.c | 4 +- drivers/bluetooth/hci_qca.c | 4 +- drivers/bluetooth/hci_vhci.c | 2 - drivers/char/xillybus/xillyusb.c | 42 ++++- drivers/gpio/gpio-mlxbf3.c | 14 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 8 + drivers/gpu/drm/amd/amdgpu/amdgpu_psp_ta.c | 3 + drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 53 +++--- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 1 + drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 4 +- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 63 ++++++- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.h | 7 +- drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 1 + drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 18 +- drivers/gpu/drm/amd/amdgpu/soc15d.h | 6 + drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 32 +++- .../drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c | 4 +- .../display/dc/resource/dcn321/dcn321_resource.c | 3 + drivers/gpu/drm/i915/display/intel_dp_hdcp.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c | 4 +- drivers/gpu/drm/msm/disp/dpu1/dpu_kms.h | 14 +- drivers/gpu/drm/msm/disp/dpu1/dpu_plane.c | 20 ++- drivers/gpu/drm/msm/dp/dp_ctrl.c | 2 + drivers/gpu/drm/msm/dp/dp_panel.c | 19 ++- drivers/gpu/drm/msm/msm_mdss.c | 2 +- drivers/gpu/drm/nouveau/nvkm/core/firmware.c | 9 +- drivers/gpu/drm/nouveau/nvkm/falcon/fw.c | 6 + drivers/gpu/drm/v3d/v3d_sched.c | 14 +- drivers/gpu/drm/xe/display/xe_display.c | 6 +- drivers/gpu/drm/xe/xe_device.c | 4 +- drivers/gpu/drm/xe/xe_exec_queue.c | 19 --- drivers/gpu/drm/xe/xe_exec_queue_types.h | 10 -- drivers/gpu/drm/xe/xe_gt_pagefault.c | 18 +- drivers/gpu/drm/xe/xe_guc_submit.c | 8 +- drivers/gpu/drm/xe/xe_hw_fence.c | 59 +++++-- drivers/gpu/drm/xe/xe_hw_fence.h | 7 +- drivers/gpu/drm/xe/xe_lrc.c | 48 +++++- drivers/gpu/drm/xe/xe_lrc.h | 3 + drivers/gpu/drm/xe/xe_mmio.c | 41 ++++- drivers/gpu/drm/xe/xe_mmio.h | 2 +- drivers/gpu/drm/xe/xe_ring_ops.c | 22 +-- drivers/gpu/drm/xe/xe_sched_job.c | 182 ++++++++++++--------- drivers/gpu/drm/xe/xe_sched_job.h | 7 +- drivers/gpu/drm/xe/xe_sched_job_types.h | 20 ++- drivers/gpu/drm/xe/xe_trace.h | 11 +- drivers/hid/wacom_wac.c | 4 +- drivers/i2c/busses/i2c-qcom-geni.c | 4 +- drivers/i2c/busses/i2c-tegra.c | 4 +- drivers/input/input-mt.c | 3 + drivers/input/serio/i8042-acpipnpio.h | 20 +-- drivers/input/serio/i8042.c | 10 +- drivers/iommu/io-pgfault.c | 1 + drivers/iommu/iommufd/device.c | 2 +- drivers/md/dm-ioctl.c | 22 ++- drivers/md/dm.c | 4 +- drivers/md/persistent-data/dm-space-map-metadata.c | 4 +- drivers/md/raid1.c | 14 +- drivers/misc/fastrpc.c | 22 +-- drivers/mmc/core/mmc_test.c | 9 +- drivers/mmc/host/dw_mmc.c | 8 + drivers/mmc/host/mtk-sd.c | 8 +- drivers/net/bonding/bond_main.c | 21 +-- drivers/net/bonding/bond_options.c | 2 +- drivers/net/dsa/microchip/ksz_ptp.c | 5 +- drivers/net/dsa/mv88e6xxx/global1_atu.c | 3 +- drivers/net/dsa/ocelot/felix.c | 11 ++ drivers/net/dsa/vitesse-vsc73xx-core.c | 45 ++++- drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c | 5 - drivers/net/ethernet/chelsio/cxgb4/cxgb4_filter.c | 3 +- .../net/ethernet/freescale/dpaa2/dpaa2-switch.c | 7 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 3 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 28 +++- .../ethernet/hisilicon/hns3/hns3pf/hclge_mdio.c | 3 + .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 4 +- .../net/ethernet/intel/ice/devlink/devlink_port.c | 4 +- drivers/net/ethernet/intel/ice/ice_base.c | 21 ++- drivers/net/ethernet/intel/ice/ice_txrx.c | 47 +----- drivers/net/ethernet/intel/igb/igb_main.c | 1 + drivers/net/ethernet/intel/igc/igc_defines.h | 6 + drivers/net/ethernet/intel/igc/igc_main.c | 8 +- drivers/net/ethernet/intel/igc/igc_tsn.c | 76 +++++++-- drivers/net/ethernet/intel/igc/igc_tsn.h | 1 + .../net/ethernet/marvell/octeontx2/af/rvu_cpt.c | 23 ++- drivers/net/ethernet/mediatek/mtk_wed.c | 6 +- .../ethernet/mellanox/mlx5/core/en/reporter_tx.c | 2 + .../ethernet/mellanox/mlx5/core/en_fs_ethtool.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +- .../mellanox/mlx5/core/lib/ipsec_fs_roce.c | 6 +- drivers/net/ethernet/mellanox/mlx5/core/lib/sd.c | 18 +- .../net/ethernet/mellanox/mlxbf_gige/mlxbf_gige.h | 8 + .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 10 ++ .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_regs.h | 2 + .../ethernet/mellanox/mlxbf_gige/mlxbf_gige_rx.c | 50 +++++- drivers/net/ethernet/microsoft/mana/mana_en.c | 28 +++- drivers/net/ethernet/mscc/ocelot.c | 91 ++++++++++- drivers/net/ethernet/mscc/ocelot_fdma.c | 3 +- drivers/net/ethernet/mscc/ocelot_vsc7514.c | 4 + drivers/net/ethernet/wangxun/ngbe/ngbe_mdio.c | 8 +- drivers/net/ethernet/xilinx/xilinx_axienet.h | 17 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 25 +-- drivers/net/gtp.c | 3 + drivers/net/wireless/ath/ath12k/dp_tx.c | 72 ++++++++ drivers/net/wireless/ath/ath12k/hw.c | 6 + drivers/net/wireless/ath/ath12k/hw.h | 4 + drivers/net/wireless/ath/ath12k/mac.c | 1 + .../broadcom/brcm80211/brcmfmac/cfg80211.c | 13 +- drivers/nvme/host/core.c | 2 +- drivers/platform/surface/aggregator/controller.c | 3 +- drivers/platform/x86/dell/Kconfig | 1 + drivers/platform/x86/dell/dell-uart-backlight.c | 8 + .../x86/intel/speed_select_if/isst_tpmi_core.c | 3 +- drivers/pmdomain/imx/imx93-pd.c | 5 +- drivers/pmdomain/imx/scu-pd.c | 5 - drivers/s390/block/dasd.c | 36 ++-- drivers/s390/block/dasd_3990_erp.c | 10 +- drivers/s390/block/dasd_eckd.c | 57 +++---- drivers/s390/block/dasd_genhd.c | 1 - drivers/s390/block/dasd_int.h | 2 +- drivers/s390/crypto/ap_bus.c | 7 +- drivers/spi/spi-cadence-quadspi.c | 14 +- .../media/atomisp/pci/ia_css_stream_public.h | 8 +- .../staging/media/atomisp/pci/sh_css_internal.h | 19 ++- drivers/thermal/gov_bang_bang.c | 91 ++++++++--- drivers/thermal/thermal_core.c | 3 +- drivers/thermal/thermal_debugfs.c | 6 +- drivers/thermal/thermal_of.c | 23 ++- drivers/thunderbolt/switch.c | 1 + drivers/tty/serial/8250/8250_omap.c | 33 +--- drivers/tty/serial/atmel_serial.c | 2 +- drivers/tty/serial/fsl_lpuart.c | 1 + drivers/tty/vt/conmakehash.c | 12 +- drivers/usb/host/xhci-mem.c | 2 +- drivers/usb/host/xhci.c | 8 +- drivers/usb/misc/usb-ljca.c | 1 + drivers/usb/typec/tcpm/tcpm.c | 1 - fs/9p/vfs_addr.c | 3 +- fs/afs/file.c | 3 +- fs/btrfs/delayed-ref.c | 67 ++++++++ fs/btrfs/delayed-ref.h | 2 + fs/btrfs/extent-tree.c | 51 +++++- fs/btrfs/extent_io.c | 14 +- fs/btrfs/extent_map.c | 22 +-- fs/btrfs/free-space-cache.c | 14 +- fs/btrfs/send.c | 52 ++++-- fs/btrfs/super.c | 18 +- fs/btrfs/tree-checker.c | 74 ++++++++- fs/ceph/addr.c | 25 ++- fs/file.c | 28 ++-- fs/fuse/dev.c | 6 +- fs/inode.c | 39 ++++- fs/libfs.c | 35 ++-- fs/locks.c | 2 +- fs/netfs/buffered_read.c | 8 +- fs/netfs/buffered_write.c | 2 +- fs/netfs/fscache_cookie.c | 4 + fs/netfs/io.c | 161 +++++++++++++++++- fs/nfs/fscache.c | 3 +- fs/smb/client/cifsglob.h | 17 +- fs/smb/client/file.c | 58 +++++-- fs/smb/client/reparse.c | 11 +- fs/smb/client/smb1ops.c | 2 +- fs/smb/client/smb2ops.c | 42 ++++- fs/smb/client/smb2pdu.c | 40 ++++- fs/smb/client/trace.h | 55 ++++++- fs/smb/client/transport.c | 8 +- fs/smb/server/connection.c | 34 +++- fs/smb/server/connection.h | 3 +- fs/smb/server/mgmt/user_session.c | 8 + fs/smb/server/smb2pdu.c | 5 +- include/acpi/acpixf.h | 5 +- include/acpi/video.h | 1 + include/asm-generic/vmlinux.lds.h | 19 --- include/linux/bitmap.h | 12 ++ include/linux/bpf_verifier.h | 4 +- include/linux/dsa/ocelot.h | 47 ++++++ include/linux/fs.h | 5 + include/linux/hugetlb.h | 33 +++- include/linux/io_uring_types.h | 2 +- include/linux/mm.h | 11 ++ include/linux/panic.h | 1 + include/linux/pgalloc_tag.h | 13 ++ include/linux/thermal.h | 1 + include/net/af_vsock.h | 4 + include/net/bluetooth/hci.h | 17 +- include/net/bluetooth/hci_core.h | 2 +- include/net/kcm.h | 1 + include/net/mana/mana.h | 1 + include/scsi/scsi_cmnd.h | 2 +- include/soc/mscc/ocelot.h | 12 +- include/trace/events/netfs.h | 1 + include/uapi/misc/fastrpc.h | 3 - init/Kconfig | 25 +-- io_uring/io_uring.h | 2 +- io_uring/kbuf.c | 9 +- io_uring/napi.c | 50 +++--- io_uring/napi.h | 2 +- kernel/bpf/verifier.c | 5 +- kernel/cgroup/cpuset.c | 5 +- kernel/cpu.c | 12 +- kernel/events/core.c | 3 +- kernel/kallsyms.c | 60 +------ kernel/kallsyms_internal.h | 6 - kernel/kallsyms_selftest.c | 22 +-- kernel/panic.c | 8 +- kernel/printk/printk.c | 2 +- kernel/trace/trace.c | 2 +- kernel/vmcore_info.c | 4 - kernel/workqueue.c | 47 +++--- mm/huge_memory.c | 29 ++-- mm/memcontrol.c | 7 +- mm/memory-failure.c | 20 ++- mm/memory.c | 33 ++-- mm/mm_init.c | 12 +- mm/mseal.c | 14 +- mm/page_alloc.c | 51 +++--- mm/vmalloc.c | 11 +- net/bluetooth/hci_core.c | 19 +-- net/bluetooth/hci_event.c | 2 +- net/bluetooth/mgmt.c | 4 + net/bluetooth/smp.c | 146 ++++++++--------- net/bridge/br_netfilter_hooks.c | 6 +- net/dsa/tag_ocelot.c | 37 +---- net/ipv4/tcp_input.c | 28 ++-- net/ipv4/tcp_ipv4.c | 14 ++ net/ipv4/udp_offload.c | 3 +- net/ipv6/ip6_output.c | 10 ++ net/ipv6/ip6_tunnel.c | 12 +- net/ipv6/netfilter/nf_conntrack_reasm.c | 4 + net/iucv/iucv.c | 4 +- net/kcm/kcmsock.c | 4 + net/mctp/test/route-test.c | 2 +- net/mptcp/diag.c | 2 +- net/mptcp/pm.c | 13 -- net/mptcp/pm_netlink.c | 142 ++++++++++------ net/mptcp/protocol.h | 3 - net/netfilter/nf_flow_table_inet.c | 3 + net/netfilter/nf_flow_table_ip.c | 3 + net/netfilter/nf_flow_table_offload.c | 2 +- net/netfilter/nf_tables_api.c | 163 ++++++++++++------ net/netfilter/nfnetlink.c | 5 +- net/netfilter/nfnetlink_queue.c | 35 +++- net/netfilter/nft_counter.c | 9 +- net/openvswitch/datapath.c | 2 +- net/sched/sch_netem.c | 47 ++++-- net/vmw_vsock/af_vsock.c | 50 +++--- net/vmw_vsock/vsock_bpf.c | 4 +- scripts/kallsyms.c | 109 ++++-------- scripts/link-vmlinux.sh | 106 ++++++------ scripts/rust_is_available.sh | 6 +- security/keys/trusted-keys/trusted_dcp.c | 35 ++-- security/selinux/avc.c | 8 +- security/selinux/hooks.c | 12 +- sound/core/timer.c | 2 +- sound/pci/hda/patch_realtek.c | 1 - sound/pci/hda/tas2781_hda_i2c.c | 14 +- sound/usb/quirks-table.h | 1 + sound/usb/quirks.c | 2 + tools/perf/tests/vmlinux-kallsyms.c | 1 - tools/testing/selftests/bpf/progs/iters.c | 54 ++++++ tools/testing/selftests/core/close_range_test.c | 35 ++++ .../selftests/drivers/net/mlxsw/ethtool_lanes.sh | 3 +- tools/testing/selftests/mm/Makefile | 2 + tools/testing/selftests/mm/run_vmtests.sh | 3 + tools/testing/selftests/net/af_unix/msg_oob.c | 2 +- tools/testing/selftests/net/lib.sh | 11 +- tools/testing/selftests/net/mptcp/mptcp_join.sh | 28 +++- tools/testing/selftests/net/udpgro.sh | 53 +++--- tools/testing/selftests/tc-testing/tdc.py | 1 - tools/tracing/rtla/src/osnoise_top.c | 11 +- 305 files changed, 3476 insertions(+), 1744 deletions(-)
Boot-tested under QEMU for Rust x86_64, arm64 and riscv64; built-tested for loongarch64:
Tested-by: Miguel Ojeda ojeda@kernel.org
Thanks!
Cheers, Miguel
On Tue, Aug 27, 2024 at 04:35:24PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
On 8/27/24 7:35 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Tue, 27 Aug 2024 16:35:24 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.10: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 116 tests: 116 pass, 0 fail
Linux version: 6.10.7-rc1-gaa78b3c4e7ee Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
---- On Tue, 27 Aug 2024 20:05:24 +0530 Greg Kroah-Hartman wrote ---
This is the start of the stable review cycle for the 6.10.7 release. There are 273 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 29 Aug 2024 14:37:36 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.10.7-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y and the diffstat can be found below.
Hi,
Please find the KernelCI report below :-
OVERVIEW
Builds: 30 passed, 0 failed
Boot tests: 334 passed, 0 failed
CI systems: broonie, maestro
REVISION
Commit name: v6.10.6-273-gb49f2df929ba hash: b49f2df929ba1bbf4ef3b81d88f8118cfbd8b936 Checked out from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.10.y
BUILDS
No new build failures found
BOOT TESTS
No new boot failures found
See complete and up-to-date report at:
https://kcidb.kernelci.org/d/revision/revision?orgId=1&var-git_commit_ha...
Tested-by: kernelci.org bot bot@kernelci.org
Thanks, KernelCI team
linux-stable-mirror@lists.linaro.org