This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.6.15-rc1
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: trip: Drop lockdep assertion from thermal_zone_trip_id()
Randy Dunlap rdunlap@infradead.org serial: core: fix kernel-doc for uart_port_unlock_irqrestore()
Richard Palethorpe rpalethorpe@suse.com x86/entry/ia32: Ensure s32 is sign extended to s64
Tim Chen tim.c.chen@linux.intel.com tick/sched: Preserve number of idle sleeps across CPU hotplug events
Jiri Wiesner jwiesner@suse.de clocksource: Skip watchdog check for large watchdog intervals
Dawei Li dawei.li@shingroup.cn genirq: Initialize resend_node hlist for all interrupt descriptors
Xi Ruoyao xry111@xry111.site mips: Call lose_fpu(0) before initializing fcr31 in mips_set_personality_nan
Quanquan Cao caoqq@fujitsu.com cxl/region:Fix overflow issue in alloc_hpa()
Michael Walle mwalle@kernel.org drm: bridge: samsung-dsim: Don't use FORCE_STOP_STATE
Aleksander Jan Bajkowski olek2@wp.pl MIPS: lantiq: register smp_ops on non-smp platforms
David Lechner dlechner@baylibre.com spi: fix finalize message on error return
Shyam Prasad N sprasad@microsoft.com cifs: fix stray unlock in cifs_chan_skip_or_disable
Amit Kumar Mahapatra amit.kumar-mahapatra@amd.com spi: spi-cadence: Reverse the order of interleaved write and read operations
Kamal Dasu kamal.dasu@broadcom.com spi: bcm-qspi: fix SFDP BFPT read by usig mspi read
Mario Limonciello mario.limonciello@amd.com cpufreq/amd-pstate: Fix setting scaling max/min freq values
Hsin-Yi Wang hsinyi@chromium.org drm/bridge: anx7625: Ensure bridge is suspended in disable()
Li Lingfeng lilingfeng3@huawei.com block: Move checking GENHD_FL_NO_PART to bdev_add_partition()
Mika Westerberg mika.westerberg@linux.intel.com spi: intel-pci: Remove Meteor Lake-S SoC PCI ID from the list
Artur Weber aweber.kernel@gmail.com ARM: dts: exynos4212-tab3: add samsung,invert-vclk flag to fimd
Wenhua Lin Wenhua.Lin@unisoc.com gpio: eic-sprd: Clear interrupt after set the interrupt type
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Use xa_insert() when saving raw queues
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Use xa_insert() to store opps
Fedor Pchelkin pchelkin@ispras.ru drm/exynos: gsc: minor fix for loop iteration in gsc_runtime_resume
Arnd Bergmann arnd@arndb.de drm/exynos: fix accidental on-stack copy of exynos_drm_plane
Yajun Deng yajun.deng@linux.dev memblock: fix crash when reserved memory is not added to memory
Douglas Anderson dianders@chromium.org drm/bridge: parade-ps8640: Make sure we drop the AUX mutex in the error case
Pin-yen Lin treapking@chromium.org drm/bridge: parade-ps8640: Ensure bridge is suspended in .post_disable()
Tomi Valkeinen tomi.valkeinen@ideasonboard.com drm/bridge: sii902x: Fix audio codec unregistration
Tomi Valkeinen tomi.valkeinen@ideasonboard.com drm/bridge: sii902x: Fix probing race issue
Artur Weber aweber.kernel@gmail.com drm/panel: samsung-s6d7aa0: drop DRM_BUS_FLAG_DE_HIGH for lsl080al02
Markus Niebel Markus.Niebel@ew.tq-group.com drm: panel-simple: add missing bus flags for Tianma tm070jvhg[30/33]
Douglas Anderson dianders@chromium.org drm/bridge: parade-ps8640: Wait for HPD when doing an AUX transfer
Alex Deucher alexander.deucher@amd.com drm/amdgpu/gfx11: set UNORD_DISPATCH in compute MQDs
Alex Deucher alexander.deucher@amd.com drm/amdgpu/gfx10: set UNORD_DISPATCH in compute MQDs
Hsin-Yi Wang hsinyi@chromium.org drm/panel-edp: drm/panel-edp: Fix AUO B116XTN02 name
Hsin-Yi Wang hsinyi@chromium.org drm/panel-edp: drm/panel-edp: Fix AUO B116XAK01 name and timing
Sheng-Liang Pan sheng-liang.pan@quanta.corp-partner.google.com drm/panel-edp: Add AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 V8.0
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/psr: Only allow PSR in LPSP mode on HSW non-ULT
Mika Kahola mika.kahola@intel.com drm/i915/lnl: Remove watchdog timers for PSR
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: optimize hint byte for zoned allocator
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: factor out prepare_allocation_zoned()
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: fix unconditional activation of THRI interrupt
Thomas Gleixner tglx@linutronix.de serial: sc16is7xx: Use port lock wrappers
Thomas Gleixner tglx@linutronix.de serial: core: Provide port lock wrappers
Baolin Wang baolin.wang@linux.alibaba.com mm: migrate: fix getting incorrect page mapping during page migration
Baolin Wang baolin.wang@linux.alibaba.com mm: migrate: record the mlocked page status to remove unnecessary lru drain
Di Shen di.shen@unisoc.com thermal: gov_power_allocator: avoid inability to reset a cdev
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: core: Store trip pointer in struct thermal_instance
Rafael J. Wysocki rafael.j.wysocki@intel.com thermal: trip: Drop redundant trips check from for_each_thermal_trip()
Alexander Stein alexander.stein@ew.tq-group.com media: i2c: imx290: Properly encode registers as little-endian
Alexander Stein alexander.stein@ew.tq-group.com media: v4l2-cci: Add support for little-endian encoded registers
Sakari Ailus sakari.ailus@linux.intel.com media: v4l: cci: Add macros to obtain register width and address
Sakari Ailus sakari.ailus@linux.intel.com media: v4l: cci: Include linux/bits.h
Lukas Schauer lukas@schauer.dev pipe: wakeup wr_wait after setting max_usage
Max Kellermann max.kellermann@ionos.com fs/pipe: move check to pipe_has_watch_queue()
Ricardo Neri ricardo.neri-calderon@linux.intel.com thermal: intel: hfi: Add syscore callbacks for system-wide PM
Ricardo Neri ricardo.neri-calderon@linux.intel.com thermal: intel: hfi: Disable an HFI instance when all its CPUs go offline
Ricardo Neri ricardo.neri-calderon@linux.intel.com thermal: intel: hfi: Refactor enabling code into helper functions
Martin KaFai Lau martin.lau@kernel.org net/bpf: Avoid unused "sin_addr_len" warning when CONFIG_CGROUP_BPF is not set
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Fix uninitialized variable usage in core_link_ 'read_dpcd() & write_dpcd()' functions
Ma Jun Jun.Ma2@amd.com drm/amdgpu/pm: Fix the power source flag error
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Fix late derefrence 'dsc' check in 'link_set_dsc_pps_packet()'
Wayne Lin Wayne.Lin@amd.com drm/amd/display: Align the returned error code with legacy DP
Nicholas Kazlauskas nicholas.kazlauskas@amd.com drm/amd/display: Port DENTIST hang and TDR fixes to OTG disable W/A
Srinivasan Shanmugam srinivasan.shanmugam@amd.com drm/amd/display: Fix variable deferencing before NULL check in edp_setup_replay()
Likun Gao Likun.Gao@amd.com drm/amdgpu: correct the cu count for gfx v11
Dan Carpenter dan.carpenter@linaro.org drm/bridge: nxp-ptn3460: simplify some error checking
Ivan Lipski ivlipski@amd.com Revert "drm/amd/display: fix bandwidth validation failure on DCN 2.1"
Mario Limonciello mario.limonciello@amd.com drm/amd/display: Disable PSR-SU on Parade 0803 TCON again
Melissa Wen mwen@igalia.com drm/amd/display: fix bandwidth validation failure on DCN 2.1
Javier Martinez Canillas javierm@redhat.com drm: Allow drivers to indicate the damage helpers to ignore damage clips
Javier Martinez Canillas javierm@redhat.com drm/virtio: Disable damage clipping if FB changed since last page-flip
Zack Rusin zackr@vmware.com drm: Disable the cursor plane on atomic contexts with virtualized drivers
Tomi Valkeinen tomi.valkeinen@ideasonboard.com drm/tidss: Fix atomic_flush check
Thomas Zimmermann tzimmermann@suse.de drm: Fix TODO list mentioning non-KMS drivers
Dan Carpenter dan.carpenter@linaro.org drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking
Ville Syrjälä ville.syrjala@linux.intel.com drm: Don't unref the same fb many times by mistake due to deadlock handling
Ville Syrjälä ville.syrjala@linux.intel.com Revert "drm/i915/dsi: Do display on sequence later on icl+"
Rafael J. Wysocki rafael.j.wysocki@intel.com cpufreq: intel_pstate: Refine computation of P-state for given frequency
Mario Limonciello mario.limonciello@amd.com gpiolib: acpi: Ignore touchpad wakeup on GPD G1619-04
Dave Chinner dchinner@redhat.com xfs: read only mounts with fsopen mount API are busted
Ma Jun Jun.Ma2@amd.com drm/amdgpu: Fix the null pointer when load rlc firmware
Thomas Zimmermann tzimmermann@suse.de Revert "drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync"
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Check mailbox/SMT channel for consistency
Lin Ma linma@zju.edu.cn ksmbd: fix global oob in ksmbd_nl_policy
Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
Nathan Chancellor nathan@kernel.org platform/x86: intel-uncore-freq: Fix types in sysfs callbacks
Florian Westphal fw@strlen.de netfilter: nf_tables: reject QUEUE/DROP verdict parameters
Pablo Neira Ayuso pablo@netfilter.org netfilter: nft_chain_filter: handle NETDEV_UNREGISTER for inet/ingress basechain
Michael Kelley mhklinux@outlook.com hv_netvsc: Calculate correct ring size when PAGE_SIZE is not 4 Kbytes
NeilBrown neilb@suse.de nfsd: fix RELEASE_LOCKOWNER
Emmanuel Grumbach emmanuel.grumbach@intel.com wifi: iwlwifi: fix a memory corruption
Bernd Edlinger bernd.edlinger@hotmail.de exec: Fix error handling in begin_new_exec()
Ilya Dryomov idryomov@gmail.com rbd: don't move requests to the running list on errors
Omar Sandoval osandov@fb.com btrfs: don't abort filesystem when attempting to snapshot deleted subvolume
Qu Wenruo wqu@suse.com btrfs: defrag: reject unknown flags of btrfs_ioctl_defrag_range_args
David Sterba dsterba@suse.com btrfs: don't warn if discard range is not aligned to sector
Chung-Chiang Cheng cccheng@synology.com btrfs: tree-checker: fix inline ref size in error messages
Fedor Pchelkin pchelkin@ispras.ru btrfs: ref-verify: free ref cache before clearing mount opt
Omar Sandoval osandov@fb.com btrfs: avoid copying BTRFS_ROOT_SUBVOL_DEAD flag to snapshot of subvolume being deleted
Naohiro Aota naohiro.aota@wdc.com btrfs: zoned: fix lock ordering in btrfs_zone_activate()
Qu Wenruo wqu@suse.com btrfs: scrub: avoid use-after-free when chunk length is not 64K aligned
Gerhard Engleder gerhard@engleder-embedded.com tsnep: Fix XDP_RING_NEED_WAKEUP for empty fill ring
Gerhard Engleder gerhard@engleder-embedded.com tsnep: Remove FCS for XDP data path
Shenwei Wang shenwei.wang@nxp.com net: fec: fix the unhandled context fault from smmu
Hangbin Liu liuhangbin@gmail.com selftests: bonding: do not test arp/ns target with mode balance-alb/tlb
Zhipeng Lu alexious@zju.edu.cn fjes: fix memleaks in fjes_hw_setup
Maciej Fijalkowski maciej.fijalkowski@intel.com i40e: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Maciej Fijalkowski maciej.fijalkowski@intel.com i40e: set xdp_rxq_info::frag_size
Maciej Fijalkowski maciej.fijalkowski@intel.com xdp: reflect tail increase for MEM_TYPE_XSK_BUFF_POOL
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: update xdp_rxq_info::frag_size for ZC enabled Rx queue
Maciej Fijalkowski maciej.fijalkowski@intel.com intel: xsk: initialize skb_frag_t::bv_offset in ZC drivers
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: remove redundant xdp_rxq_info registration
Tirthendu Sarkar tirthendu.sarkar@intel.com i40e: handle multi-buffer packets that are shrunk by xdp prog
Maciej Fijalkowski maciej.fijalkowski@intel.com ice: work on pre-XDP prog frag count
Maciej Fijalkowski maciej.fijalkowski@intel.com xsk: fix usage of multi-buffer BPF helpers for ZC XDP
Daan De Meyer daan.j.demeyer@gmail.com bpf: Add bpf_sock_addr_set_sun_path() to allow writing unix sockaddr from bpf
Daan De Meyer daan.j.demeyer@gmail.com bpf: Propagate modified uaddrlen from cgroup sockaddr programs
Maciej Fijalkowski maciej.fijalkowski@intel.com xsk: make xsk_buff_pool responsible for clearing xdp_buff::flags
Maciej Fijalkowski maciej.fijalkowski@intel.com xsk: recycle buffer in case Rx queue was full
Jakub Kicinski kuba@kernel.org selftests: netdevsim: fix the udp_tunnel_nic test
Jakub Kicinski kuba@kernel.org selftests: net: fix rps_default_mask with >32 CPUs
Jenishkumar Maheshbhai Patel jpatel2@marvell.com net: mvpp2: clear BM pool before initialization
Bernd Edlinger bernd.edlinger@hotmail.de net: stmmac: Wait a bit for the reset to take effect
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: validate NFPROTO_* family
Florian Westphal fw@strlen.de netfilter: nf_tables: restrict anonymous set and map names to 16 bytes
Florian Westphal fw@strlen.de netfilter: nft_limit: reject configurations that cause integer overflow
Frederic Weisbecker frederic@kernel.org rcu: Defer RCU kthreads wakeup when CPU is dying
Dinghao Liu dinghao.liu@zju.edu.cn net/mlx5e: fix a potential double-free in fs_any_create_groups
Zhipeng Lu alexious@zju.edu.cn net/mlx5e: fix a double-free in arfs_create_groups
Leon Romanovsky leon@kernel.org net/mlx5e: Ignore IPsec replay window values on sender side
Leon Romanovsky leon@kernel.org net/mlx5e: Allow software parsing when IPsec crypto is enabled
Rahul Rameshbabu rrameshbabu@nvidia.com net/mlx5: Use mlx5 device constant for selecting CQ period mode for ASO
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Can't go to uplink vport on RX rule
Yevgeny Kliteynik kliteyn@nvidia.com net/mlx5: DR, Use the right GVMI number for drop action
Moshe Shemesh moshe@nvidia.com net/mlx5: Bridge, fix multicast packets sent to uplink
Erez Shitrit erezsh@nvidia.com net/mlx5: Bridge, Enable mcast in smfs steering mode
Yishai Hadas yishaih@nvidia.com net/mlx5: Fix a WARN upon a callback command failure
Vlad Buslov vladbu@nvidia.com net/mlx5e: Fix peer flow lists handling
Rahul Rameshbabu rrameshbabu@nvidia.com net/mlx5e: Fix operation precedence bug in port timestamping napi_poll context
Ido Schimmel idosch@nvidia.com net/sched: flower: Fix chain template offload
Jakub Kicinski kuba@kernel.org selftests: fill in some missing configs for net
Zhengchao Shao shaozhengchao@huawei.com ipv6: init the accept_queue's spinlocks in inet6_create
Zhengchao Shao shaozhengchao@huawei.com netlink: fix potential sleeping issue in mqueue_flush_file
Kuniyuki Iwashima kuniyu@amazon.com selftest: Don't reuse port for SO_INCOMING_CPU test.
Salvatore Dipietro dipiets@amazon.com tcp: Add memory barrier to tcp_push()
David Howells dhowells@redhat.com afs: Hide silly-rename files from userspace
Petr Pavlu petr.pavlu@suse.com tracing: Ensure visibility when inserting an element into tracing_map
Dan Carpenter dan.carpenter@linaro.org netfs, fscache: Prevent Oops in fscache_put_cache()
Sharath Srinivasan sharath.srinivasan@oracle.com net/rds: Fix UBSAN: array-index-out-of-bounds in rds_cmsg_recv
Horatiu Vultur horatiu.vultur@microchip.com net: micrel: Fix PTP frame parsing for lan8814
Yunjian Wang wangyunjian@huawei.com tun: add missing rx stats accounting in tun_xdp_act
Yunjian Wang wangyunjian@huawei.com tun: fix missing dropped counter in tun_xdp_act
Jakub Kicinski kuba@kernel.org net: fix removing a namespace with conflicting altnames
Eric Dumazet edumazet@google.com udp: fix busy polling
Kuniyuki Iwashima kuniyu@amazon.com llc: Drop support for ETH_P_TR_802_2.
Eric Dumazet edumazet@google.com llc: make llc_ui_sendmsg() more robust against bonding changes
Lin Ma linma@zju.edu.cn vlan: skip nested type that is not IFLA_VLAN_QOS_MAPPING
Michael Chan michael.chan@broadcom.com bnxt_en: Prevent kernel warning when running offline self test
Michael Chan michael.chan@broadcom.com bnxt_en: Wait for FLR to complete during probe
Zhengchao Shao shaozhengchao@huawei.com tcp: make sure init the accept_queue's spinlocks once
Benjamin Poirier bpoirier@nvidia.com selftests: bonding: Increase timeout to 1200s
Wen Gu guwen@linux.alibaba.com net/smc: fix illegal rmb_desc access in SMC-D connection dump
Johannes Berg johannes.berg@intel.com wifi: mac80211: fix potential sta-link leak
Lucas Stach l.stach@pengutronix.de SUNRPC: use request size to initialize bio_vec in svc_udp_sendto()
Shyam Prasad N sprasad@microsoft.com cifs: after disabling multichannel, mark tcon for reconnect
Shyam Prasad N sprasad@microsoft.com cifs: fix a pending undercount of srv_count
Shyam Prasad N sprasad@microsoft.com cifs: fix lock ordering while disabling multichannel
Jonathan Gray jsg@jsg.id.au Revert "drm/amd: Enable PCIe PME from D3"
Eduard Zingerman eddyz87@gmail.com selftests/bpf: check if max number of bpf_loop iterations is tracked
Eduard Zingerman eddyz87@gmail.com bpf: keep track of max number of bpf_loop callback iterations
Eduard Zingerman eddyz87@gmail.com selftests/bpf: test widening for iterating callbacks
Eduard Zingerman eddyz87@gmail.com bpf: widening for callback iterators
Eduard Zingerman eddyz87@gmail.com selftests/bpf: tests for iterating callbacks
Eduard Zingerman eddyz87@gmail.com bpf: verify callbacks as if they are called unknown number of times
Eduard Zingerman eddyz87@gmail.com bpf: extract setup_func_entry() utility function
Eduard Zingerman eddyz87@gmail.com bpf: extract __check_reg_arg() utility function
Eduard Zingerman eddyz87@gmail.com selftests/bpf: track string payload offset as scalar in strobemeta
Eduard Zingerman eddyz87@gmail.com selftests/bpf: track tcp payload offset as scalar in xdp_synproxy
Eduard Zingerman eddyz87@gmail.com bpf: print full verifier states on infinite loop detection
Eduard Zingerman eddyz87@gmail.com selftests/bpf: test if state loops are detected in a tricky case
Eduard Zingerman eddyz87@gmail.com bpf: correct loop detection for iterators convergence
Eduard Zingerman eddyz87@gmail.com selftests/bpf: tests with delayed read/precision makrs in loop body
Eduard Zingerman eddyz87@gmail.com bpf: exact states comparison for iterator convergence checks
Eduard Zingerman eddyz87@gmail.com bpf: extract same_callsites() as utility function
Eduard Zingerman eddyz87@gmail.com bpf: move explored_state() closer to the beginning of verifier.c
Rohan G Thomas rohan.g.thomas@intel.com dt-bindings: net: snps,dwmac: Tx coe unsupported
Namjae Jeon linkinjeon@kernel.org ksmbd: Add missing set_freezable() for freezable kthread
Namjae Jeon linkinjeon@kernel.org ksmbd: send lease break notification on FILE_RENAME_INFORMATION
Namjae Jeon linkinjeon@kernel.org ksmbd: don't increment epoch if current state and request state are same
Namjae Jeon linkinjeon@kernel.org ksmbd: fix potential circular locking issue in smb2_set_ea()
Namjae Jeon linkinjeon@kernel.org ksmbd: set v2 lease version on lease upgrade
Lino Sanfilippo l.sanfilippo@kunbus.com serial: Do not hold the port lock when setting rx-during-tx GPIO
Charan Teja Kalla quic_charante@quicinc.com mm: page_alloc: unreserve highatomic page blocks before oom
Huacai Chen chenhuacai@kernel.org LoongArch/smp: Call rcutree_report_cpu_starting() earlier
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: improve do/while loop in sc16is7xx_irq()
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: remove obsolete loop in sc16is7xx_port_irq()
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: fix invalid sc16is7xx_lines bitfield in case of probe error
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions for FIFO
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: change EFR lock to operate on each channels
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: remove unused line structure member
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: remove global regmap from struct sc16is7xx_port
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: remove wasteful static buffer in sc16is7xx_regmap_name()
Hugo Villeneuve hvilleneuve@dimonoff.com serial: sc16is7xx: improve regmap debugfs by using one regmap per port
Al Viro viro@zeniv.linux.org.uk rename(): fix the locking of subdirectories
Charan Teja Kalla quic_charante@quicinc.com mm/sparsemem: fix race in accessing memory_section->usage
Steven Rostedt (Google) rostedt@goodmis.org mm/rmap: fix misplaced parenthesis of a likely()
Donet Tom donettom@linux.vnet.ibm.com selftests: mm: hugepage-vmemmap fails on 64K page size systems
James Gowans jgowans@amazon.com kexec: do syscore_shutdown() in kernel_kexec
Zhihao Cheng chengzhihao1@huawei.com ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path
Ma Wupeng mawupeng1@huawei.com efi: disable mirror feature during crashkernel
Dave Airlie airlied@redhat.com nouveau/vmm: don't set addr on the fail path to avoid warning
Mario Limonciello mario.limonciello@amd.com rtc: Extend timeout for waiting for UIP to clear to 1s
Mario Limonciello mario.limonciello@amd.com rtc: Add support for configuring the UIP timeout for RTC reads
Mario Limonciello mario.limonciello@amd.com rtc: mc146818-lib: Adjust failure return code for mc146818_get_time()
Mario Limonciello mario.limonciello@amd.com rtc: Adjust failure return code for cmos_set_alarm()
Mario Limonciello mario.limonciello@amd.com rtc: cmos: Use ACPI alarm for non-Intel x86 systems too
Mark Rutland mark.rutland@arm.com arm64: entry: fix ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD
Mark Brown broonie@kernel.org arm64/sme: Always exit sme_alloc() early with existing storage
Rob Herring robh@kernel.org arm64: errata: Add Cortex-A510 speculative unprivileged load workaround
Rob Herring robh@kernel.org arm64: Rename ARM64_WORKAROUND_2966298
Guo Ren guoren@kernel.org riscv: mm: Fixup compat mode boot failure
Guo Ren guoren@kernel.org riscv: mm: Fixup compat arch_get_mmap_end
Zheng Wang zyytlz.wz@163.com media: mtk-jpeg: Fix use after free bug due to error path handling in mtk_jpeg_dec_device_run
Zheng Wang zyytlz.wz@163.com media: mtk-jpeg: Fix timeout schedule error in mtk_jpegdec_worker.
Alain Volmat alain.volmat@foss.st.com media: i2c: st-mipid02: correct format propagation
Andy Shevchenko andriy.shevchenko@linux.intel.com mmc: mmc_spi: remove custom DMA mapped buffers
Avri Altman avri.altman@wdc.com mmc: core: Use mrq.sbc in close-ended ffu
Michael Grzeschik m.grzeschik@pengutronix.de media: videobuf2-dma-sg: fix vmap callback
Vegard Nossum vegard.nossum@oracle.com scripts/get_abi: fix source path leak
Vegard Nossum vegard.nossum@oracle.com docs: kernel_abi.py: fix command injection
Jordan Rife jrife@google.com dlm: use kernel_connect() and kernel_bind()
Alfred Piccioni alpic@google.com lsm: new security_file_ioctl_compat() hook
Johan Hovold johan+linaro@kernel.org ARM: dts: qcom: sdx55: fix USB SS wakeup
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sdm670: fix USB SS wakeup
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sdm670: fix USB DP/DM HS PHY interrupts
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc8180x: fix USB SS wakeup
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc8180x: fix USB DP/DM HS PHY interrupts
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sm8150: fix USB SS wakeup
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sm8150: fix USB DP/DM HS PHY interrupts
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sdm845: fix USB SS wakeup
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sdm845: fix USB DP/DM HS PHY interrupts
Johan Hovold johan+linaro@kernel.org ARM: dts: qcom: sdx55: fix USB DP/DM HS PHY interrupts
Stephan Gerhold stephan@gerhold.net arm64: dts: qcom: Add missing vio-supply for AW2013
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc7280: fix usb_1 wakeup interrupt types
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc8180x: fix USB wakeup interrupt types
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sm8150: fix USB wakeup interrupt types
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sdm670: fix USB wakeup interrupt types
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sdm845: fix USB wakeup interrupt types
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc7180: fix USB wakeup interrupt types
Stephan Gerhold stephan@gerhold.net arm64: dts: qcom: msm8939: Make blsp_dma controlled-remotely
Stephan Gerhold stephan@gerhold.net arm64: dts: qcom: msm8916: Make blsp_dma controlled-remotely
Sam Edwards cfsworks@gmail.com arm64: dts: rockchip: Fix rk3588 USB power-domain clocks
Tianling Shen cnsztl@gmail.com arm64: dts: rockchip: configure eth pad driver strength for orangepi r1 plus lts
Cixi Geng cixi.geng1@unisoc.com arm64: dts: sprd: fix the cpu node for UMS512
Johan Hovold johan+linaro@kernel.org ARM: dts: qcom: sdx55: fix pdc '#interrupt-cells'
Paul Cercueil paul@crapouillou.net ARM: dts: samsung: exynos4210-i9100: Unconditionally enable LDO12
Johan Hovold johan+linaro@kernel.org ARM: dts: qcom: sdx55: fix USB wakeup interrupt types
Johan Hovold johan+linaro@kernel.org arm64: dts: qcom: sc8280xp-crd: fix eDP phy compatible
Andrejs Cainikovs andrejs.cainikovs@toradex.com ARM: dts: imx6q-apalis: add can power-up delay on ixora board
Helge Deller deller@gmx.de parisc/power: Fix power soft-off button emulation on qemu
Helge Deller deller@gmx.de parisc/firmware: Fix F-extend for PDC addresses
Bhaumik Bhatt bbhatt@codeaurora.org bus: mhi: host: Add spinlock to protect WP access when queueing TREs
Qiang Yu quic_qianyu@quicinc.com bus: mhi: host: Drop chan lock before queuing buffers
Krishna chaitanya chundru quic_krichai@quicinc.com bus: mhi: host: Add alignment check for event ring read pointer
Serge Semin fancer.lancer@gmail.com mips: Fix max_mapnr being uninitialized on early stages
Eric Dumazet edumazet@google.com nbd: always initialize struct msghdr completely
Tony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: do not reset queue removed from host config
Tony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: reset queues associated with adapter for queue unbound from driver
Tony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: reset queues filtered from the guest's AP config
Tony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: let on_scan_complete() callback filter matrix and update guest's APCB
Tony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: loop over the shadow APCB when filtering guest's AP configuration
Tony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: always filter entire AP matrix
Herve Codina herve.codina@bootlin.com soc: fsl: cpm1: qmc: Fix rx channel reset
Herve Codina herve.codina@bootlin.com soc: fsl: cpm1: qmc: Fix __iomem addresses declaration
Herve Codina herve.codina@bootlin.com soc: fsl: cpm1: tsa: Fix __iomem addresses declaration
Bingbu Cao bingbu.cao@intel.com media: ov01a10: Enable runtime PM before registering async sub-device
Bingbu Cao bingbu.cao@intel.com media: ov13b10: Enable runtime PM before registering async sub-device
Bingbu Cao bingbu.cao@intel.com media: ov9734: Enable runtime PM before registering async sub-device
Xiaolei Wang xiaolei.wang@windriver.com rpmsg: virtio: Free driver_override when rpmsg_remove()
Bingbu Cao bingbu.cao@intel.com media: imx355: Enable runtime PM before registering async sub-device
Johan Hovold johan+linaro@kernel.org soc: qcom: pmic_glink_altmode: fix port sanity check
Miquel Raynal miquel.raynal@bootlin.com mtd: rawnand: Clarify conditions to enable continuous reads
Miquel Raynal miquel.raynal@bootlin.com mtd: rawnand: Prevent sequential reads with on-die ECC engines
Miquel Raynal miquel.raynal@bootlin.com mtd: rawnand: Fix core interference with sequential reads
Miquel Raynal miquel.raynal@bootlin.com mtd: rawnand: Prevent crossing LUN boundaries during sequential reads
Miquel Raynal miquel.raynal@bootlin.com mtd: maps: vmu-flash: Fix the (mtd core) switch to ref counters
Christian Marangi ansuelsmth@gmail.com PM / devfreq: Fix buffer overflow in trans_stat_show
Anthony Krowiak akrowiak@linux.ibm.com s390/vfio-ap: unpin pages on gisc registration failure
Herbert Xu herbert@gondor.apana.org.au crypto: s390/aes - Fix buffer overread in CTR mode
Herbert Xu herbert@gondor.apana.org.au hwrng: core - Fix page fault dead lock on mmap-ed hwrng
Hongchen Zhang zhanghongchen@loongson.cn PM: hibernate: Enforce ordering during image compression/decompression
Herbert Xu herbert@gondor.apana.org.au crypto: api - Disallow identical driver names
Gao Xiang xiang@kernel.org erofs: fix lz4 inplace decompression
Tianjia Zhang tianjia.zhang@linux.alibaba.com crypto: lib/mpi - Fix unexpected pointer access in mpi_ec_init
David Disseldorp ddiss@suse.de btrfs: sysfs: validate scrub_speed_max value
Viresh Kumar viresh.kumar@linaro.org OPP: Pass rounded rate to _set_opp()
Josef Bacik josef@toxicpanda.com arm64: properly install vmlinuz.efi
Rafael J. Wysocki rafael.j.wysocki@intel.com PM: sleep: Fix possible deadlocks in core system-wide PM code
Rafael J. Wysocki rafael.j.wysocki@intel.com async: Introduce async_schedule_dev_nocall()
Rafael J. Wysocki rafael.j.wysocki@intel.com async: Split async_schedule_node_domain()
Suraj Jitindar Singh surajjs@amazon.com ext4: allow for the last group to be marked as trimmed
Geoff Levand geoff@infradead.org powerpc/ps3_defconfig: Disable PPC64_BIG_ENDIAN_ELF_ABI_V2
Shyam Prasad N sprasad@microsoft.com cifs: update iface_last_update on each query-and-update
Shyam Prasad N sprasad@microsoft.com cifs: handle servers that still advertise multichannel after disabling
Shyam Prasad N sprasad@microsoft.com cifs: reconnect worker should take reference on server struct unconditionally
Shyam Prasad N sprasad@microsoft.com Revert "cifs: reconnect work should have reference on server struct"
Shyam Prasad N sprasad@microsoft.com cifs: handle when server stops supporting multichannel
Shyam Prasad N sprasad@microsoft.com cifs: handle when server starts supporting multichannel
Shyam Prasad N sprasad@microsoft.com cifs: reconnect work should have reference on server struct
Shyam Prasad N sprasad@microsoft.com cifs: handle cases where a channel is closed
Paulo Alcantara pc@manguebit.com smb: client: fix parsing of SMB3.1.1 POSIX create context
Geert Uytterhoeven geert+renesas@glider.be sh: ecovec24: Rename missed backlight field from fbdev to dev
Niklas Cassel cassel@kernel.org scsi: core: Kick the requeue list after inserting when flushing
Christophe JAILLET christophe.jaillet@wanadoo.fr riscv: Fix an off-by-one in get_early_cmdline()
Bart Van Assche bvanassche@acm.org scsi: ufs: core: Remove the ufshcd_hba_exit() call from ufshcd_async_scan()
Rex Zhang rex.zhang@intel.com dmaengine: idxd: Move dma_free_coherent() out of spinlocked context
Amelie Delaunay amelie.delaunay@foss.st.com dmaengine: fix NULL pointer in channel unregistration function
Frank Li Frank.Li@nxp.com dmaengine: fsl-edma: fix eDMAv4 channel allocation issue
Marcelo Schmitt marcelo.schmitt@analog.com iio: adc: ad7091r: Enable internal vref if external vref is not supplied
Marcelo Schmitt marcelo.schmitt@analog.com iio: adc: ad7091r: Allow users to configure device events
Marcelo Schmitt marcelo.schmitt@analog.com iio: adc: ad7091r: Set alert bit in config register
Romain Gantois romain.gantois@bootlin.com net: stmmac: Prevent DSA tags from breaking COE
Rohan G Thomas rohan.g.thomas@intel.com net: stmmac: Tx coe sw fallback
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org soundwire: fix initializing sysfs for same devices on different buses
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com soundwire: bus: introduce controller_id
Lino Sanfilippo l.sanfilippo@kunbus.com serial: core: set missing supported flag for RX during TX GPIO
Andy Shevchenko andriy.shevchenko@linux.intel.com serial: core: Simplify uart_get_rs485_mode()
Vegard Nossum vegard.nossum@oracle.com docs: kernel_feat.py: fix potential command injection
Min-Hua Chen minhuadotchen@gmail.com docs: sparse: add sparse.rst to toctree
Min-Hua Chen minhuadotchen@gmail.com docs: sparse: move TW sparse.txt to TW dev-tools
-------------
Diffstat:
Documentation/ABI/testing/sysfs-class-devfreq | 3 + Documentation/admin-guide/abi-obsolete.rst | 2 +- Documentation/admin-guide/abi-removed.rst | 2 +- Documentation/admin-guide/abi-stable.rst | 2 +- Documentation/admin-guide/abi-testing.rst | 2 +- Documentation/admin-guide/features.rst | 2 +- Documentation/arch/arc/features.rst | 2 +- Documentation/arch/arm/features.rst | 2 +- Documentation/arch/arm64/features.rst | 2 +- Documentation/arch/arm64/silicon-errata.rst | 2 + Documentation/arch/loongarch/features.rst | 2 +- Documentation/arch/m68k/features.rst | 2 +- Documentation/arch/mips/features.rst | 2 +- Documentation/arch/nios2/features.rst | 2 +- Documentation/arch/openrisc/features.rst | 2 +- Documentation/arch/parisc/features.rst | 2 +- Documentation/arch/s390/features.rst | 2 +- Documentation/arch/sh/features.rst | 2 +- Documentation/arch/sparc/features.rst | 2 +- Documentation/arch/x86/features.rst | 2 +- Documentation/arch/xtensa/features.rst | 2 +- .../devicetree/bindings/net/snps,dwmac.yaml | 5 + Documentation/filesystems/directory-locking.rst | 29 +- Documentation/filesystems/locking.rst | 5 +- Documentation/filesystems/porting.rst | 18 + Documentation/gpu/drm-kms.rst | 2 + Documentation/gpu/todo.rst | 7 +- Documentation/powerpc/features.rst | 2 +- Documentation/riscv/features.rst | 2 +- Documentation/sphinx/kernel_abi.py | 56 +- Documentation/sphinx/kernel_feat.py | 55 +- .../translations/zh_CN/arch/loongarch/features.rst | 2 +- .../translations/zh_CN/arch/mips/features.rst | 2 +- .../translations/zh_TW/dev-tools/index.rst | 40 + .../translations/zh_TW/{ => dev-tools}/sparse.txt | 0 Documentation/translations/zh_TW/index.rst | 2 +- Makefile | 4 +- arch/alpha/kernel/rtc.c | 2 +- .../boot/dts/nxp/imx/imx6q-apalis-ixora-v1.2.dts | 2 + arch/arm/boot/dts/qcom/qcom-sdx55.dtsi | 10 +- arch/arm/boot/dts/samsung/exynos4210-i9100.dts | 8 + arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi | 1 + arch/arm64/Kconfig | 18 + .../boot/dts/qcom/msm8916-longcheer-l8150.dts | 1 + .../boot/dts/qcom/msm8916-wingtech-wt88047.dts | 1 + arch/arm64/boot/dts/qcom/msm8916.dtsi | 1 + arch/arm64/boot/dts/qcom/msm8939.dtsi | 1 + arch/arm64/boot/dts/qcom/msm8953-xiaomi-mido.dts | 1 + arch/arm64/boot/dts/qcom/msm8953-xiaomi-tissot.dts | 1 + arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts | 1 + arch/arm64/boot/dts/qcom/sc7180.dtsi | 4 +- arch/arm64/boot/dts/qcom/sc7280.dtsi | 4 +- arch/arm64/boot/dts/qcom/sc8180x.dtsi | 16 +- arch/arm64/boot/dts/qcom/sc8280xp-crd.dts | 2 + arch/arm64/boot/dts/qcom/sdm670.dtsi | 8 +- arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 +- arch/arm64/boot/dts/qcom/sm8150.dtsi | 16 +- .../dts/rockchip/rk3328-orangepi-r1-plus-lts.dts | 4 +- arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 1 + arch/arm64/boot/dts/sprd/ums512.dtsi | 4 +- arch/arm64/boot/install.sh | 3 +- arch/arm64/kernel/cpu_errata.c | 21 +- arch/arm64/kernel/entry.S | 22 +- arch/arm64/kernel/fpsimd.c | 6 +- arch/arm64/tools/cpucaps | 2 +- arch/loongarch/kernel/smp.c | 3 +- arch/mips/kernel/elf.c | 6 + arch/mips/lantiq/prom.c | 7 +- arch/mips/mm/init.c | 12 +- arch/parisc/kernel/firmware.c | 4 +- arch/powerpc/configs/ps3_defconfig | 1 + arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/include/asm/processor.h | 2 +- arch/riscv/kernel/pi/cmdline_early.c | 3 +- arch/s390/crypto/aes_s390.c | 4 +- arch/s390/crypto/paes_s390.c | 4 +- arch/sh/boards/mach-ecovec24/setup.c | 2 +- arch/x86/include/asm/syscall_wrapper.h | 25 +- arch/x86/kernel/hpet.c | 2 +- arch/x86/kernel/rtc.c | 2 +- block/ioctl.c | 2 - block/partitions/core.c | 5 + crypto/algapi.c | 1 + drivers/base/power/main.c | 148 ++-- drivers/base/power/trace.c | 2 +- drivers/block/nbd.c | 6 +- drivers/block/rbd.c | 22 +- drivers/bus/mhi/host/main.c | 29 +- drivers/char/hw_random/core.c | 34 +- drivers/cpufreq/amd-pstate.c | 7 +- drivers/cpufreq/intel_pstate.c | 55 +- drivers/cxl/core/region.c | 4 +- drivers/devfreq/devfreq.c | 57 +- drivers/dma/dmaengine.c | 3 + drivers/dma/fsl-edma-main.c | 8 + drivers/dma/idxd/device.c | 9 +- drivers/firmware/arm_scmi/common.h | 1 + drivers/firmware/arm_scmi/mailbox.c | 14 + drivers/firmware/arm_scmi/perf.c | 23 +- drivers/firmware/arm_scmi/raw_mode.c | 12 +- drivers/firmware/arm_scmi/shmem.c | 6 + drivers/firmware/sysfb.c | 2 +- drivers/gpio/gpio-eic-sprd.c | 32 +- drivers/gpio/gpiolib-acpi.c | 14 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 - drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 17 +- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 5 +- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 1 + .../drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 5 + .../amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c | 21 +- drivers/gpu/drm/amd/display/dc/link/link_dpms.c | 8 +- .../drm/amd/display/dc/link/protocols/link_dpcd.c | 4 +- .../dc/link/protocols/link_edp_panel_control.c | 11 +- .../drm/amd/display/modules/power/power_helpers.c | 2 + drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 13 +- drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 + drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 + drivers/gpu/drm/bridge/analogix/anx7625.c | 7 +- drivers/gpu/drm/bridge/analogix/anx7625.h | 2 + drivers/gpu/drm/bridge/nxp-ptn3460.c | 6 +- drivers/gpu/drm/bridge/parade-ps8640.c | 23 + drivers/gpu/drm/bridge/samsung-dsim.c | 32 +- drivers/gpu/drm/bridge/sii902x.c | 42 +- drivers/gpu/drm/drm_damage_helper.c | 3 +- drivers/gpu/drm/drm_plane.c | 14 + drivers/gpu/drm/exynos/exynos5433_drm_decon.c | 4 +- drivers/gpu/drm/exynos/exynos_drm_fimd.c | 4 +- drivers/gpu/drm/exynos/exynos_drm_gsc.c | 2 +- drivers/gpu/drm/i915/display/icl_dsi.c | 3 +- drivers/gpu/drm/i915/display/intel_psr.c | 22 +- drivers/gpu/drm/nouveau/nouveau_vmm.c | 3 + drivers/gpu/drm/panel/panel-edp.c | 7 +- drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c | 2 +- drivers/gpu/drm/panel/panel-simple.c | 2 + drivers/gpu/drm/qxl/qxl_drv.c | 2 +- drivers/gpu/drm/tidss/tidss_crtc.c | 10 +- drivers/gpu/drm/vboxvideo/vbox_drv.c | 2 +- drivers/gpu/drm/virtio/virtgpu_drv.c | 2 +- drivers/gpu/drm/virtio/virtgpu_plane.c | 10 + drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- drivers/iio/adc/ad7091r-base.c | 169 ++++ drivers/iio/adc/ad7091r-base.h | 8 + drivers/iio/adc/ad7091r5.c | 28 +- drivers/media/common/videobuf2/videobuf2-dma-sg.c | 10 +- drivers/media/i2c/imx290.c | 42 +- drivers/media/i2c/imx355.c | 12 +- drivers/media/i2c/ov01a10.c | 18 +- drivers/media/i2c/ov13b10.c | 14 +- drivers/media/i2c/ov9734.c | 19 +- drivers/media/i2c/st-mipid02.c | 9 +- .../media/platform/mediatek/jpeg/mtk_jpeg_core.c | 12 +- drivers/media/v4l2-core/v4l2-cci.c | 52 +- drivers/mmc/core/block.c | 46 +- drivers/mmc/host/mmc_spi.c | 186 +---- drivers/mtd/maps/vmu-flash.c | 2 +- drivers/mtd/nand/raw/nand_base.c | 87 +- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 8 + drivers/net/ethernet/engleder/tsnep_main.c | 17 +- drivers/net/ethernet/freescale/fec_main.c | 2 + drivers/net/ethernet/intel/i40e/i40e_main.c | 47 +- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 49 +- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 4 +- drivers/net/ethernet/intel/ice/ice_base.c | 37 +- drivers/net/ethernet/intel/ice/ice_txrx.c | 19 +- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 31 +- drivers/net/ethernet/intel/ice/ice_xsk.c | 4 +- drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c | 27 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 5 +- .../mellanox/mlx5/core/en/fs_tt_redirect.c | 1 + .../net/ethernet/mellanox/mlx5/core/en/params.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c | 2 +- .../ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 10 +- drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c | 26 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 +- .../ethernet/mellanox/mlx5/core/esw/bridge_mcast.c | 14 +- drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 2 + drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c | 2 +- .../mellanox/mlx5/core/steering/dr_action.c | 17 +- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 43 +- .../net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 + drivers/net/fjes/fjes_hw.c | 37 +- drivers/net/hyperv/netvsc_drv.c | 4 +- drivers/net/phy/micrel.c | 11 + drivers/net/tun.c | 10 +- drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c | 4 +- drivers/opp/core.c | 6 +- drivers/parisc/power.c | 2 +- .../uncore-frequency/uncore-frequency-common.c | 82 +- .../uncore-frequency/uncore-frequency-common.h | 32 +- drivers/platform/x86/p2sb.c | 180 ++++- drivers/rpmsg/virtio_rpmsg_bus.c | 1 + drivers/rtc/rtc-cmos.c | 28 +- drivers/rtc/rtc-mc146818-lib.c | 39 +- drivers/s390/crypto/vfio_ap_ops.c | 269 ++++--- drivers/s390/crypto/vfio_ap_private.h | 3 + drivers/scsi/scsi_error.c | 9 +- drivers/soc/fsl/qe/qmc.c | 35 +- drivers/soc/fsl/qe/tsa.c | 22 +- drivers/soc/qcom/pmic_glink_altmode.c | 4 +- drivers/soundwire/amd_manager.c | 8 + drivers/soundwire/bus.c | 4 + drivers/soundwire/debugfs.c | 2 +- drivers/soundwire/intel_auxdevice.c | 3 + drivers/soundwire/master.c | 2 +- drivers/soundwire/qcom.c | 3 + drivers/soundwire/slave.c | 12 +- drivers/spi/spi-bcm-qspi.c | 4 +- drivers/spi/spi-cadence.c | 17 +- drivers/spi/spi-intel-pci.c | 1 - drivers/spi/spi.c | 4 + drivers/thermal/gov_bang_bang.c | 23 +- drivers/thermal/gov_fair_share.c | 5 +- drivers/thermal/gov_power_allocator.c | 13 +- drivers/thermal/gov_step_wise.c | 16 +- drivers/thermal/intel/intel_hfi.c | 106 ++- drivers/thermal/thermal_core.c | 15 +- drivers/thermal/thermal_core.h | 4 +- drivers/thermal/thermal_helpers.c | 5 +- drivers/thermal/thermal_sysfs.c | 3 +- drivers/thermal/thermal_trip.c | 16 +- drivers/tty/serial/imx.c | 4 - drivers/tty/serial/sc16is7xx.c | 389 ++++----- drivers/tty/serial/serial_core.c | 58 +- drivers/tty/serial/stm32-usart.c | 8 +- drivers/ufs/core/ufshcd.c | 7 +- fs/afs/dir.c | 8 + fs/btrfs/extent-tree.c | 53 +- fs/btrfs/inode.c | 22 +- fs/btrfs/ioctl.c | 7 + fs/btrfs/ref-verify.c | 6 +- fs/btrfs/scrub.c | 29 +- fs/btrfs/sysfs.c | 4 + fs/btrfs/tree-checker.c | 2 +- fs/btrfs/zoned.c | 8 +- fs/dlm/lowcomms.c | 14 +- fs/erofs/decompressor.c | 31 +- fs/exec.c | 3 + fs/ext4/mballoc.c | 15 +- fs/fscache/cache.c | 3 +- fs/ioctl.c | 3 +- fs/namei.c | 60 +- fs/nfsd/nfs4state.c | 26 +- fs/pipe.c | 19 +- fs/smb/client/cifs_debug.c | 5 + fs/smb/client/cifsglob.h | 2 + fs/smb/client/cifsproto.h | 5 +- fs/smb/client/connect.c | 31 +- fs/smb/client/sess.c | 96 ++- fs/smb/client/smb2ops.c | 10 +- fs/smb/client/smb2pdu.c | 162 +++- fs/smb/client/smb2transport.c | 8 +- fs/smb/client/transport.c | 2 +- fs/smb/server/connection.c | 1 + fs/smb/server/ksmbd_netlink.h | 3 +- fs/smb/server/oplock.c | 16 +- fs/smb/server/smb2pdu.c | 8 +- fs/smb/server/transport_ipc.c | 4 +- fs/ubifs/dir.c | 2 + fs/xfs/xfs_super.c | 27 +- include/drm/drm_drv.h | 9 + include/drm/drm_file.h | 12 + include/drm/drm_plane.h | 10 + include/linux/async.h | 2 + include/linux/bpf-cgroup.h | 73 +- include/linux/bpf_verifier.h | 32 + include/linux/filter.h | 1 + include/linux/lsm_hook_defs.h | 2 + include/linux/mc146818rtc.h | 3 +- include/linux/mlx5/fs.h | 2 + include/linux/mlx5/mlx5_ifc.h | 2 +- include/linux/mmzone.h | 14 +- include/linux/mtd/rawnand.h | 2 + include/linux/pipe_fs_i.h | 16 + include/linux/rmap.h | 4 +- include/linux/security.h | 9 + include/linux/serial_core.h | 79 ++ include/linux/skmsg.h | 6 - include/linux/soundwire/sdw.h | 4 +- include/linux/stmmac.h | 1 + include/linux/syscalls.h | 1 + include/media/v4l2-cci.h | 11 + include/net/inet_connection_sock.h | 8 + include/net/inet_sock.h | 5 - include/net/llc_pdu.h | 6 +- include/net/sch_generic.h | 4 + include/net/sock.h | 18 +- include/net/xdp_sock_drv.h | 27 + include/uapi/linux/btrfs.h | 3 + kernel/async.c | 85 +- kernel/bpf/btf.c | 1 + kernel/bpf/cgroup.c | 17 +- kernel/bpf/verifier.c | 875 ++++++++++++++++----- kernel/irq/irqdesc.c | 2 +- kernel/kexec_core.c | 1 + kernel/power/swap.c | 38 +- kernel/rcu/tree.c | 34 +- kernel/rcu/tree_exp.h | 3 +- kernel/time/clocksource.c | 25 +- kernel/time/tick-sched.c | 5 + kernel/trace/tracing_map.c | 7 +- lib/crypto/mpi/ec.c | 3 + mm/memblock.c | 3 + mm/migrate.c | 65 +- mm/mm_init.c | 6 + mm/page_alloc.c | 16 +- mm/sparse.c | 17 +- net/8021q/vlan_netlink.c | 4 + net/core/dev.c | 9 + net/core/dev.h | 3 + net/core/filter.c | 79 +- net/core/request_sock.c | 3 - net/core/sock.c | 11 +- net/ipv4/af_inet.c | 12 +- net/ipv4/inet_connection_sock.c | 4 + net/ipv4/ping.c | 2 +- net/ipv4/tcp.c | 1 + net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/udp.c | 9 +- net/ipv6/af_inet6.c | 12 +- net/ipv6/ping.c | 2 +- net/ipv6/tcp_ipv6.c | 2 +- net/ipv6/udp.c | 6 +- net/llc/af_llc.c | 24 +- net/llc/llc_core.c | 7 - net/mac80211/sta_info.c | 5 +- net/netfilter/nf_tables_api.c | 20 +- net/netfilter/nft_chain_filter.c | 11 +- net/netfilter/nft_compat.c | 12 + net/netfilter/nft_flow_offload.c | 5 + net/netfilter/nft_limit.c | 23 +- net/netfilter/nft_nat.c | 5 + net/netfilter/nft_rt.c | 5 + net/netfilter/nft_socket.c | 5 + net/netfilter/nft_synproxy.c | 7 +- net/netfilter/nft_tproxy.c | 5 + net/netfilter/nft_xfrm.c | 5 + net/netlink/af_netlink.c | 2 +- net/rds/af_rds.c | 2 +- net/sched/cls_api.c | 9 +- net/sched/cls_flower.c | 23 + net/smc/smc_diag.c | 2 +- net/sunrpc/svcsock.c | 4 +- net/xdp/xsk.c | 12 +- net/xdp/xsk_buff_pool.c | 1 + scripts/get_abi.pl | 2 +- security/security.c | 18 + security/selinux/hooks.c | 28 + security/smack/smack_lsm.c | 1 + security/tomoyo/tomoyo.c | 1 + sound/soc/intel/boards/sof_sdw.c | 4 +- tools/testing/selftests/bpf/prog_tests/verifier.c | 2 + tools/testing/selftests/bpf/progs/cb_refs.c | 1 + tools/testing/selftests/bpf/progs/iters.c | 695 ++++++++++++++++ tools/testing/selftests/bpf/progs/strobemeta.h | 78 +- .../bpf/progs/verifier_iterating_callbacks.c | 242 ++++++ .../bpf/progs/verifier_subprog_precision.c | 86 +- .../selftests/bpf/progs/xdp_synproxy_kern.c | 84 +- .../selftests/drivers/net/bonding/bond_options.sh | 8 +- .../testing/selftests/drivers/net/bonding/settings | 2 +- .../drivers/net/netdevsim/udp_tunnel_nic.sh | 9 + tools/testing/selftests/mm/hugepage-vmemmap.c | 29 +- tools/testing/selftests/net/config | 28 + tools/testing/selftests/net/rps_default_mask.sh | 6 +- tools/testing/selftests/net/so_incoming_cpu.c | 68 +- 366 files changed, 5747 insertions(+), 2192 deletions(-)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Min-Hua Chen minhuadotchen@gmail.com
[ Upstream commit 253f68f413a87a4e2bd93e61b00410e5e1b7b774 ]
Follow Randy's advice [1] to move Documentation/translations/zh_TW/sparse.txt to Documentation/translations/zh_TW/dev-tools/sparse.txt
[1] https://lore.kernel.org/lkml/bfab7c5b-e4d3-d8d9-afab-f43c0cdf26cf@infradead....
Cc: Randy Dunlap rdunlap@infradead.org Suggested-by: Randy Dunlap rdunlap@infradead.org Reviewed-by: Randy Dunlap rdunlap@infradead.org Signed-off-by: Min-Hua Chen minhuadotchen@gmail.com Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20230902052512.12184-2-minhuadotchen@gmail.com Stable-dep-of: c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection") Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/translations/zh_TW/{ => dev-tools}/sparse.txt | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename Documentation/translations/zh_TW/{ => dev-tools}/sparse.txt (100%)
diff --git a/Documentation/translations/zh_TW/sparse.txt b/Documentation/translations/zh_TW/dev-tools/sparse.txt similarity index 100% rename from Documentation/translations/zh_TW/sparse.txt rename to Documentation/translations/zh_TW/dev-tools/sparse.txt
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Min-Hua Chen minhuadotchen@gmail.com
[ Upstream commit c9ad95adc096f25004d4192258863806a68a9bc8 ]
Add sparst.rst to toctree, so it can be part of the docs build.
Cc: Randy Dunlap rdunlap@infradead.org Cc: Jonathan Corbet corbet@lwn.net Suggested-by: Jonathan Corbet corbet@lwn.net Signed-off-by: Min-Hua Chen minhuadotchen@gmail.com Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20230902052512.12184-4-minhuadotchen@gmail.com Stable-dep-of: c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection") Signed-off-by: Sasha Levin sashal@kernel.org --- .../translations/zh_TW/dev-tools/index.rst | 40 +++++++++++++++++++ Documentation/translations/zh_TW/index.rst | 2 +- 2 files changed, 41 insertions(+), 1 deletion(-) create mode 100644 Documentation/translations/zh_TW/dev-tools/index.rst
diff --git a/Documentation/translations/zh_TW/dev-tools/index.rst b/Documentation/translations/zh_TW/dev-tools/index.rst new file mode 100644 index 000000000000..8f101db5a07f --- /dev/null +++ b/Documentation/translations/zh_TW/dev-tools/index.rst @@ -0,0 +1,40 @@ +.. include:: ../disclaimer-zh_TW.rst + +:Original: Documentation/dev-tools/index.rst +:Translator: Min-Hua Chen minhuadotchen@gmail.com + +============ +內核開發工具 +============ + +本文檔是有關內核開發工具文檔的合集。 +目前這些文檔已經整理在一起,不需要再花費額外的精力。 +歡迎任何補丁。 + +有關測試專用工具的簡要概述,參見 +Documentation/dev-tools/testing-overview.rst + +.. class:: toc-title + + 目錄 + +.. toctree:: + :maxdepth: 2 + + sparse + +Todolist: + + - coccinelle + - kcov + - ubsan + - kmemleak + - kcsan + - kfence + - kgdb + - kselftest + - kunit/index + - testing-overview + - gcov + - kasan + - gdb-kernel-debugging diff --git a/Documentation/translations/zh_TW/index.rst b/Documentation/translations/zh_TW/index.rst index d1cf0b4d8e46..ffcaf3272fe7 100644 --- a/Documentation/translations/zh_TW/index.rst +++ b/Documentation/translations/zh_TW/index.rst @@ -55,11 +55,11 @@ TODOList: :maxdepth: 1
process/license-rules + dev-tools/index
TODOList:
* doc-guide/index -* dev-tools/index * dev-tools/testing-overview * kernel-hacking/index * rust/index
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/admin-guide/features.rst | 2 +- Documentation/arch/arc/features.rst | 2 +- Documentation/arch/arm/features.rst | 2 +- Documentation/arch/arm64/features.rst | 2 +- Documentation/arch/loongarch/features.rst | 2 +- Documentation/arch/m68k/features.rst | 2 +- Documentation/arch/mips/features.rst | 2 +- Documentation/arch/nios2/features.rst | 2 +- Documentation/arch/openrisc/features.rst | 2 +- Documentation/arch/parisc/features.rst | 2 +- Documentation/arch/s390/features.rst | 2 +- Documentation/arch/sh/features.rst | 2 +- Documentation/arch/sparc/features.rst | 2 +- Documentation/arch/x86/features.rst | 2 +- Documentation/arch/xtensa/features.rst | 2 +- Documentation/powerpc/features.rst | 2 +- Documentation/riscv/features.rst | 2 +- Documentation/sphinx/kernel_feat.py | 55 ++++--------------- .../zh_CN/arch/loongarch/features.rst | 2 +- .../translations/zh_CN/arch/mips/features.rst | 2 +- 20 files changed, 30 insertions(+), 63 deletions(-)
diff --git a/Documentation/admin-guide/features.rst b/Documentation/admin-guide/features.rst index 8c167082a84f..7651eca38227 100644 --- a/Documentation/admin-guide/features.rst +++ b/Documentation/admin-guide/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features +.. kernel-feat:: features diff --git a/Documentation/arch/arc/features.rst b/Documentation/arch/arc/features.rst index b793583d688a..49ff446ff744 100644 --- a/Documentation/arch/arc/features.rst +++ b/Documentation/arch/arc/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features arc +.. kernel-feat:: features arc diff --git a/Documentation/arch/arm/features.rst b/Documentation/arch/arm/features.rst index 7414ec03dd15..0e76aaf68eca 100644 --- a/Documentation/arch/arm/features.rst +++ b/Documentation/arch/arm/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features arm +.. kernel-feat:: features arm diff --git a/Documentation/arch/arm64/features.rst b/Documentation/arch/arm64/features.rst index dfa4cb3cd3ef..03321f4309d0 100644 --- a/Documentation/arch/arm64/features.rst +++ b/Documentation/arch/arm64/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features arm64 +.. kernel-feat:: features arm64 diff --git a/Documentation/arch/loongarch/features.rst b/Documentation/arch/loongarch/features.rst index ebacade3ea45..009f44c7951f 100644 --- a/Documentation/arch/loongarch/features.rst +++ b/Documentation/arch/loongarch/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features loongarch +.. kernel-feat:: features loongarch diff --git a/Documentation/arch/m68k/features.rst b/Documentation/arch/m68k/features.rst index 5107a2119472..de7f0ccf7fc8 100644 --- a/Documentation/arch/m68k/features.rst +++ b/Documentation/arch/m68k/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features m68k +.. kernel-feat:: features m68k diff --git a/Documentation/arch/mips/features.rst b/Documentation/arch/mips/features.rst index 1973d729b29a..6e0ffe3e7354 100644 --- a/Documentation/arch/mips/features.rst +++ b/Documentation/arch/mips/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features mips +.. kernel-feat:: features mips diff --git a/Documentation/arch/nios2/features.rst b/Documentation/arch/nios2/features.rst index 8449e63f69b2..89913810ccb5 100644 --- a/Documentation/arch/nios2/features.rst +++ b/Documentation/arch/nios2/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features nios2 +.. kernel-feat:: features nios2 diff --git a/Documentation/arch/openrisc/features.rst b/Documentation/arch/openrisc/features.rst index 3f7c40d219f2..bae2e25adfd6 100644 --- a/Documentation/arch/openrisc/features.rst +++ b/Documentation/arch/openrisc/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features openrisc +.. kernel-feat:: features openrisc diff --git a/Documentation/arch/parisc/features.rst b/Documentation/arch/parisc/features.rst index 501d7c450037..b3aa4d243b93 100644 --- a/Documentation/arch/parisc/features.rst +++ b/Documentation/arch/parisc/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features parisc +.. kernel-feat:: features parisc diff --git a/Documentation/arch/s390/features.rst b/Documentation/arch/s390/features.rst index 57c296a9d8f3..2883dc950681 100644 --- a/Documentation/arch/s390/features.rst +++ b/Documentation/arch/s390/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features s390 +.. kernel-feat:: features s390 diff --git a/Documentation/arch/sh/features.rst b/Documentation/arch/sh/features.rst index f722af3b6c99..fae48fe81e9b 100644 --- a/Documentation/arch/sh/features.rst +++ b/Documentation/arch/sh/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features sh +.. kernel-feat:: features sh diff --git a/Documentation/arch/sparc/features.rst b/Documentation/arch/sparc/features.rst index c0c92468b0fe..96835b6d598a 100644 --- a/Documentation/arch/sparc/features.rst +++ b/Documentation/arch/sparc/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features sparc +.. kernel-feat:: features sparc diff --git a/Documentation/arch/x86/features.rst b/Documentation/arch/x86/features.rst index b663f15053ce..a33616346a38 100644 --- a/Documentation/arch/x86/features.rst +++ b/Documentation/arch/x86/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features x86 +.. kernel-feat:: features x86 diff --git a/Documentation/arch/xtensa/features.rst b/Documentation/arch/xtensa/features.rst index 6b92c7bfa19d..28dcce1759be 100644 --- a/Documentation/arch/xtensa/features.rst +++ b/Documentation/arch/xtensa/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features xtensa +.. kernel-feat:: features xtensa diff --git a/Documentation/powerpc/features.rst b/Documentation/powerpc/features.rst index aeae73df86b0..ee4b95e04202 100644 --- a/Documentation/powerpc/features.rst +++ b/Documentation/powerpc/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features powerpc +.. kernel-feat:: features powerpc diff --git a/Documentation/riscv/features.rst b/Documentation/riscv/features.rst index c70ef6ac2368..36e90144adab 100644 --- a/Documentation/riscv/features.rst +++ b/Documentation/riscv/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features riscv +.. kernel-feat:: features riscv diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index 27b701ed3681..bdfaa3e4b202 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -37,8 +37,6 @@ import re import subprocess import sys
-from os import path - from docutils import nodes, statemachine from docutils.statemachine import ViewList from docutils.parsers.rst import directives, Directive @@ -76,33 +74,26 @@ class KernelFeat(Directive): self.state.document.settings.env.app.warn(message, prefix="")
def run(self): - doc = self.state.document if not doc.settings.file_insertion_enabled: raise self.warning("docutils: file insertion disabled")
env = doc.settings.env - cwd = path.dirname(doc.current_source) - cmd = "get_feat.pl rest --enable-fname --dir " - cmd += self.arguments[0] - - if len(self.arguments) > 1: - cmd += " --arch " + self.arguments[1]
- srctree = path.abspath(os.environ["srctree"]) + srctree = os.path.abspath(os.environ["srctree"])
- fname = cmd + args = [ + os.path.join(srctree, 'scripts/get_feat.pl'), + 'rest', + '--enable-fname', + '--dir', + os.path.join(srctree, 'Documentation', self.arguments[0]), + ]
- # extend PATH with $(srctree)/scripts - path_env = os.pathsep.join([ - srctree + os.sep + "scripts", - os.environ["PATH"] - ]) - shell_env = os.environ.copy() - shell_env["PATH"] = path_env - shell_env["srctree"] = srctree + if len(self.arguments) > 1: + args.extend(['--arch', self.arguments[1]])
- lines = self.runCmd(cmd, shell=True, cwd=cwd, env=shell_env) + lines = subprocess.check_output(args, cwd=os.path.dirname(doc.current_source)).decode('utf-8')
line_regex = re.compile("^.. FILE (\S+)$")
@@ -121,30 +112,6 @@ class KernelFeat(Directive): nodeList = self.nestedParse(out_lines, fname) return nodeList
- def runCmd(self, cmd, **kwargs): - u"""Run command ``cmd`` and return its stdout as unicode.""" - - try: - proc = subprocess.Popen( - cmd - , stdout = subprocess.PIPE - , stderr = subprocess.PIPE - , **kwargs - ) - out, err = proc.communicate() - - out, err = codecs.decode(out, 'utf-8'), codecs.decode(err, 'utf-8') - - if proc.returncode != 0: - raise self.severe( - u"command '%s' failed with return code %d" - % (cmd, proc.returncode) - ) - except OSError as exc: - raise self.severe(u"problems with '%s' directive: %s." - % (self.name, ErrorString(exc))) - return out - def nestedParse(self, lines, fname): content = ViewList() node = nodes.section() diff --git a/Documentation/translations/zh_CN/arch/loongarch/features.rst b/Documentation/translations/zh_CN/arch/loongarch/features.rst index 82bfac180bdc..cec38dda8298 100644 --- a/Documentation/translations/zh_CN/arch/loongarch/features.rst +++ b/Documentation/translations/zh_CN/arch/loongarch/features.rst @@ -5,4 +5,4 @@ :Original: Documentation/arch/loongarch/features.rst :Translator: Huacai Chen chenhuacai@loongson.cn
-.. kernel-feat:: $srctree/Documentation/features loongarch +.. kernel-feat:: features loongarch diff --git a/Documentation/translations/zh_CN/arch/mips/features.rst b/Documentation/translations/zh_CN/arch/mips/features.rst index da1b956e4a40..0d6df97db069 100644 --- a/Documentation/translations/zh_CN/arch/mips/features.rst +++ b/Documentation/translations/zh_CN/arch/mips/features.rst @@ -10,4 +10,4 @@
.. _cn_features:
-.. kernel-feat:: $srctree/Documentation/features mips +.. kernel-feat:: features mips
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$") /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/maintainers_include.py:80: SyntaxWarning: invalid escape sequence '\s' pat = '(Documentation/([^\s?*]*).rst)' /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/maintainers_include.py:93: SyntaxWarning: invalid escape sequence '\s' m = re.search("\s(\S):\s", line) /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/maintainers_include.py:97: SyntaxWarning: invalid escape sequence '*' m = re.search("*([^*]+)*", line) /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/maintainers_include.py:115: SyntaxWarning: invalid escape sequence '\s' heading = re.sub("\s+", " ", line) /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kernel_abi.py:105: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO (\S+)#([0-9]+)$") /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kernel_feat.py:98: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. FILE (\S+)$") Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Makefile:1715: htmldocs] Error 2
Reverting this patch allows docs to build.
Thanks, Justin
Justin Forbes jforbes@fedoraproject.org writes:
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Thanks,
jon
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote:
Justin Forbes jforbes@fedoraproject.org writes:
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Jusitn
Thanks,
jon
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote:
Justin Forbes jforbes@fedoraproject.org writes:
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
thanks,
greg k-h
Jusitn
Thanks,
jon
Greg Kroah-Hartman gregkh@linuxfoundation.org writes:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
Yes, it's an independent change but seems worth backporting if people are running into the issue.
Thanks,
jon
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote:
Justin Forbes jforbes@fedoraproject.org writes:
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
Justin
thanks,
greg k-h
Jusitn
Thanks,
jon
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote:
Justin Forbes jforbes@fedoraproject.org writes:
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote:
6.6-stable review patch. If anyone has any objections, please let me know.
From: Vegard Nossum vegard.nossum@oracle.com
[ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ]
The kernel-feat directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix command injection") where we did exactly the same thing for kernel_abi.py, somehow I completely missed this one.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
Justin
Justin
thanks,
greg k-h
Jusitn
Thanks,
jon
On Thu, Feb 1, 2024 at 8:58 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote:
Justin Forbes jforbes@fedoraproject.org writes:
On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote: > 6.6-stable review patch. If anyone has any objections, please let me know. > > ------------------ > > From: Vegard Nossum vegard.nossum@oracle.com > > [ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ] > > The kernel-feat directive passes its argument straight to the shell. > This is unfortunate and unnecessary. > > Let's always use paths relative to $srctree/Documentation/ and use > subprocess.check_call() instead of subprocess.Popen(shell=True). > > This also makes the code shorter. > > This is analogous to commit 3231dd586277 ("docs: kernel_abi.py: fix > command injection") where we did exactly the same thing for > kernel_abi.py, somehow I completely missed this one. > > Link: https://fosstodon.org/@jani/111676532203641247 > Reported-by: Jani Nikula jani.nikula@intel.com > Signed-off-by: Vegard Nossum vegard.nossum@oracle.com > Cc: stable@vger.kernel.org > Signed-off-by: Jonathan Corbet corbet@lwn.net > Link: https://lore.kernel.org/r/20240110174758.3680506-1-vegard.nossum@oracle.com > Signed-off-by: Sasha Levin sashal@kernel.org
This patch seems to be missing something. In 6.6.15-rc1 I get a doc build failure with:
/builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
I lied, it just fails slightly differently. Some of the noise is gone, but we still have: Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.15/linux-6.6.15-200.fc39.noarch/Makefile:1715: htmldocs] Error 2
Justin
Justin
thanks,
greg k-h
Jusitn
Thanks,
jon
On 01/02/2024 16:07, Justin Forbes wrote:
On Thu, Feb 1, 2024 at 8:58 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote:
Justin Forbes jforbes@fedoraproject.org writes: > On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote: >> 6.6-stable review patch. If anyone has any objections, please let me know. >> >> ------------------ >> >> From: Vegard Nossum vegard.nossum@oracle.com >> >> [ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ] >> >> The kernel-feat directive passes its argument straight to the shell. >> This is unfortunate and unnecessary.
[...]
> This patch seems to be missing something. In 6.6.15-rc1 I get a doc > build failure with: > > /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' > line_regex = re.compile("^.. LINENO ([0-9]+)$")
Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python string escapes). That is not a problem with this patch, though; I would expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
I lied, it just fails slightly differently. Some of the noise is gone, but we still have: Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.15/linux-6.6.15-200.fc39.noarch/Makefile:1715: htmldocs] Error 2
The old version of the script unconditionally assigned a value to the local variable 'fname' (not a value that makes sense to me, since it's literally assigning the whole command, not just a filename, but that's a separate issue), and I removed that so it's only conditionally assigned. This is almost certainly a bug in my patch.
I'm guessing maybe a different patch between 6.6 and current mainline is causing 'fname' to always get assigned for the newer versions and thus make the run succeed, in spite of the bug.
Something like the patch below (completely untested) should restore the previous behaviour, but I'm not convinced it's correct.
Vegard
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index b9df61eb4501..15713be8b657 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -93,6 +93,8 @@ class KernelFeat(Directive): if len(self.arguments) > 1: args.extend(['--arch', self.arguments[1]])
+ fname = ' '.join(args) + lines = subprocess.check_output(args, cwd=os.path.dirname(doc.current_source)).decode('utf-8')
line_regex = re.compile(r"^.. FILE (\S+)$")
Hi,
On Thu, Feb 01, 2024 at 05:34:25PM +0100, Vegard Nossum wrote:
On 01/02/2024 16:07, Justin Forbes wrote:
On Thu, Feb 1, 2024 at 8:58 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote:
On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote: > Justin Forbes jforbes@fedoraproject.org writes: > > On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote: > > > 6.6-stable review patch. If anyone has any objections, please let me know. > > > > > > ------------------ > > > > > > From: Vegard Nossum vegard.nossum@oracle.com > > > > > > [ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ] > > > > > > The kernel-feat directive passes its argument straight to the shell. > > > This is unfortunate and unnecessary.
[...]
> > This patch seems to be missing something. In 6.6.15-rc1 I get a doc > > build failure with: > > > > /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' > > line_regex = re.compile("^.. LINENO ([0-9]+)$") > > Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python > string escapes). That is not a problem with this patch, though; I would > expect you to get the same error (with Python 3.12) without.
Well, it appears that 6.6.15 shipped anyway, with this patch included, but not with 86a0adc029d3. If anyone else builds docs, this thread should at least show them the fix. Perhaps we can get the missing patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
I lied, it just fails slightly differently. Some of the noise is gone, but we still have: Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.15/linux-6.6.15-200.fc39.noarch/Makefile:1715: htmldocs] Error 2
The old version of the script unconditionally assigned a value to the local variable 'fname' (not a value that makes sense to me, since it's literally assigning the whole command, not just a filename, but that's a separate issue), and I removed that so it's only conditionally assigned. This is almost certainly a bug in my patch.
I'm guessing maybe a different patch between 6.6 and current mainline is causing 'fname' to always get assigned for the newer versions and thus make the run succeed, in spite of the bug.
Something like the patch below (completely untested) should restore the previous behaviour, but I'm not convinced it's correct.
Vegard
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index b9df61eb4501..15713be8b657 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -93,6 +93,8 @@ class KernelFeat(Directive): if len(self.arguments) > 1: args.extend(['--arch', self.arguments[1]])
fname = ' '.join(args)
lines = subprocess.check_output(args,
cwd=os.path.dirname(doc.current_source)).decode('utf-8')
line_regex = re.compile(r"^\.\. FILE (\S+)$")
We have as well a documention build problem in Debian, cf. https://buildd.debian.org/status/fetch.php?pkg=linux&arch=all&ver=6.... though not yet using python 3.12 as default.
Your above change seems to workaround the issue in fact, but need to do a full build yet.
Regards, Salvatore
On Sun, Feb 04, 2024 at 06:05:05PM +0100, Salvatore Bonaccorso wrote:
Hi,
On Thu, Feb 01, 2024 at 05:34:25PM +0100, Vegard Nossum wrote:
On 01/02/2024 16:07, Justin Forbes wrote:
On Thu, Feb 1, 2024 at 8:58 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote: > On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote: > > Justin Forbes jforbes@fedoraproject.org writes: > > > On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote: > > > > 6.6-stable review patch. If anyone has any objections, please let me know. > > > > > > > > ------------------ > > > > > > > > From: Vegard Nossum vegard.nossum@oracle.com > > > > > > > > [ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ] > > > > > > > > The kernel-feat directive passes its argument straight to the shell. > > > > This is unfortunate and unnecessary.
[...]
> > > This patch seems to be missing something. In 6.6.15-rc1 I get a doc > > > build failure with: > > > > > > /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' > > > line_regex = re.compile("^.. LINENO ([0-9]+)$") > > > > Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python > > string escapes). That is not a problem with this patch, though; I would > > expect you to get the same error (with Python 3.12) without. > > Well, it appears that 6.6.15 shipped anyway, with this patch included, > but not with 86a0adc029d3. If anyone else builds docs, this thread > should at least show them the fix. Perhaps we can get the missing > patch into 6.6.16?
Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
I lied, it just fails slightly differently. Some of the noise is gone, but we still have: Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.15/linux-6.6.15-200.fc39.noarch/Makefile:1715: htmldocs] Error 2
The old version of the script unconditionally assigned a value to the local variable 'fname' (not a value that makes sense to me, since it's literally assigning the whole command, not just a filename, but that's a separate issue), and I removed that so it's only conditionally assigned. This is almost certainly a bug in my patch.
I'm guessing maybe a different patch between 6.6 and current mainline is causing 'fname' to always get assigned for the newer versions and thus make the run succeed, in spite of the bug.
Something like the patch below (completely untested) should restore the previous behaviour, but I'm not convinced it's correct.
Vegard
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index b9df61eb4501..15713be8b657 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -93,6 +93,8 @@ class KernelFeat(Directive): if len(self.arguments) > 1: args.extend(['--arch', self.arguments[1]])
fname = ' '.join(args)
lines = subprocess.check_output(args,
cwd=os.path.dirname(doc.current_source)).decode('utf-8')
line_regex = re.compile(r"^\.\. FILE (\S+)$")
We have as well a documention build problem in Debian, cf. https://buildd.debian.org/status/fetch.php?pkg=linux&arch=all&ver=6.... though not yet using python 3.12 as default.
Your above change seems to workaround the issue in fact, but need to do a full build yet.
For Debian I'm temporarily reverting from the 6.6.15 upload:
e961f8c6966a ("docs: kernel_feat.py: fix potential command injection")
This is not the best solution, but unbreaks several other builds.
The alternative would be to apply Vegard's workaround or the proper solution for that.
Regards, Salvatore
On Sun, Feb 04, 2024 at 09:31:48PM +0100, Salvatore Bonaccorso wrote:
On Sun, Feb 04, 2024 at 06:05:05PM +0100, Salvatore Bonaccorso wrote:
Hi,
On Thu, Feb 01, 2024 at 05:34:25PM +0100, Vegard Nossum wrote:
On 01/02/2024 16:07, Justin Forbes wrote:
On Thu, Feb 1, 2024 at 8:58 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote: > On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote: > > On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote: > > > Justin Forbes jforbes@fedoraproject.org writes: > > > > On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote: > > > > > 6.6-stable review patch. If anyone has any objections, please let me know. > > > > > > > > > > ------------------ > > > > > > > > > > From: Vegard Nossum vegard.nossum@oracle.com > > > > > > > > > > [ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ] > > > > > > > > > > The kernel-feat directive passes its argument straight to the shell. > > > > > This is unfortunate and unnecessary.
[...]
> > > > This patch seems to be missing something. In 6.6.15-rc1 I get a doc > > > > build failure with: > > > > > > > > /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' > > > > line_regex = re.compile("^.. LINENO ([0-9]+)$") > > > > > > Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python > > > string escapes). That is not a problem with this patch, though; I would > > > expect you to get the same error (with Python 3.12) without. > > > > Well, it appears that 6.6.15 shipped anyway, with this patch included, > > but not with 86a0adc029d3. If anyone else builds docs, this thread > > should at least show them the fix. Perhaps we can get the missing > > patch into 6.6.16? > > Sure, but again, that should be independent of this change, right?
I am not sure I would say independent. This particular change causes docs to fail the build as I mentioned during rc1. There were no issues building 6.6.14 or previous releases, and no problem building 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
I lied, it just fails slightly differently. Some of the noise is gone, but we still have: Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.15/linux-6.6.15-200.fc39.noarch/Makefile:1715: htmldocs] Error 2
The old version of the script unconditionally assigned a value to the local variable 'fname' (not a value that makes sense to me, since it's literally assigning the whole command, not just a filename, but that's a separate issue), and I removed that so it's only conditionally assigned. This is almost certainly a bug in my patch.
I'm guessing maybe a different patch between 6.6 and current mainline is causing 'fname' to always get assigned for the newer versions and thus make the run succeed, in spite of the bug.
Something like the patch below (completely untested) should restore the previous behaviour, but I'm not convinced it's correct.
Vegard
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index b9df61eb4501..15713be8b657 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -93,6 +93,8 @@ class KernelFeat(Directive): if len(self.arguments) > 1: args.extend(['--arch', self.arguments[1]])
fname = ' '.join(args)
lines = subprocess.check_output(args,
cwd=os.path.dirname(doc.current_source)).decode('utf-8')
line_regex = re.compile(r"^\.\. FILE (\S+)$")
We have as well a documention build problem in Debian, cf. https://buildd.debian.org/status/fetch.php?pkg=linux&arch=all&ver=6.... though not yet using python 3.12 as default.
Your above change seems to workaround the issue in fact, but need to do a full build yet.
For Debian I'm temporarily reverting from the 6.6.15 upload:
e961f8c6966a ("docs: kernel_feat.py: fix potential command injection")
This is not the best solution, but unbreaks several other builds.
The alternative would be to apply Vegard's workaround or the proper solution for that.
What is the "proper" solution here? Does 6.8-rc3 work? What are we missing to be backported here?
thanks,
greg k-h
On Sun, Feb 4, 2024 at 7:29 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Feb 04, 2024 at 09:31:48PM +0100, Salvatore Bonaccorso wrote:
On Sun, Feb 04, 2024 at 06:05:05PM +0100, Salvatore Bonaccorso wrote:
Hi,
On Thu, Feb 01, 2024 at 05:34:25PM +0100, Vegard Nossum wrote:
On 01/02/2024 16:07, Justin Forbes wrote:
On Thu, Feb 1, 2024 at 8:58 AM Justin Forbes jforbes@fedoraproject.org wrote:
On Thu, Feb 1, 2024 at 8:41 AM Justin Forbes jforbes@fedoraproject.org wrote: > On Thu, Feb 1, 2024 at 8:25 AM Greg Kroah-Hartman > gregkh@linuxfoundation.org wrote: > > On Thu, Feb 01, 2024 at 06:43:46AM -0600, Justin Forbes wrote: > > > On Tue, Jan 30, 2024 at 10:21 AM Jonathan Corbet corbet@lwn.net wrote: > > > > Justin Forbes jforbes@fedoraproject.org writes: > > > > > On Mon, Jan 29, 2024 at 09:01:07AM -0800, Greg Kroah-Hartman wrote: > > > > > > 6.6-stable review patch. If anyone has any objections, please let me know. > > > > > > > > > > > > ------------------ > > > > > > > > > > > > From: Vegard Nossum vegard.nossum@oracle.com > > > > > > > > > > > > [ Upstream commit c48a7c44a1d02516309015b6134c9bb982e17008 ] > > > > > > > > > > > > The kernel-feat directive passes its argument straight to the shell. > > > > > > This is unfortunate and unnecessary.
[...]
> > > > > This patch seems to be missing something. In 6.6.15-rc1 I get a doc > > > > > build failure with: > > > > > > > > > > /builddir/build/BUILD/kernel-6.6.14-332-g1ff49073b88b/linux-6.6.15-0.rc1.1ff49073b88b.200.fc39.noarch/Documentation/sphinx/kerneldoc.py:133: SyntaxWarning: invalid escape sequence '.' > > > > > line_regex = re.compile("^.. LINENO ([0-9]+)$") > > > > > > > > Ah ... you're missing 86a0adc029d3 (Documentation/sphinx: fix Python > > > > string escapes). That is not a problem with this patch, though; I would > > > > expect you to get the same error (with Python 3.12) without. > > > > > > Well, it appears that 6.6.15 shipped anyway, with this patch included, > > > but not with 86a0adc029d3. If anyone else builds docs, this thread > > > should at least show them the fix. Perhaps we can get the missing > > > patch into 6.6.16? > > > > Sure, but again, that should be independent of this change, right? > > I am not sure I would say independent. This particular change causes > docs to fail the build as I mentioned during rc1. There were no > issues building 6.6.14 or previous releases, and no problem building > 6.7.3.
I can confirm that adding this patch to 6.6.15 makes docs build again.
I lied, it just fails slightly differently. Some of the noise is gone, but we still have: Sphinx parallel build error: UnboundLocalError: cannot access local variable 'fname' where it is not associated with a value make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2 make[1]: *** [/builddir/build/BUILD/kernel-6.6.15/linux-6.6.15-200.fc39.noarch/Makefile:1715: htmldocs] Error 2
The old version of the script unconditionally assigned a value to the local variable 'fname' (not a value that makes sense to me, since it's literally assigning the whole command, not just a filename, but that's a separate issue), and I removed that so it's only conditionally assigned. This is almost certainly a bug in my patch.
I'm guessing maybe a different patch between 6.6 and current mainline is causing 'fname' to always get assigned for the newer versions and thus make the run succeed, in spite of the bug.
Something like the patch below (completely untested) should restore the previous behaviour, but I'm not convinced it's correct.
Vegard
diff --git a/Documentation/sphinx/kernel_feat.py b/Documentation/sphinx/kernel_feat.py index b9df61eb4501..15713be8b657 100644 --- a/Documentation/sphinx/kernel_feat.py +++ b/Documentation/sphinx/kernel_feat.py @@ -93,6 +93,8 @@ class KernelFeat(Directive): if len(self.arguments) > 1: args.extend(['--arch', self.arguments[1]])
fname = ' '.join(args)
lines = subprocess.check_output(args,
cwd=os.path.dirname(doc.current_source)).decode('utf-8')
line_regex = re.compile(r"^\.\. FILE (\S+)$")
We have as well a documention build problem in Debian, cf. https://buildd.debian.org/status/fetch.php?pkg=linux&arch=all&ver=6.... though not yet using python 3.12 as default.
Your above change seems to workaround the issue in fact, but need to do a full build yet.
For Debian I'm temporarily reverting from the 6.6.15 upload:
e961f8c6966a ("docs: kernel_feat.py: fix potential command injection")
This is not the best solution, but unbreaks several other builds.
The alternative would be to apply Vegard's workaround or the proper solution for that.
What is the "proper" solution here? Does 6.8-rc3 work? What are we missing to be backported here?
I am not sure what the "proper"fix was, but as I mentioned with 6.6.15, this patch broke the build with 6.6.15, but 6.7.3 and newer were fine. I think the fix came in through another path incidentally, but Vegard mentioned a possible fox for 6.6 kernels. Realistically, Fedora has moved on to 6.7.x now, but I do still test 6.6.x stable rcs and while I reported that 6.6.16 was good, it was only because I saw no regressions from 6.6.15. The docs failure from this patch still exists.
Justin
thanks,
greg k-h
On 05/02/2024 03:36, Justin Forbes wrote:
On Sun, Feb 4, 2024 at 7:29 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
On Sun, Feb 04, 2024 at 09:31:48PM +0100, Salvatore Bonaccorso wrote:
On Sun, Feb 04, 2024 at 06:05:05PM +0100, Salvatore Bonaccorso wrote:
On Thu, Feb 01, 2024 at 05:34:25PM +0100, Vegard Nossum wrote:
[...]
I'm guessing maybe a different patch between 6.6 and current mainline is causing 'fname' to always get assigned for the newer versions and thus make the run succeed, in spite of the bug.
Something like the patch below (completely untested) should restore the previous behaviour, but I'm not convinced it's correct.
[...]
Your above change seems to workaround the issue in fact, but need to do a full build yet.
For Debian I'm temporarily reverting from the 6.6.15 upload:
e961f8c6966a ("docs: kernel_feat.py: fix potential command injection")
This is not the best solution, but unbreaks several other builds.
The alternative would be to apply Vegard's workaround or the proper solution for that.
What is the "proper" solution here? Does 6.8-rc3 work? What are we missing to be backported here?
I am not sure what the "proper"fix was, but as I mentioned with 6.6.15, this patch broke the build with 6.6.15, but 6.7.3 and newer were fine. I think the fix came in through another path incidentally, but Vegard mentioned a possible fox for 6.6 kernels. Realistically, Fedora has moved on to 6.7.x now, but I do still test 6.6.x stable rcs and while I reported that 6.6.16 was good, it was only because I saw no regressions from 6.6.15. The docs failure from this patch still exists.
I'm thinking this might be missing from the backport of the stable patch, causing it to break on 6.6.15 (since ia64 was removed in 6.7):
diff --git a/Documentation/arch/ia64/features.rst b/Documentation/arch/ia64/features.rst index d7226fdcf5f8..056838d2ab55 100644 --- a/Documentation/arch/ia64/features.rst +++ b/Documentation/arch/ia64/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features ia64 +.. kernel-feat:: features ia64
I also think I understand the kernel_feat.py code a bit better and will submit a proper fix for that -- basically, if it doesn't manage to process any lines from the referenced file (which it won't for ia64 since it literally tries to open a file called $srctree/...) then it calls self.nestedParse() with a non-existent 'fname' and an empty 'lines' -- so the value of the fname parameter is truly unused in the degenerate case, but it's still referencing the undefined 'fname'.
So I'm thinking:
1) the ia64/features.rst patch above for stable (only) to fix the immediate breakage in stable,
2) kernel_feat.py patch for mainline that can trickle down into stable once it's been merged.
Thanks, and sorry for the breakage.
Vegard
My mainline commit c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection") contains a bug which can manifests like this when building the documentation:
Sphinx parallel build error: UnboundLocalError: local variable 'fname' referenced before assignment make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2
However, this only appears when there exists a '.. kernel-feat::' directive that points to a non-existent file, which isn't the case in mainline.
When this commit was backported to stable 6.6, it didn't change Documentation/arch/ia64/features.rst since ia64 was removed in 6.7 in commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"). This lead to the build failure seen above -- but only in stable kernels.
This patch fixes the backport and should only be applied to kernels where Documentation/arch/ia64/features.rst exists and commit c48a7c44a1d0 has also been applied.
A second patch will follow to fix kernel_feat.py in mainline so that it doesn't error out when the '.. kernel-feat::' directive points to a nonexistent file.
Link: https://lore.kernel.org/all/ZbkfGst991YHqJHK@fedora64.linuxtx.org/ Fixes: e961f8c6966a ("docs: kernel_feat.py: fix potential command injection") # stable 6.6.15 Reported-by: Justin Forbes jforbes@fedoraproject.org Reported-y: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Vegard Nossum vegard.nossum@oracle.com --- Documentation/arch/ia64/features.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/arch/ia64/features.rst b/Documentation/arch/ia64/features.rst index d7226fdcf5f8..056838d2ab55 100644 --- a/Documentation/arch/ia64/features.rst +++ b/Documentation/arch/ia64/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0
-.. kernel-feat:: $srctree/Documentation/features ia64 +.. kernel-feat:: features ia64
On Mon, Feb 05, 2024 at 11:39:59AM +0100, Vegard Nossum wrote:
My mainline commit c48a7c44a1d0 ("docs: kernel_feat.py: fix potential command injection") contains a bug which can manifests like this when building the documentation:
Sphinx parallel build error: UnboundLocalError: local variable 'fname' referenced before assignment make[2]: *** [Documentation/Makefile:102: htmldocs] Error 2
However, this only appears when there exists a '.. kernel-feat::' directive that points to a non-existent file, which isn't the case in mainline.
When this commit was backported to stable 6.6, it didn't change Documentation/arch/ia64/features.rst since ia64 was removed in 6.7 in commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"). This lead to the build failure seen above -- but only in stable kernels.
This patch fixes the backport and should only be applied to kernels where Documentation/arch/ia64/features.rst exists and commit c48a7c44a1d0 has also been applied.
A second patch will follow to fix kernel_feat.py in mainline so that it doesn't error out when the '.. kernel-feat::' directive points to a nonexistent file.
Link: https://lore.kernel.org/all/ZbkfGst991YHqJHK@fedora64.linuxtx.org/ Fixes: e961f8c6966a ("docs: kernel_feat.py: fix potential command injection") # stable 6.6.15 Reported-by: Justin Forbes jforbes@fedoraproject.org Reported-y: Salvatore Bonaccorso carnil@debian.org Signed-off-by: Vegard Nossum vegard.nossum@oracle.com
Documentation/arch/ia64/features.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/arch/ia64/features.rst b/Documentation/arch/ia64/features.rst index d7226fdcf5f8..056838d2ab55 100644 --- a/Documentation/arch/ia64/features.rst +++ b/Documentation/arch/ia64/features.rst @@ -1,3 +1,3 @@ .. SPDX-License-Identifier: GPL-2.0 -.. kernel-feat:: $srctree/Documentation/features ia64
+.. kernel-feat:: features ia64
2.34.1
Sorry for the delay, now queued up.
greg k-h
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
[ Upstream commit 7cda0b9eb6eb9e761f452e2ef4e81eca20b19938 ]
Simplify uart_get_rs485_mode() by using temporary variable for the GPIO descriptor. With that, use proper type for the flags of the GPIO descriptor.
Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Link: https://lore.kernel.org/r/20231003142346.3072929-1-andriy.shevchenko@linux.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 1a33e33ca0e8 ("serial: core: set missing supported flag for RX during TX GPIO") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/serial_core.c | 30 ++++++++++++------------------ 1 file changed, 12 insertions(+), 18 deletions(-)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 18b49b1439a5..021e096bc4ed 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -3572,9 +3572,10 @@ int uart_get_rs485_mode(struct uart_port *port) { struct serial_rs485 *rs485conf = &port->rs485; struct device *dev = port->dev; + enum gpiod_flags dflags; + struct gpio_desc *desc; u32 rs485_delay[2]; int ret; - int rx_during_tx_gpio_flag;
if (!(port->rs485_supported.flags & SER_RS485_ENABLED)) return 0; @@ -3616,26 +3617,19 @@ int uart_get_rs485_mode(struct uart_port *port) * bus participants enable it, no communication is possible at all. * Works fine for short cables and users may enable for longer cables. */ - port->rs485_term_gpio = devm_gpiod_get_optional(dev, "rs485-term", - GPIOD_OUT_LOW); - if (IS_ERR(port->rs485_term_gpio)) { - ret = PTR_ERR(port->rs485_term_gpio); - port->rs485_term_gpio = NULL; - return dev_err_probe(dev, ret, "Cannot get rs485-term-gpios\n"); - } + desc = devm_gpiod_get_optional(dev, "rs485-term", GPIOD_OUT_LOW); + if (IS_ERR(desc)) + return dev_err_probe(dev, PTR_ERR(desc), "Cannot get rs485-term-gpios\n"); + port->rs485_term_gpio = desc; if (port->rs485_term_gpio) port->rs485_supported.flags |= SER_RS485_TERMINATE_BUS;
- rx_during_tx_gpio_flag = (rs485conf->flags & SER_RS485_RX_DURING_TX) ? - GPIOD_OUT_HIGH : GPIOD_OUT_LOW; - port->rs485_rx_during_tx_gpio = devm_gpiod_get_optional(dev, - "rs485-rx-during-tx", - rx_during_tx_gpio_flag); - if (IS_ERR(port->rs485_rx_during_tx_gpio)) { - ret = PTR_ERR(port->rs485_rx_during_tx_gpio); - port->rs485_rx_during_tx_gpio = NULL; - return dev_err_probe(dev, ret, "Cannot get rs485-rx-during-tx-gpios\n"); - } + dflags = (rs485conf->flags & SER_RS485_RX_DURING_TX) ? + GPIOD_OUT_HIGH : GPIOD_OUT_LOW; + desc = devm_gpiod_get_optional(dev, "rs485-rx-during-tx", dflags); + if (IS_ERR(desc)) + return dev_err_probe(dev, PTR_ERR(desc), "Cannot get rs485-rx-during-tx-gpios\n"); + port->rs485_rx_during_tx_gpio = desc;
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lino Sanfilippo l.sanfilippo@kunbus.com
[ Upstream commit 1a33e33ca0e80d485458410f149265cdc0178cfa ]
If the RS485 feature RX-during-TX is supported by means of a GPIO set the according supported flag. Otherwise setting this feature from userspace may not be possible, since in uart_sanitize_serial_rs485() the passed RS485 configuration is matched against the supported features and unsupported settings are thereby removed and thus take no effect.
Cc: stable@vger.kernel.org Fixes: 163f080eb717 ("serial: core: Add option to output RS485 RX_DURING_TX state via GPIO") Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: Lino Sanfilippo l.sanfilippo@kunbus.com Link: https://lore.kernel.org/r/20240103061818.564-3-l.sanfilippo@kunbus.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/serial_core.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 021e096bc4ed..bde5b6015305 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -3630,6 +3630,8 @@ int uart_get_rs485_mode(struct uart_port *port) if (IS_ERR(desc)) return dev_err_probe(dev, PTR_ERR(desc), "Cannot get rs485-rx-during-tx-gpios\n"); port->rs485_rx_during_tx_gpio = desc; + if (port->rs485_rx_during_tx_gpio) + port->rs485_supported.flags |= SER_RS485_RX_DURING_TX;
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit 6543ac13c623f906200dfd3f1c407d8d333b6995 ]
The existing SoundWire support misses a clear Controller/Manager hiearchical definition to deal with all variants across SOC vendors.
a) Intel platforms have one controller with 4 or more Managers. b) AMD platforms have two controllers with one Manager each, but due to BIOS issues use two different link_id values within the scope of a single controller. c) QCOM platforms have one or more controller with one Manager each.
This patch adds a 'controller_id' which can be set by higher levels. If assigned to -1, the controller_id will be set to the system-unique IDA-assigned bus->id.
The main change is that the bus->id is no longer used for any device name, which makes the definition completely predictable and not dependent on any enumeration order. The bus->id is only used to insert the Managers in the stream rt context.
Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Vijendar Mukunda Vijendar.Mukunda@amd.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Tested-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/stable/20231017160933.12624-2-pierre-louis.bossart%4... Tested-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20231017160933.12624-2-pierre-louis.bossart@linux.... Signed-off-by: Vinod Koul vkoul@kernel.org Stable-dep-of: 8a8a9ac8a497 ("soundwire: fix initializing sysfs for same devices on different buses") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soundwire/amd_manager.c | 8 ++++++++ drivers/soundwire/bus.c | 4 ++++ drivers/soundwire/debugfs.c | 2 +- drivers/soundwire/intel_auxdevice.c | 3 +++ drivers/soundwire/master.c | 2 +- drivers/soundwire/qcom.c | 3 +++ include/linux/soundwire/sdw.h | 4 +++- 7 files changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/soundwire/amd_manager.c b/drivers/soundwire/amd_manager.c index 3a99f6dcdfaf..a3b1f4e6f0f9 100644 --- a/drivers/soundwire/amd_manager.c +++ b/drivers/soundwire/amd_manager.c @@ -927,6 +927,14 @@ static int amd_sdw_manager_probe(struct platform_device *pdev) amd_manager->bus.clk_stop_timeout = 200; amd_manager->bus.link_id = amd_manager->instance;
+ /* + * Due to BIOS compatibility, the two links are exposed within + * the scope of a single controller. If this changes, the + * controller_id will have to be updated with drv_data + * information. + */ + amd_manager->bus.controller_id = 0; + switch (amd_manager->instance) { case ACP_SDW0: amd_manager->num_dout_ports = AMD_SDW0_MAX_TX_PORTS; diff --git a/drivers/soundwire/bus.c b/drivers/soundwire/bus.c index 0e7bc3c40f9d..e7553c38be59 100644 --- a/drivers/soundwire/bus.c +++ b/drivers/soundwire/bus.c @@ -22,6 +22,10 @@ static int sdw_get_id(struct sdw_bus *bus) return rc;
bus->id = rc; + + if (bus->controller_id == -1) + bus->controller_id = rc; + return 0; }
diff --git a/drivers/soundwire/debugfs.c b/drivers/soundwire/debugfs.c index d1553cb77187..67abd7e52f09 100644 --- a/drivers/soundwire/debugfs.c +++ b/drivers/soundwire/debugfs.c @@ -20,7 +20,7 @@ void sdw_bus_debugfs_init(struct sdw_bus *bus) return;
/* create the debugfs master-N */ - snprintf(name, sizeof(name), "master-%d-%d", bus->id, bus->link_id); + snprintf(name, sizeof(name), "master-%d-%d", bus->controller_id, bus->link_id); bus->debugfs = debugfs_create_dir(name, sdw_debugfs_root); }
diff --git a/drivers/soundwire/intel_auxdevice.c b/drivers/soundwire/intel_auxdevice.c index 7f15e3549e53..93698532deac 100644 --- a/drivers/soundwire/intel_auxdevice.c +++ b/drivers/soundwire/intel_auxdevice.c @@ -234,6 +234,9 @@ static int intel_link_probe(struct auxiliary_device *auxdev, cdns->instance = sdw->instance; cdns->msg_count = 0;
+ /* single controller for all SoundWire links */ + bus->controller_id = 0; + bus->link_id = auxdev->id; bus->clk_stop_timeout = 1;
diff --git a/drivers/soundwire/master.c b/drivers/soundwire/master.c index 9b05c9e25ebe..51abedbbaa66 100644 --- a/drivers/soundwire/master.c +++ b/drivers/soundwire/master.c @@ -145,7 +145,7 @@ int sdw_master_device_add(struct sdw_bus *bus, struct device *parent, md->dev.fwnode = fwnode; md->dev.dma_mask = parent->dma_mask;
- dev_set_name(&md->dev, "sdw-master-%d", bus->id); + dev_set_name(&md->dev, "sdw-master-%d-%d", bus->controller_id, bus->link_id);
ret = device_register(&md->dev); if (ret) { diff --git a/drivers/soundwire/qcom.c b/drivers/soundwire/qcom.c index 55be9f4b8d59..e3ae4e4e07ac 100644 --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -1612,6 +1612,9 @@ static int qcom_swrm_probe(struct platform_device *pdev) } }
+ /* FIXME: is there a DT-defined value to use ? */ + ctrl->bus.controller_id = -1; + ret = sdw_bus_master_add(&ctrl->bus, dev, dev->fwnode); if (ret) { dev_err(dev, "Failed to register Soundwire controller (%d)\n", diff --git a/include/linux/soundwire/sdw.h b/include/linux/soundwire/sdw.h index 4f3d14bb1538..c383579a008b 100644 --- a/include/linux/soundwire/sdw.h +++ b/include/linux/soundwire/sdw.h @@ -886,7 +886,8 @@ struct sdw_master_ops { * struct sdw_bus - SoundWire bus * @dev: Shortcut to &bus->md->dev to avoid changing the entire code. * @md: Master device - * @link_id: Link id number, can be 0 to N, unique for each Master + * @controller_id: system-unique controller ID. If set to -1, the bus @id will be used. + * @link_id: Link id number, can be 0 to N, unique for each Controller * @id: bus system-wide unique id * @slaves: list of Slaves on this bus * @assigned: Bitmap for Slave device numbers. @@ -918,6 +919,7 @@ struct sdw_master_ops { struct sdw_bus { struct device *dev; struct sdw_master_device *md; + int controller_id; unsigned int link_id; int id; struct list_head slaves;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
[ Upstream commit 8a8a9ac8a4972ee69d3dd3d1ae43963ae39cee18 ]
If same devices with same device IDs are present on different soundwire buses, the probe fails due to conflicting device names and sysfs entries:
sysfs: cannot create duplicate filename '/bus/soundwire/devices/sdw:0:0217:0204:00:0'
The link ID is 0 for both devices, so they should be differentiated by the controller ID. Add the controller ID so, the device names and sysfs entries look like:
sdw:1:0:0217:0204:00:0 -> ../../../devices/platform/soc@0/6ab0000.soundwire-controller/sdw-master-1-0/sdw:1:0:0217:0204:00:0 sdw:3:0:0217:0204:00:0 -> ../../../devices/platform/soc@0/6b10000.soundwire-controller/sdw-master-3-0/sdw:3:0:0217:0204:00:0
[PLB changes: use bus->controller_id instead of bus->id]
Fixes: 7c3cd189b86d ("soundwire: Add Master registration") Cc: stable@vger.kernel.org Reviewed-by: Bard Liao yung-chuan.liao@linux.intel.com Reviewed-by: Vijendar Mukunda Vijendar.Mukunda@amd.com Co-developed-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Tested-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Acked-by: Mark Brown broonie@kernel.org Tested-by: Srinivas Kandagatla srinivas.kandagatla@linaro.org Link: https://lore.kernel.org/r/20231017160933.12624-3-pierre-louis.bossart@linux.... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soundwire/slave.c | 12 ++++++------ sound/soc/intel/boards/sof_sdw.c | 4 ++-- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/soundwire/slave.c b/drivers/soundwire/slave.c index c1c1a2ac293a..060c2982e26b 100644 --- a/drivers/soundwire/slave.c +++ b/drivers/soundwire/slave.c @@ -39,14 +39,14 @@ int sdw_slave_add(struct sdw_bus *bus, slave->dev.fwnode = fwnode;
if (id->unique_id == SDW_IGNORED_UNIQUE_ID) { - /* name shall be sdw:link:mfg:part:class */ - dev_set_name(&slave->dev, "sdw:%01x:%04x:%04x:%02x", - bus->link_id, id->mfg_id, id->part_id, + /* name shall be sdw:ctrl:link:mfg:part:class */ + dev_set_name(&slave->dev, "sdw:%01x:%01x:%04x:%04x:%02x", + bus->controller_id, bus->link_id, id->mfg_id, id->part_id, id->class_id); } else { - /* name shall be sdw:link:mfg:part:class:unique */ - dev_set_name(&slave->dev, "sdw:%01x:%04x:%04x:%02x:%01x", - bus->link_id, id->mfg_id, id->part_id, + /* name shall be sdw:ctrl:link:mfg:part:class:unique */ + dev_set_name(&slave->dev, "sdw:%01x:%01x:%04x:%04x:%02x:%01x", + bus->controller_id, bus->link_id, id->mfg_id, id->part_id, id->class_id, id->unique_id); }
diff --git a/sound/soc/intel/boards/sof_sdw.c b/sound/soc/intel/boards/sof_sdw.c index 24e966a2ac2b..9ed572141fe5 100644 --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -1197,11 +1197,11 @@ static int fill_sdw_codec_dlc(struct device *dev, else if (is_unique_device(adr_link, sdw_version, mfg_id, part_id, class_id, adr_index)) codec->name = devm_kasprintf(dev, GFP_KERNEL, - "sdw:%01x:%04x:%04x:%02x", link_id, + "sdw:0:%01x:%04x:%04x:%02x", link_id, mfg_id, part_id, class_id); else codec->name = devm_kasprintf(dev, GFP_KERNEL, - "sdw:%01x:%04x:%04x:%02x:%01x", link_id, + "sdw:0:%01x:%04x:%04x:%02x:%01x", link_id, mfg_id, part_id, class_id, unique_id);
if (!codec->name)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rohan G Thomas rohan.g.thomas@intel.com
[ Upstream commit 8452a05b2c633b708dbe3e742f71b24bf21fe42d ]
Add sw fallback of tx checksum calculation for those tx queues that don't support tx checksum offloading. DW xGMAC IP can be synthesized such that it can support tx checksum offloading only for a few initial tx queues. Also as Serge pointed out, for the DW QoS IP, tx coe can be individually configured for each tx queue.
So when tx coe is enabled, for any tx queue that doesn't support tx coe with 'coe-unsupported' flag set will have a sw fallback happen in the driver for tx checksum calculation when any packets to be transmitted on these tx queues.
Signed-off-by: Rohan G Thomas rohan.g.thomas@intel.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: c2945c435c99 ("net: stmmac: Prevent DSA tags from breaking COE") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 10 ++++++++++ drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c | 3 +++ include/linux/stmmac.h | 1 + 3 files changed, 14 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 1bfcf673b3ce..59e07efe08c9 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4417,6 +4417,16 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) WARN_ON(tx_q->tx_skbuff[first_entry]);
csum_insertion = (skb->ip_summed == CHECKSUM_PARTIAL); + /* DWMAC IPs can be synthesized to support tx coe only for a few tx + * queues. In that case, checksum offloading for those queues that don't + * support tx coe needs to fallback to software checksum calculation. + */ + if (csum_insertion && + priv->plat->tx_queues_cfg[queue].coe_unsupported) { + if (unlikely(skb_checksum_help(skb))) + goto dma_map_err; + csum_insertion = !csum_insertion; + }
if (likely(priv->extend_desc)) desc = (struct dma_desc *)(tx_q->dma_etx + entry); diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c index 2f0678f15fb7..30d5e635190e 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_platform.c @@ -276,6 +276,9 @@ static int stmmac_mtl_setup(struct platform_device *pdev, plat->tx_queues_cfg[queue].use_prio = true; }
+ plat->tx_queues_cfg[queue].coe_unsupported = + of_property_read_bool(q_node, "snps,coe-unsupported"); + queue++; } if (queue != plat->tx_queues_to_use) { diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h index e3f7ee169c08..5acb77968902 100644 --- a/include/linux/stmmac.h +++ b/include/linux/stmmac.h @@ -139,6 +139,7 @@ struct stmmac_rxq_cfg {
struct stmmac_txq_cfg { u32 weight; + bool coe_unsupported; u8 mode_to_use; /* Credit Base Shaper parameters */ u32 send_slope;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Romain Gantois romain.gantois@bootlin.com
[ Upstream commit c2945c435c999c63e47f337bc7c13c98c21d0bcc ]
Some DSA tagging protocols change the EtherType field in the MAC header e.g. DSA_TAG_PROTO_(DSA/EDSA/BRCM/MTK/RTL4C_A/SJA1105). On TX these tagged frames are ignored by the checksum offload engine and IP header checker of some stmmac cores.
On RX, the stmmac driver wrongly assumes that checksums have been computed for these tagged packets, and sets CHECKSUM_UNNECESSARY.
Add an additional check in the stmmac TX and RX hotpaths so that COE is deactivated for packets with ethertypes that will not trigger the COE and IP header checks.
Fixes: 6b2c6e4a938f ("net: stmmac: propagate feature flags to vlan") Cc: stable@vger.kernel.org Reported-by: Richard Tresidder rtresidd@electromag.com.au Link: https://lore.kernel.org/netdev/e5c6c75f-2dfa-4e50-a1fb-6bf4cdb617c2@electrom... Reported-by: Romain Gantois romain.gantois@bootlin.com Link: https://lore.kernel.org/netdev/c57283ed-6b9b-b0e6-ee12-5655c1c54495@bootlin.... Reviewed-by: Vladimir Oltean vladimir.oltean@nxp.com Reviewed-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Romain Gantois romain.gantois@bootlin.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/stmicro/stmmac/stmmac_main.c | 32 +++++++++++++++++-- 1 file changed, 29 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 59e07efe08c9..684ec7058c82 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -4356,6 +4356,28 @@ static netdev_tx_t stmmac_tso_xmit(struct sk_buff *skb, struct net_device *dev) return NETDEV_TX_OK; }
+/** + * stmmac_has_ip_ethertype() - Check if packet has IP ethertype + * @skb: socket buffer to check + * + * Check if a packet has an ethertype that will trigger the IP header checks + * and IP/TCP checksum engine of the stmmac core. + * + * Return: true if the ethertype can trigger the checksum engine, false + * otherwise + */ +static bool stmmac_has_ip_ethertype(struct sk_buff *skb) +{ + int depth = 0; + __be16 proto; + + proto = __vlan_get_protocol(skb, eth_header_parse_protocol(skb), + &depth); + + return (depth <= ETH_HLEN) && + (proto == htons(ETH_P_IP) || proto == htons(ETH_P_IPV6)); +} + /** * stmmac_xmit - Tx entry point of the driver * @skb : the socket buffer @@ -4420,9 +4442,13 @@ static netdev_tx_t stmmac_xmit(struct sk_buff *skb, struct net_device *dev) /* DWMAC IPs can be synthesized to support tx coe only for a few tx * queues. In that case, checksum offloading for those queues that don't * support tx coe needs to fallback to software checksum calculation. + * + * Packets that won't trigger the COE e.g. most DSA-tagged packets will + * also have to be checksummed in software. */ if (csum_insertion && - priv->plat->tx_queues_cfg[queue].coe_unsupported) { + (priv->plat->tx_queues_cfg[queue].coe_unsupported || + !stmmac_has_ip_ethertype(skb))) { if (unlikely(skb_checksum_help(skb))) goto dma_map_err; csum_insertion = !csum_insertion; @@ -4982,7 +5008,7 @@ static void stmmac_dispatch_skb_zc(struct stmmac_priv *priv, u32 queue, stmmac_rx_vlan(priv->dev, skb); skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe)) + if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb)) skb_checksum_none_assert(skb); else skb->ip_summed = CHECKSUM_UNNECESSARY; @@ -5498,7 +5524,7 @@ static int stmmac_rx(struct stmmac_priv *priv, int limit, u32 queue) stmmac_rx_vlan(priv->dev, skb); skb->protocol = eth_type_trans(skb, priv->dev);
- if (unlikely(!coe)) + if (unlikely(!coe) || !stmmac_has_ip_ethertype(skb)) skb_checksum_none_assert(skb); else skb->ip_summed = CHECKSUM_UNNECESSARY;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marcelo Schmitt marcelo.schmitt@analog.com
[ Upstream commit 149694f5e79b0c7a36ceb76e7c0d590db8f151c1 ]
The ad7091r-base driver sets up an interrupt handler for firing events when inputs are either above or below a certain threshold. However, for the interrupt signal to come from the device it must be configured to enable the ALERT/BUSY/GPO pin to be used as ALERT, which was not being done until now. Enable interrupt signals on the ALERT/BUSY/GPO pin by setting the proper bit in the configuration register.
Signed-off-by: Marcelo Schmitt marcelo.schmitt@analog.com Link: https://lore.kernel.org/r/e8da2ee98d6df88318b14baf3dc9630e20218418.170274624... Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Stable-dep-of: 020e71c7ffc2 ("iio: adc: ad7091r: Allow users to configure device events") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad7091r-base.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/iio/adc/ad7091r-base.c b/drivers/iio/adc/ad7091r-base.c index 0e5d3d2e9c98..8aaa854f816f 100644 --- a/drivers/iio/adc/ad7091r-base.c +++ b/drivers/iio/adc/ad7091r-base.c @@ -28,6 +28,7 @@ #define AD7091R_REG_RESULT_CONV_RESULT(x) ((x) & 0xfff)
/* AD7091R_REG_CONF */ +#define AD7091R_REG_CONF_ALERT_EN BIT(4) #define AD7091R_REG_CONF_AUTO BIT(8) #define AD7091R_REG_CONF_CMD BIT(10)
@@ -232,6 +233,11 @@ int ad7091r_probe(struct device *dev, const char *name, iio_dev->channels = chip_info->channels;
if (irq) { + ret = regmap_update_bits(st->map, AD7091R_REG_CONF, + AD7091R_REG_CONF_ALERT_EN, BIT(4)); + if (ret) + return ret; + ret = devm_request_threaded_irq(dev, irq, NULL, ad7091r_event_handler, IRQF_TRIGGER_FALLING | IRQF_ONESHOT, name, iio_dev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marcelo Schmitt marcelo.schmitt@analog.com
[ Upstream commit 020e71c7ffc25dfe29ed9be6c2d39af7bd7f661f ]
AD7091R-5 devices are supported by the ad7091r-5 driver together with the ad7091r-base driver. Those drivers declared iio events for notifying user space when ADC readings fall bellow the thresholds of low limit registers or above the values set in high limit registers. However, to configure iio events and their thresholds, a set of callback functions must be implemented and those were not present until now. The consequence of trying to configure ad7091r-5 events without the proper callback functions was a null pointer dereference in the kernel because the pointers to the callback functions were not set.
Implement event configuration callbacks allowing users to read/write event thresholds and enable/disable event generation.
Since the event spec structs are generic to AD7091R devices, also move those from the ad7091r-5 driver the base driver so they can be reused when support for ad7091r-2/-4/-8 be added.
Fixes: ca69300173b6 ("iio: adc: Add support for AD7091R5 ADC") Suggested-by: David Lechner dlechner@baylibre.com Signed-off-by: Marcelo Schmitt marcelo.schmitt@analog.com Link: https://lore.kernel.org/r/59552d3548dabd56adc3107b7b4869afee2b0c3c.170301335... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad7091r-base.c | 156 +++++++++++++++++++++++++++++++++ drivers/iio/adc/ad7091r-base.h | 6 ++ drivers/iio/adc/ad7091r5.c | 28 +----- 3 files changed, 166 insertions(+), 24 deletions(-)
diff --git a/drivers/iio/adc/ad7091r-base.c b/drivers/iio/adc/ad7091r-base.c index 8aaa854f816f..3d36bcd26b0c 100644 --- a/drivers/iio/adc/ad7091r-base.c +++ b/drivers/iio/adc/ad7091r-base.c @@ -6,6 +6,7 @@ */
#include <linux/bitops.h> +#include <linux/bitfield.h> #include <linux/iio/events.h> #include <linux/iio/iio.h> #include <linux/interrupt.h> @@ -50,6 +51,27 @@ struct ad7091r_state { struct mutex lock; /*lock to prevent concurent reads */ };
+const struct iio_event_spec ad7091r_events[] = { + { + .type = IIO_EV_TYPE_THRESH, + .dir = IIO_EV_DIR_RISING, + .mask_separate = BIT(IIO_EV_INFO_VALUE) | + BIT(IIO_EV_INFO_ENABLE), + }, + { + .type = IIO_EV_TYPE_THRESH, + .dir = IIO_EV_DIR_FALLING, + .mask_separate = BIT(IIO_EV_INFO_VALUE) | + BIT(IIO_EV_INFO_ENABLE), + }, + { + .type = IIO_EV_TYPE_THRESH, + .dir = IIO_EV_DIR_EITHER, + .mask_separate = BIT(IIO_EV_INFO_HYSTERESIS), + }, +}; +EXPORT_SYMBOL_NS_GPL(ad7091r_events, IIO_AD7091R); + static int ad7091r_set_mode(struct ad7091r_state *st, enum ad7091r_mode mode) { int ret, conf; @@ -169,8 +191,142 @@ static int ad7091r_read_raw(struct iio_dev *iio_dev, return ret; }
+static int ad7091r_read_event_config(struct iio_dev *indio_dev, + const struct iio_chan_spec *chan, + enum iio_event_type type, + enum iio_event_direction dir) +{ + struct ad7091r_state *st = iio_priv(indio_dev); + int val, ret; + + switch (dir) { + case IIO_EV_DIR_RISING: + ret = regmap_read(st->map, + AD7091R_REG_CH_HIGH_LIMIT(chan->channel), + &val); + if (ret) + return ret; + return val != AD7091R_HIGH_LIMIT; + case IIO_EV_DIR_FALLING: + ret = regmap_read(st->map, + AD7091R_REG_CH_LOW_LIMIT(chan->channel), + &val); + if (ret) + return ret; + return val != AD7091R_LOW_LIMIT; + default: + return -EINVAL; + } +} + +static int ad7091r_write_event_config(struct iio_dev *indio_dev, + const struct iio_chan_spec *chan, + enum iio_event_type type, + enum iio_event_direction dir, int state) +{ + struct ad7091r_state *st = iio_priv(indio_dev); + + if (state) { + return regmap_set_bits(st->map, AD7091R_REG_CONF, + AD7091R_REG_CONF_ALERT_EN); + } else { + /* + * Set thresholds either to 0 or to 2^12 - 1 as appropriate to + * prevent alerts and thus disable event generation. + */ + switch (dir) { + case IIO_EV_DIR_RISING: + return regmap_write(st->map, + AD7091R_REG_CH_HIGH_LIMIT(chan->channel), + AD7091R_HIGH_LIMIT); + case IIO_EV_DIR_FALLING: + return regmap_write(st->map, + AD7091R_REG_CH_LOW_LIMIT(chan->channel), + AD7091R_LOW_LIMIT); + default: + return -EINVAL; + } + } +} + +static int ad7091r_read_event_value(struct iio_dev *indio_dev, + const struct iio_chan_spec *chan, + enum iio_event_type type, + enum iio_event_direction dir, + enum iio_event_info info, int *val, int *val2) +{ + struct ad7091r_state *st = iio_priv(indio_dev); + int ret; + + switch (info) { + case IIO_EV_INFO_VALUE: + switch (dir) { + case IIO_EV_DIR_RISING: + ret = regmap_read(st->map, + AD7091R_REG_CH_HIGH_LIMIT(chan->channel), + val); + if (ret) + return ret; + return IIO_VAL_INT; + case IIO_EV_DIR_FALLING: + ret = regmap_read(st->map, + AD7091R_REG_CH_LOW_LIMIT(chan->channel), + val); + if (ret) + return ret; + return IIO_VAL_INT; + default: + return -EINVAL; + } + case IIO_EV_INFO_HYSTERESIS: + ret = regmap_read(st->map, + AD7091R_REG_CH_HYSTERESIS(chan->channel), + val); + if (ret) + return ret; + return IIO_VAL_INT; + default: + return -EINVAL; + } +} + +static int ad7091r_write_event_value(struct iio_dev *indio_dev, + const struct iio_chan_spec *chan, + enum iio_event_type type, + enum iio_event_direction dir, + enum iio_event_info info, int val, int val2) +{ + struct ad7091r_state *st = iio_priv(indio_dev); + + switch (info) { + case IIO_EV_INFO_VALUE: + switch (dir) { + case IIO_EV_DIR_RISING: + return regmap_write(st->map, + AD7091R_REG_CH_HIGH_LIMIT(chan->channel), + val); + case IIO_EV_DIR_FALLING: + return regmap_write(st->map, + AD7091R_REG_CH_LOW_LIMIT(chan->channel), + val); + default: + return -EINVAL; + } + case IIO_EV_INFO_HYSTERESIS: + return regmap_write(st->map, + AD7091R_REG_CH_HYSTERESIS(chan->channel), + val); + default: + return -EINVAL; + } +} + static const struct iio_info ad7091r_info = { .read_raw = ad7091r_read_raw, + .read_event_config = &ad7091r_read_event_config, + .write_event_config = &ad7091r_write_event_config, + .read_event_value = &ad7091r_read_event_value, + .write_event_value = &ad7091r_write_event_value, };
static irqreturn_t ad7091r_event_handler(int irq, void *private) diff --git a/drivers/iio/adc/ad7091r-base.h b/drivers/iio/adc/ad7091r-base.h index 509748aef9b1..7a78976a2f80 100644 --- a/drivers/iio/adc/ad7091r-base.h +++ b/drivers/iio/adc/ad7091r-base.h @@ -8,6 +8,10 @@ #ifndef __DRIVERS_IIO_ADC_AD7091R_BASE_H__ #define __DRIVERS_IIO_ADC_AD7091R_BASE_H__
+/* AD7091R_REG_CH_LIMIT */ +#define AD7091R_HIGH_LIMIT 0xFFF +#define AD7091R_LOW_LIMIT 0x0 + struct device; struct ad7091r_state;
@@ -17,6 +21,8 @@ struct ad7091r_chip_info { unsigned int vref_mV; };
+extern const struct iio_event_spec ad7091r_events[3]; + extern const struct regmap_config ad7091r_regmap_config;
int ad7091r_probe(struct device *dev, const char *name, diff --git a/drivers/iio/adc/ad7091r5.c b/drivers/iio/adc/ad7091r5.c index 2f048527b7b7..dae98c95ebb8 100644 --- a/drivers/iio/adc/ad7091r5.c +++ b/drivers/iio/adc/ad7091r5.c @@ -12,26 +12,6 @@
#include "ad7091r-base.h"
-static const struct iio_event_spec ad7091r5_events[] = { - { - .type = IIO_EV_TYPE_THRESH, - .dir = IIO_EV_DIR_RISING, - .mask_separate = BIT(IIO_EV_INFO_VALUE) | - BIT(IIO_EV_INFO_ENABLE), - }, - { - .type = IIO_EV_TYPE_THRESH, - .dir = IIO_EV_DIR_FALLING, - .mask_separate = BIT(IIO_EV_INFO_VALUE) | - BIT(IIO_EV_INFO_ENABLE), - }, - { - .type = IIO_EV_TYPE_THRESH, - .dir = IIO_EV_DIR_EITHER, - .mask_separate = BIT(IIO_EV_INFO_HYSTERESIS), - }, -}; - #define AD7091R_CHANNEL(idx, bits, ev, num_ev) { \ .type = IIO_VOLTAGE, \ .info_mask_separate = BIT(IIO_CHAN_INFO_RAW), \ @@ -44,10 +24,10 @@ static const struct iio_event_spec ad7091r5_events[] = { .scan_type.realbits = bits, \ } static const struct iio_chan_spec ad7091r5_channels_irq[] = { - AD7091R_CHANNEL(0, 12, ad7091r5_events, ARRAY_SIZE(ad7091r5_events)), - AD7091R_CHANNEL(1, 12, ad7091r5_events, ARRAY_SIZE(ad7091r5_events)), - AD7091R_CHANNEL(2, 12, ad7091r5_events, ARRAY_SIZE(ad7091r5_events)), - AD7091R_CHANNEL(3, 12, ad7091r5_events, ARRAY_SIZE(ad7091r5_events)), + AD7091R_CHANNEL(0, 12, ad7091r_events, ARRAY_SIZE(ad7091r_events)), + AD7091R_CHANNEL(1, 12, ad7091r_events, ARRAY_SIZE(ad7091r_events)), + AD7091R_CHANNEL(2, 12, ad7091r_events, ARRAY_SIZE(ad7091r_events)), + AD7091R_CHANNEL(3, 12, ad7091r_events, ARRAY_SIZE(ad7091r_events)), };
static const struct iio_chan_spec ad7091r5_channels_noirq[] = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Marcelo Schmitt marcelo.schmitt@analog.com
[ Upstream commit e71c5c89bcb165a02df35325aa13d1ee40112401 ]
The ADC needs a voltage reference to work correctly. Users can provide an external voltage reference or use the chip internal reference to operate the ADC. The availability of an in chip reference for the ADC saves the user from having to supply an external voltage reference, which makes the external reference an optional property as described in the device tree documentation. Though, to use the internal reference, it must be enabled by writing to the configuration register. Enable AD7091R internal voltage reference if no external vref is supplied.
Fixes: 260442cc5be4 ("iio: adc: ad7091r5: Add scale and external VREF support") Signed-off-by: Marcelo Schmitt marcelo.schmitt@analog.com Link: https://lore.kernel.org/r/b865033fa6a4fc4bf2b4a98ec51a6144e0f64f77.170301335... Cc: Stable@vger.kernel.org Signed-off-by: Jonathan Cameron Jonathan.Cameron@huawei.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iio/adc/ad7091r-base.c | 7 +++++++ drivers/iio/adc/ad7091r-base.h | 2 ++ 2 files changed, 9 insertions(+)
diff --git a/drivers/iio/adc/ad7091r-base.c b/drivers/iio/adc/ad7091r-base.c index 3d36bcd26b0c..76002b91c86a 100644 --- a/drivers/iio/adc/ad7091r-base.c +++ b/drivers/iio/adc/ad7091r-base.c @@ -405,7 +405,14 @@ int ad7091r_probe(struct device *dev, const char *name, if (IS_ERR(st->vref)) { if (PTR_ERR(st->vref) == -EPROBE_DEFER) return -EPROBE_DEFER; + st->vref = NULL; + /* Enable internal vref */ + ret = regmap_set_bits(st->map, AD7091R_REG_CONF, + AD7091R_REG_CONF_INT_VREF); + if (ret) + return dev_err_probe(st->dev, ret, + "Error on enable internal reference\n"); } else { ret = regulator_enable(st->vref); if (ret) diff --git a/drivers/iio/adc/ad7091r-base.h b/drivers/iio/adc/ad7091r-base.h index 7a78976a2f80..b9e1c8bf3440 100644 --- a/drivers/iio/adc/ad7091r-base.h +++ b/drivers/iio/adc/ad7091r-base.h @@ -8,6 +8,8 @@ #ifndef __DRIVERS_IIO_ADC_AD7091R_BASE_H__ #define __DRIVERS_IIO_ADC_AD7091R_BASE_H__
+#define AD7091R_REG_CONF_INT_VREF BIT(0) + /* AD7091R_REG_CH_LIMIT */ #define AD7091R_HIGH_LIMIT 0xFFF #define AD7091R_LOW_LIMIT 0x0
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frank Li Frank.Li@nxp.com
[ Upstream commit dc51b4442dd94ab12c146c1897bbdb40e16d5636 ]
The eDMAv4 channel mux has a limitation where certain requests must use even channels, while others must use odd numbers.
Add two flags (ARGS_EVEN_CH and ARGS_ODD_CH) to reflect this limitation. The device tree source (dts) files need to be updated accordingly.
This issue was identified by the following commit: commit a725990557e7 ("arm64: dts: imx93: Fix the dmas entries order")
Reverting channel orders triggered this problem.
Fixes: 72f5801a4e2b ("dmaengine: fsl-edma: integrate v3 support") Signed-off-by: Frank Li Frank.Li@nxp.com Link: https://lore.kernel.org/r/20231114154824.3617255-2-Frank.Li@nxp.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/fsl-edma-main.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/dma/fsl-edma-main.c b/drivers/dma/fsl-edma-main.c index 00cb70aca34a..30df55da4dbb 100644 --- a/drivers/dma/fsl-edma-main.c +++ b/drivers/dma/fsl-edma-main.c @@ -26,6 +26,8 @@ #define ARGS_RX BIT(0) #define ARGS_REMOTE BIT(1) #define ARGS_MULTI_FIFO BIT(2) +#define ARGS_EVEN_CH BIT(3) +#define ARGS_ODD_CH BIT(4)
static void fsl_edma_synchronize(struct dma_chan *chan) { @@ -159,6 +161,12 @@ static struct dma_chan *fsl_edma3_xlate(struct of_phandle_args *dma_spec, fsl_chan->is_remote = dma_spec->args[2] & ARGS_REMOTE; fsl_chan->is_multi_fifo = dma_spec->args[2] & ARGS_MULTI_FIFO;
+ if ((dma_spec->args[2] & ARGS_EVEN_CH) && (i & 0x1)) + continue; + + if ((dma_spec->args[2] & ARGS_ODD_CH) && !(i & 0x1)) + continue; + if (!b_chmux && i == dma_spec->args[0]) { chan = dma_get_slave_channel(chan); chan->device->privatecnt++;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amelie Delaunay amelie.delaunay@foss.st.com
[ Upstream commit f5c24d94512f1b288262beda4d3dcb9629222fc7 ]
__dma_async_device_channel_register() can fail. In case of failure, chan->local is freed (with free_percpu()), and chan->local is nullified. When dma_async_device_unregister() is called (because of managed API or intentionally by DMA controller driver), channels are unconditionally unregistered, leading to this NULL pointer: [ 1.318693] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000d0 [...] [ 1.484499] Call trace: [ 1.486930] device_del+0x40/0x394 [ 1.490314] device_unregister+0x20/0x7c [ 1.494220] __dma_async_device_channel_unregister+0x68/0xc0
Look at dma_async_device_register() function error path, channel device unregistration is done only if chan->local is not NULL.
Then add the same condition at the beginning of __dma_async_device_channel_unregister() function, to avoid NULL pointer issue whatever the API used to reach this function.
Fixes: d2fb0a043838 ("dmaengine: break out channel registration") Signed-off-by: Amelie Delaunay amelie.delaunay@foss.st.com Reviewed-by: Dave Jiang dave.jiang@intel.com Link: https://lore.kernel.org/r/20231213160452.2598073-1-amelie.delaunay@foss.st.c... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/dmaengine.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/dma/dmaengine.c b/drivers/dma/dmaengine.c index b7388ae62d7f..491b22240221 100644 --- a/drivers/dma/dmaengine.c +++ b/drivers/dma/dmaengine.c @@ -1103,6 +1103,9 @@ EXPORT_SYMBOL_GPL(dma_async_device_channel_register); static void __dma_async_device_channel_unregister(struct dma_device *device, struct dma_chan *chan) { + if (chan->local == NULL) + return; + WARN_ONCE(!device->device_release && chan->client_count, "%s called while %d clients hold a reference\n", __func__, chan->client_count);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rex Zhang rex.zhang@intel.com
[ Upstream commit e271c0ba3f919c48e90c64b703538fbb7865cb63 ]
Task may be rescheduled within dma_free_coherent(). So dma_free_coherent() can't be called between spin_lock() and spin_unlock() to avoid Call Trace: Call Trace: <TASK> dump_stack_lvl+0x37/0x50 __might_resched+0x16a/0x1c0 vunmap+0x2c/0x70 __iommu_dma_free+0x96/0x100 idxd_device_evl_free+0xd5/0x100 [idxd] device_release_driver_internal+0x197/0x200 unbind_store+0xa1/0xb0 kernfs_fop_write_iter+0x120/0x1c0 vfs_write+0x2d3/0x400 ksys_write+0x63/0xe0 do_syscall_64+0x44/0xa0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 Move it out of the context.
Fixes: 244da66cda35 ("dmaengine: idxd: setup event log configuration") Signed-off-by: Rex Zhang rex.zhang@intel.com Reviewed-by: Dave Jiang dave.jiang@intel.com Reviewed-by: Fenghua Yu fenghua.yu@intel.com Link: https://lore.kernel.org/r/20231212022158.358619-2-rex.zhang@intel.com Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/idxd/device.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/dma/idxd/device.c b/drivers/dma/idxd/device.c index 8f754f922217..fa0f880beae6 100644 --- a/drivers/dma/idxd/device.c +++ b/drivers/dma/idxd/device.c @@ -802,6 +802,9 @@ static int idxd_device_evl_setup(struct idxd_device *idxd)
static void idxd_device_evl_free(struct idxd_device *idxd) { + void *evl_log; + unsigned int evl_log_size; + dma_addr_t evl_dma; union gencfg_reg gencfg; union genctrl_reg genctrl; struct device *dev = &idxd->pdev->dev; @@ -822,11 +825,15 @@ static void idxd_device_evl_free(struct idxd_device *idxd) iowrite64(0, idxd->reg_base + IDXD_EVLCFG_OFFSET); iowrite64(0, idxd->reg_base + IDXD_EVLCFG_OFFSET + 8);
- dma_free_coherent(dev, evl->log_size, evl->log, evl->dma); bitmap_free(evl->bmap); + evl_log = evl->log; + evl_log_size = evl->log_size; + evl_dma = evl->dma; evl->log = NULL; evl->size = IDXD_EVL_SIZE_MIN; spin_unlock(&evl->lock); + + dma_free_coherent(dev, evl_log_size, evl_log, evl_dma); }
static void idxd_group_config_write(struct idxd_group *group)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bart Van Assche bvanassche@acm.org
[ Upstream commit ee36710912b2075c417100a8acc642c9c6496501 ]
Calling ufshcd_hba_exit() from a function that is called asynchronously from ufshcd_init() is wrong because this triggers multiple race conditions. Instead of calling ufshcd_hba_exit(), log an error message.
Reported-by: Daniel Mentz danielmentz@google.com Fixes: 1d337ec2f35e ("ufs: improve init sequence") Signed-off-by: Bart Van Assche bvanassche@acm.org Link: https://lore.kernel.org/r/20231218225229.2542156-3-bvanassche@acm.org Reviewed-by: Can Guo quic_cang@quicinc.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ufs/core/ufshcd.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 0971ae37f2a7..44e0437bd19d 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8810,12 +8810,9 @@ static void ufshcd_async_scan(void *data, async_cookie_t cookie)
out: pm_runtime_put_sync(hba->dev); - /* - * If we failed to initialize the device or the device is not - * present, turn off the power/clocks etc. - */ + if (ret) - ufshcd_hba_exit(hba); + dev_err(hba->dev, "%s failed: %d\n", __func__, ret); }
static enum scsi_timeout_action ufshcd_eh_timed_out(struct scsi_cmnd *scmd)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit adb1f95d388a43c4c564ef3e436f18900dde978e ]
The ending NULL is not taken into account by strncat(), so switch to strlcat() to correctly compute the size of the available memory when appending CONFIG_CMDLINE to 'early_cmdline'.
Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: Alexandre Ghiti alexghiti@rivosinc.com Link: https://lore.kernel.org/r/9f66d2b58c8052d4055e90b8477ee55d9a0914f9.169856402... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/riscv/kernel/pi/cmdline_early.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/arch/riscv/kernel/pi/cmdline_early.c b/arch/riscv/kernel/pi/cmdline_early.c index 68e786c84c94..f6d4dedffb84 100644 --- a/arch/riscv/kernel/pi/cmdline_early.c +++ b/arch/riscv/kernel/pi/cmdline_early.c @@ -38,8 +38,7 @@ static char *get_early_cmdline(uintptr_t dtb_pa) if (IS_ENABLED(CONFIG_CMDLINE_EXTEND) || IS_ENABLED(CONFIG_CMDLINE_FORCE) || fdt_cmdline_size == 0 /* CONFIG_CMDLINE_FALLBACK */) { - strncat(early_cmdline, CONFIG_CMDLINE, - COMMAND_LINE_SIZE - fdt_cmdline_size); + strlcat(early_cmdline, CONFIG_CMDLINE, COMMAND_LINE_SIZE); }
return early_cmdline;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Niklas Cassel cassel@kernel.org
[ Upstream commit 6df0e077d76bd144c533b61d6182676aae6b0a85 ]
When libata calls ata_link_abort() to abort all ata queued commands, it calls blk_abort_request() on the SCSI command representing each QC.
This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for each SCSI command.
scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the command to shost->eh_cmd_q.
This will wake up the SCSI EH, and eventually the libata EH strategy handler will be called, which calls scsi_eh_flush_done_q() to either flush retry or flush finish each failed command.
The commands that are flush retried by scsi_eh_flush_done_q() are done so using scsi_queue_insert().
Before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the second argument set to true, indicating that it should always kick/run the requeue list after inserting.
After commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary"), __scsi_queue_insert() does not kick/run the requeue list after inserting, if the current SCSI host state is recovery (which is the case in the libata example above).
This optimization is probably fine in most cases, as I can only assume that most often someone will eventually kick/run the queues.
However, that is not the case for scsi_eh_flush_done_q(), where we can see that the request gets inserted to the requeue list, but the queue is never started after the request has been inserted, leading to the block layer waiting for the completion of command that never gets to run.
Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host state is most likely always in recovery when this function is called.
Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the same behavior as before commit 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary").
Simple reproducer for the libata example above: $ hdparm -Y /dev/sda $ echo 1 > /sys/class/scsi_device/0:0:0:0/device/delete
Fixes: 8b566edbdbfb ("scsi: core: Only kick the requeue list if necessary") Reported-by: Kevin Locke kevin@kevinlocke.name Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/ Signed-off-by: Niklas Cassel cassel@kernel.org Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org Reviewed-by: Bart Van Assche bvanassche@acm.org Reviewed-by: Damien Le Moal dlemoal@kernel.org Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/scsi_error.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c index 1223d34c04da..d983f4a0e9f1 100644 --- a/drivers/scsi/scsi_error.c +++ b/drivers/scsi/scsi_error.c @@ -2196,15 +2196,18 @@ void scsi_eh_flush_done_q(struct list_head *done_q) struct scsi_cmnd *scmd, *next;
list_for_each_entry_safe(scmd, next, done_q, eh_entry) { + struct scsi_device *sdev = scmd->device; + list_del_init(&scmd->eh_entry); - if (scsi_device_online(scmd->device) && - !scsi_noretry_cmd(scmd) && scsi_cmd_retry_allowed(scmd) && - scsi_eh_should_retry_cmd(scmd)) { + if (scsi_device_online(sdev) && !scsi_noretry_cmd(scmd) && + scsi_cmd_retry_allowed(scmd) && + scsi_eh_should_retry_cmd(scmd)) { SCSI_LOG_ERROR_RECOVERY(3, scmd_printk(KERN_INFO, scmd, "%s: flush retry cmd\n", current->comm)); scsi_queue_insert(scmd, SCSI_MLQUEUE_EH_RETRY); + blk_mq_kick_requeue_list(sdev->request_queue); } else { /* * If just we got sense for the device (called
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geert Uytterhoeven geert+renesas@glider.be
[ Upstream commit d87123aa9a7920e88633ffc5c5a0a22ab08bdc06 ]
One instance of gpio_backlight_platform_data.fbdev was renamed, but the second instance was forgotten, causing a build failure:
arch/sh/boards/mach-ecovec24/setup.c: In function ‘arch_setup’: arch/sh/boards/mach-ecovec24/setup.c:1223:37: error: ‘struct gpio_backlight_platform_data’ has no member named ‘fbdev’; did you mean ‘dev’? 1223 | gpio_backlight_data.fbdev = NULL; | ^~~~~ | dev
Fix this by updating the second instance.
Fixes: ed369def91c1579a ("backlight/gpio_backlight: Rename field 'fbdev' to 'dev'") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202309231601.Uu6qcRnU-lkp@intel.com/ Signed-off-by: Geert Uytterhoeven geert+renesas@glider.be Acked-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de Link: https://lore.kernel.org/r/20230925111022.3626362-1-geert+renesas@glider.be Signed-off-by: John Paul Adrian Glaubitz glaubitz@physik.fu-berlin.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/sh/boards/mach-ecovec24/setup.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/sh/boards/mach-ecovec24/setup.c b/arch/sh/boards/mach-ecovec24/setup.c index 3be293335de5..7a788d44cc73 100644 --- a/arch/sh/boards/mach-ecovec24/setup.c +++ b/arch/sh/boards/mach-ecovec24/setup.c @@ -1220,7 +1220,7 @@ static int __init arch_setup(void) lcdc_info.ch[0].num_modes = ARRAY_SIZE(ecovec_dvi_modes);
/* No backlight */ - gpio_backlight_data.fbdev = NULL; + gpio_backlight_data.dev = NULL;
gpio_set_value(GPIO_PTA2, 1); gpio_set_value(GPIO_PTU1, 1);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paulo Alcantara pc@manguebit.com
[ Upstream commit 76025cc2285d9ede3d717fe4305d66f8be2d9346 ]
The data offset for the SMB3.1.1 POSIX create context will always be 8-byte aligned so having the check 'noff + nlen >= doff' in smb2_parse_contexts() is wrong as it will lead to -EINVAL because noff + nlen == doff.
Fix the sanity check to correctly handle aligned create context data.
Fixes: af1689a9b770 ("smb: client: fix potential OOBs in smb2_parse_contexts()") Signed-off-by: Paulo Alcantara pc@manguebit.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smb2pdu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 5276992e3647..888eb59ad86f 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -2185,7 +2185,7 @@ int smb2_parse_contexts(struct TCP_Server_Info *server,
noff = le16_to_cpu(cc->NameOffset); nlen = le16_to_cpu(cc->NameLength); - if (noff + nlen >= doff) + if (noff + nlen > doff) return -EINVAL;
name = (char *)cc + noff;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 0c51cc6f2cb0108e7d49805f6e089cd85caab279 ]
So far, SMB multichannel could only scale up, but not scale down the number of channels. In this series of patch, we now allow the client to deal with the case of multichannel disabled on the server when the share is mounted. With that change, we now need the ability to scale down the channels.
This change allows the client to deal with cases of missing channels more gracefully.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/cifs_debug.c | 5 +++++ fs/smb/client/cifsglob.h | 1 + fs/smb/client/cifsproto.h | 2 +- fs/smb/client/connect.c | 6 +++++- fs/smb/client/sess.c | 28 ++++++++++++++++++++++++---- fs/smb/client/smb2transport.c | 8 +++++++- 6 files changed, 43 insertions(+), 7 deletions(-)
diff --git a/fs/smb/client/cifs_debug.c b/fs/smb/client/cifs_debug.c index a2584ad8808a..3230ed7eadde 100644 --- a/fs/smb/client/cifs_debug.c +++ b/fs/smb/client/cifs_debug.c @@ -138,6 +138,11 @@ cifs_dump_channel(struct seq_file *m, int i, struct cifs_chan *chan) { struct TCP_Server_Info *server = chan->server;
+ if (!server) { + seq_printf(m, "\n\n\t\tChannel: %d DISABLED", i+1); + return; + } + seq_printf(m, "\n\n\t\tChannel: %d ConnectionId: 0x%llx" "\n\t\tNumber of credits: %d,%d,%d Dialect 0x%x" "\n\t\tTCP status: %d Instance: %d" diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index ec1e5e20a36b..e3ef8eee68d1 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -1060,6 +1060,7 @@ struct cifs_ses { spinlock_t chan_lock; /* ========= begin: protected by chan_lock ======== */ #define CIFS_MAX_CHANNELS 16 +#define CIFS_INVAL_CHAN_INDEX (-1) #define CIFS_ALL_CHANNELS_SET(ses) \ ((1UL << (ses)->chan_count) - 1) #define CIFS_ALL_CHANS_GOOD(ses) \ diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h index c858feaf4f92..0eb62ccd476f 100644 --- a/fs/smb/client/cifsproto.h +++ b/fs/smb/client/cifsproto.h @@ -622,7 +622,7 @@ bool is_server_using_iface(struct TCP_Server_Info *server, bool is_ses_using_iface(struct cifs_ses *ses, struct cifs_server_iface *iface); void cifs_ses_mark_for_reconnect(struct cifs_ses *ses);
-unsigned int +int cifs_ses_get_chan_index(struct cifs_ses *ses, struct TCP_Server_Info *server); void diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index b82f60d6f47e..a482afa3fa42 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -169,8 +169,12 @@ cifs_signal_cifsd_for_reconnect(struct TCP_Server_Info *server, list_for_each_entry(ses, &pserver->smb_ses_list, smb_ses_list) { spin_lock(&ses->chan_lock); for (i = 0; i < ses->chan_count; i++) { + if (!ses->chans[i].server) + continue; + spin_lock(&ses->chans[i].server->srv_lock); - ses->chans[i].server->tcpStatus = CifsNeedReconnect; + if (ses->chans[i].server->tcpStatus != CifsExiting) + ses->chans[i].server->tcpStatus = CifsNeedReconnect; spin_unlock(&ses->chans[i].server->srv_lock); } spin_unlock(&ses->chan_lock); diff --git a/fs/smb/client/sess.c b/fs/smb/client/sess.c index 80050e36f045..650a3ec9e6e5 100644 --- a/fs/smb/client/sess.c +++ b/fs/smb/client/sess.c @@ -69,7 +69,7 @@ bool is_ses_using_iface(struct cifs_ses *ses, struct cifs_server_iface *iface)
/* channel helper functions. assumed that chan_lock is held by caller. */
-unsigned int +int cifs_ses_get_chan_index(struct cifs_ses *ses, struct TCP_Server_Info *server) { @@ -85,14 +85,17 @@ cifs_ses_get_chan_index(struct cifs_ses *ses, cifs_dbg(VFS, "unable to get chan index for server: 0x%llx", server->conn_id); WARN_ON(1); - return 0; + return CIFS_INVAL_CHAN_INDEX; }
void cifs_chan_set_in_reconnect(struct cifs_ses *ses, struct TCP_Server_Info *server) { - unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + int chan_index = cifs_ses_get_chan_index(ses, server); + + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return;
ses->chans[chan_index].in_reconnect = true; } @@ -102,6 +105,8 @@ cifs_chan_clear_in_reconnect(struct cifs_ses *ses, struct TCP_Server_Info *server) { unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return;
ses->chans[chan_index].in_reconnect = false; } @@ -111,6 +116,8 @@ cifs_chan_in_reconnect(struct cifs_ses *ses, struct TCP_Server_Info *server) { unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return true; /* err on the safer side */
return CIFS_CHAN_IN_RECONNECT(ses, chan_index); } @@ -120,6 +127,8 @@ cifs_chan_set_need_reconnect(struct cifs_ses *ses, struct TCP_Server_Info *server) { unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return;
set_bit(chan_index, &ses->chans_need_reconnect); cifs_dbg(FYI, "Set reconnect bitmask for chan %u; now 0x%lx\n", @@ -131,6 +140,8 @@ cifs_chan_clear_need_reconnect(struct cifs_ses *ses, struct TCP_Server_Info *server) { unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return;
clear_bit(chan_index, &ses->chans_need_reconnect); cifs_dbg(FYI, "Cleared reconnect bitmask for chan %u; now 0x%lx\n", @@ -142,6 +153,8 @@ cifs_chan_needs_reconnect(struct cifs_ses *ses, struct TCP_Server_Info *server) { unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return true; /* err on the safer side */
return CIFS_CHAN_NEEDS_RECONNECT(ses, chan_index); } @@ -151,6 +164,8 @@ cifs_chan_is_iface_active(struct cifs_ses *ses, struct TCP_Server_Info *server) { unsigned int chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) + return true; /* err on the safer side */
return ses->chans[chan_index].iface && ses->chans[chan_index].iface->is_active; @@ -293,7 +308,7 @@ cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server)
spin_lock(&ses->chan_lock); chan_index = cifs_ses_get_chan_index(ses, server); - if (!chan_index) { + if (chan_index == CIFS_INVAL_CHAN_INDEX) { spin_unlock(&ses->chan_lock); return 0; } @@ -403,6 +418,11 @@ cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server)
spin_lock(&ses->chan_lock); chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) { + spin_unlock(&ses->chan_lock); + return 0; + } + ses->chans[chan_index].iface = iface;
/* No iface is found. if secondary chan, drop connection */ diff --git a/fs/smb/client/smb2transport.c b/fs/smb/client/smb2transport.c index a136fc4cc2b5..5a3ca62d2f07 100644 --- a/fs/smb/client/smb2transport.c +++ b/fs/smb/client/smb2transport.c @@ -413,7 +413,13 @@ generate_smb3signingkey(struct cifs_ses *ses, ses->ses_status == SES_GOOD);
chan_index = cifs_ses_get_chan_index(ses, server); - /* TODO: introduce ref counting for channels when the can be freed */ + if (chan_index == CIFS_INVAL_CHAN_INDEX) { + spin_unlock(&ses->chan_lock); + spin_unlock(&ses->ses_lock); + + return -EINVAL; + } + spin_unlock(&ses->chan_lock); spin_unlock(&ses->ses_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 19a4b9d6c372cab6a3b2c9a061a236136fe95274 ]
The delayed work for reconnect takes server struct as a parameter. But it does so without holding a ref to it. Normally, this may not show a problem as the reconnect work is only cancelled on umount.
However, since we now plan to support scaling down of channels, and the scale down can happen from reconnect work itself, we need to fix it.
This change takes a reference on the server struct before it is passed to the delayed work. And drops the reference in the delayed work itself. Or if the delayed work is successfully cancelled, by the process that cancels it.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/connect.c | 27 +++++++++++++++++++++------ fs/smb/client/smb2pdu.c | 23 +++++++++++++---------- 2 files changed, 34 insertions(+), 16 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index a482afa3fa42..2f5be7dcd1f9 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -388,7 +388,13 @@ static int __cifs_reconnect(struct TCP_Server_Info *server, spin_unlock(&server->srv_lock); cifs_swn_reset_server_dstaddr(server); cifs_server_unlock(server); - mod_delayed_work(cifsiod_wq, &server->reconnect, 0); + + /* increase ref count which reconnect work will drop */ + spin_lock(&cifs_tcp_ses_lock); + server->srv_count++; + spin_unlock(&cifs_tcp_ses_lock); + if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) + cifs_put_tcp_session(server, false); } } while (server->tcpStatus == CifsNeedReconnect);
@@ -518,7 +524,13 @@ static int reconnect_dfs_server(struct TCP_Server_Info *server) spin_unlock(&server->srv_lock); cifs_swn_reset_server_dstaddr(server); cifs_server_unlock(server); - mod_delayed_work(cifsiod_wq, &server->reconnect, 0); + + /* increase ref count which reconnect work will drop */ + spin_lock(&cifs_tcp_ses_lock); + server->srv_count++; + spin_unlock(&cifs_tcp_ses_lock); + if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) + cifs_put_tcp_session(server, false); } while (server->tcpStatus == CifsNeedReconnect);
mutex_lock(&server->refpath_lock); @@ -1605,16 +1617,19 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect)
cancel_delayed_work_sync(&server->echo);
- if (from_reconnect) + if (from_reconnect) { /* * Avoid deadlock here: reconnect work calls * cifs_put_tcp_session() at its end. Need to be sure * that reconnect work does nothing with server pointer after * that step. */ - cancel_delayed_work(&server->reconnect); - else - cancel_delayed_work_sync(&server->reconnect); + if (cancel_delayed_work(&server->reconnect)) + cifs_put_tcp_session(server, from_reconnect); + } else { + if (cancel_delayed_work_sync(&server->reconnect)) + cifs_put_tcp_session(server, from_reconnect); + }
spin_lock(&server->srv_lock); server->tcpStatus = CifsExiting; diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 888eb59ad86f..0274ef67457b 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -3879,12 +3879,6 @@ void smb2_reconnect_server(struct work_struct *work) } spin_unlock(&ses->chan_lock); } - /* - * Get the reference to server struct to be sure that the last call of - * cifs_put_tcon() in the loop below won't release the server pointer. - */ - if (tcon_exist || ses_exist) - server->srv_count++;
spin_unlock(&cifs_tcp_ses_lock);
@@ -3932,13 +3926,17 @@ void smb2_reconnect_server(struct work_struct *work)
done: cifs_dbg(FYI, "Reconnecting tcons and channels finished\n"); - if (resched) + if (resched) { queue_delayed_work(cifsiod_wq, &server->reconnect, 2 * HZ); + mutex_unlock(&pserver->reconnect_mutex); + + /* no need to put tcp session as we're retrying */ + return; + } mutex_unlock(&pserver->reconnect_mutex);
/* now we can safely release srv struct */ - if (tcon_exist || ses_exist) - cifs_put_tcp_session(server, 1); + cifs_put_tcp_session(server, true); }
int @@ -3958,7 +3956,12 @@ SMB2_echo(struct TCP_Server_Info *server) server->ops->need_neg(server)) { spin_unlock(&server->srv_lock); /* No need to send echo on newly established connections */ - mod_delayed_work(cifsiod_wq, &server->reconnect, 0); + spin_lock(&cifs_tcp_ses_lock); + server->srv_count++; + spin_unlock(&cifs_tcp_ses_lock); + if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) + cifs_put_tcp_session(server, false); + return rc; } spin_unlock(&server->srv_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 705fc522fe9d58848c253ee0948567060f36e2a7 ]
When the user mounts with multichannel option, but the server does not support it, there can be a time in future where it can be supported.
With this change, such a case is handled.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/cifsproto.h | 1 + fs/smb/client/connect.c | 3 +++ fs/smb/client/smb2pdu.c | 32 ++++++++++++++++++++++++++++++-- 3 files changed, 34 insertions(+), 2 deletions(-)
diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h index 0eb62ccd476f..4a28cff87038 100644 --- a/fs/smb/client/cifsproto.h +++ b/fs/smb/client/cifsproto.h @@ -132,6 +132,7 @@ extern int SendReceiveBlockingLock(const unsigned int xid, struct smb_hdr *in_buf, struct smb_hdr *out_buf, int *bytes_returned); + void cifs_signal_cifsd_for_reconnect(struct TCP_Server_Info *server, bool all_channels); diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 2f5be7dcd1f9..c0b1f30eecd7 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -128,6 +128,9 @@ static void smb2_query_server_interfaces(struct work_struct *work) */ rc = SMB3_request_interfaces(0, tcon, false); if (rc) { + if (rc == -EOPNOTSUPP) + return; + cifs_dbg(FYI, "%s: failed to query server interfaces: %d\n", __func__, rc); } diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 0274ef67457b..288f22050c20 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -163,6 +163,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, int rc = 0; struct nls_table *nls_codepage = NULL; struct cifs_ses *ses; + int xid;
/* * SMB2s NegProt, SessSetup, Logoff do not have tcon yet so @@ -307,17 +308,44 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, tcon->need_reopen_files = true;
rc = cifs_tree_connect(0, tcon, nls_codepage); - mutex_unlock(&ses->session_mutex);
cifs_dbg(FYI, "reconnect tcon rc = %d\n", rc); if (rc) { /* If sess reconnected but tcon didn't, something strange ... */ + mutex_unlock(&ses->session_mutex); cifs_dbg(VFS, "reconnect tcon failed rc = %d\n", rc); goto out; }
+ if (!rc && + (server->capabilities & SMB2_GLOBAL_CAP_MULTI_CHANNEL)) { + mutex_unlock(&ses->session_mutex); + + /* + * query server network interfaces, in case they change + */ + xid = get_xid(); + rc = SMB3_request_interfaces(xid, tcon, false); + free_xid(xid); + + if (rc) + cifs_dbg(FYI, "%s: failed to query server interfaces: %d\n", + __func__, rc); + + if (ses->chan_max > ses->chan_count && + !SERVER_IS_CHAN(server)) { + if (ses->chan_count == 1) + cifs_server_dbg(VFS, "supports multichannel now\n"); + + cifs_try_adding_channels(ses); + } + } else { + mutex_unlock(&ses->session_mutex); + } + if (smb2_command != SMB2_INTERNAL_CMD) - mod_delayed_work(cifsiod_wq, &server->reconnect, 0); + if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) + cifs_put_tcp_session(server, false);
atomic_inc(&tconInfoReconnectCount); out:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit ee1d21794e55ab76505745d24101331552182002 ]
When a server stops supporting multichannel, we will keep attempting reconnects to the secondary channels today. Avoid this by freeing extra channels when negotiate returns no multichannel support.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/cifsglob.h | 1 + fs/smb/client/cifsproto.h | 2 ++ fs/smb/client/connect.c | 10 ++++++ fs/smb/client/sess.c | 64 ++++++++++++++++++++++++++++----- fs/smb/client/smb2pdu.c | 76 ++++++++++++++++++++++++++++++++++++++- fs/smb/client/transport.c | 2 +- 6 files changed, 145 insertions(+), 10 deletions(-)
diff --git a/fs/smb/client/cifsglob.h b/fs/smb/client/cifsglob.h index e3ef8eee68d1..5e32c79f03a7 100644 --- a/fs/smb/client/cifsglob.h +++ b/fs/smb/client/cifsglob.h @@ -659,6 +659,7 @@ struct TCP_Server_Info { bool noautotune; /* do not autotune send buf sizes */ bool nosharesock; bool tcp_nodelay; + bool terminate; unsigned int credits; /* send no more requests at once */ unsigned int max_credits; /* can override large 32000 default at mnt */ unsigned int in_flight; /* number of requests on the wire to server */ diff --git a/fs/smb/client/cifsproto.h b/fs/smb/client/cifsproto.h index 4a28cff87038..c00f84420559 100644 --- a/fs/smb/client/cifsproto.h +++ b/fs/smb/client/cifsproto.h @@ -647,6 +647,8 @@ cifs_chan_needs_reconnect(struct cifs_ses *ses, bool cifs_chan_is_iface_active(struct cifs_ses *ses, struct TCP_Server_Info *server); +void +cifs_disable_secondary_channels(struct cifs_ses *ses); int cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server); int diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index c0b1f30eecd7..f43f51f2d1c1 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -215,6 +215,14 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
spin_lock(&cifs_tcp_ses_lock); list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) { + /* + * if channel has been marked for termination, nothing to do + * for the channel. in fact, we cannot find the channel for the + * server. So safe to exit here + */ + if (server->terminate) + break; + /* check if iface is still active */ spin_lock(&ses->chan_lock); if (!cifs_chan_is_iface_active(ses, server)) { @@ -252,6 +260,8 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server, spin_lock(&tcon->tc_lock); tcon->status = TID_NEED_RECON; spin_unlock(&tcon->tc_lock); + + cancel_delayed_work(&tcon->query_interfaces); } if (ses->tcon_ipc) { ses->tcon_ipc->need_reconnect = true; diff --git a/fs/smb/client/sess.c b/fs/smb/client/sess.c index 650a3ec9e6e5..2ce1b7571371 100644 --- a/fs/smb/client/sess.c +++ b/fs/smb/client/sess.c @@ -290,6 +290,60 @@ int cifs_try_adding_channels(struct cifs_ses *ses) return new_chan_count - old_chan_count; }
+/* + * called when multichannel is disabled by the server. + * this always gets called from smb2_reconnect + * and cannot get called in parallel threads. + */ +void +cifs_disable_secondary_channels(struct cifs_ses *ses) +{ + int i, chan_count; + struct TCP_Server_Info *server; + struct cifs_server_iface *iface; + + spin_lock(&ses->chan_lock); + chan_count = ses->chan_count; + if (chan_count == 1) + goto done; + + ses->chan_count = 1; + + /* for all secondary channels reset the need reconnect bit */ + ses->chans_need_reconnect &= 1; + + for (i = 1; i < chan_count; i++) { + iface = ses->chans[i].iface; + server = ses->chans[i].server; + + if (iface) { + spin_lock(&ses->iface_lock); + kref_put(&iface->refcount, release_iface); + ses->chans[i].iface = NULL; + iface->num_channels--; + if (iface->weight_fulfilled) + iface->weight_fulfilled--; + spin_unlock(&ses->iface_lock); + } + + spin_unlock(&ses->chan_lock); + if (server && !server->terminate) { + server->terminate = true; + cifs_signal_cifsd_for_reconnect(server, false); + } + spin_lock(&ses->chan_lock); + + if (server) { + ses->chans[i].server = NULL; + cifs_put_tcp_session(server, false); + } + + } + +done: + spin_unlock(&ses->chan_lock); +} + /* * update the iface for the channel if necessary. * will return 0 when iface is updated, 1 if removed, 2 otherwise @@ -589,14 +643,10 @@ cifs_ses_add_channel(struct cifs_ses *ses,
out: if (rc && chan->server) { - /* - * we should avoid race with these delayed works before we - * remove this channel - */ - cancel_delayed_work_sync(&chan->server->echo); - cancel_delayed_work_sync(&chan->server->reconnect); + cifs_put_tcp_session(chan->server, 0);
spin_lock(&ses->chan_lock); + /* we rely on all bits beyond chan_count to be clear */ cifs_chan_clear_need_reconnect(ses, chan->server); ses->chan_count--; @@ -606,8 +656,6 @@ cifs_ses_add_channel(struct cifs_ses *ses, */ WARN_ON(ses->chan_count < 1); spin_unlock(&ses->chan_lock); - - cifs_put_tcp_session(chan->server, 0); }
kfree(ctx->UNC); diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index 288f22050c20..f1977987ae74 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -164,6 +164,8 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, struct nls_table *nls_codepage = NULL; struct cifs_ses *ses; int xid; + struct TCP_Server_Info *pserver; + unsigned int chan_index;
/* * SMB2s NegProt, SessSetup, Logoff do not have tcon yet so @@ -224,6 +226,12 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, return -EAGAIN; } } + + /* if server is marked for termination, cifsd will cleanup */ + if (server->terminate) { + spin_unlock(&server->srv_lock); + return -EHOSTDOWN; + } spin_unlock(&server->srv_lock);
again: @@ -242,12 +250,24 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, tcon->need_reconnect);
mutex_lock(&ses->session_mutex); + /* + * if this is called by delayed work, and the channel has been disabled + * in parallel, the delayed work can continue to execute in parallel + * there's a chance that this channel may not exist anymore + */ + spin_lock(&server->srv_lock); + if (server->tcpStatus == CifsExiting) { + spin_unlock(&server->srv_lock); + mutex_unlock(&ses->session_mutex); + rc = -EHOSTDOWN; + goto out; + } + /* * Recheck after acquire mutex. If another thread is negotiating * and the server never sends an answer the socket will be closed * and tcpStatus set to reconnect. */ - spin_lock(&server->srv_lock); if (server->tcpStatus == CifsNeedReconnect) { spin_unlock(&server->srv_lock); mutex_unlock(&ses->session_mutex); @@ -284,6 +304,53 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon,
rc = cifs_negotiate_protocol(0, ses, server); if (!rc) { + /* + * if server stopped supporting multichannel + * and the first channel reconnected, disable all the others. + */ + if (ses->chan_count > 1 && + !(server->capabilities & SMB2_GLOBAL_CAP_MULTI_CHANNEL)) { + if (SERVER_IS_CHAN(server)) { + cifs_dbg(VFS, "server %s does not support " \ + "multichannel anymore. skipping secondary channel\n", + ses->server->hostname); + + spin_lock(&ses->chan_lock); + chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) { + spin_unlock(&ses->chan_lock); + goto skip_terminate; + } + + ses->chans[chan_index].server = NULL; + spin_unlock(&ses->chan_lock); + + /* + * the above reference of server by channel + * needs to be dropped without holding chan_lock + * as cifs_put_tcp_session takes a higher lock + * i.e. cifs_tcp_ses_lock + */ + cifs_put_tcp_session(server, 1); + + server->terminate = true; + cifs_signal_cifsd_for_reconnect(server, false); + + /* mark primary server as needing reconnect */ + pserver = server->primary_server; + cifs_signal_cifsd_for_reconnect(pserver, false); + +skip_terminate: + mutex_unlock(&ses->session_mutex); + rc = -EHOSTDOWN; + goto out; + } else { + cifs_server_dbg(VFS, "does not support " \ + "multichannel anymore. disabling all other channels\n"); + cifs_disable_secondary_channels(ses); + } + } + rc = cifs_setup_session(0, ses, server, nls_codepage); if ((rc == -EACCES) && !tcon->retry) { mutex_unlock(&ses->session_mutex); @@ -3863,6 +3930,13 @@ void smb2_reconnect_server(struct work_struct *work) /* Prevent simultaneous reconnects that can corrupt tcon->rlist list */ mutex_lock(&pserver->reconnect_mutex);
+ /* if the server is marked for termination, drop the ref count here */ + if (server->terminate) { + cifs_put_tcp_session(server, true); + mutex_unlock(&pserver->reconnect_mutex); + return; + } + INIT_LIST_HEAD(&tmp_list); INIT_LIST_HEAD(&tmp_ses_list); cifs_dbg(FYI, "Reconnecting tcons and channels\n"); diff --git a/fs/smb/client/transport.c b/fs/smb/client/transport.c index d553b7a54621..4f717ad7c21b 100644 --- a/fs/smb/client/transport.c +++ b/fs/smb/client/transport.c @@ -1023,7 +1023,7 @@ struct TCP_Server_Info *cifs_pick_channel(struct cifs_ses *ses) spin_lock(&ses->chan_lock); for (i = 0; i < ses->chan_count; i++) { server = ses->chans[i].server; - if (!server) + if (!server || server->terminate) continue;
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 823342524868168bf681f135d01b4ae10f5863ec ]
This reverts commit 19a4b9d6c372cab6a3b2c9a061a236136fe95274.
This earlier commit was making an assumption that each mod_delayed_work called for the reconnect work would result in smb2_reconnect_server being called twice. This assumption turns out to be untrue. So reverting this change for now.
I will submit a follow-up patch to fix the actual problem in a different way.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/connect.c | 27 ++++++--------------------- fs/smb/client/smb2pdu.c | 23 ++++++++++------------- 2 files changed, 16 insertions(+), 34 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index f43f51f2d1c1..2a30245287d5 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -401,13 +401,7 @@ static int __cifs_reconnect(struct TCP_Server_Info *server, spin_unlock(&server->srv_lock); cifs_swn_reset_server_dstaddr(server); cifs_server_unlock(server); - - /* increase ref count which reconnect work will drop */ - spin_lock(&cifs_tcp_ses_lock); - server->srv_count++; - spin_unlock(&cifs_tcp_ses_lock); - if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) - cifs_put_tcp_session(server, false); + mod_delayed_work(cifsiod_wq, &server->reconnect, 0); } } while (server->tcpStatus == CifsNeedReconnect);
@@ -537,13 +531,7 @@ static int reconnect_dfs_server(struct TCP_Server_Info *server) spin_unlock(&server->srv_lock); cifs_swn_reset_server_dstaddr(server); cifs_server_unlock(server); - - /* increase ref count which reconnect work will drop */ - spin_lock(&cifs_tcp_ses_lock); - server->srv_count++; - spin_unlock(&cifs_tcp_ses_lock); - if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) - cifs_put_tcp_session(server, false); + mod_delayed_work(cifsiod_wq, &server->reconnect, 0); } while (server->tcpStatus == CifsNeedReconnect);
mutex_lock(&server->refpath_lock); @@ -1630,19 +1618,16 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect)
cancel_delayed_work_sync(&server->echo);
- if (from_reconnect) { + if (from_reconnect) /* * Avoid deadlock here: reconnect work calls * cifs_put_tcp_session() at its end. Need to be sure * that reconnect work does nothing with server pointer after * that step. */ - if (cancel_delayed_work(&server->reconnect)) - cifs_put_tcp_session(server, from_reconnect); - } else { - if (cancel_delayed_work_sync(&server->reconnect)) - cifs_put_tcp_session(server, from_reconnect); - } + cancel_delayed_work(&server->reconnect); + else + cancel_delayed_work_sync(&server->reconnect);
spin_lock(&server->srv_lock); server->tcpStatus = CifsExiting; diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index f1977987ae74..da752f41a4e6 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -3981,6 +3981,12 @@ void smb2_reconnect_server(struct work_struct *work) } spin_unlock(&ses->chan_lock); } + /* + * Get the reference to server struct to be sure that the last call of + * cifs_put_tcon() in the loop below won't release the server pointer. + */ + if (tcon_exist || ses_exist) + server->srv_count++;
spin_unlock(&cifs_tcp_ses_lock);
@@ -4028,17 +4034,13 @@ void smb2_reconnect_server(struct work_struct *work)
done: cifs_dbg(FYI, "Reconnecting tcons and channels finished\n"); - if (resched) { + if (resched) queue_delayed_work(cifsiod_wq, &server->reconnect, 2 * HZ); - mutex_unlock(&pserver->reconnect_mutex); - - /* no need to put tcp session as we're retrying */ - return; - } mutex_unlock(&pserver->reconnect_mutex);
/* now we can safely release srv struct */ - cifs_put_tcp_session(server, true); + if (tcon_exist || ses_exist) + cifs_put_tcp_session(server, 1); }
int @@ -4058,12 +4060,7 @@ SMB2_echo(struct TCP_Server_Info *server) server->ops->need_neg(server)) { spin_unlock(&server->srv_lock); /* No need to send echo on newly established connections */ - spin_lock(&cifs_tcp_ses_lock); - server->srv_count++; - spin_unlock(&cifs_tcp_ses_lock); - if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) - cifs_put_tcp_session(server, false); - + mod_delayed_work(cifsiod_wq, &server->reconnect, 0); return rc; } spin_unlock(&server->srv_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 04909192ada3285070f8ced0af7f07735478b364 ]
Reconnect worker currently assumes that the server struct is alive and only takes reference on the server if it needs to call smb2_reconnect.
With the new ability to disable channels based on whether the server has multichannel disabled, this becomes a problem when we need to disable established channels. While disabling the channels and deallocating the server, there could be reconnect work that could not be cancelled (because it started).
This change forces the reconnect worker to unconditionally take a reference on the server when it runs.
Also, this change now allows smb2_reconnect to know if it was called by the reconnect worker. Based on this, the cifs_put_tcp_session can decide whether it can cancel the reconnect work synchronously or not.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/connect.c | 8 ++++---- fs/smb/client/smb2pdu.c | 29 +++++++++++++++-------------- 2 files changed, 19 insertions(+), 18 deletions(-)
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index 2a30245287d5..432248e955ed 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -1612,10 +1612,6 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect) list_del_init(&server->tcp_ses_list); spin_unlock(&cifs_tcp_ses_lock);
- /* For secondary channels, we pick up ref-count on the primary server */ - if (SERVER_IS_CHAN(server)) - cifs_put_tcp_session(server->primary_server, from_reconnect); - cancel_delayed_work_sync(&server->echo);
if (from_reconnect) @@ -1629,6 +1625,10 @@ cifs_put_tcp_session(struct TCP_Server_Info *server, int from_reconnect) else cancel_delayed_work_sync(&server->reconnect);
+ /* For secondary channels, we pick up ref-count on the primary server */ + if (SERVER_IS_CHAN(server)) + cifs_put_tcp_session(server->primary_server, from_reconnect); + spin_lock(&server->srv_lock); server->tcpStatus = CifsExiting; spin_unlock(&server->srv_lock); diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index da752f41a4e6..a3995c6dc1ad 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -158,7 +158,7 @@ smb2_hdr_assemble(struct smb2_hdr *shdr, __le16 smb2_cmd,
static int smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, - struct TCP_Server_Info *server) + struct TCP_Server_Info *server, bool from_reconnect) { int rc = 0; struct nls_table *nls_codepage = NULL; @@ -331,7 +331,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, * as cifs_put_tcp_session takes a higher lock * i.e. cifs_tcp_ses_lock */ - cifs_put_tcp_session(server, 1); + cifs_put_tcp_session(server, from_reconnect);
server->terminate = true; cifs_signal_cifsd_for_reconnect(server, false); @@ -504,7 +504,7 @@ static int smb2_plain_req_init(__le16 smb2_command, struct cifs_tcon *tcon, { int rc;
- rc = smb2_reconnect(smb2_command, tcon, server); + rc = smb2_reconnect(smb2_command, tcon, server, false); if (rc) return rc;
@@ -3924,6 +3924,15 @@ void smb2_reconnect_server(struct work_struct *work) int rc; bool resched = false;
+ /* first check if ref count has reached 0, if not inc ref count */ + spin_lock(&cifs_tcp_ses_lock); + if (!server->srv_count) { + spin_unlock(&cifs_tcp_ses_lock); + return; + } + server->srv_count++; + spin_unlock(&cifs_tcp_ses_lock); + /* If server is a channel, select the primary channel */ pserver = SERVER_IS_CHAN(server) ? server->primary_server : server;
@@ -3981,17 +3990,10 @@ void smb2_reconnect_server(struct work_struct *work) } spin_unlock(&ses->chan_lock); } - /* - * Get the reference to server struct to be sure that the last call of - * cifs_put_tcon() in the loop below won't release the server pointer. - */ - if (tcon_exist || ses_exist) - server->srv_count++; - spin_unlock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(tcon, tcon2, &tmp_list, rlist) { - rc = smb2_reconnect(SMB2_INTERNAL_CMD, tcon, server); + rc = smb2_reconnect(SMB2_INTERNAL_CMD, tcon, server, true); if (!rc) cifs_reopen_persistent_handles(tcon); else @@ -4024,7 +4026,7 @@ void smb2_reconnect_server(struct work_struct *work) /* now reconnect sessions for necessary channels */ list_for_each_entry_safe(ses, ses2, &tmp_ses_list, rlist) { tcon->ses = ses; - rc = smb2_reconnect(SMB2_INTERNAL_CMD, tcon, server); + rc = smb2_reconnect(SMB2_INTERNAL_CMD, tcon, server, true); if (rc) resched = true; list_del_init(&ses->rlist); @@ -4039,8 +4041,7 @@ void smb2_reconnect_server(struct work_struct *work) mutex_unlock(&pserver->reconnect_mutex);
/* now we can safely release srv struct */ - if (tcon_exist || ses_exist) - cifs_put_tcp_session(server, 1); + cifs_put_tcp_session(server, true); }
int
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit f591062bdbf4742b7f1622173017f19e927057b0 ]
Some servers like Azure SMB servers always advertise multichannel capability in server capabilities list. Such servers return error STATUS_NOT_IMPLEMENTED for ioctl calls to query server interfaces, and expect clients to consider that as a sign that they do not support multichannel.
We already handled this at mount time. Soon after the tree connect, we query server interfaces. And when server returned STATUS_NOT_IMPLEMENTED, we kept interface list as empty. When cifs_try_adding_channels gets called, it would not find any interfaces, so will not add channels.
For the case where an active multichannel mount exists, and multichannel is disabled by such a server, this change will now allow the client to disable secondary channels on the mount. It will check the return status of query server interfaces call soon after a tree reconnect. If the return status is EOPNOTSUPP, then instead of the check to add more channels, we'll disable the secondary channels instead.
For better code reuse, this change also moves the common code for disabling multichannel to a helper function.
Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: 78e727e58e54 ("cifs: update iface_last_update on each query-and-update") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smb2ops.c | 8 +-- fs/smb/client/smb2pdu.c | 107 +++++++++++++++++++++++++--------------- 2 files changed, 69 insertions(+), 46 deletions(-)
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index 0604696f59c1..8caf2cefc8a7 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -614,7 +614,7 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf, "multichannel not available\n" "Empty network interface list returned by server %s\n", ses->server->hostname); - rc = -EINVAL; + rc = -EOPNOTSUPP; goto out; }
@@ -734,12 +734,6 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf, if ((bytes_left > 8) || p->Next) cifs_dbg(VFS, "%s: incomplete interface info\n", __func__);
- - if (!ses->iface_count) { - rc = -EINVAL; - goto out; - } - out: /* * Go through the list again and put the inactive entries diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index a3995c6dc1ad..d846c238b7dd 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -156,6 +156,57 @@ smb2_hdr_assemble(struct smb2_hdr *shdr, __le16 smb2_cmd, return; }
+/* helper function for code reuse */ +static int +cifs_chan_skip_or_disable(struct cifs_ses *ses, + struct TCP_Server_Info *server, + bool from_reconnect) +{ + struct TCP_Server_Info *pserver; + unsigned int chan_index; + + if (SERVER_IS_CHAN(server)) { + cifs_dbg(VFS, + "server %s does not support multichannel anymore. Skip secondary channel\n", + ses->server->hostname); + + spin_lock(&ses->chan_lock); + chan_index = cifs_ses_get_chan_index(ses, server); + if (chan_index == CIFS_INVAL_CHAN_INDEX) { + spin_unlock(&ses->chan_lock); + goto skip_terminate; + } + + ses->chans[chan_index].server = NULL; + spin_unlock(&ses->chan_lock); + + /* + * the above reference of server by channel + * needs to be dropped without holding chan_lock + * as cifs_put_tcp_session takes a higher lock + * i.e. cifs_tcp_ses_lock + */ + cifs_put_tcp_session(server, from_reconnect); + + server->terminate = true; + cifs_signal_cifsd_for_reconnect(server, false); + + /* mark primary server as needing reconnect */ + pserver = server->primary_server; + cifs_signal_cifsd_for_reconnect(pserver, false); +skip_terminate: + mutex_unlock(&ses->session_mutex); + return -EHOSTDOWN; + } + + cifs_server_dbg(VFS, + "server does not support multichannel anymore. Disable all other channels\n"); + cifs_disable_secondary_channels(ses); + + + return 0; +} + static int smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, struct TCP_Server_Info *server, bool from_reconnect) @@ -164,8 +215,6 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, struct nls_table *nls_codepage = NULL; struct cifs_ses *ses; int xid; - struct TCP_Server_Info *pserver; - unsigned int chan_index;
/* * SMB2s NegProt, SessSetup, Logoff do not have tcon yet so @@ -310,44 +359,11 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, */ if (ses->chan_count > 1 && !(server->capabilities & SMB2_GLOBAL_CAP_MULTI_CHANNEL)) { - if (SERVER_IS_CHAN(server)) { - cifs_dbg(VFS, "server %s does not support " \ - "multichannel anymore. skipping secondary channel\n", - ses->server->hostname); - - spin_lock(&ses->chan_lock); - chan_index = cifs_ses_get_chan_index(ses, server); - if (chan_index == CIFS_INVAL_CHAN_INDEX) { - spin_unlock(&ses->chan_lock); - goto skip_terminate; - } - - ses->chans[chan_index].server = NULL; - spin_unlock(&ses->chan_lock); - - /* - * the above reference of server by channel - * needs to be dropped without holding chan_lock - * as cifs_put_tcp_session takes a higher lock - * i.e. cifs_tcp_ses_lock - */ - cifs_put_tcp_session(server, from_reconnect); - - server->terminate = true; - cifs_signal_cifsd_for_reconnect(server, false); - - /* mark primary server as needing reconnect */ - pserver = server->primary_server; - cifs_signal_cifsd_for_reconnect(pserver, false); - -skip_terminate: + rc = cifs_chan_skip_or_disable(ses, server, + from_reconnect); + if (rc) { mutex_unlock(&ses->session_mutex); - rc = -EHOSTDOWN; goto out; - } else { - cifs_server_dbg(VFS, "does not support " \ - "multichannel anymore. disabling all other channels\n"); - cifs_disable_secondary_channels(ses); } }
@@ -395,11 +411,23 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, rc = SMB3_request_interfaces(xid, tcon, false); free_xid(xid);
- if (rc) + if (rc == -EOPNOTSUPP) { + /* + * some servers like Azure SMB server do not advertise + * that multichannel has been disabled with server + * capabilities, rather return STATUS_NOT_IMPLEMENTED. + * treat this as server not supporting multichannel + */ + + rc = cifs_chan_skip_or_disable(ses, server, + from_reconnect); + goto skip_add_channels; + } else if (rc) cifs_dbg(FYI, "%s: failed to query server interfaces: %d\n", __func__, rc);
if (ses->chan_max > ses->chan_count && + ses->iface_count && !SERVER_IS_CHAN(server)) { if (ses->chan_count == 1) cifs_server_dbg(VFS, "supports multichannel now\n"); @@ -409,6 +437,7 @@ smb2_reconnect(__le16 smb2_command, struct cifs_tcon *tcon, } else { mutex_unlock(&ses->session_mutex); } +skip_add_channels:
if (smb2_command != SMB2_INTERNAL_CMD) if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 78e727e58e54efca4c23863fbd9e16e9d2d83f81 ]
iface_last_update was an unused field when it was introduced. Later, when we had periodic update of server interface list, this field was used regularly to decide when to update next.
However, with the new logic of updating the interfaces, it becomes crucial that this field be updated whenever parse_server_interfaces runs successfully.
This change updates this field when either the server does not support query of interfaces; so that we do not query the interfaces repeatedly. It also updates the field when the function reaches the end.
Fixes: aa45dadd34e4 ("cifs: change iface_list from array to sorted linked list") Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smb2ops.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c index 8caf2cefc8a7..e33ed0fbc318 100644 --- a/fs/smb/client/smb2ops.c +++ b/fs/smb/client/smb2ops.c @@ -615,6 +615,7 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf, "Empty network interface list returned by server %s\n", ses->server->hostname); rc = -EOPNOTSUPP; + ses->iface_last_update = jiffies; goto out; }
@@ -712,7 +713,6 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf,
ses->iface_count++; spin_unlock(&ses->iface_lock); - ses->iface_last_update = jiffies; next_iface: nb_iface++; next = le32_to_cpu(p->Next); @@ -734,6 +734,8 @@ parse_server_interfaces(struct network_interface_info_ioctl_rsp *buf, if ((bytes_left > 8) || p->Next) cifs_dbg(VFS, "%s: incomplete interface info\n", __func__);
+ ses->iface_last_update = jiffies; + out: /* * Go through the list again and put the inactive entries
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Geoff Levand geoff@infradead.org
commit 482b718a84f08b6fc84879c3e90cc57dba11c115 upstream.
Commit 8c5fa3b5c4df ("powerpc/64: Make ELFv2 the default for big-endian builds"), merged in Linux-6.5-rc1 changes the calling ABI in a way that is incompatible with the current code for the PS3's LV1 hypervisor calls.
This change just adds the line '# CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is not set' to the ps3_defconfig file so that the PPC64_ELF_ABI_V1 is used.
Fixes run time errors like these:
BUG: Kernel NULL pointer dereference at 0x00000000 Faulting instruction address: 0xc000000000047cf0 Oops: Kernel access of bad area, sig: 11 [#1] Call Trace: [c0000000023039e0] [c00000000100ebfc] ps3_create_spu+0xc4/0x2b0 (unreliable) [c000000002303ab0] [c00000000100d4c4] create_spu+0xcc/0x3c4 [c000000002303b40] [c00000000100eae4] ps3_enumerate_spus+0xa4/0xf8
Fixes: 8c5fa3b5c4df ("powerpc/64: Make ELFv2 the default for big-endian builds") Cc: stable@vger.kernel.org # v6.5+ Signed-off-by: Geoff Levand geoff@infradead.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://msgid.link/df906ac1-5f17-44b9-b0bb-7cd292a0df65@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/powerpc/configs/ps3_defconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/powerpc/configs/ps3_defconfig b/arch/powerpc/configs/ps3_defconfig index 2b175ddf82f0..aa8bb0208bcc 100644 --- a/arch/powerpc/configs/ps3_defconfig +++ b/arch/powerpc/configs/ps3_defconfig @@ -24,6 +24,7 @@ CONFIG_PS3_VRAM=m CONFIG_PS3_LPM=m # CONFIG_PPC_OF_BOOT_TRAMPOLINE is not set CONFIG_KEXEC=y +# CONFIG_PPC64_BIG_ENDIAN_ELF_ABI_V2 is not set CONFIG_PPC_4K_PAGES=y CONFIG_SCHED_SMT=y CONFIG_PM=y
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Suraj Jitindar Singh surajjs@amazon.com
commit 7c784d624819acbeefb0018bac89e632467cca5a upstream.
The ext4 filesystem tracks the trim status of blocks at the group level. When an entire group has been trimmed then it is marked as such and subsequent trim invocations with the same minimum trim size will not be attempted on that group unless it is marked as able to be trimmed again such as when a block is freed.
Currently the last group can't be marked as trimmed due to incorrect logic in ext4_last_grp_cluster(). ext4_last_grp_cluster() is supposed to return the zero based index of the last cluster in a group. This is then used by ext4_try_to_trim_range() to determine if the trim operation spans the entire group and as such if the trim status of the group should be recorded.
ext4_last_grp_cluster() takes a 0 based group index, thus the valid values for grp are 0..(ext4_get_groups_count - 1). Any group index less than (ext4_get_groups_count - 1) is not the last group and must have EXT4_CLUSTERS_PER_GROUP(sb) clusters. For the last group we need to calculate the number of clusters based on the number of blocks in the group. Finally subtract 1 from the number of clusters as zero based indexing is expected. Rearrange the function slightly to make it clear what we are calculating and returning.
Reproducer: // Create file system where the last group has fewer blocks than // blocks per group $ mkfs.ext4 -b 4096 -g 8192 /dev/nvme0n1 8191 $ mount /dev/nvme0n1 /mnt
Before Patch: $ fstrim -v /mnt /mnt: 25.9 MiB (27156480 bytes) trimmed // Group not marked as trimmed so second invocation still discards blocks $ fstrim -v /mnt /mnt: 25.9 MiB (27156480 bytes) trimmed
After Patch: fstrim -v /mnt /mnt: 25.9 MiB (27156480 bytes) trimmed // Group marked as trimmed so second invocation DOESN'T discard any blocks fstrim -v /mnt /mnt: 0 B (0 bytes) trimmed
Fixes: 45e4ab320c9b ("ext4: move setting of trimmed bit into ext4_try_to_trim_range()") Cc: stable@vger.kernel.org # 4.19+ Signed-off-by: Suraj Jitindar Singh surajjs@amazon.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20231213051635.37731-1-surajjs@amazon.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/mballoc.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -6887,11 +6887,16 @@ __acquires(bitlock) static ext4_grpblk_t ext4_last_grp_cluster(struct super_block *sb, ext4_group_t grp) { - if (grp < ext4_get_groups_count(sb)) - return EXT4_CLUSTERS_PER_GROUP(sb) - 1; - return (ext4_blocks_count(EXT4_SB(sb)->s_es) - - ext4_group_first_block_no(sb, grp) - 1) >> - EXT4_CLUSTER_BITS(sb); + unsigned long nr_clusters_in_group; + + if (grp < (ext4_get_groups_count(sb) - 1)) + nr_clusters_in_group = EXT4_CLUSTERS_PER_GROUP(sb); + else + nr_clusters_in_group = (ext4_blocks_count(EXT4_SB(sb)->s_es) - + ext4_group_first_block_no(sb, grp)) + >> EXT4_CLUSTER_BITS(sb); + + return nr_clusters_in_group - 1; }
static bool ext4_trim_interrupted(void)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 6aa09a5bccd8e224d917afdb4c278fc66aacde4d upstream.
In preparation for subsequent changes, split async_schedule_node_domain() in two pieces so as to allow the bottom part of it to be called from a somewhat different code path.
No functional impact.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Tested-by: Youngmin Nam youngmin.nam@samsung.com Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/async.c | 56 ++++++++++++++++++++++++++++++++++---------------------- 1 file changed, 34 insertions(+), 22 deletions(-)
--- a/kernel/async.c +++ b/kernel/async.c @@ -145,6 +145,39 @@ static void async_run_entry_fn(struct wo wake_up(&async_done); }
+static async_cookie_t __async_schedule_node_domain(async_func_t func, + void *data, int node, + struct async_domain *domain, + struct async_entry *entry) +{ + async_cookie_t newcookie; + unsigned long flags; + + INIT_LIST_HEAD(&entry->domain_list); + INIT_LIST_HEAD(&entry->global_list); + INIT_WORK(&entry->work, async_run_entry_fn); + entry->func = func; + entry->data = data; + entry->domain = domain; + + spin_lock_irqsave(&async_lock, flags); + + /* allocate cookie and queue */ + newcookie = entry->cookie = next_cookie++; + + list_add_tail(&entry->domain_list, &domain->pending); + if (domain->registered) + list_add_tail(&entry->global_list, &async_global_pending); + + atomic_inc(&entry_count); + spin_unlock_irqrestore(&async_lock, flags); + + /* schedule for execution */ + queue_work_node(node, system_unbound_wq, &entry->work); + + return newcookie; +} + /** * async_schedule_node_domain - NUMA specific version of async_schedule_domain * @func: function to execute asynchronously @@ -186,29 +219,8 @@ async_cookie_t async_schedule_node_domai func(data, newcookie); return newcookie; } - INIT_LIST_HEAD(&entry->domain_list); - INIT_LIST_HEAD(&entry->global_list); - INIT_WORK(&entry->work, async_run_entry_fn); - entry->func = func; - entry->data = data; - entry->domain = domain; - - spin_lock_irqsave(&async_lock, flags);
- /* allocate cookie and queue */ - newcookie = entry->cookie = next_cookie++; - - list_add_tail(&entry->domain_list, &domain->pending); - if (domain->registered) - list_add_tail(&entry->global_list, &async_global_pending); - - atomic_inc(&entry_count); - spin_unlock_irqrestore(&async_lock, flags); - - /* schedule for execution */ - queue_work_node(node, system_unbound_wq, &entry->work); - - return newcookie; + return __async_schedule_node_domain(func, data, node, domain, entry); } EXPORT_SYMBOL_GPL(async_schedule_node_domain);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 7d4b5d7a37bdd63a5a3371b988744b060d5bb86f upstream.
In preparation for subsequent changes, introduce a specialized variant of async_schedule_dev() that will not invoke the argument function synchronously when it cannot be scheduled for asynchronous execution.
The new function, async_schedule_dev_nocall(), will be used for fixing possible deadlocks in the system-wide power management core code.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com for the series. Tested-by: Youngmin Nam youngmin.nam@samsung.com Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/async.h | 2 ++ kernel/async.c | 29 +++++++++++++++++++++++++++++ 2 files changed, 31 insertions(+)
--- a/include/linux/async.h +++ b/include/linux/async.h @@ -90,6 +90,8 @@ async_schedule_dev(async_func_t func, st return async_schedule_node(func, dev, dev_to_node(dev)); }
+bool async_schedule_dev_nocall(async_func_t func, struct device *dev); + /** * async_schedule_dev_domain - A device specific version of async_schedule_domain * @func: function to execute asynchronously --- a/kernel/async.c +++ b/kernel/async.c @@ -244,6 +244,35 @@ async_cookie_t async_schedule_node(async EXPORT_SYMBOL_GPL(async_schedule_node);
/** + * async_schedule_dev_nocall - A simplified variant of async_schedule_dev() + * @func: function to execute asynchronously + * @dev: device argument to be passed to function + * + * @dev is used as both the argument for the function and to provide NUMA + * context for where to run the function. + * + * If the asynchronous execution of @func is scheduled successfully, return + * true. Otherwise, do nothing and return false, unlike async_schedule_dev() + * that will run the function synchronously then. + */ +bool async_schedule_dev_nocall(async_func_t func, struct device *dev) +{ + struct async_entry *entry; + + entry = kzalloc(sizeof(struct async_entry), GFP_KERNEL); + + /* Give up if there is no memory or too much work. */ + if (!entry || atomic_read(&entry_count) > MAX_WORK) { + kfree(entry); + return false; + } + + __async_schedule_node_domain(func, dev, dev_to_node(dev), + &async_dfl_domain, entry); + return true; +} + +/** * async_synchronize_full - synchronize all asynchronous function calls * * This function waits until all asynchronous function calls have been done.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 7839d0078e0d5e6cc2fa0b0dfbee71de74f1e557 upstream.
It is reported that in low-memory situations the system-wide resume core code deadlocks, because async_schedule_dev() executes its argument function synchronously if it cannot allocate memory (and not only in that case) and that function attempts to acquire a mutex that is already held. Executing the argument function synchronously from within dpm_async_fn() may also be problematic for ordering reasons (it may cause a consumer device's resume callback to be invoked before a requisite supplier device's one, for example).
Address this by changing the code in question to use async_schedule_dev_nocall() for scheduling the asynchronous execution of device suspend and resume functions and to directly run them synchronously if async_schedule_dev_nocall() returns false.
Link: https://lore.kernel.org/linux-pm/ZYvjiqX6EsL15moe@perf/ Reported-by: Youngmin Nam youngmin.nam@samsung.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Stanislaw Gruszka stanislaw.gruszka@linux.intel.com Tested-by: Youngmin Nam youngmin.nam@samsung.com Reviewed-by: Ulf Hansson ulf.hansson@linaro.org Cc: 5.7+ stable@vger.kernel.org # 5.7+: 6aa09a5bccd8 async: Split async_schedule_node_domain() Cc: 5.7+ stable@vger.kernel.org # 5.7+: 7d4b5d7a37bd async: Introduce async_schedule_dev_nocall() Cc: 5.7+ stable@vger.kernel.org # 5.7+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/power/main.c | 148 +++++++++++++++++++++------------------------- 1 file changed, 68 insertions(+), 80 deletions(-)
--- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -579,7 +579,7 @@ bool dev_pm_skip_resume(struct device *d }
/** - * device_resume_noirq - Execute a "noirq resume" callback for given device. + * __device_resume_noirq - Execute a "noirq resume" callback for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being resumed asynchronously. @@ -587,7 +587,7 @@ bool dev_pm_skip_resume(struct device *d * The driver of @dev will not receive interrupts while this function is being * executed. */ -static int device_resume_noirq(struct device *dev, pm_message_t state, bool async) +static void __device_resume_noirq(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; const char *info = NULL; @@ -655,7 +655,13 @@ Skip: Out: complete_all(&dev->power.completion); TRACE_RESUME(error); - return error; + + if (error) { + suspend_stats.failed_resume_noirq++; + dpm_save_failed_step(SUSPEND_RESUME_NOIRQ); + dpm_save_failed_dev(dev_name(dev)); + pm_dev_err(dev, state, async ? " async noirq" : " noirq", error); + } }
static bool is_async(struct device *dev) @@ -668,11 +674,15 @@ static bool dpm_async_fn(struct device * { reinit_completion(&dev->power.completion);
- if (is_async(dev)) { - get_device(dev); - async_schedule_dev(func, dev); + if (!is_async(dev)) + return false; + + get_device(dev); + + if (async_schedule_dev_nocall(func, dev)) return true; - } + + put_device(dev);
return false; } @@ -680,15 +690,19 @@ static bool dpm_async_fn(struct device * static void async_resume_noirq(void *data, async_cookie_t cookie) { struct device *dev = data; - int error; - - error = device_resume_noirq(dev, pm_transition, true); - if (error) - pm_dev_err(dev, pm_transition, " async", error);
+ __device_resume_noirq(dev, pm_transition, true); put_device(dev); }
+static void device_resume_noirq(struct device *dev) +{ + if (dpm_async_fn(dev, async_resume_noirq)) + return; + + __device_resume_noirq(dev, pm_transition, false); +} + static void dpm_noirq_resume_devices(pm_message_t state) { struct device *dev; @@ -698,14 +712,6 @@ static void dpm_noirq_resume_devices(pm_ mutex_lock(&dpm_list_mtx); pm_transition = state;
- /* - * Advanced the async threads upfront, - * in case the starting of async threads is - * delayed by non-async resuming devices. - */ - list_for_each_entry(dev, &dpm_noirq_list, power.entry) - dpm_async_fn(dev, async_resume_noirq); - while (!list_empty(&dpm_noirq_list)) { dev = to_device(dpm_noirq_list.next); get_device(dev); @@ -713,17 +719,7 @@ static void dpm_noirq_resume_devices(pm_
mutex_unlock(&dpm_list_mtx);
- if (!is_async(dev)) { - int error; - - error = device_resume_noirq(dev, state, false); - if (error) { - suspend_stats.failed_resume_noirq++; - dpm_save_failed_step(SUSPEND_RESUME_NOIRQ); - dpm_save_failed_dev(dev_name(dev)); - pm_dev_err(dev, state, " noirq", error); - } - } + device_resume_noirq(dev);
put_device(dev);
@@ -751,14 +747,14 @@ void dpm_resume_noirq(pm_message_t state }
/** - * device_resume_early - Execute an "early resume" callback for given device. + * __device_resume_early - Execute an "early resume" callback for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being resumed asynchronously. * * Runtime PM is disabled for @dev while this function is being executed. */ -static int device_resume_early(struct device *dev, pm_message_t state, bool async) +static void __device_resume_early(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; const char *info = NULL; @@ -811,21 +807,31 @@ Out:
pm_runtime_enable(dev); complete_all(&dev->power.completion); - return error; + + if (error) { + suspend_stats.failed_resume_early++; + dpm_save_failed_step(SUSPEND_RESUME_EARLY); + dpm_save_failed_dev(dev_name(dev)); + pm_dev_err(dev, state, async ? " async early" : " early", error); + } }
static void async_resume_early(void *data, async_cookie_t cookie) { struct device *dev = data; - int error; - - error = device_resume_early(dev, pm_transition, true); - if (error) - pm_dev_err(dev, pm_transition, " async", error);
+ __device_resume_early(dev, pm_transition, true); put_device(dev); }
+static void device_resume_early(struct device *dev) +{ + if (dpm_async_fn(dev, async_resume_early)) + return; + + __device_resume_early(dev, pm_transition, false); +} + /** * dpm_resume_early - Execute "early resume" callbacks for all devices. * @state: PM transition of the system being carried out. @@ -839,14 +845,6 @@ void dpm_resume_early(pm_message_t state mutex_lock(&dpm_list_mtx); pm_transition = state;
- /* - * Advanced the async threads upfront, - * in case the starting of async threads is - * delayed by non-async resuming devices. - */ - list_for_each_entry(dev, &dpm_late_early_list, power.entry) - dpm_async_fn(dev, async_resume_early); - while (!list_empty(&dpm_late_early_list)) { dev = to_device(dpm_late_early_list.next); get_device(dev); @@ -854,17 +852,7 @@ void dpm_resume_early(pm_message_t state
mutex_unlock(&dpm_list_mtx);
- if (!is_async(dev)) { - int error; - - error = device_resume_early(dev, state, false); - if (error) { - suspend_stats.failed_resume_early++; - dpm_save_failed_step(SUSPEND_RESUME_EARLY); - dpm_save_failed_dev(dev_name(dev)); - pm_dev_err(dev, state, " early", error); - } - } + device_resume_early(dev);
put_device(dev);
@@ -888,12 +876,12 @@ void dpm_resume_start(pm_message_t state EXPORT_SYMBOL_GPL(dpm_resume_start);
/** - * device_resume - Execute "resume" callbacks for given device. + * __device_resume - Execute "resume" callbacks for given device. * @dev: Device to handle. * @state: PM transition of the system being carried out. * @async: If true, the device is being resumed asynchronously. */ -static int device_resume(struct device *dev, pm_message_t state, bool async) +static void __device_resume(struct device *dev, pm_message_t state, bool async) { pm_callback_t callback = NULL; const char *info = NULL; @@ -975,20 +963,30 @@ static int device_resume(struct device *
TRACE_RESUME(error);
- return error; + if (error) { + suspend_stats.failed_resume++; + dpm_save_failed_step(SUSPEND_RESUME); + dpm_save_failed_dev(dev_name(dev)); + pm_dev_err(dev, state, async ? " async" : "", error); + } }
static void async_resume(void *data, async_cookie_t cookie) { struct device *dev = data; - int error;
- error = device_resume(dev, pm_transition, true); - if (error) - pm_dev_err(dev, pm_transition, " async", error); + __device_resume(dev, pm_transition, true); put_device(dev); }
+static void device_resume(struct device *dev) +{ + if (dpm_async_fn(dev, async_resume)) + return; + + __device_resume(dev, pm_transition, false); +} + /** * dpm_resume - Execute "resume" callbacks for non-sysdev devices. * @state: PM transition of the system being carried out. @@ -1008,27 +1006,17 @@ void dpm_resume(pm_message_t state) pm_transition = state; async_error = 0;
- list_for_each_entry(dev, &dpm_suspended_list, power.entry) - dpm_async_fn(dev, async_resume); - while (!list_empty(&dpm_suspended_list)) { dev = to_device(dpm_suspended_list.next); + get_device(dev); - if (!is_async(dev)) { - int error;
- mutex_unlock(&dpm_list_mtx); + mutex_unlock(&dpm_list_mtx);
- error = device_resume(dev, state, false); - if (error) { - suspend_stats.failed_resume++; - dpm_save_failed_step(SUSPEND_RESUME); - dpm_save_failed_dev(dev_name(dev)); - pm_dev_err(dev, state, "", error); - } + device_resume(dev); + + mutex_lock(&dpm_list_mtx);
- mutex_lock(&dpm_list_mtx); - } if (!list_empty(&dev->power.entry)) list_move_tail(&dev->power.entry, &dpm_prepared_list);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Josef Bacik josef@toxicpanda.com
commit 7b21ed7d119dc06b0ed2ba3e406a02cafe3a8d03 upstream.
If you select CONFIG_EFI_ZBOOT, we will generate vmlinuz.efi, and then when we go to install the kernel we'll install the vmlinux instead because install.sh only recognizes Image.gz as wanting the compressed install image. With CONFIG_EFI_ZBOOT we don't get the proper kernel installed, which means it doesn't boot, which makes for a very confused and subsequently angry kernel developer.
Fix this by properly installing our compressed kernel if we've enabled CONFIG_EFI_ZBOOT.
Signed-off-by: Josef Bacik josef@toxicpanda.com Cc: stable@vger.kernel.org # 6.1.x Fixes: c37b830fef13 ("arm64: efi: enable generic EFI compressed boot") Reviewed-by: Simon Glass sjg@chromium.org Link: https://lore.kernel.org/r/6edb1402769c2c14c4fbef8f7eaedb3167558789.170257067... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/install.sh | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/arm64/boot/install.sh +++ b/arch/arm64/boot/install.sh @@ -17,7 +17,8 @@ # $3 - kernel map file # $4 - default install path (blank if root directory)
-if [ "$(basename $2)" = "Image.gz" ]; then +if [ "$(basename $2)" = "Image.gz" ] || [ "$(basename $2)" = "vmlinuz.efi" ] +then # Compressed install echo "Installing compressed kernel" base=vmlinuz
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Viresh Kumar viresh.kumar@linaro.org
commit 7269c250db1b89cda72ca419b7bd5e37997309d6 upstream.
The OPP core finds the eventual frequency to set with the help of clk_round_rate() and the same was earlier getting passed to _set_opp() and that's what would get configured.
The commit 1efae8d2e777 ("OPP: Make dev_pm_opp_set_opp() independent of frequency") mistakenly changed that. Fix it.
Fixes: 1efae8d2e777 ("OPP: Make dev_pm_opp_set_opp() independent of frequency") Cc: v5.18+ stable@vger.kernel.org # v6.0+ Signed-off-by: Viresh Kumar viresh.kumar@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/opp/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/opp/core.c +++ b/drivers/opp/core.c @@ -1322,12 +1322,12 @@ int dev_pm_opp_set_rate(struct device *d * value of the frequency. In such a case, do not abort but * configure the hardware to the desired frequency forcefully. */ - forced = opp_table->rate_clk_single != target_freq; + forced = opp_table->rate_clk_single != freq; }
- ret = _set_opp(dev, opp_table, opp, &target_freq, forced); + ret = _set_opp(dev, opp_table, opp, &freq, forced);
- if (target_freq) + if (freq) dev_pm_opp_put(opp);
put_opp_table:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Disseldorp ddiss@suse.de
commit 2b0122aaa800b021e36027d7f29e206f87c761d6 upstream.
The value set as scrub_speed_max accepts size with suffixes (k/m/g/t/p/e) but we should still validate it for trailing characters, similar to what we do with chunk_size_store.
CC: stable@vger.kernel.org # 5.15+ Signed-off-by: David Disseldorp ddiss@suse.de Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/sysfs.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/btrfs/sysfs.c +++ b/fs/btrfs/sysfs.c @@ -1760,6 +1760,10 @@ static ssize_t btrfs_devinfo_scrub_speed unsigned long long limit;
limit = memparse(buf, &endptr); + /* There could be trailing '\n', also catch any typos after the value. */ + endptr = skip_spaces(endptr); + if (*endptr != 0) + return -EINVAL; WRITE_ONCE(device->scrub_speed_max, limit); return len; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tianjia Zhang tianjia.zhang@linux.alibaba.com
commit ba3c5574203034781ac4231acf117da917efcd2a upstream.
When the mpi_ec_ctx structure is initialized, some fields are not cleared, causing a crash when referencing the field when the structure was released. Initially, this issue was ignored because memory for mpi_ec_ctx is allocated with the __GFP_ZERO flag. For example, this error will be triggered when calculating the Za value for SM2 separately.
Fixes: d58bb7e55a8a ("lib/mpi: Introduce ec implementation to MPI library") Cc: stable@vger.kernel.org # v6.5 Signed-off-by: Tianjia Zhang tianjia.zhang@linux.alibaba.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- lib/crypto/mpi/ec.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/lib/crypto/mpi/ec.c b/lib/crypto/mpi/ec.c index 40f5908e57a4..e16dca1e23d5 100644 --- a/lib/crypto/mpi/ec.c +++ b/lib/crypto/mpi/ec.c @@ -584,6 +584,9 @@ void mpi_ec_init(struct mpi_ec_ctx *ctx, enum gcry_mpi_ec_models model, ctx->a = mpi_copy(a); ctx->b = mpi_copy(b);
+ ctx->d = NULL; + ctx->t.two_inv_p = NULL; + ctx->t.p_barrett = use_barrett > 0 ? mpi_barrett_init(ctx->p, 0) : NULL;
mpi_ec_get_reset(ctx);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gao Xiang hsiangkao@linux.alibaba.com
commit 3c12466b6b7bf1e56f9b32c366a3d83d87afb4de upstream.
Currently EROFS can map another compressed buffer for inplace decompression, that was used to handle the cases that some pages of compressed data are actually not in-place I/O.
However, like most simple LZ77 algorithms, LZ4 expects the compressed data is arranged at the end of the decompressed buffer and it explicitly uses memmove() to handle overlapping: __________________________________________________________ |_ direction of decompression --> ____ |_ compressed data _|
Although EROFS arranges compressed data like this, it typically maps two individual virtual buffers so the relative order is uncertain. Previously, it was hardly observed since LZ4 only uses memmove() for short overlapped literals and x86/arm64 memmove implementations seem to completely cover it up and they don't have this issue. Juhyung reported that EROFS data corruption can be found on a new Intel x86 processor. After some analysis, it seems that recent x86 processors with the new FSRM feature expose this issue with "rep movsb".
Let's strictly use the decompressed buffer for lz4 inplace decompression for now. Later, as an useful improvement, we could try to tie up these two buffers together in the correct order.
Reported-and-tested-by: Juhyung Park qkrwngud825@gmail.com Closes: https://lore.kernel.org/r/CAD14+f2AVKf8Fa2OO1aAUdDNTDsVzzR6ctU_oJSmTyd6zSYR2... Fixes: 0ffd71bcc3a0 ("staging: erofs: introduce LZ4 decompression inplace") Fixes: 598162d05080 ("erofs: support decompress big pcluster for lz4 backend") Cc: stable stable@vger.kernel.org # 5.4+ Tested-by: Yifan Zhao zhaoyifan@sjtu.edu.cn Signed-off-by: Gao Xiang hsiangkao@linux.alibaba.com Link: https://lore.kernel.org/r/20231206045534.3920847-1-hsiangkao@linux.alibaba.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/erofs/decompressor.c | 31 ++++++++++++++++--------------- 1 file changed, 16 insertions(+), 15 deletions(-)
--- a/fs/erofs/decompressor.c +++ b/fs/erofs/decompressor.c @@ -122,11 +122,11 @@ static int z_erofs_lz4_prepare_dstpages( }
static void *z_erofs_lz4_handle_overlap(struct z_erofs_lz4_decompress_ctx *ctx, - void *inpage, unsigned int *inputmargin, int *maptype, - bool may_inplace) + void *inpage, void *out, unsigned int *inputmargin, + int *maptype, bool may_inplace) { struct z_erofs_decompress_req *rq = ctx->rq; - unsigned int omargin, total, i, j; + unsigned int omargin, total, i; struct page **in; void *src, *tmp;
@@ -136,12 +136,13 @@ static void *z_erofs_lz4_handle_overlap( omargin < LZ4_DECOMPRESS_INPLACE_MARGIN(rq->inputsize)) goto docopy;
- for (i = 0; i < ctx->inpages; ++i) { - DBG_BUGON(rq->in[i] == NULL); - for (j = 0; j < ctx->outpages - ctx->inpages + i; ++j) - if (rq->out[j] == rq->in[i]) - goto docopy; - } + for (i = 0; i < ctx->inpages; ++i) + if (rq->out[ctx->outpages - ctx->inpages + i] != + rq->in[i]) + goto docopy; + kunmap_local(inpage); + *maptype = 3; + return out + ((ctx->outpages - ctx->inpages) << PAGE_SHIFT); }
if (ctx->inpages <= 1) { @@ -149,7 +150,6 @@ static void *z_erofs_lz4_handle_overlap( return inpage; } kunmap_local(inpage); - might_sleep(); src = erofs_vm_map_ram(rq->in, ctx->inpages); if (!src) return ERR_PTR(-ENOMEM); @@ -205,12 +205,12 @@ int z_erofs_fixup_insize(struct z_erofs_ }
static int z_erofs_lz4_decompress_mem(struct z_erofs_lz4_decompress_ctx *ctx, - u8 *out) + u8 *dst) { struct z_erofs_decompress_req *rq = ctx->rq; bool support_0padding = false, may_inplace = false; unsigned int inputmargin; - u8 *headpage, *src; + u8 *out, *headpage, *src; int ret, maptype;
DBG_BUGON(*rq->in == NULL); @@ -231,11 +231,12 @@ static int z_erofs_lz4_decompress_mem(st }
inputmargin = rq->pageofs_in; - src = z_erofs_lz4_handle_overlap(ctx, headpage, &inputmargin, + src = z_erofs_lz4_handle_overlap(ctx, headpage, dst, &inputmargin, &maptype, may_inplace); if (IS_ERR(src)) return PTR_ERR(src);
+ out = dst + rq->pageofs_out; /* legacy format could compress extra data in a pcluster. */ if (rq->partial_decoding || !support_0padding) ret = LZ4_decompress_safe_partial(src + inputmargin, out, @@ -266,7 +267,7 @@ static int z_erofs_lz4_decompress_mem(st vm_unmap_ram(src, ctx->inpages); } else if (maptype == 2) { erofs_put_pcpubuf(src); - } else { + } else if (maptype != 3) { DBG_BUGON(1); return -EFAULT; } @@ -309,7 +310,7 @@ static int z_erofs_lz4_decompress(struct }
dstmap_out: - ret = z_erofs_lz4_decompress_mem(&ctx, dst + rq->pageofs_out); + ret = z_erofs_lz4_decompress_mem(&ctx, dst); if (!dst_maptype) kunmap_local(dst); else if (dst_maptype == 2)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu herbert@gondor.apana.org.au
commit 27016f75f5ed47e2d8e0ca75a8ff1f40bc1a5e27 upstream.
Disallow registration of two algorithms with identical driver names.
Cc: stable@vger.kernel.org Reported-by: Ovidiu Panait ovidiu.panait@windriver.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- crypto/algapi.c | 1 + 1 file changed, 1 insertion(+)
--- a/crypto/algapi.c +++ b/crypto/algapi.c @@ -341,6 +341,7 @@ __crypto_register_alg(struct crypto_alg }
if (!strcmp(q->cra_driver_name, alg->cra_name) || + !strcmp(q->cra_driver_name, alg->cra_driver_name) || !strcmp(q->cra_name, alg->cra_driver_name)) goto err; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hongchen Zhang zhanghongchen@loongson.cn
commit 71cd7e80cfde548959952eac7063aeaea1f2e1c6 upstream.
An S4 (suspend to disk) test on the LoongArch 3A6000 platform sometimes fails with the following error messaged in the dmesg log:
Invalid LZO compressed length
That happens because when compressing/decompressing the image, the synchronization between the control thread and the compress/decompress/crc thread is based on a relaxed ordering interface, which is unreliable, and the following situation may occur:
CPU 0 CPU 1 save_image_lzo lzo_compress_threadfn atomic_set(&d->stop, 1); atomic_read(&data[thr].stop) data[thr].cmp = data[thr].cmp_len; WRITE data[thr].cmp_len
Then CPU0 gets a stale cmp_len and writes it to disk. During resume from S4, wrong cmp_len is loaded.
To maintain data consistency between the two threads, use the acquire/release variants of atomic set and read operations.
Fixes: 081a9d043c98 ("PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image") Cc: All applicable stable@vger.kernel.org Signed-off-by: Hongchen Zhang zhanghongchen@loongson.cn Co-developed-by: Weihao Li liweihao@loongson.cn Signed-off-by: Weihao Li liweihao@loongson.cn [ rjw: Subject rewrite and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/power/swap.c | 38 +++++++++++++++++++------------------- 1 file changed, 19 insertions(+), 19 deletions(-)
--- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -605,11 +605,11 @@ static int crc32_threadfn(void *data) unsigned i;
while (1) { - wait_event(d->go, atomic_read(&d->ready) || + wait_event(d->go, atomic_read_acquire(&d->ready) || kthread_should_stop()); if (kthread_should_stop()) { d->thr = NULL; - atomic_set(&d->stop, 1); + atomic_set_release(&d->stop, 1); wake_up(&d->done); break; } @@ -618,7 +618,7 @@ static int crc32_threadfn(void *data) for (i = 0; i < d->run_threads; i++) *d->crc32 = crc32_le(*d->crc32, d->unc[i], *d->unc_len[i]); - atomic_set(&d->stop, 1); + atomic_set_release(&d->stop, 1); wake_up(&d->done); } return 0; @@ -648,12 +648,12 @@ static int lzo_compress_threadfn(void *d struct cmp_data *d = data;
while (1) { - wait_event(d->go, atomic_read(&d->ready) || + wait_event(d->go, atomic_read_acquire(&d->ready) || kthread_should_stop()); if (kthread_should_stop()) { d->thr = NULL; d->ret = -1; - atomic_set(&d->stop, 1); + atomic_set_release(&d->stop, 1); wake_up(&d->done); break; } @@ -662,7 +662,7 @@ static int lzo_compress_threadfn(void *d d->ret = lzo1x_1_compress(d->unc, d->unc_len, d->cmp + LZO_HEADER, &d->cmp_len, d->wrk); - atomic_set(&d->stop, 1); + atomic_set_release(&d->stop, 1); wake_up(&d->done); } return 0; @@ -797,7 +797,7 @@ static int save_image_lzo(struct swap_ma
data[thr].unc_len = off;
- atomic_set(&data[thr].ready, 1); + atomic_set_release(&data[thr].ready, 1); wake_up(&data[thr].go); }
@@ -805,12 +805,12 @@ static int save_image_lzo(struct swap_ma break;
crc->run_threads = thr; - atomic_set(&crc->ready, 1); + atomic_set_release(&crc->ready, 1); wake_up(&crc->go);
for (run_threads = thr, thr = 0; thr < run_threads; thr++) { wait_event(data[thr].done, - atomic_read(&data[thr].stop)); + atomic_read_acquire(&data[thr].stop)); atomic_set(&data[thr].stop, 0);
ret = data[thr].ret; @@ -849,7 +849,7 @@ static int save_image_lzo(struct swap_ma } }
- wait_event(crc->done, atomic_read(&crc->stop)); + wait_event(crc->done, atomic_read_acquire(&crc->stop)); atomic_set(&crc->stop, 0); }
@@ -1131,12 +1131,12 @@ static int lzo_decompress_threadfn(void struct dec_data *d = data;
while (1) { - wait_event(d->go, atomic_read(&d->ready) || + wait_event(d->go, atomic_read_acquire(&d->ready) || kthread_should_stop()); if (kthread_should_stop()) { d->thr = NULL; d->ret = -1; - atomic_set(&d->stop, 1); + atomic_set_release(&d->stop, 1); wake_up(&d->done); break; } @@ -1149,7 +1149,7 @@ static int lzo_decompress_threadfn(void flush_icache_range((unsigned long)d->unc, (unsigned long)d->unc + d->unc_len);
- atomic_set(&d->stop, 1); + atomic_set_release(&d->stop, 1); wake_up(&d->done); } return 0; @@ -1334,7 +1334,7 @@ static int load_image_lzo(struct swap_ma }
if (crc->run_threads) { - wait_event(crc->done, atomic_read(&crc->stop)); + wait_event(crc->done, atomic_read_acquire(&crc->stop)); atomic_set(&crc->stop, 0); crc->run_threads = 0; } @@ -1370,7 +1370,7 @@ static int load_image_lzo(struct swap_ma pg = 0; }
- atomic_set(&data[thr].ready, 1); + atomic_set_release(&data[thr].ready, 1); wake_up(&data[thr].go); }
@@ -1389,7 +1389,7 @@ static int load_image_lzo(struct swap_ma
for (run_threads = thr, thr = 0; thr < run_threads; thr++) { wait_event(data[thr].done, - atomic_read(&data[thr].stop)); + atomic_read_acquire(&data[thr].stop)); atomic_set(&data[thr].stop, 0);
ret = data[thr].ret; @@ -1420,7 +1420,7 @@ static int load_image_lzo(struct swap_ma ret = snapshot_write_next(snapshot); if (ret <= 0) { crc->run_threads = thr + 1; - atomic_set(&crc->ready, 1); + atomic_set_release(&crc->ready, 1); wake_up(&crc->go); goto out_finish; } @@ -1428,13 +1428,13 @@ static int load_image_lzo(struct swap_ma }
crc->run_threads = thr; - atomic_set(&crc->ready, 1); + atomic_set_release(&crc->ready, 1); wake_up(&crc->go); }
out_finish: if (crc->run_threads) { - wait_event(crc->done, atomic_read(&crc->stop)); + wait_event(crc->done, atomic_read_acquire(&crc->stop)); atomic_set(&crc->stop, 0); } stop = ktime_get();
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu herbert@gondor.apana.org.au
commit 78aafb3884f6bc6636efcc1760c891c8500b9922 upstream.
There is a dead-lock in the hwrng device read path. This triggers when the user reads from /dev/hwrng into memory also mmap-ed from /dev/hwrng. The resulting page fault triggers a recursive read which then dead-locks.
Fix this by using a stack buffer when calling copy_to_user.
Reported-by: Edward Adam Davis eadavis@qq.com Reported-by: syzbot+c52ab18308964d248092@syzkaller.appspotmail.com Fixes: 9996508b3353 ("hwrng: core - Replace u32 in driver API with byte array") Cc: stable@vger.kernel.org Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/hw_random/core.c | 34 +++++++++++++++++++++------------- 1 file changed, 21 insertions(+), 13 deletions(-)
--- a/drivers/char/hw_random/core.c +++ b/drivers/char/hw_random/core.c @@ -23,10 +23,13 @@ #include <linux/sched.h> #include <linux/sched/signal.h> #include <linux/slab.h> +#include <linux/string.h> #include <linux/uaccess.h>
#define RNG_MODULE_NAME "hw_random"
+#define RNG_BUFFER_SIZE (SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES) + static struct hwrng *current_rng; /* the current rng has been explicitly chosen by user via sysfs */ static int cur_rng_set_by_user; @@ -58,7 +61,7 @@ static inline int rng_get_data(struct hw
static size_t rng_buffer_size(void) { - return SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES; + return RNG_BUFFER_SIZE; }
static void add_early_randomness(struct hwrng *rng) @@ -209,6 +212,7 @@ static inline int rng_get_data(struct hw static ssize_t rng_dev_read(struct file *filp, char __user *buf, size_t size, loff_t *offp) { + u8 buffer[RNG_BUFFER_SIZE]; ssize_t ret = 0; int err = 0; int bytes_read, len; @@ -236,34 +240,37 @@ static ssize_t rng_dev_read(struct file if (bytes_read < 0) { err = bytes_read; goto out_unlock_reading; + } else if (bytes_read == 0 && + (filp->f_flags & O_NONBLOCK)) { + err = -EAGAIN; + goto out_unlock_reading; } + data_avail = bytes_read; }
- if (!data_avail) { - if (filp->f_flags & O_NONBLOCK) { - err = -EAGAIN; - goto out_unlock_reading; - } - } else { - len = data_avail; + len = data_avail; + if (len) { if (len > size) len = size;
data_avail -= len;
- if (copy_to_user(buf + ret, rng_buffer + data_avail, - len)) { + memcpy(buffer, rng_buffer + data_avail, len); + } + mutex_unlock(&reading_mutex); + put_rng(rng); + + if (len) { + if (copy_to_user(buf + ret, buffer, len)) { err = -EFAULT; - goto out_unlock_reading; + goto out; }
size -= len; ret += len; }
- mutex_unlock(&reading_mutex); - put_rng(rng);
if (need_resched()) schedule_timeout_interruptible(1); @@ -274,6 +281,7 @@ static ssize_t rng_dev_read(struct file } } out: + memzero_explicit(buffer, sizeof(buffer)); return ret ? : err;
out_unlock_reading:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herbert Xu herbert@gondor.apana.org.au
commit d07f951903fa9922c375b8ab1ce81b18a0034e3b upstream.
When processing the last block, the s390 ctr code will always read a whole block, even if there isn't a whole block of data left. Fix this by using the actual length left and copy it into a buffer first for processing.
Fixes: 0200f3ecc196 ("crypto: s390 - add System z hardware support for CTR mode") Cc: stable@vger.kernel.org Reported-by: Guangwu Zhang guazhang@redhat.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Reviewd-by: Harald Freudenberger freude@de.ibm.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/s390/crypto/aes_s390.c | 4 +++- arch/s390/crypto/paes_s390.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-)
--- a/arch/s390/crypto/aes_s390.c +++ b/arch/s390/crypto/aes_s390.c @@ -597,7 +597,9 @@ static int ctr_aes_crypt(struct skcipher * final block may be < AES_BLOCK_SIZE, copy only nbytes */ if (nbytes) { - cpacf_kmctr(sctx->fc, sctx->key, buf, walk.src.virt.addr, + memset(buf, 0, AES_BLOCK_SIZE); + memcpy(buf, walk.src.virt.addr, nbytes); + cpacf_kmctr(sctx->fc, sctx->key, buf, buf, AES_BLOCK_SIZE, walk.iv); memcpy(walk.dst.virt.addr, buf, nbytes); crypto_inc(walk.iv, AES_BLOCK_SIZE); --- a/arch/s390/crypto/paes_s390.c +++ b/arch/s390/crypto/paes_s390.c @@ -693,9 +693,11 @@ static int ctr_paes_crypt(struct skciphe * final block may be < AES_BLOCK_SIZE, copy only nbytes */ if (nbytes) { + memset(buf, 0, AES_BLOCK_SIZE); + memcpy(buf, walk.src.virt.addr, nbytes); while (1) { if (cpacf_kmctr(ctx->fc, ¶m, buf, - walk.src.virt.addr, AES_BLOCK_SIZE, + buf, AES_BLOCK_SIZE, walk.iv) == AES_BLOCK_SIZE) break; if (__paes_convert_key(ctx))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Anthony Krowiak akrowiak@linux.ibm.com
commit 7b2d039da622daa9ba259ac6f38701d542b237c3 upstream.
In the vfio_ap_irq_enable function, after the page containing the notification indicator byte (NIB) is pinned, the function attempts to register the guest ISC. If registration fails, the function sets the status response code and returns without unpinning the page containing the NIB. In order to avoid a memory leak, the NIB should be unpinned before returning from the vfio_ap_irq_enable function.
Co-developed-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Janosch Frank frankja@linux.ibm.com Signed-off-by: Anthony Krowiak akrowiak@linux.ibm.com Reviewed-by: Matthew Rosato mjrosato@linux.ibm.com Fixes: 783f0a3ccd79 ("s390/vfio-ap: add s390dbf logging to the vfio_ap_irq_enable function") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231109164427.460493-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -457,6 +457,7 @@ static struct ap_queue_status vfio_ap_ir VFIO_AP_DBF_WARN("%s: gisc registration failed: nisc=%d, isc=%d, apqn=%#04x\n", __func__, nisc, isc, q->apqn);
+ vfio_unpin_pages(&q->matrix_mdev->vdev, nib, 1); status.response_code = AP_RESPONSE_INVALID_GISA; return status; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Christian Marangi ansuelsmth@gmail.com
commit 08e23d05fa6dc4fc13da0ccf09defdd4bbc92ff4 upstream.
Fix buffer overflow in trans_stat_show().
Convert simple snprintf to the more secure scnprintf with size of PAGE_SIZE.
Add condition checking if we are exceeding PAGE_SIZE and exit early from loop. Also add at the end a warning that we exceeded PAGE_SIZE and that stats is disabled.
Return -EFBIG in the case where we don't have enough space to write the full transition table.
Also document in the ABI that this function can return -EFBIG error.
Link: https://lore.kernel.org/all/20231024183016.14648-2-ansuelsmth@gmail.com/ Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218041 Fixes: e552bbaf5b98 ("PM / devfreq: Add sysfs node for representing frequency transition information.") Signed-off-by: Christian Marangi ansuelsmth@gmail.com Signed-off-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/ABI/testing/sysfs-class-devfreq | 3 + drivers/devfreq/devfreq.c | 59 +++++++++++++++++--------- 2 files changed, 43 insertions(+), 19 deletions(-)
--- a/Documentation/ABI/testing/sysfs-class-devfreq +++ b/Documentation/ABI/testing/sysfs-class-devfreq @@ -52,6 +52,9 @@ Description:
echo 0 > /sys/class/devfreq/.../trans_stat
+ If the transition table is bigger than PAGE_SIZE, reading + this will return an -EFBIG error. + What: /sys/class/devfreq/.../available_frequencies Date: October 2012 Contact: Nishanth Menon nm@ti.com --- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -1688,7 +1688,7 @@ static ssize_t trans_stat_show(struct de struct device_attribute *attr, char *buf) { struct devfreq *df = to_devfreq(dev); - ssize_t len; + ssize_t len = 0; int i, j; unsigned int max_state;
@@ -1697,7 +1697,7 @@ static ssize_t trans_stat_show(struct de max_state = df->max_state;
if (max_state == 0) - return sprintf(buf, "Not Supported.\n"); + return scnprintf(buf, PAGE_SIZE, "Not Supported.\n");
mutex_lock(&df->lock); if (!df->stop_polling && @@ -1707,31 +1707,52 @@ static ssize_t trans_stat_show(struct de } mutex_unlock(&df->lock);
- len = sprintf(buf, " From : To\n"); - len += sprintf(buf + len, " :"); - for (i = 0; i < max_state; i++) - len += sprintf(buf + len, "%10lu", - df->freq_table[i]); + len += scnprintf(buf + len, PAGE_SIZE - len, " From : To\n"); + len += scnprintf(buf + len, PAGE_SIZE - len, " :"); + for (i = 0; i < max_state; i++) { + if (len >= PAGE_SIZE - 1) + break; + len += scnprintf(buf + len, PAGE_SIZE - len, "%10lu", + df->freq_table[i]); + } + if (len >= PAGE_SIZE - 1) + return PAGE_SIZE - 1;
- len += sprintf(buf + len, " time(ms)\n"); + len += scnprintf(buf + len, PAGE_SIZE - len, " time(ms)\n");
for (i = 0; i < max_state; i++) { + if (len >= PAGE_SIZE - 1) + break; if (df->freq_table[i] == df->previous_freq) - len += sprintf(buf + len, "*"); + len += scnprintf(buf + len, PAGE_SIZE - len, "*"); else - len += sprintf(buf + len, " "); - - len += sprintf(buf + len, "%10lu:", df->freq_table[i]); - for (j = 0; j < max_state; j++) - len += sprintf(buf + len, "%10u", - df->stats.trans_table[(i * max_state) + j]); + len += scnprintf(buf + len, PAGE_SIZE - len, " "); + if (len >= PAGE_SIZE - 1) + break; + + len += scnprintf(buf + len, PAGE_SIZE - len, "%10lu:", + df->freq_table[i]); + for (j = 0; j < max_state; j++) { + if (len >= PAGE_SIZE - 1) + break; + len += scnprintf(buf + len, PAGE_SIZE - len, "%10u", + df->stats.trans_table[(i * max_state) + j]); + } + if (len >= PAGE_SIZE - 1) + break; + len += scnprintf(buf + len, PAGE_SIZE - len, "%10llu\n", (u64) + jiffies64_to_msecs(df->stats.time_in_state[i])); + }
- len += sprintf(buf + len, "%10llu\n", (u64) - jiffies64_to_msecs(df->stats.time_in_state[i])); + if (len < PAGE_SIZE - 1) + len += scnprintf(buf + len, PAGE_SIZE - len, "Total transition : %u\n", + df->stats.total_trans); + + if (len >= PAGE_SIZE - 1) { + pr_warn_once("devfreq transition table exceeds PAGE_SIZE. Disabling\n"); + return -EFBIG; }
- len += sprintf(buf + len, "Total transition : %u\n", - df->stats.total_trans); return len; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
commit a7d84a2e7663bbe12394cc771107e04668ea313a upstream.
While switching to ref counters for track mtd devices use, the vmu-flash driver was forgotten. The reason for reading the ref counter seems debatable, but let's just fix the build for now.
Fixes: 19bfa9ebebb5 ("mtd: use refcount to prevent corruption") Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202312022315.79twVRZw-lkp@intel.com/ Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Link: https://lore.kernel.org/linux-mtd/20231205075936.13831-1-miquel.raynal@bootl... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/maps/vmu-flash.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/maps/vmu-flash.c b/drivers/mtd/maps/vmu-flash.c index a7ec947a3ebb..53019d313db7 100644 --- a/drivers/mtd/maps/vmu-flash.c +++ b/drivers/mtd/maps/vmu-flash.c @@ -719,7 +719,7 @@ static int vmu_can_unload(struct maple_device *mdev) card = maple_get_drvdata(mdev); for (x = 0; x < card->partitions; x++) { mtd = &((card->mtd)[x]); - if (mtd->usecount > 0) + if (kref_read(&mtd->refcnt)) return 0; } return 1;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
commit bbcd80f53a5e8c27c2511f539fec8c373f500cf4 upstream.
The ONFI specification states that devices do not need to support sequential reads across LUN boundaries. In order to prevent such event from happening and possibly failing, let's introduce the concept of "pause" in the sequential read to handle these cases. The first/last pages remain the same but any time we cross a LUN boundary we will end and restart (if relevant) the sequential read operation.
Cc: stable@vger.kernel.org Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Tested-by: Martin Hundebøll martin@geanix.com Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-2-miquel.raynal@boot... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/nand_base.c | 43 +++++++++++++++++++++++++++++++++------ include/linux/mtd/rawnand.h | 2 + 2 files changed, 39 insertions(+), 6 deletions(-)
--- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -1208,6 +1208,23 @@ static int nand_lp_exec_read_page_op(str return nand_exec_op(chip, &op); }
+static void rawnand_cap_cont_reads(struct nand_chip *chip) +{ + struct nand_memory_organization *memorg; + unsigned int pages_per_lun, first_lun, last_lun; + + memorg = nanddev_get_memorg(&chip->base); + pages_per_lun = memorg->pages_per_eraseblock * memorg->eraseblocks_per_lun; + first_lun = chip->cont_read.first_page / pages_per_lun; + last_lun = chip->cont_read.last_page / pages_per_lun; + + /* Prevent sequential cache reads across LUN boundaries */ + if (first_lun != last_lun) + chip->cont_read.pause_page = first_lun * pages_per_lun + pages_per_lun - 1; + else + chip->cont_read.pause_page = chip->cont_read.last_page; +} + static int nand_lp_exec_cont_read_page_op(struct nand_chip *chip, unsigned int page, unsigned int offset_in_page, void *buf, unsigned int len, bool check_only) @@ -1226,7 +1243,7 @@ static int nand_lp_exec_cont_read_page_o NAND_OP_DATA_IN(len, buf, 0), }; struct nand_op_instr cont_instrs[] = { - NAND_OP_CMD(page == chip->cont_read.last_page ? + NAND_OP_CMD(page == chip->cont_read.pause_page ? NAND_CMD_READCACHEEND : NAND_CMD_READCACHESEQ, NAND_COMMON_TIMING_NS(conf, tWB_max)), NAND_OP_WAIT_RDY(NAND_COMMON_TIMING_MS(conf, tR_max), @@ -1263,16 +1280,29 @@ static int nand_lp_exec_cont_read_page_o }
if (page == chip->cont_read.first_page) - return nand_exec_op(chip, &start_op); + ret = nand_exec_op(chip, &start_op); else - return nand_exec_op(chip, &cont_op); + ret = nand_exec_op(chip, &cont_op); + if (ret) + return ret; + + if (!chip->cont_read.ongoing) + return 0; + + if (page == chip->cont_read.pause_page && + page != chip->cont_read.last_page) { + chip->cont_read.first_page = chip->cont_read.pause_page + 1; + rawnand_cap_cont_reads(chip); + } else if (page == chip->cont_read.last_page) { + chip->cont_read.ongoing = false; + } + + return 0; }
static bool rawnand_cont_read_ongoing(struct nand_chip *chip, unsigned int page) { - return chip->cont_read.ongoing && - page >= chip->cont_read.first_page && - page <= chip->cont_read.last_page; + return chip->cont_read.ongoing && page >= chip->cont_read.first_page; }
/** @@ -3446,6 +3476,7 @@ static void rawnand_enable_cont_reads(st if (col) chip->cont_read.first_page++; chip->cont_read.last_page = page + ((readlen >> chip->page_shift) & chip->pagemask); + rawnand_cap_cont_reads(chip); }
/** --- a/include/linux/mtd/rawnand.h +++ b/include/linux/mtd/rawnand.h @@ -1265,6 +1265,7 @@ struct nand_secure_region { * @cont_read: Sequential page read internals * @cont_read.ongoing: Whether a continuous read is ongoing or not * @cont_read.first_page: Start of the continuous read operation + * @cont_read.pause_page: End of the current sequential cache read operation * @cont_read.last_page: End of the continuous read operation * @controller: The hardware controller structure which is shared among multiple * independent devices @@ -1321,6 +1322,7 @@ struct nand_chip { struct { bool ongoing; unsigned int first_page; + unsigned int pause_page; unsigned int last_page; } cont_read;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
commit 7c9414c870c027737d0f2ed7b0ed10f26edb1c61 upstream.
A couple of reports pointed at some strange failures happening a bit randomly since the introduction of sequential page reads support. After investigation it turned out the most likely reason for these issues was the fact that sometimes a (longer) read might happen, starting at the same page that was read previously. This is optimized by the raw NAND core, by not sending the READ_PAGE command to the NAND device and just reading out the data in a local cache. When this page is also flagged as being the starting point for a sequential read, it means the page right next will be accessed without the right instructions. The NAND chip will be confused and will not output correct data. In order to avoid such situation from happening anymore, we can however handle this case with a bit of additional logic, to postpone the initialization of the read sequence by one page.
Reported-by: Alexander Shiyan eagle.alexander923@gmail.com Closes: https://lore.kernel.org/linux-mtd/CAP1tNvS=NVAm-vfvYWbc3k9Cx9YxMc2uZZkmXk8h1... Reported-by: Måns Rullgård mans@mansr.com Closes: https://lore.kernel.org/linux-mtd/yw1xfs6j4k6q.fsf@mansr.com/ Reported-by: Martin Hundebøll martin@geanix.com Closes: https://lore.kernel.org/linux-mtd/9d0c42fcde79bfedfe5b05d6a4e9fdef71d3dd52.c... Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Tested-by: Martin Hundebøll martin@geanix.com Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-3-miquel.raynal@boot... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/nand_base.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -3479,6 +3479,18 @@ static void rawnand_enable_cont_reads(st rawnand_cap_cont_reads(chip); }
+static void rawnand_cont_read_skip_first_page(struct nand_chip *chip, unsigned int page) +{ + if (!chip->cont_read.ongoing || page != chip->cont_read.first_page) + return; + + chip->cont_read.first_page++; + if (chip->cont_read.first_page == chip->cont_read.pause_page) + chip->cont_read.first_page++; + if (chip->cont_read.first_page >= chip->cont_read.last_page) + chip->cont_read.ongoing = false; +} + /** * nand_setup_read_retry - [INTERN] Set the READ RETRY mode * @chip: NAND chip object @@ -3653,6 +3665,8 @@ read_retry: buf += bytes; max_bitflips = max_t(unsigned int, max_bitflips, chip->pagecache.bitflips); + + rawnand_cont_read_skip_first_page(chip, page); }
readlen -= bytes;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
commit a62c4597953fe54c6af04166a5e2872efd0e1490 upstream.
Some devices support sequential reads when using the on-die ECC engines, some others do not. It is a bit hard to know which ones will break other than experimentally, so in order to avoid such a difficult and painful task, let's just pretend all devices should avoid using this optimization when configured like this.
Cc: stable@vger.kernel.org Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Tested-by: Martin Hundebøll martin@geanix.com Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-4-miquel.raynal@boot... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/nand_base.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -5171,6 +5171,14 @@ static void rawnand_late_check_supported /* The supported_op fields should not be set by individual drivers */ WARN_ON_ONCE(chip->controller->supported_op.cont_read);
+ /* + * Too many devices do not support sequential cached reads with on-die + * ECC correction enabled, so in this case refuse to perform the + * automation. + */ + if (chip->ecc.engine_type == NAND_ECC_ENGINE_TYPE_ON_DIE) + return; + if (!nand_has_exec_op(chip)) return;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Miquel Raynal miquel.raynal@bootlin.com
commit 828f6df1bcba7f64729166efc7086ea657070445 upstream.
The current logic is probably fine but is a bit convoluted. Plus, we don't want partial pages to be part of the sequential operation just in case the core would optimize the page read with a subpage read (which would break the sequence). This may happen on the first and last page only, so if the start offset or the end offset is not aligned with a page boundary, better avoid them to prevent any risk.
Cc: stable@vger.kernel.org Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads") Signed-off-by: Miquel Raynal miquel.raynal@bootlin.com Tested-by: Martin Hundebøll martin@geanix.com Link: https://lore.kernel.org/linux-mtd/20231215123208.516590-5-miquel.raynal@boot... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/nand/raw/nand_base.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)
--- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -3461,21 +3461,29 @@ static void rawnand_enable_cont_reads(st u32 readlen, int col) { struct mtd_info *mtd = nand_to_mtd(chip); + unsigned int end_page, end_col; + + chip->cont_read.ongoing = false;
if (!chip->controller->supported_op.cont_read) return;
- if ((col && col + readlen < (3 * mtd->writesize)) || - (!col && readlen < (2 * mtd->writesize))) { - chip->cont_read.ongoing = false; + end_page = DIV_ROUND_UP(col + readlen, mtd->writesize); + end_col = (col + readlen) % mtd->writesize; + + if (col) + page++; + + if (end_col && end_page) + end_page--; + + if (page + 1 > end_page) return; - }
- chip->cont_read.ongoing = true; chip->cont_read.first_page = page; - if (col) - chip->cont_read.first_page++; - chip->cont_read.last_page = page + ((readlen >> chip->page_shift) & chip->pagemask); + chip->cont_read.last_page = end_page; + chip->cont_read.ongoing = true; + rawnand_cap_cont_reads(chip); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit c4fb7d2eac9ff9bfc35a2e4d40c7169a332416e0 upstream.
The PMIC GLINK altmode driver currently supports at most two ports.
Fix the incomplete port sanity check on notifications to avoid accessing and corrupting memory beyond the port array if we ever get a notification for an unsupported port.
Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support") Cc: stable@vger.kernel.org # 6.3 Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231109093100.19971-1-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/qcom/pmic_glink_altmode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/soc/qcom/pmic_glink_altmode.c +++ b/drivers/soc/qcom/pmic_glink_altmode.c @@ -285,7 +285,7 @@ static void pmic_glink_altmode_sc8180xp_
svid = mux == 2 ? USB_TYPEC_DP_SID : 0;
- if (!altmode->ports[port].altmode) { + if (port >= ARRAY_SIZE(altmode->ports) || !altmode->ports[port].altmode) { dev_dbg(altmode->dev, "notification on undefined port %d\n", port); return; } @@ -328,7 +328,7 @@ static void pmic_glink_altmode_sc8280xp_ hpd_state = FIELD_GET(SC8280XP_HPD_STATE_MASK, notify->payload[8]); hpd_irq = FIELD_GET(SC8280XP_HPD_IRQ_MASK, notify->payload[8]);
- if (!altmode->ports[port].altmode) { + if (port >= ARRAY_SIZE(altmode->ports) || !altmode->ports[port].altmode) { dev_dbg(altmode->dev, "notification on undefined port %d\n", port); return; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bingbu Cao bingbu.cao@intel.com
commit efa5fe19c0a9199f49e36e1f5242ed5c88da617d upstream.
As the sensor device maybe accessible right after its async sub-device is registered, such as ipu-bridge will try to power up sensor by sensor's client device's runtime PM from the async notifier callback, if runtime PM is not enabled, it will fail.
So runtime PM should be ready before its async sub-device is registered and accessible by others.
Fixes: df0b5c4a7ddd ("media: add imx355 camera sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bingbu Cao bingbu.cao@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/imx355.c | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-)
--- a/drivers/media/i2c/imx355.c +++ b/drivers/media/i2c/imx355.c @@ -1788,10 +1788,6 @@ static int imx355_probe(struct i2c_clien goto error_handler_free; }
- ret = v4l2_async_register_subdev_sensor(&imx355->sd); - if (ret < 0) - goto error_media_entity; - /* * Device is already turned on by i2c-core with ACPI domain PM. * Enable runtime PM and turn off the device. @@ -1800,9 +1796,15 @@ static int imx355_probe(struct i2c_clien pm_runtime_enable(&client->dev); pm_runtime_idle(&client->dev);
+ ret = v4l2_async_register_subdev_sensor(&imx355->sd); + if (ret < 0) + goto error_media_entity_runtime_pm; + return 0;
-error_media_entity: +error_media_entity_runtime_pm: + pm_runtime_disable(&client->dev); + pm_runtime_set_suspended(&client->dev); media_entity_cleanup(&imx355->sd.entity);
error_handler_free:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xiaolei Wang xiaolei.wang@windriver.com
commit d5362c37e1f8a40096452fc201c30e705750e687 upstream.
Free driver_override when rpmsg_remove(), otherwise the following memory leak will occur:
unreferenced object 0xffff0000d55d7080 (size 128): comm "kworker/u8:2", pid 56, jiffies 4294893188 (age 214.272s) hex dump (first 32 bytes): 72 70 6d 73 67 5f 6e 73 00 00 00 00 00 00 00 00 rpmsg_ns........ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<000000009c94c9c1>] __kmem_cache_alloc_node+0x1f8/0x320 [<000000002300d89b>] __kmalloc_node_track_caller+0x44/0x70 [<00000000228a60c3>] kstrndup+0x4c/0x90 [<0000000077158695>] driver_set_override+0xd0/0x164 [<000000003e9c4ea5>] rpmsg_register_device_override+0x98/0x170 [<000000001c0c89a8>] rpmsg_ns_register_device+0x24/0x30 [<000000008bbf8fa2>] rpmsg_probe+0x2e0/0x3ec [<00000000e65a68df>] virtio_dev_probe+0x1c0/0x280 [<00000000443331cc>] really_probe+0xbc/0x2dc [<00000000391064b1>] __driver_probe_device+0x78/0xe0 [<00000000a41c9a5b>] driver_probe_device+0xd8/0x160 [<000000009c3bd5df>] __device_attach_driver+0xb8/0x140 [<0000000043cd7614>] bus_for_each_drv+0x7c/0xd4 [<000000003b929a36>] __device_attach+0x9c/0x19c [<00000000a94e0ba8>] device_initial_probe+0x14/0x20 [<000000003c999637>] bus_probe_device+0xa0/0xac
Signed-off-by: Xiaolei Wang xiaolei.wang@windriver.com Fixes: b0b03b811963 ("rpmsg: Release rpmsg devices in backends") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231215020049.78750-1-xiaolei.wang@windriver.com Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rpmsg/virtio_rpmsg_bus.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/rpmsg/virtio_rpmsg_bus.c +++ b/drivers/rpmsg/virtio_rpmsg_bus.c @@ -378,6 +378,7 @@ static void virtio_rpmsg_release_device( struct rpmsg_device *rpdev = to_rpmsg_device(dev); struct virtio_rpmsg_channel *vch = to_virtio_rpmsg_channel(rpdev);
+ kfree(rpdev->driver_override); kfree(vch); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bingbu Cao bingbu.cao@intel.com
commit e242e9c144050ed120cf666642ba96b7c4462a4c upstream.
As the sensor device maybe accessible right after its async sub-device is registered, such as ipu-bridge will try to power up sensor by sensor's client device's runtime PM from the async notifier callback, if runtime PM is not enabled, it will fail.
So runtime PM should be ready before its async sub-device is registered and accessible by others.
Fixes: d3f863a63fe4 ("media: i2c: Add ov9734 image sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bingbu Cao bingbu.cao@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/ov9734.c | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-)
--- a/drivers/media/i2c/ov9734.c +++ b/drivers/media/i2c/ov9734.c @@ -939,6 +939,7 @@ static void ov9734_remove(struct i2c_cli media_entity_cleanup(&sd->entity); v4l2_ctrl_handler_free(sd->ctrl_handler); pm_runtime_disable(&client->dev); + pm_runtime_set_suspended(&client->dev); mutex_destroy(&ov9734->mutex); }
@@ -984,13 +985,6 @@ static int ov9734_probe(struct i2c_clien goto probe_error_v4l2_ctrl_handler_free; }
- ret = v4l2_async_register_subdev_sensor(&ov9734->sd); - if (ret < 0) { - dev_err(&client->dev, "failed to register V4L2 subdev: %d", - ret); - goto probe_error_media_entity_cleanup; - } - /* * Device is already turned on by i2c-core with ACPI domain PM. * Enable runtime PM and turn off the device. @@ -999,9 +993,18 @@ static int ov9734_probe(struct i2c_clien pm_runtime_enable(&client->dev); pm_runtime_idle(&client->dev);
+ ret = v4l2_async_register_subdev_sensor(&ov9734->sd); + if (ret < 0) { + dev_err(&client->dev, "failed to register V4L2 subdev: %d", + ret); + goto probe_error_media_entity_cleanup_pm; + } + return 0;
-probe_error_media_entity_cleanup: +probe_error_media_entity_cleanup_pm: + pm_runtime_disable(&client->dev); + pm_runtime_set_suspended(&client->dev); media_entity_cleanup(&ov9734->sd.entity);
probe_error_v4l2_ctrl_handler_free:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bingbu Cao bingbu.cao@intel.com
commit 7b0454cfd8edb3509619407c3b9f78a6d0dee1a5 upstream.
As the sensor device maybe accessible right after its async sub-device is registered, such as ipu-bridge will try to power up sensor by sensor's client device's runtime PM from the async notifier callback, if runtime PM is not enabled, it will fail.
So runtime PM should be ready before its async sub-device is registered and accessible by others.
Fixes: 7ee850546822 ("media: Add sensor driver support for the ov13b10 camera.") Cc: stable@vger.kernel.org Signed-off-by: Bingbu Cao bingbu.cao@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/ov13b10.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
--- a/drivers/media/i2c/ov13b10.c +++ b/drivers/media/i2c/ov13b10.c @@ -1536,24 +1536,27 @@ static int ov13b10_probe(struct i2c_clie goto error_handler_free; }
- ret = v4l2_async_register_subdev_sensor(&ov13b->sd); - if (ret < 0) - goto error_media_entity;
/* * Device is already turned on by i2c-core with ACPI domain PM. * Enable runtime PM and turn off the device. */ - /* Set the device's state to active if it's in D0 state. */ if (full_power) pm_runtime_set_active(&client->dev); pm_runtime_enable(&client->dev); pm_runtime_idle(&client->dev);
+ ret = v4l2_async_register_subdev_sensor(&ov13b->sd); + if (ret < 0) + goto error_media_entity_runtime_pm; + return 0;
-error_media_entity: +error_media_entity_runtime_pm: + pm_runtime_disable(&client->dev); + if (full_power) + pm_runtime_set_suspended(&client->dev); media_entity_cleanup(&ov13b->sd.entity);
error_handler_free: @@ -1576,6 +1579,7 @@ static void ov13b10_remove(struct i2c_cl ov13b10_free_controls(ov13b);
pm_runtime_disable(&client->dev); + pm_runtime_set_suspended(&client->dev); }
static DEFINE_RUNTIME_DEV_PM_OPS(ov13b10_pm_ops, ov13b10_suspend,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bingbu Cao bingbu.cao@intel.com
commit 47a78052db51b16e8045524fbf33373b58f1323b upstream.
As the sensor device maybe accessible right after its async sub-device is registered, such as ipu-bridge will try to power up sensor by sensor's client device's runtime PM from the async notifier callback, if runtime PM is not enabled, it will fail.
So runtime PM should be ready before its async sub-device is registered and accessible by others.
It also sets the runtime PM status to active as the sensor was turned on by i2c-core.
Fixes: 0827b58dabff ("media: i2c: add ov01a10 image sensor driver") Cc: stable@vger.kernel.org Signed-off-by: Bingbu Cao bingbu.cao@intel.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/ov01a10.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)
--- a/drivers/media/i2c/ov01a10.c +++ b/drivers/media/i2c/ov01a10.c @@ -907,6 +907,7 @@ static void ov01a10_remove(struct i2c_cl v4l2_ctrl_handler_free(sd->ctrl_handler);
pm_runtime_disable(&client->dev); + pm_runtime_set_suspended(&client->dev); }
static int ov01a10_probe(struct i2c_client *client) @@ -953,17 +954,26 @@ static int ov01a10_probe(struct i2c_clie goto err_media_entity_cleanup; }
+ /* + * Device is already turned on by i2c-core with ACPI domain PM. + * Enable runtime PM and turn off the device. + */ + pm_runtime_set_active(&client->dev); + pm_runtime_enable(dev); + pm_runtime_idle(dev); + ret = v4l2_async_register_subdev_sensor(&ov01a10->sd); if (ret < 0) { dev_err(dev, "Failed to register subdev: %d\n", ret); - goto err_media_entity_cleanup; + goto err_pm_disable; }
- pm_runtime_enable(dev); - pm_runtime_idle(dev); - return 0;
+err_pm_disable: + pm_runtime_disable(dev); + pm_runtime_set_suspended(&client->dev); + err_media_entity_cleanup: media_entity_cleanup(&ov01a10->sd.entity);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herve Codina herve.codina@bootlin.com
commit fc0c64154e5ddeb6f63c954735bd646ce5b8d9a4 upstream.
Running sparse (make C=1) on tsa.c raises a lot of warning such as: --- 8< --- warning: incorrect type in assignment (different address spaces) expected void *[noderef] si_regs got void [noderef] __iomem * --- 8< ---
Indeed, some variable were declared 'type *__iomem var' instead of 'type __iomem *var'.
Use the correct declaration to remove these warnings.
Fixes: 1d4ba0b81c1c ("soc: fsl: cpm1: Add support for TSA") Cc: stable@vger.kernel.org Reported-by: kernel test robot lkp@intel.com Closes: https://lore.kernel.org/oe-kbuild-all/202312051959.9YdRIYbg-lkp@intel.com/ Signed-off-by: Herve Codina herve.codina@bootlin.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Link: https://lore.kernel.org/r/20231205152116.122512-2-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/fsl/qe/tsa.c | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-)
--- a/drivers/soc/fsl/qe/tsa.c +++ b/drivers/soc/fsl/qe/tsa.c @@ -98,9 +98,9 @@ #define TSA_SIRP 0x10
struct tsa_entries_area { - void *__iomem entries_start; - void *__iomem entries_next; - void *__iomem last_entry; + void __iomem *entries_start; + void __iomem *entries_next; + void __iomem *last_entry; };
struct tsa_tdm { @@ -117,8 +117,8 @@ struct tsa_tdm {
struct tsa { struct device *dev; - void *__iomem si_regs; - void *__iomem si_ram; + void __iomem *si_regs; + void __iomem *si_ram; resource_size_t si_ram_sz; spinlock_t lock; int tdms; /* TSA_TDMx ORed */ @@ -135,27 +135,27 @@ static inline struct tsa *tsa_serial_get return container_of(tsa_serial, struct tsa, serials[tsa_serial->id]); }
-static inline void tsa_write32(void *__iomem addr, u32 val) +static inline void tsa_write32(void __iomem *addr, u32 val) { iowrite32be(val, addr); }
-static inline void tsa_write8(void *__iomem addr, u32 val) +static inline void tsa_write8(void __iomem *addr, u32 val) { iowrite8(val, addr); }
-static inline u32 tsa_read32(void *__iomem addr) +static inline u32 tsa_read32(void __iomem *addr) { return ioread32be(addr); }
-static inline void tsa_clrbits32(void *__iomem addr, u32 clr) +static inline void tsa_clrbits32(void __iomem *addr, u32 clr) { tsa_write32(addr, tsa_read32(addr) & ~clr); }
-static inline void tsa_clrsetbits32(void *__iomem addr, u32 clr, u32 set) +static inline void tsa_clrsetbits32(void __iomem *addr, u32 clr, u32 set) { tsa_write32(addr, (tsa_read32(addr) & ~clr) | set); } @@ -313,7 +313,7 @@ static u32 tsa_serial_id2csel(struct tsa static int tsa_add_entry(struct tsa *tsa, struct tsa_entries_area *area, u32 count, u32 serial_id) { - void *__iomem addr; + void __iomem *addr; u32 left; u32 val; u32 cnt;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herve Codina herve.codina@bootlin.com
commit a5ec3a21220da06bdda2e686012ca64fdb6c513d upstream.
Running sparse (make C=1) on qmc.c raises a lot of warning such as: ... warning: incorrect type in assignment (different address spaces) expected struct cpm_buf_desc [usertype] *[noderef] __iomem bd got struct cpm_buf_desc [noderef] [usertype] __iomem *txbd_free ...
Indeed, some variable were declared 'type *__iomem var' instead of 'type __iomem *var'.
Use the correct declaration to remove these warnings.
Fixes: 3178d58e0b97 ("soc: fsl: cpm1: Add support for QMC") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Link: https://lore.kernel.org/r/20231205152116.122512-3-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/fsl/qe/qmc.c | 34 +++++++++++++++++----------------- 1 file changed, 17 insertions(+), 17 deletions(-)
diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c index 92ec76c03965..3f3de1351c96 100644 --- a/drivers/soc/fsl/qe/qmc.c +++ b/drivers/soc/fsl/qe/qmc.c @@ -175,7 +175,7 @@ struct qmc_chan { struct list_head list; unsigned int id; struct qmc *qmc; - void *__iomem s_param; + void __iomem *s_param; enum qmc_mode mode; u64 tx_ts_mask; u64 rx_ts_mask; @@ -203,9 +203,9 @@ struct qmc_chan { struct qmc { struct device *dev; struct tsa_serial *tsa_serial; - void *__iomem scc_regs; - void *__iomem scc_pram; - void *__iomem dpram; + void __iomem *scc_regs; + void __iomem *scc_pram; + void __iomem *dpram; u16 scc_pram_offset; cbd_t __iomem *bd_table; dma_addr_t bd_dma_addr; @@ -218,37 +218,37 @@ struct qmc { struct qmc_chan *chans[64]; };
-static inline void qmc_write16(void *__iomem addr, u16 val) +static inline void qmc_write16(void __iomem *addr, u16 val) { iowrite16be(val, addr); }
-static inline u16 qmc_read16(void *__iomem addr) +static inline u16 qmc_read16(void __iomem *addr) { return ioread16be(addr); }
-static inline void qmc_setbits16(void *__iomem addr, u16 set) +static inline void qmc_setbits16(void __iomem *addr, u16 set) { qmc_write16(addr, qmc_read16(addr) | set); }
-static inline void qmc_clrbits16(void *__iomem addr, u16 clr) +static inline void qmc_clrbits16(void __iomem *addr, u16 clr) { qmc_write16(addr, qmc_read16(addr) & ~clr); }
-static inline void qmc_write32(void *__iomem addr, u32 val) +static inline void qmc_write32(void __iomem *addr, u32 val) { iowrite32be(val, addr); }
-static inline u32 qmc_read32(void *__iomem addr) +static inline u32 qmc_read32(void __iomem *addr) { return ioread32be(addr); }
-static inline void qmc_setbits32(void *__iomem addr, u32 set) +static inline void qmc_setbits32(void __iomem *addr, u32 set) { qmc_write32(addr, qmc_read32(addr) | set); } @@ -318,7 +318,7 @@ int qmc_chan_write_submit(struct qmc_chan *chan, dma_addr_t addr, size_t length, { struct qmc_xfer_desc *xfer_desc; unsigned long flags; - cbd_t *__iomem bd; + cbd_t __iomem *bd; u16 ctrl; int ret;
@@ -374,7 +374,7 @@ static void qmc_chan_write_done(struct qmc_chan *chan) void (*complete)(void *context); unsigned long flags; void *context; - cbd_t *__iomem bd; + cbd_t __iomem *bd; u16 ctrl;
/* @@ -425,7 +425,7 @@ int qmc_chan_read_submit(struct qmc_chan *chan, dma_addr_t addr, size_t length, { struct qmc_xfer_desc *xfer_desc; unsigned long flags; - cbd_t *__iomem bd; + cbd_t __iomem *bd; u16 ctrl; int ret;
@@ -488,7 +488,7 @@ static void qmc_chan_read_done(struct qmc_chan *chan) void (*complete)(void *context, size_t size); struct qmc_xfer_desc *xfer_desc; unsigned long flags; - cbd_t *__iomem bd; + cbd_t __iomem *bd; void *context; u16 datalen; u16 ctrl; @@ -663,7 +663,7 @@ static void qmc_chan_reset_rx(struct qmc_chan *chan) { struct qmc_xfer_desc *xfer_desc; unsigned long flags; - cbd_t *__iomem bd; + cbd_t __iomem *bd; u16 ctrl;
spin_lock_irqsave(&chan->rx_lock, flags); @@ -694,7 +694,7 @@ static void qmc_chan_reset_tx(struct qmc_chan *chan) { struct qmc_xfer_desc *xfer_desc; unsigned long flags; - cbd_t *__iomem bd; + cbd_t __iomem *bd; u16 ctrl;
spin_lock_irqsave(&chan->tx_lock, flags);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Herve Codina herve.codina@bootlin.com
commit dfe66d012af2ddfa566cf9c860b8472b412fb7e4 upstream.
The qmc_chan_reset_rx() set the is_rx_stopped flag. This leads to an inconsistent state in the following sequence. qmc_chan_stop() qmc_chan_reset() Indeed, after the qmc_chan_reset() call, the channel must still be stopped. Only a qmc_chan_start() call can move the channel from stopped state to started state.
Fix the issue removing the is_rx_stopped flag setting from qmc_chan_reset()
Fixes: 3178d58e0b97 ("soc: fsl: cpm1: Add support for QMC") Cc: stable@vger.kernel.org Signed-off-by: Herve Codina herve.codina@bootlin.com Reviewed-by: Christophe Leroy christophe.leroy@csgroup.eu Link: https://lore.kernel.org/r/20231205152116.122512-4-herve.codina@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/fsl/qe/qmc.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c index 3f3de1351c96..2312152a44b3 100644 --- a/drivers/soc/fsl/qe/qmc.c +++ b/drivers/soc/fsl/qe/qmc.c @@ -685,7 +685,6 @@ static void qmc_chan_reset_rx(struct qmc_chan *chan) qmc_read16(chan->s_param + QMC_SPE_RBASE));
chan->rx_pending = 0; - chan->is_rx_stopped = false;
spin_unlock_irqrestore(&chan->rx_lock, flags); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Krowiak akrowiak@linux.ibm.com
commit 850fb7fa8c684a4c6bf0e4b6978f4ddcc5d43d11 upstream.
The vfio_ap_mdev_filter_matrix function is called whenever a new adapter or domain is assigned to the mdev. The purpose of the function is to update the guest's AP configuration by filtering the matrix of adapters and domains assigned to the mdev. When an adapter or domain is assigned, only the APQNs associated with the APID of the new adapter or APQI of the new domain are inspected. If an APQN does not reference a queue device bound to the vfio_ap device driver, then it's APID will be filtered from the mdev's matrix when updating the guest's AP configuration.
Inspecting only the APID of the new adapter or APQI of the new domain will result in passing AP queues through to a guest that are not bound to the vfio_ap device driver under certain circumstances. Consider the following:
guest's AP configuration (all also assigned to the mdev's matrix): 14.0004 14.0005 14.0006 16.0004 16.0005 16.0006
unassign domain 4 unbind queue 16.0005 assign domain 4
When domain 4 is re-assigned, since only domain 4 will be inspected, the APQNs that will be examined will be: 14.0004 16.0004
Since both of those APQNs reference queue devices that are bound to the vfio_ap device driver, nothing will get filtered from the mdev's matrix when updating the guest's AP configuration. Consequently, queue 16.0005 will get passed through despite not being bound to the driver. This violates the linux device model requirement that a guest shall only be given access to devices bound to the device driver facilitating their pass-through.
To resolve this problem, every adapter and domain assigned to the mdev will be inspected when filtering the mdev's matrix.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Acked-by: Halil Pasic pasic@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-2-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 57 +++++++++++--------------------------- 1 file changed, 17 insertions(+), 40 deletions(-)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -671,8 +671,7 @@ static bool vfio_ap_mdev_filter_cdoms(st * Return: a boolean value indicating whether the KVM guest's APCB was changed * by the filtering or not. */ -static bool vfio_ap_mdev_filter_matrix(unsigned long *apm, unsigned long *aqm, - struct ap_matrix_mdev *matrix_mdev) +static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) { unsigned long apid, apqi, apqn; DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES); @@ -693,8 +692,8 @@ static bool vfio_ap_mdev_filter_matrix(u bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm, (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, apm, AP_DEVICES) { - for_each_set_bit_inv(apqi, aqm, AP_DOMAINS) { + for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) { + for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { /* * If the APQN is not bound to the vfio_ap device * driver, then we can't assign it to the guest's @@ -959,7 +958,6 @@ static ssize_t assign_adapter_store(stru { int ret; unsigned long apid; - DECLARE_BITMAP(apm_delta, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -988,11 +986,8 @@ static ssize_t assign_adapter_store(stru }
vfio_ap_mdev_link_adapter(matrix_mdev, apid); - memset(apm_delta, 0, sizeof(apm_delta)); - set_bit_inv(apid, apm_delta);
- if (vfio_ap_mdev_filter_matrix(apm_delta, - matrix_mdev->matrix.aqm, matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count; @@ -1168,7 +1163,6 @@ static ssize_t assign_domain_store(struc { int ret; unsigned long apqi; - DECLARE_BITMAP(aqm_delta, AP_DOMAINS); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -1197,11 +1191,8 @@ static ssize_t assign_domain_store(struc }
vfio_ap_mdev_link_domain(matrix_mdev, apqi); - memset(aqm_delta, 0, sizeof(aqm_delta)); - set_bit_inv(apqi, aqm_delta);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm, aqm_delta, - matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
ret = count; @@ -2092,9 +2083,7 @@ int vfio_ap_mdev_probe_queue(struct ap_d if (matrix_mdev) { vfio_ap_mdev_link_queue(matrix_mdev, q);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev->matrix.apm, - matrix_mdev->matrix.aqm, - matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev); } dev_set_drvdata(&apdev->device, q); @@ -2444,34 +2433,22 @@ void vfio_ap_on_cfg_changed(struct ap_co
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev) { - bool do_hotplug = false; - int filter_domains = 0; - int filter_adapters = 0; - DECLARE_BITMAP(apm, AP_DEVICES); - DECLARE_BITMAP(aqm, AP_DOMAINS); + bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock); mutex_lock(&matrix_dev->mdevs_lock);
- filter_adapters = bitmap_and(apm, matrix_mdev->matrix.apm, - matrix_mdev->apm_add, AP_DEVICES); - filter_domains = bitmap_and(aqm, matrix_mdev->matrix.aqm, - matrix_mdev->aqm_add, AP_DOMAINS); - - if (filter_adapters && filter_domains) - do_hotplug |= vfio_ap_mdev_filter_matrix(apm, aqm, matrix_mdev); - else if (filter_adapters) - do_hotplug |= - vfio_ap_mdev_filter_matrix(apm, - matrix_mdev->shadow_apcb.aqm, - matrix_mdev); - else - do_hotplug |= - vfio_ap_mdev_filter_matrix(matrix_mdev->shadow_apcb.apm, - aqm, matrix_mdev); + filter_adapters = bitmap_intersects(matrix_mdev->matrix.apm, + matrix_mdev->apm_add, AP_DEVICES); + filter_domains = bitmap_intersects(matrix_mdev->matrix.aqm, + matrix_mdev->aqm_add, AP_DOMAINS); + filter_cdoms = bitmap_intersects(matrix_mdev->matrix.adm, + matrix_mdev->adm_add, AP_DOMAINS); + + if (filter_adapters || filter_domains) + do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev);
- if (bitmap_intersects(matrix_mdev->matrix.adm, matrix_mdev->adm_add, - AP_DOMAINS)) + if (filter_cdoms) do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev);
if (do_hotplug)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Krowiak akrowiak@linux.ibm.com
commit 16fb78cbf56e42b8efb2682a4444ab59e32e7959 upstream.
While filtering the mdev matrix, it doesn't make sense - and will have unexpected results - to filter an APID from the matrix if the APID or one of the associated APQIs is not in the host's AP configuration. There are two reasons for this:
1. An adapter or domain that is not in the host's AP configuration can be assigned to the matrix; this is known as over-provisioning. Queue devices, however, are only created for adapters and domains in the host's AP configuration, so there will be no queues associated with an over-provisioned adapter or domain to filter.
2. The adapter or domain may have been externally removed from the host's configuration via an SE or HMC attached to a DPM enabled LPAR. In this case, the vfio_ap device driver would have been notified by the AP bus via the on_config_changed callback and the adapter or domain would have already been filtered.
Since the matrix_mdev->shadow_apcb.apm and matrix_mdev->shadow_apcb.aqm are copied from the mdev matrix sans the APIDs and APQIs not in the host's AP configuration, let's loop over those bitmaps instead of those assigned to the matrix.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-3-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -692,8 +692,9 @@ static bool vfio_ap_mdev_filter_matrix(s bitmap_and(matrix_mdev->shadow_apcb.aqm, matrix_mdev->matrix.aqm, (unsigned long *)matrix_dev->info.aqm, AP_DOMAINS);
- for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) { - for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { + for_each_set_bit_inv(apid, matrix_mdev->shadow_apcb.apm, AP_DEVICES) { + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, + AP_DOMAINS) { /* * If the APQN is not bound to the vfio_ap device * driver, then we can't assign it to the guest's
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Krowiak akrowiak@linux.ibm.com
commit 774d10196e648e2c0b78da817f631edfb3dfa557 upstream.
When adapters and/or domains are added to the host's AP configuration, this may result in multiple queue devices getting created and probed by the vfio_ap device driver. For each queue device probed, the matrix of adapters and domains assigned to a matrix mdev will be filtered to update the guest's APCB. If any adapters or domains get added to or removed from the APCB, the guest's AP configuration will be dynamically updated (i.e., hot plug/unplug). To dynamically update the guest's configuration, its VCPUs must be taken out of SIE for the period of time it takes to make the update. This is disruptive to the guest's operation and if there are many queues probed due to a change in the host's AP configuration, this could be troublesome. The problem is exacerbated by the fact that the 'on_scan_complete' callback also filters the mdev's matrix and updates the guest's AP configuration.
In order to reduce the potential amount of disruption to the guest that may result from a change to the host's AP configuration, let's bypass the filtering of the matrix and updating of the guest's AP configuration in the probe callback - if due to a host config change - and defer it until the 'on_scan_complete' callback is invoked after the AP bus finishes its device scan operation. This way the filtering and updating will be performed only once regardless of the number of queues added.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-4-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -2084,9 +2084,22 @@ int vfio_ap_mdev_probe_queue(struct ap_d if (matrix_mdev) { vfio_ap_mdev_link_queue(matrix_mdev, q);
+ /* + * If we're in the process of handling the adding of adapters or + * domains to the host's AP configuration, then let the + * vfio_ap device driver's on_scan_complete callback filter the + * matrix and update the guest's AP configuration after all of + * the new queue devices are probed. + */ + if (!bitmap_empty(matrix_mdev->apm_add, AP_DEVICES) || + !bitmap_empty(matrix_mdev->aqm_add, AP_DOMAINS)) + goto done; + if (vfio_ap_mdev_filter_matrix(matrix_mdev)) vfio_ap_mdev_update_guest_apcb(matrix_mdev); } + +done: dev_set_drvdata(&apdev->device, q); release_update_locks_for_mdev(matrix_mdev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Krowiak akrowiak@linux.ibm.com
commit f848cba767e59f8d5c54984b1d45451aae040d50 upstream.
When filtering the adapters from the configuration profile for a guest to create or update a guest's AP configuration, if the APID of an adapter and the APQI of a domain identify a queue device that is not bound to the vfio_ap device driver, the APID of the adapter will be filtered because an individual APQN can not be filtered due to the fact the APQNs are assigned to an AP configuration as a matrix of APIDs and APQIs. Consequently, a guest will not have access to all of the queues associated with the filtered adapter. If the queues are subsequently made available again to the guest, they should re-appear in a reset state; so, let's make sure all queues associated with an adapter unplugged from the guest are reset.
In order to identify the set of queues that need to be reset, let's allow a vfio_ap_queue object to be simultaneously stored in both a hashtable and a list: A hashtable used to store all of the queues assigned to a matrix mdev; and/or, a list used to store a subset of the queues that need to be reset. For example, when an adapter is hot unplugged from a guest, all guest queues associated with that adapter must be reset. Since that may be a subset of those assigned to the matrix mdev, they can be stored in a list that can be passed to the vfio_ap_mdev_reset_queues function.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Acked-by: Halil Pasic pasic@linux.ibm.com Fixes: 48cae940c31d ("s390/vfio-ap: refresh guest's APCB by filtering AP resources assigned to mdev") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-5-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 171 +++++++++++++++++++++++++--------- drivers/s390/crypto/vfio_ap_private.h | 3 2 files changed, 129 insertions(+), 45 deletions(-)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -32,7 +32,8 @@
#define AP_RESET_INTERVAL 20 /* Reset sleep interval (20ms) */
-static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable); +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev); +static int vfio_ap_mdev_reset_qlist(struct list_head *qlist); static struct vfio_ap_queue *vfio_ap_find_queue(int apqn); static const struct vfio_device_ops vfio_ap_matrix_dev_ops; static void vfio_ap_mdev_reset_queue(struct vfio_ap_queue *q); @@ -662,16 +663,23 @@ static bool vfio_ap_mdev_filter_cdoms(st * device driver. * * @matrix_mdev: the matrix mdev whose matrix is to be filtered. + * @apm_filtered: a 256-bit bitmap for storing the APIDs filtered from the + * guest's AP configuration that are still in the host's AP + * configuration. * * Note: If an APQN referencing a queue device that is not bound to the vfio_ap * driver, its APID will be filtered from the guest's APCB. The matrix * structure precludes filtering an individual APQN, so its APID will be - * filtered. + * filtered. Consequently, all queues associated with the adapter that + * are in the host's AP configuration must be reset. If queues are + * subsequently made available again to the guest, they should re-appear + * in a reset state * * Return: a boolean value indicating whether the KVM guest's APCB was changed * by the filtering or not. */ -static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev) +static bool vfio_ap_mdev_filter_matrix(struct ap_matrix_mdev *matrix_mdev, + unsigned long *apm_filtered) { unsigned long apid, apqi, apqn; DECLARE_BITMAP(prev_shadow_apm, AP_DEVICES); @@ -681,6 +689,7 @@ static bool vfio_ap_mdev_filter_matrix(s bitmap_copy(prev_shadow_apm, matrix_mdev->shadow_apcb.apm, AP_DEVICES); bitmap_copy(prev_shadow_aqm, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS); vfio_ap_matrix_init(&matrix_dev->info, &matrix_mdev->shadow_apcb); + bitmap_clear(apm_filtered, 0, AP_DEVICES);
/* * Copy the adapters, domains and control domains to the shadow_apcb @@ -706,8 +715,16 @@ static bool vfio_ap_mdev_filter_matrix(s apqn = AP_MKQID(apid, apqi); q = vfio_ap_mdev_get_queue(matrix_mdev, apqn); if (!q || q->reset_status.response_code) { - clear_bit_inv(apid, - matrix_mdev->shadow_apcb.apm); + clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm); + + /* + * If the adapter was previously plugged into + * the guest, let's let the caller know that + * the APID was filtered. + */ + if (test_bit_inv(apid, prev_shadow_apm)) + set_bit_inv(apid, apm_filtered); + break; } } @@ -809,7 +826,7 @@ static void vfio_ap_mdev_remove(struct m
mutex_lock(&matrix_dev->guests_lock); mutex_lock(&matrix_dev->mdevs_lock); - vfio_ap_mdev_reset_queues(&matrix_mdev->qtable); + vfio_ap_mdev_reset_queues(matrix_mdev); vfio_ap_mdev_unlink_fr_queues(matrix_mdev); list_del(&matrix_mdev->node); mutex_unlock(&matrix_dev->mdevs_lock); @@ -919,6 +936,47 @@ static void vfio_ap_mdev_link_adapter(st AP_MKQID(apid, apqi)); }
+static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev, + unsigned long *apm_reset) +{ + struct vfio_ap_queue *q, *tmpq; + struct list_head qlist; + unsigned long apid, apqi; + int apqn, ret = 0; + + if (bitmap_empty(apm_reset, AP_DEVICES)) + return 0; + + INIT_LIST_HEAD(&qlist); + + for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) { + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, + AP_DOMAINS) { + /* + * If the domain is not in the host's AP configuration, + * then resetting it will fail with response code 01 + * (APQN not valid). + */ + if (!test_bit_inv(apqi, + (unsigned long *)matrix_dev->info.aqm)) + continue; + + apqn = AP_MKQID(apid, apqi); + q = vfio_ap_mdev_get_queue(matrix_mdev, apqn); + + if (q) + list_add_tail(&q->reset_qnode, &qlist); + } + } + + ret = vfio_ap_mdev_reset_qlist(&qlist); + + list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) + list_del(&q->reset_qnode); + + return ret; +} + /** * assign_adapter_store - parses the APID from @buf and sets the * corresponding bit in the mediated matrix device's APM @@ -959,6 +1017,7 @@ static ssize_t assign_adapter_store(stru { int ret; unsigned long apid; + DECLARE_BITMAP(apm_filtered, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -988,8 +1047,10 @@ static ssize_t assign_adapter_store(stru
vfio_ap_mdev_link_adapter(matrix_mdev, apid);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) { vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apids(matrix_mdev, apm_filtered); + }
ret = count; done: @@ -1020,11 +1081,12 @@ static struct vfio_ap_queue * adapter was assigned. * @matrix_mdev: the matrix mediated device to which the adapter was assigned. * @apid: the APID of the unassigned adapter. - * @qtable: table for storing queues associated with unassigned adapter. + * @qlist: list for storing queues associated with unassigned adapter that + * need to be reset. */ static void vfio_ap_mdev_unlink_adapter(struct ap_matrix_mdev *matrix_mdev, unsigned long apid, - struct ap_queue_table *qtable) + struct list_head *qlist) { unsigned long apqi; struct vfio_ap_queue *q; @@ -1032,11 +1094,10 @@ static void vfio_ap_mdev_unlink_adapter( for_each_set_bit_inv(apqi, matrix_mdev->matrix.aqm, AP_DOMAINS) { q = vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi);
- if (q && qtable) { + if (q && qlist) { if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) - hash_add(qtable->queues, &q->mdev_qnode, - q->apqn); + list_add_tail(&q->reset_qnode, qlist); } } } @@ -1044,26 +1105,23 @@ static void vfio_ap_mdev_unlink_adapter( static void vfio_ap_mdev_hot_unplug_adapter(struct ap_matrix_mdev *matrix_mdev, unsigned long apid) { - int loop_cursor; - struct vfio_ap_queue *q; - struct ap_queue_table *qtable = kzalloc(sizeof(*qtable), GFP_KERNEL); + struct vfio_ap_queue *q, *tmpq; + struct list_head qlist;
- hash_init(qtable->queues); - vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, qtable); + INIT_LIST_HEAD(&qlist); + vfio_ap_mdev_unlink_adapter(matrix_mdev, apid, &qlist);
if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm)) { clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm); vfio_ap_mdev_update_guest_apcb(matrix_mdev); }
- vfio_ap_mdev_reset_queues(qtable); + vfio_ap_mdev_reset_qlist(&qlist);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) { + list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) { vfio_ap_unlink_mdev_fr_queue(q); - hash_del(&q->mdev_qnode); + list_del(&q->reset_qnode); } - - kfree(qtable); }
/** @@ -1164,6 +1222,7 @@ static ssize_t assign_domain_store(struc { int ret; unsigned long apqi; + DECLARE_BITMAP(apm_filtered, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev = dev_get_drvdata(dev);
mutex_lock(&ap_perms_mutex); @@ -1193,8 +1252,10 @@ static ssize_t assign_domain_store(struc
vfio_ap_mdev_link_domain(matrix_mdev, apqi);
- if (vfio_ap_mdev_filter_matrix(matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) { vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apids(matrix_mdev, apm_filtered); + }
ret = count; done: @@ -1207,7 +1268,7 @@ static DEVICE_ATTR_WO(assign_domain);
static void vfio_ap_mdev_unlink_domain(struct ap_matrix_mdev *matrix_mdev, unsigned long apqi, - struct ap_queue_table *qtable) + struct list_head *qlist) { unsigned long apid; struct vfio_ap_queue *q; @@ -1215,11 +1276,10 @@ static void vfio_ap_mdev_unlink_domain(s for_each_set_bit_inv(apid, matrix_mdev->matrix.apm, AP_DEVICES) { q = vfio_ap_unlink_apqn_fr_mdev(matrix_mdev, apid, apqi);
- if (q && qtable) { + if (q && qlist) { if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) - hash_add(qtable->queues, &q->mdev_qnode, - q->apqn); + list_add_tail(&q->reset_qnode, qlist); } } } @@ -1227,26 +1287,23 @@ static void vfio_ap_mdev_unlink_domain(s static void vfio_ap_mdev_hot_unplug_domain(struct ap_matrix_mdev *matrix_mdev, unsigned long apqi) { - int loop_cursor; - struct vfio_ap_queue *q; - struct ap_queue_table *qtable = kzalloc(sizeof(*qtable), GFP_KERNEL); + struct vfio_ap_queue *q, *tmpq; + struct list_head qlist;
- hash_init(qtable->queues); - vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, qtable); + INIT_LIST_HEAD(&qlist); + vfio_ap_mdev_unlink_domain(matrix_mdev, apqi, &qlist);
if (test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) { clear_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm); vfio_ap_mdev_update_guest_apcb(matrix_mdev); }
- vfio_ap_mdev_reset_queues(qtable); + vfio_ap_mdev_reset_qlist(&qlist);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) { + list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) { vfio_ap_unlink_mdev_fr_queue(q); - hash_del(&q->mdev_qnode); + list_del(&q->reset_qnode); } - - kfree(qtable); }
/** @@ -1601,7 +1658,7 @@ static void vfio_ap_mdev_unset_kvm(struc get_update_locks_for_kvm(kvm);
kvm_arch_crypto_clear_masks(kvm); - vfio_ap_mdev_reset_queues(&matrix_mdev->qtable); + vfio_ap_mdev_reset_queues(matrix_mdev); kvm_put_kvm(kvm); matrix_mdev->kvm = NULL;
@@ -1737,15 +1794,33 @@ static void vfio_ap_mdev_reset_queue(str } }
-static int vfio_ap_mdev_reset_queues(struct ap_queue_table *qtable) +static int vfio_ap_mdev_reset_queues(struct ap_matrix_mdev *matrix_mdev) { int ret = 0, loop_cursor; struct vfio_ap_queue *q;
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) + hash_for_each(matrix_mdev->qtable.queues, loop_cursor, q, mdev_qnode) vfio_ap_mdev_reset_queue(q);
- hash_for_each(qtable->queues, loop_cursor, q, mdev_qnode) { + hash_for_each(matrix_mdev->qtable.queues, loop_cursor, q, mdev_qnode) { + flush_work(&q->reset_work); + + if (q->reset_status.response_code) + ret = -EIO; + } + + return ret; +} + +static int vfio_ap_mdev_reset_qlist(struct list_head *qlist) +{ + int ret = 0; + struct vfio_ap_queue *q; + + list_for_each_entry(q, qlist, reset_qnode) + vfio_ap_mdev_reset_queue(q); + + list_for_each_entry(q, qlist, reset_qnode) { flush_work(&q->reset_work);
if (q->reset_status.response_code) @@ -1931,7 +2006,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct ret = vfio_ap_mdev_get_device_info(arg); break; case VFIO_DEVICE_RESET: - ret = vfio_ap_mdev_reset_queues(&matrix_mdev->qtable); + ret = vfio_ap_mdev_reset_queues(matrix_mdev); break; case VFIO_DEVICE_GET_IRQ_INFO: ret = vfio_ap_get_irq_info(arg); @@ -2063,6 +2138,7 @@ int vfio_ap_mdev_probe_queue(struct ap_d { int ret; struct vfio_ap_queue *q; + DECLARE_BITMAP(apm_filtered, AP_DEVICES); struct ap_matrix_mdev *matrix_mdev;
ret = sysfs_create_group(&apdev->device.kobj, &vfio_queue_attr_group); @@ -2095,15 +2171,17 @@ int vfio_ap_mdev_probe_queue(struct ap_d !bitmap_empty(matrix_mdev->aqm_add, AP_DOMAINS)) goto done;
- if (vfio_ap_mdev_filter_matrix(matrix_mdev)) + if (vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered)) { vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apids(matrix_mdev, apm_filtered); + } }
done: dev_set_drvdata(&apdev->device, q); release_update_locks_for_mdev(matrix_mdev);
- return 0; + return ret;
err_remove_group: sysfs_remove_group(&apdev->device.kobj, &vfio_queue_attr_group); @@ -2447,6 +2525,7 @@ void vfio_ap_on_cfg_changed(struct ap_co
static void vfio_ap_mdev_hot_plug_cfg(struct ap_matrix_mdev *matrix_mdev) { + DECLARE_BITMAP(apm_filtered, AP_DEVICES); bool filter_domains, filter_adapters, filter_cdoms, do_hotplug = false;
mutex_lock(&matrix_mdev->kvm->lock); @@ -2460,7 +2539,7 @@ static void vfio_ap_mdev_hot_plug_cfg(st matrix_mdev->adm_add, AP_DOMAINS);
if (filter_adapters || filter_domains) - do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev); + do_hotplug = vfio_ap_mdev_filter_matrix(matrix_mdev, apm_filtered);
if (filter_cdoms) do_hotplug |= vfio_ap_mdev_filter_cdoms(matrix_mdev); @@ -2468,6 +2547,8 @@ static void vfio_ap_mdev_hot_plug_cfg(st if (do_hotplug) vfio_ap_mdev_update_guest_apcb(matrix_mdev);
+ reset_queues_for_apids(matrix_mdev, apm_filtered); + mutex_unlock(&matrix_dev->mdevs_lock); mutex_unlock(&matrix_mdev->kvm->lock); } --- a/drivers/s390/crypto/vfio_ap_private.h +++ b/drivers/s390/crypto/vfio_ap_private.h @@ -133,6 +133,8 @@ struct ap_matrix_mdev { * @apqn: the APQN of the AP queue device * @saved_isc: the guest ISC registered with the GIB interface * @mdev_qnode: allows the vfio_ap_queue struct to be added to a hashtable + * @reset_qnode: allows the vfio_ap_queue struct to be added to a list of queues + * that need to be reset * @reset_status: the status from the last reset of the queue * @reset_work: work to wait for queue reset to complete */ @@ -143,6 +145,7 @@ struct vfio_ap_queue { #define VFIO_AP_ISC_INVALID 0xff unsigned char saved_isc; struct hlist_node mdev_qnode; + struct list_head reset_qnode; struct ap_queue_status reset_status; struct work_struct reset_work; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Krowiak akrowiak@linux.ibm.com
commit f009cfa466558b7dfe97f167ba1875d6f9ea4c07 upstream.
When a queue is unbound from the vfio_ap device driver, if that queue is assigned to a guest's AP configuration, its associated adapter is removed because queues are defined to a guest via a matrix of adapters and domains; so, it is not possible to remove a single queue.
If an adapter is removed from the guest's AP configuration, all associated queues must be reset to prevent leaking crypto data should any of them be assigned to a different guest or device driver. The one caveat is that if the queue is being removed because the adapter or domain has been removed from the host's AP configuration, then an attempt to reset the queue will fail with response code 01, AP-queue number not valid; so resetting these queues should be skipped.
Acked-by: Halil Pasic pasic@linux.ibm.com Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Fixes: 09d31ff78793 ("s390/vfio-ap: hot plug/unplug of AP devices when probed/removed") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-6-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 76 ++++++++++++++++++++------------------ 1 file changed, 41 insertions(+), 35 deletions(-)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -936,45 +936,45 @@ static void vfio_ap_mdev_link_adapter(st AP_MKQID(apid, apqi)); }
+static void collect_queues_to_reset(struct ap_matrix_mdev *matrix_mdev, + unsigned long apid, + struct list_head *qlist) +{ + struct vfio_ap_queue *q; + unsigned long apqi; + + for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, AP_DOMAINS) { + q = vfio_ap_mdev_get_queue(matrix_mdev, AP_MKQID(apid, apqi)); + if (q) + list_add_tail(&q->reset_qnode, qlist); + } +} + +static void reset_queues_for_apid(struct ap_matrix_mdev *matrix_mdev, + unsigned long apid) +{ + struct list_head qlist; + + INIT_LIST_HEAD(&qlist); + collect_queues_to_reset(matrix_mdev, apid, &qlist); + vfio_ap_mdev_reset_qlist(&qlist); +} + static int reset_queues_for_apids(struct ap_matrix_mdev *matrix_mdev, unsigned long *apm_reset) { - struct vfio_ap_queue *q, *tmpq; struct list_head qlist; - unsigned long apid, apqi; - int apqn, ret = 0; + unsigned long apid;
if (bitmap_empty(apm_reset, AP_DEVICES)) return 0;
INIT_LIST_HEAD(&qlist);
- for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) { - for_each_set_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm, - AP_DOMAINS) { - /* - * If the domain is not in the host's AP configuration, - * then resetting it will fail with response code 01 - * (APQN not valid). - */ - if (!test_bit_inv(apqi, - (unsigned long *)matrix_dev->info.aqm)) - continue; - - apqn = AP_MKQID(apid, apqi); - q = vfio_ap_mdev_get_queue(matrix_mdev, apqn); + for_each_set_bit_inv(apid, apm_reset, AP_DEVICES) + collect_queues_to_reset(matrix_mdev, apid, &qlist);
- if (q) - list_add_tail(&q->reset_qnode, &qlist); - } - } - - ret = vfio_ap_mdev_reset_qlist(&qlist); - - list_for_each_entry_safe(q, tmpq, &qlist, reset_qnode) - list_del(&q->reset_qnode); - - return ret; + return vfio_ap_mdev_reset_qlist(&qlist); }
/** @@ -2200,24 +2200,30 @@ void vfio_ap_mdev_remove_queue(struct ap matrix_mdev = q->matrix_mdev;
if (matrix_mdev) { - vfio_ap_unlink_queue_fr_mdev(q); - apid = AP_QID_CARD(q->apqn); apqi = AP_QID_QUEUE(q->apqn); - - /* - * If the queue is assigned to the guest's APCB, then remove - * the adapter's APID from the APCB and hot it into the guest. - */ + /* If the queue is assigned to the guest's AP configuration */ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) { + /* + * Since the queues are defined via a matrix of adapters + * and domains, it is not possible to hot unplug a + * single queue; so, let's unplug the adapter. + */ clear_bit_inv(apid, matrix_mdev->shadow_apcb.apm); vfio_ap_mdev_update_guest_apcb(matrix_mdev); + reset_queues_for_apid(matrix_mdev, apid); + goto done; } }
vfio_ap_mdev_reset_queue(q); flush_work(&q->reset_work); + +done: + if (matrix_mdev) + vfio_ap_unlink_queue_fr_mdev(q); + dev_set_drvdata(&apdev->device, NULL); kfree(q); release_update_locks_for_mdev(matrix_mdev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tony Krowiak akrowiak@linux.ibm.com
commit b9bd10c43456d16abd97b717446f51afb3b88411 upstream.
When a queue is unbound from the vfio_ap device driver, it is reset to ensure its crypto data is not leaked when it is bound to another device driver. If the queue is unbound due to the fact that the adapter or domain was removed from the host's AP configuration, then attempting to reset it will fail with response code 01 (APID not valid) getting returned from the reset command. Let's ensure that the queue is assigned to the host's configuration before resetting it.
Signed-off-by: Tony Krowiak akrowiak@linux.ibm.com Reviewed-by: "Jason J. Herne" jjherne@linux.ibm.com Reviewed-by: Halil Pasic pasic@linux.ibm.com Fixes: eeb386aeb5b7 ("s390/vfio-ap: handle config changed and scan complete notification") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115185441.31526-7-akrowiak@linux.ibm.com Signed-off-by: Alexander Gordeev agordeev@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/s390/crypto/vfio_ap_ops.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/s390/crypto/vfio_ap_ops.c +++ b/drivers/s390/crypto/vfio_ap_ops.c @@ -2198,10 +2198,10 @@ void vfio_ap_mdev_remove_queue(struct ap q = dev_get_drvdata(&apdev->device); get_update_locks_for_queue(q); matrix_mdev = q->matrix_mdev; + apid = AP_QID_CARD(q->apqn); + apqi = AP_QID_QUEUE(q->apqn);
if (matrix_mdev) { - apid = AP_QID_CARD(q->apqn); - apqi = AP_QID_QUEUE(q->apqn); /* If the queue is assigned to the guest's AP configuration */ if (test_bit_inv(apid, matrix_mdev->shadow_apcb.apm) && test_bit_inv(apqi, matrix_mdev->shadow_apcb.aqm)) { @@ -2217,8 +2217,16 @@ void vfio_ap_mdev_remove_queue(struct ap } }
- vfio_ap_mdev_reset_queue(q); - flush_work(&q->reset_work); + /* + * If the queue is not in the host's AP configuration, then resetting + * it will fail with response code 01, (APQN not valid); so, let's make + * sure it is in the host's config. + */ + if (test_bit_inv(apid, (unsigned long *)matrix_dev->info.apm) && + test_bit_inv(apqi, (unsigned long *)matrix_dev->info.aqm)) { + vfio_ap_mdev_reset_queue(q); + flush_work(&q->reset_work); + }
done: if (matrix_mdev)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
commit 78fbb92af27d0982634116c7a31065f24d092826 upstream.
syzbot complains that msg->msg_get_inq value can be uninitialized [1]
struct msghdr got many new fields recently, we should always make sure their values is zero by default.
[1] BUG: KMSAN: uninit-value in tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571 tcp_recvmsg+0x686/0xac0 net/ipv4/tcp.c:2571 inet_recvmsg+0x131/0x580 net/ipv4/af_inet.c:879 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg+0x12b/0x1e0 net/socket.c:1066 __sock_xmit+0x236/0x5c0 drivers/block/nbd.c:538 nbd_read_reply drivers/block/nbd.c:732 [inline] recv_work+0x262/0x3100 drivers/block/nbd.c:863 process_one_work kernel/workqueue.c:2627 [inline] process_scheduled_works+0x104e/0x1e70 kernel/workqueue.c:2700 worker_thread+0xf45/0x1490 kernel/workqueue.c:2781 kthread+0x3ed/0x540 kernel/kthread.c:388 ret_from_fork+0x66/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x11/0x20 arch/x86/entry/entry_64.S:242
Local variable msg created at: __sock_xmit+0x4c/0x5c0 drivers/block/nbd.c:513 nbd_read_reply drivers/block/nbd.c:732 [inline] recv_work+0x262/0x3100 drivers/block/nbd.c:863
CPU: 1 PID: 7465 Comm: kworker/u5:1 Not tainted 6.7.0-rc7-syzkaller-00041-gf016f7547aee #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 Workqueue: nbd5-recv recv_work
Fixes: f94fd25cb0aa ("tcp: pass back data left in socket after receive") Reported-by: syzbot syzkaller@googlegroups.com Signed-off-by: Eric Dumazet edumazet@google.com Cc: stable@vger.kernel.org Cc: Josef Bacik josef@toxicpanda.com Cc: Jens Axboe axboe@kernel.dk Cc: linux-block@vger.kernel.org Cc: nbd@other.debian.org Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240112132657.647112-1-edumazet@google.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/nbd.c | 6 +----- 1 file changed, 1 insertion(+), 5 deletions(-)
--- a/drivers/block/nbd.c +++ b/drivers/block/nbd.c @@ -510,7 +510,7 @@ static int __sock_xmit(struct nbd_device struct iov_iter *iter, int msg_flags, int *sent) { int result; - struct msghdr msg; + struct msghdr msg = {} ; unsigned int noreclaim_flag;
if (unlikely(!sock)) { @@ -526,10 +526,6 @@ static int __sock_xmit(struct nbd_device do { sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC; sock->sk->sk_use_task_frag = false; - msg.msg_name = NULL; - msg.msg_namelen = 0; - msg.msg_control = NULL; - msg.msg_controllen = 0; msg.msg_flags = msg_flags | MSG_NOSIGNAL;
if (send)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Serge Semin fancer.lancer@gmail.com
commit e1a9ae45736989c972a8d1c151bc390678ae6205 upstream.
max_mapnr variable is utilized in the pfn_valid() method in order to determine the upper PFN space boundary. Having it uninitialized effectively makes any PFN passed to that method invalid. That in its turn causes the kernel mm-subsystem occasion malfunctions even after the max_mapnr variable is actually properly updated. For instance, pfn_valid() is called in the init_unavailable_range() method in the framework of the calls-chain on MIPS: setup_arch() +-> paging_init() +-> free_area_init() +-> memmap_init() +-> memmap_init_zone_range() +-> init_unavailable_range()
Since pfn_valid() always returns "false" value before max_mapnr is initialized in the mem_init() method, any flatmem page-holes will be left in the poisoned/uninitialized state including the IO-memory pages. Thus any further attempts to map/remap the IO-memory by using MMU may fail. In particular it happened in my case on attempt to map the SRAM region. The kernel bootup procedure just crashed on the unhandled unaligned access bug raised in the __update_cache() method:
Unhandled kernel unaligned access[#1]: CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.7.0-rc1-XXX-dirty #2056 ... Call Trace: [<8011ef9c>] __update_cache+0x88/0x1bc [<80385944>] ioremap_page_range+0x110/0x2a4 [<80126948>] ioremap_prot+0x17c/0x1f4 [<80711b80>] __devm_ioremap+0x8c/0x120 [<80711e0c>] __devm_ioremap_resource+0xf4/0x218 [<808bf244>] sram_probe+0x4f4/0x930 [<80889d20>] platform_probe+0x68/0xec ...
Let's fix the problem by initializing the max_mapnr variable as soon as the required data is available. In particular it can be done right in the paging_init() method before free_area_init() is called since all the PFN zone boundaries have already been calculated by that time.
Cc: stable@vger.kernel.org Signed-off-by: Serge Semin fancer.lancer@gmail.com Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/mm/init.c | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-)
--- a/arch/mips/mm/init.c +++ b/arch/mips/mm/init.c @@ -422,7 +422,12 @@ void __init paging_init(void) (highend_pfn - max_low_pfn) << (PAGE_SHIFT - 10)); max_zone_pfns[ZONE_HIGHMEM] = max_low_pfn; } + + max_mapnr = highend_pfn ? highend_pfn : max_low_pfn; +#else + max_mapnr = max_low_pfn; #endif + high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT);
free_area_init(max_zone_pfns); } @@ -458,13 +463,6 @@ void __init mem_init(void) */ BUILD_BUG_ON(IS_ENABLED(CONFIG_32BIT) && (PFN_PTE_SHIFT > PAGE_SHIFT));
-#ifdef CONFIG_HIGHMEM - max_mapnr = highend_pfn ? highend_pfn : max_low_pfn; -#else - max_mapnr = max_low_pfn; -#endif - high_memory = (void *) __va(max_low_pfn << PAGE_SHIFT); - maar_init(); memblock_free_all(); setup_zero_pages(); /* Setup zeroed pages. */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Krishna chaitanya chundru quic_krichai@quicinc.com
commit eff9704f5332a13b08fbdbe0f84059c9e7051d5f upstream.
Though we do check the event ring read pointer by "is_valid_ring_ptr" to make sure it is in the buffer range, but there is another risk the pointer may be not aligned. Since we are expecting event ring elements are 128 bits(struct mhi_ring_element) aligned, an unaligned read pointer could lead to multiple issues like DoS or ring buffer memory corruption.
So add a alignment check for event ring read pointer.
Fixes: ec32332df764 ("bus: mhi: core: Sanity check values from remote device before use") cc: stable@vger.kernel.org Signed-off-by: Krishna chaitanya chundru quic_krichai@quicinc.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/20231031-alignment_check-v2-1-1441db7c5efd@quicinc... Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bus/mhi/host/main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/bus/mhi/host/main.c +++ b/drivers/bus/mhi/host/main.c @@ -268,7 +268,8 @@ static void mhi_del_ring_element(struct
static bool is_valid_ring_ptr(struct mhi_ring *ring, dma_addr_t addr) { - return addr >= ring->iommu_base && addr < ring->iommu_base + ring->len; + return addr >= ring->iommu_base && addr < ring->iommu_base + ring->len && + !(addr & (sizeof(struct mhi_ring_element) - 1)); }
int mhi_destroy_device(struct device *dev, void *data)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qiang Yu quic_qianyu@quicinc.com
commit 01bd694ac2f682fb8017e16148b928482bc8fa4b upstream.
Ensure read and write locks for the channel are not taken in succession by dropping the read lock from parse_xfer_event() such that a callback given to client can potentially queue buffers and acquire the write lock in that process. Any queueing of buffers should be done without channel read lock acquired as it can result in multiple locks and a soft lockup.
Cc: stable@vger.kernel.org # 5.7 Fixes: 1d3173a3bae7 ("bus: mhi: core: Add support for processing events from client device") Signed-off-by: Qiang Yu quic_qianyu@quicinc.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Tested-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/1702276972-41296-3-git-send-email-quic_qianyu@quic... [mani: added fixes tag and cc'ed stable] Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bus/mhi/host/main.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/bus/mhi/host/main.c +++ b/drivers/bus/mhi/host/main.c @@ -643,6 +643,8 @@ static int parse_xfer_event(struct mhi_c mhi_del_ring_element(mhi_cntrl, tre_ring); local_rp = tre_ring->rp;
+ read_unlock_bh(&mhi_chan->lock); + /* notify client */ mhi_chan->xfer_cb(mhi_chan->mhi_dev, &result);
@@ -668,6 +670,8 @@ static int parse_xfer_event(struct mhi_c kfree(buf_info->cb_buf); } } + + read_lock_bh(&mhi_chan->lock); } break; } /* CC_EOT */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bhaumik Bhatt bbhatt@codeaurora.org
commit b89b6a863dd53bc70d8e52d50f9cfaef8ef5e9c9 upstream.
Protect WP accesses such that multiple threads queueing buffers for incoming data do not race.
Meanwhile, if CONFIG_TRACE_IRQFLAGS is enabled, irq will be enabled once __local_bh_enable_ip is called as part of write_unlock_bh. Hence, let's take irqsave lock after TRE is generated to avoid running write_unlock_bh when irqsave lock is held.
Cc: stable@vger.kernel.org Fixes: 189ff97cca53 ("bus: mhi: core: Add support for data transfer") Signed-off-by: Bhaumik Bhatt bbhatt@codeaurora.org Signed-off-by: Qiang Yu quic_qianyu@quicinc.com Reviewed-by: Jeffrey Hugo quic_jhugo@quicinc.com Tested-by: Jeffrey Hugo quic_jhugo@quicinc.com Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/1702276972-41296-2-git-send-email-quic_qianyu@quic... Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/bus/mhi/host/main.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/drivers/bus/mhi/host/main.c +++ b/drivers/bus/mhi/host/main.c @@ -1127,17 +1127,15 @@ static int mhi_queue(struct mhi_device * if (unlikely(MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state))) return -EIO;
- read_lock_irqsave(&mhi_cntrl->pm_lock, flags); - ret = mhi_is_ring_full(mhi_cntrl, tre_ring); - if (unlikely(ret)) { - ret = -EAGAIN; - goto exit_unlock; - } + if (unlikely(ret)) + return -EAGAIN;
ret = mhi_gen_tre(mhi_cntrl, mhi_chan, buf_info, mflags); if (unlikely(ret)) - goto exit_unlock; + return ret; + + read_lock_irqsave(&mhi_cntrl->pm_lock, flags);
/* Packet is queued, take a usage ref to exit M3 if necessary * for host->device buffer, balanced put is done on buffer completion @@ -1157,7 +1155,6 @@ static int mhi_queue(struct mhi_device * if (dir == DMA_FROM_DEVICE) mhi_cntrl->runtime_put(mhi_cntrl);
-exit_unlock: read_unlock_irqrestore(&mhi_cntrl->pm_lock, flags);
return ret; @@ -1209,6 +1206,9 @@ int mhi_gen_tre(struct mhi_controller *m int eot, eob, chain, bei; int ret;
+ /* Protect accesses for reading and incrementing WP */ + write_lock_bh(&mhi_chan->lock); + buf_ring = &mhi_chan->buf_ring; tre_ring = &mhi_chan->tre_ring;
@@ -1226,8 +1226,10 @@ int mhi_gen_tre(struct mhi_controller *m
if (!info->pre_mapped) { ret = mhi_cntrl->map_single(mhi_cntrl, buf_info); - if (ret) + if (ret) { + write_unlock_bh(&mhi_chan->lock); return ret; + } }
eob = !!(flags & MHI_EOB); @@ -1244,6 +1246,8 @@ int mhi_gen_tre(struct mhi_controller *m mhi_add_ring_element(mhi_cntrl, tre_ring); mhi_add_ring_element(mhi_cntrl, buf_ring);
+ write_unlock_bh(&mhi_chan->lock); + return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 735ae74f73e55c191d48689bd11ff4a06ea0508f upstream.
When running with narrow firmware (64-bit kernel using a 32-bit firmware), extend PDC addresses into the 0xfffffff0.00000000 region instead of the 0xf0f0f0f0.00000000 region.
This fixes the power button on the C3700 machine in qemu (64-bit CPU with 32-bit firmware), and my assumption is that the previous code was really never used (because most 64-bit machines have a 64-bit firmware), or it just worked on very old machines because they may only decode 40-bit of virtual addresses.
Cc: stable@vger.kernel.org Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/parisc/kernel/firmware.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/parisc/kernel/firmware.c +++ b/arch/parisc/kernel/firmware.c @@ -123,10 +123,10 @@ static unsigned long f_extend(unsigned l #ifdef CONFIG_64BIT if(unlikely(parisc_narrow_firmware)) { if((address & 0xff000000) == 0xf0000000) - return 0xf0f0f0f000000000UL | (u32)address; + return (0xfffffff0UL << 32) | (u32)address;
if((address & 0xf0000000) == 0xf0000000) - return 0xffffffff00000000UL | (u32)address; + return (0xffffffffUL << 32) | (u32)address; } #endif return address;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Helge Deller deller@gmx.de
commit 6472036581f947109b20664121db1d143e916f0b upstream.
Make sure to start the kthread to check the power button on qemu as well if the power button address was provided. This fixes the qemu built-in system_powerdown runtime command.
Fixes: d0c219472980 ("parisc/power: Add power soft-off when running on qemu") Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org # v6.0+ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/power.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/parisc/power.c +++ b/drivers/parisc/power.c @@ -238,7 +238,7 @@ static int __init power_init(void) if (running_on_qemu && soft_power_reg) register_sys_off_handler(SYS_OFF_MODE_POWER_OFF, SYS_OFF_PRIO_DEFAULT, qemu_power_off, (void *)soft_power_reg); - else + if (!running_on_qemu || soft_power_reg) power_task = kthread_run(kpowerswd, (void*)soft_power_reg, KTHREAD_NAME); if (IS_ERR(power_task)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andrejs Cainikovs andrejs.cainikovs@toradex.com
commit b76bbf835d8945080b22b52fc1e6f41cde06865d upstream.
Newer variants of Ixora boards require a power-up delay when powering up the CAN transceiver of up to 1ms.
Cc: stable@vger.kernel.org Signed-off-by: Andrejs Cainikovs andrejs.cainikovs@toradex.com Signed-off-by: Shawn Guo shawnguo@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/nxp/imx/imx6q-apalis-ixora-v1.2.dts | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/arm/boot/dts/nxp/imx/imx6q-apalis-ixora-v1.2.dts +++ b/arch/arm/boot/dts/nxp/imx/imx6q-apalis-ixora-v1.2.dts @@ -76,6 +76,7 @@ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enable_can1_power>; regulator-name = "can1_supply"; + startup-delay-us = <1000>; };
reg_can2_supply: regulator-can2-supply { @@ -85,6 +86,7 @@ pinctrl-names = "default"; pinctrl-0 = <&pinctrl_enable_can2_power>; regulator-name = "can2_supply"; + startup-delay-us = <1000>; }; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 663affdb12b3e26c77d103327cf27de720c8117e upstream.
The sc8280xp Display Port PHYs can be used in either DP or eDP mode and this is configured using the devicetree compatible string which defaults to DP mode in the SoC dtsi.
Override the default compatible string for the CRD eDP PHY node so that the eDP settings are used.
Fixes: 4a883a8d80b5 ("arm64: dts: qcom: sc8280xp-crd: Enable EDP") Cc: stable@vger.kernel.org # 6.3 Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20231016080658.6667-1-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc8280xp-crd.dts | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts index e4861c61a65b..ffc4406422ae 100644 --- a/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts +++ b/arch/arm64/boot/dts/qcom/sc8280xp-crd.dts @@ -458,6 +458,8 @@ mdss0_dp3_out: endpoint { };
&mdss0_dp3_phy { + compatible = "qcom,sc8280xp-edp-phy"; + vdda-phy-supply = <&vreg_l6b>; vdda-pll-supply = <&vreg_l3b>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit d0ec3c4c11c3b30e1f2d344973b2a7bf0f986734 upstream.
The DP/DM wakeup interrupts are edge triggered and which edge to trigger on depends on use-case and whether a Low speed or Full/High speed device is connected.
Fixes: fea4b41022f3 ("ARM: dts: qcom: sdx55: Add USB3 and PHY support") Cc: stable@vger.kernel.org # 5.12 Cc: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20231120164331.8116-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-sdx55.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi +++ b/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi @@ -594,8 +594,8 @@
interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 158 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 157 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 158 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 157 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Paul Cercueil paul@crapouillou.net
commit 84228d5e29dbc7a6be51e221000e1d122125826c upstream.
The kernel hangs for a good 12 seconds without any info being printed to dmesg, very early in the boot process, if this regulator is not enabled.
Force-enable it to work around this issue, until we know more about the underlying problem.
Signed-off-by: Paul Cercueil paul@crapouillou.net Fixes: 8620cc2f99b7 ("ARM: dts: exynos: Add devicetree file for the Galaxy S2") Cc: stable@vger.kernel.org # v5.8+ Link: https://lore.kernel.org/r/20231206221556.15348-2-paul@crapouillou.net Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/samsung/exynos4210-i9100.dts | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/arch/arm/boot/dts/samsung/exynos4210-i9100.dts +++ b/arch/arm/boot/dts/samsung/exynos4210-i9100.dts @@ -527,6 +527,14 @@ regulator-name = "VT_CAM_1.8V"; regulator-min-microvolt = <1800000>; regulator-max-microvolt = <1800000>; + + /* + * Force-enable this regulator; otherwise the + * kernel hangs very early in the boot process + * for about 12 seconds, without apparent + * reason. + */ + regulator-always-on; };
vcclcd_reg: LDO13 {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit cc25bd06c16aa582596a058d375b2e3133f79b93 upstream.
The Qualcomm PDC interrupt controller binding expects two cells in interrupt specifiers.
Fixes: 9d038b2e62de ("ARM: dts: qcom: Add SDX55 platform and MTP board support") Cc: stable@vger.kernel.org # 5.12 Cc: Manivannan Sadhasivam mani@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/20231213173131.29436-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-sdx55.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi +++ b/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi @@ -619,7 +619,7 @@ compatible = "qcom,sdx55-pdc", "qcom,pdc"; reg = <0x0b210000 0x30000>; qcom,pdc-ranges = <0 179 52>; - #interrupt-cells = <3>; + #interrupt-cells = <2>; interrupt-parent = <&intc>; interrupt-controller; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cixi Geng cixi.geng1@unisoc.com
commit 2da4f4a7b003441b80f0f12d8a216590f652a40f upstream.
The UMS512 Socs have 8 cores contains 6 a55 and 2 a75. modify the cpu nodes to correct information.
Fixes: 2b4881839a39 ("arm64: dts: sprd: Add support for Unisoc's UMS512") Cc: stable@vger.kernel.org Signed-off-by: Cixi Geng cixi.geng1@unisoc.com Link: https://lore.kernel.org/r/20230711162346.5978-1-cixi.geng@linux.dev Signed-off-by: Chunyan Zhang chunyan.zhang@unisoc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/sprd/ums512.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/sprd/ums512.dtsi b/arch/arm64/boot/dts/sprd/ums512.dtsi index 024be594c47d..97ac550af2f1 100644 --- a/arch/arm64/boot/dts/sprd/ums512.dtsi +++ b/arch/arm64/boot/dts/sprd/ums512.dtsi @@ -96,7 +96,7 @@ CPU5: cpu@500 {
CPU6: cpu@600 { device_type = "cpu"; - compatible = "arm,cortex-a55"; + compatible = "arm,cortex-a75"; reg = <0x0 0x600>; enable-method = "psci"; cpu-idle-states = <&CORE_PD>; @@ -104,7 +104,7 @@ CPU6: cpu@600 {
CPU7: cpu@700 { device_type = "cpu"; - compatible = "arm,cortex-a55"; + compatible = "arm,cortex-a75"; reg = <0x0 0x700>; enable-method = "psci"; cpu-idle-states = <&CORE_PD>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tianling Shen cnsztl@gmail.com
commit fc5a80a432607d05e85bba37971712405f75c546 upstream.
The default strength is not enough to provide stable connection under 3.3v LDO voltage.
Fixes: 387b3bbac5ea ("arm64: dts: rockchip: Add Xunlong OrangePi R1 Plus LTS") Cc: stable@vger.kernel.org # 6.6+ Signed-off-by: Tianling Shen cnsztl@gmail.com Link: https://lore.kernel.org/r/20231216040723.17864-1-cnsztl@gmail.com Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts index 5d7d567283e5..4237f2ee8fee 100644 --- a/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-orangepi-r1-plus-lts.dts @@ -26,9 +26,11 @@ yt8531c: ethernet-phy@0 { compatible = "ethernet-phy-ieee802.3-c22"; reg = <0>;
+ motorcomm,auto-sleep-disabled; motorcomm,clk-out-frequency-hz = <125000000>; motorcomm,keep-pll-enabled; - motorcomm,auto-sleep-disabled; + motorcomm,rx-clk-drv-microamp = <5020>; + motorcomm,rx-data-drv-microamp = <5020>;
pinctrl-0 = <ð_phy_reset_pin>; pinctrl-names = "default";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sam Edwards cfsworks@gmail.com
commit 44de8996ed5a10f08f2fe947182da6535edcfae5 upstream.
The QoS blocks saved/restored when toggling the PD_USB power domain are clocked by ACLK_USB. Attempting to access these memory regions without that clock running will result in an indefinite CPU stall.
The PD_USB node wasn't specifying this clock dependency, resulting in hangs when trying to toggle the power domain (either on or off), unless we get "lucky" and have ACLK_USB running for another reason at the time. This "luck" can result from the bootloader leaving USB powered/clocked, and if no built-in driver wants USB, Linux will disable the unused PD+CLK on boot when {pd,clk}_ignore_unused aren't given. This can also be unlucky because the two cleanup tasks run in parallel and race: if the CLK is disabled first, the PD deactivation stalls the boot. In any case, the PD cannot then be reenabled (if e.g. the driver loads later) once the clock has been stopped.
Fix this by specifying a dependency on ACLK_USB, instead of only ACLK_USB_ROOT. The child-parent relationship means the former implies the latter anyway.
Fixes: c9211fa2602b8 ("arm64: dts: rockchip: Add base DT for rk3588 SoC") Cc: stable@vger.kernel.org Signed-off-by: Sam Edwards CFSworks@gmail.com Link: https://lore.kernel.org/r/20231216021019.1543811-1-CFSworks@gmail.com [changed to only include the missing clock, not dropping the root-clocks] Signed-off-by: Heiko Stuebner heiko@sntech.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/rockchip/rk3588s.dtsi | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/boot/dts/rockchip/rk3588s.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3588s.dtsi @@ -890,6 +890,7 @@ reg = <RK3588_PD_USB>; clocks = <&cru PCLK_PHP_ROOT>, <&cru ACLK_USB_ROOT>, + <&cru ACLK_USB>, <&cru HCLK_USB_ROOT>, <&cru HCLK_HOST0>, <&cru HCLK_HOST_ARB0>,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Gerhold stephan@gerhold.net
commit 7c45b6ddbcff01f9934d11802010cfeb0879e693 upstream.
The blsp_dma controller is shared between the different subsystems, which is why it is already initialized by the firmware. We should not reinitialize it from Linux to avoid potential other users of the DMA engine to misbehave.
In mainline this can be described using the "qcom,controlled-remotely" property. In the downstream/vendor kernel from Qualcomm there is an opposite "qcom,managed-locally" property. This property is *not* set for the qcom,sps-dma@7884000 [1] so adding "qcom,controlled-remotely" upstream matches the behavior of the downstream/vendor kernel.
Adding this seems to fix some weird issues with UART where both input/output becomes garbled with certain obscure firmware versions on some devices.
[1]: https://git.codelinaro.org/clo/la/kernel/msm-3.10/-/blob/LA.BR.1.2.9.1-02310...
Cc: stable@vger.kernel.org # 6.5 Fixes: a0e5fb103150 ("arm64: dts: qcom: Add msm8916 BLSP device nodes") Signed-off-by: Stephan Gerhold stephan@gerhold.net Reviewed-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Link: https://lore.kernel.org/r/20231204-msm8916-blsp-dma-remote-v1-1-3e49c8838c8d... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/msm8916.dtsi | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/boot/dts/qcom/msm8916.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8916.dtsi @@ -2085,6 +2085,7 @@ clock-names = "bam_clk"; #dma-cells = <1>; qcom,ee = <0>; + qcom,controlled-remotely; };
blsp_uart1: serial@78af000 {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Gerhold stephan@gerhold.net
commit 4bbda9421f316efdaef5dbf642e24925ef7de130 upstream.
The blsp_dma controller is shared between the different subsystems, which is why it is already initialized by the firmware. We should not reinitialize it from Linux to avoid potential other users of the DMA engine to misbehave.
In mainline this can be described using the "qcom,controlled-remotely" property. In the downstream/vendor kernel from Qualcomm there is an opposite "qcom,managed-locally" property. This property is *not* set for the qcom,sps-dma@7884000 [1] so adding "qcom,controlled-remotely" upstream matches the behavior of the downstream/vendor kernel.
Adding this seems to fix some weird issues with UART where both input/output becomes garbled with certain obscure firmware versions on some devices.
[1]: https://git.codelinaro.org/clo/la/kernel/msm-3.10/-/blob/LA.BR.1.2.9.1-02310...
Cc: stable@vger.kernel.org # 6.5 Fixes: 61550c6c156c ("arm64: dts: qcom: Add msm8939 SoC") Signed-off-by: Stephan Gerhold stephan@gerhold.net Reviewed-by: Bryan O'Donoghue bryan.odonoghue@linaro.org Link: https://lore.kernel.org/r/20231204-msm8916-blsp-dma-remote-v1-2-3e49c8838c8d... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/msm8939.dtsi | 1 + 1 file changed, 1 insertion(+)
--- a/arch/arm64/boot/dts/qcom/msm8939.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8939.dtsi @@ -1661,6 +1661,7 @@ clock-names = "bam_clk"; #dma-cells = <1>; qcom,ee = <0>; + qcom,controlled-remotely; };
blsp_uart1: serial@78af000 {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 9b956999bf725fd62613f719c3178fdbee6e5f47 upstream.
The DP/DM wakeup interrupts are edge triggered and which edge to trigger on depends on use-case and whether a Low speed or Full/High speed device is connected.
Fixes: 0b766e7fe5a2 ("arm64: dts: qcom: sc7180: Add USB related nodes") Cc: stable@vger.kernel.org # 5.10 Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20231120164331.8116-4-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc7180.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sc7180.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7180.dtsi @@ -2978,8 +2978,8 @@
interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, <&pdc 6 IRQ_TYPE_LEVEL_HIGH>, - <&pdc 8 IRQ_TYPE_LEVEL_HIGH>, - <&pdc 9 IRQ_TYPE_LEVEL_HIGH>; + <&pdc 8 IRQ_TYPE_EDGE_BOTH>, + <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 84ad9ac8d9ca29033d589e79a991866b38e23b85 upstream.
The DP/DM wakeup interrupts are edge triggered and which edge to trigger on depends on use-case and whether a Low speed or Full/High speed device is connected.
Fixes: ca4db2b538a1 ("arm64: dts: qcom: sdm845: Add USB-related nodes") Cc: stable@vger.kernel.org # 4.20 Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20231120164331.8116-9-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -4086,8 +4086,8 @@
interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 489 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
@@ -4137,8 +4137,8 @@
interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 490 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 491 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 490 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 491 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit de3b3de30999106549da4df88a7963d0ac02b91e upstream.
The DP/DM wakeup interrupts are edge triggered and which edge to trigger on depends on use-case and whether a Low speed or Full/High speed device is connected.
Fixes: 07c8ded6e373 ("arm64: dts: qcom: add sdm670 and pixel 3a device trees") Cc: stable@vger.kernel.org # 6.2 Cc: Richard Acayan mailingradian@gmail.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Acked-by: Richard Acayan mailingradian@gmail.com Link: https://lore.kernel.org/r/20231120164331.8116-8-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm670.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sdm670.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm670.dtsi @@ -1297,8 +1297,8 @@
interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 489 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 54524b6987d1fffe64cbf3dded1b2fa6b903edf9 upstream.
The DP/DM wakeup interrupts are edge triggered and which edge to trigger on depends on use-case and whether a Low speed or Full/High speed device is connected.
Fixes: 0c9dde0d2015 ("arm64: dts: qcom: sm8150: Add secondary USB and PHY nodes") Fixes: b33d2868e8d3 ("arm64: dts: qcom: sm8150: Add USB and PHY device nodes") Cc: stable@vger.kernel.org # 5.10 Cc: Jonathan Marek jonathan@marek.ca Cc: Jack Pham quic_jackp@quicinc.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Jack Pham quic_jackp@quicinc.com Link: https://lore.kernel.org/r/20231120164331.8116-11-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sm8150.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sm8150.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi @@ -3594,8 +3594,8 @@
interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 489 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
@@ -3647,8 +3647,8 @@
interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 490 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 491 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 490 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 491 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 0dc0f6da3d43da8d2297105663e51ecb01b6f790 upstream.
The DP/DM wakeup interrupts are edge triggered and which edge to trigger on depends on use-case and whether a Low speed or Full/High speed device is connected.
Fixes: b080f53a8f44 ("arm64: dts: qcom: sc8180x: Add remoteprocs, wifi and usb nodes") Cc: stable@vger.kernel.org # 6.5 Cc: Vinod Koul vkoul@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20231120164331.8116-7-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc8180x.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sc8180x.dtsi +++ b/arch/arm64/boot/dts/qcom/sc8180x.dtsi @@ -2562,8 +2562,8 @@ reg = <0 0x0a6f8800 0 0x400>; interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 489 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", @@ -2636,8 +2636,8 @@ power-domains = <&gcc USB30_SEC_GDSC>; interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, <GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 490 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 491 IRQ_TYPE_LEVEL_HIGH>; + <GIC_SPI 490 IRQ_TYPE_EDGE_BOTH>, + <GIC_SPI 491 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit c34199d967a946e55381404fa949382691737521 upstream.
A recent cleanup reordering the usb_1 wakeup interrupts inadvertently switched the DP and SuperSpeed interrupt trigger types.
Fixes: 4a7ffc10d195 ("arm64: dts: qcom: align DWC3 USB interrupts with DT schema") Cc: stable@vger.kernel.org # 5.19 Cc: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Link: https://lore.kernel.org/r/20231120164331.8116-5-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc7280.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sc7280.dtsi +++ b/arch/arm64/boot/dts/qcom/sc7280.dtsi @@ -3668,9 +3668,9 @@ assigned-clock-rates = <19200000>, <200000000>;
interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <&pdc 14 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 14 IRQ_TYPE_EDGE_BOTH>, <&pdc 15 IRQ_TYPE_EDGE_BOTH>, - <&pdc 17 IRQ_TYPE_EDGE_BOTH>; + <&pdc 17 IRQ_TYPE_LEVEL_HIGH>; interrupt-names = "hs_phy_irq", "dp_hs_phy_irq", "dm_hs_phy_irq",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Stephan Gerhold stephan@gerhold.net
commit cc1ec484f2d0f464ad11b56fe3de2589c23f73ec upstream.
Add the missing vio-supply to all usages of the AW2013 LED controller to ensure that the regulator needed for pull-up of the interrupt and I2C lines is really turned on. While this seems to have worked fine so far some of these regulators are not guaranteed to be always-on. For example, pm8916_l6 is typically turned off together with the display if there aren't any other devices (e.g. sensors) keeping it always-on.
Cc: stable@vger.kernel.org # 6.6 Signed-off-by: Stephan Gerhold stephan@gerhold.net Link: https://lore.kernel.org/r/20231204-qcom-aw2013-vio-v1-1-5d264bb5c0b2@gerhold... Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/msm8916-longcheer-l8150.dts | 1 + arch/arm64/boot/dts/qcom/msm8916-wingtech-wt88047.dts | 1 + arch/arm64/boot/dts/qcom/msm8953-xiaomi-mido.dts | 1 + arch/arm64/boot/dts/qcom/msm8953-xiaomi-tissot.dts | 1 + arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts | 1 + 5 files changed, 5 insertions(+)
--- a/arch/arm64/boot/dts/qcom/msm8916-longcheer-l8150.dts +++ b/arch/arm64/boot/dts/qcom/msm8916-longcheer-l8150.dts @@ -88,6 +88,7 @@ #size-cells = <0>;
vcc-supply = <&pm8916_l17>; + vio-supply = <&pm8916_l6>;
led@0 { reg = <0>; --- a/arch/arm64/boot/dts/qcom/msm8916-wingtech-wt88047.dts +++ b/arch/arm64/boot/dts/qcom/msm8916-wingtech-wt88047.dts @@ -118,6 +118,7 @@ #size-cells = <0>;
vcc-supply = <&pm8916_l16>; + vio-supply = <&pm8916_l5>;
led@0 { reg = <0>; --- a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-mido.dts +++ b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-mido.dts @@ -111,6 +111,7 @@ reg = <0x45>;
vcc-supply = <&pm8953_l10>; + vio-supply = <&pm8953_l5>;
#address-cells = <1>; #size-cells = <0>; --- a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-tissot.dts +++ b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-tissot.dts @@ -104,6 +104,7 @@ reg = <0x45>;
vcc-supply = <&pm8953_l10>; + vio-supply = <&pm8953_l5>;
#address-cells = <1>; #size-cells = <0>; --- a/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts +++ b/arch/arm64/boot/dts/qcom/msm8953-xiaomi-vince.dts @@ -113,6 +113,7 @@ reg = <0x45>;
vcc-supply = <&pm8953_l10>; + vio-supply = <&pm8953_l5>;
#address-cells = <1>; #size-cells = <0>;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit de95f139394a5ed82270f005bc441d2e7c1e51b7 upstream.
The USB DP/DM HS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states and to be able to detect disconnect events, which requires triggering on falling edges.
A recent commit updated the trigger type but failed to change the interrupt provider as required. This leads to the current Linux driver failing to probe instead of printing an error during suspend and USB wakeup not working as intended.
Fixes: d0ec3c4c11c3 ("ARM: dts: qcom: sdx55: fix USB wakeup interrupt types") Fixes: fea4b41022f3 ("ARM: dts: qcom: sdx55: Add USB3 and PHY support") Cc: stable@vger.kernel.org # 5.12 Cc: Manivannan Sadhasivam mani@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/20231213173131.29436-3-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-sdx55.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi +++ b/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi @@ -592,10 +592,10 @@ <&gcc GCC_USB30_MASTER_CLK>; assigned-clock-rates = <19200000>, <200000000>;
- interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 158 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 157 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 11 IRQ_TYPE_EDGE_BOTH>, + <&pdc 10 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 204f9ed4bad6293933179517624143b8f412347c upstream.
The USB DP/DM HS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states and to be able to detect disconnect events, which requires triggering on falling edges.
A recent commit updated the trigger type but failed to change the interrupt provider as required. This leads to the current Linux driver failing to probe instead of printing an error during suspend and USB wakeup not working as intended.
Fixes: 84ad9ac8d9ca ("arm64: dts: qcom: sdm845: fix USB wakeup interrupt types") Fixes: ca4db2b538a1 ("arm64: dts: qcom: sdm845: Add USB-related nodes") Cc: stable@vger.kernel.org # 4.20 Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231213173403.29544-3-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -4084,10 +4084,10 @@ <&gcc GCC_USB30_PRIM_MASTER_CLK>; assigned-clock-rates = <19200000>, <150000000>;
- interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc_intc 8 IRQ_TYPE_EDGE_BOTH>, + <&pdc_intc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
@@ -4135,10 +4135,10 @@ <&gcc GCC_USB30_SEC_MASTER_CLK>; assigned-clock-rates = <19200000>, <150000000>;
- interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 490 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 491 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, + <&pdc_intc 10 IRQ_TYPE_EDGE_BOTH>, + <&pdc_intc 11 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 971f5d8b0618d09db75184ddd8cca0767514db5d upstream.
The USB SS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states.
Fixes: ca4db2b538a1 ("arm64: dts: qcom: sdm845: Add USB-related nodes") Cc: stable@vger.kernel.org # 4.20 Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231213173403.29544-4-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -4085,7 +4085,7 @@ assigned-clock-rates = <19200000>, <150000000>;
interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc_intc 6 IRQ_TYPE_LEVEL_HIGH>, <&pdc_intc 8 IRQ_TYPE_EDGE_BOTH>, <&pdc_intc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", @@ -4136,7 +4136,7 @@ assigned-clock-rates = <19200000>, <150000000>;
interrupts-extended = <&intc GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, + <&pdc_intc 7 IRQ_TYPE_LEVEL_HIGH>, <&pdc_intc 10 IRQ_TYPE_EDGE_BOTH>, <&pdc_intc 11 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 134de5e831775e8b178db9b131c1d3769a766982 upstream.
The USB DP/DM HS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states and to be able to detect disconnect events, which requires triggering on falling edges.
A recent commit updated the trigger type but failed to change the interrupt provider as required. This leads to the current Linux driver failing to probe instead of printing an error during suspend and USB wakeup not working as intended.
Fixes: 54524b6987d1 ("arm64: dts: qcom: sm8150: fix USB wakeup interrupt types") Fixes: 0c9dde0d2015 ("arm64: dts: qcom: sm8150: Add secondary USB and PHY nodes") Fixes: b33d2868e8d3 ("arm64: dts: qcom: sm8150: Add USB and PHY device nodes") Cc: stable@vger.kernel.org # 5.10 Cc: Jack Pham quic_jackp@quicinc.com Cc: Jonathan Marek jonathan@marek.ca Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231213173403.29544-5-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sm8150.dtsi | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sm8150.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi @@ -3592,10 +3592,10 @@ <&gcc GCC_USB30_PRIM_MASTER_CLK>; assigned-clock-rates = <19200000>, <200000000>;
- interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 8 IRQ_TYPE_EDGE_BOTH>, + <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
@@ -3645,10 +3645,10 @@ <&gcc GCC_USB30_SEC_MASTER_CLK>; assigned-clock-rates = <19200000>, <200000000>;
- interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 490 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 491 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 10 IRQ_TYPE_EDGE_BOTH>, + <&pdc 11 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit cc4e1da491b84ca05339a19893884cda78f74aef upstream.
The USB SS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states.
Fixes: 0c9dde0d2015 ("arm64: dts: qcom: sm8150: Add secondary USB and PHY nodes") Fixes: b33d2868e8d3 ("arm64: dts: qcom: sm8150: Add USB and PHY device nodes") Cc: stable@vger.kernel.org # 5.10 Cc: Jack Pham quic_jackp@quicinc.com Cc: Jonathan Marek jonathan@marek.ca Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231213173403.29544-6-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sm8150.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sm8150.dtsi +++ b/arch/arm64/boot/dts/qcom/sm8150.dtsi @@ -3593,7 +3593,7 @@ assigned-clock-rates = <19200000>, <200000000>;
interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 6 IRQ_TYPE_LEVEL_HIGH>, <&pdc 8 IRQ_TYPE_EDGE_BOTH>, <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", @@ -3646,7 +3646,7 @@ assigned-clock-rates = <19200000>, <200000000>;
interrupts-extended = <&intc GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 7 IRQ_TYPE_LEVEL_HIGH>, <&pdc 10 IRQ_TYPE_EDGE_BOTH>, <&pdc 11 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 687d402bb350b392fa330e9d9d1b917777ee9ed1 upstream.
The USB DP/DM HS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states and to be able to detect disconnect events, which requires triggering on falling edges.
A recent commit updated the trigger type but failed to change the interrupt provider as required. This leads to the current Linux driver failing to probe instead of printing an error during suspend and USB wakeup not working as intended.
Fixes: 0dc0f6da3d43 ("arm64: dts: qcom: sc8180x: fix USB wakeup interrupt types") Fixes: b080f53a8f44 ("arm64: dts: qcom: sc8180x: Add remoteprocs, wifi and usb nodes") Cc: stable@vger.kernel.org # 6.5 Cc: Vinod Koul vkoul@kernel.org Reported-by: Konrad Dybcio konrad.dybcio@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Tested-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231213173403.29544-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc8180x.dtsi | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sc8180x.dtsi +++ b/arch/arm64/boot/dts/qcom/sc8180x.dtsi @@ -2560,10 +2560,10 @@ usb_prim: usb@a6f8800 { compatible = "qcom,sc8180x-dwc3", "qcom,dwc3"; reg = <0 0x0a6f8800 0 0x400>; - interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 8 IRQ_TYPE_EDGE_BOTH>, + <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", @@ -2634,10 +2634,10 @@ "xo"; resets = <&gcc GCC_USB30_SEC_BCR>; power-domains = <&gcc USB30_SEC_GDSC>; - interrupts = <GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 490 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 491 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 10 IRQ_TYPE_EDGE_BOTH>, + <&pdc 11 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 0afa885d42d05d30161ab8eab1ebacd993edb82b upstream.
The USB SS PHY interrupt needs to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states.
Fixes: b080f53a8f44 ("arm64: dts: qcom: sc8180x: Add remoteprocs, wifi and usb nodes") Cc: stable@vger.kernel.org # 6.5 Cc: Vinod Koul vkoul@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Link: https://lore.kernel.org/r/20231214074319.11023-4-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sc8180x.dtsi | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sc8180x.dtsi +++ b/arch/arm64/boot/dts/qcom/sc8180x.dtsi @@ -2561,7 +2561,7 @@ compatible = "qcom,sc8180x-dwc3", "qcom,dwc3"; reg = <0 0x0a6f8800 0 0x400>; interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 6 IRQ_TYPE_LEVEL_HIGH>, <&pdc 8 IRQ_TYPE_EDGE_BOTH>, <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", @@ -2635,7 +2635,7 @@ resets = <&gcc GCC_USB30_SEC_BCR>; power-domains = <&gcc USB30_SEC_GDSC>; interrupts-extended = <&intc GIC_SPI 136 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 487 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 7 IRQ_TYPE_LEVEL_HIGH>, <&pdc 10 IRQ_TYPE_EDGE_BOTH>, <&pdc 11 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit c42d12ea105f67b0f137f1e52d5c59d13fe12b1f upstream.
The USB DP/DM HS PHY interrupts need to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states and to be able to detect disconnect events, which requires triggering on falling edges.
A recent commit updated the trigger type but failed to change the interrupt provider as required. This leads to the current Linux driver failing to probe instead of printing an error during suspend and USB wakeup not working as intended.
Fixes: de3b3de30999 ("arm64: dts: qcom: sdm670: fix USB wakeup interrupt types") Fixes: 07c8ded6e373 ("arm64: dts: qcom: add sdm670 and pixel 3a device trees") Cc: stable@vger.kernel.org # 6.2 Cc: Richard Acayan mailingradian@gmail.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Tested-by: Richard Acayan mailingradian@gmail.com Link: https://lore.kernel.org/r/20231214074319.11023-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm670.dtsi | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sdm670.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm670.dtsi @@ -1295,10 +1295,10 @@ <&gcc GCC_USB30_PRIM_MASTER_CLK>; assigned-clock-rates = <19200000>, <150000000>;
- interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, - <GIC_SPI 488 IRQ_TYPE_EDGE_BOTH>, - <GIC_SPI 489 IRQ_TYPE_EDGE_BOTH>; + interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, + <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 8 IRQ_TYPE_EDGE_BOTH>, + <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq", "dm_hs_phy_irq", "dp_hs_phy_irq";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 047b2edc35b8db22354b4fba37818b548fc18896 upstream.
The USB SS PHY interrupt needs to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states.
Fixes: 07c8ded6e373 ("arm64: dts: qcom: add sdm670 and pixel 3a device trees") Cc: stable@vger.kernel.org # 6.2 Cc: Richard Acayan mailingradian@gmail.com Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Tested-by: Richard Acayan mailingradian@gmail.com Link: https://lore.kernel.org/r/20231214074319.11023-3-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm670.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm64/boot/dts/qcom/sdm670.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm670.dtsi @@ -1296,7 +1296,7 @@ assigned-clock-rates = <19200000>, <150000000>;
interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 486 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 6 IRQ_TYPE_LEVEL_HIGH>, <&pdc 8 IRQ_TYPE_EDGE_BOTH>, <&pdc 9 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johan Hovold johan+linaro@kernel.org
commit 710dd03464e4ab5b3d329768388b165d61958577 upstream.
The USB SS PHY interrupt needs to be provided by the PDC interrupt controller in order to be able to wake the system up from low-power states.
Fixes: fea4b41022f3 ("ARM: dts: qcom: sdx55: Add USB3 and PHY support") Cc: stable@vger.kernel.org # 5.12 Cc: Manivannan Sadhasivam mani@kernel.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Reviewed-by: Konrad Dybcio konrad.dybcio@linaro.org Reviewed-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Link: https://lore.kernel.org/r/20231213173131.29436-4-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson andersson@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/boot/dts/qcom/qcom-sdx55.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi +++ b/arch/arm/boot/dts/qcom/qcom-sdx55.dtsi @@ -593,7 +593,7 @@ assigned-clock-rates = <19200000>, <200000000>;
interrupts-extended = <&intc GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>, - <&intc GIC_SPI 198 IRQ_TYPE_LEVEL_HIGH>, + <&pdc 51 IRQ_TYPE_LEVEL_HIGH>, <&pdc 11 IRQ_TYPE_EDGE_BOTH>, <&pdc 10 IRQ_TYPE_EDGE_BOTH>; interrupt-names = "hs_phy_irq", "ss_phy_irq",
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alfred Piccioni alpic@google.com
commit f1bb47a31dff6d4b34fb14e99850860ee74bb003 upstream.
Some ioctl commands do not require ioctl permission, but are routed to other permissions such as FILE_GETATTR or FILE_SETATTR. This routing is done by comparing the ioctl cmd to a set of 64-bit flags (FS_IOC_*).
However, if a 32-bit process is running on a 64-bit kernel, it emits 32-bit flags (FS_IOC32_*) for certain ioctl operations. These flags are being checked erroneously, which leads to these ioctl operations being routed to the ioctl permission, rather than the correct file permissions.
This was also noted in a RED-PEN finding from a while back - "/* RED-PEN how should LSM module know it's handling 32bit? */".
This patch introduces a new hook, security_file_ioctl_compat(), that is called from the compat ioctl syscall. All current LSMs have been changed to support this hook.
Reviewing the three places where we are currently using security_file_ioctl(), it appears that only SELinux needs a dedicated compat change; TOMOYO and SMACK appear to be functional without any change.
Cc: stable@vger.kernel.org Fixes: 0b24dcb7f2f7 ("Revert "selinux: simplify ioctl checking"") Signed-off-by: Alfred Piccioni alpic@google.com Reviewed-by: Stephen Smalley stephen.smalley.work@gmail.com [PM: subject tweak, line length fixes, and alignment corrections] Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ioctl.c | 3 +-- include/linux/lsm_hook_defs.h | 2 ++ include/linux/security.h | 9 +++++++++ security/security.c | 18 ++++++++++++++++++ security/selinux/hooks.c | 28 ++++++++++++++++++++++++++++ security/smack/smack_lsm.c | 1 + security/tomoyo/tomoyo.c | 1 + 7 files changed, 60 insertions(+), 2 deletions(-)
--- a/fs/ioctl.c +++ b/fs/ioctl.c @@ -920,8 +920,7 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned i if (!f.file) return -EBADF;
- /* RED-PEN how should LSM module know it's handling 32bit? */ - error = security_file_ioctl(f.file, cmd, arg); + error = security_file_ioctl_compat(f.file, cmd, arg); if (error) goto out;
--- a/include/linux/lsm_hook_defs.h +++ b/include/linux/lsm_hook_defs.h @@ -171,6 +171,8 @@ LSM_HOOK(int, 0, file_alloc_security, st LSM_HOOK(void, LSM_RET_VOID, file_free_security, struct file *file) LSM_HOOK(int, 0, file_ioctl, struct file *file, unsigned int cmd, unsigned long arg) +LSM_HOOK(int, 0, file_ioctl_compat, struct file *file, unsigned int cmd, + unsigned long arg) LSM_HOOK(int, 0, mmap_addr, unsigned long addr) LSM_HOOK(int, 0, mmap_file, struct file *file, unsigned long reqprot, unsigned long prot, unsigned long flags) --- a/include/linux/security.h +++ b/include/linux/security.h @@ -389,6 +389,8 @@ int security_file_permission(struct file int security_file_alloc(struct file *file); void security_file_free(struct file *file); int security_file_ioctl(struct file *file, unsigned int cmd, unsigned long arg); +int security_file_ioctl_compat(struct file *file, unsigned int cmd, + unsigned long arg); int security_mmap_file(struct file *file, unsigned long prot, unsigned long flags); int security_mmap_addr(unsigned long addr); @@ -986,6 +988,13 @@ static inline int security_file_ioctl(st { return 0; } + +static inline int security_file_ioctl_compat(struct file *file, + unsigned int cmd, + unsigned long arg) +{ + return 0; +}
static inline int security_mmap_file(struct file *file, unsigned long prot, unsigned long flags) --- a/security/security.c +++ b/security/security.c @@ -2648,6 +2648,24 @@ int security_file_ioctl(struct file *fil } EXPORT_SYMBOL_GPL(security_file_ioctl);
+/** + * security_file_ioctl_compat() - Check if an ioctl is allowed in compat mode + * @file: associated file + * @cmd: ioctl cmd + * @arg: ioctl arguments + * + * Compat version of security_file_ioctl() that correctly handles 32-bit + * processes running on 64-bit kernels. + * + * Return: Returns 0 if permission is granted. + */ +int security_file_ioctl_compat(struct file *file, unsigned int cmd, + unsigned long arg) +{ + return call_int_hook(file_ioctl_compat, 0, file, cmd, arg); +} +EXPORT_SYMBOL_GPL(security_file_ioctl_compat); + static inline unsigned long mmap_prot(struct file *file, unsigned long prot) { /* --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3725,6 +3725,33 @@ static int selinux_file_ioctl(struct fil return error; }
+static int selinux_file_ioctl_compat(struct file *file, unsigned int cmd, + unsigned long arg) +{ + /* + * If we are in a 64-bit kernel running 32-bit userspace, we need to + * make sure we don't compare 32-bit flags to 64-bit flags. + */ + switch (cmd) { + case FS_IOC32_GETFLAGS: + cmd = FS_IOC_GETFLAGS; + break; + case FS_IOC32_SETFLAGS: + cmd = FS_IOC_SETFLAGS; + break; + case FS_IOC32_GETVERSION: + cmd = FS_IOC_GETVERSION; + break; + case FS_IOC32_SETVERSION: + cmd = FS_IOC_SETVERSION; + break; + default: + break; + } + + return selinux_file_ioctl(file, cmd, arg); +} + static int default_noexec __ro_after_init;
static int file_map_prot_check(struct file *file, unsigned long prot, int shared) @@ -7037,6 +7064,7 @@ static struct security_hook_list selinux LSM_HOOK_INIT(file_permission, selinux_file_permission), LSM_HOOK_INIT(file_alloc_security, selinux_file_alloc_security), LSM_HOOK_INIT(file_ioctl, selinux_file_ioctl), + LSM_HOOK_INIT(file_ioctl_compat, selinux_file_ioctl_compat), LSM_HOOK_INIT(mmap_file, selinux_mmap_file), LSM_HOOK_INIT(mmap_addr, selinux_mmap_addr), LSM_HOOK_INIT(file_mprotect, selinux_file_mprotect), --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -4973,6 +4973,7 @@ static struct security_hook_list smack_h
LSM_HOOK_INIT(file_alloc_security, smack_file_alloc_security), LSM_HOOK_INIT(file_ioctl, smack_file_ioctl), + LSM_HOOK_INIT(file_ioctl_compat, smack_file_ioctl), LSM_HOOK_INIT(file_lock, smack_file_lock), LSM_HOOK_INIT(file_fcntl, smack_file_fcntl), LSM_HOOK_INIT(mmap_file, smack_mmap_file), --- a/security/tomoyo/tomoyo.c +++ b/security/tomoyo/tomoyo.c @@ -568,6 +568,7 @@ static struct security_hook_list tomoyo_ LSM_HOOK_INIT(path_rename, tomoyo_path_rename), LSM_HOOK_INIT(inode_getattr, tomoyo_inode_getattr), LSM_HOOK_INIT(file_ioctl, tomoyo_file_ioctl), + LSM_HOOK_INIT(file_ioctl_compat, tomoyo_file_ioctl), LSM_HOOK_INIT(path_chmod, tomoyo_path_chmod), LSM_HOOK_INIT(path_chown, tomoyo_path_chown), LSM_HOOK_INIT(path_chroot, tomoyo_path_chroot),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jordan Rife jrife@google.com
commit e9cdebbe23f1aa9a1caea169862f479ab3fa2773 upstream.
Recent changes to kernel_connect() and kernel_bind() ensure that callers are insulated from changes to the address parameter made by BPF SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and ops->bind() with kernel_connect() and kernel_bind() to protect callers in such cases.
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.came... Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect") Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/dlm/lowcomms.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
--- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1805,8 +1805,8 @@ static int dlm_tcp_bind(struct socket *s memcpy(&src_addr, &dlm_local_addr[0], sizeof(src_addr)); make_sockaddr(&src_addr, 0, &addr_len);
- result = sock->ops->bind(sock, (struct sockaddr *)&src_addr, - addr_len); + result = kernel_bind(sock, (struct sockaddr *)&src_addr, + addr_len); if (result < 0) { /* This *may* not indicate a critical error */ log_print("could not bind for connect: %d", result); @@ -1818,7 +1818,7 @@ static int dlm_tcp_bind(struct socket *s static int dlm_tcp_connect(struct connection *con, struct socket *sock, struct sockaddr *addr, int addr_len) { - return sock->ops->connect(sock, addr, addr_len, O_NONBLOCK); + return kernel_connect(sock, addr, addr_len, O_NONBLOCK); }
static int dlm_tcp_listen_validate(void) @@ -1850,8 +1850,8 @@ static int dlm_tcp_listen_bind(struct so
/* Bind to our port */ make_sockaddr(&dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len); - return sock->ops->bind(sock, (struct sockaddr *)&dlm_local_addr[0], - addr_len); + return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0], + addr_len); }
static const struct dlm_proto_ops dlm_tcp_ops = { @@ -1876,12 +1876,12 @@ static int dlm_sctp_connect(struct conne int ret;
/* - * Make sock->ops->connect() function return in specified time, + * Make kernel_connect() function return in specified time, * since O_NONBLOCK argument in connect() function does not work here, * then, we should restore the default value of this attribute. */ sock_set_sndtimeo(sock->sk, 5); - ret = sock->ops->connect(sock, addr, addr_len, 0); + ret = kernel_connect(sock, addr, addr_len, 0); sock_set_sndtimeo(sock->sk, 0); return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum vegard.nossum@oracle.com
commit 3231dd5862779c2e15633c96133a53205ad660ce upstream.
The kernel-abi directive passes its argument straight to the shell. This is unfortunate and unnecessary.
Let's always use paths relative to $srctree/Documentation/ and use subprocess.check_call() instead of subprocess.Popen(shell=True).
This also makes the code shorter.
Link: https://fosstodon.org/@jani/111676532203641247 Reported-by: Jani Nikula jani.nikula@intel.com Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20231231235959.3342928-2-vegard.nossum@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/admin-guide/abi-obsolete.rst | 2 - Documentation/admin-guide/abi-removed.rst | 2 - Documentation/admin-guide/abi-stable.rst | 2 - Documentation/admin-guide/abi-testing.rst | 2 - Documentation/sphinx/kernel_abi.py | 56 +++++------------------------ 5 files changed, 14 insertions(+), 50 deletions(-)
--- a/Documentation/admin-guide/abi-obsolete.rst +++ b/Documentation/admin-guide/abi-obsolete.rst @@ -7,5 +7,5 @@ marked to be removed at some later point The description of the interface will document the reason why it is obsolete and when it can be expected to be removed.
-.. kernel-abi:: $srctree/Documentation/ABI/obsolete +.. kernel-abi:: ABI/obsolete :rst: --- a/Documentation/admin-guide/abi-removed.rst +++ b/Documentation/admin-guide/abi-removed.rst @@ -1,5 +1,5 @@ ABI removed symbols ===================
-.. kernel-abi:: $srctree/Documentation/ABI/removed +.. kernel-abi:: ABI/removed :rst: --- a/Documentation/admin-guide/abi-stable.rst +++ b/Documentation/admin-guide/abi-stable.rst @@ -10,5 +10,5 @@ for at least 2 years. Most interfaces (like syscalls) are expected to never change and always be available.
-.. kernel-abi:: $srctree/Documentation/ABI/stable +.. kernel-abi:: ABI/stable :rst: --- a/Documentation/admin-guide/abi-testing.rst +++ b/Documentation/admin-guide/abi-testing.rst @@ -16,5 +16,5 @@ Programs that use these interfaces are s name to the description of these interfaces, so that the kernel developers can easily notify them if any changes occur.
-.. kernel-abi:: $srctree/Documentation/ABI/testing +.. kernel-abi:: ABI/testing :rst: --- a/Documentation/sphinx/kernel_abi.py +++ b/Documentation/sphinx/kernel_abi.py @@ -39,8 +39,6 @@ import sys import re import kernellog
-from os import path - from docutils import nodes, statemachine from docutils.statemachine import ViewList from docutils.parsers.rst import directives, Directive @@ -73,60 +71,26 @@ class KernelCmd(Directive): }
def run(self): - doc = self.state.document if not doc.settings.file_insertion_enabled: raise self.warning("docutils: file insertion disabled")
- env = doc.settings.env - cwd = path.dirname(doc.current_source) - cmd = "get_abi.pl rest --enable-lineno --dir " - cmd += self.arguments[0] - - if 'rst' in self.options: - cmd += " --rst-source" - - srctree = path.abspath(os.environ["srctree"]) + srctree = os.path.abspath(os.environ["srctree"])
- fname = cmd + args = [ + os.path.join(srctree, 'scripts/get_abi.pl'), + 'rest', + '--enable-lineno', + '--dir', os.path.join(srctree, 'Documentation', self.arguments[0]), + ]
- # extend PATH with $(srctree)/scripts - path_env = os.pathsep.join([ - srctree + os.sep + "scripts", - os.environ["PATH"] - ]) - shell_env = os.environ.copy() - shell_env["PATH"] = path_env - shell_env["srctree"] = srctree + if 'rst' in self.options: + args.append('--rst-source')
- lines = self.runCmd(cmd, shell=True, cwd=cwd, env=shell_env) + lines = subprocess.check_output(args, cwd=os.path.dirname(doc.current_source)).decode('utf-8') nodeList = self.nestedParse(lines, self.arguments[0]) return nodeList
- def runCmd(self, cmd, **kwargs): - u"""Run command ``cmd`` and return its stdout as unicode.""" - - try: - proc = subprocess.Popen( - cmd - , stdout = subprocess.PIPE - , stderr = subprocess.PIPE - , **kwargs - ) - out, err = proc.communicate() - - out, err = codecs.decode(out, 'utf-8'), codecs.decode(err, 'utf-8') - - if proc.returncode != 0: - raise self.severe( - u"command '%s' failed with return code %d" - % (cmd, proc.returncode) - ) - except OSError as exc: - raise self.severe(u"problems with '%s' directive: %s." - % (self.name, ErrorString(exc))) - return out - def nestedParse(self, lines, fname): env = self.state.document.settings.env content = ViewList()
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vegard Nossum vegard.nossum@oracle.com
commit 5889d6ede53bc17252f79c142387e007224aa554 upstream.
The code currently leaks the absolute path of the ABI files into the rendered documentation.
There exists code to prevent this, but it is not effective when an absolute path is passed, which it is when $srctree is used.
I consider this to be a minimal, stop-gap fix; a better fix would strip off the actual prefix instead of hacking it off with a regex.
Link: https://mastodon.social/@vegard/111677490643495163 Cc: Jani Nikula jani.nikula@intel.com Cc: stable@vger.kernel.org Signed-off-by: Vegard Nossum vegard.nossum@oracle.com Signed-off-by: Jonathan Corbet corbet@lwn.net Link: https://lore.kernel.org/r/20231231235959.3342928-1-vegard.nossum@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- scripts/get_abi.pl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/scripts/get_abi.pl +++ b/scripts/get_abi.pl @@ -98,7 +98,7 @@ sub parse_abi { $name =~ s,.*/,,;
my $fn = $file; - $fn =~ s,Documentation/ABI/,,; + $fn =~ s,.*Documentation/ABI/,,;
my $nametag = "File $fn"; $data{$nametag}->{what} = "File $name";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Grzeschik m.grzeschik@pengutronix.de
commit 608ca5a60ee47b48fec210aeb7a795a64eb5dcee upstream.
For dmabuf import users to be able to use the vaddr from another videobuf2-dma-sg source, the exporter needs to set a proper vaddr on vb2_dma_sg_dmabuf_ops_vmap callback. This patch adds vmap on map if buf->vaddr was not set.
Cc: stable@kernel.org Fixes: 7938f4218168 ("dma-buf-map: Rename to iosys-map") Signed-off-by: Michael Grzeschik m.grzeschik@pengutronix.de Acked-by: Tomasz Figa tfiga@chromium.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/common/videobuf2/videobuf2-dma-sg.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/drivers/media/common/videobuf2/videobuf2-dma-sg.c +++ b/drivers/media/common/videobuf2/videobuf2-dma-sg.c @@ -487,9 +487,15 @@ vb2_dma_sg_dmabuf_ops_end_cpu_access(str static int vb2_dma_sg_dmabuf_ops_vmap(struct dma_buf *dbuf, struct iosys_map *map) { - struct vb2_dma_sg_buf *buf = dbuf->priv; + struct vb2_dma_sg_buf *buf; + void *vaddr;
- iosys_map_set_vaddr(map, buf->vaddr); + buf = dbuf->priv; + vaddr = vb2_dma_sg_vaddr(buf->vb, buf); + if (!vaddr) + return -EINVAL; + + iosys_map_set_vaddr(map, vaddr);
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Avri Altman avri.altman@wdc.com
commit 4d0c8d0aef6355660b6775d57ccd5d4ea2e15802 upstream.
Field Firmware Update (ffu) may use close-ended or open ended sequence. Each such sequence is comprised of a write commands enclosed between 2 switch commands - to and from ffu mode. So for the close-ended case, it will be: cmd6->cmd23-cmd25-cmd6.
Some host controllers however, get confused when multi-block rw is sent without sbc, and may generate auto-cmd12 which breaks the ffu sequence. I encountered this issue while testing fwupd (github.com/fwupd/fwupd) on HP Chromebook x2, a qualcomm based QC-7c, code name - strongbad.
Instead of a quirk, or hooking the request function of the msm ops, it would be better to fix the ioctl handling and make it use mrq.sbc instead of issuing SET_BLOCK_COUNT separately.
Signed-off-by: Avri Altman avri.altman@wdc.com Acked-by: Adrian Hunter adrian.hunter@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231129092535.3278-1-avri.altman@wdc.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/core/block.c | 46 +++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 43 insertions(+), 3 deletions(-)
--- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -400,6 +400,10 @@ struct mmc_blk_ioc_data { struct mmc_ioc_cmd ic; unsigned char *buf; u64 buf_bytes; + unsigned int flags; +#define MMC_BLK_IOC_DROP BIT(0) /* drop this mrq */ +#define MMC_BLK_IOC_SBC BIT(1) /* use mrq.sbc */ + struct mmc_rpmb_data *rpmb; };
@@ -465,7 +469,7 @@ static int mmc_blk_ioctl_copy_to_user(st }
static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md, - struct mmc_blk_ioc_data *idata) + struct mmc_blk_ioc_data **idatas, int i) { struct mmc_command cmd = {}, sbc = {}; struct mmc_data data = {}; @@ -475,10 +479,18 @@ static int __mmc_blk_ioctl_cmd(struct mm unsigned int busy_timeout_ms; int err; unsigned int target_part; + struct mmc_blk_ioc_data *idata = idatas[i]; + struct mmc_blk_ioc_data *prev_idata = NULL;
if (!card || !md || !idata) return -EINVAL;
+ if (idata->flags & MMC_BLK_IOC_DROP) + return 0; + + if (idata->flags & MMC_BLK_IOC_SBC) + prev_idata = idatas[i - 1]; + /* * The RPMB accesses comes in from the character device, so we * need to target these explicitly. Else we just target the @@ -532,7 +544,7 @@ static int __mmc_blk_ioctl_cmd(struct mm return err; }
- if (idata->rpmb) { + if (idata->rpmb || prev_idata) { sbc.opcode = MMC_SET_BLOCK_COUNT; /* * We don't do any blockcount validation because the max size @@ -540,6 +552,8 @@ static int __mmc_blk_ioctl_cmd(struct mm * 'Reliable Write' bit here. */ sbc.arg = data.blocks | (idata->ic.write_flag & BIT(31)); + if (prev_idata) + sbc.arg = prev_idata->ic.arg; sbc.flags = MMC_RSP_R1 | MMC_CMD_AC; mrq.sbc = &sbc; } @@ -557,6 +571,15 @@ static int __mmc_blk_ioctl_cmd(struct mm mmc_wait_for_req(card->host, &mrq); memcpy(&idata->ic.response, cmd.resp, sizeof(cmd.resp));
+ if (prev_idata) { + memcpy(&prev_idata->ic.response, sbc.resp, sizeof(sbc.resp)); + if (sbc.error) { + dev_err(mmc_dev(card->host), "%s: sbc error %d\n", + __func__, sbc.error); + return sbc.error; + } + } + if (cmd.error) { dev_err(mmc_dev(card->host), "%s: cmd error %d\n", __func__, cmd.error); @@ -1034,6 +1057,20 @@ static inline void mmc_blk_reset_success md->reset_done &= ~type; }
+static void mmc_blk_check_sbc(struct mmc_queue_req *mq_rq) +{ + struct mmc_blk_ioc_data **idata = mq_rq->drv_op_data; + int i; + + for (i = 1; i < mq_rq->ioc_count; i++) { + if (idata[i - 1]->ic.opcode == MMC_SET_BLOCK_COUNT && + mmc_op_multi(idata[i]->ic.opcode)) { + idata[i - 1]->flags |= MMC_BLK_IOC_DROP; + idata[i]->flags |= MMC_BLK_IOC_SBC; + } + } +} + /* * The non-block commands come back from the block layer after it queued it and * processed it with all other requests and then they get issued in this @@ -1061,11 +1098,14 @@ static void mmc_blk_issue_drv_op(struct if (ret) break; } + + mmc_blk_check_sbc(mq_rq); + fallthrough; case MMC_DRV_OP_IOCTL_RPMB: idata = mq_rq->drv_op_data; for (i = 0, ret = 0; i < mq_rq->ioc_count; i++) { - ret = __mmc_blk_ioctl_cmd(card, md, idata[i]); + ret = __mmc_blk_ioctl_cmd(card, md, idata, i); if (ret) break; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
commit 84a6be7db9050dd2601c9870f65eab9a665d2d5d upstream.
There is no need to duplicate what SPI core or individual controller drivers already do, i.e. mapping the buffers for DMA capable transfers.
Note, that the code, besides its redundancy, was buggy: strictly speaking there is no guarantee, while it's true for those which can use this code (see below), that the SPI host controller _is_ the device which does DMA.
Also see the Link tags below.
Additional notes. Currently only two SPI host controller drivers may use premapped (by the user) DMA buffers:
- drivers/spi/spi-au1550.c
- drivers/spi/spi-fsl-spi.c
Both of them have DMA mapping support code. I don't expect that SPI host controller code is worse than what has been done in mmc_spi. Hence I do not expect any regressions here. Otherwise, I'm pretty much sure these regressions have to be fixed in the respective drivers, and not here.
That said, remove all related pieces of DMA mapping code from mmc_spi.
Link: https://lore.kernel.org/linux-mmc/c73b9ba9-1699-2aff-e2fd-b4b4f292a3ca@raspb... Link: https://stackoverflow.com/questions/67620728/mmc-spi-issue-not-able-to-setup... Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20231207221901.3259962-1-andriy.shevchenko@linux.i... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/mmc_spi.c | 186 +-------------------------------------------- 1 file changed, 5 insertions(+), 181 deletions(-)
--- a/drivers/mmc/host/mmc_spi.c +++ b/drivers/mmc/host/mmc_spi.c @@ -15,7 +15,7 @@ #include <linux/slab.h> #include <linux/module.h> #include <linux/bio.h> -#include <linux/dma-mapping.h> +#include <linux/dma-direction.h> #include <linux/crc7.h> #include <linux/crc-itu-t.h> #include <linux/scatterlist.h> @@ -119,19 +119,14 @@ struct mmc_spi_host { struct spi_transfer status; struct spi_message readback;
- /* underlying DMA-aware controller, or null */ - struct device *dma_dev; - /* buffer used for commands and for message "overhead" */ struct scratch *data; - dma_addr_t data_dma;
/* Specs say to write ones most of the time, even when the card * has no need to read its input data; and many cards won't care. * This is our source of those ones. */ void *ones; - dma_addr_t ones_dma; };
@@ -147,11 +142,8 @@ static inline int mmc_cs_off(struct mmc_ return spi_setup(host->spi); }
-static int -mmc_spi_readbytes(struct mmc_spi_host *host, unsigned len) +static int mmc_spi_readbytes(struct mmc_spi_host *host, unsigned int len) { - int status; - if (len > sizeof(*host->data)) { WARN_ON(1); return -EIO; @@ -159,19 +151,7 @@ mmc_spi_readbytes(struct mmc_spi_host *h
host->status.len = len;
- if (host->dma_dev) - dma_sync_single_for_device(host->dma_dev, - host->data_dma, sizeof(*host->data), - DMA_FROM_DEVICE); - - status = spi_sync_locked(host->spi, &host->readback); - - if (host->dma_dev) - dma_sync_single_for_cpu(host->dma_dev, - host->data_dma, sizeof(*host->data), - DMA_FROM_DEVICE); - - return status; + return spi_sync_locked(host->spi, &host->readback); }
static int mmc_spi_skip(struct mmc_spi_host *host, unsigned long timeout, @@ -506,23 +486,11 @@ mmc_spi_command_send(struct mmc_spi_host t = &host->t; memset(t, 0, sizeof(*t)); t->tx_buf = t->rx_buf = data->status; - t->tx_dma = t->rx_dma = host->data_dma; t->len = cp - data->status; t->cs_change = 1; spi_message_add_tail(t, &host->m);
- if (host->dma_dev) { - host->m.is_dma_mapped = 1; - dma_sync_single_for_device(host->dma_dev, - host->data_dma, sizeof(*host->data), - DMA_BIDIRECTIONAL); - } status = spi_sync_locked(host->spi, &host->m); - - if (host->dma_dev) - dma_sync_single_for_cpu(host->dma_dev, - host->data_dma, sizeof(*host->data), - DMA_BIDIRECTIONAL); if (status < 0) { dev_dbg(&host->spi->dev, " ... write returned %d\n", status); cmd->error = status; @@ -540,9 +508,6 @@ mmc_spi_command_send(struct mmc_spi_host * We always provide TX data for data and CRC. The MMC/SD protocol * requires us to write ones; but Linux defaults to writing zeroes; * so we explicitly initialize it to all ones on RX paths. - * - * We also handle DMA mapping, so the underlying SPI controller does - * not need to (re)do it for each message. */ static void mmc_spi_setup_data_message( @@ -552,11 +517,8 @@ mmc_spi_setup_data_message( { struct spi_transfer *t; struct scratch *scratch = host->data; - dma_addr_t dma = host->data_dma;
spi_message_init(&host->m); - if (dma) - host->m.is_dma_mapped = 1;
/* for reads, readblock() skips 0xff bytes before finding * the token; for writes, this transfer issues that token. @@ -570,8 +532,6 @@ mmc_spi_setup_data_message( else scratch->data_token = SPI_TOKEN_SINGLE; t->tx_buf = &scratch->data_token; - if (dma) - t->tx_dma = dma + offsetof(struct scratch, data_token); spi_message_add_tail(t, &host->m); }
@@ -581,7 +541,6 @@ mmc_spi_setup_data_message( t = &host->t; memset(t, 0, sizeof(*t)); t->tx_buf = host->ones; - t->tx_dma = host->ones_dma; /* length and actual buffer info are written later */ spi_message_add_tail(t, &host->m);
@@ -591,14 +550,9 @@ mmc_spi_setup_data_message( if (direction == DMA_TO_DEVICE) { /* the actual CRC may get written later */ t->tx_buf = &scratch->crc_val; - if (dma) - t->tx_dma = dma + offsetof(struct scratch, crc_val); } else { t->tx_buf = host->ones; - t->tx_dma = host->ones_dma; t->rx_buf = &scratch->crc_val; - if (dma) - t->rx_dma = dma + offsetof(struct scratch, crc_val); } spi_message_add_tail(t, &host->m);
@@ -621,10 +575,7 @@ mmc_spi_setup_data_message( memset(t, 0, sizeof(*t)); t->len = (direction == DMA_TO_DEVICE) ? sizeof(scratch->status) : 1; t->tx_buf = host->ones; - t->tx_dma = host->ones_dma; t->rx_buf = scratch->status; - if (dma) - t->rx_dma = dma + offsetof(struct scratch, status); t->cs_change = 1; spi_message_add_tail(t, &host->m); } @@ -653,23 +604,13 @@ mmc_spi_writeblock(struct mmc_spi_host *
if (host->mmc->use_spi_crc) scratch->crc_val = cpu_to_be16(crc_itu_t(0, t->tx_buf, t->len)); - if (host->dma_dev) - dma_sync_single_for_device(host->dma_dev, - host->data_dma, sizeof(*scratch), - DMA_BIDIRECTIONAL);
status = spi_sync_locked(spi, &host->m); - if (status != 0) { dev_dbg(&spi->dev, "write error (%d)\n", status); return status; }
- if (host->dma_dev) - dma_sync_single_for_cpu(host->dma_dev, - host->data_dma, sizeof(*scratch), - DMA_BIDIRECTIONAL); - /* * Get the transmission data-response reply. It must follow * immediately after the data block we transferred. This reply @@ -718,8 +659,6 @@ mmc_spi_writeblock(struct mmc_spi_host * }
t->tx_buf += t->len; - if (host->dma_dev) - t->tx_dma += t->len;
/* Return when not busy. If we didn't collect that status yet, * we'll need some more I/O. @@ -783,30 +722,12 @@ mmc_spi_readblock(struct mmc_spi_host *h } leftover = status << 1;
- if (host->dma_dev) { - dma_sync_single_for_device(host->dma_dev, - host->data_dma, sizeof(*scratch), - DMA_BIDIRECTIONAL); - dma_sync_single_for_device(host->dma_dev, - t->rx_dma, t->len, - DMA_FROM_DEVICE); - } - status = spi_sync_locked(spi, &host->m); if (status < 0) { dev_dbg(&spi->dev, "read error %d\n", status); return status; }
- if (host->dma_dev) { - dma_sync_single_for_cpu(host->dma_dev, - host->data_dma, sizeof(*scratch), - DMA_BIDIRECTIONAL); - dma_sync_single_for_cpu(host->dma_dev, - t->rx_dma, t->len, - DMA_FROM_DEVICE); - } - if (bitshift) { /* Walk through the data and the crc and do * all the magic to get byte-aligned data. @@ -841,8 +762,6 @@ mmc_spi_readblock(struct mmc_spi_host *h }
t->rx_buf += t->len; - if (host->dma_dev) - t->rx_dma += t->len;
return 0; } @@ -857,7 +776,6 @@ mmc_spi_data_do(struct mmc_spi_host *hos struct mmc_data *data, u32 blk_size) { struct spi_device *spi = host->spi; - struct device *dma_dev = host->dma_dev; struct spi_transfer *t; enum dma_data_direction direction = mmc_get_dma_dir(data); struct scatterlist *sg; @@ -884,31 +802,8 @@ mmc_spi_data_do(struct mmc_spi_host *hos */ for_each_sg(data->sg, sg, data->sg_len, n_sg) { int status = 0; - dma_addr_t dma_addr = 0; void *kmap_addr; unsigned length = sg->length; - enum dma_data_direction dir = direction; - - /* set up dma mapping for controller drivers that might - * use DMA ... though they may fall back to PIO - */ - if (dma_dev) { - /* never invalidate whole *shared* pages ... */ - if ((sg->offset != 0 || length != PAGE_SIZE) - && dir == DMA_FROM_DEVICE) - dir = DMA_BIDIRECTIONAL; - - dma_addr = dma_map_page(dma_dev, sg_page(sg), 0, - PAGE_SIZE, dir); - if (dma_mapping_error(dma_dev, dma_addr)) { - data->error = -EFAULT; - break; - } - if (direction == DMA_TO_DEVICE) - t->tx_dma = dma_addr + sg->offset; - else - t->rx_dma = dma_addr + sg->offset; - }
/* allow pio too; we don't allow highmem */ kmap_addr = kmap(sg_page(sg)); @@ -941,8 +836,6 @@ mmc_spi_data_do(struct mmc_spi_host *hos if (direction == DMA_FROM_DEVICE) flush_dcache_page(sg_page(sg)); kunmap(sg_page(sg)); - if (dma_dev) - dma_unmap_page(dma_dev, dma_addr, PAGE_SIZE, dir);
if (status < 0) { data->error = status; @@ -977,21 +870,9 @@ mmc_spi_data_do(struct mmc_spi_host *hos scratch->status[0] = SPI_TOKEN_STOP_TRAN;
host->early_status.tx_buf = host->early_status.rx_buf; - host->early_status.tx_dma = host->early_status.rx_dma; host->early_status.len = statlen;
- if (host->dma_dev) - dma_sync_single_for_device(host->dma_dev, - host->data_dma, sizeof(*scratch), - DMA_BIDIRECTIONAL); - tmp = spi_sync_locked(spi, &host->m); - - if (host->dma_dev) - dma_sync_single_for_cpu(host->dma_dev, - host->data_dma, sizeof(*scratch), - DMA_BIDIRECTIONAL); - if (tmp < 0) { if (!data->error) data->error = tmp; @@ -1265,52 +1146,6 @@ mmc_spi_detect_irq(int irq, void *mmc) return IRQ_HANDLED; }
-#ifdef CONFIG_HAS_DMA -static int mmc_spi_dma_alloc(struct mmc_spi_host *host) -{ - struct spi_device *spi = host->spi; - struct device *dev; - - if (!spi->master->dev.parent->dma_mask) - return 0; - - dev = spi->master->dev.parent; - - host->ones_dma = dma_map_single(dev, host->ones, MMC_SPI_BLOCKSIZE, - DMA_TO_DEVICE); - if (dma_mapping_error(dev, host->ones_dma)) - return -ENOMEM; - - host->data_dma = dma_map_single(dev, host->data, sizeof(*host->data), - DMA_BIDIRECTIONAL); - if (dma_mapping_error(dev, host->data_dma)) { - dma_unmap_single(dev, host->ones_dma, MMC_SPI_BLOCKSIZE, - DMA_TO_DEVICE); - return -ENOMEM; - } - - dma_sync_single_for_cpu(dev, host->data_dma, sizeof(*host->data), - DMA_BIDIRECTIONAL); - - host->dma_dev = dev; - return 0; -} - -static void mmc_spi_dma_free(struct mmc_spi_host *host) -{ - if (!host->dma_dev) - return; - - dma_unmap_single(host->dma_dev, host->ones_dma, MMC_SPI_BLOCKSIZE, - DMA_TO_DEVICE); - dma_unmap_single(host->dma_dev, host->data_dma, sizeof(*host->data), - DMA_BIDIRECTIONAL); -} -#else -static inline int mmc_spi_dma_alloc(struct mmc_spi_host *host) { return 0; } -static inline void mmc_spi_dma_free(struct mmc_spi_host *host) {} -#endif - static int mmc_spi_probe(struct spi_device *spi) { void *ones; @@ -1402,24 +1237,17 @@ static int mmc_spi_probe(struct spi_devi host->powerup_msecs = 250; }
- /* preallocate dma buffers */ + /* Preallocate buffers */ host->data = kmalloc(sizeof(*host->data), GFP_KERNEL); if (!host->data) goto fail_nobuf1;
- status = mmc_spi_dma_alloc(host); - if (status) - goto fail_dma; - /* setup message for status/busy readback */ spi_message_init(&host->readback); - host->readback.is_dma_mapped = (host->dma_dev != NULL);
spi_message_add_tail(&host->status, &host->readback); host->status.tx_buf = host->ones; - host->status.tx_dma = host->ones_dma; host->status.rx_buf = &host->data->status; - host->status.rx_dma = host->data_dma + offsetof(struct scratch, status); host->status.cs_change = 1;
/* register card detect irq */ @@ -1464,9 +1292,8 @@ static int mmc_spi_probe(struct spi_devi if (!status) has_ro = true;
- dev_info(&spi->dev, "SD/MMC host %s%s%s%s%s\n", + dev_info(&spi->dev, "SD/MMC host %s%s%s%s\n", dev_name(&mmc->class_dev), - host->dma_dev ? "" : ", no DMA", has_ro ? "" : ", no WP", (host->pdata && host->pdata->setpower) ? "" : ", no poweroff", @@ -1477,8 +1304,6 @@ static int mmc_spi_probe(struct spi_devi fail_gpiod_request: mmc_remove_host(mmc); fail_glue_init: - mmc_spi_dma_free(host); -fail_dma: kfree(host->data); fail_nobuf1: mmc_spi_put_pdata(spi); @@ -1500,7 +1325,6 @@ static void mmc_spi_remove(struct spi_de
mmc_remove_host(mmc);
- mmc_spi_dma_free(host); kfree(host->data); kfree(host->ones);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alain Volmat alain.volmat@foss.st.com
commit b33cb0cbe2893b96ecbfa16254407153f4b55d16 upstream.
Use a copy of the struct v4l2_subdev_format when propagating format from the sink to source pad in order to avoid impacting the sink format returned to the application.
Thanks to Jacopo Mondi for pointing the issue.
Fixes: 6c01e6f3f27b ("media: st-mipid02: Propagate format from sink to source pad") Signed-off-by: Alain Volmat alain.volmat@foss.st.com Cc: stable@vger.kernel.org Reviewed-by: Jacopo Mondi jacopo.mondi@ideasonboard.com Reviewed-by: Daniel Scally dan.scally@ideasonboard.com Reviewed-by: Benjamin Mugnier benjamin.mugnier@foss.st.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/i2c/st-mipid02.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/media/i2c/st-mipid02.c +++ b/drivers/media/i2c/st-mipid02.c @@ -770,6 +770,7 @@ static void mipid02_set_fmt_sink(struct struct v4l2_subdev_format *format) { struct mipid02_dev *bridge = to_mipid02_dev(sd); + struct v4l2_subdev_format source_fmt; struct v4l2_mbus_framefmt *fmt;
format->format.code = get_fmt_code(format->format.code); @@ -781,8 +782,12 @@ static void mipid02_set_fmt_sink(struct
*fmt = format->format;
- /* Propagate the format change to the source pad */ - mipid02_set_fmt_source(sd, sd_state, format); + /* + * Propagate the format change to the source pad, taking + * care not to update the format pointer given back to user + */ + source_fmt = *format; + mipid02_set_fmt_source(sd, sd_state, &source_fmt); }
static int mipid02_set_fmt(struct v4l2_subdev *sd,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Wang zyytlz.wz@163.com
commit 38e1857933def4b3fafc28cc34ff3bbc84cad2c3 upstream.
In mtk_jpegdec_worker, if error occurs in mtk_jpeg_set_dec_dst, it will start the timeout worker and invoke v4l2_m2m_job_finish at the same time. This will break the logic of design for there should be only one function to call v4l2_m2m_job_finish. But now the timeout handler and mtk_jpegdec_worker will both invoke it.
Fix it by start the worker only if mtk_jpeg_set_dec_dst successfully finished.
Fixes: da4ede4b7fd6 ("media: mtk-jpeg: move data/code inside CONFIG_OF blocks") Signed-off-by: Zheng Wang zyytlz.wz@163.com Signed-off-by: Dmitry Osipenko dmitry.osipenko@collabora.com Cc: stable@vger.kernel.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c @@ -1749,9 +1749,6 @@ retry_select: v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx); v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
- schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); - mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, @@ -1761,6 +1758,9 @@ retry_select: goto setdst_end; }
+ schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work, + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); + spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags); ctx->total_frame_num++; mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zheng Wang zyytlz.wz@163.com
commit 206c857dd17d4d026de85866f1b5f0969f2a109e upstream.
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound with mtk_jpeg_job_timeout_work.
In mtk_jpeg_dec_device_run, if error happens in mtk_jpeg_set_dec_dst, it will finally start the worker while mark the job as finished by invoking v4l2_m2m_job_finish.
There are two methods to trigger the bug. If we remove the module, it which will call mtk_jpeg_remove to make cleanup. The possible sequence is as follows, which will cause a use-after-free bug.
CPU0 CPU1 mtk_jpeg_dec_... | start worker | |mtk_jpeg_job_timeout_work mtk_jpeg_remove | v4l2_m2m_release | kfree(m2m_dev); | | | v4l2_m2m_get_curr_priv | m2m_dev->curr_ctx //use
If we close the file descriptor, which will call mtk_jpeg_release, it will have a similar sequence.
Fix this bug by starting timeout worker only if started jpegdec worker successfully. Then v4l2_m2m_job_finish will only be called in either mtk_jpeg_job_timeout_work or mtk_jpeg_dec_device_run.
Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver") Signed-off-by: Zheng Wang zyytlz.wz@163.com Signed-off-by: Dmitry Osipenko dmitry.osipenko@collabora.com Cc: stable@vger.kernel.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c +++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c @@ -1021,13 +1021,13 @@ static void mtk_jpeg_dec_device_run(void if (ret < 0) goto dec_end;
- schedule_delayed_work(&jpeg->job_timeout_work, - msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); - mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs); if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb)) goto dec_end;
+ schedule_delayed_work(&jpeg->job_timeout_work, + msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC)); + spin_lock_irqsave(&jpeg->hw_lock, flags); mtk_jpeg_dec_reset(jpeg->reg_base); mtk_jpeg_dec_set_config(jpeg->reg_base,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guo Ren guoren@linux.alibaba.com
commit 97b7ac69be2e5a683e898f5267f659fde52efdd5 upstream.
When the task is in COMPAT mode, the arch_get_mmap_end should be 2GB, not TASK_SIZE_64. The TASK_SIZE has contained is_compat_mode() detection, so change the definition of STACK_TOP_MAX to TASK_SIZE directly.
Cc: stable@vger.kernel.org Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Signed-off-by: Guo Ren guoren@linux.alibaba.com Signed-off-by: Guo Ren guoren@kernel.org Reviewed-by: Leonardo Bras leobras@redhat.com Reviewed-by: Charlie Jenkins charlie@rivosinc.com Link: https://lore.kernel.org/r/20231222115703.2404036-3-guoren@kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/processor.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/include/asm/processor.h +++ b/arch/riscv/include/asm/processor.h @@ -15,7 +15,7 @@
#ifdef CONFIG_64BIT #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) -#define STACK_TOP_MAX TASK_SIZE_64 +#define STACK_TOP_MAX TASK_SIZE
#define arch_get_mmap_end(addr, len, flags) \ ({ \
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Guo Ren guoren@linux.alibaba.com
commit 5f449e245e5b0d9d63eef6c8968fbdc3a8594407 upstream.
In COMPAT mode, the STACK_TOP is DEFAULT_MAP_WINDOW (0x80000000), but the TASK_SIZE is 0x7fff000. When the user stack is upon 0x7fff000, it will cause a user segment fault. Sometimes, it would cause boot failure when the whole rootfs is rv32.
Freeing unused kernel image (initmem) memory: 2236K Run /sbin/init as init process Starting init: /sbin/init exists but couldn't execute it (error -14) Run /etc/init as init process ...
Increase the TASK_SIZE to cover STACK_TOP.
Cc: stable@vger.kernel.org Fixes: add2cc6b6515 ("RISC-V: mm: Restrict address space for sv39,sv48,sv57") Signed-off-by: Guo Ren guoren@linux.alibaba.com Signed-off-by: Guo Ren guoren@kernel.org Reviewed-by: Leonardo Bras leobras@redhat.com Reviewed-by: Charlie Jenkins charlie@rivosinc.com Link: https://lore.kernel.org/r/20231222115703.2404036-2-guoren@kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -880,7 +880,7 @@ static inline pte_t pte_swp_clear_exclus #define TASK_SIZE_MIN (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2)
#ifdef CONFIG_COMPAT -#define TASK_SIZE_32 (_AC(0x80000000, UL) - PAGE_SIZE) +#define TASK_SIZE_32 (_AC(0x80000000, UL)) #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ TASK_SIZE_32 : TASK_SIZE_64) #else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Herring robh@kernel.org
commit 546b7cde9b1dd36089649101b75266564600ffe5 upstream.
In preparation to apply ARM64_WORKAROUND_2966298 for multiple errata, rename the kconfig and capability. No functional change.
Cc: stable@vger.kernel.org Signed-off-by: Rob Herring robh@kernel.org Reviewed-by: Mark Rutland mark.rutland@arm.com Link: https://lore.kernel.org/r/20240110-arm-errata-a510-v1-1-d02bc51aeeee@kernel.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/Kconfig | 4 ++++ arch/arm64/kernel/cpu_errata.c | 4 ++-- arch/arm64/kernel/entry.S | 2 +- arch/arm64/tools/cpucaps | 2 +- 4 files changed, 8 insertions(+), 4 deletions(-)
--- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1037,8 +1037,12 @@ config ARM64_ERRATUM_2645198
If unsure, say Y.
+config ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD + bool + config ARM64_ERRATUM_2966298 bool "Cortex-A520: 2966298: workaround for speculatively executed unprivileged load" + select ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD default y help This option adds the workaround for ARM Cortex-A520 erratum 2966298. --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -730,10 +730,10 @@ const struct arm64_cpu_capabilities arm6 .cpu_enable = cpu_clear_bf16_from_user_emulation, }, #endif -#ifdef CONFIG_ARM64_ERRATUM_2966298 +#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD { .desc = "ARM erratum 2966298", - .capability = ARM64_WORKAROUND_2966298, + .capability = ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD, /* Cortex-A520 r0p0 - r0p1 */ ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A520, 0, 0, 1), }, --- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -428,7 +428,7 @@ alternative_else_nop_endif ldp x28, x29, [sp, #16 * 14]
.if \el == 0 -alternative_if ARM64_WORKAROUND_2966298 +alternative_if ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD tlbi vale1, xzr dsb nsh alternative_else_nop_endif --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -84,7 +84,6 @@ WORKAROUND_2077057 WORKAROUND_2457168 WORKAROUND_2645198 WORKAROUND_2658417 -WORKAROUND_2966298 WORKAROUND_AMPERE_AC03_CPU_38 WORKAROUND_TRBE_OVERWRITE_FILL_MODE WORKAROUND_TSB_FLUSH_FAILURE @@ -100,3 +99,4 @@ WORKAROUND_NVIDIA_CARMEL_CNP WORKAROUND_QCOM_FALKOR_E1003 WORKAROUND_REPEAT_TLBI WORKAROUND_SPECULATIVE_AT +WORKAROUND_SPECULATIVE_UNPRIV_LOAD
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rob Herring robh@kernel.org
commit f827bcdafa2a2ac21c91e47f587e8d0c76195409 upstream.
Implement the workaround for ARM Cortex-A510 erratum 3117295. On an affected Cortex-A510 core, a speculatively executed unprivileged load might leak data from a privileged load via a cache side channel. The issue only exists for loads within a translation regime with the same translation (e.g. same ASID and VMID). Therefore, the issue only affects the return to EL0.
The erratum and workaround are the same as ARM Cortex-A520 erratum 2966298, so reuse the existing workaround.
Cc: stable@vger.kernel.org Signed-off-by: Rob Herring robh@kernel.org Reviewed-by: Mark Rutland mark.rutland@arm.com Link: https://lore.kernel.org/r/20240110-arm-errata-a510-v1-2-d02bc51aeeee@kernel.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/arch/arm64/silicon-errata.rst | 2 ++ arch/arm64/Kconfig | 14 ++++++++++++++ arch/arm64/kernel/cpu_errata.c | 17 +++++++++++++++-- 3 files changed, 31 insertions(+), 2 deletions(-)
--- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -71,6 +71,8 @@ stable kernels. +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A510 | #2658417 | ARM64_ERRATUM_2658417 | +----------------+-----------------+-----------------+-----------------------------+ +| ARM | Cortex-A510 | #3117295 | ARM64_ERRATUM_3117295 | ++----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A520 | #2966298 | ARM64_ERRATUM_2966298 | +----------------+-----------------+-----------------+-----------------------------+ | ARM | Cortex-A53 | #826319 | ARM64_ERRATUM_826319 | --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -1054,6 +1054,20 @@ config ARM64_ERRATUM_2966298
If unsure, say Y.
+config ARM64_ERRATUM_3117295 + bool "Cortex-A510: 3117295: workaround for speculatively executed unprivileged load" + select ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD + default y + help + This option adds the workaround for ARM Cortex-A510 erratum 3117295. + + On an affected Cortex-A510 core, a speculatively executed unprivileged + load might leak data from a privileged level via a cache side channel. + + Work around this problem by executing a TLBI before returning to EL0. + + If unsure, say Y. + config CAVIUM_ERRATUM_22375 bool "Cavium erratum 22375, 24313" default y --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -432,6 +432,19 @@ static struct midr_range broken_aarch32_ }; #endif /* CONFIG_ARM64_WORKAROUND_TRBE_WRITE_OUT_OF_RANGE */
+#ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD +static const struct midr_range erratum_spec_unpriv_load_list[] = { +#ifdef CONFIG_ARM64_ERRATUM_3117295 + MIDR_ALL_VERSIONS(MIDR_CORTEX_A510), +#endif +#ifdef CONFIG_ARM64_ERRATUM_2966298 + /* Cortex-A520 r0p0 to r0p1 */ + MIDR_REV_RANGE(MIDR_CORTEX_A520, 0, 0, 1), +#endif + {}, +}; +#endif + const struct arm64_cpu_capabilities arm64_errata[] = { #ifdef CONFIG_ARM64_WORKAROUND_CLEAN_CACHE { @@ -732,10 +745,10 @@ const struct arm64_cpu_capabilities arm6 #endif #ifdef CONFIG_ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD { - .desc = "ARM erratum 2966298", + .desc = "ARM errata 2966298, 3117295", .capability = ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD, /* Cortex-A520 r0p0 - r0p1 */ - ERRATA_MIDR_REV_RANGE(MIDR_CORTEX_A520, 0, 0, 1), + ERRATA_MIDR_RANGE_LIST(erratum_spec_unpriv_load_list), }, #endif #ifdef CONFIG_AMPERE_ERRATUM_AC03_CPU_38
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Brown broonie@kernel.org
commit dc7eb8755797ed41a0d1b5c0c39df3c8f401b3d9 upstream.
When sme_alloc() is called with existing storage and we are not flushing we will always allocate new storage, both leaking the existing storage and corrupting the state. Fix this by separating the checks for flushing and for existing storage as we do for SVE.
Callers that reallocate (eg, due to changing the vector length) should call sme_free() themselves.
Fixes: 5d0a8d2fba50 ("arm64/ptrace: Ensure that SME is set up for target when writing SSVE state") Signed-off-by: Mark Brown broonie@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240115-arm64-sme-flush-v1-1-7472bd3459b7@kernel.... Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/fpsimd.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1280,8 +1280,10 @@ void fpsimd_release_task(struct task_str */ void sme_alloc(struct task_struct *task, bool flush) { - if (task->thread.sme_state && flush) { - memset(task->thread.sme_state, 0, sme_state_size(task)); + if (task->thread.sme_state) { + if (flush) + memset(task->thread.sme_state, 0, + sme_state_size(task)); return; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mark Rutland mark.rutland@arm.com
commit 832dd634bd1b4e3bbe9f10b9c9ba5db6f6f2b97f upstream.
Currently the ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD workaround isn't quite right, as it is supposed to be applied after the last explicit memory access, but is immediately followed by an LDR.
The ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD workaround is used to handle Cortex-A520 erratum 2966298 and Cortex-A510 erratum 3117295, which are described in:
* https://developer.arm.com/documentation/SDEN2444153/0600/?lang=en * https://developer.arm.com/documentation/SDEN1873361/1600/?lang=en
In both cases the workaround is described as:
| If pagetable isolation is disabled, the context switch logic in the | kernel can be updated to execute the following sequence on affected | cores before exiting to EL0, and after all explicit memory accesses: | | 1. A non-shareable TLBI to any context and/or address, including | unused contexts or addresses, such as a `TLBI VALE1 Xzr`. | | 2. A DSB NSH to guarantee completion of the TLBI.
The important part being that the TLBI+DSB must be placed "after all explicit memory accesses".
Unfortunately, as-implemented, the TLBI+DSB is immediately followed by an LDR, as we have:
| alternative_if ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD | tlbi vale1, xzr | dsb nsh | alternative_else_nop_endif | alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0 | ldr lr, [sp, #S_LR] | add sp, sp, #PT_REGS_SIZE // restore sp | eret | alternative_else_nop_endif | | [ ... KPTI exception return path ... ]
This patch fixes this by reworking the logic to place the TLBI+DSB immediately before the ERET, after all explicit memory accesses.
The ERET is currently in a separate alternative block, and alternatives cannot be nested. To account for this, the alternative block for ARM64_UNMAP_KERNEL_AT_EL0 is replaced with a single alternative branch to skip the KPTI logic, with the new shape of the logic being:
| alternative_insn "b .L_skip_tramp_exit_@", nop, ARM64_UNMAP_KERNEL_AT_EL0 | [ ... KPTI exception return path ... ] | .L_skip_tramp_exit_@: | | ldr lr, [sp, #S_LR] | add sp, sp, #PT_REGS_SIZE // restore sp | | alternative_if ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD | tlbi vale1, xzr | dsb nsh | alternative_else_nop_endif | eret
The new structure means that the workaround is only applied when KPTI is not in use; this is fine as noted in the documented implications of the erratum:
| Pagetable isolation between EL0 and higher level ELs prevents the | issue from occurring.
... and as per the workaround description quoted above, the workaround is only necessary "If pagetable isolation is disabled".
Fixes: 471470bc7052 ("arm64: errata: Add Cortex-A520 speculative unprivileged load workaround") Signed-off-by: Mark Rutland mark.rutland@arm.com Cc: Catalin Marinas catalin.marinas@arm.com Cc: James Morse james.morse@arm.com Cc: Rob Herring robh@kernel.org Cc: Will Deacon will@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240116110221.420467-2-mark.rutland@arm.com Signed-off-by: Will Deacon will@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/kernel/entry.S | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/arch/arm64/kernel/entry.S +++ b/arch/arm64/kernel/entry.S @@ -428,16 +428,9 @@ alternative_else_nop_endif ldp x28, x29, [sp, #16 * 14]
.if \el == 0 -alternative_if ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD - tlbi vale1, xzr - dsb nsh -alternative_else_nop_endif -alternative_if_not ARM64_UNMAP_KERNEL_AT_EL0 - ldr lr, [sp, #S_LR] - add sp, sp, #PT_REGS_SIZE // restore sp - eret -alternative_else_nop_endif #ifdef CONFIG_UNMAP_KERNEL_AT_EL0 + alternative_insn "b .L_skip_tramp_exit_@", nop, ARM64_UNMAP_KERNEL_AT_EL0 + msr far_el1, x29
ldr_this_cpu x30, this_cpu_vector, x29 @@ -446,7 +439,18 @@ alternative_else_nop_endif ldr lr, [sp, #S_LR] // restore x30 add sp, sp, #PT_REGS_SIZE // restore sp br x29 + +.L_skip_tramp_exit_@: #endif + ldr lr, [sp, #S_LR] + add sp, sp, #PT_REGS_SIZE // restore sp + + /* This must be after the last explicit memory access */ +alternative_if ARM64_WORKAROUND_SPECULATIVE_UNPRIV_LOAD + tlbi vale1, xzr + dsb nsh +alternative_else_nop_endif + eret .else ldr lr, [sp, #S_LR] add sp, sp, #PT_REGS_SIZE // restore sp
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 3d762e21d56370a43478b55e604b4a83dd85aafc upstream.
Intel systems > 2015 have been configured to use ACPI alarm instead of HPET to avoid s2idle issues.
Having HPET programmed for wakeup causes problems on AMD systems with s2idle as well.
One particular case is that the systemd "SuspendThenHibernate" feature doesn't work properly on the Framework 13" AMD model. Switching to using ACPI alarm fixes the issue.
Adjust the quirk to apply to AMD/Hygon systems from 2021 onwards. This matches what has been tested and is specifically to avoid potential risk to older systems.
Cc: stable@vger.kernel.org # 6.1+ Reported-by: alvin.zhuge@gmail.com Reported-by: renzhamin@gmail.com Closes: https://github.com/systemd/systemd/issues/24279 Reported-by: Kelvie Wong kelvie@kelvie.ca Closes: https://community.frame.work/t/systemd-suspend-then-hibernate-wakes-up-after... Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20231106162310.85711-1-mario.limonciello@amd.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rtc/rtc-cmos.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-)
--- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -818,18 +818,24 @@ static void rtc_wake_off(struct device * }
#ifdef CONFIG_X86 -/* Enable use_acpi_alarm mode for Intel platforms no earlier than 2015 */ static void use_acpi_alarm_quirks(void) { - if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL) + switch (boot_cpu_data.x86_vendor) { + case X86_VENDOR_INTEL: + if (dmi_get_bios_year() < 2015) + return; + break; + case X86_VENDOR_AMD: + case X86_VENDOR_HYGON: + if (dmi_get_bios_year() < 2021) + return; + break; + default: return; - + } if (!is_hpet_enabled()) return;
- if (dmi_get_bios_year() < 2015) - return; - use_acpi_alarm = true; } #else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 1311a8f0d4b23f58bbababa13623aa40b8ad4e0c upstream.
When mc146818_avoid_UIP() fails to return a valid value, this is because UIP didn't clear in the timeout period. Adjust the return code in this case to -ETIMEDOUT.
Tested-by: Mateusz Jończyk mat.jonczyk@o2.pl Reviewed-by: Mateusz Jończyk mat.jonczyk@o2.pl Acked-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: stable@vger.kernel.org Fixes: cdedc45c579f ("rtc: cmos: avoid UIP when reading alarm time") Fixes: cd17420ebea5 ("rtc: cmos: avoid UIP when writing alarm time") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20231128053653.101798-3-mario.limonciello@amd.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rtc/rtc-cmos.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -292,7 +292,7 @@ static int cmos_read_alarm(struct device
/* This not only a rtc_op, but also called directly */ if (!is_valid_irq(cmos->irq)) - return -EIO; + return -ETIMEDOUT;
/* Basic alarms only support hour, minute, and seconds fields. * Some also support day and month, for alarms up to a year in @@ -557,7 +557,7 @@ static int cmos_set_alarm(struct device * Use mc146818_avoid_UIP() to avoid this. */ if (!mc146818_avoid_UIP(cmos_set_alarm_callback, &p)) - return -EIO; + return -ETIMEDOUT;
cmos->alarm_expires = rtc_tm_to_time64(&t->time);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit af838635a3eb9b1bc0d98599c101ebca98f31311 upstream.
mc146818_get_time() calls mc146818_avoid_UIP() to avoid fetching the time while RTC update is in progress (UIP). When this fails, the return code is -EIO, but actually there was no IO failure.
The reason for the return from mc146818_avoid_UIP() is that the UIP wasn't cleared in the time period. Adjust the return code to -ETIMEDOUT to match the behavior.
Tested-by: Mateusz Jończyk mat.jonczyk@o2.pl Reviewed-by: Mateusz Jończyk mat.jonczyk@o2.pl Acked-by: Mateusz Jończyk mat.jonczyk@o2.pl Cc: stable@vger.kernel.org Fixes: 2a61b0ac5493 ("rtc: mc146818-lib: refactor mc146818_get_time") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20231128053653.101798-2-mario.limonciello@amd.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rtc/rtc-mc146818-lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -138,7 +138,7 @@ int mc146818_get_time(struct rtc_time *t
if (!mc146818_avoid_UIP(mc146818_get_time_callback, &p)) { memset(time, 0, sizeof(*time)); - return -EIO; + return -ETIMEDOUT; }
if (!(p.ctrl & RTC_DM_BINARY) || RTC_ALWAYS_BCD)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 120931db07b49252aba2073096b595482d71857c upstream.
The UIP timeout is hardcoded to 10ms for all RTC reads, but in some contexts this might not be enough time. Add a timeout parameter to mc146818_get_time() and mc146818_get_time_callback().
If UIP timeout is configured by caller to be >=100 ms and a call takes this long, log a warning.
Make all callers use 10ms to ensure no functional changes.
Cc: stable@vger.kernel.org # 6.1.y Fixes: ec5895c0f2d8 ("rtc: mc146818-lib: extract mc146818_avoid_UIP") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Tested-by: Mateusz Jończyk mat.jonczyk@o2.pl Reviewed-by: Mateusz Jończyk mat.jonczyk@o2.pl Acked-by: Mateusz Jończyk mat.jonczyk@o2.pl Link: https://lore.kernel.org/r/20231128053653.101798-4-mario.limonciello@amd.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/alpha/kernel/rtc.c | 2 +- arch/x86/kernel/hpet.c | 2 +- arch/x86/kernel/rtc.c | 2 +- drivers/base/power/trace.c | 2 +- drivers/rtc/rtc-cmos.c | 6 +++--- drivers/rtc/rtc-mc146818-lib.c | 37 +++++++++++++++++++++++++++++-------- include/linux/mc146818rtc.h | 3 ++- 7 files changed, 38 insertions(+), 16 deletions(-)
--- a/arch/alpha/kernel/rtc.c +++ b/arch/alpha/kernel/rtc.c @@ -80,7 +80,7 @@ init_rtc_epoch(void) static int alpha_rtc_read_time(struct device *dev, struct rtc_time *tm) { - int ret = mc146818_get_time(tm); + int ret = mc146818_get_time(tm, 10);
if (ret < 0) { dev_err_ratelimited(dev, "unable to read current time\n"); --- a/arch/x86/kernel/hpet.c +++ b/arch/x86/kernel/hpet.c @@ -1438,7 +1438,7 @@ irqreturn_t hpet_rtc_interrupt(int irq, memset(&curr_time, 0, sizeof(struct rtc_time));
if (hpet_rtc_flags & (RTC_UIE | RTC_AIE)) { - if (unlikely(mc146818_get_time(&curr_time) < 0)) { + if (unlikely(mc146818_get_time(&curr_time, 10) < 0)) { pr_err_ratelimited("unable to read current time from RTC\n"); return IRQ_HANDLED; } --- a/arch/x86/kernel/rtc.c +++ b/arch/x86/kernel/rtc.c @@ -67,7 +67,7 @@ void mach_get_cmos_time(struct timespec6 return; }
- if (mc146818_get_time(&tm)) { + if (mc146818_get_time(&tm, 10)) { pr_err("Unable to read current time from RTC\n"); now->tv_sec = now->tv_nsec = 0; return; --- a/drivers/base/power/trace.c +++ b/drivers/base/power/trace.c @@ -120,7 +120,7 @@ static unsigned int read_magic_time(void struct rtc_time time; unsigned int val;
- if (mc146818_get_time(&time) < 0) { + if (mc146818_get_time(&time, 10) < 0) { pr_err("Unable to read current time from RTC\n"); return 0; } --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -231,7 +231,7 @@ static int cmos_read_time(struct device if (!pm_trace_rtc_valid()) return -EIO;
- ret = mc146818_get_time(t); + ret = mc146818_get_time(t, 10); if (ret < 0) { dev_err_ratelimited(dev, "unable to read current time\n"); return ret; @@ -307,7 +307,7 @@ static int cmos_read_alarm(struct device * * Use the mc146818_avoid_UIP() function to avoid this. */ - if (!mc146818_avoid_UIP(cmos_read_alarm_callback, &p)) + if (!mc146818_avoid_UIP(cmos_read_alarm_callback, 10, &p)) return -EIO;
if (!(p.rtc_control & RTC_DM_BINARY) || RTC_ALWAYS_BCD) { @@ -556,7 +556,7 @@ static int cmos_set_alarm(struct device * * Use mc146818_avoid_UIP() to avoid this. */ - if (!mc146818_avoid_UIP(cmos_set_alarm_callback, &p)) + if (!mc146818_avoid_UIP(cmos_set_alarm_callback, 10, &p)) return -ETIMEDOUT;
cmos->alarm_expires = rtc_tm_to_time64(&t->time); --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -8,26 +8,31 @@ #include <linux/acpi.h> #endif
+#define UIP_RECHECK_DELAY 100 /* usec */ +#define UIP_RECHECK_DELAY_MS (USEC_PER_MSEC / UIP_RECHECK_DELAY) +#define UIP_RECHECK_LOOPS_MS(x) (x / UIP_RECHECK_DELAY_MS) + /* * Execute a function while the UIP (Update-in-progress) bit of the RTC is - * unset. + * unset. The timeout is configurable by the caller in ms. * * Warning: callback may be executed more then once. */ bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param), + int timeout, void *param) { int i; unsigned long flags; unsigned char seconds;
- for (i = 0; i < 100; i++) { + for (i = 0; UIP_RECHECK_LOOPS_MS(i) < timeout; i++) { spin_lock_irqsave(&rtc_lock, flags);
/* * Check whether there is an update in progress during which the * readout is unspecified. The maximum update time is ~2ms. Poll - * every 100 usec for completion. + * for completion. * * Store the second value before checking UIP so a long lasting * NMI which happens to hit after the UIP check cannot make @@ -37,7 +42,7 @@ bool mc146818_avoid_UIP(void (*callback)
if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) { spin_unlock_irqrestore(&rtc_lock, flags); - udelay(100); + udelay(UIP_RECHECK_DELAY); continue; }
@@ -56,7 +61,7 @@ bool mc146818_avoid_UIP(void (*callback) */ if (CMOS_READ(RTC_FREQ_SELECT) & RTC_UIP) { spin_unlock_irqrestore(&rtc_lock, flags); - udelay(100); + udelay(UIP_RECHECK_DELAY); continue; }
@@ -72,6 +77,10 @@ bool mc146818_avoid_UIP(void (*callback) } spin_unlock_irqrestore(&rtc_lock, flags);
+ if (UIP_RECHECK_LOOPS_MS(i) >= 100) + pr_warn("Reading current time from RTC took around %li ms\n", + UIP_RECHECK_LOOPS_MS(i)); + return true; } return false; @@ -84,7 +93,7 @@ EXPORT_SYMBOL_GPL(mc146818_avoid_UIP); */ bool mc146818_does_rtc_work(void) { - return mc146818_avoid_UIP(NULL, NULL); + return mc146818_avoid_UIP(NULL, 10, NULL); } EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);
@@ -130,13 +139,25 @@ static void mc146818_get_time_callback(u p->ctrl = CMOS_READ(RTC_CONTROL); }
-int mc146818_get_time(struct rtc_time *time) +/** + * mc146818_get_time - Get the current time from the RTC + * @time: pointer to struct rtc_time to store the current time + * @timeout: timeout value in ms + * + * This function reads the current time from the RTC and stores it in the + * provided struct rtc_time. The timeout parameter specifies the maximum + * time to wait for the RTC to become ready. + * + * Return: 0 on success, -ETIMEDOUT if the RTC did not become ready within + * the specified timeout, or another error code if an error occurred. + */ +int mc146818_get_time(struct rtc_time *time, int timeout) { struct mc146818_get_time_callback_param p = { .time = time };
- if (!mc146818_avoid_UIP(mc146818_get_time_callback, &p)) { + if (!mc146818_avoid_UIP(mc146818_get_time_callback, timeout, &p)) { memset(time, 0, sizeof(*time)); return -ETIMEDOUT; } --- a/include/linux/mc146818rtc.h +++ b/include/linux/mc146818rtc.h @@ -126,10 +126,11 @@ struct cmos_rtc_board_info { #endif /* ARCH_RTC_LOCATION */
bool mc146818_does_rtc_work(void); -int mc146818_get_time(struct rtc_time *time); +int mc146818_get_time(struct rtc_time *time, int timeout); int mc146818_set_time(struct rtc_time *time);
bool mc146818_avoid_UIP(void (*callback)(unsigned char seconds, void *param), + int timeout, void *param);
#endif /* _MC146818RTC_H */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit cef9ecc8e938dd48a560f7dd9be1246359248d20 upstream.
Specs don't say anything about UIP being cleared within 10ms. They only say that UIP won't occur for another 244uS. If a long NMI occurs while UIP is still updating it might not be possible to get valid data in 10ms.
This has been observed in the wild that around s2idle some calls can take up to 480ms before UIP is clear.
Adjust callers from outside an interrupt context to wait for up to a 1s instead of 10ms.
Cc: stable@vger.kernel.org # 6.1.y Fixes: ec5895c0f2d8 ("rtc: mc146818-lib: extract mc146818_avoid_UIP") Reported-by: Carsten Hatger xmb8dsv4@gmail.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217626 Tested-by: Mateusz Jończyk mat.jonczyk@o2.pl Reviewed-by: Mateusz Jończyk mat.jonczyk@o2.pl Acked-by: Mateusz Jończyk mat.jonczyk@o2.pl Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://lore.kernel.org/r/20231128053653.101798-5-mario.limonciello@amd.com Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/rtc.c | 2 +- drivers/base/power/trace.c | 2 +- drivers/rtc/rtc-cmos.c | 2 +- drivers/rtc/rtc-mc146818-lib.c | 2 +- 4 files changed, 4 insertions(+), 4 deletions(-)
--- a/arch/x86/kernel/rtc.c +++ b/arch/x86/kernel/rtc.c @@ -67,7 +67,7 @@ void mach_get_cmos_time(struct timespec6 return; }
- if (mc146818_get_time(&tm, 10)) { + if (mc146818_get_time(&tm, 1000)) { pr_err("Unable to read current time from RTC\n"); now->tv_sec = now->tv_nsec = 0; return; --- a/drivers/base/power/trace.c +++ b/drivers/base/power/trace.c @@ -120,7 +120,7 @@ static unsigned int read_magic_time(void struct rtc_time time; unsigned int val;
- if (mc146818_get_time(&time, 10) < 0) { + if (mc146818_get_time(&time, 1000) < 0) { pr_err("Unable to read current time from RTC\n"); return 0; } --- a/drivers/rtc/rtc-cmos.c +++ b/drivers/rtc/rtc-cmos.c @@ -231,7 +231,7 @@ static int cmos_read_time(struct device if (!pm_trace_rtc_valid()) return -EIO;
- ret = mc146818_get_time(t, 10); + ret = mc146818_get_time(t, 1000); if (ret < 0) { dev_err_ratelimited(dev, "unable to read current time\n"); return ret; --- a/drivers/rtc/rtc-mc146818-lib.c +++ b/drivers/rtc/rtc-mc146818-lib.c @@ -93,7 +93,7 @@ EXPORT_SYMBOL_GPL(mc146818_avoid_UIP); */ bool mc146818_does_rtc_work(void) { - return mc146818_avoid_UIP(NULL, 10, NULL); + return mc146818_avoid_UIP(NULL, 1000, NULL); } EXPORT_SYMBOL_GPL(mc146818_does_rtc_work);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Airlie airlied@redhat.com
commit cacea81390fd8c8c85404e5eb2adeb83d87a912e upstream.
nvif_vmm_put gets called if addr is set, but if the allocation fails we don't need to call put, otherwise we get a warning like
[523232.435671] ------------[ cut here ]------------ [523232.435674] WARNING: CPU: 8 PID: 1505697 at drivers/gpu/drm/nouveau/nvif/vmm.c:68 nvif_vmm_put+0x72/0x80 [nouveau] [523232.435795] Modules linked in: uinput rfcomm snd_seq_dummy snd_hrtimer nf_conntrack_netbios_ns nf_conntrack_broadcast nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables nfnetlink qrtr bnep sunrpc binfmt_misc intel_rapl_msr intel_rapl_common intel_uncore_frequency intel_uncore_frequency_common isst_if_common iwlmvm nfit libnvdimm vfat fat x86_pkg_temp_thermal intel_powerclamp mac80211 snd_soc_avs snd_soc_hda_codec coretemp snd_hda_ext_core snd_soc_core snd_hda_codec_realtek kvm_intel snd_hda_codec_hdmi snd_compress snd_hda_codec_generic ac97_bus snd_pcm_dmaengine snd_hda_intel libarc4 snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec kvm iwlwifi snd_hda_core btusb snd_hwdep btrtl snd_seq btintel irqbypass btbcm rapl snd_seq_device eeepc_wmi btmtk intel_cstate iTCO_wdt cfg80211 snd_pcm asus_wmi bluetooth intel_pmc_bxt iTCO_vendor_support snd_timer ledtrig_audio pktcdvd snd mei_me [523232.435828] sparse_keymap intel_uncore i2c_i801 platform_profile wmi_bmof mei pcspkr ioatdma soundcore i2c_smbus rfkill idma64 dca joydev acpi_tad loop zram nouveau drm_ttm_helper ttm video drm_exec drm_gpuvm gpu_sched crct10dif_pclmul i2c_algo_bit nvme crc32_pclmul crc32c_intel drm_display_helper polyval_clmulni nvme_core polyval_generic e1000e mxm_wmi cec ghash_clmulni_intel r8169 sha512_ssse3 nvme_common wmi pinctrl_sunrisepoint uas usb_storage ip6_tables ip_tables fuse [523232.435849] CPU: 8 PID: 1505697 Comm: gnome-shell Tainted: G W 6.6.0-rc7-nvk-uapi+ #12 [523232.435851] Hardware name: System manufacturer System Product Name/ROG STRIX X299-E GAMING II, BIOS 1301 09/24/2021 [523232.435852] RIP: 0010:nvif_vmm_put+0x72/0x80 [nouveau] [523232.435934] Code: 00 00 48 89 e2 be 02 00 00 00 48 c7 04 24 00 00 00 00 48 89 44 24 08 e8 fc bf ff ff 85 c0 75 0a 48 c7 43 08 00 00 00 00 eb b3 <0f> 0b eb f2 e8 f5 c9 b2 e6 0f 1f 44 00 00 90 90 90 90 90 90 90 90 [523232.435936] RSP: 0018:ffffc900077ffbd8 EFLAGS: 00010282 [523232.435937] RAX: 00000000fffffffe RBX: ffffc900077ffc00 RCX: 0000000000000010 [523232.435938] RDX: 0000000000000010 RSI: ffffc900077ffb38 RDI: ffffc900077ffbd8 [523232.435940] RBP: ffff888e1c4f2140 R08: 0000000000000000 R09: 0000000000000000 [523232.435940] R10: 0000000000000000 R11: 0000000000000000 R12: ffff888503811800 [523232.435941] R13: ffffc900077ffca0 R14: ffff888e1c4f2140 R15: ffff88810317e1e0 [523232.435942] FS: 00007f933a769640(0000) GS:ffff88905fa00000(0000) knlGS:0000000000000000 [523232.435943] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [523232.435944] CR2: 00007f930bef7000 CR3: 00000005d0322001 CR4: 00000000003706e0 [523232.435945] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [523232.435946] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [523232.435964] Call Trace: [523232.435965] <TASK> [523232.435966] ? nvif_vmm_put+0x72/0x80 [nouveau] [523232.436051] ? __warn+0x81/0x130 [523232.436055] ? nvif_vmm_put+0x72/0x80 [nouveau] [523232.436138] ? report_bug+0x171/0x1a0 [523232.436142] ? handle_bug+0x3c/0x80 [523232.436144] ? exc_invalid_op+0x17/0x70 [523232.436145] ? asm_exc_invalid_op+0x1a/0x20 [523232.436149] ? nvif_vmm_put+0x72/0x80 [nouveau] [523232.436230] ? nvif_vmm_put+0x64/0x80 [nouveau] [523232.436342] nouveau_vma_del+0x80/0xd0 [nouveau] [523232.436506] nouveau_vma_new+0x1a0/0x210 [nouveau] [523232.436671] nouveau_gem_object_open+0x1d0/0x1f0 [nouveau] [523232.436835] drm_gem_handle_create_tail+0xd1/0x180 [523232.436840] drm_prime_fd_to_handle_ioctl+0x12e/0x200 [523232.436844] ? __pfx_drm_prime_fd_to_handle_ioctl+0x10/0x10 [523232.436847] drm_ioctl_kernel+0xd3/0x180 [523232.436849] drm_ioctl+0x26d/0x4b0 [523232.436851] ? __pfx_drm_prime_fd_to_handle_ioctl+0x10/0x10 [523232.436855] nouveau_drm_ioctl+0x5a/0xb0 [nouveau] [523232.437032] __x64_sys_ioctl+0x94/0xd0 [523232.437036] do_syscall_64+0x5d/0x90 [523232.437040] ? syscall_exit_to_user_mode+0x2b/0x40 [523232.437044] ? do_syscall_64+0x6c/0x90 [523232.437046] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Reported-by: Faith Ekstrand faith.ekstrand@collabora.com Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie airlied@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20240117213852.295565-1-airlie... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/nouveau/nouveau_vmm.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/nouveau/nouveau_vmm.c +++ b/drivers/gpu/drm/nouveau/nouveau_vmm.c @@ -108,6 +108,9 @@ nouveau_vma_new(struct nouveau_bo *nvbo, } else { ret = nvif_vmm_get(&vmm->vmm, PTES, false, mem->mem.page, 0, mem->mem.size, &tmp); + if (ret) + goto done; + vma->addr = tmp.addr; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Wupeng mawupeng1@huawei.com
commit 7ea6ec4c25294e8bc8788148ef854df92ee8dc5e upstream.
If the system has no mirrored memory or uses crashkernel.high while kernelcore=mirror is enabled on the command line then during crashkernel, there will be limited mirrored memory and this usually leads to OOM.
To solve this problem, disable the mirror feature during crashkernel.
Link: https://lkml.kernel.org/r/20240109041536.3903042-1-mawupeng1@huawei.com Signed-off-by: Ma Wupeng mawupeng1@huawei.com Acked-by: Mike Rapoport (IBM) rppt@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mm_init.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/mm/mm_init.c +++ b/mm/mm_init.c @@ -26,6 +26,7 @@ #include <linux/pgtable.h> #include <linux/swap.h> #include <linux/cma.h> +#include <linux/crash_dump.h> #include "internal.h" #include "slab.h" #include "shuffle.h" @@ -381,6 +382,11 @@ static void __init find_zone_movable_pfn goto out; }
+ if (is_kdump_kernel()) { + pr_warn("The system is under kdump, ignore kernelcore=mirror.\n"); + goto out; + } + for_each_mem_region(r) { if (memblock_is_mirror(r)) continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhihao Cheng chengzhihao1@huawei.com
commit 1e022216dcd248326a5bb95609d12a6815bca4e2 upstream.
For error handling path in ubifs_symlink(), inode will be marked as bad first, then iput() is invoked. If inode->i_link is initialized by fscrypt_encrypt_symlink() in encryption scenario, inode->i_link won't be freed by callchain ubifs_free_inode -> fscrypt_free_inode in error handling path, because make_bad_inode() has changed 'inode->i_mode' as 'S_IFREG'. Following kmemleak is easy to be reproduced by injecting error in ubifs_jnl_update() when doing symlink in encryption scenario: unreferenced object 0xffff888103da3d98 (size 8): comm "ln", pid 1692, jiffies 4294914701 (age 12.045s) backtrace: kmemdup+0x32/0x70 __fscrypt_encrypt_symlink+0xed/0x1c0 ubifs_symlink+0x210/0x300 [ubifs] vfs_symlink+0x216/0x360 do_symlinkat+0x11a/0x190 do_syscall_64+0x3b/0xe0 There are two ways fixing it: 1. Remove make_bad_inode() in error handling path. We can do that because ubifs_evict_inode() will do same processes for good symlink inode and bad symlink inode, for inode->i_nlink checking is before is_bad_inode(). 2. Free inode->i_link before marking inode bad. Method 2 is picked, it has less influence, personally, I think.
Cc: stable@vger.kernel.org Fixes: 2c58d548f570 ("fscrypt: cache decrypted symlink target in ->i_link") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Suggested-by: Eric Biggers ebiggers@kernel.org Reviewed-by: Eric Biggers ebiggers@google.com Signed-off-by: Richard Weinberger richard@nod.at Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ubifs/dir.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/ubifs/dir.c +++ b/fs/ubifs/dir.c @@ -1225,6 +1225,8 @@ out_cancel: dir_ui->ui_size = dir->i_size; mutex_unlock(&dir_ui->ui_mutex); out_inode: + /* Free inode->i_link before inode is marked as bad. */ + fscrypt_free_inode(inode); make_bad_inode(inode); iput(inode); out_fname:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: James Gowans jgowans@amazon.com
commit 7bb943806ff61e83ae4cceef8906b7fe52453e8a upstream.
syscore_shutdown() runs driver and module callbacks to get the system into a state where it can be correctly shut down. In commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") syscore_shutdown() was removed from kernel_restart_prepare() and hence got (incorrectly?) removed from the kexec flow. This was innocuous until commit 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown") changed the way that KVM registered its shutdown callbacks, switching from reboot notifiers to syscore_ops.shutdown. As syscore_shutdown() is missing from kexec, KVM's shutdown hook is not run and virtualisation is left enabled on the boot CPU which results in triple faults when switching to the new kernel on Intel x86 VT-x with VMXE enabled.
Fix this by adding syscore_shutdown() to the kexec sequence. In terms of where to add it, it is being added after migrating the kexec task to the boot CPU, but before APs are shut down. It is not totally clear if this is the best place: in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") it is stated that "syscore_ops operations should be carried with one CPU on-line and interrupts disabled." APs are only offlined later in machine_shutdown(), so this syscore_shutdown() is being run while APs are still online. This seems to be the correct place as it matches where syscore_shutdown() is run in the reboot and halt flows - they also run it before APs are shut down. The assumption is that the commit message in commit 6f389a8f1dd2 ("PM / reboot: call syscore_shutdown() after disable_nonboot_cpus()") is no longer valid.
KVM has been discussed here as it is what broke loudly by not having syscore_shutdown() in kexec, but this change impacts more than just KVM; all drivers/modules which register a syscore_ops.shutdown callback will now be invoked in the kexec flow. Looking at some of them like x86 MCE it is probably more correct to also shut these down during kexec. Maintainers of all drivers which use syscore_ops.shutdown are added on CC for visibility. They are:
arch/powerpc/platforms/cell/spu_base.c .shutdown = spu_shutdown, arch/x86/kernel/cpu/mce/core.c .shutdown = mce_syscore_shutdown, arch/x86/kernel/i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-i8259.c .shutdown = i8259A_shutdown, drivers/irqchip/irq-sun6i-r.c .shutdown = sun6i_r_intc_shutdown, drivers/leds/trigger/ledtrig-cpu.c .shutdown = ledtrig_cpu_syscore_shutdown, drivers/power/reset/sc27xx-poweroff.c .shutdown = sc27xx_poweroff_shutdown, kernel/irq/generic-chip.c .shutdown = irq_gc_shutdown, virt/kvm/kvm_main.c .shutdown = kvm_shutdown,
This has been tested by doing a kexec on x86_64 and aarch64.
Link: https://lkml.kernel.org/r/20231213064004.2419447-1-jgowans@amazon.com Fixes: 6735150b6997 ("KVM: Use syscore_ops instead of reboot_notifier to hook restart/shutdown") Signed-off-by: James Gowans jgowans@amazon.com Cc: Baoquan He bhe@redhat.com Cc: Eric Biederman ebiederm@xmission.com Cc: Paolo Bonzini pbonzini@redhat.com Cc: Sean Christopherson seanjc@google.com Cc: Marc Zyngier maz@kernel.org Cc: Arnd Bergmann arnd@arndb.de Cc: Tony Luck tony.luck@intel.com Cc: Borislav Petkov bp@alien8.de Cc: Thomas Gleixner tglx@linutronix.de Cc: Ingo Molnar mingo@redhat.com Cc: Chen-Yu Tsai wens@csie.org Cc: Jernej Skrabec jernej.skrabec@gmail.com Cc: Samuel Holland samuel@sholland.org Cc: Pavel Machek pavel@ucw.cz Cc: Sebastian Reichel sre@kernel.org Cc: Orson Zhai orsonzhai@gmail.com Cc: Alexander Graf graf@amazon.de Cc: Jan H. Schoenherr jschoenh@amazon.de Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kexec_core.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -1271,6 +1271,7 @@ int kernel_kexec(void) kexec_in_progress = true; kernel_restart_prepare("kexec reboot"); migrate_to_reboot_cpu(); + syscore_shutdown();
/* * migrate_to_reboot_cpu() disables CPU hotplug assuming that
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Donet Tom donettom@linux.vnet.ibm.com
commit 00bcfcd47a52f50f07a2e88d730d7931384cb073 upstream.
The kernel sefltest mm/hugepage-vmemmap fails on architectures which has different page size other than 4K. In hugepage-vmemmap page size used is 4k so the pfn calculation will go wrong on systems which has different page size .The length of MAP_HUGETLB memory must be hugepage aligned but in hugepage-vmemmap map length is 2M so this will not get aligned if the system has differnet hugepage size.
Added psize() to get the page size and default_huge_page_size() to get the default hugepage size at run time, hugepage-vmemmap test pass on powerpc with 64K page size and x86 with 4K page size.
Result on powerpc without patch (page size 64K) *# ./hugepage-vmemmap Returned address is 0x7effff000000 whose pfn is 0 Head page flags (100000000) is invalid check_page_flags: Invalid argument *#
Result on powerpc with patch (page size 64K) *# ./hugepage-vmemmap Returned address is 0x7effff000000 whose pfn is 600 *#
Result on x86 with patch (page size 4K) *# ./hugepage-vmemmap Returned address is 0x7fc7c2c00000 whose pfn is 1dac00 *#
Link: https://lkml.kernel.org/r/3b3a3ae37ba21218481c482a872bbf7526031600.170486575... Fixes: b147c89cd429 ("selftests: vm: add a hugetlb test case") Signed-off-by: Donet Tom donettom@linux.vnet.ibm.com Reported-by: Geetika Moolchandani geetika@linux.ibm.com Tested-by: Geetika Moolchandani geetika@linux.ibm.com Acked-by: Muchun Song muchun.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/mm/hugepage-vmemmap.c | 29 ++++++++++++++++---------- 1 file changed, 18 insertions(+), 11 deletions(-)
--- a/tools/testing/selftests/mm/hugepage-vmemmap.c +++ b/tools/testing/selftests/mm/hugepage-vmemmap.c @@ -10,10 +10,7 @@ #include <unistd.h> #include <sys/mman.h> #include <fcntl.h> - -#define MAP_LENGTH (2UL * 1024 * 1024) - -#define PAGE_SIZE 4096 +#include "vm_util.h"
#define PAGE_COMPOUND_HEAD (1UL << 15) #define PAGE_COMPOUND_TAIL (1UL << 16) @@ -39,6 +36,9 @@ #define MAP_FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB) #endif
+static size_t pagesize; +static size_t maplength; + static void write_bytes(char *addr, size_t length) { unsigned long i; @@ -56,7 +56,7 @@ static unsigned long virt_to_pfn(void *a if (fd < 0) return -1UL;
- lseek(fd, (unsigned long)addr / PAGE_SIZE * sizeof(pagemap), SEEK_SET); + lseek(fd, (unsigned long)addr / pagesize * sizeof(pagemap), SEEK_SET); read(fd, &pagemap, sizeof(pagemap)); close(fd);
@@ -86,7 +86,7 @@ static int check_page_flags(unsigned lon * this also verifies kernel has correctly set the fake page_head to tail * while hugetlb_free_vmemmap is enabled. */ - for (i = 1; i < MAP_LENGTH / PAGE_SIZE; i++) { + for (i = 1; i < maplength / pagesize; i++) { read(fd, &pageflags, sizeof(pageflags)); if ((pageflags & TAIL_PAGE_FLAGS) != TAIL_PAGE_FLAGS || (pageflags & HEAD_PAGE_FLAGS) == HEAD_PAGE_FLAGS) { @@ -106,18 +106,25 @@ int main(int argc, char **argv) void *addr; unsigned long pfn;
- addr = mmap(MAP_ADDR, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0); + pagesize = psize(); + maplength = default_huge_page_size(); + if (!maplength) { + printf("Unable to determine huge page size\n"); + exit(1); + } + + addr = mmap(MAP_ADDR, maplength, PROT_READ | PROT_WRITE, MAP_FLAGS, -1, 0); if (addr == MAP_FAILED) { perror("mmap"); exit(1); }
/* Trigger allocation of HugeTLB page. */ - write_bytes(addr, MAP_LENGTH); + write_bytes(addr, maplength);
pfn = virt_to_pfn(addr); if (pfn == -1UL) { - munmap(addr, MAP_LENGTH); + munmap(addr, maplength); perror("virt_to_pfn"); exit(1); } @@ -125,13 +132,13 @@ int main(int argc, char **argv) printf("Returned address is %p whose pfn is %lx\n", addr, pfn);
if (check_page_flags(pfn) < 0) { - munmap(addr, MAP_LENGTH); + munmap(addr, maplength); perror("check_page_flags"); exit(1); }
/* munmap() length of MAP_HUGETLB memory must be hugepage aligned */ - if (munmap(addr, MAP_LENGTH)) { + if (munmap(addr, maplength)) { perror("munmap"); exit(1); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Steven Rostedt (Google) rostedt@goodmis.org
commit f67f8d4a8c1e1ebc85a6cbdb9a7266f14863461c upstream.
Running my yearly branch profiler to see where likely/unlikely annotation may be added or removed, I discovered this:
correct incorrect % Function File Line ------- --------- - -------- ---- ---- 0 457918 100 page_try_dup_anon_rmap rmap.h 264 [..] 458021 0 0 page_try_dup_anon_rmap rmap.h 265
I thought it was interesting that line 264 of rmap.h had a 100% incorrect annotation, but the line directly below it was 100% correct. Looking at the code:
if (likely(!is_device_private_page(page) && unlikely(page_needs_cow_for_dma(vma, page))))
It didn't make sense. The "likely()" was around the entire if statement (not just the "!is_device_private_page(page)"), which also included the "unlikely()" portion of that if condition.
If the unlikely portion is unlikely to be true, that would make the entire if condition unlikely to be true, so it made no sense at all to say the entire if condition is true.
What is more likely to be likely is just the first part of the if statement before the && operation. It's likely to be a misplaced parenthesis. And after making the if condition broken into a likely() && unlikely(), both now appear to be correct!
Link: https://lkml.kernel.org/r/20231201145936.5ddfdb50@gandalf.local.home Fixes:fb3d824d1a46c ("mm/rmap: split page_dup_rmap() into page_dup_file_rmap() and page_try_dup_anon_rmap()") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Acked-by: Vlastimil Babka vbabka@suse.cz Cc: David Hildenbrand david@redhat.com Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/rmap.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -261,8 +261,8 @@ static inline int page_try_dup_anon_rmap * guarantee the pinned page won't be randomly replaced in the * future on write faults. */ - if (likely(!is_device_private_page(page) && - unlikely(page_needs_cow_for_dma(vma, page)))) + if (likely(!is_device_private_page(page)) && + unlikely(page_needs_cow_for_dma(vma, page))) return -EBUSY;
ClearPageAnonExclusive(page);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charan Teja Kalla quic_charante@quicinc.com
commit 5ec8e8ea8b7783fab150cf86404fc38cb4db8800 upstream.
The below race is observed on a PFN which falls into the device memory region with the system memory configuration where PFN's are such that [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL]. Since normal zone start and end pfn contains the device memory PFN's as well, the compaction triggered will try on the device memory PFN's too though they end up in NOP(because pfn_to_online_page() returns NULL for ZONE_DEVICE memory sections). When from other core, the section mappings are being removed for the ZONE_DEVICE region, that the PFN in question belongs to, on which compaction is currently being operated is resulting into the kernel crash with CONFIG_SPASEMEM_VMEMAP enabled. The crash logs can be seen at [1].
compact_zone() memunmap_pages ------------- --------------- __pageblock_pfn_to_page ...... (a)pfn_valid(): valid_section()//return true (b)__remove_pages()-> sparse_remove_section()-> section_deactivate(): [Free the array ms->usage and set ms->usage = NULL] pfn_section_valid() [Access ms->usage which is NULL]
NOTE: From the above it can be said that the race is reduced to between the pfn_valid()/pfn_section_valid() and the section deactivate with SPASEMEM_VMEMAP enabled.
The commit b943f045a9af("mm/sparse: fix kernel crash with pfn_section_valid check") tried to address the same problem by clearing the SECTION_HAS_MEM_MAP with the expectation of valid_section() returns false thus ms->usage is not accessed.
Fix this issue by the below steps:
a) Clear SECTION_HAS_MEM_MAP before freeing the ->usage.
b) RCU protected read side critical section will either return NULL when SECTION_HAS_MEM_MAP is cleared or can successfully access ->usage.
c) Free the ->usage with kfree_rcu() and set ms->usage = NULL. No attempt will be made to access ->usage after this as the SECTION_HAS_MEM_MAP is cleared thus valid_section() return false.
Thanks to David/Pavan for their inputs on this patch.
[1] https://lore.kernel.org/linux-mm/994410bb-89aa-d987-1f50-f514903c55aa@quicin...
On Snapdragon SoC, with the mentioned memory configuration of PFN's as [ZONE_NORMAL ZONE_DEVICE ZONE_NORMAL], we are able to see bunch of issues daily while testing on a device farm.
For this particular issue below is the log. Though the below log is not directly pointing to the pfn_section_valid(){ ms->usage;}, when we loaded this dump on T32 lauterbach tool, it is pointing.
[ 540.578056] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 [ 540.578068] Mem abort info: [ 540.578070] ESR = 0x0000000096000005 [ 540.578073] EC = 0x25: DABT (current EL), IL = 32 bits [ 540.578077] SET = 0, FnV = 0 [ 540.578080] EA = 0, S1PTW = 0 [ 540.578082] FSC = 0x05: level 1 translation fault [ 540.578085] Data abort info: [ 540.578086] ISV = 0, ISS = 0x00000005 [ 540.578088] CM = 0, WnR = 0 [ 540.579431] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBSBTYPE=--) [ 540.579436] pc : __pageblock_pfn_to_page+0x6c/0x14c [ 540.579454] lr : compact_zone+0x994/0x1058 [ 540.579460] sp : ffffffc03579b510 [ 540.579463] x29: ffffffc03579b510 x28: 0000000000235800 x27:000000000000000c [ 540.579470] x26: 0000000000235c00 x25: 0000000000000068 x24:ffffffc03579b640 [ 540.579477] x23: 0000000000000001 x22: ffffffc03579b660 x21:0000000000000000 [ 540.579483] x20: 0000000000235bff x19: ffffffdebf7e3940 x18:ffffffdebf66d140 [ 540.579489] x17: 00000000739ba063 x16: 00000000739ba063 x15:00000000009f4bff [ 540.579495] x14: 0000008000000000 x13: 0000000000000000 x12:0000000000000001 [ 540.579501] x11: 0000000000000000 x10: 0000000000000000 x9 :ffffff897d2cd440 [ 540.579507] x8 : 0000000000000000 x7 : 0000000000000000 x6 :ffffffc03579b5b4 [ 540.579512] x5 : 0000000000027f25 x4 : ffffffc03579b5b8 x3 :0000000000000001 [ 540.579518] x2 : ffffffdebf7e3940 x1 : 0000000000235c00 x0 :0000000000235800 [ 540.579524] Call trace: [ 540.579527] __pageblock_pfn_to_page+0x6c/0x14c [ 540.579533] compact_zone+0x994/0x1058 [ 540.579536] try_to_compact_pages+0x128/0x378 [ 540.579540] __alloc_pages_direct_compact+0x80/0x2b0 [ 540.579544] __alloc_pages_slowpath+0x5c0/0xe10 [ 540.579547] __alloc_pages+0x250/0x2d0 [ 540.579550] __iommu_dma_alloc_noncontiguous+0x13c/0x3fc [ 540.579561] iommu_dma_alloc+0xa0/0x320 [ 540.579565] dma_alloc_attrs+0xd4/0x108
[quic_charante@quicinc.com: use kfree_rcu() in place of synchronize_rcu(), per David] Link: https://lkml.kernel.org/r/1698403778-20938-1-git-send-email-quic_charante@qu... Link: https://lkml.kernel.org/r/1697202267-23600-1-git-send-email-quic_charante@qu... Fixes: f46edbd1b151 ("mm/sparsemem: add helpers track active portions of a section at boot") Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Dan Williams dan.j.williams@intel.com Cc: David Hildenbrand david@redhat.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Oscar Salvador osalvador@suse.de Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/mmzone.h | 14 +++++++++++--- mm/sparse.c | 17 +++++++++-------- 2 files changed, 20 insertions(+), 11 deletions(-)
--- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -1774,6 +1774,7 @@ static inline unsigned long section_nr_t #define SUBSECTION_ALIGN_DOWN(pfn) ((pfn) & PAGE_SUBSECTION_MASK)
struct mem_section_usage { + struct rcu_head rcu; #ifdef CONFIG_SPARSEMEM_VMEMMAP DECLARE_BITMAP(subsection_map, SUBSECTIONS_PER_SECTION); #endif @@ -1967,7 +1968,7 @@ static inline int pfn_section_valid(stru { int idx = subsection_map_index(pfn);
- return test_bit(idx, ms->usage->subsection_map); + return test_bit(idx, READ_ONCE(ms->usage)->subsection_map); } #else static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn) @@ -1991,6 +1992,7 @@ static inline int pfn_section_valid(stru static inline int pfn_valid(unsigned long pfn) { struct mem_section *ms; + int ret;
/* * Ensure the upper PAGE_SHIFT bits are clear in the @@ -2004,13 +2006,19 @@ static inline int pfn_valid(unsigned lon if (pfn_to_section_nr(pfn) >= NR_MEM_SECTIONS) return 0; ms = __pfn_to_section(pfn); - if (!valid_section(ms)) + rcu_read_lock(); + if (!valid_section(ms)) { + rcu_read_unlock(); return 0; + } /* * Traditionally early sections always returned pfn_valid() for * the entire section-sized span. */ - return early_section(ms) || pfn_section_valid(ms, pfn); + ret = early_section(ms) || pfn_section_valid(ms, pfn); + rcu_read_unlock(); + + return ret; } #endif
--- a/mm/sparse.c +++ b/mm/sparse.c @@ -792,6 +792,13 @@ static void section_deactivate(unsigned unsigned long section_nr = pfn_to_section_nr(pfn);
/* + * Mark the section invalid so that valid_section() + * return false. This prevents code from dereferencing + * ms->usage array. + */ + ms->section_mem_map &= ~SECTION_HAS_MEM_MAP; + + /* * When removing an early section, the usage map is kept (as the * usage maps of other sections fall into the same page). It * will be re-used when re-adding the section - which is then no @@ -799,16 +806,10 @@ static void section_deactivate(unsigned * was allocated during boot. */ if (!PageReserved(virt_to_page(ms->usage))) { - kfree(ms->usage); - ms->usage = NULL; + kfree_rcu(ms->usage, rcu); + WRITE_ONCE(ms->usage, NULL); } memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); - /* - * Mark the section invalid so that valid_section() - * return false. This prevents code from dereferencing - * ms->usage array. - */ - ms->section_mem_map &= ~SECTION_HAS_MEM_MAP; }
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Al Viro viro@zeniv.linux.org.uk
commit 22e111ed6c83dcde3037fc81176012721bc34c0b upstream.
We should never lock two subdirectories without having taken ->s_vfs_rename_mutex; inode pointer order or not, the "order" proposed in 28eceeda130f "fs: Lock moved directories" is not transitive, with the usual consequences.
The rationale for locking renamed subdirectory in all cases was the possibility of race between rename modifying .. in a subdirectory to reflect the new parent and another thread modifying the same subdirectory. For a lot of filesystems that's not a problem, but for some it can lead to trouble (e.g. the case when short directory contents is kept in the inode, but creating a file in it might push it across the size limit and copy its contents into separate data block(s)).
However, we need that only in case when the parent does change - otherwise ->rename() doesn't need to do anything with .. entry in the first place. Some instances are lazy and do a tautological update anyway, but it's really not hard to avoid.
Amended locking rules for rename(): find the parent(s) of source and target if source and target have the same parent lock the common parent else lock ->s_vfs_rename_mutex lock both parents, in ancestor-first order; if neither is an ancestor of another, lock the parent of source first. find the source and target. if source and target have the same parent if operation is an overwriting rename of a subdirectory lock the target subdirectory else if source is a subdirectory lock the source if target is a subdirectory lock the target lock non-directories involved, in inode pointer order if both source and target are such.
That way we are guaranteed that parents are locked (for obvious reasons), that any renamed non-directory is locked (nfsd relies upon that), that any victim is locked (emptiness check needs that, among other things) and subdirectory that changes parent is locked (needed to protect the update of .. entries). We are also guaranteed that any operation locking more than one directory either takes ->s_vfs_rename_mutex or locks a parent followed by its child.
Cc: stable@vger.kernel.org Fixes: 28eceeda130f "fs: Lock moved directories" Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/filesystems/directory-locking.rst | 29 ++++++----- Documentation/filesystems/locking.rst | 5 +- Documentation/filesystems/porting.rst | 18 +++++++ fs/namei.c | 60 ++++++++++++++---------- 4 files changed, 74 insertions(+), 38 deletions(-)
--- a/Documentation/filesystems/directory-locking.rst +++ b/Documentation/filesystems/directory-locking.rst @@ -22,13 +22,16 @@ exclusive. 3) object removal. Locking rules: caller locks parent, finds victim, locks victim and calls the method. Locks are exclusive.
-4) rename() that is _not_ cross-directory. Locking rules: caller locks the -parent and finds source and target. We lock both (provided they exist). If we -need to lock two inodes of different type (dir vs non-dir), we lock directory -first. If we need to lock two inodes of the same type, lock them in inode -pointer order. Then call the method. All locks are exclusive. -NB: we might get away with locking the source (and target in exchange -case) shared. +4) rename() that is _not_ cross-directory. Locking rules: caller locks +the parent and finds source and target. Then we decide which of the +source and target need to be locked. Source needs to be locked if it's a +non-directory; target - if it's a non-directory or about to be removed. +Take the locks that need to be taken, in inode pointer order if need +to take both (that can happen only when both source and target are +non-directories - the source because it wouldn't be locked otherwise +and the target because mixing directory and non-directory is allowed +only with RENAME_EXCHANGE, and that won't be removing the target). +After the locks had been taken, call the method. All locks are exclusive.
5) link creation. Locking rules:
@@ -44,20 +47,17 @@ rules:
* lock the filesystem * lock parents in "ancestors first" order. If one is not ancestor of - the other, lock them in inode pointer order. + the other, lock the parent of source first. * find source and target. * if old parent is equal to or is a descendent of target fail with -ENOTEMPTY * if new parent is equal to or is a descendent of source fail with -ELOOP - * Lock both the source and the target provided they exist. If we - need to lock two inodes of different type (dir vs non-dir), we lock - the directory first. If we need to lock two inodes of the same type, - lock them in inode pointer order. + * Lock subdirectories involved (source before target). + * Lock non-directories involved, in inode pointer order. * call the method.
-All ->i_rwsem are taken exclusive. Again, we might get away with locking -the source (and target in exchange case) shared. +All ->i_rwsem are taken exclusive.
The rules above obviously guarantee that all directories that are going to be read, modified or removed by method will be locked by caller. @@ -67,6 +67,7 @@ If no directory is its own ancestor, the
Proof:
+[XXX: will be updated once we are done massaging the lock_rename()] First of all, at any moment we have a linear ordering of the objects - A < B iff (A is an ancestor of B) or (B is not an ancestor of A and ptr(A) < ptr(B)). --- a/Documentation/filesystems/locking.rst +++ b/Documentation/filesystems/locking.rst @@ -101,7 +101,7 @@ symlink: exclusive mkdir: exclusive unlink: exclusive (both) rmdir: exclusive (both)(see below) -rename: exclusive (all) (see below) +rename: exclusive (both parents, some children) (see below) readlink: no get_link: no setattr: exclusive @@ -123,6 +123,9 @@ get_offset_ctx no Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem exclusive on victim. cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem. + ->unlink() and ->rename() have ->i_rwsem exclusive on all non-directories + involved. + ->rename() has ->i_rwsem exclusive on any subdirectory that changes parent.
See Documentation/filesystems/directory-locking.rst for more detailed discussion of the locking scheme for directory operations. --- a/Documentation/filesystems/porting.rst +++ b/Documentation/filesystems/porting.rst @@ -1045,3 +1045,21 @@ filesystem type is now moved to a later As this is a VFS level change it has no practical consequences for filesystems other than that all of them must use one of the provided kill_litter_super(), kill_anon_super(), or kill_block_super() helpers. + +--- + +**mandatory** + +If ->rename() update of .. on cross-directory move needs an exclusion with +directory modifications, do *not* lock the subdirectory in question in your +->rename() - it's done by the caller now [that item should've been added in +28eceeda130f "fs: Lock moved directories"]. + +--- + +**mandatory** + +On same-directory ->rename() the (tautological) update of .. is not protected +by any locks; just don't do it if the old parent is the same as the new one. +We really can't lock two subdirectories in same-directory rename - not without +deadlocks. --- a/fs/namei.c +++ b/fs/namei.c @@ -3021,20 +3021,14 @@ static struct dentry *lock_two_directori p = d_ancestor(p2, p1); if (p) { inode_lock_nested(p2->d_inode, I_MUTEX_PARENT); - inode_lock_nested(p1->d_inode, I_MUTEX_CHILD); + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT2); return p; }
p = d_ancestor(p1, p2); - if (p) { - inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); - inode_lock_nested(p2->d_inode, I_MUTEX_CHILD); - return p; - } - - lock_two_inodes(p1->d_inode, p2->d_inode, - I_MUTEX_PARENT, I_MUTEX_PARENT2); - return NULL; + inode_lock_nested(p1->d_inode, I_MUTEX_PARENT); + inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2); + return p; }
/* @@ -4733,11 +4727,12 @@ SYSCALL_DEFINE2(link, const char __user * * a) we can get into loop creation. * b) race potential - two innocent renames can create a loop together. - * That's where 4.4 screws up. Current fix: serialization on + * That's where 4.4BSD screws up. Current fix: serialization on * sb->s_vfs_rename_mutex. We might be more accurate, but that's another * story. - * c) we have to lock _four_ objects - parents and victim (if it exists), - * and source. + * c) we may have to lock up to _four_ objects - parents and victim (if it exists), + * and source (if it's a non-directory or a subdirectory that moves to + * different parent). * And that - after we got ->i_mutex on parents (until then we don't know * whether the target exists). Solution: try to be smart with locking * order for inodes. We rely on the fact that tree topology may change @@ -4769,6 +4764,7 @@ int vfs_rename(struct renamedata *rd) bool new_is_dir = false; unsigned max_links = new_dir->i_sb->s_max_links; struct name_snapshot old_name; + bool lock_old_subdir, lock_new_subdir;
if (source == target) return 0; @@ -4822,15 +4818,32 @@ int vfs_rename(struct renamedata *rd) take_dentry_name_snapshot(&old_name, old_dentry); dget(new_dentry); /* - * Lock all moved children. Moved directories may need to change parent - * pointer so they need the lock to prevent against concurrent - * directory changes moving parent pointer. For regular files we've - * historically always done this. The lockdep locking subclasses are - * somewhat arbitrary but RENAME_EXCHANGE in particular can swap - * regular files and directories so it's difficult to tell which - * subclasses to use. + * Lock children. + * The source subdirectory needs to be locked on cross-directory + * rename or cross-directory exchange since its parent changes. + * The target subdirectory needs to be locked on cross-directory + * exchange due to parent change and on any rename due to becoming + * a victim. + * Non-directories need locking in all cases (for NFS reasons); + * they get locked after any subdirectories (in inode address order). + * + * NOTE: WE ONLY LOCK UNRELATED DIRECTORIES IN CROSS-DIRECTORY CASE. + * NEVER, EVER DO THAT WITHOUT ->s_vfs_rename_mutex. */ - lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2); + lock_old_subdir = new_dir != old_dir; + lock_new_subdir = new_dir != old_dir || !(flags & RENAME_EXCHANGE); + if (is_dir) { + if (lock_old_subdir) + inode_lock_nested(source, I_MUTEX_CHILD); + if (target && (!new_is_dir || lock_new_subdir)) + inode_lock(target); + } else if (new_is_dir) { + if (lock_new_subdir) + inode_lock_nested(target, I_MUTEX_CHILD); + inode_lock(source); + } else { + lock_two_nondirectories(source, target); + }
error = -EPERM; if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target))) @@ -4878,8 +4891,9 @@ int vfs_rename(struct renamedata *rd) d_exchange(old_dentry, new_dentry); } out: - inode_unlock(source); - if (target) + if (!is_dir || lock_old_subdir) + inode_unlock(source); + if (target && (!new_is_dir || lock_new_subdir)) inode_unlock(target); dput(new_dentry); if (!error) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 3837a0379533aabb9e4483677077479f7c6aa910 upstream.
With this current driver regmap implementation, it is hard to make sense of the register addresses displayed using the regmap debugfs interface, because they do not correspond to the actual register addresses documented in the datasheet. For example, register 1 is displayed as registers 04 thru 07:
$ cat /sys/kernel/debug/regmap/spi0.0/registers 04: 10 -> Port 0, register offset 1 05: 10 -> Port 1, register offset 1 06: 00 -> Port 2, register offset 1 -> invalid 07: 00 -> port 3, register offset 1 -> invalid ...
The reason is that bits 0 and 1 of the register address correspond to the channel (port) bits, so the register address itself starts at bit 2, and we must 'mentally' shift each register address by 2 bits to get its real address/offset.
Also, only channels 0 and 1 are supported by the chip, so channel mask combinations of 10b and 11b are invalid, and the display of these registers is useless.
This patch adds a separate regmap configuration for each port, similar to what is done in the max310x driver, so that register addresses displayed match the register addresses in the chip datasheet. Also, each port now has its own debugfs entry.
Example with new regmap implementation:
$ cat /sys/kernel/debug/regmap/spi0.0-port0/registers 1: 10 2: 01 3: 00 ...
$ cat /sys/kernel/debug/regmap/spi0.0-port1/registers 1: 10 2: 01 3: 00
As an added bonus, this also simplifies some operations (read/write/modify) because it is no longer necessary to manually shift register addresses.
Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231030211447.974779-1-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 143 +++++++++++++++++++++++------------------ 1 file changed, 81 insertions(+), 62 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -301,8 +301,8 @@
/* Misc definitions */ +#define SC16IS7XX_SPI_READ_BIT BIT(7) #define SC16IS7XX_FIFO_SIZE (64) -#define SC16IS7XX_REG_SHIFT 2 #define SC16IS7XX_GPIOS_PER_BANK 4
struct sc16is7xx_devtype { @@ -324,6 +324,7 @@ struct sc16is7xx_one_config { struct sc16is7xx_one { struct uart_port port; u8 line; + struct regmap *regmap; struct kthread_work tx_work; struct kthread_work reg_work; struct kthread_delayed_work ms_work; @@ -362,48 +363,37 @@ static void sc16is7xx_stop_tx(struct uar #define to_sc16is7xx_port(p,e) ((container_of((p), struct sc16is7xx_port, e))) #define to_sc16is7xx_one(p,e) ((container_of((p), struct sc16is7xx_one, e)))
-static int sc16is7xx_line(struct uart_port *port) -{ - struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); - - return one->line; -} - static u8 sc16is7xx_port_read(struct uart_port *port, u8 reg) { - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); unsigned int val = 0; - const u8 line = sc16is7xx_line(port);
- regmap_read(s->regmap, (reg << SC16IS7XX_REG_SHIFT) | line, &val); + regmap_read(one->regmap, reg, &val);
return val; }
static void sc16is7xx_port_write(struct uart_port *port, u8 reg, u8 val) { - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); - const u8 line = sc16is7xx_line(port); + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
- regmap_write(s->regmap, (reg << SC16IS7XX_REG_SHIFT) | line, val); + regmap_write(one->regmap, reg, val); }
static void sc16is7xx_fifo_read(struct uart_port *port, unsigned int rxlen) { struct sc16is7xx_port *s = dev_get_drvdata(port->dev); - const u8 line = sc16is7xx_line(port); - u8 addr = (SC16IS7XX_RHR_REG << SC16IS7XX_REG_SHIFT) | line; + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
- regcache_cache_bypass(s->regmap, true); - regmap_raw_read(s->regmap, addr, s->buf, rxlen); - regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(one->regmap, true); + regmap_raw_read(one->regmap, SC16IS7XX_RHR_REG, s->buf, rxlen); + regcache_cache_bypass(one->regmap, false); }
static void sc16is7xx_fifo_write(struct uart_port *port, u8 to_send) { struct sc16is7xx_port *s = dev_get_drvdata(port->dev); - const u8 line = sc16is7xx_line(port); - u8 addr = (SC16IS7XX_THR_REG << SC16IS7XX_REG_SHIFT) | line; + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
/* * Don't send zero-length data, at least on SPI it confuses the chip @@ -412,19 +402,17 @@ static void sc16is7xx_fifo_write(struct if (unlikely(!to_send)) return;
- regcache_cache_bypass(s->regmap, true); - regmap_raw_write(s->regmap, addr, s->buf, to_send); - regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(one->regmap, true); + regmap_raw_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); + regcache_cache_bypass(one->regmap, false); }
static void sc16is7xx_port_update(struct uart_port *port, u8 reg, u8 mask, u8 val) { - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); - const u8 line = sc16is7xx_line(port); + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
- regmap_update_bits(s->regmap, (reg << SC16IS7XX_REG_SHIFT) | line, - mask, val); + regmap_update_bits(one->regmap, reg, mask, val); }
static int sc16is7xx_alloc_line(void) @@ -479,7 +467,7 @@ static const struct sc16is7xx_devtype sc
static bool sc16is7xx_regmap_volatile(struct device *dev, unsigned int reg) { - switch (reg >> SC16IS7XX_REG_SHIFT) { + switch (reg) { case SC16IS7XX_RHR_REG: case SC16IS7XX_IIR_REG: case SC16IS7XX_LSR_REG: @@ -498,7 +486,7 @@ static bool sc16is7xx_regmap_volatile(st
static bool sc16is7xx_regmap_precious(struct device *dev, unsigned int reg) { - switch (reg >> SC16IS7XX_REG_SHIFT) { + switch (reg) { case SC16IS7XX_RHR_REG: return true; default: @@ -511,6 +499,7 @@ static bool sc16is7xx_regmap_precious(st static int sc16is7xx_set_baud(struct uart_port *port, int baud) { struct sc16is7xx_port *s = dev_get_drvdata(port->dev); + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); u8 lcr; u8 prescaler = 0; unsigned long clk = port->uartclk, div = clk / 16 / baud; @@ -542,12 +531,12 @@ static int sc16is7xx_set_baud(struct uar SC16IS7XX_LCR_CONF_MODE_B);
/* Enable enhanced features */ - regcache_cache_bypass(s->regmap, true); + regcache_cache_bypass(one->regmap, true); sc16is7xx_port_update(port, SC16IS7XX_EFR_REG, SC16IS7XX_EFR_ENABLE_BIT, SC16IS7XX_EFR_ENABLE_BIT);
- regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(one->regmap, false);
/* Put LCR back to the normal mode */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr); @@ -563,10 +552,10 @@ static int sc16is7xx_set_baud(struct uar SC16IS7XX_LCR_CONF_MODE_A);
/* Write the new divisor */ - regcache_cache_bypass(s->regmap, true); + regcache_cache_bypass(one->regmap, true); sc16is7xx_port_write(port, SC16IS7XX_DLH_REG, div / 256); sc16is7xx_port_write(port, SC16IS7XX_DLL_REG, div % 256); - regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(one->regmap, false);
/* Put LCR back to the normal mode */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr); @@ -1092,7 +1081,7 @@ static void sc16is7xx_set_termios(struct SC16IS7XX_LCR_CONF_MODE_B);
/* Configure flow control */ - regcache_cache_bypass(s->regmap, true); + regcache_cache_bypass(one->regmap, true); sc16is7xx_port_write(port, SC16IS7XX_XON1_REG, termios->c_cc[VSTART]); sc16is7xx_port_write(port, SC16IS7XX_XOFF1_REG, termios->c_cc[VSTOP]);
@@ -1111,7 +1100,7 @@ static void sc16is7xx_set_termios(struct SC16IS7XX_EFR_REG, SC16IS7XX_EFR_FLOWCTRL_BITS, flow); - regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(one->regmap, false);
/* Update LCR register */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr); @@ -1162,7 +1151,6 @@ static int sc16is7xx_config_rs485(struct static int sc16is7xx_startup(struct uart_port *port) { struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); unsigned int val; unsigned long flags;
@@ -1179,7 +1167,7 @@ static int sc16is7xx_startup(struct uart sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_CONF_MODE_B);
- regcache_cache_bypass(s->regmap, true); + regcache_cache_bypass(one->regmap, true);
/* Enable write access to enhanced features and internal clock div */ sc16is7xx_port_update(port, SC16IS7XX_EFR_REG, @@ -1197,7 +1185,7 @@ static int sc16is7xx_startup(struct uart SC16IS7XX_TCR_RX_RESUME(24) | SC16IS7XX_TCR_RX_HALT(48));
- regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(one->regmap, false);
/* Now, initialize the UART */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_WORD_LEN_8); @@ -1455,7 +1443,7 @@ static int sc16is7xx_setup_mctrl_ports(s if (s->mctrl_mask) regmap_update_bits( s->regmap, - SC16IS7XX_IOCONTROL_REG << SC16IS7XX_REG_SHIFT, + SC16IS7XX_IOCONTROL_REG, SC16IS7XX_IOCONTROL_MODEM_A_BIT | SC16IS7XX_IOCONTROL_MODEM_B_BIT, s->mctrl_mask);
@@ -1470,7 +1458,7 @@ static const struct serial_rs485 sc16is7
static int sc16is7xx_probe(struct device *dev, const struct sc16is7xx_devtype *devtype, - struct regmap *regmap, int irq) + struct regmap *regmaps[], int irq) { unsigned long freq = 0, *pfreq = dev_get_platdata(dev); unsigned int val; @@ -1478,16 +1466,16 @@ static int sc16is7xx_probe(struct device int i, ret; struct sc16is7xx_port *s;
- if (IS_ERR(regmap)) - return PTR_ERR(regmap); + for (i = 0; i < devtype->nr_uart; i++) + if (IS_ERR(regmaps[i])) + return PTR_ERR(regmaps[i]);
/* * This device does not have an identification register that would * tell us if we are really connected to the correct device. * The best we can do is to check if communication is at all possible. */ - ret = regmap_read(regmap, - SC16IS7XX_LSR_REG << SC16IS7XX_REG_SHIFT, &val); + ret = regmap_read(regmaps[0], SC16IS7XX_LSR_REG, &val); if (ret < 0) return -EPROBE_DEFER;
@@ -1521,7 +1509,7 @@ static int sc16is7xx_probe(struct device return -EINVAL; }
- s->regmap = regmap; + s->regmap = regmaps[0]; s->devtype = devtype; dev_set_drvdata(dev, s); mutex_init(&s->efr_lock); @@ -1536,8 +1524,8 @@ static int sc16is7xx_probe(struct device sched_set_fifo(s->kworker_task);
/* reset device, purging any pending irq / data */ - regmap_write(s->regmap, SC16IS7XX_IOCONTROL_REG << SC16IS7XX_REG_SHIFT, - SC16IS7XX_IOCONTROL_SRESET_BIT); + regmap_write(s->regmap, SC16IS7XX_IOCONTROL_REG, + SC16IS7XX_IOCONTROL_SRESET_BIT);
for (i = 0; i < devtype->nr_uart; ++i) { s->p[i].line = i; @@ -1561,6 +1549,7 @@ static int sc16is7xx_probe(struct device s->p[i].port.ops = &sc16is7xx_ops; s->p[i].old_mctrl = 0; s->p[i].port.line = sc16is7xx_alloc_line(); + s->p[i].regmap = regmaps[i];
if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) { ret = -ENOMEM; @@ -1589,13 +1578,13 @@ static int sc16is7xx_probe(struct device sc16is7xx_port_write(&s->p[i].port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_CONF_MODE_B);
- regcache_cache_bypass(s->regmap, true); + regcache_cache_bypass(regmaps[i], true);
/* Enable write access to enhanced features */ sc16is7xx_port_write(&s->p[i].port, SC16IS7XX_EFR_REG, SC16IS7XX_EFR_ENABLE_BIT);
- regcache_cache_bypass(s->regmap, false); + regcache_cache_bypass(regmaps[i], false);
/* Restore access to general registers */ sc16is7xx_port_write(&s->p[i].port, SC16IS7XX_LCR_REG, 0x00); @@ -1698,19 +1687,36 @@ static const struct of_device_id __maybe MODULE_DEVICE_TABLE(of, sc16is7xx_dt_ids);
static struct regmap_config regcfg = { - .reg_bits = 7, - .pad_bits = 1, + .reg_bits = 5, + .pad_bits = 3, .val_bits = 8, .cache_type = REGCACHE_RBTREE, .volatile_reg = sc16is7xx_regmap_volatile, .precious_reg = sc16is7xx_regmap_precious, + .max_register = SC16IS7XX_EFCR_REG, };
+static const char *sc16is7xx_regmap_name(unsigned int port_id) +{ + static char buf[6]; + + snprintf(buf, sizeof(buf), "port%d", port_id); + + return buf; +} + +static unsigned int sc16is7xx_regmap_port_mask(unsigned int port_id) +{ + /* CH1,CH0 are at bits 2:1. */ + return port_id << 1; +} + #ifdef CONFIG_SERIAL_SC16IS7XX_SPI static int sc16is7xx_spi_probe(struct spi_device *spi) { const struct sc16is7xx_devtype *devtype; - struct regmap *regmap; + struct regmap *regmaps[2]; + unsigned int i; int ret;
/* Setup SPI bus */ @@ -1735,11 +1741,20 @@ static int sc16is7xx_spi_probe(struct sp devtype = (struct sc16is7xx_devtype *)id_entry->driver_data; }
- regcfg.max_register = (0xf << SC16IS7XX_REG_SHIFT) | - (devtype->nr_uart - 1); - regmap = devm_regmap_init_spi(spi, ®cfg); + for (i = 0; i < devtype->nr_uart; i++) { + regcfg.name = sc16is7xx_regmap_name(i); + /* + * If read_flag_mask is 0, the regmap code sets it to a default + * of 0x80. Since we specify our own mask, we must add the READ + * bit ourselves: + */ + regcfg.read_flag_mask = sc16is7xx_regmap_port_mask(i) | + SC16IS7XX_SPI_READ_BIT; + regcfg.write_flag_mask = sc16is7xx_regmap_port_mask(i); + regmaps[i] = devm_regmap_init_spi(spi, ®cfg); + }
- return sc16is7xx_probe(&spi->dev, devtype, regmap, spi->irq); + return sc16is7xx_probe(&spi->dev, devtype, regmaps, spi->irq); }
static void sc16is7xx_spi_remove(struct spi_device *spi) @@ -1778,7 +1793,8 @@ static int sc16is7xx_i2c_probe(struct i2 { const struct i2c_device_id *id = i2c_client_get_device_id(i2c); const struct sc16is7xx_devtype *devtype; - struct regmap *regmap; + struct regmap *regmaps[2]; + unsigned int i;
if (i2c->dev.of_node) { devtype = device_get_match_data(&i2c->dev); @@ -1788,11 +1804,14 @@ static int sc16is7xx_i2c_probe(struct i2 devtype = (struct sc16is7xx_devtype *)id->driver_data; }
- regcfg.max_register = (0xf << SC16IS7XX_REG_SHIFT) | - (devtype->nr_uart - 1); - regmap = devm_regmap_init_i2c(i2c, ®cfg); + for (i = 0; i < devtype->nr_uart; i++) { + regcfg.name = sc16is7xx_regmap_name(i); + regcfg.read_flag_mask = sc16is7xx_regmap_port_mask(i); + regcfg.write_flag_mask = sc16is7xx_regmap_port_mask(i); + regmaps[i] = devm_regmap_init_i2c(i2c, ®cfg); + }
- return sc16is7xx_probe(&i2c->dev, devtype, regmap, i2c->irq); + return sc16is7xx_probe(&i2c->dev, devtype, regmaps, i2c->irq); }
static void sc16is7xx_i2c_remove(struct i2c_client *client)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 6bcab3c8acc88e265c570dea969fd04f137c8a4c upstream.
Using a static buffer inside sc16is7xx_regmap_name() was a convenient and simple way to set the regmap name without having to allocate and free a buffer each time it is called. The drawback is that the static buffer wastes memory for nothing once regmap is fully initialized.
Remove static buffer and use constant strings instead.
This also avoids a truncation warning when using "%d" or "%u" in snprintf which was flagged by kernel test robot.
Fixes: 3837a0379533 ("serial: sc16is7xx: improve regmap debugfs by using one regmap per port") Cc: stable@vger.kernel.org # 6.1.x: 3837a03 serial: sc16is7xx: improve regmap debugfs by using one regmap per port Suggested-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231211171353.2901416-2-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -1696,13 +1696,15 @@ static struct regmap_config regcfg = { .max_register = SC16IS7XX_EFCR_REG, };
-static const char *sc16is7xx_regmap_name(unsigned int port_id) +static const char *sc16is7xx_regmap_name(u8 port_id) { - static char buf[6]; - - snprintf(buf, sizeof(buf), "port%d", port_id); - - return buf; + switch (port_id) { + case 0: return "port0"; + case 1: return "port1"; + default: + WARN_ON(true); + return NULL; + } }
static unsigned int sc16is7xx_regmap_port_mask(unsigned int port_id)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit f6959c5217bd799bcb770b95d3c09b3244e175c6 upstream.
Remove global struct regmap so that it is more obvious that this regmap is to be used only in the probe function.
Also add a comment to that effect in probe function.
Fixes: 3837a0379533 ("serial: sc16is7xx: improve regmap debugfs by using one regmap per port") Cc: stable@vger.kernel.org Suggested-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231211171353.2901416-3-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 15 +++++++++------ 1 file changed, 9 insertions(+), 6 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -335,7 +335,6 @@ struct sc16is7xx_one {
struct sc16is7xx_port { const struct sc16is7xx_devtype *devtype; - struct regmap *regmap; struct clk *clk; #ifdef CONFIG_GPIOLIB struct gpio_chip gpio; @@ -1413,7 +1412,8 @@ static int sc16is7xx_setup_gpio_chip(str /* * Configure ports designated to operate as modem control lines. */ -static int sc16is7xx_setup_mctrl_ports(struct sc16is7xx_port *s) +static int sc16is7xx_setup_mctrl_ports(struct sc16is7xx_port *s, + struct regmap *regmap) { int i; int ret; @@ -1442,7 +1442,7 @@ static int sc16is7xx_setup_mctrl_ports(s
if (s->mctrl_mask) regmap_update_bits( - s->regmap, + regmap, SC16IS7XX_IOCONTROL_REG, SC16IS7XX_IOCONTROL_MODEM_A_BIT | SC16IS7XX_IOCONTROL_MODEM_B_BIT, s->mctrl_mask); @@ -1474,6 +1474,10 @@ static int sc16is7xx_probe(struct device * This device does not have an identification register that would * tell us if we are really connected to the correct device. * The best we can do is to check if communication is at all possible. + * + * Note: regmap[0] is used in the probe function to access registers + * common to all channels/ports, as it is guaranteed to be present on + * all variants. */ ret = regmap_read(regmaps[0], SC16IS7XX_LSR_REG, &val); if (ret < 0) @@ -1509,7 +1513,6 @@ static int sc16is7xx_probe(struct device return -EINVAL; }
- s->regmap = regmaps[0]; s->devtype = devtype; dev_set_drvdata(dev, s); mutex_init(&s->efr_lock); @@ -1524,7 +1527,7 @@ static int sc16is7xx_probe(struct device sched_set_fifo(s->kworker_task);
/* reset device, purging any pending irq / data */ - regmap_write(s->regmap, SC16IS7XX_IOCONTROL_REG, + regmap_write(regmaps[0], SC16IS7XX_IOCONTROL_REG, SC16IS7XX_IOCONTROL_SRESET_BIT);
for (i = 0; i < devtype->nr_uart; ++i) { @@ -1604,7 +1607,7 @@ static int sc16is7xx_probe(struct device s->p[u].irda_mode = true; }
- ret = sc16is7xx_setup_mctrl_ports(s); + ret = sc16is7xx_setup_mctrl_ports(s, regmaps[0]); if (ret) goto out_ports;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 41a308cbedb2a68a6831f0f2e992e296c4b8aff0 upstream.
Now that the driver has been converted to use one regmap per port, the line structure member is no longer used, so remove it.
Fixes: 3837a0379533 ("serial: sc16is7xx: improve regmap debugfs by using one regmap per port") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231211171353.2901416-4-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -323,7 +323,6 @@ struct sc16is7xx_one_config {
struct sc16is7xx_one { struct uart_port port; - u8 line; struct regmap *regmap; struct kthread_work tx_work; struct kthread_work reg_work; @@ -1531,7 +1530,6 @@ static int sc16is7xx_probe(struct device SC16IS7XX_IOCONTROL_SRESET_BIT);
for (i = 0; i < devtype->nr_uart; ++i) { - s->p[i].line = i; /* Initialize port data */ s->p[i].port.dev = dev; s->p[i].port.irq = irq;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 4409df5866b7ff7686ba27e449ca97a92ee063c9 upstream.
Now that the driver has been converted to use one regmap per port, change efr locking to operate on a channel basis instead of on the whole IC.
Fixes: 3837a0379533 ("serial: sc16is7xx: improve regmap debugfs by using one regmap per port") Cc: stable@vger.kernel.org # 6.1.x: 3837a03 serial: sc16is7xx: improve regmap debugfs by using one regmap per port Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231211171353.2901416-5-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 49 +++++++++++++++++++++-------------------- 1 file changed, 26 insertions(+), 23 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -324,6 +324,7 @@ struct sc16is7xx_one_config { struct sc16is7xx_one { struct uart_port port; struct regmap *regmap; + struct mutex efr_lock; /* EFR registers access */ struct kthread_work tx_work; struct kthread_work reg_work; struct kthread_delayed_work ms_work; @@ -343,7 +344,6 @@ struct sc16is7xx_port { unsigned char buf[SC16IS7XX_FIFO_SIZE]; struct kthread_worker kworker; struct task_struct *kworker_task; - struct mutex efr_lock; struct sc16is7xx_one p[]; };
@@ -496,7 +496,6 @@ static bool sc16is7xx_regmap_precious(st
static int sc16is7xx_set_baud(struct uart_port *port, int baud) { - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); u8 lcr; u8 prescaler = 0; @@ -520,7 +519,7 @@ static int sc16is7xx_set_baud(struct uar * because the bulk of the interrupt processing is run as a workqueue * job in thread context. */ - mutex_lock(&s->efr_lock); + mutex_lock(&one->efr_lock);
lcr = sc16is7xx_port_read(port, SC16IS7XX_LCR_REG);
@@ -539,7 +538,7 @@ static int sc16is7xx_set_baud(struct uar /* Put LCR back to the normal mode */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);
- mutex_unlock(&s->efr_lock); + mutex_unlock(&one->efr_lock);
sc16is7xx_port_update(port, SC16IS7XX_MCR_REG, SC16IS7XX_MCR_CLKSEL_BIT, @@ -707,11 +706,10 @@ static unsigned int sc16is7xx_get_hwmctr static void sc16is7xx_update_mlines(struct sc16is7xx_one *one) { struct uart_port *port = &one->port; - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); unsigned long flags; unsigned int status, changed;
- lockdep_assert_held_once(&s->efr_lock); + lockdep_assert_held_once(&one->efr_lock);
status = sc16is7xx_get_hwmctrl(port); changed = status ^ one->old_mctrl; @@ -737,15 +735,20 @@ static void sc16is7xx_update_mlines(stru
static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno) { + bool rc = true; struct uart_port *port = &s->p[portno].port; + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); + + mutex_lock(&one->efr_lock);
do { unsigned int iir, rxlen; - struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG); - if (iir & SC16IS7XX_IIR_NO_INT_BIT) - return false; + if (iir & SC16IS7XX_IIR_NO_INT_BIT) { + rc = false; + goto out_port_irq; + }
iir &= SC16IS7XX_IIR_ID_MASK;
@@ -785,15 +788,17 @@ static bool sc16is7xx_port_irq(struct sc break; } } while (0); - return true; + +out_port_irq: + mutex_unlock(&one->efr_lock); + + return rc; }
static irqreturn_t sc16is7xx_irq(int irq, void *dev_id) { struct sc16is7xx_port *s = (struct sc16is7xx_port *)dev_id;
- mutex_lock(&s->efr_lock); - while (1) { bool keep_polling = false; int i; @@ -804,24 +809,22 @@ static irqreturn_t sc16is7xx_irq(int irq break; }
- mutex_unlock(&s->efr_lock); - return IRQ_HANDLED; }
static void sc16is7xx_tx_proc(struct kthread_work *ws) { struct uart_port *port = &(to_sc16is7xx_one(ws, tx_work)->port); - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); + struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); unsigned long flags;
if ((port->rs485.flags & SER_RS485_ENABLED) && (port->rs485.delay_rts_before_send > 0)) msleep(port->rs485.delay_rts_before_send);
- mutex_lock(&s->efr_lock); + mutex_lock(&one->efr_lock); sc16is7xx_handle_tx(port); - mutex_unlock(&s->efr_lock); + mutex_unlock(&one->efr_lock);
spin_lock_irqsave(&port->lock, flags); sc16is7xx_ier_set(port, SC16IS7XX_IER_THRI_BIT); @@ -928,9 +931,9 @@ static void sc16is7xx_ms_proc(struct kth struct sc16is7xx_port *s = dev_get_drvdata(one->port.dev);
if (one->port.state) { - mutex_lock(&s->efr_lock); + mutex_lock(&one->efr_lock); sc16is7xx_update_mlines(one); - mutex_unlock(&s->efr_lock); + mutex_unlock(&one->efr_lock);
kthread_queue_delayed_work(&s->kworker, &one->ms_work, HZ); } @@ -1014,7 +1017,6 @@ static void sc16is7xx_set_termios(struct struct ktermios *termios, const struct ktermios *old) { - struct sc16is7xx_port *s = dev_get_drvdata(port->dev); struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); unsigned int lcr, flow = 0; int baud; @@ -1073,7 +1075,7 @@ static void sc16is7xx_set_termios(struct port->ignore_status_mask |= SC16IS7XX_LSR_BRK_ERROR_MASK;
/* As above, claim the mutex while accessing the EFR. */ - mutex_lock(&s->efr_lock); + mutex_lock(&one->efr_lock);
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, SC16IS7XX_LCR_CONF_MODE_B); @@ -1103,7 +1105,7 @@ static void sc16is7xx_set_termios(struct /* Update LCR register */ sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);
- mutex_unlock(&s->efr_lock); + mutex_unlock(&one->efr_lock);
/* Get baud rate generator configuration */ baud = uart_get_baud_rate(port, termios, old, @@ -1514,7 +1516,6 @@ static int sc16is7xx_probe(struct device
s->devtype = devtype; dev_set_drvdata(dev, s); - mutex_init(&s->efr_lock);
kthread_init_worker(&s->kworker); s->kworker_task = kthread_run(kthread_worker_fn, &s->kworker, @@ -1557,6 +1558,8 @@ static int sc16is7xx_probe(struct device goto out_ports; }
+ mutex_init(&s->p[i].efr_lock); + ret = uart_get_rs485_mode(&s->p[i].port); if (ret) goto out_ports;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit dbf4ab821804df071c8b566d9813083125e6d97b upstream.
The SC16IS7XX IC supports a burst mode to access the FIFOs where the initial register address is sent ($00), followed by all the FIFO data without having to resend the register address each time. In this mode, the IC doesn't increment the register address for each R/W byte.
The regmap_raw_read() and regmap_raw_write() are functions which can perform IO over multiple registers. They are currently used to read/write from/to the FIFO, and although they operate correctly in this burst mode on the SPI bus, they would corrupt the regmap cache if it was not disabled manually. The reason is that when the R/W size is more than 1 byte, these functions assume that the register address is incremented and handle the cache accordingly.
Convert FIFO R/W functions to use the regmap _noinc_ versions in order to remove the manual cache control which was a workaround when using the _raw_ versions. FIFO registers are properly declared as volatile so cache will not be used/updated for FIFO accesses.
Fixes: dfeae619d781 ("serial: sc16is7xx") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231211171353.2901416-6-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -383,9 +383,7 @@ static void sc16is7xx_fifo_read(struct u struct sc16is7xx_port *s = dev_get_drvdata(port->dev); struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
- regcache_cache_bypass(one->regmap, true); - regmap_raw_read(one->regmap, SC16IS7XX_RHR_REG, s->buf, rxlen); - regcache_cache_bypass(one->regmap, false); + regmap_noinc_read(one->regmap, SC16IS7XX_RHR_REG, s->buf, rxlen); }
static void sc16is7xx_fifo_write(struct uart_port *port, u8 to_send) @@ -400,9 +398,7 @@ static void sc16is7xx_fifo_write(struct if (unlikely(!to_send)) return;
- regcache_cache_bypass(one->regmap, true); - regmap_raw_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); - regcache_cache_bypass(one->regmap, false); + regmap_noinc_write(one->regmap, SC16IS7XX_THR_REG, s->buf, to_send); }
static void sc16is7xx_port_update(struct uart_port *port, u8 reg, @@ -494,6 +490,11 @@ static bool sc16is7xx_regmap_precious(st return false; }
+static bool sc16is7xx_regmap_noinc(struct device *dev, unsigned int reg) +{ + return reg == SC16IS7XX_RHR_REG; +} + static int sc16is7xx_set_baud(struct uart_port *port, int baud) { struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); @@ -1697,6 +1698,10 @@ static struct regmap_config regcfg = { .cache_type = REGCACHE_RBTREE, .volatile_reg = sc16is7xx_regmap_volatile, .precious_reg = sc16is7xx_regmap_precious, + .writeable_noinc_reg = sc16is7xx_regmap_noinc, + .readable_noinc_reg = sc16is7xx_regmap_noinc, + .max_raw_read = SC16IS7XX_FIFO_SIZE, + .max_raw_write = SC16IS7XX_FIFO_SIZE, .max_register = SC16IS7XX_EFCR_REG, };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit 8a1060ce974919f2a79807527ad82ac39336eda2 upstream.
If an error occurs during probing, the sc16is7xx_lines bitfield may be left in a state that doesn't represent the correct state of lines allocation.
For example, in a system with two SC16 devices, if an error occurs only during probing of channel (port) B of the second device, sc16is7xx_lines final state will be 00001011b instead of the expected 00000011b.
This is caused in part because of the "i--" in the for/loop located in the out_ports: error path.
Fix this by checking the return value of uart_add_one_port() and set line allocation bit only if this was successful. This allows the refactor of the obfuscated for(i--...) loop in the error path, and properly call uart_remove_one_port() only when needed, and properly unset line allocation bits.
Also use same mechanism in remove() when calling uart_remove_one_port().
Fixes: c64349722d14 ("sc16is7xx: support multiple devices") Cc: stable@vger.kernel.org Cc: Yury Norov yury.norov@gmail.com Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231221231823.2327894-2-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 44 ++++++++++++++++------------------------- 1 file changed, 18 insertions(+), 26 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -409,19 +409,6 @@ static void sc16is7xx_port_update(struct regmap_update_bits(one->regmap, reg, mask, val); }
-static int sc16is7xx_alloc_line(void) -{ - int i; - - BUILD_BUG_ON(SC16IS7XX_MAX_DEVS > BITS_PER_LONG); - - for (i = 0; i < SC16IS7XX_MAX_DEVS; i++) - if (!test_and_set_bit(i, &sc16is7xx_lines)) - break; - - return i; -} - static void sc16is7xx_power(struct uart_port *port, int on) { sc16is7xx_port_update(port, SC16IS7XX_IER_REG, @@ -1532,6 +1519,13 @@ static int sc16is7xx_probe(struct device SC16IS7XX_IOCONTROL_SRESET_BIT);
for (i = 0; i < devtype->nr_uart; ++i) { + s->p[i].port.line = find_first_zero_bit(&sc16is7xx_lines, + SC16IS7XX_MAX_DEVS); + if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) { + ret = -ERANGE; + goto out_ports; + } + /* Initialize port data */ s->p[i].port.dev = dev; s->p[i].port.irq = irq; @@ -1551,14 +1545,8 @@ static int sc16is7xx_probe(struct device s->p[i].port.rs485_supported = sc16is7xx_rs485_supported; s->p[i].port.ops = &sc16is7xx_ops; s->p[i].old_mctrl = 0; - s->p[i].port.line = sc16is7xx_alloc_line(); s->p[i].regmap = regmaps[i];
- if (s->p[i].port.line >= SC16IS7XX_MAX_DEVS) { - ret = -ENOMEM; - goto out_ports; - } - mutex_init(&s->p[i].efr_lock);
ret = uart_get_rs485_mode(&s->p[i].port); @@ -1576,8 +1564,13 @@ static int sc16is7xx_probe(struct device kthread_init_work(&s->p[i].tx_work, sc16is7xx_tx_proc); kthread_init_work(&s->p[i].reg_work, sc16is7xx_reg_proc); kthread_init_delayed_work(&s->p[i].ms_work, sc16is7xx_ms_proc); + /* Register port */ - uart_add_one_port(&sc16is7xx_uart, &s->p[i].port); + ret = uart_add_one_port(&sc16is7xx_uart, &s->p[i].port); + if (ret) + goto out_ports; + + set_bit(s->p[i].port.line, &sc16is7xx_lines);
/* Enable EFR */ sc16is7xx_port_write(&s->p[i].port, SC16IS7XX_LCR_REG, @@ -1644,10 +1637,9 @@ static int sc16is7xx_probe(struct device #endif
out_ports: - for (i--; i >= 0; i--) { - uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port); - clear_bit(s->p[i].port.line, &sc16is7xx_lines); - } + for (i = 0; i < devtype->nr_uart; i++) + if (test_and_clear_bit(s->p[i].port.line, &sc16is7xx_lines)) + uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port);
kthread_stop(s->kworker_task);
@@ -1669,8 +1661,8 @@ static void sc16is7xx_remove(struct devi
for (i = 0; i < s->devtype->nr_uart; i++) { kthread_cancel_delayed_work_sync(&s->p[i].ms_work); - uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port); - clear_bit(s->p[i].port.line, &sc16is7xx_lines); + if (test_and_clear_bit(s->p[i].port.line, &sc16is7xx_lines)) + uart_remove_one_port(&sc16is7xx_uart, &s->p[i].port); sc16is7xx_power(&s->p[i].port, 0); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit ed647256e8f226241ecff7baaecdb8632ffc7ec1 upstream.
Commit 834449872105 ("sc16is7xx: Fix for multi-channel stall") changed sc16is7xx_port_irq() from looping multiple times when there was still interrupts to serve. It simply changed the do {} while(1) loop to a do {} while(0) loop, which makes the loop itself now obsolete.
Clean the code by removing this obsolete do {} while(0) loop.
Fixes: 834449872105 ("sc16is7xx: Fix for multi-channel stall") Cc: stable@vger.kernel.org Suggested-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231221231823.2327894-5-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 89 +++++++++++++++++++---------------------- 1 file changed, 43 insertions(+), 46 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -724,58 +724,55 @@ static void sc16is7xx_update_mlines(stru static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno) { bool rc = true; + unsigned int iir, rxlen; struct uart_port *port = &s->p[portno].port; struct sc16is7xx_one *one = to_sc16is7xx_one(port, port);
mutex_lock(&one->efr_lock);
- do { - unsigned int iir, rxlen; - - iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG); - if (iir & SC16IS7XX_IIR_NO_INT_BIT) { - rc = false; - goto out_port_irq; - } - - iir &= SC16IS7XX_IIR_ID_MASK; - - switch (iir) { - case SC16IS7XX_IIR_RDI_SRC: - case SC16IS7XX_IIR_RLSE_SRC: - case SC16IS7XX_IIR_RTOI_SRC: - case SC16IS7XX_IIR_XOFFI_SRC: - rxlen = sc16is7xx_port_read(port, SC16IS7XX_RXLVL_REG); - - /* - * There is a silicon bug that makes the chip report a - * time-out interrupt but no data in the FIFO. This is - * described in errata section 18.1.4. - * - * When this happens, read one byte from the FIFO to - * clear the interrupt. - */ - if (iir == SC16IS7XX_IIR_RTOI_SRC && !rxlen) - rxlen = 1; - - if (rxlen) - sc16is7xx_handle_rx(port, rxlen, iir); - break; + iir = sc16is7xx_port_read(port, SC16IS7XX_IIR_REG); + if (iir & SC16IS7XX_IIR_NO_INT_BIT) { + rc = false; + goto out_port_irq; + } + + iir &= SC16IS7XX_IIR_ID_MASK; + + switch (iir) { + case SC16IS7XX_IIR_RDI_SRC: + case SC16IS7XX_IIR_RLSE_SRC: + case SC16IS7XX_IIR_RTOI_SRC: + case SC16IS7XX_IIR_XOFFI_SRC: + rxlen = sc16is7xx_port_read(port, SC16IS7XX_RXLVL_REG); + + /* + * There is a silicon bug that makes the chip report a + * time-out interrupt but no data in the FIFO. This is + * described in errata section 18.1.4. + * + * When this happens, read one byte from the FIFO to + * clear the interrupt. + */ + if (iir == SC16IS7XX_IIR_RTOI_SRC && !rxlen) + rxlen = 1; + + if (rxlen) + sc16is7xx_handle_rx(port, rxlen, iir); + break; /* CTSRTS interrupt comes only when CTS goes inactive */ - case SC16IS7XX_IIR_CTSRTS_SRC: - case SC16IS7XX_IIR_MSI_SRC: - sc16is7xx_update_mlines(one); - break; - case SC16IS7XX_IIR_THRI_SRC: - sc16is7xx_handle_tx(port); - break; - default: - dev_err_ratelimited(port->dev, - "ttySC%i: Unexpected interrupt: %x", - port->line, iir); - break; - } - } while (0); + case SC16IS7XX_IIR_CTSRTS_SRC: + case SC16IS7XX_IIR_MSI_SRC: + sc16is7xx_update_mlines(one); + break; + case SC16IS7XX_IIR_THRI_SRC: + sc16is7xx_handle_tx(port); + break; + default: + dev_err_ratelimited(port->dev, + "ttySC%i: Unexpected interrupt: %x", + port->line, iir); + break; + }
out_port_irq: mutex_unlock(&one->efr_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
commit d5078509c8b06c5c472a60232815e41af81c6446 upstream.
Simplify and improve readability by replacing while(1) loop with do {} while, and by using the keep_polling variable as the exit condition, making it more explicit.
Fixes: 834449872105 ("sc16is7xx: Fix for multi-channel stall") Cc: stable@vger.kernel.org Suggested-by: Andy Shevchenko andy.shevchenko@gmail.com Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231221231823.2327894-6-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/sc16is7xx.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-)
--- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -782,17 +782,18 @@ out_port_irq:
static irqreturn_t sc16is7xx_irq(int irq, void *dev_id) { + bool keep_polling; + struct sc16is7xx_port *s = (struct sc16is7xx_port *)dev_id;
- while (1) { - bool keep_polling = false; + do { int i;
+ keep_polling = false; + for (i = 0; i < s->devtype->nr_uart; ++i) keep_polling |= sc16is7xx_port_irq(s, i); - if (!keep_polling) - break; - } + } while (keep_polling);
return IRQ_HANDLED; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Huacai Chen chenhuacai@loongson.cn
commit a2ccf46333d7b2cf9658f0d82ac74097c1542fae upstream.
rcutree_report_cpu_starting() must be called before cpu_probe() to avoid the following lockdep splat that triggered by calling __alloc_pages() when CONFIG_PROVE_RCU_LIST=y:
============================= WARNING: suspicious RCU usage 6.6.0+ #980 Not tainted ----------------------------- kernel/locking/lockdep.c:3761 RCU-list traversed in non-reader section!! other info that might help us debug this: RCU used illegally from offline CPU! rcu_scheduler_active = 1, debug_locks = 1 1 lock held by swapper/1/0: #0: 900000000c82ef98 (&pcp->lock){+.+.}-{2:2}, at: get_page_from_freelist+0x894/0x1790 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 6.6.0+ #980 Stack : 0000000000000001 9000000004f79508 9000000004893670 9000000100310000 90000001003137d0 0000000000000000 90000001003137d8 9000000004f79508 0000000000000000 0000000000000001 0000000000000000 90000000048a3384 203a656d616e2065 ca43677b3687e616 90000001002c3480 0000000000000008 000000000000009d 0000000000000000 0000000000000001 80000000ffffe0b8 000000000000000d 0000000000000033 0000000007ec0000 13bbf50562dad831 9000000005140748 0000000000000000 9000000004f79508 0000000000000004 0000000000000000 9000000005140748 90000001002bad40 0000000000000000 90000001002ba400 0000000000000000 9000000003573ec8 0000000000000000 00000000000000b0 0000000000000004 0000000000000000 0000000000070000 ... Call Trace: [<9000000003573ec8>] show_stack+0x38/0x150 [<9000000004893670>] dump_stack_lvl+0x74/0xa8 [<900000000360d2bc>] lockdep_rcu_suspicious+0x14c/0x190 [<900000000361235c>] __lock_acquire+0xd0c/0x2740 [<90000000036146f4>] lock_acquire+0x104/0x2c0 [<90000000048a955c>] _raw_spin_lock_irqsave+0x5c/0x90 [<900000000381cd5c>] rmqueue_bulk+0x6c/0x950 [<900000000381fc0c>] get_page_from_freelist+0xd4c/0x1790 [<9000000003821c6c>] __alloc_pages+0x1bc/0x3e0 [<9000000003583b40>] tlb_init+0x150/0x2a0 [<90000000035742a0>] per_cpu_trap_init+0xf0/0x110 [<90000000035712fc>] cpu_probe+0x3dc/0x7a0 [<900000000357ed20>] start_secondary+0x40/0xb0 [<9000000004897138>] smpboot_entry+0x54/0x58
raw_smp_processor_id() is required in order to avoid calling into lockdep before RCU has declared the CPU to be watched for readers.
See also commit 29368e093921 ("x86/smpboot: Move rcu_cpu_starting() earlier"), commit de5d9dae150c ("s390/smp: move rcu_cpu_starting() earlier") and commit 99f070b62322 ("powerpc/smp: Call rcu_cpu_starting() earlier").
Signed-off-by: Huacai Chen chenhuacai@loongson.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/loongarch/kernel/smp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/loongarch/kernel/smp.c +++ b/arch/loongarch/kernel/smp.c @@ -504,8 +504,9 @@ asmlinkage void start_secondary(void) unsigned int cpu;
sync_counter(); - cpu = smp_processor_id(); + cpu = raw_smp_processor_id(); set_my_cpu_offset(per_cpu_offset(cpu)); + rcu_cpu_starting(cpu);
cpu_probe(); constant_clockevent_init();
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Charan Teja Kalla quic_charante@quicinc.com
commit ac3f3b0a55518056bc80ed32a41931c99e1f7d81 upstream.
__alloc_pages_direct_reclaim() is called from slowpath allocation where high atomic reserves can be unreserved after there is a progress in reclaim and yet no suitable page is found. Later should_reclaim_retry() gets called from slow path allocation to decide if the reclaim needs to be retried before OOM kill path is taken.
should_reclaim_retry() checks the available(reclaimable + free pages) memory against the min wmark levels of a zone and returns:
a) true, if it is above the min wmark so that slow path allocation will do the reclaim retries.
b) false, thus slowpath allocation takes oom kill path.
should_reclaim_retry() can also unreserves the high atomic reserves **but only after all the reclaim retries are exhausted.**
In a case where there are almost none reclaimable memory and free pages contains mostly the high atomic reserves but allocation context can't use these high atomic reserves, makes the available memory below min wmark levels hence false is returned from should_reclaim_retry() leading the allocation request to take OOM kill path. This can turn into a early oom kill if high atomic reserves are holding lot of free memory and unreserving of them is not attempted.
(early)OOM is encountered on a VM with the below state: [ 295.998653] Normal free:7728kB boost:0kB min:804kB low:1004kB high:1204kB reserved_highatomic:8192KB active_anon:4kB inactive_anon:0kB active_file:24kB inactive_file:24kB unevictable:1220kB writepending:0kB present:70732kB managed:49224kB mlocked:0kB bounce:0kB free_pcp:688kB local_pcp:492kB free_cma:0kB [ 295.998656] lowmem_reserve[]: 0 32 [ 295.998659] Normal: 508*4kB (UMEH) 241*8kB (UMEH) 143*16kB (UMEH) 33*32kB (UH) 7*64kB (UH) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 7752kB
Per above log, the free memory of ~7MB exist in the high atomic reserves is not freed up before falling back to oom kill path.
Fix it by trying to unreserve the high atomic reserves in should_reclaim_retry() before __alloc_pages_direct_reclaim() can fallback to oom kill path.
Link: https://lkml.kernel.org/r/1700823445-27531-1-git-send-email-quic_charante@qu... Fixes: 0aaa29a56e4f ("mm, page_alloc: reserve pageblocks for high-order atomic allocations on demand") Signed-off-by: Charan Teja Kalla quic_charante@quicinc.com Reported-by: Chris Goldsworthy quic_cgoldswo@quicinc.com Suggested-by: Michal Hocko mhocko@suse.com Acked-by: Michal Hocko mhocko@suse.com Acked-by: David Rientjes rientjes@google.com Cc: Chris Goldsworthy quic_cgoldswo@quicinc.com Cc: David Hildenbrand david@redhat.com Cc: Johannes Weiner hannes@cmpxchg.org Cc: Mel Gorman mgorman@techsingularity.net Cc: Pavankumar Kondeti quic_pkondeti@quicinc.com Cc: Vlastimil Babka vbabka@suse.cz Cc: Joakim Tjernlund Joakim.Tjernlund@infinera.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/page_alloc.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
--- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -3809,14 +3809,9 @@ should_reclaim_retry(gfp_t gfp_mask, uns else (*no_progress_loops)++;
- /* - * Make sure we converge to OOM if we cannot make any progress - * several times in the row. - */ - if (*no_progress_loops > MAX_RECLAIM_RETRIES) { - /* Before OOM, exhaust highatomic_reserve */ - return unreserve_highatomic_pageblock(ac, true); - } + if (*no_progress_loops > MAX_RECLAIM_RETRIES) + goto out; +
/* * Keep reclaiming pages while there is a chance this will lead @@ -3859,6 +3854,11 @@ should_reclaim_retry(gfp_t gfp_mask, uns schedule_timeout_uninterruptible(1); else cond_resched(); +out: + /* Before OOM, exhaust highatomic_reserve */ + if (!ret) + return unreserve_highatomic_pageblock(ac, true); + return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lino Sanfilippo l.sanfilippo@kunbus.com
commit 07c30ea5861fb26a77dade8cdc787252f6122fb1 upstream.
Both the imx and stm32 driver set the rx-during-tx GPIO in rs485_config(). Since this function is called with the port lock held, this can be a problem in case that setting the GPIO line can sleep (e.g. if a GPIO expander is used which is connected via SPI or I2C).
Avoid this issue by moving the GPIO setting outside of the port lock into the serial core and thus making it a generic feature.
Also with commit c54d48543689 ("serial: stm32: Add support for rs485 RX_DURING_TX output GPIO") the SER_RS485_RX_DURING_TX flag is only set if a rx-during-tx GPIO is _not_ available, which is wrong. Fix this, too.
Furthermore reset old GPIO settings in case that changing the RS485 configuration failed.
Fixes: c54d48543689 ("serial: stm32: Add support for rs485 RX_DURING_TX output GPIO") Fixes: ca530cfa968c ("serial: imx: Add support for RS485 RX_DURING_TX output GPIO") Cc: Shawn Guo shawnguo@kernel.org Cc: Sascha Hauer s.hauer@pengutronix.de Cc: stable@vger.kernel.org Signed-off-by: Lino Sanfilippo l.sanfilippo@kunbus.com Link: https://lore.kernel.org/r/20240103061818.564-2-l.sanfilippo@kunbus.com Signed-off-by: Lino Sanfilippo l.sanfilippo@kunbus.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/tty/serial/imx.c | 4 ---- drivers/tty/serial/serial_core.c | 26 ++++++++++++++++++++++++-- drivers/tty/serial/stm32-usart.c | 8 ++------ 3 files changed, 26 insertions(+), 12 deletions(-)
--- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -1947,10 +1947,6 @@ static int imx_uart_rs485_config(struct rs485conf->flags & SER_RS485_RX_DURING_TX) imx_uart_start_rx(port);
- if (port->rs485_rx_during_tx_gpio) - gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, - !!(rs485conf->flags & SER_RS485_RX_DURING_TX)); - return 0; }
--- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -1409,6 +1409,16 @@ static void uart_set_rs485_termination(s !!(rs485->flags & SER_RS485_TERMINATE_BUS)); }
+static void uart_set_rs485_rx_during_tx(struct uart_port *port, + const struct serial_rs485 *rs485) +{ + if (!(rs485->flags & SER_RS485_ENABLED)) + return; + + gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, + !!(rs485->flags & SER_RS485_RX_DURING_TX)); +} + static int uart_rs485_config(struct uart_port *port) { struct serial_rs485 *rs485 = &port->rs485; @@ -1420,12 +1430,17 @@ static int uart_rs485_config(struct uart
uart_sanitize_serial_rs485(port, rs485); uart_set_rs485_termination(port, rs485); + uart_set_rs485_rx_during_tx(port, rs485);
spin_lock_irqsave(&port->lock, flags); ret = port->rs485_config(port, NULL, rs485); spin_unlock_irqrestore(&port->lock, flags); - if (ret) + if (ret) { memset(rs485, 0, sizeof(*rs485)); + /* unset GPIOs */ + gpiod_set_value_cansleep(port->rs485_term_gpio, 0); + gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, 0); + }
return ret; } @@ -1464,6 +1479,7 @@ static int uart_set_rs485_config(struct return ret; uart_sanitize_serial_rs485(port, &rs485); uart_set_rs485_termination(port, &rs485); + uart_set_rs485_rx_during_tx(port, &rs485);
spin_lock_irqsave(&port->lock, flags); ret = port->rs485_config(port, &tty->termios, &rs485); @@ -1475,8 +1491,14 @@ static int uart_set_rs485_config(struct port->ops->set_mctrl(port, port->mctrl); } spin_unlock_irqrestore(&port->lock, flags); - if (ret) + if (ret) { + /* restore old GPIO settings */ + gpiod_set_value_cansleep(port->rs485_term_gpio, + !!(port->rs485.flags & SER_RS485_TERMINATE_BUS)); + gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, + !!(port->rs485.flags & SER_RS485_RX_DURING_TX)); return ret; + }
if (copy_to_user(rs485_user, &port->rs485, sizeof(port->rs485))) return -EFAULT; --- a/drivers/tty/serial/stm32-usart.c +++ b/drivers/tty/serial/stm32-usart.c @@ -226,12 +226,6 @@ static int stm32_usart_config_rs485(stru
stm32_usart_clr_bits(port, ofs->cr1, BIT(cfg->uart_enable_bit));
- if (port->rs485_rx_during_tx_gpio) - gpiod_set_value_cansleep(port->rs485_rx_during_tx_gpio, - !!(rs485conf->flags & SER_RS485_RX_DURING_TX)); - else - rs485conf->flags |= SER_RS485_RX_DURING_TX; - if (rs485conf->flags & SER_RS485_ENABLED) { cr1 = readl_relaxed(port->membase + ofs->cr1); cr3 = readl_relaxed(port->membase + ofs->cr3); @@ -256,6 +250,8 @@ static int stm32_usart_config_rs485(stru
writel_relaxed(cr3, port->membase + ofs->cr3); writel_relaxed(cr1, port->membase + ofs->cr1); + + rs485conf->flags |= SER_RS485_RX_DURING_TX; } else { stm32_usart_clr_bits(port, ofs->cr3, USART_CR3_DEM | USART_CR3_DEP);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit bb05367a66a9990d2c561282f5620bb1dbe40c28 ]
If file opened with v2 lease is upgraded with v1 lease, smb server should response v2 lease create context to client. This patch fix smb2.lease.v2_epoch2 test failure.
This test case assumes the following scenario: 1. smb2 create with v2 lease(R, LEASE1 key) 2. smb server return smb2 create response with v2 lease context(R, LEASE1 key, epoch + 1) 3. smb2 create with v1 lease(RH, LEASE1 key) 4. smb server return smb2 create response with v2 lease context(RH, LEASE1 key, epoch + 2)
i.e. If same client(same lease key) try to open a file that is being opened with v2 lease with v1 lease, smb server should return v2 lease.
Signed-off-by: Namjae Jeon linkinjeon@kernel.org Acked-by: Tom Talpey tom@talpey.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/oplock.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/smb/server/oplock.c +++ b/fs/smb/server/oplock.c @@ -1036,6 +1036,7 @@ static void copy_lease(struct oplock_inf lease2->duration = lease1->duration; lease2->flags = lease1->flags; lease2->epoch = lease1->epoch++; + lease2->version = lease1->version; }
static int add_lease_global_list(struct oplock_info *opinfo)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit 6fc0a265e1b932e5e97a038f99e29400a93baad0 ]
smb2_set_ea() can be called in parent inode lock range. So add get_write argument to smb2_set_ea() not to call nested mnt_want_write().
Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/smb2pdu.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-)
--- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -2323,11 +2323,12 @@ out: * @eabuf: set info command buffer * @buf_len: set info command buffer length * @path: dentry path for get ea + * @get_write: get write access to a mount * * Return: 0 on success, otherwise error */ static int smb2_set_ea(struct smb2_ea_info *eabuf, unsigned int buf_len, - const struct path *path) + const struct path *path, bool get_write) { struct mnt_idmap *idmap = mnt_idmap(path->mnt); char *attr_name = NULL, *value; @@ -3015,7 +3016,7 @@ int smb2_open(struct ksmbd_work *work)
rc = smb2_set_ea(&ea_buf->ea, le32_to_cpu(ea_buf->ccontext.DataLength), - &path); + &path, false); if (rc == -EOPNOTSUPP) rc = 0; else if (rc) @@ -5992,7 +5993,7 @@ static int smb2_set_info_file(struct ksm return -EINVAL;
return smb2_set_ea((struct smb2_ea_info *)req->Buffer, - buf_len, &fp->filp->f_path); + buf_len, &fp->filp->f_path, true); } case FILE_POSITION_INFORMATION: {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit b6e9a44e99603fe10e1d78901fdd97681a539612 ]
If existing lease state and request state are same, don't increment epoch in create context.
Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/oplock.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/fs/smb/server/oplock.c +++ b/fs/smb/server/oplock.c @@ -105,7 +105,7 @@ static int alloc_lease(struct oplock_inf lease->is_dir = lctx->is_dir; memcpy(lease->parent_lease_key, lctx->parent_lease_key, SMB2_LEASE_KEY_SIZE); lease->version = lctx->version; - lease->epoch = le16_to_cpu(lctx->epoch); + lease->epoch = le16_to_cpu(lctx->epoch) + 1; INIT_LIST_HEAD(&opinfo->lease_entry); opinfo->o_lease = lease;
@@ -541,6 +541,9 @@ static struct oplock_info *same_client_h continue; }
+ if (lctx->req_state != lease->state) + lease->epoch++; + /* upgrading lease */ if ((atomic_read(&ci->op_count) + atomic_read(&ci->sop_count)) == 1) { @@ -1035,7 +1038,7 @@ static void copy_lease(struct oplock_inf SMB2_LEASE_KEY_SIZE); lease2->duration = lease1->duration; lease2->flags = lease1->flags; - lease2->epoch = lease1->epoch++; + lease2->epoch = lease1->epoch; lease2->version = lease1->version; }
@@ -1454,7 +1457,7 @@ void create_lease_buf(u8 *rbuf, struct l memcpy(buf->lcontext.LeaseKey, lease->lease_key, SMB2_LEASE_KEY_SIZE); buf->lcontext.LeaseFlags = lease->flags; - buf->lcontext.Epoch = cpu_to_le16(++lease->epoch); + buf->lcontext.Epoch = cpu_to_le16(lease->epoch); buf->lcontext.LeaseState = lease->state; memcpy(buf->lcontext.ParentLeaseKey, lease->parent_lease_key, SMB2_LEASE_KEY_SIZE);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
[ Upstream commit 3fc74c65b367476874da5fe6f633398674b78e5a ]
Send lease break notification on FILE_RENAME_INFORMATION request. This patch fix smb2.lease.v2_epoch2 test failure.
Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/oplock.c | 12 +++++++----- fs/smb/server/smb2pdu.c | 1 + 2 files changed, 8 insertions(+), 5 deletions(-)
--- a/fs/smb/server/oplock.c +++ b/fs/smb/server/oplock.c @@ -541,14 +541,12 @@ static struct oplock_info *same_client_h continue; }
- if (lctx->req_state != lease->state) - lease->epoch++; - /* upgrading lease */ if ((atomic_read(&ci->op_count) + atomic_read(&ci->sop_count)) == 1) { if (lease->state != SMB2_LEASE_NONE_LE && lease->state == (lctx->req_state & lease->state)) { + lease->epoch++; lease->state |= lctx->req_state; if (lctx->req_state & SMB2_LEASE_WRITE_CACHING_LE) @@ -559,13 +557,17 @@ static struct oplock_info *same_client_h atomic_read(&ci->sop_count)) > 1) { if (lctx->req_state == (SMB2_LEASE_READ_CACHING_LE | - SMB2_LEASE_HANDLE_CACHING_LE)) + SMB2_LEASE_HANDLE_CACHING_LE)) { + lease->epoch++; lease->state = lctx->req_state; + } }
if (lctx->req_state && lease->state == - SMB2_LEASE_NONE_LE) + SMB2_LEASE_NONE_LE) { + lease->epoch++; lease_none_upgrade(opinfo, lctx->req_state); + } } read_lock(&ci->m_lock); } --- a/fs/smb/server/smb2pdu.c +++ b/fs/smb/server/smb2pdu.c @@ -5581,6 +5581,7 @@ static int smb2_rename(struct ksmbd_work if (!file_info->ReplaceIfExists) flags = RENAME_NOREPLACE;
+ smb_break_all_levII_oplock(work, fp, 0); rc = ksmbd_vfs_rename(work, &fp->filp->f_path, new_name, flags); out: kfree(new_name);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Namjae Jeon linkinjeon@kernel.org
From: Kevin Hao haokexin@gmail.com
[ Upstream commit 8fb7b723924cc9306bc161f45496497aec733904 ]
The kernel thread function ksmbd_conn_handler_loop() invokes the try_to_freeze() in its loop. But all the kernel threads are non-freezable by default. So if we want to make a kernel thread to be freezable, we have to invoke set_freezable() explicitly.
Signed-off-by: Kevin Hao haokexin@gmail.com Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/connection.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/smb/server/connection.c +++ b/fs/smb/server/connection.c @@ -284,6 +284,7 @@ int ksmbd_conn_handler_loop(void *p) goto out;
conn->last_active = jiffies; + set_freezable(); while (ksmbd_conn_alive(conn)) { if (try_to_freeze()) continue;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rohan G Thomas rohan.g.thomas@intel.com
commit 6fb8c20a04be234cf1cfd4bdd8cfb8860c9d2d3b upstream.
Add dt-bindings for coe-unsupported property per tx queue. Some DWMAC IPs support tx checksum offloading(coe) only for a few tx queues.
DW xGMAC IP can be synthesized such that it can support tx coe only for a few initial tx queues. Also as Serge pointed out, for the DW QoS IP tx coe can be individually configured for each tx queue. This property is added to have sw fallback for checksum calculation if a tx queue doesn't support tx coe.
Signed-off-by: Rohan G Thomas rohan.g.thomas@intel.com Acked-by: Conor Dooley conor.dooley@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/devicetree/bindings/net/snps,dwmac.yaml | 5 +++++ 1 file changed, 5 insertions(+)
--- a/Documentation/devicetree/bindings/net/snps,dwmac.yaml +++ b/Documentation/devicetree/bindings/net/snps,dwmac.yaml @@ -394,6 +394,11 @@ properties: When a PFC frame is received with priorities matching the bitmask, the queue is blocked from transmitting for the pause time specified in the PFC frame. + + snps,coe-unsupported: + type: boolean + description: TX checksum offload is unsupported by the TX queue. + allOf: - if: required:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 3c4e420cb6536026ddd50eaaff5f30e4f144200d upstream.
Subsequent patches would make use of explored_state() function. Move it up to avoid adding unnecessary prototype.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 28 +++++++++++++--------------- 1 file changed, 13 insertions(+), 15 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1786,6 +1786,19 @@ static int copy_verifier_state(struct bp return 0; }
+static u32 state_htab_size(struct bpf_verifier_env *env) +{ + return env->prog->len; +} + +static struct bpf_verifier_state_list **explored_state(struct bpf_verifier_env *env, int idx) +{ + struct bpf_verifier_state *cur = env->cur_state; + struct bpf_func_state *state = cur->frame[cur->curframe]; + + return &env->explored_states[(idx ^ state->callsite) % state_htab_size(env)]; +} + static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifier_state *st) { while (st) { @@ -14702,21 +14715,6 @@ enum { BRANCH = 2, };
-static u32 state_htab_size(struct bpf_verifier_env *env) -{ - return env->prog->len; -} - -static struct bpf_verifier_state_list **explored_state( - struct bpf_verifier_env *env, - int idx) -{ - struct bpf_verifier_state *cur = env->cur_state; - struct bpf_func_state *state = cur->frame[cur->curframe]; - - return &env->explored_states[(idx ^ state->callsite) % state_htab_size(env)]; -} - static void mark_prune_point(struct bpf_verifier_env *env, int idx) { env->insn_aux_data[idx].prune_point = true;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 4c97259abc9bc8df7712f76f58ce385581876857 upstream.
Extract same_callsites() from clean_live_states() as a utility function. This function would be used by the next patch in the set.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1799,6 +1799,20 @@ static struct bpf_verifier_state_list ** return &env->explored_states[(idx ^ state->callsite) % state_htab_size(env)]; }
+static bool same_callsites(struct bpf_verifier_state *a, struct bpf_verifier_state *b) +{ + int fr; + + if (a->curframe != b->curframe) + return false; + + for (fr = a->curframe; fr >= 0; fr--) + if (a->frame[fr]->callsite != b->frame[fr]->callsite) + return false; + + return true; +} + static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifier_state *st) { while (st) { @@ -15521,18 +15535,14 @@ static void clean_live_states(struct bpf struct bpf_verifier_state *cur) { struct bpf_verifier_state_list *sl; - int i;
sl = *explored_state(env, insn); while (sl) { if (sl->state.branches) goto next; if (sl->state.insn_idx != insn || - sl->state.curframe != cur->curframe) + !same_callsites(&sl->state, cur)) goto next; - for (i = 0; i <= cur->curframe; i++) - if (sl->state.frame[i]->callsite != cur->frame[i]->callsite) - goto next; clean_verifier_state(env, &sl->state); next: sl = sl->next;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 2793a8b015f7f1caadb9bce9c63dc659f7522676 upstream.
Convergence for open coded iterators is computed in is_state_visited() by examining states with branches count > 1 and using states_equal(). states_equal() computes sub-state relation using read and precision marks. Read and precision marks are propagated from children states, thus are not guaranteed to be complete inside a loop when branches count > 1. This could be demonstrated using the following unsafe program:
1. r7 = -16 2. r6 = bpf_get_prandom_u32() 3. while (bpf_iter_num_next(&fp[-8])) { 4. if (r6 != 42) { 5. r7 = -32 6. r6 = bpf_get_prandom_u32() 7. continue 8. } 9. r0 = r10 10. r0 += r7 11. r8 = *(u64 *)(r0 + 0) 12. r6 = bpf_get_prandom_u32() 13. }
Here verifier would first visit path 1-3, create a checkpoint at 3 with r7=-16, continue to 4-7,3 with r7=-32.
Because instructions at 9-12 had not been visitied yet existing checkpoint at 3 does not have read or precision mark for r7. Thus states_equal() would return true and verifier would discard current state, thus unsafe memory access at 11 would not be caught.
This commit fixes this loophole by introducing exact state comparisons for iterator convergence logic: - registers are compared using regs_exact() regardless of read or precision marks; - stack slots have to have identical type.
Unfortunately, this is too strict even for simple programs like below:
i = 0; while(iter_next(&it)) i++;
At each iteration step i++ would produce a new distinct state and eventually instruction processing limit would be reached.
To avoid such behavior speculatively forget (widen) range for imprecise scalar registers, if those registers were not precise at the end of the previous iteration and do not match exactly.
This a conservative heuristic that allows to verify wide range of programs, however it precludes verification of programs that conjure an imprecise value on the first loop iteration and use it as precise on the second.
Test case iter_task_vma_for_each() presents one of such cases:
unsigned int seen = 0; ... bpf_for_each(task_vma, vma, task, 0) { if (seen >= 1000) break; ... seen++; }
Here clang generates the following code:
<LBB0_4>: 24: r8 = r6 ; stash current value of ... body ... 'seen' 29: r1 = r10 30: r1 += -0x8 31: call bpf_iter_task_vma_next 32: r6 += 0x1 ; seen++; 33: if r0 == 0x0 goto +0x2 <LBB0_6> ; exit on next() == NULL 34: r7 += 0x10 35: if r8 < 0x3e7 goto -0xc <LBB0_4> ; loop on seen < 1000
<LBB0_6>: ... exit ...
Note that counter in r6 is copied to r8 and then incremented, conditional jump is done using r8. Because of this precision mark for r6 lags one state behind of precision mark on r8 and widening logic kicks in.
Adding barrier_var(seen) after conditional is sufficient to force clang use the same register for both counting and conditional jump.
This issue was discussed in the thread [1] which was started by Andrew Werner awerner32@gmail.com demonstrating a similar bug in callback functions handling. The callbacks would be addressed in a followup patch.
[1] https://lore.kernel.org/bpf/97a90da09404c65c8e810cf83c94ac703705dc0e.camel@g...
Co-developed-by: Andrii Nakryiko andrii.nakryiko@gmail.com Co-developed-by: Alexei Starovoitov alexei.starovoitov@gmail.com Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-4-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf_verifier.h | 1 kernel/bpf/verifier.c | 218 ++++++++++++++++++++++++++++++++++++------- 2 files changed, 188 insertions(+), 31 deletions(-)
--- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -383,6 +383,7 @@ struct bpf_verifier_state { */ struct bpf_idx_pair *jmp_history; u32 jmp_history_cnt; + u32 dfs_depth; };
#define bpf_get_spilled_reg(slot, frame) \ --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1771,6 +1771,7 @@ static int copy_verifier_state(struct bp dst_state->parent = src->parent; dst_state->first_insn_idx = src->first_insn_idx; dst_state->last_insn_idx = src->last_insn_idx; + dst_state->dfs_depth = src->dfs_depth; for (i = 0; i <= src->curframe; i++) { dst = dst_state->frame[i]; if (!dst) { @@ -7554,6 +7555,81 @@ static int process_iter_arg(struct bpf_v return 0; }
+/* Look for a previous loop entry at insn_idx: nearest parent state + * stopped at insn_idx with callsites matching those in cur->frame. + */ +static struct bpf_verifier_state *find_prev_entry(struct bpf_verifier_env *env, + struct bpf_verifier_state *cur, + int insn_idx) +{ + struct bpf_verifier_state_list *sl; + struct bpf_verifier_state *st; + + /* Explored states are pushed in stack order, most recent states come first */ + sl = *explored_state(env, insn_idx); + for (; sl; sl = sl->next) { + /* If st->branches != 0 state is a part of current DFS verification path, + * hence cur & st for a loop. + */ + st = &sl->state; + if (st->insn_idx == insn_idx && st->branches && same_callsites(st, cur) && + st->dfs_depth < cur->dfs_depth) + return st; + } + + return NULL; +} + +static void reset_idmap_scratch(struct bpf_verifier_env *env); +static bool regs_exact(const struct bpf_reg_state *rold, + const struct bpf_reg_state *rcur, + struct bpf_idmap *idmap); + +static void maybe_widen_reg(struct bpf_verifier_env *env, + struct bpf_reg_state *rold, struct bpf_reg_state *rcur, + struct bpf_idmap *idmap) +{ + if (rold->type != SCALAR_VALUE) + return; + if (rold->type != rcur->type) + return; + if (rold->precise || rcur->precise || regs_exact(rold, rcur, idmap)) + return; + __mark_reg_unknown(env, rcur); +} + +static int widen_imprecise_scalars(struct bpf_verifier_env *env, + struct bpf_verifier_state *old, + struct bpf_verifier_state *cur) +{ + struct bpf_func_state *fold, *fcur; + int i, fr; + + reset_idmap_scratch(env); + for (fr = old->curframe; fr >= 0; fr--) { + fold = old->frame[fr]; + fcur = cur->frame[fr]; + + for (i = 0; i < MAX_BPF_REG; i++) + maybe_widen_reg(env, + &fold->regs[i], + &fcur->regs[i], + &env->idmap_scratch); + + for (i = 0; i < fold->allocated_stack / BPF_REG_SIZE; i++) { + if (!is_spilled_reg(&fold->stack[i]) || + !is_spilled_reg(&fcur->stack[i])) + continue; + + maybe_widen_reg(env, + &fold->stack[i].spilled_ptr, + &fcur->stack[i].spilled_ptr, + &env->idmap_scratch); + } + } + return 0; +} + /* process_iter_next_call() is called when verifier gets to iterator's next * "method" (e.g., bpf_iter_num_next() for numbers iterator) call. We'll refer * to it as just "iter_next()" in comments below. @@ -7595,25 +7671,47 @@ static int process_iter_arg(struct bpf_v * is some statically known limit on number of iterations (e.g., if there is * an explicit `if n > 100 then break;` statement somewhere in the loop). * - * One very subtle but very important aspect is that we *always* simulate NULL - * condition first (as the current state) before we simulate non-NULL case. - * This has to do with intricacies of scalar precision tracking. By simulating - * "exit condition" of iter_next() returning NULL first, we make sure all the - * relevant precision marks *that will be set **after** we exit iterator loop* - * are propagated backwards to common parent state of NULL and non-NULL - * branches. Thanks to that, state equivalence checks done later in forked - * state, when reaching iter_next() for ACTIVE iterator, can assume that - * precision marks are finalized and won't change. Because simulating another - * ACTIVE iterator iteration won't change them (because given same input - * states we'll end up with exactly same output states which we are currently - * comparing; and verification after the loop already propagated back what - * needs to be **additionally** tracked as precise). It's subtle, grok - * precision tracking for more intuitive understanding. + * Iteration convergence logic in is_state_visited() relies on exact + * states comparison, which ignores read and precision marks. + * This is necessary because read and precision marks are not finalized + * while in the loop. Exact comparison might preclude convergence for + * simple programs like below: + * + * i = 0; + * while(iter_next(&it)) + * i++; + * + * At each iteration step i++ would produce a new distinct state and + * eventually instruction processing limit would be reached. + * + * To avoid such behavior speculatively forget (widen) range for + * imprecise scalar registers, if those registers were not precise at the + * end of the previous iteration and do not match exactly. + * + * This is a conservative heuristic that allows to verify wide range of programs, + * however it precludes verification of programs that conjure an + * imprecise value on the first loop iteration and use it as precise on a second. + * For example, the following safe program would fail to verify: + * + * struct bpf_num_iter it; + * int arr[10]; + * int i = 0, a = 0; + * bpf_iter_num_new(&it, 0, 10); + * while (bpf_iter_num_next(&it)) { + * if (a == 0) { + * a = 1; + * i = 7; // Because i changed verifier would forget + * // it's range on second loop entry. + * } else { + * arr[i] = 42; // This would fail to verify. + * } + * } + * bpf_iter_num_destroy(&it); */ static int process_iter_next_call(struct bpf_verifier_env *env, int insn_idx, struct bpf_kfunc_call_arg_meta *meta) { - struct bpf_verifier_state *cur_st = env->cur_state, *queued_st; + struct bpf_verifier_state *cur_st = env->cur_state, *queued_st, *prev_st; struct bpf_func_state *cur_fr = cur_st->frame[cur_st->curframe], *queued_fr; struct bpf_reg_state *cur_iter, *queued_iter; int iter_frameno = meta->iter.frameno; @@ -7631,6 +7729,19 @@ static int process_iter_next_call(struct }
if (cur_iter->iter.state == BPF_ITER_STATE_ACTIVE) { + /* Because iter_next() call is a checkpoint is_state_visitied() + * should guarantee parent state with same call sites and insn_idx. + */ + if (!cur_st->parent || cur_st->parent->insn_idx != insn_idx || + !same_callsites(cur_st->parent, cur_st)) { + verbose(env, "bug: bad parent state for iter next call"); + return -EFAULT; + } + /* Note cur_st->parent in the call below, it is necessary to skip + * checkpoint created for cur_st by is_state_visited() + * right at this instruction. + */ + prev_st = find_prev_entry(env, cur_st->parent, insn_idx); /* branch out active iter state */ queued_st = push_stack(env, insn_idx + 1, insn_idx, false); if (!queued_st) @@ -7639,6 +7750,8 @@ static int process_iter_next_call(struct queued_iter = &queued_st->frame[iter_frameno]->stack[iter_spi].spilled_ptr; queued_iter->iter.state = BPF_ITER_STATE_ACTIVE; queued_iter->iter.depth++; + if (prev_st) + widen_imprecise_scalars(env, prev_st, queued_st);
queued_fr = queued_st->frame[queued_st->curframe]; mark_ptr_not_null_reg(&queued_fr->regs[BPF_REG_0]); @@ -15560,8 +15673,11 @@ static bool regs_exact(const struct bpf_
/* Returns true if (rold safe implies rcur safe) */ static bool regsafe(struct bpf_verifier_env *env, struct bpf_reg_state *rold, - struct bpf_reg_state *rcur, struct bpf_idmap *idmap) + struct bpf_reg_state *rcur, struct bpf_idmap *idmap, bool exact) { + if (exact) + return regs_exact(rold, rcur, idmap); + if (!(rold->live & REG_LIVE_READ)) /* explored state didn't use this */ return true; @@ -15678,7 +15794,7 @@ static bool regsafe(struct bpf_verifier_ }
static bool stacksafe(struct bpf_verifier_env *env, struct bpf_func_state *old, - struct bpf_func_state *cur, struct bpf_idmap *idmap) + struct bpf_func_state *cur, struct bpf_idmap *idmap, bool exact) { int i, spi;
@@ -15691,7 +15807,12 @@ static bool stacksafe(struct bpf_verifie
spi = i / BPF_REG_SIZE;
- if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ)) { + if (exact && + old->stack[spi].slot_type[i % BPF_REG_SIZE] != + cur->stack[spi].slot_type[i % BPF_REG_SIZE]) + return false; + + if (!(old->stack[spi].spilled_ptr.live & REG_LIVE_READ) && !exact) { i += BPF_REG_SIZE - 1; /* explored state didn't use this */ continue; @@ -15741,7 +15862,7 @@ static bool stacksafe(struct bpf_verifie * return false to continue verification of this path */ if (!regsafe(env, &old->stack[spi].spilled_ptr, - &cur->stack[spi].spilled_ptr, idmap)) + &cur->stack[spi].spilled_ptr, idmap, exact)) return false; break; case STACK_DYNPTR: @@ -15823,16 +15944,16 @@ static bool refsafe(struct bpf_func_stat * the current state will reach 'bpf_exit' instruction safely */ static bool func_states_equal(struct bpf_verifier_env *env, struct bpf_func_state *old, - struct bpf_func_state *cur) + struct bpf_func_state *cur, bool exact) { int i;
for (i = 0; i < MAX_BPF_REG; i++) if (!regsafe(env, &old->regs[i], &cur->regs[i], - &env->idmap_scratch)) + &env->idmap_scratch, exact)) return false;
- if (!stacksafe(env, old, cur, &env->idmap_scratch)) + if (!stacksafe(env, old, cur, &env->idmap_scratch, exact)) return false;
if (!refsafe(old, cur, &env->idmap_scratch)) @@ -15841,17 +15962,23 @@ static bool func_states_equal(struct bpf return true; }
+static void reset_idmap_scratch(struct bpf_verifier_env *env) +{ + env->idmap_scratch.tmp_id_gen = env->id_gen; + memset(&env->idmap_scratch.map, 0, sizeof(env->idmap_scratch.map)); +} + static bool states_equal(struct bpf_verifier_env *env, struct bpf_verifier_state *old, - struct bpf_verifier_state *cur) + struct bpf_verifier_state *cur, + bool exact) { int i;
if (old->curframe != cur->curframe) return false;
- env->idmap_scratch.tmp_id_gen = env->id_gen; - memset(&env->idmap_scratch.map, 0, sizeof(env->idmap_scratch.map)); + reset_idmap_scratch(env);
/* Verification state from speculative execution simulation * must never prune a non-speculative execution one. @@ -15881,7 +16008,7 @@ static bool states_equal(struct bpf_veri for (i = 0; i <= old->curframe; i++) { if (old->frame[i]->callsite != cur->frame[i]->callsite) return false; - if (!func_states_equal(env, old->frame[i], cur->frame[i])) + if (!func_states_equal(env, old->frame[i], cur->frame[i], exact)) return false; } return true; @@ -16136,7 +16263,7 @@ static int is_state_visited(struct bpf_v struct bpf_verifier_state_list *new_sl; struct bpf_verifier_state_list *sl, **pprev; struct bpf_verifier_state *cur = env->cur_state, *new; - int i, j, err, states_cnt = 0; + int i, j, n, err, states_cnt = 0; bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx); bool add_new_state = force_new_state;
@@ -16191,9 +16318,33 @@ static int is_state_visited(struct bpf_v * It's safe to assume that iterator loop will finish, taking into * account iter_next() contract of eventually returning * sticky NULL result. + * + * Note, that states have to be compared exactly in this case because + * read and precision marks might not be finalized inside the loop. + * E.g. as in the program below: + * + * 1. r7 = -16 + * 2. r6 = bpf_get_prandom_u32() + * 3. while (bpf_iter_num_next(&fp[-8])) { + * 4. if (r6 != 42) { + * 5. r7 = -32 + * 6. r6 = bpf_get_prandom_u32() + * 7. continue + * 8. } + * 9. r0 = r10 + * 10. r0 += r7 + * 11. r8 = *(u64 *)(r0 + 0) + * 12. r6 = bpf_get_prandom_u32() + * 13. } + * + * Here verifier would first visit path 1-3, create a checkpoint at 3 + * with r7=-16, continue to 4-7,3. Existing checkpoint at 3 does + * not have read or precision mark for r7 yet, thus inexact states + * comparison would discard current state with r7=-32 + * => unsafe memory access at 11 would not be caught. */ if (is_iter_next_insn(env, insn_idx)) { - if (states_equal(env, &sl->state, cur)) { + if (states_equal(env, &sl->state, cur, true)) { struct bpf_func_state *cur_frame; struct bpf_reg_state *iter_state, *iter_reg; int spi; @@ -16216,7 +16367,7 @@ static int is_state_visited(struct bpf_v } /* attempt to detect infinite loop to avoid unnecessary doomed work */ if (states_maybe_looping(&sl->state, cur) && - states_equal(env, &sl->state, cur) && + states_equal(env, &sl->state, cur, false) && !iter_active_depths_differ(&sl->state, cur)) { verbose_linfo(env, insn_idx, "; "); verbose(env, "infinite loop detected at insn %d\n", insn_idx); @@ -16241,7 +16392,7 @@ skip_inf_loop_check: add_new_state = false; goto miss; } - if (states_equal(env, &sl->state, cur)) { + if (states_equal(env, &sl->state, cur, false)) { hit: sl->hit_cnt++; /* reached equivalent register/stack state, @@ -16280,8 +16431,12 @@ miss: * to keep checking from state equivalence point of view. * Higher numbers increase max_states_per_insn and verification time, * but do not meaningfully decrease insn_processed. + * 'n' controls how many times state could miss before eviction. + * Use bigger 'n' for checkpoints because evicting checkpoint states + * too early would hinder iterator convergence. */ - if (sl->miss_cnt > sl->hit_cnt * 3 + 3) { + n = is_force_checkpoint(env, insn_idx) && sl->state.branches > 0 ? 64 : 3; + if (sl->miss_cnt > sl->hit_cnt * n + n) { /* the state is unlikely to be useful. Remove it to * speed up verification */ @@ -16355,6 +16510,7 @@ next:
cur->parent = new; cur->first_insn_idx = insn_idx; + cur->dfs_depth = new->dfs_depth + 1; clear_jmp_history(cur); new_sl->next = *explored_state(env, insn_idx); *explored_state(env, insn_idx) = new_sl;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 389ede06c2974b2f878a7ebff6b0f4f707f9db74 upstream.
These test cases try to hide read and precision marks from loop convergence logic: marks would only be assigned on subsequent loop iterations or after exploring states pushed to env->head stack first. Without verifier fix to use exact states comparison logic for iterators convergence these tests (except 'triple_continue') would be errorneously marked as safe.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/iters.c | 518 ++++++++++++++++++++++++++++++ 1 file changed, 518 insertions(+)
--- a/tools/testing/selftests/bpf/progs/iters.c +++ b/tools/testing/selftests/bpf/progs/iters.c @@ -14,6 +14,13 @@ int my_pid; int arr[256]; int small_arr[16] SEC(".data.small_arr");
+struct { + __uint(type, BPF_MAP_TYPE_HASH); + __uint(max_entries, 10); + __type(key, int); + __type(value, int); +} amap SEC(".maps"); + #ifdef REAL_TEST #define MY_PID_GUARD() if (my_pid != (bpf_get_current_pid_tgid() >> 32)) return 0 #else @@ -716,4 +723,515 @@ int iter_pass_iter_ptr_to_subprog(const return 0; }
+SEC("?raw_tp") +__failure +__msg("R1 type=scalar expected=fp") +__naked int delayed_read_mark(void) +{ + /* This is equivalent to C program below. + * The call to bpf_iter_num_next() is reachable with r7 values &fp[-16] and 0xdead. + * State with r7=&fp[-16] is visited first and follows r6 != 42 ... continue branch. + * At this point iterator next() call is reached with r7 that has no read mark. + * Loop body with r7=0xdead would only be visited if verifier would decide to continue + * with second loop iteration. Absence of read mark on r7 might affect state + * equivalent logic used for iterator convergence tracking. + * + * r7 = &fp[-16] + * fp[-16] = 0 + * r6 = bpf_get_prandom_u32() + * bpf_iter_num_new(&fp[-8], 0, 10) + * while (bpf_iter_num_next(&fp[-8])) { + * r6++ + * if (r6 != 42) { + * r7 = 0xdead + * continue; + * } + * bpf_probe_read_user(r7, 8, 0xdeadbeef); // this is not safe + * } + * bpf_iter_num_destroy(&fp[-8]) + * return 0 + */ + asm volatile ( + "r7 = r10;" + "r7 += -16;" + "r0 = 0;" + "*(u64 *)(r7 + 0) = r0;" + "call %[bpf_get_prandom_u32];" + "r6 = r0;" + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "1:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto 2f;" + "r6 += 1;" + "if r6 != 42 goto 3f;" + "r7 = 0xdead;" + "goto 1b;" + "3:" + "r1 = r7;" + "r2 = 8;" + "r3 = 0xdeadbeef;" + "call %[bpf_probe_read_user];" + "goto 1b;" + "2:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + : + : __imm(bpf_get_prandom_u32), + __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy), + __imm(bpf_probe_read_user) + : __clobber_all + ); +} + +SEC("?raw_tp") +__failure +__msg("math between fp pointer and register with unbounded") +__naked int delayed_precision_mark(void) +{ + /* This is equivalent to C program below. + * The test is similar to delayed_iter_mark but verifies that incomplete + * precision don't fool verifier. + * The call to bpf_iter_num_next() is reachable with r7 values -16 and -32. + * State with r7=-16 is visited first and follows r6 != 42 ... continue branch. + * At this point iterator next() call is reached with r7 that has no read + * and precision marks. + * Loop body with r7=-32 would only be visited if verifier would decide to continue + * with second loop iteration. Absence of precision mark on r7 might affect state + * equivalent logic used for iterator convergence tracking. + * + * r8 = 0 + * fp[-16] = 0 + * r7 = -16 + * r6 = bpf_get_prandom_u32() + * bpf_iter_num_new(&fp[-8], 0, 10) + * while (bpf_iter_num_next(&fp[-8])) { + * if (r6 != 42) { + * r7 = -32 + * r6 = bpf_get_prandom_u32() + * continue; + * } + * r0 = r10 + * r0 += r7 + * r8 = *(u64 *)(r0 + 0) // this is not safe + * r6 = bpf_get_prandom_u32() + * } + * bpf_iter_num_destroy(&fp[-8]) + * return r8 + */ + asm volatile ( + "r8 = 0;" + "*(u64 *)(r10 - 16) = r8;" + "r7 = -16;" + "call %[bpf_get_prandom_u32];" + "r6 = r0;" + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "1:" + "r1 = r10;" + "r1 += -8;\n" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto 2f;" + "if r6 != 42 goto 3f;" + "r7 = -32;" + "call %[bpf_get_prandom_u32];" + "r6 = r0;" + "goto 1b;\n" + "3:" + "r0 = r10;" + "r0 += r7;" + "r8 = *(u64 *)(r0 + 0);" + "call %[bpf_get_prandom_u32];" + "r6 = r0;" + "goto 1b;\n" + "2:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r0 = r8;" + "exit;" + : + : __imm(bpf_get_prandom_u32), + __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy), + __imm(bpf_probe_read_user) + : __clobber_all + ); +} + +SEC("?raw_tp") +__failure +__msg("math between fp pointer and register with unbounded") +__flag(BPF_F_TEST_STATE_FREQ) +__naked int loop_state_deps1(void) +{ + /* This is equivalent to C program below. + * + * The case turns out to be tricky in a sense that: + * - states with c=-25 are explored only on a second iteration + * of the outer loop; + * - states with read+precise mark on c are explored only on + * second iteration of the inner loop and in a state which + * is pushed to states stack first. + * + * Depending on the details of iterator convergence logic + * verifier might stop states traversal too early and miss + * unsafe c=-25 memory access. + * + * j = iter_new(); // fp[-16] + * a = 0; // r6 + * b = 0; // r7 + * c = -24; // r8 + * while (iter_next(j)) { + * i = iter_new(); // fp[-8] + * a = 0; // r6 + * b = 0; // r7 + * while (iter_next(i)) { + * if (a == 1) { + * a = 0; + * b = 1; + * } else if (a == 0) { + * a = 1; + * if (random() == 42) + * continue; + * if (b == 1) { + * *(r10 + c) = 7; // this is not safe + * iter_destroy(i); + * iter_destroy(j); + * return; + * } + * } + * } + * iter_destroy(i); + * a = 0; + * b = 0; + * c = -25; + * } + * iter_destroy(j); + * return; + */ + asm volatile ( + "r1 = r10;" + "r1 += -16;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "r6 = 0;" + "r7 = 0;" + "r8 = -24;" + "j_loop_%=:" + "r1 = r10;" + "r1 += -16;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto j_loop_end_%=;" + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "r6 = 0;" + "r7 = 0;" + "i_loop_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto i_loop_end_%=;" + "check_one_r6_%=:" + "if r6 != 1 goto check_zero_r6_%=;" + "r6 = 0;" + "r7 = 1;" + "goto i_loop_%=;" + "check_zero_r6_%=:" + "if r6 != 0 goto i_loop_%=;" + "r6 = 1;" + "call %[bpf_get_prandom_u32];" + "if r0 != 42 goto check_one_r7_%=;" + "goto i_loop_%=;" + "check_one_r7_%=:" + "if r7 != 1 goto i_loop_%=;" + "r0 = r10;" + "r0 += r8;" + "r1 = 7;" + "*(u64 *)(r0 + 0) = r1;" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r1 = r10;" + "r1 += -16;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + "i_loop_end_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r6 = 0;" + "r7 = 0;" + "r8 = -25;" + "goto j_loop_%=;" + "j_loop_end_%=:" + "r1 = r10;" + "r1 += -16;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + : + : __imm(bpf_get_prandom_u32), + __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy) + : __clobber_all + ); +} + +SEC("?raw_tp") +__success +__naked int triple_continue(void) +{ + /* This is equivalent to C program below. + * High branching factor of the loop body turned out to be + * problematic for one of the iterator convergence tracking + * algorithms explored. + * + * r6 = bpf_get_prandom_u32() + * bpf_iter_num_new(&fp[-8], 0, 10) + * while (bpf_iter_num_next(&fp[-8])) { + * if (bpf_get_prandom_u32() != 42) + * continue; + * if (bpf_get_prandom_u32() != 42) + * continue; + * if (bpf_get_prandom_u32() != 42) + * continue; + * r0 += 0; + * } + * bpf_iter_num_destroy(&fp[-8]) + * return 0 + */ + asm volatile ( + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "loop_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto loop_end_%=;" + "call %[bpf_get_prandom_u32];" + "if r0 != 42 goto loop_%=;" + "call %[bpf_get_prandom_u32];" + "if r0 != 42 goto loop_%=;" + "call %[bpf_get_prandom_u32];" + "if r0 != 42 goto loop_%=;" + "r0 += 0;" + "goto loop_%=;" + "loop_end_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + : + : __imm(bpf_get_prandom_u32), + __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy) + : __clobber_all + ); +} + +SEC("?raw_tp") +__success +__naked int widen_spill(void) +{ + /* This is equivalent to C program below. + * The counter is stored in fp[-16], if this counter is not widened + * verifier states representing loop iterations would never converge. + * + * fp[-16] = 0 + * bpf_iter_num_new(&fp[-8], 0, 10) + * while (bpf_iter_num_next(&fp[-8])) { + * r0 = fp[-16]; + * r0 += 1; + * fp[-16] = r0; + * } + * bpf_iter_num_destroy(&fp[-8]) + * return 0 + */ + asm volatile ( + "r0 = 0;" + "*(u64 *)(r10 - 16) = r0;" + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "loop_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto loop_end_%=;" + "r0 = *(u64 *)(r10 - 16);" + "r0 += 1;" + "*(u64 *)(r10 - 16) = r0;" + "goto loop_%=;" + "loop_end_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + : + : __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy) + : __clobber_all + ); +} + +SEC("raw_tp") +__success +__naked int checkpoint_states_deletion(void) +{ + /* This is equivalent to C program below. + * + * int *a, *b, *c, *d, *e, *f; + * int i, sum = 0; + * bpf_for(i, 0, 10) { + * a = bpf_map_lookup_elem(&amap, &i); + * b = bpf_map_lookup_elem(&amap, &i); + * c = bpf_map_lookup_elem(&amap, &i); + * d = bpf_map_lookup_elem(&amap, &i); + * e = bpf_map_lookup_elem(&amap, &i); + * f = bpf_map_lookup_elem(&amap, &i); + * if (a) sum += 1; + * if (b) sum += 1; + * if (c) sum += 1; + * if (d) sum += 1; + * if (e) sum += 1; + * if (f) sum += 1; + * } + * return 0; + * + * The body of the loop spawns multiple simulation paths + * with different combination of NULL/non-NULL information for a/b/c/d/e/f. + * Each combination is unique from states_equal() point of view. + * Explored states checkpoint is created after each iterator next call. + * Iterator convergence logic expects that eventually current state + * would get equal to one of the explored states and thus loop + * exploration would be finished (at-least for a specific path). + * Verifier evicts explored states with high miss to hit ratio + * to to avoid comparing current state with too many explored + * states per instruction. + * This test is designed to "stress test" eviction policy defined using formula: + * + * sl->miss_cnt > sl->hit_cnt * N + N // if true sl->state is evicted + * + * Currently N is set to 64, which allows for 6 variables in this test. + */ + asm volatile ( + "r6 = 0;" /* a */ + "r7 = 0;" /* b */ + "r8 = 0;" /* c */ + "*(u64 *)(r10 - 24) = r6;" /* d */ + "*(u64 *)(r10 - 32) = r6;" /* e */ + "*(u64 *)(r10 - 40) = r6;" /* f */ + "r9 = 0;" /* sum */ + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "loop_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto loop_end_%=;" + + "*(u64 *)(r10 - 16) = r0;" + + "r1 = %[amap] ll;" + "r2 = r10;" + "r2 += -16;" + "call %[bpf_map_lookup_elem];" + "r6 = r0;" + + "r1 = %[amap] ll;" + "r2 = r10;" + "r2 += -16;" + "call %[bpf_map_lookup_elem];" + "r7 = r0;" + + "r1 = %[amap] ll;" + "r2 = r10;" + "r2 += -16;" + "call %[bpf_map_lookup_elem];" + "r8 = r0;" + + "r1 = %[amap] ll;" + "r2 = r10;" + "r2 += -16;" + "call %[bpf_map_lookup_elem];" + "*(u64 *)(r10 - 24) = r0;" + + "r1 = %[amap] ll;" + "r2 = r10;" + "r2 += -16;" + "call %[bpf_map_lookup_elem];" + "*(u64 *)(r10 - 32) = r0;" + + "r1 = %[amap] ll;" + "r2 = r10;" + "r2 += -16;" + "call %[bpf_map_lookup_elem];" + "*(u64 *)(r10 - 40) = r0;" + + "if r6 == 0 goto +1;" + "r9 += 1;" + "if r7 == 0 goto +1;" + "r9 += 1;" + "if r8 == 0 goto +1;" + "r9 += 1;" + "r0 = *(u64 *)(r10 - 24);" + "if r0 == 0 goto +1;" + "r9 += 1;" + "r0 = *(u64 *)(r10 - 32);" + "if r0 == 0 goto +1;" + "r9 += 1;" + "r0 = *(u64 *)(r10 - 40);" + "if r0 == 0 goto +1;" + "r9 += 1;" + + "goto loop_%=;" + "loop_end_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + : + : __imm(bpf_map_lookup_elem), + __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy), + __imm_addr(amap) + : __clobber_all + ); +} + char _license[] SEC("license") = "GPL";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 2a0992829ea3864939d917a5c7b48be6629c6217 upstream.
It turns out that .branches > 0 in is_state_visited() is not a sufficient condition to identify if two verifier states form a loop when iterators convergence is computed. This commit adds logic to distinguish situations like below:
(I) initial (II) initial | | V V .---------> hdr .. | | | | V V | .------... .------.. | | | | | | V V V V | ... ... .-> hdr .. | | | | | | | V V | V V | succ <- cur | succ <- cur | | | | | V | V | ... | ... | | | | '----' '----'
For both (I) and (II) successor 'succ' of the current state 'cur' was previously explored and has branches count at 0. However, loop entry 'hdr' corresponding to 'succ' might be a part of current DFS path. If that is the case 'succ' and 'cur' are members of the same loop and have to be compared exactly.
Co-developed-by: Andrii Nakryiko andrii.nakryiko@gmail.com Co-developed-by: Alexei Starovoitov alexei.starovoitov@gmail.com Reviewed-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf_verifier.h | 15 +++ kernel/bpf/verifier.c | 207 ++++++++++++++++++++++++++++++++++++++++++- 2 files changed, 218 insertions(+), 4 deletions(-)
--- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -372,10 +372,25 @@ struct bpf_verifier_state { struct bpf_active_lock active_lock; bool speculative; bool active_rcu_lock; + /* If this state was ever pointed-to by other state's loop_entry field + * this flag would be set to true. Used to avoid freeing such states + * while they are still in use. + */ + bool used_as_loop_entry;
/* first and last insn idx of this verifier state */ u32 first_insn_idx; u32 last_insn_idx; + /* If this state is a part of states loop this field points to some + * parent of this state such that: + * - it is also a member of the same states loop; + * - DFS states traversal starting from initial state visits loop_entry + * state before this state. + * Used to compute topmost loop entry for state loops. + * State loops might appear because of open coded iterators logic. + * See get_loop_entry() for more information. + */ + struct bpf_verifier_state *loop_entry; /* jmp history recorded from first to last. * backtracking is using it to go from last to first. * For most states jmp_history_cnt is [0-3]. --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -1772,6 +1772,7 @@ static int copy_verifier_state(struct bp dst_state->first_insn_idx = src->first_insn_idx; dst_state->last_insn_idx = src->last_insn_idx; dst_state->dfs_depth = src->dfs_depth; + dst_state->used_as_loop_entry = src->used_as_loop_entry; for (i = 0; i <= src->curframe; i++) { dst = dst_state->frame[i]; if (!dst) { @@ -1814,11 +1815,176 @@ static bool same_callsites(struct bpf_ve return true; }
+/* Open coded iterators allow back-edges in the state graph in order to + * check unbounded loops that iterators. + * + * In is_state_visited() it is necessary to know if explored states are + * part of some loops in order to decide whether non-exact states + * comparison could be used: + * - non-exact states comparison establishes sub-state relation and uses + * read and precision marks to do so, these marks are propagated from + * children states and thus are not guaranteed to be final in a loop; + * - exact states comparison just checks if current and explored states + * are identical (and thus form a back-edge). + * + * Paper "A New Algorithm for Identifying Loops in Decompilation" + * by Tao Wei, Jian Mao, Wei Zou and Yu Chen [1] presents a convenient + * algorithm for loop structure detection and gives an overview of + * relevant terminology. It also has helpful illustrations. + * + * [1] https://api.semanticscholar.org/CorpusID:15784067 + * + * We use a similar algorithm but because loop nested structure is + * irrelevant for verifier ours is significantly simpler and resembles + * strongly connected components algorithm from Sedgewick's textbook. + * + * Define topmost loop entry as a first node of the loop traversed in a + * depth first search starting from initial state. The goal of the loop + * tracking algorithm is to associate topmost loop entries with states + * derived from these entries. + * + * For each step in the DFS states traversal algorithm needs to identify + * the following situations: + * + * initial initial initial + * | | | + * V V V + * ... ... .---------> hdr + * | | | | + * V V | V + * cur .-> succ | .------... + * | | | | | | + * V | V | V V + * succ '-- cur | ... ... + * | | | + * | V V + * | succ <- cur + * | | + * | V + * | ... + * | | + * '----' + * + * (A) successor state of cur (B) successor state of cur or it's entry + * not yet traversed are in current DFS path, thus cur and succ + * are members of the same outermost loop + * + * initial initial + * | | + * V V + * ... ... + * | | + * V V + * .------... .------... + * | | | | + * V V V V + * .-> hdr ... ... ... + * | | | | | + * | V V V V + * | succ <- cur succ <- cur + * | | | + * | V V + * | ... ... + * | | | + * '----' exit + * + * (C) successor state of cur is a part of some loop but this loop + * does not include cur or successor state is not in a loop at all. + * + * Algorithm could be described as the following python code: + * + * traversed = set() # Set of traversed nodes + * entries = {} # Mapping from node to loop entry + * depths = {} # Depth level assigned to graph node + * path = set() # Current DFS path + * + * # Find outermost loop entry known for n + * def get_loop_entry(n): + * h = entries.get(n, None) + * while h in entries and entries[h] != h: + * h = entries[h] + * return h + * + * # Update n's loop entry if h's outermost entry comes + * # before n's outermost entry in current DFS path. + * def update_loop_entry(n, h): + * n1 = get_loop_entry(n) or n + * h1 = get_loop_entry(h) or h + * if h1 in path and depths[h1] <= depths[n1]: + * entries[n] = h1 + * + * def dfs(n, depth): + * traversed.add(n) + * path.add(n) + * depths[n] = depth + * for succ in G.successors(n): + * if succ not in traversed: + * # Case A: explore succ and update cur's loop entry + * # only if succ's entry is in current DFS path. + * dfs(succ, depth + 1) + * h = get_loop_entry(succ) + * update_loop_entry(n, h) + * else: + * # Case B or C depending on `h1 in path` check in update_loop_entry(). + * update_loop_entry(n, succ) + * path.remove(n) + * + * To adapt this algorithm for use with verifier: + * - use st->branch == 0 as a signal that DFS of succ had been finished + * and cur's loop entry has to be updated (case A), handle this in + * update_branch_counts(); + * - use st->branch > 0 as a signal that st is in the current DFS path; + * - handle cases B and C in is_state_visited(); + * - update topmost loop entry for intermediate states in get_loop_entry(). + */ +static struct bpf_verifier_state *get_loop_entry(struct bpf_verifier_state *st) +{ + struct bpf_verifier_state *topmost = st->loop_entry, *old; + + while (topmost && topmost->loop_entry && topmost != topmost->loop_entry) + topmost = topmost->loop_entry; + /* Update loop entries for intermediate states to avoid this + * traversal in future get_loop_entry() calls. + */ + while (st && st->loop_entry != topmost) { + old = st->loop_entry; + st->loop_entry = topmost; + st = old; + } + return topmost; +} + +static void update_loop_entry(struct bpf_verifier_state *cur, struct bpf_verifier_state *hdr) +{ + struct bpf_verifier_state *cur1, *hdr1; + + cur1 = get_loop_entry(cur) ?: cur; + hdr1 = get_loop_entry(hdr) ?: hdr; + /* The head1->branches check decides between cases B and C in + * comment for get_loop_entry(). If hdr1->branches == 0 then + * head's topmost loop entry is not in current DFS path, + * hence 'cur' and 'hdr' are not in the same loop and there is + * no need to update cur->loop_entry. + */ + if (hdr1->branches && hdr1->dfs_depth <= cur1->dfs_depth) { + cur->loop_entry = hdr; + hdr->used_as_loop_entry = true; + } +} + static void update_branch_counts(struct bpf_verifier_env *env, struct bpf_verifier_state *st) { while (st) { u32 br = --st->branches;
+ /* br == 0 signals that DFS exploration for 'st' is finished, + * thus it is necessary to update parent's loop entry if it + * turned out that st is a part of some loop. + * This is a part of 'case A' in get_loop_entry() comment. + */ + if (br == 0 && st->parent && st->loop_entry) + update_loop_entry(st->parent, st->loop_entry); + /* WARN_ON(br > 1) technically makes sense here, * but see comment in push_stack(), hence: */ @@ -16262,10 +16428,11 @@ static int is_state_visited(struct bpf_v { struct bpf_verifier_state_list *new_sl; struct bpf_verifier_state_list *sl, **pprev; - struct bpf_verifier_state *cur = env->cur_state, *new; + struct bpf_verifier_state *cur = env->cur_state, *new, *loop_entry; int i, j, n, err, states_cnt = 0; bool force_new_state = env->test_state_freq || is_force_checkpoint(env, insn_idx); bool add_new_state = force_new_state; + bool force_exact;
/* bpf progs typically have pruning point every 4 instructions * http://vger.kernel.org/bpfconf2019.html#session-1 @@ -16360,8 +16527,10 @@ static int is_state_visited(struct bpf_v */ spi = __get_spi(iter_reg->off + iter_reg->var_off.value); iter_state = &func(env, iter_reg)->stack[spi].spilled_ptr; - if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE) + if (iter_state->iter.state == BPF_ITER_STATE_ACTIVE) { + update_loop_entry(cur, &sl->state); goto hit; + } } goto skip_inf_loop_check; } @@ -16392,7 +16561,36 @@ skip_inf_loop_check: add_new_state = false; goto miss; } - if (states_equal(env, &sl->state, cur, false)) { + /* If sl->state is a part of a loop and this loop's entry is a part of + * current verification path then states have to be compared exactly. + * 'force_exact' is needed to catch the following case: + * + * initial Here state 'succ' was processed first, + * | it was eventually tracked to produce a + * V state identical to 'hdr'. + * .---------> hdr All branches from 'succ' had been explored + * | | and thus 'succ' has its .branches == 0. + * | V + * | .------... Suppose states 'cur' and 'succ' correspond + * | | | to the same instruction + callsites. + * | V V In such case it is necessary to check + * | ... ... if 'succ' and 'cur' are states_equal(). + * | | | If 'succ' and 'cur' are a part of the + * | V V same loop exact flag has to be set. + * | succ <- cur To check if that is the case, verify + * | | if loop entry of 'succ' is in current + * | V DFS path. + * | ... + * | | + * '----' + * + * Additional details are in the comment before get_loop_entry(). + */ + loop_entry = get_loop_entry(&sl->state); + force_exact = loop_entry && loop_entry->branches > 0; + if (states_equal(env, &sl->state, cur, force_exact)) { + if (force_exact) + update_loop_entry(cur, loop_entry); hit: sl->hit_cnt++; /* reached equivalent register/stack state, @@ -16441,7 +16639,8 @@ miss: * speed up verification */ *pprev = sl->next; - if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE) { + if (sl->state.frame[0]->regs[0].live & REG_LIVE_DONE && + !sl->state.used_as_loop_entry) { u32 br = sl->state.branches;
WARN_ONCE(br,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 64870feebecb7130291a55caf0ce839a87405a70 upstream.
A convoluted test case for iterators convergence logic that demonstrates that states with branch count equal to 0 might still be a part of not completely explored loop.
E.g. consider the following state diagram:
initial Here state 'succ' was processed first, | it was eventually tracked to produce a V state identical to 'hdr'. .---------> hdr All branches from 'succ' had been explored | | and thus 'succ' has its .branches == 0. | V | .------... Suppose states 'cur' and 'succ' correspond | | | to the same instruction + callsites. | V V In such case it is necessary to check | ... ... whether 'succ' and 'cur' are identical. | | | If 'succ' and 'cur' are a part of the same loop | V V they have to be compared exactly. | succ <- cur | | | V | ... | | '----'
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/iters.c | 177 ++++++++++++++++++++++++++++++ 1 file changed, 177 insertions(+)
--- a/tools/testing/selftests/bpf/progs/iters.c +++ b/tools/testing/selftests/bpf/progs/iters.c @@ -999,6 +999,183 @@ __naked int loop_state_deps1(void) }
SEC("?raw_tp") +__failure +__msg("math between fp pointer and register with unbounded") +__flag(BPF_F_TEST_STATE_FREQ) +__naked int loop_state_deps2(void) +{ + /* This is equivalent to C program below. + * + * The case turns out to be tricky in a sense that: + * - states with read+precise mark on c are explored only on a second + * iteration of the first inner loop and in a state which is pushed to + * states stack first. + * - states with c=-25 are explored only on a second iteration of the + * second inner loop and in a state which is pushed to states stack + * first. + * + * Depending on the details of iterator convergence logic + * verifier might stop states traversal too early and miss + * unsafe c=-25 memory access. + * + * j = iter_new(); // fp[-16] + * a = 0; // r6 + * b = 0; // r7 + * c = -24; // r8 + * while (iter_next(j)) { + * i = iter_new(); // fp[-8] + * a = 0; // r6 + * b = 0; // r7 + * while (iter_next(i)) { + * if (a == 1) { + * a = 0; + * b = 1; + * } else if (a == 0) { + * a = 1; + * if (random() == 42) + * continue; + * if (b == 1) { + * *(r10 + c) = 7; // this is not safe + * iter_destroy(i); + * iter_destroy(j); + * return; + * } + * } + * } + * iter_destroy(i); + * i = iter_new(); // fp[-8] + * a = 0; // r6 + * b = 0; // r7 + * while (iter_next(i)) { + * if (a == 1) { + * a = 0; + * b = 1; + * } else if (a == 0) { + * a = 1; + * if (random() == 42) + * continue; + * if (b == 1) { + * a = 0; + * c = -25; + * } + * } + * } + * iter_destroy(i); + * } + * iter_destroy(j); + * return; + */ + asm volatile ( + "r1 = r10;" + "r1 += -16;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "r6 = 0;" + "r7 = 0;" + "r8 = -24;" + "j_loop_%=:" + "r1 = r10;" + "r1 += -16;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto j_loop_end_%=;" + + /* first inner loop */ + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "r6 = 0;" + "r7 = 0;" + "i_loop_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto i_loop_end_%=;" + "check_one_r6_%=:" + "if r6 != 1 goto check_zero_r6_%=;" + "r6 = 0;" + "r7 = 1;" + "goto i_loop_%=;" + "check_zero_r6_%=:" + "if r6 != 0 goto i_loop_%=;" + "r6 = 1;" + "call %[bpf_get_prandom_u32];" + "if r0 != 42 goto check_one_r7_%=;" + "goto i_loop_%=;" + "check_one_r7_%=:" + "if r7 != 1 goto i_loop_%=;" + "r0 = r10;" + "r0 += r8;" + "r1 = 7;" + "*(u64 *)(r0 + 0) = r1;" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + "r1 = r10;" + "r1 += -16;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + "i_loop_end_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + + /* second inner loop */ + "r1 = r10;" + "r1 += -8;" + "r2 = 0;" + "r3 = 10;" + "call %[bpf_iter_num_new];" + "r6 = 0;" + "r7 = 0;" + "i2_loop_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_next];" + "if r0 == 0 goto i2_loop_end_%=;" + "check2_one_r6_%=:" + "if r6 != 1 goto check2_zero_r6_%=;" + "r6 = 0;" + "r7 = 1;" + "goto i2_loop_%=;" + "check2_zero_r6_%=:" + "if r6 != 0 goto i2_loop_%=;" + "r6 = 1;" + "call %[bpf_get_prandom_u32];" + "if r0 != 42 goto check2_one_r7_%=;" + "goto i2_loop_%=;" + "check2_one_r7_%=:" + "if r7 != 1 goto i2_loop_%=;" + "r6 = 0;" + "r8 = -25;" + "goto i2_loop_%=;" + "i2_loop_end_%=:" + "r1 = r10;" + "r1 += -8;" + "call %[bpf_iter_num_destroy];" + + "r6 = 0;" + "r7 = 0;" + "goto j_loop_%=;" + "j_loop_end_%=:" + "r1 = r10;" + "r1 += -16;" + "call %[bpf_iter_num_destroy];" + "r0 = 0;" + "exit;" + : + : __imm(bpf_get_prandom_u32), + __imm(bpf_iter_num_new), + __imm(bpf_iter_num_next), + __imm(bpf_iter_num_destroy) + : __clobber_all + ); +} + +SEC("?raw_tp") __success __naked int triple_continue(void) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit b4d8239534fddc036abe4a0fdbf474d9894d4641 upstream.
Additional logging in is_state_visited(): if infinite loop is detected print full verifier state for both current and equivalent states.
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231024000917.12153-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -16540,6 +16540,10 @@ static int is_state_visited(struct bpf_v !iter_active_depths_differ(&sl->state, cur)) { verbose_linfo(env, insn_idx, "; "); verbose(env, "infinite loop detected at insn %d\n", insn_idx); + verbose(env, "cur state:"); + print_verifier_state(env, cur->frame[cur->curframe], true); + verbose(env, "old state:"); + print_verifier_state(env, sl->state.frame[cur->curframe], true); return -EINVAL; } /* if the verifier is processing a loop, avoid adding new state
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 977bc146d4eb7070118d8a974919b33bb52732b4 upstream.
This change prepares syncookie_{tc,xdp} for update in callbakcs verification logic. To allow bpf_loop() verification converge when multiple callback itreations are considered: - track offset inside TCP payload explicitly, not as a part of the pointer; - make sure that offset does not exceed MAX_PACKET_OFF enforced by verifier; - make sure that offset is tracked as unbound scalar between iterations, otherwise verifier won't be able infer that bpf_loop callback reaches identical states.
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c | 84 +++++++++++------- 1 file changed, 52 insertions(+), 32 deletions(-)
--- a/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c +++ b/tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c @@ -53,6 +53,8 @@ #define DEFAULT_TTL 64 #define MAX_ALLOWED_PORTS 8
+#define MAX_PACKET_OFF 0xffff + #define swap(a, b) \ do { typeof(a) __tmp = (a); (a) = (b); (b) = __tmp; } while (0)
@@ -183,63 +185,76 @@ static __always_inline __u32 tcp_time_st }
struct tcpopt_context { - __u8 *ptr; - __u8 *end; + void *data; void *data_end; __be32 *tsecr; __u8 wscale; bool option_timestamp; bool option_sack; + __u32 off; };
-static int tscookie_tcpopt_parse(struct tcpopt_context *ctx) +static __always_inline u8 *next(struct tcpopt_context *ctx, __u32 sz) { - __u8 opcode, opsize; + __u64 off = ctx->off; + __u8 *data;
- if (ctx->ptr >= ctx->end) - return 1; - if (ctx->ptr >= ctx->data_end) - return 1; + /* Verifier forbids access to packet when offset exceeds MAX_PACKET_OFF */ + if (off > MAX_PACKET_OFF - sz) + return NULL; + + data = ctx->data + off; + barrier_var(data); + if (data + sz >= ctx->data_end) + return NULL;
- opcode = ctx->ptr[0]; + ctx->off += sz; + return data; +}
- if (opcode == TCPOPT_EOL) - return 1; - if (opcode == TCPOPT_NOP) { - ++ctx->ptr; - return 0; - } +static int tscookie_tcpopt_parse(struct tcpopt_context *ctx) +{ + __u8 *opcode, *opsize, *wscale, *tsecr; + __u32 off = ctx->off;
- if (ctx->ptr + 1 >= ctx->end) + opcode = next(ctx, 1); + if (!opcode) return 1; - if (ctx->ptr + 1 >= ctx->data_end) - return 1; - opsize = ctx->ptr[1]; - if (opsize < 2) + + if (*opcode == TCPOPT_EOL) return 1; + if (*opcode == TCPOPT_NOP) + return 0;
- if (ctx->ptr + opsize > ctx->end) + opsize = next(ctx, 1); + if (!opsize || *opsize < 2) return 1;
- switch (opcode) { + switch (*opcode) { case TCPOPT_WINDOW: - if (opsize == TCPOLEN_WINDOW && ctx->ptr + TCPOLEN_WINDOW <= ctx->data_end) - ctx->wscale = ctx->ptr[2] < TCP_MAX_WSCALE ? ctx->ptr[2] : TCP_MAX_WSCALE; + wscale = next(ctx, 1); + if (!wscale) + return 1; + if (*opsize == TCPOLEN_WINDOW) + ctx->wscale = *wscale < TCP_MAX_WSCALE ? *wscale : TCP_MAX_WSCALE; break; case TCPOPT_TIMESTAMP: - if (opsize == TCPOLEN_TIMESTAMP && ctx->ptr + TCPOLEN_TIMESTAMP <= ctx->data_end) { + tsecr = next(ctx, 4); + if (!tsecr) + return 1; + if (*opsize == TCPOLEN_TIMESTAMP) { ctx->option_timestamp = true; /* Client's tsval becomes our tsecr. */ - *ctx->tsecr = get_unaligned((__be32 *)(ctx->ptr + 2)); + *ctx->tsecr = get_unaligned((__be32 *)tsecr); } break; case TCPOPT_SACK_PERM: - if (opsize == TCPOLEN_SACK_PERM) + if (*opsize == TCPOLEN_SACK_PERM) ctx->option_sack = true; break; }
- ctx->ptr += opsize; + ctx->off = off + *opsize;
return 0; } @@ -256,16 +271,21 @@ static int tscookie_tcpopt_parse_batch(_
static __always_inline bool tscookie_init(struct tcphdr *tcp_header, __u16 tcp_len, __be32 *tsval, - __be32 *tsecr, void *data_end) + __be32 *tsecr, void *data, void *data_end) { struct tcpopt_context loop_ctx = { - .ptr = (__u8 *)(tcp_header + 1), - .end = (__u8 *)tcp_header + tcp_len, + .data = data, .data_end = data_end, .tsecr = tsecr, .wscale = TS_OPT_WSCALE_MASK, .option_timestamp = false, .option_sack = false, + /* Note: currently verifier would track .off as unbound scalar. + * In case if verifier would at some point get smarter and + * compute bounded value for this var, beware that it might + * hinder bpf_loop() convergence validation. + */ + .off = (__u8 *)(tcp_header + 1) - (__u8 *)data, }; u32 cookie;
@@ -635,7 +655,7 @@ static __always_inline int syncookie_han cookie = (__u32)value;
if (tscookie_init((void *)hdr->tcp, hdr->tcp_len, - &tsopt_buf[0], &tsopt_buf[1], data_end)) + &tsopt_buf[0], &tsopt_buf[1], data, data_end)) tsopt = tsopt_buf;
/* Check that there is enough space for a SYNACK. It also covers
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 87eb0152bcc102ecbda866978f4e54db5a3be1ef upstream.
This change prepares strobemeta for update in callbacks verification logic. To allow bpf_loop() verification converge when multiple callback iterations are considered: - track offset inside strobemeta_payload->payload directly as scalar value; - at each iteration make sure that remaining strobemeta_payload->payload capacity is sufficient for execution of read_{map,str}_var functions; - make sure that offset is tracked as unbound scalar between iterations, otherwise verifier won't be able infer that bpf_loop callback reaches identical states.
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-3-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/strobemeta.h | 78 +++++++++++++++---------- 1 file changed, 48 insertions(+), 30 deletions(-)
--- a/tools/testing/selftests/bpf/progs/strobemeta.h +++ b/tools/testing/selftests/bpf/progs/strobemeta.h @@ -24,9 +24,11 @@ struct task_struct {}; #define STACK_TABLE_EPOCH_SHIFT 20 #define STROBE_MAX_STR_LEN 1 #define STROBE_MAX_CFGS 32 +#define READ_MAP_VAR_PAYLOAD_CAP \ + ((1 + STROBE_MAX_MAP_ENTRIES * 2) * STROBE_MAX_STR_LEN) #define STROBE_MAX_PAYLOAD \ (STROBE_MAX_STRS * STROBE_MAX_STR_LEN + \ - STROBE_MAX_MAPS * (1 + STROBE_MAX_MAP_ENTRIES * 2) * STROBE_MAX_STR_LEN) + STROBE_MAX_MAPS * READ_MAP_VAR_PAYLOAD_CAP)
struct strobe_value_header { /* @@ -355,7 +357,7 @@ static __always_inline uint64_t read_str size_t idx, void *tls_base, struct strobe_value_generic *value, struct strobemeta_payload *data, - void *payload) + size_t off) { void *location; uint64_t len; @@ -366,7 +368,7 @@ static __always_inline uint64_t read_str return 0;
bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); - len = bpf_probe_read_user_str(payload, STROBE_MAX_STR_LEN, value->ptr); + len = bpf_probe_read_user_str(&data->payload[off], STROBE_MAX_STR_LEN, value->ptr); /* * if bpf_probe_read_user_str returns error (<0), due to casting to * unsinged int, it will become big number, so next check is @@ -378,14 +380,14 @@ static __always_inline uint64_t read_str return 0;
data->str_lens[idx] = len; - return len; + return off + len; }
-static __always_inline void *read_map_var(struct strobemeta_cfg *cfg, - size_t idx, void *tls_base, - struct strobe_value_generic *value, - struct strobemeta_payload *data, - void *payload) +static __always_inline uint64_t read_map_var(struct strobemeta_cfg *cfg, + size_t idx, void *tls_base, + struct strobe_value_generic *value, + struct strobemeta_payload *data, + size_t off) { struct strobe_map_descr* descr = &data->map_descrs[idx]; struct strobe_map_raw map; @@ -397,11 +399,11 @@ static __always_inline void *read_map_va
location = calc_location(&cfg->map_locs[idx], tls_base); if (!location) - return payload; + return off;
bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location); if (bpf_probe_read_user(&map, sizeof(struct strobe_map_raw), value->ptr)) - return payload; + return off;
descr->id = map.id; descr->cnt = map.cnt; @@ -410,10 +412,10 @@ static __always_inline void *read_map_va data->req_meta_valid = 1; }
- len = bpf_probe_read_user_str(payload, STROBE_MAX_STR_LEN, map.tag); + len = bpf_probe_read_user_str(&data->payload[off], STROBE_MAX_STR_LEN, map.tag); if (len <= STROBE_MAX_STR_LEN) { descr->tag_len = len; - payload += len; + off += len; }
#ifdef NO_UNROLL @@ -426,22 +428,22 @@ static __always_inline void *read_map_va break;
descr->key_lens[i] = 0; - len = bpf_probe_read_user_str(payload, STROBE_MAX_STR_LEN, + len = bpf_probe_read_user_str(&data->payload[off], STROBE_MAX_STR_LEN, map.entries[i].key); if (len <= STROBE_MAX_STR_LEN) { descr->key_lens[i] = len; - payload += len; + off += len; } descr->val_lens[i] = 0; - len = bpf_probe_read_user_str(payload, STROBE_MAX_STR_LEN, + len = bpf_probe_read_user_str(&data->payload[off], STROBE_MAX_STR_LEN, map.entries[i].val); if (len <= STROBE_MAX_STR_LEN) { descr->val_lens[i] = len; - payload += len; + off += len; } }
- return payload; + return off; }
#ifdef USE_BPF_LOOP @@ -455,14 +457,20 @@ struct read_var_ctx { struct strobemeta_payload *data; void *tls_base; struct strobemeta_cfg *cfg; - void *payload; + size_t payload_off; /* value gets mutated */ struct strobe_value_generic *value; enum read_type type; };
-static int read_var_callback(__u32 index, struct read_var_ctx *ctx) +static int read_var_callback(__u64 index, struct read_var_ctx *ctx) { + /* lose precision info for ctx->payload_off, verifier won't track + * double xor, barrier_var() is needed to force clang keep both xors. + */ + ctx->payload_off ^= index; + barrier_var(ctx->payload_off); + ctx->payload_off ^= index; switch (ctx->type) { case READ_INT_VAR: if (index >= STROBE_MAX_INTS) @@ -472,14 +480,18 @@ static int read_var_callback(__u32 index case READ_MAP_VAR: if (index >= STROBE_MAX_MAPS) return 1; - ctx->payload = read_map_var(ctx->cfg, index, ctx->tls_base, - ctx->value, ctx->data, ctx->payload); + if (ctx->payload_off > sizeof(ctx->data->payload) - READ_MAP_VAR_PAYLOAD_CAP) + return 1; + ctx->payload_off = read_map_var(ctx->cfg, index, ctx->tls_base, + ctx->value, ctx->data, ctx->payload_off); break; case READ_STR_VAR: if (index >= STROBE_MAX_STRS) return 1; - ctx->payload += read_str_var(ctx->cfg, index, ctx->tls_base, - ctx->value, ctx->data, ctx->payload); + if (ctx->payload_off > sizeof(ctx->data->payload) - STROBE_MAX_STR_LEN) + return 1; + ctx->payload_off = read_str_var(ctx->cfg, index, ctx->tls_base, + ctx->value, ctx->data, ctx->payload_off); break; } return 0; @@ -501,7 +513,8 @@ static void *read_strobe_meta(struct tas pid_t pid = bpf_get_current_pid_tgid() >> 32; struct strobe_value_generic value = {0}; struct strobemeta_cfg *cfg; - void *tls_base, *payload; + size_t payload_off; + void *tls_base;
cfg = bpf_map_lookup_elem(&strobemeta_cfgs, &pid); if (!cfg) @@ -509,7 +522,7 @@ static void *read_strobe_meta(struct tas
data->int_vals_set_mask = 0; data->req_meta_valid = 0; - payload = data->payload; + payload_off = 0; /* * we don't have struct task_struct definition, it should be: * tls_base = (void *)task->thread.fsbase; @@ -522,7 +535,7 @@ static void *read_strobe_meta(struct tas .tls_base = tls_base, .value = &value, .data = data, - .payload = payload, + .payload_off = 0, }; int err;
@@ -540,6 +553,11 @@ static void *read_strobe_meta(struct tas err = bpf_loop(STROBE_MAX_MAPS, read_var_callback, &ctx, 0); if (err != STROBE_MAX_MAPS) return NULL; + + payload_off = ctx.payload_off; + /* this should not really happen, here only to satisfy verifer */ + if (payload_off > sizeof(data->payload)) + payload_off = sizeof(data->payload); #else #ifdef NO_UNROLL #pragma clang loop unroll(disable) @@ -555,7 +573,7 @@ static void *read_strobe_meta(struct tas #pragma unroll #endif /* NO_UNROLL */ for (int i = 0; i < STROBE_MAX_STRS; ++i) { - payload += read_str_var(cfg, i, tls_base, &value, data, payload); + payload_off = read_str_var(cfg, i, tls_base, &value, data, payload_off); } #ifdef NO_UNROLL #pragma clang loop unroll(disable) @@ -563,7 +581,7 @@ static void *read_strobe_meta(struct tas #pragma unroll #endif /* NO_UNROLL */ for (int i = 0; i < STROBE_MAX_MAPS; ++i) { - payload = read_map_var(cfg, i, tls_base, &value, data, payload); + payload_off = read_map_var(cfg, i, tls_base, &value, data, payload_off); } #endif /* USE_BPF_LOOP */
@@ -571,7 +589,7 @@ static void *read_strobe_meta(struct tas * return pointer right after end of payload, so it's possible to * calculate exact amount of useful data that needs to be sent */ - return payload; + return &data->payload[payload_off]; }
SEC("raw_tracepoint/kfree_skb")
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 683b96f9606ab7308ffb23c46ab43cecdef8a241 upstream.
Split check_reg_arg() into two utility functions: - check_reg_arg() operating on registers from current verifier state; - __check_reg_arg() operating on a specific set of registers passed as a parameter;
The __check_reg_arg() function would be used by a follow-up change for callbacks handling.
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-5-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 19 +++++++++++++------ 1 file changed, 13 insertions(+), 6 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -3321,13 +3321,11 @@ static void mark_insn_zext(struct bpf_ve reg->subreg_def = DEF_NOT_SUBREG; }
-static int check_reg_arg(struct bpf_verifier_env *env, u32 regno, - enum reg_arg_type t) +static int __check_reg_arg(struct bpf_verifier_env *env, struct bpf_reg_state *regs, u32 regno, + enum reg_arg_type t) { - struct bpf_verifier_state *vstate = env->cur_state; - struct bpf_func_state *state = vstate->frame[vstate->curframe]; struct bpf_insn *insn = env->prog->insnsi + env->insn_idx; - struct bpf_reg_state *reg, *regs = state->regs; + struct bpf_reg_state *reg; bool rw64;
if (regno >= MAX_BPF_REG) { @@ -3368,6 +3366,15 @@ static int check_reg_arg(struct bpf_veri return 0; }
+static int check_reg_arg(struct bpf_verifier_env *env, u32 regno, + enum reg_arg_type t) +{ + struct bpf_verifier_state *vstate = env->cur_state; + struct bpf_func_state *state = vstate->frame[vstate->curframe]; + + return __check_reg_arg(env, state->regs, regno, t); +} + static void mark_jmp_point(struct bpf_verifier_env *env, int idx) { env->insn_aux_data[idx].jmp_point = true; @@ -9147,7 +9154,7 @@ static void clear_caller_saved_regs(stru /* after the call registers r0 - r5 were scratched */ for (i = 0; i < CALLER_SAVED_REGS; i++) { mark_reg_not_init(env, regs, caller_saved[i]); - check_reg_arg(env, caller_saved[i], DST_OP_NO_MARK); + __check_reg_arg(env, regs, caller_saved[i], DST_OP_NO_MARK); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 58124a98cb8eda69d248d7f1de954c8b2767c945 upstream.
Move code for simulated stack frame creation to a separate utility function. This function would be used in the follow-up change for callbacks handling.
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-6-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 84 ++++++++++++++++++++++++++++---------------------- 1 file changed, 48 insertions(+), 36 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9167,11 +9167,10 @@ static int set_callee_state(struct bpf_v struct bpf_func_state *caller, struct bpf_func_state *callee, int insn_idx);
-static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, - int *insn_idx, int subprog, - set_callee_state_fn set_callee_state_cb) +static int setup_func_entry(struct bpf_verifier_env *env, int subprog, int callsite, + set_callee_state_fn set_callee_state_cb, + struct bpf_verifier_state *state) { - struct bpf_verifier_state *state = env->cur_state; struct bpf_func_state *caller, *callee; int err;
@@ -9181,13 +9180,53 @@ static int __check_func_call(struct bpf_ return -E2BIG; }
- caller = state->frame[state->curframe]; if (state->frame[state->curframe + 1]) { verbose(env, "verifier bug. Frame %d already allocated\n", state->curframe + 1); return -EFAULT; }
+ caller = state->frame[state->curframe]; + callee = kzalloc(sizeof(*callee), GFP_KERNEL); + if (!callee) + return -ENOMEM; + state->frame[state->curframe + 1] = callee; + + /* callee cannot access r0, r6 - r9 for reading and has to write + * into its own stack before reading from it. + * callee can read/write into caller's stack + */ + init_func_state(env, callee, + /* remember the callsite, it will be used by bpf_exit */ + callsite, + state->curframe + 1 /* frameno within this callchain */, + subprog /* subprog number within this prog */); + /* Transfer references to the callee */ + err = copy_reference_state(callee, caller); + err = err ?: set_callee_state_cb(env, caller, callee, callsite); + if (err) + goto err_out; + + /* only increment it after check_reg_arg() finished */ + state->curframe++; + + return 0; + +err_out: + free_func_state(callee); + state->frame[state->curframe + 1] = NULL; + return err; +} + +static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, + int *insn_idx, int subprog, + set_callee_state_fn set_callee_state_cb) +{ + struct bpf_verifier_state *state = env->cur_state; + struct bpf_func_state *caller, *callee; + int err; + + caller = state->frame[state->curframe]; err = btf_check_subprog_call(env, subprog, caller->regs); if (err == -EFAULT) return err; @@ -9256,35 +9295,12 @@ static int __check_func_call(struct bpf_ return 0; }
- callee = kzalloc(sizeof(*callee), GFP_KERNEL); - if (!callee) - return -ENOMEM; - state->frame[state->curframe + 1] = callee; - - /* callee cannot access r0, r6 - r9 for reading and has to write - * into its own stack before reading from it. - * callee can read/write into caller's stack - */ - init_func_state(env, callee, - /* remember the callsite, it will be used by bpf_exit */ - *insn_idx /* callsite */, - state->curframe + 1 /* frameno within this callchain */, - subprog /* subprog number within this prog */); - - /* Transfer references to the callee */ - err = copy_reference_state(callee, caller); + err = setup_func_entry(env, subprog, *insn_idx, set_callee_state_cb, state); if (err) - goto err_out; - - err = set_callee_state_cb(env, caller, callee, *insn_idx); - if (err) - goto err_out; + return err;
clear_caller_saved_regs(env, caller->regs);
- /* only increment it after check_reg_arg() finished */ - state->curframe++; - /* and go analyze first insn of the callee */ *insn_idx = env->subprog_info[subprog].start - 1;
@@ -9292,14 +9308,10 @@ static int __check_func_call(struct bpf_ verbose(env, "caller:\n"); print_verifier_state(env, caller, true); verbose(env, "callee:\n"); - print_verifier_state(env, callee, true); + print_verifier_state(env, state->frame[state->curframe], true); } - return 0;
-err_out: - free_func_state(callee); - state->frame[state->curframe + 1] = NULL; - return err; + return 0; }
int map_set_for_each_callback_args(struct bpf_verifier_env *env,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit ab5cfac139ab8576fb54630d4cca23c3e690ee90 upstream.
Prior to this patch callbacks were handled as regular function calls, execution of callback body was modeled exactly once. This patch updates callbacks handling logic as follows: - introduces a function push_callback_call() that schedules callback body verification in env->head stack; - updates prepare_func_exit() to reschedule callback body verification upon BPF_EXIT; - as calls to bpf_*_iter_next(), calls to callback invoking functions are marked as checkpoints; - is_state_visited() is updated to stop callback based iteration when some identical parent state is found.
Paths with callback function invoked zero times are now verified first, which leads to necessity to modify some selftests: - the following negative tests required adding release/unlock/drop calls to avoid previously masked unrelated error reports: - cb_refs.c:underflow_prog - exceptions_fail.c:reject_rbtree_add_throw - exceptions_fail.c:reject_with_cp_reference - the following precision tracking selftests needed change in expected log trace: - verifier_subprog_precision.c:callback_result_precise (note: r0 precision is no longer propagated inside callback and I think this is a correct behavior) - verifier_subprog_precision.c:parent_callee_saved_reg_precise_with_callback - verifier_subprog_precision.c:parent_stack_slot_precise_with_callback
Reported-by: Andrew Werner awerner32@gmail.com Closes: https://lore.kernel.org/bpf/CA+vRuzPChFNXmouzGG+wsy=6eMcfr1mFG0F3g7rbg-sedGK... Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-7-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf_verifier.h | 5 kernel/bpf/verifier.c | 272 ++++++---- tools/testing/selftests/bpf/progs/cb_refs.c | 1 tools/testing/selftests/bpf/progs/verifier_subprog_precision.c | 71 ++ 4 files changed, 237 insertions(+), 112 deletions(-)
--- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -399,6 +399,7 @@ struct bpf_verifier_state { struct bpf_idx_pair *jmp_history; u32 jmp_history_cnt; u32 dfs_depth; + u32 callback_unroll_depth; };
#define bpf_get_spilled_reg(slot, frame) \ @@ -506,6 +507,10 @@ struct bpf_insn_aux_data { * this instruction, regardless of any heuristics */ bool force_checkpoint; + /* true if instruction is a call to a helper function that + * accepts callback function as a parameter. + */ + bool calls_callback; };
#define MAX_USED_MAPS 64 /* max number of maps accessed by one eBPF program */ --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -542,12 +542,11 @@ static bool is_dynptr_ref_function(enum return func_id == BPF_FUNC_dynptr_data; }
-static bool is_callback_calling_kfunc(u32 btf_id); +static bool is_sync_callback_calling_kfunc(u32 btf_id);
-static bool is_callback_calling_function(enum bpf_func_id func_id) +static bool is_sync_callback_calling_function(enum bpf_func_id func_id) { return func_id == BPF_FUNC_for_each_map_elem || - func_id == BPF_FUNC_timer_set_callback || func_id == BPF_FUNC_find_vma || func_id == BPF_FUNC_loop || func_id == BPF_FUNC_user_ringbuf_drain; @@ -558,6 +557,18 @@ static bool is_async_callback_calling_fu return func_id == BPF_FUNC_timer_set_callback; }
+static bool is_callback_calling_function(enum bpf_func_id func_id) +{ + return is_sync_callback_calling_function(func_id) || + is_async_callback_calling_function(func_id); +} + +static bool is_sync_callback_calling_insn(struct bpf_insn *insn) +{ + return (bpf_helper_call(insn) && is_sync_callback_calling_function(insn->imm)) || + (bpf_pseudo_kfunc_call(insn) && is_sync_callback_calling_kfunc(insn->imm)); +} + static bool is_storage_get_function(enum bpf_func_id func_id) { return func_id == BPF_FUNC_sk_storage_get || @@ -1772,6 +1783,7 @@ static int copy_verifier_state(struct bp dst_state->first_insn_idx = src->first_insn_idx; dst_state->last_insn_idx = src->last_insn_idx; dst_state->dfs_depth = src->dfs_depth; + dst_state->callback_unroll_depth = src->callback_unroll_depth; dst_state->used_as_loop_entry = src->used_as_loop_entry; for (i = 0; i <= src->curframe; i++) { dst = dst_state->frame[i]; @@ -3613,6 +3625,8 @@ static void fmt_stack_mask(char *buf, ss } }
+static bool calls_callback(struct bpf_verifier_env *env, int insn_idx); + /* For given verifier state backtrack_insn() is called from the last insn to * the first insn. Its purpose is to compute a bitmask of registers and * stack slots that needs precision in the parent verifier state. @@ -3788,16 +3802,13 @@ static int backtrack_insn(struct bpf_ver return -EFAULT; return 0; } - } else if ((bpf_helper_call(insn) && - is_callback_calling_function(insn->imm) && - !is_async_callback_calling_function(insn->imm)) || - (bpf_pseudo_kfunc_call(insn) && is_callback_calling_kfunc(insn->imm))) { - /* callback-calling helper or kfunc call, which means - * we are exiting from subprog, but unlike the subprog - * call handling above, we shouldn't propagate - * precision of r1-r5 (if any requested), as they are - * not actually arguments passed directly to callback - * subprogs + } else if (is_sync_callback_calling_insn(insn) && idx != subseq_idx - 1) { + /* exit from callback subprog to callback-calling helper or + * kfunc call. Use idx/subseq_idx check to discern it from + * straight line code backtracking. + * Unlike the subprog call handling above, we shouldn't + * propagate precision of r1-r5 (if any requested), as they are + * not actually arguments passed directly to callback subprogs */ if (bt_reg_mask(bt) & ~BPF_REGMASK_ARGS) { verbose(env, "BUG regs %x\n", bt_reg_mask(bt)); @@ -3832,10 +3843,18 @@ static int backtrack_insn(struct bpf_ver } else if (opcode == BPF_EXIT) { bool r0_precise;
+ /* Backtracking to a nested function call, 'idx' is a part of + * the inner frame 'subseq_idx' is a part of the outer frame. + * In case of a regular function call, instructions giving + * precision to registers R1-R5 should have been found already. + * In case of a callback, it is ok to have R1-R5 marked for + * backtracking, as these registers are set by the function + * invoking callback. + */ + if (subseq_idx >= 0 && calls_callback(env, subseq_idx)) + for (i = BPF_REG_1; i <= BPF_REG_5; i++) + bt_clear_reg(bt, i); if (bt_reg_mask(bt) & BPF_REGMASK_ARGS) { - /* if backtracing was looking for registers R1-R5 - * they should have been found already. - */ verbose(env, "BUG regs %x\n", bt_reg_mask(bt)); WARN_ONCE(1, "verifier backtracking bug"); return -EFAULT; @@ -9218,11 +9237,11 @@ err_out: return err; }
-static int __check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, - int *insn_idx, int subprog, - set_callee_state_fn set_callee_state_cb) +static int push_callback_call(struct bpf_verifier_env *env, struct bpf_insn *insn, + int insn_idx, int subprog, + set_callee_state_fn set_callee_state_cb) { - struct bpf_verifier_state *state = env->cur_state; + struct bpf_verifier_state *state = env->cur_state, *callback_state; struct bpf_func_state *caller, *callee; int err;
@@ -9230,43 +9249,21 @@ static int __check_func_call(struct bpf_ err = btf_check_subprog_call(env, subprog, caller->regs); if (err == -EFAULT) return err; - if (subprog_is_global(env, subprog)) { - if (err) { - verbose(env, "Caller passes invalid args into func#%d\n", - subprog); - return err; - } else { - if (env->log.level & BPF_LOG_LEVEL) - verbose(env, - "Func#%d is global and valid. Skipping.\n", - subprog); - clear_caller_saved_regs(env, caller->regs); - - /* All global functions return a 64-bit SCALAR_VALUE */ - mark_reg_unknown(env, caller->regs, BPF_REG_0); - caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG; - - /* continue with next insn after call */ - return 0; - } - }
/* set_callee_state is used for direct subprog calls, but we are * interested in validating only BPF helpers that can call subprogs as * callbacks */ - if (set_callee_state_cb != set_callee_state) { - if (bpf_pseudo_kfunc_call(insn) && - !is_callback_calling_kfunc(insn->imm)) { - verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n", - func_id_name(insn->imm), insn->imm); - return -EFAULT; - } else if (!bpf_pseudo_kfunc_call(insn) && - !is_callback_calling_function(insn->imm)) { /* helper */ - verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n", - func_id_name(insn->imm), insn->imm); - return -EFAULT; - } + if (bpf_pseudo_kfunc_call(insn) && + !is_sync_callback_calling_kfunc(insn->imm)) { + verbose(env, "verifier bug: kfunc %s#%d not marked as callback-calling\n", + func_id_name(insn->imm), insn->imm); + return -EFAULT; + } else if (!bpf_pseudo_kfunc_call(insn) && + !is_callback_calling_function(insn->imm)) { /* helper */ + verbose(env, "verifier bug: helper %s#%d not marked as callback-calling\n", + func_id_name(insn->imm), insn->imm); + return -EFAULT; }
if (insn->code == (BPF_JMP | BPF_CALL) && @@ -9277,25 +9274,76 @@ static int __check_func_call(struct bpf_ /* there is no real recursion here. timer callbacks are async */ env->subprog_info[subprog].is_async_cb = true; async_cb = push_async_cb(env, env->subprog_info[subprog].start, - *insn_idx, subprog); + insn_idx, subprog); if (!async_cb) return -EFAULT; callee = async_cb->frame[0]; callee->async_entry_cnt = caller->async_entry_cnt + 1;
/* Convert bpf_timer_set_callback() args into timer callback args */ - err = set_callee_state_cb(env, caller, callee, *insn_idx); + err = set_callee_state_cb(env, caller, callee, insn_idx); if (err) return err;
+ return 0; + } + + /* for callback functions enqueue entry to callback and + * proceed with next instruction within current frame. + */ + callback_state = push_stack(env, env->subprog_info[subprog].start, insn_idx, false); + if (!callback_state) + return -ENOMEM; + + err = setup_func_entry(env, subprog, insn_idx, set_callee_state_cb, + callback_state); + if (err) + return err; + + callback_state->callback_unroll_depth++; + return 0; +} + +static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, + int *insn_idx) +{ + struct bpf_verifier_state *state = env->cur_state; + struct bpf_func_state *caller; + int err, subprog, target_insn; + + target_insn = *insn_idx + insn->imm + 1; + subprog = find_subprog(env, target_insn); + if (subprog < 0) { + verbose(env, "verifier bug. No program starts at insn %d\n", target_insn); + return -EFAULT; + } + + caller = state->frame[state->curframe]; + err = btf_check_subprog_call(env, subprog, caller->regs); + if (err == -EFAULT) + return err; + if (subprog_is_global(env, subprog)) { + if (err) { + verbose(env, "Caller passes invalid args into func#%d\n", subprog); + return err; + } + + if (env->log.level & BPF_LOG_LEVEL) + verbose(env, "Func#%d is global and valid. Skipping.\n", subprog); clear_caller_saved_regs(env, caller->regs); + + /* All global functions return a 64-bit SCALAR_VALUE */ mark_reg_unknown(env, caller->regs, BPF_REG_0); caller->regs[BPF_REG_0].subreg_def = DEF_NOT_SUBREG; + /* continue with next insn after call */ return 0; }
- err = setup_func_entry(env, subprog, *insn_idx, set_callee_state_cb, state); + /* for regular function entry setup new frame and continue + * from that frame. + */ + err = setup_func_entry(env, subprog, *insn_idx, set_callee_state, state); if (err) return err;
@@ -9355,22 +9403,6 @@ static int set_callee_state(struct bpf_v return 0; }
-static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, - int *insn_idx) -{ - int subprog, target_insn; - - target_insn = *insn_idx + insn->imm + 1; - subprog = find_subprog(env, target_insn); - if (subprog < 0) { - verbose(env, "verifier bug. No program starts at insn %d\n", - target_insn); - return -EFAULT; - } - - return __check_func_call(env, insn, insn_idx, subprog, set_callee_state); -} - static int set_map_elem_callback_state(struct bpf_verifier_env *env, struct bpf_func_state *caller, struct bpf_func_state *callee, @@ -9601,6 +9633,11 @@ static int prepare_func_exit(struct bpf_ verbose_invalid_scalar(env, r0, &range, "callback return", "R0"); return -EINVAL; } + if (!calls_callback(env, callee->callsite)) { + verbose(env, "BUG: in callback at %d, callsite %d !calls_callback\n", + *insn_idx, callee->callsite); + return -EFAULT; + } } else { /* return to the caller whatever r0 had in the callee */ caller->regs[BPF_REG_0] = *r0; @@ -9618,7 +9655,15 @@ static int prepare_func_exit(struct bpf_ return err; }
- *insn_idx = callee->callsite + 1; + /* for callbacks like bpf_loop or bpf_for_each_map_elem go back to callsite, + * there function call logic would reschedule callback visit. If iteration + * converges is_state_visited() would prune that visit eventually. + */ + if (callee->in_callback_fn) + *insn_idx = callee->callsite; + else + *insn_idx = callee->callsite + 1; + if (env->log.level & BPF_LOG_LEVEL) { verbose(env, "returning from callee:\n"); print_verifier_state(env, callee, true); @@ -10009,24 +10054,24 @@ static int check_helper_call(struct bpf_ } break; case BPF_FUNC_for_each_map_elem: - err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, - set_map_elem_callback_state); + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_map_elem_callback_state); break; case BPF_FUNC_timer_set_callback: - err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, - set_timer_callback_state); + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_timer_callback_state); break; case BPF_FUNC_find_vma: - err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, - set_find_vma_callback_state); + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_find_vma_callback_state); break; case BPF_FUNC_snprintf: err = check_bpf_snprintf_call(env, regs); break; case BPF_FUNC_loop: update_loop_inline_state(env, meta.subprogno); - err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, - set_loop_callback_state); + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_loop_callback_state); break; case BPF_FUNC_dynptr_from_mem: if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) { @@ -10105,8 +10150,8 @@ static int check_helper_call(struct bpf_ break; } case BPF_FUNC_user_ringbuf_drain: - err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, - set_user_ringbuf_callback_state); + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_user_ringbuf_callback_state); break; }
@@ -10956,7 +11001,7 @@ static bool is_bpf_graph_api_kfunc(u32 b btf_id == special_kfunc_list[KF_bpf_refcount_acquire_impl]; }
-static bool is_callback_calling_kfunc(u32 btf_id) +static bool is_sync_callback_calling_kfunc(u32 btf_id) { return btf_id == special_kfunc_list[KF_bpf_rbtree_add_impl]; } @@ -11660,6 +11705,21 @@ static int check_kfunc_call(struct bpf_v return -EACCES; }
+ /* Check the arguments */ + err = check_kfunc_args(env, &meta, insn_idx); + if (err < 0) + return err; + + if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) { + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_rbtree_add_callback_state); + if (err) { + verbose(env, "kfunc %s#%d failed callback verification\n", + func_name, meta.func_id); + return err; + } + } + rcu_lock = is_kfunc_bpf_rcu_read_lock(&meta); rcu_unlock = is_kfunc_bpf_rcu_read_unlock(&meta);
@@ -11694,10 +11754,6 @@ static int check_kfunc_call(struct bpf_v return -EINVAL; }
- /* Check the arguments */ - err = check_kfunc_args(env, &meta, insn_idx); - if (err < 0) - return err; /* In case of release function, we get register number of refcounted * PTR_TO_BTF_ID in bpf_kfunc_arg_meta, do the release now. */ @@ -11731,16 +11787,6 @@ static int check_kfunc_call(struct bpf_v } }
- if (meta.func_id == special_kfunc_list[KF_bpf_rbtree_add_impl]) { - err = __check_func_call(env, insn, insn_idx_p, meta.subprogno, - set_rbtree_add_callback_state); - if (err) { - verbose(env, "kfunc %s#%d failed callback verification\n", - func_name, meta.func_id); - return err; - } - } - for (i = 0; i < CALLER_SAVED_REGS; i++) mark_reg_not_init(env, regs, caller_saved[i]);
@@ -15047,6 +15093,15 @@ static bool is_force_checkpoint(struct b return env->insn_aux_data[insn_idx].force_checkpoint; }
+static void mark_calls_callback(struct bpf_verifier_env *env, int idx) +{ + env->insn_aux_data[idx].calls_callback = true; +} + +static bool calls_callback(struct bpf_verifier_env *env, int insn_idx) +{ + return env->insn_aux_data[insn_idx].calls_callback; +}
enum { DONE_EXPLORING = 0, @@ -15160,6 +15215,21 @@ static int visit_insn(int t, struct bpf_ * async state will be pushed for further exploration. */ mark_prune_point(env, t); + /* For functions that invoke callbacks it is not known how many times + * callback would be called. Verifier models callback calling functions + * by repeatedly visiting callback bodies and returning to origin call + * instruction. + * In order to stop such iteration verifier needs to identify when a + * state identical some state from a previous iteration is reached. + * Check below forces creation of checkpoint before callback calling + * instruction to allow search for such identical states. + */ + if (is_sync_callback_calling_insn(insn)) { + mark_calls_callback(env, t); + mark_force_checkpoint(env, t); + mark_prune_point(env, t); + mark_jmp_point(env, t); + } if (insn->src_reg == BPF_PSEUDO_KFUNC_CALL) { struct bpf_kfunc_call_arg_meta meta;
@@ -16553,10 +16623,16 @@ static int is_state_visited(struct bpf_v } goto skip_inf_loop_check; } + if (calls_callback(env, insn_idx)) { + if (states_equal(env, &sl->state, cur, true)) + goto hit; + goto skip_inf_loop_check; + } /* attempt to detect infinite loop to avoid unnecessary doomed work */ if (states_maybe_looping(&sl->state, cur) && states_equal(env, &sl->state, cur, false) && - !iter_active_depths_differ(&sl->state, cur)) { + !iter_active_depths_differ(&sl->state, cur) && + sl->state.callback_unroll_depth == cur->callback_unroll_depth) { verbose_linfo(env, insn_idx, "; "); verbose(env, "infinite loop detected at insn %d\n", insn_idx); verbose(env, "cur state:"); --- a/tools/testing/selftests/bpf/progs/cb_refs.c +++ b/tools/testing/selftests/bpf/progs/cb_refs.c @@ -33,6 +33,7 @@ int underflow_prog(void *ctx) if (!p) return 0; bpf_for_each_map_elem(&array_map, cb1, &p, 0); + bpf_kfunc_call_test_release(p); return 0; }
--- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c @@ -119,15 +119,26 @@ __naked int global_subprog_result_precis
SEC("?raw_tp") __success __log_level(2) +/* First simulated path does not include callback body */ __msg("14: (0f) r1 += r6") -__msg("mark_precise: frame0: last_idx 14 first_idx 10") +__msg("mark_precise: frame0: last_idx 14 first_idx 9") __msg("mark_precise: frame0: regs=r6 stack= before 13: (bf) r1 = r7") __msg("mark_precise: frame0: regs=r6 stack= before 12: (27) r6 *= 4") __msg("mark_precise: frame0: regs=r6 stack= before 11: (25) if r6 > 0x3 goto pc+4") __msg("mark_precise: frame0: regs=r6 stack= before 10: (bf) r6 = r0") -__msg("mark_precise: frame0: parent state regs=r0 stack=:") -__msg("mark_precise: frame0: last_idx 18 first_idx 0") -__msg("mark_precise: frame0: regs=r0 stack= before 18: (95) exit") +__msg("mark_precise: frame0: regs=r0 stack= before 9: (85) call bpf_loop") +/* State entering callback body popped from states stack */ +__msg("from 9 to 17: frame1:") +__msg("17: frame1: R1=scalar() R2=0 R10=fp0 cb") +__msg("17: (b7) r0 = 0") +__msg("18: (95) exit") +__msg("returning from callee:") +__msg("to caller at 9:") +/* r4 (flags) is always precise for bpf_loop() */ +__msg("frame 0: propagating r4") +__msg("mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1") +__msg("mark_precise: frame0: regs=r4 stack= before 18: (95) exit") +__msg("from 18 to 9: safe") __naked int callback_result_precise(void) { asm volatile ( @@ -233,20 +244,36 @@ __naked int parent_callee_saved_reg_prec
SEC("?raw_tp") __success __log_level(2) +/* First simulated path does not include callback body */ __msg("12: (0f) r1 += r6") -__msg("mark_precise: frame0: last_idx 12 first_idx 10") +__msg("mark_precise: frame0: last_idx 12 first_idx 9") __msg("mark_precise: frame0: regs=r6 stack= before 11: (bf) r1 = r7") __msg("mark_precise: frame0: regs=r6 stack= before 10: (27) r6 *= 4") +__msg("mark_precise: frame0: regs=r6 stack= before 9: (85) call bpf_loop") __msg("mark_precise: frame0: parent state regs=r6 stack=:") -__msg("mark_precise: frame0: last_idx 16 first_idx 0") -__msg("mark_precise: frame0: regs=r6 stack= before 16: (95) exit") -__msg("mark_precise: frame1: regs= stack= before 15: (b7) r0 = 0") -__msg("mark_precise: frame1: regs= stack= before 9: (85) call bpf_loop#181") +__msg("mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx 9") __msg("mark_precise: frame0: regs=r6 stack= before 8: (b7) r4 = 0") __msg("mark_precise: frame0: regs=r6 stack= before 7: (b7) r3 = 0") __msg("mark_precise: frame0: regs=r6 stack= before 6: (bf) r2 = r8") __msg("mark_precise: frame0: regs=r6 stack= before 5: (b7) r1 = 1") __msg("mark_precise: frame0: regs=r6 stack= before 4: (b7) r6 = 3") +/* State entering callback body popped from states stack */ +__msg("from 9 to 15: frame1:") +__msg("15: frame1: R1=scalar() R2=0 R10=fp0 cb") +__msg("15: (b7) r0 = 0") +__msg("16: (95) exit") +__msg("returning from callee:") +__msg("to caller at 9:") +/* r4 (flags) is always precise for bpf_loop(), + * r6 was marked before backtracking to callback body. + */ +__msg("frame 0: propagating r4,r6") +__msg("mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1") +__msg("mark_precise: frame0: regs=r4,r6 stack= before 16: (95) exit") +__msg("mark_precise: frame1: regs= stack= before 15: (b7) r0 = 0") +__msg("mark_precise: frame1: regs= stack= before 9: (85) call bpf_loop") +__msg("mark_precise: frame0: parent state regs= stack=:") +__msg("from 16 to 9: safe") __naked int parent_callee_saved_reg_precise_with_callback(void) { asm volatile ( @@ -373,22 +400,38 @@ __naked int parent_stack_slot_precise_gl
SEC("?raw_tp") __success __log_level(2) +/* First simulated path does not include callback body */ __msg("14: (0f) r1 += r6") -__msg("mark_precise: frame0: last_idx 14 first_idx 11") +__msg("mark_precise: frame0: last_idx 14 first_idx 10") __msg("mark_precise: frame0: regs=r6 stack= before 13: (bf) r1 = r7") __msg("mark_precise: frame0: regs=r6 stack= before 12: (27) r6 *= 4") __msg("mark_precise: frame0: regs=r6 stack= before 11: (79) r6 = *(u64 *)(r10 -8)") +__msg("mark_precise: frame0: regs= stack=-8 before 10: (85) call bpf_loop") __msg("mark_precise: frame0: parent state regs= stack=-8:") -__msg("mark_precise: frame0: last_idx 18 first_idx 0") -__msg("mark_precise: frame0: regs= stack=-8 before 18: (95) exit") -__msg("mark_precise: frame1: regs= stack= before 17: (b7) r0 = 0") -__msg("mark_precise: frame1: regs= stack= before 10: (85) call bpf_loop#181") +__msg("mark_precise: frame0: last_idx 9 first_idx 0 subseq_idx 10") __msg("mark_precise: frame0: regs= stack=-8 before 9: (b7) r4 = 0") __msg("mark_precise: frame0: regs= stack=-8 before 8: (b7) r3 = 0") __msg("mark_precise: frame0: regs= stack=-8 before 7: (bf) r2 = r8") __msg("mark_precise: frame0: regs= stack=-8 before 6: (bf) r1 = r6") __msg("mark_precise: frame0: regs= stack=-8 before 5: (7b) *(u64 *)(r10 -8) = r6") __msg("mark_precise: frame0: regs=r6 stack= before 4: (b7) r6 = 3") +/* State entering callback body popped from states stack */ +__msg("from 10 to 17: frame1:") +__msg("17: frame1: R1=scalar() R2=0 R10=fp0 cb") +__msg("17: (b7) r0 = 0") +__msg("18: (95) exit") +__msg("returning from callee:") +__msg("to caller at 10:") +/* r4 (flags) is always precise for bpf_loop(), + * fp-8 was marked before backtracking to callback body. + */ +__msg("frame 0: propagating r4,fp-8") +__msg("mark_precise: frame0: last_idx 10 first_idx 10 subseq_idx -1") +__msg("mark_precise: frame0: regs=r4 stack=-8 before 18: (95) exit") +__msg("mark_precise: frame1: regs= stack= before 17: (b7) r0 = 0") +__msg("mark_precise: frame1: regs= stack= before 10: (85) call bpf_loop#181") +__msg("mark_precise: frame0: parent state regs= stack=:") +__msg("from 18 to 10: safe") __naked int parent_stack_slot_precise_with_callback(void) { asm volatile (
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 958465e217dbf5fc6677d42d8827fb3073d86afd upstream.
A set of test cases to check behavior of callback handling logic, check if verifier catches the following situations: - program not safe on second callback iteration; - program not safe on zero callback iterations; - infinite loop inside a callback.
Verify that callback logic works for bpf_loop, bpf_for_each_map_elem, bpf_user_ringbuf_drain, bpf_find_vma.
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-8-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/prog_tests/verifier.c | 2 tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c | 147 ++++++++++ 2 files changed, 149 insertions(+) create mode 100644 tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c
--- a/tools/testing/selftests/bpf/prog_tests/verifier.c +++ b/tools/testing/selftests/bpf/prog_tests/verifier.c @@ -31,6 +31,7 @@ #include "verifier_helper_restricted.skel.h" #include "verifier_helper_value_access.skel.h" #include "verifier_int_ptr.skel.h" +#include "verifier_iterating_callbacks.skel.h" #include "verifier_jeq_infer_not_null.skel.h" #include "verifier_ld_ind.skel.h" #include "verifier_ldsx.skel.h" @@ -138,6 +139,7 @@ void test_verifier_helper_packet_access( void test_verifier_helper_restricted(void) { RUN(verifier_helper_restricted); } void test_verifier_helper_value_access(void) { RUN(verifier_helper_value_access); } void test_verifier_int_ptr(void) { RUN(verifier_int_ptr); } +void test_verifier_iterating_callbacks(void) { RUN(verifier_iterating_callbacks); } void test_verifier_jeq_infer_not_null(void) { RUN(verifier_jeq_infer_not_null); } void test_verifier_ld_ind(void) { RUN(verifier_ld_ind); } void test_verifier_ldsx(void) { RUN(verifier_ldsx); } --- /dev/null +++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c @@ -0,0 +1,147 @@ +// SPDX-License-Identifier: GPL-2.0 + +#include <linux/bpf.h> +#include <bpf/bpf_helpers.h> +#include "bpf_misc.h" + +struct { + __uint(type, BPF_MAP_TYPE_ARRAY); + __uint(max_entries, 8); + __type(key, __u32); + __type(value, __u64); +} map SEC(".maps"); + +struct { + __uint(type, BPF_MAP_TYPE_USER_RINGBUF); + __uint(max_entries, 8); +} ringbuf SEC(".maps"); + +struct vm_area_struct; +struct bpf_map; + +struct buf_context { + char *buf; +}; + +struct num_context { + __u64 i; +}; + +__u8 choice_arr[2] = { 0, 1 }; + +static int unsafe_on_2nd_iter_cb(__u32 idx, struct buf_context *ctx) +{ + if (idx == 0) { + ctx->buf = (char *)(0xDEAD); + return 0; + } + + if (bpf_probe_read_user(ctx->buf, 8, (void *)(0xBADC0FFEE))) + return 1; + + return 0; +} + +SEC("?raw_tp") +__failure __msg("R1 type=scalar expected=fp") +int unsafe_on_2nd_iter(void *unused) +{ + char buf[4]; + struct buf_context loop_ctx = { .buf = buf }; + + bpf_loop(100, unsafe_on_2nd_iter_cb, &loop_ctx, 0); + return 0; +} + +static int unsafe_on_zero_iter_cb(__u32 idx, struct num_context *ctx) +{ + ctx->i = 0; + return 0; +} + +SEC("?raw_tp") +__failure __msg("invalid access to map value, value_size=2 off=32 size=1") +int unsafe_on_zero_iter(void *unused) +{ + struct num_context loop_ctx = { .i = 32 }; + + bpf_loop(100, unsafe_on_zero_iter_cb, &loop_ctx, 0); + return choice_arr[loop_ctx.i]; +} + +static int loop_detection_cb(__u32 idx, struct num_context *ctx) +{ + for (;;) {} + return 0; +} + +SEC("?raw_tp") +__failure __msg("infinite loop detected") +int loop_detection(void *unused) +{ + struct num_context loop_ctx = { .i = 0 }; + + bpf_loop(100, loop_detection_cb, &loop_ctx, 0); + return 0; +} + +static __always_inline __u64 oob_state_machine(struct num_context *ctx) +{ + switch (ctx->i) { + case 0: + ctx->i = 1; + break; + case 1: + ctx->i = 32; + break; + } + return 0; +} + +static __u64 for_each_map_elem_cb(struct bpf_map *map, __u32 *key, __u64 *val, void *data) +{ + return oob_state_machine(data); +} + +SEC("?raw_tp") +__failure __msg("invalid access to map value, value_size=2 off=32 size=1") +int unsafe_for_each_map_elem(void *unused) +{ + struct num_context loop_ctx = { .i = 0 }; + + bpf_for_each_map_elem(&map, for_each_map_elem_cb, &loop_ctx, 0); + return choice_arr[loop_ctx.i]; +} + +static __u64 ringbuf_drain_cb(struct bpf_dynptr *dynptr, void *data) +{ + return oob_state_machine(data); +} + +SEC("?raw_tp") +__failure __msg("invalid access to map value, value_size=2 off=32 size=1") +int unsafe_ringbuf_drain(void *unused) +{ + struct num_context loop_ctx = { .i = 0 }; + + bpf_user_ringbuf_drain(&ringbuf, ringbuf_drain_cb, &loop_ctx, 0); + return choice_arr[loop_ctx.i]; +} + +static __u64 find_vma_cb(struct task_struct *task, struct vm_area_struct *vma, void *data) +{ + return oob_state_machine(data); +} + +SEC("?raw_tp") +__failure __msg("invalid access to map value, value_size=2 off=32 size=1") +int unsafe_find_vma(void *unused) +{ + struct task_struct *task = bpf_get_current_task_btf(); + struct num_context loop_ctx = { .i = 0 }; + + bpf_find_vma(task, 0, find_vma_cb, &loop_ctx, 0); + return choice_arr[loop_ctx.i]; +} + +char _license[] SEC("license") = "GPL";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit cafe2c21508a38cdb3ed22708842e957b2572c3e upstream.
Callbacks are similar to open coded iterators, so add imprecise widening logic for callback body processing. This makes callback based loops behave identically to open coded iterators, e.g. allowing to verify programs like below:
struct ctx { u32 i; }; int cb(u32 idx, struct ctx* ctx) { ++ctx->i; return 0; } ... struct ctx ctx = { .i = 0 }; bpf_loop(100, cb, &ctx, 0); ...
Acked-by: Andrii Nakryiko andrii@kernel.org Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-9-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/bpf/verifier.c | 24 ++++++++++++++++++++++-- 1 file changed, 22 insertions(+), 2 deletions(-)
--- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9595,9 +9595,10 @@ static bool in_rbtree_lock_required_cb(s
static int prepare_func_exit(struct bpf_verifier_env *env, int *insn_idx) { - struct bpf_verifier_state *state = env->cur_state; + struct bpf_verifier_state *state = env->cur_state, *prev_st; struct bpf_func_state *caller, *callee; struct bpf_reg_state *r0; + bool in_callback_fn; int err;
callee = state->frame[state->curframe]; @@ -9659,7 +9660,8 @@ static int prepare_func_exit(struct bpf_ * there function call logic would reschedule callback visit. If iteration * converges is_state_visited() would prune that visit eventually. */ - if (callee->in_callback_fn) + in_callback_fn = callee->in_callback_fn; + if (in_callback_fn) *insn_idx = callee->callsite; else *insn_idx = callee->callsite + 1; @@ -9673,6 +9675,24 @@ static int prepare_func_exit(struct bpf_ /* clear everything in the callee */ free_func_state(callee); state->frame[state->curframe--] = NULL; + + /* for callbacks widen imprecise scalars to make programs like below verify: + * + * struct ctx { int i; } + * void cb(int idx, struct ctx *ctx) { ctx->i++; ... } + * ... + * struct ctx = { .i = 0; } + * bpf_loop(100, cb, &ctx, 0); + * + * This is similar to what is done in process_iter_next_call() for open + * coded iterators. + */ + prev_st = in_callback_fn ? find_prev_entry(env, state, *insn_idx) : NULL; + if (prev_st) { + err = widen_imprecise_scalars(env, prev_st, state); + if (err) + return err; + } return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 9f3330aa644d6d979eb064c46e85c62d4b4eac75 upstream.
A test case to verify that imprecise scalars widening is applied to callback entering state, when callback call is simulated repeatedly.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-10-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c | 20 ++++++++++ 1 file changed, 20 insertions(+)
--- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c +++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c @@ -25,6 +25,7 @@ struct buf_context {
struct num_context { __u64 i; + __u64 j; };
__u8 choice_arr[2] = { 0, 1 }; @@ -69,6 +70,25 @@ int unsafe_on_zero_iter(void *unused) return choice_arr[loop_ctx.i]; }
+static int widening_cb(__u32 idx, struct num_context *ctx) +{ + ++ctx->i; + return 0; +} + +SEC("?raw_tp") +__success +int widening(void *unused) +{ + struct num_context loop_ctx = { .i = 0, .j = 1 }; + + bpf_loop(100, widening_cb, &loop_ctx, 0); + /* loop_ctx.j is not changed during callback iteration, + * verifier should not apply widening to it. + */ + return choice_arr[loop_ctx.j]; +} + static int loop_detection_cb(__u32 idx, struct num_context *ctx) { for (;;) {}
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit bb124da69c47dd98d69361ec13244ece50bec63e upstream.
In some cases verifier can't infer convergence of the bpf_loop() iteration. E.g. for the following program:
static int cb(__u32 idx, struct num_context* ctx) { ctx->i++; return 0; }
SEC("?raw_tp") int prog(void *_) { struct num_context ctx = { .i = 0 }; __u8 choice_arr[2] = { 0, 1 };
bpf_loop(2, cb, &ctx, 0); return choice_arr[ctx.i]; }
Each 'cb' simulation would eventually return to 'prog' and reach 'return choice_arr[ctx.i]' statement. At which point ctx.i would be marked precise, thus forcing verifier to track multitude of separate states with {.i=0}, {.i=1}, ... at bpf_loop() callback entry.
This commit allows "brute force" handling for such cases by limiting number of callback body simulations using 'umax' value of the first bpf_loop() parameter.
For this, extend bpf_func_state with 'callback_depth' field. Increment this field when callback visiting state is pushed to states traversal stack. For frame #N it's 'callback_depth' field counts how many times callback with frame depth N+1 had been executed. Use bpf_func_state specifically to allow independent tracking of callback depths when multiple nested bpf_loop() calls are present.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-11-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/bpf_verifier.h | 11 +++ kernel/bpf/verifier.c | 19 ++++- tools/testing/selftests/bpf/progs/verifier_subprog_precision.c | 35 +++++++--- 3 files changed, 53 insertions(+), 12 deletions(-)
--- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -300,6 +300,17 @@ struct bpf_func_state { bool in_callback_fn; struct tnum callback_ret_range; bool in_async_callback_fn; + /* For callback calling functions that limit number of possible + * callback executions (e.g. bpf_loop) keeps track of current + * simulated iteration number. + * Value in frame N refers to number of times callback with frame + * N+1 was simulated, e.g. for the following call: + * + * bpf_loop(..., fn, ...); | suppose current frame is N + * | fn would be simulated in frame N+1 + * | number of simulations is tracked in frame N + */ + u32 callback_depth;
/* The following fields should be last. See copy_func_state() */ int acquired_refs; --- a/kernel/bpf/verifier.c +++ b/kernel/bpf/verifier.c @@ -9301,6 +9301,8 @@ static int push_callback_call(struct bpf return err;
callback_state->callback_unroll_depth++; + callback_state->frame[callback_state->curframe - 1]->callback_depth++; + caller->callback_depth = 0; return 0; }
@@ -10090,8 +10092,21 @@ static int check_helper_call(struct bpf_ break; case BPF_FUNC_loop: update_loop_inline_state(env, meta.subprogno); - err = push_callback_call(env, insn, insn_idx, meta.subprogno, - set_loop_callback_state); + /* Verifier relies on R1 value to determine if bpf_loop() iteration + * is finished, thus mark it precise. + */ + err = mark_chain_precision(env, BPF_REG_1); + if (err) + return err; + if (cur_func(env)->callback_depth < regs[BPF_REG_1].umax_value) { + err = push_callback_call(env, insn, insn_idx, meta.subprogno, + set_loop_callback_state); + } else { + cur_func(env)->callback_depth = 0; + if (env->log.level & BPF_LOG_LEVEL2) + verbose(env, "frame%d bpf_loop iteration limit reached\n", + env->cur_state->curframe); + } break; case BPF_FUNC_dynptr_from_mem: if (regs[BPF_REG_1].type != PTR_TO_MAP_VALUE) { --- a/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c +++ b/tools/testing/selftests/bpf/progs/verifier_subprog_precision.c @@ -119,7 +119,23 @@ __naked int global_subprog_result_precis
SEC("?raw_tp") __success __log_level(2) -/* First simulated path does not include callback body */ +/* First simulated path does not include callback body, + * r1 and r4 are always precise for bpf_loop() calls. + */ +__msg("9: (85) call bpf_loop#181") +__msg("mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1") +__msg("mark_precise: frame0: parent state regs=r4 stack=:") +__msg("mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx 9") +__msg("mark_precise: frame0: regs=r4 stack= before 8: (b7) r4 = 0") +__msg("mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1") +__msg("mark_precise: frame0: parent state regs=r1 stack=:") +__msg("mark_precise: frame0: last_idx 8 first_idx 0 subseq_idx 9") +__msg("mark_precise: frame0: regs=r1 stack= before 8: (b7) r4 = 0") +__msg("mark_precise: frame0: regs=r1 stack= before 7: (b7) r3 = 0") +__msg("mark_precise: frame0: regs=r1 stack= before 6: (bf) r2 = r8") +__msg("mark_precise: frame0: regs=r1 stack= before 5: (bf) r1 = r6") +__msg("mark_precise: frame0: regs=r6 stack= before 4: (b7) r6 = 3") +/* r6 precision propagation */ __msg("14: (0f) r1 += r6") __msg("mark_precise: frame0: last_idx 14 first_idx 9") __msg("mark_precise: frame0: regs=r6 stack= before 13: (bf) r1 = r7") @@ -134,10 +150,9 @@ __msg("17: (b7) r0 = 0") __msg("18: (95) exit") __msg("returning from callee:") __msg("to caller at 9:") -/* r4 (flags) is always precise for bpf_loop() */ -__msg("frame 0: propagating r4") +__msg("frame 0: propagating r1,r4") __msg("mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1") -__msg("mark_precise: frame0: regs=r4 stack= before 18: (95) exit") +__msg("mark_precise: frame0: regs=r1,r4 stack= before 18: (95) exit") __msg("from 18 to 9: safe") __naked int callback_result_precise(void) { @@ -264,12 +279,12 @@ __msg("15: (b7) r0 = 0") __msg("16: (95) exit") __msg("returning from callee:") __msg("to caller at 9:") -/* r4 (flags) is always precise for bpf_loop(), +/* r1, r4 are always precise for bpf_loop(), * r6 was marked before backtracking to callback body. */ -__msg("frame 0: propagating r4,r6") +__msg("frame 0: propagating r1,r4,r6") __msg("mark_precise: frame0: last_idx 9 first_idx 9 subseq_idx -1") -__msg("mark_precise: frame0: regs=r4,r6 stack= before 16: (95) exit") +__msg("mark_precise: frame0: regs=r1,r4,r6 stack= before 16: (95) exit") __msg("mark_precise: frame1: regs= stack= before 15: (b7) r0 = 0") __msg("mark_precise: frame1: regs= stack= before 9: (85) call bpf_loop") __msg("mark_precise: frame0: parent state regs= stack=:") @@ -422,12 +437,12 @@ __msg("17: (b7) r0 = 0") __msg("18: (95) exit") __msg("returning from callee:") __msg("to caller at 10:") -/* r4 (flags) is always precise for bpf_loop(), +/* r1, r4 are always precise for bpf_loop(), * fp-8 was marked before backtracking to callback body. */ -__msg("frame 0: propagating r4,fp-8") +__msg("frame 0: propagating r1,r4,fp-8") __msg("mark_precise: frame0: last_idx 10 first_idx 10 subseq_idx -1") -__msg("mark_precise: frame0: regs=r4 stack=-8 before 18: (95) exit") +__msg("mark_precise: frame0: regs=r1,r4 stack=-8 before 18: (95) exit") __msg("mark_precise: frame1: regs= stack= before 17: (b7) r0 = 0") __msg("mark_precise: frame1: regs= stack= before 10: (85) call bpf_loop#181") __msg("mark_precise: frame0: parent state regs= stack=:")
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eduard Zingerman eddyz87@gmail.com
commit 57e2a52deeb12ab84c15c6d0fb93638b5b94001b upstream.
Check that even if bpf_loop() callback simulation does not converge to a specific state, verification could proceed via "brute force" simulation of maximal number of callback calls.
Signed-off-by: Eduard Zingerman eddyz87@gmail.com Link: https://lore.kernel.org/r/20231121020701.26440-12-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c | 75 ++++++++++ 1 file changed, 75 insertions(+)
--- a/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c +++ b/tools/testing/selftests/bpf/progs/verifier_iterating_callbacks.c @@ -164,4 +164,79 @@ int unsafe_find_vma(void *unused) return choice_arr[loop_ctx.i]; }
+static int iter_limit_cb(__u32 idx, struct num_context *ctx) +{ + ctx->i++; + return 0; +} + +SEC("?raw_tp") +__success +int bpf_loop_iter_limit_ok(void *unused) +{ + struct num_context ctx = { .i = 0 }; + + bpf_loop(1, iter_limit_cb, &ctx, 0); + return choice_arr[ctx.i]; +} + +SEC("?raw_tp") +__failure __msg("invalid access to map value, value_size=2 off=2 size=1") +int bpf_loop_iter_limit_overflow(void *unused) +{ + struct num_context ctx = { .i = 0 }; + + bpf_loop(2, iter_limit_cb, &ctx, 0); + return choice_arr[ctx.i]; +} + +static int iter_limit_level2a_cb(__u32 idx, struct num_context *ctx) +{ + ctx->i += 100; + return 0; +} + +static int iter_limit_level2b_cb(__u32 idx, struct num_context *ctx) +{ + ctx->i += 10; + return 0; +} + +static int iter_limit_level1_cb(__u32 idx, struct num_context *ctx) +{ + ctx->i += 1; + bpf_loop(1, iter_limit_level2a_cb, ctx, 0); + bpf_loop(1, iter_limit_level2b_cb, ctx, 0); + return 0; +} + +/* Check that path visiting every callback function once had been + * reached by verifier. Variables 'ctx{1,2}i' below serve as flags, + * with each decimal digit corresponding to a callback visit marker. + */ +SEC("socket") +__success __retval(111111) +int bpf_loop_iter_limit_nested(void *unused) +{ + struct num_context ctx1 = { .i = 0 }; + struct num_context ctx2 = { .i = 0 }; + __u64 a, b, c; + + bpf_loop(1, iter_limit_level1_cb, &ctx1, 0); + bpf_loop(1, iter_limit_level1_cb, &ctx2, 0); + a = ctx1.i; + b = ctx2.i; + /* Force 'ctx1.i' and 'ctx2.i' precise. */ + c = choice_arr[(a + b) % 2]; + /* This makes 'c' zero, but neither clang nor verifier know it. */ + c /= 10; + /* Make sure that verifier does not visit 'impossible' states: + * enumerate all possible callback visit masks. + */ + if (a != 0 && a != 1 && a != 11 && a != 101 && a != 111 && + b != 0 && b != 1 && b != 11 && b != 101 && b != 111) + asm volatile ("r0 /= 0;" ::: "r0"); + return 1000 * a + b + c; +} + char _license[] SEC("license") = "GPL";
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jonathan Gray jsg@jsg.id.au
This reverts commit 847e6947afd3c46623172d2eabcfc2481ee8668e.
duplicated a change made in 6.6.5 49227bea27ebcd260f0c94a3055b14bbd8605c5e
Cc: stable@vger.kernel.org # 6.6 Signed-off-by: Jonathan Gray jsg@jsg.id.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2197,8 +2197,6 @@ retry_init:
pci_wake_from_d3(pdev, TRUE);
- pci_wake_from_d3(pdev, TRUE); - /* * For runpm implemented via BACO, PMFW will handle the * timing for BACO in and out:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
commit 5eef12c4e3230f2025dc46ad8c4a3bc19978e5d7 upstream.
The code to handle the case of server disabling multichannel was picking iface_lock with chan_lock held. This goes against the lock ordering rules, as iface_lock is a higher order lock (even if it isn't so obvious).
This change fixes the lock ordering by doing the following in that order for each secondary channel: 1. store iface and server pointers in local variable 2. remove references to iface and server in channels 3. unlock chan_lock 4. lock iface_lock 5. dec ref count for iface 6. unlock iface_lock 7. dec ref count for server 8. lock chan_lock again
Since this function can only be called in smb2_reconnect, and that cannot be called by two parallel processes, we should not have races due to dropping chan_lock between steps 3 and 8.
Fixes: ee1d21794e55 ("cifs: handle when server stops supporting multichannel") Reported-by: Paulo Alcantara pc@manguebit.com Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/sess.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/fs/smb/client/sess.c +++ b/fs/smb/client/sess.c @@ -316,28 +316,32 @@ cifs_disable_secondary_channels(struct c iface = ses->chans[i].iface; server = ses->chans[i].server;
+ /* + * remove these references first, since we need to unlock + * the chan_lock here, since iface_lock is a higher lock + */ + ses->chans[i].iface = NULL; + ses->chans[i].server = NULL; + spin_unlock(&ses->chan_lock); + if (iface) { spin_lock(&ses->iface_lock); kref_put(&iface->refcount, release_iface); - ses->chans[i].iface = NULL; iface->num_channels--; if (iface->weight_fulfilled) iface->weight_fulfilled--; spin_unlock(&ses->iface_lock); }
- spin_unlock(&ses->chan_lock); - if (server && !server->terminate) { - server->terminate = true; - cifs_signal_cifsd_for_reconnect(server, false); - } - spin_lock(&ses->chan_lock); - if (server) { - ses->chans[i].server = NULL; + if (!server->terminate) { + server->terminate = true; + cifs_signal_cifsd_for_reconnect(server, false); + } cifs_put_tcp_session(server, false); }
+ spin_lock(&ses->chan_lock); }
done:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
commit f30bbc38704e279c06d073ecb18fea376791ecab upstream.
The following commit reverted the changes to ref count the server struct while scheduling a reconnect work: 823342524868 Revert "cifs: reconnect work should have reference on server struct"
However, a following change also introduced scheduling of reconnect work, and assumed ref counting. This change fixes that as well.
Fixes umount problems like:
[73496.157838] CPU: 5 PID: 1321389 Comm: umount Tainted: G W OE 6.7.0-060700rc6-generic #202312172332 [73496.157841] Hardware name: LENOVO 20MAS08500/20MAS08500, BIOS N2CET67W (1.50 ) 12/15/2022 [73496.157843] RIP: 0010:cifs_put_tcp_session+0x17d/0x190 [cifs] [73496.157906] Code: 5d 31 c0 31 d2 31 f6 31 ff c3 cc cc cc cc e8 4a 6e 14 e6 e9 f6 fe ff ff be 03 00 00 00 48 89 d7 e8 78 26 b3 e5 e9 e4 fe ff ff <0f> 0b e9 b1 fe ff ff 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 [73496.157908] RSP: 0018:ffffc90003bcbcb8 EFLAGS: 00010286 [73496.157911] RAX: 00000000ffffffff RBX: ffff8885830fa800 RCX: 0000000000000000 [73496.157913] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [73496.157915] RBP: ffffc90003bcbcc8 R08: 0000000000000000 R09: 0000000000000000 [73496.157917] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 [73496.157918] R13: ffff8887d56ba800 R14: 00000000ffffffff R15: ffff8885830fa800 [73496.157920] FS: 00007f1ff0e33800(0000) GS:ffff88887ba80000(0000) knlGS:0000000000000000 [73496.157922] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [73496.157924] CR2: 0000115f002e2010 CR3: 00000003d1e24005 CR4: 00000000003706f0 [73496.157926] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [73496.157928] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [73496.157929] Call Trace: [73496.157931] <TASK> [73496.157933] ? show_regs+0x6d/0x80 [73496.157936] ? __warn+0x89/0x160 [73496.157939] ? cifs_put_tcp_session+0x17d/0x190 [cifs] [73496.157976] ? report_bug+0x17e/0x1b0 [73496.157980] ? handle_bug+0x51/0xa0 [73496.157983] ? exc_invalid_op+0x18/0x80 [73496.157985] ? asm_exc_invalid_op+0x1b/0x20 [73496.157989] ? cifs_put_tcp_session+0x17d/0x190 [cifs] [73496.158023] ? cifs_put_tcp_session+0x1e/0x190 [cifs] [73496.158057] __cifs_put_smb_ses+0x2b5/0x540 [cifs] [73496.158090] ? tconInfoFree+0xc2/0x120 [cifs] [73496.158130] cifs_put_tcon.part.0+0x108/0x2b0 [cifs] [73496.158173] cifs_put_tlink+0x49/0x90 [cifs] [73496.158220] cifs_umount+0x56/0xb0 [cifs] [73496.158258] cifs_kill_sb+0x52/0x60 [cifs] [73496.158306] deactivate_locked_super+0x32/0xc0 [73496.158309] deactivate_super+0x46/0x60 [73496.158311] cleanup_mnt+0xc3/0x170 [73496.158314] __cleanup_mnt+0x12/0x20 [73496.158330] task_work_run+0x5e/0xa0 [73496.158333] exit_to_user_mode_loop+0x105/0x130 [73496.158336] exit_to_user_mode_prepare+0xa5/0xb0 [73496.158338] syscall_exit_to_user_mode+0x29/0x60 [73496.158341] do_syscall_64+0x6c/0xf0 [73496.158344] ? syscall_exit_to_user_mode+0x37/0x60 [73496.158346] ? do_syscall_64+0x6c/0xf0 [73496.158349] ? exit_to_user_mode_prepare+0x30/0xb0 [73496.158353] ? syscall_exit_to_user_mode+0x37/0x60 [73496.158355] ? do_syscall_64+0x6c/0xf0
Reported-by: Robert Morris rtm@csail.mit.edu Fixes: 705fc522fe9d ("cifs: handle when server starts supporting multichannel") Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/smb2pdu.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -440,8 +440,7 @@ skip_sess_setup: skip_add_channels:
if (smb2_command != SMB2_INTERNAL_CMD) - if (mod_delayed_work(cifsiod_wq, &server->reconnect, 0)) - cifs_put_tcp_session(server, false); + mod_delayed_work(cifsiod_wq, &server->reconnect, 0);
atomic_inc(&tconInfoReconnectCount); out:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
commit 27e1fd343f80168ff456785c2443136b6b7ca3cc upstream.
Once the server disables multichannel for an active multichannel session, on the following reconnect, the client would reduce the number of channels to 1. However, it could be the case that the tree connect was active on one of these disabled channels. This results in an unrecoverable state.
This change fixes that by making sure that whenever a channel is being terminated, the session and tcon are marked for reconnect too. This could mean a few redundant tree connect calls to the server, but considering that this is not a frequent event, we should be okay.
Fixes: ee1d21794e55 ("cifs: handle when server stops supporting multichannel") Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/client/connect.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
--- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -212,17 +212,21 @@ cifs_mark_tcp_ses_conns_for_reconnect(st /* If server is a channel, select the primary channel */ pserver = SERVER_IS_CHAN(server) ? server->primary_server : server;
+ /* + * if the server has been marked for termination, there is a + * chance that the remaining channels all need reconnect. To be + * on the safer side, mark the session and trees for reconnect + * for this scenario. This might cause a few redundant session + * setup and tree connect requests, but it is better than not doing + * a tree connect when needed, and all following requests failing + */ + if (server->terminate) { + mark_smb_session = true; + server = pserver; + }
spin_lock(&cifs_tcp_ses_lock); list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) { - /* - * if channel has been marked for termination, nothing to do - * for the channel. in fact, we cannot find the channel for the - * server. So safe to exit here - */ - if (server->terminate) - break; - /* check if iface is still active */ spin_lock(&ses->chan_lock); if (!cifs_chan_is_iface_active(ses, server)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lucas Stach l.stach@pengutronix.de
[ Upstream commit 1d9cabe2817edd215779dc9c2fe5e7ab9aac0704 ]
Use the proper size when setting up the bio_vec, as otherwise only zero-length UDP packets will be sent.
Fixes: baabf59c2414 ("SUNRPC: Convert svc_udp_sendto() to use the per-socket bio_vec array") Signed-off-by: Lucas Stach l.stach@pengutronix.de Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/sunrpc/svcsock.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sunrpc/svcsock.c b/net/sunrpc/svcsock.c index 998687421fa6..e0ce4276274b 100644 --- a/net/sunrpc/svcsock.c +++ b/net/sunrpc/svcsock.c @@ -717,12 +717,12 @@ static int svc_udp_sendto(struct svc_rqst *rqstp) ARRAY_SIZE(rqstp->rq_bvec), xdr);
iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, - count, 0); + count, rqstp->rq_res.len); err = sock_sendmsg(svsk->sk_sock, &msg); if (err == -ECONNREFUSED) { /* ICMP error on earlier request. */ iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, rqstp->rq_bvec, - count, 0); + count, rqstp->rq_res.len); err = sock_sendmsg(svsk->sk_sock, &msg); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Johannes Berg johannes.berg@intel.com
[ Upstream commit b01a74b3ca6fd51b62c67733ba7c3280fa6c5d26 ]
When a station is allocated, links are added but not set to valid yet (e.g. during connection to an AP MLD), we might remove the station without ever marking links valid, and leak them. Fix that.
Fixes: cb71f1d136a6 ("wifi: mac80211: add sta link addition/removal") Signed-off-by: Johannes Berg johannes.berg@intel.com Reviewed-by: Ilan Peer ilan.peer@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240111181514.6573998beaf8.I09ac2e1d41c80f82a5a616b8bd1d... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/mac80211/sta_info.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index 0c5cc75857e4..e112300caaf7 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -398,7 +398,10 @@ void sta_info_free(struct ieee80211_local *local, struct sta_info *sta) int i;
for (i = 0; i < ARRAY_SIZE(sta->link); i++) { - if (!(sta->sta.valid_links & BIT(i))) + struct link_sta_info *link_sta; + + link_sta = rcu_access_pointer(sta->link[i]); + if (!link_sta) continue;
sta_remove_link(sta, i, false);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wen Gu guwen@linux.alibaba.com
[ Upstream commit dbc153fd3c142909e564bb256da087e13fbf239c ]
A crash was found when dumping SMC-D connections. It can be reproduced by following steps:
- run nginx/wrk test: smc_run nginx smc_run wrk -t 16 -c 1000 -d <duration> -H 'Connection: Close' <URL>
- continuously dump SMC-D connections in parallel: watch -n 1 'smcss -D'
BUG: kernel NULL pointer dereference, address: 0000000000000030 CPU: 2 PID: 7204 Comm: smcss Kdump: loaded Tainted: G E 6.7.0+ #55 RIP: 0010:__smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag] Call Trace: <TASK> ? __die+0x24/0x70 ? page_fault_oops+0x66/0x150 ? exc_page_fault+0x69/0x140 ? asm_exc_page_fault+0x26/0x30 ? __smc_diag_dump.constprop.0+0x5e5/0x620 [smc_diag] ? __kmalloc_node_track_caller+0x35d/0x430 ? __alloc_skb+0x77/0x170 smc_diag_dump_proto+0xd0/0xf0 [smc_diag] smc_diag_dump+0x26/0x60 [smc_diag] netlink_dump+0x19f/0x320 __netlink_dump_start+0x1dc/0x300 smc_diag_handler_dump+0x6a/0x80 [smc_diag] ? __pfx_smc_diag_dump+0x10/0x10 [smc_diag] sock_diag_rcv_msg+0x121/0x140 ? __pfx_sock_diag_rcv_msg+0x10/0x10 netlink_rcv_skb+0x5a/0x110 sock_diag_rcv+0x28/0x40 netlink_unicast+0x22a/0x330 netlink_sendmsg+0x1f8/0x420 __sock_sendmsg+0xb0/0xc0 ____sys_sendmsg+0x24e/0x300 ? copy_msghdr_from_user+0x62/0x80 ___sys_sendmsg+0x7c/0xd0 ? __do_fault+0x34/0x160 ? do_read_fault+0x5f/0x100 ? do_fault+0xb0/0x110 ? __handle_mm_fault+0x2b0/0x6c0 __sys_sendmsg+0x4d/0x80 do_syscall_64+0x69/0x180 entry_SYSCALL_64_after_hwframe+0x6e/0x76
It is possible that the connection is in process of being established when we dump it. Assumed that the connection has been registered in a link group by smc_conn_create() but the rmb_desc has not yet been initialized by smc_buf_create(), thus causing the illegal access to conn->rmb_desc. So fix it by checking before dump.
Fixes: 4b1b7d3b30a6 ("net/smc: add SMC-D diag support") Signed-off-by: Wen Gu guwen@linux.alibaba.com Reviewed-by: Dust Li dust.li@linux.alibaba.com Reviewed-by: Wenjia Zhang wenjia@linux.ibm.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/smc/smc_diag.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/smc/smc_diag.c b/net/smc/smc_diag.c index 2c464d76b06c..37833b96b508 100644 --- a/net/smc/smc_diag.c +++ b/net/smc/smc_diag.c @@ -163,7 +163,7 @@ static int __smc_diag_dump(struct sock *sk, struct sk_buff *skb, } if (smc_conn_lgr_valid(&smc->conn) && smc->conn.lgr->is_smcd && (req->diag_ext & (1 << (SMC_DIAG_DMBINFO - 1))) && - !list_empty(&smc->conn.lgr->list)) { + !list_empty(&smc->conn.lgr->list) && smc->conn.rmb_desc) { struct smc_connection *conn = &smc->conn; struct smcd_diag_dmbinfo dinfo; struct smcd_dev *smcd = conn->lgr->smcd;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Benjamin Poirier bpoirier@nvidia.com
[ Upstream commit b01f15a7571b7aa222458bc9bf26ab59bd84e384 ]
When tests are run by runner.sh, bond_options.sh gets killed before it can complete:
make -C tools/testing/selftests run_tests TARGETS="drivers/net/bonding" [...] # timeout set to 120 # selftests: drivers/net/bonding: bond_options.sh # TEST: prio (active-backup miimon primary_reselect 0) [ OK ] # TEST: prio (active-backup miimon primary_reselect 1) [ OK ] # TEST: prio (active-backup miimon primary_reselect 2) [ OK ] # TEST: prio (active-backup arp_ip_target primary_reselect 0) [ OK ] # TEST: prio (active-backup arp_ip_target primary_reselect 1) [ OK ] # TEST: prio (active-backup arp_ip_target primary_reselect 2) [ OK ] # not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds
This test includes many sleep statements, at least some of which are related to timers in the operation of the bonding driver itself. Increase the test timeout to allow the test to complete.
I ran the test in slightly different VMs (including one without HW virtualization support) and got runtimes of 13m39.760s, 13m31.238s, and 13m2.956s. Use a ~1.5x "safety factor" and set the timeout to 1200s.
Fixes: 42a8d4aaea84 ("selftests: bonding: add bonding prio option test") Reported-by: Jakub Kicinski kuba@kernel.org Closes: https://lore.kernel.org/netdev/20240116104402.1203850a@kernel.org/#t Suggested-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Benjamin Poirier bpoirier@nvidia.com Reviewed-by: Hangbin Liu liuhangbin@gmail.com Link: https://lore.kernel.org/r/20240118001233.304759-1-bpoirier@nvidia.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/drivers/net/bonding/settings | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/settings b/tools/testing/selftests/drivers/net/bonding/settings index 6091b45d226b..79b65bdf05db 100644 --- a/tools/testing/selftests/drivers/net/bonding/settings +++ b/tools/testing/selftests/drivers/net/bonding/settings @@ -1 +1 @@ -timeout=120 +timeout=1200
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 198bc90e0e734e5f98c3d2833e8390cac3df61b2 ]
When I run syz's reproduction C program locally, it causes the following issue: pvqspinlock: lock 0xffff9d181cd5c660 has corrupted value 0x0! WARNING: CPU: 19 PID: 21160 at __pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 RIP: 0010:__pv_queued_spin_unlock_slowpath (kernel/locking/qspinlock_paravirt.h:508) Code: 73 56 3a ff 90 c3 cc cc cc cc 8b 05 bb 1f 48 01 85 c0 74 05 c3 cc cc cc cc 8b 17 48 89 fe 48 c7 c7 30 20 ce 8f e8 ad 56 42 ff <0f> 0b c3 cc cc cc cc 0f 0b 0f 1f 40 00 90 90 90 90 90 90 90 90 90 RSP: 0018:ffffa8d200604cb8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffff9d1ef60e0908 RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9d1ef60e0900 RBP: ffff9d181cd5c280 R08: 0000000000000000 R09: 00000000ffff7fff R10: ffffa8d200604b68 R11: ffffffff907dcdc8 R12: 0000000000000000 R13: ffff9d181cd5c660 R14: ffff9d1813a3f330 R15: 0000000000001000 FS: 00007fa110184640(0000) GS:ffff9d1ef60c0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000020000000 CR3: 000000011f65e000 CR4: 00000000000006f0 Call Trace: <IRQ> _raw_spin_unlock (kernel/locking/spinlock.c:186) inet_csk_reqsk_queue_add (net/ipv4/inet_connection_sock.c:1321) inet_csk_complete_hashdance (net/ipv4/inet_connection_sock.c:1358) tcp_check_req (net/ipv4/tcp_minisocks.c:868) tcp_v4_rcv (net/ipv4/tcp_ipv4.c:2260) ip_protocol_deliver_rcu (net/ipv4/ip_input.c:205) ip_local_deliver_finish (net/ipv4/ip_input.c:234) __netif_receive_skb_one_core (net/core/dev.c:5529) process_backlog (./include/linux/rcupdate.h:779) __napi_poll (net/core/dev.c:6533) net_rx_action (net/core/dev.c:6604) __do_softirq (./arch/x86/include/asm/jump_label.h:27) do_softirq (kernel/softirq.c:454 kernel/softirq.c:441) </IRQ> <TASK> __local_bh_enable_ip (kernel/softirq.c:381) __dev_queue_xmit (net/core/dev.c:4374) ip_finish_output2 (./include/net/neighbour.h:540 net/ipv4/ip_output.c:235) __ip_queue_xmit (net/ipv4/ip_output.c:535) __tcp_transmit_skb (net/ipv4/tcp_output.c:1462) tcp_rcv_synsent_state_process (net/ipv4/tcp_input.c:6469) tcp_rcv_state_process (net/ipv4/tcp_input.c:6657) tcp_v4_do_rcv (net/ipv4/tcp_ipv4.c:1929) __release_sock (./include/net/sock.h:1121 net/core/sock.c:2968) release_sock (net/core/sock.c:3536) inet_wait_for_connect (net/ipv4/af_inet.c:609) __inet_stream_connect (net/ipv4/af_inet.c:702) inet_stream_connect (net/ipv4/af_inet.c:748) __sys_connect (./include/linux/file.h:45 net/socket.c:2064) __x64_sys_connect (net/socket.c:2073 net/socket.c:2070 net/socket.c:2070) do_syscall_64 (arch/x86/entry/common.c:51 arch/x86/entry/common.c:82) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7fa10ff05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007fa110183de8 EFLAGS: 00000202 ORIG_RAX: 000000000000002a RAX: ffffffffffffffda RBX: 0000000020000054 RCX: 00007fa10ff05a3d RDX: 000000000000001c RSI: 0000000020000040 RDI: 0000000000000003 RBP: 00007fa110183e20 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000202 R12: 00007fa110184640 R13: 0000000000000000 R14: 00007fa10fe8b060 R15: 00007fff73e23b20 </TASK>
The issue triggering process is analyzed as follows: Thread A Thread B tcp_v4_rcv //receive ack TCP packet inet_shutdown tcp_check_req tcp_disconnect //disconnect sock ... tcp_set_state(sk, TCP_CLOSE) inet_csk_complete_hashdance ... inet_csk_reqsk_queue_add inet_listen //start listen spin_lock(&queue->rskq_lock) inet_csk_listen_start ... reqsk_queue_alloc ... spin_lock_init spin_unlock(&queue->rskq_lock) //warning
When the socket receives the ACK packet during the three-way handshake, it will hold spinlock. And then the user actively shutdowns the socket and listens to the socket immediately, the spinlock will be initialized. When the socket is going to release the spinlock, a warning is generated. Also the same issue to fastopenq.lock.
Move init spinlock to inet_create and inet_accept to make sure init the accept_queue's spinlocks once.
Fixes: fff1f3001cc5 ("tcp: add a spinlock to protect struct request_sock_queue") Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path") Reported-by: Ming Shu sming56@aliyun.com Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240118012019.1751966-1-shaozhengchao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/inet_connection_sock.h | 8 ++++++++ net/core/request_sock.c | 3 --- net/ipv4/af_inet.c | 3 +++ net/ipv4/inet_connection_sock.c | 4 ++++ 4 files changed, 15 insertions(+), 3 deletions(-)
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h index 5d2fcc137b88..01a73bf74fa1 100644 --- a/include/net/inet_connection_sock.h +++ b/include/net/inet_connection_sock.h @@ -347,4 +347,12 @@ static inline bool inet_csk_has_ulp(const struct sock *sk) return inet_test_bit(IS_ICSK, sk) && !!inet_csk(sk)->icsk_ulp_ops; }
+static inline void inet_init_csk_locks(struct sock *sk) +{ + struct inet_connection_sock *icsk = inet_csk(sk); + + spin_lock_init(&icsk->icsk_accept_queue.rskq_lock); + spin_lock_init(&icsk->icsk_accept_queue.fastopenq.lock); +} + #endif /* _INET_CONNECTION_SOCK_H */ diff --git a/net/core/request_sock.c b/net/core/request_sock.c index f35c2e998406..63de5c635842 100644 --- a/net/core/request_sock.c +++ b/net/core/request_sock.c @@ -33,9 +33,6 @@
void reqsk_queue_alloc(struct request_sock_queue *queue) { - spin_lock_init(&queue->rskq_lock); - - spin_lock_init(&queue->fastopenq.lock); queue->fastopenq.rskq_rst_head = NULL; queue->fastopenq.rskq_rst_tail = NULL; queue->fastopenq.qlen = 0; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index b0a5de1303b5..b739ddbef0f0 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -330,6 +330,9 @@ static int inet_create(struct net *net, struct socket *sock, int protocol, if (INET_PROTOSW_REUSE & answer_flags) sk->sk_reuse = SK_CAN_REUSE;
+ if (INET_PROTOSW_ICSK & answer_flags) + inet_init_csk_locks(sk); + inet = inet_sk(sk); inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c index 394a498c2823..762817d6c8d7 100644 --- a/net/ipv4/inet_connection_sock.c +++ b/net/ipv4/inet_connection_sock.c @@ -730,6 +730,10 @@ struct sock *inet_csk_accept(struct sock *sk, int flags, int *err, bool kern) } if (req) reqsk_put(req); + + if (newsk) + inet_init_csk_locks(newsk); + return newsk; out_err: newsk = NULL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit 3c1069fa42872f95cf3c6fedf80723d391e12d57 ]
The first message to firmware may fail if the device is undergoing FLR. The driver has some recovery logic for this failure scenario but we must wait 100 msec for FLR to complete before proceeding. Otherwise the recovery will always fail.
Fixes: ba02629ff6cb ("bnxt_en: log firmware status on firmware init failure") Reviewed-by: Damodharam Ammepalli damodharam.ammepalli@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20240117234515.226944-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 6039886a8544..9e04db1273a5 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -12261,6 +12261,11 @@ static int bnxt_fw_init_one_p1(struct bnxt *bp)
bp->fw_cap = 0; rc = bnxt_hwrm_ver_get(bp); + /* FW may be unresponsive after FLR. FLR must complete within 100 msec + * so wait before continuing with recovery. + */ + if (rc) + msleep(100); bnxt_try_map_fw_health_reg(bp); if (rc) { rc = bnxt_try_recover_fw(bp);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Chan michael.chan@broadcom.com
[ Upstream commit c20f482129a582455f02eb9a6dcb2a4215274599 ]
We call bnxt_half_open_nic() to setup the chip partially to run loopback tests. The rings and buffers are initialized normally so that we can transmit and receive packets in loopback mode. That means page pool buffers are allocated for the aggregation ring just like the normal case. NAPI is not needed because we are just polling for the loopback packets.
When we're done with the loopback tests, we call bnxt_half_close_nic() to clean up. When freeing the page pools, we hit a WARN_ON() in page_pool_unlink_napi() because the NAPI state linked to the page pool is uninitialized.
The simplest way to avoid this warning is just to initialize the NAPIs during half open and delete the NAPIs during half close. Trying to skip the page pool initialization or skip linking of NAPI during half open will be more complicated.
This fix avoids this warning:
WARNING: CPU: 4 PID: 46967 at net/core/page_pool.c:946 page_pool_unlink_napi+0x1f/0x30 CPU: 4 PID: 46967 Comm: ethtool Tainted: G S W 6.7.0-rc5+ #22 Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.3.8 08/31/2021 RIP: 0010:page_pool_unlink_napi+0x1f/0x30 Code: 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 8b 47 18 48 85 c0 74 1b 48 8b 50 10 83 e2 01 74 08 8b 40 34 83 f8 ff 74 02 <0f> 0b 48 c7 47 18 00 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 RSP: 0018:ffa000003d0dfbe8 EFLAGS: 00010246 RAX: ff110003607ce640 RBX: ff110010baf5d000 RCX: 0000000000000008 RDX: 0000000000000000 RSI: ff110001e5e522c0 RDI: ff110010baf5d000 RBP: ff11000145539b40 R08: 0000000000000001 R09: ffffffffc063f641 R10: ff110001361eddb8 R11: 000000000040000f R12: 0000000000000001 R13: 000000000000001c R14: ff1100014553a080 R15: 0000000000003fc0 FS: 00007f9301c4f740(0000) GS:ff1100103fd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91344fa8f0 CR3: 00000003527cc005 CR4: 0000000000771ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> ? __warn+0x81/0x140 ? page_pool_unlink_napi+0x1f/0x30 ? report_bug+0x102/0x200 ? handle_bug+0x44/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? bnxt_free_ring.isra.123+0xb1/0xd0 [bnxt_en] ? page_pool_unlink_napi+0x1f/0x30 page_pool_destroy+0x3e/0x150 bnxt_free_mem+0x441/0x5e0 [bnxt_en] bnxt_half_close_nic+0x2a/0x40 [bnxt_en] bnxt_self_test+0x21d/0x450 [bnxt_en] __dev_ethtool+0xeda/0x2e30 ? native_queued_spin_lock_slowpath+0x17f/0x2b0 ? __link_object+0xa1/0x160 ? _raw_spin_unlock_irqrestore+0x23/0x40 ? __create_object+0x5f/0x90 ? __kmem_cache_alloc_node+0x317/0x3c0 ? dev_ethtool+0x59/0x170 dev_ethtool+0xa7/0x170 dev_ioctl+0xc3/0x530 sock_do_ioctl+0xa8/0xf0 sock_ioctl+0x270/0x310 __x64_sys_ioctl+0x8c/0xc0 do_syscall_64+0x3e/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76
Fixes: 294e39e0d034 ("bnxt: hook NAPIs to page pools") Reviewed-by: Andy Gospodarek andrew.gospodarek@broadcom.com Reviewed-by: Ajit Khaparde ajit.khaparde@broadcom.com Signed-off-by: Michael Chan michael.chan@broadcom.com Link: https://lore.kernel.org/r/20240117234515.226944-5-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/broadcom/bnxt/bnxt.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/broadcom/bnxt/bnxt.c b/drivers/net/ethernet/broadcom/bnxt/bnxt.c index 9e04db1273a5..dac4f9510c17 100644 --- a/drivers/net/ethernet/broadcom/bnxt/bnxt.c +++ b/drivers/net/ethernet/broadcom/bnxt/bnxt.c @@ -10598,10 +10598,12 @@ int bnxt_half_open_nic(struct bnxt *bp) netdev_err(bp->dev, "bnxt_alloc_mem err: %x\n", rc); goto half_open_err; } + bnxt_init_napi(bp); set_bit(BNXT_STATE_HALF_OPEN, &bp->state); rc = bnxt_init_nic(bp, true); if (rc) { clear_bit(BNXT_STATE_HALF_OPEN, &bp->state); + bnxt_del_napi(bp); netdev_err(bp->dev, "bnxt_init_nic err: %x\n", rc); goto half_open_err; } @@ -10620,6 +10622,7 @@ int bnxt_half_open_nic(struct bnxt *bp) void bnxt_half_close_nic(struct bnxt *bp) { bnxt_hwrm_resource_free(bp, false, true); + bnxt_del_napi(bp); bnxt_free_skbs(bp); bnxt_free_mem(bp, true); clear_bit(BNXT_STATE_HALF_OPEN, &bp->state);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lin Ma linma@zju.edu.cn
[ Upstream commit 6c21660fe221a15c789dee2bc2fd95516bc5aeaf ]
In the vlan_changelink function, a loop is used to parse the nested attributes IFLA_VLAN_EGRESS_QOS and IFLA_VLAN_INGRESS_QOS in order to obtain the struct ifla_vlan_qos_mapping. These two nested attributes are checked in the vlan_validate_qos_map function, which calls nla_validate_nested_deprecated with the vlan_map_policy.
However, this deprecated validator applies a LIBERAL strictness, allowing the presence of an attribute with the type IFLA_VLAN_QOS_UNSPEC. Consequently, the loop in vlan_changelink may parse an attribute of type IFLA_VLAN_QOS_UNSPEC and believe it carries a payload of struct ifla_vlan_qos_mapping, which is not necessarily true.
To address this issue and ensure compatibility, this patch introduces two type checks that skip attributes whose type is not IFLA_VLAN_QOS_MAPPING.
Fixes: 07b5b17e157b ("[VLAN]: Use rtnl_link API") Signed-off-by: Lin Ma linma@zju.edu.cn Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240118130306.1644001-1-linma@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/8021q/vlan_netlink.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/8021q/vlan_netlink.c b/net/8021q/vlan_netlink.c index 214532173536..a3b68243fd4b 100644 --- a/net/8021q/vlan_netlink.c +++ b/net/8021q/vlan_netlink.c @@ -118,12 +118,16 @@ static int vlan_changelink(struct net_device *dev, struct nlattr *tb[], } if (data[IFLA_VLAN_INGRESS_QOS]) { nla_for_each_nested(attr, data[IFLA_VLAN_INGRESS_QOS], rem) { + if (nla_type(attr) != IFLA_VLAN_QOS_MAPPING) + continue; m = nla_data(attr); vlan_dev_set_ingress_priority(dev, m->to, m->from); } } if (data[IFLA_VLAN_EGRESS_QOS]) { nla_for_each_nested(attr, data[IFLA_VLAN_EGRESS_QOS], rem) { + if (nla_type(attr) != IFLA_VLAN_QOS_MAPPING) + continue; m = nla_data(attr); err = vlan_dev_set_egress_priority(dev, m->from, m->to); if (err)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit dad555c816a50c6a6a8a86be1f9177673918c647 ]
syzbot was able to trick llc_ui_sendmsg(), allocating an skb with no headroom, but subsequently trying to push 14 bytes of Ethernet header [1]
Like some others, llc_ui_sendmsg() releases the socket lock before calling sock_alloc_send_skb(). Then it acquires it again, but does not redo all the sanity checks that were performed.
This fix:
- Uses LL_RESERVED_SPACE() to reserve space. - Check all conditions again after socket lock is held again. - Do not account Ethernet header for mtu limitation.
[1]
skbuff: skb_under_panic: text:ffff800088baa334 len:1514 put:14 head:ffff0000c9c37000 data:ffff0000c9c36ff2 tail:0x5dc end:0x6c0 dev:bond0
kernel BUG at net/core/skbuff.c:193 ! Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP Modules linked in: CPU: 0 PID: 6875 Comm: syz-executor.0 Not tainted 6.7.0-rc8-syzkaller-00101-g0802e17d9aca-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023 pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_panic net/core/skbuff.c:189 [inline] pc : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 lr : skb_panic net/core/skbuff.c:189 [inline] lr : skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 sp : ffff800096f97000 x29: ffff800096f97010 x28: ffff80008cc8d668 x27: dfff800000000000 x26: ffff0000cb970c90 x25: 00000000000005dc x24: ffff0000c9c36ff2 x23: ffff0000c9c37000 x22: 00000000000005ea x21: 00000000000006c0 x20: 000000000000000e x19: ffff800088baa334 x18: 1fffe000368261ce x17: ffff80008e4ed000 x16: ffff80008a8310f8 x15: 0000000000000001 x14: 1ffff00012df2d58 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000ff0100 x9 : e28a51f1087e8400 x8 : e28a51f1087e8400 x7 : ffff80008028f8d0 x6 : 0000000000000000 x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff800082b78714 x2 : 0000000000000001 x1 : 0000000100000000 x0 : 0000000000000089 Call trace: skb_panic net/core/skbuff.c:189 [inline] skb_under_panic+0x13c/0x140 net/core/skbuff.c:203 skb_push+0xf0/0x108 net/core/skbuff.c:2451 eth_header+0x44/0x1f8 net/ethernet/eth.c:83 dev_hard_header include/linux/netdevice.h:3188 [inline] llc_mac_hdr_init+0x110/0x17c net/llc/llc_output.c:33 llc_sap_action_send_xid_c+0x170/0x344 net/llc/llc_s_ac.c:85 llc_exec_sap_trans_actions net/llc/llc_sap.c:153 [inline] llc_sap_next_state net/llc/llc_sap.c:182 [inline] llc_sap_state_process+0x1ec/0x774 net/llc/llc_sap.c:209 llc_build_and_send_xid_pkt+0x12c/0x1c0 net/llc/llc_sap.c:270 llc_ui_sendmsg+0x7bc/0xb1c net/llc/af_llc.c:997 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg net/socket.c:745 [inline] sock_sendmsg+0x194/0x274 net/socket.c:767 splice_to_socket+0x7cc/0xd58 fs/splice.c:881 do_splice_from fs/splice.c:933 [inline] direct_splice_actor+0xe4/0x1c0 fs/splice.c:1142 splice_direct_to_actor+0x2a0/0x7e4 fs/splice.c:1088 do_splice_direct+0x20c/0x348 fs/splice.c:1194 do_sendfile+0x4bc/0xc70 fs/read_write.c:1254 __do_sys_sendfile64 fs/read_write.c:1322 [inline] __se_sys_sendfile64 fs/read_write.c:1308 [inline] __arm64_sys_sendfile64+0x160/0x3b4 fs/read_write.c:1308 __invoke_syscall arch/arm64/kernel/syscall.c:37 [inline] invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:51 el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:136 do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:155 el0_svc+0x54/0x158 arch/arm64/kernel/entry-common.c:678 el0t_64_sync_handler+0x84/0xfc arch/arm64/kernel/entry-common.c:696 el0t_64_sync+0x190/0x194 arch/arm64/kernel/entry.S:595 Code: aa1803e6 aa1903e7 a90023f5 94792f6a (d4210000)
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-and-tested-by: syzbot+2a7024e9502df538e8ef@syzkaller.appspotmail.com Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/r/20240118183625.4007013-1-edumazet@google.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/llc/af_llc.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-)
diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c index 9b06c380866b..20551cfb7da6 100644 --- a/net/llc/af_llc.c +++ b/net/llc/af_llc.c @@ -928,14 +928,15 @@ static int llc_ui_recvmsg(struct socket *sock, struct msghdr *msg, size_t len, */ static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) { + DECLARE_SOCKADDR(struct sockaddr_llc *, addr, msg->msg_name); struct sock *sk = sock->sk; struct llc_sock *llc = llc_sk(sk); - DECLARE_SOCKADDR(struct sockaddr_llc *, addr, msg->msg_name); int flags = msg->msg_flags; int noblock = flags & MSG_DONTWAIT; + int rc = -EINVAL, copied = 0, hdrlen, hh_len; struct sk_buff *skb = NULL; + struct net_device *dev; size_t size = 0; - int rc = -EINVAL, copied = 0, hdrlen;
dprintk("%s: sending from %02X to %02X\n", __func__, llc->laddr.lsap, llc->daddr.lsap); @@ -955,22 +956,29 @@ static int llc_ui_sendmsg(struct socket *sock, struct msghdr *msg, size_t len) if (rc) goto out; } - hdrlen = llc->dev->hard_header_len + llc_ui_header_len(sk, addr); + dev = llc->dev; + hh_len = LL_RESERVED_SPACE(dev); + hdrlen = llc_ui_header_len(sk, addr); size = hdrlen + len; - if (size > llc->dev->mtu) - size = llc->dev->mtu; + size = min_t(size_t, size, READ_ONCE(dev->mtu)); copied = size - hdrlen; rc = -EINVAL; if (copied < 0) goto out; release_sock(sk); - skb = sock_alloc_send_skb(sk, size, noblock, &rc); + skb = sock_alloc_send_skb(sk, hh_len + size, noblock, &rc); lock_sock(sk); if (!skb) goto out; - skb->dev = llc->dev; + if (sock_flag(sk, SOCK_ZAPPED) || + llc->dev != dev || + hdrlen != llc_ui_header_len(sk, addr) || + hh_len != LL_RESERVED_SPACE(dev) || + size > READ_ONCE(dev->mtu)) + goto out; + skb->dev = dev; skb->protocol = llc_proto_type(addr->sllc_arphrd); - skb_reserve(skb, hdrlen); + skb_reserve(skb, hh_len + hdrlen); rc = memcpy_from_msg(skb_put(skb, copied), msg, copied); if (rc) goto out;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit e3f9bed9bee261e3347131764e42aeedf1ffea61 ]
syzbot reported an uninit-value bug below. [0]
llc supports ETH_P_802_2 (0x0004) and used to support ETH_P_TR_802_2 (0x0011), and syzbot abused the latter to trigger the bug.
write$tun(r0, &(0x7f0000000040)={@val={0x0, 0x11}, @val, @mpls={[], @llc={@snap={0xaa, 0x1, ')', "90e5dd"}}}}, 0x16)
llc_conn_handler() initialises local variables {saddr,daddr}.mac based on skb in llc_pdu_decode_sa()/llc_pdu_decode_da() and passes them to __llc_lookup().
However, the initialisation is done only when skb->protocol is htons(ETH_P_802_2), otherwise, __llc_lookup_established() and __llc_lookup_listener() will read garbage.
The missing initialisation existed prior to commit 211ed865108e ("net: delete all instances of special processing for token ring").
It removed the part to kick out the token ring stuff but forgot to close the door allowing ETH_P_TR_802_2 packets to sneak into llc_rcv().
Let's remove llc_tr_packet_type and complete the deprecation.
[0]: BUG: KMSAN: uninit-value in __llc_lookup_established+0xe9d/0xf90 __llc_lookup_established+0xe9d/0xf90 __llc_lookup net/llc/llc_conn.c:611 [inline] llc_conn_handler+0x4bd/0x1360 net/llc/llc_conn.c:791 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206 __netif_receive_skb_one_core net/core/dev.c:5527 [inline] __netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5641 netif_receive_skb_internal net/core/dev.c:5727 [inline] netif_receive_skb+0x58/0x660 net/core/dev.c:5786 tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555 tun_get_user+0x53af/0x66d0 drivers/net/tun.c:2002 tun_chr_write_iter+0x3af/0x5d0 drivers/net/tun.c:2048 call_write_iter include/linux/fs.h:2020 [inline] new_sync_write fs/read_write.c:491 [inline] vfs_write+0x8ef/0x1490 fs/read_write.c:584 ksys_write+0x20f/0x4c0 fs/read_write.c:637 __do_sys_write fs/read_write.c:649 [inline] __se_sys_write fs/read_write.c:646 [inline] __x64_sys_write+0x93/0xd0 fs/read_write.c:646 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x44/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b
Local variable daddr created at: llc_conn_handler+0x53/0x1360 net/llc/llc_conn.c:783 llc_rcv+0xfbb/0x14a0 net/llc/llc_input.c:206
CPU: 1 PID: 5004 Comm: syz-executor994 Not tainted 6.6.0-syzkaller-14500-g1c41041124bd #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/09/2023
Fixes: 211ed865108e ("net: delete all instances of special processing for token ring") Reported-by: syzbot+b5ad66046b913bc04c6f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b5ad66046b913bc04c6f Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240119015515.61898-1-kuniyu@amazon.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/llc_pdu.h | 6 ++---- net/llc/llc_core.c | 7 ------- 2 files changed, 2 insertions(+), 11 deletions(-)
diff --git a/include/net/llc_pdu.h b/include/net/llc_pdu.h index 7e73f8e5e497..1d55ba7c45be 100644 --- a/include/net/llc_pdu.h +++ b/include/net/llc_pdu.h @@ -262,8 +262,7 @@ static inline void llc_pdu_header_init(struct sk_buff *skb, u8 type, */ static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa) { - if (skb->protocol == htons(ETH_P_802_2)) - memcpy(sa, eth_hdr(skb)->h_source, ETH_ALEN); + memcpy(sa, eth_hdr(skb)->h_source, ETH_ALEN); }
/** @@ -275,8 +274,7 @@ static inline void llc_pdu_decode_sa(struct sk_buff *skb, u8 *sa) */ static inline void llc_pdu_decode_da(struct sk_buff *skb, u8 *da) { - if (skb->protocol == htons(ETH_P_802_2)) - memcpy(da, eth_hdr(skb)->h_dest, ETH_ALEN); + memcpy(da, eth_hdr(skb)->h_dest, ETH_ALEN); }
/** diff --git a/net/llc/llc_core.c b/net/llc/llc_core.c index 6e387aadffce..4f16d9c88350 100644 --- a/net/llc/llc_core.c +++ b/net/llc/llc_core.c @@ -135,22 +135,15 @@ static struct packet_type llc_packet_type __read_mostly = { .func = llc_rcv, };
-static struct packet_type llc_tr_packet_type __read_mostly = { - .type = cpu_to_be16(ETH_P_TR_802_2), - .func = llc_rcv, -}; - static int __init llc_init(void) { dev_add_pack(&llc_packet_type); - dev_add_pack(&llc_tr_packet_type); return 0; }
static void __exit llc_exit(void) { dev_remove_pack(&llc_packet_type); - dev_remove_pack(&llc_tr_packet_type); }
module_init(llc_init);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Eric Dumazet edumazet@google.com
[ Upstream commit a54d51fb2dfb846aedf3751af501e9688db447f5 ]
Generic sk_busy_loop_end() only looks at sk->sk_receive_queue for presence of packets.
Problem is that for UDP sockets after blamed commit, some packets could be present in another queue: udp_sk(sk)->reader_queue
In some cases, a busy poller could spin until timeout expiration, even if some packets are available in udp_sk(sk)->reader_queue.
v3: - make sk_busy_loop_end() nicer (Willem)
v2: - add a READ_ONCE(sk->sk_family) in sk_is_inet() to avoid KCSAN splats. - add a sk_is_inet() check in sk_is_udp() (Willem feedback) - add a sk_is_inet() check in sk_is_tcp().
Fixes: 2276f58ac589 ("udp: use a separate rx queue for packet reception") Signed-off-by: Eric Dumazet edumazet@google.com Reviewed-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Willem de Bruijn willemb@google.com Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/skmsg.h | 6 ------ include/net/inet_sock.h | 5 ----- include/net/sock.h | 18 +++++++++++++++++- net/core/sock.c | 11 +++++++++-- 4 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h index c953b8c0d2f4..bd4418377bac 100644 --- a/include/linux/skmsg.h +++ b/include/linux/skmsg.h @@ -500,12 +500,6 @@ static inline bool sk_psock_strp_enabled(struct sk_psock *psock) return !!psock->saved_data_ready; }
-static inline bool sk_is_udp(const struct sock *sk) -{ - return sk->sk_type == SOCK_DGRAM && - sk->sk_protocol == IPPROTO_UDP; -} - #if IS_ENABLED(CONFIG_NET_SOCK_MSG)
#define BPF_F_STRPARSER (1UL << 1) diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h index 2de0e4d4a027..2790ba58ffe5 100644 --- a/include/net/inet_sock.h +++ b/include/net/inet_sock.h @@ -301,11 +301,6 @@ static inline unsigned long inet_cmsg_flags(const struct inet_sock *inet) #define inet_assign_bit(nr, sk, val) \ assign_bit(INET_FLAGS_##nr, &inet_sk(sk)->inet_flags, val)
-static inline bool sk_is_inet(struct sock *sk) -{ - return sk->sk_family == AF_INET || sk->sk_family == AF_INET6; -} - /** * sk_to_full_sk - Access to a full socket * @sk: pointer to a socket diff --git a/include/net/sock.h b/include/net/sock.h index 70a771d96467..e70c903b04f3 100644 --- a/include/net/sock.h +++ b/include/net/sock.h @@ -2793,9 +2793,25 @@ static inline void skb_setup_tx_timestamp(struct sk_buff *skb, __u16 tsflags) &skb_shinfo(skb)->tskey); }
+static inline bool sk_is_inet(const struct sock *sk) +{ + int family = READ_ONCE(sk->sk_family); + + return family == AF_INET || family == AF_INET6; +} + static inline bool sk_is_tcp(const struct sock *sk) { - return sk->sk_type == SOCK_STREAM && sk->sk_protocol == IPPROTO_TCP; + return sk_is_inet(sk) && + sk->sk_type == SOCK_STREAM && + sk->sk_protocol == IPPROTO_TCP; +} + +static inline bool sk_is_udp(const struct sock *sk) +{ + return sk_is_inet(sk) && + sk->sk_type == SOCK_DGRAM && + sk->sk_protocol == IPPROTO_UDP; }
static inline bool sk_is_stream_unix(const struct sock *sk) diff --git a/net/core/sock.c b/net/core/sock.c index 5cd21e699f2d..383e30fe79f4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -107,6 +107,7 @@ #include <linux/interrupt.h> #include <linux/poll.h> #include <linux/tcp.h> +#include <linux/udp.h> #include <linux/init.h> #include <linux/highmem.h> #include <linux/user_namespace.h> @@ -4136,8 +4137,14 @@ bool sk_busy_loop_end(void *p, unsigned long start_time) { struct sock *sk = p;
- return !skb_queue_empty_lockless(&sk->sk_receive_queue) || - sk_busy_loop_timeout(sk, start_time); + if (!skb_queue_empty_lockless(&sk->sk_receive_queue)) + return true; + + if (sk_is_udp(sk) && + !skb_queue_empty_lockless(&udp_sk(sk)->reader_queue)) + return true; + + return sk_busy_loop_timeout(sk, start_time); } EXPORT_SYMBOL(sk_busy_loop_end); #endif /* CONFIG_NET_RX_BUSY_POLL */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit d09486a04f5da0a812c26217213b89a3b1acf836 ]
Mark reports a BUG() when a net namespace is removed.
kernel BUG at net/core/dev.c:11520!
Physical interfaces moved outside of init_net get "refunded" to init_net when that namespace disappears. The main interface name may get overwritten in the process if it would have conflicted. We need to also discard all conflicting altnames. Recent fixes addressed ensuring that altnames get moved with the main interface, which surfaced this problem.
Reported-by: Марк Коренберг socketpair@gmail.com Link: https://lore.kernel.org/all/CAEmTpZFZ4Sv3KwqFOY2WKDHeZYdi0O7N5H1nTvcGp=SAEav... Fixes: 7663d522099e ("net: check for altname conflicts when changing netdev's netns") Signed-off-by: Jakub Kicinski kuba@kernel.org Reviewed-by: Eric Dumazet edumazet@google.com Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Xin Long lucien.xin@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/dev.c | 9 +++++++++ net/core/dev.h | 3 +++ 2 files changed, 12 insertions(+)
diff --git a/net/core/dev.c b/net/core/dev.c index e480afb50d4c..d72a4ff689ca 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -11491,6 +11491,7 @@ static struct pernet_operations __net_initdata netdev_net_ops = {
static void __net_exit default_device_exit_net(struct net *net) { + struct netdev_name_node *name_node, *tmp; struct net_device *dev, *aux; /* * Push all migratable network devices back to the @@ -11513,6 +11514,14 @@ static void __net_exit default_device_exit_net(struct net *net) snprintf(fb_name, IFNAMSIZ, "dev%d", dev->ifindex); if (netdev_name_in_use(&init_net, fb_name)) snprintf(fb_name, IFNAMSIZ, "dev%%d"); + + netdev_for_each_altname_safe(dev, name_node, tmp) + if (netdev_name_in_use(&init_net, name_node->name)) { + netdev_name_node_del(name_node); + synchronize_rcu(); + __netdev_name_node_alt_destroy(name_node); + } + err = dev_change_net_namespace(dev, &init_net, fb_name); if (err) { pr_emerg("%s: failed to move %s to init_net: %d\n", diff --git a/net/core/dev.h b/net/core/dev.h index fa2e9c5c4122..f2037d402144 100644 --- a/net/core/dev.h +++ b/net/core/dev.h @@ -64,6 +64,9 @@ int dev_change_name(struct net_device *dev, const char *newname);
#define netdev_for_each_altname(dev, namenode) \ list_for_each_entry((namenode), &(dev)->name_node->list, list) +#define netdev_for_each_altname_safe(dev, namenode, next) \ + list_for_each_entry_safe((namenode), (next), &(dev)->name_node->list, \ + list)
int netdev_name_node_alt_create(struct net_device *dev, const char *name); int netdev_name_node_alt_destroy(struct net_device *dev, const char *name);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yunjian Wang wangyunjian@huawei.com
[ Upstream commit 5744ba05e7c4bff8fec133dd0f9e51ddffba92f5 ]
The commit 8ae1aff0b331 ("tuntap: split out XDP logic") includes dropped counter for XDP_DROP, XDP_ABORTED, and invalid XDP actions. Unfortunately, that commit missed the dropped counter when error occurs during XDP_TX and XDP_REDIRECT actions. This patch fixes this issue.
Fixes: 8ae1aff0b331 ("tuntap: split out XDP logic") Signed-off-by: Yunjian Wang wangyunjian@huawei.com Reviewed-by: Willem de Bruijn willemb@google.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tun.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index afa5497f7c35..237fef557ba5 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1630,13 +1630,17 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog, switch (act) { case XDP_REDIRECT: err = xdp_do_redirect(tun->dev, xdp, xdp_prog); - if (err) + if (err) { + dev_core_stats_rx_dropped_inc(tun->dev); return err; + } break; case XDP_TX: err = tun_xdp_tx(tun->dev, xdp); - if (err < 0) + if (err < 0) { + dev_core_stats_rx_dropped_inc(tun->dev); return err; + } break; case XDP_PASS: break;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yunjian Wang wangyunjian@huawei.com
[ Upstream commit f1084c427f55d573fcd5688d9ba7b31b78019716 ]
The TUN can be used as vhost-net backend, and it is necessary to count the packets transmitted from TUN to vhost-net/virtio-net. However, there are some places in the receive path that were not taken into account when using XDP. It would be beneficial to also include new accounting for successfully received bytes using dev_sw_netstats_rx_add.
Fixes: 761876c857cb ("tap: XDP support") Signed-off-by: Yunjian Wang wangyunjian@huawei.com Reviewed-by: Willem de Bruijn willemb@google.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/tun.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/tun.c b/drivers/net/tun.c index 237fef557ba5..4a4f8c8e79fa 100644 --- a/drivers/net/tun.c +++ b/drivers/net/tun.c @@ -1634,6 +1634,7 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog, dev_core_stats_rx_dropped_inc(tun->dev); return err; } + dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data); break; case XDP_TX: err = tun_xdp_tx(tun->dev, xdp); @@ -1641,6 +1642,7 @@ static int tun_xdp_act(struct tun_struct *tun, struct bpf_prog *xdp_prog, dev_core_stats_rx_dropped_inc(tun->dev); return err; } + dev_sw_netstats_rx_add(tun->dev, xdp->data_end - xdp->data); break; case XDP_PASS: break;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit aaf632f7ab6dec57bc9329a438f94504fe8034b9 ]
The HW has the capability to check each frame if it is a PTP frame, which domain it is, which ptp frame type it is, different ip address in the frame. And if one of these checks fail then the frame is not timestamp. Most of these checks were disabled except checking the field minorVersionPTP inside the PTP header. Meaning that once a partner sends a frame compliant to 8021AS which has minorVersionPTP set to 1, then the frame was not timestamp because the HW expected by default a value of 0 in minorVersionPTP. This is exactly the same issue as on lan8841. Fix this issue by removing this check so the userspace can decide on this.
Fixes: ece19502834d ("net: phy: micrel: 1588 support for LAN8814 phy") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Reviewed-by: Maxime Chevallier maxime.chevallier@bootlin.com Reviewed-by: Divya Koppera divya.koppera@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/micrel.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
diff --git a/drivers/net/phy/micrel.c b/drivers/net/phy/micrel.c index dfd5f8e78e29..27ca25bbd141 100644 --- a/drivers/net/phy/micrel.c +++ b/drivers/net/phy/micrel.c @@ -120,6 +120,11 @@ */ #define LAN8814_1PPM_FORMAT 17179
+#define PTP_RX_VERSION 0x0248 +#define PTP_TX_VERSION 0x0288 +#define PTP_MAX_VERSION(x) (((x) & GENMASK(7, 0)) << 8) +#define PTP_MIN_VERSION(x) ((x) & GENMASK(7, 0)) + #define PTP_RX_MOD 0x024F #define PTP_RX_MOD_BAD_UDPV4_CHKSUM_FORCE_FCS_DIS_ BIT(3) #define PTP_RX_TIMESTAMP_EN 0x024D @@ -3125,6 +3130,12 @@ static void lan8814_ptp_init(struct phy_device *phydev) lanphy_write_page_reg(phydev, 5, PTP_TX_PARSE_IP_ADDR_EN, 0); lanphy_write_page_reg(phydev, 5, PTP_RX_PARSE_IP_ADDR_EN, 0);
+ /* Disable checking for minorVersionPTP field */ + lanphy_write_page_reg(phydev, 5, PTP_RX_VERSION, + PTP_MAX_VERSION(0xff) | PTP_MIN_VERSION(0x0)); + lanphy_write_page_reg(phydev, 5, PTP_TX_VERSION, + PTP_MAX_VERSION(0xff) | PTP_MIN_VERSION(0x0)); + skb_queue_head_init(&ptp_priv->tx_queue); skb_queue_head_init(&ptp_priv->rx_queue); INIT_LIST_HEAD(&ptp_priv->rx_ts_list);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sharath Srinivasan sharath.srinivasan@oracle.com
[ Upstream commit 13e788deb7348cc88df34bed736c3b3b9927ea52 ]
Syzcaller UBSAN crash occurs in rds_cmsg_recv(), which reads inc->i_rx_lat_trace[j + 1] with index 4 (3 + 1), but with array size of 4 (RDS_RX_MAX_TRACES). Here 'j' is assigned from rs->rs_rx_trace[i] and in-turn from trace.rx_trace_pos[i] in rds_recv_track_latency(), with both arrays sized 3 (RDS_MSG_RX_DGRAM_TRACE_MAX). So fix the off-by-one bounds check in rds_recv_track_latency() to prevent a potential crash in rds_cmsg_recv().
Found by syzcaller: ================================================================= UBSAN: array-index-out-of-bounds in net/rds/recv.c:585:39 index 4 is out of range for type 'u64 [4]' CPU: 1 PID: 8058 Comm: syz-executor228 Not tainted 6.6.0-gd2f51b3516da #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x136/0x150 lib/dump_stack.c:106 ubsan_epilogue lib/ubsan.c:217 [inline] __ubsan_handle_out_of_bounds+0xd5/0x130 lib/ubsan.c:348 rds_cmsg_recv+0x60d/0x700 net/rds/recv.c:585 rds_recvmsg+0x3fb/0x1610 net/rds/recv.c:716 sock_recvmsg_nosec net/socket.c:1044 [inline] sock_recvmsg+0xe2/0x160 net/socket.c:1066 __sys_recvfrom+0x1b6/0x2f0 net/socket.c:2246 __do_sys_recvfrom net/socket.c:2264 [inline] __se_sys_recvfrom net/socket.c:2260 [inline] __x64_sys_recvfrom+0xe0/0x1b0 net/socket.c:2260 do_syscall_x64 arch/x86/entry/common.c:51 [inline] do_syscall_64+0x40/0x110 arch/x86/entry/common.c:82 entry_SYSCALL_64_after_hwframe+0x63/0x6b ==================================================================
Fixes: 3289025aedc0 ("RDS: add receive message trace used by application") Reported-by: Chenyuan Yang chenyuan0y@gmail.com Closes: https://lore.kernel.org/linux-rdma/CALGdzuoVdq-wtQ4Az9iottBqC5cv9ZhcE5q8N7Lf... Signed-off-by: Sharath Srinivasan sharath.srinivasan@oracle.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/rds/af_rds.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/rds/af_rds.c b/net/rds/af_rds.c index 01c4cdfef45d..8435a20968ef 100644 --- a/net/rds/af_rds.c +++ b/net/rds/af_rds.c @@ -419,7 +419,7 @@ static int rds_recv_track_latency(struct rds_sock *rs, sockptr_t optval,
rs->rs_rx_traces = trace.rx_traces; for (i = 0; i < rs->rs_rx_traces; i++) { - if (trace.rx_trace_pos[i] > RDS_MSG_RX_DGRAM_TRACE_MAX) { + if (trace.rx_trace_pos[i] >= RDS_MSG_RX_DGRAM_TRACE_MAX) { rs->rs_rx_traces = 0; return -EFAULT; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
[ Upstream commit 3be0b3ed1d76c6703b9ee482b55f7e01c369cc68 ]
This function dereferences "cache" and then checks if it's IS_ERR_OR_NULL(). Check first, then dereference.
Fixes: 9549332df4ed ("fscache: Implement cache registration") Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Signed-off-by: David Howells dhowells@redhat.com Link: https://lore.kernel.org/r/e84bc740-3502-4f16-982a-a40d5676615c@moroto.mounta... # v2 Signed-off-by: Sasha Levin sashal@kernel.org --- fs/fscache/cache.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/fscache/cache.c b/fs/fscache/cache.c index d645f8b302a2..9397ed39b0b4 100644 --- a/fs/fscache/cache.c +++ b/fs/fscache/cache.c @@ -179,13 +179,14 @@ EXPORT_SYMBOL(fscache_acquire_cache); void fscache_put_cache(struct fscache_cache *cache, enum fscache_cache_trace where) { - unsigned int debug_id = cache->debug_id; + unsigned int debug_id; bool zero; int ref;
if (IS_ERR_OR_NULL(cache)) return;
+ debug_id = cache->debug_id; zero = __refcount_dec_and_test(&cache->ref, &ref); trace_fscache_cache(debug_id, ref - 1, where);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Petr Pavlu petr.pavlu@suse.com
[ Upstream commit 2b44760609e9eaafc9d234a6883d042fc21132a7 ]
Running the following two commands in parallel on a multi-processor AArch64 machine can sporadically produce an unexpected warning about duplicate histogram entries:
$ while true; do echo hist:key=id.syscall:val=hitcount > \ /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/trigger cat /sys/kernel/debug/tracing/events/raw_syscalls/sys_enter/hist sleep 0.001 done $ stress-ng --sysbadaddr $(nproc)
The warning looks as follows:
[ 2911.172474] ------------[ cut here ]------------ [ 2911.173111] Duplicates detected: 1 [ 2911.173574] WARNING: CPU: 2 PID: 12247 at kernel/trace/tracing_map.c:983 tracing_map_sort_entries+0x3e0/0x408 [ 2911.174702] Modules linked in: iscsi_ibft(E) iscsi_boot_sysfs(E) rfkill(E) af_packet(E) nls_iso8859_1(E) nls_cp437(E) vfat(E) fat(E) ena(E) tiny_power_button(E) qemu_fw_cfg(E) button(E) fuse(E) efi_pstore(E) ip_tables(E) x_tables(E) xfs(E) libcrc32c(E) aes_ce_blk(E) aes_ce_cipher(E) crct10dif_ce(E) polyval_ce(E) polyval_generic(E) ghash_ce(E) gf128mul(E) sm4_ce_gcm(E) sm4_ce_ccm(E) sm4_ce(E) sm4_ce_cipher(E) sm4(E) sm3_ce(E) sm3(E) sha3_ce(E) sha512_ce(E) sha512_arm64(E) sha2_ce(E) sha256_arm64(E) nvme(E) sha1_ce(E) nvme_core(E) nvme_auth(E) t10_pi(E) sg(E) scsi_mod(E) scsi_common(E) efivarfs(E) [ 2911.174738] Unloaded tainted modules: cppc_cpufreq(E):1 [ 2911.180985] CPU: 2 PID: 12247 Comm: cat Kdump: loaded Tainted: G E 6.7.0-default #2 1b58bbb22c97e4399dc09f92d309344f69c44a01 [ 2911.182398] Hardware name: Amazon EC2 c7g.8xlarge/, BIOS 1.0 11/1/2018 [ 2911.183208] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 2911.184038] pc : tracing_map_sort_entries+0x3e0/0x408 [ 2911.184667] lr : tracing_map_sort_entries+0x3e0/0x408 [ 2911.185310] sp : ffff8000a1513900 [ 2911.185750] x29: ffff8000a1513900 x28: ffff0003f272fe80 x27: 0000000000000001 [ 2911.186600] x26: ffff0003f272fe80 x25: 0000000000000030 x24: 0000000000000008 [ 2911.187458] x23: ffff0003c5788000 x22: ffff0003c16710c8 x21: ffff80008017f180 [ 2911.188310] x20: ffff80008017f000 x19: ffff80008017f180 x18: ffffffffffffffff [ 2911.189160] x17: 0000000000000000 x16: 0000000000000000 x15: ffff8000a15134b8 [ 2911.190015] x14: 0000000000000000 x13: 205d373432323154 x12: 5b5d313131333731 [ 2911.190844] x11: 00000000fffeffff x10: 00000000fffeffff x9 : ffffd1b78274a13c [ 2911.191716] x8 : 000000000017ffe8 x7 : c0000000fffeffff x6 : 000000000057ffa8 [ 2911.192554] x5 : ffff0012f6c24ec0 x4 : 0000000000000000 x3 : ffff2e5b72b5d000 [ 2911.193404] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff0003ff254480 [ 2911.194259] Call trace: [ 2911.194626] tracing_map_sort_entries+0x3e0/0x408 [ 2911.195220] hist_show+0x124/0x800 [ 2911.195692] seq_read_iter+0x1d4/0x4e8 [ 2911.196193] seq_read+0xe8/0x138 [ 2911.196638] vfs_read+0xc8/0x300 [ 2911.197078] ksys_read+0x70/0x108 [ 2911.197534] __arm64_sys_read+0x24/0x38 [ 2911.198046] invoke_syscall+0x78/0x108 [ 2911.198553] el0_svc_common.constprop.0+0xd0/0xf8 [ 2911.199157] do_el0_svc+0x28/0x40 [ 2911.199613] el0_svc+0x40/0x178 [ 2911.200048] el0t_64_sync_handler+0x13c/0x158 [ 2911.200621] el0t_64_sync+0x1a8/0x1b0 [ 2911.201115] ---[ end trace 0000000000000000 ]---
The problem appears to be caused by CPU reordering of writes issued from __tracing_map_insert().
The check for the presence of an element with a given key in this function is:
val = READ_ONCE(entry->val); if (val && keys_match(key, val->key, map->key_size)) ...
The write of a new entry is:
elt = get_free_elt(map); memcpy(elt->key, key, map->key_size); entry->val = elt;
The "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;" stores may become visible in the reversed order on another CPU. This second CPU might then incorrectly determine that a new key doesn't match an already present val->key and subsequently insert a new element, resulting in a duplicate.
Fix the problem by adding a write barrier between "memcpy(elt->key, key, map->key_size);" and "entry->val = elt;", and for good measure, also use WRITE_ONCE(entry->val, elt) for publishing the element. The sequence pairs with the mentioned "READ_ONCE(entry->val);" and the "val->key" check which has an address dependency.
The barrier is placed on a path executed when adding an element for a new key. Subsequent updates targeting the same key remain unaffected.
From the user's perspective, the issue was introduced by commit
c193707dde77 ("tracing: Remove code which merges duplicates"), which followed commit cbf4100efb8f ("tracing: Add support to detect and avoid duplicates"). The previous code operated differently; it inherently expected potential races which result in duplicates but merged them later when they occurred.
Link: https://lore.kernel.org/linux-trace-kernel/20240122150928.27725-1-petr.pavlu...
Fixes: c193707dde77 ("tracing: Remove code which merges duplicates") Signed-off-by: Petr Pavlu petr.pavlu@suse.com Acked-by: Tom Zanussi tom.zanussi@linux.intel.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/trace/tracing_map.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/tracing_map.c b/kernel/trace/tracing_map.c index c774e560f2f9..a4dcf0f24352 100644 --- a/kernel/trace/tracing_map.c +++ b/kernel/trace/tracing_map.c @@ -574,7 +574,12 @@ __tracing_map_insert(struct tracing_map *map, void *key, bool lookup_only) }
memcpy(elt->key, key, map->key_size); - entry->val = elt; + /* + * Ensure the initialization is visible and + * publish the elt. + */ + smp_wmb(); + WRITE_ONCE(entry->val, elt); atomic64_inc(&map->hits);
return entry->val;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Howells dhowells@redhat.com
[ Upstream commit 57e9d49c54528c49b8bffe6d99d782ea051ea534 ]
There appears to be a race between silly-rename files being created/removed and various userspace tools iterating over the contents of a directory, leading to such errors as:
find: './kernel/.tmp_cpio_dir/include/dt-bindings/reset/.__afs2080': No such file or directory tar: ./include/linux/greybus/.__afs3C95: File removed before we read it
when building a kernel.
Fix afs_readdir() so that it doesn't return .__afsXXXX silly-rename files to userspace. This doesn't stop them being looked up directly by name as we need to be able to look them up from within the kernel as part of the silly-rename algorithm.
Fixes: 79ddbfa500b3 ("afs: Implement sillyrename for unlink and rename") Signed-off-by: David Howells dhowells@redhat.com cc: Marc Dionne marc.dionne@auristor.com cc: linux-afs@lists.infradead.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/afs/dir.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 5219182e52e1..2df2e9ee130d 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -474,6 +474,14 @@ static int afs_dir_iterate_block(struct afs_vnode *dvnode, continue; }
+ /* Don't expose silly rename entries to userspace. */ + if (nlen > 6 && + dire->u.name[0] == '.' && + ctx->actor != afs_lookup_filldir && + ctx->actor != afs_lookup_one_filldir && + memcmp(dire->u.name, ".__afs", 6) == 0) + continue; + /* found the next entry */ if (!dir_emit(ctx, dire->u.name, nlen, ntohl(dire->u.vnode),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Salvatore Dipietro dipiets@amazon.com
[ Upstream commit 7267e8dcad6b2f9fce05a6a06335d7040acbc2b6 ]
On CPUs with weak memory models, reads and updates performed by tcp_push to the sk variables can get reordered leaving the socket throttled when it should not. The tasklet running tcp_wfree() may also not observe the memory updates in time and will skip flushing any packets throttled by tcp_push(), delaying the sending. This can pathologically cause 40ms extra latency due to bad interactions with delayed acks.
Adding a memory barrier in tcp_push removes the bug, similarly to the previous commit bf06200e732d ("tcp: tsq: fix nonagle handling"). smp_mb__after_atomic() is used to not incur in unnecessary overhead on x86 since not affected.
Patch has been tested using an AWS c7g.2xlarge instance with Ubuntu 22.04 and Apache Tomcat 9.0.83 running the basic servlet below:
import java.io.IOException; import java.io.OutputStreamWriter; import java.io.PrintWriter; import javax.servlet.ServletException; import javax.servlet.http.HttpServlet; import javax.servlet.http.HttpServletRequest; import javax.servlet.http.HttpServletResponse;
public class HelloWorldServlet extends HttpServlet { @Override protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException { response.setContentType("text/html;charset=utf-8"); OutputStreamWriter osw = new OutputStreamWriter(response.getOutputStream(),"UTF-8"); String s = "a".repeat(3096); osw.write(s,0,s.length()); osw.flush(); } }
Load was applied using wrk2 (https://github.com/kinvolk/wrk2) from an AWS c6i.8xlarge instance. Before the patch an additional 40ms latency from P99.99+ values is observed while, with the patch, the extra latency disappears.
No patch and tcp_autocorking=1 ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello ... 50.000% 0.91ms 75.000% 1.13ms 90.000% 1.46ms 99.000% 1.74ms 99.900% 1.89ms 99.990% 41.95ms <<< 40+ ms extra latency 99.999% 48.32ms 100.000% 48.96ms
With patch and tcp_autocorking=1 ./wrk -t32 -c128 -d40s --latency -R10000 http://172.31.60.173:8080/hello/hello ... 50.000% 0.90ms 75.000% 1.13ms 90.000% 1.45ms 99.000% 1.72ms 99.900% 1.83ms 99.990% 2.11ms <<< no 40+ ms extra latency 99.999% 2.53ms 100.000% 2.62ms
Patch has been also tested on x86 (m7i.2xlarge instance) which it is not affected by this issue and the patch doesn't introduce any additional delay.
Fixes: 7aa5470c2c09 ("tcp: tsq: move tsq_flags close to sk_wmem_alloc") Signed-off-by: Salvatore Dipietro dipiets@amazon.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240119190133.43698-1-dipiets@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/tcp.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index f124f6c63915..fb417aee86e6 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -722,6 +722,7 @@ void tcp_push(struct sock *sk, int flags, int mss_now, if (!test_bit(TSQ_THROTTLED, &sk->sk_tsq_flags)) { NET_INC_STATS(sock_net(sk), LINUX_MIB_TCPAUTOCORKING); set_bit(TSQ_THROTTLED, &sk->sk_tsq_flags); + smp_mb__after_atomic(); } /* It is possible TX completion already happened * before we set TSQ_THROTTLED.
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kuniyuki Iwashima kuniyu@amazon.com
[ Upstream commit 97de5a15edf2d22184f5ff588656030bbb7fa358 ]
Jakub reported that ASSERT_EQ(cpu, i) in so_incoming_cpu.c seems to fire somewhat randomly.
# # RUN so_incoming_cpu.before_reuseport.test3 ... # # so_incoming_cpu.c:191:test3:Expected cpu (32) == i (0) # # test3: Test terminated by assertion # # FAIL so_incoming_cpu.before_reuseport.test3 # not ok 3 so_incoming_cpu.before_reuseport.test3
When the test failed, not-yet-accepted CLOSE_WAIT sockets received SYN with a "challenging" SEQ number, which was sent from an unexpected CPU that did not create the receiver.
The test basically does:
1. for each cpu: 1-1. create a server 1-2. set SO_INCOMING_CPU
2. for each cpu: 2-1. set cpu affinity 2-2. create some clients 2-3. let clients connect() to the server on the same cpu 2-4. close() clients
3. for each server: 3-1. accept() all child sockets 3-2. check if all children have the same SO_INCOMING_CPU with the server
The root cause was the close() in 2-4. and net.ipv4.tcp_tw_reuse.
In a loop of 2., close() changed the client state to FIN_WAIT_2, and the peer transitioned to CLOSE_WAIT.
In another loop of 2., connect() happened to select the same port of the FIN_WAIT_2 socket, and it was reused as the default value of net.ipv4.tcp_tw_reuse is 2.
As a result, the new client sent SYN to the CLOSE_WAIT socket from a different CPU, and the receiver's sk_incoming_cpu was overwritten with unexpected CPU ID.
Also, the SYN had a different SEQ number, so the CLOSE_WAIT socket responded with Challenge ACK. The new client properly returned RST and effectively killed the CLOSE_WAIT socket.
This way, all clients were created successfully, but the error was detected later by 3-2., ASSERT_EQ(cpu, i).
To avoid the failure, let's make sure that (i) the number of clients is less than the number of available ports and (ii) such reuse never happens.
Fixes: 6df96146b202 ("selftest: Add test for SO_INCOMING_CPU.") Reported-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Kuniyuki Iwashima kuniyu@amazon.com Tested-by: Jakub Kicinski kuba@kernel.org Link: https://lore.kernel.org/r/20240120031642.67014-1-kuniyu@amazon.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/so_incoming_cpu.c | 68 ++++++++++++++----- 1 file changed, 50 insertions(+), 18 deletions(-)
diff --git a/tools/testing/selftests/net/so_incoming_cpu.c b/tools/testing/selftests/net/so_incoming_cpu.c index a14818164102..e9fa14e10732 100644 --- a/tools/testing/selftests/net/so_incoming_cpu.c +++ b/tools/testing/selftests/net/so_incoming_cpu.c @@ -3,19 +3,16 @@ #define _GNU_SOURCE #include <sched.h>
+#include <fcntl.h> + #include <netinet/in.h> #include <sys/socket.h> #include <sys/sysinfo.h>
#include "../kselftest_harness.h"
-#define CLIENT_PER_SERVER 32 /* More sockets, more reliable */ -#define NR_SERVER self->nproc -#define NR_CLIENT (CLIENT_PER_SERVER * NR_SERVER) - FIXTURE(so_incoming_cpu) { - int nproc; int *servers; union { struct sockaddr addr; @@ -56,12 +53,47 @@ FIXTURE_VARIANT_ADD(so_incoming_cpu, after_all_listen) .when_to_set = AFTER_ALL_LISTEN, };
+static void write_sysctl(struct __test_metadata *_metadata, + char *filename, char *string) +{ + int fd, len, ret; + + fd = open(filename, O_WRONLY); + ASSERT_NE(fd, -1); + + len = strlen(string); + ret = write(fd, string, len); + ASSERT_EQ(ret, len); +} + +static void setup_netns(struct __test_metadata *_metadata) +{ + ASSERT_EQ(unshare(CLONE_NEWNET), 0); + ASSERT_EQ(system("ip link set lo up"), 0); + + write_sysctl(_metadata, "/proc/sys/net/ipv4/ip_local_port_range", "10000 60001"); + write_sysctl(_metadata, "/proc/sys/net/ipv4/tcp_tw_reuse", "0"); +} + +#define NR_PORT (60001 - 10000 - 1) +#define NR_CLIENT_PER_SERVER_DEFAULT 32 +static int nr_client_per_server, nr_server, nr_client; + FIXTURE_SETUP(so_incoming_cpu) { - self->nproc = get_nprocs(); - ASSERT_LE(2, self->nproc); + setup_netns(_metadata); + + nr_server = get_nprocs(); + ASSERT_LE(2, nr_server); + + if (NR_CLIENT_PER_SERVER_DEFAULT * nr_server < NR_PORT) + nr_client_per_server = NR_CLIENT_PER_SERVER_DEFAULT; + else + nr_client_per_server = NR_PORT / nr_server; + + nr_client = nr_client_per_server * nr_server;
- self->servers = malloc(sizeof(int) * NR_SERVER); + self->servers = malloc(sizeof(int) * nr_server); ASSERT_NE(self->servers, NULL);
self->in_addr.sin_family = AF_INET; @@ -74,7 +106,7 @@ FIXTURE_TEARDOWN(so_incoming_cpu) { int i;
- for (i = 0; i < NR_SERVER; i++) + for (i = 0; i < nr_server; i++) close(self->servers[i]);
free(self->servers); @@ -110,10 +142,10 @@ int create_server(struct __test_metadata *_metadata, if (variant->when_to_set == BEFORE_LISTEN) set_so_incoming_cpu(_metadata, fd, cpu);
- /* We don't use CLIENT_PER_SERVER here not to block + /* We don't use nr_client_per_server here not to block * this test at connect() if SO_INCOMING_CPU is broken. */ - ret = listen(fd, NR_CLIENT); + ret = listen(fd, nr_client); ASSERT_EQ(ret, 0);
if (variant->when_to_set == AFTER_LISTEN) @@ -128,7 +160,7 @@ void create_servers(struct __test_metadata *_metadata, { int i, ret;
- for (i = 0; i < NR_SERVER; i++) { + for (i = 0; i < nr_server; i++) { self->servers[i] = create_server(_metadata, self, variant, i);
if (i == 0) { @@ -138,7 +170,7 @@ void create_servers(struct __test_metadata *_metadata, }
if (variant->when_to_set == AFTER_ALL_LISTEN) { - for (i = 0; i < NR_SERVER; i++) + for (i = 0; i < nr_server; i++) set_so_incoming_cpu(_metadata, self->servers[i], i); } } @@ -149,7 +181,7 @@ void create_clients(struct __test_metadata *_metadata, cpu_set_t cpu_set; int i, j, fd, ret;
- for (i = 0; i < NR_SERVER; i++) { + for (i = 0; i < nr_server; i++) { CPU_ZERO(&cpu_set);
CPU_SET(i, &cpu_set); @@ -162,7 +194,7 @@ void create_clients(struct __test_metadata *_metadata, ret = sched_setaffinity(0, sizeof(cpu_set), &cpu_set); ASSERT_EQ(ret, 0);
- for (j = 0; j < CLIENT_PER_SERVER; j++) { + for (j = 0; j < nr_client_per_server; j++) { fd = socket(AF_INET, SOCK_STREAM, 0); ASSERT_NE(fd, -1);
@@ -180,8 +212,8 @@ void verify_incoming_cpu(struct __test_metadata *_metadata, int i, j, fd, cpu, ret, total = 0; socklen_t len = sizeof(int);
- for (i = 0; i < NR_SERVER; i++) { - for (j = 0; j < CLIENT_PER_SERVER; j++) { + for (i = 0; i < nr_server; i++) { + for (j = 0; j < nr_client_per_server; j++) { /* If we see -EAGAIN here, SO_INCOMING_CPU is broken */ fd = accept(self->servers[i], &self->addr, &self->addrlen); ASSERT_NE(fd, -1); @@ -195,7 +227,7 @@ void verify_incoming_cpu(struct __test_metadata *_metadata, } }
- ASSERT_EQ(total, NR_CLIENT); + ASSERT_EQ(total, nr_client); TH_LOG("SO_INCOMING_CPU is very likely to be " "working correctly with %d sockets.", total); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 234ec0b6034b16869d45128b8cd2dc6ffe596f04 ]
I analyze the potential sleeping issue of the following processes: Thread A Thread B ... netlink_create //ref = 1 do_mq_notify ... sock = netlink_getsockbyfilp ... //ref = 2 info->notify_sock = sock; ... ... netlink_sendmsg ... skb = netlink_alloc_large_skb //skb->head is vmalloced ... netlink_unicast ... sk = netlink_getsockbyportid //ref = 3 ... netlink_sendskb ... __netlink_sendskb ... skb_queue_tail //put skb to sk_receive_queue ... sock_put //ref = 2 ... ... ... netlink_release ... deferred_put_nlk_sk //ref = 1 mqueue_flush_file spin_lock remove_notification netlink_sendskb sock_put //ref = 0 sk_free ... __sk_destruct netlink_sock_destruct skb_queue_purge //get skb from sk_receive_queue ... __skb_queue_purge_reason kfree_skb_reason __kfree_skb ... skb_release_all skb_release_head_state netlink_skb_destructor vfree(skb->head) //sleeping while holding spinlock
In netlink_sendmsg, if the memory pointed to by skb->head is allocated by vmalloc, and is put to sk_receive_queue queue, also the skb is not freed. When the mqueue executes flush, the sleeping bug will occur. Use vfree_atomic instead of vfree in netlink_skb_destructor to solve the issue.
Fixes: c05cdb1b864f ("netlink: allow large data transfers from user-space") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Link: https://lore.kernel.org/r/20240122011807.2110357-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/netlink/af_netlink.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c index eb086b06d60d..d9107b545d36 100644 --- a/net/netlink/af_netlink.c +++ b/net/netlink/af_netlink.c @@ -374,7 +374,7 @@ static void netlink_skb_destructor(struct sk_buff *skb) if (is_vmalloc_addr(skb->head)) { if (!skb->cloned || !atomic_dec_return(&(skb_shinfo(skb)->dataref))) - vfree(skb->head); + vfree_atomic(skb->head);
skb->head = NULL; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit 435e202d645c197dcfd39d7372eb2a56529b6640 ]
In commit 198bc90e0e73("tcp: make sure init the accept_queue's spinlocks once"), the spinlocks of accept_queue are initialized only when socket is created in the inet4 scenario. The locks are not initialized when socket is created in the inet6 scenario. The kernel reports the following error: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 Call Trace: <TASK> dump_stack_lvl (lib/dump_stack.c:107) register_lock_class (kernel/locking/lockdep.c:1289) __lock_acquire (kernel/locking/lockdep.c:5015) lock_acquire.part.0 (kernel/locking/lockdep.c:5756) _raw_spin_lock_bh (kernel/locking/spinlock.c:178) inet_csk_listen_stop (net/ipv4/inet_connection_sock.c:1386) tcp_disconnect (net/ipv4/tcp.c:2981) inet_shutdown (net/ipv4/af_inet.c:935) __sys_shutdown (./include/linux/file.h:32 net/socket.c:2438) __x64_sys_shutdown (net/socket.c:2445) do_syscall_64 (arch/x86/entry/common.c:52) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) RIP: 0033:0x7f52ecd05a3d Code: 5b 41 5c c3 66 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d ab a3 0e 00 f7 d8 64 89 01 48 RSP: 002b:00007f52ecf5dde8 EFLAGS: 00000293 ORIG_RAX: 0000000000000030 RAX: ffffffffffffffda RBX: 00007f52ecf5e640 RCX: 00007f52ecd05a3d RDX: 00007f52ecc8b188 RSI: 0000000000000000 RDI: 0000000000000004 RBP: 00007f52ecf5de20 R08: 00007ffdae45c69f R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000293 R12: 00007f52ecf5e640 R13: 0000000000000000 R14: 00007f52ecc8b060 R15: 00007ffdae45c6e0
Fixes: 198bc90e0e73 ("tcp: make sure init the accept_queue's spinlocks once") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Eric Dumazet edumazet@google.com Link: https://lore.kernel.org/r/20240122102001.2851701-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv6/af_inet6.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index 368824fe9719..b6c5b5e25a2f 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -199,6 +199,9 @@ static int inet6_create(struct net *net, struct socket *sock, int protocol, if (INET_PROTOSW_REUSE & answer_flags) sk->sk_reuse = SK_CAN_REUSE;
+ if (INET_PROTOSW_ICSK & answer_flags) + inet_init_csk_locks(sk); + inet = inet_sk(sk); inet_assign_bit(IS_ICSK, sk, INET_PROTOSW_ICSK & answer_flags);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 04fe7c5029cbdbcdb28917f09a958d939a8f19f7 ]
We are missing a lot of config options from net selftests, it seems:
tun/tap: CONFIG_TUN, CONFIG_MACVLAN, CONFIG_MACVTAP fib_tests: CONFIG_NET_SCH_FQ_CODEL l2tp: CONFIG_L2TP, CONFIG_L2TP_V3, CONFIG_L2TP_IP, CONFIG_L2TP_ETH sctp-vrf: CONFIG_INET_DIAG txtimestamp: CONFIG_NET_CLS_U32 vxlan_mdb: CONFIG_BRIDGE_VLAN_FILTERING gre_gso: CONFIG_NET_IPGRE_DEMUX, CONFIG_IP_GRE, CONFIG_IPV6_GRE srv6_end_dt*_l3vpn: CONFIG_IPV6_SEG6_LWTUNNEL ip_local_port_range: CONFIG_MPTCP fib_test: CONFIG_NET_CLS_BASIC rtnetlink: CONFIG_MACSEC, CONFIG_NET_SCH_HTB, CONFIG_XFRM_INTERFACE CONFIG_NET_IPGRE, CONFIG_BONDING fib_nexthops: CONFIG_MPLS, CONFIG_MPLS_ROUTING vxlan_mdb: CONFIG_NET_ACT_GACT tls: CONFIG_TLS, CONFIG_CRYPTO_CHACHA20POLY1305 psample: CONFIG_PSAMPLE fcnal: CONFIG_TCP_MD5SIG
Try to add them in a semi-alphabetical order.
Fixes: 62199e3f1658 ("selftests: net: Add VXLAN MDB test") Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask") Fixes: 122db5e3634b ("selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE") Link: https://lore.kernel.org/r/20240122203528.672004-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/config | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config index 8da562a9ae87..19ff75051660 100644 --- a/tools/testing/selftests/net/config +++ b/tools/testing/selftests/net/config @@ -1,5 +1,6 @@ CONFIG_USER_NS=y CONFIG_NET_NS=y +CONFIG_BONDING=m CONFIG_BPF_SYSCALL=y CONFIG_TEST_BPF=m CONFIG_NUMA=y @@ -14,9 +15,13 @@ CONFIG_VETH=y CONFIG_NET_IPVTI=y CONFIG_IPV6_VTI=y CONFIG_DUMMY=y +CONFIG_BRIDGE_VLAN_FILTERING=y CONFIG_BRIDGE=y +CONFIG_CRYPTO_CHACHA20POLY1305=m CONFIG_VLAN_8021Q=y CONFIG_IFB=y +CONFIG_INET_DIAG=y +CONFIG_IP_GRE=m CONFIG_NETFILTER=y CONFIG_NETFILTER_ADVANCED=y CONFIG_NF_CONNTRACK=m @@ -25,15 +30,36 @@ CONFIG_IP6_NF_IPTABLES=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP6_NF_NAT=m CONFIG_IP_NF_NAT=m +CONFIG_IPV6_GRE=m +CONFIG_IPV6_SEG6_LWTUNNEL=y +CONFIG_L2TP_ETH=m +CONFIG_L2TP_IP=m +CONFIG_L2TP=m +CONFIG_L2TP_V3=y +CONFIG_MACSEC=m +CONFIG_MACVLAN=y +CONFIG_MACVTAP=y +CONFIG_MPLS=y +CONFIG_MPTCP=y CONFIG_NF_TABLES=m CONFIG_NF_TABLES_IPV6=y CONFIG_NF_TABLES_IPV4=y CONFIG_NFT_NAT=m +CONFIG_NET_ACT_GACT=m +CONFIG_NET_CLS_BASIC=m +CONFIG_NET_CLS_U32=m +CONFIG_NET_IPGRE_DEMUX=m +CONFIG_NET_IPGRE=m +CONFIG_NET_SCH_FQ_CODEL=m +CONFIG_NET_SCH_HTB=m CONFIG_NET_SCH_FQ=m CONFIG_NET_SCH_ETF=m CONFIG_NET_SCH_NETEM=y +CONFIG_PSAMPLE=m +CONFIG_TCP_MD5SIG=y CONFIG_TEST_BLACKHOLE_DEV=m CONFIG_KALLSYMS=y +CONFIG_TLS=m CONFIG_TRACEPOINTS=y CONFIG_NET_DROP_MONITOR=m CONFIG_NETDEVSIM=m @@ -48,7 +74,9 @@ CONFIG_BAREUDP=m CONFIG_IPV6_IOAM6_LWTUNNEL=y CONFIG_CRYPTO_SM4_GENERIC=y CONFIG_AMT=m +CONFIG_TUN=y CONFIG_VXLAN=m CONFIG_IP_SCTP=m CONFIG_NETFILTER_XT_MATCH_POLICY=m CONFIG_CRYPTO_ARIA=y +CONFIG_XFRM_INTERFACE=m
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ido Schimmel idosch@nvidia.com
[ Upstream commit 32f2a0afa95fae0d1ceec2ff06e0e816939964b8 ]
When a qdisc is deleted from a net device the stack instructs the underlying driver to remove its flow offload callback from the associated filter block using the 'FLOW_BLOCK_UNBIND' command. The stack then continues to replay the removal of the filters in the block for this driver by iterating over the chains in the block and invoking the 'reoffload' operation of the classifier being used. In turn, the classifier in its 'reoffload' operation prepares and emits a 'FLOW_CLS_DESTROY' command for each filter.
However, the stack does not do the same for chain templates and the underlying driver never receives a 'FLOW_CLS_TMPLT_DESTROY' command when a qdisc is deleted. This results in a memory leak [1] which can be reproduced using [2].
Fix by introducing a 'tmplt_reoffload' operation and have the stack invoke it with the appropriate arguments as part of the replay. Implement the operation in the sole classifier that supports chain templates (flower) by emitting the 'FLOW_CLS_TMPLT_{CREATE,DESTROY}' command based on whether a flow offload callback is being bound to a filter block or being unbound from one.
As far as I can tell, the issue happens since cited commit which reordered tcf_block_offload_unbind() before tcf_block_flush_all_chains() in __tcf_block_put(). The order cannot be reversed as the filter block is expected to be freed after flushing all the chains.
[1] unreferenced object 0xffff888107e28800 (size 2048): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): b1 a6 7c 11 81 88 ff ff e0 5b b3 10 81 88 ff ff ..|......[...... 01 00 00 00 00 00 00 00 e0 aa b0 84 ff ff ff ff ................ backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab374e>] __kmalloc+0x4e/0x90 [<ffffffff832aec6d>] mlxsw_sp_acl_ruleset_get+0x34d/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80 [<ffffffff8379d29a>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff8379d50c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff843b9ce0>] do_syscall_64+0x40/0xe0 unreferenced object 0xffff88816d2c0400 (size 1024): comm "tc", pid 1079, jiffies 4294958525 (age 3074.287s) hex dump (first 32 bytes): 40 00 00 00 00 00 00 00 57 f6 38 be 00 00 00 00 @.......W.8..... 10 04 2c 6d 81 88 ff ff 10 04 2c 6d 81 88 ff ff ..,m......,m.... backtrace: [<ffffffff81c06a68>] __kmem_cache_alloc_node+0x1e8/0x320 [<ffffffff81ab36c1>] __kmalloc_node+0x51/0x90 [<ffffffff81a8ed96>] kvmalloc_node+0xa6/0x1f0 [<ffffffff82827d03>] bucket_table_alloc.isra.0+0x83/0x460 [<ffffffff82828d2b>] rhashtable_init+0x43b/0x7c0 [<ffffffff832aed48>] mlxsw_sp_acl_ruleset_get+0x428/0x7a0 [<ffffffff832bc195>] mlxsw_sp_flower_tmplt_create+0x145/0x180 [<ffffffff832b2e1a>] mlxsw_sp_flow_block_cb+0x1ea/0x280 [<ffffffff83a10613>] tc_setup_cb_call+0x183/0x340 [<ffffffff83a9f85a>] fl_tmplt_create+0x3da/0x4c0 [<ffffffff83a22435>] tc_ctl_chain+0xa15/0x1170 [<ffffffff838a863c>] rtnetlink_rcv_msg+0x3cc/0xed0 [<ffffffff83ac87f0>] netlink_rcv_skb+0x170/0x440 [<ffffffff83ac6270>] netlink_unicast+0x540/0x820 [<ffffffff83ac6e28>] netlink_sendmsg+0x8d8/0xda0 [<ffffffff83793def>] ____sys_sendmsg+0x30f/0xa80
[2] # tc qdisc add dev swp1 clsact # tc chain add dev swp1 ingress proto ip chain 1 flower dst_ip 0.0.0.0/32 # tc qdisc del dev swp1 clsact # devlink dev reload pci/0000:06:00.0
Fixes: bbf73830cd48 ("net: sched: traverse chains in block with tcf_get_next_chain()") Signed-off-by: Ido Schimmel idosch@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/sch_generic.h | 4 ++++ net/sched/cls_api.c | 9 ++++++++- net/sched/cls_flower.c | 23 +++++++++++++++++++++++ 3 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/include/net/sch_generic.h b/include/net/sch_generic.h index f232512505f8..e940debac400 100644 --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -376,6 +376,10 @@ struct tcf_proto_ops { struct nlattr **tca, struct netlink_ext_ack *extack); void (*tmplt_destroy)(void *tmplt_priv); + void (*tmplt_reoffload)(struct tcf_chain *chain, + bool add, + flow_setup_cb_t *cb, + void *cb_priv); struct tcf_exts * (*get_exts)(const struct tcf_proto *tp, u32 handle);
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c index a193cc7b3241..84e18b5f72a3 100644 --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -1536,6 +1536,9 @@ tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb, chain_prev = chain, chain = __tcf_get_next_chain(block, chain), tcf_chain_put(chain_prev)) { + if (chain->tmplt_ops && add) + chain->tmplt_ops->tmplt_reoffload(chain, true, cb, + cb_priv); for (tp = __tcf_get_next_proto(chain, NULL); tp; tp_prev = tp, tp = __tcf_get_next_proto(chain, tp), @@ -1551,6 +1554,9 @@ tcf_block_playback_offloads(struct tcf_block *block, flow_setup_cb_t *cb, goto err_playback_remove; } } + if (chain->tmplt_ops && !add) + chain->tmplt_ops->tmplt_reoffload(chain, false, cb, + cb_priv); }
return 0; @@ -2950,7 +2956,8 @@ static int tc_chain_tmplt_add(struct tcf_chain *chain, struct net *net, ops = tcf_proto_lookup_ops(name, true, extack); if (IS_ERR(ops)) return PTR_ERR(ops); - if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump) { + if (!ops->tmplt_create || !ops->tmplt_destroy || !ops->tmplt_dump || + !ops->tmplt_reoffload) { NL_SET_ERR_MSG(extack, "Chain templates are not supported with specified classifier"); module_put(ops->owner); return -EOPNOTSUPP; diff --git a/net/sched/cls_flower.c b/net/sched/cls_flower.c index e5314a31f75a..efb9d2811b73 100644 --- a/net/sched/cls_flower.c +++ b/net/sched/cls_flower.c @@ -2721,6 +2721,28 @@ static void fl_tmplt_destroy(void *tmplt_priv) kfree(tmplt); }
+static void fl_tmplt_reoffload(struct tcf_chain *chain, bool add, + flow_setup_cb_t *cb, void *cb_priv) +{ + struct fl_flow_tmplt *tmplt = chain->tmplt_priv; + struct flow_cls_offload cls_flower = {}; + + cls_flower.rule = flow_rule_alloc(0); + if (!cls_flower.rule) + return; + + cls_flower.common.chain_index = chain->index; + cls_flower.command = add ? FLOW_CLS_TMPLT_CREATE : + FLOW_CLS_TMPLT_DESTROY; + cls_flower.cookie = (unsigned long) tmplt; + cls_flower.rule->match.dissector = &tmplt->dissector; + cls_flower.rule->match.mask = &tmplt->mask; + cls_flower.rule->match.key = &tmplt->dummy_key; + + cb(TC_SETUP_CLSFLOWER, &cls_flower, cb_priv); + kfree(cls_flower.rule); +} + static int fl_dump_key_val(struct sk_buff *skb, void *val, int val_type, void *mask, int mask_type, int len) @@ -3628,6 +3650,7 @@ static struct tcf_proto_ops cls_fl_ops __read_mostly = { .bind_class = fl_bind_class, .tmplt_create = fl_tmplt_create, .tmplt_destroy = fl_tmplt_destroy, + .tmplt_reoffload = fl_tmplt_reoffload, .tmplt_dump = fl_tmplt_dump, .get_exts = fl_get_exts, .owner = THIS_MODULE,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rahul Rameshbabu rrameshbabu@nvidia.com
[ Upstream commit 3876638b2c7ebb2c9d181de1191db0de8cac143a ]
Indirection (*) is of lower precedence than postfix increment (++). Logic in napi_poll context would cause an out-of-bound read by first increment the pointer address by byte address space and then dereference the value. Rather, the intended logic was to dereference first and then increment the underlying value.
Fixes: 92214be5979c ("net/mlx5e: Update doorbell for port timestamping CQ before the software counter") Signed-off-by: Rahul Rameshbabu rrameshbabu@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c index af3928eddafd..803035d4e597 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/ptp.c @@ -213,7 +213,7 @@ static void mlx5e_ptp_handle_ts_cqe(struct mlx5e_ptpsq *ptpsq, mlx5e_ptpsq_mark_ts_cqes_undelivered(ptpsq, hwtstamp); out: napi_consume_skb(skb, budget); - md_buff[*md_buff_sz++] = metadata_id; + md_buff[(*md_buff_sz)++] = metadata_id; if (unlikely(mlx5e_ptp_metadata_map_unhealthy(&ptpsq->metadata_map)) && !test_and_set_bit(MLX5E_SQ_STATE_RECOVERING, &sq->state)) queue_work(ptpsq->txqsq.priv->wq, &ptpsq->report_unhealthy_work);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Vlad Buslov vladbu@nvidia.com
[ Upstream commit d76fdd31f953ac5046555171620f2562715e9b71 ]
The cited change refactored mlx5e_tc_del_fdb_peer_flow() to only clear DUP flag when list of peer flows has become empty. However, if any concurrent user holds a reference to a peer flow (for example, the neighbor update workqueue task is updating peer flow's parent encap entry concurrently), then the flow will not be removed from the peer list and, consecutively, DUP flag will remain set. Since mlx5e_tc_del_fdb_peers_flow() calls mlx5e_tc_del_fdb_peer_flow() for every possible peer index the algorithm will try to remove the flow from eswitch instances that it has never peered with causing either NULL pointer dereference when trying to remove the flow peer list head of peer_index that was never initialized or a warning if the list debug config is enabled[0].
Fix the issue by always removing the peer flow from the list even when not releasing the last reference to it.
[0]:
[ 3102.985806] ------------[ cut here ]------------ [ 3102.986223] list_del corruption, ffff888139110698->next is NULL [ 3102.986757] WARNING: CPU: 2 PID: 22109 at lib/list_debug.c:53 __list_del_entry_valid_or_report+0x4f/0xc0 [ 3102.987561] Modules linked in: act_ct nf_flow_table bonding act_tunnel_key act_mirred act_skbedit vxlan cls_matchall nfnetlink_cttimeout act_gact cls_flower sch_ingress mlx5_vdpa vringh vhost_iotlb vdpa openvswitch nsh xt_MASQUERADE nf_conntrack_netlink nfnetlink iptable_nat xt_addrtype xt_conntrack nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcg ss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi ib_umad rdma_cm ib_ipoib iw_cm ib_cm mlx5_ib ib_uverbs ib_core mlx5_core [last unloaded: bonding] [ 3102.991113] CPU: 2 PID: 22109 Comm: revalidator28 Not tainted 6.6.0-rc6+ #3 [ 3102.991695] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 [ 3102.992605] RIP: 0010:__list_del_entry_valid_or_report+0x4f/0xc0 [ 3102.993122] Code: 39 c2 74 56 48 8b 32 48 39 fe 75 62 48 8b 51 08 48 39 f2 75 73 b8 01 00 00 00 c3 48 89 fe 48 c7 c7 48 fd 0a 82 e8 41 0b ad ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 fd 0a 82 e8 2d 0b ad ff 0f 0b [ 3102.994615] RSP: 0018:ffff8881383e7710 EFLAGS: 00010286 [ 3102.995078] RAX: 0000000000000000 RBX: 0000000000000002 RCX: 0000000000000000 [ 3102.995670] RDX: 0000000000000001 RSI: ffff88885f89b640 RDI: ffff88885f89b640 [ 3102.997188] DEL flow 00000000be367878 on port 0 [ 3102.998594] RBP: dead000000000122 R08: 0000000000000000 R09: c0000000ffffdfff [ 3102.999604] R10: 0000000000000008 R11: ffff8881383e7598 R12: dead000000000100 [ 3103.000198] R13: 0000000000000002 R14: ffff888139110000 R15: ffff888101901240 [ 3103.000790] FS: 00007f424cde4700(0000) GS:ffff88885f880000(0000) knlGS:0000000000000000 [ 3103.001486] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 3103.001986] CR2: 00007fd42e8dcb70 CR3: 000000011e68a003 CR4: 0000000000370ea0 [ 3103.002596] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 3103.003190] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 3103.003787] Call Trace: [ 3103.004055] <TASK> [ 3103.004297] ? __warn+0x7d/0x130 [ 3103.004623] ? __list_del_entry_valid_or_report+0x4f/0xc0 [ 3103.005094] ? report_bug+0xf1/0x1c0 [ 3103.005439] ? console_unlock+0x4a/0xd0 [ 3103.005806] ? handle_bug+0x3f/0x70 [ 3103.006149] ? exc_invalid_op+0x13/0x60 [ 3103.006531] ? asm_exc_invalid_op+0x16/0x20 [ 3103.007430] ? __list_del_entry_valid_or_report+0x4f/0xc0 [ 3103.007910] mlx5e_tc_del_fdb_peers_flow+0xcf/0x240 [mlx5_core] [ 3103.008463] mlx5e_tc_del_flow+0x46/0x270 [mlx5_core] [ 3103.008944] mlx5e_flow_put+0x26/0x50 [mlx5_core] [ 3103.009401] mlx5e_delete_flower+0x25f/0x380 [mlx5_core] [ 3103.009901] tc_setup_cb_destroy+0xab/0x180 [ 3103.010292] fl_hw_destroy_filter+0x99/0xc0 [cls_flower] [ 3103.010779] __fl_delete+0x2d4/0x2f0 [cls_flower] [ 3103.011207] fl_delete+0x36/0x80 [cls_flower] [ 3103.011614] tc_del_tfilter+0x56f/0x750 [ 3103.011982] rtnetlink_rcv_msg+0xff/0x3a0 [ 3103.012362] ? netlink_ack+0x1c7/0x4e0 [ 3103.012719] ? rtnl_calcit.isra.44+0x130/0x130 [ 3103.013134] netlink_rcv_skb+0x54/0x100 [ 3103.013533] netlink_unicast+0x1ca/0x2b0 [ 3103.013902] netlink_sendmsg+0x361/0x4d0 [ 3103.014269] __sock_sendmsg+0x38/0x60 [ 3103.014643] ____sys_sendmsg+0x1f2/0x200 [ 3103.015018] ? copy_msghdr_from_user+0x72/0xa0 [ 3103.015265] ___sys_sendmsg+0x87/0xd0 [ 3103.016608] ? copy_msghdr_from_user+0x72/0xa0 [ 3103.017014] ? ___sys_recvmsg+0x9b/0xd0 [ 3103.017381] ? ttwu_do_activate.isra.137+0x58/0x180 [ 3103.017821] ? wake_up_q+0x49/0x90 [ 3103.018157] ? futex_wake+0x137/0x160 [ 3103.018521] ? __sys_sendmsg+0x51/0x90 [ 3103.018882] __sys_sendmsg+0x51/0x90 [ 3103.019230] ? exit_to_user_mode_prepare+0x56/0x130 [ 3103.019670] do_syscall_64+0x3c/0x80 [ 3103.020017] entry_SYSCALL_64_after_hwframe+0x46/0xb0 [ 3103.020469] RIP: 0033:0x7f4254811ef4 [ 3103.020816] Code: 89 f3 48 83 ec 10 48 89 7c 24 08 48 89 14 24 e8 42 eb ff ff 48 8b 14 24 41 89 c0 48 89 de 48 8b 7c 24 08 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 30 44 89 c7 48 89 04 24 e8 78 eb ff ff 48 8b [ 3103.022290] RSP: 002b:00007f424cdd9480 EFLAGS: 00000293 ORIG_RAX: 000000000000002e [ 3103.022970] RAX: ffffffffffffffda RBX: 00007f424cdd9510 RCX: 00007f4254811ef4 [ 3103.023564] RDX: 0000000000000000 RSI: 00007f424cdd9510 RDI: 0000000000000012 [ 3103.024158] RBP: 00007f424cdda238 R08: 0000000000000000 R09: 00007f41d801a4b0 [ 3103.024748] R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000001 [ 3103.025341] R13: 00007f424cdd9510 R14: 00007f424cdda240 R15: 00007f424cdd99a0 [ 3103.025931] </TASK> [ 3103.026182] ---[ end trace 0000000000000000 ]--- [ 3103.027033] ------------[ cut here ]------------
Fixes: 9be6c21fdcf8 ("net/mlx5e: Handle offloads flows per peer") Signed-off-by: Vlad Buslov vladbu@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 25e44ee5121a..dc9b157a4499 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -2012,9 +2012,10 @@ static void mlx5e_tc_del_fdb_peer_flow(struct mlx5e_tc_flow *flow, list_for_each_entry_safe(peer_flow, tmp, &flow->peer_flows, peer_flows) { if (peer_index != mlx5_get_dev_index(peer_flow->priv->mdev)) continue; + + list_del(&peer_flow->peer_flows); if (refcount_dec_and_test(&peer_flow->refcnt)) { mlx5e_tc_del_fdb_flow(peer_flow->priv, peer_flow); - list_del(&peer_flow->peer_flows); kfree(peer_flow); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yishai Hadas yishaih@nvidia.com
[ Upstream commit cc8091587779cfaddb6b29c9e9edb9079a282cad ]
The below WARN [1] is reported once a callback command failed.
As a callback runs under an interrupt context, needs to use the IRQ save/restore variant.
[1] DEBUG_LOCKS_WARN_ON(lockdep_hardirq_context()) WARNING: CPU: 15 PID: 0 at kernel/locking/lockdep.c:4353 lockdep_hardirqs_on_prepare+0x11b/0x180 Modules linked in: vhost_net vhost tap mlx5_vfio_pci vfio_pci vfio_pci_core vfio_iommu_type1 vfio mlx5_vdpa vringh vhost_iotlb vdpa nfnetlink_cttimeout openvswitch nsh ip6table_mangle ip6table_nat ip6table_filter ip6_tables iptable_mangle xt_conntrackxt_MASQUERADE nf_conntrack_netlink nfnetlink xt_addrtype iptable_nat nf_nat br_netfilter rpcsec_gss_krb5 auth_rpcgss oid_registry overlay rpcrdma rdma_ucm ib_iser libiscsi scsi_transport_iscsi rdma_cm iw_cm ib_umad ib_ipoib ib_cm mlx5_ib ib_uverbs ib_core fuse mlx5_core CPU: 15 PID: 0 Comm: swapper/15 Tainted: G W 6.7.0-rc4+ #1587 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:lockdep_hardirqs_on_prepare+0x11b/0x180 Code: 00 5b c3 c3 e8 e6 0d 58 00 85 c0 74 d6 8b 15 f0 c3 76 01 85 d2 75 cc 48 c7 c6 04 a5 3b 82 48 c7 c7 f1 e9 39 82 e8 95 12 f9 ff <0f> 0b 5b c3 e8 bc 0d 58 00 85 c0 74 ac 8b 3d c6 c3 76 01 85 ff 75 RSP: 0018:ffffc900003ecd18 EFLAGS: 00010086 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027 RDX: 0000000000000000 RSI: ffff88885fbdb880 RDI: ffff88885fbdb888 RBP: 00000000ffffff87 R08: 0000000000000000 R09: 0000000000000001 R10: 0000000000000000 R11: 284e4f5f4e524157 R12: 00000000002c9aa1 R13: ffff88810aace980 R14: ffff88810aace9b8 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff88885fbc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f731436f4c8 CR3: 000000010aae6001 CR4: 0000000000372eb0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> ? __warn+0x81/0x170 ? lockdep_hardirqs_on_prepare+0x11b/0x180 ? report_bug+0xf8/0x1c0 ? handle_bug+0x3f/0x70 ? exc_invalid_op+0x13/0x60 ? asm_exc_invalid_op+0x16/0x20 ? lockdep_hardirqs_on_prepare+0x11b/0x180 ? lockdep_hardirqs_on_prepare+0x11b/0x180 trace_hardirqs_on+0x4a/0xa0 raw_spin_unlock_irq+0x24/0x30 cmd_status_err+0xc0/0x1a0 [mlx5_core] cmd_status_err+0x1a0/0x1a0 [mlx5_core] mlx5_cmd_exec_cb_handler+0x24/0x40 [mlx5_core] mlx5_cmd_comp_handler+0x129/0x4b0 [mlx5_core] cmd_comp_notifier+0x1a/0x20 [mlx5_core] notifier_call_chain+0x3e/0xe0 atomic_notifier_call_chain+0x5f/0x130 mlx5_eq_async_int+0xe7/0x200 [mlx5_core] notifier_call_chain+0x3e/0xe0 atomic_notifier_call_chain+0x5f/0x130 irq_int_handler+0x11/0x20 [mlx5_core] __handle_irq_event_percpu+0x99/0x220 ? tick_irq_enter+0x5d/0x80 handle_irq_event_percpu+0xf/0x40 handle_irq_event+0x3a/0x60 handle_edge_irq+0xa2/0x1c0 __common_interrupt+0x55/0x140 common_interrupt+0x7d/0xa0 </IRQ> <TASK> asm_common_interrupt+0x22/0x40 RIP: 0010:default_idle+0x13/0x20 Code: c0 08 00 00 00 4d 29 c8 4c 01 c7 4c 29 c2 e9 72 ff ff ff cc cc cc cc 8b 05 ea 08 25 01 85 c0 7e 07 0f 00 2d 7f b0 26 00 fb f4 <fa> c3 90 66 2e 0f 1f 84 00 00 00 00 00 65 48 8b 04 25 80 d0 02 00 RSP: 0018:ffffc9000010fec8 EFLAGS: 00000242 RAX: 0000000000000001 RBX: 000000000000000f RCX: 4000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff811c410c RBP: ffffffff829478c0 R08: 0000000000000001 R09: 0000000000000001 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000 R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 ? do_idle+0x1ec/0x210 default_idle_call+0x6c/0x90 do_idle+0x1ec/0x210 cpu_startup_entry+0x26/0x30 start_secondary+0x11b/0x150 secondary_startup_64_no_verify+0x165/0x16b </TASK> irq event stamp: 833284 hardirqs last enabled at (833283): [<ffffffff811c410c>] do_idle+0x1ec/0x210 hardirqs last disabled at (833284): [<ffffffff81daf9ef>] common_interrupt+0xf/0xa0 softirqs last enabled at (833224): [<ffffffff81dc199f>] __do_softirq+0x2bf/0x40e softirqs last disabled at (833177): [<ffffffff81178ddf>] irq_exit_rcu+0x7f/0xa0
Fixes: 34f46ae0d4b3 ("net/mlx5: Add command failures data to debugfs") Signed-off-by: Yishai Hadas yishaih@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c index 7013e1c8741a..55efb932ab2c 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/cmd.c @@ -1921,6 +1921,7 @@ static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status, { const char *namep = mlx5_command_str(opcode); struct mlx5_cmd_stats *stats; + unsigned long flags;
if (!err || !(strcmp(namep, "unknown command opcode"))) return; @@ -1928,7 +1929,7 @@ static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status, stats = xa_load(&dev->cmd.stats, opcode); if (!stats) return; - spin_lock_irq(&stats->lock); + spin_lock_irqsave(&stats->lock, flags); stats->failed++; if (err < 0) stats->last_failed_errno = -err; @@ -1937,7 +1938,7 @@ static void cmd_status_log(struct mlx5_core_dev *dev, u16 opcode, u8 status, stats->last_failed_mbox_status = status; stats->last_failed_syndrome = syndrome; } - spin_unlock_irq(&stats->lock); + spin_unlock_irqrestore(&stats->lock, flags); }
/* preserve -EREMOTEIO for outbox.status != OK, otherwise return err as is */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Erez Shitrit erezsh@nvidia.com
[ Upstream commit 653b7eb9d74426397c95061fd57da3063625af65 ]
In order to have mcast offloads the driver needs the following: It should know if that mcast comes from wire port, in addition the flow should not be marked as any specific source, that way it will give the flexibility for the driver not to be depended on the way iterator implemented in the FW.
Signed-off-by: Erez Shitrit erezsh@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Vlad Buslov vladbu@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Stable-dep-of: ec7cc38ef9f8 ("net/mlx5: Bridge, fix multicast packets sent to uplink") Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlx5/core/esw/bridge_mcast.c | 11 ++--------- include/linux/mlx5/fs.h | 1 + 2 files changed, 3 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c index 7a01714b3780..a7ed87e9d842 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c @@ -78,6 +78,8 @@ mlx5_esw_bridge_mdb_flow_create(u16 esw_owner_vhca_id, struct mlx5_esw_bridge_md xa_for_each(&entry->ports, idx, port) { dests[i].type = MLX5_FLOW_DESTINATION_TYPE_FLOW_TABLE; dests[i].ft = port->mcast.ft; + if (port->vport_num == MLX5_VPORT_UPLINK) + dests[i].ft->flags |= MLX5_FLOW_TABLE_UPLINK_VPORT; i++; }
@@ -585,10 +587,6 @@ mlx5_esw_bridge_mcast_vlan_flow_create(u16 vlan_proto, struct mlx5_esw_bridge_po if (!rule_spec) return ERR_PTR(-ENOMEM);
- if (MLX5_CAP_ESW_FLOWTABLE(bridge->br_offloads->esw->dev, flow_source) && - port->vport_num == MLX5_VPORT_UPLINK) - rule_spec->flow_context.flow_source = - MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT; rule_spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT; @@ -660,11 +658,6 @@ mlx5_esw_bridge_mcast_fwd_flow_create(struct mlx5_esw_bridge_port *port) if (!rule_spec) return ERR_PTR(-ENOMEM);
- if (MLX5_CAP_ESW_FLOWTABLE(bridge->br_offloads->esw->dev, flow_source) && - port->vport_num == MLX5_VPORT_UPLINK) - rule_spec->flow_context.flow_source = - MLX5_FLOW_CONTEXT_FLOW_SOURCE_LOCAL_VPORT; - if (MLX5_CAP_ESW(bridge->br_offloads->esw->dev, merged_eswitch)) { dest.vport.flags = MLX5_FLOW_DEST_VPORT_VHCA_ID; dest.vport.vhca_id = port->esw_owner_vhca_id; diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 1e00c2436377..6f7725238abc 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -67,6 +67,7 @@ enum { MLX5_FLOW_TABLE_TERMINATION = BIT(2), MLX5_FLOW_TABLE_UNMANAGED = BIT(3), MLX5_FLOW_TABLE_OTHER_VPORT = BIT(4), + MLX5_FLOW_TABLE_UPLINK_VPORT = BIT(5), };
#define LEFTOVERS_RULE_NUM 2
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit ec7cc38ef9f83553102e84c82536971a81630739 ]
To enable multicast packets which are offloaded in bridge multicast offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should be set. Add this bit to FTE for the bridge multicast offload rules.
Fixes: 18c2916cee12 ("net/mlx5: Bridge, snoop igmp/mld packets") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Gal Pressman gal@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c | 3 +++ drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c | 2 ++ include/linux/mlx5/fs.h | 1 + include/linux/mlx5/mlx5_ifc.h | 2 +- 4 files changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c index a7ed87e9d842..22dd30cf8033 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/bridge_mcast.c @@ -83,6 +83,7 @@ mlx5_esw_bridge_mdb_flow_create(u16 esw_owner_vhca_id, struct mlx5_esw_bridge_md i++; }
+ rule_spec->flow_context.flags |= FLOW_CONTEXT_UPLINK_HAIRPIN_EN; rule_spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; dmac_v = MLX5_ADDR_OF(fte_match_param, rule_spec->match_value, outer_headers.dmac_47_16); ether_addr_copy(dmac_v, entry->key.addr); @@ -587,6 +588,7 @@ mlx5_esw_bridge_mcast_vlan_flow_create(u16 vlan_proto, struct mlx5_esw_bridge_po if (!rule_spec) return ERR_PTR(-ENOMEM);
+ rule_spec->flow_context.flags |= FLOW_CONTEXT_UPLINK_HAIRPIN_EN; rule_spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS;
flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_PACKET_REFORMAT; @@ -662,6 +664,7 @@ mlx5_esw_bridge_mcast_fwd_flow_create(struct mlx5_esw_bridge_port *port) dest.vport.flags = MLX5_FLOW_DEST_VPORT_VHCA_ID; dest.vport.vhca_id = port->esw_owner_vhca_id; } + rule_spec->flow_context.flags |= FLOW_CONTEXT_UPLINK_HAIRPIN_EN; handle = mlx5_add_flow_rules(port->mcast.ft, rule_spec, &flow_act, &dest, 1);
kvfree(rule_spec); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c index a4b925331661..b29299c49ab3 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c @@ -566,6 +566,8 @@ static int mlx5_cmd_set_fte(struct mlx5_core_dev *dev, fte->flow_context.flow_tag); MLX5_SET(flow_context, in_flow_context, flow_source, fte->flow_context.flow_source); + MLX5_SET(flow_context, in_flow_context, uplink_hairpin_en, + !!(fte->flow_context.flags & FLOW_CONTEXT_UPLINK_HAIRPIN_EN));
MLX5_SET(flow_context, in_flow_context, extended_destination, extended_dest); diff --git a/include/linux/mlx5/fs.h b/include/linux/mlx5/fs.h index 6f7725238abc..3fb428ce7d1c 100644 --- a/include/linux/mlx5/fs.h +++ b/include/linux/mlx5/fs.h @@ -132,6 +132,7 @@ struct mlx5_flow_handle;
enum { FLOW_CONTEXT_HAS_TAG = BIT(0), + FLOW_CONTEXT_UPLINK_HAIRPIN_EN = BIT(1), };
struct mlx5_flow_context { diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index 8ac6ae79e083..51eb83f77938 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -3536,7 +3536,7 @@ struct mlx5_ifc_flow_context_bits { u8 action[0x10];
u8 extended_destination[0x1]; - u8 reserved_at_81[0x1]; + u8 uplink_hairpin_en[0x1]; u8 flow_source[0x2]; u8 encrypt_decrypt_type[0x4]; u8 destination_list_size[0x18];
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yevgeny Kliteynik kliteyn@nvidia.com
[ Upstream commit 5665954293f13642f9c052ead83c1e9d8cff186f ]
When FW provides ICM addresses for drop RX/TX, the provided capability is 64 bits that contain its GVMI as well as the ICM address itself. In case of TX DROP this GVMI is different from the GVMI that the domain is operating on.
This patch fixes the action to use these GVMI IDs, as provided by FW.
Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c index 5b83da08692d..1a5aee8a7f13 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c @@ -781,6 +781,7 @@ int mlx5dr_actions_build_ste_arr(struct mlx5dr_matcher *matcher, switch (action_type) { case DR_ACTION_TYP_DROP: attr.final_icm_addr = nic_dmn->drop_icm_addr; + attr.hit_gvmi = nic_dmn->drop_icm_addr >> 48; break; case DR_ACTION_TYP_FT: dest_action = action;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yevgeny Kliteynik kliteyn@nvidia.com
[ Upstream commit 5b2a2523eeea5f03d39a9d1ff1bad2e9f8eb98d2 ]
Go-To-Vport action on RX is not allowed when the vport is uplink. In such case, the packet should be dropped.
Fixes: 9db810ed2d37 ("net/mlx5: DR, Expose steering action functionality") Signed-off-by: Yevgeny Kliteynik kliteyn@nvidia.com Reviewed-by: Erez Shitrit erezsh@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/steering/dr_action.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c index 1a5aee8a7f13..90c38cbbde18 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/dr_action.c @@ -867,11 +867,17 @@ int mlx5dr_actions_build_ste_arr(struct mlx5dr_matcher *matcher, action->sampler->tx_icm_addr; break; case DR_ACTION_TYP_VPORT: - attr.hit_gvmi = action->vport->caps->vhca_gvmi; - dest_action = action; - attr.final_icm_addr = rx_rule ? - action->vport->caps->icm_address_rx : - action->vport->caps->icm_address_tx; + if (unlikely(rx_rule && action->vport->caps->num == MLX5_VPORT_UPLINK)) { + /* can't go to uplink on RX rule - dropping instead */ + attr.final_icm_addr = nic_dmn->drop_icm_addr; + attr.hit_gvmi = nic_dmn->drop_icm_addr >> 48; + } else { + attr.hit_gvmi = action->vport->caps->vhca_gvmi; + dest_action = action; + attr.final_icm_addr = rx_rule ? + action->vport->caps->icm_address_rx : + action->vport->caps->icm_address_tx; + } break; case DR_ACTION_TYP_POP_VLAN: if (!rx_rule && !(dmn->ste_ctx->actions_caps &
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rahul Rameshbabu rrameshbabu@nvidia.com
[ Upstream commit 20cbf8cbb827094197f3b17db60d71449415db1e ]
mlx5 devices have specific constants for choosing the CQ period mode. These constants do not have to match the constants used by the kernel software API for DIM period mode selection.
Fixes: cdd04f4d4d71 ("net/mlx5: Add support to create SQ and CQ for ASO") Signed-off-by: Rahul Rameshbabu rrameshbabu@nvidia.com Reviewed-by: Jianbo Liu jianbol@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c index 40c7be124041..58bd749b5e4d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/aso.c @@ -98,7 +98,7 @@ static int create_aso_cq(struct mlx5_aso_cq *cq, void *cqc_data) mlx5_fill_page_frag_array(&cq->wq_ctrl.buf, (__be64 *)MLX5_ADDR_OF(create_cq_in, in, pas));
- MLX5_SET(cqc, cqc, cq_period_mode, DIM_CQ_PERIOD_MODE_START_FROM_EQE); + MLX5_SET(cqc, cqc, cq_period_mode, MLX5_CQ_PERIOD_MODE_START_FROM_EQE); MLX5_SET(cqc, cqc, c_eqn_or_apu_element, eqn); MLX5_SET(cqc, cqc, uar_page, mdev->priv.uar->index); MLX5_SET(cqc, cqc, log_page_size, cq->wq_ctrl.buf.page_shift -
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 20f5468a7988dedd94a57ba8acd65ebda6a59723 ]
All ConnectX devices have software parsing capability enabled, but it is more correct to set allow_swp only if capability exists, which for IPsec means that crypto offload is supported.
Fixes: 2451da081a34 ("net/mlx5: Unify device IPsec capabilities check") Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/params.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c index e097f336e1c4..30507b7c2fb1 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/params.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/params.c @@ -1062,8 +1062,8 @@ void mlx5e_build_sq_param(struct mlx5_core_dev *mdev, void *wq = MLX5_ADDR_OF(sqc, sqc, wq); bool allow_swp;
- allow_swp = - mlx5_geneve_tx_allowed(mdev) || !!mlx5_ipsec_device_caps(mdev); + allow_swp = mlx5_geneve_tx_allowed(mdev) || + (mlx5_ipsec_device_caps(mdev) & MLX5_IPSEC_CAP_CRYPTO); mlx5e_build_sq_param_common(mdev, param); MLX5_SET(wq, wq, log_wq_sz, params->log_sq_size); MLX5_SET(sqc, sqc, allow_swp, allow_swp);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Leon Romanovsky leonro@nvidia.com
[ Upstream commit 315a597f9bcfe7fe9980985031413457bee95510 ]
XFRM stack doesn't prevent from users to configure replay window in TX side and strongswan sets replay_window to be 1. It causes to failures in validation logic when trying to offload the SA.
Replay window is not relevant in TX side and should be ignored.
Fixes: cded6d80129b ("net/mlx5e: Store replay window in XFRM attributes") Signed-off-by: Aya Levin ayal@nvidia.com Signed-off-by: Leon Romanovsky leonro@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c index 5834e47e72d8..e2ffc572de18 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_accel/ipsec.c @@ -336,12 +336,17 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, /* iv len */ aes_gcm->icv_len = x->aead->alg_icv_len;
+ attrs->dir = x->xso.dir; + /* esn */ if (x->props.flags & XFRM_STATE_ESN) { attrs->replay_esn.trigger = true; attrs->replay_esn.esn = sa_entry->esn_state.esn; attrs->replay_esn.esn_msb = sa_entry->esn_state.esn_msb; attrs->replay_esn.overlap = sa_entry->esn_state.overlap; + if (attrs->dir == XFRM_DEV_OFFLOAD_OUT) + goto skip_replay_window; + switch (x->replay_esn->replay_window) { case 32: attrs->replay_esn.replay_window = @@ -365,7 +370,7 @@ void mlx5e_ipsec_build_accel_xfrm_attrs(struct mlx5e_ipsec_sa_entry *sa_entry, } }
- attrs->dir = x->xso.dir; +skip_replay_window: /* spi */ attrs->spi = be32_to_cpu(x->id.spi);
@@ -501,7 +506,8 @@ static int mlx5e_xfrm_validate_state(struct mlx5_core_dev *mdev, return -EINVAL; }
- if (x->replay_esn && x->replay_esn->replay_window != 32 && + if (x->replay_esn && x->xso.dir == XFRM_DEV_OFFLOAD_IN && + x->replay_esn->replay_window != 32 && x->replay_esn->replay_window != 64 && x->replay_esn->replay_window != 128 && x->replay_esn->replay_window != 256) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit 3c6d5189246f590e4e1f167991558bdb72a4738b ]
When `in` allocated by kvzalloc fails, arfs_create_groups will free ft->g and return an error. However, arfs_create_table, the only caller of arfs_create_groups, will hold this error and call to mlx5e_destroy_flow_table, in which the ft->g will be freed again.
Fixes: 1cabe6b0965e ("net/mlx5e: Create aRFS flow tables") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en_arfs.c | 26 +++++++++++-------- 1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c index bb7f86c993e5..e66f486faafe 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_arfs.c @@ -254,11 +254,13 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft,
ft->g = kcalloc(MLX5E_ARFS_NUM_GROUPS, sizeof(*ft->g), GFP_KERNEL); - in = kvzalloc(inlen, GFP_KERNEL); - if (!in || !ft->g) { - kfree(ft->g); - kvfree(in); + if (!ft->g) return -ENOMEM; + + in = kvzalloc(inlen, GFP_KERNEL); + if (!in) { + err = -ENOMEM; + goto err_free_g; }
mc = MLX5_ADDR_OF(create_flow_group_in, in, match_criteria); @@ -278,7 +280,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft, break; default: err = -EINVAL; - goto out; + goto err_free_in; }
switch (type) { @@ -300,7 +302,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft, break; default: err = -EINVAL; - goto out; + goto err_free_in; }
MLX5_SET_CFG(in, match_criteria_enable, MLX5_MATCH_OUTER_HEADERS); @@ -309,7 +311,7 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft, MLX5_SET_CFG(in, end_flow_index, ix - 1); ft->g[ft->num_groups] = mlx5_create_flow_group(ft->t, in); if (IS_ERR(ft->g[ft->num_groups])) - goto err; + goto err_clean_group; ft->num_groups++;
memset(in, 0, inlen); @@ -318,18 +320,20 @@ static int arfs_create_groups(struct mlx5e_flow_table *ft, MLX5_SET_CFG(in, end_flow_index, ix - 1); ft->g[ft->num_groups] = mlx5_create_flow_group(ft->t, in); if (IS_ERR(ft->g[ft->num_groups])) - goto err; + goto err_clean_group; ft->num_groups++;
kvfree(in); return 0;
-err: +err_clean_group: err = PTR_ERR(ft->g[ft->num_groups]); ft->g[ft->num_groups] = NULL; -out: +err_free_in: kvfree(in); - +err_free_g: + kfree(ft->g); + ft->g = NULL; return err; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dinghao Liu dinghao.liu@zju.edu.cn
[ Upstream commit aef855df7e1bbd5aa4484851561211500b22707e ]
When kcalloc() for ft->g succeeds but kvzalloc() for in fails, fs_any_create_groups() will free ft->g. However, its caller fs_any_create_table() will free ft->g again through calling mlx5e_destroy_flow_table(), which will lead to a double-free. Fix this by setting ft->g to NULL in fs_any_create_groups().
Fixes: 0f575c20bf06 ("net/mlx5e: Introduce Flow Steering ANY API") Signed-off-by: Dinghao Liu dinghao.liu@zju.edu.cn Reviewed-by: Tariq Toukan tariqt@nvidia.com Reviewed-by: Simon Horman horms@kernel.org Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c b/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c index e1283531e0b8..671adbad0a40 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/fs_tt_redirect.c @@ -436,6 +436,7 @@ static int fs_any_create_groups(struct mlx5e_flow_table *ft) in = kvzalloc(inlen, GFP_KERNEL); if (!in || !ft->g) { kfree(ft->g); + ft->g = NULL; kvfree(in); return -ENOMEM; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Frederic Weisbecker frederic@kernel.org
[ Upstream commit e787644caf7628ad3269c1fbd321c3255cf51710 ]
When the CPU goes idle for the last time during the CPU down hotplug process, RCU reports a final quiescent state for the current CPU. If this quiescent state propagates up to the top, some tasks may then be woken up to complete the grace period: the main grace period kthread and/or the expedited main workqueue (or kworker).
If those kthreads have a SCHED_FIFO policy, the wake up can indirectly arm the RT bandwith timer to the local offline CPU. Since this happens after hrtimers have been migrated at CPUHP_AP_HRTIMERS_DYING stage, the timer gets ignored. Therefore if the RCU kthreads are waiting for RT bandwidth to be available, they may never be actually scheduled.
This triggers TREE03 rcutorture hangs:
rcu: INFO: rcu_preempt self-detected stall on CPU rcu: 4-...!: (1 GPs behind) idle=9874/1/0x4000000000000000 softirq=0/0 fqs=20 rcuc=21071 jiffies(starved) rcu: (t=21035 jiffies g=938281 q=40787 ncpus=6) rcu: rcu_preempt kthread starved for 20964 jiffies! g938281 f0x0 RCU_GP_WAIT_FQS(5) ->state=0x0 ->cpu=0 rcu: Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior. rcu: RCU grace-period kthread stack dump: task:rcu_preempt state:R running task stack:14896 pid:14 tgid:14 ppid:2 flags:0x00004000 Call Trace: <TASK> __schedule+0x2eb/0xa80 schedule+0x1f/0x90 schedule_timeout+0x163/0x270 ? __pfx_process_timeout+0x10/0x10 rcu_gp_fqs_loop+0x37c/0x5b0 ? __pfx_rcu_gp_kthread+0x10/0x10 rcu_gp_kthread+0x17c/0x200 kthread+0xde/0x110 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2b/0x40 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK>
The situation can't be solved with just unpinning the timer. The hrtimer infrastructure and the nohz heuristics involved in finding the best remote target for an unpinned timer would then also need to handle enqueues from an offline CPU in the most horrendous way.
So fix this on the RCU side instead and defer the wake up to an online CPU if it's too late for the local one.
Reported-by: Paul E. McKenney paulmck@kernel.org Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier") Signed-off-by: Frederic Weisbecker frederic@kernel.org Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Neeraj Upadhyay (AMD) neeraj.iitr10@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/rcu/tree.c | 34 +++++++++++++++++++++++++++++++++- kernel/rcu/tree_exp.h | 3 +-- 2 files changed, 34 insertions(+), 3 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 9af42eae1ba3..4fe47ed95eeb 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -1013,6 +1013,38 @@ static bool rcu_future_gp_cleanup(struct rcu_node *rnp) return needmore; }
+static void swake_up_one_online_ipi(void *arg) +{ + struct swait_queue_head *wqh = arg; + + swake_up_one(wqh); +} + +static void swake_up_one_online(struct swait_queue_head *wqh) +{ + int cpu = get_cpu(); + + /* + * If called from rcutree_report_cpu_starting(), wake up + * is dangerous that late in the CPU-down hotplug process. The + * scheduler might queue an ignored hrtimer. Defer the wake up + * to an online CPU instead. + */ + if (unlikely(cpu_is_offline(cpu))) { + int target; + + target = cpumask_any_and(housekeeping_cpumask(HK_TYPE_RCU), + cpu_online_mask); + + smp_call_function_single(target, swake_up_one_online_ipi, + wqh, 0); + put_cpu(); + } else { + put_cpu(); + swake_up_one(wqh); + } +} + /* * Awaken the grace-period kthread. Don't do a self-awaken (unless in an * interrupt or softirq handler, in which case we just might immediately @@ -1037,7 +1069,7 @@ static void rcu_gp_kthread_wake(void) return; WRITE_ONCE(rcu_state.gp_wake_time, jiffies); WRITE_ONCE(rcu_state.gp_wake_seq, READ_ONCE(rcu_state.gp_seq)); - swake_up_one(&rcu_state.gp_wq); + swake_up_one_online(&rcu_state.gp_wq); }
/* diff --git a/kernel/rcu/tree_exp.h b/kernel/rcu/tree_exp.h index 8239b39d945b..6e87dc764f47 100644 --- a/kernel/rcu/tree_exp.h +++ b/kernel/rcu/tree_exp.h @@ -173,7 +173,6 @@ static bool sync_rcu_exp_done_unlocked(struct rcu_node *rnp) return ret; }
- /* * Report the exit from RCU read-side critical section for the last task * that queued itself during or before the current expedited preemptible-RCU @@ -201,7 +200,7 @@ static void __rcu_report_exp_rnp(struct rcu_node *rnp, raw_spin_unlock_irqrestore_rcu_node(rnp, flags); if (wake) { smp_mb(); /* EGP done before wake_up(). */ - swake_up_one(&rcu_state.expedited_wq); + swake_up_one_online(&rcu_state.expedited_wq); } break; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit c9d9eb9c53d37cdebbad56b91e40baf42d5a97aa ]
Reject bogus configs where internal token counter wraps around. This only occurs with very very large requests, such as 17gbyte/s.
Its better to reject this rather than having incorrect ratelimit.
Fixes: d2168e849ebf ("netfilter: nft_limit: add per-byte limiting") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_limit.c | 23 ++++++++++++++++------- 1 file changed, 16 insertions(+), 7 deletions(-)
diff --git a/net/netfilter/nft_limit.c b/net/netfilter/nft_limit.c index 79039afde34e..cefa25e0dbb0 100644 --- a/net/netfilter/nft_limit.c +++ b/net/netfilter/nft_limit.c @@ -58,17 +58,19 @@ static inline bool nft_limit_eval(struct nft_limit_priv *priv, u64 cost) static int nft_limit_init(struct nft_limit_priv *priv, const struct nlattr * const tb[], bool pkts) { + u64 unit, tokens, rate_with_burst; bool invert = false; - u64 unit, tokens;
if (tb[NFTA_LIMIT_RATE] == NULL || tb[NFTA_LIMIT_UNIT] == NULL) return -EINVAL;
priv->rate = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_RATE])); + if (priv->rate == 0) + return -EINVAL; + unit = be64_to_cpu(nla_get_be64(tb[NFTA_LIMIT_UNIT])); - priv->nsecs = unit * NSEC_PER_SEC; - if (priv->rate == 0 || priv->nsecs < unit) + if (check_mul_overflow(unit, NSEC_PER_SEC, &priv->nsecs)) return -EOVERFLOW;
if (tb[NFTA_LIMIT_BURST]) @@ -77,18 +79,25 @@ static int nft_limit_init(struct nft_limit_priv *priv, if (pkts && priv->burst == 0) priv->burst = NFT_LIMIT_PKT_BURST_DEFAULT;
- if (priv->rate + priv->burst < priv->rate) + if (check_add_overflow(priv->rate, priv->burst, &rate_with_burst)) return -EOVERFLOW;
if (pkts) { - tokens = div64_u64(priv->nsecs, priv->rate) * priv->burst; + u64 tmp = div64_u64(priv->nsecs, priv->rate); + + if (check_mul_overflow(tmp, priv->burst, &tokens)) + return -EOVERFLOW; } else { + u64 tmp; + /* The token bucket size limits the number of tokens can be * accumulated. tokens_max specifies the bucket size. * tokens_max = unit * (rate + burst) / rate. */ - tokens = div64_u64(priv->nsecs * (priv->rate + priv->burst), - priv->rate); + if (check_mul_overflow(priv->nsecs, rate_with_burst, &tmp)) + return -EOVERFLOW; + + tokens = div64_u64(tmp, priv->rate); }
if (tb[NFTA_LIMIT_FLAGS]) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
[ Upstream commit b462579b2b86a8f5230543cadd3a4836be27baf7 ]
nftables has two types of sets/maps, one where userspace defines the name, and anonymous sets/maps, where userspace defines a template name.
For the latter, kernel requires presence of exactly one "%d". nftables uses "__set%d" and "__map%d" for this. The kernel will expand the format specifier and replaces it with the smallest unused number.
As-is, userspace could define a template name that allows to move the set name past the 256 bytes upperlimit (post-expansion).
I don't see how this could be a problem, but I would prefer if userspace cannot do this, so add a limit of 16 bytes for the '%d' template name.
16 bytes is the old total upper limit for set names that existed when nf_tables was merged initially.
Fixes: 387454901bd6 ("netfilter: nf_tables: Allow set names of up to 255 chars") Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index b28fbcb86e94..bad58df478a7 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -24,6 +24,7 @@ #include <net/sock.h>
#define NFT_MODULE_AUTOLOAD_LIMIT (MODULE_NAME_LEN - sizeof("nft-expr-255-")) +#define NFT_SET_MAX_ANONLEN 16
unsigned int nf_tables_net_id __read_mostly;
@@ -4351,6 +4352,9 @@ static int nf_tables_set_alloc_name(struct nft_ctx *ctx, struct nft_set *set, if (p[1] != 'd' || strchr(p + 2, '%')) return -EINVAL;
+ if (strnlen(name, NFT_SET_MAX_ANONLEN) >= NFT_SET_MAX_ANONLEN) + return -EINVAL; + inuse = (unsigned long *)get_zeroed_page(GFP_KERNEL); if (inuse == NULL) return -ENOMEM;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit d0009effa8862c20a13af4cb7475d9771b905693 ]
Several expressions explicitly refer to NF_INET_* hook definitions from expr->ops->validate, however, family is not validated.
Bail out with EOPNOTSUPP in case they are used from unsupported families.
Fixes: 0ca743a55991 ("netfilter: nf_tables: add compatibility layer for x_tables") Fixes: a3c90f7a2323 ("netfilter: nf_tables: flow offload expression") Fixes: 2fa841938c64 ("netfilter: nf_tables: introduce routing expression") Fixes: 554ced0a6e29 ("netfilter: nf_tables: add support for native socket matching") Fixes: ad49d86e07a4 ("netfilter: nf_tables: Add synproxy support") Fixes: 4ed8eb6570a4 ("netfilter: nf_tables: Add native tproxy support") Fixes: 6c47260250fc ("netfilter: nf_tables: add xfrm expression") Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nft_compat.c | 12 ++++++++++++ net/netfilter/nft_flow_offload.c | 5 +++++ net/netfilter/nft_nat.c | 5 +++++ net/netfilter/nft_rt.c | 5 +++++ net/netfilter/nft_socket.c | 5 +++++ net/netfilter/nft_synproxy.c | 7 +++++-- net/netfilter/nft_tproxy.c | 5 +++++ net/netfilter/nft_xfrm.c | 5 +++++ 8 files changed, 47 insertions(+), 2 deletions(-)
diff --git a/net/netfilter/nft_compat.c b/net/netfilter/nft_compat.c index 5284cd2ad532..f0eeda97bfcd 100644 --- a/net/netfilter/nft_compat.c +++ b/net/netfilter/nft_compat.c @@ -350,6 +350,12 @@ static int nft_target_validate(const struct nft_ctx *ctx, unsigned int hook_mask = 0; int ret;
+ if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_BRIDGE && + ctx->family != NFPROTO_ARP) + return -EOPNOTSUPP; + if (nft_is_base_chain(ctx->chain)) { const struct nft_base_chain *basechain = nft_base_chain(ctx->chain); @@ -595,6 +601,12 @@ static int nft_match_validate(const struct nft_ctx *ctx, unsigned int hook_mask = 0; int ret;
+ if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_BRIDGE && + ctx->family != NFPROTO_ARP) + return -EOPNOTSUPP; + if (nft_is_base_chain(ctx->chain)) { const struct nft_base_chain *basechain = nft_base_chain(ctx->chain); diff --git a/net/netfilter/nft_flow_offload.c b/net/netfilter/nft_flow_offload.c index ab3362c483b4..397351fa4d5f 100644 --- a/net/netfilter/nft_flow_offload.c +++ b/net/netfilter/nft_flow_offload.c @@ -384,6 +384,11 @@ static int nft_flow_offload_validate(const struct nft_ctx *ctx, { unsigned int hook_mask = (1 << NF_INET_FORWARD);
+ if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + return nft_chain_validate_hooks(ctx->chain, hook_mask); }
diff --git a/net/netfilter/nft_nat.c b/net/netfilter/nft_nat.c index 583885ce7232..808f5802c270 100644 --- a/net/netfilter/nft_nat.c +++ b/net/netfilter/nft_nat.c @@ -143,6 +143,11 @@ static int nft_nat_validate(const struct nft_ctx *ctx, struct nft_nat *priv = nft_expr_priv(expr); int err;
+ if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + err = nft_chain_validate_dependency(ctx->chain, NFT_CHAIN_T_NAT); if (err < 0) return err; diff --git a/net/netfilter/nft_rt.c b/net/netfilter/nft_rt.c index 35a2c28caa60..24d977138572 100644 --- a/net/netfilter/nft_rt.c +++ b/net/netfilter/nft_rt.c @@ -166,6 +166,11 @@ static int nft_rt_validate(const struct nft_ctx *ctx, const struct nft_expr *exp const struct nft_rt *priv = nft_expr_priv(expr); unsigned int hooks;
+ if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + switch (priv->key) { case NFT_RT_NEXTHOP4: case NFT_RT_NEXTHOP6: diff --git a/net/netfilter/nft_socket.c b/net/netfilter/nft_socket.c index 9ed85be79452..f30163e2ca62 100644 --- a/net/netfilter/nft_socket.c +++ b/net/netfilter/nft_socket.c @@ -242,6 +242,11 @@ static int nft_socket_validate(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nft_data **data) { + if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_PRE_ROUTING) | (1 << NF_INET_LOCAL_IN) | diff --git a/net/netfilter/nft_synproxy.c b/net/netfilter/nft_synproxy.c index 13da882669a4..1d737f89dfc1 100644 --- a/net/netfilter/nft_synproxy.c +++ b/net/netfilter/nft_synproxy.c @@ -186,7 +186,6 @@ static int nft_synproxy_do_init(const struct nft_ctx *ctx, break; #endif case NFPROTO_INET: - case NFPROTO_BRIDGE: err = nf_synproxy_ipv4_init(snet, ctx->net); if (err) goto nf_ct_failure; @@ -219,7 +218,6 @@ static void nft_synproxy_do_destroy(const struct nft_ctx *ctx) break; #endif case NFPROTO_INET: - case NFPROTO_BRIDGE: nf_synproxy_ipv4_fini(snet, ctx->net); nf_synproxy_ipv6_fini(snet, ctx->net); break; @@ -253,6 +251,11 @@ static int nft_synproxy_validate(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nft_data **data) { + if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + return nft_chain_validate_hooks(ctx->chain, (1 << NF_INET_LOCAL_IN) | (1 << NF_INET_FORWARD)); } diff --git a/net/netfilter/nft_tproxy.c b/net/netfilter/nft_tproxy.c index ae15cd693f0e..71412adb73d4 100644 --- a/net/netfilter/nft_tproxy.c +++ b/net/netfilter/nft_tproxy.c @@ -316,6 +316,11 @@ static int nft_tproxy_validate(const struct nft_ctx *ctx, const struct nft_expr *expr, const struct nft_data **data) { + if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + return nft_chain_validate_hooks(ctx->chain, 1 << NF_INET_PRE_ROUTING); }
diff --git a/net/netfilter/nft_xfrm.c b/net/netfilter/nft_xfrm.c index 452f8587adda..1c866757db55 100644 --- a/net/netfilter/nft_xfrm.c +++ b/net/netfilter/nft_xfrm.c @@ -235,6 +235,11 @@ static int nft_xfrm_validate(const struct nft_ctx *ctx, const struct nft_expr *e const struct nft_xfrm *priv = nft_expr_priv(expr); unsigned int hooks;
+ if (ctx->family != NFPROTO_IPV4 && + ctx->family != NFPROTO_IPV6 && + ctx->family != NFPROTO_INET) + return -EOPNOTSUPP; + switch (priv->dir) { case XFRM_POLICY_IN: hooks = (1 << NF_INET_FORWARD) |
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bernd Edlinger bernd.edlinger@hotmail.de
[ Upstream commit a5f5eee282a0aae80227697e1d9c811b1726d31d ]
otherwise the synopsys_id value may be read out wrong, because the GMAC_VERSION register might still be in reset state, for at least 1 us after the reset is de-asserted.
Add a wait for 10 us before continuing to be on the safe side.
From what have you got that delay value?
Just try and error, with very old linux versions and old gcc versions the synopsys_id was read out correctly most of the time (but not always), with recent linux versions and recnet gcc versions it was read out wrongly most of the time, but again not always. I don't have access to the VHDL code in question, so I cannot tell why it takes so long to get the correct values, I also do not have more than a few hardware samples, so I cannot tell how long this timeout must be in worst case. Experimentally I can tell that the register is read several times as zero immediately after the reset is de-asserted, also adding several no-ops is not enough, adding a printk is enough, also udelay(1) seems to be enough but I tried that not very often, and I have not access to many hardware samples to be 100% sure about the necessary delay. And since the udelay here is only executed once per device instance, it seems acceptable to delay the boot for 10 us.
BTW: my hardware's synopsys id is 0x37.
Fixes: c5e4ddbdfa11 ("net: stmmac: Add support for optional reset control") Signed-off-by: Bernd Edlinger bernd.edlinger@hotmail.de Reviewed-by: Jiri Pirko jiri@nvidia.com Reviewed-by: Serge Semin fancer.lancer@gmail.com Link: https://lore.kernel.org/r/AS8P193MB1285A810BD78C111E7F6AA34E4752@AS8P193MB12... Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c index 684ec7058c82..292857c0e601 100644 --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c @@ -7441,6 +7441,9 @@ int stmmac_dvr_probe(struct device *device, dev_err(priv->device, "unable to bring out of ahb reset: %pe\n", ERR_PTR(ret));
+ /* Wait a bit for the reset to take effect */ + udelay(10); + /* Init MAC and get the capabilities */ ret = stmmac_hw_init(priv); if (ret)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jenishkumar Maheshbhai Patel jpatel2@marvell.com
[ Upstream commit 9f538b415db862e74b8c5d3abbccfc1b2b6caa38 ]
Register value persist after booting the kernel using kexec which results in kernel panic. Thus clear the BM pool registers before initialisation to fix the issue.
Fixes: 3f518509dedc ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Jenishkumar Maheshbhai Patel jpatel2@marvell.com Reviewed-by: Maxime Chevallier maxime.chevallier@bootlin.com Link: https://lore.kernel.org/r/20240119035914.2595665-1-jpatel2@marvell.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/marvell/mvpp2/mvpp2_main.c | 27 ++++++++++++++++++- 1 file changed, 26 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c index 21c3f9b015c8..aca17082b9ec 100644 --- a/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c +++ b/drivers/net/ethernet/marvell/mvpp2/mvpp2_main.c @@ -614,12 +614,38 @@ static void mvpp23_bm_set_8pool_mode(struct mvpp2 *priv) mvpp2_write(priv, MVPP22_BM_POOL_BASE_ADDR_HIGH_REG, val); }
+/* Cleanup pool before actual initialization in the OS */ +static void mvpp2_bm_pool_cleanup(struct mvpp2 *priv, int pool_id) +{ + unsigned int thread = mvpp2_cpu_to_thread(priv, get_cpu()); + u32 val; + int i; + + /* Drain the BM from all possible residues left by firmware */ + for (i = 0; i < MVPP2_BM_POOL_SIZE_MAX; i++) + mvpp2_thread_read(priv, thread, MVPP2_BM_PHY_ALLOC_REG(pool_id)); + + put_cpu(); + + /* Stop the BM pool */ + val = mvpp2_read(priv, MVPP2_BM_POOL_CTRL_REG(pool_id)); + val |= MVPP2_BM_STOP_MASK; + mvpp2_write(priv, MVPP2_BM_POOL_CTRL_REG(pool_id), val); +} + static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv) { enum dma_data_direction dma_dir = DMA_FROM_DEVICE; int i, err, poolnum = MVPP2_BM_POOLS_NUM; struct mvpp2_port *port;
+ if (priv->percpu_pools) + poolnum = mvpp2_get_nrxqs(priv) * 2; + + /* Clean up the pool state in case it contains stale state */ + for (i = 0; i < poolnum; i++) + mvpp2_bm_pool_cleanup(priv, i); + if (priv->percpu_pools) { for (i = 0; i < priv->port_count; i++) { port = priv->port_list[i]; @@ -629,7 +655,6 @@ static int mvpp2_bm_init(struct device *dev, struct mvpp2 *priv) } }
- poolnum = mvpp2_get_nrxqs(priv) * 2; for (i = 0; i < poolnum; i++) { /* the pool in use */ int pn = i / (poolnum / 2);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 0719b5338a0cbe80d1637a5fb03d8141b5bfc7a1 ]
If there is more than 32 cpus the bitmask will start to contain commas, leading to:
./rps_default_mask.sh: line 36: [: 00000000,00000000: integer expression expected
Remove the commas, bash doesn't interpret leading zeroes as oct so that should be good enough. Switch to bash, Simon reports that not all shells support this type of substitution.
Fixes: c12e0d5f267d ("self-tests: introduce self-tests for RPS default mask") Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240122195815.638997-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/net/rps_default_mask.sh | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/net/rps_default_mask.sh b/tools/testing/selftests/net/rps_default_mask.sh index a26c5624429f..4287a8529890 100755 --- a/tools/testing/selftests/net/rps_default_mask.sh +++ b/tools/testing/selftests/net/rps_default_mask.sh @@ -1,4 +1,4 @@ -#!/bin/sh +#!/bin/bash # SPDX-License-Identifier: GPL-2.0
readonly ksft_skip=4 @@ -33,6 +33,10 @@ chk_rps() {
rps_mask=$($cmd /sys/class/net/$dev_name/queues/rx-0/rps_cpus) printf "%-60s" "$msg" + + # In case there is more than 32 CPUs we need to remove commas from masks + rps_mask=${rps_mask//,} + expected_rps_mask=${expected_rps_mask//,} if [ $rps_mask -eq $expected_rps_mask ]; then echo "[ ok ]" else
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 0879020a7817e7ce636372c016b4528f541c9f4d ]
This test is missing a whole bunch of checks for interface renaming and one ifup. Presumably it was only used on a system with renaming disabled and NetworkManager running.
Fixes: 91f430b2c49d ("selftests: net: add a test for UDP tunnel info infra") Acked-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240123060529.1033912-1-kuba@kernel.org Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../selftests/drivers/net/netdevsim/udp_tunnel_nic.sh | 9 +++++++++ 1 file changed, 9 insertions(+)
diff --git a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh index 1b08e042cf94..185b02d2d4cd 100755 --- a/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh +++ b/tools/testing/selftests/drivers/net/netdevsim/udp_tunnel_nic.sh @@ -269,6 +269,7 @@ for port in 0 1; do echo 1 > $NSIM_DEV_SYS/new_port fi NSIM_NETDEV=`get_netdev_name old_netdevs` + ifconfig $NSIM_NETDEV up
msg="new NIC device created" exp0=( 0 0 0 0 ) @@ -430,6 +431,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
overflow_table0 "overflow NIC table" @@ -487,6 +489,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
overflow_table0 "overflow NIC table" @@ -543,6 +546,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
overflow_table0 "destroy NIC" @@ -572,6 +576,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
msg="create VxLANs v6" @@ -632,6 +637,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
echo 110 > $NSIM_DEV_DFS/ports/$port/udp_ports_inject_error @@ -687,6 +693,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
msg="create VxLANs v6" @@ -746,6 +753,7 @@ for port in 0 1; do fi
echo $port > $NSIM_DEV_SYS/new_port + NSIM_NETDEV=`get_netdev_name old_netdevs` ifconfig $NSIM_NETDEV up
msg="create VxLANs v6" @@ -876,6 +884,7 @@ msg="re-add a port"
echo 2 > $NSIM_DEV_SYS/del_port echo 2 > $NSIM_DEV_SYS/new_port +NSIM_NETDEV=`get_netdev_name old_netdevs` check_tables
msg="replace VxLAN in overflow table"
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 269009893146c495f41e9572dd9319e787c2eba9 ]
Add missing xsk_buff_free() call when __xsk_rcv_zc() failed to produce descriptor to XSK Rx queue.
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-2-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/xdp/xsk.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c index 774a6d1916e4..d849dc04a334 100644 --- a/net/xdp/xsk.c +++ b/net/xdp/xsk.c @@ -166,8 +166,10 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) contd = XDP_PKT_CONTD;
err = __xsk_rcv_zc(xs, xskb, len, contd); - if (err || likely(!frags)) - goto out; + if (err) + goto err; + if (likely(!frags)) + return 0;
xskb_list = &xskb->pool->xskb_list; list_for_each_entry_safe(pos, tmp, xskb_list, xskb_list_node) { @@ -176,11 +178,13 @@ static int xsk_rcv_zc(struct xdp_sock *xs, struct xdp_buff *xdp, u32 len) len = pos->xdp.data_end - pos->xdp.data; err = __xsk_rcv_zc(xs, pos, len, contd); if (err) - return err; + goto err; list_del(&pos->xskb_list_node); }
-out: + return 0; +err: + xsk_buff_free(xdp); return err; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit f7f6aa8e24383fbb11ac55942e66da9660110f80 ]
XDP multi-buffer support introduced XDP_FLAGS_HAS_FRAGS flag that is used by drivers to notify data path whether xdp_buff contains fragments or not. Data path looks up mentioned flag on first buffer that occupies the linear part of xdp_buff, so drivers only modify it there. This is sufficient for SKB and XDP_DRV modes as usually xdp_buff is allocated on stack or it resides within struct representing driver's queue and fragments are carried via skb_frag_t structs. IOW, we are dealing with only one xdp_buff.
ZC mode though relies on list of xdp_buff structs that is carried via xsk_buff_pool::xskb_list, so ZC data path has to make sure that fragments do *not* have XDP_FLAGS_HAS_FRAGS set. Otherwise, xsk_buff_free() could misbehave if it would be executed against xdp_buff that carries a frag with XDP_FLAGS_HAS_FRAGS flag set. Such scenario can take place when within supplied XDP program bpf_xdp_adjust_tail() is used with negative offset that would in turn release the tail fragment from multi-buffer frame.
Calling xsk_buff_free() on tail fragment with XDP_FLAGS_HAS_FRAGS would result in releasing all the nodes from xskb_list that were produced by driver before XDP program execution, which is not what is intended - only tail fragment should be deleted from xskb_list and then it should be put onto xsk_buff_pool::free_list. Such multi-buffer frame will never make it up to user space, so from AF_XDP application POV there would be no traffic running, however due to free_list getting constantly new nodes, driver will be able to feed HW Rx queue with recycled buffers. Bottom line is that instead of traffic being redirected to user space, it would be continuously dropped.
To fix this, let us clear the mentioned flag on xsk_buff_pool side during xdp_buff initialization, which is what should have been done right from the start of XSK multi-buffer support.
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-3-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 1 - drivers/net/ethernet/intel/ice/ice_xsk.c | 1 - include/net/xdp_sock_drv.h | 1 + net/xdp/xsk_buff_pool.c | 1 + 4 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index 7d991e4d9b89..b75e6b6d317c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -503,7 +503,6 @@ int i40e_clean_rx_irq_zc(struct i40e_ring *rx_ring, int budget) xdp_res = i40e_run_xdp_zc(rx_ring, first, xdp_prog); i40e_handle_xdp_result_zc(rx_ring, first, rx_desc, &rx_packets, &rx_bytes, xdp_res, &failure); - first->flags = 0; next_to_clean = next_to_process; if (failure) break; diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 2a3f0834e139..33f194c870bb 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -897,7 +897,6 @@ int ice_clean_rx_irq_zc(struct ice_rx_ring *rx_ring, int budget)
if (!first) { first = xdp; - xdp_buff_clear_frags_flag(first); } else if (ice_add_xsk_frag(rx_ring, first, xdp, size)) { break; } diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 1f6fc8c7a84c..7290eb721c07 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -152,6 +152,7 @@ static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM; xdp->data_meta = xdp->data; xdp->data_end = xdp->data + size; + xdp->flags = 0; }
static inline dma_addr_t xsk_buff_raw_get_dma(struct xsk_buff_pool *pool, diff --git a/net/xdp/xsk_buff_pool.c b/net/xdp/xsk_buff_pool.c index 49cb9f9a09be..b0a611677865 100644 --- a/net/xdp/xsk_buff_pool.c +++ b/net/xdp/xsk_buff_pool.c @@ -541,6 +541,7 @@ struct xdp_buff *xp_alloc(struct xsk_buff_pool *pool)
xskb->xdp.data = xskb->xdp.data_hard_start + XDP_PACKET_HEADROOM; xskb->xdp.data_meta = xskb->xdp.data; + xskb->xdp.flags = 0;
if (pool->dma_need_sync) { dma_sync_single_range_for_device(pool->dev, xskb->dma, 0,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daan De Meyer daan.j.demeyer@gmail.com
[ Upstream commit fefba7d1ae198dcbf8b3b432de46a4e29f8dbd8c ]
As prep for adding unix socket support to the cgroup sockaddr hooks, let's propagate the sockaddr length back to the caller after running a bpf cgroup sockaddr hook program. While not important for AF_INET or AF_INET6, the sockaddr length is important when working with AF_UNIX sockaddrs as the size of the sockaddr cannot be determined just from the address family or the sockaddr's contents.
__cgroup_bpf_run_filter_sock_addr() is modified to take the uaddrlen as an input/output argument. After running the program, the modified sockaddr length is stored in the uaddrlen pointer.
Signed-off-by: Daan De Meyer daan.j.demeyer@gmail.com Link: https://lore.kernel.org/r/20231011185113.140426-3-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Stable-dep-of: c5114710c8ce ("xsk: fix usage of multi-buffer BPF helpers for ZC XDP") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/bpf-cgroup.h | 73 +++++++++++++++++++------------------- include/linux/filter.h | 1 + kernel/bpf/cgroup.c | 17 +++++++-- net/ipv4/af_inet.c | 7 ++-- net/ipv4/ping.c | 2 +- net/ipv4/tcp_ipv4.c | 2 +- net/ipv4/udp.c | 9 +++-- net/ipv6/af_inet6.c | 9 ++--- net/ipv6/ping.c | 2 +- net/ipv6/tcp_ipv6.c | 2 +- net/ipv6/udp.c | 6 ++-- 11 files changed, 76 insertions(+), 54 deletions(-)
diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h index 8506690dbb9c..31561e789715 100644 --- a/include/linux/bpf-cgroup.h +++ b/include/linux/bpf-cgroup.h @@ -120,6 +120,7 @@ int __cgroup_bpf_run_filter_sk(struct sock *sk,
int __cgroup_bpf_run_filter_sock_addr(struct sock *sk, struct sockaddr *uaddr, + int *uaddrlen, enum cgroup_bpf_attach_type atype, void *t_ctx, u32 *flags); @@ -230,22 +231,22 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) \ BPF_CGROUP_RUN_SK_PROG(sk, CGROUP_INET6_POST_BIND)
-#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype) \ +#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype) \ ({ \ int __ret = 0; \ if (cgroup_bpf_enabled(atype)) \ - __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype, \ - NULL, NULL); \ + __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, uaddrlen, \ + atype, NULL, NULL); \ __ret; \ })
-#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx) \ +#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx) \ ({ \ int __ret = 0; \ if (cgroup_bpf_enabled(atype)) { \ lock_sock(sk); \ - __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype, \ - t_ctx, NULL); \ + __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, uaddrlen, \ + atype, t_ctx, NULL); \ release_sock(sk); \ } \ __ret; \ @@ -256,14 +257,14 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, * (at bit position 0) is to indicate CAP_NET_BIND_SERVICE capability check * should be bypassed (BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE). */ -#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, atype, bind_flags) \ +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, bind_flags) \ ({ \ u32 __flags = 0; \ int __ret = 0; \ if (cgroup_bpf_enabled(atype)) { \ lock_sock(sk); \ - __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, atype, \ - NULL, &__flags); \ + __ret = __cgroup_bpf_run_filter_sock_addr(sk, uaddr, uaddrlen, \ + atype, NULL, &__flags); \ release_sock(sk); \ if (__flags & BPF_RET_BIND_NO_CAP_NET_BIND_SERVICE) \ *bind_flags |= BIND_NO_CAP_NET_BIND_SERVICE; \ @@ -276,29 +277,29 @@ static inline bool cgroup_bpf_sock_enabled(struct sock *sk, cgroup_bpf_enabled(CGROUP_INET6_CONNECT)) && \ (sk)->sk_prot->pre_connect)
-#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET4_CONNECT) +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) \ + BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT)
-#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG(sk, uaddr, CGROUP_INET6_CONNECT) +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) \ + BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT)
-#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET4_CONNECT, NULL) +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET4_CONNECT, NULL)
-#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_INET6_CONNECT, NULL) +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_INET6_CONNECT, NULL)
-#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_SENDMSG, t_ctx) +#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_SENDMSG, t_ctx)
-#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_SENDMSG, t_ctx) +#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_SENDMSG, t_ctx)
-#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP4_RECVMSG, NULL) +#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP4_RECVMSG, NULL)
-#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) \ - BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, CGROUP_UDP6_RECVMSG, NULL) +#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) \ + BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, CGROUP_UDP6_RECVMSG, NULL)
/* The SOCK_OPS"_SK" macro should be used when sock_ops->sk is not a * fullsock and its parent fullsock cannot be traced by @@ -477,24 +478,24 @@ static inline int bpf_percpu_cgroup_storage_update(struct bpf_map *map, }
#define cgroup_bpf_enabled(atype) (0) -#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, atype, t_ctx) ({ 0; }) -#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, atype) ({ 0; }) +#define BPF_CGROUP_RUN_SA_PROG_LOCK(sk, uaddr, uaddrlen, atype, t_ctx) ({ 0; }) +#define BPF_CGROUP_RUN_SA_PROG(sk, uaddr, uaddrlen, atype) ({ 0; }) #define BPF_CGROUP_PRE_CONNECT_ENABLED(sk) (0) #define BPF_CGROUP_RUN_PROG_INET_INGRESS(sk,skb) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET_EGRESS(sk,skb) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET_SOCK(sk) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET_SOCK_RELEASE(sk) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, atype, flags) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, uaddrlen, atype, flags) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET4_POST_BIND(sk) ({ 0; }) #define BPF_CGROUP_RUN_PROG_INET6_POST_BIND(sk) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, t_ctx) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr) ({ 0; }) -#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, uaddrlen) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, uaddrlen) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, uaddrlen) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, uaddr, uaddrlen, t_ctx) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; }) +#define BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, uaddr, uaddrlen) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SOCK_OPS(sock_ops) ({ 0; }) #define BPF_CGROUP_RUN_PROG_DEVICE_CGROUP(atype, major, minor, access) ({ 0; }) #define BPF_CGROUP_RUN_PROG_SYSCTL(head,table,write,buf,count,pos) ({ 0; }) diff --git a/include/linux/filter.h b/include/linux/filter.h index 761af6b3cf2b..77db4263d68d 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -1285,6 +1285,7 @@ struct bpf_sock_addr_kern { */ u64 tmp_reg; void *t_ctx; /* Attach type specific context. */ + u32 uaddrlen; };
struct bpf_sock_ops_kern { diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index 03b3d4492980..ac37bd53aee0 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -1450,6 +1450,9 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk); * provided by user sockaddr * @sk: sock struct that will use sockaddr * @uaddr: sockaddr struct provided by user + * @uaddrlen: Pointer to the size of the sockaddr struct provided by user. It is + * read-only for AF_INET[6] uaddr but can be modified for AF_UNIX + * uaddr. * @atype: The type of program to be executed * @t_ctx: Pointer to attach type specific context * @flags: Pointer to u32 which contains higher bits of BPF program @@ -1462,6 +1465,7 @@ EXPORT_SYMBOL(__cgroup_bpf_run_filter_sk); */ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk, struct sockaddr *uaddr, + int *uaddrlen, enum cgroup_bpf_attach_type atype, void *t_ctx, u32 *flags) @@ -1473,6 +1477,7 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk, }; struct sockaddr_storage unspec; struct cgroup *cgrp; + int ret;
/* Check socket family since not all sockets represent network * endpoint (e.g. AF_UNIX). @@ -1483,11 +1488,19 @@ int __cgroup_bpf_run_filter_sock_addr(struct sock *sk, if (!ctx.uaddr) { memset(&unspec, 0, sizeof(unspec)); ctx.uaddr = (struct sockaddr *)&unspec; + ctx.uaddrlen = 0; + } else { + ctx.uaddrlen = *uaddrlen; }
cgrp = sock_cgroup_ptr(&sk->sk_cgrp_data); - return bpf_prog_run_array_cg(&cgrp->bpf, atype, &ctx, bpf_prog_run, - 0, flags); + ret = bpf_prog_run_array_cg(&cgrp->bpf, atype, &ctx, bpf_prog_run, + 0, flags); + + if (!ret && uaddr) + *uaddrlen = ctx.uaddrlen; + + return ret; } EXPORT_SYMBOL(__cgroup_bpf_run_filter_sock_addr);
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index b739ddbef0f0..7d4d625471f7 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -455,7 +455,7 @@ int inet_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len) /* BPF prog is run before any checks are done so that if the prog * changes context in a wrong way it will be caught. */ - err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, + err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, &addr_len, CGROUP_INET4_BIND, &flags); if (err) return err; @@ -797,6 +797,7 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr, struct sock *sk = sock->sk; struct inet_sock *inet = inet_sk(sk); DECLARE_SOCKADDR(struct sockaddr_in *, sin, uaddr); + int sin_addr_len = sizeof(*sin);
sin->sin_family = AF_INET; lock_sock(sk); @@ -809,7 +810,7 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr, } sin->sin_port = inet->inet_dport; sin->sin_addr.s_addr = inet->inet_daddr; - BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, + BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET4_GETPEERNAME); } else { __be32 addr = inet->inet_rcv_saddr; @@ -817,7 +818,7 @@ int inet_getname(struct socket *sock, struct sockaddr *uaddr, addr = inet->inet_saddr; sin->sin_port = inet->inet_sport; sin->sin_addr.s_addr = addr; - BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, + BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET4_GETSOCKNAME); } release_sock(sk); diff --git a/net/ipv4/ping.c b/net/ipv4/ping.c index 75e0aee35eb7..4cb0c896caf9 100644 --- a/net/ipv4/ping.c +++ b/net/ipv4/ping.c @@ -301,7 +301,7 @@ static int ping_pre_connect(struct sock *sk, struct sockaddr *uaddr, if (addr_len < sizeof(struct sockaddr_in)) return -EINVAL;
- return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr); + return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, &addr_len); }
/* Checks the bind address and possibly modifies sk->sk_bound_dev_if. */ diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c index 4167e8a48b60..c7ffab37a34c 100644 --- a/net/ipv4/tcp_ipv4.c +++ b/net/ipv4/tcp_ipv4.c @@ -194,7 +194,7 @@ static int tcp_v4_pre_connect(struct sock *sk, struct sockaddr *uaddr,
sock_owned_by_me(sk);
- return BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr); + return BPF_CGROUP_RUN_PROG_INET4_CONNECT(sk, uaddr, &addr_len); }
/* This will initiate an outgoing connection. */ diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 9cb22a6ae1dc..7be4ddc80d95 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -1143,7 +1143,9 @@ int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
if (cgroup_bpf_enabled(CGROUP_UDP4_SENDMSG) && !connected) { err = BPF_CGROUP_RUN_PROG_UDP4_SENDMSG_LOCK(sk, - (struct sockaddr *)usin, &ipc.addr); + (struct sockaddr *)usin, + &msg->msg_namelen, + &ipc.addr); if (err) goto out_free; if (usin) { @@ -1865,7 +1867,8 @@ int udp_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, int flags, *addr_len = sizeof(*sin);
BPF_CGROUP_RUN_PROG_UDP4_RECVMSG_LOCK(sk, - (struct sockaddr *)sin); + (struct sockaddr *)sin, + addr_len); }
if (udp_test_bit(GRO_ENABLED, sk)) @@ -1904,7 +1907,7 @@ int udp_pre_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len) if (addr_len < sizeof(struct sockaddr_in)) return -EINVAL;
- return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr); + return BPF_CGROUP_RUN_PROG_INET4_CONNECT_LOCK(sk, uaddr, &addr_len); } EXPORT_SYMBOL(udp_pre_connect);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c index b6c5b5e25a2f..4375bfa4f608 100644 --- a/net/ipv6/af_inet6.c +++ b/net/ipv6/af_inet6.c @@ -456,7 +456,7 @@ int inet6_bind_sk(struct sock *sk, struct sockaddr *uaddr, int addr_len) /* BPF prog is run before any checks are done so that if the prog * changes context in a wrong way it will be caught. */ - err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, + err = BPF_CGROUP_RUN_PROG_INET_BIND_LOCK(sk, uaddr, &addr_len, CGROUP_INET6_BIND, &flags); if (err) return err; @@ -522,6 +522,7 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr, int peer) { struct sockaddr_in6 *sin = (struct sockaddr_in6 *)uaddr; + int sin_addr_len = sizeof(*sin); struct sock *sk = sock->sk; struct inet_sock *inet = inet_sk(sk); struct ipv6_pinfo *np = inet6_sk(sk); @@ -541,7 +542,7 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr, sin->sin6_addr = sk->sk_v6_daddr; if (np->sndflow) sin->sin6_flowinfo = np->flow_label; - BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, + BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET6_GETPEERNAME); } else { if (ipv6_addr_any(&sk->sk_v6_rcv_saddr)) @@ -549,13 +550,13 @@ int inet6_getname(struct socket *sock, struct sockaddr *uaddr, else sin->sin6_addr = sk->sk_v6_rcv_saddr; sin->sin6_port = inet->inet_sport; - BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, + BPF_CGROUP_RUN_SA_PROG(sk, (struct sockaddr *)sin, &sin_addr_len, CGROUP_INET6_GETSOCKNAME); } sin->sin6_scope_id = ipv6_iface_scope_id(&sin->sin6_addr, sk->sk_bound_dev_if); release_sock(sk); - return sizeof(*sin); + return sin_addr_len; } EXPORT_SYMBOL(inet6_getname);
diff --git a/net/ipv6/ping.c b/net/ipv6/ping.c index 5831aaa53d75..25243737fbc4 100644 --- a/net/ipv6/ping.c +++ b/net/ipv6/ping.c @@ -56,7 +56,7 @@ static int ping_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr, if (addr_len < SIN6_LEN_RFC2133) return -EINVAL;
- return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr); + return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, &addr_len); }
static int ping_v6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c index 44b6949d72b2..3783334ef233 100644 --- a/net/ipv6/tcp_ipv6.c +++ b/net/ipv6/tcp_ipv6.c @@ -135,7 +135,7 @@ static int tcp_v6_pre_connect(struct sock *sk, struct sockaddr *uaddr,
sock_owned_by_me(sk);
- return BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr); + return BPF_CGROUP_RUN_PROG_INET6_CONNECT(sk, uaddr, &addr_len); }
static int tcp_v6_connect(struct sock *sk, struct sockaddr *uaddr, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index f1170dcc21d9..438476a31313 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -410,7 +410,8 @@ int udpv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len, *addr_len = sizeof(*sin6);
BPF_CGROUP_RUN_PROG_UDP6_RECVMSG_LOCK(sk, - (struct sockaddr *)sin6); + (struct sockaddr *)sin6, + addr_len); }
if (udp_test_bit(GRO_ENABLED, sk)) @@ -1157,7 +1158,7 @@ static int udpv6_pre_connect(struct sock *sk, struct sockaddr *uaddr, if (addr_len < SIN6_LEN_RFC2133) return -EINVAL;
- return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr); + return BPF_CGROUP_RUN_PROG_INET6_CONNECT_LOCK(sk, uaddr, &addr_len); }
/** @@ -1510,6 +1511,7 @@ int udpv6_sendmsg(struct sock *sk, struct msghdr *msg, size_t len) if (cgroup_bpf_enabled(CGROUP_UDP6_SENDMSG) && !connected) { err = BPF_CGROUP_RUN_PROG_UDP6_SENDMSG_LOCK(sk, (struct sockaddr *)sin6, + &addr_len, &fl6->saddr); if (err) goto out_no_dst;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Daan De Meyer daan.j.demeyer@gmail.com
[ Upstream commit 53e380d21441909b12b6e0782b77187ae4b971c4 ]
As prep for adding unix socket support to the cgroup sockaddr hooks, let's add a kfunc bpf_sock_addr_set_sun_path() that allows modifying a unix sockaddr from bpf. While this is already possible for AF_INET and AF_INET6, we'll need this kfunc when we add unix socket support since modifying the address for those requires modifying both the address and the sockaddr length.
Signed-off-by: Daan De Meyer daan.j.demeyer@gmail.com Link: https://lore.kernel.org/r/20231011185113.140426-4-daan.j.demeyer@gmail.com Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Stable-dep-of: c5114710c8ce ("xsk: fix usage of multi-buffer BPF helpers for ZC XDP") Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/bpf/btf.c | 1 + net/core/filter.c | 35 ++++++++++++++++++++++++++++++++++- 2 files changed, 35 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c index 8090d7fb11ef..a31704a6bb61 100644 --- a/kernel/bpf/btf.c +++ b/kernel/bpf/btf.c @@ -7832,6 +7832,7 @@ static int bpf_prog_type_to_kfunc_hook(enum bpf_prog_type prog_type) case BPF_PROG_TYPE_SYSCALL: return BTF_KFUNC_HOOK_SYSCALL; case BPF_PROG_TYPE_CGROUP_SKB: + case BPF_PROG_TYPE_CGROUP_SOCK_ADDR: return BTF_KFUNC_HOOK_CGROUP_SKB; case BPF_PROG_TYPE_SCHED_ACT: return BTF_KFUNC_HOOK_SCHED_ACT; diff --git a/net/core/filter.c b/net/core/filter.c index 90fe3e754383..cbc395d96479 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -81,6 +81,7 @@ #include <net/xdp.h> #include <net/mptcp.h> #include <net/netfilter/nf_conntrack_bpf.h> +#include <linux/un.h>
static const struct bpf_func_proto * bpf_sk_base_func_proto(enum bpf_func_id func_id); @@ -11772,6 +11773,27 @@ __bpf_kfunc int bpf_dynptr_from_xdp(struct xdp_buff *xdp, u64 flags,
return 0; } + +__bpf_kfunc int bpf_sock_addr_set_sun_path(struct bpf_sock_addr_kern *sa_kern, + const u8 *sun_path, u32 sun_path__sz) +{ + struct sockaddr_un *un; + + if (sa_kern->sk->sk_family != AF_UNIX) + return -EINVAL; + + /* We do not allow changing the address to unnamed or larger than the + * maximum allowed address size for a unix sockaddr. + */ + if (sun_path__sz == 0 || sun_path__sz > UNIX_PATH_MAX) + return -EINVAL; + + un = (struct sockaddr_un *)sa_kern->uaddr; + memcpy(un->sun_path, sun_path, sun_path__sz); + sa_kern->uaddrlen = offsetof(struct sockaddr_un, sun_path) + sun_path__sz; + + return 0; +} __diag_pop();
int bpf_dynptr_from_skb_rdonly(struct sk_buff *skb, u64 flags, @@ -11796,6 +11818,10 @@ BTF_SET8_START(bpf_kfunc_check_set_xdp) BTF_ID_FLAGS(func, bpf_dynptr_from_xdp) BTF_SET8_END(bpf_kfunc_check_set_xdp)
+BTF_SET8_START(bpf_kfunc_check_set_sock_addr) +BTF_ID_FLAGS(func, bpf_sock_addr_set_sun_path) +BTF_SET8_END(bpf_kfunc_check_set_sock_addr) + static const struct btf_kfunc_id_set bpf_kfunc_set_skb = { .owner = THIS_MODULE, .set = &bpf_kfunc_check_set_skb, @@ -11806,6 +11832,11 @@ static const struct btf_kfunc_id_set bpf_kfunc_set_xdp = { .set = &bpf_kfunc_check_set_xdp, };
+static const struct btf_kfunc_id_set bpf_kfunc_set_sock_addr = { + .owner = THIS_MODULE, + .set = &bpf_kfunc_check_set_sock_addr, +}; + static int __init bpf_kfunc_init(void) { int ret; @@ -11820,7 +11851,9 @@ static int __init bpf_kfunc_init(void) ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_XMIT, &bpf_kfunc_set_skb); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_LWT_SEG6LOCAL, &bpf_kfunc_set_skb); ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_NETFILTER, &bpf_kfunc_set_skb); - return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp); + ret = ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp); + return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_CGROUP_SOCK_ADDR, + &bpf_kfunc_set_sock_addr); } late_initcall(bpf_kfunc_init);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit c5114710c8ce86b8317e9b448f4fd15c711c2a82 ]
Currently when packet is shrunk via bpf_xdp_adjust_tail() and memory type is set to MEM_TYPE_XSK_BUFF_POOL, null ptr dereference happens:
[1136314.192256] BUG: kernel NULL pointer dereference, address: 0000000000000034 [1136314.203943] #PF: supervisor read access in kernel mode [1136314.213768] #PF: error_code(0x0000) - not-present page [1136314.223550] PGD 0 P4D 0 [1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI [1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257 [1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT, BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019 [1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210 [1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86 [1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246 [1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX: 0000000000000000 [1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI: ffffc9003168c000 [1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09: 0000000000010000 [1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12: 0000000000000001 [1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15: 0000000000000001 [1136314.373298] FS: 00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000) knlGS:0000000000000000 [1136314.386105] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4: 00000000007706f0 [1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [1136314.431890] PKRU: 55555554 [1136314.439143] Call Trace: [1136314.446058] <IRQ> [1136314.452465] ? __die+0x20/0x70 [1136314.459881] ? page_fault_oops+0x15b/0x440 [1136314.468305] ? exc_page_fault+0x6a/0x150 [1136314.476491] ? asm_exc_page_fault+0x22/0x30 [1136314.484927] ? __xdp_return+0x6c/0x210 [1136314.492863] bpf_xdp_adjust_tail+0x155/0x1d0 [1136314.501269] bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60 [1136314.511263] ice_clean_rx_irq_zc+0x206/0xc60 [ice] [1136314.520222] ? ice_xmit_zc+0x6e/0x150 [ice] [1136314.528506] ice_napi_poll+0x467/0x670 [ice] [1136314.536858] ? ttwu_do_activate.constprop.0+0x8f/0x1a0 [1136314.546010] __napi_poll+0x29/0x1b0 [1136314.553462] net_rx_action+0x133/0x270 [1136314.561619] __do_softirq+0xbe/0x28e [1136314.569303] do_softirq+0x3f/0x60
This comes from __xdp_return() call with xdp_buff argument passed as NULL which is supposed to be consumed by xsk_buff_free() call.
To address this properly, in ZC case, a node that represents the frag being removed has to be pulled out of xskb_list. Introduce appropriate xsk helpers to do such node operation and use them accordingly within bpf_xdp_adjust_tail().
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Acked-by: Magnus Karlsson magnus.karlsson@intel.com # For the xsk header part Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-4-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/xdp_sock_drv.h | 26 +++++++++++++++++++++++ net/core/filter.c | 42 ++++++++++++++++++++++++++++++++------ 2 files changed, 62 insertions(+), 6 deletions(-)
diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h index 7290eb721c07..5425f7ad5ebd 100644 --- a/include/net/xdp_sock_drv.h +++ b/include/net/xdp_sock_drv.h @@ -147,6 +147,23 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) return ret; }
+static inline void xsk_buff_del_tail(struct xdp_buff *tail) +{ + struct xdp_buff_xsk *xskb = container_of(tail, struct xdp_buff_xsk, xdp); + + list_del(&xskb->xskb_list_node); +} + +static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first) +{ + struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp); + struct xdp_buff_xsk *frag; + + frag = list_last_entry(&xskb->pool->xskb_list, struct xdp_buff_xsk, + xskb_list_node); + return &frag->xdp; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM; @@ -310,6 +327,15 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first) return NULL; }
+static inline void xsk_buff_del_tail(struct xdp_buff *tail) +{ +} + +static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first) +{ + return NULL; +} + static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size) { } diff --git a/net/core/filter.c b/net/core/filter.c index cbc395d96479..46ee0f5433e3 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -82,6 +82,7 @@ #include <net/mptcp.h> #include <net/netfilter/nf_conntrack_bpf.h> #include <linux/un.h> +#include <net/xdp_sock_drv.h>
static const struct bpf_func_proto * bpf_sk_base_func_proto(enum bpf_func_id func_id); @@ -4084,6 +4085,40 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) return 0; }
+static void bpf_xdp_shrink_data_zc(struct xdp_buff *xdp, int shrink, + struct xdp_mem_info *mem_info, bool release) +{ + struct xdp_buff *zc_frag = xsk_buff_get_tail(xdp); + + if (release) { + xsk_buff_del_tail(zc_frag); + __xdp_return(NULL, mem_info, false, zc_frag); + } else { + zc_frag->data_end -= shrink; + } +} + +static bool bpf_xdp_shrink_data(struct xdp_buff *xdp, skb_frag_t *frag, + int shrink) +{ + struct xdp_mem_info *mem_info = &xdp->rxq->mem; + bool release = skb_frag_size(frag) == shrink; + + if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) { + bpf_xdp_shrink_data_zc(xdp, shrink, mem_info, release); + goto out; + } + + if (release) { + struct page *page = skb_frag_page(frag); + + __xdp_return(page_address(page), mem_info, false, NULL); + } + +out: + return release; +} + static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset) { struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); @@ -4098,12 +4133,7 @@ static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset)
len_free += shrink; offset -= shrink; - - if (skb_frag_size(frag) == shrink) { - struct page *page = skb_frag_page(frag); - - __xdp_return(page_address(page), &xdp->rxq->mem, - false, NULL); + if (bpf_xdp_shrink_data(xdp, frag, shrink)) { n_frags_free++; } else { skb_frag_size_sub(frag, shrink);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit ad2047cf5d9313200e308612aed516548873d124 ]
Fix an OOM panic in XDP_DRV mode when a XDP program shrinks a multi-buffer packet by 4k bytes and then redirects it to an AF_XDP socket.
Since support for handling multi-buffer frames was added to XDP, usage of bpf_xdp_adjust_tail() helper within XDP program can free the page that given fragment occupies and in turn decrease the fragment count within skb_shared_info that is embedded in xdp_buff struct. In current ice driver codebase, it can become problematic when page recycling logic decides not to reuse the page. In such case, __page_frag_cache_drain() is used with ice_rx_buf::pagecnt_bias that was not adjusted after refcount of page was changed by XDP prog which in turn does not drain the refcount to 0 and page is never freed.
To address this, let us store the count of frags before the XDP program was executed on Rx ring struct. This will be used to compare with current frag count from skb_shared_info embedded in xdp_buff. A smaller value in the latter indicates that XDP prog freed frag(s). Then, for given delta decrement pagecnt_bias for XDP_DROP verdict.
While at it, let us also handle the EOP frag within ice_set_rx_bufs_act() to make our life easier, so all of the adjustments needed to be applied against freed frags are performed in the single place.
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Acked-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-5-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 14 ++++++--- drivers/net/ethernet/intel/ice/ice_txrx.h | 1 + drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 31 +++++++++++++------ 3 files changed, 32 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 52d0a126eb61..5b0f9e53f6b4 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -600,9 +600,7 @@ ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, ret = ICE_XDP_CONSUMED; } exit: - rx_buf->act = ret; - if (unlikely(xdp_buff_has_frags(xdp))) - ice_set_rx_bufs_act(xdp, rx_ring, ret); + ice_set_rx_bufs_act(xdp, rx_ring, ret); }
/** @@ -890,14 +888,17 @@ ice_add_xdp_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp, }
if (unlikely(sinfo->nr_frags == MAX_SKB_FRAGS)) { - if (unlikely(xdp_buff_has_frags(xdp))) - ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED); + ice_set_rx_bufs_act(xdp, rx_ring, ICE_XDP_CONSUMED); return -ENOMEM; }
__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, rx_buf->page, rx_buf->page_offset, size); sinfo->xdp_frags_size += size; + /* remember frag count before XDP prog execution; bpf_xdp_adjust_tail() + * can pop off frags but driver has to handle it on its own + */ + rx_ring->nr_frags = sinfo->nr_frags;
if (page_is_pfmemalloc(rx_buf->page)) xdp_buff_set_frag_pfmemalloc(xdp); @@ -1249,6 +1250,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
xdp->data = NULL; rx_ring->first_desc = ntc; + rx_ring->nr_frags = 0; continue; construct_skb: if (likely(ice_ring_uses_build_skb(rx_ring))) @@ -1264,10 +1266,12 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget) ICE_XDP_CONSUMED); xdp->data = NULL; rx_ring->first_desc = ntc; + rx_ring->nr_frags = 0; break; } xdp->data = NULL; rx_ring->first_desc = ntc; + rx_ring->nr_frags = 0;
stat_err_bits = BIT(ICE_RX_FLEX_DESC_STATUS0_RXE_S); if (unlikely(ice_test_staterr(rx_desc->wb.status_error0, diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h index 166413fc33f4..407d4c320097 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx.h @@ -333,6 +333,7 @@ struct ice_rx_ring { struct ice_channel *ch; struct ice_tx_ring *xdp_ring; struct xsk_buff_pool *xsk_pool; + u32 nr_frags; dma_addr_t dma; /* physical address of ring */ u64 cached_phctime; u16 rx_buf_len; diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h index 115969ecdf7b..b0e56675f98b 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h @@ -12,26 +12,39 @@ * act: action to store onto Rx buffers related to XDP buffer parts * * Set action that should be taken before putting Rx buffer from first frag - * to one before last. Last one is handled by caller of this function as it - * is the EOP frag that is currently being processed. This function is - * supposed to be called only when XDP buffer contains frags. + * to the last. */ static inline void ice_set_rx_bufs_act(struct xdp_buff *xdp, const struct ice_rx_ring *rx_ring, const unsigned int act) { - const struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp); - u32 first = rx_ring->first_desc; - u32 nr_frags = sinfo->nr_frags; + u32 sinfo_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags; + u32 nr_frags = rx_ring->nr_frags + 1; + u32 idx = rx_ring->first_desc; u32 cnt = rx_ring->count; struct ice_rx_buf *buf;
for (int i = 0; i < nr_frags; i++) { - buf = &rx_ring->rx_buf[first]; + buf = &rx_ring->rx_buf[idx]; buf->act = act;
- if (++first == cnt) - first = 0; + if (++idx == cnt) + idx = 0; + } + + /* adjust pagecnt_bias on frags freed by XDP prog */ + if (sinfo_frags < rx_ring->nr_frags && act == ICE_XDP_CONSUMED) { + u32 delta = rx_ring->nr_frags - sinfo_frags; + + while (delta) { + if (idx == 0) + idx = cnt - 1; + else + idx--; + buf = &rx_ring->rx_buf[idx]; + buf->pagecnt_bias--; + delta--; + } } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tirthendu Sarkar tirthendu.sarkar@intel.com
[ Upstream commit 83014323c642b8faa2d64a5f303b41c019322478 ]
XDP programs can shrink packets by calling the bpf_xdp_adjust_tail() helper function. For multi-buffer packets this may lead to reduction of frag count stored in skb_shared_info area of the xdp_buff struct. This results in issues with the current handling of XDP_PASS and XDP_DROP cases.
For XDP_PASS, currently skb is being built using frag count of xdp_buffer before it was processed by XDP prog and thus will result in an inconsistent skb when frag count gets reduced by XDP prog. To fix this, get correct frag count while building the skb instead of using pre-obtained frag count.
For XDP_DROP, current page recycling logic will not reuse the page but instead will adjust the pagecnt_bias so that the page can be freed. This again results in inconsistent behavior as the page refcnt has already been changed by the helper while freeing the frag(s) as part of shrinking the packet. To fix this, only adjust pagecnt_bias for buffers that are stillpart of the packet post-xdp prog run.
Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Reported-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Signed-off-by: Tirthendu Sarkar tirthendu.sarkar@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-6-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 40 ++++++++++++--------- 1 file changed, 23 insertions(+), 17 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index b047c587629b..2e5546e549d9 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -2100,7 +2100,8 @@ static void i40e_put_rx_buffer(struct i40e_ring *rx_ring, static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, struct xdp_buff *xdp) { - u32 next = rx_ring->next_to_clean; + u32 nr_frags = xdp_get_shared_info_from_buff(xdp)->nr_frags; + u32 next = rx_ring->next_to_clean, i = 0; struct i40e_rx_buffer *rx_buffer;
xdp->flags = 0; @@ -2113,10 +2114,10 @@ static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, if (!rx_buffer->page) continue;
- if (xdp_res == I40E_XDP_CONSUMED) - rx_buffer->pagecnt_bias++; - else + if (xdp_res != I40E_XDP_CONSUMED) i40e_rx_buffer_flip(rx_buffer, xdp->frame_sz); + else if (i++ <= nr_frags) + rx_buffer->pagecnt_bias++;
/* EOP buffer will be put in i40e_clean_rx_irq() */ if (next == rx_ring->next_to_process) @@ -2130,20 +2131,20 @@ static void i40e_process_rx_buffs(struct i40e_ring *rx_ring, int xdp_res, * i40e_construct_skb - Allocate skb and populate it * @rx_ring: rx descriptor ring to transact packets on * @xdp: xdp_buff pointing to the data - * @nr_frags: number of buffers for the packet * * This function allocates an skb. It then populates it with the page * data from the current receive descriptor, taking care to set up the * skb correctly. */ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, - struct xdp_buff *xdp, - u32 nr_frags) + struct xdp_buff *xdp) { unsigned int size = xdp->data_end - xdp->data; struct i40e_rx_buffer *rx_buffer; + struct skb_shared_info *sinfo; unsigned int headlen; struct sk_buff *skb; + u32 nr_frags = 0;
/* prefetch first cache line of first page */ net_prefetch(xdp->data); @@ -2181,6 +2182,10 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, memcpy(__skb_put(skb, headlen), xdp->data, ALIGN(headlen, sizeof(long)));
+ if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } rx_buffer = i40e_rx_bi(rx_ring, rx_ring->next_to_clean); /* update all of the pointers */ size -= headlen; @@ -2200,9 +2205,8 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, }
if (unlikely(xdp_buff_has_frags(xdp))) { - struct skb_shared_info *sinfo, *skinfo = skb_shinfo(skb); + struct skb_shared_info *skinfo = skb_shinfo(skb);
- sinfo = xdp_get_shared_info_from_buff(xdp); memcpy(&skinfo->frags[skinfo->nr_frags], &sinfo->frags[0], sizeof(skb_frag_t) * nr_frags);
@@ -2225,17 +2229,17 @@ static struct sk_buff *i40e_construct_skb(struct i40e_ring *rx_ring, * i40e_build_skb - Build skb around an existing buffer * @rx_ring: Rx descriptor ring to transact packets on * @xdp: xdp_buff pointing to the data - * @nr_frags: number of buffers for the packet * * This function builds an skb around an existing Rx buffer, taking care * to set up the skb correctly and avoid any memcpy overhead. */ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, - struct xdp_buff *xdp, - u32 nr_frags) + struct xdp_buff *xdp) { unsigned int metasize = xdp->data - xdp->data_meta; + struct skb_shared_info *sinfo; struct sk_buff *skb; + u32 nr_frags;
/* Prefetch first cache line of first page. If xdp->data_meta * is unused, this points exactly as xdp->data, otherwise we @@ -2244,6 +2248,11 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, */ net_prefetch(xdp->data_meta);
+ if (unlikely(xdp_buff_has_frags(xdp))) { + sinfo = xdp_get_shared_info_from_buff(xdp); + nr_frags = sinfo->nr_frags; + } + /* build an skb around the page buffer */ skb = napi_build_skb(xdp->data_hard_start, xdp->frame_sz); if (unlikely(!skb)) @@ -2256,9 +2265,6 @@ static struct sk_buff *i40e_build_skb(struct i40e_ring *rx_ring, skb_metadata_set(skb, metasize);
if (unlikely(xdp_buff_has_frags(xdp))) { - struct skb_shared_info *sinfo; - - sinfo = xdp_get_shared_info_from_buff(xdp); xdp_update_skb_shared_info(skb, nr_frags, sinfo->xdp_frags_size, nr_frags * xdp->frame_sz, @@ -2603,9 +2609,9 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget, total_rx_bytes += size; } else { if (ring_uses_build_skb(rx_ring)) - skb = i40e_build_skb(rx_ring, xdp, nfrags); + skb = i40e_build_skb(rx_ring, xdp); else - skb = i40e_construct_skb(rx_ring, xdp, nfrags); + skb = i40e_construct_skb(rx_ring, xdp);
/* drop if we failed to retrieve a buffer */ if (!skb) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 2ee788c06493d02ee85855414cca39825e768aaf ]
xdp_rxq_info struct can be registered by drivers via two functions - xdp_rxq_info_reg() and __xdp_rxq_info_reg(). The latter one allows drivers that support XDP multi-buffer to set up xdp_rxq_info::frag_size which in turn will make it possible to grow the packet via bpf_xdp_adjust_tail() BPF helper.
Currently, ice registers xdp_rxq_info in two spots: 1) ice_setup_rx_ring() // via xdp_rxq_info_reg(), BUG 2) ice_vsi_cfg_rxq() // via __xdp_rxq_info_reg(), OK
Cited commit under fixes tag took care of setting up frag_size and updated registration scheme in 2) but it did not help as 1) is called before 2) and as shown above it uses old registration function. This means that 2) sees that xdp_rxq_info is already registered and never calls __xdp_rxq_info_reg() which leaves us with xdp_rxq_info::frag_size being set to 0.
To fix this misbehavior, simply remove xdp_rxq_info_reg() call from ice_setup_rx_ring().
Fixes: 2fba7dc5157b ("ice: Add support for XDP multi-buffer on Rx side") Acked-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-7-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_txrx.c | 5 ----- 1 file changed, 5 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c index 5b0f9e53f6b4..24c914015973 100644 --- a/drivers/net/ethernet/intel/ice/ice_txrx.c +++ b/drivers/net/ethernet/intel/ice/ice_txrx.c @@ -513,11 +513,6 @@ int ice_setup_rx_ring(struct ice_rx_ring *rx_ring) if (ice_is_xdp_ena_vsi(rx_ring->vsi)) WRITE_ONCE(rx_ring->xdp_prog, rx_ring->vsi->xdp_prog);
- if (rx_ring->vsi->type == ICE_VSI_PF && - !xdp_rxq_info_is_reg(&rx_ring->xdp_rxq)) - if (xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, - rx_ring->q_index, rx_ring->q_vector->napi.napi_id)) - goto err; return 0;
err:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 290779905d09d5fdf6caa4f58ddefc3f4db0c0a9 ]
Ice and i40e ZC drivers currently set offset of a frag within skb_shared_info to 0, which is incorrect. xdp_buffs that come from xsk_buff_pool always have 256 bytes of a headroom, so they need to be taken into account to retrieve xdp_buff::data via skb_frag_address(). Otherwise, bpf_xdp_frags_increase_tail() would be starting its job from xdp_buff::data_hard_start which would result in overwriting existing payload.
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-8-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_xsk.c | 3 ++- drivers/net/ethernet/intel/ice/ice_xsk.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_xsk.c b/drivers/net/ethernet/intel/i40e/i40e_xsk.c index b75e6b6d317c..1f8ae6f5d980 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_xsk.c +++ b/drivers/net/ethernet/intel/i40e/i40e_xsk.c @@ -418,7 +418,8 @@ i40e_add_xsk_frag(struct i40e_ring *rx_ring, struct xdp_buff *first, }
__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, - virt_to_page(xdp->data_hard_start), 0, size); + virt_to_page(xdp->data_hard_start), + XDP_PACKET_HEADROOM, size); sinfo->xdp_frags_size += size; xsk_buff_add_frag(xdp);
diff --git a/drivers/net/ethernet/intel/ice/ice_xsk.c b/drivers/net/ethernet/intel/ice/ice_xsk.c index 33f194c870bb..307c609137bd 100644 --- a/drivers/net/ethernet/intel/ice/ice_xsk.c +++ b/drivers/net/ethernet/intel/ice/ice_xsk.c @@ -826,7 +826,8 @@ ice_add_xsk_frag(struct ice_rx_ring *rx_ring, struct xdp_buff *first, }
__skb_fill_page_desc_noacc(sinfo, sinfo->nr_frags++, - virt_to_page(xdp->data_hard_start), 0, size); + virt_to_page(xdp->data_hard_start), + XDP_PACKET_HEADROOM, size); sinfo->xdp_frags_size += size; xsk_buff_add_frag(xdp);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 3de38c87174225487fc93befeea7d380db80aef6 ]
Now that ice driver correctly sets up frag_size in xdp_rxq_info, let us make it work for ZC multi-buffer as well. ice_rx_ring::rx_buf_len for ZC is being set via xsk_pool_get_rx_frame_size() and this needs to be propagated up to xdp_rxq_info.
Use a bigger hammer and instead of unregistering only xdp_rxq_info's memory model, unregister it altogether and register it again and have xdp_rxq_info with correct frag_size value.
Fixes: 1bbc04de607b ("ice: xsk: add RX multi-buffer support") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-9-maciej.fijalkowski@intel.c... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/ice/ice_base.c | 37 ++++++++++++++--------- 1 file changed, 23 insertions(+), 14 deletions(-)
diff --git a/drivers/net/ethernet/intel/ice/ice_base.c b/drivers/net/ethernet/intel/ice/ice_base.c index 7fa43827a3f0..4f3e65b47cdc 100644 --- a/drivers/net/ethernet/intel/ice/ice_base.c +++ b/drivers/net/ethernet/intel/ice/ice_base.c @@ -534,19 +534,27 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring) ring->rx_buf_len = ring->vsi->rx_buf_len;
if (ring->vsi->type == ICE_VSI_PF) { - if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) - /* coverity[check_return] */ - __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, - ring->q_index, - ring->q_vector->napi.napi_id, - ring->vsi->rx_buf_len); + if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) { + err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, + ring->q_index, + ring->q_vector->napi.napi_id, + ring->rx_buf_len); + if (err) + return err; + }
ring->xsk_pool = ice_xsk_pool(ring); if (ring->xsk_pool) { - xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); + xdp_rxq_info_unreg(&ring->xdp_rxq);
ring->rx_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool); + err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, + ring->q_index, + ring->q_vector->napi.napi_id, + ring->rx_buf_len); + if (err) + return err; err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL); @@ -557,13 +565,14 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring) dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->q_index); } else { - if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) - /* coverity[check_return] */ - __xdp_rxq_info_reg(&ring->xdp_rxq, - ring->netdev, - ring->q_index, - ring->q_vector->napi.napi_id, - ring->vsi->rx_buf_len); + if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) { + err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, + ring->q_index, + ring->q_vector->napi.napi_id, + ring->rx_buf_len); + if (err) + return err; + }
err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_PAGE_SHARED,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit fbadd83a612c3b7aad2987893faca6bd24aaebb3 ]
XSK ZC Rx path calculates the size of data that will be posted to XSK Rx queue via subtracting xdp_buff::data_end from xdp_buff::data.
In bpf_xdp_frags_increase_tail(), when underlying memory type of xdp_rxq_info is MEM_TYPE_XSK_BUFF_POOL, add offset to data_end in tail fragment, so that later on user space will be able to take into account the amount of bytes added by XDP program.
Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-10-maciej.fijalkowski@intel.... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/filter.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/net/core/filter.c b/net/core/filter.c index 46ee0f5433e3..01f2417deef2 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -4081,6 +4081,8 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset) memset(skb_frag_address(frag) + skb_frag_size(frag), 0, offset); skb_frag_size_add(frag, offset); sinfo->xdp_frags_size += offset; + if (rxq->mem.type == MEM_TYPE_XSK_BUFF_POOL) + xsk_buff_get_tail(xdp)->data_end += offset;
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit a045d2f2d03d23e7db6772dd83e0ba2705dfad93 ]
i40e support XDP multi-buffer so it is supposed to use __xdp_rxq_info_reg() instead of xdp_rxq_info_reg() and set the frag_size. It can not be simply converted at existing callsite because rx_buf_len could be un-initialized, so let us register xdp_rxq_info within i40e_configure_rx_ring(), which happen to be called with already initialized rx_buf_len value.
Commit 5180ff1364bc ("i40e: use int for i40e_status") converted 'err' to int, so two variables to deal with return codes are not needed within i40e_configure_rx_ring(). Remove 'ret' and use 'err' to handle status from xdp_rxq_info registration.
Fixes: e213ced19bef ("i40e: add support for XDP multi-buffer Rx") Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-11-maciej.fijalkowski@intel.... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 40 ++++++++++++--------- drivers/net/ethernet/intel/i40e/i40e_txrx.c | 9 ----- 2 files changed, 24 insertions(+), 25 deletions(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index 5b20eba93d04..aadca7b3443c 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3578,40 +3578,48 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring) struct i40e_hmc_obj_rxq rx_ctx; int err = 0; bool ok; - int ret;
bitmap_zero(ring->state, __I40E_RING_STATE_NBITS);
/* clear the context structure first */ memset(&rx_ctx, 0, sizeof(rx_ctx));
- if (ring->vsi->type == I40E_VSI_MAIN) - xdp_rxq_info_unreg_mem_model(&ring->xdp_rxq); + ring->rx_buf_len = vsi->rx_buf_len; + + /* XDP RX-queue info only needed for RX rings exposed to XDP */ + if (ring->vsi->type != I40E_VSI_MAIN) + goto skip; + + if (!xdp_rxq_info_is_reg(&ring->xdp_rxq)) { + err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, + ring->queue_index, + ring->q_vector->napi.napi_id, + ring->rx_buf_len); + if (err) + return err; + }
ring->xsk_pool = i40e_xsk_pool(ring); if (ring->xsk_pool) { - ring->rx_buf_len = - xsk_pool_get_rx_frame_size(ring->xsk_pool); - ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, + ring->rx_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool); + err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL); - if (ret) - return ret; + if (err) + return err; dev_info(&vsi->back->pdev->dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n", ring->queue_index);
} else { - ring->rx_buf_len = vsi->rx_buf_len; - if (ring->vsi->type == I40E_VSI_MAIN) { - ret = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, - MEM_TYPE_PAGE_SHARED, - NULL); - if (ret) - return ret; - } + err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, + MEM_TYPE_PAGE_SHARED, + NULL); + if (err) + return err; }
+skip: xdp_init_buff(&ring->xdp, i40e_rx_pg_size(ring) / 2, &ring->xdp_rxq);
rx_ctx.dbuff = DIV_ROUND_UP(ring->rx_buf_len, diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c index 2e5546e549d9..1df2f9338812 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c +++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c @@ -1556,7 +1556,6 @@ void i40e_free_rx_resources(struct i40e_ring *rx_ring) int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring) { struct device *dev = rx_ring->dev; - int err;
u64_stats_init(&rx_ring->syncp);
@@ -1577,14 +1576,6 @@ int i40e_setup_rx_descriptors(struct i40e_ring *rx_ring) rx_ring->next_to_process = 0; rx_ring->next_to_use = 0;
- /* XDP RX-queue info only needed for RX rings exposed to XDP */ - if (rx_ring->vsi->type == I40E_VSI_MAIN) { - err = xdp_rxq_info_reg(&rx_ring->xdp_rxq, rx_ring->netdev, - rx_ring->queue_index, rx_ring->q_vector->napi.napi_id); - if (err < 0) - return err; - } - rx_ring->xdp_prog = rx_ring->vsi->xdp_prog;
rx_ring->rx_bi =
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Maciej Fijalkowski maciej.fijalkowski@intel.com
[ Upstream commit 0cbb08707c932b3f004bc1a8ec6200ef572c1f5f ]
Now that i40e driver correctly sets up frag_size in xdp_rxq_info, let us make it work for ZC multi-buffer as well. i40e_ring::rx_buf_len for ZC is being set via xsk_pool_get_rx_frame_size() and this needs to be propagated up to xdp_rxq_info.
Fixes: 1c9ba9c14658 ("i40e: xsk: add RX multi-buffer support") Acked-by: Magnus Karlsson magnus.karlsson@intel.com Signed-off-by: Maciej Fijalkowski maciej.fijalkowski@intel.com Link: https://lore.kernel.org/r/20240124191602.566724-12-maciej.fijalkowski@intel.... Signed-off-by: Alexei Starovoitov ast@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/i40e/i40e_main.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c index aadca7b3443c..aad39ebff4ab 100644 --- a/drivers/net/ethernet/intel/i40e/i40e_main.c +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c @@ -3601,7 +3601,14 @@ static int i40e_configure_rx_ring(struct i40e_ring *ring)
ring->xsk_pool = i40e_xsk_pool(ring); if (ring->xsk_pool) { + xdp_rxq_info_unreg(&ring->xdp_rxq); ring->rx_buf_len = xsk_pool_get_rx_frame_size(ring->xsk_pool); + err = __xdp_rxq_info_reg(&ring->xdp_rxq, ring->netdev, + ring->queue_index, + ring->q_vector->napi.napi_id, + ring->rx_buf_len); + if (err) + return err; err = xdp_rxq_info_reg_mem_model(&ring->xdp_rxq, MEM_TYPE_XSK_BUFF_POOL, NULL);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zhipeng Lu alexious@zju.edu.cn
[ Upstream commit f6cc4b6a3ae53df425771000e9c9540cce9b7bb1 ]
In fjes_hw_setup, it allocates several memory and delay the deallocation to the fjes_hw_exit in fjes_probe through the following call chain:
fjes_probe |-> fjes_hw_init |-> fjes_hw_setup |-> fjes_hw_exit
However, when fjes_hw_setup fails, fjes_hw_exit won't be called and thus all the resources allocated in fjes_hw_setup will be leaked. In this patch, we free those resources in fjes_hw_setup and prevents such leaks.
Fixes: 2fcbca687702 ("fjes: platform_driver's .probe and .remove routine") Signed-off-by: Zhipeng Lu alexious@zju.edu.cn Reviewed-by: Simon Horman horms@kernel.org Link: https://lore.kernel.org/r/20240122172445.3841883-1-alexious@zju.edu.cn Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/fjes/fjes_hw.c | 37 ++++++++++++++++++++++++++++++------- 1 file changed, 30 insertions(+), 7 deletions(-)
diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c index 704e949484d0..b9b5554ea862 100644 --- a/drivers/net/fjes/fjes_hw.c +++ b/drivers/net/fjes/fjes_hw.c @@ -221,21 +221,25 @@ static int fjes_hw_setup(struct fjes_hw *hw)
mem_size = FJES_DEV_REQ_BUF_SIZE(hw->max_epid); hw->hw_info.req_buf = kzalloc(mem_size, GFP_KERNEL); - if (!(hw->hw_info.req_buf)) - return -ENOMEM; + if (!(hw->hw_info.req_buf)) { + result = -ENOMEM; + goto free_ep_info; + }
hw->hw_info.req_buf_size = mem_size;
mem_size = FJES_DEV_RES_BUF_SIZE(hw->max_epid); hw->hw_info.res_buf = kzalloc(mem_size, GFP_KERNEL); - if (!(hw->hw_info.res_buf)) - return -ENOMEM; + if (!(hw->hw_info.res_buf)) { + result = -ENOMEM; + goto free_req_buf; + }
hw->hw_info.res_buf_size = mem_size;
result = fjes_hw_alloc_shared_status_region(hw); if (result) - return result; + goto free_res_buf;
hw->hw_info.buffer_share_bit = 0; hw->hw_info.buffer_unshare_reserve_bit = 0; @@ -246,11 +250,11 @@ static int fjes_hw_setup(struct fjes_hw *hw)
result = fjes_hw_alloc_epbuf(&buf_pair->tx); if (result) - return result; + goto free_epbuf;
result = fjes_hw_alloc_epbuf(&buf_pair->rx); if (result) - return result; + goto free_epbuf;
spin_lock_irqsave(&hw->rx_status_lock, flags); fjes_hw_setup_epbuf(&buf_pair->tx, mac, @@ -273,6 +277,25 @@ static int fjes_hw_setup(struct fjes_hw *hw) fjes_hw_init_command_registers(hw, ¶m);
return 0; + +free_epbuf: + for (epidx = 0; epidx < hw->max_epid ; epidx++) { + if (epidx == hw->my_epid) + continue; + fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].tx); + fjes_hw_free_epbuf(&hw->ep_shm_info[epidx].rx); + } + fjes_hw_free_shared_status_region(hw); +free_res_buf: + kfree(hw->hw_info.res_buf); + hw->hw_info.res_buf = NULL; +free_req_buf: + kfree(hw->hw_info.req_buf); + hw->hw_info.req_buf = NULL; +free_ep_info: + kfree(hw->ep_shm_info); + hw->ep_shm_info = NULL; + return result; }
static void fjes_hw_cleanup(struct fjes_hw *hw)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hangbin Liu liuhangbin@gmail.com
[ Upstream commit a2933a8759a62269754e54733d993b19de870e84 ]
The prio_arp/ns tests hard code the mode to active-backup. At the same time, The balance-alb/tlb modes do not support arp/ns target. So remove the prio_arp/ns tests from the loop and only test active-backup mode.
Fixes: 481b56e0391e ("selftests: bonding: re-format bond option tests") Reported-by: Jay Vosburgh jay.vosburgh@canonical.com Closes: https://lore.kernel.org/netdev/17415.1705965957@famine/ Signed-off-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Jay Vosburgh jay.vosburgh@canonical.com Link: https://lore.kernel.org/r/20240123075917.1576360-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../testing/selftests/drivers/net/bonding/bond_options.sh | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh index c54d1697f439..d508486cc0bd 100755 --- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh +++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh @@ -162,7 +162,7 @@ prio_arp() local mode=$1
for primary_reselect in 0 1 2; do - prio_test "mode active-backup arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect" + prio_test "mode $mode arp_interval 100 arp_ip_target ${g_ip4} primary eth1 primary_reselect $primary_reselect" log_test "prio" "$mode arp_ip_target primary_reselect $primary_reselect" done } @@ -178,7 +178,7 @@ prio_ns() fi
for primary_reselect in 0 1 2; do - prio_test "mode active-backup arp_interval 100 ns_ip6_target ${g_ip6} primary eth1 primary_reselect $primary_reselect" + prio_test "mode $mode arp_interval 100 ns_ip6_target ${g_ip6} primary eth1 primary_reselect $primary_reselect" log_test "prio" "$mode ns_ip6_target primary_reselect $primary_reselect" done } @@ -194,9 +194,9 @@ prio()
for mode in $modes; do prio_miimon $mode - prio_arp $mode - prio_ns $mode done + prio_arp "active-backup" + prio_ns "active-backup" }
arp_validate_test()
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shenwei Wang shenwei.wang@nxp.com
[ Upstream commit 5e344807735023cd3a67c37a1852b849caa42620 ]
When repeatedly changing the interface link speed using the command below:
ethtool -s eth0 speed 100 duplex full ethtool -s eth0 speed 1000 duplex full
The following errors may sometimes be reported by the ARM SMMU driver:
[ 5395.035364] fec 5b040000.ethernet eth0: Link is Down [ 5395.039255] arm-smmu 51400000.iommu: Unhandled context fault: fsr=0x402, iova=0x00000000, fsynr=0x100001, cbfrsynra=0x852, cb=2 [ 5398.108460] fec 5b040000.ethernet eth0: Link is Up - 100Mbps/Full - flow control off
It is identified that the FEC driver does not properly stop the TX queue during the link speed transitions, and this results in the invalid virtual I/O address translations from the SMMU and causes the context faults.
Fixes: dbc64a8ea231 ("net: fec: move calls to quiesce/resume packet processing out of fec_restart()") Signed-off-by: Shenwei Wang shenwei.wang@nxp.com Link: https://lore.kernel.org/r/20240123165141.2008104-1-shenwei.wang@nxp.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/freescale/fec_main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c index 35c95f07fd6d..54da59286df4 100644 --- a/drivers/net/ethernet/freescale/fec_main.c +++ b/drivers/net/ethernet/freescale/fec_main.c @@ -2011,6 +2011,7 @@ static void fec_enet_adjust_link(struct net_device *ndev)
/* if any of the above changed restart the FEC */ if (status_change) { + netif_stop_queue(ndev); napi_disable(&fep->napi); netif_tx_lock_bh(ndev); fec_restart(ndev); @@ -2020,6 +2021,7 @@ static void fec_enet_adjust_link(struct net_device *ndev) } } else { if (fep->link) { + netif_stop_queue(ndev); napi_disable(&fep->napi); netif_tx_lock_bh(ndev); fec_stop(ndev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerhard Engleder gerhard@engleder-embedded.com
[ Upstream commit 50bad6f797d4d501c5ef416a6f92e1912ab5aa8b ]
The RX data buffer includes the FCS. The FCS is already stripped for the normal data path. But for the XDP data path the FCS is included and acts like additional/useless data.
Remove the FCS from the RX data buffer also for XDP.
Fixes: 65b28c810035 ("tsnep: Add XDP RX support") Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/engleder/tsnep_main.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c index 38da2d6c250e..9fea97671f4b 100644 --- a/drivers/net/ethernet/engleder/tsnep_main.c +++ b/drivers/net/ethernet/engleder/tsnep_main.c @@ -1434,7 +1434,7 @@ static int tsnep_rx_poll(struct tsnep_rx *rx, struct napi_struct *napi,
xdp_prepare_buff(&xdp, page_address(entry->page), XDP_PACKET_HEADROOM + TSNEP_RX_INLINE_METADATA_SIZE, - length, false); + length - ETH_FCS_LEN, false);
consume = tsnep_xdp_run_prog(rx, prog, &xdp, &xdp_status, tx_nq, tx); @@ -1517,7 +1517,7 @@ static int tsnep_rx_poll_zc(struct tsnep_rx *rx, struct napi_struct *napi, prefetch(entry->xdp->data); length = __le32_to_cpu(entry->desc_wb->properties) & TSNEP_DESC_LENGTH_MASK; - xsk_buff_set_size(entry->xdp, length); + xsk_buff_set_size(entry->xdp, length - ETH_FCS_LEN); xsk_buff_dma_sync_for_cpu(entry->xdp, rx->xsk_pool);
/* RX metadata with timestamps is in front of actual data,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Gerhard Engleder gerhard@engleder-embedded.com
[ Upstream commit 9a91c05f4bd6f6bdd6b8f90445e0da92e3ac956c ]
The fill ring of the XDP socket may contain not enough buffers to completey fill the RX queue during socket creation. In this case the flag XDP_RING_NEED_WAKEUP is not set as this flag is only set if the RX queue is not completely filled during polling.
Set XDP_RING_NEED_WAKEUP flag also if RX queue is not completely filled during XDP socket creation.
Fixes: 3fc2333933fd ("tsnep: Add XDP socket zero-copy RX support") Signed-off-by: Gerhard Engleder gerhard@engleder-embedded.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/engleder/tsnep_main.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/drivers/net/ethernet/engleder/tsnep_main.c b/drivers/net/ethernet/engleder/tsnep_main.c index 9fea97671f4b..08e113e785a7 100644 --- a/drivers/net/ethernet/engleder/tsnep_main.c +++ b/drivers/net/ethernet/engleder/tsnep_main.c @@ -1711,6 +1711,19 @@ static void tsnep_rx_reopen_xsk(struct tsnep_rx *rx) allocated--; } } + + /* set need wakeup flag immediately if ring is not filled completely, + * first polling would be too late as need wakeup signalisation would + * be delayed for an indefinite time + */ + if (xsk_uses_need_wakeup(rx->xsk_pool)) { + int desc_available = tsnep_rx_desc_available(rx); + + if (desc_available) + xsk_set_rx_need_wakeup(rx->xsk_pool); + else + xsk_clear_rx_need_wakeup(rx->xsk_pool); + } }
static bool tsnep_pending(struct tsnep_queue *queue)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit f546c4282673497a06ecb6190b50ae7f6c85b02f upstream.
[BUG] There is a bug report that, on a ext4-converted btrfs, scrub leads to various problems, including:
- "unable to find chunk map" errors BTRFS info (device vdb): scrub: started on devid 1 BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 4096 BTRFS critical (device vdb): unable to find chunk map for logical 2214744064 length 45056
This would lead to unrepariable errors.
- Use-after-free KASAN reports: ================================================================== BUG: KASAN: slab-use-after-free in __blk_rq_map_sg+0x18f/0x7c0 Read of size 8 at addr ffff8881013c9040 by task btrfs/909 CPU: 0 PID: 909 Comm: btrfs Not tainted 6.7.0-x64v3-dbg #11 c50636e9419a8354555555245df535e380563b2b Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 2023.11-2 12/24/2023 Call Trace: <TASK> dump_stack_lvl+0x43/0x60 print_report+0xcf/0x640 kasan_report+0xa6/0xd0 __blk_rq_map_sg+0x18f/0x7c0 virtblk_prep_rq.isra.0+0x215/0x6a0 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff] virtio_queue_rqs+0xc4/0x310 [virtio_blk 19a65eeee9ae6fcf02edfad39bb9ddee07dcdaff] blk_mq_flush_plug_list.part.0+0x780/0x860 __blk_flush_plug+0x1ba/0x220 blk_finish_plug+0x3b/0x60 submit_initial_group_read+0x10a/0x290 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] flush_scrub_stripes+0x38e/0x430 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] scrub_stripe+0x82a/0xae0 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] scrub_chunk+0x178/0x200 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] scrub_enumerate_chunks+0x4bc/0xa30 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] btrfs_scrub_dev+0x398/0x810 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] btrfs_ioctl+0x4b9/0x3020 [btrfs e57987a360bed82fe8756dcd3e0de5406ccfe965] __x64_sys_ioctl+0xbd/0x100 do_syscall_64+0x5d/0xe0 entry_SYSCALL_64_after_hwframe+0x63/0x6b RIP: 0033:0x7f47e5e0952b
- Crash, mostly due to above use-after-free
[CAUSE] The converted fs has the following data chunk layout:
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 2214658048) itemoff 16025 itemsize 80 length 86016 owner 2 stripe_len 65536 type DATA|single
For above logical bytenr 2214744064, it's at the chunk end (2214658048 + 86016 = 2214744064).
This means btrfs_submit_bio() would split the bio, and trigger endio function for both of the two halves.
However scrub_submit_initial_read() would only expect the endio function to be called once, not any more. This means the first endio function would already free the bbio::bio, leaving the bvec freed, thus the 2nd endio call would lead to use-after-free.
[FIX] - Make sure scrub_read_endio() only updates bits in its range Since we may read less than 64K at the end of the chunk, we should not touch the bits beyond chunk boundary.
- Make sure scrub_submit_initial_read() only to read the chunk range This is done by calculating the real number of sectors we need to read, and add sector-by-sector to the bio.
Thankfully the scrub read repair path won't need extra fixes:
- scrub_stripe_submit_repair_read() With above fixes, we won't update error bit for range beyond chunk, thus scrub_stripe_submit_repair_read() should never submit any read beyond the chunk.
Reported-by: Rongrong i@rong.moe Fixes: e02ee89baa66 ("btrfs: scrub: switch scrub_simple_mirror() to scrub_stripe infrastructure") Tested-by: Rongrong i@rong.moe Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com [ Use min_t() to fix a compiling error due to difference types ] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/scrub.c | 29 ++++++++++++++++++++++------- 1 file changed, 22 insertions(+), 7 deletions(-)
--- a/fs/btrfs/scrub.c +++ b/fs/btrfs/scrub.c @@ -1099,12 +1099,22 @@ out: static void scrub_read_endio(struct btrfs_bio *bbio) { struct scrub_stripe *stripe = bbio->private; + struct bio_vec *bvec; + int sector_nr = calc_sector_number(stripe, bio_first_bvec_all(&bbio->bio)); + int num_sectors; + u32 bio_size = 0; + int i; + + ASSERT(sector_nr < stripe->nr_sectors); + bio_for_each_bvec_all(bvec, &bbio->bio, i) + bio_size += bvec->bv_len; + num_sectors = bio_size >> stripe->bg->fs_info->sectorsize_bits;
if (bbio->bio.bi_status) { - bitmap_set(&stripe->io_error_bitmap, 0, stripe->nr_sectors); - bitmap_set(&stripe->error_bitmap, 0, stripe->nr_sectors); + bitmap_set(&stripe->io_error_bitmap, sector_nr, num_sectors); + bitmap_set(&stripe->error_bitmap, sector_nr, num_sectors); } else { - bitmap_clear(&stripe->io_error_bitmap, 0, stripe->nr_sectors); + bitmap_clear(&stripe->io_error_bitmap, sector_nr, num_sectors); } bio_put(&bbio->bio); if (atomic_dec_and_test(&stripe->pending_io)) { @@ -1640,6 +1650,9 @@ static void scrub_submit_initial_read(st { struct btrfs_fs_info *fs_info = sctx->fs_info; struct btrfs_bio *bbio; + unsigned int nr_sectors = min_t(u64, BTRFS_STRIPE_LEN, stripe->bg->start + + stripe->bg->length - stripe->logical) >> + fs_info->sectorsize_bits; int mirror = stripe->mirror_num;
ASSERT(stripe->bg); @@ -1649,14 +1662,16 @@ static void scrub_submit_initial_read(st bbio = btrfs_bio_alloc(SCRUB_STRIPE_PAGES, REQ_OP_READ, fs_info, scrub_read_endio, stripe);
- /* Read the whole stripe. */ bbio->bio.bi_iter.bi_sector = stripe->logical >> SECTOR_SHIFT; - for (int i = 0; i < BTRFS_STRIPE_LEN >> PAGE_SHIFT; i++) { + /* Read the whole range inside the chunk boundary. */ + for (unsigned int cur = 0; cur < nr_sectors; cur++) { + struct page *page = scrub_stripe_get_page(stripe, cur); + unsigned int pgoff = scrub_stripe_get_page_offset(stripe, cur); int ret;
- ret = bio_add_page(&bbio->bio, stripe->pages[i], PAGE_SIZE, 0); + ret = bio_add_page(&bbio->bio, page, fs_info->sectorsize, pgoff); /* We should have allocated enough bio vectors. */ - ASSERT(ret == PAGE_SIZE); + ASSERT(ret == fs_info->sectorsize); } atomic_inc(&stripe->pending_io);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
commit b18f3b60b35a8c01c9a2a0f0d6424c6d73971dc3 upstream.
The btrfs CI reported a lockdep warning as follows by running generic generic/129.
WARNING: possible circular locking dependency detected 6.7.0-rc5+ #1 Not tainted ------------------------------------------------------ kworker/u5:5/793427 is trying to acquire lock: ffff88813256d028 (&cache->lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x5e/0x130 but task is already holding lock: ffff88810a23a318 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}, at: btrfs_zone_finish_one_bg+0x34/0x130 which lock already depends on the new lock.
the existing dependency chain (in reverse order) is: -> #1 (&fs_info->zone_active_bgs_lock){+.+.}-{2:2}: ... -> #0 (&cache->lock){+.+.}-{2:2}: ...
This is because we take fs_info->zone_active_bgs_lock after a block_group's lock in btrfs_zone_activate() while doing the opposite in other places.
Fix the issue by expanding the fs_info->zone_active_bgs_lock's critical section and taking it before a block_group's lock.
Fixes: a7e1ac7bdc5a ("btrfs: zoned: reserve zones for an active metadata/system block group") CC: stable@vger.kernel.org # 6.6 Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/zoned.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-)
--- a/fs/btrfs/zoned.c +++ b/fs/btrfs/zoned.c @@ -1975,6 +1975,7 @@ bool btrfs_zone_activate(struct btrfs_bl
map = block_group->physical_map;
+ spin_lock(&fs_info->zone_active_bgs_lock); spin_lock(&block_group->lock); if (test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags)) { ret = true; @@ -1987,7 +1988,6 @@ bool btrfs_zone_activate(struct btrfs_bl goto out_unlock; }
- spin_lock(&fs_info->zone_active_bgs_lock); for (i = 0; i < map->num_stripes; i++) { struct btrfs_zoned_device_info *zinfo; int reserved = 0; @@ -2007,20 +2007,17 @@ bool btrfs_zone_activate(struct btrfs_bl */ if (atomic_read(&zinfo->active_zones_left) <= reserved) { ret = false; - spin_unlock(&fs_info->zone_active_bgs_lock); goto out_unlock; }
if (!btrfs_dev_set_active_zone(device, physical)) { /* Cannot activate the zone */ ret = false; - spin_unlock(&fs_info->zone_active_bgs_lock); goto out_unlock; } if (!is_data) zinfo->reserved_active_zones--; } - spin_unlock(&fs_info->zone_active_bgs_lock);
/* Successfully activated all the zones */ set_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags); @@ -2028,8 +2025,6 @@ bool btrfs_zone_activate(struct btrfs_bl
/* For the active block group list */ btrfs_get_block_group(block_group); - - spin_lock(&fs_info->zone_active_bgs_lock); list_add_tail(&block_group->active_bg_list, &fs_info->zone_active_bgs); spin_unlock(&fs_info->zone_active_bgs_lock);
@@ -2037,6 +2032,7 @@ bool btrfs_zone_activate(struct btrfs_bl
out_unlock: spin_unlock(&block_group->lock); + spin_unlock(&fs_info->zone_active_bgs_lock); return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Omar Sandoval osandov@fb.com
commit 3324d0547861b16cf436d54abba7052e0c8aa9de upstream.
Sweet Tea spotted a race between subvolume deletion and snapshotting that can result in the root item for the snapshot having the BTRFS_ROOT_SUBVOL_DEAD flag set. The race is:
Thread 1 | Thread 2 ----------------------------------------------|---------- btrfs_delete_subvolume | btrfs_set_root_flags(BTRFS_ROOT_SUBVOL_DEAD)| |btrfs_mksubvol | down_read(subvol_sem) | create_snapshot | ... | create_pending_snapshot | copy root item from source down_write(subvol_sem) |
This flag is only checked in send and swap activate, which this would cause to fail mysteriously.
create_snapshot() now checks the root refs to reject a deleted subvolume, so we can fix this by locking subvol_sem earlier so that the BTRFS_ROOT_SUBVOL_DEAD flag and the root refs are updated atomically.
CC: stable@vger.kernel.org # 4.14+ Reported-by: Sweet Tea Dorminy sweettea-kernel@dorminy.me Reviewed-by: Sweet Tea Dorminy sweettea-kernel@dorminy.me Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Omar Sandoval osandov@fb.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/inode.c | 22 +++++++++++++--------- 1 file changed, 13 insertions(+), 9 deletions(-)
--- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4447,6 +4447,8 @@ int btrfs_delete_subvolume(struct btrfs_ u64 root_flags; int ret;
+ down_write(&fs_info->subvol_sem); + /* * Don't allow to delete a subvolume with send in progress. This is * inside the inode lock so the error handling that has to drop the bit @@ -4458,25 +4460,25 @@ int btrfs_delete_subvolume(struct btrfs_ btrfs_warn(fs_info, "attempt to delete subvolume %llu during send", dest->root_key.objectid); - return -EPERM; + ret = -EPERM; + goto out_up_write; } if (atomic_read(&dest->nr_swapfiles)) { spin_unlock(&dest->root_item_lock); btrfs_warn(fs_info, "attempt to delete subvolume %llu with active swapfile", root->root_key.objectid); - return -EPERM; + ret = -EPERM; + goto out_up_write; } root_flags = btrfs_root_flags(&dest->root_item); btrfs_set_root_flags(&dest->root_item, root_flags | BTRFS_ROOT_SUBVOL_DEAD); spin_unlock(&dest->root_item_lock);
- down_write(&fs_info->subvol_sem); - ret = may_destroy_subvol(dest); if (ret) - goto out_up_write; + goto out_undead;
btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP); /* @@ -4486,7 +4488,7 @@ int btrfs_delete_subvolume(struct btrfs_ */ ret = btrfs_subvolume_reserve_metadata(root, &block_rsv, 5, true); if (ret) - goto out_up_write; + goto out_undead;
trans = btrfs_start_transaction(root, 0); if (IS_ERR(trans)) { @@ -4552,15 +4554,17 @@ out_end_trans: inode->i_flags |= S_DEAD; out_release: btrfs_subvolume_release_metadata(root, &block_rsv); -out_up_write: - up_write(&fs_info->subvol_sem); +out_undead: if (ret) { spin_lock(&dest->root_item_lock); root_flags = btrfs_root_flags(&dest->root_item); btrfs_set_root_flags(&dest->root_item, root_flags & ~BTRFS_ROOT_SUBVOL_DEAD); spin_unlock(&dest->root_item_lock); - } else { + } +out_up_write: + up_write(&fs_info->subvol_sem); + if (!ret) { d_invalidate(dentry); btrfs_prune_dentries(dest); ASSERT(dest->send_in_progress == 0);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
commit f03e274a8b29d1d1c1bbd7f764766cb5ca537ab7 upstream.
As clearing REF_VERIFY mount option indicates there were some errors in a ref-verify process, a ref cache is not relevant anymore and should be freed.
btrfs_free_ref_cache() requires REF_VERIFY option being set so call it just before clearing the mount option.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Reported-by: syzbot+be14ed7728594dc8bd42@syzkaller.appspotmail.com Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool") CC: stable@vger.kernel.org # 5.4+ Closes: https://lore.kernel.org/lkml/000000000000e5a65c05ee832054@google.com/ Reported-by: syzbot+c563a3c79927971f950f@syzkaller.appspotmail.com Closes: https://lore.kernel.org/lkml/0000000000007fe09705fdc6086c@google.com/ Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ref-verify.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/fs/btrfs/ref-verify.c +++ b/fs/btrfs/ref-verify.c @@ -886,8 +886,10 @@ int btrfs_ref_tree_mod(struct btrfs_fs_i out_unlock: spin_unlock(&fs_info->ref_verify_lock); out: - if (ret) + if (ret) { + btrfs_free_ref_cache(fs_info); btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY); + } return ret; }
@@ -1018,8 +1020,8 @@ int btrfs_build_ref_tree(struct btrfs_fs } } if (ret) { - btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY); btrfs_free_ref_cache(fs_info); + btrfs_clear_opt(fs_info->mount_opt, REF_VERIFY); } btrfs_free_path(path); return ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Chung-Chiang Cheng cccheng@synology.com
commit f398e70dd69e6ceea71463a5380e6118f219197e upstream.
The error message should accurately reflect the size rather than the type.
Fixes: f82d1c7ca8ae ("btrfs: tree-checker: Add EXTENT_ITEM and METADATA_ITEM check") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Filipe Manana fdmanana@suse.com Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Chung-Chiang Cheng cccheng@synology.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/tree-checker.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/btrfs/tree-checker.c +++ b/fs/btrfs/tree-checker.c @@ -1417,7 +1417,7 @@ static int check_extent_item(struct exte if (unlikely(ptr + btrfs_extent_inline_ref_size(inline_type) > end)) { extent_err(leaf, slot, "inline ref item overflows extent item, ptr %lu iref size %u end %lu", - ptr, inline_type, end); + ptr, btrfs_extent_inline_ref_size(inline_type), end); return -EUCLEAN; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Sterba dsterba@suse.com
commit a208b3f132b48e1f94f620024e66fea635925877 upstream.
There's a warning in btrfs_issue_discard() when the range is not aligned to 512 bytes, originally added in 4d89d377bbb0 ("btrfs: btrfs_issue_discard ensure offset/length are aligned to sector boundaries"). We can't do sub-sector writes anyway so the adjustment is the only thing that we can do and the warning is unnecessary.
CC: stable@vger.kernel.org # 4.19+ Reported-by: syzbot+4a4f1eba14eb5c3417d1@syzkaller.appspotmail.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/extent-tree.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -1245,7 +1245,8 @@ static int btrfs_issue_discard(struct bl u64 bytes_left, end; u64 aligned_start = ALIGN(start, 1 << SECTOR_SHIFT);
- if (WARN_ON(start != aligned_start)) { + /* Adjust the range to be aligned to 512B sectors if necessary. */ + if (start != aligned_start) { len -= aligned_start - start; len = round_down(len, 1 << SECTOR_SHIFT); start = aligned_start;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Qu Wenruo wqu@suse.com
commit 173431b274a9a54fc10b273b46e67f46bcf62d2e upstream.
Add extra sanity check for btrfs_ioctl_defrag_range_args::flags.
This is not really to enhance fuzzing tests, but as a preparation for future expansion on btrfs_ioctl_defrag_range_args.
In the future we're going to add new members, allowing more fine tuning for btrfs defrag. Without the -ENONOTSUPP error, there would be no way to detect if the kernel supports those new defrag features.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana fdmanana@suse.com Signed-off-by: Qu Wenruo wqu@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 4 ++++ include/uapi/linux/btrfs.h | 3 +++ 2 files changed, 7 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -2608,6 +2608,10 @@ static int btrfs_ioctl_defrag(struct fil ret = -EFAULT; goto out; } + if (range.flags & ~BTRFS_DEFRAG_RANGE_FLAGS_SUPP) { + ret = -EOPNOTSUPP; + goto out; + } /* compression requires us to start the IO */ if ((range.flags & BTRFS_DEFRAG_RANGE_COMPRESS)) { range.flags |= BTRFS_DEFRAG_RANGE_START_IO; --- a/include/uapi/linux/btrfs.h +++ b/include/uapi/linux/btrfs.h @@ -612,6 +612,9 @@ struct btrfs_ioctl_clone_range_args { */ #define BTRFS_DEFRAG_RANGE_COMPRESS 1 #define BTRFS_DEFRAG_RANGE_START_IO 2 +#define BTRFS_DEFRAG_RANGE_FLAGS_SUPP (BTRFS_DEFRAG_RANGE_COMPRESS | \ + BTRFS_DEFRAG_RANGE_START_IO) + struct btrfs_ioctl_defrag_range_args { /* start of the defrag operation */ __u64 start;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Omar Sandoval osandov@fb.com
commit 7081929ab2572920e94d70be3d332e5c9f97095a upstream.
If the source file descriptor to the snapshot ioctl refers to a deleted subvolume, we get the following abort:
BTRFS: Transaction aborted (error -2) WARNING: CPU: 0 PID: 833 at fs/btrfs/transaction.c:1875 create_pending_snapshot+0x1040/0x1190 [btrfs] Modules linked in: pata_acpi btrfs ata_piix libata scsi_mod virtio_net blake2b_generic xor net_failover virtio_rng failover scsi_common rng_core raid6_pq libcrc32c CPU: 0 PID: 833 Comm: t_snapshot_dele Not tainted 6.7.0-rc6 #2 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014 RIP: 0010:create_pending_snapshot+0x1040/0x1190 [btrfs] RSP: 0018:ffffa09c01337af8 EFLAGS: 00010282 RAX: 0000000000000000 RBX: ffff9982053e7c78 RCX: 0000000000000027 RDX: ffff99827dc20848 RSI: 0000000000000001 RDI: ffff99827dc20840 RBP: ffffa09c01337c00 R08: 0000000000000000 R09: ffffa09c01337998 R10: 0000000000000003 R11: ffffffffb96da248 R12: fffffffffffffffe R13: ffff99820535bb28 R14: ffff99820b7bd000 R15: ffff99820381ea80 FS: 00007fe20aadabc0(0000) GS:ffff99827dc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000559a120b502f CR3: 00000000055b6000 CR4: 00000000000006f0 Call Trace: <TASK> ? create_pending_snapshot+0x1040/0x1190 [btrfs] ? __warn+0x81/0x130 ? create_pending_snapshot+0x1040/0x1190 [btrfs] ? report_bug+0x171/0x1a0 ? handle_bug+0x3a/0x70 ? exc_invalid_op+0x17/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? create_pending_snapshot+0x1040/0x1190 [btrfs] ? create_pending_snapshot+0x1040/0x1190 [btrfs] create_pending_snapshots+0x92/0xc0 [btrfs] btrfs_commit_transaction+0x66b/0xf40 [btrfs] btrfs_mksubvol+0x301/0x4d0 [btrfs] btrfs_mksnapshot+0x80/0xb0 [btrfs] __btrfs_ioctl_snap_create+0x1c2/0x1d0 [btrfs] btrfs_ioctl_snap_create_v2+0xc4/0x150 [btrfs] btrfs_ioctl+0x8a6/0x2650 [btrfs] ? kmem_cache_free+0x22/0x340 ? do_sys_openat2+0x97/0xe0 __x64_sys_ioctl+0x97/0xd0 do_syscall_64+0x46/0xf0 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe20abe83af RSP: 002b:00007ffe6eff1360 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 RAX: ffffffffffffffda RBX: 0000000000000004 RCX: 00007fe20abe83af RDX: 00007ffe6eff23c0 RSI: 0000000050009417 RDI: 0000000000000003 RBP: 0000000000000003 R08: 0000000000000000 R09: 00007fe20ad16cd0 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffe6eff13c0 R14: 00007fe20ad45000 R15: 0000559a120b6d58 </TASK> ---[ end trace 0000000000000000 ]--- BTRFS: error (device vdc: state A) in create_pending_snapshot:1875: errno=-2 No such entry BTRFS info (device vdc: state EA): forced readonly BTRFS warning (device vdc: state EA): Skipping commit of aborted transaction. BTRFS: error (device vdc: state EA) in cleanup_transaction:2055: errno=-2 No such entry
This happens because create_pending_snapshot() initializes the new root item as a copy of the source root item. This includes the refs field, which is 0 for a deleted subvolume. The call to btrfs_insert_root() therefore inserts a root with refs == 0. btrfs_get_new_fs_root() then finds the root and returns -ENOENT if refs == 0, which causes create_pending_snapshot() to abort.
Fix it by checking the source root's refs before attempting the snapshot, but after locking subvol_sem to avoid racing with deletion.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Sweet Tea Dorminy sweettea-kernel@dorminy.me Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Omar Sandoval osandov@fb.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/ioctl.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -790,6 +790,9 @@ static int create_snapshot(struct btrfs_ return -EOPNOTSUPP; }
+ if (btrfs_root_refs(&root->root_item) == 0) + return -ENOENT; + if (!test_bit(BTRFS_ROOT_SHAREABLE, &root->state)) return -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ilya Dryomov idryomov@gmail.com
commit ded080c86b3f99683774af0441a58fc2e3d60cae upstream.
The running list is supposed to contain requests that are pinning the exclusive lock, i.e. those that must be flushed before exclusive lock is released. When wake_lock_waiters() is called to handle an error, requests on the acquiring list are failed with that error and no flushing takes place. Briefly moving them to the running list is not only pointless but also harmful: if exclusive lock gets acquired before all of their state machines are scheduled and go through rbd_lock_del_request(), we trigger
rbd_assert(list_empty(&rbd_dev->running_list));
in rbd_try_acquire_lock().
Cc: stable@vger.kernel.org Fixes: 637cd060537d ("rbd: new exclusive lock wait/wake code") Signed-off-by: Ilya Dryomov idryomov@gmail.com Reviewed-by: Dongsheng Yang dongsheng.yang@easystack.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/block/rbd.c | 22 ++++++++++++++-------- 1 file changed, 14 insertions(+), 8 deletions(-)
--- a/drivers/block/rbd.c +++ b/drivers/block/rbd.c @@ -3452,14 +3452,15 @@ static bool rbd_lock_add_request(struct static void rbd_lock_del_request(struct rbd_img_request *img_req) { struct rbd_device *rbd_dev = img_req->rbd_dev; - bool need_wakeup; + bool need_wakeup = false;
lockdep_assert_held(&rbd_dev->lock_rwsem); spin_lock(&rbd_dev->lock_lists_lock); - rbd_assert(!list_empty(&img_req->lock_item)); - list_del_init(&img_req->lock_item); - need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING && - list_empty(&rbd_dev->running_list)); + if (!list_empty(&img_req->lock_item)) { + list_del_init(&img_req->lock_item); + need_wakeup = (rbd_dev->lock_state == RBD_LOCK_STATE_RELEASING && + list_empty(&rbd_dev->running_list)); + } spin_unlock(&rbd_dev->lock_lists_lock); if (need_wakeup) complete(&rbd_dev->releasing_wait); @@ -3842,14 +3843,19 @@ static void wake_lock_waiters(struct rbd return; }
- list_for_each_entry(img_req, &rbd_dev->acquiring_list, lock_item) { + while (!list_empty(&rbd_dev->acquiring_list)) { + img_req = list_first_entry(&rbd_dev->acquiring_list, + struct rbd_img_request, lock_item); mutex_lock(&img_req->state_mutex); rbd_assert(img_req->state == RBD_IMG_EXCLUSIVE_LOCK); + if (!result) + list_move_tail(&img_req->lock_item, + &rbd_dev->running_list); + else + list_del_init(&img_req->lock_item); rbd_img_schedule(img_req, result); mutex_unlock(&img_req->state_mutex); } - - list_splice_tail_init(&rbd_dev->acquiring_list, &rbd_dev->running_list); }
static bool locker_equal(const struct ceph_locker *lhs,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Bernd Edlinger bernd.edlinger@hotmail.de
commit 84c39ec57d409e803a9bb6e4e85daf1243e0e80b upstream.
If get_unused_fd_flags() fails, the error handling is incomplete because bprm->cred is already set to NULL, and therefore free_bprm will not unlock the cred_guard_mutex. Note there are two error conditions which end up here, one before and one after bprm->cred is cleared.
Fixes: b8a61c9e7b4a ("exec: Generic execfd support") Signed-off-by: Bernd Edlinger bernd.edlinger@hotmail.de Acked-by: Eric W. Biederman ebiederm@xmission.com Link: https://lore.kernel.org/r/AS8P193MB128517ADB5EFF29E04389EDAE4752@AS8P193MB12... Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/exec.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/exec.c +++ b/fs/exec.c @@ -1410,6 +1410,9 @@ int begin_new_exec(struct linux_binprm *
out_unlock: up_write(&me->signal->exec_update_lock); + if (!bprm->cred) + mutex_unlock(&me->signal->cred_guard_mutex); + out: return retval; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Emmanuel Grumbach emmanuel.grumbach@intel.com
commit cf4a0d840ecc72fcf16198d5e9c505ab7d5a5e4d upstream.
iwl_fw_ini_trigger_tlv::data is a pointer to a __le32, which means that if we copy to iwl_fw_ini_trigger_tlv::data + offset while offset is in bytes, we'll write past the buffer.
Cc: stable@vger.kernel.org Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218233 Fixes: cf29c5b66b9f ("iwlwifi: dbg_ini: implement time point handling") Signed-off-by: Emmanuel Grumbach emmanuel.grumbach@intel.com Signed-off-by: Miri Korenblit miriam.rachel.korenblit@intel.com Link: https://msgid.link/20240111150610.2d2b8b870194.I14ed76505a5cf87304e0c9cc05cc... Signed-off-by: Johannes Berg johannes.berg@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c +++ b/drivers/net/wireless/intel/iwlwifi/iwl-dbg-tlv.c @@ -1,6 +1,6 @@ // SPDX-License-Identifier: GPL-2.0 OR BSD-3-Clause /* - * Copyright (C) 2018-2023 Intel Corporation + * Copyright (C) 2018-2024 Intel Corporation */ #include <linux/firmware.h> #include "iwl-drv.h" @@ -1094,7 +1094,7 @@ static int iwl_dbg_tlv_override_trig_nod node_trig = (void *)node_tlv->data; }
- memcpy(node_trig->data + offset, trig->data, trig_data_len); + memcpy((u8 *)node_trig->data + offset, trig->data, trig_data_len); node_tlv->length = cpu_to_le32(size);
if (policy & IWL_FW_INI_APPLY_POLICY_OVERRIDE_CFG) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: NeilBrown neilb@suse.de
commit edcf9725150e42beeca42d085149f4c88fa97afd upstream.
The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep.
First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour.
If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error.
The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request.
When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID.
So clearly it is harmful to allow a false positive, which testing so_count allows.
The test is nonsense because ... well... it doesn't mean anything.
so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks.
When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be.
In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail.
In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success.
The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe.
So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check is it safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'.
Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown neilb@suse.de Reviewed-by: Jeff Layton jlayton@kernel.org Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4state.c | 26 +++++++++++++++----------- 1 file changed, 15 insertions(+), 11 deletions(-)
--- a/fs/nfsd/nfs4state.c +++ b/fs/nfsd/nfs4state.c @@ -7890,14 +7890,16 @@ check_for_locks(struct nfs4_file *fp, st { struct file_lock *fl; int status = false; - struct nfsd_file *nf = find_any_file(fp); + struct nfsd_file *nf; struct inode *inode; struct file_lock_context *flctx;
+ spin_lock(&fp->fi_lock); + nf = find_any_file_locked(fp); if (!nf) { /* Any valid lock stateid should have some sort of access */ WARN_ON_ONCE(1); - return status; + goto out; }
inode = file_inode(nf->nf_file); @@ -7913,7 +7915,8 @@ check_for_locks(struct nfs4_file *fp, st } spin_unlock(&flctx->flc_lock); } - nfsd_file_put(nf); +out: + spin_unlock(&fp->fi_lock); return status; }
@@ -7923,10 +7926,8 @@ check_for_locks(struct nfs4_file *fp, st * @cstate: NFSv4 COMPOUND state * @u: RELEASE_LOCKOWNER arguments * - * The lockowner's so_count is bumped when a lock record is added - * or when copying a conflicting lock. The latter case is brief, - * but can lead to fleeting false positives when looking for - * locks-in-use. + * Check if theree are any locks still held and if not - free the lockowner + * and any lock state that is owned. * * Return values: * %nfs_ok: lockowner released or not found @@ -7962,10 +7963,13 @@ nfsd4_release_lockowner(struct svc_rqst spin_unlock(&clp->cl_lock); return nfs_ok; } - if (atomic_read(&lo->lo_owner.so_count) != 2) { - spin_unlock(&clp->cl_lock); - nfs4_put_stateowner(&lo->lo_owner); - return nfserr_locks_held; + + list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) { + if (check_for_locks(stp->st_stid.sc_file, lo)) { + spin_unlock(&clp->cl_lock); + nfs4_put_stateowner(&lo->lo_owner); + return nfserr_locks_held; + } } unhash_lockowner_locked(lo); while (!list_empty(&lo->lo_owner.so_stateids)) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Kelley mhklinux@outlook.com
commit 6941f67ad37d5465b75b9ffc498fcf6897a3c00e upstream.
Current code in netvsc_drv_init() incorrectly assumes that PAGE_SIZE is 4 Kbytes, which is wrong on ARM64 with 16K or 64K page size. As a result, the default VMBus ring buffer size on ARM64 with 64K page size is 8 Mbytes instead of the expected 512 Kbytes. While this doesn't break anything, a typical VM with 8 vCPUs and 8 netvsc channels wastes 120 Mbytes (8 channels * 2 ring buffers/channel * 7.5 Mbytes/ring buffer).
Unfortunately, the module parameter specifying the ring buffer size is in units of 4 Kbyte pages. Ideally, it should be in units that are independent of PAGE_SIZE, but backwards compatibility prevents changing that now.
Fix this by having netvsc_drv_init() hardcode 4096 instead of using PAGE_SIZE when calculating the ring buffer size in bytes. Also use the VMBUS_RING_SIZE macro to ensure proper alignment when running with page size larger than 4K.
Cc: stable@vger.kernel.org # 5.15.x Fixes: 7aff79e297ee ("Drivers: hv: Enable Hyper-V code to be built on ARM64") Signed-off-by: Michael Kelley mhklinux@outlook.com Link: https://lore.kernel.org/r/20240122162028.348885-1-mhklinux@outlook.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/hyperv/netvsc_drv.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -44,7 +44,7 @@
static unsigned int ring_size __ro_after_init = 128; module_param(ring_size, uint, 0444); -MODULE_PARM_DESC(ring_size, "Ring buffer size (# of pages)"); +MODULE_PARM_DESC(ring_size, "Ring buffer size (# of 4K pages)"); unsigned int netvsc_ring_bytes __ro_after_init;
static const u32 default_msg = NETIF_MSG_DRV | NETIF_MSG_PROBE | @@ -2805,7 +2805,7 @@ static int __init netvsc_drv_init(void) pr_info("Increased ring_size to %u (min allowed)\n", ring_size); } - netvsc_ring_bytes = ring_size * PAGE_SIZE; + netvsc_ring_bytes = VMBUS_RING_SIZE(ring_size * 4096);
register_netdevice_notifier(&netvsc_netdev_notifier);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pablo Neira Ayuso pablo@netfilter.org
commit 01acb2e8666a6529697141a6017edbf206921913 upstream.
Remove netdevice from inet/ingress basechain in case NETDEV_UNREGISTER event is reported, otherwise a stale reference to netdevice remains in the hook list.
Fixes: 60a3815da702 ("netfilter: add inet ingress support") Cc: stable@vger.kernel.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nft_chain_filter.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-)
--- a/net/netfilter/nft_chain_filter.c +++ b/net/netfilter/nft_chain_filter.c @@ -357,9 +357,10 @@ static int nf_tables_netdev_event(struct unsigned long event, void *ptr) { struct net_device *dev = netdev_notifier_info_to_dev(ptr); + struct nft_base_chain *basechain; struct nftables_pernet *nft_net; - struct nft_table *table; struct nft_chain *chain, *nr; + struct nft_table *table; struct nft_ctx ctx = { .net = dev_net(dev), }; @@ -371,7 +372,8 @@ static int nf_tables_netdev_event(struct nft_net = nft_pernet(ctx.net); mutex_lock(&nft_net->commit_mutex); list_for_each_entry(table, &nft_net->tables, list) { - if (table->family != NFPROTO_NETDEV) + if (table->family != NFPROTO_NETDEV && + table->family != NFPROTO_INET) continue;
ctx.family = table->family; @@ -380,6 +382,11 @@ static int nf_tables_netdev_event(struct if (!nft_is_base_chain(chain)) continue;
+ basechain = nft_base_chain(chain); + if (table->family == NFPROTO_INET && + basechain->ops.hooknum != NF_INET_INGRESS) + continue; + ctx.chain = chain; nft_netdev_event(event, dev, &ctx); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Florian Westphal fw@strlen.de
commit f342de4e2f33e0e39165d8639387aa6c19dff660 upstream.
This reverts commit e0abdadcc6e1.
core.c:nf_hook_slow assumes that the upper 16 bits of NF_DROP verdicts contain a valid errno, i.e. -EPERM, -EHOSTUNREACH or similar, or 0.
Due to the reverted commit, its possible to provide a positive value, e.g. NF_ACCEPT (1), which results in use-after-free.
Its not clear to me why this commit was made.
NF_QUEUE is not used by nftables; "queue" rules in nftables will result in use of "nft_queue" expression.
If we later need to allow specifiying errno values from userspace (do not know why), this has to call NF_DROP_GETERR and check that "err <= 0" holds true.
Fixes: e0abdadcc6e1 ("netfilter: nf_tables: accept QUEUE/DROP verdict parameters") Cc: stable@vger.kernel.org Reported-by: Notselwyn notselwyn@pwning.tech Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/netfilter/nf_tables_api.c | 16 ++++++---------- 1 file changed, 6 insertions(+), 10 deletions(-)
--- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -10871,16 +10871,10 @@ static int nft_verdict_init(const struct data->verdict.code = ntohl(nla_get_be32(tb[NFTA_VERDICT_CODE]));
switch (data->verdict.code) { - default: - switch (data->verdict.code & NF_VERDICT_MASK) { - case NF_ACCEPT: - case NF_DROP: - case NF_QUEUE: - break; - default: - return -EINVAL; - } - fallthrough; + case NF_ACCEPT: + case NF_DROP: + case NF_QUEUE: + break; case NFT_CONTINUE: case NFT_BREAK: case NFT_RETURN: @@ -10915,6 +10909,8 @@ static int nft_verdict_init(const struct
data->verdict.chain = chain; break; + default: + return -EINVAL; }
desc->len = sizeof(data->verdict);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nathan Chancellor nathan@kernel.org
commit 416de0246f35f43d871a57939671fe814f4455ee upstream.
When booting a kernel with CONFIG_CFI_CLANG, there is a CFI failure when accessing any of the values under /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00:
$ cat /sys/devices/system/cpu/intel_uncore_frequency/package_00_die_00/max_freq_khz fish: Job 1, 'cat /sys/devices/system/cpu/int…' terminated by signal SIGSEGV (Address boundary error)
$ sudo dmesg &| grep 'CFI failure' [ 170.953925] CFI failure at kobj_attr_show+0x19/0x30 (target: show_max_freq_khz+0x0/0xc0 [intel_uncore_frequency_common]; expected type: 0xd34078c5
The sysfs callback functions such as show_domain_id() are written as if they are going to be called by dev_attr_show() but as the above message shows, they are instead called by kobj_attr_show(). kCFI checks that the destination of an indirect jump has the exact same type as the prototype of the function pointer it is called through and fails when they do not.
These callbacks are called through kobj_attr_show() because uncore_root_kobj was initialized with kobject_create_and_add(), which means uncore_root_kobj has a ->sysfs_ops of kobj_sysfs_ops from kobject_create(), which uses kobj_attr_show() as its ->show() value.
The only reason there has not been a more noticeable problem until this point is that 'struct kobj_attribute' and 'struct device_attribute' have the same layout, so getting the callback from container_of() works the same with either value.
Change all the callbacks and their uses to be compatible with kobj_attr_show() and kobj_attr_store(), which resolves the kCFI failure and allows the sysfs files to work properly.
Closes: https://github.com/ClangBuiltLinux/linux/issues/1974 Fixes: ae7b2ce57851 ("platform/x86/intel/uncore-freq: Use sysfs API to create attributes") Cc: stable@vger.kernel.org Signed-off-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Sami Tolvanen samitolvanen@google.com Acked-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Link: https://lore.kernel.org/r/20240104-intel-uncore-freq-kcfi-fix-v1-1-bf1e8939a... Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c | 82 +++++----- drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h | 32 +-- 2 files changed, 57 insertions(+), 57 deletions(-)
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c +++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.c @@ -23,23 +23,23 @@ static int (*uncore_read)(struct uncore_ static int (*uncore_write)(struct uncore_data *data, unsigned int input, unsigned int min_max); static int (*uncore_read_freq)(struct uncore_data *data, unsigned int *freq);
-static ssize_t show_domain_id(struct device *dev, struct device_attribute *attr, char *buf) +static ssize_t show_domain_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { - struct uncore_data *data = container_of(attr, struct uncore_data, domain_id_dev_attr); + struct uncore_data *data = container_of(attr, struct uncore_data, domain_id_kobj_attr);
return sprintf(buf, "%u\n", data->domain_id); }
-static ssize_t show_fabric_cluster_id(struct device *dev, struct device_attribute *attr, char *buf) +static ssize_t show_fabric_cluster_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { - struct uncore_data *data = container_of(attr, struct uncore_data, fabric_cluster_id_dev_attr); + struct uncore_data *data = container_of(attr, struct uncore_data, fabric_cluster_id_kobj_attr);
return sprintf(buf, "%u\n", data->cluster_id); }
-static ssize_t show_package_id(struct device *dev, struct device_attribute *attr, char *buf) +static ssize_t show_package_id(struct kobject *kobj, struct kobj_attribute *attr, char *buf) { - struct uncore_data *data = container_of(attr, struct uncore_data, package_id_dev_attr); + struct uncore_data *data = container_of(attr, struct uncore_data, package_id_kobj_attr);
return sprintf(buf, "%u\n", data->package_id); } @@ -97,30 +97,30 @@ static ssize_t show_perf_status_freq_khz }
#define store_uncore_min_max(name, min_max) \ - static ssize_t store_##name(struct device *dev, \ - struct device_attribute *attr, \ + static ssize_t store_##name(struct kobject *kobj, \ + struct kobj_attribute *attr, \ const char *buf, size_t count) \ { \ - struct uncore_data *data = container_of(attr, struct uncore_data, name##_dev_attr);\ + struct uncore_data *data = container_of(attr, struct uncore_data, name##_kobj_attr);\ \ return store_min_max_freq_khz(data, buf, count, \ min_max); \ }
#define show_uncore_min_max(name, min_max) \ - static ssize_t show_##name(struct device *dev, \ - struct device_attribute *attr, char *buf)\ + static ssize_t show_##name(struct kobject *kobj, \ + struct kobj_attribute *attr, char *buf)\ { \ - struct uncore_data *data = container_of(attr, struct uncore_data, name##_dev_attr);\ + struct uncore_data *data = container_of(attr, struct uncore_data, name##_kobj_attr);\ \ return show_min_max_freq_khz(data, buf, min_max); \ }
#define show_uncore_perf_status(name) \ - static ssize_t show_##name(struct device *dev, \ - struct device_attribute *attr, char *buf)\ + static ssize_t show_##name(struct kobject *kobj, \ + struct kobj_attribute *attr, char *buf)\ { \ - struct uncore_data *data = container_of(attr, struct uncore_data, name##_dev_attr);\ + struct uncore_data *data = container_of(attr, struct uncore_data, name##_kobj_attr);\ \ return show_perf_status_freq_khz(data, buf); \ } @@ -134,11 +134,11 @@ show_uncore_min_max(max_freq_khz, 1); show_uncore_perf_status(current_freq_khz);
#define show_uncore_data(member_name) \ - static ssize_t show_##member_name(struct device *dev, \ - struct device_attribute *attr, char *buf)\ + static ssize_t show_##member_name(struct kobject *kobj, \ + struct kobj_attribute *attr, char *buf)\ { \ struct uncore_data *data = container_of(attr, struct uncore_data,\ - member_name##_dev_attr);\ + member_name##_kobj_attr);\ \ return sysfs_emit(buf, "%u\n", \ data->member_name); \ @@ -149,29 +149,29 @@ show_uncore_data(initial_max_freq_khz);
#define init_attribute_rw(_name) \ do { \ - sysfs_attr_init(&data->_name##_dev_attr.attr); \ - data->_name##_dev_attr.show = show_##_name; \ - data->_name##_dev_attr.store = store_##_name; \ - data->_name##_dev_attr.attr.name = #_name; \ - data->_name##_dev_attr.attr.mode = 0644; \ + sysfs_attr_init(&data->_name##_kobj_attr.attr); \ + data->_name##_kobj_attr.show = show_##_name; \ + data->_name##_kobj_attr.store = store_##_name; \ + data->_name##_kobj_attr.attr.name = #_name; \ + data->_name##_kobj_attr.attr.mode = 0644; \ } while (0)
#define init_attribute_ro(_name) \ do { \ - sysfs_attr_init(&data->_name##_dev_attr.attr); \ - data->_name##_dev_attr.show = show_##_name; \ - data->_name##_dev_attr.store = NULL; \ - data->_name##_dev_attr.attr.name = #_name; \ - data->_name##_dev_attr.attr.mode = 0444; \ + sysfs_attr_init(&data->_name##_kobj_attr.attr); \ + data->_name##_kobj_attr.show = show_##_name; \ + data->_name##_kobj_attr.store = NULL; \ + data->_name##_kobj_attr.attr.name = #_name; \ + data->_name##_kobj_attr.attr.mode = 0444; \ } while (0)
#define init_attribute_root_ro(_name) \ do { \ - sysfs_attr_init(&data->_name##_dev_attr.attr); \ - data->_name##_dev_attr.show = show_##_name; \ - data->_name##_dev_attr.store = NULL; \ - data->_name##_dev_attr.attr.name = #_name; \ - data->_name##_dev_attr.attr.mode = 0400; \ + sysfs_attr_init(&data->_name##_kobj_attr.attr); \ + data->_name##_kobj_attr.show = show_##_name; \ + data->_name##_kobj_attr.store = NULL; \ + data->_name##_kobj_attr.attr.name = #_name; \ + data->_name##_kobj_attr.attr.mode = 0400; \ } while (0)
static int create_attr_group(struct uncore_data *data, char *name) @@ -186,21 +186,21 @@ static int create_attr_group(struct unco
if (data->domain_id != UNCORE_DOMAIN_ID_INVALID) { init_attribute_root_ro(domain_id); - data->uncore_attrs[index++] = &data->domain_id_dev_attr.attr; + data->uncore_attrs[index++] = &data->domain_id_kobj_attr.attr; init_attribute_root_ro(fabric_cluster_id); - data->uncore_attrs[index++] = &data->fabric_cluster_id_dev_attr.attr; + data->uncore_attrs[index++] = &data->fabric_cluster_id_kobj_attr.attr; init_attribute_root_ro(package_id); - data->uncore_attrs[index++] = &data->package_id_dev_attr.attr; + data->uncore_attrs[index++] = &data->package_id_kobj_attr.attr; }
- data->uncore_attrs[index++] = &data->max_freq_khz_dev_attr.attr; - data->uncore_attrs[index++] = &data->min_freq_khz_dev_attr.attr; - data->uncore_attrs[index++] = &data->initial_min_freq_khz_dev_attr.attr; - data->uncore_attrs[index++] = &data->initial_max_freq_khz_dev_attr.attr; + data->uncore_attrs[index++] = &data->max_freq_khz_kobj_attr.attr; + data->uncore_attrs[index++] = &data->min_freq_khz_kobj_attr.attr; + data->uncore_attrs[index++] = &data->initial_min_freq_khz_kobj_attr.attr; + data->uncore_attrs[index++] = &data->initial_max_freq_khz_kobj_attr.attr;
ret = uncore_read_freq(data, &freq); if (!ret) - data->uncore_attrs[index++] = &data->current_freq_khz_dev_attr.attr; + data->uncore_attrs[index++] = &data->current_freq_khz_kobj_attr.attr;
data->uncore_attrs[index] = NULL;
--- a/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h +++ b/drivers/platform/x86/intel/uncore-frequency/uncore-frequency-common.h @@ -26,14 +26,14 @@ * @instance_id: Unique instance id to append to directory name * @name: Sysfs entry name for this instance * @uncore_attr_group: Attribute group storage - * @max_freq_khz_dev_attr: Storage for device attribute max_freq_khz - * @mix_freq_khz_dev_attr: Storage for device attribute min_freq_khz - * @initial_max_freq_khz_dev_attr: Storage for device attribute initial_max_freq_khz - * @initial_min_freq_khz_dev_attr: Storage for device attribute initial_min_freq_khz - * @current_freq_khz_dev_attr: Storage for device attribute current_freq_khz - * @domain_id_dev_attr: Storage for device attribute domain_id - * @fabric_cluster_id_dev_attr: Storage for device attribute fabric_cluster_id - * @package_id_dev_attr: Storage for device attribute package_id + * @max_freq_khz_kobj_attr: Storage for kobject attribute max_freq_khz + * @mix_freq_khz_kobj_attr: Storage for kobject attribute min_freq_khz + * @initial_max_freq_khz_kobj_attr: Storage for kobject attribute initial_max_freq_khz + * @initial_min_freq_khz_kobj_attr: Storage for kobject attribute initial_min_freq_khz + * @current_freq_khz_kobj_attr: Storage for kobject attribute current_freq_khz + * @domain_id_kobj_attr: Storage for kobject attribute domain_id + * @fabric_cluster_id_kobj_attr: Storage for kobject attribute fabric_cluster_id + * @package_id_kobj_attr: Storage for kobject attribute package_id * @uncore_attrs: Attribute storage for group creation * * This structure is used to encapsulate all data related to uncore sysfs @@ -53,14 +53,14 @@ struct uncore_data { char name[32];
struct attribute_group uncore_attr_group; - struct device_attribute max_freq_khz_dev_attr; - struct device_attribute min_freq_khz_dev_attr; - struct device_attribute initial_max_freq_khz_dev_attr; - struct device_attribute initial_min_freq_khz_dev_attr; - struct device_attribute current_freq_khz_dev_attr; - struct device_attribute domain_id_dev_attr; - struct device_attribute fabric_cluster_id_dev_attr; - struct device_attribute package_id_dev_attr; + struct kobj_attribute max_freq_khz_kobj_attr; + struct kobj_attribute min_freq_khz_kobj_attr; + struct kobj_attribute initial_max_freq_khz_kobj_attr; + struct kobj_attribute initial_min_freq_khz_kobj_attr; + struct kobj_attribute current_freq_khz_kobj_attr; + struct kobj_attribute domain_id_kobj_attr; + struct kobj_attribute fabric_cluster_id_kobj_attr; + struct kobj_attribute package_id_kobj_attr; struct attribute *uncore_attrs[9]; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com
commit 5913320eb0b3ec88158cfcb0fa5e996bf4ef681b upstream.
p2sb_bar() unhides P2SB device to get resources from the device. It guards the operation by locking pci_rescan_remove_lock so that parallel rescans do not find the P2SB device. However, this lock causes deadlock when PCI bus rescan is triggered by /sys/bus/pci/rescan. The rescan locks pci_rescan_remove_lock and probes PCI devices. When PCI devices call p2sb_bar() during probe, it locks pci_rescan_remove_lock again. Hence the deadlock.
To avoid the deadlock, do not lock pci_rescan_remove_lock in p2sb_bar(). Instead, do the lock at fs_initcall. Introduce p2sb_cache_resources() for fs_initcall which gets and caches the P2SB resources. At p2sb_bar(), refer the cache and return to the caller.
Before operating the device at P2SB DEVFN for resource cache, check that its device class is PCI_CLASS_MEMORY_OTHER 0x0580 that PCH specifications define. This avoids unexpected operation to other devices at the same DEVFN.
Link: https://lore.kernel.org/linux-pci/6xb24fjmptxxn5js2fjrrddjae6twex5bjaftwqsua... Fixes: 9745fb07474f ("platform/x86/intel: Add Primary to Sideband (P2SB) bridge support") Cc: stable@vger.kernel.org Suggested-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Shin'ichiro Kawasaki shinichiro.kawasaki@wdc.com Link: https://lore.kernel.org/r/20240108062059.3583028-2-shinichiro.kawasaki@wdc.c... Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Tested-by Klara Modin klarasmodin@gmail.com Reviewed-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/platform/x86/p2sb.c | 180 +++++++++++++++++++++++++++++++++----------- 1 file changed, 139 insertions(+), 41 deletions(-)
--- a/drivers/platform/x86/p2sb.c +++ b/drivers/platform/x86/p2sb.c @@ -26,6 +26,21 @@ static const struct x86_cpu_id p2sb_cpu_ {} };
+/* + * Cache BAR0 of P2SB device functions 0 to 7. + * TODO: The constant 8 is the number of functions that PCI specification + * defines. Same definitions exist tree-wide. Unify this definition and + * the other definitions then move to include/uapi/linux/pci.h. + */ +#define NR_P2SB_RES_CACHE 8 + +struct p2sb_res_cache { + u32 bus_dev_id; + struct resource res; +}; + +static struct p2sb_res_cache p2sb_resources[NR_P2SB_RES_CACHE]; + static int p2sb_get_devfn(unsigned int *devfn) { unsigned int fn = P2SB_DEVFN_DEFAULT; @@ -39,8 +54,16 @@ static int p2sb_get_devfn(unsigned int * return 0; }
+static bool p2sb_valid_resource(struct resource *res) +{ + if (res->flags) + return true; + + return false; +} + /* Copy resource from the first BAR of the device in question */ -static int p2sb_read_bar0(struct pci_dev *pdev, struct resource *mem) +static void p2sb_read_bar0(struct pci_dev *pdev, struct resource *mem) { struct resource *bar0 = &pdev->resource[0];
@@ -56,49 +79,66 @@ static int p2sb_read_bar0(struct pci_dev mem->end = bar0->end; mem->flags = bar0->flags; mem->desc = bar0->desc; - - return 0; }
-static int p2sb_scan_and_read(struct pci_bus *bus, unsigned int devfn, struct resource *mem) +static void p2sb_scan_and_cache_devfn(struct pci_bus *bus, unsigned int devfn) { + struct p2sb_res_cache *cache = &p2sb_resources[PCI_FUNC(devfn)]; struct pci_dev *pdev; - int ret;
pdev = pci_scan_single_device(bus, devfn); if (!pdev) - return -ENODEV; + return;
- ret = p2sb_read_bar0(pdev, mem); + p2sb_read_bar0(pdev, &cache->res); + cache->bus_dev_id = bus->dev.id;
pci_stop_and_remove_bus_device(pdev); - return ret; }
-/** - * p2sb_bar - Get Primary to Sideband (P2SB) bridge device BAR - * @bus: PCI bus to communicate with - * @devfn: PCI slot and function to communicate with - * @mem: memory resource to be filled in - * - * The BIOS prevents the P2SB device from being enumerated by the PCI - * subsystem, so we need to unhide and hide it back to lookup the BAR. - * - * if @bus is NULL, the bus 0 in domain 0 will be used. - * If @devfn is 0, it will be replaced by devfn of the P2SB device. - * - * Caller must provide a valid pointer to @mem. - * - * Locking is handled by pci_rescan_remove_lock mutex. - * - * Return: - * 0 on success or appropriate errno value on error. - */ -int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem) +static int p2sb_scan_and_cache(struct pci_bus *bus, unsigned int devfn) +{ + unsigned int slot, fn; + + if (PCI_FUNC(devfn) == 0) { + /* + * When function number of the P2SB device is zero, scan it and + * other function numbers, and if devices are available, cache + * their BAR0s. + */ + slot = PCI_SLOT(devfn); + for (fn = 0; fn < NR_P2SB_RES_CACHE; fn++) + p2sb_scan_and_cache_devfn(bus, PCI_DEVFN(slot, fn)); + } else { + /* Scan the P2SB device and cache its BAR0 */ + p2sb_scan_and_cache_devfn(bus, devfn); + } + + if (!p2sb_valid_resource(&p2sb_resources[PCI_FUNC(devfn)].res)) + return -ENOENT; + + return 0; +} + +static struct pci_bus *p2sb_get_bus(struct pci_bus *bus) +{ + static struct pci_bus *p2sb_bus; + + bus = bus ?: p2sb_bus; + if (bus) + return bus; + + /* Assume P2SB is on the bus 0 in domain 0 */ + p2sb_bus = pci_find_bus(0, 0); + return p2sb_bus; +} + +static int p2sb_cache_resources(void) { - struct pci_dev *pdev_p2sb; unsigned int devfn_p2sb; u32 value = P2SBC_HIDE; + struct pci_bus *bus; + u16 class; int ret;
/* Get devfn for P2SB device itself */ @@ -106,8 +146,17 @@ int p2sb_bar(struct pci_bus *bus, unsign if (ret) return ret;
- /* if @bus is NULL, use bus 0 in domain 0 */ - bus = bus ?: pci_find_bus(0, 0); + bus = p2sb_get_bus(NULL); + if (!bus) + return -ENODEV; + + /* + * When a device with same devfn exists and its device class is not + * PCI_CLASS_MEMORY_OTHER for P2SB, do not touch it. + */ + pci_bus_read_config_word(bus, devfn_p2sb, PCI_CLASS_DEVICE, &class); + if (!PCI_POSSIBLE_ERROR(class) && class != PCI_CLASS_MEMORY_OTHER) + return -ENODEV;
/* * Prevent concurrent PCI bus scan from seeing the P2SB device and @@ -115,17 +164,16 @@ int p2sb_bar(struct pci_bus *bus, unsign */ pci_lock_rescan_remove();
- /* Unhide the P2SB device, if needed */ + /* + * The BIOS prevents the P2SB device from being enumerated by the PCI + * subsystem, so we need to unhide and hide it back to lookup the BAR. + * Unhide the P2SB device here, if needed. + */ pci_bus_read_config_dword(bus, devfn_p2sb, P2SBC, &value); if (value & P2SBC_HIDE) pci_bus_write_config_dword(bus, devfn_p2sb, P2SBC, 0);
- pdev_p2sb = pci_scan_single_device(bus, devfn_p2sb); - if (devfn) - ret = p2sb_scan_and_read(bus, devfn, mem); - else - ret = p2sb_read_bar0(pdev_p2sb, mem); - pci_stop_and_remove_bus_device(pdev_p2sb); + ret = p2sb_scan_and_cache(bus, devfn_p2sb);
/* Hide the P2SB device, if it was hidden */ if (value & P2SBC_HIDE) @@ -133,12 +181,62 @@ int p2sb_bar(struct pci_bus *bus, unsign
pci_unlock_rescan_remove();
- if (ret) - return ret; + return ret; +}
- if (mem->flags == 0) +/** + * p2sb_bar - Get Primary to Sideband (P2SB) bridge device BAR + * @bus: PCI bus to communicate with + * @devfn: PCI slot and function to communicate with + * @mem: memory resource to be filled in + * + * If @bus is NULL, the bus 0 in domain 0 will be used. + * If @devfn is 0, it will be replaced by devfn of the P2SB device. + * + * Caller must provide a valid pointer to @mem. + * + * Return: + * 0 on success or appropriate errno value on error. + */ +int p2sb_bar(struct pci_bus *bus, unsigned int devfn, struct resource *mem) +{ + struct p2sb_res_cache *cache; + int ret; + + bus = p2sb_get_bus(bus); + if (!bus) return -ENODEV;
+ if (!devfn) { + ret = p2sb_get_devfn(&devfn); + if (ret) + return ret; + } + + cache = &p2sb_resources[PCI_FUNC(devfn)]; + if (cache->bus_dev_id != bus->dev.id) + return -ENODEV; + + if (!p2sb_valid_resource(&cache->res)) + return -ENOENT; + + memcpy(mem, &cache->res, sizeof(*mem)); return 0; } EXPORT_SYMBOL_GPL(p2sb_bar); + +static int __init p2sb_fs_init(void) +{ + p2sb_cache_resources(); + return 0; +} + +/* + * pci_rescan_remove_lock to avoid access to unhidden P2SB devices can + * not be locked in sysfs pci bus rescan path because of deadlock. To + * avoid the deadlock, access to P2SB devices with the lock at an early + * step in kernel initialization and cache required resources. This + * should happen after subsys_initcall which initializes PCI subsystem + * and before device_initcall which requires P2SB resources. + */ +fs_initcall(p2sb_fs_init);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lin Ma linma@zju.edu.cn
commit ebeae8adf89d9a82359f6659b1663d09beec2faa upstream.
Similar to a reported issue (check the commit b33fb5b801c6 ("net: qualcomm: rmnet: fix global oob in rmnet_policy"), my local fuzzer finds another global out-of-bounds read for policy ksmbd_nl_policy. See bug trace below:
================================================================== BUG: KASAN: global-out-of-bounds in validate_nla lib/nlattr.c:386 [inline] BUG: KASAN: global-out-of-bounds in __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600 Read of size 1 at addr ffffffff8f24b100 by task syz-executor.1/62810
CPU: 0 PID: 62810 Comm: syz-executor.1 Tainted: G N 6.1.0 #3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0x8b/0xb3 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:284 [inline] print_report+0x172/0x475 mm/kasan/report.c:395 kasan_report+0xbb/0x1c0 mm/kasan/report.c:495 validate_nla lib/nlattr.c:386 [inline] __nla_validate_parse+0x24af/0x2750 lib/nlattr.c:600 __nla_parse+0x3e/0x50 lib/nlattr.c:697 __nlmsg_parse include/net/netlink.h:748 [inline] genl_family_rcv_msg_attrs_parse.constprop.0+0x1b0/0x290 net/netlink/genetlink.c:565 genl_family_rcv_msg_doit+0xda/0x330 net/netlink/genetlink.c:734 genl_family_rcv_msg net/netlink/genetlink.c:833 [inline] genl_rcv_msg+0x441/0x780 net/netlink/genetlink.c:850 netlink_rcv_skb+0x14f/0x410 net/netlink/af_netlink.c:2540 genl_rcv+0x24/0x40 net/netlink/genetlink.c:861 netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline] netlink_unicast+0x54e/0x800 net/netlink/af_netlink.c:1345 netlink_sendmsg+0x930/0xe50 net/netlink/af_netlink.c:1921 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0x154/0x190 net/socket.c:734 ____sys_sendmsg+0x6df/0x840 net/socket.c:2482 ___sys_sendmsg+0x110/0x1b0 net/socket.c:2536 __sys_sendmsg+0xf3/0x1c0 net/socket.c:2565 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x3b/0x90 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fdd66a8f359 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 f1 19 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fdd65e00168 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007fdd66bbcf80 RCX: 00007fdd66a8f359 RDX: 0000000000000000 RSI: 0000000020000500 RDI: 0000000000000003 RBP: 00007fdd66ada493 R08: 0000000000000000 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffc84b81aff R14: 00007fdd65e00300 R15: 0000000000022000 </TASK>
The buggy address belongs to the variable: ksmbd_nl_policy+0x100/0xa80
The buggy address belongs to the physical page: page:0000000034f47940 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1ccc4b flags: 0x200000000001000(reserved|node=0|zone=2) raw: 0200000000001000 ffffea00073312c8 ffffea00073312c8 0000000000000000 raw: 0000000000000000 0000000000000000 00000001ffffffff 0000000000000000 page dumped because: kasan: bad access detected
Memory state around the buggy address: ffffffff8f24b000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffff8f24b080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffffffff8f24b100: f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9 00 00 07 f9
^ ffffffff8f24b180: f9 f9 f9 f9 00 05 f9 f9 f9 f9 f9 f9 00 00 00 05 ffffffff8f24b200: f9 f9 f9 f9 00 00 03 f9 f9 f9 f9 f9 00 00 04 f9 ==================================================================
To fix it, add a placeholder named __KSMBD_EVENT_MAX and let KSMBD_EVENT_MAX to be its original value - 1 according to what other netlink families do. Also change two sites that refer the KSMBD_EVENT_MAX to correct value.
Cc: stable@vger.kernel.org Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Signed-off-by: Lin Ma linma@zju.edu.cn Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/smb/server/ksmbd_netlink.h | 3 ++- fs/smb/server/transport_ipc.c | 4 ++-- 2 files changed, 4 insertions(+), 3 deletions(-)
--- a/fs/smb/server/ksmbd_netlink.h +++ b/fs/smb/server/ksmbd_netlink.h @@ -304,7 +304,8 @@ enum ksmbd_event { KSMBD_EVENT_SPNEGO_AUTHEN_REQUEST, KSMBD_EVENT_SPNEGO_AUTHEN_RESPONSE = 15,
- KSMBD_EVENT_MAX + __KSMBD_EVENT_MAX, + KSMBD_EVENT_MAX = __KSMBD_EVENT_MAX - 1 };
/* --- a/fs/smb/server/transport_ipc.c +++ b/fs/smb/server/transport_ipc.c @@ -74,7 +74,7 @@ static int handle_unsupported_event(stru static int handle_generic_event(struct sk_buff *skb, struct genl_info *info); static int ksmbd_ipc_heartbeat_request(void);
-static const struct nla_policy ksmbd_nl_policy[KSMBD_EVENT_MAX] = { +static const struct nla_policy ksmbd_nl_policy[KSMBD_EVENT_MAX + 1] = { [KSMBD_EVENT_UNSPEC] = { .len = 0, }, @@ -403,7 +403,7 @@ static int handle_generic_event(struct s return -EPERM; #endif
- if (type >= KSMBD_EVENT_MAX) { + if (type > KSMBD_EVENT_MAX) { WARN_ON(1); return -EINVAL; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cristian Marussi cristian.marussi@arm.com
commit 437a310b22244d4e0b78665c3042e5d1c0f45306 upstream.
On reception of a completion interrupt the shared memory area is accessed to retrieve the message header at first and then, if the message sequence number identifies a transaction which is still pending, the related payload is fetched too.
When an SCMI command times out the channel ownership remains with the platform until eventually a late reply is received and, as a consequence, any further transmission attempt remains pending, waiting for the channel to be relinquished by the platform.
Once that late reply is received the channel ownership is given back to the agent and any pending request is then allowed to proceed and overwrite the SMT area of the just delivered late reply; then the wait for the reply to the new request starts.
It has been observed that the spurious IRQ related to the late reply can be wrongly associated with the freshly enqueued request: when that happens the SCMI stack in-flight lookup procedure is fooled by the fact that the message header now present in the SMT area is related to the new pending transaction, even though the real reply has still to arrive.
This race-condition on the A2P channel can be detected by looking at the channel status bits: a genuine reply from the platform will have set the channel free bit before triggering the completion IRQ.
Add a consistency check to validate such condition in the A2P ISR.
Reported-by: Xinglong Yang xinglong.yang@cixtech.com Closes: https://lore.kernel.org/all/PUZPR06MB54981E6FA00D82BFDBB864FBF08DA@PUZPR06MB... Fixes: 5c8a47a5a91d ("firmware: arm_scmi: Make scmi core independent of the transport type") Cc: stable@vger.kernel.org # 5.15+ Signed-off-by: Cristian Marussi cristian.marussi@arm.com Tested-by: Xinglong Yang xinglong.yang@cixtech.com Link: https://lore.kernel.org/r/20231220172112.763539-1-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/arm_scmi/common.h | 1 + drivers/firmware/arm_scmi/mailbox.c | 14 ++++++++++++++ drivers/firmware/arm_scmi/shmem.c | 6 ++++++ 3 files changed, 21 insertions(+)
--- a/drivers/firmware/arm_scmi/common.h +++ b/drivers/firmware/arm_scmi/common.h @@ -314,6 +314,7 @@ void shmem_fetch_notification(struct scm void shmem_clear_channel(struct scmi_shared_mem __iomem *shmem); bool shmem_poll_done(struct scmi_shared_mem __iomem *shmem, struct scmi_xfer *xfer); +bool shmem_channel_free(struct scmi_shared_mem __iomem *shmem);
/* declarations for message passing transports */ struct scmi_msg_payld; --- a/drivers/firmware/arm_scmi/mailbox.c +++ b/drivers/firmware/arm_scmi/mailbox.c @@ -45,6 +45,20 @@ static void rx_callback(struct mbox_clie { struct scmi_mailbox *smbox = client_to_scmi_mailbox(cl);
+ /* + * An A2P IRQ is NOT valid when received while the platform still has + * the ownership of the channel, because the platform at first releases + * the SMT channel and then sends the completion interrupt. + * + * This addresses a possible race condition in which a spurious IRQ from + * a previous timed-out reply which arrived late could be wrongly + * associated with the next pending transaction. + */ + if (cl->knows_txdone && !shmem_channel_free(smbox->shmem)) { + dev_warn(smbox->cinfo->dev, "Ignoring spurious A2P IRQ !\n"); + return; + } + scmi_rx_callback(smbox->cinfo, shmem_read_header(smbox->shmem), NULL); }
--- a/drivers/firmware/arm_scmi/shmem.c +++ b/drivers/firmware/arm_scmi/shmem.c @@ -122,3 +122,9 @@ bool shmem_poll_done(struct scmi_shared_ (SCMI_SHMEM_CHAN_STAT_CHANNEL_ERROR | SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE); } + +bool shmem_channel_free(struct scmi_shared_mem __iomem *shmem) +{ + return (ioread32(&shmem->channel_status) & + SCMI_SHMEM_CHAN_STAT_CHANNEL_FREE); +}
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Zimmermann tzimmermann@suse.de
commit d1b163aa0749706379055e40a52cf7a851abf9dc upstream.
This reverts commit 60aebc9559492cea6a9625f514a8041717e3a2e4.
Commit 60aebc9559492cea ("drivers/firmware: Move sysfb_init() from device_initcall to subsys_initcall_sync") messes up initialization order of the graphics drivers and leads to blank displays on some systems. So revert the commit.
To make the display drivers fully independent from initialization order requires to track framebuffer memory by device and independently from the loaded drivers. The kernel currently lacks the infrastructure to do so.
Reported-by: Jaak Ristioja jaak@ristioja.ee Closes: https://lore.kernel.org/dri-devel/ZUnNi3q3yB3zZfTl@P70.localdomain/T/#t Reported-by: Huacai Chen chenhuacai@loongson.cn Closes: https://lore.kernel.org/dri-devel/20231108024613.2898921-1-chenhuacai@loongs... Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10133 Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Cc: Javier Martinez Canillas javierm@redhat.com Cc: Thorsten Leemhuis regressions@leemhuis.info Cc: Jani Nikula jani.nikula@linux.intel.com Cc: stable@vger.kernel.org # v6.5+ Reviewed-by: Javier Martinez Canillas javierm@redhat.com Acked-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240123120937.27736-1-tzimmer... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/sysfb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/firmware/sysfb.c +++ b/drivers/firmware/sysfb.c @@ -128,4 +128,4 @@ unlock_mutex: }
/* must execute after PCI subsystem for EFI quirks */ -subsys_initcall_sync(sysfb_init); +device_initcall(sysfb_init);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Jun Jun.Ma2@amd.com
commit bc03c02cc1991a066b23e69bbcc0f66e8f1f7453 upstream.
If the RLC firmware is invalid because of wrong header size, the pointer to the rlc firmware is released in function amdgpu_ucode_request. There will be a null pointer error in subsequent use. So skip validation to fix it.
Fixes: 3da9b71563cb ("drm/amd: Use `amdgpu_ucode_*` helpers for GFX10") Signed-off-by: Ma Jun Jun.Ma2@amd.com Acked-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -3989,16 +3989,13 @@ static int gfx_v10_0_init_microcode(stru
if (!amdgpu_sriov_vf(adev)) { snprintf(fw_name, sizeof(fw_name), "amdgpu/%s_rlc.bin", ucode_prefix); - err = amdgpu_ucode_request(adev, &adev->gfx.rlc_fw, fw_name); - /* don't check this. There are apparently firmwares in the wild with - * incorrect size in the header - */ - if (err == -ENODEV) - goto out; + err = request_firmware(&adev->gfx.rlc_fw, fw_name, adev->dev); if (err) - dev_dbg(adev->dev, - "gfx10: amdgpu_ucode_request() failed "%s"\n", - fw_name); + goto out; + + /* don't validate this firmware. There are apparently firmwares + * in the wild with incorrect size in the header + */ rlc_hdr = (const struct rlc_firmware_header_v2_0 *)adev->gfx.rlc_fw->data; version_major = le16_to_cpu(rlc_hdr->header.header_version_major); version_minor = le16_to_cpu(rlc_hdr->header.header_version_minor);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dave Chinner dchinner@redhat.com
commit d8d222e09dab84a17bb65dda4b94d01c565f5327 upstream.
Recently xfs/513 started failing on my test machines testing "-o ro,norecovery" mount options. This was being emitted in dmesg:
[ 9906.932724] XFS (pmem0): no-recovery mounts must be read-only.
Turns out, readonly mounts with the fsopen()/fsconfig() mount API have been busted since day zero. It's only taken 5 years for debian unstable to start using this "new" mount API, and shortly after this I noticed xfs/513 had started to fail as per above.
The syscall trace is:
fsopen("xfs", FSOPEN_CLOEXEC) = 3 mount_setattr(-1, NULL, 0, NULL, 0) = -1 EINVAL (Invalid argument) ..... fsconfig(3, FSCONFIG_SET_STRING, "source", "/dev/pmem0", 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "ro", NULL, 0) = 0 fsconfig(3, FSCONFIG_SET_FLAG, "norecovery", NULL, 0) = 0 fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = -1 EINVAL (Invalid argument) close(3) = 0
Showing that the actual mount instantiation (FSCONFIG_CMD_CREATE) is what threw out the error.
During mount instantiation, we call xfs_fs_validate_params() which does:
/* No recovery flag requires a read-only mount */ if (xfs_has_norecovery(mp) && !xfs_is_readonly(mp)) { xfs_warn(mp, "no-recovery mounts must be read-only."); return -EINVAL; }
and xfs_is_readonly() checks internal mount flags for read only state. This state is set in xfs_init_fs_context() from the context superblock flag state:
/* * Copy binary VFS mount flags we are interested in. */ if (fc->sb_flags & SB_RDONLY) set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate);
With the old mount API, all of the VFS specific superblock flags had already been parsed and set before xfs_init_fs_context() is called, so this all works fine.
However, in the brave new fsopen/fsconfig world, xfs_init_fs_context() is called from fsopen() context, before any VFS superblock have been set or parsed. Hence if we use fsopen(), the internal XFS readonly state is *never set*. Hence anything that depends on xfs_is_readonly() actually returning true for read only mounts is broken if fsopen() has been used to mount the filesystem.
Fix this by moving this internal state initialisation to xfs_fs_fill_super() before we attempt to validate the parameters that have been set prior to the FSCONFIG_CMD_CREATE call being made.
Signed-off-by: Dave Chinner dchinner@redhat.com Fixes: 73e5fff98b64 ("xfs: switch to use the new mount-api") cc: stable@vger.kernel.org Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Chandan Babu R chandanbabu@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/xfs/xfs_super.c | 27 +++++++++++++++++---------- 1 file changed, 17 insertions(+), 10 deletions(-)
--- a/fs/xfs/xfs_super.c +++ b/fs/xfs/xfs_super.c @@ -1503,6 +1503,18 @@ xfs_fs_fill_super(
mp->m_super = sb;
+ /* + * Copy VFS mount flags from the context now that all parameter parsing + * is guaranteed to have been completed by either the old mount API or + * the newer fsopen/fsconfig API. + */ + if (fc->sb_flags & SB_RDONLY) + set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate); + if (fc->sb_flags & SB_DIRSYNC) + mp->m_features |= XFS_FEAT_DIRSYNC; + if (fc->sb_flags & SB_SYNCHRONOUS) + mp->m_features |= XFS_FEAT_WSYNC; + error = xfs_fs_validate_params(mp); if (error) return error; @@ -1972,6 +1984,11 @@ static const struct fs_context_operation .free = xfs_fs_free, };
+/* + * WARNING: do not initialise any parameters in this function that depend on + * mount option parsing having already been performed as this can be called from + * fsopen() before any parameters have been set. + */ static int xfs_init_fs_context( struct fs_context *fc) { @@ -2003,16 +2020,6 @@ static int xfs_init_fs_context( mp->m_logbsize = -1; mp->m_allocsize_log = 16; /* 64k */
- /* - * Copy binary VFS mount flags we are interested in. - */ - if (fc->sb_flags & SB_RDONLY) - set_bit(XFS_OPSTATE_READONLY, &mp->m_opstate); - if (fc->sb_flags & SB_DIRSYNC) - mp->m_features |= XFS_FEAT_DIRSYNC; - if (fc->sb_flags & SB_SYNCHRONOUS) - mp->m_features |= XFS_FEAT_WSYNC; - fc->s_fs_info = mp; fc->ops = &xfs_context_ops;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 805c74eac8cb306dc69b87b6b066ab4da77ceaf1 upstream.
Spurious wakeups are reported on the GPD G1619-04 which can be absolved by programming the GPIO to ignore wakeups.
Cc: stable@vger.kernel.org Reported-and-tested-by: George Melikov mail@gmelikov.ru Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3073 Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpio/gpiolib-acpi.c | 14 ++++++++++++++ 1 file changed, 14 insertions(+)
--- a/drivers/gpio/gpiolib-acpi.c +++ b/drivers/gpio/gpiolib-acpi.c @@ -1675,6 +1675,20 @@ static const struct dmi_system_id gpioli .ignore_interrupt = "INT33FC:00@3", }, }, + { + /* + * Spurious wakeups from TP_ATTN# pin + * Found in BIOS 0.35 + * https://gitlab.freedesktop.org/drm/amd/-/issues/3073 + */ + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "GPD"), + DMI_MATCH(DMI_PRODUCT_NAME, "G1619-04"), + }, + .driver_data = &(struct acpi_gpiolib_dmi_quirk) { + .ignore_wake = "PNP0C50:00@8", + }, + }, {} /* Terminating entry */ };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 192cdb1c907fd8df2d764c5bb17496e415e59391 upstream.
On systems using HWP, if a given frequency is equal to the maximum turbo frequency or the maximum non-turbo frequency, the HWP performance level corresponding to it is already known and can be used directly without any computation.
Accordingly, adjust the code to use the known HWP performance levels in the cases mentioned above.
This also helps to avoid limiting CPU capacity artificially in some cases when the BIOS produces the HWP_CAP numbers using a different E-core-to-P-core performance scaling factor than expected by the kernel.
Fixes: f5c8cf2a4992 ("cpufreq: intel_pstate: hybrid: Use known scaling factor for P-cores") Cc: 6.1+ stable@vger.kernel.org # 6.1+ Tested-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpufreq/intel_pstate.c | 55 +++++++++++++++++++++++++---------------- 1 file changed, 34 insertions(+), 21 deletions(-)
--- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -526,6 +526,30 @@ static int intel_pstate_cppc_get_scaling } #endif /* CONFIG_ACPI_CPPC_LIB */
+static int intel_pstate_freq_to_hwp_rel(struct cpudata *cpu, int freq, + unsigned int relation) +{ + if (freq == cpu->pstate.turbo_freq) + return cpu->pstate.turbo_pstate; + + if (freq == cpu->pstate.max_freq) + return cpu->pstate.max_pstate; + + switch (relation) { + case CPUFREQ_RELATION_H: + return freq / cpu->pstate.scaling; + case CPUFREQ_RELATION_C: + return DIV_ROUND_CLOSEST(freq, cpu->pstate.scaling); + } + + return DIV_ROUND_UP(freq, cpu->pstate.scaling); +} + +static int intel_pstate_freq_to_hwp(struct cpudata *cpu, int freq) +{ + return intel_pstate_freq_to_hwp_rel(cpu, freq, CPUFREQ_RELATION_L); +} + /** * intel_pstate_hybrid_hwp_adjust - Calibrate HWP performance levels. * @cpu: Target CPU. @@ -543,6 +567,7 @@ static void intel_pstate_hybrid_hwp_adju int perf_ctl_scaling = cpu->pstate.perf_ctl_scaling; int perf_ctl_turbo = pstate_funcs.get_turbo(cpu->cpu); int scaling = cpu->pstate.scaling; + int freq;
pr_debug("CPU%d: perf_ctl_max_phys = %d\n", cpu->cpu, perf_ctl_max_phys); pr_debug("CPU%d: perf_ctl_turbo = %d\n", cpu->cpu, perf_ctl_turbo); @@ -556,16 +581,16 @@ static void intel_pstate_hybrid_hwp_adju cpu->pstate.max_freq = rounddown(cpu->pstate.max_pstate * scaling, perf_ctl_scaling);
- cpu->pstate.max_pstate_physical = - DIV_ROUND_UP(perf_ctl_max_phys * perf_ctl_scaling, - scaling); + freq = perf_ctl_max_phys * perf_ctl_scaling; + cpu->pstate.max_pstate_physical = intel_pstate_freq_to_hwp(cpu, freq);
- cpu->pstate.min_freq = cpu->pstate.min_pstate * perf_ctl_scaling; + freq = cpu->pstate.min_pstate * perf_ctl_scaling; + cpu->pstate.min_freq = freq; /* * Cast the min P-state value retrieved via pstate_funcs.get_min() to * the effective range of HWP performance levels. */ - cpu->pstate.min_pstate = DIV_ROUND_UP(cpu->pstate.min_freq, scaling); + cpu->pstate.min_pstate = intel_pstate_freq_to_hwp(cpu, freq); }
static inline void update_turbo_state(void) @@ -2528,13 +2553,12 @@ static void intel_pstate_update_perf_lim * abstract values to represent performance rather than pure ratios. */ if (hwp_active && cpu->pstate.scaling != perf_ctl_scaling) { - int scaling = cpu->pstate.scaling; int freq;
freq = max_policy_perf * perf_ctl_scaling; - max_policy_perf = DIV_ROUND_UP(freq, scaling); + max_policy_perf = intel_pstate_freq_to_hwp(cpu, freq); freq = min_policy_perf * perf_ctl_scaling; - min_policy_perf = DIV_ROUND_UP(freq, scaling); + min_policy_perf = intel_pstate_freq_to_hwp(cpu, freq); }
pr_debug("cpu:%d min_policy_perf:%d max_policy_perf:%d\n", @@ -2908,18 +2932,7 @@ static int intel_cpufreq_target(struct c
cpufreq_freq_transition_begin(policy, &freqs);
- switch (relation) { - case CPUFREQ_RELATION_L: - target_pstate = DIV_ROUND_UP(freqs.new, cpu->pstate.scaling); - break; - case CPUFREQ_RELATION_H: - target_pstate = freqs.new / cpu->pstate.scaling; - break; - default: - target_pstate = DIV_ROUND_CLOSEST(freqs.new, cpu->pstate.scaling); - break; - } - + target_pstate = intel_pstate_freq_to_hwp_rel(cpu, freqs.new, relation); target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, false);
freqs.new = target_pstate * cpu->pstate.scaling; @@ -2937,7 +2950,7 @@ static unsigned int intel_cpufreq_fast_s
update_turbo_state();
- target_pstate = DIV_ROUND_UP(target_freq, cpu->pstate.scaling); + target_pstate = intel_pstate_freq_to_hwp(cpu, target_freq);
target_pstate = intel_cpufreq_update_pstate(policy, target_pstate, true);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit 6992eb815d087858f8d7e4020529c2fe800456b3 upstream.
This reverts commit 88b065943cb583e890324d618e8d4b23460d51a3.
Lenovo 82TQ is unhappy if we do the display on sequence this late. The display output shows severe corruption.
It's unclear if this is a failure on our part (perhaps something to do with sending commands in LP mode after HS /video mode transmission has been started? Though the backlight on command at least seems to work) or simply that there are some commands in the sequence that are needed to be done earlier (eg. could be some DSC init stuff?). If the latter then I don't think the current Windows code would work either, but maybe this was originally tested with an older driver, who knows.
Root causing this fully would likely require a lot of experimentation which isn't really feasible without direct access to the machine, so let's just accept failure and go back to the original sequence.
Cc: stable@vger.kernel.org Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10071 Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240116210821.30194-1-ville.s... Acked-by: Jani Nikula jani.nikula@intel.com (cherry picked from commit dc524d05974f615b145404191fcf91b478950499) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/icl_dsi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/display/icl_dsi.c +++ b/drivers/gpu/drm/i915/display/icl_dsi.c @@ -1155,6 +1155,7 @@ static void gen11_dsi_powerup_panel(stru }
intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_INIT_OTP); + intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_DISPLAY_ON);
/* ensure all panel commands dispatched before enabling transcoder */ wait_for_cmds_dispatched_to_panel(encoder); @@ -1255,8 +1256,6 @@ static void gen11_dsi_enable(struct inte /* step6d: enable dsi transcoder */ gen11_dsi_enable_transcoder(encoder);
- intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_DISPLAY_ON); - /* step7: enable backlight */ intel_backlight_enable(crtc_state, conn_state); intel_dsi_vbt_exec_sequence(intel_dsi, MIPI_SEQ_BACKLIGHT_ON);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
commit cb4daf271302d71a6b9a7c01bd0b6d76febd8f0c upstream.
If we get a deadlock after the fb lookup in drm_mode_page_flip_ioctl() we proceed to unref the fb and then retry the whole thing from the top. But we forget to reset the fb pointer back to NULL, and so if we then get another error during the retry, before the fb lookup, we proceed the unref the same fb again without having gotten another reference. The end result is that the fb will (eventually) end up being freed while it's still in use.
Reset fb to NULL once we've unreffed it to avoid doing it again until we've done another fb lookup.
This turned out to be pretty easy to hit on a DG2 when doing async flips (and CONFIG_DEBUG_WW_MUTEX_SLOWPATH=y). The first symptom I saw that drm_closefb() simply got stuck in a busy loop while walking the framebuffer list. Fortunately I was able to convince it to oops instead, and from there it was easier to track down the culprit.
Cc: stable@vger.kernel.org Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231211081625.25704-1-ville.s... Acked-by: Javier Martinez Canillas javierm@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_plane.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/gpu/drm/drm_plane.c +++ b/drivers/gpu/drm/drm_plane.c @@ -1387,6 +1387,7 @@ retry: out: if (fb) drm_framebuffer_put(fb); + fb = NULL; if (plane->old_fb) drm_framebuffer_put(plane->old_fb); plane->old_fb = NULL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit 914437992876838662c968cb416f832110fb1093 upstream.
The i2c_master_send/recv() functions return negative error codes or the number of bytes that were able to be sent/received. This code has two problems. 1) Instead of checking if all the bytes were sent or received, it checks that at least one byte was sent or received. 2) If there was a partial send/receive then we should return a negative error code but this code returns success.
Fixes: a9fe713d7d45 ("drm/bridge: Add PTN3460 bridge driver") Cc: stable@vger.kernel.org Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Robert Foss rfoss@kernel.org Signed-off-by: Robert Foss rfoss@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/0cdc2dce-ca89-451a-9774-1482ab... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/nxp-ptn3460.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/bridge/nxp-ptn3460.c +++ b/drivers/gpu/drm/bridge/nxp-ptn3460.c @@ -56,13 +56,13 @@ static int ptn3460_read_bytes(struct ptn ret = i2c_master_send(ptn_bridge->client, &addr, 1); if (ret <= 0) { DRM_ERROR("Failed to send i2c command, ret=%d\n", ret); - return ret; + return ret ?: -EIO; }
ret = i2c_master_recv(ptn_bridge->client, buf, len); - if (ret <= 0) { + if (ret != len) { DRM_ERROR("Failed to recv i2c data, ret=%d\n", ret); - return ret; + return ret < 0 ? ret : -EIO; }
return 0; @@ -78,9 +78,9 @@ static int ptn3460_write_byte(struct ptn buf[1] = val;
ret = i2c_master_send(ptn_bridge->client, buf, ARRAY_SIZE(buf)); - if (ret <= 0) { + if (ret != ARRAY_SIZE(buf)) { DRM_ERROR("Failed to send i2c command, ret=%d\n", ret); - return ret; + return ret < 0 ? ret : -EIO; }
return 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Zimmermann tzimmermann@suse.de
commit 9cf5ca1f485cae406968947a92bf304603999fa1 upstream.
Non-KMS drivers have been removed from DRM. Update the TODO list accordingly.
Signed-off-by: Thomas Zimmermann tzimmermann@suse.de Fixes: a276afc19eec ("drm: Remove some obsolete drm pciids(tdfx, mga, i810, savage, r128, sis, via)") Cc: Cai Huoqing cai.huoqing@linux.dev Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Thomas Zimmermann tzimmermann@suse.de Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: David Airlie airlied@gmail.com Cc: Daniel Vetter daniel@ffwll.ch Cc: Jonathan Corbet corbet@lwn.net Cc: dri-devel@lists.freedesktop.org Cc: stable@vger.kernel.org # v6.3+ Cc: linux-doc@vger.kernel.org Reviewed-by: David Airlie airlied@gmail.com Reviewed-by: Daniel Vetter daniel@ffwll.ch Acked-by: Alex Deucher alexander.deucher@amd.com Link: https://patchwork.freedesktop.org/patch/msgid/20231122122449.11588-3-tzimmer... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/gpu/todo.rst | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/Documentation/gpu/todo.rst b/Documentation/gpu/todo.rst index 503d57c75215..41a264bf84ce 100644 --- a/Documentation/gpu/todo.rst +++ b/Documentation/gpu/todo.rst @@ -337,8 +337,8 @@ connector register/unregister fixes
Level: Intermediate
-Remove load/unload callbacks from all non-DRIVER_LEGACY drivers ---------------------------------------------------------------- +Remove load/unload callbacks +----------------------------
The load/unload callbacks in struct &drm_driver are very much midlayers, plus for historical reasons they get the ordering wrong (and we can't fix that) @@ -347,8 +347,7 @@ between setting up the &drm_driver structure and calling drm_dev_register(). - Rework drivers to no longer use the load/unload callbacks, directly coding the load/unload sequence into the driver's probe function.
-- Once all non-DRIVER_LEGACY drivers are converted, disallow the load/unload - callbacks for all modern drivers. +- Once all drivers are converted, remove the load/unload callbacks.
Contact: Daniel Vetter
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
commit 95d4b471953411854f9c80b568da7fcf753f3801 upstream.
tidss_crtc_atomic_flush() checks if the crtc is enabled, and if not, returns immediately as there's no reason to do any register changes.
However, the code checks for 'crtc->state->enable', which does not reflect the actual HW state. We should instead look at the 'crtc->state->active' flag.
This causes the tidss_crtc_atomic_flush() to proceed with the flush even if the active state is false, which then causes us to hit the WARN_ON(!crtc->state->event) check.
Fix this by checking the active flag, and while at it, fix the related debug print which had "active" and "needs modeset" wrong way.
Cc: stable@vger.kernel.org Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") Reviewed-by: Aradhya Bhatia a-bhatia1@ti.com Link: https://lore.kernel.org/r/20231109-tidss-probe-v2-10-ac91b5ea35c0@ideasonboa... Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/tidss/tidss_crtc.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/gpu/drm/tidss/tidss_crtc.c +++ b/drivers/gpu/drm/tidss/tidss_crtc.c @@ -169,13 +169,13 @@ static void tidss_crtc_atomic_flush(stru struct tidss_device *tidss = to_tidss(ddev); unsigned long flags;
- dev_dbg(ddev->dev, - "%s: %s enabled %d, needs modeset %d, event %p\n", __func__, - crtc->name, drm_atomic_crtc_needs_modeset(crtc->state), - crtc->state->enable, crtc->state->event); + dev_dbg(ddev->dev, "%s: %s is %sactive, %s modeset, event %p\n", + __func__, crtc->name, crtc->state->active ? "" : "not ", + drm_atomic_crtc_needs_modeset(crtc->state) ? "needs" : "doesn't need", + crtc->state->event);
/* There is nothing to do if CRTC is not going to be enabled. */ - if (!crtc->state->enable) + if (!crtc->state->active) return;
/*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Zack Rusin zackr@vmware.com
commit 4e3b70da64a53784683cfcbac2deda5d6e540407 upstream.
Cursor planes on virtualized drivers have special meaning and require that the clients handle them in specific ways, e.g. the cursor plane should react to the mouse movement the way a mouse cursor would be expected to and the client is required to set hotspot properties on it in order for the mouse events to be routed correctly.
This breaks the contract as specified by the "universal planes". Fix it by disabling the cursor planes on virtualized drivers while adding a foundation on top of which it's possible to special case mouse cursor planes for clients that want it.
Disabling the cursor planes makes some kms compositors which were broken, e.g. Weston, fallback to software cursor which works fine or at least better than currently while having no effect on others, e.g. gnome-shell or kwin, which put virtualized drivers on a deny-list when running in atomic context to make them fallback to legacy kms and avoid this issue.
Signed-off-by: Zack Rusin zackr@vmware.com Fixes: 681e7ec73044 ("drm: Allow userspace to ask for universal plane list (v2)") Cc: stable@vger.kernel.org # v5.4+ Cc: Maarten Lankhorst maarten.lankhorst@linux.intel.com Cc: Maxime Ripard mripard@kernel.org Cc: Thomas Zimmermann tzimmermann@suse.de Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Dave Airlie airlied@redhat.com Cc: Gerd Hoffmann kraxel@redhat.com Cc: Hans de Goede hdegoede@redhat.com Cc: Gurchetan Singh gurchetansingh@chromium.org Cc: Chia-I Wu olvaffe@gmail.com Cc: dri-devel@lists.freedesktop.org Cc: virtualization@lists.linux-foundation.org Cc: spice-devel@lists.freedesktop.org Acked-by: Pekka Paalanen pekka.paalanen@collabora.com Reviewed-by: Javier Martinez Canillas javierm@redhat.com Acked-by: Simon Ser contact@emersion.fr Signed-off-by: Javier Martinez Canillas javierm@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20231023074613.41327-2-aesteve... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_plane.c | 13 +++++++++++++ drivers/gpu/drm/qxl/qxl_drv.c | 2 +- drivers/gpu/drm/vboxvideo/vbox_drv.c | 2 +- drivers/gpu/drm/virtio/virtgpu_drv.c | 2 +- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- include/drm/drm_drv.h | 9 +++++++++ include/drm/drm_file.h | 12 ++++++++++++ 7 files changed, 38 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/drm_plane.c +++ b/drivers/gpu/drm/drm_plane.c @@ -678,6 +678,19 @@ int drm_mode_getplane_res(struct drm_dev !file_priv->universal_planes) continue;
+ /* + * If we're running on a virtualized driver then, + * unless userspace advertizes support for the + * virtualized cursor plane, disable cursor planes + * because they'll be broken due to missing cursor + * hotspot info. + */ + if (plane->type == DRM_PLANE_TYPE_CURSOR && + drm_core_check_feature(dev, DRIVER_CURSOR_HOTSPOT) && + file_priv->atomic && + !file_priv->supports_virtualized_cursor_plane) + continue; + if (drm_lease_held(file_priv, plane->base.id)) { if (count < plane_resp->count_planes && put_user(plane->base.id, plane_ptr + count)) --- a/drivers/gpu/drm/qxl/qxl_drv.c +++ b/drivers/gpu/drm/qxl/qxl_drv.c @@ -283,7 +283,7 @@ static const struct drm_ioctl_desc qxl_i };
static struct drm_driver qxl_driver = { - .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC, + .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
.dumb_create = qxl_mode_dumb_create, .dumb_map_offset = drm_gem_ttm_dumb_map_offset, --- a/drivers/gpu/drm/vboxvideo/vbox_drv.c +++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c @@ -182,7 +182,7 @@ DEFINE_DRM_GEM_FOPS(vbox_fops);
static const struct drm_driver driver = { .driver_features = - DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC, + DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC | DRIVER_CURSOR_HOTSPOT,
.fops = &vbox_fops, .name = DRIVER_NAME, --- a/drivers/gpu/drm/virtio/virtgpu_drv.c +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c @@ -177,7 +177,7 @@ static const struct drm_driver driver = * out via drm_device::driver_features: */ .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC | - DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE, + DRIVER_SYNCOBJ | DRIVER_SYNCOBJ_TIMELINE | DRIVER_CURSOR_HOTSPOT, .open = virtio_gpu_driver_open, .postclose = virtio_gpu_driver_postclose,
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1611,7 +1611,7 @@ static const struct file_operations vmwg
static const struct drm_driver driver = { .driver_features = - DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM, + DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM | DRIVER_CURSOR_HOTSPOT, .ioctls = vmw_ioctls, .num_ioctls = ARRAY_SIZE(vmw_ioctls), .master_set = vmw_master_set, --- a/include/drm/drm_drv.h +++ b/include/drm/drm_drv.h @@ -110,6 +110,15 @@ enum drm_driver_feature { * Driver supports user defined GPU VA bindings for GEM objects. */ DRIVER_GEM_GPUVA = BIT(8), + /** + * @DRIVER_CURSOR_HOTSPOT: + * + * Driver supports and requires cursor hotspot information in the + * cursor plane (e.g. cursor plane has to actually track the mouse + * cursor and the clients are required to set hotspot in order for + * the cursor planes to work correctly). + */ + DRIVER_CURSOR_HOTSPOT = BIT(9),
/* IMPORTANT: Below are all the legacy flags, add new ones above. */
--- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -229,6 +229,18 @@ struct drm_file { bool is_master;
/** + * @supports_virtualized_cursor_plane: + * + * This client is capable of handling the cursor plane with the + * restrictions imposed on it by the virtualized drivers. + * + * This implies that the cursor plane has to behave like a cursor + * i.e. track cursor movement. It also requires setting of the + * hotspot properties by the client on the cursor plane. + */ + bool supports_virtualized_cursor_plane; + + /** * @master: * * Master this node is currently associated with. Protected by struct
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Javier Martinez Canillas javierm@redhat.com
commit 0240db231dfe5ee5b7a3a03cba96f0844b7a673d upstream.
The driver does per-buffer uploads and needs to force a full plane update if the plane's attached framebuffer has change since the last page-flip.
Fixes: 01f05940a9a7 ("drm/virtio: Enable fb damage clips property for the primary plane") Cc: stable@vger.kernel.org # v6.4+ Reported-by: nerdopolis bluescreen_avenger@verizon.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218115 Suggested-by: Sima Vetter daniel.vetter@ffwll.ch Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Zack Rusin zackr@vmware.com Acked-by: Sima Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20231123221315.3579454-3-javie... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/virtio/virtgpu_plane.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
diff --git a/drivers/gpu/drm/virtio/virtgpu_plane.c b/drivers/gpu/drm/virtio/virtgpu_plane.c index 20de599658c1..a72a2dbda031 100644 --- a/drivers/gpu/drm/virtio/virtgpu_plane.c +++ b/drivers/gpu/drm/virtio/virtgpu_plane.c @@ -79,6 +79,8 @@ static int virtio_gpu_plane_atomic_check(struct drm_plane *plane, { struct drm_plane_state *new_plane_state = drm_atomic_get_new_plane_state(state, plane); + struct drm_plane_state *old_plane_state = drm_atomic_get_old_plane_state(state, + plane); bool is_cursor = plane->type == DRM_PLANE_TYPE_CURSOR; struct drm_crtc_state *crtc_state; int ret; @@ -86,6 +88,14 @@ static int virtio_gpu_plane_atomic_check(struct drm_plane *plane, if (!new_plane_state->fb || WARN_ON(!new_plane_state->crtc)) return 0;
+ /* + * Ignore damage clips if the framebuffer attached to the plane's state + * has changed since the last plane update (page-flip). In this case, a + * full plane update should happen because uploads are done per-buffer. + */ + if (old_plane_state->fb != new_plane_state->fb) + new_plane_state->ignore_damage_clips = true; + crtc_state = drm_atomic_get_crtc_state(state, new_plane_state->crtc); if (IS_ERR(crtc_state))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Javier Martinez Canillas javierm@redhat.com
commit 35ed38d58257336c1df26b14fd5110b026e2adde upstream.
It allows drivers to set a struct drm_plane_state .ignore_damage_clips in their plane's .atomic_check callback, as an indication to damage helpers such as drm_atomic_helper_damage_iter_init() that the damage clips should be ignored.
To be used by drivers that do per-buffer (e.g: virtio-gpu) uploads (rather than per-plane uploads), since these type of drivers need to handle buffer damages instead of frame damages.
That way, these drivers could force a full plane update if the framebuffer attached to a plane's state has changed since the last update (page-flip).
Fixes: 01f05940a9a7 ("drm/virtio: Enable fb damage clips property for the primary plane") Cc: stable@vger.kernel.org # v6.4+ Reported-by: nerdopolis bluescreen_avenger@verizon.net Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218115 Suggested-by: Thomas Zimmermann tzimmermann@suse.de Signed-off-by: Javier Martinez Canillas javierm@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Reviewed-by: Zack Rusin zackr@vmware.com Acked-by: Sima Vetter daniel.vetter@ffwll.ch Link: https://patchwork.freedesktop.org/patch/msgid/20231123221315.3579454-2-javie... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- Documentation/gpu/drm-kms.rst | 2 ++ drivers/gpu/drm/drm_damage_helper.c | 3 ++- include/drm/drm_plane.h | 10 ++++++++++ 3 files changed, 14 insertions(+), 1 deletion(-)
--- a/Documentation/gpu/drm-kms.rst +++ b/Documentation/gpu/drm-kms.rst @@ -546,6 +546,8 @@ Plane Composition Properties .. kernel-doc:: drivers/gpu/drm/drm_blend.c :doc: overview
+.. _damage_tracking_properties: + Damage Tracking Properties --------------------------
--- a/drivers/gpu/drm/drm_damage_helper.c +++ b/drivers/gpu/drm/drm_damage_helper.c @@ -241,7 +241,8 @@ drm_atomic_helper_damage_iter_init(struc iter->plane_src.x2 = (src.x2 >> 16) + !!(src.x2 & 0xFFFF); iter->plane_src.y2 = (src.y2 >> 16) + !!(src.y2 & 0xFFFF);
- if (!iter->clips || !drm_rect_equals(&state->src, &old_state->src)) { + if (!iter->clips || state->ignore_damage_clips || + !drm_rect_equals(&state->src, &old_state->src)) { iter->clips = NULL; iter->num_clips = 0; iter->full_update = true; --- a/include/drm/drm_plane.h +++ b/include/drm/drm_plane.h @@ -191,6 +191,16 @@ struct drm_plane_state { struct drm_property_blob *fb_damage_clips;
/** + * @ignore_damage_clips: + * + * Set by drivers to indicate the drm_atomic_helper_damage_iter_init() + * helper that the @fb_damage_clips blob property should be ignored. + * + * See :ref:`damage_tracking_properties` for more information. + */ + bool ignore_damage_clips; + + /** * @src: * * source coordinates of the plane (in 16.16).
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Melissa Wen mwen@igalia.com
commit 3a0fa3bc245ef92838a8296e0055569b8dff94c4 upstream.
IGT `amdgpu/amd_color/crtc-lut-accuracy` fails right at the beginning of the test execution, during atomic check, because DC rejects the bandwidth state for a fb sizing 64x64. The test was previously working with the deprecated dc_commit_state(). Now using dc_validate_with_context() approach, the atomic check needs to perform a full state validation. Therefore, set fast_validation to false in the dc_validate_global_state call for atomic check.
Cc: stable@vger.kernel.org Fixes: b8272241ff9d ("drm/amd/display: Drop dc_commit_state in favor of dc_commit_streams") Signed-off-by: Melissa Wen mwen@igalia.com Signed-off-by: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -10400,7 +10400,7 @@ static int amdgpu_dm_atomic_check(struct DRM_DEBUG_DRIVER("drm_dp_mst_atomic_check() failed\n"); goto fail; } - status = dc_validate_global_state(dc, dm_state->context, true); + status = dc_validate_global_state(dc, dm_state->context, false); if (status != DC_OK) { DRM_DEBUG_DRIVER("DC global validation failure: %s (%d)", dc_status_to_str(status), status);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
commit 571c2fa26aa654946447c282a09d40a56c7ff128 upstream.
When screen brightness is rapidly changed and PSR-SU is enabled the display hangs on panels with this TCON even on the latest DCN 3.1.4 microcode (0x8002a81 at this time).
This was disabled previously as commit 072030b17830 ("drm/amd: Disable PSR-SU on Parade 0803 TCON") but reverted as commit 1e66a17ce546 ("Revert "drm/amd: Disable PSR-SU on Parade 0803 TCON"") in favor of testing for a new enough microcode (commit cd2e31a9ab93 ("drm/amd/display: Set minimum requirement for using PSR-SU on Phoenix")).
As hangs are still happening specifically with this TCON, disable PSR-SU again for it until it can be root caused.
Cc: stable@vger.kernel.org Cc: aaron.ma@canonical.com Cc: binli@gnome.org Cc: Marc Rossi Marc.Rossi@amd.com Cc: Hamza Mahfooz Hamza.Mahfooz@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2046131 Acked-by: Alex Deucher alexander.deucher@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/modules/power/power_helpers.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/gpu/drm/amd/display/modules/power/power_helpers.c +++ b/drivers/gpu/drm/amd/display/modules/power/power_helpers.c @@ -841,6 +841,8 @@ bool is_psr_su_specific_panel(struct dc_ isPSRSUSupported = false; else if (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x03) isPSRSUSupported = false; + else if (dpcd_caps->sink_dev_id_str[1] == 0x08 && dpcd_caps->sink_dev_id_str[0] == 0x03) + isPSRSUSupported = false; else if (dpcd_caps->psr_info.force_psrsu_cap == 0x1) isPSRSUSupported = true; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ivan Lipski ivlipski@amd.com
commit c2ab9ce0ee7225fc05f58a6671c43b8a3684f530 upstream.
This commit causes dmesg-warn on several IGT tests on DCN 3.1.6: *ERROR* link_enc_cfg_validate: Invalid link encoder assignments - 0x1c
Affected IGT tests include: - amdgpu/[amd_assr|amd_plane|amd_hotplug] - kms_atomic - kms_color - kms_flip - kms_properties - kms_universal_plane
and some other tests
This reverts commit 3a0fa3bc245ef92838a8296e0055569b8dff94c4.
Cc: Melissa Wen mwen@igalia.com Cc: Hamza Mahfooz hamza.mahfooz@amd.com Reviewed-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Ivan Lipski ivlipski@amd.com Signed-off-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c @@ -10400,7 +10400,7 @@ static int amdgpu_dm_atomic_check(struct DRM_DEBUG_DRIVER("drm_dp_mst_atomic_check() failed\n"); goto fail; } - status = dc_validate_global_state(dc, dm_state->context, false); + status = dc_validate_global_state(dc, dm_state->context, true); if (status != DC_OK) { DRM_DEBUG_DRIVER("DC global validation failure: %s (%d)", dc_status_to_str(status), status);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dan Carpenter dan.carpenter@linaro.org
commit 28d3d0696688154cc04983f343011d07bf0508e4 upstream.
The i2c_master_send/recv() functions return negative error codes or they return "len" on success. So the error handling here can be written as just normal checks for "if (ret < 0) return ret;". No need to complicate things.
Btw, in this code the "len" parameter can never be zero, but even if it were, then I feel like this would still be the best way to write it.
Fixes: 914437992876 ("drm/bridge: nxp-ptn3460: fix i2c_master_send() error checking") Suggested-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Dan Carpenter dan.carpenter@linaro.org Reviewed-by: Robert Foss rfoss@kernel.org Signed-off-by: Robert Foss rfoss@kernel.org Link: https://patchwork.freedesktop.org/patch/msgid/04242630-42d8-4920-8c67-24ac9d... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/bridge/nxp-ptn3460.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
--- a/drivers/gpu/drm/bridge/nxp-ptn3460.c +++ b/drivers/gpu/drm/bridge/nxp-ptn3460.c @@ -54,15 +54,15 @@ static int ptn3460_read_bytes(struct ptn int ret;
ret = i2c_master_send(ptn_bridge->client, &addr, 1); - if (ret <= 0) { + if (ret < 0) { DRM_ERROR("Failed to send i2c command, ret=%d\n", ret); - return ret ?: -EIO; + return ret; }
ret = i2c_master_recv(ptn_bridge->client, buf, len); - if (ret != len) { + if (ret < 0) { DRM_ERROR("Failed to recv i2c data, ret=%d\n", ret); - return ret < 0 ? ret : -EIO; + return ret; }
return 0; @@ -78,9 +78,9 @@ static int ptn3460_write_byte(struct ptn buf[1] = val;
ret = i2c_master_send(ptn_bridge->client, buf, ARRAY_SIZE(buf)); - if (ret != ARRAY_SIZE(buf)) { + if (ret < 0) { DRM_ERROR("Failed to send i2c command, ret=%d\n", ret); - return ret < 0 ? ret : -EIO; + return ret; }
return 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Likun Gao Likun.Gao@amd.com
commit f4a94dbb6dc0bed10a5fc63718d00f1de45b12c0 upstream.
Correct the algorithm of active CU to skip disabled sa for gfx v11.
Signed-off-by: Likun Gao Likun.Gao@amd.com Reviewed-by: Hawking Zhang Hawking.Zhang@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -6353,6 +6353,9 @@ static int gfx_v11_0_get_cu_info(struct mutex_lock(&adev->grbm_idx_mutex); for (i = 0; i < adev->gfx.config.max_shader_engines; i++) { for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) { + bitmap = i * adev->gfx.config.max_sh_per_se + j; + if (!((gfx_v11_0_get_sa_active_bitmap(adev) >> bitmap) & 1)) + continue; mask = 1; counter = 0; gfx_v11_0_select_se_sh(adev, i, j, 0xffffffff, 0);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
commit 7073934f5d73f8b53308963cee36f0d389ea857c upstream.
In edp_setup_replay(), 'struct dc *dc' & 'struct dmub_replay *replay' was dereferenced before the pointer 'link' & 'replay' NULL check.
Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_edp_panel_control.c:947 edp_setup_replay() warn: variable dereferenced before check 'link' (see line 933)
Cc: stable@vger.kernel.org Cc: Bhawanpreet Lakha Bhawanpreet.Lakha@amd.com Cc: Harry Wentland harry.wentland@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Cc: Alex Deucher alexander.deucher@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c | 11 ++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c @@ -920,8 +920,8 @@ bool edp_get_replay_state(const struct d bool edp_setup_replay(struct dc_link *link, const struct dc_stream_state *stream) { /* To-do: Setup Replay */ - struct dc *dc = link->ctx->dc; - struct dmub_replay *replay = dc->res_pool->replay; + struct dc *dc; + struct dmub_replay *replay; int i; unsigned int panel_inst; struct replay_context replay_context = { 0 }; @@ -937,6 +937,10 @@ bool edp_setup_replay(struct dc_link *li if (!link) return false;
+ dc = link->ctx->dc; + + replay = dc->res_pool->replay; + if (!replay) return false;
@@ -965,8 +969,7 @@ bool edp_setup_replay(struct dc_link *li
replay_context.line_time_in_ns = lineTimeInNs;
- if (replay) - link->replay_settings.replay_feature_enabled = + link->replay_settings.replay_feature_enabled = replay->funcs->replay_copy_settings(replay, link, &replay_context, panel_inst); if (link->replay_settings.replay_feature_enabled) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Nicholas Kazlauskas nicholas.kazlauskas@amd.com
commit 4b56f7d47be87cde5f368b67bc7fac53a2c3e8d2 upstream.
[Why] We can experience DENTIST hangs during optimize_bandwidth or TDRs if FIFO is toggled and hangs.
[How] Port the DCN35 fixes to DCN314.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Reviewed-by: Charlene Liu charlene.liu@amd.com Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c | 21 ++++------ 1 file changed, 9 insertions(+), 12 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn314/dcn314_clk_mgr.c @@ -131,30 +131,27 @@ static int dcn314_get_active_display_cnt return display_count; }
-static void dcn314_disable_otg_wa(struct clk_mgr *clk_mgr_base, struct dc_state *context, bool disable) +static void dcn314_disable_otg_wa(struct clk_mgr *clk_mgr_base, struct dc_state *context, + bool safe_to_lower, bool disable) { struct dc *dc = clk_mgr_base->ctx->dc; int i;
for (i = 0; i < dc->res_pool->pipe_count; ++i) { - struct pipe_ctx *pipe = &dc->current_state->res_ctx.pipe_ctx[i]; + struct pipe_ctx *pipe = safe_to_lower + ? &context->res_ctx.pipe_ctx[i] + : &dc->current_state->res_ctx.pipe_ctx[i];
if (pipe->top_pipe || pipe->prev_odm_pipe) continue; if (pipe->stream && (pipe->stream->dpms_off || dc_is_virtual_signal(pipe->stream->signal))) { - struct stream_encoder *stream_enc = pipe->stream_res.stream_enc; - if (disable) { - if (stream_enc && stream_enc->funcs->disable_fifo) - pipe->stream_res.stream_enc->funcs->disable_fifo(stream_enc); + if (pipe->stream_res.tg && pipe->stream_res.tg->funcs->immediate_disable_crtc) + pipe->stream_res.tg->funcs->immediate_disable_crtc(pipe->stream_res.tg);
- pipe->stream_res.tg->funcs->immediate_disable_crtc(pipe->stream_res.tg); reset_sync_context_for_pipe(dc, context, i); } else { pipe->stream_res.tg->funcs->enable_crtc(pipe->stream_res.tg); - - if (stream_enc && stream_enc->funcs->enable_fifo) - pipe->stream_res.stream_enc->funcs->enable_fifo(stream_enc); } } } @@ -252,11 +249,11 @@ void dcn314_update_clocks(struct clk_mgr }
if (should_set_clock(safe_to_lower, new_clocks->dispclk_khz, clk_mgr_base->clks.dispclk_khz)) { - dcn314_disable_otg_wa(clk_mgr_base, context, true); + dcn314_disable_otg_wa(clk_mgr_base, context, safe_to_lower, true);
clk_mgr_base->clks.dispclk_khz = new_clocks->dispclk_khz; dcn314_smu_set_dispclk(clk_mgr, clk_mgr_base->clks.dispclk_khz); - dcn314_disable_otg_wa(clk_mgr_base, context, false); + dcn314_disable_otg_wa(clk_mgr_base, context, safe_to_lower, false);
update_dispclk = true; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wayne Lin Wayne.Lin@amd.com
commit bfe79f5fff1300d96203383582b078c7b0aec80a upstream.
[Why] For usb4 connector, AUX transaction is handled by dmub utilizing a differnt code path comparing to legacy DP connector. If the usb4 DP connector is disconnected, AUX access will report EBUSY and cause igt@kms_dp_aux_dev fail.
[How] Align the error code with the one reported by legacy DP as EIO.
Cc: Mario Limonciello mario.limonciello@amd.com Cc: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Acked-by: Alex Hung alex.hung@amd.com Signed-off-by: Wayne Lin Wayne.Lin@amd.com Tested-by: Daniel Wheeler daniel.wheeler@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_helpers.c @@ -956,6 +956,11 @@ int dm_helper_dmub_aux_transfer_sync( struct aux_payload *payload, enum aux_return_code_type *operation_result) { + if (!link->hpd_status) { + *operation_result = AUX_RET_ERROR_HPD_DISCON; + return -1; + } + return amdgpu_dm_process_dmub_aux_transfer_sync(ctx, link->link_index, payload, operation_result); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
commit 3bb9b1f958c3d986ed90a3ff009f1e77e9553207 upstream.
In link_set_dsc_pps_packet(), 'struct display_stream_compressor *dsc' was dereferenced in a DC_LOGGER_INIT(dsc->ctx->logger); before the 'dsc' NULL pointer check.
Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_dpms.c:905 link_set_dsc_pps_packet() warn: variable dereferenced before check 'dsc' (see line 903)
Cc: stable@vger.kernel.org Cc: Aurabindo Pillai aurabindo.pillai@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Hamza Mahfooz hamza.mahfooz@amd.com Cc: Wenjing Liu wenjing.liu@amd.com Cc: Qingqing Zhuo qingqing.zhuo@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Aurabindo Pillai aurabindo.pillai@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/link/link_dpms.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/link/link_dpms.c +++ b/drivers/gpu/drm/amd/display/dc/link/link_dpms.c @@ -873,11 +873,15 @@ bool link_set_dsc_pps_packet(struct pipe { struct display_stream_compressor *dsc = pipe_ctx->stream_res.dsc; struct dc_stream_state *stream = pipe_ctx->stream; - DC_LOGGER_INIT(dsc->ctx->logger);
- if (!pipe_ctx->stream->timing.flags.DSC || !dsc) + if (!pipe_ctx->stream->timing.flags.DSC) + return false; + + if (!dsc) return false;
+ DC_LOGGER_INIT(dsc->ctx->logger); + if (enable) { struct dsc_config dsc_cfg; uint8_t dsc_packed_pps[128];
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ma Jun Jun.Ma2@amd.com
commit ca1ffb174f16b699c536734fc12a4162097c49f4 upstream.
The power source flag should be updated when [1] System receives an interrupt indicating that the power source has changed. [2] System resumes from suspend or runtime suspend
Signed-off-by: Ma Jun Jun.Ma2@amd.com Reviewed-by: Lijo Lazar lijo.lazar@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 13 +++---------- drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c | 2 ++ drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c | 2 ++ 3 files changed, 7 insertions(+), 10 deletions(-)
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c @@ -24,6 +24,7 @@
#include <linux/firmware.h> #include <linux/pci.h> +#include <linux/power_supply.h> #include <linux/reboot.h>
#include "amdgpu.h" @@ -741,16 +742,8 @@ static int smu_late_init(void *handle) * handle the switch automatically. Driver involvement * is unnecessary. */ - if (!smu->dc_controlled_by_gpio) { - ret = smu_set_power_source(smu, - adev->pm.ac_power ? SMU_POWER_SOURCE_AC : - SMU_POWER_SOURCE_DC); - if (ret) { - dev_err(adev->dev, "Failed to switch to %s mode!\n", - adev->pm.ac_power ? "AC" : "DC"); - return ret; - } - } + adev->pm.ac_power = power_supply_is_system_supplied() > 0; + smu_set_ac_dc(smu);
if ((adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 1)) || (adev->ip_versions[MP1_HWIP][0] == IP_VERSION(13, 0, 3))) --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c @@ -1441,10 +1441,12 @@ static int smu_v11_0_irq_process(struct case 0x3: dev_dbg(adev->dev, "Switched to AC mode!\n"); schedule_work(&smu->interrupt_work); + adev->pm.ac_power = true; break; case 0x4: dev_dbg(adev->dev, "Switched to DC mode!\n"); schedule_work(&smu->interrupt_work); + adev->pm.ac_power = false; break; case 0x7: /* --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0.c @@ -1377,10 +1377,12 @@ static int smu_v13_0_irq_process(struct case 0x3: dev_dbg(adev->dev, "Switched to AC mode!\n"); smu_v13_0_ack_ac_dc_interrupt(smu); + adev->pm.ac_power = true; break; case 0x4: dev_dbg(adev->dev, "Switched to DC mode!\n"); smu_v13_0_ack_ac_dc_interrupt(smu); + adev->pm.ac_power = false; break; case 0x7: /*
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Srinivasan Shanmugam srinivasan.shanmugam@amd.com
commit a58371d632ebab9ea63f10893a6b6731196b6f8d upstream.
The 'status' variable in 'core_link_read_dpcd()' & 'core_link_write_dpcd()' was uninitialized.
Thus, initializing 'status' variable to 'DC_ERROR_UNEXPECTED' by default.
Fixes the below: drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:226 core_link_read_dpcd() error: uninitialized symbol 'status'. drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_dpcd.c:248 core_link_write_dpcd() error: uninitialized symbol 'status'.
Cc: stable@vger.kernel.org Cc: Jerry Zuo jerry.zuo@amd.com Cc: Jun Lei Jun.Lei@amd.com Cc: Wayne Lin Wayne.Lin@amd.com Cc: Aurabindo Pillai aurabindo.pillai@amd.com Cc: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Cc: Hamza Mahfooz hamza.mahfooz@amd.com Signed-off-by: Srinivasan Shanmugam srinivasan.shanmugam@amd.com Reviewed-by: Rodrigo Siqueira Rodrigo.Siqueira@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/display/dc/link/protocols/link_dpcd.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dpcd.c +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dpcd.c @@ -205,7 +205,7 @@ enum dc_status core_link_read_dpcd( uint32_t extended_size; /* size of the remaining partitioned address space */ uint32_t size_left_to_read; - enum dc_status status; + enum dc_status status = DC_ERROR_UNEXPECTED; /* size of the next partition to be read from */ uint32_t partition_size; uint32_t data_index = 0; @@ -234,7 +234,7 @@ enum dc_status core_link_write_dpcd( { uint32_t partition_size; uint32_t data_index = 0; - enum dc_status status; + enum dc_status status = DC_ERROR_UNEXPECTED;
while (size) { partition_size = dpcd_get_next_partition_size(address, size);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Martin KaFai Lau martin.lau@kernel.org
commit 9c1292eca243821249fa99f40175b0660d9329e3 upstream.
It was reported that there is a compiler warning on the unused variable "sin_addr_len" in af_inet.c when CONFIG_CGROUP_BPF is not set. This patch is to address it similar to the ipv6 counterpart in inet6_getname(). It is to "return sin_addr_len;" instead of "return sizeof(*sin);".
Fixes: fefba7d1ae19 ("bpf: Propagate modified uaddrlen from cgroup sockaddr programs") Reported-by: Stephen Rothwell sfr@canb.auug.org.au Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Andrii Nakryiko andrii@kernel.org Reviewed-by: Kuniyuki Iwashima kuniyu@amazon.com Link: https://lore.kernel.org/bpf/20231013185702.3993710-1-martin.lau@linux.dev Closes: https://lore.kernel.org/bpf/20231013114007.2fb09691@canb.auug.org.au/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/ipv4/af_inet.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -823,7 +823,7 @@ int inet_getname(struct socket *sock, st } release_sock(sk); memset(sin->sin_zero, 0, sizeof(sin->sin_zero)); - return sizeof(*sin); + return sin_addr_len; } EXPORT_SYMBOL(inet_getname);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
[ Upstream commit 8a8b6bb93c704776c4b05cb517c3fa8baffb72f5 ]
In preparation for the addition of a suspend notifier, wrap the logic to enable HFI and program its memory buffer into helper functions. Both the CPU hotplug callback and the suspend notifier will use them.
This refactoring does not introduce functional changes.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/intel/intel_hfi.c | 43 ++++++++++++++++--------------- 1 file changed, 22 insertions(+), 21 deletions(-)
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index c69db6c90869..820613e293cd 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -347,6 +347,26 @@ static void init_hfi_instance(struct hfi_instance *hfi_instance) hfi_instance->data = hfi_instance->hdr + hfi_features.hdr_size; }
+/* Caller must hold hfi_instance_lock. */ +static void hfi_enable(void) +{ + u64 msr_val; + + rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); + msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); +} + +static void hfi_set_hw_table(struct hfi_instance *hfi_instance) +{ + phys_addr_t hw_table_pa; + u64 msr_val; + + hw_table_pa = virt_to_phys(hfi_instance->hw_table); + msr_val = hw_table_pa | HW_FEEDBACK_PTR_VALID_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val); +} + /** * intel_hfi_online() - Enable HFI on @cpu * @cpu: CPU in which the HFI will be enabled @@ -364,8 +384,6 @@ void intel_hfi_online(unsigned int cpu) { struct hfi_instance *hfi_instance; struct hfi_cpu_info *info; - phys_addr_t hw_table_pa; - u64 msr_val; u16 die_id;
/* Nothing to do if hfi_instances are missing. */ @@ -409,8 +427,6 @@ void intel_hfi_online(unsigned int cpu) if (!hfi_instance->hw_table) goto unlock;
- hw_table_pa = virt_to_phys(hfi_instance->hw_table); - /* * Allocate memory to keep a local copy of the table that * hardware generates. @@ -420,16 +436,6 @@ void intel_hfi_online(unsigned int cpu) if (!hfi_instance->local_table) goto free_hw_table;
- /* - * Program the address of the feedback table of this die/package. On - * some processors, hardware remembers the old address of the HFI table - * even after having been reprogrammed and re-enabled. Thus, do not free - * the pages allocated for the table or reprogram the hardware with a - * new base address. Namely, program the hardware only once. - */ - msr_val = hw_table_pa | HW_FEEDBACK_PTR_VALID_BIT; - wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val); - init_hfi_instance(hfi_instance);
INIT_DELAYED_WORK(&hfi_instance->update_work, hfi_update_work_fn); @@ -438,13 +444,8 @@ void intel_hfi_online(unsigned int cpu)
cpumask_set_cpu(cpu, hfi_instance->cpus);
- /* - * Enable the hardware feedback interface and never disable it. See - * comment on programming the address of the table. - */ - rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); - msr_val |= HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT; - wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); + hfi_set_hw_table(hfi_instance); + hfi_enable();
unlock: mutex_unlock(&hfi_instance_lock);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
[ Upstream commit 1c53081d773c2cb4461636559b0d55b46559ceec ]
In preparation to support hibernation, add functionality to disable an HFI instance during CPU offline. The last CPU of an instance that goes offline will disable such instance.
The Intel Software Development Manual states that the operating system must wait for the hardware to set MSR_IA32_PACKAGE_THERM_STATUS[26] after disabling an HFI instance to ensure that it will no longer write on the HFI memory. Some processors, however, do not ever set such bit. Wait a minimum of 2ms to give time hardware to complete any pending memory writes.
Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: 97566d09fd02 ("thermal: intel: hfi: Add syscore callbacks for system-wide PM") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/intel/intel_hfi.c | 35 +++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+)
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index 820613e293cd..bb25c75acd45 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -24,6 +24,7 @@ #include <linux/bitops.h> #include <linux/cpufeature.h> #include <linux/cpumask.h> +#include <linux/delay.h> #include <linux/gfp.h> #include <linux/io.h> #include <linux/kernel.h> @@ -367,6 +368,32 @@ static void hfi_set_hw_table(struct hfi_instance *hfi_instance) wrmsrl(MSR_IA32_HW_FEEDBACK_PTR, msr_val); }
+/* Caller must hold hfi_instance_lock. */ +static void hfi_disable(void) +{ + u64 msr_val; + int i; + + rdmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); + msr_val &= ~HW_FEEDBACK_CONFIG_HFI_ENABLE_BIT; + wrmsrl(MSR_IA32_HW_FEEDBACK_CONFIG, msr_val); + + /* + * Wait for hardware to acknowledge the disabling of HFI. Some + * processors may not do it. Wait for ~2ms. This is a reasonable + * time for hardware to complete any pending actions on the HFI + * memory. + */ + for (i = 0; i < 2000; i++) { + rdmsrl(MSR_IA32_PACKAGE_THERM_STATUS, msr_val); + if (msr_val & PACKAGE_THERM_STATUS_HFI_UPDATED) + break; + + udelay(1); + cpu_relax(); + } +} + /** * intel_hfi_online() - Enable HFI on @cpu * @cpu: CPU in which the HFI will be enabled @@ -421,6 +448,10 @@ void intel_hfi_online(unsigned int cpu) /* * Hardware is programmed with the physical address of the first page * frame of the table. Hence, the allocated memory must be page-aligned. + * + * Some processors do not forget the initial address of the HFI table + * even after having been reprogrammed. Keep using the same pages. Do + * not free them. */ hfi_instance->hw_table = alloc_pages_exact(hfi_features.nr_table_pages, GFP_KERNEL | __GFP_ZERO); @@ -485,6 +516,10 @@ void intel_hfi_offline(unsigned int cpu)
mutex_lock(&hfi_instance_lock); cpumask_clear_cpu(cpu, hfi_instance->cpus); + + if (!cpumask_weight(hfi_instance->cpus)) + hfi_disable(); + mutex_unlock(&hfi_instance_lock); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ricardo Neri ricardo.neri-calderon@linux.intel.com
[ Upstream commit 97566d09fd02d2ab329774bb89a2cdf2267e86d9 ]
The kernel allocates a memory buffer and provides its location to the hardware, which uses it to update the HFI table. This allocation occurs during boot and remains constant throughout runtime.
When resuming from hibernation, the restore kernel allocates a second memory buffer and reprograms the HFI hardware with the new location as part of a normal boot. The location of the second memory buffer may differ from the one allocated by the image kernel.
When the restore kernel transfers control to the image kernel, its HFI buffer becomes invalid, potentially leading to memory corruption if the hardware writes to it (the hardware continues to use the buffer from the restore kernel).
It is also possible that the hardware "forgets" the address of the memory buffer when resuming from "deep" suspend. Memory corruption may also occur in such a scenario.
To prevent the described memory corruption, disable HFI when preparing to suspend or hibernate. Enable it when resuming.
Add syscore callbacks to handle the package of the boot CPU (packages of non-boot CPUs are handled via CPU offline). Syscore ops always run on the boot CPU. Additionally, HFI only needs to be disabled during "deep" suspend and hibernation. Syscore ops only run in these cases.
Cc: 6.1+ stable@vger.kernel.org # 6.1+ Signed-off-by: Ricardo Neri ricardo.neri-calderon@linux.intel.com [ rjw: Comment adjustment, subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/intel/intel_hfi.c | 28 ++++++++++++++++++++++++++++ 1 file changed, 28 insertions(+)
diff --git a/drivers/thermal/intel/intel_hfi.c b/drivers/thermal/intel/intel_hfi.c index bb25c75acd45..1c5a429b2e3e 100644 --- a/drivers/thermal/intel/intel_hfi.c +++ b/drivers/thermal/intel/intel_hfi.c @@ -35,7 +35,9 @@ #include <linux/processor.h> #include <linux/slab.h> #include <linux/spinlock.h> +#include <linux/suspend.h> #include <linux/string.h> +#include <linux/syscore_ops.h> #include <linux/topology.h> #include <linux/workqueue.h>
@@ -568,6 +570,30 @@ static __init int hfi_parse_features(void) return 0; }
+static void hfi_do_enable(void) +{ + /* This code runs only on the boot CPU. */ + struct hfi_cpu_info *info = &per_cpu(hfi_cpu_info, 0); + struct hfi_instance *hfi_instance = info->hfi_instance; + + /* No locking needed. There is no concurrency with CPU online. */ + hfi_set_hw_table(hfi_instance); + hfi_enable(); +} + +static int hfi_do_disable(void) +{ + /* No locking needed. There is no concurrency with CPU offline. */ + hfi_disable(); + + return 0; +} + +static struct syscore_ops hfi_pm_ops = { + .resume = hfi_do_enable, + .suspend = hfi_do_disable, +}; + void __init intel_hfi_init(void) { struct hfi_instance *hfi_instance; @@ -599,6 +625,8 @@ void __init intel_hfi_init(void) if (!hfi_updates_wq) goto err_nomem;
+ register_syscore_ops(&hfi_pm_ops); + return;
err_nomem:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Max Kellermann max.kellermann@ionos.com
[ Upstream commit b4bd6b4bac8edd61eb8f7b836969d12c0c6af165 ]
This declutters the code by reducing the number of #ifdefs and makes the watch_queue checks simpler. This has no runtime effect; the machine code is identical.
Signed-off-by: Max Kellermann max.kellermann@ionos.com Message-Id: 20230921075755.1378787-2-max.kellermann@ionos.com Reviewed-by: David Howells dhowells@redhat.com Signed-off-by: Christian Brauner brauner@kernel.org Stable-dep-of: e95aada4cb93 ("pipe: wakeup wr_wait after setting max_usage") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pipe.c | 12 +++--------- include/linux/pipe_fs_i.h | 16 ++++++++++++++++ 2 files changed, 19 insertions(+), 9 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c index 139190165a1c..603ab19b0861 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -437,12 +437,10 @@ pipe_write(struct kiocb *iocb, struct iov_iter *from) goto out; }
-#ifdef CONFIG_WATCH_QUEUE - if (pipe->watch_queue) { + if (pipe_has_watch_queue(pipe)) { ret = -EXDEV; goto out; } -#endif
/* * If it wasn't empty we try to merge new data into @@ -1324,10 +1322,8 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned int arg) unsigned int nr_slots, size; long ret = 0;
-#ifdef CONFIG_WATCH_QUEUE - if (pipe->watch_queue) + if (pipe_has_watch_queue(pipe)) return -EBUSY; -#endif
size = round_pipe_size(arg); nr_slots = size >> PAGE_SHIFT; @@ -1379,10 +1375,8 @@ struct pipe_inode_info *get_pipe_info(struct file *file, bool for_splice)
if (file->f_op != &pipefifo_fops || !pipe) return NULL; -#ifdef CONFIG_WATCH_QUEUE - if (for_splice && pipe->watch_queue) + if (for_splice && pipe_has_watch_queue(pipe)) return NULL; -#endif return pipe; }
diff --git a/include/linux/pipe_fs_i.h b/include/linux/pipe_fs_i.h index 608a9eb86bff..288a8081a9db 100644 --- a/include/linux/pipe_fs_i.h +++ b/include/linux/pipe_fs_i.h @@ -124,6 +124,22 @@ struct pipe_buf_operations { bool (*get)(struct pipe_inode_info *, struct pipe_buffer *); };
+/** + * pipe_has_watch_queue - Check whether the pipe is a watch_queue, + * i.e. it was created with O_NOTIFICATION_PIPE + * @pipe: The pipe to check + * + * Return: true if pipe is a watch queue, false otherwise. + */ +static inline bool pipe_has_watch_queue(const struct pipe_inode_info *pipe) +{ +#ifdef CONFIG_WATCH_QUEUE + return pipe->watch_queue != NULL; +#else + return false; +#endif +} + /** * pipe_empty - Return true if the pipe is empty * @head: The pipe ring head pointer
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Lukas Schauer lukas@schauer.dev
[ Upstream commit e95aada4cb93d42e25c30a0ef9eb2923d9711d4a ]
Commit c73be61cede5 ("pipe: Add general notification queue support") a regression was introduced that would lock up resized pipes under certain conditions. See the reproducer in [1].
The commit resizing the pipe ring size was moved to a different function, doing that moved the wakeup for pipe->wr_wait before actually raising pipe->max_usage. If a pipe was full before the resize occured it would result in the wakeup never actually triggering pipe_write.
Set @max_usage and @nr_accounted before waking writers if this isn't a watch queue.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=212295 [1] Link: https://lore.kernel.org/r/20231201-orchideen-modewelt-e009de4562c6@brauner Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reviewed-by: David Howells dhowells@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Lukas Schauer lukas@schauer.dev [Christian Brauner brauner@kernel.org: rewrite to account for watch queues] Signed-off-by: Christian Brauner brauner@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pipe.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/fs/pipe.c b/fs/pipe.c index 603ab19b0861..a234035cc375 100644 --- a/fs/pipe.c +++ b/fs/pipe.c @@ -1305,6 +1305,11 @@ int pipe_resize_ring(struct pipe_inode_info *pipe, unsigned int nr_slots) pipe->tail = tail; pipe->head = head;
+ if (!pipe_has_watch_queue(pipe)) { + pipe->max_usage = nr_slots; + pipe->nr_accounted = nr_slots; + } + spin_unlock_irq(&pipe->rd_wait.lock);
/* This might have made more room for writers */ @@ -1356,8 +1361,6 @@ static long pipe_set_size(struct pipe_inode_info *pipe, unsigned int arg) if (ret < 0) goto out_revert_acct;
- pipe->max_usage = nr_slots; - pipe->nr_accounted = nr_slots; return pipe->max_usage * PAGE_SIZE;
out_revert_acct:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sakari Ailus sakari.ailus@linux.intel.com
[ Upstream commit eba5058633b4d11e2a4d65eae9f1fce0b96365d9 ]
linux/bits.h is needed for GENMASK(). Include it.
Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Stable-dep-of: d92e7a013ff3 ("media: v4l2-cci: Add support for little-endian encoded registers") Signed-off-by: Sasha Levin sashal@kernel.org --- include/media/v4l2-cci.h | 1 + 1 file changed, 1 insertion(+)
diff --git a/include/media/v4l2-cci.h b/include/media/v4l2-cci.h index 0f6803e4b17e..f2c2962e936b 100644 --- a/include/media/v4l2-cci.h +++ b/include/media/v4l2-cci.h @@ -7,6 +7,7 @@ #ifndef _V4L2_CCI_H #define _V4L2_CCI_H
+#include <linux/bits.h> #include <linux/types.h>
struct i2c_client;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sakari Ailus sakari.ailus@linux.intel.com
[ Upstream commit cd93cc245dfe334c38da98c14b34f9597e1b4ea6 ]
Add CCI_REG_WIDTH() macro to obtain register width in bits and similarly, CCI_REG_WIDTH_BYTES() to obtain it in bytes.
Also add CCI_REG_ADDR() macro to obtain the address of a register.
Use both macros in v4l2-cci.c, too.
Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Stable-dep-of: d92e7a013ff3 ("media: v4l2-cci: Add support for little-endian encoded registers") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/v4l2-core/v4l2-cci.c | 8 ++++---- include/media/v4l2-cci.h | 5 +++++ 2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-cci.c b/drivers/media/v4l2-core/v4l2-cci.c index bc2dbec019b0..3179160abde3 100644 --- a/drivers/media/v4l2-core/v4l2-cci.c +++ b/drivers/media/v4l2-core/v4l2-cci.c @@ -25,8 +25,8 @@ int cci_read(struct regmap *map, u32 reg, u64 *val, int *err) if (err && *err) return *err;
- len = FIELD_GET(CCI_REG_WIDTH_MASK, reg); - reg = FIELD_GET(CCI_REG_ADDR_MASK, reg); + len = CCI_REG_WIDTH_BYTES(reg); + reg = CCI_REG_ADDR(reg);
ret = regmap_bulk_read(map, reg, buf, len); if (ret) { @@ -75,8 +75,8 @@ int cci_write(struct regmap *map, u32 reg, u64 val, int *err) if (err && *err) return *err;
- len = FIELD_GET(CCI_REG_WIDTH_MASK, reg); - reg = FIELD_GET(CCI_REG_ADDR_MASK, reg); + len = CCI_REG_WIDTH_BYTES(reg); + reg = CCI_REG_ADDR(reg);
switch (len) { case 1: diff --git a/include/media/v4l2-cci.h b/include/media/v4l2-cci.h index f2c2962e936b..a2835a663df5 100644 --- a/include/media/v4l2-cci.h +++ b/include/media/v4l2-cci.h @@ -7,6 +7,7 @@ #ifndef _V4L2_CCI_H #define _V4L2_CCI_H
+#include <linux/bitfield.h> #include <linux/bits.h> #include <linux/types.h>
@@ -34,6 +35,10 @@ struct cci_reg_sequence { #define CCI_REG_WIDTH_SHIFT 16 #define CCI_REG_WIDTH_MASK GENMASK(19, 16)
+#define CCI_REG_WIDTH_BYTES(x) FIELD_GET(CCI_REG_WIDTH_MASK, x) +#define CCI_REG_WIDTH(x) (CCI_REG_WIDTH_BYTES(x) << 3) +#define CCI_REG_ADDR(x) FIELD_GET(CCI_REG_ADDR_MASK, x) + #define CCI_REG8(x) ((1 << CCI_REG_WIDTH_SHIFT) | (x)) #define CCI_REG16(x) ((2 << CCI_REG_WIDTH_SHIFT) | (x)) #define CCI_REG24(x) ((3 << CCI_REG_WIDTH_SHIFT) | (x))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Stein alexander.stein@ew.tq-group.com
[ Upstream commit d92e7a013ff33f4e0b31bbf768d0c85a8acefebf ]
Some sensors, e.g. Sony IMX290, are using little-endian registers. Add support for those by encoding the endianness into Bit 20 of the register address.
Fixes: af73323b9770 ("media: imx290: Convert to new CCI register access helpers") Cc: stable@vger.kernel.org Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com [Sakari Ailus: Fixed commit message.] Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/v4l2-core/v4l2-cci.c | 44 ++++++++++++++++++++++++------ include/media/v4l2-cci.h | 5 ++++ 2 files changed, 41 insertions(+), 8 deletions(-)
diff --git a/drivers/media/v4l2-core/v4l2-cci.c b/drivers/media/v4l2-core/v4l2-cci.c index 3179160abde3..10005c80f43b 100644 --- a/drivers/media/v4l2-core/v4l2-cci.c +++ b/drivers/media/v4l2-core/v4l2-cci.c @@ -18,6 +18,7 @@
int cci_read(struct regmap *map, u32 reg, u64 *val, int *err) { + bool little_endian; unsigned int len; u8 buf[8]; int ret; @@ -25,6 +26,7 @@ int cci_read(struct regmap *map, u32 reg, u64 *val, int *err) if (err && *err) return *err;
+ little_endian = reg & CCI_REG_LE; len = CCI_REG_WIDTH_BYTES(reg); reg = CCI_REG_ADDR(reg);
@@ -40,16 +42,28 @@ int cci_read(struct regmap *map, u32 reg, u64 *val, int *err) *val = buf[0]; break; case 2: - *val = get_unaligned_be16(buf); + if (little_endian) + *val = get_unaligned_le16(buf); + else + *val = get_unaligned_be16(buf); break; case 3: - *val = get_unaligned_be24(buf); + if (little_endian) + *val = get_unaligned_le24(buf); + else + *val = get_unaligned_be24(buf); break; case 4: - *val = get_unaligned_be32(buf); + if (little_endian) + *val = get_unaligned_le32(buf); + else + *val = get_unaligned_be32(buf); break; case 8: - *val = get_unaligned_be64(buf); + if (little_endian) + *val = get_unaligned_le64(buf); + else + *val = get_unaligned_be64(buf); break; default: dev_err(regmap_get_device(map), "Error invalid reg-width %u for reg 0x%04x\n", @@ -68,6 +82,7 @@ EXPORT_SYMBOL_GPL(cci_read);
int cci_write(struct regmap *map, u32 reg, u64 val, int *err) { + bool little_endian; unsigned int len; u8 buf[8]; int ret; @@ -75,6 +90,7 @@ int cci_write(struct regmap *map, u32 reg, u64 val, int *err) if (err && *err) return *err;
+ little_endian = reg & CCI_REG_LE; len = CCI_REG_WIDTH_BYTES(reg); reg = CCI_REG_ADDR(reg);
@@ -83,16 +99,28 @@ int cci_write(struct regmap *map, u32 reg, u64 val, int *err) buf[0] = val; break; case 2: - put_unaligned_be16(val, buf); + if (little_endian) + put_unaligned_le16(val, buf); + else + put_unaligned_be16(val, buf); break; case 3: - put_unaligned_be24(val, buf); + if (little_endian) + put_unaligned_le24(val, buf); + else + put_unaligned_be24(val, buf); break; case 4: - put_unaligned_be32(val, buf); + if (little_endian) + put_unaligned_le32(val, buf); + else + put_unaligned_be32(val, buf); break; case 8: - put_unaligned_be64(val, buf); + if (little_endian) + put_unaligned_le64(val, buf); + else + put_unaligned_be64(val, buf); break; default: dev_err(regmap_get_device(map), "Error invalid reg-width %u for reg 0x%04x\n", diff --git a/include/media/v4l2-cci.h b/include/media/v4l2-cci.h index a2835a663df5..8b0b361b464c 100644 --- a/include/media/v4l2-cci.h +++ b/include/media/v4l2-cci.h @@ -38,12 +38,17 @@ struct cci_reg_sequence { #define CCI_REG_WIDTH_BYTES(x) FIELD_GET(CCI_REG_WIDTH_MASK, x) #define CCI_REG_WIDTH(x) (CCI_REG_WIDTH_BYTES(x) << 3) #define CCI_REG_ADDR(x) FIELD_GET(CCI_REG_ADDR_MASK, x) +#define CCI_REG_LE BIT(20)
#define CCI_REG8(x) ((1 << CCI_REG_WIDTH_SHIFT) | (x)) #define CCI_REG16(x) ((2 << CCI_REG_WIDTH_SHIFT) | (x)) #define CCI_REG24(x) ((3 << CCI_REG_WIDTH_SHIFT) | (x)) #define CCI_REG32(x) ((4 << CCI_REG_WIDTH_SHIFT) | (x)) #define CCI_REG64(x) ((8 << CCI_REG_WIDTH_SHIFT) | (x)) +#define CCI_REG16_LE(x) (CCI_REG_LE | (2U << CCI_REG_WIDTH_SHIFT) | (x)) +#define CCI_REG24_LE(x) (CCI_REG_LE | (3U << CCI_REG_WIDTH_SHIFT) | (x)) +#define CCI_REG32_LE(x) (CCI_REG_LE | (4U << CCI_REG_WIDTH_SHIFT) | (x)) +#define CCI_REG64_LE(x) (CCI_REG_LE | (8U << CCI_REG_WIDTH_SHIFT) | (x))
/** * cci_read() - Read a value from a single CCI register
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alexander Stein alexander.stein@ew.tq-group.com
[ Upstream commit 60fc87a69523c294eb23a1316af922f6665a6f8c ]
The conversion to CCI also converted the multi-byte register access to big-endian. Correct the register definition by using the correct little-endian ones.
Fixes: af73323b9770 ("media: imx290: Convert to new CCI register access helpers") Cc: stable@vger.kernel.org Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Reviewed-by: Hans de Goede hdegoede@redhat.com Reviewed-by: Laurent Pinchart laurent.pinchart@ideasonboard.com [Sakari Ailus: Fixed the Fixes: tag.] Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/i2c/imx290.c | 42 +++++++++++++++++++------------------- 1 file changed, 21 insertions(+), 21 deletions(-)
diff --git a/drivers/media/i2c/imx290.c b/drivers/media/i2c/imx290.c index 29098612813c..c6fea5837a19 100644 --- a/drivers/media/i2c/imx290.c +++ b/drivers/media/i2c/imx290.c @@ -41,18 +41,18 @@ #define IMX290_WINMODE_720P (1 << 4) #define IMX290_WINMODE_CROP (4 << 4) #define IMX290_FR_FDG_SEL CCI_REG8(0x3009) -#define IMX290_BLKLEVEL CCI_REG16(0x300a) +#define IMX290_BLKLEVEL CCI_REG16_LE(0x300a) #define IMX290_GAIN CCI_REG8(0x3014) -#define IMX290_VMAX CCI_REG24(0x3018) +#define IMX290_VMAX CCI_REG24_LE(0x3018) #define IMX290_VMAX_MAX 0x3ffff -#define IMX290_HMAX CCI_REG16(0x301c) +#define IMX290_HMAX CCI_REG16_LE(0x301c) #define IMX290_HMAX_MAX 0xffff -#define IMX290_SHS1 CCI_REG24(0x3020) +#define IMX290_SHS1 CCI_REG24_LE(0x3020) #define IMX290_WINWV_OB CCI_REG8(0x303a) -#define IMX290_WINPV CCI_REG16(0x303c) -#define IMX290_WINWV CCI_REG16(0x303e) -#define IMX290_WINPH CCI_REG16(0x3040) -#define IMX290_WINWH CCI_REG16(0x3042) +#define IMX290_WINPV CCI_REG16_LE(0x303c) +#define IMX290_WINWV CCI_REG16_LE(0x303e) +#define IMX290_WINPH CCI_REG16_LE(0x3040) +#define IMX290_WINWH CCI_REG16_LE(0x3042) #define IMX290_OUT_CTRL CCI_REG8(0x3046) #define IMX290_ODBIT_10BIT (0 << 0) #define IMX290_ODBIT_12BIT (1 << 0) @@ -78,28 +78,28 @@ #define IMX290_ADBIT2 CCI_REG8(0x317c) #define IMX290_ADBIT2_10BIT 0x12 #define IMX290_ADBIT2_12BIT 0x00 -#define IMX290_CHIP_ID CCI_REG16(0x319a) +#define IMX290_CHIP_ID CCI_REG16_LE(0x319a) #define IMX290_ADBIT3 CCI_REG8(0x31ec) #define IMX290_ADBIT3_10BIT 0x37 #define IMX290_ADBIT3_12BIT 0x0e #define IMX290_REPETITION CCI_REG8(0x3405) #define IMX290_PHY_LANE_NUM CCI_REG8(0x3407) #define IMX290_OPB_SIZE_V CCI_REG8(0x3414) -#define IMX290_Y_OUT_SIZE CCI_REG16(0x3418) -#define IMX290_CSI_DT_FMT CCI_REG16(0x3441) +#define IMX290_Y_OUT_SIZE CCI_REG16_LE(0x3418) +#define IMX290_CSI_DT_FMT CCI_REG16_LE(0x3441) #define IMX290_CSI_DT_FMT_RAW10 0x0a0a #define IMX290_CSI_DT_FMT_RAW12 0x0c0c #define IMX290_CSI_LANE_MODE CCI_REG8(0x3443) -#define IMX290_EXTCK_FREQ CCI_REG16(0x3444) -#define IMX290_TCLKPOST CCI_REG16(0x3446) -#define IMX290_THSZERO CCI_REG16(0x3448) -#define IMX290_THSPREPARE CCI_REG16(0x344a) -#define IMX290_TCLKTRAIL CCI_REG16(0x344c) -#define IMX290_THSTRAIL CCI_REG16(0x344e) -#define IMX290_TCLKZERO CCI_REG16(0x3450) -#define IMX290_TCLKPREPARE CCI_REG16(0x3452) -#define IMX290_TLPX CCI_REG16(0x3454) -#define IMX290_X_OUT_SIZE CCI_REG16(0x3472) +#define IMX290_EXTCK_FREQ CCI_REG16_LE(0x3444) +#define IMX290_TCLKPOST CCI_REG16_LE(0x3446) +#define IMX290_THSZERO CCI_REG16_LE(0x3448) +#define IMX290_THSPREPARE CCI_REG16_LE(0x344a) +#define IMX290_TCLKTRAIL CCI_REG16_LE(0x344c) +#define IMX290_THSTRAIL CCI_REG16_LE(0x344e) +#define IMX290_TCLKZERO CCI_REG16_LE(0x3450) +#define IMX290_TCLKPREPARE CCI_REG16_LE(0x3452) +#define IMX290_TLPX CCI_REG16_LE(0x3454) +#define IMX290_X_OUT_SIZE CCI_REG16_LE(0x3472) #define IMX290_INCKSEL7 CCI_REG8(0x3480)
#define IMX290_PGCTRL_REGEN BIT(0)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit a15ffa783ea4210877886c59566a0d20f6b2bc09 ]
It is invalid to call for_each_thermal_trip() on an unregistered thermal zone anyway, and as per thermal_zone_device_register_with_trips(), the trips[] table must be present if num_trips is greater than zero for the given thermal zone.
Hence, the trips check in for_each_thermal_trip() is redundant and so it can be dropped.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Acked-by: Daniel Lezcano daniel.lezcano@linaro.org Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/thermal_trip.c | 3 --- 1 file changed, 3 deletions(-)
diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c index 597ac4144e33..4b3a9e77c039 100644 --- a/drivers/thermal/thermal_trip.c +++ b/drivers/thermal/thermal_trip.c @@ -17,9 +17,6 @@ int for_each_thermal_trip(struct thermal_zone_device *tz,
lockdep_assert_held(&tz->lock);
- if (!tz->trips) - return -ENODATA; - for (i = 0; i < tz->num_trips; i++) { ret = cb(&tz->trips[i], data); if (ret)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
[ Upstream commit 2c7b4bfadef08cc0995c24a7b9eb120fe897165f ]
Replace the integer trip number stored in struct thermal_instance with a pointer to the relevant trip and adjust the code using the structure in question accordingly.
The main reason for making this change is to allow the trip point to cooling device binding code more straightforward, as illustrated by subsequent modifications of the ACPI thermal driver, but it also helps to clarify the overall design and allows the governor code overhead to be reduced (through subsequent modifications).
The only case in which it adds complexity is trip_point_show() that needs to walk the trips[] table to find the index of the given trip point, but this is not a critical path and the interface that trip_point_show() belongs to is problematic anyway (for instance, it doesn't cover the case when the same cooling devices is associated with multiple trip points).
This is a preliminary change and the affected code will be refined by a series of subsequent modifications of thermal governors, the core and the ACPI thermal driver.
The general functionality is not expected to be affected by this change.
Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Reviewed-by: Daniel Lezcano daniel.lezcano@linaro.org Stable-dep-of: e95fa7404716 ("thermal: gov_power_allocator: avoid inability to reset a cdev") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/gov_bang_bang.c | 23 ++++++++--------------- drivers/thermal/gov_fair_share.c | 5 +++-- drivers/thermal/gov_power_allocator.c | 11 ++++++++--- drivers/thermal/gov_step_wise.c | 16 +++++++--------- drivers/thermal/thermal_core.c | 15 ++++++++++----- drivers/thermal/thermal_core.h | 4 +++- drivers/thermal/thermal_helpers.c | 5 ++++- drivers/thermal/thermal_sysfs.c | 3 ++- drivers/thermal/thermal_trip.c | 15 +++++++++++++++ 9 files changed, 60 insertions(+), 37 deletions(-)
diff --git a/drivers/thermal/gov_bang_bang.c b/drivers/thermal/gov_bang_bang.c index 1b121066521f..49cdfaa3a927 100644 --- a/drivers/thermal/gov_bang_bang.c +++ b/drivers/thermal/gov_bang_bang.c @@ -13,28 +13,21 @@
#include "thermal_core.h"
-static int thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_id) +static int thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_index) { - struct thermal_trip trip; + const struct thermal_trip *trip = &tz->trips[trip_index]; struct thermal_instance *instance; - int ret; - - ret = __thermal_zone_get_trip(tz, trip_id, &trip); - if (ret) { - pr_warn_once("Failed to retrieve trip point %d\n", trip_id); - return ret; - }
- if (!trip.hysteresis) + if (!trip->hysteresis) dev_info_once(&tz->device, "Zero hysteresis value for thermal zone %s\n", tz->type);
dev_dbg(&tz->device, "Trip%d[temp=%d]:temp=%d:hyst=%d\n", - trip_id, trip.temperature, tz->temperature, - trip.hysteresis); + trip_index, trip->temperature, tz->temperature, + trip->hysteresis);
list_for_each_entry(instance, &tz->thermal_instances, tz_node) { - if (instance->trip != trip_id) + if (instance->trip != trip) continue;
/* in case fan is in initial state, switch the fan off */ @@ -52,10 +45,10 @@ static int thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_id) * enable fan when temperature exceeds trip_temp and disable * the fan in case it falls below trip_temp minus hysteresis */ - if (instance->target == 0 && tz->temperature >= trip.temperature) + if (instance->target == 0 && tz->temperature >= trip->temperature) instance->target = 1; else if (instance->target == 1 && - tz->temperature <= trip.temperature - trip.hysteresis) + tz->temperature <= trip->temperature - trip->hysteresis) instance->target = 0;
dev_dbg(&instance->cdev->device, "target=%d\n", diff --git a/drivers/thermal/gov_fair_share.c b/drivers/thermal/gov_fair_share.c index 03c2daeb6ee8..2abeb8979f50 100644 --- a/drivers/thermal/gov_fair_share.c +++ b/drivers/thermal/gov_fair_share.c @@ -49,7 +49,7 @@ static long get_target_state(struct thermal_zone_device *tz, /** * fair_share_throttle - throttles devices associated with the given zone * @tz: thermal_zone_device - * @trip: trip point index + * @trip_index: trip point index * * Throttling Logic: This uses three parameters to calculate the new * throttle state of the cooling devices associated with the given zone. @@ -65,8 +65,9 @@ static long get_target_state(struct thermal_zone_device *tz, * (Heavily assumes the trip points are in ascending order) * new_state of cooling device = P3 * P2 * P1 */ -static int fair_share_throttle(struct thermal_zone_device *tz, int trip) +static int fair_share_throttle(struct thermal_zone_device *tz, int trip_index) { + const struct thermal_trip *trip = &tz->trips[trip_index]; struct thermal_instance *instance; int total_weight = 0; int total_instance = 0; diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c index 8642f1096b91..1faf55446ba2 100644 --- a/drivers/thermal/gov_power_allocator.c +++ b/drivers/thermal/gov_power_allocator.c @@ -90,12 +90,14 @@ static u32 estimate_sustainable_power(struct thermal_zone_device *tz) u32 sustainable_power = 0; struct thermal_instance *instance; struct power_allocator_params *params = tz->governor_data; + const struct thermal_trip *trip_max_desired_temperature = + &tz->trips[params->trip_max_desired_temperature];
list_for_each_entry(instance, &tz->thermal_instances, tz_node) { struct thermal_cooling_device *cdev = instance->cdev; u32 min_power;
- if (instance->trip != params->trip_max_desired_temperature) + if (instance->trip != trip_max_desired_temperature) continue;
if (!cdev_is_power_actor(cdev)) @@ -383,12 +385,13 @@ static int allocate_power(struct thermal_zone_device *tz, { struct thermal_instance *instance; struct power_allocator_params *params = tz->governor_data; + const struct thermal_trip *trip_max_desired_temperature = + &tz->trips[params->trip_max_desired_temperature]; u32 *req_power, *max_power, *granted_power, *extra_actor_power; u32 *weighted_req_power; u32 total_req_power, max_allocatable_power, total_weighted_req_power; u32 total_granted_power, power_range; int i, num_actors, total_weight, ret = 0; - int trip_max_desired_temperature = params->trip_max_desired_temperature;
num_actors = 0; total_weight = 0; @@ -564,12 +567,14 @@ static void allow_maximum_power(struct thermal_zone_device *tz, bool update) { struct thermal_instance *instance; struct power_allocator_params *params = tz->governor_data; + const struct thermal_trip *trip_max_desired_temperature = + &tz->trips[params->trip_max_desired_temperature]; u32 req_power;
list_for_each_entry(instance, &tz->thermal_instances, tz_node) { struct thermal_cooling_device *cdev = instance->cdev;
- if ((instance->trip != params->trip_max_desired_temperature) || + if ((instance->trip != trip_max_desired_temperature) || (!cdev_is_power_actor(instance->cdev))) continue;
diff --git a/drivers/thermal/gov_step_wise.c b/drivers/thermal/gov_step_wise.c index 1050fb4d94c2..849dc1ec8d27 100644 --- a/drivers/thermal/gov_step_wise.c +++ b/drivers/thermal/gov_step_wise.c @@ -81,26 +81,24 @@ static void update_passive_instance(struct thermal_zone_device *tz,
static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_id) { + const struct thermal_trip *trip = &tz->trips[trip_id]; enum thermal_trend trend; struct thermal_instance *instance; - struct thermal_trip trip; bool throttle = false; int old_target;
- __thermal_zone_get_trip(tz, trip_id, &trip); - trend = get_tz_trend(tz, trip_id);
- if (tz->temperature >= trip.temperature) { + if (tz->temperature >= trip->temperature) { throttle = true; - trace_thermal_zone_trip(tz, trip_id, trip.type); + trace_thermal_zone_trip(tz, trip_id, trip->type); }
dev_dbg(&tz->device, "Trip%d[type=%d,temp=%d]:trend=%d,throttle=%d\n", - trip_id, trip.type, trip.temperature, trend, throttle); + trip_id, trip->type, trip->temperature, trend, throttle);
list_for_each_entry(instance, &tz->thermal_instances, tz_node) { - if (instance->trip != trip_id) + if (instance->trip != trip) continue;
old_target = instance->target; @@ -114,11 +112,11 @@ static void thermal_zone_trip_update(struct thermal_zone_device *tz, int trip_id /* Activate a passive thermal instance */ if (old_target == THERMAL_NO_TARGET && instance->target != THERMAL_NO_TARGET) - update_passive_instance(tz, trip.type, 1); + update_passive_instance(tz, trip->type, 1); /* Deactivate a passive thermal instance */ else if (old_target != THERMAL_NO_TARGET && instance->target == THERMAL_NO_TARGET) - update_passive_instance(tz, trip.type, -1); + update_passive_instance(tz, trip->type, -1);
instance->initialized = true; mutex_lock(&instance->cdev->lock); diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 2de524fb7be5..1494ffa59754 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -606,7 +606,7 @@ struct thermal_zone_device *thermal_zone_get_by_id(int id) /** * thermal_zone_bind_cooling_device() - bind a cooling device to a thermal zone * @tz: pointer to struct thermal_zone_device - * @trip: indicates which trip point the cooling devices is + * @trip_index: indicates which trip point the cooling devices is * associated with in this thermal zone. * @cdev: pointer to struct thermal_cooling_device * @upper: the Maximum cooling state for this trip point. @@ -626,7 +626,7 @@ struct thermal_zone_device *thermal_zone_get_by_id(int id) * Return: 0 on success, the proper error value otherwise. */ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, - int trip, + int trip_index, struct thermal_cooling_device *cdev, unsigned long upper, unsigned long lower, unsigned int weight) @@ -635,12 +635,15 @@ int thermal_zone_bind_cooling_device(struct thermal_zone_device *tz, struct thermal_instance *pos; struct thermal_zone_device *pos1; struct thermal_cooling_device *pos2; + const struct thermal_trip *trip; bool upper_no_limit; int result;
- if (trip >= tz->num_trips || trip < 0) + if (trip_index >= tz->num_trips || trip_index < 0) return -EINVAL;
+ trip = &tz->trips[trip_index]; + list_for_each_entry(pos1, &thermal_tz_list, node) { if (pos1 == tz) break; @@ -745,7 +748,7 @@ EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device); * thermal_zone_unbind_cooling_device() - unbind a cooling device from a * thermal zone. * @tz: pointer to a struct thermal_zone_device. - * @trip: indicates which trip point the cooling devices is + * @trip_index: indicates which trip point the cooling devices is * associated with in this thermal zone. * @cdev: pointer to a struct thermal_cooling_device. * @@ -756,13 +759,15 @@ EXPORT_SYMBOL_GPL(thermal_zone_bind_cooling_device); * Return: 0 on success, the proper error value otherwise. */ int thermal_zone_unbind_cooling_device(struct thermal_zone_device *tz, - int trip, + int trip_index, struct thermal_cooling_device *cdev) { struct thermal_instance *pos, *next; + const struct thermal_trip *trip;
mutex_lock(&tz->lock); mutex_lock(&cdev->lock); + trip = &tz->trips[trip_index]; list_for_each_entry_safe(pos, next, &tz->thermal_instances, tz_node) { if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) { list_del(&pos->tz_node); diff --git a/drivers/thermal/thermal_core.h b/drivers/thermal/thermal_core.h index de884bea28b6..024e82ebf592 100644 --- a/drivers/thermal/thermal_core.h +++ b/drivers/thermal/thermal_core.h @@ -87,7 +87,7 @@ struct thermal_instance { char name[THERMAL_NAME_LENGTH]; struct thermal_zone_device *tz; struct thermal_cooling_device *cdev; - int trip; + const struct thermal_trip *trip; bool initialized; unsigned long upper; /* Highest cooling state for this trip point */ unsigned long lower; /* Lowest cooling state for this trip point */ @@ -119,6 +119,8 @@ void __thermal_zone_device_update(struct thermal_zone_device *tz, void __thermal_zone_set_trips(struct thermal_zone_device *tz); int __thermal_zone_get_trip(struct thermal_zone_device *tz, int trip_id, struct thermal_trip *trip); +int thermal_zone_trip_id(struct thermal_zone_device *tz, + const struct thermal_trip *trip); int __thermal_zone_get_temp(struct thermal_zone_device *tz, int *temp);
/* sysfs I/F */ diff --git a/drivers/thermal/thermal_helpers.c b/drivers/thermal/thermal_helpers.c index 4d66372c9629..c1d0af73c85d 100644 --- a/drivers/thermal/thermal_helpers.c +++ b/drivers/thermal/thermal_helpers.c @@ -42,14 +42,17 @@ int get_tz_trend(struct thermal_zone_device *tz, int trip_index)
struct thermal_instance * get_thermal_instance(struct thermal_zone_device *tz, - struct thermal_cooling_device *cdev, int trip) + struct thermal_cooling_device *cdev, int trip_index) { struct thermal_instance *pos = NULL; struct thermal_instance *target_instance = NULL; + const struct thermal_trip *trip;
mutex_lock(&tz->lock); mutex_lock(&cdev->lock);
+ trip = &tz->trips[trip_index]; + list_for_each_entry(pos, &tz->thermal_instances, tz_node) { if (pos->tz == tz && pos->trip == trip && pos->cdev == cdev) { target_instance = pos; diff --git a/drivers/thermal/thermal_sysfs.c b/drivers/thermal/thermal_sysfs.c index 4e6a97db894e..eef40d4f3063 100644 --- a/drivers/thermal/thermal_sysfs.c +++ b/drivers/thermal/thermal_sysfs.c @@ -943,7 +943,8 @@ trip_point_show(struct device *dev, struct device_attribute *attr, char *buf) instance = container_of(attr, struct thermal_instance, attr);
- return sprintf(buf, "%d\n", instance->trip); + return sprintf(buf, "%d\n", + thermal_zone_trip_id(instance->tz, instance->trip)); }
ssize_t diff --git a/drivers/thermal/thermal_trip.c b/drivers/thermal/thermal_trip.c index 4b3a9e77c039..6e5cebd1e63a 100644 --- a/drivers/thermal/thermal_trip.c +++ b/drivers/thermal/thermal_trip.c @@ -172,3 +172,18 @@ int thermal_zone_set_trip(struct thermal_zone_device *tz, int trip_id,
return 0; } + +int thermal_zone_trip_id(struct thermal_zone_device *tz, + const struct thermal_trip *trip) +{ + int i; + + lockdep_assert_held(&tz->lock); + + for (i = 0; i < tz->num_trips; i++) { + if (&tz->trips[i] == trip) + return i; + } + + return -ENODATA; +}
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Di Shen di.shen@unisoc.com
[ Upstream commit e95fa7404716f6e25021e66067271a4ad8eb1486 ]
Commit 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low") adds an update flag to avoid triggering a thermal event when there is no need, and the thermal cdev is updated once when the temperature is low.
But when the trips are writable, and switch_on_temp is set to be a higher value, the cooling device state may not be reset to 0, because last_temperature is smaller than switch_on_temp.
For example: First: switch_on_temp=70 control_temp=85; Then userspace change the trip_temp: switch_on_temp=45 control_temp=55 cur_temp=54
Then userspace reset the trip_temp: switch_on_temp=70 control_temp=85 cur_temp=57 last_temp=54
At this time, the cooling device state should be reset to 0. However, because cur_temp(57) < switch_on_temp(70) last_temp(54) < switch_on_temp(70) ----> update = false, update is false, the cooling device state can not be reset.
Using the observation that tz->passive can also be regarded as the temperature status, set the update flag to the tz->passive value.
When the temperature drops below switch_on for the first time, the states of cooling devices can be reset once, and tz->passive is updated to 0. In the next round, because tz->passive is 0, cdev->state will not be updated.
By using the tz->passive value as the "update" flag, the issue above can be solved, and the cooling devices can be updated only once when the temperature is low.
Fixes: 0952177f2a1f ("thermal/core/power_allocator: Update once cooling devices when temp is low") Cc: 5.13+ stable@vger.kernel.org # 5.13+ Suggested-by: Wei Wang wvw@google.com Signed-off-by: Di Shen di.shen@unisoc.com Reviewed-by: Lukasz Luba lukasz.luba@arm.com [ rjw: Subject and changelog edits ] Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/thermal/gov_power_allocator.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/gov_power_allocator.c b/drivers/thermal/gov_power_allocator.c index 1faf55446ba2..fc969642f70b 100644 --- a/drivers/thermal/gov_power_allocator.c +++ b/drivers/thermal/gov_power_allocator.c @@ -715,7 +715,7 @@ static int power_allocator_throttle(struct thermal_zone_device *tz, int trip_id)
ret = __thermal_zone_get_trip(tz, params->trip_switch_on, &trip); if (!ret && (tz->temperature < trip.temperature)) { - update = (tz->last_temperature >= trip.temperature); + update = tz->passive; tz->passive = 0; reset_pid_controller(params); allow_maximum_power(tz, update);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baolin Wang baolin.wang@linux.alibaba.com
[ Upstream commit eebb3dabbb5cc590afe32880b5d3726d0fbf88db ]
When doing compaction, I found the lru_add_drain() is an obvious hotspot when migrating pages. The distribution of this hotspot is as follows: - 18.75% compact_zone - 17.39% migrate_pages - 13.79% migrate_pages_batch - 11.66% migrate_folio_move - 7.02% lru_add_drain + 7.02% lru_add_drain_cpu + 3.00% move_to_new_folio 1.23% rmap_walk + 1.92% migrate_folio_unmap + 3.20% migrate_pages_sync + 0.90% isolate_migratepages
The lru_add_drain() was added by commit c3096e6782b7 ("mm/migrate: __unmap_and_move() push good newpage to LRU") to drain the newpage to LRU immediately, to help to build up the correct newpage->mlock_count in remove_migration_ptes() for mlocked pages. However, if there are no mlocked pages are migrating, then we can avoid this lru drain operation, especailly for the heavy concurrent scenarios.
So we can record the source pages' mlocked status in migrate_folio_unmap(), and only drain the lru list when the mlocked status is set in migrate_folio_move().
In addition, the page was already isolated from lru when migrating, so checking the mlocked status is stable by folio_test_mlocked() in migrate_folio_unmap().
After this patch, I can see the hotpot of the lru_add_drain() is gone: - 9.41% migrate_pages_batch - 6.15% migrate_folio_move - 3.64% move_to_new_folio + 1.80% migrate_folio_extra + 1.70% buffer_migrate_folio + 1.41% rmap_walk + 0.62% folio_add_lru + 3.07% migrate_folio_unmap
Meanwhile, the compaction latency shows some improvements when running thpscale: base patched Amean fault-both-1 1131.22 ( 0.00%) 1112.55 * 1.65%* Amean fault-both-3 2489.75 ( 0.00%) 2324.15 * 6.65%* Amean fault-both-5 3257.37 ( 0.00%) 3183.18 * 2.28%* Amean fault-both-7 4257.99 ( 0.00%) 4079.04 * 4.20%* Amean fault-both-12 6614.02 ( 0.00%) 6075.60 * 8.14%* Amean fault-both-18 10607.78 ( 0.00%) 8978.86 * 15.36%* Amean fault-both-24 14911.65 ( 0.00%) 11619.55 * 22.08%* Amean fault-both-30 14954.67 ( 0.00%) 14925.66 * 0.19%* Amean fault-both-32 16654.87 ( 0.00%) 15580.31 * 6.45%*
Link: https://lkml.kernel.org/r/06e9153a7a4850352ec36602df3a3a844de45698.169785974... Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Reviewed-by: Zi Yan ziy@nvidia.com Cc: Hugh Dickins hughd@google.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Vlastimil Babka vbabka@suse.cz Cc: Yin Fengwei fengwei.yin@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Stable-dep-of: d1adb25df711 ("mm: migrate: fix getting incorrect page mapping during page migration") Signed-off-by: Sasha Levin sashal@kernel.org --- mm/migrate.c | 48 +++++++++++++++++++++++++++++------------------- 1 file changed, 29 insertions(+), 19 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c index 03bc2063ac87..3373fc1c2d0f 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1035,22 +1035,28 @@ union migration_ptr { struct anon_vma *anon_vma; struct address_space *mapping; }; + +enum { + PAGE_WAS_MAPPED = BIT(0), + PAGE_WAS_MLOCKED = BIT(1), +}; + static void __migrate_folio_record(struct folio *dst, - unsigned long page_was_mapped, + unsigned long old_page_state, struct anon_vma *anon_vma) { union migration_ptr ptr = { .anon_vma = anon_vma }; dst->mapping = ptr.mapping; - dst->private = (void *)page_was_mapped; + dst->private = (void *)old_page_state; }
static void __migrate_folio_extract(struct folio *dst, - int *page_was_mappedp, + int *old_page_state, struct anon_vma **anon_vmap) { union migration_ptr ptr = { .mapping = dst->mapping }; *anon_vmap = ptr.anon_vma; - *page_was_mappedp = (unsigned long)dst->private; + *old_page_state = (unsigned long)dst->private; dst->mapping = NULL; dst->private = NULL; } @@ -1111,7 +1117,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, { struct folio *dst; int rc = -EAGAIN; - int page_was_mapped = 0; + int old_page_state = 0; struct anon_vma *anon_vma = NULL; bool is_lru = !__PageMovable(&src->page); bool locked = false; @@ -1165,6 +1171,8 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, folio_lock(src); } locked = true; + if (folio_test_mlocked(src)) + old_page_state |= PAGE_WAS_MLOCKED;
if (folio_test_writeback(src)) { /* @@ -1214,7 +1222,7 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, dst_locked = true;
if (unlikely(!is_lru)) { - __migrate_folio_record(dst, page_was_mapped, anon_vma); + __migrate_folio_record(dst, old_page_state, anon_vma); return MIGRATEPAGE_UNMAP; }
@@ -1240,11 +1248,11 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, VM_BUG_ON_FOLIO(folio_test_anon(src) && !folio_test_ksm(src) && !anon_vma, src); try_to_migrate(src, mode == MIGRATE_ASYNC ? TTU_BATCH_FLUSH : 0); - page_was_mapped = 1; + old_page_state |= PAGE_WAS_MAPPED; }
if (!folio_mapped(src)) { - __migrate_folio_record(dst, page_was_mapped, anon_vma); + __migrate_folio_record(dst, old_page_state, anon_vma); return MIGRATEPAGE_UNMAP; }
@@ -1256,7 +1264,8 @@ static int migrate_folio_unmap(new_folio_t get_new_folio, if (rc == -EAGAIN) ret = NULL;
- migrate_folio_undo_src(src, page_was_mapped, anon_vma, locked, ret); + migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED, + anon_vma, locked, ret); migrate_folio_undo_dst(dst, dst_locked, put_new_folio, private);
return rc; @@ -1269,12 +1278,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, struct list_head *ret) { int rc; - int page_was_mapped = 0; + int old_page_state = 0; struct anon_vma *anon_vma = NULL; bool is_lru = !__PageMovable(&src->page); struct list_head *prev;
- __migrate_folio_extract(dst, &page_was_mapped, &anon_vma); + __migrate_folio_extract(dst, &old_page_state, &anon_vma); prev = dst->lru.prev; list_del(&dst->lru);
@@ -1295,10 +1304,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, * isolated from the unevictable LRU: but this case is the easiest. */ folio_add_lru(dst); - if (page_was_mapped) + if (old_page_state & PAGE_WAS_MLOCKED) lru_add_drain();
- if (page_was_mapped) + if (old_page_state & PAGE_WAS_MAPPED) remove_migration_ptes(src, dst, false);
out_unlock_both: @@ -1330,11 +1339,12 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private, */ if (rc == -EAGAIN) { list_add(&dst->lru, prev); - __migrate_folio_record(dst, page_was_mapped, anon_vma); + __migrate_folio_record(dst, old_page_state, anon_vma); return rc; }
- migrate_folio_undo_src(src, page_was_mapped, anon_vma, true, ret); + migrate_folio_undo_src(src, old_page_state & PAGE_WAS_MAPPED, + anon_vma, true, ret); migrate_folio_undo_dst(dst, true, put_new_folio, private);
return rc; @@ -1802,12 +1812,12 @@ static int migrate_pages_batch(struct list_head *from, dst = list_first_entry(&dst_folios, struct folio, lru); dst2 = list_next_entry(dst, lru); list_for_each_entry_safe(folio, folio2, &unmap_folios, lru) { - int page_was_mapped = 0; + int old_page_state = 0; struct anon_vma *anon_vma = NULL;
- __migrate_folio_extract(dst, &page_was_mapped, &anon_vma); - migrate_folio_undo_src(folio, page_was_mapped, anon_vma, - true, ret_folios); + __migrate_folio_extract(dst, &old_page_state, &anon_vma); + migrate_folio_undo_src(folio, old_page_state & PAGE_WAS_MAPPED, + anon_vma, true, ret_folios); list_del(&dst->lru); migrate_folio_undo_dst(dst, true, put_new_folio, private); dst = dst2;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Baolin Wang baolin.wang@linux.alibaba.com
[ Upstream commit d1adb25df7111de83b64655a80b5a135adbded61 ]
When running stress-ng testing, we found below kernel crash after a few hours:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000 pc : dentry_name+0xd8/0x224 lr : pointer+0x22c/0x370 sp : ffff800025f134c0 ...... Call trace: dentry_name+0xd8/0x224 pointer+0x22c/0x370 vsnprintf+0x1ec/0x730 vscnprintf+0x2c/0x60 vprintk_store+0x70/0x234 vprintk_emit+0xe0/0x24c vprintk_default+0x3c/0x44 vprintk_func+0x84/0x2d0 printk+0x64/0x88 __dump_page+0x52c/0x530 dump_page+0x14/0x20 set_migratetype_isolate+0x110/0x224 start_isolate_page_range+0xc4/0x20c offline_pages+0x124/0x474 memory_block_offline+0x44/0xf4 memory_subsys_offline+0x3c/0x70 device_offline+0xf0/0x120 ......
After analyzing the vmcore, I found this issue is caused by page migration. The scenario is that, one thread is doing page migration, and we will use the target page's ->mapping field to save 'anon_vma' pointer between page unmap and page move, and now the target page is locked and refcount is 1.
Currently, there is another stress-ng thread performing memory hotplug, attempting to offline the target page that is being migrated. It discovers that the refcount of this target page is 1, preventing the offline operation, thus proceeding to dump the page. However, page_mapping() of the target page may return an incorrect file mapping to crash the system in dump_mapping(), since the target page->mapping only saves 'anon_vma' pointer without setting PAGE_MAPPING_ANON flag.
There are seveval ways to fix this issue: (1) Setting the PAGE_MAPPING_ANON flag for target page's ->mapping when saving 'anon_vma', but this can confuse PageAnon() for PFN walkers, since the target page has not built mappings yet. (2) Getting the page lock to call page_mapping() in __dump_page() to avoid crashing the system, however, there are still some PFN walkers that call page_mapping() without holding the page lock, such as compaction. (3) Using target page->private field to save the 'anon_vma' pointer and 2 bits page state, just as page->mapping records an anonymous page, which can remove the page_mapping() impact for PFN walkers and also seems a simple way.
So I choose option 3 to fix this issue, and this can also fix other potential issues for PFN walkers, such as compaction.
Link: https://lkml.kernel.org/r/e60b17a88afc38cb32f84c3e30837ec70b343d2b.170264170... Fixes: 64c8902ed441 ("migrate_pages: split unmap_and_move() to _unmap() and _move()") Signed-off-by: Baolin Wang baolin.wang@linux.alibaba.com Reviewed-by: "Huang, Ying" ying.huang@intel.com Cc: Matthew Wilcox willy@infradead.org Cc: David Hildenbrand david@redhat.com Cc: Xu Yu xuyu@linux.alibaba.com Cc: Zi Yan ziy@nvidia.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/migrate.c | 27 ++++++++++----------------- 1 file changed, 10 insertions(+), 17 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c index 3373fc1c2d0f..b4d972d80b10 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1026,38 +1026,31 @@ static int move_to_new_folio(struct folio *dst, struct folio *src, }
/* - * To record some information during migration, we use some unused - * fields (mapping and private) of struct folio of the newly allocated - * destination folio. This is safe because nobody is using them - * except us. + * To record some information during migration, we use unused private + * field of struct folio of the newly allocated destination folio. + * This is safe because nobody is using it except us. */ -union migration_ptr { - struct anon_vma *anon_vma; - struct address_space *mapping; -}; - enum { PAGE_WAS_MAPPED = BIT(0), PAGE_WAS_MLOCKED = BIT(1), + PAGE_OLD_STATES = PAGE_WAS_MAPPED | PAGE_WAS_MLOCKED, };
static void __migrate_folio_record(struct folio *dst, - unsigned long old_page_state, + int old_page_state, struct anon_vma *anon_vma) { - union migration_ptr ptr = { .anon_vma = anon_vma }; - dst->mapping = ptr.mapping; - dst->private = (void *)old_page_state; + dst->private = (void *)anon_vma + old_page_state; }
static void __migrate_folio_extract(struct folio *dst, int *old_page_state, struct anon_vma **anon_vmap) { - union migration_ptr ptr = { .mapping = dst->mapping }; - *anon_vmap = ptr.anon_vma; - *old_page_state = (unsigned long)dst->private; - dst->mapping = NULL; + unsigned long private = (unsigned long)dst->private; + + *anon_vmap = (struct anon_vma *)(private & ~PAGE_OLD_STATES); + *old_page_state = private & PAGE_OLD_STATES; dst->private = NULL; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit b0af4bcb49464c221ad5f95d40f2b1b252ceedcc ]
When a serial port is used for kernel console output, then all modifications to the UART registers which are done from other contexts, e.g. getty, termios, are interference points for the kernel console.
So far this has been ignored and the printk output is based on the principle of hope. The rework of the console infrastructure which aims to support threaded and atomic consoles, requires to mark sections which modify the UART registers as unsafe. This allows the atomic write function to make informed decisions and eventually to restore operational state. It also allows to prevent the regular UART code from modifying UART registers while printk output is in progress.
All modifications of UART registers are guarded by the UART port lock, which provides an obvious synchronization point with the console infrastructure.
Provide wrapper functions for spin_[un]lock*(port->lock) invocations so that the console mechanics can be applied later on at a single place and does not require to copy the same logic all over the drivers.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Ilpo Järvinen ilpo.jarvinen@linux.intel.com Signed-off-by: John Ogness john.ogness@linutronix.de Link: https://lore.kernel.org/r/20230914183831.587273-2-john.ogness@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 9915753037eb ("serial: sc16is7xx: fix unconditional activation of THRI interrupt") Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/serial_core.h | 79 +++++++++++++++++++++++++++++++++++++ 1 file changed, 79 insertions(+)
diff --git a/include/linux/serial_core.h b/include/linux/serial_core.h index bb6f073bc159..f1d5c0d1568c 100644 --- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -588,6 +588,85 @@ struct uart_port { void *private_data; /* generic platform data pointer */ };
+/** + * uart_port_lock - Lock the UART port + * @up: Pointer to UART port structure + */ +static inline void uart_port_lock(struct uart_port *up) +{ + spin_lock(&up->lock); +} + +/** + * uart_port_lock_irq - Lock the UART port and disable interrupts + * @up: Pointer to UART port structure + */ +static inline void uart_port_lock_irq(struct uart_port *up) +{ + spin_lock_irq(&up->lock); +} + +/** + * uart_port_lock_irqsave - Lock the UART port, save and disable interrupts + * @up: Pointer to UART port structure + * @flags: Pointer to interrupt flags storage + */ +static inline void uart_port_lock_irqsave(struct uart_port *up, unsigned long *flags) +{ + spin_lock_irqsave(&up->lock, *flags); +} + +/** + * uart_port_trylock - Try to lock the UART port + * @up: Pointer to UART port structure + * + * Returns: True if lock was acquired, false otherwise + */ +static inline bool uart_port_trylock(struct uart_port *up) +{ + return spin_trylock(&up->lock); +} + +/** + * uart_port_trylock_irqsave - Try to lock the UART port, save and disable interrupts + * @up: Pointer to UART port structure + * @flags: Pointer to interrupt flags storage + * + * Returns: True if lock was acquired, false otherwise + */ +static inline bool uart_port_trylock_irqsave(struct uart_port *up, unsigned long *flags) +{ + return spin_trylock_irqsave(&up->lock, *flags); +} + +/** + * uart_port_unlock - Unlock the UART port + * @up: Pointer to UART port structure + */ +static inline void uart_port_unlock(struct uart_port *up) +{ + spin_unlock(&up->lock); +} + +/** + * uart_port_unlock_irq - Unlock the UART port and re-enable interrupts + * @up: Pointer to UART port structure + */ +static inline void uart_port_unlock_irq(struct uart_port *up) +{ + spin_unlock_irq(&up->lock); +} + +/** + * uart_port_lock_irqrestore - Unlock the UART port, restore interrupts + * @up: Pointer to UART port structure + * @flags: The saved interrupt flags for restore + */ +static inline void uart_port_unlock_irqrestore(struct uart_port *up, unsigned long flags) +{ + spin_unlock_irqrestore(&up->lock, flags); +} + static inline int serial_port_in(struct uart_port *up, int offset) { return up->serial_in(up, offset);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit b465848be8a652e2c5fefe102661fb660cff8497 ]
When a serial port is used for kernel console output, then all modifications to the UART registers which are done from other contexts, e.g. getty, termios, are interference points for the kernel console.
So far this has been ignored and the printk output is based on the principle of hope. The rework of the console infrastructure which aims to support threaded and atomic consoles, requires to mark sections which modify the UART registers as unsafe. This allows the atomic write function to make informed decisions and eventually to restore operational state. It also allows to prevent the regular UART code from modifying UART registers while printk output is in progress.
All modifications of UART registers are guarded by the UART port lock, which provides an obvious synchronization point with the console infrastructure.
To avoid adding this functionality to all UART drivers, wrap the spin_[un]lock*() invocations for uart_port::lock into helper functions which just contain the spin_[un]lock*() invocations for now. In a subsequent step these helpers will gain the console synchronization mechanisms.
Converted with coccinelle. No functional change.
Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: John Ogness john.ogness@linutronix.de Link: https://lore.kernel.org/r/20230914183831.587273-56-john.ogness@linutronix.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Stable-dep-of: 9915753037eb ("serial: sc16is7xx: fix unconditional activation of THRI interrupt") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/sc16is7xx.c | 40 +++++++++++++++++----------------- 1 file changed, 20 insertions(+), 20 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 3276d21ec1c6..425093ce3f24 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -642,9 +642,9 @@ static void sc16is7xx_handle_tx(struct uart_port *port) }
if (uart_circ_empty(xmit) || uart_tx_stopped(port)) { - spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); sc16is7xx_stop_tx(port); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); return; }
@@ -670,13 +670,13 @@ static void sc16is7xx_handle_tx(struct uart_port *port) sc16is7xx_fifo_write(port, to_send); }
- spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); if (uart_circ_chars_pending(xmit) < WAKEUP_CHARS) uart_write_wakeup(port);
if (uart_circ_empty(xmit)) sc16is7xx_stop_tx(port); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); }
static unsigned int sc16is7xx_get_hwmctrl(struct uart_port *port) @@ -707,7 +707,7 @@ static void sc16is7xx_update_mlines(struct sc16is7xx_one *one)
one->old_mctrl = status;
- spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); if ((changed & TIOCM_RNG) && (status & TIOCM_RNG)) port->icount.rng++; if (changed & TIOCM_DSR) @@ -718,7 +718,7 @@ static void sc16is7xx_update_mlines(struct sc16is7xx_one *one) uart_handle_cts_change(port, status & TIOCM_CTS);
wake_up_interruptible(&port->state->port.delta_msr_wait); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); }
static bool sc16is7xx_port_irq(struct sc16is7xx_port *s, int portno) @@ -812,9 +812,9 @@ static void sc16is7xx_tx_proc(struct kthread_work *ws) sc16is7xx_handle_tx(port); mutex_unlock(&one->efr_lock);
- spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); sc16is7xx_ier_set(port, SC16IS7XX_IER_THRI_BIT); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); }
static void sc16is7xx_reconf_rs485(struct uart_port *port) @@ -825,14 +825,14 @@ static void sc16is7xx_reconf_rs485(struct uart_port *port) struct serial_rs485 *rs485 = &port->rs485; unsigned long irqflags;
- spin_lock_irqsave(&port->lock, irqflags); + uart_port_lock_irqsave(port, &irqflags); if (rs485->flags & SER_RS485_ENABLED) { efcr |= SC16IS7XX_EFCR_AUTO_RS485_BIT;
if (rs485->flags & SER_RS485_RTS_AFTER_SEND) efcr |= SC16IS7XX_EFCR_RTS_INVERT_BIT; } - spin_unlock_irqrestore(&port->lock, irqflags); + uart_port_unlock_irqrestore(port, irqflags);
sc16is7xx_port_update(port, SC16IS7XX_EFCR_REG, mask, efcr); } @@ -843,10 +843,10 @@ static void sc16is7xx_reg_proc(struct kthread_work *ws) struct sc16is7xx_one_config config; unsigned long irqflags;
- spin_lock_irqsave(&one->port.lock, irqflags); + uart_port_lock_irqsave(&one->port, &irqflags); config = one->config; memset(&one->config, 0, sizeof(one->config)); - spin_unlock_irqrestore(&one->port.lock, irqflags); + uart_port_unlock_irqrestore(&one->port, irqflags);
if (config.flags & SC16IS7XX_RECONF_MD) { u8 mcr = 0; @@ -952,18 +952,18 @@ static void sc16is7xx_throttle(struct uart_port *port) * value set in MCR register. Stop reading data from RX FIFO so the * AutoRTS feature will de-activate RTS output. */ - spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); sc16is7xx_ier_clear(port, SC16IS7XX_IER_RDI_BIT); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); }
static void sc16is7xx_unthrottle(struct uart_port *port) { unsigned long flags;
- spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); sc16is7xx_ier_set(port, SC16IS7XX_IER_RDI_BIT); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); }
static unsigned int sc16is7xx_tx_empty(struct uart_port *port) @@ -1101,7 +1101,7 @@ static void sc16is7xx_set_termios(struct uart_port *port, /* Setup baudrate generator */ baud = sc16is7xx_set_baud(port, baud);
- spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags);
/* Update timeout according to new baud rate */ uart_update_timeout(port, termios->c_cflag, baud); @@ -1109,7 +1109,7 @@ static void sc16is7xx_set_termios(struct uart_port *port, if (UART_ENABLE_MS(port, termios->c_cflag)) sc16is7xx_enable_ms(port);
- spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags); }
static int sc16is7xx_config_rs485(struct uart_port *port, struct ktermios *termios, @@ -1195,9 +1195,9 @@ static int sc16is7xx_startup(struct uart_port *port) sc16is7xx_port_write(port, SC16IS7XX_IER_REG, val);
/* Enable modem status polling */ - spin_lock_irqsave(&port->lock, flags); + uart_port_lock_irqsave(port, &flags); sc16is7xx_enable_ms(port); - spin_unlock_irqrestore(&port->lock, flags); + uart_port_unlock_irqrestore(port, flags);
return 0; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hugo Villeneuve hvilleneuve@dimonoff.com
[ Upstream commit 9915753037eba7135b209fef4f2afeca841af816 ]
Commit cc4c1d05eb10 ("sc16is7xx: Properly resume TX after stop") changed behavior to unconditionnaly set the THRI interrupt in sc16is7xx_tx_proc().
For example when sending a 65 bytes message, and assuming the Tx FIFO is initially empty, sc16is7xx_handle_tx() will write the first 64 bytes of the message to the FIFO and sc16is7xx_tx_proc() will then activate THRI. When the THRI IRQ is fired, the driver will write the remaining byte of the message to the FIFO, and disable THRI by calling sc16is7xx_stop_tx().
When sending a 2 bytes message, sc16is7xx_handle_tx() will write the 2 bytes of the message to the FIFO and call sc16is7xx_stop_tx(), disabling THRI. After sc16is7xx_handle_tx() exits, control returns to sc16is7xx_tx_proc() which will unconditionally set THRI. When the THRI IRQ is fired, the driver simply acknowledges the interrupt and does nothing more, since all the data has already been written to the FIFO. This results in 2 register writes and 4 register reads all for nothing and taking precious cycles from the I2C/SPI bus.
Fix this by enabling the THRI interrupt only when we fill the Tx FIFO to its maximum capacity and there are remaining bytes to send in the message.
Fixes: cc4c1d05eb10 ("sc16is7xx: Properly resume TX after stop") Cc: stable@vger.kernel.org Signed-off-by: Hugo Villeneuve hvilleneuve@dimonoff.com Link: https://lore.kernel.org/r/20231211171353.2901416-7-hugo@hugovil.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/sc16is7xx.c | 7 ++----- 1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c index 425093ce3f24..f75b8bceb8ca 100644 --- a/drivers/tty/serial/sc16is7xx.c +++ b/drivers/tty/serial/sc16is7xx.c @@ -676,6 +676,8 @@ static void sc16is7xx_handle_tx(struct uart_port *port)
if (uart_circ_empty(xmit)) sc16is7xx_stop_tx(port); + else + sc16is7xx_ier_set(port, SC16IS7XX_IER_THRI_BIT); uart_port_unlock_irqrestore(port, flags); }
@@ -802,7 +804,6 @@ static void sc16is7xx_tx_proc(struct kthread_work *ws) { struct uart_port *port = &(to_sc16is7xx_one(ws, tx_work)->port); struct sc16is7xx_one *one = to_sc16is7xx_one(port, port); - unsigned long flags;
if ((port->rs485.flags & SER_RS485_ENABLED) && (port->rs485.delay_rts_before_send > 0)) @@ -811,10 +812,6 @@ static void sc16is7xx_tx_proc(struct kthread_work *ws) mutex_lock(&one->efr_lock); sc16is7xx_handle_tx(port); mutex_unlock(&one->efr_lock); - - uart_port_lock_irqsave(port, &flags); - sc16is7xx_ier_set(port, SC16IS7XX_IER_THRI_BIT); - uart_port_unlock_irqrestore(port, flags); }
static void sc16is7xx_reconf_rs485(struct uart_port *port)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
[ Upstream commit b271fee9a41ca1474d30639fd6cc912c9901d0f8 ]
Factor out prepare_allocation_zoned() for further extension. While at it, optimize the if-branch a bit.
Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 02444f2ac26e ("btrfs: zoned: optimize hint byte for zoned allocator") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent-tree.c | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index d421e289dc73..69abb6eb81df 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4139,6 +4139,24 @@ static int prepare_allocation_clustered(struct btrfs_fs_info *fs_info, return 0; }
+static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info, + struct find_free_extent_ctl *ffe_ctl) +{ + if (ffe_ctl->for_treelog) { + spin_lock(&fs_info->treelog_bg_lock); + if (fs_info->treelog_bg) + ffe_ctl->hint_byte = fs_info->treelog_bg; + spin_unlock(&fs_info->treelog_bg_lock); + } else if (ffe_ctl->for_data_reloc) { + spin_lock(&fs_info->relocation_bg_lock); + if (fs_info->data_reloc_bg) + ffe_ctl->hint_byte = fs_info->data_reloc_bg; + spin_unlock(&fs_info->relocation_bg_lock); + } + + return 0; +} + static int prepare_allocation(struct btrfs_fs_info *fs_info, struct find_free_extent_ctl *ffe_ctl, struct btrfs_space_info *space_info, @@ -4149,19 +4167,7 @@ static int prepare_allocation(struct btrfs_fs_info *fs_info, return prepare_allocation_clustered(fs_info, ffe_ctl, space_info, ins); case BTRFS_EXTENT_ALLOC_ZONED: - if (ffe_ctl->for_treelog) { - spin_lock(&fs_info->treelog_bg_lock); - if (fs_info->treelog_bg) - ffe_ctl->hint_byte = fs_info->treelog_bg; - spin_unlock(&fs_info->treelog_bg_lock); - } - if (ffe_ctl->for_data_reloc) { - spin_lock(&fs_info->relocation_bg_lock); - if (fs_info->data_reloc_bg) - ffe_ctl->hint_byte = fs_info->data_reloc_bg; - spin_unlock(&fs_info->relocation_bg_lock); - } - return 0; + return prepare_allocation_zoned(fs_info, ffe_ctl); default: BUG(); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Naohiro Aota naohiro.aota@wdc.com
[ Upstream commit 02444f2ac26eae6385a65fcd66915084d15dffba ]
Writing sequentially to a huge file on btrfs on a SMR HDD revealed a decline of the performance (220 MiB/s to 30 MiB/s after 500 minutes).
The performance goes down because of increased latency of the extent allocation, which is induced by a traversing of a lot of full block groups.
So, this patch optimizes the ffe_ctl->hint_byte by choosing a block group with sufficient size from the active block group list, which does not contain full block groups.
After applying the patch, the performance is maintained well.
Fixes: 2eda57089ea3 ("btrfs: zoned: implement sequential extent allocation") CC: stable@vger.kernel.org # 5.15+ Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Naohiro Aota naohiro.aota@wdc.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/extent-tree.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+)
diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 69abb6eb81df..b89b558b1592 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -4152,6 +4152,24 @@ static int prepare_allocation_zoned(struct btrfs_fs_info *fs_info, if (fs_info->data_reloc_bg) ffe_ctl->hint_byte = fs_info->data_reloc_bg; spin_unlock(&fs_info->relocation_bg_lock); + } else if (ffe_ctl->flags & BTRFS_BLOCK_GROUP_DATA) { + struct btrfs_block_group *block_group; + + spin_lock(&fs_info->zone_active_bgs_lock); + list_for_each_entry(block_group, &fs_info->zone_active_bgs, active_bg_list) { + /* + * No lock is OK here because avail is monotinically + * decreasing, and this is just a hint. + */ + u64 avail = block_group->zone_capacity - block_group->alloc_offset; + + if (block_group_bits(block_group, ffe_ctl->flags) && + avail >= ffe_ctl->num_bytes) { + ffe_ctl->hint_byte = block_group->start; + break; + } + } + spin_unlock(&fs_info->zone_active_bgs_lock); }
return 0;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mika Kahola mika.kahola@intel.com
[ Upstream commit a2cd15c2411624a7a97bad60d98d7e0a1e5002a6 ]
Watchdog timers for Lunarlake HW were removed for PSR/PSR2 The patch removes the use of these timers from the driver code.
BSpec: 69895
v2: Reword commit message (Ville) Drop HPD mask from LNL (Ville) Revise masking logic (Jouni) v3: Revise commit message (Ville) Revert HPD mask removal as irrelevant for this patch (Ville)
Signed-off-by: Mika Kahola mika.kahola@intel.com Reviewed-by: Jouni Högander jouni.hogander@intel.com Signed-off-by: Jouni Högander jouni.hogander@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20231010095233.590613-1-mika.k... Stable-dep-of: f9f031dd21a7 ("drm/i915/psr: Only allow PSR in LPSP mode on HSW non-ULT") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/display/intel_psr.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c index 97d5eef10130..848ac483259b 100644 --- a/drivers/gpu/drm/i915/display/intel_psr.c +++ b/drivers/gpu/drm/i915/display/intel_psr.c @@ -674,7 +674,9 @@ static void hsw_activate_psr1(struct intel_dp *intel_dp)
val |= EDP_PSR_IDLE_FRAMES(psr_compute_idle_frames(intel_dp));
- val |= EDP_PSR_MAX_SLEEP_TIME(max_sleep_time); + if (DISPLAY_VER(dev_priv) < 20) + val |= EDP_PSR_MAX_SLEEP_TIME(max_sleep_time); + if (IS_HASWELL(dev_priv)) val |= EDP_PSR_MIN_LINK_ENTRY_TIME_8_LINES;
@@ -1399,8 +1401,10 @@ static void intel_psr_enable_source(struct intel_dp *intel_dp, */ mask = EDP_PSR_DEBUG_MASK_MEMUP | EDP_PSR_DEBUG_MASK_HPD | - EDP_PSR_DEBUG_MASK_LPSP | - EDP_PSR_DEBUG_MASK_MAX_SLEEP; + EDP_PSR_DEBUG_MASK_LPSP; + + if (DISPLAY_VER(dev_priv) < 20) + mask |= EDP_PSR_DEBUG_MASK_MAX_SLEEP;
/* * No separate pipe reg write mask on hsw/bdw, so have to unmask all
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Ville Syrjälä ville.syrjala@linux.intel.com
[ Upstream commit f9f031dd21a7ce13a13862fa5281d32e1029c70f ]
On HSW non-ULT (or at least on Dell Latitude E6540) external displays start to flicker when we enable PSR on the eDP. We observe a much higher SR and PC6 residency than should be possible with an external display, and indeen much higher than what we observe with eDP disabled and only the external display enabled. Looks like the hardware is somehow ignoring the fact that the external display is active during PSR.
I wasn't able to redproduce this on my HSW ULT machine, or BDW. So either there's something specific about this particular laptop (eg. some unknown firmware thing) or the issue is limited to just non-ULT HSW systems. All known registers that could affect this look perfectly reasonable on the affected machine.
As a workaround let's unmask the LPSP event to prevent PSR entry except while in LPSP mode (only pipe A + eDP active). This will prevent PSR entry entirely when multiple pipes are active. The one slight downside is that we now also prevent PSR entry when driving eDP with pipe B or C, but I think that's a reasonable tradeoff to avoid having to implement a more complex workaround.
Cc: stable@vger.kernel.org Fixes: 783d8b80871f ("drm/i915/psr: Re-enable PSR1 on hsw/bdw") Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10092 Signed-off-by: Ville Syrjälä ville.syrjala@linux.intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20240118212131.31868-1-ville.s... Reviewed-by: Jouni Högander jouni.hogander@intel.com (cherry picked from commit 94501c3ca6400e463ff6cc0c9cf4a2feb6a9205d) Signed-off-by: Joonas Lahtinen joonas.lahtinen@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/display/intel_psr.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_psr.c b/drivers/gpu/drm/i915/display/intel_psr.c index 848ac483259b..5cf3db7058b9 100644 --- a/drivers/gpu/drm/i915/display/intel_psr.c +++ b/drivers/gpu/drm/i915/display/intel_psr.c @@ -1400,8 +1400,18 @@ static void intel_psr_enable_source(struct intel_dp *intel_dp, * can rely on frontbuffer tracking. */ mask = EDP_PSR_DEBUG_MASK_MEMUP | - EDP_PSR_DEBUG_MASK_HPD | - EDP_PSR_DEBUG_MASK_LPSP; + EDP_PSR_DEBUG_MASK_HPD; + + /* + * For some unknown reason on HSW non-ULT (or at least on + * Dell Latitude E6540) external displays start to flicker + * when PSR is enabled on the eDP. SR/PC6 residency is much + * higher than should be possible with an external display. + * As a workaround leave LPSP unmasked to prevent PSR entry + * when external displays are active. + */ + if (DISPLAY_VER(dev_priv) >= 8 || IS_HASWELL_ULT(dev_priv)) + mask |= EDP_PSR_DEBUG_MASK_LPSP;
if (DISPLAY_VER(dev_priv) < 20) mask |= EDP_PSR_DEBUG_MASK_MAX_SLEEP;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Sheng-Liang Pan sheng-liang.pan@quanta.corp-partner.google.com
[ Upstream commit 3db2420422a5912d97966e0176050bb0fc9aa63e ]
Add panel identification entry for - AUO B116XTN02 family (product ID:0x235c) - BOE NT116WHM-N21,836X2 (product ID:0x09c3) - BOE NV116WHM-N49 V8.0 (product ID:0x0979)
Signed-off-by: Sheng-Liang Pan sheng-liang.pan@quanta.corp-partner.google.com Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20231027110435.1.Ia01fe9ec1c09... Stable-dep-of: fc6e76792965 ("drm/panel-edp: drm/panel-edp: Fix AUO B116XAK01 name and timing") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-edp.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c index 95c8472d878a..5bf28c8443ef 100644 --- a/drivers/gpu/drm/panel/panel-edp.c +++ b/drivers/gpu/drm/panel/panel-edp.c @@ -1840,6 +1840,7 @@ static const struct edp_panel_entry edp_panels[] = { EDP_PANEL_ENTRY('A', 'U', 'O', 0x145c, &delay_200_500_e50, "B116XAB01.4"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x1e9b, &delay_200_500_e50, "B133UAN02.1"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x1ea5, &delay_200_500_e50, "B116XAK01.6"), + EDP_PANEL_ENTRY('A', 'U', 'O', 0x235c, &delay_200_500_e50, "B116XTN02"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x405c, &auo_b116xak01.delay, "B116XAK01"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x582d, &delay_200_500_e50, "B133UAN01.0"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x615c, &delay_200_500_e50, "B116XAN06.1"), @@ -1848,8 +1849,10 @@ static const struct edp_panel_entry edp_panels[] = { EDP_PANEL_ENTRY('B', 'O', 'E', 0x0786, &delay_200_500_p2e80, "NV116WHM-T01"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x07d1, &boe_nv133fhm_n61.delay, "NV133FHM-N61"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x082d, &boe_nv133fhm_n61.delay, "NV133FHM-N62"), + EDP_PANEL_ENTRY('B', 'O', 'E', 0x09c3, &delay_200_500_e50, "NT116WHM-N21,836X2"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x094b, &delay_200_500_e50, "NT116WHM-N21"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x095f, &delay_200_500_e50, "NE135FBM-N41 v8.1"), + EDP_PANEL_ENTRY('B', 'O', 'E', 0x0979, &delay_200_500_e50, "NV116WHM-N49 V8.0"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x098d, &boe_nv110wtm_n61.delay, "NV110WTM-N61"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x09dd, &delay_200_500_e50, "NT116WHM-N21"), EDP_PANEL_ENTRY('B', 'O', 'E', 0x0a5d, &delay_200_500_e50, "NV116WHM-N45"),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hsin-Yi Wang hsinyi@chromium.org
[ Upstream commit fc6e7679296530106ee0954e8ddef1aa58b2e0b5 ]
Rename AUO 0x405c B116XAK01 to B116XAK01.0 and adjust the timing of auo_b116xak01: T3=200, T12=500, T7_max = 50 according to decoding edid and datasheet.
Fixes: da458286a5e2 ("drm/panel: Add support for AUO B116XAK01 panel") Cc: stable@vger.kernel.org Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Acked-by: Maxime Ripard mripard@kernel.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-2-hsiny... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-edp.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c index 5bf28c8443ef..e93e54a98260 100644 --- a/drivers/gpu/drm/panel/panel-edp.c +++ b/drivers/gpu/drm/panel/panel-edp.c @@ -973,6 +973,8 @@ static const struct panel_desc auo_b116xak01 = { }, .delay = { .hpd_absent = 200, + .unprepare = 500, + .enable = 50, }, };
@@ -1841,7 +1843,7 @@ static const struct edp_panel_entry edp_panels[] = { EDP_PANEL_ENTRY('A', 'U', 'O', 0x1e9b, &delay_200_500_e50, "B133UAN02.1"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x1ea5, &delay_200_500_e50, "B116XAK01.6"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x235c, &delay_200_500_e50, "B116XTN02"), - EDP_PANEL_ENTRY('A', 'U', 'O', 0x405c, &auo_b116xak01.delay, "B116XAK01"), + EDP_PANEL_ENTRY('A', 'U', 'O', 0x405c, &auo_b116xak01.delay, "B116XAK01.0"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x582d, &delay_200_500_e50, "B133UAN01.0"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x615c, &delay_200_500_e50, "B116XAN06.1"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x8594, &delay_200_500_e50, "B133UAN01.0"),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hsin-Yi Wang hsinyi@chromium.org
[ Upstream commit 962845c090c4f85fa4f6872a5b6c89ee61f53cc0 ]
Rename AUO 0x235c B116XTN02 to B116XTN02.3 according to decoding edid.
Fixes: 3db2420422a5 ("drm/panel-edp: Add AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 V8.0") Cc: stable@vger.kernel.org Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Acked-by: Maxime Ripard mripard@kernel.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-3-hsiny... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-edp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panel/panel-edp.c b/drivers/gpu/drm/panel/panel-edp.c index e93e54a98260..7dc6fb7308ce 100644 --- a/drivers/gpu/drm/panel/panel-edp.c +++ b/drivers/gpu/drm/panel/panel-edp.c @@ -1842,7 +1842,7 @@ static const struct edp_panel_entry edp_panels[] = { EDP_PANEL_ENTRY('A', 'U', 'O', 0x145c, &delay_200_500_e50, "B116XAB01.4"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x1e9b, &delay_200_500_e50, "B133UAN02.1"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x1ea5, &delay_200_500_e50, "B116XAK01.6"), - EDP_PANEL_ENTRY('A', 'U', 'O', 0x235c, &delay_200_500_e50, "B116XTN02"), + EDP_PANEL_ENTRY('A', 'U', 'O', 0x235c, &delay_200_500_e50, "B116XTN02.3"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x405c, &auo_b116xak01.delay, "B116XAK01.0"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x582d, &delay_200_500_e50, "B133UAN01.0"), EDP_PANEL_ENTRY('A', 'U', 'O', 0x615c, &delay_200_500_e50, "B116XAN06.1"),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 03ff6d7238b77e5fb2b85dc5fe01d2db9eb893bd ]
This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Updated firmware is also required for AQL.
Reviewed-by: Feifei Xu Feifei.Xu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 34dc3d5bbf35..c2b9dfc6451d 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c @@ -6572,7 +6572,7 @@ static int gfx_v10_0_compute_mqd_init(struct amdgpu_device *adev, void *m, #ifdef __BIG_ENDIAN tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, ENDIAN_SWAP, 1); #endif - tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0); + tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 1); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH, 0); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c index 8b7fed913526..22cbfa1bdadd 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v10.c @@ -170,6 +170,7 @@ static void update_mqd(struct mqd_manager *mm, void *mqd, m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT; m->cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1; + m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK; pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Alex Deucher alexander.deucher@amd.com
[ Upstream commit 3380fcad2c906872110d31ddf7aa1fdea57f9df6 ]
This needs to be set to 1 to avoid a potential deadlock in the GC 10.x and newer. On GC 9.x and older, this needs to be set to 0. This can lead to hangs in some mixed graphics and compute workloads. Updated firmware is also required for AQL.
Reviewed-by: Feifei Xu Feifei.Xu@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c | 1 + 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 59efd4ece92d..d0c3ec9f4fb6 100644 --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c @@ -3807,7 +3807,7 @@ static int gfx_v11_0_compute_mqd_init(struct amdgpu_device *adev, void *m, (order_base_2(prop->queue_size / 4) - 1)); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, RPTR_BLOCK_SIZE, (order_base_2(AMDGPU_GPU_PAGE_SIZE / 4) - 1)); - tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 0); + tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, UNORD_DISPATCH, 1); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, TUNNEL_DISPATCH, 0); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, PRIV_STATE, 1); tmp = REG_SET_FIELD(tmp, CP_HQD_PQ_CONTROL, KMD_QUEUE, 1); diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c index 15277f1d5cf0..d722cbd31783 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v11.c @@ -224,6 +224,7 @@ static void update_mqd(struct mqd_manager *mm, void *mqd, m->cp_hqd_pq_control = 5 << CP_HQD_PQ_CONTROL__RPTR_BLOCK_SIZE__SHIFT; m->cp_hqd_pq_control |= ffs(q->queue_size / sizeof(unsigned int)) - 1 - 1; + m->cp_hqd_pq_control |= CP_HQD_PQ_CONTROL__UNORD_DISPATCH_MASK; pr_debug("cp_hqd_pq_control 0x%x\n", m->cp_hqd_pq_control);
m->cp_hqd_pq_base_lo = lower_32_bits((uint64_t)q->queue_address >> 8);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Douglas Anderson dianders@chromium.org
[ Upstream commit 024b32db43a359e0ded3fcc6cd86247cbbed4224 ]
Unlike what is claimed in commit f5aa7d46b0ee ("drm/bridge: parade-ps8640: Provide wait_hpd_asserted() in struct drm_dp_aux"), if someone manually tries to do an AUX transfer (like via `i2cdump ${bus} 0x50 i`) while the panel is off we don't just get a simple transfer error. Instead, the whole ps8640 gets thrown for a loop and goes into a bad state.
Let's put the function to wait for the HPD (and the magical 50 ms after first reset) back in when we're doing an AUX transfer. This shouldn't actually make things much slower (assuming the panel is on) because we should immediately poll and see the HPD high. Mostly this is just an extra i2c transfer to the bridge.
Fixes: f5aa7d46b0ee ("drm/bridge: parade-ps8640: Provide wait_hpd_asserted() in struct drm_dp_aux") Tested-by: Pin-yen Lin treapking@chromium.org Reviewed-by: Pin-yen Lin treapking@chromium.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20231221135548.1.I10f326a9305d... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/parade-ps8640.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c index 541e4f5afc4c..fb5e9ae9ad81 100644 --- a/drivers/gpu/drm/bridge/parade-ps8640.c +++ b/drivers/gpu/drm/bridge/parade-ps8640.c @@ -346,6 +346,11 @@ static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux, int ret;
pm_runtime_get_sync(dev); + ret = _ps8640_wait_hpd_asserted(ps_bridge, 200 * 1000); + if (ret) { + pm_runtime_put_sync_suspend(dev); + return ret; + } ret = ps8640_aux_transfer_msg(aux, msg); pm_runtime_mark_last_busy(dev); pm_runtime_put_autosuspend(dev);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Markus Niebel Markus.Niebel@ew.tq-group.com
[ Upstream commit 45dd7df26cee741b31c25ffdd44fb8794eb45ccd ]
The DE signal is active high on this display, fill in the missing bus_flags. This aligns panel_desc with its display_timing.
Fixes: 9a2654c0f62a ("drm/panel: Add and fill drm_panel type field") Fixes: b3bfcdf8a3b6 ("drm/panel: simple: add Tianma TM070JVHG33")
Signed-off-by: Markus Niebel Markus.Niebel@ew.tq-group.com Signed-off-by: Alexander Stein alexander.stein@ew.tq-group.com Reviewed-by: Sam Ravnborg sam@ravnborg.org Link: https://lore.kernel.org/r/20231012084208.2731650-1-alexander.stein@ew.tq-gro... Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20231012084208.2731650-1-alexa... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-simple.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/panel/panel-simple.c b/drivers/gpu/drm/panel/panel-simple.c index 6e46e55d29a9..51f838befb32 100644 --- a/drivers/gpu/drm/panel/panel-simple.c +++ b/drivers/gpu/drm/panel/panel-simple.c @@ -3782,6 +3782,7 @@ static const struct panel_desc tianma_tm070jdhg30 = { }, .bus_format = MEDIA_BUS_FMT_RGB888_1X7X4_SPWG, .connector_type = DRM_MODE_CONNECTOR_LVDS, + .bus_flags = DRM_BUS_FLAG_DE_HIGH, };
static const struct panel_desc tianma_tm070jvhg33 = { @@ -3794,6 +3795,7 @@ static const struct panel_desc tianma_tm070jvhg33 = { }, .bus_format = MEDIA_BUS_FMT_RGB888_1X7X4_SPWG, .connector_type = DRM_MODE_CONNECTOR_LVDS, + .bus_flags = DRM_BUS_FLAG_DE_HIGH, };
static const struct display_timing tianma_tm070rvhg71_timing = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artur Weber aweber.kernel@gmail.com
[ Upstream commit 62b143b5ec4a14e1ae0dede5aabaf1832e3b0073 ]
It turns out that I had misconfigured the device I was using the panel with; the bus data polarity is not high for this panel, I had to change the config on the display controller's side.
Fix the panel config to properly reflect its accurate settings.
Fixes: 6810bb390282 ("drm/panel: Add Samsung S6D7AA0 panel controller driver") Reviewed-by: Jessica Zhang quic_jesszhan@quicinc.com Signed-off-by: Artur Weber aweber.kernel@gmail.com Link: https://lore.kernel.org/r/20240105-tab3-display-fixes-v2-2-904d1207bf6f@gmai... Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240105-tab3-display-fixes-v2... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c b/drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c index ea5a85779382..f23d8832a1ad 100644 --- a/drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c +++ b/drivers/gpu/drm/panel/panel-samsung-s6d7aa0.c @@ -309,7 +309,7 @@ static const struct s6d7aa0_panel_desc s6d7aa0_lsl080al02_desc = { .off_func = s6d7aa0_lsl080al02_off, .drm_mode = &s6d7aa0_lsl080al02_mode, .mode_flags = MIPI_DSI_MODE_VSYNC_FLUSH | MIPI_DSI_MODE_VIDEO_NO_HFP, - .bus_flags = DRM_BUS_FLAG_DE_HIGH, + .bus_flags = 0,
.has_backlight = false, .use_passwd3 = false,
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
[ Upstream commit 08ac6f132dd77e40f786d8af51140c96c6d739c9 ]
A null pointer dereference crash has been observed rarely on TI platforms using sii9022 bridge:
[ 53.271356] sii902x_get_edid+0x34/0x70 [sii902x] [ 53.276066] sii902x_bridge_get_edid+0x14/0x20 [sii902x] [ 53.281381] drm_bridge_get_edid+0x20/0x34 [drm] [ 53.286305] drm_bridge_connector_get_modes+0x8c/0xcc [drm_kms_helper] [ 53.292955] drm_helper_probe_single_connector_modes+0x190/0x538 [drm_kms_helper] [ 53.300510] drm_client_modeset_probe+0x1f0/0xbd4 [drm] [ 53.305958] __drm_fb_helper_initial_config_and_unlock+0x50/0x510 [drm_kms_helper] [ 53.313611] drm_fb_helper_initial_config+0x48/0x58 [drm_kms_helper] [ 53.320039] drm_fbdev_dma_client_hotplug+0x84/0xd4 [drm_dma_helper] [ 53.326401] drm_client_register+0x5c/0xa0 [drm] [ 53.331216] drm_fbdev_dma_setup+0xc8/0x13c [drm_dma_helper] [ 53.336881] tidss_probe+0x128/0x264 [tidss] [ 53.341174] platform_probe+0x68/0xc4 [ 53.344841] really_probe+0x188/0x3c4 [ 53.348501] __driver_probe_device+0x7c/0x16c [ 53.352854] driver_probe_device+0x3c/0x10c [ 53.357033] __device_attach_driver+0xbc/0x158 [ 53.361472] bus_for_each_drv+0x88/0xe8 [ 53.365303] __device_attach+0xa0/0x1b4 [ 53.369135] device_initial_probe+0x14/0x20 [ 53.373314] bus_probe_device+0xb0/0xb4 [ 53.377145] deferred_probe_work_func+0xcc/0x124 [ 53.381757] process_one_work+0x1f0/0x518 [ 53.385770] worker_thread+0x1e8/0x3dc [ 53.389519] kthread+0x11c/0x120 [ 53.392750] ret_from_fork+0x10/0x20
The issue here is as follows:
- tidss probes, but is deferred as sii902x is still missing. - sii902x starts probing and enters sii902x_init(). - sii902x calls drm_bridge_add(). Now the sii902x bridge is ready from DRM's perspective. - sii902x calls sii902x_audio_codec_init() and platform_device_register_data() - The registration of the audio platform device causes probing of the deferred devices. - tidss probes, which eventually causes sii902x_bridge_get_edid() to be called. - sii902x_bridge_get_edid() tries to use the i2c to read the edid. However, the sii902x driver has not set up the i2c part yet, leading to the crash.
Fix this by moving the drm_bridge_add() to the end of the sii902x_init(), which is also at the very end of sii902x_probe().
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Fixes: 21d808405fe4 ("drm/bridge/sii902x: Fix EDID readback") Acked-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20240103-si902x-fixes-v1-1-b9fd3e448411@ideasonboa... Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240103-si902x-fixes-v1-1-b9f... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/sii902x.c | 29 ++++++++++++++++------------- 1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/gpu/drm/bridge/sii902x.c b/drivers/gpu/drm/bridge/sii902x.c index 2bdc5b439beb..69da73e414a9 100644 --- a/drivers/gpu/drm/bridge/sii902x.c +++ b/drivers/gpu/drm/bridge/sii902x.c @@ -1080,16 +1080,6 @@ static int sii902x_init(struct sii902x *sii902x) return ret; }
- sii902x->bridge.funcs = &sii902x_bridge_funcs; - sii902x->bridge.of_node = dev->of_node; - sii902x->bridge.timings = &default_sii902x_timings; - sii902x->bridge.ops = DRM_BRIDGE_OP_DETECT | DRM_BRIDGE_OP_EDID; - - if (sii902x->i2c->irq > 0) - sii902x->bridge.ops |= DRM_BRIDGE_OP_HPD; - - drm_bridge_add(&sii902x->bridge); - sii902x_audio_codec_init(sii902x, dev);
i2c_set_clientdata(sii902x->i2c, sii902x); @@ -1102,7 +1092,21 @@ static int sii902x_init(struct sii902x *sii902x) return -ENOMEM;
sii902x->i2cmux->priv = sii902x; - return i2c_mux_add_adapter(sii902x->i2cmux, 0, 0, 0); + ret = i2c_mux_add_adapter(sii902x->i2cmux, 0, 0, 0); + if (ret) + return ret; + + sii902x->bridge.funcs = &sii902x_bridge_funcs; + sii902x->bridge.of_node = dev->of_node; + sii902x->bridge.timings = &default_sii902x_timings; + sii902x->bridge.ops = DRM_BRIDGE_OP_DETECT | DRM_BRIDGE_OP_EDID; + + if (sii902x->i2c->irq > 0) + sii902x->bridge.ops |= DRM_BRIDGE_OP_HPD; + + drm_bridge_add(&sii902x->bridge); + + return 0; }
static int sii902x_probe(struct i2c_client *client) @@ -1170,12 +1174,11 @@ static int sii902x_probe(struct i2c_client *client) }
static void sii902x_remove(struct i2c_client *client) - { struct sii902x *sii902x = i2c_get_clientdata(client);
- i2c_mux_del_adapters(sii902x->i2cmux); drm_bridge_remove(&sii902x->bridge); + i2c_mux_del_adapters(sii902x->i2cmux); }
static const struct of_device_id sii902x_dt_ids[] = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tomi Valkeinen tomi.valkeinen@ideasonboard.com
[ Upstream commit 3fc6c76a8d208d3955c9e64b382d0ff370bc61fc ]
The driver never unregisters the audio codec platform device, which can lead to a crash on module reloading, nor does it handle the return value from sii902x_audio_codec_init().
Signed-off-by: Tomi Valkeinen tomi.valkeinen@ideasonboard.com Fixes: ff5781634c41 ("drm/bridge: sii902x: Implement HDMI audio support") Cc: Jyri Sarha jsarha@ti.com Acked-by: Linus Walleij linus.walleij@linaro.org Link: https://lore.kernel.org/r/20240103-si902x-fixes-v1-2-b9fd3e448411@ideasonboa... Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20240103-si902x-fixes-v1-2-b9f... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/sii902x.c | 21 +++++++++++++++++---- 1 file changed, 17 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/bridge/sii902x.c b/drivers/gpu/drm/bridge/sii902x.c index 69da73e414a9..4560ae9cbce1 100644 --- a/drivers/gpu/drm/bridge/sii902x.c +++ b/drivers/gpu/drm/bridge/sii902x.c @@ -1080,7 +1080,9 @@ static int sii902x_init(struct sii902x *sii902x) return ret; }
- sii902x_audio_codec_init(sii902x, dev); + ret = sii902x_audio_codec_init(sii902x, dev); + if (ret) + return ret;
i2c_set_clientdata(sii902x->i2c, sii902x);
@@ -1088,13 +1090,15 @@ static int sii902x_init(struct sii902x *sii902x) 1, 0, I2C_MUX_GATE, sii902x_i2c_bypass_select, sii902x_i2c_bypass_deselect); - if (!sii902x->i2cmux) - return -ENOMEM; + if (!sii902x->i2cmux) { + ret = -ENOMEM; + goto err_unreg_audio; + }
sii902x->i2cmux->priv = sii902x; ret = i2c_mux_add_adapter(sii902x->i2cmux, 0, 0, 0); if (ret) - return ret; + goto err_unreg_audio;
sii902x->bridge.funcs = &sii902x_bridge_funcs; sii902x->bridge.of_node = dev->of_node; @@ -1107,6 +1111,12 @@ static int sii902x_init(struct sii902x *sii902x) drm_bridge_add(&sii902x->bridge);
return 0; + +err_unreg_audio: + if (!PTR_ERR_OR_ZERO(sii902x->audio.pdev)) + platform_device_unregister(sii902x->audio.pdev); + + return ret; }
static int sii902x_probe(struct i2c_client *client) @@ -1179,6 +1189,9 @@ static void sii902x_remove(struct i2c_client *client)
drm_bridge_remove(&sii902x->bridge); i2c_mux_del_adapters(sii902x->i2cmux); + + if (!PTR_ERR_OR_ZERO(sii902x->audio.pdev)) + platform_device_unregister(sii902x->audio.pdev); }
static const struct of_device_id sii902x_dt_ids[] = {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Pin-yen Lin treapking@chromium.org
[ Upstream commit 26db46bc9c675e43230cc6accd110110a7654299 ]
The ps8640 bridge seems to expect everything to be power cycled at the disable process, but sometimes ps8640_aux_transfer() holds the runtime PM reference and prevents the bridge from suspend.
Prevent that by introducing a mutex lock between ps8640_aux_transfer() and .post_disable() to make sure the bridge is really powered off.
Fixes: 826cff3f7ebb ("drm/bridge: parade-ps8640: Enable runtime power management") Signed-off-by: Pin-yen Lin treapking@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20240109120528.1292601-1-treap... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/parade-ps8640.c | 16 ++++++++++++++++ 1 file changed, 16 insertions(+)
diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c index fb5e9ae9ad81..166bfc725ef4 100644 --- a/drivers/gpu/drm/bridge/parade-ps8640.c +++ b/drivers/gpu/drm/bridge/parade-ps8640.c @@ -107,6 +107,7 @@ struct ps8640 { struct device_link *link; bool pre_enabled; bool need_post_hpd_delay; + struct mutex aux_lock; };
static const struct regmap_config ps8640_regmap_config[] = { @@ -345,6 +346,7 @@ static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux, struct device *dev = &ps_bridge->page[PAGE0_DP_CNTL]->dev; int ret;
+ mutex_lock(&ps_bridge->aux_lock); pm_runtime_get_sync(dev); ret = _ps8640_wait_hpd_asserted(ps_bridge, 200 * 1000); if (ret) { @@ -354,6 +356,7 @@ static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux, ret = ps8640_aux_transfer_msg(aux, msg); pm_runtime_mark_last_busy(dev); pm_runtime_put_autosuspend(dev); + mutex_unlock(&ps_bridge->aux_lock);
return ret; } @@ -475,7 +478,18 @@ static void ps8640_atomic_post_disable(struct drm_bridge *bridge, ps_bridge->pre_enabled = false;
ps8640_bridge_vdo_control(ps_bridge, DISABLE); + + /* + * The bridge seems to expect everything to be power cycled at the + * disable process, so grab a lock here to make sure + * ps8640_aux_transfer() is not holding a runtime PM reference and + * preventing the bridge from suspend. + */ + mutex_lock(&ps_bridge->aux_lock); + pm_runtime_put_sync_suspend(&ps_bridge->page[PAGE0_DP_CNTL]->dev); + + mutex_unlock(&ps_bridge->aux_lock); }
static int ps8640_bridge_attach(struct drm_bridge *bridge, @@ -624,6 +638,8 @@ static int ps8640_probe(struct i2c_client *client) if (!ps_bridge) return -ENOMEM;
+ mutex_init(&ps_bridge->aux_lock); + ps_bridge->supplies[0].supply = "vdd12"; ps_bridge->supplies[1].supply = "vdd33"; ret = devm_regulator_bulk_get(dev, ARRAY_SIZE(ps_bridge->supplies),
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Douglas Anderson dianders@chromium.org
[ Upstream commit a20f1b02bafcbf5a32d96a1d4185d6981cf7d016 ]
After commit 26db46bc9c67 ("drm/bridge: parade-ps8640: Ensure bridge is suspended in .post_disable()"), if we hit the error case in ps8640_aux_transfer() then we return without dropping the mutex. Fix this oversight.
Fixes: 26db46bc9c67 ("drm/bridge: parade-ps8640: Ensure bridge is suspended in .post_disable()") Reviewed-by: Hsin-Yi Wang hsinyi@chromium.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20240117103502.1.Ib726a0184913... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/parade-ps8640.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/parade-ps8640.c b/drivers/gpu/drm/bridge/parade-ps8640.c index 166bfc725ef4..14d4dcf239da 100644 --- a/drivers/gpu/drm/bridge/parade-ps8640.c +++ b/drivers/gpu/drm/bridge/parade-ps8640.c @@ -351,11 +351,13 @@ static ssize_t ps8640_aux_transfer(struct drm_dp_aux *aux, ret = _ps8640_wait_hpd_asserted(ps_bridge, 200 * 1000); if (ret) { pm_runtime_put_sync_suspend(dev); - return ret; + goto exit; } ret = ps8640_aux_transfer_msg(aux, msg); pm_runtime_mark_last_busy(dev); pm_runtime_put_autosuspend(dev); + +exit: mutex_unlock(&ps_bridge->aux_lock);
return ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Yajun Deng yajun.deng@linux.dev
[ Upstream commit 6a9531c3a88096a26cf3ac582f7ec44f94a7dcb2 ]
After commit 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") nid of a reserved region is used by init_reserved_page() (with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y) to access node strucure. In many cases the nid of the reserved memory is not set and this causes a crash.
When the nid of a reserved region is not set, fall back to early_pfn_to_nid(), so that nid of the first_online_node will be passed to init_reserved_page().
Fixes: 61167ad5fecd ("mm: pass nid to reserve_bootmem_region()") Signed-off-by: Yajun Deng yajun.deng@linux.dev Link: https://lore.kernel.org/r/20240118061853.2652295-1-yajun.deng@linux.dev [rppt: massaged the commit message] Signed-off-by: Mike Rapoport (IBM) rppt@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/memblock.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/mm/memblock.c b/mm/memblock.c index 913b2520a9a0..6d18485571b4 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -2119,6 +2119,9 @@ static void __init memmap_init_reserved_pages(void) start = region->base; end = start + region->size;
+ if (nid == NUMA_NO_NODE || nid >= MAX_NUMNODES) + nid = early_pfn_to_nid(PFN_DOWN(start)); + reserve_bootmem_region(start, end, nid); } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Arnd Bergmann arnd@arndb.de
[ Upstream commit 960b537e91725bcb17dd1b19e48950e62d134078 ]
gcc rightfully complains about excessive stack usage in the fimd_win_set_pixfmt() function:
drivers/gpu/drm/exynos/exynos_drm_fimd.c: In function 'fimd_win_set_pixfmt': drivers/gpu/drm/exynos/exynos_drm_fimd.c:750:1: error: the frame size of 1032 bytes is larger than 1024 byte drivers/gpu/drm/exynos/exynos5433_drm_decon.c: In function 'decon_win_set_pixfmt': drivers/gpu/drm/exynos/exynos5433_drm_decon.c:381:1: error: the frame size of 1032 bytes is larger than 1024 bytes
There is really no reason to copy the large exynos_drm_plane structure to the stack before using one of its members, so just use a pointer instead.
Fixes: 6f8ee5c21722 ("drm/exynos: fimd: Make plane alpha configurable") Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Inki Dae inki.dae@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/exynos/exynos5433_drm_decon.c | 4 ++-- drivers/gpu/drm/exynos/exynos_drm_fimd.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c index 4d986077738b..bce027552474 100644 --- a/drivers/gpu/drm/exynos/exynos5433_drm_decon.c +++ b/drivers/gpu/drm/exynos/exynos5433_drm_decon.c @@ -319,9 +319,9 @@ static void decon_win_set_bldmod(struct decon_context *ctx, unsigned int win, static void decon_win_set_pixfmt(struct decon_context *ctx, unsigned int win, struct drm_framebuffer *fb) { - struct exynos_drm_plane plane = ctx->planes[win]; + struct exynos_drm_plane *plane = &ctx->planes[win]; struct exynos_drm_plane_state *state = - to_exynos_plane_state(plane.base.state); + to_exynos_plane_state(plane->base.state); unsigned int alpha = state->base.alpha; unsigned int pixel_alpha; unsigned long val; diff --git a/drivers/gpu/drm/exynos/exynos_drm_fimd.c b/drivers/gpu/drm/exynos/exynos_drm_fimd.c index 8dde7b1e9b35..5bdc246f5fad 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_fimd.c +++ b/drivers/gpu/drm/exynos/exynos_drm_fimd.c @@ -661,9 +661,9 @@ static void fimd_win_set_bldmod(struct fimd_context *ctx, unsigned int win, static void fimd_win_set_pixfmt(struct fimd_context *ctx, unsigned int win, struct drm_framebuffer *fb, int width) { - struct exynos_drm_plane plane = ctx->planes[win]; + struct exynos_drm_plane *plane = &ctx->planes[win]; struct exynos_drm_plane_state *state = - to_exynos_plane_state(plane.base.state); + to_exynos_plane_state(plane->base.state); uint32_t pixel_format = fb->format->format; unsigned int alpha = state->base.alpha; u32 val = WINCONx_ENWIN;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Fedor Pchelkin pchelkin@ispras.ru
[ Upstream commit 4050957c7c2c14aa795dbf423b4180d5ac04e113 ]
Do not forget to call clk_disable_unprepare() on the first element of ctx->clocks array.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 8b7d3ec83aba ("drm/exynos: gsc: Convert driver to IPP v2 core API") Signed-off-by: Fedor Pchelkin pchelkin@ispras.ru Reviewed-by: Marek Szyprowski m.szyprowski@samsung.com Signed-off-by: Inki Dae inki.dae@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/exynos/exynos_drm_gsc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_gsc.c b/drivers/gpu/drm/exynos/exynos_drm_gsc.c index 34cdabc30b4f..5302bebbe38c 100644 --- a/drivers/gpu/drm/exynos/exynos_drm_gsc.c +++ b/drivers/gpu/drm/exynos/exynos_drm_gsc.c @@ -1342,7 +1342,7 @@ static int __maybe_unused gsc_runtime_resume(struct device *dev) for (i = 0; i < ctx->num_clocks; i++) { ret = clk_prepare_enable(ctx->clocks[i]); if (ret) { - while (--i > 0) + while (--i >= 0) clk_disable_unprepare(ctx->clocks[i]); return ret; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit e8ef4bbe39b9576a73f104f6af743fb9c7b624ba ]
When storing opps by level or index use xa_insert() instead of xa_store() and add error-checking to spot bad duplicates indexes possibly wrongly provided by the platform firmware.
Fixes: 31c7c1397a33 ("firmware: arm_scmi: Add v3.2 perf level indexing mode support") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20240108185050.1628687-1-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/perf.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/arm_scmi/perf.c b/drivers/firmware/arm_scmi/perf.c index e887fd169043..dd344506b0a3 100644 --- a/drivers/firmware/arm_scmi/perf.c +++ b/drivers/firmware/arm_scmi/perf.c @@ -347,8 +347,8 @@ process_response_opp(struct scmi_opp *opp, unsigned int loop_idx, }
static inline void -process_response_opp_v4(struct perf_dom_info *dom, struct scmi_opp *opp, - unsigned int loop_idx, +process_response_opp_v4(struct device *dev, struct perf_dom_info *dom, + struct scmi_opp *opp, unsigned int loop_idx, const struct scmi_msg_resp_perf_describe_levels_v4 *r) { opp->perf = le32_to_cpu(r->opp[loop_idx].perf_val); @@ -359,10 +359,23 @@ process_response_opp_v4(struct perf_dom_info *dom, struct scmi_opp *opp, /* Note that PERF v4 reports always five 32-bit words */ opp->indicative_freq = le32_to_cpu(r->opp[loop_idx].indicative_freq); if (dom->level_indexing_mode) { + int ret; + opp->level_index = le32_to_cpu(r->opp[loop_idx].level_index);
- xa_store(&dom->opps_by_idx, opp->level_index, opp, GFP_KERNEL); - xa_store(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL); + ret = xa_insert(&dom->opps_by_idx, opp->level_index, opp, + GFP_KERNEL); + if (ret) + dev_warn(dev, + "Failed to add opps_by_idx at %d - ret:%d\n", + opp->level_index, ret); + + ret = xa_insert(&dom->opps_by_lvl, opp->perf, opp, GFP_KERNEL); + if (ret) + dev_warn(dev, + "Failed to add opps_by_lvl at %d - ret:%d\n", + opp->perf, ret); + hash_add(dom->opps_by_freq, &opp->hash, opp->indicative_freq); } } @@ -379,7 +392,7 @@ iter_perf_levels_process_response(const struct scmi_protocol_handle *ph, if (PROTOCOL_REV_MAJOR(p->version) <= 0x3) process_response_opp(opp, st->loop_idx, response); else - process_response_opp_v4(p->perf_dom, opp, st->loop_idx, + process_response_opp_v4(ph->dev, p->perf_dom, opp, st->loop_idx, response); p->perf_dom->opp_count++;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Cristian Marussi cristian.marussi@arm.com
[ Upstream commit b5dc0ffd36560dbadaed9a3d9fd7838055d62d74 ]
Use xa_insert() when saving per-channel raw queues to better check for duplicates.
Fixes: 7860701d1e6e ("firmware: arm_scmi: Add per-channel raw injection support") Signed-off-by: Cristian Marussi cristian.marussi@arm.com Link: https://lore.kernel.org/r/20240108185050.1628687-2-cristian.marussi@arm.com Signed-off-by: Sudeep Holla sudeep.holla@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/firmware/arm_scmi/raw_mode.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/drivers/firmware/arm_scmi/raw_mode.c b/drivers/firmware/arm_scmi/raw_mode.c index 0493aa3c12bf..350573518503 100644 --- a/drivers/firmware/arm_scmi/raw_mode.c +++ b/drivers/firmware/arm_scmi/raw_mode.c @@ -1111,7 +1111,6 @@ static int scmi_raw_mode_setup(struct scmi_raw_mode_info *raw, int i;
for (i = 0; i < num_chans; i++) { - void *xret; struct scmi_raw_queue *q;
q = scmi_raw_queue_init(raw); @@ -1120,13 +1119,12 @@ static int scmi_raw_mode_setup(struct scmi_raw_mode_info *raw, goto err_xa; }
- xret = xa_store(&raw->chans_q, channels[i], q, + ret = xa_insert(&raw->chans_q, channels[i], q, GFP_KERNEL); - if (xa_err(xret)) { + if (ret) { dev_err(dev, "Fail to allocate Raw queue 0x%02X\n", channels[i]); - ret = xa_err(xret); goto err_xa; } } @@ -1322,6 +1320,12 @@ void scmi_raw_message_report(void *r, struct scmi_xfer *xfer, dev = raw->handle->dev; q = scmi_raw_queue_select(raw, idx, SCMI_XFER_IS_CHAN_SET(xfer) ? chan_id : 0); + if (!q) { + dev_warn(dev, + "RAW[%d] - NO queue for chan 0x%X. Dropping report.\n", + idx, chan_id); + return; + }
/* * Grab the msg_q_lock upfront to avoid a possible race between
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Wenhua Lin Wenhua.Lin@unisoc.com
[ Upstream commit 84aef4ed59705585d629e81d633a83b7d416f5fb ]
The raw interrupt status of eic maybe set before the interrupt is enabled, since the eic interrupt has a latch function, which would trigger the interrupt event once enabled it from user side. To solve this problem, interrupts generated before setting the interrupt trigger type are ignored.
Fixes: 25518e024e3a ("gpio: Add Spreadtrum EIC driver support") Acked-by: Chunyan Zhang zhang.lyra@gmail.com Signed-off-by: Wenhua Lin Wenhua.Lin@unisoc.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-eic-sprd.c | 32 ++++++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-)
diff --git a/drivers/gpio/gpio-eic-sprd.c b/drivers/gpio/gpio-eic-sprd.c index 5320cf1de89c..b24e349deed5 100644 --- a/drivers/gpio/gpio-eic-sprd.c +++ b/drivers/gpio/gpio-eic-sprd.c @@ -321,20 +321,27 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type) switch (flow_type) { case IRQ_TYPE_LEVEL_HIGH: sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IEV, 1); + sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IC, 1); break; case IRQ_TYPE_LEVEL_LOW: sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IEV, 0); + sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IC, 1); break; case IRQ_TYPE_EDGE_RISING: case IRQ_TYPE_EDGE_FALLING: case IRQ_TYPE_EDGE_BOTH: state = sprd_eic_get(chip, offset); - if (state) + if (state) { sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IEV, 0); - else + sprd_eic_update(chip, offset, + SPRD_EIC_DBNC_IC, 1); + } else { sprd_eic_update(chip, offset, SPRD_EIC_DBNC_IEV, 1); + sprd_eic_update(chip, offset, + SPRD_EIC_DBNC_IC, 1); + } break; default: return -ENOTSUPP; @@ -346,20 +353,27 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type) switch (flow_type) { case IRQ_TYPE_LEVEL_HIGH: sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTPOL, 0); + sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTCLR, 1); break; case IRQ_TYPE_LEVEL_LOW: sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTPOL, 1); + sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTCLR, 1); break; case IRQ_TYPE_EDGE_RISING: case IRQ_TYPE_EDGE_FALLING: case IRQ_TYPE_EDGE_BOTH: state = sprd_eic_get(chip, offset); - if (state) + if (state) { sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTPOL, 0); - else + sprd_eic_update(chip, offset, + SPRD_EIC_LATCH_INTCLR, 1); + } else { sprd_eic_update(chip, offset, SPRD_EIC_LATCH_INTPOL, 1); + sprd_eic_update(chip, offset, + SPRD_EIC_LATCH_INTCLR, 1); + } break; default: return -ENOTSUPP; @@ -373,29 +387,34 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type) sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 1); + sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1); irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_EDGE_FALLING: sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 0); + sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1); irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_EDGE_BOTH: sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 1); + sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1); irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_LEVEL_HIGH: sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 1); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 1); + sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1); irq_set_handler_locked(data, handle_level_irq); break; case IRQ_TYPE_LEVEL_LOW: sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTMODE, 1); sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTPOL, 0); + sprd_eic_update(chip, offset, SPRD_EIC_ASYNC_INTCLR, 1); irq_set_handler_locked(data, handle_level_irq); break; default: @@ -408,29 +427,34 @@ static int sprd_eic_irq_set_type(struct irq_data *data, unsigned int flow_type) sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 1); + sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1); irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_EDGE_FALLING: sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 0); + sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1); irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_EDGE_BOTH: sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 1); + sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1); irq_set_handler_locked(data, handle_edge_irq); break; case IRQ_TYPE_LEVEL_HIGH: sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 1); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 1); + sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1); irq_set_handler_locked(data, handle_level_irq); break; case IRQ_TYPE_LEVEL_LOW: sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTBOTH, 0); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTMODE, 1); sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTPOL, 0); + sprd_eic_update(chip, offset, SPRD_EIC_SYNC_INTCLR, 1); irq_set_handler_locked(data, handle_level_irq); break; default:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Artur Weber aweber.kernel@gmail.com
[ Upstream commit eab4f56d3e75dad697acf8dc2c8be3c341d6c63e ]
After more investigation, I've found that it's not the panel driver config that needs to be modified to invert the data polarity, but the FIMD config.
Add the missing invert-vclk option that is required to get the display to work correctly.
Fixes: ee37a457af1d ("ARM: dts: exynos: Add Samsung Galaxy Tab 3 8.0 boards") Signed-off-by: Artur Weber aweber.kernel@gmail.com Link: https://lore.kernel.org/r/20240105-tab3-display-fixes-v2-1-904d1207bf6f@gmai... Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi b/arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi index ce81e42bf5eb..39469b708f91 100644 --- a/arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi +++ b/arch/arm/boot/dts/samsung/exynos4212-tab3.dtsi @@ -435,6 +435,7 @@ &exynos_usbphy { };
&fimd { + samsung,invert-vclk; status = "okay"; };
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mika Westerberg mika.westerberg@linux.intel.com
[ Upstream commit 6c314425b9ef6b247cefd0903e287eb072580c3b ]
Turns out this "SoC" side controller does not support certain commands, such as reading chip JEDEC ID, so the controller is pretty much unusable in Linux. We should be using the "PCH" side controller instead. For this reason remove this PCI ID from the list.
Fixes: c2912d42e86e ("spi: intel-pci: Add support for Meteor Lake-S SPI serial flash") Signed-off-by: Mika Westerberg mika.westerberg@linux.intel.com Link: https://msgid.link/r/20240122120034.2664812-2-mika.westerberg@linux.intel.co... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-intel-pci.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/drivers/spi/spi-intel-pci.c b/drivers/spi/spi-intel-pci.c index 57d767a68e7b..b9918dcc3802 100644 --- a/drivers/spi/spi-intel-pci.c +++ b/drivers/spi/spi-intel-pci.c @@ -84,7 +84,6 @@ static const struct pci_device_id intel_spi_pci_ids[] = { { PCI_VDEVICE(INTEL, 0xa2a4), (unsigned long)&cnl_info }, { PCI_VDEVICE(INTEL, 0xa324), (unsigned long)&cnl_info }, { PCI_VDEVICE(INTEL, 0xa3a4), (unsigned long)&cnl_info }, - { PCI_VDEVICE(INTEL, 0xae23), (unsigned long)&cnl_info }, { }, }; MODULE_DEVICE_TABLE(pci, intel_spi_pci_ids);
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Li Lingfeng lilingfeng3@huawei.com
[ Upstream commit 7777f47f2ea64efd1016262e7b59fab34adfb869 ]
Commit 1a721de8489f ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART") prevented all operations about partitions on disks with GENHD_FL_NO_PART in blkpg_do_ioctl() since they are meaningless. However, it changed error code in some scenarios. So move checking GENHD_FL_NO_PART to bdev_add_partition() to eliminate impact.
Fixes: 1a721de8489f ("block: don't add or resize partition on the disk with GENHD_FL_NO_PART") Reported-by: Allison Karlitskaya allison.karlitskaya@redhat.com Closes: https://lore.kernel.org/all/CAOYeF9VsmqKMcQjo1k6YkGNujwN-nzfxY17N3F-CMikE1tY... Signed-off-by: Li Lingfeng lilingfeng3@huawei.com Reviewed-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20240118130401.792757-1-lilingfeng@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/ioctl.c | 2 -- block/partitions/core.c | 5 +++++ 2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c index a74ef911e279..d1d8e8391279 100644 --- a/block/ioctl.c +++ b/block/ioctl.c @@ -20,8 +20,6 @@ static int blkpg_do_ioctl(struct block_device *bdev, struct blkpg_partition p; sector_t start, length;
- if (disk->flags & GENHD_FL_NO_PART) - return -EINVAL; if (!capable(CAP_SYS_ADMIN)) return -EACCES; if (copy_from_user(&p, upart, sizeof(struct blkpg_partition))) diff --git a/block/partitions/core.c b/block/partitions/core.c index e137a87f4db0..e58c8b50350b 100644 --- a/block/partitions/core.c +++ b/block/partitions/core.c @@ -458,6 +458,11 @@ int bdev_add_partition(struct gendisk *disk, int partno, sector_t start, goto out; }
+ if (disk->flags & GENHD_FL_NO_PART) { + ret = -EINVAL; + goto out; + } + if (partition_overlaps(disk, start, length, -1)) { ret = -EBUSY; goto out;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Hsin-Yi Wang hsinyi@chromium.org
[ Upstream commit 4d5b7daa3c610af3f322ad1e91fc0c752ff32f0e ]
Similar to commit 26db46bc9c67 ("drm/bridge: parade-ps8640: Ensure bridge is suspended in .post_disable()"). Add a mutex to ensure that aux transfer won't race with atomic_disable by holding the PM reference and prevent the bridge from suspend.
Also we need to use pm_runtime_put_sync_suspend() to suspend the bridge instead of idle with pm_runtime_put_sync().
Fixes: 3203e497eb76 ("drm/bridge: anx7625: Synchronously run runtime suspend.") Fixes: adca62ec370c ("drm/bridge: anx7625: Support reading edid through aux channel") Signed-off-by: Hsin-Yi Wang hsinyi@chromium.org Tested-by: Xuxin Xiong xuxinxiong@huaqin.corp-partner.google.com Reviewed-by: Pin-yen Lin treapking@chromium.org Reviewed-by: Douglas Anderson dianders@chromium.org Signed-off-by: Douglas Anderson dianders@chromium.org Link: https://patchwork.freedesktop.org/patch/msgid/20240118015916.2296741-1-hsiny... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/analogix/anx7625.c | 7 ++++++- drivers/gpu/drm/bridge/analogix/anx7625.h | 2 ++ 2 files changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.c b/drivers/gpu/drm/bridge/analogix/anx7625.c index 51abe42c639e..5168628f11cf 100644 --- a/drivers/gpu/drm/bridge/analogix/anx7625.c +++ b/drivers/gpu/drm/bridge/analogix/anx7625.c @@ -1741,6 +1741,7 @@ static ssize_t anx7625_aux_transfer(struct drm_dp_aux *aux, u8 request = msg->request & ~DP_AUX_I2C_MOT; int ret = 0;
+ mutex_lock(&ctx->aux_lock); pm_runtime_get_sync(dev); msg->reply = 0; switch (request) { @@ -1757,6 +1758,7 @@ static ssize_t anx7625_aux_transfer(struct drm_dp_aux *aux, msg->size, msg->buffer); pm_runtime_mark_last_busy(dev); pm_runtime_put_autosuspend(dev); + mutex_unlock(&ctx->aux_lock);
return ret; } @@ -2453,7 +2455,9 @@ static void anx7625_bridge_atomic_disable(struct drm_bridge *bridge, ctx->connector = NULL; anx7625_dp_stop(ctx);
- pm_runtime_put_sync(dev); + mutex_lock(&ctx->aux_lock); + pm_runtime_put_sync_suspend(dev); + mutex_unlock(&ctx->aux_lock); }
static enum drm_connector_status @@ -2647,6 +2651,7 @@ static int anx7625_i2c_probe(struct i2c_client *client)
mutex_init(&platform->lock); mutex_init(&platform->hdcp_wq_lock); + mutex_init(&platform->aux_lock);
INIT_DELAYED_WORK(&platform->hdcp_work, hdcp_check_work_func); platform->hdcp_workqueue = create_workqueue("hdcp workqueue"); diff --git a/drivers/gpu/drm/bridge/analogix/anx7625.h b/drivers/gpu/drm/bridge/analogix/anx7625.h index 5af819611ebc..80d3fb4e985f 100644 --- a/drivers/gpu/drm/bridge/analogix/anx7625.h +++ b/drivers/gpu/drm/bridge/analogix/anx7625.h @@ -471,6 +471,8 @@ struct anx7625_data { struct workqueue_struct *hdcp_workqueue; /* Lock for hdcp work queue */ struct mutex hdcp_wq_lock; + /* Lock for aux transfer and disable */ + struct mutex aux_lock; char edid_block; struct display_timing dt; u8 display_timing_valid;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Mario Limonciello mario.limonciello@amd.com
[ Upstream commit 22fb4f041999f5f16ecbda15a2859b4ef4cbf47e ]
Scaling min/max freq values were being cached and lagging a setting each time. Fix the ordering of the clamp call to ensure they work.
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217931 Fixes: febab20caeba ("cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update") Signed-off-by: Mario Limonciello mario.limonciello@amd.com Reviewed-by: Wyes Karny wkarny@gmail.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpufreq/amd-pstate.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/cpufreq/amd-pstate.c b/drivers/cpufreq/amd-pstate.c index 1f6186475715..1791d37fbc53 100644 --- a/drivers/cpufreq/amd-pstate.c +++ b/drivers/cpufreq/amd-pstate.c @@ -1232,14 +1232,13 @@ static void amd_pstate_epp_update_limit(struct cpufreq_policy *policy) max_limit_perf = div_u64(policy->max * cpudata->highest_perf, cpudata->max_freq); min_limit_perf = div_u64(policy->min * cpudata->highest_perf, cpudata->max_freq);
+ WRITE_ONCE(cpudata->max_limit_perf, max_limit_perf); + WRITE_ONCE(cpudata->min_limit_perf, min_limit_perf); + max_perf = clamp_t(unsigned long, max_perf, cpudata->min_limit_perf, cpudata->max_limit_perf); min_perf = clamp_t(unsigned long, min_perf, cpudata->min_limit_perf, cpudata->max_limit_perf); - - WRITE_ONCE(cpudata->max_limit_perf, max_limit_perf); - WRITE_ONCE(cpudata->min_limit_perf, min_limit_perf); - value = READ_ONCE(cpudata->cppc_req_cached);
if (cpudata->policy == CPUFREQ_POLICY_PERFORMANCE)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Kamal Dasu kamal.dasu@broadcom.com
[ Upstream commit 574bf7bbe83794a902679846770f75a9b7f28176 ]
SFDP read shall use the mspi reads when using the bcm_qspi_exec_mem_op() call. This fixes SFDP parameter page read failures seen with parts that now use SFDP protocol to read the basic flash parameter table.
Fixes: 5f195ee7d830 ("spi: bcm-qspi: Implement the spi_mem interface") Signed-off-by: Kamal Dasu kamal.dasu@broadcom.com Tested-by: Florian Fainelli florian.fainelli@broadcom.com Reviewed-by: Florian Fainelli florian.fainelli@broadcom.com Link: https://msgid.link/r/20240109210033.43249-1-kamal.dasu@broadcom.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-bcm-qspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c index ef08fcac2f6d..0407b91183ca 100644 --- a/drivers/spi/spi-bcm-qspi.c +++ b/drivers/spi/spi-bcm-qspi.c @@ -19,7 +19,7 @@ #include <linux/platform_device.h> #include <linux/slab.h> #include <linux/spi/spi.h> -#include <linux/spi/spi-mem.h> +#include <linux/mtd/spi-nor.h> #include <linux/sysfs.h> #include <linux/types.h> #include "spi-bcm-qspi.h" @@ -1221,7 +1221,7 @@ static int bcm_qspi_exec_mem_op(struct spi_mem *mem,
/* non-aligned and very short transfers are handled by MSPI */ if (!IS_ALIGNED((uintptr_t)addr, 4) || !IS_ALIGNED((uintptr_t)buf, 4) || - len < 4) + len < 4 || op->cmd.opcode == SPINOR_OP_RDSFDP) mspi_read = true;
if (!has_bspi(qspi) || mspi_read)
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Amit Kumar Mahapatra amit.kumar-mahapatra@amd.com
[ Upstream commit 633cd6fe6e1993ba80e0954c2db127a0b1a3e66f ]
In the existing implementation, when executing interleaved write and read operations in the ISR for a transfer length greater than the FIFO size, the TXFIFO write precedes the RXFIFO read. Consequently, the initially received data in the RXFIFO is pushed out and lost, leading to a failure in data integrity. To address this issue, reverse the order of interleaved operations and conduct the RXFIFO read followed by the TXFIFO write.
Fixes: 6afe2ae8dc48 ("spi: spi-cadence: Interleave write of TX and read of RX FIFO") Signed-off-by: Amit Kumar Mahapatra amit.kumar-mahapatra@amd.com Link: https://msgid.link/r/20231218090652.18403-1-amit.kumar-mahapatra@amd.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi-cadence.c | 17 +++++++++-------- 1 file changed, 9 insertions(+), 8 deletions(-)
diff --git a/drivers/spi/spi-cadence.c b/drivers/spi/spi-cadence.c index bd96d8b546cd..5cab7caf4658 100644 --- a/drivers/spi/spi-cadence.c +++ b/drivers/spi/spi-cadence.c @@ -317,6 +317,15 @@ static void cdns_spi_process_fifo(struct cdns_spi *xspi, int ntx, int nrx) xspi->rx_bytes -= nrx;
while (ntx || nrx) { + if (nrx) { + u8 data = cdns_spi_read(xspi, CDNS_SPI_RXD); + + if (xspi->rxbuf) + *xspi->rxbuf++ = data; + + nrx--; + } + if (ntx) { if (xspi->txbuf) cdns_spi_write(xspi, CDNS_SPI_TXD, *xspi->txbuf++); @@ -326,14 +335,6 @@ static void cdns_spi_process_fifo(struct cdns_spi *xspi, int ntx, int nrx) ntx--; }
- if (nrx) { - u8 data = cdns_spi_read(xspi, CDNS_SPI_RXD); - - if (xspi->rxbuf) - *xspi->rxbuf++ = data; - - nrx--; - } } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Shyam Prasad N sprasad@microsoft.com
[ Upstream commit 993d1c346b1a51ac41b2193609a0d4e51e9748f4 ]
A recent change moved the code that decides to skip a channel or disable multichannel entirely, into a helper function.
During this, a mutex_unlock of the session_mutex should have been removed. Doing that here.
Fixes: f591062bdbf4 ("cifs: handle servers that still advertise multichannel after disabling") Signed-off-by: Shyam Prasad N sprasad@microsoft.com Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/smb/client/smb2pdu.c | 1 - 1 file changed, 1 deletion(-)
diff --git a/fs/smb/client/smb2pdu.c b/fs/smb/client/smb2pdu.c index bfec2ca0f4e6..f5006aa97f5b 100644 --- a/fs/smb/client/smb2pdu.c +++ b/fs/smb/client/smb2pdu.c @@ -195,7 +195,6 @@ cifs_chan_skip_or_disable(struct cifs_ses *ses, pserver = server->primary_server; cifs_signal_cifsd_for_reconnect(pserver, false); skip_terminate: - mutex_unlock(&ses->session_mutex); return -EHOSTDOWN; }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: David Lechner dlechner@baylibre.com
[ Upstream commit 8c2ae772fe08e33f3d7a83849e85539320701abd ]
In __spi_pump_transfer_message(), the message was not finalized in the first error return as it is in the other error return paths. Not finalizing the message could cause anything waiting on the message to complete to hang forever.
This adds the missing call to spi_finalize_current_message().
Fixes: ae7d2346dc89 ("spi: Don't use the message queue if possible in spi_sync") Signed-off-by: David Lechner dlechner@baylibre.com Link: https://msgid.link/r/20240125205312.3458541-2-dlechner@baylibre.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/spi/spi.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 399e81d37b3b..1e08cd571d21 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -1624,6 +1624,10 @@ static int __spi_pump_transfer_message(struct spi_controller *ctlr, pm_runtime_put_noidle(ctlr->dev.parent); dev_err(&ctlr->dev, "Failed to power device: %d\n", ret); + + msg->status = ret; + spi_finalize_current_message(ctlr); + return ret; } }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Aleksander Jan Bajkowski olek2@wp.pl
[ Upstream commit 4bf2a626dc4bb46f0754d8ac02ec8584ff114ad5 ]
Lantiq uses a common kernel config for devices with 24Kc and 34Kc cores. The changes made previously to add support for interrupts on all cores work on 24Kc platforms with SMP disabled and 34Kc platforms with SMP enabled. This patch fixes boot issues on Danube (single core 24Kc) with SMP enabled.
Fixes: 730320fd770d ("MIPS: lantiq: enable all hardware interrupts on second VPE") Signed-off-by: Aleksander Jan Bajkowski olek2@wp.pl Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/lantiq/prom.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/arch/mips/lantiq/prom.c b/arch/mips/lantiq/prom.c index a3cf29365858..0c45767eacf6 100644 --- a/arch/mips/lantiq/prom.c +++ b/arch/mips/lantiq/prom.c @@ -108,10 +108,9 @@ void __init prom_init(void) prom_init_cmdline();
#if defined(CONFIG_MIPS_MT_SMP) - if (cpu_has_mipsmt) { - lantiq_smp_ops = vsmp_smp_ops; + lantiq_smp_ops = vsmp_smp_ops; + if (cpu_has_mipsmt) lantiq_smp_ops.init_secondary = lantiq_init_secondary; - register_smp_ops(&lantiq_smp_ops); - } + register_smp_ops(&lantiq_smp_ops); #endif }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Michael Walle mwalle@kernel.org
[ Upstream commit ff3d5d04db07e5374758baa7e877fde8d683ebab ]
The FORCE_STOP_STATE bit is unsuitable to force the DSI link into LP-11 mode. It seems the bridge internally queues DSI packets and when the FORCE_STOP_STATE bit is cleared, they are sent in close succession without any useful timing (this also means that the DSI lanes won't go into LP-11 mode). The length of this gibberish varies between 1ms and 5ms. This sometimes breaks an attached bridge (TI SN65DSI84 in this case). In our case, the bridge will fail in about 1 per 500 reboots.
The FORCE_STOP_STATE handling was introduced to have the DSI lanes in LP-11 state during the .pre_enable phase. But as it turns out, none of this is needed at all. Between samsung_dsim_init() and samsung_dsim_set_display_enable() the lanes are already in LP-11 mode. The code as it was before commit 20c827683de0 ("drm: bridge: samsung-dsim: Fix init during host transfer") and 0c14d3130654 ("drm: bridge: samsung-dsim: Fix i.MX8M enable flow to meet spec") was correct in this regard.
This patch basically reverts both commits. It was tested on an i.MX8M SoC with an SN65DSI84 bridge. The signals were probed and the DSI packets were decoded during initialization and link start-up. After this patch the first DSI packet on the link is a VSYNC packet and the timing is correct.
Command mode between .pre_enable and .enable was also briefly tested by a quick hack. There was no DSI link partner which would have responded, but it was made sure the DSI packet was send on the link. As a side note, the command mode seems to just work in HS mode. I couldn't find that the bridge will handle commands in LP mode.
Fixes: 20c827683de0 ("drm: bridge: samsung-dsim: Fix init during host transfer") Fixes: 0c14d3130654 ("drm: bridge: samsung-dsim: Fix i.MX8M enable flow to meet spec") Signed-off-by: Michael Walle mwalle@kernel.org Signed-off-by: Inki Dae inki.dae@samsung.com Link: https://patchwork.freedesktop.org/patch/msgid/20231113164344.1612602-1-mwall... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/bridge/samsung-dsim.c | 32 ++------------------------- 1 file changed, 2 insertions(+), 30 deletions(-)
diff --git a/drivers/gpu/drm/bridge/samsung-dsim.c b/drivers/gpu/drm/bridge/samsung-dsim.c index 19bdb32dbc9a..f24666b48193 100644 --- a/drivers/gpu/drm/bridge/samsung-dsim.c +++ b/drivers/gpu/drm/bridge/samsung-dsim.c @@ -941,10 +941,6 @@ static int samsung_dsim_init_link(struct samsung_dsim *dsi) reg = samsung_dsim_read(dsi, DSIM_ESCMODE_REG); reg &= ~DSIM_STOP_STATE_CNT_MASK; reg |= DSIM_STOP_STATE_CNT(driver_data->reg_values[STOP_STATE_CNT]); - - if (!samsung_dsim_hw_is_exynos(dsi->plat_data->hw_type)) - reg |= DSIM_FORCE_STOP_STATE; - samsung_dsim_write(dsi, DSIM_ESCMODE_REG, reg);
reg = DSIM_BTA_TIMEOUT(0xff) | DSIM_LPDR_TIMEOUT(0xffff); @@ -1401,18 +1397,6 @@ static void samsung_dsim_disable_irq(struct samsung_dsim *dsi) disable_irq(dsi->irq); }
-static void samsung_dsim_set_stop_state(struct samsung_dsim *dsi, bool enable) -{ - u32 reg = samsung_dsim_read(dsi, DSIM_ESCMODE_REG); - - if (enable) - reg |= DSIM_FORCE_STOP_STATE; - else - reg &= ~DSIM_FORCE_STOP_STATE; - - samsung_dsim_write(dsi, DSIM_ESCMODE_REG, reg); -} - static int samsung_dsim_init(struct samsung_dsim *dsi) { const struct samsung_dsim_driver_data *driver_data = dsi->driver_data; @@ -1462,9 +1446,6 @@ static void samsung_dsim_atomic_pre_enable(struct drm_bridge *bridge, ret = samsung_dsim_init(dsi); if (ret) return; - - samsung_dsim_set_display_mode(dsi); - samsung_dsim_set_display_enable(dsi, true); } }
@@ -1473,12 +1454,8 @@ static void samsung_dsim_atomic_enable(struct drm_bridge *bridge, { struct samsung_dsim *dsi = bridge_to_dsi(bridge);
- if (samsung_dsim_hw_is_exynos(dsi->plat_data->hw_type)) { - samsung_dsim_set_display_mode(dsi); - samsung_dsim_set_display_enable(dsi, true); - } else { - samsung_dsim_set_stop_state(dsi, false); - } + samsung_dsim_set_display_mode(dsi); + samsung_dsim_set_display_enable(dsi, true);
dsi->state |= DSIM_STATE_VIDOUT_AVAILABLE; } @@ -1491,9 +1468,6 @@ static void samsung_dsim_atomic_disable(struct drm_bridge *bridge, if (!(dsi->state & DSIM_STATE_ENABLED)) return;
- if (!samsung_dsim_hw_is_exynos(dsi->plat_data->hw_type)) - samsung_dsim_set_stop_state(dsi, true); - dsi->state &= ~DSIM_STATE_VIDOUT_AVAILABLE; }
@@ -1795,8 +1769,6 @@ static ssize_t samsung_dsim_host_transfer(struct mipi_dsi_host *host, if (ret) return ret;
- samsung_dsim_set_stop_state(dsi, false); - ret = mipi_dsi_create_packet(&xfer.packet, msg); if (ret < 0) return ret;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Quanquan Cao caoqq@fujitsu.com
commit d76779dd3681c01a4c6c3cae4d0627c9083e0ee6 upstream.
Creating a region with 16 memory devices caused a problem. The div_u64_rem function, used for dividing an unsigned 64-bit number by a 32-bit one, faced an issue when SZ_256M * p->interleave_ways. The result surpassed the maximum limit of the 32-bit divisor (4G), leading to an overflow and a remainder of 0. note: At this point, p->interleave_ways is 16, meaning 16 * 256M = 4G
To fix this issue, I replaced the div_u64_rem function with div64_u64_rem and adjusted the type of the remainder.
Signed-off-by: Quanquan Cao caoqq@fujitsu.com Reviewed-by: Dave Jiang dave.jiang@intel.com Fixes: 23a22cd1c98b ("cxl/region: Allocate HPA capacity to regions") Cc: stable@vger.kernel.org Signed-off-by: Dan Williams dan.j.williams@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cxl/core/region.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -525,7 +525,7 @@ static int alloc_hpa(struct cxl_region * struct cxl_root_decoder *cxlrd = to_cxl_root_decoder(cxlr->dev.parent); struct cxl_region_params *p = &cxlr->params; struct resource *res; - u32 remainder = 0; + u64 remainder = 0;
lockdep_assert_held_write(&cxl_region_rwsem);
@@ -545,7 +545,7 @@ static int alloc_hpa(struct cxl_region * (cxlr->mode == CXL_DECODER_PMEM && uuid_is_null(&p->uuid))) return -ENXIO;
- div_u64_rem(size, SZ_256M * p->interleave_ways, &remainder); + div64_u64_rem(size, (u64)SZ_256M * p->interleave_ways, &remainder); if (remainder) return -EINVAL;
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Xi Ruoyao xry111@xry111.site
commit 59be5c35850171e307ca5d3d703ee9ff4096b948 upstream.
If we still own the FPU after initializing fcr31, when we are preempted the dirty value in the FPU will be read out and stored into fcr31, clobbering our setting. This can cause an improper floating-point environment after execve(). For example:
zsh% cat measure.c #include <fenv.h> int main() { return fetestexcept(FE_INEXACT); } zsh% cc measure.c -o measure -lm zsh% echo $((1.0/3)) # raising FE_INEXACT 0.33333333333333331 zsh% while ./measure; do ; done (stopped in seconds)
Call lose_fpu(0) before setting fcr31 to prevent this.
Closes: https://lore.kernel.org/linux-mips/7a6aa1bbdbbe2e63ae96ff163fab0349f58f1b9e.... Fixes: 9b26616c8d9d ("MIPS: Respect the ISA level in FCSR handling") Cc: stable@vger.kernel.org Signed-off-by: Xi Ruoyao xry111@xry111.site Signed-off-by: Thomas Bogendoerfer tsbogend@alpha.franken.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/mips/kernel/elf.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/arch/mips/kernel/elf.c +++ b/arch/mips/kernel/elf.c @@ -11,6 +11,7 @@
#include <asm/cpu-features.h> #include <asm/cpu-info.h> +#include <asm/fpu.h>
#ifdef CONFIG_MIPS_FP_SUPPORT
@@ -309,6 +310,11 @@ void mips_set_personality_nan(struct arc struct cpuinfo_mips *c = &boot_cpu_data; struct task_struct *t = current;
+ /* Do this early so t->thread.fpu.fcr31 won't be clobbered in case + * we are preempted before the lose_fpu(0) in start_thread. + */ + lose_fpu(0); + t->thread.fpu.fcr31 = c->fpu_csr31; switch (state->nan_2008) { case 0:
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Dawei Li dawei.li@shingroup.cn
commit b184c8c2889ceef0a137c7d0567ef9fe3d92276e upstream.
For a CONFIG_SPARSE_IRQ=n kernel, early_irq_init() is supposed to initialize all interrupt descriptors.
It does except for irq_desc::resend_node, which ia only initialized for the first descriptor.
Use the indexed decriptor and not the base pointer to address that.
Fixes: bc06a9e08742 ("genirq: Use hlist for managing resend handlers") Signed-off-by: Dawei Li dawei.li@shingroup.cn Signed-off-by: Thomas Gleixner tglx@linutronix.de Acked-by: Marc Zyngier maz@kernel.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122085716.2999875-5-dawei.li@shingroup.cn Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/irq/irqdesc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/irq/irqdesc.c +++ b/kernel/irq/irqdesc.c @@ -600,7 +600,7 @@ int __init early_irq_init(void) mutex_init(&desc[i].request_mutex); init_waitqueue_head(&desc[i].wait_for_threads); desc_set_defaults(i, &desc[i], node, NULL, NULL); - irq_resend_init(desc); + irq_resend_init(&desc[i]); } return arch_early_irq_init(); }
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Jiri Wiesner jwiesner@suse.de
commit 644649553508b9bacf0fc7a5bdc4f9e0165576a5 upstream.
There have been reports of the watchdog marking clocksources unstable on machines with 8 NUMA nodes:
clocksource: timekeeping watchdog on CPU373: Marking clocksource 'tsc' as unstable because the skew is too large: clocksource: 'hpet' wd_nsec: 14523447520 clocksource: 'tsc' cs_nsec: 14524115132
The measured clocksource skew - the absolute difference between cs_nsec and wd_nsec - was 668 microseconds:
cs_nsec - wd_nsec = 14524115132 - 14523447520 = 667612
The kernel used 200 microseconds for the uncertainty_margin of both the clocksource and watchdog, resulting in a threshold of 400 microseconds (the md variable). Both the cs_nsec and the wd_nsec value indicate that the readout interval was circa 14.5 seconds. The observed behaviour is that watchdog checks failed for large readout intervals on 8 NUMA node machines. This indicates that the size of the skew was directly proportinal to the length of the readout interval on those machines. The measured clocksource skew, 668 microseconds, was evaluated against a threshold (the md variable) that is suited for readout intervals of roughly WATCHDOG_INTERVAL, i.e. HZ >> 1, which is 0.5 second.
The intention of 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") was to tighten the threshold for evaluating skew and set the lower bound for the uncertainty_margin of clocksources to twice WATCHDOG_MAX_SKEW. Later in c37e85c135ce ("clocksource: Loosen clocksource watchdog constraints"), the WATCHDOG_MAX_SKEW constant was increased to 125 microseconds to fit the limit of NTP, which is able to use a clocksource that suffers from up to 500 microseconds of skew per second. Both the TSC and the HPET use default uncertainty_margin. When the readout interval gets stretched the default uncertainty_margin is no longer a suitable lower bound for evaluating skew - it imposes a limit that is far stricter than the skew with which NTP can deal.
The root causes of the skew being directly proportinal to the length of the readout interval are:
* the inaccuracy of the shift/mult pairs of clocksources and the watchdog * the conversion to nanoseconds is imprecise for large readout intervals
Prevent this by skipping the current watchdog check if the readout interval exceeds 2 * WATCHDOG_INTERVAL. Considering the maximum readout interval of 2 * WATCHDOG_INTERVAL, the current default uncertainty margin (of the TSC and HPET) corresponds to a limit on clocksource skew of 250 ppm (microseconds of skew per second). To keep the limit imposed by NTP (500 microseconds of skew per second) for all possible readout intervals, the margins would have to be scaled so that the threshold value is proportional to the length of the actual readout interval.
As for why the readout interval may get stretched: Since the watchdog is executed in softirq context the expiration of the watchdog timer can get severely delayed on account of a ksoftirqd thread not getting to run in a timely manner. Surely, a system with such belated softirq execution is not working well and the scheduling issue should be looked into but the clocksource watchdog should be able to deal with it accordingly.
Fixes: 2e27e793e280 ("clocksource: Reduce clocksource-skew threshold") Suggested-by: Feng Tang feng.tang@intel.com Signed-off-by: Jiri Wiesner jwiesner@suse.de Signed-off-by: Thomas Gleixner tglx@linutronix.de Tested-by: Paul E. McKenney paulmck@kernel.org Reviewed-by: Feng Tang feng.tang@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122172350.GA740@incl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/clocksource.c | 25 ++++++++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-)
--- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -99,6 +99,7 @@ static u64 suspend_start; * Interval: 0.5sec. */ #define WATCHDOG_INTERVAL (HZ >> 1) +#define WATCHDOG_INTERVAL_MAX_NS ((2 * WATCHDOG_INTERVAL) * (NSEC_PER_SEC / HZ))
/* * Threshold: 0.0312s, when doubled: 0.0625s. @@ -134,6 +135,7 @@ static DECLARE_WORK(watchdog_work, clock static DEFINE_SPINLOCK(watchdog_lock); static int watchdog_running; static atomic_t watchdog_reset_pending; +static int64_t watchdog_max_interval;
static inline void clocksource_watchdog_lock(unsigned long *flags) { @@ -399,8 +401,8 @@ static inline void clocksource_reset_wat static void clocksource_watchdog(struct timer_list *unused) { u64 csnow, wdnow, cslast, wdlast, delta; + int64_t wd_nsec, cs_nsec, interval; int next_cpu, reset_pending; - int64_t wd_nsec, cs_nsec; struct clocksource *cs; enum wd_read_status read_ret; unsigned long extra_wait = 0; @@ -470,6 +472,27 @@ static void clocksource_watchdog(struct if (atomic_read(&watchdog_reset_pending)) continue;
+ /* + * The processing of timer softirqs can get delayed (usually + * on account of ksoftirqd not getting to run in a timely + * manner), which causes the watchdog interval to stretch. + * Skew detection may fail for longer watchdog intervals + * on account of fixed margins being used. + * Some clocksources, e.g. acpi_pm, cannot tolerate + * watchdog intervals longer than a few seconds. + */ + interval = max(cs_nsec, wd_nsec); + if (unlikely(interval > WATCHDOG_INTERVAL_MAX_NS)) { + if (system_state > SYSTEM_SCHEDULING && + interval > 2 * watchdog_max_interval) { + watchdog_max_interval = interval; + pr_warn("Long readout interval, skipping watchdog check: cs_nsec: %lld wd_nsec: %lld\n", + cs_nsec, wd_nsec); + } + watchdog_timer.expires = jiffies; + continue; + } + /* Check the deviation from the watchdog clocksource. */ md = cs->uncertainty_margin + watchdog->uncertainty_margin; if (abs(cs_nsec - wd_nsec) > md) {
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Tim Chen tim.c.chen@linux.intel.com
commit 9a574ea9069be30b835a3da772c039993c43369b upstream.
Commit 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") preserved total idle sleep time and iowait sleeptime across CPU hotplug events.
Similar reasoning applies to the number of idle calls and idle sleeps to get the proper average of sleep time per idle invocation.
Preserve those fields too.
Fixes: 71fee48f ("tick-sched: Fix idle and iowait sleeptime accounting vs CPU hotplug") Signed-off-by: Tim Chen tim.c.chen@linux.intel.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240122233534.3094238-1-tim.c.chen@linux.intel.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/time/tick-sched.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/kernel/time/tick-sched.c +++ b/kernel/time/tick-sched.c @@ -1548,6 +1548,7 @@ void tick_cancel_sched_timer(int cpu) { struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu); ktime_t idle_sleeptime, iowait_sleeptime; + unsigned long idle_calls, idle_sleeps;
# ifdef CONFIG_HIGH_RES_TIMERS if (ts->sched_timer.base) @@ -1556,9 +1557,13 @@ void tick_cancel_sched_timer(int cpu)
idle_sleeptime = ts->idle_sleeptime; iowait_sleeptime = ts->iowait_sleeptime; + idle_calls = ts->idle_calls; + idle_sleeps = ts->idle_sleeps; memset(ts, 0, sizeof(*ts)); ts->idle_sleeptime = idle_sleeptime; ts->iowait_sleeptime = iowait_sleeptime; + ts->idle_calls = idle_calls; + ts->idle_sleeps = idle_sleeps; } #endif
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Richard Palethorpe rpalethorpe@suse.com
commit 56062d60f117dccfb5281869e0ab61e090baf864 upstream.
Presently ia32 registers stored in ptregs are unconditionally cast to unsigned int by the ia32 stub. They are then cast to long when passed to __se_sys*, but will not be sign extended.
This takes the sign of the syscall argument into account in the ia32 stub. It still casts to unsigned int to avoid implementation specific behavior. However then casts to int or unsigned int as necessary. So that the following cast to long sign extends the value.
This fixes the io_pgetevents02 LTP test when compiled with -m32. Presently the systemcall io_pgetevents_time64() unexpectedly accepts -1 for the maximum number of events.
It doesn't appear other systemcalls with signed arguments are effected because they all have compat variants defined and wired up.
Fixes: ebeb8c82ffaf ("syscalls/x86: Use 'struct pt_regs' based syscall calling for IA32_EMULATION and x32") Suggested-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Richard Palethorpe rpalethorpe@suse.com Signed-off-by: Nikolay Borisov nik.borisov@suse.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Reviewed-by: Arnd Bergmann arnd@arndb.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20240110130122.3836513-1-nik.borisov@suse.com Link: https://lore.kernel.org/ltp/20210921130127.24131-1-rpalethorpe@suse.com/ Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/include/asm/syscall_wrapper.h | 25 +++++++++++++++++++++---- include/linux/syscalls.h | 1 + 2 files changed, 22 insertions(+), 4 deletions(-)
--- a/arch/x86/include/asm/syscall_wrapper.h +++ b/arch/x86/include/asm/syscall_wrapper.h @@ -58,12 +58,29 @@ extern long __ia32_sys_ni_syscall(const ,,regs->di,,regs->si,,regs->dx \ ,,regs->r10,,regs->r8,,regs->r9) \
+ +/* SYSCALL_PT_ARGS is Adapted from s390x */ +#define SYSCALL_PT_ARG6(m, t1, t2, t3, t4, t5, t6) \ + SYSCALL_PT_ARG5(m, t1, t2, t3, t4, t5), m(t6, (regs->bp)) +#define SYSCALL_PT_ARG5(m, t1, t2, t3, t4, t5) \ + SYSCALL_PT_ARG4(m, t1, t2, t3, t4), m(t5, (regs->di)) +#define SYSCALL_PT_ARG4(m, t1, t2, t3, t4) \ + SYSCALL_PT_ARG3(m, t1, t2, t3), m(t4, (regs->si)) +#define SYSCALL_PT_ARG3(m, t1, t2, t3) \ + SYSCALL_PT_ARG2(m, t1, t2), m(t3, (regs->dx)) +#define SYSCALL_PT_ARG2(m, t1, t2) \ + SYSCALL_PT_ARG1(m, t1), m(t2, (regs->cx)) +#define SYSCALL_PT_ARG1(m, t1) m(t1, (regs->bx)) +#define SYSCALL_PT_ARGS(x, ...) SYSCALL_PT_ARG##x(__VA_ARGS__) + +#define __SC_COMPAT_CAST(t, a) \ + (__typeof(__builtin_choose_expr(__TYPE_IS_L(t), 0, 0U))) \ + (unsigned int)a + /* Mapping of registers to parameters for syscalls on i386 */ #define SC_IA32_REGS_TO_ARGS(x, ...) \ - __MAP(x,__SC_ARGS \ - ,,(unsigned int)regs->bx,,(unsigned int)regs->cx \ - ,,(unsigned int)regs->dx,,(unsigned int)regs->si \ - ,,(unsigned int)regs->di,,(unsigned int)regs->bp) + SYSCALL_PT_ARGS(x, __SC_COMPAT_CAST, \ + __MAP(x, __SC_TYPE, __VA_ARGS__)) \
#define __SYS_STUB0(abi, name) \ long __##abi##_##name(const struct pt_regs *regs); \ --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -125,6 +125,7 @@ struct cachestat; #define __TYPE_IS_LL(t) (__TYPE_AS(t, 0LL) || __TYPE_AS(t, 0ULL)) #define __SC_LONG(t, a) __typeof(__builtin_choose_expr(__TYPE_IS_LL(t), 0LL, 0L)) a #define __SC_CAST(t, a) (__force t) a +#define __SC_TYPE(t, a) t #define __SC_ARGS(t, a) a #define __SC_TEST(t, a) (void)BUILD_BUG_ON_ZERO(!__TYPE_IS_LL(t) && sizeof(t) > sizeof(long))
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Randy Dunlap rdunlap@infradead.org
commit 29bff582b74ed0bdb7e6986482ad9e6799ea4d2f upstream.
Fix the function name to avoid a kernel-doc warning:
include/linux/serial_core.h:666: warning: expecting prototype for uart_port_lock_irqrestore(). Prototype was for uart_port_unlock_irqrestore() instead
Fixes: b0af4bcb4946 ("serial: core: Provide port lock wrappers") Signed-off-by: Randy Dunlap rdunlap@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: John Ogness john.ogness@linutronix.de Cc: linux-serial@vger.kernel.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jirislaby@kernel.org Reviewed-by: John Ogness john.ogness@linutronix.de Link: https://lore.kernel.org/r/20230927044128.4748-1-rdunlap@infradead.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/linux/serial_core.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/serial_core.h +++ b/include/linux/serial_core.h @@ -658,7 +658,7 @@ static inline void uart_port_unlock_irq( }
/** - * uart_port_lock_irqrestore - Unlock the UART port, restore interrupts + * uart_port_unlock_irqrestore - Unlock the UART port, restore interrupts * @up: Pointer to UART port structure * @flags: The saved interrupt flags for restore */
6.6-stable review patch. If anyone has any objections, please let me know.
------------------
From: Rafael J. Wysocki rafael.j.wysocki@intel.com
commit 108ffd12be24ba1d74b3314df8db32a0a6d55ba5 upstream.
The lockdep assertion in thermal_zone_trip_id() triggers when the trip point sysfs attribute of a thermal instance is read, because there is no thermal zone locking in that code path.
This is not verly useful, though, because there is no mechanism by which the location of the trips[] table in a thermal zone or its size can change after binding cooling devices to the trips in that thermal zone and before those cooling devices are unbound from them. Thus it is not in fact necessary to hold the thermal zone lock when thermal_zone_trip_id() is called from trip_point_show() and so the lockdep asserion in the former is invalid.
Accordingly, drop that lockdep assertion.
Fixes: 2c7b4bfadef0 ("thermal: core: Store trip pointer in struct thermal_instance") Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/thermal_trip.c | 2 -- 1 file changed, 2 deletions(-)
--- a/drivers/thermal/thermal_trip.c +++ b/drivers/thermal/thermal_trip.c @@ -178,8 +178,6 @@ int thermal_zone_trip_id(struct thermal_ { int i;
- lockdep_assert_held(&tz->lock); - for (i = 0; i < tz->num_trips; i++) { if (&tz->trips[i] == trip) return i;
Hello,
On Mon, 29 Jan 2024 09:01:04 -0800 Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
This rc kernel passes DAMON functionality test[1] on my test machine. Attaching the test results summary below. Please note that I retrieved the kernel from linux-stable-rc tree[2].
Tested-by: SeongJae Park sj@kernel.org
[1] https://github.com/awslabs/damon-tests/tree/next/corr [2] 1ff49073b88b ("Linux 6.6.15-rc1")
Thanks, SJ
[...]
---
[32m ok 1 selftests: damon: debugfs_attrs.sh ok 2 selftests: damon: debugfs_schemes.sh ok 3 selftests: damon: debugfs_target_ids.sh ok 4 selftests: damon: debugfs_empty_targets.sh ok 5 selftests: damon: debugfs_huge_count_read_write.sh ok 6 selftests: damon: debugfs_duplicate_context_creation.sh ok 7 selftests: damon: debugfs_rm_non_contexts.sh ok 8 selftests: damon: sysfs.sh ok 9 selftests: damon: sysfs_update_removed_scheme_dir.sh ok 10 selftests: damon: reclaim.sh ok 11 selftests: damon: lru_sort.sh ok 1 selftests: damon-tests: kunit.sh ok 2 selftests: damon-tests: huge_count_read_write.sh ok 3 selftests: damon-tests: buffer_overflow.sh ok 4 selftests: damon-tests: rm_contexts.sh ok 5 selftests: damon-tests: record_null_deref.sh ok 6 selftests: damon-tests: dbgfs_target_ids_read_before_terminate_race.sh ok 7 selftests: damon-tests: dbgfs_target_ids_pid_leak.sh ok 8 selftests: damon-tests: damo_tests.sh ok 9 selftests: damon-tests: masim-record.sh ok 10 selftests: damon-tests: build_i386.sh ok 11 selftests: damon-tests: build_arm64.sh ok 12 selftests: damon-tests: build_m68k.sh ok 13 selftests: damon-tests: build_i386_idle_flag.sh ok 14 selftests: damon-tests: build_i386_highpte.sh ok 15 selftests: damon-tests: build_nomemcg.sh [33m [92mPASS [39m _remote_run_corr.sh SUCCESS
On 1/29/24 10:01, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On 1/29/2024 9:01 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Mon, Jan 29, 2024 at 09:01:04AM -0800, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully compiled and installed the kernel on my computer (Acer Aspire E15, Intel Core i3 Haswell). No noticeable regressions.
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
On Monday, January 29, 2024 22:31 IST, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
KernelCI report for stable-rc/linux-6.6.y for this week :-
## stable-rc HEAD for linux-6.6.y: Date: 2024-01-30 https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/l...
## Build failures: No build failures seen for the stable-rc/linux-6.6.y commit head \o/
## Boot failures: No **new** boot failures seen for the stable-rc/linux-6.6.y commit head \o/
Tested-by: kernelci.org bot bot@kernelci.org
Thanks, Shreeya Patel
Hi Greg
On Tue, Jan 30, 2024 at 2:22 AM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
6.6.15-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
[ 0.000000] Linux version 6.6.15-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.41.0) #1 SMP PREEMPT_DYNAMIC Tue Jan 30 20:22:49 JST 2024
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
On Mon, 29 Jan 2024 09:01:04 -0800, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
All tests passing for Tegra ...
Test results for stable-v6.6: 10 builds: 10 pass, 0 fail 26 boots: 26 pass, 0 fail 116 tests: 116 pass, 0 fail
Linux version: 6.6.15-rc1-g1ff49073b88b Boards tested: tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
On Mon, 29 Jan 2024 at 22:43, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.6.15-rc1 * git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc * git branch: linux-6.6.y * git commit: 1ff49073b88be8a41ff42ecbbc2bb82e18783a7e * git describe: v6.6.14-332-g1ff49073b88b * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.6.y/build/v6.6.14...
## Test Regressions (compared to v6.6.13-581-g86b2b02ae53b)
## Metric Regressions (compared to v6.6.13-581-g86b2b02ae53b)
## Test Fixes (compared to v6.6.13-581-g86b2b02ae53b)
## Metric Fixes (compared to v6.6.13-581-g86b2b02ae53b)
## Test result summary total: 155176, pass: 133177, fail: 2238, skip: 19572, xfail: 189
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 145 total, 144 passed, 1 failed * arm64: 52 total, 49 passed, 3 failed * i386: 41 total, 40 passed, 1 failed * mips: 26 total, 26 passed, 0 failed * parisc: 4 total, 4 passed, 0 failed * powerpc: 36 total, 36 passed, 0 failed * riscv: 25 total, 24 passed, 1 failed * s390: 13 total, 13 passed, 0 failed * sh: 10 total, 10 passed, 0 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 46 total, 44 passed, 2 failed
## Test suites summary * boot * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-exec * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-filesystems-epoll * kselftest-firmware * kselftest-fpu * kselftest-ftrace * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-user_events * kselftest-vDSO * kselftest-vm * kselftest-watchdog * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * perf * rcutorture * v4l2-compliance
-- Linaro LKFT https://lkft.linaro.org
On 1/29/24 9:01 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.15-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Mon, Jan 29, 2024 at 09:01:04AM -0800, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.15 release. There are 331 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 31 Jan 2024 16:59:28 +0000. Anything received after that time might be too late.
No regressions found on WSL (x86 and arm64).
Built, booted, and reviewed dmesg.
Thank you.
Tested-by: Kelsey Steele kelseysteele@linux.microsoft.com
linux-stable-mirror@lists.linaro.org