This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all of which will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at:
	https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.87-rc1...
or in the git tree and branch at:
	git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 5.15.87-rc1
Jocelyn Falempe jfalempe@redhat.com drm/mgag200: Fix PLL setup for G200_SE_A rev >=4
Harshit Mogalapalli harshit.m.mogalapalli@oracle.com io_uring: Fix unsigned 'res' comparison with zero in io_fixup_rw_res()
Ard Biesheuvel ardb@kernel.org efi: random: combine bootloader provided RNG seed with RNG protocol output
Jan Kara jack@suse.cz mbcache: Avoid nesting of cache->c_list_lock under bit locks
Jie Wang wangjie125@huawei.com net: hns3: fix return value check bug of rx copybreak
Qu Wenruo wqu@suse.com btrfs: make thaw time super block check to also verify checksum
Muhammad Usama Anjum usama.anjum@collabora.com selftests: set the BUILD variable to absolute path
Eric Biggers ebiggers@google.com ext4: don't allow journal inode to have encrypt flag
Matthieu Baerts matthieu.baerts@tessares.net mptcp: use proper req destructor for IPv6
Matthieu Baerts matthieu.baerts@tessares.net mptcp: dedicated request sock for subflow in v6
Mario Limonciello mario.limonciello@amd.com Revert "ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007"
William Liu will@willsroot.io ksmbd: check nt_len to be at least CIFS_ENCPWD_SIZE in ksmbd_decode_ntlmssp_auth_blob
Namjae Jeon linkinjeon@kernel.org ksmbd: fix infinite loop in ksmbd_conn_handler_loop()
Linus Torvalds torvalds@linux-foundation.org hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handling
Arnd Bergmann arnd@arndb.de hfs/hfsplus: use WARN_ON for sanity check
Zhenyu Wang zhenyuw@linux.intel.com drm/i915/gvt: fix vgpu debugfs clean in remove
Zhenyu Wang zhenyuw@linux.intel.com drm/i915/gvt: fix gvt debugfs destroy
Björn Töpel bjorn@rivosinc.com riscv, kprobes: Stricter c.jr/c.jalr decoding
Ben Dooks ben-linux@fluff.org riscv: uaccess: fix type of 0 variable on error in get_user()
Srinivas Pandruvada srinivas.pandruvada@linux.intel.com thermal: int340x: Add missing attribute for data rate base
Pavel Begunkov asml.silence@gmail.com io_uring: fix CQ waiting timeout handling
Jens Axboe axboe@kernel.dk block: don't allow splitting of a REQ_NOWAIT bio
Paul Menzel pmenzel@molgen.mpg.de fbdev: matroxfb: G200eW: Increase max memory from 1 MB to 16 MB
Jeff Layton jlayton@kernel.org nfsd: fix handling of readdir in v4root vs. mount upcall timeout
Rodrigo Branco bsdaemon@google.com x86/bugs: Flush IBP in ib_prctl_set()
Takashi Iwai tiwai@suse.de x86/kexec: Fix double-free of elf header buffer
Qu Wenruo wqu@suse.com btrfs: check superblock to ensure the fs was not modified at thaw time
Christoph Hellwig hch@lst.de nvme: also return I/O command effects from nvme_command_effects
Christoph Hellwig hch@lst.de nvmet: use NVME_CMD_EFFECTS_CSUPP instead of open coding it
Jens Axboe axboe@kernel.dk io_uring: check for valid register opcode earlier
Yanjun Zhang zhangyanjun@cestc.cn nvme: fix multipath crash caused by flush request when blktrace is enabled
Hans de Goede hdegoede@redhat.com ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 tablet
Jan Kara jack@suse.cz udf: Fix extension of the last extent in the file
Zhengchao Shao shaozhengchao@huawei.com caif: fix memory leak in cfctrl_linkup_request()
Dan Carpenter error27@gmail.com drm/i915: unpin on error in intel_vgpu_shadow_mm_pin()
Namhyung Kim namhyung@kernel.org perf stat: Fix handling of --for-each-cgroup with --bpf-counters to match non BPF mode
Szymon Heidrich szymon.heidrich@gmail.com usb: rndis_host: Secure rndis_query check against int overflow
Geetha sowjanya gakula@marvell.com octeontx2-pf: Fix lmtst ID used in aura free
Daniil Tatianin d-tatianin@yandex-team.ru drivers/net/bonding/bond_3ad: return when there's no aggregator
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fs/ntfs3: don't hold ni_lock when calling truncate_setsize()
Philipp Zabel p.zabel@pengutronix.de drm/imx: ipuv3-plane: Fix overlay plane width
Miaoqian Lin linmq006@gmail.com perf tools: Fix resources leak in perf_data__open_dir()
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: Rework long task execution when adding/deleting entries
Jozsef Kadlecsik kadlec@netfilter.org netfilter: ipset: fix hash:net,port,net hang with /0 subnet
Horatiu Vultur horatiu.vultur@microchip.com net: sparx5: Fix reading of the MAC address
Jamal Hadi Salim jhs@mojatatu.com net: sched: cbq: dont intepret cls results when asked to drop
Jamal Hadi Salim jhs@mojatatu.com net: sched: atm: dont intepret cls results when asked to drop
Miaoqian Lin linmq006@gmail.com gpio: sifive: Fix refcount leak in sifive_gpio_probe
Xiubo Li xiubli@redhat.com ceph: switch to vfs_inode_has_locks() to fix file lock bug
Jeff Layton jlayton@kernel.org filelock: new helper: vfs_inode_has_locks
Carlo Caione ccaione@baylibre.com drm/meson: Reduce the FIFO lines held when AFBC is not used
Maor Gottlieb maorg@nvidia.com RDMA/mlx5: Fix validation of max_rd_atomic caps for DC
Shay Drory shayd@nvidia.com RDMA/mlx5: Fix mlx5_ib_get_hw_stats when used for device
Miaoqian Lin linmq006@gmail.com net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe
David Arinzon darinzon@amazon.com net: ena: Update NUMA TPH hint register upon NUMA node update
David Arinzon darinzon@amazon.com net: ena: Set default value for RX interrupt moderation
David Arinzon darinzon@amazon.com net: ena: Fix rx_copybreak value update
David Arinzon darinzon@amazon.com net: ena: Use bitmask to indicate packet redirection
David Arinzon darinzon@amazon.com net: ena: Account for the number of processed bytes in XDP
David Arinzon darinzon@amazon.com net: ena: Don't register memory info on XDP exchange
David Arinzon darinzon@amazon.com net: ena: Fix toeplitz initial hash value
Jiguang Xiao jiguang.xiao@windriver.com net: amd-xgbe: add missed tasklet_kill
Adham Faris afaris@nvidia.com net/mlx5e: Fix hw mtu initializing at XDP SQ allocation
Chris Mi cmi@nvidia.com net/mlx5e: Always clear dest encap in neigh-update-del
Roi Dayan roid@nvidia.com net/mlx5e: TC, Refactor mlx5e_tc_add_flow_mod_hdr() to get flow attr
Dragos Tatulea dtatulea@nvidia.com net/mlx5e: IPoIB, Don't allow CQE compression to be turned on by default
Shay Drory shayd@nvidia.com net/mlx5: Avoid recovery in probe flows
Jiri Pirko jiri@nvidia.com net/mlx5: Add forgotten cleanup calls into mlx5_init_once() error path
Moshe Shemesh moshe@nvidia.com net/mlx5: E-Switch, properly handle ingress tagged packets on VST
Stefano Garzarella sgarzare@redhat.com vdpa_sim: fix vringh initialization in vdpasim_queue_ready()
Stefano Garzarella sgarzare@redhat.com vhost: fix range used in translate_desc()
Stefano Garzarella sgarzare@redhat.com vringh: fix range used in iotlb_translate()
Yuan Can yuancan@huawei.com vhost/vsock: Fix error handling in vhost_vsock_init()
ruanjinjie ruanjinjie@huawei.com vdpa_sim: fix possible memory leak in vdpasim_net_init() and vdpasim_blk_init()
Miaoqian Lin linmq006@gmail.com nfc: Fix potential resource leaks
Johnny S. Lee foss@jsl.io net: dsa: mv88e6xxx: depend on PTP conditionally
Daniil Tatianin d-tatianin@yandex-team.ru qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure
Hawkins Jiawei yin31149@gmail.com net: sched: fix memory leak in tcindex_set_parms
Jian Shen shenjian15@huawei.com net: hns3: fix VF promisc mode not update when mac table full
Jian Shen shenjian15@huawei.com net: hns3: fix miss L3E checking for rx packet
Peng Li lipeng321@huawei.com net: hns3: extract macro to simplify ring stats update code
Hao Chen chenhao288@hisilicon.com net: hns3: refactor hns3_nic_reuse_page()
Jie Wang wangjie125@huawei.com net: hns3: add interrupts re-initialization while doing VF FLR
Jeff Layton jlayton@kernel.org nfsd: shut down the NFSv4 state objects before the filecache
Shawn Bohrer sbohrer@cloudflare.com veth: Fix race with AF_XDP exposing old or uninitialized descriptors
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: honor set timeout and garbage collection updates
Ronak Doshi doshir@vmware.com vmxnet3: correctly report csum_level for encapsulated packet
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: perform type checking for existing sets
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: add function to create set stateful expressions
Pablo Neira Ayuso pablo@netfilter.org netfilter: nf_tables: consolidate set description
Steven Price steven.price@arm.com drm/panfrost: Fix GEM handle creation ref-counting
Jakub Kicinski kuba@kernel.org bpf: pull before calling skb_postpull_rcsum()
Sasha Levin sashal@kernel.org btrfs: fix an error handling path in btrfs_defrag_leaves()
minoura makoto minoura@valinux.co.jp SUNRPC: ensure the matching upcall is in-flight upon downcall
Matthew Auld matthew.auld@intel.com drm/i915/migrate: fix length calculation
Matthew Auld matthew.auld@intel.com drm/i915/migrate: fix offset calculation
Matthew Auld matthew.auld@intel.com drm/i915/migrate: don't check the scratch page
Jan Kara jack@suse.cz ext4: fix deadlock due to mbcache entry corruption
Jan Kara jack@suse.cz mbcache: automatically delete entries from cache on freeing
Baokun Li libaokun1@huawei.com ext4: correct inconsistent error msg in nojournal mode
Jason Yan yanaijie@huawei.com ext4: goto right label 'failed_mount3a'
Biju Das biju.das.jz@bp.renesas.com ravb: Fix "failed to switch device to config mode" message during unbind
Masami Hiramatsu (Google) mhiramat@kernel.org perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as unsinged data
Masami Hiramatsu (Google) mhiramat@kernel.org perf probe: Use dwarf_attr_integrate as generic DWARF attr accessor
Smitha T Murthy smitha.t@samsung.com media: s5p-mfc: Fix in register read and write for H264
Smitha T Murthy smitha.t@samsung.com media: s5p-mfc: Clear workbit to handle error condition
Smitha T Murthy smitha.t@samsung.com media: s5p-mfc: Fix to handle reference queue during finishing
Yazen Ghannam yazen.ghannam@amd.com x86/MCE/AMD: Clear DFR errors found in THR handler
Borislav Petkov bp@suse.de x86/mce: Get rid of msr_ops
void0red void0red@gmail.com btrfs: fix extent map use-after-free when handling missing device in read_one_chunk
Nikolay Borisov nborisov@suse.com btrfs: move missing device handling in a dedicate function
Sasha Levin sashal@kernel.org btrfs: replace strncpy() with strscpy()
Sasha Levin sashal@kernel.org phy: qcom-qmp-combo: fix out-of-bounds clock access
Jens Axboe axboe@kernel.dk ARM: renumber bits related to _TIF_WORK_MASK
Eric Biggers ebiggers@kernel.org ext4: fix off-by-one errors in fast-commit block filling
Eric Biggers ebiggers@kernel.org ext4: fix unaligned memory access in ext4_fc_reserve_space()
Eric Biggers ebiggers@kernel.org ext4: add missing validation of fast-commit record lengths
Eric Biggers ebiggers@kernel.org ext4: don't set up encryption key during jbd2 transaction
Eric Biggers ebiggers@kernel.org ext4: disable fast-commit of encrypted dir operations
Eric Biggers ebiggers@kernel.org ext4: fix potential out of bound read in ext4_fc_replay_scan()
Eric Biggers ebiggers@kernel.org ext4: factor out ext4_fc_get_tl()
Eric Biggers ebiggers@kernel.org ext4: introduce EXT4_FC_TAG_BASE_LEN helper
Eric Biggers ebiggers@kernel.org ext4: use ext4_debug() instead of jbd_debug()
Eric Biggers ebiggers@kernel.org ext4: remove unused enum EXT4_FC_COMMIT_FAILED
Zheng Yejian zhengyejian1@huawei.com tracing: Fix issue of missing one synthetic field
Damien Le Moal damien.lemoal@opensource.wdc.com block: mq-deadline: Fix dd_finish_request() for zoned devices
Alex Deucher alexander.deucher@amd.com drm/amdgpu: make display pinning more flexible (v2)
Alex Deucher alexander.deucher@amd.com drm/amdgpu: handle polaris10/11 overlap asics (v2)
Ye Bin yebin10@huawei.com ext4: allocate extended attribute value in vmalloc area
Jan Kara jack@suse.cz ext4: avoid unaccounted block allocation when expanding inode
Jan Kara jack@suse.cz ext4: initialize quota before expanding inode in setproject ioctl
Ye Bin yebin10@huawei.com ext4: fix inode leak in ext4_xattr_inode_create() on an error path
Ye Bin yebin10@huawei.com ext4: fix kernel BUG in 'ext4_write_inline_data_end()'
Jan Kara jack@suse.cz ext4: avoid BUG_ON when creating xattrs
Luís Henriques lhenriques@suse.de ext4: fix error code return to user-space in ext4_get_branch()
Baokun Li libaokun1@huawei.com ext4: fix corruption when online resizing a 1K bigalloc fs
Eric Whitney enwlinux@gmail.com ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline
Ye Bin yebin10@huawei.com ext4: init quota for 'old.inode' in 'ext4_rename'
Ye Bin yebin10@huawei.com ext4: fix uninititialized value in 'ext4_evict_inode'
Eric Biggers ebiggers@google.com ext4: fix leaking uninitialized memory in fast-commit journal
Baokun Li libaokun1@huawei.com ext4: fix bug_on in __es_tree_search caused by bad boot loader inode
Zhang Yi yi.zhang@huawei.com ext4: check and assert if marking an no_delete evicting inode dirty
Ye Bin yebin10@huawei.com ext4: fix reserved cluster accounting in __es_remove_extent()
Baokun Li libaokun1@huawei.com ext4: fix bug_on in __es_tree_search caused by bad quota inode
Baokun Li libaokun1@huawei.com ext4: add helper to check quota inums
Baokun Li libaokun1@huawei.com ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode
Gaosheng Cui cuigaosheng1@huawei.com ext4: fix undefined behavior in bit shift for ext4_check_flag_values
Baokun Li libaokun1@huawei.com ext4: fix use-after-free in ext4_orphan_cleanup
Alexander Potapenko glider@google.com fs: ext4: initialize fsdata in pagecache_write()
Luís Henriques lhenriques@suse.de ext4: remove trailing newline from ext4_msg() message
Baokun Li libaokun1@huawei.com ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop
Zhang Yi yi.zhang@huawei.com ext4: silence the warning when evicting inode with dioread_nolock
Yuan Can yuancan@huawei.com drm/ingenic: Fix missing platform_driver_unregister() call in ingenic_drm_init()
Mikko Kovanen mikko.kovanen@aavamobile.com drm/i915/dsi: fix VBT send packet port selection for dual link DSI
Zack Rusin zackr@vmware.com drm/vmwgfx: Validate the box size for the snooped cursor
Simon Ser contact@emersion.fr drm/connector: send hotplug uevent on connector cleanup
Wang Weiyang wangweiyang2@huawei.com device_cgroup: Roll back to original exceptions after copy failure
Shang XiaoJing shangxiaojing@huawei.com parisc: led: Fix potential null-ptr-deref in start_task()
Maria Yu quic_aiquny@quicinc.com remoteproc: core: Do pm_relax when in RPROC_OFFLINE state
Kim Phillips kim.phillips@amd.com iommu/amd: Fix ivrs_acpihid cmdline parsing code
Johan Hovold johan+linaro@kernel.org phy: qcom-qmp-combo: fix sc8180x reset
Isaac J. Manjarres isaacmanjarres@google.com driver core: Fix bus_type.match() error handling in __driver_attach()
Mario Limonciello mario.limonciello@amd.com crypto: ccp - Add support for TEE for PCI ID 0x14CA
Corentin Labbe clabbe@baylibre.com crypto: n2 - add missing hash statesize
Sergey Matyukevich sergey.matyukevich@syntacore.com riscv: mm: notify remote harts about mmu cache updates
Guo Ren guoren@linux.alibaba.com riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argument
Sascha Hauer s.hauer@pengutronix.de PCI/sysfs: Fix double free in error path
Michael S. Tsirkin mst@redhat.com PCI: Fix pci_device_is_present() for VFs by checking PF
Dan Carpenter error27@gmail.com ipmi: fix use after free in _ipmi_destroy_user()
Huaxin Lu luhuaxin1@huawei.com ima: Fix a potential NULL pointer access in ima_restore_measurement_list
Alexander Sverdlin alexander.sverdlin@nokia.com mtd: spi-nor: Check for zero erase size in spi_nor_find_best_erase_type()
Zhang Yuchen zhangyuchen.lcr@bytedance.com ipmi: fix long wait in unload when IPMI disconnect
Maximilian Luz luzmaximilian@gmail.com ipu3-imgu: Fix NULL pointer dereference in imgu_subdev_set_selection()
Aidan MacDonald aidanmacdonald.0x0@gmail.com ASoC: jz4740-i2s: Handle independent FIFO flush bits
Michael Walle michael@walle.cc wifi: wilc1000: sdio: fix module autoloading
Aditya Garg gargaditya08@live.com efi: Add iMac Pro 2017 to uefi skip cert quirk
Florian-Ewald Mueller florian-ewald.mueller@ionos.com md/bitmap: Fix bitmap chunk size overflow issues
Damien Le Moal damien.lemoal@opensource.wdc.com block: mq-deadline: Do not break sequential write streams to zoned HDDs
Ian Abbott abbotti@mev.co.uk rtc: ds1347: fix value written to century register
Steve French stfrench@microsoft.com cifs: fix missing display of three mount options
Paulo Alcantara pc@cjr.nz cifs: fix confusing debug message
Takashi Iwai tiwai@suse.de media: dvb-core: Fix UAF due to refcount races at releasing
Keita Suzuki keitasuzuki.park@sslab.ics.keio.ac.jp media: dvb-core: Fix double free in dvb_register_device()
Nick Desaulniers ndesaulniers@google.com ARM: 9256/1: NWFPE: avoid compiler-generated __aeabi_uldivmod
Luca Ceresoli luca.ceresoli@bootlin.com staging: media: tegra-video: fix device_node use after free
Luca Ceresoli luca.ceresoli@bootlin.com staging: media: tegra-video: fix chan->mipi value on error
Yang Jihong yangjihong1@huawei.com tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line
Steven Rostedt (Google) rostedt@goodmis.org tracing/probes: Handle system names with hyphens
Zheng Yejian zhengyejian1@huawei.com tracing/hist: Fix wrong return value in parse_action_params()
Masami Hiramatsu (Google) mhiramat@kernel.org tracing: Fix complicated dependency of CONFIG_TRACER_MAX_TRACE
Steven Rostedt (Google) rostedt@goodmis.org tracing: Fix race where eprobes can be called before the event
Masami Hiramatsu (Google) mhiramat@kernel.org x86/kprobes: Fix optprobe optimization check with CONFIG_RETHUNK
Masami Hiramatsu (Google) mhiramat@kernel.org x86/kprobes: Fix kprobes instruction boudary check with CONFIG_RETHUNK
Steven Rostedt (Google) rostedt@goodmis.org ftrace/x86: Add back ftrace_expected for ftrace bug reports
Ashok Raj ashok.raj@intel.com x86/microcode/intel: Do not retry microcode reloading on the APs
Sean Christopherson seanjc@google.com KVM: nVMX: Properly expose ENABLE_USR_WAIT_PAUSE control to L1
Sean Christopherson seanjc@google.com KVM: nVMX: Inject #GP, not #UD, if "generic" VMXON CR0/CR4 check fails
Sean Christopherson seanjc@google.com KVM: VMX: Resume guest immediately when injecting #GP on ECREATE
Rob Herring robh@kernel.org of/kexec: Fix reading 32-bit "linux,initrd-{start,end}" values
Namhyung Kim namhyung@kernel.org perf/core: Call LSM hook after copying perf_event_attr
Zheng Yejian zhengyejian1@huawei.com tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx'
Mike Snitzer snitzer@kernel.org dm cache: set needs_check flag after aborting metadata
Luo Meng luomeng12@huawei.com dm cache: Fix UAF in destroy()
Luo Meng luomeng12@huawei.com dm clone: Fix UAF in clone_dtr()
Luo Meng luomeng12@huawei.com dm integrity: Fix UAF in dm_integrity_dtr()
Luo Meng luomeng12@huawei.com dm thin: Fix UAF in run_timer_softirq()
Luo Meng luomeng12@huawei.com dm thin: resume even if in FAIL mode
Zhihao Cheng chengzhihao1@huawei.com dm thin: Use last transaction's pmd->root when commit failed
Zhihao Cheng chengzhihao1@huawei.com dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata
Mike Snitzer snitzer@kernel.org dm cache: Fix ABBA deadlock between shrink_slab and dm_cache_metadata_abort
Matthieu Baerts matthieu.baerts@tessares.net mptcp: remove MPTCP 'ifdef' in TCP SYN cookies
Florian Westphal fw@strlen.de mptcp: mark ops structures as ro_after_init
Alexander Aring aahringo@redhat.com fs: dlm: retry accept() until -EAGAIN or error returns
Alexander Aring aahringo@redhat.com fs: dlm: fix sock release if listen fails
Chris Chiu chris.chiu@canonical.com ALSA: hda/realtek: Apply dual codec fixup for Dell Latitude laptops
Philipp Jungkamp p.jungkamp@gmx.net ALSA: patch_realtek: Fix Dell Inspiron Plus 16
Yongqiang Liu liuyongqiang13@huawei.com cpufreq: Init completion before kobject_init_and_add()
Kant Fan kant@allwinnertech.com PM/devfreq: governor: Add a private governor_data for governor
Mickaël Salaün mic@digikod.net selftests: Use optional USERCFLAGS and USERLDFLAGS
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: sdm850-lenovo-yoga-c630: correct I2C12 pins drive strength
Jason A. Donenfeld Jason@zx2c4.com ARM: ux500: do not directly dereference __iomem
Boris Burkov boris@bur.io btrfs: fix resolving backrefs for inline extent followed by prealloc
Wenchao Chen wenchao.chen@unisoc.com mmc: sdhci-sprd: Disable CLK_AUTO when the clock is less than 400K
Krzysztof Kozlowski krzysztof.kozlowski@linaro.org arm64: dts: qcom: sdm845-db845c: correct SPI2 pins drive strength
Alexander Antonov alexander.antonov@linux.intel.com perf/x86/intel/uncore: Clear attr_update properly
Alexander Antonov alexander.antonov@linux.intel.com perf/x86/intel/uncore: Disable I/O stacks to PMU mapping on ICX-D
Bixuan Cui cuibixuan@linux.alibaba.com jbd2: use the correct print format
Steven Rostedt rostedt@goodmis.org ktest.pl minconfig: Unset configs instead of just removing them
Steven Rostedt rostedt@goodmis.org kest.pl: Fix grub2 menu handling for rebooting
Manivannan Sadhasivam mani@kernel.org soc: qcom: Select REMAP_MMIO for LLCC driver
Jason A. Donenfeld Jason@zx2c4.com media: stv0288: use explicitly signed char
Eric Dumazet edumazet@google.com net/af_packet: make sure to pull mac header
Hangbin Liu liuhangbin@gmail.com net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO
Paul E. McKenney paulmck@kernel.org rcu-tasks: Simplify trc_read_check_handler() atomic operations
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC/SoundWire: dai: expand 'stream' concept beyond SoundWire
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com ASoC: Intel/SOF: use set_stream() instead of set_tdm_slots() for HDAudio
Marco Elver elver@google.com kcsan: Instrument memcpy/memset/memmove with newer Clang
Chuck Lever chuck.lever@oracle.com SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
Hanjun Guo guohanjun@huawei.com tpm: tpm_tis: Add the missed acpi_put_table() to fix memory leak
Hanjun Guo guohanjun@huawei.com tpm: tpm_crb: Add the missed acpi_put_table() to fix memory leak
Hanjun Guo guohanjun@huawei.com tpm: acpi: Call acpi_put_table() to fix memory leak
Deren Wu deren.wu@mediatek.com mmc: vub300: fix warning - do not call blocking ops when !TASK_RUNNING
Jaegeuk Kim jaegeuk@kernel.org f2fs: allow to read node block after shutdown
Pavel Machek pavel@denx.de f2fs: should put a page when checking the summary info
NARIBAYASHI Akira a.naribayashi@fujitsu.com mm, compaction: fix fast_isolate_around() to stay within boundaries
Mikulas Patocka mpatocka@redhat.com md: fix a crash in mempool_free
ChiYuan Huang cy_huang@richtek.com mfd: mt6360: Add bounds checking in Regmap read/write call-backs
Christian Brauner brauner@kernel.org pnode: terminate at peers of source
Artem Egorkine arteme@gmail.com ALSA: line6: fix stack overflow in line6_midi_transmit
Artem Egorkine arteme@gmail.com ALSA: line6: correct midi status byte when receiving data from podxt
Zhang Tianci zhangtianci.1997@bytedance.com ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
Wang Yufen wangyufen@huawei.com binfmt: Fix error return code in load_elf_fdpic_binary()
Aditya Garg gargaditya08@live.com hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
Qiujun Huang hqjagain@gmail.com pstore/zone: Use GFP_ATOMIC to allocate zone buffer
Luca Stefani luca@osomprivacy.com pstore: Properly assign mem_type property
Terry Junge linuxhid@cosmicgizmosystems.com HID: plantronics: Additional PIDs for double volume key presses quirk
José Expósito jose.exposito89@gmail.com HID: multitouch: fix Asus ExpertBook P2 P2451FA trackpoint
Nathan Lynch nathanl@linux.ibm.com powerpc/rtas: avoid scheduling in rtas_os_term()
Nathan Lynch nathanl@linux.ibm.com powerpc/rtas: avoid device tree lookups in rtas_os_term()
Christophe Leroy christophe.leroy@csgroup.eu objtool: Fix SEGFAULT
Yin Xiujiang yinxiujiang@kylinos.cn fs/ntfs3: Fix slab-out-of-bounds in r_page
Dan Carpenter dan.carpenter@oracle.com fs/ntfs3: Delete duplicate condition in ntfs_read_mft()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_fill_super()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fs/ntfs3: Use __GFP_NOWARN allocation at wnd_init()
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate index root when initialize NTFS security
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com soundwire: dmi-quirks: add quirk variant for LAPBC710 NUC15
Hawkins Jiawei yin31149@gmail.com fs/ntfs3: Fix slab-out-of-bounds read in run_unpack
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate resident attribute name
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate buffer length while parsing index
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate attribute name offset
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Add null pointer check for inode operations
Shigeru Yoshida syoshida@redhat.com fs/ntfs3: Fix memory leak on ntfs_fill_super() error path
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Add null pointer check to attr_load_runs_vcn
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate data run offset
edward lo edward.lo@ambergroup.io fs/ntfs3: Add overflow check for attribute size
edward lo edward.lo@ambergroup.io fs/ntfs3: Validate BOOT record_size
Christoph Hellwig hch@lst.de nvmet: don't defer passthrough commands with trivial effects to the workqueue
Christoph Hellwig hch@lst.de nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
Adam Vodopjan grozzly@protonmail.com ata: ahci: Fix PCS quirk application for suspend
Yu Kuai yukuai3@huawei.com block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
Adrian Freund adrian@freund.io ACPI: resource: do IRQ override on Lenovo 14ALC7
Erik Schumacher ofenfisch@googlemail.com ACPI: resource: do IRQ override on XMG Core 15
Jiri Slaby (SUSE) jirislaby@kernel.org ACPI: resource: do IRQ override on LENOVO IdeaPad
Tamim Khan tamim@fusetak.com ACPI: resource: Skip IRQ override on Asus Vivobook K3402ZA/K3502ZA
Keith Busch kbusch@kernel.org nvme-pci: fix page size checks
Keith Busch kbusch@kernel.org nvme-pci: fix mempool alloc size
Klaus Jensen k.jensen@samsung.com nvme-pci: fix doorbell buffer value endianness
Sasha Levin sashal@kernel.org Revert "selftests/bpf: Add test for unstable CT lookup API"
Paulo Alcantara pc@cjr.nz cifs: fix oops during encryption
Miaoqian Lin linmq006@gmail.com usb: dwc3: qcom: Fix memory leak in dwc3_qcom_interconnect_init
-------------
Diffstat:
Makefile | 4 +- arch/arm/include/asm/thread_info.h | 13 +- arch/arm/nwfpe/Makefile | 6 + arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 5 +- .../boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 6 +- arch/powerpc/kernel/rtas.c | 20 +- arch/riscv/include/asm/mmu.h | 2 + arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/include/asm/tlbflush.h | 18 ++ arch/riscv/include/asm/uaccess.h | 2 +- arch/riscv/kernel/probes/simulate-insn.h | 4 +- arch/riscv/kernel/stacktrace.c | 2 +- arch/riscv/mm/context.c | 10 + arch/riscv/mm/tlbflush.c | 28 +- arch/x86/events/intel/uncore.h | 1 + arch/x86/events/intel/uncore_snbep.c | 22 +- arch/x86/kernel/cpu/bugs.c | 2 + arch/x86/kernel/cpu/mce/amd.c | 37 +-- arch/x86/kernel/cpu/mce/core.c | 95 +++---- arch/x86/kernel/cpu/mce/internal.h | 12 +- arch/x86/kernel/cpu/microcode/intel.c | 8 +- arch/x86/kernel/crash.c | 4 +- arch/x86/kernel/ftrace.c | 2 + arch/x86/kernel/kprobes/core.c | 10 +- arch/x86/kernel/kprobes/opt.c | 28 +- arch/x86/kvm/vmx/nested.c | 47 +++- arch/x86/kvm/vmx/sgx.c | 4 +- block/bfq-iosched.c | 2 +- block/blk-merge.c | 10 + block/mq-deadline.c | 84 +++++- drivers/acpi/resource.c | 78 +++++- drivers/acpi/x86/s2idle.c | 10 +- drivers/ata/ahci.c | 32 ++- drivers/base/dd.c | 6 +- drivers/char/ipmi/ipmi_msghandler.c | 4 +- drivers/char/ipmi/ipmi_si_intf.c | 27 +- drivers/char/tpm/eventlog/acpi.c | 12 +- drivers/char/tpm/tpm_crb.c | 29 ++- drivers/char/tpm/tpm_tis.c | 9 +- drivers/cpufreq/cpufreq.c | 2 +- drivers/crypto/ccp/sp-pci.c | 11 +- drivers/crypto/n2_core.c | 6 + drivers/devfreq/devfreq.c | 6 +- drivers/devfreq/governor_userspace.c | 12 +- drivers/firmware/efi/efi.c | 4 +- drivers/firmware/efi/libstub/efistub.h | 2 + drivers/firmware/efi/libstub/random.c | 42 ++- drivers/gpio/gpio-sifive.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 +- drivers/gpu/drm/drm_connector.c | 3 + drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 4 +- drivers/gpu/drm/i915/gt/intel_migrate.c | 8 +- drivers/gpu/drm/i915/gvt/debugfs.c | 17 +- drivers/gpu/drm/i915/gvt/scheduler.c | 1 + drivers/gpu/drm/imx/ipuv3-plane.c | 14 +- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 6 +- drivers/gpu/drm/meson/meson_viu.c | 5 +- drivers/gpu/drm/mgag200/mgag200_pll.c | 3 +- drivers/gpu/drm/panfrost/panfrost_drv.c | 27 +- drivers/gpu/drm/panfrost/panfrost_gem.c | 16 +- drivers/gpu/drm/panfrost/panfrost_gem.h | 5 +- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 3 +- drivers/hid/hid-ids.h | 3 + drivers/hid/hid-multitouch.c | 4 + drivers/hid/hid-plantronics.c | 9 + drivers/infiniband/hw/mlx5/counters.c | 6 +- drivers/infiniband/hw/mlx5/qp.c | 49 +++- drivers/iommu/amd/init.c | 7 + drivers/md/dm-cache-metadata.c | 54 +++- drivers/md/dm-cache-target.c | 11 +- drivers/md/dm-clone-target.c | 1 + drivers/md/dm-integrity.c | 2 + drivers/md/dm-thin-metadata.c | 60 ++++- drivers/md/dm-thin.c | 18 +- drivers/md/md-bitmap.c | 20 +- drivers/md/md.c | 9 +- drivers/media/dvb-core/dmxdev.c | 8 + drivers/media/dvb-core/dvbdev.c | 1 + drivers/media/dvb-frontends/stv0288.c | 5 +- drivers/media/platform/s5p-mfc/s5p_mfc_ctrl.c | 4 +- drivers/media/platform/s5p-mfc/s5p_mfc_enc.c | 12 +- drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c | 14 +- drivers/mfd/mt6360-core.c | 14 +- drivers/mmc/host/sdhci-sprd.c | 16 +- drivers/mmc/host/vub300.c | 2 + drivers/mtd/spi-nor/core.c | 2 + drivers/net/bonding/bond_3ad.c | 1 + drivers/net/dsa/mv88e6xxx/Kconfig | 4 +- drivers/net/ethernet/amazon/ena/ena_com.c | 29 +-- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 6 +- 
drivers/net/ethernet/amazon/ena/ena_netdev.c | 83 ++++-- drivers/net/ethernet/amazon/ena/ena_netdev.h | 17 +- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 3 + drivers/net/ethernet/amd/xgbe/xgbe-i2c.c | 4 +- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 4 +- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 178 +++++-------- drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 7 + .../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 75 +++--- .../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 +- .../ethernet/marvell/octeontx2/nic/otx2_common.c | 30 ++- .../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 11 +- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 4 +- .../mellanox/mlx5/core/esw/acl/egress_lgcy.c | 7 +- .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c | 33 ++- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 30 ++- drivers/net/ethernet/mellanox/mlx5/core/eswitch.h | 6 + drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 + .../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 4 + drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 + .../net/ethernet/microchip/sparx5/sparx5_main.c | 2 +- .../net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c | 8 +- drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h | 10 +- drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 8 +- drivers/net/ethernet/renesas/ravb_main.c | 2 +- drivers/net/phy/xilinx_gmii2rgmii.c | 1 + drivers/net/usb/rndis_host.c | 3 +- drivers/net/veth.c | 5 +- drivers/net/vmxnet3/vmxnet3_drv.c | 8 + drivers/net/wireless/microchip/wilc1000/sdio.c | 1 + drivers/nvme/host/core.c | 32 ++- drivers/nvme/host/nvme.h | 2 +- drivers/nvme/host/pci.c | 37 +-- drivers/nvme/target/admin-cmd.c | 35 +-- drivers/nvme/target/passthru.c | 11 +- drivers/of/kexec.c | 10 +- drivers/parisc/led.c | 3 + drivers/pci/pci-sysfs.c | 13 +- drivers/pci/pci.c | 2 + drivers/phy/qualcomm/phy-qcom-qmp.c | 8 +- drivers/remoteproc/remoteproc_core.c | 9 +- drivers/rtc/rtc-ds1347.c | 2 +- drivers/soc/qcom/Kconfig | 1 + drivers/soc/ux500/ux500-soc-id.c | 10 +- drivers/soundwire/dmi-quirks.c | 8 + drivers/soundwire/intel.c | 8 +- drivers/soundwire/qcom.c | 8 +- drivers/soundwire/stream.c | 4 +- drivers/staging/media/ipu3/ipu3-v4l2.c | 57 +++-- drivers/staging/media/tegra-video/csi.c | 4 +- drivers/staging/media/tegra-video/csi.h | 2 +- .../intel/int340x_thermal/processor_thermal_rfim.c | 4 + drivers/usb/dwc3/dwc3-qcom.c | 13 +- drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 +- drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 +- drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 +- drivers/vhost/vhost.c | 4 +- drivers/vhost/vringh.c | 5 +- drivers/vhost/vsock.c | 9 +- drivers/video/fbdev/matrox/matroxfb_base.c | 4 +- fs/binfmt_elf_fdpic.c | 5 +- fs/btrfs/backref.c | 4 + fs/btrfs/disk-io.c | 35 ++- fs/btrfs/disk-io.h | 6 +- fs/btrfs/ioctl.c | 9 +- fs/btrfs/rcu-string.h | 6 +- fs/btrfs/super.c | 76 ++++++ fs/btrfs/tree-defrag.c | 6 +- fs/btrfs/volumes.c | 43 ++-- fs/ceph/caps.c | 2 +- fs/ceph/locks.c | 4 - fs/ceph/super.h | 1 - fs/cifs/cifsfs.c | 8 +- fs/cifs/cifsglob.h | 69 +++++ fs/cifs/cifsproto.h | 4 +- fs/cifs/connect.c | 4 +- fs/cifs/misc.c | 4 +- fs/cifs/smb2ops.c | 143 +++++------ fs/dlm/lowcomms.c | 9 +- fs/ext4/balloc.c | 2 +- fs/ext4/ext4.h | 9 +- fs/ext4/ext4_jbd2.c | 3 +- fs/ext4/extents.c | 8 + fs/ext4/extents_status.c | 3 +- fs/ext4/fast_commit.c | 285 ++++++++++++--------- fs/ext4/fast_commit.h | 7 +- fs/ext4/indirect.c | 13 +- fs/ext4/inode.c | 50 +++- fs/ext4/ioctl.c | 13 +- 
fs/ext4/namei.c | 47 ++-- fs/ext4/orphan.c | 26 +- fs/ext4/resize.c | 6 +- fs/ext4/super.c | 52 +++- fs/ext4/verity.c | 2 +- fs/ext4/xattr.c | 19 +- fs/f2fs/gc.c | 1 + fs/f2fs/node.c | 3 +- fs/hfs/inode.c | 13 +- fs/hfsplus/hfsplus_fs.h | 2 + fs/hfsplus/inode.c | 16 +- fs/hfsplus/options.c | 4 + fs/ksmbd/auth.c | 3 +- fs/ksmbd/connection.c | 7 +- fs/ksmbd/transport_tcp.c | 5 +- fs/locks.c | 23 ++ fs/mbcache.c | 121 ++++----- fs/nfsd/nfs4xdr.c | 11 + fs/nfsd/nfssvc.c | 2 +- fs/ntfs3/attrib.c | 18 ++ fs/ntfs3/attrlist.c | 5 + fs/ntfs3/bitmap.c | 2 +- fs/ntfs3/file.c | 4 +- fs/ntfs3/frecord.c | 14 + fs/ntfs3/fslog.c | 35 +-- fs/ntfs3/fsntfs.c | 10 +- fs/ntfs3/index.c | 6 + fs/ntfs3/inode.c | 9 + fs/ntfs3/record.c | 10 + fs/ntfs3/super.c | 9 +- fs/overlayfs/dir.c | 46 ++-- fs/pnode.c | 2 +- fs/pstore/ram.c | 2 +- fs/pstore/zone.c | 2 +- fs/quota/dquot.c | 2 + fs/udf/inode.c | 2 +- include/linux/devfreq.h | 7 +- include/linux/efi.h | 2 - include/linux/fs.h | 6 + include/linux/mbcache.h | 33 ++- include/linux/mlx5/device.h | 5 + include/linux/mlx5/mlx5_ifc.h | 3 +- include/linux/netfilter/ipset/ip_set.h | 2 +- include/linux/nvme.h | 3 +- include/linux/sunrpc/rpc_pipe_fs.h | 5 + include/net/mptcp.h | 12 +- include/net/netfilter/nf_tables.h | 25 +- include/sound/soc-dai.h | 32 +-- include/trace/events/ext4.h | 7 +- include/trace/events/jbd2.h | 44 ++-- io_uring/io_uring.c | 13 +- kernel/events/core.c | 6 +- kernel/kcsan/core.c | 50 ++++ kernel/rcu/tasks.h | 20 +- kernel/trace/Kconfig | 2 + kernel/trace/trace.c | 38 ++- kernel/trace/trace.h | 27 +- kernel/trace/trace_eprobe.c | 3 + kernel/trace/trace_events_hist.c | 11 +- kernel/trace/trace_events_synth.c | 2 +- kernel/trace/trace_probe.c | 2 +- mm/compaction.c | 18 +- net/caif/cfctrl.c | 6 +- net/core/filter.c | 7 +- net/ipv4/syncookies.c | 7 +- net/mptcp/subflow.c | 76 ++++-- net/netfilter/ipset/ip_set_core.c | 7 +- net/netfilter/ipset/ip_set_hash_ip.c | 14 +- net/netfilter/ipset/ip_set_hash_ipmark.c | 13 +- net/netfilter/ipset/ip_set_hash_ipport.c | 13 +- net/netfilter/ipset/ip_set_hash_ipportip.c | 13 +- net/netfilter/ipset/ip_set_hash_ipportnet.c | 13 +- net/netfilter/ipset/ip_set_hash_net.c | 17 +- net/netfilter/ipset/ip_set_hash_netiface.c | 15 +- net/netfilter/ipset/ip_set_hash_netnet.c | 23 +- net/netfilter/ipset/ip_set_hash_netport.c | 19 +- net/netfilter/ipset/ip_set_hash_netportnet.c | 40 +-- net/netfilter/nf_tables_api.c | 261 ++++++++++++------- net/nfc/netlink.c | 52 +++- net/packet/af_packet.c | 20 +- net/sched/cls_tcindex.c | 12 +- net/sched/sch_atm.c | 5 +- net/sched/sch_cbq.c | 4 +- net/sunrpc/auth_gss/auth_gss.c | 19 +- net/sunrpc/auth_gss/svcauth_gss.c | 9 +- security/device_cgroup.c | 33 ++- security/integrity/ima/ima_template.c | 5 +- security/integrity/platform_certs/load_uefi.c | 1 + sound/pci/hda/patch_realtek.c | 50 ++++ sound/soc/codecs/hdac_hda.c | 22 +- sound/soc/codecs/max98373-sdw.c | 2 +- sound/soc/codecs/rt1308-sdw.c | 2 +- sound/soc/codecs/rt1316-sdw.c | 2 +- sound/soc/codecs/rt5682-sdw.c | 2 +- sound/soc/codecs/rt700.c | 2 +- sound/soc/codecs/rt711-sdca.c | 2 +- sound/soc/codecs/rt711.c | 2 +- sound/soc/codecs/rt715-sdca.c | 2 +- sound/soc/codecs/rt715.c | 2 +- sound/soc/codecs/sdw-mockup.c | 2 +- sound/soc/codecs/wcd938x.c | 2 +- sound/soc/codecs/wsa881x.c | 2 +- sound/soc/intel/boards/bytcr_rt5640.c | 15 ++ sound/soc/intel/boards/sof_sdw.c | 6 +- sound/soc/intel/skylake/skl-pcm.c | 7 +- sound/soc/jz4740/jz4740-i2s.c | 39 ++- sound/soc/qcom/sdm845.c | 4 +- sound/soc/qcom/sm8250.c | 4 +- 
sound/soc/sof/intel/hda-dai.c | 7 +- sound/usb/line6/driver.c | 3 +- sound/usb/line6/midi.c | 6 +- sound/usb/line6/midibuf.c | 25 +- sound/usb/line6/midibuf.h | 5 +- sound/usb/line6/pod.c | 3 +- tools/objtool/check.c | 2 +- tools/perf/util/cgroup.c | 23 +- tools/perf/util/data.c | 2 + tools/perf/util/dwarf-aux.c | 23 +- tools/testing/ktest/ktest.pl | 23 +- tools/testing/selftests/Makefile | 26 +- tools/testing/selftests/bpf/config | 4 - tools/testing/selftests/bpf/prog_tests/bpf_nf.c | 48 ---- tools/testing/selftests/bpf/progs/test_bpf_nf.c | 109 -------- tools/testing/selftests/lib.mk | 5 + 305 files changed, 3239 insertions(+), 1867 deletions(-)
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 97a48da1619ba6bd42a0e5da0a03aa490a9496b1 ]
of_icc_get() allocates resources for the path handle, and we should release them when they are no longer needed, just as dwc3_qcom_interconnect_exit() does. Add icc_put() calls to the error handling paths to fix this.
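To illustrate the pattern the patch applies, here is a minimal, self-contained sketch of goto-based unwinding of interconnect paths. It is not the driver's actual code: struct icc_paths and icc_paths_init() are hypothetical names, and the bandwidth values are placeholders.

#include <linux/interconnect.h>
#include <linux/device.h>
#include <linux/err.h>

/* Hypothetical container standing in for the driver's private struct. */
struct icc_paths {
	struct icc_path *ddr;
	struct icc_path *apps;
};

/*
 * Acquire both paths; on any later failure, unwind whatever was already
 * acquired via icc_put() instead of returning and leaking it.
 */
static int icc_paths_init(struct device *dev, struct icc_paths *p)
{
	int ret;

	p->ddr = of_icc_get(dev, "usb-ddr");
	if (IS_ERR(p->ddr))
		return PTR_ERR(p->ddr);

	p->apps = of_icc_get(dev, "apps-usb");
	if (IS_ERR(p->apps)) {
		ret = PTR_ERR(p->apps);
		goto put_ddr;		/* release what we already hold */
	}

	ret = icc_set_bw(p->apps, 0, 0);	/* placeholder bandwidth values */
	if (ret)
		goto put_apps;

	return 0;

put_apps:
	icc_put(p->apps);
put_ddr:
	icc_put(p->ddr);
	return ret;
}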
Fixes: bea46b981515 ("usb: dwc3: qcom: Add interconnect support in dwc3 driver")
Cc: stable stable@kernel.org
Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com
Signed-off-by: Miaoqian Lin linmq006@gmail.com
Link: https://lore.kernel.org/r/20221206081731.818107-1-linmq006@gmail.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
Signed-off-by: Sasha Levin sashal@kernel.org
---
 drivers/usb/dwc3/dwc3-qcom.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c
index d0352daab012..ec1de6f6c290 100644
--- a/drivers/usb/dwc3/dwc3-qcom.c
+++ b/drivers/usb/dwc3/dwc3-qcom.c
@@ -258,7 +258,8 @@ static int dwc3_qcom_interconnect_init(struct dwc3_qcom *qcom)
 	if (IS_ERR(qcom->icc_path_apps)) {
 		dev_err(dev, "failed to get apps-usb path: %ld\n",
 				PTR_ERR(qcom->icc_path_apps));
-		return PTR_ERR(qcom->icc_path_apps);
+		ret = PTR_ERR(qcom->icc_path_apps);
+		goto put_path_ddr;
 	}
 
 	if (usb_get_maximum_speed(&qcom->dwc3->dev) >= USB_SPEED_SUPER ||
@@ -271,17 +272,23 @@ static int dwc3_qcom_interconnect_init(struct dwc3_qcom *qcom)
 
 	if (ret) {
 		dev_err(dev, "failed to set bandwidth for usb-ddr path: %d\n", ret);
-		return ret;
+		goto put_path_apps;
 	}
 
 	ret = icc_set_bw(qcom->icc_path_apps,
		APPS_USB_AVG_BW, APPS_USB_PEAK_BW);
 	if (ret) {
 		dev_err(dev, "failed to set bandwidth for apps-usb path: %d\n", ret);
-		return ret;
+		goto put_path_apps;
 	}
 
 	return 0;
+
+put_path_apps:
+	icc_put(qcom->icc_path_apps);
+put_path_ddr:
+	icc_put(qcom->icc_path_ddr);
+	return ret;
 }
 
 /**
From: Paulo Alcantara pc@cjr.nz
[ Upstream commit f7f291e14dde32a07b1f0aa06921d28f875a7b54 ]
When running xfstests against Azure the following oops occurred on an arm64 system
Unable to handle kernel write to read-only memory at virtual address ffff0001221cf000
Mem abort info:
  ESR = 0x9600004f
  EC = 0x25: DABT (current EL), IL = 32 bits
  SET = 0, FnV = 0
  EA = 0, S1PTW = 0
  FSC = 0x0f: level 3 permission fault
Data abort info:
  ISV = 0, ISS = 0x0000004f
  CM = 0, WnR = 1
swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000294f3000
[ffff0001221cf000] pgd=18000001ffff8003, p4d=18000001ffff8003, pud=18000001ff82e003, pmd=18000001ff71d003, pte=00600001221cf787
Internal error: Oops: 9600004f [#1] PREEMPT SMP
...
pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
pc : __memcpy+0x40/0x230
lr : scatterwalk_copychunks+0xe0/0x200
sp : ffff800014e92de0
x29: ffff800014e92de0 x28: ffff000114f9de80 x27: 0000000000000008
x26: 0000000000000008 x25: ffff800014e92e78 x24: 0000000000000008
x23: 0000000000000001 x22: 0000040000000000 x21: ffff000000000000
x20: 0000000000000001 x19: ffff0001037c4488 x18: 0000000000000014
x17: 235e1c0d6efa9661 x16: a435f9576b6edd6c x15: 0000000000000058
x14: 0000000000000001 x13: 0000000000000008 x12: ffff000114f2e590
x11: ffffffffffffffff x10: 0000040000000000 x9 : ffff8000105c3580
x8 : 2e9413b10000001a x7 : 534b4410fb86b005 x6 : 534b4410fb86b005
x5 : ffff0001221cf008 x4 : ffff0001037c4490 x3 : 0000000000000001
x2 : 0000000000000008 x1 : ffff0001037c4488 x0 : ffff0001221cf000
Call trace:
 __memcpy+0x40/0x230
 scatterwalk_map_and_copy+0x98/0x100
 crypto_ccm_encrypt+0x150/0x180
 crypto_aead_encrypt+0x2c/0x40
 crypt_message+0x750/0x880
 smb3_init_transform_rq+0x298/0x340
 smb_send_rqst.part.11+0xd8/0x180
 smb_send_rqst+0x3c/0x100
 compound_send_recv+0x534/0xbc0
 smb2_query_info_compound+0x32c/0x440
 smb2_set_ea+0x438/0x4c0
 cifs_xattr_set+0x5d4/0x7c0
This is because, in scatterwalk_copychunks(), we attempted to write to a buffer (@sign) that was allocated on the stack (which is in the vmalloc area) by crypt_message(), and accessing its remaining 8 (x2) bytes ended up crossing a page boundary.
A simple fix would be to just pass a kmalloc'd @sign from crypt_message() and be done with it. Luckily, we don't seem to pass any other vmalloc'd buffers in smb_rqst::rq_iov...
Instead, map the correct pages and offsets from vmalloc buffers as well in cifs_sg_set_buf(), thus avoiding such oopses.
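The core of the fix is the per-page mapping of vmalloc-area buffers. Condensed into a standalone sketch below (sg_set_any_buf() is a hypothetical name; the logic mirrors the cifs_sg_set_buf() helper added by the patch): buffers in the vmalloc area (which includes the stack under CONFIG_VMAP_STACK) are not physically contiguous, so they must be added to the scatterlist one page at a time via vmalloc_to_page(), while virt_to_page() remains valid only for linear-map addresses.

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>
#include <linux/vmalloc.h>

/* Sketch: add an arbitrary kernel buffer to a scatterlist, page by page
 * if it lives in the vmalloc area, in one go otherwise. */
static struct scatterlist *sg_set_any_buf(struct scatterlist *sg,
					  const void *buf, unsigned int buflen)
{
	unsigned long addr = (unsigned long)buf;
	unsigned int off = offset_in_page(addr);

	addr &= PAGE_MASK;
	if (is_vmalloc_addr((void *)addr)) {
		while (buflen) {
			unsigned int len = min_t(unsigned int, buflen,
						 PAGE_SIZE - off);

			/* each vmalloc page must be looked up individually */
			sg_set_page(sg++, vmalloc_to_page((void *)addr), len, off);
			off = 0;
			addr += PAGE_SIZE;
			buflen -= len;
		}
	} else {
		/* linear-map memory is physically contiguous */
		sg_set_page(sg++, virt_to_page(addr), buflen, off);
	}
	return sg;
}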
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz
Cc: stable@vger.kernel.org
Signed-off-by: Steve French stfrench@microsoft.com
Signed-off-by: Sasha Levin sashal@kernel.org
---
 fs/cifs/cifsglob.h  |  69 +++++++++++++++++++++
 fs/cifs/cifsproto.h |   4 +-
 fs/cifs/misc.c      |   4 +-
 fs/cifs/smb2ops.c   | 143 +++++++++++++++++++++-----------------------
 4 files changed, 141 insertions(+), 79 deletions(-)
diff --git a/fs/cifs/cifsglob.h b/fs/cifs/cifsglob.h index 1ab72c3d0bff..0f1b9c48838c 100644 --- a/fs/cifs/cifsglob.h +++ b/fs/cifs/cifsglob.h @@ -13,6 +13,8 @@ #include <linux/in6.h> #include <linux/inet.h> #include <linux/slab.h> +#include <linux/scatterlist.h> +#include <linux/mm.h> #include <linux/mempool.h> #include <linux/workqueue.h> #include "cifs_fs_sb.h" @@ -21,6 +23,7 @@ #include <linux/scatterlist.h> #include <uapi/linux/cifs/cifs_mount.h> #include "smb2pdu.h" +#include "smb2glob.h"
#define CIFS_MAGIC_NUMBER 0xFF534D42 /* the first four bytes of SMB PDUs */
@@ -1972,4 +1975,70 @@ static inline bool cifs_is_referral_server(struct cifs_tcon *tcon, return is_tcon_dfs(tcon) || (ref && (ref->flags & DFSREF_REFERRAL_SERVER)); }
+static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst, + int num_rqst, + const u8 *sig) +{ + unsigned int len, skip; + unsigned int nents = 0; + unsigned long addr; + int i, j; + + /* Assumes the first rqst has a transform header as the first iov. + * I.e. + * rqst[0].rq_iov[0] is transform header + * rqst[0].rq_iov[1+] data to be encrypted/decrypted + * rqst[1+].rq_iov[0+] data to be encrypted/decrypted + */ + for (i = 0; i < num_rqst; i++) { + /* + * The first rqst has a transform header where the + * first 20 bytes are not part of the encrypted blob. + */ + for (j = 0; j < rqst[i].rq_nvec; j++) { + struct kvec *iov = &rqst[i].rq_iov[j]; + + skip = (i == 0) && (j == 0) ? 20 : 0; + addr = (unsigned long)iov->iov_base + skip; + if (unlikely(is_vmalloc_addr((void *)addr))) { + len = iov->iov_len - skip; + nents += DIV_ROUND_UP(offset_in_page(addr) + len, + PAGE_SIZE); + } else { + nents++; + } + } + nents += rqst[i].rq_npages; + } + nents += DIV_ROUND_UP(offset_in_page(sig) + SMB2_SIGNATURE_SIZE, PAGE_SIZE); + return nents; +} + +/* We can not use the normal sg_set_buf() as we will sometimes pass a + * stack object as buf. + */ +static inline struct scatterlist *cifs_sg_set_buf(struct scatterlist *sg, + const void *buf, + unsigned int buflen) +{ + unsigned long addr = (unsigned long)buf; + unsigned int off = offset_in_page(addr); + + addr &= PAGE_MASK; + if (unlikely(is_vmalloc_addr((void *)addr))) { + do { + unsigned int len = min_t(unsigned int, buflen, PAGE_SIZE - off); + + sg_set_page(sg++, vmalloc_to_page((void *)addr), len, off); + + off = 0; + addr += PAGE_SIZE; + buflen -= len; + } while (buflen); + } else { + sg_set_page(sg++, virt_to_page(addr), buflen, off); + } + return sg; +} + #endif /* _CIFS_GLOB_H */ diff --git a/fs/cifs/cifsproto.h b/fs/cifs/cifsproto.h index b2697356b5e7..50844d51da5d 100644 --- a/fs/cifs/cifsproto.h +++ b/fs/cifs/cifsproto.h @@ -590,8 +590,8 @@ int cifs_alloc_hash(const char *name, struct crypto_shash **shash, struct sdesc **sdesc); void cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc);
-extern void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset); +void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page, + unsigned int *len, unsigned int *offset); struct cifs_chan * cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server); int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses); diff --git a/fs/cifs/misc.c b/fs/cifs/misc.c index 94143d7f58c7..3a90ee314ed7 100644 --- a/fs/cifs/misc.c +++ b/fs/cifs/misc.c @@ -1134,8 +1134,8 @@ cifs_free_hash(struct crypto_shash **shash, struct sdesc **sdesc) * @len: Where to store the length for this page: * @offset: Where to store the offset for this page */ -void rqst_page_get_length(struct smb_rqst *rqst, unsigned int page, - unsigned int *len, unsigned int *offset) +void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page, + unsigned int *len, unsigned int *offset) { *len = rqst->rq_pagesz; *offset = (page == 0) ? rqst->rq_offset : 0; diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 5e6526c201fe..817d78129bd2 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4416,69 +4416,82 @@ fill_transform_hdr(struct smb2_transform_hdr *tr_hdr, unsigned int orig_len, memcpy(&tr_hdr->SessionId, &shdr->SessionId, 8); }
-/* We can not use the normal sg_set_buf() as we will sometimes pass a - * stack object as buf. - */ -static inline void smb2_sg_set_buf(struct scatterlist *sg, const void *buf, - unsigned int buflen) +static void *smb2_aead_req_alloc(struct crypto_aead *tfm, const struct smb_rqst *rqst, + int num_rqst, const u8 *sig, u8 **iv, + struct aead_request **req, struct scatterlist **sgl, + unsigned int *num_sgs) { - void *addr; - /* - * VMAP_STACK (at least) puts stack into the vmalloc address space - */ - if (is_vmalloc_addr(buf)) - addr = vmalloc_to_page(buf); - else - addr = virt_to_page(buf); - sg_set_page(sg, addr, buflen, offset_in_page(buf)); + unsigned int req_size = sizeof(**req) + crypto_aead_reqsize(tfm); + unsigned int iv_size = crypto_aead_ivsize(tfm); + unsigned int len; + u8 *p; + + *num_sgs = cifs_get_num_sgs(rqst, num_rqst, sig); + + len = iv_size; + len += crypto_aead_alignmask(tfm) & ~(crypto_tfm_ctx_alignment() - 1); + len = ALIGN(len, crypto_tfm_ctx_alignment()); + len += req_size; + len = ALIGN(len, __alignof__(struct scatterlist)); + len += *num_sgs * sizeof(**sgl); + + p = kmalloc(len, GFP_ATOMIC); + if (!p) + return NULL; + + *iv = (u8 *)PTR_ALIGN(p, crypto_aead_alignmask(tfm) + 1); + *req = (struct aead_request *)PTR_ALIGN(*iv + iv_size, + crypto_tfm_ctx_alignment()); + *sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size, + __alignof__(struct scatterlist)); + return p; }
-/* Assumes the first rqst has a transform header as the first iov. - * I.e. - * rqst[0].rq_iov[0] is transform header - * rqst[0].rq_iov[1+] data to be encrypted/decrypted - * rqst[1+].rq_iov[0+] data to be encrypted/decrypted - */ -static struct scatterlist * -init_sg(int num_rqst, struct smb_rqst *rqst, u8 *sign) +static void *smb2_get_aead_req(struct crypto_aead *tfm, const struct smb_rqst *rqst, + int num_rqst, const u8 *sig, u8 **iv, + struct aead_request **req, struct scatterlist **sgl) { - unsigned int sg_len; + unsigned int off, len, skip; struct scatterlist *sg; - unsigned int i; - unsigned int j; - unsigned int idx = 0; - int skip; - - sg_len = 1; - for (i = 0; i < num_rqst; i++) - sg_len += rqst[i].rq_nvec + rqst[i].rq_npages; + unsigned int num_sgs; + unsigned long addr; + int i, j; + void *p;
- sg = kmalloc_array(sg_len, sizeof(struct scatterlist), GFP_KERNEL); - if (!sg) + p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, sgl, &num_sgs); + if (!p) return NULL;
- sg_init_table(sg, sg_len); + sg_init_table(*sgl, num_sgs); + sg = *sgl; + + /* Assumes the first rqst has a transform header as the first iov. + * I.e. + * rqst[0].rq_iov[0] is transform header + * rqst[0].rq_iov[1+] data to be encrypted/decrypted + * rqst[1+].rq_iov[0+] data to be encrypted/decrypted + */ for (i = 0; i < num_rqst; i++) { + /* + * The first rqst has a transform header where the + * first 20 bytes are not part of the encrypted blob. + */ for (j = 0; j < rqst[i].rq_nvec; j++) { - /* - * The first rqst has a transform header where the - * first 20 bytes are not part of the encrypted blob - */ - skip = (i == 0) && (j == 0) ? 20 : 0; - smb2_sg_set_buf(&sg[idx++], - rqst[i].rq_iov[j].iov_base + skip, - rqst[i].rq_iov[j].iov_len - skip); - } + struct kvec *iov = &rqst[i].rq_iov[j];
+ skip = (i == 0) && (j == 0) ? 20 : 0; + addr = (unsigned long)iov->iov_base + skip; + len = iov->iov_len - skip; + sg = cifs_sg_set_buf(sg, (void *)addr, len); + } for (j = 0; j < rqst[i].rq_npages; j++) { - unsigned int len, offset; - - rqst_page_get_length(&rqst[i], j, &len, &offset); - sg_set_page(&sg[idx++], rqst[i].rq_pages[j], len, offset); + rqst_page_get_length(&rqst[i], j, &len, &off); + sg_set_page(sg++, rqst[i].rq_pages[j], len, off); } } - smb2_sg_set_buf(&sg[idx], sign, SMB2_SIGNATURE_SIZE); - return sg; + cifs_sg_set_buf(sg, sig, SMB2_SIGNATURE_SIZE); + + return p; }
static int @@ -4522,11 +4535,11 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, u8 sign[SMB2_SIGNATURE_SIZE] = {}; u8 key[SMB3_ENC_DEC_KEY_SIZE]; struct aead_request *req; - char *iv; - unsigned int iv_len; + u8 *iv; DECLARE_CRYPTO_WAIT(wait); struct crypto_aead *tfm; unsigned int crypt_len = le32_to_cpu(tr_hdr->OriginalMessageSize); + void *creq;
rc = smb2_get_enc_key(server, tr_hdr->SessionId, enc, key); if (rc) { @@ -4561,32 +4574,15 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, return rc; }
- req = aead_request_alloc(tfm, GFP_KERNEL); - if (!req) { - cifs_server_dbg(VFS, "%s: Failed to alloc aead request\n", __func__); + creq = smb2_get_aead_req(tfm, rqst, num_rqst, sign, &iv, &req, &sg); + if (unlikely(!creq)) return -ENOMEM; - }
if (!enc) { memcpy(sign, &tr_hdr->Signature, SMB2_SIGNATURE_SIZE); crypt_len += SMB2_SIGNATURE_SIZE; }
- sg = init_sg(num_rqst, rqst, sign); - if (!sg) { - cifs_server_dbg(VFS, "%s: Failed to init sg\n", __func__); - rc = -ENOMEM; - goto free_req; - } - - iv_len = crypto_aead_ivsize(tfm); - iv = kzalloc(iv_len, GFP_KERNEL); - if (!iv) { - cifs_server_dbg(VFS, "%s: Failed to alloc iv\n", __func__); - rc = -ENOMEM; - goto free_sg; - } - if ((server->cipher_type == SMB2_ENCRYPTION_AES128_GCM) || (server->cipher_type == SMB2_ENCRYPTION_AES256_GCM)) memcpy(iv, (char *)tr_hdr->Nonce, SMB3_AES_GCM_NONCE); @@ -4595,6 +4591,7 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, memcpy(iv + 1, (char *)tr_hdr->Nonce, SMB3_AES_CCM_NONCE); }
+ aead_request_set_tfm(req, tfm); aead_request_set_crypt(req, sg, sg, crypt_len, iv); aead_request_set_ad(req, assoc_data_len);
@@ -4607,11 +4604,7 @@ crypt_message(struct TCP_Server_Info *server, int num_rqst, if (!rc && enc) memcpy(&tr_hdr->Signature, sign, SMB2_SIGNATURE_SIZE);
- kfree(iv); -free_sg: - kfree(sg); -free_req: - kfree(req); + kfree_sensitive(creq); return rc; }
This reverts commit f463a1295c4fa73eac0b16fbfbdfc5726b06445d.
Signed-off-by: Sasha Levin sashal@kernel.org
---
 tools/testing/selftests/bpf/config            |   4 -
 .../testing/selftests/bpf/prog_tests/bpf_nf.c |  48 --------
 .../testing/selftests/bpf/progs/test_bpf_nf.c | 109 ------------------
 3 files changed, 161 deletions(-)
 delete mode 100644 tools/testing/selftests/bpf/prog_tests/bpf_nf.c
 delete mode 100644 tools/testing/selftests/bpf/progs/test_bpf_nf.c
diff --git a/tools/testing/selftests/bpf/config b/tools/testing/selftests/bpf/config index 4a2a47fcd6ef..5192305159ec 100644 --- a/tools/testing/selftests/bpf/config +++ b/tools/testing/selftests/bpf/config @@ -46,7 +46,3 @@ CONFIG_IMA_READ_POLICY=y CONFIG_BLK_DEV_LOOP=y CONFIG_FUNCTION_TRACER=y CONFIG_DYNAMIC_FTRACE=y -CONFIG_NETFILTER=y -CONFIG_NF_DEFRAG_IPV4=y -CONFIG_NF_DEFRAG_IPV6=y -CONFIG_NF_CONNTRACK=y diff --git a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c b/tools/testing/selftests/bpf/prog_tests/bpf_nf.c deleted file mode 100644 index e3166a81e989..000000000000 --- a/tools/testing/selftests/bpf/prog_tests/bpf_nf.c +++ /dev/null @@ -1,48 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include <test_progs.h> -#include <network_helpers.h> -#include "test_bpf_nf.skel.h" - -enum { - TEST_XDP, - TEST_TC_BPF, -}; - -void test_bpf_nf_ct(int mode) -{ - struct test_bpf_nf *skel; - int prog_fd, err, retval; - - skel = test_bpf_nf__open_and_load(); - if (!ASSERT_OK_PTR(skel, "test_bpf_nf__open_and_load")) - return; - - if (mode == TEST_XDP) - prog_fd = bpf_program__fd(skel->progs.nf_xdp_ct_test); - else - prog_fd = bpf_program__fd(skel->progs.nf_skb_ct_test); - - err = bpf_prog_test_run(prog_fd, 1, &pkt_v4, sizeof(pkt_v4), NULL, NULL, - (__u32 *)&retval, NULL); - if (!ASSERT_OK(err, "bpf_prog_test_run")) - goto end; - - ASSERT_EQ(skel->bss->test_einval_bpf_tuple, -EINVAL, "Test EINVAL for NULL bpf_tuple"); - ASSERT_EQ(skel->bss->test_einval_reserved, -EINVAL, "Test EINVAL for reserved not set to 0"); - ASSERT_EQ(skel->bss->test_einval_netns_id, -EINVAL, "Test EINVAL for netns_id < -1"); - ASSERT_EQ(skel->bss->test_einval_len_opts, -EINVAL, "Test EINVAL for len__opts != NF_BPF_CT_OPTS_SZ"); - ASSERT_EQ(skel->bss->test_eproto_l4proto, -EPROTO, "Test EPROTO for l4proto != TCP or UDP"); - ASSERT_EQ(skel->bss->test_enonet_netns_id, -ENONET, "Test ENONET for bad but valid netns_id"); - ASSERT_EQ(skel->bss->test_enoent_lookup, -ENOENT, "Test ENOENT for failed lookup"); - ASSERT_EQ(skel->bss->test_eafnosupport, -EAFNOSUPPORT, "Test EAFNOSUPPORT for invalid len__tuple"); -end: - test_bpf_nf__destroy(skel); -} - -void test_bpf_nf(void) -{ - if (test__start_subtest("xdp-ct")) - test_bpf_nf_ct(TEST_XDP); - if (test__start_subtest("tc-bpf-ct")) - test_bpf_nf_ct(TEST_TC_BPF); -} diff --git a/tools/testing/selftests/bpf/progs/test_bpf_nf.c b/tools/testing/selftests/bpf/progs/test_bpf_nf.c deleted file mode 100644 index 6f131c993c0b..000000000000 --- a/tools/testing/selftests/bpf/progs/test_bpf_nf.c +++ /dev/null @@ -1,109 +0,0 @@ -// SPDX-License-Identifier: GPL-2.0 -#include <vmlinux.h> -#include <bpf/bpf_helpers.h> - -#define EAFNOSUPPORT 97 -#define EPROTO 71 -#define ENONET 64 -#define EINVAL 22 -#define ENOENT 2 - -int test_einval_bpf_tuple = 0; -int test_einval_reserved = 0; -int test_einval_netns_id = 0; -int test_einval_len_opts = 0; -int test_eproto_l4proto = 0; -int test_enonet_netns_id = 0; -int test_enoent_lookup = 0; -int test_eafnosupport = 0; - -struct nf_conn *bpf_xdp_ct_lookup(struct xdp_md *, struct bpf_sock_tuple *, u32, - struct bpf_ct_opts *, u32) __ksym; -struct nf_conn *bpf_skb_ct_lookup(struct __sk_buff *, struct bpf_sock_tuple *, u32, - struct bpf_ct_opts *, u32) __ksym; -void bpf_ct_release(struct nf_conn *) __ksym; - -static __always_inline void -nf_ct_test(struct nf_conn *(*func)(void *, struct bpf_sock_tuple *, u32, - struct bpf_ct_opts *, u32), - void *ctx) -{ - struct bpf_ct_opts opts_def = { .l4proto = IPPROTO_TCP, .netns_id = -1 }; - struct bpf_sock_tuple bpf_tuple; - 
struct nf_conn *ct; - - __builtin_memset(&bpf_tuple, 0, sizeof(bpf_tuple.ipv4)); - - ct = func(ctx, NULL, 0, &opts_def, sizeof(opts_def)); - if (ct) - bpf_ct_release(ct); - else - test_einval_bpf_tuple = opts_def.error; - - opts_def.reserved[0] = 1; - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); - opts_def.reserved[0] = 0; - opts_def.l4proto = IPPROTO_TCP; - if (ct) - bpf_ct_release(ct); - else - test_einval_reserved = opts_def.error; - - opts_def.netns_id = -2; - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); - opts_def.netns_id = -1; - if (ct) - bpf_ct_release(ct); - else - test_einval_netns_id = opts_def.error; - - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def) - 1); - if (ct) - bpf_ct_release(ct); - else - test_einval_len_opts = opts_def.error; - - opts_def.l4proto = IPPROTO_ICMP; - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); - opts_def.l4proto = IPPROTO_TCP; - if (ct) - bpf_ct_release(ct); - else - test_eproto_l4proto = opts_def.error; - - opts_def.netns_id = 0xf00f; - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); - opts_def.netns_id = -1; - if (ct) - bpf_ct_release(ct); - else - test_enonet_netns_id = opts_def.error; - - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4), &opts_def, sizeof(opts_def)); - if (ct) - bpf_ct_release(ct); - else - test_enoent_lookup = opts_def.error; - - ct = func(ctx, &bpf_tuple, sizeof(bpf_tuple.ipv4) - 1, &opts_def, sizeof(opts_def)); - if (ct) - bpf_ct_release(ct); - else - test_eafnosupport = opts_def.error; -} - -SEC("xdp") -int nf_xdp_ct_test(struct xdp_md *ctx) -{ - nf_ct_test((void *)bpf_xdp_ct_lookup, ctx); - return 0; -} - -SEC("tc") -int nf_skb_ct_test(struct __sk_buff *ctx) -{ - nf_ct_test((void *)bpf_skb_ct_lookup, ctx); - return 0; -} - -char _license[] SEC("license") = "GPL";
From: Klaus Jensen k.jensen@samsung.com
[ Upstream commit b5f96cb719d8ba220b565ddd3ba4ac0d8bcfb130 ]
When using shadow doorbells, the event index and the doorbell values are written to host memory. Prior to this patch, these values were erroneously written in host endianness. This causes trouble on big-endian platforms. Fix this by adding the missing endian conversions.
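For illustration only, a minimal userspace sketch of the conversion pattern (glibc's htole32()/le32toh() stand in for the kernel's cpu_to_le32()/le32_to_cpu(); this is not the driver code):

/*
 * A value shared with the device must be stored in little-endian form and
 * converted back when read, so both sides agree regardless of host byte order.
 */
#include <endian.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t db_value = 0x1234;              /* doorbell value in CPU byte order */
	uint32_t shadow   = htole32(db_value);   /* what lands in shared host memory */

	/* Reading the event index back requires the inverse conversion. */
	uint32_t event_idx = le32toh(shadow);

	printf("stored (LE): 0x%08x, decoded: 0x%08x\n", shadow, event_idx);
	return 0;
}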
This issue was noticed by Guenter while testing various big-endian platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is up for review as well[2].
[1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/ [2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/
Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") Reported-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Klaus Jensen k.jensen@samsung.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index d49df7123677..ab038dbafc06 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -142,9 +142,9 @@ struct nvme_dev { mempool_t *iod_mempool;
/* shadow doorbell buffer support: */ - u32 *dbbuf_dbs; + __le32 *dbbuf_dbs; dma_addr_t dbbuf_dbs_dma_addr; - u32 *dbbuf_eis; + __le32 *dbbuf_eis; dma_addr_t dbbuf_eis_dma_addr;
/* host memory buffer support: */ @@ -208,10 +208,10 @@ struct nvme_queue { #define NVMEQ_SQ_CMB 1 #define NVMEQ_DELETE_ERROR 2 #define NVMEQ_POLLED 3 - u32 *dbbuf_sq_db; - u32 *dbbuf_cq_db; - u32 *dbbuf_sq_ei; - u32 *dbbuf_cq_ei; + __le32 *dbbuf_sq_db; + __le32 *dbbuf_cq_db; + __le32 *dbbuf_sq_ei; + __le32 *dbbuf_cq_ei; struct completion delete_done; };
@@ -332,11 +332,11 @@ static inline int nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old) }
/* Update dbbuf and return true if an MMIO is required */ -static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, - volatile u32 *dbbuf_ei) +static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db, + volatile __le32 *dbbuf_ei) { if (dbbuf_db) { - u16 old_value; + u16 old_value, event_idx;
/* * Ensure that the queue is written before updating @@ -344,8 +344,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, */ wmb();
- old_value = *dbbuf_db; - *dbbuf_db = value; + old_value = le32_to_cpu(*dbbuf_db); + *dbbuf_db = cpu_to_le32(value);
/* * Ensure that the doorbell is updated before reading the event @@ -355,7 +355,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, */ mb();
- if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value)) + event_idx = le32_to_cpu(*dbbuf_ei); + if (!nvme_dbbuf_need_event(event_idx, value, old_value)) return false; }
From: Keith Busch kbusch@kernel.org
[ Upstream commit c89a529e823d51dd23c7ec0c047c7a454a428541 ]
Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries.
The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool.
While unlikely to occur (you'd need a 4 MB transfer split across exactly 127 physical segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence.
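A standalone sketch of the unit mismatch, under assumed constants (NVME_MAX_KB_SZ = 4096, 4 KiB controller and host pages; none of these values are taken from this tree). It only shows what the old and new formulas evaluate to, not what the hardware requires:

#include <stdio.h>

#define DIV_ROUND_UP(n, d)   (((n) + (d) - 1) / (d))
#define NVME_MAX_KB_SZ       4096u   /* assumed: max transfer size in KB */
#define NVME_CTRL_PAGE_SIZE  4096u   /* assumed: controller page size in bytes */
#define PAGE_SIZE            4096u   /* assumed: host page size in bytes */

int main(void)
{
	/* Old formula: adds KB to bytes, so the PRP entry count comes out far too small. */
	unsigned old_nprps = DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE,
					  NVME_CTRL_PAGE_SIZE);
	/* Fixed formula: convert the max size to bytes first. */
	unsigned max_bytes = NVME_MAX_KB_SZ * 1024 + NVME_CTRL_PAGE_SIZE;
	unsigned new_nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE);

	printf("old nprps=%u -> %u PRP list page(s)\n",
	       old_nprps, DIV_ROUND_UP(8 * old_nprps, PAGE_SIZE - 8));
	printf("new nprps=%u -> %u PRP list page(s)\n",
	       new_nprps, DIV_ROUND_UP(8 * new_nprps, PAGE_SIZE - 8));
	return 0;
}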
Cc: Jens Axboe axboe@kernel.dk Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch kbusch@kernel.org Reviewed-by: Jens Axboe axboe@kernel.dk Reviewed-by: Kanchan Joshi joshi.k@samsung.com Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index ab038dbafc06..7a96cbbfdabb 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -370,8 +370,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db, */ static int nvme_pci_npages_prp(void) { - unsigned nprps = DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE, - NVME_CTRL_PAGE_SIZE); + unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE; + unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE); return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); }
From: Keith Busch kbusch@kernel.org
[ Upstream commit 841734234a28fd5cd0889b84bd4d93a0988fa11e ]
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE.
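For a sense of the scale, a worked-numbers sketch under assumed sizes (a 16-byte SGL descriptor, 4 KiB controller pages, a 64 KiB host page as found on some ppc64 configurations); it is not the driver code, just the two divisions side by side:

#include <stdio.h>

#define SGL_DESC_SIZE        16u      /* assumed sizeof(struct nvme_sgl_desc) */
#define NVME_CTRL_PAGE_SIZE  4096u    /* assumed: size of one DMA pool chunk */
#define PAGE_SIZE            65536u   /* assumed: 64 KiB host page */

int main(void)
{
	printf("PAGE_SIZE / desc           = %u SGEs (what the old code assumed)\n",
	       PAGE_SIZE / SGL_DESC_SIZE);
	printf("NVME_CTRL_PAGE_SIZE / desc = %u SGEs (what a pool chunk really holds)\n",
	       NVME_CTRL_PAGE_SIZE / SGL_DESC_SIZE);
	return 0;
}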
Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size") Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 7a96cbbfdabb..0165e65cf548 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -33,7 +33,7 @@ #define SQ_SIZE(q) ((q)->q_depth << (q)->sqes) #define CQ_SIZE(q) ((q)->q_depth * sizeof(struct nvme_completion))
-#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct nvme_sgl_desc)) +#define SGES_PER_PAGE (NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))
/* * These can be higher, but we need to ensure that any command doesn't @@ -372,7 +372,7 @@ static int nvme_pci_npages_prp(void) { unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE; unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE); - return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); + return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8); }
/* @@ -382,7 +382,7 @@ static int nvme_pci_npages_prp(void) static int nvme_pci_npages_sgl(void) { return DIV_ROUND_UP(NVME_MAX_SEGS * sizeof(struct nvme_sgl_desc), - PAGE_SIZE); + NVME_CTRL_PAGE_SIZE); }
static size_t nvme_pci_iod_alloc_size(void) @@ -732,7 +732,7 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge, sge->length = cpu_to_le32(entries * sizeof(*sge)); sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4; } else { - sge->length = cpu_to_le32(PAGE_SIZE); + sge->length = cpu_to_le32(NVME_CTRL_PAGE_SIZE); sge->type = NVME_SGL_FMT_SEG_DESC << 4; } }
From: Tamim Khan tamim@fusetak.com
[ Upstream commit e12dee3736731e24b1e7367f87d66ac0fcd73ce7 ]
In the ACPI DSDT table for the Asus VivoBook K3402ZA/K3502ZA, IRQ 1 is described as ActiveLow; however, the kernel overrides it to Edge_High. This prevents the internal keyboard from working on these laptops. In order to fix this, add these laptops to the skip_override_table so that the kernel does not override IRQ 1 to Edge_High.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216158 Reviewed-by: Hui Wang hui.wang@canonical.com Tested-by: Tamim Khan tamim@fusetak.com Tested-by: Sunand sunandchakradhar@gmail.com Signed-off-by: Tamim Khan tamim@fusetak.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: f3cb9b740869 ("ACPI: resource: do IRQ override on Lenovo 14ALC7") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 19358a641610..596ca9fae389 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -399,6 +399,24 @@ static const struct dmi_system_id medion_laptop[] = { { } };
+static const struct dmi_system_id asus_laptop[] = { + { + .ident = "Asus Vivobook K3402ZA", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BOARD_NAME, "K3402ZA"), + }, + }, + { + .ident = "Asus Vivobook K3502ZA", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BOARD_NAME, "K3502ZA"), + }, + }, + { } +}; + struct irq_override_cmp { const struct dmi_system_id *system; unsigned char irq; @@ -409,6 +427,7 @@ struct irq_override_cmp {
static const struct irq_override_cmp skip_override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, + { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, };
static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
From: Jiri Slaby (SUSE) jirislaby@kernel.org
[ Upstream commit bfcdf58380b1d9be564a78a9370da722ed1a9965 ]
LENOVO IdeaPad Flex 5 is Ryzen-5 based and the commit below removed IRQ overriding for those. This broke the touchscreen and trackpad:
i2c_designware AMDI0010:00: controller timed out
i2c_designware AMDI0010:03: controller timed out
i2c_hid_acpi i2c-MSFT0001:00: failed to reset device: -61
i2c_designware AMDI0010:03: controller timed out
...
i2c_hid_acpi i2c-MSFT0001:00: can't add hid device: -61
i2c_hid_acpi: probe of i2c-MSFT0001:00 failed with error -61
White-list this specific model in the override_table.
For this to work, the ZEN test needs to be put below the table walk.
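As a rough userspace model of the new lookup order (hypothetical names, not the ACPI code): the quirk table is consulted first and its per-entry flag decides either way; the generic Zen default only applies when no entry matches.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct irq_override_entry {
	const char *board;      /* stands in for the DMI match table */
	unsigned char irq;
	bool override;          /* what to return when the entry matches */
};

static const struct irq_override_entry override_table[] = {
	{ "medion_laptop", 1, false },   /* keep the firmware setting */
	{ "lenovo_82ra",   6, true  },   /* force the override */
	{ "lenovo_82ra",  10, true  },
};

static bool amd_zen;   /* assumed platform flag, just for the example */

static bool irq_override(const char *board, unsigned char irq)
{
	for (size_t i = 0; i < sizeof(override_table) / sizeof(override_table[0]); i++) {
		const struct irq_override_entry *e = &override_table[i];

		if (e->irq == irq && strcmp(e->board, board) == 0)
			return e->override;   /* a table hit decides either way */
	}
	return !amd_zen;                      /* generic default runs only afterwards */
}

int main(void)
{
	amd_zen = true;
	printf("lenovo_82ra IRQ 6:   %d\n", irq_override("lenovo_82ra", 6)); /* 1: forced */
	printf("unknown board IRQ 1: %d\n", irq_override("unknown", 1));     /* 0: Zen default */
	return 0;
}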
Fixes: 37c81d9f1d1b (ACPI: resource: skip IRQ override on AMD Zen platforms) Link: https://bugzilla.suse.com/show_bug.cgi?id=1203794 Signed-off-by: Jiri Slaby (SUSE) jirislaby@kernel.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: f3cb9b740869 ("ACPI: resource: do IRQ override on Lenovo 14ALC7") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 42 +++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 596ca9fae389..5154c9861ece 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -417,17 +417,31 @@ static const struct dmi_system_id asus_laptop[] = { { } };
+static const struct dmi_system_id lenovo_82ra[] = { + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +}; + struct irq_override_cmp { const struct dmi_system_id *system; unsigned char irq; unsigned char triggering; unsigned char polarity; unsigned char shareable; + bool override; };
-static const struct irq_override_cmp skip_override_table[] = { - { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, - { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, +static const struct irq_override_cmp override_table[] = { + { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_82ra, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_82ra, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, };
static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, @@ -435,6 +449,17 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, { int i;
+ for (i = 0; i < ARRAY_SIZE(override_table); i++) { + const struct irq_override_cmp *entry = &override_table[i]; + + if (dmi_check_system(entry->system) && + entry->irq == gsi && + entry->triggering == triggering && + entry->polarity == polarity && + entry->shareable == shareable) + return entry->override; + } + #ifdef CONFIG_X86 /* * IRQ override isn't needed on modern AMD Zen systems and @@ -445,17 +470,6 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return false; #endif
- for (i = 0; i < ARRAY_SIZE(skip_override_table); i++) { - const struct irq_override_cmp *entry = &skip_override_table[i]; - - if (dmi_check_system(entry->system) && - entry->irq == gsi && - entry->triggering == triggering && - entry->polarity == polarity && - entry->shareable == shareable) - return false; - } - return true; }
From: Erik Schumacher ofenfisch@googlemail.com
[ Upstream commit 7592b79ba4a91350b38469e05238308bcfe1019b ]
The Schenker XMG CORE 15 (M22) is Ryzen-6 based and needs IRQ overriding for the keyboard to work. Adding an entry for this laptop to the override_table makes the internal keyboard functional again.
Signed-off-by: Erik Schumacher ofenfisch@googlemail.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: f3cb9b740869 ("ACPI: resource: do IRQ override on Lenovo 14ALC7") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 5154c9861ece..d0bed7e66a33 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -428,6 +428,17 @@ static const struct dmi_system_id lenovo_82ra[] = { { } };
+static const struct dmi_system_id schenker_gm_rg[] = { + { + .ident = "XMG CORE 15 (M22)", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "SchenkerTechnologiesGmbH"), + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +}; + struct irq_override_cmp { const struct dmi_system_id *system; unsigned char irq; @@ -442,6 +453,7 @@ static const struct irq_override_cmp override_table[] = { { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { lenovo_82ra, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, { lenovo_82ra, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { schenker_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, };
static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
From: Adrian Freund adrian@freund.io
[ Upstream commit f3cb9b740869712d448edf3b9ef5952b847caf8b ]
Commit bfcdf58380b1 ("ACPI: resource: do IRQ override on LENOVO IdeaPad") added an override for Lenovo IdeaPad 5 16ALC7. The 14ALC7 variant also suffers from a broken touchscreen and trackpad.
Fixes: 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216804 Signed-off-by: Adrian Freund adrian@freund.io Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index d0bed7e66a33..33921949bd8f 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -417,7 +417,14 @@ static const struct dmi_system_id asus_laptop[] = { { } };
-static const struct dmi_system_id lenovo_82ra[] = { +static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, { .ident = "LENOVO IdeaPad Flex 5 16ALC7", .matches = { @@ -451,8 +458,8 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, - { lenovo_82ra, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, - { lenovo_82ra, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, { schenker_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, };
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 246cf66e300b76099b5dbd3fdd39e9a5dbc53f02 ]
Commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") accesses 'bic->bfqq' in bic_set_bfqq(); however, bfq_exit_icq_bfqq() can free bfqq first and then call bic_set_bfqq(), which causes a use-after-free.
Fix the problem by moving bfq_exit_bfqq() behind bic_set_bfqq().
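A minimal generic illustration of why the ordering matters (plain C, not the BFQ code): a helper that reads the old object through the back-pointer must run before that object is freed.

#include <stdio.h>
#include <stdlib.h>

struct queue {
	int busy;                 /* field the helper looks at */
};

struct context {
	struct queue *q;          /* back-pointer, like bic->bfqq */
};

static void set_queue(struct context *ctx, struct queue *newq)
{
	if (ctx->q)
		printf("old queue busy=%d\n", ctx->q->busy); /* reads the old object */
	ctx->q = newq;
}

int main(void)
{
	struct context ctx = { .q = malloc(sizeof(struct queue)) };

	if (!ctx.q)
		return 1;
	ctx.q->busy = 1;

	/* Correct order: clear the back-pointer while the object is still alive... */
	struct queue *old = ctx.q;
	set_queue(&ctx, NULL);
	/* ...then release it. Freeing first would make set_queue() read freed
	 * memory -- the use-after-free the patch avoids by reordering the calls. */
	free(old);
	return 0;
}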
Fixes: 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-by: Yi Zhang yi.zhang@redhat.com Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20221226030605.1437081-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/bfq-iosched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index b8b6e9eae94b..85120d7b5cf0 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -5251,8 +5251,8 @@ static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync) unsigned long flags;
spin_lock_irqsave(&bfqd->lock, flags); - bfq_exit_bfqq(bfqd, bfqq); bic_set_bfqq(bic, NULL, is_sync); + bfq_exit_bfqq(bfqd, bfqq); spin_unlock_irqrestore(&bfqd->lock, flags); } }
From: Adam Vodopjan grozzly@protonmail.com
[ Upstream commit 37e14e4f3715428b809e4df9a9958baa64c77d51 ]
Since kernel 5.3.4 my laptop (ICH8M controller) does not see the Kingston SV300S37A60G SSD connected to a SATA port on wake from suspend. The problem was introduced in c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond"): the quirk is not applied on wake from suspend as it originally was.
It is worth mentioning that the commit contained another bug: the quirk is not applied at all to controllers which require it. The fix, commit 09d6ac8dc51a ("libata/ahci: Fix PCS quirk application"), landed in 5.3.8. So testing my patch anywhere between commits c312ef176399 and 09d6ac8dc51a is pointless.
Not all disks trigger the problem. For example nothing bad happens with Western Digital WD5000LPCX HDD.
Test hardware:
- Acer 5920G with ICH8M SATA controller
- sda: some SATA HDD connected to the DVD drive IDE port with a SATA-IDE caddy. It is the boot disk.
- sdb: Kingston SV300S37A60G SSD connected to the only SATA port
Sample "dmesg --notime | grep -E '^(sd |ata)'" output on wake:
sd 0:0:0:0: [sda] Starting disk
sd 2:0:0:0: [sdb] Starting disk
ata4: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out
ata1.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out
ata1: FORCE: cable set to 80c
ata5: SATA link down (SStatus 0 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3: SATA link down (SStatus 4 SControl 300)
ata3.00: disabled
sd 2:0:0:0: rejecting I/O to offline device
ata3.00: detaching (SCSI 2:0:0:0)
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Synchronizing SCSI cache
sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
sd 2:0:0:0: [sdb] Stopping disk
sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Commit c312ef176399 dropped ahci_pci_reset_controller() which internally calls ahci_reset_controller() and applies the PCS quirk if needed after that. It was called each time a reset was required instead of just ahci_reset_controller(). This patch puts the function back in place.
Fixes: c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond") Signed-off-by: Adam Vodopjan grozzly@protonmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/ahci.c | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index c1bf7117a9ff..149ee16fd022 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -83,6 +83,7 @@ enum board_ids { static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent); static void ahci_remove_one(struct pci_dev *dev); static void ahci_shutdown_one(struct pci_dev *dev); +static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv); static int ahci_vt8251_hardreset(struct ata_link *link, unsigned int *class, unsigned long deadline); static int ahci_avn_hardreset(struct ata_link *link, unsigned int *class, @@ -668,6 +669,25 @@ static void ahci_pci_save_initial_config(struct pci_dev *pdev, ahci_save_initial_config(&pdev->dev, hpriv); }
+static int ahci_pci_reset_controller(struct ata_host *host) +{ + struct pci_dev *pdev = to_pci_dev(host->dev); + struct ahci_host_priv *hpriv = host->private_data; + int rc; + + rc = ahci_reset_controller(host); + if (rc) + return rc; + + /* + * If platform firmware failed to enable ports, try to enable + * them here. + */ + ahci_intel_pcs_quirk(pdev, hpriv); + + return 0; +} + static void ahci_pci_init_controller(struct ata_host *host) { struct ahci_host_priv *hpriv = host->private_data; @@ -869,7 +889,7 @@ static int ahci_pci_device_runtime_resume(struct device *dev) struct ata_host *host = pci_get_drvdata(pdev); int rc;
- rc = ahci_reset_controller(host); + rc = ahci_pci_reset_controller(host); if (rc) return rc; ahci_pci_init_controller(host); @@ -904,7 +924,7 @@ static int ahci_pci_device_resume(struct device *dev) ahci_mcp89_apple_enable(pdev);
if (pdev->dev.power.power_state.event == PM_EVENT_SUSPEND) { - rc = ahci_reset_controller(host); + rc = ahci_pci_reset_controller(host); if (rc) return rc;
@@ -1789,12 +1809,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) /* save initial config */ ahci_pci_save_initial_config(pdev, hpriv);
- /* - * If platform firmware failed to enable ports, try to enable - * them here. - */ - ahci_intel_pcs_quirk(pdev, hpriv); - /* prepare host */ if (hpriv->cap & HOST_CAP_NCQ) { pi.flags |= ATA_FLAG_NCQ; @@ -1904,7 +1918,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (rc) return rc;
- rc = ahci_reset_controller(host); + rc = ahci_pci_reset_controller(host); if (rc) return rc;
From: Christoph Hellwig hch@lst.de
[ Upstream commit 685e6311637e46f3212439ce2789f8a300e5050f ]
3 << 16 does not generate the correct mask for bits 16, 17 and 18. Use the GENMASK macro to generate the correct mask instead.
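A quick standalone check of the two mask values (the GENMASK-style macro below is a local stand-in mirroring the kernel macro's semantics for 32-bit masks):

#include <stdio.h>

#define MY_GENMASK(h, l)  ((~0u >> (31 - (h))) & (~0u << (l)))

int main(void)
{
	printf("3 << 16         = 0x%05x\n", 3u << 16);           /* 0x30000: bits 16-17 only */
	printf("GENMASK(18, 16) = 0x%05x\n", MY_GENMASK(18, 16)); /* 0x70000: bits 16-18 */
	return 0;
}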
Fixes: 84fef62d135b ("nvme: check admin passthru command effects") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Kanchan Joshi joshi.k@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/nvme.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h index 039f59ee8f43..de235916c31c 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -7,6 +7,7 @@ #ifndef _LINUX_NVME_H #define _LINUX_NVME_H
+#include <linux/bits.h> #include <linux/types.h> #include <linux/uuid.h>
@@ -539,7 +540,7 @@ enum { NVME_CMD_EFFECTS_NCC = 1 << 2, NVME_CMD_EFFECTS_NIC = 1 << 3, NVME_CMD_EFFECTS_CCC = 1 << 4, - NVME_CMD_EFFECTS_CSE_MASK = 3 << 16, + NVME_CMD_EFFECTS_CSE_MASK = GENMASK(18, 16), NVME_CMD_EFFECTS_UUID_SEL = 1 << 19, };
From: Christoph Hellwig hch@lst.de
[ Upstream commit 2a459f6933e1c459bffb7cc73fd6c900edc714bd ]
Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows admin commands to be executed asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing Write commands to be executed asynchronously once command effects on I/O commands are taken into account.
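A small sketch of the resulting decision, with assumed bit positions for CSUPP and LBCC (bits 0 and 1 of the effects word, as I read the spec); it is not the nvmet code, just the masking logic in isolation:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define CMD_EFFECTS_CSUPP  (1u << 0)   /* assumed: Command Supported */
#define CMD_EFFECTS_LBCC   (1u << 1)   /* assumed: Logical Block Content Change */

static bool needs_sync_execution(uint32_t effects, bool use_workqueue)
{
	/* Strip the two harmless bits; anything left forces the workqueue path. */
	return use_workqueue ||
	       (effects & ~(CMD_EFFECTS_CSUPP | CMD_EFFECTS_LBCC)) != 0;
}

int main(void)
{
	/* A plain supported Write (CSUPP|LBCC) can now run asynchronously. */
	printf("write-like command:  sync=%d\n",
	       needs_sync_execution(CMD_EFFECTS_CSUPP | CMD_EFFECTS_LBCC, false));
	/* A command with extra effects set stays on the synchronous path. */
	printf("format-like command: sync=%d\n",
	       needs_sync_execution(CMD_EFFECTS_CSUPP | (1u << 3), false));
	return 0;
}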
Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Kanchan Joshi joshi.k@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/passthru.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index 6220e1dd961a..9b5929754195 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -271,14 +271,13 @@ static void nvmet_passthru_execute_cmd(struct nvmet_req *req) }
/* - * If there are effects for the command we are about to execute, or - * an end_req function we need to use nvme_execute_passthru_rq() - * synchronously in a work item seeing the end_req function and - * nvme_passthru_end() can't be called in the request done callback - * which is typically in interrupt context. + * If a command needs post-execution fixups, or there are any + * non-trivial effects, make sure to execute the command synchronously + * in a workqueue so that nvme_passthru_end gets called. */ effects = nvme_command_effects(ctrl, ns, req->cmd->common.opcode); - if (req->p.use_workqueue || effects) { + if (req->p.use_workqueue || + (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))) { INIT_WORK(&req->p.work, nvmet_passthru_execute_cmd_work); req->p.rq = rq; queue_work(nvmet_wq, &req->p.work);
From: edward lo edward.lo@ambergroup.io
[ Upstream commit 0b66046266690454dc04e6307bcff4a5605b42a1 ]
When the NTFS BOOT record_size field is < 0, it represents a shift value. However, there is no sanity check on the shift result, and the sbi->record_bits calculation through blksize_bits() assumes the size is always > 256, which could lead to a NULL pointer dereference while mounting a malformed NTFS image.
[ 318.675159] BUG: kernel NULL pointer dereference, address: 0000000000000158 [ 318.675682] #PF: supervisor read access in kernel mode [ 318.675869] #PF: error_code(0x0000) - not-present page [ 318.676246] PGD 0 P4D 0 [ 318.676502] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 318.676934] CPU: 0 PID: 259 Comm: mount Not tainted 5.19.0 #5 [ 318.677289] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 318.678136] RIP: 0010:ni_find_attr+0x2d/0x1c0 [ 318.678656] Code: 89 ca 4d 89 c7 41 56 41 55 41 54 41 89 cc 55 48 89 fd 53 48 89 d3 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 180 [ 318.679848] RSP: 0018:ffffa6c8c0297bd8 EFLAGS: 00000246 [ 318.680104] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080 [ 318.680790] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 318.681679] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 318.682577] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000080 [ 318.683015] R13: ffff8d5582e68400 R14: 0000000000000100 R15: 0000000000000000 [ 318.683618] FS: 00007fd9e1c81e40(0000) GS:ffff8d55fdc00000(0000) knlGS:0000000000000000 [ 318.684280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 318.684651] CR2: 0000000000000158 CR3: 0000000002e1a000 CR4: 00000000000006f0 [ 318.685623] Call Trace: [ 318.686607] <TASK> [ 318.686872] ? ntfs_alloc_inode+0x1a/0x60 [ 318.687235] attr_load_runs_vcn+0x2b/0xa0 [ 318.687468] mi_read+0xbb/0x250 [ 318.687576] ntfs_iget5+0x114/0xd90 [ 318.687750] ntfs_fill_super+0x588/0x11b0 [ 318.687953] ? put_ntfs+0x130/0x130 [ 318.688065] ? snprintf+0x49/0x70 [ 318.688164] ? put_ntfs+0x130/0x130 [ 318.688256] get_tree_bdev+0x16a/0x260 [ 318.688407] vfs_get_tree+0x20/0xb0 [ 318.688519] path_mount+0x2dc/0x9b0 [ 318.688877] do_mount+0x74/0x90 [ 318.689142] __x64_sys_mount+0x89/0xd0 [ 318.689636] do_syscall_64+0x3b/0x90 [ 318.689998] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 318.690318] RIP: 0033:0x7fd9e133c48a [ 318.690687] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 318.691357] RSP: 002b:00007ffd374406c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 318.691632] RAX: ffffffffffffffda RBX: 0000564d0b051080 RCX: 00007fd9e133c48a [ 318.691920] RDX: 0000564d0b051280 RSI: 0000564d0b051300 RDI: 0000564d0b0596a0 [ 318.692123] RBP: 0000000000000000 R08: 0000564d0b0512a0 R09: 0000000000000020 [ 318.692349] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564d0b0596a0 [ 318.692673] R13: 0000564d0b051280 R14: 0000000000000000 R15: 00000000ffffffff [ 318.693007] </TASK> [ 318.693271] Modules linked in: [ 318.693614] CR2: 0000000000000158 [ 318.694446] ---[ end trace 0000000000000000 ]--- [ 318.694779] RIP: 0010:ni_find_attr+0x2d/0x1c0 [ 318.694952] Code: 89 ca 4d 89 c7 41 56 41 55 41 54 41 89 cc 55 48 89 fd 53 48 89 d3 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 180 [ 318.696042] RSP: 0018:ffffa6c8c0297bd8 EFLAGS: 00000246 [ 318.696531] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080 [ 318.698114] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 318.699286] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 318.699795] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000080 [ 318.700236] R13: ffff8d5582e68400 R14: 0000000000000100 R15: 0000000000000000 [ 318.700973] FS: 00007fd9e1c81e40(0000) GS:ffff8d55fdc00000(0000) knlGS:0000000000000000 [ 318.701688] CS: 0010 DS: 0000 ES: 
0000 CR0: 0000000080050033 [ 318.702190] CR2: 0000000000000158 CR3: 0000000002e1a000 CR4: 00000000000006f0 [ 318.726510] mount (259) used greatest stack depth: 13320 bytes left
This patch adds a sanity check.
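A minimal sketch of the added range check, with placeholder limits (the SECTOR_SIZE and MAXIMUM_BYTES_PER_MFT values below are assumed for the example, not taken from fs/ntfs3):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define SECTOR_SIZE            512u
#define MAXIMUM_BYTES_PER_MFT  (4u * 1024 * 1024)   /* assumed upper bound */

static bool record_size_valid(uint32_t record_size)
{
	/* Reject anything outside the sane range before deriving record_bits. */
	return record_size >= SECTOR_SIZE && record_size <= MAXIMUM_BYTES_PER_MFT;
}

int main(void)
{
	printf("1024 -> %s\n", record_size_valid(1024) ? "ok" : "reject");
	printf("16   -> %s\n", record_size_valid(16) ? "ok" : "reject");   /* malformed image */
	return 0;
}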
Signed-off-by: edward lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index 39b09f32f4db..c321f621464b 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -789,7 +789,7 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size, : (u32)boot->record_size << sbi->cluster_bits;
- if (record_size > MAXIMUM_BYTES_PER_MFT) + if (record_size > MAXIMUM_BYTES_PER_MFT || record_size < SECTOR_SIZE) goto out;
sbi->record_bits = blksize_bits(record_size);
From: edward lo edward.lo@ambergroup.io
[ Upstream commit e19c6277652efba203af4ecd8eed4bd30a0054c9 ]
The offset addition could overflow and pass the used-size check given an attribute with a very large size (e.g., 0xffffff7f) while parsing MFT attributes. This could lead to out-of-bounds memory reads/writes if we try to access the next attribute derived by Add2Ptr(attr, asize).
[ 32.963847] BUG: unable to handle page fault for address: ffff956a83c76067 [ 32.964301] #PF: supervisor read access in kernel mode [ 32.964526] #PF: error_code(0x0000) - not-present page [ 32.964893] PGD 4dc01067 P4D 4dc01067 PUD 0 [ 32.965316] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 32.965727] CPU: 0 PID: 243 Comm: mount Not tainted 5.19.0+ #6 [ 32.966050] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 32.966628] RIP: 0010:mi_enum_attr+0x44/0x110 [ 32.967239] Code: 89 f0 48 29 c8 48 89 c1 39 c7 0f 86 94 00 00 00 8b 56 04 83 fa 17 0f 86 88 00 00 00 89 d0 01 ca 48 01 f0 8d 4a 08 39 f9a [ 32.968101] RSP: 0018:ffffba15c06a7c38 EFLAGS: 00000283 [ 32.968364] RAX: ffff956a83c76067 RBX: ffff956983c76050 RCX: 000000000000006f [ 32.968651] RDX: 0000000000000067 RSI: ffff956983c760e8 RDI: 00000000000001c8 [ 32.968963] RBP: ffffba15c06a7c38 R08: 0000000000000064 R09: 00000000ffffff7f [ 32.969249] R10: 0000000000000007 R11: ffff956983c760e8 R12: ffff95698225e000 [ 32.969870] R13: 0000000000000000 R14: ffffba15c06a7cd8 R15: ffff95698225e170 [ 32.970655] FS: 00007fdab8189e40(0000) GS:ffff9569fdc00000(0000) knlGS:0000000000000000 [ 32.971098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.971378] CR2: ffff956a83c76067 CR3: 0000000002c58000 CR4: 00000000000006f0 [ 32.972098] Call Trace: [ 32.972842] <TASK> [ 32.973341] ni_enum_attr_ex+0xda/0xf0 [ 32.974087] ntfs_iget5+0x1db/0xde0 [ 32.974386] ? slab_post_alloc_hook+0x53/0x270 [ 32.974778] ? ntfs_fill_super+0x4c7/0x12a0 [ 32.975115] ntfs_fill_super+0x5d6/0x12a0 [ 32.975336] get_tree_bdev+0x175/0x270 [ 32.975709] ? put_ntfs+0x150/0x150 [ 32.975956] ntfs_fs_get_tree+0x15/0x20 [ 32.976191] vfs_get_tree+0x2a/0xc0 [ 32.976374] ? capable+0x19/0x20 [ 32.976572] path_mount+0x484/0xaa0 [ 32.977025] ? 
putname+0x57/0x70 [ 32.977380] do_mount+0x80/0xa0 [ 32.977555] __x64_sys_mount+0x8b/0xe0 [ 32.978105] do_syscall_64+0x3b/0x90 [ 32.978830] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 32.979311] RIP: 0033:0x7fdab72e948a [ 32.980015] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 32.981251] RSP: 002b:00007ffd15b87588 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 32.981832] RAX: ffffffffffffffda RBX: 0000557de0aaf060 RCX: 00007fdab72e948a [ 32.982234] RDX: 0000557de0aaf260 RSI: 0000557de0aaf2e0 RDI: 0000557de0ab7ce0 [ 32.982714] RBP: 0000000000000000 R08: 0000557de0aaf280 R09: 0000000000000020 [ 32.983046] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000557de0ab7ce0 [ 32.983494] R13: 0000557de0aaf260 R14: 0000000000000000 R15: 00000000ffffffff [ 32.984094] </TASK> [ 32.984352] Modules linked in: [ 32.984753] CR2: ffff956a83c76067 [ 32.985911] ---[ end trace 0000000000000000 ]--- [ 32.986555] RIP: 0010:mi_enum_attr+0x44/0x110 [ 32.987217] Code: 89 f0 48 29 c8 48 89 c1 39 c7 0f 86 94 00 00 00 8b 56 04 83 fa 17 0f 86 88 00 00 00 89 d0 01 ca 48 01 f0 8d 4a 08 39 f9a [ 32.988232] RSP: 0018:ffffba15c06a7c38 EFLAGS: 00000283 [ 32.988532] RAX: ffff956a83c76067 RBX: ffff956983c76050 RCX: 000000000000006f [ 32.988916] RDX: 0000000000000067 RSI: ffff956983c760e8 RDI: 00000000000001c8 [ 32.989356] RBP: ffffba15c06a7c38 R08: 0000000000000064 R09: 00000000ffffff7f [ 32.989994] R10: 0000000000000007 R11: ffff956983c760e8 R12: ffff95698225e000 [ 32.990415] R13: 0000000000000000 R14: ffffba15c06a7cd8 R15: ffff95698225e170 [ 32.991011] FS: 00007fdab8189e40(0000) GS:ffff9569fdc00000(0000) knlGS:0000000000000000 [ 32.991524] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.991936] CR2: ffff956a83c76067 CR3: 0000000002c58000 CR4: 00000000000006f0
This patch adds an overflow check.
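The guard in isolation, as plain C (not the ntfs3 code): for unsigned offsets, "off + asize < off" detects the wrap-around, and __builtin_add_overflow() is shown alongside as an equivalent idiom.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool offset_overflows(uint32_t off, uint32_t asize)
{
	return off + asize < off;          /* sum wrapped past UINT32_MAX */
}

int main(void)
{
	uint32_t off = 0x200, huge = 0xffffff7f;   /* attribute size from the report */
	uint32_t sum;

	printf("wraps:           %d\n", offset_overflows(off, huge));              /* 1 */
	printf("builtin agrees:  %d\n", __builtin_add_overflow(off, huge, &sum));  /* 1 */
	return 0;
}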
Signed-off-by: edward lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/record.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c index 861e35791506..30751fd618df 100644 --- a/fs/ntfs3/record.c +++ b/fs/ntfs3/record.c @@ -220,6 +220,11 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr) return NULL; }
+ if (off + asize < off) { + /* overflow check */ + return NULL; + } + attr = Add2Ptr(attr, asize); off += asize; }
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 6db620863f8528ed9a9aa5ad323b26554a17881d ]
This adds sanity checks for the data run offset. We should make sure the data run offset is valid before trying to unpack the runs, otherwise we may encounter use-after-free or other unexpected memory access behavior.
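Generic form of the check, outside the filesystem code: before pointer arithmetic such as Add2Ptr(attr, run_off), the offset must still lie within the attribute so the remaining unpack length cannot go negative or point past the record.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool run_off_valid(uint32_t run_off, uint32_t attr_size)
{
	return run_off <= attr_size;   /* offset must stay inside the attribute */
}

int main(void)
{
	printf("run_off 0x40 in 0x100-byte attr:  %s\n",
	       run_off_valid(0x40, 0x100) ? "ok" : "reject");
	printf("run_off 0x200 in 0x100-byte attr: %s\n",
	       run_off_valid(0x200, 0x100) ? "ok" : "reject");   /* malformed */
	return 0;
}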
[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570 [ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240 [ 82.941670] [ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15 [ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 82.943720] Call Trace: [ 82.944204] <TASK> [ 82.944471] dump_stack_lvl+0x49/0x63 [ 82.944908] print_report.cold+0xf5/0x67b [ 82.945141] ? __wait_on_bit+0x106/0x120 [ 82.945750] ? run_unpack+0x2e3/0x570 [ 82.946626] kasan_report+0xa7/0x120 [ 82.947046] ? run_unpack+0x2e3/0x570 [ 82.947280] __asan_load1+0x51/0x60 [ 82.947483] run_unpack+0x2e3/0x570 [ 82.947709] ? memcpy+0x4e/0x70 [ 82.947927] ? run_pack+0x7a0/0x7a0 [ 82.948158] run_unpack_ex+0xad/0x3f0 [ 82.948399] ? mi_enum_attr+0x14a/0x200 [ 82.948717] ? run_unpack+0x570/0x570 [ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0 [ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0 [ 82.949611] ? mi_read+0x262/0x2c0 [ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180 [ 82.950249] ntfs_iget5+0x632/0x1870 [ 82.950621] ? ntfs_get_block_bmap+0x70/0x70 [ 82.951192] ? evict+0x223/0x280 [ 82.951525] ? iput.part.0+0x286/0x320 [ 82.951969] ntfs_fill_super+0x1321/0x1e20 [ 82.952436] ? put_ntfs+0x1d0/0x1d0 [ 82.952822] ? vsprintf+0x20/0x20 [ 82.953188] ? mutex_unlock+0x81/0xd0 [ 82.953379] ? set_blocksize+0x95/0x150 [ 82.954001] get_tree_bdev+0x232/0x370 [ 82.954438] ? put_ntfs+0x1d0/0x1d0 [ 82.954700] ntfs_fs_get_tree+0x15/0x20 [ 82.955049] vfs_get_tree+0x4c/0x130 [ 82.955292] path_mount+0x645/0xfd0 [ 82.955615] ? putname+0x80/0xa0 [ 82.955955] ? finish_automount+0x2e0/0x2e0 [ 82.956310] ? kmem_cache_free+0x110/0x390 [ 82.956723] ? putname+0x80/0xa0 [ 82.957023] do_mount+0xd6/0xf0 [ 82.957411] ? path_mount+0xfd0/0xfd0 [ 82.957638] ? __kasan_check_write+0x14/0x20 [ 82.957948] __x64_sys_mount+0xca/0x110 [ 82.958310] do_syscall_64+0x3b/0x90 [ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 82.959341] RIP: 0033:0x7fd0d1ce948a [ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a [ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0 [ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020 [ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0 [ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/attrib.c | 13 +++++++++++++ fs/ntfs3/attrlist.c | 5 +++++ fs/ntfs3/frecord.c | 14 ++++++++++++++ fs/ntfs3/fslog.c | 9 +++++++++ fs/ntfs3/inode.c | 5 +++++ 5 files changed, 46 insertions(+)
diff --git a/fs/ntfs3/attrib.c b/fs/ntfs3/attrib.c index e8c00dda42ad..43e85c493c05 100644 --- a/fs/ntfs3/attrib.c +++ b/fs/ntfs3/attrib.c @@ -101,6 +101,10 @@ int attr_load_runs(struct ATTRIB *attr, struct ntfs_inode *ni,
asize = le32_to_cpu(attr->size); run_off = le16_to_cpu(attr->nres.run_off); + + if (run_off > asize) + return -EINVAL; + err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, vcn ? *vcn : svcn, Add2Ptr(attr, run_off), asize - run_off); @@ -1157,6 +1161,10 @@ int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type, }
ro = le16_to_cpu(attr->nres.run_off); + + if (ro > le32_to_cpu(attr->size)) + return -EINVAL; + err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, ro), le32_to_cpu(attr->size) - ro); if (err < 0) @@ -1832,6 +1840,11 @@ int attr_collapse_range(struct ntfs_inode *ni, u64 vbo, u64 bytes) u16 le_sz; u16 roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > le32_to_cpu(attr->size)) { + err = -EINVAL; + goto out; + } + run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn1 - 1, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); diff --git a/fs/ntfs3/attrlist.c b/fs/ntfs3/attrlist.c index bad6d8a849a2..c0c6bcbc8c05 100644 --- a/fs/ntfs3/attrlist.c +++ b/fs/ntfs3/attrlist.c @@ -68,6 +68,11 @@ int ntfs_load_attr_list(struct ntfs_inode *ni, struct ATTRIB *attr)
run_init(&ni->attr_list.run);
+ if (run_off > le32_to_cpu(attr->size)) { + err = -EINVAL; + goto out; + } + err = run_unpack_ex(&ni->attr_list.run, ni->mi.sbi, ni->mi.rno, 0, le64_to_cpu(attr->nres.evcn), 0, Add2Ptr(attr, run_off), diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c index 18842998c8fa..cdeb0b51f0ba 100644 --- a/fs/ntfs3/frecord.c +++ b/fs/ntfs3/frecord.c @@ -567,6 +567,12 @@ static int ni_repack(struct ntfs_inode *ni) }
roff = le16_to_cpu(attr->nres.run_off); + + if (roff > le32_to_cpu(attr->size)) { + err = -EINVAL; + break; + } + err = run_unpack(&run, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); @@ -1541,6 +1547,9 @@ int ni_delete_all(struct ntfs_inode *ni) asize = le32_to_cpu(attr->size); roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) + return -EINVAL; + /* run==1 means unpack and deallocate. */ run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), asize - roff); @@ -2242,6 +2251,11 @@ int ni_decompress_file(struct ntfs_inode *ni) asize = le32_to_cpu(attr->size); roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) { + err = -EINVAL; + goto out; + } + /*run==1 Means unpack and deallocate. */ run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), asize - roff); diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c index 614513460b8e..bcdddcd7bc79 100644 --- a/fs/ntfs3/fslog.c +++ b/fs/ntfs3/fslog.c @@ -2727,6 +2727,9 @@ static inline bool check_attr(const struct MFT_REC *rec, return false; }
+ if (run_off > asize) + return false; + if (run_unpack(NULL, sbi, 0, svcn, evcn, svcn, Add2Ptr(attr, run_off), asize - run_off) < 0) { return false; @@ -4769,6 +4772,12 @@ int log_replay(struct ntfs_inode *ni, bool *initialized) u16 roff = le16_to_cpu(attr->nres.run_off); CLST svcn = le64_to_cpu(attr->nres.svcn);
+ if (roff > t32) { + kfree(oa->attr); + oa->attr = NULL; + goto fake_attr; + } + err = run_unpack(&oa->run0, sbi, inode->i_ino, svcn, le64_to_cpu(attr->nres.evcn), svcn, Add2Ptr(attr, roff), t32 - roff); diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 64b4a3c29878..83d4c9f42d9c 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -364,6 +364,11 @@ static struct inode *ntfs_read_mft(struct inode *inode, attr_unpack_run: roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) { + err = -EINVAL; + goto out; + } + t64 = le64_to_cpu(attr->nres.svcn); err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff);
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 2681631c29739509eec59cc0b34e977bb04c6cf1 ]
Some metadata files are handled before the MFT. This adds a NULL pointer check for some corner cases that could lead to a NULL pointer dereference while reading these metadata files from a malformed NTFS image.
[ 240.190827] BUG: kernel NULL pointer dereference, address: 0000000000000158 [ 240.191583] #PF: supervisor read access in kernel mode [ 240.191956] #PF: error_code(0x0000) - not-present page [ 240.192391] PGD 0 P4D 0 [ 240.192897] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 240.193805] CPU: 0 PID: 242 Comm: mount Tainted: G B 5.19.0+ #17 [ 240.194477] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 240.195152] RIP: 0010:ni_find_attr+0xae/0x300 [ 240.195679] Code: c8 48 c7 45 88 c0 4e 5e 86 c7 00 f1 f1 f1 f1 c7 40 04 00 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 e2 d9f [ 240.196642] RSP: 0018:ffff88800812f690 EFLAGS: 00000286 [ 240.197019] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff85ef037a [ 240.197523] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff88e95f60 [ 240.197877] RBP: ffff88800812f738 R08: 0000000000000001 R09: fffffbfff11d2bed [ 240.198292] R10: ffffffff88e95f67 R11: fffffbfff11d2bec R12: 0000000000000000 [ 240.198647] R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000 [ 240.199410] FS: 00007f233c33be40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 240.199895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.200314] CR2: 0000000000000158 CR3: 0000000004d32000 CR4: 00000000000006f0 [ 240.200839] Call Trace: [ 240.201104] <TASK> [ 240.201502] ? ni_load_mi+0x80/0x80 [ 240.202297] ? ___slab_alloc+0x465/0x830 [ 240.202614] attr_load_runs_vcn+0x8c/0x1a0 [ 240.202886] ? __kasan_slab_alloc+0x32/0x90 [ 240.203157] ? attr_data_write_resident+0x250/0x250 [ 240.203543] mi_read+0x133/0x2c0 [ 240.203785] mi_get+0x70/0x140 [ 240.204012] ni_load_mi_ex+0xfa/0x190 [ 240.204346] ? ni_std5+0x90/0x90 [ 240.204588] ? __kasan_kmalloc+0x88/0xb0 [ 240.204859] ni_enum_attr_ex+0xf1/0x1c0 [ 240.205107] ? ni_fname_type.part.0+0xd0/0xd0 [ 240.205600] ? ntfs_load_attr_list+0xbe/0x300 [ 240.205864] ? ntfs_cmp_names_cpu+0x125/0x180 [ 240.206157] ntfs_iget5+0x56c/0x1870 [ 240.206510] ? ntfs_get_block_bmap+0x70/0x70 [ 240.206776] ? __kasan_kmalloc+0x88/0xb0 [ 240.207030] ? set_blocksize+0x95/0x150 [ 240.207545] ntfs_fill_super+0xb8f/0x1e20 [ 240.207839] ? put_ntfs+0x1d0/0x1d0 [ 240.208069] ? vsprintf+0x20/0x20 [ 240.208467] ? mutex_unlock+0x81/0xd0 [ 240.208846] ? set_blocksize+0x95/0x150 [ 240.209221] get_tree_bdev+0x232/0x370 [ 240.209804] ? put_ntfs+0x1d0/0x1d0 [ 240.210519] ntfs_fs_get_tree+0x15/0x20 [ 240.210991] vfs_get_tree+0x4c/0x130 [ 240.211455] path_mount+0x645/0xfd0 [ 240.211806] ? putname+0x80/0xa0 [ 240.212112] ? finish_automount+0x2e0/0x2e0 [ 240.212559] ? kmem_cache_free+0x110/0x390 [ 240.212906] ? putname+0x80/0xa0 [ 240.213329] do_mount+0xd6/0xf0 [ 240.213829] ? path_mount+0xfd0/0xfd0 [ 240.214246] ? 
__kasan_check_write+0x14/0x20 [ 240.214774] __x64_sys_mount+0xca/0x110 [ 240.215080] do_syscall_64+0x3b/0x90 [ 240.215442] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 240.215811] RIP: 0033:0x7f233b4e948a [ 240.216104] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 240.217615] RSP: 002b:00007fff02211ec8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 240.218718] RAX: ffffffffffffffda RBX: 0000561cdc35b060 RCX: 00007f233b4e948a [ 240.219556] RDX: 0000561cdc35b260 RSI: 0000561cdc35b2e0 RDI: 0000561cdc363af0 [ 240.219975] RBP: 0000000000000000 R08: 0000561cdc35b280 R09: 0000000000000020 [ 240.220403] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000561cdc363af0 [ 240.220803] R13: 0000561cdc35b260 R14: 0000000000000000 R15: 00000000ffffffff [ 240.221256] </TASK> [ 240.221567] Modules linked in: [ 240.222028] CR2: 0000000000000158 [ 240.223291] ---[ end trace 0000000000000000 ]--- [ 240.223669] RIP: 0010:ni_find_attr+0xae/0x300 [ 240.224058] Code: c8 48 c7 45 88 c0 4e 5e 86 c7 00 f1 f1 f1 f1 c7 40 04 00 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 e2 d9f [ 240.225033] RSP: 0018:ffff88800812f690 EFLAGS: 00000286 [ 240.225968] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff85ef037a [ 240.226624] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff88e95f60 [ 240.227307] RBP: ffff88800812f738 R08: 0000000000000001 R09: fffffbfff11d2bed [ 240.227816] R10: ffffffff88e95f67 R11: fffffbfff11d2bec R12: 0000000000000000 [ 240.228330] R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000 [ 240.228729] FS: 00007f233c33be40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 240.229281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.230298] CR2: 0000000000000158 CR3: 0000000004d32000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/attrib.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/attrib.c b/fs/ntfs3/attrib.c index 43e85c493c05..42af83bcaf13 100644 --- a/fs/ntfs3/attrib.c +++ b/fs/ntfs3/attrib.c @@ -1146,6 +1146,11 @@ int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type, CLST svcn, evcn; u16 ro;
+ if (!ni) { + /* Is record corrupted? */ + return -ENOENT; + } + attr = ni_find_attr(ni, NULL, NULL, type, name, name_len, &vcn, NULL); if (!attr) { /* Is record corrupted? */
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 51e76a232f8c037f1d9e9922edc25b003d5f3414 ]
syzbot reported kmemleak as below:
BUG: memory leak unreferenced object 0xffff8880122f1540 (size 32): comm "a.out", pid 6664, jiffies 4294939771 (age 25.500s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 ed ff ed ff 00 00 00 00 ................ backtrace: [<ffffffff81b16052>] ntfs_init_fs_context+0x22/0x1c0 [<ffffffff8164aaa7>] alloc_fs_context+0x217/0x430 [<ffffffff81626dd4>] path_mount+0x704/0x1080 [<ffffffff81627e7c>] __x64_sys_mount+0x18c/0x1d0 [<ffffffff84593e14>] do_syscall_64+0x34/0xb0 [<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
This patch fixes this issue by freeing the mount options on the error path of ntfs_fill_super().
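The general error-path pattern the fix follows, as a standalone C sketch with hypothetical names (not the ntfs3 code): whatever was allocated while setting up must be released on every failure exit, or it leaks exactly as kmemleak reported.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct mount_opts {
	char *fs_name;
};

static int fill_super(const char *name)
{
	struct mount_opts *opts = calloc(1, sizeof(*opts));
	int err = 0;

	if (!opts)
		return -1;
	opts->fs_name = strdup(name);

	if (!opts->fs_name || strcmp(name, "good") != 0) {
		err = -1;
		goto out_free;          /* failure after the allocation... */
	}

	printf("mounted %s\n", opts->fs_name);

out_free:                               /* ...must still release it (the toy also
					 * frees on success, since nothing keeps it) */
	free(opts->fs_name);
	free(opts);
	return err;
}

int main(void)
{
	fill_super("bad");    /* fails, but nothing leaks */
	fill_super("good");
	return 0;
}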
Reported-by: syzbot+9d67170b20e8f94351c8@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida syoshida@redhat.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index c321f621464b..4ff0d2c9507c 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1276,6 +1276,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc) * Free resources here. * ntfs_fs_free will be called with fc->s_fs_info = NULL */ + put_mount_options(sbi->options); put_ntfs(sbi); sb->s_fs_info = NULL;
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit c1ca8ef0262b25493631ecbd9cb8c9893e1481a1 ]
This adds a sanity check for the i_op pointer of the inode which is returned after reading the Root directory MFT record. We should check that i_op is valid before trying to create the root dentry, otherwise we may encounter a NULL pointer dereference while mounting an image with a malformed Root directory MFT record.
[ 114.484325] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 114.484811] #PF: supervisor read access in kernel mode [ 114.485084] #PF: error_code(0x0000) - not-present page [ 114.485606] PGD 0 P4D 0 [ 114.485975] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 114.486570] CPU: 0 PID: 237 Comm: mount Tainted: G B 6.0.0-rc4 #28 [ 114.486977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 114.488169] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.488816] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.490326] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.490695] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.490986] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.491364] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.491675] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.491954] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.492397] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.492797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.493150] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0 [ 114.493671] Call Trace: [ 114.493890] <TASK> [ 114.494075] __d_instantiate+0x24/0x1c0 [ 114.494505] d_instantiate.part.0+0x35/0x50 [ 114.494754] d_make_root+0x53/0x80 [ 114.494998] ntfs_fill_super+0x1232/0x1b50 [ 114.495260] ? put_ntfs+0x1d0/0x1d0 [ 114.495499] ? vsprintf+0x20/0x20 [ 114.495723] ? set_blocksize+0x95/0x150 [ 114.495964] get_tree_bdev+0x232/0x370 [ 114.496272] ? put_ntfs+0x1d0/0x1d0 [ 114.496502] ntfs_fs_get_tree+0x15/0x20 [ 114.496859] vfs_get_tree+0x4c/0x130 [ 114.497099] path_mount+0x654/0xfe0 [ 114.497507] ? putname+0x80/0xa0 [ 114.497933] ? finish_automount+0x2e0/0x2e0 [ 114.498362] ? putname+0x80/0xa0 [ 114.498571] ? kmem_cache_free+0x1c4/0x440 [ 114.498819] ? putname+0x80/0xa0 [ 114.499069] do_mount+0xd6/0xf0 [ 114.499343] ? path_mount+0xfe0/0xfe0 [ 114.499683] ? 
__kasan_check_write+0x14/0x20 [ 114.500133] __x64_sys_mount+0xca/0x110 [ 114.500592] do_syscall_64+0x3b/0x90 [ 114.500930] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 114.501294] RIP: 0033:0x7fdc898e948a [ 114.501542] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 114.502716] RSP: 002b:00007ffd793e58f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 114.503175] RAX: ffffffffffffffda RBX: 0000564b2228f060 RCX: 00007fdc898e948a [ 114.503588] RDX: 0000564b2228f260 RSI: 0000564b2228f2e0 RDI: 0000564b22297ce0 [ 114.504925] RBP: 0000000000000000 R08: 0000564b2228f280 R09: 0000000000000020 [ 114.505484] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564b22297ce0 [ 114.505823] R13: 0000564b2228f260 R14: 0000000000000000 R15: 00000000ffffffff [ 114.506562] </TASK> [ 114.506887] Modules linked in: [ 114.507648] CR2: 0000000000000008 [ 114.508884] ---[ end trace 0000000000000000 ]--- [ 114.509675] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.510140] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.511762] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.512401] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.513103] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.513512] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.513831] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.514757] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.515411] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.515794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.516208] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index 4ff0d2c9507c..a18fb431abbe 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1255,9 +1255,9 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc) ref.low = cpu_to_le32(MFT_REC_ROOT); ref.seq = cpu_to_le16(MFT_REC_ROOT); inode = ntfs_iget5(sb, &ref, &NAME_ROOT); - if (IS_ERR(inode)) { + if (IS_ERR(inode) || !inode->i_op) { ntfs_err(sb, "Failed to load root."); - err = PTR_ERR(inode); + err = IS_ERR(inode) ? PTR_ERR(inode) : -EINVAL; goto out; }
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 4f1dc7d9756e66f3f876839ea174df2e656b7f79 ]
Although the attribute name length is checked before comparing it to some common names (e.g., $I30), the offset isn't. This adds a sanity check for the attribute name offset to guarantee its validity and prevent possible out-of-bounds memory accesses.
[ 191.720056] BUG: unable to handle page fault for address: ffffebde00000008 [ 191.721060] #PF: supervisor read access in kernel mode [ 191.721586] #PF: error_code(0x0000) - not-present page [ 191.722079] PGD 0 P4D 0 [ 191.722571] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 191.723179] CPU: 0 PID: 244 Comm: mount Not tainted 6.0.0-rc4 #28 [ 191.723749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 191.724832] RIP: 0010:kfree+0x56/0x3b0 [ 191.725870] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.727375] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.727897] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.728531] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.729183] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.729628] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.730158] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.730645] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.731328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.731667] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 [ 191.732568] Call Trace: [ 191.733231] <TASK> [ 191.733860] kvfree+0x2c/0x40 [ 191.734632] ni_clear+0x180/0x290 [ 191.735085] ntfs_evict_inode+0x45/0x70 [ 191.735495] evict+0x199/0x280 [ 191.735996] iput.part.0+0x286/0x320 [ 191.736438] iput+0x32/0x50 [ 191.736811] iget_failed+0x23/0x30 [ 191.737270] ntfs_iget5+0x337/0x1890 [ 191.737629] ? ntfs_clear_mft_tail+0x20/0x260 [ 191.738201] ? ntfs_get_block_bmap+0x70/0x70 [ 191.738482] ? ntfs_objid_init+0xf6/0x140 [ 191.738779] ? ntfs_reparse_init+0x140/0x140 [ 191.739266] ntfs_fill_super+0x121b/0x1b50 [ 191.739623] ? put_ntfs+0x1d0/0x1d0 [ 191.739984] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 191.740466] ? put_ntfs+0x1d0/0x1d0 [ 191.740787] ? sb_set_blocksize+0x6a/0x80 [ 191.741272] get_tree_bdev+0x232/0x370 [ 191.741829] ? put_ntfs+0x1d0/0x1d0 [ 191.742669] ntfs_fs_get_tree+0x15/0x20 [ 191.743132] vfs_get_tree+0x4c/0x130 [ 191.743457] path_mount+0x654/0xfe0 [ 191.743938] ? putname+0x80/0xa0 [ 191.744271] ? finish_automount+0x2e0/0x2e0 [ 191.744582] ? putname+0x80/0xa0 [ 191.745053] ? kmem_cache_free+0x1c4/0x440 [ 191.745403] ? putname+0x80/0xa0 [ 191.745616] do_mount+0xd6/0xf0 [ 191.745887] ? path_mount+0xfe0/0xfe0 [ 191.746287] ? 
__kasan_check_write+0x14/0x20 [ 191.746582] __x64_sys_mount+0xca/0x110 [ 191.746850] do_syscall_64+0x3b/0x90 [ 191.747122] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 191.747517] RIP: 0033:0x7f351fee948a [ 191.748332] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 191.749341] RSP: 002b:00007ffd51cf3af8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 191.749960] RAX: ffffffffffffffda RBX: 000055b903733060 RCX: 00007f351fee948a [ 191.750589] RDX: 000055b903733260 RSI: 000055b9037332e0 RDI: 000055b90373bce0 [ 191.751115] RBP: 0000000000000000 R08: 000055b903733280 R09: 0000000000000020 [ 191.751537] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b90373bce0 [ 191.751946] R13: 000055b903733260 R14: 0000000000000000 R15: 00000000ffffffff [ 191.752519] </TASK> [ 191.752782] Modules linked in: [ 191.753785] CR2: ffffebde00000008 [ 191.754937] ---[ end trace 0000000000000000 ]--- [ 191.755429] RIP: 0010:kfree+0x56/0x3b0 [ 191.755725] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.756744] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.757218] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.757580] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.758016] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.758570] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.758957] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.759317] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.759711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.760118] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/inode.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 83d4c9f42d9c..66afa3db753a 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -129,6 +129,9 @@ static struct inode *ntfs_read_mft(struct inode *inode, rsize = attr->non_res ? 0 : le32_to_cpu(attr->res.data_size); asize = le32_to_cpu(attr->size);
+ if (le16_to_cpu(attr->name_off) + attr->name_len > asize) + goto out; + switch (attr->type) { case ATTR_STD: if (attr->non_res ||
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 4d42ecda239cc13738d6fd84d098a32e67b368b9 ]
indx_read() is called when NTFS directory operations need more information from the index buffers. This adds a sanity check to make sure the returned index buffer length is legit; otherwise we may get out-of-bounds memory accesses.
[ 560.897595] BUG: KASAN: slab-out-of-bounds in hdr_find_e.isra.0+0x10c/0x320 [ 560.898321] Read of size 2 at addr ffff888009497238 by task exp/245 [ 560.898760] [ 560.899129] CPU: 0 PID: 245 Comm: exp Not tainted 6.0.0-rc6 #37 [ 560.899505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 560.900170] Call Trace: [ 560.900407] <TASK> [ 560.900732] dump_stack_lvl+0x49/0x63 [ 560.901108] print_report.cold+0xf5/0x689 [ 560.901395] ? hdr_find_e.isra.0+0x10c/0x320 [ 560.901716] kasan_report+0xa7/0x130 [ 560.901950] ? hdr_find_e.isra.0+0x10c/0x320 [ 560.902208] __asan_load2+0x68/0x90 [ 560.902427] hdr_find_e.isra.0+0x10c/0x320 [ 560.902846] ? cmp_uints+0xe0/0xe0 [ 560.903363] ? cmp_sdh+0x90/0x90 [ 560.903883] ? ntfs_bread_run+0x190/0x190 [ 560.904196] ? rwsem_down_read_slowpath+0x750/0x750 [ 560.904969] ? ntfs_fix_post_read+0xe0/0x130 [ 560.905259] ? __kasan_check_write+0x14/0x20 [ 560.905599] ? up_read+0x1a/0x90 [ 560.905853] ? indx_read+0x22c/0x380 [ 560.906096] indx_find+0x2ef/0x470 [ 560.906352] ? indx_find_buffer+0x2d0/0x2d0 [ 560.906692] ? __kasan_kmalloc+0x88/0xb0 [ 560.906977] dir_search_u+0x196/0x2f0 [ 560.907220] ? ntfs_nls_to_utf16+0x450/0x450 [ 560.907464] ? __kasan_check_write+0x14/0x20 [ 560.907747] ? mutex_lock+0x8f/0xe0 [ 560.907970] ? __mutex_lock_slowpath+0x20/0x20 [ 560.908214] ? kmem_cache_alloc+0x143/0x4b0 [ 560.908459] ntfs_lookup+0xe0/0x100 [ 560.908788] __lookup_slow+0x116/0x220 [ 560.909050] ? lookup_fast+0x1b0/0x1b0 [ 560.909309] ? lookup_fast+0x13f/0x1b0 [ 560.909601] walk_component+0x187/0x230 [ 560.909944] link_path_walk.part.0+0x3f0/0x660 [ 560.910285] ? handle_lookup_down+0x90/0x90 [ 560.910618] ? path_init+0x642/0x6e0 [ 560.911084] ? percpu_counter_add_batch+0x6e/0xf0 [ 560.912559] ? __alloc_file+0x114/0x170 [ 560.913008] path_openat+0x19c/0x1d10 [ 560.913419] ? getname_flags+0x73/0x2b0 [ 560.913815] ? kasan_save_stack+0x3a/0x50 [ 560.914125] ? kasan_save_stack+0x26/0x50 [ 560.914542] ? __kasan_slab_alloc+0x6d/0x90 [ 560.914924] ? kmem_cache_alloc+0x143/0x4b0 [ 560.915339] ? getname_flags+0x73/0x2b0 [ 560.915647] ? getname+0x12/0x20 [ 560.916114] ? __x64_sys_open+0x4c/0x60 [ 560.916460] ? path_lookupat.isra.0+0x230/0x230 [ 560.916867] ? __isolate_free_page+0x2e0/0x2e0 [ 560.917194] do_filp_open+0x15c/0x1f0 [ 560.917448] ? may_open_dev+0x60/0x60 [ 560.917696] ? expand_files+0xa4/0x3a0 [ 560.917923] ? __kasan_check_write+0x14/0x20 [ 560.918185] ? _raw_spin_lock+0x88/0xdb [ 560.918409] ? _raw_spin_lock_irqsave+0x100/0x100 [ 560.918783] ? _find_next_bit+0x4a/0x130 [ 560.919026] ? _raw_spin_unlock+0x19/0x40 [ 560.919276] ? alloc_fd+0x14b/0x2d0 [ 560.919635] do_sys_openat2+0x32a/0x4b0 [ 560.920035] ? file_open_root+0x230/0x230 [ 560.920336] ? __rcu_read_unlock+0x5b/0x280 [ 560.920813] do_sys_open+0x99/0xf0 [ 560.921208] ? filp_open+0x60/0x60 [ 560.921482] ? 
exit_to_user_mode_prepare+0x49/0x180 [ 560.921867] __x64_sys_open+0x4c/0x60 [ 560.922128] do_syscall_64+0x3b/0x90 [ 560.922369] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 560.923030] RIP: 0033:0x7f7dff2e4469 [ 560.923681] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088 [ 560.924451] RSP: 002b:00007ffd41a210b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000002 [ 560.925168] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7dff2e4469 [ 560.925655] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd41a211f0 [ 560.926085] RBP: 00007ffd41a252a0 R08: 00007f7dff60fba0 R09: 00007ffd41a25388 [ 560.926405] R10: 0000000000400b80 R11: 0000000000000206 R12: 00000000004004e0 [ 560.926867] R13: 00007ffd41a25380 R14: 0000000000000000 R15: 0000000000000000 [ 560.927241] </TASK> [ 560.927491] [ 560.927755] Allocated by task 245: [ 560.928409] kasan_save_stack+0x26/0x50 [ 560.929271] __kasan_kmalloc+0x88/0xb0 [ 560.929778] __kmalloc+0x192/0x320 [ 560.930023] indx_read+0x249/0x380 [ 560.930224] indx_find+0x2a2/0x470 [ 560.930695] dir_search_u+0x196/0x2f0 [ 560.930892] ntfs_lookup+0xe0/0x100 [ 560.931115] __lookup_slow+0x116/0x220 [ 560.931323] walk_component+0x187/0x230 [ 560.931570] link_path_walk.part.0+0x3f0/0x660 [ 560.931791] path_openat+0x19c/0x1d10 [ 560.932008] do_filp_open+0x15c/0x1f0 [ 560.932226] do_sys_openat2+0x32a/0x4b0 [ 560.932413] do_sys_open+0x99/0xf0 [ 560.932709] __x64_sys_open+0x4c/0x60 [ 560.933417] do_syscall_64+0x3b/0x90 [ 560.933776] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 560.934235] [ 560.934486] The buggy address belongs to the object at ffff888009497000 [ 560.934486] which belongs to the cache kmalloc-512 of size 512 [ 560.935239] The buggy address is located 56 bytes to the right of [ 560.935239] 512-byte region [ffff888009497000, ffff888009497200) [ 560.936153] [ 560.937326] The buggy address belongs to the physical page: [ 560.938228] page:0000000062a3dfae refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9496 [ 560.939616] head:0000000062a3dfae order:1 compound_mapcount:0 compound_pincount:0 [ 560.940219] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 560.942702] raw: 000fffffc0010200 ffffea0000164f80 dead000000000005 ffff888001041c80 [ 560.943932] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000 [ 560.944568] page dumped because: kasan: bad access detected [ 560.945735] [ 560.946112] Memory state around the buggy address: [ 560.946870] ffff888009497100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 560.947242] ffff888009497180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 560.947611] >ffff888009497200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 560.947915] ^ [ 560.948249] ffff888009497280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 560.948687] ffff888009497300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/index.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/ntfs3/index.c b/fs/ntfs3/index.c index 76ebea253fa2..99f8a57e9f7a 100644 --- a/fs/ntfs3/index.c +++ b/fs/ntfs3/index.c @@ -1017,6 +1017,12 @@ int indx_read(struct ntfs_index *indx, struct ntfs_inode *ni, CLST vbn, err = 0; }
+ /* check for index header length */ + if (offsetof(struct INDEX_BUFFER, ihdr) + ib->ihdr.used > bytes) { + err = -EINVAL; + goto out; + } + in->index = ib; *node = in;
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 54e45702b648b7c0000e90b3e9b890e367e16ea8 ]
Though we already have some sanity checks while enumerating attributes, resident attribute names aren't covered. This patch checks that resident attribute names are within valid ranges.
[ 259.209031] BUG: KASAN: slab-out-of-bounds in ni_create_attr_list+0x1e1/0x850 [ 259.210770] Write of size 426 at addr ffff88800632f2b2 by task exp/255 [ 259.211551] [ 259.212035] CPU: 0 PID: 255 Comm: exp Not tainted 6.0.0-rc6 #37 [ 259.212955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 259.214387] Call Trace: [ 259.214640] <TASK> [ 259.214895] dump_stack_lvl+0x49/0x63 [ 259.215284] print_report.cold+0xf5/0x689 [ 259.215565] ? kasan_poison+0x3c/0x50 [ 259.215778] ? kasan_unpoison+0x28/0x60 [ 259.215991] ? ni_create_attr_list+0x1e1/0x850 [ 259.216270] kasan_report+0xa7/0x130 [ 259.216481] ? ni_create_attr_list+0x1e1/0x850 [ 259.216719] kasan_check_range+0x15a/0x1d0 [ 259.216939] memcpy+0x3c/0x70 [ 259.217136] ni_create_attr_list+0x1e1/0x850 [ 259.217945] ? __rcu_read_unlock+0x5b/0x280 [ 259.218384] ? ni_remove_attr+0x2e0/0x2e0 [ 259.218712] ? kernel_text_address+0xcf/0xe0 [ 259.219064] ? __kernel_text_address+0x12/0x40 [ 259.219434] ? arch_stack_walk+0x9e/0xf0 [ 259.219668] ? __this_cpu_preempt_check+0x13/0x20 [ 259.219904] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 259.220140] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 259.220561] ni_ins_attr_ext+0x52c/0x5c0 [ 259.220984] ? ni_create_attr_list+0x850/0x850 [ 259.221532] ? run_deallocate+0x120/0x120 [ 259.221972] ? vfs_setxattr+0x128/0x300 [ 259.222688] ? setxattr+0x126/0x140 [ 259.222921] ? path_setxattr+0x164/0x180 [ 259.223431] ? __x64_sys_setxattr+0x6d/0x80 [ 259.223828] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.224417] ? mi_find_attr+0x3c/0xf0 [ 259.224772] ni_insert_attr+0x1ba/0x420 [ 259.225216] ? ni_ins_attr_ext+0x5c0/0x5c0 [ 259.225504] ? ntfs_read_ea+0x119/0x450 [ 259.225775] ni_insert_resident+0xc0/0x1c0 [ 259.226316] ? ni_insert_nonresident+0x400/0x400 [ 259.227001] ? __kasan_kmalloc+0x88/0xb0 [ 259.227468] ? __kmalloc+0x192/0x320 [ 259.227773] ntfs_set_ea+0x6bf/0xb30 [ 259.228216] ? ftrace_graph_ret_addr+0x2a/0xb0 [ 259.228494] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.228838] ? ntfs_read_ea+0x450/0x450 [ 259.229098] ? is_bpf_text_address+0x24/0x40 [ 259.229418] ? kernel_text_address+0xcf/0xe0 [ 259.229681] ? __kernel_text_address+0x12/0x40 [ 259.229948] ? unwind_get_return_address+0x3a/0x60 [ 259.230271] ? write_profile+0x270/0x270 [ 259.230537] ? arch_stack_walk+0x9e/0xf0 [ 259.230836] ntfs_setxattr+0x114/0x5c0 [ 259.231099] ? ntfs_set_acl_ex+0x2e0/0x2e0 [ 259.231529] ? evm_protected_xattr_common+0x6d/0x100 [ 259.231817] ? posix_xattr_acl+0x13/0x80 [ 259.232073] ? evm_protect_xattr+0x1f7/0x440 [ 259.232351] __vfs_setxattr+0xda/0x120 [ 259.232635] ? xattr_resolve_name+0x180/0x180 [ 259.232912] __vfs_setxattr_noperm+0x93/0x300 [ 259.233219] __vfs_setxattr_locked+0x141/0x160 [ 259.233492] ? kasan_poison+0x3c/0x50 [ 259.233744] vfs_setxattr+0x128/0x300 [ 259.234002] ? __vfs_setxattr_locked+0x160/0x160 [ 259.234837] do_setxattr+0xb8/0x170 [ 259.235567] ? vmemdup_user+0x53/0x90 [ 259.236212] setxattr+0x126/0x140 [ 259.236491] ? do_setxattr+0x170/0x170 [ 259.236791] ? debug_smp_processor_id+0x17/0x20 [ 259.237232] ? kasan_quarantine_put+0x57/0x180 [ 259.237605] ? putname+0x80/0xa0 [ 259.237870] ? __kasan_slab_free+0x11c/0x1b0 [ 259.238234] ? putname+0x80/0xa0 [ 259.238500] ? preempt_count_sub+0x18/0xc0 [ 259.238775] ? __mnt_want_write+0xaa/0x100 [ 259.238990] ? mnt_want_write+0x8b/0x150 [ 259.239290] path_setxattr+0x164/0x180 [ 259.239605] ? setxattr+0x140/0x140 [ 259.239849] ? debug_smp_processor_id+0x17/0x20 [ 259.240174] ? 
fpregs_assert_state_consistent+0x67/0x80 [ 259.240411] __x64_sys_setxattr+0x6d/0x80 [ 259.240715] do_syscall_64+0x3b/0x90 [ 259.240934] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.241697] RIP: 0033:0x7fc6b26e4469 [ 259.242647] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088 [ 259.244512] RSP: 002b:00007ffc3c7841f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000bc [ 259.245086] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6b26e4469 [ 259.246025] RDX: 00007ffc3c784380 RSI: 00007ffc3c7842e0 RDI: 00007ffc3c784238 [ 259.246961] RBP: 00007ffc3c788410 R08: 0000000000000001 R09: 00007ffc3c7884f8 [ 259.247775] R10: 000000000000007f R11: 0000000000000217 R12: 00000000004004e0 [ 259.248534] R13: 00007ffc3c7884f0 R14: 0000000000000000 R15: 0000000000000000 [ 259.249368] </TASK> [ 259.249644] [ 259.249888] Allocated by task 255: [ 259.250283] kasan_save_stack+0x26/0x50 [ 259.250957] __kasan_kmalloc+0x88/0xb0 [ 259.251826] __kmalloc+0x192/0x320 [ 259.252745] ni_create_attr_list+0x11e/0x850 [ 259.253298] ni_ins_attr_ext+0x52c/0x5c0 [ 259.253685] ni_insert_attr+0x1ba/0x420 [ 259.253974] ni_insert_resident+0xc0/0x1c0 [ 259.254311] ntfs_set_ea+0x6bf/0xb30 [ 259.254629] ntfs_setxattr+0x114/0x5c0 [ 259.254859] __vfs_setxattr+0xda/0x120 [ 259.255155] __vfs_setxattr_noperm+0x93/0x300 [ 259.255445] __vfs_setxattr_locked+0x141/0x160 [ 259.255862] vfs_setxattr+0x128/0x300 [ 259.256251] do_setxattr+0xb8/0x170 [ 259.256522] setxattr+0x126/0x140 [ 259.256911] path_setxattr+0x164/0x180 [ 259.257308] __x64_sys_setxattr+0x6d/0x80 [ 259.257637] do_syscall_64+0x3b/0x90 [ 259.257970] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.258550] [ 259.258772] The buggy address belongs to the object at ffff88800632f000 [ 259.258772] which belongs to the cache kmalloc-1k of size 1024 [ 259.260190] The buggy address is located 690 bytes inside of [ 259.260190] 1024-byte region [ffff88800632f000, ffff88800632f400) [ 259.261412] [ 259.261743] The buggy address belongs to the physical page: [ 259.262354] page:0000000081e8cac9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x632c [ 259.263722] head:0000000081e8cac9 order:2 compound_mapcount:0 compound_pincount:0 [ 259.264284] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 259.265312] raw: 000fffffc0010200 ffffea0000060d00 dead000000000004 ffff888001041dc0 [ 259.265772] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000 [ 259.266305] page dumped because: kasan: bad access detected [ 259.266588] [ 259.266728] Memory state around the buggy address: [ 259.267225] ffff88800632f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 259.267841] ffff88800632f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 259.269111] >ffff88800632f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 259.269626] ^ [ 259.270162] ffff88800632f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 259.270810] ffff88800632f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/record.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c index 30751fd618df..fd342da398be 100644 --- a/fs/ntfs3/record.c +++ b/fs/ntfs3/record.c @@ -265,6 +265,11 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr) if (t16 + t32 > asize) return NULL;
+ if (attr->name_len && + le16_to_cpu(attr->name_off) + sizeof(short) * attr->name_len > t16) { + return NULL; + } + return attr; }
From: Hawkins Jiawei yin31149@gmail.com
[ Upstream commit 887bfc546097fbe8071dac13b2fef73b77920899 ]
Syzkaller reports slab-out-of-bounds bug as follows: ================================================================== BUG: KASAN: slab-out-of-bounds in run_unpack+0x8b7/0x970 fs/ntfs3/run.c:944 Read of size 1 at addr ffff88801bbdff02 by task syz-executor131/3611
[...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 run_unpack+0x8b7/0x970 fs/ntfs3/run.c:944 run_unpack_ex+0xb0/0x7c0 fs/ntfs3/run.c:1057 ntfs_read_mft fs/ntfs3/inode.c:368 [inline] ntfs_iget5+0xc20/0x3280 fs/ntfs3/inode.c:501 ntfs_loadlog_and_replay+0x124/0x5d0 fs/ntfs3/fsntfs.c:272 ntfs_fill_super+0x1eff/0x37f0 fs/ntfs3/super.c:1018 get_tree_bdev+0x440/0x760 fs/super.c:1323 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK>
The buggy address belongs to the physical page: page:ffffea00006ef600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1bbd8 head:ffffea00006ef600 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88801bbdfe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88801bbdfe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88801bbdff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88801bbdff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88801bbe0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
The kernel tries to read the record and parse the MFT from disk in ntfs_read_mft().
The problem is that, while enumerating attributes in the record, the kernel doesn't check whether the run_off field loaded from disk is a valid value.
To be more specific, if attr->nres.run_off is larger than attr->size, the kernel passes an invalid run_buf_size argument to run_unpack_ex(): the unsigned subtraction wraps around (integer overflow). This invalid argument then triggers the slab-out-of-bounds read shown above.
This patch solves it by adding a sanity check of the offset to the packed runs against the attribute size.
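For illustration, a minimal sketch of the wrap-around (the values are made up; only the unsigned arithmetic matters):

	/* Illustration only: with unsigned types, roff > asize makes the
	 * length wrap to a huge value instead of going negative, which is
	 * what feeds the out-of-bounds read in run_unpack_ex().
	 */
	u32 asize = 0x18;                 /* attribute size from the crafted record */
	u16 roff  = 0x20;                 /* bogus attr->nres.run_off, beyond asize  */
	u32 run_buf_size = asize - roff;  /* wraps to 0xfffffff8 rather than -8      */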
link: https://lore.kernel.org/all/0000000000009145fc05e94bd5c3@google.com/#t Reported-and-tested-by: syzbot+8d6fbb27a6aded64b25b@syzkaller.appspotmail.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 66afa3db753a..00fd368e7b4a 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -373,6 +373,13 @@ static struct inode *ntfs_read_mft(struct inode *inode, }
t64 = le64_to_cpu(attr->nres.svcn); + + /* offset to packed runs is out-of-bounds */ + if (roff > asize) { + err = -EINVAL; + goto out; + } + err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff); if (err < 0)
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit f74495761df10c25a98256d16ea7465191b6e2cd ]
Some NUC15 LAPBC710 devices don't expose the same DMI information as the Intel reference design, so add an additional entry to the match table.
BugLink: https://github.com/thesofproject/linux/issues/3885 Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Link: https://lore.kernel.org/r/20221018012500.1592994-1-yung-chuan.liao@linux.int... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soundwire/dmi-quirks.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/soundwire/dmi-quirks.c b/drivers/soundwire/dmi-quirks.c index 747983743a14..2bf534632f64 100644 --- a/drivers/soundwire/dmi-quirks.c +++ b/drivers/soundwire/dmi-quirks.c @@ -71,6 +71,14 @@ static const struct dmi_system_id adr_remap_quirk_table[] = { }, .driver_data = (void *)intel_tgl_bios, }, + { + /* quirk used for NUC15 LAPBC710 skew */ + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "Intel Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "LAPBC710"), + }, + .driver_data = (void *)intel_tgl_bios, + }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc"),
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit bfcdbae0523bd95eb75a739ffb6221a37109881e ]
This enhances the sanity checks for $SDH and $SII while initializing NTFS security, guaranteeing that these index roots are legit.
[ 162.459513] BUG: KASAN: use-after-free in hdr_find_e.isra.0+0x10c/0x320 [ 162.460176] Read of size 2 at addr ffff8880037bca99 by task mount/243 [ 162.460851] [ 162.461252] CPU: 0 PID: 243 Comm: mount Not tainted 6.0.0-rc7 #42 [ 162.461744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 162.462609] Call Trace: [ 162.462954] <TASK> [ 162.463276] dump_stack_lvl+0x49/0x63 [ 162.463822] print_report.cold+0xf5/0x689 [ 162.464608] ? unwind_get_return_address+0x3a/0x60 [ 162.465766] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.466975] kasan_report+0xa7/0x130 [ 162.467506] ? _raw_spin_lock_irq+0xc0/0xf0 [ 162.467998] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.468536] __asan_load2+0x68/0x90 [ 162.468923] hdr_find_e.isra.0+0x10c/0x320 [ 162.469282] ? cmp_uints+0xe0/0xe0 [ 162.469557] ? cmp_sdh+0x90/0x90 [ 162.469864] ? ni_find_attr+0x214/0x300 [ 162.470217] ? ni_load_mi+0x80/0x80 [ 162.470479] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.470931] ? ntfs_bread_run+0x190/0x190 [ 162.471307] ? indx_get_root+0xe4/0x190 [ 162.471556] ? indx_get_root+0x140/0x190 [ 162.471833] ? indx_init+0x1e0/0x1e0 [ 162.472069] ? fnd_clear+0x115/0x140 [ 162.472363] ? _raw_spin_lock_irqsave+0x100/0x100 [ 162.472731] indx_find+0x184/0x470 [ 162.473461] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 162.474429] ? indx_find_buffer+0x2d0/0x2d0 [ 162.474704] ? do_syscall_64+0x3b/0x90 [ 162.474962] dir_search_u+0x196/0x2f0 [ 162.475381] ? ntfs_nls_to_utf16+0x450/0x450 [ 162.475661] ? ntfs_security_init+0x3d6/0x440 [ 162.475906] ? is_sd_valid+0x180/0x180 [ 162.476191] ntfs_extend_init+0x13f/0x2c0 [ 162.476496] ? ntfs_fix_post_read+0x130/0x130 [ 162.476861] ? iput.part.0+0x286/0x320 [ 162.477325] ntfs_fill_super+0x11e0/0x1b50 [ 162.477709] ? put_ntfs+0x1d0/0x1d0 [ 162.477970] ? vsprintf+0x20/0x20 [ 162.478258] ? set_blocksize+0x95/0x150 [ 162.478538] get_tree_bdev+0x232/0x370 [ 162.478789] ? put_ntfs+0x1d0/0x1d0 [ 162.479038] ntfs_fs_get_tree+0x15/0x20 [ 162.479374] vfs_get_tree+0x4c/0x130 [ 162.479729] path_mount+0x654/0xfe0 [ 162.480124] ? putname+0x80/0xa0 [ 162.480484] ? finish_automount+0x2e0/0x2e0 [ 162.480894] ? putname+0x80/0xa0 [ 162.481467] ? kmem_cache_free+0x1c4/0x440 [ 162.482280] ? putname+0x80/0xa0 [ 162.482714] do_mount+0xd6/0xf0 [ 162.483264] ? path_mount+0xfe0/0xfe0 [ 162.484782] ? 
__kasan_check_write+0x14/0x20 [ 162.485593] __x64_sys_mount+0xca/0x110 [ 162.486024] do_syscall_64+0x3b/0x90 [ 162.486543] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.487141] RIP: 0033:0x7f9d374e948a [ 162.488324] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 162.489728] RSP: 002b:00007ffe30e73d18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 162.490971] RAX: ffffffffffffffda RBX: 0000561cdb43a060 RCX: 00007f9d374e948a [ 162.491669] RDX: 0000561cdb43a260 RSI: 0000561cdb43a2e0 RDI: 0000561cdb442af0 [ 162.492050] RBP: 0000000000000000 R08: 0000561cdb43a280 R09: 0000000000000020 [ 162.492459] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000561cdb442af0 [ 162.493183] R13: 0000561cdb43a260 R14: 0000000000000000 R15: 00000000ffffffff [ 162.493644] </TASK> [ 162.493908] [ 162.494214] The buggy address belongs to the physical page: [ 162.494761] page:000000003e38a3d5 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x37bc [ 162.496064] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 162.497278] raw: 000fffffc0000000 ffffea00000df1c8 ffffea00000df008 0000000000000000 [ 162.498928] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000 [ 162.500542] page dumped because: kasan: bad access detected [ 162.501057] [ 162.501242] Memory state around the buggy address: [ 162.502230] ffff8880037bc980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.502977] ffff8880037bca00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503522] >ffff8880037bca80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503963] ^ [ 162.504370] ffff8880037bcb00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.504766] ffff8880037bcb80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/fsntfs.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c index 24b57c3cc625..4a97a28cb8f2 100644 --- a/fs/ntfs3/fsntfs.c +++ b/fs/ntfs3/fsntfs.c @@ -1878,9 +1878,10 @@ int ntfs_security_init(struct ntfs_sb_info *sbi) goto out; }
- root_sdh = resident_data(attr); + root_sdh = resident_data_ex(attr, sizeof(struct INDEX_ROOT)); if (root_sdh->type != ATTR_ZERO || - root_sdh->rule != NTFS_COLLATION_TYPE_SECURITY_HASH) { + root_sdh->rule != NTFS_COLLATION_TYPE_SECURITY_HASH || + offsetof(struct INDEX_ROOT, ihdr) + root_sdh->ihdr.used > attr->res.data_size) { err = -EINVAL; goto out; } @@ -1896,9 +1897,10 @@ int ntfs_security_init(struct ntfs_sb_info *sbi) goto out; }
- root_sii = resident_data(attr); + root_sii = resident_data_ex(attr, sizeof(struct INDEX_ROOT)); if (root_sii->type != ATTR_ZERO || - root_sii->rule != NTFS_COLLATION_TYPE_UINT) { + root_sii->rule != NTFS_COLLATION_TYPE_UINT || + offsetof(struct INDEX_ROOT, ihdr) + root_sii->ihdr.used > attr->res.data_size) { err = -EINVAL; goto out; }
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 0d0f659bf713662fabed973f9996b8f23c59ca51 ]
syzbot is reporting a too-large allocation at wnd_init() [1]: a crafted filesystem can push wnd->nwnd close to UINT_MAX. Add __GFP_NOWARN in order to avoid the too-large allocation warning, rather than exhausting memory by using kvcalloc().
Link: https://syzkaller.appspot.com/bug?extid=fa4648a5446460b7b963 [1] Reported-by: syzbot syzbot+fa4648a5446460b7b963@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c index 7f2055b7427a..2a63793f522d 100644 --- a/fs/ntfs3/bitmap.c +++ b/fs/ntfs3/bitmap.c @@ -666,7 +666,7 @@ int wnd_init(struct wnd_bitmap *wnd, struct super_block *sb, size_t nbits) if (!wnd->bits_last) wnd->bits_last = wbits;
- wnd->free_bits = kcalloc(wnd->nwnd, sizeof(u16), GFP_NOFS); + wnd->free_bits = kcalloc(wnd->nwnd, sizeof(u16), GFP_NOFS | __GFP_NOWARN); if (!wnd->free_bits) return -ENOMEM;
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 59bfd7a483da36bd202532a3d9ea1f14f3bf3aaf ]
syzbot is reporting a too-large allocation at ntfs_fill_super() [1]: a crafted filesystem can contain a bogus inode->i_size. Add __GFP_NOWARN in order to avoid the too-large allocation warning, rather than exhausting memory by using kvmalloc().
Link: https://syzkaller.appspot.com/bug?extid=33f3faaa0c08744f7d40 [1] Reported-by: syzbot syzbot+33f3faaa0c08744f7d40@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index a18fb431abbe..33b1833ad525 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1136,7 +1136,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc) goto put_inode_out; } bytes = inode->i_size; - sbi->def_table = t = kmalloc(bytes, GFP_NOFS); + sbi->def_table = t = kmalloc(bytes, GFP_NOFS | __GFP_NOWARN); if (!t) { err = -ENOMEM; goto put_inode_out;
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 658015167a8432b88f5d032e9d85d8fd50e5bf2c ]
There were two patches which addressed the same bug and added the same condition:
commit 6db620863f85 ("fs/ntfs3: Validate data run offset") commit 887bfc546097 ("fs/ntfs3: Fix slab-out-of-bounds read in run_unpack")
Delete one condition.
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/inode.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 00fd368e7b4a..ed640e4e3fac 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -374,12 +374,6 @@ static struct inode *ntfs_read_mft(struct inode *inode,
t64 = le64_to_cpu(attr->nres.svcn);
- /* offset to packed runs is out-of-bounds */ - if (roff > asize) { - err = -EINVAL; - goto out; - } - err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff); if (err < 0)
From: Yin Xiujiang yinxiujiang@kylinos.cn
[ Upstream commit ecfbd57cf9c5ca225184ae266ce44ae473792132 ]
When PAGE_SIZE is 64K, if read_log_page() is called by log_read_rst() for the first time, the size of *buffer is equal to DefaultLogPageSize (4K). But for *buffer operations like memcpy, if the memory area size (n) being copied into the buffer is larger than 4K (log->page_size (64K) or bytes (64K - page_off)), it causes an out-of-bounds access. Call trace: [...] kasan_report+0x44/0x130 check_memory_region+0xf8/0x1a0 memcpy+0xc8/0x100 ntfs_read_run_nb+0x20c/0x460 read_log_page+0xd0/0x1f4 log_read_rst+0x110/0x75c log_replay+0x1e8/0x4aa0 ntfs_loadlog_and_replay+0x290/0x2d0 ntfs_fill_super+0x508/0xec0 get_tree_bdev+0x1fc/0x34c [...]
Fix this by initializing the variable r_page to NULL in log_read_rst() and allocating log->page_size bytes in read_log_page().
Signed-off-by: Yin Xiujiang yinxiujiang@kylinos.cn Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/fslog.c | 26 ++------------------------ 1 file changed, 2 insertions(+), 24 deletions(-)
diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c index bcdddcd7bc79..20abdb268286 100644 --- a/fs/ntfs3/fslog.c +++ b/fs/ntfs3/fslog.c @@ -1132,7 +1132,7 @@ static int read_log_page(struct ntfs_log *log, u32 vbo, return -EINVAL;
if (!*buffer) { - to_free = kmalloc(bytes, GFP_NOFS); + to_free = kmalloc(log->page_size, GFP_NOFS); if (!to_free) return -ENOMEM; *buffer = to_free; @@ -1180,10 +1180,7 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first, struct restart_info *info) { u32 skip, vbo; - struct RESTART_HDR *r_page = kmalloc(DefaultLogPageSize, GFP_NOFS); - - if (!r_page) - return -ENOMEM; + struct RESTART_HDR *r_page = NULL;
/* Determine which restart area we are looking for. */ if (first) { @@ -1197,7 +1194,6 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first, /* Loop continuously until we succeed. */ for (; vbo < l_size; vbo = 2 * vbo + skip, skip = 0) { bool usa_error; - u32 sys_page_size; bool brst, bchk; struct RESTART_AREA *ra;
@@ -1251,24 +1247,6 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first, goto check_result; }
- /* Read the entire restart area. */ - sys_page_size = le32_to_cpu(r_page->sys_page_size); - if (DefaultLogPageSize != sys_page_size) { - kfree(r_page); - r_page = kzalloc(sys_page_size, GFP_NOFS); - if (!r_page) - return -ENOMEM; - - if (read_log_page(log, vbo, - (struct RECORD_PAGE_HDR **)&r_page, - &usa_error)) { - /* Ignore any errors. */ - kfree(r_page); - r_page = NULL; - continue; - } - } - if (is_client_area_valid(r_page, usa_error)) { info->valid_page = true; ra = Add2Ptr(r_page, le16_to_cpu(r_page->ra_off));
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit efb11fdb3e1a9f694fa12b70b21e69e55ec59c36 ]
find_insn() will return NULL in case of failure. Check insn in order to avoid a kernel Oops for NULL pointer dereference.
Tested-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Reviewed-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Acked-by: Josh Poimboeuf jpoimboe@kernel.org Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20221114175754.1131267-9-sv@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/objtool/check.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c index edac5aaa2802..308c8806ad94 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -197,7 +197,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, return false;
insn = find_insn(file, func->sec, func->offset); - if (!insn->func) + if (!insn || !insn->func) return false;
func_for_each_insn(file, func, insn) {
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit ed2213bfb192ab51f09f12e9b49b5d482c6493f3 ]
rtas_os_term() is called during panic. Its behavior depends on a couple of conditions in the /rtas node of the device tree, the traversal of which entails locking and local IRQ state changes. If the kernel panics while devtree_lock is held, rtas_os_term() as currently written could hang.
Instead of discovering the relevant characteristics at panic time, cache them in file-static variables at boot. Note the lookup for "ibm,extended-os-term" is converted to of_property_read_bool() since it is a boolean property, not an RTAS function token.
Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Andrew Donnellan ajd@linux.ibm.com [mpe: Incorporate suggested change from Nick] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/rtas.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 7834ce3aa7f1..4d8de49c9d4b 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -788,6 +788,7 @@ void __noreturn rtas_halt(void)
/* Must be in the RMO region, so we place it here */ static char rtas_os_term_buf[2048]; +static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE;
void rtas_os_term(char *str) { @@ -799,14 +800,13 @@ void rtas_os_term(char *str) * this property may terminate the partition which we want to avoid * since it interferes with panic_timeout. */ - if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term") || - RTAS_UNKNOWN_SERVICE == rtas_token("ibm,extended-os-term")) + if (ibm_os_term_token == RTAS_UNKNOWN_SERVICE) return;
snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
do { - status = rtas_call(rtas_token("ibm,os-term"), 1, 1, NULL, + status = rtas_call(ibm_os_term_token, 1, 1, NULL, __pa(rtas_os_term_buf)); } while (rtas_busy_delay(status));
@@ -1167,6 +1167,13 @@ void __init rtas_initialize(void) no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry); rtas.entry = no_entry ? rtas.base : entry;
+ /* + * Discover these now to avoid device tree lookups in the + * panic path. + */ + if (of_property_read_bool(rtas.dev, "ibm,extended-os-term")) + ibm_os_term_token = rtas_token("ibm,os-term"); + /* If RTAS was found, allocate the RMO buffer for it and look for * the stop-self token if any */
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit 6c606e57eecc37d6b36d732b1ff7e55b7dc32dd4 ]
It's unsafe to use rtas_busy_delay() to handle a busy status from the ibm,os-term RTAS function in rtas_os_term():
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 2, expected: 0 CPU: 7 PID: 1 Comm: swapper/0 Tainted: G D 6.0.0-rc5-02182-gf8553a572277-dirty #9 Call Trace: [c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable) [c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0 [c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0 [c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4 [c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68 [c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50 [c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0 [c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0 [c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0 [c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420 [c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200
Use rtas_busy_delay_time() instead, which signals without side effects whether to attempt the ibm,os-term RTAS call again.
Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Andrew Donnellan ajd@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/rtas.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 4d8de49c9d4b..2dae702e7a5a 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -805,10 +805,15 @@ void rtas_os_term(char *str)
snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
+ /* + * Keep calling as long as RTAS returns a "try again" status, + * but don't use rtas_busy_delay(), which potentially + * schedules. + */ do { status = rtas_call(ibm_os_term_token, 1, 1, NULL, __pa(rtas_os_term_buf)); - } while (rtas_busy_delay(status)); + } while (rtas_busy_delay_time(status));
if (status != 0) printk(KERN_EMERG "ibm,os-term call failed %d\n", status);
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit 4eab1c2fe06c98a4dff258dd64800b6986c101e9 ]
The HID descriptor of this device contains two mouse collections, one for mouse emulation and the other for the trackpoint.
Both collections get merged and, because the first one defines X and Y, the movement events reported by the trackpoint collection are ignored.
Set the MT_CLS_WIN_8_FORCE_MULTI_INPUT class for this device to be able to receive its reports.
This fix is similar to/based on commit 40d5bb87377a ("HID: multitouch: enable multi-input as a quirk for some devices").
Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/825 Reported-by: Akito the@akito.ooo Tested-by: Akito the@akito.ooo Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-multitouch.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c index 08462ac72b89..6b86d368d5e7 100644 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -1965,6 +1965,10 @@ static const struct hid_device_id mt_devices[] = { HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, USB_VENDOR_ID_ELAN, 0x313a) },
+ { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, + HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, + USB_VENDOR_ID_ELAN, 0x3148) }, + /* Elitegroup panel */ { .driver_data = MT_CLS_SERIAL, MT_USB_DEVICE(USB_VENDOR_ID_ELITEGROUP,
From: Terry Junge linuxhid@cosmicgizmosystems.com
[ Upstream commit 3d57f36c89d8ba32b2c312f397a37fd1a2dc7cfc ]
I no longer work for Plantronics (aka Poly, aka HP) and do not have access to the headsets in order to test. However, as noted by Maxim, the other 32xx models that share the same base code set as the 3220 would need the same quirk. This patch adds the PIDs for the rest of the Blackwire 32XX product family that require the quirk.
Plantronics Blackwire 3210 Series (047f:c055)
Plantronics Blackwire 3215 Series (047f:c057)
Plantronics Blackwire 3225 Series (047f:c058)
Quote from previous patch by Maxim Mikityanskiy Plantronics Blackwire 3220 Series (047f:c056) sends HID reports twice for each volume key press. This patch adds a quirk to hid-plantronics for this product ID, which will ignore the second volume key press if it happens within 5 ms from the last one that was handled.
The patch was tested on the mentioned model only, it shouldn't affect other models, however, this quirk might be needed for them too. Auto-repeat (when a key is held pressed) is not affected, because the rate is about 3 times per second, which is far less frequent than once in 5 ms. End quote
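For readers unfamiliar with the quirk, here is a minimal sketch of the 5 ms double-report suppression it enables (illustrative only, assuming a jiffies-based timestamp; this is not the exact hid-plantronics code):

#include <linux/jiffies.h>

static unsigned long last_volume_key_ts;	/* time of the last handled press */

/* Return true if this volume key press arrived within 5 ms of the
 * previously handled one and should therefore be ignored.
 */
static bool plt_double_report(void)
{
	unsigned long prev = last_volume_key_ts;

	last_volume_key_ts = jiffies;
	return time_before(jiffies, prev + msecs_to_jiffies(5));
}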
Signed-off-by: Terry Junge linuxhid@cosmicgizmosystems.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 3 +++ drivers/hid/hid-plantronics.c | 9 +++++++++ 2 files changed, 12 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 78b55f845d2d..8698d49edaa3 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -966,7 +966,10 @@ #define USB_DEVICE_ID_ORTEK_IHOME_IMAC_A210S 0x8003
#define USB_VENDOR_ID_PLANTRONICS 0x047f +#define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3210_SERIES 0xc055 #define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3220_SERIES 0xc056 +#define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3215_SERIES 0xc057 +#define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3225_SERIES 0xc058
#define USB_VENDOR_ID_PANASONIC 0x04da #define USB_DEVICE_ID_PANABOARD_UBT780 0x1044 diff --git a/drivers/hid/hid-plantronics.c b/drivers/hid/hid-plantronics.c index e81b7cec2d12..3d414ae194ac 100644 --- a/drivers/hid/hid-plantronics.c +++ b/drivers/hid/hid-plantronics.c @@ -198,9 +198,18 @@ static int plantronics_probe(struct hid_device *hdev, }
static const struct hid_device_id plantronics_devices[] = { + { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, + USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3210_SERIES), + .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3220_SERIES), .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, + { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, + USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3215_SERIES), + .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, + { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, + USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3225_SERIES), + .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, HID_ANY_ID) }, { } };
From: Luca Stefani luca@osomprivacy.com
commit beca3e311a49cd3c55a056096531737d7afa4361 upstream.
If mem-type is specified in the device tree it would end up overriding the record_size field instead of populating mem_type.
As record_size is currently parsed after the improper assignment, with a default size of 0, everything continued to work as expected regardless of the value found in the device tree.
Simply changing the target field of the struct is enough to get mem-type working as expected.
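As a hedged sketch of what such a parse helper boils down to (the name and default handling below are illustrative, not the exact ramoops macro), the target field is simply where the property value lands, which is why passing record_size there clobbered the wrong member:

#include <linux/of.h>

/* Illustrative helper: read a u32 device-tree property into *field,
 * falling back to a default when the property is absent.
 */
static void parse_u32_prop(const struct device_node *np, const char *name,
			   u32 *field, u32 dflt)
{
	u32 val;

	if (of_property_read_u32(np, name, &val))
		*field = dflt;	/* property missing or malformed */
	else
		*field = val;
}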
Fixes: 9d843e8fafc7 ("pstore: Add mem_type property DT parsing support") Cc: stable@vger.kernel.org Signed-off-by: Luca Stefani luca@osomprivacy.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20221222131049.286288-1-luca@osomprivacy.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pstore/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -670,7 +670,7 @@ static int ramoops_parse_dt(struct platf field = value; \ }
- parse_u32("mem-type", pdata->record_size, pdata->mem_type); + parse_u32("mem-type", pdata->mem_type, pdata->mem_type); parse_u32("record-size", pdata->record_size, 0); parse_u32("console-size", pdata->console_size, 0); parse_u32("ftrace-size", pdata->ftrace_size, 0);
From: Qiujun Huang hqjagain@gmail.com
commit 99b3b837855b987563bcfb397cf9ddd88262814b upstream.
There is a case, found when triggering panic_on_oom, where pstore fails to dump kmsg because psz_kmsg_write_record() can't get a new buffer.
Handle this by using GFP_ATOMIC, which can allocate below the low watermark, to get the new buffer.
Signed-off-by: Qiujun Huang hqjagain@gmail.com Fixes: 335426c6dcdd ("pstore/zone: Provide way to skip "broken" zone for MTD devices") Cc: WeiXiong Liao gmpy.liaowx@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/CAJRQjofRCF7wjrYmw3D7zd5QZnwHQq+F8U-mJDJ6NZ4bddYdL... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pstore/zone.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/pstore/zone.c +++ b/fs/pstore/zone.c @@ -761,7 +761,7 @@ static inline int notrace psz_kmsg_write /* avoid destroying old data, allocate a new one */ len = zone->buffer_size + sizeof(*zone->buffer); zone->oldbuf = zone->buffer; - zone->buffer = kzalloc(len, GFP_KERNEL); + zone->buffer = kzalloc(len, GFP_ATOMIC); if (!zone->buffer) { zone->buffer = zone->oldbuf; return -ENOMEM;
From: Aditya Garg gargaditya08@live.com
commit 9f2b5debc07073e6dfdd774e3594d0224b991927 upstream.
Despite specifying UID and GID in the mount command, the specified UID and GID were not being assigned. This patch fixes this issue.
Link: https://lkml.kernel.org/r/C0264BF5-059C-45CF-B8DA-3A3BD2C803A2@live.com Signed-off-by: Aditya Garg gargaditya08@live.com Reviewed-by: Viacheslav Dubeyko slava@dubeyko.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hfsplus/hfsplus_fs.h | 2 ++ fs/hfsplus/inode.c | 4 ++-- fs/hfsplus/options.c | 4 ++++ 3 files changed, 8 insertions(+), 2 deletions(-)
--- a/fs/hfsplus/hfsplus_fs.h +++ b/fs/hfsplus/hfsplus_fs.h @@ -198,6 +198,8 @@ struct hfsplus_sb_info { #define HFSPLUS_SB_HFSX 3 #define HFSPLUS_SB_CASEFOLD 4 #define HFSPLUS_SB_NOBARRIER 5 +#define HFSPLUS_SB_UID 6 +#define HFSPLUS_SB_GID 7
static inline struct hfsplus_sb_info *HFSPLUS_SB(struct super_block *sb) { --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -190,11 +190,11 @@ static void hfsplus_get_perms(struct ino mode = be16_to_cpu(perms->mode);
i_uid_write(inode, be32_to_cpu(perms->owner)); - if (!i_uid_read(inode) && !mode) + if ((test_bit(HFSPLUS_SB_UID, &sbi->flags)) || (!i_uid_read(inode) && !mode)) inode->i_uid = sbi->uid;
i_gid_write(inode, be32_to_cpu(perms->group)); - if (!i_gid_read(inode) && !mode) + if ((test_bit(HFSPLUS_SB_GID, &sbi->flags)) || (!i_gid_read(inode) && !mode)) inode->i_gid = sbi->gid;
if (dir) { --- a/fs/hfsplus/options.c +++ b/fs/hfsplus/options.c @@ -140,6 +140,8 @@ int hfsplus_parse_options(char *input, s if (!uid_valid(sbi->uid)) { pr_err("invalid uid specified\n"); return 0; + } else { + set_bit(HFSPLUS_SB_UID, &sbi->flags); } break; case opt_gid: @@ -151,6 +153,8 @@ int hfsplus_parse_options(char *input, s if (!gid_valid(sbi->gid)) { pr_err("invalid gid specified\n"); return 0; + } else { + set_bit(HFSPLUS_SB_GID, &sbi->flags); } break; case opt_part:
From: Wang Yufen wangyufen@huawei.com
commit e7f703ff2507f4e9f496da96cd4b78fd3026120c upstream.
Fix to return a negative error code from create_elf_fdpic_tables() instead of 0.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Wang Yufen wangyufen@huawei.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/1669945261-30271-1-git-send-email-wangyufen@huawei... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/binfmt_elf_fdpic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -434,8 +434,9 @@ static int load_elf_fdpic_binary(struct current->mm->start_stack = current->mm->start_brk + stack_size; #endif
- if (create_elf_fdpic_tables(bprm, current->mm, - &exec_params, &interp_params) < 0) + retval = create_elf_fdpic_tables(bprm, current->mm, &exec_params, + &interp_params); + if (retval < 0) goto error;
kdebug("- start_code %lx", current->mm->start_code);
From: Zhang Tianci zhangtianci.1997@bytedance.com
commit 5b0db51215e895a361bc63132caa7cca36a53d6a upstream.
There is a case where link() misbehaves on overlayfs:
$ mkdir /lower /fuse /merge
$ mount -t fuse /fuse
$ mkdir /fuse/upper /fuse/work
$ mount -t overlay /merge -o lowerdir=/lower,upperdir=/fuse/upper,\
workdir=work
$ touch /merge/file
$ chown bin.bin /merge/file // the file's owner becomes "bin"
$ ln /merge/file /merge/lnkfile
Then we get an error (EACCES) because the fuse daemon checks that the link()'s caller is "bin" and denies the request.
In the changing history of ovl_link(), there are two key commits:
The first is commit bb0d2b8ad296 ("ovl: fix sgid on directory") which overrides the cred's fsuid/fsgid using the new inode. The new inode's owner is initialized by inode_init_owner(), and the inode's i_uid is assigned from the current user's fsuid. So the override fsuid becomes the current user's. We know link() is actually modifying the directory, so the caller must have the MAY_WRITE permission on the directory. The current caller should have this permission. So it is acceptable to use the caller's fsuid.
The second is commit 51f7e52dc943 ("ovl: share inode for hard link") which removed the inode creation in ovl_link(). This commit moved inode_init_owner() into ovl_create_object(), so ovl_link() just gives the old inode to ovl_create_or_link(). Then the override fsuid becomes the old inode's fsuid, which is neither the caller's nor the overlay mounter's! So this is incorrect.
Fix this bug by using the ovl mounter's fsuid/fsgid when doing the underlying fs's link().
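For context, a minimal sketch of the cred-override pattern involved (not the exact ovl_link() code; mounter_uid/mounter_gid stand in for the ovl mounter's identity):

#include <linux/cred.h>

static int do_as_mounter(kuid_t mounter_uid, kgid_t mounter_gid)
{
	const struct cred *old;
	struct cred *override;

	override = prepare_creds();	/* writable copy of current creds */
	if (!override)
		return -ENOMEM;

	override->fsuid = mounter_uid;	/* act with the mounter's fs{u,g}id */
	override->fsgid = mounter_gid;

	old = override_creds(override);	/* switch current->cred */
	/* ... call into the underlying filesystem, e.g. vfs_link() ... */
	revert_creds(old);
	put_cred(override);
	return 0;
}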
Link: https://lore.kernel.org/all/20220817102952.xnvesg3a7rbv576x@wittgenstein/T Link: https://lore.kernel.org/lkml/20220825130552.29587-1-zhangtianci.1997@bytedan... Signed-off-by: Zhang Tianci zhangtianci.1997@bytedance.com Signed-off-by: Jiachen Zhang zhangjiachen.jaycee@bytedance.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Fixes: 51f7e52dc943 ("ovl: share inode for hard link") Cc: stable@vger.kernel.org # v4.8 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/overlayfs/dir.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-)
--- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -589,28 +589,42 @@ static int ovl_create_or_link(struct den goto out_revert_creds; }
- err = -ENOMEM; - override_cred = prepare_creds(); - if (override_cred) { + if (!attr->hardlink) { + err = -ENOMEM; + override_cred = prepare_creds(); + if (!override_cred) + goto out_revert_creds; + /* + * In the creation cases(create, mkdir, mknod, symlink), + * ovl should transfer current's fs{u,g}id to underlying + * fs. Because underlying fs want to initialize its new + * inode owner using current's fs{u,g}id. And in this + * case, the @inode is a new inode that is initialized + * in inode_init_owner() to current's fs{u,g}id. So use + * the inode's i_{u,g}id to override the cred's fs{u,g}id. + * + * But in the other hardlink case, ovl_link() does not + * create a new inode, so just use the ovl mounter's + * fs{u,g}id. + */ override_cred->fsuid = inode->i_uid; override_cred->fsgid = inode->i_gid; - if (!attr->hardlink) { - err = security_dentry_create_files_as(dentry, - attr->mode, &dentry->d_name, old_cred, - override_cred); - if (err) { - put_cred(override_cred); - goto out_revert_creds; - } + err = security_dentry_create_files_as(dentry, + attr->mode, &dentry->d_name, old_cred, + override_cred); + if (err) { + put_cred(override_cred); + goto out_revert_creds; } put_cred(override_creds(override_cred)); put_cred(override_cred); - - if (!ovl_dentry_is_whiteout(dentry)) - err = ovl_create_upper(dentry, inode, attr); - else - err = ovl_create_over_whiteout(dentry, inode, attr); } + + if (!ovl_dentry_is_whiteout(dentry)) + err = ovl_create_upper(dentry, inode, attr); + else + err = ovl_create_over_whiteout(dentry, inode, attr); + out_revert_creds: revert_creds(old_cred); return err;
From: Artem Egorkine arteme@gmail.com
commit 8508fa2e7472f673edbeedf1b1d2b7a6bb898ecc upstream.
A PODxt device sends 0xb2, 0xc2 or 0xf2 as a status byte for MIDI messages over USB that should otherwise have a 0xb0, 0xc0 or 0xf0 status byte. This is usually corrected by the driver on other OSes.
This fixes MIDI sysex messages sent by PODxt.
[ tiwai: fixed white spaces ]
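For context, the upper nibble of a MIDI status byte encodes the message type and the lower nibble the channel (or the system-message subtype for 0xFn), so 0xb2/0xc2/0xf2 decode as the wrong channel or a bogus system message. A minimal sketch of the correction, mirroring what the patch does inside line6_midibuf_read():

/* Illustration: mask the low nibble so the bogus PODxt status bytes
 * become the intended 0xb0/0xc0/0xf0.
 */
static unsigned char podxt_fix_status(unsigned char status)
{
	if (status == 0xb2 || status == 0xc2 || status == 0xf2)
		return status & 0xf0;
	return status;
}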
Signed-off-by: Artem Egorkine arteme@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221225105728.1153989-1-arteme@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/line6/driver.c | 3 ++- sound/usb/line6/midi.c | 3 ++- sound/usb/line6/midibuf.c | 25 +++++++++++++++++-------- sound/usb/line6/midibuf.h | 5 ++++- sound/usb/line6/pod.c | 3 ++- 5 files changed, 27 insertions(+), 12 deletions(-)
--- a/sound/usb/line6/driver.c +++ b/sound/usb/line6/driver.c @@ -304,7 +304,8 @@ static void line6_data_received(struct u for (;;) { done = line6_midibuf_read(mb, line6->buffer_message, - LINE6_MIDI_MESSAGE_MAXLEN); + LINE6_MIDI_MESSAGE_MAXLEN, + LINE6_MIDIBUF_READ_RX);
if (done <= 0) break; --- a/sound/usb/line6/midi.c +++ b/sound/usb/line6/midi.c @@ -56,7 +56,8 @@ static void line6_midi_transmit(struct s
for (;;) { done = line6_midibuf_read(mb, chunk, - LINE6_FALLBACK_MAXPACKETSIZE); + LINE6_FALLBACK_MAXPACKETSIZE, + LINE6_MIDIBUF_READ_TX);
if (done == 0) break; --- a/sound/usb/line6/midibuf.c +++ b/sound/usb/line6/midibuf.c @@ -9,6 +9,7 @@
#include "midibuf.h"
+ static int midibuf_message_length(unsigned char code) { int message_length; @@ -20,12 +21,7 @@ static int midibuf_message_length(unsign
message_length = length[(code >> 4) - 8]; } else { - /* - Note that according to the MIDI specification 0xf2 is - the "Song Position Pointer", but this is used by Line 6 - to send sysex messages to the host. - */ - static const int length[] = { -1, 2, -1, 2, -1, -1, 1, 1, 1, 1, + static const int length[] = { -1, 2, 2, 2, -1, -1, 1, 1, 1, -1, 1, 1, 1, -1, 1, 1 }; message_length = length[code & 0x0f]; @@ -125,7 +121,7 @@ int line6_midibuf_write(struct midi_buff }
int line6_midibuf_read(struct midi_buffer *this, unsigned char *data, - int length) + int length, int read_type) { int bytes_used; int length1, length2; @@ -148,9 +144,22 @@ int line6_midibuf_read(struct midi_buffe
length1 = this->size - this->pos_read;
- /* check MIDI command length */ command = this->buf[this->pos_read]; + /* + PODxt always has status byte lower nibble set to 0010, + when it means to send 0000, so we correct if here so + that control/program changes come on channel 1 and + sysex message status byte is correct + */ + if (read_type == LINE6_MIDIBUF_READ_RX) { + if (command == 0xb2 || command == 0xc2 || command == 0xf2) { + unsigned char fixed = command & 0xf0; + this->buf[this->pos_read] = fixed; + command = fixed; + } + }
+ /* check MIDI command length */ if (command & 0x80) { midi_length = midibuf_message_length(command); this->command_prev = command; --- a/sound/usb/line6/midibuf.h +++ b/sound/usb/line6/midibuf.h @@ -8,6 +8,9 @@ #ifndef MIDIBUF_H #define MIDIBUF_H
+#define LINE6_MIDIBUF_READ_TX 0 +#define LINE6_MIDIBUF_READ_RX 1 + struct midi_buffer { unsigned char *buf; int size; @@ -23,7 +26,7 @@ extern void line6_midibuf_destroy(struct extern int line6_midibuf_ignore(struct midi_buffer *mb, int length); extern int line6_midibuf_init(struct midi_buffer *mb, int size, int split); extern int line6_midibuf_read(struct midi_buffer *mb, unsigned char *data, - int length); + int length, int read_type); extern void line6_midibuf_reset(struct midi_buffer *mb); extern int line6_midibuf_write(struct midi_buffer *mb, unsigned char *data, int length); --- a/sound/usb/line6/pod.c +++ b/sound/usb/line6/pod.c @@ -159,8 +159,9 @@ static struct line6_pcm_properties pod_p .bytes_per_channel = 3 /* SNDRV_PCM_FMTBIT_S24_3LE */ };
+ static const char pod_version_header[] = { - 0xf2, 0x7e, 0x7f, 0x06, 0x02 + 0xf0, 0x7e, 0x7f, 0x06, 0x02 };
static char *pod_alloc_sysex_buffer(struct usb_line6_pod *pod, int code,
From: Artem Egorkine arteme@gmail.com
commit b8800d324abb50160560c636bfafe2c81001b66c upstream.
Correctly calculate available space including the size of the chunk buffer. This fixes a buffer overflow when multiple MIDI sysex messages are sent to a PODxt device.
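A minimal userspace sketch of the bound the fix enforces (the constant value below is only a stand-in; the real one lives in the driver headers): the request handed to snd_rawmidi_transmit_peek() is additionally capped by the size of the local chunk buffer, so the peek can never overrun it.

#include <stdio.h>

#define LINE6_FALLBACK_MAXPACKETSIZE 16 /* illustrative value only */

static int min3(int a, int b, int c)
{
	int m = a < b ? a : b;
	return m < c ? m : c;
}

int main(void)
{
	int midibuf_free = 64, max_packet_size = 32;
	/* req is bounded by free midibuf space, USB packet size and chunk size */
	int req = min3(midibuf_free, max_packet_size, LINE6_FALLBACK_MAXPACKETSIZE);

	printf("req = %d\n", req); /* 16: limited by the chunk buffer */
	return 0;
}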
Signed-off-by: Artem Egorkine arteme@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221225105728.1153989-2-arteme@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/line6/midi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/usb/line6/midi.c +++ b/sound/usb/line6/midi.c @@ -44,7 +44,8 @@ static void line6_midi_transmit(struct s int req, done;
for (;;) { - req = min(line6_midibuf_bytes_free(mb), line6->max_packet_size); + req = min3(line6_midibuf_bytes_free(mb), line6->max_packet_size, + LINE6_FALLBACK_MAXPACKETSIZE); done = snd_rawmidi_transmit_peek(substream, chunk, req);
if (done == 0)
From: Christian Brauner brauner@kernel.org
commit 11933cf1d91d57da9e5c53822a540bbdc2656c16 upstream.
The propagate_mnt() function handles mount propagation when creating mounts and propagates the source mount tree @source_mnt to all applicable nodes of the destination propagation mount tree headed by @dest_mnt.
Unfortunately it contains a bug where it fails to terminate at peers of @source_mnt when looking up copies of the source mount that become masters for copies of the source mount tree mounted on top of slaves in the destination propagation tree causing a NULL dereference.
Once the mechanics of the bug are understood it's easy to trigger. Because of unprivileged user namespaces it is available to unprivileged users.
While fixing this bug we've gotten confused multiple times due to unclear terminology or missing concepts. So let's start this with some clarifications:
* The terms "master" or "peer" denote a shared mount. A shared mount belongs to a peer group.
* A peer group is a set of shared mounts that propagate to each other. They are identified by a peer group id. The peer group id is available in @shared_mnt->mnt_group_id. Shared mounts within the same peer group have the same peer group id. The peers in a peer group can be reached via @shared_mnt->mnt_share.
* The terms "slave mount" or "dependent mount" denote a mount that receives propagation from a peer in a peer group. IOW, shared mounts may have slave mounts and slave mounts have shared mounts as their master. Slave mounts of a given peer in a peer group are listed on that peers slave list available at @shared_mnt->mnt_slave_list.
* The term "master mount" denotes a mount in a peer group. IOW, it denotes a shared mount or a peer mount in a peer group. The term "master mount" - or "master" for short - is mostly used when talking in the context of slave mounts that receive propagation from a master mount. A master mount of a slave identifies the closest peer group a slave mount receives propagation from. The master mount of a slave can be identified via @slave_mount->mnt_master. Different slaves may point to different masters in the same peer group.
* Multiple peers in a peer group can have non-empty ->mnt_slave_lists. Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to ensure all slave mounts of a peer group are visited the ->mnt_slave_lists of all peers in a peer group have to be walked.
* Slave mounts point to a peer in the closest peer group they receive propagation from via @slave_mnt->mnt_master (see above). Together with these peers they form a propagation group (see below). The closest peer group can thus be identified through the peer group id @slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave mount receives propagation from.
* A shared-slave mount is a slave mount to a peer group pg1 while also a peer in another peer group pg2. IOW, a peer group may receive propagation from another peer group.
If a peer group pg1 is a slave to another peer group pg2 then all peers in peer group pg1 point to the same peer in peer group pg2 via ->mnt_master. IOW, all peers in peer group pg1 appear on the same ->mnt_slave_list. IOW, they cannot be slaves to different peer groups.
* A pure slave mount is a slave mount that is a slave to a peer group but is not a peer in another peer group.
* A propagation group denotes the set of mounts consisting of a single peer group pg1 and all slave mounts and shared-slave mounts that point to a peer in that peer group via ->mnt_master. IOW, all slave mounts such that @slave_mnt->mnt_master->mnt_group_id is equal to @shared_mnt->mnt_group_id.
The concept of a propagation group makes it easier to talk about a single propagation level in a propagation tree.
For example, in propagate_mnt() the immediate peers of @dest_mnt and all slaves of @dest_mnt's peer group form a propagation group propg1. So a shared-slave mount that is a slave in propg1 and that is a peer in another peer group pg2 forms another propagation group propg2 together with all slaves that point to that shared-slave mount in their ->mnt_master.
* A propagation tree refers to all mounts that receive propagation starting from a specific shared mount.
For example, for propagate_mnt() @dest_mnt is the start of a propagation tree. The propagation tree encompasses all mounts that receive propagation from @dest_mnt's peer group down to the leaves.
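For orientation, here is a simplified sketch of the propagation-related fields referenced throughout this text, loosely modeled on struct mount in fs/mount.h (only these fields are shown, struct list_head comes from <linux/list.h>, and the real structure has many more members):

struct mount {
	struct mount *mnt_parent;        /* mount this mount is attached to */
	struct list_head mnt_share;      /* circular list of peers in the same peer group */
	struct list_head mnt_slave_list; /* slaves that receive propagation from this peer */
	struct list_head mnt_slave;      /* our entry on the master's ->mnt_slave_list */
	struct mount *mnt_master;        /* peer in the closest peer group we are slave to */
	int mnt_group_id;                /* peer group id, shared by all peers in a group */
};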
With that out of the way let's get to the actual algorithm.
We know that @dest_mnt is guaranteed to be a pure shared mount or a shared-slave mount. This is guaranteed by a check in attach_recursive_mnt(). So propagate_mnt() will first propagate the source mount tree to all peers in @dest_mnt's peer group:
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) { ret = propagate_one(n); if (ret) goto out; }
Notice that the peer propagation loop of propagate_mnt() doesn't propagate @dest_mnt itself. @dest_mnt is mounted directly in attach_recursive_mnt() after we propagated to the destination propagation tree.
The mount that will be mounted on top of @dest_mnt is @source_mnt. This copy was created earlier even before we entered attach_recursive_mnt() and doesn't concern us a lot here.
It's just important to notice that when propagate_mnt() is called @source_mnt will not yet have been mounted on top of @dest_mnt. Thus, @source_mnt->mnt_parent will either still point to @source_mnt or - in the case @source_mnt is moved and thus already attached - still to its former parent.
For each peer @m in @dest_mnt's peer group propagate_one() will create a new copy of the source mount tree and mount that copy @child on @m such that @child->mnt_parent points to @m after propagate_one() returns.
propagate_one() will stash the last destination propagation node @m in @last_dest and the last copy it created for the source mount tree in @last_source.
Hence, if we call into propagate_one() again for the next destination propagation node @m, @last_dest will point to the previous destination propagation node and @last_source will point to the previous copy of the source mount tree and mounted on @last_dest.
Each new copy of the source mount tree is created from the previous copy of the source mount tree. This will become important later.
The peer loop in propagate_mnt() is straightforward. We iterate through the peers copying and updating @last_source and @last_dest as we go through them and mount each copy of the source mount tree @child on a peer @m in @dest_mnt's peer group.
After propagate_mnt() handled the peers in @dest_mnt's peer group propagate_mnt() will propagate the source mount tree down the propagation tree that @dest_mnt's peer group propagates to:
for (m = next_group(dest_mnt, dest_mnt); m; m = next_group(m, dest_mnt)) { /* everything in that slave group */ n = m; do { ret = propagate_one(n); if (ret) goto out; n = next_peer(n); } while (n != m); }
The next_group() helper will recursively walk the destination propagation tree, descending into each propagation group of the propagation tree.
The important part is that it takes care to propagate the source mount tree to all peers in the peer group of a propagation group before it propagates to the slaves to those peers in the propagation group. IOW, it creates and mounts copies of the source mount tree that become masters before it creates and mounts copies of the source mount tree that become slaves to these masters.
It is important to remember that propagating the source mount tree to each mount @m in the destination propagation tree simply means that we create and mount new copies @child of the source mount tree on @m such that @child->mnt_parent points to @m.
Since we know that each node @m in the destination propagation tree headed by @dest_mnt's peer group will be overmounted with a copy of the source mount tree, and since we know that the propagation properties of each copy of the source mount tree we create and mount at @m will mostly mirror the propagation properties of @m, we can use that information to create and mount the copies of the source mount tree that become masters before their slaves.
The easy case is always when @m and @last_dest are peers in a peer group of a given propagation group. In that case we know that we can simply copy @last_source without having to figure out what the master for the new copy @child of the source mount tree needs to be as we've done that in a previous call to propagate_one().
The hard case is when we're dealing with a slave mount or a shared-slave mount @m in a destination propagation group that we need to create and mount a copy of the source mount tree on.
For each propagation group in the destination propagation tree we propagate the source mount tree to we want to make sure that the copies @child of the source mount tree we create and mount on slaves @m pick an earlier copy of the source mount tree that we mounted on a master @m of the destination propagation group as their master. This is a mouthful but as far as we can tell that's the core of it all.
But, if we keep track of the masters in the destination propagation tree @m we can use the information to find the correct master for each copy of the source mount tree we create and mount at the slaves in the destination propagation tree @m.
Let's walk through the base case as that's still fairly easy to grasp.
If we're dealing with the first slave in the propagation group that @dest_mnt is in then we haven't yet marked any masters in the destination propagation tree.
We know the master for the first slave to @dest_mnt's peer group is simply @dest_mnt. So we expect this algorithm to yield a copy of the source mount tree that was mounted on a peer in @dest_mnt's peer group as the master for the copy of the source mount tree we want to mount at the first slave @m:
for (n = m; ; n = p) { p = n->mnt_master; if (p == dest_master || IS_MNT_MARKED(p)) break; }
For the first slave we walk the destination propagation tree all the way up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy can be walked by walking up the @mnt->mnt_master hierarchy of the destination propagation tree @m. We will ultimately find a peer in @dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master.
Btw, here the assumption we listed at the beginning becomes important. Namely, that peers in a peer group pg1 that are slaves in another peer group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are peers in peer group pg1 point to the same peer in peer group pg2 via their ->mnt_master. Otherwise the termination condition in the code above would be wrong and next_group() would be broken too.
So the first iteration sets:
n = m; p = n->mnt_master;
such that @p now points to a peer or @dest_mnt itself. We walk up one more level since we don't have any marked mounts. So we end up with:
n = dest_mnt; p = dest_mnt->mnt_master;
If @dest_mnt's peer group is not slave to another peer group then @p is now NULL. If @dest_mnt's peer group is a slave to another peer group then @p now points to @dest_mnt->mnt_master, which is a master outside the propagation tree we're dealing with.
Now we need to figure out the master for the copy of the source mount tree we're about to create and mount on the first slave of @dest_mnt's peer group:
do { struct mount *parent = last_source->mnt_parent; if (last_source == first_source) break; done = parent->mnt_master == p; if (done && peers(n, parent)) break; last_source = last_source->mnt_master; } while (!done);
We know that @last_source->mnt_parent points to @last_dest and @last_dest is the last peer in @dest_mnt's peer group we propagated to in the peer loop in propagate_mnt().
Consequently, @last_source is the last copy we created and mounted on that last peer in @dest_mnt's peer group. So @last_source is the master we want to pick.
We know that @last_source->mnt_parent->mnt_master points to @last_dest->mnt_master. We also know that @last_dest->mnt_master is either NULL or points to a master outside of the destination propagation tree and so does @p. Hence:
done = parent->mnt_master == p;
is trivially true in the base condition.
We also know that for the first slave mount of @dest_mnt's peer group @last_dest either points to @dest_mnt itself because it was initialized to:
last_dest = dest_mnt;
at the beginning of propagate_mnt() or it will point to a peer of @dest_mnt in its peer group. In both cases it is guaranteed that on the first iteration @n and @parent are peers (please note the check for peers here as that's important):
if (done && peers(n, parent)) break;
So, as we expected, we select @last_source, which refers to the last copy of the source mount tree we mounted on the last peer in @dest_mnt's peer group, as the master of the first slave in @dest_mnt's peer group. The rest is taken care of by clone_mnt(last_source, ...). We'll skip over that part, otherwise this becomes a blogpost.
At the end of propagate_mnt() we now mark @m->mnt_master as the first master in the destination propagation tree that is distinct from @dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master.
By marking @dest_mnt or one of its peers we are able to easily find it again when we later look up masters for the copies of the source mount tree we mount on slaves @m of @dest_mnt's peer group. This, in turn, allows us to find again the master we selected for the copies of the source mount tree we mounted on a master in the destination propagation tree.
The important part is to realize that the code makes use of the fact that the last copy of the source mount tree stashed in @last_source was mounted on top of the previous destination propagation node @last_dest. What this means is that @last_source allows us to walk the destination propagation hierarchy the same way each destination propagation node @m does.
If we take @last_source, which is the copy of @source_mnt we have mounted on @last_dest in the previous iteration of propagate_one(), then we know @last_source->mnt_parent points to @last_dest but we also know that as we walk through the destination propagation tree that @last_source->mnt_master will point to an earlier copy of the source mount tree we mounted on an earlier destination propagation node @m.
IOW, @last_source->mnt_parent will be our hook into the destination propagation tree and each consecutive @last_source->mnt_master will lead us to an earlier propagation node @m via @last_source->mnt_master->mnt_parent.
Hence, by walking up @last_source->mnt_master, each of which is mounted on a node that is a master @m in the destination propagation tree we can also walk up the destination propagation hierarchy.
So, for each new destination propagation node @m we use the previous copy of @last_source and the fact it's mounted on the previous propagation node @last_dest via @last_source->mnt_master->mnt_parent to determine what the master of the new copy of @last_source needs to be.
The goal is to find the _closest_ master that the new copy of the source mount tree we are about to create and mount on a slave @m in the destination propagation tree needs to pick. IOW, we want to find a suitable master in the propagation group.
As the propagation structure of the source mount propagation tree we create mirrors the propagation structure of the destination propagation tree, we can find @m's closest master - i.e., a marked master - which is a peer in the closest peer group that @m receives propagation from. We store that closest master of @m in @p as before and record the slave to that master in @n.
We then search for this master @p via @last_source by walking up the master hierarchy starting from the last copy of the source mount tree stored in @last_source that we created and mounted on the previous destination propagation node @m.
We will try to find the master by walking @last_source->mnt_master and by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If we find @p then we can figure out what earlier copy of the source mount tree needs to be the master for the new copy of the source mount tree we're about to create and mount at the current destination propagation node @m.
If @last_source->mnt_master->mnt_parent and @n are peers then we know that the closest master they receive propagation from is @last_source->mnt_master->mnt_parent->mnt_master. If not then the closest immediate peer group that they receive propagation from must be one level higher up.
This builds on the earlier clarification at the beginning that all peers in a peer group which are slaves of other peer groups all point to the same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the closest peer group that they receive propagation from.
However, terminating the walk has corner cases.
If the closest marked master for a given destination node @m cannot be found by walking up the master hierarchy via @last_source->mnt_master then we need to terminate the walk when we encounter @source_mnt again.
This isn't an arbitrary termination. It simply means that the new copy of the source mount tree we're about to create has a copy of the source mount tree we created and mounted on a peer in @dest_mnt's peer group as its master. IOW, @source_mnt is the peer in the closest peer group that the new copy of the source mount tree receives propagation from.
We absolutely have to stop at @source_mnt because @last_source->mnt_master either points outside the propagation hierarchy we're dealing with or it is NULL because @source_mnt isn't a shared-slave.
So continuing the walk past @source_mnt would cause a NULL dereference via @last_source->mnt_master->mnt_parent. And so we have to stop the walk when we encounter @source_mnt again.
One scenario where this can happen is when we first handled a series of slaves of @dest_mnt's peer group and then encounter peers in a new peer group that is a slave to @dest_mnt's peer group. We handle them and then we encounter another slave mount to @dest_mnt that is a pure slave to @dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's peer group as its master. Consequently, the new copy of the source mount tree will need to have @source_mnt as its master. So we walk the propagation hierarchy all the way up to @source_mnt based on @last_source->mnt_master.
So terminate on @source_mnt, easy peasy. Except that the check misses something that the rest of the algorithm already handles.
If @dest_mnt has peers in its peer group the peer loop in propagate_mnt():
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) { ret = propagate_one(n); if (ret) goto out; }
will consecutively update @last_source with each previous copy of the source mount tree we created and mounted at the previous peer in @dest_mnt's peer group. So after that loop terminates @last_source will point to whatever copy of the source mount tree was created and mounted on the last peer in @dest_mnt's peer group.
Furthermore, if there is even a single additional peer in @dest_mnt's peer group then @last_source will __not__ point to @source_mnt anymore. Because, as we mentioned above, @dest_mnt isn't even handled in this loop but directly in attach_recursive_mnt(). So it can't even accidentally come last in that peer loop.
So the first time we handle a slave mount @m of @dest_mnt's peer group the copy of the source mount tree we create will make the __last copy of the source mount tree we created and mounted on the last peer in @dest_mnt's peer group the master of the new copy of the source mount tree we create and mount on the first slave of @dest_mnt's peer group__.
But this means that the termination condition that checks for @source_mnt is wrong. The @source_mnt cannot be found anymore by propagate_one(). Instead it will find the last copy of the source mount tree we created and mounted for the last peer of @dest_mnt's peer group again. And that is a peer of @source_mnt, not @source_mnt itself.
IOW, we fail to terminate the loop correctly and ultimately dereference @last_source->mnt_master->mnt_parent. When @source_mnt's peer group isn't slave to another peer group then @last_source->mnt_master is NULL causing the splat below.
For example, assume @dest_mnt is a pure shared mount and has three peers in its peer group:
=================================================================================== mount-id mount-parent-id peer-group-id =================================================================================== (@dest_mnt) mnt_master[216] 309 297 shared:216 \ (@source_mnt) mnt_master[218]: 609 609 shared:218
(1) mnt_master[216]: 607 605 shared:216 \ (P1) mnt_master[218]: 624 607 shared:218
(2) mnt_master[216]: 576 574 shared:216 \ (P2) mnt_master[218]: 625 576 shared:218
(3) mnt_master[216]: 545 543 shared:216 \ (P3) mnt_master[218]: 626 545 shared:218
After this sequence has been processed @last_source will point to (P3), the copy generated for the third peer in @dest_mnt's peer group we handled. So the copy of the source mount tree (P4) we create and mount on the first slave of @dest_mnt's peer group:
=================================================================================== mount-id mount-parent-id peer-group-id =================================================================================== mnt_master[216] 309 297 shared:216 / / (S0) mnt_slave 483 481 master:216 \ \ (P3) mnt_master[218] 626 545 shared:218 \ / / (P4) mnt_slave 627 483 master:218
will pick the last copy of the source mount tree (P3) as master, not (S0).
When walking the propagation hierarchy via @last_source's master hierarchy we encounter (P3) but not (S0), i.e., @source_mnt.
We can fix this in multiple ways:
(1) By setting @last_source to @source_mnt after we processed the peers in @dest_mnt's peer group right after the peer loop in propagate_mnt().
(2) By changing the termination condition that relies on finding exactly @source_mnt to finding a peer of @source_mnt.
(3) By only moving @last_source when we actually venture into a new peer group or some clever variant thereof.
The first two options are minimally invasive and what we want as a fix. The third option is more intrusive but something we'd like to explore in the near future.
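For reference, option (2) as applied by the fs/pnode.c diff at the end of this patch, shown in isolation on the loop quoted earlier from propagate_one(): the walk now terminates on any peer of @source_mnt, not only on @source_mnt itself, since @last_source points at the copy mounted on the last peer of @dest_mnt's peer group.

do {
	struct mount *parent = last_source->mnt_parent;
	if (peers(last_source, first_source))	/* was: last_source == first_source */
		break;
	done = parent->mnt_master == p;
	if (done && peers(n, parent))
		break;
	last_source = last_source->mnt_master;
} while (!done);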
This passes all LTP tests and specifically the mount propagation testsuite part of it. It also holds up against all known reproducers of this issue.
Final words. First, this is a clever but __worryingly__ underdocumented algorithm. There isn't a single detailed comment to be found in next_group(), propagate_one() or anywhere else in that file for that matter. This has been a giant pain to understand and work through and a bug like this is insanely difficult to fix without a detailed understanding of what's happening. Let's not talk about the amount of time that was sunk into fixing this.
Second, all the cool kids with access to unshare --mount --user --map-root --propagation=unchanged are going to have a lot of fun. IOW, triggerable by unprivileged users while namespace_lock() lock is held.
[ 115.848393] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 115.848967] #PF: supervisor read access in kernel mode [ 115.849386] #PF: error_code(0x0000) - not-present page [ 115.849803] PGD 0 P4D 0 [ 115.850012] Oops: 0000 [#1] PREEMPT SMP PTI [ 115.850354] CPU: 0 PID: 15591 Comm: mount Not tainted 6.1.0-rc7 #3 [ 115.850851] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 115.851510] RIP: 0010:propagate_one.part.0+0x7f/0x1a0 [ 115.851924] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10 49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01 00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37 02 4d [ 115.853441] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282 [ 115.853865] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00 [ 115.854458] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780 [ 115.855044] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0 [ 115.855693] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8 [ 115.856304] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000 [ 115.856859] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000) knlGS:0000000000000000 [ 115.857531] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.858006] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0 [ 115.858598] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.859393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 115.860099] Call Trace: [ 115.860358] <TASK> [ 115.860535] propagate_mnt+0x14d/0x190 [ 115.860848] attach_recursive_mnt+0x274/0x3e0 [ 115.861212] path_mount+0x8c8/0xa60 [ 115.861503] __x64_sys_mount+0xf6/0x140 [ 115.861819] do_syscall_64+0x5b/0x80 [ 115.862117] ? do_faccessat+0x123/0x250 [ 115.862435] ? syscall_exit_to_user_mode+0x17/0x40 [ 115.862826] ? do_syscall_64+0x67/0x80 [ 115.863133] ? syscall_exit_to_user_mode+0x17/0x40 [ 115.863527] ? do_syscall_64+0x67/0x80 [ 115.863835] ? do_syscall_64+0x67/0x80 [ 115.864144] ? do_syscall_64+0x67/0x80 [ 115.864452] ? 
exc_page_fault+0x70/0x170 [ 115.864775] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 115.865187] RIP: 0033:0x7f92c92b0ebe [ 115.865480] Code: 48 8b 0d 75 4f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 42 4f 0c 00 f7 d8 64 89 01 48 [ 115.866984] RSP: 002b:00007fff000aa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [ 115.867607] RAX: ffffffffffffffda RBX: 000055a77888d6b0 RCX: 00007f92c92b0ebe [ 115.868240] RDX: 000055a77888d8e0 RSI: 000055a77888e6e0 RDI: 000055a77888e620 [ 115.868823] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 [ 115.869403] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a77888e620 [ 115.869994] R13: 000055a77888d8e0 R14: 00000000ffffffff R15: 00007f92c93e4076 [ 115.870581] </TASK> [ 115.870763] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink qrtr snd_intel8x0 sunrpc snd_ac97_codec ac97_bus snd_pcm snd_timer intel_rapl_msr intel_rapl_common snd vboxguest intel_powerclamp video rapl joydev soundcore i2c_piix4 wmi fuse zram xfs vmwgfx crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic drm_ttm_helper ttm e1000 ghash_clmulni_intel serio_raw ata_generic pata_acpi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath [ 115.875288] CR2: 0000000000000010 [ 115.875641] ---[ end trace 0000000000000000 ]--- [ 115.876135] RIP: 0010:propagate_one.part.0+0x7f/0x1a0 [ 115.876551] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10 49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01 00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37 02 4d [ 115.878086] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282 [ 115.878511] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00 [ 115.879128] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780 [ 115.879715] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0 [ 115.880359] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8 [ 115.880962] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000 [ 115.881548] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000) knlGS:0000000000000000 [ 115.882234] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.882713] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0 [ 115.883314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.883966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: f2ebb3a921c1 ("smarter propagate_mnt()") Fixes: 5ec0811d3037 ("propogate_mnt: Handle the first propogated copy being a slave") Cc: stable@vger.kernel.org Reported-by: Ditang Chen ditang.c@gmail.com Signed-off-by: Seth Forshee (Digital Ocean) sforshee@kernel.org Signed-off-by: Christian Brauner (Microsoft) brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- If there are no big objections I'll get this to Linus rather sooner than later. --- fs/pnode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/pnode.c +++ b/fs/pnode.c @@ -244,7 +244,7 @@ static int propagate_one(struct mount *m } do { struct mount *parent = last_source->mnt_parent; - if (last_source == first_source) + if (peers(last_source, first_source)) break; done = parent->mnt_master == p; if (done && peers(n, parent))
From: ChiYuan Huang cy_huang@richtek.com
commit 5f4f94e9f26cca6514474b307b59348b8485e711 upstream.
Fix the potential risk of OOB read if bank index is over the maximum.
Refer to the discussion thread for the experiment result on mt6370: https://lore.kernel.org/all/20220914013345.GA5802@cyhuang-hp-elitebook-840-g... If the bound is not checked, the same issue exists on mt6360.
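The guard added by the diff below, shown in isolation: the bank index is validated against MT6360_SLAVE_MAX before it is used to index ddata->i2c[].

	if (bank >= MT6360_SLAVE_MAX)
		return -EINVAL;

	i2c = ddata->i2c[bank];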
Cc: stable@vger.kernel.org Fixes: 3b0850440a06c (mfd: mt6360: Merge different sub-devices I2C read/write) Signed-off-by: ChiYuan Huang cy_huang@richtek.com Signed-off-by: Lee Jones lee@kernel.org Link: https://lore.kernel.org/r/1664416817-31590-1-git-send-email-u0084500@gmail.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mfd/mt6360-core.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
--- a/drivers/mfd/mt6360-core.c +++ b/drivers/mfd/mt6360-core.c @@ -402,7 +402,7 @@ static int mt6360_regmap_read(void *cont struct mt6360_ddata *ddata = context; u8 bank = *(u8 *)reg; u8 reg_addr = *(u8 *)(reg + 1); - struct i2c_client *i2c = ddata->i2c[bank]; + struct i2c_client *i2c; bool crc_needed = false; u8 *buf; int buf_len = MT6360_ALLOC_READ_SIZE(val_size); @@ -410,6 +410,11 @@ static int mt6360_regmap_read(void *cont u8 crc; int ret;
+ if (bank >= MT6360_SLAVE_MAX) + return -EINVAL; + + i2c = ddata->i2c[bank]; + if (bank == MT6360_SLAVE_PMIC || bank == MT6360_SLAVE_LDO) { crc_needed = true; ret = mt6360_xlate_pmicldo_addr(®_addr, val_size); @@ -453,13 +458,18 @@ static int mt6360_regmap_write(void *con struct mt6360_ddata *ddata = context; u8 bank = *(u8 *)val; u8 reg_addr = *(u8 *)(val + 1); - struct i2c_client *i2c = ddata->i2c[bank]; + struct i2c_client *i2c; bool crc_needed = false; u8 *buf; int buf_len = MT6360_ALLOC_WRITE_SIZE(val_size); int write_size = val_size - MT6360_REGMAP_REG_BYTE_SIZE; int ret;
+ if (bank >= MT6360_SLAVE_MAX) + return -EINVAL; + + i2c = ddata->i2c[bank]; + if (bank == MT6360_SLAVE_PMIC || bank == MT6360_SLAVE_LDO) { crc_needed = true; ret = mt6360_xlate_pmicldo_addr(®_addr, val_size - MT6360_REGMAP_REG_BYTE_SIZE);
From: Mikulas Patocka mpatocka@redhat.com
commit 341097ee53573e06ab9fc675d96a052385b851fa upstream.
There's a crash in mempool_free when running the lvm test shell/lvchange-rebuild-raid.sh.
The reason for the crash is this:
* super_written calls atomic_dec_and_test(&mddev->pending_writes) and wake_up(&mddev->sb_wait). Then it calls rdev_dec_pending(rdev, mddev) and bio_put(bio).
* so, the process that waited on sb_wait and that is woken up is racing with bio_put(bio).
* if the process wins the race, it calls bioset_exit before bio_put(bio) is executed.
* bio_put(bio) attempts to free a bio into a destroyed bio set - causing a crash in mempool_free.
We fix this bug by moving bio_put before atomic_dec_and_test.
We also move rdev_dec_pending before atomic_dec_and_test as suggested by Neil Brown.
The function md_end_flush has a similar bug - we must call bio_put before we decrement the number of in-progress bios.
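A condensed view of super_written() with the fix applied (error handling elided; the real change is in the diff below), showing the ordering the patch establishes. md_end_flush() is treated the same way.

static void super_written(struct bio *bio)
{
	struct md_rdev *rdev = bio->bi_private;
	struct mddev *mddev = rdev->mddev;

	/* error handling elided */

	bio_put(bio);                   /* free the bio while its bioset is still alive */
	rdev_dec_pending(rdev, mddev);

	if (atomic_dec_and_test(&mddev->pending_writes))
		wake_up(&mddev->sb_wait);  /* only now may the waiter go on to bioset_exit() */
}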
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 11557f0067 P4D 11557f0067 PUD 0 Oops: 0002 [#1] PREEMPT SMP CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: kdelayd flush_expired_bios [dm_delay] RIP: 0010:mempool_free+0x47/0x80 Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00 RSP: 0018:ffff88910036bda8 EFLAGS: 00010093 RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8 RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900 R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000 R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05 FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0 Call Trace: <TASK> clone_endio+0xf4/0x1c0 [dm_mod] clone_endio+0xf4/0x1c0 [dm_mod] __submit_bio+0x76/0x120 submit_bio_noacct_nocheck+0xb6/0x2a0 flush_expired_bios+0x28/0x2f [dm_delay] process_one_work+0x1b4/0x300 worker_thread+0x45/0x3e0 ? rescuer_thread+0x380/0x380 kthread+0xc2/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd] CR2: 0000000000000000 ---[ end trace 0000000000000000 ]---
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/md.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -526,13 +526,14 @@ static void md_end_flush(struct bio *bio struct md_rdev *rdev = bio->bi_private; struct mddev *mddev = rdev->mddev;
+ bio_put(bio); + rdev_dec_pending(rdev, mddev);
if (atomic_dec_and_test(&mddev->flush_pending)) { /* The pre-request flush has finished */ queue_work(md_wq, &mddev->flush_work); } - bio_put(bio); }
static void md_submit_flush_data(struct work_struct *ws); @@ -935,10 +936,12 @@ static void super_written(struct bio *bi } else clear_bit(LastDev, &rdev->flags);
+ bio_put(bio); + + rdev_dec_pending(rdev, mddev); + if (atomic_dec_and_test(&mddev->pending_writes)) wake_up(&mddev->sb_wait); - rdev_dec_pending(rdev, mddev); - bio_put(bio); }
void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
From: NARIBAYASHI Akira a.naribayashi@fujitsu.com
commit be21b32afe470c5ae98e27e49201158a47032942 upstream.
Depending on the memory configuration, isolate_freepages_block() may scan pages out of the target range and cause a panic.
Panic can occur on systems with multiple zones in a single pageblock.
The reason it is rare is that it only happens in special configurations. Depending on how many similar systems there are, it may be a good idea to fix this problem for older kernels as well.
The problem is that pfn as argument of fast_isolate_around() could be out of the target range. Therefore we should consider the case where pfn < start_pfn, and also the case where end_pfn < pfn.
This problem should have been addressed by commit 6e2b7044c199 ("mm, compaction: make fast_isolate_freepages() stay within zone") but there was an oversight.
Case1: pfn < start_pfn
<at memory compaction for node Y> | node X's zone | node Y's zone +-----------------+------------------------------... pageblock ^ ^ ^ +-----------+-----------+-----------+-----------+... ^ ^ ^ ^ ^ end_pfn ^ start_pfn = cc->zone->zone_start_pfn pfn <---------> scanned range by "Scan After"
Case2: end_pfn < pfn
<at memory compaction for node X> | node X's zone | node Y's zone +-----------------+------------------------------... pageblock ^ ^ ^ +-----------+-----------+-----------+-----------+... ^ ^ ^ ^ ^ pfn ^ end_pfn start_pfn <---------> scanned range by "Scan Before"
It seems that there is no good reason to skip nr_isolated pages just after the given pfn. So let's perform a simple scan from start to end instead of dividing the scan into "Before" and "After".
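A hypothetical, simplified illustration of the resulting behaviour (names are illustrative, not the kernel code): the pageblock around pfn is clamped to the zone boundaries and scanned once, rather than in "before"/"after" halves.

static void scan_block_once(unsigned long pfn,
			    unsigned long zone_start_pfn, unsigned long zone_end_pfn,
			    unsigned long pageblock_nr_pages)
{
	unsigned long start = pfn & ~(pageblock_nr_pages - 1);	/* pageblock start */
	unsigned long end = start + pageblock_nr_pages;		/* pageblock end */

	if (start < zone_start_pfn)
		start = zone_start_pfn;
	if (end > zone_end_pfn)
		end = zone_end_pfn;

	/* single scan over [start, end), corresponding to one
	 * isolate_freepages_block(cc, &start, end, ...) call */
}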
Link: https://lkml.kernel.org/r/20221026112438.236336-1-a.naribayashi@fujitsu.com Fixes: 6e2b7044c199 ("mm, compaction: make fast_isolate_freepages() stay within zone"). Signed-off-by: NARIBAYASHI Akira a.naribayashi@fujitsu.com Cc: David Rientjes rientjes@google.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/compaction.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-)
--- a/mm/compaction.c +++ b/mm/compaction.c @@ -1350,7 +1350,7 @@ move_freelist_tail(struct list_head *fre }
static void -fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated) +fast_isolate_around(struct compact_control *cc, unsigned long pfn) { unsigned long start_pfn, end_pfn; struct page *page; @@ -1371,21 +1371,13 @@ fast_isolate_around(struct compact_contr if (!page) return;
- /* Scan before */ - if (start_pfn != pfn) { - isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false); - if (cc->nr_freepages >= cc->nr_migratepages) - return; - } - - /* Scan after */ - start_pfn = pfn + nr_isolated; - if (start_pfn < end_pfn) - isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false); + isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */ if (cc->nr_freepages < cc->nr_migratepages) set_pageblock_skip(page); + + return; }
/* Search orders in round-robin fashion */ @@ -1561,7 +1553,7 @@ fast_isolate_freepages(struct compact_co return cc->free_pfn;
low_pfn = page_to_pfn(page); - fast_isolate_around(cc, low_pfn, nr_isolated); + fast_isolate_around(cc, low_pfn); return low_pfn; }
From: Pavel Machek pavel@denx.de
commit c3db3c2fd9992c08f49aa93752d3c103c3a4f6aa upstream.
The commit being fixed introduced another bug: it returns from is_alive() on the error path without releasing the node page, so add the missing f2fs_put_page() call.
Cc: stable@vger.kernel.org Fixes: c6ad7fd16657e ("f2fs: fix to do sanity check on summary info") Signed-off-by: Pavel Machek pavel@denx.de Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/gc.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -1033,6 +1033,7 @@ static bool is_alive(struct f2fs_sb_info if (ofs_in_node >= max_addrs) { f2fs_err(sbi, "Inconsistent ofs_in_node:%u in summary, ino:%u, nid:%u, max:%u", ofs_in_node, dni->ino, dni->nid, max_addrs); + f2fs_put_page(node_page, 1); return false; }
From: Jaegeuk Kim jaegeuk@kernel.org
commit e6ecb142429183cef4835f31d4134050ae660032 upstream.
If block address is still alive, we should give a valid node block even after shutdown. Otherwise, we can see zero data when reading out a file.
Cc: stable@vger.kernel.org Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully") Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/node.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1357,8 +1357,7 @@ static int read_node_page(struct page *p return err;
/* NEW_ADDR can be seen, after cp_error drops some dirty node pages */ - if (unlikely(ni.blk_addr == NULL_ADDR || ni.blk_addr == NEW_ADDR) || - is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN)) { + if (unlikely(ni.blk_addr == NULL_ADDR || ni.blk_addr == NEW_ADDR)) { ClearPageUptodate(page); return -ENOENT; }
From: Deren Wu deren.wu@mediatek.com
commit 4a44cd249604e29e7b90ae796d7692f5773dd348 upstream.
vub300_enable_sdio_irq() works with a mutex and needs TASK_RUNNING here. Ensure that we mark current as TASK_RUNNING for the sleepable context.
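The pattern applied by the diff below, in isolation: sdio_irq_thread() leaves the task in TASK_INTERRUPTIBLE, so the task is marked runnable before taking the sleeping mutex and the state is restored afterwards.

	set_current_state(TASK_RUNNING);
	mutex_lock(&vub300->irq_mutex);
	/* handle queued IRQs / queue poll work as in the driver */
	mutex_unlock(&vub300->irq_mutex);
	set_current_state(TASK_INTERRUPTIBLE);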
[ 77.554641] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff92a72c1d>] sdio_irq_thread+0x17d/0x5b0 [ 77.554652] WARNING: CPU: 2 PID: 1983 at kernel/sched/core.c:9813 __might_sleep+0x116/0x160 [ 77.554905] CPU: 2 PID: 1983 Comm: ksdioirqd/mmc1 Tainted: G OE 6.1.0-rc5 #1 [ 77.554910] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0081.2020.0504.1834 05/04/2020 [ 77.554912] RIP: 0010:__might_sleep+0x116/0x160 [ 77.554920] RSP: 0018:ffff888107b7fdb8 EFLAGS: 00010282 [ 77.554923] RAX: 0000000000000000 RBX: ffff888118c1b740 RCX: 0000000000000000 [ 77.554926] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffed1020f6ffa9 [ 77.554928] RBP: ffff888107b7fde0 R08: 0000000000000001 R09: ffffed1043ea60ba [ 77.554930] R10: ffff88821f5305cb R11: ffffed1043ea60b9 R12: ffffffff93aa3a60 [ 77.554932] R13: 000000000000011b R14: 7fffffffffffffff R15: ffffffffc0558660 [ 77.554934] FS: 0000000000000000(0000) GS:ffff88821f500000(0000) knlGS:0000000000000000 [ 77.554937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 77.554939] CR2: 00007f8a44010d68 CR3: 000000024421a003 CR4: 00000000003706e0 [ 77.554942] Call Trace: [ 77.554944] <TASK> [ 77.554952] mutex_lock+0x78/0xf0 [ 77.554973] vub300_enable_sdio_irq+0x103/0x3c0 [vub300] [ 77.554981] sdio_irq_thread+0x25c/0x5b0 [ 77.555006] kthread+0x2b8/0x370 [ 77.555017] ret_from_fork+0x1f/0x30 [ 77.555023] </TASK> [ 77.555025] ---[ end trace 0000000000000000 ]---
Fixes: 88095e7b473a ("mmc: Add new VUB300 USB-to-SD/SDIO/MMC driver") Signed-off-by: Deren Wu deren.wu@mediatek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87dc45b122d26d63c80532976813c9365d7160b3.167014088... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/vub300.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mmc/host/vub300.c +++ b/drivers/mmc/host/vub300.c @@ -2049,6 +2049,7 @@ static void vub300_enable_sdio_irq(struc return; kref_get(&vub300->kref); if (enable) { + set_current_state(TASK_RUNNING); mutex_lock(&vub300->irq_mutex); if (vub300->irqs_queued) { vub300->irqs_queued -= 1; @@ -2064,6 +2065,7 @@ static void vub300_enable_sdio_irq(struc vub300_queue_poll_work(vub300, 0); } mutex_unlock(&vub300->irq_mutex); + set_current_state(TASK_INTERRUPTIBLE); } else { vub300->irq_enabled = 0; }
From: Hanjun Guo guohanjun@huawei.com
commit 8740a12ca2e2959531ad253bac99ada338b33d80 upstream.
The start and length of the event log area are obtained from the TPM2 or TCPA table, so we call acpi_get_table() to get the ACPI information. acpi_get_table() should be coupled with acpi_put_table() to release the ACPI memory, so add the acpi_put_table() calls properly to fix the memory leak.
While we are at it, remove the redundant empty line at the end of the tpm_read_log_acpi().
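Sketch of the general pattern the fix enforces, using the TPM2 branch (the TCPA branch is analogous; the real changes are in the diff below): every successful acpi_get_table() is paired with acpi_put_table() on both the error and the success paths.

	struct acpi_table_tpm2 *tbl;
	acpi_status status;

	status = acpi_get_table(ACPI_SIG_TPM2, 1, (struct acpi_table_header **)&tbl);
	if (ACPI_FAILURE(status))
		return -ENODEV;

	if (tbl->header.length < sizeof(*tbl) + sizeof(struct acpi_tpm2_phy)) {
		acpi_put_table((struct acpi_table_header *)tbl);	/* release on the error path */
		return -ENODEV;
	}

	/* copy out the start/length of the log area ... */
	acpi_put_table((struct acpi_table_header *)tbl);		/* release after use, too */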
Fixes: 0bfb23746052 ("tpm: Move eventlog files to a subdirectory") Fixes: 85467f63a05c ("tpm: Add support for event log pointer found in TPM2 ACPI table") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/eventlog/acpi.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/char/tpm/eventlog/acpi.c +++ b/drivers/char/tpm/eventlog/acpi.c @@ -90,16 +90,21 @@ int tpm_read_log_acpi(struct tpm_chip *c return -ENODEV;
if (tbl->header.length < - sizeof(*tbl) + sizeof(struct acpi_tpm2_phy)) + sizeof(*tbl) + sizeof(struct acpi_tpm2_phy)) { + acpi_put_table((struct acpi_table_header *)tbl); return -ENODEV; + }
tpm2_phy = (void *)tbl + sizeof(*tbl); len = tpm2_phy->log_area_minimum_length;
start = tpm2_phy->log_area_start_address; - if (!start || !len) + if (!start || !len) { + acpi_put_table((struct acpi_table_header *)tbl); return -ENODEV; + }
+ acpi_put_table((struct acpi_table_header *)tbl); format = EFI_TCG2_EVENT_LOG_FORMAT_TCG_2; } else { /* Find TCPA entry in RSDT (ACPI_LOGICAL_ADDRESSING) */ @@ -120,8 +125,10 @@ int tpm_read_log_acpi(struct tpm_chip *c break; }
+ acpi_put_table((struct acpi_table_header *)buff); format = EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2; } + if (!len) { dev_warn(&chip->dev, "%s: TCPA log area empty\n", __func__); return -EIO; @@ -156,5 +163,4 @@ err: kfree(log->bios_event_log); log->bios_event_log = NULL; return ret; - }
From: Hanjun Guo guohanjun@huawei.com
commit 37e90c374dd11cf4919c51e847c6d6ced0abc555 upstream.
In crb_acpi_add(), we get the TPM2 table to retrieve information like the start method, and then assign it to the priv data, so the TPM2 table is not used after the init and should be freed. Call acpi_put_table() to fix the memory leak.
Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_crb.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-)
--- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -676,12 +676,16 @@ static int crb_acpi_add(struct acpi_devi
/* Should the FIFO driver handle this? */ sm = buf->start_method; - if (sm == ACPI_TPM2_MEMORY_MAPPED) - return -ENODEV; + if (sm == ACPI_TPM2_MEMORY_MAPPED) { + rc = -ENODEV; + goto out; + }
priv = devm_kzalloc(dev, sizeof(struct crb_priv), GFP_KERNEL); - if (!priv) - return -ENOMEM; + if (!priv) { + rc = -ENOMEM; + goto out; + }
if (sm == ACPI_TPM2_COMMAND_BUFFER_WITH_ARM_SMC) { if (buf->header.length < (sizeof(*buf) + sizeof(*crb_smc))) { @@ -689,7 +693,8 @@ static int crb_acpi_add(struct acpi_devi FW_BUG "TPM2 ACPI table has wrong size %u for start method type %d\n", buf->header.length, ACPI_TPM2_COMMAND_BUFFER_WITH_ARM_SMC); - return -EINVAL; + rc = -EINVAL; + goto out; } crb_smc = ACPI_ADD_PTR(struct tpm2_crb_smc, buf, sizeof(*buf)); priv->smc_func_id = crb_smc->smc_func_id; @@ -700,17 +705,23 @@ static int crb_acpi_add(struct acpi_devi
rc = crb_map_io(device, priv, buf); if (rc) - return rc; + goto out;
chip = tpmm_chip_alloc(dev, &tpm_crb); - if (IS_ERR(chip)) - return PTR_ERR(chip); + if (IS_ERR(chip)) { + rc = PTR_ERR(chip); + goto out; + }
dev_set_drvdata(&chip->dev, priv); chip->acpi_dev_handle = device->handle; chip->flags = TPM_CHIP_FLAG_TPM2;
- return tpm_chip_register(chip); + rc = tpm_chip_register(chip); + +out: + acpi_put_table((struct acpi_table_header *)buf); + return rc; }
static int crb_acpi_remove(struct acpi_device *device)
From: Hanjun Guo guohanjun@huawei.com
commit db9622f762104459ff87ecdf885cc42c18053fd9 upstream.
In check_acpi_tpm2(), we get the TPM2 table just to make sure the table is there; it is not used after the init, so acpi_put_table() should be added to release the ACPI memory.
Fixes: 4cb586a188d4 ("tpm_tis: Consolidate the platform and acpi probe flow") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo guohanjun@huawei.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -125,6 +125,7 @@ static int check_acpi_tpm2(struct device const struct acpi_device_id *aid = acpi_match_device(tpm_acpi_tbl, dev); struct acpi_table_tpm2 *tbl; acpi_status st; + int ret = 0;
if (!aid || aid->driver_data != DEVICE_IS_TPM2) return 0; @@ -132,8 +133,7 @@ static int check_acpi_tpm2(struct device /* If the ACPI TPM2 signature is matched then a global ACPI_SIG_TPM2 * table is mandatory */ - st = - acpi_get_table(ACPI_SIG_TPM2, 1, (struct acpi_table_header **)&tbl); + st = acpi_get_table(ACPI_SIG_TPM2, 1, (struct acpi_table_header **)&tbl); if (ACPI_FAILURE(st) || tbl->header.length < sizeof(*tbl)) { dev_err(dev, FW_BUG "failed to get TPM2 ACPI table\n"); return -EINVAL; @@ -141,9 +141,10 @@ static int check_acpi_tpm2(struct device
/* The tpm2_crb driver handles this device */ if (tbl->start_method != ACPI_TPM2_MEMORY_MAPPED) - return -ENODEV; + ret = -ENODEV;
- return 0; + acpi_put_table((struct acpi_table_header *)tbl); + return ret; } #else static int check_acpi_tpm2(struct device *dev)
From: Chuck Lever chuck.lever@oracle.com
commit da522b5fe1a5f8b7c20a0023e87b52a150e53bf5 upstream.
Fixes: 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/auth_gss/svcauth_gss.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1162,18 +1162,23 @@ static int gss_read_proxy_verf(struct sv return res;
inlen = svc_getnl(argv); - if (inlen > (argv->iov_len + rqstp->rq_arg.page_len)) + if (inlen > (argv->iov_len + rqstp->rq_arg.page_len)) { + kfree(in_handle->data); return SVC_DENIED; + }
pages = DIV_ROUND_UP(inlen, PAGE_SIZE); in_token->pages = kcalloc(pages, sizeof(struct page *), GFP_KERNEL); - if (!in_token->pages) + if (!in_token->pages) { + kfree(in_handle->data); return SVC_DENIED; + } in_token->page_base = 0; in_token->page_len = inlen; for (i = 0; i < pages; i++) { in_token->pages[i] = alloc_page(GFP_KERNEL); if (!in_token->pages[i]) { + kfree(in_handle->data); gss_free_in_token_pages(in_token); return SVC_DENIED; }
From: Marco Elver elver@google.com
commit 7c201739beef1a586d806463f1465429cdce34c5 upstream.
With Clang version 16+, -fsanitize=thread will turn memcpy/memset/memmove calls in instrumented functions into __tsan_memcpy/__tsan_memset/__tsan_memmove calls respectively.
Add these functions to the core KCSAN runtime, so that we (a) catch data races with mem* functions, and (b) won't run into linker errors with such newer compilers.
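A conceptual illustration (userspace C, not kernel code) of why the runtime must now provide these symbols: with Clang 16+ and -fsanitize=thread, a plain memcpy() in an instrumented function is emitted as a call to __tsan_memcpy(), so the kernel link fails unless that symbol exists.

#include <string.h>

struct flags_blob { unsigned long bits[4]; };

void copy_flags(struct flags_blob *dst, const struct flags_blob *src)
{
	/* With -fsanitize=thread (Clang 16+) this call is compiled, conceptually,
	 * as __tsan_memcpy(dst, src, sizeof(*dst)), which the KCSAN runtime must define.
	 */
	memcpy(dst, src, sizeof(*dst));
}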
Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Marco Elver elver@google.com Signed-off-by: Paul E. McKenney paulmck@kernel.org [ elver@google.com: adjust check_access() call for v5.15 and earlier. ] Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kcsan/core.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
--- a/kernel/kcsan/core.c +++ b/kernel/kcsan/core.c @@ -14,10 +14,12 @@ #include <linux/init.h> #include <linux/kernel.h> #include <linux/list.h> +#include <linux/minmax.h> #include <linux/moduleparam.h> #include <linux/percpu.h> #include <linux/preempt.h> #include <linux/sched.h> +#include <linux/string.h> #include <linux/uaccess.h>
#include "encoding.h" @@ -1060,3 +1062,51 @@ EXPORT_SYMBOL(__tsan_atomic_thread_fence void __tsan_atomic_signal_fence(int memorder); void __tsan_atomic_signal_fence(int memorder) { } EXPORT_SYMBOL(__tsan_atomic_signal_fence); + +#ifdef __HAVE_ARCH_MEMSET +void *__tsan_memset(void *s, int c, size_t count); +noinline void *__tsan_memset(void *s, int c, size_t count) +{ + /* + * Instead of not setting up watchpoints where accessed size is greater + * than MAX_ENCODABLE_SIZE, truncate checked size to MAX_ENCODABLE_SIZE. + */ + size_t check_len = min_t(size_t, count, MAX_ENCODABLE_SIZE); + + check_access(s, check_len, KCSAN_ACCESS_WRITE); + return memset(s, c, count); +} +#else +void *__tsan_memset(void *s, int c, size_t count) __alias(memset); +#endif +EXPORT_SYMBOL(__tsan_memset); + +#ifdef __HAVE_ARCH_MEMMOVE +void *__tsan_memmove(void *dst, const void *src, size_t len); +noinline void *__tsan_memmove(void *dst, const void *src, size_t len) +{ + size_t check_len = min_t(size_t, len, MAX_ENCODABLE_SIZE); + + check_access(dst, check_len, KCSAN_ACCESS_WRITE); + check_access(src, check_len, 0); + return memmove(dst, src, len); +} +#else +void *__tsan_memmove(void *dst, const void *src, size_t len) __alias(memmove); +#endif +EXPORT_SYMBOL(__tsan_memmove); + +#ifdef __HAVE_ARCH_MEMCPY +void *__tsan_memcpy(void *dst, const void *src, size_t len); +noinline void *__tsan_memcpy(void *dst, const void *src, size_t len) +{ + size_t check_len = min_t(size_t, len, MAX_ENCODABLE_SIZE); + + check_access(dst, check_len, KCSAN_ACCESS_WRITE); + check_access(src, check_len, 0); + return memcpy(dst, src, len); +} +#else +void *__tsan_memcpy(void *dst, const void *src, size_t len) __alias(memcpy); +#endif +EXPORT_SYMBOL(__tsan_memcpy);
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
commit 636110411ca726f19ef8e87b0be51bb9a4cdef06 upstream.
Overloading the tx_mask with a linear value is asking for trouble and only works because the codec_dai hw_params() is called before the cpu_dai hw_params().
Move to the more generic set_stream() API to pass the hdac_stream information.
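Illustrated with the call sites changed in the diff below (skl_link_hw_params() and hda_link_hw_params()): instead of smuggling the stream tag through the TDM slot masks, the hdac_stream pointer is handed to the codec DAI directly, keyed by direction.

	/* old: stream tag overloaded onto the TDM tx/rx masks */
	if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK)
		snd_soc_dai_set_tdm_slot(codec_dai, stream_tag, 0, 0, 0);
	else
		snd_soc_dai_set_tdm_slot(codec_dai, 0, stream_tag, 0, 0);

	/* new: hand over the hdac_stream itself */
	snd_soc_dai_set_stream(codec_dai, hdac_stream(link_dev), substream->stream);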
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@intel.com Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Link: https://lore.kernel.org/r/20211224021034.26635-6-yung-chuan.liao@linux.intel... Signed-off-by: Mark Brown broonie@kernel.org Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/codecs/hdac_hda.c | 22 +++++++++++----------- sound/soc/intel/skylake/skl-pcm.c | 7 ++----- sound/soc/sof/intel/hda-dai.c | 7 ++----- 3 files changed, 15 insertions(+), 21 deletions(-)
--- a/sound/soc/codecs/hdac_hda.c +++ b/sound/soc/codecs/hdac_hda.c @@ -46,9 +46,8 @@ static int hdac_hda_dai_hw_params(struct struct snd_soc_dai *dai); static int hdac_hda_dai_hw_free(struct snd_pcm_substream *substream, struct snd_soc_dai *dai); -static int hdac_hda_dai_set_tdm_slot(struct snd_soc_dai *dai, - unsigned int tx_mask, unsigned int rx_mask, - int slots, int slot_width); +static int hdac_hda_dai_set_stream(struct snd_soc_dai *dai, void *stream, + int direction); static struct hda_pcm *snd_soc_find_pcm_from_dai(struct hdac_hda_priv *hda_pvt, struct snd_soc_dai *dai);
@@ -58,7 +57,7 @@ static const struct snd_soc_dai_ops hdac .prepare = hdac_hda_dai_prepare, .hw_params = hdac_hda_dai_hw_params, .hw_free = hdac_hda_dai_hw_free, - .set_tdm_slot = hdac_hda_dai_set_tdm_slot, + .set_stream = hdac_hda_dai_set_stream, };
static struct snd_soc_dai_driver hdac_hda_dais[] = { @@ -180,21 +179,22 @@ static struct snd_soc_dai_driver hdac_hd
};
-static int hdac_hda_dai_set_tdm_slot(struct snd_soc_dai *dai, - unsigned int tx_mask, unsigned int rx_mask, - int slots, int slot_width) +static int hdac_hda_dai_set_stream(struct snd_soc_dai *dai, + void *stream, int direction) { struct snd_soc_component *component = dai->component; struct hdac_hda_priv *hda_pvt; struct hdac_hda_pcm *pcm; + struct hdac_stream *hstream; + + if (!stream) + return -EINVAL;
hda_pvt = snd_soc_component_get_drvdata(component); pcm = &hda_pvt->pcm[dai->id]; + hstream = (struct hdac_stream *)stream;
- if (tx_mask) - pcm->stream_tag[SNDRV_PCM_STREAM_PLAYBACK] = tx_mask; - else - pcm->stream_tag[SNDRV_PCM_STREAM_CAPTURE] = rx_mask; + pcm->stream_tag[direction] = hstream->stream_tag;
return 0; } --- a/sound/soc/intel/skylake/skl-pcm.c +++ b/sound/soc/intel/skylake/skl-pcm.c @@ -562,11 +562,8 @@ static int skl_link_hw_params(struct snd
stream_tag = hdac_stream(link_dev)->stream_tag;
- /* set the stream tag in the codec dai dma params */ - if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) - snd_soc_dai_set_tdm_slot(codec_dai, stream_tag, 0, 0, 0); - else - snd_soc_dai_set_tdm_slot(codec_dai, 0, stream_tag, 0, 0); + /* set the hdac_stream in the codec dai */ + snd_soc_dai_set_stream(codec_dai, hdac_stream(link_dev), substream->stream);
p_params.s_fmt = snd_pcm_format_width(params_format(params)); p_params.ch = params_channels(params); --- a/sound/soc/sof/intel/hda-dai.c +++ b/sound/soc/sof/intel/hda-dai.c @@ -236,11 +236,8 @@ static int hda_link_hw_params(struct snd if (!link) return -EINVAL;
- /* set the stream tag in the codec dai dma params */ - if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) - snd_soc_dai_set_tdm_slot(codec_dai, stream_tag, 0, 0, 0); - else - snd_soc_dai_set_tdm_slot(codec_dai, 0, stream_tag, 0, 0); + /* set the hdac_stream in the codec dai */ + snd_soc_dai_set_stream(codec_dai, hdac_stream(link_dev), substream->stream);
p_params.s_fmt = snd_pcm_format_width(params_format(params)); p_params.ch = params_channels(params);
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
commit e8444560b4d9302a511f0996f4cfdf85b628f4ca upstream.
The HDAudio ASoC support relies on the set_tdm_slots() helper to store the HDaudio stream tag in the tx_mask. This only works because of the pre-existing order in soc-pcm.c, where the hw_params() is handled for codec_dais *before* cpu_dais. When the order is reversed, the stream_tag is used as a mask in the codec fixup functions:
/* fixup params based on TDM slot masks */ if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK && codec_dai->tx_mask) soc_pcm_codec_params_fixup(&codec_params, codec_dai->tx_mask);
As a result of this confusion, codec_params_fixup() ends up generating bad channel masks, depending on which stream_tag was allocated.
We could add a flag to state that the tx_mask is really not a mask, but it would be quite ugly to persist in overloading concepts.
Instead, this patch suggests a more generic get/set 'stream' API based on the existing model for SoundWire. We can expand the concept to store 'stream' opaque information that is specific to different DAI types. In the case of HDAudio DAIs, we only need to store a stream tag as an unsigned char pointer. The TDM rx_ and tx_masks should really only be used to store masks.
Rename get_sdw_stream/set_sdw_stream callbacks and helpers as get_stream/set_stream. No functionality change beyond the rename.
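On the read side only the helper name changes; DAIs without a callback still report ERR_PTR(-ENOTSUPP), so call sites keep the same error handling. A condensed sketch of the pattern used in the machine-driver hunks below (declarations added, error path trimmed to the essentials):

struct snd_soc_dai *dai;
struct sdw_stream_runtime *sdw_stream;

/* find the stream from the first CPU DAI and bail out if it is missing */
dai = asoc_rtd_to_cpu(rtd, 0);
sdw_stream = snd_soc_dai_get_stream(dai, substream->stream);
if (IS_ERR(sdw_stream)) {
	dev_err(rtd->dev, "no stream found for DAI %s\n", dai->name);
	return PTR_ERR(sdw_stream);
}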
Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Rander Wang rander.wang@intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@intel.com Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Acked-By: Vinod Koul vkoul@kernel.org Link: https://lore.kernel.org/r/20211224021034.26635-5-yung-chuan.liao@linux.intel... Signed-off-by: Mark Brown broonie@kernel.org Cc: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soundwire/intel.c | 8 ++++---- drivers/soundwire/qcom.c | 8 ++++---- drivers/soundwire/stream.c | 4 ++-- include/sound/soc-dai.h | 32 ++++++++++++++++---------------- sound/soc/codecs/max98373-sdw.c | 2 +- sound/soc/codecs/rt1308-sdw.c | 2 +- sound/soc/codecs/rt1316-sdw.c | 2 +- sound/soc/codecs/rt5682-sdw.c | 2 +- sound/soc/codecs/rt700.c | 2 +- sound/soc/codecs/rt711-sdca.c | 2 +- sound/soc/codecs/rt711.c | 2 +- sound/soc/codecs/rt715-sdca.c | 2 +- sound/soc/codecs/rt715.c | 2 +- sound/soc/codecs/sdw-mockup.c | 2 +- sound/soc/codecs/wcd938x.c | 2 +- sound/soc/codecs/wsa881x.c | 2 +- sound/soc/intel/boards/sof_sdw.c | 6 +++--- sound/soc/qcom/sdm845.c | 4 ++-- sound/soc/qcom/sm8250.c | 4 ++-- 19 files changed, 45 insertions(+), 45 deletions(-)
--- a/drivers/soundwire/intel.c +++ b/drivers/soundwire/intel.c @@ -1065,8 +1065,8 @@ static const struct snd_soc_dai_ops inte .prepare = intel_prepare, .hw_free = intel_hw_free, .shutdown = intel_shutdown, - .set_sdw_stream = intel_pcm_set_sdw_stream, - .get_sdw_stream = intel_get_sdw_stream, + .set_stream = intel_pcm_set_sdw_stream, + .get_stream = intel_get_sdw_stream, };
static const struct snd_soc_dai_ops intel_pdm_dai_ops = { @@ -1075,8 +1075,8 @@ static const struct snd_soc_dai_ops inte .prepare = intel_prepare, .hw_free = intel_hw_free, .shutdown = intel_shutdown, - .set_sdw_stream = intel_pdm_set_sdw_stream, - .get_sdw_stream = intel_get_sdw_stream, + .set_stream = intel_pdm_set_sdw_stream, + .get_stream = intel_get_sdw_stream, };
static const struct snd_soc_component_driver dai_component = { --- a/drivers/soundwire/qcom.c +++ b/drivers/soundwire/qcom.c @@ -1032,8 +1032,8 @@ static int qcom_swrm_startup(struct snd_ ctrl->sruntime[dai->id] = sruntime;
for_each_rtd_codec_dais(rtd, i, codec_dai) { - ret = snd_soc_dai_set_sdw_stream(codec_dai, sruntime, - substream->stream); + ret = snd_soc_dai_set_stream(codec_dai, sruntime, + substream->stream); if (ret < 0 && ret != -ENOTSUPP) { dev_err(dai->dev, "Failed to set sdw stream on %s\n", codec_dai->name); @@ -1059,8 +1059,8 @@ static const struct snd_soc_dai_ops qcom .hw_free = qcom_swrm_hw_free, .startup = qcom_swrm_startup, .shutdown = qcom_swrm_shutdown, - .set_sdw_stream = qcom_swrm_set_sdw_stream, - .get_sdw_stream = qcom_swrm_get_sdw_stream, + .set_stream = qcom_swrm_set_sdw_stream, + .get_stream = qcom_swrm_get_sdw_stream, };
static const struct snd_soc_component_driver qcom_swrm_dai_component = { --- a/drivers/soundwire/stream.c +++ b/drivers/soundwire/stream.c @@ -1880,7 +1880,7 @@ static int set_stream(struct snd_pcm_sub
/* Set stream pointer on all DAIs */ for_each_rtd_dais(rtd, i, dai) { - ret = snd_soc_dai_set_sdw_stream(dai, sdw_stream, substream->stream); + ret = snd_soc_dai_set_stream(dai, sdw_stream, substream->stream); if (ret < 0) { dev_err(rtd->dev, "failed to set stream pointer on dai %s\n", dai->name); break; @@ -1951,7 +1951,7 @@ void sdw_shutdown_stream(void *sdw_subst /* Find stream from first CPU DAI */ dai = asoc_rtd_to_cpu(rtd, 0);
- sdw_stream = snd_soc_dai_get_sdw_stream(dai, substream->stream); + sdw_stream = snd_soc_dai_get_stream(dai, substream->stream);
if (IS_ERR(sdw_stream)) { dev_err(rtd->dev, "no stream found for DAI %s\n", dai->name); --- a/include/sound/soc-dai.h +++ b/include/sound/soc-dai.h @@ -295,9 +295,9 @@ struct snd_soc_dai_ops { unsigned int *rx_num, unsigned int *rx_slot); int (*set_tristate)(struct snd_soc_dai *dai, int tristate);
- int (*set_sdw_stream)(struct snd_soc_dai *dai, - void *stream, int direction); - void *(*get_sdw_stream)(struct snd_soc_dai *dai, int direction); + int (*set_stream)(struct snd_soc_dai *dai, + void *stream, int direction); + void *(*get_stream)(struct snd_soc_dai *dai, int direction);
/* * DAI digital mute - optional. @@ -515,42 +515,42 @@ static inline void *snd_soc_dai_get_drvd }
/** - * snd_soc_dai_set_sdw_stream() - Configures a DAI for SDW stream operation + * snd_soc_dai_set_stream() - Configures a DAI for stream operation * @dai: DAI - * @stream: STREAM + * @stream: STREAM (opaque structure depending on DAI type) * @direction: Stream direction(Playback/Capture) - * SoundWire subsystem doesn't have a notion of direction and we reuse + * Some subsystems, such as SoundWire, don't have a notion of direction and we reuse * the ASoC stream direction to configure sink/source ports. * Playback maps to source ports and Capture for sink ports. * * This should be invoked with NULL to clear the stream set previously. * Returns 0 on success, a negative error code otherwise. */ -static inline int snd_soc_dai_set_sdw_stream(struct snd_soc_dai *dai, - void *stream, int direction) +static inline int snd_soc_dai_set_stream(struct snd_soc_dai *dai, + void *stream, int direction) { - if (dai->driver->ops->set_sdw_stream) - return dai->driver->ops->set_sdw_stream(dai, stream, direction); + if (dai->driver->ops->set_stream) + return dai->driver->ops->set_stream(dai, stream, direction); else return -ENOTSUPP; }
/** - * snd_soc_dai_get_sdw_stream() - Retrieves SDW stream from DAI + * snd_soc_dai_get_stream() - Retrieves stream from DAI * @dai: DAI * @direction: Stream direction(Playback/Capture) * * This routine only retrieves that was previously configured - * with snd_soc_dai_get_sdw_stream() + * with snd_soc_dai_get_stream() * * Returns pointer to stream or an ERR_PTR value, e.g. * ERR_PTR(-ENOTSUPP) if callback is not supported; */ -static inline void *snd_soc_dai_get_sdw_stream(struct snd_soc_dai *dai, - int direction) +static inline void *snd_soc_dai_get_stream(struct snd_soc_dai *dai, + int direction) { - if (dai->driver->ops->get_sdw_stream) - return dai->driver->ops->get_sdw_stream(dai, direction); + if (dai->driver->ops->get_stream) + return dai->driver->ops->get_stream(dai, direction); else return ERR_PTR(-ENOTSUPP); } --- a/sound/soc/codecs/max98373-sdw.c +++ b/sound/soc/codecs/max98373-sdw.c @@ -741,7 +741,7 @@ static int max98373_sdw_set_tdm_slot(str static const struct snd_soc_dai_ops max98373_dai_sdw_ops = { .hw_params = max98373_sdw_dai_hw_params, .hw_free = max98373_pcm_hw_free, - .set_sdw_stream = max98373_set_sdw_stream, + .set_stream = max98373_set_sdw_stream, .shutdown = max98373_shutdown, .set_tdm_slot = max98373_sdw_set_tdm_slot, }; --- a/sound/soc/codecs/rt1308-sdw.c +++ b/sound/soc/codecs/rt1308-sdw.c @@ -613,7 +613,7 @@ static const struct snd_soc_component_dr static const struct snd_soc_dai_ops rt1308_aif_dai_ops = { .hw_params = rt1308_sdw_hw_params, .hw_free = rt1308_sdw_pcm_hw_free, - .set_sdw_stream = rt1308_set_sdw_stream, + .set_stream = rt1308_set_sdw_stream, .shutdown = rt1308_sdw_shutdown, .set_tdm_slot = rt1308_sdw_set_tdm_slot, }; --- a/sound/soc/codecs/rt1316-sdw.c +++ b/sound/soc/codecs/rt1316-sdw.c @@ -602,7 +602,7 @@ static const struct snd_soc_component_dr static const struct snd_soc_dai_ops rt1316_aif_dai_ops = { .hw_params = rt1316_sdw_hw_params, .hw_free = rt1316_sdw_pcm_hw_free, - .set_sdw_stream = rt1316_set_sdw_stream, + .set_stream = rt1316_set_sdw_stream, .shutdown = rt1316_sdw_shutdown, };
--- a/sound/soc/codecs/rt5682-sdw.c +++ b/sound/soc/codecs/rt5682-sdw.c @@ -272,7 +272,7 @@ static int rt5682_sdw_hw_free(struct snd static const struct snd_soc_dai_ops rt5682_sdw_ops = { .hw_params = rt5682_sdw_hw_params, .hw_free = rt5682_sdw_hw_free, - .set_sdw_stream = rt5682_set_sdw_stream, + .set_stream = rt5682_set_sdw_stream, .shutdown = rt5682_sdw_shutdown, };
--- a/sound/soc/codecs/rt700.c +++ b/sound/soc/codecs/rt700.c @@ -1015,7 +1015,7 @@ static int rt700_pcm_hw_free(struct snd_ static const struct snd_soc_dai_ops rt700_ops = { .hw_params = rt700_pcm_hw_params, .hw_free = rt700_pcm_hw_free, - .set_sdw_stream = rt700_set_sdw_stream, + .set_stream = rt700_set_sdw_stream, .shutdown = rt700_shutdown, };
--- a/sound/soc/codecs/rt711-sdca.c +++ b/sound/soc/codecs/rt711-sdca.c @@ -1361,7 +1361,7 @@ static int rt711_sdca_pcm_hw_free(struct static const struct snd_soc_dai_ops rt711_sdca_ops = { .hw_params = rt711_sdca_pcm_hw_params, .hw_free = rt711_sdca_pcm_hw_free, - .set_sdw_stream = rt711_sdca_set_sdw_stream, + .set_stream = rt711_sdca_set_sdw_stream, .shutdown = rt711_sdca_shutdown, };
--- a/sound/soc/codecs/rt711.c +++ b/sound/soc/codecs/rt711.c @@ -1092,7 +1092,7 @@ static int rt711_pcm_hw_free(struct snd_ static const struct snd_soc_dai_ops rt711_ops = { .hw_params = rt711_pcm_hw_params, .hw_free = rt711_pcm_hw_free, - .set_sdw_stream = rt711_set_sdw_stream, + .set_stream = rt711_set_sdw_stream, .shutdown = rt711_shutdown, };
--- a/sound/soc/codecs/rt715-sdca.c +++ b/sound/soc/codecs/rt715-sdca.c @@ -938,7 +938,7 @@ static int rt715_sdca_pcm_hw_free(struct static const struct snd_soc_dai_ops rt715_sdca_ops = { .hw_params = rt715_sdca_pcm_hw_params, .hw_free = rt715_sdca_pcm_hw_free, - .set_sdw_stream = rt715_sdca_set_sdw_stream, + .set_stream = rt715_sdca_set_sdw_stream, .shutdown = rt715_sdca_shutdown, };
--- a/sound/soc/codecs/rt715.c +++ b/sound/soc/codecs/rt715.c @@ -909,7 +909,7 @@ static int rt715_pcm_hw_free(struct snd_ static const struct snd_soc_dai_ops rt715_ops = { .hw_params = rt715_pcm_hw_params, .hw_free = rt715_pcm_hw_free, - .set_sdw_stream = rt715_set_sdw_stream, + .set_stream = rt715_set_sdw_stream, .shutdown = rt715_shutdown, };
--- a/sound/soc/codecs/sdw-mockup.c +++ b/sound/soc/codecs/sdw-mockup.c @@ -138,7 +138,7 @@ static int sdw_mockup_pcm_hw_free(struct static const struct snd_soc_dai_ops sdw_mockup_ops = { .hw_params = sdw_mockup_pcm_hw_params, .hw_free = sdw_mockup_pcm_hw_free, - .set_sdw_stream = sdw_mockup_set_sdw_stream, + .set_stream = sdw_mockup_set_sdw_stream, .shutdown = sdw_mockup_shutdown, };
--- a/sound/soc/codecs/wcd938x.c +++ b/sound/soc/codecs/wcd938x.c @@ -4302,7 +4302,7 @@ static int wcd938x_codec_set_sdw_stream( static const struct snd_soc_dai_ops wcd938x_sdw_dai_ops = { .hw_params = wcd938x_codec_hw_params, .hw_free = wcd938x_codec_free, - .set_sdw_stream = wcd938x_codec_set_sdw_stream, + .set_stream = wcd938x_codec_set_sdw_stream, };
static struct snd_soc_dai_driver wcd938x_dais[] = { --- a/sound/soc/codecs/wsa881x.c +++ b/sound/soc/codecs/wsa881x.c @@ -1026,7 +1026,7 @@ static const struct snd_soc_dai_ops wsa8 .hw_params = wsa881x_hw_params, .hw_free = wsa881x_hw_free, .mute_stream = wsa881x_digital_mute, - .set_sdw_stream = wsa881x_set_sdw_stream, + .set_stream = wsa881x_set_sdw_stream, };
static struct snd_soc_dai_driver wsa881x_dais[] = { --- a/sound/soc/intel/boards/sof_sdw.c +++ b/sound/soc/intel/boards/sof_sdw.c @@ -291,7 +291,7 @@ int sdw_prepare(struct snd_pcm_substream /* Find stream from first CPU DAI */ dai = asoc_rtd_to_cpu(rtd, 0);
- sdw_stream = snd_soc_dai_get_sdw_stream(dai, substream->stream); + sdw_stream = snd_soc_dai_get_stream(dai, substream->stream);
if (IS_ERR(sdw_stream)) { dev_err(rtd->dev, "no stream found for DAI %s", dai->name); @@ -311,7 +311,7 @@ int sdw_trigger(struct snd_pcm_substream /* Find stream from first CPU DAI */ dai = asoc_rtd_to_cpu(rtd, 0);
- sdw_stream = snd_soc_dai_get_sdw_stream(dai, substream->stream); + sdw_stream = snd_soc_dai_get_stream(dai, substream->stream);
if (IS_ERR(sdw_stream)) { dev_err(rtd->dev, "no stream found for DAI %s", dai->name); @@ -350,7 +350,7 @@ int sdw_hw_free(struct snd_pcm_substream /* Find stream from first CPU DAI */ dai = asoc_rtd_to_cpu(rtd, 0);
- sdw_stream = snd_soc_dai_get_sdw_stream(dai, substream->stream); + sdw_stream = snd_soc_dai_get_stream(dai, substream->stream);
if (IS_ERR(sdw_stream)) { dev_err(rtd->dev, "no stream found for DAI %s", dai->name); --- a/sound/soc/qcom/sdm845.c +++ b/sound/soc/qcom/sdm845.c @@ -56,8 +56,8 @@ static int sdm845_slim_snd_hw_params(str int ret = 0, i;
for_each_rtd_codec_dais(rtd, i, codec_dai) { - sruntime = snd_soc_dai_get_sdw_stream(codec_dai, - substream->stream); + sruntime = snd_soc_dai_get_stream(codec_dai, + substream->stream); if (sruntime != ERR_PTR(-ENOTSUPP)) pdata->sruntime[cpu_dai->id] = sruntime;
--- a/sound/soc/qcom/sm8250.c +++ b/sound/soc/qcom/sm8250.c @@ -70,8 +70,8 @@ static int sm8250_snd_hw_params(struct s switch (cpu_dai->id) { case WSA_CODEC_DMA_RX_0: for_each_rtd_codec_dais(rtd, i, codec_dai) { - sruntime = snd_soc_dai_get_sdw_stream(codec_dai, - substream->stream); + sruntime = snd_soc_dai_get_stream(codec_dai, + substream->stream); if (sruntime != ERR_PTR(-ENOTSUPP)) pdata->sruntime[cpu_dai->id] = sruntime; }
From: Paul E. McKenney paulmck@kernel.org
commit 96017bf9039763a2e02dcc6adaa18592cd73a39d upstream.
Currently, trc_wait_for_one_reader() atomically increments the trc_n_readers_need_end counter before sending the IPI invoking trc_read_check_handler(). All failure paths out of trc_read_check_handler() and also from the smp_call_function_single() within trc_wait_for_one_reader() must carefully atomically decrement this counter. This is more complex than it needs to be.
This commit therefore simplifies things and saves a few lines of code by dispensing with the atomic decrements in favor of having trc_read_check_handler() do the atomic increment only in the success case. In theory, this represents no change in functionality.
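The accounting change is easier to see outside the kernel; the following is a standalone C sketch (an illustration only, none of the kernel symbols) of the new scheme, where only the handler's success path touches the "readers to wait on" counter, so failure paths need no matching decrement:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int need_end;	/* plays the role of trc_n_readers_need_end */

/* Handler: increment only when the reader really must be waited on. */
static void read_check_handler(bool in_read_side_cs)
{
	if (!in_read_side_cs)
		return;				/* nothing to account for */
	atomic_fetch_add(&need_end, 1);		/* one more reader to wait on */
}

/* Waiter: no speculative increment before the "IPI", so a failed send
 * needs no decrement either. */
static void wait_for_one_reader(bool ipi_delivered, bool in_read_side_cs)
{
	if (!ipi_delivered)
		return;
	read_check_handler(in_read_side_cs);
}

int main(void)
{
	wait_for_one_reader(true, true);	/* reader still in a critical section */
	wait_for_one_reader(true, false);	/* reader already finished */
	wait_for_one_reader(false, true);	/* IPI failed: nothing to undo */
	printf("readers to wait on: %d\n", atomic_load(&need_end));	/* prints 1 */
	return 0;
}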
Signed-off-by: Paul E. McKenney paulmck@kernel.org Cc: Joel Fernandes joel@joelfernandes.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/rcu/tasks.h | 20 +++----------------- 1 file changed, 3 insertions(+), 17 deletions(-)
--- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -892,32 +892,24 @@ static void trc_read_check_handler(void
// If the task is no longer running on this CPU, leave. if (unlikely(texp != t)) { - if (WARN_ON_ONCE(atomic_dec_and_test(&trc_n_readers_need_end))) - wake_up(&trc_wait); goto reset_ipi; // Already on holdout list, so will check later. }
// If the task is not in a read-side critical section, and // if this is the last reader, awaken the grace-period kthread. if (likely(!READ_ONCE(t->trc_reader_nesting))) { - if (WARN_ON_ONCE(atomic_dec_and_test(&trc_n_readers_need_end))) - wake_up(&trc_wait); - // Mark as checked after decrement to avoid false - // positives on the above WARN_ON_ONCE(). WRITE_ONCE(t->trc_reader_checked, true); goto reset_ipi; } // If we are racing with an rcu_read_unlock_trace(), try again later. - if (unlikely(READ_ONCE(t->trc_reader_nesting) < 0)) { - if (WARN_ON_ONCE(atomic_dec_and_test(&trc_n_readers_need_end))) - wake_up(&trc_wait); + if (unlikely(READ_ONCE(t->trc_reader_nesting) < 0)) goto reset_ipi; - } WRITE_ONCE(t->trc_reader_checked, true);
// Get here if the task is in a read-side critical section. Set // its state so that it will awaken the grace-period kthread upon // exit from that critical section. + atomic_inc(&trc_n_readers_need_end); // One more to wait on. WARN_ON_ONCE(READ_ONCE(t->trc_reader_special.b.need_qs)); WRITE_ONCE(t->trc_reader_special.b.need_qs, true);
@@ -1017,21 +1009,15 @@ static void trc_wait_for_one_reader(stru if (per_cpu(trc_ipi_to_cpu, cpu) || t->trc_ipi_to_cpu >= 0) return;
- atomic_inc(&trc_n_readers_need_end); per_cpu(trc_ipi_to_cpu, cpu) = true; t->trc_ipi_to_cpu = cpu; rcu_tasks_trace.n_ipis++; - if (smp_call_function_single(cpu, - trc_read_check_handler, t, 0)) { + if (smp_call_function_single(cpu, trc_read_check_handler, t, 0)) { // Just in case there is some other reason for // failure than the target CPU being offline. rcu_tasks_trace.n_ipis_fails++; per_cpu(trc_ipi_to_cpu, cpu) = false; t->trc_ipi_to_cpu = cpu; - if (atomic_dec_and_test(&trc_n_readers_need_end)) { - WARN_ON_ONCE(1); - wake_up(&trc_wait); - } } } }
On Tue, Jan 10, 2023 at 1:23 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Paul E. McKenney paulmck@kernel.org
commit 96017bf9039763a2e02dcc6adaa18592cd73a39d upstream.
Thanks Greg, I had sent the same patch earlier for 5.15. Just so I learn, anything I did wrong or should have done differently?
- Joel
On Tue, Jan 10, 2023 at 1:26 PM Joel Fernandes joel@joelfernandes.org wrote:
On Tue, Jan 10, 2023 at 1:23 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Paul E. McKenney paulmck@kernel.org
commit 96017bf9039763a2e02dcc6adaa18592cd73a39d upstream.
Thanks Greg, I had sent the same patch earlier for 5.15. Just so I learn, anything I did wrong or should have done differently?
Never mind, I think it is just coming from your queue after you picked mine up...
On Tue, Jan 10, 2023 at 01:27:02PM -0500, Joel Fernandes wrote:
On Tue, Jan 10, 2023 at 1:26 PM Joel Fernandes joel@joelfernandes.org wrote:
On Tue, Jan 10, 2023 at 1:23 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
From: Paul E. McKenney paulmck@kernel.org
commit 96017bf9039763a2e02dcc6adaa18592cd73a39d upstream.
Thanks Greg, I had sent the same patch earlier for 5.15. Just so I learn, anything I did wrong or should have done differently?
Never mind, I think it is just coming from your queue after you picked mine up...
Yes, this is the one you sent me to have applied :)
From: Hangbin Liu liuhangbin@gmail.com
commit dfed913e8b55a0c2c4906f1242fd38fd9a116e49 upstream.
Currently, the kernel drops a GSO VLAN tagged packet if it's created with socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr.
The reason is that AF_PACKET doesn't adjust the skb network header if there is a VLAN tag. Then, after virtio_net_hdr_set_proto() is called, skb->protocol is set to ETH_P_IP/IPv6, and later inet/ipv6_gso_segment() drops the skb because the network header position is invalid.
Let's handle VLAN packets by adjusting the network header position in packet_parse_headers(). The adjustment is safe and does not affect the later xmit, as the tap device already does the same.
In packet_snd(), packet_parse_headers() needs to be moved before the virtio_net_hdr_set_proto() call, so the correct skb->protocol and network header are set first.
There is no need to update tpacket_snd() as it calls packet_parse_headers() in tpacket_fill_skb(), which is already before calling virtio_net_hdr_* functions.
The skb->no_fcs setting is also moved up to keep all skb settings together and stay consistent with packet_sendmsg_spkt().
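For reference, the affected configuration is a packet socket with the virtio_net_hdr option enabled. A minimal userspace sketch of that setup (needs CAP_NET_RAW; the interface name is a placeholder, and the GSO payload itself is only described in a comment):

#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
	int one = 1;
	struct sockaddr_ll sll = { 0 };

	/* The configuration the commit message refers to:
	 * socket(AF_PACKET, SOCK_RAW, 0) plus virtio_net_hdr. */
	int fd = socket(AF_PACKET, SOCK_RAW, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	/* Every frame written to this socket is now prefixed with a
	 * struct virtio_net_hdr, which may request GSO. */
	if (setsockopt(fd, SOL_PACKET, PACKET_VNET_HDR, &one, sizeof(one)) < 0) {
		perror("PACKET_VNET_HDR");
		return 1;
	}

	sll.sll_family = AF_PACKET;
	sll.sll_protocol = htons(ETH_P_ALL);
	sll.sll_ifindex = if_nametoindex("eth0");	/* example interface */
	if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
		perror("bind");
		return 1;
	}

	/* A GSO frame written here as
	 *   virtio_net_hdr | Ethernet header | 802.1Q tag | IP | TCP | data
	 * was dropped before this patch because the network header offset
	 * was computed without accounting for the VLAN tag. */
	close(fd);
	return 0;
}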
Signed-off-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Willem de Bruijn willemb@google.com Acked-by: Michael S. Tsirkin mst@redhat.com Link: https://lore.kernel.org/r/20220425014502.985464-1-liuhangbin@gmail.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1888,12 +1888,20 @@ oom:
static void packet_parse_headers(struct sk_buff *skb, struct socket *sock) { + int depth; + if ((!skb->protocol || skb->protocol == htons(ETH_P_ALL)) && sock->type == SOCK_RAW) { skb_reset_mac_header(skb); skb->protocol = dev_parse_header_protocol(skb); }
+ /* Move network header to the right position for VLAN tagged packets */ + if (likely(skb->dev->type == ARPHRD_ETHER) && + eth_type_vlan(skb->protocol) && + __vlan_get_protocol(skb, skb->protocol, &depth) != 0) + skb_set_network_header(skb, depth); + skb_probe_transport_header(skb); }
@@ -3008,6 +3016,11 @@ static int packet_snd(struct socket *soc skb->mark = sockc.mark; skb->tstamp = sockc.transmit_time;
+ if (unlikely(extra_len == 4)) + skb->no_fcs = 1; + + packet_parse_headers(skb, sock); + if (has_vnet_hdr) { err = virtio_net_hdr_to_skb(skb, &vnet_hdr, vio_le()); if (err) @@ -3016,11 +3029,6 @@ static int packet_snd(struct socket *soc virtio_net_hdr_set_proto(skb, &vnet_hdr); }
- packet_parse_headers(skb, sock); - - if (unlikely(extra_len == 4)) - skb->no_fcs = 1; - err = po->xmit(skb); if (unlikely(err != 0)) { if (err > 0)
From: Eric Dumazet edumazet@google.com
commit e9d3f80935b6607dcdc5682b00b1d4b28e0a0c5d upstream.
GSO assumes skb->head contains link layer headers.
The tun device can in some cases provide only the base 14 bytes, regardless of whether a VLAN is used or not.
After the blamed commit, we can end up setting a network header offset of 18+, so we had better pull the missing bytes to avoid a possible crash in GSO.
syzbot report was: kernel BUG at include/linux/skbuff.h:2699! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 3601 Comm: syz-executor210 Not tainted 5.18.0-syzkaller-11338-g2c5ca23f7414 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__skb_pull include/linux/skbuff.h:2699 [inline] RIP: 0010:skb_mac_gso_segment+0x48f/0x530 net/core/gro.c:136 Code: 00 48 c7 c7 00 96 d4 8a c6 05 cb d3 45 06 01 e8 26 bb d0 01 e9 2f fd ff ff 49 c7 c4 ea ff ff ff e9 f1 fe ff ff e8 91 84 19 fa <0f> 0b 48 89 df e8 97 44 66 fa e9 7f fd ff ff e8 ad 44 66 fa e9 48 RSP: 0018:ffffc90002e2f4b8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000012 RCX: 0000000000000000 RDX: ffff88805bb58000 RSI: ffffffff8760ed0f RDI: 0000000000000004 RBP: 0000000000005dbc R08: 0000000000000004 R09: 0000000000000fe0 R10: 0000000000000fe4 R11: 0000000000000000 R12: 0000000000000fe0 R13: ffff88807194d780 R14: 1ffff920005c5e9b R15: 0000000000000012 FS: 000055555730f300(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000200015c0 CR3: 0000000071ff8000 CR4: 0000000000350ee0 Call Trace: <TASK> __skb_gso_segment+0x327/0x6e0 net/core/dev.c:3411 skb_gso_segment include/linux/netdevice.h:4749 [inline] validate_xmit_skb+0x6bc/0xf10 net/core/dev.c:3669 validate_xmit_skb_list+0xbc/0x120 net/core/dev.c:3719 sch_direct_xmit+0x3d1/0xbe0 net/sched/sch_generic.c:327 __dev_xmit_skb net/core/dev.c:3815 [inline] __dev_queue_xmit+0x14a1/0x3a00 net/core/dev.c:4219 packet_snd net/packet/af_packet.c:3071 [inline] packet_sendmsg+0x21cb/0x5550 net/packet/af_packet.c:3102 sock_sendmsg_nosec net/socket.c:714 [inline] sock_sendmsg+0xcf/0x120 net/socket.c:734 ____sys_sendmsg+0x6eb/0x810 net/socket.c:2492 ___sys_sendmsg+0xf3/0x170 net/socket.c:2546 __sys_sendmsg net/socket.c:2575 [inline] __do_sys_sendmsg net/socket.c:2584 [inline] __se_sys_sendmsg net/socket.c:2582 [inline] __x64_sys_sendmsg+0x132/0x220 net/socket.c:2582 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7f4b95da06c9 Code: 28 c3 e8 4a 15 00 00 66 2e 0f 1f 84 00 00 00 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 c0 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd7defc4c8 EFLAGS: 00000246 ORIG_RAX: 000000000000002e RAX: ffffffffffffffda RBX: 00007ffd7defc4f0 RCX: 00007f4b95da06c9 RDX: 0000000000000000 RSI: 0000000020000140 RDI: 0000000000000003 RBP: 0000000000000003 R08: bb1414ac00000050 R09: bb1414ac00000050 R10: 0000000000000004 R11: 0000000000000246 R12: 0000000000000000 R13: 00007ffd7defc4e0 R14: 00007ffd7defc4d8 R15: 00007ffd7defc4d4 </TASK>
Fixes: dfed913e8b55 ("net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO") Signed-off-by: Eric Dumazet edumazet@google.com Reported-by: syzbot syzkaller@googlegroups.com Acked-by: Hangbin Liu liuhangbin@gmail.com Acked-by: Willem de Bruijn willemb@google.com Cc: Michael S. Tsirkin mst@redhat.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Tudor Ambarus tudor.ambarus@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/packet/af_packet.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -1899,8 +1899,10 @@ static void packet_parse_headers(struct /* Move network header to the right position for VLAN tagged packets */ if (likely(skb->dev->type == ARPHRD_ETHER) && eth_type_vlan(skb->protocol) && - __vlan_get_protocol(skb, skb->protocol, &depth) != 0) - skb_set_network_header(skb, depth); + __vlan_get_protocol(skb, skb->protocol, &depth) != 0) { + if (pskb_may_pull(skb, depth)) + skb_set_network_header(skb, depth); + }
skb_probe_transport_header(skb); }
From: Jason A. Donenfeld Jason@zx2c4.com
commit 7392134428c92a4cb541bd5c8f4f5c8d2e88364d upstream.
With char becoming unsigned by default, and with `char` alone being ambiguous and based on architecture, signed chars need to be marked explicitly as such. Use `s8` and `u8` types here, since that's what surrounding code does. This fixes:
drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: assigning (-9) to unsigned variable 'tm' drivers/media/dvb-frontends/stv0288.c:471 stv0288_set_frontend() warn: we never enter this loop
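The warning class is easy to reproduce with a standalone program (this illustrates the general problem, not the driver loop itself): on targets where plain char is unsigned, a counter initialized to a negative value can never satisfy a "< 0" style condition, so the loop body is skipped entirely.

#include <stdio.h>

int main(void)
{
	char tm;		/* unsigned on arm/arm64/s390, and with -funsigned-char */
	signed char stm;	/* what the patch effectively switches to (s8) */
	int plain = 0, explicit = 0;

	/* Where plain char is unsigned, -9 is stored as 247 and the
	 * "< 0" condition is never true, so this loop never runs. */
	for (tm = -9; tm < 0; tm++)
		plain++;

	for (stm = -9; stm < 0; stm++)
		explicit++;

	printf("plain char iterations:  %d\n", plain);	/* 0 if char is unsigned, 9 if signed */
	printf("signed char iterations: %d\n", explicit);	/* always 9 */
	return 0;
}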
Cc: Mauro Carvalho Chehab mchehab@kernel.org Cc: linux-media@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-frontends/stv0288.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
--- a/drivers/media/dvb-frontends/stv0288.c +++ b/drivers/media/dvb-frontends/stv0288.c @@ -440,9 +440,8 @@ static int stv0288_set_frontend(struct d struct stv0288_state *state = fe->demodulator_priv; struct dtv_frontend_properties *c = &fe->dtv_property_cache;
- char tm; - unsigned char tda[3]; - u8 reg, time_out = 0; + u8 tda[3], reg, time_out = 0; + s8 tm;
dprintk("%s : FE_SET_FRONTEND\n", __func__);
From: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org
commit 5d2fe2d7b616b8baa18348ead857b504fc2de336 upstream.
LLCC driver uses REGMAP_MMIO for accessing the hardware registers. So select the dependency in Kconfig. Without this, there will be errors while building the driver with COMPILE_TEST only:
ERROR: modpost: "__devm_regmap_init_mmio_clk" [drivers/soc/qcom/llcc-qcom.ko] undefined! make[1]: *** [scripts/Makefile.modpost:126: Module.symvers] Error 1 make: *** [Makefile:1944: modpost] Error 2
Cc: stable@vger.kernel.org # 4.19 Fixes: a3134fb09e0b ("drivers: soc: Add LLCC driver") Reported-by: Borislav Petkov bp@alien8.de Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20221129071201.30024-2-manivannan.sadhasivam@linar... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/qcom/Kconfig | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/soc/qcom/Kconfig +++ b/drivers/soc/qcom/Kconfig @@ -63,6 +63,7 @@ config QCOM_GSBI config QCOM_LLCC tristate "Qualcomm Technologies, Inc. LLCC driver" depends on ARCH_QCOM || COMPILE_TEST + select REGMAP_MMIO help Qualcomm Technologies, Inc. platform specific Last Level Cache Controller(LLCC) driver for platforms such as,
From: Steven Rostedt rostedt@goodmis.org
commit 26df05a8c1420ad3de314fdd407e7fc2058cc7aa upstream.
grub2 has submenus; to use grub-reboot with a submenu entry, it requires:
grub-reboot X>Y
where X is the main index and Y is the submenu. Thus if you have:
menuentry 'Debian GNU/Linux' --class debian --class gnu-linux ... {
	[...]
}
submenu 'Advanced options for Debian GNU/Linux' $menuentry_id_option ... {
	menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64' --class debian --class gnu-linux ... {
		[...]
	}
	menuentry 'Debian GNU/Linux, with Linux 6.0.0-4-amd64 (recovery mode)' --class debian --class gnu-linux ... {
		[...]
	}
	menuentry 'Debian GNU/Linux, with Linux test' --class debian --class gnu-linux ... {
		[...]
	}
}
And wanted to boot to the "Linux test" kernel, you need to run:
# grub-reboot 1>2
Here, 1 is the second top-level entry (the submenu) and 2 is the third entry within that submenu.
Have the grub.cfg parsing for grub2 handle such cases.
Cc: stable@vger.kernel.org Fixes: a15ba91361d46 ("ktest: Add support for grub2") Reviewed-by: John 'Warthog9' Hawley (VMware) warthog9@eaglescrag.net Signed-off-by: Steven Rostedt rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/ktest/ktest.pl | 20 +++++++++++++++----- 1 file changed, 15 insertions(+), 5 deletions(-)
--- a/tools/testing/ktest/ktest.pl +++ b/tools/testing/ktest/ktest.pl @@ -1963,7 +1963,7 @@ sub run_scp_mod {
sub _get_grub_index {
- my ($command, $target, $skip) = @_; + my ($command, $target, $skip, $submenu) = @_;
return if (defined($grub_number) && defined($last_grub_menu) && $last_grub_menu eq $grub_menu && defined($last_machine) && @@ -1980,11 +1980,16 @@ sub _get_grub_index {
my $found = 0;
+ my $submenu_number = 0; + while (<IN>) { if (/$target/) { $grub_number++; $found = 1; last; + } elsif (defined($submenu) && /$submenu/) { + $submenu_number++; + $grub_number = -1; } elsif (/$skip/) { $grub_number++; } @@ -1993,6 +1998,9 @@ sub _get_grub_index {
dodie "Could not find '$grub_menu' through $command on $machine" if (!$found); + if ($submenu_number > 0) { + $grub_number = "$submenu_number>$grub_number"; + } doprint "$grub_number\n"; $last_grub_menu = $grub_menu; $last_machine = $machine; @@ -2003,6 +2011,7 @@ sub get_grub_index { my $command; my $target; my $skip; + my $submenu; my $grub_menu_qt;
if ($reboot_type !~ /^grub/) { @@ -2017,8 +2026,9 @@ sub get_grub_index { $skip = '^\s*title\s'; } elsif ($reboot_type eq "grub2") { $command = "cat $grub_file"; - $target = '^menuentry.*' . $grub_menu_qt; - $skip = '^menuentry\s|^submenu\s'; + $target = '^\s*menuentry.*' . $grub_menu_qt; + $skip = '^\s*menuentry'; + $submenu = '^\s*submenu\s'; } elsif ($reboot_type eq "grub2bls") { $command = $grub_bls_get; $target = '^title=.*' . $grub_menu_qt; @@ -2027,7 +2037,7 @@ sub get_grub_index { return; }
- _get_grub_index($command, $target, $skip); + _get_grub_index($command, $target, $skip, $submenu); }
sub wait_for_input { @@ -2090,7 +2100,7 @@ sub reboot_to { if ($reboot_type eq "grub") { run_ssh "'(echo "savedefault --default=$grub_number --once" | grub --batch)'"; } elsif (($reboot_type eq "grub2") or ($reboot_type eq "grub2bls")) { - run_ssh "$grub_reboot $grub_number"; + run_ssh "$grub_reboot "'$grub_number'""; } elsif ($reboot_type eq "syslinux") { run_ssh "$syslinux --once \"$syslinux_label\" $syslinux_path"; } elsif (defined $reboot_script) {
From: Steven Rostedt rostedt@goodmis.org
commit ef784eebb56425eed6e9b16e7d47e5c00dcf9c38 upstream.
After a full run of a make_min_config test, I noticed there were a lot of CONFIGs still enabled that really should not be. Looking at them, I noticed they were all defined as "default y". The issue is that the test simply removes the config and re-runs make oldconfig, which enables it again because it is set to default 'y'. Instead, explicitly disable the config by writing "# CONFIG_FOO is not set" to the file to keep it from being set again.
With this change, one of my box's minconfigs went from 768 configs set, down to 521 configs set.
Link: https://lkml.kernel.org/r/20221202115936.016fce23@gandalf.local.home
Cc: stable@vger.kernel.org Fixes: 0a05c769a9de5 ("ktest: Added config_bisect test type") Reviewed-by: John 'Warthog9' Hawley (VMware) warthog9@eaglescrag.net Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/ktest/ktest.pl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/tools/testing/ktest/ktest.pl +++ b/tools/testing/ktest/ktest.pl @@ -3778,9 +3778,10 @@ sub test_this_config { # .config to make sure it is missing the config that # we had before my %configs = %min_configs; - delete $configs{$config}; + $configs{$config} = "# $config is not set"; make_new_config ((values %configs), (values %keep_configs)); make_oldconfig; + delete $configs{$config}; undef %configs; assign_configs %configs, $output_config;
From: Bixuan Cui cuibixuan@linux.alibaba.com
commit d87a7b4c77a997d5388566dd511ca8e6b8e8a0a8 upstream.
The print format error was found when using ftrace event: <...>-1406 [000] .... 23599442.895823: jbd2_end_commit: dev 252,8 transaction -1866216965 sync 0 head -1866217368 <...>-1406 [000] .... 23599442.896299: jbd2_start_commit: dev 252,8 transaction -1866216964 sync 0
Use the correct print format for transaction, head and tid.
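The mismatch is easy to reproduce in a standalone program: the fields are unsigned (the diff below switches them to tid_t and the format strings to %u), and once a transaction id passes INT_MAX a %d conversion shows it as negative, exactly as in the trace above.

#include <stdio.h>

int main(void)
{
	unsigned int tid = 2428750331u;	/* the id behind "transaction -1866216965" above */

	printf("transaction %d\n", tid);	/* reinterpreted as signed: -1866216965 */
	printf("transaction %u\n", tid);	/* correct: 2428750331 */
	return 0;
}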
Fixes: 879c5e6b7cb4 ('jbd2: convert instrumentation from markers to tracepoints') Signed-off-by: Bixuan Cui cuibixuan@linux.alibaba.com Reviewed-by: Jason Yan yanaijie@huawei.com Link: https://lore.kernel.org/r/1665488024-95172-1-git-send-email-cuibixuan@linux.... Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/trace/events/jbd2.h | 44 ++++++++++++++++++++++---------------------- 1 file changed, 22 insertions(+), 22 deletions(-)
--- a/include/trace/events/jbd2.h +++ b/include/trace/events/jbd2.h @@ -40,7 +40,7 @@ DECLARE_EVENT_CLASS(jbd2_commit, TP_STRUCT__entry( __field( dev_t, dev ) __field( char, sync_commit ) - __field( int, transaction ) + __field( tid_t, transaction ) ),
TP_fast_assign( @@ -49,7 +49,7 @@ DECLARE_EVENT_CLASS(jbd2_commit, __entry->transaction = commit_transaction->t_tid; ),
- TP_printk("dev %d,%d transaction %d sync %d", + TP_printk("dev %d,%d transaction %u sync %d", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->transaction, __entry->sync_commit) ); @@ -97,8 +97,8 @@ TRACE_EVENT(jbd2_end_commit, TP_STRUCT__entry( __field( dev_t, dev ) __field( char, sync_commit ) - __field( int, transaction ) - __field( int, head ) + __field( tid_t, transaction ) + __field( tid_t, head ) ),
TP_fast_assign( @@ -108,7 +108,7 @@ TRACE_EVENT(jbd2_end_commit, __entry->head = journal->j_tail_sequence; ),
- TP_printk("dev %d,%d transaction %d sync %d head %d", + TP_printk("dev %d,%d transaction %u sync %d head %u", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->transaction, __entry->sync_commit, __entry->head) ); @@ -134,14 +134,14 @@ TRACE_EVENT(jbd2_submit_inode_data, );
DECLARE_EVENT_CLASS(jbd2_handle_start_class, - TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + TP_PROTO(dev_t dev, tid_t tid, unsigned int type, unsigned int line_no, int requested_blocks),
TP_ARGS(dev, tid, type, line_no, requested_blocks),
TP_STRUCT__entry( __field( dev_t, dev ) - __field( unsigned long, tid ) + __field( tid_t, tid ) __field( unsigned int, type ) __field( unsigned int, line_no ) __field( int, requested_blocks) @@ -155,28 +155,28 @@ DECLARE_EVENT_CLASS(jbd2_handle_start_cl __entry->requested_blocks = requested_blocks; ),
- TP_printk("dev %d,%d tid %lu type %u line_no %u " + TP_printk("dev %d,%d tid %u type %u line_no %u " "requested_blocks %d", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, __entry->type, __entry->line_no, __entry->requested_blocks) );
DEFINE_EVENT(jbd2_handle_start_class, jbd2_handle_start, - TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + TP_PROTO(dev_t dev, tid_t tid, unsigned int type, unsigned int line_no, int requested_blocks),
TP_ARGS(dev, tid, type, line_no, requested_blocks) );
DEFINE_EVENT(jbd2_handle_start_class, jbd2_handle_restart, - TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + TP_PROTO(dev_t dev, tid_t tid, unsigned int type, unsigned int line_no, int requested_blocks),
TP_ARGS(dev, tid, type, line_no, requested_blocks) );
TRACE_EVENT(jbd2_handle_extend, - TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + TP_PROTO(dev_t dev, tid_t tid, unsigned int type, unsigned int line_no, int buffer_credits, int requested_blocks),
@@ -184,7 +184,7 @@ TRACE_EVENT(jbd2_handle_extend,
TP_STRUCT__entry( __field( dev_t, dev ) - __field( unsigned long, tid ) + __field( tid_t, tid ) __field( unsigned int, type ) __field( unsigned int, line_no ) __field( int, buffer_credits ) @@ -200,7 +200,7 @@ TRACE_EVENT(jbd2_handle_extend, __entry->requested_blocks = requested_blocks; ),
- TP_printk("dev %d,%d tid %lu type %u line_no %u " + TP_printk("dev %d,%d tid %u type %u line_no %u " "buffer_credits %d requested_blocks %d", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, __entry->type, __entry->line_no, __entry->buffer_credits, @@ -208,7 +208,7 @@ TRACE_EVENT(jbd2_handle_extend, );
TRACE_EVENT(jbd2_handle_stats, - TP_PROTO(dev_t dev, unsigned long tid, unsigned int type, + TP_PROTO(dev_t dev, tid_t tid, unsigned int type, unsigned int line_no, int interval, int sync, int requested_blocks, int dirtied_blocks),
@@ -217,7 +217,7 @@ TRACE_EVENT(jbd2_handle_stats,
TP_STRUCT__entry( __field( dev_t, dev ) - __field( unsigned long, tid ) + __field( tid_t, tid ) __field( unsigned int, type ) __field( unsigned int, line_no ) __field( int, interval ) @@ -237,7 +237,7 @@ TRACE_EVENT(jbd2_handle_stats, __entry->dirtied_blocks = dirtied_blocks; ),
- TP_printk("dev %d,%d tid %lu type %u line_no %u interval %d " + TP_printk("dev %d,%d tid %u type %u line_no %u interval %d " "sync %d requested_blocks %d dirtied_blocks %d", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, __entry->type, __entry->line_no, __entry->interval, @@ -246,14 +246,14 @@ TRACE_EVENT(jbd2_handle_stats, );
TRACE_EVENT(jbd2_run_stats, - TP_PROTO(dev_t dev, unsigned long tid, + TP_PROTO(dev_t dev, tid_t tid, struct transaction_run_stats_s *stats),
TP_ARGS(dev, tid, stats),
TP_STRUCT__entry( __field( dev_t, dev ) - __field( unsigned long, tid ) + __field( tid_t, tid ) __field( unsigned long, wait ) __field( unsigned long, request_delay ) __field( unsigned long, running ) @@ -279,7 +279,7 @@ TRACE_EVENT(jbd2_run_stats, __entry->blocks_logged = stats->rs_blocks_logged; ),
- TP_printk("dev %d,%d tid %lu wait %u request_delay %u running %u " + TP_printk("dev %d,%d tid %u wait %u request_delay %u running %u " "locked %u flushing %u logging %u handle_count %u " "blocks %u blocks_logged %u", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, @@ -294,14 +294,14 @@ TRACE_EVENT(jbd2_run_stats, );
TRACE_EVENT(jbd2_checkpoint_stats, - TP_PROTO(dev_t dev, unsigned long tid, + TP_PROTO(dev_t dev, tid_t tid, struct transaction_chp_stats_s *stats),
TP_ARGS(dev, tid, stats),
TP_STRUCT__entry( __field( dev_t, dev ) - __field( unsigned long, tid ) + __field( tid_t, tid ) __field( unsigned long, chp_time ) __field( __u32, forced_to_close ) __field( __u32, written ) @@ -317,7 +317,7 @@ TRACE_EVENT(jbd2_checkpoint_stats, __entry->dropped = stats->cs_dropped; ),
- TP_printk("dev %d,%d tid %lu chp_time %u forced_to_close %u " + TP_printk("dev %d,%d tid %u chp_time %u forced_to_close %u " "written %u dropped %u", MAJOR(__entry->dev), MINOR(__entry->dev), __entry->tid, jiffies_to_msecs(__entry->chp_time),
From: Alexander Antonov alexander.antonov@linux.intel.com
commit efe062705d149b20a15498cb999a9edbb8241e6f upstream.
The current implementation of the I/O stacks to PMU mapping doesn't support ICX-D. Detect ICX-D systems and disable the mapping there.
Fixes: 10337e95e04c ("perf/x86/intel/uncore: Enable I/O stacks to IIO PMON mapping on ICX") Signed-off-by: Alexander Antonov alexander.antonov@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Kan Liang kan.liang@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221117122833.3103580-5-alexander.antonov@linux.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/uncore.h | 1 + arch/x86/events/intel/uncore_snbep.c | 5 +++++ 2 files changed, 6 insertions(+)
--- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -2,6 +2,7 @@ #include <linux/slab.h> #include <linux/pci.h> #include <asm/apicdef.h> +#include <asm/intel-family.h> #include <linux/io-64-nonatomic-lo-hi.h>
#include <linux/perf_event.h> --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -5144,6 +5144,11 @@ static int icx_iio_get_topology(struct i
static int icx_iio_set_mapping(struct intel_uncore_type *type) { + /* Detect ICX-D system. This case is not supported */ + if (boot_cpu_data.x86_model == INTEL_FAM6_ICELAKE_D) { + pmu_clear_mapping_attr(type->attr_update, &icx_iio_mapping_group); + return -EPERM; + } return pmu_iio_set_mapping(type, &icx_iio_mapping_group); }
From: Alexander Antonov alexander.antonov@linux.intel.com
commit 6532783310e2b2f50dc13f46c49aa6546cb6e7a3 upstream.
The current clear_attr_update procedure in pmu_set_mapping() sets the attr_update field to NULL, which is not correct because intel_uncore_type pmu types can contain several groups in the attr_update field. For example, the SPR platform already has uncore_alias_group to update, and a UPI topology group will be added in later patches.
Fix the current behavior and clear only the attr_update group related to the mapping.
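The helper the patch adds walks a NULL-terminated array of attribute group pointers, removes the matching entry and shifts the tail down. The same compaction in a standalone sketch (plain strings stand in for the attribute groups):

#include <stdio.h>

/* Same shape as pmu_clear_mapping_attr(): drop one entry from a
 * NULL-terminated array of pointers, keeping the rest in order. */
static void clear_entry(const char **groups, const char *victim)
{
	int i;

	for (i = 0; groups[i]; i++) {
		if (groups[i] == victim) {
			for (i++; groups[i]; i++)
				groups[i - 1] = groups[i];
			groups[i - 1] = NULL;
			break;
		}
	}
}

int main(void)
{
	const char *alias = "uncore_alias_group";
	const char *iio = "iio_mapping_group";
	const char *groups[] = { alias, iio, NULL };

	clear_entry(groups, iio);

	for (int i = 0; groups[i]; i++)
		printf("%s\n", groups[i]);	/* only uncore_alias_group is left */
	return 0;
}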
Fixes: bb42b3d39781 ("perf/x86/intel/uncore: Expose an Uncore unit to IIO PMON mapping") Signed-off-by: Alexander Antonov alexander.antonov@linux.intel.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Kan Liang kan.liang@linux.intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221117122833.3103580-4-alexander.antonov@linux.i... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/events/intel/uncore_snbep.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
--- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -3804,6 +3804,21 @@ static const struct attribute_group *skx NULL, };
+static void pmu_clear_mapping_attr(const struct attribute_group **groups, + struct attribute_group *ag) +{ + int i; + + for (i = 0; groups[i]; i++) { + if (groups[i] == ag) { + for (i++; groups[i]; i++) + groups[i - 1] = groups[i]; + groups[i - 1] = NULL; + break; + } + } +} + static int pmu_iio_set_mapping(struct intel_uncore_type *type, struct attribute_group *ag) { @@ -3852,7 +3867,7 @@ clear_attrs: clear_topology: kfree(type->topology); clear_attr_update: - type->attr_update = NULL; + pmu_clear_mapping_attr(type->attr_update, ag); return ret; }
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit 9905370560d9c29adc15f4937c5a0c0dac05f0b4 upstream.
The pin configuration (done with generic pin controller helpers and as expressed by the bindings) requires child nodes with either: 1. a "pins" property and the actual configuration, or 2. another set of nodes which themselves follow point 1.
The qup_spi2_default pin configuration already uses the second method with a "pinmux" child, so configure the drive-strength similarly in a "pinconf" child. Otherwise the pin drive strength would not be applied.
Fixes: 8d23a0040475 ("arm64: dts: qcom: db845c: add Low speed expansion i2c and spi nodes") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Reviewed-by: Douglas Anderson dianders@chromium.org Reviewed-by: Neil Armstrong neil.armstrong@linaro.org Signed-off-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20221010114417.29859-2-krzysztof.kozlowski@linaro.... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts +++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts @@ -1045,7 +1045,10 @@
/* PINCTRL - additions to nodes defined in sdm845.dtsi */ &qup_spi2_default { - drive-strength = <16>; + pinconf { + pins = "gpio27", "gpio28", "gpio29", "gpio30"; + drive-strength = <16>; + }; };
&qup_uart3_default{
From: Wenchao Chen wenchao.chen@unisoc.com
commit ff874dbc4f868af128b412a9bd92637103cf11d7 upstream.
When the clock is less than 400K, some SD cards fail to initialize because CLK_AUTO is enabled.
Fixes: fb8bd90f83c4 ("mmc: sdhci-sprd: Add Spreadtrum's initial host controller") Signed-off-by: Wenchao Chen wenchao.chen@unisoc.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221207051909.32126-1-wenchao.chen@unisoc.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/sdhci-sprd.c | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-)
--- a/drivers/mmc/host/sdhci-sprd.c +++ b/drivers/mmc/host/sdhci-sprd.c @@ -224,13 +224,15 @@ static inline void _sdhci_sprd_set_clock div = ((div & 0x300) >> 2) | ((div & 0xFF) << 8); sdhci_enable_clk(host, div);
- /* enable auto gate sdhc_enable_auto_gate */ - val = sdhci_readl(host, SDHCI_SPRD_REG_32_BUSY_POSI); - mask = SDHCI_SPRD_BIT_OUTR_CLK_AUTO_EN | - SDHCI_SPRD_BIT_INNR_CLK_AUTO_EN; - if (mask != (val & mask)) { - val |= mask; - sdhci_writel(host, val, SDHCI_SPRD_REG_32_BUSY_POSI); + /* Enable CLK_AUTO when the clock is greater than 400K. */ + if (clk > 400000) { + val = sdhci_readl(host, SDHCI_SPRD_REG_32_BUSY_POSI); + mask = SDHCI_SPRD_BIT_OUTR_CLK_AUTO_EN | + SDHCI_SPRD_BIT_INNR_CLK_AUTO_EN; + if (mask != (val & mask)) { + val |= mask; + sdhci_writel(host, val, SDHCI_SPRD_REG_32_BUSY_POSI); + } } }
From: Boris Burkov boris@bur.io
commit 560840afc3e63bbe5d9c5ef6b2ecf8f3589adff6 upstream.
If a file consists of an inline extent followed by a regular or prealloc extent, then a legitimate attempt to resolve a logical address in the non-inline region will result in add_all_parents reading the invalid offset field of the inline extent. If the inline extent item is placed in the leaf eb s.t. it is the first item, attempting to access the offset field will not only be meaningless, it will go past the end of the eb and cause this panic:
[17.626048] BTRFS warning (device dm-2): bad eb member end: ptr 0x3fd4 start 30834688 member offset 16377 size 8 [17.631693] general protection fault, probably for non-canonical address 0x5088000000000: 0000 [#1] SMP PTI [17.635041] CPU: 2 PID: 1267 Comm: btrfs Not tainted 5.12.0-07246-g75175d5adc74-dirty #199 [17.637969] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [17.641995] RIP: 0010:btrfs_get_64+0xe7/0x110 [17.649890] RSP: 0018:ffffc90001f73a08 EFLAGS: 00010202 [17.651652] RAX: 0000000000000001 RBX: ffff88810c42d000 RCX: 0000000000000000 [17.653921] RDX: 0005088000000000 RSI: ffffc90001f73a0f RDI: 0000000000000001 [17.656174] RBP: 0000000000000ff9 R08: 0000000000000007 R09: c0000000fffeffff [17.658441] R10: ffffc90001f73790 R11: ffffc90001f73788 R12: ffff888106afe918 [17.661070] R13: 0000000000003fd4 R14: 0000000000003f6f R15: cdcdcdcdcdcdcdcd [17.663617] FS: 00007f64e7627d80(0000) GS:ffff888237c80000(0000) knlGS:0000000000000000 [17.666525] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [17.668664] CR2: 000055d4a39152e8 CR3: 000000010c596002 CR4: 0000000000770ee0 [17.671253] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [17.673634] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [17.676034] PKRU: 55555554 [17.677004] Call Trace: [17.677877] add_all_parents+0x276/0x480 [17.679325] find_parent_nodes+0xfae/0x1590 [17.680771] btrfs_find_all_leafs+0x5e/0xa0 [17.682217] iterate_extent_inodes+0xce/0x260 [17.683809] ? btrfs_inode_flags_to_xflags+0x50/0x50 [17.685597] ? iterate_inodes_from_logical+0xa1/0xd0 [17.687404] iterate_inodes_from_logical+0xa1/0xd0 [17.689121] ? btrfs_inode_flags_to_xflags+0x50/0x50 [17.691010] btrfs_ioctl_logical_to_ino+0x131/0x190 [17.692946] btrfs_ioctl+0x104a/0x2f60 [17.694384] ? selinux_file_ioctl+0x182/0x220 [17.695995] ? __x64_sys_ioctl+0x84/0xc0 [17.697394] __x64_sys_ioctl+0x84/0xc0 [17.698697] do_syscall_64+0x33/0x40 [17.700017] entry_SYSCALL_64_after_hwframe+0x44/0xae [17.701753] RIP: 0033:0x7f64e72761b7 [17.709355] RSP: 002b:00007ffefb067f58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [17.712088] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f64e72761b7 [17.714667] RDX: 00007ffefb067fb0 RSI: 00000000c0389424 RDI: 0000000000000003 [17.717386] RBP: 00007ffefb06d188 R08: 000055d4a390d2b0 R09: 00007f64e7340a60 [17.719938] R10: 0000000000000231 R11: 0000000000000246 R12: 0000000000000001 [17.722383] R13: 0000000000000000 R14: 00000000c0389424 R15: 000055d4a38fd2a0 [17.724839] Modules linked in:
Fix the bug by detecting the inline extent item in add_all_parents and skipping to the next extent item.
CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Qu Wenruo wqu@suse.com Signed-off-by: Boris Burkov boris@bur.io Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/backref.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/fs/btrfs/backref.c +++ b/fs/btrfs/backref.c @@ -433,6 +433,7 @@ static int add_all_parents(struct btrfs_ u64 wanted_disk_byte = ref->wanted_disk_byte; u64 count = 0; u64 data_offset; + u8 type;
if (level != 0) { eb = path->nodes[level]; @@ -487,6 +488,9 @@ static int add_all_parents(struct btrfs_ continue; } fi = btrfs_item_ptr(eb, slot, struct btrfs_file_extent_item); + type = btrfs_file_extent_type(eb, fi); + if (type == BTRFS_FILE_EXTENT_INLINE) + goto next; disk_byte = btrfs_file_extent_disk_bytenr(eb, fi); data_offset = btrfs_file_extent_offset(eb, fi);
From: Jason A. Donenfeld Jason@zx2c4.com
commit 65b0e307a1a9193571db12910f382f84195a3d29 upstream.
Sparse reports that calling add_device_randomness() on `uid` is a violation of address spaces. And indeed the next usage uses readl() properly, but that was left out when passing it to add_device_randomness(). So instead copy the whole thing to the stack first.
Fixes: 4040d10a3d44 ("ARM: ux500: add DB serial number to entropy pool") Cc: Linus Walleij linus.walleij@linaro.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/202210230819.loF90KDh-lkp@intel.com/ Reported-by: kernel test robot lkp@intel.com Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Link: https://lore.kernel.org/r/20221108123755.207438-1-Jason@zx2c4.com Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/soc/ux500/ux500-soc-id.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
--- a/drivers/soc/ux500/ux500-soc-id.c +++ b/drivers/soc/ux500/ux500-soc-id.c @@ -167,20 +167,18 @@ ATTRIBUTE_GROUPS(ux500_soc); static const char *db8500_read_soc_id(struct device_node *backupram) { void __iomem *base; - void __iomem *uid; const char *retstr; + u32 uid[5];
base = of_iomap(backupram, 0); if (!base) return NULL; - uid = base + 0x1fc0; + memcpy_fromio(uid, base + 0x1fc0, sizeof(uid));
/* Throw these device-specific numbers into the entropy pool */ - add_device_randomness(uid, 0x14); + add_device_randomness(uid, sizeof(uid)); retstr = kasprintf(GFP_KERNEL, "%08x%08x%08x%08x%08x", - readl((u32 *)uid+0), - readl((u32 *)uid+1), readl((u32 *)uid+2), - readl((u32 *)uid+3), readl((u32 *)uid+4)); + uid[0], uid[1], uid[2], uid[3], uid[4]); iounmap(base); return retstr; }
From: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org
commit fd49776d8f458bba5499384131eddc0b8bcaf50c upstream.
The pin configuration (done with generic pin controller helpers and as expressed by the bindings) requires child nodes with either: 1. a "pins" property and the actual configuration, or 2. another set of nodes which themselves follow point 1.
The qup_i2c12_default pin configuration used the second method - with a "pinmux" child.
Fixes: 44acee207844 ("arm64: dts: qcom: Add Lenovo Yoga C630") Cc: stable@vger.kernel.org Signed-off-by: Krzysztof Kozlowski krzysztof.kozlowski@linaro.org Tested-by: Steev Klimaszewski steev@kali.org Reviewed-by: Konrad Dybcio konrad.dybcio@somainline.org Signed-off-by: Bjorn Andersson andersson@kernel.org Link: https://lore.kernel.org/r/20220930192039.240486-1-krzysztof.kozlowski@linaro... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
--- a/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts +++ b/arch/arm64/boot/dts/qcom/sdm850-lenovo-yoga-c630.dts @@ -475,8 +475,10 @@ };
&qup_i2c12_default { - drive-strength = <2>; - bias-disable; + pinmux { + drive-strength = <2>; + bias-disable; + }; };
&qup_uart6_default {
From: Mickaël Salaün mic@digikod.net
commit de3ee3f63400a23954e7c1ad1cb8c20f29ab6fe3 upstream.
This change makes it possible to extend CFLAGS and LDFLAGS from the command line, e.g. to enable additional compiler checks: make USERCFLAGS=-Werror USERLDFLAGS=-static
USERCFLAGS and USERLDFLAGS are documented in Documentation/kbuild/makefiles.rst and Documentation/kbuild/kbuild.rst
This should be backported (down to 5.10) to improve testing of previous kernel versions as well.
Cc: Shuah Khan skhan@linuxfoundation.org Cc: stable@vger.kernel.org Signed-off-by: Mickaël Salaün mic@digikod.net Link: https://lore.kernel.org/r/20220909103901.1503436-1-mic@digikod.net Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/lib.mk | 5 +++++ 1 file changed, 5 insertions(+)
--- a/tools/testing/selftests/lib.mk +++ b/tools/testing/selftests/lib.mk @@ -129,6 +129,11 @@ endef clean: $(CLEAN)
+# Enables to extend CFLAGS and LDFLAGS from command line, e.g. +# make USERCFLAGS=-Werror USERLDFLAGS=-static +CFLAGS += $(USERCFLAGS) +LDFLAGS += $(USERLDFLAGS) + # When make O= with kselftest target from main level # the following aren't defined. #
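As a usage example, the kselftest build can then be driven with extra flags straight from the command line; the TARGETS value below is only an illustration:

  make -C tools/testing/selftests TARGETS=timers USERCFLAGS=-Werror USERLDFLAGS=-static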
From: Kant Fan kant@allwinnertech.com
commit 5fdded8448924e3631d466eea499b11606c43640 upstream.
The member void *data in the devfreq structure can be overwritten by governor_userspace. For example:

1. The device driver assigns the devfreq governor to simple_ondemand via devfreq_add_device() and, in the same call, initializes the devfreq member void *data to point at a static struct devfreq_simple_ondemand_data.
2. The user changes the devfreq governor to userspace with the command "echo userspace > /sys/class/devfreq/.../governor".
3. The userspace governor allocates dynamic memory for struct userspace_data and assigns the devfreq member void *data to this memory in userspace_init().
4. The user changes the devfreq governor back to simple_ondemand with the command "echo simple_ondemand > /sys/class/devfreq/.../governor".
5. The userspace governor exits and sets the devfreq member void *data to NULL in userspace_exit().
6. The simple_ondemand governor fetches the static devfreq_simple_ondemand_data in devfreq_simple_ondemand_func(), but the devfreq member void *data was set to NULL by userspace_exit().
7. The upthreshold and downdifferential information is lost and the simple_ondemand governor can't work correctly.
The member void *data in the devfreq structure is designed to hold a static pointer used by a governor and is initialized by devfreq_add_device(). This patch adds an element named governor_data to the devfreq structure, which can be used by a governor (e.g. userspace) that wants to attach private data.
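For context, a minimal sketch of how a driver hands such static governor data to the devfreq core; the device, profile and threshold values here are made-up placeholders, only the API calls are real:

  #include <linux/devfreq.h>
  #include <linux/err.h>

  static struct devfreq_simple_ondemand_data my_gov_data = {
          .upthreshold            = 90,   /* illustrative thresholds */
          .downdifferential       = 10,
  };

  /* Hypothetical probe path: "my_profile" is assumed to be a filled-in
   * struct devfreq_dev_profile. The pointer passed as @data is stored in
   * devfreq->data for governors to read; with this patch a governor's own
   * private state (e.g. the userspace governor's allocation) lives in
   * devfreq->governor_data instead, so switching governors no longer
   * clobbers the driver-provided pointer. */
  static int my_probe(struct device *dev, struct devfreq_dev_profile *my_profile)
  {
          struct devfreq *df;

          df = devfreq_add_device(dev, my_profile,
                                  DEVFREQ_GOV_SIMPLE_ONDEMAND,
                                  &my_gov_data);
          return IS_ERR(df) ? PTR_ERR(df) : 0;
  }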
Fixes: ce26c5bb9569 ("PM / devfreq: Add basic governors") Cc: stable@vger.kernel.org # 5.10+ Reviewed-by: Chanwoo Choi cwchoi00@gmail.com Acked-by: MyungJoo Ham myungjoo.ham@samsung.com Signed-off-by: Kant Fan kant@allwinnertech.com Signed-off-by: Chanwoo Choi cw00.choi@samsung.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/devfreq/devfreq.c | 6 ++---- drivers/devfreq/governor_userspace.c | 12 ++++++------ include/linux/devfreq.h | 7 ++++--- 3 files changed, 12 insertions(+), 13 deletions(-)
--- a/drivers/devfreq/devfreq.c +++ b/drivers/devfreq/devfreq.c @@ -775,8 +775,7 @@ static void remove_sysfs_files(struct de * @dev: the device to add devfreq feature. * @profile: device-specific profile to run devfreq. * @governor_name: name of the policy to choose frequency. - * @data: private data for the governor. The devfreq framework does not - * touch this value. + * @data: devfreq driver pass to governors, governor should not change it. */ struct devfreq *devfreq_add_device(struct device *dev, struct devfreq_dev_profile *profile, @@ -1003,8 +1002,7 @@ static void devm_devfreq_dev_release(str * @dev: the device to add devfreq feature. * @profile: device-specific profile to run devfreq. * @governor_name: name of the policy to choose frequency. - * @data: private data for the governor. The devfreq framework does not - * touch this value. + * @data: devfreq driver pass to governors, governor should not change it. * * This function manages automatically the memory of devfreq device using device * resource management and simplify the free operation for memory of devfreq --- a/drivers/devfreq/governor_userspace.c +++ b/drivers/devfreq/governor_userspace.c @@ -21,7 +21,7 @@ struct userspace_data {
static int devfreq_userspace_func(struct devfreq *df, unsigned long *freq) { - struct userspace_data *data = df->data; + struct userspace_data *data = df->governor_data;
if (data->valid) *freq = data->user_frequency; @@ -40,7 +40,7 @@ static ssize_t set_freq_store(struct dev int err = 0;
mutex_lock(&devfreq->lock); - data = devfreq->data; + data = devfreq->governor_data;
sscanf(buf, "%lu", &wanted); data->user_frequency = wanted; @@ -60,7 +60,7 @@ static ssize_t set_freq_show(struct devi int err = 0;
mutex_lock(&devfreq->lock); - data = devfreq->data; + data = devfreq->governor_data;
if (data->valid) err = sprintf(buf, "%lu\n", data->user_frequency); @@ -91,7 +91,7 @@ static int userspace_init(struct devfreq goto out; } data->valid = false; - devfreq->data = data; + devfreq->governor_data = data;
err = sysfs_create_group(&devfreq->dev.kobj, &dev_attr_group); out: @@ -107,8 +107,8 @@ static void userspace_exit(struct devfre if (devfreq->dev.kobj.sd) sysfs_remove_group(&devfreq->dev.kobj, &dev_attr_group);
- kfree(devfreq->data); - devfreq->data = NULL; + kfree(devfreq->governor_data); + devfreq->governor_data = NULL; }
static int devfreq_userspace_handler(struct devfreq *devfreq, --- a/include/linux/devfreq.h +++ b/include/linux/devfreq.h @@ -149,8 +149,8 @@ struct devfreq_stats { * @work: delayed work for load monitoring. * @previous_freq: previously configured frequency value. * @last_status: devfreq user device info, performance statistics - * @data: Private data of the governor. The devfreq framework does not - * touch this. + * @data: devfreq driver pass to governors, governor should not change it. + * @governor_data: private data for governors, devfreq core doesn't touch it. * @user_min_freq_req: PM QoS minimum frequency request from user (via sysfs) * @user_max_freq_req: PM QoS maximum frequency request from user (via sysfs) * @scaling_min_freq: Limit minimum frequency requested by OPP interface @@ -187,7 +187,8 @@ struct devfreq { unsigned long previous_freq; struct devfreq_dev_status last_status;
- void *data; /* private data for governors */ + void *data; + void *governor_data;
struct dev_pm_qos_request user_min_freq_req; struct dev_pm_qos_request user_max_freq_req;
From: Yongqiang Liu liuyongqiang13@huawei.com
commit 5c51054896bcce1d33d39fead2af73fec24f40b6 upstream.
In cpufreq_policy_alloc(), an uninitialized completion is used by cpufreq_sysfs_release() when kobject_init_and_add() fails. That causes a crash such as the following page fault in complete():
BUG: unable to handle page fault for address: fffffffffffffff8 [..] RIP: 0010:complete+0x98/0x1f0 [..] Call Trace: kobject_put+0x1be/0x4c0 cpufreq_online.cold+0xee/0x1fd cpufreq_add_dev+0x183/0x1e0 subsys_interface_register+0x3f5/0x4e0 cpufreq_register_driver+0x3b7/0x670 acpi_cpufreq_init+0x56c/0x1000 [acpi_cpufreq] do_one_initcall+0x13d/0x780 do_init_module+0x1c3/0x630 load_module+0x6e67/0x73b0 __do_sys_finit_module+0x181/0x240 do_syscall_64+0x35/0x80 entry_SYSCALL_64_after_hwframe+0x63/0xcd
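For reference, the kobject release callback that runs on this error path looks roughly like the following (condensed from drivers/cpufreq/cpufreq.c); it is why the completion must be initialized before kobject_init_and_add() can fail and trigger kobject_put():

  static void cpufreq_sysfs_release(struct kobject *kobj)
  {
          struct cpufreq_policy *policy = container_of(kobj, struct cpufreq_policy, kobj);

          /* Faults if init_completion() was never called on kobj_unregister. */
          complete(&policy->kobj_unregister);
  }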
Fixes: 4ebe36c94aed ("cpufreq: Fix kobject memleak") Signed-off-by: Yongqiang Liu liuyongqiang13@huawei.com Acked-by: Viresh Kumar viresh.kumar@linaro.org Cc: 5.2+ stable@vger.kernel.org # 5.2+ Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/cpufreq/cpufreq.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1212,6 +1212,7 @@ static struct cpufreq_policy *cpufreq_po if (!zalloc_cpumask_var(&policy->real_cpus, GFP_KERNEL)) goto err_free_rcpumask;
+ init_completion(&policy->kobj_unregister); ret = kobject_init_and_add(&policy->kobj, &ktype_cpufreq, cpufreq_global_kobject, "policy%u", cpu); if (ret) { @@ -1250,7 +1251,6 @@ static struct cpufreq_policy *cpufreq_po init_rwsem(&policy->rwsem); spin_lock_init(&policy->transition_lock); init_waitqueue_head(&policy->transition_wait); - init_completion(&policy->kobj_unregister); INIT_WORK(&policy->update, handle_update);
policy->cpu = cpu;
From: Philipp Jungkamp p.jungkamp@gmx.net
[ Upstream commit 2912cdda734d9136615ed05636d9fcbca2a7a3c5 ]
The Dell Inspiron Plus 16, in both laptop and 2-in-1 form factors, has top speakers connected on NID 0x17, which the codec reports as unconnected. These speakers should be connected to the DAC on NID 0x03.
Signed-off-by: Philipp Jungkamp p.jungkamp@gmx.net Link: https://lore.kernel.org/r/20221205163713.7476-1-p.jungkamp@gmx.net Signed-off-by: Takashi Iwai tiwai@suse.de Stable-dep-of: a4517c4f3423 ("ALSA: hda/realtek: Apply dual codec fixup for Dell Latitude laptops") Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 37 +++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 79c65da1b4ee..f74c49987f1a 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6709,6 +6709,34 @@ static void alc256_fixup_mic_no_presence_and_resume(struct hda_codec *codec, } }
+static void alc295_fixup_dell_inspiron_top_speakers(struct hda_codec *codec, + const struct hda_fixup *fix, int action) +{ + static const struct hda_pintbl pincfgs[] = { + { 0x14, 0x90170151 }, + { 0x17, 0x90170150 }, + { } + }; + static const hda_nid_t conn[] = { 0x02, 0x03 }; + static const hda_nid_t preferred_pairs[] = { + 0x14, 0x02, + 0x17, 0x03, + 0x21, 0x02, + 0 + }; + struct alc_spec *spec = codec->spec; + + alc_fixup_no_shutup(codec, fix, action); + + switch (action) { + case HDA_FIXUP_ACT_PRE_PROBE: + snd_hda_apply_pincfgs(codec, pincfgs); + snd_hda_override_conn_list(codec, 0x17, ARRAY_SIZE(conn), conn); + spec->gen.preferred_dacs = preferred_pairs; + break; + } +} + enum { ALC269_FIXUP_GPIO2, ALC269_FIXUP_SONY_VAIO, @@ -6940,6 +6968,7 @@ enum { ALC285_FIXUP_LEGION_Y9000X_SPEAKERS, ALC285_FIXUP_LEGION_Y9000X_AUTOMUTE, ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED, + ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS, };
/* A special fixup for Lenovo C940 and Yoga Duet 7; @@ -8766,6 +8795,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC285_FIXUP_HP_MUTE_LED, }, + [ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc295_fixup_dell_inspiron_top_speakers, + .chained = true, + .chain_id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + }, };
static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -8865,6 +8900,8 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x0a9e, "Dell Latitude 5430", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x0b19, "Dell XPS 15 9520", ALC289_FIXUP_DUAL_SPK), SND_PCI_QUIRK(0x1028, 0x0b1a, "Dell Precision 5570", ALC289_FIXUP_DUAL_SPK), + SND_PCI_QUIRK(0x1028, 0x0b37, "Dell Inspiron 16 Plus 7620 2-in-1", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), + SND_PCI_QUIRK(0x1028, 0x0b71, "Dell Inspiron 16 Plus 7620", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x164a, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x164b, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2),
From: Chris Chiu chris.chiu@canonical.com
[ Upstream commit a4517c4f3423c7c448f2c359218f97c1173523a1 ]
The Dell Latitude 3340/3440/3540 laptops with Realtek ALC3204 have dual codecs and need the ALC1220_FIXUP_GB_DUAL_CODECS to fix the conflicts of Master controls. The existing headset mic fixup for Dell is also required to enable the jack sense and the headset mic.
Introduce a new fixup to fix the dual codec and headset mic issues for these particular Dell laptops, since other older Dell laptops with the same codec configuration are already handled well by the existing fixup in alc269_fallback_pin_fixup_tbl[].
Signed-off-by: Chris Chiu chris.chiu@canonical.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221226114303.4027500-1-chris.chiu@canonical.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/pci/hda/patch_realtek.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index f74c49987f1a..642e212278ac 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -6969,6 +6969,7 @@ enum { ALC285_FIXUP_LEGION_Y9000X_AUTOMUTE, ALC285_FIXUP_HP_SPEAKERS_MICMUTE_LED, ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS, + ALC236_FIXUP_DELL_DUAL_CODECS, };
/* A special fixup for Lenovo C940 and Yoga Duet 7; @@ -8801,6 +8802,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, }, + [ALC236_FIXUP_DELL_DUAL_CODECS] = { + .type = HDA_FIXUP_PINS, + .v.func = alc1220_fixup_gb_dual_codecs, + .chained = true, + .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + }, };
static const struct snd_pci_quirk alc269_fixup_tbl[] = { @@ -8902,6 +8909,12 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1028, 0x0b1a, "Dell Precision 5570", ALC289_FIXUP_DUAL_SPK), SND_PCI_QUIRK(0x1028, 0x0b37, "Dell Inspiron 16 Plus 7620 2-in-1", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), SND_PCI_QUIRK(0x1028, 0x0b71, "Dell Inspiron 16 Plus 7620", ALC295_FIXUP_DELL_INSPIRON_TOP_SPEAKERS), + SND_PCI_QUIRK(0x1028, 0x0c19, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS), + SND_PCI_QUIRK(0x1028, 0x0c1a, "Dell Precision 3340", ALC236_FIXUP_DELL_DUAL_CODECS), + SND_PCI_QUIRK(0x1028, 0x0c1b, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS), + SND_PCI_QUIRK(0x1028, 0x0c1c, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS), + SND_PCI_QUIRK(0x1028, 0x0c1d, "Dell Precision 3440", ALC236_FIXUP_DELL_DUAL_CODECS), + SND_PCI_QUIRK(0x1028, 0x0c1e, "Dell Precision 3540", ALC236_FIXUP_DELL_DUAL_CODECS), SND_PCI_QUIRK(0x1028, 0x164a, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x1028, 0x164b, "Dell", ALC293_FIXUP_DELL1_MIC_NO_PRESENCE), SND_PCI_QUIRK(0x103c, 0x1586, "HP", ALC269_FIXUP_HP_MUTE_LED_MIC2),
From: Alexander Aring aahringo@redhat.com
commit 08ae0547e75ec3d062b6b6b9cf4830c730df68df upstream.
This patch fixes a double sock_release() call when listen() fails for the dlm lowcomms listen socket. The caller of dlm_listen_for_all() should never have to release the socket if dlm_listen_for_all() fails; the release is now done only once, where listen() fails.
Cc: stable@vger.kernel.org Fixes: 2dc6b1158c28 ("fs: dlm: introduce generic listen") Signed-off-by: Alexander Aring aahringo@redhat.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/dlm/lowcomms.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1797,7 +1797,7 @@ static int dlm_listen_for_all(void) result = sock->ops->listen(sock, 5); if (result < 0) { dlm_close_sock(&listen_con.sock); - goto out; + return result; }
return 0; @@ -2000,7 +2000,6 @@ fail_listen: dlm_proto_ops = NULL; fail_proto_ops: dlm_allow_conn = 0; - dlm_close_sock(&listen_con.sock); work_stop(); fail_local: deinit_local();
From: Alexander Aring aahringo@redhat.com
commit f0f4bb431bd543ed7bebbaea3ce326cfcd5388bc upstream.
This patch fixes a race where we get a socket data-ready event twice while the listen connection worker is queued. Currently it will be served only once, but we need to keep accepting (in this case twice) until we hit -EAGAIN, which tells us there is no pending accept going on.

This patch wraps the accept in a do-while loop until we receive a return value different from 0, as was done before commit d11ccd451b65 ("fs: dlm: listen socket out of connection hash").
Cc: stable@vger.kernel.org Fixes: d11ccd451b65 ("fs: dlm: listen socket out of connection hash") Signed-off-by: Alexander Aring aahringo@redhat.com Signed-off-by: David Teigland teigland@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/dlm/lowcomms.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1520,7 +1520,11 @@ static void process_recv_sockets(struct
static void process_listen_recv_socket(struct work_struct *work) { - accept_from_sock(&listen_con); + int ret; + + do { + ret = accept_from_sock(&listen_con); + } while (!ret); }
static void dlm_connect(struct connection *con)
From: Florian Westphal fw@strlen.de
commit 51fa7f8ebf0e25c7a9039fa3988a623d5f3855aa upstream.
These structures are initialised from the init hooks, so we can't make them 'const'. But no writes occur afterwards, so we can use ro_after_init.
Also, remove the bogus EXPORT_SYMBOL; the only access comes from the IP stack, not from kernel modules.
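As a minimal illustration of the __ro_after_init pattern (a hypothetical example, not code from this patch): the variable may be written while the kernel is still in its init phase; afterwards the containing section is remapped read-only (on architectures with strict kernel RWX) and any later store would fault:

  #include <linux/cache.h>
  #include <linux/init.h>
  #include <net/tcp.h>

  /* Placed in .data..ro_after_init; writable only until mark_rodata_ro(). */
  static struct inet_connection_sock_af_ops example_af_ops __ro_after_init;

  static int __init example_init(void)
  {
          example_af_ops = ipv4_specific; /* fine: still in the init phase */
          return 0;
  }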
Signed-off-by: Florian Westphal fw@strlen.de Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/subflow.c | 15 +++++++-------- 1 file changed, 7 insertions(+), 8 deletions(-)
--- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -484,8 +484,7 @@ do_reset: }
struct request_sock_ops mptcp_subflow_request_sock_ops; -EXPORT_SYMBOL_GPL(mptcp_subflow_request_sock_ops); -static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops; +static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops __ro_after_init;
static int subflow_v4_conn_request(struct sock *sk, struct sk_buff *skb) { @@ -506,9 +505,9 @@ drop: }
#if IS_ENABLED(CONFIG_MPTCP_IPV6) -static struct tcp_request_sock_ops subflow_request_sock_ipv6_ops; -static struct inet_connection_sock_af_ops subflow_v6_specific; -static struct inet_connection_sock_af_ops subflow_v6m_specific; +static struct tcp_request_sock_ops subflow_request_sock_ipv6_ops __ro_after_init; +static struct inet_connection_sock_af_ops subflow_v6_specific __ro_after_init; +static struct inet_connection_sock_af_ops subflow_v6m_specific __ro_after_init; static struct proto tcpv6_prot_override;
static int subflow_v6_conn_request(struct sock *sk, struct sk_buff *skb) @@ -790,7 +789,7 @@ dispose_child: return child; }
-static struct inet_connection_sock_af_ops subflow_specific; +static struct inet_connection_sock_af_ops subflow_specific __ro_after_init; static struct proto tcp_prot_override;
enum mapping_status { @@ -1327,7 +1326,7 @@ static void subflow_write_space(struct s mptcp_write_space(sk); }
-static struct inet_connection_sock_af_ops * +static const struct inet_connection_sock_af_ops * subflow_default_af_ops(struct sock *sk) { #if IS_ENABLED(CONFIG_MPTCP_IPV6) @@ -1342,7 +1341,7 @@ void mptcpv6_handle_mapped(struct sock * { struct mptcp_subflow_context *subflow = mptcp_subflow_ctx(sk); struct inet_connection_sock *icsk = inet_csk(sk); - struct inet_connection_sock_af_ops *target; + const struct inet_connection_sock_af_ops *target;
target = mapped ? &subflow_v6m_specific : subflow_default_af_ops(sk);
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 3fff88186f047627bb128d65155f42517f8e448f upstream.
To ease maintenance, it is often recommended to avoid #ifdef preprocessor conditionals.

Here, the section related to CONFIG_MPTCP was quite short, but the next commit needs to add more code around it. It is then cleaner to move the MPTCP-specific code to functions located in the net/mptcp directory.

Now that the mptcp_subflow_request_sock_ops structure can be static, it can also be marked as "read only after init".
Suggested-by: Paolo Abeni pabeni@redhat.com Reviewed-by: Mat Martineau mathew.j.martineau@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- include/net/mptcp.h | 12 ++++++++++-- net/ipv4/syncookies.c | 7 +++---- net/mptcp/subflow.c | 12 +++++++++++- 3 files changed, 24 insertions(+), 7 deletions(-)
--- a/include/net/mptcp.h +++ b/include/net/mptcp.h @@ -93,8 +93,6 @@ struct mptcp_out_options { };
#ifdef CONFIG_MPTCP -extern struct request_sock_ops mptcp_subflow_request_sock_ops; - void mptcp_init(void);
static inline bool sk_is_mptcp(const struct sock *sk) @@ -182,6 +180,9 @@ void mptcp_seq_show(struct seq_file *seq int mptcp_subflow_init_cookie_req(struct request_sock *req, const struct sock *sk_listener, struct sk_buff *skb); +struct request_sock *mptcp_subflow_reqsk_alloc(const struct request_sock_ops *ops, + struct sock *sk_listener, + bool attach_listener);
__be32 mptcp_get_reset_option(const struct sk_buff *skb);
@@ -274,6 +275,13 @@ static inline int mptcp_subflow_init_coo return 0; /* TCP fallback */ }
+static inline struct request_sock *mptcp_subflow_reqsk_alloc(const struct request_sock_ops *ops, + struct sock *sk_listener, + bool attach_listener) +{ + return NULL; +} + static inline __be32 mptcp_reset_option(const struct sk_buff *skb) { return htonl(0u); } #endif /* CONFIG_MPTCP */
--- a/net/ipv4/syncookies.c +++ b/net/ipv4/syncookies.c @@ -290,12 +290,11 @@ struct request_sock *cookie_tcp_reqsk_al struct tcp_request_sock *treq; struct request_sock *req;
-#ifdef CONFIG_MPTCP if (sk_is_mptcp(sk)) - ops = &mptcp_subflow_request_sock_ops; -#endif + req = mptcp_subflow_reqsk_alloc(ops, sk, false); + else + req = inet_reqsk_alloc(ops, sk, false);
- req = inet_reqsk_alloc(ops, sk, false); if (!req) return NULL;
--- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -483,7 +483,7 @@ do_reset: mptcp_subflow_reset(sk); }
-struct request_sock_ops mptcp_subflow_request_sock_ops; +static struct request_sock_ops mptcp_subflow_request_sock_ops __ro_after_init; static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops __ro_after_init;
static int subflow_v4_conn_request(struct sock *sk, struct sk_buff *skb) @@ -536,6 +536,16 @@ drop: } #endif
+struct request_sock *mptcp_subflow_reqsk_alloc(const struct request_sock_ops *ops, + struct sock *sk_listener, + bool attach_listener) +{ + ops = &mptcp_subflow_request_sock_ops; + + return inet_reqsk_alloc(ops, sk_listener, attach_listener); +} +EXPORT_SYMBOL(mptcp_subflow_reqsk_alloc); + /* validate hmac received in third ACK */ static bool subflow_hmac_valid(const struct request_sock *req, const struct mptcp_options_received *mp_opt)
From: Mike Snitzer snitzer@kernel.org
commit 352b837a5541690d4f843819028cf2b8be83d424 upstream.
Apply to DM-cache's metadata the same ABBA deadlock fix made in commit 4b60f452ec51 ("dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata").
Reported-by: Zhihao Cheng chengzhihao1@huawei.com Cc: stable@vger.kernel.org Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag") Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-cache-metadata.c | 54 +++++++++++++++++++++++++++++++++++------ 1 file changed, 47 insertions(+), 7 deletions(-)
--- a/drivers/md/dm-cache-metadata.c +++ b/drivers/md/dm-cache-metadata.c @@ -551,11 +551,13 @@ static int __create_persistent_data_obje return r; }
-static void __destroy_persistent_data_objects(struct dm_cache_metadata *cmd) +static void __destroy_persistent_data_objects(struct dm_cache_metadata *cmd, + bool destroy_bm) { dm_sm_destroy(cmd->metadata_sm); dm_tm_destroy(cmd->tm); - dm_block_manager_destroy(cmd->bm); + if (destroy_bm) + dm_block_manager_destroy(cmd->bm); }
typedef unsigned long (*flags_mutator)(unsigned long); @@ -826,7 +828,7 @@ static struct dm_cache_metadata *lookup_ cmd2 = lookup(bdev); if (cmd2) { mutex_unlock(&table_lock); - __destroy_persistent_data_objects(cmd); + __destroy_persistent_data_objects(cmd, true); kfree(cmd); return cmd2; } @@ -874,7 +876,7 @@ void dm_cache_metadata_close(struct dm_c mutex_unlock(&table_lock);
if (!cmd->fail_io) - __destroy_persistent_data_objects(cmd); + __destroy_persistent_data_objects(cmd, true); kfree(cmd); } } @@ -1808,14 +1810,52 @@ int dm_cache_metadata_needs_check(struct
int dm_cache_metadata_abort(struct dm_cache_metadata *cmd) { - int r; + int r = -EINVAL; + struct dm_block_manager *old_bm = NULL, *new_bm = NULL; + + /* fail_io is double-checked with cmd->root_lock held below */ + if (unlikely(cmd->fail_io)) + return r; + + /* + * Replacement block manager (new_bm) is created and old_bm destroyed outside of + * cmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of + * shrinker associated with the block manager's bufio client vs cmd root_lock). + * - must take shrinker_rwsem without holding cmd->root_lock + */ + new_bm = dm_block_manager_create(cmd->bdev, DM_CACHE_METADATA_BLOCK_SIZE << SECTOR_SHIFT, + CACHE_MAX_CONCURRENT_LOCKS);
WRITE_LOCK(cmd); - __destroy_persistent_data_objects(cmd); - r = __create_persistent_data_objects(cmd, false); + if (cmd->fail_io) { + WRITE_UNLOCK(cmd); + goto out; + } + + __destroy_persistent_data_objects(cmd, false); + old_bm = cmd->bm; + if (IS_ERR(new_bm)) { + DMERR("could not create block manager during abort"); + cmd->bm = NULL; + r = PTR_ERR(new_bm); + goto out_unlock; + } + + cmd->bm = new_bm; + r = __open_or_format_metadata(cmd, false); + if (r) { + cmd->bm = NULL; + goto out_unlock; + } + new_bm = NULL; +out_unlock: if (r) cmd->fail_io = true; WRITE_UNLOCK(cmd); + dm_block_manager_destroy(old_bm); +out: + if (new_bm && !IS_ERR(new_bm)) + dm_block_manager_destroy(new_bm);
return r; }
From: Zhihao Cheng chengzhihao1@huawei.com
commit 8111964f1b8524c4bb56b02cd9c7a37725ea21fd upstream.
Following concurrent processes:
P1(drop cache)                          P2(kworker)
drop_caches_sysctl_handler
 drop_slab
  shrink_slab
   down_read(&shrinker_rwsem) - LOCK A
   do_shrink_slab
    super_cache_scan
     prune_icache_sb
      dispose_list
       evict
        ext4_evict_inode
         ext4_clear_inode
          ext4_discard_preallocations
           ext4_mb_load_buddy_gfp
            ext4_mb_init_cache
             ext4_read_block_bitmap_nowait
              ext4_read_bh_nowait
               submit_bh
                dm_submit_bio
                                        do_worker
                                         process_deferred_bios
                                          commit
                                           metadata_operation_failed
                                            dm_pool_abort_metadata
                                             down_write(&pmd->root_lock) - LOCK B
                                              __destroy_persistent_data_objects
                                               dm_block_manager_destroy
                                                dm_bufio_client_destroy
                                                 unregister_shrinker
                                                  down_write(&shrinker_rwsem)
                thin_map                                     |
                 dm_thin_find_block                          ↓
                  down_read(&pmd->root_lock) --> ABBA deadlock
This triggers a hung task:
[ 76.974820] INFO: task kworker/u4:3:63 blocked for more than 15 seconds. [ 76.976019] Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910 [ 76.978521] task:kworker/u4:3 state:D stack:0 pid:63 ppid:2 [ 76.978534] Workqueue: dm-thin do_worker [ 76.978552] Call Trace: [ 76.978564] __schedule+0x6ba/0x10f0 [ 76.978582] schedule+0x9d/0x1e0 [ 76.978588] rwsem_down_write_slowpath+0x587/0xdf0 [ 76.978600] down_write+0xec/0x110 [ 76.978607] unregister_shrinker+0x2c/0xf0 [ 76.978616] dm_bufio_client_destroy+0x116/0x3d0 [ 76.978625] dm_block_manager_destroy+0x19/0x40 [ 76.978629] __destroy_persistent_data_objects+0x5e/0x70 [ 76.978636] dm_pool_abort_metadata+0x8e/0x100 [ 76.978643] metadata_operation_failed+0x86/0x110 [ 76.978649] commit+0x6a/0x230 [ 76.978655] do_worker+0xc6e/0xd90 [ 76.978702] process_one_work+0x269/0x630 [ 76.978714] worker_thread+0x266/0x630 [ 76.978730] kthread+0x151/0x1b0 [ 76.978772] INFO: task test.sh:2646 blocked for more than 15 seconds. [ 76.979756] Not tainted 6.1.0-rc4-00011-g8f17dd350364-dirty #910 [ 76.982111] task:test.sh state:D stack:0 pid:2646 ppid:2459 [ 76.982128] Call Trace: [ 76.982139] __schedule+0x6ba/0x10f0 [ 76.982155] schedule+0x9d/0x1e0 [ 76.982159] rwsem_down_read_slowpath+0x4f4/0x910 [ 76.982173] down_read+0x84/0x170 [ 76.982177] dm_thin_find_block+0x4c/0xd0 [ 76.982183] thin_map+0x201/0x3d0 [ 76.982188] __map_bio+0x5b/0x350 [ 76.982195] dm_submit_bio+0x2b6/0x930 [ 76.982202] __submit_bio+0x123/0x2d0 [ 76.982209] submit_bio_noacct_nocheck+0x101/0x3e0 [ 76.982222] submit_bio_noacct+0x389/0x770 [ 76.982227] submit_bio+0x50/0xc0 [ 76.982232] submit_bh_wbc+0x15e/0x230 [ 76.982238] submit_bh+0x14/0x20 [ 76.982241] ext4_read_bh_nowait+0xc5/0x130 [ 76.982247] ext4_read_block_bitmap_nowait+0x340/0xc60 [ 76.982254] ext4_mb_init_cache+0x1ce/0xdc0 [ 76.982259] ext4_mb_load_buddy_gfp+0x987/0xfa0 [ 76.982263] ext4_discard_preallocations+0x45d/0x830 [ 76.982274] ext4_clear_inode+0x48/0xf0 [ 76.982280] ext4_evict_inode+0xcf/0xc70 [ 76.982285] evict+0x119/0x2b0 [ 76.982290] dispose_list+0x43/0xa0 [ 76.982294] prune_icache_sb+0x64/0x90 [ 76.982298] super_cache_scan+0x155/0x210 [ 76.982303] do_shrink_slab+0x19e/0x4e0 [ 76.982310] shrink_slab+0x2bd/0x450 [ 76.982317] drop_slab+0xcc/0x1a0 [ 76.982323] drop_caches_sysctl_handler+0xb7/0xe0 [ 76.982327] proc_sys_call_handler+0x1bc/0x300 [ 76.982331] proc_sys_write+0x17/0x20 [ 76.982334] vfs_write+0x3d3/0x570 [ 76.982342] ksys_write+0x73/0x160 [ 76.982347] __x64_sys_write+0x1e/0x30 [ 76.982352] do_syscall_64+0x35/0x80 [ 76.982357] entry_SYSCALL_64_after_hwframe+0x63/0xcd
The function metadata_operation_failed() is called when operations fail on dm pool metadata; dm pool then destroys and recreates the metadata. As a result, the shrinker is unregistered and re-registered, which can take shrinker_rwsem for writing while pmd->root_lock is held (pmd_write_lock).

Fix it by allocating the new dm_block_manager before taking pmd->root_lock and destroying the old dm_block_manager after releasing pmd->root_lock; the old dm_block_manager is replaced with the new one while pmd->root_lock is held. This way, shrinker register/unregister is done without holding pmd->root_lock.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216676 Cc: stable@vger.kernel.org #v5.2+ Fixes: e49e582965b3 ("dm thin: add read only and fail io modes") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-thin-metadata.c | 51 +++++++++++++++++++++++++++++++++++------- 1 file changed, 43 insertions(+), 8 deletions(-)
--- a/drivers/md/dm-thin-metadata.c +++ b/drivers/md/dm-thin-metadata.c @@ -776,13 +776,15 @@ static int __create_persistent_data_obje return r; }
-static void __destroy_persistent_data_objects(struct dm_pool_metadata *pmd) +static void __destroy_persistent_data_objects(struct dm_pool_metadata *pmd, + bool destroy_bm) { dm_sm_destroy(pmd->data_sm); dm_sm_destroy(pmd->metadata_sm); dm_tm_destroy(pmd->nb_tm); dm_tm_destroy(pmd->tm); - dm_block_manager_destroy(pmd->bm); + if (destroy_bm) + dm_block_manager_destroy(pmd->bm); }
static int __begin_transaction(struct dm_pool_metadata *pmd) @@ -989,7 +991,7 @@ int dm_pool_metadata_close(struct dm_poo } pmd_write_unlock(pmd); if (!pmd->fail_io) - __destroy_persistent_data_objects(pmd); + __destroy_persistent_data_objects(pmd, true);
kfree(pmd); return 0; @@ -1888,19 +1890,52 @@ static void __set_abort_with_changes_fla int dm_pool_abort_metadata(struct dm_pool_metadata *pmd) { int r = -EINVAL; + struct dm_block_manager *old_bm = NULL, *new_bm = NULL; + + /* fail_io is double-checked with pmd->root_lock held below */ + if (unlikely(pmd->fail_io)) + return r; + + /* + * Replacement block manager (new_bm) is created and old_bm destroyed outside of + * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of + * shrinker associated with the block manager's bufio client vs pmd root_lock). + * - must take shrinker_rwsem without holding pmd->root_lock + */ + new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT, + THIN_MAX_CONCURRENT_LOCKS);
pmd_write_lock(pmd); - if (pmd->fail_io) + if (pmd->fail_io) { + pmd_write_unlock(pmd); goto out; + }
__set_abort_with_changes_flags(pmd); - __destroy_persistent_data_objects(pmd); - r = __create_persistent_data_objects(pmd, false); + __destroy_persistent_data_objects(pmd, false); + old_bm = pmd->bm; + if (IS_ERR(new_bm)) { + DMERR("could not create block manager during abort"); + pmd->bm = NULL; + r = PTR_ERR(new_bm); + goto out_unlock; + } + + pmd->bm = new_bm; + r = __open_or_format_metadata(pmd, false); + if (r) { + pmd->bm = NULL; + goto out_unlock; + } + new_bm = NULL; +out_unlock: if (r) pmd->fail_io = true; - -out: pmd_write_unlock(pmd); + dm_block_manager_destroy(old_bm); +out: + if (new_bm && !IS_ERR(new_bm)) + dm_block_manager_destroy(new_bm);
return r; }
From: Zhihao Cheng chengzhihao1@huawei.com
commit 7991dbff6849f67e823b7cc0c15e5a90b0549b9f upstream.
Recently we found a softlockup problem in the dm thin pool btree lookup code due to corrupted metadata:
Kernel panic - not syncing: softlockup: hung tasks CPU: 7 PID: 2669225 Comm: kworker/u16:3 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) Workqueue: dm-thin do_worker [dm_thin_pool] Call Trace: <IRQ> dump_stack+0x9c/0xd3 panic+0x35d/0x6b9 watchdog_timer_fn.cold+0x16/0x25 __run_hrtimer+0xa2/0x2d0 </IRQ> RIP: 0010:__relink_lru+0x102/0x220 [dm_bufio] __bufio_new+0x11f/0x4f0 [dm_bufio] new_read+0xa3/0x1e0 [dm_bufio] dm_bm_read_lock+0x33/0xd0 [dm_persistent_data] ro_step+0x63/0x100 [dm_persistent_data] btree_lookup_raw.constprop.0+0x44/0x220 [dm_persistent_data] dm_btree_lookup+0x16f/0x210 [dm_persistent_data] dm_thin_find_block+0x12c/0x210 [dm_thin_pool] __process_bio_read_only+0xc5/0x400 [dm_thin_pool] process_thin_deferred_bios+0x1a4/0x4a0 [dm_thin_pool] process_one_work+0x3c5/0x730
The following process may generate a broken btree mixed with fresh and stale btree nodes, which could get dm thin trapped in an infinite loop while looking up a data block:

 Transaction 1:
   pmd->root = A, A->B->C   // One path in btree
   pmd->root = X, X->Y->Z   // Copy-up
 Transaction 2:
   X,Z is updated on disk, Y write failed.
   // Commit failed, dm thin becomes read-only.
   process_bio_read_only
    dm_thin_find_block
     __find_block
      dm_btree_lookup(pmd->root)

The pmd->root points to a broken btree; Y may contain a stale node pointing to any block, for example X, which gets dm thin trapped in a dead loop while looking up Z.
Fix this by setting pmd->root in __open_metadata(), so that dm thin will use the last transaction's pmd->root if commit failed.
Fetch a reproducer in [Link].
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216790 Cc: stable@vger.kernel.org Fixes: 991d9fa02da0 ("dm: add thin provisioning target") Signed-off-by: Zhihao Cheng chengzhihao1@huawei.com Acked-by: Joe Thornber ejt@redhat.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-thin-metadata.c | 9 +++++++++ 1 file changed, 9 insertions(+)
--- a/drivers/md/dm-thin-metadata.c +++ b/drivers/md/dm-thin-metadata.c @@ -724,6 +724,15 @@ static int __open_metadata(struct dm_poo goto bad_cleanup_data_sm; }
+ /* + * For pool metadata opening process, root setting is redundant + * because it will be set again in __begin_transaction(). But dm + * pool aborting process really needs to get last transaction's + * root to avoid accessing broken btree. + */ + pmd->root = le64_to_cpu(disk_super->data_mapping_root); + pmd->details_root = le64_to_cpu(disk_super->device_details_root); + __setup_btree_details(pmd); dm_bm_unlock(sblock);
From: Luo Meng luomeng12@huawei.com
commit 19eb1650afeb1aa86151f61900e9e5f1de5d8d02 upstream.
If a thin-pool sets fail_io while suspending, resume will fail with:

device-mapper: resume ioctl on vg-thinpool failed: Invalid argument
The thin-pool also can't be removed if an in-flight bio is in the deferred list.
This can be easily reproduced using:
echo "offline" > /sys/block/sda/device/state dd if=/dev/zero of=/dev/mapper/thin bs=4K count=1 dmsetup suspend /dev/mapper/pool mkfs.ext4 /dev/mapper/thin dmsetup resume /dev/mapper/pool
The root cause is that maybe_resize_data_dev() checks fail_io and returns an error before dm_resume() is called.

Fix this by adding a FAIL mode check at the end of pool_preresume().
Cc: stable@vger.kernel.org Fixes: da105ed5fd7e ("dm thin metadata: introduce dm_pool_abort_metadata") Signed-off-by: Luo Meng luomeng12@huawei.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-thin.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-)
--- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -3566,20 +3566,28 @@ static int pool_preresume(struct dm_targ */ r = bind_control_target(pool, ti); if (r) - return r; + goto out;
r = maybe_resize_data_dev(ti, &need_commit1); if (r) - return r; + goto out;
r = maybe_resize_metadata_dev(ti, &need_commit2); if (r) - return r; + goto out;
if (need_commit1 || need_commit2) (void) commit(pool); +out: + /* + * When a thin-pool is PM_FAIL, it cannot be rebuilt if + * bio is in deferred list. Therefore need to return 0 + * to allow pool_resume() to flush IO. + */ + if (r && get_pool_mode(pool) == PM_FAIL) + r = 0;
- return 0; + return r; }
static void pool_suspend_active_thins(struct pool *pool)
From: Luo Meng luomeng12@huawei.com
commit 88430ebcbc0ec637b710b947738839848c20feff upstream.
When dm_resume() and dm_destroy() run concurrently, a use-after-free (UAF) can occur, as follows:
BUG: KASAN: use-after-free in __run_timers+0x173/0x710 Write of size 8 at addr ffff88816d9490f0 by task swapper/0/0 <snip> Call Trace: <IRQ> dump_stack_lvl+0x73/0x9f print_report.cold+0x132/0xaa2 _raw_spin_lock_irqsave+0xcd/0x160 __run_timers+0x173/0x710 kasan_report+0xad/0x110 __run_timers+0x173/0x710 __asan_store8+0x9c/0x140 __run_timers+0x173/0x710 call_timer_fn+0x310/0x310 pvclock_clocksource_read+0xfa/0x250 kvm_clock_read+0x2c/0x70 kvm_clock_get_cycles+0xd/0x20 ktime_get+0x5c/0x110 lapic_next_event+0x38/0x50 clockevents_program_event+0xf1/0x1e0 run_timer_softirq+0x49/0x90 __do_softirq+0x16e/0x62c __irq_exit_rcu+0x1fa/0x270 irq_exit_rcu+0x12/0x20 sysvec_apic_timer_interrupt+0x8e/0xc0
One of the concurrent UAF scenarios is shown below:
                 use                           free
do_resume                              |
 __find_device_hash_cell               |
  dm_get                               |
   atomic_inc(&md->holders)            |
                                       | dm_destroy
                                       |  __dm_destroy
                                       |   if (!dm_suspended_md(md))
                                       |   atomic_read(&md->holders)
                                       |   msleep(1)
 dm_resume                             |
  __dm_resume                          |
   dm_table_resume_targets             |
    pool_resume                        |
     do_waker  #add delay work         |
 dm_put                                |
  atomic_dec(&md->holders)             |
                                       |  dm_table_destroy
                                       |   pool_dtr
                                       |    __pool_dec
                                       |     __pool_destroy
                                       |      destroy_workqueue
                                       |      kfree(pool)  # free pool
time out
__do_softirq
 run_timer_softirq  # pool has already been freed
This can be easily reproduced using:
 1. create thin-pool
 2. dmsetup suspend pool
 3. dmsetup resume pool
 4. dmsetup remove_all # Concurrent with 3
The root cause of this UAF bug is that dm_resume() arms the timer after dm_destroy() has already skipped cancelling it because of the suspend status. When the timer expires, run_timer_softirq() is called, but the pool has already been freed, so the concurrent UAF happens.

Therefore, cancel the timer again in __pool_destroy().
Cc: stable@vger.kernel.org Fixes: 991d9fa02da0d ("dm: add thin provisioning target") Signed-off-by: Luo Meng luomeng12@huawei.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-thin.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/md/dm-thin.c +++ b/drivers/md/dm-thin.c @@ -2907,6 +2907,8 @@ static void __pool_destroy(struct pool * dm_bio_prison_destroy(pool->prison); dm_kcopyd_client_destroy(pool->copier);
+ cancel_delayed_work_sync(&pool->waker); + cancel_delayed_work_sync(&pool->no_space_timeout); if (pool->wq) destroy_workqueue(pool->wq);
From: Luo Meng luomeng12@huawei.com
commit f50cb2cbabd6c4a60add93d72451728f86e4791c upstream.
dm-integrity also has the same UAF problem when dm_resume() and dm_destroy() run concurrently.

Therefore, cancel the timer again in dm_integrity_dtr().
Cc: stable@vger.kernel.org Fixes: 7eada909bfd7a ("dm: add integrity target") Signed-off-by: Luo Meng luomeng12@huawei.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-integrity.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/md/dm-integrity.c +++ b/drivers/md/dm-integrity.c @@ -4539,6 +4539,8 @@ static void dm_integrity_dtr(struct dm_t BUG_ON(!RB_EMPTY_ROOT(&ic->in_progress)); BUG_ON(!list_empty(&ic->wait_list));
+ if (ic->mode == 'B') + cancel_delayed_work_sync(&ic->bitmap_flush_work); if (ic->metadata_wq) destroy_workqueue(ic->metadata_wq); if (ic->wait_wq)
From: Luo Meng luomeng12@huawei.com
commit e4b5957c6f749a501c464f92792f1c8e26b61a94 upstream.
dm-clone also has the same UAF problem when dm_resume() and dm_destroy() run concurrently.

Therefore, cancel the timer again in clone_dtr().
Cc: stable@vger.kernel.org Fixes: 7431b7835f554 ("dm: add clone target") Signed-off-by: Luo Meng luomeng12@huawei.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-clone-target.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/md/dm-clone-target.c +++ b/drivers/md/dm-clone-target.c @@ -1959,6 +1959,7 @@ static void clone_dtr(struct dm_target *
mempool_exit(&clone->hydration_pool); dm_kcopyd_client_destroy(clone->kcopyd_client); + cancel_delayed_work_sync(&clone->waker); destroy_workqueue(clone->wq); hash_table_exit(clone); dm_clone_metadata_close(clone->cmd);
From: Luo Meng luomeng12@huawei.com
commit 6a459d8edbdbe7b24db42a5a9f21e6aa9e00c2aa upstream.
dm-cache also has the same UAF problem when dm_resume() and dm_destroy() run concurrently.

Therefore, cancel the timer again in destroy().
Cc: stable@vger.kernel.org Fixes: c6b4fcbad044e ("dm: add cache target") Signed-off-by: Luo Meng luomeng12@huawei.com Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-cache-target.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -1895,6 +1895,7 @@ static void destroy(struct cache *cache) if (cache->prison) dm_bio_prison_destroy_v2(cache->prison);
+ cancel_delayed_work_sync(&cache->waker); if (cache->wq) destroy_workqueue(cache->wq);
From: Mike Snitzer snitzer@kernel.org
commit 6b9973861cb2e96dcd0bb0f1baddc5c034207c5c upstream.
Otherwise, the commit being aborted would be associated with the metadata objects that are about to be torn down. The needs_check flag must be written to metadata with a reset block manager.
Found through code-inspection (and compared against dm-thin.c).
Cc: stable@vger.kernel.org Fixes: 028ae9f76f29 ("dm cache: add fail io mode and needs_check flag") Signed-off-by: Mike Snitzer snitzer@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/dm-cache-target.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/md/dm-cache-target.c +++ b/drivers/md/dm-cache-target.c @@ -915,16 +915,16 @@ static void abort_transaction(struct cac if (get_cache_mode(cache) >= CM_READ_ONLY) return;
- if (dm_cache_metadata_set_needs_check(cache->cmd)) { - DMERR("%s: failed to set 'needs_check' flag in metadata", dev_name); - set_cache_mode(cache, CM_FAIL); - } - DMERR_LIMIT("%s: aborting current metadata transaction", dev_name); if (dm_cache_metadata_abort(cache->cmd)) { DMERR("%s: failed to abort metadata transaction", dev_name); set_cache_mode(cache, CM_FAIL); } + + if (dm_cache_metadata_set_needs_check(cache->cmd)) { + DMERR("%s: failed to set 'needs_check' flag in metadata", dev_name); + set_cache_mode(cache, CM_FAIL); + } }
static void metadata_operation_failed(struct cache *cache, const char *op, int r)
From: Zheng Yejian zhengyejian1@huawei.com
commit 82470f7d9044842618c847a7166de2b7458157a7 upstream.
When generating a synthetic event with many params and then creating a trace action for it [1], a kernel panic happens [2].
This is because in trace_action_create() 'data->n_params' can be up to SYNTH_FIELDS_MAX (currently 64), and the array 'data->var_ref_idx' keeps indices into the array 'hist_data->var_refs' for each synthetic event param, but the length of 'data->var_ref_idx' is TRACING_MAP_VARS_MAX (currently 16), so an out-of-bounds write happens when 'data->n_params' is more than 16. In this case, 'data->match_data.event' is overwritten and eventually causes the panic.
To solve the issue, adjust the length of 'data->var_ref_idx' to SYNTH_FIELDS_MAX and add sanity checks to avoid out-of-bounds writes.
[1] # cd /sys/kernel/tracing/ # echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\ int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\ int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\ int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\ int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\ int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\ int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\ int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\ int v63" >> synthetic_events # echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="bash"' >> \ events/sched/sched_waking/trigger # echo "hist:keys=next_pid:onmatch(sched.sched_waking).my_synth_event(\ pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\ pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\ pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,pid,\ pid,pid,pid,pid,pid,pid,pid,pid,pid)" >> events/sched/sched_switch/trigger
[2] BUG: unable to handle page fault for address: ffff91c900000000 PGD 61001067 P4D 61001067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 2 PID: 322 Comm: bash Tainted: G W 6.1.0-rc8+ #229 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 RIP: 0010:strcmp+0xc/0x30 Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3 RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000 RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000 R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580 R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538 FS: 00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0 Call Trace: <TASK> __find_event_file+0x55/0x90 action_create+0x76c/0x1060 event_hist_trigger_parse+0x146d/0x2060 ? event_trigger_write+0x31/0xd0 trigger_process_regex+0xbb/0x110 event_trigger_write+0x6b/0xd0 vfs_write+0xc8/0x3e0 ? alloc_fd+0xc0/0x160 ? preempt_count_add+0x4d/0xa0 ? preempt_count_add+0x70/0xa0 ksys_write+0x5f/0xe0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f1d1d0cf077 Code: 64 89 02 48 c7 c0 ff ff ff ff eb bb 0f 1f 80 00 00 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 RSP: 002b:00007ffcebb0e568 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000143 RCX: 00007f1d1d0cf077 RDX: 0000000000000143 RSI: 00005639265aa7e0 RDI: 0000000000000001 RBP: 00005639265aa7e0 R08: 000000000000000a R09: 0000000000000142 R10: 000056392639c017 R11: 0000000000000246 R12: 0000000000000143 R13: 00007f1d1d1ae6a0 R14: 00007f1d1d1aa4a0 R15: 00007f1d1d1a98a0 </TASK> Modules linked in: CR2: ffff91c900000000 ---[ end trace 0000000000000000 ]--- RIP: 0010:strcmp+0xc/0x30 Code: 75 f7 31 d2 44 0f b6 04 16 44 88 04 11 48 83 c2 01 45 84 c0 75 ee c3 cc cc cc cc 0f 1f 00 31 c0 eb 08 48 83 c0 01 84 d2 74 13 <0f> b6 14 07 3a 14 06 74 ef 19 c0 83 c8 01 c3 cc cc cc cc 31 c3 RSP: 0018:ffff9b3b00f53c48 EFLAGS: 00000246 RAX: 0000000000000000 RBX: ffffffffba958a68 RCX: 0000000000000000 RDX: 0000000000000010 RSI: ffff91c943d33a90 RDI: ffff91c900000000 RBP: ffff91c900000000 R08: 00000018d604b529 R09: 0000000000000000 R10: ffff91c9483eddb1 R11: ffff91ca483eddab R12: ffff91c946171580 R13: ffff91c9479f0538 R14: ffff91c9457c2848 R15: ffff91c9479f0538 FS: 00007f1d1cfbe740(0000) GS:ffff91c9bdc80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff91c900000000 CR3: 0000000006316000 CR4: 00000000000006e0
Link: https://lore.kernel.org/linux-trace-kernel/20221207035143.2278781-1-zhengyej...
Cc: mhiramat@kernel.org Cc: zanussi@kernel.org Cc: stable@vger.kernel.org Fixes: d380dcde9a07 ("tracing: Fix now invalid var_ref_vals assumption in trace action") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_hist.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -452,7 +452,7 @@ struct action_data { * event param, and is passed to the synthetic event * invocation. */ - unsigned int var_ref_idx[TRACING_MAP_VARS_MAX]; + unsigned int var_ref_idx[SYNTH_FIELDS_MAX]; struct synth_event *synth_event; bool use_trace_keyword; char *synth_event_name; @@ -1895,7 +1895,9 @@ static struct hist_field *create_var_ref return ref_field; } } - + /* Sanity check to avoid out-of-bound write on 'hist_data->var_refs' */ + if (hist_data->n_var_refs >= TRACING_MAP_VARS_MAX) + return NULL; ref_field = create_hist_field(var_field->hist_data, NULL, flags, NULL); if (ref_field) { if (init_var_ref(ref_field, var_field, system, event_name)) { @@ -3524,6 +3526,10 @@ static int trace_action_create(struct hi
lockdep_assert_held(&event_mutex);
+ /* Sanity check to avoid out-of-bound write on 'data->var_ref_idx' */ + if (data->n_params > SYNTH_FIELDS_MAX) + return -EINVAL; + if (data->use_trace_keyword) synth_event_name = data->synth_event_name; else
From: Namhyung Kim namhyung@kernel.org
commit 0a041ebca4956292cadfb14a63ace3a9c1dcb0a3 upstream.
The syscall passes the attr struct to security_perf_event_open(), but the struct has not been initialized yet at that point.
Fixes: da97e18458fb ("perf_event: Add support for LSM and SELinux checks") Signed-off-by: Namhyung Kim namhyung@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Joel Fernandes (Google) joel@joelfernandes.org Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20221220223140.4020470-1-namhyung@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/events/core.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -12215,12 +12215,12 @@ SYSCALL_DEFINE5(perf_event_open, if (flags & ~PERF_FLAG_ALL) return -EINVAL;
- /* Do we allow access to perf_event_open(2) ? */ - err = security_perf_event_open(&attr, PERF_SECURITY_OPEN); + err = perf_copy_attr(attr_uptr, &attr); if (err) return err;
- err = perf_copy_attr(attr_uptr, &attr); + /* Do we allow access to perf_event_open(2) ? */ + err = security_perf_event_open(&attr, PERF_SECURITY_OPEN); if (err) return err;
From: Rob Herring robh@kernel.org
commit e553ad8d7957697385e81034bf76db3b2cb2cf27 upstream.
"linux,initrd-start" and "linux,initrd-end" can be 32-bit values even on a 64-bit platform. Ideally, the size should be based on '#address-cells', but that has never been enforced in the kernel's FDT boot parsing code (early_init_dt_check_for_initrd()). Bootloader behavior is known to vary. For example, kexec always writes these as 64-bit. The result of incorrectly reading 32-bit values is most likely the reserved memory for the original initrd will still be reserved for the new kernel. The original arm64 equivalent of this code failed to release the initrd reserved memory in *all* cases.
Use of_read_number() to mirror the early_init_dt_check_for_initrd() code.
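For reference, of_read_number() simply folds however many cells the property has into a u64, which is why it copes with both the 1-cell (32-bit) and 2-cell (64-bit) encodings; roughly, paraphrasing include/linux/of.h:

  static inline u64 of_read_number(const __be32 *cell, int size)
  {
          u64 r = 0;

          /* size is the cell count; fold the most-significant cell first */
          for (; size--; cell++)
                  r = (r << 32) | be32_to_cpu(*cell);
          return r;
  }

So of_read_number(prop, len / 4) in the diff below handles either property length.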
Fixes: b30be4dc733e ("of: Add a common kexec FDT setup function") Cc: stable@vger.kernel.org Reported-by: Peter Maydell peter.maydell@linaro.org Link: https://lore.kernel.org/r/20221128202440.1411895-1-robh@kernel.org Signed-off-by: Rob Herring robh@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/of/kexec.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/drivers/of/kexec.c +++ b/drivers/of/kexec.c @@ -284,7 +284,7 @@ void *of_kexec_alloc_and_setup_fdt(const const char *cmdline, size_t extra_fdt_size) { void *fdt; - int ret, chosen_node; + int ret, chosen_node, len; const void *prop; size_t fdt_size;
@@ -327,19 +327,19 @@ void *of_kexec_alloc_and_setup_fdt(const goto out;
/* Did we boot using an initrd? */ - prop = fdt_getprop(fdt, chosen_node, "linux,initrd-start", NULL); + prop = fdt_getprop(fdt, chosen_node, "linux,initrd-start", &len); if (prop) { u64 tmp_start, tmp_end, tmp_size;
- tmp_start = fdt64_to_cpu(*((const fdt64_t *) prop)); + tmp_start = of_read_number(prop, len / 4);
- prop = fdt_getprop(fdt, chosen_node, "linux,initrd-end", NULL); + prop = fdt_getprop(fdt, chosen_node, "linux,initrd-end", &len); if (!prop) { ret = -EINVAL; goto out; }
- tmp_end = fdt64_to_cpu(*((const fdt64_t *) prop)); + tmp_end = of_read_number(prop, len / 4);
/* * kexec reserves exact initrd size, while firmware may
From: Sean Christopherson seanjc@google.com
commit eb3992e833d3a17f9b0a3e0371d0b1d3d566f740 upstream.
Resume the guest immediately when injecting a #GP on ECREATE due to an invalid enclave size, i.e. don't attempt ECREATE in the host. The #GP is a terminal fault, e.g. skipping the instruction if ECREATE is successful would result in KVM injecting #GP on the instruction following ECREATE.
Fixes: 70210c044b4e ("KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions") Cc: stable@vger.kernel.org Cc: Kai Huang kai.huang@intel.com Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Kai Huang kai.huang@intel.com Link: https://lore.kernel.org/r/20220930233132.1723330-1-seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/sgx.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/sgx.c +++ b/arch/x86/kvm/vmx/sgx.c @@ -188,8 +188,10 @@ static int __handle_encls_ecreate(struct /* Enforce CPUID restriction on max enclave size. */ max_size_log2 = (attributes & SGX_ATTR_MODE64BIT) ? sgx_12_0->edx >> 8 : sgx_12_0->edx; - if (size >= BIT_ULL(max_size_log2)) + if (size >= BIT_ULL(max_size_log2)) { kvm_inject_gp(vcpu, 0); + return 1; + }
/* * sgx_virt_ecreate() returns:
From: Sean Christopherson seanjc@google.com
commit 9cc409325ddd776f6fd6293d5ce93ce1248af6e4 upstream.
Inject #GP if VMXON is attempted with a CR0/CR4 that fails the generic "is CRx valid" check but passes the CR4.VMXE check, and do the generic checks _after_ handling the post-VMXON VM-Fail.
The CR4.VMXE check, and all other #UD cases, are special pre-conditions that are enforced prior to pivoting on the current VMX mode, i.e. occur before interception if VMXON is attempted in VMX non-root mode.
All other CR0/CR4 checks generate #GP and effectively have lower priority than the post-VMXON check.
Per the SDM:
IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ...
    THEN #UD;
ELSIF not in VMX operation
    THEN
        IF (CPL > 0) or (in A20M mode) or
           (the values of CR0 and CR4 are not supported in VMX operation)
            THEN #GP(0);
ELSIF in VMX non-root operation
    THEN VMexit;
ELSIF CPL > 0
    THEN #GP(0);
ELSE VMfail("VMXON executed in VMX root operation");
FI;
which, if re-written without ELSIF, yields:
IF (register operand) or (CR0.PE = 0) or (CR4.VMXE = 0) or ... THEN #UD
IF in VMX non-root operation THEN VMexit;
IF CPL > 0 THEN #GP(0)
IF in VMX operation THEN VMfail("VMXON executed in VMX root operation");
IF (in A20M mode) or (the values of CR0 and CR4 are not supported in VMX operation) THEN #GP(0);
Note, KVM unconditionally forwards VMXON VM-Exits that occur in L2 to L1, i.e. there is no need to check the vCPU is not in VMX non-root mode. Add a comment to explain why unconditionally forwarding such exits is functionally correct.
Reported-by: Eric Li ercli@ucdavis.edu Fixes: c7d855c2aff2 ("KVM: nVMX: Inject #UD if VMXON is attempted with incompatible CR0/CR4") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson seanjc@google.com Link: https://lore.kernel.org/r/20221006001956.329314-1-seanjc@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/nested.c | 44 +++++++++++++++++++++++++++++++++----------- 1 file changed, 33 insertions(+), 11 deletions(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -4970,24 +4970,35 @@ static int handle_vmon(struct kvm_vcpu * | FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX;
/* - * Note, KVM cannot rely on hardware to perform the CR0/CR4 #UD checks - * that have higher priority than VM-Exit (see Intel SDM's pseudocode - * for VMXON), as KVM must load valid CR0/CR4 values into hardware while - * running the guest, i.e. KVM needs to check the _guest_ values. + * Manually check CR4.VMXE checks, KVM must force CR4.VMXE=1 to enter + * the guest and so cannot rely on hardware to perform the check, + * which has higher priority than VM-Exit (see Intel SDM's pseudocode + * for VMXON). * - * Rely on hardware for the other two pre-VM-Exit checks, !VM86 and - * !COMPATIBILITY modes. KVM may run the guest in VM86 to emulate Real - * Mode, but KVM will never take the guest out of those modes. + * Rely on hardware for the other pre-VM-Exit checks, CR0.PE=1, !VM86 + * and !COMPATIBILITY modes. For an unrestricted guest, KVM doesn't + * force any of the relevant guest state. For a restricted guest, KVM + * does force CR0.PE=1, but only to also force VM86 in order to emulate + * Real Mode, and so there's no need to check CR0.PE manually. */ - if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || - !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { + if (!kvm_read_cr4_bits(vcpu, X86_CR4_VMXE)) { kvm_queue_exception(vcpu, UD_VECTOR); return 1; }
/* - * CPL=0 and all other checks that are lower priority than VM-Exit must - * be checked manually. + * The CPL is checked for "not in VMX operation" and for "in VMX root", + * and has higher priority than the VM-Fail due to being post-VMXON, + * i.e. VMXON #GPs outside of VMX non-root if CPL!=0. In VMX non-root, + * VMXON causes VM-Exit and KVM unconditionally forwards VMXON VM-Exits + * from L2 to L1, i.e. there's no need to check for the vCPU being in + * VMX non-root. + * + * Forwarding the VM-Exit unconditionally, i.e. without performing the + * #UD checks (see above), is functionally ok because KVM doesn't allow + * L1 to run L2 without CR4.VMXE=0, and because KVM never modifies L2's + * CR0 or CR4, i.e. it's L2's responsibility to emulate #UDs that are + * missed by hardware due to shadowing CR0 and/or CR4. */ if (vmx_get_cpl(vcpu)) { kvm_inject_gp(vcpu, 0); @@ -4997,6 +5008,17 @@ static int handle_vmon(struct kvm_vcpu * if (vmx->nested.vmxon) return nested_vmx_fail(vcpu, VMXERR_VMXON_IN_VMX_ROOT_OPERATION);
+ /* + * Invalid CR0/CR4 generates #GP. These checks are performed if and + * only if the vCPU isn't already in VMX operation, i.e. effectively + * have lower priority than the VM-Fail above. + */ + if (!nested_host_cr0_valid(vcpu, kvm_read_cr0(vcpu)) || + !nested_host_cr4_valid(vcpu, kvm_read_cr4(vcpu))) { + kvm_inject_gp(vcpu, 0); + return 1; + } + if ((vmx->msr_ia32_feature_control & VMXON_NEEDED_FEATURES) != VMXON_NEEDED_FEATURES) { kvm_inject_gp(vcpu, 0);
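To make the check ordering above easier to follow, here is a small standalone C sketch that mirrors the rewritten pseudocode; the struct, enum and field names are invented for this sketch and are not KVM types:

#include <stdio.h>
#include <stdbool.h>

enum outcome { UD, VMEXIT, GP, VMFAIL, OK };

struct vmxon_state {
        bool cr4_vmxe, in_vmx_nonroot, in_vmx_operation, cr0_cr4_valid;
        int cpl;
};

/* Check order from the rewritten pseudocode: #UD, VM-Exit, CPL #GP, VM-Fail,
 * and only then the CR0/CR4 #GP. */
static enum outcome vmxon_check(const struct vmxon_state *s)
{
        if (!s->cr4_vmxe)
                return UD;
        if (s->in_vmx_nonroot)
                return VMEXIT;
        if (s->cpl > 0)
                return GP;
        if (s->in_vmx_operation)
                return VMFAIL;
        if (!s->cr0_cr4_valid)
                return GP;
        return OK;
}

int main(void)
{
        /* Already in VMX root with invalid CR0/CR4: VM-Fail wins over #GP. */
        struct vmxon_state s = { .cr4_vmxe = true, .in_vmx_operation = true,
                                 .cpl = 0, .cr0_cr4_valid = false };

        printf("outcome: %d (3 == VMFAIL)\n", vmxon_check(&s));
        return 0;
}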
From: Sean Christopherson seanjc@google.com
commit 31de69f4eea77b28a9724b3fa55aae104fc91fc7 upstream.
Set ENABLE_USR_WAIT_PAUSE in KVM's supported VMX MSR configuration if the feature is supported in hardware and enabled in KVM's base, non-nested configuration, i.e. expose ENABLE_USR_WAIT_PAUSE to L1 if it's supported. This fixes a bug where saving/restoring, i.e. migrating, a vCPU will fail if WAITPKG (the associated CPUID feature) is enabled for the vCPU, and obviously allows L1 to enable the feature for L2.
KVM already effectively exposes ENABLE_USR_WAIT_PAUSE to L1 by stuffing the allowed-1 control in a vCPU's virtual MSR_IA32_VMX_PROCBASED_CTLS2 when updating secondary controls in response to KVM_SET_CPUID(2), but (a) that depends on flawed code (KVM shouldn't touch VMX MSRs in response to CPUID updates) and (b) runs afoul of vmx_restore_control_msr()'s restriction that the guest value must be a strict subset of the supported host value.
Although no past commit explicitly enabled nested support for WAITPKG, doing so is safe and functionally correct from an architectural perspective as no additional KVM support is needed to virtualize TPAUSE, UMONITOR, and UMWAIT for L2 relative to L1, and KVM already forwards VM-Exits to L1 as necessary (commit bf653b78f960, "KVM: vmx: Introduce handle_unexpected_vmexit and handle WAITPKG vmexit").
Note, KVM always keeps the host's MSR_IA32_UMWAIT_CONTROL resident in hardware, i.e. always runs both L1 and L2 with the host's power management settings for TPAUSE and UMWAIT. See commit bf09fb6cba4f ("KVM: VMX: Stop context switching MSR_IA32_UMWAIT_CONTROL") for more details.
Fixes: e69e72faa3a0 ("KVM: x86: Add support for user wait instructions") Cc: stable@vger.kernel.org Reported-by: Aaron Lewis aaronlewis@google.com Reported-by: Yu Zhang yu.c.zhang@linux.intel.com Signed-off-by: Sean Christopherson seanjc@google.com Reviewed-by: Jim Mattson jmattson@google.com Message-Id: 20221213062306.667649-2-seanjc@google.com Signed-off-by: Paolo Bonzini pbonzini@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kvm/vmx/nested.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/arch/x86/kvm/vmx/nested.c +++ b/arch/x86/kvm/vmx/nested.c @@ -6666,7 +6666,8 @@ void nested_vmx_setup_ctls_msrs(struct n SECONDARY_EXEC_ENABLE_INVPCID | SECONDARY_EXEC_RDSEED_EXITING | SECONDARY_EXEC_XSAVES | - SECONDARY_EXEC_TSC_SCALING; + SECONDARY_EXEC_TSC_SCALING | + SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE;
/* * We can emulate "VMCS shadowing," even if the hardware
From: Ashok Raj ashok.raj@intel.com
commit be1b670f61443aa5d0d01782e9b8ea0ee825d018 upstream.
The retries in load_ucode_intel_ap() were in place to support systems with mixed steppings. Mixed steppings are no longer supported and there is only one microcode image at a time. Any retries will simply reattempt to apply the same image over and over without making progress.
[ bp: Zap the circumstantial reasoning from the commit message. ]
Fixes: 06b8534cb728 ("x86/microcode: Rework microcode loading") Signed-off-by: Ashok Raj ashok.raj@intel.com Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Reviewed-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221129210832.107850-3-ashok.raj@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/microcode/intel.c | 8 +------- 1 file changed, 1 insertion(+), 7 deletions(-)
--- a/arch/x86/kernel/cpu/microcode/intel.c +++ b/arch/x86/kernel/cpu/microcode/intel.c @@ -659,7 +659,6 @@ void load_ucode_intel_ap(void) else iup = &intel_ucode_patch;
-reget: if (!*iup) { patch = __load_ucode_intel(&uci); if (!patch) @@ -670,12 +669,7 @@ reget:
uci.mc = *iup;
- if (apply_microcode_early(&uci, true)) { - /* Mixed-silicon system? Try to refetch the proper patch: */ - *iup = NULL; - - goto reget; - } + apply_microcode_early(&uci, true); }
static struct microcode_intel *find_patch(struct ucode_cpu_info *uci)
From: Steven Rostedt (Google) rostedt@goodmis.org
commit fd3dc56253acbe9c641a66d312d8393cd55eb04c upstream.
After someone reported a failed modification due to the expected value not matching what was found, it came to my attention that ftrace_expected is no longer set when that happens. This makes debugging the issue a bit more difficult.
Set ftrace_expected to the expected code before calling ftrace_bug, so that it shows what was expected and why it failed.
Link: https://lore.kernel.org/all/CA+wXwBQ-VhK+hpBtYtyZP-NiX4g8fqRRWithFOHQW-0coQ3... Link: https://lore.kernel.org/linux-trace-kernel/20221209105247.01d4e51d@gandalf.l...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: Andrew Morton akpm@linux-foundation.org Cc: Peter Zijlstra peterz@infradead.org Cc: Thomas Gleixner tglx@linutronix.de Cc: "x86@kernel.org" x86@kernel.org Cc: Borislav Petkov bp@alien8.de Cc: Ingo Molnar mingo@kernel.org Cc: stable@vger.kernel.org Fixes: 768ae4406a5c ("x86/ftrace: Use text_poke()") Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/ftrace.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/x86/kernel/ftrace.c +++ b/arch/x86/kernel/ftrace.c @@ -219,7 +219,9 @@ void ftrace_replace_code(int enable)
ret = ftrace_verify_code(rec->ip, old); if (ret) { + ftrace_expected = old; ftrace_bug(ret, rec); + ftrace_expected = NULL; return; } }
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 1993bf97992df2d560287f3c4120eda57426843d upstream.
Since CONFIG_RETHUNK and CONFIG_SLS use INT3 to stop speculative execution after a RET instruction, kprobes always fails to check the probed instruction boundary by decoding the function body if the probed address comes after such a sequence. (Note that some conditional code blocks are placed after the function return if the compiler decides they are not on the hot path.)
This is because kprobes assumes that any INT3 was put there by kgdb as a software breakpoint and that it replaces an original instruction. But these INT3s serve no such purpose; there is no original instruction to recover.
To avoid this issue, kprobes checks whether the INT3 is owned by kgdb and, if so, stops decoding and makes the check fail. Any other INT3 comes from CONFIG_RETHUNK/CONFIG_SLS and can be treated as a one-byte instruction.
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation") Suggested-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/167146051026.1374301.392728975473572291.stgit@devn... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kprobes/core.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/kprobes/core.c +++ b/arch/x86/kernel/kprobes/core.c @@ -37,6 +37,7 @@ #include <linux/extable.h> #include <linux/kdebug.h> #include <linux/kallsyms.h> +#include <linux/kgdb.h> #include <linux/ftrace.h> #include <linux/kasan.h> #include <linux/moduleloader.h> @@ -289,12 +290,15 @@ static int can_probe(unsigned long paddr if (ret < 0) return 0;
+#ifdef CONFIG_KGDB /* - * Another debugging subsystem might insert this breakpoint. - * In that case, we can't recover it. + * If there is a dynamically installed kgdb sw breakpoint, + * this function should not be probed. */ - if (insn.opcode.bytes[0] == INT3_INSN_OPCODE) + if (insn.opcode.bytes[0] == INT3_INSN_OPCODE && + kgdb_has_hit_break(addr)) return 0; +#endif addr += insn.length; }
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit 63dc6325ff41ee9e570bde705ac34a39c5dbeb44 upstream.
Since CONFIG_RETHUNK and CONFIG_SLS use INT3 to stop speculative execution after a function return, kprobe jump optimization always fails on functions that have such an INT3 inside the function body. (The INT3 padding between functions is already checked, but not INT3s inside a function.)
To avoid this issue, do the same as kprobes: check whether the INT3 comes from kgdb and, if so, stop decoding and make the check fail. Any other INT3 comes from CONFIG_RETHUNK/CONFIG_SLS and can be treated as a one-byte instruction.
Fixes: e463a09af2f0 ("x86: Add straight-line-speculation mitigation") Suggested-by: Peter Zijlstra peterz@infradead.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/167146051929.1374301.7419382929328081706.stgit@dev... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/kprobes/opt.c | 28 ++++++++-------------------- 1 file changed, 8 insertions(+), 20 deletions(-)
--- a/arch/x86/kernel/kprobes/opt.c +++ b/arch/x86/kernel/kprobes/opt.c @@ -15,6 +15,7 @@ #include <linux/extable.h> #include <linux/kdebug.h> #include <linux/kallsyms.h> +#include <linux/kgdb.h> #include <linux/ftrace.h> #include <linux/objtool.h> #include <linux/pgtable.h> @@ -272,19 +273,6 @@ static int insn_is_indirect_jump(struct return ret; }
-static bool is_padding_int3(unsigned long addr, unsigned long eaddr) -{ - unsigned char ops; - - for (; addr < eaddr; addr++) { - if (get_kernel_nofault(ops, (void *)addr) < 0 || - ops != INT3_INSN_OPCODE) - return false; - } - - return true; -} - /* Decode whole function to ensure any instructions don't jump into target */ static int can_optimize(unsigned long paddr) { @@ -327,15 +315,15 @@ static int can_optimize(unsigned long pa ret = insn_decode_kernel(&insn, (void *)recovered_insn); if (ret < 0) return 0; - +#ifdef CONFIG_KGDB /* - * In the case of detecting unknown breakpoint, this could be - * a padding INT3 between functions. Let's check that all the - * rest of the bytes are also INT3. + * If there is a dynamically installed kgdb sw breakpoint, + * this function should not be probed. */ - if (insn.opcode.bytes[0] == INT3_INSN_OPCODE) - return is_padding_int3(addr, paddr - offset + size) ? 1 : 0; - + if (insn.opcode.bytes[0] == INT3_INSN_OPCODE && + kgdb_has_hit_break(addr)) + return 0; +#endif /* Recover address */ insn.kaddr = (void *)addr; insn.next_byte = (void *)(addr + insn.length);
From: Steven Rostedt (Google) rostedt@goodmis.org
commit d5f30a7da8ea8e6450250275cec5670cee3c4264 upstream.
The flag that tells the event to call its triggers after reading the event is set for eprobes after the eprobe is enabled. This leads to a race where the eprobe may be triggered at the beginning of the event while the record information is still NULL. The eprobe then dereferences the NULL record, causing a NULL kernel pointer dereference.
Test for a NULL record to keep this from happening.
Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelme... Link: https://lore.kernel.org/all/20221117214249.2addbe10@gandalf.local.home/
Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Reported-by: Rafael Mendonca rafaelmendsr@gmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_eprobe.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/kernel/trace/trace_eprobe.c +++ b/kernel/trace/trace_eprobe.c @@ -570,6 +570,9 @@ static void eprobe_trigger_func(struct e if (unlikely(!rec)) return;
+ if (unlikely(!rec)) + return; + __eprobe_trace_func(edata, rec); }
From: Masami Hiramatsu (Google) mhiramat@kernel.org
commit e25e43a4e5d8cb2323553d8b6a7ba08d2ebab21f upstream.
Both CONFIG_OSNOISE_TRACER and CONFIG_HWLAT_TRACER partially enable the CONFIG_TRACER_MAX_TRACE code, but that is complicated and has introduced a bug: the tracing_max_lat_fops data structure is declared outside of the #ifdefs, but since it is defined only when CONFIG_TRACER_MAX_TRACE=y or CONFIG_HWLAT_TRACER=y, with only CONFIG_OSNOISE_TRACER=y that declaration becomes a definition(!).
To fix this issue, and to avoid repeating a similar problem, make CONFIG_OSNOISE_TRACER and CONFIG_HWLAT_TRACER always enable CONFIG_TRACER_MAX_TRACE. This has three benefits:
 - Fixes the tracing_max_lat_fops bug
 - Simplifies the #ifdefs
 - The CONFIG_TRACER_MAX_TRACE code is either fully enabled or not at all
Link: https://lore.kernel.org/linux-trace-kernel/167033628155.4111793.121854056908...
Fixes: 424b650f35c7 ("tracing: Fix missing osnoise tracer on max_latency") Cc: Daniel Bristot de Oliveira bristot@kernel.org Cc: stable@vger.kernel.org Reported-by: David Howells dhowells@redhat.com Reported-by: kernel test robot lkp@intel.com Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Link: https://lore.kernel.org/all/166992525941.1716618.13740663757583361463.stgit@... (original thread and v1) Link: https://lore.kernel.org/all/202212052253.VuhZ2ulJ-lkp@intel.com/T/#u (v1 error report) Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/Kconfig | 2 ++ kernel/trace/trace.c | 23 +++++++++++++---------- kernel/trace/trace.h | 8 +++----- 3 files changed, 18 insertions(+), 15 deletions(-)
--- a/kernel/trace/Kconfig +++ b/kernel/trace/Kconfig @@ -328,6 +328,7 @@ config SCHED_TRACER config HWLAT_TRACER bool "Tracer to detect hardware latencies (like SMIs)" select GENERIC_TRACER + select TRACER_MAX_TRACE help This tracer, when enabled will create one or more kernel threads, depending on what the cpumask file is set to, which each thread @@ -363,6 +364,7 @@ config HWLAT_TRACER config OSNOISE_TRACER bool "OS Noise tracer" select GENERIC_TRACER + select TRACER_MAX_TRACE help In the context of high-performance computing (HPC), the Operating System Noise (osnoise) refers to the interference experienced by an --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -1409,6 +1409,7 @@ int tracing_snapshot_cond_disable(struct return false; } EXPORT_SYMBOL_GPL(tracing_snapshot_cond_disable); +#define free_snapshot(tr) do { } while (0) #endif /* CONFIG_TRACER_SNAPSHOT */
void tracer_tracing_off(struct trace_array *tr) @@ -1679,6 +1680,8 @@ static ssize_t trace_seq_to_buffer(struc }
unsigned long __read_mostly tracing_thresh; + +#ifdef CONFIG_TRACER_MAX_TRACE static const struct file_operations tracing_max_lat_fops;
#ifdef LATENCY_FS_NOTIFY @@ -1735,18 +1738,14 @@ void latency_fsnotify(struct trace_array irq_work_queue(&tr->fsnotify_irqwork); }
-#elif defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) \ - || defined(CONFIG_OSNOISE_TRACER) +#else /* !LATENCY_FS_NOTIFY */
#define trace_create_maxlat_file(tr, d_tracer) \ trace_create_file("tracing_max_latency", TRACE_MODE_WRITE, \ d_tracer, &tr->max_latency, &tracing_max_lat_fops)
-#else -#define trace_create_maxlat_file(tr, d_tracer) do { } while (0) #endif
-#ifdef CONFIG_TRACER_MAX_TRACE /* * Copy the new maximum trace into the separate maximum-trace * structure. (this way the maximum trace is permanently saved, @@ -1821,14 +1820,15 @@ update_max_tr(struct trace_array *tr, st ring_buffer_record_off(tr->max_buffer.buffer);
#ifdef CONFIG_TRACER_SNAPSHOT - if (tr->cond_snapshot && !tr->cond_snapshot->update(tr, cond_data)) - goto out_unlock; + if (tr->cond_snapshot && !tr->cond_snapshot->update(tr, cond_data)) { + arch_spin_unlock(&tr->max_lock); + return; + } #endif swap(tr->array_buffer.buffer, tr->max_buffer.buffer);
__update_max_tr(tr, tsk, cpu);
- out_unlock: arch_spin_unlock(&tr->max_lock); }
@@ -1875,6 +1875,7 @@ update_max_tr_single(struct trace_array __update_max_tr(tr, tsk, cpu); arch_spin_unlock(&tr->max_lock); } + #endif /* CONFIG_TRACER_MAX_TRACE */
static int wait_on_pipe(struct trace_iterator *iter, int full) @@ -6536,7 +6537,7 @@ out: return ret; }
-#if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) +#ifdef CONFIG_TRACER_MAX_TRACE
static ssize_t tracing_max_lat_read(struct file *filp, char __user *ubuf, @@ -7560,7 +7561,7 @@ static const struct file_operations trac .llseek = generic_file_llseek, };
-#if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) +#ifdef CONFIG_TRACER_MAX_TRACE static const struct file_operations tracing_max_lat_fops = { .open = tracing_open_generic, .read = tracing_max_lat_read, @@ -9549,7 +9550,9 @@ init_tracer_tracefs(struct trace_array *
create_trace_options_dir(tr);
+#ifdef CONFIG_TRACER_MAX_TRACE trace_create_maxlat_file(tr, d_tracer); +#endif
if (ftrace_create_function_files(tr, d_tracer)) MEM_FAIL(1, "Could not allocate function filter files"); --- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -309,8 +309,7 @@ struct trace_array { struct array_buffer max_buffer; bool allocated_snapshot; #endif -#if defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) \ - || defined(CONFIG_OSNOISE_TRACER) +#ifdef CONFIG_TRACER_MAX_TRACE unsigned long max_latency; #ifdef CONFIG_FSNOTIFY struct dentry *d_max_latency; @@ -688,12 +687,11 @@ void update_max_tr(struct trace_array *t void *cond_data); void update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu); -#endif /* CONFIG_TRACER_MAX_TRACE */
-#if (defined(CONFIG_TRACER_MAX_TRACE) || defined(CONFIG_HWLAT_TRACER) \ - || defined(CONFIG_OSNOISE_TRACER)) && defined(CONFIG_FSNOTIFY) +#ifdef CONFIG_FSNOTIFY #define LATENCY_FS_NOTIFY #endif +#endif /* CONFIG_TRACER_MAX_TRACE */
#ifdef LATENCY_FS_NOTIFY void latency_fsnotify(struct trace_array *tr);
From: Zheng Yejian zhengyejian1@huawei.com
commit 2cc6a528882d0e0ccbc1bca5f95b8c963cedac54 upstream.
When the number of synth fields is more than SYNTH_FIELDS_MAX, parse_action_params() should return -EINVAL.
Link: https://lore.kernel.org/linux-trace-kernel/20221207034635.2253990-1-zhengyej...
Cc: mhiramat@kernel.org Cc: zanussi@kernel.org Cc: stable@vger.kernel.org Fixes: c282a386a397 ("tracing: Add 'onmatch' hist trigger action support") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_hist.c | 1 + 1 file changed, 1 insertion(+)
--- a/kernel/trace/trace_events_hist.c +++ b/kernel/trace/trace_events_hist.c @@ -3190,6 +3190,7 @@ static int parse_action_params(struct tr while (params) { if (data->n_params >= SYNTH_FIELDS_MAX) { hist_err(tr, HIST_ERR_TOO_MANY_PARAMS, 0); + ret = -EINVAL; goto out; }
From: Steven Rostedt (Google) rostedt@goodmis.org
commit 575b76cb885532aae13a9d979fd476bb2b156cb9 upstream.
When creating probe names, a check is done to make sure they match basic C variable naming rules: the name must start with an alphabetic character or an underscore, and the remaining characters must be alphanumeric or underscores.
But system names do not follow any such naming convention, as they are created by the TRACE_SYSTEM macro and nothing checks what they contain. The "xhci-hcd" trace events have a '-' in their system name. Trying to attach an eprobe to one of these trace points fails because the hyphen violates the variable naming convention, so the eprobe checks reject it.
Allow hyphens in the system name so that eprobes can attach to the "xhci-hcd" trace events.
Link: https://lore.kernel.org/all/Y3eJ8GiGnEvVd8%2FN@macondo/ Link: https://lore.kernel.org/linux-trace-kernel/20221122122345.160f5077@gandalf.l...
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Fixes: 5b7a96220900e ("tracing/probe: Check event/group naming rule at parsing") Reported-by: Rafael Mendonca rafaelmendsr@gmail.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.h | 19 ++++++++++++++++--- kernel/trace/trace_probe.c | 2 +- 2 files changed, 17 insertions(+), 4 deletions(-)
--- a/kernel/trace/trace.h +++ b/kernel/trace/trace.h @@ -1939,17 +1939,30 @@ static __always_inline void trace_iterat }
/* Check the name is good for event/group/fields */ -static inline bool is_good_name(const char *name) +static inline bool __is_good_name(const char *name, bool hash_ok) { - if (!isalpha(*name) && *name != '_') + if (!isalpha(*name) && *name != '_' && (!hash_ok || *name != '-')) return false; while (*++name != '\0') { - if (!isalpha(*name) && !isdigit(*name) && *name != '_') + if (!isalpha(*name) && !isdigit(*name) && *name != '_' && + (!hash_ok || *name != '-')) return false; } return true; }
+/* Check the name is good for event/group/fields */ +static inline bool is_good_name(const char *name) +{ + return __is_good_name(name, false); +} + +/* Check the name is good for system */ +static inline bool is_good_system_name(const char *name) +{ + return __is_good_name(name, true); +} + /* Convert certain expected symbols into '_' when generating event names */ static inline void sanitize_event_name(char *name) { --- a/kernel/trace/trace_probe.c +++ b/kernel/trace/trace_probe.c @@ -246,7 +246,7 @@ int traceprobe_parse_event_name(const ch return -EINVAL; } strlcpy(buf, event, slash - event + 1); - if (!is_good_name(buf)) { + if (!is_good_system_name(buf)) { trace_probe_log_err(offset, BAD_GROUP_NAME); return -EINVAL; }
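For reference, the effect of the new helper pair can be reproduced with a userspace copy of the patched helper; the ctype casts below are only added for userspace portability and are not part of the kernel change:

#include <stdio.h>
#include <ctype.h>
#include <stdbool.h>

static bool __is_good_name(const char *name, bool hash_ok)
{
        if (!isalpha((unsigned char)*name) && *name != '_' &&
            (!hash_ok || *name != '-'))
                return false;
        while (*++name != '\0') {
                if (!isalpha((unsigned char)*name) && !isdigit((unsigned char)*name) &&
                    *name != '_' && (!hash_ok || *name != '-'))
                        return false;
        }
        return true;
}

int main(void)
{
        const char *sys = "xhci-hcd";

        /* event/group names still reject '-', system names now accept it */
        printf("is_good_name(\"%s\")        -> %d\n", sys, __is_good_name(sys, false));
        printf("is_good_system_name(\"%s\") -> %d\n", sys, __is_good_name(sys, true));
        return 0;
}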
From: Yang Jihong yangjihong1@huawei.com
commit c1ac03af6ed45d05786c219d102f37eb44880f28 upstream.
print_trace_line() may overflow the seq_file buffer. If the event is not consumed, the while loop keeps peeking at this event, causing an infinite loop.
Link: https://lkml.kernel.org/r/20221129113009.182425-1-yangjihong1@huawei.com
Cc: Masami Hiramatsu mhiramat@kernel.org Cc: stable@vger.kernel.org Fixes: 088b1e427dbba ("ftrace: pipe fixes") Signed-off-by: Yang Jihong yangjihong1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-)
--- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -6764,7 +6764,20 @@ waitagain:
ret = print_trace_line(iter); if (ret == TRACE_TYPE_PARTIAL_LINE) { - /* don't print partial lines */ + /* + * If one print_trace_line() fills entire trace_seq in one shot, + * trace_seq_to_user() will returns -EBUSY because save_len == 0, + * In this case, we need to consume it, otherwise, loop will peek + * this event next time, resulting in an infinite loop. + */ + if (save_len == 0) { + iter->seq.full = 0; + trace_seq_puts(&iter->seq, "[LINE TOO BIG]\n"); + trace_consume(iter); + break; + } + + /* In other cases, don't print partial lines */ iter->seq.seq.len = save_len; break; }
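As a rough userspace sketch of the logic added above (buffer size and event text are made up): when a single rendered line cannot fit into an already-empty buffer, give up on that event and emit a marker instead of retrying it forever.

#include <stdio.h>

int main(void)
{
        char buf[32];
        size_t save_len = 0;    /* nothing buffered yet, like iter->seq before this event */
        const char *event = "an event whose rendered line is far too big for the buffer";

        int needed = snprintf(buf, sizeof(buf), "%s\n", event);
        if ((size_t)needed >= sizeof(buf) && save_len == 0) {
                snprintf(buf, sizeof(buf), "[LINE TOO BIG]\n");
                /* the real fix also calls trace_consume() so the loop moves on */
        }
        fputs(buf, stdout);
        return 0;
}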
From: Luca Ceresoli luca.ceresoli@bootlin.com
commit 10b5ce6743c839fa75336042c64e2479caec9430 upstream.
chan->mipi takes the return value of tegra_mipi_request() which can be a valid pointer or an error. However chan->mipi is checked in several places, including error-cleanup code in tegra_csi_channels_cleanup(), as 'if (chan->mipi)', which suggests the initial intent was that chan->mipi should be either NULL or a valid pointer, never an error. As a consequence, cleanup code in case of tegra_mipi_request() errors would dereference an invalid pointer.
Fix by ensuring chan->mipi always contains either NULL or a void pointer.
Also add that to the documentation.
Fixes: 523c857e34ce ("media: tegra-video: Add CSI MIPI pads calibration") Cc: stable@vger.kernel.org Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Luca Ceresoli luca.ceresoli@bootlin.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/media/tegra-video/csi.c | 1 + drivers/staging/media/tegra-video/csi.h | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/staging/media/tegra-video/csi.c +++ b/drivers/staging/media/tegra-video/csi.c @@ -448,6 +448,7 @@ static int tegra_csi_channel_alloc(struc chan->mipi = tegra_mipi_request(csi->dev, node); if (IS_ERR(chan->mipi)) { ret = PTR_ERR(chan->mipi); + chan->mipi = NULL; dev_err(csi->dev, "failed to get mipi device: %d\n", ret); }
--- a/drivers/staging/media/tegra-video/csi.h +++ b/drivers/staging/media/tegra-video/csi.h @@ -56,7 +56,7 @@ struct tegra_csi; * @framerate: active framerate for TPG * @h_blank: horizontal blanking for TPG active format * @v_blank: vertical blanking for TPG active format - * @mipi: mipi device for corresponding csi channel pads + * @mipi: mipi device for corresponding csi channel pads, or NULL if not applicable (TPG, error) * @pixel_rate: active pixel rate from the sensor on this channel */ struct tegra_csi_channel {
From: Luca Ceresoli luca.ceresoli@bootlin.com
commit c4d344163c3a7f90712525f931a6c016bbb35e18 upstream.
At probe time this code path is followed:
 * tegra_csi_init
   * tegra_csi_channels_alloc
     * for_each_child_of_node(node, channel) -- iterates over channels
       * automatically gets 'channel'
       * tegra_csi_channel_alloc()
         * saves into chan->of_node a pointer to the channel OF node
       * automatically gets and puts 'channel'
         * now the node saved in chan->of_node has refcount 0, can disappear
   * tegra_csi_channels_init
     * iterates over channels
       * tegra_csi_channel_init -- uses chan->of_node
After that, chan->of_node keeps storing the node until the device is removed.
of_node_get() the node and of_node_put() it during teardown to avoid any risk.
Fixes: 1ebaeb09830f ("media: tegra-video: Add support for external sensor capture") Cc: stable@vger.kernel.org Cc: Sowjanya Komatineni skomatineni@nvidia.com Signed-off-by: Luca Ceresoli luca.ceresoli@bootlin.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/media/tegra-video/csi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/staging/media/tegra-video/csi.c +++ b/drivers/staging/media/tegra-video/csi.c @@ -433,7 +433,7 @@ static int tegra_csi_channel_alloc(struc for (i = 0; i < chan->numgangports; i++) chan->csi_port_nums[i] = port_num + i * CSI_PORTS_PER_BRICK;
- chan->of_node = node; + chan->of_node = of_node_get(node); chan->numpads = num_pads; if (num_pads & 0x2) { chan->pads[0].flags = MEDIA_PAD_FL_SINK; @@ -641,6 +641,7 @@ static void tegra_csi_channels_cleanup(s media_entity_cleanup(&subdev->entity); }
+ of_node_put(chan->of_node); list_del(&chan->list); kfree(chan); }
From: Nick Desaulniers ndesaulniers@google.com
commit 3220022038b9a3845eea762af85f1c5694b9f861 upstream.
clang-15's ability to elide loops completely became more aggressive when it can deduce how a variable is being updated in a loop. Counting down one variable by an increment of another can be replaced by a modulo operation.
For 64b variables on 32b ARM EABI targets, this can result in the compiler generating calls to __aeabi_uldivmod, which it does for a do while loop in float64_rem().
For the kernel, we'd generally prefer that developers not open code 64b division via binary / operators and instead use the more explicit helpers from div64.h. On arm-linux-gnueabi targets, failure to do so can result in linkage failures due to undefined references to __aeabi_uldivmod().
While developers can avoid open coding divisions on 64b variables, the compiler doesn't know that the Linux kernel has a partial implementation of a compiler runtime (--rtlib) to enforce this convention.
It's also undecidable for the compiler whether the code in question would be faster to execute the loop vs elide it and do the 64b division.
While I actively avoid using the internal -mllvm command line flags, I think we get better code than using barrier() here, which will force reloads+spills in the loop for all toolchains.
Link: https://github.com/ClangBuiltLinux/linux/issues/1666
Reported-by: Nathan Chancellor nathan@kernel.org Reviewed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Nick Desaulniers ndesaulniers@google.com Tested-by: Nathan Chancellor nathan@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Russell King (Oracle) rmk+kernel@armlinux.org.uk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/nwfpe/Makefile | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/arch/arm/nwfpe/Makefile +++ b/arch/arm/nwfpe/Makefile @@ -11,3 +11,9 @@ nwfpe-y += fpa11.o fpa11_cpdo.o fpa11 entry.o
nwfpe-$(CONFIG_FPE_NWFPE_XP) += extended_cpdo.o + +# Try really hard to avoid generating calls to __aeabi_uldivmod() from +# float64_rem() due to loop elision. +ifdef CONFIG_CC_IS_CLANG +CFLAGS_softfloat.o += -mllvm -replexitval=never +endif
From: Keita Suzuki keitasuzuki.park@sslab.ics.keio.ac.jp
commit 6b0d0477fce747d4137aa65856318b55fba72198 upstream.
In function dvb_register_device() -> dvb_register_media_device() -> dvb_create_media_entity(), dvb->entity is allocated and initialized. If the initialization fails, it frees dvb->entity and returns an error code. The caller takes the error code and handles the error by calling dvb_media_device_free(), which unregisters the entity and frees the field again if it is not NULL. As dvb->entity may not be NULLed in dvb_create_media_entity() when the allocation of dvbdev->pad fails, a double free may occur. This may also cause a use-after-free in media_device_unregister_entity().
Fix this by storing NULL to dvb->entity when it is freed.
Link: https://lore.kernel.org/linux-media/20220426052921.2088416-1-keitasuzuki.par... Fixes: fcd5ce4b3936 ("media: dvb-core: fix a memory leak bug") Cc: stable@vger.kernel.org Cc: Wenwen Wang wenwen@cs.uga.edu Signed-off-by: Keita Suzuki keitasuzuki.park@sslab.ics.keio.ac.jp Signed-off-by: Mauro Carvalho Chehab mchehab@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-core/dvbdev.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/media/dvb-core/dvbdev.c +++ b/drivers/media/dvb-core/dvbdev.c @@ -345,6 +345,7 @@ static int dvb_create_media_entity(struc GFP_KERNEL); if (!dvbdev->pads) { kfree(dvbdev->entity); + dvbdev->entity = NULL; return -ENOMEM; } }
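The one-line fix above follows the usual free-and-NULL pattern. A minimal userspace sketch of why it prevents the double free; the *_stub structs are stand-ins for this sketch, not the dvb-core types:

#include <stdlib.h>

struct entity_stub { int dummy; };

struct dvbdev_stub {
        struct entity_stub *entity;
        void *pads;
};

static int create_entity(struct dvbdev_stub *dvbdev)
{
        dvbdev->entity = malloc(sizeof(*dvbdev->entity));
        if (!dvbdev->entity)
                return -1;

        dvbdev->pads = NULL;            /* stand-in for the failing pads allocation */
        if (!dvbdev->pads) {
                free(dvbdev->entity);
                dvbdev->entity = NULL;  /* the one-line fix */
                return -1;
        }
        return 0;
}

static void cleanup(struct dvbdev_stub *dvbdev)
{
        if (dvbdev->entity)             /* safe: NULL after the error path above */
                free(dvbdev->entity);
}

int main(void)
{
        struct dvbdev_stub dev = { 0 };

        create_entity(&dev);
        cleanup(&dev);
        return 0;
}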
From: Takashi Iwai tiwai@suse.de
commit fd3d91ab1c6ab0628fe642dd570b56302c30a792 upstream.
The dvb-core tries to sync the releases of opened files at dvb_dmxdev_release() with two refcounts: dvbdev->users and dvr_dvbdev->users. A problem is present in those two syncs: when yet another dvb_demux_open() is called during those sync waits, dvb_demux_open() continues to process even if the device is being closed. This includes the increment of the former refcount, resulting in the leftover refcount after the sync of the latter refcount at dvb_dmxdev_release(). It ends up with use-after-free, since the function believes that all usages were gone and releases the resources.
This patch addresses the problem by adding the check of dmxdev->exit flag at dvb_demux_open(), just like dvb_dvr_open() already does. With the exit flag check, the second call of dvb_demux_open() fails, hence the further corruption can be avoided.
Also, to avoid races on the dmxdev->exit flag reference, this patch serializes the dmxdev->exit setup and the sync waits with the dmxdev->mutex lock at dvb_dmxdev_release(). Without the mutex lock, dvb_demux_open() (or dvb_dvr_open()) may run concurrently with dvb_dmxdev_release(), which allows the exit flag check to be skipped and the open to proceed on a device that is being closed.
CVE-2022-41218 is assigned to those bugs above.
Reported-by: Hyunwoo Kim imv4bel@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/20220908132754.30532-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/media/dvb-core/dmxdev.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/drivers/media/dvb-core/dmxdev.c +++ b/drivers/media/dvb-core/dmxdev.c @@ -800,6 +800,11 @@ static int dvb_demux_open(struct inode * if (mutex_lock_interruptible(&dmxdev->mutex)) return -ERESTARTSYS;
+ if (dmxdev->exit) { + mutex_unlock(&dmxdev->mutex); + return -ENODEV; + } + for (i = 0; i < dmxdev->filternum; i++) if (dmxdev->filter[i].state == DMXDEV_STATE_FREE) break; @@ -1458,7 +1463,10 @@ EXPORT_SYMBOL(dvb_dmxdev_init);
void dvb_dmxdev_release(struct dmxdev *dmxdev) { + mutex_lock(&dmxdev->mutex); dmxdev->exit = 1; + mutex_unlock(&dmxdev->mutex); + if (dmxdev->dvbdev->users > 1) { wait_event(dmxdev->dvbdev->wait_queue, dmxdev->dvbdev->users == 1);
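The serialization above follows a common pattern: set an exit flag under the same lock that open takes, so a concurrent open either sees the flag and bails out or completes before release starts waiting. A userspace sketch under that assumption; the function names are stand-ins, not the dmxdev API:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool exiting;
static int users;

static int demux_open(void)
{
        pthread_mutex_lock(&lock);
        if (exiting) {                  /* the check added by the patch */
                pthread_mutex_unlock(&lock);
                return -1;              /* -ENODEV in the kernel */
        }
        users++;
        pthread_mutex_unlock(&lock);
        return 0;
}

static void demux_release_device(void)
{
        pthread_mutex_lock(&lock);
        exiting = true;                 /* serialized with demux_open() */
        pthread_mutex_unlock(&lock);
        /* ... then wait for users to drop and free resources ... */
}

int main(void)
{
        demux_open();
        demux_release_device();
        printf("open after release: %d\n", demux_open());      /* -1 */
        return 0;
}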
From: Paulo Alcantara pc@cjr.nz
commit a85ceafd41927e41a4103d228a993df7edd8823b upstream.
Since rc was initialised to -ENOMEM in cifs_get_smb_ses(), when an existing smb session was found, free_xid() would be called and then print
  CIFS: fs/cifs/connect.c: Existing tcp session with server found
  CIFS: fs/cifs/connect.c: VFS: in cifs_get_smb_ses as Xid: 44 with uid: 0
  CIFS: fs/cifs/connect.c: Existing smb sess found (status=1)
  CIFS: fs/cifs/connect.c: VFS: leaving cifs_get_smb_ses (xid = 44) rc = -12
Fix this by initialising rc to 0 and then letting free_xid() print this instead
  CIFS: fs/cifs/connect.c: Existing tcp session with server found
  CIFS: fs/cifs/connect.c: VFS: in cifs_get_smb_ses as Xid: 14 with uid: 0
  CIFS: fs/cifs/connect.c: Existing smb sess found (status=1)
  CIFS: fs/cifs/connect.c: VFS: leaving cifs_get_smb_ses (xid = 14) rc = 0
Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Cc: stable@vger.kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/connect.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/fs/cifs/connect.c +++ b/fs/cifs/connect.c @@ -1948,7 +1948,7 @@ cifs_set_cifscreds(struct smb3_fs_contex struct cifs_ses * cifs_get_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx) { - int rc = -ENOMEM; + int rc = 0; unsigned int xid; struct cifs_ses *ses; struct sockaddr_in *addr = (struct sockaddr_in *)&server->dstaddr; @@ -1990,6 +1990,8 @@ cifs_get_smb_ses(struct TCP_Server_Info return ses; }
+ rc = -ENOMEM; + cifs_dbg(FYI, "Existing smb sess not found\n"); ses = sesInfoAlloc(); if (ses == NULL)
From: Steve French stfrench@microsoft.com
commit 2bfd81043e944af0e52835ef6d9b41795af22341 upstream.
Three mount options, "tcpnodelay", "noautotune" and "noblocksend", were not displayed when passed in on cifs/smb3 mounts (e.g. in /proc/mounts). No change to defaults, so these are not displayed if not specified on mount.
Cc: stable@vger.kernel.org Reviewed-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/cifs/cifsfs.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)
--- a/fs/cifs/cifsfs.c +++ b/fs/cifs/cifsfs.c @@ -656,9 +656,15 @@ cifs_show_options(struct seq_file *s, st seq_printf(s, ",echo_interval=%lu", tcon->ses->server->echo_interval / HZ);
- /* Only display max_credits if it was overridden on mount */ + /* Only display the following if overridden on mount */ if (tcon->ses->server->max_credits != SMB2_MAX_CREDITS_AVAILABLE) seq_printf(s, ",max_credits=%u", tcon->ses->server->max_credits); + if (tcon->ses->server->tcp_nodelay) + seq_puts(s, ",tcpnodelay"); + if (tcon->ses->server->noautotune) + seq_puts(s, ",noautotune"); + if (tcon->ses->server->noblocksnd) + seq_puts(s, ",noblocksend");
if (tcon->snapshot_time) seq_printf(s, ",snapshot=%llu", tcon->snapshot_time);
From: Ian Abbott abbotti@mev.co.uk
commit 4dfe05bdc1ade79b943d4979a2e2a8b5ef68fbb5 upstream.
In `ds1347_set_time()`, the wrong value is being written to the `DS1347_CENTURY_REG` register. It needs to be converted to BCD. Fix it.
Fixes: 147dae76dbb9 ("rtc: ds1347: handle century register") Cc: stable@vger.kernel.org # v5.5+ Signed-off-by: Ian Abbott abbotti@mev.co.uk Link: https://lore.kernel.org/r/20221027163249.447416-1-abbotti@mev.co.uk Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/rtc/rtc-ds1347.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/drivers/rtc/rtc-ds1347.c +++ b/drivers/rtc/rtc-ds1347.c @@ -112,7 +112,7 @@ static int ds1347_set_time(struct device return err;
century = (dt->tm_year / 100) + 19; - err = regmap_write(map, DS1347_CENTURY_REG, century); + err = regmap_write(map, DS1347_CENTURY_REG, bin2bcd(century)); if (err) return err;
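A minimal userspace sketch of why the raw binary value is wrong for this register; bin2bcd() below is a local reimplementation of the kernel helper, used only for illustration:

#include <stdio.h>
#include <stdint.h>

static uint8_t bin2bcd(unsigned int val)
{
        return ((val / 10) << 4) | (val % 10);
}

int main(void)
{
        int tm_year = 123;                            /* years since 1900, i.e. 2023 */
        unsigned int century = (tm_year / 100) + 19;  /* 20 */

        printf("raw binary write: 0x%02x\n", century);          /* 0x14: chip shows century 14 */
        printf("BCD write:        0x%02x\n", bin2bcd(century)); /* 0x20: chip shows century 20 */
        return 0;
}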
From: Damien Le Moal damien.lemoal@opensource.wdc.com
commit 015d02f48537cf2d1a65eeac50717566f9db6eec upstream.
mq-deadline ensures an in order dispatching of write requests to zoned block devices using a per zone lock (a bit). This implies that for any purely sequential write workload, the drive is exercised most of the time at a maximum queue depth of one.
However, when such a sequential write workload crosses a zone boundary (when sequentially writing multiple contiguous zones), zone write locking may prevent the last write to one zone from being issued (as the previous write is still being executed) but allow the first write to the following zone to be issued (as that zone is not yet being written and not locked). This results in an out of order delivery of the sequential write commands to the device every time a zone boundary is crossed.
While such behavior does not break the sequential write constraint of zoned block devices (and does not generate any write error), some zoned hard-disks react badly to seeing these out of order writes, resulting in lower write throughput.
This problem can be addressed by always dispatching the first request of a stream of sequential write requests, regardless of the zones targeted by these sequential writes. To do so, the function deadline_skip_seq_writes() is introduced and used in deadline_next_request() to select the next write command to issue if the target device is an HDD (blk_queue_nonrot() being false). deadline_fifo_request() is modified using the new deadline_earlier_request() and deadline_is_seq_write() helpers to ignore requests in the fifo list that have a preceding request in lba order that is sequential.
With this fix, a sequential write workload executed with the following fio command:
  fio --name=seq-write --filename=/dev/sda --zonemode=zbd --direct=1 \
      --size=68719476736 --ioengine=libaio --iodepth=32 --rw=write \
      --bs=65536
results in an increase from 225 MB/s to 250 MB/s of the write throughput of an SMR HDD (11% increase).
Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Link: https://lore.kernel.org/r/20221124021208.242541-3-damien.lemoal@opensource.w... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/mq-deadline.c | 66 ++++++++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 62 insertions(+), 4 deletions(-)
--- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -154,6 +154,20 @@ static u8 dd_rq_ioclass(struct request * }
/* + * get the request before `rq' in sector-sorted order + */ +static inline struct request * +deadline_earlier_request(struct request *rq) +{ + struct rb_node *node = rb_prev(&rq->rb_node); + + if (node) + return rb_entry_rq(node); + + return NULL; +} + +/* * get the request after `rq' in sector-sorted order */ static inline struct request * @@ -289,6 +303,39 @@ static inline int deadline_check_fifo(st }
/* + * Check if rq has a sequential request preceding it. + */ +static bool deadline_is_seq_writes(struct deadline_data *dd, struct request *rq) +{ + struct request *prev = deadline_earlier_request(rq); + + if (!prev) + return false; + + return blk_rq_pos(prev) + blk_rq_sectors(prev) == blk_rq_pos(rq); +} + +/* + * Skip all write requests that are sequential from @rq, even if we cross + * a zone boundary. + */ +static struct request *deadline_skip_seq_writes(struct deadline_data *dd, + struct request *rq) +{ + sector_t pos = blk_rq_pos(rq); + sector_t skipped_sectors = 0; + + while (rq) { + if (blk_rq_pos(rq) != pos + skipped_sectors) + break; + skipped_sectors += blk_rq_sectors(rq); + rq = deadline_latter_request(rq); + } + + return rq; +} + +/* * For the specified data direction, return the next request to * dispatch using arrival ordered lists. */ @@ -308,11 +355,16 @@ deadline_fifo_request(struct deadline_da
/* * Look for a write request that can be dispatched, that is one with - * an unlocked target zone. + * an unlocked target zone. For some HDDs, breaking a sequential + * write stream can lead to lower throughput, so make sure to preserve + * sequential write streams, even if that stream crosses into the next + * zones and these zones are unlocked. */ spin_lock_irqsave(&dd->zone_lock, flags); list_for_each_entry(rq, &per_prio->fifo_list[DD_WRITE], queuelist) { - if (blk_req_can_dispatch_to_zone(rq)) + if (blk_req_can_dispatch_to_zone(rq) && + (blk_queue_nonrot(rq->q) || + !deadline_is_seq_writes(dd, rq))) goto out; } rq = NULL; @@ -342,13 +394,19 @@ deadline_next_request(struct deadline_da
/* * Look for a write request that can be dispatched, that is one with - * an unlocked target zone. + * an unlocked target zone. For some HDDs, breaking a sequential + * write stream can lead to lower throughput, so make sure to preserve + * sequential write streams, even if that stream crosses into the next + * zones and these zones are unlocked. */ spin_lock_irqsave(&dd->zone_lock, flags); while (rq) { if (blk_req_can_dispatch_to_zone(rq)) break; - rq = deadline_latter_request(rq); + if (blk_queue_nonrot(rq->q)) + rq = deadline_latter_request(rq); + else + rq = deadline_skip_seq_writes(dd, rq); } spin_unlock_irqrestore(&dd->zone_lock, flags);
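The core of the change is the "does this write continue its predecessor?" test. A standalone sketch of that test under the assumption of sector-sorted requests; struct req is a toy stand-in for struct request:

#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

struct req { uint64_t pos; uint32_t sectors; };

/* Mirrors deadline_is_seq_writes(): rq directly continues its predecessor. */
static bool has_seq_predecessor(const struct req *prev, const struct req *rq)
{
        return prev && prev->pos + prev->sectors == rq->pos;
}

int main(void)
{
        /* A 3-request sequential stream: only the first should be dispatched. */
        struct req stream[] = { { 0, 128 }, { 128, 128 }, { 256, 128 } };

        for (int i = 0; i < 3; i++) {
                const struct req *prev = i ? &stream[i - 1] : NULL;

                printf("req at %llu: %s\n", (unsigned long long)stream[i].pos,
                       has_seq_predecessor(prev, &stream[i]) ?
                       "skip (continues a stream)" : "dispatch candidate");
        }
        return 0;
}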
From: Florian-Ewald Mueller florian-ewald.mueller@ionos.com
commit 4555211190798b6b6fa2c37667d175bf67945c78 upstream.
 - limit bitmap chunk size internal u64 variable to values not overflowing
   the u32 bitmap superblock structure variable stored on persistent media
 - assign bitmap chunk size internal u64 variable from unsigned values to
   avoid possible sign extension artifacts when assigning from a s32 value
The bug has been there since at least kernel 4.0. Steps to reproduce it:
 1: mdadm -C /dev/mdx -l 1 --bitmap=internal --bitmap-chunk=256M -e 1.2 -n2 /dev/rnbd1 /dev/rnbd2
 2: resize member devices rnbd1 and rnbd2 to 8 TB
 3: mdadm --grow /dev/mdx --size=max
The bitmap chunk size will overflow without the patch.
Cc: stable@vger.kernel.org
Signed-off-by: Florian-Ewald Mueller florian-ewald.mueller@ionos.com Signed-off-by: Jack Wang jinpu.wang@ionos.com Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/md-bitmap.c | 20 ++++++++++++-------- 1 file changed, 12 insertions(+), 8 deletions(-)
--- a/drivers/md/md-bitmap.c +++ b/drivers/md/md-bitmap.c @@ -486,7 +486,7 @@ void md_bitmap_print_sb(struct bitmap *b sb = kmap_atomic(bitmap->storage.sb_page); pr_debug("%s: bitmap file superblock:\n", bmname(bitmap)); pr_debug(" magic: %08x\n", le32_to_cpu(sb->magic)); - pr_debug(" version: %d\n", le32_to_cpu(sb->version)); + pr_debug(" version: %u\n", le32_to_cpu(sb->version)); pr_debug(" uuid: %08x.%08x.%08x.%08x\n", le32_to_cpu(*(__le32 *)(sb->uuid+0)), le32_to_cpu(*(__le32 *)(sb->uuid+4)), @@ -497,11 +497,11 @@ void md_bitmap_print_sb(struct bitmap *b pr_debug("events cleared: %llu\n", (unsigned long long) le64_to_cpu(sb->events_cleared)); pr_debug(" state: %08x\n", le32_to_cpu(sb->state)); - pr_debug(" chunksize: %d B\n", le32_to_cpu(sb->chunksize)); - pr_debug(" daemon sleep: %ds\n", le32_to_cpu(sb->daemon_sleep)); + pr_debug(" chunksize: %u B\n", le32_to_cpu(sb->chunksize)); + pr_debug(" daemon sleep: %us\n", le32_to_cpu(sb->daemon_sleep)); pr_debug(" sync size: %llu KB\n", (unsigned long long)le64_to_cpu(sb->sync_size)/2); - pr_debug("max write behind: %d\n", le32_to_cpu(sb->write_behind)); + pr_debug("max write behind: %u\n", le32_to_cpu(sb->write_behind)); kunmap_atomic(sb); }
@@ -2106,7 +2106,8 @@ int md_bitmap_resize(struct bitmap *bitm bytes = DIV_ROUND_UP(chunks, 8); if (!bitmap->mddev->bitmap_info.external) bytes += sizeof(bitmap_super_t); - } while (bytes > (space << 9)); + } while (bytes > (space << 9) && (chunkshift + BITMAP_BLOCK_SHIFT) < + (BITS_PER_BYTE * sizeof(((bitmap_super_t *)0)->chunksize) - 1)); } else chunkshift = ffz(~chunksize) - BITMAP_BLOCK_SHIFT;
@@ -2151,7 +2152,7 @@ int md_bitmap_resize(struct bitmap *bitm bitmap->counts.missing_pages = pages; bitmap->counts.chunkshift = chunkshift; bitmap->counts.chunks = chunks; - bitmap->mddev->bitmap_info.chunksize = 1 << (chunkshift + + bitmap->mddev->bitmap_info.chunksize = 1UL << (chunkshift + BITMAP_BLOCK_SHIFT);
blocks = min(old_counts.chunks << old_counts.chunkshift, @@ -2177,8 +2178,8 @@ int md_bitmap_resize(struct bitmap *bitm bitmap->counts.missing_pages = old_counts.pages; bitmap->counts.chunkshift = old_counts.chunkshift; bitmap->counts.chunks = old_counts.chunks; - bitmap->mddev->bitmap_info.chunksize = 1 << (old_counts.chunkshift + - BITMAP_BLOCK_SHIFT); + bitmap->mddev->bitmap_info.chunksize = + 1UL << (old_counts.chunkshift + BITMAP_BLOCK_SHIFT); blocks = old_counts.chunks << old_counts.chunkshift; pr_warn("Could not pre-allocate in-memory bitmap for cluster raid\n"); break; @@ -2519,6 +2520,9 @@ chunksize_store(struct mddev *mddev, con if (csize < 512 || !is_power_of_2(csize)) return -EINVAL; + if (BITS_PER_LONG > 32 && csize >= (1ULL << (BITS_PER_BYTE * + sizeof(((bitmap_super_t *)0)->chunksize)))) + return -EOVERFLOW; mddev->bitmap_info.chunksize = csize; return len; }
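A minimal sketch of the truncation the new bound guards against; the chunkshift value is made up purely to show what happens once the 64-bit result no longer fits the 32-bit on-disk chunksize field:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        unsigned int chunkshift = 32;                   /* hypothetical, for illustration */

        uint64_t chunksize64 = 1ULL << chunkshift;      /* 4 GiB, fine in 64 bits */
        uint32_t chunksize32 = (uint32_t)chunksize64;   /* width of the superblock field */

        printf("64-bit chunk size: %llu\n", (unsigned long long)chunksize64);
        printf("32-bit superblock field: %u\n", chunksize32);   /* truncates to 0 */
        return 0;
}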
From: Aditya Garg gargaditya08@live.com
commit 0be56a116220f9e5731a6609e66a11accfe8d8e2 upstream.
The iMac Pro 2017 is also a T2 Mac. Thus add it to the list of uefi skip cert.
Cc: stable@vger.kernel.org Fixes: 155ca952c7ca ("efi: Do not import certificates from UEFI Secure Boot for T2 Macs") Link: https://lore.kernel.org/linux-integrity/9D46D92F-1381-4F10-989C-1A12CD2FFDD8... Signed-off-by: Aditya Garg gargaditya08@live.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/integrity/platform_certs/load_uefi.c | 1 + 1 file changed, 1 insertion(+)
--- a/security/integrity/platform_certs/load_uefi.c +++ b/security/integrity/platform_certs/load_uefi.c @@ -34,6 +34,7 @@ static const struct dmi_system_id uefi_s { UEFI_QUIRK_SKIP_CERT("Apple Inc.", "MacPro7,1") }, { UEFI_QUIRK_SKIP_CERT("Apple Inc.", "iMac20,1") }, { UEFI_QUIRK_SKIP_CERT("Apple Inc.", "iMac20,2") }, + { UEFI_QUIRK_SKIP_CERT("Apple Inc.", "iMacPro1,1") }, { } };
From: Michael Walle michael@walle.cc
commit 57d545b5a3d6ce3a8fb6b093f02bfcbb908973f3 upstream.
There are no SDIO module aliases included in the driver, therefore, module autoloading isn't working. Add the proper MODULE_DEVICE_TABLE().
Cc: stable@vger.kernel.org Signed-off-by: Michael Walle michael@walle.cc Signed-off-by: Kalle Valo kvalo@kernel.org Link: https://lore.kernel.org/r/20221027171221.491937-1-michael@walle.cc Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/wireless/microchip/wilc1000/sdio.c | 1 + 1 file changed, 1 insertion(+)
--- a/drivers/net/wireless/microchip/wilc1000/sdio.c +++ b/drivers/net/wireless/microchip/wilc1000/sdio.c @@ -20,6 +20,7 @@ static const struct sdio_device_id wilc_ { SDIO_DEVICE(SDIO_VENDOR_ID_MICROCHIP_WILC, SDIO_DEVICE_ID_MICROCHIP_WILC1000) }, { }, }; +MODULE_DEVICE_TABLE(sdio, wilc_sdio_ids);
#define WILC_SDIO_BLOCK_SIZE 512
From: Aidan MacDonald aidanmacdonald.0x0@gmail.com
commit 8b3a9ad86239f80ed569e23c3954a311f66481d6 upstream.
On the JZ4740, there is a single bit that flushes (empties) both the transmit and receive FIFO. Later SoCs have independent flush bits for each FIFO.
Independent FIFOs can be flushed before the snd_soc_dai_active() check because it won't disturb other active streams. This ensures that the FIFO we're about to use is always flushed before starting up. With shared FIFOs we can't do that because if another substream is active, flushing its FIFO would cause underrun errors.
This also fixes a bug: since we were only setting the JZ4740's flush bit, which corresponds to the TX FIFO flush bit on other SoCs, other SoCs were not having their RX FIFO flushed at all.
Fixes: 967beb2e8777 ("ASoC: jz4740: Add jz4780 support") Reviewed-by: Paul Cercueil paul@crapouillou.net Cc: stable@vger.kernel.org Signed-off-by: Aidan MacDonald aidanmacdonald.0x0@gmail.com Link: https://lore.kernel.org/r/20221023143328.160866-2-aidanmacdonald.0x0@gmail.c... Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/soc/jz4740/jz4740-i2s.c | 39 ++++++++++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 5 deletions(-)
--- a/sound/soc/jz4740/jz4740-i2s.c +++ b/sound/soc/jz4740/jz4740-i2s.c @@ -56,7 +56,8 @@ #define JZ_AIC_CTRL_MONO_TO_STEREO BIT(11) #define JZ_AIC_CTRL_SWITCH_ENDIANNESS BIT(10) #define JZ_AIC_CTRL_SIGNED_TO_UNSIGNED BIT(9) -#define JZ_AIC_CTRL_FLUSH BIT(8) +#define JZ_AIC_CTRL_TFLUSH BIT(8) +#define JZ_AIC_CTRL_RFLUSH BIT(7) #define JZ_AIC_CTRL_ENABLE_ROR_INT BIT(6) #define JZ_AIC_CTRL_ENABLE_TUR_INT BIT(5) #define JZ_AIC_CTRL_ENABLE_RFS_INT BIT(4) @@ -91,6 +92,8 @@ enum jz47xx_i2s_version { struct i2s_soc_info { enum jz47xx_i2s_version version; struct snd_soc_dai_driver *dai; + + bool shared_fifo_flush; };
struct jz4740_i2s { @@ -119,19 +122,44 @@ static inline void jz4740_i2s_write(cons writel(value, i2s->base + reg); }
+static inline void jz4740_i2s_set_bits(const struct jz4740_i2s *i2s, + unsigned int reg, uint32_t bits) +{ + uint32_t value = jz4740_i2s_read(i2s, reg); + value |= bits; + jz4740_i2s_write(i2s, reg, value); +} + static int jz4740_i2s_startup(struct snd_pcm_substream *substream, struct snd_soc_dai *dai) { struct jz4740_i2s *i2s = snd_soc_dai_get_drvdata(dai); - uint32_t conf, ctrl; + uint32_t conf; int ret;
+ /* + * When we can flush FIFOs independently, only flush the FIFO + * that is starting up. We can do this when the DAI is active + * because it does not disturb other active substreams. + */ + if (!i2s->soc_info->shared_fifo_flush) { + if (substream->stream == SNDRV_PCM_STREAM_PLAYBACK) + jz4740_i2s_set_bits(i2s, JZ_REG_AIC_CTRL, JZ_AIC_CTRL_TFLUSH); + else + jz4740_i2s_set_bits(i2s, JZ_REG_AIC_CTRL, JZ_AIC_CTRL_RFLUSH); + } + if (snd_soc_dai_active(dai)) return 0;
- ctrl = jz4740_i2s_read(i2s, JZ_REG_AIC_CTRL); - ctrl |= JZ_AIC_CTRL_FLUSH; - jz4740_i2s_write(i2s, JZ_REG_AIC_CTRL, ctrl); + /* + * When there is a shared flush bit for both FIFOs, the TFLUSH + * bit flushes both FIFOs. Flushing while the DAI is active would + * cause FIFO underruns in other active substreams so we have to + * guard this behind the snd_soc_dai_active() check. + */ + if (i2s->soc_info->shared_fifo_flush) + jz4740_i2s_set_bits(i2s, JZ_REG_AIC_CTRL, JZ_AIC_CTRL_TFLUSH);
ret = clk_prepare_enable(i2s->clk_i2s); if (ret) @@ -462,6 +490,7 @@ static struct snd_soc_dai_driver jz4740_ static const struct i2s_soc_info jz4740_i2s_soc_info = { .version = JZ_I2S_JZ4740, .dai = &jz4740_i2s_dai, + .shared_fifo_flush = true, };
static const struct i2s_soc_info jz4760_i2s_soc_info = {
From: Maximilian Luz luzmaximilian@gmail.com
commit dc608edf7d45ba0c2ad14c06eccd66474fec7847 upstream.
Calling v4l2_subdev_get_try_crop() and v4l2_subdev_get_try_compose() with a subdev state of NULL leads to a NULL pointer dereference. This can currently happen in imgu_subdev_set_selection() when the state passed in is NULL, as this method first gets pointers to both the "try" and "active" states and only then decides which to use.
The same issue has been addressed for imgu_subdev_get_selection() with commit 30d03a0de650 ("ipu3-imgu: Fix NULL pointer dereference in active selection access"). However the issue still persists in imgu_subdev_set_selection().
Therefore, apply a similar fix as done in the aforementioned commit to imgu_subdev_set_selection(). To keep things a bit cleaner, introduce helper functions for "crop" and "compose" access and use them in both imgu_subdev_set_selection() and imgu_subdev_get_selection().
Fixes: 0d346d2a6f54 ("media: v4l2-subdev: add subdev-wide state struct") Cc: stable@vger.kernel.org # for v5.14 and later Signed-off-by: Maximilian Luz luzmaximilian@gmail.com Signed-off-by: Sakari Ailus sakari.ailus@linux.intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/staging/media/ipu3/ipu3-v4l2.c | 57 +++++++++++++++----------- 1 file changed, 34 insertions(+), 23 deletions(-)
diff --git a/drivers/staging/media/ipu3/ipu3-v4l2.c b/drivers/staging/media/ipu3/ipu3-v4l2.c index ce13e746c15f..e530767e80a5 100644 --- a/drivers/staging/media/ipu3/ipu3-v4l2.c +++ b/drivers/staging/media/ipu3/ipu3-v4l2.c @@ -188,6 +188,28 @@ static int imgu_subdev_set_fmt(struct v4l2_subdev *sd, return 0; }
+static struct v4l2_rect * +imgu_subdev_get_crop(struct imgu_v4l2_subdev *sd, + struct v4l2_subdev_state *sd_state, unsigned int pad, + enum v4l2_subdev_format_whence which) +{ + if (which == V4L2_SUBDEV_FORMAT_TRY) + return v4l2_subdev_get_try_crop(&sd->subdev, sd_state, pad); + else + return &sd->rect.eff; +} + +static struct v4l2_rect * +imgu_subdev_get_compose(struct imgu_v4l2_subdev *sd, + struct v4l2_subdev_state *sd_state, unsigned int pad, + enum v4l2_subdev_format_whence which) +{ + if (which == V4L2_SUBDEV_FORMAT_TRY) + return v4l2_subdev_get_try_compose(&sd->subdev, sd_state, pad); + else + return &sd->rect.bds; +} + static int imgu_subdev_get_selection(struct v4l2_subdev *sd, struct v4l2_subdev_state *sd_state, struct v4l2_subdev_selection *sel) @@ -200,18 +222,12 @@ static int imgu_subdev_get_selection(struct v4l2_subdev *sd,
switch (sel->target) { case V4L2_SEL_TGT_CROP: - if (sel->which == V4L2_SUBDEV_FORMAT_TRY) - sel->r = *v4l2_subdev_get_try_crop(sd, sd_state, - sel->pad); - else - sel->r = imgu_sd->rect.eff; + sel->r = *imgu_subdev_get_crop(imgu_sd, sd_state, sel->pad, + sel->which); return 0; case V4L2_SEL_TGT_COMPOSE: - if (sel->which == V4L2_SUBDEV_FORMAT_TRY) - sel->r = *v4l2_subdev_get_try_compose(sd, sd_state, - sel->pad); - else - sel->r = imgu_sd->rect.bds; + sel->r = *imgu_subdev_get_compose(imgu_sd, sd_state, sel->pad, + sel->which); return 0; default: return -EINVAL; @@ -223,10 +239,9 @@ static int imgu_subdev_set_selection(struct v4l2_subdev *sd, struct v4l2_subdev_selection *sel) { struct imgu_device *imgu = v4l2_get_subdevdata(sd); - struct imgu_v4l2_subdev *imgu_sd = container_of(sd, - struct imgu_v4l2_subdev, - subdev); - struct v4l2_rect *rect, *try_sel; + struct imgu_v4l2_subdev *imgu_sd = + container_of(sd, struct imgu_v4l2_subdev, subdev); + struct v4l2_rect *rect;
dev_dbg(&imgu->pci_dev->dev, "set subdev %u sel which %u target 0x%4x rect [%ux%u]", @@ -238,22 +253,18 @@ static int imgu_subdev_set_selection(struct v4l2_subdev *sd,
switch (sel->target) { case V4L2_SEL_TGT_CROP: - try_sel = v4l2_subdev_get_try_crop(sd, sd_state, sel->pad); - rect = &imgu_sd->rect.eff; + rect = imgu_subdev_get_crop(imgu_sd, sd_state, sel->pad, + sel->which); break; case V4L2_SEL_TGT_COMPOSE: - try_sel = v4l2_subdev_get_try_compose(sd, sd_state, sel->pad); - rect = &imgu_sd->rect.bds; + rect = imgu_subdev_get_compose(imgu_sd, sd_state, sel->pad, + sel->which); break; default: return -EINVAL; }
- if (sel->which == V4L2_SUBDEV_FORMAT_TRY) - *try_sel = sel->r; - else - *rect = sel->r; - + *rect = sel->r; return 0; }
From: Zhang Yuchen zhangyuchen.lcr@bytedance.com
commit f6f1234d98cce69578bfac79df147a1f6660596c upstream.
When fixing the problem mentioned in PATCH1, we also found the following problem:
If the IPMI is disconnected while a message is being sent, unloading the driver gets stuck for a long time.
The main problem is that unloading the driver waits for curr_msg to be sent or HOSED. After the tasklet is stopped, the only place that can trigger the timeout mechanism is the polling loop in shutdown_smi().
The poll function delays 10us and calls smi_event_handler(smi_info, 10); smi_event_handler() then deducts 10us from kcs->ibf_timeout.
But each poll() call is followed by schedule_timeout_uninterruptible(1), and the time spent there is not counted against kcs->ibf_timeout.
So by the time 10us is deducted from kcs->ibf_timeout, at least one jiffy has actually passed, and the effective wait becomes more than a hundred times longer than intended.
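For example, assuming HZ=250 (a common configuration; the exact value depends on the kernel config), schedule_timeout_uninterruptible(1) sleeps for one jiffy, i.e. roughly 4000us, while only 10us is subtracted from kcs->ibf_timeout per iteration, so the effective timeout is stretched by roughly a factor of 400.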
Now, instead of calling poll(), call smi_event_handler() directly and calculate the elapsed time.
For verification, you can use eBPF to inspect kcs->ibf_timeout on each call to kcs_event() while the IPMI is disconnected: it decrements at the normal rate before unloading, and the decrement becomes very slow once unloading starts.
$ bpftrace -e 'kprobe:kcs_event {printf("kcs->ibftimeout : %d\n", *(arg0+584));}'
Signed-off-by: Zhang Yuchen zhangyuchen.lcr@bytedance.com Message-Id: 20221007092617.87597-3-zhangyuchen.lcr@bytedance.com Signed-off-by: Corey Minyard cminyard@mvista.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/ipmi/ipmi_si_intf.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
--- a/drivers/char/ipmi/ipmi_si_intf.c +++ b/drivers/char/ipmi/ipmi_si_intf.c @@ -2152,6 +2152,20 @@ skip_fallback_noirq: } module_init(init_ipmi_si);
+static void wait_msg_processed(struct smi_info *smi_info) +{ + unsigned long jiffies_now; + long time_diff; + + while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) { + jiffies_now = jiffies; + time_diff = (((long)jiffies_now - (long)smi_info->last_timeout_jiffies) + * SI_USEC_PER_JIFFY); + smi_event_handler(smi_info, time_diff); + schedule_timeout_uninterruptible(1); + } +} + static void shutdown_smi(void *send_info) { struct smi_info *smi_info = send_info; @@ -2186,16 +2200,13 @@ static void shutdown_smi(void *send_info * in the BMC. Note that timers and CPU interrupts are off, * so no need for locks. */ - while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) { - poll(smi_info); - schedule_timeout_uninterruptible(1); - } + wait_msg_processed(smi_info); + if (smi_info->handlers) disable_si_irq(smi_info); - while (smi_info->curr_msg || (smi_info->si_state != SI_NORMAL)) { - poll(smi_info); - schedule_timeout_uninterruptible(1); - } + + wait_msg_processed(smi_info); + if (smi_info->handlers) smi_info->handlers->cleanup(smi_info->si_sm);
From: Alexander Sverdlin alexander.sverdlin@nokia.com
commit 2ebc336be08160debfe27f87660cf550d710f3e9 upstream.
Erase can be zeroed in spi_nor_parse_4bait() or spi_nor_init_non_uniform_erase_map(). In practice it happened with mt25qu256a, which supports 4K, 32K, 64K erases with 3b address commands, but only 4K and 64K erase with 4b address commands.
Fixes: dc92843159a7 ("mtd: spi-nor: fix erase_type array to indicate current map conf") Signed-off-by: Alexander Sverdlin alexander.sverdlin@nokia.com Signed-off-by: Tudor Ambarus tudor.ambarus@microchip.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20211119081412.29732-1-alexander.sverdlin@nokia.co... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mtd/spi-nor/core.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mtd/spi-nor/core.c +++ b/drivers/mtd/spi-nor/core.c @@ -1409,6 +1409,8 @@ spi_nor_find_best_erase_type(const struc continue;
erase = &map->erase_type[i]; + if (!erase->size) + continue;
/* Alignment is not mandatory for overlaid regions */ if (region->offset & SNOR_OVERLAID_REGION &&
From: Huaxin Lu luhuaxin1@huawei.com
commit 11220db412edae8dba58853238f53258268bdb88 upstream.
In restore_template_fmt, when kstrdup fails, a non-NULL value will still be returned, which causes a NULL pointer access in template_desc_init_fields.
Fixes: c7d09367702e ("ima: support restoring multiple template formats") Cc: stable@kernel.org Co-developed-by: Jiaming Li lijiaming30@huawei.com Signed-off-by: Jiaming Li lijiaming30@huawei.com Signed-off-by: Huaxin Lu luhuaxin1@huawei.com Reviewed-by: Stefan Berger stefanb@linux.ibm.com Signed-off-by: Mimi Zohar zohar@linux.ibm.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/integrity/ima/ima_template.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
--- a/security/integrity/ima/ima_template.c +++ b/security/integrity/ima/ima_template.c @@ -336,8 +336,11 @@ static struct ima_template_desc *restore
template_desc->name = ""; template_desc->fmt = kstrdup(template_name, GFP_KERNEL); - if (!template_desc->fmt) + if (!template_desc->fmt) { + kfree(template_desc); + template_desc = NULL; goto out; + }
spin_lock(&template_list); list_add_tail_rcu(&template_desc->list, &defined_templates);
From: Dan Carpenter error27@gmail.com
commit a92ce570c81dc0feaeb12a429b4bc65686d17967 upstream.
The intf_free() function frees the "intf" pointer so we cannot dereference it again on the next line.
Fixes: cbb79863fc31 ("ipmi: Don't allow device module unload when in use") Signed-off-by: Dan Carpenter error27@gmail.com Message-Id: Y3M8xa1drZv4CToE@kili Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Corey Minyard cminyard@mvista.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/ipmi/ipmi_msghandler.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
--- a/drivers/char/ipmi/ipmi_msghandler.c +++ b/drivers/char/ipmi/ipmi_msghandler.c @@ -1273,6 +1273,7 @@ static void _ipmi_destroy_user(struct ip unsigned long flags; struct cmd_rcvr *rcvr; struct cmd_rcvr *rcvrs = NULL; + struct module *owner;
if (!acquire_ipmi_user(user, &i)) { /* @@ -1334,8 +1335,9 @@ static void _ipmi_destroy_user(struct ip kfree(rcvr); }
+ owner = intf->owner; kref_put(&intf->refcount, intf_free); - module_put(intf->owner); + module_put(owner); }
int ipmi_destroy_user(struct ipmi_user *user)
From: Michael S. Tsirkin mst@redhat.com
commit 98b04dd0b4577894520493d96bc4623387767445 upstream.
pci_device_is_present() previously didn't work for VFs because it reads the Vendor and Device ID, which are 0xffff for VFs, making it look like they aren't present. Check the PF instead.
Wei Gong reported that if virtio I/O is in progress when the driver is unbound or "0" is written to /sys/.../sriov_numvfs, the virtio I/O operation hangs, which may result in output like this:
task:bash state:D stack: 0 pid: 1773 ppid: 1241 flags:0x00004002
Call Trace:
 schedule+0x4f/0xc0
 blk_mq_freeze_queue_wait+0x69/0xa0
 blk_mq_freeze_queue+0x1b/0x20
 blk_cleanup_queue+0x3d/0xd0
 virtblk_remove+0x3c/0xb0 [virtio_blk]
 virtio_dev_remove+0x4b/0x80
 ...
 device_unregister+0x1b/0x60
 unregister_virtio_device+0x18/0x30
 virtio_pci_remove+0x41/0x80
 pci_device_remove+0x3e/0xb0
This happened because pci_device_is_present(VF) returned "false" in virtio_pci_remove(), so it called virtio_break_device(). The broken vq meant that vring_interrupt() skipped the vq.callback() that would have completed the virtio I/O operation via virtblk_done().
[bhelgaas: commit log, simplify to always use pci_physfn(), add stable tag] Link: https://lore.kernel.org/r/20221026060912.173250-1-mst@redhat.com Reported-by: Wei Gong gongwei833x@gmail.com Tested-by: Wei Gong gongwei833x@gmail.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pci.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -6383,6 +6383,8 @@ bool pci_device_is_present(struct pci_de { u32 v;
+ /* Check PF if pdev is a VF, since VF Vendor/Device IDs are 0xffff */ + pdev = pci_physfn(pdev); if (pci_dev_is_disconnected(pdev)) return false; return pci_bus_read_dev_vendor_id(pdev->bus, pdev->devfn, &v, 0);
From: Sascha Hauer s.hauer@pengutronix.de
commit aa382ffa705bea9931ec92b6f3c70e1fdb372195 upstream.
When pci_create_attr() fails, pci_remove_resource_files() is called, which iterates over the res_attr[_wc] arrays and frees every non-NULL entry. To avoid a double free here, set the array entry only after it is clear that we successfully initialized it.
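As an illustration only, a minimal user-space model of the publish-only-after-success ordering (all names below are invented for this sketch and are not the driver's API):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static char *published;	/* stand-in for pdev->res_attr[num] */

static int create_attr(int creation_fails)
{
	char *attr = malloc(16);

	if (!attr)
		return -1;
	strcpy(attr, "resource0");

	if (creation_fails) {	/* stand-in for sysfs_create_bin_file() failing */
		free(attr);
		return -1;	/* nothing was published, so nothing can be freed twice */
	}

	published = attr;	/* publish only once setup fully succeeded */
	return 0;
}

int main(void)
{
	if (create_attr(1))
		printf("creation failed, published = %p\n", (void *)published);
	free(published);	/* stand-in for the later cleanup path; safe, still NULL */
	return 0;
}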
Fixes: b562ec8f74e4 ("PCI: Don't leak memory if sysfs_create_bin_file() fails") Link: https://lore.kernel.org/r/20221007070735.GX986@pengutronix.de/ Signed-off-by: Sascha Hauer s.hauer@pengutronix.de Signed-off-by: Bjorn Helgaas bhelgaas@google.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/pci/pci-sysfs.c | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-)
--- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -1179,11 +1179,9 @@ static int pci_create_attr(struct pci_de
sysfs_bin_attr_init(res_attr); if (write_combine) { - pdev->res_attr_wc[num] = res_attr; sprintf(res_attr_name, "resource%d_wc", num); res_attr->mmap = pci_mmap_resource_wc; } else { - pdev->res_attr[num] = res_attr; sprintf(res_attr_name, "resource%d", num); if (pci_resource_flags(pdev, num) & IORESOURCE_IO) { res_attr->read = pci_read_resource_io; @@ -1201,10 +1199,17 @@ static int pci_create_attr(struct pci_de res_attr->size = pci_resource_len(pdev, num); res_attr->private = (void *)(unsigned long)num; retval = sysfs_create_bin_file(&pdev->dev.kobj, res_attr); - if (retval) + if (retval) { kfree(res_attr); + return retval; + } + + if (write_combine) + pdev->res_attr_wc[num] = res_attr; + else + pdev->res_attr[num] = res_attr;
- return retval; + return 0; }
/**
From: Guo Ren guoren@linux.alibaba.com
commit 5c3022e4a616d800cf5f4c3a981d7992179e44a1 upstream.
The 'retp' is a pointer to the return address on the stack, so we must pass the current return address pointer as the 'retp' argument to ftrace_push_return_trace(), not the parent function's return address on the stack.
Fixes: b785ec129bd9 ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support") Signed-off-by: Guo Ren guoren@linux.alibaba.com Signed-off-by: Guo Ren guoren@kernel.org Link: https://lore.kernel.org/r/20221109064937.3643993-2-guoren@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/stacktrace.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/kernel/stacktrace.c +++ b/arch/riscv/kernel/stacktrace.c @@ -60,7 +60,7 @@ void notrace walk_stackframe(struct task } else { fp = frame->fp; pc = ftrace_graph_ret_addr(current, NULL, frame->ra, - (unsigned long *)(fp - 8)); + &frame->ra); }
}
From: Sergey Matyukevich sergey.matyukevich@syntacore.com
commit 4bd1d80efb5af640f99157f39b50fb11326ce641 upstream.
The current implementation of the update_mmu_cache() function performs a local TLB flush. It does not take ASID information into account. Besides, it does not take into account other harts currently running the same mm context, or possible migration of the running context to other harts. Meanwhile, a TLB flush is not performed on every context switch when ASID support is enabled.
Patch [1] proposed to add ASID support to update_mmu_cache to avoid flushing local TLB entirely. This patch takes into account other harts currently running the same mm context as well as possible migration of this context to other harts.
For this purpose the approach from flush_icache_mm is reused. Remote harts currently running the same mm context are informed via SBI calls that they need to flush their local TLBs. All the other harts are marked as needing a deferred TLB flush when this mm context runs on them.
[1] https://lore.kernel.org/linux-riscv/20220821013926.8968-1-tjytimi@163.com/
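For readers unfamiliar with the bookkeeping, here is a rough user-space model of the deferred-flush idea (plain bitmasks stand in for cpumask_t and all names are invented for this sketch; it is not the kernel implementation):

#include <stdio.h>

#define NR_CPUS 4

/* One bit per hart that still has stale TLB entries for this mm. */
static unsigned int tlb_stale_mask;

/* Harts currently running the mm (stand-in for mm_cpumask()). */
static unsigned int mm_mask = 0x3;	/* harts 0 and 1 */

static void remote_flush(unsigned int this_cpu)
{
	/* Mark every other hart not running the mm as needing a deferred flush. */
	tlb_stale_mask = ~0u & ((1u << NR_CPUS) - 1);
	tlb_stale_mask &= ~(1u << this_cpu);
	tlb_stale_mask &= ~mm_mask;
	/* Harts in mm_mask would be flushed immediately via SBI calls here. */
}

static void switch_to_mm(unsigned int cpu)
{
	if (tlb_stale_mask & (1u << cpu)) {
		tlb_stale_mask &= ~(1u << cpu);
		printf("hart %u: deferred local ASID flush\n", cpu);
	}
}

int main(void)
{
	remote_flush(0);
	switch_to_mm(2);	/* triggers the deferred flush */
	switch_to_mm(2);	/* already clean, nothing to do */
	return 0;
}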
Signed-off-by: Sergey Matyukevich sergey.matyukevich@syntacore.com Fixes: 65d4b9c53017 ("RISC-V: Implement ASID allocator") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/linux-riscv/20220829205219.283543-1-geomatsi@gmail.c... Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/mmu.h | 2 ++ arch/riscv/include/asm/pgtable.h | 2 +- arch/riscv/include/asm/tlbflush.h | 18 ++++++++++++++++++ arch/riscv/mm/context.c | 10 ++++++++++ arch/riscv/mm/tlbflush.c | 28 +++++++++++----------------- 5 files changed, 42 insertions(+), 18 deletions(-)
--- a/arch/riscv/include/asm/mmu.h +++ b/arch/riscv/include/asm/mmu.h @@ -19,6 +19,8 @@ typedef struct { #ifdef CONFIG_SMP /* A local icache flush is needed before user execution can resume. */ cpumask_t icache_stale_mask; + /* A local tlb flush is needed before user execution can resume. */ + cpumask_t tlb_stale_mask; #endif } mm_context_t;
--- a/arch/riscv/include/asm/pgtable.h +++ b/arch/riscv/include/asm/pgtable.h @@ -386,7 +386,7 @@ static inline void update_mmu_cache(stru * Relying on flush_tlb_fix_spurious_fault would suffice, but * the extra traps reduce performance. So, eagerly SFENCE.VMA. */ - local_flush_tlb_page(address); + flush_tlb_page(vma, address); }
static inline void update_mmu_cache_pmd(struct vm_area_struct *vma, --- a/arch/riscv/include/asm/tlbflush.h +++ b/arch/riscv/include/asm/tlbflush.h @@ -22,6 +22,24 @@ static inline void local_flush_tlb_page( { ALT_FLUSH_TLB_PAGE(__asm__ __volatile__ ("sfence.vma %0" : : "r" (addr) : "memory")); } + +static inline void local_flush_tlb_all_asid(unsigned long asid) +{ + __asm__ __volatile__ ("sfence.vma x0, %0" + : + : "r" (asid) + : "memory"); +} + +static inline void local_flush_tlb_page_asid(unsigned long addr, + unsigned long asid) +{ + __asm__ __volatile__ ("sfence.vma %0, %1" + : + : "r" (addr), "r" (asid) + : "memory"); +} + #else /* CONFIG_MMU */ #define local_flush_tlb_all() do { } while (0) #define local_flush_tlb_page(addr) do { } while (0) --- a/arch/riscv/mm/context.c +++ b/arch/riscv/mm/context.c @@ -196,6 +196,16 @@ switch_mm_fast:
if (need_flush_tlb) local_flush_tlb_all(); +#ifdef CONFIG_SMP + else { + cpumask_t *mask = &mm->context.tlb_stale_mask; + + if (cpumask_test_cpu(cpu, mask)) { + cpumask_clear_cpu(cpu, mask); + local_flush_tlb_all_asid(cntx & asid_mask); + } + } +#endif }
static void set_mm_noasid(struct mm_struct *mm) --- a/arch/riscv/mm/tlbflush.c +++ b/arch/riscv/mm/tlbflush.c @@ -5,23 +5,7 @@ #include <linux/sched.h> #include <asm/sbi.h> #include <asm/mmu_context.h> - -static inline void local_flush_tlb_all_asid(unsigned long asid) -{ - __asm__ __volatile__ ("sfence.vma x0, %0" - : - : "r" (asid) - : "memory"); -} - -static inline void local_flush_tlb_page_asid(unsigned long addr, - unsigned long asid) -{ - __asm__ __volatile__ ("sfence.vma %0, %1" - : - : "r" (addr), "r" (asid) - : "memory"); -} +#include <asm/tlbflush.h>
void flush_tlb_all(void) { @@ -31,6 +15,7 @@ void flush_tlb_all(void) static void __sbi_tlb_flush_range(struct mm_struct *mm, unsigned long start, unsigned long size, unsigned long stride) { + struct cpumask *pmask = &mm->context.tlb_stale_mask; struct cpumask *cmask = mm_cpumask(mm); struct cpumask hmask; unsigned int cpuid; @@ -45,6 +30,15 @@ static void __sbi_tlb_flush_range(struct if (static_branch_unlikely(&use_asid_allocator)) { unsigned long asid = atomic_long_read(&mm->context.id);
+ /* + * TLB will be immediately flushed on harts concurrently + * executing this MM context. TLB flush on other harts + * is deferred until this MM context migrates there. + */ + cpumask_setall(pmask); + cpumask_clear_cpu(cpuid, pmask); + cpumask_andnot(pmask, pmask, cmask); + if (broadcast) { riscv_cpuid_to_hartid_mask(cmask, &hmask); sbi_remote_sfence_vma_asid(cpumask_bits(&hmask),
From: Corentin Labbe clabbe@baylibre.com
commit 76a4e874593543a2dff91d249c95bac728df2774 upstream.
Add the missing statesize to the hash templates. This is mandatory; otherwise no algorithms can be registered, as the core requires statesize to be set.
CC: stable@kernel.org # 4.3+ Reported-by: Rolf Eike Beer eike-kernel@sf-tec.de Tested-by: Rolf Eike Beer eike-kernel@sf-tec.de Fixes: 0a625fd2abaa ("crypto: n2 - Add Niagara2 crypto driver") Signed-off-by: Corentin Labbe clabbe@baylibre.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/n2_core.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/drivers/crypto/n2_core.c +++ b/drivers/crypto/n2_core.c @@ -1229,6 +1229,7 @@ struct n2_hash_tmpl { const u8 *hash_init; u8 hw_op_hashsz; u8 digest_size; + u8 statesize; u8 block_size; u8 auth_type; u8 hmac_type; @@ -1260,6 +1261,7 @@ static const struct n2_hash_tmpl hash_tm .hmac_type = AUTH_TYPE_HMAC_MD5, .hw_op_hashsz = MD5_DIGEST_SIZE, .digest_size = MD5_DIGEST_SIZE, + .statesize = sizeof(struct md5_state), .block_size = MD5_HMAC_BLOCK_SIZE }, { .name = "sha1", .hash_zero = sha1_zero_message_hash, @@ -1268,6 +1270,7 @@ static const struct n2_hash_tmpl hash_tm .hmac_type = AUTH_TYPE_HMAC_SHA1, .hw_op_hashsz = SHA1_DIGEST_SIZE, .digest_size = SHA1_DIGEST_SIZE, + .statesize = sizeof(struct sha1_state), .block_size = SHA1_BLOCK_SIZE }, { .name = "sha256", .hash_zero = sha256_zero_message_hash, @@ -1276,6 +1279,7 @@ static const struct n2_hash_tmpl hash_tm .hmac_type = AUTH_TYPE_HMAC_SHA256, .hw_op_hashsz = SHA256_DIGEST_SIZE, .digest_size = SHA256_DIGEST_SIZE, + .statesize = sizeof(struct sha256_state), .block_size = SHA256_BLOCK_SIZE }, { .name = "sha224", .hash_zero = sha224_zero_message_hash, @@ -1284,6 +1288,7 @@ static const struct n2_hash_tmpl hash_tm .hmac_type = AUTH_TYPE_RESERVED, .hw_op_hashsz = SHA256_DIGEST_SIZE, .digest_size = SHA224_DIGEST_SIZE, + .statesize = sizeof(struct sha256_state), .block_size = SHA224_BLOCK_SIZE }, }; #define NUM_HASH_TMPLS ARRAY_SIZE(hash_tmpls) @@ -1424,6 +1429,7 @@ static int __n2_register_one_ahash(const
halg = &ahash->halg; halg->digestsize = tmpl->digest_size; + halg->statesize = tmpl->statesize;
base = &halg->base; snprintf(base->cra_name, CRYPTO_MAX_ALG_NAME, "%s", tmpl->name);
From: Mario Limonciello mario.limonciello@amd.com
commit 10da230a4df1dfe32a58eb09246f5ffe82346f27 upstream.
SoCs containing 0x14CA are present both in datacenter parts that support SEV and in client parts that support TEE.
Cc: stable@vger.kernel.org # 5.15+ Tested-by: Rijo-john Thomas Rijo-john.Thomas@amd.com Signed-off-by: Mario Limonciello mario.limonciello@amd.com Acked-by: Tom Lendacky thomas.lendacky@amd.com Signed-off-by: Herbert Xu herbert@gondor.apana.org.au Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/crypto/ccp/sp-pci.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-)
--- a/drivers/crypto/ccp/sp-pci.c +++ b/drivers/crypto/ccp/sp-pci.c @@ -320,6 +320,15 @@ static const struct psp_vdata pspv3 = { .inten_reg = 0x10690, .intsts_reg = 0x10694, }; + +static const struct psp_vdata pspv4 = { + .sev = &sevv2, + .tee = &teev1, + .feature_reg = 0x109fc, + .inten_reg = 0x10690, + .intsts_reg = 0x10694, +}; + #endif
static const struct sp_dev_vdata dev_vdata[] = { @@ -365,7 +374,7 @@ static const struct sp_dev_vdata dev_vda { /* 5 */ .bar = 2, #ifdef CONFIG_CRYPTO_DEV_SP_PSP - .psp_vdata = &pspv2, + .psp_vdata = &pspv4, #endif }, };
From: Isaac J. Manjarres isaacmanjarres@google.com
commit 27c0d217340e47ec995557f61423ef415afba987 upstream.
When a driver registers with a bus, it will attempt to match with every device on the bus through the __driver_attach() function. Currently, if the bus_type.match() function encounters an error that is not -EPROBE_DEFER, __driver_attach() will return a negative error code, which causes the driver registration logic to stop trying to match with the remaining devices on the bus.
This behavior is not correct; a failure while matching a driver to a device does not mean that the driver won't be able to match and bind with other devices on the bus. Update the logic in __driver_attach() to reflect this.
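A simplified, self-contained model of why one failing match used to abort the whole scan (the iteration helper below is a stand-in for bus_for_each_dev(), not the real driver core code):

#include <stdio.h>

struct device { const char *name; };

/* Stand-in for bus_for_each_dev(): stops as soon as fn() returns non-zero. */
static int for_each_dev(struct device *devs, int n, int (*fn)(struct device *))
{
	int i, ret;

	for (i = 0; i < n; i++) {
		ret = fn(&devs[i]);
		if (ret)
			return ret;
	}
	return 0;
}

/* Stand-in for __driver_attach(): returning a negative error here used to
 * stop the loop above, so later devices were never considered. Returning 0
 * (as the fix does) lets the scan continue. */
static int attach(struct device *dev)
{
	printf("matching %s\n", dev->name);
	return 0;	/* was: a negative error code on a match failure */
}

int main(void)
{
	struct device devs[] = { { "dev0" }, { "dev1" }, { "dev2" } };

	for_each_dev(devs, 3, attach);
	return 0;
}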
Fixes: 656b8035b0ee ("ARM: 8524/1: driver cohandle -EPROBE_DEFER from bus_type.match()") Cc: stable@vger.kernel.org Cc: Saravana Kannan saravanak@google.com Signed-off-by: Isaac J. Manjarres isaacmanjarres@google.com Link: https://lore.kernel.org/r/20220921001414.4046492-1-isaacmanjarres@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/base/dd.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/base/dd.c +++ b/drivers/base/dd.c @@ -1127,7 +1127,11 @@ static int __driver_attach(struct device return 0; } else if (ret < 0) { dev_dbg(dev, "Bus failed to match device: %d\n", ret); - return ret; + /* + * Driver could not match with device, but may match with + * another device on the bus. + */ + return 0; } /* ret > 0 means positive match */
if (driver_allows_async_probing(drv)) {
From: Johan Hovold johan+linaro@kernel.org
commit 910dd4883d757af5faac92590f33f0f7da963032 upstream.
The SC8180X has two resets but the DP configuration erroneously described only one.
If the DP part of the PHY is initialised before the USB part (e.g. depending on probe order), only the first reset would be asserted.
Fixes: 1633802cd4ac ("phy: qcom: qmp: Add SC8180x USB/DP combo") Cc: stable@vger.kernel.org # 5.15 Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20221114081346.5116-4-johan+linaro@kernel.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/phy/qualcomm/phy-qcom-qmp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/phy/qualcomm/phy-qcom-qmp.c +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c @@ -3417,8 +3417,8 @@ static const struct qmp_phy_cfg sc7180_d
.clk_list = qmp_v3_phy_clk_l, .num_clks = ARRAY_SIZE(qmp_v3_phy_clk_l), - .reset_list = sc7180_usb3phy_reset_l, - .num_resets = ARRAY_SIZE(sc7180_usb3phy_reset_l), + .reset_list = msm8996_usb3phy_reset_l, + .num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l), .vreg_list = qmp_phy_vreg_l, .num_vregs = ARRAY_SIZE(qmp_phy_vreg_l), .regs = qmp_v3_usb3phy_regs_layout,
From: Kim Phillips kim.phillips@amd.com
commit 5f18e9f8868c6d4eae71678e7ebd4977b7d8c8cf upstream.
The second (UID) strcmp in acpi_dev_hid_uid_match considers "0" and "00" different, which can prevent device registration.
Have the AMD IOMMU driver's ivrs_acpihid parsing code remove any leading zeroes to make the UID strcmp succeed. Now users can safely specify "AMDxxxxx:00" or "AMDxxxxx:0" and expect the same behaviour.
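To illustrate the normalization in isolation, a small user-space sketch (the helper name and test strings are made up for this example):

#include <stdio.h>

/* Strip leading zeroes but keep a final "0" so "00" compares equal to "0". */
static const char *skip_leading_zeroes(const char *uid)
{
	while (*uid == '0' && *(uid + 1))
		uid++;
	return uid;
}

int main(void)
{
	printf("%s\n", skip_leading_zeroes("00"));	/* prints "0" */
	printf("%s\n", skip_leading_zeroes("007"));	/* prints "7" */
	printf("%s\n", skip_leading_zeroes("0"));	/* prints "0" */
	return 0;
}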
Fixes: ca3bf5d47cec ("iommu/amd: Introduces ivrs_acpihid kernel parameter") Signed-off-by: Kim Phillips kim.phillips@amd.com Cc: stable@vger.kernel.org Cc: Suravee Suthikulpanit Suravee.Suthikulpanit@amd.com Cc: Joerg Roedel jroedel@suse.de Link: https://lore.kernel.org/r/20220919155638.391481-1-kim.phillips@amd.com Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/iommu/amd/init.c | 7 +++++++ 1 file changed, 7 insertions(+)
--- a/drivers/iommu/amd/init.c +++ b/drivers/iommu/amd/init.c @@ -3226,6 +3226,13 @@ static int __init parse_ivrs_acpihid(cha return 1; }
+ /* + * Ignore leading zeroes after ':', so e.g., AMDI0095:00 + * will match AMDI0095:0 in the second strcmp in acpi_dev_hid_uid_match + */ + while (*uid == '0' && *(uid + 1)) + uid++; + i = early_acpihid_map_size++; memcpy(early_acpihid_map[i].hid, hid, strlen(hid)); memcpy(early_acpihid_map[i].uid, uid, strlen(uid));
From: Maria Yu quic_aiquny@quicinc.com
commit 11c7f9e3131ad14b27a957496088fa488b153a48 upstream.
Make sure that pm_relax() happens even when the remoteproc is stopped before the crash handler work is scheduled.
Signed-off-by: Maria Yu quic_aiquny@quicinc.com Cc: stable stable@vger.kernel.org Fixes: a781e5aa5911 ("remoteproc: core: Prevent system suspend during remoteproc recovery") Link: https://lore.kernel.org/r/20221206015957.2616-2-quic_aiquny@quicinc.com Signed-off-by: Mathieu Poirier mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/remoteproc/remoteproc_core.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/drivers/remoteproc/remoteproc_core.c +++ b/drivers/remoteproc/remoteproc_core.c @@ -1955,12 +1955,18 @@ static void rproc_crash_handler_work(str
mutex_lock(&rproc->lock);
- if (rproc->state == RPROC_CRASHED || rproc->state == RPROC_OFFLINE) { + if (rproc->state == RPROC_CRASHED) { /* handle only the first crash detected */ mutex_unlock(&rproc->lock); return; }
+ if (rproc->state == RPROC_OFFLINE) { + /* Don't recover if the remote processor was stopped */ + mutex_unlock(&rproc->lock); + goto out; + } + rproc->state = RPROC_CRASHED; dev_err(dev, "handling crash #%u in %s\n", ++rproc->crash_cnt, rproc->name); @@ -1970,6 +1976,7 @@ static void rproc_crash_handler_work(str if (!rproc->recovery_disabled) rproc_trigger_recovery(rproc);
+out: pm_relax(rproc->dev.parent); }
From: Shang XiaoJing shangxiaojing@huawei.com
commit 41f563ab3c33698bdfc3403c7c2e6c94e73681e4 upstream.
start_task() calls create_singlethread_workqueue() without checking the return value, which may be NULL, and a null-ptr-deref may happen:
start_task()
  create_singlethread_workqueue()  # failed, led_wq is NULL
  queue_delayed_work()
    queue_delayed_work_on()
      __queue_delayed_work()       # warning here, but continue
        __queue_work()             # access wq->flags, null-ptr-deref
Check the ret value and return -ENOMEM if it is NULL.
Fixes: 3499495205a6 ("[PARISC] Use work queue in LED/LCD driver instead of tasklet.") Signed-off-by: Shang XiaoJing shangxiaojing@huawei.com Signed-off-by: Helge Deller deller@gmx.de Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/parisc/led.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/parisc/led.c +++ b/drivers/parisc/led.c @@ -137,6 +137,9 @@ static int start_task(void)
/* Create the work queue and queue the LED task */ led_wq = create_singlethread_workqueue("led_wq"); + if (!led_wq) + return -ENOMEM; + queue_delayed_work(led_wq, &led_task, 0);
return 0;
From: Wang Weiyang wangweiyang2@huawei.com
commit e68bfbd3b3c3a0ec3cf8c230996ad8cabe90322f upstream.
When adding the 'a *:* rwm' entry to devcgroup A's whitelist, A's exceptions are first cleaned and A's behavior is changed to DEVCG_DEFAULT_ALLOW. Then the parent's exceptions are copied to A's whitelist. If the copy fails, the code just returns, leaving A granting permissions to all devices, so A may end up granting more permissions than its parent.
Back up A's whitelist and restore the original exceptions if the copy fails.
Cc: stable@vger.kernel.org Fixes: 4cef7299b478 ("device_cgroup: add proper checking when changing default behavior") Signed-off-by: Wang Weiyang wangweiyang2@huawei.com Reviewed-by: Aristeu Rozanski aris@redhat.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- security/device_cgroup.c | 33 +++++++++++++++++++++++++++++---- 1 file changed, 29 insertions(+), 4 deletions(-)
--- a/security/device_cgroup.c +++ b/security/device_cgroup.c @@ -81,6 +81,17 @@ free_and_exit: return -ENOMEM; }
+static void dev_exceptions_move(struct list_head *dest, struct list_head *orig) +{ + struct dev_exception_item *ex, *tmp; + + lockdep_assert_held(&devcgroup_mutex); + + list_for_each_entry_safe(ex, tmp, orig, list) { + list_move_tail(&ex->list, dest); + } +} + /* * called under devcgroup_mutex */ @@ -603,11 +614,13 @@ static int devcgroup_update_access(struc int count, rc = 0; struct dev_exception_item ex; struct dev_cgroup *parent = css_to_devcgroup(devcgroup->css.parent); + struct dev_cgroup tmp_devcgrp;
if (!capable(CAP_SYS_ADMIN)) return -EPERM;
memset(&ex, 0, sizeof(ex)); + memset(&tmp_devcgrp, 0, sizeof(tmp_devcgrp)); b = buffer;
switch (*b) { @@ -619,15 +632,27 @@ static int devcgroup_update_access(struc
if (!may_allow_all(parent)) return -EPERM; - dev_exception_clean(devcgroup); - devcgroup->behavior = DEVCG_DEFAULT_ALLOW; - if (!parent) + if (!parent) { + devcgroup->behavior = DEVCG_DEFAULT_ALLOW; + dev_exception_clean(devcgroup); break; + }
+ INIT_LIST_HEAD(&tmp_devcgrp.exceptions); + rc = dev_exceptions_copy(&tmp_devcgrp.exceptions, + &devcgroup->exceptions); + if (rc) + return rc; + dev_exception_clean(devcgroup); rc = dev_exceptions_copy(&devcgroup->exceptions, &parent->exceptions); - if (rc) + if (rc) { + dev_exceptions_move(&devcgroup->exceptions, + &tmp_devcgrp.exceptions); return rc; + } + devcgroup->behavior = DEVCG_DEFAULT_ALLOW; + dev_exception_clean(&tmp_devcgrp); break; case DEVCG_DENY: if (css_has_online_children(&devcgroup->css))
From: Simon Ser contact@emersion.fr
commit 6fdc2d490ea1369d17afd7e6eb66fecc5b7209bc upstream.
A typical DP-MST unplug removes a KMS connector. However care must be taken to properly synchronize with user-space. The expected sequence of events is the following:
1. The kernel notices that the DP-MST port is gone.
2. The kernel marks the connector as disconnected, then sends a uevent to make user-space re-scan the connector list.
3. User-space notices the connector goes from connected to disconnected, disables it.
4. Kernel handles the IOCTL disabling the connector. On success, the very last reference to the struct drm_connector is dropped and drm_connector_cleanup() is called.
5. The connector is removed from the list, and a uevent is sent to tell user-space that the connector disappeared.
The very last step was missing. As a result, user-space thought the connector still existed and could try to disable it again. Since the kernel no longer knows about the connector, that would end up with EINVAL and confused user-space.
Fix this by sending a hotplug uevent from drm_connector_cleanup().
Signed-off-by: Simon Ser contact@emersion.fr Cc: stable@vger.kernel.org Cc: Daniel Vetter daniel.vetter@ffwll.ch Cc: Lyude Paul lyude@redhat.com Cc: Jonas Ådahl jadahl@redhat.com Tested-by: Jonas Ådahl jadahl@redhat.com Reviewed-by: Lyude Paul lyude@redhat.com Link: https://patchwork.freedesktop.org/patch/msgid/20221017153150.60675-2-contact... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/drm_connector.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/drivers/gpu/drm/drm_connector.c +++ b/drivers/gpu/drm/drm_connector.c @@ -487,6 +487,9 @@ void drm_connector_cleanup(struct drm_co mutex_destroy(&connector->mutex);
memset(connector, 0, sizeof(*connector)); + + if (dev->registered) + drm_sysfs_hotplug_event(dev); } EXPORT_SYMBOL(drm_connector_cleanup);
From: Zack Rusin zackr@vmware.com
commit 4cf949c7fafe21e085a4ee386bb2dade9067316e upstream.
Invalid userspace dma surface copies could potentially overflow the memcpy from the surface to the snooped image leading to crashes. To fix it the dimensions of the copybox have to be validated against the expected size of the snooped cursor.
Signed-off-by: Zack Rusin zackr@vmware.com Fixes: 2ac863719e51 ("vmwgfx: Snoop DMA transfers with non-covering sizes") Cc: stable@vger.kernel.org # v3.2+ Reviewed-by: Michael Banack banackm@vmware.com Reviewed-by: Martin Krastev krastevm@vmware.com Link: https://patchwork.freedesktop.org/patch/msgid/20221026031936.1004280-1-zack@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c @@ -186,7 +186,8 @@ void vmw_kms_cursor_snoop(struct vmw_sur if (cmd->dma.guest.ptr.offset % PAGE_SIZE || box->x != 0 || box->y != 0 || box->z != 0 || box->srcx != 0 || box->srcy != 0 || box->srcz != 0 || - box->d != 1 || box_count != 1) { + box->d != 1 || box_count != 1 || + box->w > 64 || box->h > 64) { /* TODO handle none page aligned offsets */ /* TODO handle more dst & src != 0 */ /* TODO handle more then one copy */
From: Mikko Kovanen mikko.kovanen@aavamobile.com
commit f9cdf4130671d767071607d0a7568c9bd36a68d0 upstream.
intel_dsi->ports contains a bitmask of the enabled ports, so the logic that selects the port for VBT packet sending must test the port-specific bit when deciding on the appropriate port.
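The difference is easy to see in a standalone sketch (the enum values below are simplified assumptions for illustration, not copied from i915):

#include <stdio.h>

enum port { PORT_A = 0, PORT_B, PORT_C };
#define BIT(n) (1U << (n))

int main(void)
{
	unsigned int ports = BIT(PORT_C);	/* only port C enabled -> 0b100 */

	/* Buggy test: PORT_B is the enum value 1, not a bit mask. */
	printf("ports & PORT_B      = %u\n", ports & PORT_B);      /* 0 */
	printf("ports & PORT_C      = %u\n", ports & PORT_C);      /* 0: port C wrongly skipped */

	/* Correct test against the port-specific bit. */
	printf("ports & BIT(PORT_C) = %u\n", ports & BIT(PORT_C)); /* 4: matches */
	return 0;
}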
Fixes: 08c59dde71b7 ("drm/i915/dsi: fix VBT send packet port selection for ICL+") Cc: stable@vger.kernel.org Signed-off-by: Mikko Kovanen mikko.kovanen@aavamobile.com Reviewed-by: Jani Nikula jani.nikula@intel.com Signed-off-by: Jani Nikula jani.nikula@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/DBBPR09MB466592B16885D99ABBF23... (cherry picked from commit 8d58bb7991c45f6b60710cc04c9498c6ea96db90) Signed-off-by: Rodrigo Vivi rodrigo.vivi@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/display/intel_dsi_vbt.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c +++ b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c @@ -133,9 +133,9 @@ static enum port intel_dsi_seq_port_to_p return ffs(intel_dsi->ports) - 1;
if (seq_port) { - if (intel_dsi->ports & PORT_B) + if (intel_dsi->ports & BIT(PORT_B)) return PORT_B; - else if (intel_dsi->ports & PORT_C) + else if (intel_dsi->ports & BIT(PORT_C)) return PORT_C; }
From: Yuan Can yuancan@huawei.com
commit 47078311b8efebdefd5b3b2f87e2b02b14f49c66 upstream.
A problem where modprobe of ingenic-drm fails is triggered, with the following log:
[ 303.561088] Error: Driver 'ingenic-ipu' is already registered, aborting...
modprobe: ERROR: could not insert 'ingenic_drm': Device or resource busy
The reason is that ingenic_drm_init() returns the result of platform_driver_register() directly without checking it; if platform_driver_register() fails, it returns without unregistering ingenic_ipu_driver_ptr, so ingenic-drm can never be loaded later. A simple call graph is shown below:
ingenic_drm_init()
  platform_driver_register()  # ingenic_ipu_driver_ptr are registered
  platform_driver_register()
    driver_register()
      bus_add_driver()
        priv = kzalloc(...)   # OOM happened
  # return without unregister ingenic_ipu_driver_ptr
Fix this problem by checking the return value of platform_driver_register() and doing platform_unregister_drivers() if an error happened.
Fixes: fc1acf317b01 ("drm/ingenic: Add support for the IPU") Signed-off-by: Yuan Can yuancan@huawei.com Cc: stable@vger.kernel.org Signed-off-by: Paul Cercueil paul@crapouillou.net Link: https://patchwork.freedesktop.org/patch/msgid/20221104064512.8569-1-yuancan@... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/ingenic/ingenic-drm-drv.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/ingenic/ingenic-drm-drv.c +++ b/drivers/gpu/drm/ingenic/ingenic-drm-drv.c @@ -1326,7 +1326,11 @@ static int ingenic_drm_init(void) return err; }
- return platform_driver_register(&ingenic_drm_driver); + err = platform_driver_register(&ingenic_drm_driver); + if (IS_ENABLED(CONFIG_DRM_INGENIC_IPU) && err) + platform_driver_unregister(ingenic_ipu_driver_ptr); + + return err; } module_init(ingenic_drm_init);
From: Zhang Yi yi.zhang@huawei.com
commit bc12ac98ea2e1b70adc6478c8b473a0003b659d3 upstream.
When evicting an inode with the default dioread_nolock, it can be raced by the unwritten-extent-converting kworker after writeback of some newly allocated dirty blocks. The kworker converts unwritten extents to written; the extents may be merged into the upper level and extent blocks freed, so it can mark the inode dirty again even though this inode has already been marked I_FREEING. The inode->i_io_list check and warning in ext4_evict_inode() miss this corner case. Fortunately, ext4_evict_inode() waits for all extent conversion to finish before this check, so it does not lead to an inode use-after-free problem; everything is fine apart from the warning. The WARN_ON_ONCE was originally designed to catch inode use-after-free issues in advance, but if we add the current dioread_nolock case it becomes much less useful, so fix this warning by simply removing the check.
======
WARNING: CPU: 7 PID: 1092 at fs/ext4/inode.c:227 ext4_evict_inode+0x875/0xc60
...
RIP: 0010:ext4_evict_inode+0x875/0xc60
...
Call Trace:
 <TASK>
 evict+0x11c/0x2b0
 iput+0x236/0x3a0
 do_unlinkat+0x1b4/0x490
 __x64_sys_unlinkat+0x4c/0xb0
 do_syscall_64+0x3b/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fa933c1115b
======
rm                                   kworker
                                     ext4_end_io_end()
vfs_unlink()
 ext4_unlink()
                                      ext4_convert_unwritten_io_end_vec()
                                       ext4_convert_unwritten_extents()
                                        ext4_map_blocks()
                                         ext4_ext_map_blocks()
                                          ext4_ext_try_to_merge_up()
                                           __mark_inode_dirty()
                                            check !I_FREEING
                                            locked_inode_to_wb_and_lock_list()
 iput()
  iput_final()
   evict()
    ext4_evict_inode()
     truncate_inode_pages_final()
      //wait release io_end
                                           inode_io_list_move_locked()
                                            ext4_release_io_end()
    trigger WARN_ON_ONCE()
Cc: stable@kernel.org Fixes: ceff86fddae8 ("ext4: Avoid freeing inodes on dirty list") Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220629112647.4141034-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -225,13 +225,13 @@ void ext4_evict_inode(struct inode *inod
/* * For inodes with journalled data, transaction commit could have - * dirtied the inode. Flush worker is ignoring it because of I_FREEING - * flag but we still need to remove the inode from the writeback lists. + * dirtied the inode. And for inodes with dioread_nolock, unwritten + * extents converting worker could merge extents and also have dirtied + * the inode. Flush worker is ignoring it because of I_FREEING flag but + * we still need to remove the inode from the writeback lists. */ - if (!list_empty_careful(&inode->i_io_list)) { - WARN_ON_ONCE(!ext4_should_journal_data(inode)); + if (!list_empty_careful(&inode->i_io_list)) inode_io_list_del(inode); - }
/* * Protect us against freezing - iput() caller didn't have to have any
From: Baokun Li libaokun1@huawei.com
commit eee22187b53611e173161e38f61de1c7ecbeb876 upstream.
In do_writepages, if the value returned by ext4_writepages is "-ENOMEM" and "wbc->sync_mode == WB_SYNC_ALL", retry until the condition is not met.
In __ext4_get_inode_loc, if the bh returned by sb_getblk is NULL, the function returns -ENOMEM.
In __getblk_slow, if the return value of grow_buffers is less than 0, the function returns NULL.
When the three processes are connected in series like the following stack, an infinite loop may occur:
do_writepages	<--- keep retrying
 ext4_writepages
  mpage_map_and_submit_extent
   mpage_map_one_extent
    ext4_map_blocks
     ext4_ext_map_blocks
      ext4_ext_handle_unwritten_extents
       ext4_ext_convert_to_initialized
        ext4_split_extent
         ext4_split_extent_at
          __ext4_ext_dirty
           __ext4_mark_inode_dirty
            ext4_reserve_inode_write
             ext4_get_inode_loc
              __ext4_get_inode_loc	<--- return -ENOMEM
               sb_getblk
                __getblk_gfp
                 __getblk_slow	<--- return NULL
                  grow_buffers
                   grow_dev_page	<--- return -ENXIO
                    ret = (block < end_block) ? 1 : -ENXIO;
In this issue, bg_inode_table_hi is overwritten with an incorrect value. As a result, `block < end_block` cannot be met in grow_dev_page, so __ext4_get_inode_loc always returns -ENOMEM and do_writepages keeps retrying, leaving the writeback process in the D state due to the infinite loop.
Add a check on inode table block in the __ext4_get_inode_loc function by referring to ext4_read_inode_bitmap to avoid this infinite loop.
Cc: stable@kernel.org Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Link: https://lore.kernel.org/r/20220817132701.3015912-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4299,9 +4299,17 @@ static int __ext4_get_inode_loc(struct s inodes_per_block = EXT4_SB(sb)->s_inodes_per_block; inode_offset = ((ino - 1) % EXT4_INODES_PER_GROUP(sb)); - block = ext4_inode_table(sb, gdp) + (inode_offset / inodes_per_block); iloc->offset = (inode_offset % inodes_per_block) * EXT4_INODE_SIZE(sb);
+ block = ext4_inode_table(sb, gdp); + if ((block <= le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block)) || + (block >= ext4_blocks_count(EXT4_SB(sb)->s_es))) { + ext4_error(sb, "Invalid inode table block %llu in " + "block_group %u", block, iloc->block_group); + return -EFSCORRUPTED; + } + block += (inode_offset / inodes_per_block); + bh = sb_getblk(sb, block); if (unlikely(!bh)) return -ENOMEM;
From: Luís Henriques lhenriques@suse.de
commit 78742d4d056df7d2fad241c90185d281bf924844 upstream.
The ext4_msg() function adds a new line to the message. Remove extra '\n' from call to ext4_msg() in ext4_orphan_cleanup().
Signed-off-by: Luís Henriques lhenriques@suse.de Link: https://lore.kernel.org/r/20221011155758.15287-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/orphan.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/orphan.c +++ b/fs/ext4/orphan.c @@ -412,7 +412,7 @@ void ext4_orphan_cleanup(struct super_bl /* don't clear list on RO mount w/ errors */ if (es->s_last_orphan && !(s_flags & SB_RDONLY)) { ext4_msg(sb, KERN_INFO, "Errors on filesystem, " - "clearing orphan list.\n"); + "clearing orphan list."); es->s_last_orphan = 0; } jbd_debug(1, "Skipping orphan recovery on fs with errors.\n");
From: Alexander Potapenko glider@google.com
commit 956510c0c7439e90b8103aaeaf4da92878c622f0 upstream.
When aops->write_begin() does not initialize fsdata, KMSAN reports an error passing the latter to aops->write_end().
Fix this by unconditionally initializing fsdata.
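A minimal user-space model of why the initialization matters (the function names are invented for this sketch; they only mimic the write_begin()/write_end() contract described above):

#include <stdio.h>

/* Minimal model: write_begin() may not set *fsdata on some paths, so the
 * caller must not pass an uninitialized value on to write_end(). */
static int write_begin(void **fsdata, int fast_path)
{
	if (fast_path)
		return 0;	/* leaves *fsdata untouched */
	*fsdata = "cookie";
	return 0;
}

static void write_end(void *fsdata)
{
	printf("fsdata = %p\n", fsdata);	/* reading it must be defined */
}

int main(void)
{
	void *fsdata = NULL;	/* the fix: always start from a defined value */

	write_begin(&fsdata, 1);
	write_end(fsdata);
	return 0;
}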
Cc: Eric Biggers ebiggers@kernel.org Fixes: c93d8f885809 ("ext4: add basic fs-verity support") Reported-by: syzbot+9767be679ef5016b6082@syzkaller.appspotmail.com Signed-off-by: Alexander Potapenko glider@google.com Reviewed-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221121112134.407362-1-glider@google.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/verity.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/verity.c +++ b/fs/ext4/verity.c @@ -76,7 +76,7 @@ static int pagecache_write(struct inode size_t n = min_t(size_t, count, PAGE_SIZE - offset_in_page(pos)); struct page *page; - void *fsdata; + void *fsdata = NULL; int res;
res = pagecache_write_begin(NULL, inode->i_mapping, pos, n, 0,
From: Baokun Li libaokun1@huawei.com
commit a71248b1accb2b42e4980afef4fa4a27fa0e36f5 upstream.
I caught an issue as follows: ================================================================== BUG: KASAN: use-after-free in __list_add_valid+0x28/0x1a0 Read of size 8 at addr ffff88814b13f378 by task mount/710
CPU: 1 PID: 710 Comm: mount Not tainted 6.1.0-rc3-next #370
Call Trace:
 <TASK>
 dump_stack_lvl+0x73/0x9f
 print_report+0x25d/0x759
 kasan_report+0xc0/0x120
 __asan_load8+0x99/0x140
 __list_add_valid+0x28/0x1a0
 ext4_orphan_cleanup+0x564/0x9d0 [ext4]
 __ext4_fill_super+0x48e2/0x5300 [ext4]
 ext4_fill_super+0x19f/0x3a0 [ext4]
 get_tree_bdev+0x27b/0x450
 ext4_get_tree+0x19/0x30 [ext4]
 vfs_get_tree+0x49/0x150
 path_mount+0xaae/0x1350
 do_mount+0xe2/0x110
 __x64_sys_mount+0xf0/0x190
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
 ext4_orphan_cleanup
  --- loop1: assume last_orphan is 12 ---
  list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan)
  ext4_truncate --> return 0
   ext4_inode_attach_jinode --> return -ENOMEM
  iput(inode) --> free inode<12>
  --- loop2: last_orphan is still 12 ---
  list_add(&EXT4_I(inode)->i_orphan, &EXT4_SB(sb)->s_orphan);
  // use inode<12> and trigger UAF
To solve this issue, we need to propagate the return value of ext4_inode_attach_jinode() appropriately.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221102080633.1630225-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4198,7 +4198,8 @@ int ext4_truncate(struct inode *inode)
/* If we zero-out tail of the page, we have to create jinode for jbd2 */ if (inode->i_size & (inode->i_sb->s_blocksize - 1)) { - if (ext4_inode_attach_jinode(inode) < 0) + err = ext4_inode_attach_jinode(inode); + if (err) goto out_trace; }
From: Gaosheng Cui cuigaosheng1@huawei.com
commit 3bf678a0f9c017c9ba7c581541dbc8453452a7ae upstream.
Shifting a signed 32-bit value by 31 bits is undefined, so change the significant bit to unsigned. The UBSAN warning calltrace is like below:
UBSAN: shift-out-of-bounds in fs/ext4/ext4.h:591:2
left shift of 1 by 31 places cannot be represented in type 'int'
Call Trace:
 <TASK>
 dump_stack_lvl+0x7d/0xa5
 dump_stack+0x15/0x1b
 ubsan_epilogue+0xe/0x4e
 __ubsan_handle_shift_out_of_bounds+0x1e7/0x20c
 ext4_init_fs+0x5a/0x277
 do_one_initcall+0x76/0x430
 kernel_init_freeable+0x3b3/0x422
 kernel_init+0x24/0x1e0
 ret_from_fork+0x1f/0x30
 </TASK>
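A tiny standalone example of the difference, assuming a 32-bit int:

#include <stdio.h>

int main(void)
{
	/* Undefined behaviour: shifting a signed 1 into the sign bit. */
	/* int bad = 1 << 31; */

	/* Well defined: use an unsigned constant instead. */
	unsigned int ok = 1U << 31;

	printf("0x%x\n", ok);	/* 0x80000000 */
	return 0;
}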
Fixes: 9a4c80194713 ("ext4: ensure Inode flags consistency are checked at build time") Signed-off-by: Gaosheng Cui cuigaosheng1@huawei.com Link: https://lore.kernel.org/r/20221031055833.3966222-1-cuigaosheng1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ext4.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -559,7 +559,7 @@ enum { * * It's not paranoia if the Murphy's Law really *is* out to get you. :-) */ -#define TEST_FLAG_VALUE(FLAG) (EXT4_##FLAG##_FL == (1 << EXT4_INODE_##FLAG)) +#define TEST_FLAG_VALUE(FLAG) (EXT4_##FLAG##_FL == (1U << EXT4_INODE_##FLAG)) #define CHECK_FLAG_VALUE(FLAG) BUILD_BUG_ON(!TEST_FLAG_VALUE(FLAG))
static inline void ext4_check_flag_values(void)
From: Baokun Li libaokun1@huawei.com
commit 63b1e9bccb71fe7d7e3ddc9877dbdc85e5d2d023 upstream.
There are many places that will get unhappy (and crash) when ext4_iget() returns a bad inode. However, when igetting the boot loader inode, a bad inode is allowed to be returned, because the inode may not be initialized. This mechanism can be used to bypass some checks and cause a panic. To solve this problem, we add a special iget flag, EXT4_IGET_BAD. Only with this flag would we return a bad inode from ext4_iget(); otherwise we always return the error code if the inode is a bad inode. (Suggested by Jan Kara.)
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221026042310.3839669-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ext4.h | 3 ++- fs/ext4/inode.c | 8 +++++++- fs/ext4/ioctl.c | 3 ++- 3 files changed, 11 insertions(+), 3 deletions(-)
--- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -2996,7 +2996,8 @@ int do_journal_get_write_access(handle_t typedef enum { EXT4_IGET_NORMAL = 0, EXT4_IGET_SPECIAL = 0x0001, /* OK to iget a system inode */ - EXT4_IGET_HANDLE = 0x0002 /* Inode # is from a handle */ + EXT4_IGET_HANDLE = 0x0002, /* Inode # is from a handle */ + EXT4_IGET_BAD = 0x0004 /* Allow to iget a bad inode */ } ext4_iget_flags;
extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4885,8 +4885,14 @@ struct inode *__ext4_iget(struct super_b if (IS_CASEFOLDED(inode) && !ext4_has_feature_casefold(inode->i_sb)) ext4_error_inode(inode, function, line, 0, "casefold flag without casefold feature"); - brelse(iloc.bh); + if (is_bad_inode(inode) && !(flags & EXT4_IGET_BAD)) { + ext4_error_inode(inode, function, line, 0, + "bad inode without EXT4_IGET_BAD flag"); + ret = -EUCLEAN; + goto bad_inode; + }
+ brelse(iloc.bh); unlock_new_inode(inode); return inode;
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -124,7 +124,8 @@ static long swap_inode_boot_loader(struc blkcnt_t blocks; unsigned short bytes;
- inode_bl = ext4_iget(sb, EXT4_BOOT_LOADER_INO, EXT4_IGET_SPECIAL); + inode_bl = ext4_iget(sb, EXT4_BOOT_LOADER_INO, + EXT4_IGET_SPECIAL | EXT4_IGET_BAD); if (IS_ERR(inode_bl)) return PTR_ERR(inode_bl); ei_bl = EXT4_I(inode_bl);
From: Baokun Li libaokun1@huawei.com
commit 07342ec259df2a35d6a34aebce010567a80a0e15 upstream.
Before quota is enabled, add a check on the preset quota inums in ext4_super_block to prevent wrong quota inodes from being loaded. In addition, when quota fails to be enabled, print the quota type and quota inum to facilitate fault locating.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221026042310.3839669-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/super.c | 28 +++++++++++++++++++++++++--- 1 file changed, 25 insertions(+), 3 deletions(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -6309,6 +6309,20 @@ static int ext4_quota_on(struct super_bl return err; }
+static inline bool ext4_check_quota_inum(int type, unsigned long qf_inum) +{ + switch (type) { + case USRQUOTA: + return qf_inum == EXT4_USR_QUOTA_INO; + case GRPQUOTA: + return qf_inum == EXT4_GRP_QUOTA_INO; + case PRJQUOTA: + return qf_inum >= EXT4_GOOD_OLD_FIRST_INO; + default: + BUG(); + } +} + static int ext4_quota_enable(struct super_block *sb, int type, int format_id, unsigned int flags) { @@ -6325,9 +6339,16 @@ static int ext4_quota_enable(struct supe if (!qf_inums[type]) return -EPERM;
+ if (!ext4_check_quota_inum(type, qf_inums[type])) { + ext4_error(sb, "Bad quota inum: %lu, type: %d", + qf_inums[type], type); + return -EUCLEAN; + } + qf_inode = ext4_iget(sb, qf_inums[type], EXT4_IGET_SPECIAL); if (IS_ERR(qf_inode)) { - ext4_error(sb, "Bad quota inode # %lu", qf_inums[type]); + ext4_error(sb, "Bad quota inode: %lu, type: %d", + qf_inums[type], type); return PTR_ERR(qf_inode); }
@@ -6366,8 +6387,9 @@ int ext4_enable_quotas(struct super_bloc if (err) { ext4_warning(sb, "Failed to enable quota tracking " - "(type=%d, err=%d). Please run " - "e2fsck to fix.", type, err); + "(type=%d, err=%d, ino=%lu). " + "Please run e2fsck to fix.", type, + err, qf_inums[type]); for (type--; type >= 0; type--) { struct inode *inode;
From: Baokun Li libaokun1@huawei.com
commit d323877484765aaacbb2769b06e355c2041ed115 upstream.
We got an issue as follows:
==================================================================
kernel BUG at fs/ext4/extents_status.c:202!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 810 Comm: mount Not tainted 6.1.0-rc1-next-g9631525255e3 #352
RIP: 0010:__es_tree_search.isra.0+0xb8/0xe0
RSP: 0018:ffffc90001227900 EFLAGS: 00010202
RAX: 0000000000000000 RBX: 0000000077512a0f RCX: 0000000000000000
RDX: 0000000000000002 RSI: 0000000000002a10 RDI: ffff8881004cd0c8
RBP: ffff888177512ac8 R08: 47ffffffffffffff R09: 0000000000000001
R10: 0000000000000001 R11: 00000000000679af R12: 0000000000002a10
R13: ffff888177512d88 R14: 0000000077512a10 R15: 0000000000000000
FS: 00007f4bd76dbc40(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005653bf993cf8 CR3: 000000017bfdf000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ext4_es_cache_extent+0xe2/0x210
 ext4_cache_extents+0xd2/0x110
 ext4_find_extent+0x5d5/0x8c0
 ext4_ext_map_blocks+0x9c/0x1d30
 ext4_map_blocks+0x431/0xa50
 ext4_getblk+0x82/0x340
 ext4_bread+0x14/0x110
 ext4_quota_read+0xf0/0x180
 v2_read_header+0x24/0x90
 v2_check_quota_file+0x2f/0xa0
 dquot_load_quota_sb+0x26c/0x760
 dquot_load_quota_inode+0xa5/0x190
 ext4_enable_quotas+0x14c/0x300
 __ext4_fill_super+0x31cc/0x32c0
 ext4_fill_super+0x115/0x2d0
 get_tree_bdev+0x1d2/0x360
 ext4_get_tree+0x19/0x30
 vfs_get_tree+0x26/0xe0
 path_mount+0x81d/0xfc0
 do_mount+0x8d/0xc0
 __x64_sys_mount+0xc0/0x160
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
 ext4_orphan_cleanup
 ext4_enable_quotas
  ext4_quota_enable
   ext4_iget --> get error inode <5>
    ext4_ext_check_inode --> Wrong imode makes it escape inspection
    make_bad_inode(inode) --> EXT4_BOOT_LOADER_INO set imode
  dquot_load_quota_inode
   vfs_setup_quota_inode --> check pass
   dquot_load_quota_sb
    v2_check_quota_file
     v2_read_header
      ext4_quota_read
       ext4_bread
        ext4_getblk
         ext4_map_blocks
          ext4_ext_map_blocks
           ext4_find_extent
            ext4_cache_extents
             ext4_es_cache_extent
              __es_tree_search.isra.0
               ext4_es_end --> Wrong extents trigger BUG_ON
In the above issue, s_usr_quota_inum is set to 5, but inode<5> contains an incorrect imode and disordered extents. Because 5 is EXT4_BOOT_LOADER_INO, the ext4_ext_check_inode check in the ext4_iget function can be bypassed; finally, the unchecked extents trigger the BUG_ON in the __es_tree_search function. To solve this issue, check whether the inode is a bad inode in vfs_setup_quota_inode().
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Reviewed-by: Jason Yan yanaijie@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221026042310.3839669-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/quota/dquot.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/fs/quota/dquot.c +++ b/fs/quota/dquot.c @@ -2317,6 +2317,8 @@ static int vfs_setup_quota_inode(struct struct super_block *sb = inode->i_sb; struct quota_info *dqopt = sb_dqopt(sb);
+ if (is_bad_inode(inode)) + return -EUCLEAN; if (!S_ISREG(inode->i_mode)) return -EACCES; if (IS_RDONLY(inode))
From: Ye Bin yebin10@huawei.com
commit 1da18e38cb97e9521e93d63034521a9649524f64 upstream.
When bigalloc is enabled, reserved cluster accounting for delayed allocation is handled in extent_status.c. With a corrupted file system, it's possible for this accounting to be incorrect, as discovered by Syzbot:
EXT4-fs error (device loop0): ext4_validate_block_bitmap:398: comm rep: bg 0: block 5: invalid block bitmap EXT4-fs (loop0): Delayed block allocation failed for inode 18 at logical offset 0 with max blocks 32 with error 28 EXT4-fs (loop0): This should not happen!! Data will be lost
EXT4-fs (loop0): Total free blocks count 0
EXT4-fs (loop0): Free/Dirty block details
EXT4-fs (loop0): free_blocks=0
EXT4-fs (loop0): dirty_blocks=32
EXT4-fs (loop0): Block reservation details
EXT4-fs (loop0): i_reserved_data_blocks=2
EXT4-fs (loop0): Inode 18 (00000000845cd634): i_reserved_data_blocks (1) not cleared!
The above issue happens as follows:

Assume: sbi->s_cluster_ratio = 16

Step1: Insert delay block [0, 31] -> ei->i_reserved_data_blocks=2

Step2:
ext4_writepages
 mpage_map_and_submit_extent -> return failed
 mpage_release_unused_pages -> to release [0, 30]
  ext4_es_remove_extent -> remove lblk=0 end=30
   __es_remove_extent -> len1=0 len2=31-30=1
    __es_remove_extent:
    ...
    if (len2 > 0) {
            ...
            if (len1 > 0) {
                    ...
            } else {
                    es->es_lblk = end + 1;
                    es->es_len = len2;
                    ...
            }
            if (count_reserved)
                    count_rsvd(inode, lblk, ...);
            goto out; -> will return but didn't calculate 'reserved'
    ...

Step3: ext4_destroy_inode -> trigger "i_reserved_data_blocks (1) not cleared!"
To solve the above issue, call 'get_rsvd()' before the goto out when 'len2 > 0'.
Reported-by: syzbot+05a0f0ccab4a25626e38@syzkaller.appspotmail.com Fixes: 8fcc3a580651 ("ext4: rework reserved cluster accounting when invalidating pages") Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Eric Whitney enwlinux@gmail.com Link: https://lore.kernel.org/r/20221208033426.1832460-2-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/extents_status.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ext4/extents_status.c +++ b/fs/ext4/extents_status.c @@ -1372,7 +1372,7 @@ retry: if (count_reserved) count_rsvd(inode, lblk, orig_es.es_len - len1 - len2, &orig_es, &rc); - goto out; + goto out_get_reserved; }
if (len1 > 0) { @@ -1414,6 +1414,7 @@ retry: } }
+out_get_reserved: if (count_reserved) *reserved = get_rsvd(inode, end, es, &rc); out:
From: Zhang Yi yi.zhang@huawei.com
commit 318cdc822c63b6e2befcfdc2088378ae6fa18def upstream.
In ext4_evict_inode(), when we are evicting an inode in the 'no_delete' path, it must not be raced by another mark_inode_dirty(). If that happens, someone else may accidentally dirty it without holding an inode refcount, which can cause use-after-free issues in the writeback procedure. This is hard to discover and debug, so add a WARN_ON_ONCE() to detect the situation in advance.
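The added check relies on i_io_list, the list head that links a dirtied inode onto its writeback queue, and it uses the lockless-tolerant emptiness test because no list lock is held at this point. Roughly (simplified from include/linux/list.h, shown only for context):

	static inline int list_empty_careful(const struct list_head *head)
	{
		struct list_head *next = head->next;

		/* test both ends of the list so a single concurrent updater
		 * cannot make the result completely bogus */
		return (next == head) && (next == head->prev);
	}

A non-empty i_io_list at this point means someone queued the supposedly-dead inode for writeback, which is exactly the race described above.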
Suggested-by: Jan Kara jack@suse.cz Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220629112647.4141034-2-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 6 ++++++ 1 file changed, 6 insertions(+)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -338,6 +338,12 @@ stop_handle: ext4_xattr_inode_array_free(ea_inode_array); return; no_delete: + /* + * Check out some where else accidentally dirty the evicting inode, + * which may probably cause inode use-after-free issues later. + */ + WARN_ON_ONCE(!list_empty_careful(&inode->i_io_list)); + if (!list_empty(&EXT4_I(inode)->i_fc_list)) ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_NOMEM, NULL); ext4_clear_inode(inode); /* We must guarantee clearing of inode... */
From: Baokun Li libaokun1@huawei.com
commit 991ed014de0840c5dc405b679168924afb2952ac upstream.
We got an issue as follows:
==================================================================
kernel BUG at fs/ext4/extents_status.c:203!
invalid opcode: 0000 [#1] PREEMPT SMP
CPU: 1 PID: 945 Comm: cat Not tainted 6.0.0-next-20221007-dirty #349
RIP: 0010:ext4_es_end.isra.0+0x34/0x42
RSP: 0018:ffffc9000143b768 EFLAGS: 00010203
RAX: 0000000000000000 RBX: ffff8881769cd0b8 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8fc27cf7 RDI: 00000000ffffffff
RBP: ffff8881769cd0bc R08: 0000000000000000 R09: ffffc9000143b5f8
R10: 0000000000000001 R11: 0000000000000001 R12: ffff8881769cd0a0
R13: ffff8881768e5668 R14: 00000000768e52f0 R15: 0000000000000000
FS:  00007f359f7f05c0(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f359f5a2000 CR3: 000000017130c000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 __es_tree_search.isra.0+0x6d/0xf5
 ext4_es_cache_extent+0xfa/0x230
 ext4_cache_extents+0xd2/0x110
 ext4_find_extent+0x5d5/0x8c0
 ext4_ext_map_blocks+0x9c/0x1d30
 ext4_map_blocks+0x431/0xa50
 ext4_mpage_readpages+0x48e/0xe40
 ext4_readahead+0x47/0x50
 read_pages+0x82/0x530
 page_cache_ra_unbounded+0x199/0x2a0
 do_page_cache_ra+0x47/0x70
 page_cache_ra_order+0x242/0x400
 ondemand_readahead+0x1e8/0x4b0
 page_cache_sync_ra+0xf4/0x110
 filemap_get_pages+0x131/0xb20
 filemap_read+0xda/0x4b0
 generic_file_read_iter+0x13a/0x250
 ext4_file_read_iter+0x59/0x1d0
 vfs_read+0x28f/0x460
 ksys_read+0x73/0x160
 __x64_sys_read+0x1e/0x30
 do_syscall_64+0x35/0x80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
 </TASK>
==================================================================
In the above issue, an ioctl invokes swap_inode_boot_loader() to swap inode <5> and inode <12>. However, inode <5> contains an incorrect imode and disordered extents, and its i_nlink is set to 1. The extents check in ext4_iget() can be bypassed because 5 is EXT4_BOOT_LOADER_INO. Since links_count is 1, the extents are not reinitialized in swap_inode_boot_loader(). After the ioctl completes successfully, the bad extents have been swapped into inode <12>; running the `cat` command on inode <12> then triggers the BUG_ON because of the incorrect extents.
When the boot loader inode is not initialized, its imode can be one of the following:
1) the imode is a bad type, so the inode is marked as a bad inode in ext4_iget() and its imode is set to S_IFREG;
2) the imode is a good type but not S_IFREG;
3) the imode is S_IFREG.

In cases 1 and 2 the check can be bypassed and the BUG_ON triggered. Therefore, when the boot loader inode is a bad inode or its imode is not S_IFREG, initialize the inode to avoid triggering the BUG.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jason Yan yanaijie@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221026042310.3839669-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ioctl.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -175,7 +175,7 @@ static long swap_inode_boot_loader(struc /* Protect extent tree against block allocations via delalloc */ ext4_double_down_write_data_sem(inode, inode_bl);
- if (inode_bl->i_nlink == 0) { + if (is_bad_inode(inode_bl) || !S_ISREG(inode_bl->i_mode)) { /* this inode has never been used as a BOOT_LOADER */ set_nlink(inode_bl, 1); i_uid_write(inode_bl, 0);
From: Eric Biggers ebiggers@google.com
commit 594bc43b410316d70bb42aeff168837888d96810 upstream.
When space at the end of fast-commit journal blocks is unused, make sure to zero it out so that uninitialized memory is not leaked to disk.
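As an illustration of the general pattern only (hypothetical helper, not the fast-commit code itself): when a fixed-size block carries fewer payload bytes than its size, the remainder has to be cleared before the block is submitted so stale heap contents never reach the media.

	/* illustrative sketch: copy 'used' bytes of payload into a block of
	 * 'block_size' bytes and zero the rest before it is written out */
	static void fill_and_seal_block(u8 *block, size_t block_size,
					const u8 *payload, size_t used)
	{
		memcpy(block, payload, used);
		memset(block + used, 0, block_size - used);
	}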
Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221106224841.279231-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 5 +++++ 1 file changed, 5 insertions(+)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -657,6 +657,9 @@ static u8 *ext4_fc_reserve_space(struct *crc = ext4_chksum(sbi, *crc, tl, sizeof(*tl)); if (pad_len > 0) ext4_fc_memzero(sb, tl + 1, pad_len, crc); + /* Don't leak uninitialized memory in the unused last byte. */ + *((u8 *)(tl + 1) + pad_len) = 0; + ext4_fc_submit_bh(sb, false);
ret = jbd2_fc_get_buf(EXT4_SB(sb)->s_journal, &bh); @@ -713,6 +716,8 @@ static int ext4_fc_write_tail(struct sup dst += sizeof(tail.fc_tid); tail.fc_crc = cpu_to_le32(crc); ext4_fc_memcpy(sb, dst, &tail.fc_crc, sizeof(tail.fc_crc), NULL); + dst += sizeof(tail.fc_crc); + memset(dst, 0, bsize - off); /* Don't leak uninitialized memory. */
ext4_fc_submit_bh(sb, true);
From: Ye Bin yebin10@huawei.com
commit 7ea71af94eaaaf6d9aed24bc94a05b977a741cb9 upstream.
Syzbot found the following issue: ===================================================== BUG: KMSAN: uninit-value in ext4_evict_inode+0xdd/0x26b0 fs/ext4/inode.c:180 ext4_evict_inode+0xdd/0x26b0 fs/ext4/inode.c:180 evict+0x365/0x9a0 fs/inode.c:664 iput_final fs/inode.c:1747 [inline] iput+0x985/0xdd0 fs/inode.c:1773 __ext4_new_inode+0xe54/0x7ec0 fs/ext4/ialloc.c:1361 ext4_mknod+0x376/0x840 fs/ext4/namei.c:2844 vfs_mknod+0x79d/0x830 fs/namei.c:3914 do_mknodat+0x47d/0xaa0 __do_sys_mknodat fs/namei.c:3992 [inline] __se_sys_mknodat fs/namei.c:3989 [inline] __ia32_sys_mknodat+0xeb/0x150 fs/namei.c:3989 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82
Uninit was created at: __alloc_pages+0x9f1/0xe80 mm/page_alloc.c:5578 alloc_pages+0xaae/0xd80 mm/mempolicy.c:2285 alloc_slab_page mm/slub.c:1794 [inline] allocate_slab+0x1b5/0x1010 mm/slub.c:1939 new_slab mm/slub.c:1992 [inline] ___slab_alloc+0x10c3/0x2d60 mm/slub.c:3180 __slab_alloc mm/slub.c:3279 [inline] slab_alloc_node mm/slub.c:3364 [inline] slab_alloc mm/slub.c:3406 [inline] __kmem_cache_alloc_lru mm/slub.c:3413 [inline] kmem_cache_alloc_lru+0x6f3/0xb30 mm/slub.c:3429 alloc_inode_sb include/linux/fs.h:3117 [inline] ext4_alloc_inode+0x5f/0x860 fs/ext4/super.c:1321 alloc_inode+0x83/0x440 fs/inode.c:259 new_inode_pseudo fs/inode.c:1018 [inline] new_inode+0x3b/0x430 fs/inode.c:1046 __ext4_new_inode+0x2a7/0x7ec0 fs/ext4/ialloc.c:959 ext4_mkdir+0x4d5/0x1560 fs/ext4/namei.c:2992 vfs_mkdir+0x62a/0x870 fs/namei.c:4035 do_mkdirat+0x466/0x7b0 fs/namei.c:4060 __do_sys_mkdirat fs/namei.c:4075 [inline] __se_sys_mkdirat fs/namei.c:4073 [inline] __ia32_sys_mkdirat+0xc4/0x120 fs/namei.c:4073 do_syscall_32_irqs_on arch/x86/entry/common.c:112 [inline] __do_fast_syscall_32+0xa2/0x100 arch/x86/entry/common.c:178 do_fast_syscall_32+0x33/0x70 arch/x86/entry/common.c:203 do_SYSENTER_32+0x1b/0x20 arch/x86/entry/common.c:246 entry_SYSENTER_compat_after_hwframe+0x70/0x82
CPU: 1 PID: 4625 Comm: syz-executor.2 Not tainted 6.1.0-rc4-syzkaller-62821-gcb231e2f67ec #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022 =====================================================
Currently, 'ext4_alloc_inode()' does not initialize 'ei->i_flags'. If creating a new inode fails in '__ext4_new_inode()' before 'ei->i_flags' is set, 'iput()' is called on it. Since commit 6bc0d63dad7f, 'ext4_evict_inode()' accesses 'ei->i_flags', which then leads to an uninit-value access. To solve this issue, just initialize 'ei->i_flags' in 'ext4_alloc_inode()'.
Reported-by: syzbot+57b25da729eb0b88177d@syzkaller.appspotmail.com Signed-off-by: Ye Bin yebin10@huawei.com Fixes: 6bc0d63dad7f ("ext4: remove EA inode entry from mbcache on inode eviction") Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221117073603.2598882-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/super.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -1288,6 +1288,7 @@ static struct inode *ext4_alloc_inode(st return NULL;
inode_set_iversion(&ei->vfs_inode, 1); + ei->i_flags = 0; spin_lock_init(&ei->i_raw_lock); INIT_LIST_HEAD(&ei->i_prealloc_list); atomic_set(&ei->i_prealloc_active, 0);
From: Ye Bin yebin10@huawei.com
commit fae381a3d79bb94aa2eb752170d47458d778b797 upstream.
Syzbot found the following issue: ext4_parse_param: s_want_extra_isize=128 ext4_inode_info_init: s_want_extra_isize=32 ext4_rename: old.inode=ffff88823869a2c8 old.dir=ffff888238699828 new.inode=ffff88823869d7e8 new.dir=ffff888238699828 __ext4_mark_inode_dirty: inode=ffff888238699828 ea_isize=32 want_ea_size=128 __ext4_mark_inode_dirty: inode=ffff88823869a2c8 ea_isize=32 want_ea_size=128 ext4_xattr_block_set: inode=ffff88823869a2c8 ------------[ cut here ]------------ WARNING: CPU: 13 PID: 2234 at fs/ext4/xattr.c:2070 ext4_xattr_block_set.cold+0x22/0x980 Modules linked in: RIP: 0010:ext4_xattr_block_set.cold+0x22/0x980 RSP: 0018:ffff888227d3f3b0 EFLAGS: 00010202 RAX: 0000000000000001 RBX: ffff88823007a000 RCX: 0000000000000000 RDX: 0000000000000a03 RSI: 0000000000000040 RDI: ffff888230078178 RBP: 0000000000000000 R08: 000000000000002c R09: ffffed1075c7df8e R10: ffff8883ae3efc6b R11: ffffed1075c7df8d R12: 0000000000000000 R13: ffff88823869a2c8 R14: ffff8881012e0460 R15: dffffc0000000000 FS: 00007f350ac1f740(0000) GS:ffff8883ae200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f350a6ed6a0 CR3: 0000000237456000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? ext4_xattr_set_entry+0x3b7/0x2320 ? ext4_xattr_block_set+0x0/0x2020 ? ext4_xattr_set_entry+0x0/0x2320 ? ext4_xattr_check_entries+0x77/0x310 ? ext4_xattr_ibody_set+0x23b/0x340 ext4_xattr_move_to_block+0x594/0x720 ext4_expand_extra_isize_ea+0x59a/0x10f0 __ext4_expand_extra_isize+0x278/0x3f0 __ext4_mark_inode_dirty.cold+0x347/0x410 ext4_rename+0xed3/0x174f vfs_rename+0x13a7/0x2510 do_renameat2+0x55d/0x920 __x64_sys_rename+0x7d/0xb0 do_syscall_64+0x3b/0xa0 entry_SYSCALL_64_after_hwframe+0x72/0xdc
'ext4_rename' modifies the ctime of 'old.inode' and marks the inode dirty, which may trigger expansion of 'extra_isize' and allocation of a block. If quota has not been initialized for the inode, this leads to the warning above. To solve the issue, initialize quota for 'old.inode' first in 'ext4_rename'.
Reported-by: syzbot+98346927678ac3059c77@syzkaller.appspotmail.com Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221107015335.2524319-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/namei.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -3808,6 +3808,9 @@ static int ext4_rename(struct user_names retval = dquot_initialize(old.dir); if (retval) return retval; + retval = dquot_initialize(old.inode); + if (retval) + return retval; retval = dquot_initialize(new.dir); if (retval) return retval;
From: Eric Whitney enwlinux@gmail.com
commit 131294c35ed6f777bd4e79d42af13b5c41bf2775 upstream.
When converting files with inline data to extents, delayed allocations made on a file system created with both the bigalloc and inline options can result in invalid extent status cache content, incorrect reserved cluster counts, kernel memory leaks, and potential kernel panics.
With bigalloc, the code that determines whether a block must be delayed allocated searches the extent tree to see if that block maps to a previously allocated cluster. If not, the block is delayed allocated, and otherwise, it isn't. However, if the inline option is also used, and if the file containing the block is marked as able to store data inline, there isn't a valid extent tree associated with the file. The current code in ext4_clu_mapped() calls ext4_find_extent() to search the non-existent tree for a previously allocated cluster anyway, which typically finds nothing, as desired. However, a side effect of the search can be to cache invalid content from the non-existent tree (garbage) in the extent status tree, including bogus entries in the pending reservation tree.
To fix this, avoid searching the extent tree when allocating blocks for bigalloc + inline files that are being converted from inline to extent mapped.
Signed-off-by: Eric Whitney enwlinux@gmail.com Link: https://lore.kernel.org/r/20221117152207.2424-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/extents.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -5810,6 +5810,14 @@ int ext4_clu_mapped(struct inode *inode, struct ext4_extent *extent; ext4_lblk_t first_lblk, first_lclu, last_lclu;
+ /* + * if data can be stored inline, the logical cluster isn't + * mapped - no physical clusters have been allocated, and the + * file has no extents + */ + if (ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) + return 0; + /* search for the extent closest to the first block in the cluster */ path = ext4_find_extent(inode, EXT4_C2B(sbi, lclu), NULL, 0); if (IS_ERR(path)) {
From: Baokun Li libaokun1@huawei.com
commit 0aeaa2559d6d53358fca3e3fce73807367adca74 upstream.
When a backup superblock is updated in update_backups(), the primary superblock's offset in the group (that is, sbi->s_sbh->b_blocknr) is used as the backup superblock's offset in its group. However, when the block size is 1K and bigalloc is enabled, the two offsets are not equal. This causes the backup group descriptors to be overwritten by the superblock in update_backups(). Moreover, if meta_bg is enabled, the file system will be corrupted because this feature uses backup group descriptors.
To solve this issue, we use a more accurate ext4_group_first_block_no() as the offset of the backup superblock in its group.
Fixes: d77147ff443b ("ext4: add support for online resizing with bigalloc") Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221117040341.1380702-4-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/resize.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1557,8 +1557,8 @@ exit_journal: int meta_bg = ext4_has_feature_meta_bg(sb); sector_t old_gdb = 0;
- update_backups(sb, sbi->s_sbh->b_blocknr, (char *)es, - sizeof(struct ext4_super_block), 0); + update_backups(sb, ext4_group_first_block_no(sb, 0), + (char *)es, sizeof(struct ext4_super_block), 0); for (; gdb_num <= gdb_num_end; gdb_num++) { struct buffer_head *gdb_bh;
@@ -1769,7 +1769,7 @@ errout: if (test_opt(sb, DEBUG)) printk(KERN_DEBUG "EXT4-fs: extended group to %llu " "blocks\n", ext4_blocks_count(es)); - update_backups(sb, EXT4_SB(sb)->s_sbh->b_blocknr, + update_backups(sb, ext4_group_first_block_no(sb, 0), (char *)es, sizeof(struct ext4_super_block), 0); } return err;
From: Luís Henriques lhenriques@suse.de
commit 26d75a16af285a70863ba6a81f85d81e7e65da50 upstream.
If a block is out of range in ext4_get_branch(), -ENOMEM will be returned to user-space. Obviously, this error code isn't really useful. This patch fixes it by making sure the right error code (-EFSCORRUPTED) is propagated to user-space. EUCLEAN ("Structure needs cleaning", the errno behind -EFSCORRUPTED) is more informative than ENOMEM.
Signed-off-by: Luís Henriques lhenriques@suse.de Link: https://lore.kernel.org/r/20221109181445.17843-1-lhenriques@suse.de Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/indirect.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
--- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -148,6 +148,7 @@ static Indirect *ext4_get_branch(struct struct super_block *sb = inode->i_sb; Indirect *p = chain; struct buffer_head *bh; + unsigned int key; int ret = -EIO;
*err = 0; @@ -156,7 +157,13 @@ static Indirect *ext4_get_branch(struct if (!p->key) goto no_block; while (--depth) { - bh = sb_getblk(sb, le32_to_cpu(p->key)); + key = le32_to_cpu(p->key); + if (key > ext4_blocks_count(EXT4_SB(sb)->s_es)) { + /* the block was out of range */ + ret = -EFSCORRUPTED; + goto failure; + } + bh = sb_getblk(sb, key); if (unlikely(!bh)) { ret = -ENOMEM; goto failure;
From: Jan Kara jack@suse.cz
commit b40ebaf63851b3a401b0dc9263843538f64f5ce6 upstream.
Commit fb0a387dcdcd ("ext4: limit block allocations for indirect-block files to < 2^32") added code to try to allocate xattr block with 32-bit block number for indirect block based files on the grounds that these files cannot use larger block numbers. It also added BUG_ON when allocated block could not fit into 32 bits. This is however bogus reasoning because xattr block is stored in inode->i_file_acl and inode->i_file_acl_hi and as such even indirect block based files can happily use full 48 bits for xattr block number. The proper handling seems to be there basically since 64-bit block number support was added. So remove the bogus limitation and BUG_ON.
Cc: Eric Sandeen sandeen@redhat.com Fixes: fb0a387dcdcd ("ext4: limit block allocations for indirect-block files to < 2^32") Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221121130929.32031-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/xattr.c | 8 -------- 1 file changed, 8 deletions(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -2070,19 +2070,11 @@ inserted:
goal = ext4_group_first_block_no(sb, EXT4_I(inode)->i_block_group); - - /* non-extent files can't have physical blocks past 2^32 */ - if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) - goal = goal & EXT4_MAX_BLOCK_FILE_PHYS; - block = ext4_new_meta_blocks(handle, inode, goal, 0, NULL, &error); if (error) goto cleanup;
- if (!(ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS))) - BUG_ON(block > EXT4_MAX_BLOCK_FILE_PHYS); - ea_idebug(inode, "creating block %llu", (unsigned long long)block);
From: Ye Bin yebin10@huawei.com
commit 5c099c4fdc438014d5893629e70a8ba934433ee8 upstream.
Syzbot reported the following issue:
------------[ cut here ]------------
kernel BUG at fs/ext4/inline.c:227!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 3629 Comm: syz-executor212 Not tainted 6.1.0-rc5-syzkaller-00018-g59d0d52c30d4 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/26/2022
RIP: 0010:ext4_write_inline_data+0x344/0x3e0 fs/ext4/inline.c:227
RSP: 0018:ffffc90003b3f368 EFLAGS: 00010293
RAX: 0000000000000000 RBX: ffff8880704e16c0 RCX: 0000000000000000
RDX: ffff888021763a80 RSI: ffffffff821e31a4 RDI: 0000000000000006
RBP: 000000000006818e R08: 0000000000000006 R09: 0000000000068199
R10: 0000000000000079 R11: 0000000000000000 R12: 000000000000000b
R13: 0000000000068199 R14: ffffc90003b3f408 R15: ffff8880704e1c82
FS:  000055555723e3c0(0000) GS:ffff8880b9b00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fffe8ac9080 CR3: 0000000079f81000 CR4: 0000000000350ee0
Call Trace:
 <TASK>
 ext4_write_inline_data_end+0x2a3/0x12f0 fs/ext4/inline.c:768
 ext4_write_end+0x242/0xdd0 fs/ext4/inode.c:1313
 ext4_da_write_end+0x3ed/0xa30 fs/ext4/inode.c:3063
 generic_perform_write+0x316/0x570 mm/filemap.c:3764
 ext4_buffered_write_iter+0x15b/0x460 fs/ext4/file.c:285
 ext4_file_write_iter+0x8bc/0x16e0 fs/ext4/file.c:700
 call_write_iter include/linux/fs.h:2191 [inline]
 do_iter_readv_writev+0x20b/0x3b0 fs/read_write.c:735
 do_iter_write+0x182/0x700 fs/read_write.c:861
 vfs_iter_write+0x74/0xa0 fs/read_write.c:902
 iter_file_splice_write+0x745/0xc90 fs/splice.c:686
 do_splice_from fs/splice.c:764 [inline]
 direct_splice_actor+0x114/0x180 fs/splice.c:931
 splice_direct_to_actor+0x335/0x8a0 fs/splice.c:886
 do_splice_direct+0x1ab/0x280 fs/splice.c:974
 do_sendfile+0xb19/0x1270 fs/read_write.c:1255
 __do_sys_sendfile64 fs/read_write.c:1323 [inline]
 __se_sys_sendfile64 fs/read_write.c:1309 [inline]
 __x64_sys_sendfile64+0x1d0/0x210 fs/read_write.c:1309
 do_syscall_x64 arch/x86/entry/common.c:50 [inline]
 do_syscall_64+0x39/0xb0 arch/x86/entry/common.c:80
 entry_SYSCALL_64_after_hwframe+0x63/0xcd
---[ end trace 0000000000000000 ]---
The above issue may happen as follows:

ext4_da_write_begin
 ext4_da_write_inline_data_begin
  ext4_da_convert_inline_data_to_extent
   ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
ext4_da_write_end

ext4_run_li_request
 ext4_mb_prefetch
  ext4_read_block_bitmap_nowait
   ext4_validate_block_bitmap
    ext4_mark_group_bitmap_corrupted(sb, block_group, EXT4_GROUP_INFO_BBITMAP_CORRUPT)
     percpu_counter_sub(&sbi->s_freeclusters_counter, grp->bb_free);
     -> sbi->s_freeclusters_counter becomes zero

ext4_da_write_begin
 if (ext4_nonda_switch(inode->i_sb)) -> returns true, as freeclusters_counter is zero
  *fsdata = (void *)FALL_BACK_TO_NONDELALLOC;
  ext4_write_begin
ext4_da_write_end
 if (write_mode == FALL_BACK_TO_NONDELALLOC)
  ext4_write_end
   if (inline_data)
    ext4_write_inline_data_end
     ext4_write_inline_data
      BUG_ON(pos + len > EXT4_I(inode)->i_inline_size);
      -> as the inode has already been converted to extents, 'pos + len' > inline_size, which triggers the BUG.
To solve this issue, instead of checking ext4_has_inline_data(), which is only cleared after the data has been written back, check the EXT4_STATE_MAY_INLINE_DATA flag in ext4_write_end().
Fixes: f19d5870cbf7 ("ext4: add normal write support for inline data") Reported-by: syzbot+4faa160fa96bfba639f8@syzkaller.appspotmail.com Reported-by: Jun Nie jun.nie@linaro.org Signed-off-by: Ye Bin yebin10@huawei.com Link: https://lore.kernel.org/r/20221206144134.1919987-1-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -1304,7 +1304,8 @@ static int ext4_write_end(struct file *f
trace_ext4_write_end(inode, pos, len, copied);
- if (ext4_has_inline_data(inode)) + if (ext4_has_inline_data(inode) && + ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)) return ext4_write_inline_data_end(inode, pos, len, copied, page);
copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
From: Ye Bin yebin10@huawei.com
commit e4db04f7d3dbbe16680e0ded27ea2a65b10f766a upstream.
There is an issue as follows when doing setxattr with fault injection:
[localhost]# fsck.ext4 -fn /dev/sda e2fsck 1.46.6-rc1 (12-Sep-2022) Pass 1: Checking inodes, blocks, and sizes Pass 2: Checking directory structure Pass 3: Checking directory connectivity Pass 4: Checking reference counts Unattached zero-length inode 15. Clear? no
Unattached inode 15 Connect to /lost+found? no
Pass 5: Checking group summary information
/dev/sda: ********** WARNING: Filesystem still has errors **********
/dev/sda: 15/655360 files (0.0% non-contiguous), 66755/2621440 blocks
This occurs in 'ext4_xattr_inode_create()'. If 'ext4_mark_inode_dirty()' fails, the i_nlink of the inode needs to be dropped; otherwise it leads to an inode leak.
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221208023233.1231330-5-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/xattr.c | 3 +++ 1 file changed, 3 insertions(+)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1441,6 +1441,9 @@ static struct inode *ext4_xattr_inode_cr if (!err) err = ext4_inode_attach_jinode(ea_inode); if (err) { + if (ext4_xattr_inode_dec_ref(handle, ea_inode)) + ext4_warning_inode(ea_inode, + "cleanup dec ref error %d", err); iput(ea_inode); return ERR_PTR(err); }
From: Jan Kara jack@suse.cz
commit 1485f726c6dec1a1f85438f2962feaa3d585526f upstream.
Make sure we initialize quotas before possibly expanding inode space (and thus maybe needing to allocate an external xattr block) in ext4_ioctl_setproject(). This prevents the block allocation from going unaccounted in quota.
Signed-off-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221207115937.26601-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ioctl.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/ext4/ioctl.c +++ b/fs/ext4/ioctl.c @@ -492,6 +492,10 @@ static int ext4_ioctl_setproject(struct if (ext4_is_quota_file(inode)) return err;
+ err = dquot_initialize(inode); + if (err) + return err; + err = ext4_get_inode_loc(inode, &iloc); if (err) return err; @@ -507,10 +511,6 @@ static int ext4_ioctl_setproject(struct brelse(iloc.bh); }
- err = dquot_initialize(inode); - if (err) - return err; - handle = ext4_journal_start(inode, EXT4_HT_QUOTA, EXT4_QUOTA_INIT_BLOCKS(sb) + EXT4_QUOTA_DEL_BLOCKS(sb) + 3);
From: Jan Kara jack@suse.cz
commit 8994d11395f8165b3deca1971946f549f0822630 upstream.
When expanding inode space in ext4_expand_extra_isize_ea() we may need to allocate external xattr block. If quota is not initialized for the inode, the block allocation will not be accounted into quota usage. Make sure the quota is initialized before we try to expand inode space.
Reported-by: Pengfei Xu pengfei.xu@intel.com Link: https://lore.kernel.org/all/Y5BT+k6xWqthZc1P@xpf.sh.intel.com Signed-off-by: Jan Kara jack@suse.cz Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221207115937.26601-2-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/inode.c | 8 ++++++++ 1 file changed, 8 insertions(+)
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5813,6 +5813,14 @@ static int __ext4_expand_extra_isize(str return 0; }
+ /* + * We may need to allocate external xattr block so we need quotas + * initialized. Here we can be called with various locks held so we + * cannot affort to initialize quotas ourselves. So just bail. + */ + if (dquot_initialize_needed(inode)) + return -EAGAIN; + /* try to expand with EAs present */ error = ext4_expand_extra_isize_ea(inode, new_extra_isize, raw_inode, handle);
From: Ye Bin yebin10@huawei.com
commit cc12a6f25e07ed05d5825a1664b67a970842b2ca upstream.
Currently, the maximum length of an extended attribute value is 64K. The memory requested here does not need physically contiguous addresses, so it is appropriate to use kvmalloc to request it. This also copes with extended attribute values possibly becoming even longer in the future.
Signed-off-by: Ye Bin yebin10@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221208023233.1231330-3-yebin@huaweicloud.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/xattr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -2549,7 +2549,7 @@ static int ext4_xattr_move_to_block(hand
is = kzalloc(sizeof(struct ext4_xattr_ibody_find), GFP_NOFS); bs = kzalloc(sizeof(struct ext4_xattr_block_find), GFP_NOFS); - buffer = kmalloc(value_size, GFP_NOFS); + buffer = kvmalloc(value_size, GFP_NOFS); b_entry_name = kmalloc(entry->e_name_len + 1, GFP_NOFS); if (!is || !bs || !buffer || !b_entry_name) { error = -ENOMEM; @@ -2601,7 +2601,7 @@ static int ext4_xattr_move_to_block(hand error = 0; out: kfree(b_entry_name); - kfree(buffer); + kvfree(buffer); if (is) brelse(is->iloc.bh); if (bs)
From: Alex Deucher alexander.deucher@amd.com
commit 1d4624cd72b912b2680c08d0be48338a1629a858 upstream.
Some special polaris 10 chips overlap with the polaris11 DID range. Handle this properly in the driver.
v2: use local flags for other function calls.
Acked-by: Luben Tuikov luben.tuikov@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c @@ -2008,6 +2008,15 @@ static int amdgpu_pci_probe(struct pci_d "See modparam exp_hw_support\n"); return -ENODEV; } + /* differentiate between P10 and P11 asics with the same DID */ + if (pdev->device == 0x67FF && + (pdev->revision == 0xE3 || + pdev->revision == 0xE7 || + pdev->revision == 0xF3 || + pdev->revision == 0xF7)) { + flags &= ~AMD_ASIC_MASK; + flags |= CHIP_POLARIS10; + }
/* Due to hardware bugs, S/G Display on raven requires a 1:1 IOMMU mapping, * however, SME requires an indirect IOMMU mapping because the encryption @@ -2081,12 +2090,12 @@ static int amdgpu_pci_probe(struct pci_d
pci_set_drvdata(pdev, ddev);
- ret = amdgpu_driver_load_kms(adev, ent->driver_data); + ret = amdgpu_driver_load_kms(adev, flags); if (ret) goto err_pci;
retry_init: - ret = drm_dev_register(ddev, ent->driver_data); + ret = drm_dev_register(ddev, flags); if (ret == -EAGAIN && ++retry <= 3) { DRM_INFO("retry init %d\n", retry); /* Don't request EX mode too frequently which is attacking */
From: Alex Deucher alexander.deucher@amd.com
commit 81d0bcf9900932633d270d5bc4a54ff599c6ebdb upstream.
Only apply the static threshold for Stoney and Carrizo. This hardware has certain requirements that don't allow mixing of GTT and VRAM. Newer asics do not have these requirements so we should be able to be more flexible with where buffers end up.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2270 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2291 Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2255 Acked-by: Luben Tuikov luben.tuikov@amd.com Reviewed-by: Christian König christian.koenig@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c @@ -1510,7 +1510,8 @@ u64 amdgpu_bo_gpu_offset_no_check(struct uint32_t amdgpu_bo_get_preferred_domain(struct amdgpu_device *adev, uint32_t domain) { - if (domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) { + if ((domain == (AMDGPU_GEM_DOMAIN_VRAM | AMDGPU_GEM_DOMAIN_GTT)) && + ((adev->asic_type == CHIP_CARRIZO) || (adev->asic_type == CHIP_STONEY))) { domain = AMDGPU_GEM_DOMAIN_VRAM; if (adev->gmc.real_vram_size <= AMDGPU_SG_THRESHOLD) domain = AMDGPU_GEM_DOMAIN_GTT;
From: Damien Le Moal damien.lemoal@opensource.wdc.com
commit 2820e5d0820ac4daedff1272616a53d9c7682fd2 upstream.
dd_finish_request() tests if the per prio fifo_list is not empty to determine if request dispatching must be restarted for handling blocked write requests to zoned devices with a call to blk_mq_sched_mark_restart_hctx(). While simple, this implementation has 2 problems:
1) Only the priority level of the completed request is considered. However, writes to a zone may be blocked due to other writes to the same zone using a different priority level. While this is unlikely to happen in practice, as writing a zone with different IO priorities does not make sense, nothing in the code prevents this from happening.

2) The use of list_empty() is dangerous as dd_finish_request() does not take dd->lock and may run concurrently with the insert and dispatch code.
Fix these 2 problems by testing the write fifo list of all priority levels using the new helper dd_has_write_work(), and by testing each fifo list using list_empty_careful().
Fixes: c807ab520fc3 ("block/mq-deadline: Add I/O priority support") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Link: https://lore.kernel.org/r/20221124021208.242541-2-damien.lemoal@opensource.w... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/mq-deadline.c | 18 +++++++++++++++--- 1 file changed, 15 insertions(+), 3 deletions(-)
--- a/block/mq-deadline.c +++ b/block/mq-deadline.c @@ -791,6 +791,18 @@ static void dd_prepare_request(struct re rq->elv.priv[0] = NULL; }
+static bool dd_has_write_work(struct blk_mq_hw_ctx *hctx) +{ + struct deadline_data *dd = hctx->queue->elevator->elevator_data; + enum dd_prio p; + + for (p = 0; p <= DD_PRIO_MAX; p++) + if (!list_empty_careful(&dd->per_prio[p].fifo_list[DD_WRITE])) + return true; + + return false; +} + /* * Callback from inside blk_mq_free_request(). * @@ -813,7 +825,6 @@ static void dd_finish_request(struct req struct deadline_data *dd = q->elevator->elevator_data; const u8 ioprio_class = dd_rq_ioclass(rq); const enum dd_prio prio = ioprio_class_to_prio[ioprio_class]; - struct dd_per_prio *per_prio = &dd->per_prio[prio];
/* * The block layer core may call dd_finish_request() without having @@ -829,9 +840,10 @@ static void dd_finish_request(struct req
spin_lock_irqsave(&dd->zone_lock, flags); blk_req_zone_write_unlock(rq); - if (!list_empty(&per_prio->fifo_list[DD_WRITE])) - blk_mq_sched_mark_restart_hctx(rq->mq_hctx); spin_unlock_irqrestore(&dd->zone_lock, flags); + + if (dd_has_write_work(rq->mq_hctx)) + blk_mq_sched_mark_restart_hctx(rq->mq_hctx); } }
From: Zheng Yejian zhengyejian1@huawei.com
commit ff4837f7fe59ff018eca4705a70eca5e0b486b97 upstream.
The maximum number of synthetic fields supported is defined as SYNTH_FIELDS_MAX, whose value is currently 64, but generating a synthetic event with 64 fields actually fails, for example when executing:
# echo "my_synth_event int v1; int v2; int v3; int v4; int v5; int v6;\ int v7; int v8; int v9; int v10; int v11; int v12; int v13; int v14;\ int v15; int v16; int v17; int v18; int v19; int v20; int v21; int v22;\ int v23; int v24; int v25; int v26; int v27; int v28; int v29; int v30;\ int v31; int v32; int v33; int v34; int v35; int v36; int v37; int v38;\ int v39; int v40; int v41; int v42; int v43; int v44; int v45; int v46;\ int v47; int v48; int v49; int v50; int v51; int v52; int v53; int v54;\ int v55; int v56; int v57; int v58; int v59; int v60; int v61; int v62;\ int v63; int v64" >> /sys/kernel/tracing/synthetic_events
Correct the field counting to fix it.
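The underlying mistake is a store-then-check ordering: storing the 64th field pushed the counter to SYNTH_FIELDS_MAX, and the subsequent test rejected a command that was still legal. A generic illustration of the corrected check-before-store pattern (hypothetical helper, not the tracing code itself):

	/* reject only when storing one more element would exceed capacity */
	static int store_field(void *fields[], int *n_fields, void *field,
			       int max)
	{
		if (*n_fields == max)
			return -EINVAL;
		fields[(*n_fields)++] = field;
		return 0;
	}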
Link: https://lore.kernel.org/linux-trace-kernel/20221207091557.3137904-1-zhengyej...
Cc: mhiramat@kernel.org Cc: zanussi@kernel.org Cc: stable@vger.kernel.org Fixes: c9e759b1e845 ("tracing: Rework synthetic event command parsing") Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Steven Rostedt (Google) rostedt@goodmis.org [Fix conflict due to lack of c24be24aed405d64ebcf04526614c13b2adfb1d2] Signed-off-by: Zheng Yejian zhengyejian1@huawei.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/trace/trace_events_synth.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/kernel/trace/trace_events_synth.c +++ b/kernel/trace/trace_events_synth.c @@ -1275,12 +1275,12 @@ static int __create_synth_event(const ch goto err; }
- fields[n_fields++] = field; if (n_fields == SYNTH_FIELDS_MAX) { synth_err(SYNTH_ERR_TOO_MANY_FIELDS, 0); ret = -EINVAL; goto err; } + fields[n_fields++] = field;
n_fields_this_loop++; }
From: Eric Biggers ebiggers@kernel.org
From: Ritesh Harjani riteshh@linux.ibm.com
commit c864ccd182d6ff2730a0f5b636c6b7c48f6f4f7f upstream.
The commit below removed all references to EXT4_FC_COMMIT_FAILED: commit 0915e464cb274 ("ext4: simplify updating of fast commit stats")
Just remove it since it is not used anymore.
Signed-off-by: Ritesh Harjani riteshh@linux.ibm.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Harshad Shirwadkar harshadshirwadkar@gmail.com Link: https://lore.kernel.org/r/c941357e476be07a1138c7319ca5faab7fb80fc6.164705758... Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.h | 1 - 1 file changed, 1 deletion(-)
--- a/fs/ext4/fast_commit.h +++ b/fs/ext4/fast_commit.h @@ -93,7 +93,6 @@ enum { EXT4_FC_REASON_RENAME_DIR, EXT4_FC_REASON_FALLOC_RANGE, EXT4_FC_REASON_INODE_JOURNAL_DATA, - EXT4_FC_COMMIT_FAILED, EXT4_FC_REASON_MAX };
From: Eric Biggers ebiggers@kernel.org
From: Jan Kara jack@suse.cz
commit 4978c659e7b5c1926cdb4b556e4ca1fd2de8ad42 upstream.
We use jbd_debug() in some places in ext4. It seems a bit strange to use the jbd2 debugging output function for ext4 code. Also, these days ext4_debug() uses dynamic printk, so each debug message can be enabled / disabled on its own; the time when it made sense to have these combined (to allow easier common selection of messages to report) has passed. Just convert all jbd_debug() uses in ext4 to ext4_debug().
Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Lukas Czerner lczerner@redhat.com Link: https://lore.kernel.org/r/20220608112355.4397-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/balloc.c | 2 +- fs/ext4/ext4_jbd2.c | 3 +-- fs/ext4/fast_commit.c | 44 ++++++++++++++++++++++---------------------- fs/ext4/indirect.c | 4 ++-- fs/ext4/inode.c | 2 +- fs/ext4/orphan.c | 24 ++++++++++++------------ fs/ext4/super.c | 2 +- 7 files changed, 40 insertions(+), 41 deletions(-)
--- a/fs/ext4/balloc.c +++ b/fs/ext4/balloc.c @@ -665,7 +665,7 @@ int ext4_should_retry_alloc(struct super * it's possible we've just missed a transaction commit here, * so ignore the returned status */ - jbd_debug(1, "%s: retrying operation after ENOSPC\n", sb->s_id); + ext4_debug("%s: retrying operation after ENOSPC\n", sb->s_id); (void) jbd2_journal_force_commit_nested(sbi->s_journal); return 1; } --- a/fs/ext4/ext4_jbd2.c +++ b/fs/ext4/ext4_jbd2.c @@ -267,8 +267,7 @@ int __ext4_forget(const char *where, uns trace_ext4_forget(inode, is_metadata, blocknr); BUFFER_TRACE(bh, "enter");
- jbd_debug(4, "forgetting bh %p: is_metadata = %d, mode %o, " - "data mode %x\n", + ext4_debug("forgetting bh %p: is_metadata=%d, mode %o, data mode %x\n", bh, is_metadata, inode->i_mode, test_opt(inode->i_sb, DATA_FLAGS));
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -845,8 +845,8 @@ static int ext4_fc_write_inode_data(stru mutex_unlock(&ei->i_fc_lock);
cur_lblk_off = old_blk_size; - jbd_debug(1, "%s: will try writing %d to %d for inode %ld\n", - __func__, cur_lblk_off, new_blk_size, inode->i_ino); + ext4_debug("will try writing %d to %d for inode %ld\n", + cur_lblk_off, new_blk_size, inode->i_ino);
while (cur_lblk_off <= new_blk_size) { map.m_lblk = cur_lblk_off; @@ -1101,7 +1101,7 @@ static void ext4_fc_update_stats(struct { struct ext4_fc_stats *stats = &EXT4_SB(sb)->s_fc_stats;
- jbd_debug(1, "Fast commit ended with status = %d", status); + ext4_debug("Fast commit ended with status = %d", status); if (status == EXT4_FC_STATUS_OK) { stats->fc_num_commits++; stats->fc_numblks += nblks; @@ -1303,14 +1303,14 @@ static int ext4_fc_replay_unlink(struct inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL);
if (IS_ERR(inode)) { - jbd_debug(1, "Inode %d not found", darg.ino); + ext4_debug("Inode %d not found", darg.ino); return 0; }
old_parent = ext4_iget(sb, darg.parent_ino, EXT4_IGET_NORMAL); if (IS_ERR(old_parent)) { - jbd_debug(1, "Dir with inode %d not found", darg.parent_ino); + ext4_debug("Dir with inode %d not found", darg.parent_ino); iput(inode); return 0; } @@ -1335,21 +1335,21 @@ static int ext4_fc_replay_link_internal(
dir = ext4_iget(sb, darg->parent_ino, EXT4_IGET_NORMAL); if (IS_ERR(dir)) { - jbd_debug(1, "Dir with inode %d not found.", darg->parent_ino); + ext4_debug("Dir with inode %d not found.", darg->parent_ino); dir = NULL; goto out; }
dentry_dir = d_obtain_alias(dir); if (IS_ERR(dentry_dir)) { - jbd_debug(1, "Failed to obtain dentry"); + ext4_debug("Failed to obtain dentry"); dentry_dir = NULL; goto out; }
dentry_inode = d_alloc(dentry_dir, &qstr_dname); if (!dentry_inode) { - jbd_debug(1, "Inode dentry not created."); + ext4_debug("Inode dentry not created."); ret = -ENOMEM; goto out; } @@ -1362,7 +1362,7 @@ static int ext4_fc_replay_link_internal( * could complete. */ if (ret && ret != -EEXIST) { - jbd_debug(1, "Failed to link\n"); + ext4_debug("Failed to link\n"); goto out; }
@@ -1396,7 +1396,7 @@ static int ext4_fc_replay_link(struct su
inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "Inode not found."); + ext4_debug("Inode not found."); return 0; }
@@ -1506,7 +1506,7 @@ static int ext4_fc_replay_inode(struct s /* Given that we just wrote the inode on disk, this SHOULD succeed. */ inode = ext4_iget(sb, ino, EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "Inode not found."); + ext4_debug("Inode not found."); return -EFSCORRUPTED; }
@@ -1559,7 +1559,7 @@ static int ext4_fc_replay_create(struct
inode = ext4_iget(sb, darg.ino, EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "inode %d not found.", darg.ino); + ext4_debug("inode %d not found.", darg.ino); inode = NULL; ret = -EINVAL; goto out; @@ -1572,7 +1572,7 @@ static int ext4_fc_replay_create(struct */ dir = ext4_iget(sb, darg.parent_ino, EXT4_IGET_NORMAL); if (IS_ERR(dir)) { - jbd_debug(1, "Dir %d not found.", darg.ino); + ext4_debug("Dir %d not found.", darg.ino); goto out; } ret = ext4_init_new_dir(NULL, dir, inode); @@ -1660,7 +1660,7 @@ static int ext4_fc_replay_add_range(stru
inode = ext4_iget(sb, le32_to_cpu(fc_add_ex.fc_ino), EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "Inode not found."); + ext4_debug("Inode not found."); return 0; }
@@ -1674,7 +1674,7 @@ static int ext4_fc_replay_add_range(stru
cur = start; remaining = len; - jbd_debug(1, "ADD_RANGE, lblk %d, pblk %lld, len %d, unwritten %d, inode %ld\n", + ext4_debug("ADD_RANGE, lblk %d, pblk %lld, len %d, unwritten %d, inode %ld\n", start, start_pblk, len, ext4_ext_is_unwritten(ex), inode->i_ino);
@@ -1735,7 +1735,7 @@ static int ext4_fc_replay_add_range(stru }
/* Range is mapped and needs a state change */ - jbd_debug(1, "Converting from %ld to %d %lld", + ext4_debug("Converting from %ld to %d %lld", map.m_flags & EXT4_MAP_UNWRITTEN, ext4_ext_is_unwritten(ex), map.m_pblk); ret = ext4_ext_replay_update_ex(inode, cur, map.m_len, @@ -1778,7 +1778,7 @@ ext4_fc_replay_del_range(struct super_bl
inode = ext4_iget(sb, le32_to_cpu(lrange.fc_ino), EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "Inode %d not found", le32_to_cpu(lrange.fc_ino)); + ext4_debug("Inode %d not found", le32_to_cpu(lrange.fc_ino)); return 0; }
@@ -1786,7 +1786,7 @@ ext4_fc_replay_del_range(struct super_bl if (ret) goto out;
- jbd_debug(1, "DEL_RANGE, inode %ld, lblk %d, len %d\n", + ext4_debug("DEL_RANGE, inode %ld, lblk %d, len %d\n", inode->i_ino, le32_to_cpu(lrange.fc_lblk), le32_to_cpu(lrange.fc_len)); while (remaining > 0) { @@ -1835,7 +1835,7 @@ static void ext4_fc_set_bitmaps_and_coun inode = ext4_iget(sb, state->fc_modified_inodes[i], EXT4_IGET_NORMAL); if (IS_ERR(inode)) { - jbd_debug(1, "Inode %d not found.", + ext4_debug("Inode %d not found.", state->fc_modified_inodes[i]); continue; } @@ -1960,7 +1960,7 @@ static int ext4_fc_replay_scan(journal_t for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { memcpy(&tl, cur, sizeof(tl)); val = cur + sizeof(tl); - jbd_debug(3, "Scan phase, tag:%s, blk %lld\n", + ext4_debug("Scan phase, tag:%s, blk %lld\n", tag2str(le16_to_cpu(tl.fc_tag)), bh->b_blocknr); switch (le16_to_cpu(tl.fc_tag)) { case EXT4_FC_TAG_ADD_RANGE: @@ -2055,7 +2055,7 @@ static int ext4_fc_replay(journal_t *jou sbi->s_mount_state |= EXT4_FC_REPLAY; } if (!sbi->s_fc_replay_state.fc_replay_num_tags) { - jbd_debug(1, "Replay stops\n"); + ext4_debug("Replay stops\n"); ext4_fc_set_bitmaps_and_counters(sb); return 0; } @@ -2079,7 +2079,7 @@ static int ext4_fc_replay(journal_t *jou ext4_fc_set_bitmaps_and_counters(sb); break; } - jbd_debug(3, "Replay phase, tag:%s\n", + ext4_debug("Replay phase, tag:%s\n", tag2str(le16_to_cpu(tl.fc_tag))); state->fc_replay_num_tags--; switch (le16_to_cpu(tl.fc_tag)) { --- a/fs/ext4/indirect.c +++ b/fs/ext4/indirect.c @@ -467,7 +467,7 @@ static int ext4_splice_branch(handle_t * * the new i_size. But that is not done here - it is done in * generic_commit_write->__mark_inode_dirty->ext4_dirty_inode. */ - jbd_debug(5, "splicing indirect only\n"); + ext4_debug("splicing indirect only\n"); BUFFER_TRACE(where->bh, "call ext4_handle_dirty_metadata"); err = ext4_handle_dirty_metadata(handle, ar->inode, where->bh); if (err) @@ -479,7 +479,7 @@ static int ext4_splice_branch(handle_t * err = ext4_mark_inode_dirty(handle, ar->inode); if (unlikely(err)) goto err_out; - jbd_debug(5, "splicing direct\n"); + ext4_debug("splicing direct\n"); } return err;
--- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5220,7 +5220,7 @@ int ext4_write_inode(struct inode *inode
if (EXT4_SB(inode->i_sb)->s_journal) { if (ext4_journal_current_handle()) { - jbd_debug(1, "called recursively, non-PF_MEMALLOC!\n"); + ext4_debug("called recursively, non-PF_MEMALLOC!\n"); dump_stack(); return -EIO; } --- a/fs/ext4/orphan.c +++ b/fs/ext4/orphan.c @@ -181,8 +181,8 @@ int ext4_orphan_add(handle_t *handle, st } else brelse(iloc.bh);
- jbd_debug(4, "superblock will point to %lu\n", inode->i_ino); - jbd_debug(4, "orphan inode %lu will point to %d\n", + ext4_debug("superblock will point to %lu\n", inode->i_ino); + ext4_debug("orphan inode %lu will point to %d\n", inode->i_ino, NEXT_ORPHAN(inode)); out: ext4_std_error(sb, err); @@ -251,7 +251,7 @@ int ext4_orphan_del(handle_t *handle, st }
mutex_lock(&sbi->s_orphan_lock); - jbd_debug(4, "remove inode %lu from orphan list\n", inode->i_ino); + ext4_debug("remove inode %lu from orphan list\n", inode->i_ino);
prev = ei->i_orphan.prev; list_del_init(&ei->i_orphan); @@ -267,7 +267,7 @@ int ext4_orphan_del(handle_t *handle, st
ino_next = NEXT_ORPHAN(inode); if (prev == &sbi->s_orphan) { - jbd_debug(4, "superblock will point to %u\n", ino_next); + ext4_debug("superblock will point to %u\n", ino_next); BUFFER_TRACE(sbi->s_sbh, "get_write_access"); err = ext4_journal_get_write_access(handle, inode->i_sb, sbi->s_sbh, EXT4_JTR_NONE); @@ -286,7 +286,7 @@ int ext4_orphan_del(handle_t *handle, st struct inode *i_prev = &list_entry(prev, struct ext4_inode_info, i_orphan)->vfs_inode;
- jbd_debug(4, "orphan inode %lu will point to %u\n", + ext4_debug("orphan inode %lu will point to %u\n", i_prev->i_ino, ino_next); err = ext4_reserve_inode_write(handle, i_prev, &iloc2); if (err) { @@ -332,8 +332,8 @@ static void ext4_process_orphan(struct i ext4_msg(sb, KERN_DEBUG, "%s: truncating inode %lu to %lld bytes", __func__, inode->i_ino, inode->i_size); - jbd_debug(2, "truncating inode %lu to %lld bytes\n", - inode->i_ino, inode->i_size); + ext4_debug("truncating inode %lu to %lld bytes\n", + inode->i_ino, inode->i_size); inode_lock(inode); truncate_inode_pages(inode->i_mapping, inode->i_size); ret = ext4_truncate(inode); @@ -353,8 +353,8 @@ static void ext4_process_orphan(struct i ext4_msg(sb, KERN_DEBUG, "%s: deleting unreferenced inode %lu", __func__, inode->i_ino); - jbd_debug(2, "deleting unreferenced inode %lu\n", - inode->i_ino); + ext4_debug("deleting unreferenced inode %lu\n", + inode->i_ino); (*nr_orphans)++; } iput(inode); /* The delete magic happens here! */ @@ -391,7 +391,7 @@ void ext4_orphan_cleanup(struct super_bl int inodes_per_ob = ext4_inodes_per_orphan_block(sb);
if (!es->s_last_orphan && !oi->of_blocks) { - jbd_debug(4, "no orphan inodes to clean up\n"); + ext4_debug("no orphan inodes to clean up\n"); return; }
@@ -415,7 +415,7 @@ void ext4_orphan_cleanup(struct super_bl "clearing orphan list."); es->s_last_orphan = 0; } - jbd_debug(1, "Skipping orphan recovery on fs with errors.\n"); + ext4_debug("Skipping orphan recovery on fs with errors.\n"); return; }
@@ -459,7 +459,7 @@ void ext4_orphan_cleanup(struct super_bl * so, skip the rest. */ if (EXT4_SB(sb)->s_mount_state & EXT4_ERROR_FS) { - jbd_debug(1, "Skipping orphan recovery on fs with errors.\n"); + ext4_debug("Skipping orphan recovery on fs with errors.\n"); es->s_last_orphan = 0; break; } --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5154,7 +5154,7 @@ static struct inode *ext4_get_journal_in return NULL; }
- jbd_debug(2, "Journal inode found at %p: %lld bytes\n", + ext4_debug("Journal inode found at %p: %lld bytes\n", journal_inode, journal_inode->i_size); if (!S_ISREG(journal_inode->i_mode)) { ext4_msg(sb, KERN_ERR, "invalid journal inode");
From: Eric Biggers ebiggers@kernel.org
From: Ye Bin yebin10@huawei.com
commit fdc2a3c75dd8345c5b48718af90bad1a7811bedb upstream.
Introduce the EXT4_FC_TAG_BASE_LEN helper for the length of struct ext4_fc_tl.
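For reference, each fast-commit record is a tag-length-value entry whose header is the pair of little-endian 16-bit fields of struct ext4_fc_tl (as declared in fs/ext4/fast_commit.h), so the new constant simply names the size of that header:

	struct ext4_fc_tl {
		__le16 fc_tag;	/* EXT4_FC_TAG_* */
		__le16 fc_len;	/* length of the value that follows */
	};

	#define EXT4_FC_TAG_BASE_LEN	(sizeof(struct ext4_fc_tl))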
Signed-off-by: Ye Bin yebin10@huawei.com Link: https://lore.kernel.org/r/20220924075233.2315259-2-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 54 +++++++++++++++++++++++++------------------------- fs/ext4/fast_commit.h | 3 ++ 2 files changed, 31 insertions(+), 26 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -631,10 +631,10 @@ static u8 *ext4_fc_reserve_space(struct * After allocating len, we should have space at least for a 0 byte * padding. */ - if (len + sizeof(struct ext4_fc_tl) > bsize) + if (len + EXT4_FC_TAG_BASE_LEN > bsize) return NULL;
- if (bsize - off - 1 > len + sizeof(struct ext4_fc_tl)) { + if (bsize - off - 1 > len + EXT4_FC_TAG_BASE_LEN) { /* * Only allocate from current buffer if we have enough space for * this request AND we have space to add a zero byte padding. @@ -651,10 +651,10 @@ static u8 *ext4_fc_reserve_space(struct /* Need to add PAD tag */ tl = (struct ext4_fc_tl *)(sbi->s_fc_bh->b_data + off); tl->fc_tag = cpu_to_le16(EXT4_FC_TAG_PAD); - pad_len = bsize - off - 1 - sizeof(struct ext4_fc_tl); + pad_len = bsize - off - 1 - EXT4_FC_TAG_BASE_LEN; tl->fc_len = cpu_to_le16(pad_len); if (crc) - *crc = ext4_chksum(sbi, *crc, tl, sizeof(*tl)); + *crc = ext4_chksum(sbi, *crc, tl, EXT4_FC_TAG_BASE_LEN); if (pad_len > 0) ext4_fc_memzero(sb, tl + 1, pad_len, crc); /* Don't leak uninitialized memory in the unused last byte. */ @@ -699,7 +699,7 @@ static int ext4_fc_write_tail(struct sup * ext4_fc_reserve_space takes care of allocating an extra block if * there's no enough space on this block for accommodating this tail. */ - dst = ext4_fc_reserve_space(sb, sizeof(tl) + sizeof(tail), &crc); + dst = ext4_fc_reserve_space(sb, EXT4_FC_TAG_BASE_LEN + sizeof(tail), &crc); if (!dst) return -ENOSPC;
@@ -709,8 +709,8 @@ static int ext4_fc_write_tail(struct sup tl.fc_len = cpu_to_le16(bsize - off - 1 + sizeof(struct ext4_fc_tail)); sbi->s_fc_bytes = round_up(sbi->s_fc_bytes, bsize);
- ext4_fc_memcpy(sb, dst, &tl, sizeof(tl), &crc); - dst += sizeof(tl); + ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, &crc); + dst += EXT4_FC_TAG_BASE_LEN; tail.fc_tid = cpu_to_le32(sbi->s_journal->j_running_transaction->t_tid); ext4_fc_memcpy(sb, dst, &tail.fc_tid, sizeof(tail.fc_tid), &crc); dst += sizeof(tail.fc_tid); @@ -734,15 +734,15 @@ static bool ext4_fc_add_tlv(struct super struct ext4_fc_tl tl; u8 *dst;
- dst = ext4_fc_reserve_space(sb, sizeof(tl) + len, crc); + dst = ext4_fc_reserve_space(sb, EXT4_FC_TAG_BASE_LEN + len, crc); if (!dst) return false;
tl.fc_tag = cpu_to_le16(tag); tl.fc_len = cpu_to_le16(len);
- ext4_fc_memcpy(sb, dst, &tl, sizeof(tl), crc); - ext4_fc_memcpy(sb, dst + sizeof(tl), val, len, crc); + ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc); + ext4_fc_memcpy(sb, dst + EXT4_FC_TAG_BASE_LEN, val, len, crc);
return true; } @@ -754,8 +754,8 @@ static bool ext4_fc_add_dentry_tlv(struc struct ext4_fc_dentry_info fcd; struct ext4_fc_tl tl; int dlen = fc_dentry->fcd_name.len; - u8 *dst = ext4_fc_reserve_space(sb, sizeof(tl) + sizeof(fcd) + dlen, - crc); + u8 *dst = ext4_fc_reserve_space(sb, + EXT4_FC_TAG_BASE_LEN + sizeof(fcd) + dlen, crc);
if (!dst) return false; @@ -764,8 +764,8 @@ static bool ext4_fc_add_dentry_tlv(struc fcd.fc_ino = cpu_to_le32(fc_dentry->fcd_ino); tl.fc_tag = cpu_to_le16(fc_dentry->fcd_op); tl.fc_len = cpu_to_le16(sizeof(fcd) + dlen); - ext4_fc_memcpy(sb, dst, &tl, sizeof(tl), crc); - dst += sizeof(tl); + ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc); + dst += EXT4_FC_TAG_BASE_LEN; ext4_fc_memcpy(sb, dst, &fcd, sizeof(fcd), crc); dst += sizeof(fcd); ext4_fc_memcpy(sb, dst, fc_dentry->fcd_name.name, dlen, crc); @@ -801,13 +801,13 @@ static int ext4_fc_write_inode(struct in
ret = -ECANCELED; dst = ext4_fc_reserve_space(inode->i_sb, - sizeof(tl) + inode_len + sizeof(fc_inode.fc_ino), crc); + EXT4_FC_TAG_BASE_LEN + inode_len + sizeof(fc_inode.fc_ino), crc); if (!dst) goto err;
- if (!ext4_fc_memcpy(inode->i_sb, dst, &tl, sizeof(tl), crc)) + if (!ext4_fc_memcpy(inode->i_sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc)) goto err; - dst += sizeof(tl); + dst += EXT4_FC_TAG_BASE_LEN; if (!ext4_fc_memcpy(inode->i_sb, dst, &fc_inode, sizeof(fc_inode), crc)) goto err; dst += sizeof(fc_inode); @@ -1957,9 +1957,10 @@ static int ext4_fc_replay_scan(journal_t }
state->fc_replay_expected_off++; - for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { - memcpy(&tl, cur, sizeof(tl)); - val = cur + sizeof(tl); + for (cur = start; cur < end; + cur = cur + EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)) { + memcpy(&tl, cur, EXT4_FC_TAG_BASE_LEN); + val = cur + EXT4_FC_TAG_BASE_LEN; ext4_debug("Scan phase, tag:%s, blk %lld\n", tag2str(le16_to_cpu(tl.fc_tag)), bh->b_blocknr); switch (le16_to_cpu(tl.fc_tag)) { @@ -1982,13 +1983,13 @@ static int ext4_fc_replay_scan(journal_t case EXT4_FC_TAG_PAD: state->fc_cur_tag++; state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, - sizeof(tl) + le16_to_cpu(tl.fc_len)); + EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)); break; case EXT4_FC_TAG_TAIL: state->fc_cur_tag++; memcpy(&tail, val, sizeof(tail)); state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, - sizeof(tl) + + EXT4_FC_TAG_BASE_LEN + offsetof(struct ext4_fc_tail, fc_crc)); if (le32_to_cpu(tail.fc_tid) == expected_tid && @@ -2015,7 +2016,7 @@ static int ext4_fc_replay_scan(journal_t } state->fc_cur_tag++; state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, - sizeof(tl) + le16_to_cpu(tl.fc_len)); + EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)); break; default: ret = state->fc_replay_num_tags ? @@ -2070,9 +2071,10 @@ static int ext4_fc_replay(journal_t *jou start = (u8 *)bh->b_data; end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end; cur = cur + sizeof(tl) + le16_to_cpu(tl.fc_len)) { - memcpy(&tl, cur, sizeof(tl)); - val = cur + sizeof(tl); + for (cur = start; cur < end; + cur = cur + EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)) { + memcpy(&tl, cur, EXT4_FC_TAG_BASE_LEN); + val = cur + EXT4_FC_TAG_BASE_LEN;
if (state->fc_replay_num_tags == 0) { ret = JBD2_FC_REPLAY_STOP; --- a/fs/ext4/fast_commit.h +++ b/fs/ext4/fast_commit.h @@ -70,6 +70,9 @@ struct ext4_fc_tail { __le32 fc_crc; };
+/* Tag base length */ +#define EXT4_FC_TAG_BASE_LEN (sizeof(struct ext4_fc_tl)) + /* * Fast commit status codes */
From: Eric Biggers ebiggers@kernel.org
From: Ye Bin yebin10@huawei.com
commit dcc5827484d6e53ccda12334f8bbfafcc593ceda upstream.
Factor out ext4_fc_get_tl() to fill 'tl' with host byte order.
Signed-off-by: Ye Bin yebin10@huawei.com Link: https://lore.kernel.org/r/20220924075233.2315259-3-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 46 +++++++++++++++++++++++++--------------------- 1 file changed, 25 insertions(+), 21 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -1271,7 +1271,7 @@ struct dentry_info_args { };
static inline void tl_to_darg(struct dentry_info_args *darg, - struct ext4_fc_tl *tl, u8 *val) + struct ext4_fc_tl *tl, u8 *val) { struct ext4_fc_dentry_info fcd;
@@ -1280,8 +1280,14 @@ static inline void tl_to_darg(struct den darg->parent_ino = le32_to_cpu(fcd.fc_parent_ino); darg->ino = le32_to_cpu(fcd.fc_ino); darg->dname = val + offsetof(struct ext4_fc_dentry_info, fc_dname); - darg->dname_len = le16_to_cpu(tl->fc_len) - - sizeof(struct ext4_fc_dentry_info); + darg->dname_len = tl->fc_len - sizeof(struct ext4_fc_dentry_info); +} + +static inline void ext4_fc_get_tl(struct ext4_fc_tl *tl, u8 *val) +{ + memcpy(tl, val, EXT4_FC_TAG_BASE_LEN); + tl->fc_len = le16_to_cpu(tl->fc_len); + tl->fc_tag = le16_to_cpu(tl->fc_tag); }
/* Unlink replay function */ @@ -1446,7 +1452,7 @@ static int ext4_fc_replay_inode(struct s struct ext4_inode *raw_fc_inode; struct inode *inode = NULL; struct ext4_iloc iloc; - int inode_len, ino, ret, tag = le16_to_cpu(tl->fc_tag); + int inode_len, ino, ret, tag = tl->fc_tag; struct ext4_extent_header *eh;
memcpy(&fc_inode, val, sizeof(fc_inode)); @@ -1471,7 +1477,7 @@ static int ext4_fc_replay_inode(struct s if (ret) goto out;
- inode_len = le16_to_cpu(tl->fc_len) - sizeof(struct ext4_fc_inode); + inode_len = tl->fc_len - sizeof(struct ext4_fc_inode); raw_inode = ext4_raw_inode(&iloc);
memcpy(raw_inode, raw_fc_inode, offsetof(struct ext4_inode, i_block)); @@ -1958,12 +1964,12 @@ static int ext4_fc_replay_scan(journal_t
state->fc_replay_expected_off++; for (cur = start; cur < end; - cur = cur + EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)) { - memcpy(&tl, cur, EXT4_FC_TAG_BASE_LEN); + cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { + ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN; ext4_debug("Scan phase, tag:%s, blk %lld\n", - tag2str(le16_to_cpu(tl.fc_tag)), bh->b_blocknr); - switch (le16_to_cpu(tl.fc_tag)) { + tag2str(tl.fc_tag), bh->b_blocknr); + switch (tl.fc_tag) { case EXT4_FC_TAG_ADD_RANGE: memcpy(&ext, val, sizeof(ext)); ex = (struct ext4_extent *)&ext.fc_ex; @@ -1983,7 +1989,7 @@ static int ext4_fc_replay_scan(journal_t case EXT4_FC_TAG_PAD: state->fc_cur_tag++; state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, - EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)); + EXT4_FC_TAG_BASE_LEN + tl.fc_len); break; case EXT4_FC_TAG_TAIL: state->fc_cur_tag++; @@ -2016,7 +2022,7 @@ static int ext4_fc_replay_scan(journal_t } state->fc_cur_tag++; state->fc_crc = ext4_chksum(sbi, state->fc_crc, cur, - EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)); + EXT4_FC_TAG_BASE_LEN + tl.fc_len); break; default: ret = state->fc_replay_num_tags ? @@ -2072,8 +2078,8 @@ static int ext4_fc_replay(journal_t *jou end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
for (cur = start; cur < end; - cur = cur + EXT4_FC_TAG_BASE_LEN + le16_to_cpu(tl.fc_len)) { - memcpy(&tl, cur, EXT4_FC_TAG_BASE_LEN); + cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { + ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN;
if (state->fc_replay_num_tags == 0) { @@ -2081,10 +2087,9 @@ static int ext4_fc_replay(journal_t *jou ext4_fc_set_bitmaps_and_counters(sb); break; } - ext4_debug("Replay phase, tag:%s\n", - tag2str(le16_to_cpu(tl.fc_tag))); + ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag)); state->fc_replay_num_tags--; - switch (le16_to_cpu(tl.fc_tag)) { + switch (tl.fc_tag) { case EXT4_FC_TAG_LINK: ret = ext4_fc_replay_link(sb, &tl, val); break; @@ -2105,19 +2110,18 @@ static int ext4_fc_replay(journal_t *jou break; case EXT4_FC_TAG_PAD: trace_ext4_fc_replay(sb, EXT4_FC_TAG_PAD, 0, - le16_to_cpu(tl.fc_len), 0); + tl.fc_len, 0); break; case EXT4_FC_TAG_TAIL: - trace_ext4_fc_replay(sb, EXT4_FC_TAG_TAIL, 0, - le16_to_cpu(tl.fc_len), 0); + trace_ext4_fc_replay(sb, EXT4_FC_TAG_TAIL, + 0, tl.fc_len, 0); memcpy(&tail, val, sizeof(tail)); WARN_ON(le32_to_cpu(tail.fc_tid) != expected_tid); break; case EXT4_FC_TAG_HEAD: break; default: - trace_ext4_fc_replay(sb, le16_to_cpu(tl.fc_tag), 0, - le16_to_cpu(tl.fc_len), 0); + trace_ext4_fc_replay(sb, tl.fc_tag, 0, tl.fc_len, 0); ret = -ECANCELED; break; }
From: Eric Biggers ebiggers@kernel.org
From: Ye Bin yebin10@huawei.com
commit 1b45cc5c7b920fd8bf72e5a888ec7abeadf41e09 upstream.
The scan loop must ensure that at least EXT4_FC_TAG_BASE_LEN bytes remain in the block. If the remaining space is less than EXT4_FC_TAG_BASE_LEN, the scan reads out of bounds when mounting a corrupted file system image. ADD_RANGE/HEAD/TAIL need an extra check during the journal scan, as these three tags read data during the scan and a tag's length must not be smaller than the amount of data that will be read.
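As a rough illustration (stand-in names and sizes, not the ext4 code), the bound the loop needs looks like this in a userspace sketch:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TL_LEN 4	/* stand-in for EXT4_FC_TAG_BASE_LEN: 2-byte tag + 2-byte length */

/* Walk tag/length headers in a block, continuing only while a full
 * header still fits and the value does not run past the block. */
static void scan_sketch(const uint8_t *start, const uint8_t *end)
{
	const uint8_t *cur;
	uint16_t tag, len;

	for (cur = start; cur + TL_LEN <= end; cur += TL_LEN + len) {
		memcpy(&tag, cur, sizeof(tag));
		memcpy(&len, cur + sizeof(tag), sizeof(len));
		if (tag == 0)
			break;	/* treat zeroed space as end of data */
		if (len > (size_t)(end - (cur + TL_LEN)))
			break;	/* corrupt length: value would overflow the block */
		printf("tag %u, len %u\n", tag, len);
	}
}

int main(void)
{
	/* little-endian sample: tag 1/len 4 with "abcd", then tag 2/len 0 */
	uint8_t block[16] = { 1, 0, 4, 0, 'a', 'b', 'c', 'd', 2, 0, 0, 0 };

	scan_sketch(block, block + sizeof(block));
	return 0;
}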
Cc: stable@kernel.org Signed-off-by: Ye Bin yebin10@huawei.com Link: https://lore.kernel.org/r/20220924075233.2315259-4-yebin10@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Eric Biggers ebiggers@google.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 38 ++++++++++++++++++++++++++++++++++++-- 1 file changed, 36 insertions(+), 2 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -1907,6 +1907,34 @@ void ext4_fc_replay_cleanup(struct super kfree(sbi->s_fc_replay_state.fc_modified_inodes); }
+static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl, + u8 *val, u8 *end) +{ + if (val + tl->fc_len > end) + return false; + + /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do + * journal rescan before do CRC check. Other tags length check will + * rely on CRC check. + */ + switch (tl->fc_tag) { + case EXT4_FC_TAG_ADD_RANGE: + return (sizeof(struct ext4_fc_add_range) == tl->fc_len); + case EXT4_FC_TAG_TAIL: + return (sizeof(struct ext4_fc_tail) <= tl->fc_len); + case EXT4_FC_TAG_HEAD: + return (sizeof(struct ext4_fc_head) == tl->fc_len); + case EXT4_FC_TAG_DEL_RANGE: + case EXT4_FC_TAG_LINK: + case EXT4_FC_TAG_UNLINK: + case EXT4_FC_TAG_CREAT: + case EXT4_FC_TAG_INODE: + case EXT4_FC_TAG_PAD: + default: + return true; + } +} + /* * Recovery Scan phase handler * @@ -1963,10 +1991,15 @@ static int ext4_fc_replay_scan(journal_t }
state->fc_replay_expected_off++; - for (cur = start; cur < end; + for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN; cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN; + if (!ext4_fc_tag_len_isvalid(&tl, val, end)) { + ret = state->fc_replay_num_tags ? + JBD2_FC_REPLAY_STOP : -ECANCELED; + goto out_err; + } ext4_debug("Scan phase, tag:%s, blk %lld\n", tag2str(tl.fc_tag), bh->b_blocknr); switch (tl.fc_tag) { @@ -2077,7 +2110,7 @@ static int ext4_fc_replay(journal_t *jou start = (u8 *)bh->b_data; end = (__u8 *)bh->b_data + journal->j_blocksize - 1;
- for (cur = start; cur < end; + for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN; cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN; @@ -2087,6 +2120,7 @@ static int ext4_fc_replay(journal_t *jou ext4_fc_set_bitmaps_and_counters(sb); break; } + ext4_debug("Replay phase, tag:%s\n", tag2str(tl.fc_tag)); state->fc_replay_num_tags--; switch (tl.fc_tag) {
From: Eric Biggers ebiggers@kernel.org
From: Eric Biggers ebiggers@google.com
commit 0fbcb5251fc81b58969b272c4fb7374a7b922e3e upstream.
fast-commit of create, link, and unlink operations in encrypted directories is completely broken because the unencrypted filenames are being written to the fast-commit journal instead of the encrypted filenames. These operations can't be replayed, as encryption keys aren't present at journal replay time. It is also an information leak.
Until if/when we can get this working properly, make encrypted directory operations ineligible for fast-commit.
Note that fast-commit operations on encrypted regular files continue to be allowed, as they seem to work.
Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221106224841.279231-2-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 41 +++++++++++++++++++++++++---------------- fs/ext4/fast_commit.h | 1 + include/trace/events/ext4.h | 7 +++++-- 3 files changed, 31 insertions(+), 18 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -399,25 +399,34 @@ static int __track_dentry_update(struct struct __track_dentry_update_args *dentry_update = (struct __track_dentry_update_args *)arg; struct dentry *dentry = dentry_update->dentry; - struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); + struct inode *dir = dentry->d_parent->d_inode; + struct super_block *sb = inode->i_sb; + struct ext4_sb_info *sbi = EXT4_SB(sb);
mutex_unlock(&ei->i_fc_lock); + + if (IS_ENCRYPTED(dir)) { + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_ENCRYPTED_FILENAME, + NULL); + mutex_lock(&ei->i_fc_lock); + return -EOPNOTSUPP; + } + node = kmem_cache_alloc(ext4_fc_dentry_cachep, GFP_NOFS); if (!node) { - ext4_fc_mark_ineligible(inode->i_sb, EXT4_FC_REASON_NOMEM, NULL); + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, NULL); mutex_lock(&ei->i_fc_lock); return -ENOMEM; }
node->fcd_op = dentry_update->op; - node->fcd_parent = dentry->d_parent->d_inode->i_ino; + node->fcd_parent = dir->i_ino; node->fcd_ino = inode->i_ino; if (dentry->d_name.len > DNAME_INLINE_LEN) { node->fcd_name.name = kmalloc(dentry->d_name.len, GFP_NOFS); if (!node->fcd_name.name) { kmem_cache_free(ext4_fc_dentry_cachep, node); - ext4_fc_mark_ineligible(inode->i_sb, - EXT4_FC_REASON_NOMEM, NULL); + ext4_fc_mark_ineligible(sb, EXT4_FC_REASON_NOMEM, NULL); mutex_lock(&ei->i_fc_lock); return -ENOMEM; } @@ -2179,17 +2188,17 @@ void ext4_fc_init(struct super_block *sb journal->j_fc_cleanup_callback = ext4_fc_cleanup; }
-static const char *fc_ineligible_reasons[] = { - "Extended attributes changed", - "Cross rename", - "Journal flag changed", - "Insufficient memory", - "Swap boot", - "Resize", - "Dir renamed", - "Falloc range op", - "Data journalling", - "FC Commit Failed" +static const char * const fc_ineligible_reasons[] = { + [EXT4_FC_REASON_XATTR] = "Extended attributes changed", + [EXT4_FC_REASON_CROSS_RENAME] = "Cross rename", + [EXT4_FC_REASON_JOURNAL_FLAG_CHANGE] = "Journal flag changed", + [EXT4_FC_REASON_NOMEM] = "Insufficient memory", + [EXT4_FC_REASON_SWAP_BOOT] = "Swap boot", + [EXT4_FC_REASON_RESIZE] = "Resize", + [EXT4_FC_REASON_RENAME_DIR] = "Dir renamed", + [EXT4_FC_REASON_FALLOC_RANGE] = "Falloc range op", + [EXT4_FC_REASON_INODE_JOURNAL_DATA] = "Data journalling", + [EXT4_FC_REASON_ENCRYPTED_FILENAME] = "Encrypted filename", };
int ext4_fc_info_show(struct seq_file *seq, void *v) --- a/fs/ext4/fast_commit.h +++ b/fs/ext4/fast_commit.h @@ -96,6 +96,7 @@ enum { EXT4_FC_REASON_RENAME_DIR, EXT4_FC_REASON_FALLOC_RANGE, EXT4_FC_REASON_INODE_JOURNAL_DATA, + EXT4_FC_REASON_ENCRYPTED_FILENAME, EXT4_FC_REASON_MAX };
--- a/include/trace/events/ext4.h +++ b/include/trace/events/ext4.h @@ -104,6 +104,7 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_RESIZE) TRACE_DEFINE_ENUM(EXT4_FC_REASON_RENAME_DIR); TRACE_DEFINE_ENUM(EXT4_FC_REASON_FALLOC_RANGE); TRACE_DEFINE_ENUM(EXT4_FC_REASON_INODE_JOURNAL_DATA); +TRACE_DEFINE_ENUM(EXT4_FC_REASON_ENCRYPTED_FILENAME); TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX);
#define show_fc_reason(reason) \ @@ -116,7 +117,8 @@ TRACE_DEFINE_ENUM(EXT4_FC_REASON_MAX); { EXT4_FC_REASON_RESIZE, "RESIZE"}, \ { EXT4_FC_REASON_RENAME_DIR, "RENAME_DIR"}, \ { EXT4_FC_REASON_FALLOC_RANGE, "FALLOC_RANGE"}, \ - { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}) + { EXT4_FC_REASON_INODE_JOURNAL_DATA, "INODE_JOURNAL_DATA"}, \ + { EXT4_FC_REASON_ENCRYPTED_FILENAME, "ENCRYPTED_FILENAME"})
TRACE_EVENT(ext4_other_inode_update_time, TP_PROTO(struct inode *inode, ino_t orig_ino), @@ -2764,7 +2766,7 @@ TRACE_EVENT(ext4_fc_stats, ),
TP_printk("dev %d,%d fc ineligible reasons:\n" - "%s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u " + "%s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u, %s:%u" "num_commits:%lu, ineligible: %lu, numblks: %lu", MAJOR(__entry->dev), MINOR(__entry->dev), FC_REASON_NAME_STAT(EXT4_FC_REASON_XATTR), @@ -2776,6 +2778,7 @@ TRACE_EVENT(ext4_fc_stats, FC_REASON_NAME_STAT(EXT4_FC_REASON_RENAME_DIR), FC_REASON_NAME_STAT(EXT4_FC_REASON_FALLOC_RANGE), FC_REASON_NAME_STAT(EXT4_FC_REASON_INODE_JOURNAL_DATA), + FC_REASON_NAME_STAT(EXT4_FC_REASON_ENCRYPTED_FILENAME), __entry->fc_commits, __entry->fc_ineligible_commits, __entry->fc_numblks) );
From: Eric Biggers ebiggers@kernel.org
From: Eric Biggers ebiggers@google.com
commit 4c0d5778385cb3618ff26a561ce41de2b7d9de70 upstream.
Commit a80f7fcf1867 ("ext4: fixup ext4_fc_track_* functions' signature") extended the scope of the transaction in ext4_unlink() too far, making it include the call to ext4_find_entry(). However, ext4_find_entry() can deadlock when called from within a transaction because it may need to set up the directory's encryption key.
Fix this by restoring the transaction to its original scope.
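A minimal userspace sketch of the resulting ordering, with the lookup done before the handle is started (all helpers here are invented stand-ins for ext4_find_entry(), ext4_journal_start() and friends; only the ordering is the point):

#include <stdio.h>
#include <stdlib.h>

static void *find_entry(void)      { return malloc(1); }	/* may set up keys, allocate */
static void *journal_start(void)   { return malloc(1); }
static void  journal_stop(void *h) { free(h); }

static int unlink_sketch(void)
{
	void *bh, *handle;

	bh = find_entry();		/* outside any running transaction */
	if (!bh)
		return -1;

	handle = journal_start();	/* enter the transaction only now */
	if (!handle) {
		free(bh);
		return -1;
	}

	/* ... delete the entry, mark inodes dirty ... */

	journal_stop(handle);
	free(bh);
	return 0;
}

int main(void)
{
	printf("unlink_sketch() = %d\n", unlink_sketch());
	return 0;
}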
Reported-by: syzbot+1a748d0007eeac3ab079@syzkaller.appspotmail.com Fixes: a80f7fcf1867 ("ext4: fixup ext4_fc_track_* functions' signature") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221106224841.279231-3-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/ext4.h | 4 ++-- fs/ext4/fast_commit.c | 2 +- fs/ext4/namei.c | 44 ++++++++++++++++++++++++-------------------- 3 files changed, 27 insertions(+), 23 deletions(-)
--- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3647,8 +3647,8 @@ extern void ext4_initialize_dirent_tail( unsigned int blocksize); extern int ext4_handle_dirty_dirblock(handle_t *handle, struct inode *inode, struct buffer_head *bh); -extern int __ext4_unlink(handle_t *handle, struct inode *dir, const struct qstr *d_name, - struct inode *inode); +extern int __ext4_unlink(struct inode *dir, const struct qstr *d_name, + struct inode *inode, struct dentry *dentry); extern int __ext4_link(struct inode *dir, struct inode *inode, struct dentry *dentry);
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -1330,7 +1330,7 @@ static int ext4_fc_replay_unlink(struct return 0; }
- ret = __ext4_unlink(NULL, old_parent, &entry, inode); + ret = __ext4_unlink(old_parent, &entry, inode, NULL); /* -ENOENT ok coz it might not exist anymore. */ if (ret == -ENOENT) ret = 0; --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -3204,14 +3204,20 @@ end_rmdir: return retval; }
-int __ext4_unlink(handle_t *handle, struct inode *dir, const struct qstr *d_name, - struct inode *inode) +int __ext4_unlink(struct inode *dir, const struct qstr *d_name, + struct inode *inode, + struct dentry *dentry /* NULL during fast_commit recovery */) { int retval = -ENOENT; struct buffer_head *bh; struct ext4_dir_entry_2 *de; + handle_t *handle; int skip_remove_dentry = 0;
+ /* + * Keep this outside the transaction; it may have to set up the + * directory's encryption key, which isn't GFP_NOFS-safe. + */ bh = ext4_find_entry(dir, d_name, &de, NULL); if (IS_ERR(bh)) return PTR_ERR(bh); @@ -3228,7 +3234,14 @@ int __ext4_unlink(handle_t *handle, stru if (EXT4_SB(inode->i_sb)->s_mount_state & EXT4_FC_REPLAY) skip_remove_dentry = 1; else - goto out; + goto out_bh; + } + + handle = ext4_journal_start(dir, EXT4_HT_DIR, + EXT4_DATA_TRANS_BLOCKS(dir->i_sb)); + if (IS_ERR(handle)) { + retval = PTR_ERR(handle); + goto out_bh; }
if (IS_DIRSYNC(dir)) @@ -3237,12 +3250,12 @@ int __ext4_unlink(handle_t *handle, stru if (!skip_remove_dentry) { retval = ext4_delete_entry(handle, dir, de, bh); if (retval) - goto out; + goto out_handle; dir->i_ctime = dir->i_mtime = current_time(dir); ext4_update_dx_flag(dir); retval = ext4_mark_inode_dirty(handle, dir); if (retval) - goto out; + goto out_handle; } else { retval = 0; } @@ -3255,15 +3268,17 @@ int __ext4_unlink(handle_t *handle, stru ext4_orphan_add(handle, inode); inode->i_ctime = current_time(inode); retval = ext4_mark_inode_dirty(handle, inode); - -out: + if (dentry && !retval) + ext4_fc_track_unlink(handle, dentry); +out_handle: + ext4_journal_stop(handle); +out_bh: brelse(bh); return retval; }
static int ext4_unlink(struct inode *dir, struct dentry *dentry) { - handle_t *handle; int retval;
if (unlikely(ext4_forced_shutdown(EXT4_SB(dir->i_sb)))) @@ -3281,16 +3296,7 @@ static int ext4_unlink(struct inode *dir if (retval) goto out_trace;
- handle = ext4_journal_start(dir, EXT4_HT_DIR, - EXT4_DATA_TRANS_BLOCKS(dir->i_sb)); - if (IS_ERR(handle)) { - retval = PTR_ERR(handle); - goto out_trace; - } - - retval = __ext4_unlink(handle, dir, &dentry->d_name, d_inode(dentry)); - if (!retval) - ext4_fc_track_unlink(handle, dentry); + retval = __ext4_unlink(dir, &dentry->d_name, d_inode(dentry), dentry); #ifdef CONFIG_UNICODE /* VFS negative dentries are incompatible with Encoding and * Case-insensitiveness. Eventually we'll want avoid @@ -3301,8 +3307,6 @@ static int ext4_unlink(struct inode *dir if (IS_CASEFOLDED(dir)) d_invalidate(dentry); #endif - if (handle) - ext4_journal_stop(handle);
out_trace: trace_ext4_unlink_exit(dentry, retval);
From: Eric Biggers ebiggers@kernel.org
From: Eric Biggers ebiggers@google.com
commit 64b4a25c3de81a69724e888ec2db3533b43816e2 upstream.
Validate the inode and filename lengths in fast-commit journal records so that a malicious fast-commit journal cannot cause a crash by having invalid values for these. Also validate EXT4_FC_TAG_DEL_RANGE.
Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221106224841.279231-5-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 38 +++++++++++++++++++------------------- fs/ext4/fast_commit.h | 2 +- 2 files changed, 20 insertions(+), 20 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -1916,32 +1916,31 @@ void ext4_fc_replay_cleanup(struct super kfree(sbi->s_fc_replay_state.fc_modified_inodes); }
-static inline bool ext4_fc_tag_len_isvalid(struct ext4_fc_tl *tl, - u8 *val, u8 *end) +static bool ext4_fc_value_len_isvalid(struct ext4_sb_info *sbi, + int tag, int len) { - if (val + tl->fc_len > end) - return false; - - /* Here only check ADD_RANGE/TAIL/HEAD which will read data when do - * journal rescan before do CRC check. Other tags length check will - * rely on CRC check. - */ - switch (tl->fc_tag) { + switch (tag) { case EXT4_FC_TAG_ADD_RANGE: - return (sizeof(struct ext4_fc_add_range) == tl->fc_len); - case EXT4_FC_TAG_TAIL: - return (sizeof(struct ext4_fc_tail) <= tl->fc_len); - case EXT4_FC_TAG_HEAD: - return (sizeof(struct ext4_fc_head) == tl->fc_len); + return len == sizeof(struct ext4_fc_add_range); case EXT4_FC_TAG_DEL_RANGE: + return len == sizeof(struct ext4_fc_del_range); + case EXT4_FC_TAG_CREAT: case EXT4_FC_TAG_LINK: case EXT4_FC_TAG_UNLINK: - case EXT4_FC_TAG_CREAT: + len -= sizeof(struct ext4_fc_dentry_info); + return len >= 1 && len <= EXT4_NAME_LEN; case EXT4_FC_TAG_INODE: + len -= sizeof(struct ext4_fc_inode); + return len >= EXT4_GOOD_OLD_INODE_SIZE && + len <= sbi->s_inode_size; case EXT4_FC_TAG_PAD: - default: - return true; + return true; /* padding can have any length */ + case EXT4_FC_TAG_TAIL: + return len >= sizeof(struct ext4_fc_tail); + case EXT4_FC_TAG_HEAD: + return len == sizeof(struct ext4_fc_head); } + return false; }
/* @@ -2004,7 +2003,8 @@ static int ext4_fc_replay_scan(journal_t cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN; - if (!ext4_fc_tag_len_isvalid(&tl, val, end)) { + if (tl.fc_len > end - val || + !ext4_fc_value_len_isvalid(sbi, tl.fc_tag, tl.fc_len)) { ret = state->fc_replay_num_tags ? JBD2_FC_REPLAY_STOP : -ECANCELED; goto out_err; --- a/fs/ext4/fast_commit.h +++ b/fs/ext4/fast_commit.h @@ -58,7 +58,7 @@ struct ext4_fc_dentry_info { __u8 fc_dname[0]; };
-/* Value structure for EXT4_FC_TAG_INODE and EXT4_FC_TAG_INODE_PARTIAL. */ +/* Value structure for EXT4_FC_TAG_INODE. */ struct ext4_fc_inode { __le32 fc_ino; __u8 fc_raw_inode[0];
From: Eric Biggers ebiggers@kernel.org
From: Eric Biggers ebiggers@google.com
commit 8415ce07ecf0cc25efdd5db264a7133716e503cf upstream.
As is done elsewhere in the file, build the struct ext4_fc_tl on the stack and memcpy() it into the buffer, rather than directly writing it to a potentially-unaligned location in the buffer.
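A minimal userspace sketch of the technique, assuming a made-up two-field header (this is not the ext4 structure itself):

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Made-up two-field header standing in for struct ext4_fc_tl. */
struct tl {
	uint16_t tag;
	uint16_t len;
};

int main(void)
{
	uint8_t block[32];
	size_t off = 3;				/* deliberately misaligned */
	struct tl tl = { .tag = 5, .len = 12 };

	/*
	 * Build the header on the (aligned) stack and memcpy() it into
	 * the buffer.  Casting &block[off] to struct tl * and assigning
	 * through it could fault on strict-alignment architectures.
	 */
	memcpy(&block[off], &tl, sizeof(tl));

	printf("wrote %zu bytes at offset %zu\n", sizeof(tl), off);
	return 0;
}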
Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221106224841.279231-6-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 39 +++++++++++++++++++++------------------ 1 file changed, 21 insertions(+), 18 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -604,6 +604,15 @@ static void ext4_fc_submit_bh(struct sup
/* Ext4 commit path routines */
+/* memcpy to fc reserved space and update CRC */ +static void *ext4_fc_memcpy(struct super_block *sb, void *dst, const void *src, + int len, u32 *crc) +{ + if (crc) + *crc = ext4_chksum(EXT4_SB(sb), *crc, src, len); + return memcpy(dst, src, len); +} + /* memzero and update CRC */ static void *ext4_fc_memzero(struct super_block *sb, void *dst, int len, u32 *crc) @@ -629,12 +638,13 @@ static void *ext4_fc_memzero(struct supe */ static u8 *ext4_fc_reserve_space(struct super_block *sb, int len, u32 *crc) { - struct ext4_fc_tl *tl; + struct ext4_fc_tl tl; struct ext4_sb_info *sbi = EXT4_SB(sb); struct buffer_head *bh; int bsize = sbi->s_journal->j_blocksize; int ret, off = sbi->s_fc_bytes % bsize; int pad_len; + u8 *dst;
/* * After allocating len, we should have space at least for a 0 byte @@ -658,16 +668,18 @@ static u8 *ext4_fc_reserve_space(struct return sbi->s_fc_bh->b_data + off; } /* Need to add PAD tag */ - tl = (struct ext4_fc_tl *)(sbi->s_fc_bh->b_data + off); - tl->fc_tag = cpu_to_le16(EXT4_FC_TAG_PAD); + dst = sbi->s_fc_bh->b_data + off; + tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_PAD); pad_len = bsize - off - 1 - EXT4_FC_TAG_BASE_LEN; - tl->fc_len = cpu_to_le16(pad_len); - if (crc) - *crc = ext4_chksum(sbi, *crc, tl, EXT4_FC_TAG_BASE_LEN); - if (pad_len > 0) - ext4_fc_memzero(sb, tl + 1, pad_len, crc); + tl.fc_len = cpu_to_le16(pad_len); + ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc); + dst += EXT4_FC_TAG_BASE_LEN; + if (pad_len > 0) { + ext4_fc_memzero(sb, dst, pad_len, crc); + dst += pad_len; + } /* Don't leak uninitialized memory in the unused last byte. */ - *((u8 *)(tl + 1) + pad_len) = 0; + *dst = 0;
ext4_fc_submit_bh(sb, false);
@@ -679,15 +691,6 @@ static u8 *ext4_fc_reserve_space(struct return sbi->s_fc_bh->b_data; }
-/* memcpy to fc reserved space and update CRC */ -static void *ext4_fc_memcpy(struct super_block *sb, void *dst, const void *src, - int len, u32 *crc) -{ - if (crc) - *crc = ext4_chksum(EXT4_SB(sb), *crc, src, len); - return memcpy(dst, src, len); -} - /* * Complete a fast commit by writing tail tag. *
From: Eric Biggers ebiggers@kernel.org
From: Eric Biggers ebiggers@google.com
commit 48a6a66db82b8043d298a630f22c62d43550cae5 upstream.
Due to several different off-by-one errors, or perhaps due to a late change in design that wasn't fully reflected in the code that was actually merged, there are several very strange constraints on how fast-commit blocks are filled with tlv entries:
- tlvs must start at least 10 bytes before the end of the block, even though the minimum tlv length is 8. Otherwise, the replay code will ignore them. (BUG: ext4_fc_reserve_space() could violate this requirement if called with a len of blocksize - 9 or blocksize - 8. Fortunately, this doesn't seem to happen currently.)
- tlvs must end at least 1 byte before the end of the block. Otherwise the replay code will consider them to be invalid. This quirk contributed to a bug (fixed by an earlier commit) where uninitialized memory was being leaked to disk in the last byte of blocks.
Also, strangely these constraints don't apply to the replay code in e2fsprogs, which will accept any tlvs in the blocks (with no bounds checks at all, but that is a separate issue...).
Given that this all seems to be a bug, let's fix it by just filling blocks with tlv entries in the natural way.
Note that old kernels will be unable to replay fast-commit journals created by kernels that have this commit.
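For a rough feel of the new filling rule, a toy userspace calculation (the 4-byte header size is a stand-in for EXT4_FC_TAG_BASE_LEN):

#include <stdio.h>

#define TL_LEN 4	/* stand-in for EXT4_FC_TAG_BASE_LEN */

int main(void)
{
	int bsize = 4096, off = 4090, len = 8;
	/* an allocation fits only if it still leaves room for a PAD header */
	int remaining = bsize - TL_LEN - off;

	if (len <= remaining)
		printf("allocate %d bytes at offset %d\n", len, off);
	else
		printf("write PAD tlv with fc_len=%d, allocate %d bytes in the next block\n",
		       remaining, len);
	return 0;
}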
Fixes: aa75f4d3daae ("ext4: main fast-commit commit path") Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221106224841.279231-7-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/fast_commit.c | 66 +++++++++++++++++++++++++------------------------- 1 file changed, 33 insertions(+), 33 deletions(-)
--- a/fs/ext4/fast_commit.c +++ b/fs/ext4/fast_commit.c @@ -643,43 +643,43 @@ static u8 *ext4_fc_reserve_space(struct struct buffer_head *bh; int bsize = sbi->s_journal->j_blocksize; int ret, off = sbi->s_fc_bytes % bsize; - int pad_len; + int remaining; u8 *dst;
/* - * After allocating len, we should have space at least for a 0 byte - * padding. + * If 'len' is too long to fit in any block alongside a PAD tlv, then we + * cannot fulfill the request. */ - if (len + EXT4_FC_TAG_BASE_LEN > bsize) + if (len > bsize - EXT4_FC_TAG_BASE_LEN) return NULL;
- if (bsize - off - 1 > len + EXT4_FC_TAG_BASE_LEN) { - /* - * Only allocate from current buffer if we have enough space for - * this request AND we have space to add a zero byte padding. - */ - if (!sbi->s_fc_bh) { - ret = jbd2_fc_get_buf(EXT4_SB(sb)->s_journal, &bh); - if (ret) - return NULL; - sbi->s_fc_bh = bh; - } - sbi->s_fc_bytes += len; - return sbi->s_fc_bh->b_data + off; + if (!sbi->s_fc_bh) { + ret = jbd2_fc_get_buf(EXT4_SB(sb)->s_journal, &bh); + if (ret) + return NULL; + sbi->s_fc_bh = bh; } - /* Need to add PAD tag */ dst = sbi->s_fc_bh->b_data + off; + + /* + * Allocate the bytes in the current block if we can do so while still + * leaving enough space for a PAD tlv. + */ + remaining = bsize - EXT4_FC_TAG_BASE_LEN - off; + if (len <= remaining) { + sbi->s_fc_bytes += len; + return dst; + } + + /* + * Else, terminate the current block with a PAD tlv, then allocate a new + * block and allocate the bytes at the start of that new block. + */ + tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_PAD); - pad_len = bsize - off - 1 - EXT4_FC_TAG_BASE_LEN; - tl.fc_len = cpu_to_le16(pad_len); + tl.fc_len = cpu_to_le16(remaining); ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, crc); - dst += EXT4_FC_TAG_BASE_LEN; - if (pad_len > 0) { - ext4_fc_memzero(sb, dst, pad_len, crc); - dst += pad_len; - } - /* Don't leak uninitialized memory in the unused last byte. */ - *dst = 0; + ext4_fc_memzero(sb, dst + EXT4_FC_TAG_BASE_LEN, remaining, crc);
ext4_fc_submit_bh(sb, false);
@@ -687,7 +687,7 @@ static u8 *ext4_fc_reserve_space(struct if (ret) return NULL; sbi->s_fc_bh = bh; - sbi->s_fc_bytes = (sbi->s_fc_bytes / bsize + 1) * bsize + len; + sbi->s_fc_bytes += bsize - off + len; return sbi->s_fc_bh->b_data; }
@@ -718,7 +718,7 @@ static int ext4_fc_write_tail(struct sup off = sbi->s_fc_bytes % bsize;
tl.fc_tag = cpu_to_le16(EXT4_FC_TAG_TAIL); - tl.fc_len = cpu_to_le16(bsize - off - 1 + sizeof(struct ext4_fc_tail)); + tl.fc_len = cpu_to_le16(bsize - off + sizeof(struct ext4_fc_tail)); sbi->s_fc_bytes = round_up(sbi->s_fc_bytes, bsize);
ext4_fc_memcpy(sb, dst, &tl, EXT4_FC_TAG_BASE_LEN, &crc); @@ -1981,7 +1981,7 @@ static int ext4_fc_replay_scan(journal_t state = &sbi->s_fc_replay_state;
start = (u8 *)bh->b_data; - end = (__u8 *)bh->b_data + journal->j_blocksize - 1; + end = start + journal->j_blocksize;
if (state->fc_replay_expected_off == 0) { state->fc_cur_tag = 0; @@ -2002,7 +2002,7 @@ static int ext4_fc_replay_scan(journal_t }
state->fc_replay_expected_off++; - for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN; + for (cur = start; cur <= end - EXT4_FC_TAG_BASE_LEN; cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN; @@ -2120,9 +2120,9 @@ static int ext4_fc_replay(journal_t *jou #endif
start = (u8 *)bh->b_data; - end = (__u8 *)bh->b_data + journal->j_blocksize - 1; + end = start + journal->j_blocksize;
- for (cur = start; cur < end - EXT4_FC_TAG_BASE_LEN; + for (cur = start; cur <= end - EXT4_FC_TAG_BASE_LEN; cur = cur + EXT4_FC_TAG_BASE_LEN + tl.fc_len) { ext4_fc_get_tl(&tl, cur); val = cur + EXT4_FC_TAG_BASE_LEN;
From: Jens Axboe axboe@kernel.dk
commit 191f8453fc99a537ea78b727acea739782378b0d upstream.
We want to ensure that the mask related to calling do_work_pending() is within the first 16 bits. Move bits unrelated to that outside of that range, to avoid spuriously calling do_work_pending() when we don't need to.
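A toy userspace sketch of why the bit positions matter; the mask and flag values below are simplified, not the real arm definitions:

#include <stdio.h>

/* Simplified flag layout: only bits 0-15 count as "work". */
#define TIF_NOTIFY_SIGNAL	 4	/* inside the low 16 bits */
#define TIF_SYSCALL_TRACE	20	/* moved above bit 15: not "work" */

#define _TIF_WORK_MASK	0x0000ffffUL	/* what do_work_pending() cares about */

int main(void)
{
	unsigned long flags = 1UL << TIF_SYSCALL_TRACE;

	/* Tracing alone no longer looks like pending work. */
	printf("work: %#lx\n", flags & _TIF_WORK_MASK);	/* 0 */

	flags |= 1UL << TIF_NOTIFY_SIGNAL;
	printf("work: %#lx\n", flags & _TIF_WORK_MASK);	/* 0x10 */
	return 0;
}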
Cc: stable@vger.kernel.org Fixes: 32d59773da38 ("arm: add support for TIF_NOTIFY_SIGNAL") Reported-and-tested-by: Hui Tang tanghui20@huawei.com Suggested-by: Russell King (Oracle) linux@armlinux.org.uk Link: https://lore.kernel.org/lkml/7ecb8f3c-2aeb-a905-0d4a-aa768b9649b5@huawei.com... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/arm/include/asm/thread_info.h | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
--- a/arch/arm/include/asm/thread_info.h +++ b/arch/arm/include/asm/thread_info.h @@ -129,15 +129,16 @@ extern int vfp_restore_user_hwstate(stru #define TIF_NEED_RESCHED 1 /* rescheduling necessary */ #define TIF_NOTIFY_RESUME 2 /* callback before returning to user */ #define TIF_UPROBE 3 /* breakpointed or singlestepping */ -#define TIF_SYSCALL_TRACE 4 /* syscall trace active */ -#define TIF_SYSCALL_AUDIT 5 /* syscall auditing active */ -#define TIF_SYSCALL_TRACEPOINT 6 /* syscall tracepoint instrumentation */ -#define TIF_SECCOMP 7 /* seccomp syscall filtering active */ -#define TIF_NOTIFY_SIGNAL 8 /* signal notifications exist */ +#define TIF_NOTIFY_SIGNAL 4 /* signal notifications exist */
#define TIF_USING_IWMMXT 17 #define TIF_MEMDIE 18 /* is terminating due to OOM killer */ -#define TIF_RESTORE_SIGMASK 20 +#define TIF_RESTORE_SIGMASK 19 +#define TIF_SYSCALL_TRACE 20 /* syscall trace active */ +#define TIF_SYSCALL_AUDIT 21 /* syscall auditing active */ +#define TIF_SYSCALL_TRACEPOINT 22 /* syscall tracepoint instrumentation */ +#define TIF_SECCOMP 23 /* seccomp syscall filtering active */ +
#define _TIF_SIGPENDING (1 << TIF_SIGPENDING) #define _TIF_NEED_RESCHED (1 << TIF_NEED_RESCHED)
[ Upstream commit d8a5b59c5fc75c99ba17e3eb1a8f580d8d172b28 ]
The SM8250 only uses three clocks but the DP configuration erroneously described four clocks.
In case the DP part of the PHY is initialised before the USB part, this would lead to uninitialised memory beyond the bulk-clocks array being treated as a clock pointer, as the clocks are requested based on the USB configuration.
Fixes: aff188feb5e1 ("phy: qcom-qmp: add support for sm8250-usb3-dp phy") Cc: stable@vger.kernel.org # 5.13 Reviewed-by: Dmitry Baryshkov dmitry.baryshkov@linaro.org Signed-off-by: Johan Hovold johan+linaro@kernel.org Link: https://lore.kernel.org/r/20221114081346.5116-2-johan+linaro@kernel.org Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/phy/qualcomm/phy-qcom-qmp.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/qualcomm/phy-qcom-qmp.c b/drivers/phy/qualcomm/phy-qcom-qmp.c index 817298d8b0e3..a9687e040960 100644 --- a/drivers/phy/qualcomm/phy-qcom-qmp.c +++ b/drivers/phy/qualcomm/phy-qcom-qmp.c @@ -3805,8 +3805,8 @@ static const struct qmp_phy_cfg sm8250_dpphy_cfg = { .serdes_tbl_hbr3 = qmp_v4_dp_serdes_tbl_hbr3, .serdes_tbl_hbr3_num = ARRAY_SIZE(qmp_v4_dp_serdes_tbl_hbr3),
- .clk_list = qmp_v4_phy_clk_l, - .num_clks = ARRAY_SIZE(qmp_v4_phy_clk_l), + .clk_list = qmp_v4_sm8250_usbphy_clk_l, + .num_clks = ARRAY_SIZE(qmp_v4_sm8250_usbphy_clk_l), .reset_list = msm8996_usb3phy_reset_l, .num_resets = ARRAY_SIZE(msm8996_usb3phy_reset_l), .vreg_list = qmp_phy_vreg_l,
[ Upstream commit 63d5429f68a3d4c4aa27e65a05196c17f86c41d6 ]
Using strncpy() on NUL-terminated strings is deprecated. To avoid possibly forming a non-terminated string, strscpy() should be used.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
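For reference, a userspace sketch of the difference; strscpy() itself is kernel-only, so the stand-in below only mimics its NUL-termination and truncation-reporting behaviour:

#include <stdio.h>
#include <string.h>

/* Userspace stand-in for the kernel's strscpy(): always NUL-terminates
 * and reports truncation (here with -1 instead of -E2BIG). */
static long strscpy_sketch(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -1;
	len = strnlen(src, size);
	if (len == size) {			/* source does not fit */
		len = size - 1;
		memcpy(dst, src, len);
		dst[len] = '\0';
		return -1;
	}
	memcpy(dst, src, len + 1);		/* fits, including the NUL */
	return (long)len;
}

int main(void)
{
	char a[4], b[4];

	strncpy(a, "abcdef", sizeof(a));	/* a is NOT NUL-terminated */
	strscpy_sketch(b, "abcdef", sizeof(b));	/* b == "abc", terminated */

	printf("%.4s / %s\n", a, b);
	return 0;
}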
CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Artem Chernyshev artem.chernyshev@red-soft.ru Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/ioctl.c | 9 +++------ fs/btrfs/rcu-string.h | 6 +++++- 2 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c index 391a4af9c5e5..ed9c715d2579 100644 --- a/fs/btrfs/ioctl.c +++ b/fs/btrfs/ioctl.c @@ -3415,13 +3415,10 @@ static long btrfs_ioctl_dev_info(struct btrfs_fs_info *fs_info, di_args->bytes_used = btrfs_device_get_bytes_used(dev); di_args->total_bytes = btrfs_device_get_total_bytes(dev); memcpy(di_args->uuid, dev->uuid, sizeof(di_args->uuid)); - if (dev->name) { - strncpy(di_args->path, rcu_str_deref(dev->name), - sizeof(di_args->path) - 1); - di_args->path[sizeof(di_args->path) - 1] = 0; - } else { + if (dev->name) + strscpy(di_args->path, rcu_str_deref(dev->name), sizeof(di_args->path)); + else di_args->path[0] = '\0'; - }
out: rcu_read_unlock(); diff --git a/fs/btrfs/rcu-string.h b/fs/btrfs/rcu-string.h index 5c1a617eb25d..5c2b66d155ef 100644 --- a/fs/btrfs/rcu-string.h +++ b/fs/btrfs/rcu-string.h @@ -18,7 +18,11 @@ static inline struct rcu_string *rcu_string_strdup(const char *src, gfp_t mask) (len * sizeof(char)), mask); if (!ret) return ret; - strncpy(ret->str, src, len); + /* Warn if the source got unexpectedly truncated. */ + if (WARN_ON(strscpy(ret->str, src, len) < 0)) { + kfree(ret); + return NULL; + } return ret; }
From: Nikolay Borisov nborisov@suse.com
[ Upstream commit ff37c89f94be14b0e22a532d1e6d57187bfd5bb8 ]
This simplifies the code flow in read_one_chunk and makes the handling of missing devices a bit simpler by reducing it to a single check for whether something went wrong. No functional changes.
Reviewed-by: Su Yue l@damenly.su Signed-off-by: Nikolay Borisov nborisov@suse.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Stable-dep-of: 1742e1c90c3d ("btrfs: fix extent map use-after-free when handling missing device in read_one_chunk") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 38 ++++++++++++++++++++++++-------------- 1 file changed, 24 insertions(+), 14 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c886ec81c5d0..c773ecba7c2d 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7043,6 +7043,27 @@ static void warn_32bit_meta_chunk(struct btrfs_fs_info *fs_info, } #endif
+static struct btrfs_device *handle_missing_device(struct btrfs_fs_info *fs_info, + u64 devid, u8 *uuid) +{ + struct btrfs_device *dev; + + if (!btrfs_test_opt(fs_info, DEGRADED)) { + btrfs_report_missing_device(fs_info, devid, uuid, true); + return ERR_PTR(-ENOENT); + } + + dev = add_missing_dev(fs_info->fs_devices, devid, uuid); + if (IS_ERR(dev)) { + btrfs_err(fs_info, "failed to init missing device %llu: %ld", + devid, PTR_ERR(dev)); + return dev; + } + btrfs_report_missing_device(fs_info, devid, uuid, false); + + return dev; +} + static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, struct btrfs_chunk *chunk) { @@ -7130,28 +7151,17 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, BTRFS_UUID_SIZE); args.uuid = uuid; map->stripes[i].dev = btrfs_find_device(fs_info->fs_devices, &args); - if (!map->stripes[i].dev && - !btrfs_test_opt(fs_info, DEGRADED)) { - free_extent_map(em); - btrfs_report_missing_device(fs_info, devid, uuid, true); - return -ENOENT; - } if (!map->stripes[i].dev) { - map->stripes[i].dev = - add_missing_dev(fs_info->fs_devices, devid, - uuid); + map->stripes[i].dev = handle_missing_device(fs_info, + devid, uuid); if (IS_ERR(map->stripes[i].dev)) { free_extent_map(em); - btrfs_err(fs_info, - "failed to init missing dev %llu: %ld", - devid, PTR_ERR(map->stripes[i].dev)); return PTR_ERR(map->stripes[i].dev); } - btrfs_report_missing_device(fs_info, devid, uuid, false); } + set_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &(map->stripes[i].dev->dev_state)); - }
write_lock(&map_tree->lock);
From: void0red void0red@gmail.com
[ Upstream commit 1742e1c90c3da344f3bb9b1f1309b3f47482756a ]
Store the error code before freeing the extent_map. Though it is a reference-counted structure, in this function it is both allocated and freed (the first and last reference), so dereferencing it after the free would be a potential use-after-free.
The error can happen eg. when chunk is stored on a missing device and the degraded mount option is missing.
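A minimal userspace sketch of the pattern being fixed (structures and error values are invented for illustration):

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct stripe { long err; };
struct map    { struct stripe stripes[1]; };

/* The fix: capture the error code *before* freeing the structure that
 * contains it, instead of dereferencing it again after the free. */
static long read_one_chunk_sketch(void)
{
	struct map *map = malloc(sizeof(*map));
	long ret;

	if (!map)
		return -ENOMEM;

	map->stripes[0].err = -ENOENT;	/* pretend the device lookup failed */

	ret = map->stripes[0].err;	/* save first ... */
	free(map);			/* ... then drop the last reference */
	return ret;			/* not: return map->stripes[0].err; */
}

int main(void)
{
	printf("ret = %ld\n", read_one_chunk_sketch());
	return 0;
}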
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216721 Reported-by: eriri 1527030098@qq.com Fixes: adfb69af7d8c ("btrfs: add_missing_dev() should return the actual error") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: void0red void0red@gmail.com Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/volumes.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index c773ecba7c2d..6b86a3cec04c 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -7155,8 +7155,9 @@ static int read_one_chunk(struct btrfs_key *key, struct extent_buffer *leaf, map->stripes[i].dev = handle_missing_device(fs_info, devid, uuid); if (IS_ERR(map->stripes[i].dev)) { + ret = PTR_ERR(map->stripes[i].dev); free_extent_map(em); - return PTR_ERR(map->stripes[i].dev); + return ret; } }
From: Borislav Petkov bp@suse.de
[ Upstream commit 8121b8f947be0033f567619be204639a50cad298 ]
Avoid having indirect calls and use a normal function which returns the proper MSR address based on ->smca setting.
No functional changes.
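A toy userspace sketch of the shape of the helper; the MSR constants below follow the usual layout but are only illustrative:

#include <stdio.h>

enum mca_msr { MCA_CTL, MCA_STATUS, MCA_ADDR, MCA_MISC };

static int smca;	/* stand-in for mce_flags.smca */

/* One ordinary function replaces the old table of four function
 * pointers: callers name the register, the helper returns the MSR. */
static unsigned int msr_reg_sketch(int bank, enum mca_msr reg)
{
	if (smca) {
		switch (reg) {
		case MCA_CTL:    return 0xc0002000u + bank * 0x10u;
		case MCA_STATUS: return 0xc0002001u + bank * 0x10u;
		case MCA_ADDR:   return 0xc0002002u + bank * 0x10u;
		case MCA_MISC:   return 0xc0002003u + bank * 0x10u;
		}
	}

	switch (reg) {
	case MCA_CTL:    return 0x400u + bank * 4u;
	case MCA_STATUS: return 0x401u + bank * 4u;
	case MCA_ADDR:   return 0x402u + bank * 4u;
	case MCA_MISC:   return 0x403u + bank * 4u;
	}
	return 0;
}

int main(void)
{
	printf("bank 3 STATUS: %#x\n", msr_reg_sketch(3, MCA_STATUS));
	smca = 1;
	printf("bank 3 STATUS (SMCA): %#x\n", msr_reg_sketch(3, MCA_STATUS));
	return 0;
}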
Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Tony Luck tony.luck@intel.com Link: https://lkml.kernel.org/r/20210922165101.18951-4-bp@alien8.de Stable-dep-of: bc1b705b0eee ("x86/MCE/AMD: Clear DFR errors found in THR handler") Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/mce/amd.c | 10 ++-- arch/x86/kernel/cpu/mce/core.c | 95 ++++++++++-------------------- arch/x86/kernel/cpu/mce/internal.h | 12 ++-- 3 files changed, 42 insertions(+), 75 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index a873577e49dc..b8b7c304e4ba 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -526,7 +526,7 @@ static u32 get_block_address(u32 current_addr, u32 low, u32 high, /* Fall back to method we used for older processors: */ switch (block) { case 0: - addr = msr_ops.misc(bank); + addr = mca_msr_reg(bank, MCA_MISC); break; case 1: offset = ((low & MASK_BLKPTR_LO) >> 21); @@ -978,8 +978,8 @@ static void log_error_deferred(unsigned int bank) { bool defrd;
- defrd = _log_error_bank(bank, msr_ops.status(bank), - msr_ops.addr(bank), 0); + defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), + mca_msr_reg(bank, MCA_ADDR), 0);
if (!mce_flags.smca) return; @@ -1009,7 +1009,7 @@ static void amd_deferred_error_interrupt(void)
static void log_error_thresholding(unsigned int bank, u64 misc) { - _log_error_bank(bank, msr_ops.status(bank), msr_ops.addr(bank), misc); + _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc); }
static void log_and_reset_block(struct threshold_block *block) @@ -1397,7 +1397,7 @@ static int threshold_create_bank(struct threshold_bank **bp, unsigned int cpu, } }
- err = allocate_threshold_blocks(cpu, b, bank, 0, msr_ops.misc(bank)); + err = allocate_threshold_blocks(cpu, b, bank, 0, mca_msr_reg(bank, MCA_MISC)); if (err) goto out_kobj;
diff --git a/arch/x86/kernel/cpu/mce/core.c b/arch/x86/kernel/cpu/mce/core.c index 773037e5fd76..5ee82fd386dd 100644 --- a/arch/x86/kernel/cpu/mce/core.c +++ b/arch/x86/kernel/cpu/mce/core.c @@ -176,53 +176,27 @@ void mce_unregister_decode_chain(struct notifier_block *nb) } EXPORT_SYMBOL_GPL(mce_unregister_decode_chain);
-static inline u32 ctl_reg(int bank) +u32 mca_msr_reg(int bank, enum mca_msr reg) { - return MSR_IA32_MCx_CTL(bank); -} - -static inline u32 status_reg(int bank) -{ - return MSR_IA32_MCx_STATUS(bank); -} - -static inline u32 addr_reg(int bank) -{ - return MSR_IA32_MCx_ADDR(bank); -} - -static inline u32 misc_reg(int bank) -{ - return MSR_IA32_MCx_MISC(bank); -} - -static inline u32 smca_ctl_reg(int bank) -{ - return MSR_AMD64_SMCA_MCx_CTL(bank); -} - -static inline u32 smca_status_reg(int bank) -{ - return MSR_AMD64_SMCA_MCx_STATUS(bank); -} + if (mce_flags.smca) { + switch (reg) { + case MCA_CTL: return MSR_AMD64_SMCA_MCx_CTL(bank); + case MCA_ADDR: return MSR_AMD64_SMCA_MCx_ADDR(bank); + case MCA_MISC: return MSR_AMD64_SMCA_MCx_MISC(bank); + case MCA_STATUS: return MSR_AMD64_SMCA_MCx_STATUS(bank); + } + }
-static inline u32 smca_addr_reg(int bank) -{ - return MSR_AMD64_SMCA_MCx_ADDR(bank); -} + switch (reg) { + case MCA_CTL: return MSR_IA32_MCx_CTL(bank); + case MCA_ADDR: return MSR_IA32_MCx_ADDR(bank); + case MCA_MISC: return MSR_IA32_MCx_MISC(bank); + case MCA_STATUS: return MSR_IA32_MCx_STATUS(bank); + }
-static inline u32 smca_misc_reg(int bank) -{ - return MSR_AMD64_SMCA_MCx_MISC(bank); + return 0; }
-struct mca_msr_regs msr_ops = { - .ctl = ctl_reg, - .status = status_reg, - .addr = addr_reg, - .misc = misc_reg -}; - static void __print_mce(struct mce *m) { pr_emerg(HW_ERR "CPU %d: Machine Check%s: %Lx Bank %d: %016Lx\n", @@ -371,11 +345,11 @@ static int msr_to_offset(u32 msr)
if (msr == mca_cfg.rip_msr) return offsetof(struct mce, ip); - if (msr == msr_ops.status(bank)) + if (msr == mca_msr_reg(bank, MCA_STATUS)) return offsetof(struct mce, status); - if (msr == msr_ops.addr(bank)) + if (msr == mca_msr_reg(bank, MCA_ADDR)) return offsetof(struct mce, addr); - if (msr == msr_ops.misc(bank)) + if (msr == mca_msr_reg(bank, MCA_MISC)) return offsetof(struct mce, misc); if (msr == MSR_IA32_MCG_STATUS) return offsetof(struct mce, mcgstatus); @@ -676,10 +650,10 @@ static struct notifier_block mce_default_nb = { static noinstr void mce_read_aux(struct mce *m, int i) { if (m->status & MCI_STATUS_MISCV) - m->misc = mce_rdmsrl(msr_ops.misc(i)); + m->misc = mce_rdmsrl(mca_msr_reg(i, MCA_MISC));
if (m->status & MCI_STATUS_ADDRV) { - m->addr = mce_rdmsrl(msr_ops.addr(i)); + m->addr = mce_rdmsrl(mca_msr_reg(i, MCA_ADDR));
/* * Mask the reported address by the reported granularity. @@ -749,7 +723,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) m.bank = i;
barrier(); - m.status = mce_rdmsrl(msr_ops.status(i)); + m.status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS));
/* If this entry is not valid, ignore it */ if (!(m.status & MCI_STATUS_VAL)) @@ -817,7 +791,7 @@ bool machine_check_poll(enum mcp_flags flags, mce_banks_t *b) /* * Clear state for this bank. */ - mce_wrmsrl(msr_ops.status(i), 0); + mce_wrmsrl(mca_msr_reg(i, MCA_STATUS), 0); }
/* @@ -842,7 +816,7 @@ static int mce_no_way_out(struct mce *m, char **msg, unsigned long *validp, int i;
for (i = 0; i < this_cpu_read(mce_num_banks); i++) { - m->status = mce_rdmsrl(msr_ops.status(i)); + m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS)); if (!(m->status & MCI_STATUS_VAL)) continue;
@@ -1143,7 +1117,7 @@ static void mce_clear_state(unsigned long *toclear)
for (i = 0; i < this_cpu_read(mce_num_banks); i++) { if (test_bit(i, toclear)) - mce_wrmsrl(msr_ops.status(i), 0); + mce_wrmsrl(mca_msr_reg(i, MCA_STATUS), 0); } }
@@ -1202,7 +1176,7 @@ static void __mc_scan_banks(struct mce *m, struct pt_regs *regs, struct mce *fin m->addr = 0; m->bank = i;
- m->status = mce_rdmsrl(msr_ops.status(i)); + m->status = mce_rdmsrl(mca_msr_reg(i, MCA_STATUS)); if (!(m->status & MCI_STATUS_VAL)) continue;
@@ -1699,8 +1673,8 @@ static void __mcheck_cpu_init_clear_banks(void)
if (!b->init) continue; - wrmsrl(msr_ops.ctl(i), b->ctl); - wrmsrl(msr_ops.status(i), 0); + wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl); + wrmsrl(mca_msr_reg(i, MCA_STATUS), 0); } }
@@ -1726,7 +1700,7 @@ static void __mcheck_cpu_check_banks(void) if (!b->init) continue;
- rdmsrl(msr_ops.ctl(i), msrval); + rdmsrl(mca_msr_reg(i, MCA_CTL), msrval); b->init = !!msrval; } } @@ -1883,13 +1857,6 @@ static void __mcheck_cpu_init_early(struct cpuinfo_x86 *c) mce_flags.succor = !!cpu_has(c, X86_FEATURE_SUCCOR); mce_flags.smca = !!cpu_has(c, X86_FEATURE_SMCA); mce_flags.amd_threshold = 1; - - if (mce_flags.smca) { - msr_ops.ctl = smca_ctl_reg; - msr_ops.status = smca_status_reg; - msr_ops.addr = smca_addr_reg; - msr_ops.misc = smca_misc_reg; - } } }
@@ -2265,7 +2232,7 @@ static void mce_disable_error_reporting(void) struct mce_bank *b = &mce_banks[i];
if (b->init) - wrmsrl(msr_ops.ctl(i), 0); + wrmsrl(mca_msr_reg(i, MCA_CTL), 0); } return; } @@ -2617,7 +2584,7 @@ static void mce_reenable_cpu(void) struct mce_bank *b = &mce_banks[i];
if (b->init) - wrmsrl(msr_ops.ctl(i), b->ctl); + wrmsrl(mca_msr_reg(i, MCA_CTL), b->ctl); } }
diff --git a/arch/x86/kernel/cpu/mce/internal.h b/arch/x86/kernel/cpu/mce/internal.h index 80dc94313bcf..760b57814760 100644 --- a/arch/x86/kernel/cpu/mce/internal.h +++ b/arch/x86/kernel/cpu/mce/internal.h @@ -168,14 +168,14 @@ struct mce_vendor_flags {
extern struct mce_vendor_flags mce_flags;
-struct mca_msr_regs { - u32 (*ctl) (int bank); - u32 (*status) (int bank); - u32 (*addr) (int bank); - u32 (*misc) (int bank); +enum mca_msr { + MCA_CTL, + MCA_STATUS, + MCA_ADDR, + MCA_MISC, };
-extern struct mca_msr_regs msr_ops; +u32 mca_msr_reg(int bank, enum mca_msr reg);
/* Decide whether to add MCE record to MCE event pool or filter it out. */ extern bool filter_mce(struct mce *m);
From: Yazen Ghannam yazen.ghannam@amd.com
[ Upstream commit bc1b705b0eee4c645ad8b3bbff3c8a66e9688362 ]
AMD's MCA Thresholding feature counts errors of all severity levels, not just correctable errors. If a deferred error causes the threshold limit to be reached (it was the error that caused the overflow), then both a deferred error interrupt and a thresholding interrupt will be triggered.
The order of the interrupts is not guaranteed. If the threshold interrupt handler is executed first, then it will clear MCA_STATUS for the error. It will not check or clear MCA_DESTAT which also holds a copy of the deferred error. When the deferred error interrupt handler runs it will not find an error in MCA_STATUS, but it will find the error in MCA_DESTAT. This will cause two errors to be logged.
Check for deferred errors when handling a threshold interrupt. If a bank contains a deferred error, then clear the bank's MCA_DESTAT register.
Define a new helper function to do the deferred error check and clearing of MCA_DESTAT.
[ bp: Simplify, convert comment to passive voice. ]
Fixes: 37d43acfd79f ("x86/mce/AMD: Redo error logging from APIC LVT interrupt handlers") Signed-off-by: Yazen Ghannam yazen.ghannam@amd.com Signed-off-by: Borislav Petkov bp@suse.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220621155943.33623-1-yazen.ghannam@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/cpu/mce/amd.c | 33 ++++++++++++++++++++------------- 1 file changed, 20 insertions(+), 13 deletions(-)
diff --git a/arch/x86/kernel/cpu/mce/amd.c b/arch/x86/kernel/cpu/mce/amd.c index b8b7c304e4ba..6469d3135d26 100644 --- a/arch/x86/kernel/cpu/mce/amd.c +++ b/arch/x86/kernel/cpu/mce/amd.c @@ -965,6 +965,24 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc) return status & MCI_STATUS_DEFERRED; }
+static bool _log_error_deferred(unsigned int bank, u32 misc) +{ + if (!_log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), + mca_msr_reg(bank, MCA_ADDR), misc)) + return false; + + /* + * Non-SMCA systems don't have MCA_DESTAT/MCA_DEADDR registers. + * Return true here to avoid accessing these registers. + */ + if (!mce_flags.smca) + return true; + + /* Clear MCA_DESTAT if the deferred error was logged from MCA_STATUS. */ + wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0); + return true; +} + /* * We have three scenarios for checking for Deferred errors: * @@ -976,19 +994,8 @@ _log_error_bank(unsigned int bank, u32 msr_stat, u32 msr_addr, u64 misc) */ static void log_error_deferred(unsigned int bank) { - bool defrd; - - defrd = _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), - mca_msr_reg(bank, MCA_ADDR), 0); - - if (!mce_flags.smca) - return; - - /* Clear MCA_DESTAT if we logged the deferred error from MCA_STATUS. */ - if (defrd) { - wrmsrl(MSR_AMD64_SMCA_MCx_DESTAT(bank), 0); + if (_log_error_deferred(bank, 0)) return; - }
/* * Only deferred errors are logged in MCA_DE{STAT,ADDR} so just check @@ -1009,7 +1016,7 @@ static void amd_deferred_error_interrupt(void)
static void log_error_thresholding(unsigned int bank, u64 misc) { - _log_error_bank(bank, mca_msr_reg(bank, MCA_STATUS), mca_msr_reg(bank, MCA_ADDR), misc); + _log_error_deferred(bank, misc); }
static void log_and_reset_block(struct threshold_block *block)
From: Smitha T Murthy smitha.t@samsung.com
[ Upstream commit d8a46bc4e1e0446459daa77c4ce14218d32dacf9 ]
On receiving the last buffer, the driver puts the MFC into the MFCINST_FINISHING state, which in turn skips transferring the frame from the SRC to the REF queue. This causes the driver to stop MFC encoding, and the last frame is lost.
This patch guarantees safe handling of frames during MFCINST_FINISHING and correct clearing of the work bit, to avoid stopping encoding early.
Fixes: af9357467810 ("[media] MFC: Add MFC 5.1 V4L2 driver")
Cc: stable@vger.kernel.org Cc: linux-fsd@tesla.com Signed-off-by: Smitha T Murthy smitha.t@samsung.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/s5p-mfc/s5p_mfc_enc.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c b/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c index 1fad99edb091..3da1775a65f1 100644 --- a/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_enc.c @@ -1218,6 +1218,7 @@ static int enc_post_frame_start(struct s5p_mfc_ctx *ctx) unsigned long mb_y_addr, mb_c_addr; int slice_type; unsigned int strm_size; + bool src_ready;
slice_type = s5p_mfc_hw_call(dev->mfc_ops, get_enc_slice_type, dev); strm_size = s5p_mfc_hw_call(dev->mfc_ops, get_enc_strm_size, dev); @@ -1257,7 +1258,8 @@ static int enc_post_frame_start(struct s5p_mfc_ctx *ctx) } } } - if ((ctx->src_queue_cnt > 0) && (ctx->state == MFCINST_RUNNING)) { + if (ctx->src_queue_cnt > 0 && (ctx->state == MFCINST_RUNNING || + ctx->state == MFCINST_FINISHING)) { mb_entry = list_entry(ctx->src_queue.next, struct s5p_mfc_buf, list); if (mb_entry->flags & MFC_BUF_FLAG_USED) { @@ -1288,7 +1290,13 @@ static int enc_post_frame_start(struct s5p_mfc_ctx *ctx) vb2_set_plane_payload(&mb_entry->b->vb2_buf, 0, strm_size); vb2_buffer_done(&mb_entry->b->vb2_buf, VB2_BUF_STATE_DONE); } - if ((ctx->src_queue_cnt == 0) || (ctx->dst_queue_cnt == 0)) + + src_ready = true; + if (ctx->state == MFCINST_RUNNING && ctx->src_queue_cnt == 0) + src_ready = false; + if (ctx->state == MFCINST_FINISHING && ctx->ref_queue_cnt == 0) + src_ready = false; + if (!src_ready || ctx->dst_queue_cnt == 0) clear_work_bit(ctx);
return 0;
From: Smitha T Murthy smitha.t@samsung.com
[ Upstream commit d3f3c2fe54e30b0636496d842ffbb5ad3a547f9b ]
On an error from the CLOSE_INSTANCE command, ctx_work_bits was not being cleared. During subsequent MFC execution, a NULL pointer dereference of this context led to a kernel panic. This patch fixes the issue by making sure ctx_work_bits is always cleared.
Fixes: 818cd91ab8c6 ("[media] s5p-mfc: Extract open/close MFC instance commands") Cc: stable@vger.kernel.org Cc: linux-fsd@tesla.com Signed-off-by: Smitha T Murthy smitha.t@samsung.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/s5p-mfc/s5p_mfc_ctrl.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_ctrl.c b/drivers/media/platform/s5p-mfc/s5p_mfc_ctrl.c index da138c314963..58822ec5370e 100644 --- a/drivers/media/platform/s5p-mfc/s5p_mfc_ctrl.c +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_ctrl.c @@ -468,8 +468,10 @@ void s5p_mfc_close_mfc_inst(struct s5p_mfc_dev *dev, struct s5p_mfc_ctx *ctx) s5p_mfc_hw_call(dev->mfc_ops, try_run, dev); /* Wait until instance is returned or timeout occurred */ if (s5p_mfc_wait_for_done_ctx(ctx, - S5P_MFC_R2H_CMD_CLOSE_INSTANCE_RET, 0)) + S5P_MFC_R2H_CMD_CLOSE_INSTANCE_RET, 0)){ + clear_work_bit_irqsave(ctx); mfc_err("Err returning instance\n"); + }
/* Free resources */ s5p_mfc_hw_call(dev->mfc_ops, release_codec_buffers, ctx);
From: Smitha T Murthy smitha.t@samsung.com
[ Upstream commit 06710cd5d2436135046898d7e4b9408c8bb99446 ]
A few of the H264 encoder register updates did not take effect correctly because the value read from the register was discarded instead of being stored in 'reg', so the subsequent write overwrote the register with stale contents.
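A toy userspace sketch of the read-modify-write pattern involved (the "register" is just a variable here):

#include <stdint.h>
#include <stdio.h>

static uint32_t fake_reg = 0x30;	/* stand-in for a memory-mapped register */

static uint32_t readl_sketch(void)        { return fake_reg; }
static void     writel_sketch(uint32_t v) { fake_reg = v; }

int main(void)
{
	uint32_t reg = 0;	/* stale value from some earlier read */

	/* Broken pattern: the read result is thrown away, so the write
	 * clobbers whatever the register already contained. */
	readl_sketch();
	reg &= ~(1u << 5);
	reg |= 1u << 5;
	writel_sketch(reg);
	printf("after broken RMW: %#x\n", fake_reg);	/* 0x20: bit 4 lost */

	/* Fixed pattern: store the read value, then modify and write. */
	fake_reg = 0x30;
	reg = readl_sketch();
	reg &= ~(1u << 5);
	reg |= 1u << 5;
	writel_sketch(reg);
	printf("after fixed RMW:  %#x\n", fake_reg);	/* 0x30 */
	return 0;
}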
Fixes: 6a9c6f681257 ("[media] s5p-mfc: Add variants to access mfc registers")
Cc: stable@vger.kernel.org Cc: linux-fsd@tesla.com Signed-off-by: Smitha T Murthy smitha.t@samsung.com Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c index a1453053e31a..ef8169f6c428 100644 --- a/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c +++ b/drivers/media/platform/s5p-mfc/s5p_mfc_opr_v6.c @@ -1060,7 +1060,7 @@ static int s5p_mfc_set_enc_params_h264(struct s5p_mfc_ctx *ctx) }
/* aspect ratio VUI */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x1 << 5); reg |= ((p_h264->vui_sar & 0x1) << 5); writel(reg, mfc_regs->e_h264_options); @@ -1083,7 +1083,7 @@ static int s5p_mfc_set_enc_params_h264(struct s5p_mfc_ctx *ctx)
/* intra picture period for H.264 open GOP */ /* control */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x1 << 4); reg |= ((p_h264->open_gop & 0x1) << 4); writel(reg, mfc_regs->e_h264_options); @@ -1097,23 +1097,23 @@ static int s5p_mfc_set_enc_params_h264(struct s5p_mfc_ctx *ctx) }
/* 'WEIGHTED_BI_PREDICTION' for B is disable */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x3 << 9); writel(reg, mfc_regs->e_h264_options);
/* 'CONSTRAINED_INTRA_PRED_ENABLE' is disable */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x1 << 14); writel(reg, mfc_regs->e_h264_options);
/* ASO */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x1 << 6); reg |= ((p_h264->aso & 0x1) << 6); writel(reg, mfc_regs->e_h264_options);
/* hier qp enable */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x1 << 8); reg |= ((p_h264->open_gop & 0x1) << 8); writel(reg, mfc_regs->e_h264_options); @@ -1134,7 +1134,7 @@ static int s5p_mfc_set_enc_params_h264(struct s5p_mfc_ctx *ctx) writel(reg, mfc_regs->e_h264_num_t_layer);
/* frame packing SEI generation */ - readl(mfc_regs->e_h264_options); + reg = readl(mfc_regs->e_h264_options); reg &= ~(0x1 << 25); reg |= ((p_h264->sei_frame_packing & 0x1) << 25); writel(reg, mfc_regs->e_h264_options);
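To make the pattern the fix restores easier to see outside the driver, here is a minimal userspace sketch of the same bug; mmio_read()/mmio_write() and fake_reg are invented stand-ins for readl()/writel() and the MFC options register, not the real API.

#include <stdio.h>
#include <stdint.h>

static uint32_t fake_reg = 0xffffffff;      /* pretend e_h264_options */

static uint32_t mmio_read(void)        { return fake_reg; }
static void     mmio_write(uint32_t v) { fake_reg = v; }

int main(void)
{
    uint32_t reg = 0;

    /* Buggy pattern: the value read back is discarded, so the update
     * below starts from a stale 'reg' and wipes the other bits. */
    mmio_read();
    reg &= ~(0x1 << 5);
    reg |= (0x1 << 5);
    mmio_write(reg);
    printf("buggy write leaves 0x%08x\n", (unsigned int)fake_reg);

    fake_reg = 0xffffffff;

    /* Fixed pattern: store the read value before modifying it. */
    reg = mmio_read();
    reg &= ~(0x1 << 5);
    reg |= (0x1 << 5);
    mmio_write(reg);
    printf("fixed write leaves 0x%08x\n", (unsigned int)fake_reg);
    return 0;
}

With the read value discarded only bit 5 survives; with it stored, the other option bits are preserved, which is what the reg = readl(...) changes above restore.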
From: Masami Hiramatsu (Google) mhiramat@kernel.org
[ Upstream commit f828929ab7f0dc3353e4a617f94f297fa8f3dec3 ]
Use dwarf_attr_integrate() instead of dwarf_attr() in the generic attribute accessor functions, so that they can also find the specified attribute in the abstract origin DIE, etc.
Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Peter Zijlstra peterz@infradead.org Cc: Steven Rostedt (VMware) rostedt@goodmis.org Link: https://lore.kernel.org/r/166731051988.2100653.13595339994343449770.stgit@de... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Stable-dep-of: a9dfc46c67b5 ("perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as unsinged data") Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/dwarf-aux.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index 609ca1671501..a07efbadb775 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -308,7 +308,7 @@ static int die_get_attr_udata(Dwarf_Die *tp_die, unsigned int attr_name, { Dwarf_Attribute attr;
- if (dwarf_attr(tp_die, attr_name, &attr) == NULL || + if (dwarf_attr_integrate(tp_die, attr_name, &attr) == NULL || dwarf_formudata(&attr, result) != 0) return -ENOENT;
@@ -321,7 +321,7 @@ static int die_get_attr_sdata(Dwarf_Die *tp_die, unsigned int attr_name, { Dwarf_Attribute attr;
- if (dwarf_attr(tp_die, attr_name, &attr) == NULL || + if (dwarf_attr_integrate(tp_die, attr_name, &attr) == NULL || dwarf_formsdata(&attr, result) != 0) return -ENOENT;
From: Masami Hiramatsu (Google) mhiramat@kernel.org
[ Upstream commit a9dfc46c67b52ad43b8e335e28f4cf8002c67793 ]
DWARF version 5 standard Sec 2.14 says that
Any debugging information entry representing the declaration of an object, module, subprogram or type may have DW_AT_decl_file, DW_AT_decl_line and DW_AT_decl_column attributes, each of whose value is an unsigned integer constant.
So it should be treated as unsigned integer data. Also, even though the standard doesn't clearly say whether DW_AT_call_file is signed or unsigned, elfutils (eu-readelf) interprets it as unsigned integer data, and it is natural to handle it as unsigned, the same as DW_AT_decl_file. This change treats DW_AT_call_file as unsigned integer data too.
Fixes: 3f4460a28fb2f73d ("perf probe: Filter out redundant inline-instances") Signed-off-by: Masami Hiramatsu mhiramat@kernel.org Acked-by: Namhyung Kim namhyung@kernel.org Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Masami Hiramatsu masami.hiramatsu.pt@hitachi.com Cc: Peter Zijlstra peterz@infradead.org Cc: stable@vger.kernel.org Cc: Steven Rostedt (VMware) rostedt@goodmis.org Link: https://lore.kernel.org/r/166761727445.480106.3738447577082071942.stgit@devn... Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/dwarf-aux.c | 21 ++++----------------- 1 file changed, 4 insertions(+), 17 deletions(-)
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c index a07efbadb775..623527edeac1 100644 --- a/tools/perf/util/dwarf-aux.c +++ b/tools/perf/util/dwarf-aux.c @@ -315,19 +315,6 @@ static int die_get_attr_udata(Dwarf_Die *tp_die, unsigned int attr_name, return 0; }
-/* Get attribute and translate it as a sdata */ -static int die_get_attr_sdata(Dwarf_Die *tp_die, unsigned int attr_name, - Dwarf_Sword *result) -{ - Dwarf_Attribute attr; - - if (dwarf_attr_integrate(tp_die, attr_name, &attr) == NULL || - dwarf_formsdata(&attr, result) != 0) - return -ENOENT; - - return 0; -} - /** * die_is_signed_type - Check whether a type DIE is signed or not * @tp_die: a DIE of a type @@ -467,9 +454,9 @@ int die_get_data_member_location(Dwarf_Die *mb_die, Dwarf_Word *offs) /* Get the call file index number in CU DIE */ static int die_get_call_fileno(Dwarf_Die *in_die) { - Dwarf_Sword idx; + Dwarf_Word idx;
- if (die_get_attr_sdata(in_die, DW_AT_call_file, &idx) == 0) + if (die_get_attr_udata(in_die, DW_AT_call_file, &idx) == 0) return (int)idx; else return -ENOENT; @@ -478,9 +465,9 @@ static int die_get_call_fileno(Dwarf_Die *in_die) /* Get the declared file index number in CU DIE */ static int die_get_decl_fileno(Dwarf_Die *pdie) { - Dwarf_Sword idx; + Dwarf_Word idx;
- if (die_get_attr_sdata(pdie, DW_AT_decl_file, &idx) == 0) + if (die_get_attr_udata(pdie, DW_AT_decl_file, &idx) == 0) return (int)idx; else return -ENOENT;
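As a standalone illustration of why the signedness matters (plain C, no libdw; the 0xff byte is an invented example of a one-byte constant payload): reading the same constant as signed data turns a large file index into a negative number, while the unsigned view keeps the index the line table expects.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint8_t raw = 0xff;                 /* hypothetical one-byte file index */

    int64_t  as_sdata = (int8_t)raw;    /* dwarf_formsdata()-style view */
    uint64_t as_udata = raw;            /* dwarf_formudata()-style view */

    printf("signed view:   %lld\n", (long long)as_sdata);
    printf("unsigned view: %llu\n", (unsigned long long)as_udata);
    return 0;
}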
From: Biju Das biju.das.jz@bp.renesas.com
[ Upstream commit c72a7e42592b2e18d862cf120876070947000d7a ]
This patch fixes the error "ravb 11c20000.ethernet eth0: failed to switch device to config mode" during unbind.
Register accesses are currently performed after pm_runtime_put_sync().
Cleanup is usually done in the reverse order of initialization. Currently in remove(), the pm_runtime_put_sync() call is not in reverse order.
Probe
	reset_control_deassert(rstc);
	pm_runtime_enable(&pdev->dev);
	pm_runtime_get_sync(&pdev->dev);

remove
	pm_runtime_put_sync(&pdev->dev);
	unregister_netdev(ndev);
	..
	ravb_mdio_release(priv);
	pm_runtime_disable(&pdev->dev);
Consider the call chain unregister_netdev() -> unregister_netdevice_queue() -> rollback_registered_many(), which calls the functions below; these access the registers after pm_runtime_put_sync():
1) ravb_get_stats
2) ravb_close
Fixes: c156633f1353 ("Renesas Ethernet AVB driver proper") Cc: stable@vger.kernel.org Signed-off-by: Biju Das biju.das.jz@bp.renesas.com Reviewed-by: Leon Romanovsky leonro@nvidia.com Link: https://lore.kernel.org/r/20221214105118.2495313-1-biju.das.jz@bp.renesas.co... Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/renesas/ravb_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/renesas/ravb_main.c b/drivers/net/ethernet/renesas/ravb_main.c index 77a19336abec..c89bcdd15f16 100644 --- a/drivers/net/ethernet/renesas/ravb_main.c +++ b/drivers/net/ethernet/renesas/ravb_main.c @@ -2378,11 +2378,11 @@ static int ravb_remove(struct platform_device *pdev) priv->desc_bat_dma); /* Set reset mode */ ravb_write(ndev, CCC_OPC_RESET, CCC); - pm_runtime_put_sync(&pdev->dev); unregister_netdev(ndev); netif_napi_del(&priv->napi[RAVB_NC]); netif_napi_del(&priv->napi[RAVB_BE]); ravb_mdio_release(priv); + pm_runtime_put_sync(&pdev->dev); pm_runtime_disable(&pdev->dev); reset_control_assert(priv->rstc); free_netdev(ndev);
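A compact sketch of the rule the patch restores, with printf stand-ins instead of the Renesas driver API: remove() should mirror probe() in reverse, so anything that can still touch the hardware has to run before runtime PM is dropped.

#include <stdio.h>

static void pm_get(void)         { printf("pm_runtime_get_sync\n"); }
static void pm_put(void)         { printf("pm_runtime_put_sync\n"); }
static void register_dev(void)   { printf("register_netdev\n"); }
static void unregister_dev(void) { printf("unregister_netdev (still touches registers)\n"); }

static void driver_probe(void)
{
    pm_get();           /* power the block up ...           */
    register_dev();     /* ... then expose the net device   */
}

static void driver_remove(void)
{
    unregister_dev();   /* register access still allowed    */
    pm_put();           /* drop power last, mirroring probe */
}

int main(void)
{
    driver_probe();
    driver_remove();
    return 0;
}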
From: Jason Yan yanaijie@huawei.com
[ Upstream commit 43bd6f1b49b61f43de4d4e33661b8dbe8c911f14 ]
Before these two branches, neither the journal has been loaded nor the xattr cache created, so the right label to jump to is 'failed_mount3a'. This did not cause any issues, because the error handler checks whether the pointers are NULL, but it was still confusing to read. It is therefore worth switching to the correct label.
Signed-off-by: Jason Yan yanaijie@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Ritesh Harjani (IBM) ritesh.list@gmail.com Link: https://lore.kernel.org/r/20220916141527.1012715-2-yanaijie@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: 89481b5fa8c0 ("ext4: correct inconsistent error msg in nojournal mode") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/super.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index cdc2b1e6aa41..fd7565707975 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4664,30 +4664,30 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) ext4_has_feature_journal_needs_recovery(sb)) { ext4_msg(sb, KERN_ERR, "required journal recovery " "suppressed and not mounted read-only"); - goto failed_mount_wq; + goto failed_mount3a; } else { /* Nojournal mode, all journal mount options are illegal */ if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) { ext4_msg(sb, KERN_ERR, "can't mount with " "journal_checksum, fs mounted w/o journal"); - goto failed_mount_wq; + goto failed_mount3a; } if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) { ext4_msg(sb, KERN_ERR, "can't mount with " "journal_async_commit, fs mounted w/o journal"); - goto failed_mount_wq; + goto failed_mount3a; } if (sbi->s_commit_interval != JBD2_DEFAULT_MAX_COMMIT_AGE*HZ) { ext4_msg(sb, KERN_ERR, "can't mount with " "commit=%lu, fs mounted w/o journal", sbi->s_commit_interval / HZ); - goto failed_mount_wq; + goto failed_mount3a; } if (EXT4_MOUNT_DATA_FLAGS & (sbi->s_mount_opt ^ sbi->s_def_mount_opt)) { ext4_msg(sb, KERN_ERR, "can't mount with " "data=, fs mounted w/o journal"); - goto failed_mount_wq; + goto failed_mount3a; } sbi->s_def_mount_opt &= ~EXT4_MOUNT_JOURNAL_CHECKSUM; clear_opt(sb, JOURNAL_CHECKSUM);
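To make the label convention concrete, here is a small userspace sketch of the rule the patch restores; the resource names and error values are invented, not the ext4 code. Each error label undoes only what has been set up so far, so a failure that happens before the journal is loaded must jump to the pre-journal label.

#include <stdio.h>
#include <stdlib.h>

static int journal_recover(char *journal)
{
    (void)journal;
    return 0;                           /* pretend recovery succeeded */
}

static int fill_super(int bad_opts, int want_journal)
{
    char *groups = NULL, *journal = NULL;
    int ret;

    groups = malloc(32);                /* set up before the journal */
    if (!groups)
        return -12;                     /* -ENOMEM */

    if (bad_opts) {
        ret = -22;                      /* -EINVAL */
        goto failed_mount3a;            /* journal not loaded yet */
    }

    if (want_journal) {
        journal = malloc(64);
        if (!journal) {
            ret = -12;
            goto failed_mount3a;
        }
        if (journal_recover(journal) < 0) {
            ret = -5;                   /* -EIO */
            goto failed_mount_wq;       /* journal is loaded now */
        }
    }

    printf("mounted (journal=%s)\n", journal ? "yes" : "no");
    free(journal);
    free(groups);
    return 0;

failed_mount_wq:                        /* also undoes the journal setup */
    free(journal);
failed_mount3a:                         /* only pre-journal setup */
    free(groups);
    return ret;
}

int main(void)
{
    printf("bad options:  %d\n", fill_super(1, 1));
    printf("normal mount: %d\n", fill_super(0, 1));
    return 0;
}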
From: Baokun Li libaokun1@huawei.com
[ Upstream commit 89481b5fa8c0640e62ba84c6020cee895f7ac643 ]
When the journal_async_commit mount option was used in nojournal mode, the kernel reported "can't mount with journal_checksum", which was very confusing. Mounting with journal_async_commit sets both the JOURNAL_ASYNC_COMMIT and EXPLICIT_JOURNAL_CHECKSUM flags. However, in the error branch, CHECKSUM is checked before ASYNC_COMMIT, so the inconsistent message above is printed and the ASYNC_COMMIT branch becomes dead code that can never be executed. Therefore, swap the two checks to make the error message accurate.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221109074343.4184862-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/super.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index fd7565707975..1bb2e902667d 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -4667,14 +4667,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent) goto failed_mount3a; } else { /* Nojournal mode, all journal mount options are illegal */ - if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) { + if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) { ext4_msg(sb, KERN_ERR, "can't mount with " - "journal_checksum, fs mounted w/o journal"); + "journal_async_commit, fs mounted w/o journal"); goto failed_mount3a; } - if (test_opt(sb, JOURNAL_ASYNC_COMMIT)) { + + if (test_opt2(sb, EXPLICIT_JOURNAL_CHECKSUM)) { ext4_msg(sb, KERN_ERR, "can't mount with " - "journal_async_commit, fs mounted w/o journal"); + "journal_checksum, fs mounted w/o journal"); goto failed_mount3a; } if (sbi->s_commit_interval != JBD2_DEFAULT_MAX_COMMIT_AGE*HZ) {
From: Jan Kara jack@suse.cz
[ Upstream commit 307af6c879377c1c63e71cbdd978201f9c7ee8df ]
Use the fact that entries with an elevated refcount are not removed from the hash, and just move removal of the entry from the hash to entry freeing time. While doing this, also change the generic code to hold one reference to the cache entry, not two, which makes the code somewhat more obvious.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220712105436.32204-10-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Stable-dep-of: a44e84a9b776 ("ext4: fix deadlock due to mbcache entry corruption") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/mbcache.c | 108 +++++++++++++++------------------------- include/linux/mbcache.h | 24 ++++++--- 2 files changed, 55 insertions(+), 77 deletions(-)
diff --git a/fs/mbcache.c b/fs/mbcache.c index 2010bc80a3f2..950f1829a7fd 100644 --- a/fs/mbcache.c +++ b/fs/mbcache.c @@ -90,7 +90,7 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, return -ENOMEM;
INIT_LIST_HEAD(&entry->e_list); - /* One ref for hash, one ref returned */ + /* Initial hash reference */ atomic_set(&entry->e_refcnt, 1); entry->e_key = key; entry->e_value = value; @@ -106,21 +106,28 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, } } hlist_bl_add_head(&entry->e_hash_list, head); - hlist_bl_unlock(head); - + /* + * Add entry to LRU list before it can be found by + * mb_cache_entry_delete() to avoid races + */ spin_lock(&cache->c_list_lock); list_add_tail(&entry->e_list, &cache->c_list); - /* Grab ref for LRU list */ - atomic_inc(&entry->e_refcnt); cache->c_entry_count++; spin_unlock(&cache->c_list_lock); + hlist_bl_unlock(head);
return 0; } EXPORT_SYMBOL(mb_cache_entry_create);
-void __mb_cache_entry_free(struct mb_cache_entry *entry) +void __mb_cache_entry_free(struct mb_cache *cache, struct mb_cache_entry *entry) { + struct hlist_bl_head *head; + + head = mb_cache_entry_head(cache, entry->e_key); + hlist_bl_lock(head); + hlist_bl_del(&entry->e_hash_list); + hlist_bl_unlock(head); kmem_cache_free(mb_entry_cache, entry); } EXPORT_SYMBOL(__mb_cache_entry_free); @@ -134,7 +141,7 @@ EXPORT_SYMBOL(__mb_cache_entry_free); */ void mb_cache_entry_wait_unused(struct mb_cache_entry *entry) { - wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 3); + wait_var_event(&entry->e_refcnt, atomic_read(&entry->e_refcnt) <= 2); } EXPORT_SYMBOL(mb_cache_entry_wait_unused);
@@ -155,10 +162,9 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache, while (node) { entry = hlist_bl_entry(node, struct mb_cache_entry, e_hash_list); - if (entry->e_key == key && entry->e_reusable) { - atomic_inc(&entry->e_refcnt); + if (entry->e_key == key && entry->e_reusable && + atomic_inc_not_zero(&entry->e_refcnt)) goto out; - } node = node->next; } entry = NULL; @@ -218,10 +224,9 @@ struct mb_cache_entry *mb_cache_entry_get(struct mb_cache *cache, u32 key, head = mb_cache_entry_head(cache, key); hlist_bl_lock(head); hlist_bl_for_each_entry(entry, node, head, e_hash_list) { - if (entry->e_key == key && entry->e_value == value) { - atomic_inc(&entry->e_refcnt); + if (entry->e_key == key && entry->e_value == value && + atomic_inc_not_zero(&entry->e_refcnt)) goto out; - } } entry = NULL; out: @@ -281,37 +286,25 @@ EXPORT_SYMBOL(mb_cache_entry_delete); struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache, u32 key, u64 value) { - struct hlist_bl_node *node; - struct hlist_bl_head *head; struct mb_cache_entry *entry;
- head = mb_cache_entry_head(cache, key); - hlist_bl_lock(head); - hlist_bl_for_each_entry(entry, node, head, e_hash_list) { - if (entry->e_key == key && entry->e_value == value) { - if (atomic_read(&entry->e_refcnt) > 2) { - atomic_inc(&entry->e_refcnt); - hlist_bl_unlock(head); - return entry; - } - /* We keep hash list reference to keep entry alive */ - hlist_bl_del_init(&entry->e_hash_list); - hlist_bl_unlock(head); - spin_lock(&cache->c_list_lock); - if (!list_empty(&entry->e_list)) { - list_del_init(&entry->e_list); - if (!WARN_ONCE(cache->c_entry_count == 0, - "mbcache: attempt to decrement c_entry_count past zero")) - cache->c_entry_count--; - atomic_dec(&entry->e_refcnt); - } - spin_unlock(&cache->c_list_lock); - mb_cache_entry_put(cache, entry); - return NULL; - } - } - hlist_bl_unlock(head); + entry = mb_cache_entry_get(cache, key, value); + if (!entry) + return NULL;
+ /* + * Drop the ref we got from mb_cache_entry_get() and the initial hash + * ref if we are the last user + */ + if (atomic_cmpxchg(&entry->e_refcnt, 2, 0) != 2) + return entry; + + spin_lock(&cache->c_list_lock); + if (!list_empty(&entry->e_list)) + list_del_init(&entry->e_list); + cache->c_entry_count--; + spin_unlock(&cache->c_list_lock); + __mb_cache_entry_free(cache, entry); return NULL; } EXPORT_SYMBOL(mb_cache_entry_delete_or_get); @@ -343,42 +336,24 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache, unsigned long nr_to_scan) { struct mb_cache_entry *entry; - struct hlist_bl_head *head; unsigned long shrunk = 0;
spin_lock(&cache->c_list_lock); while (nr_to_scan-- && !list_empty(&cache->c_list)) { entry = list_first_entry(&cache->c_list, struct mb_cache_entry, e_list); - if (entry->e_referenced || atomic_read(&entry->e_refcnt) > 2) { + /* Drop initial hash reference if there is no user */ + if (entry->e_referenced || + atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) { entry->e_referenced = 0; list_move_tail(&entry->e_list, &cache->c_list); continue; } list_del_init(&entry->e_list); cache->c_entry_count--; - /* - * We keep LRU list reference so that entry doesn't go away - * from under us. - */ spin_unlock(&cache->c_list_lock); - head = mb_cache_entry_head(cache, entry->e_key); - hlist_bl_lock(head); - /* Now a reliable check if the entry didn't get used... */ - if (atomic_read(&entry->e_refcnt) > 2) { - hlist_bl_unlock(head); - spin_lock(&cache->c_list_lock); - list_add_tail(&entry->e_list, &cache->c_list); - cache->c_entry_count++; - continue; - } - if (!hlist_bl_unhashed(&entry->e_hash_list)) { - hlist_bl_del_init(&entry->e_hash_list); - atomic_dec(&entry->e_refcnt); - } - hlist_bl_unlock(head); - if (mb_cache_entry_put(cache, entry)) - shrunk++; + __mb_cache_entry_free(cache, entry); + shrunk++; cond_resched(); spin_lock(&cache->c_list_lock); } @@ -470,11 +445,6 @@ void mb_cache_destroy(struct mb_cache *cache) * point. */ list_for_each_entry_safe(entry, next, &cache->c_list, e_list) { - if (!hlist_bl_unhashed(&entry->e_hash_list)) { - hlist_bl_del_init(&entry->e_hash_list); - atomic_dec(&entry->e_refcnt); - } else - WARN_ON(1); list_del(&entry->e_list); WARN_ON(atomic_read(&entry->e_refcnt) != 1); mb_cache_entry_put(cache, entry); diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index 8eca7f25c432..e9d5ece87794 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -13,8 +13,16 @@ struct mb_cache; struct mb_cache_entry { /* List of entries in cache - protected by cache->c_list_lock */ struct list_head e_list; - /* Hash table list - protected by hash chain bitlock */ + /* + * Hash table list - protected by hash chain bitlock. The entry is + * guaranteed to be hashed while e_refcnt > 0. + */ struct hlist_bl_node e_hash_list; + /* + * Entry refcount. Once it reaches zero, entry is unhashed and freed. + * While refcount > 0, the entry is guaranteed to stay in the hash and + * e.g. mb_cache_entry_try_delete() will fail. + */ atomic_t e_refcnt; /* Key in hash - stable during lifetime of the entry */ u32 e_key; @@ -29,20 +37,20 @@ void mb_cache_destroy(struct mb_cache *cache);
int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, u64 value, bool reusable); -void __mb_cache_entry_free(struct mb_cache_entry *entry); +void __mb_cache_entry_free(struct mb_cache *cache, + struct mb_cache_entry *entry); void mb_cache_entry_wait_unused(struct mb_cache_entry *entry); -static inline int mb_cache_entry_put(struct mb_cache *cache, - struct mb_cache_entry *entry) +static inline void mb_cache_entry_put(struct mb_cache *cache, + struct mb_cache_entry *entry) { unsigned int cnt = atomic_dec_return(&entry->e_refcnt);
if (cnt > 0) { - if (cnt <= 3) + if (cnt <= 2) wake_up_var(&entry->e_refcnt); - return 0; + return; } - __mb_cache_entry_free(entry); - return 1; + __mb_cache_entry_free(cache, entry); }
struct mb_cache_entry *mb_cache_entry_delete_or_get(struct mb_cache *cache,
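The heart of the new scheme is that a lookup only succeeds while the refcount is non-zero and the initial hash reference is dropped with a 1->0 cmpxchg. A userspace sketch of that pattern with C11 atomics follows; entry_get()/entry_try_free() are illustrative names, not the mbcache API.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct entry {
    atomic_int refcnt;                  /* 1 == only the hash reference */
};

/* Mirrors atomic_inc_not_zero() in __entry_find(): take a reference
 * only if the entry is still alive. */
static bool entry_get(struct entry *e)
{
    int old = atomic_load(&e->refcnt);

    while (old > 0) {
        if (atomic_compare_exchange_weak(&e->refcnt, &old, old + 1))
            return true;
    }
    return false;
}

/* Mirrors the shrinker's atomic_cmpxchg(&refcnt, 1, 0): drop the hash
 * reference only if nobody else holds the entry. */
static bool entry_try_free(struct entry *e)
{
    int expected = 1;

    return atomic_compare_exchange_strong(&e->refcnt, &expected, 0);
}

int main(void)
{
    struct entry e = { .refcnt = 1 };

    printf("get:      %d\n", entry_get(&e));       /* 1: refcnt is now 2 */
    printf("try_free: %d\n", entry_try_free(&e));  /* 0: still in use    */
    atomic_fetch_sub(&e.refcnt, 1);                /* user drops its ref */
    printf("try_free: %d\n", entry_try_free(&e));  /* 1: can be freed    */
    return 0;
}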
From: Jan Kara jack@suse.cz
[ Upstream commit a44e84a9b7764c72896f7241a0ec9ac7e7ef38dd ]
When manipulating xattr blocks, we can deadlock, infinitely looping inside ext4_xattr_block_set(), constantly finding an xattr block for reuse in mbcache but never being able to reuse it because its reference count is too big. This happens because the cache entry for the xattr block is marked as reusable (e_reusable set) even though its reference count is too big. Once this inconsistency occurs, the state persists indefinitely and ext4_xattr_block_set() keeps retrying forever.
The inconsistent state is caused by non-atomic update of e_reusable bit. e_reusable is part of a bitfield and e_reusable update can race with update of e_referenced bit in the same bitfield resulting in loss of one of the updates. Fix the problem by using atomic bitops instead.
This bug has been around for many years, but it became *much* easier to hit after commit 65f8b80053a1 ("ext4: fix race when reusing xattr blocks").
Cc: stable@vger.kernel.org Fixes: 6048c64b2609 ("mbcache: add reusable flag to cache entries") Fixes: 65f8b80053a1 ("ext4: fix race when reusing xattr blocks") Reported-and-tested-by: Jeremi Piotrowski jpiotrowski@linux.microsoft.com Reported-by: Thilo Fromm t-lo@linux.microsoft.com Link: https://lore.kernel.org/r/c77bf00f-4618-7149-56f1-b8d1664b9d07@linux.microso... Signed-off-by: Jan Kara jack@suse.cz Reviewed-by: Andreas Dilger adilger@dilger.ca Link: https://lore.kernel.org/r/20221123193950.16758-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/xattr.c | 4 ++-- fs/mbcache.c | 14 ++++++++------ include/linux/mbcache.h | 9 +++++++-- 3 files changed, 17 insertions(+), 10 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 5ac31d3baab4..b92da41e9640 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1281,7 +1281,7 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode, ce = mb_cache_entry_get(ea_block_cache, hash, bh->b_blocknr); if (ce) { - ce->e_reusable = 1; + set_bit(MBE_REUSABLE_B, &ce->e_flags); mb_cache_entry_put(ea_block_cache, ce); } } @@ -2045,7 +2045,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode, } BHDR(new_bh)->h_refcount = cpu_to_le32(ref); if (ref == EXT4_XATTR_REFCOUNT_MAX) - ce->e_reusable = 0; + clear_bit(MBE_REUSABLE_B, &ce->e_flags); ea_bdebug(new_bh, "reusing; refcount now=%d", ref); ext4_xattr_block_csum_set(inode, new_bh); diff --git a/fs/mbcache.c b/fs/mbcache.c index 950f1829a7fd..7a12ae87c806 100644 --- a/fs/mbcache.c +++ b/fs/mbcache.c @@ -94,8 +94,9 @@ int mb_cache_entry_create(struct mb_cache *cache, gfp_t mask, u32 key, atomic_set(&entry->e_refcnt, 1); entry->e_key = key; entry->e_value = value; - entry->e_reusable = reusable; - entry->e_referenced = 0; + entry->e_flags = 0; + if (reusable) + set_bit(MBE_REUSABLE_B, &entry->e_flags); head = mb_cache_entry_head(cache, key); hlist_bl_lock(head); hlist_bl_for_each_entry(dup, dup_node, head, e_hash_list) { @@ -162,7 +163,8 @@ static struct mb_cache_entry *__entry_find(struct mb_cache *cache, while (node) { entry = hlist_bl_entry(node, struct mb_cache_entry, e_hash_list); - if (entry->e_key == key && entry->e_reusable && + if (entry->e_key == key && + test_bit(MBE_REUSABLE_B, &entry->e_flags) && atomic_inc_not_zero(&entry->e_refcnt)) goto out; node = node->next; @@ -318,7 +320,7 @@ EXPORT_SYMBOL(mb_cache_entry_delete_or_get); void mb_cache_entry_touch(struct mb_cache *cache, struct mb_cache_entry *entry) { - entry->e_referenced = 1; + set_bit(MBE_REFERENCED_B, &entry->e_flags); } EXPORT_SYMBOL(mb_cache_entry_touch);
@@ -343,9 +345,9 @@ static unsigned long mb_cache_shrink(struct mb_cache *cache, entry = list_first_entry(&cache->c_list, struct mb_cache_entry, e_list); /* Drop initial hash reference if there is no user */ - if (entry->e_referenced || + if (test_bit(MBE_REFERENCED_B, &entry->e_flags) || atomic_cmpxchg(&entry->e_refcnt, 1, 0) != 1) { - entry->e_referenced = 0; + clear_bit(MBE_REFERENCED_B, &entry->e_flags); list_move_tail(&entry->e_list, &cache->c_list); continue; } diff --git a/include/linux/mbcache.h b/include/linux/mbcache.h index e9d5ece87794..591bc4cefe1d 100644 --- a/include/linux/mbcache.h +++ b/include/linux/mbcache.h @@ -10,6 +10,12 @@
struct mb_cache;
+/* Cache entry flags */ +enum { + MBE_REFERENCED_B = 0, + MBE_REUSABLE_B +}; + struct mb_cache_entry { /* List of entries in cache - protected by cache->c_list_lock */ struct list_head e_list; @@ -26,8 +32,7 @@ struct mb_cache_entry { atomic_t e_refcnt; /* Key in hash - stable during lifetime of the entry */ u32 e_key; - u32 e_referenced:1; - u32 e_reusable:1; + unsigned long e_flags; /* User provided value - stable during lifetime of the entry */ u64 e_value; };
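A userspace sketch of the shape of the fix, with C11 atomics standing in for set_bit()/clear_bit()/test_bit(): the two flags move out of a bitfield, whose updates are plain read-modify-writes of the shared word, into a flags word that is only touched with atomic bit operations.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum {
    MBE_REFERENCED_B = 0,
    MBE_REUSABLE_B   = 1,
};

struct entry {
    atomic_ulong flags;   /* replaces "u32 e_referenced:1; u32 e_reusable:1;" */
};

static void set_flag(struct entry *e, int bit)
{
    atomic_fetch_or(&e->flags, 1UL << bit);
}

static void clear_flag(struct entry *e, int bit)
{
    atomic_fetch_and(&e->flags, ~(1UL << bit));
}

static bool test_flag(struct entry *e, int bit)
{
    return atomic_load(&e->flags) & (1UL << bit);
}

int main(void)
{
    struct entry e = { .flags = 0 };

    /* Two CPUs can now flip the two bits concurrently without one
     * read-modify-write overwriting the other, unlike the bitfield. */
    set_flag(&e, MBE_REUSABLE_B);
    set_flag(&e, MBE_REFERENCED_B);
    clear_flag(&e, MBE_REFERENCED_B);

    printf("reusable=%d referenced=%d\n",
           test_flag(&e, MBE_REUSABLE_B),
           test_flag(&e, MBE_REFERENCED_B));
    return 0;
}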
From: Matthew Auld matthew.auld@intel.com
[ Upstream commit 8eb7fcce34d16f77ac8efa80e8dfecec2503e8c5 ]
The scratch page might not be allocated in LMEM (like on DG2), so instead of using that as the deciding factor for where the paging structures live, let's just query the pt before mapping it.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-1-matth... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_migrate.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index 1dac21aa7e5c..aa05c26ff792 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -13,7 +13,6 @@
struct insert_pte_data { u64 offset; - bool is_lmem; };
#define CHUNK_SZ SZ_8M /* ~1ms at 8GiB/s preemption delay */ @@ -40,7 +39,7 @@ static void insert_pte(struct i915_address_space *vm, struct insert_pte_data *d = data;
vm->insert_page(vm, px_dma(pt), d->offset, I915_CACHE_NONE, - d->is_lmem ? PTE_LM : 0); + i915_gem_object_is_lmem(pt->base) ? PTE_LM : 0); d->offset += PAGE_SIZE; }
@@ -134,7 +133,6 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm;
/* Now allow the GPU to rewrite the PTE via its own ppGTT */ - d.is_lmem = i915_gem_object_is_lmem(vm->vm.scratch[0]); vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d); }
From: Matthew Auld matthew.auld@intel.com
[ Upstream commit 08c7c122ad90799cc3ae674e7f29f236f91063ce ]
Ensure we add the engine base only after we calculate the qword offset into the PTE window.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-2-matth... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index aa05c26ff792..fb7fe3a2b6c6 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -279,10 +279,10 @@ static int emit_pte(struct i915_request *rq, GEM_BUG_ON(GRAPHICS_VER(rq->engine->i915) < 8);
/* Compute the page directory offset for the target address range */ - offset += (u64)rq->engine->instance << 32; offset >>= 12; offset *= sizeof(u64); offset += 2 * CHUNK_SZ; + offset += (u64)rq->engine->instance << 32;
cs = intel_ring_begin(rq, 6); if (IS_ERR(cs))
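A standalone arithmetic sketch of why the ordering matters (CHUNK_SZ and the sample values are only illustrative): the engine base lives in the upper 32 bits of the offset, so adding it before the byte offset is converted into a qword offset lets the >> 12 shift scale the base away.

#include <stdio.h>
#include <stdint.h>

#define CHUNK_SZ (8ull << 20)           /* 8M, as in the driver */

static uint64_t buggy(uint64_t offset, uint64_t instance)
{
    offset += instance << 32;           /* added too early */
    offset >>= 12;
    offset *= sizeof(uint64_t);
    offset += 2 * CHUNK_SZ;
    return offset;
}

static uint64_t fixed(uint64_t offset, uint64_t instance)
{
    offset >>= 12;
    offset *= sizeof(uint64_t);
    offset += 2 * CHUNK_SZ;
    offset += instance << 32;           /* added after the conversion */
    return offset;
}

int main(void)
{
    uint64_t off = 0x10000, inst = 1;

    printf("buggy: 0x%llx\n", (unsigned long long)buggy(off, inst));
    printf("fixed: 0x%llx\n", (unsigned long long)fixed(off, inst));
    return 0;
}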
From: Matthew Auld matthew.auld@intel.com
[ Upstream commit 31d70749bfe110593fbe8bf45e7c7788c7d85035 ]
There is no need to insert PTEs for the PTE window itself. Also, foreach expects a length, not an end offset, which could be gigantic here with a second engine.
Signed-off-by: Matthew Auld matthew.auld@intel.com Cc: Thomas Hellström thomas.hellstrom@linux.intel.com Cc: Ramalingam C ramalingam.c@intel.com Reviewed-by: Ramalingam C ramalingam.c@intel.com Link: https://patchwork.freedesktop.org/patch/msgid/20211206112539.3149779-3-matth... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gt/intel_migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_migrate.c b/drivers/gpu/drm/i915/gt/intel_migrate.c index fb7fe3a2b6c6..5b59a6effc20 100644 --- a/drivers/gpu/drm/i915/gt/intel_migrate.c +++ b/drivers/gpu/drm/i915/gt/intel_migrate.c @@ -133,7 +133,7 @@ static struct i915_address_space *migrate_vm(struct intel_gt *gt) goto err_vm;
/* Now allow the GPU to rewrite the PTE via its own ppGTT */ - vm->vm.foreach(&vm->vm, base, base + sz, insert_pte, &d); + vm->vm.foreach(&vm->vm, base, d.offset - base, insert_pte, &d); }
return &vm->vm;
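A tiny sketch of the second half of the fix: with a foreach-style helper that takes (start, length), passing an end offset such as base + sz walks a window that grows with the base address instead of staying sz bytes long. The page size and sample values are illustrative.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u

static unsigned int foreach_pages(uint64_t start, uint64_t length)
{
    unsigned int n = 0;

    for (uint64_t off = start; off < start + length; off += PAGE_SIZE)
        n++;
    return n;
}

int main(void)
{
    uint64_t base = 256ull << 20;       /* window placed at 256M */
    uint64_t sz = 8ull << 20;           /* intended window: 8M   */

    printf("length argument:    %u pages\n", foreach_pages(base, sz));
    printf("end-offset mistake: %u pages\n", foreach_pages(base, base + sz));
    return 0;
}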
From: minoura makoto minoura@valinux.co.jp
[ Upstream commit b18cba09e374637a0a3759d856a6bca94c133952 ]
Commit 9130b8dbc6ac ("SUNRPC: allow for upcalls for the same uid but different gss service") introduced the `auth` argument to __gss_find_upcall(), but in gss_pipe_downcall() it was left as NULL since it (and auth->service) was not (yet) determined.
When multiple upcalls with the same uid and different service are ongoing, it could happen that __gss_find_upcall(), which returns the first match found in the pipe->in_downcall list, could not find the correct gss_msg corresponding to the downcall we are looking for. Moreover, it might return a msg which is not sent to rpc.gssd yet.
We could see the mount.nfs process hang in D state when multiple mount.nfs commands were executed in parallel. The call trace below is from the CentOS 7.9 kernel-3.10.0-1160.24.1.el7.x86_64, but we observed the same hang with the elrepo kernel-ml-6.0.7-1.el7.
PID: 71258  TASK: ffff91ebd4be0000  CPU: 36  COMMAND: "mount.nfs"
 #0 [ffff9203ca3234f8] __schedule at ffffffffa3b8899f
 #1 [ffff9203ca323580] schedule at ffffffffa3b88eb9
 #2 [ffff9203ca323590] gss_cred_init at ffffffffc0355818 [auth_rpcgss]
 #3 [ffff9203ca323658] rpcauth_lookup_credcache at ffffffffc0421ebc [sunrpc]
 #4 [ffff9203ca3236d8] gss_lookup_cred at ffffffffc0353633 [auth_rpcgss]
 #5 [ffff9203ca3236e8] rpcauth_lookupcred at ffffffffc0421581 [sunrpc]
 #6 [ffff9203ca323740] rpcauth_refreshcred at ffffffffc04223d3 [sunrpc]
 #7 [ffff9203ca3237a0] call_refresh at ffffffffc04103dc [sunrpc]
 #8 [ffff9203ca3237b8] __rpc_execute at ffffffffc041e1c9 [sunrpc]
 #9 [ffff9203ca323820] rpc_execute at ffffffffc0420a48 [sunrpc]
The scenario is like this. Let's say there are two upcalls for services A and B, A -> B in pipe->in_downcall, B -> A in pipe->pipe.
When rpc.gssd reads pipe to get the upcall msg corresponding to service B from pipe->pipe and then writes the response, in gss_pipe_downcall the msg corresponding to service A will be picked because only uid is used to find the msg and it is before the one for B in pipe->in_downcall. And the process waiting for the msg corresponding to service A will be woken up.
Actual scheduling of that process might happen after rpc.gssd processes the next msg. In rpc_pipe_generic_upcall it clears msg->errno (for A). The process is then scheduled and sees gss_msg->ctx == NULL and gss_msg->msg.errno == 0, so it cannot break the loop in gss_create_upcall and is never woken up after that.
This patch adds a simple check to ensure that a msg which has not yet been sent to rpc.gssd is not chosen as the matching upcall when a downcall is received.
Signed-off-by: minoura makoto minoura@valinux.co.jp Signed-off-by: Hiroshi Shimamoto h-shimamoto@nec.com Tested-by: Hiroshi Shimamoto h-shimamoto@nec.com Cc: Trond Myklebust trondmy@hammerspace.com Fixes: 9130b8dbc6ac ("SUNRPC: allow for upcalls for same uid but different gss service") Signed-off-by: Trond Myklebust trond.myklebust@hammerspace.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/sunrpc/rpc_pipe_fs.h | 5 +++++ net/sunrpc/auth_gss/auth_gss.c | 19 +++++++++++++++++-- 2 files changed, 22 insertions(+), 2 deletions(-)
diff --git a/include/linux/sunrpc/rpc_pipe_fs.h b/include/linux/sunrpc/rpc_pipe_fs.h index cd188a527d16..3b35b6f6533a 100644 --- a/include/linux/sunrpc/rpc_pipe_fs.h +++ b/include/linux/sunrpc/rpc_pipe_fs.h @@ -92,6 +92,11 @@ extern ssize_t rpc_pipe_generic_upcall(struct file *, struct rpc_pipe_msg *, char __user *, size_t); extern int rpc_queue_upcall(struct rpc_pipe *, struct rpc_pipe_msg *);
+/* returns true if the msg is in-flight, i.e., already eaten by the peer */ +static inline bool rpc_msg_is_inflight(const struct rpc_pipe_msg *msg) { + return (msg->copied != 0 && list_empty(&msg->list)); +} + struct rpc_clnt; extern struct dentry *rpc_create_client_dir(struct dentry *, const char *, struct rpc_clnt *); extern int rpc_remove_client_dir(struct rpc_clnt *); diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c index 5f42aa5fc612..2ff66a6a7e54 100644 --- a/net/sunrpc/auth_gss/auth_gss.c +++ b/net/sunrpc/auth_gss/auth_gss.c @@ -301,7 +301,7 @@ __gss_find_upcall(struct rpc_pipe *pipe, kuid_t uid, const struct gss_auth *auth list_for_each_entry(pos, &pipe->in_downcall, list) { if (!uid_eq(pos->uid, uid)) continue; - if (auth && pos->auth->service != auth->service) + if (pos->auth->service != auth->service) continue; refcount_inc(&pos->count); return pos; @@ -685,6 +685,21 @@ gss_create_upcall(struct gss_auth *gss_auth, struct gss_cred *gss_cred) return err; }
+static struct gss_upcall_msg * +gss_find_downcall(struct rpc_pipe *pipe, kuid_t uid) +{ + struct gss_upcall_msg *pos; + list_for_each_entry(pos, &pipe->in_downcall, list) { + if (!uid_eq(pos->uid, uid)) + continue; + if (!rpc_msg_is_inflight(&pos->msg)) + continue; + refcount_inc(&pos->count); + return pos; + } + return NULL; +} + #define MSG_BUF_MAXSIZE 1024
static ssize_t @@ -731,7 +746,7 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen) err = -ENOENT; /* Find a matching upcall */ spin_lock(&pipe->lock); - gss_msg = __gss_find_upcall(pipe, uid, NULL); + gss_msg = gss_find_downcall(pipe, uid); if (gss_msg == NULL) { spin_unlock(&pipe->lock); goto err_put_ctx;
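The effect of the new gss_find_downcall() can be sketched in plain C: only messages that the daemon has already read (some bytes copied, no longer sitting on the upcall queue) are candidates for matching a downcall. The structures, uids and services below are invented stand-ins, not the SUNRPC types.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct upcall_msg {
    unsigned int uid;
    int service;
    size_t copied;      /* bytes already read by the daemon  */
    bool queued;        /* still on the pipe's upcall queue? */
};

static bool msg_is_inflight(const struct upcall_msg *m)
{
    return m->copied != 0 && !m->queued;
}

static const struct upcall_msg *
find_downcall(const struct upcall_msg *msgs, size_t n, unsigned int uid)
{
    for (size_t i = 0; i < n; i++) {
        if (msgs[i].uid != uid)
            continue;
        if (!msg_is_inflight(&msgs[i]))
            continue;   /* not sent to the daemon yet */
        return &msgs[i];
    }
    return NULL;
}

int main(void)
{
    /* The upcall for service A was queued first but not read yet;
     * the daemon is actually answering the one for service B. */
    struct upcall_msg msgs[] = {
        { .uid = 1000, .service = 'A', .copied = 0,  .queued = true  },
        { .uid = 1000, .service = 'B', .copied = 64, .queued = false },
    };

    const struct upcall_msg *m = find_downcall(msgs, 2, 1000);
    printf("matched service: %c\n", m ? m->service : '?');
    return 0;
}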
From: Christophe JAILLET christophe.jaillet@wanadoo.fr
[ Upstream commit db0a4a7b8e95f9312a59a67cbd5bc589f090e13d ]
All error handling paths end at 'out', except this memory allocation failure.
This is inconsistent, so branch to the error handling path in this case as well. It will add a call to:
memset(&root->defrag_progress, 0, sizeof(root->defrag_progress));
Fixes: 6702ed490ca0 ("Btrfs: Add run time btree defrag, and an ioctl to force btree defrag") Signed-off-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/tree-defrag.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-defrag.c b/fs/btrfs/tree-defrag.c index 7c45d960b53c..259a3b5f9303 100644 --- a/fs/btrfs/tree-defrag.c +++ b/fs/btrfs/tree-defrag.c @@ -39,8 +39,10 @@ int btrfs_defrag_leaves(struct btrfs_trans_handle *trans, goto out;
path = btrfs_alloc_path(); - if (!path) - return -ENOMEM; + if (!path) { + ret = -ENOMEM; + goto out; + }
level = btrfs_header_level(root->node);
From: Jakub Kicinski kuba@kernel.org
[ Upstream commit 54c3f1a81421f85e60ae2eaae7be3727a09916ee ]
Anand hit a BUG() when pulling off headers on egress to a SW tunnel. We get to skb_checksum_help() with an invalid checksum offset (commit d7ea0d9df2a6 ("net: remove two BUG() from skb_checksum_help()") converted those BUGs to WARN_ONs()). He points out oddness in how skb_postpull_rcsum() gets used. Indeed, it looks like we should pull before "postpull"; otherwise the CHECKSUM_PARTIAL fixup from skb_postpull_rcsum() will not be able to do its job:
	if (skb->ip_summed == CHECKSUM_PARTIAL &&
	    skb_checksum_start_offset(skb) < 0)
		skb->ip_summed = CHECKSUM_NONE;
Reported-by: Anand Parthasarathy anpartha@meta.com Fixes: 6578171a7ff0 ("bpf: add bpf_skb_change_proto helper") Signed-off-by: Jakub Kicinski kuba@kernel.org Acked-by: Stanislav Fomichev sdf@google.com Link: https://lore.kernel.org/r/20221220004701.402165-1-kuba@kernel.org Signed-off-by: Martin KaFai Lau martin.lau@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/filter.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/net/core/filter.c b/net/core/filter.c index 2da05622afbe..b2031148dd8b 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -3182,15 +3182,18 @@ static int bpf_skb_generic_push(struct sk_buff *skb, u32 off, u32 len)
static int bpf_skb_generic_pop(struct sk_buff *skb, u32 off, u32 len) { + void *old_data; + /* skb_ensure_writable() is not needed here, as we're * already working on an uncloned skb. */ if (unlikely(!pskb_may_pull(skb, off + len))) return -ENOMEM;
- skb_postpull_rcsum(skb, skb->data + off, len); - memmove(skb->data + len, skb->data, off); + old_data = skb->data; __skb_pull(skb, len); + skb_postpull_rcsum(skb, old_data + off, len); + memmove(skb->data, old_data, off);
return 0; }
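A simplified model of why the order matters; the field names loosely mirror the skb ones and the offsets are made up. The CHECKSUM_PARTIAL fixup compares the checksum start with the current data pointer, so it can only notice that the checksum start has been pulled off after the pull has actually happened.

#include <stdbool.h>
#include <stdio.h>

struct fake_skb {
    int data;           /* how far the data pointer has been pulled */
    int csum_start;     /* where checksumming would begin           */
    bool partial;       /* CHECKSUM_PARTIAL                         */
};

/* Mirrors the CHECKSUM_PARTIAL branch quoted above: if the checksum
 * start is now in front of the data pointer, downgrade the skb. */
static void postpull_fixup(struct fake_skb *skb)
{
    if (skb->partial && skb->csum_start - skb->data < 0)
        skb->partial = false;           /* CHECKSUM_NONE */
}

static void pull(struct fake_skb *skb, int len)
{
    skb->data += len;
}

int main(void)
{
    struct fake_skb a = { .data = 0, .csum_start = 14, .partial = true };
    struct fake_skb b = a;

    /* Old order: the fixup runs before the pull and sees nothing wrong. */
    postpull_fixup(&a);
    pull(&a, 20);

    /* Fixed order: pull first, then the fixup can do its job. */
    pull(&b, 20);
    postpull_fixup(&b);

    printf("old order leaves CHECKSUM_PARTIAL: %d\n", a.partial);
    printf("new order downgrades it:           %d\n", !b.partial);
    return 0;
}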
From: Steven Price steven.price@arm.com
[ Upstream commit 4217c6ac817451d5116687f3cc6286220dc43d49 ]
panfrost_gem_create_with_handle() previously returned a BO whose only reference was the one held by the handle, which user space could in theory guess and release, causing a use-after-free. Additionally, if the call to panfrost_gem_mapping_get() in panfrost_ioctl_create_bo() failed, then a(nother) reference on the BO was dropped.
The _create_with_handle() pattern is problematic, so ditch it and instead create the handle in panfrost_ioctl_create_bo(). If the call to panfrost_gem_mapping_get() fails, it means that user space has indeed gone behind our back and freed the handle; in that case just return an error code.
Reported-by: Rob Clark robdclark@chromium.org Fixes: f3ba91228e8e ("drm/panfrost: Add initial panfrost driver") Signed-off-by: Steven Price steven.price@arm.com Reviewed-by: Rob Clark robdclark@gmail.com Link: https://patchwork.freedesktop.org/patch/msgid/20221219140130.410578-1-steven... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/panfrost/panfrost_drv.c | 27 ++++++++++++++++--------- drivers/gpu/drm/panfrost/panfrost_gem.c | 16 +-------------- drivers/gpu/drm/panfrost/panfrost_gem.h | 5 +---- 3 files changed, 20 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/panfrost/panfrost_drv.c b/drivers/gpu/drm/panfrost/panfrost_drv.c index e48e357ea4f1..4c271244092b 100644 --- a/drivers/gpu/drm/panfrost/panfrost_drv.c +++ b/drivers/gpu/drm/panfrost/panfrost_drv.c @@ -82,6 +82,7 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, struct panfrost_gem_object *bo; struct drm_panfrost_create_bo *args = data; struct panfrost_gem_mapping *mapping; + int ret;
if (!args->size || args->pad || (args->flags & ~(PANFROST_BO_NOEXEC | PANFROST_BO_HEAP))) @@ -92,21 +93,29 @@ static int panfrost_ioctl_create_bo(struct drm_device *dev, void *data, !(args->flags & PANFROST_BO_NOEXEC)) return -EINVAL;
- bo = panfrost_gem_create_with_handle(file, dev, args->size, args->flags, - &args->handle); + bo = panfrost_gem_create(dev, args->size, args->flags); if (IS_ERR(bo)) return PTR_ERR(bo);
+ ret = drm_gem_handle_create(file, &bo->base.base, &args->handle); + if (ret) + goto out; + mapping = panfrost_gem_mapping_get(bo, priv); - if (!mapping) { - drm_gem_object_put(&bo->base.base); - return -EINVAL; + if (mapping) { + args->offset = mapping->mmnode.start << PAGE_SHIFT; + panfrost_gem_mapping_put(mapping); + } else { + /* This can only happen if the handle from + * drm_gem_handle_create() has already been guessed and freed + * by user space + */ + ret = -EINVAL; }
- args->offset = mapping->mmnode.start << PAGE_SHIFT; - panfrost_gem_mapping_put(mapping); - - return 0; +out: + drm_gem_object_put(&bo->base.base); + return ret; }
/** diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.c b/drivers/gpu/drm/panfrost/panfrost_gem.c index 6d9bdb9180cb..55e3a68ed28a 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.c +++ b/drivers/gpu/drm/panfrost/panfrost_gem.c @@ -234,12 +234,8 @@ struct drm_gem_object *panfrost_gem_create_object(struct drm_device *dev, size_t }
struct panfrost_gem_object * -panfrost_gem_create_with_handle(struct drm_file *file_priv, - struct drm_device *dev, size_t size, - u32 flags, - uint32_t *handle) +panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags) { - int ret; struct drm_gem_shmem_object *shmem; struct panfrost_gem_object *bo;
@@ -255,16 +251,6 @@ panfrost_gem_create_with_handle(struct drm_file *file_priv, bo->noexec = !!(flags & PANFROST_BO_NOEXEC); bo->is_heap = !!(flags & PANFROST_BO_HEAP);
- /* - * Allocate an id of idr table where the obj is registered - * and handle has the id what user can see. - */ - ret = drm_gem_handle_create(file_priv, &shmem->base, handle); - /* drop reference from allocate - handle holds it now. */ - drm_gem_object_put(&shmem->base); - if (ret) - return ERR_PTR(ret); - return bo; }
diff --git a/drivers/gpu/drm/panfrost/panfrost_gem.h b/drivers/gpu/drm/panfrost/panfrost_gem.h index 8088d5fd8480..ad2877eeeccd 100644 --- a/drivers/gpu/drm/panfrost/panfrost_gem.h +++ b/drivers/gpu/drm/panfrost/panfrost_gem.h @@ -69,10 +69,7 @@ panfrost_gem_prime_import_sg_table(struct drm_device *dev, struct sg_table *sgt);
struct panfrost_gem_object * -panfrost_gem_create_with_handle(struct drm_file *file_priv, - struct drm_device *dev, size_t size, - u32 flags, - uint32_t *handle); +panfrost_gem_create(struct drm_device *dev, size_t size, u32 flags);
int panfrost_gem_open(struct drm_gem_object *obj, struct drm_file *file_priv); void panfrost_gem_close(struct drm_gem_object *obj,
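The reference-ownership change can be sketched with a toy refcount; bo_get()/bo_put() are invented helpers, not the DRM API. In the old flow the handle held the only reference, so user space guessing and closing the handle freed the BO underneath the ioctl; in the new flow the ioctl keeps its own reference until it returns.

#include <stdio.h>

struct bo { int refcnt; };

static void bo_get(struct bo *bo) { bo->refcnt++; }

static void bo_put(struct bo *bo)
{
    if (--bo->refcnt == 0)
        printf("  bo freed\n");
}

/* Old flow: the handle owns the only reference. */
static void old_flow(void)
{
    struct bo bo = { .refcnt = 1 };     /* allocation reference         */

    bo_get(&bo);                        /* handle reference             */
    bo_put(&bo);                        /* allocation reference dropped */
    printf("old flow:\n");
    bo_put(&bo);                        /* user closes the handle ...   */
    /* ... and any further use of 'bo' here is a use-after-free.        */
}

/* New flow: the ioctl keeps its own reference until the end. */
static void new_flow(void)
{
    struct bo bo = { .refcnt = 1 };     /* ioctl reference              */

    bo_get(&bo);                        /* handle reference             */
    printf("new flow:\n");
    bo_put(&bo);                        /* user closes the handle ...   */
    printf("  bo still valid, refcnt=%d\n", bo.refcnt);
    bo_put(&bo);                        /* ioctl drops its ref last     */
}

int main(void)
{
    old_flow();
    new_flow();
    return 0;
}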
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit bed4a63ea4ae77cfe5aae004ef87379f0655260a ]
Add the following fields to the set description:
- key type
- data type
- object type
- policy
- gc_int: garbage collection interval
- timeout: element timeout
This prepares for stricter set type checks on updates in a follow up patch.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: f6594c372afd ("netfilter: nf_tables: perform type checking for existing sets") Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 12 +++++++ net/netfilter/nf_tables_api.c | 58 +++++++++++++++---------------- 2 files changed, 40 insertions(+), 30 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 53746494eb84..5377dbfba120 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -283,17 +283,29 @@ struct nft_set_iter { /** * struct nft_set_desc - description of set elements * + * @ktype: key type * @klen: key length + * @dtype: data type * @dlen: data length + * @objtype: object type + * @flags: flags * @size: number of set elements + * @policy: set policy + * @gc_int: garbage collector interval * @field_len: length of each field in concatenation, bytes * @field_count: number of concatenated fields in element * @expr: set must support for expressions */ struct nft_set_desc { + u32 ktype; unsigned int klen; + u32 dtype; unsigned int dlen; + u32 objtype; unsigned int size; + u32 policy; + u32 gc_int; + u64 timeout; u8 field_len[NFT_REG32_COUNT]; u8 field_count; bool expr; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 3fac57d66dda..dd19726a9ac9 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -3635,8 +3635,7 @@ static bool nft_set_ops_candidate(const struct nft_set_type *type, u32 flags) static const struct nft_set_ops * nft_select_set_ops(const struct nft_ctx *ctx, const struct nlattr * const nla[], - const struct nft_set_desc *desc, - enum nft_set_policies policy) + const struct nft_set_desc *desc) { struct nftables_pernet *nft_net = nft_pernet(ctx->net); const struct nft_set_ops *ops, *bops; @@ -3665,7 +3664,7 @@ nft_select_set_ops(const struct nft_ctx *ctx, if (!ops->estimate(desc, flags, &est)) continue;
- switch (policy) { + switch (desc->policy) { case NFT_SET_POL_PERFORMANCE: if (est.lookup < best.lookup) break; @@ -4247,7 +4246,6 @@ static int nf_tables_set_desc_parse(struct nft_set_desc *desc, static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { - u32 ktype, dtype, flags, policy, gc_int, objtype; struct netlink_ext_ack *extack = info->extack; u8 genmask = nft_genmask_next(info->net); u8 family = info->nfmsg->nfgen_family; @@ -4260,10 +4258,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, struct nft_set *set; struct nft_ctx ctx; size_t alloc_size; - u64 timeout; char *name; int err, i; u16 udlen; + u32 flags; u64 size;
if (nla[NFTA_SET_TABLE] == NULL || @@ -4274,10 +4272,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info,
memset(&desc, 0, sizeof(desc));
- ktype = NFT_DATA_VALUE; + desc.ktype = NFT_DATA_VALUE; if (nla[NFTA_SET_KEY_TYPE] != NULL) { - ktype = ntohl(nla_get_be32(nla[NFTA_SET_KEY_TYPE])); - if ((ktype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK) + desc.ktype = ntohl(nla_get_be32(nla[NFTA_SET_KEY_TYPE])); + if ((desc.ktype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK) return -EINVAL; }
@@ -4302,17 +4300,17 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, return -EOPNOTSUPP; }
- dtype = 0; + desc.dtype = 0; if (nla[NFTA_SET_DATA_TYPE] != NULL) { if (!(flags & NFT_SET_MAP)) return -EINVAL;
- dtype = ntohl(nla_get_be32(nla[NFTA_SET_DATA_TYPE])); - if ((dtype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK && - dtype != NFT_DATA_VERDICT) + desc.dtype = ntohl(nla_get_be32(nla[NFTA_SET_DATA_TYPE])); + if ((desc.dtype & NFT_DATA_RESERVED_MASK) == NFT_DATA_RESERVED_MASK && + desc.dtype != NFT_DATA_VERDICT) return -EINVAL;
- if (dtype != NFT_DATA_VERDICT) { + if (desc.dtype != NFT_DATA_VERDICT) { if (nla[NFTA_SET_DATA_LEN] == NULL) return -EINVAL; desc.dlen = ntohl(nla_get_be32(nla[NFTA_SET_DATA_LEN])); @@ -4327,34 +4325,34 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, if (!(flags & NFT_SET_OBJECT)) return -EINVAL;
- objtype = ntohl(nla_get_be32(nla[NFTA_SET_OBJ_TYPE])); - if (objtype == NFT_OBJECT_UNSPEC || - objtype > NFT_OBJECT_MAX) + desc.objtype = ntohl(nla_get_be32(nla[NFTA_SET_OBJ_TYPE])); + if (desc.objtype == NFT_OBJECT_UNSPEC || + desc.objtype > NFT_OBJECT_MAX) return -EOPNOTSUPP; } else if (flags & NFT_SET_OBJECT) return -EINVAL; else - objtype = NFT_OBJECT_UNSPEC; + desc.objtype = NFT_OBJECT_UNSPEC;
- timeout = 0; + desc.timeout = 0; if (nla[NFTA_SET_TIMEOUT] != NULL) { if (!(flags & NFT_SET_TIMEOUT)) return -EINVAL;
- err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &timeout); + err = nf_msecs_to_jiffies64(nla[NFTA_SET_TIMEOUT], &desc.timeout); if (err) return err; } - gc_int = 0; + desc.gc_int = 0; if (nla[NFTA_SET_GC_INTERVAL] != NULL) { if (!(flags & NFT_SET_TIMEOUT)) return -EINVAL; - gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL])); + desc.gc_int = ntohl(nla_get_be32(nla[NFTA_SET_GC_INTERVAL])); }
- policy = NFT_SET_POL_PERFORMANCE; + desc.policy = NFT_SET_POL_PERFORMANCE; if (nla[NFTA_SET_POLICY] != NULL) - policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY])); + desc.policy = ntohl(nla_get_be32(nla[NFTA_SET_POLICY]));
if (nla[NFTA_SET_DESC] != NULL) { err = nf_tables_set_desc_parse(&desc, nla[NFTA_SET_DESC]); @@ -4399,7 +4397,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, if (!(info->nlh->nlmsg_flags & NLM_F_CREATE)) return -ENOENT;
- ops = nft_select_set_ops(&ctx, nla, &desc, policy); + ops = nft_select_set_ops(&ctx, nla, &desc); if (IS_ERR(ops)) return PTR_ERR(ops);
@@ -4439,18 +4437,18 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, set->table = table; write_pnet(&set->net, net); set->ops = ops; - set->ktype = ktype; + set->ktype = desc.ktype; set->klen = desc.klen; - set->dtype = dtype; - set->objtype = objtype; + set->dtype = desc.dtype; + set->objtype = desc.objtype; set->dlen = desc.dlen; set->flags = flags; set->size = desc.size; - set->policy = policy; + set->policy = desc.policy; set->udlen = udlen; set->udata = udata; - set->timeout = timeout; - set->gc_int = gc_int; + set->timeout = desc.timeout; + set->gc_int = desc.gc_int;
set->field_count = desc.field_count; for (i = 0; i < desc.field_count; i++)
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit a8fe4154fa5a1bae590b243ed60f871e5a5e1378 ]
Add a helper function to allocate and initialize the stateful expressions that are defined in a set.
This allows the code to be reused from the set update path, to check that the type of the update matches the existing set in the kernel.
Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Stable-dep-of: f6594c372afd ("netfilter: nf_tables: perform type checking for existing sets") Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 106 ++++++++++++++++++++++------------ 1 file changed, 68 insertions(+), 38 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index dd19726a9ac9..f892a926fb58 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4243,6 +4243,59 @@ static int nf_tables_set_desc_parse(struct nft_set_desc *desc, return err; }
+static int nft_set_expr_alloc(struct nft_ctx *ctx, struct nft_set *set, + const struct nlattr * const *nla, + struct nft_expr **exprs, int *num_exprs, + u32 flags) +{ + struct nft_expr *expr; + int err, i; + + if (nla[NFTA_SET_EXPR]) { + expr = nft_set_elem_expr_alloc(ctx, set, nla[NFTA_SET_EXPR]); + if (IS_ERR(expr)) { + err = PTR_ERR(expr); + goto err_set_expr_alloc; + } + exprs[0] = expr; + (*num_exprs)++; + } else if (nla[NFTA_SET_EXPRESSIONS]) { + struct nlattr *tmp; + int left; + + if (!(flags & NFT_SET_EXPR)) { + err = -EINVAL; + goto err_set_expr_alloc; + } + i = 0; + nla_for_each_nested(tmp, nla[NFTA_SET_EXPRESSIONS], left) { + if (i == NFT_SET_EXPR_MAX) { + err = -E2BIG; + goto err_set_expr_alloc; + } + if (nla_type(tmp) != NFTA_LIST_ELEM) { + err = -EINVAL; + goto err_set_expr_alloc; + } + expr = nft_set_elem_expr_alloc(ctx, set, tmp); + if (IS_ERR(expr)) { + err = PTR_ERR(expr); + goto err_set_expr_alloc; + } + exprs[i++] = expr; + (*num_exprs)++; + } + } + + return 0; + +err_set_expr_alloc: + for (i = 0; i < *num_exprs; i++) + nft_expr_destroy(ctx, exprs[i]); + + return err; +} + static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { @@ -4250,7 +4303,6 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, u8 genmask = nft_genmask_next(info->net); u8 family = info->nfmsg->nfgen_family; const struct nft_set_ops *ops; - struct nft_expr *expr = NULL; struct net *net = info->net; struct nft_set_desc desc; struct nft_table *table; @@ -4258,6 +4310,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, struct nft_set *set; struct nft_ctx ctx; size_t alloc_size; + int num_exprs = 0; char *name; int err, i; u16 udlen; @@ -4384,6 +4437,8 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, return PTR_ERR(set); } } else { + struct nft_expr *exprs[NFT_SET_EXPR_MAX] = {}; + if (info->nlh->nlmsg_flags & NLM_F_EXCL) { NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]); return -EEXIST; @@ -4391,6 +4446,13 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, if (info->nlh->nlmsg_flags & NLM_F_REPLACE) return -EOPNOTSUPP;
+ err = nft_set_expr_alloc(&ctx, set, nla, exprs, &num_exprs, flags); + if (err < 0) + return err; + + for (i = 0; i < num_exprs; i++) + nft_expr_destroy(&ctx, exprs[i]); + return 0; }
@@ -4458,43 +4520,11 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, if (err < 0) goto err_set_init;
- if (nla[NFTA_SET_EXPR]) { - expr = nft_set_elem_expr_alloc(&ctx, set, nla[NFTA_SET_EXPR]); - if (IS_ERR(expr)) { - err = PTR_ERR(expr); - goto err_set_expr_alloc; - } - set->exprs[0] = expr; - set->num_exprs++; - } else if (nla[NFTA_SET_EXPRESSIONS]) { - struct nft_expr *expr; - struct nlattr *tmp; - int left; - - if (!(flags & NFT_SET_EXPR)) { - err = -EINVAL; - goto err_set_expr_alloc; - } - i = 0; - nla_for_each_nested(tmp, nla[NFTA_SET_EXPRESSIONS], left) { - if (i == NFT_SET_EXPR_MAX) { - err = -E2BIG; - goto err_set_expr_alloc; - } - if (nla_type(tmp) != NFTA_LIST_ELEM) { - err = -EINVAL; - goto err_set_expr_alloc; - } - expr = nft_set_elem_expr_alloc(&ctx, set, tmp); - if (IS_ERR(expr)) { - err = PTR_ERR(expr); - goto err_set_expr_alloc; - } - set->exprs[i++] = expr; - set->num_exprs++; - } - } + err = nft_set_expr_alloc(&ctx, set, nla, set->exprs, &num_exprs, flags); + if (err < 0) + goto err_set_destroy;
+ set->num_exprs = num_exprs; set->handle = nf_tables_alloc_handle(table);
err = nft_trans_set_add(&ctx, NFT_MSG_NEWSET, set); @@ -4508,7 +4538,7 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, err_set_expr_alloc: for (i = 0; i < set->num_exprs; i++) nft_expr_destroy(&ctx, set->exprs[i]); - +err_set_destroy: ops->destroy(set); err_set_init: kfree(set->name);
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit f6594c372afd5cec8b1e9ee9ea8f8819d59c6fb1 ]
If a ruleset declares a set name that matches an existing set in the kernel, then validate that this declaration really refers to the same set, otherwise bail out with EEXIST.
Currently, the kernel reports success when adding a set that already exists in the kernel. This usually results in EINVAL errors at a later stage, when the user adds elements to the set, if the set declaration does not match the existing set representation in the kernel.
Add a new function to check that the set declaration really refers to the same existing set in the kernel.
Fixes: 96518518cc41 ("netfilter: add nftables") Reported-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/nf_tables_api.c | 36 ++++++++++++++++++++++++++++++++++- 1 file changed, 35 insertions(+), 1 deletion(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index f892a926fb58..82fe54b64714 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -4296,6 +4296,34 @@ static int nft_set_expr_alloc(struct nft_ctx *ctx, struct nft_set *set, return err; }
+static bool nft_set_is_same(const struct nft_set *set, + const struct nft_set_desc *desc, + struct nft_expr *exprs[], u32 num_exprs, u32 flags) +{ + int i; + + if (set->ktype != desc->ktype || + set->dtype != desc->dtype || + set->flags != flags || + set->klen != desc->klen || + set->dlen != desc->dlen || + set->field_count != desc->field_count || + set->num_exprs != num_exprs) + return false; + + for (i = 0; i < desc->field_count; i++) { + if (set->field_len[i] != desc->field_len[i]) + return false; + } + + for (i = 0; i < num_exprs; i++) { + if (set->exprs[i]->ops != exprs[i]->ops) + return false; + } + + return true; +} + static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, const struct nlattr * const nla[]) { @@ -4450,10 +4478,16 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, if (err < 0) return err;
+ err = 0; + if (!nft_set_is_same(set, &desc, exprs, num_exprs, flags)) { + NL_SET_BAD_ATTR(extack, nla[NFTA_SET_NAME]); + err = -EEXIST; + } + for (i = 0; i < num_exprs; i++) nft_expr_destroy(&ctx, exprs[i]);
- return 0; + return err; }
if (!(info->nlh->nlmsg_flags & NLM_F_CREATE))
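The shape of the new check, reduced to a userspace sketch with a trimmed-down field set and invented values: an add request that names an existing set must describe the same layout, otherwise it is now rejected with EEXIST instead of being reported as success.

#include <stdbool.h>
#include <errno.h>
#include <stdio.h>

struct set_desc {
    unsigned int ktype, klen;
    unsigned int dtype, dlen;
    unsigned int flags;
};

static bool set_is_same(const struct set_desc *a, const struct set_desc *b)
{
    return a->ktype == b->ktype && a->klen == b->klen &&
           a->dtype == b->dtype && a->dlen == b->dlen &&
           a->flags == b->flags;
}

static int new_set(const struct set_desc *existing, const struct set_desc *req)
{
    if (existing)       /* set exists: only accept a matching declaration */
        return set_is_same(existing, req) ? 0 : -EEXIST;
    return 0;           /* would create the set here */
}

int main(void)
{
    struct set_desc cur   = { .ktype = 13, .klen = 4 };
    struct set_desc same  = cur;
    struct set_desc other = { .ktype = 13, .klen = 16 };

    printf("matching redeclaration: %d\n", new_set(&cur, &same));
    printf("mismatched declaration: %d\n", new_set(&cur, &other));
    return 0;
}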
From: Ronak Doshi doshir@vmware.com
[ Upstream commit 3d8f2c4269d08f8793e946279dbdf5e972cc4911 ]
Commit dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") added support for encapsulation offload. However, the patch did not correctly report the csum_level for encapsulated packets.
This patch fixes the issue by reporting the correct csum level for encapsulated packets.
Fixes: dacce2be3312 ("vmxnet3: add geneve and vxlan tunnel offload support") Signed-off-by: Ronak Doshi doshir@vmware.com Acked-by: Peng Li lpeng@vmware.com Link: https://lore.kernel.org/r/20221220202556.24421-1-doshir@vmware.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/vmxnet3/vmxnet3_drv.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 21896e221300..b88092a6bc85 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1242,6 +1242,10 @@ vmxnet3_rx_csum(struct vmxnet3_adapter *adapter, (le32_to_cpu(gdesc->dword[3]) & VMXNET3_RCD_CSUM_OK) == VMXNET3_RCD_CSUM_OK) { skb->ip_summed = CHECKSUM_UNNECESSARY; + if ((le32_to_cpu(gdesc->dword[0]) & + (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))) { + skb->csum_level = 1; + } WARN_ON_ONCE(!(gdesc->rcd.tcp || gdesc->rcd.udp) && !(le32_to_cpu(gdesc->dword[0]) & (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))); @@ -1251,6 +1255,10 @@ vmxnet3_rx_csum(struct vmxnet3_adapter *adapter, } else if (gdesc->rcd.v6 && (le32_to_cpu(gdesc->dword[3]) & (1 << VMXNET3_RCD_TUC_SHIFT))) { skb->ip_summed = CHECKSUM_UNNECESSARY; + if ((le32_to_cpu(gdesc->dword[0]) & + (1UL << VMXNET3_RCD_HDR_INNER_SHIFT))) { + skb->csum_level = 1; + } WARN_ON_ONCE(!(gdesc->rcd.tcp || gdesc->rcd.udp) && !(le32_to_cpu(gdesc->dword[0]) & (1UL << VMXNET3_RCD_HDR_INNER_SHIFT)));
From: Pablo Neira Ayuso pablo@netfilter.org
[ Upstream commit 123b99619cca94bdca0bf7bde9abe28f0a0dfe06 ]
Updates to a set's timeout and garbage collection interval are currently ignored. Add a transaction to update the global set element timeout and garbage collection interval.
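The shape of the fix is "stage the new values in the transaction, apply them at commit time". A minimal userspace sketch of that idea, using C11 atomics in place of the kernel's READ_ONCE()/WRITE_ONCE() and with illustrative (non-nf_tables) names:

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>

struct set {
	_Atomic uint64_t timeout;
	_Atomic uint32_t gc_int;
};

struct set_trans {              /* pending update recorded at request time */
	uint64_t timeout;
	uint32_t gc_int;
	int update;
};

static void commit(struct set *s, const struct set_trans *t)
{
	if (!t->update)
		return;
	/* Readers (dump path, gc) only ever see a consistent old or new value. */
	atomic_store_explicit(&s->timeout, t->timeout, memory_order_relaxed);
	atomic_store_explicit(&s->gc_int, t->gc_int, memory_order_relaxed);
}

int main(void)
{
	struct set s = { 1000, 2000 };
	struct set_trans t = { .timeout = 5000, .gc_int = 7000, .update = 1 };

	commit(&s, &t);
	printf("timeout=%llu gc_int=%u\n",
	       (unsigned long long)atomic_load(&s.timeout),
	       atomic_load(&s.gc_int));
	return 0;
}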
Fixes: 96518518cc41 ("netfilter: add nftables") Suggested-by: Florian Westphal fw@strlen.de Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/net/netfilter/nf_tables.h | 13 ++++++- net/netfilter/nf_tables_api.c | 63 ++++++++++++++++++++++--------- 2 files changed, 57 insertions(+), 19 deletions(-)
diff --git a/include/net/netfilter/nf_tables.h b/include/net/netfilter/nf_tables.h index 5377dbfba120..80df8ff5e675 100644 --- a/include/net/netfilter/nf_tables.h +++ b/include/net/netfilter/nf_tables.h @@ -562,7 +562,9 @@ void *nft_set_catchall_gc(const struct nft_set *set);
static inline unsigned long nft_set_gc_interval(const struct nft_set *set) { - return set->gc_int ? msecs_to_jiffies(set->gc_int) : HZ; + u32 gc_int = READ_ONCE(set->gc_int); + + return gc_int ? msecs_to_jiffies(gc_int) : HZ; }
/** @@ -1511,6 +1513,9 @@ struct nft_trans_rule { struct nft_trans_set { struct nft_set *set; u32 set_id; + u32 gc_int; + u64 timeout; + bool update; bool bound; };
@@ -1520,6 +1525,12 @@ struct nft_trans_set { (((struct nft_trans_set *)trans->data)->set_id) #define nft_trans_set_bound(trans) \ (((struct nft_trans_set *)trans->data)->bound) +#define nft_trans_set_update(trans) \ + (((struct nft_trans_set *)trans->data)->update) +#define nft_trans_set_timeout(trans) \ + (((struct nft_trans_set *)trans->data)->timeout) +#define nft_trans_set_gc_int(trans) \ + (((struct nft_trans_set *)trans->data)->gc_int)
struct nft_trans_chain { bool update; diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c index 82fe54b64714..81bd13b3d8fd 100644 --- a/net/netfilter/nf_tables_api.c +++ b/net/netfilter/nf_tables_api.c @@ -465,8 +465,9 @@ static int nft_delrule_by_chain(struct nft_ctx *ctx) return 0; }
-static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, - struct nft_set *set) +static int __nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, + struct nft_set *set, + const struct nft_set_desc *desc) { struct nft_trans *trans;
@@ -474,17 +475,28 @@ static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, if (trans == NULL) return -ENOMEM;
- if (msg_type == NFT_MSG_NEWSET && ctx->nla[NFTA_SET_ID] != NULL) { + if (msg_type == NFT_MSG_NEWSET && ctx->nla[NFTA_SET_ID] && !desc) { nft_trans_set_id(trans) = ntohl(nla_get_be32(ctx->nla[NFTA_SET_ID])); nft_activate_next(ctx->net, set); } nft_trans_set(trans) = set; + if (desc) { + nft_trans_set_update(trans) = true; + nft_trans_set_gc_int(trans) = desc->gc_int; + nft_trans_set_timeout(trans) = desc->timeout; + } nft_trans_commit_list_add_tail(ctx->net, trans);
return 0; }
+static int nft_trans_set_add(const struct nft_ctx *ctx, int msg_type, + struct nft_set *set) +{ + return __nft_trans_set_add(ctx, msg_type, set, NULL); +} + static int nft_delset(const struct nft_ctx *ctx, struct nft_set *set) { int err; @@ -3899,8 +3911,10 @@ static int nf_tables_fill_set_concat(struct sk_buff *skb, static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, const struct nft_set *set, u16 event, u16 flags) { - struct nlmsghdr *nlh; + u64 timeout = READ_ONCE(set->timeout); + u32 gc_int = READ_ONCE(set->gc_int); u32 portid = ctx->portid; + struct nlmsghdr *nlh; struct nlattr *nest; u32 seq = ctx->seq; int i; @@ -3936,13 +3950,13 @@ static int nf_tables_fill_set(struct sk_buff *skb, const struct nft_ctx *ctx, nla_put_be32(skb, NFTA_SET_OBJ_TYPE, htonl(set->objtype))) goto nla_put_failure;
- if (set->timeout && + if (timeout && nla_put_be64(skb, NFTA_SET_TIMEOUT, - nf_jiffies64_to_msecs(set->timeout), + nf_jiffies64_to_msecs(timeout), NFTA_SET_PAD)) goto nla_put_failure; - if (set->gc_int && - nla_put_be32(skb, NFTA_SET_GC_INTERVAL, htonl(set->gc_int))) + if (gc_int && + nla_put_be32(skb, NFTA_SET_GC_INTERVAL, htonl(gc_int))) goto nla_put_failure;
if (set->policy != NFT_SET_POL_PERFORMANCE) { @@ -4487,7 +4501,10 @@ static int nf_tables_newset(struct sk_buff *skb, const struct nfnl_info *info, for (i = 0; i < num_exprs; i++) nft_expr_destroy(&ctx, exprs[i]);
- return err; + if (err < 0) + return err; + + return __nft_trans_set_add(&ctx, NFT_MSG_NEWSET, set, &desc); }
if (!(info->nlh->nlmsg_flags & NLM_F_CREATE)) @@ -5877,7 +5894,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, return err; } else if (set->flags & NFT_SET_TIMEOUT && !(flags & NFT_SET_ELEM_INTERVAL_END)) { - timeout = set->timeout; + timeout = READ_ONCE(set->timeout); }
expiration = 0; @@ -5978,7 +5995,7 @@ static int nft_add_set_elem(struct nft_ctx *ctx, struct nft_set *set, if (err < 0) goto err_parse_key_end;
- if (timeout != set->timeout) { + if (timeout != READ_ONCE(set->timeout)) { err = nft_set_ext_add(&tmpl, NFT_SET_EXT_TIMEOUT); if (err < 0) goto err_parse_key_end; @@ -8833,14 +8850,20 @@ static int nf_tables_commit(struct net *net, struct sk_buff *skb) nft_flow_rule_destroy(nft_trans_flow_rule(trans)); break; case NFT_MSG_NEWSET: - nft_clear(net, nft_trans_set(trans)); - /* This avoids hitting -EBUSY when deleting the table - * from the transaction. - */ - if (nft_set_is_anonymous(nft_trans_set(trans)) && - !list_empty(&nft_trans_set(trans)->bindings)) - trans->ctx.table->use--; + if (nft_trans_set_update(trans)) { + struct nft_set *set = nft_trans_set(trans);
+ WRITE_ONCE(set->timeout, nft_trans_set_timeout(trans)); + WRITE_ONCE(set->gc_int, nft_trans_set_gc_int(trans)); + } else { + nft_clear(net, nft_trans_set(trans)); + /* This avoids hitting -EBUSY when deleting the table + * from the transaction. + */ + if (nft_set_is_anonymous(nft_trans_set(trans)) && + !list_empty(&nft_trans_set(trans)->bindings)) + trans->ctx.table->use--; + } nf_tables_set_notify(&trans->ctx, nft_trans_set(trans), NFT_MSG_NEWSET, GFP_KERNEL); nft_trans_destroy(trans); @@ -9062,6 +9085,10 @@ static int __nf_tables_abort(struct net *net, enum nfnl_abort_action action) nft_trans_destroy(trans); break; case NFT_MSG_NEWSET: + if (nft_trans_set_update(trans)) { + nft_trans_destroy(trans); + break; + } trans->ctx.table->use--; if (nft_trans_set_bound(trans)) { nft_trans_destroy(trans);
From: Shawn Bohrer sbohrer@cloudflare.com
[ Upstream commit fa349e396e4886d742fd6501c599ec627ef1353b ]
When AF_XDP is used on a veth interface the RX ring is updated in two steps. veth_xdp_rcv() removes packet descriptors from the FILL ring, fills them, and places them in the RX ring, updating the cached_prod pointer. Later xdp_do_flush() syncs the RX ring prod pointer with the cached_prod pointer, allowing user-space to see the recently filled descriptors. The rings are intended to be SPSC, however the existing order in veth_poll allows xdp_do_flush() to run concurrently with another napi_poll instance on another CPU, creating a race condition that allows user-space to see old or uninitialized descriptors in the RX ring. This bug has been observed in production systems.
To summarize, we are expecting this ordering:
CPU 0 __xsk_rcv_zc()
CPU 0 __xsk_map_flush()
CPU 2 __xsk_rcv_zc()
CPU 2 __xsk_map_flush()
But we are seeing this order:
CPU 0 __xsk_rcv_zc()
CPU 2 __xsk_rcv_zc()
CPU 0 __xsk_map_flush()
CPU 2 __xsk_map_flush()
This occurs because we rely on NAPI to ensure that only one napi_poll handler is running at a time for the given veth receive queue. napi_schedule_prep() will prevent multiple instances from getting scheduled. However, calling napi_complete_done() signals that this napi_poll is complete and allows subsequent calls to napi_schedule_prep() and __napi_schedule() to succeed in scheduling a concurrent napi_poll before xdp_do_flush() has been called. For the veth driver a concurrent call to napi_schedule_prep() and __napi_schedule() can occur on a different CPU because the veth xmit path can additionally schedule a napi_poll, creating the race.
The fix, as suggested by Magnus Karlsson, is to simply move the xdp_do_flush() call before napi_complete_done(). This syncs the producer ring pointers before another instance of napi_poll can be scheduled on another CPU. It will also slightly improve performance by moving the flush closer to when the descriptors were placed in the RX ring.
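The required ordering can be modelled in userspace with C11 atomics; none of this is driver code, and the function names only mirror the kernel ones named above. Publishing the producer index ("flush") with release semantics before re-arming scheduling ("complete") guarantees a later poll instance never sees unpublished entries:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

#define RING_SIZE 8

static int ring[RING_SIZE];
static atomic_uint prod;                /* index visible to the consumer */
static unsigned int cached_prod;        /* producer-local index */
static atomic_bool napi_armed;

static void rcv(int value)              /* ~ __xsk_rcv_zc(): fill an entry */
{
	ring[cached_prod % RING_SIZE] = value;
	cached_prod++;
}

static void flush(void)                 /* ~ xdp_do_flush(): publish entries */
{
	atomic_store_explicit(&prod, cached_prod, memory_order_release);
}

static void complete(void)              /* ~ napi_complete_done() */
{
	/* After this, another CPU may schedule a new poll instance;
	 * it is only safe because flush() already ran. */
	atomic_store_explicit(&napi_armed, true, memory_order_release);
}

int main(void)
{
	rcv(42);
	flush();        /* must come first ... */
	complete();     /* ... before another poll instance can be scheduled */
	printf("published prod=%u\n", atomic_load(&prod));
	return 0;
}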
Fixes: d1396004dd86 ("veth: Add XDP TX and REDIRECT") Suggested-by: Magnus Karlsson magnus.karlsson@gmail.com Signed-off-by: Shawn Bohrer sbohrer@cloudflare.com Link: https://lore.kernel.org/r/20221220185903.1105011-1-sbohrer@cloudflare.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/veth.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/veth.c b/drivers/net/veth.c index 64fa8e9c0a22..41cb9179e8b7 100644 --- a/drivers/net/veth.c +++ b/drivers/net/veth.c @@ -916,6 +916,9 @@ static int veth_poll(struct napi_struct *napi, int budget) xdp_set_return_frame_no_direct(); done = veth_xdp_rcv(rq, budget, &bq, &stats);
+ if (stats.xdp_redirect > 0) + xdp_do_flush(); + if (done < budget && napi_complete_done(napi, done)) { /* Write rx_notify_masked before reading ptr_ring */ smp_store_mb(rq->rx_notify_masked, false); @@ -929,8 +932,6 @@ static int veth_poll(struct napi_struct *napi, int budget)
if (stats.xdp_tx > 0) veth_xdp_flush(rq, &bq); - if (stats.xdp_redirect > 0) - xdp_do_flush(); xdp_clear_return_frame_no_direct();
return done;
From: Jeff Layton jlayton@kernel.org
[ Upstream commit 789e1e10f214c00ca18fc6610824c5b9876ba5f2 ]
Currently, we shut down the filecache before trying to clean up the stateids that depend on it. This leads to the kernel trying to free an nfsd_file twice, and a refcount overput on the nf_mark.
Change the shutdown procedure to tear down all of the stateids prior to shutting down the filecache.
Reported-and-tested-by: Wang Yugui wangyugui@e16-tech.com Signed-off-by: Jeff Layton jlayton@kernel.org Fixes: 5e113224c17e ("nfsd: nfsd_file cache entries should be per net namespace") Signed-off-by: Chuck Lever chuck.lever@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/nfsd/nfssvc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index ccb59e91011b..373695cc62a7 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -425,8 +425,8 @@ static void nfsd_shutdown_net(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id);
- nfsd_file_cache_shutdown_net(net); nfs4_state_shutdown_net(net); + nfsd_file_cache_shutdown_net(net); if (nn->lockd_up) { lockd_down(net); nn->lockd_up = false;
From: Jie Wang wangjie125@huawei.com
[ Upstream commit 09e6b30eeb254f1818a008cace3547159e908dfd ]
Currently the keep alive message between PF and VF may be lost, and the VF is then seen as not alive by the PF. In that case the VF will not perform a reset during the PF FLR reset process. This makes the VF's allocated interrupt resources invalid, and the VF can no longer receive from or respond to the PF.
So this patch adds VF interrupt re-initialization during VF FLR so the VF can recover in the above case.
Fixes: 862d969a3a4d ("net: hns3: do VF's pci re-initialization while PF doing FLR") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Hao Lan lanhao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c index 21678c12afa2..3c1ff3313221 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c @@ -3258,7 +3258,8 @@ static int hclgevf_pci_reset(struct hclgevf_dev *hdev) struct pci_dev *pdev = hdev->pdev; int ret = 0;
- if (hdev->reset_type == HNAE3_VF_FULL_RESET && + if ((hdev->reset_type == HNAE3_VF_FULL_RESET || + hdev->reset_type == HNAE3_FLR_RESET) && test_bit(HCLGEVF_STATE_IRQ_INITED, &hdev->state)) { hclgevf_misc_irq_uninit(hdev); hclgevf_uninit_msi(hdev);
From: Hao Chen chenhao288@hisilicon.com
[ Upstream commit e74a726da2c4dcedb8b0631f423d0044c7901a20 ]
Split the rx copybreak handling out of hns3_nic_reuse_page() into a separate function to simplify the code.
Signed-off-by: Hao Chen chenhao288@hisilicon.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 7d89b53cea1a ("net: hns3: fix miss L3E checking for rx packet") Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/hisilicon/hns3/hns3_enet.c | 55 ++++++++++++------- 1 file changed, 35 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index 818a028703c6..e9f2d51a8b7b 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3561,6 +3561,38 @@ static bool hns3_can_reuse_page(struct hns3_desc_cb *cb) return page_count(cb->priv) == cb->pagecnt_bias; }
+static int hns3_handle_rx_copybreak(struct sk_buff *skb, int i, + struct hns3_enet_ring *ring, + int pull_len, + struct hns3_desc_cb *desc_cb) +{ + struct hns3_desc *desc = &ring->desc[ring->next_to_clean]; + u32 frag_offset = desc_cb->page_offset + pull_len; + int size = le16_to_cpu(desc->rx.size); + u32 frag_size = size - pull_len; + void *frag = napi_alloc_frag(frag_size); + + if (unlikely(!frag)) { + u64_stats_update_begin(&ring->syncp); + ring->stats.frag_alloc_err++; + u64_stats_update_end(&ring->syncp); + + hns3_rl_err(ring_to_netdev(ring), + "failed to allocate rx frag\n"); + return -ENOMEM; + } + + desc_cb->reuse_flag = 1; + memcpy(frag, desc_cb->buf + frag_offset, frag_size); + skb_add_rx_frag(skb, i, virt_to_page(frag), + offset_in_page(frag), frag_size, frag_size); + + u64_stats_update_begin(&ring->syncp); + ring->stats.frag_alloc++; + u64_stats_update_end(&ring->syncp); + return 0; +} + static void hns3_nic_reuse_page(struct sk_buff *skb, int i, struct hns3_enet_ring *ring, int pull_len, struct hns3_desc_cb *desc_cb) @@ -3570,6 +3602,7 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i, int size = le16_to_cpu(desc->rx.size); u32 truesize = hns3_buf_size(ring); u32 frag_size = size - pull_len; + int ret = 0; bool reused;
if (ring->page_pool) { @@ -3604,27 +3637,9 @@ static void hns3_nic_reuse_page(struct sk_buff *skb, int i, desc_cb->page_offset = 0; desc_cb->reuse_flag = 1; } else if (frag_size <= ring->rx_copybreak) { - void *frag = napi_alloc_frag(frag_size); - - if (unlikely(!frag)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.frag_alloc_err++; - u64_stats_update_end(&ring->syncp); - - hns3_rl_err(ring_to_netdev(ring), - "failed to allocate rx frag\n"); + ret = hns3_handle_rx_copybreak(skb, i, ring, pull_len, desc_cb); + if (ret) goto out; - } - - desc_cb->reuse_flag = 1; - memcpy(frag, desc_cb->buf + frag_offset, frag_size); - skb_add_rx_frag(skb, i, virt_to_page(frag), - offset_in_page(frag), frag_size, frag_size); - - u64_stats_update_begin(&ring->syncp); - ring->stats.frag_alloc++; - u64_stats_update_end(&ring->syncp); - return; }
out:
From: Peng Li lipeng321@huawei.com
[ Upstream commit e6d72f6ac2ad4965491354d74b48e35a60abf298 ]
As the code to update ring stats is similar across the different ring stats types, this patch extracts a macro to simplify the ring stats update code.
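The macro-extraction pattern itself can be shown in a self-contained userspace sketch; the mutex here is only a stand-in for the kernel's u64_stats seqcount, and the names are illustrative:

/* Compile with: cc -std=gnu11 -pthread ring_stats.c */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

struct ring_stats {
	uint64_t tx_copy;
	uint64_t tx_busy;
};

struct ring {
	pthread_mutex_t syncp;
	struct ring_stats stats;
};

/* One macro replaces the repeated lock/increment/unlock sequences.
 * __typeof__ mirrors the kernel macro and avoids double evaluation. */
#define ring_stats_update(ring, cnt) do {		\
	__typeof__(ring) _r = (ring);			\
	pthread_mutex_lock(&_r->syncp);			\
	_r->stats.cnt++;				\
	pthread_mutex_unlock(&_r->syncp);		\
} while (0)

int main(void)
{
	struct ring r = { .syncp = PTHREAD_MUTEX_INITIALIZER };

	ring_stats_update(&r, tx_copy);
	ring_stats_update(&r, tx_busy);
	printf("tx_copy=%llu tx_busy=%llu\n",
	       (unsigned long long)r.stats.tx_copy,
	       (unsigned long long)r.stats.tx_busy);
	return 0;
}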
Signed-off-by: Peng Li lipeng321@huawei.com Signed-off-by: Guangbin Huang huangguangbin2@huawei.com Signed-off-by: David S. Miller davem@davemloft.net Stable-dep-of: 7d89b53cea1a ("net: hns3: fix miss L3E checking for rx packet") Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/hisilicon/hns3/hns3_enet.c | 123 +++++------------- .../net/ethernet/hisilicon/hns3/hns3_enet.h | 7 + 2 files changed, 38 insertions(+), 92 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index e9f2d51a8b7b..d06e2d0bae2e 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -1005,9 +1005,7 @@ static bool hns3_can_use_tx_bounce(struct hns3_enet_ring *ring, return false;
if (ALIGN(len, dma_get_cache_alignment()) > space) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_spare_full++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_spare_full); return false; }
@@ -1024,9 +1022,7 @@ static bool hns3_can_use_tx_sgl(struct hns3_enet_ring *ring, return false;
if (space < HNS3_MAX_SGL_SIZE) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_spare_full++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_spare_full); return false; }
@@ -1554,9 +1550,7 @@ static int hns3_fill_skb_desc(struct hns3_enet_ring *ring,
ret = hns3_handle_vtags(ring, skb); if (unlikely(ret < 0)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_vlan_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_vlan_err); return ret; } else if (ret == HNS3_INNER_VLAN_TAG) { inner_vtag = skb_vlan_tag_get(skb); @@ -1591,9 +1585,7 @@ static int hns3_fill_skb_desc(struct hns3_enet_ring *ring,
ret = hns3_get_l4_protocol(skb, &ol4_proto, &il4_proto); if (unlikely(ret < 0)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_l4_proto_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_l4_proto_err); return ret; }
@@ -1601,18 +1593,14 @@ static int hns3_fill_skb_desc(struct hns3_enet_ring *ring, &type_cs_vlan_tso, &ol_type_vlan_len_msec); if (unlikely(ret < 0)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_l2l3l4_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_l2l3l4_err); return ret; }
ret = hns3_set_tso(skb, &paylen_ol4cs, &mss_hw_csum, &type_cs_vlan_tso, &desc_cb->send_bytes); if (unlikely(ret < 0)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_tso_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_tso_err); return ret; } } @@ -1705,9 +1693,7 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv, }
if (unlikely(dma_mapping_error(dev, dma))) { - u64_stats_update_begin(&ring->syncp); - ring->stats.sw_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, sw_err_cnt); return -ENOMEM; }
@@ -1853,9 +1839,7 @@ static int hns3_skb_linearize(struct hns3_enet_ring *ring, * recursion level of over HNS3_MAX_RECURSION_LEVEL. */ if (bd_num == UINT_MAX) { - u64_stats_update_begin(&ring->syncp); - ring->stats.over_max_recursion++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, over_max_recursion); return -ENOMEM; }
@@ -1864,16 +1848,12 @@ static int hns3_skb_linearize(struct hns3_enet_ring *ring, */ if (skb->len > HNS3_MAX_TSO_SIZE || (!skb_is_gso(skb) && skb->len > HNS3_MAX_NON_TSO_SIZE)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.hw_limitation++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, hw_limitation); return -ENOMEM; }
if (__skb_linearize(skb)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.sw_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, sw_err_cnt); return -ENOMEM; }
@@ -1903,9 +1883,7 @@ static int hns3_nic_maybe_stop_tx(struct hns3_enet_ring *ring,
bd_num = hns3_tx_bd_count(skb->len);
- u64_stats_update_begin(&ring->syncp); - ring->stats.tx_copy++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_copy); }
out: @@ -1925,9 +1903,7 @@ static int hns3_nic_maybe_stop_tx(struct hns3_enet_ring *ring, return bd_num; }
- u64_stats_update_begin(&ring->syncp); - ring->stats.tx_busy++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_busy);
return -EBUSY; } @@ -2012,9 +1988,7 @@ static void hns3_tx_doorbell(struct hns3_enet_ring *ring, int num, ring->pending_buf += num;
if (!doorbell) { - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_more++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_more); return; }
@@ -2064,9 +2038,7 @@ static int hns3_handle_tx_bounce(struct hns3_enet_ring *ring, ret = skb_copy_bits(skb, 0, buf, size); if (unlikely(ret < 0)) { hns3_tx_spare_rollback(ring, cb_len); - u64_stats_update_begin(&ring->syncp); - ring->stats.copy_bits_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, copy_bits_err); return ret; }
@@ -2089,9 +2061,8 @@ static int hns3_handle_tx_bounce(struct hns3_enet_ring *ring, dma_sync_single_for_device(ring_to_dev(ring), dma, size, DMA_TO_DEVICE);
- u64_stats_update_begin(&ring->syncp); - ring->stats.tx_bounce++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_bounce); + return bd_num; }
@@ -2121,9 +2092,7 @@ static int hns3_handle_tx_sgl(struct hns3_enet_ring *ring, nents = skb_to_sgvec(skb, sgt->sgl, 0, skb->len); if (unlikely(nents < 0)) { hns3_tx_spare_rollback(ring, cb_len); - u64_stats_update_begin(&ring->syncp); - ring->stats.skb2sgl_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, skb2sgl_err); return -ENOMEM; }
@@ -2132,9 +2101,7 @@ static int hns3_handle_tx_sgl(struct hns3_enet_ring *ring, DMA_TO_DEVICE); if (unlikely(!sgt->nents)) { hns3_tx_spare_rollback(ring, cb_len); - u64_stats_update_begin(&ring->syncp); - ring->stats.map_sg_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, map_sg_err); return -ENOMEM; }
@@ -2146,10 +2113,7 @@ static int hns3_handle_tx_sgl(struct hns3_enet_ring *ring, for (i = 0; i < sgt->nents; i++) bd_num += hns3_fill_desc(ring, sg_dma_address(sgt->sgl + i), sg_dma_len(sgt->sgl + i)); - - u64_stats_update_begin(&ring->syncp); - ring->stats.tx_sgl++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, tx_sgl);
return bd_num; } @@ -2188,9 +2152,7 @@ netdev_tx_t hns3_nic_net_xmit(struct sk_buff *skb, struct net_device *netdev) if (skb_put_padto(skb, HNS3_MIN_TX_LEN)) { hns3_tx_doorbell(ring, 0, !netdev_xmit_more());
- u64_stats_update_begin(&ring->syncp); - ring->stats.sw_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, sw_err_cnt);
return NETDEV_TX_OK; } @@ -3522,17 +3484,13 @@ static bool hns3_nic_alloc_rx_buffers(struct hns3_enet_ring *ring, for (i = 0; i < cleand_count; i++) { desc_cb = &ring->desc_cb[ring->next_to_use]; if (desc_cb->reuse_flag) { - u64_stats_update_begin(&ring->syncp); - ring->stats.reuse_pg_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, reuse_pg_cnt);
hns3_reuse_buffer(ring, ring->next_to_use); } else { ret = hns3_alloc_and_map_buffer(ring, &res_cbs); if (ret) { - u64_stats_update_begin(&ring->syncp); - ring->stats.sw_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, sw_err_cnt);
hns3_rl_err(ring_to_netdev(ring), "alloc rx buffer failed: %d\n", @@ -3544,9 +3502,7 @@ static bool hns3_nic_alloc_rx_buffers(struct hns3_enet_ring *ring, } hns3_replace_buffer(ring, ring->next_to_use, &res_cbs);
- u64_stats_update_begin(&ring->syncp); - ring->stats.non_reuse_pg++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, non_reuse_pg); }
ring_ptr_move_fw(ring, next_to_use); @@ -3573,9 +3529,7 @@ static int hns3_handle_rx_copybreak(struct sk_buff *skb, int i, void *frag = napi_alloc_frag(frag_size);
if (unlikely(!frag)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.frag_alloc_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, frag_alloc_err);
hns3_rl_err(ring_to_netdev(ring), "failed to allocate rx frag\n"); @@ -3587,9 +3541,7 @@ static int hns3_handle_rx_copybreak(struct sk_buff *skb, int i, skb_add_rx_frag(skb, i, virt_to_page(frag), offset_in_page(frag), frag_size, frag_size);
- u64_stats_update_begin(&ring->syncp); - ring->stats.frag_alloc++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, frag_alloc); return 0; }
@@ -3722,9 +3674,7 @@ static bool hns3_checksum_complete(struct hns3_enet_ring *ring, hns3_rx_ptype_tbl[ptype].ip_summed != CHECKSUM_COMPLETE) return false;
- u64_stats_update_begin(&ring->syncp); - ring->stats.csum_complete++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, csum_complete); skb->ip_summed = CHECKSUM_COMPLETE; skb->csum = csum_unfold((__force __sum16)csum);
@@ -3798,9 +3748,7 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb, if (unlikely(l234info & (BIT(HNS3_RXD_L3E_B) | BIT(HNS3_RXD_L4E_B) | BIT(HNS3_RXD_OL3E_B) | BIT(HNS3_RXD_OL4E_B)))) { - u64_stats_update_begin(&ring->syncp); - ring->stats.l3l4_csum_err++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, l3l4_csum_err);
return; } @@ -3891,10 +3839,7 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length, skb = ring->skb; if (unlikely(!skb)) { hns3_rl_err(netdev, "alloc rx skb fail\n"); - - u64_stats_update_begin(&ring->syncp); - ring->stats.sw_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, sw_err_cnt);
return -ENOMEM; } @@ -3925,9 +3870,7 @@ static int hns3_alloc_skb(struct hns3_enet_ring *ring, unsigned int length, if (ring->page_pool) skb_mark_for_recycle(skb);
- u64_stats_update_begin(&ring->syncp); - ring->stats.seg_pkt_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, seg_pkt_cnt);
ring->pull_len = eth_get_headlen(netdev, va, HNS3_RX_HEAD_SIZE); __skb_put(skb, ring->pull_len); @@ -4119,9 +4062,7 @@ static int hns3_handle_bdinfo(struct hns3_enet_ring *ring, struct sk_buff *skb) ret = hns3_set_gro_and_checksum(ring, skb, l234info, bd_base_info, ol_info, csum); if (unlikely(ret)) { - u64_stats_update_begin(&ring->syncp); - ring->stats.rx_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, rx_err_cnt); return ret; }
@@ -5333,9 +5274,7 @@ static int hns3_clear_rx_ring(struct hns3_enet_ring *ring) if (!ring->desc_cb[ring->next_to_use].reuse_flag) { ret = hns3_alloc_and_map_buffer(ring, &res_cbs); if (ret) { - u64_stats_update_begin(&ring->syncp); - ring->stats.sw_err_cnt++; - u64_stats_update_end(&ring->syncp); + hns3_ring_stats_update(ring, sw_err_cnt); /* if alloc new buffer fail, exit directly * and reclear in up flow. */ diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h index f09a61d9c626..91b656adaacb 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.h @@ -654,6 +654,13 @@ static inline bool hns3_nic_resetting(struct net_device *netdev)
#define hns3_buf_size(_ring) ((_ring)->buf_size)
+#define hns3_ring_stats_update(ring, cnt) do { \ + typeof(ring) (tmp) = (ring); \ + u64_stats_update_begin(&(tmp)->syncp); \ + ((tmp)->stats.cnt)++; \ + u64_stats_update_end(&(tmp)->syncp); \ +} while (0) \ + static inline unsigned int hns3_page_order(struct hns3_enet_ring *ring) { #if (PAGE_SIZE < 8192)
From: Jian Shen shenjian15@huawei.com
[ Upstream commit 7d89b53cea1a702f97117fb4361523519bb1e52c ]
For devices that support the RXD advanced layout, the driver returns directly once the hardware has finished the checksum calculation. This causes the L3E check to be missed for IP packets. Fix it.
Fixes: 1ddc028ac849 ("net: hns3: refactor out RX completion checksum") Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Hao Lan lanhao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c index d06e2d0bae2e..822193b0d709 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3667,18 +3667,16 @@ static int hns3_gro_complete(struct sk_buff *skb, u32 l234info) return 0; }
-static bool hns3_checksum_complete(struct hns3_enet_ring *ring, +static void hns3_checksum_complete(struct hns3_enet_ring *ring, struct sk_buff *skb, u32 ptype, u16 csum) { if (ptype == HNS3_INVALID_PTYPE || hns3_rx_ptype_tbl[ptype].ip_summed != CHECKSUM_COMPLETE) - return false; + return;
hns3_ring_stats_update(ring, csum_complete); skb->ip_summed = CHECKSUM_COMPLETE; skb->csum = csum_unfold((__force __sum16)csum); - - return true; }
static void hns3_rx_handle_csum(struct sk_buff *skb, u32 l234info, @@ -3738,8 +3736,7 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb, ptype = hnae3_get_field(ol_info, HNS3_RXD_PTYPE_M, HNS3_RXD_PTYPE_S);
- if (hns3_checksum_complete(ring, skb, ptype, csum)) - return; + hns3_checksum_complete(ring, skb, ptype, csum);
/* check if hardware has done checksum */ if (!(bd_base_info & BIT(HNS3_RXD_L3L4P_B))) @@ -3748,6 +3745,7 @@ static void hns3_rx_checksum(struct hns3_enet_ring *ring, struct sk_buff *skb, if (unlikely(l234info & (BIT(HNS3_RXD_L3E_B) | BIT(HNS3_RXD_L4E_B) | BIT(HNS3_RXD_OL3E_B) | BIT(HNS3_RXD_OL4E_B)))) { + skb->ip_summed = CHECKSUM_NONE; hns3_ring_stats_update(ring, l3l4_csum_err);
return;
From: Jian Shen shenjian15@huawei.com
[ Upstream commit 8ee57c7b8406c7aa8ca31e014440c87c6383f429 ]
Currently, the HCLGE_VPORT_STATE_PROMISC_CHANGE flag is not set for a VF when vport->overflow_promisc_flags changes, so the VF won't check whether to update its promisc mode in this case. Add it.
Fixes: 1e6e76101fd9 ("net: hns3: configure promisc mode for VF asynchronously") Signed-off-by: Jian Shen shenjian15@huawei.com Signed-off-by: Hao Lan lanhao@huawei.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- .../hisilicon/hns3/hns3pf/hclge_main.c | 75 +++++++++++-------- 1 file changed, 43 insertions(+), 32 deletions(-)
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index 2102b38b9c35..f4d58fcdba27 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -12825,60 +12825,71 @@ static int hclge_gro_en(struct hnae3_handle *handle, bool enable) return ret; }
-static void hclge_sync_promisc_mode(struct hclge_dev *hdev) +static int hclge_sync_vport_promisc_mode(struct hclge_vport *vport) { - struct hclge_vport *vport = &hdev->vport[0]; struct hnae3_handle *handle = &vport->nic; + struct hclge_dev *hdev = vport->back; + bool uc_en = false; + bool mc_en = false; u8 tmp_flags; + bool bc_en; int ret; - u16 i;
if (vport->last_promisc_flags != vport->overflow_promisc_flags) { set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, &vport->state); vport->last_promisc_flags = vport->overflow_promisc_flags; }
- if (test_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, &vport->state)) { + if (!test_and_clear_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, + &vport->state)) + return 0; + + /* for PF */ + if (!vport->vport_id) { tmp_flags = handle->netdev_flags | vport->last_promisc_flags; ret = hclge_set_promisc_mode(handle, tmp_flags & HNAE3_UPE, tmp_flags & HNAE3_MPE); - if (!ret) { - clear_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, - &vport->state); + if (!ret) set_bit(HCLGE_VPORT_STATE_VLAN_FLTR_CHANGE, &vport->state); - } + else + set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, + &vport->state); + return ret; }
- for (i = 1; i < hdev->num_alloc_vport; i++) { - bool uc_en = false; - bool mc_en = false; - bool bc_en; + /* for VF */ + if (vport->vf_info.trusted) { + uc_en = vport->vf_info.request_uc_en > 0 || + vport->overflow_promisc_flags & HNAE3_OVERFLOW_UPE; + mc_en = vport->vf_info.request_mc_en > 0 || + vport->overflow_promisc_flags & HNAE3_OVERFLOW_MPE; + } + bc_en = vport->vf_info.request_bc_en > 0;
- vport = &hdev->vport[i]; + ret = hclge_cmd_set_promisc_mode(hdev, vport->vport_id, uc_en, + mc_en, bc_en); + if (ret) { + set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, &vport->state); + return ret; + } + hclge_set_vport_vlan_fltr_change(vport);
- if (!test_and_clear_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, - &vport->state)) - continue; + return 0; +}
- if (vport->vf_info.trusted) { - uc_en = vport->vf_info.request_uc_en > 0 || - vport->overflow_promisc_flags & - HNAE3_OVERFLOW_UPE; - mc_en = vport->vf_info.request_mc_en > 0 || - vport->overflow_promisc_flags & - HNAE3_OVERFLOW_MPE; - } - bc_en = vport->vf_info.request_bc_en > 0; +static void hclge_sync_promisc_mode(struct hclge_dev *hdev) +{ + struct hclge_vport *vport; + int ret; + u16 i;
- ret = hclge_cmd_set_promisc_mode(hdev, vport->vport_id, uc_en, - mc_en, bc_en); - if (ret) { - set_bit(HCLGE_VPORT_STATE_PROMISC_CHANGE, - &vport->state); + for (i = 0; i < hdev->num_alloc_vport; i++) { + vport = &hdev->vport[i]; + + ret = hclge_sync_vport_promisc_mode(vport); + if (ret) return; - } - hclge_set_vport_vlan_fltr_change(vport); } }
From: Hawkins Jiawei yin31149@gmail.com
[ Upstream commit 399ab7fe0fa0d846881685fd4e57e9a8ef7559f7 ]
Syzkaller reports a memory leak as follows:
====================================
BUG: memory leak
unreferenced object 0xffff88810c287f00 (size 256):
  comm "syz-executor105", pid 3600, jiffies 4294943292 (age 12.990s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff814cf9f0>] kmalloc_trace+0x20/0x90 mm/slab_common.c:1046
    [<ffffffff839c9e07>] kmalloc include/linux/slab.h:576 [inline]
    [<ffffffff839c9e07>] kmalloc_array include/linux/slab.h:627 [inline]
    [<ffffffff839c9e07>] kcalloc include/linux/slab.h:659 [inline]
    [<ffffffff839c9e07>] tcf_exts_init include/net/pkt_cls.h:250 [inline]
    [<ffffffff839c9e07>] tcindex_set_parms+0xa7/0xbe0 net/sched/cls_tcindex.c:342
    [<ffffffff839caa1f>] tcindex_change+0xdf/0x120 net/sched/cls_tcindex.c:553
    [<ffffffff8394db62>] tc_new_tfilter+0x4f2/0x1100 net/sched/cls_api.c:2147
    [<ffffffff8389e91c>] rtnetlink_rcv_msg+0x4dc/0x5d0 net/core/rtnetlink.c:6082
    [<ffffffff839eba67>] netlink_rcv_skb+0x87/0x1d0 net/netlink/af_netlink.c:2540
    [<ffffffff839eab87>] netlink_unicast_kernel net/netlink/af_netlink.c:1319 [inline]
    [<ffffffff839eab87>] netlink_unicast+0x397/0x4c0 net/netlink/af_netlink.c:1345
    [<ffffffff839eb046>] netlink_sendmsg+0x396/0x710 net/netlink/af_netlink.c:1921
    [<ffffffff8383e796>] sock_sendmsg_nosec net/socket.c:714 [inline]
    [<ffffffff8383e796>] sock_sendmsg+0x56/0x80 net/socket.c:734
    [<ffffffff8383eb08>] ____sys_sendmsg+0x178/0x410 net/socket.c:2482
    [<ffffffff83843678>] ___sys_sendmsg+0xa8/0x110 net/socket.c:2536
    [<ffffffff838439c5>] __sys_sendmmsg+0x105/0x330 net/socket.c:2622
    [<ffffffff83843c14>] __do_sys_sendmmsg net/socket.c:2651 [inline]
    [<ffffffff83843c14>] __se_sys_sendmmsg net/socket.c:2648 [inline]
    [<ffffffff83843c14>] __x64_sys_sendmmsg+0x24/0x30 net/socket.c:2648
    [<ffffffff84605fd5>] do_syscall_x64 arch/x86/entry/common.c:50 [inline]
    [<ffffffff84605fd5>] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
    [<ffffffff84800087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
====================================
Kernel uses tcindex_change() to change an existing filter properties.
Yet the problem is that, during the process of changing, if `old_r` is retrieved from `p->perfect`, the kernel uses tcindex_alloc_perfect_hash() to newly allocate filter results and uses tcindex_filter_result_init() to clear the old filter result without destroying its tcf_exts structure, which triggers the above memory leak.
To be more specific, there are only two sources for `old_r`, according to tcindex_lookup(): `old_r` is retrieved from `p->perfect`, or `old_r` is retrieved from `p->h`.
* If `old_r` is retrieved from `p->perfect`, the kernel uses tcindex_alloc_perfect_hash() to newly allocate the filter results. Then `r` is assigned `cp->perfect + handle`, which is newly allocated. So the condition `old_r && old_r != r` is true in this situation, and the kernel uses tcindex_filter_result_init() to clear the old filter result without destroying its tcf_exts structure.
* If `old_r` is retrieved from `p->h`, then `p->perfect` is NULL according to tcindex_lookup(). Considering that `cp->h` is directly copied from `p->h` and `p->perfect` is NULL, `r` is assigned `tcindex_lookup(cp, handle)`, whose value should be the same as `old_r`, so the condition `old_r && old_r != r` is false in this situation and the kernel does not use tcindex_filter_result_init() to clear the old filter result.
So only when `old_r` is retrieved from `p->perfect` does the kernel use tcindex_filter_result_init() to clear the old filter result, which triggers the above memory leak.
Considering that there already exists a tc_filter_wq workqueue to destroy the old tcindex_data by tcindex_partial_destroy_work() at the end of tcindex_set_parms(), this patch solves this memory leak bug by removing this old filter result clearing part and delegating it to the tc_filter_wq workqueue.
Note that this patch doesn't introduce any other issues. If `old_r` is retrieved from `p->perfect`, this patch just delegates the old filter result clearing to the tc_filter_wq workqueue; if `old_r` is retrieved from `p->h`, the kernel never reaches the old filter result clearing code, so removing it has no effect.
[Thanks to the suggestion from Jakub Kicinski, Cong Wang, Paolo Abeni and Dmitry Vyukov]
Fixes: b9a24bb76bf6 ("net_sched: properly handle failure case of tcf_exts_init()") Link: https://lore.kernel.org/all/0000000000001de5c505ebc9ec59@google.com/ Reported-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com Tested-by: syzbot+232ebdbd36706c965ebf@syzkaller.appspotmail.com Cc: Cong Wang cong.wang@bytedance.com Cc: Jakub Kicinski kuba@kernel.org Cc: Paolo Abeni pabeni@redhat.com Cc: Dmitry Vyukov dvyukov@google.com Acked-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/cls_tcindex.c | 12 ++---------- 1 file changed, 2 insertions(+), 10 deletions(-)
diff --git a/net/sched/cls_tcindex.c b/net/sched/cls_tcindex.c index 742c7d49a958..8d1ef858db87 100644 --- a/net/sched/cls_tcindex.c +++ b/net/sched/cls_tcindex.c @@ -332,7 +332,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, struct tcindex_filter_result *r, struct nlattr **tb, struct nlattr *est, u32 flags, struct netlink_ext_ack *extack) { - struct tcindex_filter_result new_filter_result, *old_r = r; + struct tcindex_filter_result new_filter_result; struct tcindex_data *cp = NULL, *oldp; struct tcindex_filter *f = NULL; /* make gcc behave */ struct tcf_result cr = {}; @@ -401,7 +401,7 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, err = tcindex_filter_result_init(&new_filter_result, cp, net); if (err < 0) goto errout_alloc; - if (old_r) + if (r) cr = r->res;
err = -EBUSY; @@ -478,14 +478,6 @@ tcindex_set_parms(struct net *net, struct tcf_proto *tp, unsigned long base, tcf_bind_filter(tp, &cr, base); }
- if (old_r && old_r != r) { - err = tcindex_filter_result_init(old_r, cp, net); - if (err < 0) { - kfree(f); - goto errout_alloc; - } - } - oldp = p; r->res = cr; tcf_exts_change(&r->exts, &e);
From: Daniil Tatianin d-tatianin@yandex-team.ru
[ Upstream commit 13a7c8964afcd8ca43c0b6001ebb0127baa95362 ]
adapter->dcb would get silently freed inside qlcnic_dcb_enable() in case qlcnic_dcb_attach() returned an error, which always happens under OOM conditions. This would lead to a use-after-free because both of the existing callers invoke qlcnic_dcb_get_info() on the obtained pointer, which is potentially freed at that point.
Propagate errors from qlcnic_dcb_enable(), and instead free the dcb pointer at the call site using qlcnic_dcb_free(). This also removes the now unused qlcnic_clear_dcb_ops() helper, which was a simple wrapper around kfree() that also caused memory leaks for a partially initialized dcb.
Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool.
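The ownership rule the patch establishes ("enable reports failure, the caller frees and aborts") can be shown in a minimal sketch; the names are illustrative, not the qlcnic API:

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

struct dcb { int configured; };

static int dcb_attach(struct dcb *dcb)
{
	(void)dcb;
	return -ENOMEM;   /* pretend the firmware query ran out of memory */
}

/* Enable no longer frees on failure; it just propagates the error. */
static int dcb_enable(struct dcb *dcb)
{
	return dcb ? dcb_attach(dcb) : 0;
}

static int adapter_init(struct dcb *dcb)
{
	int err = dcb_enable(dcb);

	if (err) {
		free(dcb);   /* single, explicit point of cleanup ... */
		return err;  /* ... and the caller aborts initialization */
	}
	/* dcb is still valid here, so get_info() and friends are safe */
	return 0;
}

int main(void)
{
	struct dcb *dcb = calloc(1, sizeof(*dcb));

	printf("init: %d\n", adapter_init(dcb));
	return 0;
}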
Fixes: 3c44bba1d270 ("qlcnic: Disable DCB operations from SR-IOV VFs") Reviewed-by: Michal Swiatkowski michal.swiatkowski@linux.intel.com Signed-off-by: Daniil Tatianin d-tatianin@yandex-team.ru Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c | 8 +++++++- drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h | 10 ++-------- drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c | 8 +++++++- 3 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c index 27dffa299ca6..7c3cf9ad4563 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_83xx_init.c @@ -2505,7 +2505,13 @@ int qlcnic_83xx_init(struct qlcnic_adapter *adapter, int pci_using_dac) goto disable_mbx_intr;
qlcnic_83xx_clear_function_resources(adapter); - qlcnic_dcb_enable(adapter->dcb); + + err = qlcnic_dcb_enable(adapter->dcb); + if (err) { + qlcnic_dcb_free(adapter->dcb); + goto disable_mbx_intr; + } + qlcnic_83xx_initialize_nic(adapter, 1); qlcnic_dcb_get_info(adapter->dcb);
diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h index 7519773eaca6..22afa2be85fd 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_dcb.h @@ -41,11 +41,6 @@ struct qlcnic_dcb { unsigned long state; };
-static inline void qlcnic_clear_dcb_ops(struct qlcnic_dcb *dcb) -{ - kfree(dcb); -} - static inline int qlcnic_dcb_get_hw_capability(struct qlcnic_dcb *dcb) { if (dcb && dcb->ops->get_hw_capability) @@ -112,9 +107,8 @@ static inline void qlcnic_dcb_init_dcbnl_ops(struct qlcnic_dcb *dcb) dcb->ops->init_dcbnl_ops(dcb); }
-static inline void qlcnic_dcb_enable(struct qlcnic_dcb *dcb) +static inline int qlcnic_dcb_enable(struct qlcnic_dcb *dcb) { - if (dcb && qlcnic_dcb_attach(dcb)) - qlcnic_clear_dcb_ops(dcb); + return dcb ? qlcnic_dcb_attach(dcb) : 0; } #endif diff --git a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c index 75960a29f80e..cec07d5bbe67 100644 --- a/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c +++ b/drivers/net/ethernet/qlogic/qlcnic/qlcnic_main.c @@ -2616,7 +2616,13 @@ qlcnic_probe(struct pci_dev *pdev, const struct pci_device_id *ent) "Device does not support MSI interrupts\n");
if (qlcnic_82xx_check(adapter)) { - qlcnic_dcb_enable(adapter->dcb); + err = qlcnic_dcb_enable(adapter->dcb); + if (err) { + qlcnic_dcb_free(adapter->dcb); + dev_err(&pdev->dev, "Failed to enable DCB\n"); + goto err_out_free_hw; + } + qlcnic_dcb_get_info(adapter->dcb); err = qlcnic_setup_intr(adapter);
From: Johnny S. Lee foss@jsl.io
[ Upstream commit 30e725537546248bddc12eaac2fe0a258917f190 ]
PTP hardware timestamping related objects are not linked when PTP support for MV88E6xxx (NET_DSA_MV88E6XXX_PTP) is disabled, therefore NET_DSA_MV88E6XXX should not depend on PTP_1588_CLOCK_OPTIONAL regardless of NET_DSA_MV88E6XXX_PTP.
Instead, condition more strictly on how NET_DSA_MV88E6XXX_PTP's dependencies are met, making sure that it cannot be enabled when NET_DSA_MV88E6XXX=y and PTP_1588_CLOCK=m.
In other words, this commit allows NET_DSA_MV88E6XXX to be built-in while PTP_1588_CLOCK is a module, as long as NET_DSA_MV88E6XXX_PTP is prevented from being enabled.
Fixes: e5f31552674e ("ethernet: fix PTP_1588_CLOCK dependencies") Signed-off-by: Johnny S. Lee foss@jsl.io Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mv88e6xxx/Kconfig | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/mv88e6xxx/Kconfig b/drivers/net/dsa/mv88e6xxx/Kconfig index 7a2445a34eb7..e3181d5471df 100644 --- a/drivers/net/dsa/mv88e6xxx/Kconfig +++ b/drivers/net/dsa/mv88e6xxx/Kconfig @@ -2,7 +2,6 @@ config NET_DSA_MV88E6XXX tristate "Marvell 88E6xxx Ethernet switch fabric support" depends on NET_DSA - depends on PTP_1588_CLOCK_OPTIONAL select IRQ_DOMAIN select NET_DSA_TAG_EDSA select NET_DSA_TAG_DSA @@ -13,7 +12,8 @@ config NET_DSA_MV88E6XXX config NET_DSA_MV88E6XXX_PTP bool "PTP support for Marvell 88E6xxx" default n - depends on NET_DSA_MV88E6XXX && PTP_1588_CLOCK + depends on (NET_DSA_MV88E6XXX = y && PTP_1588_CLOCK = y) || \ + (NET_DSA_MV88E6XXX = m && PTP_1588_CLOCK) help Say Y to enable PTP hardware timestamping on Marvell 88E6xxx switch chips that support it.
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit df49908f3c52d211aea5e2a14a93bbe67a2cb3af ]
nfc_get_device() takes a reference on the device; add the missing nfc_put_device() to release it when it is no longer needed. Also fix the style warning by using EOPNOTSUPP instead of ENOTSUPP.
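The reference-counting rule behind the fix is that every early return after a successful "get" must go through a matching "put". A minimal sketch of the goto-based cleanup pattern, with stand-in types rather than the NFC core API:

#include <errno.h>
#include <stdio.h>

struct device { int refcount; };

static struct device global_dev = { .refcount = 1 };

static struct device *dev_get(void)
{
	global_dev.refcount++;
	return &global_dev;
}

static void dev_put(struct device *dev)
{
	dev->refcount--;
}

static int do_io(int apdu_len)
{
	struct device *dev = dev_get();
	int rc;

	if (apdu_len == 0) {
		rc = -EINVAL;
		goto put_dev;   /* error paths still drop the reference */
	}

	rc = 0;                 /* issue the I/O here */

put_dev:
	dev_put(dev);
	return rc;
}

int main(void)
{
	do_io(0);
	printf("refcount back to %d\n", global_dev.refcount);
	return 0;
}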
Fixes: 5ce3f32b5264 ("NFC: netlink: SE API implementation") Fixes: 29e76924cf08 ("nfc: netlink: Add capability to reply to vendor_cmd with data") Signed-off-by: Miaoqian Lin linmq006@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/nfc/netlink.c | 52 ++++++++++++++++++++++++++++++++++------------- 1 file changed, 38 insertions(+), 14 deletions(-)
diff --git a/net/nfc/netlink.c b/net/nfc/netlink.c index a207f0b8137b..d928d5a24bbc 100644 --- a/net/nfc/netlink.c +++ b/net/nfc/netlink.c @@ -1497,6 +1497,7 @@ static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info) u32 dev_idx, se_idx; u8 *apdu; size_t apdu_len; + int rc;
if (!info->attrs[NFC_ATTR_DEVICE_INDEX] || !info->attrs[NFC_ATTR_SE_INDEX] || @@ -1510,25 +1511,37 @@ static int nfc_genl_se_io(struct sk_buff *skb, struct genl_info *info) if (!dev) return -ENODEV;
- if (!dev->ops || !dev->ops->se_io) - return -ENOTSUPP; + if (!dev->ops || !dev->ops->se_io) { + rc = -EOPNOTSUPP; + goto put_dev; + }
apdu_len = nla_len(info->attrs[NFC_ATTR_SE_APDU]); - if (apdu_len == 0) - return -EINVAL; + if (apdu_len == 0) { + rc = -EINVAL; + goto put_dev; + }
apdu = nla_data(info->attrs[NFC_ATTR_SE_APDU]); - if (!apdu) - return -EINVAL; + if (!apdu) { + rc = -EINVAL; + goto put_dev; + }
ctx = kzalloc(sizeof(struct se_io_ctx), GFP_KERNEL); - if (!ctx) - return -ENOMEM; + if (!ctx) { + rc = -ENOMEM; + goto put_dev; + }
ctx->dev_idx = dev_idx; ctx->se_idx = se_idx;
- return nfc_se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx); + rc = nfc_se_io(dev, se_idx, apdu, apdu_len, se_io_cb, ctx); + +put_dev: + nfc_put_device(dev); + return rc; }
static int nfc_genl_vendor_cmd(struct sk_buff *skb, @@ -1551,14 +1564,21 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb, subcmd = nla_get_u32(info->attrs[NFC_ATTR_VENDOR_SUBCMD]);
dev = nfc_get_device(dev_idx); - if (!dev || !dev->vendor_cmds || !dev->n_vendor_cmds) + if (!dev) return -ENODEV;
+ if (!dev->vendor_cmds || !dev->n_vendor_cmds) { + err = -ENODEV; + goto put_dev; + } + if (info->attrs[NFC_ATTR_VENDOR_DATA]) { data = nla_data(info->attrs[NFC_ATTR_VENDOR_DATA]); data_len = nla_len(info->attrs[NFC_ATTR_VENDOR_DATA]); - if (data_len == 0) - return -EINVAL; + if (data_len == 0) { + err = -EINVAL; + goto put_dev; + } } else { data = NULL; data_len = 0; @@ -1573,10 +1593,14 @@ static int nfc_genl_vendor_cmd(struct sk_buff *skb, dev->cur_cmd_info = info; err = cmd->doit(dev, data, data_len); dev->cur_cmd_info = NULL; - return err; + goto put_dev; }
- return -EOPNOTSUPP; + err = -EOPNOTSUPP; + +put_dev: + nfc_put_device(dev); + return err; }
/* message building helper */
From: ruanjinjie ruanjinjie@huawei.com
[ Upstream commit aeca7ff254843d49a8739f07f7dab1341450111d ]
When injecting a fault while probing the module, if device_register() fails in vdpasim_net_init() or vdpasim_blk_init(), the refcount of the kobject is not decreased to 0 and the name allocated in dev_set_name() is leaked. Fix this by calling put_device(), so that the name can be freed in the callback function kobject_cleanup().
(vdpa_sim_net)
unreferenced object 0xffff88807eebc370 (size 16):
  comm "modprobe", pid 3848, jiffies 4362982860 (age 18.153s)
  hex dump (first 16 bytes):
    76 64 70 61 73 69 6d 5f 6e 65 74 00 6b 6b 6b a5  vdpasim_net.kkk.
  backtrace:
    [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
    [<ffffffff81731d53>] kstrdup+0x33/0x60
    [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
    [<ffffffff82d87aab>] dev_set_name+0xab/0xe0
    [<ffffffff82d91a23>] device_add+0xe3/0x1a80
    [<ffffffffa0270013>] 0xffffffffa0270013
    [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
    [<ffffffff813739cb>] do_init_module+0x1ab/0x640
    [<ffffffff81379d20>] load_module+0x5d00/0x77f0
    [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
    [<ffffffff83c4d505>] do_syscall_64+0x35/0x80
    [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
(vdpa_sim_blk)
unreferenced object 0xffff8881070c1250 (size 16):
  comm "modprobe", pid 6844, jiffies 4364069319 (age 17.572s)
  hex dump (first 16 bytes):
    76 64 70 61 73 69 6d 5f 62 6c 6b 00 6b 6b 6b a5  vdpasim_blk.kkk.
  backtrace:
    [<ffffffff8174f19e>] __kmalloc_node_track_caller+0x4e/0x150
    [<ffffffff81731d53>] kstrdup+0x33/0x60
    [<ffffffff83a5d421>] kobject_set_name_vargs+0x41/0x110
    [<ffffffff82d87aab>] dev_set_name+0xab/0xe0
    [<ffffffff82d91a23>] device_add+0xe3/0x1a80
    [<ffffffffa0220013>] 0xffffffffa0220013
    [<ffffffff81001c27>] do_one_initcall+0x87/0x2e0
    [<ffffffff813739cb>] do_init_module+0x1ab/0x640
    [<ffffffff81379d20>] load_module+0x5d00/0x77f0
    [<ffffffff8137bc40>] __do_sys_finit_module+0x110/0x1b0
    [<ffffffff83c4d505>] do_syscall_64+0x35/0x80
    [<ffffffff83e0006a>] entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fixes: 899c4d187f6a ("vdpa_sim_blk: add support for vdpa management tool") Fixes: a3c06ae158dd ("vdpa_sim_net: Add support for user supported devices")
Signed-off-by: ruanjinjie ruanjinjie@huawei.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Message-Id: 20221110082348.4105476-1-ruanjinjie@huawei.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/vdpa_sim/vdpa_sim_blk.c | 4 +++- drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 4 +++- 2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c index a790903f243e..22b812c32bee 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_blk.c @@ -308,8 +308,10 @@ static int __init vdpasim_blk_init(void) int ret;
ret = device_register(&vdpasim_blk_mgmtdev); - if (ret) + if (ret) { + put_device(&vdpasim_blk_mgmtdev); return ret; + }
ret = vdpa_mgmtdev_register(&mgmt_dev); if (ret) diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c index a1ab6163f7d1..f1c420c5e26e 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim_net.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim_net.c @@ -194,8 +194,10 @@ static int __init vdpasim_net_init(void) }
ret = device_register(&vdpasim_net_mgmtdev); - if (ret) + if (ret) { + put_device(&vdpasim_net_mgmtdev); return ret; + }
ret = vdpa_mgmtdev_register(&mgmt_dev); if (ret)
From: Yuan Can yuancan@huawei.com
[ Upstream commit 7a4efe182ca61fb3e5307e69b261c57cbf434cd4 ]
A problem about modprobe vhost_vsock failed is triggered with the following log given:
modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy
The reason is that vhost_vsock_init() returns the value of misc_register() directly without checking it. If misc_register() fails, the function returns without calling vsock_core_unregister() on vhost_transport, with the result that vhost_vsock can never be installed later. A simple call graph is shown below:
vhost_vsock_init()
  vsock_core_register() # register vhost_transport
  misc_register()
    device_create_with_groups()
      device_create_groups_vargs()
        dev = kzalloc(...) # OOM happened
  # return without unregister vhost_transport
Fix this by calling vsock_core_unregister() when misc_register() returns an error.
Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Yuan Can yuancan@huawei.com Message-Id: 20221108101705.45981-1-yuancan@huawei.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Stefano Garzarella sgarzare@redhat.com Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vsock.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index 97bfe499222b..74ac0c28fe43 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -968,7 +968,14 @@ static int __init vhost_vsock_init(void) VSOCK_TRANSPORT_F_H2G); if (ret < 0) return ret; - return misc_register(&vhost_vsock_misc); + + ret = misc_register(&vhost_vsock_misc); + if (ret) { + vsock_core_unregister(&vhost_transport.transport); + return ret; + } + + return 0; };
static void __exit vhost_vsock_exit(void)
From: Stefano Garzarella sgarzare@redhat.com
[ Upstream commit f85efa9b0f5381874f727bd98f56787840313f0b ]
vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range.
In iotlb_translate() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant.
Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop.
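The loop shape is easy to get wrong, so here is a userspace model of the fixed translation loop; the lookup below is a plain linear scan rather than the kernel's interval tree, and all names are illustrative:

#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct map { uint64_t start, last; };  /* inclusive range */

static const struct map maps[] = {
	{ 0x1000, 0x1fff },
	{ 0x2000, 0x2fff },
};

/* First mapping overlapping [start, last], or NULL. */
static const struct map *itree_first(uint64_t start, uint64_t last)
{
	for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++)
		if (maps[i].last >= start && maps[i].start <= last)
			return &maps[i];
	return NULL;
}

static int translate(uint64_t addr, uint64_t len)
{
	uint64_t s = 0, last = addr + len - 1;  /* computed once, never moves */

	while (len > s) {
		const struct map *m = itree_first(addr, last);

		if (!m || m->start > addr)
			return -1;

		/* consume what this mapping covers, then continue */
		uint64_t chunk = m->last - addr + 1;

		if (chunk > len - s)
			chunk = len - s;
		s += chunk;
		addr += chunk;
	}
	return 0;
}

int main(void)
{
	/* spans both mappings; succeeds because `last` stays fixed */
	printf("translate: %d\n", translate(0x1800, 0x1000));
	return 0;
}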
Fixes: 9ad9c49cfe97 ("vringh: IOTLB support") Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Stefano Garzarella sgarzare@redhat.com Message-Id: 20221109102503.18816-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vringh.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/vhost/vringh.c b/drivers/vhost/vringh.c index eab55accf381..786876af0a73 100644 --- a/drivers/vhost/vringh.c +++ b/drivers/vhost/vringh.c @@ -1101,7 +1101,7 @@ static int iotlb_translate(const struct vringh *vrh, struct vhost_iotlb_map *map; struct vhost_iotlb *iotlb = vrh->iotlb; int ret = 0; - u64 s = 0; + u64 s = 0, last = addr + len - 1;
spin_lock(vrh->iotlb_lock);
@@ -1113,8 +1113,7 @@ static int iotlb_translate(const struct vringh *vrh, break; }
- map = vhost_iotlb_itree_first(iotlb, addr, - addr + len - 1); + map = vhost_iotlb_itree_first(iotlb, addr, last); if (!map || map->start > addr) { ret = -EINVAL; break;
From: Stefano Garzarella sgarzare@redhat.com
[ Upstream commit 98047313cdb46828093894d0ac8b1183b8b317f9 ]
vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range.
In translate_desc() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant.
Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop.
Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree") Acked-by: Jason Wang jasowang@redhat.com Signed-off-by: Stefano Garzarella sgarzare@redhat.com Message-Id: 20221109102503.18816-3-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vhost.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 6942472cffb0..0a9746bc9228 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -2048,7 +2048,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, struct vhost_dev *dev = vq->dev; struct vhost_iotlb *umem = dev->iotlb ? dev->iotlb : dev->umem; struct iovec *_iov; - u64 s = 0; + u64 s = 0, last = addr + len - 1; int ret = 0;
while ((u64)len > s) { @@ -2058,7 +2058,7 @@ static int translate_desc(struct vhost_virtqueue *vq, u64 addr, u32 len, break; }
- map = vhost_iotlb_itree_first(umem, addr, addr + len - 1); + map = vhost_iotlb_itree_first(umem, addr, last); if (map == NULL || map->start > addr) { if (umem != dev->iotlb) { ret = -EFAULT;
From: Stefano Garzarella sgarzare@redhat.com
[ Upstream commit 794ec498c9fa79e6bfd71b931410d5897a9c00d4 ]
When we initialize vringh, we should pass the features and the number of elements in the virtqueue negotiated with the driver, otherwise operations with vringh may fail.
This was discovered in a case where the driver sets a number of elements in the virtqueue different from the value returned by .get_vq_num_max().
In vdpasim_vq_reset() it is safe to initialize the vringh with default values, since the virtqueue will not be used until vdpasim_queue_ready() is called again.
Fixes: 2c53d0f64c06 ("vdpasim: vDPA device simulator") Signed-off-by: Stefano Garzarella sgarzare@redhat.com Message-Id: 20221110141335.62171-1-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Acked-by: Eugenio Pérez eperezma@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/drivers/vdpa/vdpa_sim/vdpa_sim.c b/drivers/vdpa/vdpa_sim/vdpa_sim.c index 2faf3bd1c3ba..4d9e3fdae5f6 100644 --- a/drivers/vdpa/vdpa_sim/vdpa_sim.c +++ b/drivers/vdpa/vdpa_sim/vdpa_sim.c @@ -66,8 +66,7 @@ static void vdpasim_queue_ready(struct vdpasim *vdpasim, unsigned int idx) { struct vdpasim_virtqueue *vq = &vdpasim->vqs[idx];
- vringh_init_iotlb(&vq->vring, vdpasim->dev_attr.supported_features, - VDPASIM_QUEUE_MAX, false, + vringh_init_iotlb(&vq->vring, vdpasim->features, vq->num, false, (struct vring_desc *)(uintptr_t)vq->desc_addr, (struct vring_avail *) (uintptr_t)vq->driver_addr,
From: Moshe Shemesh moshe@nvidia.com
[ Upstream commit 1f0ae22ab470946143485a02cc1cd7e05c0f9120 ]
Fix SRIOV VST mode behavior to insert a cvlan when a guest tag is already present in the frame. Previous VST mode behavior was to drop packets or override the existing tag, depending on the device version.
In this patch we fix this behavior by correctly building the HW steering rule with a push-vlan action or, for older devices, by asking the FW to stack the cvlan when a vlan is already present.
Fixes: 07bab9502641 ("net/mlx5: E-Switch, Refactor eswitch ingress acl codes") Fixes: dfcb1ed3c331 ("net/mlx5: E-Switch, Vport ingress/egress ACLs rules for VST mode") Signed-off-by: Moshe Shemesh moshe@nvidia.com Reviewed-by: Mark Bloch mbloch@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../mellanox/mlx5/core/esw/acl/egress_lgcy.c | 7 +++- .../mellanox/mlx5/core/esw/acl/ingress_lgcy.c | 33 ++++++++++++++++--- .../net/ethernet/mellanox/mlx5/core/eswitch.c | 30 ++++++++++++----- .../net/ethernet/mellanox/mlx5/core/eswitch.h | 6 ++++ include/linux/mlx5/device.h | 5 +++ include/linux/mlx5/mlx5_ifc.h | 3 +- 6 files changed, 68 insertions(+), 16 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c index 60a73990017c..6b4c9ffad95b 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/egress_lgcy.c @@ -67,6 +67,7 @@ static void esw_acl_egress_lgcy_groups_destroy(struct mlx5_vport *vport) int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { + bool vst_mode_steering = esw_vst_mode_is_steering(esw); struct mlx5_flow_destination drop_ctr_dst = {}; struct mlx5_flow_destination *dst = NULL; struct mlx5_fc *drop_counter = NULL; @@ -77,6 +78,7 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, */ int table_size = 2; int dest_num = 0; + int actions_flag; int err = 0;
if (vport->egress.legacy.drop_counter) { @@ -119,8 +121,11 @@ int esw_acl_egress_lgcy_setup(struct mlx5_eswitch *esw, vport->vport, vport->info.vlan, vport->info.qos);
/* Allowed vlan rule */ + actions_flag = MLX5_FLOW_CONTEXT_ACTION_ALLOW; + if (vst_mode_steering) + actions_flag |= MLX5_FLOW_CONTEXT_ACTION_VLAN_POP; err = esw_egress_acl_vlan_create(esw, vport, NULL, vport->info.vlan, - MLX5_FLOW_CONTEXT_ACTION_ALLOW); + actions_flag); if (err) goto out;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c index b1a5199260f6..093ed86a0acd 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/esw/acl/ingress_lgcy.c @@ -139,11 +139,14 @@ static void esw_acl_ingress_lgcy_groups_destroy(struct mlx5_vport *vport) int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { + bool vst_mode_steering = esw_vst_mode_is_steering(esw); struct mlx5_flow_destination drop_ctr_dst = {}; struct mlx5_flow_destination *dst = NULL; struct mlx5_flow_act flow_act = {}; struct mlx5_flow_spec *spec = NULL; struct mlx5_fc *counter = NULL; + bool vst_check_cvlan = false; + bool vst_push_cvlan = false; /* The ingress acl table contains 4 groups * (2 active rules at the same time - * 1 allow rule from one of the first 3 groups. @@ -203,7 +206,26 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, goto out; }
- if (vport->info.vlan || vport->info.qos) + if ((vport->info.vlan || vport->info.qos)) { + if (vst_mode_steering) + vst_push_cvlan = true; + else if (!MLX5_CAP_ESW(esw->dev, vport_cvlan_insert_always)) + vst_check_cvlan = true; + } + + if (vst_check_cvlan || vport->info.spoofchk) + spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; + + /* Create ingress allow rule */ + flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW; + if (vst_push_cvlan) { + flow_act.action |= MLX5_FLOW_CONTEXT_ACTION_VLAN_PUSH; + flow_act.vlan[0].prio = vport->info.qos; + flow_act.vlan[0].vid = vport->info.vlan; + flow_act.vlan[0].ethtype = ETH_P_8021Q; + } + + if (vst_check_cvlan) MLX5_SET_TO_ONES(fte_match_param, spec->match_criteria, outer_headers.cvlan_tag);
@@ -218,9 +240,6 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, ether_addr_copy(smac_v, vport->info.mac); }
- /* Create ingress allow rule */ - spec->match_criteria_enable = MLX5_MATCH_OUTER_HEADERS; - flow_act.action = MLX5_FLOW_CONTEXT_ACTION_ALLOW; vport->ingress.allow_rule = mlx5_add_flow_rules(vport->ingress.acl, spec, &flow_act, NULL, 0); if (IS_ERR(vport->ingress.allow_rule)) { @@ -232,6 +251,9 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, goto out; }
+ if (!vst_check_cvlan && !vport->info.spoofchk) + goto out; + memset(&flow_act, 0, sizeof(flow_act)); flow_act.action = MLX5_FLOW_CONTEXT_ACTION_DROP; /* Attach drop flow counter */ @@ -257,7 +279,8 @@ int esw_acl_ingress_lgcy_setup(struct mlx5_eswitch *esw, return 0;
out: - esw_acl_ingress_lgcy_cleanup(esw, vport); + if (err) + esw_acl_ingress_lgcy_cleanup(esw, vport); kvfree(spec); return err; } diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c index 51a8cecc4a7c..2b9278002354 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c @@ -160,10 +160,17 @@ static int modify_esw_vport_cvlan(struct mlx5_core_dev *dev, u16 vport, esw_vport_context.vport_cvlan_strip, 1);
if (set_flags & SET_VLAN_INSERT) { - /* insert only if no vlan in packet */ - MLX5_SET(modify_esw_vport_context_in, in, - esw_vport_context.vport_cvlan_insert, 1); - + if (MLX5_CAP_ESW(dev, vport_cvlan_insert_always)) { + /* insert either if vlan exist in packet or not */ + MLX5_SET(modify_esw_vport_context_in, in, + esw_vport_context.vport_cvlan_insert, + MLX5_VPORT_CVLAN_INSERT_ALWAYS); + } else { + /* insert only if no vlan in packet */ + MLX5_SET(modify_esw_vport_context_in, in, + esw_vport_context.vport_cvlan_insert, + MLX5_VPORT_CVLAN_INSERT_WHEN_NO_CVLAN); + } MLX5_SET(modify_esw_vport_context_in, in, esw_vport_context.cvlan_pcp, qos); MLX5_SET(modify_esw_vport_context_in, in, @@ -773,6 +780,7 @@ static void esw_vport_cleanup_acl(struct mlx5_eswitch *esw,
static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport) { + bool vst_mode_steering = esw_vst_mode_is_steering(esw); u16 vport_num = vport->vport; int flags; int err; @@ -802,8 +810,9 @@ static int esw_vport_setup(struct mlx5_eswitch *esw, struct mlx5_vport *vport)
flags = (vport->info.vlan || vport->info.qos) ? SET_VLAN_STRIP | SET_VLAN_INSERT : 0; - modify_esw_vport_cvlan(esw->dev, vport_num, vport->info.vlan, - vport->info.qos, flags); + if (esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering) + modify_esw_vport_cvlan(esw->dev, vport_num, vport->info.vlan, + vport->info.qos, flags);
return 0; } @@ -1846,6 +1855,7 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, u16 vport, u16 vlan, u8 qos, u8 set_flags) { struct mlx5_vport *evport = mlx5_eswitch_get_vport(esw, vport); + bool vst_mode_steering = esw_vst_mode_is_steering(esw); int err = 0;
if (IS_ERR(evport)) @@ -1853,9 +1863,11 @@ int __mlx5_eswitch_set_vport_vlan(struct mlx5_eswitch *esw, if (vlan > 4095 || qos > 7) return -EINVAL;
- err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags); - if (err) - return err; + if (esw->mode == MLX5_ESWITCH_OFFLOADS || !vst_mode_steering) { + err = modify_esw_vport_cvlan(esw->dev, vport, vlan, qos, set_flags); + if (err) + return err; + }
evport->info.vlan = vlan; evport->info.qos = qos; diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h index 2c7444101bb9..0e2c9e6fccb6 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h @@ -505,6 +505,12 @@ static inline bool mlx5_esw_qos_enabled(struct mlx5_eswitch *esw) return esw->qos.enabled; }
+static inline bool esw_vst_mode_is_steering(struct mlx5_eswitch *esw) +{ + return (MLX5_CAP_ESW_EGRESS_ACL(esw->dev, pop_vlan) && + MLX5_CAP_ESW_INGRESS_ACL(esw->dev, push_vlan)); +} + static inline bool mlx5_eswitch_vlan_actions_supported(struct mlx5_core_dev *dev, u8 vlan_depth) { diff --git a/include/linux/mlx5/device.h b/include/linux/mlx5/device.h index 66eaf0aa7f69..3e72133545ca 100644 --- a/include/linux/mlx5/device.h +++ b/include/linux/mlx5/device.h @@ -1074,6 +1074,11 @@ enum { MLX5_VPORT_ADMIN_STATE_AUTO = 0x2, };
+enum { + MLX5_VPORT_CVLAN_INSERT_WHEN_NO_CVLAN = 0x1, + MLX5_VPORT_CVLAN_INSERT_ALWAYS = 0x3, +}; + enum { MLX5_L3_PROT_TYPE_IPV4 = 0, MLX5_L3_PROT_TYPE_IPV6 = 1, diff --git a/include/linux/mlx5/mlx5_ifc.h b/include/linux/mlx5/mlx5_ifc.h index cd9d1c95129e..49ea0004109e 100644 --- a/include/linux/mlx5/mlx5_ifc.h +++ b/include/linux/mlx5/mlx5_ifc.h @@ -822,7 +822,8 @@ struct mlx5_ifc_e_switch_cap_bits { u8 vport_svlan_insert[0x1]; u8 vport_cvlan_insert_if_not_exist[0x1]; u8 vport_cvlan_insert_overwrite[0x1]; - u8 reserved_at_5[0x2]; + u8 reserved_at_5[0x1]; + u8 vport_cvlan_insert_always[0x1]; u8 esw_shared_ingress_acl[0x1]; u8 esw_uplink_ingress_acl[0x1]; u8 root_ft_on_other_esw[0x1];
From: Jiri Pirko jiri@nvidia.com
[ Upstream commit 2a35b2c2e6a252eda2134aae6a756861d9299531 ]
There are two cleanup calls missing in the mlx5_init_once() error path. Add them so that the error path flow matches mlx5_cleanup_once().
Fixes: 52ec462eca9b ("net/mlx5: Add reserved-gids support") Fixes: 7c39afb394c7 ("net/mlx5: PTP code migration to driver core section") Signed-off-by: Jiri Pirko jiri@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c index 19c11d33f4b6..145e56f5eeee 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c @@ -928,6 +928,8 @@ static int mlx5_init_once(struct mlx5_core_dev *dev) err_tables_cleanup: mlx5_geneve_destroy(dev->geneve); mlx5_vxlan_destroy(dev->vxlan); + mlx5_cleanup_clock(dev); + mlx5_cleanup_reserved_gids(dev); mlx5_cq_debugfs_cleanup(dev); mlx5_fw_reset_cleanup(dev); err_events_cleanup:
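For context, the pattern being restored here is the usual goto-based unwind in reverse order of initialization. A minimal standalone sketch with made-up stub functions (not the mlx5 code):

#include <stdio.h>

/* stubs standing in for subsystem init/cleanup pairs */
static int  init_a(void) { puts("init a");    return 0; }
static void fini_a(void) { puts("cleanup a"); }
static int  init_b(void) { puts("init b");    return 0; }
static void fini_b(void) { puts("cleanup b"); }
static int  init_c(void) { puts("init c");    return -1; }	/* force failure */

static int init_once(void)
{
	int err;

	err = init_a();
	if (err)
		return err;
	err = init_b();
	if (err)
		goto err_a;
	err = init_c();
	if (err)
		goto err_b;		/* unwind everything done so far */
	return 0;

err_b:
	fini_b();			/* reverse order of initialization */
err_a:
	fini_a();
	return err;
}

int main(void)
{
	return init_once() ? 1 : 0;
}

The bug class this patch fixes is exactly a label whose body is missing some of the cleanup calls, so later init steps leak on failure.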
From: Shay Drory shayd@nvidia.com
[ Upstream commit 9078e843efec530f279a155f262793c58b0746bd ]
Currently, recovery is done without considering whether the device is still in the probe flow. This may lead to recovery before the device has finished probing successfully, e.g. while mlx5_init_one() is still running. The recovery flow uses functionality that is loaded only by mlx5_init_one(), and there is no point in running recovery before mlx5_init_one() has finished successfully.
Fix it by waiting for the probe flow to finish and checking whether the device is probed before trying to perform recovery.
Fixes: 51d138c2610a ("net/mlx5: Fix health error state handling") Signed-off-by: Shay Drory shayd@nvidia.com Reviewed-by: Moshe Shemesh moshe@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/health.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/health.c b/drivers/net/ethernet/mellanox/mlx5/core/health.c index 037e18dd4be0..3dceab45986d 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/health.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/health.c @@ -614,6 +614,12 @@ static void mlx5_fw_fatal_reporter_err_work(struct work_struct *work) priv = container_of(health, struct mlx5_priv, health); dev = container_of(priv, struct mlx5_core_dev, priv);
+ mutex_lock(&dev->intf_state_mutex); + if (test_bit(MLX5_DROP_NEW_HEALTH_WORK, &health->flags)) { + mlx5_core_err(dev, "health works are not permitted at this stage\n"); + return; + } + mutex_unlock(&dev->intf_state_mutex); enter_error_state(dev, false); if (IS_ERR_OR_NULL(health->fw_fatal_reporter)) { if (mlx5_health_try_recover(dev))
From: Dragos Tatulea dtatulea@nvidia.com
[ Upstream commit b12d581e83e3ae1080c32ab83f123005bd89a840 ]
mlx5e_build_nic_params will turn CQE compression on if the hardware capability is enabled and the slow_pci_heuristic condition is detected. As IPoIB doesn't support CQE compression, make sure to disable the feature in the IPoIB profile init.
Please note that the feature is not exposed to the user for IPoIB interfaces, so it can't be subsequently turned on.
Fixes: b797a684b0dd ("net/mlx5e: Enable CQE compression when PCI is slower than link") Signed-off-by: Dragos Tatulea dtatulea@nvidia.com Reviewed-by: Gal Pressman gal@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c index cfde0a45b8b8..10940b8dc83e 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c @@ -70,6 +70,10 @@ static void mlx5i_build_nic_params(struct mlx5_core_dev *mdev, params->packet_merge.type = MLX5E_PACKET_MERGE_NONE; params->hard_mtu = MLX5_IB_GRH_BYTES + MLX5_IPOIB_HARD_LEN; params->tunneled_offload_en = false; + + /* CQE compression is not supported for IPoIB */ + params->rx_cqe_compress_def = false; + MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS, params->rx_cqe_compress_def); }
/* Called directly after IPoIB netdevice was created to initialize SW structs */
From: Roi Dayan roid@nvidia.com
[ Upstream commit ff99316700799b84e842f819a44db608557bae3e ]
In a later commit we are going to instantiate multiple attr instances per flow instead of a single attr. Make sure mlx5e_tc_add_flow_mod_hdr() uses the correct attr and not flow->attr.
Signed-off-by: Roi Dayan roid@nvidia.com Reviewed-by: Oz Shlomo ozsh@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Stable-dep-of: 2951b2e142ec ("net/mlx5e: Always clear dest encap in neigh-update-del") Signed-off-by: Sasha Levin sashal@kernel.org --- .../ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 2 +- drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 12 ++++++------ drivers/net/ethernet/mellanox/mlx5/core/en_tc.h | 4 ++-- 3 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c index 700c463ea367..3b63d9c20580 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c @@ -1342,7 +1342,7 @@ static void mlx5e_reoffload_encap(struct mlx5e_priv *priv, continue; }
- err = mlx5e_tc_add_flow_mod_hdr(priv, parse_attr, flow); + err = mlx5e_tc_add_flow_mod_hdr(priv, flow, attr); if (err) { mlx5_core_warn(priv->mdev, "Failed to update flow mod_hdr err=%d", err); diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c index 843c8435387f..8f2f99689aba 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c @@ -1342,10 +1342,10 @@ int mlx5e_tc_query_route_vport(struct net_device *out_dev, struct net_device *ro }
int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv, - struct mlx5e_tc_flow_parse_attr *parse_attr, - struct mlx5e_tc_flow *flow) + struct mlx5e_tc_flow *flow, + struct mlx5_flow_attr *attr) { - struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts = &parse_attr->mod_hdr_acts; + struct mlx5e_tc_mod_hdr_acts *mod_hdr_acts = &attr->parse_attr->mod_hdr_acts; struct mlx5_modify_hdr *mod_hdr;
mod_hdr = mlx5_modify_header_alloc(priv->mdev, @@ -1355,8 +1355,8 @@ int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv, if (IS_ERR(mod_hdr)) return PTR_ERR(mod_hdr);
- WARN_ON(flow->attr->modify_hdr); - flow->attr->modify_hdr = mod_hdr; + WARN_ON(attr->modify_hdr); + attr->modify_hdr = mod_hdr;
return 0; } @@ -1457,7 +1457,7 @@ mlx5e_tc_add_fdb_flow(struct mlx5e_priv *priv, if (attr->action & MLX5_FLOW_CONTEXT_ACTION_MOD_HDR && !(attr->ct_attr.ct_action & TCA_CT_ACT_CLEAR)) { if (vf_tun) { - err = mlx5e_tc_add_flow_mod_hdr(priv, parse_attr, flow); + err = mlx5e_tc_add_flow_mod_hdr(priv, flow, attr); if (err) goto err_out; } else { diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h index 1a4cd882f0fb..f48af82781f8 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.h @@ -241,8 +241,8 @@ int mlx5e_tc_match_to_reg_set_and_get_id(struct mlx5_core_dev *mdev, u32 data);
int mlx5e_tc_add_flow_mod_hdr(struct mlx5e_priv *priv, - struct mlx5e_tc_flow_parse_attr *parse_attr, - struct mlx5e_tc_flow *flow); + struct mlx5e_tc_flow *flow, + struct mlx5_flow_attr *attr);
int alloc_mod_hdr_actions(struct mlx5_core_dev *mdev, int namespace,
From: Chris Mi cmi@nvidia.com
[ Upstream commit 2951b2e142ecf6e0115df785ba91e91b6da74602 ]
The cited commit introduced a bug for flows with multiple encapsulations. If one dest encap becomes invalid, the slow path flag is set on the flow. But when other dest encaps become invalid, they are not cleared because of the flow's slow path flag. When neigh-update-add runs, it will then use an invalid encap.
Fix it by checking the slow path flag after clearing the dest encap.
Fixes: 9a5f9cc794e1 ("net/mlx5e: Fix possible use-after-free deleting fdb rule") Signed-off-by: Chris Mi cmi@nvidia.com Reviewed-by: Roi Dayan roid@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- .../net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c index 3b63d9c20580..a8d7f07ee2ca 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en/tc_tun_encap.c @@ -188,12 +188,19 @@ void mlx5e_tc_encap_flows_del(struct mlx5e_priv *priv, int err;
list_for_each_entry(flow, flow_list, tmp_list) { - if (!mlx5e_is_offloaded_flow(flow) || flow_flag_test(flow, SLOW)) + if (!mlx5e_is_offloaded_flow(flow)) continue; attr = flow->attr; esw_attr = attr->esw_attr; spec = &attr->parse_attr->spec;
+ /* Clear pkt_reformat before checking slow path flag. Because + * in next iteration, the same flow is already set slow path + * flag, but still need to clear the pkt_reformat. + */ + if (flow_flag_test(flow, SLOW)) + continue; + /* update from encap rule to slow path rule */ rule = mlx5e_tc_offload_to_slow_path(esw, flow, spec); /* mark the flow's encap dest as non-valid */
From: Adham Faris afaris@nvidia.com
[ Upstream commit 1e267ab88dc44c48f556218f7b7f14c76f7aa066 ]
The current XDP xmit functions (mlx5e_xmit_xdp_frame_mpwqe and mlx5e_xmit_xdp_frame) validate the XDP packet length by comparing it to the HW MTU (configured at XDP SQ allocation) before transmitting it. This check does not account for the Ethernet FCS length (calculated and filled in by the NIC). Hence, when we try to send packets with length > (hw-mtu - ethernet-fcs-size), the device port drops them and tx_errors_phy is incremented. The desired behavior is for the driver to catch these packets and drop them.
Fix this behavior in the XDP SQ allocation function (mlx5e_alloc_xdpsq) by subtracting the Ethernet FCS size (4 bytes) from the current HW MTU value, since the Ethernet FCS is calculated and written to Ethernet frames by the NIC.
Fixes: d8bec2b29a82 ("net/mlx5e: Support bpf_xdp_adjust_head()") Signed-off-by: Adham Faris afaris@nvidia.com Reviewed-by: Tariq Toukan tariqt@nvidia.com Signed-off-by: Saeed Mahameed saeedm@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index c1c4f380803a..be19f5cf9d15 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -977,7 +977,7 @@ static int mlx5e_alloc_xdpsq(struct mlx5e_channel *c, sq->channel = c; sq->uar_map = mdev->mlx5e_res.hw_objs.bfreg.map; sq->min_inline_mode = params->tx_min_inline_mode; - sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu); + sq->hw_mtu = MLX5E_SW2HW_MTU(params, params->sw_mtu) - ETH_FCS_LEN; sq->xsk_pool = xsk_pool;
sq->stats = sq->xsk_pool ?
From: Jiguang Xiao jiguang.xiao@windriver.com
[ Upstream commit d530ece70f16f912e1d1bfeea694246ab78b0a4b ]
The driver does not call tasklet_kill in several places. Add the calls to fix it.
Fixes: 85b85c853401 ("amd-xgbe: Re-issue interrupt if interrupt status not cleared") Signed-off-by: Jiguang Xiao jiguang.xiao@windriver.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 3 +++ drivers/net/ethernet/amd/xgbe/xgbe-i2c.c | 4 +++- drivers/net/ethernet/amd/xgbe/xgbe-mdio.c | 4 +++- 3 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c index e6883d52d230..555db1871ec9 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-drv.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-drv.c @@ -1064,6 +1064,9 @@ static void xgbe_free_irqs(struct xgbe_prv_data *pdata)
devm_free_irq(pdata->dev, pdata->dev_irq, pdata);
+ tasklet_kill(&pdata->tasklet_dev); + tasklet_kill(&pdata->tasklet_ecc); + if (pdata->vdata->ecc_support && (pdata->dev_irq != pdata->ecc_irq)) devm_free_irq(pdata->dev, pdata->ecc_irq, pdata);
diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c index 22d4fc547a0a..a9ccc4258ee5 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-i2c.c @@ -447,8 +447,10 @@ static void xgbe_i2c_stop(struct xgbe_prv_data *pdata) xgbe_i2c_disable(pdata); xgbe_i2c_clear_all_interrupts(pdata);
- if (pdata->dev_irq != pdata->i2c_irq) + if (pdata->dev_irq != pdata->i2c_irq) { devm_free_irq(pdata->dev, pdata->i2c_irq, pdata); + tasklet_kill(&pdata->tasklet_i2c); + } }
static int xgbe_i2c_start(struct xgbe_prv_data *pdata) diff --git a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c index 4e97b4869522..0c5c1b155683 100644 --- a/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c +++ b/drivers/net/ethernet/amd/xgbe/xgbe-mdio.c @@ -1390,8 +1390,10 @@ static void xgbe_phy_stop(struct xgbe_prv_data *pdata) /* Disable auto-negotiation */ xgbe_an_disable_all(pdata);
- if (pdata->dev_irq != pdata->an_irq) + if (pdata->dev_irq != pdata->an_irq) { devm_free_irq(pdata->dev, pdata->an_irq, pdata); + tasklet_kill(&pdata->tasklet_an); + }
pdata->phy_if.phy_impl.stop(pdata);
From: David Arinzon darinzon@amazon.com
[ Upstream commit 332b49ff637d6c1a75b971022a8b992cf3c57db1 ]
On driver initialization, the RSS hash initial value is set to zero instead of the default value. This happens because we pass NULL as the RSS key parameter, which causes the RSS hash value to never be initialized.
This patch fixes it by making sure the initial value is set, no matter what the value of the RSS key is.
Fixes: 91a65b7d3ed8 ("net: ena: fix potential crash when rxfh key is NULL") Signed-off-by: Nati Koler nkoler@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_com.c | 29 +++++++---------------- 1 file changed, 9 insertions(+), 20 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_com.c b/drivers/net/ethernet/amazon/ena/ena_com.c index ab413fc1f68e..f0faad149a3b 100644 --- a/drivers/net/ethernet/amazon/ena/ena_com.c +++ b/drivers/net/ethernet/amazon/ena/ena_com.c @@ -2392,29 +2392,18 @@ int ena_com_fill_hash_function(struct ena_com_dev *ena_dev, return -EOPNOTSUPP; }
- switch (func) { - case ENA_ADMIN_TOEPLITZ: - if (key) { - if (key_len != sizeof(hash_key->key)) { - netdev_err(ena_dev->net_device, - "key len (%u) doesn't equal the supported size (%zu)\n", - key_len, sizeof(hash_key->key)); - return -EINVAL; - } - memcpy(hash_key->key, key, key_len); - rss->hash_init_val = init_val; - hash_key->key_parts = key_len / sizeof(hash_key->key[0]); + if ((func == ENA_ADMIN_TOEPLITZ) && key) { + if (key_len != sizeof(hash_key->key)) { + netdev_err(ena_dev->net_device, + "key len (%u) doesn't equal the supported size (%zu)\n", + key_len, sizeof(hash_key->key)); + return -EINVAL; } - break; - case ENA_ADMIN_CRC32: - rss->hash_init_val = init_val; - break; - default: - netdev_err(ena_dev->net_device, "Invalid hash function (%d)\n", - func); - return -EINVAL; + memcpy(hash_key->key, key, key_len); + hash_key->key_parts = key_len / sizeof(hash_key->key[0]); }
+ rss->hash_init_val = init_val; old_func = rss->hash_func; rss->hash_func = func; rc = ena_com_set_hash_function(ena_dev);
From: David Arinzon darinzon@amazon.com
[ Upstream commit 9c9e539956fa67efb8a65e32b72a853740b33445 ]
Since the queues aren't destroyed when we only exchange XDP programs, there's no need to re-register them.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shay Agroskin shayagr@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index f032e58a4c3c..da16f428e7fa 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -516,16 +516,18 @@ static void ena_xdp_exchange_program_rx_in_range(struct ena_adapter *adapter, struct bpf_prog *prog, int first, int count) { + struct bpf_prog *old_bpf_prog; struct ena_ring *rx_ring; int i = 0;
for (i = first; i < count; i++) { rx_ring = &adapter->rx_ring[i]; - xchg(&rx_ring->xdp_bpf_prog, prog); - if (prog) { + old_bpf_prog = xchg(&rx_ring->xdp_bpf_prog, prog); + + if (!old_bpf_prog && prog) { ena_xdp_register_rxq_info(rx_ring); rx_ring->rx_headroom = XDP_PACKET_HEADROOM; - } else { + } else if (old_bpf_prog && !prog) { ena_xdp_unregister_rxq_info(rx_ring); rx_ring->rx_headroom = NET_SKB_PAD; }
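The transition logic can be illustrated with a standalone sketch (toy struct ring and stub register/unregister helpers, not the ena driver code): only the NULL<->non-NULL transitions need (un)registration, a prog-to-prog swap does not.

#include <stdio.h>
#include <stdatomic.h>

/* stand-ins for the rxq info registration calls */
static void register_info(void)   { puts("register rxq info"); }
static void unregister_info(void) { puts("unregister rxq info"); }

struct ring { _Atomic(void *) prog; };

/* only act on the NULL<->non-NULL transitions, as in the fix above */
static void exchange_prog(struct ring *r, void *new_prog)
{
	void *old_prog = atomic_exchange(&r->prog, new_prog);

	if (!old_prog && new_prog)
		register_info();	/* first program attached */
	else if (old_prog && !new_prog)
		unregister_info();	/* last program detached */
	/* prog -> prog swap: queues stay registered, nothing to do */
}

int main(void)
{
	struct ring r;
	int a, b;			/* dummy "programs" */

	atomic_init(&r.prog, NULL);
	exchange_prog(&r, &a);		/* NULL -> prog: register */
	exchange_prog(&r, &b);		/* prog -> prog: no re-register */
	exchange_prog(&r, NULL);	/* prog -> NULL: unregister */
	return 0;
}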
From: David Arinzon darinzon@amazon.com
[ Upstream commit c7f5e34d906320fdc996afa616676161c029cc02 ]
The size of packets that were forwarded or dropped by XDP wasn't added to the total processed bytes statistic.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: Shay Agroskin shayagr@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index da16f428e7fa..31afbd17e690 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1729,6 +1729,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, } if (xdp_verdict != XDP_PASS) { xdp_flags |= xdp_verdict; + total_len += ena_rx_ctx.ena_bufs[0].len; res_budget--; continue; }
From: David Arinzon darinzon@amazon.com
[ Upstream commit 59811faa2c54dbcf44d575b5a8f6e7077da88dc2 ]
Redirecting packets with XDP Redirect is done in two phases:
1. A packet is passed by the driver to the kernel using xdp_do_redirect().
2. After finishing polling for new packets, the driver lets the kernel know that it can now process the redirected packets using xdp_do_flush_map().
The packets' redirection is handled in the NAPI context of the queue that called xdp_do_redirect().
To avoid calling xdp_do_flush_map() each time, the driver first checks whether any packets were redirected, using

	xdp_flags |= xdp_verdict;

and

	if (xdp_flags & XDP_REDIRECT)
		xdp_do_flush_map()

essentially treating XDP verdicts as a bitmask, which isn't the case:

	enum xdp_action {
		XDP_ABORTED = 0,
		XDP_DROP,
		XDP_PASS,
		XDP_TX,
		XDP_REDIRECT,
	};
Given the current possible values of xdp_action, the current design doesn't have a bug (since XDP_REDIRECT = 100b), but it is still flawed.
This patch makes the driver use a bitmask instead, to avoid future issues.
Fixes: a318c70ad152 ("net: ena: introduce XDP redirect implementation") Signed-off-by: Shay Agroskin shayagr@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 26 ++++++++++++-------- drivers/net/ethernet/amazon/ena/ena_netdev.h | 9 +++++++ 2 files changed, 25 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 31afbd17e690..294f21a839cf 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -378,9 +378,9 @@ static int ena_xdp_xmit(struct net_device *dev, int n,
static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) { + u32 verdict = ENA_XDP_PASS; struct bpf_prog *xdp_prog; struct ena_ring *xdp_ring; - u32 verdict = XDP_PASS; struct xdp_frame *xdpf; u64 *xdp_stat;
@@ -397,7 +397,7 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp) if (unlikely(!xdpf)) { trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); xdp_stat = &rx_ring->rx_stats.xdp_aborted; - verdict = XDP_ABORTED; + verdict = ENA_XDP_DROP; break; }
@@ -413,29 +413,35 @@ static int ena_xdp_execute(struct ena_ring *rx_ring, struct xdp_buff *xdp)
spin_unlock(&xdp_ring->xdp_tx_lock); xdp_stat = &rx_ring->rx_stats.xdp_tx; + verdict = ENA_XDP_TX; break; case XDP_REDIRECT: if (likely(!xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog))) { xdp_stat = &rx_ring->rx_stats.xdp_redirect; + verdict = ENA_XDP_REDIRECT; break; } trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); xdp_stat = &rx_ring->rx_stats.xdp_aborted; - verdict = XDP_ABORTED; + verdict = ENA_XDP_DROP; break; case XDP_ABORTED: trace_xdp_exception(rx_ring->netdev, xdp_prog, verdict); xdp_stat = &rx_ring->rx_stats.xdp_aborted; + verdict = ENA_XDP_DROP; break; case XDP_DROP: xdp_stat = &rx_ring->rx_stats.xdp_drop; + verdict = ENA_XDP_DROP; break; case XDP_PASS: xdp_stat = &rx_ring->rx_stats.xdp_pass; + verdict = ENA_XDP_PASS; break; default: bpf_warn_invalid_xdp_action(verdict); xdp_stat = &rx_ring->rx_stats.xdp_invalid; + verdict = ENA_XDP_DROP; }
ena_increase_stat(xdp_stat, 1, &rx_ring->syncp); @@ -1631,12 +1637,12 @@ static int ena_xdp_handle_buff(struct ena_ring *rx_ring, struct xdp_buff *xdp) * we expect, then we simply drop it */ if (unlikely(rx_ring->ena_bufs[0].len > ENA_XDP_MAX_MTU)) - return XDP_DROP; + return ENA_XDP_DROP;
ret = ena_xdp_execute(rx_ring, xdp);
/* The xdp program might expand the headers */ - if (ret == XDP_PASS) { + if (ret == ENA_XDP_PASS) { rx_info->page_offset = xdp->data - xdp->data_hard_start; rx_ring->ena_bufs[0].len = xdp->data_end - xdp->data; } @@ -1675,7 +1681,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, xdp_init_buff(&xdp, ENA_PAGE_SIZE, &rx_ring->xdp_rxq);
do { - xdp_verdict = XDP_PASS; + xdp_verdict = ENA_XDP_PASS; skb = NULL; ena_rx_ctx.ena_bufs = rx_ring->ena_bufs; ena_rx_ctx.max_bufs = rx_ring->sgl_size; @@ -1703,7 +1709,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, xdp_verdict = ena_xdp_handle_buff(rx_ring, &xdp);
/* allocate skb and fill it */ - if (xdp_verdict == XDP_PASS) + if (xdp_verdict == ENA_XDP_PASS) skb = ena_rx_skb(rx_ring, rx_ring->ena_bufs, ena_rx_ctx.descs, @@ -1721,13 +1727,13 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, /* Packets was passed for transmission, unmap it * from RX side. */ - if (xdp_verdict == XDP_TX || xdp_verdict == XDP_REDIRECT) { + if (xdp_verdict & ENA_XDP_FORWARDED) { ena_unmap_rx_buff(rx_ring, &rx_ring->rx_buffer_info[req_id]); rx_ring->rx_buffer_info[req_id].page = NULL; } } - if (xdp_verdict != XDP_PASS) { + if (xdp_verdict != ENA_XDP_PASS) { xdp_flags |= xdp_verdict; total_len += ena_rx_ctx.ena_bufs[0].len; res_budget--; @@ -1773,7 +1779,7 @@ static int ena_clean_rx_irq(struct ena_ring *rx_ring, struct napi_struct *napi, ena_refill_rx_bufs(rx_ring, refill_required); }
- if (xdp_flags & XDP_REDIRECT) + if (xdp_flags & ENA_XDP_REDIRECT) xdp_do_flush_map();
return work_done; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 0c39fc2fa345..ada2f8faa33a 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -412,6 +412,15 @@ enum ena_xdp_errors_t { ENA_XDP_NO_ENOUGH_QUEUES, };
+enum ENA_XDP_ACTIONS { + ENA_XDP_PASS = 0, + ENA_XDP_TX = BIT(0), + ENA_XDP_REDIRECT = BIT(1), + ENA_XDP_DROP = BIT(2) +}; + +#define ENA_XDP_FORWARDED (ENA_XDP_TX | ENA_XDP_REDIRECT) + static inline bool ena_xdp_present(struct ena_adapter *adapter) { return !!adapter->xdp_bpf_prog;
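To see concretely why raw enum xdp_action values cannot be OR-ed together as flags (and why the driver-local BIT() values above are needed), here is a standalone sketch, not driver code:

#include <stdio.h>

/* values as defined by the kernel's enum xdp_action */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

int main(void)
{
	unsigned int flags = 0;

	/* one packet dropped, one passed -- no packet was sent via XDP_TX */
	flags |= XDP_DROP;		/* 0b001 */
	flags |= XDP_PASS;		/* 0b010 */

	if (flags & XDP_TX)		/* XDP_TX == 0b011 */
		puts("bogus: looks like an XDP_TX happened");

	/* XDP_REDIRECT == 0b100 happens to be a single bit, so the old
	 * (flags & XDP_REDIRECT) check worked by luck, as the changelog
	 * notes -- distinct BIT() flags remove the ambiguity.
	 */
	return 0;
}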
From: David Arinzon darinzon@amazon.com
[ Upstream commit c7062aaee099f2f43d6f07a71744b44b94b94b34 ]
Make the upper bound on rx_copybreak tighter by making sure it is smaller than the minimum of the MTU and ENA_PAGE_SIZE. With the current upper bound of the MTU, rx_copybreak can be larger than a page. Such a large rx_copybreak will not bring any performance benefit to the user and therefore makes no sense.
In addition, the value update was only reflected in the adapter structure but not applied to each ring, causing it not to take effect.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Osama Abboud osamaabb@amazon.com Signed-off-by: Arthur Kiyanovski akiyano@amazon.com Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_ethtool.c | 6 +----- drivers/net/ethernet/amazon/ena/ena_netdev.c | 18 ++++++++++++++++++ drivers/net/ethernet/amazon/ena/ena_netdev.h | 2 ++ 3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_ethtool.c b/drivers/net/ethernet/amazon/ena/ena_ethtool.c index 13e745cf3781..413082f10dc1 100644 --- a/drivers/net/ethernet/amazon/ena/ena_ethtool.c +++ b/drivers/net/ethernet/amazon/ena/ena_ethtool.c @@ -880,11 +880,7 @@ static int ena_set_tunable(struct net_device *netdev, switch (tuna->id) { case ETHTOOL_RX_COPYBREAK: len = *(u32 *)data; - if (len > adapter->netdev->mtu) { - ret = -EINVAL; - break; - } - adapter->rx_copybreak = len; + ret = ena_set_rx_copybreak(adapter, len); break; default: ret = -EINVAL; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 294f21a839cf..8f1b205e7333 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -2829,6 +2829,24 @@ int ena_update_queue_sizes(struct ena_adapter *adapter, return dev_was_up ? ena_up(adapter) : 0; }
+int ena_set_rx_copybreak(struct ena_adapter *adapter, u32 rx_copybreak) +{ + struct ena_ring *rx_ring; + int i; + + if (rx_copybreak > min_t(u16, adapter->netdev->mtu, ENA_PAGE_SIZE)) + return -EINVAL; + + adapter->rx_copybreak = rx_copybreak; + + for (i = 0; i < adapter->num_io_queues; i++) { + rx_ring = &adapter->rx_ring[i]; + rx_ring->rx_copybreak = rx_copybreak; + } + + return 0; +} + int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count) { struct ena_com_dev *ena_dev = adapter->ena_dev; diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index ada2f8faa33a..2b5eb573ff23 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -404,6 +404,8 @@ int ena_update_queue_sizes(struct ena_adapter *adapter,
int ena_update_queue_count(struct ena_adapter *adapter, u32 new_channel_count);
+int ena_set_rx_copybreak(struct ena_adapter *adapter, u32 rx_copybreak); + int ena_get_sset_count(struct net_device *netdev, int sset);
enum ena_xdp_errors_t {
From: David Arinzon darinzon@amazon.com
[ Upstream commit e712f3e4920b3a1a5e6b536827d118e14862896c ]
The RX ring can be NULL in XDP use cases where only TX queues are configured. In this scenario, the RX interrupt moderation value sent to the device remains at its default value of 0.
In this change, the default value of the RX interrupt moderation is set to be the same as that of the TX.
Fixes: 548c4940b9f1 ("net: ena: Implement XDP_TX action") Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index 8f1b205e7333..b1533a45f645 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -1836,8 +1836,9 @@ static void ena_adjust_adaptive_rx_intr_moderation(struct ena_napi *ena_napi) static void ena_unmask_interrupt(struct ena_ring *tx_ring, struct ena_ring *rx_ring) { + u32 rx_interval = tx_ring->smoothed_interval; struct ena_eth_io_intr_reg intr_reg; - u32 rx_interval = 0; + /* Rx ring can be NULL when for XDP tx queues which don't have an * accompanying rx_ring pair. */
From: David Arinzon darinzon@amazon.com
[ Upstream commit a8ee104f986e720cea52133885cc822d459398c7 ]
The device supports a PCIe optimization hint, which indicates on which NUMA node the queue is currently being processed. This hint is utilized by PCIe in order to reduce its access time by accessing the correct NUMA resources and maintaining cache coherence.
The driver calls the register update for the hint (called TPH - TLP Processing Hint) during the NAPI loop.
Though the update is expected upon a NUMA change (when a queue is moved from one NUMA node to another), the current logic performs a register update when the queue is moved to a different CPU, even if that CPU is not in a different NUMA node.
The changes include:
1. Performing the TPH update only when the queue has switched NUMA node.
2. Moving the TPH update call so it is triggered only when NAPI was scheduled from interrupt context, as opposed to a busy-polling loop. This is because during busy-polling the frequency of CPU switches for a particular queue is significantly higher, and thus the likelihood of switching NUMA node is much higher. Therefore, sending frequent updates to the device upon a NUMA change is unlikely to be beneficial.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: David Arinzon darinzon@amazon.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/amazon/ena/ena_netdev.c | 27 +++++++++++++------- drivers/net/ethernet/amazon/ena/ena_netdev.h | 6 +++-- 2 files changed, 22 insertions(+), 11 deletions(-)
diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.c b/drivers/net/ethernet/amazon/ena/ena_netdev.c index b1533a45f645..23c9750850e9 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.c +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.c @@ -684,6 +684,7 @@ static void ena_init_io_rings_common(struct ena_adapter *adapter, ring->ena_dev = adapter->ena_dev; ring->per_napi_packets = 0; ring->cpu = 0; + ring->numa_node = 0; ring->no_interrupt_event_cnt = 0; u64_stats_init(&ring->syncp); } @@ -787,6 +788,7 @@ static int ena_setup_tx_resources(struct ena_adapter *adapter, int qid) tx_ring->next_to_use = 0; tx_ring->next_to_clean = 0; tx_ring->cpu = ena_irq->cpu; + tx_ring->numa_node = node; return 0;
err_push_buf_intermediate_buf: @@ -919,6 +921,7 @@ static int ena_setup_rx_resources(struct ena_adapter *adapter, rx_ring->next_to_clean = 0; rx_ring->next_to_use = 0; rx_ring->cpu = ena_irq->cpu; + rx_ring->numa_node = node;
return 0; } @@ -1876,20 +1879,27 @@ static void ena_update_ring_numa_node(struct ena_ring *tx_ring, if (likely(tx_ring->cpu == cpu)) goto out;
+ tx_ring->cpu = cpu; + if (rx_ring) + rx_ring->cpu = cpu; + numa_node = cpu_to_node(cpu); + + if (likely(tx_ring->numa_node == numa_node)) + goto out; + put_cpu();
if (numa_node != NUMA_NO_NODE) { ena_com_update_numa_node(tx_ring->ena_com_io_cq, numa_node); - if (rx_ring) + tx_ring->numa_node = numa_node; + if (rx_ring) { + rx_ring->numa_node = numa_node; ena_com_update_numa_node(rx_ring->ena_com_io_cq, numa_node); + } }
- tx_ring->cpu = cpu; - if (rx_ring) - rx_ring->cpu = cpu; - return; out: put_cpu(); @@ -2010,11 +2020,10 @@ static int ena_io_poll(struct napi_struct *napi, int budget) if (ena_com_get_adaptive_moderation_enabled(rx_ring->ena_dev)) ena_adjust_adaptive_rx_intr_moderation(ena_napi);
+ ena_update_ring_numa_node(tx_ring, rx_ring); ena_unmask_interrupt(tx_ring, rx_ring); }
- ena_update_ring_numa_node(tx_ring, rx_ring); - ret = rx_work_done; } else { ret = budget; @@ -2401,7 +2410,7 @@ static int ena_create_io_tx_queue(struct ena_adapter *adapter, int qid) ctx.mem_queue_type = ena_dev->tx_mem_queue_type; ctx.msix_vector = msix_vector; ctx.queue_size = tx_ring->ring_size; - ctx.numa_node = cpu_to_node(tx_ring->cpu); + ctx.numa_node = tx_ring->numa_node;
rc = ena_com_create_io_queue(ena_dev, &ctx); if (rc) { @@ -2469,7 +2478,7 @@ static int ena_create_io_rx_queue(struct ena_adapter *adapter, int qid) ctx.mem_queue_type = ENA_ADMIN_PLACEMENT_POLICY_HOST; ctx.msix_vector = msix_vector; ctx.queue_size = rx_ring->ring_size; - ctx.numa_node = cpu_to_node(rx_ring->cpu); + ctx.numa_node = rx_ring->numa_node;
rc = ena_com_create_io_queue(ena_dev, &ctx); if (rc) { diff --git a/drivers/net/ethernet/amazon/ena/ena_netdev.h b/drivers/net/ethernet/amazon/ena/ena_netdev.h index 2b5eb573ff23..bf2a39c91c00 100644 --- a/drivers/net/ethernet/amazon/ena/ena_netdev.h +++ b/drivers/net/ethernet/amazon/ena/ena_netdev.h @@ -273,9 +273,11 @@ struct ena_ring { bool disable_meta_caching; u16 no_interrupt_event_cnt;
- /* cpu for TPH */ + /* cpu and NUMA for TPH */ int cpu; - /* number of tx/rx_buffer_info's entries */ + int numa_node; + + /* number of tx/rx_buffer_info's entries */ int ring_size;
enum ena_admin_placement_policy_type tx_mem_queue_type;
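The caching idea can be shown with a standalone sketch (stubbed cpu_to_node() and update_hint(), not the ena driver code): the hint is refreshed only when the NUMA node actually changes, not on every CPU move.

#include <stdio.h>

/* stand-in for cpu_to_node(): even CPUs on node 0, odd CPUs on node 1 */
static int cpu_to_node(int cpu) { return cpu & 1; }

struct ring { int cpu; int numa_node; };

static void update_hint(int node) { printf("TPH update -> node %d\n", node); }

static void maybe_update(struct ring *r, int cpu)
{
	int node;

	if (r->cpu == cpu)
		return;			/* nothing moved */
	r->cpu = cpu;

	node = cpu_to_node(cpu);
	if (r->numa_node == node)
		return;			/* CPU changed, NUMA node did not */

	r->numa_node = node;
	update_hint(node);		/* only on an actual NUMA switch */
}

int main(void)
{
	struct ring r = { .cpu = 0, .numa_node = 0 };

	maybe_update(&r, 2);		/* CPU move within node 0: no update */
	maybe_update(&r, 3);		/* move to node 1: one update */
	return 0;
}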
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit d039535850ee47079d59527e96be18d8e0daa84b ]
of_phy_find_device() returns a device node with its refcount incremented. Call put_device() to release it when it is no longer needed.
Fixes: ab4e6ee578e8 ("net: phy: xgmiitorgmii: Check phy_driver ready before accessing") Signed-off-by: Miaoqian Lin linmq006@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/phy/xilinx_gmii2rgmii.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/phy/xilinx_gmii2rgmii.c b/drivers/net/phy/xilinx_gmii2rgmii.c index 8dcb49ed1f3d..7fd9fe6a602b 100644 --- a/drivers/net/phy/xilinx_gmii2rgmii.c +++ b/drivers/net/phy/xilinx_gmii2rgmii.c @@ -105,6 +105,7 @@ static int xgmiitorgmii_probe(struct mdio_device *mdiodev)
if (!priv->phy_dev->drv) { dev_info(dev, "Attached phy not ready\n"); + put_device(&priv->phy_dev->mdio.dev); return -EPROBE_DEFER; }
From: Shay Drory shayd@nvidia.com
[ Upstream commit 38b50aa44495d5eb4218f0b82fc2da76505cec53 ]
Currently, when mlx5_ib_get_hw_stats() is used for the device (port_num = 0), there is special handling in order to use the correct counters, but port_num is passed down the stack unchanged. Also, some functions assume that port_num >= 1. As a result, the following oops can occur.
BUG: unable to handle page fault for address: ffff89510294f1a8 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 0 P4D 0 Oops: 0002 [#1] SMP CPU: 8 PID: 1382 Comm: devlink Tainted: G W 6.1.0-rc4_for_upstream_base_2022_11_10_16_12 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 RIP: 0010:_raw_spin_lock+0xc/0x20 Call Trace: <TASK> mlx5_ib_get_native_port_mdev+0x73/0xe0 [mlx5_ib] do_get_hw_stats.constprop.0+0x109/0x160 [mlx5_ib] mlx5_ib_get_hw_stats+0xad/0x180 [mlx5_ib] ib_setup_device_attrs+0xf0/0x290 [ib_core] ib_register_device+0x3bb/0x510 [ib_core] ? atomic_notifier_chain_register+0x67/0x80 __mlx5_ib_add+0x2b/0x80 [mlx5_ib] mlx5r_probe+0xb8/0x150 [mlx5_ib] ? auxiliary_match_id+0x6a/0x90 auxiliary_bus_probe+0x3c/0x70 ? driver_sysfs_add+0x6b/0x90 really_probe+0xcd/0x380 __driver_probe_device+0x80/0x170 driver_probe_device+0x1e/0x90 __device_attach_driver+0x7d/0x100 ? driver_allows_async_probing+0x60/0x60 ? driver_allows_async_probing+0x60/0x60 bus_for_each_drv+0x7b/0xc0 __device_attach+0xbc/0x200 bus_probe_device+0x87/0xa0 device_add+0x404/0x940 ? dev_set_name+0x53/0x70 __auxiliary_device_add+0x43/0x60 add_adev+0x99/0xe0 [mlx5_core] mlx5_attach_device+0xc8/0x120 [mlx5_core] mlx5_load_one_devl_locked+0xb2/0xe0 [mlx5_core] devlink_reload+0x133/0x250 devlink_nl_cmd_reload+0x480/0x570 ? devlink_nl_pre_doit+0x44/0x2b0 genl_family_rcv_msg_doit.isra.0+0xc2/0x110 genl_rcv_msg+0x180/0x2b0 ? devlink_nl_cmd_region_read_dumpit+0x540/0x540 ? devlink_reload+0x250/0x250 ? devlink_put+0x50/0x50 ? genl_family_rcv_msg_doit.isra.0+0x110/0x110 netlink_rcv_skb+0x54/0x100 genl_rcv+0x24/0x40 netlink_unicast+0x1f6/0x2c0 netlink_sendmsg+0x237/0x490 sock_sendmsg+0x33/0x40 __sys_sendto+0x103/0x160 ? handle_mm_fault+0x10e/0x290 ? do_user_addr_fault+0x1c0/0x5f0 __x64_sys_sendto+0x25/0x30 do_syscall_64+0x3d/0x90 entry_SYSCALL_64_after_hwframe+0x46/0xb0
Fix it by setting port_num to 1 in order to get the device status, and remove the unused variable.
Fixes: aac4492ef23a ("IB/mlx5: Update counter implementation for dual port RoCE") Link: https://lore.kernel.org/r/98b82994c3cd3fa593b8a75ed3f3901e208beb0f.167223173... Signed-off-by: Shay Drory shayd@nvidia.com Reviewed-by: Patrisious Haddad phaddad@nvidia.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/counters.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/counters.c b/drivers/infiniband/hw/mlx5/counters.c index 224ba36f2946..1a0ecf439c09 100644 --- a/drivers/infiniband/hw/mlx5/counters.c +++ b/drivers/infiniband/hw/mlx5/counters.c @@ -249,7 +249,6 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev, const struct mlx5_ib_counters *cnts = get_counters(dev, port_num - 1); struct mlx5_core_dev *mdev; int ret, num_counters; - u32 mdev_port_num;
if (!stats) return -EINVAL; @@ -270,8 +269,9 @@ static int mlx5_ib_get_hw_stats(struct ib_device *ibdev, }
if (MLX5_CAP_GEN(dev->mdev, cc_query_allowed)) { - mdev = mlx5_ib_get_native_port_mdev(dev, port_num, - &mdev_port_num); + if (!port_num) + port_num = 1; + mdev = mlx5_ib_get_native_port_mdev(dev, port_num, NULL); if (!mdev) { /* If port is not affiliated yet, its in down state * which doesn't have any counters yet, so it would be
From: Maor Gottlieb maorg@nvidia.com
[ Upstream commit 8de8482fe5732fbef4f5af82bc0c0362c804cd1f ]
Currently, when modifying DC, we validate the max_rd_atomic user attribute against the RC cap instead of the DC cap. RC and DC QP types have different device limitations, so validate against the DC cap.
This can cause userspace created DC QPs to malfunction.
Fixes: c32a4f296e1d ("IB/mlx5: Add support for DC Initiator QP") Link: https://lore.kernel.org/r/0c5aee72cea188c3bb770f4207cce7abc9b6fc74.167223173... Signed-off-by: Maor Gottlieb maorg@nvidia.com Signed-off-by: Leon Romanovsky leon@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/mlx5/qp.c | 49 +++++++++++++++++++++++---------- 1 file changed, 35 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/hw/mlx5/qp.c b/drivers/infiniband/hw/mlx5/qp.c index e5abbcfc1d57..55b05a3e31b8 100644 --- a/drivers/infiniband/hw/mlx5/qp.c +++ b/drivers/infiniband/hw/mlx5/qp.c @@ -4499,6 +4499,40 @@ static bool mlx5_ib_modify_qp_allowed(struct mlx5_ib_dev *dev, return false; }
+static int validate_rd_atomic(struct mlx5_ib_dev *dev, struct ib_qp_attr *attr, + int attr_mask, enum ib_qp_type qp_type) +{ + int log_max_ra_res; + int log_max_ra_req; + + if (qp_type == MLX5_IB_QPT_DCI) { + log_max_ra_res = 1 << MLX5_CAP_GEN(dev->mdev, + log_max_ra_res_dc); + log_max_ra_req = 1 << MLX5_CAP_GEN(dev->mdev, + log_max_ra_req_dc); + } else { + log_max_ra_res = 1 << MLX5_CAP_GEN(dev->mdev, + log_max_ra_res_qp); + log_max_ra_req = 1 << MLX5_CAP_GEN(dev->mdev, + log_max_ra_req_qp); + } + + if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC && + attr->max_rd_atomic > log_max_ra_res) { + mlx5_ib_dbg(dev, "invalid max_rd_atomic value %d\n", + attr->max_rd_atomic); + return false; + } + + if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC && + attr->max_dest_rd_atomic > log_max_ra_req) { + mlx5_ib_dbg(dev, "invalid max_dest_rd_atomic value %d\n", + attr->max_dest_rd_atomic); + return false; + } + return true; +} + int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, int attr_mask, struct ib_udata *udata) { @@ -4586,21 +4620,8 @@ int mlx5_ib_modify_qp(struct ib_qp *ibqp, struct ib_qp_attr *attr, goto out; }
- if (attr_mask & IB_QP_MAX_QP_RD_ATOMIC && - attr->max_rd_atomic > - (1 << MLX5_CAP_GEN(dev->mdev, log_max_ra_res_qp))) { - mlx5_ib_dbg(dev, "invalid max_rd_atomic value %d\n", - attr->max_rd_atomic); - goto out; - } - - if (attr_mask & IB_QP_MAX_DEST_RD_ATOMIC && - attr->max_dest_rd_atomic > - (1 << MLX5_CAP_GEN(dev->mdev, log_max_ra_req_qp))) { - mlx5_ib_dbg(dev, "invalid max_dest_rd_atomic value %d\n", - attr->max_dest_rd_atomic); + if (!validate_rd_atomic(dev, attr, attr_mask, qp_type)) goto out; - }
if (cur_state == new_state && cur_state == IB_QPS_RESET) { err = 0;
From: Carlo Caione ccaione@baylibre.com
[ Upstream commit 3b754ed6d1cd90017e66e5cc16f3923e4a952ffc ]
Having a bigger number of FIFO lines held after vsync is only useful to SoCs using AFBC to give time to the AFBC decoder to be reset, configured and enabled again.
For SoCs not using AFBC, on the contrary, this causes issues on some displays and a vertical offset of a few pixels in the displayed image.
Conditionally increase the number of lines held after vsync only for SoCs using AFBC, leaving the default value for all the others.
Fixes: 24e0d4058eff ("drm/meson: hold 32 lines after vsync to give time for AFBC start") Signed-off-by: Carlo Caione ccaione@baylibre.com Acked-by: Martin Blumenstingl martin.blumenstingl@googlemail.com Acked-by: Neil Armstrong neil.armstrong@linaro.org [narmstrong: added fixes tag] Signed-off-by: Neil Armstrong neil.armstrong@linaro.org Link: https://patchwork.freedesktop.org/patch/msgid/20221216-afbc_s905x-v1-0-033be... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/meson/meson_viu.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/meson/meson_viu.c b/drivers/gpu/drm/meson/meson_viu.c index d4b907889a21..cd399b0b7181 100644 --- a/drivers/gpu/drm/meson/meson_viu.c +++ b/drivers/gpu/drm/meson/meson_viu.c @@ -436,15 +436,14 @@ void meson_viu_init(struct meson_drm *priv)
/* Initialize OSD1 fifo control register */ reg = VIU_OSD_DDR_PRIORITY_URGENT | - VIU_OSD_HOLD_FIFO_LINES(31) | VIU_OSD_FIFO_DEPTH_VAL(32) | /* fifo_depth_val: 32*8=256 */ VIU_OSD_WORDS_PER_BURST(4) | /* 4 words in 1 burst */ VIU_OSD_FIFO_LIMITS(2); /* fifo_lim: 2*16=32 */
if (meson_vpu_is_compatible(priv, VPU_COMPATIBLE_G12A)) - reg |= VIU_OSD_BURST_LENGTH_32; + reg |= (VIU_OSD_BURST_LENGTH_32 | VIU_OSD_HOLD_FIFO_LINES(31)); else - reg |= VIU_OSD_BURST_LENGTH_64; + reg |= (VIU_OSD_BURST_LENGTH_64 | VIU_OSD_HOLD_FIFO_LINES(4));
writel_relaxed(reg, priv->io_base + _REG(VIU_OSD1_FIFO_CTRL_STAT)); writel_relaxed(reg, priv->io_base + _REG(VIU_OSD2_FIFO_CTRL_STAT));
From: Jeff Layton jlayton@kernel.org
[ Upstream commit ab1ddef98a715eddb65309ffa83267e4e84a571e ]
Ceph has a need to know whether a particular inode has any locks set on it. It's currently tracking that by a num_locks field in its filp->private_data, but that's problematic as it tries to decrement this field when releasing locks and that can race with the file being torn down.
Add a new vfs_inode_has_locks helper that just returns whether any locks are currently held on the inode.
Reviewed-by: Xiubo Li xiubli@redhat.com Reviewed-by: Christoph Hellwig hch@infradead.org Signed-off-by: Jeff Layton jlayton@kernel.org Stable-dep-of: 461ab10ef7e6 ("ceph: switch to vfs_inode_has_locks() to fix file lock bug") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/locks.c | 23 +++++++++++++++++++++++ include/linux/fs.h | 6 ++++++ 2 files changed, 29 insertions(+)
diff --git a/fs/locks.c b/fs/locks.c index 3d6fb4ae847b..82a4487e95b3 100644 --- a/fs/locks.c +++ b/fs/locks.c @@ -2703,6 +2703,29 @@ int vfs_cancel_lock(struct file *filp, struct file_lock *fl) } EXPORT_SYMBOL_GPL(vfs_cancel_lock);
+/** + * vfs_inode_has_locks - are any file locks held on @inode? + * @inode: inode to check for locks + * + * Return true if there are any FL_POSIX or FL_FLOCK locks currently + * set on @inode. + */ +bool vfs_inode_has_locks(struct inode *inode) +{ + struct file_lock_context *ctx; + bool ret; + + ctx = smp_load_acquire(&inode->i_flctx); + if (!ctx) + return false; + + spin_lock(&ctx->flc_lock); + ret = !list_empty(&ctx->flc_posix) || !list_empty(&ctx->flc_flock); + spin_unlock(&ctx->flc_lock); + return ret; +} +EXPORT_SYMBOL_GPL(vfs_inode_has_locks); + #ifdef CONFIG_PROC_FS #include <linux/proc_fs.h> #include <linux/seq_file.h> diff --git a/include/linux/fs.h b/include/linux/fs.h index 68fcf3ec9cf6..1e1ac116dd13 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1195,6 +1195,7 @@ extern int locks_delete_block(struct file_lock *); extern int vfs_test_lock(struct file *, struct file_lock *); extern int vfs_lock_file(struct file *, unsigned int, struct file_lock *, struct file_lock *); extern int vfs_cancel_lock(struct file *filp, struct file_lock *fl); +bool vfs_inode_has_locks(struct inode *inode); extern int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl); extern int __break_lease(struct inode *inode, unsigned int flags, unsigned int type); extern void lease_get_mtime(struct inode *, struct timespec64 *time); @@ -1307,6 +1308,11 @@ static inline int vfs_cancel_lock(struct file *filp, struct file_lock *fl) return 0; }
+static inline bool vfs_inode_has_locks(struct inode *inode) +{ + return false; +} + static inline int locks_lock_inode_wait(struct inode *inode, struct file_lock *fl) { return -ENOLCK;
From: Xiubo Li xiubli@redhat.com
[ Upstream commit 461ab10ef7e6ea9b41a0571a7fc6a72af9549a3c ]
POSIX locks use the same owner, which is the thread id, and multiple POSIX locks can be merged into a single one, so checking whether the 'file' has locks may fail.
A file where some openers use locking and others don't is a really odd usage pattern, though. Locks are like stoplights -- they only work if everyone pays attention to them.
Just switch ceph_get_caps() to check whether any locks are set on the inode. If there are POSIX/OFD/FLOCK locks on the file at the time, we should set CHECK_FILELOCK, regardless of what fd was used to set the lock.
Fixes: ff5d913dfc71 ("ceph: return -EIO if read/write against filp that lost file locks") Signed-off-by: Xiubo Li xiubli@redhat.com Reviewed-by: Jeff Layton jlayton@kernel.org Reviewed-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Ilya Dryomov idryomov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ceph/caps.c | 2 +- fs/ceph/locks.c | 4 ---- fs/ceph/super.h | 1 - 3 files changed, 1 insertion(+), 6 deletions(-)
diff --git a/fs/ceph/caps.c b/fs/ceph/caps.c index be96fe615bec..67b782b0a90a 100644 --- a/fs/ceph/caps.c +++ b/fs/ceph/caps.c @@ -2872,7 +2872,7 @@ int ceph_get_caps(struct file *filp, int need, int want, loff_t endoff, int *got
while (true) { flags &= CEPH_FILE_MODE_MASK; - if (atomic_read(&fi->num_locks)) + if (vfs_inode_has_locks(inode)) flags |= CHECK_FILELOCK; _got = 0; ret = try_get_cap_refs(inode, need, want, endoff, diff --git a/fs/ceph/locks.c b/fs/ceph/locks.c index bdeb271f47d9..3e3b8be76b21 100644 --- a/fs/ceph/locks.c +++ b/fs/ceph/locks.c @@ -32,18 +32,14 @@ void __init ceph_flock_init(void)
static void ceph_fl_copy_lock(struct file_lock *dst, struct file_lock *src) { - struct ceph_file_info *fi = dst->fl_file->private_data; struct inode *inode = file_inode(dst->fl_file); atomic_inc(&ceph_inode(inode)->i_filelock_ref); - atomic_inc(&fi->num_locks); }
static void ceph_fl_release_lock(struct file_lock *fl) { - struct ceph_file_info *fi = fl->fl_file->private_data; struct inode *inode = file_inode(fl->fl_file); struct ceph_inode_info *ci = ceph_inode(inode); - atomic_dec(&fi->num_locks); if (atomic_dec_and_test(&ci->i_filelock_ref)) { /* clear error when all locks are released */ spin_lock(&ci->i_ceph_lock); diff --git a/fs/ceph/super.h b/fs/ceph/super.h index 14f951cd5b61..8c9021d0f837 100644 --- a/fs/ceph/super.h +++ b/fs/ceph/super.h @@ -773,7 +773,6 @@ struct ceph_file_info { struct list_head rw_contexts;
u32 filp_gen; - atomic_t num_locks; };
struct ceph_dir_file_info {
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 694175cd8a1643cde3acb45c9294bca44a8e08e9 ]
of_irq_find_parent() returns a node pointer with its refcount incremented; we should call of_node_put() on it when it is no longer needed. Add the missing of_node_put() to avoid a refcount leak.
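For reference, a hedged sketch of the acquire/release pattern the fix follows (simplified from the probe path; the helper name is illustrative):

    #include <linux/of.h>
    #include <linux/of_irq.h>
    #include <linux/irqdomain.h>

    static struct irq_domain *example_find_parent_domain(struct device_node *node)
    {
            struct device_node *irq_parent;
            struct irq_domain *parent;

            irq_parent = of_irq_find_parent(node);  /* takes a reference */
            if (!irq_parent)
                    return NULL;

            parent = irq_find_host(irq_parent);
            of_node_put(irq_parent);                /* drop the reference after the lookup */
            return parent;
    }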
Fixes: 96868dce644d ("gpio/sifive: Add GPIO driver for SiFive SoCs") Signed-off-by: Miaoqian Lin linmq006@gmail.com Signed-off-by: Bartosz Golaszewski bartosz.golaszewski@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-sifive.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpio/gpio-sifive.c b/drivers/gpio/gpio-sifive.c index 7d82388b4ab7..f50236e68e88 100644 --- a/drivers/gpio/gpio-sifive.c +++ b/drivers/gpio/gpio-sifive.c @@ -209,6 +209,7 @@ static int sifive_gpio_probe(struct platform_device *pdev) return -ENODEV; } parent = irq_find_host(irq_parent); + of_node_put(irq_parent); if (!parent) { dev_err(dev, "no IRQ parent domain\n"); return -ENODEV;
From: Jamal Hadi Salim jhs@mojatatu.com
[ Upstream commit a2965c7be0522eaa18808684b7b82b248515511b ]
If asked to drop a packet via TC_ACT_SHOT, it is unsafe to assume that res.class contains a valid pointer. Fixes: b0188d4dbe5f ("[NET_SCHED]: sch_atm: Lindent")
Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_atm.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/sched/sch_atm.c b/net/sched/sch_atm.c index 70fe1c5e44ad..33737169cc2d 100644 --- a/net/sched/sch_atm.c +++ b/net/sched/sch_atm.c @@ -397,10 +397,13 @@ static int atm_tc_enqueue(struct sk_buff *skb, struct Qdisc *sch, result = tcf_classify(skb, NULL, fl, &res, true); if (result < 0) continue; + if (result == TC_ACT_SHOT) + goto done; + flow = (struct atm_flow_data *)res.class; if (!flow) flow = lookup_flow(sch, res.classid); - goto done; + goto drop; } } flow = NULL;
From: Jamal Hadi Salim jhs@mojatatu.com
[ Upstream commit caa4b35b4317d5147b3ab0fbdc9c075c7d2e9c12 ]
If asked to drop a packet via TC_ACT_SHOT, it is unsafe to assume that res.class contains a valid pointer.
Sample splat reported by Kyle Zeng
[ 5.405624] 0: reclassify loop, rule prio 0, protocol 800 [ 5.406326] ================================================================== [ 5.407240] BUG: KASAN: slab-out-of-bounds in cbq_enqueue+0x54b/0xea0 [ 5.407987] Read of size 1 at addr ffff88800e3122aa by task poc/299 [ 5.408731] [ 5.408897] CPU: 0 PID: 299 Comm: poc Not tainted 5.10.155+ #15 [ 5.409516] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 [ 5.410439] Call Trace: [ 5.410764] dump_stack+0x87/0xcd [ 5.411153] print_address_description+0x7a/0x6b0 [ 5.411687] ? vprintk_func+0xb9/0xc0 [ 5.411905] ? printk+0x76/0x96 [ 5.412110] ? cbq_enqueue+0x54b/0xea0 [ 5.412323] kasan_report+0x17d/0x220 [ 5.412591] ? cbq_enqueue+0x54b/0xea0 [ 5.412803] __asan_report_load1_noabort+0x10/0x20 [ 5.413119] cbq_enqueue+0x54b/0xea0 [ 5.413400] ? __kasan_check_write+0x10/0x20 [ 5.413679] __dev_queue_xmit+0x9c0/0x1db0 [ 5.413922] dev_queue_xmit+0xc/0x10 [ 5.414136] ip_finish_output2+0x8bc/0xcd0 [ 5.414436] __ip_finish_output+0x472/0x7a0 [ 5.414692] ip_finish_output+0x5c/0x190 [ 5.414940] ip_output+0x2d8/0x3c0 [ 5.415150] ? ip_mc_finish_output+0x320/0x320 [ 5.415429] __ip_queue_xmit+0x753/0x1760 [ 5.415664] ip_queue_xmit+0x47/0x60 [ 5.415874] __tcp_transmit_skb+0x1ef9/0x34c0 [ 5.416129] tcp_connect+0x1f5e/0x4cb0 [ 5.416347] tcp_v4_connect+0xc8d/0x18c0 [ 5.416577] __inet_stream_connect+0x1ae/0xb40 [ 5.416836] ? local_bh_enable+0x11/0x20 [ 5.417066] ? lock_sock_nested+0x175/0x1d0 [ 5.417309] inet_stream_connect+0x5d/0x90 [ 5.417548] ? __inet_stream_connect+0xb40/0xb40 [ 5.417817] __sys_connect+0x260/0x2b0 [ 5.418037] __x64_sys_connect+0x76/0x80 [ 5.418267] do_syscall_64+0x31/0x50 [ 5.418477] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 5.418770] RIP: 0033:0x473bb7 [ 5.418952] Code: 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 2a 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 18 89 54 24 0c 48 89 34 24 89 [ 5.420046] RSP: 002b:00007fffd20eb0f8 EFLAGS: 00000246 ORIG_RAX: 000000000000002a [ 5.420472] RAX: ffffffffffffffda RBX: 00007fffd20eb578 RCX: 0000000000473bb7 [ 5.420872] RDX: 0000000000000010 RSI: 00007fffd20eb110 RDI: 0000000000000007 [ 5.421271] RBP: 00007fffd20eb150 R08: 0000000000000001 R09: 0000000000000004 [ 5.421671] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001 [ 5.422071] R13: 00007fffd20eb568 R14: 00000000004fc740 R15: 0000000000000002 [ 5.422471] [ 5.422562] Allocated by task 299: [ 5.422782] __kasan_kmalloc+0x12d/0x160 [ 5.423007] kasan_kmalloc+0x5/0x10 [ 5.423208] kmem_cache_alloc_trace+0x201/0x2e0 [ 5.423492] tcf_proto_create+0x65/0x290 [ 5.423721] tc_new_tfilter+0x137e/0x1830 [ 5.423957] rtnetlink_rcv_msg+0x730/0x9f0 [ 5.424197] netlink_rcv_skb+0x166/0x300 [ 5.424428] rtnetlink_rcv+0x11/0x20 [ 5.424639] netlink_unicast+0x673/0x860 [ 5.424870] netlink_sendmsg+0x6af/0x9f0 [ 5.425100] __sys_sendto+0x58d/0x5a0 [ 5.425315] __x64_sys_sendto+0xda/0xf0 [ 5.425539] do_syscall_64+0x31/0x50 [ 5.425764] entry_SYSCALL_64_after_hwframe+0x61/0xc6 [ 5.426065] [ 5.426157] The buggy address belongs to the object at ffff88800e312200 [ 5.426157] which belongs to the cache kmalloc-128 of size 128 [ 5.426955] The buggy address is located 42 bytes to the right of [ 5.426955] 128-byte region [ffff88800e312200, ffff88800e312280) [ 5.427688] The buggy address belongs to the page: [ 5.427992] page:000000009875fabc refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0xe312 [ 5.428562] flags: 0x100000000000200(slab) [ 5.428812] raw: 
0100000000000200 dead000000000100 dead000000000122 ffff888007843680 [ 5.429325] raw: 0000000000000000 0000000000100010 00000001ffffffff ffff88800e312401 [ 5.429875] page dumped because: kasan: bad access detected [ 5.430214] page->mem_cgroup:ffff88800e312401 [ 5.430471] [ 5.430564] Memory state around the buggy address: [ 5.430846] ffff88800e312180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.431267] ffff88800e312200: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc [ 5.431705] >ffff88800e312280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.432123] ^ [ 5.432391] ffff88800e312300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 fc [ 5.432810] ffff88800e312380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 5.433229] ================================================================== [ 5.433648] Disabling lock debugging due to kernel taint
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Kyle Zeng zengyhkyle@gmail.com Signed-off-by: Jamal Hadi Salim jhs@mojatatu.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/sched/sch_cbq.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/sched/sch_cbq.c b/net/sched/sch_cbq.c index fd7e10567371..46b3dd71777d 100644 --- a/net/sched/sch_cbq.c +++ b/net/sched/sch_cbq.c @@ -231,6 +231,8 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) result = tcf_classify(skb, NULL, fl, &res, true); if (!fl || result < 0) goto fallback; + if (result == TC_ACT_SHOT) + return NULL;
cl = (void *)res.class; if (!cl) { @@ -251,8 +253,6 @@ cbq_classify(struct sk_buff *skb, struct Qdisc *sch, int *qerr) case TC_ACT_TRAP: *qerr = NET_XMIT_SUCCESS | __NET_XMIT_STOLEN; fallthrough; - case TC_ACT_SHOT: - return NULL; case TC_ACT_RECLASSIFY: return cbq_reclassify(skb, cl); }
From: Horatiu Vultur horatiu.vultur@microchip.com
[ Upstream commit 588ab2dc25f60efeb516b4abedb6c551949cc185 ]
There is an issue with the checking of the return value of 'of_get_mac_address', which returns 0 on success and a negative value on failure. The driver interpreted the result the opposite way. Therefore, if there was a MAC address defined in the DT, the driver generated a random MAC address; otherwise it used address 0. Fix this by correctly checking the return value of 'of_get_mac_address'.
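A short sketch of the intended logic (hedged; helper and variable names are illustrative): of_get_mac_address() returns 0 when a usable address was found in the DT, so only a non-zero return should trigger the random-address fallback:

    #include <linux/of_net.h>
    #include <linux/etherdevice.h>

    static void example_set_base_mac(struct device_node *np, u8 *mac)
    {
            if (of_get_mac_address(np, mac) < 0) {
                    /* nothing usable in the DT: fall back to a random address */
                    eth_random_addr(mac);
            }
    }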
Fixes: b74ef9f9cb91 ("net: sparx5: Do not use mac_addr uninitialized in mchp_sparx5_probe()") Signed-off-by: Horatiu Vultur horatiu.vultur@microchip.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/microchip/sparx5/sparx5_main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c index 0463f20da17b..174d89ee6374 100644 --- a/drivers/net/ethernet/microchip/sparx5/sparx5_main.c +++ b/drivers/net/ethernet/microchip/sparx5/sparx5_main.c @@ -779,7 +779,7 @@ static int mchp_sparx5_probe(struct platform_device *pdev) if (err) goto cleanup_config;
- if (!of_get_mac_address(np, sparx5->base_mac)) { + if (of_get_mac_address(np, sparx5->base_mac)) { dev_info(sparx5->dev, "MAC addr was not set, use random MAC\n"); eth_random_addr(sparx5->base_mac); sparx5->base_mac[5] = 0;
From: Jozsef Kadlecsik kadlec@netfilter.org
[ Upstream commit a31d47be64b9b74f8cfedffe03e0a8a1f9e51f23 ]
The hash:net,port,net set type supports /0 subnets. However, commit 5f7b51bf09baca8e ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") did not take this into account and resulted in an endless loop. The bug is actually older, but commit 5f7b51bf09baca8e brings it out earlier.
Handle /0 subnets properly in hash:net,port,net set types.
Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") Reported-by: Марк Коренберг socketpair@gmail.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/netfilter/ipset/ip_set_hash_netportnet.c | 40 ++++++++++---------- 1 file changed, 21 insertions(+), 19 deletions(-)
diff --git a/net/netfilter/ipset/ip_set_hash_netportnet.c b/net/netfilter/ipset/ip_set_hash_netportnet.c index 19bcdb3141f6..005a7ce87217 100644 --- a/net/netfilter/ipset/ip_set_hash_netportnet.c +++ b/net/netfilter/ipset/ip_set_hash_netportnet.c @@ -173,17 +173,26 @@ hash_netportnet4_kadt(struct ip_set *set, const struct sk_buff *skb, return adtfn(set, &e, &ext, &opt->ext, opt->cmdflags); }
+static u32 +hash_netportnet4_range_to_cidr(u32 from, u32 to, u8 *cidr) +{ + if (from == 0 && to == UINT_MAX) { + *cidr = 0; + return to; + } + return ip_set_range_to_cidr(from, to, cidr); +} + static int hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_netportnet4 *h = set->data; + struct hash_netportnet4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netportnet4_elem e = { }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 ip = 0, ip_to = 0, p = 0, port, port_to; - u32 ip2_from = 0, ip2_to = 0, ip2, ipn; - u64 n = 0, m = 0; + u32 ip2_from = 0, ip2_to = 0, ip2, i = 0; bool with_ports = false; int ret;
@@ -285,19 +294,6 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[], } else { ip_set_mask_from_to(ip2_from, ip2_to, e.cidr[1]); } - ipn = ip; - do { - ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr[0]); - n++; - } while (ipn++ < ip_to); - ipn = ip2_from; - do { - ipn = ip_set_range_to_cidr(ipn, ip2_to, &e.cidr[1]); - m++; - } while (ipn++ < ip2_to); - - if (n*m*(port_to - port + 1) > IPSET_MAX_RANGE) - return -ERANGE;
if (retried) { ip = ntohl(h->next.ip[0]); @@ -310,13 +306,19 @@ hash_netportnet4_uadt(struct ip_set *set, struct nlattr *tb[],
do { e.ip[0] = htonl(ip); - ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr[0]); + ip = hash_netportnet4_range_to_cidr(ip, ip_to, &e.cidr[0]); for (; p <= port_to; p++) { e.port = htons(p); do { + i++; e.ip[1] = htonl(ip2); - ip2 = ip_set_range_to_cidr(ip2, ip2_to, - &e.cidr[1]); + if (i > IPSET_MAX_RANGE) { + hash_netportnet4_data_next(&h->next, + &e); + return -ERANGE; + } + ip2 = hash_netportnet4_range_to_cidr(ip2, + ip2_to, &e.cidr[1]); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret;
From: Jozsef Kadlecsik kadlec@netfilter.org
[ Upstream commit 5e29dc36bd5e2166b834ceb19990d9e68a734d7d ]
Adding/deleting a large number of elements in one step in ipset can take a considerable amount of time and can result in soft lockup errors. The patch 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") tried to fix it by limiting the maximum number of elements that can be processed at once. However, it was not enough: it is still possible to get hung tasks. Lowering the limit is not reasonable, so the approach in this patch is as follows: rely on the method used when resizing sets and save the state when a smaller internal batch limit is reached, unlock/lock, and proceed from the saved state. Thus long continuous tasks are avoided and, at the same time, the limit on adding/deleting a large number of elements in one step is removed.
The nfnl mutex is held during the whole operation, which prevents other ipset commands from being issued in parallel.
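A condensed sketch of the resulting retry loop in call_ad() (taken from the change below, slightly simplified): the per-type add routine stops after an internal batch, saves its position in the set's 'next' state and returns -ERANGE, and the caller simply loops, re-taking the set lock between batches:

    do {
            ip_set_lock(set);
            ret = set->variant->uadt(set, tb, adt, &lineno, flags, retried);
            ip_set_unlock(set);
            retried = true;
    } while (ret == -ERANGE ||
             (ret == -EAGAIN && set->variant->resize &&
              (ret = set->variant->resize(set, retried)) == 0));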
Fixes: 5f7b51bf09ba ("netfilter: ipset: Limit the maximal range of consecutive elements to add/delete") Reported-by: syzbot+9204e7399656300bf271@syzkaller.appspotmail.com Signed-off-by: Jozsef Kadlecsik kadlec@netfilter.org Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/netfilter/ipset/ip_set.h | 2 +- net/netfilter/ipset/ip_set_core.c | 7 ++++--- net/netfilter/ipset/ip_set_hash_ip.c | 14 ++++++------- net/netfilter/ipset/ip_set_hash_ipmark.c | 13 ++++++------ net/netfilter/ipset/ip_set_hash_ipport.c | 13 ++++++------ net/netfilter/ipset/ip_set_hash_ipportip.c | 13 ++++++------ net/netfilter/ipset/ip_set_hash_ipportnet.c | 13 +++++++----- net/netfilter/ipset/ip_set_hash_net.c | 17 +++++++-------- net/netfilter/ipset/ip_set_hash_netiface.c | 15 ++++++-------- net/netfilter/ipset/ip_set_hash_netnet.c | 23 +++++++-------------- net/netfilter/ipset/ip_set_hash_netport.c | 19 +++++++---------- 11 files changed, 68 insertions(+), 81 deletions(-)
diff --git a/include/linux/netfilter/ipset/ip_set.h b/include/linux/netfilter/ipset/ip_set.h index ada1296c87d5..72f5ebc5c97a 100644 --- a/include/linux/netfilter/ipset/ip_set.h +++ b/include/linux/netfilter/ipset/ip_set.h @@ -197,7 +197,7 @@ struct ip_set_region { };
/* Max range where every element is added/deleted in one step */ -#define IPSET_MAX_RANGE (1<<20) +#define IPSET_MAX_RANGE (1<<14)
/* The max revision number supported by any set type + 1 */ #define IPSET_REVISION_MAX 9 diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c index 16ae92054baa..ae061b27e446 100644 --- a/net/netfilter/ipset/ip_set_core.c +++ b/net/netfilter/ipset/ip_set_core.c @@ -1698,9 +1698,10 @@ call_ad(struct net *net, struct sock *ctnl, struct sk_buff *skb, ret = set->variant->uadt(set, tb, adt, &lineno, flags, retried); ip_set_unlock(set); retried = true; - } while (ret == -EAGAIN && - set->variant->resize && - (ret = set->variant->resize(set, retried)) == 0); + } while (ret == -ERANGE || + (ret == -EAGAIN && + set->variant->resize && + (ret = set->variant->resize(set, retried)) == 0));
if (!ret || (ret == -IPSET_ERR_EXIST && eexist)) return 0; diff --git a/net/netfilter/ipset/ip_set_hash_ip.c b/net/netfilter/ipset/ip_set_hash_ip.c index 75d556d71652..24adcdd7a0b1 100644 --- a/net/netfilter/ipset/ip_set_hash_ip.c +++ b/net/netfilter/ipset/ip_set_hash_ip.c @@ -98,11 +98,11 @@ static int hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_ip4 *h = set->data; + struct hash_ip4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ip4_elem e = { 0 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 ip = 0, ip_to = 0, hosts; + u32 ip = 0, ip_to = 0, hosts, i = 0; int ret = 0;
if (tb[IPSET_ATTR_LINENO]) @@ -147,14 +147,14 @@ hash_ip4_uadt(struct ip_set *set, struct nlattr *tb[],
hosts = h->netmask == 32 ? 1 : 2 << (32 - h->netmask - 1);
- /* 64bit division is not allowed on 32bit */ - if (((u64)ip_to - ip + 1) >> (32 - h->netmask) > IPSET_MAX_RANGE) - return -ERANGE; - if (retried) ip = ntohl(h->next.ip); - for (; ip <= ip_to;) { + for (; ip <= ip_to; i++) { e.ip = htonl(ip); + if (i > IPSET_MAX_RANGE) { + hash_ip4_data_next(&h->next, &e); + return -ERANGE; + } ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret; diff --git a/net/netfilter/ipset/ip_set_hash_ipmark.c b/net/netfilter/ipset/ip_set_hash_ipmark.c index 153de3457423..a22ec1a6f6ec 100644 --- a/net/netfilter/ipset/ip_set_hash_ipmark.c +++ b/net/netfilter/ipset/ip_set_hash_ipmark.c @@ -97,11 +97,11 @@ static int hash_ipmark4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_ipmark4 *h = set->data; + struct hash_ipmark4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipmark4_elem e = { }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 ip, ip_to = 0; + u32 ip, ip_to = 0, i = 0; int ret;
if (tb[IPSET_ATTR_LINENO]) @@ -148,13 +148,14 @@ hash_ipmark4_uadt(struct ip_set *set, struct nlattr *tb[], ip_set_mask_from_to(ip, ip_to, cidr); }
- if (((u64)ip_to - ip + 1) > IPSET_MAX_RANGE) - return -ERANGE; - if (retried) ip = ntohl(h->next.ip); - for (; ip <= ip_to; ip++) { + for (; ip <= ip_to; ip++, i++) { e.ip = htonl(ip); + if (i > IPSET_MAX_RANGE) { + hash_ipmark4_data_next(&h->next, &e); + return -ERANGE; + } ret = adtfn(set, &e, &ext, &ext, flags);
if (ret && !ip_set_eexist(ret, flags)) diff --git a/net/netfilter/ipset/ip_set_hash_ipport.c b/net/netfilter/ipset/ip_set_hash_ipport.c index 7303138e46be..10481760a9b2 100644 --- a/net/netfilter/ipset/ip_set_hash_ipport.c +++ b/net/netfilter/ipset/ip_set_hash_ipport.c @@ -105,11 +105,11 @@ static int hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_ipport4 *h = set->data; + struct hash_ipport4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipport4_elem e = { .ip = 0 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 ip, ip_to = 0, p = 0, port, port_to; + u32 ip, ip_to = 0, p = 0, port, port_to, i = 0; bool with_ports = false; int ret;
@@ -173,17 +173,18 @@ hash_ipport4_uadt(struct ip_set *set, struct nlattr *tb[], swap(port, port_to); }
- if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE) - return -ERANGE; - if (retried) ip = ntohl(h->next.ip); for (; ip <= ip_to; ip++) { p = retried && ip == ntohl(h->next.ip) ? ntohs(h->next.port) : port; - for (; p <= port_to; p++) { + for (; p <= port_to; p++, i++) { e.ip = htonl(ip); e.port = htons(p); + if (i > IPSET_MAX_RANGE) { + hash_ipport4_data_next(&h->next, &e); + return -ERANGE; + } ret = adtfn(set, &e, &ext, &ext, flags);
if (ret && !ip_set_eexist(ret, flags)) diff --git a/net/netfilter/ipset/ip_set_hash_ipportip.c b/net/netfilter/ipset/ip_set_hash_ipportip.c index 334fb1ad0e86..39a01934b153 100644 --- a/net/netfilter/ipset/ip_set_hash_ipportip.c +++ b/net/netfilter/ipset/ip_set_hash_ipportip.c @@ -108,11 +108,11 @@ static int hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_ipportip4 *h = set->data; + struct hash_ipportip4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportip4_elem e = { .ip = 0 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 ip, ip_to = 0, p = 0, port, port_to; + u32 ip, ip_to = 0, p = 0, port, port_to, i = 0; bool with_ports = false; int ret;
@@ -180,17 +180,18 @@ hash_ipportip4_uadt(struct ip_set *set, struct nlattr *tb[], swap(port, port_to); }
- if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE) - return -ERANGE; - if (retried) ip = ntohl(h->next.ip); for (; ip <= ip_to; ip++) { p = retried && ip == ntohl(h->next.ip) ? ntohs(h->next.port) : port; - for (; p <= port_to; p++) { + for (; p <= port_to; p++, i++) { e.ip = htonl(ip); e.port = htons(p); + if (i > IPSET_MAX_RANGE) { + hash_ipportip4_data_next(&h->next, &e); + return -ERANGE; + } ret = adtfn(set, &e, &ext, &ext, flags);
if (ret && !ip_set_eexist(ret, flags)) diff --git a/net/netfilter/ipset/ip_set_hash_ipportnet.c b/net/netfilter/ipset/ip_set_hash_ipportnet.c index 7df94f437f60..5c6de605a9fb 100644 --- a/net/netfilter/ipset/ip_set_hash_ipportnet.c +++ b/net/netfilter/ipset/ip_set_hash_ipportnet.c @@ -160,12 +160,12 @@ static int hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_ipportnet4 *h = set->data; + struct hash_ipportnet4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_ipportnet4_elem e = { .cidr = HOST_MASK - 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 ip = 0, ip_to = 0, p = 0, port, port_to; - u32 ip2_from = 0, ip2_to = 0, ip2; + u32 ip2_from = 0, ip2_to = 0, ip2, i = 0; bool with_ports = false; u8 cidr; int ret; @@ -253,9 +253,6 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], swap(port, port_to); }
- if (((u64)ip_to - ip + 1)*(port_to - port + 1) > IPSET_MAX_RANGE) - return -ERANGE; - ip2_to = ip2_from; if (tb[IPSET_ATTR_IP2_TO]) { ret = ip_set_get_hostipaddr4(tb[IPSET_ATTR_IP2_TO], &ip2_to); @@ -282,9 +279,15 @@ hash_ipportnet4_uadt(struct ip_set *set, struct nlattr *tb[], for (; p <= port_to; p++) { e.port = htons(p); do { + i++; e.ip2 = htonl(ip2); ip2 = ip_set_range_to_cidr(ip2, ip2_to, &cidr); e.cidr = cidr - 1; + if (i > IPSET_MAX_RANGE) { + hash_ipportnet4_data_next(&h->next, + &e); + return -ERANGE; + } ret = adtfn(set, &e, &ext, &ext, flags);
if (ret && !ip_set_eexist(ret, flags)) diff --git a/net/netfilter/ipset/ip_set_hash_net.c b/net/netfilter/ipset/ip_set_hash_net.c index 1422739d9aa2..ce0a9ce5a91f 100644 --- a/net/netfilter/ipset/ip_set_hash_net.c +++ b/net/netfilter/ipset/ip_set_hash_net.c @@ -136,11 +136,11 @@ static int hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_net4 *h = set->data; + struct hash_net4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_net4_elem e = { .cidr = HOST_MASK }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 ip = 0, ip_to = 0, ipn, n = 0; + u32 ip = 0, ip_to = 0, i = 0; int ret;
if (tb[IPSET_ATTR_LINENO]) @@ -188,19 +188,16 @@ hash_net4_uadt(struct ip_set *set, struct nlattr *tb[], if (ip + UINT_MAX == ip_to) return -IPSET_ERR_HASH_RANGE; } - ipn = ip; - do { - ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr); - n++; - } while (ipn++ < ip_to); - - if (n > IPSET_MAX_RANGE) - return -ERANGE;
if (retried) ip = ntohl(h->next.ip); do { + i++; e.ip = htonl(ip); + if (i > IPSET_MAX_RANGE) { + hash_net4_data_next(&h->next, &e); + return -ERANGE; + } ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) diff --git a/net/netfilter/ipset/ip_set_hash_netiface.c b/net/netfilter/ipset/ip_set_hash_netiface.c index 9810f5bf63f5..031073286236 100644 --- a/net/netfilter/ipset/ip_set_hash_netiface.c +++ b/net/netfilter/ipset/ip_set_hash_netiface.c @@ -202,7 +202,7 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netiface4_elem e = { .cidr = HOST_MASK, .elem = 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 ip = 0, ip_to = 0, ipn, n = 0; + u32 ip = 0, ip_to = 0, i = 0; int ret;
if (tb[IPSET_ATTR_LINENO]) @@ -256,19 +256,16 @@ hash_netiface4_uadt(struct ip_set *set, struct nlattr *tb[], } else { ip_set_mask_from_to(ip, ip_to, e.cidr); } - ipn = ip; - do { - ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr); - n++; - } while (ipn++ < ip_to); - - if (n > IPSET_MAX_RANGE) - return -ERANGE;
if (retried) ip = ntohl(h->next.ip); do { + i++; e.ip = htonl(ip); + if (i > IPSET_MAX_RANGE) { + hash_netiface4_data_next(&h->next, &e); + return -ERANGE; + } ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr); ret = adtfn(set, &e, &ext, &ext, flags);
diff --git a/net/netfilter/ipset/ip_set_hash_netnet.c b/net/netfilter/ipset/ip_set_hash_netnet.c index 3d09eefe998a..c07b70bf32db 100644 --- a/net/netfilter/ipset/ip_set_hash_netnet.c +++ b/net/netfilter/ipset/ip_set_hash_netnet.c @@ -163,13 +163,12 @@ static int hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_netnet4 *h = set->data; + struct hash_netnet4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netnet4_elem e = { }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); u32 ip = 0, ip_to = 0; - u32 ip2 = 0, ip2_from = 0, ip2_to = 0, ipn; - u64 n = 0, m = 0; + u32 ip2 = 0, ip2_from = 0, ip2_to = 0, i = 0; int ret;
if (tb[IPSET_ATTR_LINENO]) @@ -245,19 +244,6 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], } else { ip_set_mask_from_to(ip2_from, ip2_to, e.cidr[1]); } - ipn = ip; - do { - ipn = ip_set_range_to_cidr(ipn, ip_to, &e.cidr[0]); - n++; - } while (ipn++ < ip_to); - ipn = ip2_from; - do { - ipn = ip_set_range_to_cidr(ipn, ip2_to, &e.cidr[1]); - m++; - } while (ipn++ < ip2_to); - - if (n*m > IPSET_MAX_RANGE) - return -ERANGE;
if (retried) { ip = ntohl(h->next.ip[0]); @@ -270,7 +256,12 @@ hash_netnet4_uadt(struct ip_set *set, struct nlattr *tb[], e.ip[0] = htonl(ip); ip = ip_set_range_to_cidr(ip, ip_to, &e.cidr[0]); do { + i++; e.ip[1] = htonl(ip2); + if (i > IPSET_MAX_RANGE) { + hash_netnet4_data_next(&h->next, &e); + return -ERANGE; + } ip2 = ip_set_range_to_cidr(ip2, ip2_to, &e.cidr[1]); ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) diff --git a/net/netfilter/ipset/ip_set_hash_netport.c b/net/netfilter/ipset/ip_set_hash_netport.c index 09cf72eb37f8..d1a0628df4ef 100644 --- a/net/netfilter/ipset/ip_set_hash_netport.c +++ b/net/netfilter/ipset/ip_set_hash_netport.c @@ -154,12 +154,11 @@ static int hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], enum ipset_adt adt, u32 *lineno, u32 flags, bool retried) { - const struct hash_netport4 *h = set->data; + struct hash_netport4 *h = set->data; ipset_adtfn adtfn = set->variant->adt[adt]; struct hash_netport4_elem e = { .cidr = HOST_MASK - 1 }; struct ip_set_ext ext = IP_SET_INIT_UEXT(set); - u32 port, port_to, p = 0, ip = 0, ip_to = 0, ipn; - u64 n = 0; + u32 port, port_to, p = 0, ip = 0, ip_to = 0, i = 0; bool with_ports = false; u8 cidr; int ret; @@ -236,14 +235,6 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], } else { ip_set_mask_from_to(ip, ip_to, e.cidr + 1); } - ipn = ip; - do { - ipn = ip_set_range_to_cidr(ipn, ip_to, &cidr); - n++; - } while (ipn++ < ip_to); - - if (n*(port_to - port + 1) > IPSET_MAX_RANGE) - return -ERANGE;
if (retried) { ip = ntohl(h->next.ip); @@ -255,8 +246,12 @@ hash_netport4_uadt(struct ip_set *set, struct nlattr *tb[], e.ip = htonl(ip); ip = ip_set_range_to_cidr(ip, ip_to, &cidr); e.cidr = cidr - 1; - for (; p <= port_to; p++) { + for (; p <= port_to; p++, i++) { e.port = htons(p); + if (i > IPSET_MAX_RANGE) { + hash_netport4_data_next(&h->next, &e); + return -ERANGE; + } ret = adtfn(set, &e, &ext, &ext, flags); if (ret && !ip_set_eexist(ret, flags)) return ret;
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 0a6564ebd953c4590663c9a3c99a3ea9920ade6f ]
In perf_data__open_dir(), opendir() opens the directory stream. Add missing closedir() to release it after use.
Fixes: eb6176709b235b96 ("perf data: Add perf_data__open_dir_data function") Reviewed-by: Adrian Hunter adrian.hunter@intel.com Signed-off-by: Miaoqian Lin linmq006@gmail.com Cc: Alexander Shishkin alexander.shishkin@linux.intel.com Cc: Alexey Bayduraev alexey.v.bayduraev@linux.intel.com Cc: Ingo Molnar mingo@redhat.com Cc: Jiri Olsa jolsa@kernel.org Cc: Mark Rutland mark.rutland@arm.com Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Link: https://lore.kernel.org/r/20221229090903.1402395-1-linmq006@gmail.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/data.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c index 15a4547d608e..090a76be522b 100644 --- a/tools/perf/util/data.c +++ b/tools/perf/util/data.c @@ -127,6 +127,7 @@ int perf_data__open_dir(struct perf_data *data) file->size = st.st_size; }
+ closedir(dir); if (!files) return -EINVAL;
@@ -135,6 +136,7 @@ int perf_data__open_dir(struct perf_data *data) return 0;
out_err: + closedir(dir); close_dir(files, nr); return ret; }
From: Philipp Zabel p.zabel@pengutronix.de
[ Upstream commit 92d43bd3bc9728c1fb114d7011d46f5ea9489e28 ]
ipu_src_rect_width() was introduced to support odd screen resolutions such as 1366x768 by internally rounding up primary plane width to a multiple of 8 and compensating with reduced horizontal blanking. This also caused overlay plane width to be rounded up, which was not intended. Fix overlay plane width by limiting the rounding up to the primary plane.
drm_rect_width(&new_state->src) >> 16 is the same value as drm_rect_width(dst) because there is no plane scaling support.
Fixes: 94dfec48fca7 ("drm/imx: Add 8 pixel alignment fix") Reviewed-by: Lucas Stach l.stach@pengutronix.de Link: https://lore.kernel.org/r/20221108141420.176696-1-p.zabel@pengutronix.de Signed-off-by: Philipp Zabel p.zabel@pengutronix.de Link: https://patchwork.freedesktop.org/patch/msgid/20221108141420.176696-1-p.zabe... Tested-by: Ian Ray ian.ray@ge.com (cherry picked from commit 4333472f8d7befe62359fecb1083cd57a6e07bfc) Signed-off-by: Philipp Zabel philipp.zabel@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/imx/ipuv3-plane.c | 14 ++++++++------ 1 file changed, 8 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/imx/ipuv3-plane.c b/drivers/gpu/drm/imx/ipuv3-plane.c index 846c1aae69c8..924a66f53951 100644 --- a/drivers/gpu/drm/imx/ipuv3-plane.c +++ b/drivers/gpu/drm/imx/ipuv3-plane.c @@ -619,6 +619,11 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, break; }
+ if (ipu_plane->dp_flow == IPU_DP_FLOW_SYNC_BG) + width = ipu_src_rect_width(new_state); + else + width = drm_rect_width(&new_state->src) >> 16; + eba = drm_plane_state_to_eba(new_state, 0);
/* @@ -627,8 +632,7 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, */ if (ipu_state->use_pre) { axi_id = ipu_chan_assign_axi_id(ipu_plane->dma); - ipu_prg_channel_configure(ipu_plane->ipu_ch, axi_id, - ipu_src_rect_width(new_state), + ipu_prg_channel_configure(ipu_plane->ipu_ch, axi_id, width, drm_rect_height(&new_state->src) >> 16, fb->pitches[0], fb->format->format, fb->modifier, &eba); @@ -683,9 +687,8 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, break; }
- ipu_dmfc_config_wait4eot(ipu_plane->dmfc, ALIGN(drm_rect_width(dst), 8)); + ipu_dmfc_config_wait4eot(ipu_plane->dmfc, width);
- width = ipu_src_rect_width(new_state); height = drm_rect_height(&new_state->src) >> 16; info = drm_format_info(fb->format->format); ipu_calculate_bursts(width, info->cpp[0], fb->pitches[0], @@ -749,8 +752,7 @@ static void ipu_plane_atomic_update(struct drm_plane *plane, ipu_cpmem_set_burstsize(ipu_plane->ipu_ch, 16);
ipu_cpmem_zero(ipu_plane->alpha_ch); - ipu_cpmem_set_resolution(ipu_plane->alpha_ch, - ipu_src_rect_width(new_state), + ipu_cpmem_set_resolution(ipu_plane->alpha_ch, width, drm_rect_height(&new_state->src) >> 16); ipu_cpmem_set_format_passthrough(ipu_plane->alpha_ch, 8); ipu_cpmem_set_high_priority(ipu_plane->alpha_ch);
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 0226635c304cfd5c9db9b78c259cb713819b057e ]
syzbot is reporting a hung task at do_user_addr_fault() [1], for there is a silent deadlock between the PG_locked bit and the ni_lock lock.
Since filemap_update_page() calls filemap_read_folio() after calling folio_trylock(), which sets the PG_locked bit, ntfs_truncate() must not call truncate_setsize(), which waits for the PG_locked bit to be cleared, while holding the ni_lock lock.
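A hedged sketch of the corrected ordering (heavily simplified; not the complete fs/ntfs3/file.c function):

    static int example_truncate(struct inode *inode, struct ntfs_inode *ni,
                                loff_t new_size)
    {
            /* shrink i_size first; this may wait for PG_locked pages */
            truncate_setsize(inode, new_size);

            /* only now take ni_lock, so it is never held across the page wait */
            ni_lock(ni);
            down_write(&ni->file.run_lock);
            /* ... attr_set_size() and the rest of the truncate ... */
            up_write(&ni->file.run_lock);
            ni_unlock(ni);

            return 0;
    }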
Link: https://lore.kernel.org/all/00000000000060d41f05f139aa44@google.com/ Link: https://syzkaller.appspot.com/bug?extid=bed15dbf10294aa4f2ae [1] Reported-by: syzbot syzbot+bed15dbf10294aa4f2ae@syzkaller.appspotmail.com Debugged-by: Linus Torvalds torvalds@linux-foundation.org Co-developed-by: Hillf Danton hdanton@sina.com Signed-off-by: Hillf Danton hdanton@sina.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Fixes: 4342306f0f0d ("fs/ntfs3: Add file operations and implementation") Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c index 7a678a5b1ca5..c526e0427f2b 100644 --- a/fs/ntfs3/file.c +++ b/fs/ntfs3/file.c @@ -488,10 +488,10 @@ static int ntfs_truncate(struct inode *inode, loff_t new_size)
new_valid = ntfs_up_block(sb, min_t(u64, ni->i_valid, new_size));
- ni_lock(ni); - truncate_setsize(inode, new_size);
+ ni_lock(ni); + down_write(&ni->file.run_lock); err = attr_set_size(ni, ATTR_DATA, NULL, 0, &ni->file.run, new_size, &new_valid, ni->mi.sbi->options->prealloc, NULL);
From: Daniil Tatianin d-tatianin@yandex-team.ru
[ Upstream commit 9c807965483f42df1d053b7436eedd6cf28ece6f ]
Otherwise we would dereference a NULL aggregator pointer when calling __set_agg_ports_ready on the line below.
Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daniil Tatianin d-tatianin@yandex-team.ru Reviewed-by: Jiri Pirko jiri@nvidia.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/bonding/bond_3ad.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c index 8ad095c19f27..ff6d4e74a186 100644 --- a/drivers/net/bonding/bond_3ad.c +++ b/drivers/net/bonding/bond_3ad.c @@ -1539,6 +1539,7 @@ static void ad_port_selection_logic(struct port *port, bool *update_slave_arr) slave_err(bond->dev, port->slave->dev, "Port %d did not find a suitable aggregator\n", port->actor_port_number); + return; } } /* if all aggregator's ports are READY_N == TRUE, set ready=TRUE
From: Geetha sowjanya gakula@marvell.com
[ Upstream commit 4af1b64f80fbe1275fb02c5f1c0cef099a4a231f ]
The current code uses a per-CPU pointer to get the lmtst_id mapped to the core on which aura_free() is executed. Using the per-CPU pointer without disabling preemption can cause a mismatch between the lmtst_id and the core on which the pointer gets freed. This patch fixes the issue by disabling preemption around aura_free().
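A minimal sketch of the pattern being applied (hedged; the helper name is illustrative and the surrounding refill loop is omitted): keep the task pinned to one CPU while the per-CPU LMTST line is used to free a pointer to the aura:

    #include <linux/smp.h>

    static void example_free_to_aura(struct otx2_nic *pfvf, int pool_id, dma_addr_t bufptr)
    {
            get_cpu();      /* disable preemption so the per-CPU LMTST line matches this core */
            pfvf->hw_ops->aura_freeptr(pfvf, pool_id, bufptr + OTX2_HEAD_ROOM);
            put_cpu();      /* re-enable preemption */
    }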
Fixes: ef6c8da71eaf ("octeontx2-pf: cn10K: Reserve LMTST lines per core") Signed-off-by: Sunil Goutham sgoutham@marvell.com Signed-off-by: Geetha sowjanya gakula@marvell.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- .../marvell/octeontx2/nic/otx2_common.c | 30 +++++++++++++------ 1 file changed, 21 insertions(+), 9 deletions(-)
diff --git a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c index e14624caddc6..f6306eedd59b 100644 --- a/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c +++ b/drivers/net/ethernet/marvell/octeontx2/nic/otx2_common.c @@ -962,6 +962,7 @@ static void otx2_pool_refill_task(struct work_struct *work) rbpool = cq->rbpool; free_ptrs = cq->pool_ptrs;
+ get_cpu(); while (cq->pool_ptrs) { if (otx2_alloc_rbuf(pfvf, rbpool, &bufptr)) { /* Schedule a WQ if we fails to free atleast half of the @@ -981,6 +982,7 @@ static void otx2_pool_refill_task(struct work_struct *work) pfvf->hw_ops->aura_freeptr(pfvf, qidx, bufptr + OTX2_HEAD_ROOM); cq->pool_ptrs--; } + put_cpu(); cq->refill_task_sched = false; }
@@ -1314,6 +1316,7 @@ int otx2_sq_aura_pool_init(struct otx2_nic *pfvf) if (err) goto fail;
+ get_cpu(); /* Allocate pointers and free them to aura/pool */ for (qidx = 0; qidx < hw->tx_queues; qidx++) { pool_id = otx2_get_pool_idx(pfvf, AURA_NIX_SQ, qidx); @@ -1322,18 +1325,24 @@ int otx2_sq_aura_pool_init(struct otx2_nic *pfvf) sq = &qset->sq[qidx]; sq->sqb_count = 0; sq->sqb_ptrs = kcalloc(num_sqbs, sizeof(*sq->sqb_ptrs), GFP_KERNEL); - if (!sq->sqb_ptrs) - return -ENOMEM; + if (!sq->sqb_ptrs) { + err = -ENOMEM; + goto err_mem; + }
for (ptr = 0; ptr < num_sqbs; ptr++) { - if (otx2_alloc_rbuf(pfvf, pool, &bufptr)) - return -ENOMEM; + err = otx2_alloc_rbuf(pfvf, pool, &bufptr); + if (err) + goto err_mem; pfvf->hw_ops->aura_freeptr(pfvf, pool_id, bufptr); sq->sqb_ptrs[sq->sqb_count++] = (u64)bufptr; } }
- return 0; +err_mem: + put_cpu(); + return err ? -ENOMEM : 0; + fail: otx2_mbox_reset(&pfvf->mbox.mbox, 0); otx2_aura_pool_free(pfvf); @@ -1372,18 +1381,21 @@ int otx2_rq_aura_pool_init(struct otx2_nic *pfvf) if (err) goto fail;
+ get_cpu(); /* Allocate pointers and free them to aura/pool */ for (pool_id = 0; pool_id < hw->rqpool_cnt; pool_id++) { pool = &pfvf->qset.pool[pool_id]; for (ptr = 0; ptr < num_ptrs; ptr++) { - if (otx2_alloc_rbuf(pfvf, pool, &bufptr)) - return -ENOMEM; + err = otx2_alloc_rbuf(pfvf, pool, &bufptr); + if (err) + goto err_mem; pfvf->hw_ops->aura_freeptr(pfvf, pool_id, bufptr + OTX2_HEAD_ROOM); } } - - return 0; +err_mem: + put_cpu(); + return err ? -ENOMEM : 0; fail: otx2_mbox_reset(&pfvf->mbox.mbox, 0); otx2_aura_pool_free(pfvf);
From: Szymon Heidrich szymon.heidrich@gmail.com
[ Upstream commit c7dd13805f8b8fc1ce3b6d40f6aff47e66b72ad2 ]
The variables off and len, typed as u32 in the rndis_query() function, are controlled by the incoming RNDIS response message, so their values may be manipulated. Setting off to an unexpectedly large value causes the sum with len and 8 to overflow and pass the implemented validation step. Consequently, the response pointer refers to a location past the expected buffer boundaries, allowing information leakage, e.g. via the RNDIS_OID_802_3_PERMANENT_ADDRESS OID.
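A standalone demonstration (userspace C, not kernel code; the buffer size constant is assumed for illustration) of why the original '8 + off + len' comparison can be bypassed by 32-bit wraparound while the rewritten check cannot:

    #include <stdint.h>
    #include <stdio.h>

    #define CONTROL_BUFFER_SIZE 1025u   /* illustrative value */

    int main(void)
    {
            uint32_t off = 0xfffffff0u; /* attacker-controlled offset */
            uint32_t len = 0x100u;

            /* old check: the 32-bit sum wraps around and passes */
            if ((uint32_t)(8 + off + len) > CONTROL_BUFFER_SIZE)
                    puts("old check: rejected");
            else
                    puts("old check: accepted (overflow)");

            /* new check: no wraparound, the bogus offset is rejected */
            if (off > CONTROL_BUFFER_SIZE - 8 ||
                len > CONTROL_BUFFER_SIZE - 8 - off)
                    puts("new check: rejected");
            else
                    puts("new check: accepted");
            return 0;
    }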
Fixes: ddda08624013 ("USB: rndis_host, various cleanups") Signed-off-by: Szymon Heidrich szymon.heidrich@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/usb/rndis_host.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/usb/rndis_host.c b/drivers/net/usb/rndis_host.c index bedd36ab5cf0..e5f6614da5ac 100644 --- a/drivers/net/usb/rndis_host.c +++ b/drivers/net/usb/rndis_host.c @@ -255,7 +255,8 @@ static int rndis_query(struct usbnet *dev, struct usb_interface *intf,
off = le32_to_cpu(u.get_c->offset); len = le32_to_cpu(u.get_c->len); - if (unlikely((8 + off + len) > CONTROL_BUFFER_SIZE)) + if (unlikely((off > CONTROL_BUFFER_SIZE - 8) || + (len > CONTROL_BUFFER_SIZE - 8 - off))) goto response_error;
if (*reply_len != -1 && len != *reply_len)
From: Namhyung Kim namhyung@kernel.org
[ Upstream commit 54b353a20c7e8be98414754f5aff98c8a68fcc1f ]
The --for-each-cgroup option can have the same cgroup multiple times, but this confuses BPF counters (since they have the same cgroup id), causing only the last cgroup's events to be counted.
Let's check the cgroup name before adding a new entry to the cgroups list.
Before:
$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
Performance counter stats for 'system wide':
<not counted> msec cpu-clock / <not counted> context-switches / <not counted> cpu-migrations / <not counted> page-faults / <not counted> cycles / <not counted> instructions / <not counted> branches / <not counted> branch-misses / 8,016.04 msec cpu-clock / # 7.998 CPUs utilized 6,152 context-switches / # 767.461 /sec 250 cpu-migrations / # 31.187 /sec 442 page-faults / # 55.139 /sec 613,111,487 cycles / # 0.076 GHz 280,599,604 instructions / # 0.46 insn per cycle 57,692,724 branches / # 7.197 M/sec 3,385,168 branch-misses / # 5.87% of all branches
1.002220125 seconds time elapsed
After it becomes similar to the non-BPF mode:
$ sudo ./perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1
Performance counter stats for 'system wide':
8,013.38 msec cpu-clock / # 7.998 CPUs utilized 6,859 context-switches / # 855.944 /sec 334 cpu-migrations / # 41.680 /sec 345 page-faults / # 43.053 /sec 782,326,119 cycles / # 0.098 GHz 471,645,724 instructions / # 0.60 insn per cycle 94,963,430 branches / # 11.851 M/sec 3,685,511 branch-misses / # 3.88% of all branches
1.001864539 seconds time elapsed
Committer notes:
As a reminder, to test with BPF counters one has to use BUILD_BPF_SKEL=1 in the make command line and have clang/llvm installed when building perf, otherwise the --bpf-counters option will not be available:
# perf stat -a --bpf-counters --for-each-cgroup /,/ sleep 1 Error: unknown option `bpf-counters'
Usage: perf stat [<options>] [<command>]
-a, --all-cpus system-wide collection from all CPUs <SNIP> #
Fixes: bb1c15b60b981d10 ("perf stat: Support regex pattern in --for-each-cgroup") Signed-off-by: Namhyung Kim namhyung@kernel.org Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Adrian Hunter adrian.hunter@intel.com Cc: bpf@vger.kernel.org Cc: Ian Rogers irogers@google.com Cc: Ingo Molnar mingo@kernel.org Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Peter Zijlstra peterz@infradead.org Cc: Song Liu songliubraving@fb.com Link: https://lore.kernel.org/r/20230104064402.1551516-5-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/cgroup.c | 23 ++++++++++++++++++----- 1 file changed, 18 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c index e99b41f9be45..cd978c240e0d 100644 --- a/tools/perf/util/cgroup.c +++ b/tools/perf/util/cgroup.c @@ -224,6 +224,19 @@ static int add_cgroup_name(const char *fpath, const struct stat *sb __maybe_unus return 0; }
+static int check_and_add_cgroup_name(const char *fpath) +{ + struct cgroup_name *cn; + + list_for_each_entry(cn, &cgroup_list, list) { + if (!strcmp(cn->name, fpath)) + return 0; + } + + /* pretend if it's added by ftw() */ + return add_cgroup_name(fpath, NULL, FTW_D, NULL); +} + static void release_cgroup_list(void) { struct cgroup_name *cn; @@ -242,7 +255,7 @@ static int list_cgroups(const char *str) struct cgroup_name *cn; char *s;
- /* use given name as is - for testing purpose */ + /* use given name as is when no regex is given */ for (;;) { p = strchr(str, ','); e = p ? p : eos; @@ -253,13 +266,13 @@ static int list_cgroups(const char *str) s = strndup(str, e - str); if (!s) return -1; - /* pretend if it's added by ftw() */ - ret = add_cgroup_name(s, NULL, FTW_D, NULL); + + ret = check_and_add_cgroup_name(s); free(s); - if (ret) + if (ret < 0) return -1; } else { - if (add_cgroup_name("", NULL, FTW_D, NULL) < 0) + if (check_and_add_cgroup_name("/") < 0) return -1; }
From: Dan Carpenter error27@gmail.com
[ Upstream commit 3792fc508c095abd84b10ceae12bd773e61fdc36 ]
Call intel_vgpu_unpin_mm() on this error path.
Fixes: 418741480809 ("drm/i915/gvt: Adding ppgtt to GVT GEM context after shadow pdps settled.") Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/Y3OQ5tgZIVxyQ/WV@kili Reviewed-by: Zhenyu Wang zhenyuw@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/i915/gvt/scheduler.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c index 1bb1be5c48c8..0291d42cfba8 100644 --- a/drivers/gpu/drm/i915/gvt/scheduler.c +++ b/drivers/gpu/drm/i915/gvt/scheduler.c @@ -694,6 +694,7 @@ intel_vgpu_shadow_mm_pin(struct intel_vgpu_workload *workload)
if (workload->shadow_mm->type != INTEL_GVT_MM_PPGTT || !workload->shadow_mm->ppgtt_mm.shadowed) { + intel_vgpu_unpin_mm(workload->shadow_mm); gvt_vgpu_err("workload shadow ppgtt isn't ready\n"); return -EINVAL; }
From: Zhengchao Shao shaozhengchao@huawei.com
[ Upstream commit fe69230f05897b3de758427b574fc98025dfc907 ]
When the linktype is unknown or kzalloc() fails in cfctrl_linkup_request(), pkt is not released. Add the release to the error paths.
Fixes: b482cd2053e3 ("net-caif: add CAIF core protocol stack") Fixes: 8d545c8f958f ("caif: Disconnect without waiting for response") Signed-off-by: Zhengchao Shao shaozhengchao@huawei.com Reviewed-by: Jiri Pirko jiri@nvidia.com Link: https://lore.kernel.org/r/20230104065146.1153009-1-shaozhengchao@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- net/caif/cfctrl.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/caif/cfctrl.c b/net/caif/cfctrl.c index 2809cbd6b7f7..d8cb4b2a076b 100644 --- a/net/caif/cfctrl.c +++ b/net/caif/cfctrl.c @@ -269,11 +269,15 @@ int cfctrl_linkup_request(struct cflayer *layer, default: pr_warn("Request setup of bad link type = %d\n", param->linktype); + cfpkt_destroy(pkt); return -EINVAL; } req = kzalloc(sizeof(*req), GFP_KERNEL); - if (!req) + if (!req) { + cfpkt_destroy(pkt); return -ENOMEM; + } + req->client_layer = user_layer; req->cmd = CFCTRL_CMD_LINK_SETUP; req->param = *param;
From: Jan Kara jack@suse.cz
[ Upstream commit 83c7423d1eb6806d13c521d1002cc1a012111719 ]
When extending the last extent in the file within the last block, we wrongly computed the length of the last extent. This is mostly a cosmetic problem, since the extent does not contain any data and the length will be fixed up by following operations, but still worth fixing.
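In other words (a two-line sketch using the names from udf_do_extend_final_block()): the bytes added are the new length minus the current masked extent length, not the reverse:

    u32 old_elen    = last_ext->extLength & UDF_EXTENT_LENGTH_MASK;
    u32 added_bytes = new_elen - old_elen;  /* new_elen >= old_elen here */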
Fixes: 1f3868f06855 ("udf: Fix extending file within last block") Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/udf/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/udf/inode.c b/fs/udf/inode.c index 6a0e8ef664c1..d2488b7e54a5 100644 --- a/fs/udf/inode.c +++ b/fs/udf/inode.c @@ -599,7 +599,7 @@ static void udf_do_extend_final_block(struct inode *inode, */ if (new_elen <= (last_ext->extLength & UDF_EXTENT_LENGTH_MASK)) return; - added_bytes = (last_ext->extLength & UDF_EXTENT_LENGTH_MASK) - new_elen; + added_bytes = new_elen - (last_ext->extLength & UDF_EXTENT_LENGTH_MASK); last_ext->extLength += added_bytes; UDF_I(inode)->i_lenExtents += added_bytes;
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit a1dec9d70b6ad97087b60b81d2492134a84208c6 ]
The Advantech MICA-071 tablet deviates from the defaults for a non CR Bay Trail based tablet in several ways:
1. It uses an analog MIC on IN3 rather than using DMIC1 2. It only has 1 speaker 3. It needs the OVCD current threshold to be set to 1500uA instead of the default 2000uA to reliably differentiate between headphones vs headsets
Add a quirk with these settings for this tablet.
Signed-off-by: Hans de Goede hdegoede@redhat.com Acked-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Link: https://lore.kernel.org/r/20221213123246.11226-1-hdegoede@redhat.com Signed-off-by: Mark Brown broonie@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- sound/soc/intel/boards/bytcr_rt5640.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index f9c82ebc552c..888e04c57757 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -570,6 +570,21 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { BYT_RT5640_SSP0_AIF1 | BYT_RT5640_MCLK_EN), }, + { + /* Advantech MICA-071 */ + .matches = { + DMI_EXACT_MATCH(DMI_SYS_VENDOR, "Advantech"), + DMI_EXACT_MATCH(DMI_PRODUCT_NAME, "MICA-071"), + }, + /* OVCD Th = 1500uA to reliable detect head-phones vs -set */ + .driver_data = (void *)(BYT_RT5640_IN3_MAP | + BYT_RT5640_JD_SRC_JD2_IN4N | + BYT_RT5640_OVCD_TH_1500UA | + BYT_RT5640_OVCD_SF_0P75 | + BYT_RT5640_MONO_SPEAKER | + BYT_RT5640_DIFF_MIC | + BYT_RT5640_MCLK_EN), + }, { .matches = { DMI_EXACT_MATCH(DMI_SYS_VENDOR, "ARCHOS"),
From: Yanjun Zhang zhangyanjun@cestc.cn
[ Upstream commit 3659fb5ac29a5e6102bebe494ac789fd47fb78f4 ]
The flush request initialized by blk_kick_flush has a NULL bio, and it may be handled by nvme_end_req during I/O completion. When blktrace is enabled and multipath is activated, nvme_trace_bio_complete trying to access the NULL bio pointer from the flush request results in the following crash:
[ 2517.831677] BUG: kernel NULL pointer dereference, address: 000000000000001a [ 2517.835213] #PF: supervisor read access in kernel mode [ 2517.838724] #PF: error_code(0x0000) - not-present page [ 2517.842222] PGD 7b2d51067 P4D 0 [ 2517.845684] Oops: 0000 [#1] SMP NOPTI [ 2517.849125] CPU: 2 PID: 732 Comm: kworker/2:1H Kdump: loaded Tainted: G S 5.15.67-0.cl9.x86_64 #1 [ 2517.852723] Hardware name: XFUSION 2288H V6/BC13MBSBC, BIOS 1.13 07/27/2022 [ 2517.856358] Workqueue: nvme_tcp_wq nvme_tcp_io_work [nvme_tcp] [ 2517.859993] RIP: 0010:blk_add_trace_bio_complete+0x6/0x30 [ 2517.863628] Code: 1f 44 00 00 48 8b 46 08 31 c9 ba 04 00 10 00 48 8b 80 50 03 00 00 48 8b 78 50 e9 e5 fe ff ff 0f 1f 44 00 00 41 54 49 89 f4 55 <0f> b6 7a 1a 48 89 d5 e8 3e 1c 2b 00 48 89 ee 4c 89 e7 5d 89 c1 ba [ 2517.871269] RSP: 0018:ff7f6a008d9dbcd0 EFLAGS: 00010286 [ 2517.875081] RAX: ff3d5b4be00b1d50 RBX: 0000000002040002 RCX: ff3d5b0a270f2000 [ 2517.878966] RDX: 0000000000000000 RSI: ff3d5b0b021fb9f8 RDI: 0000000000000000 [ 2517.882849] RBP: ff3d5b0b96a6fa00 R08: 0000000000000001 R09: 0000000000000000 [ 2517.886718] R10: 000000000000000c R11: 000000000000000c R12: ff3d5b0b021fb9f8 [ 2517.890575] R13: 0000000002000000 R14: ff3d5b0b021fb1b0 R15: 0000000000000018 [ 2517.894434] FS: 0000000000000000(0000) GS:ff3d5b42bfc80000(0000) knlGS:0000000000000000 [ 2517.898299] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 2517.902157] CR2: 000000000000001a CR3: 00000004f023e005 CR4: 0000000000771ee0 [ 2517.906053] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 2517.909930] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 2517.913761] PKRU: 55555554 [ 2517.917558] Call Trace: [ 2517.921294] <TASK> [ 2517.924982] nvme_complete_rq+0x1c3/0x1e0 [nvme_core] [ 2517.928715] nvme_tcp_recv_pdu+0x4d7/0x540 [nvme_tcp] [ 2517.932442] nvme_tcp_recv_skb+0x4f/0x240 [nvme_tcp] [ 2517.936137] ? nvme_tcp_recv_pdu+0x540/0x540 [nvme_tcp] [ 2517.939830] tcp_read_sock+0x9c/0x260 [ 2517.943486] nvme_tcp_try_recv+0x65/0xa0 [nvme_tcp] [ 2517.947173] nvme_tcp_io_work+0x64/0x90 [nvme_tcp] [ 2517.950834] process_one_work+0x1e8/0x390 [ 2517.954473] worker_thread+0x53/0x3c0 [ 2517.958069] ? process_one_work+0x390/0x390 [ 2517.961655] kthread+0x10c/0x130 [ 2517.965211] ? set_kthread_struct+0x40/0x40 [ 2517.968760] ret_from_fork+0x1f/0x30 [ 2517.972285] </TASK>
To avoid this situation, add a NULL check for req->bio before calling trace_block_bio_complete.
Signed-off-by: Yanjun Zhang zhangyanjun@cestc.cn Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/nvme.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h index 7f52b2b179b8..39ca48babbe8 100644 --- a/drivers/nvme/host/nvme.h +++ b/drivers/nvme/host/nvme.h @@ -799,7 +799,7 @@ static inline void nvme_trace_bio_complete(struct request *req) { struct nvme_ns *ns = req->q->queuedata;
- if (req->cmd_flags & REQ_NVME_MPATH) + if ((req->cmd_flags & REQ_NVME_MPATH) && req->bio) trace_block_bio_complete(ns->head->disk->queue, req->bio); }
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 343190841a1f22b96996d9f8cfab902a4d1bfd0e ]
We only check the register opcode value inside the restricted ring section; move it into the main io_uring_register() function instead and check it up front.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- io_uring/io_uring.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c index eebbe8a6da0c..52a08632326a 100644 --- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -10895,8 +10895,6 @@ static int __io_uring_register(struct io_ring_ctx *ctx, unsigned opcode, return -ENXIO;
if (ctx->restricted) { - if (opcode >= IORING_REGISTER_LAST) - return -EINVAL; opcode = array_index_nospec(opcode, IORING_REGISTER_LAST); if (!test_bit(opcode, ctx->restrictions.register_op)) return -EACCES; @@ -11028,6 +11026,9 @@ SYSCALL_DEFINE4(io_uring_register, unsigned int, fd, unsigned int, opcode, long ret = -EBADF; struct fd f;
+ if (opcode >= IORING_REGISTER_LAST) + return -EINVAL; + f = fdget(fd); if (!f.file) return -EBADF;
From: Christoph Hellwig hch@lst.de
[ Upstream commit 61f37154c599cf9f2f84dcbd9be842f8645a7099 ]
Use NVME_CMD_EFFECTS_CSUPP instead of open coding it and assign a single value to multiple array entries instead of repeated assignments.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Kanchan Joshi joshi.k@samsung.com Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/admin-cmd.c | 35 ++++++++++++++++++--------------- 1 file changed, 19 insertions(+), 16 deletions(-)
diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c index 52bb262d267a..bf78c58ed41d 100644 --- a/drivers/nvme/target/admin-cmd.c +++ b/drivers/nvme/target/admin-cmd.c @@ -164,26 +164,29 @@ static void nvmet_execute_get_log_page_smart(struct nvmet_req *req)
static void nvmet_get_cmd_effects_nvm(struct nvme_effects_log *log) { - log->acs[nvme_admin_get_log_page] = cpu_to_le32(1 << 0); - log->acs[nvme_admin_identify] = cpu_to_le32(1 << 0); - log->acs[nvme_admin_abort_cmd] = cpu_to_le32(1 << 0); - log->acs[nvme_admin_set_features] = cpu_to_le32(1 << 0); - log->acs[nvme_admin_get_features] = cpu_to_le32(1 << 0); - log->acs[nvme_admin_async_event] = cpu_to_le32(1 << 0); - log->acs[nvme_admin_keep_alive] = cpu_to_le32(1 << 0); - - log->iocs[nvme_cmd_read] = cpu_to_le32(1 << 0); - log->iocs[nvme_cmd_write] = cpu_to_le32(1 << 0); - log->iocs[nvme_cmd_flush] = cpu_to_le32(1 << 0); - log->iocs[nvme_cmd_dsm] = cpu_to_le32(1 << 0); - log->iocs[nvme_cmd_write_zeroes] = cpu_to_le32(1 << 0); + log->acs[nvme_admin_get_log_page] = + log->acs[nvme_admin_identify] = + log->acs[nvme_admin_abort_cmd] = + log->acs[nvme_admin_set_features] = + log->acs[nvme_admin_get_features] = + log->acs[nvme_admin_async_event] = + log->acs[nvme_admin_keep_alive] = + cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); + + log->iocs[nvme_cmd_read] = + log->iocs[nvme_cmd_write] = + log->iocs[nvme_cmd_flush] = + log->iocs[nvme_cmd_dsm] = + log->iocs[nvme_cmd_write_zeroes] = + cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); }
static void nvmet_get_cmd_effects_zns(struct nvme_effects_log *log) { - log->iocs[nvme_cmd_zone_append] = cpu_to_le32(1 << 0); - log->iocs[nvme_cmd_zone_mgmt_send] = cpu_to_le32(1 << 0); - log->iocs[nvme_cmd_zone_mgmt_recv] = cpu_to_le32(1 << 0); + log->iocs[nvme_cmd_zone_append] = + log->iocs[nvme_cmd_zone_mgmt_send] = + log->iocs[nvme_cmd_zone_mgmt_recv] = + cpu_to_le32(NVME_CMD_EFFECTS_CSUPP); }
static void nvmet_execute_get_log_cmd_effects_ns(struct nvmet_req *req)
From: Christoph Hellwig hch@lst.de
[ Upstream commit 831ed60c2aca2d7c517b2da22897a90224a97d27 ]
To be able to use the Commands Supported and Effects Log for allowing unprivileged passthrough, it needs to be correctly reported for I/O commands as well. Return the I/O command effects from nvme_command_effects, and also add a default list of effects for the NVM command set. For other command sets, the Commands Supported and Effects log is required to be present already.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Kanchan Joshi joshi.k@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/core.c | 32 ++++++++++++++++++++++++++------ 1 file changed, 26 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index 2d5b5e0fb66a..672f53d5651a 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -1113,6 +1113,18 @@ static u32 nvme_known_admin_effects(u8 opcode) return 0; }
+static u32 nvme_known_nvm_effects(u8 opcode) +{ + switch (opcode) { + case nvme_cmd_write: + case nvme_cmd_write_zeroes: + case nvme_cmd_write_uncor: + return NVME_CMD_EFFECTS_LBCC; + default: + return 0; + } +} + u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode) { u32 effects = 0; @@ -1120,16 +1132,24 @@ u32 nvme_command_effects(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u8 opcode) if (ns) { if (ns->head->effects) effects = le32_to_cpu(ns->head->effects->iocs[opcode]); + if (ns->head->ids.csi == NVME_CAP_CSS_NVM) + effects |= nvme_known_nvm_effects(opcode); if (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC)) dev_warn_once(ctrl->device, - "IO command:%02x has unhandled effects:%08x\n", + "IO command:%02x has unusual effects:%08x\n", opcode, effects); - return 0; - }
- if (ctrl->effects) - effects = le32_to_cpu(ctrl->effects->acs[opcode]); - effects |= nvme_known_admin_effects(opcode); + /* + * NVME_CMD_EFFECTS_CSE_MASK causes a freeze all I/O queues, + * which would deadlock when done on an I/O command. Note that + * We already warn about an unusual effect above. + */ + effects &= ~NVME_CMD_EFFECTS_CSE_MASK; + } else { + if (ctrl->effects) + effects = le32_to_cpu(ctrl->effects->acs[opcode]); + effects |= nvme_known_admin_effects(opcode); + }
return effects; }
From: Qu Wenruo wqu@suse.com
[ Upstream commit a05d3c9153145283ce9c58a1d7a9056fbb85f6a1 ]
[BACKGROUND] There is an incident report that a user hibernated their system with a btrfs filesystem on a removable device still mounted.
Then, by some accident, that btrfs got mounted and modified by another system/OS before the device went back to the hibernated system.
After resuming from hibernation, new writes happened into the victim btrfs.
Now the fs is completely broken, since the underlying btrfs is no longer the same one as before the hibernation, and the user lost their data due to various transid mismatches.
[REPRODUCER] We can emulate the situation using the following small script:
truncate -s 1G $dev
mkfs.btrfs -f $dev
mount $dev $mnt
fsstress -w -d $mnt -n 500
sync
xfs_freeze -f $mnt
cp $dev $dev.backup

# There is no way to mount the same cloned fs on the same system,
# as the conflicting fsid will be rejected by btrfs.
# Thus here we have to wipe the fs using a different btrfs.
mkfs.btrfs -f $dev.backup

dd if=$dev.backup of=$dev bs=1M
xfs_freeze -u $mnt
fsstress -w -d $mnt -n 20
umount $mnt
btrfs check $dev
The final fsck will fail because some tree blocks have an incorrect fsid.
This is enough to emulate the problem hit by the unfortunate user.
[ENHANCEMENT] Although such a case should not be that common, it can still happen from time to time.
From the view of btrfs, we can detect any unexpected super block change, and if there is one, we just mark the fs read-only and still thaw it.
By this we can limit the damage to a minimum, and hopefully no one will lose their data this way anymore.
Suggested-by: Goffredo Baroncelli kreijack@libero.it Link: https://lore.kernel.org/linux-btrfs/83bf3b4b-7f4c-387a-b286-9251e3991e34@blu... Reviewed-by: Anand Jain anand.jain@oracle.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/disk-io.c | 25 ++++++++++++++----- fs/btrfs/disk-io.h | 4 +++- fs/btrfs/super.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++ fs/btrfs/volumes.c | 2 +- 4 files changed, 83 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2fd46093e5bb..6484f61c6fbb 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -2491,8 +2491,8 @@ static int btrfs_read_roots(struct btrfs_fs_info *fs_info) * 1, 2 2nd and 3rd backup copy * -1 skip bytenr check */ -static int validate_super(struct btrfs_fs_info *fs_info, - struct btrfs_super_block *sb, int mirror_num) +int btrfs_validate_super(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb, int mirror_num) { u64 nodesize = btrfs_super_nodesize(sb); u64 sectorsize = btrfs_super_sectorsize(sb); @@ -2675,7 +2675,7 @@ static int validate_super(struct btrfs_fs_info *fs_info, */ static int btrfs_validate_mount_super(struct btrfs_fs_info *fs_info) { - return validate_super(fs_info, fs_info->super_copy, 0); + return btrfs_validate_super(fs_info, fs_info->super_copy, 0); }
/* @@ -2689,7 +2689,7 @@ static int btrfs_validate_write_super(struct btrfs_fs_info *fs_info, { int ret;
- ret = validate_super(fs_info, sb, -1); + ret = btrfs_validate_super(fs_info, sb, -1); if (ret < 0) goto out; if (!btrfs_supported_super_csum(btrfs_super_csum_type(sb))) { @@ -3703,7 +3703,7 @@ static void btrfs_end_super_write(struct bio *bio) }
struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, - int copy_num) + int copy_num, bool drop_cache) { struct btrfs_super_block *super; struct page *page; @@ -3721,6 +3721,19 @@ struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode)) return ERR_PTR(-EINVAL);
+ if (drop_cache) { + /* This should only be called with the primary sb. */ + ASSERT(copy_num == 0); + + /* + * Drop the page of the primary superblock, so later read will + * always read from the device. + */ + invalidate_inode_pages2_range(mapping, + bytenr >> PAGE_SHIFT, + (bytenr + BTRFS_SUPER_INFO_SIZE) >> PAGE_SHIFT); + } + page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS); if (IS_ERR(page)) return ERR_CAST(page); @@ -3752,7 +3765,7 @@ struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev) * later supers, using BTRFS_SUPER_MIRROR_MAX instead */ for (i = 0; i < 1; i++) { - super = btrfs_read_dev_one_super(bdev, i); + super = btrfs_read_dev_one_super(bdev, i, false); if (IS_ERR(super)) continue;
diff --git a/fs/btrfs/disk-io.h b/fs/btrfs/disk-io.h index 1b8fd3deafc9..9de0c39f63a2 100644 --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -56,10 +56,12 @@ int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices, char *options); void __cold close_ctree(struct btrfs_fs_info *fs_info); +int btrfs_validate_super(struct btrfs_fs_info *fs_info, + struct btrfs_super_block *sb, int mirror_num); int write_all_supers(struct btrfs_fs_info *fs_info, int max_mirrors); struct btrfs_super_block *btrfs_read_dev_super(struct block_device *bdev); struct btrfs_super_block *btrfs_read_dev_one_super(struct block_device *bdev, - int copy_num); + int copy_num, bool drop_cache); int btrfs_commit_super(struct btrfs_fs_info *fs_info); struct btrfs_root *btrfs_read_tree_root(struct btrfs_root *tree_root, struct btrfs_key *key); diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c index 61b84391be58..bde5ead01c24 100644 --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2497,11 +2497,71 @@ static int btrfs_freeze(struct super_block *sb) return btrfs_commit_transaction(trans); }
+static int check_dev_super(struct btrfs_device *dev) +{ + struct btrfs_fs_info *fs_info = dev->fs_info; + struct btrfs_super_block *sb; + int ret = 0; + + /* This should be called with fs still frozen. */ + ASSERT(test_bit(BTRFS_FS_FROZEN, &fs_info->flags)); + + /* Missing dev, no need to check. */ + if (!dev->bdev) + return 0; + + /* Only need to check the primary super block. */ + sb = btrfs_read_dev_one_super(dev->bdev, 0, true); + if (IS_ERR(sb)) + return PTR_ERR(sb); + + /* Btrfs_validate_super() includes fsid check against super->fsid. */ + ret = btrfs_validate_super(fs_info, sb, 0); + if (ret < 0) + goto out; + + if (btrfs_super_generation(sb) != fs_info->last_trans_committed) { + btrfs_err(fs_info, "transid mismatch, has %llu expect %llu", + btrfs_super_generation(sb), + fs_info->last_trans_committed); + ret = -EUCLEAN; + goto out; + } +out: + btrfs_release_disk_super(sb); + return ret; +} + static int btrfs_unfreeze(struct super_block *sb) { struct btrfs_fs_info *fs_info = btrfs_sb(sb); + struct btrfs_device *device; + int ret = 0;
+ /* + * Make sure the fs is not changed by accident (like hibernation then + * modified by other OS). + * If we found anything wrong, we mark the fs error immediately. + * + * And since the fs is frozen, no one can modify the fs yet, thus + * we don't need to hold device_list_mutex. + */ + list_for_each_entry(device, &fs_info->fs_devices->devices, dev_list) { + ret = check_dev_super(device); + if (ret < 0) { + btrfs_handle_fs_error(fs_info, ret, + "super block on devid %llu got modified unexpectedly", + device->devid); + break; + } + } clear_bit(BTRFS_FS_FROZEN, &fs_info->flags); + + /* + * We still return 0, to allow VFS layer to unfreeze the fs even the + * above checks failed. Since the fs is either fine or read-only, we're + * safe to continue, without causing further damage. + */ return 0; }
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 6b86a3cec04c..f01549b8c7c5 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -2074,7 +2074,7 @@ void btrfs_scratch_superblocks(struct btrfs_fs_info *fs_info, struct page *page; int ret;
- disk_super = btrfs_read_dev_one_super(bdev, copy_num); + disk_super = btrfs_read_dev_one_super(bdev, copy_num, false); if (IS_ERR(disk_super)) continue;
From: Takashi Iwai tiwai@suse.de
commit d00dd2f2645dca04cf399d8fc692f3f69b6dd996 upstream.
After
b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer"),
freeing image->elf_headers in the error path of crash_load_segments() is not needed because kimage_file_post_load_cleanup() will take care of that later. And not clearing it could result in a double-free.
Drop the superfluous vfree() call at the error path of crash_load_segments().
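The ownership rule the fix restores can be pictured with a generic sketch: once a single cleanup routine owns a buffer, error paths must not free it as well. The names below (struct image, image_cleanup, load_segments) are illustrative stand-ins, not the kexec implementation:

  #include <stdio.h>
  #include <stdlib.h>

  struct image {
          void *elf_headers;      /* owned by image_cleanup() once assigned */
  };

  static void image_cleanup(struct image *img)
  {
          free(img->elf_headers); /* the single point of release */
          img->elf_headers = NULL;
  }

  static int load_segments(struct image *img, int later_step_fails)
  {
          img->elf_headers = malloc(4096);
          if (!img->elf_headers)
                  return -1;

          if (later_step_fails) {
                  /* Wrong: calling free(img->elf_headers) here would leave a
                   * stale pointer behind, and image_cleanup() would free it a
                   * second time.  Right: just report the error and let the
                   * cleanup path release the buffer once. */
                  return -1;
          }
          return 0;
  }

  int main(void)
  {
          struct image img = { 0 };

          if (load_segments(&img, 1))
                  fprintf(stderr, "load failed\n");
          image_cleanup(&img);    /* the only free, so no double-free */
          return 0;
  }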
Fixes: b3e34a47f989 ("x86/kexec: fix memory leak of elf header buffer") Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Borislav Petkov (AMD) bp@alien8.de Acked-by: Baoquan He bhe@redhat.com Acked-by: Vlastimil Babka vbabka@suse.cz Cc: stable@kernel.org Link: https://lore.kernel.org/r/20221122115122.13937-1-tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/crash.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-)
--- a/arch/x86/kernel/crash.c +++ b/arch/x86/kernel/crash.c @@ -401,10 +401,8 @@ int crash_load_segments(struct kimage *i kbuf.buf_align = ELF_CORE_HEADER_ALIGN; kbuf.mem = KEXEC_BUF_MEM_UNKNOWN; ret = kexec_add_buffer(&kbuf); - if (ret) { - vfree((void *)image->elf_headers); + if (ret) return ret; - } image->elf_load_addr = kbuf.mem; pr_debug("Loaded ELF headers at 0x%lx bufsz=0x%lx memsz=0x%lx\n", image->elf_load_addr, kbuf.bufsz, kbuf.bufsz);
From: Rodrigo Branco bsdaemon@google.com
commit a664ec9158eeddd75121d39c9a0758016097fa96 upstream.
We missed the window between the TIF flag update and the next reschedule.
Signed-off-by: Rodrigo Branco bsdaemon@google.com Reviewed-by: Borislav Petkov (AMD) bp@alien8.de Signed-off-by: Ingo Molnar mingo@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/x86/kernel/cpu/bugs.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -1951,6 +1951,8 @@ static int ib_prctl_set(struct task_stru if (ctrl == PR_SPEC_FORCE_DISABLE) task_set_spec_ib_force_disable(task); task_update_spec_tif(task); + if (task == current) + indirect_branch_prediction_barrier(); break; default: return -ERANGE;
From: Jeff Layton jlayton@kernel.org
commit cad853374d85fe678d721512cecfabd7636e51f3 upstream.
If v4 READDIR operation hits a mountpoint and gets back an error, then it will include that entry in the reply and set RDATTR_ERROR for it to the error.
That's fine for "normal" exported filesystems, but on the v4root, we need to be more careful to only expose the existence of dentries that lead to exports.
If the mountd upcall times out while checking to see whether a mountpoint on the v4root is exported, then we have no recourse other than to fail the whole operation.
Cc: Steve Dickson steved@redhat.com Link: https://bugzilla.kernel.org/show_bug.cgi?id=216777 Reported-by: JianHong Yin yin-jianhong@163.com Signed-off-by: Jeff Layton jlayton@kernel.org Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/nfsd/nfs4xdr.c | 11 +++++++++++ 1 file changed, 11 insertions(+)
--- a/fs/nfsd/nfs4xdr.c +++ b/fs/nfsd/nfs4xdr.c @@ -3514,6 +3514,17 @@ nfsd4_encode_dirent(void *ccdv, const ch case nfserr_noent: xdr_truncate_encode(xdr, start_offset); goto skip_entry; + case nfserr_jukebox: + /* + * The pseudoroot should only display dentries that lead to + * exports. If we get EJUKEBOX here, then we can't tell whether + * this entry should be included. Just fail the whole READDIR + * with NFS4ERR_DELAY in that case, and hope that the situation + * will resolve itself by the client's next attempt. + */ + if (cd->rd_fhp->fh_export->ex_flags & NFSEXP_V4ROOT) + goto fail; + fallthrough; default: /* * If the client requested the RDATTR_ERROR attribute,
From: Paul Menzel pmenzel@molgen.mpg.de
commit f685dd7a8025f2554f73748cfdb8143a21fb92c7 upstream.
Commit 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen") accidently decreases the maximum memory size for the Matrox G200eW (102b:0532) from 8 MB to 1 MB by missing one zero. This caused the driver initialization to fail with the messages below, as the minimum required VRAM size is 2 MB:
[ 9.436420] matroxfb: Matrox MGA-G200eW (PCI) detected
[ 9.444502] matroxfb: cannot determine memory size
[ 9.449316] matroxfb: probe of 0000:0a:03.0 failed with error -1
So, add the missing 0 to make it the intended 16 MB. Successfully tested on the Dell PowerEdge R910/0KYD3D, BIOS 2.10.0 08/29/2013, that the warning is gone.
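For reference, the hex constants map to sizes as follows; a tiny arithmetic check (plain C, only to show the conversion):

  #include <stdio.h>

  int main(void)
  {
          /* 0x100000  = 2^20 bytes =  1 MiB  (the accidental value)
           * 0x1000000 = 2^24 bytes = 16 MiB  (the intended maxvram)
           * 0x800000  = 2^23 bytes =  8 MiB  (maxdisplayable) */
          printf("%d %d %d\n",
                 0x100000 >> 20, 0x1000000 >> 20, 0x800000 >> 20);
          return 0;
  }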
While at it, add a leading 0 to the maxdisplayable entry, so it’s aligned properly. The value could probably also be increased from 8 MB to 16 MB, as the G200 uses the same values, but I have not checked any datasheet.
Note, matroxfb is obsolete and superseded by the maintained DRM driver mgag200, which is used by default on most systems where both drivers are available. Therefore, on most systems it was only a cosmetic issue.
Fixes: 62d89a7d49af ("video: fbdev: matroxfb: set maxvram of vbG200eW to the same as vbG200 to avoid black screen") Link: https://lore.kernel.org/linux-fbdev/972999d3-b75d-5680-fcef-6e6905c52ac5@sus... Cc: it+linux-fbdev@molgen.mpg.de Cc: Z. Liu liuzx@knownsec.com Cc: Rich Felker dalias@libc.org Cc: stable@vger.kernel.org Signed-off-by: Paul Menzel pmenzel@molgen.mpg.de Signed-off-by: Helge Deller deller@gmx.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/video/fbdev/matrox/matroxfb_base.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/video/fbdev/matrox/matroxfb_base.c +++ b/drivers/video/fbdev/matrox/matroxfb_base.c @@ -1377,8 +1377,8 @@ static struct video_board vbG200 = { .lowlevel = &matrox_G100 }; static struct video_board vbG200eW = { - .maxvram = 0x100000, - .maxdisplayable = 0x800000, + .maxvram = 0x1000000, + .maxdisplayable = 0x0800000, .accelID = FB_ACCEL_MATROX_MGAG200, .lowlevel = &matrox_G100 };
From: Jens Axboe axboe@kernel.dk
commit 9cea62b2cbabff8ed46f2df17778b624ad9dd25a upstream.
If we split a bio marked with REQ_NOWAIT, then we can trigger spurious EAGAIN if constituent parts of that split bio end up failing request allocations. Parts will complete just fine, but just a single failure in one of the chained bios will yield an EAGAIN final result for the parent bio.
Return EAGAIN early if we end up needing to split such a bio, which allows for saner recovery handling.
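The failure mode described above can be pictured with a toy parent/child completion model; this is only an illustration of the behaviour, not the block layer's bio chaining code:

  #include <errno.h>
  #include <stdio.h>

  /* A parent request split into fragments: the first fragment that fails
   * poisons the parent's final status, even though the other fragments
   * completed fine. */
  struct parent {
          int remaining;  /* fragments still in flight */
          int status;     /* 0 = OK, negative errno otherwise */
  };

  static void fragment_done(struct parent *p, int status)
  {
          if (status && !p->status)
                  p->status = status;     /* first error wins */
          if (--p->remaining == 0)
                  printf("parent completes with status %d\n", p->status);
  }

  int main(void)
  {
          struct parent p = { .remaining = 3, .status = 0 };

          fragment_done(&p, 0);           /* two fragments finish fine ...   */
          fragment_done(&p, 0);
          fragment_done(&p, -EAGAIN);     /* ... one allocation fails:       */
          return 0;                       /* the whole parent reports EAGAIN */
  }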
Cc: stable@vger.kernel.org # 5.15+ Link: https://github.com/axboe/liburing/issues/766 Reported-by: Michael Kelley mikelley@microsoft.com Reviewed-by: Keith Busch kbusch@kernel.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk-merge.c | 10 ++++++++++ 1 file changed, 10 insertions(+)
--- a/block/blk-merge.c +++ b/block/blk-merge.c @@ -279,6 +279,16 @@ static struct bio *blk_bio_segment_split *segs = nsegs; return NULL; split: + /* + * We can't sanely support splitting for a REQ_NOWAIT bio. End it + * with EAGAIN if splitting is required and return an error pointer. + */ + if (bio->bi_opf & REQ_NOWAIT) { + bio->bi_status = BLK_STS_AGAIN; + bio_endio(bio); + return ERR_PTR(-EAGAIN); + } + *segs = nsegs;
/*
From: Pavel Begunkov asml.silence@gmail.com
commit 12521a5d5cb7ff0ad43eadfc9c135d86e1131fa8 upstream.
Jiffy to ktime CQ waiting conversion broke how we treat timeouts, in particular we rearm it anew every time we get into io_cqring_wait_schedule() without adjusting the timeout. Waiting for 2 CQEs and getting a task_work in the middle may double the timeout value, or even worse in some cases task may wait indefinitely.
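The general pattern the fix restores is "compute one absolute deadline, then reuse it across every wakeup". A userspace sketch of that pattern under those assumptions (not the io_uring code; work_ready() is a made-up stand-in for the CQE check):

  #include <stdbool.h>
  #include <stdio.h>
  #include <time.h>

  static int polls;

  /* Pretend the event becomes ready on the third poll. */
  static bool work_ready(void)
  {
          return ++polls >= 3;
  }

  int main(void)
  {
          struct timespec now, deadline;

          /* Compute one absolute deadline up front ... */
          clock_gettime(CLOCK_MONOTONIC, &deadline);
          deadline.tv_sec += 2;

          /* ... and reuse it on every wakeup instead of re-arming a fresh
           * relative timeout, which is what allowed the total wait to grow. */
          for (;;) {
                  if (work_ready()) {
                          puts("event arrived in time");
                          return 0;
                  }
                  clock_gettime(CLOCK_MONOTONIC, &now);
                  if (now.tv_sec > deadline.tv_sec ||
                      (now.tv_sec == deadline.tv_sec &&
                       now.tv_nsec >= deadline.tv_nsec)) {
                          puts("deadline passed");
                          return 1;
                  }
          }
  }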
Cc: stable@vger.kernel.org Fixes: 228339662b398 ("io_uring: don't convert to jiffies for waiting on timeouts") Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/f7bffddd71b08f28a877d44d37ac953ddb01590d.167291566... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -7598,7 +7598,7 @@ static int io_run_task_work_sig(void) /* when returns >0, the caller should retry */ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx, struct io_wait_queue *iowq, - ktime_t timeout) + ktime_t *timeout) { int ret;
@@ -7610,7 +7610,7 @@ static inline int io_cqring_wait_schedul if (test_bit(0, &ctx->check_cq_overflow)) return 1;
- if (!schedule_hrtimeout(&timeout, HRTIMER_MODE_ABS)) + if (!schedule_hrtimeout(timeout, HRTIMER_MODE_ABS)) return -ETIME; return 1; } @@ -7673,7 +7673,7 @@ static int io_cqring_wait(struct io_ring } prepare_to_wait_exclusive(&ctx->cq_wait, &iowq.wq, TASK_INTERRUPTIBLE); - ret = io_cqring_wait_schedule(ctx, &iowq, timeout); + ret = io_cqring_wait_schedule(ctx, &iowq, &timeout); finish_wait(&ctx->cq_wait, &iowq.wq); cond_resched(); } while (ret > 0);
From: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com
commit b878d3ba9bb41cddb73ba4b56e5552f0a638daca upstream.
Commit 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver")' added rfi_restriction_data_rate_base string, mmio details and documentation, but missed adding attribute to sysfs.
Add missing sysfs attribute.
Fixes: 473be51142ad ("thermal: int340x: processor_thermal: Add RFIM driver") Cc: 5.11+ stable@vger.kernel.org # v5.11+ Signed-off-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c | 4 ++++ 1 file changed, 4 insertions(+)
--- a/drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c +++ b/drivers/thermal/intel/int340x_thermal/processor_thermal_rfim.c @@ -172,6 +172,7 @@ static const struct attribute_group fivr RFIM_SHOW(rfi_restriction_run_busy, 1) RFIM_SHOW(rfi_restriction_err_code, 1) RFIM_SHOW(rfi_restriction_data_rate, 1) +RFIM_SHOW(rfi_restriction_data_rate_base, 1) RFIM_SHOW(ddr_data_rate_point_0, 1) RFIM_SHOW(ddr_data_rate_point_1, 1) RFIM_SHOW(ddr_data_rate_point_2, 1) @@ -181,11 +182,13 @@ RFIM_SHOW(rfi_disable, 1) RFIM_STORE(rfi_restriction_run_busy, 1) RFIM_STORE(rfi_restriction_err_code, 1) RFIM_STORE(rfi_restriction_data_rate, 1) +RFIM_STORE(rfi_restriction_data_rate_base, 1) RFIM_STORE(rfi_disable, 1)
static DEVICE_ATTR_RW(rfi_restriction_run_busy); static DEVICE_ATTR_RW(rfi_restriction_err_code); static DEVICE_ATTR_RW(rfi_restriction_data_rate); +static DEVICE_ATTR_RW(rfi_restriction_data_rate_base); static DEVICE_ATTR_RO(ddr_data_rate_point_0); static DEVICE_ATTR_RO(ddr_data_rate_point_1); static DEVICE_ATTR_RO(ddr_data_rate_point_2); @@ -248,6 +251,7 @@ static struct attribute *dvfs_attrs[] = &dev_attr_rfi_restriction_run_busy.attr, &dev_attr_rfi_restriction_err_code.attr, &dev_attr_rfi_restriction_data_rate.attr, + &dev_attr_rfi_restriction_data_rate_base.attr, &dev_attr_ddr_data_rate_point_0.attr, &dev_attr_ddr_data_rate_point_1.attr, &dev_attr_ddr_data_rate_point_2.attr,
From: Ben Dooks ben-linux@fluff.org
commit b9b916aee6715cd7f3318af6dc360c4729417b94 upstream.
If the get_user(x, ptr) has x as a pointer, then the setting of (x) = 0 is going to produce the following sparse warning, so fix this by forcing the type of 'x' when access_ok() fails.
fs/aio.c:2073:21: warning: Using plain integer as NULL pointer
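A stripped-down illustration of the macro problem, with a hypothetical demo_get_user() rather than the real uaccess header (the kernel version also carries a __force annotation for sparse, dropped here):

  #include <errno.h>
  #include <stdio.h>

  /* Before: sparse warns "Using plain integer as NULL pointer" when x is a
   * pointer, because the error path does (x) = 0.
   *
   *      #define demo_get_user(x) ((x) = 0, -EFAULT)
   *
   * After: force the 0 to x's own type first, so a pointer gets NULL and an
   * integer gets 0, with no warning and no behaviour change. */
  #define demo_get_user(x) ((x) = (__typeof__(x))0, -EFAULT)

  int main(void)
  {
          void *p = (void *)&p;           /* pointer-typed destination */
          long v = 42;                    /* integer-typed destination */

          int err1 = demo_get_user(p);    /* p becomes NULL */
          int err2 = demo_get_user(v);    /* v becomes 0 */

          printf("%d %d %p %ld\n", err1, err2, p, v);
          return (err1 == -EFAULT && err2 == -EFAULT) ? 0 : 1;
  }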
Signed-off-by: Ben Dooks ben-linux@fluff.org Reviewed-by: Palmer Dabbelt palmer@rivosinc.com Link: https://lore.kernel.org/r/20221229170545.718264-1-ben-linux@fluff.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/arch/riscv/include/asm/uaccess.h +++ b/arch/riscv/include/asm/uaccess.h @@ -216,7 +216,7 @@ do { \ might_fault(); \ access_ok(__p, sizeof(*__p)) ? \ __get_user((x), __p) : \ - ((x) = 0, -EFAULT); \ + ((x) = (__force __typeof__(x))0, -EFAULT); \ })
#define __put_user_asm(insn, x, ptr, err) \
From: Björn Töpel bjorn@rivosinc.com
commit b2d473a6019ef9a54b0156ecdb2e0398c9fa6a24 upstream.
In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add are encoded the following way (each instruction is 16b):
---+-+-----------+-----------+--
100 0 rs1[4:0]!=0 00000       10 : c.jr
100 1 rs1[4:0]!=0 00000       10 : c.jalr
100 0 rd[4:0]!=0  rs2[4:0]!=0 10 : c.mv
100 1 rd[4:0]!=0  rs2[4:0]!=0 10 : c.add
The following logic is used to decode c.jr and c.jalr:
insn & 0xf007 == 0x8002 => instruction is a c.jr
insn & 0xf007 == 0x9002 => instruction is a c.jalr
When 0xf007 is used to mask the instruction, c.mv can be incorrectly decoded as c.jr, and c.add as c.jalr.
Correct the decoding by changing the mask from 0xf007 to 0xf07f.
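A concrete pair of encodings shows why the wider mask matters (the specific instructions are chosen here for illustration and are not taken from the patch): 0x8082 is c.jr ra, while 0x85aa is c.mv a1,a0. Under the old mask both match the c.jr pattern; the new mask additionally requires the rs2 field (bits 6:2) to be zero:

  #include <stdint.h>
  #include <stdio.h>

  static int is_c_jr(uint16_t insn, uint16_t mask)
  {
          return (insn & mask) == 0x8002;
  }

  int main(void)
  {
          const uint16_t c_jr_ra    = 0x8082;  /* c.jr ra   (rs2 field = 0)  */
          const uint16_t c_mv_a1_a0 = 0x85aa;  /* c.mv a1,a0 (rs2 field = 10) */

          /* Old mask 0xf007 ignores bits 6:2, so c.mv is misdecoded as c.jr. */
          printf("mask 0xf007: c.jr=%d c.mv=%d\n",
                 is_c_jr(c_jr_ra, 0xf007), is_c_jr(c_mv_a1_a0, 0xf007));

          /* New mask 0xf07f also checks that rs2 (bits 6:2) is zero. */
          printf("mask 0xf07f: c.jr=%d c.mv=%d\n",
                 is_c_jr(c_jr_ra, 0xf07f), is_c_jr(c_mv_a1_a0, 0xf07f));
          return 0;
  }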
Fixes: c22b0bcb1dd0 ("riscv: Add kprobes supported") Signed-off-by: Björn Töpel bjorn@rivosinc.com Reviewed-by: Conor Dooley conor.dooley@microchip.com Reviewed-by: Guo Ren guoren@kernel.org Link: https://lore.kernel.org/r/20230102160748.1307289-1-bjorn@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Palmer Dabbelt palmer@rivosinc.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- arch/riscv/kernel/probes/simulate-insn.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/riscv/kernel/probes/simulate-insn.h +++ b/arch/riscv/kernel/probes/simulate-insn.h @@ -31,9 +31,9 @@ __RISCV_INSN_FUNCS(fence, 0x7f, 0x0f); } while (0)
__RISCV_INSN_FUNCS(c_j, 0xe003, 0xa001); -__RISCV_INSN_FUNCS(c_jr, 0xf007, 0x8002); +__RISCV_INSN_FUNCS(c_jr, 0xf07f, 0x8002); __RISCV_INSN_FUNCS(c_jal, 0xe003, 0x2001); -__RISCV_INSN_FUNCS(c_jalr, 0xf007, 0x9002); +__RISCV_INSN_FUNCS(c_jalr, 0xf07f, 0x9002); __RISCV_INSN_FUNCS(c_beqz, 0xe003, 0xc001); __RISCV_INSN_FUNCS(c_bnez, 0xe003, 0xe001); __RISCV_INSN_FUNCS(c_ebreak, 0xffff, 0x9002);
From: Zhenyu Wang zhenyuw@linux.intel.com
commit c4b850d1f448a901fbf4f7f36dec38c84009b489 upstream.
When the gvt debugfs is destroyed, we need a sanity check on whether the drm minor's debugfs root is still available. Otherwise, in cases like device removal through unbinding, the drm minor's debugfs directory has already been removed, and intel_gvt_debugfs_clean() would act upon a dangling pointer, as in the oops below.
i915 0000:00:02.0: Direct firmware load for i915/gvt/vid_0x8086_did_0x1926_rid_0x0a.golden_hw_state failed with error -2 i915 0000:00:02.0: MDEV: Registered Console: switching to colour dummy device 80x25 i915 0000:00:02.0: MDEV: Unregistering BUG: kernel NULL pointer dereference, address: 00000000000000a0 PGD 0 P4D 0 Oops: 0002 [#1] PREEMPT SMP PTI CPU: 2 PID: 2486 Comm: gfx-unbind.sh Tainted: G I 6.1.0-rc8+ #15 Hardware name: Dell Inc. XPS 13 9350/0JXC1H, BIOS 1.13.0 02/10/2020 RIP: 0010:down_write+0x1f/0x90 Code: 1d ff ff 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 53 48 89 fb e8 62 c0 ff ff bf 01 00 00 00 e8 28 5e 31 ff 31 c0 ba 01 00 00 00 <f0> 48 0f b1 13 75 33 65 48 8b 04 25 c0 bd 01 00 48 89 43 08 bf 01 RSP: 0018:ffff9eb3036ffcc8 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 00000000000000a0 RCX: ffffff8100000000 RDX: 0000000000000001 RSI: 0000000000000064 RDI: ffffffffa48787a8 RBP: ffff9eb3036ffd30 R08: ffffeb1fc45a0608 R09: ffffeb1fc45a05c0 R10: 0000000000000002 R11: 0000000000000000 R12: 0000000000000000 R13: ffff91acc33fa328 R14: ffff91acc033f080 R15: ffff91acced533e0 FS: 00007f6947bba740(0000) GS:ffff91ae36d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00000000000000a0 CR3: 00000001133a2002 CR4: 00000000003706e0 Call Trace: <TASK> simple_recursive_removal+0x9f/0x2a0 ? start_creating.part.0+0x120/0x120 ? _raw_spin_lock+0x13/0x40 debugfs_remove+0x40/0x60 intel_gvt_debugfs_clean+0x15/0x30 [kvmgt] intel_gvt_clean_device+0x49/0xe0 [kvmgt] intel_gvt_driver_remove+0x2f/0xb0 i915_driver_remove+0xa4/0xf0 i915_pci_remove+0x1a/0x30 pci_device_remove+0x33/0xa0 device_release_driver_internal+0x1b2/0x230 unbind_store+0xe0/0x110 kernfs_fop_write_iter+0x11b/0x1f0 vfs_write+0x203/0x3d0 ksys_write+0x63/0xe0 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7f6947cb5190 Code: 40 00 48 8b 15 71 9c 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d 51 24 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89 RSP: 002b:00007ffcbac45a28 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f6947cb5190 RDX: 000000000000000d RSI: 0000555e35c866a0 RDI: 0000000000000001 RBP: 0000555e35c866a0 R08: 0000000000000002 R09: 0000555e358cb97c R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001 R13: 000000000000000d R14: 0000000000000000 R15: 0000555e358cb8e0 </TASK> Modules linked in: kvmgt CR2: 00000000000000a0 ---[ end trace 0000000000000000 ]---
Cc: Wang, Zhi zhi.a.wang@intel.com Cc: He, Yu yu.he@intel.com Cc: stable@vger.kernel.org Reviewed-by: Zhi Wang zhi.a.wang@intel.com Fixes: bc7b0be316ae ("drm/i915/gvt: Add basic debugfs infrastructure") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20221219140357.769557-1-zhenyuw... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gvt/debugfs.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/gvt/debugfs.c +++ b/drivers/gpu/drm/i915/gvt/debugfs.c @@ -199,6 +199,10 @@ void intel_gvt_debugfs_init(struct intel */ void intel_gvt_debugfs_clean(struct intel_gvt *gvt) { - debugfs_remove_recursive(gvt->debugfs_root); - gvt->debugfs_root = NULL; + struct drm_minor *minor = gvt->gt->i915->drm.primary; + + if (minor->debugfs_root) { + debugfs_remove_recursive(gvt->debugfs_root); + gvt->debugfs_root = NULL; + } }
From: Zhenyu Wang zhenyuw@linux.intel.com
commit 704f3384f322b40ba24d958473edfb1c9750c8fd upstream.
Carefully check whether the root debugfs is still available when destroying a vgpu; e.g. in the remove case the drm minor's debugfs root might already be destroyed, which led to a kernel oops like the one below.
Console: switching to colour dummy device 80x25 i915 0000:00:02.0: MDEV: Unregistering intel_vgpu_mdev b1338b2d-a709-4c23-b766-cc436c36cdf0: Removing from iommu group 14 BUG: kernel NULL pointer dereference, address: 0000000000000150 PGD 0 P4D 0 Oops: 0000 [#1] PREEMPT SMP CPU: 3 PID: 1046 Comm: driverctl Not tainted 6.1.0-rc2+ #6 Hardware name: HP HP ProDesk 600 G3 MT/829D, BIOS P02 Ver. 02.44 09/13/2022 RIP: 0010:__lock_acquire+0x5e2/0x1f90 Code: 87 ad 09 00 00 39 05 e1 1e cc 02 0f 82 f1 09 00 00 ba 01 00 00 00 48 83 c4 48 89 d0 5b 5d 41 5c 41 5d 41 5e 41 5f c3 45 31 ff <48> 81 3f 60 9e c2 b6 45 0f 45 f8 83 fe 01 0f 87 55 fa ff ff 89 f0 RSP: 0018:ffff9f770274f948 EFLAGS: 00010046 RAX: 0000000000000003 RBX: 0000000000000000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000150 RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000 R10: ffff8895d1173300 R11: 0000000000000001 R12: 0000000000000000 R13: 0000000000000150 R14: 0000000000000000 R15: 0000000000000000 FS: 00007fc9b2ba0740(0000) GS:ffff889cdfcc0000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000150 CR3: 000000010fd93005 CR4: 00000000003706e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> lock_acquire+0xbf/0x2b0 ? simple_recursive_removal+0xa5/0x2b0 ? lock_release+0x13d/0x2d0 down_write+0x2a/0xd0 ? simple_recursive_removal+0xa5/0x2b0 simple_recursive_removal+0xa5/0x2b0 ? start_creating.part.0+0x110/0x110 ? _raw_spin_unlock+0x29/0x40 debugfs_remove+0x40/0x60 intel_gvt_debugfs_remove_vgpu+0x15/0x30 [kvmgt] intel_gvt_destroy_vgpu+0x60/0x100 [kvmgt] intel_vgpu_release_dev+0xe/0x20 [kvmgt] device_release+0x30/0x80 kobject_put+0x79/0x1b0 device_release_driver_internal+0x1b8/0x230 bus_remove_device+0xec/0x160 device_del+0x189/0x400 ? up_write+0x9c/0x1b0 ? mdev_device_remove_common+0x60/0x60 [mdev] mdev_device_remove_common+0x22/0x60 [mdev] mdev_device_remove_cb+0x17/0x20 [mdev] device_for_each_child+0x56/0x80 mdev_unregister_parent+0x5a/0x81 [mdev] intel_gvt_clean_device+0x2d/0xe0 [kvmgt] intel_gvt_driver_remove+0x2e/0xb0 [i915] i915_driver_remove+0xac/0x100 [i915] i915_pci_remove+0x1a/0x30 [i915] pci_device_remove+0x31/0xa0 device_release_driver_internal+0x1b8/0x230 unbind_store+0xd8/0x100 kernfs_fop_write_iter+0x156/0x210 vfs_write+0x236/0x4a0 ksys_write+0x61/0xd0 do_syscall_64+0x55/0x80 ? find_held_lock+0x2b/0x80 ? lock_release+0x13d/0x2d0 ? up_read+0x17/0x20 ? lock_is_held_type+0xe3/0x140 ? asm_exc_page_fault+0x22/0x30 ? 
lockdep_hardirqs_on+0x7d/0x100 entry_SYSCALL_64_after_hwframe+0x46/0xb0 RIP: 0033:0x7fc9b2c9e0c4 Code: 15 71 7d 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 80 3d 3d 05 0e 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 48 89 54 24 18 48 RSP: 002b:00007ffec29c81c8 EFLAGS: 00000202 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007fc9b2c9e0c4 RDX: 000000000000000d RSI: 0000559f8b5f48a0 RDI: 0000000000000001 RBP: 0000559f8b5f48a0 R08: 0000559f8b5f3540 R09: 00007fc9b2d76d30 R10: 0000000000000000 R11: 0000000000000202 R12: 000000000000000d R13: 00007fc9b2d77780 R14: 000000000000000d R15: 00007fc9b2d72a00 </TASK> Modules linked in: sunrpc intel_rapl_msr intel_rapl_common intel_pmc_core_pltdrv intel_pmc_core intel_tcc_cooling x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ee1004 igbvf rapl vfat fat intel_cstate intel_uncore pktcdvd i2c_i801 pcspkr wmi_bmof i2c_smbus acpi_pad vfio_pci vfio_pci_core vfio_virqfd zram fuse dm_multipath kvmgt mdev vfio_iommu_type1 vfio kvm irqbypass i915 nvme e1000e igb nvme_core crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic serio_raw ghash_clmulni_intel sha512_ssse3 dca drm_buddy intel_gtt video wmi drm_display_helper ttm CR2: 0000000000000150 ---[ end trace 0000000000000000 ]---
Cc: Wang Zhi zhi.a.wang@intel.com Cc: He Yu yu.he@intel.com Cc: Alex Williamson alex.williamson@redhat.com Cc: stable@vger.kernel.org Reviewed-by: Zhi Wang zhi.a.wang@intel.com Tested-by: Yu He yu.he@intel.com Fixes: bc7b0be316ae ("drm/i915/gvt: Add basic debugfs infrastructure") Signed-off-by: Zhenyu Wang zhenyuw@linux.intel.com Link: http://patchwork.freedesktop.org/patch/msgid/20221219140357.769557-2-zhenyuw... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/i915/gvt/debugfs.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/drivers/gpu/drm/i915/gvt/debugfs.c +++ b/drivers/gpu/drm/i915/gvt/debugfs.c @@ -175,8 +175,13 @@ void intel_gvt_debugfs_add_vgpu(struct i */ void intel_gvt_debugfs_remove_vgpu(struct intel_vgpu *vgpu) { - debugfs_remove_recursive(vgpu->debugfs); - vgpu->debugfs = NULL; + struct intel_gvt *gvt = vgpu->gvt; + struct drm_minor *minor = gvt->gt->i915->drm.primary; + + if (minor->debugfs_root && gvt->debugfs_root) { + debugfs_remove_recursive(vgpu->debugfs); + vgpu->debugfs = NULL; + } }
/**
From: Arnd Bergmann arnd@arndb.de
commit 55d1cbbbb29e6656c662ee8f73ba1fc4777532eb upstream.
gcc warns about a couple of instances in which a sanity check exists but the author wasn't sure how to react to it failing, which makes it look like a possible bug:
fs/hfsplus/inode.c: In function 'hfsplus_cat_read_inode':
fs/hfsplus/inode.c:503:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  503 |                 /* panic? */;
      |                             ^
fs/hfsplus/inode.c:524:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  524 |                 /* panic? */;
      |                             ^
fs/hfsplus/inode.c: In function 'hfsplus_cat_write_inode':
fs/hfsplus/inode.c:582:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  582 |                 /* panic? */;
      |                             ^
fs/hfsplus/inode.c:608:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  608 |                 /* panic? */;
      |                             ^
fs/hfs/inode.c: In function 'hfs_write_inode':
fs/hfs/inode.c:464:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  464 |                 /* panic? */;
      |                             ^
fs/hfs/inode.c:485:37: error: suggest braces around empty body in an 'if' statement [-Werror=empty-body]
  485 |                 /* panic? */;
      |                             ^
panic() is probably not the correct choice here, but a WARN_ON seems appropriate and avoids the compile-time warning.
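The warning is easy to reproduce outside the filesystem code. A minimal standalone example, with warn_on() as a userspace stand-in for the kernel's WARN_ON() (the real macro also returns the condition's value):

  #include <stdio.h>

  #define warn_on(cond) do { \
                  if (cond) \
                          fprintf(stderr, "warning: %s\n", #cond); \
          } while (0)

  int main(void)
  {
          int entrylength = 4, needed = 16;

          /* Old style, what gcc's -Wempty-body complains about:
           *
           *      if (entrylength < needed)
           *              ;
           */

          /* New style: the check still happens, but a failure is reported
           * instead of being silently ignored. */
          warn_on(entrylength < needed);
          return 0;
  }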
Link: https://lkml.kernel.org/r/20210927102149.1809384-1-arnd@kernel.org Link: https://lore.kernel.org/all/20210322223249.2632268-1-arnd@kernel.org/ Signed-off-by: Arnd Bergmann arnd@arndb.de Reviewed-by: Christian Brauner christian.brauner@ubuntu.com Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Christian Brauner christian.brauner@ubuntu.com Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jan Kara jack@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hfs/inode.c | 6 ++---- fs/hfsplus/inode.c | 12 ++++-------- 2 files changed, 6 insertions(+), 12 deletions(-)
--- a/fs/hfs/inode.c +++ b/fs/hfs/inode.c @@ -464,8 +464,7 @@ int hfs_write_inode(struct inode *inode, goto out;
if (S_ISDIR(main_inode->i_mode)) { - if (fd.entrylength < sizeof(struct hfs_cat_dir)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfs_cat_dir)); hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_dir)); if (rec.type != HFS_CDR_DIR || @@ -485,8 +484,7 @@ int hfs_write_inode(struct inode *inode, hfs_bnode_write(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); } else { - if (fd.entrylength < sizeof(struct hfs_cat_file)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfs_cat_file)); hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); if (rec.type != HFS_CDR_FIL || --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -509,8 +509,7 @@ int hfsplus_cat_read_inode(struct inode if (type == HFSPLUS_FOLDER) { struct hfsplus_cat_folder *folder = &entry.folder;
- if (fd->entrylength < sizeof(struct hfsplus_cat_folder)) - /* panic? */; + WARN_ON(fd->entrylength < sizeof(struct hfsplus_cat_folder)); hfs_bnode_read(fd->bnode, &entry, fd->entryoffset, sizeof(struct hfsplus_cat_folder)); hfsplus_get_perms(inode, &folder->permissions, 1); @@ -530,8 +529,7 @@ int hfsplus_cat_read_inode(struct inode } else if (type == HFSPLUS_FILE) { struct hfsplus_cat_file *file = &entry.file;
- if (fd->entrylength < sizeof(struct hfsplus_cat_file)) - /* panic? */; + WARN_ON(fd->entrylength < sizeof(struct hfsplus_cat_file)); hfs_bnode_read(fd->bnode, &entry, fd->entryoffset, sizeof(struct hfsplus_cat_file));
@@ -588,8 +586,7 @@ int hfsplus_cat_write_inode(struct inode if (S_ISDIR(main_inode->i_mode)) { struct hfsplus_cat_folder *folder = &entry.folder;
- if (fd.entrylength < sizeof(struct hfsplus_cat_folder)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfsplus_cat_folder)); hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, sizeof(struct hfsplus_cat_folder)); /* simple node checks? */ @@ -614,8 +611,7 @@ int hfsplus_cat_write_inode(struct inode } else { struct hfsplus_cat_file *file = &entry.file;
- if (fd.entrylength < sizeof(struct hfsplus_cat_file)) - /* panic? */; + WARN_ON(fd.entrylength < sizeof(struct hfsplus_cat_file)); hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, sizeof(struct hfsplus_cat_file)); hfsplus_inode_write_fork(inode, &file->data_fork);
From: Linus Torvalds torvalds@linux-foundation.org
commit cb7a95af78d29442b8294683eca4897544b8ef46 upstream.
Commit 55d1cbbbb29e ("hfs/hfsplus: use WARN_ON for sanity check") fixed a build warning by turning a comment into a WARN_ON(), but it turns out that syzbot then complains because it can trigger said warning with a corrupted hfs image.
The warning actually does warn about a bad situation, but we are much better off just handling it as the error it is. So rather than warn about us doing bad things, stop doing the bad things and return -EIO.
While at it, also fix a memory leak that was introduced by an earlier fix for a similar syzbot warning situation, and add a check for one case that historically wasn't handled at all (ie neither comment nor subsequent WARN_ON).
Reported-by: syzbot+7bb7cd3595533513a9e7@syzkaller.appspotmail.com Fixes: 55d1cbbbb29e ("hfs/hfsplus: use WARN_ON for sanity check") Fixes: 8d824e69d9f3 ("hfs: fix OOB Read in __hfs_brec_find") Link: https://lore.kernel.org/lkml/000000000000dbce4e05f170f289@google.com/ Tested-by: Michael Schmitz schmitzmic@gmail.com Cc: Arnd Bergmann arnd@arndb.de Cc: Matthew Wilcox willy@infradead.org Cc: Viacheslav Dubeyko slava@dubeyko.com Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hfs/inode.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)
--- a/fs/hfs/inode.c +++ b/fs/hfs/inode.c @@ -456,15 +456,16 @@ int hfs_write_inode(struct inode *inode, /* panic? */ return -EIO;
+ res = -EIO; if (HFS_I(main_inode)->cat_key.CName.len > HFS_NAMELEN) - return -EIO; + goto out; fd.search_key->cat = HFS_I(main_inode)->cat_key; if (hfs_brec_find(&fd)) - /* panic? */ goto out;
if (S_ISDIR(main_inode->i_mode)) { - WARN_ON(fd.entrylength < sizeof(struct hfs_cat_dir)); + if (fd.entrylength < sizeof(struct hfs_cat_dir)) + goto out; hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_dir)); if (rec.type != HFS_CDR_DIR || @@ -477,6 +478,8 @@ int hfs_write_inode(struct inode *inode, hfs_bnode_write(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_dir)); } else if (HFS_IS_RSRC(inode)) { + if (fd.entrylength < sizeof(struct hfs_cat_file)) + goto out; hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); hfs_inode_write_fork(inode, rec.file.RExtRec, @@ -484,7 +487,8 @@ int hfs_write_inode(struct inode *inode, hfs_bnode_write(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); } else { - WARN_ON(fd.entrylength < sizeof(struct hfs_cat_file)); + if (fd.entrylength < sizeof(struct hfs_cat_file)) + goto out; hfs_bnode_read(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); if (rec.type != HFS_CDR_FIL || @@ -501,9 +505,10 @@ int hfs_write_inode(struct inode *inode, hfs_bnode_write(fd.bnode, &rec, fd.entryoffset, sizeof(struct hfs_cat_file)); } + res = 0; out: hfs_find_exit(&fd); - return 0; + return res; }
static struct dentry *hfs_file_lookup(struct inode *dir, struct dentry *dentry,
From: Namjae Jeon linkinjeon@kernel.org
commit 83dcedd5540d4ac61376ddff5362f7d9f866a6ec upstream.
If kernel_recvmsg() returns -EAGAIN in ksmbd_tcp_readv() and we go round again, it will cause an infinite loop, and all threads from subsequent connections would be doing the same. This patch adds a max retry count (2) to avoid it. kernel_recvmsg() will wait for the 7 second timeout and retry up to two times if -EAGAIN is returned. Also add the __GFP_NOWARN and __GFP_NORETRY flags to kvmalloc() so that we disconnect immediately, without retrying, on memory allocation failure.
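The shape of the fix on the receive side is a bounded retry loop; a generic sketch of that pattern (recv_once() is a hypothetical stand-in, not the ksmbd transport code):

  #include <errno.h>
  #include <stdio.h>

  /* Hypothetical receive step that keeps reporting -EAGAIN. */
  static int recv_once(void)
  {
          return -EAGAIN;
  }

  int main(void)
  {
          int max_retry = 2;      /* same bound the patch introduces */
          int ret;

          for (;;) {
                  ret = recv_once();
                  if (ret == -EAGAIN && max_retry) {
                          max_retry--;    /* retry a limited number of times */
                          continue;
                  }
                  break;                  /* success, hard error, or retries spent */
          }
          printf("final result: %d\n", ret);
          return 0;
  }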
Fixes: 0626e6641f6b ("cifsd: add server handler for central processing and tranport layers") Cc: stable@vger.kernel.org Reported-by: zdi-disclosures@trendmicro.com # ZDI-CAN-18259 Reviewed-by: Sergey Senozhatsky senozhatsky@chromium.org Signed-off-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/connection.c | 7 +++++-- fs/ksmbd/transport_tcp.c | 5 ++++- 2 files changed, 9 insertions(+), 3 deletions(-)
--- a/fs/ksmbd/connection.c +++ b/fs/ksmbd/connection.c @@ -310,9 +310,12 @@ int ksmbd_conn_handler_loop(void *p)
/* 4 for rfc1002 length field */ size = pdu_size + 4; - conn->request_buf = kvmalloc(size, GFP_KERNEL); + conn->request_buf = kvmalloc(size, + GFP_KERNEL | + __GFP_NOWARN | + __GFP_NORETRY); if (!conn->request_buf) - continue; + break;
memcpy(conn->request_buf, hdr_buf, sizeof(hdr_buf)); if (!ksmbd_smb_request(conn)) --- a/fs/ksmbd/transport_tcp.c +++ b/fs/ksmbd/transport_tcp.c @@ -295,6 +295,7 @@ static int ksmbd_tcp_readv(struct tcp_tr struct msghdr ksmbd_msg; struct kvec *iov; struct ksmbd_conn *conn = KSMBD_TRANS(t)->conn; + int max_retry = 2;
iov = get_conn_iovec(t, nr_segs); if (!iov) @@ -321,9 +322,11 @@ static int ksmbd_tcp_readv(struct tcp_tr } else if (conn->status == KSMBD_SESS_NEED_RECONNECT) { total_read = -EAGAIN; break; - } else if (length == -ERESTARTSYS || length == -EAGAIN) { + } else if ((length == -ERESTARTSYS || length == -EAGAIN) && + max_retry) { usleep_range(1000, 2000); length = 0; + max_retry--; continue; } else if (length <= 0) { total_read = -EAGAIN;
From: William Liu will@willsroot.io
commit 797805d81baa814f76cf7bdab35f86408a79d707 upstream.
"nt_len - CIFS_ENCPWD_SIZE" is passed directly from ksmbd_decode_ntlmssp_auth_blob to ksmbd_auth_ntlmv2. Malicious requests can set nt_len to less than CIFS_ENCPWD_SIZE, which results in a negative number (or large unsigned value) used for a subsequent memcpy in ksmbd_auth_ntlvm2 and can cause a panic.
Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: William Liu will@willsroot.io Signed-off-by: Hrvoje Mišetić misetichrvoje@gmail.com Acked-by: Namjae Jeon linkinjeon@kernel.org Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ksmbd/auth.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/fs/ksmbd/auth.c +++ b/fs/ksmbd/auth.c @@ -319,7 +319,8 @@ int ksmbd_decode_ntlmssp_auth_blob(struc dn_off = le32_to_cpu(authblob->DomainName.BufferOffset); dn_len = le16_to_cpu(authblob->DomainName.Length);
- if (blob_len < (u64)dn_off + dn_len || blob_len < (u64)nt_off + nt_len) + if (blob_len < (u64)dn_off + dn_len || blob_len < (u64)nt_off + nt_len || + nt_len < CIFS_ENCPWD_SIZE) return -EINVAL;
/* TODO : use domain name that imported from configuration file */
From: Mario Limonciello mario.limonciello@amd.com
A number of AMD-based Rembrandt laptops are not working properly in suspend/resume. This has been root-caused to the BIOS implementation not populating code for the AMD GUID in uPEP, but only for the Microsoft one.
In later kernels this has been fixed by using the Microsoft GUID instead.
The following series of patches has fixed it in newer kernels:
commit ed470febf837 ("ACPI: PM: s2idle: Add support for upcoming AMD uPEP HID AMDI008") commit 1a2dcab517cb ("ACPI: PM: s2idle: Use LPS0 idle if ACPI_FADT_LOW_POWER_S0 is unset") commit 100a57379380 ("ACPI: x86: s2idle: Move _HID handling for AMD systems into structures") commit fd894f05cf30 ("ACPI: x86: s2idle: If a new AMD _HID is missing assume Rembrandt") commit a0bc002393d4 ("ACPI: x86: s2idle: Add module parameter to prefer Microsoft GUID") commit d0f61e89f08d ("ACPI: x86: s2idle: Add a quirk for ASUS TUF Gaming A17 FA707RE") commit ddeea2c3cb88 ("ACPI: x86: s2idle: Add a quirk for ASUS ROG Zephyrus G14") commit 888ca9c7955e ("ACPI: x86: s2idle: Add a quirk for Lenovo Slim 7 Pro 14ARH7") commit 631b54519e8e ("ACPI: x86: s2idle: Add a quirk for ASUSTeK COMPUTER INC. ROG Flow X13") commit 39f81776c680 ("ACPI: x86: s2idle: Fix a NULL pointer dereference") commit 54bd1e548701 ("ACPI: x86: s2idle: Add another ID to s2idle_dmi_table") commit 577821f756cf ("ACPI: x86: s2idle: Force AMD GUID/_REV 2 on HP Elitebook 865") commit e6d180a35bc0 ("ACPI: x86: s2idle: Stop using AMD specific codepath for Rembrandt+")
This is needlessly complex for 5.15.y though. To accomplish the same effective result, revert commit f0c6225531e4 ("ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007") instead.
Link: https://lore.kernel.org/stable/MN0PR12MB61015DB3D6EDBFD841157918E2F59@MN0PR1... Signed-off-by: Mario Limonciello mario.limonciello@amd.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/acpi/x86/s2idle.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-)
--- a/drivers/acpi/x86/s2idle.c +++ b/drivers/acpi/x86/s2idle.c @@ -378,16 +378,13 @@ static int lps0_device_attach(struct acp * AMDI0006: * - should use rev_id 0x0 * - function mask = 0x3: Should use Microsoft method - * AMDI0007: - * - Should use rev_id 0x2 - * - Should only use AMD method */ const char *hid = acpi_device_hid(adev); - rev_id = strcmp(hid, "AMDI0007") ? 0 : 2; + rev_id = 0; lps0_dsm_func_mask = validate_dsm(adev->handle, ACPI_LPS0_DSM_UUID_AMD, rev_id, &lps0_dsm_guid); lps0_dsm_func_mask_microsoft = validate_dsm(adev->handle, - ACPI_LPS0_DSM_UUID_MICROSOFT, 0, + ACPI_LPS0_DSM_UUID_MICROSOFT, rev_id, &lps0_dsm_guid_microsoft); if (lps0_dsm_func_mask > 0x3 && (!strcmp(hid, "AMD0004") || !strcmp(hid, "AMD0005") || @@ -395,9 +392,6 @@ static int lps0_device_attach(struct acp lps0_dsm_func_mask = (lps0_dsm_func_mask << 1) | 0x1; acpi_handle_debug(adev->handle, "_DSM UUID %s: Adjusted function mask: 0x%x\n", ACPI_LPS0_DSM_UUID_AMD, lps0_dsm_func_mask); - } else if (lps0_dsm_func_mask_microsoft > 0 && !strcmp(hid, "AMDI0007")) { - lps0_dsm_func_mask_microsoft = -EINVAL; - acpi_handle_debug(adev->handle, "_DSM Using AMD method\n"); } } else { rev_id = 1;
From: Matthieu Baerts matthieu.baerts@tessares.net
commit 34b21d1ddc8ace77a8fa35c1b1e06377209e0dae upstream.
tcp_request_sock_ops structure is specific to IPv4. It should then not be used with MPTCP subflows on top of IPv6.
For example, it contains the 'family' field, initialised to AF_INET. This 'family' field is used by TCP FastOpen code to generate the cookie but also by TCP Metrics, SELinux and SYN Cookies. Using the wrong family will not lead to crashes but displaying/using/checking wrong things.
Note that 'send_reset' callback from request_sock_ops structure is used in some error paths. It is then also important to use the correct one for IPv4 or IPv6.
The slab name can also be different in IPv4 and IPv6, it will be used when printing some log messages. The slab pointer will anyway be the same because the object size is the same for both v4 and v6. A BUILD_BUG_ON() has also been added to make sure this size is the same.
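BUILD_BUG_ON() is the kernel's compile-time assertion; standard C11 offers the same guard via _Static_assert. A minimal sketch of the mechanism with hypothetical structure names (demo_req_v4/demo_req_v6 stand in for the real v4/v6 request socks):

  /* Two hypothetical request-sock layouts standing in for the v4/v6 pair. */
  struct demo_req_v4 { long a; int b; };
  struct demo_req_v6 { long a; int b; };

  /* Fails the build, not the runtime, if the sizes ever diverge --
   * the same idea as the BUILD_BUG_ON() added by the patch. */
  _Static_assert(sizeof(struct demo_req_v4) == sizeof(struct demo_req_v6),
                 "v4 and v6 request socks must have the same size");

  int main(void) { return 0; }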
Fixes: cec37a6e41aa ("mptcp: Handle MP_CAPABLE options for outgoing connections") Reviewed-by: Mat Martineau mathew.j.martineau@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/subflow.c | 34 ++++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-)
--- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -483,7 +483,7 @@ do_reset: mptcp_subflow_reset(sk); }
-static struct request_sock_ops mptcp_subflow_request_sock_ops __ro_after_init; +static struct request_sock_ops mptcp_subflow_v4_request_sock_ops __ro_after_init; static struct tcp_request_sock_ops subflow_request_sock_ipv4_ops __ro_after_init;
static int subflow_v4_conn_request(struct sock *sk, struct sk_buff *skb) @@ -496,7 +496,7 @@ static int subflow_v4_conn_request(struc if (skb_rtable(skb)->rt_flags & (RTCF_BROADCAST | RTCF_MULTICAST)) goto drop;
- return tcp_conn_request(&mptcp_subflow_request_sock_ops, + return tcp_conn_request(&mptcp_subflow_v4_request_sock_ops, &subflow_request_sock_ipv4_ops, sk, skb); drop: @@ -505,6 +505,7 @@ drop: }
#if IS_ENABLED(CONFIG_MPTCP_IPV6) +static struct request_sock_ops mptcp_subflow_v6_request_sock_ops __ro_after_init; static struct tcp_request_sock_ops subflow_request_sock_ipv6_ops __ro_after_init; static struct inet_connection_sock_af_ops subflow_v6_specific __ro_after_init; static struct inet_connection_sock_af_ops subflow_v6m_specific __ro_after_init; @@ -527,7 +528,7 @@ static int subflow_v6_conn_request(struc return 0; }
- return tcp_conn_request(&mptcp_subflow_request_sock_ops, + return tcp_conn_request(&mptcp_subflow_v6_request_sock_ops, &subflow_request_sock_ipv6_ops, sk, skb);
drop: @@ -540,7 +541,12 @@ struct request_sock *mptcp_subflow_reqsk struct sock *sk_listener, bool attach_listener) { - ops = &mptcp_subflow_request_sock_ops; + if (ops->family == AF_INET) + ops = &mptcp_subflow_v4_request_sock_ops; +#if IS_ENABLED(CONFIG_MPTCP_IPV6) + else if (ops->family == AF_INET6) + ops = &mptcp_subflow_v6_request_sock_ops; +#endif
return inet_reqsk_alloc(ops, sk_listener, attach_listener); } @@ -1791,7 +1797,6 @@ static struct tcp_ulp_ops subflow_ulp_op static int subflow_ops_init(struct request_sock_ops *subflow_ops) { subflow_ops->obj_size = sizeof(struct mptcp_subflow_request_sock); - subflow_ops->slab_name = "request_sock_subflow";
subflow_ops->slab = kmem_cache_create(subflow_ops->slab_name, subflow_ops->obj_size, 0, @@ -1808,9 +1813,10 @@ static int subflow_ops_init(struct reque
void __init mptcp_subflow_init(void) { - mptcp_subflow_request_sock_ops = tcp_request_sock_ops; - if (subflow_ops_init(&mptcp_subflow_request_sock_ops) != 0) - panic("MPTCP: failed to init subflow request sock ops\n"); + mptcp_subflow_v4_request_sock_ops = tcp_request_sock_ops; + mptcp_subflow_v4_request_sock_ops.slab_name = "request_sock_subflow_v4"; + if (subflow_ops_init(&mptcp_subflow_v4_request_sock_ops) != 0) + panic("MPTCP: failed to init subflow v4 request sock ops\n");
subflow_request_sock_ipv4_ops = tcp_request_sock_ipv4_ops; subflow_request_sock_ipv4_ops.route_req = subflow_v4_route_req; @@ -1824,6 +1830,18 @@ void __init mptcp_subflow_init(void) tcp_prot_override.release_cb = tcp_release_cb_override;
#if IS_ENABLED(CONFIG_MPTCP_IPV6) + /* In struct mptcp_subflow_request_sock, we assume the TCP request sock + * structures for v4 and v6 have the same size. It should not changed in + * the future but better to make sure to be warned if it is no longer + * the case. + */ + BUILD_BUG_ON(sizeof(struct tcp_request_sock) != sizeof(struct tcp6_request_sock)); + + mptcp_subflow_v6_request_sock_ops = tcp6_request_sock_ops; + mptcp_subflow_v6_request_sock_ops.slab_name = "request_sock_subflow_v6"; + if (subflow_ops_init(&mptcp_subflow_v6_request_sock_ops) != 0) + panic("MPTCP: failed to init subflow v6 request sock ops\n"); + subflow_request_sock_ipv6_ops = tcp_request_sock_ipv6_ops; subflow_request_sock_ipv6_ops.route_req = subflow_v6_route_req;
From: Matthieu Baerts matthieu.baerts@tessares.net
commit d3295fee3c756ece33ac0d935e172e68c0a4161b upstream.
Before, only the destructor from TCP request sock in IPv4 was called even if the subflow was IPv6.
It is important to use the right destructor to avoid memory leaks with some advanced IPv6 features, e.g. when the request socks contain specific IPv6 options.
Fixes: 79c0949e9a09 ("mptcp: Add key generation and token tree") Reviewed-by: Mat Martineau mathew.j.martineau@linux.intel.com Cc: stable@vger.kernel.org Signed-off-by: Matthieu Baerts matthieu.baerts@tessares.net Signed-off-by: Mat Martineau mathew.j.martineau@linux.intel.com Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/mptcp/subflow.c | 19 ++++++++++++++++--- 1 file changed, 16 insertions(+), 3 deletions(-)
--- a/net/mptcp/subflow.c +++ b/net/mptcp/subflow.c @@ -45,7 +45,6 @@ static void subflow_req_destructor(struc sock_put((struct sock *)subflow_req->msk);
mptcp_token_destroy_request(req); - tcp_request_sock_ops.destructor(req); }
static void subflow_generate_hmac(u64 key1, u64 key2, u32 nonce1, u32 nonce2, @@ -504,6 +503,12 @@ drop: return 0; }
+static void subflow_v4_req_destructor(struct request_sock *req) +{ + subflow_req_destructor(req); + tcp_request_sock_ops.destructor(req); +} + #if IS_ENABLED(CONFIG_MPTCP_IPV6) static struct request_sock_ops mptcp_subflow_v6_request_sock_ops __ro_after_init; static struct tcp_request_sock_ops subflow_request_sock_ipv6_ops __ro_after_init; @@ -535,6 +540,12 @@ drop: tcp_listendrop(sk); return 0; /* don't send reset */ } + +static void subflow_v6_req_destructor(struct request_sock *req) +{ + subflow_req_destructor(req); + tcp6_request_sock_ops.destructor(req); +} #endif
struct request_sock *mptcp_subflow_reqsk_alloc(const struct request_sock_ops *ops, @@ -1806,8 +1817,6 @@ static int subflow_ops_init(struct reque if (!subflow_ops->slab) return -ENOMEM;
- subflow_ops->destructor = subflow_req_destructor; - return 0; }
@@ -1815,6 +1824,8 @@ void __init mptcp_subflow_init(void) { mptcp_subflow_v4_request_sock_ops = tcp_request_sock_ops; mptcp_subflow_v4_request_sock_ops.slab_name = "request_sock_subflow_v4"; + mptcp_subflow_v4_request_sock_ops.destructor = subflow_v4_req_destructor; + if (subflow_ops_init(&mptcp_subflow_v4_request_sock_ops) != 0) panic("MPTCP: failed to init subflow v4 request sock ops\n");
@@ -1839,6 +1850,8 @@ void __init mptcp_subflow_init(void)
mptcp_subflow_v6_request_sock_ops = tcp6_request_sock_ops; mptcp_subflow_v6_request_sock_ops.slab_name = "request_sock_subflow_v6"; + mptcp_subflow_v6_request_sock_ops.destructor = subflow_v6_req_destructor; + if (subflow_ops_init(&mptcp_subflow_v6_request_sock_ops) != 0) panic("MPTCP: failed to init subflow v6 request sock ops\n");
From: Eric Biggers ebiggers@google.com
commit 105c78e12468413e426625831faa7db4284e1fec upstream.
Mounting a filesystem whose journal inode has the encrypt flag causes a NULL dereference in fscrypt_limit_io_blocks() when the 'inlinecrypt' mount option is used.
The problem is that when jbd2_journal_init_inode() calls bmap(), it eventually finds its way into ext4_iomap_begin(), which calls fscrypt_limit_io_blocks(). fscrypt_limit_io_blocks() requires that if the inode is encrypted, then its encryption key must already be set up. That's not the case here, since the journal inode is never "opened" like a normal file would be. Hence the crash.
A reproducer is:
mkfs.ext4 -F /dev/vdb
debugfs -w /dev/vdb -R "set_inode_field <8> flags 0x80808"
mount /dev/vdb /mnt -o inlinecrypt
To fix this, make ext4 consider journal inodes with the encrypt flag to be invalid. (Note, maybe other flags should be rejected on the journal inode too. For now, this is just the minimal fix for the above issue.)
I've marked this as fixing the commit that introduced the call to fscrypt_limit_io_blocks(), since that's what made an actual crash start being possible. But this fix could be applied to any version of ext4 that supports the encrypt feature.
Reported-by: syzbot+ba9dac45bc76c490b7c3@syzkaller.appspotmail.com Fixes: 38ea50daa7a4 ("ext4: support direct I/O with fscrypt using blk-crypto") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers ebiggers@google.com Link: https://lore.kernel.org/r/20221102053312.189962-1-ebiggers@kernel.org Signed-off-by: Theodore Ts'o tytso@mit.edu Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/ext4/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5157,7 +5157,7 @@ static struct inode *ext4_get_journal_in
ext4_debug("Journal inode found at %p: %lld bytes\n", journal_inode, journal_inode->i_size); - if (!S_ISREG(journal_inode->i_mode)) { + if (!S_ISREG(journal_inode->i_mode) || IS_ENCRYPTED(journal_inode)) { ext4_msg(sb, KERN_ERR, "invalid journal inode"); iput(journal_inode); return NULL;
From: Muhammad Usama Anjum usama.anjum@collabora.com
commit 5ad51ab618de5d05f4e692ebabeb6fe6289aaa57 upstream.
The build of kselftests fails if a relative path is specified through KBUILD_OUTPUT or the O=<path> method. The BUILD variable is used to determine the path of the output objects. When make is run from other directories with relative paths, the exact path of the build objects is ambiguous and the build fails.
make[1]: Entering directory '/home/usama/repos/kernel/linux_mainline2/tools/testing/selftests/alsa'
gcc mixer-test.c -L/usr/lib/x86_64-linux-gnu -lasound -o build/kselftest/alsa/mixer-test
/usr/bin/ld: cannot open output file build/kselftest/alsa/mixer-test
Set the BUILD variable to the absolute path of the output directory and make the logic readable and easy to follow. Use spaces instead of tabs for indentation, as an "if" indented with a tab is treated as a recipe by make.
Signed-off-by: Muhammad Usama Anjum usama.anjum@collabora.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Tyler Hicks (Microsoft) code@tyhicks.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- tools/testing/selftests/Makefile | 26 +++++++++++++++++--------- 1 file changed, 17 insertions(+), 9 deletions(-)
--- a/tools/testing/selftests/Makefile +++ b/tools/testing/selftests/Makefile @@ -114,19 +114,27 @@ ifdef building_out_of_srctree override LDFLAGS = endif
-ifneq ($(O),) - BUILD := $(O)/kselftest +top_srcdir ?= ../../.. + +ifeq ("$(origin O)", "command line") + KBUILD_OUTPUT := $(O) +endif + +ifneq ($(KBUILD_OUTPUT),) + # Make's built-in functions such as $(abspath ...), $(realpath ...) cannot + # expand a shell special character '~'. We use a somewhat tedious way here. + abs_objtree := $(shell cd $(top_srcdir) && mkdir -p $(KBUILD_OUTPUT) && cd $(KBUILD_OUTPUT) && pwd) + $(if $(abs_objtree),, \ + $(error failed to create output directory "$(KBUILD_OUTPUT)")) + # $(realpath ...) resolves symlinks + abs_objtree := $(realpath $(abs_objtree)) + BUILD := $(abs_objtree)/kselftest else - ifneq ($(KBUILD_OUTPUT),) - BUILD := $(KBUILD_OUTPUT)/kselftest - else - BUILD := $(shell pwd) - DEFAULT_INSTALL_HDR_PATH := 1 - endif + BUILD := $(CURDIR) + DEFAULT_INSTALL_HDR_PATH := 1 endif
# Prepare for headers install -top_srcdir ?= ../../.. include $(top_srcdir)/scripts/subarch.include ARCH ?= $(SUBARCH) export KSFT_KHDR_INSTALL_DONE := 1
From: Qu Wenruo wqu@suse.com
commit 3d17adea74a56a4965f7a603d8ed8c66bb9356d9 upstream.
The previous commit a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time") only checks the content of the super block, but it does not verify that the on-disk super block has a matching checksum.
This patch adds checksum verification to the thaw-time superblock checks.
This involves the following extra changes:
- Export btrfs_check_super_csum(), as we need to call it in super.c.
- Change the argument list of btrfs_check_super_csum(): instead of passing a char *, directly pass a struct btrfs_super_block * pointer.
- Verify that our checksum type didn't change before checking the checksum value, like it's done at mount time.
Fixes: a05d3c915314 ("btrfs: check superblock to ensure the fs was not modified at thaw time") Reviewed-by: Johannes Thumshirn johannes.thumshirn@wdc.com Signed-off-by: Qu Wenruo wqu@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/btrfs/disk-io.c | 10 ++++------ fs/btrfs/disk-io.h | 2 ++ fs/btrfs/super.c | 16 ++++++++++++++++ 3 files changed, 22 insertions(+), 6 deletions(-)
--- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -202,11 +202,9 @@ static bool btrfs_supported_super_csum(u * Return 0 if the superblock checksum type matches the checksum value of that * algorithm. Pass the raw disk superblock data. */ -static int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, - char *raw_disk_sb) +int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, + const struct btrfs_super_block *disk_sb) { - struct btrfs_super_block *disk_sb = - (struct btrfs_super_block *)raw_disk_sb; char result[BTRFS_CSUM_SIZE]; SHASH_DESC_ON_STACK(shash, fs_info->csum_shash);
@@ -217,7 +215,7 @@ static int btrfs_check_super_csum(struct * BTRFS_SUPER_INFO_SIZE range, we expect that the unused space is * filled with zeros and is included in the checksum. */ - crypto_shash_digest(shash, raw_disk_sb + BTRFS_CSUM_SIZE, + crypto_shash_digest(shash, (const u8 *)disk_sb + BTRFS_CSUM_SIZE, BTRFS_SUPER_INFO_SIZE - BTRFS_CSUM_SIZE, result);
if (memcmp(disk_sb->csum, result, fs_info->csum_size)) @@ -3210,7 +3208,7 @@ int __cold open_ctree(struct super_block * We want to check superblock checksum, the type is stored inside. * Pass the whole disk block of size BTRFS_SUPER_INFO_SIZE (4k). */ - if (btrfs_check_super_csum(fs_info, (u8 *)disk_super)) { + if (btrfs_check_super_csum(fs_info, disk_super)) { btrfs_err(fs_info, "superblock checksum mismatch"); err = -EINVAL; btrfs_release_disk_super(disk_super); --- a/fs/btrfs/disk-io.h +++ b/fs/btrfs/disk-io.h @@ -52,6 +52,8 @@ struct extent_buffer *btrfs_find_create_ void btrfs_clean_tree_block(struct extent_buffer *buf); void btrfs_clear_oneshot_options(struct btrfs_fs_info *fs_info); int btrfs_start_pre_rw_mount(struct btrfs_fs_info *fs_info); +int btrfs_check_super_csum(struct btrfs_fs_info *fs_info, + const struct btrfs_super_block *disk_sb); int __cold open_ctree(struct super_block *sb, struct btrfs_fs_devices *fs_devices, char *options); --- a/fs/btrfs/super.c +++ b/fs/btrfs/super.c @@ -2501,6 +2501,7 @@ static int check_dev_super(struct btrfs_ { struct btrfs_fs_info *fs_info = dev->fs_info; struct btrfs_super_block *sb; + u16 csum_type; int ret = 0;
/* This should be called with fs still frozen. */ @@ -2515,6 +2516,21 @@ static int check_dev_super(struct btrfs_ if (IS_ERR(sb)) return PTR_ERR(sb);
+ /* Verify the checksum. */ + csum_type = btrfs_super_csum_type(sb); + if (csum_type != btrfs_super_csum_type(fs_info->super_copy)) { + btrfs_err(fs_info, "csum type changed, has %u expect %u", + csum_type, btrfs_super_csum_type(fs_info->super_copy)); + ret = -EUCLEAN; + goto out; + } + + if (btrfs_check_super_csum(fs_info, sb)) { + btrfs_err(fs_info, "csum for on-disk super block no longer matches"); + ret = -EUCLEAN; + goto out; + } + /* Btrfs_validate_super() includes fsid check against super->fsid. */ ret = btrfs_validate_super(fs_info, sb, 0); if (ret < 0)
From: Jie Wang wangjie125@huawei.com
commit 29df7c695ed67a8fa32bb7805bad8fe2a76c1f88 upstream.
The rx copybreak refactoring modified the original return logic, which makes the feature unusable. This patch fixes the return logic of rx copybreak.
Fixes: e74a726da2c4 ("net: hns3: refactor hns3_nic_reuse_page()") Fixes: 99f6b5fb5f63 ("net: hns3: use bounce buffer when rx page can not be reused") Signed-off-by: Jie Wang wangjie125@huawei.com Signed-off-by: Hao Lan lanhao@huawei.com Signed-off-by: Paolo Abeni pabeni@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c @@ -3590,8 +3590,8 @@ static void hns3_nic_reuse_page(struct s desc_cb->reuse_flag = 1; } else if (frag_size <= ring->rx_copybreak) { ret = hns3_handle_rx_copybreak(skb, i, ring, pull_len, desc_cb); - if (ret) - goto out; + if (!ret) + return; }
out:
From: Jan Kara jack@suse.cz
commit 5fc4cbd9fde5d4630494fd6ffc884148fb618087 upstream.
Commit 307af6c87937 ("mbcache: automatically delete entries from cache on freeing") started nesting cache->c_list_lock under the bit locks protecting hash buckets of the mbcache hash table in mb_cache_entry_create(). This causes problems for real-time kernels because there spinlocks are sleeping locks while bitlocks stay atomic. Luckily the nesting is easy to avoid by holding entry reference until the entry is added to the LRU list. This makes sure we cannot race with entry deletion.
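For illustration only (not part of this patch), here is a minimal sketch of the pattern described above, using hypothetical names: the entry is created with an extra reference so the list lock can be taken after the hash-bucket bit lock has already been dropped, and the two locks never nest.

  #include <linux/atomic.h>
  #include <linux/list.h>
  #include <linux/list_bl.h>
  #include <linux/slab.h>
  #include <linux/spinlock.h>

  struct demo_cache {
          spinlock_t              list_lock;
          struct list_head        lru;
  };

  struct demo_entry {
          atomic_t                refcnt;
          struct hlist_bl_node    hash_node;
          struct list_head        lru_node;
  };

  static void demo_publish_entry(struct demo_cache *cache, struct demo_entry *e,
                                 struct hlist_bl_head *bucket)
  {
          /* one reference for the hash table, one temporary reference that
           * keeps the entry alive until it sits on the LRU list */
          atomic_set(&e->refcnt, 2);

          hlist_bl_lock(bucket);
          hlist_bl_add_head(&e->hash_node, bucket);
          hlist_bl_unlock(bucket);                /* bit lock dropped first */

          spin_lock(&cache->list_lock);           /* no longer nested under the bit lock */
          list_add_tail(&e->lru_node, &cache->lru);
          spin_unlock(&cache->list_lock);

          if (atomic_dec_and_test(&e->refcnt))    /* drop the temporary reference */
                  kfree(e);
  }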
Cc: stable@kernel.org Fixes: 307af6c87937 ("mbcache: automatically delete entries from cache on freeing") Reported-by: Mike Galbraith efault@gmx.de Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220908091032.10513-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/mbcache.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
--- a/fs/mbcache.c +++ b/fs/mbcache.c @@ -90,8 +90,14 @@ int mb_cache_entry_create(struct mb_cach return -ENOMEM;
INIT_LIST_HEAD(&entry->e_list); - /* Initial hash reference */ - atomic_set(&entry->e_refcnt, 1); + /* + * We create entry with two references. One reference is kept by the + * hash table, the other reference is used to protect us from + * mb_cache_entry_delete_or_get() until the entry is fully setup. This + * avoids nesting of cache->c_list_lock into hash table bit locks which + * is problematic for RT. + */ + atomic_set(&entry->e_refcnt, 2); entry->e_key = key; entry->e_value = value; entry->e_flags = 0; @@ -107,15 +113,12 @@ int mb_cache_entry_create(struct mb_cach } } hlist_bl_add_head(&entry->e_hash_list, head); - /* - * Add entry to LRU list before it can be found by - * mb_cache_entry_delete() to avoid races - */ + hlist_bl_unlock(head); spin_lock(&cache->c_list_lock); list_add_tail(&entry->e_list, &cache->c_list); cache->c_entry_count++; spin_unlock(&cache->c_list_lock); - hlist_bl_unlock(head); + mb_cache_entry_put(cache, entry);
return 0; }
From: Ard Biesheuvel ardb@kernel.org
commit 196dff2712ca5a2e651977bb2fe6b05474111a83 upstream.
Instead of blindly creating the EFI random seed configuration table if the RNG protocol is implemented and works, check whether such an EFI configuration table was provided by an earlier boot stage and, if so, concatenate the existing and the new seeds, leaving it up to the core code to mix it in and credit it the way it sees fit.
This can be used by, e.g., systemd-boot to pass an additional seed to Linux in a way that can be consumed by the kernel very early. In that case, the following definitions should be used to pass the seed to the EFI stub:
struct linux_efi_random_seed {
        u32     size; // of the 'seed' array in bytes
        u8      seed[];
};
The memory for the struct must be allocated as EFI_ACPI_RECLAIM_MEMORY pool memory, and the address of the struct in memory should be installed as an EFI configuration table using the following GUID:
LINUX_EFI_RANDOM_SEED_TABLE_GUID 1ce1e5bc-7ceb-42f2-81e5-8aadf180f57b
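For illustration only (not part of this patch), a minimal bootloader-side sketch of what the text above describes, written with EDK2-style names; the helper name and its error handling are hypothetical:

  #include <Uefi.h>
  #include <Library/BaseMemoryLib.h>
  #include <Library/UefiBootServicesTableLib.h>   /* provides gBS */

  #define LINUX_EFI_RANDOM_SEED_TABLE_GUID \
          { 0x1ce1e5bc, 0x7ceb, 0x42f2, { 0x81, 0xe5, 0x8a, 0xad, 0xf1, 0x80, 0xf5, 0x7b } }

  struct linux_efi_random_seed {
          UINT32 size;    /* of the 'seed' array in bytes */
          UINT8  seed[];
  };

  EFI_STATUS install_linux_seed_table(const UINT8 *bytes, UINT32 len)
  {
          EFI_GUID guid = LINUX_EFI_RANDOM_SEED_TABLE_GUID;
          struct linux_efi_random_seed *tbl;
          EFI_STATUS status;

          /* EFI_ACPI_RECLAIM_MEMORY pool memory, so the table survives into the OS */
          status = gBS->AllocatePool(EfiACPIReclaimMemory,
                                     sizeof(*tbl) + len, (VOID **)&tbl);
          if (EFI_ERROR(status))
                  return status;

          tbl->size = len;
          CopyMem(tbl->seed, bytes, len);

          /* make the seed discoverable by the EFI stub under the GUID above */
          return gBS->InstallConfigurationTable(&guid, tbl);
  }

The recommended 32-byte seed would be passed as 'len' here.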
Note that doing so is safe even on kernels that were built without this patch applied, but the seed will simply be overwritten with a seed derived from the EFI RNG protocol, if available. The recommended seed size is 32 bytes, and seeds larger than 512 bytes are considered corrupted and ignored entirely.
In order to preserve forward secrecy, seeds from previous bootloaders are memzero'd out, and in order to preserve memory, those older seeds are also freed from memory. Freeing from memory without first memzeroing is not safe to do, as it's possible that nothing else will ever overwrite those pages used by EFI.
Reviewed-by: Jason A. Donenfeld Jason@zx2c4.com [ardb: incorporate Jason's followup changes to extend the maximum seed size on the consumer end, memzero() it and drop a needless printk] Signed-off-by: Ard Biesheuvel ardb@kernel.org Signed-off-by: Jason A. Donenfeld Jason@zx2c4.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/firmware/efi/efi.c | 4 +-- drivers/firmware/efi/libstub/efistub.h | 2 + drivers/firmware/efi/libstub/random.c | 42 ++++++++++++++++++++++++++++----- include/linux/efi.h | 2 - 4 files changed, 40 insertions(+), 10 deletions(-)
--- a/drivers/firmware/efi/efi.c +++ b/drivers/firmware/efi/efi.c @@ -590,7 +590,7 @@ int __init efi_config_parse_tables(const
seed = early_memremap(efi_rng_seed, sizeof(*seed)); if (seed != NULL) { - size = min(seed->size, EFI_RANDOM_SEED_SIZE); + size = min_t(u32, seed->size, SZ_1K); // sanity check early_memunmap(seed, sizeof(*seed)); } else { pr_err("Could not map UEFI random seed!\n"); @@ -599,8 +599,8 @@ int __init efi_config_parse_tables(const seed = early_memremap(efi_rng_seed, sizeof(*seed) + size); if (seed != NULL) { - pr_notice("seeding entropy pool\n"); add_bootloader_randomness(seed->bits, size); + memzero_explicit(seed->bits, size); early_memunmap(seed, sizeof(*seed) + size); } else { pr_err("Could not map UEFI random seed!\n"); --- a/drivers/firmware/efi/libstub/efistub.h +++ b/drivers/firmware/efi/libstub/efistub.h @@ -766,6 +766,8 @@ efi_status_t efi_get_random_bytes(unsign efi_status_t efi_random_alloc(unsigned long size, unsigned long align, unsigned long *addr, unsigned long random_seed);
+efi_status_t efi_random_get_seed(void); + efi_status_t check_platform_features(void);
void *get_efi_config_table(efi_guid_t guid); --- a/drivers/firmware/efi/libstub/random.c +++ b/drivers/firmware/efi/libstub/random.c @@ -67,8 +67,9 @@ efi_status_t efi_random_get_seed(void) efi_guid_t rng_proto = EFI_RNG_PROTOCOL_GUID; efi_guid_t rng_algo_raw = EFI_RNG_ALGORITHM_RAW; efi_guid_t rng_table_guid = LINUX_EFI_RANDOM_SEED_TABLE_GUID; + struct linux_efi_random_seed *prev_seed, *seed = NULL; + int prev_seed_size = 0, seed_size = EFI_RANDOM_SEED_SIZE; efi_rng_protocol_t *rng = NULL; - struct linux_efi_random_seed *seed = NULL; efi_status_t status;
status = efi_bs_call(locate_protocol, &rng_proto, NULL, (void **)&rng); @@ -76,18 +77,33 @@ efi_status_t efi_random_get_seed(void) return status;
/* + * Check whether a seed was provided by a prior boot stage. In that + * case, instead of overwriting it, let's create a new buffer that can + * hold both, and concatenate the existing and the new seeds. + * Note that we should read the seed size with caution, in case the + * table got corrupted in memory somehow. + */ + prev_seed = get_efi_config_table(LINUX_EFI_RANDOM_SEED_TABLE_GUID); + if (prev_seed && prev_seed->size <= 512U) { + prev_seed_size = prev_seed->size; + seed_size += prev_seed_size; + } + + /* * Use EFI_ACPI_RECLAIM_MEMORY here so that it is guaranteed that the * allocation will survive a kexec reboot (although we refresh the seed * beforehand) */ status = efi_bs_call(allocate_pool, EFI_ACPI_RECLAIM_MEMORY, - sizeof(*seed) + EFI_RANDOM_SEED_SIZE, + struct_size(seed, bits, seed_size), (void **)&seed); - if (status != EFI_SUCCESS) - return status; + if (status != EFI_SUCCESS) { + efi_warn("Failed to allocate memory for RNG seed.\n"); + goto err_warn; + }
status = efi_call_proto(rng, get_rng, &rng_algo_raw, - EFI_RANDOM_SEED_SIZE, seed->bits); + EFI_RANDOM_SEED_SIZE, seed->bits);
if (status == EFI_UNSUPPORTED) /* @@ -100,14 +116,28 @@ efi_status_t efi_random_get_seed(void) if (status != EFI_SUCCESS) goto err_freepool;
- seed->size = EFI_RANDOM_SEED_SIZE; + seed->size = seed_size; + if (prev_seed_size) + memcpy(seed->bits + EFI_RANDOM_SEED_SIZE, prev_seed->bits, + prev_seed_size); + status = efi_bs_call(install_configuration_table, &rng_table_guid, seed); if (status != EFI_SUCCESS) goto err_freepool;
+ if (prev_seed_size) { + /* wipe and free the old seed if we managed to install the new one */ + memzero_explicit(prev_seed->bits, prev_seed_size); + efi_bs_call(free_pool, prev_seed); + } return EFI_SUCCESS;
err_freepool: + memzero_explicit(seed, struct_size(seed, bits, seed_size)); efi_bs_call(free_pool, seed); + efi_warn("Failed to obtain seed from EFI_RNG_PROTOCOL\n"); +err_warn: + if (prev_seed) + efi_warn("Retaining bootloader-supplied seed only"); return status; } --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -1114,8 +1114,6 @@ void efi_check_for_embedded_firmwares(vo static inline void efi_check_for_embedded_firmwares(void) { } #endif
-efi_status_t efi_random_get_seed(void); - /* * Arch code can implement the following three template macros, avoiding * reptition for the void/non-void return cases of {__,}efi_call_virt():
From: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Smatch warning: io_fixup_rw_res() warn: unsigned 'res' is never less than zero.
Change type of 'res' from unsigned to long.
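For illustration only (not part of this patch), a small, self-contained userspace example of the bug class Smatch flags here; the helper names are hypothetical:

  #include <stdio.h>

  /* With an unsigned parameter, the error check is dead code: a negative
   * error value wraps to a huge positive number and is returned as-is. */
  static long fixup_bad(unsigned res)
  {
          if (res < 0)            /* always false for an unsigned type */
                  return -1;
          return res;
  }

  /* With a signed 'long', negative error values are detected as intended. */
  static long fixup_good(long res)
  {
          if (res < 0)
                  return -1;
          return res;
  }

  int main(void)
  {
          /* pass a negative "error" value, e.g. -5 */
          printf("%ld %ld\n", fixup_bad(-5), fixup_good(-5));
          return 0;
  }

On a typical system with 32-bit int and 64-bit long this should print "4294967291 -1", showing how the negative value is lost with the unsigned parameter.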
Fixes: d6b7efc722a2 ("io_uring/rw: fix error'ed retry return values") Signed-off-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -2701,7 +2701,7 @@ static bool __io_complete_rw_common(stru return false; }
-static inline int io_fixup_rw_res(struct io_kiocb *req, unsigned res) +static inline int io_fixup_rw_res(struct io_kiocb *req, long res) { struct io_async_rw *io = req->async_data;
From: Jocelyn Falempe jfalempe@redhat.com
commit b389286d0234e1edbaf62ed8bc0892a568c33662 upstream.
For G200_SE_A, the PLL M setting is wrong, which leads to a blank screen or a "signal out of range" error on the VGA display. The previous code had "m |= 0x80", which was changed to "m |= ((pixpllcn & BIT(8)) >> 1)".
Tested on G200_SE_A rev 42
This line of code was moved to another file with commit 877507bb954e ("drm/mgag200: Provide per-device callbacks for PIXPLLC") but can be easily backported before this commit.
v2:
 * put BIT(7) first to respect MSB-to-LSB (Thomas)
 * Add a comment to explain that this bit must be set (Thomas)
Fixes: 2dd040946ecf ("drm/mgag200: Store values (not bits) in struct mgag200_pll_values") Cc: stable@vger.kernel.org Signed-off-by: Jocelyn Falempe jfalempe@redhat.com Reviewed-by: Thomas Zimmermann tzimmermann@suse.de Link: https://patchwork.freedesktop.org/patch/msgid/20221013132810.521945-1-jfalem... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/gpu/drm/mgag200/mgag200_pll.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/drivers/gpu/drm/mgag200/mgag200_pll.c +++ b/drivers/gpu/drm/mgag200/mgag200_pll.c @@ -268,7 +268,8 @@ static void mgag200_pixpll_update_g200se pixpllcp = pixpllc->p - 1; pixpllcs = pixpllc->s;
- xpixpllcm = pixpllcm | ((pixpllcn & BIT(8)) >> 1); + // For G200SE A, BIT(7) should be set unconditionally. + xpixpllcm = BIT(7) | pixpllcm; xpixpllcn = pixpllcn; xpixpllcp = (pixpllcs << 3) | pixpllcp;
On 1/10/23 10:01, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.87-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On 1/10/23 11:01, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.87-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Tue, 10 Jan 2023 at 23:53, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.87-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build
* kernel: 5.15.87-rc1
* git: https://gitlab.com/Linaro/lkft/mirrors/stable/linux-stable-rc
* git branch: linux-5.15.y
* git commit: 5e4a8f5e829f10ba7300f2b854cebaed7ac88e73
* git describe: v5.15.86-291-g5e4a8f5e829f
* test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-5.15.y/build/v5.15....
## Test Regressions (compared to v5.15.86)
## Metric Regressions (compared to v5.15.86)
## Test Fixes (compared to v5.15.86)
## Metric Fixes (compared to v5.15.86)
## Test result summary
total: 158884, pass: 133407, fail: 4419, skip: 20714, xfail: 344
## Build Summary
* arc: 5 total, 5 passed, 0 failed
* arm: 151 total, 150 passed, 1 failed
* arm64: 49 total, 47 passed, 2 failed
* i386: 39 total, 35 passed, 4 failed
* mips: 31 total, 29 passed, 2 failed
* parisc: 8 total, 8 passed, 0 failed
* powerpc: 34 total, 32 passed, 2 failed
* riscv: 14 total, 14 passed, 0 failed
* s390: 16 total, 14 passed, 2 failed
* sh: 14 total, 12 passed, 2 failed
* sparc: 8 total, 8 passed, 0 failed
* x86_64: 42 total, 40 passed, 2 failed
## Test suites summary * boot * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kexec * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-pstore * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * perf * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Tue, Jan 10, 2023 at 07:01:32PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
Build test (gcc version 12.2.1 20221127):
mips: 62 configs -> no failure
arm: 99 configs -> no failure
arm64: 3 configs -> no failure
x86_64: 4 configs -> no failure
alpha allmodconfig -> no failure
csky allmodconfig -> no failure
powerpc allmodconfig -> no failure
riscv allmodconfig -> no failure
s390 allmodconfig -> no failure
xtensa allmodconfig -> no failure
Boot test:
x86_64: Booted on my test laptop. No regression.
x86_64: Booted on qemu. No regression. [1]
arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. No regression. [3]
[1]. https://openqa.qa.codethink.co.uk/tests/2607
[2]. https://openqa.qa.codethink.co.uk/tests/2612
[3]. https://openqa.qa.codethink.co.uk/tests/2615
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
On Tue, Jan 10, 2023 at 07:01:32PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0) and powerpc (ps3_defconfig, GCC 12.2.0).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.87-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On Tue, Jan 10, 2023 at 07:01:32PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
Build results:
  total: 160 pass: 160 fail: 0
Qemu test results:
  total: 489 pass: 489 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Tue, Jan 10, 2023 at 07:01:32PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
No regressions found on WSL x86_64 or WSL arm64
Built, booted, and compared dmesg against 5.15.86.
Thank you.
Tested-by: Kelsey Steele kelseysteele@linux.microsoft.com
On 1/10/23 10:01 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 5.15.87 release. There are 290 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Thu, 12 Jan 2023 17:59:42 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.87-rc1... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net