This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.6.28-rc1
Fudongwang fudong.wang@amd.com drm/amd/display: fix disable otg wa logic in DCN316
Harry Wentland harry.wentland@amd.com drm/amd/display: Set VSC SDP Colorimetry same way for MST and SST
Harry Wentland harry.wentland@amd.com drm/amd/display: Program VSC SDP colorimetry for all DP sinks >= 1.4
Tim Huang Tim.Huang@amd.com drm/amdgpu: fix incorrect number of active RBs for gfx11
Alex Deucher alexander.deucher@amd.com drm/amdgpu: always force full reset for SOC21
Lijo Lazar lijo.lazar@amd.com drm/amdgpu: Reset dGPU if suspend got aborted
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915: Disable port sync when bigjoiner is used
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/cdclk: Fix CDCLK programming order when pipes are active
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Replace CONFIG_SPECTRE_BHI_{ON,OFF} with CONFIG_MITIGATION_SPECTRE_BHI
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Remove CONFIG_BHI_MITIGATION_AUTO and spectre_bhi=auto
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Clarify that syscall hardening isn't a BHI mitigation
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Fix BHI handling of RRSBA
Ingo Molnar mingo@kernel.org x86/bugs: Rename various 'ia32_cap' variables to 'x86_arch_cap_msr'
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Cache the value of MSR_IA32_ARCH_CAPABILITIES
Josh Poimboeuf jpoimboe@kernel.org x86/bugs: Fix BHI documentation
Daniel Sneddon daniel.sneddon@linux.intel.com x86/bugs: Fix return type of spectre_bhi_state()
Arnd Bergmann arnd@arndb.de irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
Adam Dunlap acdunlap@google.com x86/apic: Force native_apic_mem_read() to use the MOV instruction
John Stultz jstultz@google.com selftests: timers: Fix abs() warning in posix_timers test
Sean Christopherson seanjc@google.com x86/cpu: Actually turn off mitigations by default for SPECULATION_MITIGATIONS=n
Namhyung Kim namhyung@kernel.org perf/x86: Fix out of range data
Gavin Shan gshan@redhat.com vhost: Add smp_rmb() in vhost_enable_notify()
Gavin Shan gshan@redhat.com vhost: Add smp_rmb() in vhost_vq_avail_empty()
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix spi lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-lsio: fix pwm lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-conn: fix usb lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix adc lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-dma: fix can lpcg indices
Frank Li Frank.Li@nxp.com arm64: dts: imx8qm-ss-dma: fix can lpcg indices
Ville Syrjälä ville.syrjala@linux.intel.com drm/client: Fully protect modes[] with dev->mode_config.mutex
Boris Brezillon boris.brezillon@collabora.com drm/panfrost: Fix the error path in panfrost_mmu_map_fault_addr()
Jammy Huang jammy_huang@aspeedtech.com drm/ast: Fix soft lockup
Harish Kasiviswanathan Harish.Kasiviswanathan@amd.com drm/amdkfd: Reset GPU on queue preemption failure
Ville Syrjälä ville.syrjala@linux.intel.com drm/i915/vrr: Disable VRR when using bigjoiner
Zack Rusin zack.rusin@broadcom.com drm/vmwgfx: Enable DMA mappings with SEV
Jacek Lawrynowicz jacek.lawrynowicz@linux.intel.com accel/ivpu: Fix deadlock in context_xa
Alexander Wetzel Alexander@wetzel-home.de scsi: sg: Avoid race in error handling & drop bogus warn
Alexander Wetzel Alexander@wetzel-home.de scsi: sg: Avoid sg device teardown race
Zheng Yejian zhengyejian1@huawei.com kprobes: Fix possible use-after-free issue on kprobe registration
Pavel Begunkov asml.silence@gmail.com io_uring/net: restore msg_control on sendzc retry
Boris Burkov boris@bur.io btrfs: qgroup: convert PREALLOC to PERTRANS after record_root_in_trans
Boris Burkov boris@bur.io btrfs: record delayed inode root in transaction
Boris Burkov boris@bur.io btrfs: qgroup: fix qgroup prealloc rsv leak in subvolume operations
Boris Burkov boris@bur.io btrfs: qgroup: correctly model root qgroup rsv in convert
Geliang Tang tanggeliang@kylinos.cn selftests: mptcp: use += operator to append strings
Jacob Pan jacob.jun.pan@linux.intel.com iommu/vt-d: Allocate local memory for page request queue
Xuchun Shang xuchun.shang@linux.alibaba.com iommu/vt-d: Fix wrong use of pasid config
Arnd Bergmann arnd@arndb.de tracing: hide unused ftrace_event_id_fops
David Arinzon darinzon@amazon.com net: ena: Set tx_info->xdpf value to NULL
David Arinzon darinzon@amazon.com net: ena: Use tx_ring instead of xdp_ring for XDP channel TX
David Arinzon darinzon@amazon.com net: ena: Pass ena_adapter instead of net_device to ena_xmit_common()
David Arinzon darinzon@amazon.com net: ena: Move XDP code to its new files
David Arinzon darinzon@amazon.com net: ena: Fix incorrect descriptor free behavior
David Arinzon darinzon@amazon.com net: ena: Wrong missing IO completions check order
David Arinzon darinzon@amazon.com net: ena: Fix potential sign extension issue
Michal Luczaj mhal@rbox.co af_unix: Fix garbage collector racing against connect()
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Do not use atomic ops for unix_sk(sk)->inflight.
Arınç ÜNAL arinc.unal@arinc9.com net: dsa: mt7530: trap link-local frames regardless of ST Port State
Gerd Bayer gbayer@linux.ibm.com Revert "s390/ism: fix receive message buffer allocation"
Daniel Machon daniel.machon@microchip.com net: sparx5: fix wrong config being used when reconfiguring PCS
Rahul Rameshbabu rrameshbabu@nvidia.com net/mlx5e: Do not produce metadata freelist entries in Tx port ts WQE xmit
Carolina Jubran cjubran@nvidia.com net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
Carolina Jubran cjubran@nvidia.com net/mlx5e: Fix mlx5e_priv_init() cleanup flow
Cosmin Ratiu cratiu@nvidia.com net/mlx5: Correctly compare pkt reformat ids
Cosmin Ratiu cratiu@nvidia.com net/mlx5: Properly link new fs rules into the tree
Michael Liang mliang@purestorage.com net/mlx5: offset comp irq index in name by one
Shay Drory shayd@nvidia.com net/mlx5: Register devlink first under devlink lock
Moshe Shemesh moshe@nvidia.com net/mlx5: SF, Stop waiting for FW as teardown was called
Eric Dumazet edumazet@google.com netfilter: complete validation of user input
Archie Pusaka apusaka@chromium.org Bluetooth: l2cap: Don't double set the HCI_CONN_MGMT_CONNECTED bit
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: SCO: Fix not validating setsockopt user input
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Fix using the same interval and window for Coded PHY
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: hci_sync: Use QoS to determine which PHY to scan
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: ISO: Don't reject BT_ISO_QOS if parameters are unset
Luiz Augusto von Dentz luiz.von.dentz@intel.com Bluetooth: ISO: Align broadcast sync_timeout with connection timeout
Jiri Benc jbenc@redhat.com ipv6: fix race condition between ipv6_get_ifaddr and ipv6_del_addr
Arnd Bergmann arnd@arndb.de ipv4/route: avoid unused-but-set-variable warning
Arnd Bergmann arnd@arndb.de ipv6: fib: hide unused 'pn' variable
Geetha sowjanya gakula@marvell.com octeontx2-af: Fix NIX SQ mode and BP config
Kuniyuki Iwashima kuniyu@amazon.com af_unix: Clear stale u->oob_skb.
Marek Vasut marex@denx.de net: ks8851: Handle softirqs at the end of IRQ thread to fix hang
Marek Vasut marex@denx.de net: ks8851: Inline ks8851_rx_skb()
Pavan Chebbi pavan.chebbi@broadcom.com bnxt_en: Reset PTP tx_avail after possible firmware reset
Vikas Gupta vikas.gupta@broadcom.com bnxt_en: Fix error recovery for RoCE ulp client
Vikas Gupta vikas.gupta@broadcom.com bnxt_en: Fix possible memory leak in bnxt_rdma_aux_device_init()
Gerd Bayer gbayer@linux.ibm.com s390/ism: fix receive message buffer allocation
Eric Dumazet edumazet@google.com geneve: fix header validation in geneve[6]_xmit_skb
Ming Lei ming.lei@redhat.com block: fix q->blkg_list corruption during disk rebind
Hariprasad Kelam hkelam@marvell.com octeontx2-pf: Fix transmit scheduler resource leak
Eric Dumazet edumazet@google.com xsk: validate user input for XDP_{UMEM|COMPLETION}_FILL_RING
Petr Tesarik petr@tesarici.cz u64_stats: fix u64_stats_init() for lockdep when used repeatedly in one file
Ilya Maximets i.maximets@ovn.org net: openvswitch: fix unwanted error log on timeout policy probing
Dan Carpenter dan.carpenter@linaro.org scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
Xiang Chen chenxiang66@hisilicon.com scsi: hisi_sas: Modify the deadline for ata_wait_after_reset()
Arnd Bergmann arnd@arndb.de nouveau: fix function cast warning
Alex Constantino dreaming.about.electric.sheep@gmail.com Revert "drm/qxl: simplify qxl_fence_wait"
Kwangjin Ko kwangjin.ko@sk.com cxl/core: Fix initialization of mbox_cmd.size_out in get event
Frank Li Frank.Li@nxp.com arm64: dts: imx8-ss-conn: fix usdhc wrong lpcg clock order
Dmitry Baryshkov dmitry.baryshkov@linaro.org drm/msm/dpu: don't allow overriding data from catalog
Dave Jiang dave.jiang@intel.com cxl/core/regs: Fix usage of map->reg_type in cxl_decode_regblock() before assigned
Yuquan Wang wangyuquan1236@phytium.com.cn cxl/mem: Fix for the index of Clear Event Record Handle
Cristian Marussi cristian.marussi@arm.com firmware: arm_scmi: Make raw debugfs entries non-seekable
Aaro Koskinen aaro.koskinen@iki.fi ARM: OMAP2+: fix USB regression on Nokia N8x0
Aaro Koskinen aaro.koskinen@iki.fi mmc: omap: restore original power up/down steps
Aaro Koskinen aaro.koskinen@iki.fi mmc: omap: fix deferred probe
Aaro Koskinen aaro.koskinen@iki.fi mmc: omap: fix broken slot switch lookup
Aaro Koskinen aaro.koskinen@iki.fi ARM: OMAP2+: fix N810 MMC gpiod table
Aaro Koskinen aaro.koskinen@iki.fi ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
Nini Song nini.song@mediatek.com media: cec: core: remove length check of Timer Status
Anna-Maria Behnsen anna-maria@linutronix.de PM: s2idle: Make sure CPUs will wakeup directly on resume
Hans de Goede hdegoede@redhat.com ACPI: scan: Do not increase dep_unmet for already met dependencies
Noah Loomans noah@noahloomans.com platform/chrome: cros_ec_uart: properly fix race condition
Tim Huang Tim.Huang@amd.com drm/amd/pm: fixes a random hang in S4 for SMU v13.0.4/11
Dmitry Antipov dmantipov@yandex.ru Bluetooth: Fix memory leak in hci_req_sync_complete()
Steven Rostedt (Google) rostedt@goodmis.org ring-buffer: Only update pages_touched when a new page is touched
Yu Kuai yukuai3@huawei.com raid1: fix use-after-free for original bio in raid1_write_request()
Fabio Estevam festevam@denx.de ARM: dts: imx7s-warp: Pass OV2680 link-frequencies
Gavin Shan gshan@redhat.com arm64: tlb: Fix TLBI RANGE operand
Sven Eckelmann sven@narfation.org batman-adv: Avoid infinite loop trying to resize local TT
Damien Le Moal dlemoal@kernel.org ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
Igor Pylypiv ipylypiv@google.com ata: libata-core: Allow command duration limits detection for ACS-4 drives
Steve French stfrench@microsoft.com smb3: fix Open files on server counter going negative
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/spectre.rst | 22 +-
Documentation/admin-guide/kernel-parameters.txt | 12 +-
.../device_drivers/ethernet/amazon/ena.rst | 1 +
Makefile | 4 +-
arch/arm/boot/dts/nxp/imx/imx7s-warp.dts | 1 +
arch/arm/mach-omap2/board-n8x0.c | 23 +-
arch/arm64/boot/dts/freescale/imx8-ss-conn.dtsi | 16 +-
arch/arm64/boot/dts/freescale/imx8-ss-dma.dtsi | 36 +-
arch/arm64/boot/dts/freescale/imx8-ss-lsio.dtsi | 16 +-
arch/arm64/boot/dts/freescale/imx8qm-ss-dma.dtsi | 8 +-
arch/arm64/include/asm/tlbflush.h | 20 +-
arch/x86/Kconfig | 21 +-
arch/x86/events/core.c | 1 +
arch/x86/include/asm/apic.h | 3 +-
arch/x86/kernel/apic/apic.c | 6 +-
arch/x86/kernel/cpu/bugs.c | 82 ++-
arch/x86/kernel/cpu/common.c | 48 +-
block/blk-cgroup.c | 9 +-
block/blk-cgroup.h | 2 +
block/blk-core.c | 2 +
drivers/accel/ivpu/ivpu_drv.c | 2 +-
drivers/acpi/scan.c | 3 +-
drivers/ata/libata-core.c | 2 +-
drivers/ata/libata-scsi.c | 9 +-
drivers/cxl/core/mbox.c | 5 +-
drivers/cxl/core/regs.c | 5 +-
drivers/firmware/arm_scmi/raw_mode.c | 7 +-
drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/soc21.c | 27 +-
.../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 15 +-
.../amd/display/dc/clk_mgr/dcn316/dcn316_clk_mgr.c | 19 +-
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 12 +-
drivers/gpu/drm/ast/ast_dp.c | 3 +
drivers/gpu/drm/drm_client_modeset.c | 3 +-
drivers/gpu/drm/i915/display/intel_cdclk.c | 7 +-
drivers/gpu/drm/i915/display/intel_cdclk.h | 3 +
drivers/gpu/drm/i915/display/intel_ddi.c | 5 +
drivers/gpu/drm/i915/display/intel_vrr.c | 7 +
drivers/gpu/drm/msm/disp/dpu1/dpu_core_perf.c | 10 +-
.../gpu/drm/nouveau/nvkm/subdev/bios/shadowof.c | 7 +-
drivers/gpu/drm/panfrost/panfrost_mmu.c | 13 +-
drivers/gpu/drm/qxl/qxl_release.c | 50 +-
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 11 +-
drivers/iommu/intel/perfmon.c | 2 +-
drivers/iommu/intel/svm.c | 2 +-
drivers/md/raid1.c | 2 +-
drivers/media/cec/core/cec-adap.c | 14 -
drivers/mmc/host/omap.c | 48 +-
drivers/net/dsa/mt7530.c | 229 ++++++-
drivers/net/dsa/mt7530.h | 5 +
drivers/net/ethernet/amazon/ena/Makefile | 2 +-
drivers/net/ethernet/amazon/ena/ena_com.c | 2 +-
drivers/net/ethernet/amazon/ena/ena_ethtool.c | 1 +
drivers/net/ethernet/amazon/ena/ena_netdev.c | 688 ++-------------------
drivers/net/ethernet/amazon/ena/ena_netdev.h | 83 +--
drivers/net/ethernet/amazon/ena/ena_xdp.c | 466 ++++++++++++++
drivers/net/ethernet/amazon/ena/ena_xdp.h | 152 +++++
drivers/net/ethernet/broadcom/bnxt/bnxt.c | 2 +
drivers/net/ethernet/broadcom/bnxt/bnxt_ulp.c | 6 +-
.../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 22 +-
drivers/net/ethernet/marvell/octeontx2/nic/qos.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en/ptp.h | 8 +-
drivers/net/ethernet/mellanox/mlx5/core/en/qos.c | 33 +-
drivers/net/ethernet/mellanox/mlx5/core/en/selq.c | 2 +
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 -
drivers/net/ethernet/mellanox/mlx5/core/en_tx.c | 7 +-
drivers/net/ethernet/mellanox/mlx5/core/fs_core.c | 17 +-
drivers/net/ethernet/mellanox/mlx5/core/main.c | 37 +-
drivers/net/ethernet/mellanox/mlx5/core/pci_irq.c | 4 +-
.../ethernet/mellanox/mlx5/core/sf/dev/driver.c | 22 +-
drivers/net/ethernet/micrel/ks8851.h | 3 -
drivers/net/ethernet/micrel/ks8851_common.c | 16 +-
drivers/net/ethernet/micrel/ks8851_par.c | 11 -
drivers/net/ethernet/micrel/ks8851_spi.c | 11 -
.../net/ethernet/microchip/sparx5/sparx5_port.c | 4 +-
drivers/net/geneve.c | 4 +-
drivers/platform/chrome/cros_ec_uart.c | 28 +-
drivers/scsi/hisi_sas/hisi_sas_main.c | 2 +-
drivers/scsi/qla2xxx/qla_edif.c | 2 +-
drivers/scsi/sg.c | 20 +-
drivers/vhost/vhost.c | 28 +-
fs/btrfs/delayed-inode.c | 3 +
fs/btrfs/inode.c | 13 +-
fs/btrfs/ioctl.c | 37 +-
fs/btrfs/qgroup.c | 2 +
fs/btrfs/root-tree.c | 10 -
fs/btrfs/root-tree.h | 2 -
fs/btrfs/transaction.c | 17 +-
fs/smb/client/cached_dir.c | 4 +-
include/linux/dma-fence.h | 7 +
include/linux/irqflags.h | 2 +-
include/linux/u64_stats_sync.h | 9 +-
include/net/addrconf.h | 4 +
include/net/af_unix.h | 2 +-
include/net/bluetooth/bluetooth.h | 11 +
include/net/ip_tunnels.h | 33 +
io_uring/net.c | 1 +
kernel/cpu.c | 3 +-
kernel/kprobes.c | 18 +-
kernel/power/suspend.c | 6 +
kernel/trace/ring_buffer.c | 6 +-
kernel/trace/trace_events.c | 4 +
net/batman-adv/translation-table.c | 2 +-
net/bluetooth/hci_request.c | 4 +-
net/bluetooth/hci_sync.c | 66 +-
net/bluetooth/iso.c | 14 +-
net/bluetooth/l2cap_core.c | 3 +-
net/bluetooth/sco.c | 23 +-
net/ipv4/netfilter/arp_tables.c | 4 +
net/ipv4/netfilter/ip_tables.c | 4 +
net/ipv4/route.c | 4 +-
net/ipv6/addrconf.c | 7 +-
net/ipv6/ip6_fib.c | 7 +-
net/ipv6/netfilter/ip6_tables.c | 4 +
net/openvswitch/conntrack.c | 5 +-
net/unix/af_unix.c | 8 +-
net/unix/garbage.c | 35 +-
net/unix/scm.c | 8 +-
net/xdp/xsk.c | 2 +
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 53 +-
tools/testing/selftests/net/mptcp/mptcp_join.sh | 30 +-
tools/testing/selftests/timers/posix_timers.c | 2 +-
123 files changed, 1765 insertions(+), 1263 deletions(-)
On 4/15/2024 7:19 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli florian.fainelli@broadcom.com
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
No regressions found on WSL (x86 and arm64).
Built, booted, and reviewed dmesg.
Thank you. :)
Tested-by: Kelsey Steele kelseysteele@linux.microsoft.com
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
I'm seeing boot breakage with this one on the Arm fast models. A bisect is running now (for slow values of "run") and should be done by the time I get back tonight. It only seems to affect 6.6: the boot grinds to a halt shortly after reaching userspace, apparently with some PCI/virtio issues:
[ 1.606075] VFS: Mounted root (ext4 filesystem) on device 254:1.
[ 1.608751] devtmpfs: mounted
[ 1.627412] Freeing unused kernel memory: 9152K
[ 1.627894] Run /sbin/init as init process
[ 1.627957] with arguments:
[ 1.628009] /sbin/init
[ 1.628064] Image
[ 1.628117] with environment:
[ 1.628169] HOME=/
[ 1.628222] TERM=linux
[ 1.628275] user_debug=31
[ 11.764055] pci 0000:00:01.0: deferred probe pending
[ 11.764141] pci 0000:00:02.0: deferred probe pending
[ 11.764227] pci 0000:00:03.0: deferred probe pending
[ 11.764313] pci 0000:00:04.0: deferred probe pending
[ 11.764399] pci 0000:03:00.0: deferred probe pending
[ 11.764485] pci 0000:04:00.0: deferred probe pending
[ 11.764571] pci 0000:04:01.0: deferred probe pending
[ 11.764657] pci 0000:04:02.0: deferred probe pending
[ 11.764743] pci 0000:00:1f.0: deferred probe pending
[ 11.764829] pci 0000:01:00.0: deferred probe pending
[ 11.764915] pci 0000:05:00.0: deferred probe pending
(no probe deferral happens for working boots.)
On Tue, 16 Apr 2024 at 05:35, Mark Brown broonie@kernel.org wrote:
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
I'm seeing boot breakage with this one on the Arm fast models, a bisect is running now, for slow values of run but should be done by the time I get back tonight. It only seems to be affecting 6.6, the boot grinds to a halt shortly after getting to userspace apparently with some PCI/virtio issues:
LKFT also noticed the problem that Mark Brown reported.
[ 1.606075] VFS: Mounted root (ext4 filesystem) on device 254:1.
[ 1.608751] devtmpfs: mounted
[ 1.627412] Freeing unused kernel memory: 9152K
[ 1.627894] Run /sbin/init as init process
[ 1.627957] with arguments:
[ 1.628009] /sbin/init
[ 1.628064] Image
[ 1.628117] with environment:
[ 1.628169] HOME=/
[ 1.628222] TERM=linux
[ 1.628275] user_debug=31
[ 11.764055] pci 0000:00:01.0: deferred probe pending
[ 11.764141] pci 0000:00:02.0: deferred probe pending
[ 11.764227] pci 0000:00:03.0: deferred probe pending
[ 11.764313] pci 0000:00:04.0: deferred probe pending
[ 11.764399] pci 0000:03:00.0: deferred probe pending
[ 11.764485] pci 0000:04:00.0: deferred probe pending
[ 11.764571] pci 0000:04:01.0: deferred probe pending
[ 11.764657] pci 0000:04:02.0: deferred probe pending
[ 11.764743] pci 0000:00:1f.0: deferred probe pending
[ 11.764829] pci 0000:01:00.0: deferred probe pending
[ 11.764915] pci 0000:05:00.0: deferred probe pending
(no probe deferral happens for working boots.)
-- Linaro LKFT https://lkft.linaro.org
On 4/15/24 7:19 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
Hi Greg,
On 15/04/24 19:49, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
No problems seen on x86_64 and aarch64 with our testing.
Tested-by: Harshit Mogalapalli harshit.m.mogalapalli@oracle.com
Thanks, Harshit
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
Hi Greg
On Mon, Apr 15, 2024 at 11:35 PM Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 17 Apr 2024 14:19:30 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.6.28-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y and the diffstat can be found below.
thanks,
greg k-h
6.6.28-rc1 tested.
Build successfully completed. Boot successfully completed. No dmesg regressions. Video output normal. Sound output normal.
Lenovo ThinkPad X1 Carbon Gen10(Intel i7-1260P(x86_64) arch linux)
[ 0.000000] Linux version 6.6.28-rc1rv (takeshi@ThinkPadX1Gen10J0764) (gcc (GCC) 13.2.1 20230801, GNU ld (GNU Binutils) 2.42.0) #1 SMP PREEMPT_DYNAMIC Tue Apr 16 18:32:52 JST 2024
Thanks
Tested-by: Takeshi Ogasawara takeshi.ogasawara@futuring-girl.com
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
The bisect of the boot issue that's affecting the FVP in v6.6 (only) landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand), e3ba51ab24fdd in mainline, as being the first bad commit - it's also in the -rc for v6.8 but that seems fine. I've done no investigation beyond the bisect and looking at the commit log to pull out people to CC and note that the fix was explicitly targeted at v6.6.
Bisect log:
# bad: [a4e5ff3532873150dc32d20f5c214ec59f98bcd2] Linux 6.6.28-rc1
# good: [5e828009c8b380739e13da92be847f10461c38b1] Linux 6.6.27
git bisect start 'a4e5ff3532873150dc32d20f5c214ec59f98bcd2' '5e828009c8b380739e13da92be847f10461c38b1'
# bad: [a4e5ff3532873150dc32d20f5c214ec59f98bcd2] Linux 6.6.28-rc1
git bisect bad a4e5ff3532873150dc32d20f5c214ec59f98bcd2
# bad: [f95afc8867d1f2e18e0c6abd16ca76c99a2839be] net/mlx5e: HTB, Fix inconsistencies with QoS SQs number
git bisect bad f95afc8867d1f2e18e0c6abd16ca76c99a2839be
# bad: [06e82fe83cc671df58a956cd0cf8ba64c15a6d0d] scsi: qla2xxx: Fix off by one in qla_edif_app_getstats()
git bisect bad 06e82fe83cc671df58a956cd0cf8ba64c15a6d0d
# bad: [d2b5692676e7a204487546699cd5511baad5e9b6] ARM: OMAP2+: fix bogus MMC GPIO labels on Nokia N8x0
git bisect bad d2b5692676e7a204487546699cd5511baad5e9b6
# bad: [a438d050bf7ba5e3462dd61d90897569e7892c80] raid1: fix use-after-free for original bio in raid1_write_request()
git bisect bad a438d050bf7ba5e3462dd61d90897569e7892c80
# good: [6e869ee886dead911b2411c7cba816be52dffb19] ata: libata-scsi: Fix ata_scsi_dev_rescan() error path
git bisect good 6e869ee886dead911b2411c7cba816be52dffb19
# bad: [c9ad150ed8dd988d1cefc1a8e19df53d46990e76] arm64: tlb: Fix TLBI RANGE operand
git bisect bad c9ad150ed8dd988d1cefc1a8e19df53d46990e76
# good: [56a6896c1f107d519c0045dd6575648745bcba21] batman-adv: Avoid infinite loop trying to resize local TT
git bisect good 56a6896c1f107d519c0045dd6575648745bcba21
# first bad commit: [c9ad150ed8dd988d1cefc1a8e19df53d46990e76] arm64: tlb: Fix TLBI RANGE operand
On Tue, 16 Apr 2024 11:34:14 +0100, Mark Brown broonie@kernel.org wrote:
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
The bisect of the boot issue that's affecting the FVP in v6.6 (only) landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand), e3ba51ab24fdd in mainline, as being the first bad commit - it's also in the -rc for v6.8 but that seems fine. I've done no investigation beyond the bisect and looking at the commit log to pull out people to CC and note that the fix was explicitly targeted at v6.6.
What are the configurations of the kernel and the FVP?
M.
On Tue, Apr 16, 2024 at 12:04:29PM +0100, Marc Zyngier wrote:
Mark Brown broonie@kernel.org wrote:
The bisect of the boot issue that's affecting the FVP in v6.6 (only) landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand), e3ba51ab24fdd in mainline, as being the first bad commit - it's also in the -rc for v6.8 but that seems fine. I've done no investigation beyond the bisect and looking at the commit log to pull out people to CC and note that the fix was explicitly targeted at v6.6.
What are the configurations of the kernel and the FVP?
The kernel is a defconfig, the FVP arguments can be seen in the log from the job here:
https://lava.sirena.org.uk/scheduler/job/148281#L233
(sorry, should've included that in the earlier mail.)
On Tue, 16 Apr 2024 at 16:04, Mark Brown broonie@kernel.org wrote:
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
The bisect of the boot issue that's affecting the FVP in v6.6 (only) landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand), e3ba51ab24fdd in mainline, as being the first bad commit - it's also in the -rc for v6.8 but that seems fine. I've done no investigation beyond the bisect and looking at the commit log to pull out people to CC and note that the fix was explicitly targeted at v6.6.
Anders investigated the reported issue, bisected it, and found that the commit missing from stable-rc 6.6 is e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
-- Linaro LKFT https://lkft.linaro.org
On Tue, 16 Apr 2024 14:07:30 +0100, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 16 Apr 2024 at 16:04, Mark Brown broonie@kernel.org wrote:
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
The bisect of the boot issue that's affecting the FVP in v6.6 (only) landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand), e3ba51ab24fdd in mainline, as being the first bad commit - it's also in the -rc for v6.8 but that seems fine. I've done no investigation beyond the bisect and looking at the commit log to pull out people to CC and note that the fix was explicitly targeted at v6.6.
Anders investigated this reported issues and bisected and also found the missing commit for stable-rc 6.6 is e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
Which is definitely *not* a stable candidate. We need to understand why the invalidation goes south when the scale goes up instead of down.
M.
On Tue, Apr 16, 2024 at 02:22:07PM +0100, Marc Zyngier wrote:
On Tue, 16 Apr 2024 14:07:30 +0100, Naresh Kamboju naresh.kamboju@linaro.org wrote:
On Tue, 16 Apr 2024 at 16:04, Mark Brown broonie@kernel.org wrote:
On Mon, Apr 15, 2024 at 04:19:25PM +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
The bisect of the boot issue that's affecting the FVP in v6.6 (only) landed on c9ad150ed8dd988 (arm64: tlb: Fix TLBI RANGE operand), e3ba51ab24fdd in mainline, as being the first bad commit - it's also in the -rc for v6.8 but that seems fine. I've done no investigation beyond the bisect and looking at the commit log to pull out people to CC and note that the fix was explicitly targeted at v6.6.
Anders investigated this reported issues and bisected and also found the missing commit for stable-rc 6.6 is e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
Which is definitely *not* stable candidate. We need to understand why the invalidation goes south when the scale go up instead of down.
If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand") which fixes 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my CBMC model, not on the actual kernel. It may be worth adding some WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or num greater than 31.
I haven't investigated properly (and I'm off tomorrow, back on Thu) but it's likely the original code was not very friendly to the maximum range, never tested. Anyway, if one figures out why it goes out of range, I think the solution is to also backport e2768b798a19 to stable.
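The out-of-range behaviour described above can be explored with a small user-space model. The sketch below is not the kernel code: the helper names (`range_pages`, `range_num_clamped`, `walk_leaves_valid_range`) are hypothetical, and the formulas are assumptions chosen to mimic a clamped num calculation and a loop that walks scale either upward from 0 or downward from 3. It reports whether a walk ever needs a scale outside 0..3 or a num above 31, which is the condition the suggested WARN_ONs would catch.

```c
#include <assert.h>
#include <stdbool.h>

/* Pages covered by one range operation (assumed encoding, not kernel code). */
static unsigned long range_pages(int num, int scale)
{
	return (unsigned long)(num + 1) << (5 * scale + 1);
}

/*
 * Assumed "clamped" num calculation: cap the page count at the most that
 * a single operation at this scale can cover, then convert it to num.
 * Returns -1 when this scale cannot cover any of the remaining pages.
 */
static int range_num_clamped(unsigned long pages, int scale)
{
	unsigned long cap = range_pages(31, scale);
	unsigned long p = pages < cap ? pages : cap;

	return (int)(p >> (5 * scale + 1)) - 1;
}

/*
 * Hypothetical model of the invalidation loop.
 * step = +1 starts at scale 0 and counts up; step = -1 starts at 3 and
 * counts down. Returns true if, with pages still pending, the walk needs
 * a scale outside 0..3 or computes a num greater than 31.
 */
static bool walk_leaves_valid_range(unsigned long pages, int step)
{
	int scale = (step > 0) ? 0 : 3;

	assert(step == 1 || step == -1);

	while (pages > 0) {
		if (pages % 2 == 1) {
			/* Odd count: model a single-page invalidation. */
			pages -= 1;
			continue;
		}
		if (scale < 0 || scale > 3)
			return true;	/* scale out of range */

		int num = range_num_clamped(pages, scale);

		if (num > 31)
			return true;	/* num out of range */
		if (num >= 0)
			pages -= range_pages(num, scale);
		scale += step;
	}
	return false;
}
```

Under these assumptions, a range of `1UL << 18` pages walked upward from scale 0 reaches scale 4 with pages still pending, while the same range walked downward from scale 3 completes in range. That would be consistent with the report that the fix behaves once scale is decremented, but it is only a model and says nothing definitive about the real macro.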
On Tue, Apr 16, 2024 at 06:28:10PM +0100, Catalin Marinas wrote:
If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand") which fixes 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my CBMC model, not on the actual kernel. It may be worth adding some WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or num greater than 31.
I haven't investigated properly (and I'm off tomorrow, back on Thu) but it's likely the original code was not very friendly to the maximum range, never tested. Anyway, if one figures out why it goes out of range, I think the solution is to also backport e2768b798a19 to stable.
How about I drop the offending commit from stable and let you all figure out what needs to be added before applying anything else :)
thanks,
greg k-h
On Wed, Apr 17, 2024 at 09:05:12AM +0200, Greg Kroah-Hartman wrote:
How about I drop the offending commit from stable and let you all figure out what needs to be added before applying anything else :)
It makes sense ;). We'll send them to stable once sorted.
On Tue, 16 Apr 2024 18:28:10 +0100, Catalin Marinas catalin.marinas@arm.com wrote:
If you backport e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand") which fixes 117940aa6e5f ("KVM: arm64: Define kvm_tlb_flush_vmid_range()") but without the newer e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale"), it looks like "scale" in __flush_tlb_range_op() goes out of range to 4. Tested on my CBMC model, not on the actual kernel. It may be worth adding some WARN_ONs in __flush_tlb_range_op() if scale is outside the 0..3 range or num greater than 31.
I haven't investigated properly (and I'm off tomorrow, back on Thu) but it's likely the original code was not very friendly to the maximum range, never tested. Anyway, if one figures out why it goes out of range, I think the solution is to also backport e2768b798a19 to stable.
I looked into this, and I came to the conclusion that this patch is pretty much incompatible with the increasing scale (even if you cap num to 30).
The number of pages to invalidate is a 20 bit quantity, a 5 bit slice per scale. With the 6.6 approach (limit of num=30 and increasing scale), we invalidate each 5 bit slice independently. After each scale round, the corresponding slice is guaranteed to be 0.
With the 6.9 method, we invalidate the maximum possible for a given scale. With a decreasing scale, we converge towards 0 or 1 on each round. With an increasing scale, this breaks spectacularly, because the strong guarantee that the remaining page count is "aligned" to 2^(5*scale+1) is not valid anymore (the low bits may not be 0).
As a result, we don't converge because we never consider these low bits anymore, the page count doesn't decrease, scale goes past 3, and everything catches fire.
So despite my earlier comment, it looks like picking e2768b798a19 is the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.
Otherwise, we need a separate fix, which Ryan was initially advocating for.
Thanks,
M.
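As a rough illustration of the convergence argument above, the following user-space C model walks a page count with either an increasing or a decreasing scale. It is not kernel code: num_slice() and num_max() are approximations of the 6.6 and 6.9 ways of picking num as described above, all names are hypothetical, and the actual TLBI instructions are replaced by plain subtraction.

```c
#include <assert.h>
#include <stdbool.h>

/* A range operation at (num, scale) covers (num + 1) << (5 * scale + 1) pages. */
#define SHIFT(scale)            (5 * (scale) + 1)
#define RANGE_PAGES(num, scale) ((unsigned long)((num) + 1) << SHIFT(scale))
#define MAX_RANGE_PAGES         RANGE_PAGES(31, 3)

typedef int (*num_fn_t)(unsigned long pages, int scale);

/* Assumed "6.6" style: use only the 5-bit slice for this scale (num in -1..30). */
int num_slice(unsigned long pages, int scale)
{
	return (int)((pages >> SHIFT(scale)) & 0x1f) - 1;
}

/* Assumed "6.9" style: take as much as this scale can cover (num in -1..31). */
int num_max(unsigned long pages, int scale)
{
	unsigned long cap = RANGE_PAGES(31, scale);
	unsigned long p = pages < cap ? pages : cap;

	return (int)(p >> SHIFT(scale)) - 1;
}

/* Scale goes 0, 1, 2, 3; returns false if scale would have to go past 3. */
bool converges_increasing(unsigned long pages, num_fn_t num_fn)
{
	int scale = 0;

	assert(pages <= MAX_RANGE_PAGES);
	while (pages > 0) {
		if (pages % 2 == 1) {	/* odd count: one single-page invalidation */
			pages--;
			continue;
		}
		if (scale > 3)
			return false;
		int num = num_fn(pages, scale);
		if (num >= 0)
			pages -= RANGE_PAGES(num, scale);
		scale++;
	}
	return true;
}

/* Scale goes 3, 2, 1, 0; returns false if scale would have to go below 0. */
bool converges_decreasing(unsigned long pages, num_fn_t num_fn)
{
	int scale = 3;

	assert(pages <= MAX_RANGE_PAGES);
	while (pages > 0) {
		if (pages == 1) {	/* last page: one single-page invalidation */
			pages--;
			continue;
		}
		if (scale < 0)
			return false;
		int num = num_fn(pages, scale);
		if (num >= 0)
			pages -= RANGE_PAGES(num, scale);
		scale--;
	}
	return true;
}
```

In this model, 66 pages converge with num_slice() and an increasing scale, and with num_max() and a decreasing scale. With num_max() and an increasing scale, scale 0 consumes 64 pages, the remaining 2 pages are never matched by any higher scale, and the walk runs past scale 3.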
On Thu, Apr 18, 2024 at 12:07:35PM +0100, Marc Zyngier wrote:
I looked into this, and I came to the conclusion that this patch is pretty much incompatible with the increasing scale (even if you cap num to 30).
Thanks Marc for digging into this.
So despite my earlier comment, it looks like picking e2768b798a19 is the right thing to do *if* we're taking e3ba51ab24fd into 6.6-stable.
Otherwise, we need a separate fix, which Ryan was initially advocating for.
My preference would be to cherry-pick the two upstream commits rather than come up with an alternative fix for 6.6.
On Thu, Apr 18, 2024 at 12:21:17PM +0100, Catalin Marinas wrote:
My preference would be to cherry-pick the two upstream commits rather than come up with an alternative fix for 6.6.
To be specific, which 2 commits, and what order?
thanks,
greg k-h
On Fri, 19 Apr 2024 11:40:33 +0100, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
To be specific, which 2 commits, and what order?
That'd be:
e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
followed by:
e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
Thanks,
M.
On Fri, Apr 19, 2024 at 11:50:14AM +0100, Marc Zyngier wrote:
That'd be:
e2768b798a19 ("arm64/mm: Modify range-based tlbi to decrement scale")
followed by:
e3ba51ab24fd ("arm64: tlb: Fix TLBI RANGE operand")
Thanks, now queued up.
greg k-h
On Mon, 15 Apr 2024 16:19:25 +0200, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
All tests passing for Tegra ...
Test results for stable-v6.6:
    10 builds:	10 pass, 0 fail
    26 boots:	26 pass, 0 fail
    102 tests:	102 pass, 0 fail

Linux version:	6.6.28-rc1-ga4e5ff353287
Boards tested:	tegra124-jetson-tk1, tegra186-p2771-0000, tegra194-p2972-0000, tegra194-p3509-0000+p3668-0000, tegra20-ventana, tegra210-p2371-2180, tegra210-p3450-0000, tegra30-cardhu-a04
Tested-by: Jon Hunter jonathanh@nvidia.com
Jon
[2024-04-15 16:19] Greg Kroah-Hartman:
This is the start of the stable review cycle for the 6.6.28 release. There are 122 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Hi, 6.6.28-rc1 is running fine on both an x86_64 Haswell VM and a Mikrotik SXTsq 5 ac (the SoC is a Qualcomm Atheros IPQ4018, which has 4 Cortex-A7 cores).
Regards Pascal