Hi All,
This patch series adds initial support for the HEVC (H.265) and VP9
codecs in the iris decoder. The objective of this work is to extend the
decoder's capabilities to handle HEVC and VP9 codec streams,
including necessary format handling and buffer management.
In addition, the series also includes a set of fixes to address issues
identified during testing of these additional codecs.
These patches also address the comments and feedback received from the
RFC patches previously sent. I have made the necessary improvements
based on the community's suggestions.
Changes in v5:
- Split patch 01/25 into two patches (Bryan)
- Link to v4: https://lore.kernel.org/r/20250507-video-iris-hevc-vp9-v4-0-58db3660ac61@qu…
Changes in v4:
- Split patch 06/23 into two patches (Bryan)
- Simplified the conditional logic in patch 13/23 (Bryan)
- Improved commit description for patch 13/23 (Nicolas)
- Fixed the value of H265_NUM_TILE_ROW macro (Neil)
- Link to v3: https://lore.kernel.org/r/20250502-qcom-iris-hevc-vp9-v3-0-552158a10a7d@qui…
Changes in v3:
- Introduced two wrappers with explicit names to handle destroying internal
buffers (Nicolas)
- Used a sub-state check instead of introducing a new boolean (Vikash)
- Addressed other comments (Vikash)
- Reordered patches to have all fix patches first (Dmitry)
- Link to v2:
https://lore.kernel.org/r/20250428-qcom-iris-hevc-vp9-v2-0-3a6013ecb8a5@qui…
Changes in v2:
- Added changes to make sure all buffers are released on session close
(Bryan)
- Added tracking for flush responses to fix a timing issue.
- Added handling to fix a timing issue in reconfig
- Split patch 06/20 into two patches (Bryan)
- Added missing fixes tag (Bryan)
- Updated fluster report (Nicolas)
- Link to v1:
https://lore.kernel.org/r/20250408-iris-dec-hevc-vp9-v1-0-acd258778bd6@quic…
Changes since RFC:
- Added additional fixes to address issues identified during further
testing.
- Moved typo fix to a separate patch [Neil]
- Reordered the patches for better logical flow and clarity [Neil,
Dmitry]
- Added fixes tag wherever applicable [Neil, Dmitry]
- Removed the default case in the switch statement for codecs [Bryan]
- Replaced if-else statements with switch-case [Bryan]
- Added comments for mbpf [Bryan]
- RFC:
https://lore.kernel.org/linux-media/20250305104335.3629945-1-quic_dikshita@…
These patches are tested on SM8250 and SM8550 with v4l2-ctl and
GStreamer for the HEVC and VP9 decoders, while ensuring that the
existing H264 decoder functionality remains unaffected.
Note: one of the fluster compliance tests is fixed by firmware [3]
[3]:
https://lore.kernel.org/linux-firmware/1a511921-446d-cdc4-0203-084c88a5dc1e…
The result of fluster test on SM8550:
131/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test cases:
- 10 testcases failed due to unsupported 10-bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcases failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
- 2 testcases failed due to CRC mismatch
- RAP_A_docomo_6
- RAP_B_Bossen_2
- Bug reported:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4392
Analysis: the first few frames in these streams are discarded by the
firmware and sent to the driver with a filled length of 0. The driver
returns such buffers to the client with timestamp 0, payload 0 and
state VB2_BUF_STATE_ERROR. GStreamer should drop these buffers.
Instead, the first one is displayed as a green frame, and when a valid
buffer with the same timestamp 0 is later sent to the client, it is
dropped, leading to a CRC mismatch for the first frame.
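The client-side behaviour the analysis expects can be sketched as a small filter. The sketch uses a hypothetical, trimmed-down buffer struct instead of `struct v4l2_buffer`. `SKETCH_BUF_FLAG_ERROR` only stands in for `V4L2_BUF_FLAG_ERROR` and does not carry its real value. The point is the ordering: an error or empty buffer is discarded before it can claim a timestamp, so a later valid buffer with the same timestamp is not mistaken for a duplicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for V4L2_BUF_FLAG_ERROR (not the real header value). */
#define SKETCH_BUF_FLAG_ERROR (1u << 0)

/* Hypothetical, trimmed-down view of a dequeued CAPTURE buffer. */
struct sketch_buf {
	uint32_t flags;
	uint32_t bytesused;     /* 0 for frames the firmware discarded */
	uint64_t timestamp_ns;
};

/* Per-stream state: timestamp of the last frame actually displayed. */
struct sketch_sink {
	bool have_last;
	uint64_t last_ts_ns;
};

/*
 * Returns true if the buffer should be displayed.
 * Error/empty buffers are dropped before the duplicate-timestamp check,
 * so they never record a timestamp that a later valid frame reuses.
 */
static bool sketch_should_display(struct sketch_sink *sink,
				  const struct sketch_buf *buf)
{
	if ((buf->flags & SKETCH_BUF_FLAG_ERROR) || buf->bytesused == 0)
		return false;

	if (sink->have_last && buf->timestamp_ns == sink->last_ts_ns)
		return false;   /* genuine duplicate */

	sink->have_last = true;
	sink->last_ts_ns = buf->timestamp_ns;
	return true;
}
```

Fed an error buffer with timestamp 0 and then a valid buffer with timestamp 0, the filter drops the first buffer and displays the second. This is the outcome the fluster CRC check expects.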
235/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test cases:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug reported:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 2 testcases failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- 1 testcase failed due to unsupported stream
- vp90-2-16-intra-only.webm
The result of fluster test on SM8250:
133/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test cases:
- 10 testcases failed due to unsupported 10-bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcases failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
232/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test cases:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug raised:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 5 testcases failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- vp90-2-21-resize_inter_320x240_5_1-2.webm
- vp90-2-21-resize_inter_320x240_7_1-2.webm
- vp90-2-18-resize.ivf
- 1 testcase failed with CRC mismatch
- vp90-2-16-intra-only.webm
Analysis: the first few frames are marked by the firmware as NO_SHOW
frames, and the driver sets the buffer state to VB2_BUF_STATE_ERROR for
them. GStreamer should drop such buffers. Instead, the first frame is
displayed, and when a valid buffer with the same timestamp is later
sent to the client, it is dropped, leading to a CRC mismatch for the
first frame.
To: Vikash Garodia <quic_vgarodia(a)quicinc.com>
To: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
To: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
To: Stefan Schmidt <stefan.schmidt(a)linaro.org>
To: Hans Verkuil <hverkuil(a)xs4all.nl>
Cc: linux-media(a)vger.kernel.org
Cc: linux-arm-msm(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)oss.qualcomm.com>
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Cc: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Dikshita Agarwal <quic_dikshita(a)quicinc.com>
---
Dikshita Agarwal (26):
media: iris: Skip destroying internal buffer if not dequeued
media: iris: Verify internal buffer release on close
media: iris: Update CAPTURE format info based on OUTPUT format
media: iris: Avoid updating frame size to firmware during reconfig
media: iris: Drop port check for session property response
media: iris: Prevent HFI queue writes when core is in deinit state
media: iris: Remove error check for non-zero v4l2 controls
media: iris: Remove deprecated property setting to firmware
media: iris: Fix missing function pointer initialization
media: iris: Fix NULL pointer dereference
media: iris: Fix typo in depth variable
media: iris: Track flush responses to prevent premature completion
media: iris: Fix buffer preparation failure during resolution change
media: iris: Send V4L2_BUF_FLAG_ERROR for capture buffers with 0 filled length
media: iris: Skip flush on first sequence change
media: iris: Remove unnecessary re-initialization of flush completion
media: iris: Add handling for corrupt and drop frames
media: iris: Add handling for no show frames
media: iris: Improve last flag handling
media: iris: Remove redundant buffer count check in stream off
media: iris: Add a comment to explain usage of MBPS
media: iris: Add HEVC and VP9 formats for decoder
media: iris: Add platform capabilities for HEVC and VP9 decoders
media: iris: Set mandatory properties for HEVC and VP9 decoders.
media: iris: Add internal buffer calculation for HEVC and VP9 decoders
media: iris: Add codec specific check for VP9 decoder drain handling
drivers/media/platform/qcom/iris/iris_buffer.c | 35 +-
drivers/media/platform/qcom/iris/iris_buffer.h | 3 +-
drivers/media/platform/qcom/iris/iris_ctrls.c | 35 +-
drivers/media/platform/qcom/iris/iris_hfi_common.h | 1 +
.../platform/qcom/iris/iris_hfi_gen1_command.c | 48 ++-
.../platform/qcom/iris/iris_hfi_gen1_defines.h | 5 +-
.../platform/qcom/iris/iris_hfi_gen1_response.c | 37 +-
.../platform/qcom/iris/iris_hfi_gen2_command.c | 143 +++++++-
.../platform/qcom/iris/iris_hfi_gen2_defines.h | 5 +
.../platform/qcom/iris/iris_hfi_gen2_response.c | 56 ++-
drivers/media/platform/qcom/iris/iris_hfi_queue.c | 2 +-
drivers/media/platform/qcom/iris/iris_instance.h | 6 +
.../platform/qcom/iris/iris_platform_common.h | 28 +-
.../media/platform/qcom/iris/iris_platform_gen2.c | 198 ++++++++--
.../platform/qcom/iris/iris_platform_qcs8300.h | 126 +++++--
.../platform/qcom/iris/iris_platform_sm8250.c | 15 +-
drivers/media/platform/qcom/iris/iris_state.c | 2 +-
drivers/media/platform/qcom/iris/iris_state.h | 1 +
drivers/media/platform/qcom/iris/iris_vb2.c | 18 +-
drivers/media/platform/qcom/iris/iris_vdec.c | 116 +++---
drivers/media/platform/qcom/iris/iris_vdec.h | 11 +
drivers/media/platform/qcom/iris/iris_vidc.c | 36 +-
drivers/media/platform/qcom/iris/iris_vpu_buffer.c | 397 ++++++++++++++++++++-
drivers/media/platform/qcom/iris/iris_vpu_buffer.h | 46 ++-
24 files changed, 1159 insertions(+), 211 deletions(-)
---
base-commit: b64b134942c8cf4801ea288b3fd38b509aedec21
change-id: 20250508-video-iris-hevc-vp9-bd35d588500f
Best regards,
--
Dikshita Agarwal <quic_dikshita(a)quicinc.com>
The two alarm LEDs on the uDPU board have stopped working since
commit 78efa53e715e ("leds: Init leds class earlier").
The LEDs are driven by the GPIO{15,16} pins of the North Bridge
GPIO controller. These pins are part of the 'spi_quad' pin group
for which the 'spi' function is selected via the default pinctrl
state of the 'spi' node. This is wrong, however, since in order to
allow controlling the LEDs, the pins should use the 'gpio' function.
Before the commit mentioned above, the 'spi' function was selected
first by the pinctrl core before probing the spi driver, but then
it got overridden to 'gpio' implicitly via the
devm_gpiod_get_index_optional() call from the 'leds-gpio' driver.
After the commit, the LED subsystem gets initialized before the
SPI subsystem, so the function of the pin group remains 'spi',
which in turn prevents controlling the LEDs.
Although the change in initialization order exposed the problem, the
root cause is that the pinctrl state definition has been wrong since
its initial commit 0d45062cfc89 ("arm64: dts: marvell: Add device
tree for uDPU board").
To fix the problem, override the function in the 'spi_quad_pins'
node to 'gpio' and move the pinctrl state definition from the
'spi' node into the 'leds' node.
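The ordering dependency can be modelled in a few lines. The sketch uses hypothetical `sketch_*` names and treats the whole pin group as a single mux value. It assumes two rules, both taken from the description above. First, the pinctrl core applies a node's default state just before that node's driver probes. Second, a GPIO request from leds-gpio implicitly switches the group to 'gpio'. With the state on the 'spi' node, the result depends on probe order. With the state on the 'leds' node and the function set to 'gpio', it does not.

```c
#include <assert.h>

/* Function currently selected for the 'spi_quad' pin group (simplified). */
enum sketch_func { SKETCH_FUNC_SPI, SKETCH_FUNC_GPIO };

/* Which node's default pinctrl state references &spi_quad_pins. */
struct sketch_dt {
	int spi_has_state;
	int leds_has_state;
	enum sketch_func state_func;    /* 'function' of spi_quad_pins */
};

static void sketch_probe_spi(const struct sketch_dt *dt, enum sketch_func *mux)
{
	/* Default state is applied before the driver's probe. */
	if (dt->spi_has_state)
		*mux = dt->state_func;
}

static void sketch_probe_leds(const struct sketch_dt *dt, enum sketch_func *mux)
{
	if (dt->leds_has_state)
		*mux = dt->state_func;
	/* Requesting GPIO15/16 implicitly switches the group to 'gpio'. */
	*mux = SKETCH_FUNC_GPIO;
}

/* Final function of the pin group for a given probe order. */
static enum sketch_func sketch_boot(const struct sketch_dt *dt, int leds_first)
{
	enum sketch_func mux = SKETCH_FUNC_SPI;  /* assumed initial value */

	if (leds_first) {
		sketch_probe_leds(dt, &mux);
		sketch_probe_spi(dt, &mux);
	} else {
		sketch_probe_spi(dt, &mux);
		sketch_probe_leds(dt, &mux);
	}
	return mux;
}
```

With `{ 1, 0, SKETCH_FUNC_SPI }` (the old DTS), `sketch_boot()` yields 'gpio' only when spi probes first. With `{ 0, 1, SKETCH_FUNC_GPIO }` (the fixed DTS), it yields 'gpio' for both orders.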
Cc: stable(a)vger.kernel.org # needs adjustment for < 6.1
Fixes: 0d45062cfc89 ("arm64: dts: marvell: Add device tree for uDPU board")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
Signed-off-by: Imre Kaloz <kaloz(a)openwrt.org>
---
Notes:
1. The DTB check shows a number of warnings, but none of them are new:
DTC [C] arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/watchdog@8300: failed to match any schema with compatible: ['marvell,armada-3700-wdt']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/serial@12000: failed to match any schema with compatible: ['marvell,armada-3700-uart']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/serial@12200: failed to match any schema with compatible: ['marvell,armada-3700-uart-ext']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/nb-periph-clk@13000: failed to match any schema with compatible: ['marvell,armada-3700-periph-clock-nb', 'syscon']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/sb-periph-clk@18000: failed to match any schema with compatible: ['marvell,armada-3700-periph-clock-sb']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/tbg@13200: failed to match any schema with compatible: ['marvell,armada-3700-tbg-clock']
<...>/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: pinctrl@13800: reg: [[79872, 256], [80896, 32]] is too long
from schema $id: http://devicetree.org/schemas/mfd/syscon-common.yaml#
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/pinctrl@13800: failed to match any schema with compatible: ['marvell,armada3710-nb-pinctrl', 'syscon', 'simple-mfd']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/pinctrl@13800/xtal-clk: failed to match any schema with compatible: ['marvell,armada-3700-xtal-clock']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/phy@18300: failed to match any schema with compatible: ['marvell,comphy-a3700']
<...>/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: pinctrl@18800: reg: [[100352, 256], [101376, 32]] is too long
from schema $id: http://devicetree.org/schemas/mfd/syscon-common.yaml#
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/pinctrl@18800: failed to match any schema with compatible: ['marvell,armada3710-sb-pinctrl', 'syscon', 'simple-mfd']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/ethernet@30000: failed to match any schema with compatible: ['marvell,armada-3700-neta']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/ethernet@40000: failed to match any schema with compatible: ['marvell,armada-3700-neta']
<...>/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: usb@58000: Unevaluated properties are not allowed ('marvell,usb-misc-reg' was unexpected)
from schema $id: http://devicetree.org/schemas/usb/generic-xhci.yaml#
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/system-controller@5d800: failed to match any schema with compatible: ['marvell,armada-3700-usb2-host-device-misc', 'syscon']
<...>/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: usb@5e000: phy-names:0: 'usb' was expected
from schema $id: http://devicetree.org/schemas/usb/generic-ehci.yaml#
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/xor@60900: failed to match any schema with compatible: ['marvell,armada-3700-xor']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/bus@d0000000/mailbox@b0000: failed to match any schema with compatible: ['marvell,armada-3700-rwtm-mailbox']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /soc/pcie@d0070000: failed to match any schema with compatible: ['marvell,armada-3700-pcie']
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtb: /firmware/armada-3700-rwtm: failed to match any schema with compatible: ['marvell,armada-3700-rwtm-firmware']
2. Just for the record, here is the bisect log:
git bisect start
# status: waiting for both good and bad commits
# bad: [7cdabafc001202de9984f22c973305f424e0a8b7] Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
git bisect bad 7cdabafc001202de9984f22c973305f424e0a8b7
# status: waiting for good commit(s), bad commit known
# good: [0c3836482481200ead7b416ca80c68a29cfdaabd] Linux 6.10
git bisect good 0c3836482481200ead7b416ca80c68a29cfdaabd
# bad: [fcc79e1714e8c2b8e216dc3149812edd37884eef] Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
git bisect bad fcc79e1714e8c2b8e216dc3149812edd37884eef
# good: [26bb0d3f38a764b743a3ad5c8b6e5b5044d7ceb4] Merge tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux
git bisect good 26bb0d3f38a764b743a3ad5c8b6e5b5044d7ceb4
# bad: [5e5466433d266046790c0af40a15af0a6be139a1] Merge tag 'char-misc-6.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
git bisect bad 5e5466433d266046790c0af40a15af0a6be139a1
# good: [de848da12f752170c2ebe114804a985314fd5a6a] Merge tag 'drm-next-2024-09-19' of https://gitlab.freedesktop.org/drm/kernel
git bisect good de848da12f752170c2ebe114804a985314fd5a6a
# bad: [962ad08780a5bfb3240bc793e565181eacfceafb] Merge tag 'pinctrl-v6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
git bisect bad 962ad08780a5bfb3240bc793e565181eacfceafb
# good: [440b65232829fad69947b8de983c13a525cc8871] Merge tag 'bpf-next-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
git bisect good 440b65232829fad69947b8de983c13a525cc8871
# good: [f8ffbc365f703d74ecca8ca787318d05bbee2bf7] Merge tag 'pull-stable-struct_fd' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
git bisect good f8ffbc365f703d74ecca8ca787318d05bbee2bf7
# good: [18ba6034468e7949a9e2c2cf28e2e123b4fe7a50] Merge tag 'nfsd-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux
git bisect good 18ba6034468e7949a9e2c2cf28e2e123b4fe7a50
# bad: [bb78146c18ac67f22cabb2448b501bcac30f8801] Merge branch 'pci/controller/xilinx'
git bisect bad bb78146c18ac67f22cabb2448b501bcac30f8801
# bad: [b893f8ea38c530c2c8a337c3429f9f37e6bf65e8] Merge branch 'pci/controller/brcmstb'
git bisect bad b893f8ea38c530c2c8a337c3429f9f37e6bf65e8
# bad: [207bcb73fb08841e242fa1d66e1d0381836da562] Merge branch 'pci/dt-bindings'
git bisect bad 207bcb73fb08841e242fa1d66e1d0381836da562
# good: [e642aa6b38762a2af3a7e0c5e6dac5841c15dea0] Merge branch 'pci/iommu'
git bisect good e642aa6b38762a2af3a7e0c5e6dac5841c15dea0
# good: [f500a2f1282750fb344ce535d78071cf1493efd1] dt-bindings: PCI: imx6q-pcie: Add reg-name "dbi2" and "atu" for i.MX8M PCIe Endpoint
git bisect good f500a2f1282750fb344ce535d78071cf1493efd1
# bad: [d774674f3492740503a3cd3f5da131d088202f1b] Merge branch 'pci/pwrctl'
git bisect bad d774674f3492740503a3cd3f5da131d088202f1b
# bad: [759ec28242894f2006a1606c1d6e9aca48cecfcf] PCI/NPEM: Add _DSM PCIe SSD status LED management
git bisect bad 759ec28242894f2006a1606c1d6e9aca48cecfcf
# bad: [4e893545ef8712d25f3176790ebb95beb073637e] PCI/NPEM: Add Native PCIe Enclosure Management support
git bisect bad 4e893545ef8712d25f3176790ebb95beb073637e
# bad: [78efa53e715e21a97c722dba20f8437a0860521e] leds: Init leds class earlier
git bisect bad 78efa53e715e21a97c722dba20f8437a0860521e
# first bad commit: [78efa53e715e21a97c722dba20f8437a0860521e] leds: Init leds class earlier
---
arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtsi | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtsi b/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtsi
index 3a9b6907185d0363dff41178543a0210ce99dbf7..24282084570787630cb0beeab3997b943bdf45dc 100644
--- a/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtsi
+++ b/arch/arm64/boot/dts/marvell/armada-3720-uDPU.dtsi
@@ -26,6 +26,8 @@ memory@0 {
leds {
compatible = "gpio-leds";
+ pinctrl-names = "default";
+ pinctrl-0 = <&spi_quad_pins>;
led-power1 {
label = "udpu:green:power";
@@ -82,8 +84,6 @@ &sdhci0 {
&spi0 {
status = "okay";
- pinctrl-names = "default";
- pinctrl-0 = <&spi_quad_pins>;
flash@0 {
compatible = "jedec,spi-nor";
@@ -108,6 +108,10 @@ partition@180000 {
};
};
+&spi_quad_pins {
+ function = "gpio";
+};
+
&pinctrl_nb {
i2c2_recovery_pins: i2c2-recovery-pins {
groups = "i2c2";
---
base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
change-id: 20250509-udpu-alarm-led-fix-62828f7e11eb
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
A new on-by-default warning in Clang aims to flag cases where a const
variable or field is not initialized and has no default value (i.e., it
is not static or thread-local). The field version of the warning
triggers in several places within the kernel that are not problematic,
so it is disabled in the first patch. The variable version of the
warning only triggers in one place, the typecheck() macro, so I opted
to silence it there and keep it enabled until it proves problematic
enough to disable.
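The typecheck() change can be illustrated with a stand-alone macro of the same shape. This is a sketch, not the text of include/linux/typecheck.h. `sketch_typecheck` is a hypothetical name, and zero-initializing with `{ 0 }` is just one way to give the dummies a value; the actual patch may spell it differently. Without an initializer, passing a const-qualified `type` would declare a const local with no value, which is the pattern the variable version of the warning reports. The macro relies on GNU C statement expressions and `__typeof__`, which GCC and Clang both support.

```c
#include <assert.h>

/*
 * typecheck()-style macro: evaluates to 1, and the pointer comparison makes
 * the compiler complain if 'x' does not have type 'type'. Both dummies are
 * zero-initialized so a const 'type' never yields an uninitialized const.
 */
#define sketch_typecheck(type, x)			\
({	type tc_dummy = { 0 };				\
	__typeof__(x) tc_dummy2 = { 0 };		\
	(void)(&tc_dummy == &tc_dummy2);		\
	1;						\
})
```

Usage mirrors the kernel macro. For example, `sketch_typecheck(unsigned long, flags)` evaluates to 1 and emits a pointer-comparison diagnostic only when `flags` has a different type.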
---
Nathan Chancellor (2):
kbuild: Disable -Wdefault-const-init-field-unsafe
include/linux/typecheck.h: Zero initialize dummy variables
include/linux/typecheck.h | 4 ++--
scripts/Makefile.extrawarn | 7 +++++++
2 files changed, 9 insertions(+), 2 deletions(-)
---
base-commit: ebd297a2affadb6f6f4d2e5d975c1eda18ac762d
change-id: 20250430-default-const-init-clang-b6e21b8d03b6
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
From: Peter Korsgaard <peter(a)korsgaard.com>
Commit 29be47fcd6a0 ("nvmem: zynqmp_nvmem: zynqmp_nvmem_probe cleanup")
changed the driver to expect the device pointer to be passed as the
"context", but in nvmem the context parameter comes from nvmem_config.priv,
which is never set, leading to NULL pointer dereferences when the device is
accessed.
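The failure follows from how the callback context is plumbed. Below is a reduced model with hypothetical `sketch_*` types (not the real `struct nvmem_config`). The "core" only ever hands `cfg->priv` to `reg_read()`. A driver that expects its device there therefore sees NULL unless `priv` is set, and setting it is what the one-line fix does.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*sketch_reg_read_t)(void *context, unsigned int offset,
				 void *val, size_t bytes);

/* Hypothetical, trimmed-down model of an nvmem config. */
struct sketch_nvmem_config {
	void *dev;                  /* used by the core itself */
	void *priv;                 /* passed back verbatim as 'context' */
	sketch_reg_read_t reg_read;
};

/* Model of the core: the callback only ever receives cfg->priv. */
static int sketch_core_read(const struct sketch_nvmem_config *cfg,
			    unsigned int off, void *val, size_t bytes)
{
	return cfg->reg_read(cfg->priv, off, val, bytes);
}

struct sketch_device { int id; };

/* Driver callback that, like the post-cleanup driver, expects a device. */
static int sketch_reg_read(void *context, unsigned int offset,
			   void *val, size_t bytes)
{
	struct sketch_device *dev = context;

	(void)offset; (void)val; (void)bytes;
	if (!dev)
		return -1;  /* the real driver would dereference NULL here */
	return 0;
}
```

A config with only `dev` set makes `sketch_core_read()` reach the NULL case. Adding `.priv = &device` makes the same read succeed.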
Fixes: 29be47fcd6a0 ("nvmem: zynqmp_nvmem: zynqmp_nvmem_probe cleanup")
Cc: stable(a)vger.kernel.org
Signed-off-by: Peter Korsgaard <peter(a)korsgaard.com>
Reviewed-by: Michal Simek <michal.simek(a)amd.com>
Tested-by: Michal Simek <michal.simek(a)amd.com>
Signed-off-by: Srinivas Kandagatla <srini(a)kernel.org>
---
drivers/nvmem/zynqmp_nvmem.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/nvmem/zynqmp_nvmem.c b/drivers/nvmem/zynqmp_nvmem.c
index 8682adaacd69..7da717d6c7fa 100644
--- a/drivers/nvmem/zynqmp_nvmem.c
+++ b/drivers/nvmem/zynqmp_nvmem.c
@@ -213,6 +213,7 @@ static int zynqmp_nvmem_probe(struct platform_device *pdev)
econfig.word_size = 1;
econfig.size = ZYNQMP_NVMEM_SIZE;
econfig.dev = dev;
+ econfig.priv = dev;
econfig.add_legacy_fixed_of_cells = true;
econfig.reg_read = zynqmp_nvmem_read;
econfig.reg_write = zynqmp_nvmem_write;
--
2.43.0
From: Omar Sandoval <osandov(a)fb.com>
commit bbce3de72be56e4b5f68924b7da9630cc89aa1a8 upstream.
There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.
The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:
1. In the if (entity_is_task(se)) else block at the beginning of
dequeue_entities(), slice is set to
cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
the saved slice, which is still U64_MAX.
This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:
6. In update_entity_lag(), se->slice is used to calculate limit, which
ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
is negative, vlag > limit, so se->vlag is set to the same huge
negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
incorrectly returns false because the vruntime is so far from the
other vruntimes on the queue, causing the
(vruntime - cfs_rq->min_vruntime) * load calculation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
pick_eevdf() and crashes.
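Step 7 can be checked in isolation. The sketch below assumes clamp() behaves like min(max(val, lo), hi), and uses `sketch_clamp` as a hypothetical helper name. When limit is negative, lo = -limit is larger than hi = limit. Any vlag above limit then collapses to limit itself, so the huge negative value propagates.

```c
#include <assert.h>
#include <stdint.h>

/* clamp() modelled as min(max(val, lo), hi). */
static int64_t sketch_clamp(int64_t val, int64_t lo, int64_t hi)
{
	int64_t t = val > lo ? val : lo;   /* max(val, lo) */
	return t < hi ? t : hi;            /* min(t, hi) */
}
```

With `limit = -(1 << 60)`, both `sketch_clamp(0, -limit, limit)` and `sketch_clamp(12345, -limit, limit)` return `limit`. A sane positive limit leaves a small vlag untouched.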
Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).
Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.
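A deliberately reduced model of the slice choice collapses dequeue_entities() to the one decision the fix changes. The struct and helper names are hypothetical. Only two things come from the description above: an empty cfs_rq reports U64_MAX, and the slice is refreshed from the still-populated cfs_rq before breaking on a delayed dequeue.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, trimmed-down cfs_rq: only what the slice choice needs. */
struct sketch_cfs_rq {
	unsigned int nr_queued;   /* entities currently queued */
	uint64_t min_slice;       /* smallest slice among them */
};

/* As described above: an empty queue reports U64_MAX. */
static uint64_t sketch_min_slice(const struct sketch_cfs_rq *rq)
{
	return rq->nr_queued ? rq->min_slice : UINT64_MAX;
}

/*
 * Slice the second loop writes into the delayed parent's ->slice.
 * group_rq:  the dequeued group entity's own cfs_rq (empty: it was delayed)
 * parent_rq: the cfs_rq that still holds the parent whose dequeue is delayed
 */
static uint64_t sketch_slice_before_fix(const struct sketch_cfs_rq *group_rq,
					const struct sketch_cfs_rq *parent_rq)
{
	(void)parent_rq;                    /* never consulted on this path */
	return sketch_min_slice(group_rq);  /* stale else-branch value */
}

static uint64_t sketch_slice_after_fix(const struct sketch_cfs_rq *group_rq,
				       const struct sketch_cfs_rq *parent_rq)
{
	(void)group_rq;
	return sketch_min_slice(parent_rq); /* refreshed before the break */
}
```

For an empty group queue and a parent queue with two entities, the old path yields U64_MAX. The fixed path yields the parent queue's minimum slice.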
Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov(a)fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.17455709…
---
Stable backport to 6.12.y resolving a trivial conflict in the patch
context.
Thanks,
Omar
kernel/sched/fair.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index ceb023629d48..990d0828bf2a 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -7182,9 +7182,6 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
idle_h_nr_running = task_has_idle_policy(p);
if (!task_sleep && !task_delayed)
h_nr_delayed = !!se->sched_delayed;
- } else {
- cfs_rq = group_cfs_rq(se);
- slice = cfs_rq_min_slice(cfs_rq);
}
for_each_sched_entity(se) {
@@ -7194,6 +7191,7 @@ static int dequeue_entities(struct rq *rq, struct sched_entity *se, int flags)
if (p && &p->se == se)
return -1;
+ slice = cfs_rq_min_slice(cfs_rq);
break;
}
--
2.49.0
commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
[BUG]
It is a long-known bug that a VM image on btrfs can lead to data csum
mismatches if QEMU uses direct IO for the image (this is commonly
known as cache mode 'none').
[CAUSE]
Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
fs is allowed to dirty/modify the folio even if the folio is under
writeback (as long as the address space doesn't have the AS_STABLE_WRITES
flag inherited from the block device).
This is a valid optimization to improve concurrency, and since these
filesystems have no extra checksum on data, the content change is not a
problem at all.
But the final write into the image file is handled by btrfs, which needs
the content not to be modified during writeback, or the checksum will
not match the data (checksum is calculated before submitting the bio).
So EXT4/XFS/NTFS assume they can modify the folio under writeback, but
btrfs requires no modification; this leads to the false csum mismatch.
This is only a controlled example. There are even cases where
multi-threaded programs submit a direct IO write, then another thread
modifies the direct IO buffer for whatever reason.
In such cases, btrfs has no sane way to detect the modification, which
leads to false data csum mismatches.
[FIX]
I have considered the following ideas to solve the problem:
- Make direct IO always skip data checksums
This not only requires a new incompatible flag, since it breaks the
current per-inode NODATASUM flag, but also requires extra handling for
cases where no csum is found.
It also reduces our checksum protection.
- Let hardware handle all checksumming
AKA, just the nodatasum mount option.
That requires trusting the hardware (which is not that trustworthy in a
lot of cases), and it's not generic at all.
- Always fall back to buffered write if the inode requires checksums
This was suggested by Christoph, and is the solution used by this
patch.
The cost is obvious: the extra buffer copy into the page cache reduces
performance.
But at least it's still user configurable: if the end user still wants
zero-copy performance, just set the NODATASUM flag on the inode
(which is a common practice for VM images on btrfs).
Since we cannot trust user space programs to keep the buffer
consistent during direct IO, we have no choice but to always fall back
to buffered IO. At least this way we avoid the more deadly false data
checksum mismatch error.
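The race, and the reason the buffered fallback avoids it, can be reproduced in user space with a toy model. Several assumptions are made. FNV-1a stands in for btrfs' real data checksum, and a callback stands in for the concurrent writer. The "direct" path checksums the caller's buffer but persists whatever that buffer holds later. The "buffered" path copies into a private page first. None of the `sketch_*` names exist in btrfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BLOCK 16

/* Toy checksum (FNV-1a), a stand-in for btrfs' data checksum. */
static uint32_t sketch_csum(const unsigned char *p, size_t n)
{
	uint32_t h = 2166136261u;
	for (size_t i = 0; i < n; i++) {
		h ^= p[i];
		h *= 16777619u;
	}
	return h;
}

struct sketch_extent {
	unsigned char data[SKETCH_BLOCK];  /* what reaches the disk */
	uint32_t csum;                     /* checksum stored with it */
};

typedef void (*sketch_racer_t)(unsigned char *buf);

/* Direct: checksum the user buffer, then persist it after the race. */
static void sketch_direct_write(struct sketch_extent *ext,
				unsigned char *user_buf, sketch_racer_t racer)
{
	ext->csum = sketch_csum(user_buf, SKETCH_BLOCK);
	if (racer)
		racer(user_buf);               /* buffer changes mid-flight */
	memcpy(ext->data, user_buf, SKETCH_BLOCK);
}

/* Buffered: copy into a private "page" first; the copy stays stable. */
static void sketch_buffered_write(struct sketch_extent *ext,
				  unsigned char *user_buf, sketch_racer_t racer)
{
	unsigned char page[SKETCH_BLOCK];

	memcpy(page, user_buf, SKETCH_BLOCK);
	ext->csum = sketch_csum(page, SKETCH_BLOCK);
	if (racer)
		racer(user_buf);               /* only the user copy changes */
	memcpy(ext->data, page, SKETCH_BLOCK);
}

/* Read-time verification: does the stored checksum match the data? */
static int sketch_verify(const struct sketch_extent *ext)
{
	return sketch_csum(ext->data, SKETCH_BLOCK) == ext->csum;
}

/* A concurrent writer flipping one byte. */
static void sketch_racer(unsigned char *buf) { buf[0] ^= 0xff; }
```

With the same racing writer, the direct path stores a checksum that no longer matches the persisted bytes. The buffered path stays consistent at the cost of one extra copy.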
CC: stable(a)vger.kernel.org # 6.6
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[ Fix a conflict due to the movement of the function. ]
---
Changelog:
v2:
- Remove the incorrectly included direct-io.c
"git am" automatically included the not-yet-existing file in the diff.
---
fs/btrfs/file.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e794606e7c78..f1456c745c6d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1515,6 +1515,23 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
goto buffered;
}
+ /*
+ * We can't control the folios being passed in, applications can write
+ * to them while a direct IO write is in progress. This means the
+ * content might change after we calculated the data checksum.
+ * Therefore we can end up storing a checksum that doesn't match the
+ * persisted data.
+ *
+ * To be extra safe and avoid false data checksum mismatch, if the
+ * inode requires data checksum, just fallback to buffered IO.
+ * For buffered IO we have full control of page cache and can ensure
+ * no one is modifying the content during writeback.
+ */
+ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
+ btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
+ goto buffered;
+ }
+
/*
* The iov_iter can be mapped to the same file range we are writing to.
* If that's the case, then we will deadlock in the iomap code, because
--
2.49.0
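The failure mode and the fallback described in the btrfs patch above can be illustrated with a small userspace model. This is a sketch under stated assumptions, not btrfs code: toy_csum() is a hypothetical stand-in for crc32c, the "device write" is a memcpy, and the concurrent writer is simulated by a callback invoked between checksumming and writing. TOY_INODE_NODATASUM is a hypothetical stand-in for BTRFS_INODE_NODATASUM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for BTRFS_INODE_NODATASUM. */
#define TOY_INODE_NODATASUM 0x1u

/* Toy checksum standing in for crc32c; any checksum shows the effect. */
static uint32_t toy_csum(const unsigned char *buf, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < len; i++)
		sum = sum * 31u + buf[i];
	return sum;
}

struct toy_extent {
	unsigned char data[16];	/* what ends up on "disk" */
	uint32_t csum;		/* checksum stored alongside it */
};

/* Mirrors the patch's decision: only NODATASUM inodes keep direct IO. */
static int toy_use_direct_io(unsigned int inode_flags)
{
	return (inode_flags & TOY_INODE_NODATASUM) != 0;
}

/* "Direct IO": checksum the caller's buffer, which may still change before the write. */
static void toy_direct_write(struct toy_extent *ext, unsigned char *user_buf,
			     void (*modify)(unsigned char *))
{
	ext->csum = toy_csum(user_buf, sizeof(ext->data));
	if (modify)
		modify(user_buf);	/* another thread scribbles on the buffer */
	memcpy(ext->data, user_buf, sizeof(ext->data));
}

/* "Buffered IO": copy into a private page first, then checksum and write the copy. */
static void toy_buffered_write(struct toy_extent *ext, unsigned char *user_buf,
			       void (*modify)(unsigned char *))
{
	unsigned char page[sizeof(ext->data)];

	memcpy(page, user_buf, sizeof(page));
	if (modify)
		modify(user_buf);	/* no longer affects what is written */
	ext->csum = toy_csum(page, sizeof(page));
	memcpy(ext->data, page, sizeof(page));
}

static int toy_csum_ok(const struct toy_extent *ext)
{
	return toy_csum(ext->data, sizeof(ext->data)) == ext->csum;
}

/* Simulated concurrent modification of the user buffer. */
static void toy_flip_first_byte(unsigned char *buf)
{
	buf[0] ^= 0xff;
}
```

With the direct path the stored checksum no longer matches the stored data once the buffer changes mid-flight; with the buffered path it always does, at the cost of the extra copy.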
Some of our devices crash in tb_cfg_request_dequeue():
general protection fault, probably for non-canonical address 0xdead000000000122
CPU: 6 PID: 91007 Comm: kworker/6:2 Tainted: G U W 6.6.65
RIP: 0010:tb_cfg_request_dequeue+0x2d/0xa0
Call Trace:
<TASK>
? tb_cfg_request_dequeue+0x2d/0xa0
tb_cfg_request_work+0x33/0x80
worker_thread+0x386/0x8f0
kthread+0xed/0x110
ret_from_fork+0x38/0x50
ret_from_fork_asm+0x1b/0x30
The circumstances are unclear; however, the theory is that
tb_cfg_request_work() can be scheduled twice for a request:
first time via frame.callback from ring_work() and second
time from tb_cfg_request(). Both times kworkers will execute
tb_cfg_request_dequeue(), which results in double list_del()
from the ctl->request_queue (the list poison dereference hints
at it: 0xdead000000000122).
Do not dequeue requests that don't have TB_CFG_REQUEST_ACTIVE
bit set.
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Cc: stable(a)vger.kernel.org
---
v3: tweaked commit message
drivers/thunderbolt/ctl.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/thunderbolt/ctl.c b/drivers/thunderbolt/ctl.c
index cd15e84c47f4..1db2e951b53f 100644
--- a/drivers/thunderbolt/ctl.c
+++ b/drivers/thunderbolt/ctl.c
@@ -151,6 +151,11 @@ static void tb_cfg_request_dequeue(struct tb_cfg_request *req)
struct tb_ctl *ctl = req->ctl;
mutex_lock(&ctl->request_queue_lock);
+ if (!test_bit(TB_CFG_REQUEST_ACTIVE, &req->flags)) {
+ mutex_unlock(&ctl->request_queue_lock);
+ return;
+ }
+
list_del(&req->list);
clear_bit(TB_CFG_REQUEST_ACTIVE, &req->flags);
if (test_bit(TB_CFG_REQUEST_CANCELED, &req->flags))
--
2.49.0.395.g12beb8f557-goog
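The thunderbolt fix above makes dequeue idempotent by checking the ACTIVE bit under the same lock that protects the list. A minimal userspace sketch of that guard follows, assuming a pthread mutex stands in for ctl->request_queue_lock and a bool for TB_CFG_REQUEST_ACTIVE; the toy_* types and list helpers are hypothetical, loosely modeled on list_head. Clearing the pointers on delete plays the role of list poisoning, so an unguarded second delete would crash.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal intrusive doubly linked list, loosely modeled on list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail_node(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del_node(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;	/* like poisoning: a second delete would crash */
}

struct toy_request {
	struct node list;
	bool active;		/* stands in for TB_CFG_REQUEST_ACTIVE */
};

struct toy_ctl {
	pthread_mutex_t lock;	/* stands in for ctl->request_queue_lock */
	struct node queue;	/* stands in for ctl->request_queue */
};

static void toy_enqueue(struct toy_ctl *ctl, struct toy_request *req)
{
	pthread_mutex_lock(&ctl->lock);
	list_add_tail_node(&req->list, &ctl->queue);
	req->active = true;
	pthread_mutex_unlock(&ctl->lock);
}

/* Returns true only for the caller that actually removed the request. */
static bool toy_dequeue(struct toy_ctl *ctl, struct toy_request *req)
{
	pthread_mutex_lock(&ctl->lock);
	if (!req->active) {	/* the other worker already dequeued it */
		pthread_mutex_unlock(&ctl->lock);
		return false;
	}
	list_del_node(&req->list);
	req->active = false;
	pthread_mutex_unlock(&ctl->lock);
	return true;
}
```

Because the flag is tested and cleared under the list lock, two workers racing on the same request cannot both reach the delete.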
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 36991c1ccde2d5a521577c448ffe07fcccfe104d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050939-activism-hesitant-7576@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36991c1ccde2d5a521577c448ffe07fcccfe104d Mon Sep 17 00:00:00 2001
From: Sean Heelan <seanheelan(a)gmail.com>
Date: Tue, 6 May 2025 22:04:52 +0900
Subject: [PATCH] ksmbd: Fix UAF in __close_file_table_ids
A use-after-free is possible if one thread destroys the file
via __ksmbd_close_fd while another thread holds a reference to
it. The existing checks on fp->refcount are not sufficient to
prevent this.
The fix takes ft->lock around the section which removes the
file from the file table. This prevents two threads from acquiring the
same file pointer via __close_file_table_ids, as well as via the other
functions which retrieve a file from the IDR and which already use
this same lock.
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Heelan <seanheelan(a)gmail.com>
Acked-by: Namjae Jeon <linkinjeon(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/server/vfs_cache.c b/fs/smb/server/vfs_cache.c
index 1f8fa3468173..dfed6fce8904 100644
--- a/fs/smb/server/vfs_cache.c
+++ b/fs/smb/server/vfs_cache.c
@@ -661,21 +661,40 @@ __close_file_table_ids(struct ksmbd_file_table *ft,
bool (*skip)(struct ksmbd_tree_connect *tcon,
struct ksmbd_file *fp))
{
- unsigned int id;
- struct ksmbd_file *fp;
- int num = 0;
+ struct ksmbd_file *fp;
+ unsigned int id = 0;
+ int num = 0;
- idr_for_each_entry(ft->idr, fp, id) {
- if (skip(tcon, fp))
+ while (1) {
+ write_lock(&ft->lock);
+ fp = idr_get_next(ft->idr, &id);
+ if (!fp) {
+ write_unlock(&ft->lock);
+ break;
+ }
+
+ if (skip(tcon, fp) ||
+ !atomic_dec_and_test(&fp->refcount)) {
+ id++;
+ write_unlock(&ft->lock);
continue;
+ }
set_close_state_blocked_works(fp);
+ idr_remove(ft->idr, fp->volatile_id);
+ fp->volatile_id = KSMBD_NO_FID;
+ write_unlock(&ft->lock);
+
+ down_write(&fp->f_ci->m_lock);
+ list_del_init(&fp->node);
+ up_write(&fp->f_ci->m_lock);
- if (!atomic_dec_and_test(&fp->refcount))
- continue;
__ksmbd_close_fd(ft, fp);
+
num++;
+ id++;
}
+
return num;
}
Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com>
---
Changes in v3:
- To both Andreas Hindborg and Breno Leitao.
- Link to v2: https://lore.kernel.org/r/20250415-fix_configfs-v2-0-fcd527dd1824@quicinc.c…
Changes in v2:
- Drop the last patch which seems wrong.
- Link to v1: https://lore.kernel.org/r/20250408-fix_configfs-v1-0-5a4c88805df7@quicinc.c…
---
Zijun Hu (3):
configfs: Delete semicolon from macro type_print() definition
configfs: Do not override creating attribute file failure in populate_attrs()
configfs: Correct error value returned by API config_item_set_name()
fs/configfs/dir.c | 4 ++--
fs/configfs/item.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)
---
base-commit: eae324ca644554d5ce363186bee820a088bb74ab
change-id: 20250408-fix_configfs-699743163c64
Best regards,
--
Zijun Hu <quic_zijuhu(a)quicinc.com>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c23c03bf1faa1e76be1eba35bad6da6a2a7c95ee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050945-multitude-powdered-34d0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c23c03bf1faa1e76be1eba35bad6da6a2a7c95ee Mon Sep 17 00:00:00 2001
From: Cristian Marussi <cristian.marussi(a)arm.com>
Date: Mon, 10 Mar 2025 17:58:00 +0000
Subject: [PATCH] firmware: arm_scmi: Fix timeout checks on polling path
Polling mode transactions wait for a reply busy-looping without holding a
spinlock, but currently the timeout checks are based only on elapsed time:
as a result we could hit a false positive whenever our busy-looping thread
is pre-empted and scheduled out for a time greater than the polling
timeout.
Change the checks at the end of the busy-loop to make sure that the polling
was not in fact successful, and that no out-of-order reply forcibly
terminated the polling, before reporting a timeout.
Fixes: 31d2f803c19c ("firmware: arm_scmi: Add sync_cmds_completed_on_ret transport flag")
Reported-by: Huangjie <huangjie1663(a)phytium.com.cn>
Closes: https://lore.kernel.org/arm-scmi/20250123083323.2363749-1-jackhuang021@gmai…
Signed-off-by: Cristian Marussi <cristian.marussi(a)arm.com>
Cc: stable(a)vger.kernel.org # 5.18.x
Message-Id: <20250310175800.1444293-1-cristian.marussi(a)arm.com>
Signed-off-by: Sudeep Holla <sudeep.holla(a)arm.com>
diff --git a/drivers/firmware/arm_scmi/driver.c b/drivers/firmware/arm_scmi/driver.c
index 1c75a4c9c371..0390d5ff195e 100644
--- a/drivers/firmware/arm_scmi/driver.c
+++ b/drivers/firmware/arm_scmi/driver.c
@@ -1248,7 +1248,8 @@ static void xfer_put(const struct scmi_protocol_handle *ph,
}
static bool scmi_xfer_done_no_timeout(struct scmi_chan_info *cinfo,
- struct scmi_xfer *xfer, ktime_t stop)
+ struct scmi_xfer *xfer, ktime_t stop,
+ bool *ooo)
{
struct scmi_info *info = handle_to_scmi_info(cinfo->handle);
@@ -1257,7 +1258,7 @@ static bool scmi_xfer_done_no_timeout(struct scmi_chan_info *cinfo,
* in case of out-of-order receptions of delayed responses
*/
return info->desc->ops->poll_done(cinfo, xfer) ||
- try_wait_for_completion(&xfer->done) ||
+ (*ooo = try_wait_for_completion(&xfer->done)) ||
ktime_after(ktime_get(), stop);
}
@@ -1274,15 +1275,17 @@ static int scmi_wait_for_reply(struct device *dev, const struct scmi_desc *desc,
* itself to support synchronous commands replies.
*/
if (!desc->sync_cmds_completed_on_ret) {
+ bool ooo = false;
+
/*
* Poll on xfer using transport provided .poll_done();
* assumes no completion interrupt was available.
*/
ktime_t stop = ktime_add_ms(ktime_get(), timeout_ms);
- spin_until_cond(scmi_xfer_done_no_timeout(cinfo,
- xfer, stop));
- if (ktime_after(ktime_get(), stop)) {
+ spin_until_cond(scmi_xfer_done_no_timeout(cinfo, xfer,
+ stop, &ooo));
+ if (!ooo && !info->desc->ops->poll_done(cinfo, xfer)) {
dev_err(dev,
"timed out in resp(caller: %pS) - polling\n",
(void *)_RET_IP_);
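The arm_scmi change above decides "timed out" by re-checking completion after the loop rather than by looking at the clock. A deterministic userspace sketch, assuming a fake millisecond clock and a per-iteration tick callback stand in for preemption and for the transport's .poll_done(); all toy_* names are hypothetical. The preempted case shows why an elapsed-time check alone gives a false timeout.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical transfer state standing in for .poll_done() and xfer->done. */
struct toy_xfer {
	bool reply_ready;	/* what poll_done() would report */
	bool ooo_completed;	/* completion set by an out-of-order reply */
};

/* Fake clock so "being scheduled out" can be simulated deterministically. */
struct toy_clock {
	long now_ms;
};

typedef void (*toy_tick_fn)(struct toy_xfer *x, struct toy_clock *c);

static bool toy_done_or_expired(struct toy_xfer *x, struct toy_clock *c,
				long stop_ms, bool *ooo)
{
	return x->reply_ready || (*ooo = x->ooo_completed) || c->now_ms > stop_ms;
}

/* Returns 0 on success, -1 on a genuine timeout. */
static int toy_wait_for_reply(struct toy_xfer *x, struct toy_clock *c,
			      long timeout_ms, toy_tick_fn tick)
{
	long stop_ms = c->now_ms + timeout_ms;
	bool ooo = false;

	while (!toy_done_or_expired(x, c, stop_ms, &ooo))
		tick(x, c);

	/* Checking only "c->now_ms > stop_ms" here would misreport the preempted case. */
	if (!ooo && !x->reply_ready)
		return -1;
	return 0;
}

/* Scheduled out for 50 ms, during which the reply arrives. */
static void toy_preempted_tick(struct toy_xfer *x, struct toy_clock *c)
{
	c->now_ms += 50;
	x->reply_ready = true;
}

/* A device that never answers. */
static void toy_silent_tick(struct toy_xfer *x, struct toy_clock *c)
{
	(void)x;
	c->now_ms += 1;
}
```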
Modify the framework to adapt to more map modes, add benchmark
support for dma_map_sg, and add support for the sg map mode in the ioctl.
The result:
[root@localhost]# ./dma_map_benchmark -m 1 -g 8 -t 8 -s 30 -d 2
dma mapping mode: DMA_MAP_SG_MODE
dma mapping benchmark: threads:8 seconds:30 node:-1 dir:FROM_DEVICE granule/sg_nents: 8
average map latency(us):1.4 standard deviation:0.3
average unmap latency(us):1.3 standard deviation:0.3
[root@localhost]# ./dma_map_benchmark -m 0 -g 8 -t 8 -s 30 -d 2
dma mapping mode: DMA_MAP_SINGLE_MODE
dma mapping benchmark: threads:8 seconds:30 node:-1 dir:FROM_DEVICE granule/sg_nents: 8
average map latency(us):1.0 standard deviation:0.3
average unmap latency(us):1.3 standard deviation:0.5
---
Changes since V2:
- Address the comments from Barry and ALOK: some commit messages and function
input parameter names were modified to make them more accurate.
- Link: https://lore.kernel.org/all/20250506030100.394376-1-xiaqinxin@huawei.com/
Changes since V1:
- Address the comments from Barry, added some comments and changed the unmap type to void.
- Link: https://lore.kernel.org/lkml/20250212022718.1995504-1-xiaqinxin@huawei.com/
Qinxin Xia (4):
dma-mapping: benchmark: Add padding to ensure uABI remained consistent
dma-mapping: benchmark: modify the framework to adapt to more map modes
dma-mapping: benchmark: add support for dma_map_sg
selftests/dma: Add dma_map_sg support
include/linux/map_benchmark.h | 46 +++-
kernel/dma/map_benchmark.c | 225 ++++++++++++++++--
.../testing/selftests/dma/dma_map_benchmark.c | 16 +-
3 files changed, 252 insertions(+), 35 deletions(-)
--
2.33.0
This series adds support for the camera clock controller base driver,
bindings and DT on the sc8180x platform.
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
---
Changes in v3:
- Drop Fixes tag in patch [1/4]. Dropped unused gpu_iref and
aggre_ufs_card_2 clk bindings.
- Move the allOf block below required block in bindings patch.
- Remove the unused cam_cc_parent_data_7 and cam_cc_parent_map_7
in the driver patch. Reported by kernel test bot.
- Link to v2: https://lore.kernel.org/r/20250430-sc8180x-camcc-support-v2-0-6bbb514f467c@…
Changes in v2:
- New patch [1/4] to add all the missing gcc bindings along with
the required GCC_CAMERA_AHB_CLOCK
- As per Konrad's comments, add the camera AHB clock dependency in the
DT and yaml bindings.
- As per Vladimir's comments, update the Kconfig to add the SC8180X config
in correct alphanumerical order.
- Link to v1: https://lore.kernel.org/r/20250422-sc8180x-camcc-support-v1-0-691614d13f06@…
---
Satya Priya Kakitapalli (4):
dt-bindings: clock: qcom: Add missing bindings on gcc-sc8180x
dt-bindings: clock: Add Qualcomm SC8180X Camera clock controller
clk: qcom: camcc-sc8180x: Add SC8180X camera clock controller driver
arm64: dts: qcom: Add camera clock controller for sc8180x
.../bindings/clock/qcom,sc8180x-camcc.yaml | 67 +
arch/arm64/boot/dts/qcom/sc8180x.dtsi | 14 +
drivers/clk/qcom/Kconfig | 10 +
drivers/clk/qcom/Makefile | 1 +
drivers/clk/qcom/camcc-sc8180x.c | 2889 ++++++++++++++++++++
include/dt-bindings/clock/qcom,gcc-sc8180x.h | 10 +
include/dt-bindings/clock/qcom,sc8180x-camcc.h | 181 ++
7 files changed, 3172 insertions(+)
---
base-commit: bc8aa6cdadcc00862f2b5720e5de2e17f696a081
change-id: 20250422-sc8180x-camcc-support-9a82507d2a39
Best regards,
--
Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x be8250786ca94952a19ce87f98ad9906448bc9ef
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050521-crispy-study-e836@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be8250786ca94952a19ce87f98ad9906448bc9ef Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Mon, 21 Apr 2025 15:52:32 +0800
Subject: [PATCH] mm, slab: clean up slab->obj_exts always
When memory allocation profiling is disabled at runtime or due to an
error, shutdown_mem_profiling() is called, but slab->obj_exts, which was
previously allocated, remains.
It won't be cleared by unaccount_slab() because
mem_alloc_profiling_enabled() is no longer true. This is incorrect:
slab->obj_exts should always be cleaned up in unaccount_slab() to avoid
the following error:
[...]BUG: Bad page state in process...
..
[...]page dumped because: page still charged to cgroup
[andriy.shevchenko(a)linux.intel.com: fold need_slab_obj_ext() into its only user]
Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Harry Yoo <harry.yoo(a)oracle.com>
Tested-by: Harry Yoo <harry.yoo(a)oracle.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
Link: https://patch.msgid.link/20250421075232.2165527-1-quic_zhenhuah@quicinc.com
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
diff --git a/mm/slub.c b/mm/slub.c
index dc9e729e1d26..be8b09e09d30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2028,8 +2028,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
return 0;
}
-/* Should be called only if mem_alloc_profiling_enabled() */
-static noinline void free_slab_obj_exts(struct slab *slab)
+static inline void free_slab_obj_exts(struct slab *slab)
{
struct slabobj_ext *obj_exts;
@@ -2049,18 +2048,6 @@ static noinline void free_slab_obj_exts(struct slab *slab)
slab->obj_exts = 0;
}
-static inline bool need_slab_obj_ext(void)
-{
- if (mem_alloc_profiling_enabled())
- return true;
-
- /*
- * CONFIG_MEMCG creates vector of obj_cgroup objects conditionally
- * inside memcg_slab_post_alloc_hook. No other users for now.
- */
- return false;
-}
-
#else /* CONFIG_SLAB_OBJ_EXT */
static inline void init_slab_obj_exts(struct slab *slab)
@@ -2077,11 +2064,6 @@ static inline void free_slab_obj_exts(struct slab *slab)
{
}
-static inline bool need_slab_obj_ext(void)
-{
- return false;
-}
-
#endif /* CONFIG_SLAB_OBJ_EXT */
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -2129,7 +2111,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
static inline void
alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
{
- if (need_slab_obj_ext())
+ if (mem_alloc_profiling_enabled())
__alloc_tagging_slab_alloc_hook(s, object, flags);
}
@@ -2601,8 +2583,12 @@ static __always_inline void account_slab(struct slab *slab, int order,
static __always_inline void unaccount_slab(struct slab *slab, int order,
struct kmem_cache *s)
{
- if (memcg_kmem_online() || need_slab_obj_ext())
- free_slab_obj_exts(slab);
+ /*
+ * The slab object extensions should now be freed regardless of
+ * whether mem_alloc_profiling_enabled() or not because profiling
+ * might have been disabled after slab->obj_exts got allocated.
+ */
+ free_slab_obj_exts(slab);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
-(PAGE_SIZE << order));
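The slab fix above removes a free that was gated on the current value of a runtime flag, which leaks anything allocated while the flag was still set. A small userspace model of the two behaviours, assuming a global bool stands in for mem_alloc_profiling_enabled() and a counter tracks outstanding extension vectors; all toy_* names are hypothetical. The free helper is a no-op when nothing was allocated, which is what makes calling it unconditionally safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

static bool toy_profiling_enabled = true;	/* stands in for mem_alloc_profiling_enabled() */
static int toy_live_exts;			/* outstanding extension vectors */

struct toy_slab {
	void *obj_exts;
};

static void toy_account_slab(struct toy_slab *slab)
{
	slab->obj_exts = NULL;
	if (toy_profiling_enabled) {
		slab->obj_exts = calloc(4, sizeof(long));
		if (slab->obj_exts)
			toy_live_exts++;
	}
}

/* Safe to call unconditionally: does nothing if no vector exists. */
static void toy_free_slab_obj_exts(struct toy_slab *slab)
{
	if (!slab->obj_exts)
		return;
	free(slab->obj_exts);
	slab->obj_exts = NULL;
	toy_live_exts--;
}

/* Old behaviour: gated on the flag's *current* value. */
static void toy_unaccount_slab_old(struct toy_slab *slab)
{
	if (toy_profiling_enabled)
		toy_free_slab_obj_exts(slab);
}

/* Fixed behaviour: always clean up, whatever the flag says now. */
static void toy_unaccount_slab_new(struct toy_slab *slab)
{
	toy_free_slab_obj_exts(slab);
}
```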
After a recent change [1] in clang's randstruct implementation to
randomize structures that only contain function pointers, there is an
error because qede_ll_ops gets randomized but does not use a designated
initializer for the first member:
drivers/net/ethernet/qlogic/qede/qede_main.c:206:2: error: a randomized struct can only be initialized with a designated initializer
206 | {
| ^
Explicitly initialize the common member using a designated initializer
to fix the build.
Cc: stable(a)vger.kernel.org
Fixes: 035f7f87b729 ("randstruct: Enable Clang support")
Link: https://github.com/llvm/llvm-project/commit/04364fb888eea6db9811510607bed4b… [1]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 99df00c30b8c..b5d744d2586f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -203,7 +203,7 @@ static struct pci_driver qede_pci_driver = {
};
static struct qed_eth_cb_ops qede_ll_ops = {
- {
+ .common = {
#ifdef CONFIG_RFS_ACCEL
.arfs_filter_op = qede_arfs_filter_op,
#endif
---
base-commit: 9540984da649d46f699c47f28c68bbd3c9d99e4c
change-id: 20250507-qede-fix-clang-randstruct-13d8c593cb58
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
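The qede fix above relies on plain C designated initializers: naming the nested member (.common = { ... }) keeps the initializer correct no matter where randstruct places that member, whereas a bare { ... } is bound to whatever member happens to come first. A self-contained illustration with hypothetical, simplified struct shapes (the real qed_common_cb_ops and qed_eth_cb_ops have more members):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified shapes of qed_common_cb_ops / qed_eth_cb_ops. */
struct toy_common_cb_ops {
	void (*link_update)(void *dev);
	void (*schedule_recovery)(void *dev);
};

struct toy_eth_cb_ops {
	struct toy_common_cb_ops common;
	void (*force_mac)(void *dev, unsigned char *mac, int forced);
};

static void toy_link_update(void *dev)
{
	(void)dev;
}

static void toy_force_mac(void *dev, unsigned char *mac, int forced)
{
	(void)dev;
	(void)mac;
	(void)forced;
}

/*
 * Every member is named, so the initializer does not depend on field
 * order; members left out (schedule_recovery) are zero-initialized.
 */
static struct toy_eth_cb_ops toy_ll_ops = {
	.common = {
		.link_update = toy_link_update,
	},
	.force_mac = toy_force_mac,
};
```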
When CONFIG_PREEMPT_COUNT is not configured (i.e. CONFIG_PREEMPT_NONE/
CONFIG_PREEMPT_VOLUNTARY), preempt_disable() / preempt_enable() merely
act as a barrier(). However, in these cases cond_resched() can still
trigger a context switch and modify CSR.EUEN, resulting in a do_fpu()
exception being raised within the kernel-fpu critical sections, as
demonstrated in the following path:
dcn32_calculate_wm_and_dlg()
DC_FP_START()
dcn32_calculate_wm_and_dlg_fpu()
dcn32_find_dummy_latency_index_for_fw_based_mclk_switch()
dcn32_internal_validate_bw()
dcn32_enable_phantom_stream()
dc_create_stream_for_sink()
kzalloc(GFP_KERNEL)
__kmem_cache_alloc_node()
__cond_resched()
DC_FP_END()
This patch is similar to commit d021985 (x86/fpu: Improve crypto
performance by making kernel-mode FPU reliably usable in softirqs). It
uses local_bh_disable() instead of preempt_disable() for non-RT kernels,
which avoids the cond_resched() issue and also extends the kernel-fpu
application scenarios to the softirq context.
Cc: stable(a)vger.kernel.org
Signed-off-by: Tianyang Zhang <zhangtianyang(a)loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
arch/loongarch/kernel/kfpu.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/arch/loongarch/kernel/kfpu.c b/arch/loongarch/kernel/kfpu.c
index ec5b28e570c9..4e469b021cf4 100644
--- a/arch/loongarch/kernel/kfpu.c
+++ b/arch/loongarch/kernel/kfpu.c
@@ -18,11 +18,28 @@ static unsigned int euen_mask = CSR_EUEN_FPEN;
static DEFINE_PER_CPU(bool, in_kernel_fpu);
static DEFINE_PER_CPU(unsigned int, euen_current);
+static inline void fpregs_lock(void)
+{
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ local_bh_disable();
+ else
+ preempt_disable();
+}
+
+static inline void fpregs_unlock(void)
+{
+ if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+ local_bh_enable();
+ else
+ preempt_enable();
+}
+
void kernel_fpu_begin(void)
{
unsigned int *euen_curr;
- preempt_disable();
+ if (!irqs_disabled())
+ fpregs_lock();
WARN_ON(this_cpu_read(in_kernel_fpu));
@@ -73,7 +90,8 @@ void kernel_fpu_end(void)
this_cpu_write(in_kernel_fpu, false);
- preempt_enable();
+ if (!irqs_disabled())
+ fpregs_unlock();
}
EXPORT_SYMBOL_GPL(kernel_fpu_end);
--
2.20.1
commit 968f19c5b1b7d5595423b0ac0020cc18dfed8cb5 upstream.
[BUG]
It is a long-known bug that VM images on btrfs can lead to data csum
mismatches if qemu uses direct IO for the image (this is commonly
known as cache mode 'none').
[CAUSE]
Inside the VM, if the fs is EXT4 or XFS, or even NTFS from Windows, the
fs is allowed to dirty/modify the folio even if the folio is under
writeback (as long as the address space doesn't have AS_STABLE_WRITES
flag inherited from the block device).
This is a valid optimization to improve the concurrency, and since these
filesystems have no extra checksum on data, the content change is not a
problem at all.
But the final write into the image file is handled by btrfs, which needs
the content not to be modified during writeback, or the checksum will
not match the data (checksum is calculated before submitting the bio).
So EXT4/XFS/NTFS assume they can modify the folio under writeback, but
btrfs requires no modification; this leads to the false csum mismatch.
This is only a controlled example; there are even cases where a
multi-threaded program submits a direct IO write, then another thread
modifies the direct IO buffer for whatever reason.
btrfs has no sane way to detect such cases, and they lead to false
data csum mismatches.
[FIX]
I have considered the following ideas to solve the problem:
- Make direct IO always skip data checksums
  This not only requires a new incompatible flag, as it breaks the
  current per-inode NODATASUM flag, but also requires extra handling
  for cases where no csum is found.
  It also reduces our checksum protection.
- Let the hardware handle all checksumming
  AKA, just the nodatasum mount option.
  That requires trusting the hardware (which is not that trustworthy in
  a lot of cases), and it's not generic at all.
- Always fall back to buffered writes if the inode requires checksums
  This was suggested by Christoph, and is the solution utilized by this
  patch.
  The cost is obvious: the extra buffer copy into the page cache
  reduces performance.
  But at least it's still user configurable: if the end user still
  wants zero-copy performance, just set the NODATASUM flag on the inode
  (which is a common practice for VM images on btrfs).
Since we cannot trust user space programs to keep the buffer
consistent during direct IO, we have no choice but to always fall back
to buffered IO. At least this way, we avoid the more deadly false data
checksum mismatch error.
CC: stable(a)vger.kernel.org # 6.6
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[ Fix a conflict due to the movement of the function. ]
---
fs/btrfs/file.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index e794606e7c78..f1456c745c6d 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -1515,6 +1515,23 @@ static ssize_t btrfs_direct_write(struct kiocb *iocb, struct iov_iter *from)
goto buffered;
}
+ /*
+ * We can't control the folios being passed in, applications can write
+ * to them while a direct IO write is in progress. This means the
+ * content might change after we calculated the data checksum.
+ * Therefore we can end up storing a checksum that doesn't match the
+ * persisted data.
+ *
+ * To be extra safe and avoid false data checksum mismatch, if the
+ * inode requires data checksum, just fallback to buffered IO.
+ * For buffered IO we have full control of page cache and can ensure
+ * no one is modifying the content during writeback.
+ */
+ if (!(BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM)) {
+ btrfs_inode_unlock(BTRFS_I(inode), ilock_flags);
+ goto buffered;
+ }
+
/*
* The iov_iter can be mapped to the same file range we are writing to.
* If that's the case, then we will deadlock in the iomap code, because
--
2.49.0
From: Josef Bacik <josef(a)toxicpanda.com>
[ Upstream commit 8cbc3001a3264d998d6b6db3e23f935c158abd4d ]
The submit helper will always run bio_endio() on the bio if it fails to
submit, so cleaning up the bio just leads to a variety of use-after-free
and NULL pointer dereference bugs because we race with the endio
function that is cleaning up the bio. Instead just return BLK_STS_OK as
the repair function has to continue to process the rest of the pages,
and the endio for the repair bio will do the appropriate cleanup for the
page that it was given.
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
[Minor context change fixed.]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Build test passed.
---
fs/btrfs/extent_io.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 346fc46d019b..a1946d62911c 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2624,7 +2624,6 @@ int btrfs_repair_one_sector(struct inode *inode,
const int icsum = bio_offset >> fs_info->sectorsize_bits;
struct bio *repair_bio;
struct btrfs_io_bio *repair_io_bio;
- blk_status_t status;
btrfs_debug(fs_info,
"repair read error: read error at %llu", start);
@@ -2664,13 +2663,13 @@ int btrfs_repair_one_sector(struct inode *inode,
"repair read error: submitting new read to mirror %d",
failrec->this_mirror);
- status = submit_bio_hook(inode, repair_bio, failrec->this_mirror,
- failrec->bio_flags);
- if (status) {
- free_io_failure(failure_tree, tree, failrec);
- bio_put(repair_bio);
- }
- return blk_status_to_errno(status);
+ /*
+ * At this point we have a bio, so any errors from submit_bio_hook()
+ * will be handled by the endio on the repair_bio, so we can't return an
+ * error here.
+ */
+ submit_bio_hook(inode, repair_bio, failrec->this_mirror, failrec->bio_flags);
+ return BLK_STS_OK;
}
static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
--
2.34.1
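The btrfs repair fix above is about ownership: once the bio is handed to the submit helper, that helper's completion path (bio_endio()) is responsible for cleanup even on failure, so the caller must not free it again. A userspace sketch of that contract, assuming malloc/free stand in for bio allocation and bio_put(), and completion is delivered inline; toy_submit_bio_hook() and the other toy_* names are hypothetical. Freeing the bio in the caller after a failed submit would be a double free here, which is the class of bug the patch removes.

```c
#include <assert.h>
#include <stdlib.h>

struct toy_bio {
	int status;
	void (*end_io)(struct toy_bio *bio);
};

static int toy_endio_calls;

/* Completion handler: the single owner of cleanup. */
static void toy_repair_endio(struct toy_bio *bio)
{
	toy_endio_calls++;
	free(bio);
}

/*
 * Hypothetical submit helper with the contract described above: it always
 * completes the bio through ->end_io(), on success and on failure.
 */
static int toy_submit_bio_hook(struct toy_bio *bio, int fail)
{
	if (fail) {
		bio->status = -5;	/* -EIO */
		bio->end_io(bio);	/* bio is freed here */
		return -5;
	}
	bio->status = 0;
	bio->end_io(bio);		/* real IO would complete asynchronously */
	return 0;
}

/* Fixed caller: never touches the bio after handing it over. */
static int toy_repair_one_sector(int fail)
{
	struct toy_bio *bio = malloc(sizeof(*bio));

	if (!bio)
		return -12;		/* -ENOMEM: nothing submitted yet */
	bio->end_io = toy_repair_endio;
	(void)toy_submit_bio_hook(bio, fail);	/* errors handled by end_io */
	return 0;
}
```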
The patch titled
Subject: mm: userfaultfd: correct dirty flags set for both present and swap pte
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-correct-dirty-flags-set-for-both-present-and-swap-pte.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Barry Song <v-songbaohua(a)oppo.com>
Subject: mm: userfaultfd: correct dirty flags set for both present and swap pte
Date: Fri, 9 May 2025 10:09:12 +1200
As David pointed out, what truly matters for mremap and userfaultfd move
operations is the soft dirty bit. The current comment and
implementation, which always sets the dirty bit for present PTEs and
fails to set the soft dirty bit for swap PTEs, are incorrect. This could
break features like Checkpoint-Restore in Userspace (CRIU).
This patch updates the behavior to correctly set the soft dirty bit for
both present and swap PTEs in accordance with mremap.
Link: https://lkml.kernel.org/r/20250508220912.7275-1-21cnbao@gmail.com
Fixes: adef440691bab ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
Reported-by: David Hildenbrand <david(a)redhat.com>
Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redha…
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/userfaultfd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--- a/mm/userfaultfd.c~mm-userfaultfd-correct-dirty-flags-set-for-both-present-and-swap-pte
+++ a/mm/userfaultfd.c
@@ -1064,8 +1064,13 @@ static int move_present_pte(struct mm_st
src_folio->index = linear_page_index(dst_vma, dst_addr);
orig_dst_pte = mk_pte(&src_folio->page, dst_vma->vm_page_prot);
- /* Follow mremap() behavior and treat the entry dirty after the move */
- orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
+ /* Set soft dirty bit so userspace can notice the pte was moved */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
+#endif
+ if (pte_dirty(orig_src_pte))
+ orig_dst_pte = pte_mkdirty(orig_dst_pte);
+ orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
out:
@@ -1100,6 +1105,9 @@ static int move_swap_pte(struct mm_struc
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
+#endif
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
_
Patches currently in -mm which might be from v-songbaohua(a)oppo.com are
mm-userfaultfd-correct-dirty-flags-set-for-both-present-and-swap-pte.patch
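The userfaultfd change above boils down to which flag bits the moved entry carries: always the soft dirty bit, and the hardware dirty bit only if the source entry already had it. A bit-level sketch, assuming CONFIG_MEM_SOFT_DIRTY is enabled and using a hypothetical PTE layout (real layouts are architecture-specific, and swap entries keep soft dirty in a separate bit, modeled here as TOY_SWP_SOFT_DIRTY):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout; real PTE layouts are architecture-specific. */
#define TOY_PTE_WRITE      (1u << 1)
#define TOY_PTE_DIRTY      (1u << 6)
#define TOY_PTE_SOFT_DIRTY (1u << 11)
#define TOY_SWP_SOFT_DIRTY (1u << 20)	/* soft dirty bit for swap entries */

typedef uint32_t toy_pte_t;

/* Present PTE: fresh entry from dst_prot, like mk_pte(page, vm_page_prot). */
static toy_pte_t toy_move_present_pte(toy_pte_t orig_src_pte, toy_pte_t dst_prot)
{
	toy_pte_t dst = dst_prot;

	dst |= TOY_PTE_SOFT_DIRTY;		/* always: userspace can see the move */
	if (orig_src_pte & TOY_PTE_DIRTY)	/* carry dirty over, don't invent it */
		dst |= TOY_PTE_DIRTY;
	return dst | TOY_PTE_WRITE;
}

/* Swap PTE: keep the entry, just mark it soft dirty. */
static toy_pte_t toy_move_swap_pte(toy_pte_t orig_src_pte)
{
	return orig_src_pte | TOY_SWP_SOFT_DIRTY;
}
```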
Hi,
After updating to 6.14.2, the ethernet adapter is almost unusable: I get
over 30% packet loss.
Bisect says it's this commit:
commit 85f6414167da39e0da30bf370f1ecda5a58c6f7b
Author: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Thu Mar 13 16:05:56 2025 +0200
e1000e: change k1 configuration on MTP and later platforms
[ Upstream commit efaaf344bc2917cbfa5997633bc18a05d3aed27f ]
Starting from Meteor Lake, the Kumeran interface between the integrated
MAC and the I219 PHY works at a different frequency. This causes sporadic
MDI errors when accessing the PHY, and in rare circumstances could lead
to packet corruption.
To overcome this, introduce minor changes to the Kumeran idle
state (K1) parameters during device initialization. Hardware reset
reverts this configuration, therefore it needs to be applied in a few
places.
Fixes: cc23f4f0b6b9 ("e1000e: Add support for Meteor Lake")
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Avigail Dahan <avigailx.dahan(a)intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
drivers/net/ethernet/intel/e1000e/defines.h | 3 ++
drivers/net/ethernet/intel/e1000e/ich8lan.c | 80 +++++++++++++++++++++++++++--
drivers/net/ethernet/intel/e1000e/ich8lan.h | 4 ++
3 files changed, 82 insertions(+), 5 deletions(-)
My system is a Novacustom V540TU laptop with an Intel Core Ultra 5 125H,
and the e1000e driver is running in a Xen HVM (with PCI passthrough).
Interestingly, I also have another one with an Intel Core Ultra 7 155H
where the issue does not happen. I don't see what is different about the
network adapter there; they look identical in lspci (but there are
differences in other devices)...
I see the commit above was already backported to other stable branches
too...
#regzbot introduced: 85f6414167da39e0da30bf370f1ecda5a58c6f7b
--
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
The report zones buffer size is currently limited by the HBA's
maximum segment count to ensure the buffer can be mapped. However,
the block layer further limits the number of iovec entries to
1024 when allocating a bio.
To avoid allocating buffers too large to be mapped, further
restrict the maximum buffer size to BIO_MAX_INLINE_VECS pages.
Replace the UIO_MAXIOV symbolic name with the more contextually
appropriate BIO_MAX_INLINE_VECS.
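The chain of limits described above can be modelled in user space as a series of minimums. The sketch below is a hypothetical stand-alone helper, not the driver's code; it assumes 512-byte sectors, 4 KiB pages (PAGE_SHIFT of 12), one 64-byte header plus one 64-byte descriptor per zone as in the sd_zbc.c hunk, and BIO_MAX_INLINE_VECS equal to UIO_MAXIOV (1024).

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions for this sketch: 512-byte sectors and 4 KiB pages. */
#define SECTOR_SIZE         512UL
#define SECTOR_SHIFT        9
#define PAGE_SHIFT          12
/* The patch defines BIO_MAX_INLINE_VECS as UIO_MAXIOV, i.e. 1024. */
#define BIO_MAX_INLINE_VECS 1024U

static size_t min_sz(size_t a, size_t b)
{
	return a < b ? a : b;
}

/*
 * Hypothetical model of the limits in sd_zbc_alloc_report_buffer():
 * header plus one descriptor per zone, rounded up to a sector, then
 * capped by the HBA transfer size and by the pages one bio can map.
 */
static size_t report_buf_size(unsigned int nr_zones,
			      unsigned int max_hw_sectors,
			      unsigned int hba_max_segments)
{
	size_t bufsize = ((((size_t)nr_zones + 1) * 64 + SECTOR_SIZE - 1) /
			  SECTOR_SIZE) * SECTOR_SIZE;
	unsigned int max_segments = hba_max_segments < BIO_MAX_INLINE_VECS ?
				    hba_max_segments : BIO_MAX_INLINE_VECS;

	bufsize = min_sz(bufsize, (size_t)max_hw_sectors << SECTOR_SHIFT);
	/* New limit: never more pages than BIO_MAX_INLINE_VECS. */
	bufsize = min_sz(bufsize, (size_t)max_segments << PAGE_SHIFT);
	return bufsize;
}
```

For example, with 100000 zones, max_hw_sectors of 65535 and an HBA reporting 4096 segments, the old segment limit alone would allow 4096 pages, more than the 1024 iovecs the block layer allocates for a bio; the extra cap brings the buffer down to 1024 pages (4 MiB under the 4 KiB page assumption).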
Fixes: b091ac616846 ("sd_zbc: Fix report zones buffer allocation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Steve Siwinski <ssiwinski(a)atto.com>
---
block/bio.c | 2 +-
drivers/scsi/sd_zbc.c | 6 +++++-
include/linux/bio.h | 1 +
3 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/block/bio.c b/block/bio.c
index 4e6c85a33d74..4be592d37fb6 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -611,7 +611,7 @@ struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask)
{
struct bio *bio;
- if (nr_vecs > UIO_MAXIOV)
+ if (nr_vecs > BIO_MAX_INLINE_VECS)
return NULL;
return kmalloc(struct_size(bio, bi_inline_vecs, nr_vecs), gfp_mask);
}
diff --git a/drivers/scsi/sd_zbc.c b/drivers/scsi/sd_zbc.c
index 7a447ff600d2..a8db66428f80 100644
--- a/drivers/scsi/sd_zbc.c
+++ b/drivers/scsi/sd_zbc.c
@@ -169,6 +169,7 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
unsigned int nr_zones, size_t *buflen)
{
struct request_queue *q = sdkp->disk->queue;
+ unsigned int max_segments;
size_t bufsize;
void *buf;
@@ -180,12 +181,15 @@ static void *sd_zbc_alloc_report_buffer(struct scsi_disk *sdkp,
* Furthermore, since the report zone command cannot be split, make
* sure that the allocated buffer can always be mapped by limiting the
* number of pages allocated to the HBA max segments limit.
+ * Since max segments can be larger than the max inline bio vectors,
+ * further limit the allocated buffer to BIO_MAX_INLINE_VECS.
*/
nr_zones = min(nr_zones, sdkp->zone_info.nr_zones);
bufsize = roundup((nr_zones + 1) * 64, SECTOR_SIZE);
bufsize = min_t(size_t, bufsize,
queue_max_hw_sectors(q) << SECTOR_SHIFT);
- bufsize = min_t(size_t, bufsize, queue_max_segments(q) << PAGE_SHIFT);
+ max_segments = min(BIO_MAX_INLINE_VECS, queue_max_segments(q));
+ bufsize = min_t(size_t, bufsize, max_segments << PAGE_SHIFT);
while (bufsize >= SECTOR_SIZE) {
buf = kvzalloc(bufsize, GFP_KERNEL | __GFP_NORETRY);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index cafc7c215de8..b786ec5bcc81 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -11,6 +11,7 @@
#include <linux/uio.h>
#define BIO_MAX_VECS 256U
+#define BIO_MAX_INLINE_VECS UIO_MAXIOV
struct queue_limits;
--
2.43.5
From: Barry Song <v-songbaohua(a)oppo.com>
As David pointed out, what truly matters for mremap and userfaultfd
move operations is the soft dirty bit. The current comment and
implementation, which always sets the dirty bit for present PTEs
and fails to set the soft dirty bit for swap PTEs, are incorrect.
This could break features like Checkpoint-Restore in Userspace
(CRIU).
This patch updates the behavior to correctly set the soft dirty bit
for both present and swap PTEs in accordance with mremap.
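A minimal user-space model of the new flag handling for present PTEs is sketched below. The bit values are hypothetical (real PTE layouts are architecture-specific), the helpers are illustrative stand-ins rather than the kernel's pte_mksoft_dirty()/pte_mkdirty() accessors, and the soft_dirty_enabled parameter stands in for CONFIG_MEM_SOFT_DIRTY.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define PTE_WRITE      (1u << 0)
#define PTE_DIRTY      (1u << 1)
#define PTE_SOFT_DIRTY (1u << 2)

/* Before the patch: the moved entry was always dirty and writable. */
static uint32_t move_present_old(uint32_t src_pte)
{
	(void)src_pte;
	return PTE_DIRTY | PTE_WRITE;
}

/*
 * After the patch: start from a fresh entry, set soft-dirty when the
 * feature is built in, keep the dirty bit only if the source had it,
 * then make the entry writable.
 */
static uint32_t move_present_new(uint32_t src_pte, bool soft_dirty_enabled)
{
	uint32_t dst_pte = 0;

	if (soft_dirty_enabled)
		dst_pte |= PTE_SOFT_DIRTY;
	if (src_pte & PTE_DIRTY)
		dst_pte |= PTE_DIRTY;
	return dst_pte | PTE_WRITE;
}
```

For swap PTEs the patch only sets the soft dirty bit, using pte_swp_mksoft_dirty().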
Reported-by: David Hildenbrand <david(a)redhat.com>
Closes: https://lore.kernel.org/linux-mm/02f14ee1-923f-47e3-a994-4950afb9afcc@redha…
Acked-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: Lokesh Gidra <lokeshgidra(a)google.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Fixes: adef440691bab ("userfaultfd: UFFDIO_MOVE uABI")
Cc: stable(a)vger.kernel.org
Signed-off-by: Barry Song <v-songbaohua(a)oppo.com>
---
mm/userfaultfd.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e8ce92dc105f..bc473ad21202 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1064,8 +1064,13 @@ static int move_present_pte(struct mm_struct *mm,
src_folio->index = linear_page_index(dst_vma, dst_addr);
orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot);
- /* Follow mremap() behavior and treat the entry dirty after the move */
- orig_dst_pte = pte_mkwrite(pte_mkdirty(orig_dst_pte), dst_vma);
+ /* Set soft dirty bit so userspace can notice the pte was moved */
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
+#endif
+ if (pte_dirty(orig_src_pte))
+ orig_dst_pte = pte_mkdirty(orig_dst_pte);
+ orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
set_pte_at(mm, dst_addr, dst_pte, orig_dst_pte);
out:
@@ -1100,6 +1105,9 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
}
orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
+#ifdef CONFIG_MEM_SOFT_DIRTY
+ orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
+#endif
set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
double_pt_unlock(dst_ptl, src_ptl);
--
2.39.3 (Apple Git-146)
With UBSAN enabled, we're getting the following trace:
UBSAN: array-index-out-of-bounds in .../drivers/clk/clk-s2mps11.c:186:3
index 0 is out of range for type 'struct clk_hw *[] __counted_by(num)' (aka 'struct clk_hw *[]')
This is because commit f316cdff8d67 ("clk: Annotate struct
clk_hw_onecell_data with __counted_by") annotated the hws member of
that struct with __counted_by, which informs the bounds sanitizer about
the number of elements in hws, so that it can warn when hws is accessed
out of bounds.
As noted in that change, the __counted_by member must be initialised
with the number of elements before the first array access happens;
otherwise, every access prior to the initialisation triggers a warning
because the number of elements is still zero. This occurs in
s2mps11_clk_probe() because ::num is assigned after ::hws is accessed.
Move the assignment to satisfy the requirement of assign-before-access.
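The assign-before-access rule can be reproduced in user space with a flexible array member. In this sketch, struct onecell_data and onecell_alloc() are hypothetical stand-ins for struct clk_hw_onecell_data and the probe code, and __counted_by is defined only when the compiler reports support for the counted_by attribute (an assumption about the toolchain); otherwise it expands to nothing.

```c
#include <assert.h>
#include <stdlib.h>

#ifndef __has_attribute
#define __has_attribute(x) 0
#endif
#if __has_attribute(counted_by)
#define __counted_by(member) __attribute__((counted_by(member)))
#else
#define __counted_by(member)
#endif

/* Hypothetical stand-in for struct clk_hw_onecell_data. */
struct onecell_data {
	unsigned int num;
	void *hws[] __counted_by(num);
};

/* Mirrors the fixed probe order: set ->num before touching ->hws[]. */
static struct onecell_data *onecell_alloc(unsigned int n, void *fill)
{
	struct onecell_data *d = calloc(1, sizeof(*d) + n * sizeof(d->hws[0]));

	if (!d)
		return NULL;
	d->num = n;	/* the sanitizer's bound is now n, not 0 */
	for (unsigned int i = 0; i < d->num; i++)
		d->hws[i] = fill;
	return d;
}
```

Moving the d->num assignment after the loop should reproduce the pattern UBSAN flagged in s2mps11_clk_probe(), when built with a compiler that supports counted_by and with the array-bounds sanitizer enabled.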
Cc: stable(a)vger.kernel.org
Fixes: f316cdff8d67 ("clk: Annotate struct clk_hw_onecell_data with __counted_by")
Signed-off-by: André Draszik <andre.draszik(a)linaro.org>
---
drivers/clk/clk-s2mps11.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/clk-s2mps11.c b/drivers/clk/clk-s2mps11.c
index 014db6386624071e173b5b940466301d2596400a..8ddf3a9a53dfd5bb52a05a3e02788a357ea77ad3 100644
--- a/drivers/clk/clk-s2mps11.c
+++ b/drivers/clk/clk-s2mps11.c
@@ -137,6 +137,8 @@ static int s2mps11_clk_probe(struct platform_device *pdev)
if (!clk_data)
return -ENOMEM;
+ clk_data->num = S2MPS11_CLKS_NUM;
+
switch (hwid) {
case S2MPS11X:
s2mps11_reg = S2MPS11_REG_RTC_CTRL;
@@ -186,7 +188,6 @@ static int s2mps11_clk_probe(struct platform_device *pdev)
clk_data->hws[i] = &s2mps11_clks[i].hw;
}
- clk_data->num = S2MPS11_CLKS_NUM;
of_clk_add_hw_provider(s2mps11_clks->clk_np, of_clk_hw_onecell_get,
clk_data);
---
base-commit: 9388ec571cb1adba59d1cded2300eeb11827679c
change-id: 20250326-s2mps11-ubsan-c90978e7bc04
Best regards,
--
André Draszik <andre.draszik(a)linaro.org>
Hi,
Pasi Kallinen reported a regression in Debian with the perf r5101c4
counter. It was initially found in
https://github.com/rr-debugger/rr/issues/3949 but was said to be a kernel
problem.
On Tue, May 06, 2025 at 07:18:39PM +0300, Pasi Kallinen wrote:
> Package: src:linux
> Version: 6.12.25-1
> Severity: normal
> X-Debbugs-Cc: debian-amd64(a)lists.debian.org, paxed(a)alt.org
> User: debian-amd64(a)lists.debian.org
> Usertags: amd64
>
> Dear Maintainer,
>
> perf stat -e r5101c4 true
>
> reports "not supported".
>
> The counters worked in kernel 6.11.10.
>
> I first noticed this not working when updating to 6.12.22.
> Booting back to 6.11.10, the counters work correctly.
Does this ring a bell?
Would you be able to bisect the changes to identify where the
behaviour changed?
Regards,
Salvatore
The quilt patch titled
Subject: x86/kexec: fix potential cmem->ranges out of bounds
has been removed from the -mm tree. Its filename was
x86-kexec-fix-potential-cmem-ranges-out-of-bounds.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: fuqiang wang <fuqiang.wang(a)easystack.cn>
Subject: x86/kexec: fix potential cmem->ranges out of bounds
Date: Mon, 8 Jan 2024 21:06:47 +0800
In memmap_exclude_ranges(), the elfheader is excluded from crashk_res.
In the current x86 code, the elfheader is always allocated at
crashk_res.start, so no new split range is created. However, this
depends on where the elfheader is allocated within crashk_res. To avoid
a potential out-of-bounds access in the future, add an extra slot.
The same issue also exists in fill_up_crash_elf_data(). The range to
be excluded is [0, 1M]; its start (0) is special and cannot fall in the
middle of an existing cmem->ranges[] entry. But in case the low 1M
region changes in the future, add an extra slot there too.
Without this patch, the kdump kernel fails to be loaded by
kexec_file_load:
[ 139.736948] UBSAN: array-index-out-of-bounds in arch/x86/kernel/crash.c:350:25
[ 139.742360] index 0 is out of range for type 'range [*]'
[ 139.745695] CPU: 0 UID: 0 PID: 5778 Comm: kexec Not tainted 6.15.0-0.rc3.20250425git02ddfb981de8.32.fc43.x86_64 #1 PREEMPT(lazy)
[ 139.745698] Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[ 139.745699] Call Trace:
[ 139.745700] <TASK>
[ 139.745701] dump_stack_lvl+0x5d/0x80
[ 139.745706] ubsan_epilogue+0x5/0x2b
[ 139.745709] __ubsan_handle_out_of_bounds.cold+0x54/0x59
[ 139.745711] crash_setup_memmap_entries+0x2d9/0x330
[ 139.745716] setup_boot_parameters+0xf8/0x6a0
[ 139.745720] bzImage64_load+0x41b/0x4e0
[ 139.745722] ? find_next_iomem_res+0x109/0x140
[ 139.745727] ? locate_mem_hole_callback+0x109/0x170
[ 139.745737] kimage_file_alloc_init+0x1ef/0x3e0
[ 139.745740] __do_sys_kexec_file_load+0x180/0x2f0
[ 139.745742] do_syscall_64+0x7b/0x160
[ 139.745745] ? do_user_addr_fault+0x21a/0x690
[ 139.745747] ? exc_page_fault+0x7e/0x1a0
[ 139.745749] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 139.745751] RIP: 0033:0x7f7712c84e4d
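The split condition discussed in this patch (a new slot is needed only when the excluded range lies strictly inside an existing range) can be illustrated with a simplified range-exclusion routine. This is a hypothetical sketch, not the kernel's implementation: struct mem_ranges and exclude_range() are invented names, ranges are inclusive, and a fixed four-slot array stands in for the vzalloc()ed crash_mem.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_SLOTS 4

struct range {
	uint64_t start, end;		/* inclusive bounds */
};

/* Hypothetical stand-in for struct crash_mem. */
struct mem_ranges {
	unsigned int nr_ranges;		/* slots in use */
	unsigned int max_nr_ranges;	/* slots allocated */
	struct range ranges[MAX_SLOTS];
};

/*
 * Exclude [start, end]. Only the "strictly inside" case needs a new
 * slot; returns -1 if that slot was not allocated.
 */
static int exclude_range(struct mem_ranges *m, uint64_t start, uint64_t end)
{
	unsigned int i = 0;

	assert(m->max_nr_ranges <= MAX_SLOTS);

	while (i < m->nr_ranges) {
		struct range *cur = &m->ranges[i];

		if (end < cur->start || start > cur->end) {
			i++;			/* no overlap */
			continue;
		}
		if (start > cur->start && end < cur->end) {
			/* Strictly inside: one range becomes two. */
			if (m->nr_ranges >= m->max_nr_ranges)
				return -1;
			for (unsigned int j = m->nr_ranges; j > i + 1; j--)
				m->ranges[j] = m->ranges[j - 1];
			m->ranges[i + 1].start = end + 1;
			m->ranges[i + 1].end = cur->end;
			cur->end = start - 1;
			m->nr_ranges++;
			return 0;
		}
		if (start <= cur->start && end >= cur->end) {
			/* Fully covered: drop it; slot i now holds the next range. */
			for (unsigned int j = i; j + 1 < m->nr_ranges; j++)
				m->ranges[j] = m->ranges[j + 1];
			m->nr_ranges--;
			continue;
		}
		if (start <= cur->start)
			cur->start = end + 1;	/* trim the head */
		else
			cur->end = start - 1;	/* trim the tail */
		i++;
	}
	return 0;
}
```

Excluding a region that begins at an existing range's start, such as an elfheader at crashk_res.start or the low [0, 1M], only trims and fits in the existing slots; excluding a region strictly inside a range fails with one slot and succeeds with two.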
Previously discussed link:
[1] https://lore.kernel.org/kexec/ZXk2oBf%2FT1Ul6o0c@MiWiFi-R3L-srv/
[2] https://lore.kernel.org/kexec/273284e8-7680-4f5f-8065-c5d780987e59@easystac…
[3] https://lore.kernel.org/kexec/ZYQ6O%2F57sHAPxTHm@MiWiFi-R3L-srv/
Link: https://lkml.kernel.org/r/20240108130720.228478-1-fuqiang.wang@easystack.cn
Signed-off-by: fuqiang wang <fuqiang.wang(a)easystack.cn>
Acked-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Coiby Xu <coxu(a)redhat.com>
Closes: https://lkml.kernel.org/r/4de3c2onosr7negqnfhekm4cpbklzmsimgdfv33c52dktqpza…
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: <x86(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/crash.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/crash.c~x86-kexec-fix-potential-cmem-ranges-out-of-bounds
+++ a/arch/x86/kernel/crash.c
@@ -165,8 +165,18 @@ static struct crash_mem *fill_up_crash_e
/*
* Exclusion of crash region and/or crashk_low_res may cause
* another range split. So add extra two slots here.
+ *
+ * Exclusion of low 1M may not cause another range split, because the
+ * range of exclude is [0, 1M] and the condition for splitting a new
+ * region is that the start, end parameters are both in a certain
+ * existing region in cmem and cannot be equal to existing region's
+ * start or end. Obviously, the start of [0, 1M] cannot meet this
+ * condition.
+ *
+ * But in case the low 1M range changes in the future
+ * (e.g. to [start, 1M]), add an extra slot.
*/
- nr_ranges += 2;
+ nr_ranges += 3;
cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
if (!cmem)
return NULL;
@@ -298,9 +308,16 @@ int crash_setup_memmap_entries(struct ki
struct crash_memmap_data cmd;
struct crash_mem *cmem;
- cmem = vzalloc(struct_size(cmem, ranges, 1));
+ /*
+ * In the current x86 architecture code, the elfheader is always
+ * allocated at crashk_res.start. But it depends on the allocation
+ * position of elfheader in crashk_res. To avoid potential out of
+ * bounds in the future, add an extra slot.
+ */
+ cmem = vzalloc(struct_size(cmem, ranges, 2));
if (!cmem)
return -ENOMEM;
+ cmem->max_nr_ranges = 2;
memset(&cmd, 0, sizeof(struct crash_memmap_data));
cmd.params = params;
_
Patches currently in -mm which might be from fuqiang.wang(a)easystack.cn are
From: Fabio Estevam <festevam(a)denx.de>
Since commit 2718f15403fb ("iio: sanity check available_scan_masks array"),
verbose and misleading warnings are printed for devices like the MAX11601:
max1363 1-0064: available_scan_mask 8 subset of 0. Never used
max1363 1-0064: available_scan_mask 9 subset of 0. Never used
max1363 1-0064: available_scan_mask 10 subset of 0. Never used
max1363 1-0064: available_scan_mask 11 subset of 0. Never used
max1363 1-0064: available_scan_mask 12 subset of 0. Never used
max1363 1-0064: available_scan_mask 13 subset of 0. Never used
...
[warnings continue]
These warnings incorrectly report that later scan masks are subsets of
the first one, even when they are not. The issue lies in the logic that
checks for subset relationships between scan masks.
Fix the subset detection so that each mask is compared only against
previous masks, and the warning is printed only when an element of
available_scan_masks is in fact a subset of a previous one.
With this fix, the warning output becomes both correct and more
informative:
max1363 1-0064: Mask 7 (0xc) is a subset of mask 6 (0xf) and will be ignored
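The subset test reduces to one mask operation: a is a subset of b exactly when a has no bit set outside b, i.e. (a & ~b) == 0. The sketch below applies it the way the new loop does, comparing each mask only against earlier ones. It assumes each mask fits in a single unsigned long, as the new av_masks[i] indexing does, and count_ignored_pairs() is a hypothetical helper that counts the (i, j) pairs the loop would warn about instead of printing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* a is a subset of b when a has no bit set outside b. */
static bool is_subset(unsigned long a, unsigned long b)
{
	return (a & ~b) == 0;
}

/*
 * Hypothetical helper: count the (i, j) pairs, j < i, where the later
 * mask i is covered by the earlier, preferred mask j and so is never
 * selected.
 */
static size_t count_ignored_pairs(const unsigned long *masks, size_t n)
{
	size_t pairs = 0;

	for (size_t i = 1; i < n; i++)
		for (size_t j = 0; j < i; j++)
			if (is_subset(masks[i], masks[j]))
				pairs++;
	return pairs;
}
```

With the masks {0xf, 0x3, 0xc}, both later masks are reported against mask 0, because any channel set they could serve is already served by the preferred mask 0xf.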
Cc: stable(a)vger.kernel.org
Fixes: 2718f15403fb ("iio: sanity check available_scan_masks array")
Signed-off-by: Fabio Estevam <festevam(a)denx.de>
---
drivers/iio/industrialio-core.c | 23 ++++++++++-------------
1 file changed, 10 insertions(+), 13 deletions(-)
diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 6a6568d4a2cb..855d5fd3e6b2 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1904,6 +1904,11 @@ static int iio_check_extended_name(const struct iio_dev *indio_dev)
static const struct iio_buffer_setup_ops noop_ring_setup_ops;
+static int is_subset(unsigned long a, unsigned long b)
+{
+ return (a & ~b) == 0;
+}
+
static void iio_sanity_check_avail_scan_masks(struct iio_dev *indio_dev)
{
unsigned int num_masks, masklength, longs_per_mask;
@@ -1947,21 +1952,13 @@ static void iio_sanity_check_avail_scan_masks(struct iio_dev *indio_dev)
* available masks in the order of preference (presumably the least
* costy to access masks first).
*/
- for (i = 0; i < num_masks - 1; i++) {
- const unsigned long *mask1;
- int j;
- mask1 = av_masks + i * longs_per_mask;
- for (j = i + 1; j < num_masks; j++) {
- const unsigned long *mask2;
-
- mask2 = av_masks + j * longs_per_mask;
- if (bitmap_subset(mask2, mask1, masklength))
+ for (i = 1; i < num_masks; ++i)
+ for (int j = 0; j < i; ++j)
+ if (is_subset(av_masks[i], av_masks[j]))
dev_warn(indio_dev->dev.parent,
- "available_scan_mask %d subset of %d. Never used\n",
- j, i);
- }
- }
+ "Mask %d (0x%lx) is a subset of mask %d (0x%lx) and will be ignored\n",
+ i, av_masks[i], j, av_masks[j]);
}
/**
--
2.34.1
Hi,
On 3/14/25 2:08 PM, Benjamin Berg wrote:
> From: Benjamin Berg <benjamin.berg(a)intel.com>
> um: work around sched_yield not yielding in time-travel mode
>
> sched_yield by a userspace may not actually cause scheduling in
> time-travel mode as no time has passed. In the case seen it appears to
> be a badly implemented userspace spinlock in ASAN. Unfortunately, with
> time-travel it causes an extreme slowdown or even deadlock depending on
> the kernel configuration (CONFIG_UML_MAX_USERSPACE_ITERATIONS).
>
> Work around it by accounting time to the process whenever it executes a
> sched_yield syscall.
>
> Signed-off-by: Benjamin Berg <benjamin.berg(a)intel.com>
From what I can tell, the patch mentioned above was backported to 6.12.27 by:
<https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/arc…>
but without the upstream
|Commit 0b8b2668f9981c1fefc2ef892bd915288ef01f33
|Author: Benjamin Berg <benjamin.berg(a)intel.com>
|Date: Thu Oct 10 16:25:37 2024 +0200
| um: insert scheduler ticks when userspace does not yield
|
| In time-travel mode userspace can do a lot of work without any time
| passing. Unfortunately, this can result in OOM situations as the RCU
| core code will never be run. [...]
the 6.12.27 kernel build for the UM target fails:
| /usr/bin/ld: arch/um/kernel/skas/syscall.o: in function `handle_syscall': linux-6.12.27/arch/um/kernel/skas/syscall.c:43:(.text+0xa2): undefined reference to `tt_extra_sched_jiffies'
| collect2: error: ld returned 1 exit status
Is it possible to backport 0b8b2668f9981c1fefc2ef892bd915288ef01f33 too?
Or is it better to revert 887c5c12e80c8424bd471122d2e8b6b462e12874 again
in the stable releases?
Best Regards,
Christian Lamparter
>
> ---
>
> I suspect it is this code in ASAN that uses sched_yield
> https://github.com/llvm/llvm-project/blob/main/compiler-rt/lib/sanitizer_co…
> though there are also some other places that use sched_yield.
>
> I doubt that code is reasonable. At the same time, not sure that
> sched_yield is behaving as advertised either as it obviously is not
> necessarily relinquishing the CPU.
> ---
> arch/um/include/linux/time-internal.h | 2 ++
> arch/um/kernel/skas/syscall.c | 11 +++++++++++
> 2 files changed, 13 insertions(+)
>
> diff --git a/arch/um/include/linux/time-internal.h b/arch/um/include/linux/time-internal.h
> index b22226634ff6..138908b999d7 100644
> --- a/arch/um/include/linux/time-internal.h
> +++ b/arch/um/include/linux/time-internal.h
> @@ -83,6 +83,8 @@ extern void time_travel_not_configured(void);
> #define time_travel_del_event(...) time_travel_not_configured()
> #endif /* CONFIG_UML_TIME_TRAVEL_SUPPORT */
>
> +extern unsigned long tt_extra_sched_jiffies;
> +
> /*
> * Without CONFIG_UML_TIME_TRAVEL_SUPPORT this is a linker error if used,
> * which is intentional since we really shouldn't link it in that case.
> diff --git a/arch/um/kernel/skas/syscall.c b/arch/um/kernel/skas/syscall.c
> index b09e85279d2b..a5beaea2967e 100644
> --- a/arch/um/kernel/skas/syscall.c
> +++ b/arch/um/kernel/skas/syscall.c
> @@ -31,6 +31,17 @@ void handle_syscall(struct uml_pt_regs *r)
> goto out;
>
> syscall = UPT_SYSCALL_NR(r);
> +
> + /*
> + * If no time passes, then sched_yield may not actually yield, causing
> + * broken spinlock implementations in userspace (ASAN) to hang for long
> + * periods of time.
> + */
> + if ((time_travel_mode == TT_MODE_INFCPU ||
> + time_travel_mode == TT_MODE_EXTERNAL) &&
> + syscall == __NR_sched_yield)
> + tt_extra_sched_jiffies += 1;
> +
> if (syscall >= 0 && syscall < __NR_syscalls) {
> unsigned long ret = EXECUTE_SYSCALL(syscall, regs);
>
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 90abee6d7895d5eef18c91d870d8168be4e76e9d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042150-hardiness-hunting-0780@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 90abee6d7895d5eef18c91d870d8168be4e76e9d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes(a)cmpxchg.org>
Date: Mon, 7 Apr 2025 14:01:53 -0400
Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
single pages from biggest buddy") as the root cause of a 56.4% regression
in vm-scalability::lru-file-mmap-read.
Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
freelist movement during block conversion"), as the root cause for a
regression in worst-case zone->lock+irqoff hold times.
Both of these patches modify the page allocator's fallback path to be less
greedy in an effort to stave off fragmentation. The flip side of this is
that fallbacks are also less productive each time around, which means the
fallback search can run much more frequently.
Carlos' traces point to rmqueue_bulk() specifically, which tries to refill
the percpu cache by allocating a large batch of pages in a loop. It
highlights how once the native freelists are exhausted, the fallback code
first scans orders top-down for whole blocks to claim, then falls back to
a bottom-up search for the smallest buddy to steal. For the next batch
page, it goes through the same thing again.
This can be made more efficient. Since rmqueue_bulk() holds the
zone->lock over the entire batch, the freelists are not subject to outside
changes; when the search for a block to claim has already failed, there is
no point in trying again for the next page.
Modify __rmqueue() to remember the last successful fallback mode, and
restart directly from there on the next rmqueue_bulk() iteration.
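The mode memory can be sketched as a small state machine. This user-space model is hypothetical: struct model and its *_ok flags stand in for the outcomes of __rmqueue_smallest(), __rmqueue_cma_fallback(), __rmqueue_claim() and __rmqueue_steal(), and the ALLOC_CMA and ALLOC_NOFRAGMENT checks are omitted. It counts tier searches to show how a remembered mode skips tiers that already failed while zone->lock is held.

```c
#include <assert.h>
#include <stdbool.h>

enum rmqueue_mode { RMQUEUE_NORMAL, RMQUEUE_CMA, RMQUEUE_CLAIM, RMQUEUE_STEAL };

/* Hypothetical per-tier outcomes plus a count of tier searches. */
struct model {
	bool normal_ok, cma_ok, claim_ok, steal_ok;
	unsigned int searches;
};

/* Returns true on success; *mode records where the next call resumes. */
static bool model_rmqueue(struct model *m, enum rmqueue_mode *mode)
{
	switch (*mode) {
	case RMQUEUE_NORMAL:
		m->searches++;
		if (m->normal_ok)
			return true;
		/* fall through */
	case RMQUEUE_CMA:
		m->searches++;
		if (m->cma_ok) {
			*mode = RMQUEUE_CMA;
			return true;
		}
		/* fall through */
	case RMQUEUE_CLAIM:
		m->searches++;
		if (m->claim_ok) {
			*mode = RMQUEUE_NORMAL;	/* preferred freelist refilled */
			return true;
		}
		/* fall through */
	case RMQUEUE_STEAL:
		m->searches++;
		if (m->steal_ok) {
			*mode = RMQUEUE_STEAL;
			return true;
		}
	}
	return false;
}

/* Like rmqueue_bulk(): one mode variable shared across the batch. */
static unsigned int model_bulk(struct model *m, unsigned int count)
{
	enum rmqueue_mode mode = RMQUEUE_NORMAL;

	m->searches = 0;
	for (unsigned int i = 0; i < count; i++)
		if (!model_rmqueue(m, &mode))
			break;
	return m->searches;
}
```

When only stealing succeeds, a batch of four costs 7 tier searches instead of 16. The caching relies on the lock being held across the batch; each rmqueue_bulk() and rmqueue_buddy() call starts again from RMQUEUE_NORMAL.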
Oliver confirms that this improves beyond the regression that the test
robot reported against c2f6ea38fc1b:
commit:
f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
Carlos confirms that worst-case times are almost fully recovered
compared to before the earlier culprit patch:
2dd482ba627d (before freelist hygiene): 1ms
c0cd6f557b90 (after freelist hygiene): 90ms
next-20250319 (steal smallest buddy): 280ms
this patch : 8ms
[jackmanb(a)google.com: comment updates]
Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com
[hannes(a)cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin]
Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org
Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org
Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reported-by: Carlos Song <carlos.song(a)nxp.com>
Tested-by: Carlos Song <carlos.song(a)nxp.com>
Tested-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
Reviewed-by: Brendan Jackman <jackmanb(a)google.com>
Tested-by: Shivank Garg <shivankg(a)amd.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a219fe8e130..1715e34b91af 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2183,23 +2183,15 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
- *
- * Return the stolen page, or NULL if none can be found.
*/
static __always_inline struct page *
-__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
+__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
{
struct free_area *area;
@@ -2237,14 +2229,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
page = try_to_claim_block(zone, page, current_order, order,
start_migratetype, fallback_mt,
alloc_flags);
- if (page)
- goto got_one;
+ if (page) {
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+ start_migratetype, fallback_mt);
+ return page;
+ }
}
- if (alloc_flags & ALLOC_NOFRAGMENT)
- return NULL;
+ return NULL;
+}
+
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
+static __always_inline struct page *
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+{
+ struct free_area *area;
+ int current_order;
+ struct page *page;
+ int fallback_mt;
+ bool claim_block;
- /* No luck claiming pageblock. Find the smallest fallback page */
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
area = &(zone->free_area[current_order]);
fallback_mt = find_suitable_fallback(area, current_order,
@@ -2254,25 +2261,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
page = get_page_from_free_area(area, fallback_mt);
page_del_and_expand(zone, page, order, current_order, fallback_mt);
- goto got_one;
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+ start_migratetype, fallback_mt);
+ return page;
}
return NULL;
-
-got_one:
- trace_mm_page_alloc_extfrag(page, order, current_order,
- start_migratetype, fallback_mt);
-
- return page;
}
+enum rmqueue_mode {
+ RMQUEUE_NORMAL,
+ RMQUEUE_CMA,
+ RMQUEUE_CLAIM,
+ RMQUEUE_STEAL,
+};
+
/*
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
*/
static __always_inline struct page *
__rmqueue(struct zone *zone, unsigned int order, int migratetype,
- unsigned int alloc_flags)
+ unsigned int alloc_flags, enum rmqueue_mode *mode)
{
struct page *page;
@@ -2291,16 +2301,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
}
- page = __rmqueue_smallest(zone, order, migratetype);
- if (unlikely(!page)) {
- if (alloc_flags & ALLOC_CMA)
+ /*
+ * First try the freelists of the requested migratetype, then try
+ * fallbacks modes with increasing levels of fragmentation risk.
+ *
+ * The fallback logic is expensive and rmqueue_bulk() calls in
+ * a loop with the zone->lock held, meaning the freelists are
+ * not subject to any outside changes. Remember in *mode where
+ * we found pay dirt, to save us the search on the next call.
+ */
+ switch (*mode) {
+ case RMQUEUE_NORMAL:
+ page = __rmqueue_smallest(zone, order, migratetype);
+ if (page)
+ return page;
+ fallthrough;
+ case RMQUEUE_CMA:
+ if (alloc_flags & ALLOC_CMA) {
page = __rmqueue_cma_fallback(zone, order);
-
- if (!page)
- page = __rmqueue_fallback(zone, order, migratetype,
- alloc_flags);
+ if (page) {
+ *mode = RMQUEUE_CMA;
+ return page;
+ }
+ }
+ fallthrough;
+ case RMQUEUE_CLAIM:
+ page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
+ if (page) {
+ /* Replenished preferred freelist, back to normal mode. */
+ *mode = RMQUEUE_NORMAL;
+ return page;
+ }
+ fallthrough;
+ case RMQUEUE_STEAL:
+ if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
+ page = __rmqueue_steal(zone, order, migratetype);
+ if (page) {
+ *mode = RMQUEUE_STEAL;
+ return page;
+ }
+ }
}
- return page;
+ return NULL;
}
/*
@@ -2312,6 +2354,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list,
int migratetype, unsigned int alloc_flags)
{
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
unsigned long flags;
int i;
@@ -2323,7 +2366,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
}
for (i = 0; i < count; ++i) {
struct page *page = __rmqueue(zone, order, migratetype,
- alloc_flags);
+ alloc_flags, &rmqm);
if (unlikely(page == NULL))
break;
@@ -2948,7 +2991,9 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
if (alloc_flags & ALLOC_HIGHATOMIC)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
- page = __rmqueue(zone, order, migratetype, alloc_flags);
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
+ page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
/*
* If the allocation fails, allow OOM handling and
This patchset adds support for the VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
message added in recent VPU firmware. Without it, the driver is not able to
process any jobs after this message is received and would need to be reloaded.
Most patches are taken as-is from upstream, except for these two:
- Fix locking order in ivpu_job_submit
- Abort all jobs after command queue unregister
Both of these patches had to be rebased because the upstream versions build on
new CMDQ UAPI changes that are missing from stable and should not be backported.
Changes since v1:
- Documented deviations from the original upstream patches in commit messages
Andrew Kreimer (1):
accel/ivpu: Fix a typo
Andrzej Kacprowski (1):
accel/ivpu: Update VPU FW API headers
Karol Wachowski (4):
accel/ivpu: Use xa_alloc_cyclic() instead of custom function
accel/ivpu: Abort all jobs after command queue unregister
accel/ivpu: Fix locking order in ivpu_job_submit
accel/ivpu: Add handling of VPU_JSM_STATUS_MVNCI_CONTEXT_VIOLATION_HW
Tomasz Rusinowicz (1):
accel/ivpu: Make DB_ID and JOB_ID allocations incremental
drivers/accel/ivpu/ivpu_drv.c | 38 ++--
drivers/accel/ivpu/ivpu_drv.h | 9 +
drivers/accel/ivpu/ivpu_job.c | 125 +++++++++---
drivers/accel/ivpu/ivpu_job.h | 1 +
drivers/accel/ivpu/ivpu_jsm_msg.c | 3 +-
drivers/accel/ivpu/ivpu_mmu.c | 3 +-
drivers/accel/ivpu/ivpu_sysfs.c | 5 +-
drivers/accel/ivpu/vpu_boot_api.h | 45 +++--
drivers/accel/ivpu/vpu_jsm_api.h | 303 +++++++++++++++++++++++++-----
9 files changed, 412 insertions(+), 120 deletions(-)
--
2.45.1
This patchset backports a series of ublk fixes from upstream to 6.14-stable.
Patch 7 fixes the race that can cause a kernel panic when the ublk server daemon is exiting.
It depends on patches 1-6, which simplify and improve IO canceling when the ublk server daemon
is exiting, as described here:
https://lore.kernel.org/linux-block/20250416035444.99569-1-ming.lei@redhat.…
Ming Lei (5):
ublk: add helper of ublk_need_map_io()
ublk: move device reset into ublk_ch_release()
ublk: remove __ublk_quiesce_dev()
ublk: simplify aborting ublk request
ublk: fix race between io_uring_cmd_complete_in_task and
ublk_cancel_cmd
Uday Shankar (2):
ublk: properly serialize all FETCH_REQs
ublk: improve detection and handling of ublk server exit
drivers/block/ublk_drv.c | 550 +++++++++++++++++++++------------------
1 file changed, 291 insertions(+), 259 deletions(-)
--
2.43.0
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x e775278cd75f24a2758c28558c4e41b36c935740
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042247-mounting-playlist-f479@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e775278cd75f24a2758c28558c4e41b36c935740 Mon Sep 17 00:00:00 2001
From: Kenneth Graunke <kenneth(a)whitecape.org>
Date: Sun, 30 Mar 2025 12:59:23 -0400
Subject: [PATCH] drm/xe: Invalidate L3 read-only cachelines for geometry
streams too
Historically, the Vertex Fetcher unit has not been an L3 client. That
meant that, when a buffer containing vertex data was written to, it was
necessary to issue a PIPE_CONTROL::VF Cache Invalidate to invalidate any
VF L2 cachelines associated with that buffer, so the new value would be
properly read from memory.
Since Tigerlake, VERTEX_BUFFER_STATE and 3DSTATE_INDEX_BUFFER
have included an "L3 Bypass Enable" bit which userspace drivers can set
to request that the vertex fetcher unit snoop L3. However, unlike most
true L3 clients, the "VF Cache Invalidate" bit continues to only
invalidate the VF L2 cache - and not any associated L3 lines.
To handle that, PIPE_CONTROL has a new "L3 Read Only Cache Invalidation
Bit", which according to the docs, "controls the invalidation of the
Geometry streams cached in L3 cache at the top of the pipe." In other
words, it invalidates the vertex and index buffer data that gets cached
in L3 when "L3 Bypass Disable" is set.
Mesa always sets L3 Bypass Disable so that the VF unit snoops L3, and
whenever it issues a VF Cache Invalidate, it also issues an L3 Read Only
Cache Invalidate so that both L2 and L3 vertex data is invalidated.
xe is issuing VF cache invalidates too (which handles cases like CPU
writes to a buffer between GPU batches). Because userspace may enable
L3 snooping, it needs to issue an L3 Read Only Cache Invalidate as well.
Fixes significant flickering in Firefox on Meteorlake, which was writing
to vertex buffers via the CPU between batches; the missing L3 Read Only
invalidates were causing the vertex fetcher to read stale data from L3.
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4460
Fixes: 6ef3bb60557d ("drm/xe: enable lite restore")
Cc: stable(a)vger.kernel.org # v6.13+
Signed-off-by: Kenneth Graunke <kenneth(a)whitecape.org>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://lore.kernel.org/r/20250330165923.56410-1-rodrigo.vivi@intel.com
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
(cherry picked from commit 61672806b579dd5a150a042ec9383be2bbc2ae7e)
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
diff --git a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
index a255946b6f77..8cfcd3360896 100644
--- a/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
+++ b/drivers/gpu/drm/xe/instructions/xe_gpu_commands.h
@@ -41,6 +41,7 @@
#define GFX_OP_PIPE_CONTROL(len) ((0x3<<29)|(0x3<<27)|(0x2<<24)|((len)-2))
+#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10) /* gen12 */
#define PIPE_CONTROL0_HDC_PIPELINE_FLUSH BIT(9) /* gen12 */
#define PIPE_CONTROL_COMMAND_CACHE_INVALIDATE (1<<29)
diff --git a/drivers/gpu/drm/xe/xe_ring_ops.c b/drivers/gpu/drm/xe/xe_ring_ops.c
index 917fc16de866..a7582b097ae6 100644
--- a/drivers/gpu/drm/xe/xe_ring_ops.c
+++ b/drivers/gpu/drm/xe/xe_ring_ops.c
@@ -137,7 +137,8 @@ emit_pipe_control(u32 *dw, int i, u32 bit_group_0, u32 bit_group_1, u32 offset,
static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
int i)
{
- u32 flags = PIPE_CONTROL_CS_STALL |
+ u32 flags0 = 0;
+ u32 flags1 = PIPE_CONTROL_CS_STALL |
PIPE_CONTROL_COMMAND_CACHE_INVALIDATE |
PIPE_CONTROL_INSTRUCTION_CACHE_INVALIDATE |
PIPE_CONTROL_TEXTURE_CACHE_INVALIDATE |
@@ -148,11 +149,15 @@ static int emit_pipe_invalidate(u32 mask_flags, bool invalidate_tlb, u32 *dw,
PIPE_CONTROL_STORE_DATA_INDEX;
if (invalidate_tlb)
- flags |= PIPE_CONTROL_TLB_INVALIDATE;
+ flags1 |= PIPE_CONTROL_TLB_INVALIDATE;
- flags &= ~mask_flags;
+ flags1 &= ~mask_flags;
- return emit_pipe_control(dw, i, 0, flags, LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
+ if (flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
+ flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;
+
+ return emit_pipe_control(dw, i, flags0, flags1,
+ LRC_PPHWSP_FLUSH_INVAL_SCRATCH_ADDR, 0);
}
static int emit_store_imm_ppgtt_posted(u64 addr, u64 value,
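To make the flag logic of the patched emit_pipe_invalidate() easier to follow, here is a minimal userspace sketch of its ordering: the caller's mask is applied to bit group 1 first, and only a VF Cache Invalidate that survives the mask pulls in the L3 read-only invalidate in bit group 0. The value of PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE comes from the patch; the value of PIPE_CONTROL_VF_CACHE_INVALIDATE, the toy_pipe_flags struct and the toy_invalidate_flags() helper are assumptions for illustration, not xe driver code.

```c
#include <stdint.h>

#define BIT(n) (1u << (n))

/* From the patch: gen12 PIPE_CONTROL bit group 0, bit 10. */
#define PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE BIT(10)
/* Placeholder value for illustration only; the real definition lives in
 * xe_gpu_commands.h and is not part of this excerpt. */
#define PIPE_CONTROL_VF_CACHE_INVALIDATE BIT(4)

struct toy_pipe_flags {
	uint32_t flags0; /* bit group 0 (holds the new L3 read-only bit) */
	uint32_t flags1; /* bit group 1 (CS stall, cache/TLB invalidates, ...) */
};

/* Hypothetical helper mirroring the order of operations in the patch. */
static struct toy_pipe_flags toy_invalidate_flags(uint32_t flags1,
						  uint32_t mask_flags)
{
	struct toy_pipe_flags f = { .flags0 = 0, .flags1 = flags1 };

	/* Caller-requested masking happens first, as in the patch. */
	f.flags1 &= ~mask_flags;

	/* VF Cache Invalidate only reaches the VF L2 cache; also drop the
	 * vertex/index lines that may have been snooped into L3. */
	if (f.flags1 & PIPE_CONTROL_VF_CACHE_INVALIDATE)
		f.flags0 |= PIPE_CONTROL0_L3_READ_ONLY_CACHE_INVALIDATE;

	return f;
}
```

Because the check runs after the mask is applied, a caller that masks out the VF invalidate also suppresses the L3 read-only invalidate, which keeps the two in lockstep.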
Hello,
This fixes an i.MX 7D SoC PCIe regression caused by a backport mistake.
The regression breaks PCIe initialization and, in my case, causes a boot hang.
I wasn't sure how best to organize this. I think a revert followed by a
redo best captures what happened.
To complicate things, it looks like the redo patch could also be applied to
5.4, 5.10, 5.15, and 6.1, but those versions don't have the original
backport commit. Version 6.12 matches master and needs no change.
Applying the redo patch back through 6.1, 5.15, and 5.10 requires one
conflict resolution; going further back to 5.4 requires one more. Patches
against those other versions aren't included here.
-- Ryan
Richard Zhu (1):
PCI: imx6: Skip controller_id generation logic for i.MX7D
Ryan Matthews (1):
Revert "PCI: imx6: Skip controller_id generation logic for i.MX7D"
drivers/pci/controller/dwc/pci-imx6.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
base-commit: 814637ca257f4faf57a73fd4e38888cce88b5911
--
2.47.2
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x 48c1d1bb525b1c44b8bdc8e7ec5629cb6c2b9fc4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050558-charger-crumpled-6ca4@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 48c1d1bb525b1c44b8bdc8e7ec5629cb6c2b9fc4 Mon Sep 17 00:00:00 2001
From: Penglei Jiang <superman.xpt(a)gmail.com>
Date: Mon, 21 Apr 2025 08:40:29 -0700
Subject: [PATCH] btrfs: fix the inode leak in btrfs_iget()
[BUG]
There is a bug report that a syzbot reproducer can lead to the following
busy inode at unmount time:
BTRFS info (device loop1): last unmount of filesystem 1680000e-3c1e-4c46-84b6-56bd3909af50
VFS: Busy inodes after unmount of loop1 (btrfs)
------------[ cut here ]------------
kernel BUG at fs/super.c:650!
Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI
CPU: 0 UID: 0 PID: 48168 Comm: syz-executor Not tainted 6.15.0-rc2-00471-g119009db2674 #2 PREEMPT(full)
Hardware name: QEMU Ubuntu 24.04 PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:generic_shutdown_super+0x2e9/0x390 fs/super.c:650
Call Trace:
<TASK>
kill_anon_super+0x3a/0x60 fs/super.c:1237
btrfs_kill_super+0x3b/0x50 fs/btrfs/super.c:2099
deactivate_locked_super+0xbe/0x1a0 fs/super.c:473
deactivate_super fs/super.c:506 [inline]
deactivate_super+0xe2/0x100 fs/super.c:502
cleanup_mnt+0x21f/0x440 fs/namespace.c:1435
task_work_run+0x14d/0x240 kernel/task_work.c:227
resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
exit_to_user_mode_loop kernel/entry/common.c:114 [inline]
exit_to_user_mode_prepare include/linux/entry-common.h:329 [inline]
__syscall_exit_to_user_mode_work kernel/entry/common.c:207 [inline]
syscall_exit_to_user_mode+0x269/0x290 kernel/entry/common.c:218
do_syscall_64+0xd4/0x250 arch/x86/entry/syscall_64.c:100
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>
[CAUSE]
When btrfs_alloc_path() failed, btrfs_iget() directly returned without
releasing the inode already allocated by btrfs_iget_locked().
This results in the busy inode above and triggers the kernel BUG.
[FIX]
Fix it by calling iget_failed() if btrfs_alloc_path() failed.
If we hit an error inside btrfs_read_locked_inode(), it already calls
iget_failed() properly, so nothing more is needed there.
Although the iget_failed() cleanup inside btrfs_read_locked_inode() is a
break of the normal error handling scheme, let's fix the obvious bug
and backport first, then rework the error handling later.
Reported-by: Penglei Jiang <superman.xpt(a)gmail.com>
Link: https://lore.kernel.org/linux-btrfs/20250421102425.44431-1-superman.xpt@gma…
Fixes: 7c855e16ab72 ("btrfs: remove conditional path allocation in btrfs_read_locked_inode()")
CC: stable(a)vger.kernel.org # 6.13+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Penglei Jiang <superman.xpt(a)gmail.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 312fa996a987..d295a37fa049 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -5682,8 +5682,10 @@ struct btrfs_inode *btrfs_iget(u64 ino, struct btrfs_root *root)
return inode;
path = btrfs_alloc_path();
- if (!path)
+ if (!path) {
+ iget_failed(&inode->vfs_inode);
return ERR_PTR(-ENOMEM);
+ }
ret = btrfs_read_locked_inode(inode, path);
btrfs_free_path(path);
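The bug is a classic missing unwind step: once btrfs_iget_locked() has handed back a new, locked inode, every later failure path must release it. The userspace model that follows shows the fixed control flow. All toy_* names are hypothetical, the reference and lock bookkeeping only approximates what iget_failed() does, and the early return for an already-cached inode is not modeled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inode in the inode cache. */
struct toy_inode {
	int refcount;
	bool locked_new; /* I_NEW-style "still being set up" state */
};

/* Models btrfs_iget_locked(): returns a new, locked inode holding a reference. */
static struct toy_inode *toy_iget_locked(struct toy_inode *slot)
{
	slot->refcount++;
	slot->locked_new = true;
	return slot;
}

/* Models iget_failed(): clear the new-inode state and drop the reference. */
static void toy_iget_failed(struct toy_inode *inode)
{
	inode->locked_new = false;
	inode->refcount--;
}

/* Models the fixed btrfs_iget(); fail_path_alloc simulates
 * btrfs_alloc_path() returning NULL. */
static int toy_iget(struct toy_inode *slot, bool fail_path_alloc,
		    struct toy_inode **out)
{
	struct toy_inode *inode = toy_iget_locked(slot);
	void *path = fail_path_alloc ? NULL : malloc(64);

	if (!path) {
		/* The fix: without this the reference leaks and the
		 * inode is still busy at unmount time. */
		toy_iget_failed(inode);
		return -ENOMEM;
	}
	/* ... read the inode item using path ... */
	free(path);
	inode->locked_new = false; /* like unlock_new_inode() on success */
	*out = inode;
	return 0;
}
```

On the failure path the model ends with a zero reference count, which is the state the real fix restores so that generic_shutdown_super() no longer finds a busy inode.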
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x 90abee6d7895d5eef18c91d870d8168be4e76e9d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042149-busily-amaretto-e684@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 90abee6d7895d5eef18c91d870d8168be4e76e9d Mon Sep 17 00:00:00 2001
From: Johannes Weiner <hannes(a)cmpxchg.org>
Date: Mon, 7 Apr 2025 14:01:53 -0400
Subject: [PATCH] mm: page_alloc: speed up fallbacks in rmqueue_bulk()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The test robot identified c2f6ea38fc1b ("mm: page_alloc: don't steal
single pages from biggest buddy") as the root cause of a 56.4% regression
in vm-scalability::lru-file-mmap-read.
Carlos reports an earlier patch, c0cd6f557b90 ("mm: page_alloc: fix
freelist movement during block conversion"), as the root cause for a
regression in worst-case zone->lock+irqoff hold times.
Both of these patches modify the page allocator's fallback path to be less
greedy in an effort to stave off fragmentation. The flip side of this is
that fallbacks are also less productive each time around, which means the
fallback search can run much more frequently.
Carlos' traces point to rmqueue_bulk() specifically, which tries to refill
the percpu cache by allocating a large batch of pages in a loop. It
highlights how once the native freelists are exhausted, the fallback code
first scans orders top-down for whole blocks to claim, then falls back to
a bottom-up search for the smallest buddy to steal. For the next batch
page, it goes through the same thing again.
This can be made more efficient. Since rmqueue_bulk() holds the
zone->lock over the entire batch, the freelists are not subject to outside
changes; when the search for a block to claim has already failed, there is
no point in trying again for the next page.
Modify __rmqueue() to remember the last successful fallback mode, and
restart directly from there on the next rmqueue_bulk() iteration.
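The idea can be illustrated with a small userspace model of the new switch in __rmqueue(): a mode variable that lives for the whole batch lets later iterations skip freelist searches that have already failed. Everything here (toy_zone, toy_rmqueue(), the per-source page counters, the 4-page block size) is a hypothetical simplification, not mm/page_alloc.c. In the kernel the batch runs under zone->lock, which is what makes reusing the remembered mode safe; the kernel's `fallthrough` pseudo-keyword is written as plain comments here.

```c
#include <stdbool.h>

#define TOY_PAGES_PER_BLOCK 4 /* hypothetical pageblock size */

enum rmqueue_mode { RMQUEUE_NORMAL, RMQUEUE_CMA, RMQUEUE_CLAIM, RMQUEUE_STEAL };

struct toy_zone {
	int native;    /* free pages of the requested migratetype */
	int cma;       /* free CMA pages */
	int claimable; /* whole fallback blocks that could be claimed */
	int stealable; /* single fallback pages that could be stolen */
	int probes;    /* freelist searches performed, to show the saving */
};

/* Search one pool and take a page from it if any is left. */
static bool toy_take(struct toy_zone *z, int *pool)
{
	z->probes++;
	if (*pool == 0)
		return false;
	(*pool)--;
	return true;
}

/* Claiming converts a block: one page is returned, the rest refill native. */
static bool toy_claim(struct toy_zone *z)
{
	z->probes++;
	if (z->claimable == 0)
		return false;
	z->claimable--;
	z->native += TOY_PAGES_PER_BLOCK - 1;
	return true;
}

static bool toy_rmqueue(struct toy_zone *z, bool alloc_cma, bool nofragment,
			enum rmqueue_mode *mode)
{
	switch (*mode) {
	case RMQUEUE_NORMAL:
		if (toy_take(z, &z->native))
			return true;
		/* fall through */
	case RMQUEUE_CMA:
		if (alloc_cma && toy_take(z, &z->cma)) {
			*mode = RMQUEUE_CMA;
			return true;
		}
		/* fall through */
	case RMQUEUE_CLAIM:
		if (toy_claim(z)) {
			*mode = RMQUEUE_NORMAL; /* native freelist replenished */
			return true;
		}
		/* fall through */
	case RMQUEUE_STEAL:
		if (!nofragment && toy_take(z, &z->stealable)) {
			*mode = RMQUEUE_STEAL;
			return true;
		}
	}
	return false;
}

/* Like rmqueue_bulk(): one mode variable spans the whole batch. */
static int toy_rmqueue_bulk(struct toy_zone *z, int count, bool alloc_cma,
			    bool nofragment)
{
	enum rmqueue_mode mode = RMQUEUE_NORMAL;
	int i;

	for (i = 0; i < count; i++)
		if (!toy_rmqueue(z, alloc_cma, nofragment, &mode))
			break;
	return i;
}
```

With two native pages and five stealable ones, a batch of five costs 7 probes: the third allocation searches native, claim and steal, and the last two go straight to steal. Resetting the mode on every call would repeat the failed searches and cost 11.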
Oliver confirms that this improves beyond the regression that the test
robot reported against c2f6ea38fc1b:
commit:
f3b92176f4 ("tools/selftests: add guard region test for /proc/$pid/pagemap")
c2f6ea38fc ("mm: page_alloc: don't steal single pages from biggest buddy")
acc4d5ff0b ("Merge tag 'net-6.15-rc0' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
2c847f27c3 ("mm: page_alloc: speed up fallbacks in rmqueue_bulk()") <--- your patch
f3b92176f4f7100f c2f6ea38fc1b640aa7a2e155cc1 acc4d5ff0b61eb1715c498b6536 2c847f27c37da65a93d23c237c5
---------------- --------------------------- --------------------------- ---------------------------
%stddev %change %stddev %change %stddev %change %stddev
\ | \ | \ | \
25525364 ± 3% -56.4% 11135467 -57.8% 10779336 +31.6% 33581409 vm-scalability.throughput
Carlos confirms that worst-case times are almost fully recovered
compared to before the earlier culprit patch:
2dd482ba627d (before freelist hygiene): 1ms
c0cd6f557b90 (after freelist hygiene): 90ms
next-20250319 (steal smallest buddy): 280ms
this patch : 8ms
[jackmanb(a)google.com: comment updates]
Link: https://lkml.kernel.org/r/D92AC0P9594X.3BML64MUKTF8Z@google.com
[hannes(a)cmpxchg.org: reset rmqueue_mode in rmqueue_buddy() error loop, per Yunsheng Lin]
Link: https://lkml.kernel.org/r/20250409140023.GA2313@cmpxchg.org
Link: https://lkml.kernel.org/r/20250407180154.63348-1-hannes@cmpxchg.org
Fixes: c0cd6f557b90 ("mm: page_alloc: fix freelist movement during block conversion")
Fixes: c2f6ea38fc1b ("mm: page_alloc: don't steal single pages from biggest buddy")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Signed-off-by: Brendan Jackman <jackmanb(a)google.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reported-by: Carlos Song <carlos.song(a)nxp.com>
Tested-by: Carlos Song <carlos.song(a)nxp.com>
Tested-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202503271547.fc08b188-lkp@intel.com
Reviewed-by: Brendan Jackman <jackmanb(a)google.com>
Tested-by: Shivank Garg <shivankg(a)amd.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org> [6.10+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9a219fe8e130..1715e34b91af 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2183,23 +2183,15 @@ try_to_claim_block(struct zone *zone, struct page *page,
}
/*
- * Try finding a free buddy page on the fallback list.
- *
- * This will attempt to claim a whole pageblock for the requested type
- * to ensure grouping of such requests in the future.
- *
- * If a whole block cannot be claimed, steal an individual page, regressing to
- * __rmqueue_smallest() logic to at least break up as little contiguity as
- * possible.
+ * Try to allocate from some fallback migratetype by claiming the entire block,
+ * i.e. converting it to the allocation's start migratetype.
*
* The use of signed ints for order and current_order is a deliberate
* deviation from the rest of this file, to make the for loop
* condition simpler.
- *
- * Return the stolen page, or NULL if none can be found.
*/
static __always_inline struct page *
-__rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
+__rmqueue_claim(struct zone *zone, int order, int start_migratetype,
unsigned int alloc_flags)
{
struct free_area *area;
@@ -2237,14 +2229,29 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
page = try_to_claim_block(zone, page, current_order, order,
start_migratetype, fallback_mt,
alloc_flags);
- if (page)
- goto got_one;
+ if (page) {
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+ start_migratetype, fallback_mt);
+ return page;
+ }
}
- if (alloc_flags & ALLOC_NOFRAGMENT)
- return NULL;
+ return NULL;
+}
+
+/*
+ * Try to steal a single page from some fallback migratetype. Leave the rest of
+ * the block as its current migratetype, potentially causing fragmentation.
+ */
+static __always_inline struct page *
+__rmqueue_steal(struct zone *zone, int order, int start_migratetype)
+{
+ struct free_area *area;
+ int current_order;
+ struct page *page;
+ int fallback_mt;
+ bool claim_block;
- /* No luck claiming pageblock. Find the smallest fallback page */
for (current_order = order; current_order < NR_PAGE_ORDERS; current_order++) {
area = &(zone->free_area[current_order]);
fallback_mt = find_suitable_fallback(area, current_order,
@@ -2254,25 +2261,28 @@ __rmqueue_fallback(struct zone *zone, int order, int start_migratetype,
page = get_page_from_free_area(area, fallback_mt);
page_del_and_expand(zone, page, order, current_order, fallback_mt);
- goto got_one;
+ trace_mm_page_alloc_extfrag(page, order, current_order,
+ start_migratetype, fallback_mt);
+ return page;
}
return NULL;
-
-got_one:
- trace_mm_page_alloc_extfrag(page, order, current_order,
- start_migratetype, fallback_mt);
-
- return page;
}
+enum rmqueue_mode {
+ RMQUEUE_NORMAL,
+ RMQUEUE_CMA,
+ RMQUEUE_CLAIM,
+ RMQUEUE_STEAL,
+};
+
/*
* Do the hard work of removing an element from the buddy allocator.
* Call me with the zone->lock already held.
*/
static __always_inline struct page *
__rmqueue(struct zone *zone, unsigned int order, int migratetype,
- unsigned int alloc_flags)
+ unsigned int alloc_flags, enum rmqueue_mode *mode)
{
struct page *page;
@@ -2291,16 +2301,48 @@ __rmqueue(struct zone *zone, unsigned int order, int migratetype,
}
}
- page = __rmqueue_smallest(zone, order, migratetype);
- if (unlikely(!page)) {
- if (alloc_flags & ALLOC_CMA)
+ /*
+ * First try the freelists of the requested migratetype, then try
+ * fallbacks modes with increasing levels of fragmentation risk.
+ *
+ * The fallback logic is expensive and rmqueue_bulk() calls in
+ * a loop with the zone->lock held, meaning the freelists are
+ * not subject to any outside changes. Remember in *mode where
+ * we found pay dirt, to save us the search on the next call.
+ */
+ switch (*mode) {
+ case RMQUEUE_NORMAL:
+ page = __rmqueue_smallest(zone, order, migratetype);
+ if (page)
+ return page;
+ fallthrough;
+ case RMQUEUE_CMA:
+ if (alloc_flags & ALLOC_CMA) {
page = __rmqueue_cma_fallback(zone, order);
-
- if (!page)
- page = __rmqueue_fallback(zone, order, migratetype,
- alloc_flags);
+ if (page) {
+ *mode = RMQUEUE_CMA;
+ return page;
+ }
+ }
+ fallthrough;
+ case RMQUEUE_CLAIM:
+ page = __rmqueue_claim(zone, order, migratetype, alloc_flags);
+ if (page) {
+ /* Replenished preferred freelist, back to normal mode. */
+ *mode = RMQUEUE_NORMAL;
+ return page;
+ }
+ fallthrough;
+ case RMQUEUE_STEAL:
+ if (!(alloc_flags & ALLOC_NOFRAGMENT)) {
+ page = __rmqueue_steal(zone, order, migratetype);
+ if (page) {
+ *mode = RMQUEUE_STEAL;
+ return page;
+ }
+ }
}
- return page;
+ return NULL;
}
/*
@@ -2312,6 +2354,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
unsigned long count, struct list_head *list,
int migratetype, unsigned int alloc_flags)
{
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
unsigned long flags;
int i;
@@ -2323,7 +2366,7 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
}
for (i = 0; i < count; ++i) {
struct page *page = __rmqueue(zone, order, migratetype,
- alloc_flags);
+ alloc_flags, &rmqm);
if (unlikely(page == NULL))
break;
@@ -2948,7 +2991,9 @@ struct page *rmqueue_buddy(struct zone *preferred_zone, struct zone *zone,
if (alloc_flags & ALLOC_HIGHATOMIC)
page = __rmqueue_smallest(zone, order, MIGRATE_HIGHATOMIC);
if (!page) {
- page = __rmqueue(zone, order, migratetype, alloc_flags);
+ enum rmqueue_mode rmqm = RMQUEUE_NORMAL;
+
+ page = __rmqueue(zone, order, migratetype, alloc_flags, &rmqm);
/*
* If the allocation fails, allow OOM handling and
This is the start of the stable review cycle for the 5.15.182 release.
There are 55 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 09 May 2025 18:37:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.182-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.182-rc1
Nicolin Chen <nicolinc(a)nvidia.com>
iommu/arm-smmu-v3: Fix iommu_device_probe bug due to duplicated stream ids
Jason Gunthorpe <jgg(a)ziepe.ca>
iommu/arm-smmu-v3: Use the new rb tree helpers
Björn Töpel <bjorn(a)rivosinc.com>
riscv: uprobes: Add missing fence.i after building the XOL buffer
Stephan Gerhold <stephan.gerhold(a)linaro.org>
serial: msm: Configure correct working mode before starting earlycon
Suzuki K Poulose <suzuki.poulose(a)arm.com>
irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()
Thomas Gleixner <tglx(a)linutronix.de>
irqchip/gic-v2m: Mark a few functions __init
Xiang wangx <wangxiang(a)cdjrlc.com>
irqchip/gic-v2m: Add const to of_device_id
Christian Hewitt <christianshewitt(a)gmail.com>
Revert "drm/meson: vclk: fix calculation of 59.94 fractional rates"
Fiona Klute <fiona.klute(a)gmx.de>
net: phy: microchip: force IRQ polling mode for lan88xx
Sébastien Szymanski <sebastien.szymanski(a)armadeus.com>
ARM: dts: opos6ul: add ksz8081 phy properties
Cristian Marussi <cristian.marussi(a)arm.com>
firmware: arm_scmi: Balance device refcount when destroying devices
Yonglong Liu <liuyonglong(a)huawei.com>
net: hns3: fix deadlock issue when externel_lb and reset are executed together
Sergey Shtylyov <s.shtylyov(a)omp.ru>
of: module: add buffer overflow check in of_modalias()
Richard Zhu <hongxing.zhu(a)nxp.com>
PCI: imx6: Skip controller_id generation logic for i.MX7D
Jian Shen <shenjian15(a)huawei.com>
net: hns3: defer calling ptp_clock_register()
Hao Lan <lanhao(a)huawei.com>
net: hns3: fixed debugfs tm_qset size
Yonglong Liu <liuyonglong(a)huawei.com>
net: hns3: fix an interrupt residual problem
Yonglong Liu <liuyonglong(a)huawei.com>
net: hns3: add support for external loopback test
Jian Shen <shenjian15(a)huawei.com>
net: hns3: store rx VLAN tag offload state for VF
Mattias Barthel <mattias.barthel(a)atlascopco.com>
net: fec: ERR007885 Workaround for conventional TX
Thangaraj Samynathan <thangaraj.s(a)microchip.com>
net: lan743x: Fix memleak issue when GSO enabled
Michael Liang <mliang(a)purestorage.com>
nvme-tcp: fix premature queue removal and I/O failover
Michael Chan <michael.chan(a)broadcom.com>
bnxt_en: Fix ethtool -d byte order for 32-bit values
Shruti Parab <shruti.parab(a)broadcom.com>
bnxt_en: Fix out-of-bound memcpy() during ethtool -w
Shruti Parab <shruti.parab(a)broadcom.com>
bnxt_en: Fix coredump logic to free allocated buffer
Felix Fietkau <nbd(a)nbd.name>
net: ipv6: fix UDPv6 GSO segmentation with NAT
Simon Horman <horms(a)kernel.org>
net: dlink: Correct endianness handling of led_mode
Xuanqiang Luo <luoxuanqiang(a)kylinos.cn>
ice: Check VF VSI Pointer Value in ice_vc_add_fdir_fltr()
Brett Creeley <brett.creeley(a)intel.com>
ice: Refactor promiscuous functions
Victor Nogueira <victor(a)mojatatu.com>
net_sched: qfq: Fix double list add in class with netem as child qdisc
Victor Nogueira <victor(a)mojatatu.com>
net_sched: ets: Fix double list add in class with netem as child qdisc
Victor Nogueira <victor(a)mojatatu.com>
net_sched: hfsc: Fix a UAF vulnerability in class with netem as child qdisc
Victor Nogueira <victor(a)mojatatu.com>
net_sched: drr: Fix double list add in class with netem as child qdisc
Louis-Alexis Eyraud <louisalexis.eyraud(a)collabora.com>
net: ethernet: mtk-star-emac: rearm interrupts in rx_poll only when advised
Louis-Alexis Eyraud <louisalexis.eyraud(a)collabora.com>
net: ethernet: mtk-star-emac: fix spinlock recursion issues on rx/tx poll
Biao Huang <biao.huang(a)mediatek.com>
net: ethernet: mtk-star-emac: separate tx/rx handling with two NAPIs
Chris Mi <cmi(a)nvidia.com>
net/mlx5: E-switch, Fix error handling for enabling roce
Maor Gottlieb <maorg(a)nvidia.com>
net/mlx5: E-Switch, Initialize MAC Address for Default GID
Jakub Kicinski <kuba(a)kernel.org>
net/sched: act_mirred: don't override retval if we already lost the skb
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop
Jeongjun Park <aha310510(a)gmail.com>
tracing: Fix oob write in trace_seq_to_buffer()
Mingcong Bai <jeffbai(a)aosc.io>
iommu/vt-d: Apply quirk_iommu_igfx for 8086:0044 (QM57/QS57)
Pavel Paklov <Pavel.Paklov(a)cyberprotect.ru>
iommu/amd: Fix potential buffer overflow in parse_ivrs_acpihid
Benjamin Marzinski <bmarzins(a)redhat.com>
dm: always update the array size in realloc_argv on success
Mikulas Patocka <mpatocka(a)redhat.com>
dm-integrity: fix a warning on invalid table line
Wentao Liang <vulab(a)iscas.ac.cn>
wifi: brcm80211: fmac: Add error handling for brcmf_usb_dl_writeimage()
Ruslan Piasetskyi <ruslan.piasetskyi(a)gmail.com>
mmc: renesas_sdhi: Fix error handling in renesas_sdhi_probe
Vishal Badole <Vishal.Badole(a)amd.com>
amd-xgbe: Fix to ensure dependent features are toggled with RX checksum offload
Helge Deller <deller(a)gmx.de>
parisc: Fix double SIGFPE crash
Will Deacon <will(a)kernel.org>
arm64: errata: Add missing sentinels to Spectre-BHB MIDR arrays
Clark Wang <xiaoning.wang(a)nxp.com>
i2c: imx-lpi2c: Fix clock count when probe defers
Niravkumar L Rabara <niravkumar.l.rabara(a)altera.com>
EDAC/altera: Set DDR and SDMMC interrupt mask before registration
Niravkumar L Rabara <niravkumar.l.rabara(a)altera.com>
EDAC/altera: Test the correct error reg offset
Philipp Stanner <phasta(a)kernel.org>
drm/nouveau: Fix WARN_ON in nouveau_fence_context_kill()
Joachim Priesner <joachim.priesner(a)web.de>
ALSA: usb-audio: Add second USB ID for Jabra Evolve 65 headset
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/imx6ul-imx6ull-opos6ul.dtsi | 3 +
arch/arm64/kernel/proton-pack.c | 2 +
arch/parisc/math-emu/driver.c | 16 +-
arch/riscv/kernel/probes/uprobes.c | 10 +-
arch/x86/include/asm/kvm-x86-ops.h | 1 +
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/svm/svm.c | 13 +-
arch/x86/kvm/vmx/vmx.c | 11 +-
arch/x86/kvm/x86.c | 3 +
drivers/edac/altera_edac.c | 9 +-
drivers/edac/altera_edac.h | 2 +
drivers/firmware/arm_scmi/bus.c | 3 +
drivers/gpu/drm/meson/meson_vclk.c | 6 +-
drivers/gpu/drm/nouveau/nouveau_fence.c | 2 +-
drivers/i2c/busses/i2c-imx-lpi2c.c | 4 +-
drivers/iommu/amd/init.c | 8 +
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 79 ++---
drivers/iommu/intel/iommu.c | 4 +-
drivers/irqchip/irq-gic-v2m.c | 8 +-
drivers/md/dm-integrity.c | 2 +-
drivers/md/dm-table.c | 5 +-
drivers/mmc/host/renesas_sdhi_core.c | 10 +-
drivers/net/ethernet/amd/xgbe/xgbe-desc.c | 9 +-
drivers/net/ethernet/amd/xgbe/xgbe-dev.c | 24 +-
drivers/net/ethernet/amd/xgbe/xgbe-drv.c | 11 +-
drivers/net/ethernet/amd/xgbe/xgbe.h | 4 +
drivers/net/ethernet/broadcom/bnxt/bnxt_coredump.c | 30 +-
drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c | 36 ++-
drivers/net/ethernet/dlink/dl2k.c | 2 +-
drivers/net/ethernet/dlink/dl2k.h | 2 +-
drivers/net/ethernet/freescale/fec_main.c | 7 +-
drivers/net/ethernet/hisilicon/hns3/hnae3.h | 2 +
drivers/net/ethernet/hisilicon/hns3/hns3_debugfs.c | 2 +-
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c | 119 ++++++--
drivers/net/ethernet/hisilicon/hns3/hns3_enet.h | 3 +
drivers/net/ethernet/hisilicon/hns3/hns3_ethtool.c | 61 ++--
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 26 +-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_ptp.c | 13 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.c | 25 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_main.h | 1 +
drivers/net/ethernet/intel/ice/ice_fltr.c | 58 ++++
drivers/net/ethernet/intel/ice/ice_fltr.h | 12 +
drivers/net/ethernet/intel/ice/ice_main.c | 49 +--
drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c | 5 +
drivers/net/ethernet/intel/ice/ice_virtchnl_pf.c | 139 ++++-----
drivers/net/ethernet/mediatek/mtk_star_emac.c | 339 ++++++++++++---------
.../ethernet/mellanox/mlx5/core/eswitch_offloads.c | 5 +-
drivers/net/ethernet/mellanox/mlx5/core/rdma.c | 11 +-
drivers/net/ethernet/mellanox/mlx5/core/rdma.h | 4 +-
drivers/net/ethernet/microchip/lan743x_main.c | 8 +-
drivers/net/ethernet/microchip/lan743x_main.h | 1 +
drivers/net/phy/microchip.c | 46 +--
.../net/wireless/broadcom/brcm80211/brcmfmac/usb.c | 6 +-
drivers/nvme/host/tcp.c | 31 +-
drivers/of/device.c | 7 +-
drivers/pci/controller/dwc/pci-imx6.c | 5 +-
drivers/tty/serial/msm_serial.c | 6 +
kernel/trace/trace.c | 5 +-
net/ipv4/udp_offload.c | 61 +++-
net/sched/act_mirred.c | 22 +-
net/sched/sch_drr.c | 9 +-
net/sched/sch_ets.c | 9 +-
net/sched/sch_hfsc.c | 2 +-
net/sched/sch_qfq.c | 11 +-
sound/usb/format.c | 3 +-
66 files changed, 932 insertions(+), 505 deletions(-)
Hi All,
Changes since v4:
- unused pages are no longer leaked
Changes since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error
Changes since v2:
- page allocation moved out of the atomic context
Changes since v1:
- Fixes: and -stable tags added to the patch description
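The changelog entries about moving page allocation out of the atomic context and avoiding the unused-pages leak describe a common pattern: do the sleepable allocation before entering the atomic section, consume only preallocated pages inside it, and free whatever is left over afterwards. The sketch that follows models that pattern in userspace with malloc()/free(); it is not the mm/kasan/shadow.c code, and the struct and function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical batch of pages allocated up front, outside the atomic
 * section, so code running under the lock never has to allocate. */
struct toy_prealloc {
	void **pages;
	int nr;   /* pages allocated up front */
	int used; /* pages handed out so far */
};

/* Free whatever was preallocated but never consumed; returns how many. */
static int toy_release_unused(struct toy_prealloc *p)
{
	int freed = 0;

	for (int i = p->used; i < p->nr; i++) {
		free(p->pages[i]);
		freed++;
	}
	free(p->pages);
	p->pages = NULL;
	p->nr = 0;
	p->used = 0;
	return freed;
}

/* May sleep in the kernel analogy: call before entering the atomic section. */
static int toy_prealloc_pages(struct toy_prealloc *p, int nr, size_t page_size)
{
	p->pages = calloc((size_t)nr, sizeof(*p->pages));
	p->nr = 0;
	p->used = 0;
	if (!p->pages)
		return -1;
	for (int i = 0; i < nr; i++) {
		p->pages[i] = malloc(page_size);
		if (!p->pages[i]) {
			toy_release_unused(p); /* frees the i pages already allocated */
			return -1;
		}
		p->nr = i + 1;
	}
	return 0;
}

/* Atomic-safe: only hands out pages that already exist, never allocates. */
static void *toy_take_page(struct toy_prealloc *p)
{
	if (p->used == p->nr)
		return NULL;
	return p->pages[p->used++];
}
```

The consumer owns the pages it took; toy_release_unused() only frees the tail that was never handed out, which is the part that would otherwise leak.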
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 14 deletions(-)
--
2.45.2
Hi All,
This patch series adds initial support for the HEVC (H.265) and VP9
codecs in the iris decoder. The objective of this work is to extend the
decoder's capabilities to handle HEVC and VP9 codec streams,
including necessary format handling and buffer management.
In addition, the series also includes a set of fixes to address issues
identified during testing of these additional codecs.
These patches also address the comments and feedback received from the
RFC patches previously sent. I have made the necessary improvements
based on the community's suggestions.
Changes in v4:
- Split patch 06/23 into two patches (Bryan)
- Simplified the conditional logic in patch 13/23 (Bryan)
- Improved commit description for patch 13/23 (Nicolas)
- Fixed the value of H265_NUM_TILE_ROW macro (Neil)
- Link to v3: https://lore.kernel.org/r/20250502-qcom-iris-hevc-vp9-v3-0-552158a10a7d@qui…
Changes in v3:
- Introduced two wrappers with explicit names to handle destroying internal
buffers (Nicolas)
- Used sub state check instead of introducing new boolean (Vikash)
- Addressed other comments (Vikash)
- Reordered patches to have all fix patches first (Dmitry)
- Link to v2:
https://lore.kernel.org/r/20250428-qcom-iris-hevc-vp9-v2-0-3a6013ecb8a5@qui…
Changes in v2:
- Added changes to make sure all buffers are released in session close
(Bryan)
- Added tracking for flush responses to fix a timing issue.
- Added handling to fix a timing issue in reconfig
- Split patch 06/20 into two patches (Bryan)
- Added missing fixes tag (Bryan)
- Updated fluster report (Nicolas)
- Link to v1:
https://lore.kernel.org/r/20250408-iris-dec-hevc-vp9-v1-0-acd258778bd6@quic…
Changes since RFC:
- Added additional fixes to address issues identified during further
testing.
- Moved typo fix to a separate patch [Neil]
- Reordered the patches for better logical flow and clarity [Neil,
Dmitry]
- Added fixes tag wherever applicable [Neil, Dmitry]
- Removed the default case in the switch statement for codecs [Bryan]
- Replaced if-else statements with switch-case [Bryan]
- Added comments for mbpf [Bryan]
- RFC:
https://lore.kernel.org/linux-media/20250305104335.3629945-1-quic_dikshita@…
This patch series depends on [1] & [2]
[1]
https://lore.kernel.org/linux-media/20250417-topic-sm8x50-iris-v10-v7-0-f02…
[2]
https://lore.kernel.org/linux-media/20250424-qcs8300_iris-v5-0-f118f505c300…
These patches were tested on SM8250 and SM8550 with v4l2-ctl and
GStreamer for the HEVC and VP9 decoders, while also ensuring that
the existing H264 decoder functionality remains unaffected.
Note: one of the fluster compliance test failures is fixed by firmware [3]
[3]:
https://lore.kernel.org/linux-firmware/1a511921-446d-cdc4-0203-084c88a5dc1e…
Results of the fluster test on SM8550:
131/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test cases:
- 10 testcases failed due to unsupported 10-bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcases failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
- 2 testcases failed due to CRC mismatch
- RAP_A_docomo_6
- RAP_B_Bossen_2
- Bug reported:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4392
Analysis: The first few frames in this stream are discarded by the
firmware and sent to the driver with a filled length of 0. The driver
returns such buffers to the client with timestamp 0 and payload 0, and
sets the buffer state to VB2_BUF_STATE_ERROR. Such buffers should be
dropped by GST. Instead, the first frame is displayed as a green
frame, and when a valid buffer is later sent to the client with the
same timestamp 0, it is dropped, leading to a CRC mismatch for the
first frame.
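The analysis above implies that a client should skip buffers flagged as errors (or carrying no payload) without treating their timestamp as already consumed. A minimal sketch of that client-side rule, assuming a hypothetical simplified buffer struct and dedup table rather than the real V4L2 or GStreamer types:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a dequeued CAPTURE buffer. */
struct decoded_buf {
	bool error;        /* e.g. the buffer came back with an error flag */
	size_t bytesused;  /* 0 for frames the firmware discarded */
	uint64_t timestamp;
};

/* Hypothetical client state: a small ring of recently displayed timestamps. */
#define MAX_SHOWN 16
struct display_filter {
	uint64_t shown[MAX_SHOWN];
	size_t nr_shown;
};

static bool already_shown(const struct display_filter *f, uint64_t ts)
{
	size_t n = f->nr_shown < MAX_SHOWN ? f->nr_shown : MAX_SHOWN;

	for (size_t i = 0; i < n; i++)
		if (f->shown[i] == ts)
			return true;
	return false;
}

/*
 * Returns true if the buffer should be displayed. Error or empty buffers
 * are skipped *without* recording their timestamp, so a later valid
 * buffer carrying the same timestamp is still displayed.
 */
static bool filter_should_display(struct display_filter *f,
				  const struct decoded_buf *b)
{
	if (b->error || b->bytesused == 0)
		return false;
	if (already_shown(f, b->timestamp))
		return false;
	f->shown[f->nr_shown % MAX_SHOWN] = b->timestamp;
	f->nr_shown++;
	return true;
}
```

With this rule, a discarded buffer with timestamp 0 is dropped and the later valid buffer with timestamp 0 is shown, which is the behavior the analysis expects from GST.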
235/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test cases:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug reported:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 2 testcases failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- 1 testcase failed due to unsupported stream
- vp90-2-16-intra-only.webm
Results of the fluster test on SM8250:
133/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test cases:
- 10 testcases failed due to unsupported 10-bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcases failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
232/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test cases:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug raised:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 5 testcases failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- vp90-2-21-resize_inter_320x240_5_1-2.webm
- vp90-2-21-resize_inter_320x240_7_1-2.webm
- vp90-2-18-resize.ivf
- 1 testcase failed with CRC mismatch
- vp90-2-16-intra-only.webm
Analysis: The first few frames are marked by the firmware as NO_SHOW
frames. The driver sets the buffer state to VB2_BUF_STATE_ERROR for
such frames. Such buffers should be dropped by GST. Instead, the first
frame is displayed, and when a valid buffer is later sent to the client
with the same timestamp, it is dropped, leading to a CRC mismatch for
the first frame.
To: Vikash Garodia <quic_vgarodia(a)quicinc.com>
To: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
To: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
To: Hans Verkuil <hverkuil(a)xs4all.nl>
To: Stefan Schmidt <stefan.schmidt(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: linux-arm-msm(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)oss.qualcomm.com>
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Cc: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Dikshita Agarwal <quic_dikshita(a)quicinc.com>
---
Dikshita Agarwal (25):
media: iris: Skip destroying internal buffer if not dequeued
media: iris: Update CAPTURE format info based on OUTPUT format
media: iris: Avoid updating frame size to firmware during reconfig
media: iris: Drop port check for session property response
media: iris: Prevent HFI queue writes when core is in deinit state
media: iris: Remove error check for non-zero v4l2 controls
media: iris: Remove deprecated property setting to firmware
media: iris: Fix missing function pointer initialization
media: iris: Fix NULL pointer dereference
media: iris: Fix typo in depth variable
media: iris: Track flush responses to prevent premature completion
media: iris: Fix buffer preparation failure during resolution change
media: iris: Send V4L2_BUF_FLAG_ERROR for capture buffers with 0 filled length
media: iris: Skip flush on first sequence change
media: iris: Remove unnecessary re-initialization of flush completion
media: iris: Add handling for corrupt and drop frames
media: iris: Add handling for no show frames
media: iris: Improve last flag handling
media: iris: Remove redundant buffer count check in stream off
media: iris: Add a comment to explain usage of MBPS
media: iris: Add HEVC and VP9 formats for decoder
media: iris: Add platform capabilities for HEVC and VP9 decoders
media: iris: Set mandatory properties for HEVC and VP9 decoders.
media: iris: Add internal buffer calculation for HEVC and VP9 decoders
media: iris: Add codec specific check for VP9 decoder drain handling
drivers/media/platform/qcom/iris/iris_buffer.c | 35 +-
drivers/media/platform/qcom/iris/iris_buffer.h | 3 +-
drivers/media/platform/qcom/iris/iris_ctrls.c | 35 +-
drivers/media/platform/qcom/iris/iris_hfi_common.h | 1 +
.../platform/qcom/iris/iris_hfi_gen1_command.c | 48 ++-
.../platform/qcom/iris/iris_hfi_gen1_defines.h | 5 +-
.../platform/qcom/iris/iris_hfi_gen1_response.c | 37 +-
.../platform/qcom/iris/iris_hfi_gen2_command.c | 143 +++++++-
.../platform/qcom/iris/iris_hfi_gen2_defines.h | 5 +
.../platform/qcom/iris/iris_hfi_gen2_response.c | 56 ++-
drivers/media/platform/qcom/iris/iris_hfi_queue.c | 2 +-
drivers/media/platform/qcom/iris/iris_instance.h | 6 +
.../platform/qcom/iris/iris_platform_common.h | 28 +-
.../media/platform/qcom/iris/iris_platform_gen2.c | 198 ++++++++--
.../platform/qcom/iris/iris_platform_qcs8300.h | 126 +++++--
.../platform/qcom/iris/iris_platform_sm8250.c | 15 +-
drivers/media/platform/qcom/iris/iris_state.c | 2 +-
drivers/media/platform/qcom/iris/iris_state.h | 1 +
drivers/media/platform/qcom/iris/iris_vb2.c | 18 +-
drivers/media/platform/qcom/iris/iris_vdec.c | 116 +++---
drivers/media/platform/qcom/iris/iris_vdec.h | 11 +
drivers/media/platform/qcom/iris/iris_vidc.c | 36 +-
drivers/media/platform/qcom/iris/iris_vpu_buffer.c | 397 ++++++++++++++++++++-
drivers/media/platform/qcom/iris/iris_vpu_buffer.h | 46 ++-
24 files changed, 1159 insertions(+), 211 deletions(-)
---
base-commit: 398a1b33f1479af35ca915c5efc9b00d6204f8fa
change-id: 20250507-video-iris-hevc-vp9-59096b189050
prerequisite-message-id: <20250417-topic-sm8x50-iris-v10-v7-0-f020cb1d0e98(a)linaro.org>
prerequisite-patch-id: afffe7096c8e110a8da08c987983bc4441d39578
prerequisite-patch-id: b93c37dc7e09d1631b75387dc1ca90e3066dce17
prerequisite-patch-id: b7b50aa1657be59fd51c3e53d73382a1ee75a08e
prerequisite-patch-id: 30960743105a36f20b3ec4a9ff19e7bca04d6add
prerequisite-patch-id: 2bba98151ca103aa62a513a0fbd0df7ae64d9868
prerequisite-patch-id: 0e43a6d758b5fa5ab921c6aa3c19859e312b47d0
prerequisite-patch-id: 35f8dae1416977e88c2db7c767800c01822e266e
prerequisite-message-id: <20250501-qcs8300_iris-v7-0-b229d5347990(a)quicinc.com>
prerequisite-patch-id: e35b05c527217206ae871aef0d7b0261af0319ea
prerequisite-patch-id: 07ba0745c7d72796567e0a57f5c8e5355a8d2046
prerequisite-patch-id: 3398937a7fabb45934bb98a530eef73252231132
prerequisite-patch-id: 500bc3b8391940d3ebca222d2098b737414b2af4
prerequisite-patch-id: 2e72fe4d11d264db3d42fa450427d30171303c6f
Best regards,
--
Dikshita Agarwal <quic_dikshita(a)quicinc.com>
From: Peter Wang <peter.wang(a)mediatek.com>
Because the id member of struct ufs_hw_queue (hwq->id) is u32 and
the trace entry hwq_id is also u32, hwq_id should be declared as u32.
If MCQ is not supported, SDB mode supports only one hardware queue,
so setting hwq_id to 0 is more appropriate than -1.
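To illustrate why -1 is a poor default for a u32 trace field, here is a hypothetical stand-alone sketch; the helper names are illustrative and do not exist in ufshcd.c:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the idea of the fix: without MCQ, SDB mode
 * has a single hardware queue, so report queue 0.
 */
static uint32_t trace_hwq_id(bool mcq_enabled, uint32_t hwq_id)
{
	return mcq_enabled ? hwq_id : 0;
}

/* What the old "int hwq_id = -1" becomes once stored in a u32 field. */
static uint32_t old_default_as_u32(void)
{
	int hwq_id = -1;

	/* Conversion to unsigned is modular: -1 wraps to UINT32_MAX. */
	return (uint32_t)hwq_id;
}
```

The old default would show up in the trace as 4294967295 rather than a meaningful queue number.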
Fixes: 4a52338bf288 ("scsi: ufs: core: Add trace event for MCQ")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 7735421e3991..14e4cfbcb9eb 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -432,7 +432,7 @@ static void ufshcd_add_command_trace(struct ufs_hba *hba, unsigned int tag,
u8 opcode = 0, group_id = 0;
u32 doorbell = 0;
u32 intr;
- int hwq_id = -1;
+ u32 hwq_id = 0;
struct ufshcd_lrb *lrbp = &hba->lrb[tag];
struct scsi_cmnd *cmd = lrbp->cmd;
struct request *rq = scsi_cmd_to_rq(cmd);
--
2.45.2
The quilt patch titled
Subject: mm: fix folio_pte_batch() on XEN PV
has been removed from the -mm tree. Its filename was
mm-fix-folio_pte_batch-on-xen-pv.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Petr Vaněk <arkamar(a)atlas.cz>
Subject: mm: fix folio_pte_batch() on XEN PV
Date: Fri, 2 May 2025 23:50:19 +0200
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,
	expected_pte = pte_advance_pfn(expected_pte, nr);
produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the pte_same() check succeeds,
	if (!pte_same(pte, expected_pte))
		break;
so the loop does not break, and batching continues into unrelated memory.
For example, with a 4-page folio, the PTE layout might look like this:
[ 53.465673] [ T2552] folio_pte_batch: printing PTE values at addr=0x7f1ac9dc5000
[ 53.465674] [ T2552] PTE[453] = 000000010085c125
[ 53.465679] [ T2552] PTE[454] = 000000010085d125
[ 53.465682] [ T2552] PTE[455] = 000000010085e125
[ 53.465684] [ T2552] PTE[456] = 000000010085f125
[ 53.465686] [ T2552] PTE[457] = 0000000000000000 <-- not present
[ 53.465689] [ T2552] PTE[458] = 0000000101da7125
pte_advance_pfn(PTE[456]) returns a pte_none() due to invalid PFN->MFN
mapping. The next actual PTE (PTE[457]) is also pte_none(), so the loop
continues and includes PTE[457] in the batch, resulting in 5 batched
entries for a 4-page folio. This triggers the following warning:
[ 53.465751] [ T2552] page: refcount:85 mapcount:20 mapping:ffff88813ff4f6a8 index:0x110 pfn:0x10085c
[ 53.465754] [ T2552] head: order:2 mapcount:80 entire_mapcount:0 nr_pages_mapped:4 pincount:0
[ 53.465756] [ T2552] memcg:ffff888003573000
[ 53.465758] [ T2552] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[ 53.465761] [ T2552] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 53.465764] [ T2552] raw: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465767] [ T2552] raw: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465768] [ T2552] head: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465770] [ T2552] head: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465772] [ T2552] head: 0020000000000202 ffffea0004021701 000000040000004f 00000000ffffffff
[ 53.465774] [ T2552] head: 0000000300000003 8000000300000002 0000000000000013 0000000000000004
[ 53.465775] [ T2552] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
The original code works as expected everywhere except on XEN PV, where
pte_advance_pfn() can yield a pte_none() after balloon inflation due to
MFN invalidation. In XEN, pte_advance_pfn() ends up calling
__pte()->xen_make_pte()->pte_pfn_to_mfn(), which returns pte_none() when
mfn == INVALID_P2M_ENTRY.
The comment in pte_pfn_to_mfn() documents that nastiness:
	If there's no mfn for the pfn, then just create an
	empty non-present pte. Unfortunately this loses
	information about the original pfn, so
	pte_mfn_to_pfn is asymmetric.
While such hacks should certainly be removed, we can do better in
folio_pte_batch() and simply check ahead of time how many PTEs we can
possibly batch in our folio.
This way, we not only fix the issue but also clean up the code: the
pte_pfn() check inside the loop body goes away, as do the end_ptep
comparison and arithmetic.
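The difference between the two loop shapes can be reproduced in a user-space toy model. This is a hypothetical sketch, not kernel code: a "PTE" is just its PFN (0 meaning pte_none()), pte_batch_hint() is assumed to always return 1, and XEN PV's behavior is modeled by making the advance into one chosen PFN yield an empty PTE. The values mirror the PTE dump above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: a "PTE" is its PFN; 0 stands in for pte_none(). */
typedef uint64_t toy_pte;

/* Models XEN PV: advancing into a PFN with no MFN yields an empty PTE. */
static toy_pte toy_advance(toy_pte pte, uint64_t invalid_pfn)
{
	uint64_t next = pte + 1;

	return next == invalid_pfn ? 0 : next;
}

/* Old shape: stop on mismatch, or once a PFN lands past the folio end. */
static int batch_old(const toy_pte *ptes, int max_nr, uint64_t folio_end_pfn,
		     uint64_t invalid_pfn)
{
	toy_pte expected = toy_advance(ptes[0], invalid_pfn);
	int nr = 1;

	while (nr < max_nr) {
		if (ptes[nr] != expected)
			break;
		/* An empty PTE has PFN 0, so this check never catches it. */
		if (ptes[nr] >= folio_end_pfn)
			break;
		expected = toy_advance(expected, invalid_pfn);
		nr++;
	}
	return nr;
}

/* New shape: clamp max_nr to the PFNs left in the folio, then loop. */
static int batch_new(const toy_pte *ptes, int max_nr, uint64_t folio_end_pfn,
		     uint64_t invalid_pfn)
{
	/* Caller guarantees ptes[0] lies inside the folio. */
	uint64_t remaining = folio_end_pfn - ptes[0];
	toy_pte expected = toy_advance(ptes[0], invalid_pfn);
	int nr = 1;

	if ((uint64_t)max_nr > remaining)
		max_nr = (int)remaining;

	while (nr < max_nr) {
		if (ptes[nr] != expected)
			break;
		expected = toy_advance(expected, invalid_pfn);
		nr++;
	}
	return nr;
}
```

With the dump's layout (a 4-page folio at PFN 0x10085c followed by an empty PTE), batch_old() returns 5, while batch_new() stops at 4 because max_nr is clamped before the loop; without the XEN quirk both return 4.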
Link: https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
Fixes: f8d937761d65 ("mm/memory: optimize fork() with PTE-mapped THP")
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Petr Vaněk <arkamar(a)atlas.cz>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
--- a/mm/internal.h~mm-fix-folio_pte_batch-on-xen-pv
+++ a/mm/internal.h
@@ -248,11 +248,9 @@ static inline int folio_pte_batch(struct
pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
bool *any_writable, bool *any_young, bool *any_dirty)
{
- unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- const pte_t *end_ptep = start_ptep + max_nr;
pte_t expected_pte, *ptep;
bool writable, young, dirty;
- int nr;
+ int nr, cur_nr;
if (any_writable)
*any_writable = false;
@@ -265,11 +263,15 @@ static inline int folio_pte_batch(struct
VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+ /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+ max_nr = min_t(unsigned long, max_nr,
+ folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
nr = pte_batch_hint(start_ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
ptep = start_ptep + nr;
- while (ptep < end_ptep) {
+ while (nr < max_nr) {
pte = ptep_get(ptep);
if (any_writable)
writable = !!pte_write(pte);
@@ -282,14 +284,6 @@ static inline int folio_pte_batch(struct
if (!pte_same(pte, expected_pte))
break;
- /*
- * Stop immediately once we reached the end of the folio. In
- * corner cases the next PFN might fall into a different
- * folio.
- */
- if (pte_pfn(pte) >= folio_end_pfn)
- break;
-
if (any_writable)
*any_writable |= writable;
if (any_young)
@@ -297,12 +291,13 @@ static inline int folio_pte_batch(struct
if (any_dirty)
*any_dirty |= dirty;
- nr = pte_batch_hint(ptep, pte);
- expected_pte = pte_advance_pfn(expected_pte, nr);
- ptep += nr;
+ cur_nr = pte_batch_hint(ptep, pte);
+ expected_pte = pte_advance_pfn(expected_pte, cur_nr);
+ ptep += cur_nr;
+ nr += cur_nr;
}
- return min(ptep - start_ptep, max_nr);
+ return min(nr, max_nr);
}
/**
_
Patches currently in -mm which might be from arkamar(a)atlas.cz are
The quilt patch titled
Subject: selftests/mm: fix a build failure on powerpc
has been removed from the -mm tree. Its filename was
selftests-mm-fix-a-build-failure-on-powerpc.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Nysal Jan K.A." <nysal(a)linux.ibm.com>
Subject: selftests/mm: fix a build failure on powerpc
Date: Mon, 28 Apr 2025 18:49:35 +0530
Because the compiler is unaware of the size of the code generated by the
".rept" assembler directive, it can emit branch instructions whose offset
exceeds the maximum allowed range, resulting in build failures like the
following:
CC protection_keys
/tmp/ccypKWAE.s: Assembler messages:
/tmp/ccypKWAE.s:2073: Error: operand out of range (0x0000000000020158
is not between 0xffffffffffff8000 and 0x0000000000007ffc)
/tmp/ccypKWAE.s:2509: Error: operand out of range (0x0000000000020130
is not between 0xffffffffffff8000 and 0x0000000000007ffc)
Fix the issue by manually adding nop instructions using the preprocessor.
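The macro scheme can be checked on the host: applied to a string literal, the REPEAT_* macros from the patch expand to 16384 adjacent literals that concatenate into one string, so the "64K page" arithmetic in the patch's comment can be verified directly. A minimal sketch; the array and helper names are hypothetical. The presumed benefit over ".rept" is that the compiler, which estimates inline asm size from the instructions it can see in the string, now sees all 16384 nops.

```c
#include <assert.h>
#include <stddef.h>

/* Same preprocessor scheme as the patch: each level multiplies the text. */
#define REPEAT_8(s) s s s s s s s s
#define REPEAT_64(s) REPEAT_8(s) REPEAT_8(s) REPEAT_8(s) REPEAT_8(s) \
	REPEAT_8(s) REPEAT_8(s) REPEAT_8(s) REPEAT_8(s)
#define REPEAT_512(s) REPEAT_64(s) REPEAT_64(s) REPEAT_64(s) REPEAT_64(s) \
	REPEAT_64(s) REPEAT_64(s) REPEAT_64(s) REPEAT_64(s)
#define REPEAT_4096(s) REPEAT_512(s) REPEAT_512(s) REPEAT_512(s) REPEAT_512(s) \
	REPEAT_512(s) REPEAT_512(s) REPEAT_512(s) REPEAT_512(s)
#define REPEAT_16384(s) REPEAT_4096(s) REPEAT_4096(s) \
	REPEAT_4096(s) REPEAT_4096(s)

/* The expanded asm text, as one concatenated string literal. */
static const char page_of_nops[] = REPEAT_16384("nop\n");

/* Number of "nop\n" lines in the expanded text (excluding the final NUL). */
static size_t count_nop_lines(void)
{
	return (sizeof(page_of_nops) - 1) / (sizeof("nop\n") - 1);
}
```

16384 lines of 4-byte instructions give 65536 bytes, matching the "64K page" comment.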
Link: https://lkml.kernel.org/r/20250428131937.641989-2-nysal@linux.ibm.com
Fixes: 46036188ea1f ("selftests/mm: build with -O2")
Reported-by: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Signed-off-by: Nysal Jan K.A. <nysal(a)linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com>
Reviewed-by: Donet Tom <donettom(a)linux.ibm.com>
Tested-by: Donet Tom <donettom(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pkey-powerpc.h | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/pkey-powerpc.h~selftests-mm-fix-a-build-failure-on-powerpc
+++ a/tools/testing/selftests/mm/pkey-powerpc.h
@@ -104,8 +104,18 @@ static inline void expect_fault_on_read_
return;
}
+#define REPEAT_8(s) s s s s s s s s
+#define REPEAT_64(s) REPEAT_8(s) REPEAT_8(s) REPEAT_8(s) REPEAT_8(s) \
+ REPEAT_8(s) REPEAT_8(s) REPEAT_8(s) REPEAT_8(s)
+#define REPEAT_512(s) REPEAT_64(s) REPEAT_64(s) REPEAT_64(s) REPEAT_64(s) \
+ REPEAT_64(s) REPEAT_64(s) REPEAT_64(s) REPEAT_64(s)
+#define REPEAT_4096(s) REPEAT_512(s) REPEAT_512(s) REPEAT_512(s) REPEAT_512(s) \
+ REPEAT_512(s) REPEAT_512(s) REPEAT_512(s) REPEAT_512(s)
+#define REPEAT_16384(s) REPEAT_4096(s) REPEAT_4096(s) \
+ REPEAT_4096(s) REPEAT_4096(s)
+
/* 4-byte instructions * 16384 = 64K page */
-#define __page_o_noops() asm(".rept 16384 ; nop; .endr")
+#define __page_o_noops() asm(REPEAT_16384("nop\n"))
static inline void *malloc_pkey_with_mprotect_subpage(long size, int prot, u16 pkey)
{
_
Patches currently in -mm which might be from nysal(a)linux.ibm.com are
watchdog-fix-watchdog-may-detect-false-positive-of-softlockup-fix.patch
The quilt patch titled
Subject: selftests/mm: fix build break when compiling pkey_util.c
has been removed from the -mm tree. Its filename was
selftests-mm-fix-build-break-when-compiling-pkey_utilc.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Subject: selftests/mm: fix build break when compiling pkey_util.c
Date: Mon, 28 Apr 2025 18:49:34 +0530
Commit 50910acd6f615 ("selftests/mm: use sys_pkey helpers consistently")
added pkey_util.c to make some of the protection_keys functions
accessible to other tests. However, this broke the powerpc build in two
ways. First:
pkey-powerpc.h: In function `arch_is_powervm':
pkey-powerpc.h:73:21: error: storage size of `buf' isn't known
73 | struct stat buf;
| ^~~
pkey-powerpc.h:75:14: error: implicit declaration of function `stat'; did you mean `strcat'? [-Wimplicit-function-declaration]
75 | if ((stat("/sys/firmware/devicetree/base/ibm,partition-name", &buf) == 0) &&
| ^~~~
| strcat
pkey_util.c includes pkey-helpers.h, which in turn includes
pkey-powerpc.h, but the <sys/stat.h> include needed for "struct stat"
is missing. This is fixed by including <sys/stat.h> in pkey-powerpc.h.
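A minimal sketch of the pattern that needs the header: the hypothetical path_exists() below stands in for the stat() check in arch_is_powervm(). Without <sys/stat.h>, "struct stat buf" has an unknown size and stat() is implicitly declared, which are exactly the two errors shown above.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h> /* declares struct stat and stat(): the missing include */

/* Hypothetical helper: returns true if stat() succeeds on the path. */
static bool path_exists(const char *path)
{
	struct stat buf;

	return stat(path, &buf) == 0;
}
```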
Secondly,
pkey-powerpc.h:55:18: warning: format `%llx' expects argument of type `long long unsigned int', but argument 3 has type `u64' {aka `long unsigned int'} [-Wformat=]
55 | dprintf4("%s() changing %016llx to %016llx\n",
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
56 | __func__, __read_pkey_reg(), pkey_reg);
| ~~~~~~~~~~~~~~~~~
| |
| u64 {aka long unsigned int}
pkey-helpers.h:63:32: note: in definition of macro `dprintf_level'
63 | sigsafe_printf(args); \
| ^~~~
These format-specifier warnings are fixed by defining
__SANE_USERSPACE_TYPES__ in pkey_util.c, which makes u64 an
unsigned long long on powerpc64 so that %llx matches.
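For comparison only: code that does not depend on the kernel's u64 typedef can sidestep this class of warning by using uint64_t with the PRIx64 macro from <inttypes.h>. This is a swapped-in technique, not what the patch does (the patch keeps u64 and changes its definition via __SANE_USERSPACE_TYPES__); the helper name is hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * PRIx64 expands to the right conversion for uint64_t on every ABI,
 * whether uint64_t is unsigned long or unsigned long long.
 */
static int format_pkey_reg(char *buf, size_t len, uint64_t reg)
{
	return snprintf(buf, len, "%016" PRIx64, reg);
}
```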
Link: https://lkml.kernel.org/r/20250428131937.641989-1-nysal@linux.ibm.com
Fixes: 50910acd6f61 ("selftests/mm: use sys_pkey helpers consistently")
Signed-off-by: Madhavan Srinivasan <maddy(a)linux.ibm.com>
Signed-off-by: Nysal Jan K.A. <nysal(a)linux.ibm.com>
Tested-by: Venkat Rao Bagalkote <venkat88(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/pkey-powerpc.h | 2 ++
tools/testing/selftests/mm/pkey_util.c | 1 +
2 files changed, 3 insertions(+)
--- a/tools/testing/selftests/mm/pkey-powerpc.h~selftests-mm-fix-build-break-when-compiling-pkey_utilc
+++ a/tools/testing/selftests/mm/pkey-powerpc.h
@@ -3,6 +3,8 @@
#ifndef _PKEYS_POWERPC_H
#define _PKEYS_POWERPC_H
+#include <sys/stat.h>
+
#ifndef SYS_pkey_alloc
# define SYS_pkey_alloc 384
# define SYS_pkey_free 385
--- a/tools/testing/selftests/mm/pkey_util.c~selftests-mm-fix-build-break-when-compiling-pkey_utilc
+++ a/tools/testing/selftests/mm/pkey_util.c
@@ -1,4 +1,5 @@
// SPDX-License-Identifier: GPL-2.0-only
+#define __SANE_USERSPACE_TYPES__
#include <sys/syscall.h>
#include <unistd.h>
_
Patches currently in -mm which might be from maddy(a)linux.ibm.com are
The quilt patch titled
Subject: tools/testing/selftests: fix guard region test tmpfs assumption
has been removed from the -mm tree. Its filename was
tools-testing-selftests-fix-guard-region-test-tmpfs-assumption.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: tools/testing/selftests: fix guard region test tmpfs assumption
Date: Fri, 25 Apr 2025 17:24:36 +0100
The current implementation of the guard region tests assumes that /tmp is
mounted as tmpfs, that is, shmem.
This isn't always the case, and at least one instance of a spurious test
failure has been reported as a result.
This assumption is unsafe, rushed and silly, and is easily remedied by
simply using memfd, so do so.
We also have to fix up the readonly_file test so that it applies only to
the local file-backed case.
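A minimal sketch of the SHMEM_BACKED setup outside the kselftest harness: memfd_create() returns a shmem-backed fd regardless of how /tmp is mounted. It assumes Linux with glibc 2.27 or later (or musl) for the memfd_create() wrapper; the helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/mman.h>  /* memfd_create() */
#include <sys/stat.h>
#include <unistd.h>

/* Returns a shmem-backed fd sized to nr_pages pages, or -1 on failure. */
static int make_shmem_fd(const char *name, long nr_pages)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int fd = memfd_create(name, 0);

	if (fd < 0)
		return -1;
	/* Like the fixture, size the file up front so tests can map it. */
	if (ftruncate(fd, nr_pages * page_size) != 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The name passed to memfd_create() is only a label (visible in /proc/self/fd), so no path on any filesystem is involved.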
Link: https://lkml.kernel.org/r/20250425162436.564002-1-lorenzo.stoakes@oracle.com
Fixes: 272f37d3e99a ("tools/selftests: expand all guard region tests to file-backed")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Closes: https://lore.kernel.org/linux-mm/a2d2766b-0ab4-437b-951a-8595a7506fe9@arm.c…
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/guard-regions.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
--- a/tools/testing/selftests/mm/guard-regions.c~tools-testing-selftests-fix-guard-region-test-tmpfs-assumption
+++ a/tools/testing/selftests/mm/guard-regions.c
@@ -271,12 +271,16 @@ FIXTURE_SETUP(guard_regions)
self->page_size = (unsigned long)sysconf(_SC_PAGESIZE);
setup_sighandler();
- if (variant->backing == ANON_BACKED)
+ switch (variant->backing) {
+ case ANON_BACKED:
return;
-
- self->fd = open_file(
- variant->backing == SHMEM_BACKED ? "/tmp/" : "",
- self->path);
+ case LOCAL_FILE_BACKED:
+ self->fd = open_file("", self->path);
+ break;
+ case SHMEM_BACKED:
+ self->fd = memfd_create(self->path, 0);
+ break;
+ }
/* We truncate file to at least 100 pages, tests can modify as needed. */
ASSERT_EQ(ftruncate(self->fd, 100 * self->page_size), 0);
@@ -1696,7 +1700,7 @@ TEST_F(guard_regions, readonly_file)
char *ptr;
int i;
- if (variant->backing == ANON_BACKED)
+ if (variant->backing != LOCAL_FILE_BACKED)
SKIP(return, "Read-only test specific to file-backed");
/* Map shared so we can populate with pattern, populate it, unmap. */
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
maintainers-add-mm-gup-section.patch
mm-vma-fix-incorrectly-disallowed-anonymous-vma-merges.patch
tools-testing-add-procmap_query-helper-functions-in-mm-self-tests.patch
tools-testing-selftests-assert-that-anon-merge-cases-behave-as-expected.patch
mm-move-mmap-vma-locking-logic-into-specific-files.patch
mm-establish-mm-vma_execc-for-shared-exec-mm-vma-functionality.patch
mm-abstract-initial-stack-setup-to-mm-subsystem.patch
mm-move-dup_mmap-to-mm.patch
mm-perform-vma-allocation-freeing-duplication-in-mm.patch
mm-introduce-new-mmap_prepare-file-callback.patch
mm-secretmem-convert-to-mmap_prepare-hook.patch
mm-vma-remove-mmap-retry-merge.patch
The quilt patch titled
Subject: ocfs2: stop quota recovery before disabling quotas
has been removed from the -mm tree. Its filename was
ocfs2-stop-quota-recovery-before-disabling-quotas.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: ocfs2: stop quota recovery before disabling quotas
Date: Thu, 24 Apr 2025 15:45:13 +0200
Currently, quota recovery is synchronized with unmount using the
sb->s_umount semaphore. However, this is prone to deadlocks because
flush_workqueue(osb->ocfs2_wq), called from the umount code, can wait
for quota recovery to complete while ocfs2_finish_quota_recovery()
waits for the sb->s_umount semaphore.
Taking the sb->s_umount semaphore in ocfs2_finish_quota_recovery() is
only needed to protect that function against quotas being disabled by
ocfs2_dismount_volume(). Handle this problem by disabling quota recovery
early during unmount in ocfs2_dismount_volume() instead, so that we can
drop the acquisition of sb->s_umount from ocfs2_finish_quota_recovery().
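The handshake relies on the ordering of the recovery-state enum. A hypothetical single-threaded model of the check the recovery thread performs at "restart:" (names are simplified, and the real code holds osb->recovery_lock and calls wake_up() on osb->recovery_event; the unmount side, which requests the "want disable" state and presumably waits for the next value, is not modeled):

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified copy of the enum ordering; only the relative order matters. */
enum rec_state {
	REC_ENABLED = 0,
	REC_QUOTA_WANT_DISABLE,
	REC_QUOTA_DISABLED,  /* must be REC_QUOTA_WANT_DISABLE + 1 */
	REC_WANT_DISABLE,
	REC_DISABLED,        /* assumed to be REC_WANT_DISABLE + 1 */
};

/* Returns whether the recovery thread may still recover quotas. */
static bool recovery_thread_quota_check(enum rec_state *state, bool quota_enabled)
{
	if (!quota_enabled)
		return false;
	/* Acknowledge the unmount path's request to stop quota recovery. */
	if (*state == REC_QUOTA_WANT_DISABLE)
		*state = REC_QUOTA_DISABLED;
	/* Any state at or past REC_QUOTA_DISABLED means quotas are off-limits. */
	return *state < REC_QUOTA_DISABLED;
}
```

Because the full-recovery disable states sort after the quota ones, a later "disable everything" request also turns quota recovery off without a separate check.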
Link: https://lkml.kernel.org/r/20250424134515.18933-6-jack@suse.cz
Fixes: 5f530de63cfc ("ocfs2: Use s_umount for quota recovery protection")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reported-by: Shichangkuo <shi.changkuo(a)h3c.com>
Reported-by: Murad Masimov <m.masimov(a)mt-integration.ru>
Reviewed-by: Heming Zhao <heming.zhao(a)suse.com>
Tested-by: Heming Zhao <heming.zhao(a)suse.com>
Acked-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/journal.c | 20 ++++++++++++++++++--
fs/ocfs2/journal.h | 1 +
fs/ocfs2/ocfs2.h | 6 ++++++
fs/ocfs2/quota_local.c | 9 ++-------
fs/ocfs2/super.c | 3 +++
5 files changed, 30 insertions(+), 9 deletions(-)
--- a/fs/ocfs2/journal.c~ocfs2-stop-quota-recovery-before-disabling-quotas
+++ a/fs/ocfs2/journal.c
@@ -225,6 +225,11 @@ out_lock:
flush_workqueue(osb->ocfs2_wq);
}
+void ocfs2_recovery_disable_quota(struct ocfs2_super *osb)
+{
+ ocfs2_recovery_disable(osb, OCFS2_REC_QUOTA_WANT_DISABLE);
+}
+
void ocfs2_recovery_exit(struct ocfs2_super *osb)
{
struct ocfs2_recovery_map *rm;
@@ -1489,6 +1494,18 @@ static int __ocfs2_recovery_thread(void
}
}
restart:
+ if (quota_enabled) {
+ mutex_lock(&osb->recovery_lock);
+ /* Confirm that recovery thread will no longer recover quotas */
+ if (osb->recovery_state == OCFS2_REC_QUOTA_WANT_DISABLE) {
+ osb->recovery_state = OCFS2_REC_QUOTA_DISABLED;
+ wake_up(&osb->recovery_event);
+ }
+ if (osb->recovery_state >= OCFS2_REC_QUOTA_DISABLED)
+ quota_enabled = 0;
+ mutex_unlock(&osb->recovery_lock);
+ }
+
status = ocfs2_super_lock(osb, 1);
if (status < 0) {
mlog_errno(status);
@@ -1592,8 +1609,7 @@ bail:
mutex_unlock(&osb->recovery_lock);
- if (quota_enabled)
- kfree(rm_quota);
+ kfree(rm_quota);
return status;
}
--- a/fs/ocfs2/journal.h~ocfs2-stop-quota-recovery-before-disabling-quotas
+++ a/fs/ocfs2/journal.h
@@ -148,6 +148,7 @@ void ocfs2_wait_for_recovery(struct ocfs
int ocfs2_recovery_init(struct ocfs2_super *osb);
void ocfs2_recovery_exit(struct ocfs2_super *osb);
+void ocfs2_recovery_disable_quota(struct ocfs2_super *osb);
int ocfs2_compute_replay_slots(struct ocfs2_super *osb);
void ocfs2_free_replay_slots(struct ocfs2_super *osb);
--- a/fs/ocfs2/ocfs2.h~ocfs2-stop-quota-recovery-before-disabling-quotas
+++ a/fs/ocfs2/ocfs2.h
@@ -310,6 +310,12 @@ void ocfs2_initialize_journal_triggers(s
enum ocfs2_recovery_state {
OCFS2_REC_ENABLED = 0,
+ OCFS2_REC_QUOTA_WANT_DISABLE,
+ /*
+ * Must be OCFS2_REC_QUOTA_WANT_DISABLE + 1 for
+ * ocfs2_recovery_disable_quota() to work.
+ */
+ OCFS2_REC_QUOTA_DISABLED,
OCFS2_REC_WANT_DISABLE,
/*
* Must be OCFS2_REC_WANT_DISABLE + 1 for ocfs2_recovery_exit() to work
--- a/fs/ocfs2/quota_local.c~ocfs2-stop-quota-recovery-before-disabling-quotas
+++ a/fs/ocfs2/quota_local.c
@@ -453,8 +453,7 @@ out:
/* Sync changes in local quota file into global quota file and
* reinitialize local quota file.
- * The function expects local quota file to be already locked and
- * s_umount locked in shared mode. */
+ * The function expects local quota file to be already locked. */
static int ocfs2_recover_local_quota_file(struct inode *lqinode,
int type,
struct ocfs2_quota_recovery *rec)
@@ -588,7 +587,6 @@ int ocfs2_finish_quota_recovery(struct o
{
unsigned int ino[OCFS2_MAXQUOTAS] = { LOCAL_USER_QUOTA_SYSTEM_INODE,
LOCAL_GROUP_QUOTA_SYSTEM_INODE };
- struct super_block *sb = osb->sb;
struct ocfs2_local_disk_dqinfo *ldinfo;
struct buffer_head *bh;
handle_t *handle;
@@ -600,7 +598,6 @@ int ocfs2_finish_quota_recovery(struct o
printk(KERN_NOTICE "ocfs2: Finishing quota recovery on device (%s) for "
"slot %u\n", osb->dev_str, slot_num);
- down_read(&sb->s_umount);
for (type = 0; type < OCFS2_MAXQUOTAS; type++) {
if (list_empty(&(rec->r_list[type])))
continue;
@@ -677,7 +674,6 @@ out_put:
break;
}
out:
- up_read(&sb->s_umount);
kfree(rec);
return status;
}
@@ -843,8 +839,7 @@ static int ocfs2_local_free_info(struct
ocfs2_release_local_quota_bitmaps(&oinfo->dqi_chunk);
/*
- * s_umount held in exclusive mode protects us against racing with
- * recovery thread...
+ * ocfs2_dismount_volume() has already aborted quota recovery...
*/
if (oinfo->dqi_rec) {
ocfs2_free_quota_recovery(oinfo->dqi_rec);
--- a/fs/ocfs2/super.c~ocfs2-stop-quota-recovery-before-disabling-quotas
+++ a/fs/ocfs2/super.c
@@ -1812,6 +1812,9 @@ static void ocfs2_dismount_volume(struct
/* Orphan scan should be stopped as early as possible */
ocfs2_orphan_scan_stop(osb);
+ /* Stop quota recovery so that we can disable quotas */
+ ocfs2_recovery_disable_quota(osb);
+
ocfs2_disable_quotas(osb);
/* All dquots should be freed by now */
_
Patches currently in -mm which might be from jack(a)suse.cz are
The quilt patch titled
Subject: ocfs2: implement handshaking with ocfs2 recovery thread
has been removed from the -mm tree. Its filename was
ocfs2-implement-handshaking-with-ocfs2-recovery-thread.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: ocfs2: implement handshaking with ocfs2 recovery thread
Date: Thu, 24 Apr 2025 15:45:12 +0200
We will need the ocfs2 recovery thread to acknowledge transitions of
recovery_state when disabling particular types of recovery. This is
similar to what currently happens when disabling recovery completely, just
more general. Implement the handshake and use it when exiting recovery.
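The handshake can be sketched in userspace as follows, assuming POSIX threads in place of the kernel's mutex and wait_event_cmd(); every name here is hypothetical. The requester stores `state` and waits until the worker either acknowledges by moving to `state + 1` or exits, which is the shape of ocfs2_recovery_disable() in the diff below.

```c
#include <pthread.h>
#include <stdbool.h>

enum rec_state {
	REC_ENABLED = 0,
	REC_WANT_DISABLE,
	REC_DISABLED,	/* must be REC_WANT_DISABLE + 1 */
};

struct recovery {
	pthread_mutex_t lock;
	pthread_cond_t event;
	enum rec_state state;
	bool thread_running;
};

/* Worker: finish its pass, acknowledge a pending request, then exit. */
static void *recovery_worker(void *arg)
{
	struct recovery *r = arg;

	/* ... one pass of recovery work would run here ... */
	pthread_mutex_lock(&r->lock);
	r->thread_running = false;
	if (r->state == REC_WANT_DISABLE)
		r->state = REC_DISABLED;	/* acknowledge on the way out */
	pthread_cond_broadcast(&r->event);
	pthread_mutex_unlock(&r->lock);
	return NULL;
}

/* Requester: ask for `state`, wait for `state + 1` or for the worker to exit. */
static void recovery_disable(struct recovery *r, enum rec_state state)
{
	pthread_mutex_lock(&r->lock);
	if (!r->thread_running) {
		/* nobody to hand off to: go straight to the final state */
		r->state = state + 1;
	} else {
		r->state = state;
		/* pthread_cond_wait() drops and retakes the lock, like wait_event_cmd() */
		while (r->thread_running && r->state < state + 1)
			pthread_cond_wait(&r->event, &r->lock);
	}
	pthread_mutex_unlock(&r->lock);
}

static enum rec_state run_handshake_demo(void)
{
	struct recovery r = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.event = PTHREAD_COND_INITIALIZER,
		.state = REC_ENABLED,
		.thread_running = true,	/* set before the worker starts */
	};
	pthread_t tid;

	if (pthread_create(&tid, NULL, recovery_worker, &r) != 0)
		return REC_ENABLED;
	recovery_disable(&r, REC_WANT_DISABLE);
	pthread_join(tid, NULL);
	return r.state;
}
```

Whichever thread runs first, the sketch ends in REC_DISABLED: if the worker exits before the request arrives, the requester takes the "not running" shortcut instead of waiting.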
Link: https://lkml.kernel.org/r/20250424134515.18933-5-jack@suse.cz
Fixes: 5f530de63cfc ("ocfs2: Use s_umount for quota recovery protection")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Heming Zhao <heming.zhao(a)suse.com>
Tested-by: Heming Zhao <heming.zhao(a)suse.com>
Acked-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Murad Masimov <m.masimov(a)mt-integration.ru>
Cc: Shichangkuo <shi.changkuo(a)h3c.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/journal.c | 52 ++++++++++++++++++++++++++++---------------
fs/ocfs2/ocfs2.h | 4 +++
2 files changed, 39 insertions(+), 17 deletions(-)
--- a/fs/ocfs2/journal.c~ocfs2-implement-handshaking-with-ocfs2-recovery-thread
+++ a/fs/ocfs2/journal.c
@@ -190,31 +190,48 @@ int ocfs2_recovery_init(struct ocfs2_sup
return 0;
}
-/* we can't grab the goofy sem lock from inside wait_event, so we use
- * memory barriers to make sure that we'll see the null task before
- * being woken up */
static int ocfs2_recovery_thread_running(struct ocfs2_super *osb)
{
- mb();
return osb->recovery_thread_task != NULL;
}
-void ocfs2_recovery_exit(struct ocfs2_super *osb)
+static void ocfs2_recovery_disable(struct ocfs2_super *osb,
+ enum ocfs2_recovery_state state)
{
- struct ocfs2_recovery_map *rm;
-
- /* disable any new recovery threads and wait for any currently
- * running ones to exit. Do this before setting the vol_state. */
mutex_lock(&osb->recovery_lock);
- osb->recovery_state = OCFS2_REC_DISABLED;
+ /*
+ * If recovery thread is not running, we can directly transition to
+ * final state.
+ */
+ if (!ocfs2_recovery_thread_running(osb)) {
+ osb->recovery_state = state + 1;
+ goto out_lock;
+ }
+ osb->recovery_state = state;
+ /* Wait for recovery thread to acknowledge state transition */
+ wait_event_cmd(osb->recovery_event,
+ !ocfs2_recovery_thread_running(osb) ||
+ osb->recovery_state >= state + 1,
+ mutex_unlock(&osb->recovery_lock),
+ mutex_lock(&osb->recovery_lock));
+out_lock:
mutex_unlock(&osb->recovery_lock);
- wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb));
- /* At this point, we know that no more recovery threads can be
- * launched, so wait for any recovery completion work to
- * complete. */
+ /*
+ * At this point we know that no more recovery work can be queued so
+ * wait for any recovery completion work to complete.
+ */
if (osb->ocfs2_wq)
flush_workqueue(osb->ocfs2_wq);
+}
+
+void ocfs2_recovery_exit(struct ocfs2_super *osb)
+{
+ struct ocfs2_recovery_map *rm;
+
+ /* disable any new recovery threads and wait for any currently
+ * running ones to exit. Do this before setting the vol_state. */
+ ocfs2_recovery_disable(osb, OCFS2_REC_WANT_DISABLE);
/*
* Now that recovery is shut down, and the osb is about to be
@@ -1569,7 +1586,8 @@ bail:
ocfs2_free_replay_slots(osb);
osb->recovery_thread_task = NULL;
- mb(); /* sync with ocfs2_recovery_thread_running */
+ if (osb->recovery_state == OCFS2_REC_WANT_DISABLE)
+ osb->recovery_state = OCFS2_REC_DISABLED;
wake_up(&osb->recovery_event);
mutex_unlock(&osb->recovery_lock);
@@ -1585,13 +1603,13 @@ void ocfs2_recovery_thread(struct ocfs2_
int was_set = -1;
mutex_lock(&osb->recovery_lock);
- if (osb->recovery_state < OCFS2_REC_DISABLED)
+ if (osb->recovery_state < OCFS2_REC_WANT_DISABLE)
was_set = ocfs2_recovery_map_set(osb, node_num);
trace_ocfs2_recovery_thread(node_num, osb->node_num,
osb->recovery_state, osb->recovery_thread_task, was_set);
- if (osb->recovery_state == OCFS2_REC_DISABLED)
+ if (osb->recovery_state >= OCFS2_REC_WANT_DISABLE)
goto out;
if (osb->recovery_thread_task)
--- a/fs/ocfs2/ocfs2.h~ocfs2-implement-handshaking-with-ocfs2-recovery-thread
+++ a/fs/ocfs2/ocfs2.h
@@ -310,6 +310,10 @@ void ocfs2_initialize_journal_triggers(s
enum ocfs2_recovery_state {
OCFS2_REC_ENABLED = 0,
+ OCFS2_REC_WANT_DISABLE,
+ /*
+ * Must be OCFS2_REC_WANT_DISABLE + 1 for ocfs2_recovery_exit() to work
+ */
OCFS2_REC_DISABLED,
};
_
Patches currently in -mm which might be from jack(a)suse.cz are
The quilt patch titled
Subject: ocfs2: switch osb->disable_recovery to enum
has been removed from the -mm tree. Its filename was
ocfs2-switch-osb-disable_recovery-to-enum.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: ocfs2: switch osb->disable_recovery to enum
Date: Thu, 24 Apr 2025 15:45:11 +0200
Patch series "ocfs2: Fix deadlocks in quota recovery", v3.
This implements another approach to fixing quota recovery deadlocks. We
avoid grabbing sb->s_umount semaphore from ocfs2_finish_quota_recovery()
and instead stop quota recovery early in ocfs2_dismount_volume().
This patch (of 3):
To fix deadlocks with quota recovery we will need more recovery states than
a plain enable / disable. Switch osb->disable_recovery to an enum.
Link: https://lkml.kernel.org/r/20250424134301.1392-1-jack@suse.cz
Link: https://lkml.kernel.org/r/20250424134515.18933-4-jack@suse.cz
Fixes: 5f530de63cfc ("ocfs2: Use s_umount for quota recovery protection")
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Heming Zhao <heming.zhao(a)suse.com>
Tested-by: Heming Zhao <heming.zhao(a)suse.com>
Acked-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Murad Masimov <m.masimov(a)mt-integration.ru>
Cc: Shichangkuo <shi.changkuo(a)h3c.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/journal.c | 14 ++++++++------
fs/ocfs2/ocfs2.h | 7 ++++++-
2 files changed, 14 insertions(+), 7 deletions(-)
--- a/fs/ocfs2/journal.c~ocfs2-switch-osb-disable_recovery-to-enum
+++ a/fs/ocfs2/journal.c
@@ -174,7 +174,7 @@ int ocfs2_recovery_init(struct ocfs2_sup
struct ocfs2_recovery_map *rm;
mutex_init(&osb->recovery_lock);
- osb->disable_recovery = 0;
+ osb->recovery_state = OCFS2_REC_ENABLED;
osb->recovery_thread_task = NULL;
init_waitqueue_head(&osb->recovery_event);
@@ -206,7 +206,7 @@ void ocfs2_recovery_exit(struct ocfs2_su
/* disable any new recovery threads and wait for any currently
* running ones to exit. Do this before setting the vol_state. */
mutex_lock(&osb->recovery_lock);
- osb->disable_recovery = 1;
+ osb->recovery_state = OCFS2_REC_DISABLED;
mutex_unlock(&osb->recovery_lock);
wait_event(osb->recovery_event, !ocfs2_recovery_thread_running(osb));
@@ -1582,14 +1582,16 @@ bail:
void ocfs2_recovery_thread(struct ocfs2_super *osb, int node_num)
{
+ int was_set = -1;
+
mutex_lock(&osb->recovery_lock);
+ if (osb->recovery_state < OCFS2_REC_DISABLED)
+ was_set = ocfs2_recovery_map_set(osb, node_num);
trace_ocfs2_recovery_thread(node_num, osb->node_num,
- osb->disable_recovery, osb->recovery_thread_task,
- osb->disable_recovery ?
- -1 : ocfs2_recovery_map_set(osb, node_num));
+ osb->recovery_state, osb->recovery_thread_task, was_set);
- if (osb->disable_recovery)
+ if (osb->recovery_state == OCFS2_REC_DISABLED)
goto out;
if (osb->recovery_thread_task)
--- a/fs/ocfs2/ocfs2.h~ocfs2-switch-osb-disable_recovery-to-enum
+++ a/fs/ocfs2/ocfs2.h
@@ -308,6 +308,11 @@ enum ocfs2_journal_trigger_type {
void ocfs2_initialize_journal_triggers(struct super_block *sb,
struct ocfs2_triggers triggers[]);
+enum ocfs2_recovery_state {
+ OCFS2_REC_ENABLED = 0,
+ OCFS2_REC_DISABLED,
+};
+
struct ocfs2_journal;
struct ocfs2_slot_info;
struct ocfs2_recovery_map;
@@ -370,7 +375,7 @@ struct ocfs2_super
struct ocfs2_recovery_map *recovery_map;
struct ocfs2_replay_map *replay_map;
struct task_struct *recovery_thread_task;
- int disable_recovery;
+ enum ocfs2_recovery_state recovery_state;
wait_queue_head_t checkpoint_event;
struct ocfs2_journal *journal;
unsigned long osb_commit_interval;
_
Patches currently in -mm which might be from jack(a)suse.cz are
The quilt patch titled
Subject: mm/userfaultfd: fix uninitialized output field for -EAGAIN race
has been removed from the -mm tree. Its filename was
mm-userfaultfd-fix-uninitialized-output-field-for-eagain-race.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/userfaultfd: fix uninitialized output field for -EAGAIN race
Date: Thu, 24 Apr 2025 17:57:28 -0400
While recently discussing some userfaultfd-related issues, Andrea noticed
a potential ABI breakage with -EAGAIN on almost all userfaultfd ioctl()s.
Quoting Andrea, explaining how -EAGAIN was processed and how this should
fix it (taking the UFFDIO_COPY ioctl as an example):
The "mmap_changing" and "stale pmd" conditions are already reported as
-EAGAIN written in the copy field, this does not change it. This change
removes the subnormal case that left copy.copy uninitialized and required
apps to explicitly set the copy field to get deterministic
behavior (which is a requirement contrary to the documentation in both
the manpage and source code). In turn there's no alteration to backwards
compatibility as result of this change because userland will find the
copy field consistently set to -EAGAIN, and not anymore sometime -EAGAIN
and sometime uninitialized.
Even then the change only can make a difference to non cooperative users
of userfaultfd, so when UFFD_FEATURE_EVENT_* is enabled, which is not
true for the vast majority of apps using userfaultfd or this unintended
uninitialized field may have been noticed sooner.
Meanwhile, since this bug has existed for years, it also affects almost
all ioctl()s that were introduced later. Besides UFFDIO_ZEROPAGE, these
are affected in the same way:
- UFFDIO_CONTINUE
- UFFDIO_POISON
- UFFDIO_MOVE
This patch should fix all of them.
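A userspace model of the fixed behaviour, assuming a hypothetical `struct fake_uffdio_copy` that mirrors only the `len` and `copy` fields of the real struct uffdio_copy. The simulated handler returns a kernel-style negative errno; a real ioctl() call would instead return -1 with errno set to EAGAIN. The point is that the caller can rely on the output field without pre-initializing it.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the fields of struct uffdio_copy used here. */
struct fake_uffdio_copy {
	uint64_t len;
	int64_t copy;	/* output: bytes copied, or a negative errno */
};

/*
 * Simulated handler after the fix: when the mm is changing, it still
 * reports -EAGAIN through the output field before returning.
 */
static int fake_uffdio_copy_ioctl(struct fake_uffdio_copy *arg, bool mmap_changing)
{
	if (mmap_changing) {
		arg->copy = -EAGAIN;	/* previously left uninitialized */
		return -EAGAIN;
	}
	arg->copy = (int64_t)arg->len;	/* pretend the whole range was copied */
	return 0;
}

/* Caller side: decide whether to retry using only the output field. */
static bool should_retry(const struct fake_uffdio_copy *arg)
{
	return arg->copy == -EAGAIN;
}
```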
Link: https://lkml.kernel.org/r/20250424215729.194656-2-peterx@redhat.com
Fixes: df2cc96e7701 ("userfaultfd: prevent non-cooperative events vs mcopy_atomic races")
Fixes: f619147104c8 ("userfaultfd: add UFFDIO_CONTINUE ioctl")
Fixes: fc71884a5f59 ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Suggested-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/userfaultfd.c | 28 ++++++++++++++++++++++------
1 file changed, 22 insertions(+), 6 deletions(-)
--- a/fs/userfaultfd.c~mm-userfaultfd-fix-uninitialized-output-field-for-eagain-race
+++ a/fs/userfaultfd.c
@@ -1585,8 +1585,11 @@ static int userfaultfd_copy(struct userf
user_uffdio_copy = (struct uffdio_copy __user *) arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_copy->copy)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_copy, user_uffdio_copy,
@@ -1641,8 +1644,11 @@ static int userfaultfd_zeropage(struct u
user_uffdio_zeropage = (struct uffdio_zeropage __user *) arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_zeropage->zeropage)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_zeropage, user_uffdio_zeropage,
@@ -1744,8 +1750,11 @@ static int userfaultfd_continue(struct u
user_uffdio_continue = (struct uffdio_continue __user *)arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_continue, user_uffdio_continue,
@@ -1801,8 +1810,11 @@ static inline int userfaultfd_poison(str
user_uffdio_poison = (struct uffdio_poison __user *)arg;
ret = -EAGAIN;
- if (atomic_read(&ctx->mmap_changing))
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_poison->updated)))
+ return -EFAULT;
goto out;
+ }
ret = -EFAULT;
if (copy_from_user(&uffdio_poison, user_uffdio_poison,
@@ -1870,8 +1882,12 @@ static int userfaultfd_move(struct userf
user_uffdio_move = (struct uffdio_move __user *) arg;
- if (atomic_read(&ctx->mmap_changing))
- return -EAGAIN;
+ ret = -EAGAIN;
+ if (unlikely(atomic_read(&ctx->mmap_changing))) {
+ if (unlikely(put_user(ret, &user_uffdio_move->move)))
+ return -EFAULT;
+ goto out;
+ }
if (copy_from_user(&uffdio_move, user_uffdio_move,
/* don't copy "move" last field */
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-selftests-add-a-test-to-verify-mmap_changing-race-with-eagain.patch
The quilt patch titled
Subject: selftests/mm: compaction_test: support platform with huge mount of memory
has been removed from the -mm tree. Its filename was
selftests-mm-compaction_test-support-platform-with-huge-mount-of-memory.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Feng Tang <feng.tang(a)linux.alibaba.com>
Subject: selftests/mm: compaction_test: support platform with huge mount of memory
Date: Wed, 23 Apr 2025 18:36:45 +0800
When running the mm selftests to verify mm patches, the 'compaction_test'
case failed on an x86 server with 1TB of memory. The root cause is that
the server has more free memory than the test supports.
The test case tries to allocate 100000 huge pages, which is about 200 GB
on that x86 server, and when it succeeds, it expects that amount to be
larger than 1/3 of 80% of the free memory in the system. This logic only
works for platforms with 750 GB ( 200 / (1/3) / 80% ) or less of free
memory, and may raise false alarms on others.
Fix it by replacing the fixed page count with a number derived from the
actual amount of free memory.
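The arithmetic behind the new target can be checked with a small sketch, assuming MemFree and the huge page size are both in kB (as read from /proc/meminfo) and that the kernel grants every requested page. `meets_one_third_of_80pct()` is a hypothetical simplification of the pass criterion described above, not the selftest's actual code.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Target used by the patched test: about half of free memory, in pages. */
static unsigned long target_hugepages(unsigned long mem_free_kb,
				      unsigned long hugepage_kb)
{
	return mem_free_kb / hugepage_kb / 2;
}

/* Format the target the way the patch does before write()ing it. */
static size_t format_target(char *buf, size_t len, unsigned long pages)
{
	snprintf(buf, len, "%lu", pages);
	return strlen(buf);	/* number of bytes that would be written */
}

/*
 * allocated >= 1/3 * 80% * mem_free, kept in integers as
 * allocated * 30 >= mem_free * 8.
 */
static bool meets_one_third_of_80pct(unsigned long long allocated_kb,
				     unsigned long long mem_free_kb)
{
	return allocated_kb * 30 >= mem_free_kb * 8;
}
```

With 1 TiB free (1073741824 kB) and 2048 kB huge pages, the old fixed request of 100000 pages falls short of the criterion, while the new target of 262144 pages satisfies it.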
Link: https://lkml.kernel.org/r/20250423103645.2758-1-feng.tang@linux.alibaba.com
Fixes: bd67d5c15cc1 ("Test compaction of mlocked memory")
Signed-off-by: Feng Tang <feng.tang(a)linux.alibaba.com>
Acked-by: Dev Jain <dev.jain(a)arm.com>
Reviewed-by: Baolin Wang <baolin.wang(a)linux.alibaba.com>
Tested-by: Baolin Wang <baolin.wang(a)inux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Sri Jayaramappa <sjayaram(a)akamai.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/compaction_test.c | 19 ++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
--- a/tools/testing/selftests/mm/compaction_test.c~selftests-mm-compaction_test-support-platform-with-huge-mount-of-memory
+++ a/tools/testing/selftests/mm/compaction_test.c
@@ -90,6 +90,8 @@ int check_compaction(unsigned long mem_f
int compaction_index = 0;
char nr_hugepages[20] = {0};
char init_nr_hugepages[24] = {0};
+ char target_nr_hugepages[24] = {0};
+ int slen;
snprintf(init_nr_hugepages, sizeof(init_nr_hugepages),
"%lu", initial_nr_hugepages);
@@ -106,11 +108,18 @@ int check_compaction(unsigned long mem_f
goto out;
}
- /* Request a large number of huge pages. The Kernel will allocate
- as much as it can */
- if (write(fd, "100000", (6*sizeof(char))) != (6*sizeof(char))) {
- ksft_print_msg("Failed to write 100000 to /proc/sys/vm/nr_hugepages: %s\n",
- strerror(errno));
+ /*
+ * Request huge pages for about half of the free memory. The Kernel
+ * will allocate as much as it can, and we expect it will get at least 1/3
+ */
+ nr_hugepages_ul = mem_free / hugepage_size / 2;
+ snprintf(target_nr_hugepages, sizeof(target_nr_hugepages),
+ "%lu", nr_hugepages_ul);
+
+ slen = strlen(target_nr_hugepages);
+ if (write(fd, target_nr_hugepages, slen) != slen) {
+ ksft_print_msg("Failed to write %lu to /proc/sys/vm/nr_hugepages: %s\n",
+ nr_hugepages_ul, strerror(errno));
goto close_fd;
}
_
Patches currently in -mm which might be from feng.tang(a)linux.alibaba.com are
The quilt patch titled
Subject: ocfs2: fix panic in failed foilio allocation
has been removed from the -mm tree. Its filename was
v2-ocfs2-fix-panic-in-failed-foilio-allocation.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Mark Tinguely <mark.tinguely(a)oracle.com>
Subject: ocfs2: fix panic in failed foilio allocation
Date: Fri, 11 Apr 2025 11:31:24 -0500
commit 7e119cff9d0a ("ocfs2: convert w_pages to w_folios") and commit
9a5e08652dc4b ("ocfs2: use an array of folios instead of an array of
pages") store ERR_PTR(-ENOMEM) in the folio array upon allocation failure
and then call the folio array free code.
The folio array free code expects either valid folio pointers or NULL, so
finding the error pointer results in a panic. Fix this by NULLing the
failed folio entry.
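A userspace sketch of the failure mode and the fix, assuming local re-implementations `err_ptr()`, `is_err()` and `ptr_err()` that mimic the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() helpers. `grab_objects()` and `free_objects()` are hypothetical stand-ins for ocfs2_grab_folios() and the folio array free code: the cleanup routine treats every non-NULL slot as a valid object, so an error pointer left behind would be handed to free().

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace mimics of the kernel error-pointer helpers. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Cleanup contract: every slot is either a valid object or NULL. */
static int free_objects(void **objs, int n)
{
	int freed = 0;

	for (int i = 0; i < n; i++) {
		if (!objs[i])
			continue;
		free(objs[i]);
		objs[i] = NULL;
		freed++;
	}
	return freed;
}

/* Hypothetical allocation loop; `fail_at` simulates -ENOMEM at that index. */
static int grab_objects(void **objs, int n, int fail_at, int *freed)
{
	int ret;

	*freed = 0;
	for (int i = 0; i < n; i++)
		objs[i] = NULL;

	for (int i = 0; i < n; i++) {
		void *p = (i == fail_at) ? NULL : malloc(16);

		objs[i] = p ? p : err_ptr(-ENOMEM);
		if (is_err(objs[i])) {
			ret = (int)ptr_err(objs[i]);
			objs[i] = NULL;	/* the fix: never leave an error pointer */
			goto out;
		}
	}
	return 0;
out:
	*freed = free_objects(objs, n);
	return ret;
}
```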
Link: https://lkml.kernel.org/r/c879a52b-835c-4fa0-902b-8b2e9196dcbd@oracle.com
Fixes: 7e119cff9d0a ("ocfs2: convert w_pages to w_folios")
Fixes: 9a5e08652dc4b ("ocfs2: use an array of folios instead of an array of pages")
Signed-off-by: Mark Tinguely <mark.tinguely(a)oracle.com>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/alloc.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/ocfs2/alloc.c~v2-ocfs2-fix-panic-in-failed-foilio-allocation
+++ a/fs/ocfs2/alloc.c
@@ -6918,6 +6918,7 @@ static int ocfs2_grab_folios(struct inod
if (IS_ERR(folios[numfolios])) {
ret = PTR_ERR(folios[numfolios]);
mlog_errno(ret);
+ folios[numfolios] = NULL;
goto out;
}
_
Patches currently in -mm which might be from mark.tinguely(a)oracle.com are
The quilt patch titled
Subject: mm/huge_memory: fix dereferencing invalid pmd migration entry
has been removed from the -mm tree. Its filename was
mm-huge_memory-fix-dereferencing-invalid-pmd-migration-entry.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Gavin Guo <gavinguo(a)igalia.com>
Subject: mm/huge_memory: fix dereferencing invalid pmd migration entry
Date: Mon, 21 Apr 2025 19:35:36 +0800
When migrating a THP, concurrent access to the PMD migration entry during
a deferred split scan can lead to an invalid address access, as
illustrated below. To prevent this invalid access, it is necessary to
check for a PMD migration entry and return early. In this context, there
is no need to use pmd_to_swp_entry and pfn_swap_entry_to_page to verify
the equality of the target folio: since the PMD migration entry is locked,
it cannot serve as the target.
Mailing list discussion and explanation from Hugh Dickins: "An anon_vma
lookup points to a location which may contain the folio of interest, but
might instead contain another folio: and weeding out those other folios is
precisely what the "folio != pmd_folio(*pmd)" check (and the "risk of
replacing the wrong folio" comment a few lines above it) is for."
BUG: unable to handle page fault for address: ffffea60001db008
CPU: 0 UID: 0 PID: 2199114 Comm: tee Not tainted 6.14.0+ #4 NONE
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
RIP: 0010:split_huge_pmd_locked+0x3b5/0x2b60
Call Trace:
<TASK>
try_to_migrate_one+0x28c/0x3730
rmap_walk_anon+0x4f6/0x770
unmap_folio+0x196/0x1f0
split_huge_page_to_list_to_order+0x9f6/0x1560
deferred_split_scan+0xac5/0x12a0
shrinker_debugfs_scan_write+0x376/0x470
full_proxy_write+0x15c/0x220
vfs_write+0x2fc/0xcb0
ksys_write+0x146/0x250
do_syscall_64+0x6a/0x120
entry_SYSCALL_64_after_hwframe+0x76/0x7e
The bug was found by syzkaller on an internal kernel and then confirmed
on upstream.
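The ordering of the new check can be modelled with hypothetical types, assuming a PMD either maps a folio or holds a migration entry. `model_pmd_folio()` stands in for pmd_folio() and counts its calls, which shows that the short-circuit in the patched condition never decodes a migration entry as a folio.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a folio and a PMD slot. */
struct folio { int id; };
struct pmd {
	bool is_migration;	/* holds a swap-style entry, not a folio */
	struct folio *mapped;	/* meaningful only when !is_migration */
};

static int pmd_folio_calls;

/* Decoding a migration entry here would be the bug; count the calls. */
static struct folio *model_pmd_folio(const struct pmd *pmd)
{
	pmd_folio_calls++;
	return pmd->mapped;
}

/* Returns true when the split must be skipped for this folio. */
static bool skip_split(const struct pmd *pmd, const struct folio *folio)
{
	bool pmd_migration = pmd->is_migration;

	/* || short-circuits, so a migration entry is never decoded */
	return folio && (pmd_migration || folio != model_pmd_folio(pmd));
}
```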
Link: https://lkml.kernel.org/r/20250421113536.3682201-1-gavinguo@igalia.com
Link: https://lore.kernel.org/all/20250414072737.1698513-1-gavinguo@igalia.com/
Link: https://lore.kernel.org/all/20250418085802.2973519-1-gavinguo@igalia.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Gavin Guo <gavinguo(a)igalia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Gavin Shan <gshan(a)redhat.com>
Cc: Florent Revest <revest(a)google.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
--- a/mm/huge_memory.c~mm-huge_memory-fix-dereferencing-invalid-pmd-migration-entry
+++ a/mm/huge_memory.c
@@ -3075,6 +3075,8 @@ static void __split_huge_pmd_locked(stru
void split_huge_pmd_locked(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmd, bool freeze, struct folio *folio)
{
+ bool pmd_migration = is_pmd_migration_entry(*pmd);
+
VM_WARN_ON_ONCE(folio && !folio_test_pmd_mappable(folio));
VM_WARN_ON_ONCE(!IS_ALIGNED(address, HPAGE_PMD_SIZE));
VM_WARN_ON_ONCE(folio && !folio_test_locked(folio));
@@ -3085,9 +3087,12 @@ void split_huge_pmd_locked(struct vm_are
* require a folio to check the PMD against. Otherwise, there
* is a risk of replacing the wrong folio.
*/
- if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) ||
- is_pmd_migration_entry(*pmd)) {
- if (folio && folio != pmd_folio(*pmd))
+ if (pmd_trans_huge(*pmd) || pmd_devmap(*pmd) || pmd_migration) {
+ /*
+ * Do not apply pmd_folio() to a migration entry; and folio lock
+ * guarantees that it must be of the wrong folio anyway.
+ */
+ if (folio && (pmd_migration || folio != pmd_folio(*pmd)))
return;
__split_huge_pmd_locked(vma, pmd, address, freeze);
}
_
Patches currently in -mm which might be from gavinguo(a)igalia.com are
mm-huge_memory-adjust-try_to_migrate_one-and-split_huge_pmd_locked.patch
mm-huge_memory-remove-useless-folio-pointers-passing.patch
The quilt patch titled
Subject: ocfs2: fix the issue with discontiguous allocation in the global_bitmap
has been removed from the -mm tree. Its filename was
ocfs2-fix-the-issue-with-discontiguous-allocation-in-the-global_bitmap.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Heming Zhao <heming.zhao(a)suse.com>
Subject: ocfs2: fix the issue with discontiguous allocation in the global_bitmap
Date: Mon, 14 Apr 2025 14:01:23 +0800
commit 4eb7b93e0310 ("ocfs2: improve write IO performance when
fragmentation is high") introduced another regression.
The following ocfs2-test case can trigger this issue:
> discontig_runner.sh => activate_discontig_bg.sh => resv_unwritten:
> ${RESV_UNWRITTEN_BIN} -f ${WORK_PLACE}/large_testfile -s 0 -l \
> $((${FILE_MAJOR_SIZE_M}*1024*1024))
In my environment, the test disk size (from "fdisk -l <dev>") is:
> 53687091200 bytes, 104857600 sectors.
The command above expands to:
> /usr/local/ocfs2-test/bin/resv_unwritten -f \
> /mnt/ocfs2/ocfs2-activate-discontig-bg-dir/large_testfile -s 0 -l \
> 53187969024
Error log:
> [*] Reserve 50724M space for a LARGE file, reserve 200M space for future test.
> ioctl error 28: "No space left on device"
> resv allocation failed Unknown error -1
> reserve unwritten region from 0 to 53187969024.
Call flow:
__ocfs2_change_file_space //by ioctl OCFS2_IOC_RESVSP64
 ocfs2_allocate_unwritten_extents //start:0 len:53187969024
  while()
   + ocfs2_get_clusters //cpos:0, alloc_size:1623168 (cluster number)
   + ocfs2_extend_allocation
     + ocfs2_lock_allocators
     |  + choose OCFS2_AC_USE_MAIN & ocfs2_cluster_group_search
     |
     + ocfs2_add_inode_data
        ocfs2_add_clusters_in_btree
         __ocfs2_claim_clusters
          ocfs2_claim_suballoc_bits
          + During the allocation of the final part of the large file
            (after ~47GB), no chain had the required contiguous
            bits_wanted. Consequently, the allocation failed.
How to fix:
When OCFS2 encounters fragmented allocation, the file system should stop
attempting a contiguous allocation of bits_wanted bits and instead provide
the largest available run of contiguous free bits from the cluster groups.
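A simplified sketch of the fallback, assuming a byte-array bitmap in which a set bit means "in use" and bit i lives in byte i / 8. `max_contig_free_bits()` is a hypothetical stand-in for ocfs2_find_max_contig_free_bits(), and `clamp_bits_wanted()` mirrors the condition the patch adds to ocfs2_search_chain() for OCFS2_AC_USE_MAIN_DISCONTIG.

```c
#include <stdint.h>

/* Longest run of clear (free) bits among the first `nbits` bits. */
static uint32_t max_contig_free_bits(const uint8_t *bitmap, uint32_t nbits)
{
	uint32_t best = 0, run = 0;

	for (uint32_t i = 0; i < nbits; i++) {
		if (bitmap[i / 8] & (1u << (i % 8)))
			run = 0;		/* a used bit ends the run */
		else if (++run > best)
			best = run;
	}
	return best;
}

/* Shrink the request to what one group can supply, never below min_bits. */
static uint32_t clamp_bits_wanted(uint32_t bits_wanted, uint32_t min_bits,
				  uint32_t contig_bits)
{
	if (bits_wanted > contig_bits && contig_bits >= min_bits)
		return contig_bits;
	return bits_wanted;
}
```

If even the largest free run is smaller than min_bits, the request is left unchanged, so the group search still fails for that group and the caller moves on.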
Link: https://lkml.kernel.org/r/20250414060125.19938-2-heming.zhao@suse.com
Fixes: 4eb7b93e0310 ("ocfs2: improve write IO performance when fragmentation is high")
Signed-off-by: Heming Zhao <heming.zhao(a)suse.com>
Reported-by: Gautham Ananthakrishna <gautham.ananthakrishna(a)oracle.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/suballoc.c | 38 ++++++++++++++++++++++++++++++++------
fs/ocfs2/suballoc.h | 1 +
2 files changed, 33 insertions(+), 6 deletions(-)
--- a/fs/ocfs2/suballoc.c~ocfs2-fix-the-issue-with-discontiguous-allocation-in-the-global_bitmap
+++ a/fs/ocfs2/suballoc.c
@@ -698,10 +698,12 @@ static int ocfs2_block_group_alloc(struc
bg_bh = ocfs2_block_group_alloc_contig(osb, handle, alloc_inode,
ac, cl);
- if (PTR_ERR(bg_bh) == -ENOSPC)
+ if (PTR_ERR(bg_bh) == -ENOSPC) {
+ ac->ac_which = OCFS2_AC_USE_MAIN_DISCONTIG;
bg_bh = ocfs2_block_group_alloc_discontig(handle,
alloc_inode,
ac, cl);
+ }
if (IS_ERR(bg_bh)) {
status = PTR_ERR(bg_bh);
bg_bh = NULL;
@@ -1794,6 +1796,7 @@ static int ocfs2_search_chain(struct ocf
{
int status;
u16 chain;
+ u32 contig_bits;
u64 next_group;
struct inode *alloc_inode = ac->ac_inode;
struct buffer_head *group_bh = NULL;
@@ -1819,10 +1822,21 @@ static int ocfs2_search_chain(struct ocf
status = -ENOSPC;
/* for now, the chain search is a bit simplistic. We just use
* the 1st group with any empty bits. */
- while ((status = ac->ac_group_search(alloc_inode, group_bh,
- bits_wanted, min_bits,
- ac->ac_max_block,
- res)) == -ENOSPC) {
+ while (1) {
+ if (ac->ac_which == OCFS2_AC_USE_MAIN_DISCONTIG) {
+ contig_bits = le16_to_cpu(bg->bg_contig_free_bits);
+ if (!contig_bits)
+ contig_bits = ocfs2_find_max_contig_free_bits(bg->bg_bitmap,
+ le16_to_cpu(bg->bg_bits), 0);
+ if (bits_wanted > contig_bits && contig_bits >= min_bits)
+ bits_wanted = contig_bits;
+ }
+
+ status = ac->ac_group_search(alloc_inode, group_bh,
+ bits_wanted, min_bits,
+ ac->ac_max_block, res);
+ if (status != -ENOSPC)
+ break;
if (!bg->bg_next_group)
break;
@@ -1982,6 +1996,7 @@ static int ocfs2_claim_suballoc_bits(str
victim = ocfs2_find_victim_chain(cl);
ac->ac_chain = victim;
+search:
status = ocfs2_search_chain(ac, handle, bits_wanted, min_bits,
res, &bits_left);
if (!status) {
@@ -2022,6 +2037,16 @@ static int ocfs2_claim_suballoc_bits(str
}
}
+ /* Chains can't supply the bits_wanted contiguous space.
+ * We should switch to using every single bit when allocating
+ * from the global bitmap. */
+ if (i == le16_to_cpu(cl->cl_next_free_rec) &&
+ status == -ENOSPC && ac->ac_which == OCFS2_AC_USE_MAIN) {
+ ac->ac_which = OCFS2_AC_USE_MAIN_DISCONTIG;
+ ac->ac_chain = victim;
+ goto search;
+ }
+
set_hint:
if (status != -ENOSPC) {
/* If the next search of this group is not likely to
@@ -2365,7 +2390,8 @@ int __ocfs2_claim_clusters(handle_t *han
BUG_ON(ac->ac_bits_given >= ac->ac_bits_wanted);
BUG_ON(ac->ac_which != OCFS2_AC_USE_LOCAL
- && ac->ac_which != OCFS2_AC_USE_MAIN);
+ && ac->ac_which != OCFS2_AC_USE_MAIN
+ && ac->ac_which != OCFS2_AC_USE_MAIN_DISCONTIG);
if (ac->ac_which == OCFS2_AC_USE_LOCAL) {
WARN_ON(min_clusters > 1);
--- a/fs/ocfs2/suballoc.h~ocfs2-fix-the-issue-with-discontiguous-allocation-in-the-global_bitmap
+++ a/fs/ocfs2/suballoc.h
@@ -29,6 +29,7 @@ struct ocfs2_alloc_context {
#define OCFS2_AC_USE_MAIN 2
#define OCFS2_AC_USE_INODE 3
#define OCFS2_AC_USE_META 4
+#define OCFS2_AC_USE_MAIN_DISCONTIG 5
u32 ac_which;
/* these are used by the chain search */
_
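The clamp that the patch adds to ocfs2_search_chain() for OCFS2_AC_USE_MAIN_DISCONTIG can be illustrated with a small user-space sketch. The helper names are hypothetical. max_contig_free_bits() only approximates what ocfs2_find_max_contig_free_bits() is assumed to compute, namely the longest run of free bits in a group bitmap. It also assumes little-endian bit numbering within each byte.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for ocfs2_find_max_contig_free_bits(): length of the
 * longest run of clear (free) bits in bitmap[0..total_bits), assuming bit i
 * lives in byte i / 8 at position i % 8 (little-endian bit order). */
static uint32_t max_contig_free_bits(const uint8_t *bitmap, uint32_t total_bits)
{
	uint32_t best = 0, run = 0;

	for (uint32_t i = 0; i < total_bits; i++) {
		int used = (bitmap[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1;

		run = used ? 0 : run + 1;
		if (run > best)
			best = run;
	}
	return best;
}

/* Mirrors the condition added in the patch: only shrink the request when the
 * group's largest free extent still satisfies min_bits. */
static uint32_t clamp_bits_wanted(uint32_t bits_wanted, uint32_t min_bits,
				  uint32_t contig_bits)
{
	if (bits_wanted > contig_bits && contig_bits >= min_bits)
		return contig_bits;
	return bits_wanted;
}
```

With a bitmap whose longest free run is 16 bits, a 32-bit request is shrunk to 16 when min_bits is 8, but left unchanged when min_bits is 20; in that case the group search for that group is expected to fail with -ENOSPC and the chain walk moves on.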
Patches currently in -mm which might be from heming.zhao(a)suse.com are
The patch titled
Subject: x86/kexec: fix potential cmem->ranges out of bounds
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
x86-kexec-fix-potential-cmem-ranges-out-of-bounds.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: fuqiang wang <fuqiang.wang(a)easystack.cn>
Subject: x86/kexec: fix potential cmem->ranges out of bounds
Date: Mon, 8 Jan 2024 21:06:47 +0800
In memmap_exclude_ranges(), elfheader will be excluded from crashk_res.
In the current x86 architecture code, the elfheader is always allocated
at crashk_res.start. It seems that there won't be a new split range.
But that depends on where the elfheader is allocated within crashk_res. To
avoid a potential out-of-bounds access in the future, add an extra slot.
A similar issue also exists in fill_up_crash_elf_data(). The range to
be excluded is [0, 1M]; its start (0) is special and will not appear in the
middle of the existing cmem->ranges[]. But in case the low 1M range is
changed in the future, add an extra slot there too.
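The splitting rule the patch relies on can be sketched as follows: a range splits in two only when the excluded interval lies strictly inside it, touching neither its start nor its end. This is a simplified, hypothetical model of that check, not the kernel's crash_exclude_mem_range() itself. It shows why excluding [0, 1M] cannot split a range, while a hypothetical future [start, 1M] exclusion with start > 0 could.

```c
#include <assert.h>
#include <stdint.h>

/* Inclusive range, in the style of cmem->ranges[]. */
struct range {
	uint64_t start, end;
};

/* Hypothetical helper: how many ranges remain after removing the inclusive
 * interval [ex_start, ex_end] from the single range r. */
static int ranges_after_exclude(struct range r, uint64_t ex_start, uint64_t ex_end)
{
	if (ex_end < r.start || ex_start > r.end)
		return 1;                 /* no overlap: r is unchanged */
	if (ex_start <= r.start && ex_end >= r.end)
		return 0;                 /* fully covered: r is dropped */
	if (ex_start > r.start && ex_end < r.end)
		return 2;                 /* strictly inside: r splits, needs a new slot */
	return 1;                         /* overlaps one edge: r is trimmed */
}
```

Excluding [0, 0xfffff] from [0, 0x7fffffff] only trims the range, because the excluded start equals the range start; excluding [0x1000, 0xfffff] from the same range splits it in two.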
Without this patch, the kdump kernel fails to load via
kexec_file_load:
[ 139.736948] UBSAN: array-index-out-of-bounds in arch/x86/kernel/crash.c:350:25
[ 139.742360] index 0 is out of range for type 'range [*]'
[ 139.745695] CPU: 0 UID: 0 PID: 5778 Comm: kexec Not tainted 6.15.0-0.rc3.20250425git02ddfb981de8.32.fc43.x86_64 #1 PREEMPT(lazy)
[ 139.745698] Hardware name: Amazon EC2 c5.large/, BIOS 1.0 10/16/2017
[ 139.745699] Call Trace:
[ 139.745700] <TASK>
[ 139.745701] dump_stack_lvl+0x5d/0x80
[ 139.745706] ubsan_epilogue+0x5/0x2b
[ 139.745709] __ubsan_handle_out_of_bounds.cold+0x54/0x59
[ 139.745711] crash_setup_memmap_entries+0x2d9/0x330
[ 139.745716] setup_boot_parameters+0xf8/0x6a0
[ 139.745720] bzImage64_load+0x41b/0x4e0
[ 139.745722] ? find_next_iomem_res+0x109/0x140
[ 139.745727] ? locate_mem_hole_callback+0x109/0x170
[ 139.745737] kimage_file_alloc_init+0x1ef/0x3e0
[ 139.745740] __do_sys_kexec_file_load+0x180/0x2f0
[ 139.745742] do_syscall_64+0x7b/0x160
[ 139.745745] ? do_user_addr_fault+0x21a/0x690
[ 139.745747] ? exc_page_fault+0x7e/0x1a0
[ 139.745749] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 139.745751] RIP: 0033:0x7f7712c84e4d
Previously discussed link:
[1] https://lore.kernel.org/kexec/ZXk2oBf%2FT1Ul6o0c@MiWiFi-R3L-srv/
[2] https://lore.kernel.org/kexec/273284e8-7680-4f5f-8065-c5d780987e59@easystac…
[3] https://lore.kernel.org/kexec/ZYQ6O%2F57sHAPxTHm@MiWiFi-R3L-srv/
Link: https://lkml.kernel.org/r/20240108130720.228478-1-fuqiang.wang@easystack.cn
Signed-off-by: fuqiang wang <fuqiang.wang(a)easystack.cn>
Acked-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Coiby Xu <coxu(a)redhat.com>
Closes: https://lkml.kernel.org/r/4de3c2onosr7negqnfhekm4cpbklzmsimgdfv33c52dktqpza…
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: <x86(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/crash.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
--- a/arch/x86/kernel/crash.c~x86-kexec-fix-potential-cmem-ranges-out-of-bounds
+++ a/arch/x86/kernel/crash.c
@@ -165,8 +165,18 @@ static struct crash_mem *fill_up_crash_e
/*
* Exclusion of crash region and/or crashk_low_res may cause
* another range split. So add extra two slots here.
+ *
+ * Exclusion of low 1M may not cause another range split, because the
+ * range of exclude is [0, 1M] and the condition for splitting a new
+ * region is that the start, end parameters are both in a certain
+ * existing region in cmem and cannot be equal to existing region's
+ * start or end. Obviously, the start of [0, 1M] cannot meet this
+ * condition.
+ *
+ * But in order to lest the low 1M could be changed in the future,
+ * (e.g. [stare, 1M]), add a extra slot.
*/
- nr_ranges += 2;
+ nr_ranges += 3;
cmem = vzalloc(struct_size(cmem, ranges, nr_ranges));
if (!cmem)
return NULL;
@@ -298,9 +308,16 @@ int crash_setup_memmap_entries(struct ki
struct crash_memmap_data cmd;
struct crash_mem *cmem;
- cmem = vzalloc(struct_size(cmem, ranges, 1));
+ /*
+ * In the current x86 architecture code, the elfheader is always
+ * allocated at crashk_res.start. But it depends on the allocation
+ * position of elfheader in crashk_res. To avoid potential out of
+ * bounds in future, add a extra slot.
+ */
+ cmem = vzalloc(struct_size(cmem, ranges, 2));
if (!cmem)
return -ENOMEM;
+ cmem->max_nr_ranges = 2;
memset(&cmd, 0, sizeof(struct crash_memmap_data));
cmd.params = params;
_
Patches currently in -mm which might be from fuqiang.wang(a)easystack.cn are
x86-kexec-fix-potential-cmem-ranges-out-of-bounds.patch
The patch titled
Subject: mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-vm_uffd_minor-==-vm_shadow_stack-on-userfaultfd=y-arm64_gcs=y.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Florent Revest <revest(a)chromium.org>
Subject: mm: fix VM_UFFD_MINOR == VM_SHADOW_STACK on USERFAULTFD=y && ARM64_GCS=y
Date: Wed, 7 May 2025 15:09:57 +0200
On configs with CONFIG_ARM64_GCS=y, VM_SHADOW_STACK is bit 38. On configs
with CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y (selected by CONFIG_ARM64 when
CONFIG_USERFAULTFD=y), VM_UFFD_MINOR is _also_ bit 38.
This bit being shared by two different VMA flags could lead to all sorts
of unintended behaviors. Presumably, a process could call into
userfaultfd in a way that disables the shadow stack VMA flag. I can't
think of any attack where this would help (an attacker trying to disable
shadow stacks is presumably trying to hijack control flow, and so cannot
arbitrarily call into userfaultfd yet anyway), but this still feels
somewhat scary.
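A user-space sketch of why a shared bit is a problem: clearing one flag silently clears the other. The compile-time guard shown here is hypothetical and is not part of the patch; it only illustrates how such an overlap could be rejected at build time. Bit numbers are taken from the commit message.

```c
#include <assert.h>
#include <stdint.h>

#define BIT(n) (UINT64_C(1) << (n))

/* Bit numbers as described in the commit message (arm64, GCS + userfaultfd). */
#define VM_SHADOW_STACK_BIT    38
#define VM_UFFD_MINOR_BIT_OLD  38   /* before the fix: aliases the shadow stack bit */
#define VM_UFFD_MINOR_BIT_NEW  41   /* after the fix */

#define VM_SHADOW_STACK  BIT(VM_SHADOW_STACK_BIT)
#define VM_UFFD_MINOR    BIT(VM_UFFD_MINOR_BIT_NEW)

/* Hypothetical guard: fails to compile if the two flags share a bit. */
_Static_assert((VM_SHADOW_STACK & VM_UFFD_MINOR) == 0,
	       "VM_UFFD_MINOR must not alias VM_SHADOW_STACK");

/* Model of userfaultfd clearing its minor-fault flag on a VMA. */
static uint64_t clear_uffd_minor(uint64_t vm_flags, uint64_t uffd_minor)
{
	return vm_flags & ~uffd_minor;
}
```

With the old bit number, clearing the userfaultfd flag on a shadow stack VMA also drops VM_SHADOW_STACK; with bit 41 the shadow stack flag survives.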
Link: https://lkml.kernel.org/r/20250507131000.1204175-2-revest@chromium.org
Fixes: ae80e1629aea ("mm: Define VM_SHADOW_STACK for arm64 when we support GCS")
Signed-off-by: Florent Revest <revest(a)chromium.org>
Reviewed-by: Mark Brown <broonie(a)kernel.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Florent Revest <revest(a)chromium.org>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Thiago Jung Bauermann <thiago.bauermann(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/include/linux/mm.h~mm-fix-vm_uffd_minor-==-vm_shadow_stack-on-userfaultfd=y-arm64_gcs=y
+++ a/include/linux/mm.h
@@ -385,7 +385,7 @@ extern unsigned int kobjsize(const void
#endif
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT 38
+# define VM_UFFD_MINOR_BIT 41
# define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */
#else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
# define VM_UFFD_MINOR VM_NONE
_
Patches currently in -mm which might be from revest(a)chromium.org are
mm-fix-vm_uffd_minor-==-vm_shadow_stack-on-userfaultfd=y-arm64_gcs=y.patch
The patch titled
Subject: mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mmap-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
Subject: mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
Date: Wed, 07 May 2025 15:28:06 +0200
commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps the
mmap option MAP_STACK to VM_NOHUGEPAGE. This is done even if
CONFIG_TRANSPARENT_HUGEPAGE is not defined, but in that case
VM_NOHUGEPAGE does not make sense.
I discovered this issue when trying to use the tool CRIU to checkpoint and
restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of /proc/<pid>/smaps
and saves the "nh" flag. When trying to restore the container, CRIU fails
to restore the "nh" mappings, since madvise() MADV_NOHUGEPAGE always
returns an error because CONFIG_TRANSPARENT_HUGEPAGE is not defined.
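The effect of the #ifdef can be sketched in user space. The flag values and names below are hypothetical, and CALC_VM_TRANS() only follows the spirit of the kernel's _calc_vm_trans() (return bit2 when bit1 is set in x); the kernel macro itself is written differently.

```c
#include <assert.h>

/* Hypothetical flag values, chosen only for illustration. */
#define DEMO_MAP_STACK      0x20000UL
#define DEMO_VM_NOHUGEPAGE  0x40000000UL

/* If bit1 is set in x, yield bit2; otherwise 0. */
#define CALC_VM_TRANS(x, bit1, bit2) (((x) & (bit1)) ? (bit2) : 0UL)

static unsigned long calc_vm_flag_bits_sketch(unsigned long flags)
{
	unsigned long vm_flags = 0;

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* Only with THP does MAP_STACK imply "no huge pages". */
	vm_flags |= CALC_VM_TRANS(flags, DEMO_MAP_STACK, DEMO_VM_NOHUGEPAGE);
#else
	(void)flags;   /* without THP, MAP_STACK contributes nothing */
#endif
	return vm_flags;
}
```

Built with -DCONFIG_TRANSPARENT_HUGEPAGE, a MAP_STACK mapping picks up DEMO_VM_NOHUGEPAGE; built without it, the result is 0, matching the patch's intent that no "nh" flag appears on kernels without THP.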
Link: https://lkml.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-t…
Fixes: c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE")
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mman.h | 2 ++
1 file changed, 2 insertions(+)
--- a/include/linux/mman.h~mm-mmap-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled
+++ a/include/linux/mman.h
@@ -155,7 +155,9 @@ calc_vm_flag_bits(struct file *file, uns
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
_calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
+#endif
arch_calc_vm_flag_bits(file, flags);
}
_
Patches currently in -mm which might be from Ignacio.MorenoGonzalez(a)kuka.com are
mm-mmap-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled.patch
The quilt patch titled
Subject: mm: vmscan: avoid signedness error for GCC 5.4
has been removed from the -mm tree. Its filename was
mm-vmscan-avoid-signedness-error-for-gcc-54.patch
This patch was dropped because an updated version will be issued
------------------------------------------------------
From: WangYuli <wangyuli(a)uniontech.com>
Subject: mm: vmscan: avoid signedness error for GCC 5.4
Date: Wed, 7 May 2025 12:08:27 +0800
To the GCC 5.4 compiler, (MAX_NR_TIERS - 1) (i.e., (4U - 1)) is unsigned,
whereas tier is a signed integer.
As a result, the __types_ok check within the __careful_cmp_once macro
fails and triggers BUILD_BUG_ON.
Use min_t instead of min to circumvent this compiler error.
Fix the following error with GCC 5.4:
mm/vmscan.c: In function `read_ctrl_pos':
mm/vmscan.c:3166:728: error: call to `__compiletime_assert_887' declared with attribute error: min(tier, 4U - 1) signedness error
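The reason the kernel's min() refuses this mix of signedness can be shown with a naive macro. Under the usual arithmetic conversions, a negative int compared with an unsigned int is first converted to a huge unsigned value. min_t() sidesteps this by casting both operands to one explicit type. The macros below are simplified sketches, not the kernel's definitions; a value of -1 for tier is used purely to illustrate the conversion.

```c
#include <assert.h>

#define MAX_NR_TIERS 4U

/* Naive min(): mixed int/unsigned operands are compared as unsigned. */
#define NAIVE_MIN(a, b) ((a) < (b) ? (a) : (b))

/* min_t()-style: cast both sides to an explicit type before comparing. */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
```

With tier = -1, NAIVE_MIN(tier, MAX_NR_TIERS - 1) evaluates to 3U because -1 becomes UINT_MAX in the comparison, whereas MIN_T(int, tier, MAX_NR_TIERS - 1) yields -1.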
Link: https://lkml.kernel.org/r/62726950F697595A+20250507040827.1147510-1-wangyul…
Fixes: 37a260870f2c ("mm/mglru: rework type selection")
Signed-off-by: WangYuli <wangyuli(a)uniontech.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: David Laight <david.laight.linux(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmscan.c~mm-vmscan-avoid-signedness-error-for-gcc-54
+++ a/mm/vmscan.c
@@ -3163,7 +3163,7 @@ static void read_ctrl_pos(struct lruvec
pos->gain = gain;
pos->refaulted = pos->total = 0;
- for (i = tier % MAX_NR_TIERS; i <= min(tier, MAX_NR_TIERS - 1); i++) {
+ for (i = tier % MAX_NR_TIERS; i <= min_t(int, tier, MAX_NR_TIERS - 1); i++) {
pos->refaulted += lrugen->avg_refaulted[type][i] +
atomic_long_read(&lrugen->refaulted[hist][type][i]);
pos->total += lrugen->avg_total[type][i] +
_
Patches currently in -mm which might be from wangyuli(a)uniontech.com are
ocfs2-o2net_idle_timer-rename-del_timer_sync-in-comment.patch
treewide-fix-typo-previlege.patch
The patch titled
Subject: mm/page_alloc: fix race condition in unaccepted memory handling
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-page_alloc-fix-race-condition-in-unaccepted-memory-handling.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com>
Subject: mm/page_alloc: fix race condition in unaccepted memory handling
Date: Tue, 6 May 2025 16:32:07 +0300
The page allocator tracks the number of zones that have unaccepted memory
using static_branch_inc/dec() and uses that static branch in hot paths to
determine if it needs to deal with unaccepted memory.
Borislav and Thomas pointed out that the tracking is racy: operations on
static_branch are not serialized against adding/removing unaccepted pages
to/from the zone.
Sanity checks inside the static_branch machinery detect it:
WARNING: CPU: 0 PID: 10 at kernel/jump_label.c:276 __static_key_slow_dec_cpuslocked+0x8e/0xa0
The comment around the WARN() explains the problem:
/*
* Warn about the '-1' case though; since that means a
* decrement is concurrent with a first (0->1) increment. IOW
* people are trying to disable something that wasn't yet fully
* enabled. This suggests an ordering problem on the user side.
*/
The effect of this static_branch optimization is only visible in
microbenchmarks.
Instead of adding more complexity around it, remove it altogether.
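The ordering problem can be replayed deterministically in a toy, single-threaded model. Everything here is hypothetical. An int stands in for the static key's count. The "locked" helpers model only what __free_unaccepted() and __accept_page() computed under zone->lock before this patch; the key update happened after the lock was dropped.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_zone {
	int nr_unaccepted;   /* pages on zone->unaccepted_pages */
	int key_count;       /* models zones_with_unaccepted_pages */
};

/* __free_unaccepted(), locked part: was the list empty before adding? */
static bool toy_free_locked(struct toy_zone *z)
{
	bool first = (z->nr_unaccepted == 0);

	z->nr_unaccepted++;
	return first;
}

/* __accept_page(), locked part: is the list empty after removing? */
static bool toy_accept_locked(struct toy_zone *z)
{
	z->nr_unaccepted--;
	return z->nr_unaccepted == 0;
}

/* CPU A frees the first unaccepted page and drops the lock. Before A
 * increments the key, CPU B accepts that page and decrements it. */
static int replay_bad_interleaving(void)
{
	struct toy_zone z = { 0, 0 };
	bool first = toy_free_locked(&z);   /* CPU A, under lock */
	bool last = toy_accept_locked(&z);  /* CPU B, under lock */
	int lowest;

	if (last)
		z.key_count--;              /* CPU B: static_branch_dec() */
	lowest = z.key_count;               /* -1: the case the WARN reports */
	if (first)
		z.key_count++;              /* CPU A: static_branch_inc(), too late */
	return lowest;
}
```

Because the list state is decided under the lock but the key is updated outside it, nothing orders the increment before the decrement. Removing the key, as the patch does, leaves list_empty() under the zone's own state as the only check.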
Link: https://lkml.kernel.org/r/20250506133207.1009676-1-kirill.shutemov@linux.in…
Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Fixes: dcdfdd40fa82 ("mm: Add support for unaccepted memory")
Link: https://lore.kernel.org/all/20250506092445.GBaBnVXXyvnazly6iF@fat_crate.loc…
Reported-by: Borislav Petkov <bp(a)alien8.de>
Tested-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reported-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Brendan Jackman <jackmanb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 1
mm/mm_init.c | 1
mm/page_alloc.c | 47 ----------------------------------------------
3 files changed, 49 deletions(-)
--- a/mm/internal.h~mm-page_alloc-fix-race-condition-in-unaccepted-memory-handling
+++ a/mm/internal.h
@@ -1590,7 +1590,6 @@ unsigned long move_page_tables(struct pa
#ifdef CONFIG_UNACCEPTED_MEMORY
void accept_page(struct page *page);
-void unaccepted_cleanup_work(struct work_struct *work);
#else /* CONFIG_UNACCEPTED_MEMORY */
static inline void accept_page(struct page *page)
{
--- a/mm/mm_init.c~mm-page_alloc-fix-race-condition-in-unaccepted-memory-handling
+++ a/mm/mm_init.c
@@ -1441,7 +1441,6 @@ static void __meminit zone_init_free_lis
#ifdef CONFIG_UNACCEPTED_MEMORY
INIT_LIST_HEAD(&zone->unaccepted_pages);
- INIT_WORK(&zone->unaccepted_cleanup, unaccepted_cleanup_work);
#endif
}
--- a/mm/page_alloc.c~mm-page_alloc-fix-race-condition-in-unaccepted-memory-handling
+++ a/mm/page_alloc.c
@@ -7180,16 +7180,8 @@ bool has_managed_dma(void)
#ifdef CONFIG_UNACCEPTED_MEMORY
-/* Counts number of zones with unaccepted pages. */
-static DEFINE_STATIC_KEY_FALSE(zones_with_unaccepted_pages);
-
static bool lazy_accept = true;
-void unaccepted_cleanup_work(struct work_struct *work)
-{
- static_branch_dec(&zones_with_unaccepted_pages);
-}
-
static int __init accept_memory_parse(char *p)
{
if (!strcmp(p, "lazy")) {
@@ -7214,11 +7206,7 @@ static bool page_contains_unaccepted(str
static void __accept_page(struct zone *zone, unsigned long *flags,
struct page *page)
{
- bool last;
-
list_del(&page->lru);
- last = list_empty(&zone->unaccepted_pages);
-
account_freepages(zone, -MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, -MAX_ORDER_NR_PAGES);
__ClearPageUnaccepted(page);
@@ -7227,28 +7215,6 @@ static void __accept_page(struct zone *z
accept_memory(page_to_phys(page), PAGE_SIZE << MAX_PAGE_ORDER);
__free_pages_ok(page, MAX_PAGE_ORDER, FPI_TO_TAIL);
-
- if (last) {
- /*
- * There are two corner cases:
- *
- * - If allocation occurs during the CPU bring up,
- * static_branch_dec() cannot be used directly as
- * it causes a deadlock on cpu_hotplug_lock.
- *
- * Instead, use schedule_work() to prevent deadlock.
- *
- * - If allocation occurs before workqueues are initialized,
- * static_branch_dec() should be called directly.
- *
- * Workqueues are initialized before CPU bring up, so this
- * will not conflict with the first scenario.
- */
- if (system_wq)
- schedule_work(&zone->unaccepted_cleanup);
- else
- unaccepted_cleanup_work(&zone->unaccepted_cleanup);
- }
}
void accept_page(struct page *page)
@@ -7285,20 +7251,12 @@ static bool try_to_accept_memory_one(str
return true;
}
-static inline bool has_unaccepted_memory(void)
-{
- return static_branch_unlikely(&zones_with_unaccepted_pages);
-}
-
static bool cond_accept_memory(struct zone *zone, unsigned int order,
int alloc_flags)
{
long to_accept, wmark;
bool ret = false;
- if (!has_unaccepted_memory())
- return false;
-
if (list_empty(&zone->unaccepted_pages))
return false;
@@ -7336,22 +7294,17 @@ static bool __free_unaccepted(struct pag
{
struct zone *zone = page_zone(page);
unsigned long flags;
- bool first = false;
if (!lazy_accept)
return false;
spin_lock_irqsave(&zone->lock, flags);
- first = list_empty(&zone->unaccepted_pages);
list_add_tail(&page->lru, &zone->unaccepted_pages);
account_freepages(zone, MAX_ORDER_NR_PAGES, MIGRATE_MOVABLE);
__mod_zone_page_state(zone, NR_UNACCEPTED, MAX_ORDER_NR_PAGES);
__SetPageUnaccepted(page);
spin_unlock_irqrestore(&zone->lock, flags);
- if (first)
- static_branch_inc(&zones_with_unaccepted_pages);
-
return true;
}
_
Patches currently in -mm which might be from kirill.shutemov(a)linux.intel.com are
mm-page_alloc-ensure-try_alloc_pages-plays-well-with-unaccepted-memory.patch
mm-page_alloc-fix-race-condition-in-unaccepted-memory-handling.patch
After a recent change [1] in clang's randstruct implementation to
randomize structures that only contain function pointers, there is an
error because qede_ll_ops gets randomized but does not use a designated
initializer for the first member:
drivers/net/ethernet/qlogic/qede/qede_main.c:206:2: error: a randomized struct can only be initialized with a designated initializer
206 | {
| ^
Explicitly initialize the common member using a designated initializer
to fix the build.
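A user-space sketch of the fix's pattern. The structure names are hypothetical, and their shape only roughly mirrors qed_eth_cb_ops (an embedded struct of function pointers as the first member). A positional "{ ... }" for the first member depends on declaration order, which randstruct may shuffle. Naming the member with ".common =" does not.

```c
#include <assert.h>

/* Hypothetical structures loosely shaped like qed_eth_cb_ops. */
struct common_ops {
	int (*filter_op)(int);
	int (*link_update)(int);
};

struct eth_cb_ops {
	struct common_ops common;
	int (*force_mac)(int);
};

static int filter_op(int x)   { return x + 1; }
static int link_update(int x) { return x * 2; }
static int force_mac(int x)   { return -x; }

/* Every member is named, so the initializer stays correct even if the
 * compiler randomizes the member layout. */
static struct eth_cb_ops ops = {
	.common = {
		.filter_op   = filter_op,
		.link_update = link_update,
	},
	.force_mac = force_mac,
};
```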
Cc: stable(a)vger.kernel.org
Fixes: 035f7f87b729 ("randstruct: Enable Clang support")
Link: https://github.com/llvm/llvm-project/commit/04364fb888eea6db9811510607bed4b… [1]
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
drivers/net/ethernet/qlogic/qede/qede_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/qlogic/qede/qede_main.c b/drivers/net/ethernet/qlogic/qede/qede_main.c
index 99df00c30b8c..b5d744d2586f 100644
--- a/drivers/net/ethernet/qlogic/qede/qede_main.c
+++ b/drivers/net/ethernet/qlogic/qede/qede_main.c
@@ -203,7 +203,7 @@ static struct pci_driver qede_pci_driver = {
};
static struct qed_eth_cb_ops qede_ll_ops = {
- {
+ .common = {
#ifdef CONFIG_RFS_ACCEL
.arfs_filter_op = qede_arfs_filter_op,
#endif
---
base-commit: 9540984da649d46f699c47f28c68bbd3c9d99e4c
change-id: 20250507-qede-fix-clang-randstruct-011a3119f5d6
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
On 07/05/2025 16:45, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> spi: tegra114: Don't fail set_cs_timing when delays are zero
>
> to the 5.15-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> spi-tegra114-don-t-fail-set_cs_timing-when-delays-ar.patch
> and it can be found in the queue-5.15 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please drop this from the stable queue because another fix is
pending to correct this fix.
Jon
--
nvpublic
Commit 7ab4f0e37a0f ("ACPI PPTT: Fix coding mistakes in a couple of
sizeof() calls") corrected the processor entry size but unmasked a
longer-standing bug where the last entry in the structure can get skipped due
to an off-by-one mistake if the last entry ends exactly at the end of
the ACPI subtable.
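The off-by-one can be reproduced with a simplified loop over offsets. This hypothetical sketch uses fixed-size entries, whereas the real PPTT walk advances by each entry's length field. With a strict "<" bound, an entry that ends exactly at table_end is never visited; "<=" includes it.

```c
#include <assert.h>
#include <stddef.h>

/* Count entries of entry_sz bytes that fit in a table of table_len bytes,
 * using either the old strict bound or the fixed inclusive bound. */
static int count_entries(size_t table_len, size_t entry_sz, int inclusive)
{
	size_t off = 0;
	int n = 0;

	assert(entry_sz > 0);   /* a zero-length entry would loop forever */
	while (inclusive ? (off + entry_sz <= table_len)
			 : (off + entry_sz < table_len)) {
		n++;
		off += entry_sz;
	}
	return n;
}
```

For a 60-byte table of 20-byte entries, the strict bound visits only two entries, while the inclusive bound visits all three. When trailing bytes follow the last entry (for example a 65-byte table), both bounds agree, which is why the bug only shows up when the last entry ends exactly at the end of the subtable.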
The error manifests, for instance, on EC2 Graviton Metal instances as:
ACPI PPTT: PPTT table found, but unable to locate core 63 (63)
[...]
ACPI: SPE must be homogeneous
Fixes: 2bd00bcd73e5 ("ACPI/PPTT: Add Processor Properties Topology Table parsing")
Cc: stable(a)vger.kernel.org
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
---
drivers/acpi/pptt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/acpi/pptt.c b/drivers/acpi/pptt.c
index f73ce6e13065d..4364da90902e5 100644
--- a/drivers/acpi/pptt.c
+++ b/drivers/acpi/pptt.c
@@ -231,7 +231,7 @@ static int acpi_pptt_leaf_node(struct acpi_table_header *table_hdr,
sizeof(struct acpi_table_pptt));
proc_sz = sizeof(struct acpi_pptt_processor);
- while ((unsigned long)entry + proc_sz < table_end) {
+ while ((unsigned long)entry + proc_sz <= table_end) {
cpu_node = (struct acpi_pptt_processor *)entry;
if (entry->type == ACPI_PPTT_TYPE_PROCESSOR &&
cpu_node->parent == node_entry)
@@ -273,7 +273,7 @@ static struct acpi_pptt_processor *acpi_find_processor_node(struct acpi_table_he
proc_sz = sizeof(struct acpi_pptt_processor);
/* find the processor structure associated with this cpuid */
- while ((unsigned long)entry + proc_sz < table_end) {
+ while ((unsigned long)entry + proc_sz <= table_end) {
cpu_node = (struct acpi_pptt_processor *)entry;
if (entry->length == 0) {
--
2.47.1
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
On 07/05/2025 16:43, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> spi: tegra114: Don't fail set_cs_timing when delays are zero
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> spi-tegra114-don-t-fail-set_cs_timing-when-delays-ar.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Please don't queue this up for stable yet. This fix is not correct and
there is another change pending to correct this change.
Jon
--
nvpublic
From: Chuck Lever <chuck.lever(a)oracle.com>
RFC 7862 states that if an NFS server implements a CLONE operation,
it MUST also implement FATTR4_CLONE_BLKSIZE. NFSD implements CLONE,
but does not implement FATTR4_CLONE_BLKSIZE.
Note that in Section 12.2, RFC 7862 claims that
FATTR4_CLONE_BLKSIZE is RECOMMENDED, not REQUIRED. Likely this is
because a minor version is not permitted to add a REQUIRED
attribute. Confusing.
We assume this attribute reports a block size as a count of bytes,
as RFC 7862 does not specify a unit.
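On the wire, the new attribute is encoded as a single uint32_t (the patch calls nfsd4_encode_uint32_t()). XDR (RFC 4506) encodes an unsigned integer as four bytes, most significant first. The following minimal sketch of that encoding is independent of NFSD's own helpers and uses a hypothetical function name.

```c
#include <assert.h>
#include <stdint.h>

/* Encode v as an XDR unsigned int: 4 bytes, big-endian. */
static void xdr_encode_u32(uint8_t out[4], uint32_t v)
{
	out[0] = (uint8_t)(v >> 24);
	out[1] = (uint8_t)(v >> 16);
	out[2] = (uint8_t)(v >> 8);
	out[3] = (uint8_t)v;
}
```

A file system with a 4096-byte s_blocksize would therefore report clone_blksize as the bytes 00 00 10 00.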
Reported-by: Roland Mainz <roland.mainz(a)nrubsig.org>
Suggested-by: Christoph Hellwig <hch(a)infradead.org>
Reviewed-by: Roland Mainz <roland.mainz(a)nrubsig.org>
Cc: stable(a)vger.kernel.org # v6.7+
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/nfsd/nfs4xdr.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index e67420729ecd..9eb8e5704622 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -3391,6 +3391,23 @@ static __be32 nfsd4_encode_fattr4_suppattr_exclcreat(struct xdr_stream *xdr,
return nfsd4_encode_bitmap4(xdr, supp[0], supp[1], supp[2]);
}
+/*
+ * Copied from generic_remap_checks/generic_remap_file_range_prep.
+ *
+ * These generic functions use the file system's s_blocksize, but
+ * individual file systems aren't required to use
+ * generic_remap_file_range_prep. Until there is a mechanism for
+ * determining a particular file system's (or file's) clone block
+ * size, this is the best NFSD can do.
+ */
+static __be32 nfsd4_encode_fattr4_clone_blksize(struct xdr_stream *xdr,
+ const struct nfsd4_fattr_args *args)
+{
+ struct inode *inode = d_inode(args->dentry);
+
+ return nfsd4_encode_uint32_t(xdr, inode->i_sb->s_blocksize);
+}
+
#ifdef CONFIG_NFSD_V4_SECURITY_LABEL
static __be32 nfsd4_encode_fattr4_sec_label(struct xdr_stream *xdr,
const struct nfsd4_fattr_args *args)
@@ -3545,7 +3562,7 @@ static const nfsd4_enc_attr nfsd4_enc_fattr4_encode_ops[] = {
[FATTR4_MODE_SET_MASKED] = nfsd4_encode_fattr4__noop,
[FATTR4_SUPPATTR_EXCLCREAT] = nfsd4_encode_fattr4_suppattr_exclcreat,
[FATTR4_FS_CHARSET_CAP] = nfsd4_encode_fattr4__noop,
- [FATTR4_CLONE_BLKSIZE] = nfsd4_encode_fattr4__noop,
+ [FATTR4_CLONE_BLKSIZE] = nfsd4_encode_fattr4_clone_blksize,
[FATTR4_SPACE_FREED] = nfsd4_encode_fattr4__noop,
[FATTR4_CHANGE_ATTR_TYPE] = nfsd4_encode_fattr4__noop,
--
2.49.0
From: Long Li <longli(a)microsoft.com>
Following the ring header, the ring data should align to a system page
boundary. Adjust the size if necessary.
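A hypothetical model of the adjustment follows. It assumes VMBUS_RING_SIZE() rounds the ring header plus the requested payload up to a system page boundary. The header size is an assumption; the kernel uses the size of its own ring buffer header structure.

```c
#include <assert.h>
#include <stddef.h>

/* Round x up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t x, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Hypothetical model: header followed by payload, padded so the whole ring
 * ends on a system page boundary. hdr_size is an assumed value. */
static size_t ring_size(size_t payload, size_t hdr_size, size_t page_size)
{
	return align_up(hdr_size + payload, page_size);
}
```

With a 4 KiB header and the 2 MiB default payload, 4 KiB system pages add exactly one page, while 64 KiB system pages round the total up to 2 MiB + 64 KiB.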
Cc: stable(a)vger.kernel.org
Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
Signed-off-by: Long Li <longli(a)microsoft.com>
---
drivers/uio/uio_hv_generic.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 08385b04c4ab..cb2e7e0e1540 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -256,6 +256,9 @@ hv_uio_probe(struct hv_device *dev,
if (!ring_size)
ring_size = SZ_2M;
+ /* Adjust ring size if necessary to have it page aligned */
+ ring_size = VMBUS_RING_SIZE(ring_size);
+
pdata = devm_kzalloc(&dev->device, sizeof(*pdata), GFP_KERNEL);
if (!pdata)
return -ENOMEM;
--
2.34.1
From: Long Li <longli(a)microsoft.com>
Interrupt and monitor pages should be in Hyper-V page size (4k bytes).
This can be different from the system page size.
This size is read and used by the user-mode program to determine the
mapped data region. An example of such a user-mode program is the VMBus
driver in DPDK.
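A small sketch of why the reported size matters when the system page size differs from the Hyper-V page size. With 64 KiB system pages (for example, arm64 kernels built with 64 KiB pages), reporting PAGE_SIZE would advertise a region that spans 16 Hyper-V pages instead of one. The helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define HV_HYP_PAGE_SIZE 4096u   /* Hyper-V page size, per the commit message */

/* Hypothetical helper: how many Hyper-V pages a reported region size spans. */
static size_t hv_pages_claimed(size_t reported_size)
{
	return reported_size / HV_HYP_PAGE_SIZE;
}
```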
Cc: stable(a)vger.kernel.org
Fixes: 95096f2fbd10 ("uio-hv-generic: new userspace i/o driver for VMBus")
Signed-off-by: Long Li <longli(a)microsoft.com>
---
drivers/uio/uio_hv_generic.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/uio/uio_hv_generic.c b/drivers/uio/uio_hv_generic.c
index 1b19b5647495..08385b04c4ab 100644
--- a/drivers/uio/uio_hv_generic.c
+++ b/drivers/uio/uio_hv_generic.c
@@ -287,13 +287,13 @@ hv_uio_probe(struct hv_device *dev,
pdata->info.mem[INT_PAGE_MAP].name = "int_page";
pdata->info.mem[INT_PAGE_MAP].addr
= (uintptr_t)vmbus_connection.int_page;
- pdata->info.mem[INT_PAGE_MAP].size = PAGE_SIZE;
+ pdata->info.mem[INT_PAGE_MAP].size = HV_HYP_PAGE_SIZE;
pdata->info.mem[INT_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
pdata->info.mem[MON_PAGE_MAP].name = "monitor_page";
pdata->info.mem[MON_PAGE_MAP].addr
= (uintptr_t)vmbus_connection.monitor_pages[1];
- pdata->info.mem[MON_PAGE_MAP].size = PAGE_SIZE;
+ pdata->info.mem[MON_PAGE_MAP].size = HV_HYP_PAGE_SIZE;
pdata->info.mem[MON_PAGE_MAP].memtype = UIO_MEM_LOGICAL;
if (channel->device_id == HV_NIC) {
--
2.34.1
From: Long Li <longli(a)microsoft.com>
There are use cases in which the interrupt and monitor pages are mapped to
user mode through UIO, so they need to be system page aligned. Some
Hyper-V allocation APIs introduced earlier broke those requirements.
Fix this by using page allocation functions directly for interrupt
and monitor pages.
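A user-space analogue of the new allocation, as a hypothetical sketch. aligned_alloc() stands in for __get_free_page(GFP_KERNEL | __GFP_ZERO), and a 64 KiB system page size is assumed. Because the system page size is a multiple of the Hyper-V page size (the patch's BUILD_BUG_ON enforces that it is at least as large), a system-page-aligned page is also Hyper-V-page aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define HV_HYP_PAGE_SIZE   4096u
#define ASSUMED_PAGE_SIZE  65536u   /* hypothetical system page size */

/* User-space counterpart of BUILD_BUG_ON(PAGE_SIZE < HV_HYP_PAGE_SIZE). */
_Static_assert(ASSUMED_PAGE_SIZE >= HV_HYP_PAGE_SIZE,
	       "system page must be at least one Hyper-V page");

/* True if p starts on a page_size boundary (page_size a power of two). */
static int is_page_aligned(const void *p, size_t page_size)
{
	return ((uintptr_t)p & (page_size - 1)) == 0;
}

/* One zeroed, system-page-aligned page (C11 aligned_alloc). */
static void *alloc_zeroed_page(size_t page_size)
{
	void *p = aligned_alloc(page_size, page_size);

	if (p)
		memset(p, 0, page_size);
	return p;
}
```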
Cc: stable(a)vger.kernel.org
Fixes: ca48739e59df ("Drivers: hv: vmbus: Move Hyper-V page allocator to arch neutral code")
Signed-off-by: Long Li <longli(a)microsoft.com>
---
drivers/hv/connection.c | 23 +++++++++++++++++------
1 file changed, 17 insertions(+), 6 deletions(-)
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 8351360bba16..be490c598785 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -206,11 +206,20 @@ int vmbus_connect(void)
INIT_LIST_HEAD(&vmbus_connection.chn_list);
mutex_init(&vmbus_connection.channel_mutex);
+ /*
+ * The following Hyper-V interrupt and monitor pages can be used by
+ * UIO for mapping to user-space, so they should always be allocated on
+ * system page boundaries. The system page size must be >= the Hyper-V
+ * page size.
+ */
+ BUILD_BUG_ON(PAGE_SIZE < HV_HYP_PAGE_SIZE);
+
/*
* Setup the vmbus event connection for channel interrupt
* abstraction stuff
*/
- vmbus_connection.int_page = hv_alloc_hyperv_zeroed_page();
+ vmbus_connection.int_page =
+ (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
if (vmbus_connection.int_page == NULL) {
ret = -ENOMEM;
goto cleanup;
@@ -225,8 +234,8 @@ int vmbus_connect(void)
* Setup the monitor notification facility. The 1st page for
* parent->child and the 2nd page for child->parent
*/
- vmbus_connection.monitor_pages[0] = hv_alloc_hyperv_page();
- vmbus_connection.monitor_pages[1] = hv_alloc_hyperv_page();
+ vmbus_connection.monitor_pages[0] = (void *)__get_free_page(GFP_KERNEL);
+ vmbus_connection.monitor_pages[1] = (void *)__get_free_page(GFP_KERNEL);
if ((vmbus_connection.monitor_pages[0] == NULL) ||
(vmbus_connection.monitor_pages[1] == NULL)) {
ret = -ENOMEM;
@@ -342,21 +351,23 @@ void vmbus_disconnect(void)
destroy_workqueue(vmbus_connection.work_queue);
if (vmbus_connection.int_page) {
- hv_free_hyperv_page(vmbus_connection.int_page);
+ free_page((unsigned long)vmbus_connection.int_page);
vmbus_connection.int_page = NULL;
}
if (vmbus_connection.monitor_pages[0]) {
if (!set_memory_encrypted(
(unsigned long)vmbus_connection.monitor_pages[0], 1))
- hv_free_hyperv_page(vmbus_connection.monitor_pages[0]);
+ free_page((unsigned long)
+ vmbus_connection.monitor_pages[0]);
vmbus_connection.monitor_pages[0] = NULL;
}
if (vmbus_connection.monitor_pages[1]) {
if (!set_memory_encrypted(
(unsigned long)vmbus_connection.monitor_pages[1], 1))
- hv_free_hyperv_page(vmbus_connection.monitor_pages[1]);
+ free_page((unsigned long)
+ vmbus_connection.monitor_pages[1]);
vmbus_connection.monitor_pages[1] = NULL;
}
}
--
2.34.1
Use the common wrappers operating directly on struct sg_table objects to
fix incorrect use of the scatterlist sync calls. The dma_sync_sg_for_*()
functions have to be called with the number of elements originally passed
to dma_map_sg_*(), not the one returned in the sg_table's nents.
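The documented rule is that the sync calls take the same element count that was passed to the map call. The toy model below, with hypothetical ex_* names, is an illustration rather than the kernel's implementation: a "map" step merges physically contiguous entries, so it returns fewer segments (like nents) than it was given (like orig_nents), and a sync loop driven by the smaller count skips entries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one CPU-side scatterlist entry. */
struct ex_seg {
	unsigned long phys;	/* CPU physical address of the segment */
	unsigned long len;	/* length in bytes */
};

/*
 * Toy "map": physically contiguous neighbours collapse into one DMA
 * segment, so the result can be smaller than orig_nents.
 */
static size_t ex_map(const struct ex_seg *segs, size_t orig_nents)
{
	size_t nents = 0;

	for (size_t i = 0; i < orig_nents; i++) {
		if (i == 0 || segs[i - 1].phys + segs[i - 1].len != segs[i].phys)
			nents++;
	}
	return nents;
}

/* Toy "sync": walks @count CPU-side entries and totals the bytes covered. */
static unsigned long ex_bytes_synced(const struct ex_seg *segs, size_t count)
{
	unsigned long bytes = 0;

	assert(segs != NULL || count == 0);
	for (size_t i = 0; i < count; i++)
		bytes += segs[i].len;
	return bytes;
}
```

For three contiguous 4 KiB entries, ex_map() returns 1, so syncing with that count covers 4 KiB instead of 12 KiB. The sgtable wrappers sidestep the mix-up by taking the whole struct sg_table.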
Fixes: 1ffe09590121 ("udmabuf: fix dma-buf cpu access")
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
Acked-by: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
---
drivers/dma-buf/udmabuf.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/dma-buf/udmabuf.c b/drivers/dma-buf/udmabuf.c
index 7eee3eb47a8e..c9d0c68d2fcb 100644
--- a/drivers/dma-buf/udmabuf.c
+++ b/drivers/dma-buf/udmabuf.c
@@ -264,8 +264,7 @@ static int begin_cpu_udmabuf(struct dma_buf *buf,
ubuf->sg = NULL;
}
} else {
- dma_sync_sg_for_cpu(dev, ubuf->sg->sgl, ubuf->sg->nents,
- direction);
+ dma_sync_sgtable_for_cpu(dev, ubuf->sg, direction);
}
return ret;
@@ -280,7 +279,7 @@ static int end_cpu_udmabuf(struct dma_buf *buf,
if (!ubuf->sg)
return -EINVAL;
- dma_sync_sg_for_device(dev, ubuf->sg->sgl, ubuf->sg->nents, direction);
+ dma_sync_sgtable_for_device(dev, ubuf->sg, direction);
return 0;
}
--
2.34.1
As mentioned in Erratum 1544 from the Revision Guide for AMD Family 1Ah
Models 00h-0Fh Processors available at the link below, PMCx188 reports
incorrect information about valid IBS fetch samples when used with unit
mask 0x10 on Zen 5 processors. Remove the affected event.
Link: https://bugzilla.kernel.org/attachment.cgi?id=308095
Fixes: 45c072f2537a ("perf vendor events amd: Add Zen 5 core events")
Signed-off-by: Sandipan Das <sandipan.das(a)amd.com>
Cc: stable(a)vger.kernel.org
---
tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json | 6 ------
1 file changed, 6 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
index 4fd5e2c5432f..3b61cf8a04da 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
@@ -27,12 +27,6 @@
"BriefDescription": "Fetches discarded after being tagged by Fetch IBS due to IBS filtering.",
"UMask": "0x08"
},
- {
- "EventName": "ic_fetch_ibs_events.sample_valid",
- "EventCode": "0x188",
- "BriefDescription": "Fetches tagged by Fetch IBS that result in a valid sample and an IBS interrupt.",
- "UMask": "0x10"
- },
{
"EventName": "op_cache_hit_miss.op_cache_hit",
"EventCode": "0x28f",
--
2.43.0
As mentioned in Erratum 1583 from the Revision Guide for AMD Family 1Ah
Models 00h-0Fh Processors available at the link below, PMCx18E reports
incorrect information about instruction cache accesses on Zen 5
processors. Remove the affected events and metrics.
Link: https://bugzilla.kernel.org/attachment.cgi?id=308095
Fixes: 45c072f2537a ("perf vendor events amd: Add Zen 5 core events")
Signed-off-by: Sandipan Das <sandipan.das(a)amd.com>
Cc: stable(a)vger.kernel.org
---
.../arch/x86/amdzen5/inst-cache.json | 18 ------------------
.../arch/x86/amdzen5/recommended.json | 6 ------
2 files changed, 24 deletions(-)
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
index ad75e5bf9513..4fd5e2c5432f 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/inst-cache.json
@@ -33,24 +33,6 @@
"BriefDescription": "Fetches tagged by Fetch IBS that result in a valid sample and an IBS interrupt.",
"UMask": "0x10"
},
- {
- "EventName": "ic_tag_hit_miss.instruction_cache_hit",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache hits.",
- "UMask": "0x07"
- },
- {
- "EventName": "ic_tag_hit_miss.instruction_cache_miss",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache misses.",
- "UMask": "0x18"
- },
- {
- "EventName": "ic_tag_hit_miss.all_instruction_cache_accesses",
- "EventCode": "0x18e",
- "BriefDescription": "Instruction cache accesses of all types.",
- "UMask": "0x1f"
- },
{
"EventName": "op_cache_hit_miss.op_cache_hit",
"EventCode": "0x28f",
diff --git a/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json b/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
index 635d57e3bc15..863f4b5dfc14 100644
--- a/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
+++ b/tools/perf/pmu-events/arch/x86/amdzen5/recommended.json
@@ -136,12 +136,6 @@
"MetricExpr": "d_ratio(op_cache_hit_miss.op_cache_miss, op_cache_hit_miss.all_op_cache_accesses)",
"ScaleUnit": "100%"
},
- {
- "MetricName": "ic_fetch_miss_ratio",
- "BriefDescription": "Instruction cache miss ratio for all fetches. An instruction cache miss will not be counted by this metric if it is an OC hit.",
- "MetricExpr": "d_ratio(ic_tag_hit_miss.instruction_cache_miss, ic_tag_hit_miss.all_instruction_cache_accesses)",
- "ScaleUnit": "100%"
- },
{
"MetricName": "l1_data_cache_fills_from_memory_pti",
"BriefDescription": "L1 data cache fills from DRAM or MMIO in any NUMA node per thousand instructions.",
--
2.43.0
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 262de94a3a7ef23c326534b3d9483602b7af841e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025042256-unshackle-unwashed-bd50@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 262de94a3a7ef23c326534b3d9483602b7af841e Mon Sep 17 00:00:00 2001
From: Niranjana Vishwanathapura <niranjana.vishwanathapura(a)intel.com>
Date: Thu, 27 Mar 2025 11:56:04 -0700
Subject: [PATCH] drm/xe: Ensure fixed_slice_mode gets set after ccs_mode
change
The RCU_MODE_FIXED_SLICE_CCS_MODE setting is not reapplied in the
GT reset path after the user changes the ccs_mode setting.
Add it to the engine register update list (in hw_engine_setup_default_state()),
which ensures it gets set in both the GT reset and engine reset paths.
v2: Add the register update to the engine list to ensure it also gets
applied after engine reset.
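The reasoning behind the fix is that only register updates that are replayed after a reset survive it. The toy model below is a sketch with hypothetical names, register indices and bits, not the xe RTP machinery: a masked update list is reapplied after the simulated reset, in the spirit of a FIELD_SET action.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_NUM_REGS 4

/* Hypothetical register update: set the bits in @mask to @val. */
struct ex_reg_update {
	unsigned int reg;
	uint32_t mask;
	uint32_t val;
};

/* Apply masked updates, similar in spirit to a FIELD_SET action. */
static void ex_apply(uint32_t *regs, const struct ex_reg_update *list, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(list[i].reg < EX_NUM_REGS);
		regs[list[i].reg] = (regs[list[i].reg] & ~list[i].mask) |
				    (list[i].val & list[i].mask);
	}
}

/* A reset clears hardware state; only the replayed list is restored. */
static void ex_reset(uint32_t *regs, const struct ex_reg_update *replayed, size_t n)
{
	for (unsigned int r = 0; r < EX_NUM_REGS; r++)
		regs[r] = 0;
	ex_apply(regs, replayed, n);
}
```

An entry that lives in a list which is not replayed after a reset is lost, which is why the patch moves the RCU_MODE entry into the engine list.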
Fixes: 0d97ecce16bd ("drm/xe: Enable Fixed CCS mode setting")
Cc: stable(a)vger.kernel.org
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura(a)intel.com>
Reviewed-by: Matt Roper <matthew.d.roper(a)intel.com>
Signed-off-by: Matthew Brost <matthew.brost(a)intel.com>
Link: https://lore.kernel.org/r/20250327185604.18230-1-niranjana.vishwanathapura@…
(cherry picked from commit 12468e519f98e4d93370712e3607fab61df9dae9)
Signed-off-by: Lucas De Marchi <lucas.demarchi(a)intel.com>
diff --git a/drivers/gpu/drm/xe/xe_hw_engine.c b/drivers/gpu/drm/xe/xe_hw_engine.c
index 8c05fd30b7df..93241fd0a4ba 100644
--- a/drivers/gpu/drm/xe/xe_hw_engine.c
+++ b/drivers/gpu/drm/xe/xe_hw_engine.c
@@ -389,12 +389,6 @@ xe_hw_engine_setup_default_lrc_state(struct xe_hw_engine *hwe)
blit_cctl_val,
XE_RTP_ACTION_FLAG(ENGINE_BASE)))
},
- /* Use Fixed slice CCS mode */
- { XE_RTP_NAME("RCU_MODE_FIXED_SLICE_CCS_MODE"),
- XE_RTP_RULES(FUNC(xe_hw_engine_match_fixed_cslice_mode)),
- XE_RTP_ACTIONS(FIELD_SET(RCU_MODE, RCU_MODE_FIXED_SLICE_CCS_MODE,
- RCU_MODE_FIXED_SLICE_CCS_MODE))
- },
/* Disable WMTP if HW doesn't support it */
{ XE_RTP_NAME("DISABLE_WMTP_ON_UNSUPPORTED_HW"),
XE_RTP_RULES(FUNC(xe_rtp_cfeg_wmtp_disabled)),
@@ -461,6 +455,12 @@ hw_engine_setup_default_state(struct xe_hw_engine *hwe)
XE_RTP_ACTIONS(SET(CSFE_CHICKEN1(0), CS_PRIORITY_MEM_READ,
XE_RTP_ACTION_FLAG(ENGINE_BASE)))
},
+ /* Use Fixed slice CCS mode */
+ { XE_RTP_NAME("RCU_MODE_FIXED_SLICE_CCS_MODE"),
+ XE_RTP_RULES(FUNC(xe_hw_engine_match_fixed_cslice_mode)),
+ XE_RTP_ACTIONS(FIELD_SET(RCU_MODE, RCU_MODE_FIXED_SLICE_CCS_MODE,
+ RCU_MODE_FIXED_SLICE_CCS_MODE))
+ },
};
xe_rtp_process_to_sr(&ctx, engine_entries, ARRAY_SIZE(engine_entries), &hwe->reg_sr);
From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps
the mmap option MAP_STACK to VM_NOHUGEPAGE. This is done even when
CONFIG_TRANSPARENT_HUGEPAGE is not defined, in which case
VM_NOHUGEPAGE makes no sense.
I discovered this issue when trying to use the tool CRIU to checkpoint
and restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of
/proc/<pid>/smaps and saves the "nh" flag. When trying to restore the
container, CRIU fails to restore the "nh" mappings, since madvise()
MADV_NOHUGEPAGE always returns an error because
CONFIG_TRANSPARENT_HUGEPAGE is not defined.
Fixes: c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
---
I discovered this issue when trying to use the tool CRIU to checkpoint
and restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of /proc/<pid>/smaps
and saves the "nh" flag. When trying to restore the container, CRIU
fails to restore the "nh" mappings, since madvise() MADV_NOHUGEPAGE
always returns an error because CONFIG_TRANSPARENT_HUGEPAGE is not
defined.
The mapping MAP_STACK -> VM_NOHUGEPAGE was introduced by commit
c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") in order to
fix a regression introduced by commit efa7df3e3bb5 ("mm: align larger
anonymous mappings on THP boundaries"). The change introducing the
regression (efa7df3e3bb5) was limited to THP kernels, but its fix
(c4608d1bf7c6) is applied without checking if THP is set.
The mapping MAP_STACK -> VM_NOHUGEPAGE should only be applied if THP is
enabled.
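The effect of the #ifdef can be sketched in userspace. The flag values below are examples only (real values are architecture specific), EX_CONFIG_THP is a hypothetical stand-in for CONFIG_TRANSPARENT_HUGEPAGE and is left undefined to model the reporter's kernel, and ex_trans() is a simplified single-bit version of _calc_vm_trans().

```c
#include <assert.h>

/* Example flag values (assumption); real values are arch specific. */
#define EX_MAP_GROWSDOWN 0x00100UL
#define EX_MAP_STACK     0x20000UL
#define EX_VM_GROWSDOWN  0x00000100UL
#define EX_VM_NOHUGEPAGE 0x40000000UL

/* Left undefined to model a kernel built without THP. */
/* #define EX_CONFIG_THP */

/* Simplified single-bit translation from an mmap flag to a VM flag. */
static unsigned long ex_trans(unsigned long flags, unsigned long from,
			      unsigned long to)
{
	assert(from != 0 && (from & (from - 1)) == 0); /* single bit */
	return (flags & from) ? to : 0;
}

static unsigned long ex_calc_vm_flag_bits(unsigned long flags)
{
	return ex_trans(flags, EX_MAP_GROWSDOWN, EX_VM_GROWSDOWN) |
#ifdef EX_CONFIG_THP
	       ex_trans(flags, EX_MAP_STACK, EX_VM_NOHUGEPAGE) |
#endif
	       0;
}
```

Without EX_CONFIG_THP, MAP_STACK no longer produces VM_NOHUGEPAGE, so no "nh" flag shows up in smaps for CRIU to restore.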
---
Changes in v5:
- Correct typo CONFIG_TRANSPARENT_HUGETABLES -> CONFIG_TRANSPARENT_HUGEPAGE in patch description
- Link to v4: https://lore.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-t…
Changes in v4:
- Correct typo CONFIG_TRANSPARENT_HUGETABLES -> CONFIG_TRANSPARENT_HUGEPAGE
- Copy description from cover letter to commit description
- Link to v3: https://lore.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-t…
Changes in v3:
- Exclude non-stable patch (for huge_mm.h) from this series to avoid mixing stable and non-stable patches, as suggested by Andrew.
- Extend description in cover letter.
- Link to v2: https://lore.kernel.org/r/20250506-map-map_stack-to-vm_nohugepage-only-if-t…
Changes in v2:
- [Patch 1/2] Use '#ifdef' instead of '#if defined(...)'
- [Patch 1/2] Add 'Fixes: c4608d1bf7c6...'
- Create [Patch 2/2]
- Link to v1: https://lore.kernel.org/r/20250502-map-map_stack-to-vm_nohugepage-only-if-t…
---
include/linux/mman.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/mman.h b/include/linux/mman.h
index bce214fece16b9af3791a2baaecd6063d0481938..f4c6346a8fcd29b08d43f7cd9158c3eddc3383e1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -155,7 +155,9 @@ calc_vm_flag_bits(struct file *file, unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
_calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
+#endif
arch_calc_vm_flag_bits(file, flags);
}
---
base-commit: fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2
change-id: 20250428-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-ce40a1de095d
Best regards,
--
Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
Upstream commits:
01: 5ba8b837b522d7051ef81bacf3d95383ff8edce5 ("sch_htb: make
htb_qlen_notify() idempotent")
02: df008598b3a00be02a8051fde89ca0fbc416bd55 ("sch_drr: make
drr_qlen_notify() idempotent")
03: 51eb3b65544c9efd6a1026889ee5fb5aa62da3bb ("sch_hfsc: make
hfsc_qlen_notify() idempotent")
04: 55f9eca4bfe30a15d8656f915922e8c98b7f0728 ("sch_qfq: make
qfq_qlen_notify() idempotent")
05: a7a15f39c682ac4268624da2abdb9114bdde96d5 ("sch_ets: make
est_qlen_notify() idempotent")
06: 342debc12183b51773b3345ba267e9263bdfaaef ("codel: remove
sch->q.qlen check before qdisc_tree_reduce_backlog()")
These patches are patches 01-06 of the original patchset ([1]) authored by
Cong Wang. I have omitted patches 07-11, which are selftests. This patchset
addresses a UAF vulnerability.
Originally, only the last commit (06) was picked for the latest round of
stable queues (5.15, 5.10 and 5.4). For the 6.x stable branches, that
commit has already been merged in a previous cycle.
As I understand it, that commit depends on the previous patches to work.
Without patches 01-05 which make various classful qdiscs' qlen_notify()
idempotent, if an fq_codel's dequeue() routine empties the fq_codel qdisc,
it will be doubly deactivated - first in the parent qlen_notify and then
again in the parent dequeue. For instance, in the case of parent drr,
the double deactivation will either cause a fault on an invalid address,
or trigger a splat if list checks are compiled into the kernel. This is
also why the original unpatched code included the qlen check in the first
place.
After discussion, Greg temporarily dropped the patch from the 5.x queues
([2]). My suggestion is to include patches 01-06 of the
patchset, as listed above, for the 5.x queues. For the 6.x queues that have
already merged patch 06, the earlier patches 01-05 should be merged too.
I'm not too familiar with the stable patch process, so I may be completely
mistaken here.
Cheers,
Gerrard
[1]: https://lore.kernel.org/netdev/174410343500.1831514.15019771038334698036.gi…
[2]: https://lore.kernel.org/stable/2025050131-fragrant-famine-eb32@gregkh/
From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps
the mmap option MAP_STACK to VM_NOHUGEPAGE. This is done even when
CONFIG_TRANSPARENT_HUGEPAGE is not defined, in which case
VM_NOHUGEPAGE makes no sense.
I discovered this issue when trying to use the tool CRIU to checkpoint
and restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of
/proc/<pid>/smaps and saves the "nh" flag. When trying to restore the
container, CRIU fails to restore the "nh" mappings, since madvise()
MADV_NOHUGEPAGE always returns an error because
CONFIG_TRANSPARENT_HUGEPAGE is not defined.
Fixes: c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
---
I discovered this issue when trying to use the tool CRIU to checkpoint
and restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of /proc/<pid>/smaps
and saves the "nh" flag. When trying to restore the container, CRIU
fails to restore the "nh" mappings, since madvise() MADV_NOHUGEPAGE
always returns an error because CONFIG_TRANSPARENT_HUGEPAGE is not
defined.
The mapping MAP_STACK -> VM_NOHUGEPAGE was introduced by commit
c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") in order to
fix a regression introduced by commit efa7df3e3bb5 ("mm: align larger
anonymous mappings on THP boundaries"). The change introducing the
regression (efa7df3e3bb5) was limited to THP kernels, but its fix
(c4608d1bf7c6) is applied without checking if THP is set.
The mapping MAP_STACK -> VM_NOHUGEPAGE should only be applied if THP is
enabled.
---
Changes in v4:
- Correct typo CONFIG_TRANSPARENT_HUGETABLES -> CONFIG_TRANSPARENT_HUGEPAGE
- Copy description from cover letter to commit description
- Link to v3: https://lore.kernel.org/r/20250507-map-map_stack-to-vm_nohugepage-only-if-t…
Changes in v3:
- Exclude non-stable patch (for huge_mm.h) from this series to avoid mixing stable and non-stable patches, as suggested by Andrew.
- Extend description in cover letter.
- Link to v2: https://lore.kernel.org/r/20250506-map-map_stack-to-vm_nohugepage-only-if-t…
Changes in v2:
- [Patch 1/2] Use '#ifdef' instead of '#if defined(...)'
- [Patch 1/2] Add 'Fixes: c4608d1bf7c6...'
- Create [Patch 2/2]
- Link to v1: https://lore.kernel.org/r/20250502-map-map_stack-to-vm_nohugepage-only-if-t…
---
include/linux/mman.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/mman.h b/include/linux/mman.h
index bce214fece16b9af3791a2baaecd6063d0481938..f4c6346a8fcd29b08d43f7cd9158c3eddc3383e1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -155,7 +155,9 @@ calc_vm_flag_bits(struct file *file, unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
_calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
+#endif
arch_calc_vm_flag_bits(file, flags);
}
---
base-commit: fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2
change-id: 20250428-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-ce40a1de095d
Best regards,
--
Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
On configs with CONFIG_ARM64_GCS=y, VM_SHADOW_STACK is bit 38.
On configs with CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y (selected by
CONFIG_ARM64 when CONFIG_USERFAULTFD=y), VM_UFFD_MINOR is _also_ bit 38.
Sharing this bit between two different VMA flags could lead to all sorts
of unintended behaviors. For example, a process might be able to call into
userfaultfd in a way that clears the shadow stack VMA flag. I can't
think of any attack where this would help (an attacker trying to disable
shadow stacks is presumably trying to hijack control flow, and so cannot
yet call into userfaultfd arbitrarily), but this still feels somewhat scary.
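A collision like this is easy to detect mechanically. The sketch below uses the bit numbers from the description and diff; the EX_* macros are hypothetical, and the patch itself does not add such a check. It shows the overlap before the fix and a compile-time guard that holds after it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_BIT(n) (UINT64_C(1) << (n))

/* Bit numbers taken from the description and diff. */
#define EX_VM_SHADOW_STACK_BIT   38	/* CONFIG_ARM64_GCS=y */
#define EX_VM_UFFD_MINOR_BIT_OLD 38	/* before the fix */
#define EX_VM_UFFD_MINOR_BIT_NEW 41	/* after the fix */

static bool ex_flags_overlap(uint64_t a, uint64_t b)
{
	return (a & b) != 0;
}

/* A build-time guard of this shape would have caught the collision. */
_Static_assert((EX_BIT(EX_VM_SHADOW_STACK_BIT) &
		EX_BIT(EX_VM_UFFD_MINOR_BIT_NEW)) == 0,
	       "VMA flag bits must be distinct");
```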
Reviewed-by: Mark Brown <broonie(a)kernel.org>
Fixes: ae80e1629aea ("mm: Define VM_SHADOW_STACK for arm64 when we support GCS")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Florent Revest <revest(a)chromium.org>
---
include/linux/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf55206935c46..fdda6b16263b3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -385,7 +385,7 @@ extern unsigned int kobjsize(const void *objp);
#endif
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT 38
+# define VM_UFFD_MINOR_BIT 41
# define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */
#else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
# define VM_UFFD_MINOR VM_NONE
--
2.49.0.987.g0cc8ee98dc-goog
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
The phy-rcar-gen3-usb2 driver exports four PHYs. The timing registers are
common to all PHYs, so there is no need to set them every time a PHY is
initialized. Set the timing registers only when the first PHY is initialized.
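The "first PHY only" rule can be modelled as below. This is a sketch with a hypothetical struct and a write counter standing in for the two writel() calls. As in the diff, the check runs before the current PHY is marked initialized, so only the first init programs the shared registers.

```c
#include <assert.h>
#include <stdbool.h>

#define EX_NUM_PHYS 4

/* Hypothetical channel whose timing registers are shared by all PHYs. */
struct ex_chan {
	bool initialized[EX_NUM_PHYS];
	unsigned int timing_writes;	/* stands in for the writel() calls */
};

static bool ex_any_initialized(const struct ex_chan *ch)
{
	for (int i = 0; i < EX_NUM_PHYS; i++)
		if (ch->initialized[i])
			return true;
	return false;
}

static void ex_phy_init(struct ex_chan *ch, int idx)
{
	assert(idx >= 0 && idx < EX_NUM_PHYS);
	/* Program the shared timing registers only for the first PHY. */
	if (!ex_any_initialized(ch))
		ch->timing_writes++;
	ch->initialized[idx] = true;
}
```

The check-then-set only works reliably if concurrent inits are serialized, which the spinlock change in the next patch addresses.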
Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver")
Cc: stable(a)vger.kernel.org
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
Changes in v3:
- collected tags
Changes in v2:
- collected tags
drivers/phy/renesas/phy-rcar-gen3-usb2.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/renesas/phy-rcar-gen3-usb2.c b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
index 118899efda70..9fdf17e0848a 100644
--- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c
+++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
@@ -467,8 +467,11 @@ static int rcar_gen3_phy_usb2_init(struct phy *p)
val = readl(usb2_base + USB2_INT_ENABLE);
val |= USB2_INT_ENABLE_UCOM_INTEN | rphy->int_enable_bits;
writel(val, usb2_base + USB2_INT_ENABLE);
- writel(USB2_SPD_RSM_TIMSET_INIT, usb2_base + USB2_SPD_RSM_TIMSET);
- writel(USB2_OC_TIMSET_INIT, usb2_base + USB2_OC_TIMSET);
+
+ if (!rcar_gen3_is_any_rphy_initialized(channel)) {
+ writel(USB2_SPD_RSM_TIMSET_INIT, usb2_base + USB2_SPD_RSM_TIMSET);
+ writel(USB2_OC_TIMSET_INIT, usb2_base + USB2_OC_TIMSET);
+ }
/* Initialize otg part (only if we initialize a PHY with IRQs). */
if (rphy->int_enable_bits)
--
2.43.0
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
The phy-rcar-gen3-usb2 driver exposes four individual PHYs that are
requested and configured by PHY users. The struct phy_ops APIs access the
same set of registers to configure all PHYs. Additionally, PHY settings can
be modified through sysfs or an IRQ handler. While some struct phy_ops APIs
are protected by a driver-wide mutex, others rely on individual
PHY-specific mutexes.
This approach can lead to various issues, including:
1/ the IRQ handler may interrupt PHY settings in progress, racing with
hardware configuration protected by a mutex lock
2/ due to msleep(20) in rcar_gen3_init_otg(), while one configuration
   thread sleeps for the delay, another thread may try to configure
   another PHY (with phy_init() + phy_power_on()); phy_init() then runs
   the exact same configuration code again on the same set of registers
   (and bits), which might affect the result of the msleep for the first
   configuring thread
3/ sysfs can configure the hardware (through role_store()) and it can
   still race with the phy_init()/phy_power_on() APIs calling into the
   driver's struct phy_ops
To address these issues, add a spinlock to protect hardware register access
and driver private data structures (e.g., calls to
rcar_gen3_is_any_rphy_initialized()). Checking driver-specific data remains
necessary as all PHY instances share common settings. With this change,
the existing mutex protection is removed and the cleanup.h helpers are
used.
While at it, to keep the code simpler, do not skip
regulator_enable()/regulator_disable() APIs in
rcar_gen3_phy_usb2_power_on()/rcar_gen3_phy_usb2_power_off() as the
regulator enable/disable operations are reference counted anyway.
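The guard() and scoped_guard() helpers from cleanup.h build on the compiler's cleanup attribute, which runs a function when a variable goes out of scope. The userspace sketch below uses the GCC/Clang attribute directly with a toy lock; the ex_* names are hypothetical, and the lock only tracks ownership and provides no mutual exclusion. It shows that an early return, like the -EIO path in role_store(), still releases the lock. Note also that a spinlock holder must not sleep, which is why the diff turns msleep(20) into the busy-waiting mdelay(20).

```c
#include <assert.h>
#include <errno.h>

/* Toy lock: tracks ownership only, provides no mutual exclusion. */
struct ex_lock {
	int held;
};

static void ex_lock_acquire(struct ex_lock *l)
{
	assert(!l->held);
	l->held = 1;
}

/* Cleanup handlers receive a pointer to the guarded variable. */
static void ex_lock_release(struct ex_lock **lp)
{
	(*lp)->held = 0;
}

/* Acquire now; release automatically when the variable leaves scope. */
#define EX_GUARD(l)						\
	struct ex_lock *ex_guard_var				\
		__attribute__((cleanup(ex_lock_release))) =	\
		(ex_lock_acquire(l), (l))

/* Hypothetical function with an early error return. */
static int ex_role_store(struct ex_lock *l, int ready)
{
	EX_GUARD(l);

	if (!ready)
		return -EIO;	/* the guard still releases the lock */
	return 0;
}
```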
Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver")
Cc: stable(a)vger.kernel.org
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh(a)renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
Changes in v3:
- collected tags
Changes in v2:
- collected tags
drivers/phy/renesas/phy-rcar-gen3-usb2.c | 49 +++++++++++++-----------
1 file changed, 26 insertions(+), 23 deletions(-)
diff --git a/drivers/phy/renesas/phy-rcar-gen3-usb2.c b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
index bb05fd26eb7f..00ce564463de 100644
--- a/drivers/phy/renesas/phy-rcar-gen3-usb2.c
+++ b/drivers/phy/renesas/phy-rcar-gen3-usb2.c
@@ -9,6 +9,7 @@
* Copyright (C) 2014 Cogent Embedded, Inc.
*/
+#include <linux/cleanup.h>
#include <linux/extcon-provider.h>
#include <linux/interrupt.h>
#include <linux/io.h>
@@ -118,7 +119,7 @@ struct rcar_gen3_chan {
struct regulator *vbus;
struct reset_control *rstc;
struct work_struct work;
- struct mutex lock; /* protects rphys[...].powered */
+ spinlock_t lock; /* protects access to hardware and driver data structure. */
enum usb_dr_mode dr_mode;
u32 obint_enable_bits;
bool extcon_host;
@@ -348,6 +349,8 @@ static ssize_t role_store(struct device *dev, struct device_attribute *attr,
bool is_b_device;
enum phy_mode cur_mode, new_mode;
+ guard(spinlock_irqsave)(&ch->lock);
+
if (!ch->is_otg_channel || !rcar_gen3_is_any_otg_rphy_initialized(ch))
return -EIO;
@@ -415,7 +418,7 @@ static void rcar_gen3_init_otg(struct rcar_gen3_chan *ch)
val = readl(usb2_base + USB2_ADPCTRL);
writel(val | USB2_ADPCTRL_IDPULLUP, usb2_base + USB2_ADPCTRL);
}
- msleep(20);
+ mdelay(20);
writel(0xffffffff, usb2_base + USB2_OBINTSTA);
writel(ch->obint_enable_bits, usb2_base + USB2_OBINTEN);
@@ -436,12 +439,14 @@ static irqreturn_t rcar_gen3_phy_usb2_irq(int irq, void *_ch)
if (pm_runtime_suspended(dev))
goto rpm_put;
- status = readl(usb2_base + USB2_OBINTSTA);
- if (status & ch->obint_enable_bits) {
- dev_vdbg(dev, "%s: %08x\n", __func__, status);
- writel(ch->obint_enable_bits, usb2_base + USB2_OBINTSTA);
- rcar_gen3_device_recognition(ch);
- ret = IRQ_HANDLED;
+ scoped_guard(spinlock, &ch->lock) {
+ status = readl(usb2_base + USB2_OBINTSTA);
+ if (status & ch->obint_enable_bits) {
+ dev_vdbg(dev, "%s: %08x\n", __func__, status);
+ writel(ch->obint_enable_bits, usb2_base + USB2_OBINTSTA);
+ rcar_gen3_device_recognition(ch);
+ ret = IRQ_HANDLED;
+ }
}
rpm_put:
@@ -456,6 +461,8 @@ static int rcar_gen3_phy_usb2_init(struct phy *p)
void __iomem *usb2_base = channel->base;
u32 val;
+ guard(spinlock_irqsave)(&channel->lock);
+
/* Initialize USB2 part */
val = readl(usb2_base + USB2_INT_ENABLE);
val |= USB2_INT_ENABLE_UCOM_INTEN | rphy->int_enable_bits;
@@ -479,6 +486,8 @@ static int rcar_gen3_phy_usb2_exit(struct phy *p)
void __iomem *usb2_base = channel->base;
u32 val;
+ guard(spinlock_irqsave)(&channel->lock);
+
rphy->initialized = false;
val = readl(usb2_base + USB2_INT_ENABLE);
@@ -498,16 +507,17 @@ static int rcar_gen3_phy_usb2_power_on(struct phy *p)
u32 val;
int ret = 0;
- mutex_lock(&channel->lock);
- if (!rcar_gen3_are_all_rphys_power_off(channel))
- goto out;
-
if (channel->vbus) {
ret = regulator_enable(channel->vbus);
if (ret)
- goto out;
+ return ret;
}
+ guard(spinlock_irqsave)(&channel->lock);
+
+ if (!rcar_gen3_are_all_rphys_power_off(channel))
+ goto out;
+
val = readl(usb2_base + USB2_USBCTR);
val |= USB2_USBCTR_PLL_RST;
writel(val, usb2_base + USB2_USBCTR);
@@ -517,7 +527,6 @@ static int rcar_gen3_phy_usb2_power_on(struct phy *p)
out:
/* The powered flag should be set for any other phys anyway */
rphy->powered = true;
- mutex_unlock(&channel->lock);
return 0;
}
@@ -528,18 +537,12 @@ static int rcar_gen3_phy_usb2_power_off(struct phy *p)
struct rcar_gen3_chan *channel = rphy->ch;
int ret = 0;
- mutex_lock(&channel->lock);
- rphy->powered = false;
-
- if (!rcar_gen3_are_all_rphys_power_off(channel))
- goto out;
+ scoped_guard(spinlock_irqsave, &channel->lock)
+ rphy->powered = false;
if (channel->vbus)
ret = regulator_disable(channel->vbus);
-out:
- mutex_unlock(&channel->lock);
-
return ret;
}
@@ -750,7 +753,7 @@ static int rcar_gen3_phy_usb2_probe(struct platform_device *pdev)
if (phy_data->no_adp_ctrl)
channel->obint_enable_bits = USB2_OBINT_IDCHG_EN;
- mutex_init(&channel->lock);
+ spin_lock_init(&channel->lock);
for (i = 0; i < NUM_OF_PHYS; i++) {
channel->rphys[i].phy = devm_phy_create(dev, NULL,
phy_data->phy_usb2_ops);
--
2.43.0
Commit 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
introduced an error in the TX DMA handling for 8250_omap.
When the OMAP_DMA_TX_KICK flag is set, one byte is pulled from the
kfifo and emitted directly in order to start the DMA. This is done
without updating DMA tx_size which leads to uart_xmit_advance() called
in the DMA complete callback advancing the kfifo by one too much.
In practice, transmitting N bytes has been seen to result in the last
N-1 bytes being sent repeatedly.
This change fixes the problem by moving all of the dma setup after
the OMAP_DMA_TX_KICK handling and using kfifo_len() instead of the
dma size for the 4-byte cutoff check. This slightly changes the
behaviour at buffer wraparound, but it still transmits the correct
bytes somehow. At the point kfifo_dma_out_prepare_mapped is called,
at least one byte is guaranteed to be in the fifo, so checking the
return value is not necessary.
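The off-by-one can be reduced to FIFO index arithmetic. The model below is a simplification with hypothetical ex_* names: it ignores wraparound, the TX level check and the 4-byte cutoff, and only keeps kfifo-like in/out counters. Sizing the DMA before the kick byte is pulled makes the completion path advance one byte past the data; sizing it afterwards advances exactly to the end.

```c
#include <assert.h>

/* Toy FIFO indices; the driver itself uses a kfifo. */
struct ex_fifo {
	unsigned int in;	/* bytes queued by the tty layer */
	unsigned int out;	/* bytes consumed */
};

static unsigned int ex_len(const struct ex_fifo *f)
{
	return f->in - f->out;
}

/* Pull one byte to kick the transmitter (OMAP_DMA_TX_KICK path). */
static void ex_get_byte(struct ex_fifo *f)
{
	assert(ex_len(f) > 0);
	f->out++;
}

/* Completion path: advance by tx_size, like uart_xmit_advance(). */
static void ex_advance(struct ex_fifo *f, unsigned int n)
{
	f->out += n;
}

/* Old order: tx_size is computed before the kick byte is pulled. */
static void ex_tx_old(struct ex_fifo *f)
{
	unsigned int tx_size = ex_len(f);

	ex_get_byte(f);
	ex_advance(f, tx_size);
}

/* Fixed order: pull the kick byte first, then size the DMA. */
static void ex_tx_fixed(struct ex_fifo *f)
{
	unsigned int tx_size;

	ex_get_byte(f);
	tx_size = ex_len(f);
	ex_advance(f, tx_size);
}
```

With 10 queued bytes, the old order leaves out at 11, past in, while the fixed order stops at 10.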
Fixes: 1788cf6a91d9 ("tty: serial: switch from circ_buf to kfifo")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mans Rullgard <mans(a)mansr.com>
---
v2: split patch in two
---
drivers/tty/serial/8250/8250_omap.c | 24 +++++++++---------------
1 file changed, 9 insertions(+), 15 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index f1aee915bc02..180466e09605 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -1173,16 +1173,6 @@ static int omap_8250_tx_dma(struct uart_8250_port *p)
return 0;
}
- sg_init_table(&sg, 1);
- ret = kfifo_dma_out_prepare_mapped(&tport->xmit_fifo, &sg, 1,
- UART_XMIT_SIZE, dma->tx_addr);
- if (ret != 1) {
- serial8250_clear_THRI(p);
- return 0;
- }
-
- dma->tx_size = sg_dma_len(&sg);
-
if (priv->habit & OMAP_DMA_TX_KICK) {
unsigned char c;
u8 tx_lvl;
@@ -1207,7 +1197,7 @@ static int omap_8250_tx_dma(struct uart_8250_port *p)
ret = -EBUSY;
goto err;
}
- if (dma->tx_size < 4) {
+ if (kfifo_len(&tport->xmit_fifo) < 4) {
ret = -EINVAL;
goto err;
}
@@ -1216,11 +1206,12 @@ static int omap_8250_tx_dma(struct uart_8250_port *p)
goto err;
}
skip_byte = c;
- /* now we need to recompute due to kfifo_get */
- kfifo_dma_out_prepare_mapped(&tport->xmit_fifo, &sg, 1,
- UART_XMIT_SIZE, dma->tx_addr);
}
+ sg_init_table(&sg, 1);
+ kfifo_dma_out_prepare_mapped(&tport->xmit_fifo, &sg, 1,
+ UART_XMIT_SIZE, dma->tx_addr);
+
desc = dmaengine_prep_slave_sg(dma->txchan, &sg, 1, DMA_MEM_TO_DEV,
DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
if (!desc) {
@@ -1228,6 +1219,7 @@ static int omap_8250_tx_dma(struct uart_8250_port *p)
goto err;
}
+ dma->tx_size = sg_dma_len(&sg);
dma->tx_running = 1;
desc->callback = omap_8250_dma_tx_complete;
@@ -1248,8 +1240,10 @@ static int omap_8250_tx_dma(struct uart_8250_port *p)
err:
dma->tx_err = 1;
out_skip:
- if (skip_byte >= 0)
+ if (skip_byte >= 0) {
serial_out(p, UART_TX, skip_byte);
+ p->port.icount.tx++;
+ }
return ret;
}
--
2.49.0
UDF maintains total length of all extents in i_lenExtents. Generally we
keep extent lengths (and thus i_lenExtents) block aligned because it
makes the file appending logic simpler. However the standard mandates
that the inode size must match the length of all extents and thus we
trim the last extent when closing the file. To catch possible bugs, we
also verify that i_lenExtents matches i_size when evicting the inode from
memory. Commit b405c1e58b73 ("udf: refactor udf_next_aext() to handle
error") however broke the code updating i_lenExtents and thus
udf_evict_inode() ended up spewing lots of errors about incorrectly
sized extents although the extents were actually sized properly. Fix the
updating of i_lenExtents to silence the errors.
Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/udf/truncate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
I plan to merge this fix to my tree.
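The following user-space sketch illustrates the return-value convention behind the one-line fix. It assumes that, after the refactor, the extent walker returns a negative errno on error, 0 when the extent list is exhausted, and a positive value when an extent was returned. It also assumes that the caller can leave its loop early, for example after trimming the tail extent. All names (`next_extent()`, `walk()`, `may_update_len()`) are hypothetical. Because an early exit leaves `ret > 0`, only `ret < 0` should count as a failure, and that is the condition the `ret >= 0` test expresses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical extent walker: <0 on error, 0 at the end of the list,
 * 1 when an extent length was stored in *len. */
static int next_extent(const uint32_t *lens, int n, int *pos, uint32_t *len)
{
	if (*pos < 0)
		return -EIO;		/* corrupt cursor */
	if (*pos >= n)
		return 0;		/* no more extents */
	*len = lens[(*pos)++];
	return 1;			/* extent found */
}

/* Sums at most 'limit' extents and returns the walker's last status.
 * Leaving the loop early (as after trimming a tail extent) keeps ret == 1. */
static int walk(const uint32_t *lens, int n, int limit, uint64_t *total)
{
	int pos = 0;
	int ret;
	uint32_t len;

	*total = 0;
	while ((ret = next_extent(lens, n, &pos, &len)) > 0) {
		*total += len;
		if (--limit == 0)
			break;		/* early exit: ret is still 1 */
	}
	return ret;
}

/* Only a negative status is a failure; 'ret == 0' would miss ret == 1. */
static int may_update_len(int ret)
{
	return ret >= 0;
}
```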
diff --git a/fs/udf/truncate.c b/fs/udf/truncate.c
index 4f33a4a48886..b4071c9cf8c9 100644
--- a/fs/udf/truncate.c
+++ b/fs/udf/truncate.c
@@ -115,7 +115,7 @@ void udf_truncate_tail_extent(struct inode *inode)
}
/* This inode entry is in-memory only and thus we don't have to mark
* the inode dirty */
- if (ret == 0)
+ if (ret >= 0)
iinfo->i_lenExtents = inode->i_size;
brelse(epos.bh);
}
--
2.43.0
From: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
Commit c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") maps
the mmap option MAP_STACK to VM_NOHUGEPAGE. This is also done if
CONFIG_TRANSPARENT_HUGEPAGE is not defined. But in that case,
VM_NOHUGEPAGE does not make sense.
Fixes: c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE")
Cc: stable(a)vger.kernel.org
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com>
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
---
I discovered this issue when trying to use CRIU to checkpoint
and restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of
/proc/<pid>/smaps and saves the "nh" flag. When trying to restore the
container, CRIU fails to restore the "nh" mappings, since
madvise(MADV_NOHUGEPAGE) always returns an error when
CONFIG_TRANSPARENT_HUGEPAGE is not defined.
The mapping MAP_STACK -> VM_NOHUGEPAGE was introduced by commit
c4608d1bf7c6 ("mm: mmap: map MAP_STACK to VM_NOHUGEPAGE") in order to
fix a regression introduced by commit efa7df3e3bb5 ("mm: align larger
anonymous mappings on THP boundaries"). The change introducing the
regression (efa7df3e3bb5) was limited to THP kernels, but its fix
(c4608d1bf7c6) is applied without checking whether THP is enabled.
The mapping MAP_STACK -> VM_NOHUGEPAGE should only be applied if THP is
enabled.
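Below is a simplified model of the flag translation, in which the MAP_STACK term is compiled in only when THP support is configured. The `SK_*` constants, the ternary-based `SK_CALC_VM_TRANS()` macro and the `SK_THP_ENABLED` switch are hypothetical stand-ins. They are not the kernel's `_calc_vm_trans()` or its real flag values. The sketch models a kernel built without THP.

```c
#include <assert.h>

/* Hypothetical bit values for illustration only; the real MAP_* and VM_*
 * constants are architecture- and kernel-specific. */
#define SK_MAP_GROWSDOWN  0x0100UL
#define SK_MAP_STACK      0x20000UL
#define SK_VM_GROWSDOWN   0x0100UL
#define SK_VM_NOHUGEPAGE  0x40000000UL

/* Simplified translation: if 'bit1' is set in 'x', yield 'bit2'. */
#define SK_CALC_VM_TRANS(x, bit1, bit2) (((x) & (bit1)) ? (bit2) : 0UL)

/* Set to 1 to model a kernel built with CONFIG_TRANSPARENT_HUGEPAGE. */
#ifndef SK_THP_ENABLED
#define SK_THP_ENABLED 0
#endif

static unsigned long sk_calc_vm_flag_bits(unsigned long flags)
{
	return SK_CALC_VM_TRANS(flags, SK_MAP_GROWSDOWN, SK_VM_GROWSDOWN) |
#if SK_THP_ENABLED
	       SK_CALC_VM_TRANS(flags, SK_MAP_STACK, SK_VM_NOHUGEPAGE) |
#endif
	       0UL;
}
```

Without THP, a MAP_STACK mapping no longer picks up the "nh" flag. As a result, there is nothing for a later madvise(MADV_NOHUGEPAGE) to restore.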
---
Changes in v3:
- Exclude non-stable patch (for huge_mm.h) from this series to avoid mixing stable and non-stable patches, as suggested by Andrew.
- Extend description in cover letter.
- Link to v2: https://lore.kernel.org/r/20250506-map-map_stack-to-vm_nohugepage-only-if-t…
Changes in v2:
- [Patch 1/2] Use '#ifdef' instead of '#if defined(...)'
- [Patch 1/2] Add 'Fixes: c4608d1bf7c6...'
- Create [Patch 2/2]
- Link to v1: https://lore.kernel.org/r/20250502-map-map_stack-to-vm_nohugepage-only-if-t…
---
include/linux/mman.h | 2 ++
1 file changed, 2 insertions(+)
diff --git a/include/linux/mman.h b/include/linux/mman.h
index bce214fece16b9af3791a2baaecd6063d0481938..f4c6346a8fcd29b08d43f7cd9158c3eddc3383e1 100644
--- a/include/linux/mman.h
+++ b/include/linux/mman.h
@@ -155,7 +155,9 @@ calc_vm_flag_bits(struct file *file, unsigned long flags)
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
_calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
_calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) |
+#endif
arch_calc_vm_flag_bits(file, flags);
}
---
base-commit: fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2
change-id: 20250428-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-ce40a1de095d
Best regards,
--
Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
The logic that drives the pad calibration values resides in the
controller reset domain, so the calibration values are only
captured while the controller is out of reset. However, by clearing the
CYA_TRK_CODE_UPDATE_ON_IDLE bit, the calibration values can be set
while the controller is in reset.
The CYA_TRK_CODE_UPDATE_ON_IDLE bit was previously cleared based on the
trk_hw_mode flag, but this dependency is not necessary. Instead,
introduce a new flag, trk_update_on_idle, to independently control this
bit.
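The following minimal sketch shows the register read-modify-write after the change, where each SoC flag controls its own bit. The bit positions (`SK_TRK_HW_MODE`, `SK_CYA_TRK_CODE_UPDATE_ON_IDLE`) and the reduced `struct sk_soc` are hypothetical. The real register layout and the `padctl_readl()`/`padctl_writel()` accessors are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define SK_TRK_HW_MODE                 (1u << 0)
#define SK_CYA_TRK_CODE_UPDATE_ON_IDLE (1u << 1)

/* Reduced stand-in for the per-SoC descriptor. */
struct sk_soc {
	bool trk_hw_mode;
	bool trk_update_on_idle;
};

/* Computes the new BIAS_PAD_CTL2 value from the old one: each flag now
 * controls exactly one bit, independently of the other. */
static uint32_t sk_bias_pad_ctl2(const struct sk_soc *soc, uint32_t value)
{
	if (soc->trk_update_on_idle)
		value &= ~SK_CYA_TRK_CODE_UPDATE_ON_IDLE;
	if (soc->trk_hw_mode)
		value |= SK_TRK_HW_MODE;
	return value;
}
```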
Fixes: d8163a32ca95 ("phy: tegra: xusb: Add Tegra234 support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Wayne Chang <waynec(a)nvidia.com>
---
drivers/phy/tegra/xusb-tegra186.c | 14 ++++++++------
drivers/phy/tegra/xusb.h | 1 +
2 files changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/phy/tegra/xusb-tegra186.c b/drivers/phy/tegra/xusb-tegra186.c
index fae6242aa730..dd0aaf305e90 100644
--- a/drivers/phy/tegra/xusb-tegra186.c
+++ b/drivers/phy/tegra/xusb-tegra186.c
@@ -650,14 +650,15 @@ static void tegra186_utmi_bias_pad_power_on(struct tegra_xusb_padctl *padctl)
udelay(100);
}
- if (padctl->soc->trk_hw_mode) {
- value = padctl_readl(padctl, XUSB_PADCTL_USB2_BIAS_PAD_CTL2);
- value |= USB2_TRK_HW_MODE;
+ value = padctl_readl(padctl, XUSB_PADCTL_USB2_BIAS_PAD_CTL2);
+ if (padctl->soc->trk_update_on_idle)
value &= ~CYA_TRK_CODE_UPDATE_ON_IDLE;
- padctl_writel(padctl, value, XUSB_PADCTL_USB2_BIAS_PAD_CTL2);
- } else {
+ if (padctl->soc->trk_hw_mode)
+ value |= USB2_TRK_HW_MODE;
+ padctl_writel(padctl, value, XUSB_PADCTL_USB2_BIAS_PAD_CTL2);
+
+ if (!padctl->soc->trk_hw_mode)
clk_disable_unprepare(priv->usb2_trk_clk);
- }
mutex_unlock(&padctl->lock);
}
@@ -1703,6 +1704,7 @@ const struct tegra_xusb_padctl_soc tegra234_xusb_padctl_soc = {
.supports_gen2 = true,
.poll_trk_completed = true,
.trk_hw_mode = true,
+ .trk_update_on_idle = true,
.supports_lp_cfg_en = true,
};
EXPORT_SYMBOL_GPL(tegra234_xusb_padctl_soc);
diff --git a/drivers/phy/tegra/xusb.h b/drivers/phy/tegra/xusb.h
index 6e45d194c689..d2b5f9565132 100644
--- a/drivers/phy/tegra/xusb.h
+++ b/drivers/phy/tegra/xusb.h
@@ -434,6 +434,7 @@ struct tegra_xusb_padctl_soc {
bool need_fake_usb3_port;
bool poll_trk_completed;
bool trk_hw_mode;
+ bool trk_update_on_idle;
bool supports_lp_cfg_en;
};
--
2.25.1
Hi Greg,
below is a backport for upstream patch
fd87b7783802 ("net: Fix the devmem sock opts and msgs for parisc").
This upstream patch does not apply cleanly against v6.13, and
backporting all the intermediate changes would be too much, so I created this
trivial standalone patch instead.
Can you please add the patch below to the stable queue for v6.13?
Thanks!
Helge
---
From: Pranjal Shrivastava <praan(a)google.com>
Date: Mon, 24 Mar 2025 07:42:27 +0000
Subject: [PATCH] net: Fix the devmem sock opts and msgs for parisc
The devmem socket options and socket control message definitions
introduced in the TCP devmem series[1] incorrectly continued the generic
socket option numbering in the arch/parisc header.
The UAPI change should be safe, as there are currently no drivers that
declare support for devmem TCP RX via PP_FLAG_ALLOW_UNREADABLE_NETMEM.
Fix the devmem socket options and socket control message definitions to
follow the 0x40xx numbering series used by arch/parisc.
[1] https://lore.kernel.org/lkml/20240910171458.219195-10-almasrymina@google.co…
Patch modified for kernel 6.13 by Helge Deller.
Fixes: 8f0b3cc9a4c10 ("tcp: RX path for devmem TCP")
Signed-off-by: Pranjal Shrivastava <praan(a)google.com>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git b/arch/parisc/include/uapi/asm/socket.h a/arch/parisc/include/uapi/asm/socket.h
index d268d69bfcd2..96831c988606 100644
--- b/arch/parisc/include/uapi/asm/socket.h
+++ a/arch/parisc/include/uapi/asm/socket.h
@@ -132,13 +132,15 @@
#define SO_PASSPIDFD 0x404A
#define SO_PEERPIDFD 0x404B
-#define SO_DEVMEM_LINEAR 78
+#define SCM_TS_OPT_ID 0x404C
+
+#define SO_RCVPRIORITY 0x404D
+
+#define SO_DEVMEM_LINEAR 0x404E
#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
-#define SO_DEVMEM_DMABUF 79
+#define SO_DEVMEM_DMABUF 0x404F
#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
-#define SO_DEVMEM_DONTNEED 80
-
-#define SCM_TS_OPT_ID 0x404C
+#define SO_DEVMEM_DONTNEED 0x4050
#if !defined(__KERNEL__)
This series adds SPI NOR support for STM32MP25 SoCs from STMicroelectronics.
On STM32MP25 SoCs family, an Octo Memory Manager block manages the muxing,
the memory area split, the chip select override and the time constraint
between its 2 Octo SPI children.
Due to these dependencies, this series adds support for:
- Octo Memory Manager driver.
- Octo SPI driver.
- YAML schema for Octo Memory Manager and Octo SPI drivers.
The device tree files add the Octo Memory Manager and its 2 associated Octo
SPI children in stm32mp251.dtsi and add SPI NOR support on the stm32mp257f-ev1
board.
Signed-off-by: Patrice Chotard <patrice.chotard(a)foss.st.com>
Changes in v13:
- Make firewall prototypes always exposed.
- Restore STM32_OMM Kconfig dependency from v11.
- Link to v12: https://lore.kernel.org/r/20250506-upstream_ospi_v6-v12-0-e3bb5a0d78fb@foss…
Changes in v12:
- Update Kconfig dependencies.
- Link to v11: https://lore.kernel.org/r/20250428-upstream_ospi_v6-v11-0-1548736fd9d2@foss…
Changes in v11:
- Add stm32_omm_toggle_child_clock(dev, false) in stm32_omm_disable_child() in case of error.
- Check MUXEN bit in stm32_omm_probe() to check if child clock must be disabled.
- Add dev_err_probe() in stm32_omm_probe().
- Link to v10: https://lore.kernel.org/r/20250422-upstream_ospi_v6-v10-0-6f4942a04e10@foss…
Changes in v10:
- Add of_node_put() in stm32_omm_set_amcr().
- Link to v9: https://lore.kernel.org/r/20250410-upstream_ospi_v6-v9-0-cf119508848a@foss.…
Changes in v9:
- split patchset by susbsystem, current one include only OMM related
patches.
- Update SPDX Identifiers to "GPL-2.0-only".
- Add of_node_put)() instm32_omm_set_amcr().
- Rework error path in stm32_omm_toggle_child_clock().
- Make usage of reset_control_acquire/release() in stm32_omm_disable_child()
and move reset_control_get in probe().
- Rename error label in stm32_omm_configure().
- Remove child compatible check in stm32_omm_probe().
- Make usage of devm_of_platform_populate().
- Link to v8: https://lore.kernel.org/r/20250407-upstream_ospi_v6-v8-0-7b7716c1c1f6@foss.…
Changes in v8:
- update OMM's dt-bindings:
- Remove minItems for clocks and resets properties.
- Fix st,syscfg-amcr items declaration.
- move power-domains property before vendor specific properties.
- Update compatible check wrongly introduced during internal tests in
stm32_omm.c.
- Move ommanager's node outside bus@42080000's node in stm32mp251.dtsi.
- Link to v7: https://lore.kernel.org/r/20250401-upstream_ospi_v6-v7-0-0ef28513ed81@foss.…
Changes in v7:
- update OMM's dt-bindings by updating:
- clock-names and reset-names properties.
- spi unit-address node.
- example.
- update stm32mp251.dtsi to match with OMM's bindings update.
- update stm32mp257f-ev1.dts to match with OMM's bindings update.
- Link to v6: https://lore.kernel.org/r/20250321-upstream_ospi_v6-v6-0-37bbcab43439@foss.…
Changes in v6:
- Update MAINTAINERS file.
- Remove previous patch 1/8 and 2/8, merged by Mark Brown in spi git tree.
- Fix Signed-off-by order for patch 3.
- OMM driver:
- Add dev_err_probe() in error path.
- Rename stm32_omm_enable_child_clock() to stm32_omm_toggle_child_clock().
- Reorder initialized/non-initialized variables in stm32_omm_configure()
and stm32_omm_probe().
- Move pm_runtime_disable() calls from stm32_omm_configure() to
stm32_omm_probe().
- Update children's clocks and reset management.
- Use of_platform_populate() to probe children.
- Add missing pm_runtime_disable().
- Remove useless stm32_omm_check_access's first parameter.
- Update OMM's dt-bindings by adding OSPI's clocks and resets.
- Update stm32mp251.dtsi by adding OSPI's clock and reset in OMM's node.
Changes in v5:
- Add Reviewed-by Krzysztof Kozlowski for patch 1 and 3.
Changes in v4:
- Add default value requested by Krzysztof for st,omm-req2ack-ns,
st,omm-cssel-ovr and st,omm-mux properties in st,stm32mp25-omm.yaml
- Remove constraint in free form test for st,omm-mux property.
- Fix drivers/memory/Kconfig by replacing TEST_COMPILE_ by COMPILE_TEST.
- Fix SPDX-License-Identifier for stm32-omm.c.
- Fix Kernel test robot by fixing dev_err() format in stm32-omm.c.
- Add missing pm_runtime_disable() in the error handling path in
stm32-omm.c.
- Replace an int by an unsigned int in stm32-omm.c
- Remove unneeded "," after terminator in stm32-omm.c.
- Update cover letter description to explain dependencies between
Octo Memory Manager and its 2 Octo SPI children.
Changes in v3:
- Squash defconfig patches 8 and 9.
- Update STM32 Octo Memory Manager controller bindings.
- Rename st,stm32-omm.yaml to st,stm32mp25-omm.yaml.
- Update STM32 OSPI controller bindings.
- Reorder DT properties in .dtsi and .dts files.
- Replace devm_reset_control_get_optional() by
devm_reset_control_get_optional_exclusive() in stm32_omm.c.
- Reintroduce region-memory-names management in stm32_omm.c.
- Rename stm32_ospi_tx_poll() and stm32_ospi_tx() respectively to
stm32_ospi_poll() and stm32_ospi_xfer() in spi-stm32-ospi.c.
- Set SPI_CONTROLLER_HALF_DUPLEX in controller flags in spi-stm32-ospi.c.
Changes in v2:
- Move STM32 Octo Memory Manager controller driver and bindings from
misc to memory-controllers.
- Update STM32 OSPI controller bindings.
- Update STM32 Octo Memory Manager controller bindings.
- Update STM32 Octo Memory Manager driver to match bindings update.
- Update DT to match bindings update.
Signed-off-by: Patrice Chotard <patrice.chotard(a)foss.st.com>
---
Patrice Chotard (4):
firewall: Always expose firewall prototype
dt-bindings: memory-controllers: Add STM32 Octo Memory Manager controller
memory: Add STM32 Octo Memory Manager driver
MAINTAINERS: add entry for STM32 OCTO MEMORY MANAGER driver
.../memory-controllers/st,stm32mp25-omm.yaml | 226 ++++++++++
MAINTAINERS | 6 +
drivers/memory/Kconfig | 17 +
drivers/memory/Makefile | 1 +
drivers/memory/stm32_omm.c | 476 +++++++++++++++++++++
include/linux/bus/stm32_firewall_device.h | 10 +-
6 files changed, 735 insertions(+), 1 deletion(-)
---
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
change-id: 20250320-upstream_ospi_v6-d432a8172105
Best regards,
--
Patrice Chotard <patrice.chotard(a)foss.st.com>
This patchset backports a series of ublk fixes from upstream to 6.14-stable.
Patch 7 fixes the race that can cause a kernel panic when the ublk server daemon is exiting.
It depends on patches 1-6, which simplify and improve IO canceling when the ublk server daemon
is exiting, as described here:
https://lore.kernel.org/linux-block/20250416035444.99569-1-ming.lei@redhat.…
Ming Lei (5):
ublk: add helper of ublk_need_map_io()
ublk: move device reset into ublk_ch_release()
ublk: remove __ublk_quiesce_dev()
ublk: simplify aborting ublk request
ublk: fix race between io_uring_cmd_complete_in_task and
ublk_cancel_cmd
Uday Shankar (2):
ublk: properly serialize all FETCH_REQs
ublk: improve detection and handling of ublk server exit
drivers/block/ublk_drv.c | 550 +++++++++++++++++++++------------------
1 file changed, 291 insertions(+), 259 deletions(-)
--
2.43.0
Commit fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM") made the
initialization of the local memcache variable in user_mem_abort()
conditional, leaving a codepath where it is used uninitialized via
kvm_pgtable_stage2_map().
This can fail on any path that requires a stage-2 allocation
without transition via a permission fault or dirty logging.
Fix this by making sure that memcache is always valid.
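The fix can be sketched in plain C as follows. The cache is chosen once, before any conditional, so every later use sees a valid pointer, and only the top-up remains conditional on the fault type. The structures and `sk_fault_cache()` are hypothetical. The `dirty_log_write` parameter stands in for `logging_active && write_fault`, and raising `pages` stands in for the real top-up helpers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two per-vCPU caches. */
struct sk_cache {
	int pages;
};

struct sk_vcpu {
	struct sk_cache mmu_page_cache;
	struct sk_cache pkvm_memcache;
};

static struct sk_cache *sk_fault_cache(struct sk_vcpu *vcpu, bool protected_kvm,
				       bool fault_is_perm, bool dirty_log_write,
				       int min_pages)
{
	/* Always initialized, whatever kind of fault this is. */
	struct sk_cache *memcache = protected_kvm ? &vcpu->pkvm_memcache
						  : &vcpu->mmu_page_cache;

	/* Only the top-up depends on the fault type. */
	if (!fault_is_perm || dirty_log_write) {
		if (memcache->pages < min_pages)
			memcache->pages = min_pages;	/* models the top-up */
	}
	return memcache;
}
```

A permission fault without dirty logging still returns a usable pointer, even though no top-up happens.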
Fixes: fce886a60207 ("KVM: arm64: Plumb the pKVM MMU in KVM")
Signed-off-by: Sebastian Ott <sebott(a)redhat.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/kvmarm/3f5db4c7-ccce-fb95-595c-692fa7aad227@redhat.…
---
arch/arm64/kvm/mmu.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 754f2fe0cc67..eeda92330ade 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1501,6 +1501,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
return -EFAULT;
}
+ if (!is_protected_kvm_enabled())
+ memcache = &vcpu->arch.mmu_page_cache;
+ else
+ memcache = &vcpu->arch.pkvm_memcache;
+
/*
* Permission faults just need to update the existing leaf entry,
* and so normally don't require allocations from the memcache. The
@@ -1510,13 +1515,11 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
if (!fault_is_perm || (logging_active && write_fault)) {
int min_pages = kvm_mmu_cache_min_pages(vcpu->arch.hw_mmu);
- if (!is_protected_kvm_enabled()) {
- memcache = &vcpu->arch.mmu_page_cache;
+ if (!is_protected_kvm_enabled())
ret = kvm_mmu_topup_memory_cache(memcache, min_pages);
- } else {
- memcache = &vcpu->arch.pkvm_memcache;
+ else
ret = topup_hyp_memcache(memcache, min_pages);
- }
+
if (ret)
return ret;
}
base-commit: 92a09c47464d040866cf2b4cd052bc60555185fb
--
2.49.0
Changes since v2 [1]:
* Drop the new x86_platform_op and just use
cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT) directly where needed
(Naveen)
* Make the restriction identical to lockdown and stop playing games with
devmem_is_allowed()
* Ensure that CONFIG_IO_STRICT_DEVMEM is enabled to avoid conflicting
mappings for userspace mappings of PCI MMIO.
The original response to Nikolay's report of an SEPT violation triggered
by /dev/mem access to private memory was "let's just turn off /dev/mem".
After some machinations of x86_platform_ops to block a subset of
problematic access, spelunking the history of devmem_is_allowed()
returning "2" to enable some compatibility benefits while blocking
access, and discovering that userspace depends on buggy kernel behavior for
mmap(2) of the first 1MB of memory on x86, the proposal has circled back
to "disable /dev/mem".
Require both STRICT_DEVMEM and IO_STRICT_DEVMEM for x86 confidential
guests to close the /dev/mem hole while still allowing userspace
mapping of PCI MMIO, as long as the kernel and userspace are not mapping
the range at the same time.
The range_is_allowed() cleanup is not strictly necessary, but it might as
well close a 17-year-old "TODO".
---
Dan Williams (2):
x86/devmem: Remove duplicate range_is_allowed() definition
x86/devmem: Drop /dev/mem access for confidential guests
arch/x86/Kconfig | 4 ++++
arch/x86/mm/pat/memtype.c | 31 ++++---------------------------
drivers/char/mem.c | 27 +++++++++------------------
include/linux/io.h | 21 +++++++++++++++++++++
4 files changed, 38 insertions(+), 45 deletions(-)
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
Do not mix class->size and object size during offsets/sizes
calculation in zs_obj_write(). Size classes can merge into
clusters, based on objects-per-zspage and pages-per-zspage
characteristics, so some size classes can store objects
smaller than class->size. This becomes problematic when the
object size is much smaller than class->size: because we use the
larger class->size for the check, we can conclude that the object
spans two physical pages while the actual object is much smaller
and fits in one physical page. There is then nothing to write to
the second page, and the memcpy() size calculation underflows.
We always know the exact size in bytes of the object
that we are about to write (store), so use it instead of
class->size.
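The corrected offset/size split can be modelled in user space with a minimal sketch. It assumes a 4096-byte page and a hypothetical 8-byte inline handle (`SK_HANDLE_SIZE`, standing in for `ZS_HANDLE_SIZE`). `sk_split()` is hypothetical, not a zsmalloc function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SK_PAGE_SIZE   4096u
#define SK_HANDLE_SIZE 8u	/* hypothetical stand-in for ZS_HANDLE_SIZE */

/* Splits a write of mem_len bytes at in-page offset obj_off into the
 * bytes for the first and second page. The page-crossing decision uses
 * the real object length, not the (possibly larger) class size. */
static unsigned int sk_split(size_t obj_off, bool huge, size_t mem_len,
			     size_t sizes[2])
{
	size_t off = obj_off;

	if (!huge)
		off += SK_HANDLE_SIZE;	/* skip the inline handle */

	if (off + mem_len <= SK_PAGE_SIZE) {
		sizes[0] = mem_len;	/* fits in one page */
		sizes[1] = 0;
		return 1;
	}
	sizes[0] = SK_PAGE_SIZE - off;	/* tail of the first page */
	sizes[1] = mem_len - sizes[0];	/* cannot underflow on this path */
	return 2;
}
```

Consider a 50-byte object at offset 3992 in a 200-byte size class. The offset after the handle is 4000. A check based on the class size compares 4000 + 200 against 4096 and takes the two-page path. It then computes `50 - 96` for the second chunk, which wraps around as a `size_t`. Using the object length keeps the write on one page.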
Reported-by: Igor Belousov <igor.b(a)beldev.am>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
---
mm/zsmalloc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/mm/zsmalloc.c b/mm/zsmalloc.c
index 70406ac94bbd..999b513c7fdf 100644
--- a/mm/zsmalloc.c
+++ b/mm/zsmalloc.c
@@ -1233,19 +1233,19 @@ void zs_obj_write(struct zs_pool *pool, unsigned long handle,
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);
- if (off + class->size <= PAGE_SIZE) {
+ if (!ZsHugePage(zspage))
+ off += ZS_HANDLE_SIZE;
+
+ if (off + mem_len <= PAGE_SIZE) {
/* this object is contained entirely within a page */
void *dst = kmap_local_zpdesc(zpdesc);
- if (!ZsHugePage(zspage))
- off += ZS_HANDLE_SIZE;
memcpy(dst + off, handle_mem, mem_len);
kunmap_local(dst);
} else {
/* this object spans two pages */
size_t sizes[2];
- off += ZS_HANDLE_SIZE;
sizes[0] = PAGE_SIZE - off;
sizes[1] = mem_len - sizes[0];
--
2.49.0.906.g1f30a19c02-goog
This patch fixes a potential deadlock bug. We observed that in the
mtk-cqdma.c file, most functions like mtk_cqdma_issue_pending() and
mtk_cqdma_free_active_desc() follow the correct locking sequence by
acquiring the pc lock first before taking the vc lock when handling the vc
and pc fields. However, in mtk_cqdma_tx_status(), the function incorrectly
acquires the vc lock first before calling mtk_cqdma_find_active_desc(),
which subsequently acquires the pc lock. This reversed lock acquisition
order (vc → pc) violates the established sequence (pc → vc) and could
potentially trigger deadlock scenarios.
To resolve this issue, we have moved the vc lock acquisition code from
mtk_cqdma_tx_status() into the mtk_cqdma_find_active_desc() function.
This adjustment ensures proper lock ordering while maintaining
functionality. Since mtk_cqdma_find_active_desc() is a static function
with only one call site in mtk_cqdma_tx_status(), this fix effectively
addresses the deadlock risk without introducing unintended side effects
to other components.
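The ordering rule described above can be checked mechanically, in a way similar in spirit to what lockdep does at runtime. The following is a minimal sketch of a rank-based checker. `SK_PC_LOCK` and `SK_VC_LOCK` are hypothetical stand-ins for the pc and vc locks. The sketch models only acquisition order, not the spinlocks themselves or interrupt state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical ranks: the lower rank must be acquired first (pc before vc). */
enum sk_rank { SK_PC_LOCK = 1, SK_VC_LOCK = 2 };

#define SK_MAX_HELD 4

struct sk_held {
	int depth;
	int ranks[SK_MAX_HELD];
};

/* Records an acquisition; refuses one that would invert the order. */
static bool sk_acquire(struct sk_held *h, int rank)
{
	if (h->depth == SK_MAX_HELD)
		return false;
	if (h->depth > 0 && h->ranks[h->depth - 1] >= rank)
		return false;	/* order inversion: potential ABBA deadlock */
	h->ranks[h->depth++] = rank;
	return true;
}

/* Releases the most recently acquired lock. */
static void sk_release(struct sk_held *h)
{
	if (h->depth > 0)
		h->depth--;
}
```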
This possible bug was found by an experimental static analysis tool
developed by our team. This tool analyzes the locking APIs to extract
function pairs that can be concurrently executed, and then analyzes the
instructions in the paired functions to identify possible concurrency bugs
including data races and atomicity violations.
Fixes: b1f01e48df5a ("dmaengine: mediatek: Add MediaTek Command-Queue DMA controller for MT6765 SoC")
Cc: stable(a)vger.kernel.org
Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com>
---
drivers/dma/mediatek/mtk-cqdma.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/mediatek/mtk-cqdma.c b/drivers/dma/mediatek/mtk-cqdma.c
index d5ddb4e30e71..656354bccb44 100644
--- a/drivers/dma/mediatek/mtk-cqdma.c
+++ b/drivers/dma/mediatek/mtk-cqdma.c
@@ -423,11 +423,14 @@ static struct virt_dma_desc *mtk_cqdma_find_active_desc(struct dma_chan *c,
unsigned long flags;
spin_lock_irqsave(&cvc->pc->lock, flags);
+ spin_lock_irqsave(&cvc->vc.lock, flags);
list_for_each_entry(vd, &cvc->pc->queue, node)
if (vd->tx.cookie == cookie) {
+ spin_unlock_irqrestore(&cvc->vc.lock, flags);
spin_unlock_irqrestore(&cvc->pc->lock, flags);
return vd;
}
+ spin_unlock_irqrestore(&cvc->vc.lock, flags);
spin_unlock_irqrestore(&cvc->pc->lock, flags);
list_for_each_entry(vd, &cvc->vc.desc_issued, node)
@@ -452,9 +455,7 @@ static enum dma_status mtk_cqdma_tx_status(struct dma_chan *c,
if (ret == DMA_COMPLETE || !txstate)
return ret;
- spin_lock_irqsave(&cvc->vc.lock, flags);
vd = mtk_cqdma_find_active_desc(c, cookie);
- spin_unlock_irqrestore(&cvc->vc.lock, flags);
if (vd) {
cvd = to_cqdma_vdesc(vd);
--
2.34.1
Initialize current_be_id to 0 in the AMD legacy stack (no DSP enabled) SoundWire
generic machine driver code to handle the unlikely case when there are no
devices connected to a DAI.
In this case, create_sdw_dailink() would return without touching the passed
pointer to current_be_id.
Found by gcc -fanalyzer
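The following minimal sketch reproduces the pattern in isolation: a helper that writes its out-parameter only on some paths, and a caller that initializes the variable so its value is always defined. `sk_create_link()` and `sk_current_be_id()` are hypothetical, and the value written is purely illustrative.

```c
#include <assert.h>

/* Hypothetical helper: like create_sdw_dailink() as described above, it
 * writes *be_id only when there is at least one device to link. */
static int sk_create_link(int num_devices, int *be_id)
{
	if (num_devices == 0)
		return 0;		/* *be_id left untouched */
	*be_id = num_devices;	/* illustrative value only */
	return 0;
}

static int sk_current_be_id(int num_devices)
{
	int current_be_id = 0;	/* defined even if the helper skips the write */
	int ret = sk_create_link(num_devices, &current_be_id);

	if (ret)
		return ret;
	return current_be_id;
}
```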
Cc: stable(a)vger.kernel.org
Fixes: 2981d9b0789c4 ("ASoC: amd: acp: add soundwire machine driver for legacy stack")
Signed-off-by: Vijendar Mukunda <Vijendar.Mukunda(a)amd.com>
---
sound/soc/amd/acp/acp-sdw-legacy-mach.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/amd/acp/acp-sdw-legacy-mach.c b/sound/soc/amd/acp/acp-sdw-legacy-mach.c
index 2020c5cfb3d5..582c68aee6e5 100644
--- a/sound/soc/amd/acp/acp-sdw-legacy-mach.c
+++ b/sound/soc/amd/acp/acp-sdw-legacy-mach.c
@@ -272,7 +272,7 @@ static int create_sdw_dailinks(struct snd_soc_card *card,
/* generate DAI links by each sdw link */
while (soc_dais->initialised) {
- int current_be_id;
+ int current_be_id = 0;
ret = create_sdw_dailink(card, soc_dais, dai_links,
&current_be_id, codec_conf, sdw_platform_component);
--
2.45.2
This series adds support for camera clock controller base driver,
bindings and DT support on sc8180x platform.
Signed-off-by: Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
---
Changes in v2:
- New patch [1/4] to add all the missing gcc bindings along with
the required GCC_CAMERA_AHB_CLOCK
- As per Konrad's comments, add the camera AHB clock dependency in the
DT and yaml bindings.
- As per Vladimir's comments, update the Kconfig to add the SC8180X config
in correct alphanumerical order.
- Link to v1: https://lore.kernel.org/r/20250422-sc8180x-camcc-support-v1-0-691614d13f06@…
---
Satya Priya Kakitapalli (4):
dt-bindings: clock: qcom: Add missing bindings on gcc-sc8180x
dt-bindings: clock: Add Qualcomm SC8180X Camera clock controller
clk: qcom: camcc-sc8180x: Add SC8180X camera clock controller driver
arm64: dts: qcom: Add camera clock controller for sc8180x
.../bindings/clock/qcom,sc8180x-camcc.yaml | 67 +
arch/arm64/boot/dts/qcom/sc8180x.dtsi | 14 +
drivers/clk/qcom/Kconfig | 10 +
drivers/clk/qcom/Makefile | 1 +
drivers/clk/qcom/camcc-sc8180x.c | 2897 ++++++++++++++++++++
include/dt-bindings/clock/qcom,gcc-sc8180x.h | 12 +
include/dt-bindings/clock/qcom,sc8180x-camcc.h | 181 ++
7 files changed, 3182 insertions(+)
---
base-commit: bc8aa6cdadcc00862f2b5720e5de2e17f696a081
change-id: 20250422-sc8180x-camcc-support-9a82507d2a39
Best regards,
--
Satya Priya Kakitapalli <quic_skakitap(a)quicinc.com>
From: Wayne Lin <Wayne.Lin(a)amd.com>
[Why]
Forcing aux->transfer to return 0 on an incomplete AUX write is
inappropriate. It should return the number of bytes that have been
transferred.
[How]
aux->transfer must not change the original msg, except for the reply field
of the drm_dp_aux_msg structure. Copy msg->buffer for a write request,
and overwrite the first byte of the copy when the sink replies with 1 byte
indicating the number of partially written bytes. We can then return the
correct value without changing the original msg.
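The copy-then-overwrite approach can be sketched in user space as follows. The sink's 1-byte reply lands in a private copy, so the caller's buffer is never modified, and the partial byte count is returned. `struct sk_msg` and `sk_aux_write()` are hypothetical simplifications. In particular, "full write" is modelled here as `accepted >= size` rather than by decoding an AUX reply code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical reduced AUX message. */
struct sk_msg {
	uint8_t buffer[16];
	size_t size;
};

/* Models a write of which the sink accepted 'accepted' bytes. On a partial
 * write the sink's 1-byte reply (the count) lands in a private copy, so
 * msg->buffer is left unchanged, and that count is returned. */
static long sk_aux_write(const struct sk_msg *msg, uint8_t accepted)
{
	uint8_t copy[16];

	if (msg->size > sizeof(copy))
		return -E2BIG;
	memcpy(copy, msg->buffer, msg->size);

	if ((size_t)accepted >= msg->size)
		return (long)msg->size;	/* full ACK */

	copy[0] = accepted;		/* sink reply overwrites byte 0 */
	return copy[0];			/* bytes actually written */
}
```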
Fixes: 6285f12bc54c ("drm/amd/display: Fix wrong handling for AUX_DEFER case")
Cc: stable(a)vger.kernel.org
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Reviewed-by: Ray Wu <ray.wu(a)amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin(a)amd.com>
Signed-off-by: Ray Wu <ray.wu(a)amd.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 ++-
.../drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 10 ++++++++--
2 files changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 8984e211dd1c..36c16030fca9 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -12853,7 +12853,8 @@ int amdgpu_dm_process_dmub_aux_transfer_sync(
/* The reply is stored in the top nibble of the command. */
payload->reply[0] = (adev->dm.dmub_notify->aux_reply.command >> 4) & 0xF;
- if (!payload->write && p_notify->aux_reply.length)
+ /*write req may receive a byte indicating partially written number as well*/
+ if (p_notify->aux_reply.length)
memcpy(payload->data, p_notify->aux_reply.data,
p_notify->aux_reply.length);
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index d19aea595722..0d7b72c75802 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -62,6 +62,7 @@ static ssize_t dm_dp_aux_transfer(struct drm_dp_aux *aux,
enum aux_return_code_type operation_result;
struct amdgpu_device *adev;
struct ddc_service *ddc;
+ uint8_t copy[16];
if (WARN_ON(msg->size > 16))
return -E2BIG;
@@ -77,6 +78,11 @@ static ssize_t dm_dp_aux_transfer(struct drm_dp_aux *aux,
(msg->request & DP_AUX_I2C_WRITE_STATUS_UPDATE) != 0;
payload.defer_delay = 0;
+ if (payload.write) {
+ memcpy(copy, msg->buffer, msg->size);
+ payload.data = copy;
+ }
+
result = dc_link_aux_transfer_raw(TO_DM_AUX(aux)->ddc_service, &payload,
&operation_result);
@@ -100,9 +106,9 @@ static ssize_t dm_dp_aux_transfer(struct drm_dp_aux *aux,
*/
if (payload.write && result >= 0) {
if (result) {
- /*one byte indicating partially written bytes. Force 0 to retry*/
+ /*one byte indicating partially written bytes*/
drm_info(adev_to_drm(adev), "amdgpu: AUX partially written\n");
- result = 0;
+ result = payload.data[0];
} else if (!payload.reply[0])
/*I2C_ACK|AUX_ACK*/
result = msg->size;
--
2.43.0
Hello all,
After upgrading to 6.14.3 on my PC with an MT7925 chip, I noticed that I could no longer ping *.local addresses provided by Avahi. In addition, I also noticed that I was not able to get a DHCP IPv6 address from my router, no matter how many times I rebooted the router or reconnected with NetworkManager.
Reverting to 6.14.2 fixes both mDNS and IPv6 addresses immediately. Going back to 6.14.3 immediately breaks mDNS again, but the IPv6 address will stay there for a while before disappearing later, possibly because the DHCP lease expired? I am not sure exactly when it stops working.
I've done a kernel bisect between 6.14.2 and 6.14.3 and found the offending commit that causes mDNS to fail:
commit 80007d3f92fd018d0a052a706400e976b36e3c87
Author: Ming Yen Hsieh <mingyen.hsieh(a)mediatek.com>
Date: Tue Mar 4 16:08:50 2025 -0800
wifi: mt76: mt7925: integrate *mlo_sta_cmd and *sta_cmd
commit cb1353ef34735ec1e5d9efa1fe966f05ff1dc1e1 upstream.
Integrate *mlo_sta_cmd and *sta_cmd for the MLO firmware.
Fixes: 86c051f2c418 ("wifi: mt76: mt7925: enabling MLO when the firmware supports it")
drivers/net/wireless/mediatek/mt76/mt7925/mcu.c | 59 ++++-------------------------------------------------------
1 file changed, 4 insertions(+), 55 deletions(-)
I do not know if this same commit is also causing the IPv6 issues as testing that requires quite a bit of time to reproduce. What I do know with certainty as of this moment is that it definitely breaks in kernel 6.14.3.
I've attached my hardware info as well as dmesg logs from the last working kernel from the bisect and 6.14.4 which exhibits the issue. Please let me know if there's any other info you need.
Thanks!
Benjamin Xiao
... and make setting MADV_NOHUGEPAGE with madvise() into a no-op if THP
is not enabled.
I discovered this issue when trying to use CRIU to checkpoint
and restore a container. Our running kernel is compiled without
CONFIG_TRANSPARENT_HUGEPAGE. CRIU parses the output of
/proc/<pid>/smaps and saves the "nh" flag. When trying to restore the
container, CRIU fails to restore the "nh" mappings, since
madvise(MADV_NOHUGEPAGE) always returns an error when
CONFIG_TRANSPARENT_HUGEPAGE is not defined.
These patches:
- Avoid mapping MAP_STACK to VM_NOHUGEPAGE if !THP
- Avoid returning an error when calling madvise() with MADV_NOHUGEPAGE
if !THP
Signed-off-by: Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
---
Changes in v2:
- [Patch 1/2] Use '#ifdef' instead of '#if defined(...)'
- [Patch 1/2] Add 'Fixes: c4608d1bf7c6...'
- Create [Patch 2/2]
- Link to v1: https://lore.kernel.org/r/20250502-map-map_stack-to-vm_nohugepage-only-if-t…
---
Ignacio Moreno Gonzalez (2):
mm: mmap: map MAP_STACK to VM_NOHUGEPAGE only if THP is enabled
mm: madvise: no-op for MADV_NOHUGEPAGE if THP is disabled
include/linux/huge_mm.h | 6 ++++++
include/linux/mman.h | 2 ++
2 files changed, 8 insertions(+)
---
base-commit: fc96b232f8e7c0a6c282f47726b2ff6a5fb341d2
change-id: 20250428-map-map_stack-to-vm_nohugepage-only-if-thp-is-enabled-ce40a1de095d
Best regards,
--
Ignacio Moreno Gonzalez <Ignacio.MorenoGonzalez(a)kuka.com>
This patchset backports a series of ublk fixes from upstream to 6.14-stable.
Patch 7 fixes a race that can cause a kernel panic when the ublk server daemon is exiting.
It depends on patches 1-6, which simplify and improve IO canceling when the ublk server daemon
is exiting, as described here:
https://lore.kernel.org/linux-block/20250416035444.99569-1-ming.lei@redhat.…
Ming Lei (5):
ublk: add helper of ublk_need_map_io()
ublk: move device reset into ublk_ch_release()
ublk: remove __ublk_quiesce_dev()
ublk: simplify aborting ublk request
ublk: fix race between io_uring_cmd_complete_in_task and
ublk_cancel_cmd
Uday Shankar (2):
ublk: properly serialize all FETCH_REQs
ublk: improve detection and handling of ublk server exit
drivers/block/ublk_drv.c | 550 +++++++++++++++++++++------------------
1 file changed, 291 insertions(+), 259 deletions(-)
--
2.43.0
From: Ming Lei <ming.lei(a)redhat.com>
ublk_cancel_cmd() calls io_uring_cmd_done() to complete the uring_cmd, but
task work may already have been scheduled via io_uring_cmd_complete_in_task()
to dispatch the request, in which case a kernel crash can be triggered.
Fix it by not trying to cancel the command if the ublk block request has
been started.
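The decision the fixed ublk_cancel_cmd() makes can be modeled in userspace as a small sketch. Everything here is hypothetical: struct model_io, the MODEL_IO_FLAG_* values and model_cancel_cmd() only mirror the checks in the patch, with req_started standing in for blk_mq_request_started() and a true return value standing in for calling io_uring_cmd_done(). The real driver also takes ubq->cancel_lock around the CANCELED check, which this single-threaded model omits.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model only. */
#define MODEL_IO_FLAG_ACTIVE   0x1
#define MODEL_IO_FLAG_CANCELED 0x2

struct model_io {
    unsigned int flags;
    bool req_started; /* stands in for blk_mq_request_started(req) */
};

/* Return true if the caller should complete the uring_cmd now. */
bool model_cancel_cmd(struct model_io *io)
{
    bool done;

    /* The command is not held by the driver: nothing to cancel. */
    if (!(io->flags & MODEL_IO_FLAG_ACTIVE))
        return false;

    /*
     * The request was started, so task work scheduled through
     * io_uring_cmd_complete_in_task() will finish this command;
     * completing it here too is the race the patch fixes.
     */
    if (io->req_started)
        return false;

    /* Cancel each command at most once. */
    done = io->flags & MODEL_IO_FLAG_CANCELED;
    if (!done)
        io->flags |= MODEL_IO_FLAG_CANCELED;
    return !done;
}
```

A started request is left alone: either it is aborted later and its command is canceled on a subsequent pass, or the dispatch task work completes it.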
Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd")
Reported-by: Jared Holzman <jholzman(a)nvidia.com>
Tested-by: Jared Holzman <jholzman(a)nvidia.com>
Closes: https://lore.kernel.org/linux-block/d2179120-171b-47ba-b664-23242981ef19@nv…
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20250425013742.1079549-3-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/ublk_drv.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 6000147ac2a5..348c4feb7a2d 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1655,14 +1655,31 @@ static void ublk_start_cancel(struct ublk_queue *ubq)
ublk_put_disk(disk);
}
-static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
+static void ublk_cancel_cmd(struct ublk_queue *ubq, unsigned tag,
unsigned int issue_flags)
{
+ struct ublk_io *io = &ubq->ios[tag];
+ struct ublk_device *ub = ubq->dev;
+ struct request *req;
bool done;
if (!(io->flags & UBLK_IO_FLAG_ACTIVE))
return;
+ /*
+ * Don't try to cancel this command if the request is started for
+ * avoiding race between io_uring_cmd_done() and
+ * io_uring_cmd_complete_in_task().
+ *
+ * Either the started request will be aborted via __ublk_abort_rq(),
+ * then this uring_cmd is canceled next time, or it will be done in
+ * task work function ublk_dispatch_req() because io_uring guarantees
+ * that ublk_dispatch_req() is always called
+ */
+ req = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], tag);
+ if (req && blk_mq_request_started(req))
+ return;
+
spin_lock(&ubq->cancel_lock);
done = !!(io->flags & UBLK_IO_FLAG_CANCELED);
if (!done)
@@ -1694,7 +1711,6 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
struct ublk_queue *ubq = pdu->ubq;
struct task_struct *task;
- struct ublk_io *io;
if (WARN_ON_ONCE(!ubq))
return;
@@ -1709,9 +1725,8 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
if (!ubq->canceling)
ublk_start_cancel(ubq);
- io = &ubq->ios[pdu->tag];
- WARN_ON_ONCE(io->cmd != cmd);
- ublk_cancel_cmd(ubq, io, issue_flags);
+ WARN_ON_ONCE(ubq->ios[pdu->tag].cmd != cmd);
+ ublk_cancel_cmd(ubq, pdu->tag, issue_flags);
}
static inline bool ublk_queue_ready(struct ublk_queue *ubq)
@@ -1724,7 +1739,7 @@ static void ublk_cancel_queue(struct ublk_queue *ubq)
int i;
for (i = 0; i < ubq->q_depth; i++)
- ublk_cancel_cmd(ubq, &ubq->ios[i], IO_URING_F_UNLOCKED);
+ ublk_cancel_cmd(ubq, i, IO_URING_F_UNLOCKED);
}
/* Cancel all pending commands, must be called after del_gendisk() returns */
--
2.43.0
From: Ming Lei <ming.lei(a)redhat.com>
Now that ublk_abort_queue() has moved to the ublk char device release
handler, and the request queue is "quiesced" (either ->canceling was set
by the uring_cmd cancel function, or all IOs are inflight and can't be
completed by the ublk server), things become much simpler:
- all uring_cmds are done, so we no longer need to mark the io as
UBLK_IO_FLAG_ABORTED for handling completion from uring_cmd
- the ublk char device is closed, so no one can hold an IO request
reference any more; we can simply complete the request, or requeue it
if ublk_nosrv_should_reissue_outstanding() is true.
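As a rough userspace sketch of the simplified abort path: once the char device is closed, each started request is either requeued or completed with -EIO, with no ABORTED flag and no reference counting. The names below (struct model_req, model_fail_req(), model_abort_queue()) are hypothetical; reissue stands in for ublk_nosrv_should_reissue_outstanding(), and the model skips the other per-io checks the driver performs.

```c
#include <errno.h>
#include <stdbool.h>

enum model_fail_action {
    MODEL_REQUEUE,      /* like blk_mq_requeue_request(req, false) */
    MODEL_COMPLETE_EIO, /* like io->res = -EIO; __ublk_complete_rq(req) */
};

struct model_req {
    bool started; /* stands in for blk_mq_request_started() */
    int res;
};

/* Shape of __ublk_fail_req() after the patch. */
enum model_fail_action model_fail_req(struct model_req *req, bool reissue)
{
    if (reissue)
        return MODEL_REQUEUE;
    req->res = -EIO;
    return MODEL_COMPLETE_EIO;
}

/* Like ublk_abort_queue(): only started requests are failed or requeued. */
void model_abort_queue(struct model_req *reqs, int depth, bool reissue,
                       int *requeued, int *failed)
{
    *requeued = 0;
    *failed = 0;
    for (int i = 0; i < depth; i++) {
        if (!reqs[i].started)
            continue;
        if (model_fail_req(&reqs[i], reissue) == MODEL_REQUEUE)
            (*requeued)++;
        else
            (*failed)++;
    }
}
```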
Reviewed-by: Uday Shankar <ushankar(a)purestorage.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-8-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/ublk_drv.c | 82 ++++++++++------------------------------
1 file changed, 20 insertions(+), 62 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c3f576a9dbf2..6000147ac2a5 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -115,15 +115,6 @@ struct ublk_uring_cmd_pdu {
*/
#define UBLK_IO_FLAG_OWNED_BY_SRV 0x02
-/*
- * IO command is aborted, so this flag is set in case of
- * !UBLK_IO_FLAG_ACTIVE.
- *
- * After this flag is observed, any pending or new incoming request
- * associated with this io command will be failed immediately
- */
-#define UBLK_IO_FLAG_ABORTED 0x04
-
/*
* UBLK_IO_FLAG_NEED_GET_DATA is set because IO command requires
* get data buffer address from ublksrv.
@@ -1054,12 +1045,6 @@ static inline void __ublk_complete_rq(struct request *req)
unsigned int unmapped_bytes;
blk_status_t res = BLK_STS_OK;
- /* called from ublk_abort_queue() code path */
- if (io->flags & UBLK_IO_FLAG_ABORTED) {
- res = BLK_STS_IOERR;
- goto exit;
- }
-
/* failed read IO if nothing is read */
if (!io->res && req_op(req) == REQ_OP_READ)
io->res = -EIO;
@@ -1109,47 +1094,6 @@ static void ublk_complete_rq(struct kref *ref)
__ublk_complete_rq(req);
}
-static void ublk_do_fail_rq(struct request *req)
-{
- struct ublk_queue *ubq = req->mq_hctx->driver_data;
-
- if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
- blk_mq_requeue_request(req, false);
- else
- __ublk_complete_rq(req);
-}
-
-static void ublk_fail_rq_fn(struct kref *ref)
-{
- struct ublk_rq_data *data = container_of(ref, struct ublk_rq_data,
- ref);
- struct request *req = blk_mq_rq_from_pdu(data);
-
- ublk_do_fail_rq(req);
-}
-
-/*
- * Since ublk_rq_task_work_cb always fails requests immediately during
- * exiting, __ublk_fail_req() is only called from abort context during
- * exiting. So lock is unnecessary.
- *
- * Also aborting may not be started yet, keep in mind that one failed
- * request may be issued by block layer again.
- */
-static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
- struct request *req)
-{
- WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
-
- if (ublk_need_req_ref(ubq)) {
- struct ublk_rq_data *data = blk_mq_rq_to_pdu(req);
-
- kref_put(&data->ref, ublk_fail_rq_fn);
- } else {
- ublk_do_fail_rq(req);
- }
-}
-
static void ubq_complete_io_cmd(struct ublk_io *io, int res,
unsigned issue_flags)
{
@@ -1639,10 +1583,26 @@ static void ublk_commit_completion(struct ublk_device *ub,
ublk_put_req_ref(ubq, req);
}
+static void __ublk_fail_req(struct ublk_queue *ubq, struct ublk_io *io,
+ struct request *req)
+{
+ WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_ACTIVE);
+
+ if (ublk_nosrv_should_reissue_outstanding(ubq->dev))
+ blk_mq_requeue_request(req, false);
+ else {
+ io->res = -EIO;
+ __ublk_complete_rq(req);
+ }
+}
+
/*
- * Called from ubq_daemon context via cancel fn, meantime quiesce ublk
- * blk-mq queue, so we are called exclusively with blk-mq and ubq_daemon
- * context, so everything is serialized.
+ * Called from ublk char device release handler, when any uring_cmd is
+ * done, meantime request queue is "quiesced" since all inflight requests
+ * can't be completed because ublk server is dead.
+ *
+ * So no one can hold our request IO reference any more, simply ignore the
+ * reference, and complete the request immediately
*/
static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
{
@@ -1659,10 +1619,8 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
* will do it
*/
rq = blk_mq_tag_to_rq(ub->tag_set.tags[ubq->q_id], i);
- if (rq && blk_mq_request_started(rq)) {
- io->flags |= UBLK_IO_FLAG_ABORTED;
+ if (rq && blk_mq_request_started(rq))
__ublk_fail_req(ubq, io, rq);
- }
}
}
}
--
2.43.0
From: Uday Shankar <ushankar(a)purestorage.com>
There are currently two ways in which ublk server exit is detected by
ublk_drv:
1. uring_cmd cancellation. If there are any outstanding uring_cmds which
have not been completed to the ublk server when it exits, io_uring
calls the uring_cmd callback with a special cancellation flag as the
issuing task is exiting.
2. I/O timeout. This is needed in addition to the above to handle the
"saturated queue" case, when all I/Os for a given queue are in the
ublk server, and therefore there are no outstanding uring_cmds to
cancel when the ublk server exits.
There are a couple of issues with this approach:
- It is complex and inelegant to have two methods to detect the same
condition
- The second method detects ublk server exit only after a long delay
(~30s, the default timeout assigned by the block layer). This delays
the nosrv behavior from kicking in, along with any subsequent recovery
of the device.
The second issue is brought to light by the new test_generic_06, which
will be added in a following patch. It fails before this fix:
selftests: ublk: test_generic_06.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 30.0611 s, 0.0 kB/s
DEAD
dd took 31 seconds to exit (>= 5s tolerance)!
generic_06 : [FAIL]
Fix this by instead detecting and handling ublk server exit in the
character file release callback. This has several advantages:
- This one place can handle both saturated and unsaturated queues. Thus,
it replaces both preexisting methods of detecting ublk server exit.
- It runs quickly on ublk server exit - there is no 30s delay.
- It starts the process of removing task references in ublk_drv. This is
needed if we want to relax restrictions in the driver, such as letting
only one thread serve each queue.
The downside is that the character file release callback can also be
triggered by an intentional close of the file, which is a significant
behavior change. Preexisting ublk servers (libublksrv) depend on being
able to open/close the file multiple times. To address this, only
transition to a nosrv state if the file is released while the ublk
device is live. This allows programs to open/close the file multiple
times during setup. It is still a behavior change if a ublk server
decides to close/reopen the file while the device is LIVE (i.e. while
it is responsible for serving I/O), but that would be highly unusual.
This behavior is in line with what FUSE does; FUSE is very similar to
ublk in that a userspace daemon provides services traditionally
provided by the kernel.
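The state transition performed in the new release handler can be summarized with a hypothetical userspace model. The enum values echo UBLK_S_DEV_QUIESCED, UBLK_S_DEV_FAIL_IO and UBLK_S_DEV_DEAD from the patch, and the booleans stand in for ub->ub_disk, ublk_nosrv_should_stop_dev() and ublk_nosrv_dev_should_queue_io(). None of these names are the driver's API, and the model leaves out the request aborting and queue quiescing that precede the transition.

```c
#include <stdbool.h>

enum model_dev_state {
    MODEL_DEV_LIVE,
    MODEL_DEV_QUIESCED,
    MODEL_DEV_FAIL_IO,
    MODEL_DEV_DEAD,
};

struct model_dev {
    bool disk_attached;   /* ub->ub_disk != NULL */
    bool should_stop;     /* ublk_nosrv_should_stop_dev() */
    bool should_queue_io; /* ublk_nosrv_dev_should_queue_io() */
    enum model_dev_state state;
};

/* What releasing the ublk char device does to the device state. */
void model_ch_release(struct model_dev *ub)
{
    /* No disk: device not live yet (setup open/close) or already removed. */
    if (!ub->disk_attached)
        return;

    if (ub->should_stop) {
        /* ublk_stop_dev_unlocked(): del_gendisk() and detach the disk */
        ub->state = MODEL_DEV_DEAD;
        ub->disk_attached = false;
    } else if (ub->should_queue_io) {
        ub->state = MODEL_DEV_QUIESCED; /* __ublk_quiesce_dev() */
    } else {
        ub->state = MODEL_DEV_FAIL_IO;  /* new I/O fails fast */
    }
}
```

The early return is what keeps open/close during setup harmless: without an attached disk, releasing the file changes nothing.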
With this change in, the new test (and all other selftests, and all
ublksrv tests) pass:
selftests: ublk: test_generic_06.sh
dev id is 0
dd: error writing '/dev/ublkb0': Input/output error
1+0 records in
0+0 records out
0 bytes copied, 0.0376731 s, 0.0 kB/s
DEAD
generic_04 : [PASS]
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-6-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/ublk_drv.c | 223 ++++++++++++++++++++++-----------------
1 file changed, 124 insertions(+), 99 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index c619df880c72..652742db0396 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -194,8 +194,6 @@ struct ublk_device {
struct completion completion;
unsigned int nr_queues_ready;
unsigned int nr_privileged_daemon;
-
- struct work_struct nosrv_work;
};
/* header of ublk_params */
@@ -204,7 +202,10 @@ struct ublk_params_header {
__u32 types;
};
-static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq);
+
+static void ublk_stop_dev_unlocked(struct ublk_device *ub);
+static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq);
+static void __ublk_quiesce_dev(struct ublk_device *ub);
static inline unsigned int ublk_req_build_flags(struct request *req);
static inline struct ublksrv_io_desc *ublk_get_iod(struct ublk_queue *ubq,
@@ -1306,8 +1307,6 @@ static void ublk_queue_cmd_list(struct ublk_queue *ubq, struct rq_list *l)
static enum blk_eh_timer_return ublk_timeout(struct request *rq)
{
struct ublk_queue *ubq = rq->mq_hctx->driver_data;
- unsigned int nr_inflight = 0;
- int i;
if (ubq->flags & UBLK_F_UNPRIVILEGED_DEV) {
if (!ubq->timeout) {
@@ -1318,26 +1317,6 @@ static enum blk_eh_timer_return ublk_timeout(struct request *rq)
return BLK_EH_DONE;
}
- if (!ubq_daemon_is_dying(ubq))
- return BLK_EH_RESET_TIMER;
-
- for (i = 0; i < ubq->q_depth; i++) {
- struct ublk_io *io = &ubq->ios[i];
-
- if (!(io->flags & UBLK_IO_FLAG_ACTIVE))
- nr_inflight++;
- }
-
- /* cancelable uring_cmd can't help us if all commands are in-flight */
- if (nr_inflight == ubq->q_depth) {
- struct ublk_device *ub = ubq->dev;
-
- if (ublk_abort_requests(ub, ubq)) {
- schedule_work(&ub->nosrv_work);
- }
- return BLK_EH_DONE;
- }
-
return BLK_EH_RESET_TIMER;
}
@@ -1495,13 +1474,105 @@ static void ublk_reset_ch_dev(struct ublk_device *ub)
ub->nr_privileged_daemon = 0;
}
+static struct gendisk *ublk_get_disk(struct ublk_device *ub)
+{
+ struct gendisk *disk;
+
+ spin_lock(&ub->lock);
+ disk = ub->ub_disk;
+ if (disk)
+ get_device(disk_to_dev(disk));
+ spin_unlock(&ub->lock);
+
+ return disk;
+}
+
+static void ublk_put_disk(struct gendisk *disk)
+{
+ if (disk)
+ put_device(disk_to_dev(disk));
+}
+
static int ublk_ch_release(struct inode *inode, struct file *filp)
{
struct ublk_device *ub = filp->private_data;
+ struct gendisk *disk;
+ int i;
+
+ /*
+ * disk isn't attached yet, either device isn't live, or it has
+ * been removed already, so we needn't to do anything
+ */
+ disk = ublk_get_disk(ub);
+ if (!disk)
+ goto out;
+
+ /*
+ * All uring_cmd are done now, so abort any request outstanding to
+ * the ublk server
+ *
+ * This can be done in lockless way because ublk server has been
+ * gone
+ *
+ * More importantly, we have to provide forward progress guarantee
+ * without holding ub->mutex, otherwise control task grabbing
+ * ub->mutex triggers deadlock
+ *
+ * All requests may be inflight, so ->canceling may not be set, set
+ * it now.
+ */
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
+ struct ublk_queue *ubq = ublk_get_queue(ub, i);
+
+ ubq->canceling = true;
+ ublk_abort_queue(ub, ubq);
+ }
+ blk_mq_kick_requeue_list(disk->queue);
+
+ /*
+ * All infligh requests have been completed or requeued and any new
+ * request will be failed or requeued via `->canceling` now, so it is
+ * fine to grab ub->mutex now.
+ */
+ mutex_lock(&ub->mutex);
+
+ /* double check after grabbing lock */
+ if (!ub->ub_disk)
+ goto unlock;
+
+ /*
+ * Transition the device to the nosrv state. What exactly this
+ * means depends on the recovery flags
+ */
+ blk_mq_quiesce_queue(disk->queue);
+ if (ublk_nosrv_should_stop_dev(ub)) {
+ /*
+ * Allow any pending/future I/O to pass through quickly
+ * with an error. This is needed because del_gendisk
+ * waits for all pending I/O to complete
+ */
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_get_queue(ub, i)->force_abort = true;
+ blk_mq_unquiesce_queue(disk->queue);
+
+ ublk_stop_dev_unlocked(ub);
+ } else {
+ if (ublk_nosrv_dev_should_queue_io(ub)) {
+ __ublk_quiesce_dev(ub);
+ } else {
+ ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
+ for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
+ ublk_get_queue(ub, i)->fail_io = true;
+ }
+ blk_mq_unquiesce_queue(disk->queue);
+ }
+unlock:
+ mutex_unlock(&ub->mutex);
+ ublk_put_disk(disk);
/* all uring_cmd has been done now, reset device & ubq */
ublk_reset_ch_dev(ub);
-
+out:
clear_bit(UB_STATE_OPEN, &ub->state);
return 0;
}
@@ -1597,37 +1668,22 @@ static void ublk_abort_queue(struct ublk_device *ub, struct ublk_queue *ubq)
}
/* Must be called when queue is frozen */
-static bool ublk_mark_queue_canceling(struct ublk_queue *ubq)
+static void ublk_mark_queue_canceling(struct ublk_queue *ubq)
{
- bool canceled;
-
spin_lock(&ubq->cancel_lock);
- canceled = ubq->canceling;
- if (!canceled)
+ if (!ubq->canceling)
ubq->canceling = true;
spin_unlock(&ubq->cancel_lock);
-
- return canceled;
}
-static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
+static void ublk_start_cancel(struct ublk_queue *ubq)
{
- bool was_canceled = ubq->canceling;
- struct gendisk *disk;
-
- if (was_canceled)
- return false;
-
- spin_lock(&ub->lock);
- disk = ub->ub_disk;
- if (disk)
- get_device(disk_to_dev(disk));
- spin_unlock(&ub->lock);
+ struct ublk_device *ub = ubq->dev;
+ struct gendisk *disk = ublk_get_disk(ub);
/* Our disk has been dead */
if (!disk)
- return false;
-
+ return;
/*
* Now we are serialized with ublk_queue_rq()
*
@@ -1636,15 +1692,9 @@ static bool ublk_abort_requests(struct ublk_device *ub, struct ublk_queue *ubq)
* touch completed uring_cmd
*/
blk_mq_quiesce_queue(disk->queue);
- was_canceled = ublk_mark_queue_canceling(ubq);
- if (!was_canceled) {
- /* abort queue is for making forward progress */
- ublk_abort_queue(ub, ubq);
- }
+ ublk_mark_queue_canceling(ubq);
blk_mq_unquiesce_queue(disk->queue);
- put_device(disk_to_dev(disk));
-
- return !was_canceled;
+ ublk_put_disk(disk);
}
static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
@@ -1668,6 +1718,17 @@ static void ublk_cancel_cmd(struct ublk_queue *ubq, struct ublk_io *io,
/*
* The ublk char device won't be closed when calling cancel fn, so both
* ublk device and queue are guaranteed to be live
+ *
+ * Two-stage cancel:
+ *
+ * - make every active uring_cmd done in ->cancel_fn()
+ *
+ * - aborting inflight ublk IO requests in ublk char device release handler,
+ * which depends on 1st stage because device can only be closed iff all
+ * uring_cmd are done
+ *
+ * Do _not_ try to acquire ub->mutex before all inflight requests are
+ * aborted, otherwise deadlock may be caused.
*/
static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
unsigned int issue_flags)
@@ -1675,8 +1736,6 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
struct ublk_uring_cmd_pdu *pdu = ublk_get_uring_cmd_pdu(cmd);
struct ublk_queue *ubq = pdu->ubq;
struct task_struct *task;
- struct ublk_device *ub;
- bool need_schedule;
struct ublk_io *io;
if (WARN_ON_ONCE(!ubq))
@@ -1689,16 +1748,12 @@ static void ublk_uring_cmd_cancel_fn(struct io_uring_cmd *cmd,
if (WARN_ON_ONCE(task && task != ubq->ubq_daemon))
return;
- ub = ubq->dev;
- need_schedule = ublk_abort_requests(ub, ubq);
+ if (!ubq->canceling)
+ ublk_start_cancel(ubq);
io = &ubq->ios[pdu->tag];
WARN_ON_ONCE(io->cmd != cmd);
ublk_cancel_cmd(ubq, io, issue_flags);
-
- if (need_schedule) {
- schedule_work(&ub->nosrv_work);
- }
}
static inline bool ublk_queue_ready(struct ublk_queue *ubq)
@@ -1757,13 +1812,11 @@ static void __ublk_quiesce_dev(struct ublk_device *ub)
__func__, ub->dev_info.dev_id,
ub->dev_info.state == UBLK_S_DEV_LIVE ?
"LIVE" : "QUIESCED");
- blk_mq_quiesce_queue(ub->ub_disk->queue);
/* mark every queue as canceling */
for (i = 0; i < ub->dev_info.nr_hw_queues; i++)
ublk_get_queue(ub, i)->canceling = true;
ublk_wait_tagset_rqs_idle(ub);
ub->dev_info.state = UBLK_S_DEV_QUIESCED;
- blk_mq_unquiesce_queue(ub->ub_disk->queue);
}
static void ublk_force_abort_dev(struct ublk_device *ub)
@@ -1800,50 +1853,25 @@ static struct gendisk *ublk_detach_disk(struct ublk_device *ub)
return disk;
}
-static void ublk_stop_dev(struct ublk_device *ub)
+static void ublk_stop_dev_unlocked(struct ublk_device *ub)
+ __must_hold(&ub->mutex)
{
struct gendisk *disk;
- mutex_lock(&ub->mutex);
if (ub->dev_info.state == UBLK_S_DEV_DEAD)
- goto unlock;
+ return;
+
if (ublk_nosrv_dev_should_queue_io(ub))
ublk_force_abort_dev(ub);
del_gendisk(ub->ub_disk);
disk = ublk_detach_disk(ub);
put_disk(disk);
- unlock:
- mutex_unlock(&ub->mutex);
- ublk_cancel_dev(ub);
}
-static void ublk_nosrv_work(struct work_struct *work)
+static void ublk_stop_dev(struct ublk_device *ub)
{
- struct ublk_device *ub =
- container_of(work, struct ublk_device, nosrv_work);
- int i;
-
- if (ublk_nosrv_should_stop_dev(ub)) {
- ublk_stop_dev(ub);
- return;
- }
-
mutex_lock(&ub->mutex);
- if (ub->dev_info.state != UBLK_S_DEV_LIVE)
- goto unlock;
-
- if (ublk_nosrv_dev_should_queue_io(ub)) {
- __ublk_quiesce_dev(ub);
- } else {
- blk_mq_quiesce_queue(ub->ub_disk->queue);
- ub->dev_info.state = UBLK_S_DEV_FAIL_IO;
- for (i = 0; i < ub->dev_info.nr_hw_queues; i++) {
- ublk_get_queue(ub, i)->fail_io = true;
- }
- blk_mq_unquiesce_queue(ub->ub_disk->queue);
- }
-
- unlock:
+ ublk_stop_dev_unlocked(ub);
mutex_unlock(&ub->mutex);
ublk_cancel_dev(ub);
}
@@ -2419,7 +2447,6 @@ static int ublk_add_tag_set(struct ublk_device *ub)
static void ublk_remove(struct ublk_device *ub)
{
ublk_stop_dev(ub);
- cancel_work_sync(&ub->nosrv_work);
cdev_device_del(&ub->cdev, &ub->cdev_dev);
ublk_put_device(ub);
ublks_added--;
@@ -2693,7 +2720,6 @@ static int ublk_ctrl_add_dev(struct io_uring_cmd *cmd)
goto out_unlock;
mutex_init(&ub->mutex);
spin_lock_init(&ub->lock);
- INIT_WORK(&ub->nosrv_work, ublk_nosrv_work);
ret = ublk_alloc_dev_number(ub, header->dev_id);
if (ret < 0)
@@ -2828,7 +2854,6 @@ static inline void ublk_ctrl_cmd_dump(struct io_uring_cmd *cmd)
static int ublk_ctrl_stop_dev(struct ublk_device *ub)
{
ublk_stop_dev(ub);
- cancel_work_sync(&ub->nosrv_work);
return 0;
}
--
2.43.0
From: Uday Shankar <ushankar(a)purestorage.com>
Most uring_cmds issued against ublk character devices are serialized
because each command affects only one queue, and there is an early check
which only allows a single task (the queue's ubq_daemon) to issue
uring_cmds against that queue. However, this mechanism does not work for
FETCH_REQs, since they are expected before ubq_daemon is set. Since
FETCH_REQs are only used at initialization and not in the fast path,
serialize them using the per-ublk-device mutex. This fixes a number of
data races that were previously possible if a badly behaved ublk server
decided to issue multiple FETCH_REQs against the same qid/tag
concurrently.
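The serialization idea can be sketched in userspace with a pthread mutex standing in for ub->mutex. struct model_queue, model_fetch() and MODEL_QUEUE_DEPTH are hypothetical, and the model omits the buffer-address checks and the ubq_daemon assignment that the real ublk_fetch() performs. Because every check and update happens under one lock, two FETCH_REQs for the same tag cannot both pass the ACTIVE check. On glibc older than 2.34 this may need to be built with -pthread.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define MODEL_QUEUE_DEPTH 4

struct model_queue {
    pthread_mutex_t mutex;          /* stands in for ub->mutex */
    bool active[MODEL_QUEUE_DEPTH]; /* UBLK_IO_FLAG_ACTIVE per tag */
    int nr_io_ready;
};

static bool model_queue_ready(const struct model_queue *q)
{
    return q->nr_io_ready == MODEL_QUEUE_DEPTH;
}

/* All checks and updates happen under a single mutex. */
int model_fetch(struct model_queue *q, int tag)
{
    int ret = 0;

    pthread_mutex_lock(&q->mutex);
    if (model_queue_ready(q)) {
        ret = -EBUSY;          /* queue is already set up */
    } else if (q->active[tag]) {
        ret = -EINVAL;         /* each tag may be FETCHed at most once */
    } else {
        q->active[tag] = true;
        q->nr_io_ready++;
    }
    pthread_mutex_unlock(&q->mutex);
    return ret;
}
```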
Reported-by: Caleb Sander Mateos <csander(a)purestorage.com>
Signed-off-by: Uday Shankar <ushankar(a)purestorage.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20250416035444.99569-2-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/ublk_drv.c | 77 +++++++++++++++++++++++++---------------
1 file changed, 49 insertions(+), 28 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index 4e81505179c6..9345a6d8dbd8 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1803,8 +1803,8 @@ static void ublk_nosrv_work(struct work_struct *work)
/* device can only be started after all IOs are ready */
static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
+ __must_hold(&ub->mutex)
{
- mutex_lock(&ub->mutex);
ubq->nr_io_ready++;
if (ublk_queue_ready(ubq)) {
ubq->ubq_daemon = current;
@@ -1816,7 +1816,6 @@ static void ublk_mark_io_ready(struct ublk_device *ub, struct ublk_queue *ubq)
}
if (ub->nr_queues_ready == ub->dev_info.nr_hw_queues)
complete_all(&ub->completion);
- mutex_unlock(&ub->mutex);
}
static inline int ublk_check_cmd_op(u32 cmd_op)
@@ -1855,6 +1854,52 @@ static inline void ublk_prep_cancel(struct io_uring_cmd *cmd,
io_uring_cmd_mark_cancelable(cmd, issue_flags);
}
+static int ublk_fetch(struct io_uring_cmd *cmd, struct ublk_queue *ubq,
+ struct ublk_io *io, __u64 buf_addr)
+{
+ struct ublk_device *ub = ubq->dev;
+ int ret = 0;
+
+ /*
+ * When handling FETCH command for setting up ublk uring queue,
+ * ub->mutex is the innermost lock, and we won't block for handling
+ * FETCH, so it is fine even for IO_URING_F_NONBLOCK.
+ */
+ mutex_lock(&ub->mutex);
+ /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
+ if (ublk_queue_ready(ubq)) {
+ ret = -EBUSY;
+ goto out;
+ }
+
+ /* allow each command to be FETCHed at most once */
+ if (io->flags & UBLK_IO_FLAG_ACTIVE) {
+ ret = -EINVAL;
+ goto out;
+ }
+
+ WARN_ON_ONCE(io->flags & UBLK_IO_FLAG_OWNED_BY_SRV);
+
+ if (ublk_need_map_io(ubq)) {
+ /*
+ * FETCH_RQ has to provide IO buffer if NEED GET
+ * DATA is not enabled
+ */
+ if (!buf_addr && !ublk_need_get_data(ubq))
+ goto out;
+ } else if (buf_addr) {
+ /* User copy requires addr to be unset */
+ ret = -EINVAL;
+ goto out;
+ }
+
+ ublk_fill_io_cmd(io, cmd, buf_addr);
+ ublk_mark_io_ready(ub, ubq);
+out:
+ mutex_unlock(&ub->mutex);
+ return ret;
+}
+
static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
unsigned int issue_flags,
const struct ublksrv_io_cmd *ub_cmd)
@@ -1907,33 +1952,9 @@ static int __ublk_ch_uring_cmd(struct io_uring_cmd *cmd,
ret = -EINVAL;
switch (_IOC_NR(cmd_op)) {
case UBLK_IO_FETCH_REQ:
- /* UBLK_IO_FETCH_REQ is only allowed before queue is setup */
- if (ublk_queue_ready(ubq)) {
- ret = -EBUSY;
- goto out;
- }
- /*
- * The io is being handled by server, so COMMIT_RQ is expected
- * instead of FETCH_REQ
- */
- if (io->flags & UBLK_IO_FLAG_OWNED_BY_SRV)
- goto out;
-
- if (ublk_need_map_io(ubq)) {
- /*
- * FETCH_RQ has to provide IO buffer if NEED GET
- * DATA is not enabled
- */
- if (!ub_cmd->addr && !ublk_need_get_data(ubq))
- goto out;
- } else if (ub_cmd->addr) {
- /* User copy requires addr to be unset */
- ret = -EINVAL;
+ ret = ublk_fetch(cmd, ubq, io, ub_cmd->addr);
+ if (ret)
goto out;
- }
-
- ublk_fill_io_cmd(io, cmd, ub_cmd->addr);
- ublk_mark_io_ready(ub, ubq);
break;
case UBLK_IO_COMMIT_AND_FETCH_REQ:
req = blk_mq_tag_to_rq(ub->tag_set.tags[ub_cmd->q_id], tag);
--
2.43.0
From: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
[ Upstream commit e56088a13708757da68ad035269d69b93ac8c389 ]
The public datasheets of the following Amlogic SoCs describe a typical
resistor value for the built-in pull up/down resistor:
- Meson8/8b/8m2: not documented
- GXBB (S905): 60 kOhm
- GXL (S905X): 60 kOhm
- GXM (S912): 60 kOhm
- G12B (S922X): 60 kOhm
- SM1 (S905D3): 60 kOhm
The public G12B and SM1 datasheets additionally state min and max
values:
- min value: 50 kOhm for both pull-up and pull-down
- max value for the pull-up: 70 kOhm
- max value for the pull-down: 130 kOhm
Use 60 kOhm in the pinctrl-meson driver as well so it's shown in the
debugfs output. It may not be accurate for Meson8/8b/8m2 but in reality
60 kOhm is closer to the actual value than 1 Ohm.
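For context on where the value 60000 ends up: as we understand the generic pinconf helpers (pinconf_to_config_packed(), pinconf_to_config_param() and pinconf_to_config_argument() in include/linux/pinctrl/pinconf-generic.h), the parameter sits in the low 8 bits of the packed config and the argument in the next 24 bits. The userspace model below assumes that layout, and its function names are hypothetical. It only illustrates that a value in ohms such as 60000 fits in the argument field without truncation.

```c
#include <stdint.h>

/* Assumed layout: parameter in bits 0-7, 24-bit argument above it. */
unsigned long model_pack(unsigned int param, uint32_t arg)
{
    return ((unsigned long)arg << 8) | (param & 0xffUL);
}

unsigned int model_param(unsigned long config)
{
    return (unsigned int)(config & 0xffUL);
}

uint32_t model_arg(unsigned long config)
{
    return (uint32_t)((config >> 8) & 0xffffffUL);
}
```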
Signed-off-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Reviewed-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Link: https://lore.kernel.org/20250329190132.855196-1-martin.blumenstingl@googlem…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/pinctrl/meson/pinctrl-meson.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/meson/pinctrl-meson.c b/drivers/pinctrl/meson/pinctrl-meson.c
index aba479a1150c8..f3b381370e5ed 100644
--- a/drivers/pinctrl/meson/pinctrl-meson.c
+++ b/drivers/pinctrl/meson/pinctrl-meson.c
@@ -480,7 +480,7 @@ static int meson_pinconf_get(struct pinctrl_dev *pcdev, unsigned int pin,
case PIN_CONFIG_BIAS_PULL_DOWN:
case PIN_CONFIG_BIAS_PULL_UP:
if (meson_pinconf_get_pull(pc, pin) == param)
- arg = 1;
+ arg = 60000;
else
return -EINVAL;
break;
--
2.39.5
From: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
[ Upstream commit e56088a13708757da68ad035269d69b93ac8c389 ]
The public datasheets of the following Amlogic SoCs describe a typical
resistor value for the built-in pull up/down resistor:
- Meson8/8b/8m2: not documented
- GXBB (S905): 60 kOhm
- GXL (S905X): 60 kOhm
- GXM (S912): 60 kOhm
- G12B (S922X): 60 kOhm
- SM1 (S905D3): 60 kOhm
The public G12B and SM1 datasheets additionally state min and max
values:
- min value: 50 kOhm for both pull-up and pull-down
- max value for the pull-up: 70 kOhm
- max value for the pull-down: 130 kOhm
Use 60 kOhm in the pinctrl-meson driver as well so it's shown in the
debugfs output. It may not be accurate for Meson8/8b/8m2 but in reality
60 kOhm is closer to the actual value than 1 Ohm.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
Reviewed-by: Neil Armstrong <neil.armstrong(a)linaro.org>
Link: https://lore.kernel.org/20250329190132.855196-1-martin.blumenstingl@googlem…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/pinctrl/meson/pinctrl-meson.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/pinctrl/meson/pinctrl-meson.c b/drivers/pinctrl/meson/pinctrl-meson.c
index 20683cd072bb0..ae72edba8a1f0 100644
--- a/drivers/pinctrl/meson/pinctrl-meson.c
+++ b/drivers/pinctrl/meson/pinctrl-meson.c
@@ -483,7 +483,7 @@ static int meson_pinconf_get(struct pinctrl_dev *pcdev, unsigned int pin,
case PIN_CONFIG_BIAS_PULL_DOWN:
case PIN_CONFIG_BIAS_PULL_UP:
if (meson_pinconf_get_pull(pc, pin) == param)
- arg = 1;
+ arg = 60000;
else
return -EINVAL;
break;
--
2.39.5
From: Chenyuan Yang <chenyuan0y(a)gmail.com>
[ Upstream commit a9a69c3b38c89d7992fb53db4abb19104b531d32 ]
Incorrect types are used as sizeof() arguments in devm_kcalloc().
It should be sizeof(dai_link_data) for link_data instead of
sizeof(snd_soc_dai_link).
This was found by our static analysis tool.
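The bug class here is sizing an allocation from a different variable's type. A minimal userspace sketch of the safe idiom follows; the struct names are hypothetical stand-ins with made-up sizes, not the real snd_soc_dai_link or dai_link_data layouts. Whether the original allocation in imx-card.c was too large or too small depends on the real struct sizes; the model deliberately picks sizes where using the wrong type would under-allocate.

```c
#include <stdlib.h>

/* Hypothetical stand-ins with deliberately different sizes. */
struct model_dai_link { const char *name; };
struct model_dai_link_data { int slots; int slot_width; long sysclk[4]; };

/*
 * Size each element from the pointer being assigned, i.e.
 * sizeof(*link_data), so the element type cannot drift from the
 * destination the way sizeof(*link) did.
 */
struct model_dai_link_data *alloc_link_data(size_t n)
{
    struct model_dai_link_data *link_data;

    link_data = calloc(n, sizeof(*link_data));
    return link_data;
}
```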
Signed-off-by: Chenyuan Yang <chenyuan0y(a)gmail.com>
Link: https://patch.msgid.link/20250406210854.149316-1-chenyuan0y@gmail.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/fsl/imx-card.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index 2b64c0384b6bb..9b14cda56b068 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -517,7 +517,7 @@ static int imx_card_parse_of(struct imx_card_data *data)
if (!card->dai_link)
return -ENOMEM;
- data->link_data = devm_kcalloc(dev, num_links, sizeof(*link), GFP_KERNEL);
+ data->link_data = devm_kcalloc(dev, num_links, sizeof(*link_data), GFP_KERNEL);
if (!data->link_data)
return -ENOMEM;
--
2.39.5
From: Chenyuan Yang <chenyuan0y(a)gmail.com>
[ Upstream commit a9a69c3b38c89d7992fb53db4abb19104b531d32 ]
Incorrect types are used as sizeof() arguments in devm_kcalloc().
It should be sizeof(dai_link_data) for link_data instead of
sizeof(snd_soc_dai_link).
This was found by our static analysis tool.
Signed-off-by: Chenyuan Yang <chenyuan0y(a)gmail.com>
Link: https://patch.msgid.link/20250406210854.149316-1-chenyuan0y@gmail.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/fsl/imx-card.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index c6d55b21f9496..11430f9f49968 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -517,7 +517,7 @@ static int imx_card_parse_of(struct imx_card_data *data)
if (!card->dai_link)
return -ENOMEM;
- data->link_data = devm_kcalloc(dev, num_links, sizeof(*link), GFP_KERNEL);
+ data->link_data = devm_kcalloc(dev, num_links, sizeof(*link_data), GFP_KERNEL);
if (!data->link_data)
return -ENOMEM;
--
2.39.5
From: Chenyuan Yang <chenyuan0y(a)gmail.com>
[ Upstream commit a9a69c3b38c89d7992fb53db4abb19104b531d32 ]
Incorrect types are used as sizeof() arguments in devm_kcalloc().
It should be sizeof(dai_link_data) for link_data instead of
sizeof(snd_soc_dai_link).
This was found by our static analysis tool.
Signed-off-by: Chenyuan Yang <chenyuan0y(a)gmail.com>
Link: https://patch.msgid.link/20250406210854.149316-1-chenyuan0y@gmail.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/fsl/imx-card.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index 7128bcf3a743e..bb304de5cc38a 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -517,7 +517,7 @@ static int imx_card_parse_of(struct imx_card_data *data)
if (!card->dai_link)
return -ENOMEM;
- data->link_data = devm_kcalloc(dev, num_links, sizeof(*link), GFP_KERNEL);
+ data->link_data = devm_kcalloc(dev, num_links, sizeof(*link_data), GFP_KERNEL);
if (!data->link_data)
return -ENOMEM;
--
2.39.5
From: Chenyuan Yang <chenyuan0y(a)gmail.com>
[ Upstream commit a9a69c3b38c89d7992fb53db4abb19104b531d32 ]
Incorrect types are used as sizeof() arguments in devm_kcalloc().
It should be sizeof(dai_link_data) for link_data instead of
sizeof(snd_soc_dai_link).
This was found by our static analysis tool.
Signed-off-by: Chenyuan Yang <chenyuan0y(a)gmail.com>
Link: https://patch.msgid.link/20250406210854.149316-1-chenyuan0y@gmail.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/fsl/imx-card.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index 93dbe40008c00..e5ae435171d68 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -516,7 +516,7 @@ static int imx_card_parse_of(struct imx_card_data *data)
if (!card->dai_link)
return -ENOMEM;
- data->link_data = devm_kcalloc(dev, num_links, sizeof(*link), GFP_KERNEL);
+ data->link_data = devm_kcalloc(dev, num_links, sizeof(*link_data), GFP_KERNEL);
if (!data->link_data)
return -ENOMEM;
--
2.39.5
From: Chenyuan Yang <chenyuan0y(a)gmail.com>
[ Upstream commit a9a69c3b38c89d7992fb53db4abb19104b531d32 ]
Incorrect types are used as sizeof() arguments in devm_kcalloc().
It should be sizeof(dai_link_data) for link_data instead of
sizeof(snd_soc_dai_link).
This was found by our static analysis tool.
Signed-off-by: Chenyuan Yang <chenyuan0y(a)gmail.com>
Link: https://patch.msgid.link/20250406210854.149316-1-chenyuan0y@gmail.com
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/fsl/imx-card.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/fsl/imx-card.c b/sound/soc/fsl/imx-card.c
index 21f617f6f9fa8..566214cb3d60c 100644
--- a/sound/soc/fsl/imx-card.c
+++ b/sound/soc/fsl/imx-card.c
@@ -543,7 +543,7 @@ static int imx_card_parse_of(struct imx_card_data *data)
if (!card->dai_link)
return -ENOMEM;
- data->link_data = devm_kcalloc(dev, num_links, sizeof(*link), GFP_KERNEL);
+ data->link_data = devm_kcalloc(dev, num_links, sizeof(*link_data), GFP_KERNEL);
if (!data->link_data)
return -ENOMEM;
--
2.39.5
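For readers unfamiliar with the idiom behind this fix: the kernel convention is to take the element size from the pointer being assigned (sizeof(*ptr)), so the allocation size cannot drift away from the destination type. The userspace sketch below contrasts the two patterns. The types struct sketch_dai_link and struct sketch_link_data and the helper names are hypothetical stand-ins for snd_soc_dai_link and the driver's per-link data, and plain calloc() replaces devm_kcalloc().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins; the real driver types differ. */
struct sketch_dai_link {
	const char *name;
	char opaque[120];	/* make this type clearly larger */
};

struct sketch_link_data {
	unsigned int cpu_sysclk_id;
};

/* The demo only makes sense if the two element sizes differ. */
static_assert(sizeof(struct sketch_dai_link) != sizeof(struct sketch_link_data),
	      "stand-in types must differ in size");

/* Buggy pattern: element size taken from an unrelated pointer. */
static size_t link_data_bytes_buggy(size_t num_links)
{
	struct sketch_dai_link *link = NULL;

	return num_links * sizeof(*link);	/* sizeof does not dereference */
}

/* Fixed pattern: element size taken from the pointer being assigned. */
static struct sketch_link_data *alloc_link_data(size_t num_links)
{
	struct sketch_link_data *link_data;

	link_data = calloc(num_links, sizeof(*link_data));
	return link_data;	/* zeroed array, or NULL on failure */
}
```

Because sizeof(*link_data) follows the declared type of link_data, changing that type later keeps the allocation correct without touching the call site.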
The vfio-pci huge_fault handler doesn't make any attempt to insert a
mapping containing the faulting address; it only inserts mappings if the
faulting address and resulting pfn are aligned. This works in a lot of
cases, particularly in conjunction with QEMU where DMA mappings linearly
fault the mmap. However, there are configurations where we don't get
that linear faulting and pages are faulted on-demand.
The scenario reported in the bug below is such a case, where the physical
address width of the CPU is greater than that of the IOMMU, resulting in a
VM where guest firmware has mapped device MMIO beyond the address width of
the IOMMU. In this configuration, the MMIO is faulted on demand and
tracing indicates that occasionally the faults generate a VM_FAULT_OOM.
Given the use case, this results in an "error: kvm run failed Bad address",
killing the VM.
The host is not under memory pressure in this test, therefore it's
suspected that VM_FAULT_OOM is actually the result of a NULL return from
__pte_offset_map_lock() in the get_locked_pte() path from insert_pfn().
This suggests a potential race inserting a pte concurrent to a pmd, and
maybe indicates some deficiency in the mm layer properly handling such a
case.
Nevertheless, Peter noted the inconsistency of vfio-pci's huge_fault
handler where our mapping granularity depends on the alignment of the
faulting address relative to the order rather than aligning the faulting
address to the order to more consistently insert huge mappings. This
change not only uses the page tables more consistently and efficiently, but
as any fault to an aligned page results in the same mapping, the race
condition suspected in the VM_FAULT_OOM is avoided.
Reported-by: Adolfo <adolfotregosa(a)gmail.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220057
Fixes: 09dfc8a5f2ce ("vfio/pci: Fallback huge faults for unaligned pfn")
Cc: stable(a)vger.kernel.org
Tested-by: Adolfo <adolfotregosa(a)gmail.com>
Co-developed-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/pci/vfio_pci_core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_core.c b/drivers/vfio/pci/vfio_pci_core.c
index 35f9046af315..6328c3a05bcd 100644
--- a/drivers/vfio/pci/vfio_pci_core.c
+++ b/drivers/vfio/pci/vfio_pci_core.c
@@ -1646,14 +1646,14 @@ static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf,
{
struct vm_area_struct *vma = vmf->vma;
struct vfio_pci_core_device *vdev = vma->vm_private_data;
- unsigned long pfn, pgoff = vmf->pgoff - vma->vm_pgoff;
+ unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1);
+ unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT;
+ unsigned long pfn = vma_to_pfn(vma) + pgoff;
vm_fault_t ret = VM_FAULT_SIGBUS;
- pfn = vma_to_pfn(vma) + pgoff;
-
- if (order && (pfn & ((1 << order) - 1) ||
- vmf->address & ((PAGE_SIZE << order) - 1) ||
- vmf->address + (PAGE_SIZE << order) > vma->vm_end)) {
+ if (order && (addr < vma->vm_start ||
+ addr + (PAGE_SIZE << order) > vma->vm_end ||
+ pfn & ((1 << order) - 1))) {
ret = VM_FAULT_FALLBACK;
goto out;
}
--
2.48.1
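The address arithmetic in the fixed handler can be exercised outside the kernel. The sketch below mirrors the new checks under stated assumptions: 4 KiB pages, base_pfn standing in for vma_to_pfn(vma) (the pfn backing vm_start), and a false return standing in for VM_FAULT_FALLBACK. The function and macro names are hypothetical. It shows the property the commit message relies on: every fault inside the same aligned block yields the same aligned address and pfn.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumes 4 KiB pages; these names are local to this sketch. */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PAGE_SIZE	(1UL << SKETCH_PAGE_SHIFT)

/*
 * Align the faulting address down to the mapping order, derive the pfn
 * from that aligned address, and fall back (return false) if the huge
 * mapping would leave the VMA or the pfn is not aligned to the order.
 */
static bool sketch_huge_fault(unsigned long fault_addr, unsigned long vm_start,
			      unsigned long vm_end, unsigned long base_pfn,
			      unsigned int order, unsigned long *pfn_out)
{
	unsigned long size = SKETCH_PAGE_SIZE << order;
	unsigned long addr = fault_addr & ~(size - 1);
	/* pgoff may wrap if addr < vm_start; that case is rejected below. */
	unsigned long pgoff = (addr - vm_start) >> SKETCH_PAGE_SHIFT;
	unsigned long pfn = base_pfn + pgoff;

	assert(vm_start <= fault_addr && fault_addr < vm_end);

	if (order && (addr < vm_start || addr + size > vm_end ||
		      (pfn & ((1UL << order) - 1))))
		return false;

	*pfn_out = pfn;
	return true;
}
```

With order 9 (2 MiB), a VMA at 0x200000 and a 2 MiB-aligned base pfn, faults at 0x312345 and at 0x2fffff both resolve to the aligned address 0x200000 and the same pfn, so concurrent faults in that block insert the same mapping.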
On Tue, 06 May 2025 12:11:26 +0200,
Ezra Khuzadi wrote:
>
>
> Hi Takashi, Jaroslav, all maintainers,
>
> Could you please review it or let me know if any changes are needed? This is
> my first kernel patch as a student, and I’d appreciate any feedback.
I guess you submitted to the wrong address. The proper mailing list is
linux-sound(a)vger.kernel.org. Please try to resubmit.
And, make sure that your mailer doesn't break tabs and whitespace.
You can test sending to yourself and verify that the submitted patch
is properly applicable beforehand.
thanks,
Takashi
>
> Thanks,
> Ezra Khuzadi
>
> On Wed, Apr 30, 2025 at 1:43 AM Ezra Khuzadi <ekhuzadi(a)uci.edu> wrote:
>
> sound/pci/hda/patch_realtek.c: add quirk for HP Spectre x360 15-eb0xxx
>
> Add subsystem ID 0x86e5 for HP Spectre x360 15-eb0xxx so that
> ALC285_FIXUP_HP_SPECTRE_X360_EB1 (GPIO amp-enable, mic-mute LED and
> pinconfigs) is applied.
>
> Tested on HP Spectre x360 15-eb0043dx (Vendor 0x10ec0285, Subsys
> 0x103c86e5)
> with legacy HDA driver and hda-verb toggles:
>
> $ cat /proc/asound/card0/codec#0 \
> | sed -n -e '1,5p;/Vendor Id:/p;/Subsystem Id:/p'
> Codec: Realtek ALC285
> Vendor Id: 0x10ec0285
> Subsystem Id: 0x103c86e5
>
> $ dmesg | grep -i realtek
> [ 5.828728] snd_hda_codec_realtek ehdaudio0D0: ALC285: picked fixup
> for PCI SSID 103c:86e5
>
> Signed-off-by: Ezra Khuzadi <ekhuzadi(a)uci.edu>
> Cc: stable(a)vger.kernel.org
>
> ---
> sound/pci/hda/patch_realtek.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
> index 877137cb09ac..82ad105e7fa9 100644
> --- a/sound/pci/hda/patch_realtek.c
> +++ b/sound/pci/hda/patch_realtek.c
> @@ -10563,6 +10563,7 @@ static const struct hda_quirk alc269_fixup_tbl[] =
> {
> SND_PCI_QUIRK(0x103c, 0x86c7, "HP Envy AiO 32",
> ALC274_FIXUP_HP_ENVY_GPIO),
> + SND_PCI_QUIRK(0x103c, 0x86e5, "HP Spectre x360 15-eb0xxx",
> ALC285_FIXUP_HP_SPECTRE_X360_EB1),
> SND_PCI_QUIRK(0x103c, 0x86e7, "HP Spectre x360 15-eb0xxx",
> ALC285_FIXUP_HP_SPECTRE_X360_EB1),
> SND_PCI_QUIRK(0x103c, 0x86e8, "HP Spectre x360 15-eb0xxx",
> ALC285_FIXUP_HP_SPECTRE_X360_EB1),
> SND_PCI_QUIRK(0x103c, 0x86f9, "HP Spectre x360 13-aw0xxx",
> ALC285_FIXUP_HP_SPECTRE_X360_MUTE_LED),
>
> On Wed, Apr 30, 2025 at 1:33 AM kernel test robot <lkp(a)intel.com> wrote:
> >
> > Hi,
> >
> > Thanks for your patch.
> >
> > FYI: kernel test robot notices the stable kernel rule is not satisfied.
> >
> > The check is based on
> https://urldefense.com/v3/__https://www.kernel.org/doc/html/latest/process/…
> >
> > Rule: add the tag "Cc: stable(a)vger.kernel.org" in the sign-off area to
> have the patch automatically included in the stable tree.
> > Subject: sound/pci/hda: add quirk for HP Spectre x360 15-eb0xxx
> > Link:
> https://urldefense.com/v3/__https://lore.kernel.org/stable/CAPXr0uxh0c_2b2-…
> >
> > --
> > 0-DAY CI Kernel Test Service
> >
> https://urldefense.com/v3/__https://github.com/intel/lkp-tests/wiki__;!!CzA…
> >
> >
> >
>
If a driver is removed, the driver framework invokes the driver's
remove callback. A CAN driver's remove function calls
unregister_candev(), which calls net_device_ops::ndo_stop further down
in the call stack for interfaces which are in the "up" state.
The removal of the module causes a warning, as can_rx_offload_del()
deletes the NAPI, while it is still active, because the interface is
still up.
To fix the warning, first unregister the network interface, which
calls net_device_ops::ndo_stop, which disables the NAPI, and then call
can_rx_offload_del().
Fixes: 1be37d3b0414 ("can: m_can: fix periph RX path: use rx-offload to ensure skbs are sent from softirq context")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-3-59a9b131589d@peng…
Reviewed-by: Markus Schneider-Pargmann <msp(a)baylibre.com>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/m_can/m_can.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c
index 326ede9d400f..c2c116ce1087 100644
--- a/drivers/net/can/m_can/m_can.c
+++ b/drivers/net/can/m_can/m_can.c
@@ -2463,9 +2463,9 @@ EXPORT_SYMBOL_GPL(m_can_class_register);
void m_can_class_unregister(struct m_can_classdev *cdev)
{
+ unregister_candev(cdev->net);
if (cdev->is_peripheral)
can_rx_offload_del(&cdev->offload);
- unregister_candev(cdev->net);
}
EXPORT_SYMBOL_GPL(m_can_class_unregister);
--
2.47.2
If a driver is removed, the driver framework invokes the driver's
remove callback. A CAN driver's remove function calls
unregister_candev(), which calls net_device_ops::ndo_stop further down
in the call stack for interfaces which are in the "up" state.
The removal of the module causes a warning, as can_rx_offload_del()
deletes the NAPI, while it is still active, because the interface is
still up.
To fix the warning, first unregister the network interface, which
calls net_device_ops::ndo_stop, which disables the NAPI, and then call
can_rx_offload_del().
Fixes: ff60bfbaf67f ("can: rockchip_canfd: add driver for Rockchip CAN-FD controller")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-2-59a9b131589d@peng…
Reviewed-by: Markus Schneider-Pargmann <msp(a)baylibre.com>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/rockchip/rockchip_canfd-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/rockchip/rockchip_canfd-core.c b/drivers/net/can/rockchip/rockchip_canfd-core.c
index 7107a37da36c..c3fb3176ce42 100644
--- a/drivers/net/can/rockchip/rockchip_canfd-core.c
+++ b/drivers/net/can/rockchip/rockchip_canfd-core.c
@@ -937,8 +937,8 @@ static void rkcanfd_remove(struct platform_device *pdev)
struct rkcanfd_priv *priv = platform_get_drvdata(pdev);
struct net_device *ndev = priv->ndev;
- can_rx_offload_del(&priv->offload);
rkcanfd_unregister(priv);
+ can_rx_offload_del(&priv->offload);
free_candev(ndev);
}
--
2.47.2
If a driver is removed, the driver framework invokes the driver's
remove callback. A CAN driver's remove function calls
unregister_candev(), which calls net_device_ops::ndo_stop further down
in the call stack for interfaces which are in the "up" state.
With the mcp251xfd driver the removal of the module causes the
following warning:
| WARNING: CPU: 0 PID: 352 at net/core/dev.c:7342 __netif_napi_del_locked+0xc8/0xd8
as can_rx_offload_del() deletes the NAPI, while it is still active,
because the interface is still up.
To fix the warning, first unregister the network interface, which
calls net_device_ops::ndo_stop, which disables the NAPI, and then call
can_rx_offload_del().
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable(a)vger.kernel.org
Link: https://patch.msgid.link/20250502-can-rx-offload-del-v1-1-59a9b131589d@peng…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 064d81c724f4..c30b04f8fc0d 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -2198,8 +2198,8 @@ static void mcp251xfd_remove(struct spi_device *spi)
struct mcp251xfd_priv *priv = spi_get_drvdata(spi);
struct net_device *ndev = priv->ndev;
- can_rx_offload_del(&priv->offload);
mcp251xfd_unregister(priv);
+ can_rx_offload_del(&priv->offload);
spi->max_speed_hz = priv->spi_max_speed_hz_orig;
free_candev(ndev);
}
--
2.47.2
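The ordering problem shared by the three CAN fixes above can be illustrated with a toy state model. Nothing here is kernel code: struct toy_candev and the toy_* functions are hypothetical and only mimic the roles of probe/open, ndo_stop, unregister_candev() and can_rx_offload_del(), with a counter in place of the WARN.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the teardown ordering; all names are hypothetical. */
struct toy_candev {
	bool if_up;		/* interface is in the "up" state */
	bool napi_added;	/* NAPI instance registered */
	bool napi_enabled;	/* NAPI enabled while the interface is up */
	int warnings;		/* "NAPI deleted while still active" events */
};

/* NAPI added at probe time, then enabled when the interface goes up. */
static void toy_setup_up(struct toy_candev *d)
{
	d->napi_added = true;
	d->if_up = true;
	d->napi_enabled = true;
}

/* Stands in for net_device_ops::ndo_stop: disables NAPI. */
static void toy_ndo_stop(struct toy_candev *d)
{
	d->napi_enabled = false;
	d->if_up = false;
}

/* Stands in for unregister_candev(): stops the interface if it is up. */
static void toy_unregister(struct toy_candev *d)
{
	if (d->if_up)
		toy_ndo_stop(d);
}

/* Stands in for can_rx_offload_del(): warns if NAPI is still active. */
static void toy_rx_offload_del(struct toy_candev *d)
{
	assert(d->napi_added);	/* deleting a never-added NAPI is a bug */
	if (d->napi_enabled)
		d->warnings++;
	d->napi_added = false;
}
```

Calling toy_rx_offload_del() before toy_unregister() on an up interface bumps warnings to 1; the fixed order (unregister first, then delete) leaves it at 0.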
Hi All,
Changes since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error
Changes since v2:
- page allocation moved out of the atomic context
Changes since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 63 +++++++++++++++++++++++++++++++++++------------
1 file changed, 47 insertions(+), 16 deletions(-)
--
2.45.2
Hi All,
Changes since v2:
- page allocation moved out of the atomic context
Changes since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 65 +++++++++++++++++++++++++++++++++++------------
1 file changed, 49 insertions(+), 16 deletions(-)
--
2.45.2
From: Daniel Gomez <da.gomez(a)samsung.com>
[ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used, $INITFILE will
contain just the merge output, requiring the user to run make (i.e.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez(a)samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
scripts/kconfig/merge_config.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index 0b7952471c18f..79c09b378be81 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -112,8 +112,8 @@ INITFILE=$1
shift;
if [ ! -r "$INITFILE" ]; then
- echo "The base file '$INITFILE' does not exist. Exit." >&2
- exit 1
+ echo "The base file '$INITFILE' does not exist. Creating one..." >&2
+ touch "$INITFILE"
fi
MERGE_LIST=$*
--
2.39.5
On configs with CONFIG_ARM64_GCS=y, VM_SHADOW_STACK is bit 38.
On configs with CONFIG_HAVE_ARCH_USERFAULTFD_MINOR=y (selected by
CONFIG_ARM64 when CONFIG_USERFAULTFD=y), VM_UFFD_MINOR is _also_ bit 38.
This bit being shared by two different VMA flags could lead to all sorts
of unintended behaviors. Presumably, a process could call into
userfaultfd in a way that disables the shadow stack VMA flag. I can't
think of any attack where this would help (presumably, if an attacker
tries to disable shadow stacks, they are trying to hijack control flow
so can't arbitrarily call into userfaultfd yet anyway) but this still
feels somewhat scary.
Fixes: ae80e1629aea ("mm: Define VM_SHADOW_STACK for arm64 when we support GCS")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Florent Revest <revest(a)chromium.org>
---
include/linux/mm.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index bf55206935c46..fdda6b16263b3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -385,7 +385,7 @@ extern unsigned int kobjsize(const void *objp);
#endif
#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
-# define VM_UFFD_MINOR_BIT 38
+# define VM_UFFD_MINOR_BIT 41
# define VM_UFFD_MINOR BIT(VM_UFFD_MINOR_BIT) /* UFFD minor faults */
#else /* !CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
# define VM_UFFD_MINOR VM_NONE
--
2.49.0.967.g6a0df3ecc3-goog
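A build-time guard is one way to catch this kind of collision. The sketch below uses the bit numbers from the commit message with local, hypothetical macro names (it does not reproduce include/linux/mm.h); the static_assert would fail to compile if the two flags were given the same bit again.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local helper mirroring the kernel's BIT() macro for 64-bit flags. */
#define SKETCH_BIT(nr)	(UINT64_C(1) << (nr))

/* Bit numbers from the commit message; names are local to this sketch. */
#define SKETCH_VM_SHADOW_STACK_BIT	38
#define SKETCH_VM_UFFD_MINOR_BIT_OLD	38
#define SKETCH_VM_UFFD_MINOR_BIT_NEW	41

/* True if two flag bit numbers select the same bit. */
static bool vm_flags_overlap(unsigned int a, unsigned int b)
{
	return (SKETCH_BIT(a) & SKETCH_BIT(b)) != 0;
}

/* Compile-time guard: the fixed layout must not alias the two flags. */
static_assert((SKETCH_BIT(SKETCH_VM_SHADOW_STACK_BIT) &
	       SKETCH_BIT(SKETCH_VM_UFFD_MINOR_BIT_NEW)) == 0,
	      "VM_SHADOW_STACK and VM_UFFD_MINOR must use distinct bits");
```

vm_flags_overlap(38, 38) is true, matching the collision described in the commit message, while vm_flags_overlap(38, 41) is false.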
From: Amit Sunil Dhamne <amitsd(a)google.com>
A register read of TCPC_RX_BYTE_CNT returns the total size consisting of:
PD message (pending read) size + 1 Byte for Frame Type (SOP*)
This is validated against the max PD message (`struct pd_message`) size
without accounting for the extra byte for the frame type. Note that the
struct pd_message does not contain a field for the frame_type. This
results in false negatives when the "PD message (pending read)" is equal
to the max PD message size.
Fixes: 6f413b559f86 ("usb: typec: tcpci_maxim: Chip level TCPC driver")
Signed-off-by: Amit Sunil Dhamne <amitsd(a)google.com>
Signed-off-by: Badhri Jagan Sridharan <badhri(a)google.com>
Reviewed-by: Kyle Tso <kyletso(a)google.com>
---
drivers/usb/typec/tcpm/tcpci_maxim_core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/typec/tcpm/tcpci_maxim_core.c b/drivers/usb/typec/tcpm/tcpci_maxim_core.c
index fd1b80593367641a6f997da2fb97a2b7238f6982..648311f5e3cf135f23b5cc0668001d2f177b9edd 100644
--- a/drivers/usb/typec/tcpm/tcpci_maxim_core.c
+++ b/drivers/usb/typec/tcpm/tcpci_maxim_core.c
@@ -166,7 +166,8 @@ static void process_rx(struct max_tcpci_chip *chip, u16 status)
return;
}
- if (count > sizeof(struct pd_message) || count + 1 > TCPC_RECEIVE_BUFFER_LEN) {
+ if (count > sizeof(struct pd_message) + 1 ||
+ count + 1 > TCPC_RECEIVE_BUFFER_LEN) {
dev_err(chip->dev, "Invalid TCPC_RX_BYTE_CNT %d\n", count);
return;
}
---
base-commit: ebd297a2affadb6f6f4d2e5d975c1eda18ac762d
change-id: 20250421-b4-new-fix-pd-rx-count-79297ba619b7
Best regards,
--
Amit Sunil Dhamne <amitsd(a)google.com>
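The off-by-one in the old bounds check is easiest to see with concrete numbers. The sketch below assumes, purely for illustration, a 30-byte maximum PD message and a 32-byte receive buffer; the real limits are sizeof(struct pd_message) and TCPC_RECEIVE_BUFFER_LEN, and the macro and function names are hypothetical. A byte count equal to the maximum message plus the frame-type byte is rejected by the old check but accepted by the fixed one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sizes for illustration only. */
#define SKETCH_PD_MSG_SIZE	30u
#define SKETCH_RX_BUF_LEN	32u
#define SKETCH_FRAME_TYPE_LEN	1u	/* SOP* byte included in RX_BYTE_CNT */

static_assert(SKETCH_RX_BUF_LEN >= SKETCH_PD_MSG_SIZE + SKETCH_FRAME_TYPE_LEN,
	      "buffer must hold a full message plus the frame-type byte");

/* Old check: compares the byte count against the message size alone. */
static bool rx_count_ok_old(unsigned int count)
{
	return !(count > SKETCH_PD_MSG_SIZE || count + 1 > SKETCH_RX_BUF_LEN);
}

/* Fixed check: allows for the extra frame-type byte. */
static bool rx_count_ok_new(unsigned int count)
{
	return !(count > SKETCH_PD_MSG_SIZE + SKETCH_FRAME_TYPE_LEN ||
		 count + 1 > SKETCH_RX_BUF_LEN);
}
```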
create_init_idmap() could be called before the .bss section is
initialized, which is done in early_map_kernel().
Therefore, text_prot/data_prot could be set incorrectly by the
PTE_MAYBE_NG macro, which sets the NG bit according to the value of
"arm64_use_ng_mappings", a variable placed in the .bss section:
# llvm-objdump-21 --syms vmlinux-gcc | grep arm64_use_ng_mappings
ffff800082f242a8 g O .bss 0000000000000001 arm64_use_ng_mappings
If the .bss section isn't initialized yet, "arm64_use_ng_mappings" may
hold a garbage value, and then text_prot or data_prot could be set
incorrectly.
Here is what I saw with a kernel compiled with llvm-21:
// create_init_idmap()
ffff80008255c058: d10103ff sub sp, sp, #0x40
ffff80008255c05c: a9017bfd stp x29, x30, [sp, #0x10]
ffff80008255c060: a90257f6 stp x22, x21, [sp, #0x20]
ffff80008255c064: a9034ff4 stp x20, x19, [sp, #0x30]
ffff80008255c068: 910043fd add x29, sp, #0x10
ffff80008255c06c: 90003fc8 adrp x8, 0xffff800082d54000
ffff80008255c070: d280e06a mov x10, #0x703 // =1795
ffff80008255c074: 91400409 add x9, x0, #0x1, lsl #12 // =0x1000
ffff80008255c078: 394a4108 ldrb w8, [x8, #0x290] ------------- (1)
ffff80008255c07c: f2e00d0a movk x10, #0x68, lsl #48
ffff80008255c080: f90007e9 str x9, [sp, #0x8]
ffff80008255c084: aa0103f3 mov x19, x1
ffff80008255c088: aa0003f4 mov x20, x0
ffff80008255c08c: 14000000 b 0xffff80008255c08c <__pi_create_init_idmap+0x34>
ffff80008255c090: aa082d56 orr x22, x10, x8, lsl #11 -------- (2)
Note: (1) loads the arm64_use_ng_mappings value into w8, and
(2) sets the text or data prot using the w8 value to set the PTE_NG bit.
If the .bss section isn't initialized, x8 can contain a garbage value
(on some platforms, x8 was loaded with 0xcf), which generates a wrong
mapping. For example, text_prot is expected to be
PAGE_KERNEL_ROX (0x0040000000000F83), but with the garbage value 0xcf
in x8 it is set to 0x0040000000067F83, which causes a boot failure with
a translation fault.
Whether this error shows up depends on the code generated by the compiler.
Here is the gcc case:
ffff80008260a940 <__pi_create_init_idmap>:
ffff80008260a940: d100c3ff sub sp, sp, #0x30
ffff80008260a944: aa0003ed mov x13, x0
ffff80008260a948: 91400400 add x0, x0, #0x1, lsl #12 // =0x1000
ffff80008260a94c: a9017bfd stp x29, x30, [sp, #0x10]
ffff80008260a950: 910043fd add x29, sp, #0x10
ffff80008260a954: f90017e0 str x0, [sp, #0x28]
ffff80008260a958: d00048c0 adrp x0, 0xffff800082f24000 <reset_devices>
ffff80008260a95c: 394aa000 ldrb w0, [x0, #0x2a8]
ffff80008260a960: 37000640 tbnz w0, #0x0, 0xffff80008260aa28 <__pi_create_init_idmap+0xe8> ---(3)
ffff80008260a964: d280f060 mov x0, #0x783 // =1923
ffff80008260a968: d280e062 mov x2, #0x703 // =1795
ffff80008260a96c: f2e00800 movk x0, #0x40, lsl #48
ffff80008260a970: f2e00d02 movk x2, #0x68, lsl #48
ffff80008260a974: aa2103e4 mvn x4, x1
ffff80008260a978: 8a210049 bic x9, x2, x1
...
ffff80008260aa28: d281f060 mov x0, #0xf83 // =3971
ffff80008260aa2c: d281e062 mov x2, #0xf03 // =3843
ffff80008260aa30: f2e00800 movk x0, #0x40, lsl #48
In the gcc case, it branches to the respective prot setup code depending
on the value of arm64_use_ng_mappings (annotated as (3)).
However, this is also a problem, since it branches on a garbage value
too, producing an idmap with an incorrect pgprot.
To resolve this, annotate arm64_use_ng_mappings as __ro_after_init.
Fixes: 84b04d3e6bdb ("arm64: kernel: Create initial ID map from C code")
Cc: <stable(a)vger.kernel.org> # 6.9.x
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Yeoreum Yun <yeoreum.yun(a)arm.com>
---
Since v1:
- add comments explaining why arm64_use_ng_mappings shouldn't be placed
in the .bss section
- fix typo in commit message
- https://lore.kernel.org/all/20250502145755.3751405-1-yeoreum.yun@arm.com/
There is another way to solve this problem: set text_prot/data_prot
with _PAGE_DEFAULT, which doesn't include PTE_MAYBE_NG, with a constant
check in create_init_idmap() so that it doesn't depend on
arm64_use_ng_mappings. But I think it is better to mark
arm64_use_ng_mappings as __ro_after_init, because it doesn't change
after the init phase, and this solves the problem too.
---
arch/arm64/kernel/cpufeature.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index d2104a1e7843..913ae2cead98 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -114,7 +114,18 @@ static struct arm64_cpu_capabilities const __ro_after_init *cpucap_ptrs[ARM64_NC
DECLARE_BITMAP(boot_cpucaps, ARM64_NCAPS);
-bool arm64_use_ng_mappings = false;
+/*
+ * The variable arm64_use_ng_mappings should be placed in the .rodata section.
+ * Otherwise, it would end up in the .bss section, where it is initialized in
+ * early_map_kernel(). This can cause problems because the PTE_MAYBE_NG macro
+ * uses this variable, and create_init_idmap() — which might run before
+ * early_map_kernel() — could end up generating an incorrect idmap table.
+ *
+ * In other words, accessing variable placed in .bss section before
+ * early_map_kernel() will return garbage,
+ * potentially resulting in a wrong pgprot value.
+ */
+bool arm64_use_ng_mappings __ro_after_init = false;
EXPORT_SYMBOL(arm64_use_ng_mappings);
DEFINE_PER_CPU_READ_MOSTLY(const char *, this_cpu_vector) = vectors;
--
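The pgprot values quoted in the commit message follow from simple bit arithmetic. The sketch below models the LLVM sequence "orr x22, x10, x8, lsl #11" in C; SKETCH_TEXT_PROT_BASE is the non-nG text prot that the gcc listing materialises (0x0040000000000783), and all names are local to this sketch rather than kernel definitions. A byte of 1 yields the expected PAGE_KERNEL_ROX value, while the garbage byte 0xcf reproduces the bad value reported.

```c
#include <assert.h>
#include <stdint.h>

/* nG is bit 11 of an arm64 stage-1 descriptor (PTE_NG in the kernel). */
#define SKETCH_PTE_NG_SHIFT	11
#define SKETCH_PTE_NG		(UINT64_C(1) << SKETCH_PTE_NG_SHIFT)

/* Text prot without nG, as built by "mov #0x783; movk #0x40, lsl #48". */
#define SKETCH_TEXT_PROT_BASE	UINT64_C(0x0040000000000783)

static_assert((SKETCH_TEXT_PROT_BASE & SKETCH_PTE_NG) == 0,
	      "base prot must not already carry nG");

/*
 * The compiler assumes a bool byte holds 0 or 1, so it shifts the whole
 * loaded byte into the nG position; any garbage bits land above bit 11.
 */
static uint64_t sketch_prot(uint64_t base, uint8_t use_ng_byte)
{
	return base | ((uint64_t)use_ng_byte << SKETCH_PTE_NG_SHIFT);
}
```

sketch_prot(SKETCH_TEXT_PROT_BASE, 1) gives 0x0040000000000F83 and sketch_prot(SKETCH_TEXT_PROT_BASE, 0xcf) gives 0x0040000000067F83, matching the values in the commit message.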
Hi All,
This patch series adds initial support for the HEVC (H.265) and VP9
codecs in the iris decoder. The objective of this work is to extend the
decoder's capabilities to handle HEVC and VP9 codec streams,
including necessary format handling and buffer management.
In addition, the series also includes a set of fixes to address issues
identified during testing of these additional codecs.
These patches also address the comments and feedback received from the
RFC patches previously sent. I have made the necessary improvements
based on the community's suggestions.
Changes in v3:
- Introduced two wrappers with explicit names to handle destroying internal
buffers (Nicolas)
- Used a sub-state check instead of introducing a new boolean (Vikash)
- Addressed other comments (Vikash)
- Reordered patches to have all fix patches first (Dmitry)
- Link to v2: https://lore.kernel.org/r/20250428-qcom-iris-hevc-vp9-v2-0-3a6013ecb8a5@qui…
Changes in v2:
- Added changes to make sure all buffers are released in session close
(Bryan)
- Added tracking for flush responses to fix a timing issue.
- Added handling to fix a timing issue in reconfig
- Split patch 06/20 in two patches (Bryan)
- Added missing fixes tag (Bryan)
- Updated fluster report (Nicolas)
- Link to v1:
https://lore.kernel.org/r/20250408-iris-dec-hevc-vp9-v1-0-acd258778bd6@quic…
Changes since RFC:
- Added additional fixes to address issues identified during further
testing.
- Moved typo fix to a separate patch [Neil]
- Reordered the patches for better logical flow and clarity [Neil,
Dmitry]
- Added fixes tag wherever applicable [Neil, Dmitry]
- Removed the default case in the switch statement for codecs [Bryan]
- Replaced if-else statements with switch-case [Bryan]
- Added comments for mbpf [Bryan]
- RFC:
https://lore.kernel.org/linux-media/20250305104335.3629945-1-quic_dikshita@…
This patch series depends on [1] & [2]
[1] https://lore.kernel.org/linux-media/20250417-topic-sm8x50-iris-v10-v7-0-f02…
[2] https://lore.kernel.org/linux-media/20250424-qcs8300_iris-v5-0-f118f505c300…
These patches were tested on SM8250 and SM8550 with v4l2-ctl and
GStreamer for the HEVC and VP9 decoders, while also ensuring that
the existing H264 decoder functionality remains unaffected.
Note: One of the fluster compliance tests is fixed with firmware [3]
[3]:
https://lore.kernel.org/linux-firmware/1a511921-446d-cdc4-0203-084c88a5dc1e…
The result of fluster test on SM8550:
131/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test cases:
- 10 testcases failed due to unsupported 10-bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcases failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
- 2 testcases failed due to CRC mismatch
- RAP_A_docomo_6
- RAP_B_Bossen_2
- BUG reported: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4392
Analysis: The first few frames in this stream are discarded by the
firmware and are sent to the driver with a filled length of 0. The
driver returns such buffers to the client with timestamp 0 and payload
0, and sets the buffer state to VB2_BUF_STATE_ERROR. GStreamer should
drop such buffers, but instead the first frame is displayed as a green
frame, and when a valid buffer with the same timestamp 0 is sent to the
client later, it is dropped, leading to a CRC mismatch for the first
frame.
235/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test cases:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug reported: https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 2 testcases failed due to unsupported resolution after a sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- 1 testcase failed due to an unsupported stream
- vp90-2-16-intra-only.webm
The result of fluster test on SM8250:
133/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test cases:
- 10 testcases failed due to unsupported 10-bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcases failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
232/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test cases:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug raised:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 5 testcases failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- vp90-2-21-resize_inter_320x240_5_1-2.webm
- vp90-2-21-resize_inter_320x240_7_1-2.webm
- vp90-2-18-resize.ivf
- 1 testcase failed with CRC mismatch
- vp90-2-16-intra-only.webm
Analysis: The first few frames are marked by the firmware as NO_SHOW frames.
The driver sets the buffer state to VB2_BUF_STATE_ERROR for such frames.
GStreamer should drop such buffers. Instead, the first frame is
displayed, and when a valid buffer with the same timestamp is later
sent to the client, it is dropped, leading to a CRC mismatch for the
first frame.
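The failure described in the analysis can be reproduced with a small model. This is a hypothetical sketch: the struct, field and function names are invented for illustration and are not iris driver or GStreamer APIs. It only contrasts a client that drops error-flagged (NO_SHOW) buffers with one that ignores the flag and instead drops later buffers carrying an already-seen timestamp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of capture buffers handed to a client. The names are
 * illustrative only; they are not iris driver or GStreamer APIs. */
struct cap_buf {
	long timestamp;
	bool error;    /* set for firmware NO_SHOW frames (VB2_BUF_STATE_ERROR) */
	int frame_id;
};

/* Expected client behaviour: drop every buffer flagged as an error.
 * Writes the displayed frame ids to out[] and returns how many. */
static size_t shown_expected(const struct cap_buf *b, size_t n, int *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (!b[i].error)
			out[count++] = b[i].frame_id;
	return count;
}

/* Observed behaviour: the error flag is ignored, and any buffer whose
 * timestamp was already seen is dropped as a duplicate. */
static size_t shown_observed(const struct cap_buf *b, size_t n, int *out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		bool dup = false;

		for (size_t j = 0; j < i; j++)
			if (b[j].timestamp == b[i].timestamp)
				dup = true;
		if (!dup)
			out[count++] = b[i].frame_id;
	}
	return count;
}
```

With the buffers { {0, true, 1}, {0, false, 2}, {33, false, 3} }, the expected path displays frames 2 and 3, while the observed path displays the NO_SHOW frame 1 and drops the valid frame 2, which matches the first-frame CRC mismatch.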
Signed-off-by: Dikshita Agarwal <quic_dikshita(a)quicinc.com>
---
Dikshita Agarwal (23):
media: iris: Skip destroying internal buffer if not dequeued
media: iris: Update CAPTURE format info based on OUTPUT format
media: iris: Avoid updating frame size to firmware during reconfig
media: iris: Drop port check for session property response
media: iris: Prevent HFI queue writes when core is in deinit state
media: iris: Remove deprecated property setting to firmware
media: iris: Fix missing function pointer initialization
media: iris: Fix NULL pointer dereference
media: iris: Fix typo in depth variable
media: iris: Track flush responses to prevent premature completion
media: iris: Fix buffer preparation failure during resolution change
media: iris: Add handling for corrupt and drop frames
media: iris: Send V4L2_BUF_FLAG_ERROR for buffers with 0 filled length
media: iris: Add handling for no show frames
media: iris: Improve last flag handling
media: iris: Skip flush on first sequence change
media: iris: Remove redundant buffer count check in stream off
media: iris: Add a comment to explain usage of MBPS
media: iris: Add HEVC and VP9 formats for decoder
media: iris: Add platform capabilities for HEVC and VP9 decoders
media: iris: Set mandatory properties for HEVC and VP9 decoders.
media: iris: Add internal buffer calculation for HEVC and VP9 decoders
media: iris: Add codec specific check for VP9 decoder drain handling
drivers/media/platform/qcom/iris/iris_buffer.c | 35 +-
drivers/media/platform/qcom/iris/iris_buffer.h | 3 +-
drivers/media/platform/qcom/iris/iris_ctrls.c | 35 +-
drivers/media/platform/qcom/iris/iris_hfi_common.h | 1 +
.../platform/qcom/iris/iris_hfi_gen1_command.c | 48 ++-
.../platform/qcom/iris/iris_hfi_gen1_defines.h | 5 +-
.../platform/qcom/iris/iris_hfi_gen1_response.c | 37 +-
.../platform/qcom/iris/iris_hfi_gen2_command.c | 143 +++++++-
.../platform/qcom/iris/iris_hfi_gen2_defines.h | 5 +
.../platform/qcom/iris/iris_hfi_gen2_response.c | 57 ++-
drivers/media/platform/qcom/iris/iris_hfi_queue.c | 2 +-
drivers/media/platform/qcom/iris/iris_instance.h | 6 +
.../platform/qcom/iris/iris_platform_common.h | 28 +-
.../media/platform/qcom/iris/iris_platform_gen2.c | 198 ++++++++--
.../platform/qcom/iris/iris_platform_qcs8300.h | 126 +++++--
.../platform/qcom/iris/iris_platform_sm8250.c | 15 +-
drivers/media/platform/qcom/iris/iris_state.c | 2 +-
drivers/media/platform/qcom/iris/iris_state.h | 1 +
drivers/media/platform/qcom/iris/iris_vb2.c | 18 +-
drivers/media/platform/qcom/iris/iris_vdec.c | 116 +++---
drivers/media/platform/qcom/iris/iris_vdec.h | 11 +
drivers/media/platform/qcom/iris/iris_vidc.c | 36 +-
drivers/media/platform/qcom/iris/iris_vpu_buffer.c | 397 ++++++++++++++++++++-
drivers/media/platform/qcom/iris/iris_vpu_buffer.h | 46 ++-
24 files changed, 1160 insertions(+), 211 deletions(-)
---
base-commit: 398a1b33f1479af35ca915c5efc9b00d6204f8fa
change-id: 20250428-qcom-iris-hevc-vp9-eb31f30c3390
prerequisite-message-id: <20250417-topic-sm8x50-iris-v10-v7-0-f020cb1d0e98(a)linaro.org>
prerequisite-patch-id: 35f8dae1416977e88c2db7c767800c01822e266e
prerequisite-patch-id: 2bba98151ca103aa62a513a0fbd0df7ae64d9868
prerequisite-patch-id: 0e43a6d758b5fa5ab921c6aa3c19859e312b47d0
prerequisite-patch-id: b7b50aa1657be59fd51c3e53d73382a1ee75a08e
prerequisite-patch-id: 30960743105a36f20b3ec4a9ff19e7bca04d6add
prerequisite-patch-id: b93c37dc7e09d1631b75387dc1ca90e3066dce17
prerequisite-patch-id: afffe7096c8e110a8da08c987983bc4441d39578
prerequisite-message-id: <20250424-qcs8300_iris-v5-0-f118f505c300(a)quicinc.com>
prerequisite-patch-id: 2e72fe4d11d264db3d42fa450427d30171303c6f
prerequisite-patch-id: 3398937a7fabb45934bb98a530eef73252231132
prerequisite-patch-id: feda620f147ca14a958c92afdc85a1dc507701ac
prerequisite-patch-id: 07ba0745c7d72796567e0a57f5c8e5355a8d2046
prerequisite-patch-id: e35b05c527217206ae871aef0d7b0261af0319ea
Best regards,
--
Dikshita Agarwal <quic_dikshita(a)quicinc.com>
From: Daniel Gomez <da.gomez(a)samsung.com>
[ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used, $INITFILE will
contain just the merge output, requiring the user to run make (e.g.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez(a)samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
scripts/kconfig/merge_config.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index 72da3b8d6f307..151f9938abaa7 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -105,8 +105,8 @@ INITFILE=$1
shift;
if [ ! -r "$INITFILE" ]; then
- echo "The base file '$INITFILE' does not exist. Exit." >&2
- exit 1
+ echo "The base file '$INITFILE' does not exist. Creating one..." >&2
+ touch "$INITFILE"
fi
MERGE_LIST=$*
--
2.39.5
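For readers less familiar with the shell idiom, the patched check is roughly the C sketch below. It is an analogue written for illustration, not code from the kernel tree: `ensure_base_file()` is a hypothetical helper, `access(R_OK)` stands in for `[ ! -r ... ]`, and opening in append mode stands in for `touch` (it creates a missing file and leaves an existing one untouched).

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical C analogue of the patched merge_config.sh logic:
 *   if [ ! -r "$INITFILE" ]; then touch "$INITFILE"; fi
 * Returns 0 on success, -1 if the file could not be created. */
static int ensure_base_file(const char *path)
{
	if (access(path, R_OK) == 0)
		return 0;    /* readable base file already exists */

	fprintf(stderr, "The base file '%s' does not exist. Creating one...\n", path);

	/* Append mode creates the file if needed without truncating it. */
	FILE *f = fopen(path, "a");
	if (!f)
		return -1;
	fclose(f);
	return 0;
}
```

Calling it twice on the same path is harmless: the first call creates an empty base file, and the second finds it readable and returns immediately.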
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 32dce6b1949a696dc7abddc04de8cbe35c260217
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050504-placate-iodize-9693@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 32dce6b1949a696dc7abddc04de8cbe35c260217 Mon Sep 17 00:00:00 2001
From: Janne Grunau <j(a)jannau.net>
Date: Tue, 4 Mar 2025 20:12:14 +0100
Subject: [PATCH] drm: Select DRM_KMS_HELPER from
DRM_DEBUG_DP_MST_TOPOLOGY_REFS
Using "depends on" and "select" for the same Kconfig symbol is known to
cause circular dependencies (cf. "Kconfig recursive dependency
limitations" in Documentation/kbuild/kconfig-language.rst).
DRM drivers select the DRM helpers, so do the same for
DRM_DEBUG_DP_MST_TOPOLOGY_REFS.
Fixes the following circular dependency reported on x86 for the downstream
Asahi Linux tree:
error: recursive dependency detected!
symbol DRM_KMS_HELPER is selected by DRM_GEM_SHMEM_HELPER
symbol DRM_GEM_SHMEM_HELPER is selected by RUST_DRM_GEM_SHMEM_HELPER
symbol RUST_DRM_GEM_SHMEM_HELPER is selected by DRM_ASAHI
symbol DRM_ASAHI depends on RUST
symbol RUST depends on CALL_PADDING
symbol CALL_PADDING depends on OBJTOOL
symbol OBJTOOL is selected by STACK_VALIDATION
symbol STACK_VALIDATION depends on UNWINDER_FRAME_POINTER
symbol UNWINDER_FRAME_POINTER is part of choice block at arch/x86/Kconfig.debug:224
symbol <choice> unknown is visible depending on UNWINDER_GUESS
symbol UNWINDER_GUESS prompt is visible depending on STACKDEPOT
symbol STACKDEPOT is selected by DRM_DEBUG_DP_MST_TOPOLOGY_REFS
symbol DRM_DEBUG_DP_MST_TOPOLOGY_REFS depends on DRM_KMS_HELPER
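The report above is a cycle in a dependency graph. As a simplified model (an assumption for illustration, not kconfig's actual algorithm), treat each line "A is selected by B" or "A depends on B" as a directed edge A -> B and run a depth-first search for a back edge. Replacing "depends on DRM_KMS_HELPER" with "select DRM_KMS_HELPER" reverses the last edge, because a select makes DRM_KMS_HELPER's value depend on DRM_DEBUG_DP_MST_TOPOLOGY_REFS instead of the other way round.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { NSYMS = 13 };

/* Symbols in the order they appear in the recursive-dependency report. */
static const char *const syms[NSYMS] = {
	"DRM_KMS_HELPER", "DRM_GEM_SHMEM_HELPER", "RUST_DRM_GEM_SHMEM_HELPER",
	"DRM_ASAHI", "RUST", "CALL_PADDING", "OBJTOOL", "STACK_VALIDATION",
	"UNWINDER_FRAME_POINTER", "<choice>", "UNWINDER_GUESS", "STACKDEPOT",
	"DRM_DEBUG_DP_MST_TOPOLOGY_REFS",
};

static bool adj[NSYMS][NSYMS];
static int color[NSYMS]; /* 0 = unvisited, 1 = on DFS stack, 2 = finished */

/* Returns the symbol where a back edge (cycle) was found, or NULL. */
static const char *dfs(int u)
{
	color[u] = 1;
	for (int v = 0; v < NSYMS; v++) {
		if (!adj[u][v])
			continue;
		if (color[v] == 1)
			return syms[v];
		if (color[v] == 0) {
			const char *hit = dfs(v);
			if (hit)
				return hit;
		}
	}
	color[u] = 2;
	return NULL;
}

/* fixed == false: DRM_DEBUG_DP_MST_TOPOLOGY_REFS "depends on" DRM_KMS_HELPER.
 * fixed == true:  it "selects" DRM_KMS_HELPER, which reverses that edge. */
static const char *find_cycle(bool fixed)
{
	for (int i = 0; i < NSYMS; i++) {
		color[i] = 0;
		for (int j = 0; j < NSYMS; j++)
			adj[i][j] = false;
	}
	/* Each report line "A ... B" becomes the edge A -> B. */
	for (int i = 0; i + 1 < NSYMS; i++)
		adj[i][i + 1] = true;
	if (fixed)
		adj[0][NSYMS - 1] = true; /* DRM_KMS_HELPER is selected by ... */
	else
		adj[NSYMS - 1][0] = true; /* ... depends on DRM_KMS_HELPER */

	for (int i = 0; i < NSYMS; i++) {
		if (color[i] == 0) {
			const char *hit = dfs(i);
			if (hit)
				return hit;
		}
	}
	return NULL;
}
```

In this model, find_cycle(false) reports a cycle closing at DRM_KMS_HELPER, and find_cycle(true) finds none, since every edge then points from an earlier symbol in the chain to a later one.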
Fixes: 12a280c72868 ("drm/dp_mst: Add topology ref history tracking for debugging")
Cc: stable(a)vger.kernel.org
Signed-off-by: Janne Grunau <j(a)jannau.net>
Acked-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://lore.kernel.org/r/20250304-drm_debug_dp_mst_topo_kconfig-v1-1-e16fd…
Signed-off-by: Alyssa Rosenzweig <alyssa(a)rosenzweig.io>
diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 2cba2b6ebe1c..f01925ed8176 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -188,7 +188,7 @@ config DRM_DEBUG_DP_MST_TOPOLOGY_REFS
bool "Enable refcount backtrace history in the DP MST helpers"
depends on STACKTRACE_SUPPORT
select STACKDEPOT
- depends on DRM_KMS_HELPER
+ select DRM_KMS_HELPER
depends on DEBUG_KERNEL
depends on EXPERT
help
The patch titled
Subject: zsmalloc: don't underflow size calculation in zs_obj_write()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
zsmalloc-dont-underflow-size-calculation-in-zs_obj_write.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Subject: zsmalloc: don't underflow size calculation in zs_obj_write()
Date: Sun, 4 May 2025 20:00:22 +0900
Do not mix class->size and object size during offset/size calculation in
zs_obj_write(). Size classes can merge into clusters, based on
objects-per-zspage and pages-per-zspage characteristics, so some size
classes can store objects smaller than class->size. This becomes
problematic when the object size is much smaller than class->size:
because we use the larger class->size for the check, we can conclude
that the object spans two physical pages while the actual object fits
in one physical page, so there is nothing to write to the second page
and the memcpy() size calculation underflows.
We always know the exact size in bytes of the object that we are about to
write (store), so use it instead of class->size.
Link: https://lkml.kernel.org/r/20250504110650.2783619-1-senozhatsky@chromium.org
Signed-off-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
Reported-by: Igor Belousov <igor.b(a)beldev.am>
Tested-by: Igor Belousov <igor.b(a)beldev.am>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/zsmalloc.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/mm/zsmalloc.c~zsmalloc-dont-underflow-size-calculation-in-zs_obj_write
+++ a/mm/zsmalloc.c
@@ -1243,19 +1243,19 @@ void zs_obj_write(struct zs_pool *pool,
class = zspage_class(pool, zspage);
off = offset_in_page(class->size * obj_idx);
- if (off + class->size <= PAGE_SIZE) {
+ if (!ZsHugePage(zspage))
+ off += ZS_HANDLE_SIZE;
+
+ if (off + mem_len <= PAGE_SIZE) {
/* this object is contained entirely within a page */
void *dst = kmap_local_zpdesc(zpdesc);
- if (!ZsHugePage(zspage))
- off += ZS_HANDLE_SIZE;
memcpy(dst + off, handle_mem, mem_len);
kunmap_local(dst);
} else {
/* this object spans two pages */
size_t sizes[2];
- off += ZS_HANDLE_SIZE;
sizes[0] = PAGE_SIZE - off;
sizes[1] = mem_len - sizes[0];
_
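The underflow can be shown with plain arithmetic. The sketch below is a simplified model of a non-huge zspage, not the mm/zsmalloc.c code: PAGE_SIZE and ZS_HANDLE_SIZE are illustrative values (4 KiB pages, 8-byte handle as on 64-bit), and each helper returns the number of bytes that would be copied to the second page, i.e. sizes[1].

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative constants; the real values depend on the architecture. */
#define PAGE_SIZE      4096UL
#define ZS_HANDLE_SIZE 8UL

/* Old logic (simplified): the "fits in one page" test uses class_size,
 * but the split uses the real object length mem_len. */
static size_t second_page_bytes_old(size_t off, size_t class_size, size_t mem_len)
{
	if (off + class_size <= PAGE_SIZE)
		return 0;                    /* single-page path */
	off += ZS_HANDLE_SIZE;
	size_t first = PAGE_SIZE - off;  /* sizes[0] */
	return mem_len - first;          /* sizes[1]: wraps if mem_len < first */
}

/* Fixed logic: add the handle first, then test with mem_len. */
static size_t second_page_bytes_new(size_t off, size_t mem_len)
{
	off += ZS_HANDLE_SIZE;
	if (off + mem_len <= PAGE_SIZE)
		return 0;
	return mem_len - (PAGE_SIZE - off);
}
```

For example, with off = 3000, class_size = 1200 and mem_len = 100, the old check wrongly takes the two-page path and computes 100 - 1088, which wraps to a huge size_t; the fixed check sees that 3008 + 100 fits in the page. For an object that really spans pages (off = 4000, mem_len = 200), both versions give 112 bytes for the second page.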
Patches currently in -mm which might be from senozhatsky(a)chromium.org are
zsmalloc-dont-underflow-size-calculation-in-zs_obj_write.patch
zsmalloc-prefer-the-the-original-pages-node-for-compressed-data-fix.patch
zram-modernize-writeback-interface.patch
zram-modernize-writeback-interface-v3.patch
zram-modernize-writeback-interface-v4.patch
zsmalloc-cleanup-headers-includes.patch
documentation-zram-update-idle-pages-tracking-documentation.patch
The patch below does not apply to the 6.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.14.y
git checkout FETCH_HEAD
git cherry-pick -x be8250786ca94952a19ce87f98ad9906448bc9ef
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050521-provable-extent-4108@gregkh' --subject-prefix 'PATCH 6.14.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From be8250786ca94952a19ce87f98ad9906448bc9ef Mon Sep 17 00:00:00 2001
From: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Date: Mon, 21 Apr 2025 15:52:32 +0800
Subject: [PATCH] mm, slab: clean up slab->obj_exts always
When memory allocation profiling is disabled at runtime or due to an
error, shutdown_mem_profiling() is called, but the previously allocated
slab->obj_exts remains.
It is not cleared by unaccount_slab() because
mem_alloc_profiling_enabled() is no longer true. This is incorrect:
slab->obj_exts should always be cleaned up in unaccount_slab() to avoid
the following error:
[...]BUG: Bad page state in process...
..
[...]page dumped because: page still charged to cgroup
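The bug pattern is "free only if a runtime flag is still set", where the flag can change between allocation and free. The sketch below is a hypothetical model (struct slab_model, account() and the unaccount_*() helpers are invented names, not mm/slub.c code) showing why the cleanup has to be unconditional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model; stands in for mem_alloc_profiling_enabled(). */
static bool profiling_enabled = true;

struct slab_model {
	void *obj_exts;
};

/* Allocate the extension vector only while profiling is enabled. */
static void account(struct slab_model *s)
{
	if (profiling_enabled)
		s->obj_exts = malloc(16);
}

/* Old behaviour: free only if profiling is *still* enabled. */
static void unaccount_old(struct slab_model *s)
{
	if (profiling_enabled) {
		free(s->obj_exts);
		s->obj_exts = NULL;
	}
}

/* New behaviour: always free; free(NULL) is a no-op. */
static void unaccount_new(struct slab_model *s)
{
	free(s->obj_exts);
	s->obj_exts = NULL;
}
```

If profiling is switched off between account() and unaccount_old(), the vector is left behind (the leftover that later shows up as a bad page state), while unaccount_new() releases it either way.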
[andriy.shevchenko(a)linux.intel.com: fold need_slab_obj_ext() into its only user]
Fixes: 21c690a349ba ("mm: introduce slabobj_ext to support slab object extensions")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zhenhua Huang <quic_zhenhuah(a)quicinc.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Harry Yoo <harry.yoo(a)oracle.com>
Tested-by: Harry Yoo <harry.yoo(a)oracle.com>
Acked-by: Suren Baghdasaryan <surenb(a)google.com>
Link: https://patch.msgid.link/20250421075232.2165527-1-quic_zhenhuah@quicinc.com
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
diff --git a/mm/slub.c b/mm/slub.c
index dc9e729e1d26..be8b09e09d30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2028,8 +2028,7 @@ int alloc_slab_obj_exts(struct slab *slab, struct kmem_cache *s,
return 0;
}
-/* Should be called only if mem_alloc_profiling_enabled() */
-static noinline void free_slab_obj_exts(struct slab *slab)
+static inline void free_slab_obj_exts(struct slab *slab)
{
struct slabobj_ext *obj_exts;
@@ -2049,18 +2048,6 @@ static noinline void free_slab_obj_exts(struct slab *slab)
slab->obj_exts = 0;
}
-static inline bool need_slab_obj_ext(void)
-{
- if (mem_alloc_profiling_enabled())
- return true;
-
- /*
- * CONFIG_MEMCG creates vector of obj_cgroup objects conditionally
- * inside memcg_slab_post_alloc_hook. No other users for now.
- */
- return false;
-}
-
#else /* CONFIG_SLAB_OBJ_EXT */
static inline void init_slab_obj_exts(struct slab *slab)
@@ -2077,11 +2064,6 @@ static inline void free_slab_obj_exts(struct slab *slab)
{
}
-static inline bool need_slab_obj_ext(void)
-{
- return false;
-}
-
#endif /* CONFIG_SLAB_OBJ_EXT */
#ifdef CONFIG_MEM_ALLOC_PROFILING
@@ -2129,7 +2111,7 @@ __alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
static inline void
alloc_tagging_slab_alloc_hook(struct kmem_cache *s, void *object, gfp_t flags)
{
- if (need_slab_obj_ext())
+ if (mem_alloc_profiling_enabled())
__alloc_tagging_slab_alloc_hook(s, object, flags);
}
@@ -2601,8 +2583,12 @@ static __always_inline void account_slab(struct slab *slab, int order,
static __always_inline void unaccount_slab(struct slab *slab, int order,
struct kmem_cache *s)
{
- if (memcg_kmem_online() || need_slab_obj_ext())
- free_slab_obj_exts(slab);
+ /*
+ * The slab object extensions should now be freed regardless of
+ * whether mem_alloc_profiling_enabled() or not because profiling
+ * might have been disabled after slab->obj_exts got allocated.
+ */
+ free_slab_obj_exts(slab);
mod_node_page_state(slab_pgdat(slab), cache_vmstat_idx(s),
-(PAGE_SIZE << order));
From: Daniel Gomez <da.gomez(a)samsung.com>
[ Upstream commit a26fe287eed112b4e21e854f173c8918a6a8596d ]
The scripts/kconfig/merge_config.sh script requires an existing
$INITFILE (or the $1 argument) as a base file for merging Kconfig
fragments. However, an empty $INITFILE can serve as an initial starting
point, later referenced by the KCONFIG_ALLCONFIG Makefile variable
if -m is not used. This variable can point to any configuration file
containing preset config symbols (the merged output) as stated in
Documentation/kbuild/kconfig.rst. When -m is used, $INITFILE will
contain just the merge output, requiring the user to run make (e.g.
KCONFIG_ALLCONFIG=<$INITFILE> make <allnoconfig/alldefconfig> or make
olddefconfig).
Instead of failing when `$INITFILE` is missing, create an empty file and
use it as the starting point for merges.
Signed-off-by: Daniel Gomez <da.gomez(a)samsung.com>
Signed-off-by: Masahiro Yamada <masahiroy(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
scripts/kconfig/merge_config.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/scripts/kconfig/merge_config.sh b/scripts/kconfig/merge_config.sh
index d7d5c58b8b6aa..557f37f481fdf 100755
--- a/scripts/kconfig/merge_config.sh
+++ b/scripts/kconfig/merge_config.sh
@@ -98,8 +98,8 @@ INITFILE=$1
shift;
if [ ! -r "$INITFILE" ]; then
- echo "The base file '$INITFILE' does not exist. Exit." >&2
- exit 1
+ echo "The base file '$INITFILE' does not exist. Creating one..." >&2
+ touch "$INITFILE"
fi
MERGE_LIST=$*
--
2.39.5
From: Ashish Kalra <ashish.kalra(a)amd.com>
When the shared pages are being made private during kdump preparation,
there are additional checks to handle shared GHCB pages.
These additional checks include handling the case of a GHCB page being
contained within a huge page.
The check for handling the case of a GHCB contained within a huge
page incorrectly prevents a page just below the GHCB page from being
transitioned back to private during kdump preparation, because its
inclusive upper bound (ghcb <= addr + size) also matches a GHCB that
starts right at the end of the range.
This skipped page causes a 0x404 #VC exception when it is accessed
later while dumping guest memory during vmcore generation via kdump.
Correct the range to be checked for a GHCB contained in a huge page.
Also ensure that the skipped huge page containing the GHCB page is
transitioned back to private by applying the correct address mask
later when changing GHCBs to private at the end of kdump preparation.
Cc: stable(a)vger.kernel.org
Fixes: 3074152e56c9 ("x86/sev: Convert shared memory back to private on kexec")
Signed-off-by: Ashish Kalra <ashish.kalra(a)amd.com>
---
arch/x86/coco/sev/core.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/arch/x86/coco/sev/core.c b/arch/x86/coco/sev/core.c
index d35fec7b164a..97e5d475b9f5 100644
--- a/arch/x86/coco/sev/core.c
+++ b/arch/x86/coco/sev/core.c
@@ -1019,7 +1019,8 @@ static void unshare_all_memory(void)
data = per_cpu(runtime_data, cpu);
ghcb = (unsigned long)&data->ghcb_page;
- if (addr <= ghcb && ghcb <= addr + size) {
+ /* Handle the case of a huge page containing the GHCB page */
+ if (addr <= ghcb && ghcb < addr + size) {
skipped_addr = true;
break;
}
@@ -1131,9 +1132,8 @@ static void shutdown_all_aps(void)
void snp_kexec_finish(void)
{
struct sev_es_runtime_data *data;
+ unsigned long size, mask, ghcb;
unsigned int level, cpu;
- unsigned long size;
- struct ghcb *ghcb;
pte_t *pte;
if (!cc_platform_has(CC_ATTR_GUEST_SEV_SNP))
@@ -1157,11 +1157,14 @@ void snp_kexec_finish(void)
for_each_possible_cpu(cpu) {
data = per_cpu(runtime_data, cpu);
- ghcb = &data->ghcb_page;
- pte = lookup_address((unsigned long)ghcb, &level);
+ ghcb = (unsigned long)&data->ghcb_page;
+ pte = lookup_address(ghcb, &level);
size = page_level_size(level);
+ mask = page_level_mask(level);
+ /* Handle the case of a huge page containing the GHCB page */
+ ghcb &= mask;
set_pte_enc(pte, level, (void *)ghcb);
- snp_set_memory_private((unsigned long)ghcb, (size / PAGE_SIZE));
+ snp_set_memory_private(ghcb, (size / PAGE_SIZE));
}
}
--
2.34.1
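Both halves of the fix are interval arithmetic, sketched below with illustrative x86-64 page sizes (4 KiB and 2 MiB). The helper names are hypothetical; page_base() mirrors `ghcb &= page_level_mask(level)`, where the mask is `~(page_level_size(level) - 1)`.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 0x1000UL    /* 4 KiB */
#define PMD_SIZE  0x200000UL  /* 2 MiB huge page */

/* Old check: closed interval, so a GHCB that starts exactly at
 * addr + size (the page after the range) is treated as inside it. */
static bool ghcb_in_range_old(unsigned long addr, unsigned long size, unsigned long ghcb)
{
	return addr <= ghcb && ghcb <= addr + size;
}

/* Fixed check: half-open interval [addr, addr + size). */
static bool ghcb_in_range_new(unsigned long addr, unsigned long size, unsigned long ghcb)
{
	return addr <= ghcb && ghcb < addr + size;
}

/* Align an address down to the start of the (possibly huge) page
 * that maps it. */
static unsigned long page_base(unsigned long addr, unsigned long page_size)
{
	return addr & ~(page_size - 1);
}
```

With a 4 KiB range at 0x1000 and the GHCB at 0x2000, the old check wrongly skips the page at 0x1000, while the new one does not. For a GHCB at 0x203000 inside a 2 MiB mapping, masking yields 0x200000, so the whole huge page is converted back to private instead of starting mid-page.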
Commit a4e772898f8b ("PCI: Add missing bridge lock to
pci_bus_lock()") made the choice of which lock function to call depend
on dev->subordinate but left pci_slot_unlock() unmodified, creating a
locking asymmetry with pci_slot_lock().
Because of the asymmetric lock handling, the same bridge device is
unlocked twice. First, pci_bus_unlock() unlocks bus->self, and then
pci_slot_unlock() unconditionally unlocks the same bridge device.
Move pci_dev_unlock() inside an else branch to match the logic in
pci_slot_lock().
Fixes: a4e772898f8b ("PCI: Add missing bridge lock to pci_bus_lock()")
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org>
---
v2:
- Improve changelog (Lukas)
- Added Cc stable
drivers/pci/pci.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 4d7c9f64ea24..26507aa906d7 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5542,7 +5542,8 @@ static void pci_slot_unlock(struct pci_slot *slot)
continue;
if (dev->subordinate)
pci_bus_unlock(dev->subordinate);
- pci_dev_unlock(dev);
+ else
+ pci_dev_unlock(dev);
}
}
base-commit: 0af2f6be1b4281385b618cb86ad946eded089ac8
--
2.39.5
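The asymmetry can be made concrete by modelling a device lock as a counter. This is a hypothetical sketch, not drivers/pci/pci.c: struct dev_model and the helpers are invented, and bus_lock()/bus_unlock() stand in for pci_bus_lock()/pci_bus_unlock(), which (per the description above) also lock and unlock the bridge itself via bus->self.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: lock_depth is 1 while held, 0 when released. */
struct dev_model {
	int lock_depth;
	bool has_subordinate;  /* bridge with a secondary bus */
};

static void dev_lock(struct dev_model *d)   { d->lock_depth++; }
static void dev_unlock(struct dev_model *d) { d->lock_depth--; }

/* Locking the subordinate bus also takes the bridge (bus->self) lock. */
static void bus_lock(struct dev_model *bridge)   { dev_lock(bridge); }
static void bus_unlock(struct dev_model *bridge) { dev_unlock(bridge); }

static void slot_lock(struct dev_model *d)
{
	if (d->has_subordinate)
		bus_lock(d);
	else
		dev_lock(d);
}

/* Before the fix: the bridge is unlocked a second time. */
static void slot_unlock_old(struct dev_model *d)
{
	if (d->has_subordinate)
		bus_unlock(d);
	dev_unlock(d);
}

/* After the fix: mirrors slot_lock() exactly. */
static void slot_unlock_new(struct dev_model *d)
{
	if (d->has_subordinate)
		bus_unlock(d);
	else
		dev_unlock(d);
}
```

For a bridge, a lock/unlock pair with slot_unlock_old() leaves the counter at -1 (one unlock too many); slot_unlock_new() returns it to 0 for both bridges and plain devices.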
Hi Greg,
below is a backport for upstream patch
fd87b7783802 ("net: Fix the devmem sock opts and msgs for parisc").
This upstream patch does not apply cleanly against v6.12, and
backporting all intermediate changes are too big, so I created this
trivial standalone patch instead.
Can you please add the patc below to the stable queue for v6.12?
Thanks!
Helge
---
From: Pranjal Shrivastava <praan(a)google.com>
Date: Mon, 24 Mar 2025 07:42:27 +0000
Subject: [PATCH] net: Fix the devmem sock opts and msgs for parisc
The devmem socket options and socket control message definitions
introduced in the TCP devmem series[1] incorrectly continued the
generic socket numbering (78, 79, 80) for arch/parisc, instead of the
0x40xx series that arch/parisc uses.
This UAPI change should be safe, as no drivers currently declare
support for devmem TCP RX via PP_FLAG_ALLOW_UNREADABLE_NETMEM.
Fix the devmem socket option and socket control message definitions
to follow the numbering series used by arch/parisc.
[1] https://lore.kernel.org/lkml/20240910171458.219195-10-almasrymina@google.co…
Patch modified for kernel 6.12 by Helge Deller.
Fixes: 8f0b3cc9a4c10 ("tcp: RX path for devmem TCP")
Signed-off-by: Pranjal Shrivastava <praan(a)google.com>
Signed-off-by: Helge Deller <deller(a)gmx.de>
diff --git b/arch/parisc/include/uapi/asm/socket.h a/arch/parisc/include/uapi/asm/socket.h
index 38fc0b188e08..96831c988606 100644
--- b/arch/parisc/include/uapi/asm/socket.h
+++ a/arch/parisc/include/uapi/asm/socket.h
@@ -132,11 +132,15 @@
#define SO_PASSPIDFD 0x404A
#define SO_PEERPIDFD 0x404B
-#define SO_DEVMEM_LINEAR 78
+#define SCM_TS_OPT_ID 0x404C
+
+#define SO_RCVPRIORITY 0x404D
+
+#define SO_DEVMEM_LINEAR 0x404E
#define SCM_DEVMEM_LINEAR SO_DEVMEM_LINEAR
-#define SO_DEVMEM_DMABUF 79
+#define SO_DEVMEM_DMABUF 0x404F
#define SCM_DEVMEM_DMABUF SO_DEVMEM_DMABUF
-#define SO_DEVMEM_DONTNEED 80
+#define SO_DEVMEM_DONTNEED 0x4050
#if !defined(__KERNEL__)
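The point of the fix is that parisc socket options form their own consecutive 0x40xx series. The sketch below copies the values from the patched hunk above and checks that each follows the previous one; is_consecutive() is a hypothetical helper written for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values taken from the patched arch/parisc/include/uapi/asm/socket.h hunk. */
static const unsigned int parisc_opts[] = {
	0x404A, /* SO_PASSPIDFD */
	0x404B, /* SO_PEERPIDFD */
	0x404C, /* SCM_TS_OPT_ID */
	0x404D, /* SO_RCVPRIORITY */
	0x404E, /* SO_DEVMEM_LINEAR */
	0x404F, /* SO_DEVMEM_DMABUF */
	0x4050, /* SO_DEVMEM_DONTNEED */
};

/* Returns true if each value is exactly one more than the previous one. */
static bool is_consecutive(const unsigned int *v, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (v[i] != v[i - 1] + 1)
			return false;
	return true;
}
```

The pre-fix sequence (0x404A, 0x404B, 78, 79, 80) fails this check, because the devmem values continued the generic numbering rather than the parisc series.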
Add the correct scale to get the temperature in millidegrees Celsius.
Add a sign component to the temperature scan element.
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
---
Changes in v3:
- Dropped the define in favor of an inline scale value
- Used constants from units.h
- Tweaked commit msg to make it more assertive
- Link to v2: https://lore.kernel.org/r/20250502-fxls-v2-0-e1af65f1aa6c@geanix.com
Changes in v2:
- Correct offset is applied before scaling component
- Added sign component to temperature scan element
- Link to v1: https://lore.kernel.org/r/20250501-fxls-v1-1-f54061a07099@geanix.com
---
Sean Nyekjaer (2):
iio: accel: fxls8962af: Fix temperature calculation
iio: accel: fxls8962af: Fix temperature scan element sign
drivers/iio/accel/fxls8962af-core.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
---
base-commit: 609bc31eca06c7408e6860d8b46311ebe45c1fef
change-id: 20250501-fxls-307ef3d6d065
Best regards,
--
Sean Nyekjaer <sean(a)geanix.com>
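Both fixes follow the IIO convention processed = (raw + offset) * scale, with temperature reported in millidegrees Celsius. The sketch below shows why the sign matters for an 8-bit reading; the offset (25) and scale (1000) used in the example are hypothetical values chosen for illustration, not taken from the fxls8962af datasheet or driver.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend an 8-bit two's complement value without relying on
 * implementation-defined conversions (same idea as sign_extend32(v, 7)). */
static int32_t sext8(uint8_t v)
{
	return (int32_t)(v ^ 0x80) - 0x80;
}

/* IIO convention: processed = (raw + offset) * scale.
 * The offset is applied before scaling. */
static int32_t temp_milli_celsius(uint8_t reg, int32_t offset, int32_t scale, int is_signed)
{
	int32_t raw = is_signed ? sext8(reg) : (int32_t)reg;

	return (raw + offset) * scale;
}
```

With the hypothetical offset 25 and scale 1000, a register value of 0xF6 reads as -10 when treated as signed, giving 15000 (15 °C); treated as unsigned it reads as 246 and gives an implausible 271000.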
Apply bridge window offsets to screen_info framebuffers during
relocation. Fixes invalid access to I/O memory.
Resources behind a PCI bridge can be located at a certain offset
in the kernel's I/O range. The framebuffer memory range stored in
screen_info refers to the offset as seen during boot (essentially 0).
During boot up, the kernel may assign a different memory offset to
the bridge device, thereby relocating the framebuffer address of
the PCI graphics device as seen by the kernel. The information in
screen_info must be updated as well.
The helper pcibios_bus_to_resource() performs the relocation of
the screen_info resource. The result now matches the I/O-memory
resource of the PCI graphics device. As before, we store away the
information necessary to update the information in screen_info.
Commit 78aa89d1dfba ("firmware/sysfb: Update screen_info for relocated
EFI framebuffers") added the code for updating screen_info. It is
based on similar functionality that pre-existed in efifb. Efifb uses
a pointer to the PCI resource, while the newer code does a memcpy of
the region. Hence efifb sees any updates to the PCI resource and avoids
the issue.
v2:
- Fixed tags (Takashi, Ivan)
- Updated information on efifb
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Reported-by: "Ivan T. Ivanov" <iivanov(a)suse.de>
Closes: https://bugzilla.suse.com/show_bug.cgi?id=1240696
Tested-by: "Ivan T. Ivanov" <iivanov(a)suse.de>
Fixes: 78aa89d1dfba ("firmware/sysfb: Update screen_info for relocated EFI framebuffers")
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.9+
---
drivers/video/screen_info_pci.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/drivers/video/screen_info_pci.c b/drivers/video/screen_info_pci.c
index 6c5833517141..c46c75dc3fae 100644
--- a/drivers/video/screen_info_pci.c
+++ b/drivers/video/screen_info_pci.c
@@ -8,7 +8,7 @@
static struct pci_dev *screen_info_lfb_pdev;
static size_t screen_info_lfb_bar;
static resource_size_t screen_info_lfb_offset;
-static struct resource screen_info_lfb_res = DEFINE_RES_MEM(0, 0);
+static struct pci_bus_region screen_info_lfb_region;
static bool __screen_info_relocation_is_valid(const struct screen_info *si, struct resource *pr)
{
@@ -31,7 +31,7 @@ void screen_info_apply_fixups(void)
if (screen_info_lfb_pdev) {
struct resource *pr = &screen_info_lfb_pdev->resource[screen_info_lfb_bar];
- if (pr->start != screen_info_lfb_res.start) {
+ if (pr->start != screen_info_lfb_region.start) {
if (__screen_info_relocation_is_valid(si, pr)) {
/*
* Only update base if we have an actual
@@ -69,10 +69,21 @@ static void screen_info_fixup_lfb(struct pci_dev *pdev)
for (i = 0; i < numres; ++i) {
struct resource *r = &res[i];
+ struct pci_bus_region bus_region = {
+ .start = r->start,
+ .end = r->end,
+ };
const struct resource *pr;
if (!(r->flags & IORESOURCE_MEM))
continue;
+
+ /*
+ * Translate the address to resource if the framebuffer
+ * is behind a PCI bridge.
+ */
+ pcibios_bus_to_resource(pdev->bus, r, &bus_region);
+
pr = pci_find_resource(pdev, r);
if (!pr)
continue;
@@ -85,7 +96,7 @@ static void screen_info_fixup_lfb(struct pci_dev *pdev)
screen_info_lfb_pdev = pdev;
screen_info_lfb_bar = pr - pdev->resource;
screen_info_lfb_offset = r->start - pr->start;
- memcpy(&screen_info_lfb_res, r, sizeof(screen_info_lfb_res));
+ memcpy(&screen_info_lfb_region, &bus_region, sizeof(screen_info_lfb_region));
}
}
DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID, PCI_BASE_CLASS_DISPLAY, 16,
--
2.49.0
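The translation that pcibios_bus_to_resource() performs can be pictured as "find the bridge window that contains the bus address and add that window's offset". The sketch below is a simplified, hypothetical model of that idea (struct window_model and bus_to_resource() are invented names, and the window values are made up), not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical bridge window: bus addresses [bus_start, bus_end] appear
 * in the kernel's resource space at bus address + offset. */
struct window_model {
	unsigned long long bus_start;
	unsigned long long bus_end;
	unsigned long long offset;
};

/* Simplified analogue of bus-to-resource translation: apply the offset
 * of the window containing bus_addr, or no offset if none matches. */
static unsigned long long bus_to_resource(const struct window_model *w, size_t n,
					  unsigned long long bus_addr)
{
	for (size_t i = 0; i < n; i++)
		if (bus_addr >= w[i].bus_start && bus_addr <= w[i].bus_end)
			return bus_addr + w[i].offset;
	return bus_addr;
}
```

With a window covering bus addresses 0x0 to 0xFFFFFFF at offset 0x400000000, a framebuffer recorded at bus address 0x1000000 maps to resource address 0x401000000; comparing the raw boot-time value against the PCI resource would therefore never match.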
Starting with Rust 1.87.0 (expected 2025-05-15) [1], Clippy may expand
the `ptr_eq` lint, e.g.:
error: use `core::ptr::eq` when comparing raw pointers
--> rust/kernel/list.rs:438:12
|
438 | if self.first == item {
| ^^^^^^^^^^^^^^^^^^ help: try: `core::ptr::eq(self.first, item)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#ptr_eq
= note: `-D clippy::ptr-eq` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::ptr_eq)]`
Thus, clean up the few cases we have.
This patch may not actually be needed by the time Rust 1.87.0 is released,
since a PR to relax the lint has been beta nominated [2] due to reports
of it being too eager (at least by default) [3].
Cc: stable(a)vger.kernel.org # Needed in 6.12.y and later (Rust is pinned in older LTSs).
Link: https://github.com/rust-lang/rust-clippy/pull/14339 [1]
Link: https://github.com/rust-lang/rust-clippy/pull/14526 [2]
Link: https://github.com/rust-lang/rust-clippy/issues/14525 [3]
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
rust/kernel/alloc/kvec.rs | 2 +-
rust/kernel/list.rs | 12 ++++++------
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/rust/kernel/alloc/kvec.rs b/rust/kernel/alloc/kvec.rs
index ae9d072741ce..cde911551327 100644
--- a/rust/kernel/alloc/kvec.rs
+++ b/rust/kernel/alloc/kvec.rs
@@ -743,7 +743,7 @@ fn into_raw_parts(self) -> (*mut T, NonNull<T>, usize, usize) {
pub fn collect(self, flags: Flags) -> Vec<T, A> {
let old_layout = self.layout;
let (mut ptr, buf, len, mut cap) = self.into_raw_parts();
- let has_advanced = ptr != buf.as_ptr();
+ let has_advanced = !core::ptr::eq(ptr, buf.as_ptr());
if has_advanced {
// Copy the contents we have advanced to at the beginning of the buffer.
diff --git a/rust/kernel/list.rs b/rust/kernel/list.rs
index a335c3b1ff5e..c63cbeee3316 100644
--- a/rust/kernel/list.rs
+++ b/rust/kernel/list.rs
@@ -435,7 +435,7 @@ unsafe fn remove_internal_inner(
// * If `item` was the only item in the list, then `prev == item`, and we just set
// `item->next` to null, so this correctly sets `first` to null now that the list is
// empty.
- if self.first == item {
+ if core::ptr::eq(self.first, item) {
// SAFETY: The `prev` pointer is the value that `item->prev` had when it was in this
// list, so it must be valid. There is no race since `prev` is still in the list and we
// still have exclusive access to the list.
@@ -556,7 +556,7 @@ fn next(&mut self) -> Option<ArcBorrow<'a, T>> {
let next = unsafe { (*current).next };
// INVARIANT: If `current` was the last element of the list, then this updates it to null.
// Otherwise, we update it to the next element.
- self.current = if next != self.stop {
+ self.current = if !core::ptr::eq(next, self.stop) {
next
} else {
ptr::null_mut()
@@ -726,7 +726,7 @@ impl<'a, T: ?Sized + ListItem<ID>, const ID: u64> Cursor<'a, T, ID> {
fn prev_ptr(&self) -> *mut ListLinksFields {
let mut next = self.next;
let first = self.list.first;
- if next == first {
+ if core::ptr::eq(next, first) {
// We are before the first element.
return core::ptr::null_mut();
}
@@ -788,7 +788,7 @@ pub fn move_next(&mut self) -> bool {
// access the `next` field.
let mut next = unsafe { (*self.next).next };
- if next == self.list.first {
+ if core::ptr::eq(next, self.list.first) {
next = core::ptr::null_mut();
}
@@ -802,7 +802,7 @@ pub fn move_next(&mut self) -> bool {
/// If the cursor is before the first element, then this call does nothing. This call returns
/// `true` if the cursor's position was changed.
pub fn move_prev(&mut self) -> bool {
- if self.next == self.list.first {
+ if core::ptr::eq(self.next, self.list.first) {
return false;
}
@@ -822,7 +822,7 @@ fn insert_inner(&mut self, item: ListArc<T, ID>) -> *mut ListLinksFields {
// * `ptr` is an element in the list or null.
// * if `ptr` is null, then `self.list.first` is null so the list is empty.
let item = unsafe { self.list.insert_inner(item, ptr) };
- if self.next == self.list.first {
+ if core::ptr::eq(self.next, self.list.first) {
// INVARIANT: We just inserted `item`, so it's a member of list.
self.list.first = item;
}
--
2.49.0
Commit cdd30ebb1b9f ("module: Convert symbol namespace to string
literal") makes the grammar of MODULE_IMPORT_NS and EXPORT_SYMBOL_NS
differ between the stable branches and mainline. When commit
955f9ede52b8 ("bpf: Add namespace to BPF internal symbols") was
backported from mainline, only the EXPORT_SYMBOL_NS instances were
adapted, leaving the MODULE_IMPORT_NS instance with the "new" grammar
and causing the module to fail to build:
ERROR: modpost: module bpf_preload uses symbol bpf_link_get_from_fd from namespace BPF_INTERNAL, but does not import it.
ERROR: modpost: module bpf_preload uses symbol kern_sys_bpf from namespace BPF_INTERNAL, but does not import it.
Reported-by: Mingcong Bai <jeffbai(a)aosc.io>
Reported-by: Alex Davis <alex47794(a)gmail.com>
Closes: https://lore.kernel.org/all/CADiockBKBQTVqjA5G+RJ9LBwnEnZ8o0odYnL=LBZ_7QN=_…
Fixes: 955f9ede52b8 ("bpf: Add namespace to BPF internal symbols")
Signed-off-by: Xi Ruoyao <xry111(a)xry111.site>
---
kernel/bpf/preload/bpf_preload_kern.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bpf/preload/bpf_preload_kern.c b/kernel/bpf/preload/bpf_preload_kern.c
index 56a81df7a9d7..fdad0eb308fe 100644
--- a/kernel/bpf/preload/bpf_preload_kern.c
+++ b/kernel/bpf/preload/bpf_preload_kern.c
@@ -89,5 +89,5 @@ static void __exit fini(void)
}
late_initcall(load);
module_exit(fini);
-MODULE_IMPORT_NS("BPF_INTERNAL");
+MODULE_IMPORT_NS(BPF_INTERNAL);
MODULE_LICENSE("GPL");
--
2.49.0
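The build failure comes down to how each branch's macro treats its argument. The sketch below models the two grammars with simplified, hypothetical stand-ins (OLD_IMPORT_NS, NEW_IMPORT_NS and ns_matches are not kernel APIs), assuming that the stable-branch MODULE_IMPORT_NS stringifies a bare token via __stringify() while the mainline one takes a string literal as-is. Stringifying the literal "BPF_INTERNAL" preserves its quote characters, so the recorded namespace no longer matches BPF_INTERNAL, which is consistent with the modpost errors above.

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification, modelled on the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x) STRINGIFY_1(x)

/*
 * Hypothetical stand-ins: each expands to the namespace string that would
 * end up in the module's import_ns modinfo entry.
 */
#define OLD_IMPORT_NS(ns) STRINGIFY(ns) /* stable grammar: bare token */
#define NEW_IMPORT_NS(ns) (ns)          /* mainline grammar: string literal */

/* Namespace name as the exporting side records it. */
static const char exported_ns[] = "BPF_INTERNAL";

/* Returns 1 if an imported namespace string matches the exported one. */
int ns_matches(const char *imported)
{
	return strcmp(imported, exported_ns) == 0;
}

void demo_import_ns(void)
{
	/* Stable grammar with a bare token: stringified to BPF_INTERNAL. */
	assert(ns_matches(OLD_IMPORT_NS(BPF_INTERNAL)));

	/* The broken backport: the quote characters become part of the name. */
	assert(strcmp(OLD_IMPORT_NS("BPF_INTERNAL"), "\"BPF_INTERNAL\"") == 0);
	assert(!ns_matches(OLD_IMPORT_NS("BPF_INTERNAL")));

	/* Mainline grammar with a string literal: matches. */
	assert(ns_matches(NEW_IMPORT_NS("BPF_INTERNAL")));
}
```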
Hi,
Could you please apply commit 6eab70345799 ("ASoC: soc-core: Stop using
of_property_read_bool() for non-boolean properties") to v6.14.x to
silence the warnings introduced in v6.14 by commit c141ecc3cecd ("of:
Warn when of_property_read_bool() is used on non-boolean properties")?
Thanks
Christophe
The following changes since commit 02a22be3c0003af08df510cba3d79d00c6495b74:
bcachefs: bch2_ioctl_subvolume_destroy() fixes (2025-04-03 16:13:53 -0400)
are available in the Git repository at:
git://evilpiepirate.org/bcachefs.git tags/bcachefs-for-6.14-2025-05-02
for you to fetch changes up to 52b17bca7b20663e5df6dbfc24cc2030259b64b6:
bcachefs: Remove incorrect __counted_by annotation (2025-05-02 21:09:51 -0400)
----------------------------------------------------------------
bcachefs fixes for 6.14
Remove incorrect __counted_by annotation, fixing FORTIFY_SOURCE crashes
that have been hitting Arch users.
----------------------------------------------------------------
Alan Huang (1):
bcachefs: Remove incorrect __counted_by annotation
fs/bcachefs/xattr_format.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
The following changes since commit 1a7a2300e0dd8b4a73bcd3777a2947fe42a16bef:
bcachefs: bch2_ioctl_subvolume_destroy() fixes (2025-03-31 13:16:15 -0400)
are available in the Git repository at:
git://evilpiepirate.org/bcachefs.git tags/bcachefs-for-6.12-2025-05-5
for you to fetch changes up to 3f105630c0b2e53a93713c2328e3426081f961c1:
bcachefs: Remove incorrect __counted_by annotation (2025-05-05 09:41:26 -0400)
----------------------------------------------------------------
bcachefs fixes for 6.12
Remove incorrect __counted_by annotation, fixing FORTIFY_SOURCE crashes
that have been hitting Arch users.
----------------------------------------------------------------
Alan Huang (1):
bcachefs: Remove incorrect __counted_by annotation
fs/bcachefs/xattr_format.h | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 94cff94634e506a4a44684bee1875d2dbf782722
Gitweb: https://git.kernel.org/tip/94cff94634e506a4a44684bee1875d2dbf782722
Author: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
AuthorDate: Fri, 04 Apr 2025 15:31:16 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 05 May 2025 15:34:49 +02:00
clocksource/i8253: Use raw_spinlock_irqsave() in clockevent_i8253_disable()
On x86 during boot, clockevent_i8253_disable() can be invoked via
x86_late_time_init() -> hpet_time_init() -> pit_timer_init(), which runs
with interrupts enabled.
If some of the old i8253 hardware is actually used, lockdep will notice
that i8253_lock is also used in hard interrupt context. Lockdep then
complains because it has observed the same lock being acquired both with
interrupts enabled and in hard interrupt context.
Make clockevent_i8253_disable() acquire the lock with
raw_spinlock_irqsave() to cure this.
[ tglx: Massage change log and use guard() ]
Fixes: c8c4076723dac ("x86/timer: Skip PIT initialization on modern chipsets")
Signed-off-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20250404133116.p-XRWJXf@linutronix.de
---
drivers/clocksource/i8253.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/clocksource/i8253.c b/drivers/clocksource/i8253.c
index 39f7c2d..b603c25 100644
--- a/drivers/clocksource/i8253.c
+++ b/drivers/clocksource/i8253.c
@@ -103,7 +103,7 @@ int __init clocksource_i8253_init(void)
#ifdef CONFIG_CLKEVT_I8253
void clockevent_i8253_disable(void)
{
- raw_spin_lock(&i8253_lock);
+ guard(raw_spinlock_irqsave)(&i8253_lock);
/*
* Writing the MODE register should stop the counter, according to
@@ -132,8 +132,6 @@ void clockevent_i8253_disable(void)
outb_p(0, PIT_CH0);
outb_p(0x30, PIT_MODE);
-
- raw_spin_unlock(&i8253_lock);
}
static int pit_shutdown(struct clock_event_device *evt)
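guard() releases the lock automatically when the enclosing scope ends, which is why the explicit raw_spin_unlock() could be dropped. The sketch below is a userspace model of that idea, assuming a GCC or Clang compiler: the kernel's guard() helpers are built on the same __attribute__((cleanup)) mechanism, but fake_lock, irqsave_lock(), irqsave_unlock(), GUARD_IRQSAVE() and the irqs_enabled flag are hypothetical stand-ins, not kernel code. It shows the two properties the fix relies on: interrupts stay off for the whole critical section, and the previous interrupt state (not simply "enabled") is restored on exit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock and "interrupt enable" flag (userspace stand-ins). */
struct fake_lock {
	bool held;
};

static bool irqs_enabled = true;

struct irqsave_guard {
	struct fake_lock *lock;
	bool saved_irqs; /* interrupt state to restore, like 'flags' */
};

static struct irqsave_guard irqsave_lock(struct fake_lock *l)
{
	struct irqsave_guard g = { .lock = l, .saved_irqs = irqs_enabled };

	irqs_enabled = false; /* like local_irq_save() */
	l->held = true;
	return g;
}

static void irqsave_unlock(struct irqsave_guard *g)
{
	g->lock->held = false;
	irqs_enabled = g->saved_irqs; /* like local_irq_restore() */
}

/*
 * Simplified guard: the cleanup attribute calls irqsave_unlock() with a
 * pointer to scope_guard whenever it goes out of scope, on every path.
 */
#define GUARD_IRQSAVE(l) \
	struct irqsave_guard scope_guard \
		__attribute__((cleanup(irqsave_unlock))) = irqsave_lock(l)

struct fake_lock demo_lock;
bool seen_held, seen_irqs_off;

void disable_device(void)
{
	GUARD_IRQSAVE(&demo_lock);

	/* Critical section: lock held, interrupts off. */
	seen_held = demo_lock.held;
	seen_irqs_off = !irqs_enabled;
	/* No explicit unlock: it runs when scope_guard leaves scope. */
}

void demo_guard(void)
{
	disable_device();
	assert(seen_held && seen_irqs_off);      /* inside the section */
	assert(!demo_lock.held && irqs_enabled); /* released and restored */
}
```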
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 6eab70345799
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025050546-commute-subduing-1917@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6eab7034579917f207ca6d8e3f4e11e85e0ab7d5 Mon Sep 17 00:00:00 2001
From: Geert Uytterhoeven <geert+renesas(a)glider.be>
Date: Wed, 22 Jan 2025 09:21:27 +0100
Subject: [PATCH] ASoC: soc-core: Stop using of_property_read_bool() for
non-boolean properties
On R-Car:
OF: /sound: Read of boolean property 'simple-audio-card,bitclock-master' with a value.
OF: /sound: Read of boolean property 'simple-audio-card,frame-master' with a value.
or:
OF: /soc/sound@ec500000/ports/port@0/endpoint: Read of boolean property 'bitclock-master' with a value.
OF: /soc/sound@ec500000/ports/port@0/endpoint: Read of boolean property 'frame-master' with a value.
The use of of_property_read_bool() for non-boolean properties is
deprecated in favor of of_property_present() when testing for property
presence.
Replace testing for presence before calling of_property_read_u32() by
testing for an -EINVAL return value from the latter, to simplify the
code.
Signed-off-by: Geert Uytterhoeven <geert+renesas(a)glider.be>
Link: https://patch.msgid.link/db10e96fbda121e7456d70e97a013cbfc9755f4d.173753395…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/soc-core.c b/sound/soc/soc-core.c
index 3c6d8aef4130..26b34b688508 100644
--- a/sound/soc/soc-core.c
+++ b/sound/soc/soc-core.c
@@ -3046,7 +3046,7 @@ int snd_soc_of_parse_pin_switches(struct snd_soc_card *card, const char *prop)
unsigned int i, nb_controls;
int ret;
- if (!of_property_read_bool(dev->of_node, prop))
+ if (!of_property_present(dev->of_node, prop))
return 0;
strings = devm_kcalloc(dev, nb_controls_max,
@@ -3120,23 +3120,17 @@ int snd_soc_of_parse_tdm_slot(struct device_node *np,
if (rx_mask)
snd_soc_of_get_slot_mask(np, "dai-tdm-slot-rx-mask", rx_mask);
- if (of_property_read_bool(np, "dai-tdm-slot-num")) {
- ret = of_property_read_u32(np, "dai-tdm-slot-num", &val);
- if (ret)
- return ret;
+ ret = of_property_read_u32(np, "dai-tdm-slot-num", &val);
+ if (ret && ret != -EINVAL)
+ return ret;
+ if (!ret && slots)
+ *slots = val;
- if (slots)
- *slots = val;
- }
-
- if (of_property_read_bool(np, "dai-tdm-slot-width")) {
- ret = of_property_read_u32(np, "dai-tdm-slot-width", &val);
- if (ret)
- return ret;
-
- if (slot_width)
- *slot_width = val;
- }
+ ret = of_property_read_u32(np, "dai-tdm-slot-width", &val);
+ if (ret && ret != -EINVAL)
+ return ret;
+ if (!ret && slot_width)
+ *slot_width = val;
return 0;
}
@@ -3403,12 +3397,12 @@ unsigned int snd_soc_daifmt_parse_clock_provider_raw(struct device_node *np,
* check "[prefix]frame-master"
*/
snprintf(prop, sizeof(prop), "%sbitclock-master", prefix);
- bit = of_property_read_bool(np, prop);
+ bit = of_property_present(np, prop);
if (bit && bitclkmaster)
*bitclkmaster = of_parse_phandle(np, prop, 0);
snprintf(prop, sizeof(prop), "%sframe-master", prefix);
- frame = of_property_read_bool(np, prop);
+ frame = of_property_present(np, prop);
if (frame && framemaster)
*framemaster = of_parse_phandle(np, prop, 0);
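The simplified parsing relies on of_property_read_u32() returning -EINVAL when the property is absent, so "absent" can be treated as "keep the caller's default" while other errors are still propagated. The sketch below reproduces that control flow in userspace against a hypothetical fake_read_u32() helper that mimics the documented return codes (0 on success, -EINVAL for a missing property, -ENODATA for a property without a value); fake_prop, parse_slots() and the test values are illustrative only.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical "device tree node" entry; has_value == 0 models a
 * property that is present but carries no value. */
struct fake_prop {
	const char *name;
	int has_value;
	unsigned int value;
};

/* Mimics of_property_read_u32() return codes for this sketch. */
static int fake_read_u32(const struct fake_prop *props, size_t n,
			 const char *name, unsigned int *out)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(props[i].name, name) != 0)
			continue;
		if (!props[i].has_value)
			return -ENODATA;
		*out = props[i].value;
		return 0;
	}
	return -EINVAL; /* property does not exist */
}

/* Same shape as the new "dai-tdm-slot-num" handling: absent is fine
 * (leave *slots untouched), any other error is returned. */
int parse_slots(const struct fake_prop *props, size_t n, unsigned int *slots)
{
	unsigned int val;
	int ret = fake_read_u32(props, n, "dai-tdm-slot-num", &val);

	if (ret && ret != -EINVAL)
		return ret;
	if (!ret && slots)
		*slots = val;
	return 0;
}

void demo_parse_slots(void)
{
	struct fake_prop present = { "dai-tdm-slot-num", 1, 8 };
	struct fake_prop empty = { "dai-tdm-slot-num", 0, 0 };
	unsigned int slots = 2; /* caller's default */

	assert(parse_slots(NULL, 0, &slots) == 0 && slots == 2);     /* absent */
	assert(parse_slots(&present, 1, &slots) == 0 && slots == 8); /* present */
	assert(parse_slots(&empty, 1, &slots) == -ENODATA);          /* no value */
}
```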
The event polling delay is set to 0 if there are any pending requests in
either the rx or tx request lists. Checking for pending requests does
not work well for "IN" transfers, as the tty driver always queues
requests to the list and TRBs to the ring, preparing to receive data
from the host.
This causes unnecessary busy-looping and CPU hogging.
Only set the event polling delay to 0 if there are pending tx "write"
transfers, or if less than 10 ms have passed since the last active data
transfer in either direction.
Cc: Łukasz Bartosik <ukaszb(a)chromium.org>
Fixes: fb18e5bb9660 ("xhci: dbc: poll at different rate depending on data transfer activity")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
---
drivers/usb/host/xhci-dbgcap.c | 19 ++++++++++++++++---
drivers/usb/host/xhci-dbgcap.h | 3 +++
2 files changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/host/xhci-dbgcap.c b/drivers/usb/host/xhci-dbgcap.c
index fd7895b24367..0d4ce5734165 100644
--- a/drivers/usb/host/xhci-dbgcap.c
+++ b/drivers/usb/host/xhci-dbgcap.c
@@ -823,6 +823,7 @@ static enum evtreturn xhci_dbc_do_handle_events(struct xhci_dbc *dbc)
{
dma_addr_t deq;
union xhci_trb *evt;
+ enum evtreturn ret = EVT_DONE;
u32 ctrl, portsc;
bool update_erdp = false;
@@ -909,6 +910,7 @@ static enum evtreturn xhci_dbc_do_handle_events(struct xhci_dbc *dbc)
break;
case TRB_TYPE(TRB_TRANSFER):
dbc_handle_xfer_event(dbc, evt);
+ ret = EVT_XFER_DONE;
break;
default:
break;
@@ -927,7 +929,7 @@ static enum evtreturn xhci_dbc_do_handle_events(struct xhci_dbc *dbc)
lo_hi_writeq(deq, &dbc->regs->erdp);
}
- return EVT_DONE;
+ return ret;
}
static void xhci_dbc_handle_events(struct work_struct *work)
@@ -936,6 +938,7 @@ static void xhci_dbc_handle_events(struct work_struct *work)
struct xhci_dbc *dbc;
unsigned long flags;
unsigned int poll_interval;
+ unsigned long busypoll_timelimit;
dbc = container_of(to_delayed_work(work), struct xhci_dbc, event_work);
poll_interval = dbc->poll_interval;
@@ -954,11 +957,21 @@ static void xhci_dbc_handle_events(struct work_struct *work)
dbc->driver->disconnect(dbc);
break;
case EVT_DONE:
- /* set fast poll rate if there are pending data transfers */
+ /*
+ * Set fast poll rate if there are pending out transfers, or
+ * a transfer was recently processed
+ */
+ busypoll_timelimit = dbc->xfer_timestamp +
+ msecs_to_jiffies(DBC_XFER_INACTIVITY_TIMEOUT);
+
if (!list_empty(&dbc->eps[BULK_OUT].list_pending) ||
- !list_empty(&dbc->eps[BULK_IN].list_pending))
+ time_is_after_jiffies(busypoll_timelimit))
poll_interval = 0;
break;
+ case EVT_XFER_DONE:
+ dbc->xfer_timestamp = jiffies;
+ poll_interval = 0;
+ break;
default:
dev_info(dbc->dev, "stop handling dbc events\n");
return;
diff --git a/drivers/usb/host/xhci-dbgcap.h b/drivers/usb/host/xhci-dbgcap.h
index 9dc8f4d8077c..47ac72c2286d 100644
--- a/drivers/usb/host/xhci-dbgcap.h
+++ b/drivers/usb/host/xhci-dbgcap.h
@@ -96,6 +96,7 @@ struct dbc_ep {
#define DBC_WRITE_BUF_SIZE 8192
#define DBC_POLL_INTERVAL_DEFAULT 64 /* milliseconds */
#define DBC_POLL_INTERVAL_MAX 5000 /* milliseconds */
+#define DBC_XFER_INACTIVITY_TIMEOUT 10 /* milliseconds */
/*
* Private structure for DbC hardware state:
*/
@@ -142,6 +143,7 @@ struct xhci_dbc {
enum dbc_state state;
struct delayed_work event_work;
unsigned int poll_interval; /* ms */
+ unsigned long xfer_timestamp;
unsigned resume_required:1;
struct dbc_ep eps[2];
@@ -187,6 +189,7 @@ struct dbc_request {
enum evtreturn {
EVT_ERR = -1,
EVT_DONE,
+ EVT_XFER_DONE,
EVT_GSER,
EVT_DISC,
};
--
2.43.0
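The new policy busy-polls only while OUT data is pending or within DBC_XFER_INACTIVITY_TIMEOUT of the last transfer event. The sketch below models that decision as a pure function, assuming one tick per millisecond (HZ == 1000) so the 10 ms timeout is 10 ticks; next_poll_interval() and tick_after() are hypothetical names, with tick_after() following the signed-difference comparison that the kernel's time_after() family (and hence time_is_after_jiffies()) uses to survive jiffies wraparound. Pending IN requests are deliberately not an input, since the tty driver keeps them queued at all times.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption for this sketch: one tick per millisecond (HZ == 1000). */
#define XFER_INACTIVITY_TIMEOUT_TICKS 10UL

/* Wraparound-safe "a is later than b", same idea as time_after(). */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Hypothetical model of the EVT_XFER_DONE / EVT_DONE handling.
 * Returns 0 to busy-poll, otherwise the default interval.
 */
unsigned int next_poll_interval(unsigned long now, unsigned long *xfer_timestamp,
				bool xfer_event, bool out_pending,
				unsigned int default_interval)
{
	unsigned long busypoll_timelimit;

	if (xfer_event) {
		/* EVT_XFER_DONE: record when data last moved, poll again now. */
		*xfer_timestamp = now;
		return 0;
	}

	/* EVT_DONE: busy-poll only for pending OUT data or recent activity. */
	busypoll_timelimit = *xfer_timestamp + XFER_INACTIVITY_TIMEOUT_TICKS;
	if (out_pending || tick_after(busypoll_timelimit, now))
		return 0;
	return default_interval;
}

void demo_poll(void)
{
	unsigned long ts = 0;

	assert(next_poll_interval(1000, &ts, true, false, 64) == 0); /* transfer */
	assert(ts == 1000);
	assert(next_poll_interval(1005, &ts, false, false, 64) == 0);  /* 5 ticks */
	assert(next_poll_interval(1010, &ts, false, false, 64) == 64); /* idle */
	assert(next_poll_interval(5000, &ts, false, true, 64) == 0);   /* OUT */

	/* Timestamp taken 3 ticks before the counter wraps; now == 2. */
	ts = (unsigned long)-3;
	assert(next_poll_interval(2, &ts, false, false, 64) == 0);
}
```

Using tick_after() rather than a plain `>` keeps the decision correct when the timestamp is taken just before the counter wraps, which is what the last assertion in demo_poll() exercises.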