Hi All,
Chages since v4:
- unused pages leak is avoided
Chages since v3:
- pfn_to_virt() changed to page_to_virt() due to compile error
Chages since v2:
- page allocation moved out of the atomic context
Chages since v1:
- Fixes: and -stable tags added to the patch description
Thanks!
Alexander Gordeev (1):
kasan: Avoid sleepable page allocation from atomic context
mm/kasan/shadow.c | 77 ++++++++++++++++++++++++++++++++++++++---------
1 file changed, 63 insertions(+), 14 deletions(-)
--
2.45.2
Hi All,
This patch series adds initial support for the HEVC(H.265) and VP9
codecs in iris decoder. The objective of this work is to extend the
decoder's capabilities to handle HEVC and VP9 codec streams,
including necessary format handling and buffer management.
In addition, the series also includes a set of fixes to address issues
identified during testing of these additional codecs.
These patches also address the comments and feedback received from the
RFC patches previously sent. I have made the necessary improvements
based on the community's suggestions.
Changes in v4:
- Splitted patch patch 06/23 in two patches (Bryan)
- Simplified the conditional logic in patch 13/23 (Bryan)
- Improved commit description for patch patch 13/23 (Nicolas)
- Fix the value of H265_NUM_TILE_ROW macro (Neil)
- Link to v3: https://lore.kernel.org/r/20250502-qcom-iris-hevc-vp9-v3-0-552158a10a7d@qui…
Changes in v3:
- Introduced two wrappers with explicit names to handle destroy internal
buffers (Nicolas)
- Used sub state check instead of introducing new boolean (Vikash)
- Addressed other comments (Vikash)
- Reorderd patches to have all fixes patches first (Dmitry)
- Link to v2:
https://lore.kernel.org/r/20250428-qcom-iris-hevc-vp9-v2-0-3a6013ecb8a5@qui…
Changes in v2:
- Added Changes to make sure all buffers are released in session close
(bryna)
- Added tracking for flush responses to fix a timing issue.
- Added a handling to fix timing issue in reconfig
- Splitted patch 06/20 in two patches (Bryan)
- Added missing fixes tag (bryan)
- Updated fluster report (Nicolas)
- Link to v1:
https://lore.kernel.org/r/20250408-iris-dec-hevc-vp9-v1-0-acd258778bd6@quic…
Changes sinces RFC:
- Added additional fixes to address issues identified during further
testing.
- Moved typo fix to a seperate patch [Neil]
- Reordered the patches for better logical flow and clarity [Neil,
Dmitry]
- Added fixes tag wherever applicable [Neil, Dmitry]
- Removed the default case in the switch statement for codecs [Bryan]
- Replaced if-else statements with switch-case [Bryan]
- Added comments for mbpf [Bryan]
- RFC:
https://lore.kernel.org/linux-media/20250305104335.3629945-1-quic_dikshita@…
This patch series depends on [1] & [2]
[1]
https://lore.kernel.org/linux-media/20250417-topic-sm8x50-iris-v10-v7-0-f02…
[2]
https://lore.kernel.org/linux-media/20250424-qcs8300_iris-v5-0-f118f505c300…
These patches are tested on SM8250 and SM8550 with v4l2-ctl and
Gstreamer for HEVC and VP9 decoders, at the same time ensured that
the existing H264 decoder functionality remains uneffected.
Note: 1 of the fluster compliance test is fixed with firmware [3]
[3]:
https://lore.kernel.org/linux-firmware/1a511921-446d-cdc4-0203-084c88a5dc1e…
The result of fluster test on SM8550:
131/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test case:
- 10 testcases failed due to unsupported 10 bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcase failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
- 2 testcase failed due to CRC mismatch
- RAP_A_docomo_6
- RAP_B_Bossen_2
- BUG reported:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4392
Analysis - First few frames in this discarded by firmware and are
sent to driver with 0 filled length. Driver send such buffers to
client with timestamp 0 and payload set to 0 and
make buf state to VB2_BUF_STATE_ERROR. Such buffers should be
dropped by GST. But instead, the first frame displayed as green
frame and when a valid buffer is sent to client later with same 0
timestamp, its dropped, leading to CRC mismatch for first frame.
235/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test case:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug reported:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 2 testcase failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- 1 testcase failed due to unsupported stream
- vp90-2-16-intra-only.webm
The result of fluster test on SM8250:
133/147 testcases passed while testing JCT-VC-HEVC_V1 with
GStreamer-H.265-V4L2-Gst1.0.
The failing test case:
- 10 testcases failed due to unsupported 10 bit format.
- DBLK_A_MAIN10_VIXS_4
- INITQP_B_Main10_Sony_1
- TSUNEQBD_A_MAIN10_Technicolor_2
- WP_A_MAIN10_Toshiba_3
- WP_MAIN10_B_Toshiba_3
- WPP_A_ericsson_MAIN10_2
- WPP_B_ericsson_MAIN10_2
- WPP_C_ericsson_MAIN10_2
- WPP_E_ericsson_MAIN10_2
- WPP_F_ericsson_MAIN10_2
- 4 testcase failed due to unsupported resolution
- PICSIZE_A_Bossen_1
- PICSIZE_B_Bossen_1
- WPP_D_ericsson_MAIN10_2
- WPP_D_ericsson_MAIN_2
232/305 testcases passed while testing VP9-TEST-VECTORS with
GStreamer-VP9-V4L2-Gst1.0.
The failing test case:
- 64 testcases failed due to unsupported resolution
- vp90-2-02-size-08x08.webm
- vp90-2-02-size-08x10.webm
- vp90-2-02-size-08x16.webm
- vp90-2-02-size-08x18.webm
- vp90-2-02-size-08x32.webm
- vp90-2-02-size-08x34.webm
- vp90-2-02-size-08x64.webm
- vp90-2-02-size-08x66.webm
- vp90-2-02-size-10x08.webm
- vp90-2-02-size-10x10.webm
- vp90-2-02-size-10x16.webm
- vp90-2-02-size-10x18.webm
- vp90-2-02-size-10x32.webm
- vp90-2-02-size-10x34.webm
- vp90-2-02-size-10x64.webm
- vp90-2-02-size-10x66.webm
- vp90-2-02-size-16x08.webm
- vp90-2-02-size-16x10.webm
- vp90-2-02-size-16x16.webm
- vp90-2-02-size-16x18.webm
- vp90-2-02-size-16x32.webm
- vp90-2-02-size-16x34.webm
- vp90-2-02-size-16x64.webm
- vp90-2-02-size-16x66.webm
- vp90-2-02-size-18x08.webm
- vp90-2-02-size-18x10.webm
- vp90-2-02-size-18x16.webm
- vp90-2-02-size-18x18.webm
- vp90-2-02-size-18x32.webm
- vp90-2-02-size-18x34.webm
- vp90-2-02-size-18x64.webm
- vp90-2-02-size-18x66.webm
- vp90-2-02-size-32x08.webm
- vp90-2-02-size-32x10.webm
- vp90-2-02-size-32x16.webm
- vp90-2-02-size-32x18.webm
- vp90-2-02-size-32x32.webm
- vp90-2-02-size-32x34.webm
- vp90-2-02-size-32x64.webm
- vp90-2-02-size-32x66.webm
- vp90-2-02-size-34x08.webm
- vp90-2-02-size-34x10.webm
- vp90-2-02-size-34x16.webm
- vp90-2-02-size-34x18.webm
- vp90-2-02-size-34x32.webm
- vp90-2-02-size-34x34.webm
- vp90-2-02-size-34x64.webm
- vp90-2-02-size-34x66.webm
- vp90-2-02-size-64x08.webm
- vp90-2-02-size-64x10.webm
- vp90-2-02-size-64x16.webm
- vp90-2-02-size-64x18.webm
- vp90-2-02-size-64x32.webm
- vp90-2-02-size-64x34.webm
- vp90-2-02-size-64x64.webm
- vp90-2-02-size-64x66.webm
- vp90-2-02-size-66x08.webm
- vp90-2-02-size-66x10.webm
- vp90-2-02-size-66x16.webm
- vp90-2-02-size-66x18.webm
- vp90-2-02-size-66x32.webm
- vp90-2-02-size-66x34.webm
- vp90-2-02-size-66x64.webm
- vp90-2-02-size-66x66.webm
- 2 testcases failed due to unsupported format
- vp91-2-04-yuv422.webm
- vp91-2-04-yuv444.webm
- 1 testcase failed with CRC mismatch
- vp90-2-22-svc_1280x720_3.ivf
- Bug raised:
https://gitlab.freedesktop.org/gstreamer/gstreamer/-/issues/4371
- 5 testcase failed due to unsupported resolution after sequence change
- vp90-2-21-resize_inter_320x180_5_1-2.webm
- vp90-2-21-resize_inter_320x180_7_1-2.webm
- vp90-2-21-resize_inter_320x240_5_1-2.webm
- vp90-2-21-resize_inter_320x240_7_1-2.webm
- vp90-2-18-resize.ivf
- 1 testcase failed with CRC mismatch
- vp90-2-16-intra-only.webm
Analysis: First few frames are marked by firmware as NO_SHOW frame.
Driver make buf state to VB2_BUF_STATE_ERROR for such frames.
Such buffers should be dropped by GST. But instead, the first frame
is being displayed and when a valid buffer is sent to client later
with same timestamp, its dropped, leading to CRC mismatch for first
frame.
To: Vikash Garodia <quic_vgarodia(a)quicinc.com>
To: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
To: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
To: Mauro Carvalho Chehab <mchehab(a)kernel.org>
To: Hans Verkuil <hverkuil(a)xs4all.nl>
To: Stefan Schmidt <stefan.schmidt(a)linaro.org>
Cc: linux-media(a)vger.kernel.org
Cc: linux-arm-msm(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)oss.qualcomm.com>
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Nicolas Dufresne <nicolas.dufresne(a)collabora.com>
Cc: Dan Carpenter <dan.carpenter(a)linaro.org>
Signed-off-by: Dikshita Agarwal <quic_dikshita(a)quicinc.com>
---
Dikshita Agarwal (25):
media: iris: Skip destroying internal buffer if not dequeued
media: iris: Update CAPTURE format info based on OUTPUT format
media: iris: Avoid updating frame size to firmware during reconfig
media: iris: Drop port check for session property response
media: iris: Prevent HFI queue writes when core is in deinit state
media: iris: Remove error check for non-zero v4l2 controls
media: iris: Remove deprecated property setting to firmware
media: iris: Fix missing function pointer initialization
media: iris: Fix NULL pointer dereference
media: iris: Fix typo in depth variable
media: iris: Track flush responses to prevent premature completion
media: iris: Fix buffer preparation failure during resolution change
media: iris: Send V4L2_BUF_FLAG_ERROR for capture buffers with 0 filled length
media: iris: Skip flush on first sequence change
media: iris: Remove unnecessary re-initialization of flush completion
media: iris: Add handling for corrupt and drop frames
media: iris: Add handling for no show frames
media: iris: Improve last flag handling
media: iris: Remove redundant buffer count check in stream off
media: iris: Add a comment to explain usage of MBPS
media: iris: Add HEVC and VP9 formats for decoder
media: iris: Add platform capabilities for HEVC and VP9 decoders
media: iris: Set mandatory properties for HEVC and VP9 decoders.
media: iris: Add internal buffer calculation for HEVC and VP9 decoders
media: iris: Add codec specific check for VP9 decoder drain handling
drivers/media/platform/qcom/iris/iris_buffer.c | 35 +-
drivers/media/platform/qcom/iris/iris_buffer.h | 3 +-
drivers/media/platform/qcom/iris/iris_ctrls.c | 35 +-
drivers/media/platform/qcom/iris/iris_hfi_common.h | 1 +
.../platform/qcom/iris/iris_hfi_gen1_command.c | 48 ++-
.../platform/qcom/iris/iris_hfi_gen1_defines.h | 5 +-
.../platform/qcom/iris/iris_hfi_gen1_response.c | 37 +-
.../platform/qcom/iris/iris_hfi_gen2_command.c | 143 +++++++-
.../platform/qcom/iris/iris_hfi_gen2_defines.h | 5 +
.../platform/qcom/iris/iris_hfi_gen2_response.c | 56 ++-
drivers/media/platform/qcom/iris/iris_hfi_queue.c | 2 +-
drivers/media/platform/qcom/iris/iris_instance.h | 6 +
.../platform/qcom/iris/iris_platform_common.h | 28 +-
.../media/platform/qcom/iris/iris_platform_gen2.c | 198 ++++++++--
.../platform/qcom/iris/iris_platform_qcs8300.h | 126 +++++--
.../platform/qcom/iris/iris_platform_sm8250.c | 15 +-
drivers/media/platform/qcom/iris/iris_state.c | 2 +-
drivers/media/platform/qcom/iris/iris_state.h | 1 +
drivers/media/platform/qcom/iris/iris_vb2.c | 18 +-
drivers/media/platform/qcom/iris/iris_vdec.c | 116 +++---
drivers/media/platform/qcom/iris/iris_vdec.h | 11 +
drivers/media/platform/qcom/iris/iris_vidc.c | 36 +-
drivers/media/platform/qcom/iris/iris_vpu_buffer.c | 397 ++++++++++++++++++++-
drivers/media/platform/qcom/iris/iris_vpu_buffer.h | 46 ++-
24 files changed, 1159 insertions(+), 211 deletions(-)
---
base-commit: 398a1b33f1479af35ca915c5efc9b00d6204f8fa
change-id: 20250507-video-iris-hevc-vp9-59096b189050
prerequisite-message-id: <20250417-topic-sm8x50-iris-v10-v7-0-f020cb1d0e98(a)linaro.org>
prerequisite-patch-id: afffe7096c8e110a8da08c987983bc4441d39578
prerequisite-patch-id: b93c37dc7e09d1631b75387dc1ca90e3066dce17
prerequisite-patch-id: b7b50aa1657be59fd51c3e53d73382a1ee75a08e
prerequisite-patch-id: 30960743105a36f20b3ec4a9ff19e7bca04d6add
prerequisite-patch-id: 2bba98151ca103aa62a513a0fbd0df7ae64d9868
prerequisite-patch-id: 0e43a6d758b5fa5ab921c6aa3c19859e312b47d0
prerequisite-patch-id: 35f8dae1416977e88c2db7c767800c01822e266e
prerequisite-message-id: <20250501-qcs8300_iris-v7-0-b229d5347990(a)quicinc.com>
prerequisite-patch-id: e35b05c527217206ae871aef0d7b0261af0319ea
prerequisite-patch-id: 07ba0745c7d72796567e0a57f5c8e5355a8d2046
prerequisite-patch-id: 3398937a7fabb45934bb98a530eef73252231132
prerequisite-patch-id: 500bc3b8391940d3ebca222d2098b737414b2af4
prerequisite-patch-id: 2e72fe4d11d264db3d42fa450427d30171303c6f
Best regards,
--
Dikshita Agarwal <quic_dikshita(a)quicinc.com>
From: Peter Wang <peter.wang(a)mediatek.com>
Because the member id of struct ufs_hw_queue is u32 (hwq->id) and
the trace entry hwq_id is also u32, the type should be changed to u32.
If mcq is not supported, SDB mode only supports one hardware queue,
for which setting the hwq_id to 0 is more suitable.
Fixes: 4a52338bf288 ("scsi: ufs: core: Add trace event for MCQ")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 7735421e3991..14e4cfbcb9eb 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -432,7 +432,7 @@ static void ufshcd_add_command_trace(struct ufs_hba *hba, unsigned int tag,
u8 opcode = 0, group_id = 0;
u32 doorbell = 0;
u32 intr;
- int hwq_id = -1;
+ u32 hwq_id = 0;
struct ufshcd_lrb *lrbp = &hba->lrb[tag];
struct scsi_cmnd *cmd = lrbp->cmd;
struct request *rq = scsi_cmd_to_rq(cmd);
--
2.45.2
The quilt patch titled
Subject: mm: fix folio_pte_batch() on XEN PV
has been removed from the -mm tree. Its filename was
mm-fix-folio_pte_batch-on-xen-pv.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Petr Van��k <arkamar(a)atlas.cz>
Subject: mm: fix folio_pte_batch() on XEN PV
Date: Fri, 2 May 2025 23:50:19 +0200
On XEN PV, folio_pte_batch() can incorrectly batch beyond the end of a
folio due to a corner case in pte_advance_pfn(). Specifically, when the
PFN following the folio maps to an invalidated MFN,
expected_pte = pte_advance_pfn(expected_pte, nr);
produces a pte_none(). If the actual next PTE in memory is also
pte_none(), the pte_same() succeeds,
if (!pte_same(pte, expected_pte))
break;
the loop is not broken, and batching continues into unrelated memory.
For example, with a 4-page folio, the PTE layout might look like this:
[ 53.465673] [ T2552] folio_pte_batch: printing PTE values at addr=0x7f1ac9dc5000
[ 53.465674] [ T2552] PTE[453] = 000000010085c125
[ 53.465679] [ T2552] PTE[454] = 000000010085d125
[ 53.465682] [ T2552] PTE[455] = 000000010085e125
[ 53.465684] [ T2552] PTE[456] = 000000010085f125
[ 53.465686] [ T2552] PTE[457] = 0000000000000000 <-- not present
[ 53.465689] [ T2552] PTE[458] = 0000000101da7125
pte_advance_pfn(PTE[456]) returns a pte_none() due to invalid PFN->MFN
mapping. The next actual PTE (PTE[457]) is also pte_none(), so the loop
continues and includes PTE[457] in the batch, resulting in 5 batched
entries for a 4-page folio. This triggers the following warning:
[ 53.465751] [ T2552] page: refcount:85 mapcount:20 mapping:ffff88813ff4f6a8 index:0x110 pfn:0x10085c
[ 53.465754] [ T2552] head: order:2 mapcount:80 entire_mapcount:0 nr_pages_mapped:4 pincount:0
[ 53.465756] [ T2552] memcg:ffff888003573000
[ 53.465758] [ T2552] aops:0xffffffff8226fd20 ino:82467c dentry name(?):"libc.so.6"
[ 53.465761] [ T2552] flags: 0x2000000000416c(referenced|uptodate|lru|active|private|head|node=0|zone=2)
[ 53.465764] [ T2552] raw: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465767] [ T2552] raw: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465768] [ T2552] head: 002000000000416c ffffea0004021f08 ffffea0004021908 ffff88813ff4f6a8
[ 53.465770] [ T2552] head: 0000000000000110 ffff888133d8bd40 0000005500000013 ffff888003573000
[ 53.465772] [ T2552] head: 0020000000000202 ffffea0004021701 000000040000004f 00000000ffffffff
[ 53.465774] [ T2552] head: 0000000300000003 8000000300000002 0000000000000013 0000000000000004
[ 53.465775] [ T2552] page dumped because: VM_WARN_ON_FOLIO((_Generic((page + nr_pages - 1), const struct page *: (const struct folio *)_compound_head(page + nr_pages - 1), struct page *: (struct folio *)_compound_head(page + nr_pages - 1))) != folio)
Original code works as expected everywhere, except on XEN PV, where
pte_advance_pfn() can yield a pte_none() after balloon inflation due to
MFNs invalidation. In XEN, pte_advance_pfn() ends up calling
__pte()->xen_make_pte()->pte_pfn_to_mfn(), which returns pte_none() when
mfn == INVALID_P2M_ENTRY.
The pte_pfn_to_mfn() documents that nastiness:
If there's no mfn for the pfn, then just create an
empty non-present pte. Unfortunately this loses
information about the original pfn, so
pte_mfn_to_pfn is asymmetric.
While such hacks should certainly be removed, we can do better in
folio_pte_batch() and simply check ahead of time how many PTEs we can
possibly batch in our folio.
This way, we can not only fix the issue but cleanup the code: removing the
pte_pfn() check inside the loop body and avoiding end_ptr comparison +
arithmetic.
Link: https://lkml.kernel.org/r/20250502215019.822-2-arkamar@atlas.cz
Fixes: f8d937761d65 ("mm/memory: optimize fork() with PTE-mapped THP")
Co-developed-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Signed-off-by: Petr Van��k <arkamar(a)atlas.cz>
Cc: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/internal.h | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
--- a/mm/internal.h~mm-fix-folio_pte_batch-on-xen-pv
+++ a/mm/internal.h
@@ -248,11 +248,9 @@ static inline int folio_pte_batch(struct
pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
bool *any_writable, bool *any_young, bool *any_dirty)
{
- unsigned long folio_end_pfn = folio_pfn(folio) + folio_nr_pages(folio);
- const pte_t *end_ptep = start_ptep + max_nr;
pte_t expected_pte, *ptep;
bool writable, young, dirty;
- int nr;
+ int nr, cur_nr;
if (any_writable)
*any_writable = false;
@@ -265,11 +263,15 @@ static inline int folio_pte_batch(struct
VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+ /* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
+ max_nr = min_t(unsigned long, max_nr,
+ folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
+
nr = pte_batch_hint(start_ptep, pte);
expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
ptep = start_ptep + nr;
- while (ptep < end_ptep) {
+ while (nr < max_nr) {
pte = ptep_get(ptep);
if (any_writable)
writable = !!pte_write(pte);
@@ -282,14 +284,6 @@ static inline int folio_pte_batch(struct
if (!pte_same(pte, expected_pte))
break;
- /*
- * Stop immediately once we reached the end of the folio. In
- * corner cases the next PFN might fall into a different
- * folio.
- */
- if (pte_pfn(pte) >= folio_end_pfn)
- break;
-
if (any_writable)
*any_writable |= writable;
if (any_young)
@@ -297,12 +291,13 @@ static inline int folio_pte_batch(struct
if (any_dirty)
*any_dirty |= dirty;
- nr = pte_batch_hint(ptep, pte);
- expected_pte = pte_advance_pfn(expected_pte, nr);
- ptep += nr;
+ cur_nr = pte_batch_hint(ptep, pte);
+ expected_pte = pte_advance_pfn(expected_pte, cur_nr);
+ ptep += cur_nr;
+ nr += cur_nr;
}
- return min(ptep - start_ptep, max_nr);
+ return min(nr, max_nr);
}
/**
_
Patches currently in -mm which might be from arkamar(a)atlas.cz are