- Linux-stable-mirror - lists.linaro.org

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "vmscan,migrate: fix page count imbalance on node stats when demoting pages" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 35e41024c4c2b02ef8207f61b9004f6956cf037b Mon Sep 17 00:00:00 2001 From: Gregory Price <gourry(a)gourry.net> Date: Fri, 25 Oct 2024 10:17:24 -0400 Subject: [PATCH] vmscan,migrate: fix page count imbalance on node stats when demoting pages When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. migrate_pages will decrement the the node's isolated page count, leading to an imbalanced count when invoked from (MG)LRU code. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The following path produces the decrement: shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. Link: https://lkml.kernel.org/r/20241025141724.17927-1-gourry@gourry.net Fixes: 26aa2d199d6f ("mm/migrate: demote pages during reclaim") Signed-off-by: Gregory Price <gourry(a)gourry.net> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net> Reviewed-by: Shakeel Butt <shakeel.butt(a)linux.dev> Reviewed-by: "Huang, Ying" <ying.huang(a)intel.com> Reviewed-by: Oscar Salvador <osalvador(a)suse.de> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/migrate.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/migrate.c b/mm/migrate.c index 7e520562d421a..fab84a7760889 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1178,7 +1178,7 @@ static void migrate_folio_done(struct folio *src, * not accounted to NR_ISOLATED_*. They can be recognized * as __folio_test_movable */ - if (likely(!__folio_test_movable(src))) + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + folio_is_file_lru(src), -folio_nr_pages(src)); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Limit internal Mic boost on Dell platform" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 78e7be018784934081afec77f96d49a2483f9188 Mon Sep 17 00:00:00 2001 From: Kailang Yang <kailang(a)realtek.com> Date: Fri, 18 Oct 2024 13:53:24 +0800 Subject: [PATCH] ALSA: hda/realtek: Limit internal Mic boost on Dell platform Dell want to limit internal Mic boost on all Dell platform. Signed-off-by: Kailang Yang <kailang(a)realtek.com> Cc: <stable(a)vger.kernel.org> Link: https://lore.kernel.org/561fc5f5eff04b6cbd79ed173cd1c1db@realtek.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 21 ++++++++++++++++++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index 3567b14b52b7c..784ac058418fc 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -7521,6 +7521,7 @@ enum { ALC286_FIXUP_SONY_MIC_NO_PRESENCE, ALC269_FIXUP_PINCFG_NO_HP_TO_LINEOUT, ALC269_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, @@ -7555,6 +7556,7 @@ enum { ALC255_FIXUP_ACER_MIC_NO_PRESENCE, ALC255_FIXUP_ASUS_MIC_NO_PRESENCE, ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, ALC255_FIXUP_DELL2_MIC_NO_PRESENCE, ALC255_FIXUP_HEADSET_MODE, ALC255_FIXUP_HEADSET_MODE_NO_HP_MIC, @@ -8114,6 +8116,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC269_FIXUP_HEADSET_MODE }, + [ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC269_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC269_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -8394,6 +8402,12 @@ static const struct hda_fixup alc269_fixups[] = { .chained = true, .chain_id = ALC255_FIXUP_HEADSET_MODE }, + [ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST] = { + .type = HDA_FIXUP_FUNC, + .v.func = alc269_fixup_limit_int_mic_boost, + .chained = true, + .chain_id = ALC255_FIXUP_DELL1_MIC_NO_PRESENCE + }, [ALC255_FIXUP_DELL2_MIC_NO_PRESENCE] = { .type = HDA_FIXUP_PINS, .v.pins = (const struct hda_pintbl[]) { @@ -11076,6 +11090,7 @@ static const struct hda_model_fixup alc269_fixup_models[] = { {.id = ALC269_FIXUP_DELL2_MIC_NO_PRESENCE, .name = "dell-headset-dock"}, {.id = ALC269_FIXUP_DELL3_MIC_NO_PRESENCE, .name = "dell-headset3"}, {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, .name = "dell-headset4"}, + {.id = ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, .name = "dell-headset4-quiet"}, {.id = ALC283_FIXUP_CHROME_BOOK, .name = "alc283-dac-wcaps"}, {.id = ALC283_FIXUP_SENSE_COMBO_JACK, .name = "alc283-sense-combo"}, {.id = ALC292_FIXUP_TPT440_DOCK, .name = "tpt440-dock"}, @@ -11630,16 +11645,16 @@ static const struct snd_hda_pin_quirk alc269_fallback_pin_fixup_tbl[] = { SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1b, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0295, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE_QUIET, {0x19, 0x40000000}, {0x1b, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_MIC_NO_PRESENCE, + SND_HDA_PIN_QUIRK(0x10ec0236, 0x1028, "Dell", ALC255_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), - SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC274_FIXUP_DELL_AIO_LINEOUT_VERB, + SND_HDA_PIN_QUIRK(0x10ec0274, 0x1028, "Dell", ALC269_FIXUP_DELL1_LIMIT_INT_MIC_BOOST, {0x19, 0x40000000}, {0x1a, 0x40000000}), SND_HDA_PIN_QUIRK(0x10ec0256, 0x1043, "ASUS", ALC2XX_FIXUP_HEADSET_MIC, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 117932eea99b729ee5d12783601a4f7f5fd58a23 Mon Sep 17 00:00:00 2001 From: Chen Ridong <chenridong(a)huawei.com> Date: Tue, 8 Oct 2024 11:24:56 +0000 Subject: [PATCH] cgroup/bpf: use a dedicated workqueue for cgroup bpf destruction A hung_task problem shown below was found: INFO: task kworker/0:0:8 blocked for more than 327 seconds. "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Workqueue: events cgroup_bpf_release Call Trace: <TASK> __schedule+0x5a2/0x2050 ? find_held_lock+0x33/0x100 ? wq_worker_sleeping+0x9e/0xe0 schedule+0x9f/0x180 schedule_preempt_disabled+0x25/0x50 __mutex_lock+0x512/0x740 ? cgroup_bpf_release+0x1e/0x4d0 ? cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? cgroup_bpf_release+0x1e/0x4d0 ? mutex_lock_nested+0x2b/0x40 ? __pfx_delay_tsc+0x10/0x10 mutex_lock_nested+0x2b/0x40 cgroup_bpf_release+0xcf/0x4d0 ? process_scheduled_works+0x161/0x8a0 ? trace_event_raw_event_workqueue_execute_start+0x64/0xd0 ? process_scheduled_works+0x161/0x8a0 process_scheduled_works+0x23a/0x8a0 worker_thread+0x231/0x5b0 ? __pfx_worker_thread+0x10/0x10 kthread+0x14d/0x1c0 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x59/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1b/0x30 </TASK> This issue can be reproduced by the following pressuse test: 1. A large number of cpuset cgroups are deleted. 2. Set cpu on and off repeatly. 3. Set watchdog_thresh repeatly. The scripts can be obtained at LINK mentioned above the signature. The reason for this issue is cgroup_mutex and cpu_hotplug_lock are acquired in different tasks, which may lead to deadlock. It can lead to a deadlock through the following steps: 1. A large number of cpusets are deleted asynchronously, which puts a large number of cgroup_bpf_release works into system_wq. The max_active of system_wq is WQ_DFL_ACTIVE(256). Consequently, all active works are cgroup_bpf_release works, and many cgroup_bpf_release works will be put into inactive queue. As illustrated in the diagram, there are 256 (in the acvtive queue) + n (in the inactive queue) works. 2. Setting watchdog_thresh will hold cpu_hotplug_lock.read and put smp_call_on_cpu work into system_wq. However step 1 has already filled system_wq, 'sscs.work' is put into inactive queue. 'sscs.work' has to wait until the works that were put into the inacvtive queue earlier have executed (n cgroup_bpf_release), so it will be blocked for a while. 3. Cpu offline requires cpu_hotplug_lock.write, which is blocked by step 2. 4. Cpusets that were deleted at step 1 put cgroup_release works into cgroup_destroy_wq. They are competing to get cgroup_mutex all the time. When cgroup_metux is acqured by work at css_killed_work_fn, it will call cpuset_css_offline, which needs to acqure cpu_hotplug_lock.read. However, cpuset_css_offline will be blocked for step 3. 5. At this moment, there are 256 works in active queue that are cgroup_bpf_release, they are attempting to acquire cgroup_mutex, and as a result, all of them are blocked. Consequently, sscs.work can not be executed. Ultimately, this situation leads to four processes being blocked, forming a deadlock. system_wq(step1) WatchDog(step2) cpu offline(step3) cgroup_destroy_wq(step4) ... 2000+ cgroups deleted asyn 256 actives + n inactives __lockup_detector_reconfigure P(cpu_hotplug_lock.read) put sscs.work into system_wq 256 + n + 1(sscs.work) sscs.work wait to be executed warting sscs.work finish percpu_down_write P(cpu_hotplug_lock.write) ...blocking... css_killed_work_fn P(cgroup_mutex) cpuset_css_offline P(cpu_hotplug_lock.read) ...blocking... 256 cgroup_bpf_release mutex_lock(&cgroup_mutex); ..blocking... To fix the problem, place cgroup_bpf_release works on a dedicated workqueue which can break the loop and solve the problem. System wqs are for misc things which shouldn't create a large number of concurrent work items. If something is going to generate >WQ_DFL_ACTIVE(256) concurrent work items, it should use its own dedicated workqueue. Fixes: 4bfc0bb2c60e ("bpf: decouple the lifetime of cgroup_bpf from cgroup itself") Cc: stable(a)vger.kernel.org # v5.3+ Link: https://lore.kernel.org/cgroups/e90c32d2-2a85-4f28-9154-09c7d320cb60@huawei… Tested-by: Vishal Chourasia <vishalc(a)linux.ibm.com> Signed-off-by: Chen Ridong <chenridong(a)huawei.com> Signed-off-by: Tejun Heo <tj(a)kernel.org> --- kernel/bpf/cgroup.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c index e7113d700b878..025d7e2214aeb 100644 --- a/kernel/bpf/cgroup.c +++ b/kernel/bpf/cgroup.c @@ -24,6 +24,23 @@ DEFINE_STATIC_KEY_ARRAY_FALSE(cgroup_bpf_enabled_key, MAX_CGROUP_BPF_ATTACH_TYPE); EXPORT_SYMBOL(cgroup_bpf_enabled_key); +/* + * cgroup bpf destruction makes heavy use of work items and there can be a lot + * of concurrent destructions. Use a separate workqueue so that cgroup bpf + * destruction work items don't end up filling up max_active of system_wq + * which may lead to deadlock. + */ +static struct workqueue_struct *cgroup_bpf_destroy_wq; + +static int __init cgroup_bpf_wq_init(void) +{ + cgroup_bpf_destroy_wq = alloc_workqueue("cgroup_bpf_destroy", 0, 1); + if (!cgroup_bpf_destroy_wq) + panic("Failed to alloc workqueue for cgroup bpf destroy.\n"); + return 0; +} +core_initcall(cgroup_bpf_wq_init); + /* __always_inline is necessary to prevent indirect call through run_prog * function pointer. */ @@ -334,7 +351,7 @@ static void cgroup_bpf_release_fn(struct percpu_ref *ref) struct cgroup *cgrp = container_of(ref, struct cgroup, bpf.refcnt); INIT_WORK(&cgrp->bpf.release_work, cgroup_bpf_release); - queue_work(system_wq, &cgrp->bpf.release_work); + queue_work(cgroup_bpf_destroy_wq, &cgrp->bpf.release_work); } /* Get underlying bpf_prog of bpf_prog_list entry, regardless if it's through -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: usb-audio: Add quirks for Dell WD19 dock" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4413665dd6c528b31284119e3571c25f371e1c36 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Jan=20Sch=C3=A4r?= <jan(a)jschaer.ch> Date: Tue, 29 Oct 2024 23:12:49 +0100 Subject: [PATCH] ALSA: usb-audio: Add quirks for Dell WD19 dock MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The WD19 family of docks has the same audio chipset as the WD15. This change enables jack detection on the WD19. We don't need the dell_dock_mixer_init quirk for the WD19. It is only needed because of the dell_alc4020_map quirk for the WD15 in mixer_maps.c, which disables the volume controls. Even for the WD15, this quirk was apparently only needed when the dock firmware was not updated. Signed-off-by: Jan Schär <jan(a)jschaer.ch> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029221249.15661-1-jan@jschaer.ch Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/usb/mixer_quirks.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/sound/usb/mixer_quirks.c b/sound/usb/mixer_quirks.c index 2a9594f34dac6..6456e87e2f397 100644 --- a/sound/usb/mixer_quirks.c +++ b/sound/usb/mixer_quirks.c @@ -4042,6 +4042,9 @@ int snd_usb_mixer_apply_create_quirk(struct usb_mixer_interface *mixer) break; err = dell_dock_mixer_init(mixer); break; + case USB_ID(0x0bda, 0x402e): /* Dell WD19 dock */ + err = dell_dock_mixer_create(mixer); + break; case USB_ID(0x2a39, 0x3fd2): /* RME ADI-2 Pro */ case USB_ID(0x2a39, 0x3fd3): /* RME ADI-2 DAC */ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a74..ddcbd80a49fb2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1" failed to apply to v5.15-stable tree

by Sasha Levin

The patch below does not apply to the v5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From e49370d769e71456db3fbd982e95bab8c69f73e8 Mon Sep 17 00:00:00 2001 From: Christoffer Sandberg <cs(a)tuxedo.de> Date: Tue, 29 Oct 2024 16:16:53 +0100 Subject: [PATCH] ALSA: hda/realtek: Fix headset mic on TUXEDO Stellaris 16 Gen6 mb1 Quirk is needed to enable headset microphone on missing pin 0x19. Signed-off-by: Christoffer Sandberg <cs(a)tuxedo.de> Signed-off-by: Werner Sembach <wse(a)tuxedocomputers.com> Cc: <stable(a)vger.kernel.org> Link: https://patch.msgid.link/20241029151653.80726-2-wse@tuxedocomputers.com Signed-off-by: Takashi Iwai <tiwai(a)suse.de> --- sound/pci/hda/patch_realtek.c | 1 + 1 file changed, 1 insertion(+) diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c index e06a6fdc0bab7..571fa8a6c9e12 100644 --- a/sound/pci/hda/patch_realtek.c +++ b/sound/pci/hda/patch_realtek.c @@ -11008,6 +11008,7 @@ static const struct snd_pci_quirk alc269_fixup_tbl[] = { SND_PCI_QUIRK(0x1d05, 0x115c, "TongFang GMxTGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x121b, "TongFang GMxAGxx", ALC269_FIXUP_NO_SHUTUP), SND_PCI_QUIRK(0x1d05, 0x1387, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), + SND_PCI_QUIRK(0x1d05, 0x1409, "TongFang GMxIXxx", ALC2XX_FIXUP_HEADSET_MIC), SND_PCI_QUIRK(0x1d17, 0x3288, "Haier Boyue G42", ALC269VC_FIXUP_ACER_VCOPPERBOX_PINS), SND_PCI_QUIRK(0x1d72, 0x1602, "RedmiBook", ALC255_FIXUP_XIAOMI_HEADSET_MIC), SND_PCI_QUIRK(0x1d72, 0x1701, "XiaomiNotebook Pro", ALC298_FIXUP_DELL1_MIC_NO_PRESENCE), -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 5 ++- mm/rmap.c | 9 ++--- mm/vmscan.c | 88 +++++++++++++++++++++++------------------- 3 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab6..5b1c984daf454 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d496..73d5998677d40 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b3601..ddaaff67642e1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /****************************************************************************** -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "x86/traps: move kmsan check after instrumentation_begin" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1db272864ff250b5e607283eaec819e1186c8e26 Mon Sep 17 00:00:00 2001 From: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com> Date: Wed, 16 Oct 2024 20:24:07 +0500 Subject: [PATCH] x86/traps: move kmsan check after instrumentation_begin During x86_64 kernel build with CONFIG_KMSAN, the objtool warns following: AR built-in.a AR vmlinux.a LD vmlinux.o vmlinux.o: warning: objtool: handle_bug+0x4: call to kmsan_unpoison_entry_regs() leaves .noinstr.text section OBJCOPY modules.builtin.modinfo GEN modules.builtin MODPOST Module.symvers CC .vmlinux.export.o Moving kmsan_unpoison_entry_regs() _after_ instrumentation_begin() fixes the warning. There is decode_bug(regs->ip, &imm) is left before KMSAN unpoisoining, but it has the return condition and if we include it after instrumentation_begin() it results the warning "return with instrumentation enabled", hence, I'm concerned that regs will not be KMSAN unpoisoned if `ud_type == BUG_NONE` is true. Link: https://lkml.kernel.org/r/20241016152407.3149001-1-snovitoll@gmail.com Fixes: ba54d194f8da ("x86/traps: avoid KMSAN bugs originating from handle_bug()") Signed-off-by: Sabyrzhan Tasbolatov <snovitoll(a)gmail.com> Reviewed-by: Alexander Potapenko <glider(a)google.com> Cc: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: Dave Hansen <dave.hansen(a)linux.intel.com> Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Thomas Gleixner <tglx(a)linutronix.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/x86/kernel/traps.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/arch/x86/kernel/traps.c b/arch/x86/kernel/traps.c index d05392db5d0fe..2dbadf347b5f4 100644 --- a/arch/x86/kernel/traps.c +++ b/arch/x86/kernel/traps.c @@ -261,12 +261,6 @@ static noinstr bool handle_bug(struct pt_regs *regs) int ud_type; u32 imm; - /* - * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug() - * is a rare case that uses @regs without passing them to - * irqentry_enter(). - */ - kmsan_unpoison_entry_regs(regs); ud_type = decode_bug(regs->ip, &imm); if (ud_type == BUG_NONE) return handled; @@ -275,6 +269,12 @@ static noinstr bool handle_bug(struct pt_regs *regs) * All lies, just get the WARN/BUG out. */ instrumentation_begin(); + /* + * Normally @regs are unpoisoned by irqentry_enter(), but handle_bug() + * is a rare case that uses @regs without passing them to + * irqentry_enter(). + */ + kmsan_unpoison_entry_regs(regs); /* * Since we're emulating a CALL with exceptions, restore the interrupt * state to what it was at the exception site. -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "thunderbolt: Honor TMU requirements in the domain when setting TMU mode" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3cea8af2d1a9ae5869b47c3dabe3b20f331f3bbd Mon Sep 17 00:00:00 2001 From: Gil Fine <gil.fine(a)linux.intel.com> Date: Thu, 10 Oct 2024 17:29:42 +0300 Subject: [PATCH] thunderbolt: Honor TMU requirements in the domain when setting TMU mode Currently, when configuring TMU (Time Management Unit) mode of a given router, we take into account only its own TMU requirements ignoring other routers in the domain. This is problematic if the router we are configuring has lower TMU requirements than what is already configured in the domain. In the scenario below, we have a host router with two USB4 ports: A and B. Port A connected to device router #1 (which supports CL states) and existing DisplayPort tunnel, thus, the TMU mode is HiFi uni-directional. 1. Initial topology [Host] A/ / [Device #1] / Monitor 2. Plug in device #2 (that supports CL states) to downstream port B of the host router [Host] A/ B\ / \ [Device #1] [Device #2] / Monitor The TMU mode on port B and port A will be configured to LowRes which is not what we want and will cause monitor to start flickering. To address this we first scan the domain and search for any router configured to HiFi uni-directional mode, and if found, configure TMU mode of the given router to HiFi uni-directional as well. Cc: stable(a)vger.kernel.org Signed-off-by: Gil Fine <gil.fine(a)linux.intel.com> Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com> --- drivers/thunderbolt/tb.c | 48 +++++++++++++++++++++++++++++++++++----- 1 file changed, 42 insertions(+), 6 deletions(-) diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c index 10e719dd837ce..4f777788e9179 100644 --- a/drivers/thunderbolt/tb.c +++ b/drivers/thunderbolt/tb.c @@ -288,6 +288,24 @@ static void tb_increase_tmu_accuracy(struct tb_tunnel *tunnel) device_for_each_child(&sw->dev, NULL, tb_increase_switch_tmu_accuracy); } +static int tb_switch_tmu_hifi_uni_required(struct device *dev, void *not_used) +{ + struct tb_switch *sw = tb_to_switch(dev); + + if (sw && tb_switch_tmu_is_enabled(sw) && + tb_switch_tmu_is_configured(sw, TB_SWITCH_TMU_MODE_HIFI_UNI)) + return 1; + + return device_for_each_child(dev, NULL, + tb_switch_tmu_hifi_uni_required); +} + +static bool tb_tmu_hifi_uni_required(struct tb *tb) +{ + return device_for_each_child(&tb->dev, NULL, + tb_switch_tmu_hifi_uni_required) == 1; +} + static int tb_enable_tmu(struct tb_switch *sw) { int ret; @@ -302,12 +320,30 @@ static int tb_enable_tmu(struct tb_switch *sw) ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_MEDRES_ENHANCED_UNI); if (ret == -EOPNOTSUPP) { - if (tb_switch_clx_is_enabled(sw, TB_CL1)) - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_LOWRES); - else - ret = tb_switch_tmu_configure(sw, - TB_SWITCH_TMU_MODE_HIFI_BI); + if (tb_switch_clx_is_enabled(sw, TB_CL1)) { + /* + * Figure out uni-directional HiFi TMU requirements + * currently in the domain. If there are no + * uni-directional HiFi requirements we can put the TMU + * into LowRes mode. + * + * Deliberately skip bi-directional HiFi links + * as these work independently of other links + * (and they do not allow any CL states anyway). + */ + if (tb_tmu_hifi_uni_required(sw->tb)) + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_HIFI_UNI); + else + ret = tb_switch_tmu_configure(sw, + TB_SWITCH_TMU_MODE_LOWRES); + } else { + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); + } + + /* If not supported, fallback to bi-directional HiFi */ + if (ret == -EOPNOTSUPP) + ret = tb_switch_tmu_configure(sw, TB_SWITCH_TMU_MODE_HIFI_BI); } if (ret) return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mei: use kvmalloc for read buffer" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4adf613e01bf99e1739f6ff3e162ad5b7d578d1a Mon Sep 17 00:00:00 2001 From: Alexander Usyskin <alexander.usyskin(a)intel.com> Date: Tue, 15 Oct 2024 15:31:57 +0300 Subject: [PATCH] mei: use kvmalloc for read buffer Read buffer is allocated according to max message size, reported by the firmware and may reach 64K in systems with pxp client. Contiguous 64k allocation may fail under memory pressure. Read buffer is used as in-driver message storage and not required to be contiguous. Use kvmalloc to allow kernel to allocate non-contiguous memory. Fixes: 3030dc056459 ("mei: add wrapper for queuing control commands.") Cc: stable <stable(a)kernel.org> Reported-by: Rohit Agarwal <rohiagar(a)chromium.org> Closes: https://lore.kernel.org/all/20240813084542.2921300-1-rohiagar@chromium.org/ Tested-by: Brian Geffon <bgeffon(a)google.com> Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com> Acked-by: Tomas Winkler <tomasw(a)gmail.com> Link: https://lore.kernel.org/r/20241015123157.2337026-1-alexander.usyskin@intel.… Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> --- drivers/misc/mei/client.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/misc/mei/client.c b/drivers/misc/mei/client.c index 9d090fa07516f..be011cef12e5d 100644 --- a/drivers/misc/mei/client.c +++ b/drivers/misc/mei/client.c @@ -321,7 +321,7 @@ void mei_io_cb_free(struct mei_cl_cb *cb) return; list_del(&cb->list); - kfree(cb->buf.data); + kvfree(cb->buf.data); kfree(cb->ext_hdr); kfree(cb); } @@ -497,7 +497,7 @@ struct mei_cl_cb *mei_cl_alloc_cb(struct mei_cl *cl, size_t length, if (length == 0) return cb; - cb->buf.data = kmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); + cb->buf.data = kvmalloc(roundup(length, MEI_SLOT_SIZE), GFP_KERNEL); if (!cb->buf.data) { mei_io_cb_free(cb); return NULL; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "nilfs2: fix kernel bug due to missing clearing of checked flag" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/nilfs2/page.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd1..10def4b559956 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) { -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/acpi: Ensure ports ready at cxl_acpi_probe() return" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 48f62d38a07d464a499fa834638afcfd2b68f852 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Tue, 22 Oct 2024 18:43:40 -0700 Subject: [PATCH] cxl/acpi: Ensure ports ready at cxl_acpi_probe() return In order to ensure root CXL ports are enabled upon cxl_acpi_probe() when the 'cxl_port' driver is built as a module, arrange for the module to be pre-loaded or built-in. The "Fixes:" but no "Cc: stable" on this patch reflects that the issue is merely by inspection since the bug that triggered the discovery of this potential problem [1] is fixed by other means. However, a stable backport should do no harm. Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Tested-by: Gregory Price <gourry(a)gourry.net> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Link: https://patch.msgid.link/172964781969.81806.17276352414854540808.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/cxl/acpi.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 82b78e331d8ed..432b7cfd12a8e 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -924,6 +924,13 @@ static void __exit cxl_acpi_exit(void) /* load before dax_hmem sees 'Soft Reserved' CXL ranges */ subsys_initcall(cxl_acpi_init); + +/* + * Arrange for host-bridge ports to be active synchronous with + * cxl_acpi_probe() exit. + */ +MODULE_SOFTDEP("pre: cxl_port"); + module_exit(cxl_acpi_exit); MODULE_DESCRIPTION("CXL ACPI: Platform Support"); MODULE_LICENSE("GPL v2"); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix defrag not merging contiguous extents due to merged extent maps" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 77b0d113eec49a7390ff1a08ca1923e89f5f86c6 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Tue, 29 Oct 2024 15:18:45 +0000 Subject: [PATCH] btrfs: fix defrag not merging contiguous extents due to merged extent maps When running defrag (manual defrag) against a file that has extents that are contiguous and we already have the respective extent maps loaded and merged, we end up not defragging the range covered by those contiguous extents. This happens when we have an extent map that was the result of merging multiple extent maps for contiguous extents and the length of the merged extent map is greater than or equals to the defrag threshold length. The script below reproduces this scenario: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a 256K file with 4 extents of 64K each. xfs_io -f -c "falloc 0 64K" \ -c "pwrite 0 64K" \ -c "falloc 64K 64K" \ -c "pwrite 64K 64K" \ -c "falloc 128K 64K" \ -c "pwrite 128K 64K" \ -c "falloc 192K 64K" \ -c "pwrite 192K 64K" \ $MNT/foo umount $MNT echo -n "Initial number of file extent items: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 128K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 128K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 256K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 256K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l Running it: $ ./test.sh Initial number of file extent items: 4 Number of file extent items after defrag with 128K threshold: 4 Number of file extent items after defrag with 256K threshold: 4 The 4 extents don't get merged because we have an extent map with a size of 256K that is the result of merging the individual extent maps for each of the four 64K extents and at defrag_lookup_extent() we have a value of zero for the generation threshold ('newer_than' argument) since this is a manual defrag. As a consequence we don't call defrag_get_extent() to get an extent map representing a single file extent item in the inode's subvolume tree, so we end up using the merged extent map at defrag_collect_targets() and decide not to defrag. Fix this by updating defrag_lookup_extent() to always discard extent maps that were merged and call defrag_get_extent() regardless of the minimum generation threshold ('newer_than' argument). A test case for fstests will be sent along soon. CC: stable(a)vger.kernel.org # 6.1+ Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/defrag.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index b95ef44c326bd..968dae9539482 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -763,12 +763,12 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start, * We can get a merged extent, in that case, we need to re-search * tree to get the original em for defrag. * - * If @newer_than is 0 or em::generation < newer_than, we can trust - * this em, as either we don't care about the generation, or the - * merged extent map will be rejected anyway. + * This is because even if we have adjacent extents that are contiguous + * and compatible (same type and flags), we still want to defrag them + * so that we use less metadata (extent items in the extent tree and + * file extent items in the inode's subvolume tree). */ - if (em && (em->flags & EXTENT_FLAG_MERGED) && - newer_than && em->generation >= newer_than) { + if (em && (em->flags & EXTENT_FLAG_MERGED)) { free_extent_map(em); em = NULL; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/port: Fix use-after-free, permit out-of-order decoder shutdown" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 101c268bd2f37e965a5468353e62d154db38838e Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Tue, 22 Oct 2024 18:43:49 -0700 Subject: [PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable(a)vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Cc: Zijun Hu <quic_zijuhu(a)quicinc.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/base/core.c | 35 +++++++++++++++++++++++++ drivers/cxl/core/hdm.c | 50 ++++++++++++++++++++++++++++++------ drivers/cxl/core/region.c | 48 ++++++++++------------------------ drivers/cxl/cxl.h | 3 ++- include/linux/device.h | 3 +++ tools/testing/cxl/test/cxl.c | 14 ++++------ 6 files changed, 100 insertions(+), 53 deletions(-) diff --git a/drivers/base/core.c b/drivers/base/core.c index a4c853411a6b2..e42f1ad730780 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child_reverse); +/** + * device_for_each_child_reverse_from - device child iterator in reversed order. + * @parent: parent struct device. + * @from: optional starting point in child list + * @fn: function to be called for each device. + * @data: data for the callback. + * + * Iterate over @parent's child devices, starting at @from, and call @fn + * for each, passing it @data. This helper is identical to + * device_for_each_child_reverse() when @from is NULL. + * + * @fn is checked each iteration. If it returns anything other than 0, + * iteration stop and that value is returned to the caller of + * device_for_each_child_reverse_from(); + */ +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)) +{ + struct klist_iter i; + struct device *child; + int error = 0; + + if (!parent->p) + return 0; + + klist_iter_init_node(&parent->p->klist_children, &i, + (from ? &from->p->knode_parent : NULL)); + while ((child = prev_device(&i)) && !error) + error = fn(child, data); + klist_iter_exit(&i); + return error; +} +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); + /** * device_find_child - device iterator for locating a particular device. * @parent: parent struct device diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3df10517a3278..223c273c0cd17 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int cxl_decoder_reset(struct cxl_decoder *cxld) +static int commit_reap(struct device *dev, const void *data) +{ + struct cxl_port *port = to_cxl_port(dev->parent); + struct cxl_decoder *cxld; + + if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev)) + return 0; + + cxld = to_cxl_decoder(dev); + if (port->commit_end == cxld->id && + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", + dev_name(&cxld->dev), port->commit_end); + } + + return 0; +} + +void cxl_port_commit_reap(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + + lockdep_assert_held_write(&cxl_region_rwsem); + + /* + * Once the highest committed decoder is disabled, free any other + * decoders that were pinned allocated by out-of-order release. + */ + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev), + port->commit_end); + device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL, + commit_reap); +} +EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL); + +static void cxl_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); @@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) u32 ctrl; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } down_read(&cxl_dpa_rwsem); ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); @@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); up_read(&cxl_dpa_rwsem); - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; /* Userspace is now responsible for reconfiguring this decoder */ @@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) cxled = to_cxl_endpoint_decoder(&cxld->dev); cxled->state = CXL_DECODER_STATE_MANUAL; } - - return 0; } static int cxl_setup_hdm_decoder_from_dvsec( diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index e701e4b040328..3478d20583035 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) "Bypassing cpu_cache_invalidate_memregion() for testing!\n"); return 0; } else { - dev_err(&cxlr->dev, - "Failed to synchronize CPU cache state\n"); + dev_WARN(&cxlr->dev, + "Failed to synchronize CPU cache state\n"); return -ENXIO; } } @@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) return 0; } -static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) +static void cxl_region_decode_reset(struct cxl_region *cxlr, int count) { struct cxl_region_params *p = &cxlr->params; - int i, rc = 0; + int i; /* - * Before region teardown attempt to flush, and if the flush - * fails cancel the region teardown for data consistency - * concerns + * Before region teardown attempt to flush, evict any data cached for + * this region, or scream loudly about missing arch / platform support + * for CXL teardown. */ - rc = cxl_region_invalidate_memregion(cxlr); - if (rc) - return rc; + cxl_region_invalidate_memregion(cxlr); for (i = count - 1; i >= 0; i--) { struct cxl_endpoint_decoder *cxled = p->targets[i]; @@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) cxl_rr = cxl_rr_load(iter, cxlr); cxld = cxl_rr->decoder; if (cxld->reset) - rc = cxld->reset(cxld); - if (rc) - return rc; + cxld->reset(cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } endpoint_reset: - rc = cxled->cxld.reset(&cxled->cxld); - if (rc) - return rc; + cxled->cxld.reset(&cxled->cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } /* all decoders associated with this region have been torn down */ clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); - - return 0; } static int commit_decoder(struct cxl_decoder *cxld) @@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, * still pending. */ if (p->state == CXL_CONFIG_RESET_PENDING) { - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - /* - * Revert to committed since there may still be active - * decoders associated with this region, or move forward - * to active to mark the reset successful - */ - if (rc) - p->state = CXL_CONFIG_COMMIT; - else - p->state = CXL_CONFIG_ACTIVE; + cxl_region_decode_reset(cxlr, p->interleave_ways); + p->state = CXL_CONFIG_ACTIVE; } } @@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) get_device(&cxlr->dev); if (p->state > CXL_CONFIG_ACTIVE) { - /* - * TODO: tear down all impacted regions if a device is - * removed out of order - */ - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - if (rc) - goto out; + cxl_region_decode_reset(cxlr, p->interleave_ways); p->state = CXL_CONFIG_ACTIVE; } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0d8b810a51f04..5406e3ab3d4a4 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -359,7 +359,7 @@ struct cxl_decoder { struct cxl_region *region; unsigned long flags; int (*commit)(struct cxl_decoder *cxld); - int (*reset)(struct cxl_decoder *cxld); + void (*reset)(struct cxl_decoder *cxld); }; /* @@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port) int cxl_num_decoders_committed(struct cxl_port *port); bool is_cxl_port(const struct device *dev); struct cxl_port *to_cxl_port(const struct device *dev); +void cxl_port_commit_reap(struct cxl_decoder *cxld); struct pci_bus; int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev, struct pci_bus *bus); diff --git a/include/linux/device.h b/include/linux/device.h index b4bde8d226979..667cb6db90193 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); int device_for_each_child_reverse(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)); struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent, diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 90d5afd52dd06..c5bbd89b31920 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int mock_decoder_reset(struct cxl_decoder *cxld) +static void mock_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); int id = cxld->id; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev)); - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } - - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; - - return 0; } static void default_mock_decoder(struct cxl_decoder *cxld) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: avoid unconditional one-tick sleep when swapcache_prepare fails" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <v-songbaohua(a)oppo.com> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbbd..bdf77a3ec47bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/port: Fix CXL port initialization order when the subsystem is built-in" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/cxl/Kconfig | 1 + drivers/cxl/Makefile | 20 ++++++++++++++------ drivers/cxl/port.c | 17 ++++++++++++++++- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082c..876469e23f7a7 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52e..2caa90fa4bf25 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768fe..9dc394295e1fc 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a74..ddcbd80a49fb2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix extent map merging not happening for adjacent extents" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From a0f0625390858321525c2a8d04e174a546bd19b3 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Mon, 28 Oct 2024 16:23:00 +0000 Subject: [PATCH] btrfs: fix extent map merging not happening for adjacent extents If we have 3 or more adjacent extents in a file, that is, consecutive file extent items pointing to adjacent extents, within a contiguous file range and compatible flags, we end up not merging all the extents into a single extent map. For example: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \ -c "pwrite -b 64K 64K 64K" \ -c "pwrite -b 64K 128K 64K" \ -c "pwrite -b 64K 192K 64K" \ /mnt/sdc/foo After all the ordered extents complete we unpin the extent maps and try to merge them, but instead of getting a single extent map we get two because: 1) When the first ordered extent completes (file range [0, 64K)) we unpin its extent map and attempt to merge it with the extent map for the range [64K, 128K), but we can't because that extent map is still pinned; 2) When the second ordered extent completes (file range [64K, 128K)), we unpin its extent map and merge it with the previous extent map, for file range [0, 64K), but we can't merge with the next extent map, for the file range [128K, 192K), because this one is still pinned. The merged extent map for the file range [0, 128K) gets the flag EXTENT_MAP_MERGED set; 3) When the third ordered extent completes (file range [128K, 192K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [0, 128K), but we can't because that extent map has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false due to different flags) while the extent map for the range [128K, 192K) doesn't have that flag set. We also can't merge it with the next extent map, for file range [192K, 256K), because that one is still pinned. At this moment we have 3 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 192K). One for file range [192K, 256K) which is still pinned; 4) When the fourth and final extent completes (file range [192K, 256K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [128K, 192K), which succeeds since none of these extent maps have the EXTENT_MAP_MERGED flag set. So we end up with 2 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set. Since after merging extent maps we don't attempt to merge again, that is, merge the resulting extent map with the one that is now preceding it (and the one following it), we end up with those two extent maps, when we could have had a single extent map to represent the whole file. Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag. While this doesn't present any functional issue, it prevents the merging of extent maps which allows to save memory, and can make defrag not merging extents too (that will be addressed in the next patch). Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") CC: stable(a)vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/extent_map.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 668c617444a50..1d93e1202c339 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -230,7 +230,12 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma if (extent_map_end(prev) != next->start) return false; - if (prev->flags != next->flags) + /* + * The merged flag is not an on-disk flag, it just indicates we had the + * extent maps of 2 (or more) adjacent extents merged, so factor it out. + */ + if ((prev->flags & ~EXTENT_FLAG_MERGED) != + (next->flags & ~EXTENT_FLAG_MERGED)) return false; if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: shmem: fix data-race in shmem_getattr()" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d949d1d14fa281ace388b1de978e8f2cd52875cf Mon Sep 17 00:00:00 2001 From: Jeongjun Park <aha310510(a)gmail.com> Date: Mon, 9 Sep 2024 21:35:58 +0900 Subject: [PATCH] mm: shmem: fix data-race in shmem_getattr() I got the following KCSAN report during syzbot testing: ================================================================== BUG: KCSAN: data-race in generic_fillattr / inode_set_ctime_current write to 0xffff888102eb3260 of 4 bytes by task 6565 on cpu 1: inode_set_ctime_to_ts include/linux/fs.h:1638 [inline] inode_set_ctime_current+0x169/0x1d0 fs/inode.c:2626 shmem_mknod+0x117/0x180 mm/shmem.c:3443 shmem_create+0x34/0x40 mm/shmem.c:3497 lookup_open fs/namei.c:3578 [inline] open_last_lookups fs/namei.c:3647 [inline] path_openat+0xdbc/0x1f00 fs/namei.c:3883 do_filp_open+0xf7/0x200 fs/namei.c:3913 do_sys_openat2+0xab/0x120 fs/open.c:1416 do_sys_open fs/open.c:1431 [inline] __do_sys_openat fs/open.c:1447 [inline] __se_sys_openat fs/open.c:1442 [inline] __x64_sys_openat+0xf3/0x120 fs/open.c:1442 x64_sys_call+0x1025/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:258 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff888102eb3260 of 4 bytes by task 3498 on cpu 0: inode_get_ctime_nsec include/linux/fs.h:1623 [inline] inode_get_ctime include/linux/fs.h:1629 [inline] generic_fillattr+0x1dd/0x2f0 fs/stat.c:62 shmem_getattr+0x17b/0x200 mm/shmem.c:1157 vfs_getattr_nosec fs/stat.c:166 [inline] vfs_getattr+0x19b/0x1e0 fs/stat.c:207 vfs_statx_path fs/stat.c:251 [inline] vfs_statx+0x134/0x2f0 fs/stat.c:315 vfs_fstatat+0xec/0x110 fs/stat.c:341 __do_sys_newfstatat fs/stat.c:505 [inline] __se_sys_newfstatat+0x58/0x260 fs/stat.c:499 __x64_sys_newfstatat+0x55/0x70 fs/stat.c:499 x64_sys_call+0x141f/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:263 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e value changed: 0x2755ae53 -> 0x27ee44d3 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 3498 Comm: udevd Not tainted 6.11.0-rc6-syzkaller-00326-gd1f2d51b711a-dirty #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024 ================================================================== When calling generic_fillattr(), if you don't hold read lock, data-race will occur in inode member variables, which can cause unexpected behavior. Since there is no special protection when shmem_getattr() calls generic_fillattr(), data-race occurs by functions such as shmem_unlink() or shmem_mknod(). This can cause unexpected results, so commenting it out is not enough. Therefore, when calling generic_fillattr() from shmem_getattr(), it is appropriate to protect the inode using inode_lock_shared() and inode_unlock_shared() to prevent data-race. Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat") Signed-off-by: Jeongjun Park <aha310510(a)gmail.com> Reported-by: syzbot <syzkaller(a)googlegroup.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/shmem.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/mm/shmem.c b/mm/shmem.c index c5adb987b23cf..4ba1d00fabdaa 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1166,7 +1166,9 @@ static int shmem_getattr(struct mnt_idmap *idmap, stat->attributes_mask |= (STATX_ATTR_APPEND | STATX_ATTR_IMMUTABLE | STATX_ATTR_NODUMP); + inode_lock_shared(inode); generic_fillattr(idmap, request_mask, inode, stat); + inode_unlock_shared(inode); if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0)) stat->blksize = HPAGE_PMD_SIZE; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats" failed to apply to v6.1-stable tree

by Sasha Levin

The patch below does not apply to the v6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 2 -- mm/vmscan.c | 14 +++++--------- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a28355..9342e5692dab6 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c5071..4f1d33e4b3601 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35"" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1b6063a57754eae5705753c01e78dc268b989038 Mon Sep 17 00:00:00 2001 From: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Date: Fri, 11 Oct 2024 11:12:19 -0400 Subject: [PATCH] Revert "drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35" This reverts commit 9dad21f910fc ("drm/amd/display: update DML2 policy EnhancedPrefetchScheduleAccelerationFinal DCN35") [why & how] The offending commit exposes a hang with lid close/open behavior. Both issues seem to be related to ODM 2:1 mode switching, so there is another issue generic to that sequence that needs to be investigated. Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Ovidiu Bunea <Ovidiu.Bunea(a)amd.com> Signed-off-by: Tom Chung <chiahsuan.chung(a)amd.com> Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 68bf95317ebf2cfa7105251e4279e951daceefb7) Cc: stable(a)vger.kernel.org --- drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c index 11c904ae29586..c4c52173ef224 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_policy.c @@ -303,6 +303,7 @@ void build_unoptimized_policy_settings(enum dml_project_id project, struct dml_m if (project == dml_project_dcn35 || project == dml_project_dcn351) { policy->DCCProgrammingAssumesScanDirectionUnknownFinal = false; + policy->EnhancedPrefetchScheduleAccelerationFinal = 0; policy->AllowForPStateChangeOrStutterInVBlankFinal = dml_prefetch_support_uclk_fclk_and_stutter_if_possible; /*new*/ policy->UseOnlyMaxPrefetchModes = 1; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 5 ++- mm/rmap.c | 9 ++--- mm/vmscan.c | 88 +++++++++++++++++++++++------------------- 3 files changed, 55 insertions(+), 47 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab6..5b1c984daf454 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d496..73d5998677d40 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b3601..ddaaff67642e1 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /****************************************************************************** -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "gpiolib: fix debugfs newline separators" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 3e8b7238b427e05498034c240451af5f5495afda Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Mon, 28 Oct 2024 13:49:58 +0100 Subject: [PATCH] gpiolib: fix debugfs newline separators The gpiolib debugfs interface exports a list of all gpio chips in a system and the state of their pins. The gpio chip sections are supposed to be separated by a newline character, but a long-standing bug prevents the separator from being included when output is generated in multiple sessions, making the output inconsistent and hard to read. Make sure to only suppress the newline separator at the beginning of the file as intended. Fixes: f9c4a31f6150 ("gpiolib: Use seq_file's iterator interface") Cc: stable(a)vger.kernel.org # 3.7 Cc: Thierry Reding <treding(a)nvidia.com> Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Link: https://lore.kernel.org/r/20241028125000.24051-2-johan+linaro@kernel.org Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> --- drivers/gpio/gpiolib.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d5952ab7752c2..e27488a90bc97 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -4926,6 +4926,8 @@ static void *gpiolib_seq_start(struct seq_file *s, loff_t *pos) return NULL; s->private = priv; + if (*pos > 0) + priv->newline = true; priv->idx = srcu_read_lock(&gpio_devices_srcu); list_for_each_entry_srcu(gdev, &gpio_devices, list, -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "drm/amd/pm: Vangogh: Fix kernel memory out of bounds write" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 4aa923a6e6406b43566ef6ac35a3d9a3197fa3e8 Mon Sep 17 00:00:00 2001 From: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Date: Fri, 25 Oct 2024 15:56:39 +0100 Subject: [PATCH] drm/amd/pm: Vangogh: Fix kernel memory out of bounds write KASAN reports that the GPU metrics table allocated in vangogh_tables_init() is not large enough for the memset done in smu_cmn_init_soft_gpu_metrics(). Condensed report follows: [ 33.861314] BUG: KASAN: slab-out-of-bounds in smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu] [ 33.861799] Write of size 168 at addr ffff888129f59500 by task mangoapp/1067 ... [ 33.861808] CPU: 6 UID: 1000 PID: 1067 Comm: mangoapp Tainted: G W 6.12.0-rc4 #356 1a56f59a8b5182eeaf67eb7cb8b13594dd23b544 [ 33.861816] Tainted: [W]=WARN [ 33.861818] Hardware name: Valve Galileo/Galileo, BIOS F7G0107 12/01/2023 [ 33.861822] Call Trace: [ 33.861826] <TASK> [ 33.861829] dump_stack_lvl+0x66/0x90 [ 33.861838] print_report+0xce/0x620 [ 33.861853] kasan_report+0xda/0x110 [ 33.862794] kasan_check_range+0xfd/0x1a0 [ 33.862799] __asan_memset+0x23/0x40 [ 33.862803] smu_cmn_init_soft_gpu_metrics+0x73/0x200 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.863306] vangogh_get_gpu_metrics_v2_4+0x123/0xad0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.864257] vangogh_common_get_gpu_metrics+0xb0c/0xbc0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.865682] amdgpu_dpm_get_gpu_metrics+0xcc/0x110 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.866160] amdgpu_get_gpu_metrics+0x154/0x2d0 [amdgpu 13b1bc364ec578808f676eba412c20eaab792779] [ 33.867135] dev_attr_show+0x43/0xc0 [ 33.867147] sysfs_kf_seq_show+0x1f1/0x3b0 [ 33.867155] seq_read_iter+0x3f8/0x1140 [ 33.867173] vfs_read+0x76c/0xc50 [ 33.867198] ksys_read+0xfb/0x1d0 [ 33.867214] do_syscall_64+0x90/0x160 ... [ 33.867353] Allocated by task 378 on cpu 7 at 22.794876s: [ 33.867358] kasan_save_stack+0x33/0x50 [ 33.867364] kasan_save_track+0x17/0x60 [ 33.867367] __kasan_kmalloc+0x87/0x90 [ 33.867371] vangogh_init_smc_tables+0x3f9/0x840 [amdgpu] [ 33.867835] smu_sw_init+0xa32/0x1850 [amdgpu] [ 33.868299] amdgpu_device_init+0x467b/0x8d90 [amdgpu] [ 33.868733] amdgpu_driver_load_kms+0x19/0xf0 [amdgpu] [ 33.869167] amdgpu_pci_probe+0x2d6/0xcd0 [amdgpu] [ 33.869608] local_pci_probe+0xda/0x180 [ 33.869614] pci_device_probe+0x43f/0x6b0 Empirically we can confirm that the former allocates 152 bytes for the table, while the latter memsets the 168 large block. Root cause appears that when GPU metrics tables for v2_4 parts were added it was not considered to enlarge the table to fit. The fix in this patch is rather "brute force" and perhaps later should be done in a smarter way, by extracting and consolidating the part version to size logic to a common helper, instead of brute forcing the largest possible allocation. Nevertheless, for now this works and fixes the out of bounds write. v2: * Drop impossible v3_0 case. (Mario) Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)igalia.com> Fixes: 41cec40bc9ba ("drm/amd/pm: Vangogh: Add new gpu_metrics_v2_4 to acquire gpu_metrics") Cc: Mario Limonciello <mario.limonciello(a)amd.com> Cc: Evan Quan <evan.quan(a)amd.com> Cc: Wenyou Yang <WenYou.Yang(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com> Link: https://lore.kernel.org/r/20241025145639.19124-1-tursulin@igalia.com Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> (cherry picked from commit 0880f58f9609f0200483a49429af0f050d281703) Cc: stable(a)vger.kernel.org # v6.6+ --- drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c index 22737b11b1bfb..1fe020f1f4dbe 100644 --- a/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c +++ b/drivers/gpu/drm/amd/pm/swsmu/smu11/vangogh_ppt.c @@ -242,7 +242,9 @@ static int vangogh_tables_init(struct smu_context *smu) goto err0_out; smu_table->metrics_time = 0; - smu_table->gpu_metrics_table_size = max(sizeof(struct gpu_metrics_v2_3), sizeof(struct gpu_metrics_v2_2)); + smu_table->gpu_metrics_table_size = sizeof(struct gpu_metrics_v2_2); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_3)); + smu_table->gpu_metrics_table_size = max(smu_table->gpu_metrics_table_size, sizeof(struct gpu_metrics_v2_4)); smu_table->gpu_metrics_table = kzalloc(smu_table->gpu_metrics_table_size, GFP_KERNEL); if (!smu_table->gpu_metrics_table) goto err1_out; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: allow set/clear page_type again" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/page-flags.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a767104878..cc839e4365c18 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix defrag not merging contiguous extents due to merged extent maps" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 77b0d113eec49a7390ff1a08ca1923e89f5f86c6 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Tue, 29 Oct 2024 15:18:45 +0000 Subject: [PATCH] btrfs: fix defrag not merging contiguous extents due to merged extent maps When running defrag (manual defrag) against a file that has extents that are contiguous and we already have the respective extent maps loaded and merged, we end up not defragging the range covered by those contiguous extents. This happens when we have an extent map that was the result of merging multiple extent maps for contiguous extents and the length of the merged extent map is greater than or equals to the defrag threshold length. The script below reproduces this scenario: $ cat test.sh #!/bin/bash DEV=/dev/sdi MNT=/mnt/sdi mkfs.btrfs -f $DEV mount $DEV $MNT # Create a 256K file with 4 extents of 64K each. xfs_io -f -c "falloc 0 64K" \ -c "pwrite 0 64K" \ -c "falloc 64K 64K" \ -c "pwrite 64K 64K" \ -c "falloc 128K 64K" \ -c "pwrite 128K 64K" \ -c "falloc 192K 64K" \ -c "pwrite 192K 64K" \ $MNT/foo umount $MNT echo -n "Initial number of file extent items: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 128K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 128K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l mount $DEV $MNT # Read the whole file in order to load and merge extent maps. cat $MNT/foo > /dev/null btrfs filesystem defragment -t 256K $MNT/foo umount $MNT echo -n "Number of file extent items after defrag with 256K threshold: " btrfs inspect-internal dump-tree -t 5 $DEV | grep EXTENT_DATA | wc -l Running it: $ ./test.sh Initial number of file extent items: 4 Number of file extent items after defrag with 128K threshold: 4 Number of file extent items after defrag with 256K threshold: 4 The 4 extents don't get merged because we have an extent map with a size of 256K that is the result of merging the individual extent maps for each of the four 64K extents and at defrag_lookup_extent() we have a value of zero for the generation threshold ('newer_than' argument) since this is a manual defrag. As a consequence we don't call defrag_get_extent() to get an extent map representing a single file extent item in the inode's subvolume tree, so we end up using the merged extent map at defrag_collect_targets() and decide not to defrag. Fix this by updating defrag_lookup_extent() to always discard extent maps that were merged and call defrag_get_extent() regardless of the minimum generation threshold ('newer_than' argument). A test case for fstests will be sent along soon. CC: stable(a)vger.kernel.org # 6.1+ Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/defrag.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/fs/btrfs/defrag.c b/fs/btrfs/defrag.c index b95ef44c326bd..968dae9539482 100644 --- a/fs/btrfs/defrag.c +++ b/fs/btrfs/defrag.c @@ -763,12 +763,12 @@ static struct extent_map *defrag_lookup_extent(struct inode *inode, u64 start, * We can get a merged extent, in that case, we need to re-search * tree to get the original em for defrag. * - * If @newer_than is 0 or em::generation < newer_than, we can trust - * this em, as either we don't care about the generation, or the - * merged extent map will be rejected anyway. + * This is because even if we have adjacent extents that are contiguous + * and compatible (same type and flags), we still want to defrag them + * so that we use less metadata (extent items in the extent tree and + * file extent items in the inode's subvolume tree). */ - if (em && (em->flags & EXTENT_FLAG_MERGED) && - newer_than && em->generation >= newer_than) { + if (em && (em->flags & EXTENT_FLAG_MERGED)) { free_extent_map(em); em = NULL; } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: avoid unconditional one-tick sleep when swapcache_prepare fails" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <v-songbaohua(a)oppo.com> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbbd..bdf77a3ec47bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "cxl/port: Fix CXL port initialization order when the subsystem is built-in" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> --- drivers/cxl/Kconfig | 1 + drivers/cxl/Makefile | 20 ++++++++++++++------ drivers/cxl/port.c | 17 ++++++++++++++++- 3 files changed, 31 insertions(+), 7 deletions(-) diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082c..876469e23f7a7 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52e..2caa90fa4bf25 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768fe..9dc394295e1fc 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> --- drivers/net/wireless/intel/iwlwifi/mvm/scan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a74..ddcbd80a49fb2 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix extent map merging not happening for adjacent extents" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From a0f0625390858321525c2a8d04e174a546bd19b3 Mon Sep 17 00:00:00 2001 From: Filipe Manana <fdmanana(a)suse.com> Date: Mon, 28 Oct 2024 16:23:00 +0000 Subject: [PATCH] btrfs: fix extent map merging not happening for adjacent extents If we have 3 or more adjacent extents in a file, that is, consecutive file extent items pointing to adjacent extents, within a contiguous file range and compatible flags, we end up not merging all the extents into a single extent map. For example: $ mkfs.btrfs -f /dev/sdc $ mount /dev/sdc /mnt/sdc $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \ -c "pwrite -b 64K 64K 64K" \ -c "pwrite -b 64K 128K 64K" \ -c "pwrite -b 64K 192K 64K" \ /mnt/sdc/foo After all the ordered extents complete we unpin the extent maps and try to merge them, but instead of getting a single extent map we get two because: 1) When the first ordered extent completes (file range [0, 64K)) we unpin its extent map and attempt to merge it with the extent map for the range [64K, 128K), but we can't because that extent map is still pinned; 2) When the second ordered extent completes (file range [64K, 128K)), we unpin its extent map and merge it with the previous extent map, for file range [0, 64K), but we can't merge with the next extent map, for the file range [128K, 192K), because this one is still pinned. The merged extent map for the file range [0, 128K) gets the flag EXTENT_MAP_MERGED set; 3) When the third ordered extent completes (file range [128K, 192K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [0, 128K), but we can't because that extent map has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false due to different flags) while the extent map for the range [128K, 192K) doesn't have that flag set. We also can't merge it with the next extent map, for file range [192K, 256K), because that one is still pinned. At this moment we have 3 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 192K). One for file range [192K, 256K) which is still pinned; 4) When the fourth and final extent completes (file range [192K, 256K)), we unpin its extent map and attempt to merge it with the previous extent map, for file range [128K, 192K), which succeeds since none of these extent maps have the EXTENT_MAP_MERGED flag set. So we end up with 2 extent maps: One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set. One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set. Since after merging extent maps we don't attempt to merge again, that is, merge the resulting extent map with the one that is now preceding it (and the one following it), we end up with those two extent maps, when we could have had a single extent map to represent the whole file. Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag. While this doesn't present any functional issue, it prevents the merging of extent maps which allows to save memory, and can make defrag not merging extents too (that will be addressed in the next patch). Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check") CC: stable(a)vger.kernel.org # 6.1+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Signed-off-by: Filipe Manana <fdmanana(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/extent_map.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index 668c617444a50..1d93e1202c339 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -230,7 +230,12 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma if (extent_map_end(prev) != next->start) return false; - if (prev->flags != next->flags) + /* + * The merged flag is not an on-disk flag, it just indicates we had the + * extent maps of 2 (or more) adjacent extents merged, so factor it out. + */ + if ((prev->flags & ~EXTENT_FLAG_MERGED) != + (next->flags & ~EXTENT_FLAG_MERGED)) return false; if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1) -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids()" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From aec8e6bf839101784f3ef037dcdb9432c3f32343 Mon Sep 17 00:00:00 2001 From: Zhihao Cheng <chengzhihao1(a)huawei.com> Date: Mon, 21 Oct 2024 22:02:15 +0800 Subject: [PATCH] btrfs: fix use-after-free of block device file in __btrfs_free_extra_devids() MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Mounting btrfs from two images (which have the same one fsid and two different dev_uuids) in certain executing order may trigger an UAF for variable 'device->bdev_file' in __btrfs_free_extra_devids(). And following are the details: 1. Attach image_1 to loop0, attach image_2 to loop1, and scan btrfs devices by ioctl(BTRFS_IOC_SCAN_DEV): / btrfs_device_1 → loop0 fs_device \ btrfs_device_2 → loop1 2. mount /dev/loop0 /mnt btrfs_open_devices btrfs_device_1->bdev_file = btrfs_get_bdev_and_sb(loop0) btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree fail: btrfs_close_devices // -ENOMEM btrfs_close_bdev(btrfs_device_1) fput(btrfs_device_1->bdev_file) // btrfs_device_1->bdev_file is freed btrfs_close_bdev(btrfs_device_2) fput(btrfs_device_2->bdev_file) 3. mount /dev/loop1 /mnt btrfs_open_devices btrfs_get_bdev_and_sb(&bdev_file) // EIO, btrfs_device_1->bdev_file is not assigned, // which points to a freed memory area btrfs_device_2->bdev_file = btrfs_get_bdev_and_sb(loop1) btrfs_fill_super open_ctree btrfs_free_extra_devids if (btrfs_device_1->bdev_file) fput(btrfs_device_1->bdev_file) // UAF ! Fix it by setting 'device->bdev_file' as 'NULL' after closing the btrfs_device in btrfs_close_one_device(). Fixes: 142388194191 ("btrfs: do not background blkdev_put()") CC: stable(a)vger.kernel.org # 4.19+ Link: https://bugzilla.kernel.org/show_bug.cgi?id=219408 Signed-off-by: Zhihao Cheng <chengzhihao1(a)huawei.com> Reviewed-by: David Sterba <dsterba(a)suse.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/volumes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c index 8f340ad1d9384..eb51b609190fb 100644 --- a/fs/btrfs/volumes.c +++ b/fs/btrfs/volumes.c @@ -1105,6 +1105,7 @@ static void btrfs_close_one_device(struct btrfs_device *device) if (device->bdev) { fs_devices->open_devices--; device->bdev = NULL; + device->bdev_file = NULL; } clear_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state); btrfs_destroy_dev_zone_info(device); -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "btrfs: fix error propagation of split bios" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From d48e1dea3931de64c26717adc2b89743c7ab6594 Mon Sep 17 00:00:00 2001 From: Naohiro Aota <naohiro.aota(a)wdc.com> Date: Wed, 9 Oct 2024 22:52:06 +0900 Subject: [PATCH] btrfs: fix error propagation of split bios The purpose of btrfs_bbio_propagate_error() shall be propagating an error of split bio to its original btrfs_bio, and tell the error to the upper layer. However, it's not working well on some cases. * Case 1. Immediate (or quick) end_bio with an error When btrfs sends btrfs_bio to mirrored devices, btrfs calls btrfs_bio_end_io() when all the mirroring bios are completed. If that btrfs_bio was split, it is from btrfs_clone_bioset and its end_io function is btrfs_orig_write_end_io. For this case, btrfs_bbio_propagate_error() accesses the orig_bbio's bio context to increase the error count. That works well in most cases. However, if the end_io is called enough fast, orig_bbio's (remaining part after split) bio context may not be properly set at that time. Since the bio context is set when the orig_bbio (the last btrfs_bio) is sent to devices, that might be too late for earlier split btrfs_bio's completion. That will result in NULL pointer dereference. That bug is easily reproducible by running btrfs/146 on zoned devices [1] and it shows the following trace. [1] You need raid-stripe-tree feature as it create "-d raid0 -m raid1" FS. BUG: kernel NULL pointer dereference, address: 0000000000000020 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 0 P4D 0 Oops: Oops: 0000 [#1] PREEMPT SMP PTI CPU: 1 UID: 0 PID: 13 Comm: kworker/u32:1 Not tainted 6.11.0-rc7-BTRFS-ZNS+ #474 Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 Workqueue: writeback wb_workfn (flush-btrfs-5) RIP: 0010:btrfs_bio_end_io+0xae/0xc0 [btrfs] BTRFS error (device dm-0): bdev /dev/mapper/error-test errs: wr 2, rd 0, flush 0, corrupt 0, gen 0 RSP: 0018:ffffc9000006f248 EFLAGS: 00010246 RAX: 0000000000000000 RBX: ffff888005a7f080 RCX: ffffc9000006f1dc RDX: 0000000000000000 RSI: 000000000000000a RDI: ffff888005a7f080 RBP: ffff888011dfc540 R08: 0000000000000000 R09: 0000000000000001 R10: ffffffff82e508e0 R11: 0000000000000005 R12: ffff88800ddfbe58 R13: ffff888005a7f080 R14: ffff888005a7f158 R15: ffff888005a7f158 FS: 0000000000000000(0000) GS:ffff88803ea80000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000020 CR3: 0000000002e22006 CR4: 0000000000370ef0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> ? __die_body.cold+0x19/0x26 ? page_fault_oops+0x13e/0x2b0 ? _printk+0x58/0x73 ? do_user_addr_fault+0x5f/0x750 ? exc_page_fault+0x76/0x240 ? asm_exc_page_fault+0x22/0x30 ? btrfs_bio_end_io+0xae/0xc0 [btrfs] ? btrfs_log_dev_io_error+0x7f/0x90 [btrfs] btrfs_orig_write_end_io+0x51/0x90 [btrfs] dm_submit_bio+0x5c2/0xa50 [dm_mod] ? find_held_lock+0x2b/0x80 ? blk_try_enter_queue+0x90/0x1e0 __submit_bio+0xe0/0x130 ? ktime_get+0x10a/0x160 ? lockdep_hardirqs_on+0x74/0x100 submit_bio_noacct_nocheck+0x199/0x410 btrfs_submit_bio+0x7d/0x150 [btrfs] btrfs_submit_chunk+0x1a1/0x6d0 [btrfs] ? lockdep_hardirqs_on+0x74/0x100 ? __folio_start_writeback+0x10/0x2c0 btrfs_submit_bbio+0x1c/0x40 [btrfs] submit_one_bio+0x44/0x60 [btrfs] submit_extent_folio+0x13f/0x330 [btrfs] ? btrfs_set_range_writeback+0xa3/0xd0 [btrfs] extent_writepage_io+0x18b/0x360 [btrfs] extent_write_locked_range+0x17c/0x340 [btrfs] ? __pfx_end_bbio_data_write+0x10/0x10 [btrfs] run_delalloc_cow+0x71/0xd0 [btrfs] btrfs_run_delalloc_range+0x176/0x500 [btrfs] ? find_lock_delalloc_range+0x119/0x260 [btrfs] writepage_delalloc+0x2ab/0x480 [btrfs] extent_write_cache_pages+0x236/0x7d0 [btrfs] btrfs_writepages+0x72/0x130 [btrfs] do_writepages+0xd4/0x240 ? find_held_lock+0x2b/0x80 ? wbc_attach_and_unlock_inode+0x12c/0x290 ? wbc_attach_and_unlock_inode+0x12c/0x290 __writeback_single_inode+0x5c/0x4c0 ? do_raw_spin_unlock+0x49/0xb0 writeback_sb_inodes+0x22c/0x560 __writeback_inodes_wb+0x4c/0xe0 wb_writeback+0x1d6/0x3f0 wb_workfn+0x334/0x520 process_one_work+0x1ee/0x570 ? lock_is_held_type+0xc6/0x130 worker_thread+0x1d1/0x3b0 ? __pfx_worker_thread+0x10/0x10 kthread+0xee/0x120 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Modules linked in: dm_mod btrfs blake2b_generic xor raid6_pq rapl CR2: 0000000000000020 * Case 2. Earlier completion of orig_bbio for mirrored btrfs_bios btrfs_bbio_propagate_error() assumes the end_io function for orig_bbio is called last among split bios. In that case, btrfs_orig_write_end_io() sets the bio->bi_status to BLK_STS_IOERR by seeing the bioc->error [2]. Otherwise, the increased orig_bio's bioc->error is not checked by anyone and return BLK_STS_OK to the upper layer. [2] Actually, this is not true. Because we only increases orig_bioc->errors by max_errors, the condition "atomic_read(&bioc->error) > bioc->max_errors" is still not met if only one split btrfs_bio fails. * Case 3. Later completion of orig_bbio for un-mirrored btrfs_bios In contrast to the above case, btrfs_bbio_propagate_error() is not working well if un-mirrored orig_bbio is completed last. It sets orig_bbio->bio.bi_status to the btrfs_bio's error. But, that is easily over-written by orig_bbio's completion status. If the status is BLK_STS_OK, the upper layer would not know the failure. * Solution Considering the above cases, we can only save the error status in the orig_bbio (remaining part after split) itself as it is always available. Also, the saved error status should be propagated when all the split btrfs_bios are finished (i.e, bbio->pending_ios == 0). This commit introduces "status" to btrfs_bbio and saves the first error of split bios to original btrfs_bio's "status" variable. When all the split bios are finished, the saved status is loaded into original btrfs_bio's status. With this commit, btrfs/146 on zoned devices does not hit the NULL pointer dereference anymore. Fixes: 852eee62d31a ("btrfs: allow btrfs_submit_bio to split bios") CC: stable(a)vger.kernel.org # 6.6+ Reviewed-by: Qu Wenruo <wqu(a)suse.com> Reviewed-by: Christoph Hellwig <hch(a)lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com> Signed-off-by: David Sterba <dsterba(a)suse.com> --- fs/btrfs/bio.c | 37 +++++++++++++------------------------ fs/btrfs/bio.h | 3 +++ 2 files changed, 16 insertions(+), 24 deletions(-) diff --git a/fs/btrfs/bio.c b/fs/btrfs/bio.c index ce13416bc10f0..f83ec5a1baa60 100644 --- a/fs/btrfs/bio.c +++ b/fs/btrfs/bio.c @@ -49,6 +49,7 @@ void btrfs_bio_init(struct btrfs_bio *bbio, struct btrfs_fs_info *fs_info, bbio->end_io = end_io; bbio->private = private; atomic_set(&bbio->pending_ios, 1); + WRITE_ONCE(bbio->status, BLK_STS_OK); } /* @@ -120,41 +121,29 @@ static void __btrfs_bio_end_io(struct btrfs_bio *bbio) } } -static void btrfs_orig_write_end_io(struct bio *bio); - -static void btrfs_bbio_propagate_error(struct btrfs_bio *bbio, - struct btrfs_bio *orig_bbio) -{ - /* - * For writes we tolerate nr_mirrors - 1 write failures, so we can't - * just blindly propagate a write failure here. Instead increment the - * error count in the original I/O context so that it is guaranteed to - * be larger than the error tolerance. - */ - if (bbio->bio.bi_end_io == &btrfs_orig_write_end_io) { - struct btrfs_io_stripe *orig_stripe = orig_bbio->bio.bi_private; - struct btrfs_io_context *orig_bioc = orig_stripe->bioc; - - atomic_add(orig_bioc->max_errors, &orig_bioc->error); - } else { - orig_bbio->bio.bi_status = bbio->bio.bi_status; - } -} - void btrfs_bio_end_io(struct btrfs_bio *bbio, blk_status_t status) { bbio->bio.bi_status = status; if (bbio->bio.bi_pool == &btrfs_clone_bioset) { struct btrfs_bio *orig_bbio = bbio->private; - if (bbio->bio.bi_status) - btrfs_bbio_propagate_error(bbio, orig_bbio); btrfs_cleanup_bio(bbio); bbio = orig_bbio; } - if (atomic_dec_and_test(&bbio->pending_ios)) + /* + * At this point, bbio always points to the original btrfs_bio. Save + * the first error in it. + */ + if (status != BLK_STS_OK) + cmpxchg(&bbio->status, BLK_STS_OK, status); + + if (atomic_dec_and_test(&bbio->pending_ios)) { + /* Load split bio's error which might be set above. */ + if (status == BLK_STS_OK) + bbio->bio.bi_status = READ_ONCE(bbio->status); __btrfs_bio_end_io(bbio); + } } static int next_repair_mirror(struct btrfs_failed_bio *fbio, int cur_mirror) diff --git a/fs/btrfs/bio.h b/fs/btrfs/bio.h index e486123407458..e2fe16074ad65 100644 --- a/fs/btrfs/bio.h +++ b/fs/btrfs/bio.h @@ -79,6 +79,9 @@ struct btrfs_bio { /* File system that this I/O operates on. */ struct btrfs_fs_info *fs_info; + /* Save the first error status of split bio. */ + blk_status_t status; + /* * This member must come last, bio_alloc_bioset will allocate enough * bytes for entire btrfs_bio but relies on bio being last. -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats" failed to apply to v6.6-stable tree

by Sasha Levin

The patch below does not apply to the v6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 2 -- mm/vmscan.c | 14 +++++--------- 2 files changed, 5 insertions(+), 11 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a28355..9342e5692dab6 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c5071..4f1d33e4b3601 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: allow set/clear page_type again" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/page-flags.h | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a767104878..cc839e4365c18 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ } -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "mm: avoid unconditional one-tick sleep when swapcache_prepare fails" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <v-songbaohua(a)oppo.com> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbbd..bdf77a3ec47bc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

FAILED: Patch "riscv: Do not use fortify in early code" failed to apply to v6.11-stable tree

by Sasha Levin

The patch below does not apply to the v6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. Thanks, Sasha ------------------ original commit in Linus's tree ------------------ From afedc3126e11ff1404b32e538657b68022e933ca Mon Sep 17 00:00:00 2001 From: Alexandre Ghiti <alexghiti(a)rivosinc.com> Date: Wed, 9 Oct 2024 09:27:49 +0200 Subject: [PATCH] riscv: Do not use fortify in early code Early code designates the code executed when the MMU is not yet enabled, and this comes with some limitations (see Documentation/arch/riscv/boot.rst, section "Pre-MMU execution"). FORTIFY_SOURCE must be disabled then since it can trigger kernel panics as reported in [1]. Reported-by: Jason Montleon <jmontleo(a)redhat.com> Closes: https://lore.kernel.org/linux-riscv/CAJD_bPJes4QhmXY5f63GHV9B9HFkSCoaZjk-qC… [1] Fixes: a35707c3d850 ("riscv: add memory-type errata for T-Head") Fixes: 26e7aacb83df ("riscv: Allow to downgrade paging mode from the command line") Cc: stable(a)vger.kernel.org Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20241009072749.45006-1-alexghiti@rivosinc.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> --- arch/riscv/errata/Makefile | 6 ++++++ arch/riscv/kernel/Makefile | 5 +++++ arch/riscv/kernel/pi/Makefile | 6 +++++- 3 files changed, 16 insertions(+), 1 deletion(-) diff --git a/arch/riscv/errata/Makefile b/arch/riscv/errata/Makefile index 8a27394851233..f0da9d7b39c37 100644 --- a/arch/riscv/errata/Makefile +++ b/arch/riscv/errata/Makefile @@ -2,6 +2,12 @@ ifdef CONFIG_RELOCATABLE KBUILD_CFLAGS += -fno-pie endif +ifdef CONFIG_RISCV_ALTERNATIVE_EARLY +ifdef CONFIG_FORTIFY_SOURCE +KBUILD_CFLAGS += -D__NO_FORTIFY +endif +endif + obj-$(CONFIG_ERRATA_ANDES) += andes/ obj-$(CONFIG_ERRATA_SIFIVE) += sifive/ obj-$(CONFIG_ERRATA_THEAD) += thead/ diff --git a/arch/riscv/kernel/Makefile b/arch/riscv/kernel/Makefile index 7f88cc4931f5c..69dc8aaab3fb3 100644 --- a/arch/riscv/kernel/Makefile +++ b/arch/riscv/kernel/Makefile @@ -36,6 +36,11 @@ KASAN_SANITIZE_alternative.o := n KASAN_SANITIZE_cpufeature.o := n KASAN_SANITIZE_sbi_ecall.o := n endif +ifdef CONFIG_FORTIFY_SOURCE +CFLAGS_alternative.o += -D__NO_FORTIFY +CFLAGS_cpufeature.o += -D__NO_FORTIFY +CFLAGS_sbi_ecall.o += -D__NO_FORTIFY +endif endif extra-y += vmlinux.lds diff --git a/arch/riscv/kernel/pi/Makefile b/arch/riscv/kernel/pi/Makefile index d5bf1bc7de62e..81d69d45c06c3 100644 --- a/arch/riscv/kernel/pi/Makefile +++ b/arch/riscv/kernel/pi/Makefile @@ -16,8 +16,12 @@ KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) KBUILD_CFLAGS += -mcmodel=medany CFLAGS_cmdline_early.o += -D__NO_FORTIFY -CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY CFLAGS_fdt_early.o += -D__NO_FORTIFY +# lib/string.c already defines __NO_FORTIFY +CFLAGS_ctype.o += -D__NO_FORTIFY +CFLAGS_lib-fdt.o += -D__NO_FORTIFY +CFLAGS_lib-fdt_ro.o += -D__NO_FORTIFY +CFLAGS_archrandom_early.o += -D__NO_FORTIFY $(obj)/%.pi.o: OBJCOPYFLAGS := --prefix-symbols=__pi_ \ --remove-section=.note.gnu.property \ -- 2.43.0

1 year, 1 month

1
0
0 0

[merged mm-nonmm-stable] ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: ipc: fix memleak if msg_init_ns failed in create_ipc_ns has been removed from the -mm tree. Its filename was ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns.patch This patch was dropped because it was merged into the mm-nonmm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Wupeng Ma <mawupeng1(a)huawei.com> Subject: ipc: fix memleak if msg_init_ns failed in create_ipc_ns Date: Wed, 23 Oct 2024 17:31:29 +0800 From: Ma Wupeng <mawupeng1(a)huawei.com> Percpu memory allocation may failed during create_ipc_ns however this fail is not handled properly since ipc sysctls and mq sysctls is not released properly. Fix this by release these two resource when failure. Here is the kmemleak stack when percpu failed: unreferenced object 0xffff88819de2a600 (size 512): comm "shmem_2nstest", pid 120711, jiffies 4300542254 hex dump (first 32 bytes): 60 aa 9d 84 ff ff ff ff fc 18 48 b2 84 88 ff ff `.........H..... 04 00 00 00 a4 01 00 00 20 e4 56 81 ff ff ff ff ........ .V..... backtrace (crc be7cba35): [<ffffffff81b43f83>] __kmalloc_node_track_caller_noprof+0x333/0x420 [<ffffffff81a52e56>] kmemdup_noprof+0x26/0x50 [<ffffffff821b2f37>] setup_mq_sysctls+0x57/0x1d0 [<ffffffff821b29cc>] copy_ipcs+0x29c/0x3b0 [<ffffffff815d6a10>] create_new_namespaces+0x1d0/0x920 [<ffffffff815d7449>] copy_namespaces+0x2e9/0x3e0 [<ffffffff815458f3>] copy_process+0x29f3/0x7ff0 [<ffffffff8154b080>] kernel_clone+0xc0/0x650 [<ffffffff8154b6b1>] __do_sys_clone+0xa1/0xe0 [<ffffffff843df8ff>] do_syscall_64+0xbf/0x1c0 [<ffffffff846000b0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53 Link: https://lkml.kernel.org/r/20241023093129.3074301-1-mawupeng1@huawei.com Fixes: 72d1e611082e ("ipc/msg: mitigate the lock contention with percpu counter") Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- ipc/namespace.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) --- a/ipc/namespace.c~ipc-fix-memleak-if-msg_init_ns-failed-in-create_ipc_ns +++ a/ipc/namespace.c @@ -83,13 +83,15 @@ static struct ipc_namespace *create_ipc_ err = msg_init_ns(ns); if (err) - goto fail_put; + goto fail_ipc; sem_init_ns(ns); shm_init_ns(ns); return ns; +fail_ipc: + retire_ipc_sysctls(ns); fail_mq: retire_mq_sysctls(ns); _ Patches currently in -mm which might be from mawupeng1(a)huawei.com are

1 year, 1 month

1
0
0 0

[merged mm-stable] mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/damon/vaddr: fix issue in damon_va_evenly_split_region() has been removed from the -mm tree. Its filename was mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region.patch This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Zheng Yejian <zhengyejian(a)huaweicloud.com> Subject: mm/damon/vaddr: fix issue in damon_va_evenly_split_region() Date: Tue, 22 Oct 2024 16:39:26 +0800 Patch series "mm/damon/vaddr: Fix issue in damon_va_evenly_split_region()". v2. According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). And add 'nr_piece == 1' check in damon_va_evenly_split_region() for better code readability and add a corresponding kunit testcase. This patch (of 2): According to the logic of damon_va_evenly_split_region(), currently following split case would not meet the expectation: Suppose DAMON_MIN_REGION=0x1000, Case: Split [0x0, 0x3000) into 2 pieces, then the result would be acutually 3 regions: [0x0, 0x1000), [0x1000, 0x2000), [0x2000, 0x3000) but NOT the expected 2 regions: [0x0, 0x1000), [0x1000, 0x3000) !!! The root cause is that when calculating size of each split piece in damon_va_evenly_split_region(): `sz_piece = ALIGN_DOWN(sz_orig / nr_pieces, DAMON_MIN_REGION);` both the dividing and the ALIGN_DOWN may cause loss of precision, then each time split one piece of size 'sz_piece' from origin 'start' to 'end' would cause more pieces are split out than expected!!! To fix it, count for each piece split and make sure no more than 'nr_pieces'. In addition, add above case into damon_test_split_evenly(). After this patch, damon-operations test passed: # ./tools/testing/kunit/kunit.py run damon-operations [...] ============== damon-operations (6 subtests) =============== [PASSED] damon_test_three_regions_in_vmas [PASSED] damon_test_apply_three_regions1 [PASSED] damon_test_apply_three_regions2 [PASSED] damon_test_apply_three_regions3 [PASSED] damon_test_apply_three_regions4 [PASSED] damon_test_split_evenly ================ [PASSED] damon-operations ================= Link: https://lkml.kernel.org/r/20241022083927.3592237-1-zhengyejian@huaweicloud.… Link: https://lkml.kernel.org/r/20241022083927.3592237-2-zhengyejian@huaweicloud.… Fixes: 3f49584b262c ("mm/damon: implement primitives for the virtual memory address spaces") Signed-off-by: Zheng Yejian <zhengyejian(a)huaweicloud.com> Reviewed-by: SeongJae Park <sj(a)kernel.org> Cc: Fernand Sieber <sieberf(a)amazon.com> Cc: Leonard Foerster <foersleo(a)amazon.de> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Ye Weihua <yeweihua4(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/damon/tests/vaddr-kunit.h | 1 + mm/damon/vaddr.c | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) --- a/mm/damon/tests/vaddr-kunit.h~mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region +++ a/mm/damon/tests/vaddr-kunit.h @@ -300,6 +300,7 @@ static void damon_test_split_evenly(stru damon_test_split_evenly_fail(test, 0, 100, 0); damon_test_split_evenly_succ(test, 0, 100, 10); damon_test_split_evenly_succ(test, 5, 59, 5); + damon_test_split_evenly_succ(test, 0, 3, 2); damon_test_split_evenly_fail(test, 5, 6, 2); } --- a/mm/damon/vaddr.c~mm-damon-vaddr-fix-issue-in-damon_va_evenly_split_region +++ a/mm/damon/vaddr.c @@ -67,6 +67,7 @@ static int damon_va_evenly_split_region( unsigned long sz_orig, sz_piece, orig_end; struct damon_region *n = NULL, *next; unsigned long start; + unsigned int i; if (!r || !nr_pieces) return -EINVAL; @@ -80,8 +81,7 @@ static int damon_va_evenly_split_region( r->ar.end = r->ar.start + sz_piece; next = damon_next_region(r); - for (start = r->ar.end; start + sz_piece <= orig_end; - start += sz_piece) { + for (start = r->ar.end, i = 1; i < nr_pieces; start += sz_piece, i++) { n = damon_new_region(start, start + sz_piece); if (!n) return -ENOMEM; _ Patches currently in -mm which might be from zhengyejian(a)huaweicloud.com are

1 year, 1 month

1
0
0 0

[merged mm-stable] mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation has been removed from the -mm tree. Its filename was mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation.patch This patch was dropped because it was merged into the mm-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Adrian Huang <ahuang12(a)lenovo.com> Subject: mm/vmalloc: combine all TLB flush operations of KASAN shadow virtual address into one operation Date: Sat, 27 Jul 2024 00:52:46 +0800 When compiling kernel source 'make -j $(nproc)' with the up-and-running KASAN-enabled kernel on a 256-core machine, the following soft lockup is shown: watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760] CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95 Workqueue: events drain_vmap_area_work RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0 Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75 RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202 RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949 RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50 RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800 R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39 R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0 Call Trace: <IRQ> ? watchdog_timer_fn+0x2cd/0x390 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x300/0x6d0 ? sched_clock_cpu+0x69/0x4e0 ? __pfx___hrtimer_run_queues+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? ktime_get_update_offsets_now+0x7f/0x2a0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? hrtimer_interrupt+0x2ca/0x760 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0 ? sysvec_apic_timer_interrupt+0x6a/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? smp_call_function_many_cond+0x1d8/0xbb0 ? __pfx_do_kernel_range_flush+0x10/0x10 on_each_cpu_cond_mask+0x20/0x40 flush_tlb_kernel_range+0x19b/0x250 ? srso_return_thunk+0x5/0x5f ? kasan_release_vmalloc+0xa7/0xc0 purge_vmap_node+0x357/0x820 ? __pfx_purge_vmap_node+0x10/0x10 __purge_vmap_area_lazy+0x5b8/0xa10 drain_vmap_area_work+0x21/0x30 process_one_work+0x661/0x10b0 worker_thread+0x844/0x10e0 ? srso_return_thunk+0x5/0x5f ? __kthread_parkme+0x82/0x140 ? __pfx_worker_thread+0x10/0x10 kthread+0x2a5/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Debugging Analysis: 1. The following ftrace log shows that the lockup CPU spends too much time iterating vmap_nodes and flushing TLB when purging vm_area structures. (Some info is trimmed). kworker: funcgraph_entry: | drain_vmap_area_work() { kworker: funcgraph_entry: | mutex_lock() { kworker: funcgraph_entry: 1.092 us | __cond_resched(); kworker: funcgraph_exit: 3.306 us | } ... ... kworker: funcgraph_entry: | flush_tlb_kernel_range() { ... ... kworker: funcgraph_exit: # 7533.649 us | } ... ... kworker: funcgraph_entry: 2.344 us | mutex_unlock(); kworker: funcgraph_exit: $ 23871554 us | } The drain_vmap_area_work() spends over 23 seconds. There are 2805 flush_tlb_kernel_range() calls in the ftrace log. * One is called in __purge_vmap_area_lazy(). * Others are called by purge_vmap_node->kasan_release_vmalloc. purge_vmap_node() iteratively releases kasan vmalloc allocations and flushes TLB for each vmap_area. - [Rough calculation] Each flush_tlb_kernel_range() runs about 7.5ms. -- 2804 * 7.5ms = 21.03 seconds. -- That's why a soft lock is triggered. 2. Extending the soft lockup time can work around the issue (For example, # echo 60 > /proc/sys/kernel/watchdog_thresh). This confirms the above-mentioned speculation: drain_vmap_area_work() spends too much time. If we combine all TLB flush operations of the KASAN shadow virtual address into one operation in the call path 'purge_vmap_node()->kasan_release_vmalloc()', the running time of drain_vmap_area_work() can be saved greatly. The idea is from the flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the soft lockup won't be triggered. Here is the test result based on 6.10: [6.10 wo/ the patch] 1. ftrace latency profiling (record a trace if the latency > 20s). echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function echo function_graph > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_on 2. Run `make -j $(nproc)` to compile the kernel source 3. Once the soft lockup is reproduced, check the ftrace log: cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 76) $ 50412985 us | } /* __purge_vmap_area_lazy */ 76) $ 50412997 us | } /* drain_vmap_area_work */ 76) $ 29165911 us | } /* __purge_vmap_area_lazy */ 76) $ 29165926 us | } /* drain_vmap_area_work */ 91) $ 53629423 us | } /* __purge_vmap_area_lazy */ 91) $ 53629434 us | } /* drain_vmap_area_work */ 91) $ 28121014 us | } /* __purge_vmap_area_lazy */ 91) $ 28121026 us | } /* drain_vmap_area_work */ [6.10 w/ the patch] 1. Repeat step 1-2 in "[6.10 wo/ the patch]" 2. The soft lockup is not triggered and ftrace log is empty. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace log. 4. Setting 'tracing_thresh' to 1 second gets ftrace log. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 23) $ 1074942 us | } /* __purge_vmap_area_lazy */ 23) $ 1074950 us | } /* drain_vmap_area_work */ The worst execution time of drain_vmap_area_work() is about 1 second. Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/ Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com Fixes: 282631cb2447 ("mm: vmalloc: remove global purge_vmap_area_root rb-tree") Signed-off-by: Adrian Huang <ahuang12(a)lenovo.com> Co-developed-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki(a)gmail.com> Tested-by: Jiwei Sun <sunjw10(a)lenovo.com> Reviewed-by: Baoquan He <bhe(a)redhat.com> Cc: Alexander Potapenko <glider(a)google.com> Cc: Andrey Konovalov <andreyknvl(a)gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Dmitry Vyukov <dvyukov(a)google.com> Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/kasan.h | 12 +++++++++--- mm/kasan/shadow.c | 14 ++++++++++---- mm/vmalloc.c | 34 ++++++++++++++++++++++++++-------- 3 files changed, 45 insertions(+), 15 deletions(-) --- a/include/linux/kasan.h~mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation +++ a/include/linux/kasan.h @@ -29,6 +29,9 @@ typedef unsigned int __bitwise kasan_vma #define KASAN_VMALLOC_VM_ALLOC ((__force kasan_vmalloc_flags_t)0x02u) #define KASAN_VMALLOC_PROT_NORMAL ((__force kasan_vmalloc_flags_t)0x04u) +#define KASAN_VMALLOC_PAGE_RANGE 0x1 /* Apply exsiting page range */ +#define KASAN_VMALLOC_TLB_FLUSH 0x2 /* TLB flush */ + #if defined(CONFIG_KASAN_GENERIC) || defined(CONFIG_KASAN_SW_TAGS) #include <linux/pgtable.h> @@ -564,7 +567,8 @@ void kasan_populate_early_vm_area_shadow int kasan_populate_vmalloc(unsigned long addr, unsigned long size); void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end); + unsigned long free_region_end, + unsigned long flags); #else /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ @@ -579,7 +583,8 @@ static inline int kasan_populate_vmalloc static inline void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) { } + unsigned long free_region_end, + unsigned long flags) { } #endif /* CONFIG_KASAN_GENERIC || CONFIG_KASAN_SW_TAGS */ @@ -614,7 +619,8 @@ static inline int kasan_populate_vmalloc static inline void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) { } + unsigned long free_region_end, + unsigned long flags) { } static inline void *kasan_unpoison_vmalloc(const void *start, unsigned long size, --- a/mm/kasan/shadow.c~mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation +++ a/mm/kasan/shadow.c @@ -489,7 +489,8 @@ static int kasan_depopulate_vmalloc_pte( */ void kasan_release_vmalloc(unsigned long start, unsigned long end, unsigned long free_region_start, - unsigned long free_region_end) + unsigned long free_region_end, + unsigned long flags) { void *shadow_start, *shadow_end; unsigned long region_start, region_end; @@ -522,12 +523,17 @@ void kasan_release_vmalloc(unsigned long __memset(shadow_start, KASAN_SHADOW_INIT, shadow_end - shadow_start); return; } - apply_to_existing_page_range(&init_mm, + + + if (flags & KASAN_VMALLOC_PAGE_RANGE) + apply_to_existing_page_range(&init_mm, (unsigned long)shadow_start, size, kasan_depopulate_vmalloc_pte, NULL); - flush_tlb_kernel_range((unsigned long)shadow_start, - (unsigned long)shadow_end); + + if (flags & KASAN_VMALLOC_TLB_FLUSH) + flush_tlb_kernel_range((unsigned long)shadow_start, + (unsigned long)shadow_end); } } --- a/mm/vmalloc.c~mm-vmalloc-combine-all-tlb-flush-operations-of-kasan-shadow-virtual-address-into-one-operation +++ a/mm/vmalloc.c @@ -2182,6 +2182,25 @@ decay_va_pool_node(struct vmap_node *vn, reclaim_list_global(&decay_list); } +static void +kasan_release_vmalloc_node(struct vmap_node *vn) +{ + struct vmap_area *va; + unsigned long start, end; + + start = list_first_entry(&vn->purge_list, struct vmap_area, list)->va_start; + end = list_last_entry(&vn->purge_list, struct vmap_area, list)->va_end; + + list_for_each_entry(va, &vn->purge_list, list) { + if (is_vmalloc_or_module_addr((void *) va->va_start)) + kasan_release_vmalloc(va->va_start, va->va_end, + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE); + } + + kasan_release_vmalloc(start, end, start, end, KASAN_VMALLOC_TLB_FLUSH); +} + static void purge_vmap_node(struct work_struct *work) { struct vmap_node *vn = container_of(work, @@ -2190,20 +2209,17 @@ static void purge_vmap_node(struct work_ struct vmap_area *va, *n_va; LIST_HEAD(local_list); + if (IS_ENABLED(CONFIG_KASAN_VMALLOC)) + kasan_release_vmalloc_node(vn); + vn->nr_purged = 0; list_for_each_entry_safe(va, n_va, &vn->purge_list, list) { unsigned long nr = va_size(va) >> PAGE_SHIFT; - unsigned long orig_start = va->va_start; - unsigned long orig_end = va->va_end; unsigned int vn_id = decode_vn_id(va->flags); list_del_init(&va->list); - if (is_vmalloc_or_module_addr((void *)orig_start)) - kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); - nr_purged_pages += nr; vn->nr_purged++; @@ -4784,7 +4800,8 @@ recovery: &free_vmap_area_list); if (va) kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH); vas[area] = NULL; } @@ -4834,7 +4851,8 @@ err_free_shadow: &free_vmap_area_list); if (va) kasan_release_vmalloc(orig_start, orig_end, - va->va_start, va->va_end); + va->va_start, va->va_end, + KASAN_VMALLOC_PAGE_RANGE | KASAN_VMALLOC_TLB_FLUSH); vas[area] = NULL; kfree(vms[area]); } _ Patches currently in -mm which might be from ahuang12(a)lenovo.com are

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-resolve-faulty-mmap_region-error-path-behaviour.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: resolve faulty mmap_region() error path behaviour has been removed from the -mm tree. Its filename was mm-resolve-faulty-mmap_region-error-path-behaviour.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: resolve faulty mmap_region() error path behaviour Date: Tue, 29 Oct 2024 18:11:48 +0000 The mmap_region() function is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. Taking advantage of previous patches in this series we move a number of checks earlier in the code, simplifying things by moving the core of the logic into a static internal function __mmap_region(). Doing this allows us to perform a number of checks up front before we do any real work, and allows us to unwind the writable unmap check unconditionally as required and to perform a CONFIG_DEBUG_VM_MAPLE_TREE validation unconditionally also. We move a number of things here: 1. We preallocate memory for the iterator before we call the file-backed memory hook, allowing us to exit early and avoid having to perform complicated and error-prone close/free logic. We carefully free iterator state on both success and error paths. 2. The enclosing mmap_region() function handles the mapping_map_writable() logic early. Previously the logic had the mapping_map_writable() at the point of mapping a newly allocated file-backed VMA, and a matching mapping_unmap_writable() on success and error paths. We now do this unconditionally if this is a file-backed, shared writable mapping. If a driver changes the flags to eliminate VM_MAYWRITE, however doing so does not invalidate the seal check we just performed, and we in any case always decrement the counter in the wrapper. We perform a debug assert to ensure a driver does not attempt to do the opposite. 3. We also move arch_validate_flags() up into the mmap_region() function. This is only relevant on arm64 and sparc64, and the check is only meaningful for SPARC with ADI enabled. We explicitly add a warning for this arch if a driver invalidates this check, though the code ought eventually to be fixed to eliminate the need for this. With all of these measures in place, we no longer need to explicitly close the VMA on error paths, as we place all checks which might fail prior to a call to any driver mmap hook. This eliminates an entire class of errors, makes the code easier to reason about and more robust. Link: https://lkml.kernel.org/r/6e0becb36d2f5472053ac5d544c0edfe9b899e25.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Tested-by: Mark Brown <broonie(a)kernel.org> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mmap.c | 121 ++++++++++++++++++++++++++++------------------------ 1 file changed, 66 insertions(+), 55 deletions(-) --- a/mm/mmap.c~mm-resolve-faulty-mmap_region-error-path-behaviour +++ a/mm/mmap.c @@ -1358,20 +1358,18 @@ int do_munmap(struct mm_struct *mm, unsi return do_vmi_munmap(&vmi, mm, start, len, uf, false); } -unsigned long mmap_region(struct file *file, unsigned long addr, +static unsigned long __mmap_region(struct file *file, unsigned long addr, unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, struct list_head *uf) { struct mm_struct *mm = current->mm; struct vm_area_struct *vma = NULL; pgoff_t pglen = PHYS_PFN(len); - struct vm_area_struct *merge; unsigned long charged = 0; struct vma_munmap_struct vms; struct ma_state mas_detach; struct maple_tree mt_detach; unsigned long end = addr + len; - bool writable_file_mapping = false; int error; VMA_ITERATOR(vmi, mm, addr); VMG_STATE(vmg, mm, &vmi, addr, end, vm_flags, pgoff); @@ -1445,28 +1443,26 @@ unsigned long mmap_region(struct file *f vm_flags_init(vma, vm_flags); vma->vm_page_prot = vm_get_page_prot(vm_flags); + if (vma_iter_prealloc(&vmi, vma)) { + error = -ENOMEM; + goto free_vma; + } + if (file) { vma->vm_file = get_file(file); error = mmap_file(file, vma); if (error) - goto unmap_and_free_vma; - - if (vma_is_shared_maywrite(vma)) { - error = mapping_map_writable(file->f_mapping); - if (error) - goto close_and_free_vma; - - writable_file_mapping = true; - } + goto unmap_and_free_file_vma; + /* Drivers cannot alter the address of the VMA. */ + WARN_ON_ONCE(addr != vma->vm_start); /* - * Expansion is handled above, merging is handled below. - * Drivers should not alter the address of the VMA. + * Drivers should not permit writability when previously it was + * disallowed. */ - if (WARN_ON((addr != vma->vm_start))) { - error = -EINVAL; - goto close_and_free_vma; - } + VM_WARN_ON_ONCE(vm_flags != vma->vm_flags && + !(vm_flags & VM_MAYWRITE) && + (vma->vm_flags & VM_MAYWRITE)); vma_iter_config(&vmi, addr, end); /* @@ -1474,6 +1470,8 @@ unsigned long mmap_region(struct file *f * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) { + struct vm_area_struct *merge; + vmg.flags = vma->vm_flags; /* If this fails, state is reset ready for a reattempt. */ merge = vma_merge_new_range(&vmg); @@ -1491,7 +1489,7 @@ unsigned long mmap_region(struct file *f vma = merge; /* Update vm_flags to pick up the change. */ vm_flags = vma->vm_flags; - goto unmap_writable; + goto file_expanded; } vma_iter_config(&vmi, addr, end); } @@ -1500,26 +1498,15 @@ unsigned long mmap_region(struct file *f } else if (vm_flags & VM_SHARED) { error = shmem_zero_setup(vma); if (error) - goto free_vma; + goto free_iter_vma; } else { vma_set_anonymous(vma); } - if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { - error = -EACCES; - goto close_and_free_vma; - } - - /* Allow architectures to sanity-check the vm_flags */ - if (!arch_validate_flags(vma->vm_flags)) { - error = -EINVAL; - goto close_and_free_vma; - } - - if (vma_iter_prealloc(&vmi, vma)) { - error = -ENOMEM; - goto close_and_free_vma; - } +#ifdef CONFIG_SPARC64 + /* TODO: Fix SPARC ADI! */ + WARN_ON_ONCE(!arch_validate_flags(vm_flags)); +#endif /* Lock the VMA since it is modified after insertion into VMA tree */ vma_start_write(vma); @@ -1533,10 +1520,7 @@ unsigned long mmap_region(struct file *f */ khugepaged_enter_vma(vma, vma->vm_flags); - /* Once vma denies write, undo our temporary denial count */ -unmap_writable: - if (writable_file_mapping) - mapping_unmap_writable(file->f_mapping); +file_expanded: file = vma->vm_file; ksm_add_vma(vma); expanded: @@ -1569,23 +1553,17 @@ expanded: vma_set_page_prot(vma); - validate_mm(mm); return addr; -close_and_free_vma: - vma_close(vma); - - if (file || vma->vm_file) { -unmap_and_free_vma: - fput(vma->vm_file); - vma->vm_file = NULL; - - vma_iter_set(&vmi, vma->vm_end); - /* Undo any partial mapping done by a device driver. */ - unmap_region(&vmi.mas, vma, vmg.prev, vmg.next); - } - if (writable_file_mapping) - mapping_unmap_writable(file->f_mapping); +unmap_and_free_file_vma: + fput(vma->vm_file); + vma->vm_file = NULL; + + vma_iter_set(&vmi, vma->vm_end); + /* Undo any partial mapping done by a device driver. */ + unmap_region(&vmi.mas, vma, vmg.prev, vmg.next); +free_iter_vma: + vma_iter_free(&vmi); free_vma: vm_area_free(vma); unacct_error: @@ -1595,10 +1573,43 @@ unacct_error: abort_munmap: vms_abort_munmap_vmas(&vms, &mas_detach); gather_failed: - validate_mm(mm); return error; } +unsigned long mmap_region(struct file *file, unsigned long addr, + unsigned long len, vm_flags_t vm_flags, unsigned long pgoff, + struct list_head *uf) +{ + unsigned long ret; + bool writable_file_mapping = false; + + /* Check to see if MDWE is applicable. */ + if (map_deny_write_exec(vm_flags, vm_flags)) + return -EACCES; + + /* Allow architectures to sanity-check the vm_flags. */ + if (!arch_validate_flags(vm_flags)) + return -EINVAL; + + /* Map writable and ensure this isn't a sealed memfd. */ + if (file && is_shared_maywrite(vm_flags)) { + int error = mapping_map_writable(file->f_mapping); + + if (error) + return error; + writable_file_mapping = true; + } + + ret = __mmap_region(file, addr, len, vm_flags, pgoff, uf); + + /* Clear our write mapping regardless of error. */ + if (writable_file_mapping) + mapping_unmap_writable(file->f_mapping); + + validate_mm(current->mm); + return ret; +} + static int __vm_munmap(unsigned long start, size_t len, bool unlock) { int ret; _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling has been removed from the -mm tree. Its filename was mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling Date: Tue, 29 Oct 2024 18:11:47 +0000 Currently MTE is permitted in two circumstances (desiring to use MTE having been specified by the VM_MTE flag) - where MAP_ANONYMOUS is specified, as checked by arch_calc_vm_flag_bits() and actualised by setting the VM_MTE_ALLOWED flag, or if the file backing the mapping is shmem, in which case we set VM_MTE_ALLOWED in shmem_mmap() when the mmap hook is activated in mmap_region(). The function that checks that, if VM_MTE is set, VM_MTE_ALLOWED is also set is the arm64 implementation of arch_validate_flags(). Unfortunately, we intend to refactor mmap_region() to perform this check earlier, meaning that in the case of a shmem backing we will not have invoked shmem_mmap() yet, causing the mapping to fail spuriously. It is inappropriate to set this architecture-specific flag in general mm code anyway, so a sensible resolution of this issue is to instead move the check somewhere else. We resolve this by setting VM_MTE_ALLOWED much earlier in do_mmap(), via the arch_calc_vm_flag_bits() call. This is an appropriate place to do this as we already check for the MAP_ANONYMOUS case here, and the shmem file case is simply a variant of the same idea - we permit RAM-backed memory. This requires a modification to the arch_calc_vm_flag_bits() signature to pass in a pointer to the struct file associated with the mapping, however this is not too egregious as this is only used by two architectures anyway - arm64 and parisc. So this patch performs this adjustment and removes the unnecessary assignment of VM_MTE_ALLOWED in shmem_mmap(). [akpm(a)linux-foundation.org: fix whitespace, per Catalin] Link: https://lkml.kernel.org/r/ec251b20ba1964fb64cf1607d2ad80c47f3873df.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Suggested-by: Catalin Marinas <catalin.marinas(a)arm.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Catalin Marinas <catalin.marinas(a)arm.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- arch/arm64/include/asm/mman.h | 10 +++++++--- arch/parisc/include/asm/mman.h | 5 +++-- include/linux/mman.h | 7 ++++--- mm/mmap.c | 2 +- mm/nommu.c | 2 +- mm/shmem.c | 3 --- 6 files changed, 16 insertions(+), 13 deletions(-) --- a/arch/arm64/include/asm/mman.h~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/arch/arm64/include/asm/mman.h @@ -6,6 +6,8 @@ #ifndef BUILD_VDSO #include <linux/compiler.h> +#include <linux/fs.h> +#include <linux/shmem_fs.h> #include <linux/types.h> static inline unsigned long arch_calc_vm_prot_bits(unsigned long prot, @@ -31,19 +33,21 @@ static inline unsigned long arch_calc_vm } #define arch_calc_vm_prot_bits(prot, pkey) arch_calc_vm_prot_bits(prot, pkey) -static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, + unsigned long flags) { /* * Only allow MTE on anonymous mappings as these are guaranteed to be * backed by tags-capable memory. The vm_flags may be overridden by a * filesystem supporting MTE (RAM-based). */ - if (system_supports_mte() && (flags & MAP_ANONYMOUS)) + if (system_supports_mte() && + ((flags & MAP_ANONYMOUS) || shmem_file(file))) return VM_MTE_ALLOWED; return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags) static inline bool arch_validate_prot(unsigned long prot, unsigned long addr __always_unused) --- a/arch/parisc/include/asm/mman.h~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/arch/parisc/include/asm/mman.h @@ -2,6 +2,7 @@ #ifndef __ASM_MMAN_H__ #define __ASM_MMAN_H__ +#include <linux/fs.h> #include <uapi/asm/mman.h> /* PARISC cannot allow mdwe as it needs writable stacks */ @@ -11,7 +12,7 @@ static inline bool arch_memory_deny_writ } #define arch_memory_deny_write_exec_supported arch_memory_deny_write_exec_supported -static inline unsigned long arch_calc_vm_flag_bits(unsigned long flags) +static inline unsigned long arch_calc_vm_flag_bits(struct file *file, unsigned long flags) { /* * The stack on parisc grows upwards, so if userspace requests memory @@ -23,6 +24,6 @@ static inline unsigned long arch_calc_vm return 0; } -#define arch_calc_vm_flag_bits(flags) arch_calc_vm_flag_bits(flags) +#define arch_calc_vm_flag_bits(file, flags) arch_calc_vm_flag_bits(file, flags) #endif /* __ASM_MMAN_H__ */ --- a/include/linux/mman.h~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/include/linux/mman.h @@ -2,6 +2,7 @@ #ifndef _LINUX_MMAN_H #define _LINUX_MMAN_H +#include <linux/fs.h> #include <linux/mm.h> #include <linux/percpu_counter.h> @@ -94,7 +95,7 @@ static inline void vm_unacct_memory(long #endif #ifndef arch_calc_vm_flag_bits -#define arch_calc_vm_flag_bits(flags) 0 +#define arch_calc_vm_flag_bits(file, flags) 0 #endif #ifndef arch_validate_prot @@ -151,13 +152,13 @@ calc_vm_prot_bits(unsigned long prot, un * Combine the mmap "flags" argument into "vm_flags" used internally. */ static inline unsigned long -calc_vm_flag_bits(unsigned long flags) +calc_vm_flag_bits(struct file *file, unsigned long flags) { return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) | _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) | _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) | _calc_vm_trans(flags, MAP_STACK, VM_NOHUGEPAGE) | - arch_calc_vm_flag_bits(flags); + arch_calc_vm_flag_bits(file, flags); } unsigned long vm_commit_limit(void); --- a/mm/mmap.c~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/mm/mmap.c @@ -344,7 +344,7 @@ unsigned long do_mmap(struct file *file, * to. we assume access permissions have been handled by the open * of the memory object, so we don't do any here. */ - vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(flags) | + vm_flags |= calc_vm_prot_bits(prot, pkey) | calc_vm_flag_bits(file, flags) | mm->def_flags | VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC; /* Obtain the address to map to. we verify (or select) it and ensure --- a/mm/nommu.c~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/mm/nommu.c @@ -842,7 +842,7 @@ static unsigned long determine_vm_flags( { unsigned long vm_flags; - vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(flags); + vm_flags = calc_vm_prot_bits(prot, 0) | calc_vm_flag_bits(file, flags); if (!file) { /* --- a/mm/shmem.c~mm-refactor-arch_calc_vm_flag_bits-and-arm64-mte-handling +++ a/mm/shmem.c @@ -2733,9 +2733,6 @@ static int shmem_mmap(struct file *file, if (ret) return ret; - /* arm64 - allow memory tagging on RAM-based files */ - vm_flags_set(vma, VM_MTE_ALLOWED); - file_accessed(file); /* This is anonymous shared memory if it is unlinked at the time of mmap */ if (inode->i_nlink) _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-refactor-map_deny_write_exec.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: refactor map_deny_write_exec() has been removed from the -mm tree. Its filename was mm-refactor-map_deny_write_exec.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: refactor map_deny_write_exec() Date: Tue, 29 Oct 2024 18:11:46 +0000 Refactor the map_deny_write_exec() to not unnecessarily require a VMA parameter but rather to accept VMA flags parameters, which allows us to use this function early in mmap_region() in a subsequent commit. While we're here, we refactor the function to be more readable and add some additional documentation. Link: https://lkml.kernel.org/r/6be8bb59cd7c68006ebb006eb9d8dc27104b1f70.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mman.h | 21 ++++++++++++++++++--- mm/mmap.c | 2 +- mm/mprotect.c | 2 +- mm/vma.h | 2 +- 4 files changed, 21 insertions(+), 6 deletions(-) --- a/include/linux/mman.h~mm-refactor-map_deny_write_exec +++ a/include/linux/mman.h @@ -188,16 +188,31 @@ static inline bool arch_memory_deny_writ * * d) mmap(PROT_READ | PROT_EXEC) * mmap(PROT_READ | PROT_EXEC | PROT_BTI) + * + * This is only applicable if the user has set the Memory-Deny-Write-Execute + * (MDWE) protection mask for the current process. + * + * @old specifies the VMA flags the VMA originally possessed, and @new the ones + * we propose to set. + * + * Return: false if proposed change is OK, true if not ok and should be denied. */ -static inline bool map_deny_write_exec(struct vm_area_struct *vma, unsigned long vm_flags) +static inline bool map_deny_write_exec(unsigned long old, unsigned long new) { + /* If MDWE is disabled, we have nothing to deny. */ if (!test_bit(MMF_HAS_MDWE, &current->mm->flags)) return false; - if ((vm_flags & VM_EXEC) && (vm_flags & VM_WRITE)) + /* If the new VMA is not executable, we have nothing to deny. */ + if (!(new & VM_EXEC)) + return false; + + /* Under MDWE we do not accept newly writably executable VMAs... */ + if (new & VM_WRITE) return true; - if (!(vma->vm_flags & VM_EXEC) && (vm_flags & VM_EXEC)) + /* ...nor previously non-executable VMAs becoming executable. */ + if (!(old & VM_EXEC)) return true; return false; --- a/mm/mmap.c~mm-refactor-map_deny_write_exec +++ a/mm/mmap.c @@ -1505,7 +1505,7 @@ unsigned long mmap_region(struct file *f vma_set_anonymous(vma); } - if (map_deny_write_exec(vma, vma->vm_flags)) { + if (map_deny_write_exec(vma->vm_flags, vma->vm_flags)) { error = -EACCES; goto close_and_free_vma; } --- a/mm/mprotect.c~mm-refactor-map_deny_write_exec +++ a/mm/mprotect.c @@ -810,7 +810,7 @@ static int do_mprotect_pkey(unsigned lon break; } - if (map_deny_write_exec(vma, newflags)) { + if (map_deny_write_exec(vma->vm_flags, newflags)) { error = -EACCES; break; } --- a/mm/vma.h~mm-refactor-map_deny_write_exec +++ a/mm/vma.h @@ -42,7 +42,7 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - /* 1 byte hole */ + /* 2 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ unsigned long nr_accounted; /* Number of VM_ACCOUNT pages */ _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-unconditionally-close-vmas-on-error.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: unconditionally close VMAs on error has been removed from the -mm tree. Its filename was mm-unconditionally-close-vmas-on-error.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: unconditionally close VMAs on error Date: Tue, 29 Oct 2024 18:11:45 +0000 Incorrect invocation of VMA callbacks when the VMA is no longer in a consistent state is bug prone and risky to perform. With regards to the important vm_ops->close() callback We have gone to great lengths to try to track whether or not we ought to close VMAs. Rather than doing so and risking making a mistake somewhere, instead unconditionally close and reset vma->vm_ops to an empty dummy operations set with a NULL .close operator. We introduce a new function to do so - vma_close() - and simplify existing vms logic which tracked whether we needed to close or not. This simplifies the logic, avoids incorrect double-calling of the .close() callback and allows us to update error paths to simply call vma_close() unconditionally - making VMA closure idempotent. Link: https://lkml.kernel.org/r/28e89dda96f68c505cb6f8e9fc9b57c3e9f74b42.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/internal.h | 18 ++++++++++++++++++ mm/mmap.c | 5 ++--- mm/nommu.c | 3 +-- mm/vma.c | 14 +++++--------- mm/vma.h | 4 +--- 5 files changed, 27 insertions(+), 17 deletions(-) --- a/mm/internal.h~mm-unconditionally-close-vmas-on-error +++ a/mm/internal.h @@ -135,6 +135,24 @@ static inline int mmap_file(struct file return err; } +/* + * If the VMA has a close hook then close it, and since closing it might leave + * it in an inconsistent state which makes the use of any hooks suspect, clear + * them down by installing dummy empty hooks. + */ +static inline void vma_close(struct vm_area_struct *vma) +{ + if (vma->vm_ops && vma->vm_ops->close) { + vma->vm_ops->close(vma); + + /* + * The mapping is in an inconsistent state, and no further hooks + * may be invoked upon it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + } +} + #ifdef CONFIG_MMU /* Flags for folio_pte_batch(). */ --- a/mm/mmap.c~mm-unconditionally-close-vmas-on-error +++ a/mm/mmap.c @@ -1573,8 +1573,7 @@ expanded: return addr; close_and_free_vma: - if (file && !vms.closed_vm_ops && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (file || vma->vm_file) { unmap_and_free_vma: @@ -1934,7 +1933,7 @@ void exit_mmap(struct mm_struct *mm) do { if (vma->vm_flags & VM_ACCOUNT) nr_accounted += vma_pages(vma); - remove_vma(vma, /* unreachable = */ true, /* closed = */ false); + remove_vma(vma, /* unreachable = */ true); count++; cond_resched(); vma = vma_next(&vmi); --- a/mm/nommu.c~mm-unconditionally-close-vmas-on-error +++ a/mm/nommu.c @@ -589,8 +589,7 @@ static int delete_vma_from_mm(struct vm_ */ static void delete_vma(struct mm_struct *mm, struct vm_area_struct *vma) { - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); put_nommu_region(vma->vm_region); --- a/mm/vma.c~mm-unconditionally-close-vmas-on-error +++ a/mm/vma.c @@ -323,11 +323,10 @@ static bool can_vma_merge_right(struct v /* * Close a vm structure and free it. */ -void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed) +void remove_vma(struct vm_area_struct *vma, bool unreachable) { might_sleep(); - if (!closed && vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); + vma_close(vma); if (vma->vm_file) fput(vma->vm_file); mpol_put(vma_policy(vma)); @@ -1115,9 +1114,7 @@ void vms_clean_up_area(struct vma_munmap vms_clear_ptes(vms, mas_detach, true); mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - if (vma->vm_ops && vma->vm_ops->close) - vma->vm_ops->close(vma); - vms->closed_vm_ops = true; + vma_close(vma); } /* @@ -1160,7 +1157,7 @@ void vms_complete_munmap_vmas(struct vma /* Remove and clean up vmas */ mas_set(mas_detach, 0); mas_for_each(mas_detach, vma, ULONG_MAX) - remove_vma(vma, /* = */ false, vms->closed_vm_ops); + remove_vma(vma, /* unreachable = */ false); vm_unacct_memory(vms->nr_accounted); validate_mm(mm); @@ -1684,8 +1681,7 @@ struct vm_area_struct *copy_vma(struct v return new_vma; out_vma_link: - if (new_vma->vm_ops && new_vma->vm_ops->close) - new_vma->vm_ops->close(new_vma); + vma_close(new_vma); if (new_vma->vm_file) fput(new_vma->vm_file); --- a/mm/vma.h~mm-unconditionally-close-vmas-on-error +++ a/mm/vma.h @@ -42,7 +42,6 @@ struct vma_munmap_struct { int vma_count; /* Number of vmas that will be removed */ bool unlock; /* Unlock after the munmap */ bool clear_ptes; /* If there are outstanding PTE to be cleared */ - bool closed_vm_ops; /* call_mmap() was encountered, so vmas may be closed */ /* 1 byte hole */ unsigned long nr_pages; /* Number of pages being removed */ unsigned long locked_vm; /* Number of locked pages */ @@ -198,7 +197,6 @@ static inline void init_vma_munmap(struc vms->unmap_start = FIRST_USER_ADDRESS; vms->unmap_end = USER_PGTABLES_CEILING; vms->clear_ptes = false; - vms->closed_vm_ops = false; } #endif @@ -269,7 +267,7 @@ int do_vmi_munmap(struct vma_iterator *v unsigned long start, size_t len, struct list_head *uf, bool unlock); -void remove_vma(struct vm_area_struct *vma, bool unreachable, bool closed); +void remove_vma(struct vm_area_struct *vma, bool unreachable); void unmap_region(struct ma_state *mas, struct vm_area_struct *vma, struct vm_area_struct *prev, struct vm_area_struct *next); _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: avoid unsafe VMA hook invocation when error arises on mmap hook has been removed from the -mm tree. Its filename was mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Subject: mm: avoid unsafe VMA hook invocation when error arises on mmap hook Date: Tue, 29 Oct 2024 18:11:44 +0000 Patch series "fix error handling in mmap_region() and refactor (hotfixes)", v4. mmap_region() is somewhat terrifying, with spaghetti-like control flow and numerous means by which issues can arise and incomplete state, memory leaks and other unpleasantness can occur. A large amount of the complexity arises from trying to handle errors late in the process of mapping a VMA, which forms the basis of recently observed issues with resource leaks and observable inconsistent state. This series goes to great lengths to simplify how mmap_region() works and to avoid unwinding errors late on in the process of setting up the VMA for the new mapping, and equally avoids such operations occurring while the VMA is in an inconsistent state. The patches in this series comprise the minimal changes required to resolve existing issues in mmap_region() error handling, in order that they can be hotfixed and backported. There is additionally a follow up series which goes further, separated out from the v1 series and sent and updated separately. This patch (of 5): After an attempted mmap() fails, we are no longer in a situation where we can safely interact with VMA hooks. This is currently not enforced, meaning that we need complicated handling to ensure we do not incorrectly call these hooks. We can avoid the whole issue by treating the VMA as suspect the moment that the file->f_ops->mmap() function reports an error by replacing whatever VMA operations were installed with a dummy empty set of VMA operations. We do so through a new helper function internal to mm - mmap_file() - which is both more logically named than the existing call_mmap() function and correctly isolates handling of the vm_op reassignment to mm. All the existing invocations of call_mmap() outside of mm are ultimately nested within the call_mmap() from mm, which we now replace. It is therefore safe to leave call_mmap() in place as a convenience function (and to avoid churn). The invokers are: ovl_file_operations -> mmap -> ovl_mmap() -> backing_file_mmap() coda_file_operations -> mmap -> coda_file_mmap() shm_file_operations -> shm_mmap() shm_file_operations_huge -> shm_mmap() dma_buf_fops -> dma_buf_mmap_internal -> i915_dmabuf_ops -> i915_gem_dmabuf_mmap() None of these callers interact with vm_ops or mappings in a problematic way on error, quickly exiting out. Link: https://lkml.kernel.org/r/cover.1730224667.git.lorenzo.stoakes@oracle.com Link: https://lkml.kernel.org/r/d41fd763496fd0048a962f3fd9407dc72dd4fd86.17302246… Fixes: deb0f6562884 ("mm/mmap: undo ->mmap() when arch_validate_flags() fails") Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Reported-by: Jann Horn <jannh(a)google.com> Reviewed-by: Liam R. Howlett <Liam.Howlett(a)oracle.com> Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz> Reviewed-by: Jann Horn <jannh(a)google.com> Cc: Andreas Larsson <andreas(a)gaisler.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: David S. Miller <davem(a)davemloft.net> Cc: Helge Deller <deller(a)gmx.de> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Linus Torvalds <torvalds(a)linux-foundation.org> Cc: Mark Brown <broonie(a)kernel.org> Cc: Peter Xu <peterx(a)redhat.com> Cc: Will Deacon <will(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/internal.h | 27 +++++++++++++++++++++++++++ mm/mmap.c | 6 +++--- mm/nommu.c | 4 ++-- 3 files changed, 32 insertions(+), 5 deletions(-) --- a/mm/internal.h~mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook +++ a/mm/internal.h @@ -108,6 +108,33 @@ static inline void *folio_raw_mapping(co return (void *)(mapping & ~PAGE_MAPPING_FLAGS); } +/* + * This is a file-backed mapping, and is about to be memory mapped - invoke its + * mmap hook and safely handle error conditions. On error, VMA hooks will be + * mutated. + * + * @file: File which backs the mapping. + * @vma: VMA which we are mapping. + * + * Returns: 0 if success, error otherwise. + */ +static inline int mmap_file(struct file *file, struct vm_area_struct *vma) +{ + int err = call_mmap(file, vma); + + if (likely(!err)) + return 0; + + /* + * OK, we tried to call the file hook for mmap(), but an error + * arose. The mapping is in an inconsistent state and we most not invoke + * any further hooks on it. + */ + vma->vm_ops = &vma_dummy_vm_ops; + + return err; +} + #ifdef CONFIG_MMU /* Flags for folio_pte_batch(). */ --- a/mm/mmap.c~mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook +++ a/mm/mmap.c @@ -1422,7 +1422,7 @@ unsigned long mmap_region(struct file *f /* * clear PTEs while the vma is still in the tree so that rmap * cannot race with the freeing later in the truncate scenario. - * This is also needed for call_mmap(), which is why vm_ops + * This is also needed for mmap_file(), which is why vm_ops * close function is called. */ vms_clean_up_area(&vms, &mas_detach); @@ -1447,7 +1447,7 @@ unsigned long mmap_region(struct file *f if (file) { vma->vm_file = get_file(file); - error = call_mmap(file, vma); + error = mmap_file(file, vma); if (error) goto unmap_and_free_vma; @@ -1470,7 +1470,7 @@ unsigned long mmap_region(struct file *f vma_iter_config(&vmi, addr, end); /* - * If vm_flags changed after call_mmap(), we should try merge + * If vm_flags changed after mmap_file(), we should try merge * vma again as we may succeed this time. */ if (unlikely(vm_flags != vma->vm_flags && vmg.prev)) { --- a/mm/nommu.c~mm-avoid-unsafe-vma-hook-invocation-when-error-arises-on-mmap-hook +++ a/mm/nommu.c @@ -885,7 +885,7 @@ static int do_mmap_shared_file(struct vm { int ret; - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); if (ret == 0) { vma->vm_region->vm_top = vma->vm_region->vm_end; return 0; @@ -918,7 +918,7 @@ static int do_mmap_private(struct vm_are * happy. */ if (capabilities & NOMMU_MAP_DIRECT) { - ret = call_mmap(vma->vm_file, vma); + ret = mmap_file(vma->vm_file, vma); /* shouldn't return success if we're not sharing */ if (WARN_ON_ONCE(!is_nommu_shared_mapping(vma->vm_flags))) ret = -ENOSYS; _ Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are selftests-mm-add-pkey_sighandler_xx-hugetlb_dio-to-gitignore.patch mm-refactor-mm_access-to-not-return-null.patch mm-madvise-unrestrict-process_madvise-for-current-process.patch maple_tree-do-not-hash-pointers-on-dump-in-debug-mode.patch tools-testing-fix-phys_addr_t-size-on-64-bit-systems.patch tools-testing-add-additional-vma_internalh-stubs.patch mm-isolate-mmap-internal-logic-to-mm-vmac.patch mm-refactor-__mmap_region.patch mm-remove-unnecessary-reset-state-logic-on-merge-new-vma.patch mm-defer-second-attempt-at-merge-on-mmap.patch mm-pagewalk-add-the-ability-to-install-ptes.patch mm-add-pte_marker_guard-pte-marker.patch mm-madvise-implement-lightweight-guard-page-mechanism.patch tools-testing-update-tools-uapi-header-for-mman-commonh.patch selftests-mm-add-self-tests-for-guard-page-feature.patch mm-remove-unnecessary-page_table_lock-on-stack-expansion.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm/thp: fix deferred split unqueue naming and locking has been removed from the -mm tree. Its filename was mm-thp-fix-deferred-split-unqueue-naming-and-locking.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Hugh Dickins <hughd(a)google.com> Subject: mm/thp: fix deferred split unqueue naming and locking Date: Sun, 27 Oct 2024 13:02:13 -0700 (PDT) Recent changes are putting more pressure on THP deferred split queues: under load revealing long-standing races, causing list_del corruptions, "Bad page state"s and worse (I keep BUGs in both of those, so usually don't get to see how badly they end up without). The relevant recent changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin, improved swap allocation, and underused THP splitting. Before fixing locking: rename misleading folio_undo_large_rmappable(), which does not undo large_rmappable, to folio_unqueue_deferred_split(), which is what it does. But that and its out-of-line __callee are mm internals of very limited usability: add comment and WARN_ON_ONCEs to check usage; and return a bool to say if a deferred split was unqueued, which can then be used in WARN_ON_ONCEs around safety checks (sparing callers the arcane conditionals in __folio_unqueue_deferred_split()). Just omit the folio_unqueue_deferred_split() from free_unref_folios(), all of whose callers now call it beforehand (and if any forget then bad_page() will tell) - except for its caller put_pages_list(), which itself no longer has any callers (and will be deleted separately). Swapout: mem_cgroup_swapout() has been resetting folio->memcg_data 0 without checking and unqueueing a THP folio from deferred split list; which is unfortunate, since the split_queue_lock depends on the memcg (when memcg is enabled); so swapout has been unqueueing such THPs later, when freeing the folio, using the pgdat's lock instead: potentially corrupting the memcg's list. __remove_mapping() has frozen refcount to 0 here, so no problem with calling folio_unqueue_deferred_split() before resetting memcg_data. That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware"): which included a check on swapcache before adding to deferred queue, but no check on deferred queue before adding THP to swapcache. That worked fine with the usual sequence of events in reclaim (though there were a couple of rare ways in which a THP on deferred queue could have been swapped out), but 6.12 commit dafff3f4c850 ("mm: split underused THPs") avoids splitting underused THPs in reclaim, which makes swapcache THPs on deferred queue commonplace. Keep the check on swapcache before adding to deferred queue? Yes: it is no longer essential, but preserves the existing behaviour, and is likely to be a worthwhile optimization (vmstat showed much more traffic on the queue under swapping load if the check was removed); update its comment. Memcg-v1 move (deprecated): mem_cgroup_move_account() has been changing folio->memcg_data without checking and unqueueing a THP folio from the deferred list, sometimes corrupting "from" memcg's list, like swapout. Refcount is non-zero here, so folio_unqueue_deferred_split() can only be used in a WARN_ON_ONCE to validate the fix, which must be done earlier: mem_cgroup_move_charge_pte_range() first try to split the THP (splitting of course unqueues), or skip it if that fails. Not ideal, but moving charge has been requested, and khugepaged should repair the THP later: nobody wants new custom unqueueing code just for this deprecated case. The 87eaceb3faa5 commit did have the code to move from one deferred list to another (but was not conscious of its unsafety while refcount non-0); but that was removed by 5.6 commit fac0516b5534 ("mm: thp: don't need care deferred split queue in memcg charge move path"), which argued that the existence of a PMD mapping guarantees that the THP cannot be on a deferred list. As above, false in rare cases, and now commonly false. Backport to 6.11 should be straightforward. Earlier backports must take care that other _deferred_list fixes and dependencies are included. There is not a strong case for backports, but they can fix cornercases. Link: https://lkml.kernel.org/r/8dc111ae-f6db-2da7-b25c-7a20b1effe3b@google.com Fixes: 87eaceb3faa5 ("mm: thp: make deferred split shrinker memcg aware") Fixes: dafff3f4c850 ("mm: split underused THPs") Signed-off-by: Hugh Dickins <hughd(a)google.com> Acked-by: David Hildenbrand <david(a)redhat.com> Reviewed-by: Yang Shi <shy828301(a)gmail.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Chris Li <chrisl(a)kernel.org> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Nhat Pham <nphamcs(a)gmail.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Shakeel Butt <shakeel.butt(a)linux.dev> Cc: Usama Arif <usamaarif642(a)gmail.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: Zi Yan <ziy(a)nvidia.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 35 ++++++++++++++++++++++++++--------- mm/internal.h | 10 +++++----- mm/memcontrol-v1.c | 25 +++++++++++++++++++++++++ mm/memcontrol.c | 8 +++++--- mm/migrate.c | 4 ++-- mm/page_alloc.c | 1 - mm/swap.c | 4 ++-- mm/vmscan.c | 4 ++-- 8 files changed, 67 insertions(+), 24 deletions(-) --- a/mm/huge_memory.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/huge_memory.c @@ -3588,10 +3588,27 @@ int split_folio_to_list(struct folio *fo return split_huge_page_to_list_to_order(&folio->page, list, ret); } -void __folio_undo_large_rmappable(struct folio *folio) +/* + * __folio_unqueue_deferred_split() is not to be called directly: + * the folio_unqueue_deferred_split() inline wrapper in mm/internal.h + * limits its calls to those folios which may have a _deferred_list for + * queueing THP splits, and that list is (racily observed to be) non-empty. + * + * It is unsafe to call folio_unqueue_deferred_split() until folio refcount is + * zero: because even when split_queue_lock is held, a non-empty _deferred_list + * might be in use on deferred_split_scan()'s unlocked on-stack list. + * + * If memory cgroups are enabled, split_queue_lock is in the mem_cgroup: it is + * therefore important to unqueue deferred split before changing folio memcg. + */ +bool __folio_unqueue_deferred_split(struct folio *folio) { struct deferred_split *ds_queue; unsigned long flags; + bool unqueued = false; + + WARN_ON_ONCE(folio_ref_count(folio)); + WARN_ON_ONCE(!mem_cgroup_disabled() && !folio_memcg(folio)); ds_queue = get_deferred_split_queue(folio); spin_lock_irqsave(&ds_queue->split_queue_lock, flags); @@ -3603,8 +3620,11 @@ void __folio_undo_large_rmappable(struct MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); } list_del_init(&folio->_deferred_list); + unqueued = true; } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); + + return unqueued; /* useful for debug warnings */ } /* partially_mapped=false won't clear PG_partially_mapped folio flag */ @@ -3627,14 +3647,11 @@ void deferred_split_folio(struct folio * return; /* - * The try_to_unmap() in page reclaim path might reach here too, - * this may cause a race condition to corrupt deferred split queue. - * And, if page reclaim is already handling the same folio, it is - * unnecessary to handle it again in shrinker. - * - * Check the swapcache flag to determine if the folio is being - * handled by page reclaim since THP swap would add the folio into - * swap cache before calling try_to_unmap(). + * Exclude swapcache: originally to avoid a corrupt deferred split + * queue. Nowadays that is fully prevented by mem_cgroup_swapout(); + * but if page reclaim is already handling the same folio, it is + * unnecessary to handle it again in the shrinker, so excluding + * swapcache here may still be a useful optimization. */ if (folio_test_swapcache(folio)) return; --- a/mm/internal.h~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/internal.h @@ -639,11 +639,11 @@ static inline void folio_set_order(struc #endif } -void __folio_undo_large_rmappable(struct folio *folio); -static inline void folio_undo_large_rmappable(struct folio *folio) +bool __folio_unqueue_deferred_split(struct folio *folio); +static inline bool folio_unqueue_deferred_split(struct folio *folio) { if (folio_order(folio) <= 1 || !folio_test_large_rmappable(folio)) - return; + return false; /* * At this point, there is no one trying to add the folio to @@ -651,9 +651,9 @@ static inline void folio_undo_large_rmap * to check without acquiring the split_queue_lock. */ if (data_race(list_empty(&folio->_deferred_list))) - return; + return false; - __folio_undo_large_rmappable(folio); + return __folio_unqueue_deferred_split(folio); } static inline struct folio *page_rmappable_folio(struct page *page) --- a/mm/memcontrol.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/memcontrol.c @@ -4629,9 +4629,6 @@ static void uncharge_folio(struct folio struct obj_cgroup *objcg; VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); - VM_BUG_ON_FOLIO(folio_order(folio) > 1 && - !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); /* * Nobody should be changing or seriously looking at @@ -4678,6 +4675,7 @@ static void uncharge_folio(struct folio ug->nr_memory += nr_pages; ug->pgpgout++; + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = 0; } @@ -4789,6 +4787,9 @@ void mem_cgroup_migrate(struct folio *ol /* Transfer the charge and the css ref */ commit_charge(new, memcg); + + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(old)); old->memcg_data = 0; } @@ -4975,6 +4976,7 @@ void mem_cgroup_swapout(struct folio *fo VM_BUG_ON_FOLIO(oldid, folio); mod_memcg_state(swap_memcg, MEMCG_SWAP, nr_entries); + folio_unqueue_deferred_split(folio); folio->memcg_data = 0; if (!mem_cgroup_is_root(memcg)) --- a/mm/memcontrol-v1.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/memcontrol-v1.c @@ -848,6 +848,8 @@ static int mem_cgroup_move_account(struc css_get(&to->css); css_put(&from->css); + /* Warning should never happen, so don't worry about refcount non-0 */ + WARN_ON_ONCE(folio_unqueue_deferred_split(folio)); folio->memcg_data = (unsigned long)to; __folio_memcg_unlock(from); @@ -1217,7 +1219,9 @@ static int mem_cgroup_move_charge_pte_ra enum mc_target_type target_type; union mc_target target; struct folio *folio; + bool tried_split_before = false; +retry_pmd: ptl = pmd_trans_huge_lock(pmd, vma); if (ptl) { if (mc.precharge < HPAGE_PMD_NR) { @@ -1227,6 +1231,27 @@ static int mem_cgroup_move_charge_pte_ra target_type = get_mctgt_type_thp(vma, addr, *pmd, &target); if (target_type == MC_TARGET_PAGE) { folio = target.folio; + /* + * Deferred split queue locking depends on memcg, + * and unqueue is unsafe unless folio refcount is 0: + * split or skip if on the queue? first try to split. + */ + if (!list_empty(&folio->_deferred_list)) { + spin_unlock(ptl); + if (!tried_split_before) + split_folio(folio); + folio_unlock(folio); + folio_put(folio); + if (tried_split_before) + return 0; + tried_split_before = true; + goto retry_pmd; + } + /* + * So long as that pmd lock is held, the folio cannot + * be racily added to the _deferred_list, because + * __folio_remove_rmap() will find !partially_mapped. + */ if (folio_isolate_lru(folio)) { if (!mem_cgroup_move_account(folio, true, mc.from, mc.to)) { --- a/mm/migrate.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/migrate.c @@ -490,7 +490,7 @@ static int __folio_migrate_mapping(struc folio_test_large_rmappable(folio)) { if (!folio_ref_freeze(folio, expected_count)) return -EAGAIN; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); folio_ref_unfreeze(folio, expected_count); } @@ -515,7 +515,7 @@ static int __folio_migrate_mapping(struc } /* Take off deferred split queue while frozen and memcg set */ - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); /* * Now we know that no one else is looking at the folio: --- a/mm/page_alloc.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/page_alloc.c @@ -2681,7 +2681,6 @@ void free_unref_folios(struct folio_batc unsigned long pfn = folio_pfn(folio); unsigned int order = folio_order(folio); - folio_undo_large_rmappable(folio); if (!free_pages_prepare(&folio->page, order)) continue; /* --- a/mm/swap.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/swap.c @@ -121,7 +121,7 @@ void __folio_put(struct folio *folio) } page_cache_release(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); mem_cgroup_uncharge(folio); free_unref_page(&folio->page, folio_order(folio)); } @@ -988,7 +988,7 @@ void folios_put_refs(struct folio_batch free_huge_folio(folio); continue; } - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); __page_cache_release(folio, &lruvec, &flags); if (j != i) --- a/mm/vmscan.c~mm-thp-fix-deferred-split-unqueue-naming-and-locking +++ a/mm/vmscan.c @@ -1476,7 +1476,7 @@ free_it: */ nr_reclaimed += nr_pages; - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { mem_cgroup_uncharge_folios(&free_folios); try_to_unmap_flush(); @@ -1864,7 +1864,7 @@ static unsigned int move_folios_to_lru(s if (unlikely(folio_put_testzero(folio))) { __folio_clear_lru_flags(folio); - folio_undo_large_rmappable(folio); + folio_unqueue_deferred_split(folio); if (folio_batch_add(&free_folios, folio) == 0) { spin_unlock_irq(&lruvec->lru_lock); mem_cgroup_uncharge_folios(&free_folios); _ Patches currently in -mm which might be from hughd(a)google.com are mm-delete-the-unused-put_pages_list.patch

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110529-clapper-deferred-1146@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

2
1
0 0

[GIT PULL 01/10] xfs: convert perag to use xarrays

by Darrick J. Wong

Hi Carlos, Please pull this branch with changes for xfs for 6.13-rc1. As usual, I did a test-merge with the main upstream branch as of a few minutes ago, and didn't see any conflicts. Please let me know if you encounter any problems. --D The following changes since commit 59b723cd2adbac2a34fc8e12c74ae26ae45bf230: Linux 6.12-rc6 (2024-11-03 14:05:52 -1000) are available in the Git repository at: https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux.git tags/perag-xarray-6.13_2024-11-05 for you to fetch changes up to d66496578b2a099ea453f56782f1cd2bf63a8029: xfs: insert the pag structures into the xarray later (2024-11-05 13:38:27 -0800) ---------------------------------------------------------------- xfs: convert perag to use xarrays [v5.5 01/10] Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. With a bit of luck, this should all go splendidly. Signed-off-by: Darrick J. Wong <djwong(a)kernel.org> ---------------------------------------------------------------- Christoph Hellwig (22): xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev xfs: remove the unused pagb_count field in struct xfs_perag xfs: remove the unused pag_active_wq field in struct xfs_perag xfs: pass a pag to xfs_difree_inode_chunk xfs: remove the agno argument to xfs_free_ag_extent xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers xfs: add a xfs_agino_to_ino helper xfs: pass a pag to xfs_extent_busy_{search,reuse} xfs: keep a reference to the pag for busy extents xfs: remove the mount field from struct xfs_busy_extents xfs: remove the unused trace_xfs_iwalk_ag trace point xfs: remove the unused xrep_bmap_walk_rmap trace point xfs: constify pag arguments to trace points xfs: pass a perag structure to the xfs_ag_resv_init_error trace point xfs: pass objects to the xfs_irec_merge_{pre,post} trace points xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point xfs: pass objects to the xrep_ibt_walk_rmap tracepoint xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points xfs: pass the pag to the xrep_newbt_extent_class tracepoints xfs: convert remaining trace points to pass pag structures xfs: split xfs_initialize_perag xfs: insert the pag structures into the xarray later Darrick J. Wong (1): xfs: fix simplify extent lookup in xfs_can_free_eofblocks fs/xfs/libxfs/xfs_ag.c | 135 ++++++++++++++------------ fs/xfs/libxfs/xfs_ag.h | 30 +++++- fs/xfs/libxfs/xfs_ag_resv.c | 3 +- fs/xfs/libxfs/xfs_alloc.c | 32 +++---- fs/xfs/libxfs/xfs_alloc.h | 5 +- fs/xfs/libxfs/xfs_alloc_btree.c | 2 +- fs/xfs/libxfs/xfs_btree.c | 7 +- fs/xfs/libxfs/xfs_ialloc.c | 67 ++++++------- fs/xfs/libxfs/xfs_ialloc_btree.c | 2 +- fs/xfs/libxfs/xfs_inode_util.c | 4 +- fs/xfs/libxfs/xfs_refcount.c | 11 +-- fs/xfs/libxfs/xfs_refcount_btree.c | 3 +- fs/xfs/libxfs/xfs_rmap_btree.c | 2 +- fs/xfs/scrub/agheader_repair.c | 16 +--- fs/xfs/scrub/alloc_repair.c | 10 +- fs/xfs/scrub/bmap.c | 5 +- fs/xfs/scrub/bmap_repair.c | 4 +- fs/xfs/scrub/common.c | 2 +- fs/xfs/scrub/cow_repair.c | 18 ++-- fs/xfs/scrub/ialloc.c | 8 +- fs/xfs/scrub/ialloc_repair.c | 25 ++--- fs/xfs/scrub/newbt.c | 46 ++++----- fs/xfs/scrub/reap.c | 8 +- fs/xfs/scrub/refcount_repair.c | 5 +- fs/xfs/scrub/repair.c | 13 ++- fs/xfs/scrub/rmap_repair.c | 9 +- fs/xfs/scrub/trace.h | 161 +++++++++++++++---------------- fs/xfs/xfs_bmap_util.c | 8 +- fs/xfs/xfs_buf_item_recover.c | 5 +- fs/xfs/xfs_discard.c | 20 ++-- fs/xfs/xfs_extent_busy.c | 31 +++--- fs/xfs/xfs_extent_busy.h | 14 ++- fs/xfs/xfs_extfree_item.c | 4 +- fs/xfs/xfs_filestream.c | 5 +- fs/xfs/xfs_fsmap.c | 25 ++--- fs/xfs/xfs_health.c | 8 +- fs/xfs/xfs_inode.c | 5 +- fs/xfs/xfs_iunlink_item.c | 13 ++- fs/xfs/xfs_iwalk.c | 17 ++-- fs/xfs/xfs_log_cil.c | 3 +- fs/xfs/xfs_log_recover.c | 5 +- fs/xfs/xfs_trace.c | 1 + fs/xfs/xfs_trace.h | 191 ++++++++++++++++--------------------- fs/xfs/xfs_trans.c | 2 +- 44 files changed, 459 insertions(+), 531 deletions(-)

1 year, 1 month

1
0
0 0

+ util_macrosh-fix-rework-find_closest-macros.patch added to mm-nonmm-unstable branch

by Andrew Morton

The patch titled Subject: util_macros.h: fix/rework find_closest() macros has been added to the -mm mm-nonmm-unstable branch. Its filename is util_macrosh-fix-rework-find_closest-macros.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-nonmm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Alexandru Ardelean <aardelean(a)baylibre.com> Subject: util_macros.h: fix/rework find_closest() macros Date: Tue, 5 Nov 2024 16:54:05 +0200 A bug was found in the find_closest() (find_closest_descending() is also affected after some testing), where for certain values with small progressions, the rounding (done by averaging 2 values) causes an incorrect index to be returned. The rounding issues occur for progressions of 1, 2 and 3. It goes away when the progression/interval between two values is 4 or larger. It's particularly bad for progressions of 1. For example if there's an array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0 (the index of '1'), rather than returning 1 (the index of '2'). This means that for exact values (with a progression of 1), find_closest() will misbehave and return the index of the value smaller than the one we're searching for. For progressions of 2 and 3, the exact values are obtained correctly; but values aren't approximated correctly (as one would expect). Starting with progressions of 4, all seems to be good (one gets what one would expect). While one could argue that 'find_closest()' should not be used for arrays with progressions of 1 (i.e. '{1, 2, 3, ...}', the macro should still behave correctly. The bug was found while testing the 'drivers/iio/adc/ad7606.c', specifically the oversampling feature. For reference, the oversampling values are listed as: static const unsigned int ad7606_oversampling_avail[7] = { 1, 2, 4, 8, 16, 32, 64, }; When doing: 1. $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is fine 2. $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is wrong; 2 should be returned here 3. $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 2 # this is fine 4. $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine And from here-on, the values are as correct (one gets what one would expect.) While writing a kunit test for this bug, a peculiar issue was found for the array in the 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c' drivers. While running the kunit test (for 'ina226_avg_tab' from these drivers): * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, so value. * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, value 1; and now one could argue whether 3 is closer to 4 or to 1. This quirk only appears for value '3' in this array, but it seems to be a another rounding issue. * And from 4 onwards the 'find_closest'() works fine (one gets what one would expect). This change reworks the find_closest() macros to also check the difference between the left and right elements when 'x'. If the distance to the right is smaller (than the distance to the left), the index is incremented by 1. This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro. In order to accommodate for any mix of negative + positive values, the internal variables '__fc_x', '__fc_mid_x', '__fc_left' & '__fc_right' are forced to 'long' type. This also addresses any potential bugs/issues with 'x' being of an unsigned type. In those situations any comparison between signed & unsigned would be promoted to a comparison between 2 unsigned numbers; this is especially annoying when '__fc_left' & '__fc_right' underflow. The find_closest_descending() macro was also reworked and duplicated from the find_closest(), and it is being iterated in reverse. The main reason for this is to get the same indices as 'find_closest()' (but in reverse). The comparison for '__fc_right < __fc_left' favors going the array in ascending order. For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we get: __fc_mid_x = 2 __fc_left = -1 __fc_right = -2 Then '__fc_right < __fc_left' evaluates to true and '__fc_i++' becomes 7 which is not quite incorrect, but 3 is closer to 4 than to 1. This change has been validated with the kunit from the next patch. Link: https://lkml.kernel.org/r/20241105145406.554365-1-aardelean@baylibre.com Fixes: 95d119528b0b ("util_macros.h: add find_closest() macro") Signed-off-by: Alexandru Ardelean <aardelean(a)baylibre.com> Cc: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/util_macros.h | 56 ++++++++++++++++++++++++---------- 1 file changed, 40 insertions(+), 16 deletions(-) --- a/include/linux/util_macros.h~util_macrosh-fix-rework-find_closest-macros +++ a/include/linux/util_macros.h @@ -4,19 +4,6 @@ #include <linux/math.h> -#define __find_closest(x, a, as, op) \ -({ \ - typeof(as) __fc_i, __fc_as = (as) - 1; \ - typeof(x) __fc_x = (x); \ - typeof(*a) const *__fc_a = (a); \ - for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) { \ - if (__fc_x op DIV_ROUND_CLOSEST(__fc_a[__fc_i] + \ - __fc_a[__fc_i + 1], 2)) \ - break; \ - } \ - (__fc_i); \ -}) - /** * find_closest - locate the closest element in a sorted array * @x: The reference value. @@ -25,8 +12,27 @@ * @as: Size of 'a'. * * Returns the index of the element closest to 'x'. + * Note: If using an array of negative numbers (or mixed positive numbers), + * then be sure that 'x' is of a signed-type to get good results. */ -#define find_closest(x, a, as) __find_closest(x, a, as, <=) +#define find_closest(x, a, as) \ +({ \ + typeof(as) __fc_i, __fc_as = (as) - 1; \ + long __fc_mid_x, __fc_x = (x); \ + long __fc_left, __fc_right; \ + typeof(*a) const *__fc_a = (a); \ + for (__fc_i = 0; __fc_i < __fc_as; __fc_i++) { \ + __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i + 1]) / 2; \ + if (__fc_x <= __fc_mid_x) { \ + __fc_left = __fc_x - __fc_a[__fc_i]; \ + __fc_right = __fc_a[__fc_i + 1] - __fc_x; \ + if (__fc_right < __fc_left) \ + __fc_i++; \ + break; \ + } \ + } \ + (__fc_i); \ +}) /** * find_closest_descending - locate the closest element in a sorted array @@ -36,9 +42,27 @@ * @as: Size of 'a'. * * Similar to find_closest() but 'a' is expected to be sorted in descending - * order. + * order. The iteration is done in reverse order, so that the comparison + * of '__fc_right' & '__fc_left' also works for unsigned numbers. */ -#define find_closest_descending(x, a, as) __find_closest(x, a, as, >=) +#define find_closest_descending(x, a, as) \ +({ \ + typeof(as) __fc_i, __fc_as = (as) - 1; \ + long __fc_mid_x, __fc_x = (x); \ + long __fc_left, __fc_right; \ + typeof(*a) const *__fc_a = (a); \ + for (__fc_i = __fc_as; __fc_i >= 1; __fc_i--) { \ + __fc_mid_x = (__fc_a[__fc_i] + __fc_a[__fc_i - 1]) / 2; \ + if (__fc_x <= __fc_mid_x) { \ + __fc_left = __fc_x - __fc_a[__fc_i]; \ + __fc_right = __fc_a[__fc_i - 1] - __fc_x; \ + if (__fc_right < __fc_left) \ + __fc_i--; \ + break; \ + } \ + } \ + (__fc_i); \ +}) /** * is_insidevar - check if the @ptr points inside the @var memory range. _ Patches currently in -mm which might be from aardelean(a)baylibre.com are util_macrosh-fix-rework-find_closest-macros.patch lib-util_macros_kunit-add-kunit-test-for-util_macrosh.patch

1 year, 1 month

1
0
0 0

[REGRESSION] The iwl4965 driver broke somewhere between 6.10.10 and 6.11.5 (probably 6.11rc)

by Alf Marius

Hi, I recently installed Arch Linux on an old laptop (Fujitsu-Siemens AMILO Xi 2550) and noticed that: - when booting Linux from the Arch ISO (kernel version 6.10.10) WIFI is working fine - after installing Arch Linux from the ISO and booting (kernel version 6.11.5) WIFI was not working properly By "not working properly" I mean: downloading small files or installing a few small packages was working ok, but when downloading larger files or installing larger packages with lots of dependencies, the connection would gradually slow down and eventually die. I reported this on the Arch Linux forum (https://bbs.archlinux.org/viewtopic.php?pid=2206757) and some helpful memeber suggested that this might be the commit that broke things: https://github.com/torvalds/linux/commit/02b682d54598f61cbb7dbb14d98ec18011… An Arch Linux packet manager (gromit) helped me debug this issue by building a couple of kernels that I tested. - https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.12rc… - https://pkgbuild.com/\~gromit/linux-bisection-kernels/linux-mainline-6.12rc… The first one didn't work, but the second (in which he reverted the commit linked above) did fix my problem. So, I guess this commit should be investigated by those in the know. Thats why I also added Andrii and Kalle to CC as they are listed in the commit message. My network controller: Intel corporation PRO/Wireless 4965 AG or AGN [Kedron] Network Connection (rev 61) Kernel driver in use: iwl4965 This is my first kernel bug report, hope I did everything right :) I'm ofc willing to help provide more info and debug locally here to help solve this issue. Thanks and good night Alf :) -- "The generation of random numbers is too important to be left to chance."

1 year, 1 month

1
0
0 0

[PATCHSET v5.5 01/10] xfs: convert perag to use xarrays

by Darrick J. Wong

Hi all, Convert the xfs_mount perag tree to use an xarray instead of a radix tree. There should be no functional changes here. If you're going to start using this code, I strongly recommend pulling from my git trees, which are linked below. With a bit of luck, this should all go splendidly. Comments and questions are, as always, welcome. --D kernel git tree: https://git.kernel.org/cgit/linux/kernel/git/djwong/xfs-linux.git/log/?h=pe… --- Commits in this patchset: * xfs: fix simplify extent lookup in xfs_can_free_eofblocks * xfs: fix superfluous clearing of info->low in __xfs_getfsmap_datadev * xfs: remove the unused pagb_count field in struct xfs_perag * xfs: remove the unused pag_active_wq field in struct xfs_perag * xfs: pass a pag to xfs_difree_inode_chunk * xfs: remove the agno argument to xfs_free_ag_extent * xfs: add xfs_agbno_to_fsb and xfs_agbno_to_daddr helpers * xfs: add a xfs_agino_to_ino helper * xfs: pass a pag to xfs_extent_busy_{search,reuse} * xfs: keep a reference to the pag for busy extents * xfs: remove the mount field from struct xfs_busy_extents * xfs: remove the unused trace_xfs_iwalk_ag trace point * xfs: remove the unused xrep_bmap_walk_rmap trace point * xfs: constify pag arguments to trace points * xfs: pass a perag structure to the xfs_ag_resv_init_error trace point * xfs: pass objects to the xfs_irec_merge_{pre,post} trace points * xfs: pass the iunlink item to the xfs_iunlink_update_dinode trace point * xfs: pass objects to the xrep_ibt_walk_rmap tracepoint * xfs: pass the pag to the trace_xrep_calc_ag_resblks{,_btsize} trace points * xfs: pass the pag to the xrep_newbt_extent_class tracepoints * xfs: convert remaining trace points to pass pag structures * xfs: split xfs_initialize_perag * xfs: insert the pag structures into the xarray later --- fs/xfs/libxfs/xfs_ag.c | 135 ++++++++++++++----------- fs/xfs/libxfs/xfs_ag.h | 30 +++++- fs/xfs/libxfs/xfs_ag_resv.c | 3 - fs/xfs/libxfs/xfs_alloc.c | 32 +++--- fs/xfs/libxfs/xfs_alloc.h | 5 - fs/xfs/libxfs/xfs_alloc_btree.c | 2 fs/xfs/libxfs/xfs_btree.c | 7 + fs/xfs/libxfs/xfs_ialloc.c | 67 ++++++------- fs/xfs/libxfs/xfs_ialloc_btree.c | 2 fs/xfs/libxfs/xfs_inode_util.c | 4 - fs/xfs/libxfs/xfs_refcount.c | 11 +- fs/xfs/libxfs/xfs_refcount_btree.c | 3 - fs/xfs/libxfs/xfs_rmap_btree.c | 2 fs/xfs/scrub/agheader_repair.c | 16 +-- fs/xfs/scrub/alloc_repair.c | 10 +- fs/xfs/scrub/bmap.c | 5 - fs/xfs/scrub/bmap_repair.c | 4 - fs/xfs/scrub/common.c | 2 fs/xfs/scrub/cow_repair.c | 18 +-- fs/xfs/scrub/ialloc.c | 8 +- fs/xfs/scrub/ialloc_repair.c | 25 ++--- fs/xfs/scrub/newbt.c | 46 ++++----- fs/xfs/scrub/reap.c | 8 +- fs/xfs/scrub/refcount_repair.c | 5 - fs/xfs/scrub/repair.c | 13 +- fs/xfs/scrub/rmap_repair.c | 9 +- fs/xfs/scrub/trace.h | 161 ++++++++++++++---------------- fs/xfs/xfs_bmap_util.c | 8 +- fs/xfs/xfs_buf_item_recover.c | 5 - fs/xfs/xfs_discard.c | 20 ++-- fs/xfs/xfs_extent_busy.c | 31 ++---- fs/xfs/xfs_extent_busy.h | 14 +-- fs/xfs/xfs_extfree_item.c | 4 - fs/xfs/xfs_filestream.c | 5 - fs/xfs/xfs_fsmap.c | 25 ++--- fs/xfs/xfs_health.c | 8 +- fs/xfs/xfs_inode.c | 5 - fs/xfs/xfs_iunlink_item.c | 13 +- fs/xfs/xfs_iwalk.c | 17 ++- fs/xfs/xfs_log_cil.c | 3 - fs/xfs/xfs_log_recover.c | 5 - fs/xfs/xfs_trace.c | 1 fs/xfs/xfs_trace.h | 191 +++++++++++++++--------------------- fs/xfs/xfs_trans.c | 2 44 files changed, 459 insertions(+), 531 deletions(-)

1 year, 1 month

1
1
0 0

[PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_get_event()

by Qiu-ji Chen

This patch addresses a reference count handling issue in the lpfc_bsg_hba_get_event() function. In the branch if (evt->reg_id == event_req->ev_reg_id), the function calls lpfc_bsg_event_ref(), which increments the reference count of the relevant resources. However, in the branch if (evt_dat == NULL), a goto statement directly jumps to the function’s final goto block, skipping the release operations at the end of the function. This means that, if the condition if (evt_dat == NULL) is met, the function fails to correctly release the resources acquired by lpfc_bsg_event_ref(), leading to a reference count leak. To fix this issue, we added a new block job_error_unref before the job_error block. When the condition if (evt_dat == NULL) is met, the function will enter the job_error_unref block, ensuring that the previously allocated resources are properly released, thereby preventing the reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4cc0e56e977f ("[SCSI] lpfc 8.3.8: (BSG3) Modify BSG commands to operate asynchronously") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/scsi/lpfc/lpfc_bsg.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c index 85059b83ea6b..832a5a6dd85f 100644 --- a/drivers/scsi/lpfc/lpfc_bsg.c +++ b/drivers/scsi/lpfc/lpfc_bsg.c @@ -1294,7 +1294,7 @@ lpfc_bsg_hba_get_event(struct bsg_job *job) if (evt_dat == NULL) { bsg_reply->reply_payload_rcv_len = 0; rc = -ENOENT; - goto job_error; + goto job_error_unref; } if (evt_dat->len > job->request_payload.payload_len) { @@ -1329,6 +1329,10 @@ lpfc_bsg_hba_get_event(struct bsg_job *job) bsg_reply->reply_payload_rcv_len); return 0; +job_err_unref: + spin_lock_irqsave(&phba->ct_ev_lock, flags); + lpfc_bsg_event_unref(evt); + spin_unlock_irqrestore(&phba->ct_ev_lock, flags); job_error: job->dd_data = NULL; bsg_reply->result = rc; -- 2.34.1

1 year, 1 month

3
3
0 0

[PATCH 13/16] drm/amd/display: update pipe selection policy to check head pipe

by Hamza Mahfooz

From: Yihan Zhu <Yihan.Zhu(a)amd.com> [Why] No check on head pipe during the dml to dc hw mapping will allow illegal pipe usage. This will result in a wrong pipe topology to cause mpcc tree totally mess up then cause a display hang. [How] Avoid to use the pipe is head in all check and avoid ODM slice during preferred pipe check. Cc: stable(a)vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../display/dc/dml2/dml2_dc_resource_mgmt.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c b/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c index 6eccf0241d85..9be9ed7e01d3 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml2_dc_resource_mgmt.c @@ -258,12 +258,23 @@ static unsigned int find_preferred_pipe_candidates(const struct dc_state *existi * However this condition comes with a caveat. We need to ignore pipes that will * require a change in OPP but still have the same stream id. For example during * an MPC to ODM transiton. + * + * Adding check to avoid pipe select on the head pipe by utilizing dc resource + * helper function resource_get_primary_dpp_pipe and comparing the pipe index. */ if (existing_state) { for (i = 0; i < pipe_count; i++) { if (existing_state->res_ctx.pipe_ctx[i].stream && existing_state->res_ctx.pipe_ctx[i].stream->stream_id == stream_id) { + struct pipe_ctx *head_pipe = + resource_get_primary_dpp_pipe(&existing_state->res_ctx.pipe_ctx[i]); + + // we should always respect the head pipe from selection + if (head_pipe && head_pipe->pipe_idx == i) + continue; if (existing_state->res_ctx.pipe_ctx[i].plane_res.hubp && - existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i) + existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i && + (existing_state->res_ctx.pipe_ctx[i].prev_odm_pipe || + existing_state->res_ctx.pipe_ctx[i].next_odm_pipe)) continue; preferred_pipe_candidates[num_preferred_candidates++] = i; @@ -292,6 +303,12 @@ static unsigned int find_last_resort_pipe_candidates(const struct dc_state *exis */ if (existing_state) { for (i = 0; i < pipe_count; i++) { + struct pipe_ctx *head_pipe = + resource_get_primary_dpp_pipe(&existing_state->res_ctx.pipe_ctx[i]); + + // we should always respect the head pipe from selection + if (head_pipe && head_pipe->pipe_idx == i) + continue; if ((existing_state->res_ctx.pipe_ctx[i].plane_res.hubp && existing_state->res_ctx.pipe_ctx[i].plane_res.hubp->opp_id != i) || existing_state->res_ctx.pipe_ctx[i].stream_res.tg) -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH 12/16] drm/amd/display: Require minimum VBlank size for stutter optimization

by Hamza Mahfooz

From: Dillon Varone <dillon.varone(a)amd.com> If the nominal VBlank is too small, optimizing for stutter can cause the prefetch bandwidth to increase drasticaly, resulting in higher clock and power requirements. Only optimize if it is >3x the stutter latency. Cc: stable(a)vger.kernel.org Reviewed-by: Austin Zheng <austin.zheng(a)amd.com> Signed-off-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c index 5a09dd298e6f..92269f0e50ed 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c +++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_pmo/dml2_pmo_dcn4_fams2.c @@ -8,6 +8,7 @@ #include "dml2_pmo_dcn4_fams2.h" static const double MIN_VACTIVE_MARGIN_PCT = 0.25; // We need more than non-zero margin because DET buffer granularity can alter vactive latency hiding +static const double MIN_BLANK_STUTTER_FACTOR = 3.0; static const struct dml2_pmo_pstate_strategy base_strategy_list_1_display[] = { // VActive Preferred @@ -2140,6 +2141,7 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in struct dml2_pmo_instance *pmo = in_out->instance; bool stutter_period_meets_z8_eco = true; bool z8_stutter_optimization_too_expensive = false; + bool stutter_optimization_too_expensive = false; double line_time_us, vblank_nom_time_us; unsigned int i; @@ -2161,10 +2163,15 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in line_time_us = (double)in_out->base_display_config->display_config.stream_descriptors[i].timing.h_total / (in_out->base_display_config->display_config.stream_descriptors[i].timing.pixel_clock_khz * 1000) * 1000000; vblank_nom_time_us = line_time_us * in_out->base_display_config->display_config.stream_descriptors[i].timing.vblank_nom; - if (vblank_nom_time_us < pmo->soc_bb->power_management_parameters.z8_stutter_exit_latency_us) { + if (vblank_nom_time_us < pmo->soc_bb->power_management_parameters.z8_stutter_exit_latency_us * MIN_BLANK_STUTTER_FACTOR) { z8_stutter_optimization_too_expensive = true; break; } + + if (vblank_nom_time_us < pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us * MIN_BLANK_STUTTER_FACTOR) { + stutter_optimization_too_expensive = true; + break; + } } pmo->scratch.pmo_dcn4.num_stutter_candidates = 0; @@ -2180,7 +2187,7 @@ bool pmo_dcn4_fams2_init_for_stutter(struct dml2_pmo_init_for_stutter_in_out *in pmo->scratch.pmo_dcn4.z8_vblank_optimizable = false; } - if (pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us > 0) { + if (!stutter_optimization_too_expensive && pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us > 0) { pmo->scratch.pmo_dcn4.optimal_vblank_reserved_time_for_stutter_us[pmo->scratch.pmo_dcn4.num_stutter_candidates] = (unsigned int)pmo->soc_bb->power_management_parameters.stutter_enter_plus_exit_latency_us; pmo->scratch.pmo_dcn4.num_stutter_candidates++; } -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH 11/16] drm/amd/display: Handle dml allocation failure to avoid crash

by Hamza Mahfooz

From: Ryan Seto <ryanseto(a)amd.com> [Why] In the case where a dml allocation fails for any reason, the current state's dml contexts would no longer be valid. Then subsequent calls dc_state_copy_internal would shallow copy invalid memory and if the new state was released, a double free would occur. [How] Reset dml pointers in new_state to NULL and avoid invalid pointer Cc: stable(a)vger.kernel.org Reviewed-by: Dillon Varone <dillon.varone(a)amd.com> Signed-off-by: Ryan Seto <ryanseto(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- drivers/gpu/drm/amd/display/dc/core/dc_state.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_state.c b/drivers/gpu/drm/amd/display/dc/core/dc_state.c index 2597e3fd562b..e006f816ff2f 100644 --- a/drivers/gpu/drm/amd/display/dc/core/dc_state.c +++ b/drivers/gpu/drm/amd/display/dc/core/dc_state.c @@ -265,6 +265,9 @@ struct dc_state *dc_state_create_copy(struct dc_state *src_state) dc_state_copy_internal(new_state, src_state); #ifdef CONFIG_DRM_AMD_DC_FP + new_state->bw_ctx.dml2 = NULL; + new_state->bw_ctx.dml2_dc_power_source = NULL; + if (src_state->bw_ctx.dml2 && !dml2_create_copy(&new_state->bw_ctx.dml2, src_state->bw_ctx.dml2)) { dc_state_release(new_state); -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH 07/16] drm/amd/display: always blank stream before disable crtc

by Hamza Mahfooz

From: Fudongwang <Fudong.Wang(a)amd.com> Garbage will show due to dig is on. So blank stream needed. Cc: stable(a)vger.kernel.org Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com> Signed-off-by: Fudongwang <Fudong.Wang(a)amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com> --- .../gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c index 036cb7e9b5bb..59b2e87317e3 100644 --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn31/dcn31_hwseq.c @@ -519,15 +519,18 @@ static void dcn31_reset_back_end_for_pipe( dc->hwss.set_abm_immediate_disable(pipe_ctx); - if ((!pipe_ctx->stream->dpms_off || pipe_ctx->stream->link->link_status.link_active) - && pipe_ctx->stream->sink && pipe_ctx->stream->sink->edid_caps.panel_patch.blankstream_before_otg_off) { + link = pipe_ctx->stream->link; + + if ((!pipe_ctx->stream->dpms_off || link->link_status.link_active) && + (link->connector_signal == SIGNAL_TYPE_EDP)) dc->hwss.blank_stream(pipe_ctx); - } pipe_ctx->stream_res.tg->funcs->set_dsc_config( pipe_ctx->stream_res.tg, OPTC_DSC_DISABLED, 0, 0); + pipe_ctx->stream_res.tg->funcs->disable_crtc(pipe_ctx->stream_res.tg); + pipe_ctx->stream_res.tg->funcs->enable_optc_clock(pipe_ctx->stream_res.tg, false); if (pipe_ctx->stream_res.tg->funcs->set_odm_bypass) pipe_ctx->stream_res.tg->funcs->set_odm_bypass( @@ -539,7 +542,6 @@ static void dcn31_reset_back_end_for_pipe( pipe_ctx->stream_res.tg->funcs->set_drr( pipe_ctx->stream_res.tg, NULL); - link = pipe_ctx->stream->link; /* DPMS may already disable or */ /* dpms_off status is incorrect due to fastboot * feature. When system resume from S4 with second -- 2.46.1

1 year, 1 month

1
0
0 0

[PATCH] scsi: lpfc: Fix improper handling of refcount in lpfc_bsg_hba_set_event()

by Qiu-ji Chen

This patch fixes a reference count handling issue in the function lpfc_bsg_hba_set_event(). In the branch if (evt->reg_id == event_req->ev_reg_id), the function calls lpfc_bsg_event_ref(), which increments the reference count of the associated resource. However, in the subsequent branch if (&evt->node == &phba->ct_ev_waiters), a new evt is allocated, but the old evt should be released at this point. Failing to do so could lead to issues. To resolve this issue, we added a release instruction at the beginning of the next branch if (&evt->node == &phba->ct_ev_waiters), ensuring that the resources allocated in the previous branch are properly released, thereby preventing a reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4cc0e56e977f ("[SCSI] lpfc 8.3.8: (BSG3) Modify BSG commands to operate asynchronously") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/scsi/lpfc/lpfc_bsg.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/scsi/lpfc/lpfc_bsg.c b/drivers/scsi/lpfc/lpfc_bsg.c index 85059b83ea6b..3a65270c5584 100644 --- a/drivers/scsi/lpfc/lpfc_bsg.c +++ b/drivers/scsi/lpfc/lpfc_bsg.c @@ -1200,6 +1200,9 @@ lpfc_bsg_hba_set_event(struct bsg_job *job) spin_unlock_irqrestore(&phba->ct_ev_lock, flags); if (&evt->node == &phba->ct_ev_waiters) { + spin_lock_irqsave(&phba->ct_ev_lock, flags); + lpfc_bsg_event_unref(evt); + spin_unlock_irqrestore(&phba->ct_ev_lock, flags); /* no event waiting struct yet - first call */ dd_data = kmalloc(sizeof(struct bsg_job_data), GFP_KERNEL); if (dd_data == NULL) { -- 2.34.1

1 year, 1 month

2
1
0 0

[tip: x86/urgent] x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client

by tip-bot2 for Mario Limonciello

The following commit has been merged into the x86/urgent branch of tip: Commit-ID: a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0 Gitweb: https://git.kernel.org/tip/a5ca1dc46a6b610dd4627d8b633d6c84f9724ef0 Author: Mario Limonciello <mario.limonciello(a)amd.com> AuthorDate: Tue, 05 Nov 2024 10:02:34 -06:00 Committer: Borislav Petkov (AMD) <bp(a)alien8.de> CommitterDate: Tue, 05 Nov 2024 17:48:32 +01:00 x86/CPU/AMD: Clear virtualized VMLOAD/VMSAVE on Zen4 client A number of Zen4 client SoCs advertise the ability to use virtualized VMLOAD/VMSAVE, but using these instructions is reported to be a cause of a random host reboot. These instructions aren't intended to be advertised on Zen4 client so clear the capability. Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Cc: stable(a)vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219009 --- arch/x86/kernel/cpu/amd.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c index fab5cae..823f44f 100644 --- a/arch/x86/kernel/cpu/amd.c +++ b/arch/x86/kernel/cpu/amd.c @@ -924,6 +924,17 @@ static void init_amd_zen4(struct cpuinfo_x86 *c) { if (!cpu_has(c, X86_FEATURE_HYPERVISOR)) msr_set_bit(MSR_ZEN4_BP_CFG, MSR_ZEN4_BP_CFG_SHARED_BTB_FIX_BIT); + + /* + * These Zen4 SoCs advertise support for virtualized VMLOAD/VMSAVE + * in some BIOS versions but they can lead to random host reboots. + */ + switch (c->x86_model) { + case 0x18 ... 0x1f: + case 0x60 ... 0x7f: + clear_cpu_cap(c, X86_FEATURE_V_VMSAVE_VMLOAD); + break; + } } static void init_amd_zen5(struct cpuinfo_x86 *c)

1 year, 1 month

1
0
0 0

[PATCH] ACPICA: Fix dereference in acpi_ev_address_space_dispatch()

by George Rurikov

When support for PCC Opregion was added, validation of field_obj was missed. Based on the acpi_ev_address_space_dispatch function description, field_obj can be NULL, and also when acpi_ev_address_space_dispatch is called in the acpi_ex_region_read() NULL is passed as field_obj. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: 0acf24ad7e10 ("ACPICA: Add support for PCC Opregion special context data") Cc: stable(a)vger.kernel.org Signed-off-by: George Rurikov <grurikov(a)gmail.com> --- drivers/acpi/acpica/evregion.c | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/drivers/acpi/acpica/evregion.c b/drivers/acpi/acpica/evregion.c index cf53b9535f18..03e8b6f186af 100644 --- a/drivers/acpi/acpica/evregion.c +++ b/drivers/acpi/acpica/evregion.c @@ -164,13 +164,17 @@ acpi_ev_address_space_dispatch(union acpi_operand_object *region_obj, } if (region_obj->region.space_id == ACPI_ADR_SPACE_PLATFORM_COMM) { - struct acpi_pcc_info *ctx = - handler_desc->address_space.context; - - ctx->internal_buffer = - field_obj->field.internal_pcc_buffer; - ctx->length = (u16)region_obj->region.length; - ctx->subspace_id = (u8)region_obj->region.address; + if (field_obj != NULL) { + struct acpi_pcc_info *ctx = + handler_desc->address_space.context; + + ctx->internal_buffer = + field_obj->field.internal_pcc_buffer; + ctx->length = (u16)region_obj->region.length; + ctx->subspace_id = (u8)region_obj->region.address; + } else { + return_ACPI_STATUS(AE_ERROR); + } } if (region_obj->region.space_id == -- 2.34.1

1 year, 1 month

3
2
0 0

FAILED: patch "[PATCH] cxl/port: Fix CXL port initialization order when the" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110551-fragrance-deodorant-e7fa@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082..876469e23f7a 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52..2caa90fa4bf2 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768f..9dc394295e1f 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL);

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] cxl/port: Fix CXL port initialization order when the" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 6575b268157f37929948a8d1f3bafb3d7c055bc1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110550-chamomile-zit-2c77@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6575b268157f37929948a8d1f3bafb3d7c055bc1 Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Fri, 25 Oct 2024 12:32:55 -0700 Subject: [PATCH] cxl/port: Fix CXL port initialization order when the subsystem is built-in When the CXL subsystem is built-in the module init order is determined by Makefile order. That order violates expectations. The expectation is that cxl_acpi and cxl_mem can race to attach. If cxl_acpi wins the race, cxl_mem will find the enabled CXL root ports it needs. If cxl_acpi loses the race it will retrigger cxl_mem to attach via cxl_bus_rescan(). That flow only works if cxl_acpi can assume ports are enabled immediately upon cxl_acpi_probe() return. That in turn can only happen in the CONFIG_CXL_ACPI=y case if the cxl_port driver is registered before cxl_acpi_probe() runs. Fix up the order to prevent initialization failures. Ensure that cxl_port is built-in when cxl_acpi is also built-in, arrange for Makefile order to resolve the subsys_initcall() order of cxl_port and cxl_acpi, and arrange for Makefile order to resolve the device_initcall() (module_init()) order of the remaining objects. As for what contributed to this not being found earlier, the CXL regression environment, cxl_test, builds all CXL functionality as a module to allow to symbol mocking and other dynamic reload tests. As a result there is no regression coverage for the built-in case. Reported-by: Gregory Price <gourry(a)gourry.net> Closes: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net Tested-by: Gregory Price <gourry(a)gourry.net> Fixes: 8dd2bc0f8e02 ("cxl/mem: Add the cxl_mem driver") Cc: stable(a)vger.kernel.org Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Jonathan Cameron <jonathan.cameron(a)huawei.com> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Vishal Verma <vishal.l.verma(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Tested-by: Alejandro Lucero <alucerop(a)amd.com> Reviewed-by: Alejandro Lucero <alucerop(a)amd.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Link: https://patch.msgid.link/172988474904.476062.7961350937442459266.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> diff --git a/drivers/cxl/Kconfig b/drivers/cxl/Kconfig index 29c192f20082..876469e23f7a 100644 --- a/drivers/cxl/Kconfig +++ b/drivers/cxl/Kconfig @@ -60,6 +60,7 @@ config CXL_ACPI default CXL_BUS select ACPI_TABLE_LIB select ACPI_HMAT + select CXL_PORT help Enable support for host managed device memory (HDM) resources published by a platform's ACPI CXL memory layout description. See diff --git a/drivers/cxl/Makefile b/drivers/cxl/Makefile index db321f48ba52..2caa90fa4bf2 100644 --- a/drivers/cxl/Makefile +++ b/drivers/cxl/Makefile @@ -1,13 +1,21 @@ # SPDX-License-Identifier: GPL-2.0 + +# Order is important here for the built-in case: +# - 'core' first for fundamental init +# - 'port' before platform root drivers like 'acpi' so that CXL-root ports +# are immediately enabled +# - 'mem' and 'pmem' before endpoint drivers so that memdevs are +# immediately enabled +# - 'pci' last, also mirrors the hardware enumeration hierarchy obj-y += core/ -obj-$(CONFIG_CXL_PCI) += cxl_pci.o -obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PORT) += cxl_port.o obj-$(CONFIG_CXL_ACPI) += cxl_acpi.o obj-$(CONFIG_CXL_PMEM) += cxl_pmem.o -obj-$(CONFIG_CXL_PORT) += cxl_port.o +obj-$(CONFIG_CXL_MEM) += cxl_mem.o +obj-$(CONFIG_CXL_PCI) += cxl_pci.o -cxl_mem-y := mem.o -cxl_pci-y := pci.o +cxl_port-y := port.o cxl_acpi-y := acpi.o cxl_pmem-y := pmem.o security.o -cxl_port-y := port.o +cxl_mem-y := mem.o +cxl_pci-y := pci.o diff --git a/drivers/cxl/port.c b/drivers/cxl/port.c index 861dde65768f..9dc394295e1f 100644 --- a/drivers/cxl/port.c +++ b/drivers/cxl/port.c @@ -208,7 +208,22 @@ static struct cxl_driver cxl_port_driver = { }, }; -module_cxl_driver(cxl_port_driver); +static int __init cxl_port_init(void) +{ + return cxl_driver_register(&cxl_port_driver); +} +/* + * Be ready to immediately enable ports emitted by the platform CXL root + * (e.g. cxl_acpi) when CONFIG_CXL_PORT=y. + */ +subsys_initcall(cxl_port_init); + +static void __exit cxl_port_exit(void) +{ + cxl_driver_unregister(&cxl_port_driver); +} +module_exit(cxl_port_exit); + MODULE_DESCRIPTION("CXL: Port enumeration and services"); MODULE_LICENSE("GPL v2"); MODULE_IMPORT_NS(CXL);

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 101c268bd2f37e965a5468353e62d154db38838e # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110539-swooned-cold-53f5@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 101c268bd2f37e965a5468353e62d154db38838e Mon Sep 17 00:00:00 2001 From: Dan Williams <dan.j.williams(a)intel.com> Date: Tue, 22 Oct 2024 18:43:49 -0700 Subject: [PATCH] cxl/port: Fix use-after-free, permit out-of-order decoder shutdown In support of investigating an initialization failure report [1], cxl_test was updated to register mock memory-devices after the mock root-port/bus device had been registered. That led to cxl_test crashing with a use-after-free bug with the following signature: cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem0:decoder7.0 @ 0 next: cxl_switch_uport.0 nr_eps: 1 nr_targets: 1 cxl_port_attach_region: cxl region3: cxl_host_bridge.0:port3 decoder3.0 add: mem4:decoder14.0 @ 1 next: cxl_switch_uport.0 nr_eps: 2 nr_targets: 1 cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[0] = cxl_switch_dport.0 for mem0:decoder7.0 @ 0 1) cxl_port_setup_targets: cxl region3: cxl_switch_uport.0:port6 target[1] = cxl_switch_dport.4 for mem4:decoder14.0 @ 1 [..] cxld_unregister: cxl decoder14.0: cxl_region_decode_reset: cxl_region region3: mock_decoder_reset: cxl_port port3: decoder3.0 reset 2) mock_decoder_reset: cxl_port port3: decoder3.0: out of order reset, expected decoder3.1 cxl_endpoint_decoder_release: cxl decoder14.0: [..] cxld_unregister: cxl decoder7.0: 3) cxl_region_decode_reset: cxl_region region3: Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6bc3: 0000 [#1] PREEMPT SMP PTI [..] RIP: 0010:to_cxl_port+0x8/0x60 [cxl_core] [..] Call Trace: <TASK> cxl_region_decode_reset+0x69/0x190 [cxl_core] cxl_region_detach+0xe8/0x210 [cxl_core] cxl_decoder_kill_region+0x27/0x40 [cxl_core] cxld_unregister+0x5d/0x60 [cxl_core] At 1) a region has been established with 2 endpoint decoders (7.0 and 14.0). Those endpoints share a common switch-decoder in the topology (3.0). At teardown, 2), decoder14.0 is the first to be removed and hits the "out of order reset case" in the switch decoder. The effect though is that region3 cleanup is aborted leaving it in-tact and referencing decoder14.0. At 3) the second attempt to teardown region3 trips over the stale decoder14.0 object which has long since been deleted. The fix here is to recognize that the CXL specification places no mandate on in-order shutdown of switch-decoders, the driver enforces in-order allocation, and hardware enforces in-order commit. So, rather than fail and leave objects dangling, always remove them. In support of making cxl_region_decode_reset() always succeed, cxl_region_invalidate_memregion() failures are turned into warnings. Crashing the kernel is ok there since system integrity is at risk if caches cannot be managed around physical address mutation events like CXL region destruction. A new device_for_each_child_reverse_from() is added to cleanup port->commit_end after all dependent decoders have been disabled. In other words if decoders are allocated 0->1->2 and disabled 1->2->0 then port->commit_end only decrements from 2 after 2 has been disabled, and it decrements all the way to zero since 1 was disabled previously. Link: http://lore.kernel.org/20241004212504.1246-1-gourry@gourry.net [1] Cc: stable(a)vger.kernel.org Fixes: 176baefb2eb5 ("cxl/hdm: Commit decoder state to hardware") Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Cc: Davidlohr Bueso <dave(a)stgolabs.net> Cc: Dave Jiang <dave.jiang(a)intel.com> Cc: Alison Schofield <alison.schofield(a)intel.com> Cc: Ira Weiny <ira.weiny(a)intel.com> Cc: Zijun Hu <quic_zijuhu(a)quicinc.com> Signed-off-by: Dan Williams <dan.j.williams(a)intel.com> Reviewed-by: Ira Weiny <ira.weiny(a)intel.com> Link: https://patch.msgid.link/172964782781.81806.17902885593105284330.stgit@dwil… Signed-off-by: Ira Weiny <ira.weiny(a)intel.com> diff --git a/drivers/base/core.c b/drivers/base/core.c index a4c853411a6b..e42f1ad73078 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -4037,6 +4037,41 @@ int device_for_each_child_reverse(struct device *parent, void *data, } EXPORT_SYMBOL_GPL(device_for_each_child_reverse); +/** + * device_for_each_child_reverse_from - device child iterator in reversed order. + * @parent: parent struct device. + * @from: optional starting point in child list + * @fn: function to be called for each device. + * @data: data for the callback. + * + * Iterate over @parent's child devices, starting at @from, and call @fn + * for each, passing it @data. This helper is identical to + * device_for_each_child_reverse() when @from is NULL. + * + * @fn is checked each iteration. If it returns anything other than 0, + * iteration stop and that value is returned to the caller of + * device_for_each_child_reverse_from(); + */ +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)) +{ + struct klist_iter i; + struct device *child; + int error = 0; + + if (!parent->p) + return 0; + + klist_iter_init_node(&parent->p->klist_children, &i, + (from ? &from->p->knode_parent : NULL)); + while ((child = prev_device(&i)) && !error) + error = fn(child, data); + klist_iter_exit(&i); + return error; +} +EXPORT_SYMBOL_GPL(device_for_each_child_reverse_from); + /** * device_find_child - device iterator for locating a particular device. * @parent: parent struct device diff --git a/drivers/cxl/core/hdm.c b/drivers/cxl/core/hdm.c index 3df10517a327..223c273c0cd1 100644 --- a/drivers/cxl/core/hdm.c +++ b/drivers/cxl/core/hdm.c @@ -712,7 +712,44 @@ static int cxl_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int cxl_decoder_reset(struct cxl_decoder *cxld) +static int commit_reap(struct device *dev, const void *data) +{ + struct cxl_port *port = to_cxl_port(dev->parent); + struct cxl_decoder *cxld; + + if (!is_switch_decoder(dev) && !is_endpoint_decoder(dev)) + return 0; + + cxld = to_cxl_decoder(dev); + if (port->commit_end == cxld->id && + ((cxld->flags & CXL_DECODER_F_ENABLE) == 0)) { + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", + dev_name(&cxld->dev), port->commit_end); + } + + return 0; +} + +void cxl_port_commit_reap(struct cxl_decoder *cxld) +{ + struct cxl_port *port = to_cxl_port(cxld->dev.parent); + + lockdep_assert_held_write(&cxl_region_rwsem); + + /* + * Once the highest committed decoder is disabled, free any other + * decoders that were pinned allocated by out-of-order release. + */ + port->commit_end--; + dev_dbg(&port->dev, "reap: %s commit_end: %d\n", dev_name(&cxld->dev), + port->commit_end); + device_for_each_child_reverse_from(&port->dev, &cxld->dev, NULL, + commit_reap); +} +EXPORT_SYMBOL_NS_GPL(cxl_port_commit_reap, CXL); + +static void cxl_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); struct cxl_hdm *cxlhdm = dev_get_drvdata(&port->dev); @@ -721,14 +758,14 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) u32 ctrl; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } down_read(&cxl_dpa_rwsem); ctrl = readl(hdm + CXL_HDM_DECODER0_CTRL_OFFSET(id)); @@ -741,7 +778,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) writel(0, hdm + CXL_HDM_DECODER0_BASE_LOW_OFFSET(id)); up_read(&cxl_dpa_rwsem); - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; /* Userspace is now responsible for reconfiguring this decoder */ @@ -751,8 +787,6 @@ static int cxl_decoder_reset(struct cxl_decoder *cxld) cxled = to_cxl_endpoint_decoder(&cxld->dev); cxled->state = CXL_DECODER_STATE_MANUAL; } - - return 0; } static int cxl_setup_hdm_decoder_from_dvsec( diff --git a/drivers/cxl/core/region.c b/drivers/cxl/core/region.c index e701e4b04032..3478d2058303 100644 --- a/drivers/cxl/core/region.c +++ b/drivers/cxl/core/region.c @@ -232,8 +232,8 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) "Bypassing cpu_cache_invalidate_memregion() for testing!\n"); return 0; } else { - dev_err(&cxlr->dev, - "Failed to synchronize CPU cache state\n"); + dev_WARN(&cxlr->dev, + "Failed to synchronize CPU cache state\n"); return -ENXIO; } } @@ -242,19 +242,17 @@ static int cxl_region_invalidate_memregion(struct cxl_region *cxlr) return 0; } -static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) +static void cxl_region_decode_reset(struct cxl_region *cxlr, int count) { struct cxl_region_params *p = &cxlr->params; - int i, rc = 0; + int i; /* - * Before region teardown attempt to flush, and if the flush - * fails cancel the region teardown for data consistency - * concerns + * Before region teardown attempt to flush, evict any data cached for + * this region, or scream loudly about missing arch / platform support + * for CXL teardown. */ - rc = cxl_region_invalidate_memregion(cxlr); - if (rc) - return rc; + cxl_region_invalidate_memregion(cxlr); for (i = count - 1; i >= 0; i--) { struct cxl_endpoint_decoder *cxled = p->targets[i]; @@ -277,23 +275,17 @@ static int cxl_region_decode_reset(struct cxl_region *cxlr, int count) cxl_rr = cxl_rr_load(iter, cxlr); cxld = cxl_rr->decoder; if (cxld->reset) - rc = cxld->reset(cxld); - if (rc) - return rc; + cxld->reset(cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } endpoint_reset: - rc = cxled->cxld.reset(&cxled->cxld); - if (rc) - return rc; + cxled->cxld.reset(&cxled->cxld); set_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); } /* all decoders associated with this region have been torn down */ clear_bit(CXL_REGION_F_NEEDS_RESET, &cxlr->flags); - - return 0; } static int commit_decoder(struct cxl_decoder *cxld) @@ -409,16 +401,8 @@ static ssize_t commit_store(struct device *dev, struct device_attribute *attr, * still pending. */ if (p->state == CXL_CONFIG_RESET_PENDING) { - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - /* - * Revert to committed since there may still be active - * decoders associated with this region, or move forward - * to active to mark the reset successful - */ - if (rc) - p->state = CXL_CONFIG_COMMIT; - else - p->state = CXL_CONFIG_ACTIVE; + cxl_region_decode_reset(cxlr, p->interleave_ways); + p->state = CXL_CONFIG_ACTIVE; } } @@ -2054,13 +2038,7 @@ static int cxl_region_detach(struct cxl_endpoint_decoder *cxled) get_device(&cxlr->dev); if (p->state > CXL_CONFIG_ACTIVE) { - /* - * TODO: tear down all impacted regions if a device is - * removed out of order - */ - rc = cxl_region_decode_reset(cxlr, p->interleave_ways); - if (rc) - goto out; + cxl_region_decode_reset(cxlr, p->interleave_ways); p->state = CXL_CONFIG_ACTIVE; } diff --git a/drivers/cxl/cxl.h b/drivers/cxl/cxl.h index 0d8b810a51f0..5406e3ab3d4a 100644 --- a/drivers/cxl/cxl.h +++ b/drivers/cxl/cxl.h @@ -359,7 +359,7 @@ struct cxl_decoder { struct cxl_region *region; unsigned long flags; int (*commit)(struct cxl_decoder *cxld); - int (*reset)(struct cxl_decoder *cxld); + void (*reset)(struct cxl_decoder *cxld); }; /* @@ -730,6 +730,7 @@ static inline bool is_cxl_root(struct cxl_port *port) int cxl_num_decoders_committed(struct cxl_port *port); bool is_cxl_port(const struct device *dev); struct cxl_port *to_cxl_port(const struct device *dev); +void cxl_port_commit_reap(struct cxl_decoder *cxld); struct pci_bus; int devm_cxl_register_pci_bus(struct device *host, struct device *uport_dev, struct pci_bus *bus); diff --git a/include/linux/device.h b/include/linux/device.h index b4bde8d22697..667cb6db9019 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1078,6 +1078,9 @@ int device_for_each_child(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); int device_for_each_child_reverse(struct device *dev, void *data, int (*fn)(struct device *dev, void *data)); +int device_for_each_child_reverse_from(struct device *parent, + struct device *from, const void *data, + int (*fn)(struct device *, const void *)); struct device *device_find_child(struct device *dev, void *data, int (*match)(struct device *dev, void *data)); struct device *device_find_child_by_name(struct device *parent, diff --git a/tools/testing/cxl/test/cxl.c b/tools/testing/cxl/test/cxl.c index 90d5afd52dd0..c5bbd89b3192 100644 --- a/tools/testing/cxl/test/cxl.c +++ b/tools/testing/cxl/test/cxl.c @@ -693,26 +693,22 @@ static int mock_decoder_commit(struct cxl_decoder *cxld) return 0; } -static int mock_decoder_reset(struct cxl_decoder *cxld) +static void mock_decoder_reset(struct cxl_decoder *cxld) { struct cxl_port *port = to_cxl_port(cxld->dev.parent); int id = cxld->id; if ((cxld->flags & CXL_DECODER_F_ENABLE) == 0) - return 0; + return; dev_dbg(&port->dev, "%s reset\n", dev_name(&cxld->dev)); - if (port->commit_end != id) { + if (port->commit_end == id) + cxl_port_commit_reap(cxld); + else dev_dbg(&port->dev, "%s: out of order reset, expected decoder%d.%d\n", dev_name(&cxld->dev), port->id, port->commit_end); - return -EBUSY; - } - - port->commit_end--; cxld->flags &= ~CXL_DECODER_F_ENABLE; - - return 0; } static void default_mock_decoder(struct cxl_decoder *cxld)

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] soc: qcom: pmic_glink: Handle GLINK intent allocation" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x f8c879192465d9f328cb0df07208ef077c560bb1 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110503-registry-reheat-cd11@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From f8c879192465d9f328cb0df07208ef077c560bb1 Mon Sep 17 00:00:00 2001 From: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com> Date: Wed, 23 Oct 2024 17:24:33 +0000 Subject: [PATCH] soc: qcom: pmic_glink: Handle GLINK intent allocation rejections Some versions of the pmic_glink firmware does not allow dynamic GLINK intent allocations, attempting to send a message before the firmware has allocated its receive buffers and announced these intent allocations will fail. When this happens something like this showns up in the log: pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to send altmode request: 0x10 (-125) pmic_glink_altmode.pmic_glink_altmode pmic_glink.altmode.0: failed to request altmode notifications: -125 ucsi_glink.pmic_glink_ucsi pmic_glink.ucsi.0: failed to send UCSI read request: -125 qcom_battmgr.pmic_glink_power_supply pmic_glink.power-supply.0: failed to request power notifications GLINK has been updated to distinguish between the cases where the remote is going down (-ECANCELED) and the intent allocation being rejected (-EAGAIN). Retry the send until intent buffers becomes available, or an actual error occur. To avoid infinitely waiting for the firmware in the event that this misbehaves and no intents arrive, an arbitrary 5 second timeout is used. This patch was developed with input from Chris Lew. Reported-by: Johan Hovold <johan(a)kernel.org> Closes: https://lore.kernel.org/all/Zqet8iInnDhnxkT9@hovoldconsulting.com/#t Cc: stable(a)vger.kernel.org # rpmsg: glink: Handle rejected intent request better Fixes: 58ef4ece1e41 ("soc: qcom: pmic_glink: Introduce base PMIC GLINK driver") Tested-by: Johan Hovold <johan+linaro(a)kernel.org> Reviewed-by: Johan Hovold <johan+linaro(a)kernel.org> Signed-off-by: Bjorn Andersson <bjorn.andersson(a)oss.qualcomm.com> Reviewed-by: Chris Lew <quic_clew(a)quicinc.com> Link: https://lore.kernel.org/r/20241023-pmic-glink-ecancelled-v2-2-ebc268129407@… Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> diff --git a/drivers/soc/qcom/pmic_glink.c b/drivers/soc/qcom/pmic_glink.c index 9606222993fd..baa4ac6704a9 100644 --- a/drivers/soc/qcom/pmic_glink.c +++ b/drivers/soc/qcom/pmic_glink.c @@ -4,6 +4,7 @@ * Copyright (c) 2022, Linaro Ltd */ #include <linux/auxiliary_bus.h> +#include <linux/delay.h> #include <linux/module.h> #include <linux/of.h> #include <linux/platform_device.h> @@ -13,6 +14,8 @@ #include <linux/soc/qcom/pmic_glink.h> #include <linux/spinlock.h> +#define PMIC_GLINK_SEND_TIMEOUT (5 * HZ) + enum { PMIC_GLINK_CLIENT_BATT = 0, PMIC_GLINK_CLIENT_ALTMODE, @@ -112,13 +115,29 @@ EXPORT_SYMBOL_GPL(pmic_glink_client_register); int pmic_glink_send(struct pmic_glink_client *client, void *data, size_t len) { struct pmic_glink *pg = client->pg; + bool timeout_reached = false; + unsigned long start; int ret; mutex_lock(&pg->state_lock); - if (!pg->ept) + if (!pg->ept) { ret = -ECONNRESET; - else - ret = rpmsg_send(pg->ept, data, len); + } else { + start = jiffies; + for (;;) { + ret = rpmsg_send(pg->ept, data, len); + if (ret != -EAGAIN) + break; + + if (timeout_reached) { + ret = -ETIMEDOUT; + break; + } + + usleep_range(1000, 5000); + timeout_reached = time_after(jiffies, start + PMIC_GLINK_SEND_TIMEOUT); + } + } mutex_unlock(&pg->state_lock); return ret;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] firmware: qcom: scm: suppress download mode error" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x d67907154808745b0fae5874edc7b0f78d33991c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110527-maternal-devious-a112@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d67907154808745b0fae5874edc7b0f78d33991c Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Wed, 2 Oct 2024 12:01:21 +0200 Subject: [PATCH] firmware: qcom: scm: suppress download mode error Stop spamming the logs with errors about missing mechanism for setting the so called download (or dump) mode for users that have not requested that feature to be enabled in the first place. This avoids the follow error being logged on boot as well as on shutdown when the feature it not available and download mode has not been enabled on the kernel command line: qcom_scm firmware:scm: No available mechanism for setting download mode Fixes: 79cb2cb8d89b ("firmware: qcom: scm: Disable SDI and write no dump to dump mode") Fixes: 781d32d1c970 ("firmware: qcom_scm: Clear download bit during reboot") Cc: Mukesh Ojha <quic_mojha(a)quicinc.com> Cc: stable(a)vger.kernel.org # 6.4 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com> Link: https://lore.kernel.org/r/20241002100122.18809-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c index 10986cb11ec0..e2ac595902ed 100644 --- a/drivers/firmware/qcom/qcom_scm.c +++ b/drivers/firmware/qcom/qcom_scm.c @@ -545,7 +545,7 @@ static void qcom_scm_set_download_mode(u32 dload_mode) } else if (__qcom_scm_is_call_available(__scm->dev, QCOM_SCM_SVC_BOOT, QCOM_SCM_BOOT_SET_DLOAD_MODE)) { ret = __qcom_scm_set_dload_mode(__scm->dev, !!dload_mode); - } else { + } else if (dload_mode) { dev_err(__scm->dev, "No available mechanism for setting download mode\n"); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] firmware: qcom: scm: suppress download mode error" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x d67907154808745b0fae5874edc7b0f78d33991c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110526-driven-small-2a5c@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From d67907154808745b0fae5874edc7b0f78d33991c Mon Sep 17 00:00:00 2001 From: Johan Hovold <johan+linaro(a)kernel.org> Date: Wed, 2 Oct 2024 12:01:21 +0200 Subject: [PATCH] firmware: qcom: scm: suppress download mode error Stop spamming the logs with errors about missing mechanism for setting the so called download (or dump) mode for users that have not requested that feature to be enabled in the first place. This avoids the follow error being logged on boot as well as on shutdown when the feature it not available and download mode has not been enabled on the kernel command line: qcom_scm firmware:scm: No available mechanism for setting download mode Fixes: 79cb2cb8d89b ("firmware: qcom: scm: Disable SDI and write no dump to dump mode") Fixes: 781d32d1c970 ("firmware: qcom_scm: Clear download bit during reboot") Cc: Mukesh Ojha <quic_mojha(a)quicinc.com> Cc: stable(a)vger.kernel.org # 6.4 Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org> Reviewed-by: Mukesh Ojha <quic_mojha(a)quicinc.com> Link: https://lore.kernel.org/r/20241002100122.18809-2-johan+linaro@kernel.org Signed-off-by: Bjorn Andersson <andersson(a)kernel.org> diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c index 10986cb11ec0..e2ac595902ed 100644 --- a/drivers/firmware/qcom/qcom_scm.c +++ b/drivers/firmware/qcom/qcom_scm.c @@ -545,7 +545,7 @@ static void qcom_scm_set_download_mode(u32 dload_mode) } else if (__qcom_scm_is_call_available(__scm->dev, QCOM_SCM_SVC_BOOT, QCOM_SCM_BOOT_SET_DLOAD_MODE)) { ret = __qcom_scm_set_dload_mode(__scm->dev, !!dload_mode); - } else { + } else if (dload_mode) { dev_err(__scm->dev, "No available mechanism for setting download mode\n"); }

1 year, 1 month

1
0
0 0

[PATCH] ASoC: codecs: Fix atomicity violation in snd_soc_component_get_drvdata()

by Qiu-ji Chen

An atomicity violation occurs when the validity of the variables da7219->clk_src and da7219->mclk_rate is being assessed. Since the entire assessment is not protected by a lock, the da7219 variable might still be in flux during the assessment, rendering this check invalid. To fix this issue, we recommend adding a lock before the block if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) so that the legitimacy check for da7219->clk_src and da7219->mclk_rate is protected by the lock, ensuring the validity of the check. This possible bug is found by an experimental static analysis tool developed by our team. This tool analyzes the locking APIs to extract function pairs that can be concurrently executed, and then analyzes the instructions in the paired functions to identify possible concurrency bugs including data races and atomicity violations. Fixes: 6d817c0e9fd7 ("ASoC: codecs: Add da7219 codec driver") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- sound/soc/codecs/da7219.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/sound/soc/codecs/da7219.c b/sound/soc/codecs/da7219.c index 311ea7918b31..e2da3e317b5a 100644 --- a/sound/soc/codecs/da7219.c +++ b/sound/soc/codecs/da7219.c @@ -1167,17 +1167,20 @@ static int da7219_set_dai_sysclk(struct snd_soc_dai *codec_dai, struct da7219_priv *da7219 = snd_soc_component_get_drvdata(component); int ret = 0; - if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) + mutex_lock(&da7219->pll_lock); + + if ((da7219->clk_src == clk_id) && (da7219->mclk_rate == freq)) { + mutex_unlock(&da7219->pll_lock); return 0; + } if ((freq < 2000000) || (freq > 54000000)) { + mutex_unlock(&da7219->pll_lock); dev_err(codec_dai->dev, "Unsupported MCLK value %d\n", freq); return -EINVAL; } - mutex_lock(&da7219->pll_lock); - switch (clk_id) { case DA7219_CLKSRC_MCLK_SQR: snd_soc_component_update_bits(component, DA7219_PLL_CTRL, -- 2.34.1

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] mm: allow set/clear page_type again" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 9d08ec41a0645283d79a2e642205d488feaceacf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110524-patience-rice-79e2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a76710487..cc839e4365c1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: allow set/clear page_type again" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 9d08ec41a0645283d79a2e642205d488feaceacf # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110524-runner-gravity-4288@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 9d08ec41a0645283d79a2e642205d488feaceacf Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 22:22:12 -0600 Subject: [PATCH] mm: allow set/clear page_type again Some page flags (page->flags) were converted to page types (page->page_types). A recent example is PG_hugetlb. From the exclusive writer's perspective, e.g., a thread doing __folio_set_hugetlb(), there is a difference between the page flag and type APIs: the former allows the same non-atomic operation to be repeated whereas the latter does not. For example, calling __folio_set_hugetlb() twice triggers VM_BUG_ON_FOLIO(), since the second call expects the type (PG_hugetlb) not to be set previously. Using add_hugetlb_folio() as an example, it calls __folio_set_hugetlb() in the following error-handling path. And when that happens, it triggers the aforementioned VM_BUG_ON_FOLIO(). if (folio_test_hugetlb(folio)) { rc = hugetlb_vmemmap_restore_folio(h, folio); if (rc) { spin_lock_irq(&hugetlb_lock); add_hugetlb_folio(h, folio, false); ... It is possible to make hugeTLB comply with the new requirements from the page type API. However, a straightforward fix would be to just allow the same page type to be set or cleared again inside the API, to avoid any changes to its callers. Link: https://lkml.kernel.org/r/20241020042212.296781-1-yuzhao@google.com Fixes: d99e3140a4d3 ("mm: turn folio_test_hugetlb into a PageType") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Muchun Song <muchun.song(a)linux.dev> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 1b3a76710487..cc839e4365c1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -975,12 +975,16 @@ static __always_inline bool folio_test_##fname(const struct folio *folio) \ } \ static __always_inline void __folio_set_##fname(struct folio *folio) \ { \ + if (folio_test_##fname(folio)) \ + return; \ VM_BUG_ON_FOLIO(data_race(folio->page.page_type) != UINT_MAX, \ folio); \ folio->page.page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __folio_clear_##fname(struct folio *folio) \ { \ + if (folio->page.page_type == UINT_MAX) \ + return; \ VM_BUG_ON_FOLIO(!folio_test_##fname(folio), folio); \ folio->page.page_type = UINT_MAX; \ } @@ -993,11 +997,15 @@ static __always_inline int Page##uname(const struct page *page) \ } \ static __always_inline void __SetPage##uname(struct page *page) \ { \ + if (Page##uname(page)) \ + return; \ VM_BUG_ON_PAGE(data_race(page->page_type) != UINT_MAX, page); \ page->page_type = (unsigned int)PGTY_##lname << 24; \ } \ static __always_inline void __ClearPage##uname(struct page *page) \ { \ + if (page->page_type == UINT_MAX) \ + return; \ VM_BUG_ON_PAGE(!Page##uname(page), page); \ page->page_type = UINT_MAX; \ }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110510-props-clutter-f4ae@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab..5b1c984daf45 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d49..73d5998677d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b360..ddaaff67642e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /******************************************************************************

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110509-doable-blemish-19d2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab..5b1c984daf45 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d49..73d5998677d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b360..ddaaff67642e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /******************************************************************************

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify()" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110508-poppy-cameo-fa1b@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 1d4832becdc2cdb2cffe2a6050c9d9fd8ff1c58c Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:39 +0000 Subject: [PATCH] mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9342e5692dab..5b1c984daf45 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(struct lruvec *lruvec) { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) diff --git a/mm/rmap.c b/mm/rmap.c index a8797d1b3d49..73d5998677d4 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct folio *folio, return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; diff --git a/mm/vmscan.c b/mm/vmscan.c index 4f1d33e4b360..ddaaff67642e 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long mask, unsigned long size, struct mm_walk return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsigned long pfn, struct mem_cgroup *memcg, { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t *pud, unsigned long addr, struct vm_area /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pglist_data *pgdat, struct scan_control *sc) * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /******************************************************************************

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-flagship-precook-4a47@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a2835..9342e5692dab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c507..4f1d33e4b360 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110502-removal-babied-f697@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a2835..9342e5692dab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c507..4f1d33e4b360 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x ddd6d8e975b171ea3f63a011a75820883ff0d479 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110501-scrubber-eating-d64d@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From ddd6d8e975b171ea3f63a011a75820883ff0d479 Mon Sep 17 00:00:00 2001 From: Yu Zhao <yuzhao(a)google.com> Date: Sat, 19 Oct 2024 01:29:38 +0000 Subject: [PATCH] mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 17506e4a2835..9342e5692dab 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS diff --git a/mm/vmscan.c b/mm/vmscan.c index eb4e8440c507..4f1d33e4b360 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -3399,7 +3399,6 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end, continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ static void walk_pmd_range(pud_t *pud, unsigned long start, unsigned long end, continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec, seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); }

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110506-octane-phosphate-f084@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <baohua(a)kernel.org> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbb..bdf77a3ec47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110505-january-napped-4f1b@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <baohua(a)kernel.org> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbb..bdf77a3ec47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 01626a18230246efdcea322aa8f067e60ffe5ccd # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-water-overarch-bbef@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 01626a18230246efdcea322aa8f067e60ffe5ccd Mon Sep 17 00:00:00 2001 From: Barry Song <baohua(a)kernel.org> Date: Fri, 27 Sep 2024 09:19:36 +1200 Subject: [PATCH] mm: avoid unconditional one-tick sleep when swapcache_prepare fails Commit 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") introduced an unconditional one-tick sleep when `swapcache_prepare()` fails, which has led to reports of UI stuttering on latency-sensitive Android devices. To address this, we can use a waitqueue to wake up tasks that fail `swapcache_prepare()` sooner, instead of always sleeping for a full tick. While tasks may occasionally be woken by an unrelated `do_swap_page()`, this method is preferable to two scenarios: rapid re-entry into page faults, which can cause livelocks, and multiple millisecond sleeps, which visibly degrade user experience. Oven's testing shows that a single waitqueue resolves the UI stuttering issue. If a 'thundering herd' problem becomes apparent later, a waitqueue hash similar to `folio_wait_table[PAGE_WAIT_TABLE_SIZE]` for page bit locks can be introduced. [v-songbaohua(a)oppo.com: wake_up only when swapcache_wq waitqueue is active] Link: https://lkml.kernel.org/r/20241008130807.40833-1-21cnbao@gmail.com Link: https://lkml.kernel.org/r/20240926211936.75373-1-21cnbao@gmail.com Fixes: 13ddaf26be32 ("mm/swap: fix race when skipping swapcache") Signed-off-by: Barry Song <v-songbaohua(a)oppo.com> Reported-by: Oven Liyang <liyangouwen1(a)oppo.com> Tested-by: Oven Liyang <liyangouwen1(a)oppo.com> Cc: Kairui Song <kasong(a)tencent.com> Cc: "Huang, Ying" <ying.huang(a)intel.com> Cc: Yu Zhao <yuzhao(a)google.com> Cc: David Hildenbrand <david(a)redhat.com> Cc: Chris Li <chrisl(a)kernel.org> Cc: Hugh Dickins <hughd(a)google.com> Cc: Johannes Weiner <hannes(a)cmpxchg.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Michal Hocko <mhocko(a)suse.com> Cc: Minchan Kim <minchan(a)kernel.org> Cc: Yosry Ahmed <yosryahmed(a)google.com> Cc: SeongJae Park <sj(a)kernel.org> Cc: Kalesh Singh <kaleshsingh(a)google.com> Cc: Suren Baghdasaryan <surenb(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/mm/memory.c b/mm/memory.c index 3ccee51adfbb..bdf77a3ec47b 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4187,6 +4187,8 @@ static struct folio *alloc_swap_folio(struct vm_fault *vmf) } #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +static DECLARE_WAIT_QUEUE_HEAD(swapcache_wq); + /* * We enter with non-exclusive mmap_lock (to exclude vma changes, * but allow concurrent faults), and pte mapped but not yet locked. @@ -4199,6 +4201,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) { struct vm_area_struct *vma = vmf->vma; struct folio *swapcache, *folio = NULL; + DECLARE_WAITQUEUE(wait, current); struct page *page; struct swap_info_struct *si = NULL; rmap_t rmap_flags = RMAP_NONE; @@ -4297,7 +4300,9 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) * Relax a bit to prevent rapid * repeated page faults. */ + add_wait_queue(&swapcache_wq, &wait); schedule_timeout_uninterruptible(1); + remove_wait_queue(&swapcache_wq, &wait); goto out_page; } need_clear_cache = true; @@ -4604,8 +4609,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) pte_unmap_unlock(vmf->pte, vmf->ptl); out: /* Clear the swap cache pin for direct swapin after PTL unlock */ - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret; @@ -4620,8 +4628,11 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) folio_unlock(swapcache); folio_put(swapcache); } - if (need_clear_cache) + if (need_clear_cache) { swapcache_clear(si, entry, nr_pages); + if (waitqueue_active(&swapcache_wq)) + wake_up(&swapcache_wq); + } if (si) put_swap_device(si); return ret;

1 year, 1 month

1
0
0 0

[PATCH 1/2 net V2] net: vertexcom: mse102x: Fix possible double free of TX skb

by Stefan Wahren

The scope of the TX skb is wider than just mse102x_tx_frame_spi(), so in case the TX skb room needs to be expanded, we should free the the temporary skb instead of the original skb. Otherwise the original TX skb pointer would be freed again in mse102x_tx_work(), which leads to crashes: Internal error: Oops: 0000000096000004 [#2] PREEMPT SMP CPU: 0 PID: 712 Comm: kworker/0:1 Tainted: G D 6.6.23 Hardware name: chargebyte Charge SOM DC-ONE (DT) Workqueue: events mse102x_tx_work [mse102x] pstate: 20400009 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : skb_release_data+0xb8/0x1d8 lr : skb_release_data+0x1ac/0x1d8 sp : ffff8000819a3cc0 x29: ffff8000819a3cc0 x28: ffff0000046daa60 x27: ffff0000057f2dc0 x26: ffff000005386c00 x25: 0000000000000002 x24: 00000000ffffffff x23: 0000000000000000 x22: 0000000000000001 x21: ffff0000057f2e50 x20: 0000000000000006 x19: 0000000000000000 x18: ffff00003fdacfcc x17: e69ad452d0c49def x16: 84a005feff870102 x15: 0000000000000000 x14: 000000000000024a x13: 0000000000000002 x12: 0000000000000000 x11: 0000000000000400 x10: 0000000000000930 x9 : ffff00003fd913e8 x8 : fffffc00001bc008 x7 : 0000000000000000 x6 : 0000000000000008 x5 : ffff00003fd91340 x4 : 0000000000000000 x3 : 0000000000000009 x2 : 00000000fffffffe x1 : 0000000000000000 x0 : 0000000000000000 Call trace: skb_release_data+0xb8/0x1d8 kfree_skb_reason+0x48/0xb0 mse102x_tx_work+0x164/0x35c [mse102x] process_one_work+0x138/0x260 worker_thread+0x32c/0x438 kthread+0x118/0x11c ret_from_fork+0x10/0x20 Code: aa1303e0 97fffab6 72001c1f 54000141 (f9400660) Cc: stable(a)vger.kernel.org Fixes: 2f207cbf0dd4 ("net: vertexcom: Add MSE102x SPI support") Signed-off-by: Stefan Wahren <wahrenst(a)gmx.net> --- drivers/net/ethernet/vertexcom/mse102x.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/ethernet/vertexcom/mse102x.c b/drivers/net/ethernet/vertexcom/mse102x.c index a04d4073def9..2c37957478fb 100644 --- a/drivers/net/ethernet/vertexcom/mse102x.c +++ b/drivers/net/ethernet/vertexcom/mse102x.c @@ -222,7 +222,7 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, struct mse102x_net_spi *mses = to_mse102x_spi(mse); struct spi_transfer *xfer = &mses->spi_xfer; struct spi_message *msg = &mses->spi_msg; - struct sk_buff *tskb; + struct sk_buff *tskb = NULL; int ret; netif_dbg(mse, tx_queued, mse->ndev, "%s: skb %p, %d@%p\n", @@ -235,7 +235,6 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, if (!tskb) return -ENOMEM; - dev_kfree_skb(txp); txp = tskb; } @@ -257,6 +256,8 @@ static int mse102x_tx_frame_spi(struct mse102x_net *mse, struct sk_buff *txp, mse->stats.xfer_err++; } + dev_kfree_skb(tskb); + return ret; } -- 2.34.1

1 year, 1 month

1
0
0 0

WTF: patch "[PATCH] riscv: Remove duplicated GET_RM" was seriously submitted to be applied to the 6.11-stable tree?

by gregkh＠linuxfoundation.org

The patch below was submitted to be applied to the 6.11-stable tree. I fail to see how this patch meets the stable kernel rules as found at Documentation/process/stable-kernel-rules.rst. I could be totally wrong, and if so, please respond to <stable(a)vger.kernel.org> and let me know why this patch should be applied. Otherwise, it is now dropped from my patch queues, never to be seen again. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 164f66de6bb6ef454893f193c898dc8f1da6d18b Mon Sep 17 00:00:00 2001 From: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> Date: Tue, 8 Oct 2024 17:41:39 +0800 Subject: [PATCH] riscv: Remove duplicated GET_RM The macro GET_RM defined twice in this file, one can be removed. Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Signed-off-by: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> Fixes: 956d705dd279 ("riscv: Unaligned load/store handling for M_MODE") Cc: stable(a)vger.kernel.org Link: https://lore.kernel.org/r/20241008094141.549248-3-zhangchunyan@iscas.ac.cn Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> diff --git a/arch/riscv/kernel/traps_misaligned.c b/arch/riscv/kernel/traps_misaligned.c index d4fd8af7aaf5..1b9867136b61 100644 --- a/arch/riscv/kernel/traps_misaligned.c +++ b/arch/riscv/kernel/traps_misaligned.c @@ -136,8 +136,6 @@ #define REG_PTR(insn, pos, regs) \ (ulong *)((ulong)(regs) + REG_OFFSET(insn, pos)) -#define GET_RM(insn) (((insn) >> 12) & 7) - #define GET_RS1(insn, regs) (*REG_PTR(insn, SH_RS1, regs)) #define GET_RS2(insn, regs) (*REG_PTR(insn, SH_RS2, regs)) #define GET_RS1S(insn, regs) (*REG_PTR(RVC_RS1S(insn), 0, regs))

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to 6.6-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.6-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y git checkout FETCH_HEAD git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110521-antler-uneatable-d873@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a7..ddcbd80a49fb 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue;

1 year, 1 month

2
1
0 0

FAILED: patch "[PATCH] RISC-V: disallow gcc + rust builds" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 33549fcf37ec461f398f0a41e1c9948be2e5aca4 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110519-underage-paying-6d38@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 33549fcf37ec461f398f0a41e1c9948be2e5aca4 Mon Sep 17 00:00:00 2001 From: Conor Dooley <conor.dooley(a)microchip.com> Date: Tue, 1 Oct 2024 12:28:13 +0100 Subject: [PATCH] RISC-V: disallow gcc + rust builds During the discussion before supporting rust on riscv, it was decided not to support gcc yet, due to differences in extension handling compared to llvm (only the version of libclang matching the c compiler is supported). Recently Jason Montleon reported [1] that building with gcc caused build issues, due to unsupported arguments being passed to libclang. After some discussion between myself and Miguel, it is better to disable gcc + rust builds to match the original intent, and subsequently support it when an appropriate set of extensions can be deduced from the version of libclang. Closes: https://lore.kernel.org/all/20240917000848.720765-2-jmontleo@redhat.com/ [1] Link: https://lore.kernel.org/all/20240926-battering-revolt-6c6a7827413e@spud/ [2] Fixes: 70a57b247251a ("RISC-V: enable building 64-bit kernels with rust support") Cc: stable(a)vger.kernel.org Reported-by: Jason Montleon <jmontleo(a)redhat.com> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com> Acked-by: Miguel Ojeda <ojeda(a)kernel.org> Reviewed-by: Nathan Chancellor <nathan(a)kernel.org> Link: https://lore.kernel.org/r/20241001-playlist-deceiving-16ece2f440f5@spud Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> diff --git a/Documentation/rust/arch-support.rst b/Documentation/rust/arch-support.rst index 750ff371570a..54be7ddf3e57 100644 --- a/Documentation/rust/arch-support.rst +++ b/Documentation/rust/arch-support.rst @@ -17,7 +17,7 @@ Architecture Level of support Constraints ============= ================ ============================================== ``arm64`` Maintained Little Endian only. ``loongarch`` Maintained \- -``riscv`` Maintained ``riscv64`` only. +``riscv`` Maintained ``riscv64`` and LLVM/Clang only. ``um`` Maintained \- ``x86`` Maintained ``x86_64`` only. ============= ================ ============================================== diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index 62545946ecf4..f4c570538d55 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@ -177,7 +177,7 @@ config RISCV select HAVE_REGS_AND_STACK_ACCESS_API select HAVE_RETHOOK if !XIP_KERNEL select HAVE_RSEQ - select HAVE_RUST if RUSTC_SUPPORTS_RISCV + select HAVE_RUST if RUSTC_SUPPORTS_RISCV && CC_IS_CLANG select HAVE_SAMPLE_FTRACE_DIRECT select HAVE_SAMPLE_FTRACE_DIRECT_MULTI select HAVE_STACKPROTECTOR

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110533-dexterous-semantic-aa9e@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 5.4-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.4-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110532-veal-argue-331b@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 5.10-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.10-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110531-wound-perkiness-ed39@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110530-diligence-author-75bb@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] nilfs2: fix kernel bug due to missing clearing of checked" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 41e192ad2779cae0102879612dfe46726e4396aa # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110530-elastic-reproduce-14c4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 41e192ad2779cae0102879612dfe46726e4396aa Mon Sep 17 00:00:00 2001 From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Date: Fri, 18 Oct 2024 04:33:10 +0900 Subject: [PATCH] nilfs2: fix kernel bug due to missing clearing of checked flag Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2694a ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com> Reported-by: syzbot+d6ca2daf692c7a82f959(a)syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> diff --git a/fs/nilfs2/page.c b/fs/nilfs2/page.c index 5436eb0424bd..10def4b55995 100644 --- a/fs/nilfs2/page.c +++ b/fs/nilfs2/page.c @@ -401,6 +401,7 @@ void nilfs_clear_folio_dirty(struct folio *folio) folio_clear_uptodate(folio); folio_clear_mappedtodisk(folio); + folio_clear_checked(folio); head = folio_buffers(folio); if (head) {

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] iio: adc: ad7380: fix supplies for ad7380-4" failed to apply to 6.11-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.11-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.11.y git checkout FETCH_HEAD git cherry-pick -x 05f9c67179c9a8d66dee175fb4b17f380908a26f # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110504-eloquent-stalling-91df@gregkh' --subject-prefix 'PATCH 6.11.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 05f9c67179c9a8d66dee175fb4b17f380908a26f Mon Sep 17 00:00:00 2001 From: Julien Stephan <jstephan(a)baylibre.com> Date: Tue, 22 Oct 2024 15:22:39 +0200 Subject: [PATCH] iio: adc: ad7380: fix supplies for ad7380-4 ad7380-4 is the only device in the family that does not have an internal reference. It uses "refin" as a required external reference. All other devices in the family use "refio"" as an optional external reference. Fixes: 737413da8704 ("iio: adc: ad7380: add support for ad738x-4 4 channels variants") Reviewed-by: Nuno Sa <nuno.sa(a)analog.com> Reviewed-by: David Lechner <dlechner(a)baylibre.com> Signed-off-by: Julien Stephan <jstephan(a)baylibre.com> Link: https://patch.msgid.link/20241022-ad7380-fix-supplies-v3-4-f0cefe1b7fa6@bay… Cc: <Stable(a)vger.kernel.org> Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> diff --git a/drivers/iio/adc/ad7380.c b/drivers/iio/adc/ad7380.c index b107d8e97ab3..fb728570debe 100644 --- a/drivers/iio/adc/ad7380.c +++ b/drivers/iio/adc/ad7380.c @@ -89,6 +89,7 @@ struct ad7380_chip_info { bool has_mux; const char * const *supplies; unsigned int num_supplies; + bool external_ref_only; const char * const *vcm_supplies; unsigned int num_vcm_supplies; const unsigned long *available_scan_masks; @@ -431,6 +432,7 @@ static const struct ad7380_chip_info ad7380_4_chip_info = { .num_simult_channels = 4, .supplies = ad7380_supplies, .num_supplies = ARRAY_SIZE(ad7380_supplies), + .external_ref_only = true, .available_scan_masks = ad7380_4_channel_scan_masks, .timing_specs = &ad7380_4_timing, }; @@ -1047,17 +1049,31 @@ static int ad7380_probe(struct spi_device *spi) "Failed to enable power supplies\n"); fsleep(T_POWERUP_US); - /* - * If there is no REFIO supply, then it means that we are using - * the internal 2.5V reference, otherwise REFIO is reference voltage. - */ - ret = devm_regulator_get_enable_read_voltage(&spi->dev, "refio"); - if (ret < 0 && ret != -ENODEV) - return dev_err_probe(&spi->dev, ret, - "Failed to get refio regulator\n"); + if (st->chip_info->external_ref_only) { + ret = devm_regulator_get_enable_read_voltage(&spi->dev, + "refin"); + if (ret < 0) + return dev_err_probe(&spi->dev, ret, + "Failed to get refin regulator\n"); - external_ref_en = ret != -ENODEV; - st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV; + st->vref_mv = ret / 1000; + + /* these chips don't have a register bit for this */ + external_ref_en = false; + } else { + /* + * If there is no REFIO supply, then it means that we are using + * the internal reference, otherwise REFIO is reference voltage. + */ + ret = devm_regulator_get_enable_read_voltage(&spi->dev, + "refio"); + if (ret < 0 && ret != -ENODEV) + return dev_err_probe(&spi->dev, ret, + "Failed to get refio regulator\n"); + + external_ref_en = ret != -ENODEV; + st->vref_mv = external_ref_en ? ret / 1000 : AD7380_INTERNAL_REF_MV; + } if (st->chip_info->num_vcm_supplies > ARRAY_SIZE(st->vcm_mv)) return dev_err_probe(&spi->dev, -EINVAL,

1 year, 1 month

1
0
0 0

[PATCH 2/6] PCI: of: Use device_{add,remove}_of_node() to attach of_node to existing device

by Herve Codina

The commit 407d1a51921e ("PCI: Create device tree node for bridge") creates of_node for PCI devices. The newly created of_node is attached to an existing device. This is done setting directly pdev->dev.of_node in the code. Even if pdev->dev.of_node cannot be previously set, this doesn't handle the fwnode field of the struct device. Indeed, this field needs to be set if it hasn't already been set. device_{add,remove}_of_node() have been introduced to handle this case. Use them instead of the direct setting. Fixes: 407d1a51921e ("PCI: Create device tree node for bridge") Cc: stable(a)vger.kernel.org Signed-off-by: Herve Codina <herve.codina(a)bootlin.com> --- drivers/pci/of.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/pci/of.c b/drivers/pci/of.c index dacea3fc5128..141ffbb1b3e6 100644 --- a/drivers/pci/of.c +++ b/drivers/pci/of.c @@ -655,8 +655,8 @@ void of_pci_remove_node(struct pci_dev *pdev) np = pci_device_to_OF_node(pdev); if (!np || !of_node_check_flag(np, OF_DYNAMIC)) return; - pdev->dev.of_node = NULL; + device_remove_of_node(&pdev->dev); of_changeset_revert(np->data); of_changeset_destroy(np->data); of_node_put(np); @@ -713,7 +713,7 @@ void of_pci_make_dev_node(struct pci_dev *pdev) goto out_free_node; np->data = cset; - pdev->dev.of_node = np; + device_add_of_node(&pdev->dev, np); kfree(name); return; -- 2.46.2

1 year, 1 month

2
2
0 0

FAILED: patch "[PATCH] staging: iio: frequency: ad9832: fix division by zero in" failed to apply to 4.19-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.19-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y git checkout FETCH_HEAD git cherry-pick -x 6bd301819f8f69331a55ae2336c8b111fc933f3d # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110556-record-unscrew-f7e1@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 6bd301819f8f69331a55ae2336c8b111fc933f3d Mon Sep 17 00:00:00 2001 From: Zicheng Qu <quzicheng(a)huawei.com> Date: Tue, 22 Oct 2024 13:43:54 +0000 Subject: [PATCH] staging: iio: frequency: ad9832: fix division by zero in ad9832_calc_freqreg() In the ad9832_write_frequency() function, clk_get_rate() might return 0. This can lead to a division by zero when calling ad9832_calc_freqreg(). The check if (fout > (clk_get_rate(st->mclk) / 2)) does not protect against the case when fout is 0. The ad9832_write_frequency() function is called from ad9832_write(), and fout is derived from a text buffer, which can contain any value. Link: https://lore.kernel.org/all/2024100904-CVE-2024-47663-9bdc@gregkh/ Fixes: ea707584bac1 ("Staging: IIO: DDS: AD9832 / AD9835 driver") Cc: stable(a)vger.kernel.org Signed-off-by: Zicheng Qu <quzicheng(a)huawei.com> Reviewed-by: Nuno Sa <nuno.sa(a)analog.com> Reviewed-by: Dan Carpenter <dan.carpenter(a)linaro.org> Link: https://patch.msgid.link/20241022134354.574614-1-quzicheng@huawei.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com> diff --git a/drivers/staging/iio/frequency/ad9832.c b/drivers/staging/iio/frequency/ad9832.c index 6c390c4eb26d..492612e8f8ba 100644 --- a/drivers/staging/iio/frequency/ad9832.c +++ b/drivers/staging/iio/frequency/ad9832.c @@ -129,12 +129,15 @@ static unsigned long ad9832_calc_freqreg(unsigned long mclk, unsigned long fout) static int ad9832_write_frequency(struct ad9832_state *st, unsigned int addr, unsigned long fout) { + unsigned long clk_freq; unsigned long regval; - if (fout > (clk_get_rate(st->mclk) / 2)) + clk_freq = clk_get_rate(st->mclk); + + if (!clk_freq || fout > (clk_freq / 2)) return -EINVAL; - regval = ad9832_calc_freqreg(clk_get_rate(st->mclk), fout); + regval = ad9832_calc_freqreg(clk_freq, fout); st->freq_data[0] = cpu_to_be16((AD9832_CMD_FRE8BITSW << CMD_SHIFT) | (addr << ADD_SHIFT) |

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to 5.15-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 5.15-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y git checkout FETCH_HEAD git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110522-unshaken-unvisited-f359@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a7..ddcbd80a49fb 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue;

1 year, 1 month

1
0
0 0

FAILED: patch "[PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 7245012f0f496162dd95d888ed2ceb5a35170f1a # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024110521-compile-luminous-2080@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 7245012f0f496162dd95d888ed2ceb5a35170f1a Mon Sep 17 00:00:00 2001 From: Johannes Berg <johannes.berg(a)intel.com> Date: Wed, 23 Oct 2024 09:17:44 +0200 Subject: [PATCH] wifi: iwlwifi: mvm: fix 6 GHz scan construction If more than 255 colocated APs exist for the set of all APs found during 2.4/5 GHz scanning, then the 6 GHz scan construction will loop forever since the loop variable has type u8, which can never reach the number found when that's bigger than 255, and is stored in a u32 variable. Also move it into the loops to have a smaller scope. Using a u32 there is fine, we limit the number of APs in the scan list and each has a limit on the number of RNR entries due to the frame size. With a limit of 1000 scan results, a frame size upper bound of 4096 (really it's more like ~2300) and a TBTT entry size of at least 11, we get an upper bound for the number of ~372k, well in the bounds of a u32. Cc: stable(a)vger.kernel.org Fixes: eae94cf82d74 ("iwlwifi: mvm: add support for 6GHz") Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219375 Link: https://patch.msgid.link/20241023091744.f4baed5c08a1.I8b417148bbc8c5d11c101… Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c index 3ce9150213a7..ddcbd80a49fb 100644 --- a/drivers/net/wireless/intel/iwlwifi/mvm/scan.c +++ b/drivers/net/wireless/intel/iwlwifi/mvm/scan.c @@ -1774,7 +1774,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, &cp->channel_config[ch_cnt]; u32 s_ssid_bitmap = 0, bssid_bitmap = 0, flags = 0; - u8 j, k, n_s_ssids = 0, n_bssids = 0; + u8 k, n_s_ssids = 0, n_bssids = 0; u8 max_s_ssids, max_bssids; bool force_passive = false, found = false, allow_passive = true, unsolicited_probe_on_chan = false, psc_no_listen = false; @@ -1799,7 +1799,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, cfg->v5.iter_count = 1; cfg->v5.iter_interval = 0; - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { s8 tmp_psd_20; if (!(scan_6ghz_params[j].channel_idx == i)) @@ -1873,7 +1873,7 @@ iwl_mvm_umac_scan_cfg_channels_v7_6g(struct iwl_mvm *mvm, * SSID. * TODO: improve this logic */ - for (j = 0; j < params->n_6ghz_params; j++) { + for (u32 j = 0; j < params->n_6ghz_params; j++) { if (!(scan_6ghz_params[j].channel_idx == i)) continue;

1 year, 1 month

1
0
0 0

[PATCH v2 1/2] drm: adv7511: Fix use-after-free in adv7533_attach_dsi()

by Biju Das

The host_node pointer assigned and freed in adv7533_parse_dt() and later adv7533_attach_dsi() uses the same. Fix this issue by freeing the host_node in adv7533_attach_dsi() instead of adv7533_parse_dt(). Fixes: 1e4d58cd7f88 ("drm/bridge: adv7533: Create a MIPI DSI device") Cc: stable(a)vger.kernel.org Signed-off-by: Biju Das <biju.das.jz(a)bp.renesas.com> --- Changes in v2: - Added the tag "Cc: stable(a)vger.kernel.org" in the sign-off area. - Dropped Archit Taneja invalid Mail address --- drivers/gpu/drm/bridge/adv7511/adv7533.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/bridge/adv7511/adv7533.c b/drivers/gpu/drm/bridge/adv7511/adv7533.c index 4481489aaf5e..3e57ba838e5e 100644 --- a/drivers/gpu/drm/bridge/adv7511/adv7533.c +++ b/drivers/gpu/drm/bridge/adv7511/adv7533.c @@ -133,6 +133,7 @@ int adv7533_patch_cec_registers(struct adv7511 *adv) int adv7533_attach_dsi(struct adv7511 *adv) { + struct device_node *np __free(device_node) = adv->host_node; struct device *dev = &adv->i2c_main->dev; struct mipi_dsi_host *host; struct mipi_dsi_device *dsi; @@ -181,8 +182,6 @@ int adv7533_parse_dt(struct device_node *np, struct adv7511 *adv) if (!adv->host_node) return -ENODEV; - of_node_put(adv->host_node); - adv->use_timing_gen = !of_property_read_bool(np, "adi,disable-timing-generator"); -- 2.43.0

1 year, 1 month

2
2
0 0

[PATCH] xen: Fix the issue of resource not being properly released in xenbus_dev_probe()

by Qiu-ji Chen

This patch fixes an issue in the function xenbus_dev_probe(). In the xenbus_dev_probe() function, within the if (err) branch at line 313, the program incorrectly returns err directly without releasing the resources allocated by err = drv->probe(dev, id). As the return value is non-zero, the upper layers assume the processing logic has failed. However, the probe operation was performed earlier without a corresponding remove operation. Since the probe actually allocates resources, failing to perform the remove operation could lead to problems. To fix this issue, we followed the resource release logic of the xenbus_dev_remove() function by adding a new block fail_remove before the fail_put block. After entering the branch if (err) at line 313, the function will use a goto statement to jump to the fail_remove block, ensuring that the previously acquired resources are correctly released, thus preventing the reference count leak. This bug was identified by an experimental static analysis tool developed by our team. The tool specializes in analyzing reference count operations and detecting potential issues where resources are not properly managed. In this case, the tool flagged the missing release operation as a potential problem, which led to the development of this patch. Fixes: 4bac07c993d0 ("xen: add the Xenbus sysfs and virtual device hotplug driver") Cc: stable(a)vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666(a)gmail.com> --- drivers/xen/xenbus/xenbus_probe.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/xen/xenbus/xenbus_probe.c b/drivers/xen/xenbus/xenbus_probe.c index 9f097f1f4a4c..6d32ffb01136 100644 --- a/drivers/xen/xenbus/xenbus_probe.c +++ b/drivers/xen/xenbus/xenbus_probe.c @@ -313,7 +313,7 @@ int xenbus_dev_probe(struct device *_dev) if (err) { dev_warn(&dev->dev, "watch_otherend on %s failed.\n", dev->nodename); - return err; + goto fail_remove; } dev->spurious_threshold = 1; @@ -322,6 +322,12 @@ int xenbus_dev_probe(struct device *_dev) dev->nodename); return 0; +fail_remove: + if (drv->remove) { + down(&dev->reclaim_sem); + drv->remove(dev); + up(&dev->reclaim_sem); + } fail_put: module_put(drv->driver.owner); fail: -- 2.34.1

1 year, 1 month

2
1
0 0

[PATCH v2] arm64/signal: Avoid corruption of SME state when entering signal handler

by Mark Brown

We intend that signal handlers are entered with PSTATE.{SM,ZA}={0,0}. The logic for this in setup_return() manipulates the saved state and live CPU state in an unsafe manner, and consequently, when a task enters a signal handler: * The task entering the signal handler might not have its PSTATE.{SM,ZA} bits cleared, and other register state that is affected by changes to PSTATE.{SM,ZA} might not be zeroed as expected. * An unrelated task might have its PSTATE.{SM,ZA} bits cleared unexpectedly, potentially zeroing other register state that is affected by changes to PSTATE.{SM,ZA}. Tasks which do not set PSTATE.{SM,ZA} (i.e. those only using plain FPSIMD or non-streaming SVE) are not affected, as there is no resulting change to PSTATE.{SM,ZA}. Consider for example two tasks on one CPU: A: Begins signal entry in kernel mode, is preempted prior to SMSTOP. B: Using SM and/or ZA in userspace with register state current on the CPU, is preempted. A: Scheduled in, no register state changes made as in kernel mode. A: Executes SMSTOP, modifying live register state. A: Scheduled out. B: Scheduled in, fpsimd_thread_switch() sees the register state on the CPU is tracked as being that for task B so the state is not reloaded prior to returning to userspace. Task B is now running with SM and ZA incorrectly cleared. Fix this by: * Checking TIF_FOREIGN_FPSTATE, and only updating the saved or live state as appropriate. * Using {get,put}_cpu_fpsimd_context() to ensure mutual exclusion against other code which manipulates this state. To allow their use, the logic is moved into a new fpsimd_enter_sighandler() helper in fpsimd.c. This race has been observed intermittently with fp-stress, especially with preempt disabled, commonly but not exclusively reporting "Bad SVCR: 0". Fixes: 40a8e87bb3285 ("arm64/sme: Disable ZA and streaming mode when handling signals") Signed-off-by: Mark Brown <broonie(a)kernel.org> Cc: stable(a)vger.kernel.org --- Changes in v2: - Commit message tweaks. - Flush the task state when updating in memory to ensure we reload. - Link to v1: https://lore.kernel.org/r/20241023-arm64-fp-sme-sigentry-v1-1-249ff7ec3ad0@… --- arch/arm64/include/asm/fpsimd.h | 1 + arch/arm64/kernel/fpsimd.c | 33 +++++++++++++++++++++++++++++++++ arch/arm64/kernel/signal.c | 19 +------------------ 3 files changed, 35 insertions(+), 18 deletions(-) diff --git a/arch/arm64/include/asm/fpsimd.h b/arch/arm64/include/asm/fpsimd.h index f2a84efc361858d4deda99faf1967cc7cac386c1..09af7cfd9f6c2cec26332caa4c254976e117b1bf 100644 --- a/arch/arm64/include/asm/fpsimd.h +++ b/arch/arm64/include/asm/fpsimd.h @@ -76,6 +76,7 @@ extern void fpsimd_load_state(struct user_fpsimd_state *state); extern void fpsimd_thread_switch(struct task_struct *next); extern void fpsimd_flush_thread(void); +extern void fpsimd_enter_sighandler(void); extern void fpsimd_signal_preserve_current_state(void); extern void fpsimd_preserve_current_state(void); extern void fpsimd_restore_current_state(void); diff --git a/arch/arm64/kernel/fpsimd.c b/arch/arm64/kernel/fpsimd.c index 77006df20a75aee7c991cf116b6d06bfe953d1a4..c4149f474ce889af42bc2ce9402e7d032818c2e4 100644 --- a/arch/arm64/kernel/fpsimd.c +++ b/arch/arm64/kernel/fpsimd.c @@ -1693,6 +1693,39 @@ void fpsimd_signal_preserve_current_state(void) sve_to_fpsimd(current); } +/* + * Called by the signal handling code when preparing current to enter + * a signal handler. Currently this only needs to take care of exiting + * streaming mode and clearing ZA on SME systems. + */ +void fpsimd_enter_sighandler(void) +{ + if (!system_supports_sme()) + return; + + get_cpu_fpsimd_context(); + + if (test_thread_flag(TIF_FOREIGN_FPSTATE)) { + /* Exiting streaming mode zeros the FPSIMD state */ + if (current->thread.svcr & SVCR_SM_MASK) { + memset(&current->thread.uw.fpsimd_state, 0, + sizeof(current->thread.uw.fpsimd_state)); + current->thread.fp_type = FP_STATE_FPSIMD; + } + + current->thread.svcr &= ~(SVCR_ZA_MASK | + SVCR_SM_MASK); + + /* Ensure any copies on other CPUs aren't reused */ + fpsimd_flush_task_state(current); + } else { + /* The register state is current, just update it. */ + sme_smstop(); + } + + put_cpu_fpsimd_context(); +} + /* * Called by KVM when entering the guest. */ diff --git a/arch/arm64/kernel/signal.c b/arch/arm64/kernel/signal.c index 5619869475304776fc005fe24a385bf86bfdd253..fe07d0bd9f7978d73973f07ce38b7bdd7914abb2 100644 --- a/arch/arm64/kernel/signal.c +++ b/arch/arm64/kernel/signal.c @@ -1218,24 +1218,7 @@ static void setup_return(struct pt_regs *regs, struct k_sigaction *ka, /* TCO (Tag Check Override) always cleared for signal handlers */ regs->pstate &= ~PSR_TCO_BIT; - /* Signal handlers are invoked with ZA and streaming mode disabled */ - if (system_supports_sme()) { - /* - * If we were in streaming mode the saved register - * state was SVE but we will exit SM and use the - * FPSIMD register state - flush the saved FPSIMD - * register state in case it gets loaded. - */ - if (current->thread.svcr & SVCR_SM_MASK) { - memset(&current->thread.uw.fpsimd_state, 0, - sizeof(current->thread.uw.fpsimd_state)); - current->thread.fp_type = FP_STATE_FPSIMD; - } - - current->thread.svcr &= ~(SVCR_ZA_MASK | - SVCR_SM_MASK); - sme_smstop(); - } + fpsimd_enter_sighandler(); if (system_supports_poe()) write_sysreg_s(POR_EL0_INIT, SYS_POR_EL0); --- base-commit: 8e929cb546ee42c9a61d24fae60605e9e3192354 change-id: 20241023-arm64-fp-sme-sigentry-a2bd7187e71b Best regards, -- Mark Brown <broonie(a)kernel.org>

1 year, 1 month

2
2
0 0

[PATCH 5.10.y 0/2] Backport fix of CVE-2024-47674 to 5.10

by Harshvardhan Jha

Following series is a backport of CVE-2024-47674 fix "mm: avoid leaving partial pfn mappings around in error case" to 5.10. This required an extra commit "mm: add remap_pfn_range_notrack" to make both picks clean. The patchset shows no regression compared to 5.10.228 tag. Christoph Hellwig (1): mm: add remap_pfn_range_notrack Linus Torvalds (1): mm: avoid leaving partial pfn mappings around in error case include/linux/mm.h | 2 ++ mm/memory.c | 70 ++++++++++++++++++++++++++++++++-------------- 2 files changed, 51 insertions(+), 21 deletions(-) -- 2.46.0

1 year, 1 month

1
2
0 0

[PATCH] soc: fsl: cpm1: qmc: Set the ret error code on platform_get_irq() failure

by Herve Codina

A kernel test robot detected a missing error code: qmc.c:1942 qmc_probe() warn: missing error code 'ret' Indeed, the error returned by platform_get_irq() is checked and the operation is aborted in case of failure but the ret error code is not set in that case. Set the ret error code. Reported-by: kernel test robot <lkp(a)intel.com> Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org> Closes: https://lore.kernel.org/r/202411051350.KNy6ZIWA-lkp@intel.com/ Fixes: 3178d58e0b97 ("soc: fsl: cpm1: Add support for QMC") Cc: stable(a)vger.kernel.org Signed-off-by: Herve Codina <herve.codina(a)bootlin.com> --- drivers/soc/fsl/qe/qmc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/soc/fsl/qe/qmc.c b/drivers/soc/fsl/qe/qmc.c index 19cc581b06d0..b3f773e135fd 100644 --- a/drivers/soc/fsl/qe/qmc.c +++ b/drivers/soc/fsl/qe/qmc.c @@ -2004,8 +2004,10 @@ static int qmc_probe(struct platform_device *pdev) /* Set the irq handler */ irq = platform_get_irq(pdev, 0); - if (irq < 0) + if (irq < 0) { + ret = irq; goto err_exit_xcc; + } ret = devm_request_irq(qmc->dev, irq, qmc_irq_handler, 0, "qmc", qmc); if (ret < 0) goto err_exit_xcc; -- 2.46.2

1 year, 1 month

1
0
0 0

[PATCH v4 0/2] uvc: Fix OOPs after rmmod if gpio unit is used

by Ricardo Ribalda

Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org> --- Changes in v4: Thanks Laurent. - Remove refcounted cleaup to support devres. - Link to v3: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v3-1-c0959c8906d3@chromiu… Changes in v3: Thanks Sakari. - Rename variable to initialized. - Other CodeStyle. - Link to v2: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v2-1-547ce6a6962e@chromiu… Changes in v2: Thanks to Laurent. - The main structure is not allocated with devres so there is a small period of time where we can get an irq with the structure free. Do not use devres for the IRQ. - Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu… --- Ricardo Ribalda (2): media: uvcvideo: Remove refcounted cleanup media: uvcvideo: Fix crash during unbind if gpio unit is in use drivers/media/usb/uvc/uvc_driver.c | 30 ++++++++---------------------- drivers/media/usb/uvc/uvcvideo.h | 1 - 2 files changed, 8 insertions(+), 23 deletions(-) --- base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a change-id: 20241031-uvc-crashrmmod-666de3fc9141 Best regards, -- Ricardo Ribalda <ribalda(a)chromium.org>

1 year, 1 month

1
3
0 0

[PATCH v2] media: uvcvideo: Fix crash during unbind if gpio unit is in use

by Ricardo Ribalda

We used the wrong device for the device managed functions. We used the usb device, when we should be using the interface device. If we unbind the driver from the usb interface, the cleanup functions are never called. In our case, the IRQ is never disabled. If an IRQ is triggered, it will try to access memory sections that are already free, causing an OOPS. Luckily this bug has small impact, as it is only affected by devices with gpio units and the user has to unbind the device, a disconnect will not trigger this error. Cc: stable(a)vger.kernel.org Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT") Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org> --- Changes in v2: Thanks to Laurent. - The main structure is not allocated with devres so there is a small period of time where we can get an irq with the structure free. Do not use devres for the IRQ. - Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu… --- drivers/media/usb/uvc/uvc_driver.c | 28 +++++++++++++++++++++------- drivers/media/usb/uvc/uvcvideo.h | 1 + 2 files changed, 22 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index a96f6ca0889f..af6aec27083c 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev) struct gpio_desc *gpio_privacy; int irq; - gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy", + gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy", GPIOD_IN); if (IS_ERR_OR_NULL(gpio_privacy)) return PTR_ERR_OR_ZERO(gpio_privacy); irq = gpiod_to_irq(gpio_privacy); if (irq < 0) - return dev_err_probe(&dev->udev->dev, irq, + return dev_err_probe(&dev->intf->dev, irq, "No IRQ for privacy GPIO\n"); unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT, @@ -1329,15 +1329,28 @@ static int uvc_gpio_parse(struct uvc_device *dev) static int uvc_gpio_init_irq(struct uvc_device *dev) { struct uvc_entity *unit = dev->gpio_unit; + int ret; if (!unit || unit->gpio.irq < 0) return 0; - return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL, - uvc_gpio_irq, - IRQF_ONESHOT | IRQF_TRIGGER_FALLING | - IRQF_TRIGGER_RISING, - "uvc_privacy_gpio", dev); + ret = request_threaded_irq(unit->gpio.irq, NULL, uvc_gpio_irq, + IRQF_ONESHOT | IRQF_TRIGGER_FALLING | + IRQF_TRIGGER_RISING, + "uvc_privacy_gpio", dev); + + if (!ret) + dev->gpio_unit->gpio.inited = true; + + return ret; +} + +static void uvc_gpio_cleanup(struct uvc_device *dev) +{ + if (!dev->gpio_unit || !dev->gpio_unit->gpio.inited) + return; + + free_irq(dev->gpio_unit->gpio.irq, dev); } /* ------------------------------------------------------------------------ @@ -1880,6 +1893,7 @@ static void uvc_delete(struct kref *kref) struct uvc_device *dev = container_of(kref, struct uvc_device, ref); struct list_head *p, *n; + uvc_gpio_cleanup(dev); uvc_status_cleanup(dev); uvc_ctrl_cleanup_device(dev); diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index 07f9921d83f2..376cd670539b 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -234,6 +234,7 @@ struct uvc_entity { u8 *bmControls; struct gpio_desc *gpio_privacy; int irq; + bool inited; } gpio; }; --- base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a change-id: 20241031-uvc-crashrmmod-666de3fc9141 Best regards, -- Ricardo Ribalda <ribalda(a)chromium.org>

1 year, 1 month

3
3
0 0

[PATCH v3] media: uvcvideo: Fix crash during unbind if gpio unit is in use

by Ricardo Ribalda

We used the wrong device for the device managed functions. We used the usb device, when we should be using the interface device. If we unbind the driver from the usb interface, the cleanup functions are never called. In our case, the IRQ is never disabled. If an IRQ is triggered, it will try to access memory sections that are already free, causing an OOPS. Luckily this bug has small impact, as it is only affected by devices with gpio units and the user has to unbind the device, a disconnect will not trigger this error. Cc: stable(a)vger.kernel.org Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT") Reviewed-by: Sergey Senozhatsky <senozhatsky(a)chromium.org> Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org> --- Changes in v3: Thanks Sakari. - Rename variable to initialized. - Other CodeStyle. - Link to v2: https://lore.kernel.org/r/20241105-uvc-crashrmmod-v2-1-547ce6a6962e@chromiu… Changes in v2: Thanks to Laurent. - The main structure is not allocated with devres so there is a small period of time where we can get an irq with the structure free. Do not use devres for the IRQ. - Link to v1: https://lore.kernel.org/r/20241031-uvc-crashrmmod-v1-1-059fe593b1e6@chromiu… --- drivers/media/usb/uvc/uvc_driver.c | 27 ++++++++++++++++++++------- drivers/media/usb/uvc/uvcvideo.h | 1 + 2 files changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index a96f6ca0889f..2a907e3e3f81 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev) struct gpio_desc *gpio_privacy; int irq; - gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy", + gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy", GPIOD_IN); if (IS_ERR_OR_NULL(gpio_privacy)) return PTR_ERR_OR_ZERO(gpio_privacy); irq = gpiod_to_irq(gpio_privacy); if (irq < 0) - return dev_err_probe(&dev->udev->dev, irq, + return dev_err_probe(&dev->intf->dev, irq, "No IRQ for privacy GPIO\n"); unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT, @@ -1329,15 +1329,27 @@ static int uvc_gpio_parse(struct uvc_device *dev) static int uvc_gpio_init_irq(struct uvc_device *dev) { struct uvc_entity *unit = dev->gpio_unit; + int ret; if (!unit || unit->gpio.irq < 0) return 0; - return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL, - uvc_gpio_irq, - IRQF_ONESHOT | IRQF_TRIGGER_FALLING | - IRQF_TRIGGER_RISING, - "uvc_privacy_gpio", dev); + ret = request_threaded_irq(unit->gpio.irq, NULL, uvc_gpio_irq, + IRQF_ONESHOT | IRQF_TRIGGER_FALLING | + IRQF_TRIGGER_RISING, + "uvc_privacy_gpio", dev); + + unit->gpio.initialized = !ret; + + return ret; +} + +static void uvc_gpio_cleanup(struct uvc_device *dev) +{ + if (!dev->gpio_unit || !dev->gpio_unit->gpio.initialized) + return; + + free_irq(dev->gpio_unit->gpio.irq, dev); } /* ------------------------------------------------------------------------ @@ -1880,6 +1892,7 @@ static void uvc_delete(struct kref *kref) struct uvc_device *dev = container_of(kref, struct uvc_device, ref); struct list_head *p, *n; + uvc_gpio_cleanup(dev); uvc_status_cleanup(dev); uvc_ctrl_cleanup_device(dev); diff --git a/drivers/media/usb/uvc/uvcvideo.h b/drivers/media/usb/uvc/uvcvideo.h index 07f9921d83f2..965a789ed03e 100644 --- a/drivers/media/usb/uvc/uvcvideo.h +++ b/drivers/media/usb/uvc/uvcvideo.h @@ -234,6 +234,7 @@ struct uvc_entity { u8 *bmControls; struct gpio_desc *gpio_privacy; int irq; + bool initialized; } gpio; }; --- base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a change-id: 20241031-uvc-crashrmmod-666de3fc9141 Best regards, -- Ricardo Ribalda <ribalda(a)chromium.org>

1 year, 1 month

1
0
0 0

[PATCH 0/2] drm: adv7511: ADV7535 fixes

by Biju Das

This patch series aims to fix 2 bugs in ADV7535 driver 1) use-after-free bug in adv7533_attach_dsi() 2) out-of-bounds array in adv7511_dsi_config_timing_gen() for clock_div_by_lanes[]. Biju Das (2): drm: adv7511: Fix use-after-free in adv7533_attach_dsi() drm: adv7511: Fix out-of-bounds array in clock_div_by_lanes drivers/gpu/drm/bridge/adv7511/adv7533.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) -- 2.43.0

1 year, 1 month

2
3
0 0

[PATCH v2] wifi: nl80211: fix bounds checker error in nl80211_parse_sched_scan

by Aleksei Vetrov

The channels array in the cfg80211_scan_request has a __counted_by attribute attached to it, which points to the n_channels variable. This attribute is used in bounds checking, and if it is not set before the array is filled, then the bounds sanitizer will issue a warning or a kernel panic if CONFIG_UBSAN_TRAP is set. This patch sets the size of allocated memory as the initial value for n_channels. It is updated with the actual number of added elements after the array is filled. Fixes: aa4ec06c455d ("wifi: cfg80211: use __counted_by where appropriate") Cc: stable(a)vger.kernel.org Signed-off-by: Aleksei Vetrov <vvvvvv(a)google.com> --- Changes in v2: - Added Fixes tag and added stable to CC - Link to v1: https://lore.kernel.org/r/20241028-nl80211_parse_sched_scan-bounds-checker-… --- net/wireless/nl80211.c | 1 + 1 file changed, 1 insertion(+) diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c index d7d099f7118ab5d5c745905abdea85d246c2b7b2..9b1b9dc5a7eb2a864da7b0212bc6a156b7757a9d 100644 --- a/net/wireless/nl80211.c +++ b/net/wireless/nl80211.c @@ -9776,6 +9776,7 @@ nl80211_parse_sched_scan(struct wiphy *wiphy, struct wireless_dev *wdev, request = kzalloc(size, GFP_KERNEL); if (!request) return ERR_PTR(-ENOMEM); + request->n_channels = n_channels; if (n_ssids) request->ssids = (void *)request + --- base-commit: 81983758430957d9a5cb3333fe324fd70cf63e7e change-id: 20241028-nl80211_parse_sched_scan-bounds-checker-fix-c5842f41b863 Best regards, -- Aleksei Vetrov <vvvvvv(a)google.com>

1 year, 1 month

3
6
0 0

Re: Patch "net: hns3: add sync command to sync io-pgtable" has been added to the 6.11-stable tree

by Jijie Shao

on 2024/11/2 3:22, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > net: hns3: add sync command to sync io-pgtable > > to the 6.11-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > net-hns3-add-sync-command-to-sync-io-pgtable.patch > and it can be found in the queue-6.11 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. Hi: This patch was reverted from netdev, so, it also need be reverted from stable tree. I am sorry for that. reverted link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=2… Thanks, Jijie Shao > > > > commit 0ea8c71561bc40a678c7bf15e081737e1f2d15e2 > Author: Jian Shen <shenjian15(a)huawei.com> > Date: Fri Oct 25 17:29:31 2024 +0800 > > net: hns3: add sync command to sync io-pgtable > > [ Upstream commit f2c14899caba76da93ff3fff46b4d5a8f43ce07e ] > > To avoid errors in pgtable prefectch, add a sync command to sync > io-pagtable. > > This is a supplement for the previous patch. > We want all the tx packet can be handled with tx bounce buffer path. > But it depends on the remain space of the spare buffer, checked by the > hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true. > But once it return false by no available space, the packet will be handled > with the former path, which will map/unmap the skb buffer. > Then the driver will face the smmu prefetch risk again. > > So add a sync command in this case to avoid smmu prefectch, > just protects corner scenes. > > Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision") > Signed-off-by: Jian Shen <shenjian15(a)huawei.com> > Signed-off-by: Peiyang Wang <wangpeiyang1(a)huawei.com> > Signed-off-by: Jijie Shao <shaojijie(a)huawei.com> > Signed-off-by: Paolo Abeni <pabeni(a)redhat.com> > Signed-off-by: Sasha Levin <sashal(a)kernel.org> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c > index ac88e301f2211..8760b4e9ade6b 100644 > --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c > @@ -381,6 +381,24 @@ static const struct hns3_rx_ptype hns3_rx_ptype_tbl[] = { > #define HNS3_INVALID_PTYPE \ > ARRAY_SIZE(hns3_rx_ptype_tbl) > > +static void hns3_dma_map_sync(struct device *dev, unsigned long iova) > +{ > + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + struct iommu_iotlb_gather iotlb_gather; > + size_t granule; > + > + if (!domain || !iommu_is_dma_domain(domain)) > + return; > + > + granule = 1 << __ffs(domain->pgsize_bitmap); > + iova = ALIGN_DOWN(iova, granule); > + iotlb_gather.start = iova; > + iotlb_gather.end = iova + granule - 1; > + iotlb_gather.pgsize = granule; > + > + iommu_iotlb_sync(domain, &iotlb_gather); > +} > + > static irqreturn_t hns3_irq_handle(int irq, void *vector) > { > struct hns3_enet_tqp_vector *tqp_vector = vector; > @@ -1728,7 +1746,9 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv, > unsigned int type) > { > struct hns3_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use]; > + struct hnae3_handle *handle = ring->tqp->handle; > struct device *dev = ring_to_dev(ring); > + struct hnae3_ae_dev *ae_dev; > unsigned int size; > dma_addr_t dma; > > @@ -1760,6 +1780,13 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv, > return -ENOMEM; > } > > + /* Add a SYNC command to sync io-pgtale to avoid errors in pgtable > + * prefetch > + */ > + ae_dev = hns3_get_ae_dev(handle); > + if (ae_dev->dev_version >= HNAE3_DEVICE_VERSION_V3) > + hns3_dma_map_sync(dev, dma); > + > desc_cb->priv = priv; > desc_cb->length = size; > desc_cb->dma = dma;

1 year, 1 month

2
1
0 0

Re: Dell WD19TB Thunderbolt Dock not working with kernel > 6.6.28-1

by rick＠581238.xyz

Hi all, I am having the exact same issue. linux-lts-6.6.28-1 works, anything above doesnt. When kernel above linux-lts-6.6.28-1: - Boltctl does not show anything - thunderbolt.host_reset=0 had no impact - triggers following errors: [ 50.627948] ucsi_acpi USBC000:00: unknown error 0 [ 50.627957] ucsi_acpi USBC000:00: UCSI_GET_PDOS failed (-5) Gists: - https://gist.github.com/ricklahaye/83695df8c8273c30d2403da97a353e15 dmesg with Linux system 6.11.4-arch1-1 #1 SMP PREEMPT_DYNAMIC Thu, 17 Oct 2024 20:53:41 +0000 x86_64 GNU/Linux where thunderbolt dock does not work - https://gist.github.com/ricklahaye/79e4040abcd368524633e86addec1833 dmesg with Linux system 6.6.28-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 17 Apr 2024 10:11:09 +0000 x86_64 GNU/Linux where thunderbolt does work - https://gist.github.com/ricklahaye/c9a7b4a7eeba5e7900194eecf9fce454 boltctl with Linux system 6.6.28-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 17 Apr 2024 10:11:09 +0000 x86_64 GNU/Linux where thunderbolt does work Kind regards, Rick Ps: sorry for resend; this time with plain text format

1 year, 1 month

4
15
0 0

[PATCH v2 00/15] media: stm32: introduction of CSI / DCMIPP for STM32MP25

by Alain Volmat

This series introduces the camera pipeline support for the STM32MP25 SOC. The STM32MP25 has 3 pipelines, fed from a single camera input which can be either parallel or csi. This series adds the basic support for the 1st pipe (dump) which, in term of features is same as the one featured on the STM32MP13 SOC. It focuses on introduction of the CSI input stage for the DCMIPP, and the CSI specific new control code for the DCMIPP. One of the subdev of the DCMIPP, dcmipp_parallel is now renamed as dcmipp_input since it allows to not only control the parallel but also the csi interface. Signed-off-by: Alain Volmat <alain.volmat(a)foss.st.com> --- Alain Volmat (15): media: stm32: dcmipp: correct dma_set_mask_and_coherent mask value dt-bindings: media: add description of stm32 csi media: stm32: csi: addition of the STM32 CSI driver media: stm32: dcmipp: use v4l2_subdev_is_streaming media: stm32: dcmipp: replace s_stream with enable/disable_streams media: stm32: dcmipp: rename dcmipp_parallel into dcmipp_input media: stm32: dcmipp: add support for csi input into dcmipp-input media: stm32: dcmipp: add bayer 10~14 bits formats media: stm32: dcmipp: add 1X16 RGB / YUV formats support media: stm32: dcmipp: avoid duplicated format on enum in bytecap media: stm32: dcmipp: fill media ctl hw_revision field dt-bindings: media: add the stm32mp25 compatible of DCMIPP media: stm32: dcmipp: add core support for the stm32mp25 arm64: dts: st: add csi & dcmipp node in stm32mp25 arm64: dts: st: enable imx335/csi/dcmipp pipeline on stm32mp257f-ev1 .../devicetree/bindings/media/st,stm32-dcmipp.yaml | 53 +- .../bindings/media/st,stm32mp25-csi.yaml | 125 +++ MAINTAINERS | 8 + arch/arm64/boot/dts/st/stm32mp251.dtsi | 23 + arch/arm64/boot/dts/st/stm32mp257f-ev1.dts | 85 ++ drivers/media/platform/st/stm32/Kconfig | 14 + drivers/media/platform/st/stm32/Makefile | 1 + drivers/media/platform/st/stm32/stm32-csi.c | 1144 ++++++++++++++++++++ .../media/platform/st/stm32/stm32-dcmipp/Makefile | 2 +- .../st/stm32/stm32-dcmipp/dcmipp-bytecap.c | 128 ++- .../st/stm32/stm32-dcmipp/dcmipp-byteproc.c | 119 +- .../platform/st/stm32/stm32-dcmipp/dcmipp-common.h | 4 +- .../platform/st/stm32/stm32-dcmipp/dcmipp-core.c | 116 +- .../platform/st/stm32/stm32-dcmipp/dcmipp-input.c | 540 +++++++++ .../st/stm32/stm32-dcmipp/dcmipp-parallel.c | 440 -------- 15 files changed, 2226 insertions(+), 576 deletions(-) --- base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc change-id: 20241007-csi_dcmipp_mp25-7779601f57da Best regards, -- Alain Volmat <alain.volmat(a)foss.st.com>

1 year, 1 month

1
1
0 0

[PATCH v6.1.y v6.6.y] riscv/purgatory: align riscv_kernel_entry

by Xiangyu Chen

From: Daniel Maslowski <cyrevolt(a)googlemail.com> [ Upstream commit fb197c5d2fd24b9af3d4697d0cf778645846d6d5 ] When alignment handling is delegated to the kernel, everything must be word-aligned in purgatory, since the trap handler is then set to the kexec one. Without the alignment, hitting the exception would ultimately crash. On other occasions, the kernel's handler would take care of exceptions. This has been tested on a JH7110 SoC with oreboot and its SBI delegating unaligned access exceptions and the kernel configured to handle them. Fixes: 736e30af583fb ("RISC-V: Add purgatory") Signed-off-by: Daniel Maslowski <cyrevolt(a)gmail.com> Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com> Link: https://lore.kernel.org/r/20240719170437.247457-1-cyrevolt@gmail.com Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> [Xiangyu: bp to fix CVE: CVE-2024-43868 ,resolved minor conflicts] Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> --- arch/riscv/purgatory/entry.S | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/riscv/purgatory/entry.S b/arch/riscv/purgatory/entry.S index 0194f4554130..a4ede42bc151 100644 --- a/arch/riscv/purgatory/entry.S +++ b/arch/riscv/purgatory/entry.S @@ -11,6 +11,8 @@ .macro size, sym:req .size \sym, . - \sym .endm +#include <asm/asm.h> +#include <linux/linkage.h> .text @@ -39,6 +41,7 @@ size purgatory_start .data +.align LGREG .globl riscv_kernel_entry riscv_kernel_entry: .quad 0 -- 2.43.0

1 year, 1 month

1
0
0 0

net: dpaa: Pad packets to ETH_ZLEN

by Hardik Gohil

Request to add the upstream patch to v5.4 kernel. Upstream commit cbd7ec083413c6a2e0c326d49e24ec7d12c7a9e0

1 year, 1 month

2
3
0 0

[PATCH net 7/8] can: mcp251xfd: mcp251xfd_ring_alloc(): fix coalescing configuration when switching CAN modes

by Marc Kleine-Budde

Since commit 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode"), the current ring and coalescing configuration is passed to can_ram_get_layout(). That fixed the issue when switching between CAN-CC and CAN-FD mode with configured ring (rx, tx) and/or coalescing parameters (rx-frames-irq, tx-frames-irq). However 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode"), introduced a regression when switching CAN modes with disabled coalescing configuration: Even if the previous CAN mode has no coalescing configured, the new mode is configured with active coalescing. This leads to delayed receiving of CAN-FD frames. This comes from the fact, that ethtool uses usecs = 0 and max_frames = 1 to disable coalescing, however the driver uses internally priv->{rx,tx}_obj_num_coalesce_irq = 0 to indicate disabled coalescing. Fix the regression by assigning struct ethtool_coalesce ec->{rx,tx}_max_coalesced_frames_irq = 1 if coalescing is disabled in the driver as can_ram_get_layout() expects this. Reported-by: https://github.com/vdh-robothania Closes: https://github.com/raspberrypi/linux/issues/6407 Fixes: 50ea5449c563 ("can: mcp251xfd: fix ring configuration when switching from CAN-CC to CAN-FD mode") Cc: stable(a)vger.kernel.org Reviewed-by: Simon Horman <horms(a)kernel.org> Link: https://patch.msgid.link/20241025-mcp251xfd-fix-coalesing-v1-1-9d11416de1df… Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c index e684991fa391..7209a831f0f2 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-ring.c @@ -2,7 +2,7 @@ // // mcp251xfd - Microchip MCP251xFD Family CAN controller driver // -// Copyright (c) 2019, 2020, 2021 Pengutronix, +// Copyright (c) 2019, 2020, 2021, 2024 Pengutronix, // Marc Kleine-Budde <kernel(a)pengutronix.de> // // Based on: @@ -483,9 +483,11 @@ int mcp251xfd_ring_alloc(struct mcp251xfd_priv *priv) }; const struct ethtool_coalesce ec = { .rx_coalesce_usecs_irq = priv->rx_coalesce_usecs_irq, - .rx_max_coalesced_frames_irq = priv->rx_obj_num_coalesce_irq, + .rx_max_coalesced_frames_irq = priv->rx_obj_num_coalesce_irq == 0 ? + 1 : priv->rx_obj_num_coalesce_irq, .tx_coalesce_usecs_irq = priv->tx_coalesce_usecs_irq, - .tx_max_coalesced_frames_irq = priv->tx_obj_num_coalesce_irq, + .tx_max_coalesced_frames_irq = priv->tx_obj_num_coalesce_irq == 0 ? + 1 : priv->tx_obj_num_coalesce_irq, }; struct can_ram_layout layout; -- 2.45.2

1 year, 1 month

2
2
0 0

[PATCH v5.4.y v5.10.y] spi: Fix deadlock when adding SPI controllers on SPI buses

by Hardik Gohil

From: Mark Brown <broonie(a)kernel.org> [ Upstream commit 6098475d4cb48d821bdf453c61118c56e26294f0 ] Currently we have a global spi_add_lock which we take when adding new devices so that we can check that we're not trying to reuse a chip select that's already controlled. This means that if the SPI device is itself a SPI controller and triggers the instantiation of further SPI devices we trigger a deadlock as we try to register and instantiate those devices while in the process of doing so for the parent controller and hence already holding the global spi_add_lock. Since we only care about concurrency within a single SPI bus move the lock to be per controller, avoiding the deadlock. This can be easily triggered in the case of spi-mux. Reported-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de> Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Hardik Gohil <hgohil(a)mvista.com> --- This fix was not backported to v5.4 and v5.10 please also apply following fix on top of this spi: fix use-after-free of the add_lock mutex commit 6c53b45c71b4920b5e62f0ea8079a1da382b9434 upstream. Commit 6098475d4cb4 ("spi: Fix deadlock when adding SPI controllers on SPI buses") introduced a per-controller mutex. But mutex_unlock() of said lock is called after the controller is already freed: drivers/spi/spi.c | 15 +++++---------- include/linux/spi/spi.h | 3 +++ 2 files changed, 8 insertions(+), 10 deletions(-) diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 0f9410e..58f1947 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -472,12 +472,6 @@ static LIST_HEAD(spi_controller_list); */ static DEFINE_MUTEX(board_lock); -/* - * Prevents addition of devices with same chip select and - * addition of devices below an unregistering controller. - */ -static DEFINE_MUTEX(spi_add_lock); - /** * spi_alloc_device - Allocate a new SPI device * @ctlr: Controller to which device is connected @@ -580,7 +574,7 @@ int spi_add_device(struct spi_device *spi) * chipselect **BEFORE** we call setup(), else we'll trash * its configuration. Lock against concurrent add() calls. */ - mutex_lock(&spi_add_lock); + mutex_lock(&ctlr->add_lock); status = bus_for_each_dev(&spi_bus_type, NULL, spi, spi_dev_check); if (status) { @@ -624,7 +618,7 @@ int spi_add_device(struct spi_device *spi) } done: - mutex_unlock(&spi_add_lock); + mutex_unlock(&ctlr->add_lock); return status; } EXPORT_SYMBOL_GPL(spi_add_device); @@ -2512,6 +2506,7 @@ int spi_register_controller(struct spi_controller *ctlr) spin_lock_init(&ctlr->bus_lock_spinlock); mutex_init(&ctlr->bus_lock_mutex); mutex_init(&ctlr->io_mutex); + mutex_init(&ctlr->add_lock); ctlr->bus_lock_flag = 0; init_completion(&ctlr->xfer_completion); if (!ctlr->max_dma_len) @@ -2657,7 +2652,7 @@ void spi_unregister_controller(struct spi_controller *ctlr) /* Prevent addition of new devices, unregister existing ones */ if (IS_ENABLED(CONFIG_SPI_DYNAMIC)) - mutex_lock(&spi_add_lock); + mutex_lock(&ctlr->add_lock); device_for_each_child(&ctlr->dev, NULL, __unregister); @@ -2688,7 +2683,7 @@ void spi_unregister_controller(struct spi_controller *ctlr) mutex_unlock(&board_lock); if (IS_ENABLED(CONFIG_SPI_DYNAMIC)) - mutex_unlock(&spi_add_lock); + mutex_unlock(&ctlr->add_lock); } EXPORT_SYMBOL_GPL(spi_unregister_controller); diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h index ca39b33..1b9cb90 100644 --- a/include/linux/spi/spi.h +++ b/include/linux/spi/spi.h @@ -483,6 +483,9 @@ struct spi_controller { /* I/O mutex */ struct mutex io_mutex; + /* Used to avoid adding the same CS twice */ + struct mutex add_lock; + /* lock and mutex for SPI bus locking */ spinlock_t bus_lock_spinlock; struct mutex bus_lock_mutex; -- 2.7.4

1 year, 1 month

1
0
0 0

+ mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: John Hubbard <jhubbard(a)nvidia.com> Subject: mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases Date: Mon, 4 Nov 2024 19:29:44 -0800 commit 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()") created a new constraint on the pin_user_pages*() API family: a potentially large internal allocation must now occur, for FOLL_LONGTERM cases. A user-visible consequence has now appeared: user space can no longer pin more than 2GB of memory anymore on x86_64. That's because, on a 4KB PAGE_SIZE system, when user space tries to (indirectly, via a device driver that calls pin_user_pages()) pin 2GB, this requires an allocation of a folio pointers array of MAX_PAGE_ORDER size, which is the limit for kmalloc(). In addition to the directly visible effect described above, there is also the problem of adding an unnecessary allocation. The **pages array argument has already been allocated, and there is no need for a redundant **folios array allocation in this case. Fix this by avoiding the new allocation entirely. This is done by referring to either the original page[i] within **pages, or to the associated folio. Thanks to David Hildenbrand for suggesting this approach and for providing the initial implementation (which I've tested and adjusted slightly) as well. Link: https://lkml.kernel.org/r/20241105032944.141488-2-jhubbard@nvidia.com Fixes: 53ba78de064b ("mm/gup: introduce check_and_migrate_movable_folios()") Signed-off-by: John Hubbard <jhubbard(a)nvidia.com> Suggested-by: David Hildenbrand <david(a)redhat.com> Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com> Cc: Dave Airlie <airlied(a)redhat.com> Cc: Gerd Hoffmann <kraxel(a)redhat.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: Christoph Hellwig <hch(a)infradead.org> Cc: Jason Gunthorpe <jgg(a)nvidia.com> Cc: Peter Xu <peterx(a)redhat.com> Cc: Arnd Bergmann <arnd(a)arndb.de> Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch> Cc: Dongwon Kim <dongwon.kim(a)intel.com> Cc: Hugh Dickins <hughd(a)google.com> Cc: Junxiao Chang <junxiao.chang(a)intel.com> Cc: Mike Kravetz <mike.kravetz(a)oracle.com> Cc: Oscar Salvador <osalvador(a)suse.de> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/gup.c | 116 +++++++++++++++++++++++++++++++++++------------------ 1 file changed, 77 insertions(+), 39 deletions(-) --- a/mm/gup.c~mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases +++ a/mm/gup.c @@ -2273,20 +2273,57 @@ struct page *get_dump_page(unsigned long #endif /* CONFIG_ELF_CORE */ #ifdef CONFIG_MIGRATION + +/* + * An array of either pages or folios ("pofs"). Although it may seem tempting to + * avoid this complication, by simply interpreting a list of folios as a list of + * pages, that approach won't work in the longer term, because eventually the + * layouts of struct page and struct folio will become completely different. + * Furthermore, this pof approach avoids excessive page_folio() calls. + */ +struct pages_or_folios { + union { + struct page **pages; + struct folio **folios; + void **entries; + }; + bool has_folios; + long nr_entries; +}; + +static struct folio *pofs_get_folio(struct pages_or_folios *pofs, long i) +{ + if (pofs->has_folios) + return pofs->folios[i]; + return page_folio(pofs->pages[i]); +} + +static void pofs_clear_entry(struct pages_or_folios *pofs, long i) +{ + pofs->entries[i] = NULL; +} + +static void pofs_unpin(struct pages_or_folios *pofs) +{ + if (pofs->has_folios) + unpin_folios(pofs->folios, pofs->nr_entries); + else + unpin_user_pages(pofs->pages, pofs->nr_entries); +} + /* * Returns the number of collected folios. Return value is always >= 0. */ static unsigned long collect_longterm_unpinnable_folios( - struct list_head *movable_folio_list, - unsigned long nr_folios, - struct folio **folios) + struct list_head *movable_folio_list, + struct pages_or_folios *pofs) { unsigned long i, collected = 0; struct folio *prev_folio = NULL; bool drain_allow = true; - for (i = 0; i < nr_folios; i++) { - struct folio *folio = folios[i]; + for (i = 0; i < pofs->nr_entries; i++) { + struct folio *folio = pofs_get_folio(pofs, i); if (folio == prev_folio) continue; @@ -2327,16 +2364,15 @@ static unsigned long collect_longterm_un * Returns -EAGAIN if all folios were successfully migrated or -errno for * failure (or partial success). */ -static int migrate_longterm_unpinnable_folios( - struct list_head *movable_folio_list, - unsigned long nr_folios, - struct folio **folios) +static int +migrate_longterm_unpinnable_folios(struct list_head *movable_folio_list, + struct pages_or_folios *pofs) { int ret; unsigned long i; - for (i = 0; i < nr_folios; i++) { - struct folio *folio = folios[i]; + for (i = 0; i < pofs->nr_entries; i++) { + struct folio *folio = pofs_get_folio(pofs, i); if (folio_is_device_coherent(folio)) { /* @@ -2344,7 +2380,7 @@ static int migrate_longterm_unpinnable_f * convert the pin on the source folio to a normal * reference. */ - folios[i] = NULL; + pofs_clear_entry(pofs, i); folio_get(folio); gup_put_folio(folio, 1, FOLL_PIN); @@ -2363,8 +2399,8 @@ static int migrate_longterm_unpinnable_f * calling folio_isolate_lru() which takes a reference so the * folio won't be freed if it's migrating. */ - unpin_folio(folios[i]); - folios[i] = NULL; + unpin_folio(pofs_get_folio(pofs, i)); + pofs_clear_entry(pofs, i); } if (!list_empty(movable_folio_list)) { @@ -2387,12 +2423,26 @@ static int migrate_longterm_unpinnable_f return -EAGAIN; err: - unpin_folios(folios, nr_folios); + pofs_unpin(pofs); putback_movable_pages(movable_folio_list); return ret; } +static long +check_and_migrate_movable_pages_or_folios(struct pages_or_folios *pofs) +{ + LIST_HEAD(movable_folio_list); + unsigned long collected; + + collected = + collect_longterm_unpinnable_folios(&movable_folio_list, pofs); + if (!collected) + return 0; + + return migrate_longterm_unpinnable_folios(&movable_folio_list, pofs); +} + /* * Check whether all folios are *allowed* to be pinned indefinitely (long term). * Rather confusingly, all folios in the range are required to be pinned via @@ -2417,16 +2467,13 @@ err: static long check_and_migrate_movable_folios(unsigned long nr_folios, struct folio **folios) { - unsigned long collected; - LIST_HEAD(movable_folio_list); + struct pages_or_folios pofs = { + .folios = folios, + .has_folios = true, + .nr_entries = nr_folios, + }; - collected = collect_longterm_unpinnable_folios(&movable_folio_list, - nr_folios, folios); - if (!collected) - return 0; - - return migrate_longterm_unpinnable_folios(&movable_folio_list, - nr_folios, folios); + return check_and_migrate_movable_pages_or_folios(&pofs); } /* @@ -2436,22 +2483,13 @@ static long check_and_migrate_movable_fo static long check_and_migrate_movable_pages(unsigned long nr_pages, struct page **pages) { - struct folio **folios; - long i, ret; - - folios = kmalloc_array(nr_pages, sizeof(*folios), GFP_KERNEL); - if (!folios) { - unpin_user_pages(pages, nr_pages); - return -ENOMEM; - } - - for (i = 0; i < nr_pages; i++) - folios[i] = page_folio(pages[i]); + struct pages_or_folios pofs = { + .pages = pages, + .has_folios = false, + .nr_entries = nr_pages, + }; - ret = check_and_migrate_movable_folios(nr_pages, folios); - - kfree(folios); - return ret; + return check_and_migrate_movable_pages_or_folios(&pofs); } #else static long check_and_migrate_movable_pages(unsigned long nr_pages, _ Patches currently in -mm which might be from jhubbard(a)nvidia.com are mm-gup-avoid-an-unnecessary-allocation-call-for-foll_longterm-cases.patch kaslr-rename-physmem_end-and-physmem_end-to-direct_map_physmem_end.patch

1 year, 1 month

1
0
0 0

[PATCH] scsi: ufs: Start the RTC update work later

by Bart Van Assche

The RTC update work involves runtime resuming the UFS controller. Hence, only start the RTC update work after runtime power management in the UFS driver has been fully initialized. This patch fixes the following kernel crash: Internal error: Oops: 0000000096000006 [#1] PREEMPT SMP Workqueue: events ufshcd_rtc_work Call trace: _raw_spin_lock_irqsave+0x34/0x8c (P) pm_runtime_get_if_active+0x24/0x9c (L) pm_runtime_get_if_active+0x24/0x9c ufshcd_rtc_work+0x138/0x1b4 process_one_work+0x148/0x288 worker_thread+0x2cc/0x3d4 kthread+0x110/0x114 ret_from_fork+0x10/0x20 Reported-by: Neil Armstrong <neil.armstrong(a)linaro.org> Closes: https://lore.kernel.org/linux-scsi/0c0bc528-fdc2-4106-bc99-f23ae377f6f5@lin… Fixes: 6bf999e0eb41 ("scsi: ufs: core: Add UFS RTC support") Cc: Bean Huo <beanhuo(a)micron.com> Cc: stable(a)vger.kernel.org Signed-off-by: Bart Van Assche <bvanassche(a)acm.org> --- drivers/ufs/core/ufshcd.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 585557eaa9a2..ed82ff329314 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -8633,6 +8633,14 @@ static int ufshcd_add_lus(struct ufs_hba *hba) ufshcd_init_clk_scaling_sysfs(hba); } + /* + * The RTC update code accesses the hba->ufs_device_wlun->sdev_gendev + * pointer and hence must only be started after the WLUN pointer has + * been initialized by ufshcd_scsi_add_wlus(). + */ + schedule_delayed_work(&hba->ufs_rtc_update_work, + msecs_to_jiffies(UFS_RTC_UPDATE_INTERVAL_MS)); + ufs_bsg_probe(hba); scsi_scan_host(hba->host); @@ -8727,8 +8735,6 @@ static int ufshcd_post_device_init(struct ufs_hba *hba) ufshcd_force_reset_auto_bkops(hba); ufshcd_set_timestamp_attr(hba); - schedule_delayed_work(&hba->ufs_rtc_update_work, - msecs_to_jiffies(UFS_RTC_UPDATE_INTERVAL_MS)); if (!hba->max_pwr_info.is_valid) return 0;

1 year, 1 month

7
8
0 0

Re: [PATCH v2 1/1] [PATCH] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases

by kernel test robot

Hi, Thanks for your patch. FYI: kernel test robot notices the stable kernel rule is not satisfied. The check is based on https://www.kernel.org/doc/html/latest/process/stable-kernel-rules.html#opt… Rule: add the tag "Cc: stable(a)vger.kernel.org" in the sign-off area to have the patch automatically included in the stable tree. Subject: [PATCH v2 1/1] [PATCH] mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases Link: https://lore.kernel.org/stable/20241105032944.141488-2-jhubbard%40nvidia.com -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki

1 year, 1 month

1
0
0 0

+ ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove() has been added to the -mm mm-hotfixes-unstable branch. Its filename is ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Andrew Kanner <andrew.kanner(a)gmail.com> Subject: ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove() Date: Sun, 3 Nov 2024 20:38:45 +0100 Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove(): [ 57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12 [ 57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper. Leaking 1 clusters and removing the entry [ 57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004 [...] [ 57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0 [...] [ 57.331328] Call Trace: [ 57.331477] <TASK> [...] [ 57.333511] ? do_user_addr_fault+0x3e5/0x740 [ 57.333778] ? exc_page_fault+0x70/0x170 [ 57.334016] ? asm_exc_page_fault+0x2b/0x30 [ 57.334263] ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10 [ 57.334596] ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0 [ 57.334913] ocfs2_xa_remove_entry+0x23/0xc0 [ 57.335164] ocfs2_xa_set+0x704/0xcf0 [ 57.335381] ? _raw_spin_unlock+0x1a/0x40 [ 57.335620] ? ocfs2_inode_cache_unlock+0x16/0x20 [ 57.335915] ? trace_preempt_on+0x1e/0x70 [ 57.336153] ? start_this_handle+0x16c/0x500 [ 57.336410] ? preempt_count_sub+0x50/0x80 [ 57.336656] ? _raw_read_unlock+0x20/0x40 [ 57.336906] ? start_this_handle+0x16c/0x500 [ 57.337162] ocfs2_xattr_block_set+0xa6/0x1e0 [ 57.337424] __ocfs2_xattr_set_handle+0x1fd/0x5d0 [ 57.337706] ? ocfs2_start_trans+0x13d/0x290 [ 57.337971] ocfs2_xattr_set+0xb13/0xfb0 [ 57.338207] ? dput+0x46/0x1c0 [ 57.338393] ocfs2_xattr_trusted_set+0x28/0x30 [ 57.338665] ? ocfs2_xattr_trusted_set+0x28/0x30 [ 57.338948] __vfs_removexattr+0x92/0xc0 [ 57.339182] __vfs_removexattr_locked+0xd5/0x190 [ 57.339456] ? preempt_count_sub+0x50/0x80 [ 57.339705] vfs_removexattr+0x5f/0x100 [...] Reproducer uses faultinject facility to fail ocfs2_xa_remove() -> ocfs2_xa_value_truncate() with -ENOMEM. In this case the comment mentions that we can return 0 if ocfs2_xa_cleanup_value_truncate() is going to wipe the entry anyway. But the following 'rc' check is wrong and execution flow do 'ocfs2_xa_remove_entry(loc);' twice: * 1st: in ocfs2_xa_cleanup_value_truncate(); * 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'. Fix this by skipping the 2nd removal of the same entry and making syzkaller repro happy. Link: https://lkml.kernel.org/r/20241103193845.2940988-1-andrew.kanner@gmail.com Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.") Signed-off-by: Andrew Kanner <andrew.kanner(a)gmail.com> Reported-by: syzbot+386ce9e60fa1b18aac5b(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/ Tested-by: syzbot+386ce9e60fa1b18aac5b(a)syzkaller.appspotmail.com Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com> Cc: Mark Fasheh <mark(a)fasheh.com> Cc: Joel Becker <jlbec(a)evilplan.org> Cc: Junxiao Bi <junxiao.bi(a)oracle.com> Cc: Changwei Ge <gechangwei(a)live.cn> Cc: Gang He <ghe(a)suse.com> Cc: Jun Piao <piaojun(a)huawei.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- fs/ocfs2/xattr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/fs/ocfs2/xattr.c~ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove +++ a/fs/ocfs2/xattr.c @@ -2036,8 +2036,7 @@ static int ocfs2_xa_remove(struct ocfs2_ rc = 0; ocfs2_xa_cleanup_value_truncate(loc, "removing", orig_clusters); - if (rc) - goto out; + goto out; } } _ Patches currently in -mm which might be from andrew.kanner(a)gmail.com are ocfs2-remove-entry-once-instead-of-null-ptr-dereference-in-ocfs2_xa_remove.patch

1 year, 1 month

1
0
0 0

Re: Patch "net: hns3: add sync command to sync io-pgtable" has been added to the 6.6-stable tree

by Jijie Shao

on 2024/11/2 3:24, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > net: hns3: add sync command to sync io-pgtable > > to the 6.6-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > net-hns3-add-sync-command-to-sync-io-pgtable.patch > and it can be found in the queue-6.6 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. Hi: This patch was reverted from netdev, so, it also need be reverted from stable tree. I am sorry for that. reverted link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=2… Thanks, Jijie Shao > > > > commit 98cb88cce78da8a369cf780343d0280f9d3af4ca > Author: Jian Shen <shenjian15(a)huawei.com> > Date: Fri Oct 25 17:29:31 2024 +0800 > > net: hns3: add sync command to sync io-pgtable > > [ Upstream commit f2c14899caba76da93ff3fff46b4d5a8f43ce07e ] > > To avoid errors in pgtable prefectch, add a sync command to sync > io-pagtable. > > This is a supplement for the previous patch. > We want all the tx packet can be handled with tx bounce buffer path. > But it depends on the remain space of the spare buffer, checked by the > hns3_can_use_tx_bounce(). In most cases, maybe 99.99%, it returns true. > But once it return false by no available space, the packet will be handled > with the former path, which will map/unmap the skb buffer. > Then the driver will face the smmu prefetch risk again. > > So add a sync command in this case to avoid smmu prefectch, > just protects corner scenes. > > Fixes: 295ba232a8c3 ("net: hns3: add device version to replace pci revision") > Signed-off-by: Jian Shen <shenjian15(a)huawei.com> > Signed-off-by: Peiyang Wang <wangpeiyang1(a)huawei.com> > Signed-off-by: Jijie Shao <shaojijie(a)huawei.com> > Signed-off-by: Paolo Abeni <pabeni(a)redhat.com> > Signed-off-by: Sasha Levin <sashal(a)kernel.org> > > diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c > index 1f9bbf13214fb..bfcebf4e235ef 100644 > --- a/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c > +++ b/drivers/net/ethernet/hisilicon/hns3/hns3_enet.c > @@ -381,6 +381,24 @@ static const struct hns3_rx_ptype hns3_rx_ptype_tbl[] = { > #define HNS3_INVALID_PTYPE \ > ARRAY_SIZE(hns3_rx_ptype_tbl) > > +static void hns3_dma_map_sync(struct device *dev, unsigned long iova) > +{ > + struct iommu_domain *domain = iommu_get_domain_for_dev(dev); > + struct iommu_iotlb_gather iotlb_gather; > + size_t granule; > + > + if (!domain || !iommu_is_dma_domain(domain)) > + return; > + > + granule = 1 << __ffs(domain->pgsize_bitmap); > + iova = ALIGN_DOWN(iova, granule); > + iotlb_gather.start = iova; > + iotlb_gather.end = iova + granule - 1; > + iotlb_gather.pgsize = granule; > + > + iommu_iotlb_sync(domain, &iotlb_gather); > +} > + > static irqreturn_t hns3_irq_handle(int irq, void *vector) > { > struct hns3_enet_tqp_vector *tqp_vector = vector; > @@ -1728,7 +1746,9 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv, > unsigned int type) > { > struct hns3_desc_cb *desc_cb = &ring->desc_cb[ring->next_to_use]; > + struct hnae3_handle *handle = ring->tqp->handle; > struct device *dev = ring_to_dev(ring); > + struct hnae3_ae_dev *ae_dev; > unsigned int size; > dma_addr_t dma; > > @@ -1760,6 +1780,13 @@ static int hns3_map_and_fill_desc(struct hns3_enet_ring *ring, void *priv, > return -ENOMEM; > } > > + /* Add a SYNC command to sync io-pgtale to avoid errors in pgtable > + * prefetch > + */ > + ae_dev = hns3_get_ae_dev(handle); > + if (ae_dev->dev_version >= HNAE3_DEVICE_VERSION_V3) > + hns3_dma_map_sync(dev, dma); > + > desc_cb->priv = priv; > desc_cb->length = size; > desc_cb->dma = dma;

1 year, 1 month

1
0
0 0

[PATCH 6.1] bnxt_re: avoid shift undefined behavior in bnxt_qplib_alloc_init_hwq

by Xiangyu Chen

From: Michal Schmidt <mschmidt(a)redhat.com> [ Upstream commit 78cfd17142ef70599d6409cbd709d94b3da58659 ] Undefined behavior is triggered when bnxt_qplib_alloc_init_hwq is called with hwq_attr->aux_depth != 0 and hwq_attr->aux_stride == 0. In that case, "roundup_pow_of_two(hwq_attr->aux_stride)" gets called. roundup_pow_of_two is documented as undefined for 0. Fix it in the one caller that had this combination. The undefined behavior was detected by UBSAN: UBSAN: shift-out-of-bounds in ./include/linux/log2.h:57:13 shift exponent 64 is too large for 64-bit type 'long unsigned int' CPU: 24 PID: 1075 Comm: (udev-worker) Not tainted 6.9.0-rc6+ #4 Hardware name: Abacus electric, s.r.o. - servis(a)abacus.cz Super Server/H12SSW-iN, BIOS 2.7 10/25/2023 Call Trace: <TASK> dump_stack_lvl+0x5d/0x80 ubsan_epilogue+0x5/0x30 __ubsan_handle_shift_out_of_bounds.cold+0x61/0xec __roundup_pow_of_two+0x25/0x35 [bnxt_re] bnxt_qplib_alloc_init_hwq+0xa1/0x470 [bnxt_re] bnxt_qplib_create_qp+0x19e/0x840 [bnxt_re] bnxt_re_create_qp+0x9b1/0xcd0 [bnxt_re] ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 ? __kmalloc+0x1b6/0x4f0 ? create_qp.part.0+0x128/0x1c0 [ib_core] ? __pfx_bnxt_re_create_qp+0x10/0x10 [bnxt_re] create_qp.part.0+0x128/0x1c0 [ib_core] ib_create_qp_kernel+0x50/0xd0 [ib_core] create_mad_qp+0x8e/0xe0 [ib_core] ? __pfx_qp_event_handler+0x10/0x10 [ib_core] ib_mad_init_device+0x2be/0x680 [ib_core] add_client_context+0x10d/0x1a0 [ib_core] enable_device_and_get+0xe0/0x1d0 [ib_core] ib_register_device+0x53c/0x630 [ib_core] ? srso_alias_return_thunk+0x5/0xfbef5 bnxt_re_probe+0xbd8/0xe50 [bnxt_re] ? __pfx_bnxt_re_probe+0x10/0x10 [bnxt_re] auxiliary_bus_probe+0x49/0x80 ? driver_sysfs_add+0x57/0xc0 really_probe+0xde/0x340 ? pm_runtime_barrier+0x54/0x90 ? __pfx___driver_attach+0x10/0x10 __driver_probe_device+0x78/0x110 driver_probe_device+0x1f/0xa0 __driver_attach+0xba/0x1c0 bus_for_each_dev+0x8f/0xe0 bus_add_driver+0x146/0x220 driver_register+0x72/0xd0 __auxiliary_driver_register+0x6e/0xd0 ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re] bnxt_re_mod_init+0x3e/0xff0 [bnxt_re] ? __pfx_bnxt_re_mod_init+0x10/0x10 [bnxt_re] do_one_initcall+0x5b/0x310 do_init_module+0x90/0x250 init_module_from_file+0x86/0xc0 idempotent_init_module+0x121/0x2b0 __x64_sys_finit_module+0x5e/0xb0 do_syscall_64+0x82/0x160 ? srso_alias_return_thunk+0x5/0xfbef5 ? syscall_exit_to_user_mode_prepare+0x149/0x170 ? srso_alias_return_thunk+0x5/0xfbef5 ? syscall_exit_to_user_mode+0x75/0x230 ? srso_alias_return_thunk+0x5/0xfbef5 ? do_syscall_64+0x8e/0x160 ? srso_alias_return_thunk+0x5/0xfbef5 ? __count_memcg_events+0x69/0x100 ? srso_alias_return_thunk+0x5/0xfbef5 ? count_memcg_events.constprop.0+0x1a/0x30 ? srso_alias_return_thunk+0x5/0xfbef5 ? handle_mm_fault+0x1f0/0x300 ? srso_alias_return_thunk+0x5/0xfbef5 ? do_user_addr_fault+0x34e/0x640 ? srso_alias_return_thunk+0x5/0xfbef5 ? srso_alias_return_thunk+0x5/0xfbef5 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7f4e5132821d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d e3 db 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffca9c906a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139 RAX: ffffffffffffffda RBX: 0000563ec8a8f130 RCX: 00007f4e5132821d RDX: 0000000000000000 RSI: 00007f4e518fa07d RDI: 000000000000003b RBP: 00007ffca9c90760 R08: 00007f4e513f6b20 R09: 00007ffca9c906f0 R10: 0000563ec8a8faa0 R11: 0000000000000246 R12: 00007f4e518fa07d R13: 0000000000020000 R14: 0000563ec8409e90 R15: 0000563ec8a8fa60 </TASK> ---[ end trace ]--- Fixes: 0c4dcd602817 ("RDMA/bnxt_re: Refactor hardware queue memory allocation") Signed-off-by: Michal Schmidt <mschmidt(a)redhat.com> Link: https://lore.kernel.org/r/20240507103929.30003-1-mschmidt@redhat.com Acked-by: Selvin Xavier <selvin.xavier(a)broadcom.com> Signed-off-by: Leon Romanovsky <leon(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> [Xiangyu: bp to fix CVE: CVE-2024-38540] Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com> --- drivers/infiniband/hw/bnxt_re/qplib_fp.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/infiniband/hw/bnxt_re/qplib_fp.c b/drivers/infiniband/hw/bnxt_re/qplib_fp.c index 1011293547ef..7d8dede13929 100644 --- a/drivers/infiniband/hw/bnxt_re/qplib_fp.c +++ b/drivers/infiniband/hw/bnxt_re/qplib_fp.c @@ -1014,7 +1014,8 @@ int bnxt_qplib_create_qp(struct bnxt_qplib_res *res, struct bnxt_qplib_qp *qp) hwq_attr.stride = sizeof(struct sq_sge); hwq_attr.depth = bnxt_qplib_get_depth(sq); hwq_attr.aux_stride = psn_sz; - hwq_attr.aux_depth = bnxt_qplib_set_sq_size(sq, qp->wqe_mode); + hwq_attr.aux_depth = psn_sz ? bnxt_qplib_set_sq_size(sq, qp->wqe_mode) + : 0; hwq_attr.type = HWQ_TYPE_QUEUE; rc = bnxt_qplib_alloc_init_hwq(&sq->hwq, &hwq_attr); if (rc) -- 2.43.0

1 year, 1 month

1
0
0 0

[PATCH v2 1/1] phy: freescale: imx8m-pcie: Do CMN_RST just before PHY PLL lock check

by Frank Li

From: Richard Zhu <hongxing.zhu(a)nxp.com> When enable initcall_debug together with higher debug level below. CONFIG_CONSOLE_LOGLEVEL_DEFAULT=9 CONFIG_CONSOLE_LOGLEVEL_QUIET=9 CONFIG_MESSAGE_LOGLEVEL_DEFAULT=7 The initialization of i.MX8MP PCIe PHY might be timeout failed randomly. To fix this issue, adjust the sequence of the resets refer to the power up sequence listed below. i.MX8MP PCIe PHY power up sequence: /--------------------------------------------- 1.8v supply ---------/ /--------------------------------------------------- 0.8v supply ---/ ---\ /-------------------------------------------------- X REFCLK Valid Reference Clock ---/ \-------------------------------------------------- ------------------------------------------- | i_init_restn -------------- ------------------------------------ | i_cmn_rstn --------------------- ------------------------------- | o_pll_lock_done -------------------------- Logs: imx6q-pcie 33800000.pcie: host bridge /soc@0/pcie@33800000 ranges: imx6q-pcie 33800000.pcie: IO 0x001ff80000..0x001ff8ffff -> 0x0000000000 imx6q-pcie 33800000.pcie: MEM 0x0018000000..0x001fefffff -> 0x0018000000 probe of clk_imx8mp_audiomix.reset.0 returned 0 after 1052 usecs probe of 30e20000.clock-controller returned 0 after 32971 usecs phy phy-32f00000.pcie-phy.4: phy poweron failed --> -110 probe of 30e10000.dma-controller returned 0 after 10235 usecs imx6q-pcie 33800000.pcie: waiting for PHY ready timeout! dwhdmi-imx 32fd8000.hdmi: Detected HDMI TX controller v2.13a with HDCP (samsung_dw_hdmi_phy2) imx6q-pcie 33800000.pcie: probe with driver imx6q-pcie failed with error -110 Fixes: dce9edff16ee ("phy: freescale: imx8m-pcie: Add i.MX8MP PCIe PHY support") Cc: stable(a)vger.kernel.org Signed-off-by: Richard Zhu <hongxing.zhu(a)nxp.com> Signed-off-by: Frank Li <Frank.Li(a)nxp.com> v2 changes: - Rebase to latest fixes branch of linux-phy git repo. - Richard's environment have problem and can't sent out patch. So I help post this fix patch. --- drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c index 11fcb1867118c..e98361dcdeadf 100644 --- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c +++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c @@ -141,11 +141,6 @@ static int imx8_pcie_phy_power_on(struct phy *phy) IMX8MM_GPR_PCIE_REF_CLK_PLL); usleep_range(100, 200); - /* Do the PHY common block reset */ - regmap_update_bits(imx8_phy->iomuxc_gpr, IOMUXC_GPR14, - IMX8MM_GPR_PCIE_CMN_RST, - IMX8MM_GPR_PCIE_CMN_RST); - switch (imx8_phy->drvdata->variant) { case IMX8MP: reset_control_deassert(imx8_phy->perst); @@ -156,6 +151,11 @@ static int imx8_pcie_phy_power_on(struct phy *phy) break; } + /* Do the PHY common block reset */ + regmap_update_bits(imx8_phy->iomuxc_gpr, IOMUXC_GPR14, + IMX8MM_GPR_PCIE_CMN_RST, + IMX8MM_GPR_PCIE_CMN_RST); + /* Polling to check the phy is ready or not. */ ret = readl_poll_timeout(imx8_phy->base + IMX8MM_PCIE_PHY_CMN_REG075, val, val == ANA_PLL_DONE, 10, 20000); -- 2.34.1

1 year, 1 month

4
6
0 0

[PATCH] mm/slab: fix warning caused by duplicate kmem_cache creation in kmem_buckets_create

by Koichiro Den

Commit b035f5a6d852 ("mm: slab: reduce the kmalloc() minimum alignment if DMA bouncing possible") reduced ARCH_KMALLOC_MINALIGN to 8 on arm64. However, with KASAN_HW_TAGS enabled, arch_slab_minalign() becomes 16. This causes kmalloc_caches[*][8] to be aliased to kmalloc_caches[*][16], resulting in kmem_buckets_create() attempting to create a kmem_cache for size 16 twice. This duplication triggers warnings on boot: [ 2.325108] ------------[ cut here ]------------ [ 2.325135] kmem_cache of name 'memdup_user-16' already exists [ 2.325783] WARNING: CPU: 0 PID: 1 at mm/slab_common.c:107 __kmem_cache_create_args+0xb8/0x3b0 [ 2.327957] Modules linked in: [ 2.328550] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.12.0-rc5mm-unstable-arm64+ #12 [ 2.328683] Hardware name: QEMU QEMU Virtual Machine, BIOS 2024.02-2 03/11/2024 [ 2.328790] pstate: 61000009 (nZCv daif -PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 2.328911] pc : __kmem_cache_create_args+0xb8/0x3b0 [ 2.328930] lr : __kmem_cache_create_args+0xb8/0x3b0 [ 2.328942] sp : ffff800083d6fc50 [ 2.328961] x29: ffff800083d6fc50 x28: f2ff0000c1674410 x27: ffff8000820b0598 [ 2.329061] x26: 000000007fffffff x25: 0000000000000010 x24: 0000000000002000 [ 2.329101] x23: ffff800083d6fce8 x22: ffff8000832222e8 x21: ffff800083222388 [ 2.329118] x20: f2ff0000c1674410 x19: f5ff0000c16364c0 x18: ffff800083d80030 [ 2.329135] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 2.329152] x14: 0000000000000000 x13: 0a73747369786520 x12: 79646165726c6120 [ 2.329169] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000 [ 2.329194] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 2.329210] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 2.329226] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 [ 2.329291] Call trace: [ 2.329407] __kmem_cache_create_args+0xb8/0x3b0 [ 2.329499] kmem_buckets_create+0xfc/0x320 [ 2.329526] init_user_buckets+0x34/0x78 [ 2.329540] do_one_initcall+0x64/0x3c8 [ 2.329550] kernel_init_freeable+0x26c/0x578 [ 2.329562] kernel_init+0x3c/0x258 [ 2.329574] ret_from_fork+0x10/0x20 [ 2.329698] ---[ end trace 0000000000000000 ]--- [ 2.403704] ------------[ cut here ]------------ [ 2.404716] kmem_cache of name 'msg_msg-16' already exists [ 2.404801] WARNING: CPU: 2 PID: 1 at mm/slab_common.c:107 __kmem_cache_create_args+0xb8/0x3b0 [ 2.404842] Modules linked in: [ 2.404971] CPU: 2 UID: 0 PID: 1 Comm: swapper/0 Tainted: G W 6.12.0-rc5mm-unstable-arm64+ #12 [ 2.405026] Tainted: [W]=WARN [ 2.405043] Hardware name: QEMU QEMU Virtual Machine, BIOS 2024.02-2 03/11/2024 [ 2.405057] pstate: 60400009 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) [ 2.405079] pc : __kmem_cache_create_args+0xb8/0x3b0 [ 2.405100] lr : __kmem_cache_create_args+0xb8/0x3b0 [ 2.405111] sp : ffff800083d6fc50 [ 2.405115] x29: ffff800083d6fc50 x28: fbff0000c1674410 x27: ffff8000820b0598 [ 2.405135] x26: 000000000000ffd0 x25: 0000000000000010 x24: 0000000000006000 [ 2.405153] x23: ffff800083d6fce8 x22: ffff8000832222e8 x21: ffff800083222388 [ 2.405169] x20: fbff0000c1674410 x19: fdff0000c163d6c0 x18: ffff800083d80030 [ 2.405185] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000 [ 2.405201] x14: 0000000000000000 x13: 0a73747369786520 x12: 79646165726c6120 [ 2.405217] x11: 656820747563205b x10: 2d2d2d2d2d2d2d2d x9 : 0000000000000000 [ 2.405233] x8 : 0000000000000000 x7 : 0000000000000000 x6 : 0000000000000000 [ 2.405248] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000 [ 2.405271] x2 : 0000000000000000 x1 : 0000000000000000 x0 : 0000000000000000 [ 2.405287] Call trace: [ 2.405293] __kmem_cache_create_args+0xb8/0x3b0 [ 2.405305] kmem_buckets_create+0xfc/0x320 [ 2.405315] init_msg_buckets+0x34/0x78 [ 2.405326] do_one_initcall+0x64/0x3c8 [ 2.405337] kernel_init_freeable+0x26c/0x578 [ 2.405348] kernel_init+0x3c/0x258 [ 2.405360] ret_from_fork+0x10/0x20 [ 2.405370] ---[ end trace 0000000000000000 ]--- To address this, alias kmem_cache for sizes smaller than min alignment to the aligned sized kmem_cache, as done with the default system kmalloc bucket. Cc: <stable(a)vger.kernel.org> # 6.11.x Fixes: b32801d1255b ("mm/slab: Introduce kmem_buckets_create() and family") Signed-off-by: Koichiro Den <koichiro.den(a)gmail.com> --- mm/slab_common.c | 102 ++++++++++++++++++++++++++++------------------- 1 file changed, 62 insertions(+), 40 deletions(-) diff --git a/mm/slab_common.c b/mm/slab_common.c index 3d26c257ed8b..64140561dd0e 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -354,6 +354,38 @@ struct kmem_cache *__kmem_cache_create_args(const char *name, } EXPORT_SYMBOL(__kmem_cache_create_args); +static unsigned int __kmalloc_minalign(void) +{ + unsigned int minalign = dma_get_cache_alignment(); + + if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) && + is_swiotlb_allocated()) + minalign = ARCH_KMALLOC_MINALIGN; + + return max(minalign, arch_slab_minalign()); +} + +static unsigned int __kmalloc_aligned_size(unsigned int idx) +{ + unsigned int aligned_size = kmalloc_info[idx].size; + unsigned int minalign = __kmalloc_minalign(); + + if (minalign > ARCH_KMALLOC_MINALIGN) + aligned_size = ALIGN(aligned_size, minalign); + + return aligned_size; +} + +static unsigned int __kmalloc_aligned_idx(unsigned int idx) +{ + unsigned int minalign = __kmalloc_minalign(); + + if (minalign > ARCH_KMALLOC_MINALIGN) + return __kmalloc_index(__kmalloc_aligned_size(idx), false); + + return idx; +} + static struct kmem_cache *kmem_buckets_cache __ro_after_init; /** @@ -381,7 +413,10 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, void (*ctor)(void *)) { kmem_buckets *b; - int idx; + unsigned int idx; + unsigned long mask = 0; + + BUILD_BUG_ON(ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]) > BITS_PER_LONG); /* * When the separate buckets API is not built in, just return @@ -402,43 +437,47 @@ kmem_buckets *kmem_buckets_create(const char *name, slab_flags_t flags, for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) { char *short_size, *cache_name; + unsigned int aligned_size = __kmalloc_aligned_size(idx); + unsigned int aligned_idx = __kmalloc_aligned_idx(idx); unsigned int cache_useroffset, cache_usersize; - unsigned int size; + /* this might be an aliased kmem_cache */ if (!kmalloc_caches[KMALLOC_NORMAL][idx]) continue; - size = kmalloc_caches[KMALLOC_NORMAL][idx]->object_size; - if (!size) - continue; - short_size = strchr(kmalloc_caches[KMALLOC_NORMAL][idx]->name, '-'); if (WARN_ON(!short_size)) goto fail; - cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1); - if (WARN_ON(!cache_name)) - goto fail; - - if (useroffset >= size) { + if (useroffset >= aligned_size) { cache_useroffset = 0; cache_usersize = 0; } else { cache_useroffset = useroffset; - cache_usersize = min(size - cache_useroffset, usersize); + cache_usersize = min(aligned_size - cache_useroffset, usersize); } - (*b)[idx] = kmem_cache_create_usercopy(cache_name, size, - 0, flags, cache_useroffset, - cache_usersize, ctor); - kfree(cache_name); - if (WARN_ON(!(*b)[idx])) - goto fail; + + if (!(*b)[aligned_idx]) { + cache_name = kasprintf(GFP_KERNEL, "%s-%s", name, short_size + 1); + if (WARN_ON(!cache_name)) + goto fail; + (*b)[aligned_idx] = kmem_cache_create_usercopy(cache_name, aligned_size, + 0, flags, cache_useroffset, + cache_usersize, ctor); + if (WARN_ON(!(*b)[aligned_idx])) { + kfree(cache_name); + goto fail; + } + set_bit(aligned_idx, &mask); + } + if (idx != aligned_idx) + (*b)[idx] = (*b)[aligned_idx]; } return b; fail: - for (idx = 0; idx < ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL]); idx++) + for_each_set_bit(idx, &mask, ARRAY_SIZE(kmalloc_caches[KMALLOC_NORMAL])) kmem_cache_destroy((*b)[idx]); kmem_cache_free(kmem_buckets_cache, b); @@ -871,24 +910,12 @@ void __init setup_kmalloc_cache_index_table(void) } } -static unsigned int __kmalloc_minalign(void) -{ - unsigned int minalign = dma_get_cache_alignment(); - - if (IS_ENABLED(CONFIG_DMA_BOUNCE_UNALIGNED_KMALLOC) && - is_swiotlb_allocated()) - minalign = ARCH_KMALLOC_MINALIGN; - - return max(minalign, arch_slab_minalign()); -} - static void __init -new_kmalloc_cache(int idx, enum kmalloc_cache_type type) +new_kmalloc_cache(unsigned int idx, enum kmalloc_cache_type type) { slab_flags_t flags = 0; - unsigned int minalign = __kmalloc_minalign(); - unsigned int aligned_size = kmalloc_info[idx].size; - int aligned_idx = idx; + unsigned int aligned_size = __kmalloc_aligned_size(idx); + unsigned int aligned_idx = __kmalloc_aligned_idx(idx); if ((KMALLOC_RECLAIM != KMALLOC_NORMAL) && (type == KMALLOC_RECLAIM)) { flags |= SLAB_RECLAIM_ACCOUNT; @@ -914,11 +941,6 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type) if (IS_ENABLED(CONFIG_MEMCG) && (type == KMALLOC_NORMAL)) flags |= SLAB_NO_MERGE; - if (minalign > ARCH_KMALLOC_MINALIGN) { - aligned_size = ALIGN(aligned_size, minalign); - aligned_idx = __kmalloc_index(aligned_size, false); - } - if (!kmalloc_caches[type][aligned_idx]) kmalloc_caches[type][aligned_idx] = create_kmalloc_cache( kmalloc_info[aligned_idx].name[type], @@ -934,7 +956,7 @@ new_kmalloc_cache(int idx, enum kmalloc_cache_type type) */ void __init create_kmalloc_caches(void) { - int i; + unsigned int i; enum kmalloc_cache_type type; /* -- 2.43.0

1 year, 1 month

5
7
0 0

[PATCH] media: uvcvideo:Create input device for all uvc devices with status endpoints.

by chenchangcheng

Some applications need to check if there is an input device on the camera before proceeding to the next step. When there is no input device, the application will report an error. Create input device for all uvc devices with status endpoints. and only when bTriggerSupport and bTriggerUsage are one are allowed to report camera button. Fixes: 3bc22dc66a4f ("media: uvcvideo: Only create input devs if hw supports it") Signed-off-by: chenchangcheng <ccc194101(a)163.com> --- drivers/media/usb/uvc/uvc_status.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_status.c b/drivers/media/usb/uvc/uvc_status.c index a78a88c710e2..177640c6a813 100644 --- a/drivers/media/usb/uvc/uvc_status.c +++ b/drivers/media/usb/uvc/uvc_status.c @@ -44,9 +44,6 @@ static int uvc_input_init(struct uvc_device *dev) struct input_dev *input; int ret; - if (!uvc_input_has_button(dev)) - return 0; - input = input_allocate_device(); if (input == NULL) return -ENOMEM; @@ -110,10 +107,12 @@ static void uvc_event_streaming(struct uvc_device *dev, if (len <= offsetof(struct uvc_status, streaming)) return; - uvc_dbg(dev, STATUS, "Button (intf %u) %s len %d\n", - status->bOriginator, - status->streaming.button ? "pressed" : "released", len); - uvc_input_report_key(dev, KEY_CAMERA, status->streaming.button); + if (uvc_input_has_button(dev)) { + uvc_dbg(dev, STATUS, "Button (intf %u) %s len %d\n", + status->bOriginator, + status->streaming.button ? "pressed" : "released", len); + uvc_input_report_key(dev, KEY_CAMERA, status->streaming.button); + } } else { uvc_dbg(dev, STATUS, "Stream %u error event %02x len %d\n", status->bOriginator, status->bEvent, len); -- 2.25.1

1 year, 1 month

4
5
0 0

[PATCH] usb: dwc3: fix fault at system suspend if device was already runtime suspended

by Roger Quadros

If the device was already runtime suspended then during system suspend we cannot access the device registers else it will crash. Also we cannot access any registers after dwc3_core_exit() on some platforms so move the dwc3_enable_susphy() call to the top. Cc: stable(a)vger.kernel.org # v5.15+ Reported-by: William McVicker <willmcvicker(a)google.com> Closes: https://lore.kernel.org/all/ZyVfcUuPq56R2m1Y@google.com Fixes: 705e3ce37bcc ("usb: dwc3: core: Fix system suspend on TI AM62 platforms") Signed-off-by: Roger Quadros <rogerq(a)kernel.org> --- drivers/usb/dwc3/core.c | 25 ++++++++++++------------- 1 file changed, 12 insertions(+), 13 deletions(-) diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 427e5660f87c..98114c2827c0 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -2342,10 +2342,18 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg) u32 reg; int i; - dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) & - DWC3_GUSB2PHYCFG_SUSPHY) || - (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) & - DWC3_GUSB3PIPECTL_SUSPHY); + if (!pm_runtime_suspended(dwc->dev) && !PMSG_IS_AUTO(msg)) { + dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) & + DWC3_GUSB2PHYCFG_SUSPHY) || + (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) & + DWC3_GUSB3PIPECTL_SUSPHY); + /* + * TI AM62 platform requires SUSPHY to be + * enabled for system suspend to work. + */ + if (!dwc->susphy_state) + dwc3_enable_susphy(dwc, true); + } switch (dwc->current_dr_role) { case DWC3_GCTL_PRTCAP_DEVICE: @@ -2398,15 +2406,6 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg) break; } - if (!PMSG_IS_AUTO(msg)) { - /* - * TI AM62 platform requires SUSPHY to be - * enabled for system suspend to work. - */ - if (!dwc->susphy_state) - dwc3_enable_susphy(dwc, true); - } - return 0; } --- base-commit: 42f7652d3eb527d03665b09edac47f85fb600924 change-id: 20241102-am62-lpm-usb-fix-347dd86135c1 Best regards, -- Roger Quadros <rogerq(a)kernel.org>

1 year, 1 month

3
2
0 0

[PATCH net 4/6] idpf: fix idpf_vc_core_init error path

by Tony Nguyen

From: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again. In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the reverse order which they got initialized. Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable(a)vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> Tested-by: Krishneil Singh <krishneil.k.singh(a)intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com> --- drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 + drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index c3848e10e7db..b4fbb99bfad2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter) */ err = idpf_vc_core_init(adapter); if (err) { + cancel_delayed_work_sync(&adapter->mbx_task); idpf_deinit_dflt_mbx(adapter); goto unlock_mutex; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index ce217e274506..d46c95f91b0d 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -3063,7 +3063,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) adapter->state = __IDPF_VER_CHECK; if (adapter->vcxn_mngr) idpf_vc_xn_shutdown(adapter->vcxn_mngr); - idpf_deinit_dflt_mbx(adapter); set_bit(IDPF_HR_DRV_LOAD, adapter->flags); queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task, msecs_to_jiffies(task_delay)); -- 2.42.0

1 year, 1 month

1
0
0 0

[PATCH net 3/6] idpf: avoid vport access in idpf_get_link_ksettings

by Tony Nguyen

From: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> When the device control plane is removed or the platform running device control plane is rebooted, a reset is detected on the driver. On driver reset, it releases the resources and waits for the reset to complete. If the reset fails, it takes the error path and releases the vport lock. At this time if the monitoring tools tries to access link settings, it call traces for accessing released vport pointer. To avoid it, move link_speed_mbps to netdev_priv structure which removes the dependency on vport pointer and the vport lock in idpf_get_link_ksettings. Also use netif_carrier_ok() to check the link status and adjust the offsetof to use link_up instead of link_speed_mbps. Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Cc: stable(a)vger.kernel.org # 6.7+ Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> Tested-by: Krishneil Singh <krishneil.k.singh(a)intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com> --- drivers/net/ethernet/intel/idpf/idpf.h | 4 ++-- drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 11 +++-------- drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++-- drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 2 +- 4 files changed, 8 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h index 2c31ad87587a..66544faab710 100644 --- a/drivers/net/ethernet/intel/idpf/idpf.h +++ b/drivers/net/ethernet/intel/idpf/idpf.h @@ -141,6 +141,7 @@ enum idpf_vport_state { * @adapter: Adapter back pointer * @vport: Vport back pointer * @vport_id: Vport identifier + * @link_speed_mbps: Link speed in mbps * @vport_idx: Relative vport index * @state: See enum idpf_vport_state * @netstats: Packet and byte stats @@ -150,6 +151,7 @@ struct idpf_netdev_priv { struct idpf_adapter *adapter; struct idpf_vport *vport; u32 vport_id; + u32 link_speed_mbps; u16 vport_idx; enum idpf_vport_state state; struct rtnl_link_stats64 netstats; @@ -287,7 +289,6 @@ struct idpf_port_stats { * @tx_itr_profile: TX profiles for Dynamic Interrupt Moderation * @port_stats: per port csum, header split, and other offload stats * @link_up: True if link is up - * @link_speed_mbps: Link speed in mbps * @sw_marker_wq: workqueue for marker packets */ struct idpf_vport { @@ -331,7 +332,6 @@ struct idpf_vport { struct idpf_port_stats port_stats; bool link_up; - u32 link_speed_mbps; wait_queue_head_t sw_marker_wq; }; diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c index 3806ddd3ce4a..59b1a1a09996 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c @@ -1296,24 +1296,19 @@ static void idpf_set_msglevel(struct net_device *netdev, u32 data) static int idpf_get_link_ksettings(struct net_device *netdev, struct ethtool_link_ksettings *cmd) { - struct idpf_vport *vport; - - idpf_vport_ctrl_lock(netdev); - vport = idpf_netdev_to_vport(netdev); + struct idpf_netdev_priv *np = netdev_priv(netdev); ethtool_link_ksettings_zero_link_mode(cmd, supported); cmd->base.autoneg = AUTONEG_DISABLE; cmd->base.port = PORT_NONE; - if (vport->link_up) { + if (netif_carrier_ok(netdev)) { cmd->base.duplex = DUPLEX_FULL; - cmd->base.speed = vport->link_speed_mbps; + cmd->base.speed = np->link_speed_mbps; } else { cmd->base.duplex = DUPLEX_UNKNOWN; cmd->base.speed = SPEED_UNKNOWN; } - idpf_vport_ctrl_unlock(netdev); - return 0; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index 4f20343e49a9..c3848e10e7db 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1860,7 +1860,7 @@ int idpf_initiate_soft_reset(struct idpf_vport *vport, * mess with. Nothing below should use those variables from new_vport * and should instead always refer to them in vport if they need to. */ - memcpy(new_vport, vport, offsetof(struct idpf_vport, link_speed_mbps)); + memcpy(new_vport, vport, offsetof(struct idpf_vport, link_up)); /* Adjust resource parameters prior to reallocating resources */ switch (reset_cause) { @@ -1906,7 +1906,7 @@ int idpf_initiate_soft_reset(struct idpf_vport *vport, /* Same comment as above regarding avoiding copying the wait_queues and * mutexes applies here. We do not want to mess with those if possible. */ - memcpy(vport, new_vport, offsetof(struct idpf_vport, link_speed_mbps)); + memcpy(vport, new_vport, offsetof(struct idpf_vport, link_up)); if (reset_cause == IDPF_SR_Q_CHANGE) idpf_vport_alloc_vec_indexes(vport); diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index 15c00a01f1c0..ce217e274506 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -141,7 +141,7 @@ static void idpf_handle_event_link(struct idpf_adapter *adapter, } np = netdev_priv(vport->netdev); - vport->link_speed_mbps = le32_to_cpu(v2e->link_speed); + np->link_speed_mbps = le32_to_cpu(v2e->link_speed); if (vport->link_up == v2e->link_status) return; -- 2.42.0

1 year, 1 month

1
0
0 0

[PATCH] media: uvcvideo: Fix crash during unbind if gpio unit is in use

by Ricardo Ribalda

We used the wrong device for the device managed functions. We used the usb device, when we should be using the interface device. If we unbind the driver from the usb interface, the cleanup functions are never called. In our case, the IRQ is never disabled. If an IRQ is triggered, it will try to access memory sections that are already free, causing an OOPS. Luckily this bug has small impact, as it is only affected by devices with gpio units and the user has to unbind the device, a disconnect will not trigger this error. Cc: stable(a)vger.kernel.org Fixes: 2886477ff987 ("media: uvcvideo: Implement UVC_EXT_GPIO_UNIT") Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org> --- drivers/media/usb/uvc/uvc_driver.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c index a96f6ca0889f..1100d3ed342e 100644 --- a/drivers/media/usb/uvc/uvc_driver.c +++ b/drivers/media/usb/uvc/uvc_driver.c @@ -1295,14 +1295,14 @@ static int uvc_gpio_parse(struct uvc_device *dev) struct gpio_desc *gpio_privacy; int irq; - gpio_privacy = devm_gpiod_get_optional(&dev->udev->dev, "privacy", + gpio_privacy = devm_gpiod_get_optional(&dev->intf->dev, "privacy", GPIOD_IN); if (IS_ERR_OR_NULL(gpio_privacy)) return PTR_ERR_OR_ZERO(gpio_privacy); irq = gpiod_to_irq(gpio_privacy); if (irq < 0) - return dev_err_probe(&dev->udev->dev, irq, + return dev_err_probe(&dev->intf->dev, irq, "No IRQ for privacy GPIO\n"); unit = uvc_alloc_new_entity(dev, UVC_EXT_GPIO_UNIT, @@ -1333,7 +1333,7 @@ static int uvc_gpio_init_irq(struct uvc_device *dev) if (!unit || unit->gpio.irq < 0) return 0; - return devm_request_threaded_irq(&dev->udev->dev, unit->gpio.irq, NULL, + return devm_request_threaded_irq(&dev->intf->dev, unit->gpio.irq, NULL, uvc_gpio_irq, IRQF_ONESHOT | IRQF_TRIGGER_FALLING | IRQF_TRIGGER_RISING, --- base-commit: c7ccf3683ac9746b263b0502255f5ce47f64fe0a change-id: 20241031-uvc-crashrmmod-666de3fc9141 Best regards, -- Ricardo Ribalda <ribalda(a)chromium.org>

1 year, 1 month

3
2
0 0

[PATCH v2] signal: restore the override_rlimit logic

by Roman Gushchin

Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. v2: refactor to make the logic simpler (Eric, Oleg, Alexey) Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Co-developed-by: Andrei Vagin <avagin(a)google.com> Signed-off-by: Andrei Vagin <avagin(a)google.com> Cc: Kees Cook <kees(a)kernel.org> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Alexey Gladkov <legion(a)kernel.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: <stable(a)vger.kernel.org> --- include/linux/user_namespace.h | 3 ++- kernel/signal.c | 3 ++- kernel/ucount.c | 6 ++++-- 3 files changed, 8 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 3625096d5f85..7183e5aca282 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -141,7 +141,8 @@ static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type ty long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type); +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit); void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type); bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max); diff --git a/kernel/signal.c b/kernel/signal.c index 4344860ffcac..cbabb2d05e0a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, */ rcu_read_lock(); ucounts = task_ucounts(t); - sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, + override_rlimit); rcu_read_unlock(); if (!sigpending) return NULL; diff --git a/kernel/ucount.c b/kernel/ucount.c index 16c0ea1cb432..49fcec41e5b4 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type) do_dec_rlimit_put_ucounts(ucounts, NULL, type); } -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit) { /* Caller must hold a reference to ucounts */ struct ucounts *iter; @@ -320,7 +321,8 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) goto unwind; if (iter == ucounts) ret = new; - max = get_userns_rlimit_max(iter->ns, type); + if (!override_rlimit) + max = get_userns_rlimit_max(iter->ns, type); /* * Grab an extra ucount reference for the caller when * the rlimit count was previously 0. -- 2.47.0.199.ga7371fff76-goog

1 year, 1 month

3
2
0 0

[PATCH] ovl: properly handle large files in ovl_security_fileattr

by Oleksandr Tymoshenko

dentry_open in ovl_security_fileattr fails for any file larger than 2GB if open method of the underlying filesystem calls generic_file_open (e.g. fusefs). The issue can be reproduce using the following script: (passthrough_ll is an example app from libfuse). $ D=/opt/test/mnt $ mkdir -p ${D}/{source,base,top/uppr,top/work,ovlfs} $ dd if=/dev/zero of=${D}/source/zero.bin bs=1G count=2 $ passthrough_ll -o source=${D}/source ${D}/base $ mount -t overlay overlay \ -olowerdir=${D}/base,upperdir=${D}/top/uppr,workdir=${D}/top/work \ ${D}/ovlfs $ chmod 0777 ${D}/mnt/ovlfs/zero.bin Running this script results in "Value too large for defined data type" error message from chmod. Signed-off-by: Oleksandr Tymoshenko <ovt(a)google.com> Fixes: 72db82115d2b ("ovl: copy up sync/noatime fileattr flags") Cc: stable(a)vger.kernel.org # v5.15+ --- fs/overlayfs/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/overlayfs/inode.c b/fs/overlayfs/inode.c index 35fd3e3e1778..baa54c718bd7 100644 --- a/fs/overlayfs/inode.c +++ b/fs/overlayfs/inode.c @@ -616,8 +616,13 @@ static int ovl_security_fileattr(const struct path *realpath, struct fileattr *f struct file *file; unsigned int cmd; int err; + unsigned int flags; + + flags = O_RDONLY; + if (force_o_largefile()) + flags |= O_LARGEFILE; - file = dentry_open(realpath, O_RDONLY, current_cred()); + file = dentry_open(realpath, flags, current_cred()); if (IS_ERR(file)) return PTR_ERR(file); -- 2.47.0.163.g1226f6d8fa-goog

1 year, 1 month

2
1
0 0

+ signal-restore-the-override_rlimit-logic.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: signal: restore the override_rlimit logic has been added to the -mm mm-hotfixes-unstable branch. Its filename is signal-restore-the-override_rlimit-logic.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Roman Gushchin <roman.gushchin(a)linux.dev> Subject: signal: restore the override_rlimit logic Date: Mon, 4 Nov 2024 19:54:19 +0000 Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. Link: https://lkml.kernel.org/r/20241104195419.3962584-1-roman.gushchin@linux.dev Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Co-developed-by: Andrei Vagin <avagin(a)google.com> Signed-off-by: Andrei Vagin <avagin(a)google.com> Acked-by: Oleg Nesterov <oleg(a)redhat.com> Cc: Kees Cook <kees(a)kernel.org> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Alexey Gladkov <legion(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/user_namespace.h | 3 ++- kernel/signal.c | 3 ++- kernel/ucount.c | 6 ++++-- 3 files changed, 8 insertions(+), 4 deletions(-) --- a/include/linux/user_namespace.h~signal-restore-the-override_rlimit-logic +++ a/include/linux/user_namespace.h @@ -141,7 +141,8 @@ static inline long get_rlimit_value(stru long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type); +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit); void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type); bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max); --- a/kernel/signal.c~signal-restore-the-override_rlimit-logic +++ a/kernel/signal.c @@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_st */ rcu_read_lock(); ucounts = task_ucounts(t); - sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, + override_rlimit); rcu_read_unlock(); if (!sigpending) return NULL; --- a/kernel/ucount.c~signal-restore-the-override_rlimit-logic +++ a/kernel/ucount.c @@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucoun do_dec_rlimit_put_ucounts(ucounts, NULL, type); } -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit) { /* Caller must hold a reference to ucounts */ struct ucounts *iter; @@ -320,7 +321,8 @@ long inc_rlimit_get_ucounts(struct ucoun goto dec_unwind; if (iter == ucounts) ret = new; - max = get_userns_rlimit_max(iter->ns, type); + if (!override_rlimit) + max = get_userns_rlimit_max(iter->ns, type); /* * Grab an extra ucount reference for the caller when * the rlimit count was previously 0. _ Patches currently in -mm which might be from roman.gushchin(a)linux.dev are signal-restore-the-override_rlimit-logic.patch

1 year, 1 month

1
0
0 0

[PATCH 5.15.y] ice: Add a per-VF limit on number of FDIR filters

by Sherry Yang

From: Ahmed Zaki <ahmed.zaki(a)intel.com> commit 6ebbe97a488179f5dc85f2f1e0c89b486e99ee97 upstream. While the iavf driver adds a s/w limit (128) on the number of FDIR filters that the VF can request, a malicious VF driver can request more than that and exhaust the resources for other VFs. Add a similar limit in ice. CC: stable(a)vger.kernel.org Fixes: 1f7ea1cd6a37 ("ice: Enable FDIR Configure for AVF") Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com> Suggested-by: Sridhar Samudrala <sridhar.samudrala(a)intel.com> Signed-off-by: Ahmed Zaki <ahmed.zaki(a)intel.com> Reviewed-by: Wojciech Drewek <wojciech.drewek(a)intel.com> Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> [Sherry: bp to fix CVE-2024-42291. Ignore context change in ice_fdir.h to resolve conflicts.] Signed-off-by: Sherry Yang <sherry.yang(a)oracle.com> --- .../net/ethernet/intel/ice/ice_ethtool_fdir.c | 2 +- drivers/net/ethernet/intel/ice/ice_fdir.h | 3 +++ .../net/ethernet/intel/ice/ice_virtchnl_fdir.c | 16 ++++++++++++++++ .../net/ethernet/intel/ice/ice_virtchnl_fdir.h | 1 + 4 files changed, 21 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c b/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c index 0106ea3519a0..b52b77579c7f 100644 --- a/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c +++ b/drivers/net/ethernet/intel/ice/ice_ethtool_fdir.c @@ -456,7 +456,7 @@ ice_parse_rx_flow_user_data(struct ethtool_rx_flow_spec *fsp, * * Returns the number of available flow director filters to this VSI */ -static int ice_fdir_num_avail_fltr(struct ice_hw *hw, struct ice_vsi *vsi) +int ice_fdir_num_avail_fltr(struct ice_hw *hw, struct ice_vsi *vsi) { u16 vsi_num = ice_get_hw_vsi_num(hw, vsi->idx); u16 num_guar; diff --git a/drivers/net/ethernet/intel/ice/ice_fdir.h b/drivers/net/ethernet/intel/ice/ice_fdir.h index d2d40e18ae8a..32e9258d27d7 100644 --- a/drivers/net/ethernet/intel/ice/ice_fdir.h +++ b/drivers/net/ethernet/intel/ice/ice_fdir.h @@ -201,6 +201,8 @@ struct ice_fdir_base_pkt { const u8 *tun_pkt; }; +struct ice_vsi; + enum ice_status ice_alloc_fd_res_cntr(struct ice_hw *hw, u16 *cntr_id); enum ice_status ice_free_fd_res_cntr(struct ice_hw *hw, u16 cntr_id); enum ice_status @@ -214,6 +216,7 @@ enum ice_status ice_fdir_get_gen_prgm_pkt(struct ice_hw *hw, struct ice_fdir_fltr *input, u8 *pkt, bool frag, bool tun); int ice_get_fdir_cnt_all(struct ice_hw *hw); +int ice_fdir_num_avail_fltr(struct ice_hw *hw, struct ice_vsi *vsi); bool ice_fdir_is_dup_fltr(struct ice_hw *hw, struct ice_fdir_fltr *input); bool ice_fdir_has_frag(enum ice_fltr_ptype flow); struct ice_fdir_fltr * diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c index 412deb36b645..2ca8102e8f36 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c @@ -744,6 +744,8 @@ static void ice_vc_fdir_reset_cnt_all(struct ice_vf_fdir *fdir) fdir->fdir_fltr_cnt[flow][0] = 0; fdir->fdir_fltr_cnt[flow][1] = 0; } + + fdir->fdir_fltr_cnt_total = 0; } /** @@ -1837,6 +1839,7 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx, resp->status = status; resp->flow_id = conf->flow_id; vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]++; + vf->fdir.fdir_fltr_cnt_total++; ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret, (u8 *)resp, len); @@ -1901,6 +1904,7 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx, resp->status = status; ice_vc_fdir_remove_entry(vf, conf, conf->flow_id); vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]--; + vf->fdir.fdir_fltr_cnt_total--; ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret, (u8 *)resp, len); @@ -2065,6 +2069,7 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg) struct virtchnl_fdir_add *stat = NULL; struct virtchnl_fdir_fltr_conf *conf; enum virtchnl_status_code v_ret; + struct ice_vsi *vf_vsi; struct device *dev; struct ice_pf *pf; int is_tun = 0; @@ -2073,6 +2078,17 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg) pf = vf->pf; dev = ice_pf_to_dev(pf); + vf_vsi = ice_get_vf_vsi(vf); + +#define ICE_VF_MAX_FDIR_FILTERS 128 + if (!ice_fdir_num_avail_fltr(&pf->hw, vf_vsi) || + vf->fdir.fdir_fltr_cnt_total >= ICE_VF_MAX_FDIR_FILTERS) { + v_ret = VIRTCHNL_STATUS_ERR_PARAM; + dev_err(dev, "Max number of FDIR filters for VF %d is reached\n", + vf->vf_id); + goto err_exit; + } + ret = ice_vc_fdir_param_check(vf, fltr->vsi_id); if (ret) { v_ret = VIRTCHNL_STATUS_ERR_PARAM; diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.h b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.h index f4e629f4c09b..6258af1ff0aa 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.h +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.h @@ -28,6 +28,7 @@ struct ice_vf_fdir_ctx { struct ice_vf_fdir { u16 fdir_fltr_cnt[ICE_FLTR_PTYPE_MAX][ICE_FD_HW_SEG_MAX]; int prof_entry_cnt[ICE_FLTR_PTYPE_MAX][ICE_FD_HW_SEG_MAX]; + u16 fdir_fltr_cnt_total; struct ice_fd_hw_prof **fdir_prof; struct idr fdir_rule_idr; -- 2.46.0

1 year, 1 month

1
0
0 0

[PATCH net 8/8] can: mcp251xfd: mcp251xfd_get_tef_len(): fix length calculation

by Marc Kleine-Budde

Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") introduced mcp251xfd_get_tef_len() to get the number of unhandled transmit events from the Transmit Event FIFO (TEF). As the TEF has no head pointer, the driver uses the TX FIFO's tail pointer instead, assuming that send frames are completed. However the check for the TEF being full was not correct. This leads to the driver stop working if the TEF is full. Fix the TEF full check by assuming that if, from the driver's point of view, there are no free TX buffers in the chip and the TX FIFO is empty, all messages must have been sent and the TEF must therefore be full. Reported-by: Sven Schuchmann <schuchmann(a)schleissheimer.de> Closes: https://patch.msgid.link/FR3P281MB155216711EFF900AD9791B7ED9692@FR3P281MB15… Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum") Tested-by: Sven Schuchmann <schuchmann(a)schleissheimer.de> Cc: stable(a)vger.kernel.org Link: https://patch.msgid.link/20241104-mcp251xfd-fix-length-calculation-v3-1-608… Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c index f732556d233a..d3ac865933fd 100644 --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c @@ -16,9 +16,9 @@ #include "mcp251xfd.h" -static inline bool mcp251xfd_tx_fifo_sta_full(u32 fifo_sta) +static inline bool mcp251xfd_tx_fifo_sta_empty(u32 fifo_sta) { - return !(fifo_sta & MCP251XFD_REG_FIFOSTA_TFNRFNIF); + return fifo_sta & MCP251XFD_REG_FIFOSTA_TFERFFIF; } static inline int @@ -122,7 +122,11 @@ mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p) if (err) return err; - if (mcp251xfd_tx_fifo_sta_full(fifo_sta)) { + /* If the chip says the TX-FIFO is empty, but there are no TX + * buffers free in the ring, we assume all have been sent. + */ + if (mcp251xfd_tx_fifo_sta_empty(fifo_sta) && + mcp251xfd_get_tx_free(tx_ring) == 0) { *len_p = tx_ring->obj_num; return 0; } -- 2.45.2

1 year, 1 month

1
0
0 0

[PATCH net 3/8] can: m_can: m_can_close(): don't call free_irq() for IRQ-less devices

by Marc Kleine-Budde

In commit b382380c0d2d ("can: m_can: Add hrtimer to generate software interrupt") support for IRQ-less devices was added. Instead of an interrupt, the interrupt routine is called by a hrtimer-based polling loop. That patch forgot to change free_irq() to be only called for devices with IRQs. Fix this, by calling free_irq() conditionally only if an IRQ is available for the device (and thus has been requested previously). Fixes: b382380c0d2d ("can: m_can: Add hrtimer to generate software interrupt") Reviewed-by: Simon Horman <horms(a)kernel.org> Reviewed-by: Markus Schneider-Pargmann <msp(a)baylibre.com> Link: https://patch.msgid.link/20240930-m_can-cleanups-v1-1-001c579cdee4@pengutro… Cc: <stable(a)vger.kernel.org> # v6.6+ Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- drivers/net/can/m_can/m_can.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/net/can/m_can/m_can.c b/drivers/net/can/m_can/m_can.c index a978b960f1f1..16e9e7d7527d 100644 --- a/drivers/net/can/m_can/m_can.c +++ b/drivers/net/can/m_can/m_can.c @@ -1765,7 +1765,8 @@ static int m_can_close(struct net_device *dev) netif_stop_queue(dev); m_can_stop(dev); - free_irq(dev->irq, dev); + if (dev->irq) + free_irq(dev->irq, dev); m_can_clean(dev); -- 2.45.2

1 year, 1 month

1
0
0 0

[PATCH net 2/8] can: {cc770,sja1000}_isa: allow building on x86_64

by Marc Kleine-Budde

From: Thomas Mühlbacher <tmuehlbacher(a)posteo.net> The ISA variable is only defined if X86_32 is also defined. However, these drivers are still useful and in use on at least some modern 64-bit x86 industrial systems as well. With the correct module parameters, they work as long as IO port communication is possible, despite their name having ISA in them. Fixes: a29689e60ed3 ("net: handle HAS_IOPORT dependencies") Signed-off-by: Thomas Mühlbacher <tmuehlbacher(a)posteo.net> Link: https://patch.msgid.link/20240919174151.15473-2-tmuehlbacher@posteo.net Cc: stable(a)vger.kernel.org Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de> --- drivers/net/can/cc770/Kconfig | 2 +- drivers/net/can/sja1000/Kconfig | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/net/can/cc770/Kconfig b/drivers/net/can/cc770/Kconfig index 467ef19de1c1..aae25c2f849e 100644 --- a/drivers/net/can/cc770/Kconfig +++ b/drivers/net/can/cc770/Kconfig @@ -7,7 +7,7 @@ if CAN_CC770 config CAN_CC770_ISA tristate "ISA Bus based legacy CC770 driver" - depends on ISA + depends on HAS_IOPORT help This driver adds legacy support for CC770 and AN82527 chips connected to the ISA bus using I/O port, memory mapped or diff --git a/drivers/net/can/sja1000/Kconfig b/drivers/net/can/sja1000/Kconfig index 01168db4c106..2f516cc6d22c 100644 --- a/drivers/net/can/sja1000/Kconfig +++ b/drivers/net/can/sja1000/Kconfig @@ -87,7 +87,7 @@ config CAN_PLX_PCI config CAN_SJA1000_ISA tristate "ISA Bus based legacy SJA1000 driver" - depends on ISA + depends on HAS_IOPORT help This driver adds legacy support for SJA1000 chips connected to the ISA bus using I/O port, memory mapped or indirect access. -- 2.45.2

1 year, 1 month

1
0
0 0

[PATCH] signal: restore the override_rlimit logic

by Roman Gushchin

Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Co-developed-by: Andrei Vagin <avagin(a)google.com> Signed-off-by: Andrei Vagin <avagin(a)google.com> Cc: Kees Cook <kees(a)kernel.org> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Alexey Gladkov <legion(a)kernel.org> Cc: <stable(a)vger.kernel.org> --- include/linux/user_namespace.h | 3 ++- kernel/signal.c | 3 ++- kernel/ucount.c | 5 +++-- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/include/linux/user_namespace.h b/include/linux/user_namespace.h index 3625096d5f85..7183e5aca282 100644 --- a/include/linux/user_namespace.h +++ b/include/linux/user_namespace.h @@ -141,7 +141,8 @@ static inline long get_rlimit_value(struct ucounts *ucounts, enum rlimit_type ty long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type); +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit); void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type); bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max); diff --git a/kernel/signal.c b/kernel/signal.c index 4344860ffcac..cbabb2d05e0a 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_struct *t, gfp_t gfp_flags, */ rcu_read_lock(); ucounts = task_ucounts(t); - sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, + override_rlimit); rcu_read_unlock(); if (!sigpending) return NULL; diff --git a/kernel/ucount.c b/kernel/ucount.c index 16c0ea1cb432..046b3d57ebb4 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type) do_dec_rlimit_put_ucounts(ucounts, NULL, type); } -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit) { /* Caller must hold a reference to ucounts */ struct ucounts *iter; @@ -316,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) for (iter = ucounts; iter; iter = iter->ns->ucounts) { long new = atomic_long_add_return(1, &iter->rlimit[type]); - if (new < 0 || new > max) + if (new < 0 || (!override_rlimit && (new > max))) goto unwind; if (iter == ucounts) ret = new; -- 2.47.0.163.g1226f6d8fa-goog

1 year, 1 month

5
14
0 0

[PATCH 5.15.y] drm/i915: Fix potential context UAFs

by Sherry Yang

From: Rob Clark <robdclark(a)chromium.org> commit afce71ff6daa9c0f852df0727fe32c6fb107f0fa upstream. gem_context_register() makes the context visible to userspace, and which point a separate thread can trigger the I915_GEM_CONTEXT_DESTROY ioctl. So we need to ensure that nothing uses the ctx ptr after this. And we need to ensure that adding the ctx to the xarray is the *last* thing that gem_context_register() does with the ctx pointer. Signed-off-by: Rob Clark <robdclark(a)chromium.org> Fixes: eb4dedae920a ("drm/i915/gem: Delay tracking the GEM context until it is registered") Fixes: a4c1cdd34e2c ("drm/i915/gem: Delay context creation (v3)") Fixes: 49bd54b390c2 ("drm/i915: Track all user contexts per client") Cc: <stable(a)vger.kernel.org> # v5.10+ Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com> Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com> [tursulin: Stable and fixes tags add/tidy.] Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20230103234948.1218393-1-robd… (cherry picked from commit bed4b455cf5374e68879be56971c1da563bcd90c) Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> [Sherry: bp to fix CVE-2023-52913, ignore context conflicts due to missing commit 49bd54b390c2 "drm/i915: Track all user contexts per client")] Signed-off-by: Sherry Yang <sherry.yang(a)oracle.com> --- drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++++++++++++++------ 1 file changed, 18 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c index 0eb4a0739fa2..0a7c4548b77f 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c @@ -1436,6 +1436,10 @@ void i915_gem_init__contexts(struct drm_i915_private *i915) init_contexts(&i915->gem.contexts); } +/* + * Note that this implicitly consumes the ctx reference, by placing + * the ctx in the context_xa. + */ static void gem_context_register(struct i915_gem_context *ctx, struct drm_i915_file_private *fpriv, u32 id) @@ -1449,13 +1453,13 @@ static void gem_context_register(struct i915_gem_context *ctx, snprintf(ctx->name, sizeof(ctx->name), "%s[%d]", current->comm, pid_nr(ctx->pid)); - /* And finally expose ourselves to userspace via the idr */ - old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL); - WARN_ON(old); - spin_lock(&i915->gem.contexts.lock); list_add_tail(&ctx->link, &i915->gem.contexts.list); spin_unlock(&i915->gem.contexts.lock); + + /* And finally expose ourselves to userspace via the idr */ + old = xa_store(&fpriv->context_xa, id, ctx, GFP_KERNEL); + WARN_ON(old); } int i915_gem_context_open(struct drm_i915_private *i915, @@ -1932,14 +1936,22 @@ finalize_create_context_locked(struct drm_i915_file_private *file_priv, if (IS_ERR(ctx)) return ctx; + /* + * One for the xarray and one for the caller. We need to grab + * the reference *prior* to making the ctx visble to userspace + * in gem_context_register(), as at any point after that + * userspace can try to race us with another thread destroying + * the context under our feet. + */ + i915_gem_context_get(ctx); + gem_context_register(ctx, file_priv, id); old = xa_erase(&file_priv->proto_context_xa, id); GEM_BUG_ON(old != pc); proto_context_close(pc); - /* One for the xarray and one for the caller */ - return i915_gem_context_get(ctx); + return ctx; } struct i915_gem_context * -- 2.46.0

1 year, 1 month

1
0
0 0

[PATCH 1/6] driver core: Introduce device_{add,remove}_of_node()

by Herve Codina

An of_node can be set to a device using device_set_node(). This function cannot prevent any of_node and/or fwnode overwrites. When adding an of_node on an already present device, the following operations need to be done: - Attach the of_node if no of_node were already attached - Attach the of_node as a fwnode if no fwnode were already attached This is the purpose of device_add_of_node(). device_remove_of_node() reverts the operations done by device_add_of_node(). Cc: stable(a)vger.kernel.org Signed-off-by: Herve Codina <herve.codina(a)bootlin.com> --- drivers/base/core.c | 52 ++++++++++++++++++++++++++++++++++++++++++ include/linux/device.h | 2 ++ 2 files changed, 54 insertions(+) diff --git a/drivers/base/core.c b/drivers/base/core.c index 24c572031403..0aa63371f55d 100644 --- a/drivers/base/core.c +++ b/drivers/base/core.c @@ -5118,6 +5118,58 @@ void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode) } EXPORT_SYMBOL_GPL(set_secondary_fwnode); +/** + * device_remove_of_node - Remove an of_node from a device + * @dev: device whose device-tree node is being removed + */ +void device_remove_of_node(struct device *dev) +{ + dev = get_device(dev); + if (!dev) + return; + + if (!dev->of_node) + goto end; + + if (dev->fwnode == of_fwnode_handle(dev->of_node)) + dev->fwnode = NULL; + + of_node_put(dev->of_node); + dev->of_node = NULL; + +end: + put_device(dev); +} +EXPORT_SYMBOL_GPL(device_remove_of_node); + +/** + * device_add_of_node - Add an of_node to an existing device + * @dev: device whose device-tree node is being added + * @of_node: of_node to add + */ +void device_add_of_node(struct device *dev, struct device_node *of_node) +{ + if (!of_node) + return; + + dev = get_device(dev); + if (!dev) + return; + + if (WARN(dev->of_node, "%s: Cannot replace node %pOF with %pOF\n", + dev_name(dev), dev->of_node, of_node)) + goto end; + + dev->of_node = of_node_get(of_node); + + if (!dev->fwnode) + dev->fwnode = of_fwnode_handle(of_node); + +end: + put_device(dev); +} +EXPORT_SYMBOL_GPL(device_add_of_node); + /** * device_set_of_node_from_dev - reuse device-tree node of another device * @dev: device whose device-tree node is being set diff --git a/include/linux/device.h b/include/linux/device.h index b4bde8d22697..e3aa25ce1f90 100644 --- a/include/linux/device.h +++ b/include/linux/device.h @@ -1146,6 +1146,8 @@ int device_online(struct device *dev); void set_primary_fwnode(struct device *dev, struct fwnode_handle *fwnode); void set_secondary_fwnode(struct device *dev, struct fwnode_handle *fwnode); void device_set_node(struct device *dev, struct fwnode_handle *fwnode); +void device_add_of_node(struct device *dev, struct device_node *of_node); +void device_remove_of_node(struct device *dev); void device_set_of_node_from_dev(struct device *dev, const struct device *dev2); static inline struct device_node *dev_of_node(struct device *dev) -- 2.46.2

1 year, 1 month

1
0
0 0

[PATCH 2/4] um: ubd: Do not use drvdata in release

by Tiwei Bie

The drvdata is not available in release. Let's just use container_of() to get the ubd instance. Otherwise, removing a ubd device will result in a crash: RIP: 0033:blk_mq_free_tag_set+0x1f/0xba RSP: 00000000e2083bf0 EFLAGS: 00010246 RAX: 000000006021463a RBX: 0000000000000348 RCX: 0000000062604d00 RDX: 0000000004208060 RSI: 00000000605241a0 RDI: 0000000000000348 RBP: 00000000e2083c10 R08: 0000000062414010 R09: 00000000601603f7 R10: 000000000000133a R11: 000000006038c4bd R12: 0000000000000000 R13: 0000000060213a5c R14: 0000000062405d20 R15: 00000000604f7aa0 Kernel panic - not syncing: Segfault with no mm CPU: 0 PID: 17 Comm: kworker/0:1 Not tainted 6.8.0-rc3-00107-gba3f67c11638 #1 Workqueue: events mc_work_proc Stack: 00000000 604f7ef0 62c5d000 62405d20 e2083c30 6002c776 6002c755 600e47ff e2083c60 6025ffe3 04208060 603d36e0 Call Trace: [<6002c776>] ubd_device_release+0x21/0x55 [<6002c755>] ? ubd_device_release+0x0/0x55 [<600e47ff>] ? kfree+0x0/0x100 [<6025ffe3>] device_release+0x70/0xba [<60381d6a>] kobject_put+0xb5/0xe2 [<6026027b>] put_device+0x19/0x1c [<6026a036>] platform_device_put+0x26/0x29 [<6026ac5a>] platform_device_unregister+0x2c/0x2e [<6002c52e>] ubd_remove+0xb8/0xd6 [<6002bb74>] ? mconsole_reply+0x0/0x50 [<6002b926>] mconsole_remove+0x160/0x1cc [<6002bbbc>] ? mconsole_reply+0x48/0x50 [<6003379c>] ? um_set_signals+0x3b/0x43 [<60061c55>] ? update_min_vruntime+0x14/0x70 [<6006251f>] ? dequeue_task_fair+0x164/0x235 [<600620aa>] ? update_cfs_group+0x0/0x40 [<603a0e77>] ? __schedule+0x0/0x3ed [<60033761>] ? um_set_signals+0x0/0x43 [<6002af6a>] mc_work_proc+0x77/0x91 [<600520b4>] process_scheduled_works+0x1af/0x2c3 [<6004ede3>] ? assign_work+0x0/0x58 [<600527a1>] worker_thread+0x2f7/0x37a [<6004ee3b>] ? set_pf_worker+0x0/0x64 [<6005765d>] ? arch_local_irq_save+0x0/0x2d [<60058e07>] ? kthread_exit+0x0/0x3a [<600524aa>] ? worker_thread+0x0/0x37a [<60058f9f>] kthread+0x130/0x135 [<6002068e>] new_thread_handler+0x85/0xb6 Cc: stable(a)vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.btw(a)antgroup.com> --- arch/um/drivers/ubd_kern.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index f19173da64d8..66c1a8835e36 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -779,7 +779,7 @@ static int ubd_open_dev(struct ubd *ubd_dev) static void ubd_device_release(struct device *dev) { - struct ubd *ubd_dev = dev_get_drvdata(dev); + struct ubd *ubd_dev = container_of(dev, struct ubd, pdev.dev); blk_mq_free_tag_set(&ubd_dev->tag_set); *ubd_dev = ((struct ubd) DEFAULT_UBD); -- 2.34.1

1 year, 1 month

2
1
0 0

[PATCH 3/4] um: net: Do not use drvdata in release

by Tiwei Bie

The drvdata is not available in release. Let's just use container_of() to get the uml_net instance. Otherwise, removing a network device will result in a crash: RIP: 0033:net_device_release+0x10/0x6f RSP: 00000000e20c7c40 EFLAGS: 00010206 RAX: 000000006002e4e7 RBX: 00000000600f1baf RCX: 00000000624074e0 RDX: 0000000062778000 RSI: 0000000060551c80 RDI: 00000000627af028 RBP: 00000000e20c7c50 R08: 00000000603ad594 R09: 00000000e20c7b70 R10: 000000000000135a R11: 00000000603ad422 R12: 0000000000000000 R13: 0000000062c7af00 R14: 0000000062406d60 R15: 00000000627700b6 Kernel panic - not syncing: Segfault with no mm CPU: 0 UID: 0 PID: 29 Comm: kworker/0:2 Not tainted 6.12.0-rc6-g59b723cd2adb #1 Workqueue: events mc_work_proc Stack: 627af028 62c7af00 e20c7c80 60276fcd 62778000 603f5820 627af028 00000000 e20c7cb0 603a2bcd 627af000 62770010 Call Trace: [<60276fcd>] device_release+0x70/0xba [<603a2bcd>] kobject_put+0xba/0xe7 [<60277265>] put_device+0x19/0x1c [<60281266>] platform_device_put+0x26/0x29 [<60281e5f>] platform_device_unregister+0x2c/0x2e [<6002ec9c>] net_remove+0x63/0x69 [<60031316>] ? mconsole_reply+0x0/0x50 [<600310c8>] mconsole_remove+0x160/0x1cc [<60087d40>] ? __remove_hrtimer+0x38/0x74 [<60087ff8>] ? hrtimer_try_to_cancel+0x8c/0x98 [<6006b3cf>] ? dl_server_stop+0x3f/0x48 [<6006b390>] ? dl_server_stop+0x0/0x48 [<600672e8>] ? dequeue_entities+0x327/0x390 [<60038fa6>] ? um_set_signals+0x0/0x43 [<6003070c>] mc_work_proc+0x77/0x91 [<60057664>] process_scheduled_works+0x1b3/0x2dd [<60055f32>] ? assign_work+0x0/0x58 [<60057f0a>] worker_thread+0x1e9/0x293 [<6005406f>] ? set_pf_worker+0x0/0x64 [<6005d65d>] ? arch_local_irq_save+0x0/0x2d [<6005d748>] ? kthread_exit+0x0/0x3a [<60057d21>] ? worker_thread+0x0/0x293 [<6005dbf1>] kthread+0x126/0x12b [<600219c5>] new_thread_handler+0x85/0xb6 Cc: stable(a)vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.btw(a)antgroup.com> --- arch/um/drivers/net_kern.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c index 77c4afb8ab90..75d04fb4994a 100644 --- a/arch/um/drivers/net_kern.c +++ b/arch/um/drivers/net_kern.c @@ -336,7 +336,7 @@ static struct platform_driver uml_net_driver = { static void net_device_release(struct device *dev) { - struct uml_net *device = dev_get_drvdata(dev); + struct uml_net *device = container_of(dev, struct uml_net, pdev.dev); struct net_device *netdev = device->dev; struct uml_net_private *lp = netdev_priv(netdev); -- 2.34.1

1 year, 1 month

2
1
0 0

[PATCH 4/4] um: vector: Do not use drvdata in release

by Tiwei Bie

The drvdata is not available in release. Let's just use container_of() to get the vector_device instance. Otherwise, removing a vector device will result in a crash: RIP: 0033:vector_device_release+0xf/0x50 RSP: 00000000e187bc40 EFLAGS: 00010202 RAX: 0000000060028f61 RBX: 00000000600f1baf RCX: 00000000620074e0 RDX: 000000006220b9c0 RSI: 0000000060551c80 RDI: 0000000000000000 RBP: 00000000e187bc50 R08: 00000000603ad594 R09: 00000000e187bb70 R10: 000000000000135a R11: 00000000603ad422 R12: 00000000623ae028 R13: 000000006287a200 R14: 0000000062006d30 R15: 00000000623700b6 Kernel panic - not syncing: Segfault with no mm CPU: 0 UID: 0 PID: 16 Comm: kworker/0:1 Not tainted 6.12.0-rc6-g59b723cd2adb #1 Workqueue: events mc_work_proc Stack: 60028f61 623ae028 e187bc80 60276fcd 6220b9c0 603f5820 623ae028 00000000 e187bcb0 603a2bcd 623ae000 62370010 Call Trace: [<60028f61>] ? vector_device_release+0x0/0x50 [<60276fcd>] device_release+0x70/0xba [<603a2bcd>] kobject_put+0xba/0xe7 [<60277265>] put_device+0x19/0x1c [<60281266>] platform_device_put+0x26/0x29 [<60281e5f>] platform_device_unregister+0x2c/0x2e [<60029422>] vector_remove+0x52/0x58 [<60031316>] ? mconsole_reply+0x0/0x50 [<600310c8>] mconsole_remove+0x160/0x1cc [<603b19f4>] ? strlen+0x0/0x15 [<60066611>] ? __dequeue_entity+0x1a9/0x206 [<600666a7>] ? set_next_entity+0x39/0x63 [<6006666e>] ? set_next_entity+0x0/0x63 [<60038fa6>] ? um_set_signals+0x0/0x43 [<6003070c>] mc_work_proc+0x77/0x91 [<60057664>] process_scheduled_works+0x1b3/0x2dd [<60055f32>] ? assign_work+0x0/0x58 [<60057f0a>] worker_thread+0x1e9/0x293 [<6005406f>] ? set_pf_worker+0x0/0x64 [<6005d65d>] ? arch_local_irq_save+0x0/0x2d [<6005d748>] ? kthread_exit+0x0/0x3a [<60057d21>] ? worker_thread+0x0/0x293 [<6005dbf1>] kthread+0x126/0x12b [<600219c5>] new_thread_handler+0x85/0xb6 Cc: stable(a)vger.kernel.org Signed-off-by: Tiwei Bie <tiwei.btw(a)antgroup.com> --- arch/um/drivers/vector_kern.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/um/drivers/vector_kern.c b/arch/um/drivers/vector_kern.c index c992da83268d..64c09db392c1 100644 --- a/arch/um/drivers/vector_kern.c +++ b/arch/um/drivers/vector_kern.c @@ -815,7 +815,8 @@ static struct platform_driver uml_net_driver = { static void vector_device_release(struct device *dev) { - struct vector_device *device = dev_get_drvdata(dev); + struct vector_device *device = + container_of(dev, struct vector_device, pdev.dev); struct net_device *netdev = device->dev; list_del(&device->list); -- 2.34.1

1 year, 1 month

2
1
0 0

[PATCH v2] gpio: exar: set value when external pull-up or pull-down is present

by Sai Kumar Cholleti

Setting GPIO direction = high, sometimes results in GPIO value = 0. If a GPIO is pulled high, the following construction results in the value being 0 when the desired value is 1: $ echo "high" > /sys/class/gpio/gpio336/direction $ cat /sys/class/gpio/gpio336/value 0 Before the GPIO direction is changed from an input to an output, exar_set_value() is called with value = 1, but since the GPIO is an input when exar_set_value() is called, _regmap_update_bits() reads a 1 due to an external pull-up. regmap_set_bits() sets force_write = false, so the value (1) is not written. When the direction is then changed, the GPIO becomes an output with the value of 0 (the hardware default). regmap_write_bits() sets force_write = true, so the value is always written by exar_set_value() and an external pull-up doesn't affect the outcome of setting direction = high. The same can happen when a GPIO is pulled low, but the scenario is a little more complicated. $ echo high > /sys/class/gpio/gpio351/direction $ cat /sys/class/gpio/gpio351/value 1 $ echo in > /sys/class/gpio/gpio351/direction $ cat /sys/class/gpio/gpio351/value 0 $ echo low > /sys/class/gpio/gpio351/direction $ cat /sys/class/gpio/gpio351/value 1 Fixes: 36fb7218e878 ("gpio: exar: switch to using regmap") Signed-off-by: Sai Kumar Cholleti <skmr537(a)gmail.com> Signed-off-by: Matthew McClain <mmcclain(a)noprivs.com> Cc: <stable(a)vger.kernel.org> --- drivers/gpio/gpio-exar.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpio/gpio-exar.c b/drivers/gpio/gpio-exar.c index 5170fe7599cd..dfc7a9ca3e62 100644 --- a/drivers/gpio/gpio-exar.c +++ b/drivers/gpio/gpio-exar.c @@ -99,11 +99,13 @@ static void exar_set_value(struct gpio_chip *chip, unsigned int offset, struct exar_gpio_chip *exar_gpio = gpiochip_get_data(chip); unsigned int addr = exar_offset_to_lvl_addr(exar_gpio, offset); unsigned int bit = exar_offset_to_bit(exar_gpio, offset); + unsigned int bit_value = value ? BIT(bit) : 0; - if (value) - regmap_set_bits(exar_gpio->regmap, addr, BIT(bit)); - else - regmap_clear_bits(exar_gpio->regmap, addr, BIT(bit)); + /* + * regmap_write_bits forces value to be written when an external + * pull up/down might otherwise indicate value was already set + */ + regmap_write_bits(exar_gpio->regmap, addr, BIT(bit), bit_value); } static int exar_direction_output(struct gpio_chip *chip, unsigned int offset, -- 2.34.1

1 year, 1 month

1
0
0 0

[PATCH 6.1.y / 6.6.y 0/4] Backport fix(es) for dummy_hcd transfer rate

by Guilherme G. Piccoli

Hi folks, here is a series with some fixes for dummy_hcd. First of all, the reasoning behind it. Syzkaller report [0] shows a hung task on uevent_show, and despite it was fixed with a patch on drivers/base (a race between drivers shutdown and uevent_show), another issue remains: a problem with Realtek emulated wifi device [1]. While working the fix ([1]), we noticed that if it is applied to recent kernels, all fine. But in v6.1.y and v6.6.y for example, it didn't solve entirely the issue, and after some debugging, it was narrowed to dummy_hcd transfer rates being waaay slower in such stable versions. The reason of such slowness is well-described in the first 2 patches of this backport, but the thing is that these patches introduced subtle issues as well, fixed in the other 2 patches. Hence, I decided to backport all of them for the 2 latest LTS kernels. Maybe this is not a good idea - I don't see a strong con, but who's better to judge the benefits vs the risks than the patch authors, reviewers, and the USB maintainer?! So, I've CCed Alan, Andrey, Greg and Marcello here, and I thank you all in advance for reviews on this. And my apologies for bothering you with the emails, I hope this is a simple "OK, makes sense" or "Nah, doesn't worth it" situation =) Cheers, Guilherme [0] https://syzkaller.appspot.com/bug?extid=edd9fe0d3a65b14588d5 [1] https://lore.kernel.org/r/20241101193412.1390391-1-gpiccoli@igalia.com/ Alan Stern (1): USB: gadget: dummy-hcd: Fix "task hung" problem Andrey Konovalov (1): usb: gadget: dummy_hcd: execute hrtimer callback in softirq context Marcello Sylvester Bauer (2): usb: gadget: dummy_hcd: Switch to hrtimer transfer scheduler usb: gadget: dummy_hcd: Set transfer interval to 1 microframe drivers/usb/gadget/udc/dummy_hcd.c | 57 ++++++++++++++++++++---------- 1 file changed, 38 insertions(+), 19 deletions(-) -- 2.46.2

1 year, 1 month

4
8
0 0

[PATCH v3] usb: dwc3: core: Fix system suspend on TI AM62 platforms

by Roger Quadros

Since commit 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init"), system suspend is broken on AM62 TI platforms. Before that commit, both DWC3_GUSB3PIPECTL_SUSPHY and DWC3_GUSB2PHYCFG_SUSPHY bits (hence forth called 2 SUSPHY bits) were being set during core initialization and even during core re-initialization after a system suspend/resume. These bits are required to be set for system suspend/resume to work correctly on AM62 platforms. Since that commit, the 2 SUSPHY bits are not set for DEVICE/OTG mode if gadget driver is not loaded and started. For Host mode, the 2 SUSPHY bits are set before the first system suspend but get cleared at system resume during core re-init and are never set again. This patch resovles these two issues by ensuring the 2 SUSPHY bits are set before system suspend and restored to the original state during system resume. Cc: stable(a)vger.kernel.org # v6.9+ Fixes: 6d735722063a ("usb: dwc3: core: Prevent phy suspend during init") Link: https://lore.kernel.org/all/1519dbe7-73b6-4afc-bfe3-23f4f75d772f@kernel.org/ Signed-off-by: Roger Quadros <rogerq(a)kernel.org> Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com> --- Changes in v3: - Fix single line comment style - add DWC3_GUSB3PIPECTL_SUSPHY to documentation of susphy_state - Added Acked-by tag - Link to v2: https://lore.kernel.org/r/20241009-am62-lpm-usb-v2-1-da26c0cd2b1e@kernel.org Changes in v2: - Fix comment style - Use both USB3 and USB2 SUSPHY bits to determine susphy_state during system suspend/resume. - Restore SUSPHY bits at system resume regardless if it was set or cleared before system suspend. - Link to v1: https://lore.kernel.org/r/20241001-am62-lpm-usb-v1-1-9916b71165f7@kernel.org --- drivers/usb/dwc3/core.c | 19 +++++++++++++++++++ drivers/usb/dwc3/core.h | 3 +++ 2 files changed, 22 insertions(+) diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c index 9eb085f359ce..ca77f0b186c4 100644 --- a/drivers/usb/dwc3/core.c +++ b/drivers/usb/dwc3/core.c @@ -2336,6 +2336,11 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg) u32 reg; int i; + dwc->susphy_state = (dwc3_readl(dwc->regs, DWC3_GUSB2PHYCFG(0)) & + DWC3_GUSB2PHYCFG_SUSPHY) || + (dwc3_readl(dwc->regs, DWC3_GUSB3PIPECTL(0)) & + DWC3_GUSB3PIPECTL_SUSPHY); + switch (dwc->current_dr_role) { case DWC3_GCTL_PRTCAP_DEVICE: if (pm_runtime_suspended(dwc->dev)) @@ -2387,6 +2392,15 @@ static int dwc3_suspend_common(struct dwc3 *dwc, pm_message_t msg) break; } + if (!PMSG_IS_AUTO(msg)) { + /* + * TI AM62 platform requires SUSPHY to be + * enabled for system suspend to work. + */ + if (!dwc->susphy_state) + dwc3_enable_susphy(dwc, true); + } + return 0; } @@ -2454,6 +2468,11 @@ static int dwc3_resume_common(struct dwc3 *dwc, pm_message_t msg) break; } + if (!PMSG_IS_AUTO(msg)) { + /* restore SUSPHY state to that before system suspend. */ + dwc3_enable_susphy(dwc, dwc->susphy_state); + } + return 0; } diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h index c71240e8f7c7..31de4b57ae7c 100644 --- a/drivers/usb/dwc3/core.h +++ b/drivers/usb/dwc3/core.h @@ -1150,6 +1150,8 @@ struct dwc3_scratchpad_array { * @sys_wakeup: set if the device may do system wakeup. * @wakeup_configured: set if the device is configured for remote wakeup. * @suspended: set to track suspend event due to U3/L2. + * @susphy_state: state of DWC3_GUSB2PHYCFG_SUSPHY + DWC3_GUSB3PIPECTL_SUSPHY + * before PM suspend. * @imod_interval: set the interrupt moderation interval in 250ns * increments or 0 to disable. * @max_cfg_eps: current max number of IN eps used across all USB configs. @@ -1382,6 +1384,7 @@ struct dwc3 { unsigned sys_wakeup:1; unsigned wakeup_configured:1; unsigned suspended:1; + unsigned susphy_state:1; u16 imod_interval; --- base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc change-id: 20240923-am62-lpm-usb-f420917bd707 Best regards, -- Roger Quadros <rogerq(a)kernel.org>

1 year, 1 month

4
7
0 0

[PATCH] drm/bridge: it6505: Fix inverted reset polarity

by Chen-Yu Tsai

The IT6505 bridge chip has a active low reset line. Since it is a "reset" and not an "enable" line, the GPIO should be asserted to put it in reset and deasserted to bring it out of reset during the power on sequence. The polarity was inverted when the driver was first introduced, likely because the device family that was targeted had an inverting level shifter on the reset line. The MT8186 Corsola devices already have the IT6505 in their device tree, but the whole display pipeline is actually disabled and won't be enabled until some remaining issues are sorted out. The other known user is the MT8183 Kukui / Jacuzzi family; their device trees currently do not have the IT6505 included. Fix the polarity in the driver while there are no actual users. Fixes: b5c84a9edcd4 ("drm/bridge: add it6505 driver") Cc: <stable(a)vger.kernel.org> Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org> --- drivers/gpu/drm/bridge/ite-it6505.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/bridge/ite-it6505.c b/drivers/gpu/drm/bridge/ite-it6505.c index 7502a5f81557..df7ecdf0f422 100644 --- a/drivers/gpu/drm/bridge/ite-it6505.c +++ b/drivers/gpu/drm/bridge/ite-it6505.c @@ -2618,9 +2618,9 @@ static int it6505_poweron(struct it6505 *it6505) /* time interval between OVDD and SYSRSTN at least be 10ms */ if (pdata->gpiod_reset) { usleep_range(10000, 20000); - gpiod_set_value_cansleep(pdata->gpiod_reset, 0); - usleep_range(1000, 2000); gpiod_set_value_cansleep(pdata->gpiod_reset, 1); + usleep_range(1000, 2000); + gpiod_set_value_cansleep(pdata->gpiod_reset, 0); usleep_range(25000, 35000); } @@ -2651,7 +2651,7 @@ static int it6505_poweroff(struct it6505 *it6505) disable_irq_nosync(it6505->irq); if (pdata->gpiod_reset) - gpiod_set_value_cansleep(pdata->gpiod_reset, 0); + gpiod_set_value_cansleep(pdata->gpiod_reset, 1); if (pdata->pwr18) { err = regulator_disable(pdata->pwr18); @@ -3205,7 +3205,7 @@ static int it6505_init_pdata(struct it6505 *it6505) return PTR_ERR(pdata->ovdd); } - pdata->gpiod_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_LOW); + pdata->gpiod_reset = devm_gpiod_get(dev, "reset", GPIOD_OUT_HIGH); if (IS_ERR(pdata->gpiod_reset)) { dev_err(dev, "gpiod_reset gpio not found"); return PTR_ERR(pdata->gpiod_reset); -- 2.47.0.163.g1226f6d8fa-goog

1 year, 1 month

3
3
0 0

[PATCH 0/2] drm/mediatek: Fix child node refcount handling and use scoped macro

by Javier Carrasco

This series fixes a wrong handling of the child node within the for_each_child_of_node() by adding the missing call to of_node_put() to make it compatible with stable kernels that don't provide the scoped variant of the macro, which is more secure and was introduced early this year. Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- Javier Carrasco (2): drm/mediatek: Fix child node refcount handling in early exit drm/mediatek: Switch to for_each_child_of_node_scoped() drivers/gpu/drm/mediatek/mtk_drm_drv.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- base-commit: d61a00525464bfc5fe92c6ad713350988e492b88 change-id: 20241011-mtk_drm_drv_memleak-5e8b8e45ed1c Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

1 year, 1 month

5
5
0 0

[PATCH v5 1/3] drm/xe: Move LNL scheduling WA to xe_device.h

by Nirmoy Das

Move LNL scheduling WA to xe_device.h so this can be used in other places without needing keep the same comment about removal of this WA in the future. The WA, which flushes work or workqueues, is now wrapped in macros and can be reused wherever needed. Cc: Badal Nilawar <badal.nilawar(a)intel.com> Cc: Matthew Auld <matthew.auld(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray(a)intel.com> Cc: Lucas De Marchi <lucas.demarchi(a)intel.com> cc: <stable(a)vger.kernel.org> # v6.11+ Suggested-by: John Harrison <John.C.Harrison(a)Intel.com> Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com> --- drivers/gpu/drm/xe/xe_device.h | 14 ++++++++++++++ drivers/gpu/drm/xe/xe_guc_ct.c | 11 +---------- 2 files changed, 15 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h index 4c3f0ebe78a9..f1fbfe916867 100644 --- a/drivers/gpu/drm/xe/xe_device.h +++ b/drivers/gpu/drm/xe/xe_device.h @@ -191,4 +191,18 @@ void xe_device_declare_wedged(struct xe_device *xe); struct xe_file *xe_file_get(struct xe_file *xef); void xe_file_put(struct xe_file *xef); +/* + * Occasionally it is seen that the G2H worker starts running after a delay of more than + * a second even after being queued and activated by the Linux workqueue subsystem. This + * leads to G2H timeout error. The root cause of issue lies with scheduling latency of + * Lunarlake Hybrid CPU. Issue disappears if we disable Lunarlake atom cores from BIOS + * and this is beyond xe kmd. + * + * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU. + */ +#define LNL_FLUSH_WORKQUEUE(wq__) \ + flush_workqueue(wq__) +#define LNL_FLUSH_WORK(wrk__) \ + flush_work(wrk__) + #endif diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c index 1b5d8fb1033a..703b44b257a7 100644 --- a/drivers/gpu/drm/xe/xe_guc_ct.c +++ b/drivers/gpu/drm/xe/xe_guc_ct.c @@ -1018,17 +1018,8 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len, ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ); - /* - * Occasionally it is seen that the G2H worker starts running after a delay of more than - * a second even after being queued and activated by the Linux workqueue subsystem. This - * leads to G2H timeout error. The root cause of issue lies with scheduling latency of - * Lunarlake Hybrid CPU. Issue dissappears if we disable Lunarlake atom cores from BIOS - * and this is beyond xe kmd. - * - * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU. - */ if (!ret) { - flush_work(&ct->g2h_worker); + LNL_FLUSH_WORK(&ct->g2h_worker); if (g2h_fence.done) { xe_gt_warn(gt, "G2H fence %u, action %04x, done\n", g2h_fence.seqno, action[0]); -- 2.46.0

1 year, 1 month

5
13
0 0

[PATCH v2] usb: typec: Drop reference to a fwnode

by Joe Hattori

In typec_port_register_altmodes(), the fwnode reference obtained by device_get_named_child_node() is not dropped. This commit adds a call to fwnode_handle_put() to fix the possible reference leak. Fixes: 7b458a4c5d73 ("usb: typec: Add typec_port_register_altmodes()") Cc: stable(a)vger.kernel.org Signed-off-by: Joe Hattori <joe(a)pf.is.s.u-tokyo.ac.jp> --- Changes in v2: - Add the Cc: stable(a)vger.kernel.org line. --- drivers/usb/typec/class.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/usb/typec/class.c b/drivers/usb/typec/class.c index 58f40156de56..145e12e13aef 100644 --- a/drivers/usb/typec/class.c +++ b/drivers/usb/typec/class.c @@ -2343,6 +2343,7 @@ void typec_port_register_altmodes(struct typec_port *port, altmodes[index] = alt; index++; } + fwnode_handle_put(altmodes_node); } EXPORT_SYMBOL_GPL(typec_port_register_altmodes); -- 2.34.1

1 year, 1 month

3
2
0 0

Re: Patch "igb: Disable threaded IRQ for igb_msix_other" has been added to the 6.11-stable tree

by Sebastian Andrzej Siewior

On 2024-11-01 15:21:24 [-0400], Sasha Levin wrote: > commit 052382490ee4f0f6d783ddce02fe6f2d15e134b5 > Author: Wander Lairson Costa <wander(a)redhat.com> > Date: Mon Oct 21 16:26:24 2024 -0700 > > igb: Disable threaded IRQ for igb_msix_other > > [ Upstream commit 338c4d3902feb5be49bfda530a72c7ab860e2c9f ] > > During testing of SR-IOV, Red Hat QE encountered an issue where the > ip link up command intermittently fails for the igbvf interfaces when > using the PREEMPT_RT variant. Investigation revealed that > e1000_write_posted_mbx returns an error due to the lack of an ACK > from e1000_poll_for_ack. > > The underlying issue arises from the fact that IRQs are threaded by > default under PREEMPT_RT. While the exact hardware details are not > available, it appears that the IRQ handled by igb_msix_other must > be processed before e1000_poll_for_ack times out. However, > e1000_write_posted_mbx is called with preemption disabled, leading > to a scenario where the IRQ is serviced only after the failure of > e1000_write_posted_mbx. > > To resolve this, we set IRQF_NO_THREAD for the affected interrupt, > ensuring that the kernel handles it immediately, thereby preventing > the aforementioned error. Wander, please send a revert of this patch. The ISR (E1000_ICR_TS set) may invoke igb_msg_task(), ptp_clock_event(), igb_perout(), igb_extts() each of which acquire sleeping locks on PREEMPT_RT. Not sure if this improved the situation or not. Sebastian

1 year, 1 month

1
0
0 0

[PATCH AUTOSEL 4.19 1/5] ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet

by Sasha Levin

From: Hans de Goede <hdegoede(a)redhat.com> [ Upstream commit 0107f28f135231da22a9ad5756bb16bd5cada4d5 ] The Vexia Edu Atla 10 tablet mostly uses the BYTCR tablet defaults, but as happens on more models it is using IN1 instead of IN3 for its internal mic and JD_SRC_JD2_IN4N instead of JD_SRC_JD1_IN4P for jack-detection. Add a DMI quirk for this to fix the internal-mic and jack-detection. Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> Link: https://patch.msgid.link/20241024211615.79518-2-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/intel/boards/bytcr_rt5640.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index 16e2ab2903754..2ed63866e9cc6 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -879,6 +879,21 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { BYT_RT5640_SSP0_AIF2 | BYT_RT5640_MCLK_EN), }, + { /* Vexia Edu Atla 10 tablet */ + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "Aptio CRB"), + /* Above strings are too generic, also match on BIOS date */ + DMI_MATCH(DMI_BIOS_DATE, "08/25/2014"), + }, + .driver_data = (void *)(BYT_RT5640_IN1_MAP | + BYT_RT5640_JD_SRC_JD2_IN4N | + BYT_RT5640_OVCD_TH_2000UA | + BYT_RT5640_OVCD_SF_0P75 | + BYT_RT5640_DIFF_MIC | + BYT_RT5640_SSP0_AIF2 | + BYT_RT5640_MCLK_EN), + }, { /* Voyo Winpad A15 */ .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"), -- 2.43.0

1 year, 1 month

1
4
0 0

[PATCH AUTOSEL 5.4 1/6] ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet

by Sasha Levin

From: Hans de Goede <hdegoede(a)redhat.com> [ Upstream commit 0107f28f135231da22a9ad5756bb16bd5cada4d5 ] The Vexia Edu Atla 10 tablet mostly uses the BYTCR tablet defaults, but as happens on more models it is using IN1 instead of IN3 for its internal mic and JD_SRC_JD2_IN4N instead of JD_SRC_JD1_IN4P for jack-detection. Add a DMI quirk for this to fix the internal-mic and jack-detection. Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> Link: https://patch.msgid.link/20241024211615.79518-2-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/intel/boards/bytcr_rt5640.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index 057ecfe2c8b5c..53a15be38b56f 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -909,6 +909,21 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { BYT_RT5640_SSP0_AIF2 | BYT_RT5640_MCLK_EN), }, + { /* Vexia Edu Atla 10 tablet */ + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "Aptio CRB"), + /* Above strings are too generic, also match on BIOS date */ + DMI_MATCH(DMI_BIOS_DATE, "08/25/2014"), + }, + .driver_data = (void *)(BYT_RT5640_IN1_MAP | + BYT_RT5640_JD_SRC_JD2_IN4N | + BYT_RT5640_OVCD_TH_2000UA | + BYT_RT5640_OVCD_SF_0P75 | + BYT_RT5640_DIFF_MIC | + BYT_RT5640_SSP0_AIF2 | + BYT_RT5640_MCLK_EN), + }, { /* Voyo Winpad A15 */ .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"), -- 2.43.0

1 year, 1 month

1
5
0 0

[PATCH AUTOSEL 5.10 1/6] ASoC: Intel: bytcr_rt5640: Add DMI quirk for Vexia Edu Atla 10 tablet

by Sasha Levin

From: Hans de Goede <hdegoede(a)redhat.com> [ Upstream commit 0107f28f135231da22a9ad5756bb16bd5cada4d5 ] The Vexia Edu Atla 10 tablet mostly uses the BYTCR tablet defaults, but as happens on more models it is using IN1 instead of IN3 for its internal mic and JD_SRC_JD2_IN4N instead of JD_SRC_JD1_IN4P for jack-detection. Add a DMI quirk for this to fix the internal-mic and jack-detection. Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> Link: https://patch.msgid.link/20241024211615.79518-2-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/intel/boards/bytcr_rt5640.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index 47b581d99da67..6fc6a1fcd935e 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -935,6 +935,21 @@ static const struct dmi_system_id byt_rt5640_quirk_table[] = { BYT_RT5640_SSP0_AIF2 | BYT_RT5640_MCLK_EN), }, + { /* Vexia Edu Atla 10 tablet */ + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "Aptio CRB"), + /* Above strings are too generic, also match on BIOS date */ + DMI_MATCH(DMI_BIOS_DATE, "08/25/2014"), + }, + .driver_data = (void *)(BYT_RT5640_IN1_MAP | + BYT_RT5640_JD_SRC_JD2_IN4N | + BYT_RT5640_OVCD_TH_2000UA | + BYT_RT5640_OVCD_SF_0P75 | + BYT_RT5640_DIFF_MIC | + BYT_RT5640_SSP0_AIF2 | + BYT_RT5640_MCLK_EN), + }, { /* Voyo Winpad A15 */ .matches = { DMI_MATCH(DMI_BOARD_VENDOR, "AMI Corporation"), -- 2.43.0

1 year, 1 month

1
5
0 0

[PATCH AUTOSEL 5.15 01/10] ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec

by Sasha Levin

From: Hans de Goede <hdegoede(a)redhat.com> [ Upstream commit d48696b915527b5bcdd207a299aec03fb037eb17 ] On some x86 Bay Trail tablets which shipped with Android as factory OS, the DSDT is so broken that the codec needs to be manually instantatiated by the special x86-android-tablets.ko "fixup" driver for cases like this. This means that the codec-dev cannot be retrieved through its ACPI fwnode, add support to the bytcr_rt5640 machine driver for such manually instantiated rt5640 i2c_clients. An example of a tablet which needs this is the Vexia EDU ATLA 10 tablet, which has been distributed to schools in the Spanish Andalucía region. Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> Link: https://patch.msgid.link/20241024211615.79518-1-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/intel/boards/bytcr_rt5640.c | 33 ++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index 3d2a0e8cad9a5..899a8435a1eb8 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -17,6 +17,7 @@ #include <linux/acpi.h> #include <linux/clk.h> #include <linux/device.h> +#include <linux/device/bus.h> #include <linux/dmi.h> #include <linux/gpio/consumer.h> #include <linux/gpio/machine.h> @@ -32,6 +33,8 @@ #include "../atom/sst-atom-controls.h" #include "../common/soc-intel-quirks.h" +#define BYT_RT5640_FALLBACK_CODEC_DEV_NAME "i2c-rt5640" + enum { BYT_RT5640_DMIC1_MAP, BYT_RT5640_DMIC2_MAP, @@ -1616,9 +1619,33 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) codec_dev = acpi_get_first_physical_node(adev); acpi_dev_put(adev); - if (!codec_dev) - return -EPROBE_DEFER; - priv->codec_dev = get_device(codec_dev); + + if (codec_dev) { + priv->codec_dev = get_device(codec_dev); + } else { + /* + * Special case for Android tablets where the codec i2c_client + * has been manually instantiated by x86_android_tablets.ko due + * to a broken DSDT. + */ + codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, + BYT_RT5640_FALLBACK_CODEC_DEV_NAME); + if (!codec_dev) + return -EPROBE_DEFER; + + if (!i2c_verify_client(codec_dev)) { + dev_err(dev, "Error '%s' is not an i2c_client\n", + BYT_RT5640_FALLBACK_CODEC_DEV_NAME); + put_device(codec_dev); + } + + /* fixup codec name */ + strscpy(byt_rt5640_codec_name, BYT_RT5640_FALLBACK_CODEC_DEV_NAME, + sizeof(byt_rt5640_codec_name)); + + /* bus_find_device() returns a reference no need to get() */ + priv->codec_dev = codec_dev; + } /* * swap SSP0 if bytcr is detected -- 2.43.0

1 year, 1 month

1
9
0 0

[PATCH AUTOSEL 6.1 01/11] ASoC: Intel: bytcr_rt5640: Add support for non ACPI instantiated codec

by Sasha Levin

From: Hans de Goede <hdegoede(a)redhat.com> [ Upstream commit d48696b915527b5bcdd207a299aec03fb037eb17 ] On some x86 Bay Trail tablets which shipped with Android as factory OS, the DSDT is so broken that the codec needs to be manually instantatiated by the special x86-android-tablets.ko "fixup" driver for cases like this. This means that the codec-dev cannot be retrieved through its ACPI fwnode, add support to the bytcr_rt5640 machine driver for such manually instantiated rt5640 i2c_clients. An example of a tablet which needs this is the Vexia EDU ATLA 10 tablet, which has been distributed to schools in the Spanish Andalucía region. Signed-off-by: Hans de Goede <hdegoede(a)redhat.com> Link: https://patch.msgid.link/20241024211615.79518-1-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie(a)kernel.org> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- sound/soc/intel/boards/bytcr_rt5640.c | 33 ++++++++++++++++++++++++--- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/sound/soc/intel/boards/bytcr_rt5640.c b/sound/soc/intel/boards/bytcr_rt5640.c index ff879e173d51d..7a57d7abd3803 100644 --- a/sound/soc/intel/boards/bytcr_rt5640.c +++ b/sound/soc/intel/boards/bytcr_rt5640.c @@ -17,6 +17,7 @@ #include <linux/acpi.h> #include <linux/clk.h> #include <linux/device.h> +#include <linux/device/bus.h> #include <linux/dmi.h> #include <linux/gpio/consumer.h> #include <linux/gpio/machine.h> @@ -32,6 +33,8 @@ #include "../atom/sst-atom-controls.h" #include "../common/soc-intel-quirks.h" +#define BYT_RT5640_FALLBACK_CODEC_DEV_NAME "i2c-rt5640" + enum { BYT_RT5640_DMIC1_MAP, BYT_RT5640_DMIC2_MAP, @@ -1687,9 +1690,33 @@ static int snd_byt_rt5640_mc_probe(struct platform_device *pdev) codec_dev = acpi_get_first_physical_node(adev); acpi_dev_put(adev); - if (!codec_dev) - return -EPROBE_DEFER; - priv->codec_dev = get_device(codec_dev); + + if (codec_dev) { + priv->codec_dev = get_device(codec_dev); + } else { + /* + * Special case for Android tablets where the codec i2c_client + * has been manually instantiated by x86_android_tablets.ko due + * to a broken DSDT. + */ + codec_dev = bus_find_device_by_name(&i2c_bus_type, NULL, + BYT_RT5640_FALLBACK_CODEC_DEV_NAME); + if (!codec_dev) + return -EPROBE_DEFER; + + if (!i2c_verify_client(codec_dev)) { + dev_err(dev, "Error '%s' is not an i2c_client\n", + BYT_RT5640_FALLBACK_CODEC_DEV_NAME); + put_device(codec_dev); + } + + /* fixup codec name */ + strscpy(byt_rt5640_codec_name, BYT_RT5640_FALLBACK_CODEC_DEV_NAME, + sizeof(byt_rt5640_codec_name)); + + /* bus_find_device() returns a reference no need to get() */ + priv->codec_dev = codec_dev; + } /* * swap SSP0 if bytcr is detected -- 2.43.0

1 year, 1 month

1
10
0 0

[PATCH AUTOSEL 6.6 01/14] wifi: radiotap: Avoid -Wflex-array-member-not-at-end warnings

by Sasha Levin

From: "Gustavo A. R. Silva" <gustavoars(a)kernel.org> [ Upstream commit 57be3d3562ca4aa62b8047bc681028cc402af8ce ] -Wflex-array-member-not-at-end was introduced in GCC-14, and we are getting ready to enable it, globally. So, in order to avoid ending up with a flexible-array member in the middle of multiple other structs, we use the `__struct_group()` helper to create a new tagged `struct ieee80211_radiotap_header_fixed`. This structure groups together all the members of the flexible `struct ieee80211_radiotap_header` except the flexible array. As a result, the array is effectively separated from the rest of the members without modifying the memory layout of the flexible structure. We then change the type of the middle struct members currently causing trouble from `struct ieee80211_radiotap_header` to `struct ieee80211_radiotap_header_fixed`. We also want to ensure that in case new members need to be added to the flexible structure, they are always included within the newly created tagged struct. For this, we use `static_assert()`. This ensures that the memory layout for both the flexible structure and the new tagged struct is the same after any changes. This approach avoids having to implement `struct ieee80211_radiotap_header_fixed` as a completely separate structure, thus preventing having to maintain two independent but basically identical structures, closing the door to potential bugs in the future. So, with these changes, fix the following warnings: drivers/net/wireless/ath/wil6210/txrx.c:309:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/intel/ipw2x00/ipw2100.c:2521:50: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/intel/ipw2x00/ipw2200.h:1146:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/intel/ipw2x00/libipw.h:595:36: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/marvell/libertas/radiotap.h:34:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/marvell/libertas/radiotap.h:5:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/microchip/wilc1000/mon.c:10:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/microchip/wilc1000/mon.c:15:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/virtual/mac80211_hwsim.c:758:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] drivers/net/wireless/virtual/mac80211_hwsim.c:767:42: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars(a)kernel.org> Link: https://patch.msgid.link/ZwBMtBZKcrzwU7l4@kspp Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- drivers/net/wireless/ath/wil6210/txrx.c | 2 +- drivers/net/wireless/intel/ipw2x00/ipw2100.c | 2 +- drivers/net/wireless/intel/ipw2x00/ipw2200.h | 2 +- .../net/wireless/marvell/libertas/radiotap.h | 4 +- drivers/net/wireless/microchip/wilc1000/mon.c | 4 +- drivers/net/wireless/virtual/mac80211_hwsim.c | 4 +- include/net/ieee80211_radiotap.h | 43 +++++++++++-------- 7 files changed, 33 insertions(+), 28 deletions(-) diff --git a/drivers/net/wireless/ath/wil6210/txrx.c b/drivers/net/wireless/ath/wil6210/txrx.c index f29ac6de71399..19702b6f09c32 100644 --- a/drivers/net/wireless/ath/wil6210/txrx.c +++ b/drivers/net/wireless/ath/wil6210/txrx.c @@ -306,7 +306,7 @@ static void wil_rx_add_radiotap_header(struct wil6210_priv *wil, struct sk_buff *skb) { struct wil6210_rtap { - struct ieee80211_radiotap_header rthdr; + struct ieee80211_radiotap_header_fixed rthdr; /* fields should be in the order of bits in rthdr.it_present */ /* flags */ u8 flags; diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2100.c b/drivers/net/wireless/intel/ipw2x00/ipw2100.c index 0812db8936f13..9e9ff0cb724ca 100644 --- a/drivers/net/wireless/intel/ipw2x00/ipw2100.c +++ b/drivers/net/wireless/intel/ipw2x00/ipw2100.c @@ -2520,7 +2520,7 @@ static void isr_rx_monitor(struct ipw2100_priv *priv, int i, * to build this manually element by element, we can write it much * more efficiently than we can parse it. ORDER MATTERS HERE */ struct ipw_rt_hdr { - struct ieee80211_radiotap_header rt_hdr; + struct ieee80211_radiotap_header_fixed rt_hdr; s8 rt_dbmsignal; /* signal in dbM, kluged to signed */ } *ipw_rt; diff --git a/drivers/net/wireless/intel/ipw2x00/ipw2200.h b/drivers/net/wireless/intel/ipw2x00/ipw2200.h index 8ebf09121e173..226286cb7eb82 100644 --- a/drivers/net/wireless/intel/ipw2x00/ipw2200.h +++ b/drivers/net/wireless/intel/ipw2x00/ipw2200.h @@ -1143,7 +1143,7 @@ struct ipw_prom_priv { * structure is provided regardless of any bits unset. */ struct ipw_rt_hdr { - struct ieee80211_radiotap_header rt_hdr; + struct ieee80211_radiotap_header_fixed rt_hdr; u64 rt_tsf; /* TSF */ /* XXX */ u8 rt_flags; /* radiotap packet flags */ u8 rt_rate; /* rate in 500kb/s */ diff --git a/drivers/net/wireless/marvell/libertas/radiotap.h b/drivers/net/wireless/marvell/libertas/radiotap.h index 1ed5608d353ff..d543bfe739dcb 100644 --- a/drivers/net/wireless/marvell/libertas/radiotap.h +++ b/drivers/net/wireless/marvell/libertas/radiotap.h @@ -2,7 +2,7 @@ #include <net/ieee80211_radiotap.h> struct tx_radiotap_hdr { - struct ieee80211_radiotap_header hdr; + struct ieee80211_radiotap_header_fixed hdr; u8 rate; u8 txpower; u8 rts_retries; @@ -31,7 +31,7 @@ struct tx_radiotap_hdr { #define IEEE80211_FC_DSTODS 0x0300 struct rx_radiotap_hdr { - struct ieee80211_radiotap_header hdr; + struct ieee80211_radiotap_header_fixed hdr; u8 flags; u8 rate; u8 antsignal; diff --git a/drivers/net/wireless/microchip/wilc1000/mon.c b/drivers/net/wireless/microchip/wilc1000/mon.c index 03b7229a0ff5a..c3d27aaec2974 100644 --- a/drivers/net/wireless/microchip/wilc1000/mon.c +++ b/drivers/net/wireless/microchip/wilc1000/mon.c @@ -7,12 +7,12 @@ #include "cfg80211.h" struct wilc_wfi_radiotap_hdr { - struct ieee80211_radiotap_header hdr; + struct ieee80211_radiotap_header_fixed hdr; u8 rate; } __packed; struct wilc_wfi_radiotap_cb_hdr { - struct ieee80211_radiotap_header hdr; + struct ieee80211_radiotap_header_fixed hdr; u8 rate; u8 dump; u16 tx_flags; diff --git a/drivers/net/wireless/virtual/mac80211_hwsim.c b/drivers/net/wireless/virtual/mac80211_hwsim.c index 07be0adc13ec5..d86a1bd7aab08 100644 --- a/drivers/net/wireless/virtual/mac80211_hwsim.c +++ b/drivers/net/wireless/virtual/mac80211_hwsim.c @@ -736,7 +736,7 @@ static const struct rhashtable_params hwsim_rht_params = { }; struct hwsim_radiotap_hdr { - struct ieee80211_radiotap_header hdr; + struct ieee80211_radiotap_header_fixed hdr; __le64 rt_tsft; u8 rt_flags; u8 rt_rate; @@ -745,7 +745,7 @@ struct hwsim_radiotap_hdr { } __packed; struct hwsim_radiotap_ack_hdr { - struct ieee80211_radiotap_header hdr; + struct ieee80211_radiotap_header_fixed hdr; u8 rt_flags; u8 pad; __le16 rt_channel; diff --git a/include/net/ieee80211_radiotap.h b/include/net/ieee80211_radiotap.h index 2338f8d2a8b33..c6cb6f6427423 100644 --- a/include/net/ieee80211_radiotap.h +++ b/include/net/ieee80211_radiotap.h @@ -24,25 +24,27 @@ * struct ieee80211_radiotap_header - base radiotap header */ struct ieee80211_radiotap_header { - /** - * @it_version: radiotap version, always 0 - */ - uint8_t it_version; - - /** - * @it_pad: padding (or alignment) - */ - uint8_t it_pad; - - /** - * @it_len: overall radiotap header length - */ - __le16 it_len; - - /** - * @it_present: (first) present word - */ - __le32 it_present; + __struct_group(ieee80211_radiotap_header_fixed, hdr, __packed, + /** + * @it_version: radiotap version, always 0 + */ + uint8_t it_version; + + /** + * @it_pad: padding (or alignment) + */ + uint8_t it_pad; + + /** + * @it_len: overall radiotap header length + */ + __le16 it_len; + + /** + * @it_present: (first) present word + */ + __le32 it_present; + ); /** * @it_optional: all remaining presence bitmaps @@ -50,6 +52,9 @@ struct ieee80211_radiotap_header { __le32 it_optional[]; } __packed; +static_assert(offsetof(struct ieee80211_radiotap_header, it_optional) == sizeof(struct ieee80211_radiotap_header_fixed), + "struct member likely outside of __struct_group()"); + /* version is always 0 */ #define PKTHDR_RADIOTAP_VERSION 0 -- 2.43.0

1 year, 1 month

1
13
0 0

[PATCH AUTOSEL 6.11 01/21] wifi: mac80211: Fix setting txpower with emulate_chanctx

by Sasha Levin

From: Ben Greear <greearb(a)candelatech.com> [ Upstream commit 8dd0498983eef524a8d104eb8abb32ec4c595bec ] Propagate hw conf into the driver when txpower changes and driver is emulating channel contexts. Signed-off-by: Ben Greear <greearb(a)candelatech.com> Link: https://patch.msgid.link/20240924011325.1509103-1-greearb@candelatech.com Signed-off-by: Johannes Berg <johannes.berg(a)intel.com> Signed-off-by: Sasha Levin <sashal(a)kernel.org> --- net/mac80211/cfg.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/net/mac80211/cfg.c b/net/mac80211/cfg.c index b02b84ce21307..c4f4300c81c16 100644 --- a/net/mac80211/cfg.c +++ b/net/mac80211/cfg.c @@ -3046,6 +3046,7 @@ static int ieee80211_set_tx_power(struct wiphy *wiphy, enum nl80211_tx_power_setting txp_type = type; bool update_txp_type = false; bool has_monitor = false; + int old_power = local->user_power_level; lockdep_assert_wiphy(local->hw.wiphy); @@ -3128,6 +3129,10 @@ static int ieee80211_set_tx_power(struct wiphy *wiphy, } } + if (local->emulate_chanctx && + (old_power != local->user_power_level)) + ieee80211_hw_conf_chan(local); + return 0; } -- 2.43.0

1 year, 1 month

1
20
0 0

[PATCHv6, RESEND 3/4] x86/tdx: Dynamically disable SEPT violations from causing #VEs

by Kirill A. Shutemov

Memory access #VEs are hard for Linux to handle in contexts like the entry code or NMIs. But other OSes need them for functionality. There's a static (pre-guest-boot) way for a VMM to choose one or the other. But VMMs don't always know which OS they are booting, so they choose to deliver those #VEs so the "other" OSes will work. That, unfortunately has left us in the lurch and exposed to these hard-to-handle #VEs. The TDX module has introduced a new feature. Even if the static configuration is set to "send nasty #VEs", the kernel can dynamically request that they be disabled. Once they are disabled, access to private memory that is not in the Mapped state in the Secure-EPT (SEPT) will result in an exit to the VMM rather than injecting a #VE. Check if the feature is available and disable SEPT #VE if possible. If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE attribute is no longer reliable. It reflects the initial state of the control for the TD, but it will not be updated if someone (e.g. bootloader) changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to determine if SEPT #VEs are enabled or disabled. Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Fixes: 373e715e31bf ("x86/tdx: Panic on bad configs that #VE on "private" memory access") Cc: stable(a)vger.kernel.org Acked-by: Kai Huang <kai.huang(a)intel.com> --- arch/x86/coco/tdx/tdx.c | 76 ++++++++++++++++++++++++------- arch/x86/include/asm/shared/tdx.h | 10 +++- 2 files changed, 69 insertions(+), 17 deletions(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 28b321a95a5e..a27230c44cc2 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -79,7 +79,7 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args) } /* Read TD-scoped metadata */ -static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value) +static inline u64 tdg_vm_rd(u64 field, u64 *value) { struct tdx_module_args args = { .rdx = field, @@ -194,6 +194,62 @@ static void __noreturn tdx_panic(const char *msg) __tdx_hypercall(&args); } +/* + * The kernel cannot handle #VEs when accessing normal kernel memory. Ensure + * that no #VE will be delivered for accesses to TD-private memory. + * + * TDX 1.0 does not allow the guest to disable SEPT #VE on its own. The VMM + * controls if the guest will receive such #VE with TD attribute + * ATTR_SEPT_VE_DISABLE. + * + * Newer TDX modules allow the guest to control if it wants to receive SEPT + * violation #VEs. + * + * Check if the feature is available and disable SEPT #VE if possible. + * + * If the TD is allowed to disable/enable SEPT #VEs, the ATTR_SEPT_VE_DISABLE + * attribute is no longer reliable. It reflects the initial state of the + * control for the TD, but it will not be updated if someone (e.g. bootloader) + * changes it before the kernel starts. Kernel must check TDCS_TD_CTLS bit to + * determine if SEPT #VEs are enabled or disabled. + */ +static void disable_sept_ve(u64 td_attr) +{ + const char *msg = "TD misconfiguration: SEPT #VE has to be disabled"; + bool debug = td_attr & ATTR_DEBUG; + u64 config, controls; + + /* Is this TD allowed to disable SEPT #VE */ + tdg_vm_rd(TDCS_CONFIG_FLAGS, &config); + if (!(config & TDCS_CONFIG_FLEXIBLE_PENDING_VE)) { + /* No SEPT #VE controls for the guest: check the attribute */ + if (td_attr & ATTR_SEPT_VE_DISABLE) + return; + + /* Relax SEPT_VE_DISABLE check for debug TD for backtraces */ + if (debug) + pr_warn("%s\n", msg); + else + tdx_panic(msg); + return; + } + + /* Check if SEPT #VE has been disabled before us */ + tdg_vm_rd(TDCS_TD_CTLS, &controls); + if (controls & TD_CTLS_PENDING_VE_DISABLE) + return; + + /* Keep #VEs enabled for splats in debugging environments */ + if (debug) + return; + + /* Disable SEPT #VEs */ + tdg_vm_wr(TDCS_TD_CTLS, TD_CTLS_PENDING_VE_DISABLE, + TD_CTLS_PENDING_VE_DISABLE); + + return; +} + static void tdx_setup(u64 *cc_mask) { struct tdx_module_args args = {}; @@ -219,24 +275,12 @@ static void tdx_setup(u64 *cc_mask) gpa_width = args.rcx & GENMASK(5, 0); *cc_mask = BIT_ULL(gpa_width - 1); + td_attr = args.rdx; + /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */ tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL); - /* - * The kernel can not handle #VE's when accessing normal kernel - * memory. Ensure that no #VE will be delivered for accesses to - * TD-private memory. Only VMM-shared memory (MMIO) will #VE. - */ - td_attr = args.rdx; - if (!(td_attr & ATTR_SEPT_VE_DISABLE)) { - const char *msg = "TD misconfiguration: SEPT_VE_DISABLE attribute must be set."; - - /* Relax SEPT_VE_DISABLE check for debug TD. */ - if (td_attr & ATTR_DEBUG) - pr_warn("%s\n", msg); - else - tdx_panic(msg); - } + disable_sept_ve(td_attr); } /* diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h index 7e12cfa28bec..fecb2a6e864b 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -19,9 +19,17 @@ #define TDG_VM_RD 7 #define TDG_VM_WR 8 -/* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */ +/* TDX TD-Scope Metadata. To be used by TDG.VM.WR and TDG.VM.RD */ +#define TDCS_CONFIG_FLAGS 0x1110000300000016 +#define TDCS_TD_CTLS 0x1110000300000017 #define TDCS_NOTIFY_ENABLES 0x9100000000000010 +/* TDCS_CONFIG_FLAGS bits */ +#define TDCS_CONFIG_FLEXIBLE_PENDING_VE BIT_ULL(1) + +/* TDCS_TD_CTLS bits */ +#define TD_CTLS_PENDING_VE_DISABLE BIT_ULL(0) + /* TDX hypercall Leaf IDs */ #define TDVMCALL_MAP_GPA 0x10001 #define TDVMCALL_GET_QUOTE 0x10002 -- 2.45.2

1 year, 1 month

1
0
0 0

[PATCHv6, RESEND 2/4] x86/tdx: Rename tdx_parse_tdinfo() to tdx_setup()

by Kirill A. Shutemov

Rename tdx_parse_tdinfo() to tdx_setup() and move setting NOTIFY_ENABLES there. The function will be extended to adjust TD configuration. Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com> Reviewed-by: Kai Huang <kai.huang(a)intel.com> Cc: stable(a)vger.kernel.org --- arch/x86/coco/tdx/tdx.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index c74bb9e7d7a3..28b321a95a5e 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -194,7 +194,7 @@ static void __noreturn tdx_panic(const char *msg) __tdx_hypercall(&args); } -static void tdx_parse_tdinfo(u64 *cc_mask) +static void tdx_setup(u64 *cc_mask) { struct tdx_module_args args = {}; unsigned int gpa_width; @@ -219,6 +219,9 @@ static void tdx_parse_tdinfo(u64 *cc_mask) gpa_width = args.rcx & GENMASK(5, 0); *cc_mask = BIT_ULL(gpa_width - 1); + /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */ + tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL); + /* * The kernel can not handle #VE's when accessing normal kernel * memory. Ensure that no #VE will be delivered for accesses to @@ -969,11 +972,11 @@ void __init tdx_early_init(void) setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); cc_vendor = CC_VENDOR_INTEL; - tdx_parse_tdinfo(&cc_mask); - cc_set_mask(cc_mask); - /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */ - tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL); + /* Configure the TD */ + tdx_setup(&cc_mask); + + cc_set_mask(cc_mask); /* * All bits above GPA width are reserved and kernel treats shared bit -- 2.45.2

1 year, 1 month

1
0
0 0

[PATCHv6, RESEND 1/4] x86/tdx: Introduce wrappers to read and write TD metadata

by Kirill A. Shutemov

The TDG_VM_WR TDCALL is used to ask the TDX module to change some TD-specific VM configuration. There is currently only one user in the kernel of this TDCALL leaf. More will be added shortly. Refactor to make way for more users of TDG_VM_WR who will need to modify other TD configuration values. Add a wrapper for the TDG_VM_RD TDCALL that requests TD-specific metadata from the TDX module. There are currently no users for TDG_VM_RD. Mark it as __maybe_unused until the first user appears. This is preparation for enumeration and enabling optional TD features. Signed-off-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Reviewed-by: Kai Huang <kai.huang(a)intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy(a)linux.intel.com> Cc: stable(a)vger.kernel.org --- arch/x86/coco/tdx/tdx.c | 32 ++++++++++++++++++++++++++----- arch/x86/include/asm/shared/tdx.h | 1 + 2 files changed, 28 insertions(+), 5 deletions(-) diff --git a/arch/x86/coco/tdx/tdx.c b/arch/x86/coco/tdx/tdx.c index 327c45c5013f..c74bb9e7d7a3 100644 --- a/arch/x86/coco/tdx/tdx.c +++ b/arch/x86/coco/tdx/tdx.c @@ -78,6 +78,32 @@ static inline void tdcall(u64 fn, struct tdx_module_args *args) panic("TDCALL %lld failed (Buggy TDX module!)\n", fn); } +/* Read TD-scoped metadata */ +static inline u64 __maybe_unused tdg_vm_rd(u64 field, u64 *value) +{ + struct tdx_module_args args = { + .rdx = field, + }; + u64 ret; + + ret = __tdcall_ret(TDG_VM_RD, &args); + *value = args.r8; + + return ret; +} + +/* Write TD-scoped metadata */ +static inline u64 tdg_vm_wr(u64 field, u64 value, u64 mask) +{ + struct tdx_module_args args = { + .rdx = field, + .r8 = value, + .r9 = mask, + }; + + return __tdcall(TDG_VM_WR, &args); +} + /** * tdx_mcall_get_report0() - Wrapper to get TDREPORT0 (a.k.a. TDREPORT * subtype 0) using TDG.MR.REPORT TDCALL. @@ -929,10 +955,6 @@ static void tdx_kexec_finish(void) void __init tdx_early_init(void) { - struct tdx_module_args args = { - .rdx = TDCS_NOTIFY_ENABLES, - .r9 = -1ULL, - }; u64 cc_mask; u32 eax, sig[3]; @@ -951,7 +973,7 @@ void __init tdx_early_init(void) cc_set_mask(cc_mask); /* Kernel does not use NOTIFY_ENABLES and does not need random #VEs */ - tdcall(TDG_VM_WR, &args); + tdg_vm_wr(TDCS_NOTIFY_ENABLES, 0, -1ULL); /* * All bits above GPA width are reserved and kernel treats shared bit diff --git a/arch/x86/include/asm/shared/tdx.h b/arch/x86/include/asm/shared/tdx.h index fdfd41511b02..7e12cfa28bec 100644 --- a/arch/x86/include/asm/shared/tdx.h +++ b/arch/x86/include/asm/shared/tdx.h @@ -16,6 +16,7 @@ #define TDG_VP_VEINFO_GET 3 #define TDG_MR_REPORT 4 #define TDG_MEM_PAGE_ACCEPT 6 +#define TDG_VM_RD 7 #define TDG_VM_WR 8 /* TDCS fields. To be used by TDG.VM.WR and TDG.VM.RD module calls */ -- 2.45.2

1 year, 1 month

1
0
0 0

[PATCH] arm64: dts: mediatek: mt8186-corsola: Fix IT6505 reset line polarity

by Chen-Yu Tsai

The reset line of the IT6505 bridge chip is active low, not active high. It was incorrectly inverted in the device tree as the implementation at the time incorrectly inverted the polarity in its driver, due to a prior device having an inline inverting level shifter. Fix the polarity now while the external display pipeline is incomplete, thereby avoiding any impact to running systems. A matching fix for the driver should be included if this change is backported. Fixes: 8855d01fb81f ("arm64: dts: mediatek: Add MT8186 Krabby platform based Tentacruel / Tentacool") Cc: <stable(a)vger.kernel.org> Signed-off-by: Chen-Yu Tsai <wenst(a)chromium.org> --- The matching driver change can be found at https://lore.kernel.org/all/20241029095411.657616-1-wenst@chromium.org/ arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi b/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi index e3b58641f2c9..43c83620e479 100644 --- a/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi +++ b/arch/arm64/boot/dts/mediatek/mt8186-corsola.dtsi @@ -422,7 +422,7 @@ it6505dptx: dp-bridge@5c { #sound-dai-cells = <0>; ovdd-supply = <&mt6366_vsim2_reg>; pwr18-supply = <&pp1800_dpbrdg_dx>; - reset-gpios = <&pio 177 GPIO_ACTIVE_HIGH>; + reset-gpios = <&pio 177 GPIO_ACTIVE_LOW>; ports { #address-cells = <1>; -- 2.47.0.163.g1226f6d8fa-goog

1 year, 1 month

2
1
0 0

[merged] lib-string_helpers-fix-potential-snprintf-output-truncation.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: lib: string_helpers: fix potential snprintf() output truncation has been removed from the -mm tree. Its filename was lib-string_helpers-fix-potential-snprintf-output-truncation.patch This patch was dropped because it was merged into mainline or a subsystem tree ------------------------------------------------------ From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> Subject: lib: string_helpers: fix potential snprintf() output truncation Date: Mon, 21 Oct 2024 11:14:17 +0200 The output of ".%03u" with the unsigned int in range [0, 4294966295] may get truncated if the target buffer is not 12 bytes. Link: https://lkml.kernel.org/r/20241021091417.37796-1-brgl@bgdev.pl Fixes: 3c9f3681d0b4 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> Reviewed-by: Andy Shevchenko <andy(a)kernel.org> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Kees Cook <kees(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- lib/string_helpers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/lib/string_helpers.c~lib-string_helpers-fix-potential-snprintf-output-truncation +++ a/lib/string_helpers.c @@ -57,7 +57,7 @@ int string_get_size(u64 size, u64 blk_si static const unsigned int rounding[] = { 500, 50, 5 }; int i = 0, j; u32 remainder = 0, sf_cap; - char tmp[8]; + char tmp[12]; const char *unit; tmp[0] = '\0'; _ Patches currently in -mm which might be from bartosz.golaszewski(a)linaro.org are

1 year, 1 month

1
0
0 0

[PATCH stable 5.10.y] x86/bugs: Use code segment selector for VERW operand

by Xiongfeng Wang

From: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com> commit e4d2102018542e3ae5e297bc6e229303abff8a0f upstream. Robert Gill reported below #GP in 32-bit mode when dosemu software was executing vm86() system call: general protection fault: 0000 [#1] PREEMPT SMP CPU: 4 PID: 4610 Comm: dosemu.bin Not tainted 6.6.21-gentoo-x86 #1 Hardware name: Dell Inc. PowerEdge 1950/0H723K, BIOS 2.7.0 10/30/2010 EIP: restore_all_switch_stack+0xbe/0xcf EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000 ESI: 00000000 EDI: 00000000 EBP: 00000000 ESP: ff8affdc DS: 0000 ES: 0000 FS: 0000 GS: 0033 SS: 0068 EFLAGS: 00010046 CR0: 80050033 CR2: 00c2101c CR3: 04b6d000 CR4: 000406d0 Call Trace: show_regs+0x70/0x78 die_addr+0x29/0x70 exc_general_protection+0x13c/0x348 exc_bounds+0x98/0x98 handle_exception+0x14d/0x14d exc_bounds+0x98/0x98 restore_all_switch_stack+0xbe/0xcf exc_bounds+0x98/0x98 restore_all_switch_stack+0xbe/0xcf This only happens in 32-bit mode when VERW based mitigations like MDS/RFDS are enabled. This is because segment registers with an arbitrary user value can result in #GP when executing VERW. Intel SDM vol. 2C documents the following behavior for VERW instruction: #GP(0) - If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. CLEAR_CPU_BUFFERS macro executes VERW instruction before returning to user space. Use %cs selector to reference VERW operand. This ensures VERW will not #GP for an arbitrary user %ds. [ mingo: Fixed the SOB chain. ] Fixes: a0e2dab44d22 ("x86/entry_32: Add VERW just before userspace transition") Reported-by: Robert Gill <rtgill82(a)gmail.com> Reviewed-by: Andrew Cooper <andrew.cooper3(a)citrix.com> Cc: stable(a)vger.kernel.org # 5.10+ Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218707 Closes: https://lore.kernel.org/all/8c77ccfd-d561-45a1-8ed5-6b75212c7a58@leemhuis.i… Suggested-by: Dave Hansen <dave.hansen(a)linux.intel.com> Suggested-by: Brian Gerst <brgerst(a)gmail.com> Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com> Signed-off-by: Ingo Molnar <mingo(a)kernel.org> [xiongfeng: fix conflicts caused by the runtime patch jmp] Signed-off-by: Xiongfeng Wang <wangxiongfeng2(a)huawei.com> --- arch/x86/include/asm/nospec-branch.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 87e1ff064025..7978d5fe1ce6 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -199,7 +199,16 @@ */ .macro CLEAR_CPU_BUFFERS ALTERNATIVE "jmp .Lskip_verw_\@", "", X86_FEATURE_CLEAR_CPU_BUF - verw _ASM_RIP(mds_verw_sel) +#ifdef CONFIG_X86_64 + verw mds_verw_sel(%rip) +#else + /* + * In 32bit mode, the memory operand must be a %cs reference. The data + * segments may not be usable (vm86 mode), and the stack segment may not + * be flat (ESPFIX32). + */ + verw %cs:mds_verw_sel +#endif .Lskip_verw_\@: .endm -- 2.20.1

1 year, 1 month

2
2
0 0

[PATCH v2] ocfs2: remove entry once instead of null-ptr-dereference in ocfs2_xa_remove()

by Andrew Kanner

Syzkaller is able to provoke null-ptr-dereference in ocfs2_xa_remove(): [ 57.319872] (a.out,1161,7):ocfs2_xa_remove:2028 ERROR: status = -12 [ 57.320420] (a.out,1161,7):ocfs2_xa_cleanup_value_truncate:1999 ERROR: Partial truncate while removing xattr overlay.upper. Leaking 1 clusters and removing the entry [ 57.321727] BUG: kernel NULL pointer dereference, address: 0000000000000004 [...] [ 57.325727] RIP: 0010:ocfs2_xa_block_wipe_namevalue+0x2a/0xc0 [...] [ 57.331328] Call Trace: [ 57.331477] <TASK> [...] [ 57.333511] ? do_user_addr_fault+0x3e5/0x740 [ 57.333778] ? exc_page_fault+0x70/0x170 [ 57.334016] ? asm_exc_page_fault+0x2b/0x30 [ 57.334263] ? __pfx_ocfs2_xa_block_wipe_namevalue+0x10/0x10 [ 57.334596] ? ocfs2_xa_block_wipe_namevalue+0x2a/0xc0 [ 57.334913] ocfs2_xa_remove_entry+0x23/0xc0 [ 57.335164] ocfs2_xa_set+0x704/0xcf0 [ 57.335381] ? _raw_spin_unlock+0x1a/0x40 [ 57.335620] ? ocfs2_inode_cache_unlock+0x16/0x20 [ 57.335915] ? trace_preempt_on+0x1e/0x70 [ 57.336153] ? start_this_handle+0x16c/0x500 [ 57.336410] ? preempt_count_sub+0x50/0x80 [ 57.336656] ? _raw_read_unlock+0x20/0x40 [ 57.336906] ? start_this_handle+0x16c/0x500 [ 57.337162] ocfs2_xattr_block_set+0xa6/0x1e0 [ 57.337424] __ocfs2_xattr_set_handle+0x1fd/0x5d0 [ 57.337706] ? ocfs2_start_trans+0x13d/0x290 [ 57.337971] ocfs2_xattr_set+0xb13/0xfb0 [ 57.338207] ? dput+0x46/0x1c0 [ 57.338393] ocfs2_xattr_trusted_set+0x28/0x30 [ 57.338665] ? ocfs2_xattr_trusted_set+0x28/0x30 [ 57.338948] __vfs_removexattr+0x92/0xc0 [ 57.339182] __vfs_removexattr_locked+0xd5/0x190 [ 57.339456] ? preempt_count_sub+0x50/0x80 [ 57.339705] vfs_removexattr+0x5f/0x100 [...] Reproducer uses faultinject facility to fail ocfs2_xa_remove() -> ocfs2_xa_value_truncate() with -ENOMEM. In this case the comment mentions that we can return 0 if ocfs2_xa_cleanup_value_truncate() is going to wipe the entry anyway. But the following 'rc' check is wrong and execution flow do 'ocfs2_xa_remove_entry(loc);' twice: * 1st: in ocfs2_xa_cleanup_value_truncate(); * 2nd: returning back to ocfs2_xa_remove() instead of going to 'out'. Fix this by skipping the 2nd removal of the same entry and making syzkaller repro happy. Cc: stable(a)vger.kernel.org Fixes: 399ff3a748cf ("ocfs2: Handle errors while setting external xattr values.") Reported-by: syzbot+386ce9e60fa1b18aac5b(a)syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/671e13ab.050a0220.2b8c0f.01d0.GAE@google.com/T/ Tested-by: syzbot+386ce9e60fa1b18aac5b(a)syzkaller.appspotmail.com Signed-off-by: Andrew Kanner <andrew.kanner(a)gmail.com> --- Notes (akanner): v2: remove rc check completely, suggested by Joseph Qi <joseph.qi(a)linux.alibaba.com> v1: https://lore.kernel.org/all/20241029224304.2169092-2-andrew.kanner@gmail.co… fs/ocfs2/xattr.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/fs/ocfs2/xattr.c b/fs/ocfs2/xattr.c index dd0a05365e79..73a6f6fd8a8e 100644 --- a/fs/ocfs2/xattr.c +++ b/fs/ocfs2/xattr.c @@ -2036,8 +2036,7 @@ static int ocfs2_xa_remove(struct ocfs2_xa_loc *loc, rc = 0; ocfs2_xa_cleanup_value_truncate(loc, "removing", orig_clusters); - if (rc) - goto out; + goto out; } } -- 2.43.5

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() has been removed from the -mm tree. Its filename was mm-multi-gen-lru-use-pteppmdp_clear_young_notify.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Yu Zhao <yuzhao(a)google.com> Subject: mm: multi-gen LRU: use {ptep,pmdp}_clear_young_notify() Date: Sat, 19 Oct 2024 01:29:39 +0000 When the MM_WALK capability is enabled, memory that is mostly accessed by a VM appears younger than it really is, therefore this memory will be less likely to be evicted. Therefore, the presence of a running VM can significantly increase swap-outs for non-VM memory, regressing the performance for the rest of the system. Fix this regression by always calling {ptep,pmdp}_clear_young_notify() whenever we clear the young bits on PMDs/PTEs. [jthoughton(a)google.com: fix link-time error] Link: https://lkml.kernel.org/r/20241019012940.3656292-3-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Reported-by: David Stevens <stevensd(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Cc: kernel test robot <lkp(a)intel.com> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 5 +- mm/rmap.c | 9 +-- mm/vmscan.c | 88 +++++++++++++++++++++------------------ 3 files changed, 55 insertions(+), 47 deletions(-) --- a/include/linux/mmzone.h~mm-multi-gen-lru-use-pteppmdp_clear_young_notify +++ a/include/linux/mmzone.h @@ -555,7 +555,7 @@ struct lru_gen_memcg { void lru_gen_init_pgdat(struct pglist_data *pgdat); void lru_gen_init_lruvec(struct lruvec *lruvec); -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw); +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw); void lru_gen_init_memcg(struct mem_cgroup *memcg); void lru_gen_exit_memcg(struct mem_cgroup *memcg); @@ -574,8 +574,9 @@ static inline void lru_gen_init_lruvec(s { } -static inline void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +static inline bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { + return false; } static inline void lru_gen_init_memcg(struct mem_cgroup *memcg) --- a/mm/rmap.c~mm-multi-gen-lru-use-pteppmdp_clear_young_notify +++ a/mm/rmap.c @@ -885,13 +885,10 @@ static bool folio_referenced_one(struct return false; } - if (pvmw.pte) { - if (lru_gen_enabled() && - pte_young(ptep_get(pvmw.pte))) { - lru_gen_look_around(&pvmw); + if (lru_gen_enabled() && pvmw.pte) { + if (lru_gen_look_around(&pvmw)) referenced++; - } - + } else if (pvmw.pte) { if (ptep_clear_flush_young_notify(vma, address, pvmw.pte)) referenced++; --- a/mm/vmscan.c~mm-multi-gen-lru-use-pteppmdp_clear_young_notify +++ a/mm/vmscan.c @@ -56,6 +56,7 @@ #include <linux/khugepaged.h> #include <linux/rculist_nulls.h> #include <linux/random.h> +#include <linux/mmu_notifier.h> #include <asm/tlbflush.h> #include <asm/div64.h> @@ -3294,7 +3295,8 @@ static bool get_next_vma(unsigned long m return false; } -static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pte_pfn(pte_t pte, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pte_pfn(pte); @@ -3306,13 +3308,20 @@ static unsigned long get_pte_pfn(pte_t p if (WARN_ON_ONCE(pte_devmap(pte) || pte_special(pte))) return -1; + if (!pte_young(pte) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } -static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr) +static unsigned long get_pmd_pfn(pmd_t pmd, struct vm_area_struct *vma, unsigned long addr, + struct pglist_data *pgdat) { unsigned long pfn = pmd_pfn(pmd); @@ -3324,9 +3333,15 @@ static unsigned long get_pmd_pfn(pmd_t p if (WARN_ON_ONCE(pmd_devmap(pmd))) return -1; + if (!pmd_young(pmd) && !mm_has_notifiers(vma->vm_mm)) + return -1; + if (WARN_ON_ONCE(!pfn_valid(pfn))) return -1; + if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) + return -1; + return pfn; } @@ -3335,10 +3350,6 @@ static struct folio *get_pfn_folio(unsig { struct folio *folio; - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - return NULL; - folio = pfn_folio(pfn); if (folio_nid(folio) != pgdat->node_id) return NULL; @@ -3394,20 +3405,16 @@ restart: total++; walk->mm_stats[MM_LEAF_TOTAL]++; - pfn = get_pte_pfn(ptent, args->vma, addr); + pfn = get_pte_pfn(ptent, args->vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) { - continue; - } - folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(args->vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(args->vma, addr, pte + i)) + continue; young++; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3473,21 +3480,25 @@ static void walk_pmd_range_locked(pud_t /* don't round down the first address */ addr = i ? (*first & PMD_MASK) + i * PMD_SIZE : *first; - pfn = get_pmd_pfn(pmd[i], vma, addr); - if (pfn == -1) + if (!pmd_present(pmd[i])) goto next; if (!pmd_trans_huge(pmd[i])) { - if (!walk->force_scan && should_clear_pmd_young()) + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) pmdp_test_and_clear_young(vma, addr, pmd + i); goto next; } + pfn = get_pmd_pfn(pmd[i], vma, addr, pgdat); + if (pfn == -1) + goto next; + folio = get_pfn_folio(pfn, memcg, pgdat, walk->can_swap); if (!folio) goto next; - if (!pmdp_test_and_clear_young(vma, addr, pmd + i)) + if (!pmdp_clear_young_notify(vma, addr, pmd + i)) goto next; walk->mm_stats[MM_LEAF_YOUNG]++; @@ -3545,24 +3556,18 @@ restart: } if (pmd_trans_huge(val)) { - unsigned long pfn = pmd_pfn(val); struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec); + unsigned long pfn = get_pmd_pfn(val, vma, addr, pgdat); walk->mm_stats[MM_LEAF_TOTAL]++; - if (!pmd_young(val)) { - continue; - } - - /* try to avoid unnecessary memory loads */ - if (pfn < pgdat->node_start_pfn || pfn >= pgdat_end_pfn(pgdat)) - continue; - - walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); + if (pfn != -1) + walk_pmd_range_locked(pud, addr, vma, args, bitmap, &first); continue; } - if (!walk->force_scan && should_clear_pmd_young()) { + if (!walk->force_scan && should_clear_pmd_young() && + !mm_has_notifiers(args->mm)) { if (!pmd_young(val)) continue; @@ -4036,13 +4041,13 @@ static void lru_gen_age_node(struct pgli * the PTE table to the Bloom filter. This forms a feedback loop between the * eviction and the aging. */ -void lru_gen_look_around(struct page_vma_mapped_walk *pvmw) +bool lru_gen_look_around(struct page_vma_mapped_walk *pvmw) { int i; unsigned long start; unsigned long end; struct lru_gen_mm_walk *walk; - int young = 0; + int young = 1; pte_t *pte = pvmw->pte; unsigned long addr = pvmw->address; struct vm_area_struct *vma = pvmw->vma; @@ -4058,12 +4063,15 @@ void lru_gen_look_around(struct page_vma lockdep_assert_held(pvmw->ptl); VM_WARN_ON_ONCE_FOLIO(folio_test_lru(folio), folio); + if (!ptep_clear_young_notify(vma, addr, pte)) + return false; + if (spin_is_contended(pvmw->ptl)) - return; + return true; /* exclude special VMAs containing anon pages from COW */ if (vma->vm_flags & VM_SPECIAL) - return; + return true; /* avoid taking the LRU lock under the PTL when possible */ walk = current->reclaim_state ? current->reclaim_state->mm_walk : NULL; @@ -4071,6 +4079,9 @@ void lru_gen_look_around(struct page_vma start = max(addr & PMD_MASK, vma->vm_start); end = min(addr | ~PMD_MASK, vma->vm_end - 1) + 1; + if (end - start == PAGE_SIZE) + return true; + if (end - start > MIN_LRU_BATCH * PAGE_SIZE) { if (addr - start < MIN_LRU_BATCH * PAGE_SIZE / 2) end = start + MIN_LRU_BATCH * PAGE_SIZE; @@ -4084,7 +4095,7 @@ void lru_gen_look_around(struct page_vma /* folio_update_gen() requires stable folio_memcg() */ if (!mem_cgroup_trylock_pages(memcg)) - return; + return true; arch_enter_lazy_mmu_mode(); @@ -4094,19 +4105,16 @@ void lru_gen_look_around(struct page_vma unsigned long pfn; pte_t ptent = ptep_get(pte + i); - pfn = get_pte_pfn(ptent, vma, addr); + pfn = get_pte_pfn(ptent, vma, addr, pgdat); if (pfn == -1) continue; - if (!pte_young(ptent)) - continue; - folio = get_pfn_folio(pfn, memcg, pgdat, can_swap); if (!folio) continue; - if (!ptep_test_and_clear_young(vma, addr, pte + i)) - VM_WARN_ON_ONCE(true); + if (!ptep_clear_young_notify(vma, addr, pte + i)) + continue; young++; @@ -4136,6 +4144,8 @@ void lru_gen_look_around(struct page_vma /* feedback from rmap walkers to page table walkers */ if (mm_state && suitable_to_scan(i, young)) update_bloom_filter(mm_state, max_seq, pvmw->pmd); + + return true; } /****************************************************************************** _ Patches currently in -mm which might be from yuzhao(a)google.com are mm-page_alloc-keep-track-of-free-highatomic.patch

1 year, 1 month

1
0
0 0

[merged mm-hotfixes-stable] mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch removed from -mm tree

by Andrew Morton

The quilt patch titled Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats has been removed from the -mm tree. Its filename was mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats.patch This patch was dropped because it was merged into the mm-hotfixes-stable branch of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm ------------------------------------------------------ From: Yu Zhao <yuzhao(a)google.com> Subject: mm: multi-gen LRU: remove MM_LEAF_OLD and MM_NONLEAF_TOTAL stats Date: Sat, 19 Oct 2024 01:29:38 +0000 Patch series "mm: multi-gen LRU: Have secondary MMUs participate in MM_WALK". Today, the MM_WALK capability causes MGLRU to clear the young bit from PMDs and PTEs during the page table walk before eviction, but MGLRU does not call the clear_young() MMU notifier in this case. By not calling this notifier, the MM walk takes less time/CPU, but it causes pages that are accessed mostly through KVM / secondary MMUs to appear younger than they should be. We do call the clear_young() notifier today, but only when attempting to evict the page, so we end up clearing young/accessed information less frequently for secondary MMUs than for mm PTEs, and therefore they appear younger and are less likely to be evicted. Therefore, memory that is *not* being accessed mostly by KVM will be evicted *more* frequently, worsening performance. ChromeOS observed a tab-open latency regression when enabling MGLRU with a setup that involved running a VM: Tab-open latency histogram (ms) Version p50 mean p95 p99 max base 1315 1198 2347 3454 10319 mglru 2559 1311 7399 12060 43758 fix 1119 926 2470 4211 6947 This series replaces the final non-selftest patchs from this series[1], which introduced a similar change (and a new MMU notifier) with KVM optimizations. I'll send a separate series (to Sean and Paolo) for the KVM optimizations. This series also makes proactive reclaim with MGLRU possible for KVM memory. I have verified that this functions correctly with the selftest from [1], but given that that test is a KVM selftest, I'll send it with the rest of the KVM optimizations later. Andrew, let me know if you'd like to take the test now anyway. [1]: https://lore.kernel.org/linux-mm/20240926013506.860253-18-jthoughton@google… This patch (of 2): The removed stats, MM_LEAF_OLD and MM_NONLEAF_TOTAL, are not very helpful and become more complicated to properly compute when adding test/clear_young() notifiers in MGLRU's mm walk. Link: https://lkml.kernel.org/r/20241019012940.3656292-1-jthoughton@google.com Link: https://lkml.kernel.org/r/20241019012940.3656292-2-jthoughton@google.com Fixes: bd74fdaea146 ("mm: multi-gen LRU: support page table walks") Signed-off-by: Yu Zhao <yuzhao(a)google.com> Signed-off-by: James Houghton <jthoughton(a)google.com> Cc: Axel Rasmussen <axelrasmussen(a)google.com> Cc: David Matlack <dmatlack(a)google.com> Cc: David Rientjes <rientjes(a)google.com> Cc: David Stevens <stevensd(a)google.com> Cc: Oliver Upton <oliver.upton(a)linux.dev> Cc: Paolo Bonzini <pbonzini(a)redhat.com> Cc: Sean Christopherson <seanjc(a)google.com> Cc: Wei Xu <weixugc(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/mmzone.h | 2 -- mm/vmscan.c | 14 +++++--------- 2 files changed, 5 insertions(+), 11 deletions(-) --- a/include/linux/mmzone.h~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats +++ a/include/linux/mmzone.h @@ -458,9 +458,7 @@ struct lru_gen_folio { enum { MM_LEAF_TOTAL, /* total leaf entries */ - MM_LEAF_OLD, /* old leaf entries */ MM_LEAF_YOUNG, /* young leaf entries */ - MM_NONLEAF_TOTAL, /* total non-leaf entries */ MM_NONLEAF_FOUND, /* non-leaf entries found in Bloom filters */ MM_NONLEAF_ADDED, /* non-leaf entries added to Bloom filters */ NR_MM_STATS --- a/mm/vmscan.c~mm-multi-gen-lru-remove-mm_leaf_old-and-mm_nonleaf_total-stats +++ a/mm/vmscan.c @@ -3399,7 +3399,6 @@ restart: continue; if (!pte_young(ptent)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3552,7 +3551,6 @@ restart: walk->mm_stats[MM_LEAF_TOTAL]++; if (!pmd_young(val)) { - walk->mm_stats[MM_LEAF_OLD]++; continue; } @@ -3564,8 +3562,6 @@ restart: continue; } - walk->mm_stats[MM_NONLEAF_TOTAL]++; - if (!walk->force_scan && should_clear_pmd_young()) { if (!pmd_young(val)) continue; @@ -5254,11 +5250,11 @@ static void lru_gen_seq_show_full(struct for (tier = 0; tier < MAX_NR_TIERS; tier++) { seq_printf(m, " %10d", tier); for (type = 0; type < ANON_AND_FILE; type++) { - const char *s = " "; + const char *s = "xxx"; unsigned long n[3] = {}; if (seq == max_seq) { - s = "RT "; + s = "RTx"; n[0] = READ_ONCE(lrugen->avg_refaulted[type][tier]); n[1] = READ_ONCE(lrugen->avg_total[type][tier]); } else if (seq == min_seq[type] || NR_HIST_GENS > 1) { @@ -5280,14 +5276,14 @@ static void lru_gen_seq_show_full(struct seq_puts(m, " "); for (i = 0; i < NR_MM_STATS; i++) { - const char *s = " "; + const char *s = "xxxx"; unsigned long n = 0; if (seq == max_seq && NR_HIST_GENS == 1) { - s = "LOYNFA"; + s = "TYFA"; n = READ_ONCE(mm_state->stats[hist][i]); } else if (seq != max_seq && NR_HIST_GENS > 1) { - s = "loynfa"; + s = "tyfa"; n = READ_ONCE(mm_state->stats[hist][i]); } _ Patches currently in -mm which might be from yuzhao(a)google.com are mm-page_alloc-keep-track-of-free-highatomic.patch

1 year, 1 month

1
0
0 0

[PATCH 6.1 000/137] 6.1.115-rc1 review

by Greg Kroah-Hartman

This is the start of the stable review cycle for the 6.1.115 release. There are 137 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Wed, 30 Oct 2024 06:22:39 +0000. Anything received after that time might be too late. The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.115-rc… or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y and the diffstat can be found below. thanks, greg k-h ------------- Pseudo-Shortlog of commits: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> Linux 6.1.115-rc1 junhua huang <huang.junhua(a)zte.com.cn> arm64/uprobes: change the uprobe_opcode_t typedef to fix the sparse warning Dan Carpenter <dan.carpenter(a)linaro.org> ACPI: PRM: Clean up guid type in struct prm_handler_info Armin Wolf <W_Armin(a)gmx.de> platform/x86: dell-wmi: Ignore suspend notifications Zichen Xie <zichenxie0106(a)gmail.com> ASoC: qcom: Fix NULL Dereference in asoc_qcom_lpass_cpu_platform_probe() Xinyu Zhang <xizhang(a)purestorage.com> block: fix sanity checks in blk_rq_map_user_bvec Michel Alex <Alex.Michel(a)wiedemann-group.com> net: phy: dp83822: Fix reset pin definitions Jiri Slaby (SUSE) <jirislaby(a)kernel.org> serial: protect uart_port_dtr_rts() in uart_shutdown() too Paul Moore <paul(a)paul-moore.com> selinux: improve error checking in sel_write_load() Mario Limonciello <mario.limonciello(a)amd.com> drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too Haiyang Zhang <haiyangz(a)microsoft.com> hv_netvsc: Fix VF namespace also in synthetic NIC NETDEV_REGISTER event Petr Vaganov <p.vaganov(a)ideco.ru> xfrm: fix one more kernel-infoleak in algo dumping Huacai Chen <chenhuacai(a)kernel.org> LoongArch: Get correct cores_per_package for SMT systems José Relvas <josemonsantorelvas(a)gmail.com> ALSA: hda/realtek: Add subwoofer quirk for Acer Predator G9-593 Marc Zyngier <maz(a)kernel.org> KVM: arm64: Don't eagerly teardown the vgic on init error Sean Christopherson <seanjc(a)google.com> KVM: nSVM: Ignore nCR3[4:0] when loading PDPTEs from memory Aleksa Sarai <cyphar(a)cyphar.com> openat2: explicitly return -E2BIG for (usize > PAGE_SIZE) Ryusuke Konishi <konishi.ryusuke(a)gmail.com> nilfs2: fix kernel bug due to missing clearing of buffer delay flag Shubham Panwar <shubiisp8(a)gmail.com> ACPI: button: Add DMI quirk for Samsung Galaxy Book2 to fix initial lid detection issue Koba Ko <kobak(a)nvidia.com> ACPI: PRM: Find EFI_MEMORY_RUNTIME block for PRM handler and context Christian Heusel <christian(a)heusel.eu> ACPI: resource: Add LG 16T90SP to irq1_level_low_skip_override[] Mario Limonciello <mario.limonciello(a)amd.com> drm/amd: Guard against bad data for ATIF ACPI method Naohiro Aota <naohiro.aota(a)wdc.com> btrfs: zoned: fix zone unusable accounting for freed reserved extent Yue Haibing <yuehaibing(a)huawei.com> btrfs: fix passing 0 to ERR_PTR in btrfs_search_dir_index_item() liwei <liwei728(a)huawei.com> cpufreq: CPPC: fix perf_to_khz/khz_to_perf conversion exception Vincent Guittot <vincent.guittot(a)linaro.org> cpufreq/cppc: Move and rename cppc_cpufreq_{perf_to_khz|khz_to_perf}() Kailang Yang <kailang(a)realtek.com> ALSA: hda/realtek: Update default depop procedure Yuan Can <yuancan(a)huawei.com> powercap: dtpm_devfreq: Fix error check against dev_pm_qos_add_request() Andrey Shumilin <shum.sdl(a)nppct.ru> ALSA: firewire-lib: Avoid division by zero in apply_constraint_to_size() Miquel Raynal <miquel.raynal(a)bootlin.com> ASoC: dt-bindings: davinci-mcasp: Fix interrupt properties Miquel Raynal <miquel.raynal(a)bootlin.com> ASoC: dt-bindings: davinci-mcasp: Fix interrupts property Jiri Olsa <jolsa(a)kernel.org> bpf,perf: Fix perf_event_detach_bpf_prog error handling Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Bluetooth: ISO: Fix UAF on iso_sock_timeout Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com> Bluetooth: SCO: Fix UAF on sco_sock_timeout Jinjie Ruan <ruanjinjie(a)huawei.com> posix-clock: posix-clock: Fix unbalanced locking in pc_clock_settime() Heiner Kallweit <hkallweit1(a)gmail.com> r8169: avoid unsolicited interrupts Dmitry Antipov <dmantipov(a)yandex.ru> net: sched: fix use-after-free in taprio_change() Vladimir Oltean <vladimir.oltean(a)nxp.com> net/sched: act_api: deny mismatched skip_sw/skip_hw flags for actions created by classifiers Oliver Neukum <oneukum(a)suse.com> net: usb: usbnet: fix name regression Eric Dumazet <edumazet(a)google.com> net: fix races in netdev_tx_sent_queue()/dev_watchdog() Praveen Kumar Kannoju <praveen.kannoju(a)oracle.com> net/sched: adjust device watchdog timer to detect stopped queue at right time Jakub Kicinski <kuba(a)kernel.org> net: provide macros for commonly copied lockless queue stop/wake code Jakub Kicinski <kuba(a)kernel.org> docs: net: reformat driver.rst from a list to sections Lin Ma <linma(a)zju.edu.cn> net: wwan: fix global oob in wwan_rtnl_policy Pablo Neira Ayuso <pablo(a)netfilter.org> netfilter: xtables: fix typo causing some targets not to load on IPv6 Peter Rashleigh <peter(a)rashleigh.ca> net: dsa: mv88e6xxx: Fix error when setting port policy on mv88e6393x Aleksandr Mishin <amishin(a)t-argos.ru> octeon_ep: Add SKB allocation failures handling in __octep_oq_process_rx() Aleksandr Mishin <amishin(a)t-argos.ru> octeon_ep: Implement helper for iterating packets in Rx queue Jakub Boehm <boehm.jakub(a)gmail.com> net: plip: fix break; causing plip to never transmit Wang Hai <wanghai38(a)huawei.com> be2net: fix potential memory leak in be_xmit() Wang Hai <wanghai38(a)huawei.com> net/sun3_82586: fix potential memory leak in sun3_82586_send_packet() Eyal Birger <eyal.birger(a)gmail.com> xfrm: respect ip protocols rules criteria when performing dst lookups Eyal Birger <eyal.birger(a)gmail.com> xfrm: extract dst lookup parameters into a struct Leo Yan <leo.yan(a)arm.com> tracing: Consider the NULL character when validating the event length Dave Kleikamp <dave.kleikamp(a)oracle.com> jfs: Fix sanity check in dbMount Thomas Weißschuh <thomas.weissschuh(a)linutronix.de> LoongArch: Don't crash in stack_top() for tasks without vDSO Tiezhu Yang <yangtiezhu(a)loongson.cn> LoongArch: Add support to clone a time namespace Crag Wang <crag_wang(a)dell.com> platform/x86: dell-sysman: add support for alienware products Alexey Klimov <alexey.klimov(a)linaro.org> ASoC: qcom: sm8250: add qrb4210-rb2-sndcard compatible string Gianfranco Trad <gianf.trad(a)gmail.com> udf: fix uninit-value use in udf_get_fileshortad Zhao Mengmeng <zhaomengmeng(a)kylinos.cn> udf: refactor udf_current_aext() to handle error Mark Rutland <mark.rutland(a)arm.com> arm64: Force position-independent veneers Shengjiu Wang <shengjiu.wang(a)nxp.com> ASoC: fsl_sai: Enable 'FIFO continue on error' FCONT bit Alexey Klimov <alexey.klimov(a)linaro.org> ASoC: codecs: lpass-rx-macro: add missing CDC_RX_BCL_VBAT_RF_PROC2 to default regs values Hans de Goede <hdegoede(a)redhat.com> drm/vboxvideo: Replace fake VLA at end of vbva_mouse_pointer_shape with real VLA Konstantin Komarov <almaz.alexandrovich(a)paragon-software.com> fs/ntfs3: Add more attributes checks in mi_enum_attr() Mateusz Guzik <mjguzik(a)gmail.com> exec: don't WARN for racy path_noexec check Yu Kuai <yukuai3(a)huawei.com> block, bfq: fix procress reference leakage for bfqq in merge chain Marek Vasut <marex(a)denx.de> serial: imx: Update mctrl old_status on RTSD interrupt Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com> serial: Make uart_handle_cts_change() status param bool active Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com> tty/serial: Make ->dcd_change()+uart_handle_dcd_change() status bool active Roger Quadros <rogerq(a)kernel.org> usb: dwc3: core: Fix system suspend on TI AM62 platforms Frank Li <Frank.Li(a)nxp.com> XHCI: Separate PORT and CAPs macros into dedicated file Elson Roy Serrao <quic_eserrao(a)quicinc.com> usb: gadget: Add function wakeup support Kevin Groeneveld <kgroeneveld(a)lenbrook.com> usb: gadget: f_uac2: fix return value for UAC2_ATTRIBUTE_STRING store John Keeping <jkeeping(a)inmusicbrands.com> usb: gadget: f_uac2: fix non-newline-terminated function name Lee Jones <lee(a)kernel.org> usb: gadget: f_uac2: Replace snprintf() with the safer scnprintf() variant Mathias Nyman <mathias.nyman(a)linux.intel.com> xhci: dbc: honor usb transfer size boundaries. Jiri Slaby (SUSE) <jirislaby(a)kernel.org> xhci: dbgtty: use kfifo from tty_port struct Jiri Slaby (SUSE) <jirislaby(a)kernel.org> xhci: dbgtty: remove kfifo_out() wrapper Mark Rutland <mark.rutland(a)arm.com> arm64: probes: Fix uprobes for big-endian kernels junhua huang <huang.junhua(a)zte.com.cn> arm64:uprobe fix the uprobe SWBP_INSN in big-endian Jordan Rome <linux(a)jordanrome.com> bpf: Fix iter/task tid filtering Andrea Parri <parri.andrea(a)gmail.com> riscv, bpf: Make BPF_CMPXCHG fully ordered Cosmin Ratiu <cratiu(a)nvidia.com> net/mlx5: Unregister notifier on eswitch init failure Shay Drory <shayd(a)nvidia.com> net/mlx5: Fix command bitmask initialization Shay Drory <shayd(a)nvidia.com> net/mlx5: split mlx5_cmd_init() to probe and reload routines Shay Drory <shayd(a)nvidia.com> net/mlx5: Remove redundant cmdif revision check Ye Bin <yebin10(a)huawei.com> Bluetooth: bnep: fix wild-memory-access in proto_unregister Heiko Carstens <hca(a)linux.ibm.com> s390: Initialize psw mask in perf_arch_fetch_caller_regs() Thadeu Lima de Souza Cascardo <cascardo(a)igalia.com> usb: typec: altmode should keep reference to parent Paulo Alcantara <pc(a)manguebit.com> smb: client: fix OOBs when building SMB2_IOCTL request Wang Hai <wanghai38(a)huawei.com> scsi: target: core: Fix null-ptr-deref in target_alloc_device() Niklas Söderlund <niklas.soderlund+renesas(a)ragnatech.se> net: ravb: Only advertise Rx/Tx timestamps if hardware supports it Gal Pressman <gal(a)nvidia.com> ravb: Remove setting of RX software timestamp Eric Dumazet <edumazet(a)google.com> genetlink: hold RCU in genlmsg_mcast() Kuniyuki Iwashima <kuniyu(a)amazon.com> tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink(). Jessica Zhang <quic_jesszhan(a)quicinc.com> drm/msm/dpu: don't always program merge_3d block Marijn Suijten <marijn.suijten(a)somainline.org> drm/msm/dpu: Wire up DSC mask for active CTL configuration Fabrizio Castro <fabrizio.castro.jz(a)renesas.com> irqchip/renesas-rzg2l: Fix missing put_device Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> irqchip/renesas-rzg2l: Add support for suspend to RAM Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> irqchip/renesas-rzg2l: Document structure members Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com> irqchip/renesas-rzg2l: Align struct member names to tabs Wang Hai <wanghai38(a)huawei.com> net: systemport: fix potential memory leak in bcm_sysport_xmit() Wang Hai <wanghai38(a)huawei.com> net: xilinx: axienet: fix potential memory leak in axienet_start_xmit() Li RongQing <lirongqing(a)baidu.com> net/smc: Fix searching in list of known pnetids in smc_pnet_add_pnetid Wang Hai <wanghai38(a)huawei.com> net: ethernet: aeroflex: fix potential memory leak in greth_start_xmit_gbit() Eric Dumazet <edumazet(a)google.com> netdevsim: use cond_resched() in nsim_dev_trap_report_work() Sabrina Dubroca <sd(a)queasysnail.net> macsec: don't increment counters for an unrelated SA Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com> drm/amd/amdgpu: Fix double unlock in amdgpu_mes_add_ring Colin Ian King <colin.i.king(a)gmail.com> octeontx2-af: Fix potential integer overflows on integer shifts Oliver Neukum <oneukum(a)suse.com> net: usb: usbnet: fix race in probe failure Douglas Anderson <dianders(a)chromium.org> drm/msm: Allocate memory for disp snapshot with kvzalloc() Douglas Anderson <dianders(a)chromium.org> drm/msm: Avoid NULL dereference in msm_disp_state_print_regs() Jonathan Marek <jonathan(a)marek.ca> drm/msm/dsi: fix 32-bit signed integer extension in pclk_rate calculation Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org> drm/msm/dpu: make sure phys resources are properly initialized Bhargava Chenna Marreddy <bhargava.marreddy(a)broadcom.com> RDMA/bnxt_re: Fix a bug while setting up Level-2 PBL pages Kalesh AP <kalesh-anakkur.purayil(a)broadcom.com> RDMA/bnxt_re: Return more meaningful error Xin Long <lucien.xin(a)gmail.com> ipv4: give an IPv4 dev to blackhole_netdev Bart Van Assche <bvanassche(a)acm.org> RDMA/srpt: Make slab cache names unique Alexander Zubkov <green(a)qrator.net> RDMA/irdma: Fix misspelling of "accept*" Anumula Murali Mohan Reddy <anumula(a)chelsio.com> RDMA/cxgb4: Fix RDMA_CM_EVENT_UNREACHABLE error for iWARP Murad Masimov <m.masimov(a)maxima.ru> ALSA: hda/cs8409: Fix possible NULL dereference Tony Ambardar <tony.ambardar(a)gmail.com> selftests/bpf: Fix cross-compiling urandom_read Ian Forbes <ian.forbes(a)broadcom.com> drm/vmwgfx: Handle possible ENOMEM in vmw_stdu_connector_atomic_check Javier Carrasco <javier.carrasco.cruz(a)gmail.com> iio: frequency: admv4420: fix missing select REMAP_SPI in Kconfig Javier Carrasco <javier.carrasco.cruz(a)gmail.com> iio: frequency: {admv4420,adrf6780}: format Kconfig entries Toke Høiland-Jørgensen <toke(a)redhat.com> bpf: fix kfunc btf caching for modules Niklas Schnelle <schnelle(a)linux.ibm.com> s390/pci: Handle PCI error codes other than 0x3a Florian Klink <flokli(a)flokli.de> ARM: dts: bcm2837-rpi-cm3-io3: Fix HDMI hpd-gpio pin Martin Kletzander <nert.pinx(a)gmail.com> x86/resctrl: Avoid overflow in MB settings in bw_validate() Kalesh AP <kalesh-anakkur.purayil(a)broadcom.com> RDMA/bnxt_re: Add a check for memory allocation Saravanan Vajravel <saravanan.vajravel(a)broadcom.com> RDMA/bnxt_re: Fix incorrect AVID type in WQE structure Jiri Olsa <jolsa(a)kernel.org> bpf: Fix memory leak in bpf_core_apply Florian Kauer <florian.kauer(a)linutronix.de> bpf: devmap: provide rxq after redirect Toke Høiland-Jørgensen <toke(a)redhat.com> bpf: Make sure internal and UAPI bpf_redirect flags don't overlap Mikhail Lobanov <m.lobanov(a)rosalinux.ru> iio: accel: bma400: Fix uninitialized variable field_value in tap event handling. Wander Lairson Costa <wander.lairson(a)gmail.com> bpf: Use raw_spinlock_t in ringbuf ------------- Diffstat: .../bindings/sound/davinci-mcasp-audio.yaml | 18 +- Documentation/networking/driver.rst | 97 +++++--- Makefile | 4 +- arch/arm/boot/dts/bcm2837-rpi-cm3-io3.dts | 2 +- arch/arm64/Makefile | 2 +- arch/arm64/include/asm/uprobes.h | 12 +- arch/arm64/kernel/probes/uprobes.c | 4 +- arch/arm64/kvm/arm.c | 3 + arch/arm64/kvm/vgic/vgic-init.c | 6 +- arch/loongarch/Kconfig | 1 + arch/loongarch/include/asm/bootinfo.h | 4 + arch/loongarch/include/asm/page.h | 1 + arch/loongarch/include/asm/vdso/gettimeofday.h | 9 +- arch/loongarch/include/asm/vdso/vdso.h | 32 ++- arch/loongarch/kernel/process.c | 14 +- arch/loongarch/kernel/setup.c | 3 +- arch/loongarch/kernel/vdso.c | 98 ++++++-- arch/loongarch/vdso/vgetcpu.c | 2 +- arch/riscv/net/bpf_jit_comp64.c | 4 +- arch/s390/include/asm/perf_event.h | 1 + arch/s390/pci/pci_event.c | 17 +- arch/x86/kernel/cpu/resctrl/ctrlmondata.c | 23 +- arch/x86/kvm/svm/nested.c | 6 +- block/bfq-iosched.c | 37 ++- block/blk-map.c | 4 +- drivers/acpi/button.c | 11 + drivers/acpi/cppc_acpi.c | 116 +++++++++ drivers/acpi/prmt.c | 29 ++- drivers/acpi/resource.c | 7 + drivers/cpufreq/cppc_cpufreq.c | 139 ++--------- drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c | 15 +- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 5 +- .../drm/amd/display/modules/power/power_helpers.c | 2 + drivers/gpu/drm/msm/disp/dpu1/dpu_encoder.c | 9 +- .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_cmd.c | 1 + .../gpu/drm/msm/disp/dpu1/dpu_encoder_phys_vid.c | 3 +- drivers/gpu/drm/msm/disp/msm_disp_snapshot_util.c | 19 +- drivers/gpu/drm/msm/dsi/dsi_host.c | 2 +- drivers/gpu/drm/vboxvideo/hgsmi_base.c | 10 +- drivers/gpu/drm/vboxvideo/vboxvideo.h | 4 +- drivers/gpu/drm/vmwgfx/vmwgfx_stdu.c | 4 + drivers/iio/accel/bma400_core.c | 3 +- drivers/iio/frequency/Kconfig | 31 +-- drivers/infiniband/hw/bnxt_re/qplib_fp.h | 2 +- drivers/infiniband/hw/bnxt_re/qplib_rcfw.c | 2 +- drivers/infiniband/hw/bnxt_re/qplib_res.c | 21 +- drivers/infiniband/hw/cxgb4/cm.c | 9 +- drivers/infiniband/hw/irdma/cm.c | 2 +- drivers/infiniband/ulp/srpt/ib_srpt.c | 80 ++++++- drivers/irqchip/irq-renesas-rzg2l.c | 94 ++++++-- drivers/net/dsa/mv88e6xxx/port.c | 1 + drivers/net/ethernet/aeroflex/greth.c | 3 +- drivers/net/ethernet/broadcom/bcmsysport.c | 1 + drivers/net/ethernet/emulex/benet/be_main.c | 10 +- drivers/net/ethernet/i825xx/sun3_82586.c | 1 + drivers/net/ethernet/marvell/octeon_ep/octep_rx.c | 82 +++++-- .../net/ethernet/marvell/octeontx2/af/rvu_nix.c | 4 +- drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 138 ++++++----- drivers/net/ethernet/mellanox/mlx5/core/eswitch.c | 5 +- drivers/net/ethernet/mellanox/mlx5/core/main.c | 15 +- .../net/ethernet/mellanox/mlx5/core/mlx5_core.h | 2 + drivers/net/ethernet/realtek/r8169_main.c | 4 +- drivers/net/ethernet/renesas/ravb_main.c | 25 +- drivers/net/ethernet/xilinx/xilinx_axienet_main.c | 2 + drivers/net/hyperv/netvsc_drv.c | 30 +++ drivers/net/macsec.c | 18 -- drivers/net/netdevsim/dev.c | 15 +- drivers/net/phy/dp83822.c | 4 +- drivers/net/plip/plip.c | 2 +- drivers/net/usb/usbnet.c | 4 +- drivers/net/wwan/wwan_core.c | 2 +- drivers/platform/x86/dell/dell-wmi-base.c | 9 + drivers/platform/x86/dell/dell-wmi-sysman/sysman.c | 1 + drivers/powercap/dtpm_devfreq.c | 2 +- drivers/pps/clients/pps-ldisc.c | 6 +- drivers/target/target_core_device.c | 2 +- drivers/target/target_core_user.c | 2 +- drivers/tty/serial/imx.c | 17 +- drivers/tty/serial/max3100.c | 2 +- drivers/tty/serial/max310x.c | 3 +- drivers/tty/serial/serial_core.c | 32 +-- drivers/tty/serial/sunhv.c | 8 +- drivers/usb/dwc3/core.c | 19 ++ drivers/usb/dwc3/core.h | 3 + drivers/usb/gadget/composite.c | 40 ++++ drivers/usb/gadget/function/f_uac2.c | 13 +- drivers/usb/host/xhci-caps.h | 85 +++++++ drivers/usb/host/xhci-dbgcap.h | 2 +- drivers/usb/host/xhci-dbgtty.c | 71 ++++-- drivers/usb/host/xhci-port.h | 176 ++++++++++++++ drivers/usb/host/xhci.h | 262 +-------------------- drivers/usb/typec/class.c | 3 + fs/btrfs/block-group.c | 2 + fs/btrfs/dir-item.c | 4 +- fs/btrfs/inode.c | 7 +- fs/exec.c | 21 +- fs/jfs/jfs_dmap.c | 2 +- fs/nilfs2/page.c | 6 +- fs/ntfs3/record.c | 67 +++++- fs/open.c | 2 + fs/smb/client/smb2pdu.c | 9 + fs/udf/inode.c | 49 ++-- fs/udf/truncate.c | 10 +- fs/udf/udfdecl.h | 5 +- include/acpi/cppc_acpi.h | 2 + include/linux/netdevice.h | 13 + include/linux/serial_core.h | 6 +- include/linux/tty_ldisc.h | 4 +- include/linux/usb/composite.h | 6 + include/linux/usb/gadget.h | 1 + include/net/bluetooth/bluetooth.h | 1 + include/net/genetlink.h | 3 +- include/net/netdev_queues.h | 144 +++++++++++ include/net/xfrm.h | 28 ++- include/uapi/linux/bpf.h | 13 +- kernel/bpf/btf.c | 1 + kernel/bpf/devmap.c | 11 +- kernel/bpf/ringbuf.c | 12 +- kernel/bpf/task_iter.c | 2 +- kernel/bpf/verifier.c | 8 +- kernel/time/posix-clock.c | 6 +- kernel/trace/bpf_trace.c | 2 - kernel/trace/trace_probe.c | 2 +- net/bluetooth/af_bluetooth.c | 22 ++ net/bluetooth/bnep/core.c | 3 +- net/bluetooth/iso.c | 18 +- net/bluetooth/sco.c | 18 +- net/core/filter.c | 8 +- net/ipv4/devinet.c | 35 ++- net/ipv4/inet_connection_sock.c | 21 +- net/ipv4/xfrm4_policy.c | 38 ++- net/ipv6/xfrm6_policy.c | 31 +-- net/l2tp/l2tp_netlink.c | 4 +- net/netfilter/xt_NFLOG.c | 2 +- net/netfilter/xt_TRACE.c | 1 + net/netfilter/xt_mark.c | 2 +- net/netlink/genetlink.c | 28 +-- net/sched/act_api.c | 23 +- net/sched/sch_generic.c | 17 +- net/sched/sch_taprio.c | 3 +- net/smc/smc_pnet.c | 2 +- net/wireless/nl80211.c | 8 +- net/xfrm/xfrm_device.c | 11 +- net/xfrm/xfrm_policy.c | 50 +++- net/xfrm/xfrm_user.c | 4 +- security/selinux/selinuxfs.c | 27 ++- sound/firewire/amdtp-stream.c | 3 + sound/pci/hda/patch_cs8409.c | 5 +- sound/pci/hda/patch_realtek.c | 48 ++-- sound/soc/codecs/lpass-rx-macro.c | 2 +- sound/soc/fsl/fsl_sai.c | 5 +- sound/soc/fsl/fsl_sai.h | 1 + sound/soc/qcom/lpass-cpu.c | 2 + sound/soc/qcom/sm8250.c | 1 + tools/testing/selftests/bpf/Makefile | 2 +- 155 files changed, 1978 insertions(+), 1066 deletions(-)

1 year, 1 month

14
153
0 0

Re: Patch "dt-bindings: gpu: Convert Samsung Image Rotator to dt-schema" has been added to the 5.4-stable tree

by Conor Dooley

On 01/11/2024 19:29, Sasha Levin wrote: > This is a note to let you know that I've just added the patch titled > > dt-bindings: gpu: Convert Samsung Image Rotator to dt-schema > > to the 5.4-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > dt-bindings-gpu-convert-samsung-image-rotator-to-dt-.patch > and it can be found in the queue-5.4 subdirectory. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. > > > > commit 25cb1f1f53fe137aefdc5e54bb1392098c4200ed > Author: Maciej Falkowski <m.falkowski(a)samsung.com> > Date: Tue Sep 17 12:37:27 2019 +0200 > > dt-bindings: gpu: Convert Samsung Image Rotator to dt-schema > > [ Upstream commit 6e3ffcd592060403ee2d956c9b1704775898db79 ] > > Convert Samsung Image Rotator to newer dt-schema format. > > Signed-off-by: Maciej Falkowski <m.falkowski(a)samsung.com> > Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com> > Signed-off-by: Rob Herring <robh(a)kernel.org> > Stable-dep-of: 338c4d3902fe ("igb: Disable threaded IRQ for igb_msix_other") > Signed-off-by: Sasha Levin <sashal(a)kernel.org> How is a binding conversion a dep for a one line driver patch that doesn't parse any new properties?

1 year, 1 month

2
2
0 0

[PATCH V2] vp_vdpa: fix id_table array not null terminated error

by Xiaoguang Wang

Allocate one extra virtio_device_id as null terminator, otherwise vdpa_mgmtdev_get_classes() may iterate multiple times and visit undefined memory. Fixes: ffbda8e9df10 ("vdpa/vp_vdpa : add vdpa tool support in vp_vdpa") Cc: stable(a)vger.kernel.org Suggested-by: Parav Pandit <parav(a)nvidia.com> Signed-off-by: Angus Chen <angus.chen(a)jaguarmicro.com> Signed-off-by: Xiaoguang Wang <lege.wang(a)jaguarmicro.com> --- V2: Use kcalloc() api. --- drivers/vdpa/virtio_pci/vp_vdpa.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/vdpa/virtio_pci/vp_vdpa.c b/drivers/vdpa/virtio_pci/vp_vdpa.c index ac4ab22f7d8b..b6410a984f29 100644 --- a/drivers/vdpa/virtio_pci/vp_vdpa.c +++ b/drivers/vdpa/virtio_pci/vp_vdpa.c @@ -612,7 +612,11 @@ static int vp_vdpa_probe(struct pci_dev *pdev, const struct pci_device_id *id) goto mdev_err; } - mdev_id = kzalloc(sizeof(struct virtio_device_id), GFP_KERNEL); + /* + * id_table should be a null terminated array, so allocate one additional + * entry here, see vdpa_mgmtdev_get_classes(). + */ + mdev_id = kcalloc(2, sizeof(struct virtio_device_id), GFP_KERNEL); if (!mdev_id) { err = -ENOMEM; goto mdev_id_err; -- 2.40.1

1 year, 1 month

2
1
0 0

Backport smb client char/block fixes

by Pali Rohár

Hello, I would like to propose backporting these two commits into stable: * 663f295e3559 ("smb: client: fix parsing of device numbers") * a9de67336a4a ("smb: client: set correct device number on nfs reparse points") Linux SMB client without these two recent fixes swaps device major and minor numbers, which makes basically char/block device nodes unusable. Commit 663f295e3559 ("smb: client: fix parsing of device numbers") should have had following Fixes line: Fixes: 45e724022e27 ("smb: client: set correct file type from NFS reparse points") And commit a9de67336a4a ("smb: client: set correct device number on nfs reparse points") should have contained line: Fixes: 102466f303ff ("smb: client: allow creating special files via reparse points") Pali

1 year, 1 month

2
2
0 0

[PATCH v2] lib: string_helpers: silence snprintf() output truncation warning

by Bartosz Golaszewski

From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> The output of ".%03u" with the unsigned int in range [0, 4294966295] may get truncated if the target buffer is not 12 bytes. This can't really happen here as the 'remainder' variable cannot exceed 999 but the compiler doesn't know it. To make it happy just increase the buffer to where the warning goes away. Fixes: 3c9f3681d0b4 ("[SCSI] lib: add generic helper to print sizes rounded to the correct SI range") Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org> Reviewed-by: Andy Shevchenko <andy(a)kernel.org> Cc: James E.J. Bottomley <James.Bottomley(a)HansenPartnership.com> Cc: Kees Cook <kees(a)kernel.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- Changes in v2: - improve the commit message lib/string_helpers.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/string_helpers.c b/lib/string_helpers.c index 4f887aa62fa0..91fa37b5c510 100644 --- a/lib/string_helpers.c +++ b/lib/string_helpers.c @@ -57,7 +57,7 @@ int string_get_size(u64 size, u64 blk_size, const enum string_size_units units, static const unsigned int rounding[] = { 500, 50, 5 }; int i = 0, j; u32 remainder = 0, sf_cap; - char tmp[8]; + char tmp[12]; const char *unit; tmp[0] = '\0'; -- 2.45.2

1 year, 1 month

2
1
0 0

[PATCH RFC 0/2] PCI: endpoint: fix bugs for both API pci_epc_destroy() and pci_epc_remove_epf()

by Zijun Hu

This patch series is to fix bugs for below 2 APIs: pci_epc_destroy() pci_epc_remove_epf() Signed-off-by: Zijun Hu <quic_zijuhu(a)quicinc.com> --- Zijun Hu (2): PCI: endpoint: Fix API pci_epc_destroy() releasing domain_nr ID faults PCI: endpoint: Fix that API pci_epc_remove_epf() cleans up wrong EPC of EPF drivers/pci/endpoint/pci-epc-core.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-) --- base-commit: 11066801dd4b7c4d75fce65c812723a80c1481ae change-id: 20241102-epc_rfc-e1d9d03d5101 Best regards, -- Zijun Hu <quic_zijuhu(a)quicinc.com>

1 year, 1 month

1
2
0 0

Proposal for you!

by William Cheung

Hello Dear, I have a lucratuve proposal for you, reply for more info. Thank you, William Cheung

1 year, 1 month

1
0
0 0

[PATCH v4] usb: typec: qcom-pmic: init value of hdr_len/txbuf_len earlier

by Rex Nie

If the read of USB_PDPHY_RX_ACKNOWLEDGE_REG failed, then hdr_len and txbuf_len are uninitialized. This commit stops to print uninitialized value and misleading/false data. Cc: stable(a)vger.kernel.org Fixes: a4422ff22142 (" usb: typec: qcom: Add Qualcomm PMIC Type-C driver") Signed-off-by: Rex Nie <rex.nie(a)jaguarmicro.com> --- V2 -> V3: - add changelog, add Fixes tag, add Cc stable ml. Thanks heikki - Link to v2: https://lore.kernel.org/all/20241030022753.2045-1-rex.nie@jaguarmicro.com/ V1 -> V2: - keep printout when data didn't transmit, thanks Bjorn, bod, greg k-h - Links: https://lore.kernel.org/all/b177e736-e640-47ed-9f1e-ee65971dfc9c@linaro.org/ --- drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_pdphy.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_pdphy.c b/drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_pdphy.c index 5b7f52b74a40..726423684bae 100644 --- a/drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_pdphy.c +++ b/drivers/usb/typec/tcpm/qcom/qcom_pmic_typec_pdphy.c @@ -227,6 +227,10 @@ qcom_pmic_typec_pdphy_pd_transmit_payload(struct pmic_typec_pdphy *pmic_typec_pd spin_lock_irqsave(&pmic_typec_pdphy->lock, flags); + hdr_len = sizeof(msg->header); + txbuf_len = pd_header_cnt_le(msg->header) * 4; + txsize_len = hdr_len + txbuf_len - 1; + ret = regmap_read(pmic_typec_pdphy->regmap, pmic_typec_pdphy->base + USB_PDPHY_RX_ACKNOWLEDGE_REG, &val); @@ -244,10 +248,6 @@ qcom_pmic_typec_pdphy_pd_transmit_payload(struct pmic_typec_pdphy *pmic_typec_pd if (ret) goto done; - hdr_len = sizeof(msg->header); - txbuf_len = pd_header_cnt_le(msg->header) * 4; - txsize_len = hdr_len + txbuf_len - 1; - /* Write message header sizeof(u16) to USB_PDPHY_TX_BUFFER_HDR_REG */ ret = regmap_bulk_write(pmic_typec_pdphy->regmap, pmic_typec_pdphy->base + USB_PDPHY_TX_BUFFER_HDR_REG, -- 2.17.1

1 year, 1 month

3
3
0 0

[PATCH iwl-net v2 1/2] idpf: avoid vport access in idpf_get_link_ksettings

by Pavan Kumar Linga

When the device control plane is removed or the platform running device control plane is rebooted, a reset is detected on the driver. On driver reset, it releases the resources and waits for the reset to complete. If the reset fails, it takes the error path and releases the vport lock. At this time if the monitoring tools tries to access link settings, it call traces for accessing released vport pointer. To avoid it, move link_speed_mbps to netdev_priv structure which removes the dependency on vport pointer and the vport lock in idpf_get_link_ksettings. Also use netif_carrier_ok() to check the link status and adjust the offsetof to use link_up instead of link_speed_mbps. Fixes: 02cbfba1add5 ("idpf: add ethtool callbacks") Cc: stable(a)vger.kernel.org # 6.7+ Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> --- drivers/net/ethernet/intel/idpf/idpf.h | 4 ++-- drivers/net/ethernet/intel/idpf/idpf_ethtool.c | 11 +++-------- drivers/net/ethernet/intel/idpf/idpf_lib.c | 4 ++-- drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 2 +- 4 files changed, 8 insertions(+), 13 deletions(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf.h b/drivers/net/ethernet/intel/idpf/idpf.h index 2c31ad87587a..66544faab710 100644 --- a/drivers/net/ethernet/intel/idpf/idpf.h +++ b/drivers/net/ethernet/intel/idpf/idpf.h @@ -141,6 +141,7 @@ enum idpf_vport_state { * @adapter: Adapter back pointer * @vport: Vport back pointer * @vport_id: Vport identifier + * @link_speed_mbps: Link speed in mbps * @vport_idx: Relative vport index * @state: See enum idpf_vport_state * @netstats: Packet and byte stats @@ -150,6 +151,7 @@ struct idpf_netdev_priv { struct idpf_adapter *adapter; struct idpf_vport *vport; u32 vport_id; + u32 link_speed_mbps; u16 vport_idx; enum idpf_vport_state state; struct rtnl_link_stats64 netstats; @@ -287,7 +289,6 @@ struct idpf_port_stats { * @tx_itr_profile: TX profiles for Dynamic Interrupt Moderation * @port_stats: per port csum, header split, and other offload stats * @link_up: True if link is up - * @link_speed_mbps: Link speed in mbps * @sw_marker_wq: workqueue for marker packets */ struct idpf_vport { @@ -331,7 +332,6 @@ struct idpf_vport { struct idpf_port_stats port_stats; bool link_up; - u32 link_speed_mbps; wait_queue_head_t sw_marker_wq; }; diff --git a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c index 3806ddd3ce4a..59b1a1a09996 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_ethtool.c +++ b/drivers/net/ethernet/intel/idpf/idpf_ethtool.c @@ -1296,24 +1296,19 @@ static void idpf_set_msglevel(struct net_device *netdev, u32 data) static int idpf_get_link_ksettings(struct net_device *netdev, struct ethtool_link_ksettings *cmd) { - struct idpf_vport *vport; - - idpf_vport_ctrl_lock(netdev); - vport = idpf_netdev_to_vport(netdev); + struct idpf_netdev_priv *np = netdev_priv(netdev); ethtool_link_ksettings_zero_link_mode(cmd, supported); cmd->base.autoneg = AUTONEG_DISABLE; cmd->base.port = PORT_NONE; - if (vport->link_up) { + if (netif_carrier_ok(netdev)) { cmd->base.duplex = DUPLEX_FULL; - cmd->base.speed = vport->link_speed_mbps; + cmd->base.speed = np->link_speed_mbps; } else { cmd->base.duplex = DUPLEX_UNKNOWN; cmd->base.speed = SPEED_UNKNOWN; } - idpf_vport_ctrl_unlock(netdev); - return 0; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index 4f20343e49a9..c3848e10e7db 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1860,7 +1860,7 @@ int idpf_initiate_soft_reset(struct idpf_vport *vport, * mess with. Nothing below should use those variables from new_vport * and should instead always refer to them in vport if they need to. */ - memcpy(new_vport, vport, offsetof(struct idpf_vport, link_speed_mbps)); + memcpy(new_vport, vport, offsetof(struct idpf_vport, link_up)); /* Adjust resource parameters prior to reallocating resources */ switch (reset_cause) { @@ -1906,7 +1906,7 @@ int idpf_initiate_soft_reset(struct idpf_vport *vport, /* Same comment as above regarding avoiding copying the wait_queues and * mutexes applies here. We do not want to mess with those if possible. */ - memcpy(vport, new_vport, offsetof(struct idpf_vport, link_speed_mbps)); + memcpy(vport, new_vport, offsetof(struct idpf_vport, link_up)); if (reset_cause == IDPF_SR_Q_CHANGE) idpf_vport_alloc_vec_indexes(vport); diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index 70986e12da28..3be883726b87 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -141,7 +141,7 @@ static void idpf_handle_event_link(struct idpf_adapter *adapter, } np = netdev_priv(vport->netdev); - vport->link_speed_mbps = le32_to_cpu(v2e->link_speed); + np->link_speed_mbps = le32_to_cpu(v2e->link_speed); if (vport->link_up == v2e->link_status) return; -- 2.43.0

1 year, 1 month

2
1
0 0

[PATCH iwl-net v2 2/2] idpf: fix idpf_vc_core_init error path

by Pavan Kumar Linga

In an event where the platform running the device control plane is rebooted, reset is detected on the driver. It releases all the resources and waits for the reset to complete. Once the reset is done, it tries to build the resources back. At this time if the device control plane is not yet started, then the driver timeouts on the virtchnl message and retries to establish the mailbox again. In the retry flow, mailbox is deinitialized but the mailbox workqueue is still alive and polling for the mailbox message. This results in accessing the released control queue leading to null-ptr-deref. Fix it by unrolling the work queue cancellation and mailbox deinitialization in the reverse order which they got initialized. Fixes: 4930fbf419a7 ("idpf: add core init and interrupt request") Fixes: 34c21fa894a1 ("idpf: implement virtchnl transaction manager") Cc: stable(a)vger.kernel.org # 6.9+ Reviewed-by: Tarun K Singh <tarun.k.singh(a)intel.com> Signed-off-by: Pavan Kumar Linga <pavan.kumar.linga(a)intel.com> --- v2: - remove changes which are not fixes for the actual issue from this patch --- drivers/net/ethernet/intel/idpf/idpf_lib.c | 1 + drivers/net/ethernet/intel/idpf/idpf_virtchnl.c | 1 - 2 files changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/idpf/idpf_lib.c b/drivers/net/ethernet/intel/idpf/idpf_lib.c index c3848e10e7db..b4fbb99bfad2 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_lib.c +++ b/drivers/net/ethernet/intel/idpf/idpf_lib.c @@ -1786,6 +1786,7 @@ static int idpf_init_hard_reset(struct idpf_adapter *adapter) */ err = idpf_vc_core_init(adapter); if (err) { + cancel_delayed_work_sync(&adapter->mbx_task); idpf_deinit_dflt_mbx(adapter); goto unlock_mutex; } diff --git a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c index 3be883726b87..e7eee571d908 100644 --- a/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c +++ b/drivers/net/ethernet/intel/idpf/idpf_virtchnl.c @@ -3070,7 +3070,6 @@ int idpf_vc_core_init(struct idpf_adapter *adapter) adapter->state = __IDPF_VER_CHECK; if (adapter->vcxn_mngr) idpf_vc_xn_shutdown(adapter->vcxn_mngr); - idpf_deinit_dflt_mbx(adapter); set_bit(IDPF_HR_DRV_LOAD, adapter->flags); queue_delayed_work(adapter->vc_event_wq, &adapter->vc_event_task, msecs_to_jiffies(task_delay)); -- 2.43.0

1 year, 1 month

2
1
0 0

[PATCH] Drivers: hv: kvp/vss: Avoid accessing a ringbuffer not initialized yet

by Dexuan Cui

If the KVP (or VSS) daemon starts before the VMBus channel's ringbuffer is fully initialized, we can hit the panic below: hv_utils: Registering HyperV Utility Driver hv_vmbus: registering driver hv_utils ... BUG: kernel NULL pointer dereference, address: 0000000000000000 CPU: 44 UID: 0 PID: 2552 Comm: hv_kvp_daemon Tainted: G E 6.11.0-rc3+ #1 RIP: 0010:hv_pkt_iter_first+0x12/0xd0 Call Trace: ... vmbus_recvpacket hv_kvp_onchannelcallback vmbus_on_event tasklet_action_common tasklet_action handle_softirqs irq_exit_rcu sysvec_hyperv_stimer0 </IRQ> <TASK> asm_sysvec_hyperv_stimer0 ... kvp_register_done hvt_op_read vfs_read ksys_read __x64_sys_read This can happen because the KVP/VSS channel callback can be invoked even before the channel is fully opened: 1) as soon as hv_kvp_init() -> hvutil_transport_init() creates /dev/vmbus/hv_kvp, the kvp daemon can open the device file immediately and register itself to the driver by writing a message KVP_OP_REGISTER1 to the file (which is handled by kvp_on_msg() ->kvp_handle_handshake()) and reading the file for the driver's response, which is handled by hvt_op_read(), which calls hvt->on_read(), i.e. kvp_register_done(). 2) the problem with kvp_register_done() is that it can cause the channel callback to be called even before the channel is fully opened, and when the channel callback is starting to run, util_probe()-> vmbus_open() may have not initialized the ringbuffer yet, so the callback can hit the panic of NULL pointer dereference. To reproduce the panic consistently, we can add a "ssleep(10)" for KVP in __vmbus_open(), just before the first hv_ringbuffer_init(), and then we unload and reload the driver hv_utils, and run the daemon manually within the 10 seconds. Fix the panic by checking the channel state in the channel callback. To avoid the race condition with __vmbus_open(), we disable and enable the channel callback temporarily in __vmbus_open(). The channel callbacks of the other VMBus devices don't need to check the channel state since they can't run before the channels are fully initialized. Note: we would also need to fix the fcopy driver code, but that has been removed in commit ec314f61e4fc ("Drivers: hv: Remove fcopy driver") in the mainline kernel since v6.10. For old 6.x LTS kernels, and the 5.x and 4.x LTS kernels, the fcopy driver needs to be fixed when the fix is backported to the stable kernel branches. Fixes: e0fa3e5e7df6 ("Drivers: hv: utils: fix a race on userspace daemons registration") Cc: stable(a)vger.kernel.org Signed-off-by: Dexuan Cui <decui(a)microsoft.com> --- drivers/hv/channel.c | 11 +++++++++++ drivers/hv/hv_kvp.c | 3 +++ drivers/hv/hv_snapshot.c | 3 +++ 3 files changed, 17 insertions(+) diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c index fb8cd8469328..685e407a3fdf 100644 --- a/drivers/hv/channel.c +++ b/drivers/hv/channel.c @@ -657,6 +657,14 @@ static int __vmbus_open(struct vmbus_channel *newchannel, return -ENOMEM; } + /* + * The channel callbacks of KVP/VSS may run before __vmbus_open() + * finishes (see kvp_register_done() -> ... -> kvp_poll_wrapper()), so + * they check newchannel->state to tell the ringbuffer has been fully + * initialized or not. Disable and enable the tasklet to avoid the race. + */ + tasklet_disable(&newchannel->callback_event); + newchannel->state = CHANNEL_OPENING_STATE; newchannel->onchannel_callback = onchannelcallback; newchannel->channel_callback_context = context; @@ -750,6 +758,8 @@ static int __vmbus_open(struct vmbus_channel *newchannel, } newchannel->state = CHANNEL_OPENED_STATE; + tasklet_enable(&newchannel->callback_event); + kfree(open_info); return 0; @@ -766,6 +776,7 @@ static int __vmbus_open(struct vmbus_channel *newchannel, hv_ringbuffer_cleanup(&newchannel->inbound); vmbus_free_requestor(&newchannel->requestor); newchannel->state = CHANNEL_OPEN_STATE; + tasklet_enable(&newchannel->callback_event); return err; } diff --git a/drivers/hv/hv_kvp.c b/drivers/hv/hv_kvp.c index d35b60c06114..ec098067e579 100644 --- a/drivers/hv/hv_kvp.c +++ b/drivers/hv/hv_kvp.c @@ -662,6 +662,9 @@ void hv_kvp_onchannelcallback(void *context) if (kvp_transaction.state > HVUTIL_READY) return; + if (channel->state != CHANNEL_OPENED_STATE) + return; + if (vmbus_recvpacket(channel, recv_buffer, HV_HYP_PAGE_SIZE * 4, &recvlen, &requestid)) { pr_err_ratelimited("KVP request received. Could not read into recv buf\n"); return; diff --git a/drivers/hv/hv_snapshot.c b/drivers/hv/hv_snapshot.c index 0d2184be1691..f7924c2fc62e 100644 --- a/drivers/hv/hv_snapshot.c +++ b/drivers/hv/hv_snapshot.c @@ -301,6 +301,9 @@ void hv_vss_onchannelcallback(void *context) if (vss_transaction.state > HVUTIL_READY) return; + if (channel->state != CHANNEL_OPENED_STATE) + return; + if (vmbus_recvpacket(channel, recv_buffer, VSS_MAX_PKT_SIZE, &recvlen, &requestid)) { pr_err_ratelimited("VSS request received. Could not read into recv buf\n"); return; -- 2.25.1

1 year, 1 month

3
9
0 0

[PATCH v3] tpm: Lock TPM chip in tpm_pm_suspend() first

by Jarkko Sakkinen

Setting TPM_CHIP_FLAG_SUSPENDED in the end of tpm_pm_suspend() can be racy according, as this leaves window for tpm_hwrng_read() to be called while the operation is in progress. The recent bug report gives also evidence of this behaviour. Aadress this by locking the TPM chip before checking any chip->flags both in tpm_pm_suspend() and tpm_hwrng_read(). Move TPM_CHIP_FLAG_SUSPENDED check inside tpm_get_random() so that it will be always checked only when the lock is reserved. Cc: stable(a)vger.kernel.org # v6.4+ Fixes: 99d464506255 ("tpm: Prevent hwrng from activating during resume") Reported-by: Mike Seo <mikeseohyungjin(a)gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219383 Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org> --- v3: - Check TPM_CHIP_FLAG_SUSPENDED inside tpm_get_random() so that it is also done under the lock (suggested by Jerry Snitselaar). v2: - Addressed my own remark: https://lore.kernel.org/linux-integrity/D59JAI6RR2CD.G5E5T4ZCZ49W@kernel.or… --- drivers/char/tpm/tpm-chip.c | 4 ---- drivers/char/tpm/tpm-interface.c | 32 ++++++++++++++++++++++---------- 2 files changed, 22 insertions(+), 14 deletions(-) diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c index 1ff99a7091bb..7df7abaf3e52 100644 --- a/drivers/char/tpm/tpm-chip.c +++ b/drivers/char/tpm/tpm-chip.c @@ -525,10 +525,6 @@ static int tpm_hwrng_read(struct hwrng *rng, void *data, size_t max, bool wait) { struct tpm_chip *chip = container_of(rng, struct tpm_chip, hwrng); - /* Give back zero bytes, as TPM chip has not yet fully resumed: */ - if (chip->flags & TPM_CHIP_FLAG_SUSPENDED) - return 0; - return tpm_get_random(chip, data, max); } diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c index 8134f002b121..b1daa0d7b341 100644 --- a/drivers/char/tpm/tpm-interface.c +++ b/drivers/char/tpm/tpm-interface.c @@ -370,6 +370,13 @@ int tpm_pm_suspend(struct device *dev) if (!chip) return -ENODEV; + rc = tpm_try_get_ops(chip); + if (rc) { + /* Can be safely set out of locks, as no action cannot race: */ + chip->flags |= TPM_CHIP_FLAG_SUSPENDED; + goto out; + } + if (chip->flags & TPM_CHIP_FLAG_ALWAYS_POWERED) goto suspended; @@ -377,21 +384,19 @@ int tpm_pm_suspend(struct device *dev) !pm_suspend_via_firmware()) goto suspended; - rc = tpm_try_get_ops(chip); - if (!rc) { - if (chip->flags & TPM_CHIP_FLAG_TPM2) { - tpm2_end_auth_session(chip); - tpm2_shutdown(chip, TPM2_SU_STATE); - } else { - rc = tpm1_pm_suspend(chip, tpm_suspend_pcr); - } - - tpm_put_ops(chip); + if (chip->flags & TPM_CHIP_FLAG_TPM2) { + tpm2_end_auth_session(chip); + tpm2_shutdown(chip, TPM2_SU_STATE); + goto suspended; } + rc = tpm1_pm_suspend(chip, tpm_suspend_pcr); + suspended: chip->flags |= TPM_CHIP_FLAG_SUSPENDED; + tpm_put_ops(chip); +out: if (rc) dev_err(dev, "Ignoring error %d while suspending\n", rc); return 0; @@ -440,11 +445,18 @@ int tpm_get_random(struct tpm_chip *chip, u8 *out, size_t max) if (!chip) return -ENODEV; + /* Give back zero bytes, as TPM chip has not yet fully resumed: */ + if (chip->flags & TPM_CHIP_FLAG_SUSPENDED) { + rc = 0; + goto out; + } + if (chip->flags & TPM_CHIP_FLAG_TPM2) rc = tpm2_get_random(chip, out, max); else rc = tpm1_get_random(chip, out, max); +out: tpm_put_ops(chip); return rc; } -- 2.47.0

1 year, 1 month

2
6
0 0

+ ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: ucounts: fix counter leak in inc_rlimit_get_ucounts() has been added to the -mm mm-hotfixes-unstable branch. Its filename is ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Andrei Vagin <avagin(a)google.com> Subject: ucounts: fix counter leak in inc_rlimit_get_ucounts() Date: Fri, 1 Nov 2024 19:19:40 +0000 The inc_rlimit_get_ucounts() increments the specified rlimit counter and then checks its limit. If the value exceeds the limit, the function returns an error without decrementing the counter. Link: https://lkml.kernel.org/r/20241101191940.3211128-1-roman.gushchin@linux.dev Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting") Signed-off-by: Andrei Vagin <avagin(a)google.com> Co-developed-by: Roman Gushchin <roman.gushchin(a)linux.dev> Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Tested-by: Roman Gushchin <roman.gushchin(a)linux.dev> Acked-by: Alexey Gladkov <legion(a)kernel.org> Cc: Kees Cook <kees(a)kernel.org> Cc: Andrei Vagin <avagin(a)google.com> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Alexey Gladkov <legion(a)kernel.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- kernel/ucount.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) --- a/kernel/ucount.c~ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts +++ a/kernel/ucount.c @@ -318,7 +318,7 @@ long inc_rlimit_get_ucounts(struct ucoun for (iter = ucounts; iter; iter = iter->ns->ucounts) { long new = atomic_long_add_return(1, &iter->rlimit[type]); if (new < 0 || (!override_rlimit && (new > max))) - goto unwind; + goto dec_unwind; if (iter == ucounts) ret = new; max = get_userns_rlimit_max(iter->ns, type); @@ -335,7 +335,6 @@ long inc_rlimit_get_ucounts(struct ucoun dec_unwind: dec = atomic_long_sub_return(1, &iter->rlimit[type]); WARN_ON_ONCE(dec < 0); -unwind: do_dec_rlimit_put_ucounts(ucounts, iter, type); return 0; } _ Patches currently in -mm which might be from avagin(a)google.com are ucounts-fix-counter-leak-in-inc_rlimit_get_ucounts.patch

1 year, 1 month

1
0
0 0

[PATCH v2] ucounts: fix counter leak in inc_rlimit_get_ucounts()

by Roman Gushchin

From: Andrei Vagin <avagin(a)google.com> The inc_rlimit_get_ucounts() increments the specified rlimit counter and then checks its limit. If the value exceeds the limit, the function returns an error without decrementing the counter. v2: changed the goto label name [Roman] Fixes: 15bc01effefe ("ucounts: Fix signal ucount refcounting") Signed-off-by: Andrei Vagin <avagin(a)google.com> Co-developed-by: Roman Gushchin <roman.gushchin(a)linux.dev> Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Tested-by: Roman Gushchin <roman.gushchin(a)linux.dev> Acked-by: Alexey Gladkov <legion(a)kernel.org> Cc: Kees Cook <kees(a)kernel.org> Cc: Andrei Vagin <avagin(a)google.com> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Alexey Gladkov <legion(a)kernel.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: <stable(a)vger.kernel.org> --- kernel/ucount.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/kernel/ucount.c b/kernel/ucount.c index 8c07714ff27d..9469102c5ac0 100644 --- a/kernel/ucount.c +++ b/kernel/ucount.c @@ -317,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) for (iter = ucounts; iter; iter = iter->ns->ucounts) { long new = atomic_long_add_return(1, &iter->rlimit[type]); if (new < 0 || new > max) - goto unwind; + goto dec_unwind; if (iter == ucounts) ret = new; max = get_userns_rlimit_max(iter->ns, type); @@ -334,7 +334,6 @@ long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) dec_unwind: dec = atomic_long_sub_return(1, &iter->rlimit[type]); WARN_ON_ONCE(dec < 0); -unwind: do_dec_rlimit_put_ucounts(ucounts, iter, type); return 0; } -- 2.47.0.163.g1226f6d8fa-goog

1 year, 1 month

2
1
0 0

[PATCH V3] wifi: rtlwifi: Drastically reduce the attempts to read efuse in case of failures

by Guilherme G. Piccoli

Syzkaller reported a hung task with uevent_show() on stack trace. That specific issue was addressed by another commit [0], but even with that fix applied (for example, running v6.12-rc5) we face another type of hung task that comes from the same reproducer [1]. By investigating that, we could narrow it to the following path: (a) Syzkaller emulates a Realtek USB WiFi adapter using raw-gadget and dummy_hcd infrastructure. (b) During the probe of rtl8192cu, the driver ends-up performing an efuse read procedure (which is related to EEPROM load IIUC), and here lies the issue: the function read_efuse() calls read_efuse_byte() many times, as loop iterations depending on the efuse size (in our example, 512 in total). This procedure for reading efuse bytes relies in a loop that performs an I/O read up to *10k* times in case of failures. We measured the time of the loop inside read_efuse_byte() alone, and in this reproducer (which involves the dummy_hcd emulation layer), it takes 15 seconds each. As a consequence, we have the driver stuck in its probe routine for big time, exposing a stack trace like below if we attempt to reboot the system, for example: task:kworker/0:3 state:D stack:0 pid:662 tgid:662 ppid:2 flags:0x00004000 Workqueue: usb_hub_wq hub_event Call Trace: __schedule+0xe22/0xeb6 schedule_timeout+0xe7/0x132 __wait_for_common+0xb5/0x12e usb_start_wait_urb+0xc5/0x1ef ? usb_alloc_urb+0x95/0xa4 usb_control_msg+0xff/0x184 _usbctrl_vendorreq_sync+0xa0/0x161 _usb_read_sync+0xb3/0xc5 read_efuse_byte+0x13c/0x146 read_efuse+0x351/0x5f0 efuse_read_all_map+0x42/0x52 rtl_efuse_shadow_map_update+0x60/0xef rtl_get_hwinfo+0x5d/0x1c2 rtl92cu_read_eeprom_info+0x10a/0x8d5 ? rtl92c_read_chip_version+0x14f/0x17e rtl_usb_probe+0x323/0x851 usb_probe_interface+0x278/0x34b really_probe+0x202/0x4a4 __driver_probe_device+0x166/0x1b2 driver_probe_device+0x2f/0xd8 [...] We propose hereby to drastically reduce the attempts of doing the I/O reads in case of failures, restricted to USB devices (given that they're inherently slower than PCIe ones). By retrying up to 10 times (instead of 10000), we got reponsiveness in the reproducer, while seems reasonable to believe that there's no sane USB device implementation in the field requiring this amount of retries at every I/O read in order to properly work. Based on that assumption, it'd be good to have it backported to stable but maybe not since driver implementation (the 10k number comes from day 0), perhaps up to 6.x series makes sense. [0] Commit 15fffc6a5624 ("driver core: Fix uevent_show() vs driver detach race"). [1] A note about that: this syzkaller report presents multiple reproducers that differs by the type of emulated USB device. For this specific case, check the entry from 2024/08/08 06:23 in the list of crashes; the C repro is available at https://syzkaller.appspot.com/text?tag=ReproC&x=1521fc83980000. Cc: stable(a)vger.kernel.org # v6.1+ Reported-by: syzbot+edd9fe0d3a65b14588d5(a)syzkaller.appspotmail.com Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> --- V3: - Switch variable declaration to reverse xmas tree order (thanks Ping-Ke Shih for the suggestion). V2: - Restrict the change to USB device only (thanks Ping-Ke Shih). - Tested in 2 USB devices by Bitterblue Smith - thanks a lot! V1: https://lore.kernel.org/lkml/20241025150226.896613-1-gpiccoli@igalia.com/ drivers/net/wireless/realtek/rtlwifi/efuse.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/net/wireless/realtek/rtlwifi/efuse.c b/drivers/net/wireless/realtek/rtlwifi/efuse.c index 82cf5fb5175f..0ff553f650f9 100644 --- a/drivers/net/wireless/realtek/rtlwifi/efuse.c +++ b/drivers/net/wireless/realtek/rtlwifi/efuse.c @@ -162,9 +162,19 @@ void efuse_write_1byte(struct ieee80211_hw *hw, u16 address, u8 value) void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf) { struct rtl_priv *rtlpriv = rtl_priv(hw); + u16 retry, max_attempts; u32 value32; u8 readbyte; - u16 retry; + + /* + * In case of USB devices, transfer speeds are limited, hence + * efuse I/O reads could be (way) slower. So, decrease (a lot) + * the read attempts in case of failures. + */ + if (rtlpriv->rtlhal.interface == INTF_PCI) + max_attempts = 10000; + else + max_attempts = 10; rtl_write_byte(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL] + 1, (_offset & 0xff)); @@ -178,7 +188,7 @@ void read_efuse_byte(struct ieee80211_hw *hw, u16 _offset, u8 *pbuf) retry = 0; value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]); - while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < 10000)) { + while (!(((value32 >> 24) & 0xff) & 0x80) && (retry < max_attempts)) { value32 = rtl_read_dword(rtlpriv, rtlpriv->cfg->maps[EFUSE_CTRL]); retry++; -- 2.46.2

1 year, 1 month

2
4
0 0

+ mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm: fix __wp_page_copy_user fallback path for remote mm has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Asahi Lina <lina(a)asahilina.net> Subject: mm: fix __wp_page_copy_user fallback path for remote mm Date: Fri, 01 Nov 2024 21:08:02 +0900 If the source page is a PFN mapping, we copy back from userspace. However, if this fault is a remote access, we cannot use __copy_from_user_inatomic. Instead, use access_remote_vm() in this case. Fixes WARN and incorrect zero-filling when writing to CoW mappings in a remote process, such as when using gdb on a binary present on a DAX filesystem. [ 143.683782] ------------[ cut here ]------------ [ 143.683784] WARNING: CPU: 1 PID: 350 at mm/memory.c:2904 __wp_page_copy_user+0x120/0x2bc [ 143.683793] CPU: 1 PID: 350 Comm: gdb Not tainted 6.6.52 #1 [ 143.683794] Hardware name: linux,dummy-virt (DT) [ 143.683795] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS BTYPE=--) [ 143.683796] pc : __wp_page_copy_user+0x120/0x2bc [ 143.683798] lr : __wp_page_copy_user+0x254/0x2bc [ 143.683799] sp : ffff80008272b8b0 [ 143.683799] x29: ffff80008272b8b0 x28: 0000000000000000 x27: ffff000083bad580 [ 143.683801] x26: 0000000000000000 x25: 0000fffff7fd5000 x24: ffff000081db04c0 [ 143.683802] x23: ffff00014f24b000 x22: fffffc00053c92c0 x21: ffff000083502150 [ 143.683803] x20: 0000fffff7fd5000 x19: ffff80008272b9d0 x18: 0000000000000000 [ 143.683804] x17: ffff000081db0500 x16: ffff800080fe52a0 x15: 0000fffff7fd5000 [ 143.683804] x14: 0000000000bb1845 x13: 0000000000000080 x12: ffff80008272b880 [ 143.683805] x11: ffff000081d13600 x10: ffff000081d13608 x9 : ffff000081d1360c [ 143.683806] x8 : ffff000083a16f00 x7 : 0000000000000010 x6 : ffff00014f24b000 [ 143.683807] x5 : ffff00014f24c000 x4 : 0000000000000000 x3 : ffff000083582000 [ 143.683807] x2 : 0000000000000f80 x1 : 0000fffff7fd5000 x0 : 0000000000001000 [ 143.683808] Call trace: [ 143.683809] __wp_page_copy_user+0x120/0x2bc [ 143.683810] wp_page_copy+0x98/0x5c0 [ 143.683813] do_wp_page+0x250/0x530 [ 143.683814] __handle_mm_fault+0x278/0x284 [ 143.683817] handle_mm_fault+0x64/0x1e8 [ 143.683819] faultin_page+0x5c/0x110 [ 143.683820] __get_user_pages+0xc8/0x2f4 [ 143.683821] get_user_pages_remote+0xac/0x30c [ 143.683823] __access_remote_vm+0xb4/0x368 [ 143.683824] access_remote_vm+0x10/0x1c [ 143.683826] mem_rw.isra.0+0xc4/0x218 [ 143.683831] mem_write+0x18/0x24 [ 143.683831] vfs_write+0xa0/0x37c [ 143.683834] ksys_pwrite64+0x7c/0xc0 [ 143.683834] __arm64_sys_pwrite64+0x20/0x2c [ 143.683835] invoke_syscall+0x48/0x10c [ 143.683837] el0_svc_common.constprop.0+0x40/0xe0 [ 143.683839] do_el0_svc+0x1c/0x28 [ 143.683841] el0_svc+0x3c/0xdc [ 143.683846] el0t_64_sync_handler+0x120/0x12c [ 143.683848] el0t_64_sync+0x194/0x198 [ 143.683849] ---[ end trace 0000000000000000 ]--- Link: https://lkml.kernel.org/r/20241101-mm-remote-pfn-v1-1-080b609270b7@asahilin… Fixes: 83d116c53058 ("mm: fix double page fault on arm64 if PTE_AF is cleared") Signed-off-by: Asahi Lina <lina(a)asahilina.net> Cc: Jia He <justin.he(a)arm.com> Cc: Yibo Cai <Yibo.Cai(a)arm.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com> Cc: Catalin Marinas <catalin.marinas(a)arm.com> Cc: Asahi Lina <lina(a)asahilina.net> Cc: Sergio Lopez Pascual <slp(a)redhat.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/memory.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) --- a/mm/memory.c~mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm +++ a/mm/memory.c @@ -3081,13 +3081,18 @@ static inline int __wp_page_copy_user(st update_mmu_cache_range(vmf, vma, addr, vmf->pte, 1); } + /* If the mm is a remote mm, copy in the page using access_remote_vm() */ + if (current->mm != mm) { + if (access_remote_vm(mm, (unsigned long)uaddr, kaddr, PAGE_SIZE, 0) != PAGE_SIZE) + goto warn; + } /* * This really shouldn't fail, because the page is there * in the page tables. But it might just be unreadable, * in which case we just give up and fill the result with * zeroes. */ - if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) { + else if (__copy_from_user_inatomic(kaddr, uaddr, PAGE_SIZE)) { if (vmf->pte) goto warn; _ Patches currently in -mm which might be from lina(a)asahilina.net are mm-fix-__wp_page_copy_user-fallback-path-for-remote-mm.patch

1 year, 1 month

1
0
0 0

+ selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: selftests: hugetlb_dio: check for initial conditions to skip in the start has been added to the -mm mm-hotfixes-unstable branch. Its filename is selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Subject: selftests: hugetlb_dio: check for initial conditions to skip in the start Date: Fri, 1 Nov 2024 19:15:57 +0500 The test should be skipped if initial conditions aren't fulfilled in the start instead of failing and outputting non-compliant TAP logs. This kind of failure pollutes the results. The initial conditions are: - The test should only execute if /tmp file can be allocated. - The test should only execute if huge pages are free. Before: TAP version 13 1..4 Bail out! Error opening file : Read-only file system (30) # Planned tests != run tests (4 != 0) # Totals: pass:0 fail:0 xfail:0 xpass:0 skip:0 error:0 After: TAP version 13 1..0 # SKIP Unable to allocate file: Read-only file system Link: https://lkml.kernel.org/r/20241101141557.3159432-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Fixes: 3a103b5315b7 ("selftest: mm: Test if hugepage does not get leaked during __bio_release_pages()") Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Donet Tom <donettom(a)linux.ibm.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- tools/testing/selftests/mm/hugetlb_dio.c | 19 ++++++++++++------- 1 file changed, 12 insertions(+), 7 deletions(-) --- a/tools/testing/selftests/mm/hugetlb_dio.c~selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start +++ a/tools/testing/selftests/mm/hugetlb_dio.c @@ -44,13 +44,6 @@ void run_dio_using_hugetlb(unsigned int if (fd < 0) ksft_exit_fail_perror("Error opening file\n"); - /* Get the free huge pages before allocation */ - free_hpage_b = get_free_hugepages(); - if (free_hpage_b == 0) { - close(fd); - ksft_exit_skip("No free hugepage, exiting!\n"); - } - /* Allocate a hugetlb page */ orig_buffer = mmap(NULL, h_pagesize, mmap_prot, mmap_flags, -1, 0); if (orig_buffer == MAP_FAILED) { @@ -94,8 +87,20 @@ void run_dio_using_hugetlb(unsigned int int main(void) { size_t pagesize = 0; + int fd; ksft_print_header(); + + /* Open the file to DIO */ + fd = open("/tmp", O_TMPFILE | O_RDWR | O_DIRECT, 0664); + if (fd < 0) + ksft_exit_skip("Unable to allocate file: %s\n", strerror(errno)); + close(fd); + + /* Check if huge pages are free */ + if (!get_free_hugepages()) + ksft_exit_skip("No free hugepage, exiting\n"); + ksft_set_plan(4); /* Get base page size */ _ Patches currently in -mm which might be from usama.anjum(a)collabora.com are selftests-hugetlb_dio-check-for-initial-conditions-to-skip-in-the-start.patch

1 year, 1 month

1
0
0 0

[for-linus][PATCH 1/3] tracing: Fix tracefs mount options

by Steven Rostedt

From: Kalesh Singh <kaleshsingh(a)google.com> Commit 78ff64081949 ("vfs: Convert tracefs to use the new mount API") converted tracefs to use the new mount APIs caused mount options (e.g. gid=<gid>) to not take effect. The tracefs superblock can be updated from multiple paths: - on fs_initcall() to init_trace_printk_function_export() - from a work queue to initialize eventfs tracer_init_tracefs_work_func() - fsconfig() syscall to mount or remount of tracefs The tracefs superblock root inode gets created early on in init_trace_printk_function_export(). With the new mount API, tracefs effectively uses get_tree_single() instead of the old API mount_single(). Previously, mount_single() ensured that the options are always applied to the superblock root inode: (1) If the root inode didn't exist, call fill_super() to create it and apply the options. (2) If the root inode exists, call reconfigure_single() which effectively calls tracefs_apply_options() to parse and apply options to the subperblock's fs_info and inode and remount eventfs (if necessary) On the other hand, get_tree_single() effectively calls vfs_get_super() which: (3) If the root inode doesn't exists, calls fill_super() to create it and apply the options. (4) If the root inode already exists, updates the fs_context root with the superblock's root inode. (4) above is always the case for tracefs mounts, since the super block's root inode will already be created by init_trace_printk_function_export(). This means that the mount options get ignored: - Since it isn't applied to the superblock's root inode, it doesn't get inherited by the children. - Since eventfs is initialized from a separate work queue and before call to mount with the options, and it doesn't get remounted for mount. Ensure that the mount options are applied to the super block and eventfs is remounted to respect the mount options. To understand this better, if fstab has the following: tracefs /sys/kernel/tracing tracefs nosuid,nodev,noexec,gid=tracing 0 0 On boot up, permissions look like: # ls -l /sys/kernel/tracing/trace -rw-r----- 1 root root 0 Nov 1 08:37 /sys/kernel/tracing/trace When it should look like: # ls -l /sys/kernel/tracing/trace -rw-r----- 1 root tracing 0 Nov 1 08:37 /sys/kernel/tracing/trace Link: https://lore.kernel.org/r/536e99d3-345c-448b-adee-a21389d7ab4b@redhat.com/ Cc: Eric Sandeen <sandeen(a)redhat.com> Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com> Cc: Shuah Khan <shuah(a)kernel.org> Cc: Ali Zahraee <ahzahraee(a)gmail.com> Cc: Christian Brauner <brauner(a)kernel.org> Cc: David Howells <dhowells(a)redhat.com> Cc: Steven Rostedt <rostedt(a)goodmis.org> Cc: Masami Hiramatsu <mhiramat(a)kernel.org> Cc: stable(a)vger.kernel.org Fixes: 78ff64081949 ("vfs: Convert tracefs to use the new mount API") Link: https://lore.kernel.org/20241030171928.4168869-2-kaleshsingh@google.com Signed-off-by: Kalesh Singh <kaleshsingh(a)google.com> Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org> --- fs/tracefs/inode.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/fs/tracefs/inode.c b/fs/tracefs/inode.c index 1748dff58c3b..cfc614c638da 100644 --- a/fs/tracefs/inode.c +++ b/fs/tracefs/inode.c @@ -392,6 +392,9 @@ static int tracefs_reconfigure(struct fs_context *fc) struct tracefs_fs_info *sb_opts = sb->s_fs_info; struct tracefs_fs_info *new_opts = fc->s_fs_info; + if (!new_opts) + return 0; + sync_filesystem(sb); /* structure copy of new mount options to sb */ *sb_opts = *new_opts; @@ -478,14 +481,17 @@ static int tracefs_fill_super(struct super_block *sb, struct fs_context *fc) sb->s_op = &tracefs_super_operations; sb->s_d_op = &tracefs_dentry_operations; - tracefs_apply_options(sb, false); - return 0; } static int tracefs_get_tree(struct fs_context *fc) { - return get_tree_single(fc, tracefs_fill_super); + int err = get_tree_single(fc, tracefs_fill_super); + + if (err) + return err; + + return tracefs_reconfigure(fc); } static void tracefs_free_fc(struct fs_context *fc) -- 2.45.2

1 year, 1 month

1
0
0 0

+ signal-restore-the-override_rlimit-logic.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: signal: restore the override_rlimit logic has been added to the -mm mm-hotfixes-unstable branch. Its filename is signal-restore-the-override_rlimit-logic.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Roman Gushchin <roman.gushchin(a)linux.dev> Subject: signal: restore the override_rlimit logic Date: Thu, 31 Oct 2024 20:04:38 +0000 Prior to commit d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") UCOUNT_RLIMIT_SIGPENDING rlimit was not enforced for a class of signals. However now it's enforced unconditionally, even if override_rlimit is set. This behavior change caused production issues. For example, if the limit is reached and a process receives a SIGSEGV signal, sigqueue_alloc fails to allocate the necessary resources for the signal delivery, preventing the signal from being delivered with siginfo. This prevents the process from correctly identifying the fault address and handling the error. From the user-space perspective, applications are unaware that the limit has been reached and that the siginfo is effectively 'corrupted'. This can lead to unpredictable behavior and crashes, as we observed with java applications. Fix this by passing override_rlimit into inc_rlimit_get_ucounts() and skip the comparison to max there if override_rlimit is set. This effectively restores the old behavior. Link: https://lkml.kernel.org/r/20241031200438.2951287-1-roman.gushchin@linux.dev Fixes: d64696905554 ("Reimplement RLIMIT_SIGPENDING on top of ucounts") Signed-off-by: Roman Gushchin <roman.gushchin(a)linux.dev> Co-developed-by: Andrei Vagin <avagin(a)google.com> Signed-off-by: Andrei Vagin <avagin(a)google.com> Cc: Kees Cook <kees(a)kernel.org> Cc: "Eric W. Biederman" <ebiederm(a)xmission.com> Cc: Alexey Gladkov <legion(a)kernel.org> Cc: Oleg Nesterov <oleg(a)redhat.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- include/linux/user_namespace.h | 3 ++- kernel/signal.c | 3 ++- kernel/ucount.c | 5 +++-- 3 files changed, 7 insertions(+), 4 deletions(-) --- a/include/linux/user_namespace.h~signal-restore-the-override_rlimit-logic +++ a/include/linux/user_namespace.h @@ -141,7 +141,8 @@ static inline long get_rlimit_value(stru long inc_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); bool dec_rlimit_ucounts(struct ucounts *ucounts, enum rlimit_type type, long v); -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type); +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit); void dec_rlimit_put_ucounts(struct ucounts *ucounts, enum rlimit_type type); bool is_rlimit_overlimit(struct ucounts *ucounts, enum rlimit_type type, unsigned long max); --- a/kernel/signal.c~signal-restore-the-override_rlimit-logic +++ a/kernel/signal.c @@ -419,7 +419,8 @@ __sigqueue_alloc(int sig, struct task_st */ rcu_read_lock(); ucounts = task_ucounts(t); - sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING); + sigpending = inc_rlimit_get_ucounts(ucounts, UCOUNT_RLIMIT_SIGPENDING, + override_rlimit); rcu_read_unlock(); if (!sigpending) return NULL; --- a/kernel/ucount.c~signal-restore-the-override_rlimit-logic +++ a/kernel/ucount.c @@ -307,7 +307,8 @@ void dec_rlimit_put_ucounts(struct ucoun do_dec_rlimit_put_ucounts(ucounts, NULL, type); } -long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type) +long inc_rlimit_get_ucounts(struct ucounts *ucounts, enum rlimit_type type, + bool override_rlimit) { /* Caller must hold a reference to ucounts */ struct ucounts *iter; @@ -316,7 +317,7 @@ long inc_rlimit_get_ucounts(struct ucoun for (iter = ucounts; iter; iter = iter->ns->ucounts) { long new = atomic_long_add_return(1, &iter->rlimit[type]); - if (new < 0 || new > max) + if (new < 0 || (!override_rlimit && (new > max))) goto unwind; if (iter == ucounts) ret = new; _ Patches currently in -mm which might be from roman.gushchin(a)linux.dev are signal-restore-the-override_rlimit-logic.patch

1 year, 1 month

1
0
0 0

[PATCH v7 bpf-next 01/10] lib/buildid: harden build ID parsing logic

by Andrii Nakryiko

Harden build ID parsing logic, adding explicit READ_ONCE() where it's important to have a consistent value read and validated just once. Also, as pointed out by Andi Kleen, we need to make sure that entire ELF note is within a page bounds, so move the overflow check up and add an extra note_size boundaries validation. Fixes tag below points to the code that moved this code into lib/buildid.c, and then subsequently was used in perf subsystem, making this code exposed to perf_event_open() users in v5.12+. Cc: stable(a)vger.kernel.org Reviewed-by: Eduard Zingerman <eddyz87(a)gmail.com> Reviewed-by: Jann Horn <jannh(a)google.com> Suggested-by: Andi Kleen <ak(a)linux.intel.com> Fixes: bd7525dacd7e ("bpf: Move stack_map_get_build_id into lib") Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org> --- lib/buildid.c | 76 +++++++++++++++++++++++++++++---------------------- 1 file changed, 44 insertions(+), 32 deletions(-) diff --git a/lib/buildid.c b/lib/buildid.c index e02b5507418b..26007cc99a38 100644 --- a/lib/buildid.c +++ b/lib/buildid.c @@ -18,31 +18,37 @@ static int parse_build_id_buf(unsigned char *build_id, const void *note_start, Elf32_Word note_size) { - Elf32_Word note_offs = 0, new_offs; - - while (note_offs + sizeof(Elf32_Nhdr) < note_size) { - Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_offs); + const char note_name[] = "GNU"; + const size_t note_name_sz = sizeof(note_name); + u64 note_off = 0, new_off, name_sz, desc_sz; + const char *data; + + while (note_off + sizeof(Elf32_Nhdr) < note_size && + note_off + sizeof(Elf32_Nhdr) > note_off /* overflow */) { + Elf32_Nhdr *nhdr = (Elf32_Nhdr *)(note_start + note_off); + + name_sz = READ_ONCE(nhdr->n_namesz); + desc_sz = READ_ONCE(nhdr->n_descsz); + + new_off = note_off + sizeof(Elf32_Nhdr); + if (check_add_overflow(new_off, ALIGN(name_sz, 4), &new_off) || + check_add_overflow(new_off, ALIGN(desc_sz, 4), &new_off) || + new_off > note_size) + break; if (nhdr->n_type == BUILD_ID && - nhdr->n_namesz == sizeof("GNU") && - !strcmp((char *)(nhdr + 1), "GNU") && - nhdr->n_descsz > 0 && - nhdr->n_descsz <= BUILD_ID_SIZE_MAX) { - memcpy(build_id, - note_start + note_offs + - ALIGN(sizeof("GNU"), 4) + sizeof(Elf32_Nhdr), - nhdr->n_descsz); - memset(build_id + nhdr->n_descsz, 0, - BUILD_ID_SIZE_MAX - nhdr->n_descsz); + name_sz == note_name_sz && + memcmp(nhdr + 1, note_name, note_name_sz) == 0 && + desc_sz > 0 && desc_sz <= BUILD_ID_SIZE_MAX) { + data = note_start + note_off + ALIGN(note_name_sz, 4); + memcpy(build_id, data, desc_sz); + memset(build_id + desc_sz, 0, BUILD_ID_SIZE_MAX - desc_sz); if (size) - *size = nhdr->n_descsz; + *size = desc_sz; return 0; } - new_offs = note_offs + sizeof(Elf32_Nhdr) + - ALIGN(nhdr->n_namesz, 4) + ALIGN(nhdr->n_descsz, 4); - if (new_offs <= note_offs) /* overflow */ - break; - note_offs = new_offs; + + note_off = new_off; } return -EINVAL; @@ -71,7 +77,7 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id, { Elf32_Ehdr *ehdr = (Elf32_Ehdr *)page_addr; Elf32_Phdr *phdr; - int i; + __u32 i, phnum; /* * FIXME @@ -80,18 +86,19 @@ static int get_build_id_32(const void *page_addr, unsigned char *build_id, */ if (ehdr->e_phoff != sizeof(Elf32_Ehdr)) return -EINVAL; + + phnum = READ_ONCE(ehdr->e_phnum); /* only supports phdr that fits in one page */ - if (ehdr->e_phnum > - (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) + if (phnum > (PAGE_SIZE - sizeof(Elf32_Ehdr)) / sizeof(Elf32_Phdr)) return -EINVAL; phdr = (Elf32_Phdr *)(page_addr + sizeof(Elf32_Ehdr)); - for (i = 0; i < ehdr->e_phnum; ++i) { + for (i = 0; i < phnum; ++i) { if (phdr[i].p_type == PT_NOTE && !parse_build_id(page_addr, build_id, size, - page_addr + phdr[i].p_offset, - phdr[i].p_filesz)) + page_addr + READ_ONCE(phdr[i].p_offset), + READ_ONCE(phdr[i].p_filesz))) return 0; } return -EINVAL; @@ -103,7 +110,7 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id, { Elf64_Ehdr *ehdr = (Elf64_Ehdr *)page_addr; Elf64_Phdr *phdr; - int i; + __u32 i, phnum; /* * FIXME @@ -112,18 +119,19 @@ static int get_build_id_64(const void *page_addr, unsigned char *build_id, */ if (ehdr->e_phoff != sizeof(Elf64_Ehdr)) return -EINVAL; + + phnum = READ_ONCE(ehdr->e_phnum); /* only supports phdr that fits in one page */ - if (ehdr->e_phnum > - (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) + if (phnum > (PAGE_SIZE - sizeof(Elf64_Ehdr)) / sizeof(Elf64_Phdr)) return -EINVAL; phdr = (Elf64_Phdr *)(page_addr + sizeof(Elf64_Ehdr)); - for (i = 0; i < ehdr->e_phnum; ++i) { + for (i = 0; i < phnum; ++i) { if (phdr[i].p_type == PT_NOTE && !parse_build_id(page_addr, build_id, size, - page_addr + phdr[i].p_offset, - phdr[i].p_filesz)) + page_addr + READ_ONCE(phdr[i].p_offset), + READ_ONCE(phdr[i].p_filesz))) return 0; } return -EINVAL; @@ -152,6 +160,10 @@ int build_id_parse(struct vm_area_struct *vma, unsigned char *build_id, page = find_get_page(vma->vm_file->f_mapping, 0); if (!page) return -EFAULT; /* page not mapped */ + if (!PageUptodate(page)) { + put_page(page); + return -EFAULT; + } ret = -EINVAL; page_addr = kmap_local_page(page); -- 2.43.5

1 year, 1 month

3
2
0 0

Re: [PATCH 6.6 000/213] 6.6.57-rc1 review

by Ivan Fay

Sent from my iPhone

1 year, 1 month

1
0
0 0

[PATCH 2/2] drm/xe: improve hibernation on igpu

by Matthew Auld

The GGTT looks to be stored inside stolen memory on igpu which is not treated as normal RAM. The core kernel skips this memory range when creating the hibernation image, therefore when coming back from hibernation the GGTT programming is lost. This seems to cause issues with broken resume where GuC FW fails to load: [drm] *ERROR* GT0: load failed: status = 0x400000A0, time = 10ms, freq = 1250MHz (req 1300MHz), done = -1 [drm] *ERROR* GT0: load failed: status: Reset = 0, BootROM = 0x50, UKernel = 0x00, MIA = 0x00, Auth = 0x01 [drm] *ERROR* GT0: firmware signature verification failed [drm] *ERROR* CRITICAL: Xe has declared device 0000:00:02.0 as wedged. Current GGTT users are kernel internal and tracked as pinned, so it should be possible to hook into the existing save/restore logic that we use for dgpu, where the actual evict is skipped but on restore we importantly restore the GGTT programming. This has been confirmed to fix hibernation on at least ADL and MTL, though likely all igpu platforms are affected. This also means we have a hole in our testing, where the existing s4 tests only really test the driver hooks, and don't go as far as actually rebooting and restoring from the hibernation image and in turn powering down RAM (and therefore losing the contents of stolen). Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3275 Signed-off-by: Matthew Auld <matthew.auld(a)intel.com> Cc: Matthew Brost <matthew.brost(a)intel.com> Cc: <stable(a)vger.kernel.org> # v6.8+ --- drivers/gpu/drm/xe/xe_bo.c | 36 ++++++++++++++------------------ drivers/gpu/drm/xe/xe_bo_evict.c | 6 ------ 2 files changed, 16 insertions(+), 26 deletions(-) diff --git a/drivers/gpu/drm/xe/xe_bo.c b/drivers/gpu/drm/xe/xe_bo.c index d79d8ef5c7d5..0ae5c8f7bab8 100644 --- a/drivers/gpu/drm/xe/xe_bo.c +++ b/drivers/gpu/drm/xe/xe_bo.c @@ -950,7 +950,10 @@ int xe_bo_restore_pinned(struct xe_bo *bo) if (WARN_ON(!xe_bo_is_pinned(bo))) return -EINVAL; - if (WARN_ON(xe_bo_is_vram(bo) || !bo->ttm.ttm)) + if (WARN_ON(xe_bo_is_vram(bo))) + return -EINVAL; + + if (WARN_ON(!bo->ttm.ttm && !xe_bo_is_stolen(bo))) return -EINVAL; if (!mem_type_is_vram(place->mem_type)) @@ -1770,6 +1773,7 @@ int xe_bo_pin_external(struct xe_bo *bo) int xe_bo_pin(struct xe_bo *bo) { + struct ttm_place *place = &(bo->placements[0]); struct xe_device *xe = xe_bo_device(bo); int err; @@ -1800,7 +1804,6 @@ int xe_bo_pin(struct xe_bo *bo) */ if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) && bo->flags & XE_BO_FLAG_INTERNAL_TEST)) { - struct ttm_place *place = &(bo->placements[0]); if (mem_type_is_vram(place->mem_type)) { xe_assert(xe, place->flags & TTM_PL_FLAG_CONTIGUOUS); @@ -1809,13 +1812,12 @@ int xe_bo_pin(struct xe_bo *bo) vram_region_gpu_offset(bo->ttm.resource)) >> PAGE_SHIFT; place->lpfn = place->fpfn + (bo->size >> PAGE_SHIFT); } + } - if (mem_type_is_vram(place->mem_type) || - bo->flags & XE_BO_FLAG_GGTT) { - spin_lock(&xe->pinned.lock); - list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present); - spin_unlock(&xe->pinned.lock); - } + if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) { + spin_lock(&xe->pinned.lock); + list_add_tail(&bo->pinned_link, &xe->pinned.kernel_bo_present); + spin_unlock(&xe->pinned.lock); } ttm_bo_pin(&bo->ttm); @@ -1863,24 +1865,18 @@ void xe_bo_unpin_external(struct xe_bo *bo) void xe_bo_unpin(struct xe_bo *bo) { + struct ttm_place *place = &(bo->placements[0]); struct xe_device *xe = xe_bo_device(bo); xe_assert(xe, !bo->ttm.base.import_attach); xe_assert(xe, xe_bo_is_pinned(bo)); - if (IS_DGFX(xe) && !(IS_ENABLED(CONFIG_DRM_XE_DEBUG) && - bo->flags & XE_BO_FLAG_INTERNAL_TEST)) { - struct ttm_place *place = &(bo->placements[0]); - - if (mem_type_is_vram(place->mem_type) || - bo->flags & XE_BO_FLAG_GGTT) { - spin_lock(&xe->pinned.lock); - xe_assert(xe, !list_empty(&bo->pinned_link)); - list_del_init(&bo->pinned_link); - spin_unlock(&xe->pinned.lock); - } + if (mem_type_is_vram(place->mem_type) || bo->flags & XE_BO_FLAG_GGTT) { + spin_lock(&xe->pinned.lock); + xe_assert(xe, !list_empty(&bo->pinned_link)); + list_del_init(&bo->pinned_link); + spin_unlock(&xe->pinned.lock); } - ttm_bo_unpin(&bo->ttm); } diff --git a/drivers/gpu/drm/xe/xe_bo_evict.c b/drivers/gpu/drm/xe/xe_bo_evict.c index 32043e1e5a86..b01bc20eb90b 100644 --- a/drivers/gpu/drm/xe/xe_bo_evict.c +++ b/drivers/gpu/drm/xe/xe_bo_evict.c @@ -34,9 +34,6 @@ int xe_bo_evict_all(struct xe_device *xe) u8 id; int ret; - if (!IS_DGFX(xe)) - return 0; - /* User memory */ for (mem_type = XE_PL_VRAM0; mem_type <= XE_PL_VRAM1; ++mem_type) { struct ttm_resource_manager *man = @@ -125,9 +122,6 @@ int xe_bo_restore_kernel(struct xe_device *xe) struct xe_bo *bo; int ret; - if (!IS_DGFX(xe)) - return 0; - spin_lock(&xe->pinned.lock); for (;;) { bo = list_first_entry_or_null(&xe->pinned.evicted, -- 2.47.0

1 year, 1 month

2
1
0 0

Re: Patch "lib: alloc_tag_module_unload must wait for pending kfree_rcu calls" has been added to the 6.11-stable tree

by Suren Baghdasaryan

On Fri, Nov 1, 2024 at 6:58 AM Sasha Levin <sashal(a)kernel.org> wrote: > > This is a note to let you know that I've just added the patch titled > > lib: alloc_tag_module_unload must wait for pending kfree_rcu calls > > to the 6.11-stable tree which can be found at: > http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum… > > The filename of the patch is: > lib-alloc_tag_module_unload-must-wait-for-pending-kf.patch > and it can be found in the queue-6.11 subdirectory. Thanks Sasha! Could you please double-check that the prerequisite patch https://lore.kernel.org/all/20241021171003.2907935-1-surenb@google.com/ was also picked up? I don't see it in the queue-6.11 directory. Without that patch this one will cause build errors, that's why I sent them as a patchset. Thanks, Suren. > > If you, or anyone else, feels it should not be added to the stable tree, > please let <stable(a)vger.kernel.org> know about it. > > > > commit 536dfe685ebd28b27ebfbc3d4b9168207b7e28a3 > Author: Florian Westphal <fw(a)strlen.de> > Date: Mon Oct 7 22:52:24 2024 +0200 > > lib: alloc_tag_module_unload must wait for pending kfree_rcu calls > > [ Upstream commit dc783ba4b9df3fb3e76e968b2cbeb9960069263c ] > > Ben Greear reports following splat: > ------------[ cut here ]------------ > net/netfilter/nf_nat_core.c:1114 module nf_nat func:nf_nat_register_fn has 256 allocated at module unload > WARNING: CPU: 1 PID: 10421 at lib/alloc_tag.c:168 alloc_tag_module_unload+0x22b/0x3f0 > Modules linked in: nf_nat(-) btrfs ufs qnx4 hfsplus hfs minix vfat msdos fat > ... > Hardware name: Default string Default string/SKYBAY, BIOS 5.12 08/04/2020 > RIP: 0010:alloc_tag_module_unload+0x22b/0x3f0 > codetag_unload_module+0x19b/0x2a0 > ? codetag_load_module+0x80/0x80 > > nf_nat module exit calls kfree_rcu on those addresses, but the free > operation is likely still pending by the time alloc_tag checks for leaks. > > Wait for outstanding kfree_rcu operations to complete before checking > resolves this warning. > > Reproducer: > unshare -n iptables-nft -t nat -A PREROUTING -p tcp > grep nf_nat /proc/allocinfo # will list 4 allocations > rmmod nft_chain_nat > rmmod nf_nat # will WARN. > > [akpm(a)linux-foundation.org: add comment] > Link: https://lkml.kernel.org/r/20241007205236.11847-1-fw@strlen.de > Fixes: a473573964e5 ("lib: code tagging module support") > Signed-off-by: Florian Westphal <fw(a)strlen.de> > Reported-by: Ben Greear <greearb(a)candelatech.com> > Closes: https://lore.kernel.org/netdev/bdaaef9d-4364-4171-b82b-bcfc12e207eb@candela… > Cc: Uladzislau Rezki <urezki(a)gmail.com> > Cc: Vlastimil Babka <vbabka(a)suse.cz> > Cc: Suren Baghdasaryan <surenb(a)google.com> > Cc: Kent Overstreet <kent.overstreet(a)linux.dev> > Cc: <stable(a)vger.kernel.org> > Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> > Signed-off-by: Sasha Levin <sashal(a)kernel.org> > > diff --git a/lib/codetag.c b/lib/codetag.c > index afa8a2d4f3173..d1fbbb7c2ec3d 100644 > --- a/lib/codetag.c > +++ b/lib/codetag.c > @@ -228,6 +228,9 @@ bool codetag_unload_module(struct module *mod) > if (!mod) > return true; > > + /* await any module's kfree_rcu() operations to complete */ > + kvfree_rcu_barrier(); > + > mutex_lock(&codetag_lock); > list_for_each_entry(cttype, &codetag_types, link) { > struct codetag_module *found = NULL;

1 year, 1 month

1
0
0 0

[PATCH] nvme: rdma: Add check for queue in nvmet_rdma_cm_handler()

by George Rurikov

From: MrRurikov <grurikov(a)gmal.com> After having been assigned to a NULL value at rdma.c:1758, pointer 'queue' is passed as 1st parameter in call to function 'nvmet_rdma_queue_established' at rdma.c:1773, as 1st parameter in call to function 'nvmet_rdma_queue_disconnect' at rdma.c:1787 and as 2nd parameter in call to function 'nvmet_rdma_queue_connect_fail' at rdma.c:1800, where it is dereferenced. I understand, that driver is confident that the RDMA_CM_EVENT_CONNECT_REQUEST event will occur first and perform initialization, but maliciously prepared hardware could send events in violation of the protocol. Nothing guarantees that the sequence of events will start with RDMA_CM_EVENT_CONNECT_REQUEST. Found by Linux Verification Center (linuxtesting.org) with SVACE Fixes: e1a2ee249b19 ("nvmet-rdma: Fix use after free in nvmet_rdma_cm_handler()") Cc: stable(a)vger.kernel.org Signed-off-by: George Rurikov <g.ryurikov(a)securitycode.ru> --- drivers/nvme/target/rdma.c | 18 ++++++++++++------ 1 file changed, 12 insertions(+), 6 deletions(-) diff --git a/drivers/nvme/target/rdma.c b/drivers/nvme/target/rdma.c index 1b6264fa5803..becebc95f349 100644 --- a/drivers/nvme/target/rdma.c +++ b/drivers/nvme/target/rdma.c @@ -1770,8 +1770,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, ret = nvmet_rdma_queue_connect(cm_id, event); break; case RDMA_CM_EVENT_ESTABLISHED: - nvmet_rdma_queue_established(queue); - break; + if (!queue) { + nvmet_rdma_queue_established(queue); + break; + } case RDMA_CM_EVENT_ADDR_CHANGE: if (!queue) { struct nvmet_rdma_port *port = cm_id->context; @@ -1782,8 +1784,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, fallthrough; case RDMA_CM_EVENT_DISCONNECTED: case RDMA_CM_EVENT_TIMEWAIT_EXIT: - nvmet_rdma_queue_disconnect(queue); - break; + if (!queue) { + nvmet_rdma_queue_disconnect(queue); + break; + } case RDMA_CM_EVENT_DEVICE_REMOVAL: ret = nvmet_rdma_device_removal(cm_id, queue); break; @@ -1793,8 +1797,10 @@ static int nvmet_rdma_cm_handler(struct rdma_cm_id *cm_id, fallthrough; case RDMA_CM_EVENT_UNREACHABLE: case RDMA_CM_EVENT_CONNECT_ERROR: - nvmet_rdma_queue_connect_fail(cm_id, queue); - break; + if (!queue) { + nvmet_rdma_queue_connect_fail(cm_id, queue); + break; + } default: pr_err("received unrecognized RDMA CM event %d\n", event->event); -- 2.34.1

1 year, 1 month

3
3
0 0

Passenger traffic 2024

by Camille Batiste

Hey, Would you be interested in acquiring the attendees list of Passenger traffic Expo 2024? List contains: Names, Titles, Phone Numbers, Company Details, and more… Interested? Let me know so that I’ll send you the pricing for the same. Kind Regards, Camille Batiste Marketing Executive If you do not wish to receive our emails, please reply with "Not Interested."

1 year, 1 month

1
0
0 0

[PATCH 0/1: 5.10/5.15] net: bridge: xmit: make sure we have at least eth header len bytes

by Randy.MacLeod＠windriver.com

From: Randy MacLeod <Randy.MacLeod(a)windriver.com> This is my first commit to -stable so I'm going to carefully explain what I've done. I work on the Yocto Project and I have done some work on the Linux network stack a long time ago so I'm not quite a complete newbie. I took the commit found here: https://lore.kernel.org/stable/20240527185645.658299380@linuxfoundation.org/ and backported as per my commit log: Based on above commit but simplified since pskb_may_pull_reason() does not exist until 6.1. I also trimmed the original commit log of the "Tested by dropwatch" section as well as the full stack trace since that may have changed in 5.10/5.15 and It compiles fine for 5.10 and 5.15 but I have not tested with dropwatch since the patch is just dropping short xmit packets for bridging. Finally, since the patch is much simpler than the original, I've removed the original patch author's SOB line. Please let me know if any of this is not what y'all'd like to see. Randy MacLeod (1): net: bridge: xmit: make sure we have at least eth header len bytes net/bridge/br_device.c | 5 +++++ 1 file changed, 5 insertions(+) base-commit: 5a8fa04b2a4de1d52be4a04690dcb52ac7998943 -- 2.34.1

1 year, 1 month

1
1
0 0

[PATCH v2] drm/bridge: Fix assignment of the of_node of the parent to aux bridge

by Abel Vesa

The assignment of the of_node to the aux bridge needs to mark the of_node as reused as well, otherwise resource providers like pinctrl will report a gpio as already requested by a different device when both pinconf and gpios property are present. Fix that by using the device_set_of_node_from_dev() helper instead. Fixes: 6914968a0b52 ("drm/bridge: properly refcount DT nodes in aux bridge drivers") Cc: stable(a)vger.kernel.org # 6.8 Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org> Signed-off-by: Abel Vesa <abel.vesa(a)linaro.org> --- Changes in v2: - Re-worded commit to be more explicit of what it fixes, as Johan suggested - Used device_set_of_node_from_dev() helper, as per Johan's suggestion - Added Fixes tag and cc'ed stable - Link to v1: https://lore.kernel.org/r/20241017-drm-aux-bridge-mark-of-node-reused-v1-1-… --- drivers/gpu/drm/bridge/aux-bridge.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/bridge/aux-bridge.c b/drivers/gpu/drm/bridge/aux-bridge.c index b29980f95379ec7af873ed6e0fb79a9abb663c7b..295e9d031e2dc86cbfd2a7350718fca181c99487 100644 --- a/drivers/gpu/drm/bridge/aux-bridge.c +++ b/drivers/gpu/drm/bridge/aux-bridge.c @@ -58,9 +58,10 @@ int drm_aux_bridge_register(struct device *parent) adev->id = ret; adev->name = "aux_bridge"; adev->dev.parent = parent; - adev->dev.of_node = of_node_get(parent->of_node); adev->dev.release = drm_aux_bridge_release; + device_set_of_node_from_dev(&adev->dev, parent); + ret = auxiliary_device_init(adev); if (ret) { ida_free(&drm_aux_bridge_ida, adev->id); --- base-commit: d61a00525464bfc5fe92c6ad713350988e492b88 change-id: 20241017-drm-aux-bridge-mark-of-node-reused-5c2ee740ff19 Best regards, -- Abel Vesa <abel.vesa(a)linaro.org>

1 year, 1 month

6
25
0 0

[PATCH] vfio/qat: fix overflow check in qat_vf_resume_write()

by Giovanni Cabiddu

The unsigned variable `size_t len` is cast to the signed type `loff_t` when passed to the function check_add_overflow(). This function considers the type of the destination, which is of type loff_t (signed), potentially leading to an overflow. This issue is similar to the one described in the link below. Remove the cast. Note that even if check_add_overflow() is bypassed, by setting `len` to a value that is greater than LONG_MAX (which is considered as a negative value after the cast), the function copy_from_user(), invoked a few lines later, will not perform any copy and return `len` as (len > INT_MAX) causing qat_vf_resume_write() to fail with -EFAULT. Fixes: bb208810b1ab ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices") CC: stable(a)vger.kernel.org # 6.10+ Link: https://lore.kernel.org/all/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mou… Reported-by: Zijie Zhao <zzjas98(a)gmail.com> Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com> Reviewed-by: Xin Zeng <xin.zeng(a)intel.com> --- drivers/vfio/pci/qat/main.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/vfio/pci/qat/main.c b/drivers/vfio/pci/qat/main.c index e36740a282e7..1e3563fe7cab 100644 --- a/drivers/vfio/pci/qat/main.c +++ b/drivers/vfio/pci/qat/main.c @@ -305,7 +305,7 @@ static ssize_t qat_vf_resume_write(struct file *filp, const char __user *buf, offs = &filp->f_pos; if (*offs < 0 || - check_add_overflow((loff_t)len, *offs, &end)) + check_add_overflow(len, *offs, &end)) return -EOVERFLOW; if (end > mig_dev->state_size) -- 2.47.0

1 year, 1 month

2
1
0 0

[PATCH v2 1/5] xhci: Combine two if statements for Etron xHCI host

by Kuangyi Chiang

Combine two if statements, because these hosts have the same quirk flags applied. Fixes: 91f7a1524a92 ("xhci: Apply broken streams quirk to Etron EJ188 xHCI host") Cc: <stable(a)vger.kernel.org> Signed-off-by: Kuangyi Chiang <ki.chiang65(a)gmail.com> --- drivers/usb/host/xhci-pci.c | 8 ++------ 1 file changed, 2 insertions(+), 6 deletions(-) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 7e538194a0a4..33a6d99afc10 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -395,12 +395,8 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) xhci->quirks |= XHCI_DEFAULT_PM_RUNTIME_ALLOW; if (pdev->vendor == PCI_VENDOR_ID_ETRON && - pdev->device == PCI_DEVICE_ID_EJ168) { - xhci->quirks |= XHCI_RESET_ON_RESUME; - xhci->quirks |= XHCI_BROKEN_STREAMS; - } - if (pdev->vendor == PCI_VENDOR_ID_ETRON && - pdev->device == PCI_DEVICE_ID_EJ188) { + (pdev->device == PCI_DEVICE_ID_EJ168 || + pdev->device == PCI_DEVICE_ID_EJ188)) { xhci->quirks |= XHCI_RESET_ON_RESUME; xhci->quirks |= XHCI_BROKEN_STREAMS; } -- 2.25.1

1 year, 1 month

2
3
0 0

[PATCH 6.6] rcu-tasks: Fix access non-existent percpu rtpcp variable in rcu_tasks_need_gpcb()

by liwei

From: Zqiang <qiang.zhang1211(a)gmail.com> [ Upstream commit fd70e9f1d85f5323096ad313ba73f5fe3d15ea41 ] For kernels built with CONFIG_FORCE_NR_CPUS=y, the nr_cpu_ids is defined as NR_CPUS instead of the number of possible cpus, this will cause the following system panic: smpboot: Allowing 4 CPUs, 0 hotplug CPUs ... setup_percpu: NR_CPUS:512 nr_cpumask_bits:512 nr_cpu_ids:512 nr_node_ids:1 ... BUG: unable to handle page fault for address: ffffffff9911c8c8 Oops: 0000 [#1] PREEMPT SMP PTI CPU: 0 PID: 15 Comm: rcu_tasks_trace Tainted: G W 6.6.21 #1 5dc7acf91a5e8e9ac9dcfc35bee0245691283ea6 RIP: 0010:rcu_tasks_need_gpcb+0x25d/0x2c0 RSP: 0018:ffffa371c00a3e60 EFLAGS: 00010082 CR2: ffffffff9911c8c8 CR3: 000000040fa20005 CR4: 00000000001706f0 Call Trace: <TASK> ? __die+0x23/0x80 ? page_fault_oops+0xa4/0x180 ? exc_page_fault+0x152/0x180 ? asm_exc_page_fault+0x26/0x40 ? rcu_tasks_need_gpcb+0x25d/0x2c0 ? __pfx_rcu_tasks_kthread+0x40/0x40 rcu_tasks_one_gp+0x69/0x180 rcu_tasks_kthread+0x94/0xc0 kthread+0xe8/0x140 ? __pfx_kthread+0x40/0x40 ret_from_fork+0x34/0x80 ? __pfx_kthread+0x40/0x40 ret_from_fork_asm+0x1b/0x80 </TASK> Considering that there may be holes in the CPU numbers, use the maximum possible cpu number, instead of nr_cpu_ids, for configuring enqueue and dequeue limits. [ neeraj.upadhyay: Fix htmldocs build error reported by Stephen Rothwell ] Closes: https://lore.kernel.org/linux-input/CALMA0xaTSMN+p4xUXkzrtR5r6k7hgoswcaXx7b… Reported-by: Zhixu Liu <zhixu.liu(a)gmail.com> Signed-off-by: Zqiang <qiang.zhang1211(a)gmail.com> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay(a)kernel.org> [liwei: adaptation context conflict] Signed-off-by: liwei <liwei728(a)huawei.com> --- kernel/rcu/tasks.h | 83 ++++++++++++++++++++++++++++++---------------- 1 file changed, 54 insertions(+), 29 deletions(-) diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h index df81506cf2bd..fbfa0f90f884 100644 --- a/kernel/rcu/tasks.h +++ b/kernel/rcu/tasks.h @@ -33,6 +33,7 @@ typedef void (*postgp_func_t)(struct rcu_tasks *rtp); * @barrier_q_head: RCU callback for barrier operation. * @rtp_blkd_tasks: List of tasks blocked as readers. * @cpu: CPU number corresponding to this entry. + * @index: Index of this CPU in rtpcp_array of the rcu_tasks structure. * @rtpp: Pointer to the rcu_tasks structure. */ struct rcu_tasks_percpu { @@ -47,6 +48,7 @@ struct rcu_tasks_percpu { struct rcu_head barrier_q_head; struct list_head rtp_blkd_tasks; int cpu; + int index; struct rcu_tasks *rtpp; }; @@ -73,6 +75,7 @@ struct rcu_tasks_percpu { * @postgp_func: This flavor's post-grace-period function (optional). * @call_func: This flavor's call_rcu()-equivalent function. * @rtpcpu: This flavor's rcu_tasks_percpu structure. + * @rtpcp_array: Array of pointers to rcu_tasks_percpu structure of CPUs in cpu_possible_mask. * @percpu_enqueue_shift: Shift down CPU ID this much when enqueuing callbacks. * @percpu_enqueue_lim: Number of per-CPU callback queues in use for enqueuing. * @percpu_dequeue_lim: Number of per-CPU callback queues in use for dequeuing. @@ -106,6 +109,7 @@ struct rcu_tasks { postgp_func_t postgp_func; call_rcu_func_t call_func; struct rcu_tasks_percpu __percpu *rtpcpu; + struct rcu_tasks_percpu **rtpcp_array; int percpu_enqueue_shift; int percpu_enqueue_lim; int percpu_dequeue_lim; @@ -179,6 +183,8 @@ module_param(rcu_task_collapse_lim, int, 0444); static int rcu_task_lazy_lim __read_mostly = 32; module_param(rcu_task_lazy_lim, int, 0444); +static int rcu_task_cpu_ids; + /* RCU tasks grace-period state for debugging. */ #define RTGS_INIT 0 #define RTGS_WAIT_WAIT_CBS 1 @@ -243,6 +249,8 @@ static void cblist_init_generic(struct rcu_tasks *rtp) unsigned long flags; int lim; int shift; + int maxcpu; + int index = 0; if (rcu_task_enqueue_lim < 0) { rcu_task_enqueue_lim = 1; @@ -252,14 +260,9 @@ static void cblist_init_generic(struct rcu_tasks *rtp) } lim = rcu_task_enqueue_lim; - if (lim > nr_cpu_ids) - lim = nr_cpu_ids; - shift = ilog2(nr_cpu_ids / lim); - if (((nr_cpu_ids - 1) >> shift) >= lim) - shift++; - WRITE_ONCE(rtp->percpu_enqueue_shift, shift); - WRITE_ONCE(rtp->percpu_dequeue_lim, lim); - smp_store_release(&rtp->percpu_enqueue_lim, lim); + rtp->rtpcp_array = kcalloc(num_possible_cpus(), sizeof(struct rcu_tasks_percpu *), GFP_KERNEL); + BUG_ON(!rtp->rtpcp_array); + for_each_possible_cpu(cpu) { struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu); @@ -273,12 +276,28 @@ static void cblist_init_generic(struct rcu_tasks *rtp) INIT_WORK(&rtpcp->rtp_work, rcu_tasks_invoke_cbs_wq); rtpcp->cpu = cpu; rtpcp->rtpp = rtp; + rtpcp->index = index; + rtp->rtpcp_array[index] = rtpcp; + index++; if (!rtpcp->rtp_blkd_tasks.next) INIT_LIST_HEAD(&rtpcp->rtp_blkd_tasks); + + maxcpu = cpu; } - pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d.\n", rtp->name, - data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim), rcu_task_cb_adjust); + rcu_task_cpu_ids = maxcpu + 1; + if (lim > rcu_task_cpu_ids) + lim = rcu_task_cpu_ids; + shift = ilog2(rcu_task_cpu_ids / lim); + if (((rcu_task_cpu_ids - 1) >> shift) >= lim) + shift++; + WRITE_ONCE(rtp->percpu_enqueue_shift, shift); + WRITE_ONCE(rtp->percpu_dequeue_lim, lim); + smp_store_release(&rtp->percpu_enqueue_lim, lim); + + pr_info("%s: Setting shift to %d and lim to %d rcu_task_cb_adjust=%d rcu_task_cpu_ids=%d.\n", + rtp->name, data_race(rtp->percpu_enqueue_shift), data_race(rtp->percpu_enqueue_lim), + rcu_task_cb_adjust, rcu_task_cpu_ids); } // Compute wakeup time for lazy callback timer. @@ -346,7 +365,7 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func, rtpcp->rtp_n_lock_retries = 0; } if (rcu_task_cb_adjust && ++rtpcp->rtp_n_lock_retries > rcu_task_contend_lim && - READ_ONCE(rtp->percpu_enqueue_lim) != nr_cpu_ids) + READ_ONCE(rtp->percpu_enqueue_lim) != rcu_task_cpu_ids) needadjust = true; // Defer adjustment to avoid deadlock. } // Queuing callbacks before initialization not yet supported. @@ -366,10 +385,10 @@ static void call_rcu_tasks_generic(struct rcu_head *rhp, rcu_callback_t func, raw_spin_unlock_irqrestore_rcu_node(rtpcp, flags); if (unlikely(needadjust)) { raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags); - if (rtp->percpu_enqueue_lim != nr_cpu_ids) { + if (rtp->percpu_enqueue_lim != rcu_task_cpu_ids) { WRITE_ONCE(rtp->percpu_enqueue_shift, 0); - WRITE_ONCE(rtp->percpu_dequeue_lim, nr_cpu_ids); - smp_store_release(&rtp->percpu_enqueue_lim, nr_cpu_ids); + WRITE_ONCE(rtp->percpu_dequeue_lim, rcu_task_cpu_ids); + smp_store_release(&rtp->percpu_enqueue_lim, rcu_task_cpu_ids); pr_info("Switching %s to per-CPU callback queuing.\n", rtp->name); } raw_spin_unlock_irqrestore(&rtp->cbs_gbl_lock, flags); @@ -440,6 +459,8 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp) int needgpcb = 0; for (cpu = 0; cpu < smp_load_acquire(&rtp->percpu_dequeue_lim); cpu++) { + if (!cpu_possible(cpu)) + continue; struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu); /* Advance and accelerate any new callbacks. */ @@ -477,7 +498,7 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp) if (rcu_task_cb_adjust && ncbs <= rcu_task_collapse_lim) { raw_spin_lock_irqsave(&rtp->cbs_gbl_lock, flags); if (rtp->percpu_enqueue_lim > 1) { - WRITE_ONCE(rtp->percpu_enqueue_shift, order_base_2(nr_cpu_ids)); + WRITE_ONCE(rtp->percpu_enqueue_shift, order_base_2(rcu_task_cpu_ids)); smp_store_release(&rtp->percpu_enqueue_lim, 1); rtp->percpu_dequeue_gpseq = get_state_synchronize_rcu(); gpdone = false; @@ -492,7 +513,9 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp) pr_info("Completing switch %s to CPU-0 callback queuing.\n", rtp->name); } if (rtp->percpu_dequeue_lim == 1) { - for (cpu = rtp->percpu_dequeue_lim; cpu < nr_cpu_ids; cpu++) { + for (cpu = rtp->percpu_dequeue_lim; cpu < rcu_task_cpu_ids; cpu++) { + if (!cpu_possible(cpu)) + continue; struct rcu_tasks_percpu *rtpcp = per_cpu_ptr(rtp->rtpcpu, cpu); WARN_ON_ONCE(rcu_segcblist_n_cbs(&rtpcp->cblist)); @@ -507,30 +530,32 @@ static int rcu_tasks_need_gpcb(struct rcu_tasks *rtp) // Advance callbacks and invoke any that are ready. static void rcu_tasks_invoke_cbs(struct rcu_tasks *rtp, struct rcu_tasks_percpu *rtpcp) { - int cpu; - int cpunext; int cpuwq; unsigned long flags; int len; + int index; struct rcu_head *rhp; struct rcu_cblist rcl = RCU_CBLIST_INITIALIZER(rcl); struct rcu_tasks_percpu *rtpcp_next; - cpu = rtpcp->cpu; - cpunext = cpu * 2 + 1; - if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) { - rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext); - cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND; - queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work); - cpunext++; - if (cpunext < smp_load_acquire(&rtp->percpu_dequeue_lim)) { - rtpcp_next = per_cpu_ptr(rtp->rtpcpu, cpunext); - cpuwq = rcu_cpu_beenfullyonline(cpunext) ? cpunext : WORK_CPU_UNBOUND; + index = rtpcp->index * 2 + 1; + if (index < num_possible_cpus()) { + rtpcp_next = rtp->rtpcp_array[index]; + if (rtpcp_next->cpu < smp_load_acquire(&rtp->percpu_dequeue_lim)) { + cpuwq = rcu_cpu_beenfullyonline(rtpcp_next->cpu) ? rtpcp_next->cpu : WORK_CPU_UNBOUND; queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work); + index++; + if (index < num_possible_cpus()) { + rtpcp_next = rtp->rtpcp_array[index]; + if (rtpcp_next->cpu < smp_load_acquire(&rtp->percpu_dequeue_lim)) { + cpuwq = rcu_cpu_beenfullyonline(rtpcp_next->cpu) ? rtpcp_next->cpu : WORK_CPU_UNBOUND; + queue_work_on(cpuwq, system_wq, &rtpcp_next->rtp_work); + } + } } } - if (rcu_segcblist_empty(&rtpcp->cblist) || !cpu_possible(cpu)) + if (rcu_segcblist_empty(&rtpcp->cblist)) return; raw_spin_lock_irqsave_rcu_node(rtpcp, flags); rcu_segcblist_advance(&rtpcp->cblist, rcu_seq_current(&rtp->tasks_gp_seq)); -- 2.25.1

1 year, 1 month

1
0
0 0

[PATCH] counter: stm32-timer-cnt: fix device_node handling in probe_encoder()

by Javier Carrasco

Device nodes accessed via of_get_compatible_child() require of_node_put() to be called when the node is no longer required to avoid leaving a reference to the node behind, leaking the resource. In this case, the usage of 'tnode' is straightforward and there are no error paths, allowing for a single of_node_put() when 'tnode' is no longer required. Cc: stable(a)vger.kernel.org Fixes: 29646ee33cc3 ("counter: stm32-timer-cnt: add checks on quadrature encoder capability") Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com> --- drivers/counter/stm32-timer-cnt.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/counter/stm32-timer-cnt.c b/drivers/counter/stm32-timer-cnt.c index 186e73d6ccb4..0d8206adccb3 100644 --- a/drivers/counter/stm32-timer-cnt.c +++ b/drivers/counter/stm32-timer-cnt.c @@ -694,6 +694,7 @@ static int stm32_timer_cnt_probe_encoder(struct device *dev, } ret = of_property_read_u32(tnode, "reg", &idx); + of_node_put(tnode); if (ret) { dev_err(dev, "Can't get index (%d)\n", ret); return ret; --- base-commit: a39230ecf6b3057f5897bc4744a790070cfbe7a8 change-id: 20241027-stm32-timer-cnt-of_node_put-8c6695e7a373 Best regards, -- Javier Carrasco <javier.carrasco.cruz(a)gmail.com>

1 year, 1 month

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror