- Linux-stable-mirror - lists.linaro.org

+ mm-huge_memory-preserve-pg_has_hwpoisoned-if-a-folio-is-split-to-0-order.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-huge_memory-preserve-pg_has_hwpoisoned-if-a-folio-is-split-to-0-order.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Zi Yan <ziy(a)nvidia.com> Subject: mm/huge_memory: preserve PG_has_hwpoisoned if a folio is split to >0 order Date: Wed, 22 Oct 2025 23:05:21 -0400 folio split clears PG_has_hwpoisoned, but the flag should be preserved in after-split folios containing pages with PG_hwpoisoned flag if the folio is split to >0 order folios. Scan all pages in a to-be-split folio to determine which after-split folios need the flag. An alternatives is to change PG_has_hwpoisoned to PG_maybe_hwpoisoned to avoid the scan and set it on all after-split folios, but resulting false positive has undesirable negative impact. To remove false positive, caller of folio_test_has_hwpoisoned() and folio_contain_hwpoisoned_page() needs to do the scan. That might be causing a hassle for current and future callers and more costly than doing the scan in the split code. More details are discussed in [1]. This issue can be exposed via: 1. splitting a has_hwpoisoned folio to >0 order from debugfs interface; 2. truncating part of a has_hwpoisoned folio in truncate_inode_partial_folio(). And later accesses to a hwpoisoned page could be possible due to the missing has_hwpoisoned folio flag. This will lead to MCE errors. Link: https://lore.kernel.org/all/CAHbLzkoOZm0PXxE9qwtF4gKR=cpRXrSrJ9V9Pm2DJexs98… [1] Link: https://lkml.kernel.org/r/20251023030521.473097-1-ziy@nvidia.com Fixes: c010d47f107f ("mm: thp: split huge page to any lower order pages") Signed-off-by: Zi Yan <ziy(a)nvidia.com> Acked-by: David Hildenbrand <david(a)redhat.com> Cc: Pankaj Raghav <kernel(a)pankajraghav.com> Reviewed-by: Yang Shi <yang(a)os.amperecomputing.com> Cc: Baolin Wang <baolin.wang(a)linux.alibaba.com> Cc: Barry Song <baohua(a)kernel.org> Cc: Dev Jain <dev.jain(a)arm.com> Cc: Jane Chu <jane.chu(a)oracle.com> Cc: Lance Yang <lance.yang(a)linux.dev> Cc: Liam Howlett <liam.howlett(a)oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com> Cc: Luis Chamberalin <mcgrof(a)kernel.org> Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org> Cc: Miaohe Lin <linmiaohe(a)huawei.com> Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com> Cc: Nico Pache <npache(a)redhat.com> Cc: Ryan Roberts <ryan.roberts(a)arm.com> Cc: Wei Yang <richard.weiyang(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/huge_memory.c | 23 ++++++++++++++++++++--- 1 file changed, 20 insertions(+), 3 deletions(-) --- a/mm/huge_memory.c~mm-huge_memory-preserve-pg_has_hwpoisoned-if-a-folio-is-split-to-0-order +++ a/mm/huge_memory.c @@ -3263,6 +3263,14 @@ bool can_split_folio(struct folio *folio caller_pins; } +static bool page_range_has_hwpoisoned(struct page *page, long nr_pages) +{ + for (; nr_pages; page++, nr_pages--) + if (PageHWPoison(page)) + return true; + return false; +} + /* * It splits @folio into @new_order folios and copies the @folio metadata to * all the resulting folios. @@ -3270,17 +3278,24 @@ bool can_split_folio(struct folio *folio static void __split_folio_to_order(struct folio *folio, int old_order, int new_order) { + /* Scan poisoned pages when split a poisoned folio to large folios */ + const bool handle_hwpoison = folio_test_has_hwpoisoned(folio) && new_order; long new_nr_pages = 1 << new_order; long nr_pages = 1 << old_order; long i; + folio_clear_has_hwpoisoned(folio); + + /* Check first new_nr_pages since the loop below skips them */ + if (handle_hwpoison && + page_range_has_hwpoisoned(folio_page(folio, 0), new_nr_pages)) + folio_set_has_hwpoisoned(folio); /* * Skip the first new_nr_pages, since the new folio from them have all * the flags from the original folio. */ for (i = new_nr_pages; i < nr_pages; i += new_nr_pages) { struct page *new_head = &folio->page + i; - /* * Careful: new_folio is not a "real" folio before we cleared PageTail. * Don't pass it around before clear_compound_head(). @@ -3322,6 +3337,10 @@ static void __split_folio_to_order(struc (1L << PG_dirty) | LRU_GEN_MASK | LRU_REFS_MASK)); + if (handle_hwpoison && + page_range_has_hwpoisoned(new_head, new_nr_pages)) + folio_set_has_hwpoisoned(new_folio); + new_folio->mapping = folio->mapping; new_folio->index = folio->index + i; @@ -3422,8 +3441,6 @@ static int __split_unmapped_folio(struct if (folio_test_anon(folio)) mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1); - folio_clear_has_hwpoisoned(folio); - /* * split to new_order one order at a time. For uniform split, * folio is split to new_order directly. _ Patches currently in -mm which might be from ziy(a)nvidia.com are mm-huge_memory-do-not-change-split_huge_page-target-order-silently.patch mm-huge_memory-preserve-pg_has_hwpoisoned-if-a-folio-is-split-to-0-order.patch

1 week, 4 days

3
2
0 0

[PATCH V2] LoongArch: Align ACPI structures if ARCH_STRICT_ALIGN enabled

by Huacai Chen

ARCH_STRICT_ALIGN is used for hardware without UAL, now it only control the -mstrict-align flag. However, ACPI structures are packed by default so will cause unaligned accesses. To avoid this, define ACPI_MISALIGNMENT_NOT_SUPPORTED in asm/acenv.h to align ACPI structures if ARCH_STRICT_ALIGN enabled. Cc: stable(a)vger.kernel.org Reported-by: Binbin Zhou <zhoubinbin(a)loongson.cn> Suggested-by: Xi Ruoyao <xry111(a)xry111.site> Suggested-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com> Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn> --- V2: Modify asm/acenv.h instead of Makefile. arch/loongarch/include/asm/acenv.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/arch/loongarch/include/asm/acenv.h b/arch/loongarch/include/asm/acenv.h index 52f298f7293b..483c955f2ae5 100644 --- a/arch/loongarch/include/asm/acenv.h +++ b/arch/loongarch/include/asm/acenv.h @@ -10,9 +10,8 @@ #ifndef _ASM_LOONGARCH_ACENV_H #define _ASM_LOONGARCH_ACENV_H -/* - * This header is required by ACPI core, but we have nothing to fill in - * right now. Will be updated later when needed. - */ +#ifdef CONFIG_ARCH_STRICT_ALIGN +#define ACPI_MISALIGNMENT_NOT_SUPPORTED +#endif /* CONFIG_ARCH_STRICT_ALIGN */ #endif /* _ASM_LOONGARCH_ACENV_H */ -- 2.47.3

1 week, 4 days

6
8
0 0

[PATCH] leds: leds-lp50xx: enable chip before any communication

by Christian Hitz

From: Christian Hitz <christian.hitz(a)bbv.ch> If a GPIO is used to control the chip's enable pin, it needs to be pulled high before any SPI communication is attempted. Split lp50xx_enable_disable() into two distinct functions to enforce correct ordering. Observe correct timing after manipulating the enable GPIO and SPI communication. Fixes: 242b81170fb8 ("leds: lp50xx: Add the LP50XX family of the RGB LED driver") Signed-off-by: Christian Hitz <christian.hitz(a)bbv.ch> Cc: stable(a)vger.kernel.org --- drivers/leds/leds-lp50xx.c | 51 +++++++++++++++++++++++++++----------- 1 file changed, 36 insertions(+), 15 deletions(-) diff --git a/drivers/leds/leds-lp50xx.c b/drivers/leds/leds-lp50xx.c index d19b6a459151..f23e9ae434e4 100644 --- a/drivers/leds/leds-lp50xx.c +++ b/drivers/leds/leds-lp50xx.c @@ -52,6 +52,8 @@ #define LP50XX_SW_RESET 0xff #define LP50XX_CHIP_EN BIT(6) +#define LP50XX_START_TIME_US 500 +#define LP50XX_RESET_TIME_US 3 /* There are 3 LED outputs per bank */ #define LP50XX_LEDS_PER_MODULE 3 @@ -374,19 +376,42 @@ static int lp50xx_reset(struct lp50xx *priv) return regmap_write(priv->regmap, priv->chip_info->reset_reg, LP50XX_SW_RESET); } -static int lp50xx_enable_disable(struct lp50xx *priv, int enable_disable) +static int lp50xx_enable(struct lp50xx *priv) { int ret; - ret = gpiod_direction_output(priv->enable_gpio, enable_disable); + if (priv->enable_gpio) { + ret = gpiod_direction_output(priv->enable_gpio, 1); + if (ret) + return ret; + + udelay(LP50XX_START_TIME_US); + } else { + ret = lp50xx_reset(priv); + if (ret) + return ret; + } + + return regmap_write(priv->regmap, LP50XX_DEV_CFG0, LP50XX_CHIP_EN); +} + +static int lp50xx_disable(struct lp50xx *priv) +{ + int ret; + + ret = regmap_write(priv->regmap, LP50XX_DEV_CFG0, 0); if (ret) return ret; - if (enable_disable) - return regmap_write(priv->regmap, LP50XX_DEV_CFG0, LP50XX_CHIP_EN); - else - return regmap_write(priv->regmap, LP50XX_DEV_CFG0, 0); + if (priv->enable_gpio) { + ret = gpiod_direction_output(priv->enable_gpio, 0); + if (ret) + return ret; + + udelay(LP50XX_RESET_TIME_US); + } + return 0; } static int lp50xx_probe_leds(struct fwnode_handle *child, struct lp50xx *priv, @@ -453,6 +478,10 @@ static int lp50xx_probe_dt(struct lp50xx *priv) return dev_err_probe(priv->dev, PTR_ERR(priv->enable_gpio), "Failed to get enable GPIO\n"); + ret = lp50xx_enable(priv); + if (ret) + return ret; + priv->regulator = devm_regulator_get(priv->dev, "vled"); if (IS_ERR(priv->regulator)) priv->regulator = NULL; @@ -550,14 +579,6 @@ static int lp50xx_probe(struct i2c_client *client) return ret; } - ret = lp50xx_reset(led); - if (ret) - return ret; - - ret = lp50xx_enable_disable(led, 1); - if (ret) - return ret; - return lp50xx_probe_dt(led); } @@ -566,7 +587,7 @@ static void lp50xx_remove(struct i2c_client *client) struct lp50xx *led = i2c_get_clientdata(client); int ret; - ret = lp50xx_enable_disable(led, 0); + ret = lp50xx_disable(led); if (ret) dev_err(led->dev, "Failed to disable chip\n"); -- 2.51.0

1 week, 4 days

2
2
0 0

[PATCH v3 06/14] iommu/mediatek: fix use-after-free on probe deferral

by Johan Hovold

The driver is dropping the references taken to the larb devices during probe after successful lookup as well as on errors. This can potentially lead to a use-after-free in case a larb device has not yet been bound to its driver so that the iommu driver probe defers. Fix this by keeping the references as expected while the iommu driver is bound. Fixes: 26593928564c ("iommu/mediatek: Add error path for loop of mm_dts_parse") Cc: stable(a)vger.kernel.org Cc: Yong Wu <yong.wu(a)mediatek.com> Acked-by: Robin Murphy <robin.murphy(a)arm.com> Signed-off-by: Johan Hovold <johan(a)kernel.org> --- drivers/iommu/mtk_iommu.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index 8d8e85186188..64ce041238fd 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -1213,16 +1213,19 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m } component_match_add(dev, match, component_compare_dev, &plarbdev->dev); - platform_device_put(plarbdev); } - if (!frst_avail_smicomm_node) - return -EINVAL; + if (!frst_avail_smicomm_node) { + ret = -EINVAL; + goto err_larbdev_put; + } pcommdev = of_find_device_by_node(frst_avail_smicomm_node); of_node_put(frst_avail_smicomm_node); - if (!pcommdev) - return -ENODEV; + if (!pcommdev) { + ret = -ENODEV; + goto err_larbdev_put; + } data->smicomm_dev = &pcommdev->dev; link = device_link_add(data->smicomm_dev, dev, @@ -1230,7 +1233,8 @@ static int mtk_iommu_mm_dts_parse(struct device *dev, struct component_match **m platform_device_put(pcommdev); if (!link) { dev_err(dev, "Unable to link %s.\n", dev_name(data->smicomm_dev)); - return -EINVAL; + ret = -EINVAL; + goto err_larbdev_put; } return 0; @@ -1402,8 +1406,12 @@ static int mtk_iommu_probe(struct platform_device *pdev) iommu_device_sysfs_remove(&data->iommu); out_list_del: list_del(&data->list); - if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) + if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) { device_link_remove(data->smicomm_dev, dev); + + for (i = 0; i < MTK_LARB_NR_MAX; i++) + put_device(data->larb_imu[i].dev); + } out_runtime_disable: pm_runtime_disable(dev); return ret; @@ -1423,6 +1431,9 @@ static void mtk_iommu_remove(struct platform_device *pdev) if (MTK_IOMMU_IS_TYPE(data->plat_data, MTK_IOMMU_TYPE_MM)) { device_link_remove(data->smicomm_dev, &pdev->dev); component_master_del(&pdev->dev, &mtk_iommu_com_ops); + + for (i = 0; i < MTK_LARB_NR_MAX; i++) + put_device(data->larb_imu[i].dev); } pm_runtime_disable(&pdev->dev); for (i = 0; i < data->plat_data->banks_num; i++) { -- 2.49.1

1 week, 4 days

3
2
0 0

[PATCH v2 0/5] mm, swap: misc cleanup and bugfix

by Kairui Song

A few cleanups and a bugfix that are either suitable after the swap table phase I or found during code review. Patch 1 is a bugfix and needs to be included in the stable branch, the rest have no behavior change. --- Changes in v2: - Update commit message for patch 1, it's a sub-optimal fix and a better fix can be done later. [ Chris Li ] - Fix a lock balance issue in patch 1. [ YoungJun Park ] - Add a trivial cleanup patch to remove an unused argument, no behavior change. - Update kernel doc. - Fix minor issue with commit message [ Nhat Pham ] - Link to v1: https://lore.kernel.org/r/20251007-swap-clean-after-swap-table-p1-v1-0-7486… --- Kairui Song (5): mm, swap: do not perform synchronous discard during allocation mm, swap: rename helper for setup bad slots mm, swap: cleanup swap entry allocation parameter mm/migrate, swap: drop usage of folio_index mm, swap: remove redundant argument for isolating a cluster include/linux/swap.h | 4 +-- mm/migrate.c | 4 +-- mm/shmem.c | 2 +- mm/swap.h | 21 ---------------- mm/swapfile.c | 71 +++++++++++++++++++++++++++++++++++----------------- mm/vmscan.c | 4 +-- 6 files changed, 55 insertions(+), 51 deletions(-) --- base-commit: 5b5c3e53c939318f6a0698c895c7ec40758bff6a change-id: 20251007-swap-clean-after-swap-table-p1-b9a7635ee3fa Best regards, -- Kairui Song <kasong(a)tencent.com>

1 week, 5 days

3
3
0 0

[PATCH v2 6.1.y] drm/amdgpu: Fix potential out-of-bounds access in 'amdgpu_discovery_reg_base_init()'

by Vasiliy Kovalev

From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com> commit cdb637d339572398821204a1142d8d615668f1e9 upstream. The issue arises when the array 'adev->vcn.vcn_config' is accessed before checking if the index 'adev->vcn.num_vcn_inst' is within the bounds of the array. The fix involves moving the bounds check before the array access. This ensures that 'adev->vcn.num_vcn_inst' is within the bounds of the array before it is used as an index. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c:1289 amdgpu_discovery_reg_base_init() error: testing array offset 'adev->vcn.num_vcn_inst' after use. Fixes: a0ccc717c4ab ("drm/amdgpu/discovery: validate VCN and SDMA instances") Cc: Christian König <christian.koenig(a)amd.com> Cc: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com> Reviewed-by: Alex Deucher <alexander.deucher(a)amd.com> Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com> [ kovalev: bp to fix CVE-2024-27042 ] Signed-off-by: Vasiliy Kovalev <kovalev(a)altlinux.org> --- v2: Added braces around the single statement in the `else` block to comply with kernel coding style. --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 11 ++++++----- 1 file changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c index d8441e273cb5..9274ac361203 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c @@ -1128,15 +1128,16 @@ static int amdgpu_discovery_reg_base_init(struct amdgpu_device *adev) * 0b10 : encode is disabled * 0b01 : decode is disabled */ - adev->vcn.vcn_config[adev->vcn.num_vcn_inst] = - ip->revision & 0xc0; - ip->revision &= ~0xc0; - if (adev->vcn.num_vcn_inst < AMDGPU_MAX_VCN_INSTANCES) + if (adev->vcn.num_vcn_inst < AMDGPU_MAX_VCN_INSTANCES) { + adev->vcn.vcn_config[adev->vcn.num_vcn_inst] = + ip->revision & 0xc0; adev->vcn.num_vcn_inst++; - else + } else { dev_err(adev->dev, "Too many VCN instances: %d vs %d\n", adev->vcn.num_vcn_inst + 1, AMDGPU_MAX_VCN_INSTANCES); + } + ip->revision &= ~0xc0; } if (le16_to_cpu(ip->hw_id) == SDMA0_HWID || le16_to_cpu(ip->hw_id) == SDMA1_HWID || -- 2.50.1

1 week, 5 days

1
0
0 0

[PATCH v2 0/4] arm64: dts: marvell: cn913x-solidrun: fix sata ports status

by Josua Mayer

The SolidRun CN9130 SoC based boards have a variety of functional problems, in particular - SATA ports - CN9132 CEX-7 eMMC - CN9132 Clearfog PCI-E x2 / x4 ports are not functional. The SATA issue was recently introduced via changes to the armada-cp11x.dtsi, wheras the eMMC and SPI problems were present in the board dts from the very beginning. This patch-set aims to resolve the problems after testing on Debian 13 release (Linux v6.12). Signed-off-by: Josua Mayer <josua(a)solid-run.com> --- Changes in v2: - fixed mistakes in the original board device-trees that caused functional issues with eMMC and pci. - Link to v1: https://lore.kernel.org/r/20250911-cn913x-sr-fix-sata-v1-1-9e72238d0988@sol… --- Josua Mayer (4): arm64: dts: marvell: cn913x-solidrun: fix sata ports status arm64: dts: marvell: cn9132-clearfog: disable eMMC high-speed modes arm64: dts: marvell: cn9132-clearfog: fix multi-lane pci x2 and x4 ports arm64: dts: marvell: cn9130-sr-som: add missing properties to emmc arch/arm64/boot/dts/marvell/cn9130-cf.dtsi | 7 ++++--- arch/arm64/boot/dts/marvell/cn9130-sr-som.dtsi | 2 ++ arch/arm64/boot/dts/marvell/cn9131-cf-solidwan.dts | 6 ++++-- arch/arm64/boot/dts/marvell/cn9132-clearfog.dts | 22 ++++++++++++++++------ arch/arm64/boot/dts/marvell/cn9132-sr-cex7.dtsi | 8 ++++++++ 5 files changed, 34 insertions(+), 11 deletions(-) --- base-commit: 8f5ae30d69d7543eee0d70083daf4de8fe15d585 change-id: 20250911-cn913x-sr-fix-sata-5c737ebdb97f Best regards, -- Josua Mayer <josua(a)solid-run.com>

1 week, 5 days

3
13
0 0

[PATCH bpf-next 1/3] ftrace: Fix BPF fexit with livepatch

by Song Liu

When livepatch is attached to the same function as bpf trampoline with a fexit program, bpf trampoline code calls register_ftrace_direct() twice. The first time will fail with -EAGAIN, and the second time it will succeed. This requires register_ftrace_direct() to unregister the address on the first attempt. Otherwise, the bpf trampoline cannot attach. Here is an easy way to reproduce this issue: insmod samples/livepatch/livepatch-sample.ko bpftrace -e 'fexit:cmdline_proc_show {}' ERROR: Unable to attach probe: fexit:vmlinux:cmdline_proc_show... Fix this by cleaning up the hash when register_ftrace_function_nolock hits errors. Fixes: d05cb470663a ("ftrace: Fix modification of direct_function hash while in use") Cc: stable(a)vger.kernel.org # v6.6+ Reported-by: Andrey Grodzovsky <andrey.grodzovsky(a)crowdstrike.com> Closes: https://lore.kernel.org/live-patching/c5058315a39d4615b333e485893345be@crow… Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org> Cc: Masami Hiramatsu (Google) <mhiramat(a)kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky(a)crowdstrike.com> Signed-off-by: Song Liu <song(a)kernel.org> --- kernel/trace/ftrace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 42bd2ba68a82..7f432775a6b5 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -6048,6 +6048,8 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr) ops->direct_call = addr; err = register_ftrace_function_nolock(ops); + if (err) + remove_direct_functions_hash(hash, addr); out_unlock: mutex_unlock(&direct_mutex); -- 2.47.3

1 week, 5 days

3
3
0 0

[PATCH v2 bpf 1/3] ftrace: Fix BPF fexit with livepatch

by Song Liu

When livepatch is attached to the same function as bpf trampoline with a fexit program, bpf trampoline code calls register_ftrace_direct() twice. The first time will fail with -EAGAIN, and the second time it will succeed. This requires register_ftrace_direct() to unregister the address on the first attempt. Otherwise, the bpf trampoline cannot attach. Here is an easy way to reproduce this issue: insmod samples/livepatch/livepatch-sample.ko bpftrace -e 'fexit:cmdline_proc_show {}' ERROR: Unable to attach probe: fexit:vmlinux:cmdline_proc_show... Fix this by cleaning up the hash when register_ftrace_function_nolock hits errors. Fixes: d05cb470663a ("ftrace: Fix modification of direct_function hash while in use") Cc: stable(a)vger.kernel.org # v6.6+ Reported-by: Andrey Grodzovsky <andrey.grodzovsky(a)crowdstrike.com> Closes: https://lore.kernel.org/live-patching/c5058315a39d4615b333e485893345be@crow… Cc: Steven Rostedt (Google) <rostedt(a)goodmis.org> Cc: Masami Hiramatsu (Google) <mhiramat(a)kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky(a)crowdstrike.com> Signed-off-by: Song Liu <song(a)kernel.org> --- kernel/trace/ftrace.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 42bd2ba68a82..7f432775a6b5 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -6048,6 +6048,8 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr) ops->direct_call = addr; err = register_ftrace_function_nolock(ops); + if (err) + remove_direct_functions_hash(hash, addr); out_unlock: mutex_unlock(&direct_mutex); -- 2.47.3

1 week, 5 days

1
0
0 0

FAILED: patch "[PATCH] x86/resctrl: Fix miscount of bandwidth event when" failed to apply to 6.1-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 6.1-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. To reproduce the conflict and resubmit, you may use the following commands: git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y git checkout FETCH_HEAD git cherry-pick -x 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 # <resolve conflicts, build, test, etc.> git commit -s git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025102053-condone-sprout-77c6@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^.. Possible dependencies: thanks, greg k-h ------------------ original commit in Linus's tree ------------------ From 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 Mon Sep 17 00:00:00 2001 From: Babu Moger <babu.moger(a)amd.com> Date: Fri, 10 Oct 2025 12:08:35 -0500 Subject: [PATCH] x86/resctrl: Fix miscount of bandwidth event when reactivating previously unavailable RMID Users can create as many monitoring groups as the number of RMIDs supported by the hardware. However, on AMD systems, only a limited number of RMIDs are guaranteed to be actively tracked by the hardware. RMIDs that exceed this limit are placed in an "Unavailable" state. When a bandwidth counter is read for such an RMID, the hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable remains set on first read after tracking re-starts and is clear on all subsequent reads as long as the RMID is tracked. resctrl miscounts the bandwidth events after an RMID transitions from the "Unavailable" state back to being tracked. This happens because when the hardware starts counting again after resetting the counter to zero, resctrl in turn compares the new count against the counter value stored from the previous time the RMID was tracked. This results in resctrl computing an event value that is either undercounting (when new counter is more than stored counter) or a mistaken overflow (when new counter is less than stored counter). Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to zero whenever the RMID is in the "Unavailable" state to ensure accurate counting after the RMID resets to zero when it starts to be tracked again. Example scenario that results in mistaken overflow ================================================== 1. The resctrl filesystem is mounted, and a task is assigned to a monitoring group. $mount -t resctrl resctrl /sys/fs/resctrl $mkdir /sys/fs/resctrl/mon_groups/test1/ $echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 21323 <- Total bytes on domain 0 "Unavailable" <- Total bytes on domain 1 Task is running on domain 0. Counter on domain 1 is "Unavailable". 2. The task runs on domain 0 for a while and then moves to domain 1. The counter starts incrementing on domain 1. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 7345357 <- Total bytes on domain 0 4545 <- Total bytes on domain 1 3. At some point, the RMID in domain 0 transitions to the "Unavailable" state because the task is no longer executing in that domain. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes "Unavailable" <- Total bytes on domain 0 434341 <- Total bytes on domain 1 4. Since the task continues to migrate between domains, it may eventually return to domain 0. $cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes 17592178699059 <- Overflow on domain 0 3232332 <- Total bytes on domain 1 In this case, the RMID on domain 0 transitions from "Unavailable" state to active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when the counter is read and begins tracking the RMID counting from 0. Subsequent reads succeed but return a value smaller than the previously saved MSR value (7345357). Consequently, the resctrl's overflow logic is triggered, it compares the previous value (7345357) with the new, smaller value and incorrectly interprets this as a counter overflow, adding a large delta. In reality, this is a false positive: the counter did not overflow but was simply reset when the RMID transitioned from "Unavailable" back to active state. Here is the text from APM [1] available from [2]. "In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the first QM_CTR read when it begins tracking an RMID that it was not previously tracking. The U bit will be zero for all subsequent reads from that RMID while it is still tracked by the hardware. Therefore, a QM_CTR read with the U bit set when that RMID is in use by a processor can be considered 0 when calculating the difference with a subsequent read." [1] AMD64 Architecture Programmer's Manual Volume 2: System Programming Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory Bandwidth (MBM). [ bp: Split commit message into smaller paragraph chunks for better consumption. ] Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature") Signed-off-by: Babu Moger <babu.moger(a)amd.com> Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com> Tested-by: Reinette Chatre <reinette.chatre(a)intel.com> Cc: stable(a)vger.kernel.org # needs adjustments for <= v6.17 Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2] diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c index c8945610d455..2cd25a0d4637 100644 --- a/arch/x86/kernel/cpu/resctrl/monitor.c +++ b/arch/x86/kernel/cpu/resctrl/monitor.c @@ -242,7 +242,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d, u32 unused, u32 rmid, enum resctrl_event_id eventid, u64 *val, void *ignored) { + struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d); int cpu = cpumask_any(&d->hdr.cpu_mask); + struct arch_mbm_state *am; u64 msr_val; u32 prmid; int ret; @@ -251,12 +253,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d, prmid = logical_rmid_to_physical_rmid(cpu, rmid); ret = __rmid_read_phys(prmid, eventid, &msr_val); - if (ret) - return ret; - *val = get_corrected_val(r, d, rmid, eventid, msr_val); + if (!ret) { + *val = get_corrected_val(r, d, rmid, eventid, msr_val); + } else if (ret == -EINVAL) { + am = get_arch_mbm_state(hw_dom, rmid, eventid); + if (am) + am->prev_msr = 0; + } - return 0; + return ret; } static int __cntr_id_read(u32 cntr_id, u64 *val)

1 week, 5 days

2
1
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror