Commits 7b8ef22ea547 ("usb: xhci: plat: Add USB phy support") and
9134c1fd0503 ("usb: xhci: plat: Add USB 3.0 phy support") added support
for looking up legacy PHYs from the sysdev devicetree node and
initialising them.
This broke drivers such as dwc3, which manage their PHYs themselves, as
the PHYs would now be initialised twice, something which specifically
can lead to resources being left enabled during suspend (e.g. with the
usb_phy_generic PHY driver).
As the dwc3 driver uses driver-name matching for the xhci platform
device, fix this by only looking up and initialising PHYs for devices
that have been matched using OF.
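The double-initialisation problem can be modelled in a few lines of user-space C. This is a simplified sketch, not kernel code. `struct model_phy`, its enable counter and every `model_*` function are hypothetical. The sketch assumes that init() takes one enable reference, that shutdown() drops one, and that only the dwc3-style parent shuts the PHY down on suspend.

```c
#include <stdbool.h>

/* Hypothetical model of a legacy USB PHY: init() takes one enable
 * reference (e.g. on a clock), shutdown() drops one. */
struct model_phy {
	int enable_count;
};

static void model_phy_init(struct model_phy *phy)
{
	phy->enable_count++;
}

static void model_phy_shutdown(struct model_phy *phy)
{
	if (phy->enable_count > 0)
		phy->enable_count--;
}

/* Models the PHY handling in xhci_plat_probe(): before the fix the PHY is
 * always initialised; with the fix only when the device was OF-matched. */
static void model_xhci_probe(struct model_phy *phy, bool of_matched, bool fixed)
{
	if (!fixed || of_matched)
		model_phy_init(phy);
}

/* dwc3-like parent: initialises the PHY itself, then registers an xhci
 * platform device that is matched by driver name (not OF), and on suspend
 * shuts the PHY down exactly once. Returns the enable count left over. */
static int model_dwc3_suspend_leftover(bool fixed)
{
	struct model_phy phy = { 0 };

	model_phy_init(&phy);                 /* dwc3 owns the PHY */
	model_xhci_probe(&phy, false, fixed); /* name-matched xhci child */
	model_phy_shutdown(&phy);             /* dwc3 suspend path */
	return phy.enable_count;
}
```

Without the fix one enable reference survives suspend. With the fix the count returns to zero.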
Note that checking that the platform device has a devicetree node would
currently be sufficient, but that could lead to subtle breakages in case
anyone ever tries to reuse an ancestor's node.
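The difference between the two checks can be sketched as follows. The device struct and both predicates are hypothetical simplifications. In the kernel, the patch calls of_match_device() against the driver's of_match_table rather than reading a flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of a platform device. */
struct model_device {
	const void *of_node;  /* devicetree node, possibly reused from an ancestor */
	bool matched_via_of;  /* true only if bound through the OF match table */
};

/* The alternative the commit message rejects: any device with a node. */
static bool model_manage_phys_by_node(const struct model_device *dev)
{
	return dev->of_node != NULL;
}

/* The approach the patch takes: only devices matched using OF. */
static bool model_manage_phys_by_match(const struct model_device *dev)
{
	return dev->matched_via_of;
}
```

A name-matched child that reuses an ancestor's node passes the node check but correctly fails the match check.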
Fixes: 7b8ef22ea547 ("usb: xhci: plat: Add USB phy support")
Fixes: 9134c1fd0503 ("usb: xhci: plat: Add USB 3.0 phy support")
Cc: stable(a)vger.kernel.org # 4.1
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Stanley Chang <stanley_chang(a)realtek.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/usb/host/xhci-plat.c | 50 +++++++++++++++++++++---------------
1 file changed, 30 insertions(+), 20 deletions(-)
diff --git a/drivers/usb/host/xhci-plat.c b/drivers/usb/host/xhci-plat.c
index 28218c8f1837..01d19d17153b 100644
--- a/drivers/usb/host/xhci-plat.c
+++ b/drivers/usb/host/xhci-plat.c
@@ -13,6 +13,7 @@
#include <linux/module.h>
#include <linux/pci.h>
#include <linux/of.h>
+#include <linux/of_device.h>
#include <linux/platform_device.h>
#include <linux/usb/phy.h>
#include <linux/slab.h>
@@ -148,7 +149,7 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s
int ret;
int irq;
struct xhci_plat_priv *priv = NULL;
-
+ bool of_match;
if (usb_disabled())
return -ENODEV;
@@ -253,16 +254,23 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s
&xhci->imod_interval);
}
- hcd->usb_phy = devm_usb_get_phy_by_phandle(sysdev, "usb-phy", 0);
- if (IS_ERR(hcd->usb_phy)) {
- ret = PTR_ERR(hcd->usb_phy);
- if (ret == -EPROBE_DEFER)
- goto disable_clk;
- hcd->usb_phy = NULL;
- } else {
- ret = usb_phy_init(hcd->usb_phy);
- if (ret)
- goto disable_clk;
+ /*
+ * Drivers such as dwc3 manages PHYs themself (and rely on driver name
+ * matching for the xhci platform device).
+ */
+ of_match = of_match_device(pdev->dev.driver->of_match_table, &pdev->dev);
+ if (of_match) {
+ hcd->usb_phy = devm_usb_get_phy_by_phandle(sysdev, "usb-phy", 0);
+ if (IS_ERR(hcd->usb_phy)) {
+ ret = PTR_ERR(hcd->usb_phy);
+ if (ret == -EPROBE_DEFER)
+ goto disable_clk;
+ hcd->usb_phy = NULL;
+ } else {
+ ret = usb_phy_init(hcd->usb_phy);
+ if (ret)
+ goto disable_clk;
+ }
}
hcd->tpl_support = of_usb_host_tpl_support(sysdev->of_node);
@@ -285,15 +293,17 @@ int xhci_plat_probe(struct platform_device *pdev, struct device *sysdev, const s
goto dealloc_usb2_hcd;
}
- xhci->shared_hcd->usb_phy = devm_usb_get_phy_by_phandle(sysdev,
- "usb-phy", 1);
- if (IS_ERR(xhci->shared_hcd->usb_phy)) {
- xhci->shared_hcd->usb_phy = NULL;
- } else {
- ret = usb_phy_init(xhci->shared_hcd->usb_phy);
- if (ret)
- dev_err(sysdev, "%s init usb3phy fail (ret=%d)\n",
- __func__, ret);
+ if (of_match) {
+ xhci->shared_hcd->usb_phy = devm_usb_get_phy_by_phandle(sysdev,
+ "usb-phy", 1);
+ if (IS_ERR(xhci->shared_hcd->usb_phy)) {
+ xhci->shared_hcd->usb_phy = NULL;
+ } else {
+ ret = usb_phy_init(xhci->shared_hcd->usb_phy);
+ if (ret)
+ dev_err(sysdev, "%s init usb3phy fail (ret=%d)\n",
+ __func__, ret);
+ }
}
xhci->shared_hcd->tpl_support = hcd->tpl_support;
--
2.41.0
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
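The arithmetic for the 4GB example can be sketched as follows. The sketch assumes 4 KiB pages, 8-byte pointers and a largest permitted order of 10. The real MAX_ORDER value, and whether it is inclusive, depends on the kernel version and configuration, so max_order is a parameter here. A 4 GiB AUX area is 1,048,576 pages. The pointer array is then 8 MiB, which is 2048 pages, or order 11. The helper names are hypothetical; model_get_order() mirrors what get_order() computes for non-zero sizes.

```c
#include <errno.h>
#include <stdint.h>

/* Assumptions for illustration: 4 KiB pages, 8-byte pointers. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_PTR_SIZE  8ULL

/* Smallest order such that (MODEL_PAGE_SIZE << order) >= size. */
static int model_get_order(uint64_t size)
{
	uint64_t pages = (size + MODEL_PAGE_SIZE - 1) / MODEL_PAGE_SIZE;
	int order = 0;

	while ((1ULL << order) < pages)
		order++;
	return order;
}

/* Sketch of the early bail-out: reject the pointer-array allocation when
 * its order exceeds max_order, instead of letting the allocator warn. */
static int model_check_aux_size(uint64_t nr_pages, int max_order)
{
	if (model_get_order(nr_pages * MODEL_PTR_SIZE) > max_order)
		return -ENOMEM;
	return 0;
}
```

Under these assumptions a 2 GiB area (order-10 array) still passes, while 4 GiB is rejected with -ENOMEM.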
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index fb1e180b5f0af..e8d82c2f07d0e 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -700,6 +700,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
watermark = 0;
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
DAMON sysfs interface's before_damos_apply callback
(damon_sysfs_before_damos_apply()), which creates the DAMOS tried
regions for each DAMOS action applied region, does not handle
allocation failure of the sysfs directory data. As a result, a NULL
pointer dereference is possible. Fix it by handling the case.
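The fixed pattern can be sketched in user space. This is a minimal sketch, not the DAMON code. It assumes a singly linked list in place of the kernel's list_head. It also uses a hypothetical fault-injection flag to simulate allocation failure. All `model_*` names are illustrative.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a DAMOS tried-region entry. */
struct model_region {
	unsigned long start, end;
	struct model_region *next;
};

struct model_regions {
	struct model_region *head;
	struct model_region **tail;
	int nr_regions;
};

/* Hypothetical switch used to simulate an allocation failure. */
static bool model_fail_alloc;

static struct model_region *model_region_alloc(unsigned long start,
					       unsigned long end)
{
	struct model_region *r;

	if (model_fail_alloc)
		return NULL;
	r = malloc(sizeof(*r));
	if (!r)
		return NULL;
	r->start = start;
	r->end = end;
	r->next = NULL;
	return r;
}

/* Same shape as the fixed callback: on allocation failure, skip the
 * region and return 0 instead of dereferencing NULL. */
static int model_before_apply(struct model_regions *rs, unsigned long start,
			      unsigned long end)
{
	struct model_region *r = model_region_alloc(start, end);

	if (!r)
		return 0;
	*rs->tail = r;
	rs->tail = &r->next;
	rs->nr_regions++;
	return 0;
}

/* Adds three regions, optionally failing the second allocation, frees
 * the list, and returns how many regions were recorded. */
static int model_scenario(bool fail_second)
{
	struct model_regions rs = { .head = NULL, .tail = &rs.head, .nr_regions = 0 };
	int nr;

	for (int i = 0; i < 3; i++) {
		model_fail_alloc = fail_second && i == 1;
		model_before_apply(&rs, i * 4096UL, (i + 1) * 4096UL);
	}
	model_fail_alloc = false;

	nr = rs.nr_regions;
	while (rs.head) {
		struct model_region *next = rs.head->next;

		free(rs.head);
		rs.head = next;
	}
	return nr;
}
```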
Fixes: f1d13cacabe1 ("mm/damon/sysfs: implement DAMOS tried regions update command")
Cc: <stable(a)vger.kernel.org> # 6.2.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/sysfs-schemes.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 7413cb35c5a9..be667236b8e6 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -1826,6 +1826,8 @@ static int damon_sysfs_before_damos_apply(struct damon_ctx *ctx,
return 0;
region = damon_sysfs_scheme_region_alloc(r);
+ if (!region)
+ return 0;
list_add_tail(&region->list, &sysfs_regions->regions_list);
sysfs_regions->nr_regions++;
if (kobject_init_and_add(&region->kobj,
--
2.34.1
damon_sysfs_update_target() returns an error code on failure, but its
caller, damon_sysfs_set_targets(), ignores it. The update function does
not appear to make any critical change on such failures, but the
resulting behavior looks as if DAMON sysfs silently ignores, or only
partially accepts, the user input. Fix it by propagating the error.
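The change in the caller amounts to propagating the first error instead of discarding it. Below is a minimal sketch of that pattern. `model_update_target()` is a hypothetical stand-in that rejects negative inputs with -EINVAL; it is not the DAMON function.

```c
#include <errno.h>

/* Hypothetical stand-in for damon_sysfs_update_target(). */
static int model_update_target(int *target, int input)
{
	if (input < 0)
		return -EINVAL;
	*target = input;
	return 0;
}

/* Sketch of the fixed caller: stop at the first failure and report it,
 * rather than silently continuing with partially applied input. */
static int model_set_targets(int *targets, const int *inputs, int nr)
{
	for (int i = 0; i < nr; i++) {
		int err = model_update_target(&targets[i], inputs[i]);

		if (err)
			return err;
	}
	return 0;
}
```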
Fixes: 19467a950b49 ("mm/damon/sysfs: remove requested targets when online-commit inputs")
Cc: <stable(a)vger.kernel.org> # 5.19.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
Note that yet another fix[1] should be applied before this.
[1] https://lore.kernel.org/all/739e6aaf-a634-4e33-98a8-16546379ec9f@moroto.mou…
mm/damon/sysfs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index 1dfa96d4de99..7472404456aa 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1203,8 +1203,10 @@ static int damon_sysfs_set_targets(struct damon_ctx *ctx,
damon_for_each_target_safe(t, next, ctx) {
if (i < sysfs_targets->nr) {
- damon_sysfs_update_target(t, ctx,
+ err = damon_sysfs_update_target(t, ctx,
sysfs_targets->targets_arr[i]);
+ if (err)
+ return err;
} else {
if (damon_target_has_pid(ctx))
put_pid(t->pid);
--
2.34.1
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 12f351b253bbb..2f6f77658eba2 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -639,6 +639,12 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
}
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index ffb59a4ef4ff3..fb3edb2f8ac93 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -653,6 +653,12 @@ int rb_alloc_aux(struct ring_buffer *rb, struct perf_event *event,
max_order--;
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 4032cd4750001..01351e7e25435 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -691,6 +691,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
max_order--;
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index f40da32f5e753..6808873555f0d 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -696,6 +696,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
watermark = 0;
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index 273a0fe7910a5..45965f13757e4 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -699,6 +699,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
watermark = 0;
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
From: Shuai Xue <xueshuai(a)linux.alibaba.com>
[ Upstream commit 54aee5f15b83437f23b2b2469bcf21bdd9823916 ]
When running perf record with a large AUX area, e.g. 4GB, it fails with:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
and triggers a WARNING in __alloc_pages():
------------[ cut here ]------------
WARNING: CPU: 44 PID: 17573 at mm/page_alloc.c:5568 __alloc_pages+0x1ec/0x248
Call trace:
__alloc_pages+0x1ec/0x248
__kmalloc_large_node+0xc0/0x1f8
__kmalloc_node+0x134/0x1e8
rb_alloc_aux+0xe0/0x298
perf_mmap+0x440/0x660
mmap_region+0x308/0x8a8
do_mmap+0x3c0/0x528
vm_mmap_pgoff+0xf4/0x1b8
ksys_mmap_pgoff+0x18c/0x218
__arm64_sys_mmap+0x38/0x58
invoke_syscall+0x50/0x128
el0_svc_common.constprop.0+0x58/0x188
do_el0_svc+0x34/0x50
el0_svc+0x34/0x108
el0t_64_sync_handler+0xb8/0xc0
el0t_64_sync+0x1a4/0x1a8
'rb->aux_pages', allocated by kcalloc(), is a pointer array used to
maintain the AUX trace pages. The memory backing this array is physically
(and virtually) contiguous, with an allocation order in the range
0..MAX_ORDER. If the size of the pointer array exceeds the limit set by
MAX_ORDER, the allocation triggers a WARNING.
So bail out early with -ENOMEM if the requested AUX area is out of
bounds, e.g.:
#perf record -C 0 -m ,4G -e arm_spe_0// -- sleep 1
failed to mmap with 12 (Cannot allocate memory)
Signed-off-by: Shuai Xue <xueshuai(a)linux.alibaba.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/ring_buffer.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index a0433f37b0243..4a260ceed9c73 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -699,6 +699,12 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
watermark = 0;
}
+ /*
+ * kcalloc_node() is unable to allocate buffer if the size is larger
+ * than: PAGE_SIZE << MAX_ORDER; directly bail out in this case.
+ */
+ if (get_order((unsigned long)nr_pages * sizeof(void *)) > MAX_ORDER)
+ return -ENOMEM;
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
node);
if (!rb->aux_pages)
--
2.42.0
From: John Stultz <jstultz(a)google.com>
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases when running the test-ww_mutex code, I was seeing
odd behavior where flush_workqueue sometimes seemed to return
before all the work threads had finished.
Often this would cause strange crashes as the mutexes would be
freed while they were being used.
Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.
Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the work
thread returns and while flush_workqueue is waiting.
It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.
So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.
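The ownership rule the patch adopts can be sketched in user space. The controller allocates all per-worker items, waits for every worker, and only then frees the array. This is a minimal sketch, not the test's code. pthread_join() stands in for flush_workqueue(). `struct model_stress_item` and the `model_*` functions are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the test's per-worker "struct stress". */
struct model_stress_item {
	pthread_t thread; /* plays the role of the embedded work_struct */
	long iterations;  /* written only by the owning worker */
};

static void *model_stress_worker(void *arg)
{
	struct model_stress_item *item = arg;

	for (int i = 0; i < 1000; i++)
		item->iterations++;
	/* No free() here: the worker does not own its item. */
	return NULL;
}

/* The controller owns the array: allocate, start workers, wait for all of
 * them, and only then free. Returns the total iterations, or -1 if
 * allocation or thread creation failed. */
static long model_run_stress(int nthreads)
{
	struct model_stress_item *items = calloc(nthreads, sizeof(*items));
	long total = 0;
	int started = 0;

	if (!items)
		return -1;

	for (int n = 0; n < nthreads; n++) {
		if (pthread_create(&items[n].thread, NULL, model_stress_worker,
				   &items[n]) != 0)
			break;
		started++;
	}

	for (int n = 0; n < started; n++) {
		pthread_join(items[n].thread, NULL);
		total += items[n].iterations;
	}

	free(items);
	return started == nthreads ? total : -1;
}
```

Because nothing frees an item until every worker has been joined, no worker's item can disappear while the controller is still waiting.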
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 654977862b06b..8489a01f943e8 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -439,7 +439,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));
kfree(order);
- kfree(stress);
}
struct reorder_lock {
@@ -504,7 +503,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}
static void stress_one_work(struct work_struct *work)
@@ -525,8 +523,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}
#define STRESS_INORDER BIT(0)
@@ -537,15 +533,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);
+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -569,9 +574,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];
INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -586,6 +589,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);
return 0;
--
2.42.0
From: John Stultz <jstultz(a)google.com>
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases when running the test-ww_mutex code, I was seeing
odd behavior where flush_workqueue sometimes seemed to return
before all the work threads had finished.
Often this would cause strange crashes as the mutexes would be
freed while they were being used.
Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.
Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the work
thread returns and while flush_workqueue is waiting.
It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.
So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 65a3b7e55b9fc..4fd05d9d5d6d1 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -439,7 +439,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));
kfree(order);
- kfree(stress);
}
struct reorder_lock {
@@ -504,7 +503,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}
static void stress_one_work(struct work_struct *work)
@@ -525,8 +523,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}
#define STRESS_INORDER BIT(0)
@@ -537,15 +533,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);
+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -569,9 +574,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];
INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -586,6 +589,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);
return 0;
--
2.42.0
From: John Stultz <jstultz(a)google.com>
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases when running the test-ww_mutex code, I was seeing
odd behavior where flush_workqueue sometimes seemed to return
before all the work threads had finished.
Often this would cause strange crashes as the mutexes would be
freed while they were being used.
Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.
Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the work
thread returns and while flush_workqueue is waiting.
It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.
So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 3e82f449b4ff7..da36997d8742c 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -426,7 +426,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));
kfree(order);
- kfree(stress);
}
struct reorder_lock {
@@ -491,7 +490,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}
static void stress_one_work(struct work_struct *work)
@@ -512,8 +510,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}
#define STRESS_INORDER BIT(0)
@@ -524,15 +520,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);
+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -556,9 +561,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];
INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -573,6 +576,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);
return 0;
--
2.42.0
From: John Stultz <jstultz(a)google.com>
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases when running the test-ww_mutex code, I was seeing
odd behavior where flush_workqueue sometimes seemed to return
before all the work threads had finished.
Often this would cause strange crashes as the mutexes would be
freed while they were being used.
Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.
Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the work
thread returns and while flush_workqueue is still waiting on it.
It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.
So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 3e82f449b4ff7..da36997d8742c 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -426,7 +426,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));
kfree(order);
- kfree(stress);
}
struct reorder_lock {
@@ -491,7 +490,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}
static void stress_one_work(struct work_struct *work)
@@ -512,8 +510,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}
#define STRESS_INORDER BIT(0)
@@ -524,15 +520,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);
+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -556,9 +561,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];
INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -573,6 +576,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);
return 0;
--
2.42.0
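The fix above follows a single-owner rule: whoever allocates the per-worker descriptors (which embed the object the waiter synchronizes on) frees them, and only after waiting for every worker. Below is a user-space sketch of that rule using POSIX threads. `struct job`, `job_fn()` and `run_jobs()` are hypothetical stand-ins for `struct stress`, the stress work functions and `stress()`, and `pthread_join()` plays the role of `flush_workqueue()`. This is an analogy, not the kernel workqueue API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-worker descriptor, loosely modelled on struct stress:
 * the handle the controller waits on lives inside the descriptor, just as
 * the work_struct lives inside struct stress. */
struct job {
	pthread_t thread;
	int id;
	long result;
};

static void *job_fn(void *arg)
{
	struct job *job = arg;

	/* Do the work, but never free(job) here: the controller still needs
	 * job->thread to wait on us (the bug the patch removes). */
	job->result = (long)job->id * 2;
	return NULL;
}

/* The controller allocates all descriptors up front, waits for every
 * worker it started, and only then frees the array.
 * Returns the sum of all results, or -1 on failure. */
static long run_jobs(int nthreads)
{
	struct job *jobs;
	long sum = 0;
	int started = 0;

	jobs = calloc(nthreads, sizeof(*jobs));
	if (!jobs)
		return -1;

	for (int i = 0; i < nthreads; i++) {
		jobs[i].id = i;
		if (pthread_create(&jobs[i].thread, NULL, job_fn, &jobs[i]) != 0)
			break;
		started++;
	}

	/* Equivalent of flush_workqueue(): wait for everything we started. */
	for (int i = 0; i < started; i++) {
		pthread_join(jobs[i].thread, NULL);
		sum += jobs[i].result;
	}

	free(jobs);	/* safe: no worker can still touch its descriptor */
	return started == nthreads ? sum : -1;
}
```

Allocating one array rather than one object per worker also makes the cleanup a single free, which is the same shape as the `stress_array` change in the diff.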
From: John Stultz <jstultz(a)google.com>
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases running with the test-ww_mutex code, I was seeing
odd behavior where sometimes it seemed flush_workqueue was
returning before all the work threads were finished.
Often this would cause strange crashes as the mutexes would be
freed while they were being used.
Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.
Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the work
thread returns and while flush_workqueue is still waiting on it.
It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.
So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 43efb2a041602..b1e25695185a4 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -466,7 +466,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));
kfree(order);
- kfree(stress);
}
struct reorder_lock {
@@ -531,7 +530,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}
static void stress_one_work(struct work_struct *work)
@@ -552,8 +550,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}
#define STRESS_INORDER BIT(0)
@@ -564,15 +560,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);
+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -596,9 +601,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];
INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -613,6 +616,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);
return 0;
--
2.42.0
From: John Stultz <jstultz(a)google.com>
[ Upstream commit bccdd808902f8c677317cec47c306e42b93b849e ]
In some cases running with the test-ww_mutex code, I was seeing
odd behavior where sometimes it seemed flush_workqueue was
returning before all the work threads were finished.
Often this would cause strange crashes as the mutexes would be
freed while they were being used.
Looking at the code, there is a lifetime problem as the
controlling thread that spawns the work allocates the
"struct stress" structures that are passed to the workqueue
threads. Then when the workqueue threads are finished,
they free the stress struct that was passed to them.
Unfortunately, the workqueue work_struct node is embedded in the
stress struct, which means the work_struct is freed before the work
thread returns and while flush_workqueue is still waiting on it.
It seems like a better idea to have the controlling thread
both allocate and free the stress structures, so that we can
be sure we don't corrupt the workqueue by freeing the structure
prematurely.
So this patch reworks the test to do so, and with this change
I no longer see the early flush_workqueue returns.
Signed-off-by: John Stultz <jstultz(a)google.com>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230922043616.19282-3-jstultz@google.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/locking/test-ww_mutex.c | 20 ++++++++++++--------
1 file changed, 12 insertions(+), 8 deletions(-)
diff --git a/kernel/locking/test-ww_mutex.c b/kernel/locking/test-ww_mutex.c
index 93cca6e698600..7c5a8f05497f2 100644
--- a/kernel/locking/test-ww_mutex.c
+++ b/kernel/locking/test-ww_mutex.c
@@ -466,7 +466,6 @@ static void stress_inorder_work(struct work_struct *work)
} while (!time_after(jiffies, stress->timeout));
kfree(order);
- kfree(stress);
}
struct reorder_lock {
@@ -531,7 +530,6 @@ static void stress_reorder_work(struct work_struct *work)
list_for_each_entry_safe(ll, ln, &locks, link)
kfree(ll);
kfree(order);
- kfree(stress);
}
static void stress_one_work(struct work_struct *work)
@@ -552,8 +550,6 @@ static void stress_one_work(struct work_struct *work)
break;
}
} while (!time_after(jiffies, stress->timeout));
-
- kfree(stress);
}
#define STRESS_INORDER BIT(0)
@@ -564,15 +560,24 @@ static void stress_one_work(struct work_struct *work)
static int stress(int nlocks, int nthreads, unsigned int flags)
{
struct ww_mutex *locks;
- int n;
+ struct stress *stress_array;
+ int n, count;
locks = kmalloc_array(nlocks, sizeof(*locks), GFP_KERNEL);
if (!locks)
return -ENOMEM;
+ stress_array = kmalloc_array(nthreads, sizeof(*stress_array),
+ GFP_KERNEL);
+ if (!stress_array) {
+ kfree(locks);
+ return -ENOMEM;
+ }
+
for (n = 0; n < nlocks; n++)
ww_mutex_init(&locks[n], &ww_class);
+ count = 0;
for (n = 0; nthreads; n++) {
struct stress *stress;
void (*fn)(struct work_struct *work);
@@ -596,9 +601,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
if (!fn)
continue;
- stress = kmalloc(sizeof(*stress), GFP_KERNEL);
- if (!stress)
- break;
+ stress = &stress_array[count++];
INIT_WORK(&stress->work, fn);
stress->locks = locks;
@@ -613,6 +616,7 @@ static int stress(int nlocks, int nthreads, unsigned int flags)
for (n = 0; n < nlocks; n++)
ww_mutex_destroy(&locks[n]);
+ kfree(stress_array);
kfree(locks);
return 0;
--
2.42.0
Recent changes to kernel_connect() and kernel_bind() ensure that
callers are insulated from changes to the address parameter made by BPF
SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and
ops->bind() with kernel_connect() and kernel_bind() to protect callers
in such cases.
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.cam…
Fixes: d74bad4e74ee ("bpf: Hooks for sys_connect")
Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jordan Rife <jrife(a)google.com>
---
fs/dlm/lowcomms.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index 67f8dd8a05ef2..6296c62c10fa9 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1817,8 +1817,8 @@ static int dlm_tcp_bind(struct socket *sock)
memcpy(&src_addr, &dlm_local_addr[0], sizeof(src_addr));
make_sockaddr(&src_addr, 0, &addr_len);
- result = sock->ops->bind(sock, (struct sockaddr *)&src_addr,
- addr_len);
+ result = kernel_bind(sock, (struct sockaddr *)&src_addr,
+ addr_len);
if (result < 0) {
/* This *may* not indicate a critical error */
log_print("could not bind for connect: %d", result);
@@ -1830,7 +1830,7 @@ static int dlm_tcp_bind(struct socket *sock)
static int dlm_tcp_connect(struct connection *con, struct socket *sock,
struct sockaddr *addr, int addr_len)
{
- return sock->ops->connect(sock, addr, addr_len, O_NONBLOCK);
+ return kernel_connect(sock, addr, addr_len, O_NONBLOCK);
}
static int dlm_tcp_listen_validate(void)
@@ -1862,8 +1862,8 @@ static int dlm_tcp_listen_bind(struct socket *sock)
/* Bind to our port */
make_sockaddr(&dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len);
- return sock->ops->bind(sock, (struct sockaddr *)&dlm_local_addr[0],
- addr_len);
+ return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0],
+ addr_len);
}
static const struct dlm_proto_ops dlm_tcp_ops = {
@@ -1888,12 +1888,12 @@ static int dlm_sctp_connect(struct connection *con, struct socket *sock,
int ret;
/*
- * Make sock->ops->connect() function return in specified time,
+ * Make kernel_connect() function return in specified time,
* since O_NONBLOCK argument in connect() function does not work here,
* then, we should restore the default value of this attribute.
*/
sock_set_sndtimeo(sock->sk, 5);
- ret = sock->ops->connect(sock, addr, addr_len, 0);
+ ret = kernel_connect(sock, addr, addr_len, 0);
sock_set_sndtimeo(sock->sk, 0);
return ret;
}
--
2.42.0.869.gea05f2083d-goog
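The point of routing through kernel_connect() and kernel_bind() is that the protocol hook only ever sees a private copy of the address, so a BPF SOCK_ADDR program rewriting it cannot clobber the caller's buffer. The user-space sketch below assumes the wrappers work by copying the address into a local `struct sockaddr_storage` first; `rewriting_connect_op()` and `connect_with_copy()` are hypothetical names, and the port values are arbitrary examples.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical protocol op that rewrites the destination port in place,
 * the way a BPF SOCK_ADDR hook is allowed to. */
static int rewriting_connect_op(struct sockaddr *addr, socklen_t addrlen)
{
	if (addr->sa_family == AF_INET && addrlen >= sizeof(struct sockaddr_in)) {
		struct sockaddr_in *sin = (struct sockaddr_in *)addr;

		sin->sin_port = htons(9999);
	}
	return 0;
}

/* Sketch of the wrapper idea: hand the op a private copy so that the
 * caller's address buffer is never modified. */
static int connect_with_copy(const struct sockaddr *addr, socklen_t addrlen)
{
	struct sockaddr_storage copy;

	if (addrlen > sizeof(copy))
		return -1;	/* refuse oversized addresses */
	memcpy(&copy, addr, addrlen);
	return rewriting_connect_op((struct sockaddr *)&copy, addrlen);
}
```

Calling `rewriting_connect_op()` directly on the caller's buffer, as the old `sock->ops->connect()` calls did, leaves the rewritten port behind; going through the copying wrapper does not.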
Greg,
The Friday before the merge window opened, I received a bug report
for the eventfs code that was in linux-next. I spent the next
5 days debugging it, which not only fixed it but also led to finding
other bugs in the code. Several of these other bugs also happen to
affect the 6.6 kernel.
The eventfs code was written in two parts to lower the complexity.
The first part added just the dynamic creation of the eventfs
file system and that was added to 6.6.
The second part went further and removed the one-to-one mapping between
dentry/inode and meta data, as all events have the same files. It replaced
the meta data for each file with callbacks, which caused quite a bit of
code churn.
As the merge window was already open, when I finished all the fixes
I just sent them on top of the linux-next changes along with
my pull request. That means there are 5 commits that are marked
stable (or should be marked for stable) that need to be applied to
6.6 but require a bit of tweaking or even a new way of implementing the fix!
After sending the pull request, I checked out 6.6 and took those
5 changes and fixed them up on top of it. I ran them through all the
tests that I run before sending to Linus.
So these should be as good as the versions of the patches in Linus's tree.
I waited until Linus pulled in those changes to send this series out.
-- Steve
Steven Rostedt (Google) (5):
tracing: Have trace_event_file have ref counters
eventfs: Remove "is_freed" union with rcu head
eventfs: Save ownership and mode
eventfs: Delete eventfs_inode when the last dentry is freed
eventfs: Use simple_recursive_removal() to clean up dentries
----
fs/tracefs/event_inode.c | 288 +++++++++++++++++++++++--------------
include/linux/trace_events.h | 4 +
kernel/trace/trace.c | 15 ++
kernel/trace/trace.h | 3 +
kernel/trace/trace_events.c | 31 +++-
kernel/trace/trace_events_filter.c | 3 +
6 files changed, 231 insertions(+), 113 deletions(-)
The patch titled
Subject: mm: fix for negative counter: nr_file_hugepages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-for-negative-counter-nr_file_hugepages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Stefan Roesch <shr(a)devkernel.io>
Subject: mm: fix for negative counter: nr_file_hugepages
Date: Mon, 6 Nov 2023 10:19:18 -0800
While qualifying the 6.4 release, the following warning was detected in
messages:
vmstat_refresh: nr_file_hugepages -15664
The warning is caused by the incorrect updating of the NR_FILE_THPS
counter in the function split_huge_page_to_list. The if case is checking
for folio_test_swapbacked, but the else case is missing the check for
folio_test_pmd_mappable. The other functions that manipulate the counter
like __filemap_add_folio and filemap_unaccount_folio have the
corresponding check.
I have a test case, which reproduces the problem. It can be found here:
https://github.com/sroeschus/testcase/blob/main/vmstat_refresh/madv.c
The test case reproduces on an XFS filesystem. Running the same test
case on a BTRFS filesystem does not reproduce the problem.
AFAIK, versions 6.1 through 6.6 are affected by this problem.
Link: https://lkml.kernel.org/r/20231106181918.1091043-1-shr@devkernel.io
Signed-off-by: Stefan Roesch <shr(a)devkernel.io>
Co-debugged-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Rik van Riel <riel(a)surriel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/huge_memory.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/huge_memory.c~mm-fix-for-negative-counter-nr_file_hugepages
+++ a/mm/huge_memory.c
@@ -2772,7 +2772,8 @@ int split_huge_page_to_list(struct page
if (folio_test_swapbacked(folio)) {
__lruvec_stat_mod_folio(folio, NR_SHMEM_THPS,
-nr);
- } else {
+ } else if (folio_test_pmd_mappable(folio)) {
+
__lruvec_stat_mod_folio(folio, NR_FILE_THPS,
-nr);
filemap_nr_thps_dec(mapping);
_
Patches currently in -mm which might be from shr(a)devkernel.io are
mm-fix-for-negative-counter-nr_file_hugepages.patch
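The underlying rule is that a counter must be decremented under exactly the same condition under which it was incremented. The simplified model below shows how an unguarded decrement drives NR_FILE_THPS negative when large but non-PMD-mappable file folios are split. All names are hypothetical; `model_pmd_mappable()` assumes the order-based test that folio_test_pmd_mappable() performs, and `MODEL_PMD_ORDER` 9 assumes 4 KiB pages with 2 MiB PMDs. Shmem (swap-backed) accounting is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: 2 MiB PMD / 4 KiB pages gives order 9. */
#define MODEL_PMD_ORDER 9

struct model_folio {
	unsigned int order;	/* folio holds 1 << order pages */
};

static long nr_file_thps;	/* stand-in for the NR_FILE_THPS stat */

static bool model_pmd_mappable(const struct model_folio *f)
{
	return f->order >= MODEL_PMD_ORDER;
}

static long model_nr_pages(const struct model_folio *f)
{
	return 1L << f->order;
}

/* Page-cache insertion only counts PMD-mappable folios. */
static void model_add_to_page_cache(const struct model_folio *f)
{
	if (model_pmd_mappable(f))
		nr_file_thps += model_nr_pages(f);
}

/* Splitting must undo exactly what insertion counted. With fixed == false
 * this mirrors the old unconditional "else" branch. */
static void model_split(const struct model_folio *f, bool fixed)
{
	if (!fixed || model_pmd_mappable(f))
		nr_file_thps -= model_nr_pages(f);
}
```

With one order-2 folio and one order-9 folio, the old logic ends at -4 (the small folio is subtracted but was never added), while the guarded version returns to 0.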
The ttyname buffer for the ledtrig_tty_data struct is allocated in the
sysfs ttyname_store() function. This buffer must be released on trigger
deactivation. This was missing, which results in a memory leak.
While we are at it, the tty reference in the ledtrig_tty_data struct should
also be released when the trigger is deactivated.
Cc: stable(a)vger.kernel.org
Fixes: fd4a641ac88f ("leds: trigger: implement a tty trigger")
Signed-off-by: Florian Eckert <fe(a)dev.tdt.de>
---
v1 -> v2:
Add Cc: tag
drivers/leds/trigger/ledtrig-tty.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/leds/trigger/ledtrig-tty.c b/drivers/leds/trigger/ledtrig-tty.c
index 8ae0d2d284af..3e69a7bde928 100644
--- a/drivers/leds/trigger/ledtrig-tty.c
+++ b/drivers/leds/trigger/ledtrig-tty.c
@@ -168,6 +168,10 @@ static void ledtrig_tty_deactivate(struct led_classdev *led_cdev)
cancel_delayed_work_sync(&trigger_data->dwork);
+ kfree(trigger_data->ttyname);
+ tty_kref_put(trigger_data->tty);
+ trigger_data->tty = NULL;
+
kfree(trigger_data);
}
--
2.30.2
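The patch makes deactivation release everything the trigger acquired over its lifetime: the ttyname string and the tty reference, after the delayed work is cancelled. The sketch below models that teardown in user space. `struct fake_tty`, `fake_ttyname_store()` and `fake_deactivate()` are hypothetical; the patch does not show where the driver takes its tty reference, so the sketch simply takes it in the store handler. Like tty_kref_put(), `fake_tty_kref_put()` tolerates NULL.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for a refcounted tty and the trigger data. */
struct fake_tty {
	int refcount;
};

struct fake_trigger_data {
	char *ttyname;		/* heap string owned by the trigger */
	struct fake_tty *tty;	/* holds one reference while non-NULL */
};

static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *p = malloc(len);

	if (p)
		memcpy(p, s, len);
	return p;
}

static void fake_tty_kref_put(struct fake_tty *tty)
{
	if (tty)
		tty->refcount--;
}

/* Store handler: replace the name and swap the held tty reference. */
static int fake_ttyname_store(struct fake_trigger_data *td,
			      struct fake_tty *tty, const char *name)
{
	char *copy = dup_string(name);

	if (!copy)
		return -1;
	free(td->ttyname);
	td->ttyname = copy;
	fake_tty_kref_put(td->tty);
	tty->refcount++;
	td->tty = tty;
	return 0;
}

/* Deactivation: release what the store handler acquired, then the data.
 * (The real driver cancels its delayed work before this point.) */
static void fake_deactivate(struct fake_trigger_data *td)
{
	free(td->ttyname);
	fake_tty_kref_put(td->tty);
	td->tty = NULL;
	free(td);
}
```

Without the two release calls, the string leaks and the tty refcount never drops back, which is the situation the patch describes.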
On Mon, Nov 06, 2023 at 01:18:36PM +0100, gregkh(a)linuxfoundation.org wrote:
>
> This is a note to let you know that I've just added the patch titled
>
> tty: 8250: Add support for Intashield IX cards
>
> to the 5.10-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> tty-8250-add-support-for-intashield-ix-cards.patch
> and it can be found in the queue-5.10 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
I don't think this patch should be in 5.10-stable. It uses the
pbn_oxsemi_x_15625000 configurations, which aren't available in this version
of the driver (there they are named pbn_oxsemi_x_3906250).
The rest of the patches to be merged look OK for this branch (as they
all use the generic configuration rather than the Oxsemi one).
>
Thanks,
Cameron
> From 62d2ec2ded278c7512d91ca7bf8eb9bac46baf90 Mon Sep 17 00:00:00 2001
> From: Cameron Williams <cang1(a)live.co.uk>
> Date: Fri, 20 Oct 2023 17:03:16 +0100
> Subject: tty: 8250: Add support for Intashield IX cards
>
> From: Cameron Williams <cang1(a)live.co.uk>
>
> commit 62d2ec2ded278c7512d91ca7bf8eb9bac46baf90 upstream.
>
> Add support for the IX-100, IX-200 and IX-400 serial cards.
>
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Cameron Williams <cang1(a)live.co.uk>
> Link: https://lore.kernel.org/r/DU0PR02MB7899614E5837E82A03272A4BC4DBA@DU0PR02MB7…
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> ---
> drivers/tty/serial/8250/8250_pci.c | 21 +++++++++++++++++++++
> 1 file changed, 21 insertions(+)
>
> --- a/drivers/tty/serial/8250/8250_pci.c
> +++ b/drivers/tty/serial/8250/8250_pci.c
> @@ -5150,6 +5150,27 @@ static const struct pci_device_id serial
> { PCI_VENDOR_ID_INTASHIELD, PCI_DEVICE_ID_INTASHIELD_IS400,
> PCI_ANY_ID, PCI_ANY_ID, 0, 0, /* 135a.0dc0 */
> pbn_b2_4_115200 },
> + /*
> + * IntaShield IX-100
> + */
> + { PCI_VENDOR_ID_INTASHIELD, 0x4027,
> + PCI_ANY_ID, PCI_ANY_ID,
> + 0, 0,
> + pbn_oxsemi_1_15625000 },
> + /*
> + * IntaShield IX-200
> + */
> + { PCI_VENDOR_ID_INTASHIELD, 0x4028,
> + PCI_ANY_ID, PCI_ANY_ID,
> + 0, 0,
> + pbn_oxsemi_2_15625000 },
> + /*
> + * IntaShield IX-400
> + */
> + { PCI_VENDOR_ID_INTASHIELD, 0x4029,
> + PCI_ANY_ID, PCI_ANY_ID,
> + 0, 0,
> + pbn_oxsemi_4_15625000 },
> /* Brainboxes Devices */
> /*
> * Brainboxes UC-101
>
>
> Patches currently in stable-queue which might be from cang1(a)live.co.uk are
>
> queue-5.10/tty-8250-add-support-for-additional-brainboxes-uc-cards.patch
> queue-5.10/tty-8250-add-support-for-intashield-ix-cards.patch
> queue-5.10/tty-8250-add-support-for-brainboxes-up-cards.patch
> queue-5.10/tty-8250-add-support-for-intashield-is-100.patch
> queue-5.10/tty-8250-remove-uc-257-and-uc-431.patch
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b65235f6e102354ccafda601eaa1c5bef5284d21
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102017-human-marine-7125@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b65235f6e102354ccafda601eaa1c5bef5284d21 Mon Sep 17 00:00:00 2001
From: Maxim Levitsky <mlevitsk(a)redhat.com>
Date: Thu, 28 Sep 2023 20:33:51 +0300
Subject: [PATCH] x86: KVM: SVM: always update the x2avic msr interception
The following problem has existed since x2avic was enabled in KVM:
svm_set_x2apic_msr_interception is called to enable the interception of
the x2apic msrs.
In particular it is called at the moment the guest resets its apic.
Assuming that the guest's apic was in x2apic mode, the reset will bring
it back to the xapic mode.
The svm_set_x2apic_msr_interception however has an erroneous check for
'!apic_x2apic_mode()' which prevents it from doing anything in this case.
As a result of this, all x2apic msrs are left unintercepted, and that
exposes the bare metal x2apic (if enabled) to the guest.
Oops.
Remove the erroneous '!apic_x2apic_mode()' check to fix that.
This fixes CVE-2023-5090
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Maxim Levitsky <mlevitsk(a)redhat.com>
Reviewed-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Tested-by: Suravee Suthikulpanit <suravee.suthikulpanit(a)amd.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20230928173354.217464-2-mlevitsk(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 9507df93f410..acdd0b89e471 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -913,8 +913,7 @@ void svm_set_x2apic_msr_interception(struct vcpu_svm *svm, bool intercept)
if (intercept == svm->x2avic_msrs_intercepted)
return;
- if (!x2avic_enabled ||
- !apic_x2apic_mode(svm->vcpu.arch.apic))
+ if (!x2avic_enabled)
return;
for (i = 0; i < MAX_DIRECT_ACCESS_MSRS; i++) {
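The bug described above is an over-eager early return: by the time interception is re-enabled on APIC reset, the guest has already left x2APIC mode, so the `!apic_x2apic_mode()` test makes the function bail out and the MSRs stay passed through. The model below contrasts the two versions of that guard. It is a hypothetical simplification, not KVM code; `struct model_vcpu` and its fields are illustrative stand-ins for `x2avic_enabled`, the APIC mode and `svm->x2avic_msrs_intercepted`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state consulted by
 * svm_set_x2apic_msr_interception(). */
struct model_vcpu {
	bool x2avic_enabled;
	bool apic_in_x2apic_mode;
	bool msrs_intercepted;
};

/* Old guard: also bails out when the APIC is not in x2APIC mode. */
static void set_interception_buggy(struct model_vcpu *v, bool intercept)
{
	if (intercept == v->msrs_intercepted)
		return;
	if (!v->x2avic_enabled || !v->apic_in_x2apic_mode)
		return;
	v->msrs_intercepted = intercept;
}

/* Fixed guard: only x2AVIC being disabled prevents the update. */
static void set_interception_fixed(struct model_vcpu *v, bool intercept)
{
	if (intercept == v->msrs_intercepted)
		return;
	if (!v->x2avic_enabled)
		return;
	v->msrs_intercepted = intercept;
}
```

Starting from x2APIC mode with interception off, then resetting to xAPIC mode and requesting interception, the old guard leaves `msrs_intercepted` false while the fixed one sets it.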
This helper is used for checking if the connected host supports
the feature, it can be moved into generic code to be used by other
smu implementations as well.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Evan Quan <evan.quan(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 188623076d0f1a500583d392b6187056bf7cc71a)
The original problematic dGPU is not supported in 5.15.
Just introduce the new function in 5.15 as a dependency for fixing
an unrelated dGPU that also uses this symbol.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
v1->v2:
* Update commit to 6.5-rc2 commit.
It merged as both of these:
188623076d0f1a500583d392b6187056bf7cc71a
5d1eb4c4c872b55664f5754cc16827beff8630a7
It's already been backported into 6.1.y as well.
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d90da384d185..1f1e7966beb5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1285,6 +1285,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
int amdgpu_device_pci_reset(struct amdgpu_device *adev);
bool amdgpu_device_need_post(struct amdgpu_device *adev);
+bool amdgpu_device_pcie_dynamic_switching_supported(void);
bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
bool amdgpu_device_aspm_support_quirk(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2cf49a32ac6c..f57334fff7fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1319,6 +1319,25 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
return true;
}
+/*
+ * Intel hosts such as Raptor Lake and Sapphire Rapids don't support dynamic
+ * speed switching. Until we have confirmation from Intel that a specific host
+ * supports it, it's safer that we keep it disabled for all.
+ *
+ * https://edc.intel.com/content/www/us/en/design/products/platforms/details/r…
+ * https://gitlab.freedesktop.org/drm/amd/-/issues/2663
+ */
+bool amdgpu_device_pcie_dynamic_switching_supported(void)
+{
+#if IS_ENABLED(CONFIG_X86)
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ if (c->x86_vendor == X86_VENDOR_INTEL)
+ return false;
+#endif
+ return true;
+}
+
/**
* amdgpu_device_should_use_aspm - check if the device should program ASPM
*
--
2.34.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d920abd1e7c4884f9ecd0749d1921b7ab19ddfbd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102012-pleat-snippet-29cf@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d920abd1e7c4884f9ecd0749d1921b7ab19ddfbd Mon Sep 17 00:00:00 2001
From: Sagi Grimberg <sagi(a)grimberg.me>
Date: Mon, 2 Oct 2023 13:54:28 +0300
Subject: [PATCH] nvmet-tcp: Fix a possible UAF in queue intialization setup
From Alon:
"Due to a logical bug in the NVMe-oF/TCP subsystem in the Linux kernel,
a malicious user can cause a UAF and a double free, which may lead to
RCE (may also lead to an LPE in case the attacker already has local
privileges)."
Hence, when queue initialization fails after the ahash requests have been
allocated, the queue removal async work is guaranteed to be called, so
leave the deallocation to the queue removal.
Also, be extra careful not to continue processing the socket, so set
queue rcv_state to NVMET_TCP_RECV_ERR upon a socket error.
Cc: stable(a)vger.kernel.org
Reported-by: Alon Zahavi <zahavi.alon(a)gmail.com>
Tested-by: Alon Zahavi <zahavi.alon(a)gmail.com>
Signed-off-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
index cd92d7ddf5ed..197fc2ecb164 100644
--- a/drivers/nvme/target/tcp.c
+++ b/drivers/nvme/target/tcp.c
@@ -372,6 +372,7 @@ static void nvmet_tcp_fatal_error(struct nvmet_tcp_queue *queue)
static void nvmet_tcp_socket_error(struct nvmet_tcp_queue *queue, int status)
{
+ queue->rcv_state = NVMET_TCP_RECV_ERR;
if (status == -EPIPE || status == -ECONNRESET)
kernel_sock_shutdown(queue->sock, SHUT_RDWR);
else
@@ -910,15 +911,11 @@ static int nvmet_tcp_handle_icreq(struct nvmet_tcp_queue *queue)
iov.iov_len = sizeof(*icresp);
ret = kernel_sendmsg(queue->sock, &msg, &iov, 1, iov.iov_len);
if (ret < 0)
- goto free_crypto;
+ return ret; /* queue removal will cleanup */
queue->state = NVMET_TCP_Q_LIVE;
nvmet_prepare_receive_pdu(queue);
return 0;
-free_crypto:
- if (queue->hdr_digest || queue->data_digest)
- nvmet_tcp_free_crypto(queue);
- return ret;
}
static void nvmet_tcp_handle_req_failure(struct nvmet_tcp_queue *queue,
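The nvmet-tcp fix relies on an ownership rule. Once the ahash requests exist, only the queue-removal path frees them, and a socket error moves the queue into an error receive state. The small userspace model below sketches that rule. All names (model_queue, model_handle_icreq, and so on) are hypothetical stand-ins, not nvmet-tcp code. Allocation is replaced by counters so the single-free property is easy to check.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, simplified stand-ins for the nvmet-tcp queue state. */
enum model_recv_state { MODEL_RECV_PDU, MODEL_RECV_ERR };

struct model_queue {
	int crypto_allocated;            /* ahash requests allocated? */
	int crypto_free_count;           /* times the crypto state was freed */
	enum model_recv_state rcv_state;
};

static void model_free_crypto(struct model_queue *q)
{
	q->crypto_allocated = 0;
	q->crypto_free_count++;
}

/* Mirrors the fixed error path: after crypto is allocated, a send failure
 * returns immediately and leaves cleanup to the queue-removal path. */
static int model_handle_icreq(struct model_queue *q, int send_ret)
{
	q->crypto_allocated = 1;
	if (send_ret < 0)
		return send_ret;        /* no local free: removal owns it */
	return 0;
}

/* Mirrors nvmet_tcp_socket_error(): stop further receive processing. */
static void model_socket_error(struct model_queue *q)
{
	q->rcv_state = MODEL_RECV_ERR;
}

/* Single owner of teardown, which runs after any failed initialization. */
static void model_release_queue(struct model_queue *q)
{
	if (q->crypto_allocated) {
		assert(q->crypto_free_count == 0);  /* never freed twice */
		model_free_crypto(q);
	}
}
```

Before the fix, the icreq error path and the removal path could both free the crypto state. In this model, only model_release_queue() does.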
The backport of commit 9c5df2f14ee3 ("can: isotp: isotp_ops: fix poll() to
not report false EPOLLOUT events") introduced a new regression, and a
targeted fix for it could potentially introduce further side effects.
To reduce the risk of other unmet dependencies and missing fixes and checks,
the latest 6.1 LTS code base is ported back to the 5.10 LTS tree.
Lukas Magel (1):
can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior
Oliver Hartkopp (6):
can: isotp: set max PDU size to 64 kByte
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
can: isotp: check CAN address family in isotp_bind()
can: isotp: handle wait_event_interruptible() return values
can: isotp: add local echo tx processing and tx without FC
can: isotp: isotp_bind(): do not validate unused address information
Patrick Menschel (3):
can: isotp: change error format from decimal to symbolic error names
can: isotp: add symbolic error message to isotp_module_init()
can: isotp: Add error message if txqueuelen is too small
include/uapi/linux/can/isotp.h | 25 +-
net/can/isotp.c | 434 +++++++++++++++++++++------------
2 files changed, 293 insertions(+), 166 deletions(-)
--
2.34.1
This reverts commit a08799cf17c22375752abfad3b4a2b34b3acb287.
The recently added Realtek PHY drivers depend on the new port status
notification mechanism which was built on the deprecated USB PHY
implementation and devicetree binding.
Specifically, using these PHYs would require describing the very same
PHY using both the generic "phy" property and the deprecated "usb-phy"
property which is clearly wrong.
We should not be building new functionality on top of the legacy USB PHY
implementation even if it is currently stuck in some kind of
transitional limbo.
Revert the new notification interface which is broken by design.
Fixes: a08799cf17c2 ("usb: phy: add usb phy notify port status API")
Cc: stable(a)vger.kernel.org # 6.6
Cc: Stanley Chang <stanley_chang(a)realtek.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/usb/core/hub.c | 23 -----------------------
include/linux/usb/phy.h | 13 -------------
2 files changed, 36 deletions(-)
diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 0ff47eeffb49..dfc30cebd4c4 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -622,29 +622,6 @@ static int hub_ext_port_status(struct usb_hub *hub, int port1, int type,
ret = 0;
}
mutex_unlock(&hub->status_mutex);
-
- /*
- * There is no need to lock status_mutex here, because status_mutex
- * protects hub->status, and the phy driver only checks the port
- * status without changing the status.
- */
- if (!ret) {
- struct usb_device *hdev = hub->hdev;
-
- /*
- * Only roothub will be notified of port state changes,
- * since the USB PHY only cares about changes at the next
- * level.
- */
- if (is_root_hub(hdev)) {
- struct usb_hcd *hcd = bus_to_hcd(hdev->bus);
-
- if (hcd->usb_phy)
- usb_phy_notify_port_status(hcd->usb_phy,
- port1 - 1, *status, *change);
- }
- }
-
return ret;
}
diff --git a/include/linux/usb/phy.h b/include/linux/usb/phy.h
index b513749582d7..e4de6bc1f69b 100644
--- a/include/linux/usb/phy.h
+++ b/include/linux/usb/phy.h
@@ -144,10 +144,6 @@ struct usb_phy {
*/
int (*set_wakeup)(struct usb_phy *x, bool enabled);
- /* notify phy port status change */
- int (*notify_port_status)(struct usb_phy *x, int port,
- u16 portstatus, u16 portchange);
-
/* notify phy connect status change */
int (*notify_connect)(struct usb_phy *x,
enum usb_device_speed speed);
@@ -320,15 +316,6 @@ usb_phy_set_wakeup(struct usb_phy *x, bool enabled)
return 0;
}
-static inline int
-usb_phy_notify_port_status(struct usb_phy *x, int port, u16 portstatus, u16 portchange)
-{
- if (x && x->notify_port_status)
- return x->notify_port_status(x, port, portstatus, portchange);
- else
- return 0;
-}
-
static inline int
usb_phy_notify_connect(struct usb_phy *x, enum usb_device_speed speed)
{
--
2.41.0
This is the candidate patch for CVE-2023-47233:
https://nvd.nist.gov/vuln/detail/CVE-2023-47233
In the brcm80211 driver, the following call chain initializes a
timeout worker:
->brcmf_usb_probe
->brcmf_usb_probe_cb
->brcmf_attach
->brcmf_bus_started
->brcmf_cfg80211_attach
->wl_init_priv
->brcmf_init_escan
->INIT_WORK(&cfg->escan_timeout_work,
brcmf_cfg80211_escan_timeout_worker);
If the USB device is unplugged, brcmf_usb_disconnect is called to
clean up. The call chain is:
brcmf_usb_disconnect
->brcmf_usb_disconnect_cb
->brcmf_detach
->brcmf_cfg80211_detach
->kfree(cfg);
However, the timeout worker may still be running at that point, which
causes a use-after-free on cfg in brcmf_cfg80211_escan_timeout_worker.
Fix it by deleting the timer and canceling the worker in
brcmf_cfg80211_detach.
Fixes: e756af5b30b0 ("brcmfmac: add e-scan support.")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Cc: stable(a)vger.kernel.org
---
v4:
- rename the subject and add CVE number as Ping-Ke Shih suggested
v3:
- rename the subject as Johannes suggested
v2:
- fix the error of kernel test bot reported
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 667462369a32..646ec8bdf512 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -8431,6 +8431,9 @@ void brcmf_cfg80211_detach(struct brcmf_cfg80211_info *cfg)
if (!cfg)
return;
+ if (timer_pending(&cfg->escan_timeout))
+ del_timer_sync(&cfg->escan_timeout);
+ cancel_work_sync(&cfg->escan_timeout_work);
brcmf_pno_detach(cfg);
brcmf_btcoex_detach(cfg);
wiphy_unregister(cfg->wiphy);
--
2.25.1
Hi,
commit 2f46993d83ff upstream.
Please backport this commit to the other stable kernels. This patch
landed in 6.1, and we've seen a 30% improvement with Docker containers
running heavy CPU/memory tasks.
-MNAdam
In the brcm80211 driver, the following call chain initializes a
timeout worker:
->brcmf_usb_probe
->brcmf_usb_probe_cb
->brcmf_attach
->brcmf_bus_started
->brcmf_cfg80211_attach
->wl_init_priv
->brcmf_init_escan
->INIT_WORK(&cfg->escan_timeout_work,
brcmf_cfg80211_escan_timeout_worker);
If the USB device is unplugged, brcmf_usb_disconnect is called to
clean up. The call chain is:
brcmf_usb_disconnect
->brcmf_usb_disconnect_cb
->brcmf_detach
->brcmf_cfg80211_detach
->kfree(cfg);
However, the timeout worker may still be running at that point, which
causes a use-after-free on cfg in brcmf_cfg80211_escan_timeout_worker.
Fix it by deleting the timer and canceling the worker in
brcmf_cfg80211_detach.
Fixes: e756af5b30b0 ("brcmfmac: add e-scan support.")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Cc: stable(a)vger.kernel.org
---
v3:
- rename the subject as Johannes suggested
v2:
- fix the error of kernel test bot reported
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 667462369a32..646ec8bdf512 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -8431,6 +8431,9 @@ void brcmf_cfg80211_detach(struct brcmf_cfg80211_info *cfg)
if (!cfg)
return;
+ if (timer_pending(&cfg->escan_timeout))
+ del_timer_sync(&cfg->escan_timeout);
+ cancel_work_sync(&cfg->escan_timeout_work);
brcmf_pno_detach(cfg);
brcmf_btcoex_detach(cfg);
wiphy_unregister(cfg->wiphy);
--
2.25.1
Hi,
There is a problem where PSR can hang under high IRQ load. We've
received a few bug reports like this.
Can you please bring this commit into 6.5.y and 6.6.y:
79df45dc4bfb ("drm/amd/display: Don't use fsleep for PSR exit waits")
This restores some of the behavior of the PSR interrupt handling to how
it behaved in older kernels before it was changed in 6.4-rc1 by
c69fc3d0de6ca.
Thanks,
Greetings,
I wonder why you continue neglecting my emails. Please, acknowledge
the receipt of this message in reference to the subject above as I
intend to send to you the details of the mail. Sometimes, try to check
your spam box because most of these correspondences fall out sometimes
in SPAM folder.
Best regards,
Hi,
I noticed that without commit 0b4e3b6f6b79 ("rust: types: make `Opaque`
be `!Unpin`") the `Opaque` type has an unsound API:
The `Opaque` type is designed to wrap C types, hence it is often used to
convert raw pointers to references in Rust. Normally `&mut` references
are unique, but for `&mut Opaque<T>` this should not be the case,
since C also has pointers to the object. The way to disable the
uniqueness guarantee for `&mut` in Rust is to make the type `!Unpin`.
This is accomplished by the given commit above. At the time of creating
that patch, however, we did not consider this unsoundness issue.
For this reason I propose to backport the commit 0b4e3b6f6b79.
The only affected version is 6.5. No earlier version is affected, since
the `Opaque` type does not exist in 6.1. Newer versions are also
unaffected, since the patch is present in 6.6.
Additionally I also propose to backport commit 35cad617df2e ("rust: make
`UnsafeCell` the outer type in `Opaque`") to 6.5, as this is a
prerequisite of 0b4e3b6f6b79.
--
Cheers,
Benno
Dear stable team,
I would like to propose the commit
bbaa6ffa5b6c ("power: supply: core: Use blocking_notifier_call_chain to avoid RCU complaint")
from mainline for inclusion into the stable kernels.
The commit fixes an RCU violation, as indicated in its commit message.
Thanks,
Thomas
Hi Greg,
Apologies for any inconvenience. Here is the information and justification for the patch.
Commit-id: f9cdeb58a9cf46c09b56f5f661ea8da24b6458c3
Subject: perf evlist: Avoid frequency mode for the dummy event
Justification:
This patch fixes a critical performance issue at the perf-tool level for
anything using the PMU in a virtualized environment. Most of the
justification is in the commit message. The only thing I would
like to add is that this patch could save up to 50% of the vCPU time
when running perf in sampling mode with a large number of events in
the VM.
Appreciate your help.
-Mingwei
In the brcm80211 driver, the following call chain initializes a
timeout worker:
->brcmf_usb_probe
->brcmf_usb_probe_cb
->brcmf_attach
->brcmf_bus_started
->brcmf_cfg80211_attach
->wl_init_priv
->brcmf_init_escan
->INIT_WORK(&cfg->escan_timeout_work,
brcmf_cfg80211_escan_timeout_worker);
If the USB device is unplugged, brcmf_usb_disconnect is called to
clean up. The call chain is:
brcmf_usb_disconnect
->brcmf_usb_disconnect_cb
->brcmf_detach
->brcmf_cfg80211_detach
->kfree(cfg);
However, the timeout worker may still be running at that point, which
causes a use-after-free on cfg in brcmf_cfg80211_escan_timeout_worker.
Fix it by deleting the timer and canceling the worker in
brcmf_cfg80211_detach.
Fixes: e756af5b30b0 ("brcmfmac: add e-scan support.")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Cc: stable(a)vger.kernel.org
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 667462369a32..0224e377eb6e 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -8431,6 +8431,9 @@ void brcmf_cfg80211_detach(struct brcmf_cfg80211_info *cfg)
if (!cfg)
return;
+ if (timer_pending(&cfg->escan_timeout))
+ del_timer_sync(&cfg->escan_timeout);
+ cancel_delayed_work_sync(&cfg->escan_timeout_work);
brcmf_pno_detach(cfg);
brcmf_btcoex_detach(cfg);
wiphy_unregister(cfg->wiphy);
--
2.25.1
In the brcm80211 driver, the following call chain initializes a
timeout worker:
->brcmf_usb_probe
->brcmf_usb_probe_cb
->brcmf_attach
->brcmf_bus_started
->brcmf_cfg80211_attach
->wl_init_priv
->brcmf_init_escan
->INIT_WORK(&cfg->escan_timeout_work,
brcmf_cfg80211_escan_timeout_worker);
If the USB device is unplugged, brcmf_usb_disconnect is called to
clean up. The call chain is:
brcmf_usb_disconnect
->brcmf_usb_disconnect_cb
->brcmf_detach
->brcmf_cfg80211_detach
->kfree(cfg);
However, the timeout worker may still be running at that point, which
causes a use-after-free on cfg in brcmf_cfg80211_escan_timeout_worker.
Fix it by deleting the timer and canceling the worker in
brcmf_cfg80211_detach.
Fixes: e756af5b30b0 ("brcmfmac: add e-scan support.")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Cc: stable(a)vger.kernel.org
---
v2:
- fix the error of kernel test bot reported
---
drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
index 667462369a32..646ec8bdf512 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/cfg80211.c
@@ -8431,6 +8431,9 @@ void brcmf_cfg80211_detach(struct brcmf_cfg80211_info *cfg)
if (!cfg)
return;
+ if (timer_pending(&cfg->escan_timeout))
+ del_timer_sync(&cfg->escan_timeout);
+ cancel_work_sync(&cfg->escan_timeout_work);
brcmf_pno_detach(cfg);
brcmf_btcoex_detach(cfg);
wiphy_unregister(cfg->wiphy);
--
2.25.1
This is a partial revert of commit 8b3517f88ff2 ("PCI:
loongson: Prevent LS7A MRRS increases") for MIPS-based Loongson systems.
There are many MIPS-based Loongson systems in the wild that
shipped with firmware which does not set the maximum MRRS properly.
Limit MRRS to 256 for all of them, as MIPS Loongson systems that
support a higher MRRS are considered rare.
This must be done at the device enablement stage because the hardware
will reset MRRS to an invalid value if a device gets disabled.
Cc: stable(a)vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217680
Fixes: 8b3517f88ff2 ("PCI: loongson: Prevent LS7A MRRS increases")
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
v4: Improve commit message
This is a partial revert of the original quirk, so there shouldn't
be any drama.
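The userspace model below roughly follows the bridge walk in loongson_old_mrrs_quirk(). It starts from the device's bus, steps toward the root, and clamps MRRS to 256 bytes if any upstream bridge matches the ID table. The structures, the model_* names and the IDs in the usage example are hypothetical placeholders. The kernel code instead uses struct pci_bus, pci_match_id() and pcie_get_readrq()/pcie_set_readrq().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a PCI topology; not the kernel API. */
struct model_dev;

struct model_bus {
	struct model_bus *parent;   /* NULL for the root bus */
	struct model_dev *self;     /* bridge leading to this bus, NULL on root */
};

struct model_dev {
	unsigned short vendor, device;
	struct model_bus *bus;      /* bus the device sits on */
	int readrq;                 /* Max Read Request Size in bytes */
};

struct model_id { unsigned short vendor, device; };

/* Table is terminated by an entry with vendor == 0. */
static int model_match_id(const struct model_id *ids, const struct model_dev *d)
{
	for (; ids->vendor; ids++)
		if (ids->vendor == d->vendor && ids->device == d->device)
			return 1;
	return 0;
}

/* Walk toward the root; if an upstream bridge matches, clamp MRRS to 256. */
static void model_old_mrrs_quirk(struct model_dev *pdev,
				 const struct model_id *bridges)
{
	struct model_bus *bus = pdev->bus;

	while (bus->parent) {            /* stands in for !pci_is_root_bus() */
		struct model_dev *bridge = bus->self;

		assert(bridge != NULL);  /* a non-root bus has a bridge */
		bus = bus->parent;
		if (model_match_id(bridges, bridge)) {
			if (pdev->readrq > 256)
				pdev->readrq = 256;
			break;
		}
	}
}
```

The walk stops at the first matching bridge, so an endpoint behind a non-matching port keeps its MRRS unchanged.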
---
drivers/pci/controller/pci-loongson.c | 38 +++++++++++++++++++++++++++
1 file changed, 38 insertions(+)
diff --git a/drivers/pci/controller/pci-loongson.c b/drivers/pci/controller/pci-loongson.c
index d45e7b8dc530..d184d7b97e54 100644
--- a/drivers/pci/controller/pci-loongson.c
+++ b/drivers/pci/controller/pci-loongson.c
@@ -108,6 +108,44 @@ DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_LOONGSON,
DEV_LS7A_PCIE_PORT6, loongson_mrrs_quirk);
+#ifdef CONFIG_MIPS
+static void loongson_old_mrrs_quirk(struct pci_dev *pdev)
+{
+ struct pci_bus *bus = pdev->bus;
+ struct pci_dev *bridge;
+ static const struct pci_device_id bridge_devids[] = {
+ { PCI_VDEVICE(LOONGSON, DEV_LS2K_PCIE_PORT0) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT0) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT1) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT2) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT3) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT4) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT5) },
+ { PCI_VDEVICE(LOONGSON, DEV_LS7A_PCIE_PORT6) },
+ { 0, },
+ };
+
+ /* look for the matching bridge */
+ while (!pci_is_root_bus(bus)) {
+ bridge = bus->self;
+ bus = bus->parent;
+ /*
+ * There are still some wild MIPS Loongson firmware won't
+ * set MRRS properly. Limiting MRRS to 256 as MIPS Loongson
+ * comes with higher MRRS support is considered rare.
+ */
+ if (pci_match_id(bridge_devids, bridge)) {
+ if (pcie_get_readrq(pdev) > 256) {
+ pci_info(pdev, "limiting MRRS to 256\n");
+ pcie_set_readrq(pdev, 256);
+ }
+ break;
+ }
+ }
+}
+DECLARE_PCI_FIXUP_ENABLE(PCI_ANY_ID, PCI_ANY_ID, loongson_old_mrrs_quirk);
+#endif
+
static void loongson_pci_pin_quirk(struct pci_dev *pdev)
{
pdev->pin = 1 + (PCI_FUNC(pdev->devfn) & 3);
--
2.34.1
From: Arnd Bergmann <arnd(a)arndb.de>
[ Upstream commit c1a8d1d0edb71dec15c9649cb56866c71c1ecd9e ]
ioremap_uc() is only meaningful on old x86-32 systems with the PAT
extension, and on ia64 with its slightly unconventional ioremap()
behavior, everywhere else this is the same as ioremap() anyway.
Change the only driver that still references ioremap_uc() to only do so
on x86-32/ia64 in order to allow removing that interface at some
point in the future for the other architectures.
On some architectures, ioremap_uc() just returns NULL; changing
the driver to call ioremap() means that they now have a chance
of working correctly.
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: linux-fbdev(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/video/fbdev/aty/atyfb_base.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index cba2b113b28b0..a73114c1c6918 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3440,11 +3440,15 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
}
info->fix.mmio_start = raddr;
+#if defined(__i386__) || defined(__ia64__)
/*
* By using strong UC we force the MTRR to never have an
* effect on the MMIO region on both non-PAT and PAT systems.
*/
par->ati_regbase = ioremap_uc(info->fix.mmio_start, 0x1000);
+#else
+ par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+#endif
if (par->ati_regbase == NULL)
return -ENOMEM;
--
2.42.0
From: Arnd Bergmann <arnd(a)arndb.de>
[ Upstream commit c1a8d1d0edb71dec15c9649cb56866c71c1ecd9e ]
ioremap_uc() is only meaningful on old x86-32 systems with the PAT
extension, and on ia64 with its slightly unconventional ioremap()
behavior, everywhere else this is the same as ioremap() anyway.
Change the only driver that still references ioremap_uc() to only do so
on x86-32/ia64 in order to allow removing that interface at some
point in the future for the other architectures.
On some architectures, ioremap_uc() just returns NULL; changing
the driver to call ioremap() means that they now have a chance
of working correctly.
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Reviewed-by: Luis Chamberlain <mcgrof(a)kernel.org>
Cc: Helge Deller <deller(a)gmx.de>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: linux-fbdev(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Signed-off-by: Helge Deller <deller(a)gmx.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/video/fbdev/aty/atyfb_base.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/video/fbdev/aty/atyfb_base.c b/drivers/video/fbdev/aty/atyfb_base.c
index 1aef3d6ebd880..246bf67b32ea0 100644
--- a/drivers/video/fbdev/aty/atyfb_base.c
+++ b/drivers/video/fbdev/aty/atyfb_base.c
@@ -3447,11 +3447,15 @@ static int atyfb_setup_generic(struct pci_dev *pdev, struct fb_info *info,
}
info->fix.mmio_start = raddr;
+#if defined(__i386__) || defined(__ia64__)
/*
* By using strong UC we force the MTRR to never have an
* effect on the MMIO region on both non-PAT and PAT systems.
*/
par->ati_regbase = ioremap_uc(info->fix.mmio_start, 0x1000);
+#else
+ par->ati_regbase = ioremap(info->fix.mmio_start, 0x1000);
+#endif
if (par->ati_regbase == NULL)
return -ENOMEM;
--
2.42.0
The rtnl lock also needs to be held before rndis_filter_device_add(),
which advertises the nvsp_2_vsc_capability / sriov bit and triggers
VF NIC offering and registering. If the VF NIC finishes
register_netdev() earlier, name-based configuration may fail.
To fix this issue, move the call to rtnl_lock() before
rndis_filter_device_add(), so the VF will be registered later than the
netvsc / synthetic NIC and get a name numbered (ethX) after netvsc.
Also, move register_netdevice_notifier() earlier, so the callback
function is set before probing.
Cc: stable(a)vger.kernel.org
Fixes: e04e7a7bbd4b ("hv_netvsc: Fix a deadlock by getting rtnl lock earlier in netvsc_probe()")
Signed-off-by: Haiyang Zhang <haiyangz(a)microsoft.com>
---
v2:
Fix rtnl_unlock() in error handling as found by Wojciech Drewek.
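The v2 error-handling fix can be shown with a counter-based model. Because rtnl_lock() is now taken before the device add, both the rndis_failed and register_failed paths must fall through to a single unlock. This sketch makes simplifying assumptions: the lock is only a depth counter, and failures are injected through a hypothetical fail_at argument. It is not the driver's actual probe code.

```c
#include <assert.h>

/* Hypothetical model of netvsc_probe() error unwinding after the fix. */
struct model_ctx {
	int rtnl_depth;      /* > 0 while the modelled rtnl lock is held */
	int nvdev_added;     /* modelled rndis_filter_device_add() succeeded */
};

static void model_rtnl_lock(struct model_ctx *c)
{
	c->rtnl_depth++;
}

static void model_rtnl_unlock(struct model_ctx *c)
{
	assert(c->rtnl_depth > 0);   /* unlock must pair with a lock */
	c->rtnl_depth--;
}

/* fail_at: 0 = no failure, 1 = device add fails, 2 = register fails */
static int model_probe(struct model_ctx *c, int fail_at)
{
	int ret;

	model_rtnl_lock(c);          /* taken before the device add */
	if (fail_at == 1) {
		ret = -1;
		goto rndis_failed;
	}
	c->nvdev_added = 1;

	if (fail_at == 2) {
		ret = -2;
		goto register_failed;
	}
	model_rtnl_unlock(c);
	return 0;

register_failed:
	c->nvdev_added = 0;          /* stands in for rndis_filter_device_remove() */
rndis_failed:
	model_rtnl_unlock(c);        /* reached from both error paths */
	return ret;
}
```

With the unlock placed below the rndis_failed label, every exit path leaves the lock depth at zero.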
---
drivers/net/hyperv/netvsc_drv.c | 32 ++++++++++++++++++++------------
1 file changed, 20 insertions(+), 12 deletions(-)
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 3ba3c8fb28a5..1d1491da303b 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -2531,15 +2531,6 @@ static int netvsc_probe(struct hv_device *dev,
goto devinfo_failed;
}
- nvdev = rndis_filter_device_add(dev, device_info);
- if (IS_ERR(nvdev)) {
- ret = PTR_ERR(nvdev);
- netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
- goto rndis_failed;
- }
-
- eth_hw_addr_set(net, device_info->mac_adr);
-
/* We must get rtnl lock before scheduling nvdev->subchan_work,
* otherwise netvsc_subchan_work() can get rtnl lock first and wait
* all subchannels to show up, but that may not happen because
@@ -2547,9 +2538,23 @@ static int netvsc_probe(struct hv_device *dev,
* -> ... -> device_add() -> ... -> __device_attach() can't get
* the device lock, so all the subchannels can't be processed --
* finally netvsc_subchan_work() hangs forever.
+ *
+ * The rtnl lock also needs to be held before rndis_filter_device_add()
+ * which advertises nvsp_2_vsc_capability / sriov bit, and triggers
+ * VF NIC offering and registering. If VF NIC finished register_netdev()
+ * earlier it may cause name based config failure.
*/
rtnl_lock();
+ nvdev = rndis_filter_device_add(dev, device_info);
+ if (IS_ERR(nvdev)) {
+ ret = PTR_ERR(nvdev);
+ netdev_err(net, "unable to add netvsc device (ret %d)\n", ret);
+ goto rndis_failed;
+ }
+
+ eth_hw_addr_set(net, device_info->mac_adr);
+
if (nvdev->num_chn > 1)
schedule_work(&nvdev->subchan_work);
@@ -2586,9 +2591,9 @@ static int netvsc_probe(struct hv_device *dev,
return 0;
register_failed:
- rtnl_unlock();
rndis_filter_device_remove(dev, nvdev);
rndis_failed:
+ rtnl_unlock();
netvsc_devinfo_put(device_info);
devinfo_failed:
free_percpu(net_device_ctx->vf_stats);
@@ -2788,11 +2793,14 @@ static int __init netvsc_drv_init(void)
}
netvsc_ring_bytes = ring_size * PAGE_SIZE;
+ register_netdevice_notifier(&netvsc_netdev_notifier);
+
ret = vmbus_driver_register(&netvsc_drv);
- if (ret)
+ if (ret) {
+ unregister_netdevice_notifier(&netvsc_netdev_notifier);
return ret;
+ }
- register_netdevice_notifier(&netvsc_netdev_notifier);
return 0;
}
--
2.25.1
The following warnings and errors were noticed while building for i386
on stable-rc linux-4.19.y and linux-4.14.y.
Reported-by: Linux Kernel Functional Testing <lkft(a)linaro.org>
Build log:
==========
kernel/profile.c: In function 'profile_dead_cpu':
kernel/profile.c:346:27: warning: the comparison will always evaluate
as 'true' for the address of 'prof_cpu_mask' will never be NULL
[-Waddress]
346 | if (prof_cpu_mask != NULL)
| ^~
kernel/profile.c:49:22: note: 'prof_cpu_mask' declared here
49 | static cpumask_var_t prof_cpu_mask;
| ^~~~~~~~~~~~~
kernel/profile.c: In function 'profile_online_cpu':
kernel/profile.c:383:27: warning: the comparison will always evaluate
as 'true' for the address of 'prof_cpu_mask' will never be NULL
[-Waddress]
383 | if (prof_cpu_mask != NULL)
| ^~
kernel/profile.c:49:22: note: 'prof_cpu_mask' declared here
49 | static cpumask_var_t prof_cpu_mask;
| ^~~~~~~~~~~~~
kernel/profile.c: In function 'profile_tick':
kernel/profile.c:413:47: warning: the comparison will always evaluate
as 'true' for the address of 'prof_cpu_mask' will never be NULL
[-Waddress]
413 | if (!user_mode(regs) && prof_cpu_mask != NULL &&
| ^~
kernel/profile.c:49:22: note: 'prof_cpu_mask' declared here
49 | static cpumask_var_t prof_cpu_mask;
| ^~~~~~~~~~~~~
arch/x86/kernel/head_32.S: Assembler messages:
arch/x86/kernel/head_32.S:126: Error: invalid character '(' in mnemonic
arch/x86/kernel/head_32.S:57: Info: macro invoked from here
arch/x86/kernel/head_32.S:128: Error: invalid character '(' in mnemonic
arch/x86/kernel/head_32.S:57: Info: macro invoked from here
make[3]: *** [scripts/Makefile.build:403: arch/x86/kernel/head_32.o] Error 1
make[3]: Target '__build' not remade because of errors.
make[2]: *** [scripts/Makefile.build:544: arch/x86/kernel] Error 2
Links:
- https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-4.19.y/build/v4.19…
--
Linaro LKFT
https://lkft.linaro.org
I see the following build failures in v4.14.y.queue and v4.19.y.queue.
Build reference: v4.19.297-41-g46e03d3
Compiler version: x86_64-linux-gcc (GCC) 11.4.0
Assembler version: GNU assembler (GNU Binutils) 2.40
Building i386:defconfig ... failed
--------------
Error log:
arch/x86/kernel/head_32.S: Assembler messages:
arch/x86/kernel/head_32.S:126: Error: invalid character '(' in mnemonic
arch/x86/kernel/head_32.S:57: Info: macro invoked from here
arch/x86/kernel/head_32.S:128: Error: invalid character '(' in mnemonic
arch/x86/kernel/head_32.S:57: Info: macro invoked from here
make[3]: *** [scripts/Makefile.build:403: arch/x86/kernel/head_32.o] Error 1
make[3]: *** Waiting for unfinished jobs....
make[2]: *** [scripts/Makefile.build:544: arch/x86/kernel] Error 2
make[1]: *** [Makefile:1086: arch/x86] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:153: sub-make] Error 2
This appears to be caused by an attempt to support older versions of binutils
in those branches, specifically
30aa3427922d x86/mm: Fix RESERVE_BRK() for older binutils
a70252b27451 x86/mm: Simplify RESERVE_BRK()
I also tried gcc 11.4.0 / binutils 2.38 as well as gcc 6.4.0 / binutils 2.32
with the same result.
Guenter
From: Dominique Martinet <dominique.martinet(a)atmark-techno.com>
This reverts commit 84ee19bffc9306128cd0f1c650e89767079efeff.
The commit above made quirks with an OEMID fail to be applied, as they
were checking card->cid.oemid for the full 16 bits defined in MMC_FIXUP
macros but the field would only contain the bottom 8 bits.
eMMC v5.1A might have bogus values in OEMID's higher bits, so another fix
will be made, but it has been decided to revert this until that is ready.
Fixes: 84ee19bffc93 ("mmc: core: Capture correct oemid-bits for eMMC cards")
Link: https://lkml.kernel.org/r/ZToJsSLHr8RnuTHz@codewreck.org
Link: https://lkml.kernel.org/r/CAPDyKFqkKibcXnwjnhc3+W1iJBHLeqQ9BpcZrSwhW2u9K2oU…
Signed-off-by: Dominique Martinet <dominique.martinet(a)atmark-techno.com>
Cc: stable(a)vger.kernel.org
Cc: Avri Altman <avri.altman(a)wdc.com>
Cc: Ulf Hansson <ulf.hansson(a)linaro.org>
Cc: Alex Fetters <Alex.Fetters(a)garmin.com>
---
Here's the revert, as discussed in the "mmc: truncate quirks' oemid to 8
bits" patch thread.
Feel free to ignore if you already have something, I just checked your
-next branch quickly and might have missed it.
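An 8-bit oemid breaks 16-bit quirk entries. The function-style sketch below is modelled on the UNSTUFF_BITS() macro used by the MMC core and shows why. It assumes the usual layout, where resp[0] carries CID bits 127..96 and resp[3] carries bits 31..0. The CID word in the usage example is made up.

```c
#include <assert.h>
#include <stdint.h>

/* Extract `size` bits starting at bit `start` of a 128-bit response
 * stored as four big-endian-ordered 32-bit words (resp[0] = bits 127..96). */
static uint32_t unstuff_bits(const uint32_t resp[4], int start, int size)
{
	const uint32_t mask = (size < 32 ? (1u << size) : 0) - 1;
	const int off = 3 - (start / 32);
	const int shft = start & 31;
	uint32_t res;

	assert(size > 0 && size <= 32 && start >= 0 && start + size <= 128);
	res = resp[off] >> shft;
	if (size + shft > 32)        /* field straddles two words */
		res |= resp[off - 1] << ((32 - shft) % 32);
	return res & mask;
}
```

With resp[0] = 0xFE014A4D, the 16-bit read at bit 104 yields 0x014A, while the 8-bit read yields only 0x4A. A hypothetical quirk entry for OEMID 0x014A would therefore match only the 16-bit field.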
drivers/mmc/core/mmc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 4a4bab9aa726..89cd48fcec79 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -104,7 +104,7 @@ static int mmc_decode_cid(struct mmc_card *card)
case 3: /* MMC v3.1 - v3.3 */
case 4: /* MMC v4 */
card->cid.manfid = UNSTUFF_BITS(resp, 120, 8);
- card->cid.oemid = UNSTUFF_BITS(resp, 104, 8);
+ card->cid.oemid = UNSTUFF_BITS(resp, 104, 16);
card->cid.prod_name[0] = UNSTUFF_BITS(resp, 96, 8);
card->cid.prod_name[1] = UNSTUFF_BITS(resp, 88, 8);
card->cid.prod_name[2] = UNSTUFF_BITS(resp, 80, 8);
--
2.41.0
From: Bean Huo <beanhuo(a)micron.com>
Micron MTFC4GACAJCN eMMC supports cache but requires that a cache flush
be issued only after a write has occurred. Otherwise, the cache flush
command or subsequent commands will time out.
Signed-off-by: Bean Huo <beanhuo(a)micron.com>
Signed-off-by: Rafael Beims <rafael.beims(a)toradex.com>
Cc: stable(a)vger.kernel.org
---
Changelog:
v4--v5:
1. In the case of a successful flush, set writing_flag in _mmc_flush_cache()
v3--v4:
1. Add helper function for this quirk in drivers/mmc/core/card.h.
2. Set card->written_flag only for REQ_OP_WRITE.
v2--v3:
1. Set card->written_flag in mmc_blk_mq_issue_rq().
v1--v2:
1. Add Rafael's test-tag, and Co-developed-by.
2. Check whether host->card is NULL in __mmc_start_request() before testing host->card->quirks
---
drivers/mmc/core/block.c | 4 +++-
drivers/mmc/core/card.h | 4 ++++
drivers/mmc/core/mmc.c | 8 ++++++--
drivers/mmc/core/quirks.h | 7 ++++---
include/linux/mmc/card.h | 2 ++
5 files changed, 19 insertions(+), 6 deletions(-)
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 3a8f27c3e310..152dfe593c43 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2381,8 +2381,10 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
}
ret = mmc_blk_cqe_issue_flush(mq, req);
break;
- case REQ_OP_READ:
case REQ_OP_WRITE:
+ card->written_flag = true;
+ fallthrough;
+ case REQ_OP_READ:
if (host->cqe_enabled)
ret = mmc_blk_cqe_issue_rw_rq(mq, req);
else
diff --git a/drivers/mmc/core/card.h b/drivers/mmc/core/card.h
index 4edf9057fa79..b7754a1b8d97 100644
--- a/drivers/mmc/core/card.h
+++ b/drivers/mmc/core/card.h
@@ -280,4 +280,8 @@ static inline int mmc_card_broken_sd_cache(const struct mmc_card *c)
return c->quirks & MMC_QUIRK_BROKEN_SD_CACHE;
}
+static inline int mmc_card_broken_cache_flush(const struct mmc_card *c)
+{
+ return c->quirks & MMC_QUIRK_BROKEN_CACHE_FLUSH;
+}
#endif
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 8180983bd402..11053f920ac4 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -2086,13 +2086,17 @@ static int _mmc_flush_cache(struct mmc_host *host)
{
int err = 0;
+ if (mmc_card_broken_cache_flush(host->card) && !host->card->written_flag)
+ return err;
+
if (_mmc_cache_enabled(host)) {
err = mmc_switch(host->card, EXT_CSD_CMD_SET_NORMAL,
EXT_CSD_FLUSH_CACHE, 1,
CACHE_FLUSH_TIMEOUT_MS);
if (err)
- pr_err("%s: cache flush error %d\n",
- mmc_hostname(host), err);
+ pr_err("%s: cache flush error %d\n", mmc_hostname(host), err);
+ else
+ host->card->written_flag = false;
}
return err;
diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h
index 32b64b564fb1..5e68c8b4cdca 100644
--- a/drivers/mmc/core/quirks.h
+++ b/drivers/mmc/core/quirks.h
@@ -110,11 +110,12 @@ static const struct mmc_fixup __maybe_unused mmc_blk_fixups[] = {
MMC_QUIRK_TRIM_BROKEN),
/*
- * Micron MTFC4GACAJCN-1M advertises TRIM but it does not seems to
- * support being used to offload WRITE_ZEROES.
+ * Micron MTFC4GACAJCN-1M supports TRIM but does not appear to support
+ * WRITE_ZEROES offloading. It also supports caching, but the cache can
+ * only be flushed after a write has occurred.
*/
MMC_FIXUP("Q2J54A", CID_MANFID_MICRON, 0x014e, add_quirk_mmc,
- MMC_QUIRK_TRIM_BROKEN),
+ MMC_QUIRK_TRIM_BROKEN | MMC_QUIRK_BROKEN_CACHE_FLUSH),
/*
* Kingston EMMC04G-M627 advertises TRIM but it does not seems to
diff --git a/include/linux/mmc/card.h b/include/linux/mmc/card.h
index daa2f40d9ce6..7b12eebc5586 100644
--- a/include/linux/mmc/card.h
+++ b/include/linux/mmc/card.h
@@ -295,7 +295,9 @@ struct mmc_card {
#define MMC_QUIRK_BROKEN_HPI (1<<13) /* Disable broken HPI support */
#define MMC_QUIRK_BROKEN_SD_DISCARD (1<<14) /* Disable broken SD discard support */
#define MMC_QUIRK_BROKEN_SD_CACHE (1<<15) /* Disable broken SD cache support */
+#define MMC_QUIRK_BROKEN_CACHE_FLUSH (1<<16) /* Don't flush cache until the write has occurred */
+ bool written_flag; /* Indicates eMMC has been written since power on */
bool reenable_cmdq; /* Re-enable Command Queue */
unsigned int erase_size; /* erase size in sectors */
--
2.34.1
Fix these two error paths:
1. When set_memory_decrypted() fails, pages may be left fully or partially
decrypted.
2. Decrypted pages may be freed if swiotlb_alloc_tlb() determines that the
physical address is too high.
To fix the first issue, call set_memory_encrypted() on the allocated region
after a failed decryption attempt. If that also fails, leak the pages.
To fix the second issue, check that the TLB physical address is below the
requested limit before decrypting.
Let the caller differentiate between unsuitable physical address (=> retry
from a lower zone) and allocation failures (=> no point in retrying).
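The ordering of the two fixes can be illustrated with a userspace model. Everything here (enum alloc_result, struct page_model, alloc_dma_model() and its decrypt_ok/encrypt_ok switches) is hypothetical and only simulates the control flow described above: the physical-limit check runs before any decryption, and pages are freed after a failed decryption only if re-encryption succeeds; otherwise they are leaked.

```c
#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

/* Hypothetical outcomes, standing in for page / NULL / ERR_PTR(-EAGAIN). */
enum alloc_result { ALLOC_OK, ALLOC_FAILED, ALLOC_TOO_HIGH };

/* Simulated page state; not a kernel structure. */
struct page_model {
	uint64_t paddr;
	bool decrypted;
	bool freed;
};

static enum alloc_result alloc_dma_model(struct page_model *p, uint64_t bytes,
					 uint64_t phys_limit,
					 bool decrypt_ok, bool encrypt_ok)
{
	/* Check the limit before touching the encryption state. */
	if (p->paddr + bytes - 1 > phys_limit) {
		p->freed = true;          /* still encrypted, safe to free */
		return ALLOC_TOO_HIGH;    /* caller may retry from a lower zone */
	}

	if (decrypt_ok) {
		p->decrypted = true;
		return ALLOC_OK;
	}

	/* Decryption failed: pages may be fully or partially decrypted. */
	p->decrypted = true;
	if (!encrypt_ok)
		return ALLOC_FAILED;      /* intentional leak: never free these */

	p->decrypted = false;
	p->freed = true;
	return ALLOC_FAILED;
}
```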
Cc: stable(a)vger.kernel.org
Cc: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Fixes: 79636caad361 ("swiotlb: if swiotlb is full, fall back to a transient memory pool")
Signed-off-by: Petr Tesarik <petr.tesarik1(a)huawei-partners.com>
---
kernel/dma/swiotlb.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/kernel/dma/swiotlb.c b/kernel/dma/swiotlb.c
index dff067bd56b1..0e1632f75421 100644
--- a/kernel/dma/swiotlb.c
+++ b/kernel/dma/swiotlb.c
@@ -558,29 +558,40 @@ void __init swiotlb_exit(void)
* alloc_dma_pages() - allocate pages to be used for DMA
* @gfp: GFP flags for the allocation.
* @bytes: Size of the buffer.
+ * @phys_limit: Maximum allowed physical address of the buffer.
*
* Allocate pages from the buddy allocator. If successful, make the allocated
* pages decrypted that they can be used for DMA.
*
- * Return: Decrypted pages, or %NULL on failure.
+ * Return: Decrypted pages, %NULL on allocation failure, or ERR_PTR(-EAGAIN)
+ * if the allocated physical address was above @phys_limit.
*/
-static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes)
+static struct page *alloc_dma_pages(gfp_t gfp, size_t bytes, u64 phys_limit)
{
unsigned int order = get_order(bytes);
struct page *page;
+ phys_addr_t paddr;
void *vaddr;
page = alloc_pages(gfp, order);
if (!page)
return NULL;
- vaddr = page_address(page);
+ paddr = page_to_phys(page);
+ if (paddr + bytes - 1 > phys_limit) {
+ __free_pages(page, order);
+ return ERR_PTR(-EAGAIN);
+ }
+
+ vaddr = phys_to_virt(paddr);
if (set_memory_decrypted((unsigned long)vaddr, PFN_UP(bytes)))
goto error;
return page;
error:
- __free_pages(page, order);
+ /* Intentional leak if pages cannot be encrypted again. */
+ if (!set_memory_encrypted((unsigned long)vaddr, PFN_UP(bytes)))
+ __free_pages(page, order);
return NULL;
}
@@ -618,11 +629,7 @@ static struct page *swiotlb_alloc_tlb(struct device *dev, size_t bytes,
else if (phys_limit <= DMA_BIT_MASK(32))
gfp |= __GFP_DMA32;
- while ((page = alloc_dma_pages(gfp, bytes)) &&
- page_to_phys(page) + bytes - 1 > phys_limit) {
- /* allocated, but too high */
- __free_pages(page, get_order(bytes));
-
+ while (IS_ERR(page = alloc_dma_pages(gfp, bytes, phys_limit))) {
if (IS_ENABLED(CONFIG_ZONE_DMA32) &&
phys_limit < DMA_BIT_MASK(64) &&
!(gfp & (__GFP_DMA32 | __GFP_DMA)))
--
2.34.1
We now only capture 8 bits of the OEMID in card->cid.oemid, so quirks that
specified the full 16 bits until now no longer apply.
Work around the problem by comparing only the bottom 8 bits when
checking whether quirks should be applied.
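The matching rule in the diff can be sketched in isolation. OEMID_ANY and oemid_matches() are hypothetical names; OEMID_ANY is assumed to mirror CID_OEMID_ANY as ((unsigned short)-1). A fixup written with the full 16-bit value 0x014e matches a card whose stored 8-bit OEMID is 0x4e once both sides are masked.

```c
#include <stdbool.h>
#include <assert.h>

/* Assumed to mirror CID_OEMID_ANY. */
#define OEMID_ANY ((unsigned short)-1)

/* Compare only the bottom 8 bits, as the patch does. */
static bool oemid_matches(unsigned short fixup_oemid, unsigned short card_oemid)
{
	if (fixup_oemid == OEMID_ANY)
		return true;
	return (fixup_oemid & 0xff) == (card_oemid & 0xff);
}
```

As the notes below point out, masking also means that a non-wildcard fixup whose low byte happens to be 0xff compares equal to any card OEMID ending in 0xff.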
Fixes: 84ee19bffc93 ("mmc: core: Capture correct oemid-bits for eMMC cards")
Link: https://lkml.kernel.org/r/ZToJsSLHr8RnuTHz@codewreck.org
Signed-off-by: Dominique Martinet <dominique.martinet(a)atmark-techno.com>
Cc: stable(a)vger.kernel.org
Cc: Avri Altman <avri.altman(a)wdc.com>
Cc: Ulf Hansson <ulf.hansson(a)linaro.org>
Cc: Alex Fetters <Alex.Fetters(a)garmin.com>
---
Notes:
- mmc_fixup_device() was rewritten in 5.17, so older stable kernels
will need a separate patch... I suppose I can send it to stable
after this is merged if we go this way
- struct mmc_cid's and mmc_fixup's oemid fields are unsigned shorts,
we probably just want to make them unsigned char instead in which
case we don't need that check anymore?
But it's kind of nice to have a wider type so CID_OEMID_ANY can never
be a match.... Which unfortunately my patch makes moot as
((unsigned short)-1) & 0xff will be 0xff which can match anything...
- this could also be worked around in the _FIXUP_EXT macro that builds
the fixup structs, but we're getting ugly here... Or we can just go
for the big boom and try to fix all MMC_FIXUP() users in tree and
call it a day, but that'll also be fun to backport.
drivers/mmc/core/quirks.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/mmc/core/quirks.h b/drivers/mmc/core/quirks.h
index 32b64b564fb1..27e0349e176d 100644
--- a/drivers/mmc/core/quirks.h
+++ b/drivers/mmc/core/quirks.h
@@ -211,8 +211,9 @@ static inline void mmc_fixup_device(struct mmc_card *card,
if (f->manfid != CID_MANFID_ANY &&
f->manfid != card->cid.manfid)
continue;
+ /* Only the bottom 8 bits are valid in JESD84-B51 */
if (f->oemid != CID_OEMID_ANY &&
- f->oemid != card->cid.oemid)
+ (f->oemid & 0xff) != (card->cid.oemid & 0xff))
continue;
if (f->name != CID_NAME_ANY &&
strncmp(f->name, card->cid.prod_name,
--
2.39.2
Hi Greg,
I see the following build warnings / errors everywhere on the stable-rc 5.15 branch.
ld.lld: error: undefined symbol: kallsyms_on_each_symbol
>>> referenced by trace_kprobe.c
>>> trace/trace_kprobe.o:(create_local_trace_kprobe) in archive kernel/built-in.a
>>> referenced by trace_kprobe.c
>>> trace/trace_kprobe.o:(__trace_kprobe_create) in archive kernel/built-in.a
make[1]: *** [Makefile:1227: vmlinux] Error 1
Links,
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2XXALLRIZaXJVcqhff4Z…
- Naresh
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b022f0c7e404887a7c5229788fc99eff9f9a80d5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023110246-cursive-monthly-f800@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b022f0c7e404887a7c5229788fc99eff9f9a80d5 Mon Sep 17 00:00:00 2001
From: Francis Laniel <flaniel(a)linux.microsoft.com>
Date: Fri, 20 Oct 2023 13:42:49 +0300
Subject: [PATCH] tracing/kprobes: Return EADDRNOTAVAIL when func matches
several symbols
When a kprobe is attached to a function whose name is not unique (it is
static and shares its name with other functions in the kernel), the
kprobe is attached to the first function found. This is a bug, as that
function is not necessarily the one the user wants to attach to.
Instead of blindly picking one of several ambiguous matches, fail with
EADDRNOTAVAIL to let the user know that this function name is not
unique, and that they must instead use a unique function plus an
address offset to reach the function they want to attach to.
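The decision made by the new code can be modeled in userspace. In this sketch, count_same_symbols() and check_probe_symbol() are hypothetical helpers, and a flat array of names stands in for kallsyms_on_each_match_symbol(). The function reports exactly-one, ambiguous, and missing as 0, -EADDRNOTAVAIL and -ENOENT respectively.

```c
#include <errno.h>
#include <string.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical stand-in for walking kallsyms: count exact name matches. */
static unsigned int count_same_symbols(const char *const *syms, size_t n,
				       const char *name)
{
	unsigned int count = 0;

	for (size_t i = 0; i < n; i++)
		if (strcmp(syms[i], name) == 0)
			count++;
	return count;
}

/* 0 if @name resolves to exactly one symbol, otherwise a negative errno. */
static int check_probe_symbol(const char *const *syms, size_t n,
			      const char *name)
{
	unsigned int count = count_same_symbols(syms, n, name);

	if (count > 1)
		return -EADDRNOTAVAIL; /* ambiguous: probe by address instead */
	if (count == 0)
		return -ENOENT;        /* fail before trying to register */
	return 0;
}
```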
Link: https://lore.kernel.org/all/20231020104250.9537-2-flaniel@linux.microsoft.c…
Cc: stable(a)vger.kernel.org
Fixes: 413d37d1eb69 ("tracing: Add kprobe-based event tracer")
Suggested-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Signed-off-by: Francis Laniel <flaniel(a)linux.microsoft.com>
Link: https://lore.kernel.org/lkml/20230819101105.b0c104ae4494a7d1f2eea742@kernel…
Acked-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
diff --git a/kernel/trace/trace_kprobe.c b/kernel/trace/trace_kprobe.c
index 3d7a180a8427..a8fef6ab0872 100644
--- a/kernel/trace/trace_kprobe.c
+++ b/kernel/trace/trace_kprobe.c
@@ -705,6 +705,25 @@ static struct notifier_block trace_kprobe_module_nb = {
.priority = 1 /* Invoked after kprobe module callback */
};
+static int count_symbols(void *data, unsigned long unused)
+{
+ unsigned int *count = data;
+
+ (*count)++;
+
+ return 0;
+}
+
+static unsigned int number_of_same_symbols(char *func_name)
+{
+ unsigned int count;
+
+ count = 0;
+ kallsyms_on_each_match_symbol(count_symbols, func_name, &count);
+
+ return count;
+}
+
static int __trace_kprobe_create(int argc, const char *argv[])
{
/*
@@ -836,6 +855,31 @@ static int __trace_kprobe_create(int argc, const char *argv[])
}
}
+ if (symbol && !strchr(symbol, ':')) {
+ unsigned int count;
+
+ count = number_of_same_symbols(symbol);
+ if (count > 1) {
+ /*
+ * Users should use ADDR to remove the ambiguity of
+ * using KSYM only.
+ */
+ trace_probe_log_err(0, NON_UNIQ_SYMBOL);
+ ret = -EADDRNOTAVAIL;
+
+ goto error;
+ } else if (count == 0) {
+ /*
+ * We can return ENOENT earlier than when register the
+ * kprobe.
+ */
+ trace_probe_log_err(0, BAD_PROBE_ADDR);
+ ret = -ENOENT;
+
+ goto error;
+ }
+ }
+
trace_probe_log_set_index(0);
if (event) {
ret = traceprobe_parse_event_name(&event, &group, gbuf,
@@ -1695,6 +1739,7 @@ static int unregister_kprobe_event(struct trace_kprobe *tk)
}
#ifdef CONFIG_PERF_EVENTS
+
/* create a trace_kprobe, but don't add it to global lists */
struct trace_event_call *
create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
@@ -1705,6 +1750,24 @@ create_local_trace_kprobe(char *func, void *addr, unsigned long offs,
int ret;
char *event;
+ if (func) {
+ unsigned int count;
+
+ count = number_of_same_symbols(func);
+ if (count > 1)
+ /*
+ * Users should use addr to remove the ambiguity of
+ * using func only.
+ */
+ return ERR_PTR(-EADDRNOTAVAIL);
+ else if (count == 0)
+ /*
+ * We can return ENOENT earlier than when register the
+ * kprobe.
+ */
+ return ERR_PTR(-ENOENT);
+ }
+
/*
* local trace_kprobes are not added to dyn_event, so they are never
* searched in find_trace_kprobe(). Therefore, there is no concern of
diff --git a/kernel/trace/trace_probe.h b/kernel/trace/trace_probe.h
index 02b432ae7513..850d9ecb6765 100644
--- a/kernel/trace/trace_probe.h
+++ b/kernel/trace/trace_probe.h
@@ -450,6 +450,7 @@ extern int traceprobe_define_arg_fields(struct trace_event_call *event_call,
C(BAD_MAXACT, "Invalid maxactive number"), \
C(MAXACT_TOO_BIG, "Maxactive is too big"), \
C(BAD_PROBE_ADDR, "Invalid probed address or symbol"), \
+ C(NON_UNIQ_SYMBOL, "The symbol is not unique"), \
C(BAD_RETPROBE, "Retprobe address must be an function entry"), \
C(NO_TRACEPOINT, "Tracepoint is not found"), \
C(BAD_ADDR_SUFFIX, "Invalid probed address suffix"), \
Commit 3c0897c180c6 ("cpufreq: Use scnprintf() for avoiding potential
buffer overflow") switched from snprintf to the more secure scnprintf
but never updated the exit condition for PAGE_SIZE.
As the commit message says, and as documented for scnprintf, scnprintf
returns the number of characters actually written, not counting the
trailing '\0'. So when the output would exceed the buffer, len is set to
PAGE_SIZE - 1, since at most PAGE_SIZE - 1 characters can be written (the
'\0' is not counted).
Because len is never set to PAGE_SIZE, the function never breaks early,
never prints the warning and never returns -EFBIG.
Fix this by changing the condition to PAGE_SIZE - 1 so that the error
path is correctly triggered.
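The saturation behavior can be reproduced in userspace. This sketch assumes scnprintf_model() approximates the kernel's scnprintf() on top of vsnprintf(), and fill_table() is a hypothetical loop shaped like show_trans_table(). With a 64-byte buffer, len stops at 63 and never reaches the buffer size, which is why the old "len >= PAGE_SIZE" test could never fire.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stddef.h>
#include <assert.h>

/* Approximation of scnprintf(): characters written, excluding '\0'. */
static int scnprintf_model(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int i;

	if (size == 0)
		return 0;
	va_start(args, fmt);
	i = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (i < 0)
		return 0;
	return (size_t)i < size ? i : (int)(size - 1);
}

/* Hypothetical loop in the style of show_trans_table(). */
static size_t fill_table(char *buf, size_t size, int entries)
{
	size_t len = 0;

	for (int i = 0; i < entries; i++) {
		if (len >= size - 1) /* a check of len >= size would never be true */
			break;
		len += scnprintf_model(buf + len, size - len, "%9u ", (unsigned)i);
	}
	return len;
}
```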
Cc: stable(a)vger.kernel.org
Fixes: 3c0897c180c6 ("cpufreq: Use scnprintf() for avoiding potential buffer overflow")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
---
drivers/cpufreq/cpufreq_stats.c | 14 +++++++-------
1 file changed, 7 insertions(+), 7 deletions(-)
diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index a33df3c66c88..40a9ff18da06 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -131,23 +131,23 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
len += sysfs_emit_at(buf, len, " From : To\n");
len += sysfs_emit_at(buf, len, " : ");
for (i = 0; i < stats->state_num; i++) {
- if (len >= PAGE_SIZE)
+ if (len >= PAGE_SIZE - 1)
break;
len += sysfs_emit_at(buf, len, "%9u ", stats->freq_table[i]);
}
- if (len >= PAGE_SIZE)
- return PAGE_SIZE;
+ if (len >= PAGE_SIZE - 1)
+ return PAGE_SIZE - 1;
len += sysfs_emit_at(buf, len, "\n");
for (i = 0; i < stats->state_num; i++) {
- if (len >= PAGE_SIZE)
+ if (len >= PAGE_SIZE - 1)
break;
len += sysfs_emit_at(buf, len, "%9u: ", stats->freq_table[i]);
for (j = 0; j < stats->state_num; j++) {
- if (len >= PAGE_SIZE)
+ if (len >= PAGE_SIZE - 1)
break;
if (pending)
@@ -157,12 +157,12 @@ static ssize_t show_trans_table(struct cpufreq_policy *policy, char *buf)
len += sysfs_emit_at(buf, len, "%9u ", count);
}
- if (len >= PAGE_SIZE)
+ if (len >= PAGE_SIZE - 1)
break;
len += sysfs_emit_at(buf, len, "\n");
}
- if (len >= PAGE_SIZE) {
+ if (len >= PAGE_SIZE - 1) {
pr_warn_once("cpufreq transition table exceeds PAGE_SIZE. Disabling\n");
return -EFBIG;
}
--
2.40.1
Greetings, I have a business matter in which I referred to you, as you
have the same surname as my late client; the details will follow once
you confirm receipt of this email. Regards
This function takes a pointer to a pointer, unlike sprintf() which is
passed a plain pointer. Fix up the documentation to make this clear.
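A short userspace model shows why the pointer-to-pointer matters. sprintf_model() and GSTRING_LEN are hypothetical stand-ins, with GSTRING_LEN assumed to mirror ETH_GSTRING_LEN (32). The function writes into *data and then advances the caller's cursor by one fixed-size slot, so consecutive calls fill consecutive strings.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>

#define GSTRING_LEN 32 /* assumed to mirror ETH_GSTRING_LEN */

/* Write a formatted string at *data, then move *data to the next slot. */
static void sprintf_model(unsigned char **data, const char *fmt, ...)
{
	va_list args;

	va_start(args, fmt);
	vsnprintf((char *)*data, GSTRING_LEN, fmt, args);
	va_end(args);
	*data += GSTRING_LEN;
}
```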
Fixes: 7888fe53b706 ("ethtool: Add common function for filling out strings")
Cc: Alexander Duyck <alexanderduyck(a)fb.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrew Lunn <andrew(a)lunn.ch>
---
include/linux/ethtool.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/ethtool.h b/include/linux/ethtool.h
index 62b61527bcc4..1b523fd48586 100644
--- a/include/linux/ethtool.h
+++ b/include/linux/ethtool.h
@@ -1045,10 +1045,10 @@ static inline int ethtool_mm_frag_size_min_to_add(u32 val_min, u32 *val_add,
/**
* ethtool_sprintf - Write formatted string to ethtool string data
- * @data: Pointer to start of string to update
+ * @data: Pointer to a pointer to the start of string to update
* @fmt: Format of string to write
*
- * Write formatted string to data. Update data to point at start of
+ * Write formatted string to *data. Update *data to point at start of
* next string.
*/
extern __printf(2, 3) void ethtool_sprintf(u8 **data, const char *fmt, ...);
--
2.42.0
CCITMIN is a 12-bit field and doesn't fit in a u8, so extend it to u16.
This probably wasn't an issue previously because values higher than 255
never occurred.
But since commit 0f55b43dedcd ("coresight: etm: Override TRCIDR3.CCITMIN
on errata affected cpus"), a comparison with 256 was done to enable the
errata, generating the following W=1 build error:
coresight-etm4x-core.c:1188:24: error: result of comparison of
constant 256 with expression of type 'u8' (aka 'unsigned char') is
always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (drvdata->ccitmin == 256)
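The truncation is easy to demonstrate. ccitmin_from_idr3() is a hypothetical helper that assumes CCITMIN occupies TRCIDR3 bits [11:0]. A value of 256 survives in a u16 but becomes 0 when stored in a u8, which is why the compiler can prove the comparison with 256 is always false.

```c
#include <stdint.h>
#include <assert.h>

/* Hypothetical helper: CCITMIN assumed to be TRCIDR3 bits [11:0]. */
static uint16_t ccitmin_from_idr3(uint32_t idr3)
{
	return (uint16_t)(idr3 & 0xfff);
}
```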
Cc: stable(a)vger.kernel.org
Fixes: 54ff892b76c6 ("coresight: etm4x: splitting struct etmv4_drvdata")
Signed-off-by: James Clark <james.clark(a)arm.com>
---
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 20e2e4cb7614..da17b6c49b0f 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -1036,7 +1036,7 @@ struct etmv4_drvdata {
u8 ctxid_size;
u8 vmid_size;
u8 ccsize;
- u8 ccitmin;
+ u16 ccitmin;
u8 s_ex_level;
u8 ns_ex_level;
u8 q_support;
--
2.34.1
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Starting RX DMA on THRI interrupt is too early because TX may not have
finished yet.
This change is inspired by commit 90b8596ac460 ("serial: 8250: Prevent
starting up DMA Rx on THRI interrupt") and fixes DMA issues I had with
an AM62 SoC that is using the 8250 OMAP variant.
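The new guard can be read as a predicate on the IIR value. In this sketch, may_handle_rx_dma() is a hypothetical name and IIR_THRI mirrors UART_IIR_THRI (0x02). The low six bits of IIR carry the interrupt ID, so a THRI interrupt with the FIFO-enabled bits set (0xc2) is still excluded, while receive-data (0x04) and RX-timeout (0x0c) interrupts proceed to RX DMA handling.

```c
#include <stdbool.h>
#include <assert.h>

#define IIR_THRI 0x02 /* mirrors UART_IIR_THRI: TX holding register empty */

/* RX DMA handling only runs when the interrupt ID is not THRI. */
static bool may_handle_rx_dma(unsigned char iir)
{
	return (iir & 0x3f) != IIR_THRI;
}
```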
Cc: stable(a)vger.kernel.org
Fixes: c26389f998a8 ("serial: 8250: 8250_omap: Add DMA support for UARTs on K3 SoCs")
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
---
V4: - add missing braces to fix build warning
V3: - add Cc: stable(a)vger.kernel.org
V2: - add Fixes: tag
- fix author
drivers/tty/serial/8250/8250_omap.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index c7ab2963040b..1122f37fe744 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -1282,10 +1282,12 @@ static int omap_8250_dma_handle_irq(struct uart_port *port)
status = serial_port_in(port, UART_LSR);
- if (priv->habit & UART_HAS_EFR2)
- am654_8250_handle_rx_dma(up, iir, status);
- else
- status = omap_8250_handle_rx_dma(up, iir, status);
+ if ((iir & 0x3f) != UART_IIR_THRI) {
+ if (priv->habit & UART_HAS_EFR2)
+ am654_8250_handle_rx_dma(up, iir, status);
+ else
+ status = omap_8250_handle_rx_dma(up, iir, status);
+ }
serial8250_modem_status(up);
if (status & UART_LSR_THRE && up->dma->tx_err) {
--
2.41.0
Linux 5.10.198 commit
2cdec9c13f81 ("spi: spi-zynqmp-gqspi: Fix runtime PM imbalance in
zynqmp_qspi_probe")
looks very different compared to matching upstream commit:
a21fbc42807b ("spi: spi-zynqmp-gqspi: Fix runtime PM imbalance in
zynqmp_qspi_probe")
The Linux 5.10.198 change breaks a platform for me and it really looks
like an incorrect backport.
Dinghao, can you have a look?
Thank you
Hi,
(Preface: I was not aware of the regression reporting process and initially reported this informally to the developers mentioned in the change.)
I upgraded from 6.1.38 to 6.1.55 this morning and it broke my traffic shaping script, leaving me with a non-functional uplink on a remote router.
The script errors out like this:
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + ext=ispA
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + ext_ingress=ifb0
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + modprobe ifb
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + modprobe act_mirred
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc qdisc del dev ispA root
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2061]: Error: Cannot delete qdisc with handle of zero.
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + true
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc qdisc del dev ispA ingress
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2064]: Error: Cannot find specified qdisc on specified device.
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + true
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc qdisc del dev ifb0 root
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2066]: Error: Cannot delete qdisc with handle of zero.
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + true
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc qdisc del dev ifb0 ingress
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2067]: Error: Cannot find specified qdisc on specified device.
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + true
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc qdisc add dev ispA handle ffff: ingress
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + ifconfig ifb0 up
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc filter add dev ispA parent ffff: protocol all u32 match u32 0 0 action mirred egress redirect dev ifb0
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc qdisc add dev ifb0 root handle 1: hfsc default 1
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc class add dev ifb0 parent 1: classid 1:999 hfsc rt m2 2.5gbit
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2053]: + tc class add dev ifb0 parent 1:999 classid 1:1 hfsc sc rate 50mbit
Oct 06 05:49:22 wendy00 isp-setup-shaping-start[2077]: Error: Invalid parent - parent class must have FSC.
The error message is also a bit confusing (though that's likely down to iproute2), as the `tc` command-line options and the error message do not map onto each other well. (I think I would have to choose `hfsc sc` on the parent to enable the FSC option, which isn't mentioned anywhere in the hfsc manpage.)
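The rule that now rejects the script can be sketched as a predicate. The flag names and values below are assumptions meant to mirror HFSC_RSC/HFSC_FSC/HFSC_USC, and struct hfsc_class_model and may_be_parent() are hypothetical. In tc terms, `rt` sets only a real-time curve, `ls` sets the link-sharing (fsc) curve, and `sc` sets both, so the parent 1:999 created with `hfsc rt` has no fsc curve and cannot take children.

```c
#include <stdbool.h>
#include <assert.h>

/* Assumed to mirror HFSC_RSC / HFSC_FSC / HFSC_USC. */
enum { CURVE_RSC = 0x1, CURVE_FSC = 0x2, CURVE_USC = 0x4 };

struct hfsc_class_model {
	unsigned int flags;
	bool is_root;
};

/* Non-root classes may only become parents if they have an fsc curve. */
static bool may_be_parent(const struct hfsc_class_model *parent)
{
	return parent->is_root || (parent->flags & CURVE_FSC);
}
```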
The breaking change was introduced in 6.1.53 [1] and backported to many other currently supported kernels:
----
commit a1e820fc7808e42b990d224f40e9b4895503ac40
Author: Budimir Markovic <markovicbudimir(a)gmail.com>
Date: Thu Aug 24 01:49:05 2023 -0700
net/sched: sch_hfsc: Ensure inner classes have fsc curve
[ Upstream commit b3d26c5702c7d6c45456326e56d2ccf3f103e60f ]
HFSC assumes that inner classes have an fsc curve, but it is currently
possible for classes without an fsc curve to become parents. This leads
to bugs including a use-after-free.
Don't allow non-root classes without HFSC_FSC to become parents.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: Budimir Markovic <markovicbudimir(a)gmail.com>
Signed-off-by: Budimir Markovic <markovicbudimir(a)gmail.com>
Acked-by: Jamal Hadi Salim <jhs(a)mojatatu.com>
Link: https://lore.kernel.org/r/20230824084905.422-1-markovicbudimir@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
----
Regards,
Christian
[1] https://cdn.kernel.org/pub/linux/kernel/v6.x/ChangeLog-6.1.53
#regzbot introduced: a1e820fc7808e42b990d224f40e9b4895503ac40
--
Christian Theune · ct(a)flyingcircus.io · +49 345 219401 0
Flying Circus Internet Operations GmbH · https://flyingcircus.io
Leipziger Str. 70/71 · 06108 Halle (Saale) · Deutschland
HR Stendal HRB 21169 · Geschäftsführer: Christian Theune, Christian Zagrodnick
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Looking at how a dentry is removed via the tracefs system, I found that
eventfs does not do everything that it did under tracefs. The tracefs
removal of a dentry calls simple_recursive_removal(), which does a lot more
than a simple d_invalidate().
It should be a requirement that if an eventfs_inode has a dentry, so
does its parent. So when removing an eventfs_inode that has a dentry, a
call to simple_recursive_removal() on that dentry should clean up all the
dentries underneath it.
Add WARN_ON_ONCE() to check for the parent having a dentry if any children
do.
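The invariant behind the new WARN_ON_ONCE() can be stated as a recursive check. struct ei_model and dentry_invariant_holds() are hypothetical, simplified stand-ins for eventfs_inode that keep only what the check needs. The function returns false exactly when some child has a dentry while its parent does not.

```c
#include <stdbool.h>
#include <stddef.h>
#include <assert.h>

/* Hypothetical, simplified eventfs_inode. */
struct ei_model {
	bool has_dentry;
	struct ei_model **children;
	size_t nr_children;
};

/* True if no node has a dentry while its parent lacks one. */
static bool dentry_invariant_holds(const struct ei_model *ei)
{
	for (size_t i = 0; i < ei->nr_children; i++) {
		const struct ei_model *child = ei->children[i];

		if (child->has_dentry && !ei->has_dentry)
			return false; /* the case WARN_ON_ONCE() flags */
		if (!dentry_invariant_holds(child))
			return false;
	}
	return true;
}
```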
Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/
Cc: stable(a)vger.kernel.org
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since the last patch: https://lore.kernel.org/linux-trace-kernel/20231031144703.71eef3a0@gandalf.…
- Was originally called: eventfs: Process deletion of dentry more thoroughly
- Al Viro pointed out that I could use simple_recursive_removal() instead.
  I had originally thought that I could not, but on looking deeper I
  realized that if a dentry exists on any eventfs_inode, then all of its
  parent eventfs_inodes must also have a dentry. Hence, calling
  simple_recursive_removal() on the top dentry cleans up all the
  child dentries as well. Doing it his way cleans up the code quite
  a bit!
fs/tracefs/event_inode.c | 77 +++++++++++++++++++++++-----------------
fs/tracefs/internal.h | 2 --
2 files changed, 44 insertions(+), 35 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0087a3f455f1..f8a594a50ae6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -967,30 +967,29 @@ static void unhook_dentry(struct dentry *dentry)
{
if (!dentry)
return;
-
- /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ /*
+ * Need to add a reference to the dentry that is expected by
+ * simple_recursive_removal(), which will include a dput().
+ */
dget(dentry);
- dentry->d_fsdata = NULL;
- d_invalidate(dentry);
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
+ /*
+ * Also add a reference for the dput() in eventfs_workfn().
+ * That is required as that dput() will free the ei after
+ * the SRCU grace period is over.
+ */
+ dget(dentry);
}
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
- * @head: the list head to place the deleted @ei and children
* @level: prevent recursion from going more than 3 levels deep.
*
* This function recursively removes eventfs_inodes which
* contains info of files and/or directories.
*/
-static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, int level)
+static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
{
struct eventfs_inode *ei_child;
@@ -1009,13 +1008,26 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
/* search for nested folders or files */
list_for_each_entry_srcu(ei_child, &ei->children, list,
lockdep_is_held(&eventfs_mutex)) {
- eventfs_remove_rec(ei_child, head, level + 1);
+ /* Children only have dentry if parent does */
+ WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+ eventfs_remove_rec(ei_child, level + 1);
}
+
ei->is_freed = 1;
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (ei->d_children[i]) {
+ /* Children only have dentry if parent does */
+ WARN_ON_ONCE(!ei->dentry);
+ unhook_dentry(ei->d_children[i]);
+ }
+ }
+
+ unhook_dentry(ei->dentry);
+
list_del_rcu(&ei->list);
- list_add_tail(&ei->del_list, head);
+ call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
}
/**
@@ -1026,30 +1038,22 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
*/
void eventfs_remove_dir(struct eventfs_inode *ei)
{
- struct eventfs_inode *tmp;
- LIST_HEAD(ei_del_list);
+ struct dentry *dentry;
if (!ei)
return;
- /*
- * Move the deleted eventfs_inodes onto the ei_del_list
- * which will also set the is_freed value. Note, this has to be
- * done under the eventfs_mutex, but the deletions of
- * the dentries must be done outside the eventfs_mutex.
- * Hence moving them to this temporary list.
- */
mutex_lock(&eventfs_mutex);
- eventfs_remove_rec(ei, &ei_del_list, 0);
+ dentry = ei->dentry;
+ eventfs_remove_rec(ei, 0);
mutex_unlock(&eventfs_mutex);
- list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) {
- for (int i = 0; i < ei->nr_entries; i++)
- unhook_dentry(ei->d_children[i]);
- unhook_dentry(ei->dentry);
- list_del(&ei->del_list);
- call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
- }
+ /*
+ * If any of the ei children has a dentry, then the ei itself
+ * must have a dentry.
+ */
+ if (dentry)
+ simple_recursive_removal(dentry, NULL);
}
/**
@@ -1060,10 +1064,17 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
*/
void eventfs_remove_events_dir(struct eventfs_inode *ei)
{
- struct dentry *dentry = ei->dentry;
+ struct dentry *dentry;
+ dentry = ei->dentry;
eventfs_remove_dir(ei);
- /* Matches the dget() from eventfs_create_events_dir() */
+ /*
+ * Matches the dget() done by tracefs_start_creating()
+ * in eventfs_create_events_dir() when the dentry was
+ * created. In other words, it's a normal dentry that
+ * sticks around while the other ei->dentry are created
+ * and destroyed dynamically.
+ */
dput(dentry);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 06a1f220b901..ccee18ca66c7 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -55,12 +55,10 @@ struct eventfs_inode {
/*
* Union - used for deletion
* @llist: for calling dput() if needed after RCU
- * @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
*/
union {
struct llist_node llist;
- struct list_head del_list;
struct rcu_head rcu;
};
unsigned int is_freed:1;
--
2.42.0
An immediate is incorrectly cast to u32 before being spilled, losing its
sign information, so the range information is incorrect when it is loaded
again. Fix the immediate spill by removing the cast. The second patch adds
a test case for this.
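The lost sign can be shown with plain integer conversions. Both helpers are hypothetical models of the value the verifier records for a 64-bit stack slot, assuming the stored 32-bit immediate is sign-extended to 64 bits. Casting through u32 zero-extends instead, so an immediate of -1 is tracked as 4294967295.

```c
#include <stdint.h>
#include <assert.h>

/* Buggy model: casting through u32 zero-extends and drops the sign. */
static int64_t spill_with_u32_cast(int32_t imm)
{
	return (int64_t)(uint32_t)imm;
}

/* Fixed model: the immediate is sign-extended to 64 bits. */
static int64_t spill_sign_extended(int32_t imm)
{
	return (int64_t)imm;
}
```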
Signed-off-by: Hao Sun <sunhao.th(a)gmail.com>
---
Changes in v3:
- Change the expected log to fix the test case
- Link to v2: https://lore.kernel.org/r/20231101-fix-check-stack-write-v2-0-cb7c17b869b0@…
Changes in v2:
- Add fix and cc tags.
- Link to v1: https://lore.kernel.org/r/20231026-fix-check-stack-write-v1-0-6b325ef3ce7e@…
---
Hao Sun (2):
bpf: Fix check_stack_write_fixed_off() to correctly spill imm
selftests/bpf: Add test for immediate spilled to stack
kernel/bpf/verifier.c | 2 +-
tools/testing/selftests/bpf/verifier/bpf_st_mem.c | 32 +++++++++++++++++++++++
2 files changed, 33 insertions(+), 1 deletion(-)
---
base-commit: f2fbb908112311423b09cd0d2b4978f174b99585
change-id: 20231026-fix-check-stack-write-c40996694dfa
Best regards,
--
Hao Sun <sunhao.th(a)gmail.com>
From: Willem de Bruijn <willemb(a)google.com>
LLC reads the mac header with eth_hdr without verifying that the skb
has an Ethernet header.
Syzbot was able to enter llc_rcv on a tun device. Tun can insert
packets without mac len and with user configurable skb->protocol
(passing a tun_pi header when not configuring IFF_NO_PI).
BUG: KMSAN: uninit-value in llc_station_ac_send_test_r net/llc/llc_station.c:81 [inline]
BUG: KMSAN: uninit-value in llc_station_rcv+0x6fb/0x1290 net/llc/llc_station.c:111
llc_station_ac_send_test_r net/llc/llc_station.c:81 [inline]
llc_station_rcv+0x6fb/0x1290 net/llc/llc_station.c:111
llc_rcv+0xc5d/0x14a0 net/llc/llc_input.c:218
__netif_receive_skb_one_core net/core/dev.c:5523 [inline]
__netif_receive_skb+0x1a6/0x5a0 net/core/dev.c:5637
netif_receive_skb_internal net/core/dev.c:5723 [inline]
netif_receive_skb+0x58/0x660 net/core/dev.c:5782
tun_rx_batched+0x3ee/0x980 drivers/net/tun.c:1555
tun_get_user+0x54c5/0x69c0 drivers/net/tun.c:2002
Add a mac_len test before all three eth_hdr(skb) calls under net/llc.
There are further uses in include/net/llc_pdu.h. All of these are
protected by a test skb->protocol == ETH_P_802_2, which does not
protect against this tun scenario.
But the mac_len test added in this patch in llc_fixup_skb will
indirectly protect those too. That is called from llc_rcv before any
other LLC code.
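The guard added by this patch follows a simple pattern: do not read a header field unless the header is known to be fully present. Below is a userspace sketch of that pattern. struct fake_skb and fake_llc_pdulen() are hypothetical stand-ins for struct sk_buff and the llc_fixup_skb() logic, not kernel APIs.

```c
#include <stdint.h>

#ifndef ETH_HLEN
#define ETH_HLEN 14 /* 6-byte dst + 6-byte src + 2-byte type/length */
#endif

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct fake_skb {
	const uint8_t *mac;   /* start of the link-layer header */
	unsigned int mac_len; /* bytes of link-layer header actually present */
};

/*
 * Read the 802.3 length field (eth_hdr(skb)->h_proto in the patch) only
 * if a full Ethernet header is present. Returns 0 and stores the
 * host-order value on success, -1 if the header is too short, as with a
 * frame injected through tun without a MAC header.
 */
static int fake_llc_pdulen(const struct fake_skb *skb, uint16_t *out)
{
	if (skb->mac_len < ETH_HLEN)
		return -1;
	/* h_proto sits at offset 12 and is big-endian on the wire. */
	*out = (uint16_t)((skb->mac[12] << 8) | skb->mac[13]);
	return 0;
}
```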
It is tempting to just add a blanket mac_len check in llc_rcv, but I am
not sure whether that could break valid LLC paths that do not assume
an Ethernet header. 802.2 LLC may be used on top of non-802.3
protocols in principle. The commit referenced below shows that it used
to be, on top of Token Ring.
At least one of the three eth_hdr uses goes back to before the start
of git history. But the one that syzbot exercises was introduced in
this commit. That commit is old enough (2008) that effectively all
stable kernels should receive this.
Fixes: f83f1768f833 ("[LLC]: skb allocation size for responses")
Reported-by: syzbot+a8c7be6dee0de1b669cc(a)syzkaller.appspotmail.com
Signed-off-by: Willem de Bruijn <willemb(a)google.com>
---
Changes
v1->v2
- fix return value in llc_sap_action_send_test_r
- add Fixes tag
- cc: stable(a)vger.kernel.org
---
net/llc/llc_input.c | 10 ++++++++--
net/llc/llc_s_ac.c | 3 +++
net/llc/llc_station.c | 3 +++
3 files changed, 14 insertions(+), 2 deletions(-)
diff --git a/net/llc/llc_input.c b/net/llc/llc_input.c
index 7cac441862e21..51bccfb00a9cd 100644
--- a/net/llc/llc_input.c
+++ b/net/llc/llc_input.c
@@ -127,8 +127,14 @@ static inline int llc_fixup_skb(struct sk_buff *skb)
skb->transport_header += llc_len;
skb_pull(skb, llc_len);
if (skb->protocol == htons(ETH_P_802_2)) {
- __be16 pdulen = eth_hdr(skb)->h_proto;
- s32 data_size = ntohs(pdulen) - llc_len;
+ __be16 pdulen;
+ s32 data_size;
+
+ if (skb->mac_len < ETH_HLEN)
+ return 0;
+
+ pdulen = eth_hdr(skb)->h_proto;
+ data_size = ntohs(pdulen) - llc_len;
if (data_size < 0 ||
!pskb_may_pull(skb, data_size))
diff --git a/net/llc/llc_s_ac.c b/net/llc/llc_s_ac.c
index 79d1cef8f15a9..06fb8e6944b06 100644
--- a/net/llc/llc_s_ac.c
+++ b/net/llc/llc_s_ac.c
@@ -153,6 +153,9 @@ int llc_sap_action_send_test_r(struct llc_sap *sap, struct sk_buff *skb)
int rc = 1;
u32 data_size;
+ if (skb->mac_len < ETH_HLEN)
+ return 1;
+
llc_pdu_decode_sa(skb, mac_da);
llc_pdu_decode_da(skb, mac_sa);
llc_pdu_decode_ssap(skb, &dsap);
diff --git a/net/llc/llc_station.c b/net/llc/llc_station.c
index 05c6ae0920534..f506542925109 100644
--- a/net/llc/llc_station.c
+++ b/net/llc/llc_station.c
@@ -76,6 +76,9 @@ static int llc_station_ac_send_test_r(struct sk_buff *skb)
u32 data_size;
struct sk_buff *nskb;
+ if (skb->mac_len < ETH_HLEN)
+ goto out;
+
/* The test request command is type U (llc_len = 3) */
data_size = ntohs(eth_hdr(skb)->h_proto) - 3;
nskb = llc_alloc_frame(NULL, skb->dev, LLC_PDU_TYPE_U, data_size);
--
2.42.0.758.gaed0368e0e-goog
BPF_END and BPF_NEG have a different specification for the source bit in
the opcode compared to other ALU/ALU64 instructions: the bit is either
reserved or used to specify the byte swap endianness. In both cases the
source bit does not encode the source operand location, and src_reg is a
reserved field.
backtrack_insn() currently does not differentiate BPF_END and BPF_NEG
from other ALU/ALU64 instructions. Since BPF_TO_BE shares its encoding
with BPF_X, the generic path treats the reserved src_reg (0) as a register
source, which leads to r0 being incorrectly marked as precise when
processing BPF_ALU | BPF_TO_BE | BPF_END instructions. This commit teaches
backtrack_insn() to correctly mark precision for such cases.
While precision tracking of BPF_NEG and the other BPF_END instructions is
correct and does not need fixing, this commit opts to process all BPF_NEG
and BPF_END instructions within the same if-clause to better align with
the convention currently used in the verifier (e.g. check_alu_op).
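The following userspace sketch models the ALU branch of backtrack_insn() after this change, using a bitmask of registers that need precision. The opcode constants follow the BPF instruction encoding. fake_backtrack_alu() is hypothetical and omits everything else the verifier does, including instruction classes and 32-bit subregisters.

```c
#include <stdint.h>

/* Opcode and source-bit values from the BPF instruction encoding. */
#define BPF_OP(code)  ((code) & 0xf0)
#define BPF_SRC(code) ((code) & 0x08)
#define BPF_ADD   0x00
#define BPF_NEG   0x80
#define BPF_MOV   0xb0
#define BPF_END   0xd0
#define BPF_K     0x00
#define BPF_X     0x08
#define BPF_TO_BE BPF_X /* for BPF_END the source bit selects endianness */

/*
 * Hypothetical, simplified model: 'precise' is the set of registers that
 * need precision after the instruction; the result is the set needed
 * before it.
 */
static uint32_t fake_backtrack_alu(uint32_t precise, uint8_t code,
				   unsigned int dreg, unsigned int sreg)
{
	uint8_t op = BPF_OP(code);

	if (!(precise & (1u << dreg)))
		return precise;            /* dreg not tracked: nothing to do */

	if (op == BPF_END || op == BPF_NEG)
		return precise;            /* src_reg reserved; dreg stays precise */

	if (op == BPF_MOV) {
		precise &= ~(1u << dreg);  /* dreg is overwritten... */
		if (BPF_SRC(code) == BPF_X)
			precise |= 1u << sreg; /* ...with the value of sreg */
		return precise;
	}

	/* dreg op= sreg: both operands feed the result. */
	if (BPF_SRC(code) == BPF_X)
		precise |= 1u << sreg;
	return precise;
}
```

Without the BPF_END/BPF_NEG early return, an instruction with code BPF_END | BPF_TO_BE would fall through to the last branch and mark register sreg, which is r0 because src_reg is zero.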
Fixes: b5dc0163d8fd ("bpf: precise scalar_value tracking")
Cc: stable(a)vger.kernel.org
Reported-by: Mohamed Mahmoud <mmahmoud(a)redhat.com>
Closes: https://lore.kernel.org/r/87jzrrwptf.fsf@toke.dk
Tested-by: Toke Høiland-Jørgensen <toke(a)redhat.com>
Tested-by: Tao Lyu <tao.lyu(a)epfl.ch>
Acked-by: Eduard Zingerman <eddyz87(a)gmail.com>
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu(a)suse.com>
---
kernel/bpf/verifier.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 873ade146f3d..ba9aee3a4269 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3426,7 +3426,12 @@ static int backtrack_insn(struct bpf_verifier_env *env, int idx, int subseq_idx,
if (class == BPF_ALU || class == BPF_ALU64) {
if (!bt_is_reg_set(bt, dreg))
return 0;
- if (opcode == BPF_MOV) {
+ if (opcode == BPF_END || opcode == BPF_NEG) {
+ /* sreg is reserved and unused
+ * dreg still need precision before this insn
+ */
+ return 0;
+ } else if (opcode == BPF_MOV) {
if (BPF_SRC(insn->code) == BPF_X) {
/* dreg = sreg or dreg = (s8, s16, s32)sreg
* dreg needs precision after this insn
--
2.42.0
A recent change to the optimization pipeline in LLVM reveals some
fragility around the inlining of LoongArch's __percpu functions, which
manifests as a BUILD_BUG() failure:
In file included from kernel/sched/build_policy.c:17:
In file included from include/linux/sched/cputime.h:5:
In file included from include/linux/sched/signal.h:5:
In file included from include/linux/rculist.h:11:
In file included from include/linux/rcupdate.h:26:
In file included from include/linux/irqflags.h:18:
arch/loongarch/include/asm/percpu.h:97:3: error: call to '__compiletime_assert_51' declared with 'error' attribute: BUILD_BUG failed
97 | BUILD_BUG();
| ^
include/linux/build_bug.h:59:21: note: expanded from macro 'BUILD_BUG'
59 | #define BUILD_BUG() BUILD_BUG_ON_MSG(1, "BUILD_BUG failed")
| ^
include/linux/build_bug.h:39:37: note: expanded from macro 'BUILD_BUG_ON_MSG'
39 | #define BUILD_BUG_ON_MSG(cond, msg) compiletime_assert(!(cond), msg)
| ^
include/linux/compiler_types.h:425:2: note: expanded from macro 'compiletime_assert'
425 | _compiletime_assert(condition, msg, __compiletime_assert_, __COUNTER__)
| ^
include/linux/compiler_types.h:413:2: note: expanded from macro '_compiletime_assert'
413 | __compiletime_assert(condition, msg, prefix, suffix)
| ^
include/linux/compiler_types.h:406:4: note: expanded from macro '__compiletime_assert'
406 | prefix ## suffix(); \
| ^
<scratch space>:86:1: note: expanded from here
86 | __compiletime_assert_51
| ^
1 error generated.
If these functions are not inlined, the BUILD_BUG() in the default case
cannot be eliminated since the compiler cannot prove it is never used,
resulting in a build failure due to the error attribute.
Mark these functions as __always_inline so that the BUILD_BUG() only
triggers when the default case genuinely cannot be eliminated due to an
unexpected size.
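The following userspace sketch shows the size-dispatch pattern the patch relies on. Callers pass sizeof() of the variable, so once the helper is inlined, the switch collapses to a single case. The names are hypothetical, and plain loads replace the LoongArch assembly. The compile-time BUILD_BUG() is replaced with a run-time abort() so the sketch builds at any optimization level. In the kernel, the error-attribute version builds only if the optimizer removes the default branch, which forcing the inline makes dependable in optimized builds.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for BUILD_BUG(): run-time instead of compile-time. */
static void fake_build_bug(void)
{
	abort();
}

/* Forced inline so a constant 'size' lets the compiler fold the switch.
 * Assumes a 64-bit unsigned long, as on LoongArch64. */
static inline __attribute__((always_inline))
unsigned long fake_percpu_read(const void *ptr, int size)
{
	switch (size) {
	case 1: return *(const uint8_t *)ptr;
	case 2: return *(const uint16_t *)ptr;
	case 4: return *(const uint32_t *)ptr;
	case 8: return (unsigned long)*(const uint64_t *)ptr;
	default:
		fake_build_bug(); /* only reachable for an unexpected size */
		return 0;
	}
}
```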
Cc: <stable(a)vger.kernel.org>
Closes: https://github.com/ClangBuiltLinux/linux/issues/1955
Fixes: 46859ac8af52 ("LoongArch: Add multi-processor (SMP) support")
Link: https://github.com/llvm/llvm-project/commit/1a2e77cf9e11dbf56b5720c607313a5…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/loongarch/include/asm/percpu.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/loongarch/include/asm/percpu.h b/arch/loongarch/include/asm/percpu.h
index b9f567e66016..8fb857ae962b 100644
--- a/arch/loongarch/include/asm/percpu.h
+++ b/arch/loongarch/include/asm/percpu.h
@@ -32,7 +32,7 @@ static inline void set_my_cpu_offset(unsigned long off)
#define __my_cpu_offset __my_cpu_offset
#define PERCPU_OP(op, asm_op, c_op) \
-static inline unsigned long __percpu_##op(void *ptr, \
+static __always_inline unsigned long __percpu_##op(void *ptr, \
unsigned long val, int size) \
{ \
unsigned long ret; \
@@ -63,7 +63,7 @@ PERCPU_OP(and, and, &)
PERCPU_OP(or, or, |)
#undef PERCPU_OP
-static inline unsigned long __percpu_read(void *ptr, int size)
+static __always_inline unsigned long __percpu_read(void *ptr, int size)
{
unsigned long ret;
---
base-commit: 278be83601dd1725d4732241f066d528e160a39d
change-id: 20231101-loongarch-always-inline-percpu-ops-cf77c161871f
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Top post yada yada, I don't care.
So, I got emailed on 190 patches from you that I don't think you
intended to send publicly.
Greg, if you see this, this is probably already in stable and is not
6.1-stable material anyway, I just had backported some stuff to my
vendor tree that is the source of this.
Cheers,
Conor.
On Wed, Nov 01, 2023 at 05:56:25PM +0000, Saravanan.K.S wrote:
>
> From: Heiko Stuebner <heiko.stuebner(a)vrull.eu>
>
> commit 7d27a3bb961dfb59ececd2f641b09fbe7198837e from
> https://github.com/linux4microchip/linux.git linux-6.1-mchp
>
> On the non-assembler side, wrapping alternative-macros inside other macros
> to prevent duplication of code works, as the end result will just be a
> string that gets fed to the asm instruction.
>
> In real assembler code, wrapping .macro blocks inside other .macro blocks
> seems to bring more restrictions on usage, and the optimization done by
> commit 2ba8c7dc71c0 ("riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2")
> results in a compile error like:
>
> ../arch/riscv/lib/strcmp.S: Assembler messages:
> ../arch/riscv/lib/strcmp.S:15: Error: too many positional arguments
> ../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "886:"
> ../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "887:"
> ../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "886:"
> ../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "887:"
> ../arch/riscv/lib/strcmp.S:15: Error: backward ref to unknown label "886:"
> ../arch/riscv/lib/strcmp.S:15: Error: attempt to move .org backwards
>
> Wrapping the variables containing assembler code in quotes solves this issue:
> compilation succeeds, the code in question still works, and objdump also
> shows sane disassembly of the affected code.
>
> Fixes: 2ba8c7dc71c0 ("riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2")
> Signed-off-by: Heiko Stuebner <heiko.stuebner(a)vrull.eu>
> Reviewed-by: Palmer Dabbelt <palmer(a)rivosinc.com>
> Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
> Link: https://lore.kernel.org/r/20230105192610.1940841-1-heiko@sntech.de
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
> Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
> Signed-off-by: Saravanan.K.S <saravanan.kadambathursubramaniyam(a)windriver.com>
> ---
> arch/riscv/include/asm/alternative-macros.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/alternative-macros.h b/arch/riscv/include/asm/alternative-macros.h
> index 7226e2462584..2c0f4c887289 100644
> --- a/arch/riscv/include/asm/alternative-macros.h
> +++ b/arch/riscv/include/asm/alternative-macros.h
> @@ -46,7 +46,7 @@
>
> .macro ALTERNATIVE_CFG_2 old_c, new_c_1, vendor_id_1, errata_id_1, enable_1, \
> new_c_2, vendor_id_2, errata_id_2, enable_2
> - ALTERNATIVE_CFG \old_c, \new_c_1, \vendor_id_1, \errata_id_1, \enable_1
> + ALTERNATIVE_CFG "\old_c", "\new_c_1", \vendor_id_1, \errata_id_1, \enable_1
> ALT_NEW_CONTENT \vendor_id_2, \errata_id_2, \enable_2, \new_c_2
> .endm
>
> --
> 2.40.0
>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Looking at how a dentry is removed via the tracefs system, I found that
eventfs does not do everything that it did under tracefs. The tracefs
removal of a dentry calls simple_recursive_removal(), which does a lot more
than a simple d_invalidate().
It should be a requirement that if any eventfs_inode has a dentry, so
does its parent. When removing an eventfs_inode, if it has a dentry, a call
to simple_recursive_removal() on that dentry should clean up all the
dentries underneath it.
Add WARN_ON_ONCE() to check for the parent having a dentry if any children
do.
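The invariant behind this patch is that a child may have a dentry only if its parent does, so one recursive removal starting at the top dentry reaches every dentry below it. The following userspace sketch models a recursive removal in the shape of eventfs_remove_rec() that counts invariant violations where the patch uses WARN_ON_ONCE(). struct fake_ei_node and fake_remove_rec() are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, simplified model of an eventfs_inode tree. */
struct fake_ei_node {
	bool has_dentry;               /* stands in for ei->dentry != NULL */
	bool is_freed;
	struct fake_ei_node *children[4];
	int nr_children;
};

/*
 * Mark every node freed, count violations of "children only have a
 * dentry if the parent does", and return how many dentries exist in the
 * subtree (all of which a single recursive removal must reach).
 */
static int fake_remove_rec(struct fake_ei_node *ei, int *violations)
{
	int dentries = ei->has_dentry ? 1 : 0;

	for (int i = 0; i < ei->nr_children; i++) {
		struct fake_ei_node *child = ei->children[i];

		if (child->has_dentry && !ei->has_dentry)
			(*violations)++;   /* WARN_ON_ONCE() in the patch */
		dentries += fake_remove_rec(child, violations);
	}
	ei->is_freed = true;
	return dentries;
}
```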
Link: https://lore.kernel.org/all/20231101022553.GE1957730@ZenIV/
Link: https://lkml.kernel.org/r/20231101172650.552471568@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 77 +++++++++++++++++++++++-----------------
fs/tracefs/internal.h | 2 --
2 files changed, 44 insertions(+), 35 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 0087a3f455f1..f8a594a50ae6 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -967,30 +967,29 @@ static void unhook_dentry(struct dentry *dentry)
{
if (!dentry)
return;
-
- /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ /*
+ * Need to add a reference to the dentry that is expected by
+ * simple_recursive_removal(), which will include a dput().
+ */
dget(dentry);
- dentry->d_fsdata = NULL;
- d_invalidate(dentry);
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
+ /*
+ * Also add a reference for the dput() in eventfs_workfn().
+ * That is required as that dput() will free the ei after
+ * the SRCU grace period is over.
+ */
+ dget(dentry);
}
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
- * @head: the list head to place the deleted @ei and children
* @level: prevent recursion from going more than 3 levels deep.
*
* This function recursively removes eventfs_inodes which
* contains info of files and/or directories.
*/
-static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head, int level)
+static void eventfs_remove_rec(struct eventfs_inode *ei, int level)
{
struct eventfs_inode *ei_child;
@@ -1009,13 +1008,26 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
/* search for nested folders or files */
list_for_each_entry_srcu(ei_child, &ei->children, list,
lockdep_is_held(&eventfs_mutex)) {
- eventfs_remove_rec(ei_child, head, level + 1);
+ /* Children only have dentry if parent does */
+ WARN_ON_ONCE(ei_child->dentry && !ei->dentry);
+ eventfs_remove_rec(ei_child, level + 1);
}
+
ei->is_freed = 1;
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (ei->d_children[i]) {
+ /* Children only have dentry if parent does */
+ WARN_ON_ONCE(!ei->dentry);
+ unhook_dentry(ei->d_children[i]);
+ }
+ }
+
+ unhook_dentry(ei->dentry);
+
list_del_rcu(&ei->list);
- list_add_tail(&ei->del_list, head);
+ call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
}
/**
@@ -1026,30 +1038,22 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
*/
void eventfs_remove_dir(struct eventfs_inode *ei)
{
- struct eventfs_inode *tmp;
- LIST_HEAD(ei_del_list);
+ struct dentry *dentry;
if (!ei)
return;
- /*
- * Move the deleted eventfs_inodes onto the ei_del_list
- * which will also set the is_freed value. Note, this has to be
- * done under the eventfs_mutex, but the deletions of
- * the dentries must be done outside the eventfs_mutex.
- * Hence moving them to this temporary list.
- */
mutex_lock(&eventfs_mutex);
- eventfs_remove_rec(ei, &ei_del_list, 0);
+ dentry = ei->dentry;
+ eventfs_remove_rec(ei, 0);
mutex_unlock(&eventfs_mutex);
- list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) {
- for (int i = 0; i < ei->nr_entries; i++)
- unhook_dentry(ei->d_children[i]);
- unhook_dentry(ei->dentry);
- list_del(&ei->del_list);
- call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
- }
+ /*
+ * If any of the ei children has a dentry, then the ei itself
+ * must have a dentry.
+ */
+ if (dentry)
+ simple_recursive_removal(dentry, NULL);
}
/**
@@ -1060,10 +1064,17 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
*/
void eventfs_remove_events_dir(struct eventfs_inode *ei)
{
- struct dentry *dentry = ei->dentry;
+ struct dentry *dentry;
+ dentry = ei->dentry;
eventfs_remove_dir(ei);
- /* Matches the dget() from eventfs_create_events_dir() */
+ /*
+ * Matches the dget() done by tracefs_start_creating()
+ * in eventfs_create_events_dir() when it the dentry was
+ * created. In other words, it's a normal dentry that
+ * sticks around while the other ei->dentry are created
+ * and destroyed dynamically.
+ */
dput(dentry);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 06a1f220b901..ccee18ca66c7 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -55,12 +55,10 @@ struct eventfs_inode {
/*
* Union - used for deletion
* @llist: for calling dput() if needed after RCU
- * @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
*/
union {
struct llist_node llist;
- struct list_head del_list;
struct rcu_head rcu;
};
unsigned int is_freed:1;
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
There exists a race between holding a reference to an eventfs_inode dentry
and the freeing of the eventfs_inode. If user space holds a dentry long
enough, it may still be able to access the dentry's eventfs_inode after it
has been freed.
To prevent this, have the eventfs_inode freed via the last dput() (or via
RCU if the eventfs_inode does not have a dentry).
This means reintroducing the eventfs_inode del_list field as a temporary
place to put the eventfs_inode. It needs to mark it as freed (via the
list) but must also invalidate the dentry immediately, as callers of
eventfs_remove_dir() expect the dentries to be invalidated when it
returns. But the dentry invalidation must not be called under the
eventfs_mutex, so it must be done after the eventfs_inode is marked as
free (put on a deletion list).
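The following userspace sketch models the lifetime rule described above. Removal only marks the entry as freed, and the memory is released either right away when no dentry refers to it, or by the last dput() otherwise. The struct and helpers are hypothetical. In the patch the immediate case goes through SRCU (free_rcu_ei()), and the dentry case is handled by eventfs_set_ei_status_free().

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of an eventfs_inode whose lifetime a dentry may pin. */
struct fake_ei_life {
	bool is_freed;   /* set under the mutex when the entry is removed */
	int dentry_refs; /* 0 means "no dentry" */
};

static int fake_frees; /* counts releases, for illustration only */

static void fake_free_ei(struct fake_ei_life *ei)
{
	fake_frees++;
	free(ei);
}

/* Removal: mark freed; release now only if no dentry can still reach ei. */
static void fake_remove(struct fake_ei_life *ei)
{
	ei->is_freed = true;
	if (ei->dentry_refs == 0)
		fake_free_ei(ei); /* deferred through SRCU in the real code */
}

/* The last dput() on the dentry releases an ei that was already removed. */
static void fake_dput(struct fake_ei_life *ei)
{
	if (--ei->dentry_refs == 0 && ei->is_freed)
		fake_free_ei(ei);
}
```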
Link: https://lkml.kernel.org/r/20231101172650.123479767@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ajay Kaher <akaher(a)vmware.com>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 146 ++++++++++++++++++---------------------
fs/tracefs/internal.h | 2 +
2 files changed, 69 insertions(+), 79 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 8ac9abf7a3d5..0a04ae0ca8c8 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -85,8 +85,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
mutex_lock(&eventfs_mutex);
ei = dentry->d_fsdata;
- /* The LSB is set when the eventfs_inode is being freed */
- if (((unsigned long)ei & 1UL) || ei->is_freed) {
+ if (ei->is_freed) {
/* Do not allow changes if the event is about to be removed. */
mutex_unlock(&eventfs_mutex);
return -ENODEV;
@@ -276,35 +275,17 @@ static void free_ei(struct eventfs_inode *ei)
void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
{
struct tracefs_inode *ti_parent;
- struct eventfs_inode *ei_child, *tmp;
struct eventfs_inode *ei;
int i;
/* The top level events directory may be freed by this */
if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
- LIST_HEAD(ef_del_list);
-
mutex_lock(&eventfs_mutex);
-
ei = ti->private;
-
- /* Record all the top level files */
- list_for_each_entry_srcu(ei_child, &ei->children, list,
- lockdep_is_held(&eventfs_mutex)) {
- list_add_tail(&ei_child->del_list, &ef_del_list);
- }
-
/* Nothing should access this, but just in case! */
ti->private = NULL;
-
mutex_unlock(&eventfs_mutex);
- /* Now safely free the top level files and their children */
- list_for_each_entry_safe(ei_child, tmp, &ef_del_list, del_list) {
- list_del(&ei_child->del_list);
- eventfs_remove_dir(ei_child);
- }
-
free_ei(ei);
return;
}
@@ -319,14 +300,6 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (!ei)
goto out;
- /*
- * If ei was freed, then the LSB bit is set for d_fsdata.
- * But this should not happen, as it should still have a
- * ref count that prevents it. Warn in case it does.
- */
- if (WARN_ON_ONCE((unsigned long)ei & 1))
- goto out;
-
/* This could belong to one of the files of the ei */
if (ei->dentry != dentry) {
for (i = 0; i < ei->nr_entries; i++) {
@@ -336,6 +309,8 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (WARN_ON_ONCE(i == ei->nr_entries))
goto out;
ei->d_children[i] = NULL;
+ } else if (ei->is_freed) {
+ free_ei(ei);
} else {
ei->dentry = NULL;
}
@@ -962,13 +937,65 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
return ERR_PTR(-ENOMEM);
}
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+ struct eventfs_inode *ei, *tmp;
+ struct llist_node *llnode;
+
+ llnode = llist_del_all(&free_list);
+ llist_for_each_entry_safe(ei, tmp, llnode, llist) {
+ /* This dput() matches the dget() from unhook_dentry() */
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (ei->d_children[i])
+ dput(ei->d_children[i]);
+ }
+ /* This should only get here if it had a dentry */
+ if (!WARN_ON_ONCE(!ei->dentry))
+ dput(ei->dentry);
+ }
+}
+
+static DECLARE_WORK(eventfs_work, eventfs_workfn);
+
static void free_rcu_ei(struct rcu_head *head)
{
struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);
+ if (ei->dentry) {
+ /* Do not free the ei until all references of dentry are gone */
+ if (llist_add(&ei->llist, &free_list))
+ queue_work(system_unbound_wq, &eventfs_work);
+ return;
+ }
+
+ /* If the ei doesn't have a dentry, neither should its children */
+ for (int i = 0; i < ei->nr_entries; i++) {
+ WARN_ON_ONCE(ei->d_children[i]);
+ }
+
free_ei(ei);
}
+static void unhook_dentry(struct dentry *dentry)
+{
+ if (!dentry)
+ return;
+
+ /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ dget(dentry);
+
+ dentry->d_fsdata = NULL;
+ d_invalidate(dentry);
+ mutex_lock(&eventfs_mutex);
+ /* dentry should now have at least a single reference */
+ WARN_ONCE((int)d_count(dentry) < 1,
+ "dentry %px (%s) less than one reference (%d) after invalidate\n",
+ dentry, dentry->d_name.name, d_count(dentry));
+ mutex_unlock(&eventfs_mutex);
+}
+
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -1006,33 +1033,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
list_add_tail(&ei->del_list, head);
}
-static void unhook_dentry(struct dentry **dentry, struct dentry **list)
-{
- if (*dentry) {
- unsigned long ptr = (unsigned long)*list;
-
- /* Keep the dentry from being freed yet */
- dget(*dentry);
-
- /*
- * Paranoid: The dget() above should prevent the dentry
- * from being freed and calling eventfs_set_ei_status_free().
- * But just in case, set the link list LSB pointer to 1
- * and have eventfs_set_ei_status_free() check that to
- * make sure that if it does happen, it will not think
- * the d_fsdata is an eventfs_inode.
- *
- * For this to work, no eventfs_inode should be allocated
- * on a odd space, as the ef should always be allocated
- * to be at least word aligned. Check for that too.
- */
- WARN_ON_ONCE(ptr & 1);
-
- (*dentry)->d_fsdata = (void *)(ptr | 1);
- *list = *dentry;
- *dentry = NULL;
- }
-}
/**
* eventfs_remove_dir - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -1043,40 +1043,28 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
{
struct eventfs_inode *tmp;
LIST_HEAD(ei_del_list);
- struct dentry *dentry_list = NULL;
- struct dentry *dentry;
- int i;
if (!ei)
return;
+ /*
+ * Move the deleted eventfs_inodes onto the ei_del_list
+ * which will also set the is_freed value. Note, this has to be
+ * done under the eventfs_mutex, but the deletions of
+ * the dentries must be done outside the eventfs_mutex.
+ * Hence moving them to this temporary list.
+ */
mutex_lock(&eventfs_mutex);
eventfs_remove_rec(ei, &ei_del_list, 0);
+ mutex_unlock(&eventfs_mutex);
list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) {
- for (i = 0; i < ei->nr_entries; i++)
- unhook_dentry(&ei->d_children[i], &dentry_list);
- unhook_dentry(&ei->dentry, &dentry_list);
+ for (int i = 0; i < ei->nr_entries; i++)
+ unhook_dentry(ei->d_children[i]);
+ unhook_dentry(ei->dentry);
+ list_del(&ei->del_list);
call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
}
- mutex_unlock(&eventfs_mutex);
-
- while (dentry_list) {
- unsigned long ptr;
-
- dentry = dentry_list;
- ptr = (unsigned long)dentry->d_fsdata & ~1UL;
- dentry_list = (struct dentry *)ptr;
- dentry->d_fsdata = NULL;
- d_invalidate(dentry);
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
- dput(dentry);
- }
}
/**
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 5f60bcd69289..06a1f220b901 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -54,10 +54,12 @@ struct eventfs_inode {
void *data;
/*
* Union - used for deletion
+ * @llist: for calling dput() if needed after RCU
* @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
*/
union {
+ struct llist_node llist;
struct list_head del_list;
struct rcu_head rcu;
};
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The eventfs_inode->is_freed was a union with the rcu_head, with the
assumption that when it was on the srcu list the head would contain a
pointer, which would make "is_freed" true. But that was a wrong assumption,
as the rcu head is a singly linked list where the last element is NULL.
Instead, split the nr_entries integer so that "is_freed" is one bit and
nr_entries is the next 31 bits. As there shouldn't be more than 10
entries (currently there are at most 5 to 7 depending on the config), this
should not be a problem.
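The following userspace sketch shows both halves of this fix. The first half shows why aliasing is_freed with the first word of a callback head fails: the last element queued on a singly linked callback list has a NULL next pointer, so the flag reads as 0. The second half shows that splitting an unsigned int into a 1-bit flag and a 31-bit count keeps both usable. The struct names are hypothetical, and fake_rcu_head only mirrors the two-field shape of the kernel's callback head.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's rcu_head / callback_head. */
struct fake_rcu_head {
	struct fake_rcu_head *next;
	void (*func)(struct fake_rcu_head *head);
};

/* Before: is_freed aliased the first word (next) of the rcu_head. */
struct fake_ei_old {
	union {
		struct fake_rcu_head rcu;
		unsigned long is_freed;
	};
	int nr_entries;
};

/* After: is_freed is its own bit, carved out of nr_entries. */
struct fake_ei_new {
	struct fake_rcu_head rcu;
	unsigned int is_freed:1;
	unsigned int nr_entries:31;
};

static void fake_rcu_cb(struct fake_rcu_head *head)
{
	(void)head;
}

/* Queue ei as the last element of a callback list: next stays NULL,
 * so the aliased is_freed reads back as 0 even though ei is queued. */
static unsigned long fake_old_is_freed_after_queue(struct fake_ei_old *ei)
{
	ei->rcu.next = NULL;
	ei->rcu.func = fake_rcu_cb;
	return ei->is_freed;
}
```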
Link: https://lkml.kernel.org/r/20231101172649.049758712@goodmis.org
Cc: stable(a)vger.kernel.org
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Ajay Kaher <akaher(a)vmware.com>
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 2 ++
fs/tracefs/internal.h | 6 +++---
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 9f612a8f009d..1ce73acf3df0 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -824,6 +824,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
eventfs_remove_rec(ei_child, head, level + 1);
}
+ ei->is_freed = 1;
+
list_del_rcu(&ei->list);
list_add_tail(&ei->del_list, head);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 64fde9490f52..c7d88aaa949f 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -23,6 +23,7 @@ struct tracefs_inode {
* @d_parent: pointer to the parent's dentry
* @d_children: The array of dentries to represent the files when created
* @data: The private data to pass to the callbacks
+ * @is_freed: Flag set if the eventfs is on its way to be freed
* @nr_entries: The number of items in @entries
*/
struct eventfs_inode {
@@ -38,14 +39,13 @@ struct eventfs_inode {
* Union - used for deletion
* @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
- * @is_freed: node is freed if one of the above is set
*/
union {
struct list_head del_list;
struct rcu_head rcu;
- unsigned long is_freed;
};
- int nr_entries;
+ unsigned int is_freed:1;
+ unsigned int nr_entries:31;
};
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A synthetic event is created by the synthetic event interface that can
read both user and kernel address memory. In reality, it reads any
arbitrary memory location from within the kernel. If the address space is
in USER (where CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE is set) then
it uses strncpy_from_user_nofault() to copy strings; otherwise it uses
strncpy_from_kernel_nofault().
But since both functions use the same variable, there is no annotation of
what that variable is (i.e. __user). This makes sparse complain.
Quiet sparse by typecasting the strncpy_from_user_nofault() variable to
a __user pointer.
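The following userspace sketch shows the annotation pattern. Under sparse (__CHECKER__), __user marks a separate address space. For a normal compiler, __user and __force expand to nothing. The macro definitions are simplified from the kernel's. fake_strncpy_from_user() and fake_copy_string() are hypothetical stand-ins for strncpy_from_user_nofault() and trace_string(), and this sketch uses its own return convention.

```c
/* Simplified sparse annotations; empty for a normal compiler. */
#ifdef __CHECKER__
# define __user  __attribute__((noderef, address_space(__user)))
# define __force __attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical stand-in that expects a user-space pointer. Returns the
 * bytes written including the terminating NUL (this sketch's convention).
 * Userspace only: real code must never dereference a __user pointer. */
static long fake_strncpy_from_user(char *dst, const void __user *src, long count)
{
	const char *s = (const char __force *)src;
	long i;

	if (count <= 0)
		return 0;
	for (i = 0; i < count - 1 && s[i] != '\0'; i++)
		dst[i] = s[i];
	dst[i] = '\0';
	return i + 1;
}

/* Like trace_string(): one plain pointer holds either kind of address,
 * so the user-side call needs an explicit __user cast to quiet sparse. */
static long fake_copy_string(char *dst, char *str_val, long max)
{
	return fake_strncpy_from_user(dst, (const void __user *)str_val, max);
}
```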
Link: https://lore.kernel.org/linux-trace-kernel/20231031151033.73c42e23@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Fixes: 0934ae9977c2 ("tracing: Fix reading strings from synthetic events")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311010013.fm8WTxa5-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_synth.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 14cb275a0bab..846e02c0fb59 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -452,7 +452,7 @@ static unsigned int trace_string(struct synth_trace_event *entry,
#ifdef CONFIG_ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE
if ((unsigned long)str_val < TASK_SIZE)
- ret = strncpy_from_user_nofault(str_field, str_val, STR_VAR_LEN_MAX);
+ ret = strncpy_from_user_nofault(str_field, (const void __user *)str_val, STR_VAR_LEN_MAX);
else
#endif
ret = strncpy_from_kernel_nofault(str_field, str_val, STR_VAR_LEN_MAX);
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The following can crash the kernel:
# cd /sys/kernel/tracing
# echo 'p:sched schedule' > kprobe_events
# exec 5>>events/kprobes/sched/enable
# > kprobe_events
# exec 5>&-
The above commands:
1. Change directory to the tracefs directory
2. Create a kprobe event (doesn't matter what one)
3. Open bash file descriptor 5 on the enable file of the kprobe event
4. Delete the kprobe event (removes the files too)
5. Close the bash file descriptor 5
The above causes a crash!
BUG: kernel NULL pointer dereference, address: 0000000000000028
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP PTI
CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
RIP: 0010:tracing_release_file_tr+0xc/0x50
What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the event's file in tracefs. It
maintains the state of the event (e.g. whether it is enabled for the
given instance).
Opening the "enable" file takes a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from tracefs, which also frees the event "file"
descriptor.
But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. This is not
true of the event "file" descriptor, which has already been freed. If the
user writes to, or simply closes, the file descriptor, it will reference
the event "file" descriptor that was just freed, causing a use-after-free bug.
To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREED flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.
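A single-threaded sketch of the ref count plus FREED flag lifecycle, assuming simplified hypothetical names (event_file_sketch, file_open(), file_remove(), FILE_FL_FREED). It uses a plain int instead of atomic_t and omits event_mutex, which the patch holds around the FREED check and the reference grab in tracing_open_file_tr().

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for EVENT_FILE_FL_FREED. */
#define FILE_FL_FREED (1UL << 0)

struct event_file_sketch {
	unsigned long flags;
	int ref;	/* plain int here; the patch uses atomic_t */
};

static int nr_frees;	/* lets a caller observe when the object is freed */

static struct event_file_sketch *file_create(void)
{
	struct event_file_sketch *f = calloc(1, sizeof(*f));

	if (f)
		f->ref = 1;	/* creator's reference, cf. trace_create_new_event() */
	return f;
}

static void file_get(struct event_file_sketch *f)
{
	f->ref++;
}

static void file_put(struct event_file_sketch *f)
{
	if (f->ref <= 0)
		return;		/* underflow is a bug; the patch WARNs here */
	/* Reaching zero without FREED would also be a bug (the patch WARNs). */
	if (--f->ref == 0 && (f->flags & FILE_FL_FREED)) {
		/* Only the last put after removal releases the memory. */
		free(f);
		nr_frees++;
	}
}

/* Like tracing_open_file_tr(): refuse new opens once removed. */
static int file_open(struct event_file_sketch *f)
{
	if (f->flags & FILE_FL_FREED)
		return -ENODEV;
	file_get(f);
	return 0;
}

/* Like remove_event_file_dir(): mark FREED, drop the creator's reference. */
static void file_remove(struct event_file_sketch *f)
{
	f->flags |= FILE_FL_FREED;
	file_put(f);
}
```

Replaying the reproducer against the sketch: open takes the count to 2, deleting the event sets FREED and drops it to 1, and the final close drops it to 0 and frees the object, so nothing touches freed memory.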
Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.…
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave <beaub(a)linux.microsoft.com>
Tested-by: Beau Belgrave <beaub(a)linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/trace_events.h | 4 ++++
kernel/trace/trace.c | 15 +++++++++++++++
kernel/trace/trace.h | 3 +++
kernel/trace/trace_events.c | 31 ++++++++++++++++++++++++++----
kernel/trace/trace_events_filter.c | 3 +++
5 files changed, 52 insertions(+), 4 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index 12207dc6722d..696f8dc4aa53 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -492,6 +492,7 @@ enum {
EVENT_FILE_FL_TRIGGER_COND_BIT,
EVENT_FILE_FL_PID_FILTER_BIT,
EVENT_FILE_FL_WAS_ENABLED_BIT,
+ EVENT_FILE_FL_FREED_BIT,
};
extern struct trace_event_file *trace_get_event_file(const char *instance,
@@ -630,6 +631,7 @@ extern int __kprobe_event_add_fields(struct dynevent_cmd *cmd, ...);
* TRIGGER_COND - When set, one or more triggers has an associated filter
* PID_FILTER - When set, the event is filtered based on pid
* WAS_ENABLED - Set when enabled to know to clear trace on module removal
+ * FREED - File descriptor is freed, all fields should be considered invalid
*/
enum {
EVENT_FILE_FL_ENABLED = (1 << EVENT_FILE_FL_ENABLED_BIT),
@@ -643,6 +645,7 @@ enum {
EVENT_FILE_FL_TRIGGER_COND = (1 << EVENT_FILE_FL_TRIGGER_COND_BIT),
EVENT_FILE_FL_PID_FILTER = (1 << EVENT_FILE_FL_PID_FILTER_BIT),
EVENT_FILE_FL_WAS_ENABLED = (1 << EVENT_FILE_FL_WAS_ENABLED_BIT),
+ EVENT_FILE_FL_FREED = (1 << EVENT_FILE_FL_FREED_BIT),
};
struct trace_event_file {
@@ -671,6 +674,7 @@ struct trace_event_file {
* caching and such. Which is mostly OK ;-)
*/
unsigned long flags;
+ atomic_t ref; /* ref count for opened files */
atomic_t sm_ref; /* soft-mode reference counter */
atomic_t tm_ref; /* trigger-mode reference counter */
};
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 2539cfc20a97..9aebf904ff97 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4978,6 +4978,20 @@ int tracing_open_file_tr(struct inode *inode, struct file *filp)
if (ret)
return ret;
+ mutex_lock(&event_mutex);
+
+ /* Fail if the file is marked for removal */
+ if (file->flags & EVENT_FILE_FL_FREED) {
+ trace_array_put(file->tr);
+ ret = -ENODEV;
+ } else {
+ event_file_get(file);
+ }
+
+ mutex_unlock(&event_mutex);
+ if (ret)
+ return ret;
+
filp->private_data = inode->i_private;
return 0;
@@ -4988,6 +5002,7 @@ int tracing_release_file_tr(struct inode *inode, struct file *filp)
struct trace_event_file *file = inode->i_private;
trace_array_put(file->tr);
+ event_file_put(file);
return 0;
}
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 0e1405abf4f7..b7f4ea25a194 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -1669,6 +1669,9 @@ extern void event_trigger_unregister(struct event_command *cmd_ops,
char *glob,
struct event_trigger_data *trigger_data);
+extern void event_file_get(struct trace_event_file *file);
+extern void event_file_put(struct trace_event_file *file);
+
/**
* struct event_trigger_ops - callbacks for trace event triggers
*
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index f9e3e24d8796..f29e815ca5b2 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -990,13 +990,35 @@ static void remove_subsystem(struct trace_subsystem_dir *dir)
}
}
+void event_file_get(struct trace_event_file *file)
+{
+ atomic_inc(&file->ref);
+}
+
+void event_file_put(struct trace_event_file *file)
+{
+ if (WARN_ON_ONCE(!atomic_read(&file->ref))) {
+ if (file->flags & EVENT_FILE_FL_FREED)
+ kmem_cache_free(file_cachep, file);
+ return;
+ }
+
+ if (atomic_dec_and_test(&file->ref)) {
+ /* Count should only go to zero when it is freed */
+ if (WARN_ON_ONCE(!(file->flags & EVENT_FILE_FL_FREED)))
+ return;
+ kmem_cache_free(file_cachep, file);
+ }
+}
+
static void remove_event_file_dir(struct trace_event_file *file)
{
eventfs_remove_dir(file->ei);
list_del(&file->list);
remove_subsystem(file->system);
free_event_filter(file->filter);
- kmem_cache_free(file_cachep, file);
+ file->flags |= EVENT_FILE_FL_FREED;
+ event_file_put(file);
}
/*
@@ -1369,7 +1391,7 @@ event_enable_read(struct file *filp, char __user *ubuf, size_t cnt,
flags = file->flags;
mutex_unlock(&event_mutex);
- if (!file)
+ if (!file || flags & EVENT_FILE_FL_FREED)
return -ENODEV;
if (flags & EVENT_FILE_FL_ENABLED &&
@@ -1403,7 +1425,7 @@ event_enable_write(struct file *filp, const char __user *ubuf, size_t cnt,
ret = -ENODEV;
mutex_lock(&event_mutex);
file = event_file_data(filp);
- if (likely(file)) {
+ if (likely(file && !(file->flags & EVENT_FILE_FL_FREED))) {
ret = tracing_update_buffers(file->tr);
if (ret < 0) {
mutex_unlock(&event_mutex);
@@ -1683,7 +1705,7 @@ event_filter_read(struct file *filp, char __user *ubuf, size_t cnt,
mutex_lock(&event_mutex);
file = event_file_data(filp);
- if (file)
+ if (file && !(file->flags & EVENT_FILE_FL_FREED))
print_event_filter(file, s);
mutex_unlock(&event_mutex);
@@ -2902,6 +2924,7 @@ trace_create_new_event(struct trace_event_call *call,
atomic_set(&file->tm_ref, 0);
INIT_LIST_HEAD(&file->triggers);
list_add(&file->list, &tr->events);
+ event_file_get(file);
return file;
}
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 33264e510d16..0c611b281a5b 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -2349,6 +2349,9 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string)
struct event_filter *filter = NULL;
int err;
+ if (file->flags & EVENT_FILE_FL_FREED)
+ return -ENODEV;
+
if (!strcmp(strstrip(filter_string), "0")) {
filter_disable(file);
filter = event_filter(file);
--
2.42.0
The quilt patch titled
Subject: scripts/gdb/vmalloc: disable on no-MMU
has been removed from the -mm tree. Its filename was
scripts-gdb-vmalloc-disable-on-no-mmu.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ben Wolsieffer <ben.wolsieffer(a)hefring.com>
Subject: scripts/gdb/vmalloc: disable on no-MMU
Date: Tue, 31 Oct 2023 16:22:36 -0400
vmap_area does not exist on no-MMU, therefore the GDB scripts fail to
load:
Traceback (most recent call last):
File "<...>/vmlinux-gdb.py", line 51, in <module>
import linux.vmalloc
File "<...>/scripts/gdb/linux/vmalloc.py", line 14, in <module>
vmap_area_ptr_type = vmap_area_type.get_type().pointer()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<...>/scripts/gdb/linux/utils.py", line 28, in get_type
self._type = gdb.lookup_type(self._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No struct type named vmap_area.
To fix this, disable the command and add an informative error message if
CONFIG_MMU is not defined, following the example of lx-slabinfo.
Link: https://lkml.kernel.org/r/20231031202235.2655333-2-ben.wolsieffer@hefring.c…
Fixes: 852622bf3616 ("scripts/gdb/vmalloc: add vmallocinfo support")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer(a)hefring.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: Kuan-Ying Lee <Kuan-Ying.Lee(a)mediatek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/constants.py.in | 1 +
scripts/gdb/linux/vmalloc.py | 8 ++++++--
2 files changed, 7 insertions(+), 2 deletions(-)
--- a/scripts/gdb/linux/constants.py.in~scripts-gdb-vmalloc-disable-on-no-mmu
+++ a/scripts/gdb/linux/constants.py.in
@@ -158,3 +158,4 @@ LX_CONFIG(CONFIG_STACKDEPOT)
LX_CONFIG(CONFIG_PAGE_OWNER)
LX_CONFIG(CONFIG_SLUB_DEBUG)
LX_CONFIG(CONFIG_SLAB_FREELIST_HARDENED)
+LX_CONFIG(CONFIG_MMU)
--- a/scripts/gdb/linux/vmalloc.py~scripts-gdb-vmalloc-disable-on-no-mmu
+++ a/scripts/gdb/linux/vmalloc.py
@@ -10,8 +10,9 @@ import gdb
import re
from linux import lists, utils, stackdepot, constants, mm
-vmap_area_type = utils.CachedType('struct vmap_area')
-vmap_area_ptr_type = vmap_area_type.get_type().pointer()
+if constants.LX_CONFIG_MMU:
+ vmap_area_type = utils.CachedType('struct vmap_area')
+ vmap_area_ptr_type = vmap_area_type.get_type().pointer()
def is_vmalloc_addr(x):
pg_ops = mm.page_ops().ops
@@ -25,6 +26,9 @@ class LxVmallocInfo(gdb.Command):
super(LxVmallocInfo, self).__init__("lx-vmallocinfo", gdb.COMMAND_DATA)
def invoke(self, arg, from_tty):
+ if not constants.LX_CONFIG_MMU:
+ raise gdb.GdbError("Requires MMU support")
+
vmap_area_list = gdb.parse_and_eval('vmap_area_list')
for vmap_area in lists.list_for_each_entry(vmap_area_list, vmap_area_ptr_type, "list"):
if not vmap_area['vm']:
_
Patches currently in -mm which might be from ben.wolsieffer(a)hefring.com are
The quilt patch titled
Subject: mm/damon/sysfs: update monitoring target regions for online input commit
has been removed from the -mm tree. Its filename was
mm-damon-sysfs-update-monitoring-target-regions-for-online-input-commit.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: update monitoring target regions for online input commit
Date: Tue, 31 Oct 2023 17:01:31 +0000
When user input is committed online, the DAMON sysfs interface ignores the
user input for the monitoring target regions. Such a request is valid and
useful for monitoring ops based on fixed monitoring target regions, like
'paddr' or 'fvaddr'.
Update the region boundaries as the user specified, too. Note that the
monitoring results of the regions that overlap between the latest
monitoring target regions and the new target regions are preserved.
Treat a user request with empty monitoring target regions as a request to
make no change to the monitoring target regions. Otherwise, users would
have to set the monitoring target regions to the current ones on every
online input commit, which could be challenging for DAMON ops that update
the monitoring target regions dynamically, like 'vaddr'. If the user really
needs to remove all monitoring target regions, they can simply remove the
target and then create the target again with empty target regions.
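A simplified sketch of the update rule described above, assuming hypothetical types (a plain pid instead of struct pid *, a fixed-size region array) and a set_regions() that simply overwrites the regions, whereas the real damon_sysfs_set_regions() preserves monitoring results of overlapping regions. The sketch initialises err to 0. In the damon_sysfs_update_target() hunk below, err is declared without an initialiser, so it would be returned indeterminate when the context has no pid and the user gave no regions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_REGIONS 8	/* hypothetical fixed capacity */

struct region_sketch {
	unsigned long start, end;
};

struct target_sketch {
	int pid;	/* a plain pid instead of struct pid * */
	struct region_sketch regions[MAX_REGIONS];
	size_t nr_regions;
};

struct sysfs_target_sketch {
	int pid;
	const struct region_sketch *regions;
	size_t nr_regions;
};

/* Overwrites the regions; the real code keeps results of overlapping ones. */
static int set_regions(struct target_sketch *t,
		       const struct region_sketch *r, size_t nr)
{
	size_t i;

	if (nr > MAX_REGIONS)
		return -EINVAL;
	for (i = 0; i < nr; i++)
		t->regions[i] = r[i];
	t->nr_regions = nr;
	return 0;
}

static int update_target(struct target_sketch *t, bool has_pid,
			 const struct sysfs_target_sketch *st)
{
	int err = 0;	/* defined result when neither branch below runs */

	if (has_pid) {
		if (st->pid <= 0)
			return -EINVAL;	/* stand-in for find_get_pid() failing */
		t->pid = st->pid;
	}

	/* No user regions means "keep the current ones". */
	if (st->nr_regions)
		err = set_regions(t, st->regions, st->nr_regions);
	return err;
}
```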
Link: https://lkml.kernel.org/r/20231031170131.46972-1-sj@kernel.org
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 47 ++++++++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 17 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-update-monitoring-target-regions-for-online-input-commit
+++ a/mm/damon/sysfs.c
@@ -1150,34 +1150,47 @@ destroy_targets_out:
return err;
}
-static int damon_sysfs_update_target(struct damon_target *target,
- struct damon_ctx *ctx,
- struct damon_sysfs_target *sys_target)
+static int damon_sysfs_update_target_pid(struct damon_target *target, int pid)
{
- struct pid *pid;
- struct damon_region *r, *next;
-
- if (!damon_target_has_pid(ctx))
- return 0;
+ struct pid *pid_new;
- pid = find_get_pid(sys_target->pid);
- if (!pid)
+ pid_new = find_get_pid(pid);
+ if (!pid_new)
return -EINVAL;
- /* no change to the target */
- if (pid == target->pid) {
- put_pid(pid);
+ if (pid_new == target->pid) {
+ put_pid(pid_new);
return 0;
}
- /* remove old monitoring results and update the target's pid */
- damon_for_each_region_safe(r, next, target)
- damon_destroy_region(r, target);
put_pid(target->pid);
- target->pid = pid;
+ target->pid = pid_new;
return 0;
}
+static int damon_sysfs_update_target(struct damon_target *target,
+ struct damon_ctx *ctx,
+ struct damon_sysfs_target *sys_target)
+{
+ int err;
+
+ if (damon_target_has_pid(ctx)) {
+ err = damon_sysfs_update_target_pid(target, sys_target->pid);
+ if (err)
+ return err;
+ }
+
+ /*
+ * Do monitoring target region boundary update only if one or more
+ * regions are set by the user. This is for keeping current monitoring
+ * target results and range easier, especially for dynamic monitoring
+ * target regions update ops like 'vaddr'.
+ */
+ if (sys_target->regions->nr)
+ err = damon_sysfs_set_regions(target, sys_target->regions);
+ return err;
+}
+
static int damon_sysfs_set_targets(struct damon_ctx *ctx,
struct damon_sysfs_targets *sysfs_targets)
{
_
Patches currently in -mm which might be from sj(a)kernel.org are
The quilt patch titled
Subject: mm/damon/sysfs: remove requested targets when online-commit inputs
has been removed from the -mm tree. Its filename was
mm-damon-sysfs-remove-requested-targets-when-online-commit-inputs.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: remove requested targets when online-commit inputs
Date: Sun, 22 Oct 2023 21:07:33 +0000
damon_sysfs_set_targets(), which updates the targets of the context for
online commitment, does not remove targets that were removed from the
corresponding sysfs files. As a result, more targets than intended can
exist in the context and hence consume more memory and monitoring CPU
resources than expected.
Fix it by removing all targets of the context and filling them up again
using the user input. This could cause unnecessary memory dealloc and
realloc operations, but this is not a hot code path. Also, note that
damon_target is stateless, and hence no data is lost.
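A sketch of the reconciliation that the damon_sysfs_set_targets() hunk below ends up performing, assuming a hypothetical array-based context that holds one pid per target (the kernel walks a list with damon_for_each_target_safe()). The first min(existing, requested) targets are updated in place, surplus targets are destroyed, and any remaining requested targets are added.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TARGETS 16	/* hypothetical fixed capacity */

struct ctx_sketch {
	int pids[MAX_TARGETS];	/* one pid per monitoring target */
	size_t nr;
};

static int set_targets(struct ctx_sketch *ctx, const int *req, size_t nr_req,
		       bool paddr)
{
	size_t i;

	/* Multiple physical address space targets make no sense. */
	if ((paddr && nr_req > 1) || nr_req > MAX_TARGETS)
		return -EINVAL;

	/* Update the targets that exist on both sides in place. */
	for (i = 0; i < ctx->nr && i < nr_req; i++)
		ctx->pids[i] = req[i];

	/* Destroy targets the user no longer asks for. */
	if (ctx->nr > nr_req)
		ctx->nr = nr_req;

	/* Add whatever is still missing (here ctx->nr == i). */
	for (; i < nr_req; i++)
		ctx->pids[ctx->nr++] = req[i];
	return 0;
}
```

Shrinking from three targets to one leaves only the first (updated) entry, and growing back to three appends the two new ones, so the context never holds more targets than the sysfs input describes.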
[sj(a)kernel.org: fix unnecessary monitoring results removal]
Link: https://lkml.kernel.org/r/20231028213353.45397-1-sj@kernel.org
Link: https://lkml.kernel.org/r/20231022210735.46409-2-sj@kernel.org
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: Brendan Higgins <brendanhiggins(a)google.com>
Cc: <stable(a)vger.kernel.org> [5.19.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 68 +++++++++++++++++++++++----------------------
1 file changed, 35 insertions(+), 33 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-remove-requested-targets-when-online-commit-inputs
+++ a/mm/damon/sysfs.c
@@ -1150,58 +1150,60 @@ destroy_targets_out:
return err;
}
-/*
- * Search a target in a context that corresponds to the sysfs target input.
- *
- * Return: pointer to the target if found, NULL if not found, or negative
- * error code if the search failed.
- */
-static struct damon_target *damon_sysfs_existing_target(
- struct damon_sysfs_target *sys_target, struct damon_ctx *ctx)
+static int damon_sysfs_update_target(struct damon_target *target,
+ struct damon_ctx *ctx,
+ struct damon_sysfs_target *sys_target)
{
struct pid *pid;
- struct damon_target *t;
+ struct damon_region *r, *next;
- if (!damon_target_has_pid(ctx)) {
- /* Up to only one target for paddr could exist */
- damon_for_each_target(t, ctx)
- return t;
- return NULL;
- }
+ if (!damon_target_has_pid(ctx))
+ return 0;
- /* ops.id should be DAMON_OPS_VADDR or DAMON_OPS_FVADDR */
pid = find_get_pid(sys_target->pid);
if (!pid)
- return ERR_PTR(-EINVAL);
- damon_for_each_target(t, ctx) {
- if (t->pid == pid) {
- put_pid(pid);
- return t;
- }
+ return -EINVAL;
+
+ /* no change to the target */
+ if (pid == target->pid) {
+ put_pid(pid);
+ return 0;
}
- put_pid(pid);
- return NULL;
+
+ /* remove old monitoring results and update the target's pid */
+ damon_for_each_region_safe(r, next, target)
+ damon_destroy_region(r, target);
+ put_pid(target->pid);
+ target->pid = pid;
+ return 0;
}
static int damon_sysfs_set_targets(struct damon_ctx *ctx,
struct damon_sysfs_targets *sysfs_targets)
{
- int i, err;
+ struct damon_target *t, *next;
+ int i = 0, err;
/* Multiple physical address space monitoring targets makes no sense */
if (ctx->ops.id == DAMON_OPS_PADDR && sysfs_targets->nr > 1)
return -EINVAL;
- for (i = 0; i < sysfs_targets->nr; i++) {
+ damon_for_each_target_safe(t, next, ctx) {
+ if (i < sysfs_targets->nr) {
+ damon_sysfs_update_target(t, ctx,
+ sysfs_targets->targets_arr[i]);
+ } else {
+ if (damon_target_has_pid(ctx))
+ put_pid(t->pid);
+ damon_destroy_target(t);
+ }
+ i++;
+ }
+
+ for (; i < sysfs_targets->nr; i++) {
struct damon_sysfs_target *st = sysfs_targets->targets_arr[i];
- struct damon_target *t = damon_sysfs_existing_target(st, ctx);
- if (IS_ERR(t))
- return PTR_ERR(t);
- if (!t)
- err = damon_sysfs_add_target(st, ctx);
- else
- err = damon_sysfs_set_regions(t, st->regions);
+ err = damon_sysfs_add_target(st, ctx);
if (err)
return err;
}
_
Patches currently in -mm which might be from sj(a)kernel.org are
This is the start of the stable review cycle for the 6.1.61 release.
There are 86 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied,
please let me know.
Responses should be made by Thu, 02 Nov 2023 16:59:03 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.61-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.61-rc1
John Sperbeck <jsperbeck(a)google.com>
objtool/x86: add missing embedded_insn check
Baokun Li <libaokun1(a)huawei.com>
ext4: avoid overlapping preallocations due to overflow
Baokun Li <libaokun1(a)huawei.com>
ext4: fix BUG in ext4_mb_new_inode_pa() due to overflow
Baokun Li <libaokun1(a)huawei.com>
ext4: add two helper functions extent_logical_end() and pa_logical_end()
David Lazar <dlazar(a)gmail.com>
platform/x86: Add s2idle quirk for more Lenovo laptops
Alessandro Carminati <alessandro.carminati(a)gmail.com>
clk: Sanitize possible_parent_show to Handle Return Value of of_clk_get_parent_name
Al Viro <viro(a)zeniv.linux.org.uk>
sparc32: fix a braino in fault handling in csum_and_copy_..._user()
Peter Zijlstra <peterz(a)infradead.org>
perf/core: Fix potential NULL deref
Tony Luck <tony.luck(a)intel.com>
x86/cpu: Add model number for Intel Arrow Lake mobile processor
Thomas Gleixner <tglx(a)linutronix.de>
x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
Peng Fan <peng.fan(a)nxp.com>
nvmem: imx: correct nregs for i.MX6UL
Peng Fan <peng.fan(a)nxp.com>
nvmem: imx: correct nregs for i.MX6SLL
Peng Fan <peng.fan(a)nxp.com>
nvmem: imx: correct nregs for i.MX6ULL
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Unmap only if buffer is unmapped from DSP
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Clean buffers on remote invocation failures
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Free DMA handles for RPC calls with no arguments
Ekansh Gupta <quic_ekangupt(a)quicinc.com>
misc: fastrpc: Reset metadata buffer to avoid incorrect free
Yujie Liu <yujie.liu(a)intel.com>
tracing/kprobes: Fix the description of variable length arguments
Jian Zhang <zhangjian.3032(a)bytedance.com>
i2c: aspeed: Fix i2c bus hang in slave read
Alain Volmat <alain.volmat(a)foss.st.com>
i2c: stm32f7: Fix PEC handling in case of SMBUS transfers
Herve Codina <herve.codina(a)bootlin.com>
i2c: muxes: i2c-demux-pinctrl: Use of_get_i2c_adapter_by_node()
Herve Codina <herve.codina(a)bootlin.com>
i2c: muxes: i2c-mux-gpmux: Use of_get_i2c_adapter_by_node()
Herve Codina <herve.codina(a)bootlin.com>
i2c: muxes: i2c-mux-pinctrl: Use of_get_i2c_adapter_by_node()
Robert Hancock <robert.hancock(a)calian.com>
iio: adc: xilinx-xadc: Correct temperature offset/scale for UltraScale
Robert Hancock <robert.hancock(a)calian.com>
iio: adc: xilinx-xadc: Don't clobber preset voltage/temperature thresholds
Marek Szyprowski <m.szyprowski(a)samsung.com>
iio: exynos-adc: request second interupt only when touchscreen mode is used
Linus Walleij <linus.walleij(a)linaro.org>
iio: afe: rescale: Accept only offset channels
Jens Axboe <axboe(a)kernel.dk>
io_uring/fdinfo: lock SQ thread while retrieving thread cpu/pid
Haibo Li <haibo.li(a)mediatek.com>
kasan: print the original fault addr when access invalid shadow
Khazhismel Kumykov <khazhy(a)chromium.org>
blk-throttle: check for overflow in calculate_bytes_allowed
Damien Le Moal <dlemoal(a)kernel.org>
scsi: sd: Introduce manage_shutdown device flag
Michal Schmidt <mschmidt(a)redhat.com>
iavf: in iavf_down, disable queues when removing the driver
Sui Jingfeng <suijingfeng(a)loongson.cn>
drm/logicvc: Kconfig: select REGMAP and REGMAP_MMIO
Ivan Vecera <ivecera(a)redhat.com>
i40e: Fix wrong check for I40E_TXR_FLAGS_WB_ON_ITR
Pablo Neira Ayuso <pablo(a)netfilter.org>
gtp: fix fragmentation needed check with gso
Pablo Neira Ayuso <pablo(a)netfilter.org>
gtp: uapi: fix GTPA_MAX
Fred Chen <fred.chenchen03(a)gmail.com>
tcp: fix wrong RTO timeout when received SACK reneging
Douglas Anderson <dianders(a)chromium.org>
r8152: Release firmware if we have an error in probe
Douglas Anderson <dianders(a)chromium.org>
r8152: Cancel hw_phy_work if we have an error in probe
Douglas Anderson <dianders(a)chromium.org>
r8152: Run the unload routine if we have errors during probe
Douglas Anderson <dianders(a)chromium.org>
r8152: Increase USB control msg timeout to 5000ms as per spec
Shigeru Yoshida <syoshida(a)redhat.com>
net: usb: smsc95xx: Fix uninit-value access in smsc95xx_read_reg
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
net: ieee802154: adf7242: Fix some potential buffer overflow in adf7242_stats_show()
Dell Jin <dell.jin.code(a)outlook.com>
net: ethernet: adi: adin1110: Fix uninitialized variable
Sasha Neftin <sasha.neftin(a)intel.com>
igc: Fix ambiguity in the ethtool advertising
Eric Dumazet <edumazet(a)google.com>
neighbour: fix various data-races
Mateusz Palczewski <mateusz.palczewski(a)intel.com>
igb: Fix potential memory leak in igb_add_ethtool_nfc_entry
Kunwu Chan <chentao(a)kylinos.cn>
treewide: Spelling fix in comment
Ivan Vecera <ivecera(a)redhat.com>
i40e: Fix I40E_FLAG_VF_VLAN_PRUNING value
Michal Schmidt <mschmidt(a)redhat.com>
iavf: initialize waitqueues before starting watchdog_task
Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
r8169: fix the KCSAN reported data race in rtl_rx while reading desc->opts1
Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
r8169: fix the KCSAN reported data-race in rtl_tx while reading TxDescArray[entry].opts1
Mirsad Goran Todorovac <mirsad.todorovac(a)alu.unizg.hr>
r8169: fix the KCSAN reported data-race in rtl_tx() while reading tp->cur_tx
Tony Lindgren <tony(a)atomide.com>
clk: ti: Fix missing omap5 mcbsp functional clock and aliases
Tony Lindgren <tony(a)atomide.com>
clk: ti: Fix missing omap4 mcbsp functional clock and aliases
Hao Ge <gehao(a)kylinos.cn>
firmware/imx-dsp: Fix use_after_free in imx_dsp_setup_channels()
Randy Dunlap <rdunlap(a)infradead.org>
ARM: OMAP: timer32K: fix all kernel-doc warnings
Lukasz Majczak <lma(a)semihalf.com>
drm/dp_mst: Fix NULL deref in get_mst_branch_device_by_guid_helper()
Mario Limonciello <mario.limonciello(a)amd.com>
drm/amd: Disable ASPM for VI w/ all Intel systems
Umesh Nerlige Ramappa <umesh.nerlige.ramappa(a)intel.com>
drm/i915/pmu: Check if pmu is closed before stopping event
Al Viro <viro(a)zeniv.linux.org.uk>
nfsd: lock_rename() needs both directories to live on the same fs
Liam R. Howlett <Liam.Howlett(a)oracle.com>
maple_tree: add GFP_KERNEL to allocations in mas_expected_entries()
Rik van Riel <riel(a)surriel.com>
hugetlbfs: extend hugetlb_vma_lock to private VMAs
Gregory Price <gourry.memverge(a)gmail.com>
mm/migrate: fix do_pages_move for compat pointers
Kemeng Shi <shikemeng(a)huaweicloud.com>
mm/page_alloc: correct start page when guard page debug is enabled
Rik van Riel <riel(a)surriel.com>
hugetlbfs: clear resv_map pointer if mmap fails
Sebastian Ott <sebott(a)redhat.com>
mm: fix vm_brk_flags() to not bail out while holding lock
Christopher Obbard <chris.obbard(a)collabora.com>
arm64: dts: rockchip: Fix i2s0 pin conflict on ROCK Pi 4 boards
Christopher Obbard <chris.obbard(a)collabora.com>
arm64: dts: rockchip: Add i2s0-2ch-bus-bclk-off pins to RK3399
Eric Auger <eric.auger(a)redhat.com>
vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATE
Alexandru Matei <alexandru.matei(a)uipath.com>
vsock/virtio: initialize the_virtio_vsock before using VQs
Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
virtio_pci: fix the common cfg map size
zhenwei pi <pizhenwei(a)bytedance.com>
virtio-crypto: handle config changed by work queue
Maximilian Heyne <mheyne(a)amazon.de>
virtio-mmio: fix memory leak of vm_dev
Gavin Shan <gshan(a)redhat.com>
virtio_balloon: Fix endless deflation and inflation on arm64
Rodríguez Barbarin, José Javier <JoseJavier.Rodriguez(a)duagon.com>
mcb-lpc: Reallocate memory region to avoid memory overlapping
Rodríguez Barbarin, José Javier <JoseJavier.Rodriguez(a)duagon.com>
mcb: Return actual parsed size when reading chameleon table
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
pinctrl: qcom: lpass-lpi: fix concurrent register updates
Johan Hovold <johan+linaro(a)kernel.org>
ASoC: codecs: wcd938x: fix runtime PM imbalance on remove
Johan Hovold <johan+linaro(a)kernel.org>
ASoC: codecs: wcd938x: fix regulator leaks on probe errors
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
ASoC: codecs: wcd938x: Simplify with dev_err_probe
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
ASoC: codecs: wcd938x: Convert to platform remove callback returning void
Ulf Hansson <ulf.hansson(a)linaro.org>
mmc: core: Fix error propagation for some ioctl commands
Christian Loehle <CLoehle(a)hyperstone.com>
mmc: block: ioctl: do write error check for spi
Ulf Hansson <ulf.hansson(a)linaro.org>
mmc: core: Align to common busy polling behaviour for mmc ioctls
Roman Kagan <rkagan(a)amazon.de>
KVM: x86/pmu: Truncate counter value to allowed width on write
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/omap4-l4-abe.dtsi | 6 ++
arch/arm/boot/dts/omap4-l4.dtsi | 2 +
arch/arm/boot/dts/omap5-l4-abe.dtsi | 6 ++
arch/arm/mach-omap1/timer32k.c | 14 ++---
arch/arm64/boot/dts/rockchip/rk3399-rock-pi-4.dtsi | 1 +
arch/arm64/boot/dts/rockchip/rk3399.dtsi | 10 +++
arch/sparc/lib/checksum_32.S | 2 +-
arch/x86/include/asm/i8259.h | 2 +
arch/x86/include/asm/intel-family.h | 2 +
arch/x86/kernel/acpi/boot.c | 3 +
arch/x86/kernel/i8259.c | 38 ++++++++---
arch/x86/kvm/pmu.h | 6 ++
arch/x86/kvm/svm/pmu.c | 2 +-
arch/x86/kvm/vmx/pmu_intel.c | 4 +-
block/blk-throttle.c | 6 ++
drivers/ata/libata-scsi.c | 5 +-
drivers/clk/clk.c | 21 ++++---
drivers/clk/ti/clk-44xx.c | 5 ++
drivers/clk/ti/clk-54xx.c | 4 ++
drivers/crypto/virtio/virtio_crypto_common.h | 3 +
drivers/crypto/virtio/virtio_crypto_core.c | 14 ++++-
drivers/firewire/sbp2.c | 1 +
drivers/firmware/imx/imx-dsp.c | 2 +-
drivers/gpu/drm/amd/amdgpu/vi.c | 2 +-
drivers/gpu/drm/display/drm_dp_mst_topology.c | 6 +-
drivers/gpu/drm/i915/i915_pmu.c | 9 +++
drivers/gpu/drm/logicvc/Kconfig | 2 +
drivers/i2c/busses/i2c-aspeed.c | 3 +-
drivers/i2c/busses/i2c-stm32f7.c | 9 ++-
drivers/i2c/muxes/i2c-demux-pinctrl.c | 2 +-
drivers/i2c/muxes/i2c-mux-gpmux.c | 2 +-
drivers/i2c/muxes/i2c-mux-pinctrl.c | 2 +-
drivers/iio/adc/exynos_adc.c | 24 ++++---
drivers/iio/adc/xilinx-xadc-core.c | 39 +++++-------
drivers/iio/adc/xilinx-xadc.h | 2 +
drivers/iio/afe/iio-rescale.c | 19 ++++--
drivers/mcb/mcb-lpc.c | 35 +++++++++--
drivers/mcb/mcb-parse.c | 15 +++--
drivers/misc/fastrpc.c | 34 +++++-----
drivers/mmc/core/block.c | 38 ++++++++---
drivers/mmc/core/mmc_ops.c | 1 +
drivers/net/ethernet/adi/adin1110.c | 2 +-
drivers/net/ethernet/intel/i40e/i40e.h | 2 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 2 +-
drivers/net/ethernet/intel/iavf/iavf_main.c | 7 ++-
drivers/net/ethernet/intel/igb/igb_ethtool.c | 6 +-
drivers/net/ethernet/intel/igc/igc_ethtool.c | 35 ++++++++---
drivers/net/ethernet/realtek/r8169_main.c | 6 +-
drivers/net/ethernet/toshiba/ps3_gelic_wireless.c | 2 +-
drivers/net/gtp.c | 5 +-
drivers/net/ieee802154/adf7242.c | 5 +-
drivers/net/usb/r8152.c | 11 +++-
drivers/net/usb/smsc95xx.c | 4 +-
drivers/nvmem/imx-ocotp.c | 6 +-
drivers/pinctrl/qcom/pinctrl-lpass-lpi.c | 17 +++--
drivers/platform/x86/thinkpad_acpi.c | 73 ++++++++++++++++++++++
drivers/scsi/sd.c | 39 +++++++++++-
drivers/vhost/vhost.c | 4 +-
drivers/virtio/virtio_balloon.c | 6 +-
drivers/virtio/virtio_mmio.c | 19 ++++--
drivers/virtio/virtio_pci_modern_dev.c | 2 +-
fs/ext4/mballoc.c | 51 +++++++--------
fs/ext4/mballoc.h | 14 +++++
fs/nfsd/vfs.c | 12 ++--
include/linux/hugetlb.h | 6 ++
include/linux/kasan.h | 6 +-
include/scsi/scsi_device.h | 20 +++++-
include/uapi/linux/gtp.h | 2 +-
io_uring/fdinfo.c | 18 ++++--
kernel/events/core.c | 3 +-
kernel/trace/trace_kprobe.c | 4 +-
lib/maple_tree.c | 2 +-
lib/test_maple_tree.c | 35 +++++++----
mm/hugetlb.c | 48 +++++++++++---
mm/kasan/report.c | 4 +-
mm/migrate.c | 14 ++++-
mm/mmap.c | 6 +-
mm/page_alloc.c | 2 +-
net/core/neighbour.c | 67 ++++++++++----------
net/ipv4/tcp_input.c | 9 +--
net/vmw_vsock/virtio_transport.c | 18 +++++-
sound/soc/codecs/wcd938x.c | 51 ++++++++-------
tools/include/linux/rwsem.h | 40 ++++++++++++
tools/objtool/check.c | 2 +-
85 files changed, 789 insertions(+), 305 deletions(-)
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
There exists a race between holding a reference of an eventfs_inode dentry
and the freeing of the eventfs_inode. If user space has a dentry held long
enough, it may still be able to access the dentry's eventfs_inode after it
has been freed.
To prevent this, have the eventfs_inode freed via the last dput() (or via
RCU if the eventfs_inode does not have a dentry).
This means reintroducing the eventfs_inode del_list field as a temporary
place to put the eventfs_inode. The eventfs_inode must be marked as freed
(by putting it on the list), but its dentries must also be invalidated
immediately, as callers expect that to have happened by the time
eventfs_remove_dir() returns. The dentry invalidation, however, must not
be done under the eventfs_mutex, so it has to happen after the
eventfs_inode is marked as free (put on a deletion list).
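A minimal single-threaded sketch of the lifetime rule described above, assuming a plain reference count stands in for dentry references. `struct fake_ei`, `fake_ei_remove()` and `fake_ei_put()` are hypothetical names, not tracefs code; the real patch additionally goes through SRCU and a workqueue before the final dput(). The sketch only shows the ordering: removal marks the object, and whichever of removal or the last put happens second does the free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for an eventfs_inode. "refs" models references
 * still held through dentries. Single-threaded sketch: no eventfs_mutex,
 * no SRCU grace period, no workqueue.
 */
struct fake_ei {
	bool is_freed;	/* set by the removal path */
	int refs;	/* outstanding dentry references */
};

static int nr_frees;	/* counts real frees so a caller can observe them */

static void fake_ei_free(struct fake_ei *ei)
{
	free(ei);
	nr_frees++;
}

/* Removal: mark the object; free it now only if nothing refers to it. */
static void fake_ei_remove(struct fake_ei *ei)
{
	assert(!ei->is_freed);
	ei->is_freed = true;
	if (ei->refs == 0)
		fake_ei_free(ei);
}

/* Dropping the last reference frees an object that removal already marked. */
static void fake_ei_put(struct fake_ei *ei)
{
	assert(ei->refs > 0);
	if (--ei->refs == 0 && ei->is_freed)
		fake_ei_free(ei);
}
```

A held object survives removal and is released by its final put; an object nobody holds is released by removal itself.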
Cc: stable(a)vger.kernel.org
Cc: Ajay Kaher <akaher(a)vmware.com>
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v5: https://lkml.kernel.org/r/20231031223420.988874091@goodmis.org
- Rebased for this patch series
fs/tracefs/event_inode.c | 146 ++++++++++++++++++---------------------
fs/tracefs/internal.h | 2 +
2 files changed, 69 insertions(+), 79 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 8ac9abf7a3d5..0a04ae0ca8c8 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -85,8 +85,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
mutex_lock(&eventfs_mutex);
ei = dentry->d_fsdata;
- /* The LSB is set when the eventfs_inode is being freed */
- if (((unsigned long)ei & 1UL) || ei->is_freed) {
+ if (ei->is_freed) {
/* Do not allow changes if the event is about to be removed. */
mutex_unlock(&eventfs_mutex);
return -ENODEV;
@@ -276,35 +275,17 @@ static void free_ei(struct eventfs_inode *ei)
void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
{
struct tracefs_inode *ti_parent;
- struct eventfs_inode *ei_child, *tmp;
struct eventfs_inode *ei;
int i;
/* The top level events directory may be freed by this */
if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
- LIST_HEAD(ef_del_list);
-
mutex_lock(&eventfs_mutex);
-
ei = ti->private;
-
- /* Record all the top level files */
- list_for_each_entry_srcu(ei_child, &ei->children, list,
- lockdep_is_held(&eventfs_mutex)) {
- list_add_tail(&ei_child->del_list, &ef_del_list);
- }
-
/* Nothing should access this, but just in case! */
ti->private = NULL;
-
mutex_unlock(&eventfs_mutex);
- /* Now safely free the top level files and their children */
- list_for_each_entry_safe(ei_child, tmp, &ef_del_list, del_list) {
- list_del(&ei_child->del_list);
- eventfs_remove_dir(ei_child);
- }
-
free_ei(ei);
return;
}
@@ -319,14 +300,6 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (!ei)
goto out;
- /*
- * If ei was freed, then the LSB bit is set for d_fsdata.
- * But this should not happen, as it should still have a
- * ref count that prevents it. Warn in case it does.
- */
- if (WARN_ON_ONCE((unsigned long)ei & 1))
- goto out;
-
/* This could belong to one of the files of the ei */
if (ei->dentry != dentry) {
for (i = 0; i < ei->nr_entries; i++) {
@@ -336,6 +309,8 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (WARN_ON_ONCE(i == ei->nr_entries))
goto out;
ei->d_children[i] = NULL;
+ } else if (ei->is_freed) {
+ free_ei(ei);
} else {
ei->dentry = NULL;
}
@@ -962,13 +937,65 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
return ERR_PTR(-ENOMEM);
}
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+ struct eventfs_inode *ei, *tmp;
+ struct llist_node *llnode;
+
+ llnode = llist_del_all(&free_list);
+ llist_for_each_entry_safe(ei, tmp, llnode, llist) {
+ /* This dput() matches the dget() from unhook_dentry() */
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (ei->d_children[i])
+ dput(ei->d_children[i]);
+ }
+ /* This should only get here if it had a dentry */
+ if (!WARN_ON_ONCE(!ei->dentry))
+ dput(ei->dentry);
+ }
+}
+
+static DECLARE_WORK(eventfs_work, eventfs_workfn);
+
static void free_rcu_ei(struct rcu_head *head)
{
struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);
+ if (ei->dentry) {
+ /* Do not free the ei until all references of dentry are gone */
+ if (llist_add(&ei->llist, &free_list))
+ queue_work(system_unbound_wq, &eventfs_work);
+ return;
+ }
+
+ /* If the ei doesn't have a dentry, neither should its children */
+ for (int i = 0; i < ei->nr_entries; i++) {
+ WARN_ON_ONCE(ei->d_children[i]);
+ }
+
free_ei(ei);
}
+static void unhook_dentry(struct dentry *dentry)
+{
+ if (!dentry)
+ return;
+
+ /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ dget(dentry);
+
+ dentry->d_fsdata = NULL;
+ d_invalidate(dentry);
+ mutex_lock(&eventfs_mutex);
+ /* dentry should now have at least a single reference */
+ WARN_ONCE((int)d_count(dentry) < 1,
+ "dentry %px (%s) less than one reference (%d) after invalidate\n",
+ dentry, dentry->d_name.name, d_count(dentry));
+ mutex_unlock(&eventfs_mutex);
+}
+
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -1006,33 +1033,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
list_add_tail(&ei->del_list, head);
}
-static void unhook_dentry(struct dentry **dentry, struct dentry **list)
-{
- if (*dentry) {
- unsigned long ptr = (unsigned long)*list;
-
- /* Keep the dentry from being freed yet */
- dget(*dentry);
-
- /*
- * Paranoid: The dget() above should prevent the dentry
- * from being freed and calling eventfs_set_ei_status_free().
- * But just in case, set the link list LSB pointer to 1
- * and have eventfs_set_ei_status_free() check that to
- * make sure that if it does happen, it will not think
- * the d_fsdata is an eventfs_inode.
- *
- * For this to work, no eventfs_inode should be allocated
- * on a odd space, as the ef should always be allocated
- * to be at least word aligned. Check for that too.
- */
- WARN_ON_ONCE(ptr & 1);
-
- (*dentry)->d_fsdata = (void *)(ptr | 1);
- *list = *dentry;
- *dentry = NULL;
- }
-}
/**
* eventfs_remove_dir - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -1043,40 +1043,28 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
{
struct eventfs_inode *tmp;
LIST_HEAD(ei_del_list);
- struct dentry *dentry_list = NULL;
- struct dentry *dentry;
- int i;
if (!ei)
return;
+ /*
+ * Move the deleted eventfs_inodes onto the ei_del_list
+ * which will also set the is_freed value. Note, this has to be
+ * done under the eventfs_mutex, but the deletions of
+ * the dentries must be done outside the eventfs_mutex.
+ * Hence moving them to this temporary list.
+ */
mutex_lock(&eventfs_mutex);
eventfs_remove_rec(ei, &ei_del_list, 0);
+ mutex_unlock(&eventfs_mutex);
list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) {
- for (i = 0; i < ei->nr_entries; i++)
- unhook_dentry(&ei->d_children[i], &dentry_list);
- unhook_dentry(&ei->dentry, &dentry_list);
+ for (int i = 0; i < ei->nr_entries; i++)
+ unhook_dentry(ei->d_children[i]);
+ unhook_dentry(ei->dentry);
+ list_del(&ei->del_list);
call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
}
- mutex_unlock(&eventfs_mutex);
-
- while (dentry_list) {
- unsigned long ptr;
-
- dentry = dentry_list;
- ptr = (unsigned long)dentry->d_fsdata & ~1UL;
- dentry_list = (struct dentry *)ptr;
- dentry->d_fsdata = NULL;
- d_invalidate(dentry);
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
- dput(dentry);
- }
}
/**
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 5f60bcd69289..06a1f220b901 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -54,10 +54,12 @@ struct eventfs_inode {
void *data;
/*
* Union - used for deletion
+ * @llist: for calling dput() if needed after RCU
* @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
*/
union {
+ struct llist_node llist;
struct list_head del_list;
struct rcu_head rcu;
};
--
2.42.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The eventfs_inode->is_freed was a union with the rcu_head with the
assumption that when it was on the srcu list the head would contain a
pointer which would make "is_freed" true. But that was a wrong assumption,
as the rcu head is a singly linked list where the last element is NULL.
Instead, split the nr_entries integer so that "is_freed" is one bit and
nr_entries is the next 31 bits. As there shouldn't be more than 10 entries
(currently there are at most 5 to 7, depending on the config), this should
not be a problem.
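The layout change can be illustrated with a reduced userspace sketch. `union old_free_marker` and `struct ei_counts` are hypothetical, trimmed-down types that keep only the fields discussed here, and the sketch assumes a platform where a NULL pointer is all-zero bits, as on common platforms. Reading the union through `is_freed` after storing a NULL link yields 0, which is why a node at the tail of the callback list looked "not freed"; with the bit-field, setting `is_freed` leaves `nr_entries` untouched.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Old idea (simplified): reuse a pointer-sized union member as the flag.
 * "next" plays the role of the rcu_head link, which is NULL at the list tail.
 */
union old_free_marker {
	void *next;
	unsigned long is_freed;
};

/* New layout, reduced to the two fields the patch touches. */
struct ei_counts {
	unsigned int is_freed:1;
	unsigned int nr_entries:31;
};

/* Largest value a 31-bit unsigned bit-field can hold: 2^31 - 1. */
#define NR_ENTRIES_MAX 0x7fffffffu

static struct ei_counts ei_counts_init(unsigned int nr_entries)
{
	assert(nr_entries <= NR_ENTRIES_MAX);
	return (struct ei_counts){ .is_freed = 0, .nr_entries = nr_entries };
}
```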
Cc: stable(a)vger.kernel.org
Cc: Ajay Kaher <akaher(a)vmware.com>
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v5: https://lkml.kernel.org/r/20231031223419.935276916@goodmis.org
- Resynced for this patch series
fs/tracefs/event_inode.c | 2 ++
fs/tracefs/internal.h | 6 +++---
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 9f612a8f009d..1ce73acf3df0 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -824,6 +824,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
eventfs_remove_rec(ei_child, head, level + 1);
}
+ ei->is_freed = 1;
+
list_del_rcu(&ei->list);
list_add_tail(&ei->del_list, head);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 64fde9490f52..c7d88aaa949f 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -23,6 +23,7 @@ struct tracefs_inode {
* @d_parent: pointer to the parent's dentry
* @d_children: The array of dentries to represent the files when created
* @data: The private data to pass to the callbacks
+ * @is_freed: Flag set if the eventfs is on its way to be freed
* @nr_entries: The number of items in @entries
*/
struct eventfs_inode {
@@ -38,14 +39,13 @@ struct eventfs_inode {
* Union - used for deletion
* @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
- * @is_freed: node is freed if one of the above is set
*/
union {
struct list_head del_list;
struct rcu_head rcu;
- unsigned long is_freed;
};
- int nr_entries;
+ unsigned int is_freed:1;
+ unsigned int nr_entries:31;
};
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
--
2.42.0
Hi all,
This series fixes some long-standing issues in the kernel that prevent
some machines from working properly.
Hopefully it will rescue some systems in the wild :-)
Thanks
Signed-off-by: Jiaxun Yang <jiaxun.yang(a)flygoat.com>
---
Jiaxun Yang (3):
MIPS: Loongson64: Reserve vgabios memory on boot
MIPS: Loongson64: Enable DMA noncoherent support
MIPS: Loongson64: Handle more memory types passed from firmware
arch/mips/Kconfig | 2 +
arch/mips/include/asm/mach-loongson64/boot_param.h | 9 ++++-
arch/mips/loongson64/env.c | 9 ++++-
arch/mips/loongson64/init.c | 47 ++++++++++++++--------
4 files changed, 48 insertions(+), 19 deletions(-)
---
base-commit: 9c2d379d63450ae464eeab45462e0cb573cd97d0
change-id: 20231101-loongson64_fixes-0afb1b503d1e
Best regards,
--
Jiaxun Yang <jiaxun.yang(a)flygoat.com>
Hi Greg,
I see the following build warnings / errors on the stable-rc 5.4 branch.
arch/arm/mach-omap2/timer.c:51:10: fatal error: plat/counter-32k.h: No such file or directory
51 | #include <plat/counter-32k.h>
| ^~~~~~~~~~~~~~~~~~~~
compilation terminated.
Link:
https://storage.tuxsuite.com/public/linaro/lkft/builds/2XXAJIrAB4GOy6jEODeH…
- Naresh
Hi Greg,
I see the following build warning / errors everywhere on the stable-rc 5.4 branch.
ld.lld: error: undefined symbol: kallsyms_on_each_symbol
>>> referenced by trace_kprobe.c
>>> trace/trace_kprobe.o:(create_local_trace_kprobe) in archive kernel/built-in.a
>>> referenced by trace_kprobe.c
>>> trace/trace_kprobe.o:(__trace_kprobe_create) in archive kernel/built-in.a
make[1]: *** [Makefile:1227: vmlinux] Error 1
Links:
- https://storage.tuxsuite.com/public/linaro/lkft/builds/2XXALLRIZaXJVcqhff4Z…
- Naresh
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 51f625377561e5b167da2db5aafb7ee268f691c5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102704-surrogate-dole-2888@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51f625377561e5b167da2db5aafb7ee268f691c5 Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Thu, 28 Sep 2023 13:24:32 -0400
Subject: [PATCH] mm/mempolicy: fix set_mempolicy_home_node() previous VMA
pointer
The two users of mbind_range() are expecting that mbind_range() will
update the pointer to the previous VMA, or return an error. However,
set_mempolicy_home_node() does not call mbind_range() if there is no VMA
policy. The fix is to update the pointer to the previous VMA prior to
continuing iterating the VMAs when there is no policy.
Users may experience a WARN_ON() during VMA policy updates when updating
a range of VMAs on the home node.
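A reduced sketch of why the loop must advance the previous-VMA pointer even when it skips a VMA. `struct fake_vma`, `fake_mbind_range()` and `fake_set_home_node()` are hypothetical stand-ins that model only the expectation that "prev is the VMA right before the current one"; they are not the mempolicy code. The `fixed` flag toggles the one-line fix so both behaviours can be compared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical VMA stand-in: has_policy models vma_policy(vma) != NULL. */
struct fake_vma {
	int id;		/* VMAs are adjacent: id N follows id N - 1 */
	bool has_policy;
};

/*
 * Hypothetical stand-in for mbind_range(): it expects *prev to be the VMA
 * right before vma, and advances *prev on success.
 */
static int fake_mbind_range(struct fake_vma **prev, struct fake_vma *vma)
{
	assert(vma != NULL);
	if (*prev && (*prev)->id != vma->id - 1)
		return -1;	/* stale prev pointer */
	*prev = vma;
	return 0;
}

/* Walk the VMAs, skipping those without a policy, as the syscall loop does. */
static int fake_set_home_node(struct fake_vma *vmas, size_t n, bool fixed)
{
	struct fake_vma *prev = NULL;

	for (size_t i = 0; i < n; i++) {
		struct fake_vma *vma = &vmas[i];

		if (!vma->has_policy) {
			if (fixed)
				prev = vma;	/* the fix: keep prev in step */
			continue;
		}
		if (fake_mbind_range(&prev, vma))
			return -1;
	}
	return 0;
}
```

With a policy-less VMA in the middle of the range, the unfixed walk reaches the next VMA with a stale `prev` and fails, roughly the inconsistency the reported WARN_ON() caught.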
Link: https://lkml.kernel.org/r/20230928172432.2246534-1-Liam.Howlett@oracle.com
Link: https://lore.kernel.org/linux-mm/CALcu4rbT+fMVNaO_F2izaCT+e7jzcAciFkOvk21HG…
Fixes: f4e9e0e69468 ("mm/mempolicy: fix use-after-free of VMA iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Yikebaer Aizezi <yikebaer61(a)gmail.com>
Closes: https://lore.kernel.org/linux-mm/CALcu4rbT+fMVNaO_F2izaCT+e7jzcAciFkOvk21HG…
Reviewed-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index f1b00d6ac7ee..29ebf1e7898c 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1543,8 +1543,10 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, unsigned long, start, unsigned long, le
* the home node for vmas we already updated before.
*/
old = vma_policy(vma);
- if (!old)
+ if (!old) {
+ prev = vma;
continue;
+ }
if (old->mode != MPOL_BIND && old->mode != MPOL_PREFERRED_MANY) {
err = -EOPNOTSUPP;
break;
Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> Kernel 6.4.6 compiled from source worked AOK on my desktop with Intel Xeon cpu and Nvidia graphics - see below for system specs.
>
> Kernels 6.4.7 & 6.4.8 also compiled from source with identical configs hang with a frozen boot terminal screen after a significant way through the boot sequence (e.g. whilst running /etc/profile). The system may still be running as a sound is emitted when the power button is pressed (only way to escape from the system hang).
>
> The issue seems to be specific to the hardware of this desktop as the problem kernels do boot through to completion on other machines.
>
> A test was done with a different build (from Porteus) of kernel 6.5-RC4 and that did not hang - but kernel 6.4.7 from the same builder hung just like my build.
>
> I apologise that I cannot provide any detailed diagnostics - but I can put diagnostics into /etc/profile and provide screenshots if requested.
>
> Forum thread with more details and screenshots:
> https://forum.puppylinux.com/viewtopic.php?p=95733#p95733
>
> Computer Profile:
> Machine Dell Inc. Precision WorkStation T5400 (version: Not Specified)
> Mainboard Dell Inc. 0RW203 (version: NA)
> • BIOS Dell Inc. A11 | Date: 04/30/2012 | Type: Legacy
> • CPU Intel(R) Xeon(R) CPU E5450 @ 3.00GHz (4 cores)
> • RAM Total: 7955 MB | Used: 1555 MB (19.5%) | Actual Used: 775 MB (9.7%)
> Graphics Resolution: 1366x768 pixels | Display Server: X.Org 21.1.8
> • device-0 NVIDIA Corporation GT218 [NVS 300] [10de:10d8] (rev a2)
> Audio ALSA
> • device-0 Intel Corporation 631xESB/632xESB High Definition Audio Controller [8086:269a] (rev 09)
> • device-1 NVIDIA Corporation High Definition Audio Controller [10de:0be3] (rev a1)
> Network wlan1
> • device-0 Ethernet: Broadcom Inc. and subsidiaries NetXtreme BCM5754 Gigabit Ethernet PCI Express [14e4:167a] (rev 02)
See Bugzilla for the full thread.
FYI, this is a stable-specific regression since it doesn't appear on mainline.
Also, I have asked the reporter to open the issue on the gitlab.freedesktop.org
tracker as well (as it is the standard for the DRM subsystem).
To the reporter (on To: list): It would be great if you could also attach netconsole
output to your Bugzilla report, provided that you have another machine
that can connect to the problematic one.
Anyway, I'm adding this regression to be tracked by regzbot:
#regzbot introduced: v6.4.6..v6.4.7 https://bugzilla.kernel.org/show_bug.cgi?id=217776
#regzbot link: https://forum.puppylinux.com/viewtopic.php?p=95733#p95733
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217776
Add the necessary definitions to the qcom-cpufreq-nvmem driver to
support basic cpufreq scaling on the Qualcomm MSM8909 SoC. In practice
the necessary power domains vary depending on the actual PMIC the SoC
was combined with. With PM8909 the VDD_APC power domain is shared with
VDD_CX, so the RPM firmware handles all voltage adjustments, while with
PM8916 and PM660, Linux is responsible for adaptive voltage scaling
of a dedicated CPU regulator using CPR.
Signed-off-by: Stephan Gerhold <stephan.gerhold(a)kernkonzept.com>
---
Changes in v2:
- Reword commit messages based on discussion with Uffe
- Use generic power domain name "perf" (Uffe)
- Fix pm_runtime error handling (Uffe)
- Add allocation cleanup patch as preparation
- Fix ordering of qcom,msm8909 compatible (Konrad)
- cpufreq-dt-platdev blocklist/dt-bindings patches were applied already
- Link to v1: https://lore.kernel.org/r/20230912-msm8909-cpufreq-v1-0-767ce66b544b@kernko…
---
Stephan Gerhold (3):
cpufreq: qcom-nvmem: Simplify driver data allocation
cpufreq: qcom-nvmem: Enable virtual power domain devices
cpufreq: qcom-nvmem: Add MSM8909
drivers/cpufreq/qcom-cpufreq-nvmem.c | 124 +++++++++++++++++++++++++----------
1 file changed, 90 insertions(+), 34 deletions(-)
---
base-commit: 2e12b516f5e6046ceabd4d24e24297e4d130b148
change-id: 20230906-msm8909-cpufreq-dff238de9ff3
Best regards,
--
Stephan Gerhold <stephan.gerhold(a)kernkonzept.com>
Kernkonzept GmbH at Dresden, Germany, HRB 31129, CEO Dr.-Ing. Michael Hohmuth
The patch titled
Subject: scripts/gdb/vmalloc: disable on no-MMU
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
scripts-gdb-vmalloc-disable-on-no-mmu.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ben Wolsieffer <ben.wolsieffer(a)hefring.com>
Subject: scripts/gdb/vmalloc: disable on no-MMU
Date: Tue, 31 Oct 2023 16:22:36 -0400
vmap_area does not exist on no-MMU, therefore the GDB scripts fail to
load:
Traceback (most recent call last):
File "<...>/vmlinux-gdb.py", line 51, in <module>
import linux.vmalloc
File "<...>/scripts/gdb/linux/vmalloc.py", line 14, in <module>
vmap_area_ptr_type = vmap_area_type.get_type().pointer()
^^^^^^^^^^^^^^^^^^^^^^^^^
File "<...>/scripts/gdb/linux/utils.py", line 28, in get_type
self._type = gdb.lookup_type(self._name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
gdb.error: No struct type named vmap_area.
To fix this, disable the command and add an informative error message if
CONFIG_MMU is not defined, following the example of lx-slabinfo.
Link: https://lkml.kernel.org/r/20231031202235.2655333-2-ben.wolsieffer@hefring.c…
Fixes: 852622bf3616 ("scripts/gdb/vmalloc: add vmallocinfo support")
Signed-off-by: Ben Wolsieffer <ben.wolsieffer(a)hefring.com>
Cc: Jan Kiszka <jan.kiszka(a)siemens.com>
Cc: Kieran Bingham <kbingham(a)kernel.org>
Cc: Kuan-Ying Lee <Kuan-Ying.Lee(a)mediatek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
scripts/gdb/linux/constants.py.in | 1 +
scripts/gdb/linux/vmalloc.py | 8 ++++++--
2 files changed, 7 insertions(+), 2 deletions(-)
--- a/scripts/gdb/linux/constants.py.in~scripts-gdb-vmalloc-disable-on-no-mmu
+++ a/scripts/gdb/linux/constants.py.in
@@ -157,3 +157,4 @@ LX_CONFIG(CONFIG_STACKDEPOT)
LX_CONFIG(CONFIG_PAGE_OWNER)
LX_CONFIG(CONFIG_SLUB_DEBUG)
LX_CONFIG(CONFIG_SLAB_FREELIST_HARDENED)
+LX_CONFIG(CONFIG_MMU)
--- a/scripts/gdb/linux/vmalloc.py~scripts-gdb-vmalloc-disable-on-no-mmu
+++ a/scripts/gdb/linux/vmalloc.py
@@ -10,8 +10,9 @@ import gdb
import re
from linux import lists, utils, stackdepot, constants, mm
-vmap_area_type = utils.CachedType('struct vmap_area')
-vmap_area_ptr_type = vmap_area_type.get_type().pointer()
+if constants.LX_CONFIG_MMU:
+ vmap_area_type = utils.CachedType('struct vmap_area')
+ vmap_area_ptr_type = vmap_area_type.get_type().pointer()
def is_vmalloc_addr(x):
pg_ops = mm.page_ops().ops
@@ -25,6 +26,9 @@ class LxVmallocInfo(gdb.Command):
super(LxVmallocInfo, self).__init__("lx-vmallocinfo", gdb.COMMAND_DATA)
def invoke(self, arg, from_tty):
+ if not constants.LX_CONFIG_MMU:
+ raise gdb.GdbError("Requires MMU support")
+
vmap_area_list = gdb.parse_and_eval('vmap_area_list')
for vmap_area in lists.list_for_each_entry(vmap_area_list, vmap_area_ptr_type, "list"):
if not vmap_area['vm']:
_
Patches currently in -mm which might be from ben.wolsieffer(a)hefring.com are
scripts-gdb-vmalloc-disable-on-no-mmu.patch
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 1419430c8abb5a00590169068590dd54d86590ba
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102731-olympics-bullpen-6897@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1419430c8abb5a00590169068590dd54d86590ba Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Fri, 29 Sep 2023 14:30:39 -0400
Subject: [PATCH] mmap: fix vma_iterator in error path of vma_merge()
During the error path, the vma iterator may not be correctly positioned or
set to the correct range. Undo the vma_prev() call by resetting to the
passed in address. Re-walking to the same range will fix the range to the
area previously passed in.
Users would notice increased cycles as vma_merge() would be called an
extra time with vma == prev, and thus would fail to merge and return.
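The single-exit error path can be sketched as follows. `struct fake_iter`, `fake_iter_set()` and `fake_merge()` are hypothetical; the real code calls vma_iter_set() and vma_iter_load() on a maple-tree-backed iterator, which this sketch does not model. It only shows the pattern: every failure after the iterator was moved back funnels through one label that restores the caller's position.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical iterator: "index" is where the caller's next walk resumes. */
struct fake_iter {
	unsigned long index;
};

static void fake_iter_set(struct fake_iter *it, unsigned long addr)
{
	it->index = addr;
}

/*
 * Hypothetical merge attempt: it first steps back to the previous VMA (as
 * vma_prev() would) and may then fail. Every failure goes through one
 * label that puts the iterator back at the caller's address.
 */
static bool fake_merge(struct fake_iter *it, unsigned long addr,
		       unsigned long prev_start, bool fail)
{
	assert(prev_start <= addr);
	fake_iter_set(it, prev_start);	/* step back to the previous VMA */
	if (fail)
		goto merge_fail;
	/* A successful merge would leave the iterator on the merged VMA. */
	return true;

merge_fail:
	fake_iter_set(it, addr);	/* undo the step back */
	return false;
}
```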
Link: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Link: https://lkml.kernel.org/r/20230929183041.2835469-2-Liam.Howlett@oracle.com
Fixes: 18b098af2890 ("vma_merge: set vma iterator to correct position.")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Jann Horn <jannh(a)google.com>
Closes: https://lore.kernel.org/linux-mm/CAG48ez12VN1JAOtTNMY+Y2YnsU45yL5giS-Qn=ejt…
Reviewed-by: Lorenzo Stoakes <lstoakes(a)gmail.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/mmap.c b/mm/mmap.c
index 7ed286662839..a0917ed26057 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -975,7 +975,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
/* Error in anon_vma clone. */
if (err)
- return NULL;
+ goto anon_vma_fail;
if (vma_start < vma->vm_start || vma_end > vma->vm_end)
vma_expanded = true;
@@ -988,7 +988,7 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
}
if (vma_iter_prealloc(vmi, vma))
- return NULL;
+ goto prealloc_fail;
init_multi_vma_prep(&vp, vma, adjust, remove, remove2);
VM_WARN_ON(vp.anon_vma && adjust && adjust->anon_vma &&
@@ -1016,6 +1016,12 @@ struct vm_area_struct *vma_merge(struct vma_iterator *vmi, struct mm_struct *mm,
vma_complete(&vp, vmi, mm);
khugepaged_enter_vma(res, vm_flags);
return res;
+
+prealloc_fail:
+anon_vma_fail:
+ vma_iter_set(vmi, addr);
+ vma_iter_load(vmi);
+ return NULL;
}
/*
CCITMIN is a 12-bit field and doesn't fit in a u8, so extend it to u16.
This probably wasn't an issue previously because values higher than 255
never occurred.
But since commit 0f55b43dedcd ("coresight: etm: Override TRCIDR3.CCITMIN
on errata affected cpus"), a comparison with 256 was done to enable the
errata, generating the following W=1 build error:
coresight-etm4x-core.c:1188:24: error: result of comparison of
constant 256 with expression of type 'u8' (aka 'unsigned char') is
always false [-Werror,-Wtautological-constant-out-of-range-compare]
if (drvdata->ccitmin == 256)
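A small userspace check of the truncation, assuming the CCITMIN value already sits in bits [11:0] of the register value (the extraction helpers are hypothetical, not the driver's accessors). The static assertions document that a 12-bit value needs more than a u8 but fits in a u16, and the u8 helper shows 256 wrapping to 0, which is why the compiler flagged the comparison as always false.

```c
#include <assert.h>
#include <stdint.h>

/* CCITMIN is a 12-bit field: its largest value is 0xfff (4095). */
#define CCITMIN_MASK 0xfffu

static_assert(CCITMIN_MASK > UINT8_MAX, "a 12-bit field does not fit in u8");
static_assert(CCITMIN_MASK <= UINT16_MAX, "a 12-bit field fits in u16");

/* Assumption: the field sits in bits [11:0] of the register value. */
static uint16_t ccitmin_u16(uint32_t reg)
{
	return (uint16_t)(reg & CCITMIN_MASK);
}

/* The old u8 storage: 256 (0x100) truncates to 0, so "== 256" never holds. */
static uint8_t ccitmin_u8(uint32_t reg)
{
	return (uint8_t)(reg & CCITMIN_MASK);
}
```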
Cc: stable(a)vger.kernel.org
Fixes: 2e1cdfe184b5 ("coresight-etm4x: Adding CoreSight ETM4x driver")
Reviewed-by: Mike Leach <mike.leach(a)linaro.org>
Signed-off-by: James Clark <james.clark(a)arm.com>
---
Changes since V1:
* Change the fixes commit to the original addition of ccitmin, rather
than the last refactor of the struct.
drivers/hwtracing/coresight/coresight-etm4x.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwtracing/coresight/coresight-etm4x.h b/drivers/hwtracing/coresight/coresight-etm4x.h
index 20e2e4cb7614..da17b6c49b0f 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x.h
+++ b/drivers/hwtracing/coresight/coresight-etm4x.h
@@ -1036,7 +1036,7 @@ struct etmv4_drvdata {
u8 ctxid_size;
u8 vmid_size;
u8 ccsize;
- u8 ccitmin;
+ u16 ccitmin;
u8 s_ex_level;
u8 ns_ex_level;
u8 q_support;
--
2.34.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The eventfs_inode->is_freed was a union with the rcu_head with the
assumption that when it was on the srcu list the head would contain a
pointer which would make "is_freed" true. But that was a wrong assumption,
as the rcu head is a singly linked list where the last element is NULL.
Instead, split the nr_entries integer so that "is_freed" is one bit and
nr_entries is the next 31 bits. As there shouldn't be more than 10 entries
(currently there are at most 5 to 7, depending on the config), this should
not be a problem.
Cc: stable(a)vger.kernel.org
Fixes: 63940449555e7 ("eventfs: Implement eventfs lookup, read, open functions")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 2 ++
fs/tracefs/internal.h | 6 +++---
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 754885dfe71c..2c2c75b2ad73 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -824,6 +824,8 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
eventfs_remove_rec(ei_child, head, level + 1);
}
+ ei->is_freed = 1;
+
list_del_rcu(&ei->list);
list_add_tail(&ei->del_list, head);
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 64fde9490f52..c7d88aaa949f 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -23,6 +23,7 @@ struct tracefs_inode {
* @d_parent: pointer to the parent's dentry
* @d_children: The array of dentries to represent the files when created
* @data: The private data to pass to the callbacks
+ * @is_freed: Flag set if the eventfs is on its way to be freed
* @nr_entries: The number of items in @entries
*/
struct eventfs_inode {
@@ -38,14 +39,13 @@ struct eventfs_inode {
* Union - used for deletion
* @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
- * @is_freed: node is freed if one of the above is set
*/
union {
struct list_head del_list;
struct rcu_head rcu;
- unsigned long is_freed;
};
- int nr_entries;
+ unsigned int is_freed:1;
+ unsigned int nr_entries:31;
};
static inline struct tracefs_inode *get_tracefs(const struct inode *inode)
--
2.42.0
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 1de195dd0e05d9cba43dec16f83d4ee32af94dd2
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023110127-chip-surfacing-4583@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
1de195dd0e05 ("riscv: fix set_huge_pte_at() for NAPOT mappings when a swap entry is set")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1de195dd0e05d9cba43dec16f83d4ee32af94dd2 Mon Sep 17 00:00:00 2001
From: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Date: Thu, 28 Sep 2023 17:18:46 +0200
Subject: [PATCH] riscv: fix set_huge_pte_at() for NAPOT mappings when a swap
entry is set
We used to determine the number of page table entries to set for a NAPOT
hugepage from the pte value, which fails when the pte to set is a swap
entry.
So take advantage of a recent fix for arm64 reported in [1] which
introduces the size of the mapping as an argument of set_huge_pte_at(): we
can then use this size to compute the number of page table entries to set
for a NAPOT region.
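The count computation can be checked with a worked example. The shift values below are assumptions (4 KiB base pages and a five-level, Sv57-like layout) and carry a `SKETCH_` prefix to make clear they are not the kernel's macros. For a 64 KiB NAPOT mapping the size is below PMD_SIZE, so the shift is PAGE_SHIFT and 64 KiB >> 12 = 16 entries are written; a 2 MiB mapping selects PMD_SHIFT and writes a single entry. The count depends only on the size, so a swap entry cannot change it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed shifts for 4 KiB base pages with five paging levels (Sv57-like);
 * the real values depend on the configured paging mode.
 */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_PMD_SHIFT	21
#define SKETCH_PUD_SHIFT	30
#define SKETCH_P4D_SHIFT	39
#define SKETCH_PGDIR_SHIFT	48

#define SKETCH_SIZE(shift)	((uint64_t)1 << (shift))

/* Same size-to-shift selection as the patched set_huge_pte_at(). */
static unsigned int hugepage_shift_for(uint64_t sz)
{
	if (sz >= SKETCH_SIZE(SKETCH_PGDIR_SHIFT))
		return SKETCH_PGDIR_SHIFT;
	if (sz >= SKETCH_SIZE(SKETCH_P4D_SHIFT))
		return SKETCH_P4D_SHIFT;
	if (sz >= SKETCH_SIZE(SKETCH_PUD_SHIFT))
		return SKETCH_PUD_SHIFT;
	if (sz >= SKETCH_SIZE(SKETCH_PMD_SHIFT))
		return SKETCH_PMD_SHIFT;
	return SKETCH_PAGE_SHIFT;
}

/* Entries written: the size in units of the selected level's stride. */
static uint64_t pte_num_for(uint64_t sz)
{
	assert(sz >= SKETCH_SIZE(SKETCH_PAGE_SHIFT));
	return sz >> hugepage_shift_for(sz);
}
```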
Link: https://lkml.kernel.org/r/20230928151846.8229-3-alexghiti@rivosinc.com
Fixes: 82a1a1f3bfb6 ("riscv: mm: support Svnapot in hugetlb page")
Signed-off-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Closes: https://lore.kernel.org/linux-arm-kernel/20230922115804.2043771-1-ryan.robe… [1]
Reviewed-by: Andrew Jones <ajones(a)ventanamicro.com>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Qinglin Pan <panqinglin2020(a)iscas.ac.cn>
Cc: Conor Dooley <conor(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/riscv/mm/hugetlbpage.c b/arch/riscv/mm/hugetlbpage.c
index e4a2ace92dbe..b52f0210481f 100644
--- a/arch/riscv/mm/hugetlbpage.c
+++ b/arch/riscv/mm/hugetlbpage.c
@@ -183,15 +183,22 @@ void set_huge_pte_at(struct mm_struct *mm,
pte_t pte,
unsigned long sz)
{
+ unsigned long hugepage_shift;
int i, pte_num;
- if (!pte_napot(pte)) {
- set_pte_at(mm, addr, ptep, pte);
- return;
- }
+ if (sz >= PGDIR_SIZE)
+ hugepage_shift = PGDIR_SHIFT;
+ else if (sz >= P4D_SIZE)
+ hugepage_shift = P4D_SHIFT;
+ else if (sz >= PUD_SIZE)
+ hugepage_shift = PUD_SHIFT;
+ else if (sz >= PMD_SIZE)
+ hugepage_shift = PMD_SHIFT;
+ else
+ hugepage_shift = PAGE_SHIFT;
- pte_num = napot_pte_num(napot_cont_order(pte));
- for (i = 0; i < pte_num; i++, ptep++, addr += PAGE_SIZE)
+ pte_num = sz >> hugepage_shift;
+ for (i = 0; i < pte_num; i++, ptep++, addr += (1 << hugepage_shift))
set_pte_at(mm, addr, ptep, pte);
}
An immediate is incorrectly cast to u32 before being spilled, which loses
its sign information, so the range information is incorrect when the
value is loaded again. Fix the immediate spill by removing the cast. The
second patch adds a test case for this.
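The sign problem can be reproduced in plain C. The sketch below is not verifier code; it only contrasts a negative 32-bit immediate that goes through a u32 cast (zero-extended) with one converted directly to 64 bits (sign-extended), which matches what a 64-bit store of that immediate writes. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: the u32 cast zero-extends, so -1 becomes 0x00000000ffffffff. */
static uint64_t spilled_imm_zext(int32_t imm)
{
	return (uint32_t)imm;
}

/* Fixed behaviour: converting the s32 directly sign-extends it. */
static uint64_t spilled_imm_sext(int32_t imm)
{
	return (uint64_t)(int64_t)imm;
}
```

For non-negative immediates both agree; for negative ones only the sign-extended value matches the 64-bit value actually stored, so range tracking based on the zero-extended value is wrong.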
Signed-off-by: Hao Sun <sunhao.th(a)gmail.com>
---
Changes in v2:
- Add fix and cc tags.
- Link to v1: https://lore.kernel.org/r/20231026-fix-check-stack-write-v1-0-6b325ef3ce7e@…
---
Hao Sun (2):
bpf: Fix check_stack_write_fixed_off() to correctly spill imm
selftests/bpf: Add test for immediate spilled to stack
kernel/bpf/verifier.c | 2 +-
tools/testing/selftests/bpf/verifier/bpf_st_mem.c | 32 +++++++++++++++++++++++
2 files changed, 33 insertions(+), 1 deletion(-)
---
base-commit: f1c73396133cb3d913e2075298005644ee8dfade
change-id: 20231026-fix-check-stack-write-c40996694dfa
Best regards,
--
Hao Sun <sunhao.th(a)gmail.com>
From: Ronald Wahl <ronald.wahl(a)raritan.com>
Starting RX DMA on THRI interrupt is too early because TX may not have
finished yet.
This change is inspired by commit 90b8596ac460 ("serial: 8250: Prevent
starting up DMA Rx on THRI interrupt") and fixes DMA issues I had with
an AM62 SoC that is using the 8250 OMAP variant.
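The gating condition can be modelled in isolation. In this sketch, UART_IIR_THRI and UART_IIR_RDI use the values from include/uapi/linux/serial_reg.h, the 0x3f mask is taken from the patch, and the counters are hypothetical stand-ins for the two RX DMA handlers. The sketch also braces the outer if: the nested if/else in the patch binds the else to the inner if as intended, but GCC can warn about such an ambiguous else.

```c
#include <assert.h>
#include <stdbool.h>

/* Values mirror include/uapi/linux/serial_reg.h. */
#define UART_IIR_THRI 0x02 /* transmitter holding register empty */
#define UART_IIR_RDI  0x04 /* receiver data interrupt */

/* Mask taken from the patch: the IIR bits compared against THRI. */
#define IIR_ID_MASK   0x3f

/* Hypothetical stand-ins for am654_8250_handle_rx_dma() / omap_8250_handle_rx_dma(). */
static int rx_calls_am654, rx_calls_omap;

static bool rx_dma_may_start(unsigned int iir)
{
	/* A THRI interrupt says nothing about RX, and TX may still be busy. */
	return (iir & IIR_ID_MASK) != UART_IIR_THRI;
}

static void handle_rx(unsigned int iir, bool has_efr2)
{
	if (rx_dma_may_start(iir)) {
		/* Braces make the else bind explicitly to the inner if. */
		if (has_efr2)
			rx_calls_am654++;
		else
			rx_calls_omap++;
	}
}
```

Masking first means an IIR value with the FIFO-status bits set (for example 0xc2) is still recognised as a THRI interrupt.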
Cc: stable(a)vger.kernel.org
Fixes: c26389f998a8 ("serial: 8250: 8250_omap: Add DMA support for UARTs on K3 SoCs")
Signed-off-by: Ronald Wahl <ronald.wahl(a)raritan.com>
---
V3: - add Cc: stable(a)vger.kernel.org
V2: - add Fixes: tag
- fix author
drivers/tty/serial/8250/8250_omap.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_omap.c b/drivers/tty/serial/8250/8250_omap.c
index c7ab2963040b..f2f59ec6b50b 100644
--- a/drivers/tty/serial/8250/8250_omap.c
+++ b/drivers/tty/serial/8250/8250_omap.c
@@ -1282,10 +1282,11 @@ static int omap_8250_dma_handle_irq(struct uart_port *port)
status = serial_port_in(port, UART_LSR);
- if (priv->habit & UART_HAS_EFR2)
- am654_8250_handle_rx_dma(up, iir, status);
- else
- status = omap_8250_handle_rx_dma(up, iir, status);
+ if ((iir & 0x3f) != UART_IIR_THRI)
+ if (priv->habit & UART_HAS_EFR2)
+ am654_8250_handle_rx_dma(up, iir, status);
+ else
+ status = omap_8250_handle_rx_dma(up, iir, status);
serial8250_modem_status(up);
if (status & UART_LSR_THRE && up->dma->tx_err) {
--
2.41.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
There is a race between holding a reference to an eventfs_inode dentry
and the freeing of the eventfs_inode. If user space holds a dentry long
enough, it may still be able to access the dentry's eventfs_inode after
it has been freed.
To prevent this, have the eventfs_inode freed via the last dput() (or via
RCU if the eventfs_inode does not have a dentry).
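The lifetime rule can be sketched as a single-threaded model: removal only marks the object, and whichever happens last, the removal or the final put, releases it. This is an illustration under simplifying assumptions (no locking, and no grace period for the no-dentry case, which the patch handles with call_srcu()); struct node and the node_* helpers are hypothetical, not eventfs code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for an eventfs_inode whose dentry may be pinned. */
struct node {
	int refs;       /* references held through the dentry */
	bool is_freed;  /* set when the node is removed */
};

static int frees;  /* counts actual releases, for illustration */

static void node_release(struct node *n)
{
	free(n);
	frees++;
}

static struct node *node_new(void)
{
	return calloc(1, sizeof(struct node));
}

static void node_get(struct node *n)
{
	n->refs++;
}

/* The last put releases the node, but only if it was already removed. */
static void node_put(struct node *n)
{
	assert(n->refs > 0);
	if (--n->refs == 0 && n->is_freed)
		node_release(n);
}

/* Removal marks the node; it is released now only if nobody holds it. */
static void node_remove(struct node *n)
{
	n->is_freed = true;
	if (n->refs == 0)
		node_release(n);
}
```

A user-space holder that still has a reference therefore keeps the object valid until it drops that reference, instead of racing with the removal.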
This means reintroducing the eventfs_inode del_list field as a temporary
place to put the eventfs_inode. The eventfs_inode needs to be marked as
freed (via the list), but its dentries must also be invalidated
immediately, as callers expect that to have happened by the time
eventfs_remove_dir() returns. The dentry invalidation must not be done
under the eventfs_mutex, so it has to happen after the eventfs_inode is
marked as freed (put on a deletion list).
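The two-phase removal can be sketched as follows. A boolean flag stands in for holding eventfs_mutex, and the assertion in invalidate() encodes the rule that dentry invalidation must not run under that mutex. All names are hypothetical; this models the ordering only, not the eventfs implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: this flag stands in for holding eventfs_mutex. */
static bool mutex_held;

struct item {
	bool is_freed;
	bool invalidated;
	struct item *del_next;  /* link on the temporary deletion list */
};

/* Must run without the mutex, like dentry invalidation in the patch. */
static void invalidate(struct item *it)
{
	assert(!mutex_held);
	it->invalidated = true;
}

static void remove_items(struct item *items, size_t n)
{
	struct item *del_list = NULL;

	/* Phase 1, under the lock: mark each item freed and collect it. */
	mutex_held = true;
	for (size_t i = 0; i < n; i++) {
		items[i].is_freed = true;
		items[i].del_next = del_list;
		del_list = &items[i];
	}
	mutex_held = false;

	/* Phase 2, lock dropped: invalidate everything before returning. */
	for (struct item *it = del_list; it; it = it->del_next)
		invalidate(it);
}
```

The temporary list is what lets the marking stay under the lock while the invalidation still completes before the function returns.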
Cc: stable(a)vger.kernel.org
Fixes: 5bdcd5f5331a2 ("eventfs: Implement removal of meta data from eventfs")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
fs/tracefs/event_inode.c | 184 +++++++++++++++++----------------------
fs/tracefs/internal.h | 2 +
2 files changed, 84 insertions(+), 102 deletions(-)
diff --git a/fs/tracefs/event_inode.c b/fs/tracefs/event_inode.c
index 87a8aaeda231..827ca152cfbe 100644
--- a/fs/tracefs/event_inode.c
+++ b/fs/tracefs/event_inode.c
@@ -85,8 +85,7 @@ static int eventfs_set_attr(struct mnt_idmap *idmap, struct dentry *dentry,
mutex_lock(&eventfs_mutex);
ei = dentry->d_fsdata;
- /* The LSB is set when the eventfs_inode is being freed */
- if (((unsigned long)ei & 1UL) || ei->is_freed) {
+ if (ei->is_freed) {
/* Do not allow changes if the event is about to be removed. */
mutex_unlock(&eventfs_mutex);
return -ENODEV;
@@ -276,35 +275,17 @@ static void free_ei(struct eventfs_inode *ei)
void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
{
struct tracefs_inode *ti_parent;
- struct eventfs_inode *ei_child, *tmp;
struct eventfs_inode *ei;
int i;
/* The top level events directory may be freed by this */
if (unlikely(ti->flags & TRACEFS_EVENT_TOP_INODE)) {
- LIST_HEAD(ef_del_list);
-
mutex_lock(&eventfs_mutex);
-
ei = ti->private;
-
- /* Record all the top level files */
- list_for_each_entry_srcu(ei_child, &ei->children, list,
- lockdep_is_held(&eventfs_mutex)) {
- list_add_tail(&ei_child->del_list, &ef_del_list);
- }
-
/* Nothing should access this, but just in case! */
ti->private = NULL;
-
mutex_unlock(&eventfs_mutex);
- /* Now safely free the top level files and their children */
- list_for_each_entry_safe(ei_child, tmp, &ef_del_list, del_list) {
- list_del(&ei_child->del_list);
- eventfs_remove_dir(ei_child);
- }
-
free_ei(ei);
return;
}
@@ -319,14 +300,6 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (!ei)
goto out;
- /*
- * If ei was freed, then the LSB bit is set for d_fsdata.
- * But this should not happen, as it should still have a
- * ref count that prevents it. Warn in case it does.
- */
- if (WARN_ON_ONCE((unsigned long)ei & 1))
- goto out;
-
/* This could belong to one of the files of the ei */
if (ei->dentry != dentry) {
for (i = 0; i < ei->nr_entries; i++) {
@@ -336,6 +309,8 @@ void eventfs_set_ei_status_free(struct tracefs_inode *ti, struct dentry *dentry)
if (WARN_ON_ONCE(i == ei->nr_entries))
goto out;
ei->d_children[i] = NULL;
+ } else if (ei->is_freed) {
+ free_ei(ei);
} else {
ei->dentry = NULL;
}
@@ -962,13 +937,79 @@ struct eventfs_inode *eventfs_create_events_dir(const char *name, struct dentry
return ERR_PTR(-ENOMEM);
}
+static LLIST_HEAD(free_list);
+
+static void eventfs_workfn(struct work_struct *work)
+{
+ struct eventfs_inode *ei, *tmp;
+ struct llist_node *llnode;
+
+ llnode = llist_del_all(&free_list);
+ llist_for_each_entry_safe(ei, tmp, llnode, llist) {
+ /* This dput() matches the dget() from unhook_dentry() */
+ for (int i = 0; i < ei->nr_entries; i++) {
+ if (ei->d_children[i])
+ dput(ei->d_children[i]);
+ }
+ /* This should only get here if it had a dentry */
+ if (!WARN_ON_ONCE(!ei->dentry))
+ dput(ei->dentry);
+ }
+}
+
+static DECLARE_WORK(eventfs_work, eventfs_workfn);
+
static void free_rcu_ei(struct rcu_head *head)
{
struct eventfs_inode *ei = container_of(head, struct eventfs_inode, rcu);
+ if (ei->dentry) {
+ /* Do not free the ei until all references of dentry are gone */
+ if (llist_add(&ei->llist, &free_list))
+ queue_work(system_unbound_wq, &eventfs_work);
+ return;
+ }
+
+ /* If the ei doesn't have a dentry, neither should its children */
+ for (int i = 0; i < ei->nr_entries; i++) {
+ WARN_ON_ONCE(ei->d_children[i]);
+ }
+
free_ei(ei);
}
+static void unhook_dentry(struct dentry *dentry)
+{
+ struct inode *inode;
+
+ if (!dentry)
+ return;
+
+ /* Keep the dentry from being freed yet (see eventfs_workfn()) */
+ dget(dentry);
+
+ inode = dentry->d_inode;
+ inode_lock(inode);
+ if (d_is_dir(dentry))
+ inode->i_flags |= S_DEAD;
+ clear_nlink(inode);
+ inode_unlock(inode);
+
+ inode = dentry->d_parent->d_inode;
+ inode_lock(inode);
+
+ /* Remove its visibility */
+ d_invalidate(dentry);
+ if (d_is_dir(dentry))
+ fsnotify_rmdir(inode, dentry);
+ else
+ fsnotify_unlink(inode, dentry);
+
+ if (d_is_dir(dentry))
+ drop_nlink(inode);
+ inode_unlock(inode);
+}
+
/**
* eventfs_remove_rec - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -1006,34 +1047,6 @@ static void eventfs_remove_rec(struct eventfs_inode *ei, struct list_head *head,
list_add_tail(&ei->del_list, head);
}
-static void unhook_dentry(struct dentry **dentry, struct dentry **list)
-{
- if (*dentry) {
- unsigned long ptr = (unsigned long)*list;
-
- /* Keep the dentry from being freed yet */
- dget(*dentry);
-
- /*
- * Paranoid: The dget() above should prevent the dentry
- * from being freed and calling eventfs_set_ei_status_free().
- * But just in case, set the link list LSB pointer to 1
- * and have eventfs_set_ei_status_free() check that to
- * make sure that if it does happen, it will not think
- * the d_fsdata is an eventfs_inode.
- *
- * For this to work, no eventfs_inode should be allocated
- * on a odd space, as the ef should always be allocated
- * to be at least word aligned. Check for that too.
- */
- WARN_ON_ONCE(ptr & 1);
-
- (*dentry)->d_fsdata = (void *)(ptr | 1);
- *list = *dentry;
- *dentry = NULL;
- }
-}
-
/**
* eventfs_remove_dir - remove eventfs dir or file from list
* @ei: eventfs_inode to be removed.
@@ -1044,61 +1057,28 @@ void eventfs_remove_dir(struct eventfs_inode *ei)
{
struct eventfs_inode *tmp;
LIST_HEAD(ei_del_list);
- struct dentry *dentry_list = NULL;
- struct dentry *dentry;
- struct inode *inode;
- int i;
if (!ei)
return;
+ /*
+ * Move the deleted eventfs_inodes onto the ei_del_list
+ * which will also set the is_freed value. Note, this has to be
+ * done under the eventfs_mutex, but the deletions of
+ * the dentries must be done outside the eventfs_mutex.
+ * Hence moving them to this temporary list.
+ */
mutex_lock(&eventfs_mutex);
eventfs_remove_rec(ei, &ei_del_list, 0);
-
- list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) {
- for (i = 0; i < ei->nr_entries; i++)
- unhook_dentry(&ei->d_children[i], &dentry_list);
- unhook_dentry(&ei->dentry, &dentry_list);
- call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
- }
mutex_unlock(&eventfs_mutex);
- while (dentry_list) {
- unsigned long ptr;
-
- dentry = dentry_list;
- ptr = (unsigned long)dentry->d_fsdata & ~1UL;
- dentry_list = (struct dentry *)ptr;
- dentry->d_fsdata = NULL;
-
- inode = dentry->d_inode;
- inode_lock(inode);
- if (d_is_dir(dentry))
- inode->i_flags |= S_DEAD;
- clear_nlink(inode);
- inode_unlock(inode);
-
- inode = dentry->d_parent->d_inode;
- inode_lock(inode);
-
- /* Remove its visibility */
- d_invalidate(dentry);
- if (d_is_dir(dentry))
- fsnotify_rmdir(inode, dentry);
- else
- fsnotify_unlink(inode, dentry);
-
- if (d_is_dir(dentry))
- drop_nlink(inode);
- inode_unlock(inode);
+ list_for_each_entry_safe(ei, tmp, &ei_del_list, del_list) {
- mutex_lock(&eventfs_mutex);
- /* dentry should now have at least a single reference */
- WARN_ONCE((int)d_count(dentry) < 1,
- "dentry %px (%s) less than one reference (%d) after invalidate\n",
- dentry, dentry->d_name.name, d_count(dentry));
- mutex_unlock(&eventfs_mutex);
- dput(dentry);
+ for (int i = 0; i < ei->nr_entries; i++)
+ unhook_dentry(ei->d_children[i]);
+ unhook_dentry(ei->dentry);
+ list_del(&ei->del_list);
+ call_srcu(&eventfs_srcu, &ei->rcu, free_rcu_ei);
}
}
diff --git a/fs/tracefs/internal.h b/fs/tracefs/internal.h
index 5f60bcd69289..06a1f220b901 100644
--- a/fs/tracefs/internal.h
+++ b/fs/tracefs/internal.h
@@ -54,10 +54,12 @@ struct eventfs_inode {
void *data;
/*
* Union - used for deletion
+ * @llist: for calling dput() if needed after RCU
* @del_list: list of eventfs_inode to delete
* @rcu: eventfs_inode to delete in RCU
*/
union {
+ struct llist_node llist;
struct list_head del_list;
struct rcu_head rcu;
};
--
2.42.0
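In the patch above, free_rcu_ei() queues eventfs_work only when llist_add() reports that free_list was empty, and eventfs_workfn() detaches the whole list with llist_del_all(), so one work item drains a whole batch. The sketch below models that idiom with C11 atomics; it is a simplified illustration with hypothetical names, not the kernel's llist implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lock-free singly linked list, modelled on the llist idiom. */
struct lnode {
	struct lnode *next;
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Push node; return true if the list was empty before this push. */
static bool lpush(struct lnode *node, struct lhead *head)
{
	struct lnode *first = atomic_load(&head->first);

	do {
		node->next = first;
	} while (!atomic_compare_exchange_weak(&head->first, &first, node));

	return first == NULL;
}

/* Detach all nodes at once; the consumer then walks them privately. */
static struct lnode *ldel_all(struct lhead *head)
{
	return atomic_exchange(&head->first, NULL);
}

static int work_queued;  /* stands in for queue_work() calls */

/* Only the push that finds the list empty needs to schedule the worker. */
static void defer_free(struct lnode *node, struct lhead *head)
{
	if (lpush(node, head))
		work_queued++;
}
```

Producers that find a non-empty list skip queueing, because the worker that is already pending will pick up their nodes when it detaches the list.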
The patch titled
Subject: mm/damon/sysfs: update monitoring target regions for online input commit
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-damon-sysfs-update-monitoring-target-regions-for-online-input-commit.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: SeongJae Park <sj(a)kernel.org>
Subject: mm/damon/sysfs: update monitoring target regions for online input commit
Date: Tue, 31 Oct 2023 17:01:31 +0000
When user input is committed online, the DAMON sysfs interface ignores the
user input for the monitoring target regions. Such a request is valid and
useful for monitoring ops that use fixed monitoring target regions, like
'paddr' or 'fvaddr'.
Update the region boundaries as the user specified, too. Note that the
monitoring results of the regions that overlap between the latest
monitoring target regions and the new target regions are preserved.
Treat a user request with empty monitoring target regions as a request to
make no change to the monitoring target regions. Otherwise, users would
have to set the monitoring target regions to the current ones for every
online input commit, which could be challenging for DAMON ops that update
monitoring target regions dynamically, like 'vaddr'. If the user really
needs to remove all monitoring target regions, they can simply remove the
target and then create it again with empty target regions.
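The "empty means no change" rule can be sketched with simplified stand-in types. struct target, struct region and both helpers are hypothetical; unlike DAMON's real region update, set_regions() here simply replaces the regions and does not preserve monitoring results for overlapping ranges. Note that err starts at 0, so a request with no regions returns success without touching the target.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for DAMON's target and regions. */
struct region {
	unsigned long start, end;
};

struct target {
	struct region regions[8];
	size_t nr_regions;
};

/* Replace the target's regions with the user's; reject oversized input. */
static int set_regions(struct target *t, const struct region *r, size_t nr)
{
	if (nr > sizeof(t->regions) / sizeof(t->regions[0]))
		return -EINVAL;
	for (size_t i = 0; i < nr; i++)
		t->regions[i] = r[i];
	t->nr_regions = nr;
	return 0;
}

/* An empty request means "keep the current regions". */
static int update_target_regions(struct target *t, const struct region *r,
				 size_t nr)
{
	int err = 0;  /* initialized: the nr == 0 path returns it unchanged */

	if (nr)
		err = set_regions(t, r, nr);
	return err;
}
```

A user who really wants no regions would, as described above, remove and recreate the target rather than commit an empty list.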
Link: https://lkml.kernel.org/r/20231031170131.46972-1-sj@kernel.org
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [5.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/damon/sysfs.c | 47 ++++++++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 17 deletions(-)
--- a/mm/damon/sysfs.c~mm-damon-sysfs-update-monitoring-target-regions-for-online-input-commit
+++ a/mm/damon/sysfs.c
@@ -1150,34 +1150,47 @@ destroy_targets_out:
return err;
}
-static int damon_sysfs_update_target(struct damon_target *target,
- struct damon_ctx *ctx,
- struct damon_sysfs_target *sys_target)
+static int damon_sysfs_update_target_pid(struct damon_target *target, int pid)
{
- struct pid *pid;
- struct damon_region *r, *next;
-
- if (!damon_target_has_pid(ctx))
- return 0;
+ struct pid *pid_new;
- pid = find_get_pid(sys_target->pid);
- if (!pid)
+ pid_new = find_get_pid(pid);
+ if (!pid_new)
return -EINVAL;
- /* no change to the target */
- if (pid == target->pid) {
- put_pid(pid);
+ if (pid_new == target->pid) {
+ put_pid(pid_new);
return 0;
}
- /* remove old monitoring results and update the target's pid */
- damon_for_each_region_safe(r, next, target)
- damon_destroy_region(r, target);
put_pid(target->pid);
- target->pid = pid;
+ target->pid = pid_new;
return 0;
}
+static int damon_sysfs_update_target(struct damon_target *target,
+ struct damon_ctx *ctx,
+ struct damon_sysfs_target *sys_target)
+{
+ int err = 0;
+
+ if (damon_target_has_pid(ctx)) {
+ err = damon_sysfs_update_target_pid(target, sys_target->pid);
+ if (err)
+ return err;
+ }
+
+ /*
+ * Do monitoring target region boundary update only if one or more
+ * regions are set by the user. This is for keeping current monitoring
+ * target results and range easier, especially for dynamic monitoring
+ * target regions update ops like 'vaddr'.
+ */
+ if (sys_target->regions->nr)
+ err = damon_sysfs_set_regions(target, sys_target->regions);
+ return err;
+}
+
static int damon_sysfs_set_targets(struct damon_ctx *ctx,
struct damon_sysfs_targets *sysfs_targets)
{
_
Patches currently in -mm which might be from sj(a)kernel.org are
mm-damon-sysfs-remove-requested-targets-when-online-commit-inputs.patch
mm-damon-sysfs-remove-requested-targets-when-online-commit-inputs-fix.patch
mm-damon-sysfs-update-monitoring-target-regions-for-online-input-commit.patch
[ Adding Masami and stable ]
On Tue, 31 Oct 2023 00:27:07 +0000
Beau Belgrave <beaub(a)linux.microsoft.com> wrote:
> On Mon, Oct 30, 2023 at 05:31:51PM -0400, Steven Rostedt wrote:
> > On Mon, 30 Oct 2023 12:42:23 -0400
> > Steven Rostedt <rostedt(a)goodmis.org> wrote:
> >
> > > > I still get the splat about the trace_array_put when running
> > > > user_event's ftrace selftest:
> > > >
> > > > [ 26.665931] ------------[ cut here ]------------
> > > > [ 26.666663] WARNING: CPU: 12 PID: 291 at kernel/trace/trace.c:516 tracing_release_file_tr+0x46/0x50
> > > > [ 26.667470] Modules linked in:
> > > > [ 26.667808] CPU: 12 PID: 291 Comm: ftrace_test Not tainted 6.6.0-rc7-next-20231026 #3
> > > > [ 26.668665] RIP: 0010:tracing_release_file_tr+0x46/0x50
> > > > [ 26.669093] Code: d1 03 01 8b 83 c0 1e 00 00 85 c0 74 1d 83 e8 01 48 c7 c7 80 5b ef bc 89 83 c0 1e 00 00 e8 f2 b5 03 01 31 c0 5b e9 75 ee 27 01 <0f> 0b eb df 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90
> > > > [ 26.670580] RSP: 0018:ffffb6ef858ffee8 EFLAGS: 00010246
> > > > [ 26.671128] RAX: 0000000000000000 RBX: ffff9d7ae2364058 RCX: 0000000000000000
> > > > [ 26.671793] RDX: 0000000000000000 RSI: ffffffffbcb6b38b RDI: 00000000ffffffff
> > > > [ 26.672444] RBP: ffff9d7ac3e72200 R08: 0000000000000000 R09: 0000000000000000
> > > > [ 26.673072] R10: ffffb6ef858ffee8 R11: ffffffffbb28526f R12: 00000000000f801f
> > > > [ 26.673705] R13: ffff9d7b661a2020 R14: ffff9d7ac6057728 R15: 0000000000000000
> > > > [ 26.674339] FS: 00007fa852fa6740(0000) GS:ffff9d81a6300000(0000) knlGS:0000000000000000
> > > > [ 26.674978] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [ 26.675506] CR2: 00007fa852c2a250 CR3: 0000000105d92001 CR4: 0000000000370eb0
> > > > [ 26.676142] Call Trace:
> > > > [ 26.676357] <TASK>
> > > > [ 26.676572] ? __warn+0x7f/0x160
> > > > [ 26.677092] ? tracing_release_file_tr+0x46/0x50
> > > > [ 26.677540] ? report_bug+0x1c3/0x1d0
> > > > [ 26.677871] ? handle_bug+0x3c/0x70
> > > > [ 26.678196] ? exc_invalid_op+0x14/0x70
> > > > [ 26.678520] ? asm_exc_invalid_op+0x16/0x20
> > > > [ 26.678845] ? tracing_release_file_tr+0x1f/0x50
> > > > [ 26.679268] ? tracing_release_file_tr+0x46/0x50
> > > > [ 26.679691] ? tracing_release_file_tr+0x1f/0x50
> > > > [ 26.680105] __fput+0xab/0x300
> > > > [ 26.680437] __x64_sys_close+0x38/0x80
> > >
> > > Hmm, this doesn't tell me much. Let me go play with the user_event self
> > > tests.
> >
> > I added a bunch of printk()s and I'm thinking there's a race in user event
> > (or dynamic event) code.
> >
>
> I did as well, however, I don't see how user events would be involved
> other than allowing a trace_remove_event_call() with open enable fds?
>
> I believe the scenario is open the enable file and keep the fd open.
>
> While the fd is open to the enable file, call trace_remove_event_call().
>
> If trace_remove_event_call() is called for an event with a tr->ref > 0,
> should it fail or work? (It currently works without issue.)
>
> Should writes to the fd still work after the event it is related to has
> been removed?
>
> I don't see how user_events could prevent this, it seems
> trace_remove_event_call() should fail if files for it are still open?
>
This is a separate issue from eventfs (good, because I think I have solved
all the known bugs for that one - phew!).
Anyway, I checked out the code from just before eventfs was added, and did the following:
# echo 'p:sched schedule' > /sys/kernel/tracing/kprobe_events
# exec 5>>/sys/kernel/tracing/events/kprobes/sched/enable
# > /sys/kernel/tracing/kprobe_events
# exec 5>&-
And it worked fine. The above creates a kprobe event, opens the enable file
of that event with the bash file descriptor #5, removes the kprobe event,
and then closes the file descriptor #5.
But then I applied:
f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
and ran the above commands again, and BOOM! It crashed with:
[ 217.879087] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ 217.881121] #PF: supervisor read access in kernel mode
[ 217.882532] #PF: error_code(0x0000) - not-present page
[ 217.883932] PGD 0 P4D 0
[ 217.884672] Oops: 0000 [#1] PREEMPT SMP PTI
[ 217.885821] CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
[ 217.888178] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[ 217.890684] RIP: 0010:tracing_release_file_tr+0xc/0x50
[ 217.892097] Code: 5d 41 5c c3 cc cc cc cc 66 0f 1f 44 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa 53 48 8b 87 80 04 00 00 <48> 8b 58 28 48 85 db 74 2d 31 f6 48 c7 c7 c0 b3 1e 93 e8 2d 48 ca
[ 217.897102] RSP: 0018:ffffa5d400587eb0 EFLAGS: 00010282
[ 217.898531] RAX: 0000000000000000 RBX: ffff907d06aa6c00 RCX: 0000000000000000
[ 217.900471] RDX: 0000000000000000 RSI: ffff907d06aa6c00 RDI: ffff907d0bf21bd0
[ 217.902403] RBP: 00000000000d801e R08: 0000000000000001 R09: ffff907d0bf21bd0
[ 217.904350] R10: 0000000000000001 R11: 0000000000000001 R12: ffff907d0bf21bd0
[ 217.906282] R13: ffff907d103708e0 R14: ffff907d0a178c30 R15: 0000000000000000
[ 217.908215] FS: 00007ff49c150740(0000) GS:ffff907e77d00000(0000) knlGS:0000000000000000
[ 217.910405] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 217.911970] CR2: 0000000000000028 CR3: 00000001051ec005 CR4: 0000000000170ee0
[ 217.913924] Call Trace:
[ 217.914624] <TASK>
[ 217.915232] ? __die+0x23/0x70
[ 217.916105] ? page_fault_oops+0x17d/0x4d0
[ 217.917262] ? exc_page_fault+0x7f/0x200
[ 217.918350] ? asm_exc_page_fault+0x26/0x30
[ 217.919513] ? tracing_release_file_tr+0xc/0x50
[ 217.920780] __fput+0xfb/0x2a0
[ 217.921651] task_work_run+0x5d/0x90
[ 217.922652] exit_to_user_mode_prepare+0x231/0x240
[ 217.923981] syscall_exit_to_user_mode+0x1a/0x50
[ 217.925248] do_syscall_64+0x4b/0xc0
[ 217.926176] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
Look familiar?
It's now midnight and I've been at this all day. I'm going to look more at
this tomorrow. It's not going to be easy :-( I'm not sure exactly what to
do. We may need to prevent dynamic events from being deleted if any of
their files are open (enable, format, etc).
That is, if you try to delete the event, it will give you an -EBUSY, just
like having it enabled would.
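One possible reading of the -EBUSY idea: count an event's open files and refuse deletion while the count is non-zero. This is a hypothetical model (struct dyn_event and its helpers are made up for illustration) and is not necessarily how the problem was eventually fixed.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical per-event state: how many of its files are open. */
struct dyn_event {
	int open_files;
	int deleted;
};

static void event_file_open(struct dyn_event *ev)
{
	ev->open_files++;
}

static void event_file_release(struct dyn_event *ev)
{
	assert(ev->open_files > 0);
	ev->open_files--;
}

/* Refuse deletion while any file (enable, format, ...) is still open. */
static int event_delete(struct dyn_event *ev)
{
	if (ev->open_files > 0)
		return -EBUSY;
	ev->deleted = 1;
	return 0;
}
```

In terms of the reproducer above, `exec 5>>` would correspond to event_file_open(), so clearing kprobe_events would fail with -EBUSY until `exec 5>&-` releases the file.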
More good news, that commit is in 6.6 *and* marked for stable :-p
-- Steve
When user input is committed online, the DAMON sysfs interface ignores
the user input for the monitoring target regions. Such a request is valid
and useful for monitoring ops that use fixed monitoring target regions,
like 'paddr' or 'fvaddr'.
Update the region boundaries as the user specified, too. Note that the
monitoring results of the regions that overlap between the latest
monitoring target regions and the new target regions are preserved.
Treat a user request with empty monitoring target regions as a request
to make no change to the monitoring target regions. Otherwise, users
would have to set the monitoring target regions to the current ones for
every online input commit, which could be challenging for DAMON ops that
update monitoring target regions dynamically, like 'vaddr'. If the user
really needs to remove all monitoring target regions, they can simply
remove the target and then create it again with empty target regions.
Fixes: da87878010e5 ("mm/damon/sysfs: support online inputs update")
Cc: <stable(a)vger.kernel.org> # 5.19.x
Signed-off-by: SeongJae Park <sj(a)kernel.org>
---
mm/damon/sysfs.c | 47 ++++++++++++++++++++++++++++++-----------------
1 file changed, 30 insertions(+), 17 deletions(-)
diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index 1a231bde18f9..e27846708b5a 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1150,34 +1150,47 @@ static int damon_sysfs_add_target(struct damon_sysfs_target *sys_target,
return err;
}
-static int damon_sysfs_update_target(struct damon_target *target,
- struct damon_ctx *ctx,
- struct damon_sysfs_target *sys_target)
+static int damon_sysfs_update_target_pid(struct damon_target *target, int pid)
{
- struct pid *pid;
- struct damon_region *r, *next;
-
- if (!damon_target_has_pid(ctx))
- return 0;
+ struct pid *pid_new;
- pid = find_get_pid(sys_target->pid);
- if (!pid)
+ pid_new = find_get_pid(pid);
+ if (!pid_new)
return -EINVAL;
- /* no change to the target */
- if (pid == target->pid) {
- put_pid(pid);
+ if (pid_new == target->pid) {
+ put_pid(pid_new);
return 0;
}
- /* remove old monitoring results and update the target's pid */
- damon_for_each_region_safe(r, next, target)
- damon_destroy_region(r, target);
put_pid(target->pid);
- target->pid = pid;
+ target->pid = pid_new;
return 0;
}
+static int damon_sysfs_update_target(struct damon_target *target,
+ struct damon_ctx *ctx,
+ struct damon_sysfs_target *sys_target)
+{
+ int err = 0;
+
+ if (damon_target_has_pid(ctx)) {
+ err = damon_sysfs_update_target_pid(target, sys_target->pid);
+ if (err)
+ return err;
+ }
+
+ /*
+ * Do monitoring target region boundary update only if one or more
+ * regions are set by the user. This is for keeping current monitoring
+ * target results and range easier, especially for dynamic monitoring
+ * target regions update ops like 'vaddr'.
+ */
+ if (sys_target->regions->nr)
+ err = damon_sysfs_set_regions(target, sys_target->regions);
+ return err;
+}
+
static int damon_sysfs_set_targets(struct damon_ctx *ctx,
struct damon_sysfs_targets *sysfs_targets)
{
--
2.34.1
Calls to amd_pmc_get_dram_size() can fail because the function assumes
the SMU version information has already been read, when it may not have
been. The SMU version is read lazily rather than at probe time, because
reading it is slow and increases boot time.
Read the SMU version information if it has not been read yet.
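The lazy-read pattern can be sketched as below. struct pmc_dev, read_smu_version() and get_dram_size_checked() are hypothetical stand-ins; the version gate (firmware newer than 90.39) is the one in the code shown in the patch, and a zero major version is taken to mean "not read yet", as in the fix.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical device state; major == 0 means "not read yet". */
struct pmc_dev {
	unsigned int major, minor;
	int reads;  /* how many times the slow query ran */
};

/* Stand-in for the slow firmware query; reports a fixed version. */
static int read_smu_version(struct pmc_dev *dev)
{
	dev->reads++;
	dev->major = 90;
	dev->minor = 40;
	return 0;
}

static int get_dram_size_checked(struct pmc_dev *dev)
{
	int ret;

	/* Read lazily: only on first use, not at probe time. */
	if (!dev->major) {
		ret = read_smu_version(dev);
		if (ret)
			return ret;
	}

	/* Version gate: require firmware newer than 90.39. */
	if (!(dev->major > 90 || (dev->major == 90 && dev->minor > 39)))
		return -EINVAL;
	return 0;
}
```

The slow query runs at most once per device; later calls reuse the cached version.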
Link: https://lore.kernel.org/all/a3ee6577-d521-6d18-0a15-2f97d6f8ac3a@amd.com/
Fixes: be8325fb3d8c ("platform/x86/amd: pmc: Get STB DRAM size from PMFW")
Cc: stable(a)vger.kernel.org # 6.5.x
Signed-off-by: Mark Hasemeyer <markhas(a)chromium.org>
---
drivers/platform/x86/amd/pmc/pmc.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
index cd6ac04c1468..f668eddbc5d5 100644
--- a/drivers/platform/x86/amd/pmc/pmc.c
+++ b/drivers/platform/x86/amd/pmc/pmc.c
@@ -970,6 +970,11 @@ static int amd_pmc_get_dram_size(struct amd_pmc_dev *dev)
switch (dev->cpu_id) {
case AMD_CPU_ID_YC:
+ if (!dev->major) {
+ ret = amd_pmc_get_smu_version(dev);
+ if (ret)
+ goto err_dram_size;
+ }
if (!(dev->major > 90 || (dev->major == 90 && dev->minor > 39))) {
ret = -EINVAL;
goto err_dram_size;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect that driver_override is
dynamically allocated memory, so that they can later kfree() it.
However, this assumption is not documented, and there have been in the
past, and still are, users setting it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will later allow reusing the helper and reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char * (it is not modified by the core).
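The semantics documented for driver_set_override() can be modelled in user space as follows. This sketch is an approximation with hypothetical names: it uses malloc()/free() instead of kstrndup()/kfree(), omits device_lock() and the PAGE_SIZE length check, and assumes len does not exceed strlen(s). Because the stored value is always heap-allocated, freeing it later is always safe, which is exactly the guarantee that assigning a string literal breaks.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model of driver_set_override(): store a heap copy of s up to
 * the first newline; an empty result ("" or "\n") clears the override.
 */
static int set_override(char **override, const char *s, size_t len)
{
	const char *nl;
	char *new;

	if (!override || !s)
		return -EINVAL;

	/* Drop everything from the first newline on (echo appends one). */
	nl = memchr(s, '\n', len);
	if (nl)
		len = (size_t)(nl - s);

	if (len == 0) {
		free(*override);
		*override = NULL;
		return 0;
	}

	new = malloc(len + 1);
	if (!new)
		return -ENOMEM;
	memcpy(new, s, len);
	new[len] = '\0';

	/* Replace the old value; it is always heap memory in this model. */
	free(*override);
	*override = new;
	return 0;
}
```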
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 4eabfe28d2b39..9fb57df44c77b 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -31,6 +31,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 0ee3cab88f70f..f2273655284f0 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -873,31 +873,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index fab5798a47fdb..65e06a066b671 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -319,6 +319,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 49f634d961188..8fdd3290072b8 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -29,7 +29,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
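The semantics of driver_set_override() can be exercised outside the kernel with a small user-space model. This is a sketch under stated assumptions, not the kernel code. malloc()/free() stand in for kstrndup()/kfree(), the device_lock() serialization is left out, PAGE_SIZE is assumed to be 4096, and the names model_set_override() and model_strnlen() are hypothetical. It keeps the three cases the helper distinguishes: a zero length clears the override, a buffer whose first character is a newline clears it, and anything else is copied up to the first newline.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: 4 KiB pages, as on most configurations. */
#define MODEL_PAGE_SIZE 4096

/* Length of s, but never more than max (what strnlen() computes). */
static size_t model_strnlen(const char *s, size_t max)
{
	size_t n = 0;

	while (n < max && s[n] != '\0')
		n++;
	return n;
}

/*
 * Hypothetical user-space model of driver_set_override(): no device
 * locking, and malloc()/free() replace kstrndup()/kfree().
 */
static int model_set_override(const char **override, const char *s, size_t len)
{
	const char *old;
	const char *cp;
	char *new;

	if (!override || !s)
		return -EINVAL;

	/* Leave room for the newline that the sysfs show path appends. */
	if (len >= MODEL_PAGE_SIZE - 1)
		return -EINVAL;

	if (!len) {
		/* Empty string: clear the override. */
		old = *override;
		*override = NULL;
		free((void *)old);
		return 0;
	}

	/* Like strnchr(): search at most len bytes, stopping at a NUL. */
	cp = memchr(s, '\n', model_strnlen(s, len));
	if (cp)
		len = (size_t)(cp - s);

	/* Like kstrndup(): copy at most len bytes, stopping at a NUL. */
	len = model_strnlen(s, len);
	new = malloc(len + 1);
	if (!new)
		return -ENOMEM;
	memcpy(new, s, len);
	new[len] = '\0';

	old = *override;
	if (cp != s) {
		*override = new;
	} else {
		/* Newline as the first character: clear the override. */
		free(new);
		*override = NULL;
	}
	free((void *)old);
	return 0;
}
```

Because the stored pointer is always owned (and eventually freed) by the helper, a caller that assigned a string literal to the field directly would later have that literal passed to free(), which is the class of bug the upstream commit addresses.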
This helper is used to check whether the connected host supports
the feature; it can be moved into generic code so that other SMU
implementations can use it as well.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Evan Quan <evan.quan(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
(cherry picked from commit 5d1eb4c4c872b55664f5754cc16827beff8630a7)
The dGPU that originally exhibited the problem is not supported in 5.15.
Just introduce the new function in 5.15 as a dependency for fixing an
unrelated dGPU that also uses this symbol.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +++++++++++++++++++
2 files changed, 20 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index d90da384d185..1f1e7966beb5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1285,6 +1285,7 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
void amdgpu_device_pci_config_reset(struct amdgpu_device *adev);
int amdgpu_device_pci_reset(struct amdgpu_device *adev);
bool amdgpu_device_need_post(struct amdgpu_device *adev);
+bool amdgpu_device_pcie_dynamic_switching_supported(void);
bool amdgpu_device_should_use_aspm(struct amdgpu_device *adev);
bool amdgpu_device_aspm_support_quirk(void);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2cf49a32ac6c..f57334fff7fc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -1319,6 +1319,25 @@ bool amdgpu_device_need_post(struct amdgpu_device *adev)
return true;
}
+/*
+ * Intel hosts such as Raptor Lake and Sapphire Rapids don't support dynamic
+ * speed switching. Until we have confirmation from Intel that a specific host
+ * supports it, it's safer that we keep it disabled for all.
+ *
+ * https://edc.intel.com/content/www/us/en/design/products/platforms/details/r…
+ * https://gitlab.freedesktop.org/drm/amd/-/issues/2663
+ */
+bool amdgpu_device_pcie_dynamic_switching_supported(void)
+{
+#if IS_ENABLED(CONFIG_X86)
+ struct cpuinfo_x86 *c = &cpu_data(0);
+
+ if (c->x86_vendor == X86_VENDOR_INTEL)
+ return false;
+#endif
+ return true;
+}
+
/**
* amdgpu_device_should_use_aspm - check if the device should program ASPM
*
--
2.34.1
Hi Greg and Sasha,
Please consider applying the following mbox files to their respective
stable trees, which contain commit a1e2c031ec39 ("x86/mm: Simplify
RESERVE_BRK()") and commit e32683c6f7d2 ("x86/mm: Fix RESERVE_BRK() for
older binutils"). These resolve a link failure noticed in the Android
trees due to a new diagnostic in ld.lld:
https://github.com/llvm/llvm-project/commit/1981b1b6b92f7579a30c9ed32dbdf3b…
ld.lld: error: relocation refers to a symbol in a discarded section: __brk_reservation_fn_dmi_alloc__
>>> defined in vmlinux.o
>>> referenced by ld-temp.o
>>> vmlinux.o:(exit_amd_microcode.cfi_jt)
ld.lld: error: relocation refers to a symbol in a discarded section: __brk_reservation_fn_early_pgt_alloc__
>>> defined in vmlinux.o
>>> referenced by ld-temp.o
>>> vmlinux.o:(exit_amd_microcode.cfi_jt)
While I think this may be related to Android's downstream use of LTO and
CFI, I see no reason why it could not also happen without LTO, because
before those upstream commits RESERVE_BRK() resided in the .discard.text
section.
I confirmed they resolve the Android build problem and I did an
ARCH=x86_64 defconfig build and boot test in QEMU and an allmodconfig
build with GCC, which had no regressions.
Cheers,
Nathan
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 128b0c9781c9f2651bea163cb85e52a6c7be0f9e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102936-encounter-impatient-894d@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 128b0c9781c9f2651bea163cb85e52a6c7be0f9e Mon Sep 17 00:00:00 2001
From: Thomas Gleixner <tglx(a)linutronix.de>
Date: Wed, 25 Oct 2023 23:04:15 +0200
Subject: [PATCH] x86/i8259: Skip probing when ACPI/MADT advertises PCAT
compatibility
David and a few others reported that on certain newer systems some legacy
interrupts fail to work correctly.
Debugging revealed that the BIOS of these systems leaves the legacy PIC in
an uninitialized state, which makes the PIC detection fail, so the kernel
switches to a dummy implementation.
Unfortunately this fallback causes quite a lot of code to fail, as that
code depends on checks for the number of legacy PIC interrupts or the
availability of the real PIC.
In theory there is no reason to use the PIC on any modern system when
IO/APIC is available, but the dependencies on the related checks cannot be
resolved trivially and on short notice. This needs lots of analysis and
rework.
The PIC detection was added to avoid quirky checks and forced selection
of the dummy implementation all over the place, especially in VM guest
scenarios. So reverting the relevant commit is not an option, as that
would break a lot of other scenarios.
One solution would be to try to initialize the PIC on detection failure
and retry the detection, but that puts the burden on everything which
does not have a PIC.
Fortunately the ACPI/MADT table header has a flag field, which advertises
in bit 0 that the system is PCAT compatible, which means it has a legacy
8259 PIC.
Evaluate that bit and, if it is set, skip the detection routine and keep
the real PIC installed, which then gets initialized (for nothing) and makes
the rest of the code with all its dependencies work again.
Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately")
Reported-by: David Lazar <dlazar(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: David Lazar <dlazar(a)gmail.com>
Reviewed-by: Hans de Goede <hdegoede(a)redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: stable(a)vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003
Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
diff --git a/arch/x86/include/asm/i8259.h b/arch/x86/include/asm/i8259.h
index 637fa1df3512..c715097e92fd 100644
--- a/arch/x86/include/asm/i8259.h
+++ b/arch/x86/include/asm/i8259.h
@@ -69,6 +69,8 @@ struct legacy_pic {
void (*make_irq)(unsigned int irq);
};
+void legacy_pic_pcat_compat(void);
+
extern struct legacy_pic *legacy_pic;
extern struct legacy_pic null_legacy_pic;
diff --git a/arch/x86/kernel/acpi/boot.c b/arch/x86/kernel/acpi/boot.c
index 2a0ea38955df..c55c0ef47a18 100644
--- a/arch/x86/kernel/acpi/boot.c
+++ b/arch/x86/kernel/acpi/boot.c
@@ -148,6 +148,9 @@ static int __init acpi_parse_madt(struct acpi_table_header *table)
pr_debug("Local APIC address 0x%08x\n", madt->address);
}
+ if (madt->flags & ACPI_MADT_PCAT_COMPAT)
+ legacy_pic_pcat_compat();
+
/* ACPI 6.3 and newer support the online capable bit. */
if (acpi_gbl_FADT.header.revision > 6 ||
(acpi_gbl_FADT.header.revision == 6 &&
diff --git a/arch/x86/kernel/i8259.c b/arch/x86/kernel/i8259.c
index 30a55207c000..c20d1832c481 100644
--- a/arch/x86/kernel/i8259.c
+++ b/arch/x86/kernel/i8259.c
@@ -32,6 +32,7 @@
*/
static void init_8259A(int auto_eoi);
+static bool pcat_compat __ro_after_init;
static int i8259A_auto_eoi;
DEFINE_RAW_SPINLOCK(i8259A_lock);
@@ -299,15 +300,32 @@ static void unmask_8259A(void)
static int probe_8259A(void)
{
+ unsigned char new_val, probe_val = ~(1 << PIC_CASCADE_IR);
unsigned long flags;
- unsigned char probe_val = ~(1 << PIC_CASCADE_IR);
- unsigned char new_val;
+
+ /*
+ * If MADT has the PCAT_COMPAT flag set, then do not bother probing
+ * for the PIC. Some BIOSes leave the PIC uninitialized and probing
+ * fails.
+ *
+ * Right now this causes problems as quite some code depends on
+ * nr_legacy_irqs() > 0 or has_legacy_pic() == true. This is silly
+ * when the system has an IO/APIC because then PIC is not required
+ * at all, except for really old machines where the timer interrupt
+ * must be routed through the PIC. So just pretend that the PIC is
+ * there and let legacy_pic->init() initialize it for nothing.
+ *
+ * Alternatively this could just try to initialize the PIC and
+ * repeat the probe, but for cases where there is no PIC that's
+ * just pointless.
+ */
+ if (pcat_compat)
+ return nr_legacy_irqs();
+
/*
- * Check to see if we have a PIC.
- * Mask all except the cascade and read
- * back the value we just wrote. If we don't
- * have a PIC, we will read 0xff as opposed to the
- * value we wrote.
+ * Check to see if we have a PIC. Mask all except the cascade and
+ * read back the value we just wrote. If we don't have a PIC, we
+ * will read 0xff as opposed to the value we wrote.
*/
raw_spin_lock_irqsave(&i8259A_lock, flags);
@@ -429,5 +447,9 @@ static int __init i8259A_init_ops(void)
return 0;
}
-
device_initcall(i8259A_init_ops);
+
+void __init legacy_pic_pcat_compat(void)
+{
+ pcat_compat = true;
+}
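The decision that probe_8259A() now makes can be modelled in user space. In this sketch the MADT flags are passed in directly instead of being latched into pcat_compat by legacy_pic_pcat_compat(), the I/O port is a hypothetical fake_port, and all model_* and fake_* names are made up for illustration. The constants follow the patch and its comments: bit 0 of the MADT flags means PCAT compatible, the cascade input (PIC_CASCADE_IR) is assumed to be IRQ 2, and a port with no PIC behind it reads back 0xff. The fake port only distinguishes "a PIC latches the mask" from "nothing answers"; it does not model why an uninitialized PIC fails the real probe.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: IRQ 2 is the cascade input, as PIC_CASCADE_IR is on x86. */
#define MODEL_CASCADE_IR 2
/* ACPICA defines ACPI_MADT_PCAT_COMPAT as bit 0 of the MADT flags. */
#define MODEL_MADT_PCAT_COMPAT (1u << 0)
/* A real 8259 pair provides 16 legacy IRQs. */
#define MODEL_NR_LEGACY_IRQS 16

/* Hypothetical stand-in for the master PIC mask register port. */
struct fake_port {
	bool present;   /* does a PIC answer at this address? */
	uint8_t latch;  /* last value written, if present */
};

static void fake_outb(struct fake_port *p, uint8_t v)
{
	if (p->present)
		p->latch = v;
}

static uint8_t fake_inb(const struct fake_port *p)
{
	/* Without a PIC the read returns 0xff, per the patch's comment. */
	return p->present ? p->latch : 0xff;
}

/* Returns the number of legacy IRQs, or 0 when the dummy PIC is chosen. */
static int model_probe_8259A(uint32_t madt_flags, struct fake_port *pic1)
{
	uint8_t probe_val = (uint8_t)~(1u << MODEL_CASCADE_IR);

	/* MADT says "PCAT compatible": trust it and skip the probe. */
	if (madt_flags & MODEL_MADT_PCAT_COMPAT)
		return MODEL_NR_LEGACY_IRQS;

	/* Mask everything but the cascade and read the mask back. */
	fake_outb(pic1, probe_val);
	if (fake_inb(pic1) != probe_val)
		return 0;
	return MODEL_NR_LEGACY_IRQS;
}
```

The point of the short-circuit is visible in the last case: even when the readback would fail, a PCAT-compatible MADT keeps the real PIC and its 16 legacy IRQs.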
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1aee9158bc978f91701c5992e395efbc6da2de3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102701-cadet-groovy-9672@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1aee9158bc978f91701c5992e395efbc6da2de3c Mon Sep 17 00:00:00 2001
From: Al Viro <viro(a)zeniv.linux.org.uk>
Date: Sat, 14 Oct 2023 21:34:40 -0400
Subject: [PATCH] nfsd: lock_rename() needs both directories to live on the
same fs
... checking that after lock_rename() is too late. Incidentally,
NFSv2 had no nfserr_xdev...
Fixes: aa387d6ce153 "nfsd: fix EXDEV checking in rename"
Cc: stable(a)vger.kernel.org # v3.9+
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Acked-by: Chuck Lever <chuck.lever(a)oracle.com>
Tested-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Al Viro <viro(a)zeniv.linux.org.uk>
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index 48260cf68fde..02f5fcaad03f 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -1788,6 +1788,12 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (!flen || isdotent(fname, flen) || !tlen || isdotent(tname, tlen))
goto out;
+ err = (rqstp->rq_vers == 2) ? nfserr_acces : nfserr_xdev;
+ if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
+ goto out;
+ if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
+ goto out;
+
retry:
host_err = fh_want_write(ffhp);
if (host_err) {
@@ -1823,12 +1829,6 @@ nfsd_rename(struct svc_rqst *rqstp, struct svc_fh *ffhp, char *fname, int flen,
if (ndentry == trap)
goto out_dput_new;
- host_err = -EXDEV;
- if (ffhp->fh_export->ex_path.mnt != tfhp->fh_export->ex_path.mnt)
- goto out_dput_new;
- if (ffhp->fh_export->ex_path.dentry != tfhp->fh_export->ex_path.dentry)
- goto out_dput_new;
-
if ((ndentry->d_sb->s_export_op->flags & EXPORT_OP_CLOSE_BEFORE_UNLINK) &&
nfsd_has_cached_files(ndentry)) {
close_cached = true;
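The nfsd change boils down to a check that needs no dentry lookup or lock: compare the two exports first, and only then call lock_rename(). A hypothetical user-space reduction of that check follows. model_export keeps only the two fields the patch compares (ex_path.mnt and ex_path.dentry), and the status codes are stand-in enums rather than the real __be32 nfserr values.

```c
/* Hypothetical stand-ins for the nfsd status codes used by the patch. */
enum model_nfserr {
	MODEL_NFS_OK = 0,
	MODEL_NFSERR_ACCES,
	MODEL_NFSERR_XDEV,
};

/* Hypothetical reduction of an export to the two fields compared. */
struct model_export {
	const void *mnt;
	const void *dentry;
};

/*
 * Decide, before any directory locking, whether a rename between two
 * exports may proceed. NFSv2 has no XDEV status, so it gets ACCES.
 */
static enum model_nfserr model_rename_precheck(int vers,
					       const struct model_export *from,
					       const struct model_export *to)
{
	enum model_nfserr err = (vers == 2) ? MODEL_NFSERR_ACCES
					    : MODEL_NFSERR_XDEV;

	if (from->mnt != to->mnt || from->dentry != to->dentry)
		return err;
	return MODEL_NFS_OK;
}
```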
A check of the return value of wait_for_completion_interruptible_timeout()
should be added in v5.10. The following patch contains the necessary
changes and applies cleanly.
Found by Linux Verification Center (linuxtesting.org) with Svace.
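The cover letter does not name the driver, so the patch itself is not reproduced here. As a generic illustration: wait_for_completion_interruptible_timeout() returns -ERESTARTSYS if the wait was interrupted, 0 if it timed out, and a positive number of remaining jiffies if the completion arrived, so ignoring the return value treats a timeout or a signal as success. The helper below is a hypothetical user-space model of the usual check; any negative value is passed through unchanged, since -ERESTARTSYS itself is kernel-internal.

```c
#include <errno.h>

/*
 * Hypothetical helper: fold the three-way result of
 * wait_for_completion_interruptible_timeout() into 0 or a negative errno.
 */
static int model_wait_result(long ret)
{
	if (ret < 0)
		return (int)ret;   /* interrupted: propagate the error */
	if (ret == 0)
		return -ETIMEDOUT; /* timed out before completion */
	return 0;              /* completed; ret was the jiffies left */
}
```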
This is a backport of the slab-out-of-bounds fix to 5.10. The only change
from the original is an adjustment of the fill_kobj_path() signature.
Wang Hai (1):
kobject: Fix slab-out-of-bounds in fill_kobj_path()
lib/kobject.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
--
2.42.0.758.gaed0368e0e-goog
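For context, the bug class is a path builder that writes a kobject's ancestry from the end of a buffer towards its start: if the path no longer fits (for example, if a name grows between measuring the path and filling it), the writes run past the beginning of the allocation. The sketch below is not the backported patch. It is a user-space illustration, with a hypothetical model_node in place of struct kobject, of filling such a buffer right to left while passing the buffer length in and failing with -EINVAL instead of writing before buf[0].

```c
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a kobject: a name and a parent link. */
struct model_node {
	const char *name;
	const struct model_node *parent;
};

/*
 * Fill buf (of size len, including the NUL) with "/root/.../leaf",
 * writing from the end towards the start. Returns 0, or -EINVAL if the
 * names do not fit, instead of writing before buf[0].
 */
static int model_fill_path(const struct model_node *n, char *buf, int len)
{
	if (len < 1)
		return -EINVAL;
	--len;
	buf[len] = '\0';

	for (; n; n = n->parent) {
		int cur = (int)strlen(n->name);

		/* Back up enough to hold this name plus its leading '/'. */
		len -= cur;
		if (len <= 0)
			return -EINVAL;
		memcpy(buf + len, n->name, (size_t)cur);
		buf[--len] = '/';
	}
	return 0;
}
```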
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to be dynamically
allocated memory, so that they can later kfree() it.
However, this assumption is not documented, and there have been users in
the past, and still are, that set it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will later allow reusing the helper and reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char * (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 857c8f1b876e5..668c6c8c22f1f 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -29,6 +29,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 2f89e618b142c..a09e7a681f7a7 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -891,31 +891,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index 37e359d81a86f..bccd367c11de5 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -330,6 +330,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 9e5c98fcea8c6..8268439975b21 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -29,7 +29,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to be dynamically
allocated memory, so that they can later kfree() it.
However, this assumption is not documented, and there have been users in
the past, and still are, that set it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will later allow reusing the helper and reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char * (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 4e5ca632f35e8..ef14566a49710 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -29,6 +29,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 75623b914b8c2..0ed43d185a900 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -973,31 +973,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index c7be3a8073ec3..af4ecbf889107 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -422,6 +422,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 569f446502bed..c7bd8a1a60976 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -29,7 +29,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to be dynamically
allocated memory, so that they can later kfree() it.
However, this assumption is not documented, and there have been users in
the past, and still are, that set it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will later allow reusing the helper and reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char * (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device/driver.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 8c0d33e182fd5..1b9d47b10bd0a 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -30,6 +30,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 88aef93eb4ddf..647066229fec3 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1046,31 +1046,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index ee7ba5b5417e5..a44f5adeaef5a 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -150,6 +150,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 17f9cd5626c83..e7a83b0218077 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -30,7 +30,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to point to
dynamically allocated memory so that they can later kfree() it.
However, this assumption is not documented, and there have been, and
still are, users setting it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will allow the helper to be reused later, reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char (it is not modified by the core).
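The rules the helper enforces can be modelled in plain userspace C. The sketch below is a hypothetical model (model_set_override(), model_strnchr() and MODEL_PAGE_SIZE are illustrative names, not kernel API) of the behaviour described in the kernel-doc of driver_set_override(): the stored value is always heap-allocated, input is cut at the first newline, and an empty string or a lone "\n" clears the override. It assumes a 4096-byte page and leaves out device_lock().

```c
#include <assert.h>  /* used by the usage checks */
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: a 4096-byte page, standing in for the kernel's PAGE_SIZE. */
#define MODEL_PAGE_SIZE 4096

/* Like the kernel's strnchr(): search at most len bytes, stop at NUL. */
static const char *model_strnchr(const char *s, size_t len, char c)
{
	for (; len-- && *s != '\0'; ++s)
		if (*s == c)
			return s;
	return NULL;
}

/*
 * Hypothetical userspace model of driver_set_override() semantics.
 * *override is always either NULL or heap memory allocated here,
 * so the old value can always be freed. Locking is omitted.
 */
static int model_set_override(char **override, const char *s, size_t len)
{
	const char *cp;
	char *new;

	if (!override || !s)
		return -EINVAL;

	/* Leave room for the newline a sysfs show would append. */
	if (len >= MODEL_PAGE_SIZE - 1)
		return -EINVAL;

	/* Cut the input at the first newline, if any. */
	cp = model_strnchr(s, len, '\n');
	if (cp)
		len = (size_t)(cp - s);

	/* "" or "\n": clear the override. */
	if (len == 0) {
		free(*override);
		*override = NULL;
		return 0;
	}

	new = malloc(len + 1);
	if (!new)
		return -ENOMEM;
	memcpy(new, s, len);
	new[len] = '\0';

	free(*override);
	*override = new;
	return 0;
}
```

Because the model only ever stores memory it allocated itself, freeing the previous value is always safe; that is the invariant the string-literal users in the trace above violated.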
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device/driver.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 8c0d33e182fd5..1b9d47b10bd0a 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -30,6 +30,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index ac5cf1a8d79ab..596fbe6b701a5 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1270,31 +1270,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf49933..abf948e102f5d 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -150,6 +150,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 8aefdc0099c86..72cf70857b85f 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -31,7 +31,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
The patch below does not apply to the 6.5-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.5.y
git checkout FETCH_HEAD
git cherry-pick -x 76b7069bcc89dec33f03eb08abee165d0306b754
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102716-prudishly-reggae-1b29@gregkh' --subject-prefix 'PATCH 6.5.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 76b7069bcc89dec33f03eb08abee165d0306b754 Mon Sep 17 00:00:00 2001
From: SeongJae Park <sj(a)kernel.org>
Date: Sat, 7 Oct 2023 20:04:32 +0000
Subject: [PATCH] mm/damon/sysfs: check DAMOS regions update progress from
before_terminate()
DAMON_SYSFS can receive a DAMOS tried-regions update request while
kdamond has already left the main loop but the before_terminate
callback (damon_sysfs_before_terminate() in this case) has not yet been
called. Moreover, damon_sysfs_handle_cmd() can finish before the
callback is invoked. In that case, damon_sysfs_before_terminate()
unlocks damon_sysfs_lock even though nobody holds it. This happens
because the callback assumes that damon_sysfs_cmd_request_callback()
has been called before it.
Check whether that assumption holds before doing the unlock, to avoid
this problem.
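The fix follows a common pattern: the path that takes a lock records that fact, and the cleanup path unlocks only if the record says so. The sketch below is a simplified, single-threaded model of that pattern; the names (struct model_lock, model_request_callback(), model_before_terminate()) are hypothetical and the model does not reproduce DAMON's actual locking in detail. The regions_updating flag plays the role of damon_sysfs_schemes_regions_updating.

```c
#include <assert.h>  /* used by the usage checks */
#include <stdbool.h>

/* Hypothetical lock that only records whether it is currently held. */
struct model_lock {
	bool held;
};

static struct model_lock sysfs_lock;

/* Set only by the path that actually took sysfs_lock. */
static bool regions_updating;

/* Returns -1 when asked to release a lock nobody holds (the bug). */
static int model_unlock(struct model_lock *l)
{
	if (!l->held)
		return -1;
	l->held = false;
	return 0;
}

/* Request path: takes the lock and records that it did so. */
static void model_request_callback(void)
{
	sysfs_lock.held = true;
	regions_updating = true;
}

/* Terminate path: unlock only if the request path really locked. */
static int model_before_terminate(void)
{
	if (!regions_updating)
		return 0;	/* request path never ran: nothing to unlock */
	regions_updating = false;
	return model_unlock(&sysfs_lock);
}
```

Moving the flag from a function-local static to file scope, as the patch does, is what lets the terminate path consult it at all.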
Link: https://lkml.kernel.org/r/20231007200432.3110-1-sj@kernel.org
Fixes: f1d13cacabe1 ("mm/damon/sysfs: implement DAMOS tried regions update command")
Signed-off-by: SeongJae Park <sj(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.2.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/damon/sysfs.c b/mm/damon/sysfs.c
index b86ba7b0a921..f60e56150feb 100644
--- a/mm/damon/sysfs.c
+++ b/mm/damon/sysfs.c
@@ -1208,6 +1208,8 @@ static int damon_sysfs_set_targets(struct damon_ctx *ctx,
return 0;
}
+static bool damon_sysfs_schemes_regions_updating;
+
static void damon_sysfs_before_terminate(struct damon_ctx *ctx)
{
struct damon_target *t, *next;
@@ -1219,8 +1221,10 @@ static void damon_sysfs_before_terminate(struct damon_ctx *ctx)
cmd = damon_sysfs_cmd_request.cmd;
if (kdamond && ctx == kdamond->damon_ctx &&
(cmd == DAMON_SYSFS_CMD_UPDATE_SCHEMES_TRIED_REGIONS ||
- cmd == DAMON_SYSFS_CMD_UPDATE_SCHEMES_TRIED_BYTES)) {
+ cmd == DAMON_SYSFS_CMD_UPDATE_SCHEMES_TRIED_BYTES) &&
+ damon_sysfs_schemes_regions_updating) {
damon_sysfs_schemes_update_regions_stop(ctx);
+ damon_sysfs_schemes_regions_updating = false;
mutex_unlock(&damon_sysfs_lock);
}
@@ -1340,7 +1344,6 @@ static int damon_sysfs_commit_input(struct damon_sysfs_kdamond *kdamond)
static int damon_sysfs_cmd_request_callback(struct damon_ctx *c)
{
struct damon_sysfs_kdamond *kdamond;
- static bool damon_sysfs_schemes_regions_updating;
bool total_bytes_only = false;
int err = 0;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 3d887d512494d678b17c57b835c32f4e48d34f26
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023102723-steerable-trench-2f00@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3d887d512494d678b17c57b835c32f4e48d34f26 Mon Sep 17 00:00:00 2001
From: Lukasz Majczak <lma(a)semihalf.com>
Date: Fri, 22 Sep 2023 08:34:10 +0200
Subject: [PATCH] drm/dp_mst: Fix NULL deref in
get_mst_branch_device_by_guid_helper()
As get_mst_branch_device_by_guid_helper() is called from
drm_dp_get_mst_branch_device_by_guid(), the mstb parameter has to be
checked; otherwise a NULL dereference may occur in the call to
memcmp(), causing the following:
[12579.365869] BUG: kernel NULL pointer dereference, address: 0000000000000049
[12579.365878] #PF: supervisor read access in kernel mode
[12579.365880] #PF: error_code(0x0000) - not-present page
[12579.365882] PGD 0 P4D 0
[12579.365887] Oops: 0000 [#1] PREEMPT SMP NOPTI
...
[12579.365895] Workqueue: events_long drm_dp_mst_up_req_work
[12579.365899] RIP: 0010:memcmp+0xb/0x29
[12579.365921] Call Trace:
[12579.365927] get_mst_branch_device_by_guid_helper+0x22/0x64
[12579.365930] drm_dp_mst_up_req_work+0x137/0x416
[12579.365933] process_one_work+0x1d0/0x419
[12579.365935] worker_thread+0x11a/0x289
[12579.365938] kthread+0x13e/0x14f
[12579.365941] ? process_one_work+0x419/0x419
[12579.365943] ? kthread_blkcg+0x31/0x31
[12579.365946] ret_from_fork+0x1f/0x30
Since get_mst_branch_device_by_guid_helper() is recursive, moving the
NULL check to the top of the function allows removing the similar check
that skipped NULL elements inside the loop.
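A minimal sketch of the resulting search shape, under simplifying assumptions: struct branch and struct port are hypothetical stand-ins for drm_dp_mst_branch and drm_dp_mst_port, and child ports are kept in an array rather than a kernel list. Because the NULL check is the first statement, a port with no branch device behind it can be passed straight into the recursive call.

```c
#include <assert.h>  /* used by the usage checks */
#include <stddef.h>
#include <string.h>

#define GUID_LEN 16

struct branch;

/* Hypothetical stand-in for drm_dp_mst_port. */
struct port {
	struct branch *mstb;	/* NULL when no branch device is attached */
};

/* Hypothetical stand-in for drm_dp_mst_branch. */
struct branch {
	unsigned char guid[GUID_LEN];
	struct port *ports;
	size_t num_ports;
};

/* Depth-first search by GUID; safe for a NULL root or NULL children. */
static struct branch *find_by_guid(struct branch *mstb,
				   const unsigned char *guid)
{
	size_t i;

	if (!mstb)
		return NULL;

	if (memcmp(mstb->guid, guid, GUID_LEN) == 0)
		return mstb;

	for (i = 0; i < mstb->num_ports; i++) {
		struct branch *found = find_by_guid(mstb->ports[i].mstb, guid);

		if (found)
			return found;
	}
	return NULL;
}
```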
Fixes: 5e93b8208d3c ("drm/dp/mst: move GUID storage from mgr, port to only mst branch")
Cc: <stable(a)vger.kernel.org> # 4.14+
Signed-off-by: Lukasz Majczak <lma(a)semihalf.com>
Reviewed-by: Radoslaw Biernacki <rad(a)chromium.org>
Signed-off-by: Manasi Navare <navaremanasi(a)chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20230922063410.23626-1-lma@se…
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ed96cfcfa304..8c929ef72c72 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -2574,14 +2574,14 @@ static struct drm_dp_mst_branch *get_mst_branch_device_by_guid_helper(
struct drm_dp_mst_branch *found_mstb;
struct drm_dp_mst_port *port;
+ if (!mstb)
+ return NULL;
+
if (memcmp(mstb->guid, guid, 16) == 0)
return mstb;
list_for_each_entry(port, &mstb->ports, next) {
- if (!port->mstb)
- continue;
-
found_mstb = get_mst_branch_device_by_guid_helper(port->mstb, guid);
if (found_mstb)
gen8_ggtt_invalidate() is only needed for the limited set of platforms
where the GGTT is mapped as WC. It was added as a way to fix WC-based
GGTT in commit 0f9b91c754b7 ("drm/i915: flush system agent TLBs on SNB"),
and there is no reference in the HW docs that forces us to use it on
non-WC-backed GGTT.
This can also cause unwanted side effects on XE_HP platforms, where
GFX_FLSH_CNTL_GEN6 is no longer valid.
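The platform split the patch relies on can be expressed as a single predicate that drives both decisions. The sketch below mirrors the condition in needs_wc_ggtt_mapping() (WC mapping only for platforms that are neither GEN9 LP nor graphics version 11 or later); struct platform_model and its fields are hypothetical stand-ins for IS_GEN9_LP() and GRAPHICS_VER(), and the enum values merely name the mapping call each case would use.

```c
#include <assert.h>  /* used by the usage checks */
#include <stdbool.h>

/* Hypothetical platform descriptor with only the two inputs used. */
struct platform_model {
	bool is_gen9_lp;	/* stands in for IS_GEN9_LP(i915) */
	int graphics_ver;	/* stands in for GRAPHICS_VER(i915) */
};

enum ggtt_mapping { MAP_IOREMAP, MAP_IOREMAP_WC };

/* Same condition as needs_wc_ggtt_mapping() in the patch. */
static bool needs_wc_ggtt(const struct platform_model *p)
{
	return !p->is_gen9_lp && p->graphics_ver < 11;
}

/* Mapping choice made at probe time (cf. ggtt_probe_common()). */
static enum ggtt_mapping choose_ggtt_mapping(const struct platform_model *p)
{
	return needs_wc_ggtt(p) ? MAP_IOREMAP_WC : MAP_IOREMAP;
}

/* Whether the invalidate path would issue the GFX_FLSH_CNTL write. */
static bool issues_gfx_flush(const struct platform_model *p)
{
	return needs_wc_ggtt(p);
}
```

Keying both call sites to one helper keeps the flush and the WC mapping from drifting apart, which is the point of the refactor.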
v2: Add a function for WC GGTT detection (Ville)
v3: Improve commit log and add reference commit (Daniel)
Fixes: d2eae8e98d59 ("drm/i915/dg2: Drop force_probe requirement")
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen(a)linux.intel.com>
Cc: Jani Nikula <jani.nikula(a)linux.intel.com>
Cc: Jonathan Cavitt <jonathan.cavitt(a)intel.com>
Cc: John Harrison <john.c.harrison(a)intel.com>
Cc: Andi Shyti <andi.shyti(a)linux.intel.com>
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: <stable(a)vger.kernel.org> # v6.2+
Suggested-by: Matt Roper <matthew.d.roper(a)intel.com>
Signed-off-by: Nirmoy Das <nirmoy.das(a)intel.com>
Reviewed-by: Matt Roper <matthew.d.roper(a)intel.com>
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
---
drivers/gpu/drm/i915/gt/intel_ggtt.c | 35 +++++++++++++++++++---------
1 file changed, 24 insertions(+), 11 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index 1c93e84278a0..15fc8e4703f4 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -195,6 +195,21 @@ void gen6_ggtt_invalidate(struct i915_ggtt *ggtt)
spin_unlock_irq(&uncore->lock);
}
+static bool needs_wc_ggtt_mapping(struct drm_i915_private *i915)
+{
+ /*
+ * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
+ * will be dropped. For WC mappings in general we have 64 byte burst
+ * writes when the WC buffer is flushed, so we can't use it, but have to
+ * resort to an uncached mapping. The WC issue is easily caught by the
+ * readback check when writing GTT PTE entries.
+ */
+ if (!IS_GEN9_LP(i915) && GRAPHICS_VER(i915) < 11)
+ return true;
+
+ return false;
+}
+
static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
{
struct intel_uncore *uncore = ggtt->vm.gt->uncore;
@@ -202,8 +217,12 @@ static void gen8_ggtt_invalidate(struct i915_ggtt *ggtt)
/*
* Note that as an uncached mmio write, this will flush the
* WCB of the writes into the GGTT before it triggers the invalidate.
+ *
+ * Only perform this when GGTT is mapped as WC, see ggtt_probe_common().
*/
- intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6, GFX_FLSH_CNTL_EN);
+ if (needs_wc_ggtt_mapping(ggtt->vm.i915))
+ intel_uncore_write_fw(uncore, GFX_FLSH_CNTL_GEN6,
+ GFX_FLSH_CNTL_EN);
}
static void guc_ggtt_ct_invalidate(struct intel_gt *gt)
@@ -1140,17 +1159,11 @@ static int ggtt_probe_common(struct i915_ggtt *ggtt, u64 size)
GEM_WARN_ON(pci_resource_len(pdev, GEN4_GTTMMADR_BAR) != gen6_gttmmadr_size(i915));
phys_addr = pci_resource_start(pdev, GEN4_GTTMMADR_BAR) + gen6_gttadr_offset(i915);
- /*
- * On BXT+/ICL+ writes larger than 64 bit to the GTT pagetable range
- * will be dropped. For WC mappings in general we have 64 byte burst
- * writes when the WC buffer is flushed, so we can't use it, but have to
- * resort to an uncached mapping. The WC issue is easily caught by the
- * readback check when writing GTT PTE entries.
- */
- if (IS_GEN9_LP(i915) || GRAPHICS_VER(i915) >= 11)
- ggtt->gsm = ioremap(phys_addr, size);
- else
+ if (needs_wc_ggtt_mapping(i915))
ggtt->gsm = ioremap_wc(phys_addr, size);
+ else
+ ggtt->gsm = ioremap(phys_addr, size);
+
if (!ggtt->gsm) {
drm_err(&i915->drm, "Failed to map the ggtt page table\n");
return -ENOMEM;
--
2.41.0
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to point to
dynamically allocated memory so that they can later kfree() it.
However, this assumption is not documented, and there have been, and
still are, users setting it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will allow the helper to be reused later, reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
Change-Id: I131f04170f2f76d468565b27929e0ee6fd0e7adf
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device/driver.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 8c0d33e182fd5..1b9d47b10bd0a 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -30,6 +30,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index ac5cf1a8d79ab..596fbe6b701a5 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1270,31 +1270,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index a498ebcf49933..abf948e102f5d 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -150,6 +150,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 8aefdc0099c86..72cf70857b85f 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -31,7 +31,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to point to
dynamically allocated memory so that they can later kfree() it.
However, this assumption is not documented, and there have been, and
still are, users setting it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will allow the helper to be reused later, reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
Change-Id: Ic3c1fefbb9bf1a27f7adcca5bf0405e02d2f7775
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device/driver.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 8c0d33e182fd5..1b9d47b10bd0a 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -30,6 +30,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 88aef93eb4ddf..647066229fec3 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -1046,31 +1046,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device/driver.h b/include/linux/device/driver.h
index ee7ba5b5417e5..a44f5adeaef5a 100644
--- a/include/linux/device/driver.h
+++ b/include/linux/device/driver.h
@@ -150,6 +150,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 17f9cd5626c83..e7a83b0218077 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -30,7 +30,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to point to
dynamically allocated memory, so that they can later kfree() it.
However, this assumption is not documented, and there have been, and
still are, users setting it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will allow the helper to be reused later, reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char * (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
Change-Id: Ib0c76960fce44b52a71e53aa6e30f39e7e8e5175
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 4e5ca632f35e8..ef14566a49710 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -29,6 +29,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 75623b914b8c2..0ed43d185a900 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -973,31 +973,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index c7be3a8073ec3..af4ecbf889107 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -422,6 +422,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 569f446502bed..c7bd8a1a60976 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -29,7 +29,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
From: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
commit 6c2f421174273de8f83cde4286d1c076d43a2d35 upstream.
Several core drivers and buses expect driver_override to point to
dynamically allocated memory, so that they can later kfree() it.
However, this assumption is not documented, and there have been, and
still are, users setting it to a string literal. This leads to
kfree() of static memory during device release (e.g. in error paths or
during unbind):
kernel BUG at ../mm/slub.c:3960!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kfree) from [<c058da50>] (platform_device_release+0x88/0xb4)
(platform_device_release) from [<c0585be0>] (device_release+0x2c/0x90)
(device_release) from [<c0a69050>] (kobject_put+0xec/0x20c)
(kobject_put) from [<c0f2f120>] (exynos5_clk_probe+0x154/0x18c)
(exynos5_clk_probe) from [<c058de70>] (platform_drv_probe+0x6c/0xa4)
(platform_drv_probe) from [<c058b7ac>] (really_probe+0x280/0x414)
(really_probe) from [<c058baf4>] (driver_probe_device+0x78/0x1c4)
(driver_probe_device) from [<c0589854>] (bus_for_each_drv+0x74/0xb8)
(bus_for_each_drv) from [<c058b48c>] (__device_attach+0xd4/0x16c)
(__device_attach) from [<c058a638>] (bus_probe_device+0x88/0x90)
(bus_probe_device) from [<c05871fc>] (device_add+0x3dc/0x62c)
(device_add) from [<c075ff10>] (of_platform_device_create_pdata+0x94/0xbc)
(of_platform_device_create_pdata) from [<c07600ec>] (of_platform_bus_create+0x1a8/0x4fc)
(of_platform_bus_create) from [<c0760150>] (of_platform_bus_create+0x20c/0x4fc)
(of_platform_bus_create) from [<c07605f0>] (of_platform_populate+0x84/0x118)
(of_platform_populate) from [<c0f3c964>] (of_platform_default_populate_init+0xa0/0xb8)
(of_platform_default_populate_init) from [<c01031f8>] (do_one_initcall+0x8c/0x404)
Provide a helper which clearly documents the usage of driver_override.
This will allow the helper to be reused later, reducing the amount of
duplicated code.
Convert the platform driver to use the new helper and make the
driver_override field const char * (it is not modified by the core).
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Link: https://lore.kernel.org/r/20220419113435.246203-2-krzysztof.kozlowski@linar…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Lee Jones <lee(a)kernel.org>
Change-Id: I2f59769cfb99d8359d14e2cb7345ce3428593afc
---
drivers/base/driver.c | 69 +++++++++++++++++++++++++++++++++
drivers/base/platform.c | 28 ++-----------
include/linux/device.h | 2 +
include/linux/platform_device.h | 6 ++-
4 files changed, 80 insertions(+), 25 deletions(-)
diff --git a/drivers/base/driver.c b/drivers/base/driver.c
index 857c8f1b876e5..668c6c8c22f1f 100644
--- a/drivers/base/driver.c
+++ b/drivers/base/driver.c
@@ -29,6 +29,75 @@ static struct device *next_device(struct klist_iter *i)
return dev;
}
+/**
+ * driver_set_override() - Helper to set or clear driver override.
+ * @dev: Device to change
+ * @override: Address of string to change (e.g. &device->driver_override);
+ * The contents will be freed and hold newly allocated override.
+ * @s: NUL-terminated string, new driver name to force a match, pass empty
+ * string to clear it ("" or "\n", where the latter is only for sysfs
+ * interface).
+ * @len: length of @s
+ *
+ * Helper to set or clear driver override in a device, intended for the cases
+ * when the driver_override field is allocated by driver/bus code.
+ *
+ * Returns: 0 on success or a negative error code on failure.
+ */
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len)
+{
+ const char *new, *old;
+ char *cp;
+
+ if (!override || !s)
+ return -EINVAL;
+
+ /*
+ * The stored value will be used in sysfs show callback (sysfs_emit()),
+ * which has a length limit of PAGE_SIZE and adds a trailing newline.
+ * Thus we can store one character less to avoid truncation during sysfs
+ * show.
+ */
+ if (len >= (PAGE_SIZE - 1))
+ return -EINVAL;
+
+ if (!len) {
+ /* Empty string passed - clear override */
+ device_lock(dev);
+ old = *override;
+ *override = NULL;
+ device_unlock(dev);
+ kfree(old);
+
+ return 0;
+ }
+
+ cp = strnchr(s, len, '\n');
+ if (cp)
+ len = cp - s;
+
+ new = kstrndup(s, len, GFP_KERNEL);
+ if (!new)
+ return -ENOMEM;
+
+ device_lock(dev);
+ old = *override;
+ if (cp != s) {
+ *override = new;
+ } else {
+ /* "\n" passed - clear override */
+ kfree(new);
+ *override = NULL;
+ }
+ device_unlock(dev);
+
+ kfree(old);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(driver_set_override);
+
/**
* driver_for_each_device - Iterator for devices bound to a driver.
* @drv: Driver we're iterating.
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index 2f89e618b142c..a09e7a681f7a7 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -891,31 +891,11 @@ static ssize_t driver_override_store(struct device *dev,
const char *buf, size_t count)
{
struct platform_device *pdev = to_platform_device(dev);
- char *driver_override, *old, *cp;
-
- /* We need to keep extra room for a newline */
- if (count >= (PAGE_SIZE - 1))
- return -EINVAL;
-
- driver_override = kstrndup(buf, count, GFP_KERNEL);
- if (!driver_override)
- return -ENOMEM;
-
- cp = strchr(driver_override, '\n');
- if (cp)
- *cp = '\0';
-
- device_lock(dev);
- old = pdev->driver_override;
- if (strlen(driver_override)) {
- pdev->driver_override = driver_override;
- } else {
- kfree(driver_override);
- pdev->driver_override = NULL;
- }
- device_unlock(dev);
+ int ret;
- kfree(old);
+ ret = driver_set_override(dev, &pdev->driver_override, buf, count);
+ if (ret)
+ return ret;
return count;
}
diff --git a/include/linux/device.h b/include/linux/device.h
index 37e359d81a86f..bccd367c11de5 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -330,6 +330,8 @@ extern int __must_check driver_create_file(struct device_driver *driver,
extern void driver_remove_file(struct device_driver *driver,
const struct driver_attribute *attr);
+int driver_set_override(struct device *dev, const char **override,
+ const char *s, size_t len);
extern int __must_check driver_for_each_device(struct device_driver *drv,
struct device *start,
void *data,
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 9e5c98fcea8c6..8268439975b21 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -29,7 +29,11 @@ struct platform_device {
struct resource *resource;
const struct platform_device_id *id_entry;
- char *driver_override; /* Driver name to force a match */
+ /*
+ * Driver name to force a match. Do not set directly, because core
+ * frees it. Use driver_set_override() to set or clear it.
+ */
+ const char *driver_override;
/* MFD cell pointer */
struct mfd_cell *mfd_cell;
--
2.42.0.820.g83a721a137-goog
The backport of commit 9c5df2f14ee3 ("can: isotp: isotp_ops: fix poll() to
not report false EPOLLOUT events") introduced a new regression, and the
fix could potentially introduce further side effects.
To reduce the risk of other unmet dependencies and of missing fixes and
checks, the latest 6.1 LTS isotp code base is ported back to the 5.15 LTS tree.
Lukas Magel (1):
can: isotp: isotp_sendmsg(): fix TX state detection and wait behavior
Oliver Hartkopp (6):
can: isotp: set max PDU size to 64 kByte
can: isotp: isotp_bind(): return -EINVAL on incorrect CAN ID formatting
can: isotp: check CAN address family in isotp_bind()
can: isotp: handle wait_event_interruptible() return values
can: isotp: add local echo tx processing and tx without FC
can: isotp: isotp_bind(): do not validate unused address information
include/uapi/linux/can/isotp.h | 25 +-
net/can/isotp.c | 426 +++++++++++++++++++++------------
2 files changed, 288 insertions(+), 163 deletions(-)
--
2.34.1
In mtk_jpegdec_worker, if an error occurs in mtk_jpeg_set_dec_dst, the
worker both starts the timeout work and invokes v4l2_m2m_job_finish.
This breaks the design, in which only one function should call
v4l2_m2m_job_finish: now both the timeout handler and
mtk_jpegdec_worker invoke it.
Fix it by starting the timeout work only after mtk_jpeg_set_dec_dst has
finished successfully.
Fixes: da4ede4b7fd6 ("media: mtk-jpeg: move data/code inside CONFIG_OF blocks")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko(a)collabora.com>
Cc: stable(a)vger.kernel.org
---
v2:
- put the patches into a single series, as suggested by Dmitry
---
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index a39acde2724a..c3456c700c07 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1749,9 +1749,6 @@ static void mtk_jpegdec_worker(struct work_struct *work)
v4l2_m2m_src_buf_remove(ctx->fh.m2m_ctx);
v4l2_m2m_dst_buf_remove(ctx->fh.m2m_ctx);
- schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work,
- msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
-
mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs);
if (mtk_jpeg_set_dec_dst(ctx,
&jpeg_src_buf->dec_param,
@@ -1761,6 +1758,9 @@ static void mtk_jpegdec_worker(struct work_struct *work)
goto setdst_end;
}
+ schedule_delayed_work(&comp_jpeg[hw_id]->job_timeout_work,
+ msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
+
spin_lock_irqsave(&comp_jpeg[hw_id]->hw_lock, flags);
ctx->total_frame_num++;
mtk_jpeg_dec_reset(comp_jpeg[hw_id]->reg_base);
--
2.25.1
In mtk_jpeg_probe, &jpeg->job_timeout_work is bound to
mtk_jpeg_job_timeout_work.
In mtk_jpeg_dec_device_run, if an error happens in
mtk_jpeg_set_dec_dst, the timeout worker has already been started,
yet the job is also marked as finished by invoking v4l2_m2m_job_finish.
There are two ways to trigger the bug. If we remove the module,
mtk_jpeg_remove is called to clean up.
The possible sequence is as follows, which causes a
use-after-free bug.
CPU0 CPU1
mtk_jpeg_dec_... |
start worker |
|mtk_jpeg_job_timeout_work
mtk_jpeg_remove |
v4l2_m2m_release |
kfree(m2m_dev); |
|
| v4l2_m2m_get_curr_priv
| m2m_dev->curr_ctx //use
Closing the file descriptor, which calls mtk_jpeg_release, leads to
a similar sequence.
Fix this bug by starting the timeout worker only after
mtk_jpeg_set_dec_dst has succeeded. Then v4l2_m2m_job_finish will only be
called in either mtk_jpeg_job_timeout_work or mtk_jpeg_dec_device_run.
Fixes: b2f0d2724ba4 ("[media] vcodec: mediatek: Add Mediatek JPEG Decoder Driver")
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko(a)collabora.com>
Cc: stable(a)vger.kernel.org
---
v2:
- put the patches into a single series, as suggested by Dmitry
---
drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
index 60425c99a2b8..a39acde2724a 100644
--- a/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
+++ b/drivers/media/platform/mediatek/jpeg/mtk_jpeg_core.c
@@ -1021,13 +1021,13 @@ static void mtk_jpeg_dec_device_run(void *priv)
if (ret < 0)
goto dec_end;
- schedule_delayed_work(&jpeg->job_timeout_work,
- msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
-
mtk_jpeg_set_dec_src(ctx, &src_buf->vb2_buf, &bs);
if (mtk_jpeg_set_dec_dst(ctx, &jpeg_src_buf->dec_param, &dst_buf->vb2_buf, &fb))
goto dec_end;
+ schedule_delayed_work(&jpeg->job_timeout_work,
+ msecs_to_jiffies(MTK_JPEG_HW_TIMEOUT_MSEC));
+
spin_lock_irqsave(&jpeg->hw_lock, flags);
mtk_jpeg_dec_reset(jpeg->reg_base);
mtk_jpeg_dec_set_config(jpeg->reg_base,
--
2.25.1
On Mon, Oct 30, 2023 at 12:04 PM Mingwei Zhang <mizhang(a)google.com> wrote:
>
> On Fri, Sep 15, 2023 at 9:10 PM Ian Rogers <irogers(a)google.com> wrote:
> >
> > Dummy events are created with an attribute where the period and freq
> > are zero. evsel__config will then see the uninitialized values and
> > initialize them in evsel__default_freq_period. As frequency mode is
> > used by default the dummy event would be set to use frequency
> > mode. However, this has no effect on the dummy event but does cause
> > unnecessary timers/interrupts. Avoid this overhead by setting the
> > period to 1 for dummy events.
> >
> > evlist__add_aux_dummy calls evlist__add_dummy then sets freq=0 and
> > period=1. This isn't necessary after this change and so the setting is
> > removed.
> >
> > From Stephane:
> >
> > The dummy event is not counting anything. It is used to collect mmap
> > records and avoid a race condition during the synthesize mmap phase of
> > perf record. As such, it should not cause any overhead during active
> > profiling. Yet, it did. Because of a bug the dummy event was
> > programmed as a sampling event in frequency mode. Events in that mode
> > incur more kernel overheads because on timer tick, the kernel has to
> > look at the number of samples for each event and potentially adjust
> > the sampling period to achieve the desired frequency. The dummy event
> > was therefore adding a frequency event to task and ctx contexts we may
> > otherwise not have any, e.g., perf record -a -e
> > cpu/event=0x3c,period=10000000/. On each timer tick the
> > perf_adjust_freq_unthr_context() is invoked and if ctx->nr_freq is
> > non-zero, then the kernel will loop over ALL the events of the context
> > looking for frequency mode ones. In doing so, it locks the context
> > and enables/disables the PMU of each hw event. If all the events of the
> > context are in period mode, the kernel will have to traverse the list for
> > nothing, incurring overhead. The overhead is multiplied by a very large
> > factor when this happens in a guest kernel. There is no need for the
> > dummy event to be in frequency mode; it does not count anything and
> > therefore should not cause extra overhead for no reason.
> >
> > Fixes: 5bae0250237f ("perf evlist: Introduce perf_evlist__new_dummy constructor")
> > Reported-by: Stephane Eranian <eranian(a)google.com>
> > Signed-off-by: Ian Rogers <irogers(a)google.com>
> > ---
> > tools/perf/util/evlist.c | 5 +++--
> > 1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> > index 25c3ebe2c2f5..e36da58522ef 100644
> > --- a/tools/perf/util/evlist.c
> > +++ b/tools/perf/util/evlist.c
> > @@ -251,6 +251,9 @@ static struct evsel *evlist__dummy_event(struct evlist *evlist)
> > .type = PERF_TYPE_SOFTWARE,
> > .config = PERF_COUNT_SW_DUMMY,
> > .size = sizeof(attr), /* to capture ABI version */
> > + /* Avoid frequency mode for dummy events to avoid associated timers. */
> > + .freq = 0,
> > + .sample_period = 1,
> > };
> >
> > return evsel__new_idx(&attr, evlist->core.nr_entries);
> > @@ -277,8 +280,6 @@ struct evsel *evlist__add_aux_dummy(struct evlist *evlist, bool system_wide)
> > evsel->core.attr.exclude_kernel = 1;
> > evsel->core.attr.exclude_guest = 1;
> > evsel->core.attr.exclude_hv = 1;
> > - evsel->core.attr.freq = 0;
> > - evsel->core.attr.sample_period = 1;
> > evsel->core.system_wide = system_wide;
> > evsel->no_aux_samples = true;
> > evsel->name = strdup("dummy:u");
> > --
> > 2.42.0.459.ge4e396fd5e-goog
> >
>
> Hi Greg,
>
> This patch is a critical performance fix for perf and vPMU. Could you
> help us backport the commit to all stable kernel versions?
>
> Appreciate your help. Thanks.
> -Mingwei
Oops... Update target email to: stable(a)vger.kernel.org
On Mon, Oct 30, 2023 at 04:47:01PM +0000, Saleem, Shiraz wrote:
> Hi,
>
> A security bug fix was recently made to the Intel RDMA driver (irdma) and has made it to mainline.
>
> https://github.com/torvalds/linux/commit/bb6d73d9add68ad270888db327514384df…
> subject: RDMA/irdma: Prevent zero-length STAG registration
> commit-id: bb6d73d9add68ad270888db327514384dfa44958
>
> In theory, this problem is possible in i40iw as well. i40iw has been replaced by irdma upstream since 5.14.
>
> However, i40iw is still part of the LTS kernels 4.14.x, 4.19.x, 5.4.x, and 5.10.x. Since this is a security fix, I think it is reasonable to backport it to i40iw for these kernels too. The patch would need some adjustments, and I can do this if required.
If you feel it is needed, yes, please do the needed backport and submit
it here.
thanks,
greg k-h
Dear all,
after upgrading my toolchain to gcc 13.2 and GNU assembler (GNU
Binutils) 2.41.0.20230926, compiling a 4.14 kernel fails with:
arch/arm/mm/proc-arm926.S: Assembler messages:
arch/arm/mm/proc-arm926.S:477: Error: junk at end of line, first
unrecognized character is `#'
The problem is that gas 2.41.0.20230926 no longer supports
Solaris-style section attributes like
.section ".start", #alloc, #execinstr
Commit 790756c7e022 ("ARM: 8933/1: replace Sun/Solaris style flag on
section directive") fixed up the section attributes that used the legacy
syntax. It seems that this commit landed in 5.5 and has already been
backported to 5.4.
Should we backport this commit to 4.19 and 4.14 as well? If so, should I
submit patches that apply against the 4.19 and 4.14 trees or do you want
to resolve the conflicts when you queue up the patch?
Thanks,
Martin
Commit abd3ac7902fb ("watchdog: sbsa: Support architecture version 1")
introduced new timer math for watchdog revision 1 with the 48-bit offset
register.
Both gwdt->clk and timeout are u32, but the value being calculated is
u64. Without a cast, the compiler performs u32 operations, truncating
intermediate results and producing incorrect values.
A watchdog revision 1 implementation with a gwdt->clk of 1GHz and a
timeout of 600s writes 3647256576 to the one shot watchdog instead of
300000000000, resulting in the watchdog firing in 3.6s instead of 600s.
Force u64 math by casting the first operand (gwdt->clk) to u64. Make
the order of operations explicit with parentheses.
Fixes: abd3ac7902fb ("watchdog: sbsa: Support architecture version 1")
Reported-by: Vanshidhar Konda <vanshikonda(a)os.amperecomputing.com>
Signed-off-by: Darren Hart <darren(a)os.amperecomputing.com>
Cc: Wim Van Sebroeck <wim(a)linux-watchdog.org>
Cc: Guenter Roeck <linux(a)roeck-us.net>
Cc: linux-watchdog(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: <stable(a)vger.kernel.org> # 5.14.x
---
drivers/watchdog/sbsa_gwdt.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/watchdog/sbsa_gwdt.c b/drivers/watchdog/sbsa_gwdt.c
index fd3cfdda4949..76527324b63c 100644
--- a/drivers/watchdog/sbsa_gwdt.c
+++ b/drivers/watchdog/sbsa_gwdt.c
@@ -153,14 +153,14 @@ static int sbsa_gwdt_set_timeout(struct watchdog_device *wdd,
timeout = clamp_t(unsigned int, timeout, 1, wdd->max_hw_heartbeat_ms / 1000);
if (action)
- sbsa_gwdt_reg_write(gwdt->clk * timeout, gwdt);
+ sbsa_gwdt_reg_write((u64)gwdt->clk * timeout, gwdt);
else
/*
* In the single stage mode, The first signal (WS0) is ignored,
* the timeout is (WOR * 2), so the WOR should be configured
* to half value of timeout.
*/
- sbsa_gwdt_reg_write(gwdt->clk / 2 * timeout, gwdt);
+ sbsa_gwdt_reg_write(((u64)gwdt->clk / 2) * timeout, gwdt);
return 0;
}
--
2.41.0