The use-after-free bug appears when:
- A platform device is created from OF, by of_device_add();
- The same device's name is changed afterwards using dev_set_name(),
by its probe for example.
Out of the 37 drivers that deal with platform devices and do a
dev_set_name() call, only one might be affected. That driver is
loongson-i2s-plat [0]. All other dev_set_name() calls are on children
devices created on the spot. The issue was found on downstream kernels
and we don't have what it takes to test loongson-i2s-plat.
Note: loongson-i2s-plat maintainers are CCed.
⟩ # Finding potential trouble-makers:
⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name
The solution proposed is to add a flag to platform_device that tells if
it is responsible for freeing its name. We can then duplicate the
device name inside of_device_add() instead of copying the pointer.
What is done elsewhere?
- Platform bus code does a copy of the argument name that is stored
alongside the struct platform_device; see platform_device_alloc()[1].
- Other busses duplicate the device name; either through a dynamic
allocation [2] or through an array embedded inside devices [3].
- Some busses don't have a separate name; when they want a name they
take it from the device [4].
[0]: https://elixir.bootlin.com/linux/v6.13.2/source/sound/soc/loongson/loongson…
[1]: https://elixir.bootlin.com/linux/v6.13.2/source/drivers/base/platform.c#L581
[2]: https://elixir.bootlin.com/linux/v6.13.2/source/drivers/gpu/drm/drm_drv.c#L…
[3]: https://elixir.bootlin.com/linux/v6.13.2/source/include/linux/i2c.h#L343
[4]: https://elixir.bootlin.com/linux/v6.13.2/source/include/linux/pci.h#L2150
This can be reproduced using Buildroot's qemu_aarch64_virt_defconfig
with CONFIG_KASAN=y and a dev_set_name() inside the probe of:
drivers/pci/controller/pci-host-common.c
The below splat appears at boot. It happens whenever something tries to
access pdev->name; one big consumer of this field is platform_match()
that fallbacks to name matching.
==================================================================
BUG: KASAN: slab-use-after-free in strcmp+0x2c/0x78
Read of size 1 at addr ffffff80c0300160 by task swapper/0/1
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.32 #1
Hardware name: linux,dummy-virt (DT)
Call trace:
dump_backtrace+0x90/0xe8
show_stack+0x18/0x24
dump_stack_lvl+0x48/0x60
print_report+0xf8/0x5d8
kasan_report+0x90/0xcc
__asan_load1+0x60/0x6c
strcmp+0x2c/0x78
platform_match+0xd0/0x140
__driver_attach+0x44/0x240
bus_for_each_dev+0xe4/0x160
driver_attach+0x34/0x44
bus_add_driver+0x134/0x270
driver_register+0xa4/0x1e4
__platform_driver_register+0x44/0x54
ged_driver_init+0x1c/0x28
do_one_initcall+0xdc/0x260
kernel_init_freeable+0x314/0x448
kernel_init+0x2c/0x1e0
ret_from_fork+0x10/0x20
Allocated by task 1:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_alloc_info+0x24/0x34
__kasan_kmalloc+0xb8/0xbc
__kmalloc_node_track_caller+0x64/0xa4
kvasprintf+0xcc/0x16c
kvasprintf_const+0xe8/0x180
kobject_set_name_vargs+0x54/0xd4
dev_set_name+0xa8/0xe4
of_device_make_bus_id+0x298/0x2b0
of_device_alloc+0x1ec/0x204
of_platform_device_create_pdata+0x60/0x168
of_platform_bus_create+0x20c/0x4a0
of_platform_populate+0x50/0x10c
of_platform_default_populate_init+0xe0/0x100
do_one_initcall+0xdc/0x260
kernel_init_freeable+0x314/0x448
kernel_init+0x2c/0x1e0
ret_from_fork+0x10/0x20
Freed by task 1:
kasan_save_stack+0x3c/0x64
kasan_set_track+0x2c/0x40
kasan_save_free_info+0x38/0x60
__kasan_slab_free+0xe4/0x150
__kmem_cache_free+0x134/0x26c
kfree+0x54/0x6c
kfree_const+0x34/0x40
kobject_set_name_vargs+0xa8/0xd4
dev_set_name+0xa8/0xe4
pci_host_common_probe+0x9c/0x294
platform_probe+0x90/0x100
really_probe+0x100/0x3cc
__driver_probe_device+0xb8/0x18c
driver_probe_device+0x108/0x1d8
__driver_attach+0xc8/0x240
bus_for_each_dev+0xe4/0x160
driver_attach+0x34/0x44
bus_add_driver+0x134/0x270
driver_register+0xa4/0x1e4
__platform_driver_register+0x44/0x54
gen_pci_driver_init+0x1c/0x28
do_one_initcall+0xdc/0x260
kernel_init_freeable+0x314/0x448
kernel_init+0x2c/0x1e0
ret_from_fork+0x10/0x20
The buggy address belongs to the object at ffffff80c0300160
which belongs to the cache kmalloc-16 of size 16
The buggy address is located 0 bytes inside of
freed 16-byte region [ffffff80c0300160, ffffff80c0300170)
The buggy address belongs to the physical page:
page:0000000099fe29a0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100300
flags: 0x8000000000000800(slab|zone=2)
page_type: 0xffffffff()
raw: 8000000000000800 ffffff80c00013c0 dead000000000122 0000000000000000
raw: 0000000000000000 0000000080800080 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffff80c0300000: fa fb fc fc fa fb fc fc 00 07 fc fc 00 07 fc fc
ffffff80c0300080: 00 07 fc fc 00 02 fc fc 00 02 fc fc 00 02 fc fc
>ffffff80c0300100: 00 06 fc fc 00 06 fc fc 00 06 fc fc fa fb fc fc
^
ffffff80c0300180: 00 00 fc fc 00 00 fc fc 00 06 fc fc 00 06 fc fc
ffffff80c0300200: 00 06 fc fc 00 06 fc fc 00 06 fc fc 00 06 fc fc
==================================================================
Signed-off-by: Théo Lebrun <theo.lebrun(a)bootlin.com>
---
Théo Lebrun (2):
driver core: platform: turn pdev->id_auto into pdev->flags
driver core: platform: avoid use-after-free on pdev->name
drivers/base/platform.c | 8 +++++---
drivers/of/platform.c | 12 +++++++++++-
include/linux/platform_device.h | 4 +++-
3 files changed, 19 insertions(+), 5 deletions(-)
---
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
change-id: 20250217-pdev-uaf-1a779a98d81b
Best regards,
--
Théo Lebrun <theo.lebrun(a)bootlin.com>
Commit <d74169ceb0d2> ("iommu/vt-d: Allocate DMAR fault interrupts
locally") moved the call to enable_drhd_fault_handling() to a code
path that does not hold any lock while traversing the drhd list. Fix
it by ensuring the dmar_global_lock lock is held when traversing the
drhd list.
Without this fix, the following warning is triggered:
=============================
WARNING: suspicious RCU usage
6.14.0-rc3 #55 Not tainted
-----------------------------
drivers/iommu/intel/dmar.c:2046 RCU-list traversed in non-reader section!!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 1
2 locks held by cpuhp/1/23:
#0: ffffffff84a67c50 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
#1: ffffffff84a6a380 (cpuhp_state-up){+.+.}-{0:0}, at: cpuhp_thread_fun+0x87/0x2c0
stack backtrace:
CPU: 1 UID: 0 PID: 23 Comm: cpuhp/1 Not tainted 6.14.0-rc3 #55
Call Trace:
<TASK>
dump_stack_lvl+0xb7/0xd0
lockdep_rcu_suspicious+0x159/0x1f0
? __pfx_enable_drhd_fault_handling+0x10/0x10
enable_drhd_fault_handling+0x151/0x180
cpuhp_invoke_callback+0x1df/0x990
cpuhp_thread_fun+0x1ea/0x2c0
smpboot_thread_fn+0x1f5/0x2e0
? __pfx_smpboot_thread_fn+0x10/0x10
kthread+0x12a/0x2d0
? __pfx_kthread+0x10/0x10
ret_from_fork+0x4a/0x60
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Simply holding the lock in enable_drhd_fault_handling() will trigger a
lock order splat. Avoid holding the dmar_global_lock when calling
iommu_device_register(), which starts the device probe process.
Fixes: d74169ceb0d2 ("iommu/vt-d: Allocate DMAR fault interrupts locally")
Reported-by: Ido Schimmel <idosch(a)idosch.org>
Closes: https://lore.kernel.org/linux-iommu/Zx9OwdLIc_VoQ0-a@shredder.mtl.com/
Cc: stable(a)vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
---
drivers/iommu/intel/dmar.c | 1 +
drivers/iommu/intel/iommu.c | 7 +++++++
2 files changed, 8 insertions(+)
diff --git a/drivers/iommu/intel/dmar.c b/drivers/iommu/intel/dmar.c
index 9f424acf474e..e540092d664d 100644
--- a/drivers/iommu/intel/dmar.c
+++ b/drivers/iommu/intel/dmar.c
@@ -2043,6 +2043,7 @@ int enable_drhd_fault_handling(unsigned int cpu)
/*
* Enable fault control interrupt.
*/
+ guard(rwsem_read)(&dmar_global_lock);
for_each_iommu(iommu, drhd) {
u32 fault_status;
int ret;
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index cc46098f875b..9a1e61b429ca 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3146,7 +3146,14 @@ int __init intel_iommu_init(void)
iommu_device_sysfs_add(&iommu->iommu, NULL,
intel_iommu_groups,
"%s", iommu->name);
+ /*
+ * The iommu device probe is protected by the iommu_probe_device_lock.
+ * Release the dmar_global_lock before entering the device probe path
+ * to avoid unnecessary lock order splat.
+ */
+ up_read(&dmar_global_lock);
iommu_device_register(&iommu->iommu, &intel_iommu_ops, NULL);
+ down_read(&dmar_global_lock);
iommu_pmu_register(iommu);
}
--
2.43.0
From: Steven Rostedt <rostedt(a)goodmis.org>
Function graph uses a subops and manager ops mechanism to attach to
ftrace. The manager ops connects to ftrace and the functions it connects
to is defined by a list of subops that it manages.
The function hash that defines what the above ops attaches to limits the
functions to attach if the hash has any content. If the hash is empty, it
means to trace all functions.
The creation of the manager ops hash is done by iterating over all the
subops hashes. If any of the subops hashes is empty, it means that the
manager ops hash must trace all functions as well.
The issue is in the creation of the manager ops. When a second subops is
attached, a new hash is created by starting it as NULL and adding the
subops one at a time. But the NULL ops is mistaken as an empty hash, and
once an empty hash is found, it stops the loop of subops and just enables
all functions.
# echo "f:myevent1 kernel_clone" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
kernel_clone (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
# echo "f:myevent2 schedule_timeout" >> /sys/kernel/tracing/dynamic_events
# cat /sys/kernel/tracing/enabled_functions
trace_initcall_start_cb (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
try_to_run_init_process (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
x86_pmu_show_pmu_cap (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
cleanup_rapl_pmus (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_free_pcibus_map (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_types_exit (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
uncore_pci_exit.part.0 (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
kvm_shutdown (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
vmx_dump_msrs (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
vmx_cleanup_l1d_flush (1) tramp: 0xffffffffc0309000 (ftrace_graph_func+0x0/0x60) ->ftrace_graph_func+0x0/0x60
[..]
Fix this by initializing the new hash to NULL and if the hash is NULL do
not treat it as an empty hash but instead allocate by copying the content
of the first sub ops. Then on subsequent iterations, the new hash will not
be NULL, but the content of the previous subops. If that first subops
attached to all functions, then new hash may assume that the manager ops
also needs to attach to all functions.
Cc: stable(a)vger.kernel.org
Fixes: 5fccc7552ccbc ("ftrace: Add subops logic to allow one ops to manage many")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v2: https://lore.kernel.org/20250219220510.888959028@goodmis.org
- Have append_hashes() return EMPTY_HASH and not NULL if the resulting
new hash is empty.
kernel/trace/ftrace.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 728ecda6e8d4..bec54dc27204 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3220,15 +3220,22 @@ static struct ftrace_hash *copy_hash(struct ftrace_hash *src)
* The filter_hash updates uses just the append_hash() function
* and the notrace_hash does not.
*/
-static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash)
+static int append_hash(struct ftrace_hash **hash, struct ftrace_hash *new_hash,
+ int size_bits)
{
struct ftrace_func_entry *entry;
int size;
int i;
- /* An empty hash does everything */
- if (ftrace_hash_empty(*hash))
- return 0;
+ if (*hash) {
+ /* An empty hash does everything */
+ if (ftrace_hash_empty(*hash))
+ return 0;
+ } else {
+ *hash = alloc_ftrace_hash(size_bits);
+ if (!*hash)
+ return -ENOMEM;
+ }
/* If new_hash has everything make hash have everything */
if (ftrace_hash_empty(new_hash)) {
@@ -3292,16 +3299,18 @@ static int intersect_hash(struct ftrace_hash **hash, struct ftrace_hash *new_has
/* Return a new hash that has a union of all @ops->filter_hash entries */
static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
{
- struct ftrace_hash *new_hash;
+ struct ftrace_hash *new_hash = NULL;
struct ftrace_ops *subops;
+ int size_bits;
int ret;
- new_hash = alloc_ftrace_hash(ops->func_hash->filter_hash->size_bits);
- if (!new_hash)
- return NULL;
+ if (ops->func_hash->filter_hash)
+ size_bits = ops->func_hash->filter_hash->size_bits;
+ else
+ size_bits = FTRACE_HASH_DEFAULT_BITS;
list_for_each_entry(subops, &ops->subop_list, list) {
- ret = append_hash(&new_hash, subops->func_hash->filter_hash);
+ ret = append_hash(&new_hash, subops->func_hash->filter_hash, size_bits);
if (ret < 0) {
free_ftrace_hash(new_hash);
return NULL;
@@ -3310,7 +3319,8 @@ static struct ftrace_hash *append_hashes(struct ftrace_ops *ops)
if (ftrace_hash_empty(new_hash))
break;
}
- return new_hash;
+ /* Can't return NULL as that means this failed */
+ return new_hash ? : EMPTY_HASH;
}
/* Make @ops trace evenything except what all its subops do not trace */
@@ -3505,7 +3515,8 @@ int ftrace_startup_subops(struct ftrace_ops *ops, struct ftrace_ops *subops, int
filter_hash = alloc_and_copy_ftrace_hash(size_bits, ops->func_hash->filter_hash);
if (!filter_hash)
return -ENOMEM;
- ret = append_hash(&filter_hash, subops->func_hash->filter_hash);
+ ret = append_hash(&filter_hash, subops->func_hash->filter_hash,
+ size_bits);
if (ret < 0) {
free_ftrace_hash(filter_hash);
return ret;
--
2.47.2
From: Ard Biesheuvel <ardb(a)kernel.org>
When using -fPIE codegen, the compiler will emit const global objects
(which are useless unless statically initialized) into .data.rel.ro
rather than .rodata if the object contains fields that carry absolute
addresses of other code or data objects. This permits the linker to
annotate such regions as requiring read-write access only at load time,
but not at execution time (in user space).
This distinction does not matter for the kernel, but it does imply that
const data will end up in writable memory if the .data.rel.ro sections
are not treated in a special way.
So emit .data.rel.ro into the .rodata segment.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
include/asm-generic/vmlinux.lds.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/asm-generic/vmlinux.lds.h b/include/asm-generic/vmlinux.lds.h
index 02a4adb4a999..0d5b186abee8 100644
--- a/include/asm-generic/vmlinux.lds.h
+++ b/include/asm-generic/vmlinux.lds.h
@@ -457,7 +457,7 @@ defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
. = ALIGN((align)); \
.rodata : AT(ADDR(.rodata) - LOAD_OFFSET) { \
__start_rodata = .; \
- *(.rodata) *(.rodata.*) \
+ *(.rodata) *(.rodata.*) *(.data.rel.ro*) \
SCHED_DATA \
RO_AFTER_INIT_DATA /* Read only after init */ \
. = ALIGN(8); \
--
2.48.1.601.g30ceb7b040-goog