Syzkaller found this: fput() runs the release() from a work queue, so the refcount remains elevated during abort. This is tricky to get right in each object implementation, so move more of the file handling into the core code.
Add a WARN_ON to catch problems like this more reliably without relying on KASAN.
Update the fail_nth test to succeed on 6.17 kernels.
Jason Gunthorpe (3):
  iommufd: Fix race during abort for file descriptors
  iommufd: WARN if an object is aborted with an elevated refcount
  iommufd/selftest: Update the fail_nth limit
 drivers/iommu/iommufd/device.c                 |  3 +-
 drivers/iommu/iommufd/eventq.c                 |  9 +----
 drivers/iommu/iommufd/iommufd_private.h        |  3 +-
 drivers/iommu/iommufd/main.c                   | 39 +++++++++++++++++--
 .../selftests/iommu/iommufd_fail_nth.c         |  2 +-
 5 files changed, 42 insertions(+), 14 deletions(-)
base-commit: 1046d40b0e78d2cd63f6183629699b629b21f877
fput() doesn't actually call the file_operations release() synchronously; it puts the file on a work queue and the release happens eventually.
This is normally fine, except for iommufd the file and the iommufd_object are tied together. The file has the object as its private_data and holds a users refcount, while the object is expected to remain alive as long as the file is.
When the allocation of a new object aborts before installing the file it will fput() the file and then go on to immediately kfree() the object. This causes a UAF once the workqueue completes the fput() and tries to decrement the users refcount.
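For the veventq case the failing order looks roughly like this (a condensed sketch of the pre-patch error path, not verbatim driver code):

	/* the ioctl has just allocated the eventq object */
	filep = anon_inode_getfile(name, fops, eventq, O_RDWR);	/* private_data = eventq */
	refcount_inc(&eventq->obj.users);	/* dropped again in ->release() */

	/* a later step in the ioctl fails, so the allocation is unwound */
	fput(filep);	/* ->release() is only queued, it has not run yet */
	iommufd_object_abort_and_destroy(ucmd->ictx, &eventq->obj);	/* kfree() happens now */

	/* the deferred ->release() then does refcount_dec(&eventq->obj.users): UAF */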
Fix this by putting the core code in charge of the file lifetime, and call __fput_sync() during abort to ensure that release() is called before kfree. __fput_sync() is a bit too tricky to open code in all the object implementations. Instead the objects tell the core code where the file pointer is and the core will take care of the life cycle.
If the object is successfully allocated then the file will hold a users refcount and the iommufd_object cannot be destroyed.
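Concretely (condensed from the diff below), an object type that owns a file records where its filep lives relative to the object, which must be the first member of the containing struct:

	[IOMMUFD_OBJ_FAULT] = {
		.destroy = iommufd_fault_destroy,
		IOMMUFD_FILE_OFFSET(struct iommufd_fault, common.filep, common.obj),
	},

and iommufd_object_abort_and_destroy() uses that offset to release the file synchronously before the object memory is freed:

	struct file **filep = ((void *)obj) + ops->file_offset;

	if (*filep)
		__fput_sync(*filep);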
It is worth noting that close(); ioctl(IOMMU_DESTROY); doesn't have an issue because close() is already using a synchronous version of fput().
The UAF looks like this:
BUG: KASAN: slab-use-after-free in iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
Write of size 4 at addr ffff888059c97804 by task syz.0.46/6164

CPU: 0 UID: 0 PID: 6164 Comm: syz.0.46 Not tainted syzkaller #0 PREEMPT(full)
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/18/2025
Call Trace:
 <TASK>
 __dump_stack lib/dump_stack.c:94 [inline]
 dump_stack_lvl+0x116/0x1f0 lib/dump_stack.c:120
 print_address_description mm/kasan/report.c:378 [inline]
 print_report+0xcd/0x630 mm/kasan/report.c:482
 kasan_report+0xe0/0x110 mm/kasan/report.c:595
 check_region_inline mm/kasan/generic.c:183 [inline]
 kasan_check_range+0x100/0x1b0 mm/kasan/generic.c:189
 instrument_atomic_read_write include/linux/instrumented.h:96 [inline]
 atomic_fetch_sub_release include/linux/atomic/atomic-instrumented.h:400 [inline]
 __refcount_dec include/linux/refcount.h:455 [inline]
 refcount_dec include/linux/refcount.h:476 [inline]
 iommufd_eventq_fops_release+0x45/0xc0 drivers/iommu/iommufd/eventq.c:376
 __fput+0x402/0xb70 fs/file_table.c:468
 task_work_run+0x14d/0x240 kernel/task_work.c:227
 resume_user_mode_work include/linux/resume_user_mode.h:50 [inline]
 exit_to_user_mode_loop+0xeb/0x110 kernel/entry/common.c:43
 exit_to_user_mode_prepare include/linux/irq-entry-common.h:225 [inline]
 syscall_exit_to_user_mode_work include/linux/entry-common.h:175 [inline]
 syscall_exit_to_user_mode include/linux/entry-common.h:210 [inline]
 do_syscall_64+0x41c/0x4c0 arch/x86/entry/syscall_64.c:100
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
Cc: stable@vger.kernel.org
Fixes: 07838f7fd529 ("iommufd: Add iommufd fault object")
Reported-by: syzbot+80620e2d0d0a33b09f93@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/r/68c8583d.050a0220.2ff435.03a2.GAE@google.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/eventq.c |  9 ++-------
 drivers/iommu/iommufd/main.c   | 35 +++++++++++++++++++++++++++++++---
 2 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/drivers/iommu/iommufd/eventq.c b/drivers/iommu/iommufd/eventq.c
index fc4de63b0bce64..e23d9ee4fe3806 100644
--- a/drivers/iommu/iommufd/eventq.c
+++ b/drivers/iommu/iommufd/eventq.c
@@ -393,12 +393,12 @@ static int iommufd_eventq_init(struct iommufd_eventq *eventq, char *name,
 			       const struct file_operations *fops)
 {
 	struct file *filep;
-	int fdno;
 
 	spin_lock_init(&eventq->lock);
 	INIT_LIST_HEAD(&eventq->deliver);
 	init_waitqueue_head(&eventq->wait_queue);
 
+	/* The filep is fput() by the core code during failure */
 	filep = anon_inode_getfile(name, fops, eventq, O_RDWR);
 	if (IS_ERR(filep))
 		return PTR_ERR(filep);
@@ -408,10 +408,7 @@ static int iommufd_eventq_init(struct iommufd_eventq *eventq, char *name,
 	eventq->filep = filep;
 	refcount_inc(&eventq->obj.users);
 
-	fdno = get_unused_fd_flags(O_CLOEXEC);
-	if (fdno < 0)
-		fput(filep);
-	return fdno;
+	return get_unused_fd_flags(O_CLOEXEC);
 }
 
 static const struct file_operations iommufd_fault_fops =
@@ -452,7 +449,6 @@ int iommufd_fault_alloc(struct iommufd_ucmd *ucmd)
 	return 0;
 out_put_fdno:
 	put_unused_fd(fdno);
-	fput(fault->common.filep);
 	return rc;
 }
 
@@ -536,7 +532,6 @@ int iommufd_veventq_alloc(struct iommufd_ucmd *ucmd)
 
 out_put_fdno:
 	put_unused_fd(fdno);
-	fput(veventq->common.filep);
 out_abort:
 	iommufd_object_abort_and_destroy(ucmd->ictx, &veventq->common.obj);
 out_unlock_veventqs:
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index 109de747e8b3ed..b8475194279a9a 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -23,6 +23,7 @@
 #include "iommufd_test.h"
 
 struct iommufd_object_ops {
+	size_t file_offset;
 	void (*pre_destroy)(struct iommufd_object *obj);
 	void (*destroy)(struct iommufd_object *obj);
 	void (*abort)(struct iommufd_object *obj);
@@ -131,10 +132,30 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
 void iommufd_object_abort_and_destroy(struct iommufd_ctx *ictx,
 				      struct iommufd_object *obj)
 {
-	if (iommufd_object_ops[obj->type].abort)
-		iommufd_object_ops[obj->type].abort(obj);
+	const struct iommufd_object_ops *ops = &iommufd_object_ops[obj->type];
+
+	if (ops->file_offset) {
+		struct file **filep = ((void *)obj) + ops->file_offset;
+
+		/*
+		 * files should hold a users refcount while the file is open and
+		 * put it back in their release. They should hold a pointer to
+		 * obj in their private data. Normal fput() is deferred to a
+		 * workqueue and can get out of order with the following
+		 * kfree(obj). Using the sync version ensures the release
+		 * happens immediately. During abort we require the file
+		 * refcount is one at this point - meaning the object alloc
+		 * function cannot do anything to allow another thread to take a
+		 * refcount prior to a guaranteed success.
+		 */
+		if (*filep)
+			__fput_sync(*filep);
+	}
+
+	if (ops->abort)
+		ops->abort(obj);
 	else
-		iommufd_object_ops[obj->type].destroy(obj);
+		ops->destroy(obj);
 	iommufd_object_abort(ictx, obj);
 }
 
@@ -659,6 +680,12 @@ void iommufd_ctx_put(struct iommufd_ctx *ictx)
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_ctx_put, "IOMMUFD");
 
+#define IOMMUFD_FILE_OFFSET(_struct, _filep, _obj)                          \
+	.file_offset = (offsetof(_struct, _filep) +                         \
+			BUILD_BUG_ON_ZERO(!__same_type(                     \
+				struct file *, ((_struct *)NULL)->_filep)) + \
+			BUILD_BUG_ON_ZERO(offsetof(_struct, _obj)))
+
 static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_ACCESS] = {
 		.destroy = iommufd_access_destroy_object,
@@ -669,6 +696,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	},
 	[IOMMUFD_OBJ_FAULT] = {
 		.destroy = iommufd_fault_destroy,
+		IOMMUFD_FILE_OFFSET(struct iommufd_fault, common.filep, common.obj),
 	},
 	[IOMMUFD_OBJ_HW_QUEUE] = {
 		.destroy = iommufd_hw_queue_destroy,
@@ -691,6 +719,7 @@ static const struct iommufd_object_ops iommufd_object_ops[] = {
 	[IOMMUFD_OBJ_VEVENTQ] = {
 		.destroy = iommufd_veventq_destroy,
 		.abort = iommufd_veventq_abort,
+		IOMMUFD_FILE_OFFSET(struct iommufd_veventq, common.filep, common.obj),
 	},
 	[IOMMUFD_OBJ_VIOMMU] = {
 		.destroy = iommufd_viommu_destroy,
If something else still holds a refcount then the object is at risk of a use-after-free. For abort paths we expect the caller to never share the object with a parallel thread and to clean up any refcounts it obtained on its own.
Add the missing dec inside iommufd_hwpt_paging_alloc() during error unwind by making iommufd_hw_pagetable_attach/detach() proper pairs.
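Condensed from the diff below, iommufd_object_abort() now drops what must be the final users reference itself and warns, instead of freeing, if something else still holds one:

	if (WARN_ON(!refcount_dec_and_test(&obj->users)))
		return;

	kfree(obj);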
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 drivers/iommu/iommufd/device.c          | 3 ++-
 drivers/iommu/iommufd/iommufd_private.h | 3 +--
 drivers/iommu/iommufd/main.c            | 4 ++++
 3 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/iommu/iommufd/device.c b/drivers/iommu/iommufd/device.c
index 65fbd098f9e98f..4c842368289f08 100644
--- a/drivers/iommu/iommufd/device.c
+++ b/drivers/iommu/iommufd/device.c
@@ -711,6 +711,8 @@ iommufd_hw_pagetable_detach(struct iommufd_device *idev, ioasid_t pasid)
 		iopt_remove_reserved_iova(&hwpt_paging->ioas->iopt, idev->dev);
 	mutex_unlock(&igroup->lock);
 
+	iommufd_hw_pagetable_put(idev->ictx, hwpt);
+
 	/* Caller must destroy hwpt */
 	return hwpt;
 }
@@ -1057,7 +1059,6 @@ void iommufd_device_detach(struct iommufd_device *idev, ioasid_t pasid)
 	hwpt = iommufd_hw_pagetable_detach(idev, pasid);
 	if (!hwpt)
 		return;
-	iommufd_hw_pagetable_put(idev->ictx, hwpt);
 	refcount_dec(&idev->obj.users);
 }
 EXPORT_SYMBOL_NS_GPL(iommufd_device_detach, "IOMMUFD");
diff --git a/drivers/iommu/iommufd/iommufd_private.h b/drivers/iommu/iommufd/iommufd_private.h
index 0da2a81eedfa8b..627f9b78483a0e 100644
--- a/drivers/iommu/iommufd/iommufd_private.h
+++ b/drivers/iommu/iommufd/iommufd_private.h
@@ -454,9 +454,8 @@ static inline void iommufd_hw_pagetable_put(struct iommufd_ctx *ictx,
 	if (hwpt->obj.type == IOMMUFD_OBJ_HWPT_PAGING) {
 		struct iommufd_hwpt_paging *hwpt_paging = to_hwpt_paging(hwpt);
 
-		lockdep_assert_not_held(&hwpt_paging->ioas->mutex);
-
 		if (hwpt_paging->auto_domain) {
+			lockdep_assert_not_held(&hwpt_paging->ioas->mutex);
 			iommufd_object_put_and_try_destroy(ictx, &hwpt->obj);
 			return;
 		}
diff --git a/drivers/iommu/iommufd/main.c b/drivers/iommu/iommufd/main.c
index b8475194279a9a..3454b49cc58dcf 100644
--- a/drivers/iommu/iommufd/main.c
+++ b/drivers/iommu/iommufd/main.c
@@ -122,6 +122,10 @@ void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
 	old = xas_store(&xas, NULL);
 	xa_unlock(&ictx->objects);
 	WARN_ON(old != XA_ZERO_ENTRY);
+
+	if (WARN_ON(!refcount_dec_and_test(&obj->users)))
+		return;
+
 	kfree(obj);
 }
There are more failure conditions now, so 400 iterations is not enough to pass them all; up the limit to 1000. The limit exists so the test doesn't loop forever.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
---
 tools/testing/selftests/iommu/iommufd_fail_nth.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/iommu/iommufd_fail_nth.c b/tools/testing/selftests/iommu/iommufd_fail_nth.c
index 651fc9f13c0810..45c14323a6183c 100644
--- a/tools/testing/selftests/iommu/iommufd_fail_nth.c
+++ b/tools/testing/selftests/iommu/iommufd_fail_nth.c
@@ -113,7 +113,7 @@ static bool fail_nth_next(struct __test_metadata *_metadata,
 	 * necessarily mean a test failure, just that the limit has to be made
 	 * bigger.
 	 */
-	ASSERT_GT(400, nth_state->iteration);
+	ASSERT_GT(1000, nth_state->iteration);
 	if (nth_state->iteration != 0) {
 		ssize_t res;
 		ssize_t res2;