Hi Jason,
On 11/18/22 21:23, Jason Gunthorpe wrote:
On Fri, Nov 18, 2022 at 05:27:35PM +0100, Eric Auger wrote:
+config IOMMUFD
- tristate "IOMMU Userspace API"
- select INTERVAL_TREE
- select INTERVAL_TREE_SPAN_ITER
- select IOMMU_API
- default n
- help
Provides /dev/iommu the user API to control the IOMMU subsystem as
it relates to managing IO page tables that point at user space memory.
nit: missing ',' after /dev/iommu or Provides /dev/iommu user API
Done
+/**
- iommufd_ref_to_users() - Switch from destroy_rwsem to users refcount
protection
- @obj - Object to release
- Objects have two refcount protections (destroy_rwsem and the refcount_t
- users). Holding either of these will prevent the object from being destroyed.
- Depending on the use case, one protection or the other is appropriate. In
- most cases references are being protected by the destroy_rwsem. This allows
- orderly destruction of the object because iommufd_object_destroy_user() will
- wait for it to become unlocked. However, as a rwsem, it cannot be held across
- a system call return. So cases that have longer term needs must switch
- to the weaker users refcount_t.
- With users protection iommufd_object_destroy_user() will return -EBUSY to
iommufd_object_destroy_user() returns false and iommufd_destroy retruns -EBUSY.
""
- With users protection iommufd_object_destroy_user() will return false,
- refusing to destroy the object, causing -EBUSY to userspace.
*/ ""
- userspace and refuse to destroy the object.
- */
+static inline void iommufd_ref_to_users(struct iommufd_object *obj) +{
- up_read(&obj->destroy_rwsem);
- /* iommufd_lock_obj() obtains users as well */
Do you have a way to check that put() is done in accordance, ie. we are not going to try up_read() the rwsem if this latter is not used anymore?
If put becomes unbalanced then fd closure will WARN_ON
If someone misuses the rwsem (eg double up_reading it) then lockdep will fire
OK
+static int iommufd_fops_release(struct inode *inode, struct file *filp) +{
- struct iommufd_ctx *ictx = filp->private_data;
- struct iommufd_object *obj;
- /* Destroy the graph from depth first */
I would suggest: destroy the leaf objects first thanks to the hierarchical user ref counting? or something alike
"depth first" is a technical term when working with graphs..
OK. I ignored that.
How about replacing both comments with this:
/* * The objects in the xarray form a graph of "users" counts, and we have * to destroy them in a depth first manner. Leaf objects will reduce the * users count of interior objects when they are destroyed. * * Repeatedly destroying all the "1 users" leaf objects will progress * until the entire list is destroyed. If this can't progress then there * is some bug related to object refcounting. */
Yes that looks much clearer to me. Thanks!
- while (!xa_empty(&ictx->objects)) {
unsigned int destroyed = 0;
unsigned long index;
xa_for_each(&ictx->objects, index, obj) {
/*
* Since we are in release elevated users must come from
* other objects holding the users. We will eventually
the sentense sounds a bit cryptic to me.
* destroy the object that holds this one and the next
* pass will progress it.
*/
if (!refcount_dec_if_one(&obj->users))
continue;
destroyed++;
xa_erase(&ictx->objects, index);
iommufd_object_ops[obj->type].destroy(obj);
kfree(obj);
Use iommufd_object_abort_and_destroy(obj) instead of the above 3 lines?
Ah, they are not quite the same things, the order is different and abort has a protective assertion that the xa_array hasn't been messed with. It would be messy to merge them
It is also very similar to iommufd_object_destroy_user() except we shortcut some unncessary locking.
OK
+/**
- DOC: General ioctl format
- The ioctl interface follows a general format to allow for extensibility. Each
- ioctl is passed in a structure pointer as the argument providing the size of
- the structure in the first u32. The kernel checks that any structure space
- beyond what it understands is 0. This allows userspace to use the backward
- compatible portion while consistently using the newer, larger, structures.
- ioctls use a standard meaning for common errnos:
- ENOTTY: The IOCTL number itself is not supported at all
- E2BIG: The IOCTL number is supported, but the provided structure has
- non-zero in a part the kernel does not understand.
- EOPNOTSUPP: The IOCTL number is supported, and the structure is
- understood, however a known field has a value the kernel does not
- understand or support.
- EINVAL: Everything about the IOCTL was understood, but a field is not
- correct.
- ENOENT: An ID or IOVA provided does not exist.
- ENOMEM: Out of memory.
- EOVERFLOW: Mathematics oveflowed.
overflowed
Done
Thanks, Jason
Thanks
Eric