On Thu, Sep 16, 2021 at 03:44:25PM +0300, Oded Gabbay wrote:
On Thu, Sep 16, 2021 at 3:31 PM Daniel Vetter daniel@ffwll.ch wrote:
On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe jgg@ziepe.ca wrote:
On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
Hi, Re-sending this patch-set following the release of our user-space TPC compiler and runtime library.
I would appreciate a review on this.
I think the big open we have is the entire revoke discussion. Letting dma-bufs that map random local memory ranges hang around, without a clear ownership link and a way to kill them, sounds bad to me.
I think there are a few options:
- We require revoke support. But I've heard rdma really doesn't like that, I guess because taking out an MR while holding the dma_resv_lock would be a lock inversion, so it can't be done. Jason, can you recap what exactly the hold-up was again that makes this a no-go?
RDMA HW can't do revoke.
Like why? I'm assuming when the final open handle or whatever for that MR is closed, you do clean up everything? Or does that MR still stick around forever too?
So we have to exclude almost all the HW and several interesting use cases to enable a revoke operation.
- For non-revokable things like these dma-bufs we'd keep a drm_master reference around. This would prevent the next open from acquiring ownership rights, which at least prevents all the nasty potential problems.
This is what I generally would expect: the DMABUF FD and its DMA memory just float about until the unrevokable user releases them, which happens when the FD that is driving the import eventually gets closed.
This is exactly what we are doing in the driver. We make sure everything is valid until the unrevokable user releases it, and that happens only when the dmabuf fd gets closed. And the user can't close its device fd until he performs the above, so there is no leakage between users.
Maybe I got the device security model all wrong, but I thought Gaudi is single-user, and the only thing it protects is the system against the Gaudi device through iommu/device gart. So roughly the following can happen:
- User A opens gaudi device, sets up dma-buf export
- User A registers that with RDMA, or anything else that doesn't support revoke.
- User A closes gaudi device
This cannot happen without User A closing the FD of the dma-buf it exported. We prevent User A from closing the device because when it exported the dma-buf, the driver's code took a refcnt of the user's private structure. You can see that in export_dmabuf_common() in the 2nd patch. There is a call there to hl_ctx_get. So even if User A calls close(device_fd), the driver won't let any other user open the device until User A closes the fd of the dma-buf object.
Moreover, once User A closes the dma-buf fd and the device is released, the driver will scrub the device memory (this is optional for systems that care about security).
And AFAIK, User A can't close the dma-buf fd once it has registered it with RDMA without first unregistering it. This can be seen in ib_umem_dmabuf_get(), which calls dma_buf_get(), which does fget(fd).
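The ownership chain described above can be modeled in a few lines. This is a minimal userspace sketch under stated assumptions, not the habanalabs driver: struct dev_ctx, struct device_model, ctx_get()/ctx_put() and the dev_*/dmabuf_* helpers are all hypothetical stand-ins. ctx_get() plays the role of hl_ctx_get() in export_dmabuf_common(), and dmabuf_release() stands for the final fput() on the dma-buf file once RDMA has dropped the reference it took with fget().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a user's private context (not habanalabs code). */
struct dev_ctx {
    int refcnt; /* one ref for the device fd, one per exported dma-buf */
};

/* Hypothetical single-user device: at most one live context at a time. */
struct device_model {
    struct dev_ctx *owner; /* non-NULL while some user's context is referenced */
    struct dev_ctx ctx_storage;
    bool scrubbed; /* set when device memory was wiped on final release */
};

static void ctx_get(struct dev_ctx *ctx)
{
    ctx->refcnt++;
}

static void ctx_put(struct device_model *dev, struct dev_ctx *ctx)
{
    assert(ctx->refcnt > 0); /* catch refcount underflow */
    if (--ctx->refcnt == 0) {
        dev->owner = NULL;    /* device becomes available again */
        dev->scrubbed = true; /* optional scrub before the next user */
    }
}

/* open(): refused while the previous user's context is still referenced. */
static struct dev_ctx *dev_open(struct device_model *dev)
{
    if (dev->owner != NULL)
        return NULL;
    dev->ctx_storage.refcnt = 1; /* reference owned by the device fd */
    dev->owner = &dev->ctx_storage;
    dev->scrubbed = false;
    return dev->owner;
}

/* close(device_fd) only drops the fd's own reference. */
static void dev_close(struct device_model *dev, struct dev_ctx *ctx)
{
    ctx_put(dev, ctx);
}

/* Export: the dma-buf pins the context, in the role of hl_ctx_get(). */
static void dmabuf_export(struct dev_ctx *ctx)
{
    ctx_get(ctx);
}

/* Final fput() of the dma-buf file, e.g. after RDMA unregisters the MR. */
static void dmabuf_release(struct device_model *dev, struct dev_ctx *ctx)
{
    ctx_put(dev, ctx);
}

/* User A / User B sequence: B stays locked out until A's dma-buf goes away. */
static bool lockout_scenario(void)
{
    struct device_model dev = {0};
    struct dev_ctx *a = dev_open(&dev);
    if (a == NULL)
        return false;
    dmabuf_export(a);   /* A exports; RDMA imports and holds the file */
    dev_close(&dev, a); /* A closes the device fd */
    bool b_blocked = (dev_open(&dev) == NULL);
    dmabuf_release(&dev, a); /* MR unregistered, last reference dropped */
    bool scrubbed = dev.scrubbed;
    struct dev_ctx *b = dev_open(&dev);
    return b_blocked && scrubbed && b != NULL;
}
```

lockout_scenario() walks the User A / User B sequence and returns true: the second open() is refused until the last dma-buf reference is dropped, and the memory is scrubbed before the next user gets in. That is the same effect as keeping a drm_master reference around for non-revokable dma-bufs.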
Yeah, that's essentially what I was looking for. This is de facto hand-rolling the drm_master owner tracking stuff. As long as we have something like this in place it should be fine, I think. -Daniel
- User B opens gaudi device, assumes that it has full control over the device and uploads some secrets, which happen to end up in the dma-buf region user A set up
- User B extracts secrets.
I still don't think any of the complexity is needed, pinnable memory is a thing in Linux, just account for it in mlocked and that is enough.
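Charging pinned memory against a per-user locked-memory budget can be sketched like this. It is a hypothetical userspace model: struct pin_account, pin_charge() and pin_uncharge() are illustrative names, and the limit merely stands in for something like RLIMIT_MEMLOCK; real kernel code would go through the mm's own locked-VM accounting instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-user accounting of pinned memory (illustrative only). */
struct pin_account {
    size_t locked; /* bytes currently charged; invariant: locked <= limit */
    size_t limit;  /* stand-in for a memlock-style limit */
};

/* Charge a new pin; refuse it if it would exceed the limit. */
static bool pin_charge(struct pin_account *acct, size_t bytes)
{
    /* limit - locked cannot underflow thanks to the invariant, and this
     * form avoids overflowing locked + bytes */
    if (bytes > acct->limit - acct->locked)
        return false;
    acct->locked += bytes;
    return true;
}

/* Uncharge when the pin is released (e.g. the importer drops the buffer). */
static void pin_uncharge(struct pin_account *acct, size_t bytes)
{
    assert(bytes <= acct->locked);
    acct->locked -= bytes;
}
```

Accounting like this bounds how much memory a user can keep pinned, but it says nothing about who may read that memory once the original owner is gone, which is the exfiltration concern.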
It's not just mlocked memory, it's mlocked memory that I can exfiltrate. Mlock is fine, exfiltration not so much. It's mlock, but a global pool, and if you didn't munlock then the next mlock from a completely different user will alias with your stuff.
Or is there something that prevents that? Oded at least explained that Gaudi works like a GPU from 20 years ago: single user, no security at all within the device.
-Daniel
Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch