Hopefully the last round for this.
Added dma_resv_iter_begin/end as requested by Daniel. Fixed a bunch of
problems pointed out by the CI systems and found a few more myselve.
Please review and/or comment,
Christian.
On Fri, Sep 17, 2021 at 02:59:42PM +0200, Alexandre Bailon wrote:
> This adds the device tree bindings for the APU DRM driver.
>
> Signed-off-by: Alexandre Bailon <abailon(a)baylibre.com>
> ---
> .../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
> 1 file changed, 38 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
>
> diff --git a/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> new file mode 100644
> index 0000000000000..6f432d3ea478c
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
> @@ -0,0 +1,38 @@
> +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause
> +%YAML 1.2
> +---
> +$id: http://devicetree.org/schemas/gpu/mediatek,apu-drm.yaml#
> +$schema: http://devicetree.org/meta-schemas/core.yaml#
> +
> +title: AI Processor Unit DRM
DRM is a linux thing, not h/w.
> +
> +properties:
> + compatible:
> + const: mediatek,apu-drm
> +
> + remoteproc:
So is remoteproc.
Why don't you have the remoteproc driver create the DRM device?
> + maxItems: 2
> + description:
> + Handle to remoteproc devices controlling the APU
> +
> + iova:
> + maxItems: 1
> + description:
> + Address and size of virtual memory that could used by the APU
Why does this need to be in DT? If you need to reserve certain VAs, then
this discussion[1] might be of interest.
Rob
[1] https://lore.kernel.org/all/YUIPCxnyRutMS47%2F@orome.fritz.box/
On Fri, 17 Sep 2021 14:59:42 +0200, Alexandre Bailon wrote:
> This adds the device tree bindings for the APU DRM driver.
>
> Signed-off-by: Alexandre Bailon <abailon(a)baylibre.com>
> ---
> .../devicetree/bindings/gpu/mtk,apu-drm.yaml | 38 +++++++++++++++++++
> 1 file changed, 38 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
>
My bot found errors running 'make DT_CHECKER_FLAGS=-m dt_binding_check'
on your patch (DT_CHECKER_FLAGS is new in v5.13):
yamllint warnings/errors:
dtschema/dtc warnings/errors:
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: 'maintainers' is a required property
hint: Metaschema for devicetree binding documentation
from schema $id: http://devicetree.org/meta-schemas/base.yaml#
./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: $id: relative path/filename doesn't match actual path or filename
expected: http://devicetree.org/schemas/gpu/mtk,apu-drm.yaml#
/builds/robherring/linux-dt-review/Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml: ignoring, error in schema:
warning: no schema found in file: ./Documentation/devicetree/bindings/gpu/mtk,apu-drm.yaml
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dts:19.15-23.11: Warning (unit_address_vs_reg): /example-0/apu@0: node has a unit name, but no reg or ranges property
Documentation/devicetree/bindings/gpu/mtk,apu-drm.example.dt.yaml:0:0: /example-0/apu@0: failed to match any schema with compatible: ['mediatek,apu-drm']
doc reference errors (make refcheckdocs):
See https://patchwork.ozlabs.org/patch/1529388
This check can fail if there are any dependencies. The base for a patch
series is generally the most recent rc1.
If you already ran 'make dt_binding_check' and didn't see the above
error(s), then make sure 'yamllint' is installed and dt-schema is up to
date:
pip3 install dtschema --upgrade
Please check and re-submit.
On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote:
> >
> > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > Hi,
> > > > Re-sending this patch-set following the release of our user-space TPC
> > > > compiler and runtime library.
> > > >
> > > > I would appreciate a review on this.
> > >
> > > I think the big open we have is the entire revoke discussions. Having the
> > > option to let dma-buf hang around which map to random local memory ranges,
> > > without clear ownership link and a way to kill it sounds bad to me.
> > >
> > > I think there's a few options:
> > > - We require revoke support. But I've heard rdma really doesn't like that,
> > > I guess because taking out an MR while holding the dma_resv_lock would
> > > be an inversion, so can't be done. Jason, can you recap what exactly the
> > > hold-up was again that makes this a no-go?
> >
> > RDMA HW can't do revoke.
Like why? I'm assuming when the final open handle or whatever for that MR
is closed, you do clean up everything? Or does that MR still stick around
forever too?
> > So we have to exclude almost all the HW and several interesting use
> > cases to enable a revoke operation.
> >
> > > - For non-revokable things like these dma-buf we'd keep a drm_master
> > > reference around. This would prevent the next open to acquire
> > > ownership rights, which at least prevents all the nasty potential
> > > problems.
> >
> > This is what I generally would expect, the DMABUF FD and its DMA
> > memory just floats about until the unrevokable user releases it, which
> > happens when the FD that is driving the import eventually gets closed.
> This is exactly what we are doing in the driver. We make sure
> everything is valid until the unrevokable user releases it and that
> happens only when the dmabuf fd gets closed.
> And the user can't close it's fd of the device until he performs the
> above, so there is no leakage between users.
Maybe I got the device security model all wrong, but I thought Guadi is
single user, and the only thing it protects is the system against the
Gaudi device trhough iommu/device gart. So roughly the following can
happen:
1. User A opens gaudi device, sets up dma-buf export
2. User A registers that with RDMA, or anything else that doesn't support
revoke.
3. User A closes gaudi device
4. User B opens gaudi device, assumes that it has full control over the
device and uploads some secrets, which happen to end up in the dma-buf
region user A set up
5. User B extracts secrets.
> > I still don't think any of the complexity is needed, pinnable memory
> > is a thing in Linux, just account for it in mlocked and that is
> > enough.
It's not mlocked memory, it's mlocked memory and I can exfiltrate it.
Mlock is fine, exfiltration not so much. It's mlock, but a global pool and
if you didn't munlock then the next mlock from a completely different user
will alias with your stuff.
Or is there something that prevents that? Oded at least explain that gaudi
works like a gpu from 20 years ago, single user, no security at all within
the device.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
On Thu, Sep 16, 2021 at 03:44:25PM +0300, Oded Gabbay wrote:
> On Thu, Sep 16, 2021 at 3:31 PM Daniel Vetter <daniel(a)ffwll.ch> wrote:
> >
> > On Wed, Sep 15, 2021 at 10:45:36AM +0300, Oded Gabbay wrote:
> > > On Tue, Sep 14, 2021 at 7:12 PM Jason Gunthorpe <jgg(a)ziepe.ca> wrote:
> > > >
> > > > On Tue, Sep 14, 2021 at 04:18:31PM +0200, Daniel Vetter wrote:
> > > > > On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> > > > > > Hi,
> > > > > > Re-sending this patch-set following the release of our user-space TPC
> > > > > > compiler and runtime library.
> > > > > >
> > > > > > I would appreciate a review on this.
> > > > >
> > > > > I think the big open we have is the entire revoke discussions. Having the
> > > > > option to let dma-buf hang around which map to random local memory ranges,
> > > > > without clear ownership link and a way to kill it sounds bad to me.
> > > > >
> > > > > I think there's a few options:
> > > > > - We require revoke support. But I've heard rdma really doesn't like that,
> > > > > I guess because taking out an MR while holding the dma_resv_lock would
> > > > > be an inversion, so can't be done. Jason, can you recap what exactly the
> > > > > hold-up was again that makes this a no-go?
> > > >
> > > > RDMA HW can't do revoke.
> >
> > Like why? I'm assuming when the final open handle or whatever for that MR
> > is closed, you do clean up everything? Or does that MR still stick around
> > forever too?
> >
> > > > So we have to exclude almost all the HW and several interesting use
> > > > cases to enable a revoke operation.
> > > >
> > > > > - For non-revokable things like these dma-buf we'd keep a drm_master
> > > > > reference around. This would prevent the next open to acquire
> > > > > ownership rights, which at least prevents all the nasty potential
> > > > > problems.
> > > >
> > > > This is what I generally would expect, the DMABUF FD and its DMA
> > > > memory just floats about until the unrevokable user releases it, which
> > > > happens when the FD that is driving the import eventually gets closed.
> > > This is exactly what we are doing in the driver. We make sure
> > > everything is valid until the unrevokable user releases it and that
> > > happens only when the dmabuf fd gets closed.
> > > And the user can't close it's fd of the device until he performs the
> > > above, so there is no leakage between users.
> >
> > Maybe I got the device security model all wrong, but I thought Guadi is
> > single user, and the only thing it protects is the system against the
> > Gaudi device trhough iommu/device gart. So roughly the following can
> > happen:
> >
> > 1. User A opens gaudi device, sets up dma-buf export
> >
> > 2. User A registers that with RDMA, or anything else that doesn't support
> > revoke.
> >
> > 3. User A closes gaudi device
> This can not happen without User A closing the FD of the dma-buf it exported.
> We prevent User A from closing the device because when it exported the
> dma-buf, the driver's code took a refcnt of the user's private
> structure. You can see that in export_dmabuf_common() in the 2nd
> patch. There is a call there to hl_ctx_get.
> So even if User A calls close(device_fd), the driver won't let any
> other user open the device until User A closes the fd of the dma-buf
> object.
>
> Moreover, once User A will close the dma-buf fd and the device is
> released, the driver will scrub the device memory (this is optional
> for systems who care about security).
>
> And AFAIK, User A can't close the dma-buf fd once it registered it
> with RDMA, without doing unregister.
> This can be seen in ib_umem_dmabuf_get() which calls dma_buf_get()
> which does fget(fd)
Yeah that's essentially what I was looking for. This is defacto
hand-rolling the drm_master owner tracking stuff. As long as we have
something like this in place it should be fine I think.
-Daniel
> > 4. User B opens gaudi device, assumes that it has full control over the
> > device and uploads some secrets, which happen to end up in the dma-buf
> > region user A set up
> >
> > 5. User B extracts secrets.
> >
> > > > I still don't think any of the complexity is needed, pinnable memory
> > > > is a thing in Linux, just account for it in mlocked and that is
> > > > enough.
> >
> > It's not mlocked memory, it's mlocked memory and I can exfiltrate it.
> > Mlock is fine, exfiltration not so much. It's mlock, but a global pool and
> > if you didn't munlock then the next mlock from a completely different user
> > will alias with your stuff.
> >
> > Or is there something that prevents that? Oded at least explain that gaudi
> > works like a gpu from 20 years ago, single user, no security at all within
> > the device.
> > -Daniel
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
Abstract the complexity of iterating over all the fences
in a dma_resv object.
The new loop handles the whole RCU and retry dance and
returns only fences where we can be sure we grabbed the
right one.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
drivers/dma-buf/dma-resv.c | 63 ++++++++++++++++++++++++++++++++++++++
include/linux/dma-resv.h | 36 ++++++++++++++++++++++
2 files changed, 99 insertions(+)
diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 84fbe60629e3..213a9b7251ca 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -323,6 +323,69 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
}
EXPORT_SYMBOL(dma_resv_add_excl_fence);
+/**
+ * dma_resv_walk_unlocked - walk over fences in a dma_resv obj
+ * @obj: the dma_resv object
+ * @cursor: cursor to record the current position
+ * @all_fences: true returns also the shared fences
+ * @first: if we should start over
+ *
+ * Return all the fences in the dma_resv object which are not yet signaled.
+ * The returned fence has an extra local reference so will stay alive.
+ * If a concurrent modify is detected the whole iterator is started over again.
+ */
+struct dma_fence *dma_resv_walk_unlocked(struct dma_resv *obj,
+ struct dma_resv_cursor *cursor,
+ bool all_fences, bool first)
+{
+ struct dma_fence *fence = NULL;
+
+ do {
+ /* Drop the reference from the previous round */
+ dma_fence_put(fence);
+
+ cursor->is_first = first;
+ if (first) {
+ cursor->seq = read_seqcount_begin(&obj->seq);
+ cursor->index = -1;
+ cursor->fences = dma_resv_shared_list(obj);
+ cursor->is_exclusive = true;
+
+ fence = dma_resv_excl_fence(obj);
+ if (fence && test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+ &fence->flags))
+ fence = NULL;
+ } else {
+ fence = NULL;
+ }
+
+ if (fence) {
+ fence = dma_fence_get_rcu(fence);
+ } else if (all_fences && cursor->fences) {
+ struct dma_resv_list *fences = cursor->fences;
+
+ cursor->is_exclusive = false;
+ while (++cursor->index < fences->shared_count) {
+ fence = rcu_dereference(fences->shared[
+ cursor->index]);
+ if (!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+ &fence->flags))
+ break;
+ }
+ if (cursor->index < fences->shared_count)
+ fence = dma_fence_get_rcu(fence);
+ else
+ fence = NULL;
+ }
+
+ /* For the eventually next round */
+ first = true;
+ } while (read_seqcount_retry(&obj->seq, cursor->seq));
+
+ return fence;
+}
+EXPORT_SYMBOL_GPL(dma_resv_walk_unlocked);
+
/**
* dma_resv_copy_fences - Copy all fences from src to dst.
* @dst: the destination reservation object
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index 9100dd3dc21f..f5b91c292ee0 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -149,6 +149,39 @@ struct dma_resv {
struct dma_resv_list __rcu *fence;
};
+/**
+ * struct dma_resv_cursor - current position into the dma_resv fences
+ * @seq: sequence number to check
+ * @index: index into the shared fences
+ * @shared: the shared fences
+ * @is_first: true if this is the first returned fence
+ * @is_exclusive: if the current fence is the exclusive one
+ */
+struct dma_resv_cursor {
+ unsigned int seq;
+ unsigned int index;
+ struct dma_resv_list *fences;
+ bool is_first;
+ bool is_exclusive;
+};
+
+/**
+ * dma_resv_for_each_fence_unlocked - fence iterator
+ * @obj: a dma_resv object pointer
+ * @cursor: a struct dma_resv_cursor pointer
+ * @all_fences: true if all fences should be returned
+ * @fence: the current fence
+ *
+ * Iterate over the fences in a struct dma_resv object without holding the
+ * dma_resv::lock. The RCU read side lock must be hold when using this, but can
+ * be dropped and re-taken as necessary inside the loop. @all_fences controls
+ * if the shared fences are returned as well.
+ */
+#define dma_resv_for_each_fence_unlocked(obj, cursor, all_fences, fence) \
+ for (fence = dma_resv_walk_unlocked(obj, cursor, all_fences, true); \
+ fence; dma_fence_put(fence), \
+ fence = dma_resv_walk_unlocked(obj, cursor, all_fences, false))
+
#define dma_resv_held(obj) lockdep_is_held(&(obj)->lock.base)
#define dma_resv_assert_held(obj) lockdep_assert_held(&(obj)->lock.base)
@@ -366,6 +399,9 @@ void dma_resv_fini(struct dma_resv *obj);
int dma_resv_reserve_shared(struct dma_resv *obj, unsigned int num_fences);
void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence);
void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence);
+struct dma_fence *dma_resv_walk_unlocked(struct dma_resv *obj,
+ struct dma_resv_cursor *cursor,
+ bool first, bool all_fences);
int dma_resv_get_fences(struct dma_resv *obj, struct dma_fence **pfence_excl,
unsigned *pshared_count, struct dma_fence ***pshared);
int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src);
--
2.25.1
Next round for that one here, maybe the CI systems are now more
gracefully with me :)
I'm pretty sure that a couple of those dma_resv_for_each_fence_unlocked
should actually be replaced with lock+dma_resv_for_each_fence, but that
needs more auditing.
Please review and comment.
Thanks,
Christian.
Hi everybody,
we recently found that a good bunch of the RCU accesses to the dma_resv object are actually not correctly protected.
Those where fixed by either dropping the RCU approach and taking appropriate locks or using a central function to return the current fences as array and then work with that snapshot.
This set now tries to prevent adding any new broken code by rolling out two new interfaces to access the fences in a dma_resv object:
dma_resv_for_each_fence() - Iterator which should be used while holding the reservation lock.
dma_resv_for_each_fence_unlocked() - Iterator based on RCU which can be used without holding the reservation lock and automatic restart on concurrent modification.
While doing this we also move the decision which fences to use for write and read accesses into the dma_resv object which results in a quite nice code de-duplication and simplification.
The only two remaining users of the RCU shared fence interface are removing shared fences in amdkfd and debugfs code in qxl which will both be addresses in the next patch set.
Please review and/or comment,
Christian.
On Sun, Sep 12, 2021 at 07:53:07PM +0300, Oded Gabbay wrote:
> Hi,
> Re-sending this patch-set following the release of our user-space TPC
> compiler and runtime library.
>
> I would appreciate a review on this.
I think the big open we have is the entire revoke discussions. Having the
option to let dma-buf hang around which map to random local memory ranges,
without clear ownership link and a way to kill it sounds bad to me.
I think there's a few options:
- We require revoke support. But I've heard rdma really doesn't like that,
I guess because taking out an MR while holding the dma_resv_lock would
be an inversion, so can't be done. Jason, can you recap what exactly the
hold-up was again that makes this a no-go?
- The other option I discussed is a bit more the exlusive device ownership
model we've had for gpus in drm of the really old kind. Roughly this
would work like this, in terms of drm_device:
- Only the current owner (drm_master in current drm code, but should
probably rename that to drm_owner) is allowed to use the accel driver.
So all ioctl would fail if you're not drm_master.
- On dropmaster/file close we'd revoke as much as possible, e.g.
in-flight commands, mmaps, anything really that can be revoked.
- For non-revokable things like these dma-buf we'd keep a drm_master
reference around. This would prevent the next open to acquire
ownership rights, which at least prevents all the nasty potential
problems.
- admin (or well container orchestrator) then has responsibility to
shoot down all process until the problem goes away (i.e. until you hit
the one with the rdma MR which keeps the dma-buf alive)
- Not sure there's another reasonable way to do this without inviting some
problems once we get outside of the "single kernel instance per tenant"
use-case.
Wrt implementation there's the trouble of this reinventing a bunch of drm
stuff and concepts, but that's maybe for after we've figured out
semantics.
Also would be great if you have a pull request for the userspace runtime
that shows a bit how this all gets used and tied together. Or maybe some
pointers, since I guess retconning a PR in github is maybe a bit much.
Cheers, Daniel
>
> Thanks,
> Oded
>
> Oded Gabbay (1):
> habanalabs: define uAPI to export FD for DMA-BUF
>
> Tomer Tayar (1):
> habanalabs: add support for dma-buf exporter
>
> drivers/misc/habanalabs/Kconfig | 1 +
> drivers/misc/habanalabs/common/habanalabs.h | 22 +
> drivers/misc/habanalabs/common/memory.c | 522 +++++++++++++++++++-
> drivers/misc/habanalabs/gaudi/gaudi.c | 1 +
> drivers/misc/habanalabs/goya/goya.c | 1 +
> include/uapi/misc/habanalabs.h | 28 +-
> 6 files changed, 570 insertions(+), 5 deletions(-)
>
> --
> 2.17.1
>
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch