Linaro-mm-sig

linaro-mm-sig@lists.linaro.org

[PATCH v5 0/2] dma-buf: heaps: Support carved-out heaps

by Maxime Ripard

Hi,

This series is the follow-up of the discussion that John and I had some time ago here:

https://lore.kernel.org/all/CANDhNCquJn6bH3KxKf65BWiTYLVqSd9892-xtFDHHqqyrr…

The initial problem we were discussing was that I'm currently working on a platform which has a memory layout with ECC enabled. However, enabling the ECC has a number of drawbacks on that platform: lower performance, increased memory usage, etc. So for things like framebuffers, the trade-off isn't great and thus there's a memory region with ECC disabled to allocate from for such use cases.

After a suggestion from John, I chose to first start using heap allocation flags to allow userspace to ask for a particular ECC setup. This is then backed by a new heap type that runs from reserved memory chunks flagged as such, and the existing DT properties to specify the ECC properties.

After further discussion, it was considered that flags were not the right solution, and relying on the names of the heaps would be enough to let userspace know the kind of buffer it deals with.

Thus, even though the uAPI part of it has been dropped in this second version, we still need a driver to create heaps out of carved-out memory regions. In addition to the original use case, a similar driver can be found in BSPs from most vendors, so I believe it would be a useful addition to the kernel.

Let me know what you think,
Maxime

Signed-off-by: Maxime Ripard <mripard(a)kernel.org>
---
Changes in v5:
- Rebased on 6.16-rc2
- Switch from property to dedicated binding
- Link to v4: https://lore.kernel.org/r/20250520-dma-buf-ecc-heap-v4-1-bd2e1f1bb42c@kerne…

Changes in v4:
- Rebased on 6.15-rc7
- Map buffers only when map is actually called, not at allocation time
- Deal with restricted-dma-pool and shared-dma-pool
- Reword Kconfig options
- Properly report dma_map_sgtable failures
- Link to v3: https://lore.kernel.org/r/20250407-dma-buf-ecc-heap-v3-0-97cdd36a5f29@kerne…

Changes in v3:
- Reworked global variable patch
- Link to v2: https://lore.kernel.org/r/20250401-dma-buf-ecc-heap-v2-0-043fd006a1af@kerne…

Changes in v2:
- Add vmap/vunmap operations
- Drop ECC flags uapi
- Rebase on top of 6.14
- Link to v1: https://lore.kernel.org/r/20240515-dma-buf-ecc-heap-v1-0-54cbbd049511@kerne…

---
Maxime Ripard (2):
      dt-bindings: reserved-memory: Introduce carved-out memory region binding
      dma-buf: heaps: Introduce a new heap for reserved memory

 .../bindings/reserved-memory/carved-out.yaml | 49 +++
 drivers/dma-buf/heaps/Kconfig                |  8 +
 drivers/dma-buf/heaps/Makefile               |  1 +
 drivers/dma-buf/heaps/carveout_heap.c        | 362 +++++++++++++++++++++
 4 files changed, 420 insertions(+)
---
base-commit: d076bed8cb108ba2236d4d49c92303fda4036893
change-id: 20240515-dma-buf-ecc-heap-28a311d2c94e

Best regards,
--
Maxime Ripard <mripard(a)kernel.org>
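
Since the cover letter settles on heap names as the way userspace picks a buffer type, a userspace allocation could look roughly like the sketch below. It uses the existing dma-buf heaps uAPI (`DMA_HEAP_IOCTL_ALLOC` from `<linux/dma-heap.h>`). The helper name `alloc_from_heap` is made up for illustration, and the name under which a carved-out heap would appear in `/dev/dma_heap/` is not stated in the series, so the caller has to supply it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/dma-heap.h>

/*
 * Hypothetical helper: allocate len bytes from the dma-buf heap called
 * heap_name. Returns a dma-buf file descriptor on success or a negative
 * errno value on failure.
 */
static int alloc_from_heap(const char *heap_name, size_t len)
{
	struct dma_heap_allocation_data data = {
		.len = len,
		.fd_flags = O_RDWR | O_CLOEXEC,
	};
	char path[128];
	int heap_fd, ret;

	/* Heaps are exposed as character devices under /dev/dma_heap/. */
	ret = snprintf(path, sizeof(path), "/dev/dma_heap/%s", heap_name);
	if (ret < 0 || (size_t)ret >= sizeof(path))
		return -ENAMETOOLONG;

	heap_fd = open(path, O_RDONLY | O_CLOEXEC);
	if (heap_fd < 0)
		return -errno;

	/* On success the kernel fills data.fd with the new dma-buf fd. */
	ret = ioctl(heap_fd, DMA_HEAP_IOCTL_ALLOC, &data);
	if (ret < 0)
		ret = -errno;
	else
		ret = (int)data.fd;

	close(heap_fd);
	return ret;
}
```

A caller would pass whatever name the heap driver registers, for example `alloc_from_heap("some-carveout", 4096)`, where the name is purely illustrative. The returned fd should be closed once it is no longer needed.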

6 hours, 23 minutes

Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL

by Rob Herring

On Fri, Jun 6, 2025 at 1:29 AM Tomeu Vizoso <tomeu(a)tomeuvizoso.net> wrote: > > Using the DRM GPU scheduler infrastructure, with a scheduler for each > core. > > Userspace can decide for a series of tasks to be executed sequentially > in the same core, so SRAM locality can be taken advantage of. > > The job submission code was initially based on Panfrost. > > v2: > - Remove hardcoded number of cores > - Misc. style fixes (Jeffrey Hugo) > - Repack IOCTL struct (Jeffrey Hugo) > > v3: > - Adapt to a split of the register block in the DT bindings (Nicolas > Frattaroli) > - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo) > - Use drm_* logging functions (Thomas Zimmermann) > - Rename reg i/o macros (Thomas Zimmermann) > - Add padding to ioctls and check for zero (Jeff Hugo) > - Improve error handling (Nicolas Frattaroli) > > v6: > - Use mutexes guard (Markus Elfring) > - Use u64_to_user_ptr (Jeff Hugo) > - Drop rocket_fence (Rob Herring) > > v7: > - Assign its own IOMMU domain to each client, for isolation (Daniel > Stone and Robin Murphy) > > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > --- [...] > --- a/include/uapi/drm/rocket_accel.h > +++ b/include/uapi/drm/rocket_accel.h > @@ -12,8 +12,10 @@ extern "C" { > #endif > > #define DRM_ROCKET_CREATE_BO 0x00 > +#define DRM_ROCKET_SUBMIT 0x01 > > #define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo) > +#define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit) > > /** > * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs. > @@ -37,6 +39,68 @@ struct drm_rocket_create_bo { > __u64 offset; > }; > > +/** > + * struct drm_rocket_task - A task to be run on the NPU > + * > + * A task is the smallest unit of work that can be run on the NPU. 
> + */ > +struct drm_rocket_task { > + /** Input: DMA address to NPU mapping of register command buffer */ > + __u64 regcmd; > + > + /** Input: Number of commands in the register command buffer */ > + __u32 regcmd_count; > + > + /** Reserved, must be zero. */ > + __u32 reserved; > +}; > + > +/** > + * struct drm_rocket_job - A job to be run on the NPU > + * > + * The kernel will schedule the execution of this job taking into account its > + * dependencies with other jobs. All tasks in the same job will be executed > + * sequentially on the same core, to benefit from memory residency in SRAM. > + */ > +struct drm_rocket_job { > + /** Input: Pointer to an array of struct drm_rocket_task. */ > + __u64 tasks; > + > + /** Input: Pointer to a u32 array of the BOs that are read by the job. */ > + __u64 in_bo_handles; > + > + /** Input: Pointer to a u32 array of the BOs that are written to by the job. */ > + __u64 out_bo_handles; > + > + /** Input: Number of tasks passed in. */ > + __u32 task_count; > + > + /** Input: Number of input BO handles passed in (size is that times 4). */ > + __u32 in_bo_handle_count; > + > + /** Input: Number of output BO handles passed in (size is that times 4). */ > + __u32 out_bo_handle_count; > + > + /** Reserved, must be zero. */ > + __u32 reserved; > +}; > + > +/** > + * struct drm_rocket_submit - ioctl argument for submitting commands to the NPU. > + * > + * The kernel will schedule the execution of these jobs in dependency order. > + */ > +struct drm_rocket_submit { > + /** Input: Pointer to an array of struct drm_rocket_job. */ > + __u64 jobs; > + > + /** Input: Number of jobs passed in. */ > + __u32 job_count; Isn't there a problem if you need to expand drm_rocket_job beyond using the 1 reserved field? You can't add to the struct because then you don't know the size here. So you have to modify drm_rocket_submit to modify drm_rocket_job. Maybe better if you plan for that now rather than later by making the size explicit. 
Though etnaviv at least has similar issues. Rob > + > + /** Reserved, must be zero. */ > + __u32 reserved; > +};
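
One way to read the "make the size explicit" suggestion is sketched below as a plain userspace model. It is not the Rocket uAPI: the `toy_*` structs, the `job_struct_size` parameter and both helpers are hypothetical. The idea is that the submitter passes the per-job struct size, and the receiver copies what it understands, zero-fills what is missing, and rejects non-zero bytes it does not know about. This is similar in spirit to what the kernel's `copy_struct_from_user()` does for a single struct.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical first and second revisions of a per-job struct. */
struct toy_job_v1 {
	uint64_t tasks;
	uint32_t task_count;
	uint32_t reserved;
};

struct toy_job_v2 {
	uint64_t tasks;
	uint32_t task_count;
	uint32_t reserved;
	uint64_t new_field;	/* appended in a later revision */
};

/*
 * Copy a caller struct of usize bytes into a receiver struct of ksize bytes.
 * Missing trailing fields read as zero; unknown trailing bytes must be zero,
 * otherwise -E2BIG is returned.
 */
static int toy_copy_struct(void *dst, size_t ksize, const void *src, size_t usize)
{
	size_t n = usize < ksize ? usize : ksize;
	const unsigned char *bytes = src;

	for (size_t i = n; i < usize; i++)
		if (bytes[i])
			return -E2BIG;

	memset(dst, 0, ksize);
	memcpy(dst, src, n);
	return 0;
}

/* Read element index from an array whose stride is given by the caller. */
static int toy_read_job(struct toy_job_v2 *out, const void *jobs,
			uint32_t job_struct_size, uint32_t index)
{
	const unsigned char *p = (const unsigned char *)jobs +
				 (size_t)index * job_struct_size;

	return toy_copy_struct(out, sizeof(*out), p, job_struct_size);
}
```

With an explicit stride, an array of the older, smaller struct can still be walked by a receiver that knows the larger layout, which is the case Rob's comment is concerned with.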

7 hours, 35 minutes

[PATCH v2] drm/gem: Acquire references on GEM handles for framebuffers

by Thomas Zimmermann

A GEM handle can be released while the GEM buffer object is attached to a DRM framebuffer. This leads to the release of the dma-buf backing the buffer object, if any. [1] Trying to use the framebuffer in further mode-setting operations leads to a segmentation fault. This happens most easily with drivers that use shadow planes for vmap-ing the dma-buf during a page flip. An example is shown below. [ 156.791968] ------------[ cut here ]------------ [ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430 [...] [ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430 [ 157.043420] Call Trace: [ 157.045898] <TASK> [ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710 [ 157.065567] ? dma_buf_vmap+0x224/0x430 [ 157.069446] ? __warn.cold+0x58/0xe4 [ 157.073061] ? dma_buf_vmap+0x224/0x430 [ 157.077111] ? report_bug+0x1dd/0x390 [ 157.080842] ? handle_bug+0x5e/0xa0 [ 157.084389] ? exc_invalid_op+0x14/0x50 [ 157.088291] ? asm_exc_invalid_op+0x16/0x20 [ 157.092548] ? dma_buf_vmap+0x224/0x430 [ 157.096663] ? dma_resv_get_singleton+0x6d/0x230 [ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10 [ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10 [ 157.110697] drm_gem_shmem_vmap+0x74/0x710 [ 157.114866] drm_gem_vmap+0xa9/0x1b0 [ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0 [ 157.123086] drm_gem_fb_vmap+0xab/0x300 [ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10 [ 157.133032] ? lockdep_init_map_type+0x19d/0x880 [ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0 [ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180 [ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40 [...] [ 157.346424] ---[ end trace 0000000000000000 ]--- Acquiring GEM handles for the framebuffer's GEM buffer objects prevents this from happening. The framebuffer's cleanup later puts the handle references. 
Commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object instance") triggers the segmentation fault easily by using the dma-buf field more widely. The underlying issue with reference counting has been present before. v2: - acquire the handle instead of the BO (Christian) - fix comment style (Christian) - drop the Fixes tag (Christian) - rename err_ gotos - add missing Link tag Suggested-by: Christian König <christian.koenig(a)amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Link: https://elixir.bootlin.com/linux/v6.15/source/drivers/gpu/drm/drm_gem.c#L241 # [1] Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: Anusha Srivatsa <asrivats(a)redhat.com> Cc: Christian König <christian.koenig(a)amd.com> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: "Christian König" <christian.koenig(a)amd.com> Cc: linux-media(a)vger.kernel.org Cc: dri-devel(a)lists.freedesktop.org Cc: linaro-mm-sig(a)lists.linaro.org Cc: <stable(a)vger.kernel.org> --- drivers/gpu/drm/drm_gem.c | 44 ++++++++++++++++++-- drivers/gpu/drm/drm_gem_framebuffer_helper.c | 16 +++---- drivers/gpu/drm/drm_internal.h | 2 + 3 files changed, 51 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 19d50d254fe6..bc505d938b3e 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -213,6 +213,35 @@ void drm_gem_private_object_fini(struct drm_gem_object *obj) } EXPORT_SYMBOL(drm_gem_private_object_fini); +static void drm_gem_object_handle_get(struct drm_gem_object *obj) +{ + struct drm_device *dev = obj->dev; + + drm_WARN_ON(dev, !mutex_is_locked(&dev->object_name_lock)); + + if (obj->handle_count++ == 0) + drm_gem_object_get(obj); +} + +/** + * drm_gem_object_handle_get_unlocked - acquire reference on user-space handles + * @obj: GEM object + * + * Acquires a reference on the GEM buffer object's handle. 
Required + * to keep the GEM object alive. Call drm_gem_object_handle_put_unlocked() + * to release the reference. + */ +void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj) +{ + struct drm_device *dev = obj->dev; + + guard(mutex)(&dev->object_name_lock); + + drm_WARN_ON(dev, !obj->handle_count); /* first ref taken in create-tail helper */ + drm_gem_object_handle_get(obj); +} +EXPORT_SYMBOL(drm_gem_object_handle_get_unlocked); + /** * drm_gem_object_handle_free - release resources bound to userspace handles * @obj: GEM object to clean up. @@ -243,8 +272,14 @@ static void drm_gem_object_exported_dma_buf_free(struct drm_gem_object *obj) } } -static void -drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) +/** + * drm_gem_object_handle_put_unlocked - releases reference on user-space handles + * @obj: GEM object + * + * Releases a reference on the GEM buffer object's handle. Possibly releases + * the GEM buffer object and associated dma-buf objects. + */ +void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) { struct drm_device *dev = obj->dev; bool final = false; @@ -269,6 +304,7 @@ drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) if (final) drm_gem_object_put(obj); } +EXPORT_SYMBOL(drm_gem_object_handle_put_unlocked); /* * Called at device or object close to release the file's @@ -390,8 +426,8 @@ drm_gem_handle_create_tail(struct drm_file *file_priv, int ret; WARN_ON(!mutex_is_locked(&dev->object_name_lock)); - if (obj->handle_count++ == 0) - drm_gem_object_get(obj); + + drm_gem_object_handle_get(obj); /* * Get the user-visible handle using idr. 
Preload and perform diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c b/drivers/gpu/drm/drm_gem_framebuffer_helper.c index 618ce725cd75..c60d0044d036 100644 --- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c +++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c @@ -100,7 +100,7 @@ void drm_gem_fb_destroy(struct drm_framebuffer *fb) unsigned int i; for (i = 0; i < fb->format->num_planes; i++) - drm_gem_object_put(fb->obj[i]); + drm_gem_object_handle_put_unlocked(fb->obj[i]); drm_framebuffer_cleanup(fb); kfree(fb); @@ -183,8 +183,10 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev, if (!objs[i]) { drm_dbg_kms(dev, "Failed to lookup GEM object\n"); ret = -ENOENT; - goto err_gem_object_put; + goto err_gem_object_handle_put_unlocked; } + drm_gem_object_handle_get_unlocked(objs[i]); + drm_gem_object_put(objs[i]); min_size = (height - 1) * mode_cmd->pitches[i] + drm_format_info_min_pitch(info, i, width) @@ -194,22 +196,22 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev, drm_dbg_kms(dev, "GEM object size (%zu) smaller than minimum size (%u) for plane %d\n", objs[i]->size, min_size, i); - drm_gem_object_put(objs[i]); + drm_gem_object_handle_put_unlocked(objs[i]); ret = -EINVAL; - goto err_gem_object_put; + goto err_gem_object_handle_put_unlocked; } } ret = drm_gem_fb_init(dev, fb, mode_cmd, objs, i, funcs); if (ret) - goto err_gem_object_put; + goto err_gem_object_handle_put_unlocked; return 0; -err_gem_object_put: +err_gem_object_handle_put_unlocked: while (i > 0) { --i; - drm_gem_object_put(objs[i]); + drm_gem_object_handle_put_unlocked(objs[i]); } return ret; } diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 442eb31351dd..f7b414a813ae 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -161,6 +161,8 @@ void drm_sysfs_lease_event(struct drm_device *dev); /* drm_gem.c */ int drm_gem_init(struct drm_device *dev); +void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj); 
+void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj); int drm_gem_handle_create_tail(struct drm_file *file_priv, struct drm_gem_object *obj, u32 *handlep); -- 2.50.0
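
The difference between holding an object reference and holding a handle reference can be modelled in a few lines of userspace C. This is a toy model under stated assumptions, not the DRM code: `toy_gem_object` and the `toy_*` helpers are invented for illustration, and the `dma_buf_attached` flag merely stands in for the dma-buf that the commit message says is released together with the last handle.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a GEM object with its two counters. */
struct toy_gem_object {
	unsigned int refcount;		/* plain object references */
	unsigned int handle_count;	/* handle references */
	bool dma_buf_attached;		/* dropped with the last handle */
};

/* The first handle reference also pins the object itself. */
static void toy_handle_get(struct toy_gem_object *obj)
{
	if (obj->handle_count++ == 0)
		obj->refcount++;
}

/* The last handle reference releases the dma-buf and the pinning ref. */
static void toy_handle_put(struct toy_gem_object *obj)
{
	assert(obj->handle_count > 0);
	if (--obj->handle_count == 0) {
		obj->dma_buf_attached = false;
		obj->refcount--;
	}
}

/* Framebuffer holds only an object reference: the dma-buf goes away. */
static bool toy_fb_usable_with_object_ref(void)
{
	struct toy_gem_object obj = { .dma_buf_attached = true };

	toy_handle_get(&obj);	/* userspace creates a handle */
	obj.refcount++;		/* framebuffer takes an object reference */
	toy_handle_put(&obj);	/* userspace closes its handle */

	return obj.dma_buf_attached;
}

/* Framebuffer holds a handle reference: the dma-buf stays around. */
static bool toy_fb_usable_with_handle_ref(void)
{
	struct toy_gem_object obj = { .dma_buf_attached = true };

	toy_handle_get(&obj);	/* userspace creates a handle */
	toy_handle_get(&obj);	/* framebuffer takes a handle reference */
	toy_handle_put(&obj);	/* userspace closes its handle */

	return obj.dma_buf_attached;
}
```

In the first scenario the object memory survives but the dma-buf does not, which matches the failure described in the commit message. In the second, the framebuffer's own handle reference keeps both alive until its cleanup puts it.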

16 hours, 45 minutes

[PATCH] drm/gem: Acquire references on GEM handles for framebuffers

by Thomas Zimmermann

A GEM handle can be released while the GEM buffer object is attached to a DRM framebuffer. This leads to the release of the dma-buf backing the buffer object, if any. [1] Trying to use the framebuffer in further mode-setting operations leads to a segmentation fault. This happens most easily with drivers that use shadow planes for vmap-ing the dma-buf during a page flip. An example is shown below. [ 156.791968] ------------[ cut here ]------------ [ 156.796830] WARNING: CPU: 2 PID: 2255 at drivers/dma-buf/dma-buf.c:1527 dma_buf_vmap+0x224/0x430 [...] [ 156.942028] RIP: 0010:dma_buf_vmap+0x224/0x430 [ 157.043420] Call Trace: [ 157.045898] <TASK> [ 157.048030] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.052436] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.056836] ? show_trace_log_lvl+0x1af/0x2c0 [ 157.061253] ? drm_gem_shmem_vmap+0x74/0x710 [ 157.065567] ? dma_buf_vmap+0x224/0x430 [ 157.069446] ? __warn.cold+0x58/0xe4 [ 157.073061] ? dma_buf_vmap+0x224/0x430 [ 157.077111] ? report_bug+0x1dd/0x390 [ 157.080842] ? handle_bug+0x5e/0xa0 [ 157.084389] ? exc_invalid_op+0x14/0x50 [ 157.088291] ? asm_exc_invalid_op+0x16/0x20 [ 157.092548] ? dma_buf_vmap+0x224/0x430 [ 157.096663] ? dma_resv_get_singleton+0x6d/0x230 [ 157.101341] ? __pfx_dma_buf_vmap+0x10/0x10 [ 157.105588] ? __pfx_dma_resv_get_singleton+0x10/0x10 [ 157.110697] drm_gem_shmem_vmap+0x74/0x710 [ 157.114866] drm_gem_vmap+0xa9/0x1b0 [ 157.118763] drm_gem_vmap_unlocked+0x46/0xa0 [ 157.123086] drm_gem_fb_vmap+0xab/0x300 [ 157.126979] drm_atomic_helper_prepare_planes.part.0+0x487/0xb10 [ 157.133032] ? lockdep_init_map_type+0x19d/0x880 [ 157.137701] drm_atomic_helper_commit+0x13d/0x2e0 [ 157.142671] ? drm_atomic_nonblocking_commit+0xa0/0x180 [ 157.147988] drm_mode_atomic_ioctl+0x766/0xe40 [...] [ 157.346424] ---[ end trace 0000000000000000 ]--- Acquiring GEM handles for the framebuffer's GEM buffer objects prevents this from happening. The framebuffer's cleanup later puts the handle references. 
The Fixes tag points to commit 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object instance"), which triggers the segmentation fault. The issue has been present before. Suggested-by: Christian König <christian.koenig(a)amd.com> Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de> Fixes: 1a148af06000 ("drm/gem-shmem: Use dma_buf from GEM object instance") Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: Anusha Srivatsa <asrivats(a)redhat.com> Cc: Christian König <christian.koenig(a)amd.com> Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: Sumit Semwal <sumit.semwal(a)linaro.org> Cc: "Christian König" <christian.koenig(a)amd.com> Cc: linux-media(a)vger.kernel.org Cc: dri-devel(a)lists.freedesktop.org Cc: linaro-mm-sig(a)lists.linaro.org Cc: <stable(a)vger.kernel.org> --- drivers/gpu/drm/drm_gem.c | 44 ++++++++++++++++++-- drivers/gpu/drm/drm_gem_framebuffer_helper.c | 7 +++- drivers/gpu/drm/drm_internal.h | 2 + 3 files changed, 48 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/drm_gem.c b/drivers/gpu/drm/drm_gem.c index 19d50d254fe6..8be50b3cc9c2 100644 --- a/drivers/gpu/drm/drm_gem.c +++ b/drivers/gpu/drm/drm_gem.c @@ -213,6 +213,35 @@ void drm_gem_private_object_fini(struct drm_gem_object *obj) } EXPORT_SYMBOL(drm_gem_private_object_fini); +static void drm_gem_object_handle_get(struct drm_gem_object *obj) +{ + struct drm_device *dev = obj->dev; + + drm_WARN_ON(dev, !mutex_is_locked(&dev->object_name_lock)); + + if (obj->handle_count++ == 0) + drm_gem_object_get(obj); +} + +/** + * drm_gem_object_handle_get_unlocked - acquire reference on user-space handles + * @obj: GEM object + * + * Acquires a reference on the GEM buffer object's handle. Required + * to keep the GEM object alive. Call drm_gem_object_handle_put_unlocked() + * to release the reference. 
+ */ +void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj) +{ + struct drm_device *dev = obj->dev; + + guard(mutex)(&dev->object_name_lock); + + drm_WARN_ON(dev, !obj->handle_count); // first ref taken in create-tail helper + drm_gem_object_handle_get(obj); +} +EXPORT_SYMBOL(drm_gem_object_handle_get_unlocked); + /** * drm_gem_object_handle_free - release resources bound to userspace handles * @obj: GEM object to clean up. @@ -243,8 +272,14 @@ static void drm_gem_object_exported_dma_buf_free(struct drm_gem_object *obj) } } -static void -drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) +/** + * drm_gem_object_handle_put_unlocked - releases reference on user-space handles + * @obj: GEM object + * + * Releases a reference on the GEM buffer object's handle. Possibly releases + * the GEM buffer object and associated dma-buf objects. + */ +void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) { struct drm_device *dev = obj->dev; bool final = false; @@ -269,6 +304,7 @@ drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj) if (final) drm_gem_object_put(obj); } +EXPORT_SYMBOL(drm_gem_object_handle_put_unlocked); /* * Called at device or object close to release the file's @@ -390,8 +426,8 @@ drm_gem_handle_create_tail(struct drm_file *file_priv, int ret; WARN_ON(!mutex_is_locked(&dev->object_name_lock)); - if (obj->handle_count++ == 0) - drm_gem_object_get(obj); + + drm_gem_object_handle_get(obj); /* * Get the user-visible handle using idr. 
Preload and perform diff --git a/drivers/gpu/drm/drm_gem_framebuffer_helper.c b/drivers/gpu/drm/drm_gem_framebuffer_helper.c index 618ce725cd75..723f1d652c01 100644 --- a/drivers/gpu/drm/drm_gem_framebuffer_helper.c +++ b/drivers/gpu/drm/drm_gem_framebuffer_helper.c @@ -99,8 +99,10 @@ void drm_gem_fb_destroy(struct drm_framebuffer *fb) { unsigned int i; - for (i = 0; i < fb->format->num_planes; i++) + for (i = 0; i < fb->format->num_planes; i++) { + drm_gem_object_handle_put_unlocked(fb->obj[i]); drm_gem_object_put(fb->obj[i]); + } drm_framebuffer_cleanup(fb); kfree(fb); @@ -185,6 +187,7 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev, ret = -ENOENT; goto err_gem_object_put; } + drm_gem_object_handle_get_unlocked(objs[i]); min_size = (height - 1) * mode_cmd->pitches[i] + drm_format_info_min_pitch(info, i, width) @@ -195,6 +198,7 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev, "GEM object size (%zu) smaller than minimum size (%u) for plane %d\n", objs[i]->size, min_size, i); drm_gem_object_put(objs[i]); + drm_gem_object_handle_put_unlocked(objs[i]); ret = -EINVAL; goto err_gem_object_put; } @@ -210,6 +214,7 @@ int drm_gem_fb_init_with_funcs(struct drm_device *dev, while (i > 0) { --i; drm_gem_object_put(objs[i]); + drm_gem_object_handle_put_unlocked(objs[i]); } return ret; } diff --git a/drivers/gpu/drm/drm_internal.h b/drivers/gpu/drm/drm_internal.h index 442eb31351dd..f7b414a813ae 100644 --- a/drivers/gpu/drm/drm_internal.h +++ b/drivers/gpu/drm/drm_internal.h @@ -161,6 +161,8 @@ void drm_sysfs_lease_event(struct drm_device *dev); /* drm_gem.c */ int drm_gem_init(struct drm_device *dev); +void drm_gem_object_handle_get_unlocked(struct drm_gem_object *obj); +void drm_gem_object_handle_put_unlocked(struct drm_gem_object *obj); int drm_gem_handle_create_tail(struct drm_file *file_priv, struct drm_gem_object *obj, u32 *handlep); -- 2.50.0

3 days, 15 hours

Re: [PATCH 0/6] Add few updates to the STM32 SPI driver

by Mark Brown

On Mon, 16 Jun 2025 11:21:01 +0200, Clément Le Goffic wrote: > This series aims to improve the STM32 SPI driver in different areas. > It adds SPI_READY mode, fixes an issue raised by a kernel bot, > add the ability to use DMA-MDMA chaining for RX and deprecate an ST bindings > vendor property. > > Applied to https://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git for-next Thanks! [1/6] spi: stm32: Add SPI_READY mode to spi controller commit: e4feefa5c71912ebfcb97a3dbe2b021fd1cea9d1 [2/6] spi: stm32: Check for cfg availability in stm32_spi_probe commit: 21f1c800f6620e43f31dfd76709dbac8ebaa5a16 [3/6] dt-bindings: spi: stm32: update bindings with SPI Rx DMA-MDMA chaining commit: bd60f94a3eb4f80cb66c9687d640554fd0c579d0 [4/6] spi: stm32: use STM32 DMA with STM32 MDMA to enhance DDR use commit: d17dd2f1d8a1d919e39c6302b024f135a2f90773 [5/6] spi: stm32: deprecate `st,spi-midi-ns` property commit: 4956bf44524394211ca80aa04d0c9e1e9bb0219d [6/6] dt-bindings: spi: stm32: deprecate `st,spi-midi-ns` property commit: 9a944494c299fabf3cc781798eb7c02a0bece364 All being well this means that it will be integrated into the linux-next tree (usually sometime in the next 24 hours) and sent to Linus during the next merge window (or sooner if it is a bug fix), however if problems are discovered then the patch may be dropped or reverted. You may get further e-mails resulting from automated or manual testing and review of the tree, please engage with people reporting problems and send followup patches addressing any issues that are reported if needed. If any updates are required or you are submitting further changes they should be sent as incremental updates against current git, existing patches will not be replaced. Please add any relevant lists and maintainers to the CCs when replying to this mail. Thanks, Mark

5 days, 9 hours

Re: [PATCH v7 05/10] accel/rocket: Add IOCTLs for synchronizing memory accesses

by Robin Murphy

On 2025-06-06 7:28 am, Tomeu Vizoso wrote: > The NPU cores have their own access to the memory bus, and this isn't > cache coherent with the CPUs. > > Add IOCTLs so userspace can mark when the caches need to be flushed, and > also when a writer job needs to be waited for before the buffer can be > accessed from the CPU. > > Initially based on the same IOCTLs from the Etnaviv driver. > > v2: > - Don't break UABI by reordering the IOCTL IDs (Jeff Hugo) > > v3: > - Check that padding fields in IOCTLs are zero (Jeff Hugo) > > v6: > - Fix conversion logic to make sure we use DMA_BIDIRECTIONAL when needed > (Lucas Stach) > > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > Reviewed-by: Jeff Hugo <jeff.hugo(a)oss.qualcomm.com> > --- > drivers/accel/rocket/rocket_drv.c | 2 + > drivers/accel/rocket/rocket_gem.c | 82 +++++++++++++++++++++++++++++++++++++++ > drivers/accel/rocket/rocket_gem.h | 5 +++ > include/uapi/drm/rocket_accel.h | 37 ++++++++++++++++++ > 4 files changed, 126 insertions(+) > > diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c > index 4ab78193c186dfcfc3e323f16c588e85e6a8a334..eb9284ee2511f730afe6a532225c2706ce0e2822 100644 > --- a/drivers/accel/rocket/rocket_drv.c > +++ b/drivers/accel/rocket/rocket_drv.c > @@ -62,6 +62,8 @@ static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = { > > ROCKET_IOCTL(CREATE_BO, create_bo), > ROCKET_IOCTL(SUBMIT, submit), > + ROCKET_IOCTL(PREP_BO, prep_bo), > + ROCKET_IOCTL(FINI_BO, fini_bo), > }; > > DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops); > diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c > index 61b7f970a6885aa13784daa1222611a02aa10dee..07024b6e71bf544dc7f00b008b9afb74b0c4e802 100644 > --- a/drivers/accel/rocket/rocket_gem.c > +++ b/drivers/accel/rocket/rocket_gem.c > @@ -113,3 +113,85 @@ int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file * > > return ret; > } > + > +static inline enum dma_data_direction 
rocket_op_to_dma_dir(u32 op) > +{ > + op &= ROCKET_PREP_READ | ROCKET_PREP_WRITE; > + > + if (op == ROCKET_PREP_READ) > + return DMA_FROM_DEVICE; > + else if (op == ROCKET_PREP_WRITE) > + return DMA_TO_DEVICE; > + else > + return DMA_BIDIRECTIONAL; > +} > + > +int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file) > +{ > + struct drm_rocket_prep_bo *args = data; > + unsigned long timeout = drm_timeout_abs_to_jiffies(args->timeout_ns); > + struct rocket_device *rdev = to_rocket_device(dev); > + struct drm_gem_object *gem_obj; > + struct drm_gem_shmem_object *shmem_obj; > + bool write = !!(args->op & ROCKET_PREP_WRITE); > + long ret = 0; > + > + if (args->op & ~(ROCKET_PREP_READ | ROCKET_PREP_WRITE)) > + return -EINVAL; > + > + gem_obj = drm_gem_object_lookup(file, args->handle); > + if (!gem_obj) > + return -ENOENT; > + > + ret = dma_resv_wait_timeout(gem_obj->resv, dma_resv_usage_rw(write), > + true, timeout); > + if (!ret) > + ret = timeout ? -ETIMEDOUT : -EBUSY; > + > + shmem_obj = &to_rocket_bo(gem_obj)->base; > + > + for (unsigned int core = 1; core < rdev->num_cores; core++) { Huh? If you need to sync the BO memory ever, then you need to sync it for the same device it was mapped, and certainly not 0 or 2+ times depending on how many cores happen to be enabled. Please throw CONFIG_DMA_API_DEBUG at this. > + dma_sync_sgtable_for_cpu(rdev->cores[core].dev, shmem_obj->sgt, > + rocket_op_to_dma_dir(args->op)); Hmm, the intent of the API is really that the direction for sync should match the direction for map and unmap too; if it was mapped DMA_BIDIRECTIONAL then it should be synced DMA_BIDIRECTIONAL. If you have BOs which are really only used for one-directional purposes then they should be mapped as such at creation. 
Does anything actually prevent one thread from trying to read from a buffer while another thread is writing it, and thus the read unintuitively destroying newly-written data (and/or the write unwittingly destroying its own data in FINI_BO because last_cpu_prep_op got overwritten)? Unless there's a significant measurable benefit to trying to be clever here (of which I'm somewhat doubtful), I would be strongly inclined to just keep things simple and straightforward. Thanks, Robin. > + } > + > + to_rocket_bo(gem_obj)->last_cpu_prep_op = args->op; > + > + drm_gem_object_put(gem_obj); > + > + return ret; > +} > + > +int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file) > +{ > + struct rocket_device *rdev = to_rocket_device(dev); > + struct drm_rocket_fini_bo *args = data; > + struct drm_gem_shmem_object *shmem_obj; > + struct rocket_gem_object *rkt_obj; > + struct drm_gem_object *gem_obj; > + > + if (args->reserved != 0) { > + drm_dbg(dev, "Reserved field in drm_rocket_fini_bo struct should be 0.\n"); > + return -EINVAL; > + } > + > + gem_obj = drm_gem_object_lookup(file, args->handle); > + if (!gem_obj) > + return -ENOENT; > + > + rkt_obj = to_rocket_bo(gem_obj); > + shmem_obj = &rkt_obj->base; > + > + WARN_ON(rkt_obj->last_cpu_prep_op == 0); > + > + for (unsigned int core = 1; core < rdev->num_cores; core++) { > + dma_sync_sgtable_for_device(rdev->cores[core].dev, shmem_obj->sgt, > + rocket_op_to_dma_dir(rkt_obj->last_cpu_prep_op)); > + } > + > + rkt_obj->last_cpu_prep_op = 0; > + > + drm_gem_object_put(gem_obj); > + > + return 0; > +} > diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h > index e8a4d6213fd80419be2ec8af04583a67fb1a4b75..a52a63cd78339a6150b99592ab5f94feeeb51fde 100644 > --- a/drivers/accel/rocket/rocket_gem.h > +++ b/drivers/accel/rocket/rocket_gem.h > @@ -12,12 +12,17 @@ struct rocket_gem_object { > struct iommu_domain *domain; > size_t size; > u32 offset; > + u32 last_cpu_prep_op; > }; > 
> struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size); > > int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file); > > +int rocket_ioctl_prep_bo(struct drm_device *dev, void *data, struct drm_file *file); > + > +int rocket_ioctl_fini_bo(struct drm_device *dev, void *data, struct drm_file *file); > + > static inline > struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj) > { > diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h > index cb1b5934c201160e7650aabd1b3a2b1c77b1fd7b..b5c80dd767be56e9720b51e4a82617a425a881a1 100644 > --- a/include/uapi/drm/rocket_accel.h > +++ b/include/uapi/drm/rocket_accel.h > @@ -13,9 +13,13 @@ extern "C" { > > #define DRM_ROCKET_CREATE_BO 0x00 > #define DRM_ROCKET_SUBMIT 0x01 > +#define DRM_ROCKET_PREP_BO 0x02 > +#define DRM_ROCKET_FINI_BO 0x03 > > #define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo) > #define DRM_IOCTL_ROCKET_SUBMIT DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_SUBMIT, struct drm_rocket_submit) > +#define DRM_IOCTL_ROCKET_PREP_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_PREP_BO, struct drm_rocket_prep_bo) > +#define DRM_IOCTL_ROCKET_FINI_BO DRM_IOW(DRM_COMMAND_BASE + DRM_ROCKET_FINI_BO, struct drm_rocket_fini_bo) > > /** > * struct drm_rocket_create_bo - ioctl argument for creating Rocket BOs. > @@ -39,6 +43,39 @@ struct drm_rocket_create_bo { > __u64 offset; > }; > > +#define ROCKET_PREP_READ 0x01 > +#define ROCKET_PREP_WRITE 0x02 > + > +/** > + * struct drm_rocket_prep_bo - ioctl argument for starting CPU ownership of the BO. > + * > + * Takes care of waiting for any NPU jobs that might still use the NPU and performs cache > + * synchronization. > + */ > +struct drm_rocket_prep_bo { > + /** Input: GEM handle of the buffer object. */ > + __u32 handle; > + > + /** Input: mask of ROCKET_PREP_x, direction of the access. 
*/ > + __u32 op; > + > + /** Input: Amount of time to wait for NPU jobs. */ > + __s64 timeout_ns; > +}; > + > +/** > + * struct drm_rocket_fini_bo - ioctl argument for finishing CPU ownership of the BO. > + * > + * Synchronize caches for NPU access. > + */ > +struct drm_rocket_fini_bo { > + /** Input: GEM handle of the buffer object. */ > + __u32 handle; > + > + /** Reserved, must be zero. */ > + __u32 reserved; > +}; > + > /** > * struct drm_rocket_task - A task to be run on the NPU > * >
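
Robin's "0 or 2+ times" remark about the quoted sync loop can be checked with a small counting model. The sketch below is plain userspace C with invented `toy_*` names and involves no DMA API calls. It only counts how often a loop of the quoted shape (starting at core 1) would run its body, and compares that with syncing once for the single device the buffer was mapped for.

```c
#include <assert.h>

/*
 * Count the iterations of a loop shaped like the quoted one:
 * for (core = 1; core < num_cores; core++).
 */
static unsigned int toy_syncs_per_core_loop(unsigned int num_cores)
{
	unsigned int calls = 0;

	for (unsigned int core = 1; core < num_cores; core++)
		calls++;

	return calls;
}

/*
 * Syncing once for the device that owns the mapping does not depend
 * on how many cores are enabled.
 */
static unsigned int toy_syncs_for_mapping_device(unsigned int num_cores)
{
	(void)num_cores;
	return 1;
}
```

With one core enabled the loop never syncs at all, and with three cores it syncs twice, in both cases for devices other than core 0. Which device the buffer is actually mapped for is not visible in this patch, so that part remains an open question from the review.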

6 days, 11 hours

Re: [PATCH v7 04/10] accel/rocket: Add job submission IOCTL

by Robin Murphy

On 2025-06-06 7:28 am, Tomeu Vizoso wrote: [...] > diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h > index 10acfe8534f00a7985d40a93f4b2f7f69d43caee..50e46f0516bd1615b5f826c5002a6c0ecbf9aed4 100644 > --- a/drivers/accel/rocket/rocket_device.h > +++ b/drivers/accel/rocket/rocket_device.h > @@ -13,6 +13,8 @@ > struct rocket_device { > struct drm_device ddev; > > + struct mutex sched_lock; > + > struct mutex iommu_lock; Just realised I missed this in the last patch, but iommu_lock appears to be completely unnecessary now. > struct rocket_core *cores; [...] > +static void rocket_job_hw_submit(struct rocket_core *core, struct rocket_job *job) > +{ > + struct rocket_task *task; > + bool task_pp_en = 1; > + bool task_count = 1; > + > + /* GO ! */ > + > + /* Don't queue the job if a reset is in progress */ > + if (atomic_read(&core->reset.pending)) > + return; > + > + task = &job->tasks[job->next_task_idx]; > + job->next_task_idx++; > + > + rocket_pc_writel(core, BASE_ADDRESS, 0x1); > + > + rocket_cna_writel(core, S_POINTER, 0xe + 0x10000000 * core->index); > + rocket_core_writel(core, S_POINTER, 0xe + 0x10000000 * core->index); Those really look like bitfield operations rather than actual arithmetic to me. 
> + > + rocket_pc_writel(core, BASE_ADDRESS, task->regcmd); I don't see how regcmd is created (I guess that's in userspace?), but given that it's explicitly u64 all the way through - and especially since you claim to support 40-bit DMA addresses - it definitely seems suspicious that the upper 32 bits never seem to be consumed anywhere :/ > + rocket_pc_writel(core, REGISTER_AMOUNTS, (task->regcmd_count + 1) / 2 - 1); > + > + rocket_pc_writel(core, INTERRUPT_MASK, PC_INTERRUPT_MASK_DPU_0 | PC_INTERRUPT_MASK_DPU_1); > + rocket_pc_writel(core, INTERRUPT_CLEAR, PC_INTERRUPT_CLEAR_DPU_0 | PC_INTERRUPT_CLEAR_DPU_1); > + > + rocket_pc_writel(core, TASK_CON, ((0x6 | task_pp_en) << 12) | task_count); > + > + rocket_pc_writel(core, TASK_DMA_BASE_ADDR, 0x0); > + > + rocket_pc_writel(core, OPERATION_ENABLE, 0x1); > + > + dev_dbg(core->dev, "Submitted regcmd at 0x%llx to core %d", task->regcmd, core->index); > +} [...] > +static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job) > +{ > + struct rocket_job *job = to_rocket_job(sched_job); > + struct rocket_device *rdev = job->rdev; > + struct rocket_core *core = sched_to_core(rdev, sched_job->sched); > + struct dma_fence *fence = NULL; > + int ret; > + > + if (unlikely(job->base.s_fence->finished.error)) > + return NULL; > + > + /* > + * Nothing to execute: can happen if the job has finished while > + * we were resetting the GPU. GPU? (Similarly in various other comments/prints) > + */ > + if (job->next_task_idx == job->task_count) > + return NULL; > + > + fence = rocket_fence_create(core); > + if (IS_ERR(fence)) > + return fence; > + > + if (job->done_fence) > + dma_fence_put(job->done_fence); > + job->done_fence = dma_fence_get(fence); > + > + ret = pm_runtime_get_sync(core->dev); > + if (ret < 0) > + return fence; > + > + ret = iommu_attach_group(job->domain, iommu_group_get(core->dev)); I don't see iommu_group_put() anywhere, so you're leaking refcounts all over. 
> + if (ret < 0) > + return fence; > + > + scoped_guard(spinlock, &core->job_lock) { > + core->in_flight_job = job; > + rocket_job_hw_submit(core, job); > + } > + > + return fence; > +} [...] > +static void rocket_job_handle_irq(struct rocket_core *core) > +{ > + u32 status, raw_status; > + > + pm_runtime_mark_last_busy(core->dev); > + > + status = rocket_pc_readl(core, INTERRUPT_STATUS); > + raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS); > + > + rocket_pc_writel(core, OPERATION_ENABLE, 0x0); > + rocket_pc_writel(core, INTERRUPT_CLEAR, 0x1ffff); What was the point of reading the status registers if we're just going to blindly clear every possible condition anyway? > + scoped_guard(spinlock, &core->job_lock) > + if (core->in_flight_job) > + rocket_job_handle_done(core, core->in_flight_job); But then is it really OK to just start the next task regardless of whether the current task was reporting successful completion or an error? > +} > + > +static void > +rocket_reset(struct rocket_core *core, struct drm_sched_job *bad) > +{ > + bool cookie; > + > + if (!atomic_read(&core->reset.pending)) > + return; > + > + /* > + * Stop the scheduler. > + * > + * FIXME: We temporarily get out of the dma_fence_signalling section > + * because the cleanup path generate lockdep splats when taking locks > + * to release job resources. We should rework the code to follow this > + * pattern: > + * > + * try_lock > + * if (locked) > + * release > + * else > + * schedule_work_to_release_later > + */ > + drm_sched_stop(&core->sched, bad); > + > + cookie = dma_fence_begin_signalling(); > + > + if (bad) > + drm_sched_increase_karma(bad); > + > + /* > + * Mask job interrupts and synchronize to make sure we won't be > + * interrupted during our reset. > + */ > + rocket_pc_writel(core, INTERRUPT_MASK, 0x0); > + synchronize_irq(core->irq); ...except it's a shared IRQ, so it can still merrily fire at any time. > + > + /* Handle the remaining interrupts before we reset. 
*/ > + rocket_job_handle_irq(core); > + > + /* > + * Remaining interrupts have been handled, but we might still have > + * stuck jobs. Let's make sure the PM counters stay balanced by > + * manually calling pm_runtime_put_noidle() and > + * rocket_devfreq_record_idle() for each stuck job. > + * Let's also make sure the cycle counting register's refcnt is > + * kept balanced to prevent it from running forever Comments that don't match the code are more confusing than helpful :/ > + */ > + scoped_guard(spinlock, &core->job_lock) { > + if (core->in_flight_job) > + pm_runtime_put_noidle(core->dev); > + > + core->in_flight_job = NULL; > + } > + > + /* Proceed with reset now. */ > + pm_runtime_force_suspend(core->dev); > + pm_runtime_force_resume(core->dev); Can you guarantee that actually resets the hardware if something else is holding the power domain open or RPM is disabled? I'm not familiar with the details of drm_sched, but if there are other jobs queued behind the stuck one would it even pass the rocket_job_is_idle() check for suspend to succeed anyway? Not to mention that you have an actual reset control in the DT binding, which isn't even optional... :/ > + /* GPU has been reset, we can clear the reset pending bit. */ > + atomic_set(&core->reset.pending, 0); > + > + /* > + * Now resubmit jobs that were previously queued but didn't have a > + * chance to finish. > + * FIXME: We temporarily get out of the DMA fence signalling section > + * while resubmitting jobs because the job submission logic will > + * allocate memory with the GFP_KERNEL flag which can trigger memory > + * reclaim and exposes a lock ordering issue. > + */ > + dma_fence_end_signalling(cookie); > + drm_sched_resubmit_jobs(&core->sched); Since I happened to look, this says it's deprecated? 
> + cookie = dma_fence_begin_signalling(); > + > + /* Restart the scheduler */ > + drm_sched_start(&core->sched, 0); > + > + dma_fence_end_signalling(cookie); > +} > + > +static enum drm_gpu_sched_stat rocket_job_timedout(struct drm_sched_job *sched_job) > +{ > + struct rocket_job *job = to_rocket_job(sched_job); > + struct rocket_device *rdev = job->rdev; > + struct rocket_core *core = sched_to_core(rdev, sched_job->sched); > + > + /* > + * If the GPU managed to complete this jobs fence, the timeout is > + * spurious. Bail out. > + */ > + if (dma_fence_is_signaled(job->done_fence)) > + return DRM_GPU_SCHED_STAT_NOMINAL; Do we really need the same return condition twice? What if the IRQ fires immediately after we've made this check, and is handled without delay such that synchronize_irq() effectively still does nothing? Either way we've taken longer than the timeout value to observe the job completing successfully, and either that's significant and worth warning about or it's not - I don't see any point in trying to (inaccurately) nitpick *why* it might have happened. > + /* > + * Rocket IRQ handler may take a long time to process an interrupt > + * if there is another IRQ handler hogging the processing. > + * For example, the HDMI encoder driver might be stuck in the IRQ > + * handler for a significant time in a case of bad cable connection. What have HDMI cables got to do with anything here? Yes, in general IRQ latency can be high, since CPUs can have IRQs masked and/or be taking higher-priority interrupts for any number of reasons. I don't see how an oddly-specific example (of apparently poor driver design, to boot) is useful. > + * In order to catch such cases and not report spurious rocket > + * job timeouts, synchronize the IRQ handler and re-check the fence > + * status. 
> + */ > + synchronize_irq(core->irq); > + > + if (dma_fence_is_signaled(job->done_fence)) { > + dev_warn(core->dev, "unexpectedly high interrupt latency\n"); > + return DRM_GPU_SCHED_STAT_NOMINAL; > + } > + > + dev_err(core->dev, "gpu sched timeout"); > + > + atomic_set(&core->reset.pending, 1); > + rocket_reset(core, sched_job); > + iommu_detach_group(NULL, iommu_group_get(core->dev)); > + > + return DRM_GPU_SCHED_STAT_NOMINAL; > +} > + > +static void rocket_reset_work(struct work_struct *work) > +{ > + struct rocket_core *core; > + > + core = container_of(work, struct rocket_core, reset.work); > + rocket_reset(core, NULL); > +} > + > +static const struct drm_sched_backend_ops rocket_sched_ops = { > + .run_job = rocket_job_run, > + .timedout_job = rocket_job_timedout, > + .free_job = rocket_job_free > +}; > + > +static irqreturn_t rocket_job_irq_handler_thread(int irq, void *data) > +{ > + struct rocket_core *core = data; > + > + rocket_job_handle_irq(core); > + > + return IRQ_HANDLED; > +} > + > +static irqreturn_t rocket_job_irq_handler(int irq, void *data) > +{ > + struct rocket_core *core = data; > + u32 raw_status = rocket_pc_readl(core, INTERRUPT_RAW_STATUS); Given that this can be a shared IRQ as above, it would be a good idea to take care to avoid register accesses while suspended. Especially if you're trying to utilise suspend to reset a failing job that may well be throwing IOMMU faults. 
> + > + WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR); > + WARN_ON(raw_status & PC_INTERRUPT_RAW_STATUS_DMA_READ_ERROR); > + > + if (!(raw_status & PC_INTERRUPT_RAW_STATUS_DPU_0 || > + raw_status & PC_INTERRUPT_RAW_STATUS_DPU_1)) > + return IRQ_NONE; > + > + rocket_pc_writel(core, INTERRUPT_MASK, 0x0); > + > + return IRQ_WAKE_THREAD; > +} > + > +int rocket_job_init(struct rocket_core *core) > +{ > + struct drm_sched_init_args args = { > + .ops = &rocket_sched_ops, > + .num_rqs = DRM_SCHED_PRIORITY_COUNT, > + .credit_limit = 1, Ah, does this mean that all the stuff about queued jobs was in fact all nonsense anyway? > + .timeout = msecs_to_jiffies(JOB_TIMEOUT_MS), > + .name = dev_name(core->dev), > + .dev = core->dev, > + }; > + int ret; > + > + INIT_WORK(&core->reset.work, rocket_reset_work); > + spin_lock_init(&core->job_lock); > + > + core->irq = platform_get_irq(to_platform_device(core->dev), 0); > + if (core->irq < 0) > + return core->irq; > + > + ret = devm_request_threaded_irq(core->dev, core->irq, > + rocket_job_irq_handler, > + rocket_job_irq_handler_thread, > + IRQF_SHARED, KBUILD_MODNAME "-job", Is it really a "job" interrupt though? The binding and the register definitions suggest it's just a general status interrupt for the core. Furthermore since we expect to have multiple cores, being able to more easily identify and attribute per-core IRQ activity seems more useful for debugging than copy-pasting from something really rather different which also expects to be the only one of its kind on the system. Thanks, Robin. > + core); > + if (ret) { > + dev_err(core->dev, "failed to request job irq"); > + return ret; > + }
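Two of the review points above (the S_POINTER value looking like bitfields rather than arithmetic, and the upper 32 bits of the u64 regcmd address never being consumed) can be illustrated with a small standalone sketch. Everything named here is an assumption made for illustration: the macro names and the idea that the core index lives in bits 31:28 are guesses from the constants in the patch, not taken from any register documentation, and a kernel driver would use lower_32_bits()/upper_32_bits() and return an error rather than assert.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical field names; only the constants come from the patch. */
#define S_POINTER_LOW_BITS   0xeu /* bits 3:1, the patch's 0xe */
#define S_POINTER_CORE_SHIFT 28   /* 0x10000000 == 1 << 28 */

/* The expression as written in the patch. */
static uint32_t s_pointer_arith(uint32_t core_index)
{
	return 0xe + 0x10000000 * core_index;
}

/* The same value expressed as bitfield composition. */
static uint32_t s_pointer_bits(uint32_t core_index)
{
	assert(core_index < 16); /* index must fit in the 4-bit field */
	return S_POINTER_LOW_BITS | (core_index << S_POINTER_CORE_SHIFT);
}

/* Userspace stand-ins for the kernel's lower/upper 32-bit helpers. */
static uint32_t lower_32(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }
static uint32_t upper_32(uint64_t v) { return (uint32_t)(v >> 32); }

/* True if a 64-bit DMA address survives a 32-bit register write. */
static bool fits_in_32_bits(uint64_t addr)
{
	return upper_32(addr) == 0;
}

/* Value that would reach a 32-bit BASE_ADDRESS register; refuses
 * addresses whose upper bits would be silently dropped. */
static uint32_t base_address_value(uint64_t regcmd)
{
	assert(fits_in_32_bits(regcmd));
	return lower_32(regcmd);
}
```

With a 40-bit DMA mask, an address such as 0x123456789 would fail the fits_in_32_bits() check, which is the situation the review is worried about.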

6 days, 14 hours

Re: [PATCH v7 06/10] dt-bindings: npu: rockchip,rknn: Add bindings

by Robin Murphy

On 2025-06-06 7:28 am, Tomeu Vizoso wrote: > Add the bindings for the Neural Processing Unit IP from Rockchip. > > v2: > - Adapt to new node structure (one node per core, each with its own > IOMMU) > - Several misc. fixes from Sebastian Reichel > > v3: > - Split register block in its constituent subblocks, and only require > the ones that the kernel would ever use (Nicolas Frattaroli) > - Group supplies (Rob Herring) > - Explain the way in which the top core is special (Rob Herring) > > v4: > - Change required node name to npu@ (Rob Herring and Krzysztof Kozlowski) > - Remove unneeded items: (Krzysztof Kozlowski) > - Fix use of minItems/maxItems (Krzysztof Kozlowski) > - Add reg-names to list of required properties (Krzysztof Kozlowski) > - Fix example (Krzysztof Kozlowski) > > v5: > - Rename file to rockchip,rk3588-rknn-core.yaml (Krzysztof Kozlowski) > - Streamline compatible property (Krzysztof Kozlowski) > > v6: > - Remove mention to NVDLA, as the hardware is only incidentally related > (Kever Yang) > - Mark pclk and npu clocks as required by all clocks (Rob Herring) > > v7: > - Remove allOf section, not needed now that all nodes require 4 clocks > (Heiko Stübner) > > Signed-off-by: Sebastian Reichel <sebastian.reichel(a)collabora.com> > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org> > --- > .../bindings/npu/rockchip,rk3588-rknn-core.yaml | 118 +++++++++++++++++++++ > 1 file changed, 118 insertions(+) > > diff --git a/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml > new file mode 100644 > index 0000000000000000000000000000000000000000..0588c085a723a34f4fa30a9680ea948d960b092f > --- /dev/null > +++ b/Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml > @@ -0,0 +1,118 @@ > +# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause) > +%YAML 1.2 > +--- > +$id: 
http://devicetree.org/schemas/npu/rockchip,rk3588-rknn-core.yaml# > +$schema: http://devicetree.org/meta-schemas/core.yaml# > + > +title: Neural Processing Unit IP from Rockchip > + > +maintainers: > + - Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > + > +description: > + Rockchip IP for accelerating inference of neural networks. > + > + There is to be a node per each core in the NPU. In Rockchip's design there > + will be one core that is special because it is able to redistribute work to > + the other cores by forwarding register writes and sharing data. This special > + core is called the top core and should have the compatible string that > + corresponds to top cores. Say a future SoC, for scaling reasons, puts down two or more whole NPUs rather than just increasing the number of sub-cores in one? How is a DT consumer then going to know which "cores" are associated with which "top cores"? I think at the very least they want phandles in one direction or the other, but if there is a real functional hierarchy then I'd be strongly tempted to have the "core" nodes as children of their "top core", particularly since "forwarding register writes" sounds absolutely like something which could justify being represented as a "bus" in the DT sense. Thanks, Robin. 
> + > +properties: > + $nodename: > + pattern: '^npu@[a-f0-9]+$' > + > + compatible: > + enum: > + - rockchip,rk3588-rknn-core-top > + - rockchip,rk3588-rknn-core > + > + reg: > + maxItems: 3 > + > + reg-names: > + items: > + - const: pc > + - const: cna > + - const: core > + > + clocks: > + maxItems: 4 > + > + clock-names: > + items: > + - const: aclk > + - const: hclk > + - const: npu > + - const: pclk > + > + interrupts: > + maxItems: 1 > + > + iommus: > + maxItems: 1 > + > + npu-supply: true > + > + power-domains: > + maxItems: 1 > + > + resets: > + maxItems: 2 > + > + reset-names: > + items: > + - const: srst_a > + - const: srst_h > + > + sram-supply: true > + > +required: > + - compatible > + - reg > + - reg-names > + - clocks > + - clock-names > + - interrupts > + - iommus > + - power-domains > + - resets > + - reset-names > + - npu-supply > + - sram-supply > + > +additionalProperties: false > + > +examples: > + - | > + #include <dt-bindings/clock/rockchip,rk3588-cru.h> > + #include <dt-bindings/interrupt-controller/irq.h> > + #include <dt-bindings/interrupt-controller/arm-gic.h> > + #include <dt-bindings/power/rk3588-power.h> > + #include <dt-bindings/reset/rockchip,rk3588-cru.h> > + > + bus { > + #address-cells = <2>; > + #size-cells = <2>; > + > + npu@fdab0000 { > + compatible = "rockchip,rk3588-rknn-core-top"; > + reg = <0x0 0xfdab0000 0x0 0x1000>, > + <0x0 0xfdab1000 0x0 0x1000>, > + <0x0 0xfdab3000 0x0 0x1000>; > + reg-names = "pc", "cna", "core"; > + assigned-clocks = <&scmi_clk SCMI_CLK_NPU>; > + assigned-clock-rates = <200000000>; > + clocks = <&cru ACLK_NPU0>, <&cru HCLK_NPU0>, > + <&scmi_clk SCMI_CLK_NPU>, <&cru PCLK_NPU_ROOT>; > + clock-names = "aclk", "hclk", "npu", "pclk"; > + interrupts = <GIC_SPI 110 IRQ_TYPE_LEVEL_HIGH 0>; > + iommus = <&rknn_mmu_top>; > + npu-supply = <&vdd_npu_s0>; > + power-domains = <&power RK3588_PD_NPUTOP>; > + resets = <&cru SRST_A_RKNN0>, <&cru SRST_H_RKNN0>; > + reset-names = "srst_a", "srst_h"; > + sram-supply = 
<&vdd_npu_mem_s0>; > + }; > + }; > +... >

6 days, 15 hours

Re: [PATCH v7 03/10] accel/rocket: Add IOCTL for BO creation

by Robin Murphy

On 2025-06-06 7:28 am, Tomeu Vizoso wrote: > This uses the SHMEM DRM helpers and we map right away to the CPU and NPU > sides, as all buffers are expected to be accessed from both. > > v2: > - Sync the IOMMUs for the other cores when mapping and unmapping. > > v3: > - Make use of GPL-2.0-only for the copyright notice (Jeff Hugo) > > v6: > - Use mutexes guard (Markus Elfring) > > v7: > - Assign its own IOMMU domain to each client, for isolation (Daniel > Stone and Robin Murphy) > > Reviewed-by: Jeffrey Hugo <quic_jhugo(a)quicinc.com> > Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net> > --- > drivers/accel/rocket/Makefile | 3 +- > drivers/accel/rocket/rocket_device.c | 4 ++ > drivers/accel/rocket/rocket_device.h | 2 + > drivers/accel/rocket/rocket_drv.c | 7 ++- > drivers/accel/rocket/rocket_gem.c | 115 +++++++++++++++++++++++++++++++++++ > drivers/accel/rocket/rocket_gem.h | 27 ++++++++ > include/uapi/drm/rocket_accel.h | 44 ++++++++++++++ > 7 files changed, 200 insertions(+), 2 deletions(-) > > diff --git a/drivers/accel/rocket/Makefile b/drivers/accel/rocket/Makefile > index abdd75f2492eaecf8bf5e78a2ac150ea19ac3e96..4deef267f9e1238c4d8bd108dcc8afd9dc8b2b8f 100644 > --- a/drivers/accel/rocket/Makefile > +++ b/drivers/accel/rocket/Makefile > @@ -5,4 +5,5 @@ obj-$(CONFIG_DRM_ACCEL_ROCKET) := rocket.o > rocket-y := \ > rocket_core.o \ > rocket_device.o \ > - rocket_drv.o > + rocket_drv.o \ > + rocket_gem.o > diff --git a/drivers/accel/rocket/rocket_device.c b/drivers/accel/rocket/rocket_device.c > index a05c103e117e3eaa6439884b7acb6e3483296edb..5e559104741af22c528914c96e44558323ab6c89 100644 > --- a/drivers/accel/rocket/rocket_device.c > +++ b/drivers/accel/rocket/rocket_device.c > @@ -4,6 +4,7 @@ > #include <linux/array_size.h> > #include <linux/clk.h> > #include <linux/dev_printk.h> > +#include <linux/mutex.h> > > #include "rocket_device.h" > > @@ -16,10 +17,13 @@ int rocket_device_init(struct rocket_device *rdev) > if (err) > return err; > > + 
mutex_init(&rdev->iommu_lock); > + > return 0; > } > > void rocket_device_fini(struct rocket_device *rdev) > { > + mutex_destroy(&rdev->iommu_lock); > rocket_core_fini(&rdev->cores[0]); > } > diff --git a/drivers/accel/rocket/rocket_device.h b/drivers/accel/rocket/rocket_device.h > index b5d5f1479d56e2fde59bbcad9de2b58cef9a9a4d..10acfe8534f00a7985d40a93f4b2f7f69d43caee 100644 > --- a/drivers/accel/rocket/rocket_device.h > +++ b/drivers/accel/rocket/rocket_device.h > @@ -13,6 +13,8 @@ > struct rocket_device { > struct drm_device ddev; > > + struct mutex iommu_lock; > + > struct rocket_core *cores; > unsigned int num_cores; > }; > diff --git a/drivers/accel/rocket/rocket_drv.c b/drivers/accel/rocket/rocket_drv.c > index b38a5c6264cb4e74d5e381adaeba1426e576fa56..2b8a88db20c408f313f4f4fe36b051c9d5e4829b 100644 > --- a/drivers/accel/rocket/rocket_drv.c > +++ b/drivers/accel/rocket/rocket_drv.c > @@ -6,6 +6,7 @@ > #include <drm/drm_gem.h> > #include <drm/drm_ioctl.h> > #include <drm/drm_of.h> > +#include <drm/rocket_accel.h> > #include <linux/array_size.h> > #include <linux/clk.h> > #include <linux/component.h> > @@ -16,6 +17,7 @@ > #include <linux/pm_runtime.h> > > #include "rocket_drv.h" > +#include "rocket_gem.h" > > static int > rocket_open(struct drm_device *dev, struct drm_file *file) > @@ -46,6 +48,8 @@ rocket_postclose(struct drm_device *dev, struct drm_file *file) > static const struct drm_ioctl_desc rocket_drm_driver_ioctls[] = { > #define ROCKET_IOCTL(n, func) \ > DRM_IOCTL_DEF_DRV(ROCKET_##n, rocket_ioctl_##func, 0) > + > + ROCKET_IOCTL(CREATE_BO, create_bo), > }; > > DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops); > @@ -55,9 +59,10 @@ DEFINE_DRM_ACCEL_FOPS(rocket_accel_driver_fops); > * - 1.0 - initial interface > */ > static const struct drm_driver rocket_drm_driver = { > - .driver_features = DRIVER_COMPUTE_ACCEL, > + .driver_features = DRIVER_COMPUTE_ACCEL | DRIVER_GEM, > .open = rocket_open, > .postclose = rocket_postclose, > + .gem_create_object = 
rocket_gem_create_object, > .ioctls = rocket_drm_driver_ioctls, > .num_ioctls = ARRAY_SIZE(rocket_drm_driver_ioctls), > .fops = &rocket_accel_driver_fops, > diff --git a/drivers/accel/rocket/rocket_gem.c b/drivers/accel/rocket/rocket_gem.c > new file mode 100644 > index 0000000000000000000000000000000000000000..61b7f970a6885aa13784daa1222611a02aa10dee > --- /dev/null > +++ b/drivers/accel/rocket/rocket_gem.c > @@ -0,0 +1,115 @@ > +// SPDX-License-Identifier: GPL-2.0-only > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu(a)tomeuvizoso.net> */ > + > +#include <drm/drm_device.h> > +#include <drm/drm_utils.h> > +#include <drm/rocket_accel.h> > +#include <linux/dma-mapping.h> > +#include <linux/iommu.h> > + > +#include "rocket_device.h" > +#include "rocket_drv.h" > +#include "rocket_gem.h" > + > +static void rocket_gem_bo_free(struct drm_gem_object *obj) > +{ > + struct rocket_device *rdev = to_rocket_device(obj->dev); > + struct rocket_gem_object *bo = to_rocket_bo(obj); > + size_t unmapped; > + > + drm_WARN_ON(obj->dev, bo->base.pages_use_count > 1); > + > + guard(mutex)(&rdev->iommu_lock); > + > + unmapped = iommu_unmap(bo->domain, bo->base.sgt->sgl->dma_address, bo->size); > + drm_WARN_ON(obj->dev, unmapped != bo->size); > + > + /* This will unmap the pages from the IOMMU linked to core 0 */ This means "DMA-unmap the pages", right? If things have been done correctly then the iommu_unmap() above will already have removed the actual translation all cores' IOMMUs were using. 
> + drm_gem_shmem_free(&bo->base); > +} > + > +static const struct drm_gem_object_funcs rocket_gem_funcs = { > + .free = rocket_gem_bo_free, > + .print_info = drm_gem_shmem_object_print_info, > + .pin = drm_gem_shmem_object_pin, > + .unpin = drm_gem_shmem_object_unpin, > + .get_sg_table = drm_gem_shmem_object_get_sg_table, > + .vmap = drm_gem_shmem_object_vmap, > + .vunmap = drm_gem_shmem_object_vunmap, > + .mmap = drm_gem_shmem_object_mmap, > + .vm_ops = &drm_gem_shmem_vm_ops, > +}; > + > +struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size) > +{ > + struct rocket_gem_object *obj; > + > + obj = kzalloc(sizeof(*obj), GFP_KERNEL); > + if (!obj) > + return ERR_PTR(-ENOMEM); > + > + obj->base.base.funcs = &rocket_gem_funcs; > + > + return &obj->base.base; > +} > + > +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file) > +{ > + struct rocket_file_priv *rocket_priv = file->driver_priv; > + struct drm_rocket_create_bo *args = data; > + struct rocket_device *rdev = to_rocket_device(dev); > + struct drm_gem_shmem_object *shmem_obj; > + struct rocket_gem_object *rkt_obj; > + struct drm_gem_object *gem_obj; > + struct sg_table *sgt; > + int ret; > + > + shmem_obj = drm_gem_shmem_create(dev, args->size); > + if (IS_ERR(shmem_obj)) > + return PTR_ERR(shmem_obj); > + > + gem_obj = &shmem_obj->base; > + rkt_obj = to_rocket_bo(gem_obj); > + > + rkt_obj->domain = rocket_priv->domain; > + rkt_obj->size = args->size; > + rkt_obj->offset = 0; > + > + ret = drm_gem_handle_create(file, gem_obj, &args->handle); > + drm_gem_object_put(gem_obj); > + > + guard(mutex)(&rdev->iommu_lock); > + > + if (ret) > + goto err; > + > + sgt = drm_gem_shmem_get_pages_sgt(shmem_obj); > + if (IS_ERR(sgt)) { > + ret = PTR_ERR(sgt); > + goto err; > + } > + > + ret = iommu_map_sgtable(rocket_priv->domain, > + shmem_obj->sgt->sgl->dma_address, Is this expected to be a DMA address implicitly generated by the dma_map_sg() in 
drm_gem_shmem_get_pages_sgt()? I would strongly recommend against relying on that - at the moment it happens that iommu-dma still does complete dma_map_* operations in the unattached DMA ops domain, mostly redundantly, but I've long been meaning to optimise that so that it only performs any necessary cache maintenance on the underlying memory when the caller is already using their own IOMMU domain. At that point the returned DMA address is likely to just be the PA, and this tactic probably won't work. > + shmem_obj->sgt, > + IOMMU_READ | IOMMU_WRITE); > + if (ret < 0 || ret < args->size) { > + drm_err(dev, "failed to map buffer: size=%d request_size=%u\n", > + ret, args->size); > + ret = -ENOMEM; > + goto err; > + } > + > + /* iommu_map_sgtable might have aligned the size */ > + rkt_obj->size = ret; > + dma_sync_sgtable_for_device(dev->dev, shmem_obj->sgt, DMA_BIDIRECTIONAL); What's this for? The buffer is already in for_device state when it initially comes back from get_pages_sgt, and hasn't even been touched yet anyway. Thanks, Robin. 
> + args->offset = drm_vma_node_offset_addr(&gem_obj->vma_node); > + args->dma_address = sg_dma_address(shmem_obj->sgt->sgl); > + > + return 0; > + > +err: > + drm_gem_shmem_object_free(gem_obj); > + > + return ret; > +} > diff --git a/drivers/accel/rocket/rocket_gem.h b/drivers/accel/rocket/rocket_gem.h > new file mode 100644 > index 0000000000000000000000000000000000000000..e8a4d6213fd80419be2ec8af04583a67fb1a4b75 > --- /dev/null > +++ b/drivers/accel/rocket/rocket_gem.h > @@ -0,0 +1,27 @@ > +/* SPDX-License-Identifier: GPL-2.0-only */ > +/* Copyright 2024-2025 Tomeu Vizoso <tomeu(a)tomeuvizoso.net> */ > + > +#ifndef __ROCKET_GEM_H__ > +#define __ROCKET_GEM_H__ > + > +#include <drm/drm_gem_shmem_helper.h> > + > +struct rocket_gem_object { > + struct drm_gem_shmem_object base; > + > + struct iommu_domain *domain; > + size_t size; > + u32 offset; > +}; > + > +struct drm_gem_object *rocket_gem_create_object(struct drm_device *dev, size_t size); > + > +int rocket_ioctl_create_bo(struct drm_device *dev, void *data, struct drm_file *file); > + > +static inline > +struct rocket_gem_object *to_rocket_bo(struct drm_gem_object *obj) > +{ > + return container_of(to_drm_gem_shmem_obj(obj), struct rocket_gem_object, base); > +} > + > +#endif > diff --git a/include/uapi/drm/rocket_accel.h b/include/uapi/drm/rocket_accel.h > new file mode 100644 > index 0000000000000000000000000000000000000000..95720702b7c4413d72b89c1f0f59abb22dc8c6b3 > --- /dev/null > +++ b/include/uapi/drm/rocket_accel.h > @@ -0,0 +1,44 @@ > +/* SPDX-License-Identifier: MIT */ > +/* > + * Copyright © 2024 Tomeu Vizoso > + */ > +#ifndef __DRM_UAPI_ROCKET_ACCEL_H__ > +#define __DRM_UAPI_ROCKET_ACCEL_H__ > + > +#include "drm.h" > + > +#if defined(__cplusplus) > +extern "C" { > +#endif > + > +#define DRM_ROCKET_CREATE_BO 0x00 > + > +#define DRM_IOCTL_ROCKET_CREATE_BO DRM_IOWR(DRM_COMMAND_BASE + DRM_ROCKET_CREATE_BO, struct drm_rocket_create_bo) > + > +/** > + * struct drm_rocket_create_bo - ioctl argument for 
creating Rocket BOs. > + * > + */ > +struct drm_rocket_create_bo { > + /** Input: Size of the requested BO. */ > + __u32 size; > + > + /** Output: GEM handle for the BO. */ > + __u32 handle; > + > + /** > + * Output: DMA address for the BO in the NPU address space. This address > + * is private to the DRM fd and is valid for the lifetime of the GEM > + * handle. > + */ > + __u64 dma_address; > + > + /** Output: Offset into the drm node to use for subsequent mmap call. */ > + __u64 offset; > +}; > + > +#if defined(__cplusplus) > +} > +#endif > + > +#endif /* __DRM_UAPI_ROCKET_ACCEL_H__ */ >
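The size handling in rocket_ioctl_create_bo() above ("iommu_map_sgtable might have aligned the size", plus the `ret < 0 || ret < args->size` check) can be sketched in isolation. This is a simplified model, not the driver's code: it assumes the backing object is page-granular with 4 KiB pages, and both helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages; the real value is PAGE_SIZE in the kernel. */
#define SKETCH_PAGE_SIZE 4096u

/* Round a requested size up to the next page boundary. */
static size_t page_align(size_t size)
{
	/* Guard against wrap-around in the addition below. */
	assert(size <= (size_t)-1 - (SKETCH_PAGE_SIZE - 1));
	return (size + SKETCH_PAGE_SIZE - 1) & ~((size_t)SKETCH_PAGE_SIZE - 1);
}

/* Mirrors the patch's check: a negative value is an error code, and a
 * mapping shorter than the request counts as a failure. */
static int mapping_covers_request(long mapped, size_t requested)
{
	return mapped >= 0 && (size_t)mapped >= requested;
}
```

Under these assumptions a 4097-byte request ends up as an 8192-byte mapping, which is why the patch stores the returned size back into the object instead of keeping the size userspace asked for.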

6 days, 18 hours

Re: [PATCH v10 7/9] optee: support protected memory allocation

by Jens Wiklander

Hi Amir, On Tue, Jun 24, 2025 at 8:54 AM Amirreza Zarrabi <amirreza.zarrabi(a)oss.qualcomm.com> wrote: > > Hi Jens, > > On 6/10/2025 11:13 PM, Jens Wiklander wrote: > > Add support in the OP-TEE backend driver for protected memory > > allocation. The support is limited to only the SMC ABI and for secure > > video buffers. > > > > OP-TEE is probed for the range of protected physical memory and a > > memory pool allocator is initialized if OP-TEE have support for such > > memory. > > > > Signed-off-by: Jens Wiklander <jens.wiklander(a)linaro.org> > > --- > > drivers/tee/optee/Kconfig | 5 +++ > > drivers/tee/optee/core.c | 10 +++++ > > drivers/tee/optee/optee_private.h | 2 + > > drivers/tee/optee/smc_abi.c | 70 ++++++++++++++++++++++++++++++- > > 4 files changed, 85 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/tee/optee/Kconfig b/drivers/tee/optee/Kconfig > > index 7bb7990d0b07..50d2051f7f20 100644 > > --- a/drivers/tee/optee/Kconfig > > +++ b/drivers/tee/optee/Kconfig > > @@ -25,3 +25,8 @@ config OPTEE_INSECURE_LOAD_IMAGE > > > > Additional documentation on kernel security risks are at > > Documentation/tee/op-tee.rst. > > + > > +config OPTEE_STATIC_PROTMEM_POOL > > + bool > > + depends on HAS_IOMEM && TEE_DMABUF_HEAPS > > + default y > > diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c > > index c75fddc83576..4b14a7ac56f9 100644 > > --- a/drivers/tee/optee/core.c > > +++ b/drivers/tee/optee/core.c > > @@ -56,6 +56,15 @@ int optee_rpmb_intf_rdev(struct notifier_block *intf, unsigned long action, > > return 0; > > } > > > > +int optee_set_dma_mask(struct optee *optee, u_int pa_width) > > +{ > > + u64 mask = DMA_BIT_MASK(min(64, pa_width)); > > + > > nit: Why not dma_coerce_mask_and_coherent() instead of below? Good point, I'll update in the next version. 
Thanks, Jens > > - Amir > > > + optee->teedev->dev.dma_mask = &optee->teedev->dev.coherent_dma_mask; > > + > > + return dma_set_mask_and_coherent(&optee->teedev->dev, mask); > > +} > > + > > static void optee_bus_scan(struct work_struct *work) > > { > > WARN_ON(optee_enumerate_devices(PTA_CMD_GET_DEVICES_SUPP)); > > @@ -181,6 +190,7 @@ void optee_remove_common(struct optee *optee) > > tee_device_unregister(optee->supp_teedev); > > tee_device_unregister(optee->teedev); > > > > + tee_device_unregister_all_dma_heaps(optee->teedev); > > tee_shm_pool_free(optee->pool); > > optee_supp_uninit(&optee->supp); > > mutex_destroy(&optee->call_queue.mutex); > > diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h > > index dc0f355ef72a..5e3c34802121 100644 > > --- a/drivers/tee/optee/optee_private.h > > +++ b/drivers/tee/optee/optee_private.h > > @@ -272,6 +272,8 @@ struct optee_call_ctx { > > > > extern struct blocking_notifier_head optee_rpmb_intf_added; > > > > +int optee_set_dma_mask(struct optee *optee, u_int pa_width); > > + > > int optee_notif_init(struct optee *optee, u_int max_key); > > void optee_notif_uninit(struct optee *optee); > > int optee_notif_wait(struct optee *optee, u_int key, u32 timeout); > > diff --git a/drivers/tee/optee/smc_abi.c b/drivers/tee/optee/smc_abi.c > > index f0c3ac1103bb..cf106d15e64e 100644 > > --- a/drivers/tee/optee/smc_abi.c > > +++ b/drivers/tee/optee/smc_abi.c > > @@ -1584,6 +1584,68 @@ static inline int optee_load_fw(struct platform_device *pdev, > > } > > #endif > > > > +static struct tee_protmem_pool *static_protmem_pool_init(struct optee *optee) > > +{ > > +#if IS_ENABLED(CONFIG_OPTEE_STATIC_PROTMEM_POOL) > > + union { > > + struct arm_smccc_res smccc; > > + struct optee_smc_get_protmem_config_result result; > > + } res; > > + struct tee_protmem_pool *pool; > > + void *p; > > + int rc; > > + > > + optee->smc.invoke_fn(OPTEE_SMC_GET_PROTMEM_CONFIG, 0, 0, 0, 0, > > + 0, 0, 0, &res.smccc); > > + if 
(res.result.status != OPTEE_SMC_RETURN_OK) > > + return ERR_PTR(-EINVAL); > > + > > + rc = optee_set_dma_mask(optee, res.result.pa_width); > > + if (rc) > > + return ERR_PTR(rc); > > + > > + /* > > + * Map the memory as uncached to make sure the kernel can work with > > + * __pfn_to_page() and friends since that's needed when passing the > > + * protected DMA-buf to a device. The memory should otherwise not > > + * be touched by the kernel since it's likely to cause an external > > + * abort due to the protection status. > > + */ > > + p = devm_memremap(&optee->teedev->dev, res.result.start, > > + res.result.size, MEMREMAP_WC); > > + if (IS_ERR(p)) > > + return p; > > + > > + pool = tee_protmem_static_pool_alloc(res.result.start, res.result.size); > > + if (IS_ERR(pool)) > > + devm_memunmap(&optee->teedev->dev, p); > > + > > + return pool; > > +#else > > + return ERR_PTR(-EINVAL); > > +#endif > > +} > > + > > +static int optee_protmem_pool_init(struct optee *optee) > > +{ > > + enum tee_dma_heap_id heap_id = TEE_DMA_HEAP_SECURE_VIDEO_PLAY; > > + struct tee_protmem_pool *pool = ERR_PTR(-EINVAL); > > + int rc; > > + > > + if (!(optee->smc.sec_caps & OPTEE_SMC_SEC_CAP_PROTMEM)) > > + return 0; > > + > > + pool = static_protmem_pool_init(optee); > > + if (IS_ERR(pool)) > > + return PTR_ERR(pool); > > + > > + rc = tee_device_register_dma_heap(optee->teedev, heap_id, pool); > > + if (rc) > > + pool->ops->destroy_pool(pool); > > + > > + return rc; > > +} > > + > > static int optee_probe(struct platform_device *pdev) > > { > > optee_invoke_fn *invoke_fn; > > @@ -1679,7 +1741,7 @@ static int optee_probe(struct platform_device *pdev) > > optee = kzalloc(sizeof(*optee), GFP_KERNEL); > > if (!optee) { > > rc = -ENOMEM; > > - goto err_free_pool; > > + goto err_free_shm_pool; > > } > > > > optee->ops = &optee_ops; > > @@ -1752,6 +1814,9 @@ static int optee_probe(struct platform_device *pdev) > > pr_info("Asynchronous notifications enabled\n"); > > } > > > > + if 
(optee_protmem_pool_init(optee)) > > + pr_info("Protected memory service not available\n"); > > + > > /* > > * Ensure that there are no pre-existing shm objects before enabling > > * the shm cache so that there's no chance of receiving an invalid > > @@ -1787,6 +1852,7 @@ static int optee_probe(struct platform_device *pdev) > > optee_disable_shm_cache(optee); > > optee_smc_notif_uninit_irq(optee); > > optee_unregister_devices(); > > + tee_device_unregister_all_dma_heaps(optee->teedev); > > err_notif_uninit: > > optee_notif_uninit(optee); > > err_close_ctx: > > @@ -1803,7 +1869,7 @@ static int optee_probe(struct platform_device *pdev) > > tee_device_unregister(optee->teedev); > > err_free_optee: > > kfree(optee); > > -err_free_pool: > > +err_free_shm_pool: > > tee_shm_pool_free(pool); > > if (memremaped_shm) > > memunmap(memremaped_shm); >
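
Side note on the mask computation in the quoted optee_set_dma_mask(): the helper clamps the physical address width reported by OP-TEE to 64 bits and turns it into a bit mask. The following is only a userspace sketch of that arithmetic, not kernel code. SKETCH_DMA_BIT_MASK and sketch_mask_for_pa_width are hypothetical names introduced here; the macro is assumed to mirror the kernel's DMA_BIT_MASK() definition.

```c
#include <stdint.h>

/*
 * Userspace stand-in (hypothetical name) for the kernel's DMA_BIT_MASK():
 * sets the low n bits. The n == 64 case is special-cased because shifting
 * a 64-bit value by 64 is undefined behaviour in C.
 */
#define SKETCH_DMA_BIT_MASK(n) \
	(((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/*
 * Hypothetical helper mirroring the first line of the quoted
 * optee_set_dma_mask(): clamp the reported width to 64 bits, then
 * build the mask from the clamped value.
 */
static uint64_t sketch_mask_for_pa_width(unsigned int pa_width)
{
	unsigned int bits = pa_width < 64 ? pa_width : 64;

	return SKETCH_DMA_BIT_MASK(bits);
}
```

With this sketch a width of 32 gives 0xffffffff, and any width of 64 or more gives an all-ones mask. In the kernel, dma_coerce_mask_and_coherent(), which Amir suggests, points dev->dma_mask at dev->coherent_dma_mask before setting both masks, so it should make the explicit pointer assignment in the quoted patch unnecessary.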

6 days, 20 hours
