On 1/13/26 18:44, Tomeu Vizoso wrote:
> This series adds a new DRM/Accel driver that supports the C7x DSPs
> inside some Texas Instruments SoCs such as the J722S. These can be used
> as accelerators for various workloads, including machine learning
> inference.
>
> This driver controls the power state of the hardware via remoteproc and
> communicates with the firmware running on the DSP via rpmsg_virtio. The
> kernel driver itself allocates buffers, manages contexts, and submits
> jobs to the DSP firmware. Buffers are mapped by the DSP itself using its
> MMU, providing memory isolation among different clients.
>
> The source code for the firmware running on the DSP is available at:
> https://gitlab.freedesktop.org/tomeu/thames_firmware/.
>
> Everything else is done in userspace, as a Gallium driver (also called
> thames) that is part of the Mesa3D project: https://docs.mesa3d.org/teflon.html
>
> If there is more than one core that advertises the same rpmsg_virtio
> service name, the driver will load balance jobs between them with
> drm-gpu-scheduler.
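As an aside on the paragraph above: with drm_sched, load balancing across N
cores is typically expressed by initializing the entity with a sched_list that
contains one scheduler per core, so the scheduler picks the least loaded one
per job. A minimal illustrative sketch, not code from this series (NUM_CORES
and example_entity_init are made-up names):

#include <drm/gpu_scheduler.h>

#define NUM_CORES	2	/* hypothetical: one drm_gpu_scheduler per DSP core */

static int example_entity_init(struct drm_sched_entity *entity,
			       struct drm_gpu_scheduler *scheds[NUM_CORES])
{
	/* Jobs pushed to this entity may run on any scheduler in the list. */
	return drm_sched_entity_init(entity, DRM_SCHED_PRIORITY_NORMAL,
				     scheds, NUM_CORES, NULL);
}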
I only took 5 minutes to skim over it, so no full review.
You have the classic mistake of allocating memory in the run_job callback of the scheduler, but that is trivial to fix.
Apart from that it looks pretty solid to me.
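For reference, the usual fix is to do all allocations at submit time and keep
->run_job() allocation-free. A minimal sketch of that pattern, not taken from
the thames patches (my_job, my_job_create and my_hw_queue are made-up names):

#include <linux/container_of.h>
#include <linux/err.h>
#include <linux/slab.h>
#include <drm/gpu_scheduler.h>

struct my_job {
	struct drm_sched_job base;
	void *cmd_buf;			/* filled in at submit time */
};

/* Hypothetical helper that rings the doorbell and returns the HW fence. */
struct dma_fence *my_hw_queue(struct my_job *job);

/* Submit path (ioctl context): everything that can fail or sleep goes here. */
static struct my_job *my_job_create(size_t cmd_size)
{
	struct my_job *job = kzalloc(sizeof(*job), GFP_KERNEL);

	if (!job)
		return ERR_PTR(-ENOMEM);

	job->cmd_buf = kzalloc(cmd_size, GFP_KERNEL);
	if (!job->cmd_buf) {
		kfree(job);
		return ERR_PTR(-ENOMEM);
	}
	return job;
}

/* Scheduler callback: no allocations, just hand the prepared job to the HW. */
static struct dma_fence *my_run_job(struct drm_sched_job *sched_job)
{
	struct my_job *job = container_of(sched_job, struct my_job, base);

	return my_hw_queue(job);
}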
Regards,
Christian.
>
> Userspace portion of the driver: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/39298
>
> Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net>
> ---
> Tomeu Vizoso (5):
> arm64: dts: ti: k3-j722s-ti-ipc-firmware: Add memory pool for DSP i/o buffers
> accel/thames: Add driver for the C7x DSPs in TI SoCs
> accel/thames: Add IOCTLs for BO creation and mapping
> accel/thames: Add IOCTL for job submission
> accel/thames: Add IOCTL for memory synchronization
>
> Documentation/accel/thames/index.rst | 28 ++
> MAINTAINERS | 9 +
> .../boot/dts/ti/k3-j722s-ti-ipc-firmware.dtsi | 11 +-
> drivers/accel/Kconfig | 1 +
> drivers/accel/Makefile | 3 +-
> drivers/accel/thames/Kconfig | 26 ++
> drivers/accel/thames/Makefile | 11 +
> drivers/accel/thames/thames_core.c | 161 +++++++
> drivers/accel/thames/thames_core.h | 53 +++
> drivers/accel/thames/thames_device.c | 93 +++++
> drivers/accel/thames/thames_device.h | 46 ++
> drivers/accel/thames/thames_drv.c | 180 ++++++++
> drivers/accel/thames/thames_drv.h | 21 +
> drivers/accel/thames/thames_gem.c | 407 ++++++++++++++++++
> drivers/accel/thames/thames_gem.h | 45 ++
> drivers/accel/thames/thames_ipc.h | 204 +++++++++
> drivers/accel/thames/thames_job.c | 463 +++++++++++++++++++++
> drivers/accel/thames/thames_job.h | 51 +++
> drivers/accel/thames/thames_rpmsg.c | 276 ++++++++++++
> drivers/accel/thames/thames_rpmsg.h | 27 ++
> 20 files changed, 2113 insertions(+), 3 deletions(-)
> ---
> base-commit: 27927a79b3c6aebd18f38507a8160294243763dc
> change-id: 20260113-thames-334127a2d91d
>
> Best regards,
On Tue, Jan 13, 2026 at 11:45 AM Tomeu Vizoso <tomeu(a)tomeuvizoso.net> wrote:
>
> Some SoCs from Texas Instruments contain DSPs that can be used for
> general compute tasks.
>
> This driver provides a drm/accel UABI to userspace for submitting jobs
> to the DSP cores and managing the input, output and intermediate memory.
>
> Signed-off-by: Tomeu Vizoso <tomeu(a)tomeuvizoso.net>
> ---
> Documentation/accel/thames/index.rst | 28 +++++
> MAINTAINERS | 9 ++
> drivers/accel/Kconfig | 1 +
> drivers/accel/Makefile | 3 +-
> drivers/accel/thames/Kconfig | 26 +++++
> drivers/accel/thames/Makefile | 9 ++
> drivers/accel/thames/thames_core.c | 155 ++++++++++++++++++++++++++
> drivers/accel/thames/thames_core.h | 53 +++++++++
> drivers/accel/thames/thames_device.c | 93 ++++++++++++++++
> drivers/accel/thames/thames_device.h | 46 ++++++++
> drivers/accel/thames/thames_drv.c | 156 +++++++++++++++++++++++++++
> drivers/accel/thames/thames_drv.h | 21 ++++
> drivers/accel/thames/thames_ipc.h | 204 +++++++++++++++++++++++++++++++++++
> drivers/accel/thames/thames_rpmsg.c | 155 ++++++++++++++++++++++++++
> drivers/accel/thames/thames_rpmsg.h | 27 +++++
> 15 files changed, 985 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/accel/thames/index.rst b/Documentation/accel/thames/index.rst
> new file mode 100644
> index 0000000000000000000000000000000000000000..ca8391031f226f7ef1dc210a356c86acbe126c6f
> --- /dev/null
> +++ b/Documentation/accel/thames/index.rst
> @@ -0,0 +1,28 @@
> +.. SPDX-License-Identifier: GPL-2.0-only
> +
> +============================================================
> + accel/thames Driver for the C7x DSPs from Texas Instruments
> +============================================================
> +
> +The accel/thames driver supports the C7x DSPs inside some Texas Instruments SoCs
> +such as the J722S. These can be used as accelerators for various workloads,
> +including machine learning inference.
> +
> +This driver controls the power state of the hardware via :doc:`remoteproc </staging/remoteproc>`
> +and communicates with the firmware running on the DSP via :doc:`rpmsg_virtio </staging/rpmsg_virtio>`.
> +The kernel driver itself allocates buffers, manages contexts, and submits jobs
> +to the DSP firmware. Buffers are mapped by the DSP itself using its MMU,
> +providing memory isolation among different clients.
> +
> +The source code for the firmware running on the DSP is available at:
> +https://gitlab.freedesktop.org/tomeu/thames_firmware/.
> +
> +Everything else is done in userspace, as a Gallium driver (also called thames)
> +that is part of the Mesa3D project: https://docs.mesa3d.org/teflon.html
> +
> +If there is more than one core that advertises the same rpmsg_virtio service
> +name, the driver will load balance jobs between them with drm-gpu-scheduler.
> +
> +Hardware currently supported:
> +
> +* J722S
> diff --git a/MAINTAINERS b/MAINTAINERS
> index dc731d37c8feeff25613c59fe9c929927dadaa7e..a3fc809c797269d0792dfe5202cc1b49f6ff57e9 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -7731,6 +7731,15 @@ F: Documentation/devicetree/bindings/npu/rockchip,rk3588-rknn-core.yaml
> F: drivers/accel/rocket/
> F: include/uapi/drm/rocket_accel.h
>
> +DRM ACCEL DRIVER FOR TI C7x DSPS
> +M: Tomeu Vizoso <tomeu(a)tomeuvizoso.net>
> +L: dri-devel(a)lists.freedesktop.org
> +S: Supported
> +T: git https://gitlab.freedesktop.org/drm/misc/kernel.git
> +F: Documentation/accel/thames/
> +F: drivers/accel/thames/
> +F: include/uapi/drm/thames_accel.h
Oh where is this "thames_accel.h" ? ;)
2026-01-13T18:16:11.881084Z 01E drivers/accel/thames/thames_drv.c:8:10: fatal error: drm/thames_accel.h: No such file or directory
2026-01-13T18:16:11.881086Z 01E     8 | #include <drm/thames_accel.h>
2026-01-13T18:16:11.881087Z 01E       |          ^~~~~~~~~~~~~~~~~~~~
2026-01-13T18:16:11.881115Z 01E compilation terminated.
2026-01-13T18:16:11.884552Z 01E make[8]: *** [scripts/Makefile.build:287: drivers/accel/thames/thames_drv.o] Error 1
2026-01-13T18:16:11.884694Z 01E make[7]: *** [scripts/Makefile.build:544: drivers/accel/thames] Error 2
2026-01-13T18:16:11.884926Z 01E make[6]: *** [scripts/Makefile.build:544: drivers/accel] Error 2
2026-01-13T18:16:11.884976Z 01E make[6]: *** Waiting for unfinished jobs....
$ find . | grep thames_accel.h
$ grep -R "thames_accel.h" ./*
./drivers/accel/thames/Kconfig:   include/uapi/drm/thames_accel.h and is used by the Thames userspace
./drivers/accel/thames/thames_job.c:#include <drm/thames_accel.h>
./drivers/accel/thames/thames_drv.c:#include <drm/thames_accel.h>
./drivers/accel/thames/thames_gem.c:#include <drm/thames_accel.h>
./MAINTAINERS:F: include/uapi/drm/thames_accel.h
Regards,
--
Robert Nelson
https://rcn-ee.com/
This series implements a dma-buf “revoke” mechanism that allows a dma-buf
exporter to explicitly invalidate (“kill”) a shared buffer after it has
been distributed to importers, so that further CPU and device access is
prevented and importers reliably observe failure.
Today, dma-buf effectively provides “if you have the fd, you can keep using
the memory indefinitely.” That assumption breaks down when an exporter must
reclaim, reset, evict, or otherwise retire backing memory after it has been
shared. Concrete cases include GPU reset and recovery where old allocations
become unsafe to access, memory eviction/overcommit where backing storage
must be withdrawn, and security or isolation situations where continued access
must be prevented. While drivers can sometimes approximate this with
exporter-specific fencing and policy, there is no core dma-buf state transition
that communicates “this buffer is no longer valid; fail access” across all
access paths.
The change in this series is to introduce a core “revoked” state on the dma-buf
object and a corresponding exporter-triggered revoke operation. Once a dma-buf
is revoked, new access paths are blocked so that attempts to DMA-map, vmap, or
mmap the buffer fail in a consistent way.
In addition, the series aims to invalidate existing access as much as the kernel
allows: device mappings are torn down where possible so devices and IOMMUs cannot
continue DMA.
The semantics are intentionally simple: revoke is a one-way, permanent transition
for the lifetime of that dma-buf instance.
From a compatibility perspective, users that never invoke revoke are unaffected,
and exporters that adopt it gain a core-supported enforcement mechanism rather
than relying on ad hoc driver behavior. The intent is to keep the interface
minimal and avoid imposing policy; the series provides the mechanism to terminate
access, with policy remaining in the exporter and higher-level components.
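To make the shape of that interface concrete, here is a rough sketch of how an
exporter-triggered, one-way revoke could look at the dma-buf core level. This
is illustrative only; the member and helper names below are assumptions, not
necessarily what the patches add:

#include <linux/dma-buf.h>
#include <linux/dma-resv.h>

/*
 * Assumes a hypothetical "bool revoked" member in struct dma_buf,
 * protected by the buffer's reservation lock.
 */

/* Exporter side: one-way, permanent for this dma-buf instance. */
void dma_buf_revoke(struct dma_buf *dmabuf)
{
	dma_resv_lock(dmabuf->resv, NULL);
	dmabuf->revoked = true;
	/* Existing device mappings are torn down here where possible. */
	dma_resv_unlock(dmabuf->resv);
}

/* Core side: new DMA-map/vmap/mmap attempts check the state first. */
static int dma_buf_check_revoked(struct dma_buf *dmabuf)
{
	dma_resv_assert_held(dmabuf->resv);
	return dmabuf->revoked ? -ENODEV : 0;
}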
BTW, see this megathread [1] for additional context.
Ironically, it was posted exactly one year ago.
[1] https://lore.kernel.org/all/20250107142719.179636-2-yilun.xu@linux.intel.co…
Thanks
Cc: linux-rdma(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-media(a)vger.kernel.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: linaro-mm-sig(a)lists.linaro.org
Cc: kvm(a)vger.kernel.org
Cc: iommu(a)lists.linux.dev
To: Jason Gunthorpe <jgg(a)ziepe.ca>
To: Leon Romanovsky <leon(a)kernel.org>
To: Sumit Semwal <sumit.semwal(a)linaro.org>
To: Christian König <christian.koenig(a)amd.com>
To: Alex Williamson <alex(a)shazbot.org>
To: Kevin Tian <kevin.tian(a)intel.com>
To: Joerg Roedel <joro(a)8bytes.org>
To: Will Deacon <will(a)kernel.org>
To: Robin Murphy <robin.murphy(a)arm.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
---
Leon Romanovsky (4):
dma-buf: Introduce revoke semantics
vfio: Use dma-buf revoke semantics
iommufd: Require DMABUF revoke semantics
iommufd/selftest: Reuse dma-buf revoke semantics
drivers/dma-buf/dma-buf.c | 36 ++++++++++++++++++++++++++++++++----
drivers/iommu/iommufd/pages.c | 2 +-
drivers/iommu/iommufd/selftest.c | 12 ++++--------
drivers/vfio/pci/vfio_pci_dmabuf.c | 27 ++++++---------------------
include/linux/dma-buf.h | 31 +++++++++++++++++++++++++++++++
5 files changed, 74 insertions(+), 34 deletions(-)
---
base-commit: 9ace4753a5202b02191d54e9fdf7f9e3d02b85eb
change-id: 20251221-dmabuf-revoke-b90ef16e4236
Best regards,
--
Leon Romanovsky <leonro(a)nvidia.com>
On Fri, Jan 09, 2026 at 10:10:57AM +0800, Ming Lei wrote:
> On Thu, Jan 08, 2026 at 11:17:03AM +0100, Christoph Hellwig wrote:
> > On Thu, Jan 08, 2026 at 10:19:18AM +0800, Ming Lei wrote:
> > > > The feature is in no way nvme specific. nvme is just the initial
> > > > underlying driver. It makes total sense to support this for any high
> > > > performance block device, and to pass it through file systems.
> > >
> > > But why does FS care the dma buffer attachment? Since high performance
> > > host controller is exactly the dma buffer attachment point.
> >
> > I can't parse what you're trying to say here.
>
> dma buffer attachment is simply none of FS's business.
The file system should indeed never do a dma buffer attachment itself,
but that's not the point.
> > But even when not stacking, the registration still needs to go
> > through the file system even for a single device, never mind multiple
> > controlled by the file system.
>
> dma_buf can have multiple importers, so why does it have to go through FS for
> single device only?
>
> If the registered buffer is attached to a single device before going
> through the FS, it cannot support stacking block devices, and it can't
> easily be used for multiple block devices, no matter whether they are
> behind the same host controller or several.
Because the file system, or the file_operations instance to be more
specific, is the only entity that knows what block device(s) or other DMA
capable device(s) like (R)NIC a file maps to.
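To illustrate the point, a hypothetical sketch of what such a per-file
registration could look like; none of the names below come from the series,
they are made up for illustration only:

#include <linux/dma-buf.h>
#include <linux/err.h>
#include <linux/fs.h>

/*
 * Hypothetical hook called from the fs/driver that owns the file_operations:
 * only this layer knows which DMA-capable device(s) back the file.
 */
static int example_file_dma_buf_register(struct file *file,
					 struct dma_buf *dmabuf)
{
	struct device *dev = example_backing_device(file);	/* made up */
	struct dma_buf_attachment *attach;

	attach = dma_buf_dynamic_attach(dmabuf, dev, NULL, NULL);
	if (IS_ERR(attach))
		return PTR_ERR(attach);

	/* Stash the attachment so read_iter/write_iter can find it later. */
	return example_stash_attachment(file, attach);		/* made up */
}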
On Thu, Jan 08, 2026 at 10:19:18AM +0800, Ming Lei wrote:
> > The feature is in no way nvme specific. nvme is just the initial
> > underlying driver. It makes total sense to support this for any high
> > performance block device, and to pass it through file systems.
>
> But why does FS care the dma buffer attachment? Since high performance
> host controller is exactly the dma buffer attachment point.
I can't parse what you're trying to say here.
> If the callback is added in `struct file_operations` for wiring up the dma
> buffer and the importer (host controller), you will see it is hard to make it
> work across device mapper/raid or other stackable block devices.
Why?
But even when not stacking, the registration still needs to go
through the file system even for a single device, never mind multiple
controlled by the file system.
On 1/4/26 02:42, Ming Lei wrote:
> On Thu, Dec 04, 2025 at 02:10:25PM +0100, Christoph Hellwig wrote:
>> On Thu, Dec 04, 2025 at 12:09:46PM +0100, Christian König wrote:
>>>> I find the naming pretty confusing as well. But what this does is to
>>>> tell the file system/driver that it should expect a future
>>>> read_iter/write_iter operation that takes data from / puts data into
>>>> the dmabuf passed to this operation.
>>>
>>> That explanation makes much more sense.
>>>
>>> The remaining question is why does the underlying file system / driver
>>> need to know that it will get addresses from a DMA-buf?
>>
>> This eventually ends up calling dma_buf_dynamic_attach and provides
>> a way to find the dma_buf_attachment later in the I/O path.
>
> Maybe it can be named ->dma_buf_attach()? For wiring up the dma-buf and the
> importer side (nvme).
Yeah, that would make it much cleaner.
Also some higher level documentation would certainly help.
> But I am wondering why not make it a subsystem interface, such as an nvme
> ioctl; then the whole implementation can be simplified a lot. It is reasonable
> because the subsystem is exactly the side consuming/importing the dma-buf.
Yeah, the thought that it might be better to make it more nvme specific came to me as well.
Regards,
Christian.
>
>
> Thanks,
> Ming
>
On 12/19/25 16:58, Maxime Ripard wrote:
> On Fri, Dec 19, 2025 at 02:50:50PM +0100, Christian König wrote:
>> On 12/19/25 11:25, Maxime Ripard wrote:
>>> On Mon, Dec 15, 2025 at 03:53:22PM +0100, Christian König wrote:
>>>> On 12/15/25 14:59, Maxime Ripard wrote:
>> ...
>>>>>>> The shared ownership is indeed broken, but it's not more or less broken
>>>>>>> than, say, memfd + udmabuf, and I'm sure plenty of others.
>>>>>>>
>>>>>>> So we really improve the common case, but only make the "advanced"
>>>>>>> slightly more broken than it already is.
>>>>>>>
>>>>>>> Would you disagree?
>>>>>>
>>>>>> I strongly disagree. As far as I can see there is a huge chance we
>>>>>> break existing use cases with that.
>>>>>
>>>>> Which ones? And what about the ones that are already broken?
>>>>
>>>> Well everybody that expects that driver resources are *not* accounted to memcg.
>>>
>>> Which is a thing only because these buffers have never been accounted
>>> for in the first place.
>>
>> Yeah, completely agree. By not accounting it for such a long time we
>> ended up with people depending on this behavior.
>>
>> Not nice, but that's what it is.
>>
>>> So I guess the conclusion is that we shouldn't
>>> even try to do memory accounting, because someone somewhere might not
>>> expect that one of their applications would take too much RAM in the
>>> system?
>>
>> Well we do need some kind of solution to the problem. Either have
>> some setting where you say "This memcg limit is inclusive/exclusive
>> of device driver allocated memory" or have a completely separate limit
>> for device driver allocated memory.
>
> A device driver memory specific limit sounds like a good idea because it
> would make it easier to bridge the gap with dmem.
Completely agree, but that approach was rejected by the cgroups people.
I mean we can already use udmabuf to allocate memcg-accounted system memory, which can then be imported into device drivers.
So I don't see much reason why we should account dma-buf heaps and driver interfaces to memcg as well; we just need some way to limit them.
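For illustration of that first point, a minimal userspace sketch using the
standard memfd + udmabuf UAPI (not code from this thread; size must be
page-aligned, error handling trimmed):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/udmabuf.h>

/* Returns a dma-buf fd whose backing pages are memcg accounted via memfd. */
static int memcg_accounted_dmabuf(size_t size)
{
	struct udmabuf_create create = { 0 };
	int memfd, devfd;

	memfd = memfd_create("accounted", MFD_ALLOW_SEALING);
	if (memfd < 0 || ftruncate(memfd, size) < 0 ||
	    fcntl(memfd, F_ADD_SEALS, F_SEAL_SHRINK) < 0)
		return -1;

	devfd = open("/dev/udmabuf", O_RDWR);
	if (devfd < 0)
		return -1;

	create.memfd = memfd;
	create.offset = 0;
	create.size = size;
	return ioctl(devfd, UDMABUF_CREATE, &create);	/* the dma-buf fd */
}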
Regards,
Christian.
>
> Happy holidays,
> Maxime
On Tue, Jan 06, 2026 at 07:51:12PM +0000, Pavel Begunkov wrote:
>> But I am wondering why not make it a subsystem interface, such as an nvme
>> ioctl; then the whole implementation can be simplified a lot. It is reasonable
>> because the subsystem is exactly the side consuming/importing the dma-buf.
>
> It's not an nvme specific interface, and so a file op was much more
> convenient.
It is the much better abstraction. Also, the nvme subsystem is not
an actor, and registering things to the subsystem does not work.
The nvme controller is the entity that does the dma mapping, and this
interface works very well for that.