On 20-05-2026 21:17, Tomeu Vizoso wrote:
On Wed, May 20, 2026 at 4:12 PM Dmitry Baryshkov dmitry.baryshkov@oss.qualcomm.com wrote:
On Tue, May 19, 2026 at 11:45:52AM +0530, Ekansh Gupta via B4 Relay wrote:
From: Ekansh Gupta ekansh.gupta@oss.qualcomm.com
Add documentation for the Qualcomm DSP Accelerator (QDA) driver under Documentation/accel/qda/. The documentation covers the driver architecture, GEM-based buffer management, IOMMU context bank isolation, and the RPMsg transport layer.
The user-space API section describes the DRM IOCTLs for session management, GEM buffer allocation, and remote procedure invocation via the FastRPC protocol, along with a typical application lifecycle example. Sections for dynamic debug and basic testing are also included.
Wire the new documentation into the Compute Accelerators index at Documentation/accel/index.rst.
Assisted-by: Claude:claude-4-6-sonnet
Signed-off-by: Ekansh Gupta ekansh.gupta@oss.qualcomm.com
 Documentation/accel/index.rst     |   1 +
 Documentation/accel/qda/index.rst |  13 ++++
 Documentation/accel/qda/qda.rst   | 146 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 160 insertions(+)
diff --git a/Documentation/accel/index.rst b/Documentation/accel/index.rst
index cbc7d4c3876a..5901ea7f784c 100644
--- a/Documentation/accel/index.rst
+++ b/Documentation/accel/index.rst
@@ -10,4 +10,5 @@ Compute Accelerators
    introduction
    amdxdna/index
    qaic/index
+   qda/index
    rocket/index
diff --git a/Documentation/accel/qda/index.rst b/Documentation/accel/qda/index.rst
new file mode 100644
index 000000000000..013400cf9c25
--- /dev/null
+++ b/Documentation/accel/qda/index.rst
@@ -0,0 +1,13 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+==================================
+accel/qda Qualcomm DSP Accelerator
+==================================
+
+The QDA driver provides a DRM accel based interface for Qualcomm DSP offload.
+It uses the FastRPC protocol and integrates with DRM and GEM infrastructure
+for device and buffer management.
+
+.. toctree::
+
+   qda
diff --git a/Documentation/accel/qda/qda.rst b/Documentation/accel/qda/qda.rst
new file mode 100644
index 000000000000..9f49af6e6acc
--- /dev/null
+++ b/Documentation/accel/qda/qda.rst
@@ -0,0 +1,146 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+=====================================
+Qualcomm DSP Accelerator (QDA) Driver
+=====================================
+
+Introduction
+============
+
+The QDA driver is a DRM accel driver for Qualcomm's DSPs. It provides a
+DRM accel based interface for Qualcomm DSP offload, supporting workloads
+such as AI inference, computer vision, audio processing, and sensor offload
+on Qualcomm SoCs. It uses the FastRPC protocol and integrates with DRM and
+GEM infrastructure for device and buffer management.
+
+Key Features
+============
+
+* **DRM accel Interface**: Exposes a standard character device node
+  (e.g., ``/dev/accel/accel0``) via the DRM accel subsystem.
+* **FastRPC Protocol**: Implements the FastRPC protocol for communication
+  between the application processor and the DSP.
+* **GEM Buffer Management**: Uses the DRM GEM interface for buffer
+  allocation, lifecycle management, and DMA-BUF import/export.
+* **IOMMU Isolation**: Uses IOMMU context banks to enforce memory isolation
+  between different DSP user sessions.
+* **Modular Design**: Clean separation between the core DRM logic, the
+  memory manager, and the RPMsg-based transport layer.
+
+Architecture
+============
+
+The QDA driver consists of several functional blocks:
+
+1. **Core Driver (``qda_drv``)**: Manages device registration, file operations,
+   and DRM accel integration.
+2. **Memory Manager (``qda_memory_manager``)**: A flexible memory management
+   layer that handles IOMMU context banks. It supports pluggable backends
+   (such as DMA-coherent) to adapt to different SoC memory architectures.
+3. **GEM Subsystem**: Implements the DRM GEM interface for buffer management:
+
+   - **``qda_gem``**: Core GEM object management, including allocation, mmap
+     operations, and buffer lifecycle management.
+   - **``qda_prime``**: PRIME import functionality for DMA-BUF interoperability
+     with other kernel subsystems.
+
+4. **Transport Layer (``qda_rpmsg``)**: Abstraction over the RPMsg framework
+   to handle low-level message passing with the DSP firmware.
+5. **Compute Bus (``qda_compute_bus``)**: A custom virtual bus used to
+   enumerate and manage the specific compute context banks defined in the
+   device tree. The bus was introduced because IOMMU context banks (CBs) are
+   synthetic constructs — not real platform devices — making a platform driver
+   an incorrect abstraction for them. The earlier platform-driver approach also
+   had a race condition: device nodes were created before the RPMsg channel
+   resources were fully initialized, and because ``probe`` runs asynchronously,
+   applications could open a CB device and attempt to start a session before
+   the underlying transport was ready. The compute bus makes CB lifetime
+   explicitly subordinate to the parent QDA device, closing that window.
+6. **FastRPC Core (``qda_fastrpc``)**: Implements the protocol logic for
+   marshalling arguments and handling remote invocations.
+
+User-Space API
+==============
+
+The driver exposes a set of DRM-compliant IOCTLs:
+
+* ``DRM_IOCTL_QDA_QUERY``: Query DSP type (e.g., "cdsp", "adsp")
+  and capabilities.
+* ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE``: Initialize a new process context
+  on the DSP.
+* ``DRM_IOCTL_QDA_REMOTE_INVOKE``: Submit a remote method invocation (the
+  primary execution unit).
+* ``DRM_IOCTL_QDA_GEM_CREATE``: Allocate a GEM buffer object for DSP usage.
+* ``DRM_IOCTL_QDA_GEM_MMAP_OFFSET``: Retrieve mmap offsets for memory mapping.
+* ``DRM_IOCTL_QDA_REMOTE_MAP`` / ``DRM_IOCTL_QDA_REMOTE_MUNMAP``: Map or unmap
+  buffers into the DSP's virtual address space. Each accepts a ``request``
+  field selecting between a legacy operation (``QDA_MAP_REQUEST_LEGACY`` /
+  ``QDA_MUNMAP_REQUEST_LEGACY``) and an attribute-based operation
+  (``QDA_MAP_REQUEST_ATTR`` / ``QDA_MUNMAP_REQUEST_ATTR``).
Please explain what happens if users don't map the buffers into the DSP address space. Will DRM_IOCTL_QDA_REMOTE_INVOKE handle the mapping or not? What is the difference between the legacy and attribute-based modes?
Would the driver benefit from using GPUVM?
+
+Usage Example
+=============
+
+A typical lifecycle for a user-space application:
+
+1. **Discovery**: Open ``/dev/accel/accel*`` and use
+   ``DRM_IOCTL_QDA_QUERY`` to identify the DSP domain served by that
+   device node.
+2. **Initialization**: Call ``DRM_IOCTL_QDA_REMOTE_SESSION_CREATE`` to
+   establish a session and create a process context on the DSP.
+3. **Memory**: Allocate buffers via ``DRM_IOCTL_QDA_GEM_CREATE`` or import
+   DMA-BUFs (PRIME fd) from other drivers using ``DRM_IOCTL_PRIME_FD_TO_HANDLE``.
+4. **Execution**: Use ``DRM_IOCTL_QDA_REMOTE_INVOKE`` to pass arguments and
+   execute functions on the DSP.
+5. **Cleanup**: Close file descriptors to automatically release resources and
+   detach the session.
I'd have expected a description of an actual example, i.e. clone the app from https://the.addr, prepare clang >= NN.MM, QAIC (https://foo), run make, run the app, check the results. I'd remind you that DRM Accel has a very specific requirement: a working toolchain must be available as open source.
We have been getting submissions lately that don't fulfill that requirement so I will point to the precise part of the documentation that explains it:
https://www.kernel.org/doc/html/latest/gpu/drm-uapi.html#open-source-userspa...
For an example of a submission that complies, see:
https://lore.kernel.org/dri-devel/20260114-thames-v2-0-e94a6636e050@tomeuviz...
Most importantly, notice how the proposed Thames Mesa driver generates machine code for all the hardware units, and doesn't use any blob for that.
I believe QDA checks all the boxes for accel: there is an open-source userspace, an open-source QAIC compiler for IDL compilation, and LLVM supports the Hexagon architecture.
I'll try adding these details as well.
Thanks!

Regards,
Tomeu
+
+Internal Implementation
+=======================
+
+Memory Management
+-----------------
+The driver's memory manager creates virtual "IOMMU devices" that map to
+hardware context banks. This allows the driver to manage multiple isolated
+address spaces. The implementation uses a DMA-coherent backend to ensure data consistency
+between the CPU and DSP without manual cache maintenance in most cases.
GEM usage?
+
+Debugging
+=========
+The driver includes extensive dynamic debug support. Enable it via the
+kernel's dynamic debug control:
+
+.. code-block:: bash
+
+   echo "file drivers/accel/qda/* +p" > /sys/kernel/debug/dynamic_debug/control
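For illustration only, the command above could be wrapped so that it fails cleanly when the control file is not writable (for instance when debugfs is not mounted or the kernel lacks dynamic debug support). This is a minimal sketch, not part of the patch: the function name ``enable_qda_debug`` and its optional path argument are hypothetical, and the default path simply assumes debugfs is mounted at ``/sys/kernel/debug``.

```shell
#!/bin/sh
# Sketch: enable QDA dynamic debug output only if the control file is writable.
# enable_qda_debug and its optional path argument are hypothetical helpers,
# not something the driver or the patch provides.
enable_qda_debug() {
    # Assumed default location; requires debugfs mounted at /sys/kernel/debug.
    control="${1:-/sys/kernel/debug/dynamic_debug/control}"

    if [ ! -w "$control" ]; then
        echo "dynamic debug control not writable: $control" >&2
        return 1
    fi

    # Same query string as in the documentation: all files under drivers/accel/qda/.
    echo "file drivers/accel/qda/* +p" > "$control"
}
```

Calling ``enable_qda_debug`` with no argument uses the default control path; passing a path makes it possible to try the function against a scratch file first.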
+
+Testing
+=======
+The QDA driver can be exercised using the ``fastrpc_test`` utility from the
+FastRPC userspace library. Run the test application:
pointer
+
+.. code-block:: bash
+
+   fastrpc_test -d 3 -U 1 -t linux -a v68
+
+**Options**
+
+``-d domain``
+    Select the DSP domain to run on:
+
+    - ``0`` — ADSP
+    - ``1`` — MDSP
+    - ``2`` — SDSP
+    - ``3`` — CDSP *(default on targets with CDSP)*
+
+``-U unsigned_PD``
+    Select signed or unsigned protection domain:
+
+    - ``0`` — signed PD
+    - ``1`` — unsigned PD *(default)*
+
+``-t target``
+    Target platform: ``android`` or ``linux`` *(default: linux)*
+
+``-a arch_version``
+    DSP architecture version, e.g. ``v68``, ``v75`` *(default: v68)*
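As a rough sketch of how the options above combine, the helper below assembles a ``fastrpc_test`` command line from the documented defaults and rejects values outside the documented ranges. The function name ``build_fastrpc_test_cmd`` and its positional-argument order are hypothetical; only the flags and their values come from the text above, and the helper prints the command rather than running it.

```shell
#!/bin/sh
# Sketch: build a fastrpc_test command line using the documented defaults.
# build_fastrpc_test_cmd is a hypothetical helper; it only prints the command.
build_fastrpc_test_cmd() {
    domain="${1:-3}"        # 0 ADSP, 1 MDSP, 2 SDSP, 3 CDSP
    unsigned_pd="${2:-1}"   # 0 signed PD, 1 unsigned PD
    target="${3:-linux}"    # android or linux
    arch="${4:-v68}"        # DSP architecture version, e.g. v68 or v75

    case "$domain" in
        0|1|2|3) ;;
        *) echo "invalid domain: $domain" >&2; return 1 ;;
    esac
    case "$unsigned_pd" in
        0|1) ;;
        *) echo "invalid unsigned_PD: $unsigned_pd" >&2; return 1 ;;
    esac
    case "$target" in
        android|linux) ;;
        *) echo "invalid target: $target" >&2; return 1 ;;
    esac

    echo "fastrpc_test -d $domain -U $unsigned_pd -t $target -a $arch"
}
```

With no arguments the helper reproduces the invocation shown in the documentation; ``build_fastrpc_test_cmd 0 0 android v75`` would select ADSP, a signed PD, the android target, and v75.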
-- 2.34.1
--
With best wishes
Dmitry