This patch series introduces the Qualcomm DSP Accelerator (QDA) driver,
a DRM-based accelerator driver for Qualcomm DSPs. The driver provides a
standardized interface for offloading computational tasks to DSPs found
on Qualcomm SoCs, supporting all DSP domains.
The QDA driver implements the FastRPC protocol over the DRM accel
subsystem. It uses the same device-tree node structure as the existing
fastrpc driver in drivers/misc/. The approach for binding the QDA driver
to device-tree nodes while coexisting with the fastrpc driver is an open
item described below.
RFC thread: https://lore.kernel.org/dri-devel/20260224-qda-firstpost-v1-0-fe46a9c1a046@…
User-space staging branch
=========================
https://github.com/qualcomm/fastrpc/tree/accel/staging
Key Features
============
* Standard DRM accelerator interface via /dev/accel/accelN
* GEM-based buffer management with DMA-BUF import/export (PRIME)
* IOMMU-based memory isolation using per-process context banks
* FastRPC protocol implementation for DSP communication
* RPMsg transport layer for reliable message passing
* Support for all DSP domains (ADSP, CDSP, SDSP, GDSP)
* DRM IOCTL interface for DSP session management, buffer allocation,
and remote procedure invocation
Architecture
============
1. DRM Accelerator Framework Integration
The driver registers as a DRM accel device, exposing a standard
/dev/accel/accelN character device node. This provides established
DRM infrastructure for device management, file operations, and
IOCTL dispatch.
2. Memory Management
Buffers are managed as GEM objects with full PRIME support for
DMA-BUF import/export. This enables seamless buffer sharing with
other DRM drivers (GPU, camera, video) using standard kernel
mechanisms.
3. IOMMU Context Bank Management
IOMMU context banks (CBs) are represented as proper struct device
instances on a custom virtual bus (qda-compute-cb). Each CB device
is registered with the IOMMU subsystem and receives its own IOMMU
domain, enabling per-session address space isolation. The custom
bus was introduced because IOMMU context banks are synthetic
constructs — not real platform devices — and to ensure CB device
lifetime is strictly subordinate to the parent QDA device.
See also: https://lore.kernel.org/all/245d602f-3037-4ae3-9af9-d98f37258aae@oss.qualco…
4. Memory Manager Architecture
A pluggable memory manager coordinates IOMMU device assignment and
buffer allocation. The current implementation uses a DMA-coherent
backend with SID-prefixed DMA addresses for DSP firmware
compatibility.
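As a rough illustration of what a "SID-prefixed" DMA address can look like, the sketch below packs a stream ID (SID) into the bits above a 32-bit IOVA. The shift of 32 and all names (EXAMPLE_SID_SHIFT, example_sid_prefix() and friends) are assumptions made for illustration only; the cover letter does not state the actual bit layout.

```c
#include <stdint.h>

/* Assumption: the SID occupies the bits above a 32-bit IOVA. */
#define EXAMPLE_SID_SHIFT 32
#define EXAMPLE_IOVA_MASK ((UINT64_C(1) << EXAMPLE_SID_SHIFT) - 1)

/* Combine an IOVA and a stream ID into one 64-bit value. */
static inline uint64_t example_sid_prefix(uint64_t iova, uint32_t sid)
{
	return (iova & EXAMPLE_IOVA_MASK) |
	       ((uint64_t)sid << EXAMPLE_SID_SHIFT);
}

/* Recover the stream ID from a prefixed address. */
static inline uint32_t example_sid_of(uint64_t addr)
{
	return (uint32_t)(addr >> EXAMPLE_SID_SHIFT);
}

/* Recover the plain IOVA from a prefixed address. */
static inline uint64_t example_iova_of(uint64_t addr)
{
	return addr & EXAMPLE_IOVA_MASK;
}
```

With these helpers, example_sid_prefix(0x1000, 5) yields 0x500001000, and the two accessors split that value back into SID 5 and IOVA 0x1000.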
5. Transport Layer
RPMsg communication is handled in a dedicated transport layer
(qda_rpmsg.c), separate from the core DRM driver logic.
6. Code Organization
The driver is organized across multiple files (the series adds 4083 lines in total):
* qda_drv.c: Core driver and DRM integration
* qda_rpmsg.c: RPMsg transport layer
* qda_cb.c: Context bank device management
* qda_compute_bus.c: Custom virtual bus for CB devices
* qda_gem.c: GEM object management
* qda_prime.c: DMA-BUF import (PRIME)
* qda_memory_manager.c: IOMMU device registry and allocation
* qda_memory_dma.c: DMA-coherent allocation backend
* qda_fastrpc.c: FastRPC protocol implementation
* qda_ioctl.c: IOCTL dispatch
7. UAPI Design
The driver exposes DRM-style IOCTLs defined in
include/uapi/drm/qda_accel.h, following DRM UAPI conventions
(__u32/__u64 types, C++ guard, GPL-2.0-only WITH Linux-syscall-note).
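The conventions listed above can be sketched with a hypothetical argument struct. This is not the layout from qda_accel.h: the struct name, its fields, and the helper are invented for illustration, and <stdint.h> types stand in for the __u32/__u64 types a real UAPI header would take from <linux/types.h>.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__cplusplus)
extern "C" {
#endif

/* Hypothetical IOCTL argument struct; fields are ordered so that the
 * compiler never has to insert implicit padding. */
struct example_accel_buf_create {
	uint64_t size;     /* in: requested size in bytes */
	uint32_t flags;    /* in: allocation flags */
	uint32_t handle;   /* out: per-file buffer handle */
	uint64_t user_ptr; /* in: user pointer carried as a 64-bit integer */
};

/* Carry pointers as 64-bit integers so that 32-bit and 64-bit user
 * space share a single struct layout. */
static inline uint64_t example_ptr_to_u64(const void *ptr)
{
	return (uint64_t)(uintptr_t)ptr;
}

#if defined(__cplusplus)
}
#endif
```

Keeping every member naturally aligned and fixed-width is what lets the same struct be passed unchanged from 32-bit and 64-bit processes.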
Patch Series Organization
==========================
Patch 01: MAINTAINERS entry
Patch 02: Driver documentation (Documentation/accel/qda/)
Patches 03-04: Core driver skeleton and compute bus
Patch 05: iommu: Register qda-compute-cb bus with IOMMU subsystem
Patches 06-07: CB device enumeration and memory manager
Patch 08: QUERY IOCTL and UAPI header
Patches 09-11: GEM buffer management and PRIME import
Patches 12-15: FastRPC protocol (invoke, session create/release,
map/unmap)
Open Items
===========
1. Device-Tree Compatible String
The QDA driver uses the same device-tree node structure and
properties as the existing fastrpc driver in drivers/misc/. A
mechanism is needed to allow the QDA driver to bind to its device
node independently of the fastrpc driver.
The intended coexistence model is: platforms that require the
complete fastrpc feature set continue to use "qcom,fastrpc"; new
platforms where a feature available only in QDA takes priority, or
where QDA's current feature set is sufficient, use a QDA-specific
compatible string. New feature development is directed toward QDA
rather than the existing fastrpc driver. As QDA matures toward
feature parity with fastrpc, platforms can adopt the QDA-specific
compatible string exclusively.
The options under consideration are:
a) Add a new "qcom,qda" compatible string to the existing
qcom,fastrpc.yaml binding, since the DT node structure and
properties are identical. This avoids a separate binding file
but adds a QDA-specific string to a fastrpc binding.
b) Introduce a separate qcom,qda.yaml binding that references or
inherits the fastrpc binding properties.
Seeking guidance from DT binding maintainers on the preferred
approach.
2. Privilege Level Management
Currently, daemon processes and user processes have the same access
level as both use the same accel device node. This needs to be
addressed as daemons attach to privileged DSP protection domains
and require higher privilege levels for system-level operations.
Seeking guidance on the best approach: separate device nodes,
capability-based checks, or DRM master/authentication mechanisms.
3. UAPI Compatibility Layer
A compatibility layer is needed to facilitate migration of client
applications from the existing FastRPC UAPI to the new QDA UAPI,
ensuring a smooth transition for existing userspace code. Seeking
guidance on the preferred implementation approach: in-kernel
translation layer, userspace wrapper library, or hybrid solution.
An initial evaluation of an in-kernel translation shim was
performed, where legacy FastRPC device nodes (/dev/fastrpc-*) are
exposed and requests are internally routed to the QDA accel driver.
The goal was to keep the compatibility layer minimal, reuse existing
QDA helper paths (attach, buffer allocation, mapping, etc.), and
avoid duplication of GEM and buffer management logic.
However, the following challenges were identified:
a) Dependency on drm_file for QDA helpers
QDA relies on GEM-backed allocations and per-client handle
namespaces, which require a valid struct drm_file. Since GEM
handles are scoped per drm_file, the compatibility layer cannot
directly reuse QDA helper paths without establishing a proper
drm_file context for each client.
b) Lack of public API for drm_file creation
Creating a drm_file directly (similar to mock_drm_getfile()-style
approaches) is not feasible, as the required helpers
(drm_file_alloc(), drm_file_free(), etc.) are internal to the DRM
core and not exported. This prevents external drivers from safely
constructing and managing drm_file instances.
c) VFS-based open is not a viable solution
Opening the underlying accel device (/dev/accel/accelN) from the
compatibility driver via filp_open() does provide a valid
drm_file, but introduces reliance on userspace-visible device
paths, lack of stability in containerized or chroot environments,
and no clean mapping between legacy device nodes and accel
devices.
d) Userspace proxy limitations (CUSE)
A CUSE-based userspace proxy was evaluated. However, DMA-buf file
descriptors passed by legacy applications cannot be directly
reused in the CUSE daemon (file descriptors are process-specific),
which breaks buffer sharing semantics.
e) drm_client-based approaches do not match requirements
drm_client APIs (used for fbdev emulation) rely on a shared
drm_file and do not provide the per-client isolation required by
FastRPC semantics.
Due to the above constraints, it is currently unclear how to
implement an in-kernel compatibility layer that correctly handles
per-client drm_file contexts without relying on VFS paths or
non-exported DRM internals.
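The per-client handle scoping behind challenge (a) can be modeled in a few lines of plain C. This is a user-space sketch, not DRM code: struct example_client stands in for the per-open-file state (struct drm_file in the kernel), and all names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define EXAMPLE_MAX_HANDLES 8

struct example_buffer {
	size_t size;
};

/* Stand-in for the per-open-file state that scopes handles. */
struct example_client {
	struct example_buffer *slots[EXAMPLE_MAX_HANDLES];
};

/* Returns a non-zero handle, or 0 if the client's table is full. */
static uint32_t example_handle_create(struct example_client *client,
				      struct example_buffer *buf)
{
	for (uint32_t i = 0; i < EXAMPLE_MAX_HANDLES; i++) {
		if (!client->slots[i]) {
			client->slots[i] = buf;
			return i + 1;
		}
	}
	return 0;
}

/* A handle only resolves in the table of the client that created it. */
static struct example_buffer *
example_handle_lookup(const struct example_client *client, uint32_t handle)
{
	if (handle == 0 || handle > EXAMPLE_MAX_HANDLES)
		return NULL;
	return client->slots[handle - 1];
}
```

A handle created through one client resolves to NULL when looked up through another, which is why a compatibility shim cannot call handle-based helpers without a client context of its own.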
4. Documentation Improvements
Add detailed IOCTL usage examples, document DSP firmware interface
requirements, and create a migration guide from the existing FastRPC
driver.
5. Per-Session Memory Allocation
Develop a userspace API to support memory allocation on a per-session
basis, enabling session-specific memory management.
6. Audio and Sensors PD Support
The current series does not handle Audio PD and Sensors PD
functionalities. These specialized protection domains require
additional support for real-time constraints and power management.
Interface Compatibility
========================
The QDA driver uses the same device-tree node structure and child node
layout (including "qcom,fastrpc-compute-cb" child nodes) as the
existing fastrpc driver. The underlying FastRPC protocol and DSP
firmware interface are compatible with the existing fastrpc driver,
ensuring that DSP firmware and libraries continue to work without
modification.
References
==========
Previous discussions on this migration:
- https://lkml.org/lkml/2024/6/24/479
- https://lkml.org/lkml/2024/6/21/1252
Testing
=======
The driver has been tested on Qualcomm platforms with:
- Basic FastRPC attach/release operations
- DSP process creation and initialization
- Memory mapping/unmapping operations
- Dynamic invocation with various buffer types
- GEM buffer allocation and mmap
- PRIME buffer import from other subsystems
Signed-off-by: Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
---
Ekansh Gupta (15):
MAINTAINERS: Add entry for Qualcomm DSP Accelerator (QDA) driver
accel/qda: Add QDA driver documentation
accel/qda: Add initial QDA DRM accelerator driver
accel/qda: Add compute bus for QDA context banks
iommu: Add QDA compute context bank bus to iommu_buses
accel/qda: Create compute context bank devices on QDA compute bus
accel/qda: Add memory manager for CB devices
accel/qda: Add QUERY IOCTL and QDA UAPI header
accel/qda: Add DMA-backed GEM objects and memory manager integration
accel/qda: Add GEM_CREATE and GEM_MMAP_OFFSET IOCTLs
accel/qda: Add PRIME DMA-BUF import support
accel/qda: Add FastRPC invocation support
accel/qda: Add DSP process creation and release
accel/qda: Add remote memory mapping to DSP address space
accel/qda: Add remote memory unmap from DSP address space
Documentation/accel/index.rst | 1 +
Documentation/accel/qda/index.rst | 13 +
Documentation/accel/qda/qda.rst | 146 +++++
MAINTAINERS | 9 +
drivers/accel/Kconfig | 1 +
drivers/accel/Makefile | 2 +
drivers/accel/qda/Kconfig | 34 +
drivers/accel/qda/Makefile | 19 +
drivers/accel/qda/qda_cb.c | 146 +++++
drivers/accel/qda/qda_cb.h | 32 +
drivers/accel/qda/qda_compute_bus.c | 68 ++
drivers/accel/qda/qda_drv.c | 192 ++++++
drivers/accel/qda/qda_drv.h | 91 +++
drivers/accel/qda/qda_fastrpc.c | 1058 ++++++++++++++++++++++++++++++++
drivers/accel/qda/qda_fastrpc.h | 390 ++++++++++++
drivers/accel/qda/qda_gem.c | 177 ++++++
drivers/accel/qda/qda_gem.h | 62 ++
drivers/accel/qda/qda_ioctl.c | 296 +++++++++
drivers/accel/qda/qda_ioctl.h | 19 +
drivers/accel/qda/qda_memory_dma.c | 110 ++++
drivers/accel/qda/qda_memory_dma.h | 17 +
drivers/accel/qda/qda_memory_manager.c | 380 ++++++++++++
drivers/accel/qda/qda_memory_manager.h | 75 +++
drivers/accel/qda/qda_prime.c | 184 ++++++
drivers/accel/qda/qda_prime.h | 18 +
drivers/accel/qda/qda_rpmsg.c | 248 ++++++++
drivers/accel/qda/qda_rpmsg.h | 30 +
drivers/iommu/iommu.c | 4 +
include/linux/qda_compute_bus.h | 32 +
include/uapi/drm/qda_accel.h | 229 +++++++
30 files changed, 4083 insertions(+)
---
base-commit: 80dd246accce631c328ea43294e53b2b2dd2aa32
change-id: 20260519-qda-series-78c2bf0ed78b
Best regards,
--
Ekansh Gupta <ekansh.gupta(a)oss.qualcomm.com>
In rocket_job_run(), after taking an extra fence reference for
job->done_fence via dma_fence_get(), the error paths have three bugs:
- The dma_fence reference held by job->done_fence is never released,
causing a reference leak.
- pm_runtime_get_sync() increments the usage counter even on failure,
but the error path does not decrement it, leaking the runtime PM
reference and preventing the NPU from suspending.
- A valid but unsignaled fence is returned to the DRM scheduler,
which triggers WARN("Fence ... released with pending signals!")
when the scheduler drops its reference.
Fix by replacing pm_runtime_get_sync() with pm_runtime_resume_and_get()
which auto-balances the usage counter on failure, releasing both fence
references on error, and returning ERR_PTR(ret) instead of the
unsignaled fence.
Cc: stable(a)vger.kernel.org
Fixes: 0810d5ad88a1 ("accel/rocket: Add job submission IOCTL")
Signed-off-by: ZhaoJinming <zhaojinming(a)uniontech.com>
---
drivers/accel/rocket/rocket_job.c | 19 ++++++++++++++-----
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/accel/rocket/rocket_job.c b/drivers/accel/rocket/rocket_job.c
index ac51bff39833..e8a073e22ac2 100644
--- a/drivers/accel/rocket/rocket_job.c
+++ b/drivers/accel/rocket/rocket_job.c
@@ -310,13 +310,22 @@ static struct dma_fence *rocket_job_run(struct drm_sched_job *sched_job)
dma_fence_put(job->done_fence);
job->done_fence = dma_fence_get(fence);
- ret = pm_runtime_get_sync(core->dev);
- if (ret < 0)
- return fence;
+ ret = pm_runtime_resume_and_get(core->dev);
+ if (ret < 0) {
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
+ dma_fence_put(fence);
+ return ERR_PTR(ret);
+ }
ret = iommu_attach_group(job->domain->domain, core->iommu_group);
- if (ret < 0)
- return fence;
+ if (ret < 0) {
+ pm_runtime_put(core->dev);
+ dma_fence_put(job->done_fence);
+ job->done_fence = NULL;
+ dma_fence_put(fence);
+ return ERR_PTR(ret);
+ }
scoped_guard(mutex, &core->job_lock) {
core->in_flight_job = job;
--
2.20.1
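The reference balancing that the rocket_job_run() fix above aims for can be modeled outside the kernel. The sketch below is a user-space model, not driver code: the example_* types and counters are hypothetical stand-ins for dma_fence references and the runtime PM usage counter, and it uses goto-based unwinding where the patch repeats the cleanup inline.

```c
#include <stddef.h>

/* Minimal stand-ins for a fence refcount and a runtime-PM usage counter. */
struct example_fence { int refs; };
struct example_dev { int pm_usage; int resume_err; int attach_err; };
struct example_job { struct example_fence *done_fence; };

static struct example_fence *example_fence_get(struct example_fence *f)
{
	f->refs++;
	return f;
}

static void example_fence_put(struct example_fence *f)
{
	if (f)
		f->refs--;
}

/* Models the pm_runtime_resume_and_get() contract: on failure the usage
 * counter is left unchanged. */
static int example_resume_and_get(struct example_dev *dev)
{
	if (dev->resume_err)
		return dev->resume_err;
	dev->pm_usage++;
	return 0;
}

/* The caller hands over one reference on @fence. Returns 0 on success,
 * or a negative error with every reference dropped again. */
static int example_job_run(struct example_dev *dev, struct example_job *job,
			   struct example_fence *fence)
{
	int ret;

	job->done_fence = example_fence_get(fence);

	ret = example_resume_and_get(dev);
	if (ret < 0)
		goto err_put_fences;

	ret = dev->attach_err; /* stand-in for iommu_attach_group() */
	if (ret < 0)
		goto err_put_pm;

	return 0;

err_put_pm:
	dev->pm_usage--;
err_put_fences:
	example_fence_put(job->done_fence);
	job->done_fence = NULL;
	example_fence_put(fence);
	return ret;
}
```

After either failure the fence count and the PM usage count are both back to zero, which is the invariant the three listed bugs violate.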
Most of this patch series has already been pushed upstream. This is the
second half of the series, which has not been pushed yet, plus some
additional changes that were required to implement changes requested on
the mailing list. The series originates from Asahi and was previously
posted by Daniel Almeida.
The previous version of the patch series can be found here:
https://patchwork.freedesktop.org/series/164580/
Branch with patches applied available here:
https://gitlab.freedesktop.org/lyudess/linux/-/commits/rust/gem-shmem
This patch series applies on top of drm-rust-next.
Patch-series wide changes since V15:
* Fix some major rebasing errors I somehow didn't notice :(
* Drop the dependency on LazyInit, use the trick that Alice suggested
instead.
* Fix dependency ordering so that Tyr can get the vmap stuff first
without the other bits.
Patch-series wide changes since V16:
* Fix ordering one more time (SetOnce::reset() doesn't need to come
before adding vmap functions)
* Rebase against the latest DeviceContext changes from me that got
pushed.
Lyude Paul (4):
rust: drm: gem: shmem: Add DmaResvGuard helper
rust: drm: gem: shmem: Add vmap functions
rust: faux: Allow retrieving a bound Device
rust: drm: gem: Introduce shmem::Object::sg_table()
rust/kernel/drm/gem/shmem.rs | 524 ++++++++++++++++++++++++++++++++++-
rust/kernel/faux.rs | 16 +-
2 files changed, 524 insertions(+), 16 deletions(-)
base-commit: fea3a2dd7d3fc1936211ced5f84420e610435730
--
2.54.0
When the MMIO size is larger than 4G and peer-to-peer DMA goes through
the host bridge, a code path is triggered that assigns the total linked
IOVA length (which is greater than 4G) to mapped_len.
Previously, `mapped_len` was declared as a 32-bit `unsigned int`.
Accumulating `size_t` lengths into it silently wraps around, so
truncated lengths are passed to functions like `fill_sg_entry()`.
Fix this by changing `mapped_len` to `size_t` (64-bit on 64-bit
kernels). While at it, fix similar potential overflow issues in
`calc_sg_nents()` by using `check_add_overflow()` for `nents`, and use
`unsigned int` for the loop iterator in `fill_sg_entry()` to match.
Fixes: 3aa31a8bb11e ("dma-buf: provide phys_vec to scatter-gather mapping routine")
Cc: stable(a)vger.kernel.org
Cc: iommu(a)lists.linux.dev
Reviewed-by: Pranjal Shrivastava <praan(a)google.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Leon Romanovsky <leon(a)kernel.org>
Signed-off-by: David Hu <xuehaohu(a)google.com>
---
Changes in v7:
- Added a missing blank line after local variable declaration in
`calc_sg_nents()` (Leon).
- Collected Reviewed-by from Leon Romanovsky.
Changes in v6:
- Used `check_add_overflow()` in `calc_sg_nents()` for safer
accumulation (Leon).
- Dropped explicit `!nents` check and added a comment noting that
`sg_alloc_table` handles `nents == 0` (Leon).
- Collected Reviewed-by from Kevin Tian.
Changes in v5:
- Removed WARN_ON_ONCE from calc_sg_nents() to avoid log noise (Jason).
- Added explicit check for `!nents` in dma_buf_phys_vec_to_sgt() to
cleanly return -EINVAL on overflow (Jason).
Changes in v4:
- Added WARN_ON_ONCE() to the nents overflow check to prevent silent
failures (Claude Bot).
Changes in v3:
- Removed leftover sentence fragment from the commit message.
- Kept `nents = 0` initialization (previously stated as removed in the
v2 changelog) as it is strictly required for the `+=` accumulation
loop in `calc_sg_nents()`.
Changes in v2:
- Fixed 'IVOA' -> 'IOVA' typo and expanded commit message (Claude Bot).
- Added Reverse Xmas tree formatting (Pranjal).
- Folded in extra bounds checking for calc_sg_nents() (Pranjal).
- Folded in type consistency fix for fill_sg_entry() (Pranjal).
- Collected Reviewed-by from Pranjal Shrivastava.
drivers/dma-buf/dma-buf-mapping.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/dma-buf-mapping.c b/drivers/dma-buf/dma-buf-mapping.c
index 794acff2546a..80f6ab2f4809 100644
--- a/drivers/dma-buf/dma-buf-mapping.c
+++ b/drivers/dma-buf/dma-buf-mapping.c
@@ -5,12 +5,13 @@
*/
#include <linux/dma-buf-mapping.h>
#include <linux/dma-resv.h>
+#include <linux/overflow.h>
static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
dma_addr_t addr)
{
unsigned int len, nents;
- int i;
+ unsigned int i;
nents = DIV_ROUND_UP(length, UINT_MAX);
for (i = 0; i < nents; i++) {
@@ -40,8 +41,12 @@ static unsigned int calc_sg_nents(struct dma_iova_state *state,
size_t i;
if (!state || !dma_use_iova(state)) {
- for (i = 0; i < nr_ranges; i++)
- nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+ for (i = 0; i < nr_ranges; i++) {
+ unsigned int added = DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+
+ if (check_add_overflow(nents, added, &nents))
+ return 0;
+ }
} else {
/*
* In IOVA case, there is only one SG entry which spans
@@ -95,9 +100,10 @@ struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
size_t nr_ranges, size_t size,
enum dma_data_direction dir)
{
- unsigned int nents, mapped_len = 0;
struct dma_buf_dma *dma;
struct scatterlist *sgl;
+ size_t mapped_len = 0;
+ unsigned int nents;
dma_addr_t addr;
size_t i;
int ret;
@@ -133,6 +139,8 @@ struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
}
nents = calc_sg_nents(dma->state, phys_vec, nr_ranges, size);
+
+ /* sg_alloc_table will cleanly fail and return -EINVAL if nents == 0 */
ret = sg_alloc_table(&dma->sgt, nents, GFP_KERNEL | __GFP_ZERO);
if (ret)
goto err_free_state;
--
2.54.0.1064.gd145956f57-goog
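The wrap-around described in the commit message above is easy to reproduce in user space. The sketch below uses hypothetical helper names and fixed-width types so the result does not depend on the platform, and it uses the GCC/Clang builtin __builtin_add_overflow() as a stand-in for the kernel's check_add_overflow().

```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate 64-bit lengths into a 32-bit total. The explicit cast shows
 * what an implicit conversion does silently: the sum wraps modulo 2^32. */
static uint32_t example_sum_u32(const uint64_t *lens, size_t n)
{
	uint32_t total = 0;

	for (size_t i = 0; i < n; i++)
		total = (uint32_t)(total + lens[i]);
	return total;
}

/* Same accumulation with a 64-bit total: no truncation for these sizes. */
static uint64_t example_sum_u64(const uint64_t *lens, size_t n)
{
	uint64_t total = 0;

	for (size_t i = 0; i < n; i++)
		total += lens[i];
	return total;
}

/* Count how many entries of at most UINT32_MAX bytes are needed.
 * Returns 0 if the count does not fit in 32 bits. */
static uint32_t example_count_chunks(const uint64_t *lens, size_t n)
{
	uint32_t nents = 0;

	for (size_t i = 0; i < n; i++) {
		/* Round up without overflowing len + UINT32_MAX - 1. */
		uint64_t chunks = lens[i] / UINT32_MAX +
				  (lens[i] % UINT32_MAX != 0);

		if (chunks > UINT32_MAX)
			return 0;
		if (__builtin_add_overflow(nents, (uint32_t)chunks, &nents))
			return 0;
	}
	return nents;
}
```

For two ranges of 3 GiB and 2 GiB, the 32-bit total comes out as 1 GiB while the 64-bit total is the expected 5 GiB. The chunk counter reports 2 entries for that input and 0 once the running count would overflow.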
When the MMIO size is larger than 4G and peer-to-peer DMA goes through
the host bridge, a code path is triggered that assigns the total linked
IOVA length (which is greater than 4G) to mapped_len.
Previously, `mapped_len` was declared as a 32-bit `unsigned int`.
Accumulating `size_t` lengths into it silently wraps around, so
truncated lengths are passed to functions like `fill_sg_entry()`.
Fix this by changing `mapped_len` to `size_t` (64-bit on 64-bit
kernels). While at it, fix similar potential overflow issues in
`calc_sg_nents()` by using `check_add_overflow()` for `nents`, and use
`unsigned int` for the loop iterator in `fill_sg_entry()` to match.
Fixes: 3aa31a8bb11e ("dma-buf: provide phys_vec to scatter-gather mapping routine")
Cc: stable(a)vger.kernel.org
Cc: iommu(a)lists.linux.dev
Reviewed-by: Pranjal Shrivastava <praan(a)google.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Signed-off-by: David Hu <xuehaohu(a)google.com>
---
Changes in v6:
- Used `check_add_overflow()` in `calc_sg_nents()` for safer
accumulation (Leon).
- Dropped explicit `!nents` check and added a comment noting that
`sg_alloc_table` handles `nents == 0` (Leon).
- Collected Reviewed-by from Kevin Tian.
Changes in v5:
- Removed WARN_ON_ONCE from calc_sg_nents() to avoid log noise (Jason).
- Added explicit check for `!nents` in dma_buf_phys_vec_to_sgt() to
cleanly return -EINVAL on overflow (Jason).
Changes in v4:
- Added WARN_ON_ONCE() to the nents overflow check to prevent silent
failures (Claude Bot).
Changes in v3:
- Removed leftover sentence fragment from the commit message.
- Kept `nents = 0` initialization (previously stated as removed in the
v2 changelog) as it is strictly required for the `+=` accumulation
loop in `calc_sg_nents()`.
Changes in v2:
- Fixed 'IVOA' -> 'IOVA' typo and expanded commit message (Claude Bot).
- Added Reverse Xmas tree formatting (Pranjal).
- Folded in extra bounds checking for calc_sg_nents() (Pranjal).
- Folded in type consistency fix for fill_sg_entry() (Pranjal).
- Collected Reviewed-by from Pranjal Shrivastava.
drivers/dma-buf/dma-buf-mapping.c | 15 +++++++++++----
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/drivers/dma-buf/dma-buf-mapping.c b/drivers/dma-buf/dma-buf-mapping.c
index 794acff2546a..67a8ff52fb8f 100644
--- a/drivers/dma-buf/dma-buf-mapping.c
+++ b/drivers/dma-buf/dma-buf-mapping.c
@@ -5,12 +5,13 @@
*/
#include <linux/dma-buf-mapping.h>
#include <linux/dma-resv.h>
+#include <linux/overflow.h>
static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length,
dma_addr_t addr)
{
unsigned int len, nents;
- int i;
+ unsigned int i;
nents = DIV_ROUND_UP(length, UINT_MAX);
for (i = 0; i < nents; i++) {
@@ -40,8 +41,11 @@ static unsigned int calc_sg_nents(struct dma_iova_state *state,
size_t i;
if (!state || !dma_use_iova(state)) {
- for (i = 0; i < nr_ranges; i++)
- nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+ for (i = 0; i < nr_ranges; i++) {
+ unsigned int added = DIV_ROUND_UP(phys_vec[i].len, UINT_MAX);
+ if (check_add_overflow(nents, added, &nents))
+ return 0;
+ }
} else {
/*
* In IOVA case, there is only one SG entry which spans
@@ -95,9 +99,10 @@ struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
size_t nr_ranges, size_t size,
enum dma_data_direction dir)
{
- unsigned int nents, mapped_len = 0;
struct dma_buf_dma *dma;
struct scatterlist *sgl;
+ size_t mapped_len = 0;
+ unsigned int nents;
dma_addr_t addr;
size_t i;
int ret;
@@ -133,6 +138,8 @@ struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach,
}
nents = calc_sg_nents(dma->state, phys_vec, nr_ranges, size);
+
+ /* sg_alloc_table will cleanly fail and return -EINVAL if nents == 0 */
ret = sg_alloc_table(&dma->sgt, nents, GFP_KERNEL | __GFP_ZERO);
if (ret)
goto err_free_state;
--
2.54.0.1064.gd145956f57-goog
Importers have notoriously abused the struct page pointers from the sg_table
that the DMA-buf exporter provides. This has created numerous problems,
ranging from crashes and random memory corruption to security issues.
To find such bad importers, DMA-buf already has functionality to wrap the
sg_table and set the page pointers to NULL, enabled under CONFIG_DMABUF_DEBUG.
Change that to just CONFIG_DEBUG to catch even more importers doing something
nasty.
Signed-off-by: Christian König <christian.koenig(a)amd.com>
---
drivers/dma-buf/dma-buf.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 71f37544a5c6..d5dfa82ed2dd 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -857,7 +857,7 @@ static int dma_buf_wrap_sg_table(struct sg_table **sg_table)
struct dma_buf_sg_table_wrapper *to;
int i, ret;
- if (!IS_ENABLED(CONFIG_DMABUF_DEBUG))
+ if (!IS_ENABLED(CONFIG_DEBUG))
return 0;
/*
@@ -896,7 +896,7 @@ static void dma_buf_unwrap_sg_table(struct sg_table **sg_table)
{
struct dma_buf_sg_table_wrapper *copy;
- if (!IS_ENABLED(CONFIG_DMABUF_DEBUG))
+ if (!IS_ENABLED(CONFIG_DEBUG))
return;
copy = container_of(*sg_table, typeof(*copy), wrapper);
--
2.43.0
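The wrap and unwrap idea from the patch above can be modeled in user space: hand the importer a copy of the table whose page pointers are NULL, and recover the original later through a container_of-style lookup. The example_* types below are simplified, hypothetical stand-ins and do not mirror struct sg_table or the kernel helpers.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EXAMPLE_CONTAINER_OF(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct example_entry {
	void *page;        /* must not be touched by importers */
	uint64_t dma_addr; /* what importers are meant to use */
	uint32_t len;
};

struct example_table {
	struct example_entry *ents;
	size_t nents;
};

struct example_wrapper {
	struct example_table *original;
	struct example_table wrapper;
};

/* Replace *table with a copy that carries no page pointers. */
static int example_wrap(struct example_table **table)
{
	struct example_table *from = *table;
	struct example_wrapper *to = malloc(sizeof(*to));

	if (!to)
		return -1;
	/* calloc() zeroes the copy, so every page pointer starts as NULL. */
	to->wrapper.ents = calloc(from->nents, sizeof(*to->wrapper.ents));
	if (from->nents && !to->wrapper.ents) {
		free(to);
		return -1;
	}
	for (size_t i = 0; i < from->nents; i++) {
		to->wrapper.ents[i].dma_addr = from->ents[i].dma_addr;
		to->wrapper.ents[i].len = from->ents[i].len;
	}
	to->wrapper.nents = from->nents;
	to->original = from;
	*table = &to->wrapper;
	return 0;
}

/* Restore the original table pointer and free the copy. */
static void example_unwrap(struct example_table **table)
{
	struct example_wrapper *copy =
		EXAMPLE_CONTAINER_OF(*table, struct example_wrapper, wrapper);

	*table = copy->original;
	free(copy->wrapper.ents);
	free(copy);
}
```

An importer that dereferences the page pointer of the wrapped table hits NULL immediately, which turns silent misuse into an obvious failure.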