On Wed, Jun 10, 2026 at 03:08:17PM +0530, Ekansh Gupta wrote:
On 08-06-2026 02:44, Dmitry Baryshkov wrote:
On Thu, Jun 04, 2026 at 10:39:14AM +0530, Ekansh Gupta wrote:
On 20-05-2026 19:26, Dmitry Baryshkov wrote:
On Tue, May 19, 2026 at 11:46:02AM +0530, Ekansh Gupta via B4 Relay wrote:
From: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
Implement the FastRPC remote procedure call path, allowing user-space to invoke methods on the DSP via DRM_IOCTL_QDA_REMOTE_INVOKE.
qda_fastrpc.c / qda_fastrpc.h Implements the FastRPC protocol layer: argument marshalling (qda_fastrpc_invoke_pack), response unmarshalling (qda_fastrpc_invoke_unpack), and invocation context lifecycle management. Each invocation allocates a fastrpc_invoke_context which tracks buffer descriptors, GEM objects, and the completion used to synchronise with the DSP response.
Buffer arguments are handled in three ways:
- DMA-BUF fd: imported via PRIME, IOMMU-mapped dma_addr used
- Direct (inline): copied into the GEM-backed message buffer
- DMA handle: fd forwarded to DSP, physical page descriptor computed
No. This needs to go away. The QDA should support only one way to pass data - via the GEM buffers. Everything else should be handled by the shim layer, etc.
Each FD passed here is a GEM buffer. The reason to pass the fd is that some APIs on the DSP side take an fd as an argument, and users might use them in their skel implementation. In this case the remote call carries the fd to the DSP and the skel implementation uses it.
Then handle it all on the userspace side. In the end, bad library API is not a reason to complicate kernel API and kernel driver.
The problem is that the user passes the fd as an argument to the remote call, and the fastrpc library cannot interpret it. The user can allocate a buffer (say, with FD1) and then call a remote method, passing FD1 as an int argument in order to call HAP_mmap on it on the DSP side. The fastrpc library cannot tell whether this int argument is an FD or a non-FD value.
How does it tell the difference _now_? I hope it doesn't accept a u64 value and brute-force whether it is an FD, an address or something else.
+#define FASTRPC_SCALARS(method, in, out) \
+	FASTRPC_BUILD_SCALARS(0, method, in, out, 0, 0)
+
+/**
+ * struct fastrpc_buf_overlap - Buffer overlap tracking structure
+ *
+ * Tracks overlapping buffer regions to optimise memory mapping and avoid
+ * redundant mappings of the same physical memory.
What for? Even if this is a valid optimization, implement it as a subsequent patch. The first goal should be very simple - get GEM buffers from the app, pass them to the DSP, read the results.
Yes, this implementation mimics the existing fastrpc design, where non-FD buffers are also supported. I am currently evaluating the maintenance of such buffers on the userspace side and trying to understand the impact. I plan to bring it as a future enhancement if there is no regression.
Other way around. Drop it for now and bring it back if it has any positive impact.
We did an evaluation and don't see userspace-side handling being feasible for non-FD buffers. I'll try to summarize the current design and the problem:
Currently a remote call can take up to 255 arguments, and in many cases the user passes the buffers as non-FD arguments, which are then copied to the metadata and sent to the DSP. Before the copy, there is an operation to identify whether the buffers overlap so that they can be maintained efficiently.
The DSP understands this based on the offset and maps it accordingly, so multiple small arguments can share a single page. If we allocate a GEM buffer for each of these small arguments, it would create multiple pages (up to 255), all of which must then be mapped onto the DSP, which could also lead to DSP address space exhaustion. So the limitation is that there are too many pages and that the DSP cannot handle this as efficiently as overlapped copy buffers.
We started to discuss it during the call. Pretty much like you use a single page (or single buffer) for small buffers in the kernel, your userspace should be able to create the same single-BO-multiple-data argument and then pass it to the kernel.
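The single-BO-multiple-data idea can be illustrated with a minimal userspace sketch. Everything here is an assumption made for illustration: the names pack_args, struct packed_arg and PACK_ALIGN do not come from the patch, and bo_map stands for a CPU mapping of one GEM buffer obtained by whatever means the library uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-argument alignment; not taken from the patch. */
#define PACK_ALIGN 8u

/* Hypothetical descriptor: where one argument lives inside the shared BO. */
struct packed_arg {
	uint64_t offset;	/* byte offset from the start of the BO */
	uint64_t size;		/* argument length in bytes */
};

/*
 * Copy nargs small arguments into the CPU mapping of a single BO and
 * record the offset of each one. Returns 0 on success, -1 if the BO is
 * too small. *used receives the number of bytes consumed.
 */
int pack_args(void *bo_map, size_t bo_size, const void *const *args,
	      const size_t *sizes, size_t nargs,
	      struct packed_arg *out, size_t *used)
{
	size_t off = 0;

	assert(bo_map && out && used);

	for (size_t i = 0; i < nargs; i++) {
		/* Round the running offset up to the next aligned boundary. */
		off = (off + PACK_ALIGN - 1) & ~(size_t)(PACK_ALIGN - 1);
		if (off > bo_size || sizes[i] > bo_size - off)
			return -1;
		memcpy((char *)bo_map + off, args[i], sizes[i]);
		out[i].offset = off;
		out[i].size = sizes[i];
		off += sizes[i];
	}
	*used = off;
	return 0;
}
```

With this kind of packing, 255 small arguments would occupy a handful of pages in one buffer rather than one buffer each, and the kernel would only see one handle plus per-argument offsets.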
I think, you are mixing several different problems into a single bucket. One is how to pass and map data buffers to the DSP, the other one is how to pass arguments via the uAPI.
I think, for the second question we have an answer. Each argument is located in a buffer at a certain offset provided by the userspace. All the buffers are identified by the GEM handles. It should not matter for the kernel driver if the buffer has been allocated from the QDA device or if it was imported from another DMA-BUF provider. It should not matter (again, for the kernel), if the user wants to pass all arguments in a single BO or if each argument is a separate BO. The kernel must collect GEM handles used by the call, make sure that they are mapped to the DSP address space, convert them to the addresses for the DSP side and then pass those addresses to the DSP. All the overlapping calculations, packing, strategy belong to the userspace.
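A rough model of the handle + offset scheme, written as plain userspace C rather than kernel code. The structures arg_ref and bo_mapping and the function resolve_arg are hypothetical names invented for this sketch; the real uAPI layout and the way the driver tracks DSP mappings are not specified in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uAPI argument: a region inside a GEM buffer. */
struct arg_ref {
	uint32_t handle;	/* GEM handle of the buffer */
	uint64_t offset;	/* offset of the argument inside the buffer */
	uint64_t size;		/* length of the argument */
};

/* Hypothetical bookkeeping: a buffer already mapped for the DSP. */
struct bo_mapping {
	uint32_t handle;	/* GEM handle */
	uint64_t dsp_addr;	/* address of the buffer as seen by the DSP */
	uint64_t size;		/* size of the buffer */
};

/*
 * Translate one (handle, offset, size) argument into a DSP address.
 * Returns 0 on success, -1 if the handle is unknown or the region
 * does not fit inside the buffer.
 */
int resolve_arg(const struct bo_mapping *maps, size_t nmaps,
		const struct arg_ref *arg, uint64_t *dsp_addr)
{
	assert(maps && arg && dsp_addr);

	for (size_t i = 0; i < nmaps; i++) {
		if (maps[i].handle != arg->handle)
			continue;
		/* Overflow-safe bounds check on offset + size. */
		if (arg->offset > maps[i].size ||
		    arg->size > maps[i].size - arg->offset)
			return -1;
		*dsp_addr = maps[i].dsp_addr + arg->offset;
		return 0;
	}
	return -1;
}
```

In this model the translation step does not care whether the buffer was allocated from the device or imported, nor how many arguments share a buffer; it only validates the region and adds the offset.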
+	/** @handle: Handle of the remote method being invoked */
+	u32 handle;
+	/** @crc: Pointer to CRC values for data integrity checking */
+	u32 *crc;
Add it later. It's unused. Drop all unused fields.
ack.
+	/** @fdlist: Pointer to array of DMA-BUF file descriptors */
+	u64 *fdlist;
Why do you need DMA-BUFs in the invocation context? They all should be GEM buffers.
The reason is that users depend on FDs, as they can import buffers allocated from anywhere, and there are DSP APIs which take an fd as an argument, so they might end up using them in their skel implementation.
No, DSP API can't take FD, they don't quite cross the OS and IOMMU boundary. It's the userspace library API. Which might be improved, rewritten, implemented underneath, etc. For the kernel side please, pass _only_ GEM handles + offsets.
Yes, but with the current DSP design, DSP APIs take FD just because of client/user design. On fastrpc, users could bring FD from any source, register it with fastrpc and pass it on to DSP.
The users can bring FD from any source, import it to the QDA's GEM and then receive the handle.
The major problem is what I mentioned above, where the user application passes the FD as an integer argument and the fastrpc library is not able to identify whether that int is an fd or some other data.
Please provide an example: the API and the ways to pass the data via the FD or 'other data'. Explain, how _currently_ it is handled.
But, anyway, a bad userspace design is not a reason to complicate uAPI. Library API is not written in stone, there are SOVERSIONs, wrappers and all other ways to provide phase out, deprecation and backwards compatibility. The uAPI, on the other hand, is written in stone.
+	/** @pkt_size: Total payload size in bytes */
+	u64 pkt_size;
+	/** @aligned_pkt_size: Page-aligned payload size for GEM allocation */
+	u64 aligned_pkt_size;
+	/** @list: Array of invoke buffer descriptors */
+	struct fastrpc_invoke_buf *list;
+	/** @pages: Array of physical page descriptors for all arguments */
+	struct fastrpc_phy_page *pages;
+	/** @input_pages: Array of physical page descriptors for input buffers */
+	struct fastrpc_phy_page *input_pages;
I think you are trying to bring all the complexity from the old driver with no added benefit. Please don't. Use the existing memory manager. Let it handle all the gory details. If something is not there, we should consider extending GEM instead.
I'm not changing the metadata format as the DSP might not understand the messages if we modify it.
Well, it's up to you to know if DSP will understand the message or not. The probability ("might not") is not suitable here. Anyway, let's get rid of the various data formats first, then maybe some of the items will go away on their own.
ack
Also, the fd is still being used because of the client dependency on it. I'll check if there is any other logic that needs alteration here.
If the client keeps on passing FD to the library calls, you can map FD to GEM handles in the library code.
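One way a library could keep accepting FDs from clients while handing only GEM handles to the kernel is a small fd-to-handle cache. The sketch below is an assumption-laden illustration: fd_map, fd_map_get and the import_fn callback are invented names, and fake_import is a stand-in so the example runs without a device. A real library would presumably perform the import through the DRM PRIME path, for example libdrm's drmPrimeFDToHandle().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FD_MAP_MAX 64	/* hypothetical table size */

/* Hypothetical importer: turns a DMA-BUF fd into a GEM handle. */
typedef int (*import_fn)(int dmabuf_fd, uint32_t *handle);

struct fd_map_entry {
	int fd;
	uint32_t handle;
};

struct fd_map {
	struct fd_map_entry e[FD_MAP_MAX];
	size_t n;
	import_fn import;
};

/* Stand-in importer for this example only; counts how often it runs. */
int fake_import_calls;

int fake_import(int dmabuf_fd, uint32_t *handle)
{
	if (dmabuf_fd < 0)
		return -1;
	fake_import_calls++;
	*handle = (uint32_t)dmabuf_fd + 100u;
	return 0;
}

/*
 * Return the GEM handle for fd, importing it on first use.
 * Entries would have to be dropped when the client closes the fd,
 * since fd numbers are reused; that part is omitted here.
 */
int fd_map_get(struct fd_map *m, int fd, uint32_t *handle)
{
	assert(m && m->import && handle);

	for (size_t i = 0; i < m->n; i++) {
		if (m->e[i].fd == fd) {
			*handle = m->e[i].handle;
			return 0;
		}
	}
	if (m->n == FD_MAP_MAX || m->import(fd, handle))
		return -1;
	m->e[m->n].fd = fd;
	m->e[m->n].handle = *handle;
	m->n++;
	return 0;
}
```

This only covers FDs that reach the library through its own API. It does not by itself address the case of an fd hidden inside an opaque int argument, which is the open question in this thread.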
I hope the int argument part mentioned above answers this.
NO. You are still telling me that you allow users to shove random data to the kernel and then make the kernel decipher what kind of data it received. This is a very bad interface. Fix it.
+static int fastrpc_context_get_id(struct fastrpc_invoke_context *ctx, struct qda_dev *qdev)
+{
+	int ret;
+	u32 id;
+
+	if (!qdev)
+		return -EINVAL;
+
+	ret = xa_alloc(&qdev->ctx_xa, &id, ctx, xa_limit_32b, GFP_KERNEL);
+	if (ret)
+		return ret;
+
+	ctx->ctxid = id << 4;
Why is it being shifted?
This is to accommodate the PD type.
Not really an answer.
Okay, let me bring the ctxid layout that DSP expects:
[11:4] = CCCCCCCC (context ID)
[3:0] = PPPP (PD type)
Based on this PD type, DSP will decide where to queue the message.
And what does it mean?
+	return 0;
+}
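For reference, the ctxid layout quoted above ([11:4] context ID, [3:0] PD type) can be written out as explicit encode/decode helpers. This is a standalone sketch: the macro and function names are invented for illustration, and it assumes the two fields quoted in the thread are the complete layout.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions derived from the layout quoted in the thread. */
#define CTXID_PD_MASK	0xFu	/* bits [3:0]: PD type */
#define CTXID_ID_SHIFT	4	/* context ID starts at bit 4 */
#define CTXID_ID_MASK	0xFFu	/* bits [11:4]: 8-bit context ID */

/* Combine a context ID and a PD type into one ctxid value. */
uint32_t ctxid_encode(uint32_t id, uint32_t pd_type)
{
	/* An 8-bit field only holds IDs 0..255; larger IDs would not fit. */
	assert(id <= CTXID_ID_MASK && pd_type <= CTXID_PD_MASK);
	return (id << CTXID_ID_SHIFT) | pd_type;
}

/* Extract the context ID from bits [11:4]. */
uint32_t ctxid_to_id(uint32_t ctxid)
{
	return (ctxid >> CTXID_ID_SHIFT) & CTXID_ID_MASK;
}

/* Extract the PD type from bits [3:0]. */
uint32_t ctxid_to_pd(uint32_t ctxid)
{
	return ctxid & CTXID_PD_MASK;
}
```

Named helpers like these would make the intent of the shift visible at the call site. If the 8-bit field is indeed the whole context ID, the ID allocator would also need an upper limit of 255 rather than a 32-bit range.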