Happy new year everyone,
I have a few questions regarding the design of the whole memory-sharing-over-Vsock work, and I am looking for some feedback before I spend time making the changes.
The initial design (from Nov):
- Initially I implemented an FFA-based DMA heap (no DMA ops used), which would allocate memory and make direct FFA calls to send the memory to the other endpoint.
- Userspace would open this heap, allocate a dma-buf and pass its FD over Vsock.
- The Vsock layer would then call ->shmem() (a new callback I added to dma-buf/heap, which sends the memory over FFA and returns the FFA bus address) to get the metadata to be shared over Vsock.
The current design:
- In one of the calls, Bertrand suggested not creating parallel paths for sending memory over FFA. Instead we should somehow use the existing dma-ops for FFA (used for virtqueue and reserved-mem).
- I created a platform device (as a child of the FFA device) and assigned a new set of DMA ops to it (the only difference from the reserved-mem ops is that we don't do swiotlb here and allocate fresh memory instead of using the reserved mem).
- This pdev is used by the DMA heap to allocate memory using dma_alloc_coherent(), which made sure everything got mapped correctly.
- I still need a dma-buf helper to get the metadata to send over Vsock. The existing helper was renamed (s/shmem/shmem_data/).
The future design (that I have questions about):
- The FFA-specific DMA heap I now have doesn't do anything special compared to the system heap; it is mostly the same.
- This made me realize that I probably shouldn't add a new heap (Google can add one later if they really want) and that the solution should work with any heap / dma-buf.
- So, userspace should allocate from the system heap, get a dma-buf from it and send its FD.
- The Vsock layer should then attach this dma-buf to a `struct device` somehow and call map_dma_buf() for the dma-buf. This requires the dma-ops of the device to be set to the FFA-based dma-ops, and then it should just work.
- The tricky point is finding that device struct (as Vsock can't get it from the dma-buf or userspace).
- One way, I think (still needs exploring but should be possible), is to use the struct device of the virtio-msg device over which Vsock is implemented. We can set the dma-ops of the virtio-msg device accordingly.
- The system heap doesn't guarantee contiguous allocation though (which my FFA heap did), so we will be required to send a scatter-gather list over Vsock, instead of just one address and size (what I am doing right now).
- Does this make sense? Or is there a use case that this won't solve?
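To illustrate the contiguity point in the last bullets, here is a minimal userspace sketch (not kernel code, and not taken from the patches). It assumes a hypothetical `struct mem_range` holding an address and a length, a hypothetical helper `coalesce_pages()`, and a 4 KiB page size. It merges physically adjacent pages into ranges, so a contiguous buffer collapses to a single (address, size) pair while a scattered one needs a list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SZ 4096u  /* assumed page size, for illustration only */

/* Hypothetical metadata entry: one physically contiguous range. */
struct mem_range {
    uint64_t addr;
    uint64_t len;
};

/*
 * Merge adjacent pages into ranges. `pages` holds page-aligned addresses in
 * buffer order; `out` must have room for `n` entries. Returns the number of
 * ranges written to `out`.
 */
size_t coalesce_pages(const uint64_t *pages, size_t n, struct mem_range *out)
{
    size_t nr = 0;

    assert(n == 0 || (pages != NULL && out != NULL));

    for (size_t i = 0; i < n; i++) {
        if (nr > 0 && out[nr - 1].addr + out[nr - 1].len == pages[i]) {
            out[nr - 1].len += PAGE_SZ;   /* page extends the previous range */
        } else {
            out[nr].addr = pages[i];      /* page starts a new range */
            out[nr].len = PAGE_SZ;
            nr++;
        }
    }
    return nr;
}
```

With three adjacent pages the helper returns one range; if the third page lives elsewhere it returns two, which is the case where a single address and size is no longer enough.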
Hi Viresh,
Please see my comments inline below.
On 5 Jan 2026, at 12:18, Viresh Kumar viresh.kumar@linaro.org wrote:
The future design (that I have questions about):
- The Vsock layer should then attach this dma-buf to a `struct device` somehow and call map_dma_buf() for the dma-buf. This requires the dma-ops of the device to be set to the FFA-based dma-ops, and then it should just work.
- The tricky point is finding that device struct (as Vsock can't get it from the dma-buf or userspace).
- One way, I think (still needs exploring but should be possible), is to use the struct device of the virtio-msg device over which Vsock is implemented. We can set the dma-ops of the virtio-msg device accordingly.
For a virtio-msg over FF-A device, will you use the dma-ops based on the virtio-msg over FF-A bus memory sharing, or are you implying that you will define another dma-ops based purely on FF-A (and exchange FF-A handles) here?
If you use the area_share system, then since dma_alloc_coherent() gives you a contiguous virtual address space, you can exchange a bus address and let the other side resolve the non-contiguous physical addresses, which removes the need to exchange a scatter-gather list as you mention below.
- The system heap doesn't guarantee contiguous allocation though (which my FFA heap did), so we will be required to send a scatter-gather list over Vsock, instead of just one address and size (what I am doing right now).
- Does this make sense? Or is there a use case that this won't solve?
It does, but I think we can solve the scatter-gather list issue through an FF-A handle, which can correspond to a non-contiguous area of memory, or through a bus address, which could do the same (if we have a contiguous virtual address space).
Cheers Bertrand
-- viresh
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
On Mon, 5 Jan 2026 at 19:06, Bertrand Marquis Bertrand.Marquis@arm.com wrote:
For a virtio-msg over FF-A device, will you use the dma-ops based on the virtio-msg over FF-A bus memory sharing, or are you implying that you will define another dma-ops based purely on FF-A (and exchange FF-A handles) here?
There will be two sets of DMA ops (present in the same file), both based on FFA and mostly doing the same thing. The only differences are:
- One will take care of memory allocation from reserved-mem; the other will use memory provided to it by the heap.
- One will do swiotlb; the other will not.
From the FFA point of view, they will do exactly the same thing.
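As a rough illustration of that split, here is a small userspace model (a sketch only, not the kernel's `struct dma_map_ops` and not the code from the patches). Every name in it is hypothetical: `struct model_ops`, `alloc_from_reserved()`, `common_share()`, and the two tables. It only shows two ops tables that differ in where memory comes from and whether mapping bounces, while pointing at the same sharing step.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified ops table; NOT the kernel's struct dma_map_ops. */
struct model_ops {
    void *(*alloc)(size_t size);   /* NULL: memory is provided by the caller */
    bool bounce;                   /* true: map through a bounce pool (swiotlb-like) */
    int (*share)(const void *buf, size_t size);  /* common FFA-facing step */
};

static unsigned char reserved_pool[4096];  /* stands in for reserved-mem */
static size_t reserved_used;

/* Carve an allocation out of the fixed pool; NULL when it does not fit. */
void *alloc_from_reserved(size_t size)
{
    void *p;

    assert(reserved_used <= sizeof(reserved_pool));
    if (size == 0 || size > sizeof(reserved_pool) - reserved_used)
        return NULL;
    p = reserved_pool + reserved_used;
    reserved_used += size;
    return p;
}

/* Used by both tables: only validates its input, as a placeholder for sharing. */
int common_share(const void *buf, size_t size)
{
    return (buf != NULL && size > 0) ? 0 : -1;
}

/* Allocates from the reserved pool and bounces. */
const struct model_ops reserved_mem_ops = {
    .alloc = alloc_from_reserved, .bounce = true, .share = common_share,
};

/* Maps memory handed in by a heap; no allocation, no bouncing. */
const struct model_ops heap_ops = {
    .alloc = NULL, .bounce = false, .share = common_share,
};
```

Both tables use the same `share` pointer, which mirrors the statement that the FFA side is identical for the two.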
If you use the area_share system, then since dma_alloc_coherent() gives you a contiguous virtual address space, you can exchange a bus address and let the other side resolve the non-contiguous physical addresses, which removes the need to exchange a scatter-gather list as you mention below.
The system heap doesn't use dma_alloc_coherent(); it calls get_free_pages() multiple times, so the result is a scatter-gather list. The mapping routine maps the sg list, which the FFA dma-ops handle in a for-loop, hence the separate area IDs.
-- Viresh
Hi Viresh,
On 5 Jan 2026, at 18:53, Viresh Kumar viresh.kumar@linaro.org wrote:
On Mon, 5 Jan 2026 at 19:06, Bertrand Marquis Bertrand.Marquis@arm.com wrote:
For a virtio-msg over FF-A device, will you use the dma-ops based on the virtio-msg over FF-A bus memory sharing, or are you implying that you will define another dma-ops based purely on FF-A (and exchange FF-A handles) here?
There will be two sets of DMA ops (present in the same file), both based on FFA and mostly doing the same thing. The only differences are:
- One will take care of memory allocation from reserved-mem; the other will use memory provided to it by the heap.
- One will do swiotlb; the other will not.
From the FFA point of view, they will do exactly the same thing.
You say FFA: do you mean virtio-msg over FF-A, or do you want to have FF-A specific dma-ops that are not related to virtio-msg?
If you use the area_share system, then since dma_alloc_coherent() gives you a contiguous virtual address space, you can exchange a bus address and let the other side resolve the non-contiguous physical addresses, which removes the need to exchange a scatter-gather list as you mention below.
The system heap doesn't use dma_alloc_coherent(); it calls get_free_pages() multiple times, so the result is a scatter-gather list. The mapping routine maps the sg list, which the FFA dma-ops handle in a for-loop, hence the separate area IDs.
A single FFA_MEM_SHARE call can take a scatter-gather list and generate a single handle, so why use several area IDs? Will you end up with one area ID per page all the time? In which cases will an area ID point to a larger region?
In the case of a preallocated area (contiguous or not) for dma_alloc_coherent(), I think we could do a single mem share, have a single area ID, and then only use bus addresses inside the area. This would greatly reduce the stress on map/unmap.
Cheers Bertrand
-- Viresh
On 06-01-26, 07:28, Bertrand Marquis wrote:
You say FFA: do you mean virtio-msg over FF-A, or do you want to have FF-A specific dma-ops that are not related to virtio-msg?
virtio-msg over FF-A.
Here [1] is the current implementation, which adds two sets of struct dma_map_ops.
A single FFA_MEM_SHARE call can take a scatter-gather list and generate a single handle, so why use several area IDs?
Yeah, just realized I can improve virtio_msg_dma_map_sg() to do this better.
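For readers following the thread, the difference between the two approaches can be modelled in a few lines of userspace C. This is only a sketch under assumptions: it is not the implementation of virtio_msg_dma_map_sg(), it makes no real FF-A calls, and `struct sg_ent`, `struct fake_transport`, `fake_share()`, `map_sg_per_entry()` and `map_sg_single()` are all hypothetical names. It just counts how many share operations and IDs each approach produces for the same list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scatter-gather entry: one contiguous chunk. */
struct sg_ent {
    uint64_t addr;
    uint64_t len;
};

/* Hypothetical stand-in for the transport; it only keeps counters. */
struct fake_transport {
    unsigned int share_calls;
    uint64_t bytes_shared;
};

/* Models one share operation covering `n` chunks; returns a made-up ID. */
uint32_t fake_share(struct fake_transport *t, const struct sg_ent *sg, size_t n)
{
    assert(t != NULL && sg != NULL && n > 0);

    for (size_t i = 0; i < n; i++)
        t->bytes_shared += sg[i].len;
    return ++t->share_calls;
}

/* One share (and one ID) per sg entry; `ids` must have room for `n` IDs. */
size_t map_sg_per_entry(struct fake_transport *t, const struct sg_ent *sg,
                        size_t n, uint32_t *ids)
{
    for (size_t i = 0; i < n; i++)
        ids[i] = fake_share(t, &sg[i], 1);
    return n;
}

/* One share for the whole list, yielding a single ID. */
uint32_t map_sg_single(struct fake_transport *t, const struct sg_ent *sg,
                       size_t n)
{
    return fake_share(t, sg, n);
}
```

For a three-entry list, the per-entry variant performs three shares and returns three IDs, while the single-share variant covers the same bytes with one share and one ID.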