[Virtio-msg] Re: Linux Loopback POC

28 Apr 2026

Hi Viresh,
...
On 28 Apr 2026, at 08:41, Viresh Kumar viresh.kumar@linaro.org wrote:
On 27-04-26, 12:00, Bertrand Marquis wrote:
...
...
On 27 Apr 2026, at 13:34, Viresh Kumar viresh.kumar@linaro.org wrote:
We can have a spin-lock implementation for that? How does the current code
solve that? Some sort of blocking needs to be done if the caller expects the
response in the same thread.
I hope that can be done with a simple change over the current code.
The thing is, you need to sleep to wait for an answer. Using a spinlock means we
block in the kernel waiting for an answer from another VM or QEMU, which is not
possible.
Right.
...
In some cases (event sending) you can solve that using deferred work,
but in others (block driver during probe) I had to solve it with more complex
mechanisms:

- config register caches
- a DMA pool with a kind of retry answer and a deferred pool increase, so that the
  next try has enough space in the pool to continue without blocking
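To illustrate the retry-and-defer idea from the list above, here is a minimal userspace C sketch, not the PoC code. It assumes a pool that is already shared with the device; the atomic path only succeeds or returns `-EAGAIN`, and a sleepable worker grows the pool later. All names (`dma_pool_sketch`, `pool_try_alloc`, `pool_deferred_grow`) are hypothetical, and the grow step only stands in for a real share request/response exchange.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical pool descriptor; field names are illustrative only. */
struct dma_pool_sketch {
    size_t capacity;     /* bytes currently shared with the device */
    size_t used;         /* bytes already handed out */
    size_t grow_pending; /* extra bytes the deferred worker should add */
};

/*
 * Non-sleepable path: either carve out space immediately or record the
 * shortfall and ask the caller to retry later. It never waits.
 */
static int pool_try_alloc(struct dma_pool_sketch *p, size_t len, size_t *offset)
{
    assert(p->used <= p->capacity);

    size_t avail = p->capacity - p->used;
    if (avail >= len) {
        *offset = p->used;
        p->used += len;
        return 0;
    }

    /* Remember the largest shortfall seen so the next try can succeed. */
    size_t shortfall = len - avail;
    if (shortfall > p->grow_pending)
        p->grow_pending = shortfall;
    return -EAGAIN;
}

/*
 * Sleepable path (stands in for a deferred work handler): this is where a
 * real implementation could block on the share request and its response.
 */
static void pool_deferred_grow(struct dma_pool_sketch *p)
{
    p->capacity += p->grow_pending;
    p->grow_pending = 0;
}
```

A caller in atomic context would treat `-EAGAIN` as "requeue the request", and the retry would run only after `pool_deferred_grow()` has completed.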
I believe that can be done with the current design too?
Correct me if I'm wrong, but your current design just uses a statically shared area
from DT; it does not use FF-A mem share. So it can be done, and this is what I did.
...
...
Definitely OK with that, but right now, as I said, I want to have something fully working
to be able to:

- ensure the spec is implementable
- check whether spec enhancements are possible to simplify the implementation

That's okay.
...
...
Maybe let's start with the problems one by one, with the exact use-case, to see what
we are lacking right now. I am still not able to see the full picture (in the sense
of the problems we have).
My goal right now is to have the following working in loopback and using FF-A
between 2 VMs using QEMU as the VMM:

- entropy device
- block device

and to be able to stress the system by creating a file from entropy output inside the
disk.
Nice.
...
I discovered that getting entropy working is not that complex, but disk is a lot more
hacky.
Main issues I encountered so far:

init chicken and egg:
     - device or driver coming first

Not sure why that is an issue. The Linux driver model will probe only after the
device is available.
I agree that the Linux driver model does that, but in practice, when using indirect messages,
you potentially have several drivers coming up at the same time and also making themselves
visible to others, which ends up in some complexity.
This is possible to do and is what I am trying to solve (it works on my current FF-A PoC), but it
requires a bunch of loops, retries and waiting, and also a bit of ordering so that drivers
are ready and working before they start receiving indirect messages from other VMs.
You seem to imply that this is simple. I thought so too, but the closer I get to a realistic case,
the more I have to strengthen the implementation, which ends up in more complex code,
which is logical.
My initial bare-metal implementation is way simpler because it does not have all the Linux
complexity.
...
...
   - driver needing to exchange messages or use dma during probe

This is quite normal and must be supported. Not sure what prevents that
currently. I have tested I2C, GPIO, Vsock so far and they do basic message
exchange at probe.
Agreed, but disk is doing DMA sharing and virtqueue configuration, and using them from a
non-sleepable context, which is not that simple to handle without going against the
scheduling model of Linux (things worked a lot better before I activated all the Linux RCU,
lock and timing-constraint debugging).
...
...
   - messages exchanged or DMA share creation during non sleepable context

qemu/vmm memory handling
     - when do you unshare
     - how can you ensure a share is ready before first event avail
all timeout and queuing issues
     - how to sleep waiting for an answer or waiting to be able to send
     - how to stack (events for example) or defer
     - who has to sleep and when
     - how to handle deferred work when exiting, removing a device or VM

Right, these all look valid concerns and I don't see why this can't work with
the current implementation. Maybe we need to fix a few things here and there.
Agreed, but "a few" ends up being a lot. Still, yes, all of those are fixable.
...
...
Right now I already have several consequences I need to handle in the spec:

we must have a pool, sharing on demand does not work for disk

Isn't that a device-specific issue? Why is this a virtio-msg kernel or spec issue?
Because to properly use a pool you need to inform the other side that a shared area is
not to be relinquished once used, as it will be reused. So I need to modify the FF-A bus sharing
protocol to add something that tells the device side whether it should try to relinquish an area
once it is no longer needed or wait for an explicit release from the driver side.
This is not major but still needs some spec rework.
Added to that, the main issue that I am facing in the kernel is the fact that a memory sharing
request from a non-sleepable context cannot sleep... and FF-A bus memory sharing relies
on a request/response system. If you try to send the request without waiting for the response,
you end up having an event avail sent before the other side can actually access the memory,
because it did not process the share before the event avail, and you get errors.
This requires either some ordering or a different way to share asynchronously to get some
performance.
I designed this in the early stage with sharing going through a table in shared memory so that
mapping could be done on demand. Seeing how QEMU works, this could work but would need
some rework (we cannot say that shared memory bus addresses can be anywhere, otherwise the
implementation becomes too complex).
So it is not a spec issue, but not modifying the spec makes the implementation very complex,
and my current investigation shows that some changes could make the implementation
simpler and win a lot of performance. Finding those was the point of my PoC.
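One way to picture the "some ordering" option mentioned above is to hold back event avail notifications until the share has been acknowledged. The following is a minimal userspace C sketch under that assumption, not the PoC code: the state names, `vq_sketch`, and the three functions are all hypothetical, and counters stand in for actually sending messages.

```c
#include <assert.h>

/* Hypothetical share life cycle for one region backing a virtqueue. */
enum share_state { SHARE_NONE, SHARE_REQUESTED, SHARE_ACKED };

struct vq_sketch {
    enum share_state share;
    unsigned int pending_events; /* event avail notifications held back */
    unsigned int sent_events;    /* notifications actually sent */
};

/* Non-sleepable path: fire the share request and return immediately. */
static void share_request(struct vq_sketch *q)
{
    q->share = SHARE_REQUESTED;
}

/*
 * Non-sleepable path: only notify once the peer can access the memory,
 * otherwise stack the event so it is not lost.
 */
static void notify_avail(struct vq_sketch *q)
{
    if (q->share == SHARE_ACKED)
        q->sent_events++;
    else
        q->pending_events++;
}

/* Called when the share response arrives: flush everything held back. */
static void share_ack(struct vq_sketch *q)
{
    assert(q->share == SHARE_REQUESTED);
    q->share = SHARE_ACKED;
    q->sent_events += q->pending_events;
    q->pending_events = 0;
}
```

This keeps the sender from sleeping, at the cost of extra latency for the first events and some state to clean up if the device or VM goes away while events are still stacked.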
...
...

if we have a pool the sharer must say when to release, otherwise we have to reshare
the pool content as the device has no idea that something is a pool

we need some config caching in the transport,

I feel that may not be the right approach. The transport should provide the
mechanism to make it work, sleep-able and non-sleep-able (busy loop). The driver
can choose to do what it wants, but it may not be correct for the transport to
manage that.
Configuration space is a transport thing, and having the bus handle config caching
would be very complex. The whole idea of the generation system from Bill
was to allow this kind of thing, and practice seems to reveal that it is in fact
needed.
I am not quite sure how this could be solved at bus level but I am definitely open
to suggestions here.
...
Though it may be better to get views on this at the time of upstreaming.
Maintainers may have a say in that and may not agree with what I said :)
Before upstreaming anything my goal is to have something working.
...
...
otherwise any config value request from interrupt context which has to sleep cannot be processed
But that can busy loop?
You cannot busy loop waiting for an interrupt generated by another VM; that
would go against our goal of making things asynchronous.
You can try, and with all debug options activated you will for sure end up with
a kernel oops.
...
...

config generation, strict as it is, cannot easily be implemented without ending up in loops
because the generation is changing while you refresh, or you have to refresh the whole config
cache each time one value is modified
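The loop problem described above can be pictured with a generation-checked cache refresh: read the generation, copy the config, read the generation again, and retry if it moved. The sketch below is a userspace C illustration with a bounded retry count, not the PoC code. The device model, the `changes_left` hook that simulates concurrent updates, and all function names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical device model used only to exercise the refresh loop. */
struct cfg_device_sketch {
    uint32_t generation;
    uint8_t config[16];
    unsigned int changes_left; /* simulated updates racing with reads */
};

static uint32_t dev_read_generation(const struct cfg_device_sketch *d)
{
    return d->generation;
}

/* Copy the config, then let one simulated concurrent update land. */
static void dev_read_config(struct cfg_device_sketch *d, uint8_t *out, size_t len)
{
    assert(len <= sizeof(d->config));
    memcpy(out, d->config, len);
    if (d->changes_left > 0) {
        d->changes_left--;
        d->config[0]++;
        d->generation++;
    }
}

/*
 * Refresh the whole cache and accept it only if the generation did not
 * move during the copy. The retry bound avoids looping forever when the
 * device keeps changing its configuration.
 */
static int cache_refresh(struct cfg_device_sketch *d, uint8_t *cache,
                         size_t len, unsigned int max_tries)
{
    for (unsigned int i = 0; i < max_tries; i++) {
        uint32_t before = dev_read_generation(d);

        dev_read_config(d, cache, len);
        if (dev_read_generation(d) == before)
            return 0;
    }
    return -EAGAIN;
}
```

Each retry here is a full round trip per config read in a message-based transport, which is why a strict generation check gets expensive as soon as the configuration changes often.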
Having something working by simplifying the scope was easy, but
having something working in a realistic case is far more complex.
I managed to have something working fully between QEMU and the kernel, which is what
I shared, but FF-A between VMs is still only working reliably in simple cases.
Hopefully we won't require the complex design with bridges etc. here and the
simple one can be modified to sort this all out in the end. Let's see how it
goes.
I think we do. The bridge design goal is to have the QEMU implementation independent
of the bus implementation in the kernel. The PoC works with the same QEMU for loopback
or FF-A, which is nice, and having a clear layering with bridge - bus - transport was kind of
the spec goal. Now, where to cut could be a question, and we might one day be able to
move the implementation down to FIFO management and handling inside QEMU, but the layering
is becoming blurry: in the FF-A case not everything can be transferred through the FIFO. A reset
message, for example, cannot, because during reset the FIFO memory will be relinquished, so you will
never get the answer back.
Cheers
Bertrand
...
--
viresh