Re: [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code

21 Jan 2014


      Hi Anup,
On 20 January 2014 21:46, Anup Patel anup@brainfault.org wrote:
...
On Tue, Jan 21, 2014 at 10:54 AM, Victor Kamensky
victor.kamensky@linaro.org wrote:
...
On 20 January 2014 17:19, Christoffer Dall christoffer.dall@linaro.org wrote:
...
On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
...
On 6 January 2014 14:56, Christoffer Dall christoffer.dall@linaro.org wrote:
...
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
...
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
>  No matter how data is stored in memory (BE, LE, or
> even PDP endianness), CPU registers always have a consistent
> representation. They are immune to CPU endianness change, and storing
> to/reading from memory won't change the value, as long as you use the
> same endianness for writing/reading.
Ah, endianness. This always confuses me, but I hope the following
is correct... (in all the following when I say BE I mean BE8, not BE32,
since BE32 and virtualization never occur in the same CPU).
Certainly registers don't have endianness, but the entire point
of the CPSR.E bit is exactly that it changes the value as it is
stored to / read from memory, isn't it? -- that's where and when the
byte-lane flipping happens.
Where this impacts the hypervisor is that instead of actually sending
the data out to the bus via the byte-swapping h/w, we've trapped instead.
The hypervisor reads the original data directly from the guest CPU
registers, and so it's the hypervisor and userspace support code that
between them have to emulate the equivalent of the byte lane
swapping h/w. You could argue that it shouldn't be the kernel's
job, but since the kernel has to do it for the devices it emulates
internally, I'm not sure that makes much sense.
As far as I understand, this is exactly what vcpu_data_guest_to_host and
vcpu_data_host_to_guest do; emulate the byte lane swapping.
The problem is that it only works on a little-endian host with the
current code, because be16_to_cpu (for example), actually perform a
byteswap, which is what needs to be emulated.  On a big-endian host, we
do nothing, so we end up giving a byteswapped value to the emulated
device.
Yes, that was my point on the thread: vcpu_data_guest_to_host and
vcpu_data_host_to_guest functions for any given host endianity should
give opposite endian results depending on CPSR E bit value. And
currently it is not happening in BE host case. It seems that Peter and
you agree with that and I gave example in another email with
dynamically switching E bit illustrating this problem for BE host.
...
I think a cleaner fix than this patch is to just change the
be16_to_cpu() to a __swab16() instead, which clearly indicates that
'here is the byte lane swapping'.
Yes, that may work, but it is a bit orthogonal issue.
Why?  I don't think it is, I think it's addressing exactly the point at
hand.
...
And I don't think
it is better. For this to work one need to change canonical endianity on
one of the sides around vcpu_data_guest_to_host and
vcpu_data_host_to_guest functions.
You have to simply clearly define which format you want mmio.data to be
in.
I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not
an integer type - it is bytes array. Bytes array does not have endianity.
It is endian agnostic. Here is snippet from linux/kvm.h
            /* KVM_EXIT_MMIO */
            struct {
                    __u64 phys_addr;
                    __u8  data[8];
                    __u32 len;
                    __u8  is_write;
            } mmio;


it is very natural to treat it as just a piece of memory. I.e when code reads
emulated LE device address as integer, this array will contain integer
placed in memory in LE order, data[3] is MSB, as it would be located in
regular memory. When code reads emulated BE device address as
integer this array will contain integer placed in memory in BE order,
data[0] is MSB.
You can think about it in that way: ARM system emulator runs on x86
(LE) and on PPC (BE). How mmio.data array for the same emulated
device should look like in across these two cases? I believe it should
be identical - just a stream of bytes.
Emulator code handles this situation quite nicely. For example check
in qemu endianness field of MemoryRegionOps structure. Depending
of the field value and current emulator endianity code will place
results into 'mmio.data' array in right order. See [1] as an example
in qemu where endianity of certain ARM devices were not declared
correctly - it was marked as DEVICE_NATIVE_ENDIAN whereas
it should be DEVICE_LITTLE_ENDIAN. After I changed that BE qemu
pretty much started working. I strongly suspect if one would run
ARM system emulation on PPC (BE) he/she would need the same
changes.
Note issue with virtio endianity is very different problem - there it
is not clear for given arrangement of host/emulator how to treat
virtio devices as LE or BE, and in what format data in rings
descriptors are.
IMHO, device endianess should be taken care by device emulators only
because we can have Machine Model containing both LE devices and
BE devices. KVM ARM/ARM64 should only worry about endianess of
in-kernel emulated devices (e.g. VGIC). In general, QEMU or KVMTOOL
should be responsible of device endianess and for this QEMU or KVMTOOL
should also know whether Guest (or VM) is little-endian or big-endian.
I agree with most of above statement except last part. I think
emulator and host KVM should not really care about guest endianity.
They should work in the same way in either case. MarcZ illustrated this
earlier with setup where LE KVM hosted either LE guest or BE guest.
Also note endianity as far as emulation concerned strictly speaking is
not property of the guest, it is rather property of current CPU execution
context (i.e E bit in CPSR reg of V7) In fact access endianity can
change on the fly - i.e when BE V7 image starts initially it assumes
that it runs in LE mode, then once kernel entered it switches CPU
into BE mode, the same happens with secondary CPU callback. And
with the last one I run into situation where such callback before switching
into BE mode read some emulated device with E bit off, latter the same
kernel reads the same device register with E bit on
Thanks,
Victor
...
Regards,
Anup
...
Thanks,
Victor
[1]  https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358...
...
This is a user space interface across multiple architectures and
therefore something you have to consider carefully and you're limited in
choices to something that works with existing user space code.
...
Changing  it on side that faces hypervisor (code that handles guest spilled
CPU register set) does not make sense at all - if we will keep guest CPU
register set in memory in LE form and hypervisor runs in BE (BE host),
code that spills registers would need to do constant byteswaps. Also any
access by host kernel and hypervisor (all running in BE) would need to do
byteswaps while working with guest saved registers.
Changing canonical form of data on side that faces emulator and mmio
part of kvm_run does not make sense either. kvm_run mmio.data field is
bytes array, when it comes to host kernel from emulator, it already contains
device memory in correct endian order that corresponds to endianity of
emulated device. For example for LE device word read access, after call is
emulated, mmio.data will contain mmio.data[0], mmio.data[1], mmio.data[2]
mmio.data[3] values in LE order (mmio.data[3] is MSB). Now look at
mmio_read_buf function introduced by Marc's 6d89d2d9 commit, this function
will byte copy this mmio.data buffer into integer according to ongoing mmio
access size. Note in BE host case such integer, in 'data' variable of
kvm_handle_mmio_return function, will have byteswapped value. Now when it will
be passed into vcpu_data_host_to_guest function, and it emulates read access
of guest with E bit set, and if we follow your suggestion, it will be
byteswapped.
I.e 'data' integer will contain non byteswapped value of LE device. It will be
further stored into some vcpu_reg register, still in native format (BE
store), and
further restored into guest CPU register, still non byteswapped (BE hypervisor).
And that is not what BE client reading word of LE device expects - BE client
knowing that it reads LE device with E bit set, it will issue additional rev
instruction to get device memory as integer. If we really want to follow your
suggestion, one may introduce compensatory byteswaps in mmio_read_buf
and mmio_write_buf functions in case of BE host, rather then just do
memcpy ... but I am not sure what it will buy us - in BE case it will swap data
twice.
Note in above description by "canonical" I mean some form of data regardless
of current access CPSR E value. But it may differ depending on host endianess.
There's a lot of text to digest here, talking about a canonical form
here doesn't help; just define the layout of the destination byte array.
I also got completely lost in what you're referring to when you talk
about 'sides' here.
The thing we must decide is how the data is stored in
kvm_exit_mmio.data.  See Peter's recent thread "KVM and
variable-endianness guest CPUs".  Once we agree on this, the rest should
be easy (assuming we use the same structure for the data in the kernel's
internal kvm_exit_mmio declared on the stack in io_mem_abort()).
The format you suggest requires any consumer of this data to consider
the host endianness, which I don't think makes anything more clear (see
my comment on the vgic patch).
The in-kernel interface between the io_mem_abort() code and any
in-kernel emulated device must do exactly the same as the interface
between KVM and QEMU must do for KVM_EXIT_MMIO.
--
Christoffer

kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

Re: [PATCH REPOST 5/5] ARM: kvm MMIO support BE host running LE code