[repost: adding kvmarm mailing list as per Christoffer's request]
Hi Guys,
Here is a series that enables KVM support for V7 big-endian (BE) kernels. It mostly deals with BE KVM host support. Marc Zyngier's earlier patches showed how a BE guest could run on top of an LE host. With these patches, a BE guest runs on top of a BE host. Using Marc's kvmtool with a few additional changes, I tested that a BE host can run an LE guest. I also verified that there are no regressions in the BE guest on top of LE host case.
Note that the posted series covers only kernel-side changes. The changes were tested inside a bigger setup with additional changes in qemu and kvmtool. I will post those changes separately to the appropriate mailing lists, but for completeness' sake Appendix A gives pointers to the git repositories and branches with all needed changes.
Please note that the first patch is not related to BE KVM per se. I ran into a conflicting use of the 'push' identifier while trying to include assembler.h in the KVM .S files. Details of the issue I observed are covered in Appendix B. The first patch is my take on solving it.
Victor Kamensky (5):
  ARM: kvm: replace push and pop with stmdb and ldmia instrs to enable assembler.h inclusion
  ARM: fix KVM assembler files to work in BE case
  ARM: kvm one_reg coproc set and get BE fixes
  ARM: kvm vgic mmio should return data in BE format in BE case
  ARM: kvm MMIO support BE host running LE code

 arch/arm/include/asm/assembler.h   |  7 +++
 arch/arm/include/asm/kvm_asm.h     |  4 +-
 arch/arm/include/asm/kvm_emulate.h | 22 +++++++--
 arch/arm/kvm/coproc.c              | 94 ++++++++++++++++++++++++++++----------
 arch/arm/kvm/init.S                |  7 ++-
 arch/arm/kvm/interrupts.S          | 50 +++++++++++---------
 arch/arm/kvm/interrupts_head.S     | 61 +++++++++++++++----------
 virt/kvm/arm/vgic.c                |  4 +-
 8 files changed, 168 insertions(+), 81 deletions(-)
Before fix kvm interrupt.S and interrupt_head.S used push and pop assembler instruction. It causes problem if <asm/assembler.h> file should be include. In assembler.h "push" is defined as macro so it causes compilation errors like this:
arch/arm/kvm/interrupts.S: Assembler messages:
arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
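The substitution behind this error can be modelled with a few lines of Python. This is only a rough sketch of what the C preprocessor does to a .S file, not the kernel's build logic. The `BE_MACROS` table and the `preprocess` helper are hypothetical, and the `push` to `lsr` mapping is inferred from the error message rather than copied from assembler.h.

```python
import re

# Hypothetical stand-in for the macro definition that a big-endian build
# picks up from <asm/assembler.h>. The push -> lsr mapping is inferred
# from the assembler error message, not copied from the header.
BE_MACROS = {"push": "lsr"}


def preprocess(line, macros):
    """Replace whole-word macro names, roughly as cpp does for a .S file."""
    def expand(match):
        word = match.group(0)
        return macros.get(word, word)
    return re.sub(r"[A-Za-z_]\w*", expand, line)


# The mnemonic is rewritten before the assembler ever sees it.
print(preprocess("push {r2, r3}", BE_MACROS))
# An explicit store-multiple uses no macro name and passes through untouched.
print(preprocess("stmdb sp!, {r2, r3}", BE_MACROS))
```

The first line comes out as `lsr {r2, r3}`, which is why the assembler complains that it expected an ARM register. The second line is unchanged, which is the property the patch relies on.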
The solution implemented by this patch replaces every 'push {...}' with the 'stmdb sp!, {...}' instruction and every 'pop {...}' with 'ldmia sp!, {...}'.
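The replacement is behavior-preserving because, on ARM, push {...} and pop {...} are alternative spellings of stmdb sp!, {...} and ldmia sp!, {...} on a full descending stack. The Python model below is a simplified sketch of those semantics. The helpers `stmdb` and `ldmia`, the dictionary used as memory, and the register numbering are all illustrative assumptions, not kernel code.

```python
def stmdb(mem, sp, regs, values):
    """Model STMDB sp!, {regs}: decrement sp first, then store the
    lowest-numbered register at the lowest address. Returns the new sp."""
    ordered = sorted(regs)
    sp -= 4 * len(ordered)
    for i, reg in enumerate(ordered):
        mem[sp + 4 * i] = values[reg]
    return sp


def ldmia(mem, sp, regs):
    """Model LDMIA sp!, {regs}: load ascending from sp, then increment sp.
    Returns (new sp, {register: value})."""
    ordered = sorted(regs)
    loaded = {reg: mem[sp + 4 * i] for i, reg in enumerate(ordered)}
    return sp + 4 * len(ordered), loaded


# Round trip corresponding to "stmdb sp!, {r2, r3}" ... "ldmia sp!, {r2, r3}".
mem = {}
sp = stmdb(mem, 0x1000, [2, 3], {2: 0xAAAA, 3: 0xBBBB})
sp, restored = ldmia(mem, sp, [2, 3])
print(hex(sp), restored)
```

After the round trip the stack pointer is back at its starting value and both registers hold what was saved, which is exactly what the original push/pop pairs did.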
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
---
 arch/arm/kvm/interrupts.S      | 38 +++++++++++++++++++-------------------
 arch/arm/kvm/interrupts_head.S | 34 +++++++++++++++++-----------------
 2 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index ddc1553..df19133 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -47,7 +47,7 @@ __kvm_hyp_code_start:
  * instead, ignoring the ipa value.
  */
 ENTRY(__kvm_tlb_flush_vmid_ipa)
-	push	{r2, r3}
+	stmdb	sp!, {r2, r3}
 
 	dsb	ishst
 	add	r0, r0, #KVM_VTTBR
@@ -62,7 +62,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
 	mcrr	p15, 6, r2, r3, c2	@ Back to VMID #0
 	isb				@ Not necessary if followed by eret
 
-	pop	{r2, r3}
+	ldmia	sp!, {r2, r3}
 	bx	lr
 ENDPROC(__kvm_tlb_flush_vmid_ipa)
 
@@ -110,7 +110,7 @@ ENTRY(__kvm_vcpu_run)
 #ifdef CONFIG_VFPv3
 	@ Set FPEXC_EN so the guest doesn't trap floating point instructions
 	VFPFMRX r2, FPEXC		@ VMRS
-	push	{r2}
+	stmdb	sp!, {r2}
 	orr	r2, r2, #FPEXC_EN
 	VFPFMXR FPEXC, r2		@ VMSR
 #endif
@@ -175,7 +175,7 @@ __kvm_vcpu_return:
 
 after_vfp_restore:
 	@ Restore FPEXC_EN which we clobbered on entry
-	pop	{r2}
+	ldmia	sp!, {r2}
 	VFPFMXR FPEXC, r2
 #endif
 
@@ -260,7 +260,7 @@ ENTRY(kvm_call_hyp)
 
 /* Handle undef, svc, pabt, or dabt by crashing with a user notice */
 .macro bad_exception exception_code, panic_str
-	push	{r0-r2}
+	stmdb	sp!, {r0-r2}
 	mrrc	p15, 6, r0, r1, c2	@ Read VTTBR
 	lsr	r1, r1, #16
 	ands	r1, r1, #0xff
@@ -338,7 +338,7 @@ hyp_hvc:
 	 * Getting here is either becuase of a trap from a guest or from calling
 	 * HVC from the host kernel, which means "switch to Hyp mode".
 	 */
-	push	{r0, r1, r2}
+	stmdb	sp!, {r0, r1, r2}
 
 	@ Check syndrome register
 	mrc	p15, 4, r1, c5, c2, 0	@ HSR
@@ -361,11 +361,11 @@ hyp_hvc:
 	bne	guest_trap		@ Guest called HVC
 
 host_switch_to_hyp:
-	pop	{r0, r1, r2}
+	ldmia	sp!, {r0, r1, r2}
 
-	push	{lr}
+	stmdb	sp!, {lr}
 	mrs	lr, SPSR
-	push	{lr}
+	stmdb	sp!, {lr}
 
 	mov	lr, r0
 	mov	r0, r1
@@ -375,9 +375,9 @@ host_switch_to_hyp:
 THUMB(	orr	lr, #1)
 	blx	lr			@ Call the HYP function
 
-	pop	{lr}
+	ldmia	sp!, {lr}
 	msr	SPSR_csxf, lr
-	pop	{lr}
+	ldmia	sp!, {lr}
 	eret
 
 guest_trap:
@@ -418,7 +418,7 @@ guest_trap:
 
 	/* Preserve PAR */
 	mrrc	p15, 0, r0, r1, c7	@ PAR
-	push	{r0, r1}
+	stmdb	sp!, {r0, r1}
 
 	/* Resolve IPA using the xFAR */
 	mcr	p15, 0, r2, c7, c8, 0	@ ATS1CPR
@@ -431,7 +431,7 @@ guest_trap:
 	orr	r2, r2, r1, lsl #24
 
 	/* Restore PAR */
-	pop	{r0, r1}
+	ldmia	sp!, {r0, r1}
 	mcrr	p15, 0, r0, r1, c7	@ PAR
 
 3:	load_vcpu			@ Load VCPU pointer to r0
@@ -440,10 +440,10 @@ guest_trap:
 1:	mov	r1, #ARM_EXCEPTION_HVC
 	b	__kvm_vcpu_return
 
-4:	pop	{r0, r1}		@ Failed translation, return to guest
+4:	ldmia	sp!, {r0, r1}		@ Failed translation, return to guest
 	mcrr	p15, 0, r0, r1, c7	@ PAR
 	clrex
-	pop	{r0, r1, r2}
+	ldmia	sp!, {r0, r1, r2}
 	eret
 
 /*
@@ -455,7 +455,7 @@ guest_trap:
 #ifdef CONFIG_VFPv3
 switch_to_guest_vfp:
 	load_vcpu			@ Load VCPU pointer to r0
-	push	{r3-r7}
+	stmdb	sp!, {r3-r7}
 
 	@ NEON/VFP used. Turn on VFP access.
 	set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
@@ -467,15 +467,15 @@ switch_to_guest_vfp:
 	add	r7, r0, #VCPU_VFP_GUEST
 	restore_vfp_state r7
 
-	pop	{r3-r7}
-	pop	{r0-r2}
+	ldmia	sp!, {r3-r7}
+	ldmia	sp!, {r0-r2}
 	clrex
 	eret
 #endif
 
 	.align
 hyp_irq:
-	push	{r0, r1, r2}
+	stmdb	sp!, {r0, r1, r2}
 	mov	r1, #ARM_EXCEPTION_IRQ
 	load_vcpu			@ Load VCPU pointer to r0
 	b	__kvm_vcpu_return
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 6f18695..c371db7 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -63,7 +63,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrs	r2, SP_\mode
 	mrs	r3, LR_\mode
 	mrs	r4, SPSR_\mode
-	push	{r2, r3, r4}
+	stmdb	sp!, {r2, r3, r4}
 .endm
 
 /*
@@ -73,13 +73,13 @@ vcpu .req r0 @ vcpu pointer always in r0
 .macro save_host_regs
 	/* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
 	mrs	r2, ELR_hyp
-	push	{r2}
+	stmdb	sp!, {r2}
 
 	/* usr regs */
-	push	{r4-r12}	@ r0-r3 are always clobbered
+	stmdb	sp!, {r4-r12}	@ r0-r3 are always clobbered
 	mrs	r2, SP_usr
 	mov	r3, lr
-	push	{r2, r3}
+	stmdb	sp!, {r2, r3}
 
 	push_host_regs_mode svc
 	push_host_regs_mode abt
@@ -95,11 +95,11 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrs	r7, SP_fiq
 	mrs	r8, LR_fiq
 	mrs	r9, SPSR_fiq
-	push	{r2-r9}
+	stmdb	sp!, {r2-r9}
 .endm
 
 .macro pop_host_regs_mode mode
-	pop	{r2, r3, r4}
+	ldmia	sp!, {r2, r3, r4}
 	msr	SP_\mode, r2
 	msr	LR_\mode, r3
 	msr	SPSR_\mode, r4
@@ -110,7 +110,7 @@ vcpu .req r0 @ vcpu pointer always in r0
  * Clobbers all registers, in all modes, except r0 and r1.
  */
 .macro restore_host_regs
-	pop	{r2-r9}
+	ldmia	sp!, {r2-r9}
 	msr	r8_fiq, r2
 	msr	r9_fiq, r3
 	msr	r10_fiq, r4
@@ -125,12 +125,12 @@ vcpu .req r0 @ vcpu pointer always in r0
 	pop_host_regs_mode abt
 	pop_host_regs_mode svc
 
-	pop	{r2, r3}
+	ldmia	sp!, {r2, r3}
 	msr	SP_usr, r2
 	mov	lr, r3
-	pop	{r4-r12}
+	ldmia	sp!, {r4-r12}
 
-	pop	{r2}
+	ldmia	sp!, {r2}
 	msr	ELR_hyp, r2
 .endm
 
@@ -218,7 +218,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	add	r2, vcpu, #VCPU_USR_REG(3)
 	stm	r2, {r3-r12}
 	add	r2, vcpu, #VCPU_USR_REG(0)
-	pop	{r3, r4, r5}		@ r0, r1, r2
+	ldmia	sp!, {r3, r4, r5}		@ r0, r1, r2
 	stm	r2, {r3, r4, r5}
 	mrs	r2, SP_usr
 	mov	r3, lr
@@ -258,7 +258,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
 
 	.if \store_to_vcpu == 0
-	push	{r2-r12}		@ Push CP15 registers
+	stmdb	sp!, {r2-r12}		@ Push CP15 registers
 	.else
 	str	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
 	str	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
@@ -286,7 +286,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
 
 	.if \store_to_vcpu == 0
-	push	{r2-r12}		@ Push CP15 registers
+	stmdb	sp!, {r2-r12}		@ Push CP15 registers
 	.else
 	str	r2, [vcpu, #CP15_OFFSET(c13_CID)]
 	str	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
@@ -305,7 +305,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrrc	p15, 0, r4, r5, c7	@ PAR
 
 	.if \store_to_vcpu == 0
-	push	{r2,r4-r5}
+	stmdb	sp!, {r2,r4-r5}
 	.else
 	str	r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
 	add	r12, vcpu, #CP15_OFFSET(c7_PAR)
@@ -322,7 +322,7 @@ vcpu .req r0 @ vcpu pointer always in r0
  */
 .macro write_cp15_state read_from_vcpu
 	.if \read_from_vcpu == 0
-	pop	{r2,r4-r5}
+	ldmia	sp!, {r2,r4-r5}
 	.else
 	ldr	r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
 	add	r12, vcpu, #CP15_OFFSET(c7_PAR)
@@ -333,7 +333,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcrr	p15, 0, r4, r5, c7	@ PAR
 
 	.if \read_from_vcpu == 0
-	pop	{r2-r12}
+	ldmia	sp!, {r2-r12}
 	.else
 	ldr	r2, [vcpu, #CP15_OFFSET(c13_CID)]
 	ldr	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
@@ -361,7 +361,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
 
 	.if \read_from_vcpu == 0
-	pop	{r2-r12}
+	ldmia	sp!, {r2-r12}
 	.else
 	ldr	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
 	ldr	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
On Fri, Dec 20, 2013 at 08:48:41AM -0800, Victor Kamensky wrote:
Before fix kvm interrupt.S and interrupt_head.S used push and pop assembler instruction. It causes problem if <asm/assembler.h> file should be include. In assembler.h "push" is defined as macro so it causes compilation errors like this:
"Before fix kvm..." doesn't read very pleasantly; consider using something like "Prior to this commit...."
"causes a problem" or "causes problems"
change "if <asm/assembler.h> file should be include..." to "if <asm/assembler.h> is included, because assembler.h defines 'push' as a macro..."
If you fix to address Dave's comments, then the code change otherwise looks good.
Thanks,
On 21/01/14 01:18, Christoffer Dall wrote:
If you fix to address Dave's comments, then the code change otherwise looks good.
How about trying this alternative approach:
It looks like all the users of the push/pop macros are located in arch/arm/lib (mostly checksumming code). Can't we move these macros to a separate include file and leave the code that uses push/pop (as defined by the assembler) alone?
Thanks,
M.
Russell, Dave, Will, could you please check inline below? I am looking for your opinion.
Marc, response is inline.
On 21 January 2014 01:58, Marc Zyngier marc.zyngier@arm.com wrote:
On 21/01/14 01:18, Christoffer Dall wrote:
On Fri, Dec 20, 2013 at 08:48:41AM -0800, Victor Kamensky wrote:
Before fix kvm interrupt.S and interrupt_head.S used push and pop assembler instruction. It causes problem if <asm/assembler.h> file should be include. In assembler.h "push" is defined as macro so it causes compilation errors like this:
"Before fix kvm..." doesn't read very pleasently, consider using something like "Prior to this commit...."
"causes a problem" or "causes problems"
change "if <asm/assembler.h> file should be include..." to "if <asm/assembler.h> is included, because assember.h defines 'push' as a macro..."
arch/arm/kvm/interrupts.S: Assembler messages: arch/arm/kvm/interrupts.S:51: Error: ARM register expected -- `lsr {r2,r3}'
Solution implemented by this patch replaces all 'push {...}' with 'stdmb sp!, {...}' instruction; and all 'pop {...}' with 'ldmia sp!, {...}'.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
arch/arm/kvm/interrupts.S | 38 +++++++++++++++++++------------------- arch/arm/kvm/interrupts_head.S | 34 +++++++++++++++++----------------- 2 files changed, 36 insertions(+), 36 deletions(-)
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S index ddc1553..df19133 100644 --- a/arch/arm/kvm/interrupts.S +++ b/arch/arm/kvm/interrupts.S @@ -47,7 +47,7 @@ __kvm_hyp_code_start:
- instead, ignoring the ipa value.
*/ ENTRY(__kvm_tlb_flush_vmid_ipa)
- push {r2, r3}
stmdb sp!, {r2, r3}
dsb ishst add r0, r0, #KVM_VTTBR
@@ -62,7 +62,7 @@ ENTRY(__kvm_tlb_flush_vmid_ipa) mcrr p15, 6, r2, r3, c2 @ Back to VMID #0 isb @ Not necessary if followed by eret
- pop {r2, r3}
- ldmia sp!, {r2, r3} bx lr
ENDPROC(__kvm_tlb_flush_vmid_ipa)
@@ -110,7 +110,7 @@ ENTRY(__kvm_vcpu_run) #ifdef CONFIG_VFPv3 @ Set FPEXC_EN so the guest doesn't trap floating point instructions VFPFMRX r2, FPEXC @ VMRS
- push {r2}
- stmdb sp!, {r2} orr r2, r2, #FPEXC_EN VFPFMXR FPEXC, r2 @ VMSR
#endif @@ -175,7 +175,7 @@ __kvm_vcpu_return:
after_vfp_restore: @ Restore FPEXC_EN which we clobbered on entry
- pop {r2}
- ldmia sp!, {r2} VFPFMXR FPEXC, r2
#endif
@@ -260,7 +260,7 @@ ENTRY(kvm_call_hyp)
/* Handle undef, svc, pabt, or dabt by crashing with a user notice */ .macro bad_exception exception_code, panic_str
- push {r0-r2}
- stmdb sp!, {r0-r2} mrrc p15, 6, r0, r1, c2 @ Read VTTBR lsr r1, r1, #16 ands r1, r1, #0xff
@@ -338,7 +338,7 @@ hyp_hvc: * Getting here is either becuase of a trap from a guest or from calling * HVC from the host kernel, which means "switch to Hyp mode". */
- push {r0, r1, r2}
stmdb sp!, {r0, r1, r2}
@ Check syndrome register mrc p15, 4, r1, c5, c2, 0 @ HSR
@@ -361,11 +361,11 @@ hyp_hvc: bne guest_trap @ Guest called HVC
host_switch_to_hyp:
- pop {r0, r1, r2}
- ldmia sp!, {r0, r1, r2}
- push {lr}
- stmdb sp!, {lr} mrs lr, SPSR
- push {lr}
stmdb sp!, {lr}
mov lr, r0 mov r0, r1
@@ -375,9 +375,9 @@ host_switch_to_hyp: THUMB( orr lr, #1) blx lr @ Call the HYP function
- pop {lr}
- ldmia sp!, {lr} msr SPSR_csxf, lr
- pop {lr}
- ldmia sp!, {lr} eret
guest_trap: @@ -418,7 +418,7 @@ guest_trap:
/* Preserve PAR */ mrrc p15, 0, r0, r1, c7 @ PAR
- push {r0, r1}
stmdb sp!, {r0, r1}
/* Resolve IPA using the xFAR */ mcr p15, 0, r2, c7, c8, 0 @ ATS1CPR
@@ -431,7 +431,7 @@ guest_trap: orr r2, r2, r1, lsl #24
/* Restore PAR */
- pop {r0, r1}
- ldmia sp!, {r0, r1} mcrr p15, 0, r0, r1, c7 @ PAR
3: load_vcpu @ Load VCPU pointer to r0 @@ -440,10 +440,10 @@ guest_trap: 1: mov r1, #ARM_EXCEPTION_HVC b __kvm_vcpu_return
-4: pop {r0, r1} @ Failed translation, return to guest +4: ldmia sp!, {r0, r1} @ Failed translation, return to guest mcrr p15, 0, r0, r1, c7 @ PAR clrex
- pop {r0, r1, r2}
- ldmia sp!, {r0, r1, r2} eret
/* @@ -455,7 +455,7 @@ guest_trap: #ifdef CONFIG_VFPv3 switch_to_guest_vfp: load_vcpu @ Load VCPU pointer to r0
- push {r3-r7}
stmdb sp!, {r3-r7}
@ NEON/VFP used. Turn on VFP access. set_hcptr vmexit, (HCPTR_TCP(10) | HCPTR_TCP(11))
@@ -467,15 +467,15 @@ switch_to_guest_vfp: add r7, r0, #VCPU_VFP_GUEST restore_vfp_state r7
- pop {r3-r7}
- pop {r0-r2}
- ldmia sp!, {r3-r7}
- ldmia sp!, {r0-r2} clrex eret
#endif
.align
hyp_irq:
- push {r0, r1, r2}
- stmdb sp!, {r0, r1, r2} mov r1, #ARM_EXCEPTION_IRQ load_vcpu @ Load VCPU pointer to r0 b __kvm_vcpu_return
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S index 6f18695..c371db7 100644 --- a/arch/arm/kvm/interrupts_head.S +++ b/arch/arm/kvm/interrupts_head.S @@ -63,7 +63,7 @@ vcpu .req r0 @ vcpu pointer always in r0 mrs r2, SP_\mode mrs r3, LR_\mode mrs r4, SPSR_\mode
- push {r2, r3, r4}
- stmdb sp!, {r2, r3, r4}
.endm
/* @@ -73,13 +73,13 @@ vcpu .req r0 @ vcpu pointer always in r0 .macro save_host_regs /* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */ mrs r2, ELR_hyp
- push {r2}
stmdb sp!, {r2}
/* usr regs */
- push {r4-r12} @ r0-r3 are always clobbered
- stmdb sp!, {r4-r12} @ r0-r3 are always clobbered mrs r2, SP_usr mov r3, lr
- push {r2, r3}
stmdb sp!, {r2, r3}
push_host_regs_mode svc push_host_regs_mode abt
@@ -95,11 +95,11 @@ vcpu .req r0 @ vcpu pointer always in r0 mrs r7, SP_fiq mrs r8, LR_fiq mrs r9, SPSR_fiq
- push {r2-r9}
- stmdb sp!, {r2-r9}
.endm
.macro pop_host_regs_mode mode
- pop {r2, r3, r4}
+	ldmia sp!, {r2, r3, r4}
 	msr SP_\mode, r2
 	msr LR_\mode, r3
 	msr SPSR_\mode, r4
@@ -110,7 +110,7 @@ vcpu .req r0 @ vcpu pointer always in r0
  * Clobbers all registers, in all modes, except r0 and r1.
  */
 .macro restore_host_regs
-	pop {r2-r9}
+	ldmia sp!, {r2-r9}
 	msr r8_fiq, r2
 	msr r9_fiq, r3
 	msr r10_fiq, r4
@@ -125,12 +125,12 @@ vcpu .req r0 @ vcpu pointer always in r0
 	pop_host_regs_mode abt
 	pop_host_regs_mode svc

-	pop {r2, r3}
+	ldmia sp!, {r2, r3}
 	msr SP_usr, r2
 	mov lr, r3

-	pop {r4-r12}
+	ldmia sp!, {r4-r12}

-	pop {r2}
+	ldmia sp!, {r2}
 	msr ELR_hyp, r2
 .endm
@@ -218,7 +218,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	add r2, vcpu, #VCPU_USR_REG(3)
 	stm r2, {r3-r12}
 	add r2, vcpu, #VCPU_USR_REG(0)
-	pop {r3, r4, r5} @ r0, r1, r2
+	ldmia sp!, {r3, r4, r5} @ r0, r1, r2
 	stm r2, {r3, r4, r5}
 	mrs r2, SP_usr
 	mov r3, lr
@@ -258,7 +258,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrc p15, 2, r12, c0, c0, 0 @ CSSELR

 	.if \store_to_vcpu == 0
-	push {r2-r12} @ Push CP15 registers
+	stmdb sp!, {r2-r12} @ Push CP15 registers
 	.else
 	str r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
 	str r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
@@ -286,7 +286,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrc p15, 0, r12, c12, c0, 0 @ VBAR

 	.if \store_to_vcpu == 0
-	push {r2-r12} @ Push CP15 registers
+	stmdb sp!, {r2-r12} @ Push CP15 registers
 	.else
 	str r2, [vcpu, #CP15_OFFSET(c13_CID)]
 	str r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
@@ -305,7 +305,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrrc p15, 0, r4, r5, c7 @ PAR

 	.if \store_to_vcpu == 0
-	push {r2,r4-r5}
+	stmdb sp!, {r2,r4-r5}
 	.else
 	str r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
 	add r12, vcpu, #CP15_OFFSET(c7_PAR)
@@ -322,7 +322,7 @@ vcpu .req r0 @ vcpu pointer always in r0
  */
 .macro write_cp15_state read_from_vcpu
 	.if \read_from_vcpu == 0
-	pop {r2,r4-r5}
+	ldmia sp!, {r2,r4-r5}
 	.else
 	ldr r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
 	add r12, vcpu, #CP15_OFFSET(c7_PAR)
@@ -333,7 +333,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcrr p15, 0, r4, r5, c7 @ PAR

 	.if \read_from_vcpu == 0
-	pop {r2-r12}
+	ldmia sp!, {r2-r12}
 	.else
 	ldr r2, [vcpu, #CP15_OFFSET(c13_CID)]
 	ldr r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
@@ -361,7 +361,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcr p15, 0, r12, c12, c0, 0 @ VBAR

 	.if \read_from_vcpu == 0
-	pop {r2-r12}
+	ldmia sp!, {r2-r12}
 	.else
 	ldr r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
 	ldr r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
--
1.8.1.4
If you address Dave's comments, then the code change otherwise looks good.
How about trying this alternative approach:
It looks like all the users of the push/pop macros are located in arch/arm/lib (mostly checksumming code). Can't we move these macros to a separate include file and leave the code that uses push/pop (as defined by the assembler) alone?
Marc, personally I am OK with such a proposal. I was considering something along these lines as one of the options. It works for me both ways. If others agree, I am happy to recode it as you suggested. I chose the patch proposed above because the KVM ARM code came after the push and pop defines were introduced in asm/assembler.h and used in other places. I am OK either way. I agree that the use of push and pop as define names seems a bit unfortunate, but I don't have any historic visibility here.
Russell, Dave, Will, do you have any opinion on Marc's proposal to fix this issue?
Thanks, Victor
Thanks,
M.
-- Jazz is not dead. It just smells funny...
[Adding Nico, as the author of the push/pull macros. Background is that kvm is using push to store to the stack and would now like to include assembler.h]
On Wed, Jan 22, 2014 at 06:41:09AM +0000, Victor Kamensky wrote:
On 21 January 2014 01:58, Marc Zyngier marc.zyngier@arm.com wrote:
How about trying this alternative approach:
It looks like all the users of the push/pop macros are located in arch/arm/lib (mostly checksumming code). Can't we move these macros to a separate include file and leave the code that uses push/pop (as defined by the assembler) alone?
Marc, personally I am OK with such a proposal. I was considering something along these lines as one of the options. It works for me both ways. If others agree, I am happy to recode it as you suggested. I chose the patch proposed above because the KVM ARM code came after the push and pop defines were introduced in asm/assembler.h and used in other places. I am OK either way. I agree that the use of push and pop as define names seems a bit unfortunate, but I don't have any historic visibility here.
Russell, Dave, Will, do you have any opinion on Marc's proposal to fix this issue?
I'm perfectly fine with moving those macros into a lib/-local header file. An alternative is renaming push/pull to something like lspush and lspull and updating the files under lib.
Will
ARM v7 KVM assembler files fixes to work in big endian mode:
vgic h/w registers are little endian; when asm code reads/writes from/to them, it needs to do byteswap after/before. Byte swap code uses ARM_BE8 wrapper to add swap only if BIG_ENDIAN kernel is configured
mcrr and mrrc instructions take couple 32 bit registers as argument, one is supposed to be high part of 64 bit value and another is low part of 64 bit. Typically those values are loaded/stored with ldrd and strd instructions and those will load high and low parts in opposite register depending on endianity. Introduce and use rr_lo_hi macro that swap registers in BE mode when they are passed to mcrr and mrrc instructions.
function that returns 64 bit result __kvm_vcpu_run in couple registers has to be adjusted for BE case.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
---
 arch/arm/include/asm/assembler.h |  7 +++++++
 arch/arm/include/asm/kvm_asm.h   |  4 ++--
 arch/arm/kvm/init.S              |  7 +++++--
 arch/arm/kvm/interrupts.S        | 12 +++++++++---
 arch/arm/kvm/interrupts_head.S   | 27 ++++++++++++++++++++-------
 5 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 5c22851..ad1ad31 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -60,6 +60,13 @@
 #define ARM_BE8(code...)
 #endif

+/* swap pair of registers position depending on current endianity */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define rr_lo_hi(a1, a2) a2, a1
+#else
+#define rr_lo_hi(a1, a2) a1, a2
+#endif
+
 /*
  * Data preload for architectures that support it
  */
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 661da11..12981d6 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -26,9 +26,9 @@
 #define c1_ACTLR 4 /* Auxilliary Control Register */
 #define c1_CPACR 5 /* Coprocessor Access Control */
 #define c2_TTBR0 6 /* Translation Table Base Register 0 */
-#define c2_TTBR0_high 7 /* TTBR0 top 32 bits */
+#define c2_TTBR0_hilo 7 /* TTBR0 top 32 bits in LE case, low 32 bits in BE case */
 #define c2_TTBR1 8 /* Translation Table Base Register 1 */
-#define c2_TTBR1_high 9 /* TTBR1 top 32 bits */
+#define c2_TTBR1_hilo 9 /* TTBR1 top 32 bits in LE case, low 32 bits in BE case */
 #define c2_TTBCR 10 /* Translation Table Base Control R. */
 #define c3_DACR 11 /* Domain Access Control Register */
 #define c5_DFSR 12 /* Data Fault Status Register */
diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S
index 1b9844d..2d10b2d 100644
--- a/arch/arm/kvm/init.S
+++ b/arch/arm/kvm/init.S
@@ -22,6 +22,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
 #include <asm/kvm_mmu.h>
+#include <asm/assembler.h>

 /********************************************************************
  * Hypervisor initialization
@@ -70,8 +71,10 @@ __do_hyp_init:
 	cmp r0, #0 @ We have a SP?
 	bne phase2 @ Yes, second stage init

+ARM_BE8(setend be) @ Switch to Big Endian mode if needed
+
 	@ Set the HTTBR to point to the hypervisor PGD pointer passed
-	mcrr p15, 4, r2, r3, c2
+	mcrr p15, 4, rr_lo_hi(r2, r3), c2

 	@ Set the HTCR and VTCR to the same shareability and cacheability
 	@ settings as the non-secure TTBCR and with T0SZ == 0.
@@ -137,7 +140,7 @@ phase2:
 	mov pc, r0

 target: @ We're now in the trampoline code, switch page tables
-	mcrr p15, 4, r2, r3, c2
+	mcrr p15, 4, rr_lo_hi(r2, r3), c2
 	isb

 	@ Invalidate the old TLBs
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index df19133..0784ec3 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -25,6 +25,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
 #include <asm/vfpmacros.h>
+#include <asm/assembler.h>
 #include "interrupts_head.S"

 	.text
@@ -52,14 +53,14 @@ ENTRY(__kvm_tlb_flush_vmid_ipa)
 	dsb ishst
 	add r0, r0, #KVM_VTTBR
 	ldrd r2, r3, [r0]
-	mcrr p15, 6, r2, r3, c2 @ Write VTTBR
+	mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
 	isb
 	mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored)
 	dsb ish
 	isb
 	mov r2, #0
 	mov r3, #0
-	mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
+	mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Back to VMID #0
 	isb @ Not necessary if followed by eret

 	ldmia sp!, {r2, r3}
@@ -135,7 +136,7 @@ ENTRY(__kvm_vcpu_run)
 	ldr r1, [vcpu, #VCPU_KVM]
 	add r1, r1, #KVM_VTTBR
 	ldrd r2, r3, [r1]
-	mcrr p15, 6, r2, r3, c2 @ Write VTTBR
+	mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR

 	@ We're all done, just restore the GPRs and go to the guest
 	restore_guest_regs
@@ -199,8 +200,13 @@ after_vfp_restore:

 	restore_host_regs
 	clrex @ Clear exclusive monitor
+#ifndef __ARMEB__
 	mov r0, r1 @ Return the return code
 	mov r1, #0 @ Clear upper bits in return value
+#else
+	@ r1 already has return code
+	mov r0, #0 @ Clear upper bits in return value
+#endif /* __ARMEB__ */
 	bx lr @ return to IOCTL
 /********************************************************************
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index c371db7..67b4002 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -251,8 +251,8 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mrc p15, 0, r3, c1, c0, 2 @ CPACR
 	mrc p15, 0, r4, c2, c0, 2 @ TTBCR
 	mrc p15, 0, r5, c3, c0, 0 @ DACR
-	mrrc p15, 0, r6, r7, c2 @ TTBR 0
-	mrrc p15, 1, r8, r9, c2 @ TTBR 1
+	mrrc p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
+	mrrc p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
 	mrc p15, 0, r10, c10, c2, 0 @ PRRR
 	mrc p15, 0, r11, c10, c2, 1 @ NMRR
 	mrc p15, 2, r12, c0, c0, 0 @ CSSELR
@@ -380,8 +380,8 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcr p15, 0, r3, c1, c0, 2 @ CPACR
 	mcr p15, 0, r4, c2, c0, 2 @ TTBCR
 	mcr p15, 0, r5, c3, c0, 0 @ DACR
-	mcrr p15, 0, r6, r7, c2 @ TTBR 0
-	mcrr p15, 1, r8, r9, c2 @ TTBR 1
+	mcrr p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
+	mcrr p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
 	mcr p15, 0, r10, c10, c2, 0 @ PRRR
 	mcr p15, 0, r11, c10, c2, 1 @ NMRR
 	mcr p15, 2, r12, c0, c0, 0 @ CSSELR
@@ -413,13 +413,21 @@ vcpu .req r0 @ vcpu pointer always in r0
 	ldr r9, [r2, #GICH_ELRSR1]
 	ldr r10, [r2, #GICH_APR]

+ARM_BE8(rev r3, r3 )
 	str r3, [r11, #VGIC_CPU_HCR]
+ARM_BE8(rev r4, r4 )
 	str r4, [r11, #VGIC_CPU_VMCR]
+ARM_BE8(rev r5, r5 )
 	str r5, [r11, #VGIC_CPU_MISR]
+ARM_BE8(rev r6, r6 )
 	str r6, [r11, #VGIC_CPU_EISR]
+ARM_BE8(rev r7, r7 )
 	str r7, [r11, #(VGIC_CPU_EISR + 4)]
+ARM_BE8(rev r8, r8 )
 	str r8, [r11, #VGIC_CPU_ELRSR]
+ARM_BE8(rev r9, r9 )
 	str r9, [r11, #(VGIC_CPU_ELRSR + 4)]
+ARM_BE8(rev r10, r10 )
 	str r10, [r11, #VGIC_CPU_APR]

 	/* Clear GICH_HCR */
@@ -431,6 +439,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	add r3, r11, #VGIC_CPU_LR
 	ldr r4, [r11, #VGIC_CPU_NR_LR]
 1:	ldr r6, [r2], #4
+ARM_BE8(rev r6, r6 )
 	str r6, [r3], #4
 	subs r4, r4, #1
 	bne 1b
@@ -459,8 +468,11 @@ vcpu .req r0 @ vcpu pointer always in r0
 	ldr r4, [r11, #VGIC_CPU_VMCR]
 	ldr r8, [r11, #VGIC_CPU_APR]

+ARM_BE8(rev r3, r3 )
 	str r3, [r2, #GICH_HCR]
+ARM_BE8(rev r4, r4 )
 	str r4, [r2, #GICH_VMCR]
+ARM_BE8(rev r8, r8 )
 	str r8, [r2, #GICH_APR]

 	/* Restore list registers */
@@ -468,6 +480,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	add r3, r11, #VGIC_CPU_LR
 	ldr r4, [r11, #VGIC_CPU_NR_LR]
 1:	ldr r6, [r3], #4
+ARM_BE8(rev r6, r6 )
 	str r6, [r2], #4
 	subs r4, r4, #1
 	bne 1b
@@ -498,7 +511,7 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcr p15, 0, r2, c14, c3, 1 @ CNTV_CTL
 	isb

-	mrrc p15, 3, r2, r3, c14 @ CNTV_CVAL
+	mrrc p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
 	ldr r4, =VCPU_TIMER_CNTV_CVAL
 	add r5, vcpu, r4
 	strd r2, r3, [r5]
@@ -538,12 +551,12 @@ vcpu .req r0 @ vcpu pointer always in r0

 	ldr r2, [r4, #KVM_TIMER_CNTVOFF]
 	ldr r3, [r4, #(KVM_TIMER_CNTVOFF + 4)]
-	mcrr p15, 4, r2, r3, c14 @ CNTVOFF
+	mcrr p15, 4, rr_lo_hi(r2, r3), c14 @ CNTVOFF

 	ldr r4, =VCPU_TIMER_CNTV_CVAL
 	add r5, vcpu, r4
 	ldrd r2, r3, [r5]
-	mcrr p15, 3, r2, r3, c14 @ CNTV_CVAL
+	mcrr p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
 	isb

 	ldr r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
On Fri, Dec 20, 2013 at 08:48:42AM -0800, Victor Kamensky wrote:
ARM v7 KVM assembler files fixes to work in big endian mode:
I don't think 'files fixes' is proper English, could be something like:
Fix ARM v7 KVM assembler files to work...
vgic h/w registers are little endian; when asm code reads/writes from/to
the vgic h/w registers
them, it needs to do byteswap after/before. Byte swap code uses ARM_BE8
Byteswap
wrapper to add swap only if BIG_ENDIAN kernel is configured
what is the config symbol, CONFIG_BIG_ENDIAN?
mcrr and mrrc instructions take couple 32 bit registers as argument, one
The mcrr and mrrc...
a couple of
as their arguments
is supposed to be high part of 64 bit value and another is low part of 64 bit. Typically those values are loaded/stored with ldrd and strd
one is supposed to be?
instructions and those will load high and low parts in opposite register depending on endianity. Introduce and use rr_lo_hi macro that swap
opposite register? This text is more confusing than clarifying; I think you need to explain how the rr_lo_hi macro is intended to be used, if anything.
registers in BE mode when they are passed to mcrr and mrrc instructions.
function that returns 64 bit result __kvm_vcpu_run in couple registers has to be adjusted for BE case.
The __kvm_vcpu_run function returns a 64-bit result in two registers, which has...
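[Editorial sketch, not part of the original thread.] The r0/r1 adjustment discussed here can be modelled in plain C. The assumption, taken from the ARM procedure call standard, is that a 64-bit return value occupies r0 and r1 in memory order, so the low word lands in r0 on little endian and in r1 on big endian. The struct and function names below are hypothetical and exist only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the r0/r1 pair that carries a 64-bit return value. */
struct r0_r1 {
    uint32_t r0;
    uint32_t r1;
};

/*
 * Split a 64-bit return value the way a 32-bit ARM caller expects it:
 * the pair is laid out in memory order, so the low word is in r0 on a
 * little-endian kernel and in r1 on a big-endian kernel.
 */
static struct r0_r1 split_u64_return(uint64_t value, int big_endian)
{
    struct r0_r1 regs;
    uint32_t lo = (uint32_t)value;
    uint32_t hi = (uint32_t)(value >> 32);

    regs.r0 = big_endian ? hi : lo;
    regs.r1 = big_endian ? lo : hi;
    return regs;
}
```

With a small return code such as 5, the little-endian case yields r0 = 5, r1 = 0, while the big-endian case yields r0 = 0, r1 = 5, which matches the "r1 already has return code" branch in the patch.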
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
arch/arm/include/asm/assembler.h | 7 +++++++ arch/arm/include/asm/kvm_asm.h | 4 ++-- arch/arm/kvm/init.S | 7 +++++-- arch/arm/kvm/interrupts.S | 12 +++++++++--- arch/arm/kvm/interrupts_head.S | 27 ++++++++++++++++++++------- 5 files changed, 43 insertions(+), 14 deletions(-)
diff --git a/arch/arm/include/asm/assembler.h b/arch/arm/include/asm/assembler.h
index 5c22851..ad1ad31 100644
--- a/arch/arm/include/asm/assembler.h
+++ b/arch/arm/include/asm/assembler.h
@@ -60,6 +60,13 @@
 #define ARM_BE8(code...)
 #endif
+/* swap pair of registers position depending on current endianity */
+#ifdef CONFIG_CPU_ENDIAN_BE8
+#define rr_lo_hi(a1, a2) a2, a1
+#else
+#define rr_lo_hi(a1, a2) a1, a2
+#endif
I'm not convinced that this is needed generally in the kernel and not locally to KVM, but if it is, then I think it needs to be documented more. I assume the idea here is that a1 is always the lower-numbered register in an ldrd instruction loading the values to write to the register?
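[Editorial sketch, not part of the original thread.] One way to see what rr_lo_hi is for is to model ldrd and mcrr in portable C. The assumptions are that ldrd places the word at the lower address in the lower-numbered register, and that mcrr takes the low 32 bits first and the high 32 bits second. All names below (reg_pair, load_word, model_ldrd, model_mcrr) are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the pair filled by "ldrd rA, rB, [addr]":
 * rA receives the word at addr, rB the word at addr + 4. */
struct reg_pair {
    uint32_t first;
    uint32_t second;
};

/* Assemble one 32-bit word from memory in the given byte order. */
static uint32_t load_word(const uint8_t *p, int big_endian)
{
    if (big_endian)
        return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
               ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}

/* Store a 64-bit value in memory in the given byte order, then "ldrd" it. */
static struct reg_pair model_ldrd(uint64_t value, int big_endian)
{
    uint8_t mem[8];
    struct reg_pair p;
    int i;

    for (i = 0; i < 8; i++) {
        int shift = big_endian ? 56 - 8 * i : 8 * i;
        mem[i] = (uint8_t)(value >> shift);
    }
    p.first = load_word(mem, big_endian);
    p.second = load_word(mem + 4, big_endian);
    return p;
}

/* Model of "mcrr ..., rr_lo_hi(rA, rB), ...": the operands are swapped on
 * big endian so that the low half always goes in the first slot. */
static uint64_t model_mcrr(struct reg_pair p, int big_endian)
{
    uint32_t lo = big_endian ? p.second : p.first;
    uint32_t hi = big_endian ? p.first : p.second;

    return ((uint64_t)hi << 32) | lo;
}
```

In this model the same 64-bit value reaches the system register in both byte orders; without the swap, the big-endian case would write the two halves exchanged.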
/*
 * Data preload for architectures that support it
*/ diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h index 661da11..12981d6 100644 --- a/arch/arm/include/asm/kvm_asm.h +++ b/arch/arm/include/asm/kvm_asm.h @@ -26,9 +26,9 @@ #define c1_ACTLR 4 /* Auxilliary Control Register */ #define c1_CPACR 5 /* Coprocessor Access Control */ #define c2_TTBR0 6 /* Translation Table Base Register 0 */ -#define c2_TTBR0_high 7 /* TTBR0 top 32 bits */ +#define c2_TTBR0_hilo 7 /* TTBR0 top 32 bits in LE case, low 32 bits in BE case */ #define c2_TTBR1 8 /* Translation Table Base Register 1 */ -#define c2_TTBR1_high 9 /* TTBR1 top 32 bits */ +#define c2_TTBR1_hilo 9 /* TTBR1 top 32 bits in LE case, low 32 bits in BE case */
These lines far exceed 80 chars, but not sure how to improve on that...
#define c2_TTBCR 10 /* Translation Table Base Control R. */ #define c3_DACR 11 /* Domain Access Control Register */ #define c5_DFSR 12 /* Data Fault Status Register */ diff --git a/arch/arm/kvm/init.S b/arch/arm/kvm/init.S index 1b9844d..2d10b2d 100644 --- a/arch/arm/kvm/init.S +++ b/arch/arm/kvm/init.S @@ -22,6 +22,7 @@ #include <asm/kvm_asm.h> #include <asm/kvm_arm.h> #include <asm/kvm_mmu.h> +#include <asm/assembler.h> /********************************************************************
 * Hypervisor initialization
@@ -70,8 +71,10 @@ __do_hyp_init: cmp r0, #0 @ We have a SP? bne phase2 @ Yes, second stage init +ARM_BE8(setend be) @ Switch to Big Endian mode if needed
+
 	@ Set the HTTBR to point to the hypervisor PGD pointer passed
-	mcrr p15, 4, r2, r3, c2
+	mcrr p15, 4, rr_lo_hi(r2, r3), c2

 	@ Set the HTCR and VTCR to the same shareability and cacheability
 	@ settings as the non-secure TTBCR and with T0SZ == 0.
@@ -137,7 +140,7 @@ phase2:
 	mov pc, r0

 target: @ We're now in the trampoline code, switch page tables
-	mcrr p15, 4, r2, r3, c2
+	mcrr p15, 4, rr_lo_hi(r2, r3), c2
 	isb
I guess you could switch r2 and r3 (without a third register or using stack space) on big endian to avoid the need for the macro in a header file and define the macro locally in the interrupts*.S files... Hmmm, undecided.
@ Invalidate the old TLBs diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S index df19133..0784ec3 100644 --- a/arch/arm/kvm/interrupts.S +++ b/arch/arm/kvm/interrupts.S @@ -25,6 +25,7 @@ #include <asm/kvm_asm.h> #include <asm/kvm_arm.h> #include <asm/vfpmacros.h> +#include <asm/assembler.h> #include "interrupts_head.S" .text @@ -52,14 +53,14 @@ ENTRY(__kvm_tlb_flush_vmid_ipa) dsb ishst add r0, r0, #KVM_VTTBR ldrd r2, r3, [r0]
-	mcrr p15, 6, r2, r3, c2 @ Write VTTBR
+	mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
 	isb
 	mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored)
 	dsb ish
 	isb
 	mov r2, #0
 	mov r3, #0
-	mcrr p15, 6, r2, r3, c2 @ Back to VMID #0
+	mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Back to VMID #0
 	isb @ Not necessary if followed by eret

 	ldmia sp!, {r2, r3}
@@ -135,7 +136,7 @@ ENTRY(__kvm_vcpu_run)
 	ldr r1, [vcpu, #VCPU_KVM]
 	add r1, r1, #KVM_VTTBR
 	ldrd r2, r3, [r1]
-	mcrr p15, 6, r2, r3, c2 @ Write VTTBR
+	mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR
@ We're all done, just restore the GPRs and go to the guest restore_guest_regs @@ -199,8 +200,13 @@ after_vfp_restore: restore_host_regs clrex @ Clear exclusive monitor +#ifndef __ARMEB__ mov r0, r1 @ Return the return code mov r1, #0 @ Clear upper bits in return value +#else
+	@ r1 already has return code
+	mov r0, #0 @ Clear upper bits in return value
+#endif /* __ARMEB__ */ bx lr @ return to IOCTL /******************************************************************** diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S index c371db7..67b4002 100644 --- a/arch/arm/kvm/interrupts_head.S +++ b/arch/arm/kvm/interrupts_head.S @@ -251,8 +251,8 @@ vcpu .req r0 @ vcpu pointer always in r0 mrc p15, 0, r3, c1, c0, 2 @ CPACR mrc p15, 0, r4, c2, c0, 2 @ TTBCR mrc p15, 0, r5, c3, c0, 0 @ DACR
-	mrrc p15, 0, r6, r7, c2 @ TTBR 0
-	mrrc p15, 1, r8, r9, c2 @ TTBR 1
+	mrrc p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
+	mrrc p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
 	mrc p15, 0, r10, c10, c2, 0 @ PRRR
 	mrc p15, 0, r11, c10, c2, 1 @ NMRR
 	mrc p15, 2, r12, c0, c0, 0 @ CSSELR
@@ -380,8 +380,8 @@ vcpu .req r0 @ vcpu pointer always in r0
 	mcr p15, 0, r3, c1, c0, 2 @ CPACR
 	mcr p15, 0, r4, c2, c0, 2 @ TTBCR
 	mcr p15, 0, r5, c3, c0, 0 @ DACR
-	mcrr p15, 0, r6, r7, c2 @ TTBR 0
-	mcrr p15, 1, r8, r9, c2 @ TTBR 1
+	mcrr p15, 0, rr_lo_hi(r6, r7), c2 @ TTBR 0
+	mcrr p15, 1, rr_lo_hi(r8, r9), c2 @ TTBR 1
 	mcr p15, 0, r10, c10, c2, 0 @ PRRR
 	mcr p15, 0, r11, c10, c2, 1 @ NMRR
 	mcr p15, 2, r12, c0, c0, 0 @ CSSELR
@@ -413,13 +413,21 @@ vcpu .req r0 @ vcpu pointer always in r0 ldr r9, [r2, #GICH_ELRSR1] ldr r10, [r2, #GICH_APR] +ARM_BE8(rev r3, r3 ) str r3, [r11, #VGIC_CPU_HCR] +ARM_BE8(rev r4, r4 ) str r4, [r11, #VGIC_CPU_VMCR] +ARM_BE8(rev r5, r5 ) str r5, [r11, #VGIC_CPU_MISR] +ARM_BE8(rev r6, r6 ) str r6, [r11, #VGIC_CPU_EISR] +ARM_BE8(rev r7, r7 ) str r7, [r11, #(VGIC_CPU_EISR + 4)] +ARM_BE8(rev r8, r8 ) str r8, [r11, #VGIC_CPU_ELRSR] +ARM_BE8(rev r9, r9 ) str r9, [r11, #(VGIC_CPU_ELRSR + 4)] +ARM_BE8(rev r10, r10 ) str r10, [r11, #VGIC_CPU_APR]
Wouldn't it be semantically cleaner to do the byteswap after the loads from the hardware instead?
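[Editorial sketch, not part of the original thread.] The "swap right after the load" ordering suggested above can be illustrated in C. The assumption is that the device register is little endian and has been read with a native-endian 32-bit load, so a big-endian host sees the bytes reversed. The helper names are hypothetical; in kernel C code the existing le32_to_cpu() helper plays this role.

```c
#include <stdint.h>

/* Reverse the four bytes of a word, like the ARM "rev" instruction. */
static uint32_t swab32_sketch(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Convert a value that was just loaded from a little-endian device
 * register into host order. Doing this immediately after the load means
 * every later use of the value (stores, comparisons) sees it in native
 * order, which is the ordering the review comment asks about.
 */
static uint32_t le_reg_to_host(uint32_t raw, int host_big_endian)
{
    return host_big_endian ? swab32_sketch(raw) : raw;
}
```

The result stored to the vcpu structure is the same either way; the difference is only whether the register holds a native-order value between the load and the store.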
/* Clear GICH_HCR */ @@ -431,6 +439,7 @@ vcpu .req r0 @ vcpu pointer always in r0 add r3, r11, #VGIC_CPU_LR ldr r4, [r11, #VGIC_CPU_NR_LR] 1: ldr r6, [r2], #4 +ARM_BE8(rev r6, r6 ) str r6, [r3], #4 subs r4, r4, #1 bne 1b @@ -459,8 +468,11 @@ vcpu .req r0 @ vcpu pointer always in r0 ldr r4, [r11, #VGIC_CPU_VMCR] ldr r8, [r11, #VGIC_CPU_APR] +ARM_BE8(rev r3, r3 ) str r3, [r2, #GICH_HCR] +ARM_BE8(rev r4, r4 ) str r4, [r2, #GICH_VMCR] +ARM_BE8(rev r8, r8 ) str r8, [r2, #GICH_APR] /* Restore list registers */ @@ -468,6 +480,7 @@ vcpu .req r0 @ vcpu pointer always in r0 add r3, r11, #VGIC_CPU_LR ldr r4, [r11, #VGIC_CPU_NR_LR] 1: ldr r6, [r3], #4 +ARM_BE8(rev r6, r6 ) str r6, [r2], #4 subs r4, r4, #1 bne 1b @@ -498,7 +511,7 @@ vcpu .req r0 @ vcpu pointer always in r0 mcr p15, 0, r2, c14, c3, 1 @ CNTV_CTL isb
-	mrrc p15, 3, r2, r3, c14 @ CNTV_CVAL
+	mrrc p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
 	ldr r4, =VCPU_TIMER_CNTV_CVAL
 	add r5, vcpu, r4
 	strd r2, r3, [r5]
@@ -538,12 +551,12 @@ vcpu .req r0 @ vcpu pointer always in r0

 	ldr r2, [r4, #KVM_TIMER_CNTVOFF]
 	ldr r3, [r4, #(KVM_TIMER_CNTVOFF + 4)]
-	mcrr p15, 4, r2, r3, c14 @ CNTVOFF
+	mcrr p15, 4, rr_lo_hi(r2, r3), c14 @ CNTVOFF

 	ldr r4, =VCPU_TIMER_CNTV_CVAL
 	add r5, vcpu, r4
 	ldrd r2, r3, [r5]
-	mcrr p15, 3, r2, r3, c14 @ CNTV_CVAL
+	mcrr p15, 3, rr_lo_hi(r2, r3), c14 @ CNTV_CVAL
 	isb
ldr r2, [vcpu, #VCPU_TIMER_CNTV_CTL] -- 1.8.1.4
Thanks,
This patch fixes issue of reading and writing ARM V7 registers values from/to user land. Existing code was designed to work only in LE case.
struct kvm_one_reg
------------------
registers value passed through kvm_one_reg structure. It is used by KVM_SET_ONE_REG, KVM_GET_ONE_REG ioctls. Note by looking at structure itself we cannot tell what is size of register. Note that structure carries address of user memory, 'addr' where register should be read or written
Setting register (from user-land to kvm)
----------------------------------------
kvm_arm_set_reg takes vcpu and pointer to struct kvm_one_reg which already read from user space
kvm_arm_set_reg calls set_core_reg or kvm_arm_coproc_set_reg
set_core_reg deals only with 4 bytes registers, it just reads 4 bytes from user space and store it properly into vcpu->arch.regs
kvm_arm_coproc_set_reg deals with registers of different size. At certain point code reaches phase where it retrieves description of register by id and it knows register size, which could be either 4 or 8 bytes. Kernel code is ready to read values from user space, but destination type may vary. It could be pointer to 32 bit integer or it could be pointer to 64 bit integer. And all possible permutation of size and destination pointer are possible. Depending on destination pointer type, 4 bytes or 8 bytes, two new helper functions are introduced - reg_from_user32 and reg_from_user64. They are used instead of reg_from_user function which could work only in LE case.
Size  sizeof(*DstInt)  Function used to read from user
4     4                reg_from_user32
8     4                reg_from_user32 - read two registers
4     8                reg_from_user64 - need special handling for BE
8     8                reg_from_user64
Getting register (to user-land from kvm)
----------------------------------------
Situation with reading registers is similar to writing. Integer pointer type of register to be copied could be 4 or 8 bytes. And size passed in struct kvm_one_reg could be 4 or 8. And any permutation is possible. Depending on src pointer type, 4 bytes or 8 bytes, two new helper functions are introduced - reg_from_user32 and reg_to_user64. They are used instead of reg_to_user function, which could work only in LE case.
Size  sizeof(*SrcInt)  Function used to write to user
4     4                reg_to_user32
8     4                reg_to_user32 - writes two registers
4     8                reg_to_user64 - need special handling for BE
8     8                reg_to_user64
Note code does assume that it can only deals with 4 or 8 byte registers.
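[Editorial sketch, not part of the original thread.] The "size 4 into a 64-bit destination" row is the case that breaks on big endian: copying four bytes to the start of a u64 fills its high half there. The user-space sketch below mirrors the staging approach of reg_from_user64 in the patch; memcpy stands in for copy_from_user, and the function name and error value are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 4- or 8-byte register image into a 64-bit destination.
 * The value is first staged in a temporary of the matching width and
 * then assigned, so the compiler widens it correctly regardless of the
 * host byte order. memcpy() is a stand-in for copy_from_user().
 */
static int reg_from_user64_sketch(uint64_t *val, const void *src,
                                  size_t regsize)
{
    uint32_t word;
    uint64_t dword;

    switch (regsize) {
    case 4:
        memcpy(&word, src, sizeof(word));
        *val = word;            /* zero-extends on LE and BE alike */
        return 0;
    case 8:
        memcpy(&dword, src, sizeof(dword));
        *val = dword;
        return 0;
    default:
        return -1;              /* only 4- and 8-byte registers exist */
    }
}
```

A direct memcpy(val, src, 4) would give the same result on a little-endian host but would leave the value in the upper 32 bits on a big-endian one, which is why the old "This Just Works because we are little endian" helper had to go.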
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
---
 arch/arm/kvm/coproc.c | 94 +++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 69 insertions(+), 25 deletions(-)

diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 78c0885..64b2b94 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -634,17 +634,61 @@ static struct coproc_reg invariant_cp15[] = {
 	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
 };

-static int reg_from_user(void *val, const void __user *uaddr, u64 id)
+static int reg_from_user64(u64 *val, const void __user *uaddr, u64 id)
+{
+	unsigned long regsize = KVM_REG_SIZE(id);
+	union {
+		u32 word;
+		u64 dword;
+	} tmp = {0};
+
+	if (copy_from_user(&tmp, uaddr, regsize) != 0)
+		return -EFAULT;
+
+	switch (regsize) {
+	case 4:
+		*val = tmp.word;
+		break;
+	case 8:
+		*val = tmp.dword;
+		break;
+	}
+	return 0;
+}
+
+/* Note it may really copy two u32 registers */
+static int reg_from_user32(u32 *val, const void __user *uaddr, u64 id)
 {
-	/* This Just Works because we are little endian. */
 	if (copy_from_user(val, uaddr, KVM_REG_SIZE(id)) != 0)
 		return -EFAULT;
 	return 0;
 }

-static int reg_to_user(void __user *uaddr, const void *val, u64 id)
+static int reg_to_user64(void __user *uaddr, const u64 *val, u64 id)
+{
+	unsigned long regsize = KVM_REG_SIZE(id);
+	union {
+		u32 word;
+		u64 dword;
+	} tmp;
+
+	switch (regsize) {
+	case 4:
+		tmp.word = *val;
+		break;
+	case 8:
+		tmp.dword = *val;
+		break;
+	}
+
+	if (copy_to_user(uaddr, &tmp, regsize) != 0)
+		return -EFAULT;
+	return 0;
+}
+
+/* Note it may really copy two u32 registers */
+static int reg_to_user32(void __user *uaddr, const u32 *val, u64 id)
 {
-	/* This Just Works because we are little endian. */
 	if (copy_to_user(uaddr, val, KVM_REG_SIZE(id)) != 0)
 		return -EFAULT;
 	return 0;
@@ -662,7 +706,7 @@ static int get_invariant_cp15(u64 id, void __user *uaddr)
 	if (!r)
 		return -ENOENT;

-	return reg_to_user(uaddr, &r->val, id);
+	return reg_to_user64(uaddr, &r->val, id);
 }

 static int set_invariant_cp15(u64 id, void __user *uaddr)
@@ -678,7 +722,7 @@ static int set_invariant_cp15(u64 id, void __user *uaddr)
 	if (!r)
 		return -ENOENT;

-	err = reg_from_user(&val, uaddr, id);
+	err = reg_from_user64(&val, uaddr, id);
 	if (err)
 		return err;

@@ -846,7 +890,7 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
 	if (vfpid < num_fp_regs()) {
 		if (KVM_REG_SIZE(id) != 8)
 			return -ENOENT;
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
+		return reg_to_user64(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
 				   id);
 	}

@@ -856,22 +900,22 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)

 	switch (vfpid) {
 	case KVM_REG_ARM_VFP_FPEXC:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
 	case KVM_REG_ARM_VFP_FPSCR:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
 	case KVM_REG_ARM_VFP_FPINST:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
 	case KVM_REG_ARM_VFP_FPINST2:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
 	case KVM_REG_ARM_VFP_MVFR0:
 		val = fmrx(MVFR0);
-		return reg_to_user(uaddr, &val, id);
+		return reg_to_user32(uaddr, &val, id);
 	case KVM_REG_ARM_VFP_MVFR1:
 		val = fmrx(MVFR1);
-		return reg_to_user(uaddr, &val, id);
+		return reg_to_user32(uaddr, &val, id);
 	case KVM_REG_ARM_VFP_FPSID:
 		val = fmrx(FPSID);
-		return reg_to_user(uaddr, &val, id);
+		return reg_to_user32(uaddr, &val, id);
 	default:
 		return -ENOENT;
 	}
@@ -890,8 +934,8 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
 	if (vfpid < num_fp_regs()) {
 		if (KVM_REG_SIZE(id) != 8)
 			return -ENOENT;
-		return reg_from_user(&vcpu->arch.vfp_guest.fpregs[vfpid],
-				     uaddr, id);
+		return reg_from_user64(&vcpu->arch.vfp_guest.fpregs[vfpid],
+				       uaddr, id);
 	}

 	/* FP control registers are all 32 bit. */
@@ -900,28 +944,28 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)

 	switch (vfpid) {
 	case KVM_REG_ARM_VFP_FPEXC:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
 	case KVM_REG_ARM_VFP_FPSCR:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
 	case KVM_REG_ARM_VFP_FPINST:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
 	case KVM_REG_ARM_VFP_FPINST2:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
 	/* These are invariant. */
 	case KVM_REG_ARM_VFP_MVFR0:
-		if (reg_from_user(&val, uaddr, id))
+		if (reg_from_user32(&val, uaddr, id))
 			return -EFAULT;
 		if (val != fmrx(MVFR0))
 			return -EINVAL;
 		return 0;
 	case KVM_REG_ARM_VFP_MVFR1:
-		if (reg_from_user(&val, uaddr, id))
+		if (reg_from_user32(&val, uaddr, id))
 			return -EFAULT;
 		if (val != fmrx(MVFR1))
 			return -EINVAL;
 		return 0;
 	case KVM_REG_ARM_VFP_FPSID:
-		if (reg_from_user(&val, uaddr, id))
+		if (reg_from_user32(&val, uaddr, id))
 			return -EFAULT;
 		if (val != fmrx(FPSID))
 			return -EINVAL;
@@ -968,7 +1012,7 @@ int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return get_invariant_cp15(reg->id, uaddr);

 	/* Note: copies two regs if size is 64 bit. */
-	return reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
+	return reg_to_user32(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
 }

 int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -987,7 +1031,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return set_invariant_cp15(reg->id, uaddr);

 	/* Note: copies two regs if size is 64 bit */
-	return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
+	return reg_from_user32(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
 }

 static unsigned int num_demux_regs(void)
Hi Victor,
On Fri, Dec 20, 2013 at 08:48:43AM -0800, Victor Kamensky wrote:
This patch fixes issue of reading and writing
interesting line break.
an issue with
ARM V7 registers values from/to user land. Existing code was designed to work only in LE case.
The existing code...
'LE case'? 'little-endian'?
struct kvm_one_reg
registers value passed through kvm_one_reg structure. It is used by
registers value passed through? What are you trying to say?
KVM_SET_ONE_REG, KVM_GET_ONE_REG ioctls. Note by looking at structure
by the KVM...
the structure
itself we cannot tell what is size of register. Note that structure carries
the size of the register
address of user memory, 'addr' where register should be read or written
I'm a little confused as to the value of this Section of the commit text. I believe the ONE_REG interface is quite well documented already...
Setting register (from user-land to kvm)
kvm_arm_set_reg takes vcpu and pointer to struct kvm_one_reg which already read from user space
I think you could ditch this first sentence
kvm_arm_set_reg calls set_core_reg or kvm_arm_coproc_set_reg
nit: adding kvm_arm_set_reg() makes it clear that this is the function you're referring to, and not the ioctl as a concept.
set_core_reg deals only with 4 bytes registers, it just reads 4 bytes from user space and store it properly into vcpu->arch.regs
stores
kvm_arm_coproc_set_reg deals with registers of different size. At certain
different sizes
At a certain point
point code reaches phase where it retrieves description of register by id
the description of a register
and it knows register size, which could be either 4 or 8 bytes. Kernel code
s/could be/is/
Kernel code is ready?
is ready to read values from user space, but destination type may vary. It could be pointer to 32 bit integer or it could be pointer to 64 bit integer. And all possible permutation of size and destination pointer are
permutations
possible. Depending on destination pointer type, 4 bytes or 8 bytes, two
the destination pointer type
new helper functions are introduced - reg_from_user32 and reg_from_user64. They are used instead of reg_from_user function which could work only in LE case.
which only worked in
Size  sizeof(*DstInt)  Function used to read from user
4     4                reg_from_user32
8     4                reg_from_user32 - read two registers
4     8                reg_from_user64 - need special handling for BE
8     8                reg_from_user64
Getting register (to user-land from kvm)
Situation with reading registers is similar to writing. Integer pointer
The situation
type of register to be copied could be 4 or 8 bytes. And size passed in
The integer pointer
pointer to be copied? Please clarify what you are referring to.
struct kvm_one_reg could be 4 or 8. And any permutation is possible.
Any permutation of source pointer type and size is possible.
Depending on src pointer type, 4 bytes or 8 bytes, two new helper functions are introduced - reg_from_user32 and reg_to_user64. They are used instead
reg_to_user32?
of reg_to_user function, which could work only in LE case.
the reg_to_user, which worked only for LE.
Size  sizeof(*SrcInt)  Function used to write to user
4     4                reg_to_user32
8     4                reg_to_user32 - writes two registers
4     8                reg_to_user64 - need special handling for BE
8     8                reg_to_user64
I think it could be slightly more helpful to put a comment on the functions, like "Write to 32-bit user pointer" on reg_to_user32, but it's up to you.
Note code does assume that it can only deals with 4 or 8 byte registers.
Note: We only support register sizes of 4 or 8 bytes.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
I don't mean to hammer on your commit message with all my comments. I really do appreciate you taking the time to document your changes. However, with the level of detail you are providing in the commit, I think you have to be slightly more careful with the language, so that it doesn't end up being misleading instead of helpful. I think you could sum this up much shorter to simply say that core register handling is already endian-safe, but coprocessors and vfpregs use reg_to_user which is not endian-safe, and therefore needs changing.
The motivation about the pointer types and register sizes being arbitrarily different is important though, so I appreciate you listing that.
arch/arm/kvm/coproc.c | 94 +++++++++++++++++++++++++++++++++++++-------------- 1 file changed, 69 insertions(+), 25 deletions(-)
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 78c0885..64b2b94 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -634,17 +634,61 @@ static struct coproc_reg invariant_cp15[] = {
 	{ CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR },
 };
-static int reg_from_user(void *val, const void __user *uaddr, u64 id)
+static int reg_from_user64(u64 *val, const void __user *uaddr, u64 id)
+{
+	unsigned long regsize = KVM_REG_SIZE(id);
+	union {
+		u32	word;
+		u64	dword;
+	} tmp = {0};
+	if (copy_from_user(&tmp, uaddr, regsize) != 0)
+		return -EFAULT;
+	switch (regsize) {
+	case 4:
+		*val = tmp.word;
+		break;
+	case 8:
+		*val = tmp.dword;
+		break;
+	}
+	return 0;
+}
You stated in the commit message that any permutation of KVM_REG_SIZE(id) and sizeof(*val) is possible.
So doesn't this totally mess up the kernel if I pass a 32-bit pointer to reg_from_user64? Or is that not really the case and that's an exception to all of the permutations?
Basically, KVM_REG_SIZE(id) and the size of your destination pointer type should always match, but we abuse this slightly so far. I don't think you should cater to that, but just require callers to always provide a consistent size/type pair (you could also add a union you use as a parameter instead, or have two typed parameters) and simplify into a single function.
The only special cases you now have to deal with are in:
set_invariant_cp15(): declare two temp variables of u32 and u64 sizes
get_invariant_cp15(): either have temporary values or change val in coproc_reg to be a union
The current scheme is pretty hard to understand and to make sure we're not breaking anything...
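The "consistent size/type pair" idea above could look roughly like the following userspace sketch. It is only an illustration under stated assumptions: `copy_in` is a hypothetical stand-in for copy_from_user(), and `reg_from_user_checked` and `read_invariant` are made-up names, not functions from the kernel or from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for copy_from_user(); returns 0 on success. */
static int copy_in(void *dst, const void *src, size_t n)
{
	memcpy(dst, src, n);
	return 0;
}

/*
 * Single helper: the caller states the size of the object val points to,
 * and it must match the register size taken from the register id.
 */
static int reg_from_user_checked(void *val, size_t val_size,
				 const void *uaddr, size_t regsize)
{
	if (regsize != 4 && regsize != 8)
		return -EINVAL;
	if (val_size != regsize)
		return -EINVAL;	/* inconsistent size/type pair */
	if (copy_in(val, uaddr, regsize) != 0)
		return -EFAULT;
	return 0;
}

/*
 * Caller in the style suggested for set_invariant_cp15(): two correctly
 * sized temporaries, widened by assignment rather than by a raw byte copy.
 */
static int read_invariant(uint64_t *out, const void *uaddr, size_t regsize)
{
	uint32_t v32;
	uint64_t v64;
	int err;

	if (regsize == 4) {
		err = reg_from_user_checked(&v32, sizeof(v32), uaddr, regsize);
		if (!err)
			*out = v32;
	} else {
		err = reg_from_user_checked(&v64, sizeof(v64), uaddr, regsize);
		if (!err)
			*out = v64;
	}
	return err;
}
```

Because the widening from 32 to 64 bits happens through an integer assignment, the result is the same on LE and BE hosts, and a mismatched pointer/size pair is rejected instead of silently corrupting memory.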
+/* Note it may really copy two u32 registers */
+static int reg_from_user32(u32 *val, const void __user *uaddr, u64 id)
 {
-	/* This Just Works because we are little endian. */
 	if (copy_from_user(val, uaddr, KVM_REG_SIZE(id)) != 0)
 		return -EFAULT;
 	return 0;
 }
-static int reg_to_user(void __user *uaddr, const void *val, u64 id)
+static int reg_to_user64(void __user *uaddr, const u64 *val, u64 id)
+{
+	unsigned long regsize = KVM_REG_SIZE(id);
+	union {
+		u32	word;
+		u64	dword;
+	} tmp;
+	switch (regsize) {
+	case 4:
+		tmp.word = *val;
+		break;
+	case 8:
+		tmp.dword = *val;
+		break;
+	}
+	if (copy_to_user(uaddr, &tmp, regsize) != 0)
+		return -EFAULT;
+	return 0;
+}
+/* Note it may really copy two u32 registers */
+static int reg_to_user32(void __user *uaddr, const u32 *val, u64 id)
 {
-	/* This Just Works because we are little endian. */
 	if (copy_to_user(uaddr, val, KVM_REG_SIZE(id)) != 0)
 		return -EFAULT;
 	return 0;
@@ -662,7 +706,7 @@ static int get_invariant_cp15(u64 id, void __user *uaddr)
 	if (!r)
 		return -ENOENT;
-	return reg_to_user(uaddr, &r->val, id);
+	return reg_to_user64(uaddr, &r->val, id);
 }
 static int set_invariant_cp15(u64 id, void __user *uaddr)
@@ -678,7 +722,7 @@ static int set_invariant_cp15(u64 id, void __user *uaddr)
 	if (!r)
 		return -ENOENT;
-	err = reg_from_user(&val, uaddr, id);
+	err = reg_from_user64(&val, uaddr, id);
 	if (err)
 		return err;
@@ -846,7 +890,7 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
 	if (vfpid < num_fp_regs()) {
 		if (KVM_REG_SIZE(id) != 8)
 			return -ENOENT;
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
+		return reg_to_user64(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
 				   id);
 	}
@@ -856,22 +900,22 @@ static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
 	switch (vfpid) {
 	case KVM_REG_ARM_VFP_FPEXC:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
 	case KVM_REG_ARM_VFP_FPSCR:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
 	case KVM_REG_ARM_VFP_FPINST:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst, id);
 	case KVM_REG_ARM_VFP_FPINST2:
-		return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
+		return reg_to_user32(uaddr, &vcpu->arch.vfp_guest.fpinst2, id);
 	case KVM_REG_ARM_VFP_MVFR0:
 		val = fmrx(MVFR0);
-		return reg_to_user(uaddr, &val, id);
+		return reg_to_user32(uaddr, &val, id);
 	case KVM_REG_ARM_VFP_MVFR1:
 		val = fmrx(MVFR1);
-		return reg_to_user(uaddr, &val, id);
+		return reg_to_user32(uaddr, &val, id);
 	case KVM_REG_ARM_VFP_FPSID:
 		val = fmrx(FPSID);
-		return reg_to_user(uaddr, &val, id);
+		return reg_to_user32(uaddr, &val, id);
 	default:
 		return -ENOENT;
 	}
@@ -890,8 +934,8 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
 	if (vfpid < num_fp_regs()) {
 		if (KVM_REG_SIZE(id) != 8)
 			return -ENOENT;
-		return reg_from_user(&vcpu->arch.vfp_guest.fpregs[vfpid],
-				     uaddr, id);
+		return reg_from_user64(&vcpu->arch.vfp_guest.fpregs[vfpid],
+				       uaddr, id);
 	}
 	/* FP control registers are all 32 bit. */
@@ -900,28 +944,28 @@ static int vfp_set_reg(struct kvm_vcpu *vcpu, u64 id, const void __user *uaddr)
 	switch (vfpid) {
 	case KVM_REG_ARM_VFP_FPEXC:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpexc, uaddr, id);
 	case KVM_REG_ARM_VFP_FPSCR:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpscr, uaddr, id);
 	case KVM_REG_ARM_VFP_FPINST:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpinst, uaddr, id);
 	case KVM_REG_ARM_VFP_FPINST2:
-		return reg_from_user(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
+		return reg_from_user32(&vcpu->arch.vfp_guest.fpinst2, uaddr, id);
 	/* These are invariant. */
 	case KVM_REG_ARM_VFP_MVFR0:
-		if (reg_from_user(&val, uaddr, id))
+		if (reg_from_user32(&val, uaddr, id))
 			return -EFAULT;
 		if (val != fmrx(MVFR0))
 			return -EINVAL;
 		return 0;
 	case KVM_REG_ARM_VFP_MVFR1:
-		if (reg_from_user(&val, uaddr, id))
+		if (reg_from_user32(&val, uaddr, id))
 			return -EFAULT;
 		if (val != fmrx(MVFR1))
 			return -EINVAL;
 		return 0;
 	case KVM_REG_ARM_VFP_FPSID:
-		if (reg_from_user(&val, uaddr, id))
+		if (reg_from_user32(&val, uaddr, id))
 			return -EFAULT;
 		if (val != fmrx(FPSID))
 			return -EINVAL;
@@ -968,7 +1012,7 @@ int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return get_invariant_cp15(reg->id, uaddr);
 	/* Note: copies two regs if size is 64 bit. */
-	return reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
+	return reg_to_user32(uaddr, &vcpu->arch.cp15[r->reg], reg->id);
 }
 int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
@@ -987,7 +1031,7 @@ int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg)
 		return set_invariant_cp15(reg->id, uaddr);
 	/* Note: copies two regs if size is 64 bit */
-	return reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
+	return reg_from_user32(&vcpu->arch.cp15[r->reg], uaddr, reg->id);
 }
 static unsigned int num_demux_regs(void)
--
1.8.1.4
Thanks,
KVM mmio in the BE case assumes that the data it receives is in BE format. The vgic operates in LE, so we need to byteswap the data in the BE case.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
---
 virt/kvm/arm/vgic.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 685fc72..7e11458 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -236,12 +236,12 @@ static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
 {
-	return *((u32 *)mmio->data) & mask;
+	return le32_to_cpu(*((u32 *)mmio->data)) & mask;
 }
 static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
 {
-	*((u32 *)mmio->data) = value & mask;
+	*((u32 *)mmio->data) = cpu_to_le32(value) & mask;
 }
 /**
On Fri, Dec 20, 2013 at 08:48:44AM -0800, Victor Kamensky wrote:
KVM mmio in the BE case assumes that the data it receives is in BE format. The vgic operates in LE, so we need to byteswap the data in the BE case.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
virt/kvm/arm/vgic.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/virt/kvm/arm/vgic.c b/virt/kvm/arm/vgic.c
index 685fc72..7e11458 100644
--- a/virt/kvm/arm/vgic.c
+++ b/virt/kvm/arm/vgic.c
@@ -236,12 +236,12 @@ static void vgic_cpu_irq_clear(struct kvm_vcpu *vcpu, int irq)
 static u32 mmio_data_read(struct kvm_exit_mmio *mmio, u32 mask)
 {
-	return *((u32 *)mmio->data) & mask;
+	return le32_to_cpu(*((u32 *)mmio->data)) & mask;
 }
 static void mmio_data_write(struct kvm_exit_mmio *mmio, u32 mask, u32 value)
 {
-	*((u32 *)mmio->data) = value & mask;
+	*((u32 *)mmio->data) = cpu_to_le32(value) & mask;
 }
 /**
--
1.8.1.4
The VGIC code is complicated enough without adding endianness logic in its depths. I would strongly prefer that the VGIC emulation is an endianness-agnostic software model of a device. In fact, a better fix for this whole situation would probably be to let the vgic_handle_mmio() function take a typed union (or a u64) instead of the byte array and deal with any endianness conversion outside of the vgic itself.
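One way to read the suggestion above is to do the byte-order conversion once, at the boundary, and hand the device model plain integers. The following is a minimal userspace sketch, assuming the MMIO buffer holds its bytes in little-endian order (as the le32_to_cpu() in the patch implies). `mmio_load_le32`, `mmio_store_le32` and `toy_reg_write` are hypothetical names, not existing kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Read a 32-bit value from a buffer that holds little-endian bytes. */
static uint32_t mmio_load_le32(const uint8_t *buf)
{
	return (uint32_t)buf[0] |
	       ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) |
	       ((uint32_t)buf[3] << 24);
}

/* Write a 32-bit value to a buffer as little-endian bytes. */
static void mmio_store_le32(uint8_t *buf, uint32_t val)
{
	buf[0] = (uint8_t)val;
	buf[1] = (uint8_t)(val >> 8);
	buf[2] = (uint8_t)(val >> 16);
	buf[3] = (uint8_t)(val >> 24);
}

/*
 * Toy device model: works on plain integers only, with no endianness
 * logic inside. The mask is applied to CPU-order values.
 */
static uint32_t toy_reg_write(uint32_t old, uint32_t value, uint32_t mask)
{
	return (old & ~mask) | (value & mask);
}
```

Because the helpers assemble the value byte by byte, they give the same result on LE and BE hosts, and any mask is applied to the CPU-order value before it is serialized back into the buffer.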
-Christoffer
If the status register E bit is not set (LE mode) and the host runs in BE mode, we need to byteswap the data so that the read/write is emulated correctly.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
---
 arch/arm/include/asm/kvm_emulate.h | 22 ++++++++++++++++++----
 1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0fa90c9..69b7469 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -185,9 +185,16 @@ static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
 		default:
 			return be32_to_cpu(data);
 		}
+	} else {
+		switch (len) {
+		case 1:
+			return data & 0xff;
+		case 2:
+			return le16_to_cpu(data & 0xffff);
+		default:
+			return le32_to_cpu(data);
+		}
 	}
-
-	return data;		/* Leave LE untouched */
 }
 static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
@@ -203,9 +210,16 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 		default:
 			return cpu_to_be32(data);
 		}
+	} else {
+		switch (len) {
+		case 1:
+			return data & 0xff;
+		case 2:
+			return cpu_to_le16(data & 0xffff);
+		default:
+			return cpu_to_le32(data);
+		}
 	}
-
-	return data;		/* Leave LE untouched */
 }
 #endif /* __ARM_KVM_EMULATE_H__ */
On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky victor.kamensky@linaro.org wrote:
If the status register E bit is not set (LE mode) and the host runs in BE mode, we need to byteswap the data so that the read/write is emulated correctly.
I don't think this is correct.
The only reason we byteswap the value in the BE guest case is because it has byteswapped the data in the first place.
With a LE guest, the value we get in the register is the right one, no need for further processing. I think your additional byteswap only hides bugs somewhere else in the stack.
M.
Signed-off-by: Victor Kamensky victor.kamensky@linaro.org
arch/arm/include/asm/kvm_emulate.h | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-)
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 0fa90c9..69b7469 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -185,9 +185,16 @@ static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
 		default:
 			return be32_to_cpu(data);
 		}
+	} else {
+		switch (len) {
+		case 1:
+			return data & 0xff;
+		case 2:
+			return le16_to_cpu(data & 0xffff);
+		default:
+			return le32_to_cpu(data);
+		}
 	}
-
-	return data;		/* Leave LE untouched */
 }
 static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
@@ -203,9 +210,16 @@ static inline unsigned long vcpu_data_host_to_guest(struct kvm_vcpu *vcpu,
 		default:
 			return cpu_to_be32(data);
 		}
+	} else {
+		switch (len) {
+		case 1:
+			return data & 0xff;
+		case 2:
+			return cpu_to_le16(data & 0xffff);
+		default:
+			return cpu_to_le32(data);
+		}
 	}
-
-	return data;		/* Leave LE untouched */
 }
 #endif /* __ARM_KVM_EMULATE_H__ */
Hi Marc,
Thank you for looking into this.
On 6 January 2014 04:37, Marc Zyngier marc.zyngier@arm.com wrote:
On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky victor.kamensky@linaro.org wrote:
If the status register E bit is not set (LE mode) and the host runs in BE mode, we need to byteswap the data so that the read/write is emulated correctly.
I don't think this is correct.
The only reason we byteswap the value in the BE guest case is because it has byteswapped the data in the first place.
With a LE guest, the value we get in the register is the right one, no need for further processing. I think your additional byteswap only hides bugs somewhere else in the stack.
First, do we agree that this patch has an effect only in the BE host case (CONFIG_CPU_BIG_ENDIAN=y), because in the LE host case the cpu_to_leXX functions do nothing but a simple copy, the same as we had before?
In the BE host case, we have the emulator (qemu, kvm-tool), the host kernel, and the hypervisor part of the code all operating in BE mode, and the guest can be either LE or BE (i.e. E bit not set or set). That is the opposite of the LE host case, where the emulator (qemu, kvm-tool), the host kernel, and the hypervisor part of the code all operate in LE mode. Your changes introduced a byteswap when the host is LE and the access happens with the E bit set. I don't see why the symmetry should break for the case when the host is BE and the access happens with the E bit cleared.
In other words, regardless of the E bit setting of the guest access operation, the rest of the stack should bring/see the same value before/after the vcpu_data_host_to_guest/vcpu_data_guest_to_host functions are applied, i.e. the rest of the stack should be agnostic to the E bit setting of the access operation. Do we agree on that? Now, depending on the E bit setting of the guest access operation, the result should differ in its endianness, so in one of the two cases a byteswap must happen. But it will not happen in the case of a BE host and an LE access unless my diff is applied. The previously added byteswap code for the E-bit-set case has no effect, because on a BE host the cpu_to_beXX functions do nothing but copy, and the other branch of the if statement again just copies the data. So, on a BE host, the resulting value of a guest access is the same regardless of the E bit setting, and it cannot be that way. Note that with only your changes, on an LE host a byteswap happens if the E bit is set and no byteswap happens if the E bit is clear, so the resulting value of a guest access does depend on the E setting.
Also please note that the vcpu_data_host_to_guest/vcpu_data_guest_to_host functions effectively transfer data between the host kernel and the in-memory copy of the saved guest CPU registers. Those saved registers will be put back into CPU registers, or saved from CPU registers to memory, by the hypervisor part of the code. In the BE host case this hypervisor part of the code operates in BE mode as well, so the register set shared between the host and the hypervisor part of the code holds guest register values in memory in BE order. The vcpu_data_host_to_guest/vcpu_data_guest_to_host functions do not interact with CPU registers directly. I am not sure, but maybe this point was missed.
Thanks, Victor
Hi Victor,
On Mon, Jan 06 2014 at 05:44:48 PM, Victor Kamensky victor.kamensky@linaro.org wrote:
Hi Marc,
Thank you for looking into this.
On 6 January 2014 04:37, Marc Zyngier marc.zyngier@arm.com wrote:
On Fri, Dec 20 2013 at 04:48:45 PM, Victor Kamensky victor.kamensky@linaro.org wrote:
If the status register E bit is not set (LE mode) and the host runs in BE mode, we need to byteswap the data so that the read/write is emulated correctly.
I don't think this is correct.
The only reason we byteswap the value in the BE guest case is because it has byteswapped the data in the first place.
With a LE guest, the value we get in the register is the right one, no need for further processing. I think your additional byteswap only hides bugs somewhere else in the stack.
First, do we agree that this patch has an effect only in the BE host case (CONFIG_CPU_BIG_ENDIAN=y), because in the LE host case the cpu_to_leXX functions do nothing but a simple copy, the same as we had before?
Sure, but that is not the point.
In the BE host case, we have the emulator (qemu, kvm-tool), the host kernel, and the hypervisor part of the code all operating in BE mode, and the guest can be either LE or BE (i.e. E bit not set or set). That is the opposite of the LE host case, where the emulator (qemu, kvm-tool), the host kernel, and the hypervisor part of the code all operate in LE mode. Your changes introduced a byteswap when the host is LE and the access happens with the E bit set. I don't see why the symmetry should break for the case when the host is BE and the access happens with the E bit cleared.
It is certainly not about symmetry. An IO access is LE, always. Again, the only reason we byteswap a BE guest is because it tries to write to a LE device, and thus byteswapping the data before it hits the bus.
When we trap this access, we need to correct that byteswap. And that is the only case we should handle. A LE guest writes a LE value to a LE device, and nothing is to be byteswapped.
As for the value you read on the host, it will be exactly the value the guest has written (registers don't have any endianness).
In other words, regardless of the E bit setting of the guest access operation, the rest of the stack should bring/see the same value before/after the vcpu_data_host_to_guest/vcpu_data_guest_to_host functions are applied, i.e. the rest of the stack should be agnostic to the E bit setting of the access operation. Do we agree on that? Now, depending on the E bit setting of the guest access operation, the result should differ in its endianness, so in one of the two cases a byteswap must happen. But it will not happen in the case of a BE host and an LE access unless my diff is applied. The previously added byteswap code for the E-bit-set case has no effect, because on a BE host the cpu_to_beXX functions do nothing but copy, and the other branch of the if statement again just copies the data. So, on a BE host, the resulting value of a guest access is the same regardless of the E bit setting, and it cannot be that way. Note that with only your changes, on an LE host a byteswap happens if the E bit is set and no byteswap happens if the E bit is clear, so the resulting value of a guest access does depend on the E setting.
Also please note that the vcpu_data_host_to_guest/vcpu_data_guest_to_host functions effectively transfer data between the host kernel and the in-memory copy of the saved guest CPU registers. Those saved registers will be put back into CPU registers, or saved from CPU registers to memory, by the hypervisor part of the code. In the BE host case this hypervisor part of the code operates in BE mode as well, so the register set shared between the host and the hypervisor part of the code holds guest register values in memory in BE order. The vcpu_data_host_to_guest/vcpu_data_guest_to_host functions do not interact with CPU registers directly. I am not sure, but maybe this point was missed.
It wasn't missed. No matter how data is stored in memory (BE, LE, or even PDP endianness), CPU registers always have a consistent representation. They are immune to CPU endianness change, and storing to/reading from memory won't change the value, as long as you use the same endianness for writing/reading.
What you seem to be missing is that the emulated devices must be LE. There is no such thing as a BE GIC. So for this to work properly, you will need to fix the VGIC code (distributor emulation only) to be host-endianness agnostic, and behave like a LE device, even on a BE system. And all your other device emulations.
M.
On 6 January 2014 10:20, Marc Zyngier marc.zyngier@arm.com wrote:
What you seem to be missing is that the emulated devices must be LE.
It does not matter whether the emulated devices are LE or BE. It is about how the E bit should be *emulated* during the access.
For example, consider this situation:
1) BE host case
2a) In some place a BE guest (E bit set) accesses an LE device like this:
    ldr r3, [r0, #0]
    rev r3, r3
2b) In the same place a BE guest (E bit initially set) accesses the LE device like this (which is really equivalent to 2a):
    setend le
    ldr r3, [r0, #0]
    setend be
3) Everything else is completely the same
Regardless of how the memory device at the r0 address is emulated in the rest of the stack, if my patch is not applied, then in the BE host case, after 'ldr r3, [r0, #0]' is trapped and emulated, r3 would contain the same value in both the 2a) and 2b) cases! That is clearly wrong, because in one case the memory was read with the E bit set and in the other with the E bit cleared. Such reads should give byteswapped values for the same memory location (device or not). In the 2a) case, after 'rev r3, r3' executes, the r3 value will be byteswapped compared to the 2b) case, which differs from what the same 2a) and 2b) code would produce when executed in the non-emulated case with a real LE device.
If you are suggesting that the current E value of the guest access should be propagated down to the rest of the stack, I disagree: it is too invasive. I believe the rest of the stack should emulate the access to the r0 memory in the same way regardless of the current E bit value of the guest access.
Note that I can construct a similar example for an LE host, where at some place an LE guest (E bit initially cleared) has one of the following (where the r0 address is emulated):
4a) ldr r3, [r0, #0]
4b) setend be
    ldr r3, [r0, #0]
    setend le
    rev r3, r3
The rest of the stack would emulate the access to the r0 address in the same way (regardless of the current E bit value), and in the 4b) case the value would be byteswapped by the code that you added (the E bit is set and the host is LE) and then byteswapped again by the rev instruction. As a result, the r3 value will be the same for both the 4a) and 4b) cases, the same result one would get with a real non-emulated device.
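The 2a)/2b) argument above can be checked with a small userspace model. This is a sketch under stated assumptions, not kernel code: host endianness is passed as a parameter so both hosts can be modelled on one machine, only the len == 4 case is covered, and every function name here is hypothetical. It assumes, as the argument does, that the rest of the stack supplies the same `data` for both sequences.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* cpu_to_be32()/cpu_to_le32() modelled with host endianness as an input. */
static uint32_t model_cpu_to_be32(uint32_t x, bool host_be)
{
	return host_be ? x : swab32(x);
}

static uint32_t model_cpu_to_le32(uint32_t x, bool host_be)
{
	return host_be ? swab32(x) : x;
}

/* Behaviour before the patch: convert only when the E bit is set. */
static uint32_t host_to_guest_orig(uint32_t data, bool e_bit, bool host_be)
{
	return e_bit ? model_cpu_to_be32(data, host_be) : data;
}

/* Behaviour with the patch: also convert when the E bit is clear. */
static uint32_t host_to_guest_patched(uint32_t data, bool e_bit, bool host_be)
{
	return e_bit ? model_cpu_to_be32(data, host_be)
		     : model_cpu_to_le32(data, host_be);
}

typedef uint32_t (*h2g_fn)(uint32_t data, bool e_bit, bool host_be);

/* 2a) ldr with E set, then rev. */
static uint32_t seq_2a(h2g_fn h2g, uint32_t data, bool host_be)
{
	return swab32(h2g(data, true, host_be));
}

/* 2b) setend le; ldr; setend be. */
static uint32_t seq_2b(h2g_fn h2g, uint32_t data, bool host_be)
{
	return h2g(data, false, host_be);
}
```

In this model the two sequences agree on an LE host with either variant, disagree on a BE host with the original code, and agree again on a BE host with the patch. It only checks that 2a) and 2b) are consistent with each other; it does not settle which byte order the emulated device should hand to these functions, which is the point under discussion.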
Thanks, Victor
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
No matter how data is stored in memory (BE, LE, or even PDP endianness), CPU registers always have a consistent representation. They are immune to CPU endianness change, and storing to/reading from memory won't change the value, as long as you use the same endianness for writing/reading.
Ah, endianness. This always confuses me, but I hope the following is correct... (in all the following when I say BE I mean BE8, not BE32, since BE32 and virtualization never occur in the same CPU).
Certainly registers don't have endianness, but the entire point of the CPSR.E bit is exactly that it changes the value as it is stored to / read from memory, isn't it? -- that's where and when the byte-lane flipping happens.
Where this impacts the hypervisor is that instead of actually sending the data out to the bus via the byte-swapping h/w, we've trapped instead. The hypervisor reads the original data directly from the guest CPU registers, and so it's the hypervisor and userspace support code that between them have to emulate the equivalent of the byte lane swapping h/w. You could argue that it shouldn't be the kernel's job, but since the kernel has to do it for the devices it emulates internally, I'm not sure that makes much sense.
What you seem to be missing is that the emulated devices must be LE. There is no such thing as a BE GIC.
Right, so a BE guest would be internally flipping the 32 bit value it wants to write so that when it goes through the CPU's byte-lane swap (because CPSR.E is set) it appears to the GIC with the correct bit at the bottom, yes?
(At least I think that's what the GIC being LE means; I don't think it's like the devices on the Private Peripheral Bus on the M-profile cores which are genuinely on the CPU's side of the byte-lane swapping h/w and thus always LE regardless of the state of the endianness bit. Am I wrong there?)
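The byte-lane picture described above can be sketched in a few lines of userspace C. This is only a model under the BE8 assumption stated earlier: `bus_store32` is a hypothetical function that lays out the bytes of a register on the bus according to the E bit, and `device_sees_same_bytes` is a made-up helper for the comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * Model of a 32-bit store: with the E bit clear the least significant
 * byte goes to the lowest address, with the E bit set the most
 * significant byte does.
 */
static void bus_store32(uint8_t *mem, uint32_t reg, bool e_bit)
{
	for (int i = 0; i < 4; i++) {
		int lane = e_bit ? 3 - i : i;
		mem[lane] = (uint8_t)(reg >> (8 * i));
	}
}

/*
 * An LE guest stores the value directly; a BE guest byte-reverses it
 * first and then stores with the E bit set. Compare what the device sees.
 */
static bool device_sees_same_bytes(uint32_t value)
{
	uint8_t le_guest[4], be_guest[4];

	bus_store32(le_guest, value, false);
	bus_store32(be_guest, swab32(value), true);
	return memcmp(le_guest, be_guest, sizeof(le_guest)) == 0;
}
```

In this model the software byte reversal in the BE guest and the byte-lane swap of the store cancel out, so the LE device receives the same bytes in both cases.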
It's not necessary that *all* emulated devices must be LE, of course -- you could have a QEMU which supported a board with a bunch of BE devices on it.
thanks -- PMM
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
No matter how data is stored in memory (BE, LE, or even PDP endianness), CPU registers always have a consistent representation. They are immune to CPU endianness change, and storing to/reading from memory won't change the value, as long as you use the same endianness for writing/reading.
Ah, endianness. This always confuses me, but I hope the following is correct... (in all the following when I say BE I mean BE8, not BE32, since BE32 and virtualization never occur in the same CPU).
Certainly registers don't have endianness, but the entire point of the CPSR.E bit is exactly that it changes the value as it is stored to / read from memory, isn't it? -- that's where and when the byte-lane flipping happens.
Where this impacts the hypervisor is that instead of actually sending the data out to the bus via the byte-swapping h/w, we've trapped instead. The hypervisor reads the original data directly from the guest CPU registers, and so it's the hypervisor and userspace support code that between them have to emulate the equivalent of the byte lane swapping h/w. You could argue that it shouldn't be the kernel's job, but since the kernel has to do it for the devices it emulates internally, I'm not sure that makes much sense.
As far as I understand, this is exactly what vcpu_data_guest_to_host and vcpu_data_host_to_guest do; emulate the byte lane swapping.
The problem is that it only works on a little-endian host with the current code, because be16_to_cpu (for example) actually performs a byteswap, which is what needs to be emulated. On a big-endian host, we do nothing, so we end up giving a byteswapped value to the emulated device.
I think a cleaner fix than this patch is to just change the be16_to_cpu() to a __swab16() instead, which clearly indicates that 'here is the byte lane swapping'.
But admittedly this hurts my brain, so I'm not 100% sure I got this last part right.
-Christoffer
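The unconditional-swap variant suggested above could look roughly like this userspace sketch. It is an illustration only: `guest_to_host_swab` is a hypothetical name, the local `swab16`/`swab32` helpers stand in for the kernel's __swab16()/__swab32(), and the data is narrowed to 32 bits for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static uint16_t swab16(uint16_t x)
{
	return (uint16_t)((x << 8) | (x >> 8));
}

static uint32_t swab32(uint32_t x)
{
	return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | (x << 24);
}

/*
 * When the guest access was made with the E bit set, always swap,
 * independent of host endianness. Otherwise leave the data untouched.
 */
static uint32_t guest_to_host_swab(uint32_t data, unsigned int len, bool e_bit)
{
	if (!e_bit)
		return data;

	switch (len) {
	case 1:
		return data & 0xff;
	case 2:
		return swab16((uint16_t)(data & 0xffff));
	default:
		return swab32(data);
	}
}
```

Unlike be16_to_cpu()/be32_to_cpu(), these swaps do not turn into no-ops on a big-endian host, so the byte-lane swap is emulated the same way on both kinds of host.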
On 6 January 2014 14:56, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
No matter how data is stored in memory (BE, LE, or even PDP endianness), CPU registers always have a consistent representation. They are immune to CPU endianness change, and storing to/reading from memory won't change the value, as long as you use the same endianness for writing/reading.
Ah, endianness. This always confuses me, but I hope the following is correct... (in all the following when I say BE I mean BE8, not BE32, since BE32 and virtualization never occur in the same CPU).
Certainly registers don't have endianness, but the entire point of the CPSR.E bit is exactly that it changes the value as it is stored to / read from memory, isn't it? -- that's where and when the byte-lane flipping happens.
Where this impacts the hypervisor is that instead of actually sending the data out to the bus via the byte-swapping h/w, we've trapped instead. The hypervisor reads the original data directly from the guest CPU registers, and so it's the hypervisor and userspace support code that between them have to emulate the equivalent of the byte lane swapping h/w. You could argue that it shouldn't be the kernel's job, but since the kernel has to do it for the devices it emulates internally, I'm not sure that makes much sense.
As far as I understand, this is exactly what vcpu_data_guest_to_host and vcpu_data_host_to_guest do; emulate the byte lane swapping.
The problem is that it only works on a little-endian host with the current code, because be16_to_cpu (for example) actually performs a byteswap, which is what needs to be emulated. On a big-endian host, we do nothing, so we end up giving a byteswapped value to the emulated device.
Yes, that was my point on the thread: for any given host endianness, the vcpu_data_guest_to_host and vcpu_data_host_to_guest functions should give opposite-endian results depending on the value of the CPSR E bit. Currently that does not happen in the BE host case. It seems that Peter and you agree with that, and in another email I gave an example with a dynamically switched E bit that illustrates this problem for a BE host.
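The property stated above (for a fixed host, the E=0 and E=1 results are byteswapped relative to each other) can be sketched as follows. This is one possible way to obtain that property, not the code from the patch, and whether it is the right behavior is exactly what this thread is debating. The function name `guest_to_host_sketch` and its `host_is_be` parameter are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Unconditional 32-bit byte swap (stand-in for a kernel swab helper). */
uint32_t swab32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/*
 * Hypothetical model: swap exactly when the guest's current data
 * endianness (CPSR.E) differs from the host's endianness.
 */
uint32_t guest_to_host_sketch(uint32_t reg, bool guest_e_bit, bool host_is_be)
{
    return (guest_e_bit != host_is_be) ? swab32_sketch(reg) : reg;
}

/* Usage: on either host, flipping the E bit flips the byte order. */
void e_bit_property_demo(void)
{
    const uint32_t reg = 0x11223344u;

    for (int host_is_be = 0; host_is_be <= 1; host_is_be++) {
        uint32_t e_clear = guest_to_host_sketch(reg, false, host_is_be);
        uint32_t e_set = guest_to_host_sketch(reg, true, host_is_be);

        assert(e_set == swab32_sketch(e_clear));
    }
}
```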
I think a cleaner fix than this patch is to just change the be16_to_cpu() to a __swab16() instead, which clearly indicates that 'here is the byte lane swapping'.
Yes, that may work, but it is a somewhat orthogonal issue. And I don't think it is better. For this to work, one needs to change the canonical endianness on one of the sides around the vcpu_data_guest_to_host and vcpu_data_host_to_guest functions.
Changing it on the side that faces the hypervisor (the code that handles the guest's spilled CPU register set) does not make sense at all: if we keep the guest CPU register set in memory in LE form while the hypervisor runs in BE (BE host), the code that spills registers would need to byteswap constantly. Any access by the host kernel and hypervisor (all running in BE) to the guest's saved registers would need byteswaps as well.
Changing the canonical form of the data on the side that faces the emulator and the mmio part of kvm_run does not make sense either. The kvm_run mmio.data field is a byte array; when it comes to the host kernel from the emulator, it already contains device memory in the byte order that corresponds to the endianness of the emulated device. For example, for a word read access to an LE device, after the call is emulated, mmio.data[0], mmio.data[1], mmio.data[2] and mmio.data[3] will hold the value in LE order (mmio.data[3] is the MSB).
Now look at the mmio_read_buf function introduced by Marc's commit 6d89d2d9: it byte-copies this mmio.data buffer into an integer according to the size of the ongoing mmio access. Note that in the BE host case this integer, held in the 'data' variable of the kvm_handle_mmio_return function, will have a byteswapped value. When it is then passed into the vcpu_data_host_to_guest function, which emulates a read access by a guest with the E bit set, and we follow your suggestion, it will be byteswapped. I.e. the 'data' integer will then contain the non-byteswapped value of the LE device. It will be further stored into some vcpu_reg register, still in native format (BE store), and further restored into the guest CPU register, still non-byteswapped (BE hypervisor). That is not what a BE client reading a word of an LE device expects: a BE client that knows it is reading an LE device with the E bit set will issue an additional rev instruction to get the device memory as an integer.
If we really want to follow your suggestion, one could introduce compensatory byteswaps in the mmio_read_buf and mmio_write_buf functions in the BE host case, rather than just doing memcpy ... but I am not sure what that would buy us: in the BE case it would swap the data twice.
Note that in the above description, by "canonical" I mean some form of the data that is independent of the CPSR E value of the current access. It may, however, differ depending on host endianness.
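The point about byte-copying mmio.data into an integer can be illustrated with a small userspace sketch. It is only an approximation written for this note, not the kernel's mmio_read_buf; the names `mmio_read_buf_sketch` and `host_is_big_endian` are hypothetical. The same four bytes from an LE device come out as different integers depending on the endianness of the host doing the memcpy.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime probe of host byte order (sketch helper, not a kernel API). */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Byte-copy a 1-, 2- or 4-byte MMIO buffer into a host-order integer. */
uint32_t mmio_read_buf_sketch(const uint8_t *buf, unsigned int len)
{
    uint16_t hword;
    uint32_t word;

    switch (len) {
    case 1:
        return buf[0];
    case 2:
        memcpy(&hword, buf, 2);
        return hword;
    case 4:
        memcpy(&word, buf, 4);
        return word;
    default:
        return 0; /* other sizes are out of scope for this sketch */
    }
}

/* Usage: an LE device register holding 0x12345678, data[3] is the MSB. */
void mmio_read_buf_demo(void)
{
    const uint8_t data[4] = { 0x78, 0x56, 0x34, 0x12 };
    uint32_t v = mmio_read_buf_sketch(data, 4);

    if (host_is_big_endian())
        assert(v == 0x78563412u); /* BE host sees a byteswapped integer */
    else
        assert(v == 0x12345678u); /* LE host sees the device value */
}
```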
Also, as far as working with the VGIC is concerned: PATCH 2/5 [1] of this series reads the real h/w vgic values from #GICH_HCR, #VGIC_CPU_VMCR, etc. and byteswaps them in the BE host case. So VGIC "integer" values are now present in the kernel in CPU-native format. When the mmio_data_read and mmio_data_write functions of vgic.c are called to fill the mmio.data array, the VGIC values are in native format but the mmio.data array should contain memory in the device's endianness (LE for the VGIC), so my PATCH 4/5 [2] of this series uses the cpu_to_le32 and le32_to_cpu functions to byteswap. I admit that the PATCH 4/5 comment is a bit obscure.
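As a rough illustration of the conversion described above, the following userspace sketch stores a CPU-native 32-bit value into a byte array in LE order. It is not the vgic.c code: `cpu_to_le32_sketch` is a hypothetical stand-in for the kernel's cpu_to_le32(), and `fill_mmio_data_le32` is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Runtime probe of host byte order (sketch helper, not a kernel API). */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    uint8_t first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Unconditional 32-bit byte swap. */
uint32_t swab32_sketch(uint32_t v)
{
    return ((v & 0x000000ffu) << 24) | ((v & 0x0000ff00u) << 8) |
           ((v & 0x00ff0000u) >> 8)  | ((v & 0xff000000u) >> 24);
}

/* Stand-in for cpu_to_le32(): swaps only on a big-endian host. */
uint32_t cpu_to_le32_sketch(uint32_t v)
{
    return host_is_big_endian() ? swab32_sketch(v) : v;
}

/* Place a native-format register value into the buffer in LE order. */
void fill_mmio_data_le32(uint8_t *data, uint32_t native_value)
{
    uint32_t le = cpu_to_le32_sketch(native_value);

    memcpy(data, &le, sizeof(le));
}

/* Usage: the resulting bytes are the same on LE and BE hosts. */
void fill_mmio_data_demo(void)
{
    uint8_t data[8] = { 0 };

    fill_mmio_data_le32(data, 0x12345678u);
    assert(data[0] == 0x78 && data[1] == 0x56);
    assert(data[2] == 0x34 && data[3] == 0x12);
}
```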
Thanks, Victor
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/221168.h...
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2013-December/221167.h...
kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
On 6 January 2014 14:56, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
I think a cleaner fix than this patch is to just change the be16_to_cpu() to a __swab16() instead, which clearly indicates that 'here is the byte lane swapping'.
Yes, that may work, but it is a somewhat orthogonal issue.
Why? I don't think it is, I think it's addressing exactly the point at hand.
And I don't think it is better. For this to work, one needs to change the canonical endianness on one of the sides around the vcpu_data_guest_to_host and vcpu_data_host_to_guest functions.
You have to simply clearly define which format you want mmio.data to be in. This is a user space interface across multiple architectures and therefore something you have to consider carefully and you're limited in choices to something that works with existing user space code.
There's a lot of text to digest here, talking about a canonical form here doesn't help; just define the layout of the destination byte array. I also got completely lost in what you're referring to when you talk about 'sides' here.
The thing we must decide is how the data is stored in kvm_exit_mmio.data. See Peter's recent thread "KVM and variable-endianness guest CPUs". Once we agree on this, the rest should be easy (assuming we use the same structure for the data in the kernel's internal kvm_exit_mmio declared on the stack in io_mem_abort()).
The format you suggest requires any consumer of this data to consider the host endianness, which I don't think makes anything more clear (see my comment on the vgic patch).
The in-kernel interface between the io_mem_abort() code and any in-kernel emulated device must do exactly the same as the interface between KVM and QEMU must do for KVM_EXIT_MMIO.
On 20 January 2014 17:19, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
On 6 January 2014 14:56, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
You have to simply clearly define which format you want mmio.data to be in.
I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not an integer type; it is a byte array. A byte array does not have endianness; it is endian-agnostic. Here is the snippet from linux/kvm.h:
	/* KVM_EXIT_MMIO */
	struct {
		__u64 phys_addr;
		__u8  data[8];
		__u32 len;
		__u8  is_write;
	} mmio;
It is very natural to treat it as just a piece of memory. I.e. when code reads an emulated LE device address as an integer, this array will contain the integer placed in memory in LE order, with data[3] as the MSB, as it would be located in regular memory. When code reads an emulated BE device address as an integer, this array will contain the integer placed in memory in BE order, with data[0] as the MSB.
You can think about it this way: an ARM system emulator runs on x86 (LE) and on PPC (BE). What should the mmio.data array for the same emulated device look like across these two cases? I believe it should be identical: just a stream of bytes.
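The "stream of bytes" view can be made concrete with a host-independent sketch. The helpers `put_le32` and `put_be32` are hypothetical names chosen for this illustration; because they use shifts rather than memcpy, they produce the same byte layout whether the emulator runs on an LE or a BE host.

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 32-bit value as an LE device would: data[3] is the MSB. */
void put_le32(uint8_t *data, uint32_t v)
{
    data[0] = (uint8_t)(v & 0xff);
    data[1] = (uint8_t)((v >> 8) & 0xff);
    data[2] = (uint8_t)((v >> 16) & 0xff);
    data[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Lay out a 32-bit value as a BE device would: data[0] is the MSB. */
void put_be32(uint8_t *data, uint32_t v)
{
    data[0] = (uint8_t)((v >> 24) & 0xff);
    data[1] = (uint8_t)((v >> 16) & 0xff);
    data[2] = (uint8_t)((v >> 8) & 0xff);
    data[3] = (uint8_t)(v & 0xff);
}

/* Usage: the layout depends only on the device, never on the host. */
void byte_stream_demo(void)
{
    uint8_t le_dev[8] = { 0 };
    uint8_t be_dev[8] = { 0 };

    put_le32(le_dev, 0x12345678u);
    put_be32(be_dev, 0x12345678u);

    assert(le_dev[3] == 0x12 && le_dev[0] == 0x78); /* LE: MSB last  */
    assert(be_dev[0] == 0x12 && be_dev[3] == 0x78); /* BE: MSB first */
}
```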
Emulator code handles this situation quite nicely. For example, look at the endianness field of the MemoryRegionOps structure in qemu. Depending on the value of that field and on the emulator's current endianness, the code will place results into the 'mmio.data' array in the right order. See [1] for an example in qemu where the endianness of certain ARM devices was not declared correctly: it was marked as DEVICE_NATIVE_ENDIAN whereas it should be DEVICE_LITTLE_ENDIAN. After I changed that, BE qemu pretty much started working. I strongly suspect that anyone running ARM system emulation on PPC (BE) would need the same changes.
Note that the issue with virtio endianness is a very different problem: there it is not clear, for a given arrangement of host/emulator, whether to treat virtio devices as LE or BE, and in what format the data in the ring descriptors is.
Thanks, Victor
[1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358...
On Tue, Jan 21, 2014 at 10:54 AM, Victor Kamensky victor.kamensky@linaro.org wrote:
On 20 January 2014 17:19, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
On 6 January 2014 14:56, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote:
IMHO, device endianness should be taken care of by device emulators only, because we can have a Machine Model containing both LE devices and BE devices. KVM ARM/ARM64 should only worry about the endianness of in-kernel emulated devices (e.g. VGIC). In general, QEMU or KVMTOOL should be responsible for device endianness, and for this QEMU or KVMTOOL should also know whether the Guest (or VM) is little-endian or big-endian.
Regards, Anup
On Tue, Jan 21, 2014 at 11:16:46AM +0530, Anup Patel wrote:
On Tue, Jan 21, 2014 at 10:54 AM, Victor Kamensky victor.kamensky@linaro.org wrote:
On 20 January 2014 17:19, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 05:59:03PM -0800, Victor Kamensky wrote:
On 6 January 2014 14:56, Christoffer Dall christoffer.dall@linaro.org wrote:
On Mon, Jan 06, 2014 at 10:31:42PM +0000, Peter Maydell wrote:
On 6 January 2014 18:20, Marc Zyngier marc.zyngier@arm.com wrote: > No matter how data is stored in memory (BE, LE, or > even PDP endianness), CPU registers always have a consistent > representation. They are immune to CPU endianness change, and storing > to/reading from memory won't change the value, as long as you use the > same endianness for writing/reading.
Ah, endianness. This always confuses me, but I hope the following is correct... (in all the following when I say BE I mean BE8, not BE32, since BE32 and virtualization never occur in the same CPU).
Certainly registers don't have endianness, but the entire point of the CPSR.E bit is exactly that it changes the value as it is stored to / read from memory, isn't it? -- that's where and when the byte-lane flipping happens.
Where this impacts the hypervisor is that instead of actually sending the data out to the bus via the byte-swapping h/w, we've trapped instead. The hypervisor reads the original data directly from the guest CPU registers, and so it's the hypervisor and userspace support code that between them have to emulate the equivalent of the byte lane swapping h/w. You could argue that it shouldn't be the kernel's job, but since the kernel has to do it for the devices it emulates internally, I'm not sure that makes much sense.
As far as I understand, this is exactly what vcpu_data_guest_to_host and vcpu_data_host_to_guest do; emulate the byte lane swapping.
The problem is that it only works on a little-endian host with the current code, because be16_to_cpu (for example), actually perform a byteswap, which is what needs to be emulated. On a big-endian host, we do nothing, so we end up giving a byteswapped value to the emulated device.
Yes, that was my point on the thread: vcpu_data_guest_to_host and vcpu_data_host_to_guest functions for any given host endianity should give opposite endian results depending on CPSR E bit value. And currently it is not happening in BE host case. It seems that Peter and you agree with that and I gave example in another email with dynamically switching E bit illustrating this problem for BE host.
I think a cleaner fix than this patch is to just change the be16_to_cpu() to a __swab16() instead, which clearly indicates that 'here is the byte lane swapping'.
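A minimal sketch of the difference being discussed, in plain C outside the kernel. The names swab16_model, be16_to_cpu_model and the two guest_to_host_* helpers are hypothetical stand-ins, not the kernel's actual vcpu_data_guest_to_host code; host endianness is passed in as a parameter so that both host cases can be modelled on one machine.

```c
#include <stdbool.h>
#include <stdint.h>

/* Unconditional 16-bit byte swap; models what __swab16() does. */
static uint16_t swab16_model(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

/*
 * Models be16_to_cpu(): a byte swap on a little-endian host,
 * a no-op on a big-endian host.
 */
static uint16_t be16_to_cpu_model(uint16_t v, bool host_is_be)
{
	return host_is_be ? v : swab16_model(v);
}

/* E-bit handling built on be16_to_cpu(): the swap vanishes on a BE host. */
static uint16_t guest_to_host_be16(uint16_t reg, bool e_bit, bool host_is_be)
{
	return e_bit ? be16_to_cpu_model(reg, host_is_be) : reg;
}

/* E-bit handling built on an explicit swap: same result on any host. */
static uint16_t guest_to_host_swab(uint16_t reg, bool e_bit)
{
	return e_bit ? swab16_model(reg) : reg;
}
```

With the E bit set and a register value of 0x1234, the be16_to_cpu-based helper yields 0x3412 on an LE host but leaves 0x1234 untouched on a BE host, while the swab-based helper yields 0x3412 in both cases.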
Yes, that may work, but it is a bit orthogonal issue.
Why? I don't think it is, I think it's addressing exactly the point at hand.
And I don't think it is better. For this to work, one needs to change the canonical endianness on one of the sides around the vcpu_data_guest_to_host and vcpu_data_host_to_guest functions.
You have to simply clearly define which format you want mmio.data to be in.
I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not an integer type; it is a byte array. A byte array does not have endianness. It is endian agnostic. Here is a snippet from linux/kvm.h:
    /* KVM_EXIT_MMIO */
    struct {
        __u64 phys_addr;
        __u8  data[8];
        __u32 len;
        __u8  is_write;
    } mmio;
It is very natural to treat it as just a piece of memory. I.e., when code reads an emulated LE device address as an integer, this array will contain the integer placed in memory in LE order (data[3] is the MSB), as it would be located in regular memory. When code reads an emulated BE device address as an integer, this array will contain the integer placed in memory in BE order (data[0] is the MSB).
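A minimal sketch of the two layouts described above for a 32-bit access, assuming the buffer stands in for the first four bytes of mmio.data. The helper names store_le32 and store_be32 are hypothetical and are not taken from the kernel or from qemu.

```c
#include <stdint.h>

/* Place a 32-bit value into a byte array in little-endian order. */
static void store_le32(uint8_t *data, uint32_t v)
{
	data[0] = (uint8_t)(v);
	data[1] = (uint8_t)(v >> 8);
	data[2] = (uint8_t)(v >> 16);
	data[3] = (uint8_t)(v >> 24);	/* MSB ends up in data[3] */
}

/* Place a 32-bit value into a byte array in big-endian order. */
static void store_be32(uint8_t *data, uint32_t v)
{
	data[0] = (uint8_t)(v >> 24);	/* MSB ends up in data[0] */
	data[1] = (uint8_t)(v >> 16);
	data[2] = (uint8_t)(v >> 8);
	data[3] = (uint8_t)(v);
}
```

Neither helper depends on the endianness of the machine it runs on, which is the property being argued for: the same device value produces the same byte stream on any host.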
You can think about it this way: an ARM system emulator runs on x86 (LE) and on PPC (BE). What should the mmio.data array for the same emulated device look like across these two cases? I believe it should be identical: just a stream of bytes.
Emulator code handles this situation quite nicely. For example, check the endianness field of the MemoryRegionOps structure in qemu. Depending on the field value and the current emulator endianness, the code will place results into the 'mmio.data' array in the right order. See [1] for an example in qemu where the endianness of certain ARM devices was not declared correctly: it was marked as DEVICE_NATIVE_ENDIAN whereas it should be DEVICE_LITTLE_ENDIAN. After I changed that, BE qemu pretty much started working. I strongly suspect that anyone running ARM system emulation on PPC (BE) would need the same changes.
Note that the issue with virtio endianness is a very different problem: there it is not clear, for a given arrangement of host/emulator, whether to treat virtio devices as LE or BE, and in what format the data in the ring descriptors is.
IMHO, device endianness should be taken care of by device emulators only, because we can have a machine model containing both LE devices and BE devices. KVM ARM/ARM64 should only worry about the endianness of in-kernel emulated devices (e.g. VGIC). In general, QEMU or KVMTOOL should be responsible for device endianness, and for this QEMU or KVMTOOL should also know whether the guest (or VM) is little-endian or big-endian.
Specifying the interface to say that this is a store of the register value directly using the endianness of the host kernel is an option. However, user space must fetch the CPSR on each MMIO from the kernel and look at the E-bit to understand how it should interpret the data, which may add overhead, and it doesn't change the fact that this needs to be specified in the API.
The E bit on ARM specifies that the CPU will swap the bytes before putting the register value on the memory bus. That's all it does.
Something has to emulate this, and given that KVM emulates the CPU, I think KVM should emulate the E-bit.
From my point of view, the mmio.data API is the signal you would receive if you were any consumer of the memory operation external to the CPU, which would be in the form of a bunch of wires and a length, with no endianness.
But, the thread I pointed Victor to is focused purely on this discussion, so you should probably respond there.
-Christoffer
Thanks, Victor
[1] https://git.linaro.org/people/victor.kamensky/qemu-be.git/commitdiff/8599358...
This is a user space interface across multiple architectures and therefore something you have to consider carefully and you're limited in choices to something that works with existing user space code.
Changing it on the side that faces the hypervisor (the code that handles the guest's spilled CPU register set) does not make sense at all: if we keep the guest CPU register set in memory in LE form while the hypervisor runs in BE (BE host), the code that spills registers would need to do constant byteswaps. Any access by the host kernel and hypervisor (all running in BE) would also need to byteswap while working with the guest's saved registers.
Changing the canonical form of data on the side that faces the emulator and the mmio part of kvm_run does not make sense either. The kvm_run mmio.data field is a byte array; when it comes to the host kernel from the emulator, it already contains device memory in the correct endian order, corresponding to the endianness of the emulated device. For example, for a word read access to an LE device, after the call is emulated, mmio.data will contain the mmio.data[0], mmio.data[1], mmio.data[2], mmio.data[3] values in LE order (mmio.data[3] is the MSB). Now look at the mmio_read_buf function introduced by Marc's 6d89d2d9 commit: this function byte-copies the mmio.data buffer into an integer according to the size of the ongoing mmio access. Note that in the BE host case this integer, the 'data' variable of the kvm_handle_mmio_return function, will hold a byteswapped value. When it is then passed into the vcpu_data_host_to_guest function, which emulates a read access by a guest with the E bit set, it will be byteswapped if we follow your suggestion. I.e., the 'data' integer will contain the non-byteswapped value of the LE device. It will be further stored into some vcpu_reg register, still in native format (BE store), and further restored into the guest CPU register, still non-byteswapped (BE hypervisor). And that is not what a BE client reading a word of an LE device expects: a BE client that knows it is reading an LE device with the E bit set will issue an additional rev instruction to get the device memory as an integer. If we really want to follow your suggestion, one may introduce compensatory byteswaps in the mmio_read_buf and mmio_write_buf functions in the BE host case, rather than just doing memcpy ... but I am not sure what that would buy us: in the BE case it would swap the data twice.
Note that in the above description, by "canonical" I mean some form of data that is independent of the CPSR E value of the current access. But it may differ depending on host endianness.
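A minimal sketch of why a memcpy-style read of the byte array, as mmio_read_buf is described above, yields an integer whose value depends on the host. All function names here are hypothetical; the real mmio_read_buf is not reproduced, only the byte-copy idea.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Returns true when the machine running this code is big-endian. */
static bool host_is_big_endian(void)
{
	const uint16_t probe = 1;
	uint8_t first;

	memcpy(&first, &probe, 1);
	return first == 0;
}

/* Byte-copy a 4-byte buffer into an integer, as a memcpy-based read does. */
static uint32_t read_buf32_native(const uint8_t *buf)
{
	uint32_t v;

	memcpy(&v, buf, sizeof(v));
	return v;
}

/* Host-independent interpretation of the buffer as a little-endian word. */
static uint32_t load_le32(const uint8_t *buf)
{
	return (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) |
	       ((uint32_t)buf[2] << 16) | ((uint32_t)buf[3] << 24);
}

/* Host-independent interpretation of the buffer as a big-endian word. */
static uint32_t load_be32(const uint8_t *buf)
{
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}
```

For a buffer holding an LE device word, read_buf32_native() matches load_le32() on an LE host and load_be32() on a BE host, which is the "byteswapped value in the BE host case" that the paragraph above refers to.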
There's a lot of text to digest here, talking about a canonical form here doesn't help; just define the layout of the destination byte array. I also got completely lost in what you're referring to when you talk about 'sides' here.
The thing we must decide is how the data is stored in kvm_exit_mmio.data. See Peter's recent thread "KVM and variable-endianness guest CPUs". Once we agree on this, the rest should be easy (assuming we use the same structure for the data in the kernel's internal kvm_exit_mmio declared on the stack in io_mem_abort()).
The format you suggest requires any consumer of this data to consider the host endianness, which I don't think makes anything more clear (see my comment on the vgic patch).
The in-kernel interface between the io_mem_abort() code and any in-kernel emulated device must do exactly the same as the interface between KVM and QEMU must do for KVM_EXIT_MMIO.
-- Christoffer
kvmarm mailing list kvmarm@lists.cs.columbia.edu https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm
Hi Anup,
On 20 January 2014 21:46, Anup Patel anup@brainfault.org wrote:
IMHO, device endianess should be taken care by device emulators only because we can have Machine Model containing both LE devices and BE devices. KVM ARM/ARM64 should only worry about endianess of in-kernel emulated devices (e.g. VGIC). In general, QEMU or KVMTOOL should be responsible of device endianess and for this QEMU or KVMTOOL should also know whether Guest (or VM) is little-endian or big-endian.
I agree with most of the above statement except the last part. I think the emulator and host KVM should not really care about guest endianness. They should work in the same way in either case. MarcZ illustrated this earlier with a setup where an LE KVM host ran either an LE guest or a BE guest. Also note that endianness, as far as emulation is concerned, is strictly speaking not a property of the guest; it is rather a property of the current CPU execution context (i.e. the E bit in the CPSR register on V7). In fact, access endianness can change on the fly: when a BE V7 image starts, it initially assumes that it runs in LE mode, and once the kernel is entered it switches the CPU into BE mode; the same happens with the secondary CPU callback. With the latter I ran into a situation where such a callback, before switching into BE mode, read some emulated device with the E bit off, and later the same kernel read the same device register with the E bit on.
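A minimal sketch of that last situation, assuming the device register is presented as four bytes in address order. guest_load32 is a hypothetical model of what a guest register receives for a word load, not code from KVM.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Value a guest register receives for a 32-bit load of the given bus
 * bytes (in address order), depending on the CPSR.E bit at access time.
 */
static uint32_t guest_load32(const uint8_t *bytes, bool e_bit)
{
	if (e_bit) {
		/* E = 1: big-endian data access, bytes[0] is the MSB */
		return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
		       ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
	}
	/* E = 0: little-endian data access, bytes[3] is the MSB */
	return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
	       ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}
```

The same device bytes 44 33 22 11 give 0x11223344 with the E bit off and 0x44332211 with it on, so the result is decided per access rather than per guest.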
Thanks, Victor
On Mon, Jan 20, 2014 at 09:24:10PM -0800, Victor Kamensky wrote:
I believe it is already decided. 'mmio.data' in 'struct kvm_run' is not an integer type; it is a byte array. A byte array does not have endianness.
Please read through this thread: https://lists.cs.columbia.edu/pipermail/kvmarm/2014-January/008784.html
It is endian agnostic. Here is a snippet from linux/kvm.h:
    /* KVM_EXIT_MMIO */
    struct {
        __u64 phys_addr;
        __u8  data[8];
        __u32 len;
        __u8  is_write;
    } mmio;
Thanks, I already knew where to find this though ;)
I realize that it is a byte array. But that doesn't change the fact that a store of a word would have to put either the most or least significant byte in data[0].
It is very natural to treat it as just a piece of memory. I.e., when code reads an emulated LE device address as an integer, this array will contain the integer placed in memory in LE order (data[3] is the MSB), as it would be located in regular memory. When code reads an emulated BE device address as an integer, this array will contain the integer placed in memory in BE order (data[0] is the MSB).
I don't understand this. "code reads emulated device address as integer". The format of the byte array cannot be device-specific, because the kernel doesn't know about device. It can only depend on the endianness of the VM and of the host.
Can you try in a single sentence to specify what the format of the byte array is?
You can think about it this way: an ARM system emulator runs on x86 (LE) and on PPC (BE). What should the mmio.data array for the same emulated device look like across these two cases? I believe it should be identical: just a stream of bytes.
Well, KVM/ARM cannot run on PPC for obvious reasons, and this is a KVM kernel to user space interface.
Emulator code handles this situation quite nicely. For example, check the endianness field of the MemoryRegionOps structure in qemu. Depending on the field value and the current emulator endianness, the code will place results into the 'mmio.data' array in the right order. See [1] for an example in qemu where the endianness of certain ARM devices was not declared correctly: it was marked as DEVICE_NATIVE_ENDIAN whereas it should be DEVICE_LITTLE_ENDIAN. After I changed that, BE qemu pretty much started working. I strongly suspect that anyone running ARM system emulation on PPC (BE) would need the same changes.
It doesn't really matter what the emulator does if there's no clear specification of the interface it relies on. It may happen to work in the cases that are already supported (by chance), but we don't know how to deal with a new (cross-endianness situation) because it is not specified.
Note that the issue with virtio endianness is a very different problem: there it is not clear, for a given arrangement of host/emulator, whether to treat virtio devices as LE or BE, and in what format the data in the ring descriptors is.
-- Christoffer