On 16 August 2013 22:44, Christoffer Dall <christoffer.dall@linaro.org> wrote:
On Fri, Aug 16, 2013 at 10:32:30AM +0530, Anup Patel wrote:
On Thu, Aug 15, 2013 at 10:23 PM, Christoffer Dall <christoffer.dall@linaro.org> wrote:
On Thu, Aug 15, 2013 at 04:37:27PM +0100, Marc Zyngier wrote:
On 2013-08-15 16:13, Anup Patel wrote:
Hi Marc,
On Thu, Aug 15, 2013 at 8:17 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
On 2013-08-15 14:31, Anup Patel wrote:
> On Thu, Aug 15, 2013 at 2:01 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>> On 2013-08-15 07:26, Anup Patel wrote:
>>> Hi Marc,
>>>
>>> On Thu, Aug 15, 2013 at 10:22 AM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>> Hi Anup,
>>>>
>>>> On 2013-08-14 15:22, Anup Patel wrote:
>>>>> On Wed, Aug 14, 2013 at 5:34 PM, Marc Zyngier <marc.zyngier@arm.com> wrote:
>>>>>> Hi Pranav,
>>>>>>
>>>>>> On 2013-08-14 12:47, Pranavkumar Sawargaonkar wrote:
>>>>>>> Systems with a large external L3-cache (a few MBs) might have
>>>>>>> dirty content belonging to the guest page in the L3-cache. To
>>>>>>> tackle this, we need to flush such dirty content from the
>>>>>>> d-cache so that the guest will see the correct contents of the
>>>>>>> guest page when the guest MMU is disabled.
>>>>>>>
>>>>>>> The patch fixes coherent_icache_guest_page() for external
>>>>>>> L3-cache.
>>>>>>>
>>>>>>> Signed-off-by: Pranavkumar Sawargaonkar <pranavkumar@linaro.org>
>>>>>>> Signed-off-by: Anup Patel <anup.patel@linaro.org>
>>>>>>> ---
>>>>>>>  arch/arm64/include/asm/kvm_mmu.h | 2 ++
>>>>>>>  1 file changed, 2 insertions(+)
>>>>>>>
>>>>>>> diff --git a/arch/arm64/include/asm/kvm_mmu.h b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> index efe609c..5129038 100644
>>>>>>> --- a/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> +++ b/arch/arm64/include/asm/kvm_mmu.h
>>>>>>> @@ -123,6 +123,8 @@ static inline void coherent_icache_guest_page(struct kvm *kvm, gfn_t gfn)
>>>>>>>  	if (!icache_is_aliasing()) {	/* PIPT */
>>>>>>>  		unsigned long hva = gfn_to_hva(kvm, gfn);
>>>>>>>  		flush_icache_range(hva, hva + PAGE_SIZE);
>>>>>>> +		/* Flush d-cache for systems with external caches. */
>>>>>>> +		__flush_dcache_area((void *) hva, PAGE_SIZE);
>>>>>>>  	} else if (!icache_is_aivivt()) {	/* non ASID-tagged VIVT */
>>>>>>>  		/* any kind of VIPT cache */
>>>>>>>  		__flush_icache_all();
>>>>>>
>>>>>> [adding Will to the discussion as we talked about this in the past]
>>>>>>
>>>>>> That's definitely an issue, but I'm not sure the fix is to hit the
>>>>>> data cache on each page mapping. It looks like overkill.
>>>>>>
>>>>>> Wouldn't it be enough to let userspace do the cache cleaning?
>>>>>> kvmtool knows which bits of the guest memory have been touched,
>>>>>> and can do a "DC DVAC" on this region.
>>>>>
>>>>> It seems a bit unnatural to have cache cleaning in user-space. I am
>>>>> sure other architectures don't have such cache cleaning in
>>>>> user-space for KVM.
>>>>>
>>>>>> The alternative is to do it in the kernel before running any vcpu,
>>>>>> but that's not very nice either (we'd have to clean the whole of
>>>>>> the guest memory, which makes a full dcache clean more appealing).
>>>>>
>>>>> Actually, cleaning the full d-cache by set/way upon the first run
>>>>> of a VCPU was our second option, but the current approach seemed
>>>>> very simple, hence we went for this.
>>>>>
>>>>> If more people vote for a full d-cache clean upon the first run of
>>>>> a VCPU then we should revise this patch.
>>>>
>>>> Can you please give the attached patch a spin on your HW? I've
>>>> boot-tested it on a model, but of course I can't really verify that
>>>> it fixes your issue.
>>>>
>>>> As far as I can see, it should fix it without any additional
>>>> flushing.
>>>>
>>>> Please let me know how it goes.
>>>
>>> HCR_EL2.DC=1 means all memory accesses with the Stage-1 MMU off are
>>> treated as "Normal Non-shareable, Inner Write-Back Write-Allocate,
>>> Outer Write-Back Write-Allocate memory".
>>>
>>> HCR_EL2.DC=0 means all memory accesses with the Stage-1 MMU off are
>>> treated as "Strongly-ordered device memory".
>>>
>>> Now if the Guest/VM accesses hardware MMIO devices directly (such as
>>> the VGIC CPU interface) with the MMU off, then MMIO devices will be
>>> accessed as normal memory when HCR_EL2.DC=1.
>>
>> I don't think so. Stage-2 still applies, and should force MMIO to be
>> accessed as device memory.
>>
>>> HCR_EL2.DC=1 makes sense only if we have all-software-emulated
>>> devices for the Guest/VM, which is not true for KVM ARM or KVM ARM64
>>> because we use the VGIC.
>>>
>>> IMHO, this patch enforces incorrect memory attributes for the
>>> Guest/VM when the Stage-1 MMU is off.
>>
>> See above. My understanding is that HCR.DC controls the default output
>> of Stage-1, and Stage-2 overrides still apply.
>
> You had mentioned that the PAGE_S2_DEVICE attribute was redundant and
> wanted the guest to decide the memory attribute. In other words, you
> did not want to enforce any memory attribute in Stage-2.
>
> Please refer to this patch: https://patchwork.kernel.org/patch/2543201/
This patch was never merged. If you carry on following that discussion, you will notice it was dropped for a very good reason:
https://lists.cs.columbia.edu/pipermail/kvmarm/2013-May/005827.html
So Stage-2 memory attributes are used, they are not going away, and they are essential to the patch I sent this morning.
M.
-- Fast, cheap, reliable. Pick two.
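[For concreteness, Marc's user-space suggestion above (the "DC DVAC" is presumably DC CVAC, i.e. clean data cache by VA to the Point of Coherency) might look like the following minimal sketch. The helper name is made up, and it assumes an AArch64 Linux user-space where EL0 cache maintenance and CTR_EL0 access are enabled (SCTLR_EL1.UCI/UCT, which Linux sets):

#include <stdint.h>
#include <stddef.h>

/* Clean a VA range to the Point of Coherency with DC CVAC. */
static void clean_dcache_range(void *addr, size_t size)
{
	uint64_t ctr;
	uintptr_t line, p, end = (uintptr_t)addr + size;

	/* CTR_EL0.DminLine (bits [19:16]) is log2 of the smallest
	 * d-cache line size in 4-byte words; convert to bytes. */
	asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
	line = 4UL << ((ctr >> 16) & 0xf);

	for (p = (uintptr_t)addr & ~(line - 1); p < end; p += line)
		asm volatile("dc cvac, %0" : : "r" (p) : "memory");
	asm volatile("dsb sy" : : : "memory");
}

kvmtool would then call this on each guest memory region it has actually written (kernel image, DTB, initrd) before starting the VCPUs.]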
HCR_EL2.DC=1 will break the Guest/VM bootloader/firmware if the Guest/VM is given a DMA-capable device in pass-through mode. The reason is that bootloaders/firmware typically don't enable the MMU, and such a bootloader/firmware will program a pass-through DMA-capable device without any cache flushes to guest RAM (precisely because it has not enabled the MMU).
A good example of such a device would be a SATA AHCI controller given to a Guest/VM as direct access (using a System MMU), with the Guest bootloader/firmware accessing this SATA AHCI controller to load kernel images from a SATA disk. In this situation, we would have to hack the Guest bootloader/firmware AHCI driver to do explicit flushes to Guest RAM (because we have HCR_EL2.DC=1), as in the sketch below.
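[A sketch of roughly what that hack would look like in the guest firmware's AHCI driver under HCR_EL2.DC=1. This is illustrative only: build_command(), CMD_LIST_SIZE and the PxCI doorbell pointer are hypothetical stand-ins for whatever the firmware uses, and clean_dcache_range() is the helper sketched earlier:

#include <stdint.h>
#include <stddef.h>

#define CMD_LIST_SIZE	1024	/* AHCI command list: 32 slots x 32 bytes */

void clean_dcache_range(void *addr, size_t size);	/* earlier sketch */
void build_command(void *cmd_list, void *cmd_table);	/* hypothetical */

static void ahci_issue_cmd(volatile uint32_t *px_ci,
			   void *cmd_list, void *cmd_table, size_t table_size)
{
	build_command(cmd_list, cmd_table);

	/* With HCR_EL2.DC=1 these CPU writes can sit dirty in the cache
	 * hierarchy (e.g. an external L3); clean them to the PoC so the
	 * non-snooping DMA engine reads what the CPU actually wrote. */
	clean_dcache_range(cmd_list, CMD_LIST_SIZE);
	clean_dcache_range(cmd_table, table_size);
	asm volatile("dsb sy" : : : "memory");

	*px_ci = 1;	/* PxCI: issue the command in slot 0 */
}

Without the two cleans and the barrier, the controller may DMA stale data from RAM, which is exactly the breakage described above.]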
OK. So *that* looks like a valid argument against HCR_EL2.DC==1. Out of curiosity: is that a made-up example or something you actually have?
Back to square one: Can you please benchmark the various cache cleaning options (global at startup time, per-page on S2 translation fault, and user-space)?
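[The first two options are kernel-side, so they really need in-kernel measurement, but a crude user-space harness can at least bound the raw cleaning costs being compared. A sketch under stated assumptions: clean_dcache_range() is the helper from the earlier sketch, and 256MB is an arbitrary stand-in for guest RAM size:

#include <stdio.h>
#include <stdlib.h>
#include <time.h>

void clean_dcache_range(void *addr, size_t size);	/* earlier sketch */

/* Return the mean seconds per clean of a region of the given size. */
static double time_clean(void *buf, size_t size, int iters)
{
	struct timespec t0, t1;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (int i = 0; i < iters; i++)
		clean_dcache_range(buf, size);
	clock_gettime(CLOCK_MONOTONIC, &t1);

	return ((t1.tv_sec - t0.tv_sec) +
		(t1.tv_nsec - t0.tv_nsec) / 1e9) / iters;
}

int main(void)
{
	size_t guest_ram = 256UL << 20;	/* stand-in for guest RAM */
	void *buf = malloc(guest_ram);

	if (!buf)
		return 1;
	printf("one 4K page:  %.9f s\n", time_clean(buf, 4096, 10000));
	printf("whole region: %.9f s\n", time_clean(buf, guest_ram, 1));
	free(buf);
	return 0;
}

The per-page number approximates the cost added to each Stage-2 translation fault; the whole-region number approximates the one-off cost of cleaning everything up front.]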
Eh, why is this a more valid argument than the VGIC? The device pass-through Stage-2 mappings would still have the Stage-2 memory attributes to configure that memory region as device memory. Why is it relevant whether the device is DMA-capable in this context?
Most ARM bootloaders/firmware run with the MMU off; hence they will not do explicit cache flushes when programming a DMA-capable device. Now if HCR_EL2.DC=1, Guest RAM updates done by the bootloader/firmware will not be visible to the DMA-capable device.
Read my question again: the bootloaders running with the MMU off in a VM will only disable the MMU for Stage-1 translations. Stage-2 translations are still enforced using the Stage-2 page tables, and their attributes for all mappings to devices will still enforce strongly-ordered device-type memory.
Please read my reply again. Also, try reading up on the SATA AHCI spec.
Again, what does it matter if the device is DMA-capable or not?
-Christoffer

_______________________________________________
kvmarm mailing list
kvmarm@lists.cs.columbia.edu
https://lists.cs.columbia.edu/cucslists/listinfo/kvmarm