On Thu, Aug 29, 2013 at 05:02:50PM +0100, Anup Patel wrote:
On Thu, Aug 29, 2013 at 6:23 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
On Thu, Aug 29, 2013 at 01:31:43PM +0100, Anup Patel wrote:
On Thu, Aug 29, 2013 at 4:22 PM, Catalin Marinas <catalin.marinas@arm.com> wrote:
On Fri, Aug 16, 2013 at 07:57:55AM +0100, Anup Patel wrote:
The approach of flushing the d-cache by set/way upon the first run of a VCPU will not work because, for set/way operations, the ARM ARM says: "For set/way operations, and for All (entire cache) operations, the point is defined to be to the next level of caching". In other words, set/way operations work up to the point of unification.
I don't understand where you got the idea that set/way operations work up to the point of unification. This is incorrect: the set/way operations work on the level of cache specified by bits 3:1 in the register passed to the DC CISW instruction. For your L3 cache, those bits would be 2 (and the __flush_dcache_all() implementation does this dynamically).
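To illustrate the encoding referred to above, here is a minimal sketch of how a DC CISW operand could be assembled. It assumes the usual ARMv8 layout (cache level minus one in bits 3:1, set index starting at log2 of the line size, way index in the top bits). The function names and the example cache geometry are hypothetical and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

/* ceil(log2(x)) for small x >= 1 (hypothetical helper). */
static unsigned int ceil_log2(unsigned int x)
{
    unsigned int bits = 0;

    while ((1u << bits) < x)
        bits++;
    return bits;
}

/*
 * Build the register value for a set/way operation such as DC CISW.
 * 'level' is the 1-based cache level; the Level field holds level - 1.
 * 'num_ways' and 'line_bytes' would normally be derived from CCSIDR_EL1;
 * here they are plain parameters so the arithmetic can be checked alone.
 */
static uint64_t dc_sw_operand(unsigned int level, unsigned int set,
                              unsigned int way, unsigned int num_ways,
                              unsigned int line_bytes)
{
    unsigned int way_bits = ceil_log2(num_ways);
    unsigned int set_shift = ceil_log2(line_bytes);
    uint64_t op;

    assert(level >= 1 && level <= 7);
    assert(way < num_ways);

    op = (uint64_t)(level - 1) << 1;            /* Level, bits 3:1 */
    op |= (uint64_t)set << set_shift;           /* Set index */
    if (way_bits)
        op |= (uint64_t)way << (32 - way_bits); /* Way, top bits */
    return op;
}
```

For a level-3 cache with set 0 and way 0, the operand is 0x4, that is, bits 3:1 hold the value 2, which matches the statement above.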
The L3-cache is not visible to the CPU. It is totally independent and transparent to the CPU.
OK. But you say that operations like DC CIVAC actually flush the L3? So I don't see it as completely transparent to the CPU.
It is transparent from the CPU's perspective. In other words, there is nothing in the CPU for controlling or monitoring the L3-cache.
We probably have a different understanding of "transparent". It doesn't look to me any more transparent than the L1 or L2 cache. Basically, from a software perspective, it needs maintenance. Whether the CPU explicitly asks the L3 cache for this or the L3 cache figures it out on its own based on the L1/L2 operations is irrelevant.
It would have been transparent if the software didn't need to know about it at all, but it's not the case.
Do you have any configuration bits which would make the L3 completely transparent, i.e. always caching even when accesses are non-cacheable, and with DC ops to PoC ignoring it?
Actually, the L3-cache monitors the types of reads/writes generated by the CPU (i.e. whether the request is cacheable/non-cacheable, whether the request is due to DC ops to PoC, and so on).
To answer your query, there is no configuration that makes the L3 keep caching when accesses are non-cacheable or ignore DC ops to PoC.
So it's an outer cache with some "improvements" to handle DC ops to PoC. I think it was a pretty bad decision on the hardware side as we really try to get rid of outer caches for many reasons:
1. Non-standard cache flushing operations (MMIO-based)
2. It may require cache maintenance by physical address - something hard to get in a guest OS (unless you virtualise L3 cache maintenance)
3. Are barriers like DSB propagated correctly? Does a DC op to PoC followed by DSB ensure that the L3 drained the cachelines to RAM?
I think point 2 isn't required because your L3 detects DC ops to PoC. I hope point 3 is handled correctly (otherwise look how "nice" the mb() macro on arm is to cope with L2x0).
If only 1 is left, we don't need the full outer_cache framework but it still needs to be addressed since the assumption is that flush_cache_all (or __flush_dcache_all) flushes all cache levels. These are not used in generic code but are used during kernel booting, KVM and cpuidle drivers.
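As a rough illustration of why this assumption can break: a set/way loop in the style of __flush_dcache_all() takes its list of levels from CLIDR_EL1 (the LoC field in bits 26:24 and a 3-bit Ctype field per level), so a cache that the CPU does not report there is never visited. The sketch below only decodes a CLIDR value. The function names and the sample register values are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ctype<n> field of CLIDR for cache level n (1..7), 3 bits per level. */
static unsigned int clidr_ctype(uint32_t clidr, unsigned int level)
{
    assert(level >= 1 && level <= 7);
    return (clidr >> (3 * (level - 1))) & 0x7;
}

/* LoC (Level of Coherence) field, bits 26:24. */
static unsigned int clidr_loc(uint32_t clidr)
{
    return (clidr >> 24) & 0x7;
}

/*
 * Return a bitmask of the cache levels a CLIDR-driven set/way walk
 * would clean: bit (n - 1) is set when level n is at or below LoC and
 * holds data (Ctype 2 = data only, 3 = separate I and D, 4 = unified).
 */
static unsigned int setway_levels_visited(uint32_t clidr)
{
    unsigned int mask = 0;
    unsigned int level;

    for (level = 1; level <= clidr_loc(clidr); level++) {
        if (clidr_ctype(clidr, level) >= 2)
            mask |= 1u << (level - 1);
    }
    return mask;
}
```

With a sample value of 0x02000023 (separate L1 caches, unified L2, LoC = 2) the result is 0x3, so a level 3 that exists only outside the CPU's view is never touched by such a walk.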
Now, back to the idea of outer_cache framework for arm64. Does your CPU have separate instructions for flushing this L3 cache?
No, the CPU does not have a separate instruction for flushing the L3-cache. On the other hand, the L3-cache has MMIO registers which can be used to explicitly flush it.
I guess you use those in your firmware or boot loader since Linux requires clean/invalidated caches at boot (and I plan to push a patch which removes kernel D-cache cleaning during boot to spot such problems early). A cpuidle driver would probably need this as well.