On Mon, Mar 10, 2014 at 10:44:05AM +0000, karim.allah.ahmed@gmail.com wrote:
> I have two questions:
> 1- What are the expected semantics of "flush_cache_all" on a big.LITTLE architecture?
> I can see that the implementation of this function in the Linux kernel does the following:
> a- Read the value of LoC (Level of Coherency) from CLIDR.
> b- Flush each level of cache up to that LoC using the DCCISW CP15 operation (clean and invalidate by set/way).
> My expectation would be that if this is executed on one of the processors of the big cluster, it should flush all L1 and L2 caches on this cluster and then signal the cache cleaning operation to the CCI interconnect, which would propagate this signal downstream to the LITTLE cluster. This would mean that at the end all caches will be flushed.
I am not sure exactly how the CCI behaves here, but cache flushing by set/way (as flush_cache_all does) is not safe on SMP, independent of big.LITTLE. It should only be used in certain contexts like suspend/resume, where we have more control over cache line migration between CPUs/clusters.
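As a rough illustration of what a set/way walk involves, here is a user-space sketch that decodes LoC from a CLIDR value and builds the DCCISW operands for one cache level from its CCSIDR value (ARMv7 32-bit layouts). The function names, the `setway_fn` hook and the register values in the usage note are made up for the example; this is not the kernel's code, and it leaves out the level loop, the CSSELR selection and the barriers a real implementation needs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: receives each DCCISW operand in turn. */
typedef void (*setway_fn)(uint32_t operand, void *ctx);

/* CLIDR[26:24] is LoC, the Level of Coherency. */
unsigned int clidr_loc(uint32_t clidr)
{
	return (clidr >> 24) & 0x7;
}

/* Smallest b such that (1 << b) >= n, for 1 <= n <= 2^31. */
unsigned int ceil_log2(uint32_t n)
{
	unsigned int bits = 0;

	assert(n >= 1 && n <= (UINT32_C(1) << 31));
	while ((UINT32_C(1) << bits) < n)
		bits++;
	return bits;
}

/*
 * Build one DCCISW operand: way in the top bits, set just above the
 * line offset, zero-based cache level in bits [3:1].
 */
uint32_t dccisw_operand(uint32_t ccsidr, unsigned int level,
			uint32_t set, uint32_t way)
{
	unsigned int line_shift = (ccsidr & 0x7) + 4;	/* log2(line bytes) */
	uint32_t ways = ((ccsidr >> 3) & 0x3ff) + 1;
	uint32_t sets = ((ccsidr >> 13) & 0x7fff) + 1;
	unsigned int way_bits = ceil_log2(ways);
	uint32_t op;

	assert(level < 7 && set < sets && way < ways);
	op = ((uint32_t)level << 1) | (set << line_shift);
	if (way_bits)	/* direct-mapped: no way field, avoid a shift by 32 */
		op |= way << (32 - way_bits);
	return op;
}

/*
 * Visit every set/way of one level. On hardware, emit() would issue
 * DCCISW (mcr p15, 0, <Rt>, c7, c14, 2); here it may be NULL.
 * Returns the number of operands generated.
 */
unsigned long walk_level_setway(uint32_t ccsidr, unsigned int level,
				setway_fn emit, void *ctx)
{
	uint32_t ways = ((ccsidr >> 3) & 0x3ff) + 1;
	uint32_t sets = ((ccsidr >> 13) & 0x7fff) + 1;
	unsigned long count = 0;

	for (uint32_t way = 0; way < ways; way++) {
		for (uint32_t set = 0; set < sets; set++) {
			uint32_t op = dccisw_operand(ccsidr, level, set, way);

			if (emit)
				emit(op, ctx);
			count++;
		}
	}
	return count;
}
```

For a hypothetical 32 KiB, 2-way cache with 64-byte lines (CCSIDR value 0x001FE00A), walk_level_setway() generates 512 operands. Each of them names a physical slot in the local cache rather than an address, which is why nothing stops a line from moving to another CPU's cache while the walk is in progress.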
> Is that the proper semantics of this operation?
> Or is it only going to affect this CPU and no other CPUs in the cluster (and consequently no CPUs in the other cluster)? And if that's the case, does this mean that I have to do the cache flushing per CPU?
The safe thing is to assume that it only affects a single CPU (and as an optimisation we use flush_cache_louis, which only flushes up to the Level of Unification Inner Shareable, typically just the L1 cache). When the whole cluster is going down and we know that only one CPU is running, we can use flush_cache_all for that cluster, but it does not affect the caches in the other cluster.
Per-CPU cache flushing isn't useful either when all the CPUs are active, since cache lines can still migrate (unless you use something like stop_machine, disable the MMU on all CPUs, and do the flushing after the MMUs have been disabled).
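The power-down rule described in this reply (a single CPU going down flushes to LoUIS; the last CPU still running in a cluster flushes everything) can be sketched as a small decision helper. The enum and function names are hypothetical and only model the choice; the actual flushing is done by flush_cache_louis and flush_cache_all in the kernel.

```c
#include <assert.h>

/* Hypothetical labels for the two kernel helpers named above. */
enum flush_scope {
	FLUSH_LOUIS,	/* flush_cache_louis: this CPU's caches up to LoUIS */
	FLUSH_ALL	/* flush_cache_all: every level up to LoC */
};

/*
 * cpus_running_in_cluster counts the calling CPU itself, so it is at
 * least 1. Only the last CPU standing may safely flush the whole
 * cluster by set/way, because no other CPU can pull lines back in.
 */
enum flush_scope pick_flush_scope(unsigned int cpus_running_in_cluster)
{
	assert(cpus_running_in_cluster >= 1);
	return cpus_running_in_cluster == 1 ? FLUSH_ALL : FLUSH_LOUIS;
}
```

Either way the choice only covers the local cluster; the other cluster's caches are untouched.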
> 2- Is there a difference in semantics between flushing each cache level until I reach the Level of Coherency (using the DCCISW operation) and flushing the first cache only to the Point of Coherency (using the DCCIMVAC operation)?
The difference is that the MVA operation is guaranteed to work on SMP since it is broadcast to the other CPUs in hardware. The set/way operations are not.
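For comparison with set/way, a by-MVA flush works on an address range one cache line at a time. The sketch below only computes which line addresses would be passed to DCCIMVAC; the names and the `line_op_fn` hook are assumptions for the example, and on real hardware each call would be the CP15 operation followed by a DSB once the loop is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hook: receives the address of each cache line in turn. */
typedef void (*line_op_fn)(uintptr_t mva, void *ctx);

/* Round an address down to the start of its cache line. */
uintptr_t cache_line_base(uintptr_t addr, size_t line_size)
{
	assert(line_size != 0 && (line_size & (line_size - 1)) == 0);
	return addr & ~((uintptr_t)line_size - 1);
}

/*
 * Visit every cache line overlapping [start, end). On hardware, op()
 * would issue DCCIMVAC (mcr p15, 0, <Rt>, c7, c14, 1); here it may be
 * NULL. Assumes end is not within line_size of the top of the address
 * space. Returns the number of lines visited.
 */
size_t for_each_cache_line(uintptr_t start, uintptr_t end,
			   size_t line_size, line_op_fn op, void *ctx)
{
	size_t count = 0;

	if (start >= end)
		return 0;

	for (uintptr_t addr = cache_line_base(start, line_size);
	     addr < end; addr += line_size) {
		if (op)
			op(addr, ctx);
		count++;
	}
	return count;
}
```

With 64-byte lines, the range 0x1010..0x1090 covers three lines starting at 0x1000. Because each operation names an address rather than a slot in one particular cache, the hardware can broadcast it to the other CPUs.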