[CC'ing Will]
On Mon, Mar 04, 2013 at 06:41:42AM +0000, Leo Yan wrote:
On Thu, Feb 14, 2013 at 05:07:43PM +0000, Jon Medhurst (Tixy) wrote:
> The function v7_coherent_kern_range uses the macro icache_line_size to
> read the current CPUs icache line size for the purpose of invalidating
> all cache lines in the given range.
>
> Unfortunately, on the TC2 big.LITTLE test chip, the A15 icache line size
> is 64 bytes, but the A7 size is only 32 bytes. So when the function
> executes on the A15 it will miss out every alternate cache line for the
> A7.
I think there are two scenarios here:
- When the program calls the function *v7_coherent_kern_range* on the A15, it first reads the cache type register (CTR) and finds that the icache line size is 64 bytes. It then enters the loop and flushes the icache in steps of 64 bytes. If the program is migrated onto the A7 in the middle of this loop, it will continue to flush the icache in steps of 64 bytes on the A7, but the A7 will ONLY invalidate the first 32 bytes of each 64-byte step. So icache corruption becomes possible.
- When the A15 and A7 cores run at the same time and an A15 core executes the instruction ICIMVAU, it invalidates 64 bytes of its icache and also sends a DVM message to the A7 cores to invalidate their icache; but the A7 will ONLY invalidate 32 bytes. If so, this is an architecture issue, and the A15's icache line must be forced to 32 bytes for big.LITTLE at the silicon level.
Could you help confirm whether both of these scenarios will cause icache corruption? If I have missed something, please feel free to point it out.
IMINLN just provides the stride to the cache functions. So, short answer: both (1) and (2) are wrong.
(1) is wrong because on the A7 the I-cache line size is 32 bytes, so "the first half" is an incorrect way to put it. The problem is that the MVA passed is 64-byte aligned and the stride is 64 bytes. Run on a core with a 32-byte I-cache line, that means one line in two is not invalidated, but only because the address passed is incremented by 64 bytes at a time. Remember, the only thing that matters is the MVA you are passing, not the stride itself.
(2) is just a wrong understanding of how things work: you are invalidating by MVA, so the MVA determines what the DVM is doing, not the cache line size.
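To make point (1) concrete, here is a minimal C sketch (a model, not kernel code) of a by-MVA invalidate loop. The function name `missed_lines` and the address range in the usage note are made up for illustration. It counts how many lines of a given size never receive an MVA when the loop steps by a given stride.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Count the cache lines covering [start, end) that are never targeted when
 * a loop issues one invalidate-by-MVA every `stride` bytes, on a core whose
 * I-cache line is `line` bytes. Both sizes must be powers of two.
 */
static unsigned int missed_lines(uint32_t start, uint32_t end,
                                 uint32_t stride, uint32_t line)
{
    unsigned int missed = 0;

    assert(stride != 0 && (stride & (stride - 1)) == 0);
    assert(line != 0 && (line & (line - 1)) == 0);

    /* Walk every line the target core could hold for this range. */
    for (uint32_t l = start & ~(line - 1); l < end; l += line) {
        int hit = 0;

        /* Modelled loop: align the MVA down to the stride, then step. */
        for (uint32_t mva = start & ~(stride - 1); mva < end; mva += stride) {
            if ((mva & ~(line - 1)) == l) {
                hit = 1;    /* this MVA falls inside line l */
                break;
            }
        }
        if (!hit)
            missed++;
    }
    return missed;
}
```

For a 256-byte range, `missed_lines(0x1000, 0x1100, 64, 32)` returns 4, i.e. every alternate 32-byte line, while a 32-byte stride misses nothing on either 32-byte or 64-byte lines. That is why using the smaller line size as the stride is safe on both cores.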
There is a signal (IMINLN) to the core which allows A15 to behave as though it has a 32-byte line size and this should be driven correctly for big/little.
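For reference, the line size that software sees comes from the IminLine field of the Cache Type Register (CTR[3:0]), which holds log2 of the number of 32-bit words in the smallest I-cache line. As far as I understand, driving IMINLN low is what makes the A15 report the 32-byte value there. Below is a minimal sketch of the decode. The helper names are made up, and the raw CTR value is passed in as a parameter (an assumption for portability) instead of being read with an MRC.

```c
#include <assert.h>
#include <stdint.h>

/* Decode CTR.IminLine (bits [3:0]) into a line size in bytes. */
static uint32_t icache_min_line_bytes(uint32_t ctr)
{
    uint32_t imin_line = ctr & 0xfu;    /* log2(words per line) */

    return 4u << imin_line;             /* 4 bytes per 32-bit word */
}

/* Usage sketch: IminLine = 4 decodes to 64 bytes, IminLine = 3 to 32. */
static void icache_line_examples(void)
{
    assert(icache_min_line_bytes(0x4u) == 64u);
    assert(icache_min_line_bytes(0x3u) == 32u);
}
```

This computes the same value as the icache_line_size macro mentioned at the top of the thread, so a loop using it steps by 32 bytes once the A15 reports IminLine = 3.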
How do we set that signal? Is that something we have to set up in Linux or is it something that we expect the Firmware to set up?
If I am not mistaken, the SCC register at offset 0x400 (bit 7) allows IMINLN to be forced to 0 (i.e. instruction cache minimum line size == 32 bytes).
This can be done through board.txt so that it is set up as we want.
Thanks a lot for the info. On our side, with the TC2 board, we do see that the system is much more stable after applying the setting that forces IMINLN to 0.
I have another question, about the instruction *ICIALLUIS*: as I understand it, when the core invalidates the whole icache, it actually uses the set/way method to invalidate the icache lines and sends a DVM message to the other cores in the inner shareable domain.
ICIALLUIS does not use set/way operations.
If so, that means the core will invalidate its own icache and send the DVM to the other cores, which invalidate an icache line only if they hold the same line. But after ICIALLUIS is executed, the other cores may still have valid icache lines, right?
That's not correct. I will check what happens at bus level, but I guess the Invalidate All Inner shareable will be a single coherency command sent over CCI.
The cache line size is just used as the stride, so that the cache functions can be optimized.
BTW, A15 TRM 6.3.6 explains what I tried to summarize above.
Lorenzo