Re: [BUG] v7_coherent_kern_range broken on big.LITTLE

Jon Medhurst (Tixy)

15 Feb 2013

12:06 p.m.

On Fri, 2013-02-15 at 10:33 +0000, Lorenzo Pieralisi wrote:

...

On Fri, Feb 15, 2013 at 10:04:37AM +0000, Jon Medhurst (Tixy) wrote:

...
On Thu, 2013-02-14 at 17:16 +0000, Will Deacon wrote:

...
Hi Tixy,

On Thu, Feb 14, 2013 at 05:07:43PM +0000, Jon Medhurst (Tixy) wrote:

...
The function v7_coherent_kern_range uses the macro icache_line_size to read the current CPU's icache line size for the purpose of invalidating all cache lines in the given range.
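
The line-size computation behind icache_line_size can be sketched in Python. This is a minimal model. It assumes the ARMv7 Cache Type Register layout, in which IminLine (bits [3:0]) holds log2 of the smallest I-cache line length in 4-byte words. The CTR values used here are made up and only populate that field. Because the register is read on whichever CPU the code happens to be running on, the result depends on the core type.

```python
def icache_line_size(ctr):
    """Smallest I-cache line size in bytes, decoded from an ARMv7 CTR value.

    IminLine (CTR bits [3:0]) is log2 of the line length in 4-byte words,
    so the size in bytes is 4 << IminLine.
    """
    iminline = ctr & 0xF   # cache line size encoding
    return 4 << iminline   # 4 bytes per word, scaled by the encoding


# Hypothetical CTR values: only the IminLine field is filled in.
print(icache_line_size(0x4))  # 64 (IminLine = 4, i.e. 16 words)
print(icache_line_size(0x3))  # 32 (IminLine = 3, i.e. 8 words)
```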

Unfortunately, on the TC2 big.LITTLE test chip, the A15 icache line size is 64 bytes, but the A7 size is only 32 bytes. So when the function executes on the A15 it will miss out every alternate cache line for the A7.
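
The effect can be reproduced with a small Python model of the invalidation loop. The helpers below are hypothetical, and the address range is arbitrary. The loop aligns the start address down to the stride and issues one MVA per stride, then records which 32-byte lines contain an issued MVA.

```python
def lines_invalidated(start, end, stride, line_size):
    """Base addresses of the line_size-byte lines touched when invalidating
    [start, end) one MVA at a time, stepping by `stride` bytes."""
    mva = start & ~(stride - 1)            # align down to the stride
    hit = set()
    while mva < end:
        hit.add(mva & ~(line_size - 1))    # the line containing this MVA
        mva += stride
    return hit


def lines_in_range(start, end, line_size):
    """All line_size-byte lines overlapping [start, end)."""
    base = start & ~(line_size - 1)
    return set(range(base, end, line_size))


START, END = 0x1000, 0x1100                # arbitrary 256-byte range
a7_lines = lines_in_range(START, END, 32)

# Stride read on an A15 (64 bytes), applied to 32-byte A7 lines.
missed = sorted(a7_lines - lines_invalidated(START, END, 64, 32))
print([hex(a) for a in missed])            # ['0x1020', '0x1060', '0x10a0', '0x10e0']

# With a 32-byte stride every A7 line is covered.
print(lines_invalidated(START, END, 32, 32) == a7_lines)  # True
```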

There is a signal (IMINLN) to the core which allows A15 to behave as though it has a 32-byte line size and this should be driven correctly for big/little.

How do we set that signal? Is that something we have to set up in Linux or is it something that we expect the Firmware to set up?

If I am not mistaken, the SCC register at offset 0x400 (bit 7) allows IMINLN to be forced to 0 (i.e. Instruction Cache minimum line size == 32 bytes).

This can be done through board.txt so that it is set up as we want.

According to the TRM for TC2, the default value for that register is 0x33330c80, so adding the line "SCC: 0x400 0x33330c00" and incrementing TOTALSCCS does the trick, and the A15s now report an icache line size of 32 bytes.
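
The board.txt value quoted above is the TRM default with bit 7 cleared. A short Python check of that arithmetic follows. The offset, bit position, and default value come from this thread, not from any independent source.

```python
SCC_0X400_DEFAULT = 0x33330C80   # default value quoted from the TC2 TRM
IMINLN_BIT = 1 << 7              # bit 7 of SCC offset 0x400

forced_32b = SCC_0X400_DEFAULT & ~IMINLN_BIT   # clear bit 7 -> IMINLN = 0
print(f"SCC: 0x400 0x{forced_32b:08x}")        # SCC: 0x400 0x33330c00
```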

We'll have to get everyone with a TC2 to make that change then?

As this is a TC2 issue, and not Linux related, shall we drop linux-arm-kernel from all future replies to avoid the noise? I've added linaro-dev to the cc list so this continues to get public visibility.

-- Tixy

Lorenzo Pieralisi

15 Feb

12:43 p.m.

New subject: [BUG] v7_coherent_kern_range broken on big.LITTLE

[dropped ALKML, added Pawel]

On Fri, Feb 15, 2013 at 12:06:25PM +0000, Jon Medhurst (Tixy) wrote:

We'll have to get everyone with a TC2 to make that change then?

It looks like that's what we should do, unless there is a strong reason against it. If we use TC2 with the A15 cluster only, we should set IMINLN to 1, though.

...

As this is a TC2 issue, and not Linux related, shall we drop linux-arm-kernel from all future replies to avoid the noise? I've added linaro-dev to the cc list so this continues to get public visibility.

Done.

Lorenzo

Dave Martin

7:14 p.m.

On Fri, Feb 15, 2013 at 12:43:52PM +0000, Lorenzo Pieralisi wrote:

...

It looks like that's what we should do, unless there is a strong reason against it. If we use TC2 with the A15 cluster only, we should set IMINLN to 1, though.

This doesn't halve the effective cache size, does it?

It also sounds like it will apply to any A15+A7 system, not just TC2, if we can get preempted inside the cache maintenance routines.

The other fix would be to disable preemption during the cache maintenance routines, but that would doubtless be controversial...

Cheers
---Dave

Leo Yan

4 Mar

6:41 a.m.

...

On Thu, Feb 14, 2013 at 05:07:43PM +0000, Jon Medhurst (Tixy) wrote:

...
The function v7_coherent_kern_range uses the macro icache_line_size to read the current CPU's icache line size for the purpose of invalidating all cache lines in the given range.

Unfortunately, on the TC2 big.LITTLE test chip, the A15 icache line size is 64 bytes, but the A7 size is only 32 bytes. So when the function executes on the A15 it will miss out every alternate cache line for the A7.

I think there are two scenarios here:

1. When a program calls *v7_coherent_kern_range*, it first reads the Cache Type Register (CTR) and gets an icache line size of 64 bytes, then runs a loop that invalidates one icache line per 64-byte step. If the program is migrated onto an A7 in the middle of this loop, the A7 continues invalidating with a 64-byte step, but it will ONLY invalidate the first half (32 bytes) of each 64-byte block. So in the end icache corruption is possible.

2. When A15 and A7 cores run at the same time and the A15 core executes the instruction ICIMVAU, it invalidates 64 bytes of its own icache and also sends a DVM message to the A7 cores to invalidate their icaches as well, but the A7 will ONLY invalidate 32 bytes. If so, this is an architectural issue, and we must force the A15's icache line size to 32 bytes for big.LITTLE at the silicon level.

Could you help confirm that both of these scenarios can cause icache corruption? If I have missed something, please feel free to point it out.

...

If I am not mistaken, the SCC register at offset 0x400 (bit 7) allows IMINLN to be forced to 0 (i.e. Instruction Cache minimum line size == 32 bytes).

This can be done through board.txt so that it is set up as we want.

Thanks a lot for the info. On our side, with the TC2 board, we do see that the system is much more stable after forcing IMINLN to 0.

I have another question about the instruction *ICIALLUIS*: when the core invalidates its whole icache, my understanding is that it actually uses the set/way method to invalidate the icache lines, and sends a DVM message to the other cores in the Inner Shareable domain.

If so, that means the core will invalidate its own icache and send the DVM message to the other cores, which invalidate their icache lines only if they hold the same lines. But after ICIALLUIS is executed, the other cores may still have valid icache lines, right?

Thx, Leo Yan

Lorenzo Pieralisi

7 Mar

7:15 a.m.

[CC'ing Will]

On Mon, Mar 04, 2013 at 06:41:42AM +0000, Leo Yan wrote:

...

IMINLN just provides the stride to the cache functions. So, short answer: both (1) and (2) are wrong.

(1) is wrong since on the A7 the I-cache line size is 32 bytes, so the "first half" you are mentioning is an incorrect way to put it. The problem is that the MVA passed will be 64-byte aligned and the stride is 64 bytes, which means that, if run on a core with a 32-byte I-cache line, one line in two is not invalidated. But that's because the address passed is incremented by 64 bytes at a time. Remember, the only thing that matters is the MVA you are passing, not the stride itself.

(2) is just a wrong understanding of how things work: you are invalidating by MVA, so the MVA determines what the DVM is doing, not the cache line size.

...

I have another question about the instruction *ICIALLUIS*: when the core invalidates its whole icache, my understanding is that it actually uses the set/way method to invalidate the icache lines, and sends a DVM message to the other cores in the Inner Shareable domain.

ICIALLUIS does not use set/way operations.

...

If so, that means the core will invalidate its own icache and send the DVM message to the other cores, which invalidate their icache lines only if they hold the same lines. But after ICIALLUIS is executed, the other cores may still have valid icache lines, right?

That's not correct. I will check what happens at the bus level, but I guess the Invalidate All, Inner Shareable operation will be a single coherency command sent over CCI.

The cache line size is just used as the stride, so that the cache functions can be optimized.

BTW, A15 TRM 6.3.6 explains what I tried to summarize above.

Lorenzo

Leo Yan

11:59 a.m.

Many thanks for the detailed answers. I'd like to discuss the questions a bit further, so please see my comments below.

On 03/07/2013 03:15 PM, Lorenzo Pieralisi wrote:

...

(1) is wrong since on the A7 the I-cache line size is 32 bytes, so the "first half" you are mentioning is an incorrect way to put it. The problem is that the MVA passed will be 64-byte aligned and the stride is 64 bytes, which means that, if run on a core with a 32-byte I-cache line, one line in two is not invalidated. But that's because the address passed is incremented by 64 bytes at a time. Remember, the only thing that matters is the MVA you are passing, not the stride itself. (2) is just a wrong understanding of how things work: you are invalidating by MVA, so the MVA determines what the DVM is doing, not the cache line size.

I'm curious: if the A15 uses the MVA to invalidate the icache, will the DVM message contain both the MVA and the range size? If it ONLY includes the MVA, how would the A7 know that it needs to invalidate a 64-byte range? Otherwise it cannot stay coordinated with the A15.

...

...
I have another question about the instruction *ICIALLUIS*: when the core invalidates its whole icache, my understanding is that it actually uses the set/way method to invalidate the icache lines, and sends a DVM message to the other cores in the Inner Shareable domain.

ICIALLUIS does not use set/way operations.

I may have made a mistake here. I thought *ICIALLUIS* was a pseudo-instruction, and that the logic would use set/way operations to invalidate the core's icache lines one by one. So how does ICIALLUIS invalidate all of the core's own icache lines?

...

...
If so, that means the core will invalidate its own icache and send the DVM message to the other cores, which invalidate their icache lines only if they hold the same lines. But after ICIALLUIS is executed, the other cores may still have valid icache lines, right?

That's not correct. I will check what happens at the bus level, but I guess the Invalidate All, Inner Shareable operation will be a single coherency command sent over CCI.

If so, it makes sense that the other Inner Shareable cores will invalidate all of their icaches after receiving that single command.

...

The cache line size is just used as the stride, so that the cache functions can be optimized.

BTW, A15 TRM 6.3.6 explains what I tried to summarize above.

Thanks for the kind reminder; I think I need to read the A7/A15 TRMs carefully.

Thx, Leo Yan

Lorenzo Pieralisi

11 Mar

2:56 p.m.

On Thu, Mar 07, 2013 at 11:59:25AM +0000, Leo Yan wrote:

...

I'm curious: if the A15 uses the MVA to invalidate the icache, will the DVM message contain both the MVA and the range size? If it ONLY includes the MVA, how would the A7 know that it needs to invalidate a 64-byte range? Otherwise it cannot stay coordinated with the A15.

The A7 cannot know that, since the operation sent over CCI is all about invalidating an MVA; it does not carry a line size with it. This is the reason why IMINLN has to be set up according to the A15 TRM for things to function properly on big.LITTLE systems (A15 TRM 6.3.6).
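
A toy Python model of that point follows. The broadcast carries only an address, and each receiving cache invalidates the line containing it at its own line size. Everything here (the class, the function names, the addresses) is hypothetical and ignores the real CCI/DVM protocol. It only shows why the issuing core's stride, which IMINLN controls, decides whether A7 lines are left stale.

```python
class ICache:
    """Toy I-cache: the set of valid line base addresses."""

    def __init__(self, line_size):
        self.line_size = line_size
        self.valid = set()

    def fill(self, start, end):
        base = start & ~(self.line_size - 1)
        self.valid.update(range(base, end, self.line_size))

    def invalidate_mva(self, mva):
        # Only the MVA arrives; each cache drops the single line that
        # contains it, using its own line size.
        self.valid.discard(mva & ~(self.line_size - 1))


def broadcast_invalidate_range(caches, start, end, stride):
    """Hypothetical model: one by-MVA invalidate per stride, seen by every cache."""
    mva = start & ~(stride - 1)
    while mva < end:
        for cache in caches:
            cache.invalidate_mva(mva)
        mva += stride


def stale_a7_lines(stride):
    """Lines still valid in a 32-byte-line cache after a range invalidate."""
    a15, a7 = ICache(64), ICache(32)
    for cache in (a15, a7):
        cache.fill(0x1000, 0x1100)
    broadcast_invalidate_range([a15, a7], 0x1000, 0x1100, stride)
    return sorted(a7.valid)


print([hex(a) for a in stale_a7_lines(64)])  # ['0x1020', '0x1060', '0x10a0', '0x10e0']
print(stale_a7_lines(32))                    # []
```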

...

I may have made a mistake here. I thought *ICIALLUIS* was a pseudo-instruction, and that the logic would use set/way operations to invalidate the core's icache lines one by one. So how does ICIALLUIS invalidate all of the core's own icache lines?

Do you mean in RTL? :-) It is a HW operation; the processor logic will certainly figure that out.

...

If so, that means the core will invalidate its own icache and send the DVM message to the other cores, which invalidate their icache lines only if they hold the same lines. But after ICIALLUIS is executed, the other cores may still have valid icache lines, right?

No, it is a broadcast operation: all I-caches in the IS domain are invalidated.
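
The behaviour described here for ICIALLUIS can be modelled as one broadcast that empties every I-cache in the Inner Shareable domain, independent of each cache's line size. This is a hypothetical Python model of the architectural effect only. It says nothing about how the hardware or CCI actually implements the operation.

```python
class ICache:
    """Toy I-cache: the set of valid line base addresses."""

    def __init__(self, line_size):
        self.line_size = line_size
        self.valid = set()

    def fill(self, start, end):
        base = start & ~(self.line_size - 1)
        self.valid.update(range(base, end, self.line_size))


def invalidate_all_inner_shareable(caches):
    """Hypothetical model of ICIALLUIS: one broadcast request carrying no
    address or line size; every I-cache in the domain ends up empty."""
    for cache in caches:
        cache.valid.clear()


# Mixed line sizes, as on a big.LITTLE system.
domain = [ICache(64), ICache(64), ICache(32), ICache(32)]
for cache in domain:
    cache.fill(0x1000, 0x1100)

invalidate_all_inner_shareable(domain)
print(all(not cache.valid for cache in domain))  # True
```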

...

If so, it makes sense that the other Inner Shareable cores will invalidate all of their icaches after receiving that single command.

Correct.

...

Thanks for the kind reminder; I think I need to read the A7/A15 TRMs carefully.

You are welcome, Lorenzo

Leo Yan

15 Mar

3:13 a.m.

Thanks a lot, Lorenzo. It's much clearer to me now. :-)

On 03/11/2013 10:56 PM, Lorenzo Pieralisi wrote:

...

Dietmar Eggemann

15 Feb

2:37 p.m.

[dropped linux-arm-kernel@lists.infradead.org]

On 15/02/13 12:06, Jon Medhurst (Tixy) wrote:

...

According to the TRM for TC2, the default value for that register is 0x33330c80, so adding the line "SCC: 0x400 0x33330c00" and incrementing TOTALSCCS does the trick, and the A15s now report an icache line size of 32 bytes.

We'll have to get everyone with a TC2 to make that change then?

Hi,

I tried "SCC: 0x400 0x33330c00" and incremented TOTALSCCS, but it makes no difference for me.

With cpuidle state1 enabled on all CPUs, the system stops after

root@linaro-nano:/sys/kernel/debug/tracing# echo function > current_tracer

When I attach the debugger, I can always see that one A15 has stopped here:

#0 atomic_add_return( v = <Value not available : Failed to read 4 bytes from address S:0x0000001D because Bus error on memory operation.>, i = 1 ) at atomic.h:62
#1 function_trace_call( ip = 3221328812, parent_ip = 3221331569, op = <Value currently has no location>, pt_regs = <Value currently has no location> ) at trace_functions.c:109
#2 [S:0x00001008]

i r
R0    0xC0084E89  3221769865
R1    0x00000001  1
R2    0x00000001  1
R3    0xEF0C4000  4010557440
R4    0xC1BCDE88  3250380424
R5    0x200001D3  536871379
R6    0xC05FB3AC  3227497388
R7    0xC0019E71  3221331569
R8    0xC00193AC  3221328812
R9    0xC0B54EF8  3233107704
R10   0x00000000  0
R11   0x00000000  0
R12   0x00000000  0
SP    0xEF0C5E78  0xEF0C5E78
LR    0xC0084E89  0xC0084E89 <current_thread_info+0x1>
PC    0xC0084E9E  0xC0084E9E <atomic_add_return+0x4>
CPSR  0x200001F3  nzCvq_ge3ge2ge1ge0_inactive_eAIFTj_SVC

disas 0xC0084E9A
S:0xC0084E9A : DMB
S:0xC0084E9E : LDREX r2,[r4]      <--- PC
S:0xC0084EA2 : ADD r2,r2,#1
S:0xC0084EA6 : STREX r1,r2,[r4]
S:0xC0084EAA : TEQ r1,#0
S:0xC0084EAE : BNE {pc}-0x10 ; 0xc0084e9e
S:0xC0084EB0 : DMB

CP15_SCTLR: 0x50C53879 ... C ... disable

Cheers,

-- Dietmar

Lorenzo Pieralisi

3:09 p.m.

On Fri, Feb 15, 2013 at 02:37:49PM +0000, Dietmar Eggemann wrote:

...

CP15_SCTLR: 0x50C53879 ... C ... disable

Looks like code relying on ldrex/strex is being run with the C bit cleared, which is a recipe for disaster.
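
Decoding the SCTLR value from the debugger output shows what "C ... disable" means in context. This sketch takes the bit positions from the ARMv7-A SCTLR layout and decodes only a handful of fields. It shows that the MMU and I-cache are on while the C bit (data cache enable) is clear, which is the state being warned about here.

```python
# ARMv7-A SCTLR fields relevant here: name -> (bit position, meaning).
SCTLR_FIELDS = {
    "M": (0, "MMU enable"),
    "A": (1, "alignment check enable"),
    "C": (2, "data/unified cache enable"),
    "Z": (11, "branch prediction enable"),
    "I": (12, "instruction cache enable"),
    "V": (13, "high exception vectors"),
}


def decode_sctlr(value):
    """Return {field: 0 or 1} for the fields listed in SCTLR_FIELDS."""
    return {name: (value >> bit) & 1 for name, (bit, _desc) in SCTLR_FIELDS.items()}


print(decode_sctlr(0x50C53879))
# {'M': 1, 'A': 0, 'C': 0, 'Z': 1, 'I': 1, 'V': 1}
```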

We should avoid tracing power-down functions. I would have to compile and disassemble the kernel to be more precise; for now I'm just guessing.

Lorenzo

Jon Medhurst (Tixy)

3:18 p.m.

On Fri, 2013-02-15 at 14:37 +0000, Dietmar Eggemann wrote:

...

I tried "SCC: 0x400 0x33330c00" and incremented TOTALSCCS, but it makes no difference for me.

Yes, there are still problems with the Linaro kernel, probably related to the big.LITTLE, cpuidle or cpufreq code, or to the pathways they trigger. The cache line size change fixed things for me when running mainline Linux, which doesn't have all these new features yet.

That said, the whole ftrace design looks a bit suspect to me: we are relying on every single function in the pathway used to implement ftrace being marked 'notrace' (or its file being marked likewise, or the function being implemented in assembler). Also, as Will Deacon has pointed out [1], the Architecture specification doesn't guarantee that the instruction manipulation being done is safe (though we could quite possibly get away with it).

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-December/136441.h...

I was going to carry on trying to debug this issue between my other tasks (it's been a background task for me for a long time).

-- Tixy

linaro-dev@lists.linaro.org

participants (5)

Dave Martin
Dietmar Eggemann
Jon Medhurst (Tixy)
Leo Yan
Lorenzo Pieralisi