Questions For CCI-400's Non-secure Access For big.LITTLE MP


Leo Yan

3 Dec 2012

2:06 a.m.

Hi,

I saw Nico's git tree for developing big.LITTLE cluster power control for MP. In the kernel code, the first man of a cluster needs to enable the CCI port and snooping for that cluster from the non-secure world. The CCI-400 spec says that bit 0 of the Secure Access Register (0x90008) must be set to enable non-secure access to the CCI-400 registers.

On the Fast Model, I added code to the boot-wrapper to set bit 0 of the CCI's Secure Access Register, but after this bit is set the boot-wrapper code cannot switch to hypervisor mode successfully.

Can the CCI's Secure Access Register be used on the Fast Model? The Fast Model version I am currently using is FE000-KT-00002-r7p1-80rel0.tgz; could this be related to the Fast Model version?

Also, could you kindly point me to the boot-wrapper git repository for reference?

-- Thx, Leo Yan


Leo Yan

3 Dec

12:06 p.m.

I found that the key point is to add a "dsb" to ensure that the register write has completed.

Below is the patch against the repo http://git.linaro.org/git-ro/people/pundiramit/boot-wrapper.git:

From: Leo Yan <leoy@marvell.com>
Date: Mon, 3 Dec 2012 19:57:55 +0800
Subject: [PATCH] bootwrapper: enable CCI-400's non-secure access

Signed-off-by: Leo Yan <leoy@marvell.com>
---
 boot.S | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/boot.S b/boot.S
index 60b59a6..eec50d9 100644
--- a/boot.S
+++ b/boot.S
@@ -80,6 +80,10 @@ start:
 	tst	r3, #1
 	bne	0b
 
+	mov	r3, #1			@ enable CCI-400's secure access
+	str	r3, [r4, #0x8]
+	dsb
+
 	@ Set up hvbar so hvc comes back here.
 	ldr	r0, =vectors
 	mov	r7, #0xfffffff0
-- 
1.7.9.5
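For readers who prefer C, the effect of the patch above can be sketched roughly as follows. This is a host-runnable illustration, not boot-wrapper code: the register block is passed in as a plain pointer so the function can be exercised against an ordinary array, the names `cci_enable_ns_access`, `CCI_SECURE_ACCESS_OFFSET` and `CCI_NS_ACCESS_ENABLE` are invented here, and a C11 fence is only a stand-in for the `dsb` instruction. The offset 0x8 and the value 1 are the only details taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Offset and value taken from the patch; the macro names are hypothetical. */
#define CCI_SECURE_ACCESS_OFFSET 0x8u
#define CCI_NS_ACCESS_ENABLE     (1u << 0)

/*
 * Write 1 to the Secure Access Register of the register block at
 * `cci_base`, then issue a full fence so the write is ordered before
 * anything that follows. On the target the fence would be a `dsb`;
 * atomic_thread_fence() is only a host-side stand-in (assumption).
 */
void cci_enable_ns_access(volatile uint32_t *cci_base)
{
    assert(cci_base != 0);
    cci_base[CCI_SECURE_ACCESS_OFFSET / sizeof(uint32_t)] = CCI_NS_ACCESS_ENABLE;
    atomic_thread_fence(memory_order_seq_cst);
}
```

Calling `cci_enable_ns_access()` on a zeroed four-word array leaves the third word (byte offset 0x8) equal to 1 and the others untouched, which mirrors the `str r3, [r4, #0x8]` in the patch.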

Leo Yan

10 Dec

6:48 a.m.

I also have some questions about big.LITTLE coherency. So far my work has been based mainly on the Fast Model, but I found that coherency behavior on the Fast Model is very sensitive.

Here is my experiment on the Fast Model: I used Nico's DCSCB patch to let a core enter LPM. Sometimes, when an A15 core is about to enter the reset state, it needs to flush its L1 cache to the inner shareable PoU; at that point the A7 can very easily get inconsistent data, which causes a kernel panic. After I commented out the A15's flush operations and just let the A15 core spin in an infinite loop, the system was stable and the kernel panic went away. So I believe the A7 core did not get coherent cache data, and that is what causes the kernel panic.

I know that many people on this mailing list are working on big.LITTLE MP and IKS with the Fast Model, so here are some questions:

1. On the Fast Model, do cache coherency and consistency behave the same as on the real TC2 chip?
2. On the Fast Model, once a core has set the SMP bit in the ACTLR register, can we say that the core's cache, for both secure and non-secure, is automatically kept in sync with the other cores and the other cluster?
3. On the Fast Model, if we use the DCSCB, are extra cache/TLB maintenance operations needed?

I would very much appreciate your help.

Thx, Leo Yan

Nicolas Pitre

4:45 p.m.

On Mon, 10 Dec 2012, Leo Yan wrote:

Here I also have some questions about big.LITTLE coherency. So far my work has been based mainly on the Fast Model, but I found that coherency behavior on the Fast Model is very sensitive.

Here is my experiment on the Fast Model: I used Nico's DCSCB patch to let a core enter LPM;

Which one? If you are using the bL_cluster_pm branch, please note that it is lagging behind current development and probably has some bugs. I'd suggest you look at the tc2_pm_api branch which has support for both TC2 and RTSM.

Sometimes, when an A15 core is about to enter the reset state, it needs to flush its L1 cache to the inner shareable PoU; at that point the A7 can very easily get inconsistent data, which causes a kernel panic. After I commented out the A15's flush operations and just let the A15 core spin in an infinite loop, the system was stable and the kernel panic went away. So I believe the A7 core did not get coherent cache data, and that is what causes the kernel panic.

Please report back if you still see this happening with the above branch.

Nicolas

Leo Yan

11 Dec

12:47 a.m.

Nico, thanks for the response.

Yes, I have used the tc2_pm_api patches. One difference, though, is that I have not used your branch directly; instead I migrated your patches into my own kernel.

Here are some things I want to confirm:

1. The boot-wrapper I am using now is http://git.linaro.org/git-ro/arm/models/boot-wrapper.git. Since the boot-wrapper already contains some setup for TrustZone-related registers, its settings are enough for the non-secure world, right?

2. When we use the DCSCB to release a core, do we need to invalidate the I$/D$/TLB or not? For CA9 we need to do that, but for CA7/CA15 we can skip all these operations, right?

3. When we launch the Fast Model, do we need to enable the cluster configuration options l1_dcache-state_modelled/l1_icache-state_modelled/l2_cache-state_modelled? According to Cortex_A15_A7_RTSM_UG.pdf, these options are related to memory attributes and the TLB, but I have not enabled them yet.

4. In dcscb.c, before the core executes "wfi", it flushes the L1 cache twice. The code looks like this: flush_cache_louis(); -> cpu_proc_fin(); -> flush_cache_louis(); -> clear SMP bit; -> wfi(); Is there a special reason for flushing the cache twice? Is it because of the Fast Model?

Thx, Leo Yan

Leo Yan

11:10 a.m.

Hi Nico,

Today I also got a VE TC2 board to try your kernel image. With your kernel image, only one core boots (the first CA15 core). I reviewed the code and saw that the tc2_pm_api branch has changed to use the SPC's bxadd register.

So I wonder whether I need to update the boot monitor. Could you kindly tell me whether the boot sequence has been modified? Can I get the code for the updated sequence (such as the boot monitor)?

Thx, Leo Yan

Leo Yan

13 Dec

2:38 a.m.

Hi Nico,

I used the modeldebugger to dig into the core power-off code and found some issues.

The ARM spec recommends the following flow for core suspend:

1. clear the SCTLR.C bit;
2. flush the L1 cache;
3. clear the ACTLR.SMP bit;
4. dsb;
5. wfi;

In dcscb.c and tc2_pm.c, however, when the core is about to enter LPM, the function *flush_cache_louis()* is used to flush the L1 cache. This function not only flushes the L1 cache by set/way, it also invalidates the I-cache with the following instruction: mcr p15, 0, r0, c7, c1, 0; From my experiments, I believe this instruction introduces unexpected behavior, so that later instructions cannot execute properly. So I manually rewrote the L1 cache flush flow for core power-off (almost the same as *flush_cache_louis()*, but without the I-cache invalidation instruction), and then it was much more stable.

I also believe this is a common issue for both the Fast Model and TC2. What do you think?

Thx, Leo Yan
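The five-step suspend flow listed in this message can be sketched as a small host-side model. This is only an illustration of the ordering, not kernel code: `struct cpu_model`, `cpu_power_down_model` and the step names are invented here, the registers are plain variables, and the flush, `dsb` and `wfi` steps are merely recorded in a trace. The bit positions (SCTLR.C as bit 2, ACTLR.SMP as bit 6 on Cortex-A7/A15) come from the ARM documentation rather than from this thread.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_C   (1u << 2)  /* SCTLR.C: data cache enable */
#define ACTLR_SMP (1u << 6)  /* ACTLR.SMP on Cortex-A7/A15 */

/* Hypothetical step names, in the order given in the message. */
enum step {
    STEP_CLEAR_C, STEP_FLUSH_L1, STEP_CLEAR_SMP, STEP_DSB, STEP_WFI,
    STEP_COUNT
};

/* Plain variables standing in for the real system registers. */
struct cpu_model {
    uint32_t sctlr;
    uint32_t actlr;
    enum step trace[STEP_COUNT];
    int ntrace;
};

static void record(struct cpu_model *cpu, enum step s)
{
    assert(cpu->ntrace < STEP_COUNT);
    cpu->trace[cpu->ntrace++] = s;
}

/* Walk through the recommended order; hardware effects are only traced. */
void cpu_power_down_model(struct cpu_model *cpu)
{
    cpu->sctlr &= ~SCTLR_C;       /* 1. stop allocating into the D-cache */
    record(cpu, STEP_CLEAR_C);
    record(cpu, STEP_FLUSH_L1);   /* 2. target: clean+invalidate L1 by set/way */
    cpu->actlr &= ~ACTLR_SMP;     /* 3. leave the coherency domain */
    record(cpu, STEP_CLEAR_SMP);
    record(cpu, STEP_DSB);        /* 4. target: dsb */
    record(cpu, STEP_WFI);        /* 5. target: wfi */
}
```

After `cpu_power_down_model()` runs on a model whose `sctlr` and `actlr` had both bits set, both bits are clear and the trace holds the five steps in the listed order.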

Nicolas Pitre

3:57 a.m.

Sorry for the delay -- I've been extremely busy lately.

On Thu, 13 Dec 2012, Leo Yan wrote:

On 12/11/2012 08:47 AM, Leo Yan wrote:

Yes, I have used the tc2_pm_api patches. One difference, though, is that I have not used your branch directly; instead I migrated your patches into my own kernel.

Please use the kernel I'm suggesting you to try first. It has been tested and is known to work.

I'd suggest you start from the following repository at this point:

http://git.linaro.org/gitweb?p=landing-teams/working/arm/kernel.git%3Ba=shor...

That's where the latest patches are currently located.

Here are some things I want to confirm:

The boot-wrapper I am using now is http://git.linaro.org/git-ro/arm/models/boot-wrapper.git. Since the boot-wrapper already contains some setup for TrustZone-related registers, its settings are enough for the non-secure world, right?

I can't tell about non secure world. I've always tested RTSM with the kernel running in secure world via a simplistic boot wrapper.

I know that Tixy did test the boot wrapper from that repository above so he might be able to help.

It is still lacking an important patch though. This patch can be found here:

http://article.gmane.org/gmane.linux.linaro.devel/14457

When we use the DCSCB to release a core, do we need to invalidate the I$/D$/TLB or not? For CA9 we need to do that, but for CA7/CA15 we can skip all these operations, right?

The generic kernel code already takes care of invalidating those upon entry in the kernel.

When we launch the Fast Model, do we need to enable the cluster configuration options l1_dcache-state_modelled/l1_icache-state_modelled/l2_cache-state_modelled? According to Cortex_A15_A7_RTSM_UG.pdf, these options are related to memory attributes and the TLB, but I have not enabled them yet.

It is always a good idea to enable those to validate the code for correctness. But things should just work without those options nevertheless.

In dcscb.c, before the core executes "wfi", it flushes the L1 cache twice. The code looks like this: flush_cache_louis(); -> cpu_proc_fin(); -> flush_cache_louis(); -> clear SMP bit; -> wfi(); Is there a special reason for flushing the cache twice? Is it because of the Fast Model?

Here's the comment right before that code:

/*
 * Flush all cache levels for this cluster.
 *
 * A15/A7 can hit in the cache with SCTLR.C=0, so we don't need
 * a preliminary flush here for those CPUs.  At least, that's
 * the theory -- without the extra flush, Linux explodes on
 * RTSM (maybe not needed anymore, to be investigated).
 */

Hi Nico,

Today I also got a VE TC2 board to try your kernel image. With your kernel image, only one core boots (the first CA15 core). I reviewed the code and saw that the tc2_pm_api branch has changed to use the SPC's bxadd register.

So I wonder whether I need to update the boot monitor. Could you kindly tell me whether the boot sequence has been modified? Can I get the code for the updated sequence (such as the boot monitor)?

You do need a new set of firmware files for TC2 to work with this code. Unfortunately, it has not been officially released by ARM yet, although they should release it soon.

Hi Nico,

I used the modeldebugger to dig into the core power-off code and found some issues.

The ARM spec recommends the following flow for core suspend:

1. clear the SCTLR.C bit;
2. flush the L1 cache;
3. clear the ACTLR.SMP bit;
4. dsb;
5. wfi;

In dcscb.c and tc2_pm.c, however, when the core is about to enter LPM, the function *flush_cache_louis()* is used to flush the L1 cache. This function not only flushes the L1 cache by set/way, it also invalidates the I-cache with the following instruction: mcr p15, 0, r0, c7, c1, 0; From my experiments, I believe this instruction introduces unexpected behavior, so that later instructions cannot execute properly. So I manually rewrote the L1 cache flush flow for core power-off (almost the same as *flush_cache_louis()*, but without the I-cache invalidation instruction), and then it was much more stable.

Could you describe what you mean by "more stable" ?

Nicolas

Leo Yan

12:42 p.m.

Nico, I very much appreciate the useful info; I will try the suggested repo. Please see my comments below.


Could you describe what you mean by "more stable" ?

Sorry, I made some mistakes that caused confusion. In short, the test results with your code so far are good, and it does NOT cause the hang.

Here is more detail on the experiments. Since I want to prototype the core power-down flow for big.LITTLE, I launch a thread (not the idle thread) and call the function *bL_cpu_power_down()* to power off the core. For the core's exit from coherency, the experiments are as follows:

Test 1: Your code works well. The steps are: flush_cache_louis(); -> cpu_proc_fin(); -> flush_cache_louis(); -> clear SMP bit -> wfi();

Test 2: I modified the code to: clear SCTLR.C bit -> flush_cache_louis(); -> clear SMP bit -> wfi(); This seems to cause the hang easily. So, as I asked in the previous email, we need to remove the I$ invalidation instruction; after that, this test also passes.

BTW, I wonder whether the flow should be refined to: clear SCTLR.C bit -> call a dedicated L1 cache flush function instead of flush_cache_louis() -> clear SMP bit -> wfi(); This flow would exactly match ARM's recommendation. What do you think?

Thx, Leo Yan
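A dedicated L1 flush of the kind proposed here would loop over every set and way of the data cache and issue one clean-and-invalidate by set/way operation per line, with no I-cache invalidation. The sketch below shows only how the operand for each such operation could be computed, following the ARMv7 set/way encoding (way index in the top bits, set index above the line-offset bits, cache level minus one in bits [3:1]). The function names are invented for illustration, the cache geometry is passed in as parameters rather than read from the cache ID registers, and nothing here is taken from dcscb.c or tc2_pm.c.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest b such that (1 << b) >= n, for 1 <= n <= 2^16. */
static unsigned ceil_log2(uint32_t n)
{
    unsigned bits = 0;
    while ((1u << bits) < n)
        bits++;
    return bits;
}

/*
 * Build the register operand for one set/way cache operation.
 * `level` is 1-based (1 = L1); `ways` and `line_bytes` describe the
 * cache geometry and are assumptions supplied by the caller.
 */
uint32_t setway_operand(unsigned level, uint32_t way, uint32_t set,
                        uint32_t ways, uint32_t line_bytes)
{
    unsigned way_bits, line_shift;
    uint32_t op;

    assert(level >= 1 && level <= 7);
    assert(ways >= 1 && ways <= 65536u && way < ways);
    assert(line_bytes >= 4 && line_bytes <= 65536u);
    assert((line_bytes & (line_bytes - 1u)) == 0);  /* power of two */

    way_bits = ceil_log2(ways);
    line_shift = ceil_log2(line_bytes);

    op = (set << line_shift) | ((uint32_t)(level - 1u) << 1);
    if (way_bits > 0)               /* a direct-mapped cache has no way field */
        op |= way << (32u - way_bits);
    return op;
}
```

For example, with an illustrative geometry of 2 ways and 64-byte lines, way 1 of set 0 in L1 gives the operand 0x80000000. On the target, each operand would be passed to the clean-and-invalidate by set/way operation (`mcr p15, 0, <Rt>, c7, c14, 2`), followed by a `dsb` once the loop finishes.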


linaro-kernel@lists.linaro.org
