These patches fix and tweak various cache settings for the 4460 resulting in a speed increase exceeding 10% in some tests.
Mans Rullgard (5):
  OMAP4: apply L2 cache lockdown workaround only on 4460 ES1.0
  OMAP4: enable double linefill on 4460
  OMAP4: fix PL310 prefetch offset setting
  OMAP4: set PL310 prefetch offset to 3
  OMAP4: do not force workarounds for errata fixed in 4460

 arch/arm/mach-omap2/Kconfig        |    3 ---
 arch/arm/mach-omap2/omap4-common.c |   25 ++++++++-----------------
 2 files changed, 8 insertions(+), 20 deletions(-)
The issue addressed by this workaround is fixed in ES1.1.
Signed-off-by: Mans Rullgard mans.rullgard@linaro.org
---
 arch/arm/mach-omap2/omap4-common.c |   12 ++----------
 1 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index 5df70fb..e0120d5 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -190,25 +190,17 @@ static int __init omap_l2_cache_init(void)
 		omap_smc1(0x113, por_ctrl);
 	}

-	if (cpu_is_omap446x()) {
-		writel_relaxed(0xa5a5, l2cache_base + 0x900);
-		writel_relaxed(0xa5a5, l2cache_base + 0x908);
-		writel_relaxed(0xa5a5, l2cache_base + 0x904);
-		writel_relaxed(0xa5a5, l2cache_base + 0x90C);
-	}
-
 	/*
-	 * FIXME: Temporary WA for OMAP4460 stability issue.
+	 * Workaround for OMAP4460 ES1.0 stability issue.
 	 * Lock-down specific L2 cache ways which makes effective
 	 * L2 size as 512 KB instead of 1 MB
 	 */
-	if (cpu_is_omap446x()) {
+	if (omap_rev() == OMAP4460_REV_ES1_0) {
 		lockdown = 0xa5a5;
 		writel_relaxed(lockdown, l2cache_base + L2X0_LOCKDOWN_WAY_D0);
 		writel_relaxed(lockdown, l2cache_base + L2X0_LOCKDOWN_WAY_D1);
 		writel_relaxed(lockdown, l2cache_base + L2X0_LOCKDOWN_WAY_I0);
 		writel_relaxed(lockdown, l2cache_base + L2X0_LOCKDOWN_WAY_I1);
-		goto skip_aux_por_api;
 	}

 skip_aux_por_api:
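
As a rough illustration of why the 0xa5a5 lockdown mask halves the cache, the sketch below counts the locked ways and derives the effective size. It assumes a 16-way, 1 MB L2 with one lockdown bit per way; the helper names locked_ways() and effective_l2_kb() are made up for this example and do not exist in the kernel.

```c
#include <stdint.h>

/* Assumption: 16-way, 1 MB L2 cache, one lockdown bit per way. */
#define L2_NUM_WAYS 16u
#define L2_TOTAL_KB 1024u

/* Hypothetical helper: number of ways a lockdown mask removes from allocation. */
static unsigned int locked_ways(uint32_t lockdown)
{
	unsigned int n = 0;

	lockdown &= (1u << L2_NUM_WAYS) - 1u;
	while (lockdown) {
		n += lockdown & 1u;
		lockdown >>= 1;
	}
	return n;
}

/* Hypothetical helper: cache capacity still available for new allocations. */
static unsigned int effective_l2_kb(uint32_t lockdown)
{
	return L2_TOTAL_KB * (L2_NUM_WAYS - locked_ways(lockdown)) / L2_NUM_WAYS;
}
```

With these assumptions, 0xa5a5 has eight bits set, so eight of the sixteen ways stay usable, which matches the "512 KB instead of 1 MB" in the comment.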
Re-enable the PL310 double linefill feature on 4460, disabled in 285d2c4, without setting the "reserved" bit 25 of the prefetch control register. Benchmarking shows no measurable difference with and without this bit set.
Signed-off-by: Mans Rullgard mans.rullgard@linaro.org
---
 arch/arm/mach-omap2/omap4-common.c |   10 ++++------
 1 files changed, 4 insertions(+), 6 deletions(-)

diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index e0120d5..03b13e3 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -175,16 +175,14 @@ static int __init omap_l2_cache_init(void)

 	/* Setup POR Control register */
 	por_ctrl = readl_relaxed(l2cache_base + L2X0_PREFETCH_CTRL);
-#if 0
+
 	/*
 	 * Double linefill is available only on OMAP4460 L2X0.
-	 * Undocumented bit 25 is set for better performance.
 	 */
 	if (cpu_is_omap446x())
-		por_ctrl |= ((1 << L2X0_PREFETCH_DATA_PREFETCH_SHIFT) |
-			(1 << L2X0_PREFETCH_DOUBLE_LINEFILL_SHIFT) |
-			(1 << 25));
-#endif
+		por_ctrl |= (1 << L2X0_PREFETCH_DATA_PREFETCH_SHIFT) |
+			(1 << L2X0_PREFETCH_DOUBLE_LINEFILL_SHIFT);
+
 	if (cpu_is_omap446x() || (omap_rev() >= OMAP4430_REV_ES2_2)) {
 		por_ctrl |= L2X0_POR_OFFSET_VALUE;
 		omap_smc1(0x113, por_ctrl);
The old value needs to be cleared before inserting the new one.
Signed-off-by: Mans Rullgard mans.rullgard@linaro.org
---
 arch/arm/mach-omap2/omap4-common.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index 03b13e3..be74f78 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -184,6 +184,7 @@ static int __init omap_l2_cache_init(void)
 			(1 << L2X0_PREFETCH_DOUBLE_LINEFILL_SHIFT);

 	if (cpu_is_omap446x() || (omap_rev() >= OMAP4430_REV_ES2_2)) {
+		por_ctrl &= ~0x1f;
 		por_ctrl |= L2X0_POR_OFFSET_VALUE;
 		omap_smc1(0x113, por_ctrl);
 	}
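
To see why the old offset must be cleared first, here is a minimal standalone sketch of the read-modify-write on the 5-bit offset field (bits [4:0], the range the patch masks with ~0x1f). The function name set_prefetch_offset() is hypothetical, and the starting value 0x9 is only an illustration of a register that already holds a non-zero offset.

```c
#include <stdint.h>

/* Offset field occupies bits [4:0], matching the ~0x1f mask in the patch. */
#define POR_OFFSET_MASK 0x1fu

/* Hypothetical helper: replace the offset field, leave all other bits alone. */
static uint32_t set_prefetch_offset(uint32_t por_ctrl, uint32_t offset)
{
	por_ctrl &= ~POR_OFFSET_MASK;         /* drop whatever offset was there */
	por_ctrl |= offset & POR_OFFSET_MASK; /* insert the new one */
	return por_ctrl;
}
```

If the field already held 0x9, a bare OR with 0x3 would yield 0xb rather than 0x3; with the mask applied first, the result is the requested offset and the upper control bits are untouched.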
According to the PL310 TRM, 9 is not a valid value for this field, and benchmarking shows slightly better results with a value of 3.
Signed-off-by: Mans Rullgard mans.rullgard@linaro.org
---
 arch/arm/mach-omap2/omap4-common.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/arm/mach-omap2/omap4-common.c b/arch/arm/mach-omap2/omap4-common.c
index be74f78..8f33746 100644
--- a/arch/arm/mach-omap2/omap4-common.c
+++ b/arch/arm/mach-omap2/omap4-common.c
@@ -32,7 +32,7 @@
 #include "omap4-sar-layout.h"

 #ifdef CONFIG_CACHE_L2X0
-#define L2X0_POR_OFFSET_VALUE	0x9
+#define L2X0_POR_OFFSET_VALUE	0x3
 void __iomem *l2cache_base;
 #endif
Signed-off-by: Mans Rullgard mans.rullgard@linaro.org
---
 arch/arm/mach-omap2/Kconfig |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig
index fe18e27..427d27b 100644
--- a/arch/arm/mach-omap2/Kconfig
+++ b/arch/arm/mach-omap2/Kconfig
@@ -44,9 +44,6 @@ config ARCH_OMAP4
 	select CPU_V7
 	select ARM_GIC
 	select LOCAL_TIMERS if SMP
-	select PL310_ERRATA_588369
-	select PL310_ERRATA_727915
-	select ARM_ERRATA_720789
 	select ARCH_HAS_OPP
 	select PM_OPP if PM
 	select USB_ARCH_HAS_EHCI
On 11/22/2011 10:45 AM, Somebody in the thread at some point said:
Signed-off-by: Mans Rullgardmans.rullgard@linaro.org
arch/arm/mach-omap2/Kconfig | 3 --- 1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index fe18e27..427d27b 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -44,9 +44,6 @@ config ARCH_OMAP4 select CPU_V7 select ARM_GIC select LOCAL_TIMERS if SMP
- select PL310_ERRATA_588369
- select PL310_ERRATA_727915
- select ARM_ERRATA_720789 select ARCH_HAS_OPP select PM_OPP if PM select USB_ARCH_HAS_EHCI
Mans thanks for bringing these to my attention.
From what I can figure out from Santosh's comments I want the two PL310 patches. I already had my own patch for limiting 4460 ES1.0 cache workaround to that revision, although I'll happily deprecate it when yours turns up in mainline.
For this particular patch, although the errata are fixed in 4460, CONFIG_ARCH_OMAP4 is also there for 4430. Am I right to take from your patch name that we do need these workarounds on 4430?
-Andy
On 22 November 2011 10:30, Andy Green andy.green@linaro.org wrote:
On 11/22/2011 10:45 AM, Somebody in the thread at some point said:
Signed-off-by: Mans Rullgardmans.rullgard@linaro.org
arch/arm/mach-omap2/Kconfig | 3 --- 1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index fe18e27..427d27b 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -44,9 +44,6 @@ config ARCH_OMAP4 select CPU_V7 select ARM_GIC select LOCAL_TIMERS if SMP
- select PL310_ERRATA_588369
- select PL310_ERRATA_727915
- select ARM_ERRATA_720789
select ARCH_HAS_OPP select PM_OPP if PM select USB_ARCH_HAS_EHCI
Mans thanks for bringing these to my attention.
From what I can figure out from Santosh's comments I want the two PL310 patches. I already had my own patch for limiting 4460 ES1.0 cache workaround to that revision, although I'll happily deprecate it when yours turns up in mainline.
Mainline has nothing at all for this, nor does linux-omap.
For this particular patch although it's fixed in 4460 CONFIG_ARCH_OMAP4 is also there for 4430. Am I right to take from your patch name we do need these workarounds on 4430?
Yes, the workarounds are needed on 4430. Ideally any workarounds which can be applied selectively at runtime should be done that way. If I have time, I might have a look at that, but no promises.
On 22 November 2011 12:22, Mans Rullgard mans.rullgard@linaro.org wrote:
On 22 November 2011 10:30, Andy Green andy.green@linaro.org wrote:
On 11/22/2011 10:45 AM, Somebody in the thread at some point said:
Signed-off-by: Mans Rullgardmans.rullgard@linaro.org
arch/arm/mach-omap2/Kconfig | 3 --- 1 files changed, 0 insertions(+), 3 deletions(-)
diff --git a/arch/arm/mach-omap2/Kconfig b/arch/arm/mach-omap2/Kconfig index fe18e27..427d27b 100644 --- a/arch/arm/mach-omap2/Kconfig +++ b/arch/arm/mach-omap2/Kconfig @@ -44,9 +44,6 @@ config ARCH_OMAP4 select CPU_V7 select ARM_GIC select LOCAL_TIMERS if SMP
- select PL310_ERRATA_588369
- select PL310_ERRATA_727915
- select ARM_ERRATA_720789
select ARCH_HAS_OPP select PM_OPP if PM select USB_ARCH_HAS_EHCI
Mans thanks for bringing these to my attention.
From what I can figure out from Santosh's comments I want the two PL310 patches. I already had my own patch for limiting 4460 ES1.0 cache workaround to that revision, although I'll happily deprecate it when yours turns up in mainline.
Mainline has nothing at all for this, nor does linux-omap.
For this particular patch although it's fixed in 4460 CONFIG_ARCH_OMAP4 is also there for 4430. Am I right to take from your patch name we do need these workarounds on 4430?
Yes, the workarounds are needed on 4430.
Correction, 588369 is not needed on 4430 ES2.x. It was fixed in PL310 r2p0 which is used there. I can't find a TRM for ES1.x so can't check the PL310 version it uses (don't have a device either).
Mans,
On Tue, Nov 22, 2011 at 8:15 AM, Mans Rullgard mans.rullgard@linaro.org wrote:
These patches fix and tweak various cache settings for the 4460 resulting in a speed increase exceeding 10% in some tests.
Mans Rullgard (5): OMAP4: apply L2 cache lockdown workaround only on 4460 ES1.0
This one is OK, though the Pandas were supposed to be made with es1.1, and es1.0 was not supposed to be supported. The WA is not foolproof and you still might see corruption with this. Hence for mainline, we have decided not to push this patch.
OMAP4: enable double linefill on 4460
Don't do that. We found a new errata around this recently which demanded to disable the DLF. Signature of the failure is full cache line getting corrupted. The errata should soon go into the ARM documentation. So this one too isn't useful even though it improves benchmarks.
OMAP4: fix PL310 prefetch offset setting OMAP4: set PL310 prefetch offset to 3
You can combine above two if possible and it can go to mainline.
OMAP4: do not force workarounds for errata fixed in 4460
I agree, though with a single defconfig (omap2plus) it's hard to have such distinctions, since most of the ARM errata WA are static configurations.
On 22 November 2011 05:14, Shilimkar, Santosh santosh.shilimkar@ti.com wrote:
Mans,
On Tue, Nov 22, 2011 at 8:15 AM, Mans Rullgard mans.rullgard@linaro.org wrote:
These patches fix and tweak various cache settings for the 4460 resulting in a speed increase exceeding 10% in some tests.
Mans Rullgard (5): OMAP4: apply L2 cache lockdown workaround only on 4460 ES1.0
This one is OK, though the Pandas were supposed to be made with es1.1, and es1.0 was not supposed to be supported. The WA is not foolproof and you still might see corruption with this. Hence for mainline, we have decided not to push this patch.
Well, currently the tilt kernel applies this to all 4460 versions, twice even. This patch makes it do the right thing on both 1.0 and 1.1.
OMAP4: enable double linefill on 4460
Don't do that. We found a new errata around this recently which demanded to disable the DLF. Signature of the failure is full cache line getting corrupted. The errata should soon go into the ARM documentation. So this one too isn't useful even though it improves benchmarks.
Do you have an erratum number for this?
OMAP4: fix PL310 prefetch offset setting OMAP4: set PL310 prefetch offset to 3
You can combine above two if possible and it can go to mainline.
OMAP4: do not force workarounds for errata fixed in 4460
I agree though with single defconfig (omap2plus), it's hard to have such distinctions since most of the ARM errata WA are static configurations.
So keep them on by default but allow them to be turned off. In the longer term, we should of course try to make these selectively applied at runtime whenever possible.
On Tue, Nov 22, 2011 at 6:02 PM, Mans Rullgard mans.rullgard@linaro.org wrote:
On 22 November 2011 05:14, Shilimkar, Santosh santosh.shilimkar@ti.com wrote:
Mans,
On Tue, Nov 22, 2011 at 8:15 AM, Mans Rullgard mans.rullgard@linaro.org wrote:
These patches fix and tweak various cache settings for the 4460 resulting in a speed increase exceeding 10% in some tests.
Mans Rullgard (5): OMAP4: apply L2 cache lockdown workaround only on 4460 ES1.0
This one is OK, though the Pandas were supposed to be made with es1.1, and es1.0 was not supposed to be supported. The WA is not foolproof and you still might see corruption with this. Hence for mainline, we have decided not to push this patch.
Well, currently the tilt kernel applies this to all 4460 versions, twice even. This patch makes it do the right thing on both 1.0 and 1.1.
I see. If it's for Linaro internal tree it's fine.
OMAP4: enable double linefill on 4460
Don't do that. We found a new errata around this recently which demanded to disable the DLF. Signature of the failure is full cache line getting corrupted. The errata should soon go into the ARM documentation. So this one too isn't useful even though it improves benchmarks.
Do you have an erratum number for this?
This was a very recent bug and has not yet made it into the public errata numbers. Most likely the next PL310 errata update will have this one documented.
OMAP4: fix PL310 prefetch offset setting OMAP4: set PL310 prefetch offset to 3
You can combine above two if possible and it can go to mainline.
OMAP4: do not force workarounds for errata fixed in 4460
I agree though with single defconfig (omap2plus), it's hard to have such distinctions since most of the ARM errata WA are static configurations.
So keep them on by default but allow them to be turned off. In the longer term, we should of course try to make these selectively applied at runtime whenever possible.
Yep. That's the idea. On internal product kernels we do disable the ones which are N/A for a chip.
Regards Santosh
On 22 November 2011 12:57, Shilimkar, Santosh santosh.shilimkar@ti.com wrote:
On Tue, Nov 22, 2011 at 6:02 PM, Mans Rullgard mans.rullgard@linaro.org wrote:
On 22 November 2011 05:14, Shilimkar, Santosh santosh.shilimkar@ti.com wrote:
OMAP4: enable double linefill on 4460
Don't do that. We found a new errata around this recently which demanded to disable the DLF. Signature of the failure is full cache line getting corrupted. The errata should soon go into the ARM documentation. So this one too isn't useful even though it improves benchmarks.
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Regards, Richard W.
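
A rough sketch of how the revision check described above could look if double linefill were decided at runtime. The RTL release encodings (low six bits of the PL310 cache ID register, 0x07 for r3p1-50rel0 and 0x08 for r3p2) are assumptions written from memory of the Linux L2X0 headers, and dlf_allowed() is a hypothetical name, not an existing kernel function.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encodings of the RTL release field in the PL310 cache ID register. */
#define PL310_RTL_MASK        0x3fu
#define PL310_RTL_R3P1_50REL0 0x07u /* assumption: the 4460's PL310 */
#define PL310_RTL_R3P2        0x08u /* assumption: first release without 752271 */

/*
 * Hypothetical helper: allow double linefill only on releases that the
 * expanded erratum 752271 does not cover, i.e. r3p2 and later.
 */
static bool dlf_allowed(uint32_t cache_id)
{
	return (cache_id & PL310_RTL_MASK) >= PL310_RTL_R3P2;
}
```

Under these assumptions a 4460 reporting r3p1-50rel0 would keep double linefill off, while a later part with r3p2 could turn it on.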
On 27 November 2011 18:18, Woodruff, Richard r-woodruff2@ti.com wrote:
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Found the updated text, most annoying. It really does help performance.
On Mon, Nov 28, 2011 at 1:53 AM, Mans Rullgard mans.rullgard@linaro.org wrote:
On 27 November 2011 18:18, Woodruff, Richard r-woodruff2@ti.com wrote:
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Found the updated text, most annoying. It really does help performance.
You have already seen the updated text. Indeed, disabling DLF does affect a few synthetic benchmarks.
Regards Santosh
On Sun, Nov 27, 2011 at 8:18 PM, Woodruff, Richard r-woodruff2@ti.com wrote:
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Thanks a lot. Your posts are very informative as usual.
By the way, do you know whether it is safe to use "SCU Speculative linefills" with Cortex-A9 r2pX and PL310 r3pX? http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
As a quick and dirty test, it can be enabled in 'arch/arm/kernel/smp_scu.c' by just setting extra (1 << 3) bit in SCU Control Register from 'scu_enable' function. In my tests, this seems to reduce L2 cache access latency a bit with an overall ~1.5% performance improvement at least for 7zip data compression. This is not a huge boost, but still would be nice to have unless some errata prevent this feature from being enabled.
Synthetic random read latency benchmark (extra overhead caused by L1 cache misses) and also a bit more realistic p7zip benchmark results on origenboard (Exynos 4210 @1.2GHz) are listed below.
=== SCU Speculative linefills disabled ===
block size : random read access time
      1024 :    0.0 ns
      2048 :    0.0 ns
      4096 :    0.0 ns
      8192 :    0.0 ns
     16384 :    0.0 ns
     32768 :    0.1 ns
     65536 :    9.1 ns
    131072 :   13.8 ns
    262144 :   19.0 ns
    524288 :   21.6 ns
   1048576 :   31.0 ns
   2097152 :   86.2 ns
   4194304 :  117.2 ns
   8388608 :  134.5 ns
  16777216 :  146.4 ns
  33554432 :  155.9 ns
  67108864 :  164.7 ns
# ./7za b -mmt=1
7-Zip (A) 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)
RAM size: 477 MB, # CPU hardware threads: 2 RAM usage: 419 MB, # Benchmark threads: 1
Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:     752   100    730    732  |    12665   100   1146   1143
23:     736   100    750    750  |    12469   100   1141   1141
24:     717   100    770    771  |    12289   100   1139   1140
25:     694   100    793    792  |    12103   100   1138   1138
----------------------------------------------------------------
Avr:          100    761    761               100   1141   1141
Tot:          100    951    951
7-Zip (A) 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)
RAM size: 477 MB, # CPU hardware threads: 2 RAM usage: 419 MB, # Benchmark threads: 1
Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:     751   100    730    730  |    12675   100   1144   1144
23:     736   100    750    750  |    12483   100   1143   1143
24:     716   100    770    770  |    12300   100   1142   1141
25:     693   100    792    792  |    12113   100   1139   1139
----------------------------------------------------------------
Avr:          100    760    760               100   1142   1142
Tot:          100    951    951
=== SCU Speculative linefills enabled ===
block size : random read access time
      1024 :    0.0 ns
      2048 :    0.0 ns
      4096 :    0.0 ns
      8192 :    0.0 ns
     16384 :    0.0 ns
     32768 :    0.1 ns
     65536 :    7.5 ns
    131072 :   11.1 ns
    262144 :   16.3 ns
    524288 :   19.0 ns
   1048576 :   33.2 ns
   2097152 :   82.9 ns
   4194304 :  113.1 ns
   8388608 :  130.5 ns
  16777216 :  143.2 ns
  33554432 :  151.6 ns
  67108864 :  160.3 ns
# ./7za b -mmt=1
7-Zip (A) 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)
RAM size: 479 MB, # CPU hardware threads: 2 RAM usage: 419 MB, # Benchmark threads: 1
Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:     764   100    742    743  |    12721   100   1150   1148
23:     746   100    761    760  |    12535   100   1147   1147
24:     726   100    781    781  |    12362   100   1145   1147
25:     704   100    804    804  |    12170   100   1145   1144
----------------------------------------------------------------
Avr:          100    772    772               100   1147   1147
Tot:          100    959    959
7-Zip (A) 9.20 Copyright (c) 1999-2010 Igor Pavlov 2010-11-18 p7zip Version 9.20 (locale=C,Utf16=off,HugeFiles=on,2 CPUs)
RAM size: 479 MB, # CPU hardware threads: 2 RAM usage: 419 MB, # Benchmark threads: 1
Dict        Compressing          |        Decompressing
      Speed Usage    R/U Rating  |    Speed Usage    R/U Rating
       KB/s     %   MIPS   MIPS  |     KB/s     %   MIPS   MIPS

22:     764   100    745    744  |    12732   100   1148   1149
23:     747   100    761    762  |    12542   100   1149   1148
24:     728   100    783    783  |    12366   100   1147   1147
25:     706   100    806    806  |    12179   100   1145   1145
----------------------------------------------------------------
Avr:          100    774    773               100   1147   1147
Tot:          100    960    960
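
For anyone who wants to turn the 7-Zip ratings above into percentages, a small sketch follows. percent_gain() is a hypothetical helper; the inputs in the note are the "Avr" and "Tot" ratings from the first run of each configuration.

```c
/* Hypothetical helper: relative change of a higher-is-better rating, in percent. */
static double percent_gain(double before, double after)
{
	return (after - before) / before * 100.0;
}
```

Fed the compression average (761 disabled, 772 enabled) this comes out a little under 1.5%, and the total rating (951 to 959) a little under 1%, so the "~1.5%" figure appears to refer mainly to the compression side.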
On 27 November 2011 21:30, Siarhei Siamashka siarhei.siamashka@gmail.com wrote:
On Sun, Nov 27, 2011 at 8:18 PM, Woodruff, Richard r-woodruff2@ti.com wrote:
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Thanks a lot. Your posts are very informative as usual.
By the way, do you know whether it is safe to use "SCU Speculative linefills" with Cortex-A9 r2pX and PL310 r3pX? http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
As a quick and dirty test, it can be enabled in 'arch/arm/kernel/smp_scu.c' by just setting extra (1 << 3) bit in SCU Control Register from 'scu_enable' function.
The SCU is already enabled when that function runs (don't know what enables it), so you'll need to remove the early return to make any changes. I'm not aware of any errata affecting this feature on 4460, nor have I seen any bad behaviour while running with it enabled, and I do get a slight performance increase in Libav benchmarks.
On Mon, Nov 28, 2011 at 12:18 AM, Mans Rullgard mans.rullgard@linaro.org wrote:
On 27 November 2011 21:30, Siarhei Siamashka siarhei.siamashka@gmail.com wrote:
On Sun, Nov 27, 2011 at 8:18 PM, Woodruff, Richard r-woodruff2@ti.com wrote:
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Thanks a lot. Your posts are very informative as usual.
By the way, do you know whether it is safe to use "SCU Speculative linefills" with Cortex-A9 r2pX and PL310 r3pX? http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
As a quick and dirty test, it can be enabled in 'arch/arm/kernel/smp_scu.c' by just setting extra (1 << 3) bit in SCU Control Register from 'scu_enable' function.
The SCU is already enabled when that function runs (don't know what enables it), so you'll need to remove the early return to make any changes.
Do you mean SCU is enabled and has "SCU Speculative linefills enable" bit already set on OMAP4460? Or just SCU is enabled without speculative linefills?
In my case (origenboard), 'scu_enable' function seems to be called just once and the value in SCU Control Register is originally 0x00000000. So SCU gets enabled without speculative linefills when using the current linaro u-boot and kernel. And I thought that speculative linefills might be not enabled on purpose. But I guess this is actually a question for Samsung folks.
I'm not aware of any errata affecting this feature on 4460, nor have I seen any bad behaviour while running with it enabled, and I do get a slight performance increase in Libav benchmarks.
Thanks, that's good to know.
On 27 November 2011 23:36, Siarhei Siamashka siarhei.siamashka@gmail.com wrote:
On Mon, Nov 28, 2011 at 12:18 AM, Mans Rullgard mans.rullgard@linaro.org wrote:
On 27 November 2011 21:30, Siarhei Siamashka siarhei.siamashka@gmail.com wrote:
On Sun, Nov 27, 2011 at 8:18 PM, Woodruff, Richard r-woodruff2@ti.com wrote:
From: linaro-dev-bounces@lists.linaro.org [mailto:linaro-dev- bounces@lists.linaro.org] On Behalf Of Mans Rullgard
Do you have an erratum number for this?
This was very recent BUG and not yet made it to the public errata numbers. Most likely next PL310 errata update should have this one documented.
Do you have _any_ identifier for it?
ARM expanded errata 752271 to cover DLF not working till r3p2 in errata version 13.1 (21 Nov 11), 4460 is r3p1-50rel0 and is impacted.
Thanks a lot. Your posts are very informative as usual.
By the way, do you know whether it is safe to use "SCU Speculative linefills" with Cortex-A9 r2pX and PL310 r3pX? http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
As a quick and dirty test, it can be enabled in 'arch/arm/kernel/smp_scu.c' by just setting extra (1 << 3) bit in SCU Control Register from 'scu_enable' function.
The SCU is already enabled when that function runs (don't know what enables it), so you'll need to remove the early return to make any changes.
Do you mean SCU is enabled and has "SCU Speculative linefills enable" bit already set on OMAP4460? Or just SCU is enabled without speculative linefills?
On the OMAP4, the SCU is enabled but speculative linefills are not. To enable speculative linefills, I simply removed the early return from that function.
In my case (origenboard),
If talking about a different chip in a thread titled OMAP4, it is a good idea to mention this.
Be careful. That chip has PL310 r3p0 so it's affected by erratum 729806, "Speculative reads from the Cortex-A9 MPCore processor can cause deadlock".
'scu_enable' function seems to be called just once and the value in SCU Control Register is originally 0x00000000. So SCU gets enabled without speculative linefills when using the current linaro u-boot and kernel. And I thought that speculative linefills might be not enabled on purpose. But I guess this is actually a question for Samsung folks.
It is off by default in the hardware. That code is shared by all ARM MPCore systems, so it's probably best not to turn it on unless explicitly requested somehow.
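
The early-return behaviour discussed above can be modelled in a few lines of plain C. This is only a sketch of the control-register arithmetic, not the actual scu_enable() code: the function names and the want_spec opt-in parameter are hypothetical, and bit 3 is taken from the (1 << 3) mentioned earlier in the thread.

```c
#include <stdint.h>

#define SCU_CTRL_ENABLE        (1u << 0)
#define SCU_CTRL_SPEC_LINEFILL (1u << 3) /* bit quoted in the thread */

/* Models the early return: an already-enabled SCU is left exactly as found. */
static uint32_t scu_ctrl_early_return(uint32_t ctrl, int want_spec)
{
	if (ctrl & SCU_CTRL_ENABLE)
		return ctrl;
	ctrl |= SCU_CTRL_ENABLE;
	if (want_spec)
		ctrl |= SCU_CTRL_SPEC_LINEFILL;
	return ctrl;
}

/* Models the variant without the early return: the opt-in always takes effect. */
static uint32_t scu_ctrl_no_early_return(uint32_t ctrl, int want_spec)
{
	ctrl |= SCU_CTRL_ENABLE;
	if (want_spec)
		ctrl |= SCU_CTRL_SPEC_LINEFILL;
	return ctrl;
}
```

On a system where the SCU is already enabled before the function runs (control value 0x1), the first variant never sets bit 3, while on one that starts from 0x00000000 both variants behave the same. Keeping the feature behind an explicit opt-in follows the suggestion not to enable it for every MPCore system.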
From: Mans Rullgard [mailto:mans.rullgard@linaro.org] Sent: Sunday, November 27, 2011 6:26 PM
Hi,
By the way, do you know whether it is safe to use "SCU Speculative linefills" with Cortex-A9 r2pX and PL310 r3pX?
http://infocenter.arm.com/help/topic/com.arm.doc.ddi0407f/BABEBFBH.html
As a quick and dirty test, it can be enabled in 'arch/arm/kernel/smp_scu.c' by just setting extra (1 << 3) bit in SCU Control Register from 'scu_enable' function.
<snip>
Be careful. That chip has PL310 r3p0 so it's affected by erratum 729806, "Speculative reads from the Cortex-A9 MPCore processor can cause deadlock".
For OMAP4s it should be OK as long as the other necessary errata workarounds are activated in code today. However, as Mans points out, other partner chips might have an issue.
Your benchmark results are interesting. I did have a couple threads with an expert on this point and your result matches. Impact depends on data set size of use case and configured speed of pl310 logic.
It was explained that when 1 processor requests shared data (all Linux-SMP memory is marked with S-bit), the SCU can signal a hint to PL310 to start a parallel L2-tag lookup while SCU-snoop-tag is checked (to see if coherency L1-cache-2-cache transfer needs to happen). In case of a snoop-cache-miss, you are now 2-3 cycles ahead of the game into the PL310 cache-tag lookup. If the data is in other processor you would have burned some power with needless lookup and perhaps (depending on resource load) delayed some valid request.
Regards, Richard W.
On 11/22/2011 08:57 PM, Somebody in the thread at some point said:
On Tue, Nov 22, 2011 at 6:02 PM, Mans Rullgardmans.rullgard@linaro.org wrote:
On 22 November 2011 05:14, Shilimkar, Santoshsantosh.shilimkar@ti.com wrote:
Mans,
On Tue, Nov 22, 2011 at 8:15 AM, Mans Rullgardmans.rullgard@linaro.org wrote:
These patches fix and tweak various cache settings for the 4460 resulting in a speed increase exceeding 10% in some tests.
Mans Rullgard (5): OMAP4: apply L2 cache lockdown workaround only on 4460 ES1.0
This one is OK, though the Pandas were supposed to be made with es1.1, and es1.0 was not supposed to be supported. The WA is not foolproof and you still might see corruption with this. Hence for mainline, we have decided not to push this patch.
Well, currently the tilt kernel applies this to all 4460 versions, twice even. This patch makes it do the right thing on both 1.0 and 1.1.
I see. If it's for Linaro internal tree it's fine.
Just an FYI: the TI LT tree is the basis for customer FOSS releases from TI.
What the TI LT tree does for 4460 support is cobbled together and stolen from other trees in various states of completion. It's workable at the moment, but any input from Mans or anyone else for making it better is super welcome.
So keep them on by default but allow them to be turned off. In the longer term, we should of course try to make these selectively applied at runtime whenever possible.
Yep. That's the idea. On internal product kernels we do disable once which are NA for a chip.
Issue is that for TI LT kernels, we target 4430 and 4460 support in one build. We can't statically turn off stuff that 4430 needs.
-Andy
On Wed, Nov 23, 2011 at 6:55 AM, Andy Green andy.green@linaro.org wrote:
On 11/22/2011 08:57 PM, Somebody in the thread at some point said:
On Tue, Nov 22, 2011 at 6:02 PM, Mans Rullgardmans.rullgard@linaro.org wrote:
On 22 November 2011 05:14, Shilimkar, Santoshsantosh.shilimkar@ti.com wrote:
Mans,
On Tue, Nov 22, 2011 at 8:15 AM, Mans Rullgardmans.rullgard@linaro.org wrote:
These patches fix and tweak various cache settings for the 4460 resulting in a speed increase exceeding 10% in some tests.
Mans Rullgard (5): OMAP4: apply L2 cache lockdown workaround only on 4460 ES1.0
This one is OK, though the Pandas were supposed to be made with es1.1, and es1.0 was not supposed to be supported. The WA is not foolproof and you still might see corruption with this. Hence for mainline, we have decided not to push this patch.
Well, currently the tilt kernel applies this to all 4460 versions, twice even. This patch makes it do the right thing on both 1.0 and 1.1.
I see. If it's for Linaro internal tree it's fine.
Just a FYI TI LT tree is basis for customer for FOSS release from TI.
What TI LT tree does for 4460 support is cobbled together and stolen from other trees in various states of completion, it's workable at the moment but any input from Mans or anyone else for making it better is super welcome.
So keep them on by default but allow them to be turned off. In the longer term, we should of course try to make these selectively applied at runtime whenever possible.
Yep. That's the idea. On internal product kernels we do disable once which are NA for a chip.
Issue is that for TI LT kernels, we target 4430 and 4460 support in one build. We can't statically turn off stuff that 4430 needs.
We know, and hence in the mainline tree all of them are enabled so that both OMAP4430 and OMAP4460 work.
On 24 November 2011 12:38, Shilimkar, Santosh santosh.shilimkar@ti.com wrote:
On Wed, Nov 23, 2011 at 6:55 AM, Andy Green andy.green@linaro.org wrote:
On 11/22/2011 08:57 PM, Somebody in the thread at some point said:
On Tue, Nov 22, 2011 at 6:02 PM, Mans Rullgardmans.rullgard@linaro.org wrote:
So keep them on by default but allow them to be turned off. In the longer term, we should of course try to make these selectively applied at runtime whenever possible.
Yep. That's the idea. On internal product kernels we do disable once which are NA for a chip.
Issue is that for TI LT kernels, we target 4430 and 4460 support in one build. We can't statically turn off stuff that 4430 needs.
We know and hence in mainline tree all of them are enabled so that both OMAP4430 and OMAP4460 works.
Checking the manuals more carefully, it seems like 588369 does not affect 4430 either, having been fixed in PL310 r2p0. For 727915, it would be possible to replace outer_cache.set_debug by an empty function for r3p1 (4460), thus avoiding the monitor call if running on such a chip.
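
A userspace sketch of the idea for 727915: pick the set_debug hook once, based on the PL310 release, so that fixed parts never pay for the monitor call. Everything here is illustrative. The struct is a stand-in for the kernel's outer cache operations, the RTL encodings (0x05 for r3p0, 0x06 for r3p1) are assumptions from memory, and the counter merely stands in for the secure monitor call.

```c
#include <stdint.h>

/* Stand-in for the kernel's outer cache ops; only the hook of interest. */
struct outer_cache_ops {
	void (*set_debug)(unsigned long val);
};

#define PL310_RTL_R3P1 0x06u /* assumption: encoding of r3p1 */

/* Counts simulated monitor calls so the effect of the choice is observable. */
static unsigned long monitor_calls;

static void set_debug_via_monitor(unsigned long val)
{
	(void)val;
	monitor_calls++; /* the real code would issue the secure monitor call here */
}

static void set_debug_noop(unsigned long val)
{
	(void)val; /* erratum fixed on this release: nothing to do */
}

/* Hypothetical helper: choose the hook once at init time. */
static void pick_set_debug(struct outer_cache_ops *ops, uint32_t rtl_release)
{
	ops->set_debug = (rtl_release >= PL310_RTL_R3P1) ? set_debug_noop
							  : set_debug_via_monitor;
}
```

With this shape a single kernel image could still serve both chips: the 4430 keeps the workaround path, and the 4460 gets the empty function.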