From: Kevin Hilman khilman@linaro.org
The option CONFIG_EXYNOS5420_MCPM is causing imprecise external aborts during boot testing, causing various userspace startup failures.
Disable until it has gotten more testing.
Cc: Kukjin Kim kgene.kim@samsung.com,
Cc: Javier Martinez Canillas javier.martinez@collabora.co.uk,
Cc: Sachin Kamat sachin.kamat@samsung.com,
Cc: Doug Anderson dianders@chromium.org,
Cc: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com,
Cc: Krzysztof Kozlowski k.kozlowski@samsung.com,
Cc: Tushar Behera tushar.behera@linaro.org,
Cc: stable@vger.kernel.org # v3.17+
Signed-off-by: Kevin Hilman khilman@linaro.org
---
This has been reported by a few people[1], but not investigated or fixed, so it's time to disable this feature until it can be fixed.
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/288344....
 arch/arm/configs/exynos_defconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/configs/exynos_defconfig b/arch/arm/configs/exynos_defconfig
index 72058b8a6f4d..a250dcbf34cd 100644
--- a/arch/arm/configs/exynos_defconfig
+++ b/arch/arm/configs/exynos_defconfig
@@ -10,7 +10,7 @@ CONFIG_MODULE_UNLOAD=y
 CONFIG_PARTITION_ADVANCED=y
 CONFIG_ARCH_EXYNOS=y
 CONFIG_ARCH_EXYNOS3=y
-CONFIG_EXYNOS5420_MCPM=y
+CONFIG_EXYNOS5420_MCPM=n
 CONFIG_SMP=y
 CONFIG_BIG_LITTLE=y
 CONFIG_BL_SWITCHER=y
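Editorial sketch, not part of the original patch: the change above flips one defconfig line from `=y` to `=n`. Kconfig-generated files usually record a disabled option as `# CONFIG_FOO is not set` instead, so a script that checks whether a given config leaves MCPM enabled probably wants to treat both spellings as "disabled". The helper below is a minimal illustration of that check; the function name `option_state` and the sample strings are hypothetical, and it only looks at bool/tristate style lines.

```python
import re


def option_state(config_text, option):
    """Return 'y', 'm', 'n' or 'unset' for one option in a .config/defconfig.

    Both 'CONFIG_FOO=n' and '# CONFIG_FOO is not set' count as 'n'.
    If the option appears more than once, the last occurrence wins.
    """
    state = "unset"
    assign = re.compile(r"^%s=(.*)$" % re.escape(option))
    not_set = "# %s is not set" % option
    for raw in config_text.splitlines():
        line = raw.strip()
        if line == not_set:
            state = "n"
            continue
        match = assign.match(line)
        if match:
            state = match.group(1)
    return state


# Hypothetical excerpts modelled on the hunk above.
BEFORE = "CONFIG_ARCH_EXYNOS3=y\nCONFIG_EXYNOS5420_MCPM=y\nCONFIG_SMP=y\n"
AFTER = "CONFIG_ARCH_EXYNOS3=y\nCONFIG_EXYNOS5420_MCPM=n\nCONFIG_SMP=y\n"

print(option_state(BEFORE, "CONFIG_EXYNOS5420_MCPM"))
print(option_state(AFTER, "CONFIG_EXYNOS5420_MCPM"))
```

Anchoring the match on the full option name followed by `=` keeps a longer option with the same prefix from being mistaken for this one.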
Kevin Hilman wrote:
[...]
Hi Kevin,
Yeah I agree with your opinion.
But as you can see in my tree, I've queued the MCPM-related patches for 3.19, and they will show up in -next this weekend. Anyway, let me apply this to -fixes, and then let's re-enable it in a couple of days, after testing its functionality in -next.
Thanks, Kukjin
Kukjin Kim kgene@kernel.org writes:
[...]
But as you can see my tree, I've queued regarding mcpm patches for 3.19 will be shown in -next in this weekend.
Which of the recently queued patches are expected to address the imprecise abort issue? I'd be happy to test them out.
Anyway let me apply this into -fixes and then let's enable after test its functionality in -next in a couple of days.
Yes, I think this needs to be applied until these aborts are understood and fixed.
Thanks,
Kevin
Kukjin,
On Mon, Nov 10, 2014 at 11:35 AM, Kevin Hilman khilman@kernel.org wrote:
[...]
Which of the recently queued patches are expected to address the imprecise abort issue? I'd be happy to test them out.
Exynos5 MCPM is still broken in linux-next and still causing an imprecise abort.
What is the status of $SUBJECT patch?
Anyway let me apply this into -fixes and then let's enable after test its functionality in -next in a couple of days.
Yes, I think this needs to be applied until these aborts are understood and fixed.
Is anyone at Samsung actually looking into these MCPM issues?
Kevin
On Mon, Nov 24, 2014 at 11:51 AM, Kevin Hilman khilman@kernel.org wrote:
[...]
Is anyone at Samsung actually looking into these MCPM issues?
Hi Kevin,
What hardware are you having problems with? 5420 or 5422/5800?
-Olof
On Mon, Nov 24, 2014 at 4:25 PM, Olof Johansson olof@lixom.net wrote:
[...]
Hi Kevin,
What hardware are you having problems with? 5420 or 5422/5800?
Yes. :)
exynos5420-arndale-octa: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
exynos5422-odroid-xu3: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
My boot tests seem to pass fine because I have such a minimal userspace, but Tyler Baker reported that with a "real" userspace, he can't boot to a shell:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....
Kevin
On Mon, Nov 24, 2014 at 5:35 PM, Kevin Hilman khilman@kernel.org wrote:
[...]
exynos5420-arndale-octa: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig... exynos5422-odroid-xu3: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
My boot tests seem to pass fine because I have such a minimal userspace, but Tyler Baker reported that with a "real" userspace, he can't boot to a shell:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....
I'm not surprised that 5420 has issues, but I have not seen any external aborts on either Chromebook that I have in my farm.
Sounds like the secondary cpus should be disabled on those device trees instead, doesn't it?
-Olof
On Mon, Nov 24, 2014 at 5:37 PM, Olof Johansson olof@lixom.net wrote:
[...]
I'm not surprised that 5420 has issues, but I have not seen any external aborts on neither Chromebook that I have in my farm.
Sounds like the secondary cpus should be disabled on those device trees instead, doesn't it?
Er, cluster, not cpus.
-Olof
Olof Johansson wrote:
[...]
exynos5420-arndale-octa: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
arndale-octa.html
exynos5422-odroid-xu3: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
odroid-xu3.html
My boot tests seem to pass fine because I have such a minimal userspace, but Tyler Baker reported that with a "real" userspace, he can't boot to a shell:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....
Hmm... his report was from September. I think it should be fine with current -next?
To be honest, I don't have the Exynos5420 Arndale or a Chromebook, only an SMDK board, which has a different bootloader, so I couldn't test it. I'll try to set up a test farm like you guys have.
> I'm not surprised that 5420 has issues, but I have not seen any
Sorry.
> external aborts on neither Chromebook that I have in my farm.
> Sounds like the secondary cpus should be disabled on those device trees instead, doesn't it?
> Er, cluster, not cpus.
- Kukjin
On Mon, Nov 24, 2014 at 5:50 PM, Kukjin Kim kgene@kernel.org wrote:
[...]
Hmm...his report was in Sep...I think it should be fine with current -next?
No, it is still broken in linux-next (as I stated above.)
Moreover, earlier in this thread you mentioned you were merging some MCPM patches that should address this, but you did not respond when I asked which patches you think should address this issue.
To be honest, since I don't have the exynos5420 arndale, chromebook...but smdk which has different bootloader, I couldn't test it...I'll try to make a test farm like you guys...
Do you have some colleagues with any other 542x hardware? I had assumed that linux-next was being better tested on publicly and widely available boards like the odroid-xu3 and Chromebook2, but I've come to realize the hard way that that is not the case. You mention your board has a different bootloader. Do you suspect there's a bootloader issue on these other platforms? If so, could you elaborate on possible fixes? I'm more than willing to test any proposed fixes, but I'm not familiar enough yet with these SoCs to figure out the underlying issues alone.
Until you have a working board farm, you could start by taking a closer look at the boot logs we're already producing. Admittedly, linux-next is currently broken for exynos in many ways besides this one, but it has been having these imprecise aborts since well before the other recent issues.
Also, it's very possible that this issue is not MCPM related at all, and that MCPM is just uncovering a previously hidden bug. It would be very helpful if people more familiar with this hardware and SoC would investigate bug reports like these.
Kevin
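Editorial sketch, not part of the original thread: Kevin's suggestion to look more closely at the existing boot logs could be partly automated. The snippet below scans a log for the imprecise external abort message and for the number of CPUs brought up. The two message patterns and the sample log are assumptions about what such a boot log contains, not text taken from the linked logs, so they would need adjusting against real output.

```python
import re

# Assumed message patterns; adjust them to match the real boot logs.
ABORT_RE = re.compile(r"imprecise external abort")
CPUS_RE = re.compile(r"Brought up (\d+) CPUs?")


def summarize_boot_log(log_text):
    """Count abort messages and report how many CPUs the log says came up."""
    aborts = 0
    cpus_online = None  # stays None if the log never reports a CPU count
    for line in log_text.splitlines():
        if ABORT_RE.search(line):
            aborts += 1
        match = CPUS_RE.search(line)
        if match:
            cpus_online = int(match.group(1))
    return {"imprecise_aborts": aborts, "cpus_online": cpus_online}


# Hypothetical log excerpt, for illustration only.
SAMPLE_LOG = """\
[    0.100000] Brought up 4 CPUs
[    3.200000] Unhandled fault: imprecise external abort (0x1406) at 0x00000000
[    3.900000] Unhandled fault: imprecise external abort (0x1406) at 0x00000000
"""

print(summarize_boot_log(SAMPLE_LOG))
```

Running this per board, once with MCPM enabled and once with it disabled, would make it easier to compare abort counts and CPU bring-up across configurations.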
Hello Kevin,
On Tue, Nov 25, 2014 at 8:50 AM, Kevin Hilman khilman@kernel.org wrote:
[...]
> Do you have some colleagues with any other 542x hardware? I had assumed that linux-next was being better tested on the publicaly available, and widely available boards like odroid-xu3 and Chromebook2, but I've come to realize the hard way that that is not
Are you seeing this on Chromebook2 (Peach-Pi 5800) too?
> the case. You mention your board has a different bootloader. Do you suspect there's a bootloader issue on these other platforms? If so, could you elaborate on possible fixes? I'm more than willing to test any proposed fixes, but I'm not familiar enough yet with these SoCs to figure out the underlying issues alone.
> Until you have a working board farm, you could start having a closer look at the boot logs we're already producing. Admittedly linux-next broken in many ways besides this one for exynos currently, but it has been having these imprecise aborts well before the other recent issues.
> Also, It's very possible that this issue is not even MCPM related at all, and MCPM is just uncovering a previously hidden bug. It would be very helpful if people more familiar with this hardware and SoC would investigate bug reports like these.
The 3 boards I have access to (SMDK5420, Chromebook Peach-Pi and Chromebook Peach-Pit) work fine with MCPM enabled. I am not sure why it is failing only on the above-mentioned boards, as there is nothing specific to them in the MCPM back-end.
I assume that when you default to platsmp (by disabling MCPM), the non-working boards boot all cores up to userspace without any issues?
Based on the timeline (problems started about 2.5 months back), there have only been a couple of changes in the 5420 MCPM back-end. Could you revert the following commits and check if things improve?
20fe6f9 ARM: EXYNOS: Support cluster power off on exynos5420/5800
fbb0499 ARM: 8083/1: exynos: activate the CCI on boot CPU/cluster using the MCPM loopback
These might not revert cleanly, so instead of the above you could also comment out the following 2 lines:

diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
index dc9a764..9a07188 100644
--- a/arch/arm/mach-exynos/mcpm-exynos.c
+++ b/arch/arm/mach-exynos/mcpm-exynos.c
@@ -152,7 +152,7 @@ static void exynos_power_down(void)
 		exynos_cpu_power_down(cpunr);
 
 		if (exynos_cluster_unused(cluster)) {
-			exynos_cluster_power_down(cluster);
+			//exynos_cluster_power_down(cluster);
 			last_man = true;
 		}
 	} else if (cpu_use_count[cpu][cluster] == 1) {
@@ -356,8 +356,8 @@ static int __init exynos_mcpm_init(void)
 	ret = mcpm_platform_register(&exynos_power_ops);
 	if (!ret)
 		ret = mcpm_sync_init(exynos_pm_power_up_setup);
-	if (!ret)
-		ret = mcpm_loopback(exynos_cache_off); /* turn on the CCI */
+	//if (!ret)
+	//ret = mcpm_loopback(exynos_cache_off); /* turn on the CCI */
 	if (ret) {
 		iounmap(ns_sram_base_addr);
 		return ret;
If you still get aborts then I suspect that the problem is with the bootloader configuration but am not sure. I am OK with disabling 5420_MCPM in the default configuration in such a case. This would however mean that S2R also stops working by default on 5420.
Regards, Abhilash
Kevin
linux-arm-kernel mailing list linux-arm-kernel@lists.infradead.org http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
Hi Abhilash,
Abhilash Kesavan kesavan.abhilash@gmail.com writes:
[...]
To be honest, since I don't have the exynos5420 arndale, chromebook...but smdk which has different bootloader, I couldn't test it...I'll try to make a test farm like you guys...
Do you have some colleagues with any other 542x hardware? I had assumed that linux-next was being better tested on the publicaly available, and widely available boards like odroid-xu3 and Chromebook2, but I've come to realize the hard way that that is not
Are you seeing this on Chromebook2 (Peach-Pi 5800) too ?
No, it seems that my exynos5800-peach-pi is not having this problem, which suggests it's a bootloader setup issue.
the case. You mention your board has a different bootloader. Do you suspect there's a bootloader issue on these other platforms? If so, could you elaborate on possible fixes? I'm more than willing to test any proposed fixes, but I'm not familiar enough yet with these SoCs to figure out the underlying issues alone.
Until you have a working board farm, you could start having a closer look at the boot logs we're already producing. Admittedly linux-next broken in many ways besides this one for exynos currently, but it has been having these imprecise aborts well before the other recent issues.
Also, It's very possible that this issue is not even MCPM related at all, and MCPM is just uncovering a previously hidden bug. It would be very helpful if people more familiar with this hardware and SoC would investigate bug reports like these.
The 3 boards I have access to (SMDK5420, Chromebook Peach-Pi and Chromebook Peach-Pit) work fine with MCPM enabled.
Thanks for helping look into this.
I am not sure why it is failing only on the above mentioned boards as there is nothing specific to them in the MCPM back-end.
I assume that when you default to platsmp (on disabling MCPM), the non-working boards boot all cores upto userspace without any issues ?
Nope. With MCPM disabled:
- 5420/arndale-octa: CPU0-3 come up (A15s)
- 5422/odroid-xu3: only CPU0 (A7)
- 5800/peach-pi: only CPU0 (A15)
Note that with MCPM enabled, the arndale-octa gets the same result. Peach-pi on the other hand gets all 8 CPUs, and the odroid-xu3 only gets 6/8 CPUs (see other thread on that topic.)
Based on the timeline (problems started about 2.5 months back), there have only been a couple of changes in the 5420 MCPM back-end. Could you revert the following commits and check if things improve.
20fe6f9 ARM: EXYNOS: Support cluster power off on exynos5420/5800 fbb0499 ARM: 8083/1: exynos: activate the CCI on boot CPU/cluster using the MCPM loopback
These might not revert cleanly, so instead of the above you could also comment the following 2 lines:
diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
index dc9a764..9a07188 100644
--- a/arch/arm/mach-exynos/mcpm-exynos.c
+++ b/arch/arm/mach-exynos/mcpm-exynos.c
@@ -152,7 +152,7 @@ static void exynos_power_down(void)
 		exynos_cpu_power_down(cpunr);
 
 		if (exynos_cluster_unused(cluster)) {
-			exynos_cluster_power_down(cluster);
+			//exynos_cluster_power_down(cluster);
 			last_man = true;
 		}
 	} else if (cpu_use_count[cpu][cluster] == 1) {
@@ -356,8 +356,8 @@ static int __init exynos_mcpm_init(void)
 	ret = mcpm_platform_register(&exynos_power_ops);
 	if (!ret)
 		ret = mcpm_sync_init(exynos_pm_power_up_setup);
-	if (!ret)
-		ret = mcpm_loopback(exynos_cache_off); /* turn on the CCI */
+	//if (!ret)
+	//ret = mcpm_loopback(exynos_cache_off); /* turn on the CCI */
 	if (ret) {
 		iounmap(ns_sram_base_addr);
 		return ret;
If you still get aborts then I suspect that the problem is with the bootloader configuration but am not sure.
Nice. With those lines commented out, the arndale-octa is not getting imprecise aborts anymore, and this is the platform where those aborts seem to prevent booting into a full userspace (as originally reported by Tyler.)
More specifically, with only the loopback call to turn off CCI commented out, the imprecise aborts go away.
The odroid-xu3 is still getting them, but these seem to happen whether or not MCPM is enabled, so it must be a different issue related to the bootloader setup.
I am OK with disabling 5420_MCPM in the default configuration in such a case. This would however mean that S2R also stops working by default on 5420.
Disabling the option isn't my first choice either, I would rather see this issue debugged and fixed by folks that are more familiar with MCPM on Exynos.
Kevin
Hi Kevin,
On Wed, Nov 26, 2014 at 6:30 AM, Kevin Hilman khilman@kernel.org wrote:
Hi Abhilash,
Abhilash Kesavan kesavan.abhilash@gmail.com writes:
[...]
To be honest, since I don't have the exynos5420 arndale, chromebook...but smdk which has different bootloader, I couldn't test it...I'll try to make a test farm like you guys...
Do you have some colleagues with any other 542x hardware? I had assumed that linux-next was being better tested on the publicly and widely available boards like odroid-xu3 and Chromebook2, but I've come to realize the hard way that that is not
Are you seeing this on Chromebook2 (Peach-Pi 5800) too ?
No, it seems that my exynos5800-peach-pi is not having this problem, which suggests it's a bootloader setup issue.
the case. You mention your board has a different bootloader. Do you suspect there's a bootloader issue on these other platforms? If so, could you elaborate on possible fixes? I'm more than willing to test any proposed fixes, but I'm not familiar enough yet with these SoCs to figure out the underlying issues alone.
Until you have a working board farm, you could start having a closer look at the boot logs we're already producing. Admittedly linux-next is broken in many ways besides this one for exynos currently, but it has been having these imprecise aborts well before the other recent issues.
[...]
Nice. With those lines commented out, the arndale-octa is not getting imprecise aborts anymore, and this is the platform where those aborts seem to prevent booting into a full userspace (as originally reported by Tyler.)
More specifically, with only the loopback call to turn off CCI commented out, the imprecise aborts go away.
I can't see how enabling snoops for the boot cluster is causing these aborts. Perhaps, as Krzysztof commented, it has something to do with the secure firmware/TZ software on these boards? Other than that, there does not appear to be any difference between the working and non-working setups.
Abhilash
Abhilash Kesavan kesavan.abhilash@gmail.com writes:
[...]
I can't see how enabling snoops for the boot cluster is causing these aborts. Perhaps, as Krzysztof commented, it has something to do with the secure firmware/TZ software on these boards? Other than that, there does not appear to be any difference between the working and non-working setups.
Perhaps the secure firmware is preventing the CCI from being enabled by the kernel, and that is causing the imprecise abort?
Is there a way to update/replace the BL1/BL2/TZ firmware blobs with something that is known to be working better?
Kevin
On 11/27/14 02:56, Kevin Hilman wrote:
[...]
Perhaps the secure firmware is preventing the CCI from being enabled by the kernel, and that is causing the imprecise abort?
Is there a way to update/replace the BL1/BL2/TZ firmware blobs with something that is known to be working better?
It seems the problem you mentioned is due to a different bootloader, as I commented before. However, I think releasing bootloader images (BL1, BL2 and so on) should be handled by the board manufacturer, not the SoC vendor, even though the images are provided to the manufacturer by the vendor. To be honest, I'm not sure what procedure would have to be followed on the Samsung side for now, because we, including Abhilash, just belong to the development team. It will need some time and I can't confirm anything, sorry. Let us try.
BTW, Kevin, do you know the current version of the bootloader images on the boards?
Thanks, Kukjin
On Wed, 26 Nov 2014, Kevin Hilman wrote:
Abhilash Kesavan kesavan.abhilash@gmail.com writes:
Hi Kevin,
On Wed, Nov 26, 2014 at 6:30 AM, Kevin Hilman khilman@kernel.org wrote:
[...]
More specifically, with only the loopback call to turn off CCI commented out, the imprecise aborts go away.
I can't see how enabling snoops for the boot cluster is causing these aborts. Perhaps, as Krzysztof commented, it has something to do with the secure firmware/TZ software on these boards? Other than that, there does not appear to be any difference between the working and non-working setups.
Perhaps the secure firmware is preventing the CCI from being enabled by the kernel, and that is causing the imprecise abort?
That is well possible.
Now... if the bootloader/firmware does not let Linux deal with both the CCI and caches then MCPM simply has no more purpose for this board. The whole point of MCPM is actually to handle the CCI properly and in the most efficient way despite all the possible races and opportunities for memory corruption. And yes, this is a complex task.
So there are actually two choices: the firmware lets Linux take care of it via the MCPM layer (easy), or the firmware has to implement it all _properly_ (hard) behind an interface such as PSCI, at which point MCPM should be configured out.
If the firmware does not let Linux interact with the CCI _and_ does not implement full MCPM-like services then the platform is broken and only a firmware upgrade could fix that. It might still be possible to boot all CPUs through other means, but power management would then be severely limited.
Nicolas
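The three cases described above (firmware leaves the CCI to Linux, firmware implements everything behind PSCI, or neither) can be sketched as a small userspace C model. This is only an illustration: `struct fw_caps`, `enum pm_backend` and `choose_backend()` are hypothetical names, not kernel interfaces, and giving MCPM precedence when both capabilities are present is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what a board's firmware permits; not a kernel type. */
struct fw_caps {
	bool linux_controls_cci;	/* non-secure world may program the CCI and caches */
	bool implements_psci;		/* firmware offers complete PSCI-style power services */
};

enum pm_backend {
	PM_MCPM,	/* Linux handles CCI/cluster power through the MCPM layer */
	PM_PSCI,	/* firmware handles it all; MCPM should be configured out */
	PM_LIMITED,	/* neither: CPUs may still boot, power management is restricted */
};

/* Pick a back-end following the reasoning in the message above. */
static enum pm_backend choose_backend(const struct fw_caps *caps)
{
	assert(caps != NULL);

	if (caps->linux_controls_cci)
		return PM_MCPM;		/* the "easy" option */
	if (caps->implements_psci)
		return PM_PSCI;		/* the "hard" option, done in firmware */
	return PM_LIMITED;		/* only a firmware upgrade could improve this */
}
```

Under this model, a board whose firmware blocks CCI access and offers no PSCI services ends up in `PM_LIMITED`, which may be the situation suspected for the failing boards in this thread.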
Hi Kevin,
On Thu, Nov 27, 2014 at 12:11 AM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
[...]
If the firmware does not let Linux interact with the CCI _and_ does not implement full MCPM-like services then the platform is broken and only a firmware upgrade could fix that. It might still be possible to boot all CPUs through other means, but power management would then be severely limited.
How about restricting the MCPM initialization to only known working boards like the Chromebooks and SMDK? This would be better than disabling the config altogether in exynos_defconfig. The non-working boards would then default to platsmp. Assuming that the firmware handles all CCI/cache activities, platsmp may then work for secondary core boot-up?
Can you please apply the diff below and test the non-working boards with CONFIG_EXYNOS5420_MCPM enabled?
diff --git a/arch/arm/mach-exynos/mcpm-exynos.c b/arch/arm/mach-exynos/mcpm-exynos.c
index b0d3c2e..34d77bb 100644
--- a/arch/arm/mach-exynos/mcpm-exynos.c
+++ b/arch/arm/mach-exynos/mcpm-exynos.c
@@ -316,8 +316,9 @@ static void __init exynos_cache_off(void)
 }

 static const struct of_device_id exynos_dt_mcpm_match[] = {
-	{ .compatible = "samsung,exynos5420" },
-	{ .compatible = "samsung,exynos5800" },
+	{ .compatible = "samsung,smdk5420" },
+	{ .compatible = "google,pi" },
+	{ .compatible = "google,pit" },
 	{},
 };
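To illustrate what the proposed table change does, here is a minimal userspace C model of matching a board's compatible list against a sentinel-terminated table. It is a sketch, not the kernel's OF matching code: `struct match_entry` and `board_matches()` are hypothetical stand-ins, and only the compatible strings in the two tables are taken from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for a match table entry; the real struct has more fields. */
struct match_entry {
	const char *compatible;
};

/* Table before the proposed change: matches at SoC level. */
static const struct match_entry soc_level_match[] = {
	{ "samsung,exynos5420" },
	{ "samsung,exynos5800" },
	{ NULL },	/* sentinel */
};

/* Table after the proposed change: matches only the listed boards. */
static const struct match_entry board_level_match[] = {
	{ "samsung,smdk5420" },
	{ "google,pi" },
	{ "google,pit" },
	{ NULL },	/* sentinel */
};

/*
 * Return true if any string in the board's NULL-terminated compatible
 * list equals any entry in the table.
 */
static bool board_matches(const struct match_entry *table,
			  const char *const *board_compat)
{
	assert(table != NULL && board_compat != NULL);

	for (; table->compatible; table++)
		for (const char *const *c = board_compat; *c; c++)
			if (strcmp(table->compatible, *c) == 0)
				return true;
	return false;
}
```

With the SoC-level table, any board whose compatible list includes "samsung,exynos5420" matches, whichever bootloader it ships with. With the board-level table, a board that lists only its own board string plus the SoC string no longer matches, so under the proposal it would fall back to platsmp.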
On a different note, I did not see any mainline support for the Odroid XU3. Are you testing this board with a non-mainline kernel?
Regards, Abhilash
On Thu, 27 Nov 2014, Abhilash Kesavan wrote:
[...]
If the CCI is not accessible on some board, I'd much prefer that the device tree file for that board be modified instead, by marking the CCI as unavailable.
[...]
Nicolas
Hi Nicolas,
On Thu, Nov 27, 2014 at 10:36 PM, Nicolas Pitre nicolas.pitre@linaro.org wrote:
[...]
I will post a patch disabling CCI for Arndale-Octa.
Abhilash
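The device-tree approach relies on the usual "status" convention: a node counts as available when it has no status property, or when the property is "okay" or "ok". The userspace C sketch below models that rule. `node_is_available()` and `mcpm_should_init()` are hypothetical names, and it is an assumption here that the MCPM back-end declines to initialize when the CCI node is unavailable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Model of the device-tree availability rule.
 * status: value of the node's "status" property, or NULL if it is absent.
 */
static bool node_is_available(const char *status)
{
	if (status == NULL)
		return true;	/* no status property means available */
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}

/*
 * Assumed policy: only bring up MCPM when the CCI node is available,
 * so a board file can opt out by marking the CCI node "disabled".
 */
static bool mcpm_should_init(const char *cci_status)
{
	return node_is_available(cci_status);
}
```

In this model a board file that sets the CCI node's status to "disabled" opts that board out, while boards that leave the node untouched keep MCPM, without any board list in mcpm-exynos.c.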
On 26/11/14 18:41, Nicolas Pitre wrote:
[...]
Thanks Nico for the detailed description of the requirements for using MCPM. This is the kind of issue I was worried about in the other thread on the Fujitsu platform. That's the reason I was asking for information about their secure firmware and what exactly it configures, so that we don't end up in a similar situation there too, and definitely not to push PSCI. I completely agree with you that making a small change in the firmware to give control of the CCI to the kernel is easy.
Probably, if the vendors decline to apply this small fix to the firmware, we should offer them the *only choice* of a PSCI implementation, which is quite complex and easy to get wrong. That might prompt them to provide a small fix to use MCPM.
Regards, Sudeep
On pon, 2014-11-24 at 19:20 -0800, Kevin Hilman wrote:
On Mon, Nov 24, 2014 at 5:50 PM, Kukjin Kim kgene@kernel.org wrote:
Olof Johansson wrote:
On Mon, Nov 24, 2014 at 5:37 PM, Olof Johansson olof@lixom.net wrote:
On Mon, Nov 24, 2014 at 5:35 PM, Kevin Hilman khilman@kernel.org wrote:
On Mon, Nov 24, 2014 at 4:25 PM, Olof Johansson olof@lixom.net wrote:
On Mon, Nov 24, 2014 at 11:51 AM, Kevin Hilman khilman@kernel.org wrote:
Is anyone at Samsung actually looking into these MCPM issues?
Hi Kevin,
What hardware are you having problems with? 5420 or 5422/5800?
Yes. :)
exynos5420-arndale-octa: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
arndale-octa.html
exynos5422-odroid-xu3: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
odroid-xu3.html
My boot tests seem to pass fine because I have such a minimal userspace, but Tyler Baker reported that with a "real" userspace, he can't boot to a shell:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....
Hmm...his report was in Sep...I think it should be fine with current -next?
No, it is still broken in linux-next (as I stated above.)
Moreover, earlier in this thread you mentioned you were merging some MCPM patches that should address this, but did not respond when I asked which patches you think should address this issue.
To be honest, since I don't have the exynos5420 arndale, chromebook...but smdk which has different bootloader, I couldn't test it...I'll try to make a test farm like you guys...
Do you have some colleagues with any other 542x hardware? I had assumed that linux-next was being better tested on the publicly and widely available boards like odroid-xu3 and Chromebook2, but I've come to realize the hard way that that is not the case. You mention your board has a different bootloader. Do you suspect there's a bootloader issue on these other platforms? If so, could you elaborate on possible fixes? I'm more than willing to test any proposed fixes, but I'm not familiar enough yet with these SoCs to figure out the underlying issues alone.
Until you have a working board farm, you could start having a closer look at the boot logs we're already producing. Admittedly linux-next is broken in many ways besides this one for exynos currently, but it has been having these imprecise aborts well before the other recent issues.
Also, it's very possible that this issue is not even MCPM related at all, and MCPM is just uncovering a previously hidden bug. It would be very helpful if people more familiar with this hardware and SoC would investigate bug reports like these.
Interesting thing can be found in exynos5420.dtsi:

	mdma1: mdma@11C10000 {
		...
		/*
		 * MDMA1 can support both secure and non-secure
		 * AXI transactions. When this is enabled in the kernel
		 * for boards that run in secure mode, we are getting
		 * imprecise external aborts causing the kernel to oops.
		 */
		status = "disabled";
	};
I am booting Arndale Octa on some other config and exynos. However with or without MCPM the imprecise aborts are still present (but not fatal, shell comes up).
My board boots also under secure firmware (I am using Linaro's ubuntu image). Maybe that is the cause?
Best regards, Krzysztof
On wto, 2014-11-25 at 09:47 +0100, Krzysztof Kozlowski wrote:
On pon, 2014-11-24 at 19:20 -0800, Kevin Hilman wrote:
On Mon, Nov 24, 2014 at 5:50 PM, Kukjin Kim kgene@kernel.org wrote:
Olof Johansson wrote:
On Mon, Nov 24, 2014 at 5:37 PM, Olof Johansson olof@lixom.net wrote:
On Mon, Nov 24, 2014 at 5:35 PM, Kevin Hilman khilman@kernel.org wrote:
On Mon, Nov 24, 2014 at 4:25 PM, Olof Johansson olof@lixom.net wrote:
> On Mon, Nov 24, 2014 at 11:51 AM, Kevin Hilman khilman@kernel.org wrote:
>>
>> Is anyone at Samsung actually looking into these MCPM issues?
>
> Hi Kevin,
>
> What hardware are you having problems with? 5420 or 5422/5800?
Yes. :)
exynos5420-arndale-octa: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
arndale-octa.html
exynos5422-odroid-xu3: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
odroid-xu3.html
My boot tests seem to pass fine because I have such a minimal userspace, but Tyler Baker reported that with a "real" userspace, he can't boot to a shell:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....
Hmm...his report was in Sep...I think it should be fine with current -next?
No, it is still broken in linux-next (as I stated above.)
Moreover, earlier in this thread you mentioned you were merging some MCPM patches that should address this, but did not respond when I asked which patches you think should address this issue.
To be honest, since I don't have the exynos5420 arndale, chromebook...but smdk which has different bootloader, I couldn't test it...I'll try to make a test farm like you guys...
Do you have some colleagues with any other 542x hardware? I had assumed that linux-next was being better tested on the publicly and widely available boards like odroid-xu3 and Chromebook2, but I've come to realize the hard way that that is not the case. You mention your board has a different bootloader. Do you suspect there's a bootloader issue on these other platforms? If so, could you elaborate on possible fixes? I'm more than willing to test any proposed fixes, but I'm not familiar enough yet with these SoCs to figure out the underlying issues alone.
Until you have a working board farm, you could start having a closer look at the boot logs we're already producing. Admittedly, linux-next is currently broken in many ways besides this one for exynos, but it has been having these imprecise aborts since well before the other recent issues.
Also, it's very possible that this issue is not even MCPM related at all, and MCPM is just uncovering a previously hidden bug. It would be very helpful if people more familiar with this hardware and SoC would investigate bug reports like these.
Interesting thing can be found in exynos5420.dtsi:

	mdma1: mdma@11C10000 {
		...
		/*
		 * MDMA1 can support both secure and non-secure
		 * AXI transactions. When this is enabled in the kernel
		 * for boards that run in secure mode, we are getting
		 * imprecise external aborts causing the kernel to oops.
		 */
		status = "disabled";
	};
I am booting Arndale Octa on some other config and exynos. However with or without MCPM the imprecise aborts are still present (but not fatal, shell comes up).
My board boots also under secure firmware (I am using Linaro's ubuntu image). Maybe that is the cause?
One update: some fatal imprecise aborts (hanging the boot) also happen on exynos_defconfig without MCPM. It looks random: one boot fails, the next succeeds (still with an "imprecise external abort" message, but the shell comes up).
Best regards, Krzysztof
On Mon, Nov 24, 2014 at 5:37 PM, Olof Johansson olof@lixom.net wrote:
On Mon, Nov 24, 2014 at 5:35 PM, Kevin Hilman khilman@kernel.org wrote:
On Mon, Nov 24, 2014 at 4:25 PM, Olof Johansson olof@lixom.net wrote:
On Mon, Nov 24, 2014 at 11:51 AM, Kevin Hilman khilman@kernel.org wrote:
Kukjin,
On Mon, Nov 10, 2014 at 11:35 AM, Kevin Hilman khilman@kernel.org wrote:
Kukjin Kim kgene@kernel.org writes:
Kevin Hilman wrote:
>
> From: Kevin Hilman khilman@linaro.org
>
> The option CONFIG_EXYNOS5420_MCPM is causing imprecise external aborts
> during boot testing, causing various userspace startup failures.
>
> Disable until it has gotten more testing.
>
> Cc: Kukjin Kim kgene.kim@samsung.com,
> Cc: Javier Martinez Canillas javier.martinez@collabora.co.uk,
> Cc: Sachin Kamat sachin.kamat@samsung.com,
> Cc: Doug Anderson dianders@chromium.org,
> Cc: Bartlomiej Zolnierkiewicz b.zolnierkie@samsung.com,
> Cc: Krzysztof Kozlowski k.kozlowski@samsung.com,
> Cc: Tushar Behera tushar.behera@linaro.org,
> Cc: stable@vger.kernel.org # v3.17+
> Signed-off-by: Kevin Hilman khilman@linaro.org
> ---
> This has been reported by a few people[1], but not investigated or fixed, so it's
> time to disable this feature until it can be fixed.
>
Hi Kevin,
Yeah I agree with your opinion.
But as you can see in my tree, I've queued MCPM-related patches for 3.19, which will show up in -next this weekend.
Which of the recently queued patches are expected to address the imprecise abort issue? I'd be happy to test them out.
Exynos5 MCPM is still broken in linux-next and still causing an imprecise abort.
What is the status of $SUBJECT patch?
Anyway, let me apply this to -fixes, and then let's re-enable it after testing its functionality in -next in a couple of days.
Yes, I think this needs to be applied until these aborts are understood and fixed.
Is anyone at Samsung actually looking into these MCPM issues?
Hi Kevin,
What hardware are you having problems with? 5420 or 5422/5800?
Yes. :)
exynos5420-arndale-octa: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
exynos5422-odroid-xu3: http://storage.armcloud.us/kernel-ci/mainline/v3.18-rc6/arm-exynos_defconfig...
My boot tests seem to pass fine because I have such a minimal userspace, but Tyler Baker reported that with a "real" userspace, he can't boot to a shell:
http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....
I'm not surprised that 5420 has issues, but I have not seen any external aborts on either Chromebook that I have in my farm.
Sounds like the secondary cpus should be disabled on those device trees instead, doesn't it?
That's possible.
Unfortunately, I've gotten very little support from Samsung on this and it was originally reported 2.5 months ago[2], so I think that the 5420 MCPM should be disabled until they can propose the right fix, and actually test it.
Also, I tried disabling some CPUs at boot time, but the exynos5422-odroid-xu3 won't even boot with nr_cpus=1 or 4 (or anything less than 8).
Kevin
[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2014-September/286203....