Hi
With an ARM Landing Team branch based on LSK 3.18 Android I'm seeing a hang if I
echo 0 > /sys/devices/system/cpu/cpu1/online
After bisecting, I fixed this problem by reverting commit 553b785b5383 ("cpufreq: Iterate over all the possible cpus to create powerstats.")
I was tempted to also revert the other scheduler and cpufreq mods from Google that are part of the same series, just as a precaution, because they don't fill me with confidence, but have left them in for now.
Hi Tixy,
On 27 July 2015 at 23:34, Jon Medhurst (Tixy) tixy@linaro.org wrote:
Hi
With an ARM Landing Team branch based on LSK 3.18 Android I'm seeing a hang if I
echo 0 > /sys/devices/system/cpu/cpu1/online
After bisecting, I fixed this problem by reverting commit 553b785b5383 ("cpufreq: Iterate over all the possible cpus to create powerstats.")
So in this patch I believe instead of "for_each_possible_cpu" we should use "for_each_online_cpu". Can you please re-enable this patch and try this change?
Regards, Amit Pundir
I was tempted to also revert the other scheduler and cpufreq mods from Google that are part of the same series, just as a precaution, because they don't fill me with confidence, but have left them in for now.
-- Tixy
On 28 July 2015 at 15:45, Amit Pundir amit.pundir@linaro.org wrote:
Hi Tixy,
On 27 July 2015 at 23:34, Jon Medhurst (Tixy) tixy@linaro.org wrote:
Hi
With an ARM Landing Team branch based on LSK 3.18 Android I'm seeing a hang if I
echo 0 > /sys/devices/system/cpu/cpu1/online
After bisecting, I fixed this problem by reverting commit 553b785b5383 ("cpufreq: Iterate over all the possible cpus to create powerstats.")
So in this patch I believe instead of "for_each_possible_cpu" we should use "for_each_online_cpu". Can you please re-enable this patch and try this change?
Also if it doesn't work out, can you please tell more about this hang? Did the UI hang up on you? Did you see any panic or dmesg warnings/errors?
I did a quick test run of aosp/android-3.18 on qemu (with smp 2), but sadly my host machine couldn't keep up with multi-core qemu load and I didn't boot up all the way to UI. I booted up to shell prompt though and I did't see any unusual activity on shell prompt while I run following commands.
root@vexpress:/ # echo 0 > /sys/devices/system/cpu/cpu1/online [ 707.896250] CPU1: shutdown root@vexpress:/ # root@vexpress:/ # echo 1 > /sys/devices/system/cpu/cpu1/online [ 763.520081] CPU1: 2741 spurious wakeup calls [ 763.520164] CPU1: smp_ops.cpu_die() returned, trying to resuscitate [ 763.520344] CPU1: Booted secondary processor root@vexpress:/ # root@vexpress:/ # root@vexpress:/ # echo 0 > /sys/devices/system/cpu/cpu1/online [ 775.841608] CPU1: shutdown root@vexpress:/ # root@vexpress:/ # root@vexpress:/ # echo 1 > /sys/devices/system/cpu/cpu1/online [ 998.250284] CPU1: 23 spurious wakeup calls [ 998.250312] CPU1: smp_ops.cpu_die() returned, trying to resuscitate [ 998.250351] CPU1: Booted secondary processor
Next I'll give it an another shot on Juno. Can I boot Juno with non-LSK kernel i.e. only aosp/android-3.18 and ARM LT Juno platform patches? Or is there any dependency on LSK as well?
Regards, Amit Pundir
I was tempted to also revert the other scheduler and cpufreq mods from Google that are part of the same series, just as a precaution, because they don't fill me with confidence, but have left them in for now.
-- Tixy
On Tue, 2015-07-28 at 15:45 +0530, Amit Pundir wrote:
Hi Tixy,
On 27 July 2015 at 23:34, Jon Medhurst (Tixy) tixy@linaro.org wrote:
Hi
With an ARM Landing Team branch based on LSK 3.18 Android I'm seeing a hang if I
echo 0 > /sys/devices/system/cpu/cpu1/online
After bisecting, I fixed this problem by reverting commit 553b785b5383 ("cpufreq: Iterate over all the possible cpus to create powerstats.")
So in this patch I believe instead of "for_each_possible_cpu" we should use "for_each_online_cpu". Can you please re-enable this patch and try this change?
That gets rid of the hang, but it somewhat defeats the purpose of the original patch doesn't it? Though that patch itself looks a bit suspect as the module init function already calls cpufreq_powerstats_create for each online cpu (via cpufreq_stats_create_table).
Anyway, I don't fancy opening this can of worms further. Google seems to have lots of patches in this area and as even the upstream maintainers seem to struggle to catch all the interactions within cpufreq I generally don't trust any cpufreq patches ;-)
BTW, the hang was that serial console stops responding after the cpu is offlined. This is on Juno with an Open Embedded files system, no GUI (well, framebuffer text console).
On Tue, 2015-07-28 at 12:11 +0100, Jon Medhurst (Tixy) wrote:
On Tue, 2015-07-28 at 15:45 +0530, Amit Pundir wrote:
Hi Tixy,
On 27 July 2015 at 23:34, Jon Medhurst (Tixy) tixy@linaro.org wrote:
Hi
With an ARM Landing Team branch based on LSK 3.18 Android I'm seeing a hang if I
echo 0 > /sys/devices/system/cpu/cpu1/online
After bisecting, I fixed this problem by reverting commit 553b785b5383 ("cpufreq: Iterate over all the possible cpus to create powerstats.")
So in this patch I believe instead of "for_each_possible_cpu" we should use "for_each_online_cpu". Can you please re-enable this patch and try this change?
That gets rid of the hang,
So does using your linaro-android [1] branch that isn't in LSK yet. The specific patch that fixes things is Kevin's "cpufreq_stats: fix use of cpufreq_for_each_valid_entry() iterator"
[1] git://android.git.linaro.org/kernel/linaro-android linaro-android-3.18-lsk
but it somewhat defeats the purpose of the original patch doesn't it? Though that patch itself looks a bit suspect as the module init function already calls cpufreq_powerstats_create for each online cpu (via cpufreq_stats_create_table).
But I'm wrong because that init code probably won't get called, as the early return and comment in cpufreq_stats_create_table implies.
Anyway, I don't fancy opening this can of worms further. Google seems to have lots of patches in this area
I rather foolishly looked a little further and I spotted and bug in cpufreq_powerstats_create. That code is scanning device-tree for some 'current' bindings they invented, but the code doing the scanning erroneously assumes that Linux's cpu number relates to what's in device-tree, it doesn't. I.e. this is flawed...
snprintf(device_path, sizeof(device_path), "/cpus/cpu@%d", cpu); cpu_node = of_find_node_by_path(device_path);
Doesn't matter for us though, as we don't have Google's binding additions. Which means that this whole area of code is going to somewhat futilely keep allocating memory, scanning device-tree, failing, then cleaning up.
"Jon Medhurst (Tixy)" tixy@linaro.org writes:
On Tue, 2015-07-28 at 12:11 +0100, Jon Medhurst (Tixy) wrote:
On Tue, 2015-07-28 at 15:45 +0530, Amit Pundir wrote:
Hi Tixy,
On 27 July 2015 at 23:34, Jon Medhurst (Tixy) tixy@linaro.org wrote:
Hi
With an ARM Landing Team branch based on LSK 3.18 Android I'm seeing a hang if I
echo 0 > /sys/devices/system/cpu/cpu1/online
After bisecting, I fixed this problem by reverting commit 553b785b5383 ("cpufreq: Iterate over all the possible cpus to create powerstats.")
So in this patch I believe instead of "for_each_possible_cpu" we should use "for_each_online_cpu". Can you please re-enable this patch and try this change?
That gets rid of the hang,
So does using your linaro-android [1] branch that isn't in LSK yet. The specific patch that fixes things is Kevin's "cpufreq_stats: fix use of cpufreq_for_each_valid_entry() iterator"
[1] git://android.git.linaro.org/kernel/linaro-android linaro-android-3.18-lsk
Right, without my fix (which Google has now merged into upstream AOSP), that branch doesn't even boot on some platforms. :/
but it somewhat defeats the purpose of the original patch doesn't it? Though that patch itself looks a bit suspect as the module init function already calls cpufreq_powerstats_create for each online cpu (via cpufreq_stats_create_table).
But I'm wrong because that init code probably won't get called, as the early return and comment in cpufreq_stats_create_table implies.
Anyway, I don't fancy opening this can of worms further. Google seems to have lots of patches in this area
I rather foolishly looked a little further and I spotted and bug in cpufreq_powerstats_create. That code is scanning device-tree for some 'current' bindings they invented, but the code doing the scanning erroneously assumes that Linux's cpu number relates to what's in device-tree, it doesn't. I.e. this is flawed...
snprintf(device_path, sizeof(device_path), "/cpus/cpu@%d", cpu); cpu_node = of_find_node_by_path(device_path);
Doesn't matter for us though, as we don't have Google's binding additions. Which means that this whole area of code is going to somewhat futilely keep allocating memory, scanning device-tree, failing, then cleaning up.
I too am quite unimpressed (and rather scared) of the haphazard hacking Google has been doing in the cpufreq drivers.
Kevin
On 30-07-15, 15:41, Kevin Hilman wrote:
I too am quite unimpressed (and rather scared) of the haphazard hacking Google has been doing in the cpufreq drivers.
I looked at that scary code several times (mostly when others asked me to look), and never had the guts to get deeper into it to clean it up.
Its really too scary of a hack and then its from THE GOOGLE :)
In mainline, I am planning to give a sysfs interface to the new opp-v2 stuff, that will kill some of the stuff google has posted.
On Fri, 2015-07-31 at 09:10 +0530, Viresh Kumar wrote:
On 30-07-15, 15:41, Kevin Hilman wrote:
I too am quite unimpressed (and rather scared) of the haphazard hacking Google has been doing in the cpufreq drivers.
I looked at that scary code several times (mostly when others asked me to look), and never had the guts to get deeper into it to clean it up.
Its really too scary of a hack and then its from THE GOOGLE :)
I too originally used the words 'scary' and 'hack' in a draft email then toned it down, I needn’t have bothered ;-)
As I'll soon be getting big.LITTLE MP (aka HMP) and Intelligent Power Allocator (IPA) from ARM to integrate, I can't help but think that Google's efforts in this area are going to cause us problems even if it wasn't buggy. So I'm sorely tempted to just revert the whole lot in the ARM Landing Team tree. I'll have to get peoples opinions on that.
On 31-07-15, 10:38, Jon Medhurst (Tixy) wrote:
I too originally used the words 'scary' and 'hack' in a draft email then toned it down, I needn’t have bothered ;-)
As I'll soon be getting big.LITTLE MP (aka HMP) and Intelligent Power Allocator (IPA) from ARM to integrate, I can't help but think that Google's efforts in this area are going to cause us problems even if it wasn't buggy. So I'm sorely tempted to just revert the whole lot in the ARM Landing Team tree. I'll have to get peoples opinions on that.
I haven't seen them all, so can't really ask you to drop them :)
On 31 July 2015 at 10:41, Viresh Kumar viresh.kumar@linaro.org wrote:
On 31-07-15, 10:38, Jon Medhurst (Tixy) wrote:
I too originally used the words 'scary' and 'hack' in a draft email then toned it down, I needn’t have bothered ;-)
As I'll soon be getting big.LITTLE MP (aka HMP) and Intelligent Power Allocator (IPA) from ARM to integrate, I can't help but think that Google's efforts in this area are going to cause us problems even if it wasn't buggy. So I'm sorely tempted to just revert the whole lot in the ARM Landing Team tree. I'll have to get peoples opinions on that.
I'm of the opinion that they should not be included into LSK at all.
That is, they should be dropped from Amit's branch and someone should go back to original authors and ask them to either improve them, make them CONFIG_ conditional and so on.
If or once they're in better shape and we can be sure they aren't going to break things, then we should think about including them.
I haven't seen them all, so can't really ask you to drop them :)
-- viresh _______________________________________________ linaro-kernel mailing list linaro-kernel@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-kernel
On 31 July 2015 at 21:11, Ryan Harkin ryan.harkin@linaro.org wrote:
On 31 July 2015 at 10:41, Viresh Kumar viresh.kumar@linaro.org wrote:
On 31-07-15, 10:38, Jon Medhurst (Tixy) wrote:
I too originally used the words 'scary' and 'hack' in a draft email then toned it down, I needn’t have bothered ;-)
As I'll soon be getting big.LITTLE MP (aka HMP) and Intelligent Power Allocator (IPA) from ARM to integrate, I can't help but think that Google's efforts in this area are going to cause us problems even if it wasn't buggy. So I'm sorely tempted to just revert the whole lot in the ARM Landing Team tree. I'll have to get peoples opinions on that.
I'm of the opinion that they should not be included into LSK at all.
That is, they should be dropped from Amit's branch and someone should go back to original authors and ask them to either improve them, make them CONFIG_ conditional and so on.
CONFIG_ conditional sounds like a good plan for the time being. I'll propose a fix in AOSP for that.
FWIW I have already dropped these patches from 4.1 onwards. I've been told that mainline(4.2+) already has replacement framework for AOSP's "persistent cpu_freq history on hotplug" patches and as Viresh mentioned they are also working on new opp-bindings using which we can track powers (current/voltage) stats.
Regards, Amit Pundir
If or once they're in better shape and we can be sure they aren't going to break things, then we should think about including them.
I haven't seen them all, so can't really ask you to drop them :)
-- viresh _______________________________________________ linaro-kernel mailing list linaro-kernel@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-kernel
linaro-kernel mailing list linaro-kernel@lists.linaro.org https://lists.linaro.org/mailman/listinfo/linaro-kernel
On 05-08-15, 14:39, Amit Pundir wrote:
FWIW I have already dropped these patches from 4.1 onwards. I've been told that mainline(4.2+) already has replacement framework for AOSP's "persistent cpu_freq history on hotplug" patches and as Viresh
Right.
mentioned they are also working on new opp-bindings using which we can track powers (current/voltage) stats.
So most of the opp-v2 stuff will be pushed into 4.3, its all reviewed/tested.
The stats stuff you are talking about is already coded/tested by me and is pushed here:
ssh://git@git.linaro.org/people/viresh.kumar/linux.git opp/v2
That adds debugfs support to OPP, and their properties can be seen there.
linaro-kernel@lists.linaro.org