 
On Fri, 18 Jan 2019 at 20:03, Leo Yan leo.yan@linaro.org wrote:
On Fri, Jan 18, 2019 at 04:11:33PM -0700, Mathieu Poirier wrote:
On Fri, 18 Jan 2019 at 02:44, Al Grant Al.Grant@arm.com wrote:
Hi,
One of the top questions that comes up on CoreSight is how it interacts with CPU power saving, and I'd like to get a handle on where we are with this. This will help us understand if any more work needs to be done.
I'd suggest three levels of support:
- transparent: use of CoreSight has no effect on CPU power saving - if an idle CPU would have been powered down it's still powered down. Any increased power draw from CoreSight comes from debug/trace blocks being powered up as necessary, not from keeping entire CPUs powered up.
Yes, that is the ideal scenario and also how Juno currently works. It uses the TRCPDCR:PU bit and requires FW and PMIC support. To the best of my knowledge it is the only platform that works this way. It is hard to implement because it requires involvement from different teams (kernel, FW, PMIC) and the HW needs to have been designed properly.
Sorry to jump into this topic, and apologies if I introduce noise :)
Just a reminder: for dynamic power management, we need to consider at least the following factors:
- Clock domain;
- Power domain;
- Voltage domain;
- Context saving and restoring;
From a rough reading of the spec when I wrote the driver for the CPU debug module, TRCPDCR:PU is used as an indication signal to the power controller (usually logic inside the SoC, either a separate MCU or integrated logic) to enable the power domain for the CoreSight logic. But clock domain and voltage domain control is separate from power domain control (this kind of design is actually quite common for Arm SoCs). Hence, from the software perspective we need to handle all of them, and we cannot always assume that the clock, power and voltage domains all respond to the TRCPDCR:PU bit, especially the voltage domain, which is controlled by an external PMIC.
Furthermore, if register values are lost during CPUIdle (e.g. for the ETM), we need to provide a mechanism to save and restore the context across a CPU power cycle. This part is missing.
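To make the save/restore requirement concrete, here is a minimal sketch in Python. It is a toy model, not driver code: the class `EtmModel` and its methods are hypothetical, the two register names are only used as dictionary keys, and the values are made up. It only illustrates the ordering (save before power-down, restore after power-up) that such a mechanism would need.

```python
class EtmModel:
    """Toy stand-in for a trace unit whose registers sit in the CPU power domain."""

    def __init__(self):
        # Register names used as plain labels; values are illustrative only.
        self.regs = {"TRCCONFIGR": 0, "TRCPRGCTLR": 0}
        self.saved = None

    def save_context(self):
        # Snapshot the programmed state before the CPU loses power.
        self.saved = dict(self.regs)

    def power_cycle(self):
        # Model of a power-down/power-up: all register state is lost.
        self.regs = {name: 0 for name in self.regs}

    def restore_context(self):
        # Reprogram the unit from the snapshot, if one was taken.
        if self.saved is not None:
            self.regs = dict(self.saved)
            self.saved = None


def idle_with_save_restore(etm):
    """Hypothetical idle path: save, lose power, restore."""
    etm.save_context()
    etm.power_cycle()
    etm.restore_context()
```

Without the `save_context()`/`restore_context()` pair, the model comes back from `power_cycle()` with every register cleared, which is the loss of trace configuration described above.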
That is only the case when operated from sysfs. When talking about perf, the only thing that is not supported is if a trace session is started *and then* a CPU is hotplugged in. Tracing on that CPU will not work.
Seeing how improbable it is to get all this right on all platforms, three years ago I set out to fix this in the kernel by using the genPD subsystem. That was swiftly refused by the Juno maintainers, leaving only the above as an option.
- automatic wakelock: use of CoreSight has the effect of disabling powering off of idle CPUs, so there may be a significant increase in power consumption, but it's done automatically and works out of the box. CoreSight itself is fully functional irrespective of how the system is configured.
This is functionality that I will not accept upstream and will have to be carried (and maintained) out of tree by ARM. To me it is better to spend time and effort on making things right using either the FW/PMIC mechanism or genPD.
- invasive: power saving must be disabled manually - i.e. you have to get a manual (and possibly device-specific) recipe from somewhere. If you don't then things will break (loss of trace at best, crash at worst).
That is how all platforms (except Juno) currently work. CPUidle needs to be manually disabled at compilation time or at runtime via sysfs.
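For the runtime route, a recipe could look like the sketch below. It assumes the usual cpuidle sysfs layout (`/sys/devices/system/cpu/cpuN/cpuidle/stateM/disable`) and must run as root on a real system. The helper name `set_cpuidle_disabled`, the `root` parameter (so it can be pointed at a scratch directory) and the choice to leave `state0` alone by default are assumptions made for illustration; which states actually power the CPU down is platform-specific.

```python
import glob
import os
import re


def set_cpuidle_disabled(disabled, root="/sys/devices/system/cpu", min_state=1):
    """Write 1 (disable) or 0 (enable) to every cpuidle state >= min_state.

    Returns the list of 'disable' files that were written.
    """
    value = "1" if disabled else "0"
    written = []
    # cpu[0-9]* matches cpu0, cpu1, ... but not the global 'cpuidle' directory.
    pattern = os.path.join(root, "cpu[0-9]*", "cpuidle", "state[0-9]*", "disable")
    for path in sorted(glob.glob(pattern)):
        state_dir = os.path.basename(os.path.dirname(path))
        match = re.fullmatch(r"state(\d+)", state_dir)
        if match is None or int(match.group(1)) < min_state:
            continue  # leave shallow states (assumed not to power down) alone
        with open(path, "w") as f:
            f.write(value)
        written.append(path)
    return written
```

Calling `set_cpuidle_disabled(True)` before a trace session and `set_cpuidle_disabled(False)` afterwards would bracket the session; this is exactly the kind of manual step the "invasive" level describes.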
I would hope that perf (all modes) is transparent, and direct use of sysfs is at worst a wakelock... but where are we now? Are there still boards that need manual recipes with the current kernel - either with perf or with sysfs?
I think the first thing that needs to be worked on is the integration of CPUs with genPD so that CPU power domains can be controlled by genPD rather than CPUidle. That way power management can be done by the kernel rather than in FW. I'm told the functionality is now supported in genPD but I wouldn't bet a penny that it is adequate for CS or that it does exactly what we want.
Just curious about genPD usage with CoreSight.
For a sane solution, we can use genPD to manage the dependencies between different CoreSight components, and also the dependency between CoreSight and the CPU. But let's see if we can support dynamic power management without genPD (and maybe that is simpler as a first step :))
It seems to me we can get rid of the dependency between CoreSight and the CPU simply based on task switching: when the task is switched out and the CPU runs the idle task, we first disable the CoreSight logic, and then the CPU can be safely powered off by CPUIdle. Should this be sufficient for perf (per-thread mode and CPU-wide mode)?
The only reason to add genPD into the mix is to provide an alternative to the TRCPDCR:PU bit manipulation so that boards that can't use it (for whatever reason) have a fallback option.
Using genPD would also benefit a wider tracing scope (e.g. using the sysfs method for firmware tracing). In that case, genPD is used as a wakelock.
Second is properly handling CPU hotplug operations when perf or sysfs sessions are ongoing.
Hotplug handling with perf: As mentioned above, this is only needed when a session (CPU wide or per-thread) has been started and a CPU is hotplugged in.
Hotplug handling with sysfs: That needs to be addressed.
Since CPU hotplug migrates all tasks to other online CPUs, and in the end only the idle task is running, CPU hotplug can be handled easily if we simply disable CoreSight when switching to the idle task.
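The idea of tying trace enablement to the idle task can be sketched with a toy model. Everything here (`CpuTraceModel`, `switch_to`, `hotplug_out`) is hypothetical and written only to show the ordering being proposed; it does not correspond to any existing kernel or perf interface.

```python
IDLE = "idle"


class CpuTraceModel:
    """Toy model of one CPU whose tracer follows task switches."""

    def __init__(self):
        self.current = IDLE
        self.trace_enabled = False
        self.online = True

    def switch_to(self, task):
        if not self.online:
            raise RuntimeError("CPU is offline")
        # Tracer is turned off whenever the idle task is switched in,
        # and turned on for any other (traced) task.
        self.trace_enabled = task != IDLE
        self.current = task

    def can_power_off(self):
        # Safe to power down only when idle and the tracer is disabled.
        return self.current == IDLE and not self.trace_enabled

    def hotplug_out(self):
        # Hotplug migrates tasks away, leaving only the idle task,
        # so the same idle-switch hook disables tracing first.
        self.switch_to(IDLE)
        if not self.can_power_off():
            raise RuntimeError("tracer still active")
        self.online = False
```

In this model both the CPUIdle case and the hotplug case go through `switch_to(IDLE)`, so a single hook covers both, which is the point of the suggestion.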
Again, it is very important to specify this is only relevant for sysfs. There is already a hotplug state machine that takes care of signalling drivers that a CPU is going down (or up).
Thoughts?
Thanks, Leo Yan