There is another idea I have.
Let's sacrifice the idleness of CPU0 (which the scheduler already treats as a housekeeping CPU) to save us from all the complexity we have today.
Suppose we have 16 CPUs, with 4 CPUs per policy and hence 4 policies.
- Keep a single delayed-work item (non-deferrable) per policy and queue
them all on CPU0 with queue_work_on().
- This will work because any CPU can calculate the load of other
CPUs, and there is no dependency on the local CPU.
- CPU0 will hence get interrupted, check whether each policy's CPUs are idle, and if not, update their frequency (perhaps via an IPI).
Not sure if this will be better performance-wise, though.
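To make the proposal concrete, here is a rough kernel-style sketch (not compiled; struct gov_policy, dbs_update() and the sample_delay field are placeholder names, not the actual governor code) of how each policy's work item could be pinned to CPU0 with queue_delayed_work_on():

```c
#include <linux/workqueue.h>
#include <linux/cpufreq.h>

struct gov_policy {
	struct cpufreq_policy *policy;
	struct delayed_work dwork;	/* non-deferrable delayed work */
	unsigned long sample_delay;	/* sampling period, in jiffies */
};

static void gov_work_handler(struct work_struct *work)
{
	struct gov_policy *gp = container_of(to_delayed_work(work),
					     struct gov_policy, dwork);

	/*
	 * The load of policy->cpus can be computed from any CPU --
	 * there is no dependency on running on a CPU in the policy.
	 * dbs_update() stands in for the load evaluation + frequency
	 * update (possibly via an IPI to the target CPUs).
	 */
	dbs_update(gp->policy);

	/* Re-arm on CPU0, the housekeeping CPU. */
	queue_delayed_work_on(0, system_wq, &gp->dwork, gp->sample_delay);
}
```

With 4 policies as in the example above, this means four such work items all re-arming themselves on CPU0, which is exactly the serialization Robert objects to below.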
No, it is not. In your example everything might work out, given many ifs. I'm from the HPC field and would expect load imbalances, because this approach does not scale with the number of frequency domains.
In 2011, AMD released Bulldozer servers with 4 sockets, each of which had 2 dies with 4 modules. Each module has its own frequency domain, which sums up to 32 different domains. Putting all the load on CPU0 will be unfavorable for balanced performance across all CPUs. This is worsened by the cache-coherence protocol, which might delay accesses to shared and remote memory significantly.
And that was 2011. Today we have Haswell-EP quad-socket servers with up to 72 frequency domains across all sockets put together. If you have an SGI Altix UV, you even have 256 sockets with 18 domains each, which sums up to 4608 domains.
I would expect architectures with more frequency domains in the future.
Robert