Thanks Liviu!
Some comments below.
Quoting Liviu Dudau (2013-05-21 10:15:42)
> ... Which side of the interface are you actually thinking of?
Both; I'm really just trying to understand the problem.
> I don't think there is any C-state other than simple idle (which translates into a WFI for the core) that *doesn't* take into account power domain latencies and code path lengths to reach that state.
I'm speaking more about additional C-states beyond the lowest independent compute-domain C-state: states that reduce power further at a higher latency cost. These may change power states for the rest of the SoC or for external power chips/supplies. Such states would effectively enter the lowest PSCI C-state but then perform additional steps in the hardware-specific CPUIdle driver.
> I don't know how to draw the line between the host OS costs and the guest OS costs when using target latencies. On one hand I think that the host OS should add its own costs into what gets passed to the guest and the guest will see a slower than baremetal system in terms of state transitions;
I was also thinking along these lines. Is there a way to query state-transition cost information through PSCI? Could the layers of hosts/monitors/etc. contribute the cost of their own paths to the query results?
> ... on the other hand I would like to see the guest OS shielded from this type of information as there are too many variables behind it (is the host OS also under some monitor code? are all transitions to the same state happening in constant time or are they dependent of number of cores involved, their state, etc, etc)
I agree, but I don't see how. In our systems we very much care about the costs and have near-real-time constraints to manage. I think we need a good understanding of the costs of the hardware states.
> If one uses a simple set of C-states (CPU_ON, CPU_IDLE, CPU_OFF, CLUSTER_OFF, SYSTEM_OFF) then the guest could make requests independent of the host OS latencies _after the relevant translations between time-to-next-event and intended target C-state have been performed_.
I think that if we don't know the real cost of entering a state, we will end up choosing the wrong state on many occasions.
CPUIdle already bins the allowable costs into a specific state. If CPUIdle does not know the real cost of the states, the binning will sometimes be wrong and CPUIdle will not select the correct state. I think this could have bad side effects for real-time systems.
For my purposes, and as things stand today, I would likely factor the (probably pre-known and measured) host OS/monitor costs into the CPUIdle DT entries and let CPUIdle run the show. At the lower layers it won't matter what is passed through, as long as the correct state is chosen.
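Something like the following fragment, as a sketch. The property names follow the ARM "idle-states" devicetree binding; the values are invented, with the latencies inflated to include a measured host OS/monitor path cost rather than the bare hardware figures:

```dts
/* Illustrative only: values are made up, latencies include the
 * measured firmware/monitor entry and exit path on top of the
 * raw hardware cost. */
idle-states {
	entry-method = "psci";

	CLUSTER_SLEEP: cluster-sleep {
		compatible = "arm,idle-state";
		entry-latency-us = <800>;   /* hw cost + monitor save path */
		exit-latency-us = <1200>;   /* hw cost + monitor restore path */
		min-residency-us = <5000>;
	};
};
```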
Thanks,
Sebastian