On Thu, Dec 17, 2020 at 12:55:39PM -0800, Dave Hansen wrote:
> On 11/6/20 3:29 PM, ira.weiny@intel.com wrote:
> > 	/* Arm for context switch test */
> > 	write(fd, "1", 1);
> > 
> > 	/* Context switch out... */
> > 	sleep(4);
> > 
> > 	/* Check msr restored */
> > 	write(fd, "2", 1);
> These are always tricky.  What you ideally want here is:
> 
>  - Switch away from this task to a non-PKS task, or
>  - Switch from this task to a PKS-using task, but one which has a
>    different PKS value
> 
> ... or both.  Then, switch back to this task and make sure PKS
> maintained its value.
> 
> *But*, there's no absolute guarantee that another task will run.  It
> would not be totally unreasonable for the kernel to just sit in a loop
> without context switching here if no other tasks can run.
> 
> The only way you *know* a context switch happened is to have two tasks
> bound to the same logical CPU and make sure they run one after another.
Ah... We do that.
...
+	CPU_ZERO(&cpuset);
+	CPU_SET(0, &cpuset);
+	/* Two processes run on CPU 0 so that they go through context switch. */
+	sched_setaffinity(getpid(), sizeof(cpu_set_t), &cpuset);
...
I think this ensures that both the parent and the child run on CPU 0.
At least according to the man page it should:

	<man>
	A child created via fork(2) inherits its parent's CPU affinity
	mask.
	</man>
Perhaps a better method would be to synchronize the two processes more
tightly, so that they are really running 'interleaved' and the context
switch is forced rather than hoped for.
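
Something along these lines is what I'm thinking of (completely
untested sketch; 'fd' stands in for the test file from the patch, and
error handling is omitted):

/*
 * Pin parent and child to CPU 0 and ping-pong over a pair of pipes.
 * Because the parent blocks in read() while the child is the only
 * runnable task on the CPU, a context switch is guaranteed between
 * the two write(fd, ...) calls.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <sys/wait.h>

static void pin_to_cpu0(void)
{
	cpu_set_t cpuset;

	CPU_ZERO(&cpuset);
	CPU_SET(0, &cpuset);
	sched_setaffinity(0, sizeof(cpu_set_t), &cpuset);
}

int run_switch_test(int fd)
{
	int to_child[2], to_parent[2];
	char c;

	pipe(to_child);
	pipe(to_parent);
	pin_to_cpu0();		/* the child inherits this affinity */

	if (fork() == 0) {
		/* Child: wait until the parent has armed the test. */
		read(to_child[0], &c, 1);
		/*
		 * The parent is blocked in read() below, so getting
		 * here at all means it was switched out on CPU 0.
		 */
		write(to_parent[1], "x", 1);
		_exit(0);
	}

	/* Arm for context switch test */
	write(fd, "1", 1);
	/* Let the child run, and block until it has... */
	write(to_child[1], "x", 1);
	read(to_parent[0], &c, 1);
	/* Check msr restored */
	write(fd, "2", 1);

	wait(NULL);
	return 0;
}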
> This just gets itself into a state where it *CAN* context switch and
> prays that one will happen.
I'm not sure what you mean by 'this'.  Do you mean that running on the
same CPU will sometimes not force a context switch?  Or that the sleeps
could be badly timed and the two processes could simply run one after
the other on the same CPU?  The latter is, AFAICT, the most likely
failure mode.
> You can also run a bunch of these in parallel, bound to a single CPU.
> That would also give you higher levels of assurance that *some*
> context switch happens at sleep().
I think more cycles is a good idea, for sure.  But I'm more comfortable
forcing the test to be synchronized, so that it actually runs in the
order we think it does.  For the parallel variant, I'm picturing
roughly the sketch below.
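
(Again untested; TEST_FILE is a placeholder for whatever debugfs file
the test ends up using, and error handling is omitted.)

/*
 * Fork NPROC copies, all pinned to CPU 0, each arming and checking the
 * test around a sleep().  With several runnable tasks on one CPU,
 * every sleep() almost certainly hands the CPU to a sibling.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>
#include <sys/wait.h>

#define NPROC 8
#define TEST_FILE "/sys/kernel/debug/..."	/* placeholder path */

int main(void)
{
	cpu_set_t cpuset;
	int i, fd;

	CPU_ZERO(&cpuset);
	CPU_SET(0, &cpuset);
	sched_setaffinity(0, sizeof(cpu_set_t), &cpuset);

	for (i = 0; i < NPROC; i++) {
		if (fork() == 0) {
			fd = open(TEST_FILE, O_RDWR);
			/* Arm for context switch test */
			write(fd, "1", 1);
			/* A sibling should run here... */
			sleep(1);
			/* Check msr restored */
			write(fd, "2", 1);
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		;
	return 0;
}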
> One critical thing with these tests is to sabotage the kernel and then
> run them and make *sure* they fail.  Basically, if you screw up, do
> they actually work to catch it?
I'll try and come up with a more stressful test.
Ira