On Fri, Aug 20, 2021, Mathieu Desnoyers wrote:
> I still really hate flakiness in tests, because then people stop caring when they fail once in a while. And with the nature of rseq, a once-in-a-while failure is a big deal. Let's see if we can use other tricks to ensure stability of the cpu id without changing timings too much.
Yeah, zero argument regarding flaky tests.
> One idea would be to use a seqcount lock.
A sequence counter did the trick! Thanks much!
> But even if we use that, I'm concerned that the very long writer critical section calling sched_setaffinity would need to be alternated with a sleep to ensure the read-side progresses. The sleep delay could be relatively small compared to the duration of the sched_setaffinity call, e.g. ratio 1:10.
I already had an arbitrary usleep(10) to let the reader make progress between sched_setaffinity() calls. Dropping it down to 1us didn't affect reproducibility, so I went with that to shave those precious cycles :-) Eliminating the delay entirely meant the failure no longer reproduced, which was a nice confirmation that the delay is needed to let the reader get back into KVM_RUN.
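
For anyone following along, here is a rough sketch of the sequence counter idea, not the actual test code. All names (seq_cnt, seq_write_begin(), migrate_once(), read_cpu_pair(), and the callback parameters) are made up for illustration. The callbacks stand in for the sched_setaffinity() call, the short sleep, and the two cpu id reads. It assumes C11 atomics with the default sequentially consistent ordering.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical sketch; names are illustrative, not taken from the test. */
static atomic_uint seq_cnt;

/* Writer side: the count is odd for as long as an update is in flight. */
static inline void seq_write_begin(void) { atomic_fetch_add(&seq_cnt, 1); }
static inline void seq_write_end(void)   { atomic_fetch_add(&seq_cnt, 1); }

/*
 * Reader side: snapshot the count with the low bit cleared, so a snapshot
 * taken while a write is in flight can never match the live (odd) count.
 */
static inline unsigned seq_read_begin(void)
{
	return atomic_load(&seq_cnt) & ~1u;
}

/* True if a write started or finished since the snapshot was taken. */
static inline bool seq_read_retry(unsigned snapshot)
{
	return atomic_load(&seq_cnt) != snapshot;
}

/*
 * One writer iteration. set_affinity() stands in for the long
 * sched_setaffinity() call, let_reader_run() for the short sleep that
 * gives the reader a window in which the count is stable.
 */
static inline void migrate_once(void (*set_affinity)(void),
				void (*let_reader_run)(void))
{
	seq_write_begin();
	set_affinity();
	seq_write_end();
	let_reader_run();
}

/*
 * Reader: fetch two values that must be observed under the same affinity
 * (e.g. two views of the current cpu id), retrying if the writer raced.
 */
static inline void read_cpu_pair(int (*get_a)(void), int (*get_b)(void),
				 int *a, int *b)
{
	unsigned snapshot;

	do {
		snapshot = seq_read_begin();
		*a = get_a();
		*b = get_b();
	} while (seq_read_retry(snapshot));
}
```

Without the pause after seq_write_end(), the writer spends nearly all of its time with the count odd, so the reader keeps retrying and rarely gets a stable window. That would be consistent with the failure not reproducing once the delay was removed.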
Thanks again!