On Fri, 29 Dec 2017, Linus Torvalds wrote:
> Ok, so what does seem to be consistent for everybody is that double-fault in the NMI backtrace.
> So the fact that the NMI always hits on a double-fault does make me suspect that it's an infinite stream of double-faults, and that is presumably also what causes the RCU timeout.

As I've been fighting with recursive double-faults lately (backporting PTI to ancient kernels), I can tell you that this is not the symptom you'd be seeing in such a case; a recursive double-fault overflows the interrupt stack pretty quickly and triple-faults.