On 16/04/2020 19:49, Sasha Levin wrote:
> Just a question while I process your explanation (thanks for doing it!): wouldn't this be done by the neural network?
Yes, in the basic case. (Hopefully we're agreed that this is a long way from "I'm not sure what a fixes tag has to do with inclusion in a stable tree", which is how this whole brouhaha started.)
> It learns what a stable worthy commit is (and what isn't), and applies weights based on these findings, right? So if it learns that most non-stable commits don't have a fixes tag, it's likely to use that and "require" other inputs to have enough weight to compensate over a missing fixes tag so that it'll pass the threshold, no?
Yes. The problem comes when there are other inputs the NN doesn't have, inputs that ought to screen off some of the information it's using. This is probably best illustrated by an unrealistic extreme case...

Let's imagine hypothetically that the maintainer of drivers/blub is an absolutely perfect judge of which patches should go to -stable, and that the transmission path from him to the stable trees never loses a patch. This would mean that every autosel patch in drivers/blub is necessarily a false positive, because all the 'true positives' it might have found have already been taken out of the pool, so to speak. But if the NN is just trained to discriminate patches on whether they end up going to stable, it won't see any difference between a drivers/blub patch that the maintainer sent to stable straight away and a drivers/wibble patch whose less diligent maintainer didn't forward it and which only got picked up later when a stable kernel user hit the bug it was fixing. As long as the NN doesn't have that piece of information, it's going to generate either lots of false positives in drivers/blub or lots of false negatives in drivers/wibble.

Now obviously drivers/blub doesn't exist; no maintainer is 100% perfect at -stable submissions. But any difference will produce the same effect on a smaller scale, with the 'blubbish' maintainers seeing a high false positive fraction while, from the 'wibblesome' maintainers' point of view, autosel is working great. And since the 'blubs' are the ones who're already putting effort of their own into stable selection, they get aggrieved at also having to put effort into catching the false positives from a system that doesn't seem to be doing much for them, and everyone ends up shouting at each other, as we're seeing here.
(Do you want me to do another worked numerical example demonstrating the above, or does it make enough sense in words not to need one?)
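Actually, here's a quick toy version in Python anyway, in case numbers are easier to poke at than prose. Everything in it (the commit counts, the hit rates) is invented purely for illustration; the only point is the shape of the result:

# Toy numbers, invented purely for illustration: 100 commits per subsystem,
# 30 of which are genuinely stable-worthy.  Assume the classifier (knowing
# nothing about the subsystem or its maintainer) flags 80% of real fixes
# and wrongly flags 10% of non-fixes.
COMMITS = 100
REAL_FIXES = 30
SENSITIVITY = 0.8   # fraction of real fixes the classifier flags
FALSE_FLAG = 0.1    # fraction of non-fixes it flags anyway

def autosel_precision(fixes_already_forwarded):
    """Precision of the classifier's picks among commits the maintainer has
    NOT already sent to stable (the only pool where autosel adds anything)."""
    remaining_fixes = REAL_FIXES - fixes_already_forwarded
    non_fixes = COMMITS - REAL_FIXES
    true_pos = SENSITIVITY * remaining_fixes
    false_pos = FALSE_FLAG * non_fixes
    return true_pos / (true_pos + false_pos)

# drivers/wibble: maintainer forwards nothing, so autosel's picks are mostly real fixes
print("wibble precision: %.0f%%" % (100 * autosel_precision(0)))
# drivers/blub: maintainer already forwarded all 30 real fixes, so every
# remaining pick is a false positive
print("blub precision:   %.0f%%" % (100 * autosel_precision(30)))

On those made-up numbers wibble comes out at about 77% precision and blub at 0%, even though the classifier itself is behaving identically in both subsystems; the difference is entirely down to which true positives the maintainer has already drained out of the pool.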
-ed