On Fri, Nov 30, 2018 at 09:22:03AM +0100, Greg KH wrote:
> On Fri, Nov 30, 2018 at 09:40:19AM +1100, Dave Chinner wrote:
> > I stopped my tests at 5 billion ops yesterday (i.e. 20 billion ops aggregate) to focus on testing the copy_file_range() changes, but Darrick's tests are still ongoing and have passed 40 billion ops in aggregate over the past few days.
> > The reason we are running these so long is that we've seen fsx data corruption failures after 12+ hours of runtime and hundreds of millions of ops. Hence the testing for backported fixes will need to replicate these test runs across multiple configurations for multiple days before we have any confidence that we've actually fixed the data corruptions and not introduced any new ones.
> > If you pull only a small subset of the fixes, fsx will still fail, and we have no real way of verifying that no regressions have been introduced by the backport. IOWs, there's a /massive/ amount of QA needed to ensure that these backports work correctly.
> > Right now the XFS developers don't have the time or resources available to validate that stable backports are correct and regression free, because we are focussed on ensuring the upstream fixes we've already made (and are still writing) are solid and reliable.
> Ok, that's fine, so users of XFS should wait until the 4.20 release before relying on it? :)
It's getting to the point that, with the number of known issues with XFS on LTS kernels, it makes sense to mark it as CONFIG_BROKEN.
I understand your reluctance to backport anything, but it really feels like you are not even allowing fixes that are "obviously right" to be backported, even after they pass testing. That isn't ok for your users.
Do the XFS maintainers expect users to always use the latest upstream kernel?
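For readers following along: here is a minimal sketch of the copy_file_range() interface that Dave mentions testing above. It is only an illustration of the syscall under test, not the fsx/fstests reproducer itself; the program and file names are made up, and it assumes a glibc (>= 2.27) that exposes the wrapper.

/*
 * Minimal illustration of copy_file_range(2): copy the whole of <src>
 * into <dst>. Not the actual XFS test case, just the interface.
 */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
		return EXIT_FAILURE;
	}

	int fd_in = open(argv[1], O_RDONLY);
	if (fd_in < 0) {
		perror("open src");
		return EXIT_FAILURE;
	}

	struct stat st;
	if (fstat(fd_in, &st) < 0) {
		perror("fstat");
		return EXIT_FAILURE;
	}

	int fd_out = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd_out < 0) {
		perror("open dst");
		return EXIT_FAILURE;
	}

	/*
	 * The kernel may copy fewer bytes than requested per call, so
	 * loop until the whole source length has been copied.
	 */
	off_t remaining = st.st_size;
	while (remaining > 0) {
		ssize_t copied = copy_file_range(fd_in, NULL, fd_out, NULL,
						 remaining, 0);
		if (copied < 0) {
			perror("copy_file_range");
			return EXIT_FAILURE;
		}
		if (copied == 0)
			break;
		remaining -= copied;
	}

	close(fd_in);
	close(fd_out);
	return EXIT_SUCCESS;
}

Build and run as something like "gcc -o cfr cfr.c && ./cfr src dst" (names hypothetical); the retry loop is the important part, since partial copies are allowed by the interface.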
-- 
Thanks,
Sasha