On Thu, May 02, 2019 at 07:45:16PM +0200, Andre Noll wrote:
> On Thu, May 02, 18:52, Greg Kroah-Hartman wrote:
> > On Thu, May 02, 2019 at 05:27:36PM +0200, Andre Noll wrote:
> > > On Thu, May 02, 16:10, Greg Kroah-Hartman wrote:
> > > > Ok, how about we hold off on this patch for 4.9.y then. "No one" should be using 4.9.y in a "server system" anymore, unless you happen to have an enterprise kernel based on it. So we should be fine, as the users of the older kernels don't run xfs.
> > > Well, we do run xfs on top of bcache on vanilla 4.9 kernels on a few dozen production servers here, mainly because we ran into all sorts of issues with newer kernels (not necessarily related to xfs). 4.9, OTOH, appears to be rock solid for our workload.
> > Great, but what is wrong with 4.14.y or, better yet, 4.19.y? Do those also work for your workload? If not, we should fix that, and soon :)
> Some months ago we tried 4.14 and it was a real disaster: random crashes with nothing in the logs on the file servers, and unkillable hung processes on the compute machines. The trouble is that I can't afford extended downtime on these production systems, nor can I test patches or enable debugging options that slow the systems down too much. Also, 10 of the compute nodes load the nvidia module, so all bets are off there anyway. But we've also seen the hung processes on non-gpu nodes where the nvidia module is not loaded.
> As for 4.19, xfs on bcache was broken until a couple of weeks ago. Meanwhile the fix (e578f90d8a9c) has gone in, so I briefly benchmarked 4.19.x on one system. To my surprise, the results were *worse* than with 4.9. This looks like another cache bypass issue, but I need to take a closer look and collect more reliable numbers.
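
For the unkillable hangs, one low-overhead first step might be to dump the kernel stacks of whatever is stuck in D state when it happens. A minimal sketch (assumes root and a kernel built with CONFIG_STACKTRACE, so that /proc/<pid>/stack is readable):

#!/usr/bin/env python3
# Minimal sketch: list D-state (uninterruptible) tasks and dump their
# kernel stacks. Needs root and CONFIG_STACKTRACE for /proc/<pid>/stack.
import os

for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        with open(f"/proc/{pid}/stat") as f:
            # The field after "(comm)" is the task state; "D" means
            # uninterruptible sleep.
            state = f.read().rsplit(")", 1)[1].split()[0]
        if state != "D":
            continue
        with open(f"/proc/{pid}/comm") as f:
            comm = f.read().strip()
        with open(f"/proc/{pid}/stack") as f:
            print(f"=== {pid} ({comm}) ===\n{f.read()}")
    except OSError:
        continue  # task exited or file not readable

Even a single stack trace from a hung task would tell us a lot more than "nothing in the logs".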
Is this something you can reproduce outside of those 10 magical machines?
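
On the cache-bypass theory: the counters bcache exports under sysfs should tell you quickly whether reads are skipping the cache. Another rough sketch, assuming a single bcache0 device and the stats_total counters described in Documentation/admin-guide/bcache.rst:

#!/usr/bin/env python3
# Rough sketch: print bcache's hit/bypass counters for bcache0.
# Assumes the sysfs layout from Documentation/admin-guide/bcache.rst.
from pathlib import Path

dev = Path("/sys/block/bcache0/bcache")
print("sequential_cutoff:", (dev / "sequential_cutoff").read_text().strip())

for name in ("cache_hits", "cache_misses", "cache_hit_ratio",
             "cache_bypass_hits", "cache_bypass_misses", "bypassed"):
    path = dev / "stats_total" / name
    if path.exists():
        print(f"{name}: {path.read_text().strip()}")

If the bypass counters dominate, it may be worth checking sequential_cutoff first, since sequential I/O beyond that threshold bypasses the cache by design.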
--
Thanks,
Sasha