Hi Bob,
On Fri, Sep 11, 2020 at 08:49:14AM -0400, Bob Peterson wrote:
----- Original Message -----
On Fri, Sep 11, 2020 at 08:08:35AM -0400, Bob Peterson wrote:
----- Original Message -----
On Thu, Sep 10, 2020 at 09:43:19PM +0200, Salvatore Bonaccorso wrote:
Hi,
On Tue, Jun 23, 2020 at 09:57:50PM +0200, Greg Kroah-Hartman wrote:
From: Bob Peterson <rpeterso@redhat.com>
[ Upstream commit 83d060ca8d90fa1e3feac227f995c013100862d3 ]
Before this patch, transactions could be merged into the system transaction by function gfs2_merge_trans(), but the transaction ail lists were never merged. Because the ail flushing mechanism can run separately, bd elements can be attached to the transaction's buffer list during the transaction (trans_add_meta, etc) but quickly moved to its ail lists. Later, in function gfs2_trans_end, the transaction can be freed (by gfs2_trans_end) while it still has bd elements queued to its ail lists, which can cause it to either lose track of the bd elements altogether (memory leak) or worse, reference the bd elements after the parent transaction has been freed.
Although I've not seen any serious consequences, the problem becomes apparent with the previous patch's addition of:
gfs2_assert_warn(sdp, list_empty(&tr->tr_ail1_list));
to function gfs2_trans_free().
This patch adds logic into gfs2_merge_trans() to move the merged transaction's ail lists to the sdp transaction. This prevents the use-after-free. To do this properly, we need to hold the ail lock, so we pass sdp into the function instead of the transaction itself.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
(snip)
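For readers who do not have the patch at hand, the list move described in the commit message can be pictured roughly as below. This is only a userspace sketch under stated assumptions, not the actual gfs2 code: every `sketch_*` name is hypothetical, a pthread mutex stands in for the ail lock, a small hand-written list stands in for the kernel's `list_head`, and the second ail list is assumed (the quoted text only names `tr_ail1_list` and speaks of "ail lists").

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal circular doubly linked list, modelled loosely on list_head. */
struct sketch_list {
    struct sketch_list *next, *prev;
};

static void sketch_list_init(struct sketch_list *head)
{
    head->next = head;
    head->prev = head;
}

static int sketch_list_empty(const struct sketch_list *head)
{
    return head->next == head;
}

static void sketch_list_add_tail(struct sketch_list *item,
                                 struct sketch_list *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Move every entry of 'from' to the tail of 'to', leaving 'from' empty. */
static void sketch_list_splice_tail_init(struct sketch_list *from,
                                         struct sketch_list *to)
{
    struct sketch_list *first, *last;

    if (sketch_list_empty(from))
        return;
    first = from->next;
    last = from->prev;
    first->prev = to->prev;
    to->prev->next = first;
    last->next = to;
    to->prev = last;
    sketch_list_init(from);
}

static size_t sketch_list_length(const struct sketch_list *head)
{
    size_t n = 0;
    const struct sketch_list *pos;

    for (pos = head->next; pos != head; pos = pos->next)
        n++;
    return n;
}

/* Hypothetical stand-in for a transaction with its ail lists. */
struct sketch_trans {
    struct sketch_list ail1_list;
    struct sketch_list ail2_list; /* assumed second ail list */
};

/* Hypothetical stand-in for sdp: owns the lock and the system transaction. */
struct sketch_sbd {
    pthread_mutex_t ail_lock;
    struct sketch_trans sys_trans;
};

static void sketch_trans_init(struct sketch_trans *tr)
{
    sketch_list_init(&tr->ail1_list);
    sketch_list_init(&tr->ail2_list);
}

static void sketch_sbd_init(struct sketch_sbd *sdp)
{
    pthread_mutex_init(&sdp->ail_lock, NULL);
    sketch_trans_init(&sdp->sys_trans);
}

/*
 * Takes sdp rather than the target transaction, because the lock that
 * protects the ail lists lives in sdp. After this returns, 'merged' holds
 * no ail entries, so freeing it cannot leak or dangle them.
 */
static void sketch_merge_trans(struct sketch_sbd *sdp,
                               struct sketch_trans *merged)
{
    pthread_mutex_lock(&sdp->ail_lock);
    sketch_list_splice_tail_init(&merged->ail1_list,
                                 &sdp->sys_trans.ail1_list);
    sketch_list_splice_tail_init(&merged->ail2_list,
                                 &sdp->sys_trans.ail2_list);
    pthread_mutex_unlock(&sdp->ail_lock);
}
```

The point of the sketch is only the ordering: the splice happens under the lock, and the merged transaction's lists are empty before anything frees it, which is the condition the gfs2_assert_warn() quoted above checks.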
In Debian, two users confirmed issues when writing to a GFS2 partition with this commit applied. The initial Debian report is at https://bugs.debian.org/968567 and Daniel Craig reported it in Bugzilla at https://bugzilla.kernel.org/show_bug.cgi?id=209217 .
Writing to a gfs2 filesystem fails and results in a soft lockup of the machine for kernels with that commit applied. I cannot reproduce the issue myself because I do not have a suitable setup available, but Daniel described a minimal series of steps to reproduce it.
This might also affect other stable series where this commit was applied, as there was a similar report from someone running 5.4.58 in https://www.redhat.com/archives/linux-cluster/2020-August/msg00000.html
Can you report this to the gfs2 developers?
thanks,
greg k-h
Hi Greg,
No need. The patch came from the gfs2 developers. I think he just wants it added to a stable release.
What commit needs to be added to a stable release?
confused,
greg k-h
Sorry Greg,
It's pretty early here and the caffeine hadn't quite hit my system. The problem is most likely that 4.19.132 is missing this upstream patch:
cbcc89b630447ec7836aa2b9242d9bb1725f5a61
I'm not sure how or why 83d060ca8d90fa1e3feac227f995c013100862d3 got put into stable without a stable CC but cbcc89b6304 is definitely required.
I'd like to suggest Salvatore try cherry-picking this patch to see if it fixes the problem, and if so, perhaps Greg can add it to stable.
Thanks, I will ask the affected users if they can test this (as I said, I cannot test it myself in this case).
If it is true that we also need to cherry-pick cbcc89b630447ec7836aa2b9242d9bb1725f5a61, then all of v4.14.y, v4.19.y, and v5.4.y would need to include it (83d060ca8d90fa1e3feac227f995c013100862d3 was applied down to v4.14.186, v4.19.130, v5.4.49, v5.7.6 (EOL)).
Regards, Salvatore