New subject: [syzbot] [xfs?] BUG: unable to handle kernel paging request in clear_user_rep_good

1 May 2023


      [ Added Borislav and stable people ]
On Sun, Apr 30, 2023 at 9:31 PM syzbot
<syzbot+401145a9a237779feb26@syzkaller.appspotmail.com> wrote:
...
> syzbot suspects this issue was fixed by commit:
Indeed.
My initial reaction was "no, that didn't fix anything, it just cleaned
stuff up", but it turns out that yes, it did in fact fix a real bug in
the process.
The fix was not intentional, but the cleanup actually got rid of buggy code.
So here's the automatic marker for syzbot:
#syz fix: x86: don't use REP_GOOD or ERMS for user memory clearing
and the reason for the bug - in case people care - is that the old
clear_user_rep_good (which no longer exists after that commit) had the
exception entry pointing to the wrong instruction.
The buggy code did:
.Lrep_good_bytes:
            mov %edx, %ecx
            rep stosb
and the exception entry was
_ASM_EXTABLE_UA(.Lrep_good_bytes, .Lrep_good_exit)
so the exception entry pointed at the register move instruction, not
at the actual "rep stosb" that does the user space store.
End result: if you had a situation where you *should* return -EFAULT,
and you triggered that "last final bytes" case, instead of the
exception handling dealing with it properly and fixing it up, you got
that kernel oops.
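
To make the failure mode concrete, here is a minimal user-space sketch of an exception-table lookup keyed on the exact address of the faulting instruction. It is only a model, not the kernel's implementation: the struct, the lookup_fixup() helper and the three addresses are all hypothetical, chosen purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for an exception table entry: the address of an
 * instruction that is allowed to fault, and where to resume if it does. */
struct extable_entry {
    uintptr_t insn;   /* address of the instruction allowed to fault */
    uintptr_t fixup;  /* address to continue at after the fault */
};

/* Hypothetical addresses for the instructions and labels involved. */
enum {
    ADDR_MOV       = 0x1000, /* mov %edx, %ecx (old .Lrep_good_bytes) */
    ADDR_REP_STOSB = 0x1002, /* rep stosb, the actual user space store */
    ADDR_EXIT      = 0x1004  /* .Lrep_good_exit */
};

/* Hypothetical helper: return the fixup address for a fault at 'ip', or 0
 * if no entry matches. In this model, 0 stands for "no fixup, so oops". */
uintptr_t lookup_fixup(const struct extable_entry *tbl, size_t n, uintptr_t ip)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].insn == ip)
            return tbl[i].fixup;
    }
    return 0;
}

/* Buggy layout: the entry covers the register move, which never faults. */
const struct extable_entry buggy_table[] = {
    { ADDR_MOV, ADDR_EXIT },
};

/* Fixed layout: the entry covers the store that can actually fault. */
const struct extable_entry fixed_table[] = {
    { ADDR_REP_STOSB, ADDR_EXIT },
};
```

In this model a fault at ADDR_REP_STOSB finds no match in buggy_table, which corresponds to the oops, while the same fault against fixed_table resolves to ADDR_EXIT, which corresponds to the normal fixup path.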
The bug goes back to commit 0db7058e8e23 ("x86/clear_user: Make it
faster") from about a year ago, which made it into v6.1.
It only affects old hardware that doesn't have the ERMS capability
flag, which *probably* means that it's mostly only triggerable in
virtualization (since pretty much any CPU from the last decade has
ERMS, afaik).
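
For anyone who wants to check whether a given machine or guest reports ERMS, a rough user-space sketch follows. It relies on ERMS being bit 9 of EBX in CPUID leaf 7, subleaf 0, and assumes a GCC- or clang-style <cpuid.h>; the helper names ebx_has_erms() and cpu_has_erms() are made up for this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* ERMS (Enhanced REP MOVSB/STOSB): CPUID.(EAX=7,ECX=0):EBX bit 9. */
#define CPUID7_EBX_ERMS (UINT32_C(1) << 9)

/* Hypothetical helper: test the ERMS bit in an already-read EBX value. */
bool ebx_has_erms(uint32_t leaf7_ebx)
{
    return (leaf7_ebx & CPUID7_EBX_ERMS) != 0;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  /* assumption: GCC/clang header providing __get_cpuid_count */

/* Hypothetical helper: query the running CPU. Returns false when leaf 7
 * is not supported at all. */
bool cpu_has_erms(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return ebx_has_erms(ebx);
}
#endif
```

Inside a guest the result reflects whatever CPUID bits the hypervisor chooses to expose, which is why a virtual machine can lack ERMS even on a recent host.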
Borislav - opinions? This needs fixing for v6.1..v6.3, and the options are:
(1) just fix up the exception entry. I think this is literally this
one-liner, but somebody should double-check me. I did *not* actually
test this:
    --- a/arch/x86/lib/clear_page_64.S
    +++ b/arch/x86/lib/clear_page_64.S
    @@ -142,8 +142,8 @@ SYM_FUNC_START(clear_user_rep_good)
            and $7, %edx
            jz .Lrep_good_exit
    -.Lrep_good_bytes:
            mov %edx, %ecx
    +.Lrep_good_bytes:
            rep stosb
     .Lrep_good_exit:
because the only use of '.Lrep_good_bytes' is that exception table entry.
(2) backport just that one commit for clear_user
In this case we should probably do commit e046fe5a36a9 ("x86: set
FSRS automatically on AMD CPUs that have FSRM") too, since that commit
changes the decision to use 'rep stosb' to check FSRS.
(3) backport the entire series of commits:
git log --oneline v6.3..034ff37d3407
Or we could even revert that commit 0db7058e8e23, but it seems silly
to revert when we have so many ways to fix it, including a one-line
code movement.
Borislav / stable people? Opinions?
Linus
