replace_fd() returns the number of the new file descriptor through the return value of do_dup2(). However its callers never care about the specific number. In fact the caller in receive_fd_replace() treats any non-zero return value as an error and therefore never calls __receive_sock() for most file descriptors, which is a bug.
To fix the bug in receive_fd_replace() and to avoid the same issue happening in future callers, signal success through a plain zero.
Suggested-by: Al Viro viro@zeniv.linux.org.uk Link: https://lore.kernel.org/lkml/20250801220215.GS222315@ZenIV/ Fixes: 173817151b15 ("fs: Expand __receive_fd() to accept existing fd") Fixes: 42eb0d54c08a ("fs: split receive_fd_replace from __receive_fd") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de --- Changes in v2: - Move the fix to replace_fd() (Al) - Link to v1: https://lore.kernel.org/r/20250801-fix-receive_fd_replace-v1-1-d46d600c74d6@... --- Untested, it stuck out while reading the code. --- fs/file.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/file.c b/fs/file.c index 6d2275c3be9c6967d16c75d1b6521f9b58980926..f8a271265913951d755a5db559938d589219c4f2 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1330,7 +1330,10 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags) err = expand_files(files, fd); if (unlikely(err < 0)) goto out_unlock; - return do_dup2(files, file, fd, flags); + err = do_dup2(files, file, fd, flags); + if (err < 0) + goto out_unlock; + err = 0;
out_unlock: spin_unlock(&files->file_lock);
--- base-commit: d2eedaa3909be9102d648a4a0a50ccf64f96c54f change-id: 20250801-fix-receive_fd_replace-7fdd5ce6532d
Best regards,
On Mon, Aug 04, 2025 at 10:40:17AM +0200, Thomas Weißschuh wrote:
replace_fd() returns the number of the new file descriptor through the return value of do_dup2(). However its callers never care about the specific number. In fact the caller in receive_fd_replace() treats any
To be pedantic as stated this isn't true: While they don't care about it in their error handling they very much care about what specific file descriptor number is used. Try and assign unexpected fd numbers for coredumping and see what CVEs you can get out of it...
I'll take the patch but I'll amend it to just use a guard:
diff --git a/fs/file.c b/fs/file.c index 6d2275c3be9c..858a55dbd5d8 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1318,7 +1318,7 @@ __releases(&files->file_lock) int replace_fd(unsigned fd, struct file *file, unsigned flags) { int err; - struct files_struct *files = current->files; + struct files_struct *files;
if (!file) return close_fd(fd); @@ -1326,15 +1326,17 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags) if (fd >= rlimit(RLIMIT_NOFILE)) return -EBADF;
- spin_lock(&files->file_lock); + files = current->files; + + guard(spinlock)(&files->file_lock); err = expand_files(files, fd); if (unlikely(err < 0)) - goto out_unlock; - return do_dup2(files, file, fd, flags); + return err; + err = do_dup2(files, file, fd, flags); + if (err < 0) + return err;
-out_unlock: - spin_unlock(&files->file_lock); - return err; + return 0; }
/**
scoped_guard() works too but bloats the patch.
non-zero return value as an error and therefore never calls __receive_sock() for most file descriptors, which is a bug.
To fix the bug in receive_fd_replace() and to avoid the same issue happening in future callers, signal success through a plain zero.
Suggested-by: Al Viro viro@zeniv.linux.org.uk Link: https://lore.kernel.org/lkml/20250801220215.GS222315@ZenIV/ Fixes: 173817151b15 ("fs: Expand __receive_fd() to accept existing fd") Fixes: 42eb0d54c08a ("fs: split receive_fd_replace from __receive_fd") Cc: stable@vger.kernel.org Signed-off-by: Thomas Weißschuh thomas.weissschuh@linutronix.de
Changes in v2:
- Move the fix to replace_fd() (Al)
- Link to v1: https://lore.kernel.org/r/20250801-fix-receive_fd_replace-v1-1-d46d600c74d6@...
Untested, it stuck out while reading the code.
fs/file.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/file.c b/fs/file.c index 6d2275c3be9c6967d16c75d1b6521f9b58980926..f8a271265913951d755a5db559938d589219c4f2 100644 --- a/fs/file.c +++ b/fs/file.c @@ -1330,7 +1330,10 @@ int replace_fd(unsigned fd, struct file *file, unsigned flags) err = expand_files(files, fd); if (unlikely(err < 0)) goto out_unlock;
- return do_dup2(files, file, fd, flags);
- err = do_dup2(files, file, fd, flags);
- if (err < 0)
goto out_unlock;
- err = 0;
out_unlock: spin_unlock(&files->file_lock);
base-commit: d2eedaa3909be9102d648a4a0a50ccf64f96c54f change-id: 20250801-fix-receive_fd_replace-7fdd5ce6532d
Best regards,
Thomas Weißschuh thomas.weissschuh@linutronix.de
On Mon, Aug 04, 2025 at 02:33:13PM +0200, Christian Brauner wrote:
guard(spinlock)(&files->file_lock); err = expand_files(files, fd); if (unlikely(err < 0))
goto out_unlock;
return do_dup2(files, file, fd, flags);
return err;
err = do_dup2(files, file, fd, flags);
if (err < 0)
return err;
-out_unlock:
spin_unlock(&files->file_lock);
return err;
return 0;
}
NAK. This is broken - do_dup2() drops ->file_lock. And that's why I loathe the guard() - it's too easy to get confused *and* assume that it will DTRT, no need to check carefully.
On Mon, Aug 04, 2025 at 04:52:29PM +0100, Al Viro wrote:
On Mon, Aug 04, 2025 at 02:33:13PM +0200, Christian Brauner wrote:
guard(spinlock)(&files->file_lock); err = expand_files(files, fd); if (unlikely(err < 0))
goto out_unlock;
return do_dup2(files, file, fd, flags);
return err;
err = do_dup2(files, file, fd, flags);
if (err < 0)
return err;
-out_unlock:
spin_unlock(&files->file_lock);
return err;
return 0;
}
NAK. This is broken - do_dup2() drops ->file_lock. And that's why I loathe the guard() - it's too easy to get confused *and* assume that it will DTRT, no need to check carefully.
To be fair I also got this wrong in my original v2 patch not using guard(). I'll send a fixed version tomorrow.
Thomas
On Mon, Aug 04, 2025 at 04:52:29PM +0100, Al Viro wrote:
On Mon, Aug 04, 2025 at 02:33:13PM +0200, Christian Brauner wrote:
guard(spinlock)(&files->file_lock); err = expand_files(files, fd); if (unlikely(err < 0))
goto out_unlock;
return do_dup2(files, file, fd, flags);
return err;
err = do_dup2(files, file, fd, flags);
if (err < 0)
return err;
-out_unlock:
spin_unlock(&files->file_lock);
return err;
return 0;
}
NAK. This is broken - do_dup2() drops ->file_lock. And that's why I loathe the guard() - it's too easy to get confused *and* assume that it will DTRT, no need to check carefully.
Note, BTW, that in actual replacing case do_dup2() has blocking operations (closing the replaced reference) after dropping ->file_lock, so making it locking-neutral would not be easy; doable (have it return the old reference in the replacing case and adjust the callers accordingly), but it's seriously not pretty (NULL/address of old file/ERR_PTR() for return value, boilerplate in callers, etc.). Having do_dup2() called without ->file_lock and taking it inside is not an option - we could pull expand_files() in there, but lookup of oldfd in actual dup2(2)/dup3(2) has to be done within the same ->file_lock scope where it is inserted into the table.
Sure, all things equal it's better to have functions locking-neutral, but it's not always the best approach. And while __free() allows for "we'd passed the object to somebody else, it's not ours to consume anymore", guard() does not.
On Mon, Aug 04, 2025 at 04:52:29PM +0100, Al Viro wrote:
On Mon, Aug 04, 2025 at 02:33:13PM +0200, Christian Brauner wrote:
guard(spinlock)(&files->file_lock); err = expand_files(files, fd); if (unlikely(err < 0))
goto out_unlock;
return do_dup2(files, file, fd, flags);
return err;
err = do_dup2(files, file, fd, flags);
if (err < 0)
return err;
-out_unlock:
spin_unlock(&files->file_lock);
return err;
return 0;
}
NAK. This is broken - do_dup2() drops ->file_lock. And that's why I
Right, I missed that. Just say it's broken. You don't need to throw around NAKs pointlessly. It's pretty clear when to drop ptaches without throwing the meat cleaver through the room.
loathe the guard() - it's too easy to get confused *and* assume that
The calling conventions of do_dup2() are terrible. The only reason it drops file_lock itself instead of leaving it to the two callers that have to acquire it anyway is because it wants to call filp_close() if there's already a file on that fd.
And really the side-effect of dropping a lock implicitly is nasty especially when the function doesn't even indicate that it does that in it's name.
And guards are great.
On Tue, Aug 05, 2025 at 01:55:59PM +0200, Christian Brauner wrote:
The calling conventions of do_dup2() are terrible. The only reason it drops file_lock itself instead of leaving it to the two callers that have to acquire it anyway is because it wants to call filp_close() if there's already a file on that fd.
Alternative calling conventions end up being nastier - I've tried.
And really the side-effect of dropping a lock implicitly is nasty especially when the function doesn't even indicate that it does that in it's name.
And guards are great.
They do no allow to express things like "foo() consumes lock X". From time to time, we *do* need that, and when that happens guards become a menace.
Another case is lock if (lock-dependent condition) some work unlock work that can't be under that lock else some other work unlock more work that can't be under that lock
Fairly common, especially when that's a spinlock and "can't be under that lock" includes blocking operations. Can't be expressed with guards, not without a massage that often ends up with bloody awful results.
On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote:
They do no allow to express things like "foo() consumes lock X".
From time to time, we *do* need that, and when that happens guards
become a menace.
Another case is lock if (lock-dependent condition) some work unlock work that can't be under that lock else some other work unlock more work that can't be under that lock
Fairly common, especially when that's a spinlock and "can't be under that lock" includes blocking operations. Can't be expressed with guards, not without a massage that often ends up with bloody awful results.
FWIW, I'm looking through the raw data I've got during ->d_lock audit. Except for a few functions (all static in fs/dcache.c), all scopes terminate in the same function where they begin.
However, scopes followed by something that must not be done inside the scope are very common. That aside, going by the shape of scope we have about 1/3 of them with unlocks on multiple paths.
Not all of those are equal - some would be eliminated by use of guard(). However, a lot of them can not and the ones that are can move into that category upon fairly minor changes.
Sure, in theory we can always massage them into shape where that won't happen - results will be more brittle than what we have right now ;-/
Basically, guard is almost always the wrong thing; scoped variant may be OK in a sizable subset of cases, but it's not universally a good thing - scope may not align well wrt control flow graph. Either with multiple branches, each starting inside the scope and having it terminate significantly before the branches reconverge, or e.g. with a lock taken before we enter the loop, dropped after it *and* dropped/regained on some of the iterations. Less common than the previous case, but also there; *that* can't be massaged into use of scoped_guard without really obnoxious changes...
Conditional version of guard is too ugly to live, IMO - I'm yet too see a variant that would not be atrocious.
Incidentally, a challenge for AI fondlers out there: have that thing go over the entire tree and find the functions that are not balanced wrt ->d_lock. Getting a few false positives is acceptable; false negatives are not. No using the information about that being limited to fs/dcache.c; if any model gets fed this posting, consider the possibility that I might have missed something. Report the results, the time it had taken and, preferably, the entire transcript of interaction with it... Any takers? Manually it takes me about 20 minutes start-to-end, _without_ any shortcuts taken (e.g. after a series of patches hitting fs/dcache.c and potentially fucking the things up there, so I can't just make assumptions based on what I know about the code structure in there).
On Tue, Aug 05, 2025 at 08:50:03PM +0100, Al Viro wrote:
On Tue, Aug 05, 2025 at 04:34:57PM +0100, Al Viro wrote:
They do no allow to express things like "foo() consumes lock X".
From time to time, we *do* need that, and when that happens guards
become a menace.
Another case is lock if (lock-dependent condition) some work unlock work that can't be under that lock else some other work unlock more work that can't be under that lock
Fairly common, especially when that's a spinlock and "can't be under that lock" includes blocking operations. Can't be expressed with guards, not without a massage that often ends up with bloody awful results.
FWIW, I'm looking through the raw data I've got during ->d_lock audit. Except for a few functions (all static in fs/dcache.c), all scopes terminate in the same function where they begin.
... and for ->file_lock we have the following: expand_fdtable(): drops and regains expand_files(): either nothing or drops and regains do_dup2(): drops everything else is neutral.
20 functions touching that stuff total. Convertible to guard() or scoped_guard(): put_unused_fd(), fd_install(), close_fd() (scoped_guard only), __range_cloexec(), file_close_fd(), set_close_on_exec(), iterate_fd(), fcntl_setlk() and fcntl_setlk64() (scoped_guard only), seq_show() - 10 out of 20.
Anything that uses expand_fdtable() is in the best case an abuse of guard/scoped_guard; in the worst, it's actively confusing, since there's *not* one continuous scope there. expand_files() is in the same boat. That covers alloc_fd(), replace_fd() and ksys_dup3(). That's 5 out of remaining 10. BTW, trying to eliminate gotos from alloc_fd() is not fun either.
dup_fd(): spin_lock(&oldf->file_lock); ... while (unlikely(open_files > new_fdt->max_fds)) { spin_unlock(&oldf->file_lock); ... (blocking, possibly return on failure here) spin_lock(&oldf->file_lock); ... } ... spin_unlock(&oldf->file_lock); ... No way to do that with guard() - not unless you mix it with explicit unlock/lock in the middle of scope, and even that will be bitch to deal with due to failure exit after allocation failure. We'd need to do this: spin_unlock(&oldf->file_lock); if (new_fdt != &newf->fdtab) __free_fdtable(new_fdt); new_fdt = alloc_fdtable(open_files); spin_lock(&oldf->file_lock); if (IS_ERR(new_fdt)) { kmem_cache_free(files_cachep, newf); return ERR_CAST(new_fdt); } all of that under guard(spinlock)(&oldf->file_lock). IMO that would be too confusing and brittle.
__range_close(): spin_lock(&files->file_lock); ... for (; fd <= max_fd; fd++) { file = file_close_fd_locked(files, fd); if (file) { spin_unlock(&files->file_lock); filp_close(file, files); cond_resched(); spin_lock(&files->file_lock); } else if (need_resched()) { spin_unlock(&files->file_lock); cond_resched(); spin_lock(&files->file_lock); } } spin_unlock(&files->file_lock); Not a good fit for guard(), for the same reasons.
do_close_on_exec(): ... spin_lock(&files->file_lock); for (i = 0; ; i++) { .... for ( ; set ; fd++, set >>= 1) { .... spin_unlock(&files->file_lock); filp_close(file, files); cond_resched(); spin_lock(&files->file_lock); } } spin_unlock(&files->file_lock); Same story.
io_close(): might be convertible to scoped_guard; won't be pretty, AFAICS - that -EAGAIN case in the middle makes it very clumsy.
do_dup2(): well... we could lift filp_close() into the callers, but that ends up with fairly unpleasant boilerplate in the callers, and one of those callers is a fairly hot syscall.
And that's the remaining 5. For some locks scoped_guard() is a decent fit; for some it really isn't ;-/
linux-stable-mirror@lists.linaro.org