cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid" for this, but the conversion to new fscache somehow lost usage of it: use the writeback fid instead of normal one.
Note that the way this works (writeback fid being linked to inode) means we might use overprivileged fid for some operations, e.g. write as root when we shouldn't. Ideally we should keep both fids handy, and only use the writeback fid when really required e.g. reads to a write-only file to fill in the page cache (read-modify-write); but this is the situation we've always had and this commit only fixes an issue we've had for too long.
Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells dhowells@redhat.com Reported-By: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org --- Ok so finally had time to look at this, and it's not a lot so this is the most straight forward way to do: just reverting to how the old fscache worked.
This appears to work from quick testing, Chiristian could you test it?
I think the warnings you added in p9_client_read/write that check fid->mode might a lot of sense, if you care to resend it as WARN_ON((fid->mode & ACCMODE) == O_xyz); instead I'll queue that for 5.20
@Stable people, I've checked it applies to 5.17 and 5.18 so should be good to grab once I submit it for inclusion (that commit was included in 5.16, which is no longer stable)
fs/9p/vfs_addr.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 7382c5227e94..262968d02f55 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { - struct p9_fid *fid = file->private_data; + struct inode *inode = file_inode(file); + struct v9fs_inode *v9inode = V9FS_I(inode); + struct p9_fid *fid = v9inode->writeback_fid; + + BUG_ON(!fid);
p9_fid_get(fid); rreq->netfs_priv = fid;
Dominique Martinet wrote on Tue, Jun 14, 2022 at 12:38:02PM +0900:
cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid" for this, but the conversion to new fscache somehow lost usage of it: use the writeback fid instead of normal one.
Note that the way this works (writeback fid being linked to inode) means we might use overprivileged fid for some operations, e.g. write as root when we shouldn't. Ideally we should keep both fids handy, and only use the writeback fid when really required e.g. reads to a write-only file to fill in the page cache (read-modify-write); but this is the situation we've always had and this commit only fixes an issue we've had for too long.
Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells dhowells@redhat.com Reported-By: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org
Ok so finally had time to look at this, and it's not a lot so this is the most straight forward way to do: just reverting to how the old fscache worked.
This appears to work from quick testing, Chiristian could you test it?
I think the warnings you added in p9_client_read/write that check fid->mode might a lot of sense, if you care to resend it as WARN_ON((fid->mode & ACCMODE) == O_xyz); instead I'll queue that for 5.20
@Stable people, I've checked it applies to 5.17 and 5.18 so should be good to grab once I submit it for inclusion (that commit was included in 5.16, which is no longer stable)
fs/9p/vfs_addr.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 7382c5227e94..262968d02f55 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) {
- struct p9_fid *fid = file->private_data;
- struct inode *inode = file_inode(file);
- struct v9fs_inode *v9inode = V9FS_I(inode);
- struct p9_fid *fid = v9inode->writeback_fid;
Sorry for mails back-to-back (grmbl I hate git commit --amend not warning I only have unstaged changes), this is missing the following here:
+ /* If there is no writeback fid this file only ever has had + * read-only opens, so we can use file's fid which should + * always be set instead */ + if (!fid) + fid = file->private_data;
Christian, you can find it here to test: https://github.com/martinetd/linux/commit/a6e033c41cc9f0ec105f5d208b0a820118...
- BUG_ON(!fid);
p9_fid_get(fid); rreq->netfs_priv = fid;
Thanks,
On Dienstag, 14. Juni 2022 05:41:40 CEST Dominique Martinet wrote:
Dominique Martinet wrote on Tue, Jun 14, 2022 at 12:38:02PM +0900:
cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid" for this, but the conversion to new fscache somehow lost usage of it: use the writeback fid instead of normal one.
Note that the way this works (writeback fid being linked to inode) means we might use overprivileged fid for some operations, e.g. write as root when we shouldn't. Ideally we should keep both fids handy, and only use the writeback fid when really required e.g. reads to a write-only file to fill in the page cache (read-modify-write); but this is the situation we've always had and this commit only fixes an issue we've had for too long.
Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells dhowells@redhat.com Reported-By: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org
Ok so finally had time to look at this, and it's not a lot so this is the most straight forward way to do: just reverting to how the old fscache worked.
This appears to work from quick testing, Chiristian could you test it?
I think the warnings you added in p9_client_read/write that check fid->mode might a lot of sense, if you care to resend it as WARN_ON((fid->mode & ACCMODE) == O_xyz); instead I'll queue that for 5.20
@Stable people, I've checked it applies to 5.17 and 5.18 so should be good to grab once I submit it for inclusion (that commit was included in 5.16, which is no longer stable)
fs/9p/vfs_addr.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index 7382c5227e94..262968d02f55 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,7 +58,11 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq)> */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) {
- struct p9_fid *fid = file->private_data;
- struct inode *inode = file_inode(file);
- struct v9fs_inode *v9inode = V9FS_I(inode);
- struct p9_fid *fid = v9inode->writeback_fid;
Sorry for mails back-to-back (grmbl I hate git commit --amend not warning I only have unstaged changes), this is missing the following here:
I think git does actually. It shows you staged and unstaged changes as comment below the commit log text inside the editor. Not as a big fat warning, but the info is there.
- /* If there is no writeback fid this file only ever has had
* read-only opens, so we can use file's fid which should
* always be set instead */
- if (!fid)
fid = file->private_data;
Christian, you can find it here to test: https://github.com/martinetd/linux/commit/a6e033c41cc9f0ec105f5d208b0a820118 e2bda8
BUG_ON(!fid);
p9_fid_get(fid); rreq->netfs_priv = fid;
It definitely goes into the right direction, but I think it's going a bit too far by using writeback_fid also in cases where it is not necessary and wasn't used before in the past.
What about something like this in v9fs_init_request() (yet untested):
/* writeback_fid is always opened O_RDWR (instead of just O_WRONLY) * explicitly for this case: partial write backs that require a read * prior to actual write and therefore requires a fid with read * capability. */ if (rreq->origin == NETFS_READ_FOR_WRITE) fid = v9inode->writeback_fid;
If desired, this could be further constrained later on like:
if (rreq->origin == NETFS_READ_FOR_WRITE && (fid->mode & O_ACCMODE) == O_WRONLY) { fid = v9inode->writeback_fid; }
I will definitely give these options some test spins here, a short feedback ahead would be appreciated though.
Thanks Dominique!
Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Tue, Jun 14, 2022 at 02:10:01PM +0200:
It definitely goes into the right direction, but I think it's going a bit too far by using writeback_fid also in cases where it is not necessary and wasn't used before in the past.
Would help if I had an idea of what was used where in the past.. :)
From a quick look at the code, checking out v5.10, v9fs_vfs_writepage_locked() used the writeback fid always for all writes v9fs_vfs_readpages is a bit more complex but only seems to be using the "direct" private_data fid for reads... It took me a bit of time but I think the reads you were seeing on writeback fid come from v9fs_write_begin that does some readpage on the writeback fid to populate the page before a non-filling write happens.
What about something like this in v9fs_init_request() (yet untested):
/* writeback_fid is always opened O_RDWR (instead of just O_WRONLY) * explicitly for this case: partial write backs that require a read * prior to actual write and therefore requires a fid with read * capability. */ if (rreq->origin == NETFS_READ_FOR_WRITE) fid = v9inode->writeback_fid;
... Which seems to be exactly what this origin is about, so if that works I'm all for it.
If desired, this could be further constrained later on like:
if (rreq->origin == NETFS_READ_FOR_WRITE && (fid->mode & O_ACCMODE) == O_WRONLY) { fid = v9inode->writeback_fid; }
That also makes sense, if the fid mode has read permissions we might as well use these as the writeback fid would needlessly be doing root IOs.
I will definitely give these options some test spins here, a short feedback ahead would be appreciated though.
Please let me know how that works out, I'd be happy to use either of your versions instead of mine. If I can be greedy though I'd like to post it together with the other couple of fixes next week, so having something before the end of the week would be great -- I think even my first overkill version early and building on it would make sense at this point.
But I think you've got the right end, so hopefully won't be needing to delay
Cheers,
On Dienstag, 14. Juni 2022 14:45:38 CEST Dominique Martinet wrote:
Christian Schoenebeck wrote on Tue, Jun 14, 2022 at 02:10:01PM +0200:
It definitely goes into the right direction, but I think it's going a bit too far by using writeback_fid also in cases where it is not necessary and wasn't used before in the past.
Would help if I had an idea of what was used where in the past.. :)
From a quick look at the code, checking out v5.10, v9fs_vfs_writepage_locked() used the writeback fid always for all writes v9fs_vfs_readpages is a bit more complex but only seems to be using the "direct" private_data fid for reads... It took me a bit of time but I think the reads you were seeing on writeback fid come from v9fs_write_begin that does some readpage on the writeback fid to populate the page before a non-filling write happens.
Yes, the overall picture in the past was not clear to me either.
To be more specific, I was reading your patch as if it would e.g. also use the writeback_fid if somebody explicitly called read() (i.e. not an implied read caused by a partial write back), and was concerned about a potential privilege escalation. Maybe it's just a theoretical issue, as this case is probably already catched on a higher, general fs handling level, but worth consideration.
What about something like this in v9fs_init_request() (yet untested): /* writeback_fid is always opened O_RDWR (instead of just O_WRONLY) * explicitly for this case: partial write backs that require a read * prior to actual write and therefore requires a fid with read * capability. */ if (rreq->origin == NETFS_READ_FOR_WRITE) fid = v9inode->writeback_fid;
... Which seems to be exactly what this origin is about, so if that works I'm all for it.
If desired, this could be further constrained later on like: if (rreq->origin == NETFS_READ_FOR_WRITE && (fid->mode & O_ACCMODE) == O_WRONLY) { fid = v9inode->writeback_fid; }
That also makes sense, if the fid mode has read permissions we might as well use these as the writeback fid would needlessly be doing root IOs.
I will definitely give these options some test spins here, a short feedback ahead would be appreciated though.
Please let me know how that works out, I'd be happy to use either of your versions instead of mine. If I can be greedy though I'd like to post it together with the other couple of fixes next week, so having something before the end of the week would be great -- I think even my first overkill version early and building on it would make sense at this point.
But I think you've got the right end, so hopefully won't be needing to delay
I need a day or two for testing, then I will report back for sure. So it should perfectly fit into your intended schedule.
Thanks!
Best regards, Christian Schoenebeck
On Dienstag, 14. Juni 2022 16:11:35 CEST Christian Schoenebeck wrote:
On Dienstag, 14. Juni 2022 14:45:38 CEST Dominique Martinet wrote:
[...]
Please let me know how that works out, I'd be happy to use either of your versions instead of mine. If I can be greedy though I'd like to post it together with the other couple of fixes next week, so having something before the end of the week would be great -- I think even my first overkill version early and building on it would make sense at this point.
But I think you've got the right end, so hopefully won't be needing to delay
I need a day or two for testing, then I will report back for sure. So it should perfectly fit into your intended schedule.
Two things:
1. your EBADF patch is based on you recent get/put refactoring patch, so it won't apply on stable.
2. I fixed the conflict and gave your patch a test spin, and it triggers the BUG_ON(!fid); that you added with that patch. Backtrace based on 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."):
[ 2.211473] kernel BUG at fs/9p/vfs_addr.c:65! ... [ 2.244415] netfs_alloc_request (fs/netfs/objects.c:42) netfs [ 2.245438] netfs_readahead (fs/netfs/buffered_read.c:166) netfs [ 2.246392] read_pages (./include/linux/pagemap.h:1264 ./include/linux/pagemap.h:1306 mm/readahead.c:164) [ 2.247120] ? folio_add_lru (./arch/x86/include/asm/preempt.h:103 mm/swap.c:468) [ 2.247911] page_cache_ra_unbounded (./include/linux/fs.h:808 mm/readahead.c:264) [ 2.248875] filemap_get_pages (mm/filemap.c:2594) [ 2.249723] filemap_read (mm/filemap.c:2679) [ 2.250478] ? ptep_set_access_flags (./arch/x86/include/asm/paravirt.h:441 arch/x86/mm/pgtable.c:493) [ 2.251417] ? _raw_spin_unlock (./arch/x86/include/asm/preempt.h:103 ./include/linux/spinlock_api_smp.h:143 kernel/locking/spinlock.c:186) [ 2.252253] ? do_wp_page (mm/memory.c:3293 mm/memory.c:3393) [ 2.253012] ? aa_file_perm (security/apparmor/file.c:604) [ 2.253824] new_sync_read (fs/read_write.c:402 (discriminator 1)) [ 2.254616] vfs_read (fs/read_write.c:482) [ 2.255313] ksys_read (fs/read_write.c:620) [ 2.256000] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 2.256764] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:115)
Did your patch work there for you? I mean I have not applied the other pending 9p patches, but they should not really make difference, right? I won't have time today, but I will continue to look at it tomorrow. If you already had some thoughts on this, that would be great of course.
Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 03:35:59PM +0200:
- I fixed the conflict and gave your patch a test spin, and it triggers
the BUG_ON(!fid); that you added with that patch. Backtrace based on 30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."):
hm, that's probably the version I sent without the fallback to private_data fid if writeback fid was sent (I've only commented without sending a v2)
- your EBADF patch is based on you recent get/put refactoring patch, so it won't apply on stable.
ugh, you are correct, that was wrong as well in the version I sent by mail... I've hurried that way too much.
The patch that's currently on the tip of my 9p-next branch should be alright though, I'll resend it now so you can apply cleanly if you don't want to fetch https://github.com/martinetd/linux/commits/9p-next
Did your patch work there for you? I mean I have not applied the other pending 9p patches, but they should not really make difference, right? I won't have time today, but I will continue to look at it tomorrow. If you already had some thoughts on this, that would be great of course.
Yes, my version passes basic tests at least, and I could no longer reproduce the problem.
Without the if (!fid) fid = file->private_data though it does fail horribly like you've found out.
-- Dominique
Dominique Martinet wrote on Thu, Jun 16, 2022 at 10:51:31PM +0900:
Did your patch work there for you? I mean I have not applied the other pending 9p patches, but they should not really make difference, right? I won't have time today, but I will continue to look at it tomorrow. If you already had some thoughts on this, that would be great of course.
Yes, my version passes basic tests at least, and I could no longer reproduce the problem.
For what it's worth I've also tested a version of your patch:
----- diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a8f512b44a85..d0833fa69faf 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { + struct inode *inode = file_inode(file); + struct v9fs_inode *v9inode = V9FS_I(inode); struct p9_fid *fid = file->private_data;
+ BUG_ON(!fid); + + /* we might need to read from a fid that was opened write-only + * for read-modify-write of page cache, use the writeback fid + * for that */ + if (rreq->origin == NETFS_READ_FOR_WRITE && + (fid->mode & O_ACCMODE) == O_WRONLY) { + fid = v9inode->writeback_fid; + BUG_ON(!fid); + } + refcount_inc(&fid->count); rreq->netfs_priv = fid; return 0; -----
And this also seems to work alright.
I was about to ask why the original code did writes with the writeback fid, but I'm noticing now the current code still does (through v9fs_vfs_write_folio_locked()), so that part hasn't changed from the old code, and init_request will only be getting reads? Which actually makes sense now I'm thinking about it because I recall David saying he's working on netfs writes now...
So that minimal version is probably what we want, give or take style adjustments (only initializing inode/v9inode in the if case or not) -- I sure hope compilers optimizes it away when not needed.
I'll let you test one or both versions and will fixup the commit message again/credit you/resend if we go with this version, unless you want to send it.
-- Dominique
On Donnerstag, 16. Juni 2022 15:51:31 CEST Dominique Martinet wrote:
Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 03:35:59PM +0200:
- I fixed the conflict and gave your patch a test spin, and it triggers
the BUG_ON(!fid); that you added with that patch. Backtrace based on
30306f6194ca ("Merge tag 'hardening-v5.19-rc3' ..."):
hm, that's probably the version I sent without the fallback to private_data fid if writeback fid was sent (I've only commented without sending a v2)
Right, I forgot that you queued another version, sorry. With your already queued patch (today's v2) that's fine now.
On Donnerstag, 16. Juni 2022 16:11:16 CEST Dominique Martinet wrote:
Dominique Martinet wrote on Thu, Jun 16, 2022 at 10:51:31PM +0900:
Did your patch work there for you? I mean I have not applied the other pending 9p patches, but they should not really make difference, right? I won't have time today, but I will continue to look at it tomorrow. If you already had some thoughts on this, that would be great of course.
Yes, my version passes basic tests at least, and I could no longer reproduce the problem.
For what it's worth I've also tested a version of your patch:
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a8f512b44a85..d0833fa69faf 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) {
struct inode *inode = file_inode(file);
struct v9fs_inode *v9inode = V9FS_I(inode); struct p9_fid *fid = file->private_data;
BUG_ON(!fid);
/* we might need to read from a fid that was opened write-only
* for read-modify-write of page cache, use the writeback fid
* for that */
if (rreq->origin == NETFS_READ_FOR_WRITE &&
(fid->mode & O_ACCMODE) == O_WRONLY) {
fid = v9inode->writeback_fid;
BUG_ON(!fid);
}
refcount_inc(&fid->count); rreq->netfs_priv = fid; return 0;
And this also seems to work alright.
I was about to ask why the original code did writes with the writeback fid, but I'm noticing now the current code still does (through v9fs_vfs_write_folio_locked()), so that part hasn't changed from the old code, and init_request will only be getting reads? Which actually makes sense now I'm thinking about it because I recall David saying he's working on netfs writes now...
So that minimal version is probably what we want, give or take style adjustments (only initializing inode/v9inode in the if case or not) -- I sure hope compilers optimizes it away when not needed.
I'll let you test one or both versions and will fixup the commit message again/credit you/resend if we go with this version, unless you want to send it.
-- Dominique
I tested all 3 variants today, and they were all behaving correctly (no EBADF errors anymore, no other side effects observed).
The minimalistic version (i.e. your initial suggestion) performed 20% slower in my tests, but that could be due to the fact that it was simply the 1st version I tested, so caching on host side might be the reason. If necessary I can check the performance aspect more thoroughly.
Personally I would at least use the NETFS_READ_FOR_WRITE version, but that's up to you. On doubt, clarify with David's plans.
Feel free to add my RB and TB tags to any of the 3 version(s) you end up queuing:
Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Tested-by: Christian Schoenebeck linux_oss@crudebyte.com
Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Thu, Jun 16, 2022 at 10:14:16PM +0200:
I tested all 3 variants today, and they were all behaving correctly (no EBADF errors anymore, no other side effects observed).
Thanks!
The minimalistic version (i.e. your initial suggestion) performed 20% slower in my tests, but that could be due to the fact that it was simply the 1st version I tested, so caching on host side might be the reason. If necessary I can check the performance aspect more thoroughly.
hmm, yeah we open the writeback fids anyway so I'm not sure what would be really different performance-wise, but I'd tend to go with the most restricted change anyway.
Personally I would at least use the NETFS_READ_FOR_WRITE version, but that's up to you. On doubt, clarify with David's plans.
Feel free to add my RB and TB tags to any of the 3 version(s) you end up queuing:
Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Tested-by: Christian Schoenebeck linux_oss@crudebyte.com
Thanks, I'll add these and resend the last version for archival on the list / commit message wording check.
At last that issue closed... -- Dominique
cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid", a special handle opened RW as root, for this. The conversion to new fscache missed that bit.
This commit reinstates a slightly lesser variant of the original code that uses the writeback fid for partial pages backfills if the regular user fid had been open as WRONLY, and thus would lack read permissions.
Link: https://lkml.kernel.org/r/20220614033802.1606738-1-asmadeus@codewreck.org Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells dhowells@redhat.com Reported-By: Christian Schoenebeck linux_oss@crudebyte.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Tested-by: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org --- v3: use the least permissive version of the patch that only uses writeback fid when really required
If no problem shows up by then I'll post this patch around Wed 23 (next week) with the other stable fixes.
fs/9p/vfs_addr.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a8f512b44a85..d0833fa69faf 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { + struct inode *inode = file_inode(file); + struct v9fs_inode *v9inode = V9FS_I(inode); struct p9_fid *fid = file->private_data;
+ BUG_ON(!fid); + + /* we might need to read from a fid that was opened write-only + * for read-modify-write of page cache, use the writeback fid + * for that */ + if (rreq->origin == NETFS_READ_FOR_WRITE && + (fid->mode & O_ACCMODE) == O_WRONLY) { + fid = v9inode->writeback_fid; + BUG_ON(!fid); + } + refcount_inc(&fid->count); rreq->netfs_priv = fid; return 0;
On Donnerstag, 16. Juni 2022 23:10:25 CEST Dominique Martinet wrote:
cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid", a special handle opened RW as root, for this. The conversion to new fscache missed that bit.
This commit reinstates a slightly lesser variant of the original code that uses the writeback fid for partial pages backfills if the regular user fid had been open as WRONLY, and thus would lack read permissions.
Link: https://lkml.kernel.org/r/20220614033802.1606738-1-asmadeus@codewreck.org Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells dhowells@redhat.com Reported-By: Christian Schoenebeck linux_oss@crudebyte.com Reviewed-by: Christian Schoenebeck linux_oss@crudebyte.com Tested-by: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org
v3: use the least permissive version of the patch that only uses writeback fid when really required
If no problem shows up by then I'll post this patch around Wed 23 (next week) with the other stable fixes.
fs/9p/vfs_addr.c | 13 +++++++++++++ 1 file changed, 13 insertions(+)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a8f512b44a85..d0833fa69faf 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,8 +58,21 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) {
struct inode *inode = file_inode(file);
struct v9fs_inode *v9inode = V9FS_I(inode); struct p9_fid *fid = file->private_data;
BUG_ON(!fid);
/* we might need to read from a fid that was opened write-only
* for read-modify-write of page cache, use the writeback fid
* for that */
if (rreq->origin == NETFS_READ_FOR_WRITE &&
(fid->mode & O_ACCMODE) == O_WRONLY) {
fid = v9inode->writeback_fid;
BUG_ON(!fid);
}
refcount_inc(&fid->count); rreq->netfs_priv = fid; return 0;
Some more tests this weekend; all looks fine. It appears that this also fixed the performance degradation that I reported early in this thread. Again, benchmarks compiling a bunch of sources:
Case Linux kernel version msize cache duration (average)
A) EBADF fix only [1] 512000 loose 31m 14s B) EBADF fix only [1] 512000 mmap 44m 1s C) EBADF fix + clunk fixes [2] 512000 loose 29m 32s D) EBADF fix + clunk fixes [2] 512000 mmap 44m 0s E) 5.10.84 512000 loose 35m 5s F) 5.10.84 512000 mmap 65m 5s
[1] 5.19.0-rc2 + EBADF fix v3 patch (alone): https://lore.kernel.org/lkml/20220616211025.1790171-1-asmadeus@codewreck.org...
[2] 5.19.0-rc2 + EBADF fix v3 patch + clunk fix patches, a.k.a. 9p-next: https://github.com/martinetd/linux/commit/b0017602fdf6bd3f344dd49eaee8b6ffee...
Conclusion: all thumbs in my possession pointing upwards. :)
Thanks Dominique!
Best regards, Christian Schoenebeck
Christian Schoenebeck wrote on Mon, Jun 20, 2022 at 02:47:24PM +0200:
Some more tests this weekend; all looks fine. It appears that this also fixed the performance degradation that I reported early in this thread.
wow, I wouldn't have expected the EBADF fix patch to have any impact on performance. Maybe the build just behaved differently enough to take more time with the errors?
Again, benchmarks compiling a bunch of sources:
Case Linux kernel version msize cache duration (average)
A) EBADF fix only [1] 512000 loose 31m 14s B) EBADF fix only [1] 512000 mmap 44m 1s C) EBADF fix + clunk fixes [2] 512000 loose 29m 32s D) EBADF fix + clunk fixes [2] 512000 mmap 44m 0s E) 5.10.84 512000 loose 35m 5s F) 5.10.84 512000 mmap 65m 5s
[1] 5.19.0-rc2 + EBADF fix v3 patch (alone): https://lore.kernel.org/lkml/20220616211025.1790171-1-asmadeus@codewreck.org...
[2] 5.19.0-rc2 + EBADF fix v3 patch + clunk fix patches, a.k.a. 9p-next: https://github.com/martinetd/linux/commit/b0017602fdf6bd3f344dd49eaee8b6ffee...
Conclusion: all thumbs in my possession pointing upwards. :)
Thanks Dominique!
Great news, thanks for the tests! :)
-- Dominique
On Montag, 20. Juni 2022 22:34:38 CEST Dominique Martinet wrote:
Christian Schoenebeck wrote on Mon, Jun 20, 2022 at 02:47:24PM +0200:
Some more tests this weekend; all looks fine. It appears that this also fixed the performance degradation that I reported early in this thread.
wow, I wouldn't have expected the EBADF fix patch to have any impact on performance. Maybe the build just behaved differently enough to take more time with the errors?
Maybe. It could also be less overhead using writeback_fid vs. dedicated fid, i.e. no walking and fid cloning required when just using the writeback_fid which is already there (reduced latency).
Probably also overall reduced total amount of fids might have some (smaller) impact, as on QEMU 9p server side we still have a simple linked list for fids which is iterated on each fid lookup. A proc-like interface for statistics (e.g. max. amount of fids) would be useful.
But honestly, all these things still don't really explain to me such a difference from performance PoV in regards to this patch, as the particular case handled by this patch does not appear to happen often.
Anyway, my plan is to identify performance bottlenecks in general more analytically this year. Now that we have macOS support for 9p in QEMU, I'll probably use Xcode's "Instruments" tool which really has a great way to graphically investigate complex performance aspects in a very intuitive and customizable way, which goes beyond standard profiling. Then I can hunt down performance issues by weight.
Best regards, Christian Schoenebeck
cached operations sometimes need to do invalid operations (e.g. read on a write only file) Historic fscache had added a "writeback fid" for this, but the conversion to new fscache somehow lost usage of it: use the writeback fid instead of normal one.
Note that the way this works (writeback fid being linked to inode) means we might use overprivileged fid for some operations, e.g. write as root when we shouldn't. Ideally we should keep both fids handy, and only use the writeback fid when really required e.g. reads to a write-only file to fill in the page cache (read-modify-write); but this is the situation we've always had and this commit only fixes an issue we've had for too long.
Link: https://lkml.kernel.org/r/20220614033802.1606738-1-asmadeus@codewreck.org Fixes: eb497943fa21 ("9p: Convert to using the netfs helper lib to do reads and caching") Cc: stable@vger.kernel.org Cc: David Howells dhowells@redhat.com Reported-By: Christian Schoenebeck linux_oss@crudebyte.com Signed-off-by: Dominique Martinet asmadeus@codewreck.org --- fs/9p/vfs_addr.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/fs/9p/vfs_addr.c b/fs/9p/vfs_addr.c index a8f512b44a85..7f924e671e3e 100644 --- a/fs/9p/vfs_addr.c +++ b/fs/9p/vfs_addr.c @@ -58,7 +58,17 @@ static void v9fs_issue_read(struct netfs_io_subrequest *subreq) */ static int v9fs_init_request(struct netfs_io_request *rreq, struct file *file) { - struct p9_fid *fid = file->private_data; + struct inode *inode = file_inode(file); + struct v9fs_inode *v9inode = V9FS_I(inode); + struct p9_fid *fid = v9inode->writeback_fid; + + /* If there is no writeback fid this file only ever has had + * read-only opens, so we can use file's fid which should + * always be set instead */ + if (!fid) + fid = file->private_data; + + BUG_ON(!fid);
refcount_inc(&fid->count); rreq->netfs_priv = fid;
linux-stable-mirror@lists.linaro.org