Hi Greg,
the issue might be due to this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tre…
2024-02-23T15:39:05.6297767Z CC kernel/sys_ni.o
2024-02-23T15:39:05.7583048Z security/apparmor/af_unix.c: In function
‘unix_state_double_lock’:
2024-02-23T15:39:05.7586076Z security/apparmor/af_unix.c:583:17: error:
too few arguments to function ‘unix_state_lock_nested’
2024-02-23T15:39:05.7588374Z 583 |
unix_state_lock_nested(sk2);
2024-02-23T15:39:05.7589913Z | ^~~~~~~~~~~~~~~~~~~~~~
2024-02-23T15:39:05.7591564Z In file included from
security/apparmor/include/af_unix.h:15,
2024-02-23T15:39:05.7593341Z from
security/apparmor/af_unix.c:17:
2024-02-23T15:39:05.7594989Z ./include/net/af_unix.h:77:20: note:
declared here
2024-02-23T15:39:05.7596733Z 77 | static inline void
unix_state_lock_nested(struct sock *sk,
2024-02-23T15:39:05.7598516Z |
^~~~~~~~~~~~~~~~~~~~~~
2024-02-23T15:39:05.7600862Z security/apparmor/af_unix.c:586:17: error:
too few arguments to function ‘unix_state_lock_nested’
2024-02-23T15:39:05.7603177Z 586 |
unix_state_lock_nested(sk1);
2024-02-23T15:39:05.7605189Z | ^~~~~~~~~~~~~~~~~~~~~~
2024-02-23T15:39:05.7606765Z ./include/net/af_unix.h:77:20: note:
declared here
2024-02-23T15:39:05.7608497Z 77 | static inline void
unix_state_lock_nested(struct sock *sk,
2024-02-23T15:39:05.7610208Z |
^~~~~~~~~~~~~~~~~~~~~~
2024-02-23T15:39:05.8002385Z make[2]: *** [scripts/Makefile.build:262:
security/apparmor/af_unix.o] Error 1
2024-02-23T15:39:05.8005077Z make[2]: *** Waiting for unfinished jobs....
2024-02-23T15:39:05.8094726Z CC crypto/scatterwalk.o
2024-02-23T15:39:05.9082621Z CC [M] fs/btrfs/sysfs.o
2024-02-23T15:39:06.2502316Z CC kernel/nsproxy.o
2024-02-23T15:39:06.4094246Z make[1]: *** [scripts/Makefile.build:497:
security/apparmor] Error 2
2024-02-23T15:39:06.4207119Z make: *** [Makefile:1750: security] Error 2
2024-02-23T15:39:06.4208636Z CC kernel/notifier.o
2024-02-23T15:39:06.4210296Z make: *** Waiting for unfinished jobs....
2024-02-23T15:39:06.8604827Z CC crypto/proc.o
--
Best, Philip
This is the backport of recently upstreamed series that moves VERW
execution to a later point in exit-to-user path. This is needed because
in some cases it may be possible for data accessed after VERW executions
may end into MDS affected CPU buffers. Moving VERW closer to ring
transition reduces the attack surface.
Patch 1/6 includes a minor fix that is queued for upstream:
https://lore.kernel.org/lkml/170899674562.398.6398007479766564897.tip-bot2@…
Patch 2/6 needed a conflict to be resolved for the hunk
swapgs_restore_regs_and_return_to_usermode.
This is only compile and boot tested on qemu.
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (5):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Sean Christopherson (1):
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Documentation/arch/x86/mds.rst | 38 +++++++++++++++++++++++++-----------
arch/x86/entry/entry.S | 23 ++++++++++++++++++++++
arch/x86/entry/entry_32.S | 3 +++
arch/x86/entry/entry_64.S | 11 +++++++++++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/nospec-branch.h | 25 ++++++++++++------------
arch/x86/kernel/cpu/bugs.c | 15 ++++++--------
arch/x86/kernel/nmi.c | 3 ---
arch/x86/kvm/vmx/run_flags.h | 7 +++++--
arch/x86/kvm/vmx/vmenter.S | 9 ++++++---
arch/x86/kvm/vmx/vmx.c | 20 +++++++++++++++----
13 files changed, 112 insertions(+), 46 deletions(-)
---
base-commit: d8a27ea2c98685cdaa5fa66c809c7069a4ff394b
change-id: 20240226-delay-verw-backport-6-6-y-2cda3298e600
Oliver Upton (2):
KVM: arm64: vgic-its: Test for valid IRQ in
its_sync_lpi_pending_table()
KVM: arm64: vgic-its: Test for valid IRQ in MOVALL handler
virt/kvm/arm/vgic/vgic-its.c | 5 +++++
1 file changed, 5 insertions(+)
base-commit: ab219d38aef198d26083cc800954d352acd5137b
--
2.44.0.rc1.240.g4c46232300-goog
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 678e54d4bb9a4822f8ae99690ac131c5d490cdb1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024022622-resent-ripeness-43f1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
678e54d4bb9a ("mm/zswap: invalidate duplicate entry when !zswap_enabled")
a65b0e7607cc ("zswap: make shrinking memcg-aware")
ddc1a5cbc05d ("mempolicy: alloc_pages_mpol() for NUMA policy without vma")
23e4883248f0 ("mm: add page_rmappable_folio() wrapper")
c36f6e6dff4d ("mempolicy trivia: slightly more consistent naming")
7f1ee4e20708 ("mempolicy trivia: delete those ancient pr_debug()s")
1cb5d11a370f ("mempolicy: fix migrate_pages(2) syscall return nr_failed")
3657fdc2451a ("mm: move vma_policy() and anon_vma_name() decls to mm_types.h")
3022fd7af960 ("shmem: _add_to_page_cache() before shmem_inode_acct_blocks()")
054a9f7ccd0a ("shmem: move memcg charge out of shmem_add_to_page_cache()")
4199f51a7eb2 ("shmem: shmem_acct_blocks() and shmem_inode_acct_blocks()")
e3e1a5067fd2 ("shmem: remove vma arg from shmem_get_folio_gfp()")
75c70128a673 ("mm: mempolicy: make mpol_misplaced() to take a folio")
cda6d93672ac ("mm: memory: make numa_migrate_prep() to take a folio")
6695cf68b15c ("mm: memory: use a folio in do_numa_page()")
667ffc31aa95 ("mm: huge_memory: use a folio in do_huge_pmd_numa_page()")
73eab3ca481e ("mm: migrate: convert migrate_misplaced_page() to migrate_misplaced_folio()")
2ac9e99f3b21 ("mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 678e54d4bb9a4822f8ae99690ac131c5d490cdb1 Mon Sep 17 00:00:00 2001
From: Chengming Zhou <zhouchengming(a)bytedance.com>
Date: Thu, 8 Feb 2024 02:32:54 +0000
Subject: [PATCH] mm/zswap: invalidate duplicate entry when !zswap_enabled
We have to invalidate any duplicate entry even when !zswap_enabled since
zswap can be disabled anytime. If the folio store success before, then
got dirtied again but zswap disabled, we won't invalidate the old
duplicate entry in the zswap_store(). So later lru writeback may
overwrite the new data in swapfile.
Link: https://lkml.kernel.org/r/20240208023254.3873823-1-chengming.zhou@linux.dev
Fixes: 42c06a0e8ebe ("mm: kill frontswap")
Signed-off-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Nhat Pham <nphamcs(a)gmail.com>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/zswap.c b/mm/zswap.c
index 36903d938c15..db4625af65fb 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1518,7 +1518,7 @@ bool zswap_store(struct folio *folio)
if (folio_test_large(folio))
return false;
- if (!zswap_enabled || !tree)
+ if (!tree)
return false;
/*
@@ -1533,6 +1533,10 @@ bool zswap_store(struct folio *folio)
zswap_invalidate_entry(tree, dupentry);
}
spin_unlock(&tree->lock);
+
+ if (!zswap_enabled)
+ return false;
+
objcg = get_obj_cgroup_from_folio(folio);
if (objcg && !obj_cgroup_may_zswap(objcg)) {
memcg = get_mem_cgroup_from_objcg(objcg);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x db744ddd59be798c2627efbfc71f707f5a935a40
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024022609-womanless-imprison-678c@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
db744ddd59be ("PCI/MSI: Prevent MSI hardware interrupt number truncation")
aa423ac4221a ("PCI/MSI: Split out irqdomain code")
a01e09ef1237 ("PCI/MSI: Split out !IRQDOMAIN code")
54324c2f3d72 ("PCI/MSI: Split out CONFIG_PCI_MSI independent part")
288c81ce4be7 ("PCI/MSI: Move code into a separate directory")
29a03ada4a00 ("PCI/MSI: Cleanup include zoo")
ae72f3156729 ("PCI/MSI: Make arch_restore_msi_irqs() less horrible.")
e58f2259b91c ("genirq/msi, treewide: Use a named struct for PCI/MSI attributes")
ade044a3d0f0 ("PCI/MSI: Remove msi_desc_to_pci_sysdata()")
9e8688c5f299 ("PCI/MSI: Make pci_msi_domain_write_msg() static")
29bbc35e29d9 ("PCI/MSI: Fix pci_irq_vector()/pci_irq_get_affinity()")
c36e33e2f477 ("Merge tag 'irq-urgent-2021-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From db744ddd59be798c2627efbfc71f707f5a935a40 Mon Sep 17 00:00:00 2001
From: Vidya Sagar <vidyas(a)nvidia.com>
Date: Mon, 15 Jan 2024 19:26:49 +0530
Subject: [PATCH] PCI/MSI: Prevent MSI hardware interrupt number truncation
While calculating the hardware interrupt number for a MSI interrupt, the
higher bits (i.e. from bit-5 onwards a.k.a domain_nr >= 32) of the PCI
domain number gets truncated because of the shifted value casting to return
type of pci_domain_nr() which is 'int'. This for example is resulting in
same hardware interrupt number for devices 0019:00:00.0 and 0039:00:00.0.
To address this cast the PCI domain number to 'irq_hw_number_t' before left
shifting it to calculate the hardware interrupt number.
Please note that this fixes the issue only on 64-bit systems and doesn't
change the behavior for 32-bit systems i.e. the 32-bit systems continue to
have the issue. Since the issue surfaces only if there are too many PCIe
controllers in the system which usually is the case in modern server
systems and they don't tend to run 32-bit kernels.
Fixes: 3878eaefb89a ("PCI/MSI: Enhance core to support hierarchy irqdomain")
Signed-off-by: Vidya Sagar <vidyas(a)nvidia.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Shanker Donthineni <sdonthineni(a)nvidia.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240115135649.708536-1-vidyas@nvidia.com
diff --git a/drivers/pci/msi/irqdomain.c b/drivers/pci/msi/irqdomain.c
index c8be056c248d..cfd84a899c82 100644
--- a/drivers/pci/msi/irqdomain.c
+++ b/drivers/pci/msi/irqdomain.c
@@ -61,7 +61,7 @@ static irq_hw_number_t pci_msi_domain_calc_hwirq(struct msi_desc *desc)
return (irq_hw_number_t)desc->msi_index |
pci_dev_id(dev) << 11 |
- (pci_domain_nr(dev->bus) & 0xFFFFFFFF) << 27;
+ ((irq_hw_number_t)(pci_domain_nr(dev->bus) & 0xFFFFFFFF)) << 27;
}
static void pci_msi_domain_set_desc(msi_alloc_info_t *arg,
From: Shyam Prasad N <nspmangalore(a)gmail.com>
commit 69cba9d3c1284e0838ae408830a02c4a063104bc upstream.
When the number of responses with status of STATUS_IO_TIMEOUT
exceeds a specified threshold (NUM_STATUS_IO_TIMEOUT), we reconnect
the connection. But we do not return the mid, or the credits
returned for the mid, or reduce the number of in-flight requests.
This bug could result in the server->in_flight count to go bad,
and also cause a leak in the mids.
This change moves the check to a few lines below where the
response is decrypted, even of the response is read from the
transform header. This way, the code for returning the mids
can be reused.
Also, the cifs_reconnect was reconnecting just the transport
connection before. In case of multi-channel, this may not be
what we want to do after several timeouts. Changed that to
reconnect the session and the tree too.
Also renamed NUM_STATUS_IO_TIMEOUT to a more appropriate name
MAX_STATUS_IO_TIMEOUT.
Fixes: 8e670f77c4a5 ("Handle STATUS_IO_TIMEOUT gracefully")
Signed-off-by: Shyam Prasad N <sprasad(a)microsoft.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[Harshit: Backport to 5.15.y]
Conflicts:
fs/cifs/connect.c -- 5.15.y doesn't have commit 183eea2ee5ba
("cifs: reconnect only the connection and not smb session where
possible") -- User cifs_reconnect(server) instead of
cifs_reconnect(server, true)
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli(a)oracle.com>
---
Would be nice to get a review from author/maintainer of the upstream patch.
A backport request was made previously but the patch didnot apply
cleanly then:
https://lore.kernel.org/all/CANT5p=oPGnCd4H5ppMbAiHsAKMor3LT_aQRqU7tKu=q6q1…
xfstests with cifs done: before and after patching with this patch on 5.15.149.
There is no change in test results before and after the patch.
Ran: cifs/001 generic/001 generic/002 generic/005 generic/006 generic/007
generic/010 generic/011 generic/013 generic/014 generic/023 generic/024
generic/028 generic/029 generic/030 generic/036 generic/069 generic/074
generic/075 generic/084 generic/091 generic/095 generic/098 generic/100
generic/109 generic/112 generic/113 generic/124 generic/127 generic/129
generic/130 generic/132 generic/133 generic/135 generic/141 generic/169
generic/198 generic/207 generic/208 generic/210 generic/211 generic/212
generic/221 generic/239 generic/241 generic/245 generic/246 generic/247
generic/248 generic/249 generic/257 generic/263 generic/285 generic/286
generic/308 generic/309 generic/310 generic/315 generic/323 generic/339
generic/340 generic/344 generic/345 generic/346 generic/354 generic/360
generic/393 generic/394
Not run: generic/010 generic/286 generic/315
Failures: generic/075 generic/112 generic/127 generic/285
Failed 4 of 68 tests
SECTION -- smb3
=========================
Ran: cifs/001 generic/001 generic/002 generic/005 generic/006 generic/007
generic/010 generic/011 generic/013 generic/014 generic/023 generic/024
generic/028 generic/029 generic/030 generic/036 generic/069 generic/074
generic/075 generic/084 generic/091 generic/095 generic/098 generic/100
generic/109 generic/112 generic/113 generic/124 generic/127 generic/129
generic/130 generic/132 generic/133 generic/135 generic/141 generic/169
generic/198 generic/207 generic/208 generic/210 generic/211 generic/212
generic/221 generic/239 generic/241 generic/245 generic/246 generic/247
generic/248 generic/249 generic/257 generic/263 generic/285 generic/286
generic/308 generic/309 generic/310 generic/315 generic/323 generic/339
generic/340 generic/344 generic/345 generic/346 generic/354 generic/360
generic/393 generic/394
Not run: generic/010 generic/014 generic/129 generic/130 generic/239
Failures: generic/075 generic/091 generic/112 generic/127 generic/263 generic/285 generic/286
Failed 7 of 68 tests
SECTION -- smb21
=========================
Ran: cifs/001 generic/001 generic/002 generic/005 generic/006 generic/007
generic/010 generic/011 generic/013 generic/014 generic/023 generic/024
generic/028 generic/029 generic/030 generic/036 generic/069 generic/074
generic/075 generic/084 generic/091 generic/095 generic/098 generic/100
generic/109 generic/112 generic/113 generic/124 generic/127 generic/129
generic/130 generic/132 generic/133 generic/135 generic/141 generic/169
generic/198 generic/207 generic/208 generic/210 generic/211 generic/212
generic/221 generic/239 generic/241 generic/245 generic/246 generic/247
generic/248 generic/249 generic/257 generic/263 generic/285 generic/286
generic/308 generic/309 generic/310 generic/315 generic/323 generic/339
generic/340 generic/344 generic/345 generic/346 generic/354 generic/360
generic/393 generic/394
Not run: generic/010 generic/014 generic/129 generic/130 generic/239 generic/286 generic/315
Failures: generic/075 generic/112 generic/127 generic/285
Failed 4 of 68 tests
SECTION -- smb2
=========================
Ran: cifs/001 generic/001 generic/002 generic/005 generic/006 generic/007
generic/010 generic/011 generic/013 generic/014 generic/023 generic/024
generic/028 generic/029 generic/030 generic/036 generic/069 generic/074
generic/075 generic/084 generic/091 generic/095 generic/098 generic/100
generic/109 generic/112 generic/113 generic/124 generic/127 generic/129
generic/130 generic/132 generic/133 generic/135 generic/141 generic/169
generic/198 generic/207 generic/208 generic/210 generic/211 generic/212
generic/221 generic/239 generic/241 generic/245 generic/246 generic/247
generic/248 generic/249 generic/257 generic/263 generic/285 generic/286
generic/308 generic/309 generic/310 generic/315 generic/323 generic/339
generic/340 generic/344 generic/345 generic/346 generic/354 generic/360
generic/393 generic/394
Not run: generic/010 generic/286 generic/315
Failures: generic/075 generic/112 generic/127 generic/285
Failed 4 of 68 tests
---
fs/cifs/connect.c | 19 +++++++++++++++----
1 file changed, 15 insertions(+), 4 deletions(-)
diff --git a/fs/cifs/connect.c b/fs/cifs/connect.c
index a521c705b0d7..a3e4811b7871 100644
--- a/fs/cifs/connect.c
+++ b/fs/cifs/connect.c
@@ -59,7 +59,7 @@ extern bool disable_legacy_dialects;
#define TLINK_IDLE_EXPIRE (600 * HZ)
/* Drop the connection to not overload the server */
-#define NUM_STATUS_IO_TIMEOUT 5
+#define MAX_STATUS_IO_TIMEOUT 5
struct mount_ctx {
struct cifs_sb_info *cifs_sb;
@@ -965,6 +965,7 @@ cifs_demultiplex_thread(void *p)
struct mid_q_entry *mids[MAX_COMPOUND];
char *bufs[MAX_COMPOUND];
unsigned int noreclaim_flag, num_io_timeout = 0;
+ bool pending_reconnect = false;
noreclaim_flag = memalloc_noreclaim_save();
cifs_dbg(FYI, "Demultiplex PID: %d\n", task_pid_nr(current));
@@ -1004,6 +1005,8 @@ cifs_demultiplex_thread(void *p)
cifs_dbg(FYI, "RFC1002 header 0x%x\n", pdu_length);
if (!is_smb_response(server, buf[0]))
continue;
+
+ pending_reconnect = false;
next_pdu:
server->pdu_size = pdu_length;
@@ -1063,10 +1066,13 @@ cifs_demultiplex_thread(void *p)
if (server->ops->is_status_io_timeout &&
server->ops->is_status_io_timeout(buf)) {
num_io_timeout++;
- if (num_io_timeout > NUM_STATUS_IO_TIMEOUT) {
- cifs_reconnect(server);
+ if (num_io_timeout > MAX_STATUS_IO_TIMEOUT) {
+ cifs_server_dbg(VFS,
+ "Number of request timeouts exceeded %d. Reconnecting",
+ MAX_STATUS_IO_TIMEOUT);
+
+ pending_reconnect = true;
num_io_timeout = 0;
- continue;
}
}
@@ -1113,6 +1119,11 @@ cifs_demultiplex_thread(void *p)
buf = server->smallbuf;
goto next_pdu;
}
+
+ /* do this reconnect at the very end after processing all MIDs */
+ if (pending_reconnect)
+ cifs_reconnect(server);
+
} /* end while !EXITING */
/* buffer usually freed in free_mid - need to free it here on exit */
--
2.43.0
The patch below does not apply to the 6.7-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.7.y
git checkout FETCH_HEAD
git cherry-pick -x e3b63e966cac0bf78aaa1efede1827a252815a1d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024022610-amino-basically-add3@gregkh' --subject-prefix 'PATCH 6.7.y' HEAD^..
Possible dependencies:
e3b63e966cac ("mm: zswap: fix missing folio cleanup in writeback race path")
96c7b0b42239 ("mm: return the folio from __read_swap_cache_async()")
e947ba0bbf47 ("mm/zswap: cleanup zswap_writeback_entry()")
32acba4c0483 ("mm/zswap: refactor out __zswap_load()")
c75f5c1e0f1d ("mm/zswap: reuse dstmem when decompress")
b5ba474f3f51 ("zswap: shrink zswap pool based on memory pressure")
a65b0e7607cc ("zswap: make shrinking memcg-aware")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e3b63e966cac0bf78aaa1efede1827a252815a1d Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed(a)google.com>
Date: Thu, 25 Jan 2024 08:51:27 +0000
Subject: [PATCH] mm: zswap: fix missing folio cleanup in writeback race path
In zswap_writeback_entry(), after we get a folio from
__read_swap_cache_async(), we grab the tree lock again to check that the
swap entry was not invalidated and recycled. If it was, we delete the
folio we just added to the swap cache and exit.
However, __read_swap_cache_async() returns the folio locked when it is
newly allocated, which is always true for this path, and the folio is
ref'd. Make sure to unlock and put the folio before returning.
This was discovered by code inspection, probably because this path handles
a race condition that should not happen often, and the bug would not crash
the system, it will only strand the folio indefinitely.
Link: https://lkml.kernel.org/r/20240125085127.1327013-1-yosryahmed@google.com
Fixes: 04fc7816089c ("mm: fix zswap writeback race condition")
Signed-off-by: Yosry Ahmed <yosryahmed(a)google.com>
Reviewed-by: Chengming Zhou <zhouchengming(a)bytedance.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: Domenico Cerasuolo <cerasuolodomenico(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/zswap.c b/mm/zswap.c
index 350dd2fc8159..d2423247acfd 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1440,6 +1440,8 @@ static int zswap_writeback_entry(struct zswap_entry *entry,
if (zswap_rb_search(&tree->rbroot, swp_offset(entry->swpentry)) != entry) {
spin_unlock(&tree->lock);
delete_from_swap_cache(folio);
+ folio_unlock(folio);
+ folio_put(folio);
return -ENOMEM;
}
spin_unlock(&tree->lock);
The following patch accidentally removed the code for delivering
completions for cancelled reads and writes to user space: "[PATCH 04/33]
aio: remove retry-based AIO"
(https://lore.kernel.org/all/1363883754-27966-5-git-send-email-koverstreet@g…)
From that patch:
- if (kiocbIsCancelled(iocb)) {
- ret = -EINTR;
- aio_complete(iocb, ret, 0);
- /* must not access the iocb after this */
- goto out;
- }
This leads to a leak in user space of a struct iocb. Hence this patch
that restores the code that reports to user space that a read or write
has been cancelled successfully.
Fixes: 41003a7bcfed ("aio: remove retry-based AIO")
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: Avi Kivity <avi(a)scylladb.com>
Cc: Sandeep Dhavale <dhavale(a)google.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
fs/aio.c | 27 +++++++++++----------------
1 file changed, 11 insertions(+), 16 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c
index da18dbcfcb22..28223f511931 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -2165,14 +2165,11 @@ COMPAT_SYSCALL_DEFINE3(io_submit, compat_aio_context_t, ctx_id,
#endif
/* sys_io_cancel:
- * Attempts to cancel an iocb previously passed to io_submit. If
- * the operation is successfully cancelled, the resulting event is
- * copied into the memory pointed to by result without being placed
- * into the completion queue and 0 is returned. May fail with
- * -EFAULT if any of the data structures pointed to are invalid.
- * May fail with -EINVAL if aio_context specified by ctx_id is
- * invalid. May fail with -EAGAIN if the iocb specified was not
- * cancelled. Will fail with -ENOSYS if not implemented.
+ * Attempts to cancel an iocb previously passed to io_submit(). If the
+ * operation is successfully cancelled 0 is returned. May fail with
+ * -EFAULT if any of the data structures pointed to are invalid. May
+ * fail with -EINVAL if aio_context specified by ctx_id is invalid. Will
+ * fail with -ENOSYS if not implemented.
*/
SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
struct io_event __user *, result)
@@ -2203,14 +2200,12 @@ SYSCALL_DEFINE3(io_cancel, aio_context_t, ctx_id, struct iocb __user *, iocb,
}
spin_unlock_irq(&ctx->ctx_lock);
- if (!ret) {
- /*
- * The result argument is no longer used - the io_event is
- * always delivered via the ring buffer. -EINPROGRESS indicates
- * cancellation is progress:
- */
- ret = -EINPROGRESS;
- }
+ /*
+ * The result argument is no longer used - the io_event is always
+ * delivered via the ring buffer.
+ */
+ if (ret == 0 && kiocb->rw.ki_flags & IOCB_AIO_RW)
+ aio_complete_rw(&kiocb->rw, -EINTR);
percpu_ref_put(&ctx->users);